Science.gov

Sample records for high sequence variation

  1. High intraindividual variation in internal transcribed spacer sequences in Aeschynanthus (Gesneriaceae): implications for phylogenetics.

    PubMed Central

    Denduangboripant, J; Cronk, Q C

    2000-01-01

    Aeschynanthus (Gesneriaceae) is a large genus of tropical epiphytes that is widely distributed from the Himalayas and China throughout South-East Asia to New Guinea and the Solomon Islands. Polymerase chain reaction (PCR) consensus sequences of the internal transcribed spacers (ITS) of Aeschynanthus nuclear ribosomal DNA showed sequence polymorphism that was difficult to interpret. Cloning individual sequences from the PCR product generated a phylogenetic tree of 23 Aeschynanthus species (two clones per species). The intraindividual clone pairs varied from 0 to 5.01%. We suggest that the high intraindividual sequence variation results from low molecular drive in the ITS of Aeschynanthus. However, this study shows that, despite the variation found within some individuals, it is still possible to use these data to reconstruct phylogenetic relationships of the species, suggesting that clone variation, although persistent, does not pre-date the divergence of Aeschynanthus species. The Aeschynanthus analysis revealed two major clades with different but overlapping geographic distributions and reflected classification based on morphology (particularly seed hair type). PMID:10983824
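
    The abstract above reports divergence between the two clones of one individual as a percentage (0 to 5.01%) but does not say how it was measured. As a minimal sketch, assuming an uncorrected pairwise distance (p-distance) over aligned sites, the calculation could look like this. The function name and the toy sequences are hypothetical and are not taken from the study.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Percent of differing sites between two aligned sequences.

    Assumption: columns containing a gap or an ambiguous base in either
    sequence are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared


# Hypothetical pair of aligned clones from a single individual.
clone_1 = "ACGTTGCA-ACGTACGTTAGC"
clone_2 = "ACGTTGCATACGTACGATAGC"
divergence = p_distance(clone_1, clone_2)
print(f"{divergence:.2f}%")
```

    A model-corrected distance would give slightly larger values at higher divergence; the uncorrected form is used here only because it is the simplest to read.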

  2. High intraindividual variation in internal transcribed spacer sequences in Aeschynanthus (Gesneriaceae): implications for phylogenetics.

    PubMed

    Denduangboripant, J; Cronk, Q C

    2000-07-22

    Aeschynanthus (Gesneriaceae) is a large genus of tropical epiphytes that is widely distributed from the Himalayas and China throughout South-East Asia to New Guinea and the Solomon Islands. Polymerase chain reaction (PCR) consensus sequences of the internal transcribed spacers (ITS) of Aeschynanthus nuclear ribosomal DNA showed sequence polymorphism that was difficult to interpret. Cloning individual sequences from the PCR product generated a phylogenetic tree of 23 Aeschynanthus species (two clones per species). The intraindividual clone pairs varied from 0 to 5.01%. We suggest that the high intraindividual sequence variation results from low molecular drive in the ITS of Aeschynanthus. However, this study shows that, despite the variation found within some individuals, it is still possible to use these data to reconstruct phylogenetic relationships of the species, suggesting that clone variation, although persistent, does not pre-date the divergence of Aeschynanthus species. The Aeschynanthus analysis revealed two major clades with different but overlapping geographic distributions and reflected classification based on morphology (particularly seed hair type).

  3. Reliable Detection of Herpes Simplex Virus Sequence Variation by High-Throughput Resequencing.

    PubMed

    Morse, Alison M; Calabro, Kaitlyn R; Fear, Justin M; Bloom, David C; McIntyre, Lauren M

    2017-08-16

    High-throughput sequencing (HTS) has resulted in data for a number of herpes simplex virus (HSV) laboratory strains and clinical isolates. The knowledge of these sequences has been critical for investigating viral pathogenicity. However, the assembly of complete herpesviral genomes, including HSV, is complicated due to the existence of large repeat regions and arrays of smaller reiterated sequences that are commonly found in these genomes. In addition, the inherent genetic variation in populations of isolates for viruses and other microorganisms presents an additional challenge to many existing HTS sequence assembly pipelines. Here, we evaluate two approaches for the identification of genetic variants in HSV1 strains using Illumina short read sequencing data. The first, a reference-based approach, identifies variants from reads aligned to a reference sequence and the second, a de novo assembly approach, identifies variants from reads aligned to de novo assembled consensus sequences. Of critical importance for both approaches is the reduction in the number of low complexity regions through the construction of a non-redundant reference genome. We compared variants identified in the two methods. Our results indicate that approximately 85% of variants are identified regardless of the approach. The reference-based approach to variant discovery captures an additional 15% representing variants divergent from the HSV1 reference possibly due to viral passage. Reference-based approaches are significantly less labor-intensive and identify variants across the genome where de novo assembly-based approaches are limited to regions where contigs have been successfully assembled. In addition, regions of poor quality assembly can lead to false variant identification in de novo consensus sequences. For viruses with a well-assembled reference genome, a reference-based approach is recommended.
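
    The comparison described above, between variants found by a reference-based approach and by a de novo assembly approach, amounts to a set overlap. The sketch below is illustrative only: the `Variant` type, the function name and the toy calls are hypothetical, and it assumes both call sets have already been expressed in the same reference coordinates.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    pos: int  # 1-based position in shared reference coordinates (assumed)
    ref: str  # reference allele
    alt: str  # alternate allele


def compare_call_sets(reference_based: Iterable[Variant],
                      de_novo: Iterable[Variant]) -> dict:
    """Summarise the overlap between two variant call sets."""
    ref_set, dn_set = set(reference_based), set(de_novo)
    shared = ref_set & dn_set
    union = ref_set | dn_set
    return {
        "shared": len(shared),
        "reference_only": len(ref_set - dn_set),
        "de_novo_only": len(dn_set - ref_set),
        "shared_fraction": len(shared) / len(union) if union else 0.0,
    }


# Toy data, not from the paper.
reference_calls = [Variant(101, "A", "G"), Variant(250, "C", "T"), Variant(777, "G", "A")]
de_novo_calls = [Variant(101, "A", "G"), Variant(250, "C", "T")]
summary = compare_call_sets(reference_calls, de_novo_calls)
print(summary)
```

    Matching on exact position and alleles is the strictest choice; indels in repeat regions may need normalisation before such a comparison is meaningful.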

  4. Application of high-throughput sequencing for studying genomic variations in congenital heart disease.

    PubMed

    Dorn, Cornelia; Grunert, Marcel; Sperling, Silke R

    2014-01-01

    Congenital heart diseases (CHD) represent the most common birth defect in humans. The majority of cases are caused by a combination of complex genetic alterations and environmental influences. In the past, many disease-causing mutations have been identified; however, there is still a large proportion of cardiac malformations with unknown precise origin. High-throughput sequencing technologies established in recent years offer novel opportunities to further study the genetic background underlying the disease. In this review, we provide a roadmap for designing and analyzing high-throughput sequencing studies focused on CHD, but also with general applicability to other complex diseases. The three main next-generation sequencing (NGS) platforms including their particular advantages and disadvantages are presented. To identify potentially disease-related genomic variations and genes, different filtering steps and gene prioritization strategies are discussed. In addition, available control datasets based on NGS are summarized. Finally, we provide an overview of current studies already using NGS technologies and showing that these techniques will help to further unravel the complex genetics underlying CHD.

  5. Comparative analysis of Mycobacterium tuberculosis pe and ppe genes reveals high sequence variation and an apparent absence of selective constraints.

    PubMed

    McEvoy, Christopher R E; Cloete, Ruben; Müller, Borna; Schürch, Anita C; van Helden, Paul D; Gagneux, Sebastien; Warren, Robin M; Gey van Pittius, Nicolaas C

    2012-01-01

    Mycobacterium tuberculosis complex (MTBC) genomes contain 2 large gene families termed pe and ppe. The function of pe/ppe proteins remains enigmatic but studies suggest that they are secreted or cell surface associated and are involved in bacterial virulence. Previous studies have also shown that some pe/ppe genes are polymorphic, a finding that suggests involvement in antigenic variation. Using comparative sequence analysis of 18 publicly available MTBC whole genome sequences, we have performed alignments of 33 pe (excluding pe_pgrs) and 66 ppe genes in order to detect the frequency and nature of genetic variation. This work has been supplemented by whole gene sequencing of 14 pe/ppe (including 5 pe_pgrs) genes in a cohort of 40 diverse and well defined clinical isolates covering all the main lineages of the M. tuberculosis phylogenetic tree. We show that nsSNPs in pe (excluding pgrs) and ppe genes are 3.0 and 3.3 times higher than in non-pe/ppe genes respectively and that numerous other mutation types are also present at a high frequency. It has previously been shown that non-pe/ppe M. tuberculosis genes display a remarkably low level of purifying selection. Here, we also show that compared to these genes those of the pe/ppe families show a further reduction of selection pressure that suggests neutral evolution. This is inconsistent with the positive selection pressure of "classical" antigenic variation. Finally, by analyzing such a large number of genes we were able to detect large differences in mutation type and frequency between both individual genes and gene sub-families. The high variation rates and absence of selective constraints provide valuable insights into potential pe/ppe function. Since pe/ppe proteins are highly antigenic and have been studied as potential vaccine components these results should also prove informative for aspects of M. tuberculosis vaccine design.

  6. High frequency of HMW-GS sequence variation through somatic hybridization between Agropyron elongatum and common wheat.

    PubMed

    Gao, Xin; Liu, Shu Wei; Sun, Qun; Xia, Guang Min

    2010-01-01

    A symmetric somatic hybridization was performed to combine the protoplasts of tall wheatgrass (Agropyron elongatum) and bread wheat (Triticum aestivum). Fertile regenerants were obtained which were morphologically similar to tall wheatgrass, but which contained some introgression segments from wheat. An SDS-PAGE analysis showed that a number of non-parental high-molecular weight glutenin subunits (HMW-GS) were present in the symmetric somatic hybridization derivatives. These sequences were amplified, cloned and sequenced, to deliver 14 distinct HMW-GS coding sequences, eight of which were of the y-type (Hy1-Hy8) and six x-type (Hx1-Hx6). Five of the cloned HMW-GS sequences were successfully expressed in E. coli. The analysis of their deduced peptide sequences showed that they all possessed the typical HMW-GS primary structure. Sequence alignments indicated that Hx5 and Hy1 were probably derived from the tall wheatgrass genes Aex5 and Aey6, while Hy2, Hy3, Hx1 and Hy6 may have resulted from slippage in the replication of a related biparental gene. We found that both symmetric and asymmetric somatic hybridization could promote the emergence of novel alleles. We discussed the origin of allelic variation of HMW-GS genes in somatic hybridization, which might result from the response to genomic shock triggered by the merger and interaction of biparental genomes.

  7. Combining Natural Sequence Variation with High Throughput Mutational Data to Reveal Protein Interaction Sites

    PubMed Central

    Melamed, Daniel; Young, David L.; Miller, Christina R.; Fields, Stanley

    2015-01-01

    Many protein interactions are conserved among organisms despite changes in the amino acid sequences that comprise their contact sites, a property that has been used to infer the location of these sites from protein homology. In an inter-species complementation experiment, a sequence present in a homologue is substituted into a protein and tested for its ability to support function. Therefore, substitutions that inhibit function can identify interaction sites that changed over evolution. However, most of the sequence differences within a protein family remain unexplored because of the small-scale nature of these complementation approaches. Here we use existing high throughput mutational data on the in vivo function of the RRM2 domain of the Saccharomyces cerevisiae poly(A)-binding protein, Pab1, to analyze its sites of interaction. Of 197 single amino acid differences in 52 Pab1 homologues, 17 reduce the function of Pab1 when substituted into the yeast protein. The majority of these deleterious mutations interfere with the binding of the RRM2 domain to eIF4G1 and eIF4G2, isoforms of a translation initiation factor. A large-scale mutational analysis of the RRM2 domain in a two-hybrid assay for eIF4G1 binding supports these findings and identifies peripheral residues that make a smaller contribution to eIF4G1 binding. Three single amino acid substitutions in yeast Pab1 corresponding to residues from the human orthologue are deleterious and eliminate binding to the yeast eIF4G isoforms. We create a triple mutant that carries these substitutions and other humanizing substitutions that collectively support a switch in binding specificity of RRM2 from the yeast eIF4G1 to its human orthologue. Finally, we map other deleterious substitutions in Pab1 to inter-domain (RRM2–RRM1) or protein-RNA (RRM2–poly(A)) interaction sites. Thus, the combined approach of large-scale mutational data and evolutionary conservation can be used to characterize interaction sites at single

  8. Application of high-throughput genome sequencing to intrapathovar variation in Pseudomonas syringae.

    PubMed

    Studholme, David J

    2011-10-01

    One reason for the success of Pseudomonas syringae as a model pathogen has been the availability of three complete genome sequences since 2005. Now, at the beginning of 2011, more than 25 strains of P. syringae have been sequenced and many more will soon be released. To date, published analyses of P. syringae have been largely descriptive, focusing on catalogues of genetic differences among strains and between species. Numerous powerful statistical tools are now available that have yet to be applied to P. syringae genomic data for robust and quantitative reconstruction of evolutionary events. The aim of this review is to provide a snapshot of the current status of P. syringae genome sequence data resources, including very recent and unpublished studies, and thereby demonstrate the richness of resources available for this species. Furthermore, certain specific opportunities and challenges in making the best use of these data resources are highlighted.

  9. Comprehensive characterization of human genome variation by high coverage whole-genome sequencing of forty four Caucasians.

    PubMed

    Shen, Hui; Li, Jian; Zhang, Jigang; Xu, Chao; Jiang, Yan; Wu, Zikai; Zhao, Fuping; Liao, Li; Chen, Jun; Lin, Yong; Tian, Qing; Papasian, Christopher J; Deng, Hong-Wen

    2013-01-01

    Whole genome sequencing studies are essential to obtain a comprehensive understanding of the vast pattern of human genomic variations. Here we report the results of a high-coverage whole genome sequencing study for 44 unrelated healthy Caucasian adults, each sequenced to over 50-fold coverage (averaging 65.8×). We identified approximately 11 million single nucleotide polymorphisms (SNPs), 2.8 million short insertions and deletions, and over 500,000 block substitutions. We showed that, although previous studies, including the 1000 Genomes Project Phase 1 study, have catalogued the vast majority of common SNPs, many of the low-frequency and rare variants remain undiscovered. For instance, approximately 1.4 million SNPs and 1.3 million short indels that we found were novel to both the dbSNP and the 1000 Genomes Project Phase 1 data sets, and the majority of which (∼96%) have a minor allele frequency less than 5%. On average, each individual genome carried ∼3.3 million SNPs and ∼492,000 indels/block substitutions, including approximately 179 variants that were predicted to cause loss of function of the gene products. Moreover, each individual genome carried an average of 44 such loss-of-function variants in a homozygous state, which would completely "knock out" the corresponding genes. Across all the 44 genomes, a total of 182 genes were "knocked-out" in at least one individual genome, among which 46 genes were "knocked out" in over 30% of our samples, suggesting that a number of genes are commonly "knocked-out" in general populations. Gene ontology analysis suggested that these commonly "knocked-out" genes are enriched in biological process related to antigen processing and immune response. Our results contribute towards a comprehensive characterization of human genomic variation, especially for less-common and rare variants, and provide an invaluable resource for future genetic studies of human variation and diseases.
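
    The abstract above classifies most novel variants as having a minor allele frequency (MAF) below 5%. As a minimal sketch of that quantity, assuming biallelic sites and diploid genotypes coded as alternate-allele counts, MAF for one site in a 44-person cohort could be computed as follows. The function name and the example genotypes are hypothetical.

```python
from typing import Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """MAF for one biallelic site.

    genotypes: alternate-allele count per diploid individual (0, 1 or 2);
    None marks a missing call and is excluded from the denominator.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    if any(g not in (0, 1, 2) for g in called):
        raise ValueError("genotypes must be 0, 1, 2 or None")
    alt_freq = sum(called) / (2 * len(called))  # two chromosomes per person
    return min(alt_freq, 1.0 - alt_freq)


# Hypothetical site: 44 individuals, two of them heterozygous.
site = [0] * 42 + [1, 1]
maf = minor_allele_frequency(site)
print(f"MAF = {maf:.4f}, low-frequency (<5%): {maf < 0.05}")
```

    With 44 diploid genomes there are 88 chromosomes, so the smallest non-zero MAF that can be observed is 1/88, or about 1.1%.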

  10. Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    SciTech Connect

    Gordon, Sean

    2013-03-01

    Sean Gordon of the USDA on "Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions" at the 8th Annual Genomics of Energy & Environment Meeting on March 27, 2013 in Walnut Creek, Calif.

  11. Comparative Mitogenomics of the Genus Odontobutis (Perciformes: Gobioidei: Odontobutidae) Revealed Conserved Gene Rearrangement and High Sequence Variations.

    PubMed

    Ma, Zhihong; Yang, Xuefen; Bercsenyi, Miklos; Wu, Junjie; Yu, Yongyao; Wei, Kaijian; Fan, Qixue; Yang, Ruibin

    2015-10-20

    To understand the molecular evolution of mitochondrial genomes (mitogenomes) in the genus Odontobutis, the mitogenome of Odontobutis yaluensis was sequenced and compared with those of another four Odontobutis species. Our results displayed similar mitogenome features among species in genome organization, base composition, codon usage, and gene rearrangement. The identical gene rearrangement of trnS-trnL-trnH tRNA cluster observed in mitogenomes of these five closely related freshwater sleepers suggests that this unique gene order is conserved within Odontobutis. Additionally, the present gene order and the positions of associated intergenic spacers of these Odontobutis mitogenomes indicate that this unusual gene rearrangement results from tandem duplication and random loss of large-scale gene regions. Moreover, these mitogenomes exhibit a high level of sequence variation, mainly due to the differences of corresponding intergenic sequences in gene rearrangement regions and the heterogeneity of tandem repeats in the control regions. Phylogenetic analyses support Odontobutis species with shared gene rearrangement forming a monophyletic group, and the interspecific phylogenetic relationships are associated with structural differences among their mitogenomes. The present study contributes to understanding the evolutionary patterns of Odontobutidae species.

  12. Comparative Mitogenomics of the Genus Odontobutis (Perciformes: Gobioidei: Odontobutidae) Revealed Conserved Gene Rearrangement and High Sequence Variations

    PubMed Central

    Ma, Zhihong; Yang, Xuefen; Bercsenyi, Miklos; Wu, Junjie; Yu, Yongyao; Wei, Kaijian; Fan, Qixue; Yang, Ruibin

    2015-01-01

    To understand the molecular evolution of mitochondrial genomes (mitogenomes) in the genus Odontobutis, the mitogenome of Odontobutis yaluensis was sequenced and compared with those of another four Odontobutis species. Our results displayed similar mitogenome features among species in genome organization, base composition, codon usage, and gene rearrangement. The identical gene rearrangement of trnS-trnL-trnH tRNA cluster observed in mitogenomes of these five closely related freshwater sleepers suggests that this unique gene order is conserved within Odontobutis. Additionally, the present gene order and the positions of associated intergenic spacers of these Odontobutis mitogenomes indicate that this unusual gene rearrangement results from tandem duplication and random loss of large-scale gene regions. Moreover, these mitogenomes exhibit a high level of sequence variation, mainly due to the differences of corresponding intergenic sequences in gene rearrangement regions and the heterogeneity of tandem repeats in the control regions. Phylogenetic analyses support Odontobutis species with shared gene rearrangement forming a monophyletic group, and the interspecific phylogenetic relationships are associated with structural differences among their mitogenomes. The present study contributes to understanding the evolutionary patterns of Odontobutidae species. PMID:26492246

  13. Complete genome sequence analysis of goatpox virus isolated from China shows high variation.

    PubMed

    Zeng, Xiancheng; Chi, Xuelin; Li, Wei; Hao, Wenbo; Li, Ming; Huang, Xiaohong; Huang, Yifan; Rock, Daniel L; Luo, Shuhong; Wang, Shihua

    2014-09-17

    Goatpox virus (GTPV), a member of the Capripoxvirus genus of the Poxviridae family, is the causative agent of variola caprina (goatpox). GTPV can cause significant economic losses of domestic ruminants in endemic regions and can threaten breeding stocks. In this study, we report on the compilation of the complete genomic sequence of an isolated GTPV field strain FZ (GTPV_FZ). The 150,194 bp GTPV genome consists of a central coding region bounded by two identical 2301 bp inverted terminal repeats and contains 151 putative genes. Comparative genomic analysis reveals the apparent genetic relationships among Capripoxviruses are close, but sufficient genomic variants in the field isolate strain FZ have been identified to distinguish it from other GTPV strains and other Capripoxvirus species. Phylogenetic analysis based on the p32 and complete GTPV genome can be used to differentiate SPPVs, GTPVs and LSDVs. These data may contribute to the epidemiological study of the Chinese capripoxvirus and help to develop more specific detection methods to distinguish GTPVs, SPPVs and LSDVs.

  14. Local sequence assembly reveals a high-resolution profile of somatic structural variations in 97 cancer genomes.

    PubMed

    Zhuang, Jiali; Weng, Zhiping

    2015-09-30

    Genomic structural variations (SVs) are pervasive in many types of cancers. Characterizing their underlying mechanisms and potential molecular consequences is crucial for understanding the basic biology of tumorigenesis. Here, we engineered a local assembly-based algorithm (laSV) that detects SVs with high accuracy from paired-end high-throughput genomic sequencing data and pinpoints their breakpoints at single base-pair resolution. By applying laSV to 97 tumor-normal paired genomic sequencing datasets across six cancer types produced by The Cancer Genome Atlas Research Network, we discovered that non-allelic homologous recombination is the primary mechanism for generating somatic SVs in acute myeloid leukemia. This finding contrasts with results for the other five types of solid tumors, in which non-homologous end joining and microhomology end joining are the predominant mechanisms. We also found that the genes recursively mutated by single nucleotide alterations differed from the genes recursively mutated by SVs, suggesting that these two types of genetic alterations play different roles during cancer progression. We further characterized how the gene structures of the oncogene JAK1 and the tumor suppressors KDM6A and RB1 are affected by somatic SVs and discussed the potential functional implications of intergenic SVs.

  15. High sensitivity of the single-strand conformation polymorphism method for detecting sequence variations in the low-density lipoprotein receptor gene validated by DNA sequencing.

    PubMed

    Jensen, H K; Jensen, L G; Hansen, P S; Faergeman, O; Gregersen, N

    1996-08-01

    We designed oligonucleotide primer pairs to amplify the promoter region, the translated exon sequences, and the flanking intron sequences of all 18 exons of the LDL receptor gene to compare the ability of the PCR single-strand conformation polymorphism (PCR-SSCP) method with semiautomated solid-phase genomic DNA sequencing to detect sequence variations. In 20 apparently unrelated Danish patients with a clinical diagnosis of heterozygous familial hypercholesterolemia (FH), we identified 13 different mutations in the LDL receptor gene: two silent (C331C, N494N); five missense (W66G, E119K, T383P, W556S, T705I); one nonsense (W23X); three splice-site (313 + 1G-->A, 1061-8T-->C, 1846-1G-->A); and two frameshift (335del10, 1650delG) mutations. Four of these mutations, N494N, T383P, 1061-8T-->C, and W556S, have not been reported earlier. The pathogenicity of the T383P, 1061-8T-->C, and W556S mutations remains to be established by in vitro mutagenesis and transfection studies. One patient had three mutations (335del10, 1061-8T-->C, and T705I) on the same allele. Further, nine well-known polymorphisms were detectable with this methodological setup. Direct DNA sequencing of the PCR products used for the SSCP analysis did not reveal any sequence variations not detected by the PCR-SSCP method. In two patients we did not detect any mutation by either method. We conclude that the PCR-SSCP analysis, performed as described here, is as sensitive and efficient as DNA sequencing in the ability to identify the sequence variations in the LDL receptor gene of the patients with heterozygous FH of this study.

  16. High-resolution melt analysis to detect sequence variations in highly homologous gene regions: application to CYP2B6.

    PubMed

    Twist, Greyson P; Gaedigk, Roger; Leeder, J Steven; Gaedigk, Andrea

    2013-06-01

    High-resolution melt (HRM) analysis using 'release-on-demand' dyes, such as EvaGreen®, has the potential to resolve complex genotypes in situations where genotype interpretation is complicated by the presence of pseudogenes or allelic variants in close proximity to the locus of interest. We explored the utility of HRM to genotype a SNP (785A>G, K262R, rs2279343) that is located within exon 5 of the CYP2B6 gene, which contributes to the metabolism of a number of clinically used drugs. Testing of 785A>G is challenging, but crucial for accurate genotype determination. This SNP is part of multiple known CYP2B6 haplotypes and located in a region that is identical to CYP2B7, a nonfunctional pseudogene. Because small CYP2B6-specific PCR amplicons bracketing 785A>G cannot be generated, we simultaneously amplified both genes. A panel of 235 liver tissue DNAs and five Coriell samples were assessed. Eight CYP2B6/CYP2B7 diplotype combinations were found and a novel variant 769G>A (D257N) was discovered. The frequency of 785G corresponded to those reported for Caucasians and African-Americans. Assay performance was confirmed by CYP2B6 and/or CYP2B7 sequence analysis in a subset of samples, using a preamplified CYP2B6-specific long-range-PCR amplicon as HRM template. Inclusion rather than exclusion of a homologous pseudogene allowed us to devise a sensitive, reliable and affordable assay to test this CYP2B6 SNP. This assay design may be utilized to overcome the challenges and limitations of other methods. Owing to the flexibility of HRM, this assay design can easily be adapted to other gene loci of interest.

  17. Transcriptome and genome sequencing uncovers functional variation in humans.

    PubMed

    Lappalainen, Tuuli; Sammeth, Michael; Friedländer, Marc R; 't Hoen, Peter A C; Monlong, Jean; Rivas, Manuel A; Gonzàlez-Porta, Mar; Kurbatova, Natalja; Griebel, Thasso; Ferreira, Pedro G; Barann, Matthias; Wieland, Thomas; Greger, Liliana; van Iterson, Maarten; Almlöf, Jonas; Ribeca, Paolo; Pulyakhina, Irina; Esser, Daniela; Giger, Thomas; Tikhonov, Andrew; Sultan, Marc; Bertier, Gabrielle; MacArthur, Daniel G; Lek, Monkol; Lizano, Esther; Buermans, Henk P J; Padioleau, Ismael; Schwarzmayr, Thomas; Karlberg, Olof; Ongen, Halit; Kilpinen, Helena; Beltran, Sergi; Gut, Marta; Kahlem, Katja; Amstislavskiy, Vyacheslav; Stegle, Oliver; Pirinen, Matti; Montgomery, Stephen B; Donnelly, Peter; McCarthy, Mark I; Flicek, Paul; Strom, Tim M; Lehrach, Hans; Schreiber, Stefan; Sudbrak, Ralf; Carracedo, Angel; Antonarakis, Stylianos E; Häsler, Robert; Syvänen, Ann-Christine; van Ommen, Gert-Jan; Brazma, Alvis; Meitinger, Thomas; Rosenstiel, Philip; Guigó, Roderic; Gut, Ivo G; Estivill, Xavier; Dermitzakis, Emmanouil T

    2013-09-26

    Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project--the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.

  18. Chromospheric variations in main-sequence stars

    NASA Technical Reports Server (NTRS)

    Baliunas, S. L.; Donahue, R. A.; Soon, J. H.; Horne, J. H.; Frazer, J.; Woodard-Eklund, L.; Bradford, M.; Rao, L. M.; Wilson, O. C.; Zhang, Q.

    1995-01-01

    The fluxes in passbands 0.1 nm wide and centered on the Ca II H and K emission cores have been monitored in 111 stars of spectral type F2-M2 on or near the main sequence in a continuation of an observing program started by O. C. Wilson. Most of the measurements began in 1966, with observations scheduled monthly until 1980, when observations were scheduled several times per week. The records, with a long-term precision of about 1.5%, display fluctuations that can be identified with variations on timescales similar to the 11 yr cycle of solar activity as well as axial rotation, and the growth and decay of emitting regions. We present the records of chromospheric emission and general conclusions about variations in surface magnetic activity on timescales greater than 1 yr but less than a few decades. The results for stars of spectral type G0-K5 V indicate a pattern of change in rotation and chromospheric activity on an evolutionary timescale, in which (1) young stars exhibit high average levels of activity, rapid rotation rates, no Maunder minimum phase and rarely display a smooth, cyclic variation; (2) stars of intermediate age (approximately 1-2 Gyr for 1 solar mass) have moderate levels of activity and rotation rates, and occasional smooth cycles; and (3) stars as old as the Sun and older have slower rotation rates, lower activity levels and smooth cycles with occasional Maunder minimum phases.

  19. Transcriptome analysis of the variations between autotetraploid Paulownia tomentosa and its diploid using high-throughput sequencing.

    PubMed

    Fan, Guoqiang; Wang, Limin; Deng, Minjie; Niu, Suyan; Zhao, Zhenli; Xu, Enkai; Cao, Xibin; Zhang, Xiaoshen

    2015-08-01

    Timber properties of autotetraploid Paulownia tomentosa are heritable with whole genome duplication, but the molecular mechanisms for the predominant characteristics remain unclear. To illuminate the genetic basis, high-throughput sequencing technology was used to identify the related unigenes. 2677 unigenes were found to be significantly differentially expressed in autotetraploid P. tomentosa. In total, 30 photosynthesis-related, 21 transcription factor-related, and 22 lignin-related differentially expressed unigenes were detected, and the roles of the peroxidase in lignin biosynthesis, MYB DNA-binding proteins, and WRKY proteins associated with the regulation of relevant hormones are extensively discussed. The results provide transcriptome data that may bring a new perspective to explain the polyploidy mechanism in the long growth cycle of plants and offer some help to the future Paulownia breeding.

  20. Genomic Sequence Variation Markup Language (GSVML).

    PubMed

    Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi

    2010-02-01

    With the aim of making good use of internationally accumulated genomic sequence variation data, which is increasing rapidly due to the explosive amount of genomic research at present, the development of an interoperable data exchange format and its international standardization are necessary. Genomic Sequence Variation Markup Language (GSVML) will focus on genomic sequence variation data and human health applications, such as gene based medicine or pharmacogenomics. We developed GSVML through eight steps, based on case analysis and domain investigations. By focusing on the design scope to human health applications and genomic sequence variation, we attempted to eliminate ambiguity and to ensure practicability. We intended to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we attempted to minimize the redundancy of the data format, while maximizing the data covering range. We also attempted to ensure communication and interface ability with other Markup Languages, for exchange of omics data among various omics researchers or facilities. The interface ability with developing clinical standards, such as the Health Level Seven Genotype Information model, was analyzed. We developed the human health-oriented GSVML comprising variation data, direct annotation, and indirect annotation categories; the variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information, and have internal relationships. For designing, we examined 6 cases for three criteria as human health application and 15 data elements for three criteria as data formats for genomic sequence variation data exchange. The data format of five international SNP databases and six Markup Languages and the interface ability to the Health Level Seven Genotype Model in terms of 317 items were investigated. 
GSVML was developed as
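
    The three-category layout described in this abstract (a required variation data category plus optional direct and indirect annotation categories) can be illustrated with a minimal sketch. Every element name, attribute, and value below is hypothetical and chosen only for illustration; none is taken from the GSVML specification.

```python
import xml.etree.ElementTree as ET


def build_variation_record(variant_id, reference, position, ref_allele, alt_allele,
                           direct_annotation=None, indirect_annotation=None):
    """Build a toy record: one required category and two optional ones.

    All tag names are hypothetical placeholders, not real GSVML elements.
    """
    root = ET.Element("variation_record")
    # Required category: the variation data itself.
    data = ET.SubElement(root, "variation_data", id=variant_id)
    ET.SubElement(data, "reference").text = reference
    ET.SubElement(data, "position").text = str(position)
    ET.SubElement(data, "ref_allele").text = ref_allele
    ET.SubElement(data, "alt_allele").text = alt_allele
    # Optional categories: emitted only when content is supplied.
    optional = (("direct_annotation", direct_annotation),
                ("indirect_annotation", indirect_annotation))
    for tag, content in optional:
        if content:
            node = ET.SubElement(root, tag)
            for key, value in content.items():
                ET.SubElement(node, key).text = value
    return root


def is_valid(root):
    """A record is acceptable only if the required category is present."""
    return root.find("variation_data") is not None


record = build_variation_record(
    "example-1", "chrN", 12345, "A", "G",
    direct_annotation={"gene": "EXAMPLE1"},
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

    Keeping the optional categories out of the output when they are empty mirrors the required/optional split in the abstract; a real exchange format would enforce this with a schema rather than a helper function.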

  1. The next generation of target capture technologies - large DNA fragment enrichment and sequencing determines regional genomic variation of high complexity.

    PubMed

    Dapprich, Johannes; Ferriola, Deborah; Mackiewicz, Kate; Clark, Peter M; Rappaport, Eric; D'Arcy, Monica; Sasson, Ariella; Gai, Xiaowu; Schug, Jonathan; Kaestner, Klaus H; Monos, Dimitri

    2016-07-09

    The ability to capture and sequence large contiguous DNA fragments represents a significant advancement towards the comprehensive characterization of complex genomic regions. While emerging sequencing platforms are capable of producing reads several kilobases long, the fragment sizes generated by current DNA target enrichment technologies remain a limiting factor, producing DNA fragments generally shorter than 1 kbp. The DNA enrichment methodology described herein, Region-Specific Extraction (RSE), produces DNA segments in excess of 20 kbp in length. Coupling this enrichment method to appropriate sequencing platforms will significantly enhance the ability to generate complete and accurate sequence characterization of any genomic region without the need for reference-based assembly. RSE is a long-range DNA target capture methodology that relies on the specific hybridization of short (20-25 base) oligonucleotide primers to selected sequence motifs within the DNA target region. These capture primers are then enzymatically extended on the 3'-end, incorporating biotinylated nucleotides into the DNA. Streptavidin-coated beads are subsequently used to pull down the original, long DNA template molecules via the newly synthesized, biotinylated DNA that is bound to them. We demonstrate the accuracy, simplicity and utility of the RSE method by capturing and sequencing a 4 Mbp stretch of the major histocompatibility complex (MHC). Our results show an average depth of coverage of 164X for the entire MHC. This depth of coverage contributes significantly to a 99.94% total coverage of the targeted region and to an accuracy that is over 99.99%. RSE represents a cost-effective target enrichment method capable of producing sequencing templates in excess of 20 kbp in length. Our method has been shown to generate superior coverage across the MHC as compared to other commercially available methodologies, with the added advantage of producing longer sequencing

  2. Comprehensive Sequence Analysis of the Human IL23A Gene Defines New Variation Content and High Rate of Evolutionary Conservation

    PubMed Central

    Tindall, Elizabeth A.; Hayes, Vanessa M.

    2010-01-01

    A newly described heterodimeric cytokine, interleukin-23 (IL-23), is emerging as a key player in both the innate and the adaptive T helper (Th)17 driven immune response as well as an initiator of several autoimmune diseases. The rate-limiting element of IL-23 production is believed to be driven by expression of the unique p19 subunit encoded by IL23A. We set out to perform comprehensive DNA sequencing of this previously under-studied gene in 96 individuals from two evolutionarily distinct human population groups, Southern African Bantu and European. We observed a total of 33 different DNA variants within these two groups, 22 (67%) of which are currently not reported in any available database. We further demonstrate both inter-population and intra-species sequence conservation within the coding and known regulatory regions of IL23A, supporting a critical physiological role for IL-23. We conclude that IL23A may have undergone positive selection pressure directed towards conservation, suggesting that functional genetic variants within IL23A will have a significant impact on the host immune response. PMID:20154336

  3. Mining sequence variations in representative polyploid sugarcane germplasm accessions.

    PubMed

    Yang, Xiping; Song, Jian; You, Qian; Paudel, Dev R; Zhang, Jisen; Wang, Jianping

    2017-08-09

    Sugarcane (Saccharum spp.) is one of the most important economic crops because of its high sugar production and biofuel potential. Due to the high polyploid level and complex genome of sugarcane, it has been a huge challenge to investigate genomic sequence variations, which are critical for identifying alleles contributing to important agronomic traits. In order to mine the genetic variations in sugarcane, genotyping by sequencing (GBS) was used to genotype 14 representative Saccharum complex accessions. GBS is a method to generate a large number of markers, enabled by next generation sequencing (NGS) and genome complexity reduction using restriction enzymes. To use GBS for high-throughput genotyping of highly polyploid sugarcane, the GBS analysis pipelines in 14 Saccharum complex accessions were established by evaluating different alignment methods, sequence variant callers, and sequence depth for single nucleotide polymorphism (SNP) filtering. By using the established pipeline, a total of 76,251 non-redundant SNPs, 5642 InDels, 6380 presence/absence variants (PAVs), and 826 copy number variations (CNVs) were detected among the 14 accessions. In addition, the non-reference-based universal network enabled analysis kit and Stacks de novo called 34,353 and 109,043 SNPs, respectively. In the 14 accessions, the percentages of single-dose SNPs ranged from 38.3% to 62.3% with an average of 49.6%, much more than the proportions of multiple-dosage SNPs. Concordantly called SNPs were used to evaluate the phylogenetic relationship among the 14 accessions. The results showed that the divergence time between the Erianthus genus and the Saccharum genus was more than 10 million years ago (MYA). The Saccharum species separated from their common ancestors between 0.19 and 1.65 MYA. 
The GBS pipelines including the reference sequences, alignment methods, sequence variant callers, and sequence depth were recommended and discussed for the Saccharum complex and other related species. A

  4. A physical map of the highly heterozygous Populus genome: integration with the genome sequence and genetic map and analysis of haplotype variation

    SciTech Connect

    Kelleher, Colin; Chiu, Readman; Shin, Heesun; Bosdet, Ian; Krywinski, Martin; Fjell, Chris; Wilkin, Jennifer; Yin, Tongming; DiFazio, Stephen P; Ali, Johar; Asano, Jennifer; Chan, Susanna; Cloutier, Alison; Girn, Noreen; Leach, Stephen; Lee, Darlene; Mathewson, Carrie; Olson, Teika; O'Connor, Katie; Prabhu, Anna-Liisa; Smailus, Duane; Stott, Jeffery; Tsai, Miranda; Wye, Natasaja; Yang, George; Zhuang, Jun; Holt, Robert A.; Putnam, Nicholas; Vrebalov, Julia; Giovannoni, James; Grimwood, Jane; Schmutz, Jeremy; Rokhsar, Daniel; Jones, Steven; Marra, Marco; Tuskan, Gerald A; Bohlmann, J.; Ellis, Brian; Ritland, Kermit; Douglas, Carl; Schein, Jacqueline

    2007-01-01

    As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the first maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2,802 contigs, was constructed from fingerprinted bacterial artificial chromosome (BAC) clones. The map represents approximately 9.4-fold coverage of the 485 +/- 10 Mb Populus genome, as estimated from the genome sequence assembly. BAC ends were sequenced to aid in long-range assembly of whole genome shotgun sequence scaffolds and to anchor the physical map to the genome sequence. Simple sequence repeat (SSR)-based markers were derived from the end sequences and used to initiate integration of the BAC and genetic maps. 2,411 physical map contigs, representing 97% of all clones assigned to contigs, were aligned to the sequence assembly (JGI Populus trichocarpa v1.0). These alignments represent a total coverage of 384 Mb (79%) of the entire poplar sequence assembly and 295 Mb (96%) of linkage group sequence assemblies. A striking result of the physical map contig alignments to the sequence assembly was the co-localization of multiple contigs across numerous regions of the 19 linkage groups. Targeted sequencing of BAC clones and genetic analysis in a small number of representative regions showed that these co-aligning contigs represent distinct haplotypes in the heterozygous individual sequenced, and revealed the nature of these haplotype sequence differences.
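
    The percentages quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the abstract; treating 485 Mb as the size of the whole sequence assembly is an assumption made here for the check, and the implied linkage-group total is a derived value, not a reported one.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


genome_mb = 485            # central estimate of genome size from the abstract
aligned_total_mb = 384     # alignment coverage of the entire assembly
aligned_linkage_mb = 295   # alignment coverage of linkage group assemblies

# Share of the assembly covered, assuming the assembly is about 485 Mb.
total_share = percent(aligned_total_mb, genome_mb)

# Linkage-group assembly size implied by "295 Mb (96%)" (derived, not reported).
implied_linkage_mb = aligned_linkage_mb / 0.96

# Share of contigs (not clones) that were aligned: 2,411 of 2,802.
contig_share = percent(2411, 2802)

print(round(total_share), round(implied_linkage_mb), round(contig_share))
```

    Note that the 97% figure in the abstract refers to clones assigned to contigs, whereas the last value computed here is the share of contigs, so the two numbers are not expected to match.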

  5. Sequence variation within the fragile X locus.

    PubMed

    Mathews, D J; Kashuk, C; Brightwell, G; Eichler, E E; Chakravarti, A

    2001-08-01

    The human genome provides a reference sequence, which is a template for resequencing studies that aim to discover and interpret the record of common ancestry that exists in extant genomes. To understand the nature and pattern of variation and linkage disequilibrium comprising this history, we present a study of approximately 31 kb spanning an approximately 70 kb region of FMR1, sequenced in a sample of 20 humans (worldwide sample) and four great apes (chimp, bonobo, and gorilla). Twenty-five polymorphic sites and two insertion/deletions, distributed in 11 unique haplotypes, were identified among humans. Africans are the only geographic group that does not share any haplotypes with other groups. Parsimony analysis reveals two main clades and suggests that the four major human geographic groups are distributed throughout the phylogenetic tree and within each major clade. An African sample appears to be most closely related to the common ancestor shared with the three other geographic groups. Nucleotide diversity, pi, for this sample is 2.63 +/- 6.28 x 10^-4. The mutation rate, mu, is 6.48 x 10^-10 per base pair per year, giving an ancestral population size of approximately 6200 and a time to the most recent common ancestor of approximately 320,000 +/- 72,000 years. Linkage disequilibrium (LD) at the FMR1 locus, evaluated by conventional LD analysis and by the length of segment shared between any two chromosomes, is extensive across the region.
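
    Nucleotide diversity (pi), as reported in this abstract, is conventionally the average number of pairwise differences per site among aligned sequences. A minimal sketch of that standard definition follows; the toy alignment is invented for illustration and is unrelated to the FMR1 data, and the function makes no attempt to reproduce the authors' estimates.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site for equal-length aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(sequences, 2))
    # Count mismatching sites for every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Invented toy alignment: 4 sequences, 10 sites, 2 polymorphic sites.
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGTTCGTAA",
]
pi = nucleotide_diversity(toy_alignment)
print(pi)
```

    This version ignores gaps, missing data, and sample-size corrections, all of which a real analysis would need to handle.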

  6. Mapping and sequencing of structural variation from eight human genomes

    PubMed Central

    Kidd, Jeffrey M.; Cooper, Gregory M.; Donahue, William F.; Hayden, Hillary S.; Sampas, Nick; Graves, Tina; Hansen, Nancy; Teague, Brian; Alkan, Can; Antonacci, Francesca; Haugen, Eric; Zerr, Troy; Yamada, N. Alice; Tsang, Peter; Newman, Tera L.; Tüzün, Eray; Cheng, Ze; Ebling, Heather M.; Tusneem, Nadeem; David, Robert; Gillett, Will; Phelps, Karen A.; Weaver, Molly; Saranga, David; Brand, Adrianne; Tao, Wei; Gustafson, Erik; McKernan, Kevin; Chen, Lin; Malig, Maika; Smith, Joshua D.; Korn, Joshua M.; McCarroll, Steven A.; Altshuler, David A.; Peiffer, Daniel A.; Dorschner, Michael; Stamatoyannopoulos, John; Schwartz, David; Nickerson, Deborah A.; Mullikin, James C.; Wilson, Richard K.; Bruhn, Laurakay; Olson, Maynard V.; Kaul, Rajinder; Smith, Douglas R.; Eichler, Evan E.

    2008-01-01

    Genetic variation among individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single nucleotide changes. Here we explore variation on an intermediate scale—particularly insertions, deletions and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1,695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number between individuals. Complete sequencing of 261 structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation—a standard for genotyping platforms and a prelude to future individual genome sequencing projects. PMID:18451855

  7. Identification of Sequence Variation in the Apolipoprotein A2 Gene and Their Relationship with Serum High-Density Lipoprotein Cholesterol Levels

    PubMed Central

    Bandarian, Fatemeh; Daneshpour, Maryam Sadat; Hedayati, Mehdi; Naseri, Mohsen; Azizi, Fereidoun

    2016-01-01

    Background: Apolipoprotein A2 (APOA2) is the second major apolipoprotein of the high-density lipoprotein cholesterol (HDL-C). The study aim was to identify APOA2 gene variation in individuals within two extreme tails of HDL-C levels and its relationship with HDL-C level. Methods: This cross-sectional survey was conducted on participants from the Tehran Lipid and Glucose Study (TLGS) at the Research Institute for Endocrine Sciences, Tehran, Iran from April 2012 to February 2013. In total, 79 individuals with extreme low HDL-C levels (≤5th percentile for age and gender) and 63 individuals with extreme high HDL-C levels (≥95th percentile for age and gender) were selected. Variants were identified using DNA amplification and direct sequencing. Results: Screening of all exons and the core promoter region of the APOA2 gene identified nine single nucleotide substitutions and one microsatellite, of which five were known and four were new variants. Of these nine variants, two were common tag single nucleotide polymorphisms (SNPs) and seven were rare SNPs. Both exonic substitutions were missense mutations and caused an amino acid change. There was a significant association between the new missense mutation (variant Chr.1:16119226, Ala98Pro) and HDL-C level. Conclusion: Neither of the two common tag SNPs, rs6413453 and rs5082, contributes to the HDL-C trait in the Iranian population, but a new missense mutation in APOA2 in our population has a significant association with HDL-C. PMID:26590203

  8. Analyzing Neisseria gonorrhoeae Pilin Antigenic Variation Using 454 Sequencing Technology

    PubMed Central

    Rotman, Ella; Webber, David M.

    2016-01-01

    ABSTRACT Many pathogens use homologous recombination to vary surface antigens in order to avoid immune surveillance. Neisseria gonorrhoeae, the bacterium responsible for the sexually transmitted infection gonorrhea, achieves this in part by changing the sequence of the major subunit of the type IV pilus in a process termed pilin antigenic variation (Av). The N. gonorrhoeae chromosome contains one expression locus (pilE) and many promoterless, partial-coding silent copies (pilS) that act as reservoirs for variant pilin information. Pilin Av occurs by high-frequency gene conversion reactions, which transfer pilS sequences into the pilE locus. We have developed a 454 sequencing-based assay to analyze the frequency and characteristics of pilin Av that allows a more robust analysis of pilin Av than previous assays. We used this assay to analyze mutations and conditions previously shown to affect pilin Av, confirming many but not all of the previously reported phenotypes. We show that mutations or conditions that cause growth defects can result in Av phenotypes when analyzed by phase variation-based assays. Adapting the 454 sequencing to analyze pilin Av demonstrates the utility of this technology to analyze any diversity generation system that uses recombination to develop biological diversity. IMPORTANCE Measuring and analyzing complex recombination-based systems constitute a major barrier to understanding the mechanisms used to generate diversity. We have analyzed the contributions of many gonococcal mutations or conditions to the process of pilin antigenic variation. PMID:27381912

  9. Sequence variation of 22 autosomal STR loci detected by next generation sequencing.

    PubMed

    Gettings, Katherine Butler; Kiesler, Kevin M; Faith, Seth A; Montano, Elizabeth; Baker, Christine H; Young, Brian A; Guerrieri, Richard A; Vallone, Peter M

    2016-03-01

    Sequencing short tandem repeat (STR) loci allows for determination of repeat motif variations within the STR (or entire PCR amplicon) which cannot be ascertained by size-based PCR fragment analysis. Sanger sequencing has been used in research laboratories to further characterize STR loci, but is impractical for routine forensic use due to the laborious nature of the procedure in general and additional steps required to separate heterozygous alleles. Recent advances in library preparation methods enable high-throughput next generation sequencing (NGS) and technological improvements in sequencing chemistries now offer sufficient read lengths to encompass STR alleles. Herein, we present sequencing results from 183 DNA samples, including African American, Caucasian, and Hispanic individuals, at 22 autosomal forensic STR loci using an assay designed for NGS. The resulting dataset has been used to perform population genetic analyses of allelic diversity by length compared to sequence, and exemplifies which loci are likely to achieve the greatest gains in discrimination via sequencing. Within this data set, six loci demonstrate greater than double the number of alleles obtained by sequence compared to the number of alleles obtained by length: D12S391, D2S1338, D21S11, D8S1179, vWA, and D3S1358. As expected, repeat region sequences which had not previously been reported in forensic literature were identified.
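
    The contrast this abstract draws between alleles distinguished by length and alleles distinguished by sequence can be sketched in a few lines. The repeat sequences below are invented for illustration and do not come from the study; the function simply counts distinct lengths and distinct sequences at one hypothetical locus.

```python
def allele_counts(alleles):
    """Return (number of alleles by length, number of alleles by sequence)."""
    by_length = {len(allele) for allele in alleles}
    by_sequence = set(alleles)
    return len(by_length), len(by_sequence)


# Invented alleles for one hypothetical locus: several share a length
# but differ in the internal repeat motif.
toy_alleles = [
    "TCTA" * 10,
    "TCTA" * 9 + "TCTG",
    "TCTA" * 5 + "TCTG" + "TCTA" * 4,
    "TCTG" + "TCTA" * 9,
    "TCTA" * 11,
]

n_by_length, n_by_sequence = allele_counts(toy_alleles)
# Size-based fragment analysis would only see the length classes.
gain_exceeds_double = n_by_sequence > 2 * n_by_length
print(n_by_length, n_by_sequence, gain_exceeds_double)
```

    In this toy case four alleles of the same length collapse into a single size class, which is the kind of hidden variation that sequencing recovers.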

  10. Sequence variation of 22 autosomal STR loci detected by next generation sequencing

    PubMed Central

    Gettings, Katherine Butler; Kiesler, Kevin M.; Faith, Seth A.; Montano, Elizabeth; Baker, Christine H.; Young, Brian A.; Guerrieri, Richard A.; Vallone, Peter M.

    2016-01-01

    Sequencing short tandem repeat (STR) loci allows for determination of repeat motif variations within the STR (or entire PCR amplicon) which cannot be ascertained by size-based PCR fragment analysis. Sanger sequencing has been used in research laboratories to further characterize STR loci, but is impractical for routine forensic use due to the laborious nature of the procedure in general and additional steps required to separate heterozygous alleles. Recent advances in library preparation methods enable high-throughput next generation sequencing (NGS) and technological improvements in sequencing chemistries now offer sufficient read lengths to encompass STR alleles. Herein, we present sequencing results from 183 DNA samples, including African American, Caucasian, and Hispanic individuals, at 22 autosomal forensic STR loci using an assay designed for NGS. The resulting dataset has been used to perform population genetic analyses of allelic diversity by length compared to sequence, and exemplifies which loci are likely to achieve the greatest gains in discrimination via sequencing. Within this data set, six loci demonstrate greater than double the number of alleles obtained by sequence compared to the number of alleles obtained by length: D12S391, D2S1338, D21S11, D8S1179, vWA, and D3S1358. As expected, repeat region sequences which had not previously been reported in forensic literature were identified. PMID:26701720

  11. A map of human genome variation from population scale sequencing

    PubMed Central

    2011-01-01

    The 1000 Genomes Project aims to provide a deep characterisation of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. We present results of the pilot phase of the project, designed to develop and compare different strategies for genome wide sequencing with high throughput sequencing platforms. We undertook three projects: low coverage whole genome sequencing of 179 individuals from four populations, high coverage sequencing of two mother-father-child trios, and exon targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million SNPs, 1 million short insertions and deletions and 20,000 structural variants, the majority of which were previously undescribed. We show that over 95% of the currently accessible variants found in any individual are present in this dataset; on average, each person carries approximately 250 to 300 loss of function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We find many putative functional variants with large allele frequency differences between populations. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research. PMID:20981092

  12. High-Throughput Sequencing Technologies

    PubMed Central

    Reuter, Jason A.; Spacek, Damek; Snyder, Michael P.

    2015-01-01

    Summary The human genome sequence has profoundly altered our understanding of biology, human diversity and disease. The path from the first draft sequence to our nascent era of personal genomes and genomic medicine has been made possible only because of the extraordinary advancements in DNA sequencing technologies over the past ten years. Here, we discuss commonly used high-throughput sequencing platforms, the growing array of sequencing assays developed around them as well as the challenges facing current sequencing platforms and their clinical application. PMID:26000844

  13. Protein structure prediction from sequence variation

    PubMed Central

    Marks, Debora S; Hopf, Thomas A; Sander, Chris

    2015-01-01

    Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics. PMID:23138306
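
    Covariation between alignment columns, the signal this abstract builds on, can be illustrated with mutual information. This is a deliberately simple local statistic used here only as a sketch; it is not the global statistical approach the abstract refers to, and the four-sequence alignment is invented.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    joint = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Invented toy alignment: columns 0 and 1 change together,
# column 2 is conserved, column 3 varies independently of column 0.
toy_alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]

coupled = mutual_information(toy_alignment, 0, 1)
independent = mutual_information(toy_alignment, 0, 3)
conserved = mutual_information(toy_alignment, 0, 2)
print(coupled, independent, conserved)
```

    Pairwise scores like this cannot separate direct from indirect (transitive) correlations, which is one motivation for the global models mentioned in the abstract.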

  14. Using chaos to generate variations on movement sequences

    NASA Astrophysics Data System (ADS)

    Bradley, Elizabeth; Stuart, Joshua

    1998-12-01

    We describe a method for introducing variations into predefined motion sequences using a chaotic symbol-sequence reordering technique. A progression of symbols representing the body positions in a dance piece, martial arts form, or other motion sequence is mapped onto a chaotic trajectory, establishing a symbolic dynamics that links the movement sequence and the attractor structure. A variation on the original piece is created by generating a trajectory with slightly different initial conditions, inverting the mapping, and using special corpus-based graph-theoretic interpolation schemes to smooth any abrupt transitions. Sensitive dependence guarantees that the variation is different from the original; the attractor structure and the symbolic dynamics guarantee that the two resemble one another in both aesthetic and mathematical senses.
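
    The reordering idea in this abstract (attach symbols to a chaotic trajectory, start a second trajectory nearby, and invert the mapping) can be sketched as follows. The logistic map, the nearest-point inversion rule, and all parameter values are assumptions made for this illustration; the abstract does not specify the dynamical system used, and the graph-theoretic smoothing step is omitted.

```python
def logistic_trajectory(x0, n, r=4.0):
    """Return n successive states of the logistic map x -> r * x * (1 - x)."""
    states = []
    x = x0
    for _ in range(n):
        states.append(x)
        x = r * x * (1.0 - x)
    return states


def chaotic_variation(symbols, x0=0.3, eps=1e-3):
    """Reorder symbols using a trajectory started eps away from the reference.

    Symbol k is attached to the k-th point of the reference trajectory.
    Each point of the perturbed trajectory emits the symbol attached to
    the nearest reference point (a simple, assumed inversion rule).
    """
    n = len(symbols)
    reference = logistic_trajectory(x0, n)
    perturbed = logistic_trajectory(x0 + eps, n)
    variation = []
    for y in perturbed:
        nearest = min(range(n), key=lambda k: abs(reference[k] - y))
        variation.append(symbols[nearest])
    return variation


# Hypothetical movement sequence; the labels are placeholders.
moves = ["step", "turn", "reach", "bend", "leap", "fold", "rise", "rest"]
unchanged = chaotic_variation(moves, eps=0.0)
varied = chaotic_variation(moves, eps=1e-3)
print(varied)
```

    With a zero perturbation the original order is recovered, and a small perturbation lets the two trajectories drift apart after a few steps, so the output typically begins like the original and then departs from it. A variation built this way can repeat or drop symbols, which is one reason a smoothing step is needed in practice.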

  15. Unraveling genomic variation from next generation sequencing data

    PubMed Central

    2013-01-01

    Elucidating the content of a DNA sequence is critical to more deeply understand and decode the genetic information for any biological system. As next generation sequencing (NGS) techniques have become cheaper and more advanced in throughput over time, great innovations and breakthrough conclusions have been generated in various biological areas. A few of these areas, which are being shaped by the new technological advances, include evolution of species, microbial mapping, population genetics, genome-wide association studies (GWAS), comparative genomics, variant analysis, gene expression, gene regulation, epigenetics and personalized medicine. While NGS techniques stand as key players in modern biological research, the analysis and the interpretation of the vast amount of data that gets produced is not an easy or trivial task and remains a great challenge in the field of bioinformatics. Therefore, efficient tools to cope with information overload, tackle the high complexity and provide meaningful visualizations to make the knowledge extraction easier are essential. In this article, we briefly refer to the sequencing methodologies and the available equipment to serve these analyses and we describe the data formats of the files which get produced by them. We conclude with a thorough review of tools developed to efficiently store, analyze and visualize such data with emphasis on structural variation analysis and comparative genomics. We finally comment on their functionality, strengths and weaknesses and we discuss how future applications could further develop in this field. PMID:23885890

  16. Unraveling genomic variation from next generation sequencing data.

    PubMed

    Pavlopoulos, Georgios A; Oulas, Anastasis; Iacucci, Ernesto; Sifrim, Alejandro; Moreau, Yves; Schneider, Reinhard; Aerts, Jan; Iliopoulos, Ioannis

    2013-07-25

    Elucidating the content of a DNA sequence is critical to more deeply understand and decode the genetic information for any biological system. As next generation sequencing (NGS) techniques have become cheaper and more advanced in throughput over time, great innovations and breakthrough conclusions have been generated in various biological areas. A few of these areas, which are being shaped by the new technological advances, include evolution of species, microbial mapping, population genetics, genome-wide association studies (GWAS), comparative genomics, variant analysis, gene expression, gene regulation, epigenetics and personalized medicine. While NGS techniques stand as key players in modern biological research, the analysis and the interpretation of the vast amount of data that gets produced is not an easy or trivial task and remains a great challenge in the field of bioinformatics. Therefore, efficient tools to cope with information overload, tackle the high complexity and provide meaningful visualizations to make the knowledge extraction easier are essential. In this article, we briefly refer to the sequencing methodologies and the available equipment to serve these analyses and we describe the data formats of the files which get produced by them. We conclude with a thorough review of tools developed to efficiently store, analyze and visualize such data with emphasis on structural variation analysis and comparative genomics. We finally comment on their functionality, strengths and weaknesses and we discuss how future applications could further develop in this field.

  17. A case study on the genetic origin of the high oleic acid trait through FAD2-1 DNA sequence variation in safflower (Carthamus tinctorius L.).

    PubMed

    Rapson, Sara; Wu, Man; Okada, Shoko; Das, Alpana; Shrestha, Pushkar; Zhou, Xue-Rong; Wood, Craig; Green, Allan; Singh, Surinder; Liu, Qing

    2015-01-01

    The safflower (Carthamus tinctorius L.) is considered a strongly domesticated species with a long history of cultivation. The hybridization of safflower with its wild relatives has played an important role in the evolution of cultivars and is of particular interest with regards to their production of high quality edible oils. Original safflower varieties were all rich in linoleic acid, while varieties rich in oleic acid have risen to prominence in recent decades. The high oleic acid trait is controlled by a partially recessive allele ol at a single locus OL. The ol allele was found to be a defective microsomal oleate desaturase FAD2-1. Here we present DNA sequence data and Southern blot analysis suggesting that there has been an ancient hybridization and introgression of the FAD2-1 gene into C. tinctorius from its wild relative C. palaestinus. It is from this gene that FAD2-1Δ was derived more recently. Identification and characterization of the genetic origin and diversity of FAD2-1 could aid safflower breeders in reducing population size and generations required for the development of new high oleic acid varieties by using perfect molecular marker-assisted selection.

  18. A case study on the genetic origin of the high oleic acid trait through FAD2-1 DNA sequence variation in safflower (Carthamus tinctorius L.)

    PubMed Central

    Rapson, Sara; Wu, Man; Okada, Shoko; Das, Alpana; Shrestha, Pushkar; Zhou, Xue-Rong; Wood, Craig; Green, Allan; Singh, Surinder; Liu, Qing

    2015-01-01

    The safflower (Carthamus tinctorius L.) is considered a strongly domesticated species with a long history of cultivation. The hybridization of safflower with its wild relatives has played an important role in the evolution of cultivars and is of particular interest with regards to their production of high quality edible oils. Original safflower varieties were all rich in linoleic acid, while varieties rich in oleic acid have risen to prominence in recent decades. The high oleic acid trait is controlled by a partially recessive allele ol at a single locus OL. The ol allele was found to be a defective microsomal oleate desaturase FAD2-1. Here we present DNA sequence data and Southern blot analysis suggesting that there has been an ancient hybridization and introgression of the FAD2-1 gene into C. tinctorius from its wild relative C. palaestinus. It is from this gene that FAD2-1Δ was derived more recently. Identification and characterization of the genetic origin and diversity of FAD2-1 could aid safflower breeders in reducing population size and generations required for the development of new high oleic acid varieties by using perfect molecular marker-assisted selection. PMID:26442008

  19. The regions of sequence variation in caulimovirus gene VI.

    PubMed

    Sanger, M; Daubert, S; Goodman, R M

    1991-06-01

    The sequence of gene VI from figwort mosaic virus (FMV) clone x4 was determined and compared with that previously published for FMV clone DxS. Both clones originated from the same virus isolation, but the virus used to clone DxS was propagated extensively in a host of a different family prior to cloning whereas that used to clone x4 was not. Differences in the amino acid sequence inferred from the DNA sequences occurred in two clusters. An N-terminal conserved region preceded two regions of variation separated by a central conserved region. Variation in cauliflower mosaic virus (CaMV) gene VI sequences, all of which were derived from virus isolates from hosts from one host family, was similar to that seen in the FMV comparison, though the extent of variation was less. Alignment of gene VI domains from FMV and CaMV revealed regions of amino acid sequence identical in both viruses within the conserved regions. The similarity in the pattern of conserved and variable domains of these two viruses suggests common host-interactive functions in caulimovirus gene VI homologues, and possibly an analogy between caulimoviruses and certain animal viruses in the influence of the host on sequence variability of viral genes.

  20. Mapping copy number variation by population-scale genome sequencing.

    PubMed

    Mills, Ryan E; Walter, Klaudia; Stewart, Chip; Handsaker, Robert E; Chen, Ken; Alkan, Can; Abyzov, Alexej; Yoon, Seungtai Chris; Ye, Kai; Cheetham, R Keira; Chinwalla, Asif; Conrad, Donald F; Fu, Yutao; Grubert, Fabian; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Iakoucheva, Lilia M; Iqbal, Zamin; Kang, Shuli; Kidd, Jeffrey M; Konkel, Miriam K; Korn, Joshua; Khurana, Ekta; Kural, Deniz; Lam, Hugo Y K; Leng, Jing; Li, Ruiqiang; Li, Yingrui; Lin, Chang-Yun; Luo, Ruibang; Mu, Xinmeng Jasmine; Nemesh, James; Peckham, Heather E; Rausch, Tobias; Scally, Aylwyn; Shi, Xinghua; Stromberg, Michael P; Stütz, Adrian M; Urban, Alexander Eckehart; Walker, Jerilyn A; Wu, Jiantao; Zhang, Yujun; Zhang, Zhengdong D; Batzer, Mark A; Ding, Li; Marth, Gabor T; McVean, Gil; Sebat, Jonathan; Snyder, Michael; Wang, Jun; Ye, Kenny; Eichler, Evan E; Gerstein, Mark B; Hurles, Matthew E; Lee, Charles; McCarroll, Steven A; Korbel, Jan O

    2011-02-03

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.

  1. A sequence-based variation map of zebrafish.

    PubMed

    Patowary, Ashok; Purkanti, Ramya; Singh, Meghna; Chauhan, Rajendra; Singh, Angom Ramcharan; Swarnkar, Mohit; Singh, Naresh; Pandey, Vikas; Torroja, Carlos; Clark, Matthew D; Kocher, Jean-Pierre; Clark, Karl J; Stemple, Derek L; Klee, Eric W; Ekker, Stephen C; Scaria, Vinod; Sivasubbu, Sridhar

    2013-03-01

    Zebrafish (Danio rerio) is a popular vertebrate model organism largely deployed using outbred laboratory animals. The nonisogenic nature of the zebrafish as a model system offers the opportunity to understand natural variations and their effect in modulating phenotype. In an effort to better characterize the range of natural variation in this model system and to complement the zebrafish reference genome project, the whole genome sequence of a wild zebrafish at 39-fold genome coverage was determined. Comparative analysis with the zebrafish reference genome revealed approximately 5.2 million single nucleotide variations and over 1.6 million insertion-deletion variations. This dataset thus represents a new catalog of genetic variations in the zebrafish genome. Further analysis revealed selective enrichment for variations in genes involved in immune function and response to the environment, suggesting genome-level adaptations to environmental niches. We also show that human disease gene orthologs in the sequenced wild zebrafish genome show a lower ratio of nonsynonymous to synonymous single nucleotide variations.

  2. Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker.

    PubMed

    Wildeman, Martin; van Ophuizen, Ernest; den Dunnen, Johan T; Taschner, Peter E M

    2008-01-01

    Unambiguous and correct sequence variant descriptions are of utmost importance, not in the least since mistakes and uncertainties may lead to undesired errors in clinical diagnosis. We developed the Mutation Analyzer (Mutalyzer) sequence variation nomenclature checker (www.lovd.nl/mutalyzer; last accessed 13 September 2007) for automated analysis and correction of sequence variant descriptions using reference sequences from any organism. Mutalyzer handles most variation types: substitution, deletion, duplication, insertion, indel, and splice-site changes following current recommendations of the Human Genome Variation Society (HGVS). Input is a GenBank accession number or an uploaded reference sequence file in GenBank format with user-modified annotation, an HGNC gene symbol, and the variant (single or in a batch file). Mutalyzer generates variant descriptions at DNA level, the level of all annotated transcripts and the deduced outcome at protein level. To validate Mutalyzer's performance and to investigate the sequence variant description quality in locus-specific mutation databases (LSDBs), more than 11,000 variants in the PAH, BIC BRCA2, and HbVar databases were analyzed, showing that 87%, 25%, and 38%, respectively, were error-free and following the recommendations. Low recognition rates in BIC and HbVar (38% and 51%, respectively) were due to lack of a well-annotated genomic reference sequence (HbVar) or noncompliance to the guidelines (BRCA2). Provided with well-annotated genomic reference sequences, Mutalyzer is very effective for the curation of newly discovered sequence variation descriptions and existing LSDB data. Mutalyzer will be linked to the Leiden Open source Variation Database (LOVD) (www.LOVD.nl; last accessed 13 September 2007) and is the first module of a sequence variant effect prediction package. (c) 2007 Wiley-Liss, Inc.
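    The kind of syntax check described above can be illustrated with a minimal sketch. It assumes only the simplest HGVS-style coding-DNA descriptions (for example c.76A>T, c.76_78del, c.20dup, c.76_77insG). The patterns and the function name classify_variant are hypothetical and are not Mutalyzer's interface; unlike Mutalyzer, the sketch does not compare the description against a reference sequence.

```python
import re

# Hypothetical, simplified patterns for HGVS-style coding-DNA descriptions.
# Real HGVS nomenclature covers many more cases (introns, UTRs, offsets).
_PATTERNS = {
    "substitution": re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$"),
    "deletion": re.compile(r"^c\.(\d+)(?:_(\d+))?del$"),
    "duplication": re.compile(r"^c\.(\d+)(?:_(\d+))?dup$"),
    "insertion": re.compile(r"^c\.(\d+)_(\d+)ins([ACGT]+)$"),
}


def classify_variant(description):
    """Return the variant type for a well-formed description, else None."""
    for kind, pattern in _PATTERNS.items():
        match = pattern.match(description)
        if match is None:
            continue
        if kind == "substitution" and match.group(2) == match.group(3):
            return None  # a base cannot be substituted by itself
        if kind == "insertion" and int(match.group(2)) != int(match.group(1)) + 1:
            return None  # insertions sit between two adjacent positions
        if kind in ("deletion", "duplication") and match.group(2) is not None:
            if int(match.group(2)) <= int(match.group(1)):
                return None  # a range must run from low to high
        return kind
    return None


if __name__ == "__main__":
    for text in ["c.76A>T", "c.76_78del", "c.78_76del", "c.76_77insG"]:
        print(text, "->", classify_variant(text))
```

    A purely syntactic check like this can flag malformed descriptions, but it cannot tell whether the stated reference base actually occurs at that position; that requires the annotated reference sequence the abstract refers to.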

  3. Mitochondrial sequence variation suggests an African influence in Portuguese cattle.

    PubMed Central

    Cymbron, T; Loftus, R T; Malheiro, M I; Bradley, D G

    1999-01-01

    A total of 49 samples from indigenous Portuguese cattle breeds were analysed for sequence variation in the hypervariable region of the mitochondrial DNA D-loop. Sequence comparison and phylogenetic analyses revealed that haplotypes fell into two distinct groups. These corresponded with two separate haplotype clusters into which, respectively, all African, or alternatively all sequences of European origin, have previously been shown to fall. Here, the majority of sequences of African type were encountered in three southern, as compared to three northern breeds. This pattern of African influence may reflect an intercontinental admixture in the initial origins of Iberian breeds, or it is perhaps an introgression dating from the long and influential Moorish occupation of the south of the Iberian peninsula. PMID:10212450

  4. High Throughput Sequencing: An Overview of Sequencing Chemistry.

    PubMed

    Ambardar, Sheetal; Gupta, Rikita; Trakroo, Deepika; Lal, Rup; Vakhlu, Jyoti

    2016-12-01

    In the present century, sequencing is to DNA science what gel electrophoresis was to it in the last century. From 1977 to 2016, three generations of sequencing technologies of various types have been developed. Second and third generation sequencing technologies, commonly referred to as next generation sequencing technology, have evolved significantly since their inception in 2004, with increases in sequencing speed and decreases in sequencing cost. GS FLX by 454 Life Sciences/Roche diagnostics, Genome Analyzer, HiSeq, MiSeq and NextSeq by Illumina, Inc., SOLiD by ABI, and Ion Torrent by Life Technologies are the various types of sequencing platforms available for second generation sequencing. The platforms available for third generation sequencing include Helicos™ Genetic Analysis System by SeqLL, LLC, SMRT Sequencing by Pacific Biosciences, Nanopore sequencing by Oxford Nanopore, Complete Genomics by Beijing Genomics Institute and GnuBIO by BioRad, to name a few. The present article is an overview of the principle and the sequencing chemistry of these high throughput sequencing technologies, along with a brief comparison of the various types of sequencing platforms available.

  5. An Overview of Genomic Sequence Variation Markup Language (GSVML)

    PubMed Central

    Nakaya, Jun; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Kimura, Michio

    2006-01-01

    Internationally accumulated genomic sequence variation data on humans require an interoperable data exchange format. We developed GSVML as such a data exchange format. GSVML is human health oriented and has three categories. Analyses of use cases in the human health domain and an investigation of existing databases and markup languages were conducted. The ability to interface with the Health Level Seven Genotype Model was examined. GSVML provides a sharable platform for both clinical and research applications.

  6. Gene sequence variations and expression patterns of mitochondrial genes are associated with the adaptive evolution of two Gynaephora species (Lepidoptera: Lymantriinae) living in different high-elevation environments.

    PubMed

    Zhang, Qi-Lin; Zhang, Li; Zhao, Tian-Xuan; Wang, Juan; Zhu, Qian-Hua; Chen, Jun-Yuan; Yuan, Ming-Long

    2017-04-30

    The adaptive evolution of animals to high-elevation environments has been extensively studied in vertebrates, while few studies have focused on insects. Gynaephora species (Lepidoptera: Lymantriinae) are endemic to the Qinghai-Tibetan Plateau (QTP) and represent an important insect pest of alpine meadows. Here, we present a detailed comparative analysis of the mitochondrial genomes (mitogenomes) of two Gynaephora species inhabiting different high-elevation environments: G. alpherakii and G. menyuanensis. The results indicated that the general mitogenomic features (genome size, nucleotide composition, codon usage and secondary structures of tRNAs) were well conserved between the two species. All of the mitochondrial protein-coding genes were evolving under purifying selection, suggesting that selection constraints may play a role in ensuring adequate energy production. However, a number of substitutions and indels were identified that altered the protein conformations of ATP8 and NAD1, which may be the result of adaptive evolution of the two Gynaephora species to different high-elevation environments. Levels of gene expression for nine mitochondrial genes in nine different developmental stages were significantly suppressed in G. alpherakii, which lives at the higher elevation (~4800 m above sea level), suggesting that gene expression patterns could be modulated by atmospheric oxygen content and environmental temperature. These results enhance our understanding of the genetic bases for the adaptive evolution of insects endemic to the QTP.

  7. STR allele sequence variation: Current knowledge and future issues.

    PubMed

    Gettings, Katherine Butler; Aponte, Rachel A; Vallone, Peter M; Butler, John M

    2015-09-01

    This article reviews what is currently known about short tandem repeat (STR) allelic sequence variation in and around the twenty-four loci most commonly used throughout the world to perform forensic DNA investigations. These STR loci include D1S1656, TPOX, D2S441, D2S1338, D3S1358, FGA, CSF1PO, D5S818, SE33, D6S1043, D7S820, D8S1179, D10S1248, TH01, vWA, D12S391, D13S317, Penta E, D16S539, D18S51, D19S433, D21S11, Penta D, and D22S1045. All known reported variant alleles are compiled along with genomic information available from GenBank, dbSNP, and the 1000 Genomes Project. Supplementary files are included which provide annotated reference sequences for each STR locus, characterize genomic variation around the STR repeat region, and compare alleles present in currently available STR kit allelic ladders. Looking to the future, STR allele nomenclature options are discussed as they relate to next generation sequencing efforts underway.

  8. An improved protocol for sequencing of repetitive genomic regions and structural variations using mutagenesis and next generation sequencing.

    PubMed

    Sipos, Botond; Massingham, Tim; Stütz, Adrian M; Goldman, Nick

    2012-01-01

    The rise of Next Generation Sequencing (NGS) technologies has transformed de novo genome sequencing into an accessible research tool, but obtaining high quality eukaryotic genome assemblies remains a challenge, mostly due to the abundance of repetitive elements. These also make it difficult to study nucleotide polymorphism in repetitive regions, including certain types of structural variations. One solution proposed for resolving such regions is Sequence Assembly aided by Mutagenesis (SAM), which relies on the fact that introducing enough random mutations breaks the repetitive structure, making assembly possible. Sequencing many different mutated copies permits the sequence of the repetitive region to be inferred by consensus methods. However, this approach relies on molecular cloning in order to isolate and amplify individual mutant copies, making it hard to scale-up the approach for use in conjunction with high-throughput sequencing technologies. To address this problem, we propose NG-SAM, a modified version of the SAM protocol that relies on PCR and dilution steps only, coupled to a NGS workflow. NG-SAM therefore has the potential to be scaled-up, e.g. using emerging microfluidics technologies. We built a realistic simulation pipeline to study the feasibility of NG-SAM, and our results suggest that under appropriate experimental conditions the approach might be successfully put into practice. Moreover, our simulations suggest that NG-SAM is capable of reconstructing robustly a wide range of potential target sequences of varying lengths and repetitive structures.

  9. Comparative RNA sequencing reveals substantial genetic variation in endangered primates.

    PubMed

    Perry, George H; Melsted, Páll; Marioni, John C; Wang, Ying; Bainer, Russell; Pickrell, Joseph K; Michelini, Katelyn; Zehr, Sarah; Yoder, Anne D; Stephens, Matthew; Pritchard, Jonathan K; Gilad, Yoav

    2012-04-01

    Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of nonhuman primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps toward addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 nonhuman primates. Of the nonhuman primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5721 gene sequences per species and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats have, therefore, strong potential to achieve long-term success.

  10. Comparative RNA sequencing reveals substantial genetic variation in endangered primates

    PubMed Central

    Perry, George H.; Melsted, Páll; Marioni, John C.; Wang, Ying; Bainer, Russell; Pickrell, Joseph K.; Michelini, Katelyn; Zehr, Sarah; Yoder, Anne D.; Stephens, Matthew; Pritchard, Jonathan K.; Gilad, Yoav

    2012-01-01

    Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of nonhuman primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps toward addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 nonhuman primates. Of the nonhuman primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5721 gene sequences per species and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats have, therefore, strong potential to achieve long-term success. PMID:22207615

  11. Variational formulation of high performance finite elements: Parametrized variational principles

    NASA Technical Reports Server (NTRS)

    Felippa, Carlos A.; Militello, Carmello

    1991-01-01

    High performance elements are simple finite elements constructed to deliver engineering accuracy with coarse arbitrary grids. This is part of a series on the variational basis of high-performance elements, with emphasis on those constructed with the free formulation (FF) and assumed natural strain (ANS) methods. Parametrized variational principles that provide a foundation for the FF and ANS methods, as well as for a combination of both are presented.

  12. Geochemical variations during the 2012 Emilia seismic sequence

    NASA Astrophysics Data System (ADS)

    Sciarra, Alessandra; Cantucci, Barbara; Galli, Gianfranco; Cinti, Daniele; Pizzino, Luca

    2015-04-01

    , apart one sample, are not thermally anomalous. Stable isotopes of H and O point out the absence of mixing with connate waters, prolonged interaction with the host-rock at high temperature and/or heavy gas-water exchange at depth. Isotopic carbon composition emphasizes its organic (i.e. shallow) origin; only "La Canonica" site, the deepest well sampled in this study, shows a probable deep(er) provenance of dissolved carbon. Waters trend away from the atmospheric end-member composition, dissolving CO2 or CH4 depending on their redox state. Dissolved radon activity is very low, likely due to the particular hydrogeological setting of the study area (i.e. the presence of waters with long residence times in the considered aquifers). Obtained results highlight a different behavior before and after the seismic events, proved also by the different carbon isotopic signature of CH4. These variations could be produced by increasing of bacterial (e.g. peat strata) and methanogenic fermentation processes in the first meters of the soil.

  13. The Quantification of Representative Sequences pipeline for amplicon sequencing: case study on within-population ITS1 sequence variation in a microparasite infecting Daphnia.

    PubMed

    González-Tortuero, E; Rusek, J; Petrusek, A; Gießler, S; Lyras, D; Grath, S; Castro-Monzón, F; Wolinska, J

    2015-11-01

    Next generation sequencing (NGS) platforms are replacing traditional molecular biology protocols like cloning and Sanger sequencing. However, accuracy of NGS platforms has rarely been measured when quantifying relative frequencies of genotypes or taxa within populations. Here we developed a new bioinformatic pipeline (QRS) that pools similar sequence variants and estimates their frequencies in NGS data sets from populations or communities. We tested whether the estimated frequency of representative sequences, generated by 454 amplicon sequencing, differs significantly from that obtained by Sanger sequencing of cloned PCR products. This was performed by analysing sequence variation of the highly variable first internal transcribed spacer (ITS1) of the ichthyosporean Caullerya mesnili, a microparasite of cladocerans of the genus Daphnia. This analysis also serves as a case example of the usage of this pipeline to study within-population variation. Additionally, a public Illumina data set was used to validate the pipeline on community-level data. Overall, there was a good correspondence in absolute frequencies of C. mesnili ITS1 sequences obtained from Sanger and 454 platforms. Furthermore, analyses of molecular variance (AMOVA) revealed that population structure of C. mesnili differs across lakes and years independently of the sequencing platform. Our results support not only the usefulness of amplicon sequencing data for studies of within-population structure but also the successful application of the QRS pipeline on Illumina-generated data. The QRS pipeline is freely available together with its documentation under GNU Public Licence version 3 at http://code.google.com/p/quantification-representative-sequences.
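    The idea of pooling similar sequence variants and estimating their frequencies can be sketched as follows. This is not the QRS pipeline's algorithm, which the abstract does not specify. It is a simple greedy pooling by Hamming distance, assuming reads of equal length that are already aligned and trimmed. The names pool_variants and max_mismatches are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def pool_variants(reads, max_mismatches=1):
    """Greedily pool reads around the most abundant sequences.

    Returns {representative sequence: relative frequency}.
    Assumes all reads have the same length.
    """
    counts = Counter(reads)
    pooled = {}
    # Visit unique sequences from most to least abundant, so abundant
    # sequences become representatives and rare near-copies join them.
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in pooled:
            if hamming(seq, rep) <= max_mismatches:
                pooled[rep] += n
                break
        else:
            pooled[seq] = n  # no close representative: start a new pool
    total = sum(pooled.values())
    return {rep: n / total for rep, n in pooled.items()}


if __name__ == "__main__":
    reads = ["ACGT"] * 6 + ["ACGA"] + ["TTGT"] * 3
    print(pool_variants(reads))
```

    In this toy input the single ACGA read differs from ACGT at one position and is absorbed into its pool, so the two representatives end up with relative frequencies of roughly 0.7 and 0.3.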

  14. DNA Shape versus Sequence Variations in the Protein Binding Process.

    PubMed

    Chen, Chuanying; Pettitt, B Montgomery

    2016-02-02

    The binding process of a protein with a DNA involves three stages: approach, encounter, and association. It has been known that the complexation of protein and DNA involves mutual conformational changes, especially for a specific sequence association. However, it is still unclear how the conformation and the information in the DNA sequences affects the binding process. What is the extent to which the DNA structure adopted in the complex is induced by protein binding, or is instead intrinsic to the DNA sequence? In this study, we used the multiscale simulation method to explore the binding process of a protein with DNA in terms of DNA sequence, conformation, and interactions. We found that in the approach stage the protein can bind both the major and minor groove of the DNA, but uses different features to locate the binding site. The intrinsic conformational properties of the DNA play a significant role in this binding stage. By comparing the specific DNA with the nonspecific in unbound, intermediate, and associated states, we found that for a specific DNA sequence, ∼40% of the bending in the association forms is intrinsic and that ∼60% is induced by the protein. The protein does not induce appreciable bending of nonspecific DNA. In addition, we proposed that the DNA shape variations induced by protein binding are required in the early stage of the binding process, so that the protein is able to approach, encounter, and form an intermediate at the correct site on DNA. Copyright © 2016 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  15. Mitochondrial D-loop sequence variation among Italian horse breeds

    PubMed Central

    Cozzi, Maria Cristina; Strillacci, Maria Giuseppina; Valiati, Paolo; Bighignoli, Barbara; Cancedda, Mario; Zanotti, Marta

    2004-01-01

    The genetic variability of the mitochondrial D-loop DNA sequence in seven horse breeds bred in Italy (Giara, Haflinger, Italian trotter, Lipizzan, Maremmano, Thoroughbred and Sarcidano) was analysed. Five unrelated horses were chosen in each breed and twenty-two haplotypes were identified. The sequences obtained were aligned and compared with a reference sequence and with 27 mtDNA D-loop sequences selected in the GenBank database, representing Spanish, Portuguese, North African, wild horses and an Equus asinus sequence as the outgroup. Kimura two-parameter distances were calculated and a cluster analysis using the Neighbour-joining method was performed to obtain phylogenetic trees among breeds bred in Italy and among Italian and foreign breeds. The cluster analysis indicates that all the breeds but Giara are divided in the two trees, and no clear relationships were revealed between Italian populations and the other breeds. These results could be interpreted as showing the mixed origin of breeds bred in Italy and probably indicate the presence of many ancient maternal lineages with high diversity in mtDNA sequences. PMID:15496286
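    As an illustration of the distance measure named above, the Kimura two-parameter distance between two aligned sequences is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of sites showing transitions and transversions. The sketch below is a minimal implementation of that formula. The function name and the choice to skip gapped or ambiguous sites are assumptions, not details taken from the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    math.log raises ValueError if the sequences are too divergent
    (saturated) for the formula to apply.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # proportion of transitions (P)
    q = transversions / sites  # proportion of transversions (Q)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"))
```

    A matrix of such pairwise distances is the usual input to a Neighbour-joining tree; building the tree itself is left to dedicated phylogenetics software.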

  16. Ribosomal DNA copy number loss and sequence variation in cancer

    PubMed Central

    Xu, Baoshan; Li, Hua; Perry, John M.; Singh, Vijay Pratap; Yu, Zulin; Zakari, Musinu; Li, Linheng

    2017-01-01

    Ribosomal DNA is one of the most variable regions in the human genome with respect to copy number. Despite the importance of rDNA for cellular function, we know virtually nothing about what governs its copy number, stability, and sequence in the mammalian genome due to challenges associated with mapping and analysis. We applied computational and droplet digital PCR approaches to measure rDNA copy number in normal and cancer states in human and mouse genomes. We find that copy number and sequence can change in cancer genomes. Counterintuitively, human cancer genomes show a loss of copies, accompanied by global copy number co-variation. The sequence can also be more variable in the cancer genome. Cancer genomes with lower copies have mutational evidence of mTOR hyperactivity. The PTEN phosphatase is a tumor suppressor that is critical for genome stability and a negative regulator of the mTOR kinase pathway. Surprisingly, but consistent with the human cancer genomes, hematopoietic cancer stem cells from a Pten-/- mouse model for leukemia have lower rDNA copy number than normal tissue, despite increased proliferation, rRNA production, and protein synthesis. Loss of copies occurs early and is associated with hypersensitivity to DNA damage. Therefore, copy loss is a recurrent feature in cancers associated with mTOR activation. Ribosomal DNA copy number may be a simple and useful indicator of whether a cancer will be sensitive to DNA damaging treatments. PMID:28640831

  17. Ribosomal DNA copy number loss and sequence variation in cancer.

    PubMed

    Xu, Baoshan; Li, Hua; Perry, John M; Singh, Vijay Pratap; Unruh, Jay; Yu, Zulin; Zakari, Musinu; McDowell, William; Li, Linheng; Gerton, Jennifer L

    2017-06-01

    Ribosomal DNA is one of the most variable regions in the human genome with respect to copy number. Despite the importance of rDNA for cellular function, we know virtually nothing about what governs its copy number, stability, and sequence in the mammalian genome due to challenges associated with mapping and analysis. We applied computational and droplet digital PCR approaches to measure rDNA copy number in normal and cancer states in human and mouse genomes. We find that copy number and sequence can change in cancer genomes. Counterintuitively, human cancer genomes show a loss of copies, accompanied by global copy number co-variation. The sequence can also be more variable in the cancer genome. Cancer genomes with lower copies have mutational evidence of mTOR hyperactivity. The PTEN phosphatase is a tumor suppressor that is critical for genome stability and a negative regulator of the mTOR kinase pathway. Surprisingly, but consistent with the human cancer genomes, hematopoietic cancer stem cells from a Pten-/- mouse model for leukemia have lower rDNA copy number than normal tissue, despite increased proliferation, rRNA production, and protein synthesis. Loss of copies occurs early and is associated with hypersensitivity to DNA damage. Therefore, copy loss is a recurrent feature in cancers associated with mTOR activation. Ribosomal DNA copy number may be a simple and useful indicator of whether a cancer will be sensitive to DNA damaging treatments.

  18. Inter-specific sequence conservation and intra-individual sequence variation in a spider silk gene.

    PubMed

    Tai, Pei-Ling; Hwang, Guang-Yuh; Tso, I-Min

    2004-10-01

    Currently, studies on major ampullate spidroin 1 (MaSp1) genes of non-orb weaving spiders are few, and it is not clear whether genes of these organisms exhibit the same characteristics as those of orb-weavers. In addition, many studies have proposed that MaSp1 might be a single gene with allelic variants, but supporting evidence is still lacking. In this study, we compared partial DNA and amino acid sequences of MaSp1 cloned from different spider guilds. We also cloned partial MaSp1 sequences from genomic DNA and cDNA of the same individuals of spiders using the same primer combination to see if different molecular forms existed. In the repetitive region of partial MaSp1 sequences obtained, GGX, GA and poly-A motifs were present in all Araneomorphae and Mygalomorphae species examined. An extreme similarity in MaSp1 non-repetitive portions was found in sequences of ecribellate, cribellate and Mygalomorphae web-builders and such a result suggested that this sequence might exhibit an important function. A comparison of sequences amplified from the same individual showed that substitutions in amino acids occurred in both repetitive and non-repetitive regions, with a much higher variation in the former. These results suggest that the MaSp1 of Araneomorphae spiders exhibits several forms in an individual spider and it might be either a multiple gene or a single gene with a multiple exon/intron organization.

  19. Lysoplex: An efficient toolkit to detect DNA sequence variations in the autophagy-lysosomal pathway

    PubMed Central

    Di Fruscio, Giuseppina; Schulz, Angela; De Cegli, Rossella; Savarese, Marco; Mutarelli, Margherita; Parenti, Giancarlo; Banfi, Sandro; Braulke, Thomas; Nigro, Vincenzo; Ballabio, Andrea

    2015-01-01

    The autophagy-lysosomal pathway (ALP) regulates cell homeostasis and plays a crucial role in human diseases, such as lysosomal storage disorders (LSDs) and common neurodegenerative diseases. Therefore, the identification of DNA sequence variations in genes involved in this pathway and their association with human diseases would have a significant impact on health. To this aim, we developed Lysoplex, a targeted next-generation sequencing (NGS) approach, which allowed us to obtain a uniform and accurate coding sequence coverage of a comprehensive set of 891 genes involved in lysosomal, endocytic, and autophagic pathways. Lysoplex was successfully validated on 14 different types of LSDs and then used to analyze 48 mutation-unknown patients with a clinical phenotype of neuronal ceroid lipofuscinosis (NCL), a genetically heterogeneous subtype of LSD. Lysoplex allowed us to identify pathogenic mutations in 67% of patients, most of whom had been unsuccessfully analyzed by several sequencing approaches. In addition, in 3 patients, we found potential disease-causing variants in novel NCL candidate genes. We then compared the variant detection power of Lysoplex with data derived from public whole exome sequencing (WES) efforts. On average, a 50% higher number of validated amino acid changes and truncating variations per gene were identified. Overall, we identified 61 truncating sequence variations and 488 missense variations with a high probability to cause loss of function in a total of 316 genes. Interestingly, some loss-of-function variations of genes involved in the ALP pathway were found in homozygosity in the normal population, suggesting that their role is not essential. Thus, Lysoplex provided a comprehensive catalog of sequence variants in ALP genes and allows the assessment of their relevance in cell biology as well as their contribution to human disease. PMID:26075876

  20. A sparse model based detection of copy number variations from exome sequencing data

    PubMed Central

    Duan, Junbo; Wan, Mingxi; Deng, Hong-Wen; Wang, Yu-Ping

    2016-01-01

    Goal: Whole-exome sequencing provides a more cost-effective way than whole-genome sequencing for detecting genetic variants such as copy number variations (CNVs). Although a number of approaches have been proposed to detect CNVs from whole-genome sequencing, a direct adoption of these approaches to whole-exome sequencing will often fail because exons are separately located along a genome. Therefore, an appropriate method is needed to target the specific features of exome sequencing data. Methods: In this paper a novel sparse-model-based method is proposed to discover CNVs from multiple exome sequencing data. First, exome sequencing data are represented with a penalized matrix approximation, and technical variability and random sequencing errors are assumed to follow a generalized Gaussian distribution. Second, an iteratively re-weighted least squares algorithm is used to estimate the solution. Results: The method is tested and validated on both synthetic and real data, and compared with other approaches including CoNIFER, XHMM and cn.MOPS. The test demonstrates that the proposed method outperforms other approaches. Conclusion: The proposed sparse model can detect CNVs from exome sequencing data with high power and precision. Significance: The sparse model can target the specific features of exome sequencing data. The software codes are freely available at http://www.tulane.edu/wyp/software/ExonCNV.m PMID:26258935

  1. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  2. Protein 3D Structure Computed from Evolutionary Sequence Variation

    PubMed Central

    Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2011-01-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein

  3. Protein 3D structure computed from evolutionary sequence variation.

    PubMed

    Marks, Debora S; Colwell, Lucy J; Sheridan, Robert; Hopf, Thomas A; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2011-01-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7-4.8 Å C(α)-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures

  4. Length Variation, Heteroplasmy and Sequence Divergence in the Mitochondrial DNA of Four Species of Sturgeon (Acipenser)

    PubMed Central

    Brown, J. R.; Beckenbach, K.; Beckenbach, A. T.; Smith, M. J.

    1996-01-01

    The extent of mtDNA length variation and heteroplasmy as well as DNA sequences of the control region and two tRNA genes were determined for four North American sturgeon species: Acipenser transmontanus, A. medirostris, A. fulvescens and A. oxyrhynchus. Across the Continental Divide, a division in the occurrence of length variation and heteroplasmy was observed that was concordant with species biogeography as well as with phylogenies inferred from restriction fragment length polymorphisms (RFLP) of whole mtDNA and pairwise comparisons of unique sequences of the control region. In all species, mtDNA length variation was due to repeated arrays of 78-82-bp sequences each containing a D-loop strand synthesis termination associated sequence (TAS). Individual repeats showed greater sequence conservation within individuals and species rather than between species, which is suggestive of concerted evolution. Differences in the frequencies of multiple copy genomes and heteroplasmy among the four species may be ascribed to differences in the rates of recurrent mutation. A mechanism that may offset the high rate of mutation for increased copy number is suggested on the basis that an increase in the number of functional TAS motifs might reduce the frequency of successfully initiated H-strand replications. PMID:8852850

  5. Length variation, heteroplasmy and sequence divergence in the mitochondrial DNA of four species of sturgeon (Acipenser).

    PubMed

    Brown, J R; Beckenbach, K; Beckenbach, A T; Smith, M J

    1996-02-01

    The extent of mtDNA length variation and heteroplasmy as well as DNA sequences of the control region and two tRNA genes were determined for four North American sturgeon species: Acipenser transmontanus, A. medirostris, A. fulvescens and A. oxyrhynchus. Across the Continental Divide, a division in the occurrence of length variation and heteroplasmy was observed that was concordant with species biogeography as well as with phylogenies inferred from restriction fragment length polymorphisms (RFLP) of whole mtDNA and pairwise comparisons of unique sequences of the control region. In all species, mtDNA length variation was due to repeated arrays of 78-82-bp sequences each containing a D-loop strand synthesis termination associated sequence (TAS). Individual repeats showed greater sequence conservation within individuals and species rather than between species, which is suggestive of concerted evolution. Differences in the frequencies of multiple copy genomes and heteroplasmy among the four species may be ascribed to differences in the rates of recurrent mutation. A mechanism that may offset the high rate of mutation for increased copy number is suggested on the basis that an increase in the number of functional TAS motifs might reduce the frequency of successfully initiated H-strand replications.

  6. CODEX: a normalization and copy number variation detection method for whole exome sequencing.

    PubMed

    Jiang, Yuchao; Oldridge, Derek A; Diskin, Sharon J; Zhang, Nancy R

    2015-03-31

    High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but detecting and characterizing CNV from exome sequencing is challenging due to the high level of biases and artifacts. We propose CODEX, a normalization and CNV calling procedure for whole exome sequencing data. The Poisson latent factor model in CODEX includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data. CODEX is compared to existing methods on a population analysis of HapMap samples from the 1000 Genomes Project, and shown to be more accurate on three microarray-based validation data sets. We further evaluate performance on 222 neuroblastoma samples with matched normals and focus on a well-studied rare somatic CNV within the ATRX gene. We show that the cross-sample normalization procedure of CODEX removes more noise than normalizing the tumor against the matched normal and that the segmentation procedure performs well in detecting CNVs with nested structures.

  7. CODEX: a normalization and copy number variation detection method for whole exome sequencing

    PubMed Central

    Jiang, Yuchao; Oldridge, Derek A.; Diskin, Sharon J.; Zhang, Nancy R.

    2015-01-01

    High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but detecting and characterizing CNV from exome sequencing is challenging due to the high level of biases and artifacts. We propose CODEX, a normalization and CNV calling procedure for whole exome sequencing data. The Poisson latent factor model in CODEX includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data. CODEX is compared to existing methods on a population analysis of HapMap samples from the 1000 Genomes Project, and shown to be more accurate on three microarray-based validation data sets. We further evaluate performance on 222 neuroblastoma samples with matched normals and focus on a well-studied rare somatic CNV within the ATRX gene. We show that the cross-sample normalization procedure of CODEX removes more noise than normalizing the tumor against the matched normal and that the segmentation procedure performs well in detecting CNVs with nested structures. PMID:25618849

  8. Sequence Polymorphisms and Structural Variations among Four Grapevine (Vitis vinifera L.) Cultivars Representing Sardinian Agriculture

    PubMed Central

    Mercenaro, Luca; Nieddu, Giovanni; Porceddu, Andrea; Pezzotti, Mario; Camiolo, Salvatore

    2017-01-01

    The genetic diversity among grapevine (Vitis vinifera L.) cultivars that underlies differences in agronomic performance and wine quality reflects the accumulation of single nucleotide polymorphisms (SNPs) and small indels as well as larger genomic variations. A combination of high throughput sequencing and mapping against the grapevine reference genome allows the creation of comprehensive sequence variation maps. We used next generation sequencing and bioinformatics to generate an inventory of SNPs and small indels in four widely cultivated Sardinian grape cultivars (Bovale sardo, Cannonau, Carignano and Vermentino). More than 3,200,000 SNPs were identified with high statistical confidence. Some of the SNPs caused the appearance of premature stop codons and thus identified putative pseudogenes. The analysis of SNP distribution along chromosomes led to the identification of large genomic regions with uninterrupted series of homozygous SNPs. We used a digital comparative genomic hybridization approach to identify 6526 genomic regions with significant differences in copy number among the four cultivars compared to the reference sequence, including 81 regions shared between all four cultivars and 4953 specific to single cultivars (representing 1.2 and 75.9% of total copy number variation, respectively). Reads mapping at a distance that was not compatible with the insert size were used to identify a dataset of putative large deletions with cultivar Cannonau revealing the highest number. The analysis of genes mapping to these regions provided a list of candidates that may explain some of the phenotypic differences among the Bovale sardo, Cannonau, Carignano and Vermentino cultivars. PMID:28775732

  9. Natural Allelic Variations in Highly Polyploidy Saccharum Complex

    DOE PAGES

    Song, Jian; Yang, Xiping; Resende, Jr., Marcio F. R.; ...

    2016-06-08

    Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with high polyploid and complex genomes. The Saccharum complex, comprised of Saccharum genus and a few related genera, are important genetic resources for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding their allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variations in Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variants and genotype calling was established. BWA-mem and sorghum genome proved to be an acceptable aligner and reference for sugarcane target enrichment sequence analysis, respectively. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated as largely homozygotes. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. Furthermore, the target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes.

  10. Natural Allelic Variations in Highly Polyploidy Saccharum Complex.

    PubMed

    Song, Jian; Yang, Xiping; Resende, Marcio F R; Neves, Leandro G; Todd, James; Zhang, Jisen; Comstock, Jack C; Wang, Jianping

    2016-01-01

    Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with high polyploid and complex genomes. The Saccharum complex, comprised of Saccharum genus and a few related genera, are important genetic resources for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding their allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variations in Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variants and genotype calling was established. BWA-mem and sorghum genome proved to be an acceptable aligner and reference for sugarcane target enrichment sequence analysis, respectively. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated as largely homozygotes. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. The target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes.

  11. Natural Allelic Variations in Highly Polyploidy Saccharum Complex

    PubMed Central

    Song, Jian; Yang, Xiping; Resende, Marcio F. R.; Neves, Leandro G.; Todd, James; Zhang, Jisen; Comstock, Jack C.; Wang, Jianping

    2016-01-01

    Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with high polyploid and complex genomes. The Saccharum complex, comprised of Saccharum genus and a few related genera, are important genetic resources for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding their allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variations in Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variants and genotype calling was established. BWA-mem and sorghum genome proved to be an acceptable aligner and reference for sugarcane target enrichment sequence analysis, respectively. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated as largely homozygotes. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. The target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes. PMID:27375658

  12. High frequency of the IVS2-2A>G DNA sequence variation in SLC26A5, encoding the cochlear motor protein prestin, precludes its involvement in hereditary hearing loss.

    PubMed

    Tang, Hsiao-Yuan; Xia, Anping; Oghalai, John S; Pereira, Fred A; Alford, Raye L

    2005-08-08

    Cochlear outer hair cells change their length in response to variations in membrane potential. This capability, called electromotility, is believed to enable the sensitivity and frequency selectivity of the mammalian cochlea. Prestin is a transmembrane protein required for electromotility. Homozygous prestin knockout mice are profoundly hearing impaired. In humans, a single nucleotide change in SLC26A5, encoding prestin, has been reported in association with hearing loss. This DNA sequence variation, IVS2-2A>G, occurs in the exon 3 splice acceptor site and is expected to abolish splicing of exon 3. To further explore the relationship between hearing loss and the IVS2-2A>G transition, and assess allele frequency, genomic DNA from hearing impaired and control subjects was analyzed by DNA sequencing. SLC26A5 genomic DNA sequences from human, chimp, rat, mouse, zebrafish and fruit fly were aligned and compared for evolutionary conservation of the exon 3 splice acceptor site. Alternative splice acceptor sites within intron 2 of human SLC26A5 were sought using a splice site prediction program from the Berkeley Drosophila Genome Project. The IVS2-2A>G variant was found in a heterozygous state in 4 of 74 hearing impaired subjects of Hispanic, Caucasian or uncertain ethnicity and 4 of 150 Hispanic or Caucasian controls (p = 0.45). The IVS2-2A>G variant was not found in 106 subjects of Asian or African American descent. No homozygous subjects were identified (n = 330). Sequence alignment of SLC26A5 orthologs demonstrated that the A nucleotide at position IVS2-2 is invariant among several eukaryotic species. Sequence analysis also revealed five potential alternative splice acceptor sites in intron 2 of human SLC26A5. These data suggest that the IVS2-2A>G variant may not occur more frequently in hearing impaired subjects than in controls. The identification of five potential alternative splice acceptor sites in intron 2 of human SLC26A5 suggests a potential mechanism by which

  13. Comparative analysis of complete chloroplast genome sequence and inversion variation in Lasthenia burkei (Madieae, Asteraceae).

    PubMed

    Walker, Joseph F; Zanis, Michael J; Emery, Nancy C

    2014-04-01

    Complete chloroplast genome studies can help resolve relationships among large, complex plant lineages such as Asteraceae. We present the first whole plastome from the Madieae tribe and compare its sequence variation to other chloroplast genomes in Asteraceae. We used high throughput sequencing to obtain the Lasthenia burkei chloroplast genome. We compared sequence structure and rates of molecular evolution in the small single copy (SSC), large single copy (LSC), and inverted repeat (IR) regions to those for eight Asteraceae accessions and one Solanaceae accession. The chloroplast sequence of L. burkei is 150 746 bp and contains 81 unique protein coding genes and 4 coding ribosomal RNA sequences. We identified three major inversions in the L. burkei chloroplast, all of which have been found in other Asteraceae lineages, and a previously unreported inversion in Lactuca sativa. Regions flanking inversions contained tRNA sequences, but did not have particularly high G + C content. Substitution rates varied among the SSC, LSC, and IR regions, and rates of evolution within each region varied among species. Some observed differences in rates of molecular evolution may be explained by the relative proportion of coding to noncoding sequence within regions. Rates of molecular evolution vary substantially within and among chloroplast genomes, and major inversion events may be promoted by the presence of tRNAs. Collectively, these results provide insight into different mechanisms that may promote intramolecular recombination and the inversion of large genomic regions in the plastome.

  14. Disentangling sources of variation in SSU rDNA sequences from single cell analyses of ciliates: impact of copy number variation and experimental error.

    PubMed

    Wang, Chundi; Zhang, Tengteng; Wang, Yurui; Katz, Laura A; Gao, Feng; Song, Weibo

    2017-07-26

    Small subunit ribosomal DNA (SSU rDNA) is widely used for phylogenetic inference, barcoding and other taxonomy-based analyses. Recent studies indicate that SSU rDNA of ciliates may have a high level of sequence variation within a single cell, which impacts the interpretation of rDNA-based surveys. However, sequence variation can come from a variety of sources including experimental errors, especially the mutations generated by DNA polymerase in PCR. In the present study, we explore the impact of four DNA polymerases on sequence variation and find that low-fidelity polymerases exaggerate the estimates of single-cell sequence variation. Therefore, using a polymerase with high fidelity is essential for surveys of sequence variation. Another source of variation results from errors during amplification of SSU rDNA within the polyploid somatic macronuclei of ciliates. To investigate further the impact of SSU rDNA copy number variation, we use a high-fidelity polymerase to examine the intra-individual SSU rDNA polymorphism in ciliates with varying levels of macronuclear amplification: Halteria grandinella, Blepharisma americanum and Strombidium stylifer. We estimate the rDNA copy numbers of these three species by single-cell quantitative PCR. The results indicate that: (i) sequence variation of SSU rDNA within a single cell is authentic in ciliates, but the level of intra-individual SSU rDNA polymorphism varies greatly among species; (ii) rDNA copy numbers vary greatly among species, even those within the same class; (iii) the average rDNA copy number of Halteria grandinella is about 567 893 (s.d. = 165 481), which is the highest record of rDNA copy number in ciliates to date; and (iv) based on our data and the records from previous studies, it is not always true in ciliates that rDNA copy numbers are positively correlated with cell or genome size. © 2017 The Author(s).

  15. FRESCO: Referential compression of highly similar sequences.

    PubMed

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios far beyond the state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.
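
    The general idea of referential compression described in this abstract (storing only the differences between an input and a known reference) can be illustrated with a minimal sketch. This is not the FRESCO algorithm or its file format: the function names `compress` and `decompress`, the `min_match` parameter, the k-mer index, and the greedy longest-match strategy are all assumptions chosen for illustration.

```python
def compress(reference: str, target: str, min_match: int = 4):
    """Encode target as copies from reference plus literal characters.

    Illustrative greedy scheme only (hypothetical, not FRESCO itself).
    Returns a list of ("copy", ref_start, length) and ("lit", char) operations.
    """
    k = min_match
    # Index every k-mer of the reference by its start positions.
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)

    ops = []
    pos = 0
    while pos < len(target):
        best_start, best_len = -1, 0
        # Try every reference position sharing the next k-mer; keep the longest match.
        for start in index.get(target[pos:pos + k], []):
            length = 0
            while (start + length < len(reference)
                   and pos + length < len(target)
                   and reference[start + length] == target[pos + length]):
                length += 1
            if length > best_len:
                best_start, best_len = start, length
        if best_len >= k:
            ops.append(("copy", best_start, best_len))
            pos += best_len
        else:
            # No sufficiently long match: store the character itself.
            ops.append(("lit", target[pos]))
            pos += 1
    return ops


def decompress(reference: str, ops) -> str:
    """Rebuild the target from the reference and the operation list."""
    out = []
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out.append(reference[start:start + length])
        else:
            out.append(op[1])
    return "".join(out)


reference = "ACGTACGTTTGACCA"
target = "ACGTACGATTGACCA"  # differs from the reference by one substitution
ops = compress(reference, target)
restored = decompress(reference, ops)
```

    For two sequences that differ by a single substitution, the operation list reduces to two copies and one literal, which is why highly similar inputs compress so well against a good reference.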

  16. Joint genotyping on the fly: Identifying variation among a sequenced panel of inbred lines

    PubMed Central

    Stone, Eric A.

    2012-01-01

    High-throughput sequencing is enabling remarkably deep surveys of genomic variation. It is now possible to completely sequence multiple individuals from a single species, yet the identification of variation among them remains an evolving computational challenge. This challenge is compounded for experimental organisms when strains are studied instead of individuals. In response, we present the Joint Genotyper for Inbred Lines (JGIL) as a method for obtaining genotypes and identifying variation among a large panel of inbred strains or lines. JGIL inputs the sequence reads from each line after their alignment to a common reference. Its probabilistic model includes site-specific parameters common to all lines that describe the frequency of nucleotides segregating in the population from which the inbred panel was derived. The distribution of line genotypes is conditional on these parameters and reflects the experimental design. Site-specific error probabilities, also common to all lines, parameterize the distribution of reads conditional on line genotype and realized coverage. Both sets of parameters are estimated per site from the aggregate read data, and posterior probabilities are calculated to decode the genotype of each line. We present an application of JGIL to 162 inbred Drosophila melanogaster lines from the Drosophila Genetic Reference Panel. We explore by simulation the effect of varying coverage, sequencing error, mapping error, and the number of lines. In doing so, we illustrate how JGIL is robust to moderate levels of error. Supported by these analyses, we advocate the importance of modeling the data and the experimental design when possible. PMID:22367192

  17. Analysis of DNA Sequence Variants Detected by High Throughput Sequencing

    PubMed Central

    Adams, David R; Sincan, Murat; Fajardo, Karin Fuentes; Mullikin, James C; Pierson, Tyler M; Toro, Camilo; Boerkoel, Cornelius F; Tifft, Cynthia J; Gahl, William A; Markello, Tom C

    2014-01-01

    The Undiagnosed Diseases Program at the National Institutes of Health uses High Throughput Sequencing (HTS) to diagnose rare and novel diseases. HTS techniques generate large numbers of DNA sequence variants, which must be analyzed and filtered to find candidates for disease causation. Despite the publication of an increasing number of successful exome-based projects, there has been little formal discussion of the analytic steps applied to HTS variant lists. We present the results of our experience with over 30 families for whom HTS sequencing was used in an attempt to find clinical diagnoses. For each family, exome sequence was augmented with high-density SNP-array data. We present a discussion of the theory and practical application of each analytic step and provide example data to illustrate our approach. The paper is designed to provide an analytic roadmap for variant analysis, thereby enabling a wide range of researchers and clinical genetics practitioners to perform direct analysis of HTS data for their patients and projects. PMID:22290882

  18. Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction.

    PubMed

    Laehnemann, David; Borkhardt, Arndt; McHardy, Alice Carolyn

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here.

  19. Analysis of sequence variation in Gnathostoma spinigerum mitochondrial DNA by single-strand conformation polymorphism analysis and DNA sequence.

    PubMed

    Ngarmamonpirat, Charinthon; Waikagul, Jitra; Petmitr, Songsak; Dekumyoy, Paron; Rojekittikhun, Wichit; Anantapruti, Malinee T

    2005-03-01

    Morphological variations were observed in the advanced third-stage larvae of Gnathostoma spinigerum collected from swamp eel (Fluta alba), the second intermediate host. Larvae of the typical type and of three atypical types were chosen for partial cytochrome c oxidase subunit I (COI) gene sequence analysis. A 450 bp polymerase chain reaction product of the COI gene was amplified from mitochondrial DNA. The variations were analyzed by single-strand conformation polymorphism and DNA sequencing. The nucleotide variations of the COI gene in the four types of larvae indicated the presence of intra-specific variation of mitochondrial DNA in the G. spinigerum population.

  20. Genetic validation of whole-transcriptome sequencing for mapping expression affected by cis-regulatory variation

    PubMed Central

    2010-01-01

    Background Identifying associations between genotypes and gene expression levels using microarrays has enabled systematic interrogation of regulatory variation underlying complex phenotypes. This approach has vast potential for functional characterization of disease states, but its prohibitive cost, given hundreds to thousands of individual samples from populations have to be genotyped and expression profiled, has limited its widespread application. Results Here we demonstrate that genomic regions with allele-specific expression (ASE) detected by sequencing cDNA are highly enriched for cis-acting expression quantitative trait loci (cis-eQTL) identified by profiling of 500 animals in parallel, with up to 90% agreement on the allele that is preferentially expressed. We also observed widespread noncoding and antisense ASE and identified several allele-specific alternative splicing variants. Conclusion Monitoring ASE by sequencing cDNA from as little as one sample is a practical alternative to expression genetics for mapping cis-acting variation that regulates RNA transcription and processing. PMID:20707912

  1. Sequence variation of koala retrovirus transmembrane protein p15E among koalas from different geographic regions.

    PubMed

    Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M; Greenwood, Alex D; Roca, Alfred L

    2015-01-15

    The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in a captive Queensland and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes comprising 139 of the sequences (85%) coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development.

  2. Sequence variation of koala retrovirus transmembrane protein p15E among koalas from different geographic regions

    PubMed Central

    Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M.; Greenwood, Alex D.; Roca, Alfred L.

    2014-01-01

    The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in a captive Queensland and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes comprising 139 of the sequences (85%) coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development. PMID:25462343

  3. Ultra Deep Sequencing of a Baculovirus Population Reveals Widespread Genomic Variations

    PubMed Central

    Chateigner, Aurélien; Bézier, Annie; Labrousse, Carole; Jiolle, Davy; Barbe, Valérie; Herniou, Elisabeth A.

    2015-01-01

    Viruses rely on widespread genetic variation and large population size for adaptation. Large DNA virus populations are thought to harbor little variation though natural populations may be polymorphic. To measure the genetic variation present in a dsDNA virus population, we deep sequenced a natural strain of the baculovirus Autographa californica multiple nucleopolyhedrovirus. With 124,221X average genome coverage of our 133,926 bp long consensus, we could detect low frequency mutations (0.025%). K-means clustering was used to classify the mutations in four categories according to their frequency in the population. We found 60 high frequency non-synonymous mutations under balancing selection distributed in all functional classes. These mutants could alter viral adaptation dynamics, either through competitive or synergistic processes. Lastly, we developed a technique for the delimitation of large deletions in next generation sequencing data. We found that large deletions occur along the entire viral genome, with hotspots located in homologous repeat regions (hrs). Present in 25.4% of the genomes, these deletion mutants presumably require functional complementation to complete their infection cycle. They might thus have a large impact on the fitness of the baculovirus population. Altogether, we found a wide breadth of genomic variation in the baculovirus population, suggesting it has high adaptive potential. PMID:26198241
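
    Classifying mutations into four frequency categories with k-means can be illustrated with a small one-dimensional sketch. The frequencies below are invented for illustration, the deterministic initialisation is a choice made here, and the abstract does not say whether raw or transformed frequencies were clustered.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd's algorithm on one-dimensional data.

    Returns (centers, clusters), with clusters in the same order as centers.
    """
    data = sorted(values)
    # Deterministic start: take evenly spaced points from the sorted data.
    centers = [data[(2 * j + 1) * len(data) // (2 * k)] for j in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Recompute each center as the mean of its cluster (keep it if empty).
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical mutation frequencies (fractions of reads), not data from the study.
frequencies = [0.0003, 0.0005, 0.0004, 0.01, 0.012, 0.011,
               0.2, 0.22, 0.21, 0.9, 0.95, 0.92]
centers, clusters = kmeans_1d(frequencies, k=4)

if __name__ == "__main__":
    for center, members in zip(centers, clusters):
        print(f"center={center:.4f} members={members}")
```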

  4. A map of human genome variation from population-scale sequencing.

    PubMed

    Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A

    2010-10-28

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

  5. Quantification of the variation in percentage identity for protein sequence alignments

    PubMed Central

    Raghava, GPS; Barton, Geoffrey J

    2006-01-01

    Background Percentage Identity (PID) is frequently quoted in discussion of sequence alignments since it appears simple and easy to understand. However, although there are several different ways to calculate percentage identity and each may yield a different result for the same alignment, the method of calculation is rarely reported. Accordingly, quantification of the variation in PID caused by the different calculations would help in interpreting PID values in the literature. In this study, the variation in PID was quantified systematically on a reference set of 1028 alignments generated by comparison of the protein three-dimensional structures. Since the alignment algorithm may also affect the range of PID, this study also considered the effect of algorithm, and the combination of algorithm and PID method. Results The maximum variation in PID due to the calculation method was 11.5% while the effect of alignment algorithm on PID was up to 14.6% across three popular alignment methods. The combined effect of alignment algorithm and PID calculation gave a variation of up to 22% on the test data, with an average of 5.3% ± 2.8% for sequence pairs with < 30% identity. In order to see which PID method was most highly correlated with structural similarity, four different PID calculations were compared to similarity scores (Sc) from the comparison of the corresponding protein three-dimensional structures. The highest correlation coefficient for a PID calculation was 0.80. In contrast, the more sophisticated Z-score calculated by reference to randomized sequences gave a correlation coefficient of 0.84. Conclusion Although it is well known amongst expert sequence analysts that PID is a poor score for discriminating between protein sequences, the apparent simplicity of the percentage identity score encourages its widespread use in establishing cutoffs for structural similarity. This paper illustrates that not only is PID a poor measure of sequence similarity when compared to
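
    How much the choice of denominator can move a PID value is easy to see on a toy alignment. The sketch below computes the identity count once and divides it by four commonly used denominators; the denominator names and the example alignment are illustrative assumptions, and the abstract does not specify which four calculations the study compared.

```python
def percent_identities(aligned_a, aligned_b, gap="-"):
    """Return percentage identity for one pairwise alignment under four denominators."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    paired = 0  # columns in which both sequences have a residue
    for x, y in zip(aligned_a, aligned_b):
        if x != gap and y != gap:
            paired += 1
            if x == y:
                identical += 1
    len_a = len(aligned_a) - aligned_a.count(gap)
    len_b = len(aligned_b) - aligned_b.count(gap)
    denominators = {
        "alignment_columns": len(aligned_a),   # residues plus gap columns
        "paired_residues": paired,             # gap columns excluded
        "shorter_sequence": min(len_a, len_b),
        "mean_length": (len_a + len_b) / 2,
    }
    return {name: 100.0 * identical / d for name, d in denominators.items()}


# Hypothetical alignment: 6 identical columns, 1 mismatch, 3 gap columns.
pids = percent_identities("ACDEFG-HIK", "ACDQ--WHIK")

if __name__ == "__main__":
    for name, value in pids.items():
        print(f"{name}: {value:.1f}%")
```

    On this toy input the same alignment yields values from 60% to roughly 86% depending only on the denominator, which is why reporting the calculation method matters.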

  6. Variation in the sequence and modification state of the human insulin gene flanking regions.

    PubMed

    Ullrich, A; Dull, T J; Gray, A; Philips, J A; Peter, S

    1982-04-10

    The nucleotide sequence of a highly repetitive sequence region upstream from the human insulin gene is reported. The length of this region varies between alleles in the population, and appears to be stably transmitted to the next generation in a Mendelian fashion. There is no significant correlation between the length of this sequence and two types of diabetes mellitus. We observe variation in the cleavability of a BglI recognition site downstream from the human insulin gene, which is probably due to variable nucleotide modification. This presumed modification state appears not to be inherited, and varies between tissues within an individual and between individuals for a given tissue. Both alleles in a given tissue DNA sample are modified to the same extent.

  7. A framework for variation discovery and genotyping using next-generation DNA sequencing data

    PubMed Central

    DePristo, M.A.; Banks, E.; Poplin, R.E.; Garimella, K.V.; Maguire, J.R.; Hartl, C.; Philippakis, A.A.; del Angel, G.; Rivas, M.A; Hanna, M.; McKenna, A.; Fennell, T.J.; Kernytsky, A.M.; Sivachenko, A.Y.; Cibulskis, K.; Gabriel, S.B.; Altshuler, D.; Daly, M.J.

    2011-01-01

    Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets. PMID:21478889

  8. Characterization of ADME gene variation in 21 populations by exome sequencing

    PubMed Central

    Hovelson, Daniel H.; Xue, Zhengyu; Zawistowski, Matthew; Ehm, Margaret G.; Harris, Elizabeth C.; Stocker, Sophie L.; Gross, Annette S.; Jang, In-Jin; Ieiri, Ichiro; Lee, Jong-Eun; Cardon, Lon R.; Chissoe, Stephanie L.; Abecasis, Gonçalo

    2017-01-01

    Objective Proteins involved in absorption, distribution, metabolism, and excretion (ADME) play a critical role in drug pharmacokinetics. The type and frequency of genetic variation in the ADME genes differ among populations. The aim of this study was to systematically investigate common and rare ADME coding variation in diverse ethnic populations by exome sequencing. Materials and methods Data derived from commercial exome capture arrays and next-generation sequencing were used to characterize coding variation in 298 ADME genes in 251 Northeast Asians and 1181 individuals from the 1000 Genomes Project. Results Approximately 75% of the ADME coding sequence was captured at high quality across the joint samples harboring more than 8000 variants, with 49% of individuals carrying at least one ‘knockout’ allele. ADME genes carried 50% more nonsynonymous variation than non-ADME genes (P=8.2×10⁻¹³) and showed significantly greater levels of population differentiation (P=7.6×10⁻¹¹). Out of the 2135 variants identified that were predicted to be deleterious, 633 were not on commercially available ADME or general-purpose genotyping arrays. Forty deleterious variants within important ADME genes, with frequencies of at least 2% in at least one population, were identified as candidates for future pharmacogenetic studies. Conclusion Exome sequencing was effective in accurately genotyping most ADME variants important for pharmacogenetic research, in addition to identifying rare or potentially de novo coding variants that may be clinically meaningful. Furthermore, as a class, ADME genes are more variable and less sensitive to purifying selection than non-ADME genes. PMID:27984508

  9. Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing.

    PubMed

    Ferreira, Pedro G; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R; Rivas, Manuel A; Esteve-Codina, Anna; Rosenstiel, Philip; Strom, Tim M; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

    2016-09-12

    Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing (alternative splice sites, introns, and cleavage sites), which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts.

  10. Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

    PubMed Central

    Ferreira, Pedro G.; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R.; Rivas, Manuel A.; Esteve-Codina, Anna; Estivill, Xavier; Guigó, Roderic; Dermitzakis, Emmanouil; Antonarakis, Stylianos; Meitinger, Thomas; Strom, Tim M; Palotie, Aarno; François Deleuze, Jean; Sudbrak, Ralf; Lerach, Hans; Gut, Ivo; Syvänen, Ann-Christine; Gyllensten, Ulf; Schreiber, Stefan; Rosenstiel, Philip; Brunner, Han; Veltman, Joris; Hoen, Peter A.C.T; Jan van Ommen, Gert; Carracedo, Angel; Brazma, Alvis; Flicek, Paul; Cambon-Thomsen, Anne; Mangion, Jonathan; Bentley, David; Hamosh, Ada; Rosenstiel, Philip; Strom, Tim M; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

    2016-01-01

    Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns, and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts. PMID:27617755

  11. Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

    NASA Astrophysics Data System (ADS)

    Ferreira, Pedro G.; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R.; Rivas, Manuel A.; Esteve-Codina, Anna; Estivill, Xavier; Guigó, Roderic; Dermitzakis, Emmanouil; Antonarakis, Stylianos; Meitinger, Thomas; Strom, Tim M.; Palotie, Aarno; François Deleuze, Jean; Sudbrak, Ralf; Lerach, Hans; Gut, Ivo; Syvänen, Ann-Christine; Gyllensten, Ulf; Schreiber, Stefan; Rosenstiel, Philip; Brunner, Han; Veltman, Joris; Hoen, Peter A. C. T.; Jan van Ommen, Gert; Carracedo, Angel; Brazma, Alvis; Flicek, Paul; Cambon-Thomsen, Anne; Mangion, Jonathan; Bentley, David; Hamosh, Ada; Rosenstiel, Philip; Strom, Tim M.; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

    2016-09-01

    Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns, and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts.

  12. Comprehensive assessment of sequence variation within the copy number variable defensin cluster on 8p23 by target enriched in-depth 454 sequencing.

    PubMed

    Taudien, Stefan; Szafranski, Karol; Felder, Marius; Groth, Marco; Huse, Klaus; Raffaelli, Francesca; Petzold, Andreas; Zhang, Xinmin; Rosenstiel, Philip; Hampe, Jochen; Schreiber, Stefan; Platzer, Matthias

    2011-05-18

    In highly copy number variable (CNV) regions such as the human defensin gene locus, comprehensive assessment of sequence variations is challenging. PCR approaches are practically restricted to tiny fractions, and next-generation sequencing (NGS) approaches of whole individual genomes, e.g. by the 1000 Genomes Project, are limited by the affordable sequence depth. Combining target enrichment with NGS may represent a feasible approach. As a proof of principle, we enriched a ~850 kb section comprising the CNV defensin gene cluster DEFB, the invariable DEFA part and 11 control regions from two genomes by sequence capture and sequenced it by 454 technology. 6,651 differences to the human reference genome were found. Comparison to HapMap genotypes revealed sensitivities and specificities in the range of 94% to 99% for the identification of variations. Using error probabilities for rigorous filtering revealed 2,886 unique single nucleotide variations (SNVs) including 358 putative novel ones. DEFB CN determinations by haplotype ratios were in agreement with alternative methods. Although currently labor intensive and costly, target enriched NGS provides a powerful tool for the comprehensive assessment of SNVs in highly polymorphic CNV regions of individual genomes. Furthermore, it reveals considerable amounts of putative novel variations and simultaneously allows CN estimation.

  13. Spatio-temporal Variations of Characteristic Repeating Earthquake Sequences along the Middle America Trench in Mexico

    NASA Astrophysics Data System (ADS)

    Dominguez, L. A.; Taira, T.; Hjorleifsdottir, V.; Santoyo, M. A.

    2015-12-01

    Repeating earthquake sequences are sets of events that are thought to rupture the same area on the plate interface and thus produce nearly identical waveforms. We systematically analyzed seismic records from 2001 through 2014 to identify repeating earthquakes with highly correlated waveforms occurring along the subduction zone of the Cocos plate. Using the correlation coefficient (cc) and spectral coherency (coh) of the vertical components as selection criteria, we found a set of 214 sequences whose waveforms satisfy cc≥95% and coh≥95%. Spatial clustering along the trench shows large variations in repeating earthquake activity. In particular, the rupture zone of the M8.1, 1985 earthquake shows a near absence of characteristic repeating earthquakes, whereas the Guerrero Gap zone and the segment of the trench close to the Guerrero-Oaxaca border show a significantly larger number of repeating earthquake sequences. Furthermore, temporal variations associated with stress changes due to major earthquakes show episodes of unlocking and healing of the interface. Understanding the different components that control the location and recurrence time of characteristic repeating sequences is a key factor in pinpointing areas where large megathrust earthquakes may nucleate and, consequently, in improving seismic hazard assessment.
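
    The waveform-similarity selection described above can be sketched as a pairwise correlation test followed by grouping. This is a minimal illustration under stated assumptions: waveforms are already aligned and of equal length, only the correlation coefficient is used (the spectral coherency criterion is omitted), and events are linked into sequences by single linkage, which the abstract does not specify. All names are hypothetical.

```python
from math import sin, sqrt


def correlation_coefficient(x, y):
    """Zero-lag Pearson correlation of two equal-length, non-constant waveforms."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def group_repeaters(waveforms, threshold=0.95):
    """Link events whose waveforms correlate at or above threshold (union-find)."""
    n = len(waveforms)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if correlation_coefficient(waveforms[i], waveforms[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # A sequence needs at least two events.
    return [members for members in groups.values() if len(members) > 1]


# Synthetic traces: event 1 is a scaled copy of event 0, event 2 is phase-shifted.
w0 = [sin(0.3 * t) for t in range(100)]
w1 = [2.0 * v for v in w0]
w2 = [sin(0.3 * t + 1.5) for t in range(100)]
sequences = group_repeaters([w0, w1, w2])

if __name__ == "__main__":
    print(sequences)
```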

  14. Phylogenetic Sequence Variations in Bacterial rRNA Affect Species-Specific Susceptibility to Drugs Targeting Protein Synthesis▿‡

    PubMed Central

    Akshay, Subramanian; Bertea, Mihai; Hobbie, Sven N.; Oettinghaus, Björn; Shcherbakov, Dimitri; Böttger, Erik C.; Akbergenov, Rashid

    2011-01-01

    Antibiotics targeting the bacterial ribosome typically bind to highly conserved rRNA regions with only minor phylogenetic sequence variations. It is unclear whether these sequence variations affect antibiotic susceptibility or resistance development. To address this question, we have investigated the drug binding pockets of aminoglycosides and macrolides/ketolides. The binding site of aminoglycosides is located within helix 44 of the 16S rRNA (A site); macrolides/ketolides bind to domain V of the 23S rRNA (peptidyltransferase center). We have used mutagenesis of rRNA sequences in Mycobacterium smegmatis ribosomes to reconstruct the different bacterial drug binding sites and to study the effects of rRNA sequence variations on drug activity. Our results provide a rationale for differences in species-specific drug susceptibility patterns and species-specific resistance phenotypes associated with mutational alterations in the drug binding pocket. PMID:21730122

  15. Storage and retrieval of highly repetitive sequence collections.

    PubMed

    Mäkinen, Veli; Navarro, Gonzalo; Sirén, Jouni; Välimäki, Niko

    2010-03-01

    A repetitive sequence collection is a set of sequences which are small variations of each other. A prominent example is a set of genome sequences of individuals of the same or closely related species, where the differences can be expressed by short lists of basic edit operations. Flexible and efficient data analysis on such a typically huge collection is feasible using suffix trees. However, the suffix tree occupies much space, which very soon inhibits in-memory analyses. Recent advances in full-text indexing reduce the space of the suffix tree to, essentially, that of the compressed sequences, while retaining its functionality with only a polylogarithmic slowdown. However, the underlying compression model considers only the predictability of the next sequence symbol given the k previous ones, where k is a small integer. This is unable to capture longer-term repetitiveness. For example, r identical copies of an incompressible sequence will be incompressible under this model. We develop new static and dynamic full-text indexes that are capable of capturing the fact that a collection is highly repetitive, and require space basically proportional to the length of one typical sequence plus the total number of edit operations. The new indexes can be plugged into a recent dynamic fully-compressed suffix tree, achieving full functionality for sequence analysis, while retaining the reduced space and the polylogarithmic slowdown. Our experimental results confirm the practicality of our proposal.
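
    The limitation of the order-k model on repeated data can be checked numerically. The sketch below compares two quantities on one random sequence and on eight concatenated copies of it: the total size under an ideal order-k statistical coder, and the number of equal-letter runs in the Burrows-Wheeler transform. The run count is used here only as one illustrative repetition-aware measure; the abstract does not state which structure the proposed indexes build on, and the sequence length, k, and number of copies are arbitrary choices.

```python
import random
from collections import Counter, defaultdict
from math import log2


def kth_order_bits(text, k):
    """Total bits used by an ideal order-k statistical coder (n times H_k)."""
    followers = defaultdict(Counter)
    for i in range(k, len(text)):
        followers[text[i - k:i]][text[i]] += 1
    bits = 0.0
    for counts in followers.values():
        total = sum(counts.values())
        for c in counts.values():
            bits -= c * log2(c / total)
    return bits


def bwt_runs(text, sentinel="$"):
    """Number of equal-letter runs in the BWT, via a naive rotation sort."""
    t = text + sentinel  # the sentinel must sort before every other symbol
    order = sorted(range(len(t)), key=lambda i: t[i:] + t[:i])
    bwt = [t[i - 1] for i in order]
    return 1 + sum(1 for a, b in zip(bwt, bwt[1:]) if a != b)


random.seed(0)
one = "".join(random.choice("ACGT") for _ in range(200))
many = one * 8  # eight identical copies

if __name__ == "__main__":
    print("order-2 bits:", kth_order_bits(one, 2), "->", kth_order_bits(many, 2))
    print("BWT runs:    ", bwt_runs(one), "->", bwt_runs(many))
```

    The order-k size grows roughly in proportion to the number of copies, while the run count stays close to that of a single copy, which is the kind of gap a repetition-aware index can exploit.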

  16. The impact of variation in the pulse sequence parameters on image uniformity in magnetic resonance imaging.

    PubMed

    Amin, Naima; Afzal, Mohammad

    2009-04-01

    The aim was to evaluate the practical impact of altering key magnetic resonance imaging (MRI) parameters on the image quality and effectiveness of widely available fast imaging pulse sequences. A tissue-equivalent material for MRI was produced from a polysaccharide gel, agarose, containing gadolinium chloride chelated to ethylenediaminetetraacetic acid (EDTA), with a range of T1 and T2 values. The experimentally varied key parameters included echo time (TE) and repetition time (TR). Quantitative analysis consisted of measuring image nonuniformity. In T2-weighted images, any change in TE played a critical role in signal homogeneity in all pulse sequences. The percentage of nonuniformity was very high in T2-weighted images, but the change in TR was insignificant in the T2-weighted study. In T1-weighted images, the percentage of nonuniformity was high in gradient recalled echo (GRE), also noticeable in fast fluid-attenuated inversion recovery (FLAIR), but quite acceptable in fast spin echo (FSE) and conventional spin echo (CSE). Selecting parameters that maintain image uniformity and quality is relatively simple in CSE in both T1- and T2-weighted studies. GRE is very sensitive to any variation in parameters and loses signal uniformity rapidly.

  17. Variation in the nucleotide sequence of a prolamin gene family in wild rice.

    PubMed

    Barbier, P; Ishihama, A

    1990-07-01

    Variation in the DNA sequence of the 10 kDa prolamin gene family within the wild rice species Oryza rufipogon was probed using the direct sequencing of PCR-amplified genes. A comparison of the nucleotide and deduced amino-acid sequences of eight Asian strains of O. rufipogon and one strain of the related African species O. longistaminata is presented.

  18. Forward Genetics by Sequencing EMS Variation-Induced Inbred Lines

    PubMed Central

    Addo-Quaye, Charles; Buescher, Elizabeth; Best, Norman; Chaikam, Vijay; Baxter, Ivan; Dilkes, Brian P.

    2016-01-01

    In order to leverage novel sequencing techniques for cloning genes in eukaryotic organisms with complex genomes, the false positive rate of variant discovery must be controlled for by experimental design and informatics. We sequenced five lines from three pedigrees of ethyl methanesulfonate (EMS)-mutagenized Sorghum bicolor, including a pedigree segregating a recessive dwarf mutant. Comparing the sequences of the lines, we were able to identify and eliminate error-prone positions. One genomic region contained EMS mutant alleles in dwarfs that were homozygous reference sequences in wild-type siblings and heterozygous in segregating families. This region contained a single nonsynonymous change that cosegregated with dwarfism in a validation population and caused a premature stop codon in the Sorghum ortholog encoding the gibberellic acid (GA) biosynthetic enzyme ent-kaurene oxidase. Application of exogenous GA rescued the mutant phenotype. Our method for mapping did not require outcrossing and introduced no segregation variance. This enables work when line crossing is complicated by life history, permitting gene discovery outside of genetic models. This inverts the historical approach of first using recombination to define a locus and then sequencing genes. Our formally identical approach first sequences all the genes and then seeks cosegregation with the trait. Mutagenized lines lacking obvious phenotypic alterations are available for an extension of this approach: mapping with a known marker set in a line that is phenotypically identical to starting material for EMS mutant generation. PMID:28040779

  19. BreaKmer: detection of structural variation in targeted massively parallel sequencing data using kmers

    PubMed Central

    Abo, Ryan P.; Ducar, Matthew; Garcia, Elizabeth P.; Thorner, Aaron R.; Rojas-Rudilla, Vanesa; Lin, Ling; Sholl, Lynette M.; Hahn, William C.; Meyerson, Matthew; Lindeman, Neal I.; Van Hummelen, Paul; MacConaill, Laura E.

    2015-01-01

    Genomic structural variation (SV), a common hallmark of cancer, has important predictive and therapeutic implications. However, accurately detecting SV using high-throughput sequencing data remains challenging, especially for ‘targeted’ resequencing efforts. This is critically important in the clinical setting where targeted resequencing is frequently being applied to rapidly assess clinically actionable mutations in tumor biopsies in a cost-effective manner. We present BreaKmer, a novel approach that uses a ‘kmer’ strategy to assemble misaligned sequence reads for predicting insertions, deletions, inversions, tandem duplications and translocations at base-pair resolution in targeted resequencing data. Variants are predicted by realigning an assembled consensus sequence created from sequence reads that were abnormally aligned to the reference genome. Using targeted resequencing data from tumor specimens with orthogonally validated SV, non-tumor samples and whole-genome sequencing data, BreaKmer had a 97.4% overall sensitivity for known events and predicted 17 positively validated, novel variants. Relative to four publically available algorithms, BreaKmer detected SV with increased sensitivity and limited calls in non-tumor samples, key features for variant analysis of tumor specimens in both the clinical and research settings. PMID:25428359
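One way to picture a k-mer strategy for reads that align abnormally is to collect the k-mers a read contains that the reference does not. The sketch below is only an illustration of that general idea and is not BreaKmer's implementation; the function names, the value k = 5, and the toy sequences (including the hypothetical deletion) are all assumptions.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def novel_kmers(read, reference, k):
    """k-mers present in the read but absent from the reference."""
    return kmers(read, k) - kmers(reference, k)


# Toy reference and two toy reads (hypothetical data).
reference = "ACGTACGGTTCAGGCATTAGC"
normal_read = "ACGGTTCAGG"        # exact substring of the reference
variant_read = "ACGGTTCATTAGC"    # reference with "AGGC" deleted after "ACGGTTC"

normal_novel = novel_kmers(normal_read, reference, k=5)
variant_novel = novel_kmers(variant_read, reference, k=5)
```

In this toy case the read that matches the reference yields no novel k-mers, while the read spanning the deletion yields only the k-mers that cross the breakpoint, which is what makes such k-mers useful seeds for local assembly.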

  20. Intragenomic and interspecific 5S rDNA sequence variation in five Asian pines.

    PubMed

    Liu, Zhan-Lin; Zhang, Daming; Wang, Xiao-Quan; Ma, Xiao-Fei; Wang, Xiao-Ru

    2003-01-01

    Patterns of intragenomic and interspecific variation of 5S rDNA in Pinus (Pinaceae) were studied by cloning and sequencing multiple 5S rDNA repeats from individual trees. Five pines, from both subgenera, Pinus and Strobus, were selected. The 5S rDNA repeat in pines has a conserved 120-base pair (bp) transcribed region and an intergenic spacer region of variable length (382-608 bp). The evolutionary rate in the spacer region is three- to sevenfold higher than in the genic region. We found substantial sequence divergence between the two subgenera. Intragenomic sequence heterogeneity was high for all species, and more than 86% of the clones within each individual were unique. The 5S gene tree revealed that different 5S repeats within individuals are polyphyletic, indicating that their ancestral divergence preceded the speciation events. The degrees of interspecific and intragenomic divergence among diploxylon pines are similar. The observed sequence patterns suggest that concerted evolution has been acting after the diversification of the two subgenera but very weak after the speciation of the four diploxylon pines. Sequence patterns in P. densata are consistent with hybrid origin. It had higher intragenomic diversity and maintained polymorphic copies of the parental types in addition to new and recombinant types unique to the hybrid.

  1. Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion.

    PubMed

    Xi, Ruibin; Hadjipanayis, Angela G; Luquette, Lovelace J; Kim, Tae-Min; Lee, Eunjung; Zhang, Jianhua; Johnson, Mark D; Muzny, Donna M; Wheeler, David A; Gibbs, Richard A; Kucherlapati, Raju; Park, Peter J

    2011-11-15

    DNA copy number variations (CNVs) play an important role in the pathogenesis and progression of cancer and confer susceptibility to a variety of human disorders. Array comparative genomic hybridization has been used widely to identify CNVs genome wide, but the next-generation sequencing technology provides an opportunity to characterize CNVs genome wide with unprecedented resolution. In this study, we developed an algorithm to detect CNVs from whole-genome sequencing data and applied it to a newly sequenced glioblastoma genome with a matched control. This read-depth algorithm, called BIC-seq, can accurately and efficiently identify CNVs via minimizing the Bayesian information criterion. Using BIC-seq, we identified hundreds of CNVs as small as 40 bp in the cancer genome sequenced at 10× coverage, whereas we could only detect large CNVs (> 15 kb) in the array comparative genomic hybridization profiles for the same genome. Eighty percent (14/16) of the small variants tested (110 bp to 14 kb) were experimentally validated by quantitative PCR, demonstrating high sensitivity and true positive rate of the algorithm. We also extended the algorithm to detect recurrent CNVs in multiple samples as well as deriving error bars for breakpoints using a Gibbs sampling approach. We propose this statistical approach as a principled yet practical and efficient method to estimate CNVs in whole-genome sequencing data.
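The idea of choosing a segmentation by minimizing the Bayesian information criterion can be sketched as a single merge decision between two neighboring segments. This is a simplified illustration, not the BIC-seq implementation: the binomial model of tumor versus control reads, the function names, and the read counts are assumptions made here.

```python
import math


def log_lik(tumor, control):
    """Binomial log-likelihood of a segment at its maximum-likelihood ratio."""
    n = tumor + control
    if tumor == 0 or control == 0:
        return 0.0
    p = tumor / n
    return tumor * math.log(p) + control * math.log(1 - p)


def bic(segments):
    """BIC = -2 * log-likelihood + (number of segments) * log(total reads)."""
    total_reads = sum(t + c for t, c in segments)
    ll = sum(log_lik(t, c) for t, c in segments)
    return -2.0 * ll + len(segments) * math.log(total_reads)


def should_merge(left, right):
    """Merge two adjacent segments if doing so does not increase the BIC."""
    merged = (left[0] + right[0], left[1] + right[1])
    return bic([merged]) <= bic([left, right])


# Each segment is (tumor_reads, control_reads); the counts are hypothetical.
similar = should_merge((500, 500), (510, 490))    # near-identical ratios
different = should_merge((500, 500), (900, 500))  # apparent copy number gain
```

With these counts the two near-identical segments are merged, because the extra parameter costs more than the small gain in likelihood, whereas the segment with an apparent gain is kept separate.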

  2. Applications of high-throughput DNA sequencing to benign hematology

    PubMed Central

    Gallagher, Patrick G.

    2013-01-01

    The development of novel technologies for high-throughput DNA sequencing is having a major impact on our ability to measure and define normal and pathologic variation in humans. This review discusses advances in DNA sequencing that have been applied to benign hematologic disorders, including those affecting the red blood cell, the neutrophil, and other white blood cell lineages. Relevant examples of how these approaches have been used for disease diagnosis, gene discovery, and studying complex traits are provided. High-throughput DNA sequencing technology holds significant promise for impacting clinical care. This includes development of improved disease detection and diagnosis, better understanding of disease progression and stratification of risk of disease-specific complications, and development of improved therapeutic strategies, particularly patient-specific pharmacogenomics-based therapy, with monitoring of therapy by genomic biomarkers. PMID:24021670

  3. Wide variation in microsatellite sequences within each Pfcrt mutant haplotype.

    PubMed

    Vinayak, Sumiti; Mittra, Pooja; Sharma, Yagya D

    2006-05-01

    Flanking microsatellites for each of the Pfcrt mutant haplotypes of Plasmodium falciparum remain conserved among geographical isolates. We describe here heterogeneity in the intragenic microsatellites within each Pfcrt haplotype. There were fourteen different alleles of AT repeats of intron 2 and eight alleles of TA repeats of intron 4 of the pfcrt gene among Indian isolates. This resulted in 33 different two-locus (intron 2 plus intron 4) microsatellite genotypes among 224 isolates. There were 15 different two-locus microsatellite genotypes within the South American Pfcrt haplotype (S72V73M74N75T76S220) and 11 genotypes in the southeast Asian haplotype (C72V73I74E75T76S220) in these isolates. Indian isolates with Pfcrt haplotype C72V73I74E75T76S220 shared one of their two-locus microsatellite genotypes with southeast Asian P. falciparum parasite lines from Thailand (K1) and Indochina (Dd2 and W2). Conversely, Indian isolates containing the S72V73M74N75T76S220 Pfcrt haplotype did not share any of their two-locus microsatellite genotypes with the South American parasite line 7G8 from Brazil. A significantly large number of newer two-locus microsatellite genotypes was detected over a 2-year period (P<0.05). Microsatellite variation was more prominent in areas of high malaria transmission. It is concluded that genetic recombination in the intragenic microsatellites continues in the parasite population even after the microsatellites flanking the pfcrt gene had already been fixed. The presence of various Pfcrt haplotypes and a variety of intragenic microsatellites indicates that there is a wide spectrum of chloroquine-resistant parasite populations in India. This information should be useful for malaria control programs of the country.

  4. The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity.

    PubMed

    Petrovski, Slavé; Gussow, Ayal B; Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H; Allen, Andrew S; Goldstein, David B

    2015-09-01

    Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene's proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene's regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen's Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, nc

  5. The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity

    PubMed Central

    Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H.; Allen, Andrew S.; Goldstein, David B.

    2015-01-01

    Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance

  6. Contribution of SNRNP200 sequence variations to retinitis pigmentosa

    PubMed Central

    Zhang, X; Lai, T YY; Chiang, S WY; Tam, P OS; Liu, D TL; Chan, C KM; Pang, C P; Zhao, C; Chen, L J

    2013-01-01

    Purpose Mutations in the SNRNP200 gene have been reported to cause autosomal dominant retinitis pigmentosa (adRP). In this study, we evaluate the mutation profile of SNRNP200 in a cohort of southern Chinese RP patients. Methods Twenty adRP patients from 11 families and 165 index patients with non-syndromic RP with mixed inheritance patterns were screened for mutations in the mutation hotspots of SNRNP200. These included exons 12–16, 22–32, and 38–45, which covered the two helicase ATP-binding domains in DEAD-box and two sec-63 domains. The targeted regions were amplified by polymerase chain reaction and analyzed by direct DNA sequencing, followed by in silico analyses. Results Totally 26 variants were identified, 18 of which were novel. Three non-synonymous variants (p.C502R, p.R1779H and p.I698V) were found exclusively in patients. Two of them, p.C502R and p.R1779H, were each identified in one simplex RP patient, whereas p.I698V occurred in one patient with unknown inheritance pattern. All three residues are highly conserved in SNRNP200 orthologs. Nevertheless, only p.C502R and p.R1779H were predicted to affect protein function by in silico analyses, suggesting these two variants are likely to be disease-causing mutations. Notably, all mutations previously identified in other study populations were not detected in this study. Conclusions Our results reveal a distinct mutation profile of the SNRNP200 gene in a southern Chinese cohort of RP patients. The identification of two novel candidate mutations in two respective patients affirmed that SNRNP200 contributes to a proportion of overall RP. PMID:23887765

  7. CNV-TV: A robust method to discover copy number variation from short sequencing reads

    PubMed Central

    2013-01-01

    Background Copy number variation (CNV) is an important structural variation (SV) in the human genome. Various studies have shown that CNVs are associated with complex diseases. Traditional CNV detection methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution. The next generation sequencing (NGS) technique promises a higher resolution detection of CNVs and several methods were recently proposed for realizing such a promise. However, the performance of these methods is not robust under some conditions, e.g., some of them may fail to detect CNVs of short sizes. There has been a strong demand for reliable detection of CNVs from high resolution NGS data. Results A novel and robust method to detect CNV from short sequencing reads is proposed in this study. The detection of CNV is modeled as a change-point detection from the read depth (RD) signal derived from the NGS, which is fitted with a total variation (TV) penalized least squares model. The performance (e.g., sensitivity and specificity) of the proposed approach is evaluated by comparison with several recently published methods on both simulated and real data from the 1000 Genomes Project. Conclusion The experimental results showed that both the true positive rate and false positive rate of the proposed detection method do not change significantly for CNVs with different copy numbers and lengths, when compared with several existing methods. Therefore, our proposed approach results in a more reliable detection of CNVs than the existing methods. PMID:23634703
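A total variation penalized least squares model scores a candidate fit by how closely it follows the read depth signal plus a penalty on jumps between neighboring positions. The sketch below only evaluates such an objective for three candidate fits; it is not the CNV-TV solver, and the factor 0.5, the penalty weight `lam`, the function name, and the toy read depths are assumptions.

```python
def tv_objective(y, x, lam):
    """0.5 * sum (y_i - x_i)^2 + lam * sum |x_{i+1} - x_i|."""
    fit = 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    tv = sum(abs(b - a) for a, b in zip(x, x[1:]))
    return fit + lam * tv


# Hypothetical read depth signal with one change point (depth 10 -> 20).
depth = [10, 11, 9, 10, 20, 21, 19, 20]

step_fit = [10, 10, 10, 10, 20, 20, 20, 20]  # one jump at the change point
raw_fit = list(depth)                        # follows every fluctuation
flat_fit = [15] * len(depth)                 # ignores the change point

score_step = tv_objective(depth, step_fit, lam=1.0)
score_raw = tv_objective(depth, raw_fit, lam=1.0)
score_flat = tv_objective(depth, flat_fit, lam=1.0)
```

With these numbers the piecewise-constant fit with a single jump scores 12, the raw signal 18, and the flat fit 102, which shows how the penalty favors fits whose only jumps sit at genuine change points.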

  8. From sequence to function: Insights from natural variation in budding yeasts☆

    PubMed Central

    Nieduszynski, Conrad A.; Liti, Gianni

    2011-01-01

    Background Natural variation offers a powerful approach for assigning function to DNA sequence—a pressing challenge in the age of high throughput sequencing technologies. Scope of Review Here we review comparative genomic approaches that are bridging the sequence–function and genotype–phenotype gaps. Reverse genomic approaches aim to analyse sequence to assign function, whereas forward genomic approaches start from a phenotype and aim to identify the underlying genotype responsible. Major Conclusions Comparative genomic approaches, pioneered in budding yeasts, have resulted in dramatic improvements in our understanding of the function of both genes and regulatory sequences. Analogous studies in other systems, including humans, demonstrate the ubiquity of comparative genomic approaches. Recently, forward genomic approaches, exploiting natural variation within yeast populations, have started to offer powerful insights into how genotype influences phenotype and even the ability to predict phenotypes. General Significance Comparative genomic experiments are defining the fundamental rules that govern complex traits in natural populations from yeast to humans. This article is part of a Special Issue entitled Systems Biology of Microorganisms. PMID:21320572

  9. Gorilla MHC class I gene and sequence variation in a comparative context.

    PubMed

    Hans, Jörg B; Bergl, Richard A; Vigilant, Linda

    2017-05-01

    Comparisons of MHC gene content and diversity among closely related species can provide insights into the evolutionary mechanisms shaping immune system variation. After chimpanzees and bonobos, gorillas are humans' closest living relatives; but in contrast, relatively little is known about the structure and variation of gorilla MHC class I genes (Gogo). Here, we combined long-range amplifications and long-read sequencing technology to analyze full-length MHC class I genes in 35 gorillas. We obtained 50 full-length genomic sequences corresponding to 15 Gogo-A alleles, 4 Gogo-Oko alleles, 21 Gogo-B alleles, and 10 Gogo-C alleles including 19 novel coding region sequences. We identified two previously undetected MHC class I genes related to Gogo-A and Gogo-B, respectively, thereby illustrating the potential of this approach for efficient and highly accurate MHC genotyping. Consistent with their phylogenetic position within the hominid family, individual gorilla MHC haplotypes share characteristics with humans and chimpanzees as well as orangutans suggesting a complex history of the MHC class I genes in humans and the great apes. However, the overall MHC class I diversity appears to be low further supporting the hypothesis that gorillas might have experienced a reduction of their MHC repertoire.

  10. From phenotypes to causal sequences: using genome wide association studies to dissect the sequence basis for variation of plant development.

    PubMed

    Ogura, Takehiko; Busch, Wolfgang

    2015-02-01

    Tremendous natural variation of growth and development exists within species. Uncovering the molecular mechanisms that tune growth and development promises to shed light on a broad set of biological issues including genotype to phenotype relations, regulatory mechanisms of biological processes and evolutionary questions. Recent progress in sequencing and data processing capabilities has enabled Genome Wide Association Studies (GWASs) to identify DNA sequence polymorphisms that underlie the variation of biological traits. In recent years, GWASs have proven powerful in revealing the complex genetic bases of many phenotypes in various plant species. Here we highlight successful recent GWASs that uncovered mechanistic and sequence bases of trait variation related to plant growth and development and discuss important considerations for conducting successful GWASs.

  11. Recent research on the high-probability instructional sequence: A brief review.

    PubMed

    Lipschultz, Joshua; Wilder, David A

    2017-04-01

    The high-probability (high-p) instructional sequence consists of the delivery of a series of high-probability instructions immediately before delivery of a low-probability or target instruction. It is commonly used to increase compliance in a variety of populations. Recent research has described variations of the high-p instructional sequence and examined the conditions under which the sequence is most effective. This manuscript reviews the most recent research on the sequence and identifies directions for future research. Recommendations for practitioners regarding the use of the high-p instructional sequence are also provided. © 2017 Society for the Experimental Analysis of Behavior.

  12. Forward genetics by sequencing EMS variation-induced inbred lines

    USDA-ARS's Scientific Manuscript database

    The dramatic increase in throughput of sequencing techniques enables gene cloning through pre-existing forward genetics approaches. We show that it also brings with it the potential to change the crossing designs and approach of forward genetics. To achieve this for eukaryotic organisms with complex...

  13. Variation in rapid sequence induction techniques: current practice in Wales.

    PubMed

    Koerber, J P; Roberts, G E W; Whitaker, R; Thorpe, C M

    2009-01-01

    A questionnaire survey examining rapid sequence induction techniques was sent to all anaesthetists in Wales. The questionnaire presented five common clinical scenarios: emergency appendicectomy; elective knee arthroscopy with a symptomatic hiatus hernia; elective knee arthroscopy with an asymptomatic hiatus hernia; elective Caesarean section; and emergency laparotomy for bowel obstruction. Completed surveys were received from 421 anaesthetists, a 68% response rate. Rapid sequence induction was chosen by 398/400 respondents (100%) for bowel obstruction, 392/399 (98%) for Caesarean section, 388/408 (95%) for appendicectomy, 328/395 (83%) for symptomatic hiatus hernia but only 98/399 (25%) for asymptomatic hiatus hernia (p < 0.001). Trainees were more likely to use a rapid sequence induction technique than consultants and staff grades for the appendicectomy (p = 0.025), symptomatic hiatus hernia (p = 0.004) and asymptomatic hiatus hernia (p = 0.001) scenarios and were also more likely to use a thiopental-suxamethonium combination for rapid sequence induction (p < 0.001).

  14. Sequence variation in the Tbx4 gene in marine mammals.

    PubMed

    Onbe, Kaori; Nishida, Shin; Sone, Emi; Kanda, Naohisa; Goto, Mutsuo; Pastene, Luis A; Tanabe, Shinsuke; Koike, Hiroko

    2007-05-01

    The amino-acid sequences of the T-domain region of the Tbx4 gene, which is required for hindlimb development, are 100% identical in humans and mice. Cetaceans have lost most of their hindlimb structure, although hindlimb buds are present in very early cetacean embryos. To examine whether the Tbx4 gene has the same function in cetaceans as in other mammals, we analyzed Tbx4 sequences from cetaceans, dugong, artiodactyls and marine carnivores. A total of 39 primers were designed using human and dog Tbx4 nucleotide sequences. Exons 3, 4, 5, 6, 7, and 8 of the Tbx4 genes from cetaceans, artiodactyls, and marine carnivores were sequenced. Non-synonymous substitution sites were detected in the T-domain regions from some cetacean species, but were not detected in those from artiodactyls, the dugong, or the carnivores. The C-terminal regions contained a number of non-synonymous substitutions. Although some indels were present, they were in groups of three nucleotides and therefore did not cause frame shifts. The dN/dS values for the T-domain and C-terminal regions of the cetacean and artiodactyl Tbx4 genes were much lower than 1, indicating that the Tbx4 gene maintains its function in cetaceans, although full expression leading to hindlimb development is suppressed.

  15. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing.

    PubMed

    Li, Kelvin; Shrivastava, Susmita; Brownley, Anushka; Katzel, Dan; Bera, Jayati; Nguyen, Anh Thu; Thovarai, Vishal; Halpin, Rebecca; Stockwell, Timothy B

    2012-11-06

    In a high-throughput environment, to PCR amplify and sequence a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving, the use of degenerate PCR primers is an important strategy. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. To successfully select a large set of degenerate PCR primers necessary to tile across an entire viral genome and maximize their success, this process is best performed computationally. We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute's (JCVI) high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. 
    The recommended methodology for the construction of the consensus sequence that encapsulates the allelic variation of the targeted
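The role of IUPAC ambiguity codes in a degenerate primer can be made concrete with a short sketch. It uses the standard IUPAC nucleotide code table, but the function names and the example primers are assumptions for illustration and are not part of the JCVI pipeline described in this record.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct concrete sequences a degenerate primer represents."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical short primers, chosen only to keep the output small.
variants = expand("AYG")     # Y stands for C or T
n_variants = degeneracy("NNR")
```

Because degeneracy multiplies across positions, a primer with many ambiguous positions represents a large pool of sequences, which is one reason automated screening of candidate primers is helpful.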

  16. Variation in Symbiodinium ITS2 Sequence Assemblages among Coral Colonies

    PubMed Central

    Stat, Michael; Bird, Christopher E.; Pochon, Xavier; Chasqui, Luis; Chauka, Leonard J.; Concepcion, Gregory T.; Logan, Dan; Takabayashi, Misaki; Toonen, Robert J.; Gates, Ruth D.

    2011-01-01

    Endosymbiotic dinoflagellates in the genus Symbiodinium are fundamentally important to the biology of scleractinian corals, as well as to a variety of other marine organisms. The genus Symbiodinium is genetically and functionally diverse and the taxonomic nature of the union between Symbiodinium and corals is implicated as a key trait determining the environmental tolerance of the symbiosis. Surprisingly, the question of how Symbiodinium diversity partitions within a species across spatial scales of meters to kilometers has received little attention, but is important to understanding the intrinsic biological scope of a given coral population and adaptations to the local environment. Here we address this gap by describing the Symbiodinium ITS2 sequence assemblages recovered from colonies of the reef building coral Montipora capitata sampled across Kāne'ohe Bay, Hawai'i. A total of 52 corals were sampled in a nested design of Coral Colony(Site(Region)) reflecting spatial scales of meters to kilometers. A diversity of Symbiodinium ITS2 sequences was recovered with the majority of variance partitioning at the level of the Coral Colony. To confirm this result, the Symbiodinium ITS2 sequence diversity in six M. capitata colonies were analyzed in much greater depth with 35 to 55 clones per colony. The ITS2 sequences and quantitative composition recovered from these colonies varied significantly, indicating that each coral hosted a different assemblage of Symbiodinium. The diversity of Symbiodinium ITS2 sequence assemblages retrieved from individual colonies of M. capitata here highlights the problems inherent in interpreting multi-copy and intra-genomically variable molecular markers, and serves as a context for discussing the utility and biological relevance of assigning species names based on Symbiodinium ITS2 genotyping. PMID:21246044

  17. Variation in Symbiodinium ITS2 sequence assemblages among coral colonies.

    PubMed

    Stat, Michael; Bird, Christopher E; Pochon, Xavier; Chasqui, Luis; Chauka, Leonard J; Concepcion, Gregory T; Logan, Dan; Takabayashi, Misaki; Toonen, Robert J; Gates, Ruth D

    2011-01-05

    Endosymbiotic dinoflagellates in the genus Symbiodinium are fundamentally important to the biology of scleractinian corals, as well as to a variety of other marine organisms. The genus Symbiodinium is genetically and functionally diverse and the taxonomic nature of the union between Symbiodinium and corals is implicated as a key trait determining the environmental tolerance of the symbiosis. Surprisingly, the question of how Symbiodinium diversity partitions within a species across spatial scales of meters to kilometers has received little attention, but is important to understanding the intrinsic biological scope of a given coral population and adaptations to the local environment. Here we address this gap by describing the Symbiodinium ITS2 sequence assemblages recovered from colonies of the reef building coral Montipora capitata sampled across Kāne'ohe Bay, Hawai'i. A total of 52 corals were sampled in a nested design of Coral Colony(Site(Region)) reflecting spatial scales of meters to kilometers. A diversity of Symbiodinium ITS2 sequences was recovered with the majority of variance partitioning at the level of the Coral Colony. To confirm this result, the Symbiodinium ITS2 sequence diversity in six M. capitata colonies were analyzed in much greater depth with 35 to 55 clones per colony. The ITS2 sequences and quantitative composition recovered from these colonies varied significantly, indicating that each coral hosted a different assemblage of Symbiodinium. The diversity of Symbiodinium ITS2 sequence assemblages retrieved from individual colonies of M. capitata here highlights the problems inherent in interpreting multi-copy and intra-genomically variable molecular markers, and serves as a context for discussing the utility and biological relevance of assigning species names based on Symbiodinium ITS2 genotyping.

  18. Ulysses: accurate detection of low-frequency structural variations in large insert-size sequencing libraries.

    PubMed

    Gillet-Markowska, Alexandre; Richard, Hugues; Fischer, Gilles; Lafontaine, Ingrid

    2015-03-15

    The detection of structural variations (SVs) in short-range Paired-End (PE) libraries remains challenging because SV breakpoints can involve large dispersed repeated sequences, or carry inherent complexity, hardly resolvable with classical PE sequencing data. In contrast, large insert-size sequencing libraries (Mate-Pair libraries) provide higher physical coverage of the genome and give access to repeat-containing regions. They can thus theoretically overcome previous limitations as they are becoming routinely accessible. Nevertheless, broad insert size distributions and high rates of chimerical sequences are usually associated with this type of library, which makes the accurate annotation of SVs challenging. Here, we present Ulysses, a tool that achieves drastically higher detection accuracy than existing tools, both on simulated and real mate-pair sequencing datasets from the 1000 Human Genome project. Ulysses achieves high specificity over the complete spectrum of variants by assessing, in a principled manner, the statistical significance of each possible variant (duplications, deletions, translocations, insertions and inversions) against an explicit model for the generation of experimental noise. This statistical model proves particularly useful for the detection of low frequency variants. SV detection performed on a large insert Mate-Pair library from a breast cancer sample revealed a high level of somatic duplications in the tumor and, to a lesser extent, in the blood sample as well. Altogether, these results show that Ulysses is a valuable tool for the characterization of somatic mosaicism in human tissues and in cancer genomes. © The Author 2014. Published by Oxford University Press. All rights reserved.

  19. Solar Luminosity on the Main Sequence, Standard Model and Variations

    NASA Astrophysics Data System (ADS)

    Ayukov, S. V.; Baturin, V. A.; Gorshkov, A. B.; Oreshina, A. V.

    2017-05-01

    Our Sun became a Main Sequence star 4.6 Gyr ago according to the Standard Solar Model. At that time the solar luminosity was 30% lower than its current value. This conclusion is based on the assumption that the Sun is fueled by thermonuclear reactions. If Earth's albedo and infrared emissivity were unchanged throughout Earth's history, the oceans would have had to be frozen 2.3 Gyr ago. This contradicts geological data: there was liquid water on Earth 3.6-3.8 Gyr ago. This problem is known as the Faint Young Sun Paradox. We analyze the luminosity change in standard solar evolution theory. The increase of mean molecular weight in the central part of the Sun due to the conversion of hydrogen to helium leads to a gradual increase of luminosity with time on the Main Sequence. We also consider several exotic models: a fully mixed Sun; a drastic change of the pp reaction rate; a Sun consisting of hydrogen and helium only. Solar neutrino observations, however, exclude most non-standard solar models.
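
    As a rough illustration of the paradox described in this record, the sketch below computes Earth's radiative equilibrium temperature for the present solar flux and for a luminosity 30% lower. The solar constant (1361 W/m^2) and the albedo (0.3) are assumed textbook values, not figures from the abstract, and any greenhouse warming is ignored.

```python
# Radiative equilibrium temperature: T = [S * (1 - A) / (4 * sigma)] ** (1/4)
SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
S_NOW = 1361.0           # assumed present-day solar constant, W m^-2
ALBEDO = 0.3             # assumed planetary albedo, held fixed over time


def equilibrium_temperature(luminosity_fraction, albedo=ALBEDO):
    """Equilibrium temperature (K) for a given fraction of today's luminosity."""
    flux = S_NOW * luminosity_fraction
    return (flux * (1.0 - albedo) / (4.0 * SIGMA)) ** 0.25


t_now = equilibrium_temperature(1.0)     # present-day Sun
t_young = equilibrium_temperature(0.7)   # luminosity 30% lower

print(f"present: {t_now:.1f} K, young Sun: {t_young:.1f} K")
```

    Because temperature scales as the fourth root of the flux, a 30% drop in luminosity lowers the equilibrium temperature by roughly 9%, which under these assumptions is a little over 20 K.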

  20. Sequence variation of alcohol dehydrogenase (Adh) paralogs in cactophilic Drosophila.

    PubMed Central

    Matzkin, Luciano M; Eanes, Walter F

    2003-01-01

    This study focuses on the population genetics of alcohol dehydrogenase (Adh) in cactophilic Drosophila. Drosophila mojavensis and D. arizonae utilize cactus hosts, and each host contains a characteristic mixture of alcohol compounds. In these Drosophila species there are two functional Adh loci, an adult form (Adh-2) and a larval and ovarian form (Adh-1). Overall, the greater level of variation segregating in D. arizonae than in D. mojavensis suggests a larger population size for D. arizonae. There are markedly different patterns of variation between the paralogs across both species. A 16-bp intron haplotype segregates in both species at Adh-2, apparently the product of an ancient gene conversion event between the paralogs, which suggests that there is selection for the maintenance of the intron structure possibly for the maintenance of pre-mRNA structure. We observe a pattern of variation consistent with adaptive protein evolution in the D. mojavensis lineage at Adh-1, suggesting that the cactus host shift that occurred in the divergence of D. mojavensis from D. arizonae had an effect on the evolution of the larval expressed paralog. Contrary to previous work we estimate a recent time for both the divergence of D. mojavensis and D. arizonae (2.4 +/- 0.7 MY) and the age of the gene duplication (3.95 +/- 0.45 MY). PMID:12586706

  1. Hairy matters: MtDNA quantity and sequence variation along and among human head hairs.

    PubMed

    Desmyter, Stijn; Bodner, Martin; Huber, Gabriela; Dognaux, Sophie; Berger, Cordula; Noël, Fabrice; Parson, Walther

    2016-11-01

    Hairs from the same donor have been found to differ in mtDNA sequence within and among themselves and from other tissues, which impacts interpretation of results obtained in a forensic setting. However, little is known on the magnitude of this phenomenon and published data on systematic studies are scarce. We addressed this issue by generating mtDNA control region (CR) profiles of >450 hair fragments from 21 donors by Sanger-type sequencing (STS). To mirror forensic scenarios, we compared hair haplotypes from the same donors to each other, to the corresponding buccal swab reference haplotypes and analyzed several fragments of individual hairs. We also investigated the effects of hair color, donor sex and age, mtDNA haplogroup and chemical treatment on mtDNA quantity, amplification success and variation. We observed a wide range of individual CR sequence variation. The reference haplotype was the only or most common (≥75%) hair haplotype for most donors. However, in two individuals, the reference haplotype was only found in about a third of the investigated hairs, mainly due to differences at highly variable positions. Similarly, most hairs revealed the reference haplotype along their entire length, however, about a fifth of the hairs contained up to 71% of segments with deviant haplotypes, independent of the longitudinal position. Variation affected numerous positions, typically restricted to the individual hair and in most cases heteroplasmic, but also fixed (i.e. homoplasmic) substitutions were observed. While existing forensic mtDNA interpretation guidelines were found still sufficient for all comparisons to reference haplotypes, some comparisons between hairs from the same donor could yield false exclusions when those guidelines are strictly followed. This study pinpoints the special care required when interpreting mtDNA results from hair in forensic casework. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  2. Sequence variation in two genes determines the efficacy of transmission of citrus tristeza virus by the brown citrus aphid

    USDA-ARS's Scientific Manuscript database

    Vector transmission is an important part of the viral infection cycle, yet for many viruses little is known about this process, or how viral sequence variation affects transmission efficacy. Here we examined the effect of substituting genes from the highly transmissible FS577 isolate of citrus trist...

  3. Genome sequence variation among isolates of monkey B virus (Macacine alphaherpesvirus 1) from captive macaques.

    PubMed

    Eberle, R; Maxwell, L K; Nicholson, S; Black, D; Jones-Engel, L

    2017-08-01

    Complete genome sequences of 19 strains of monkey B virus (Macacine alphaherpesvirus 1; BV) isolated from several macaque species were determined. A low level of sequence variation was present among BV isolates from rhesus macaques. Most variation among BV strains isolated from rhesus macaques was located in regions of repetitive or quasi-repetitive sequence. Variation in coding sequences (polypeptides and miRNAs) was minor compared to regions of non-coding sequences. Non-coding sequences in the long and short repeat regions of the genome did however exhibit islands of conserved sequence. Oral and genital isolates from a single monkey were identical in sequence and varied only in the number of iterations of repeat units in several areas of repeats. Sequence variation between BV isolates from different macaque species (different BV genotypes) was much greater and was spread across the entire genome, confirming the existence of different genotypes of BV in different macaque species. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. A Next-Generation Sequencing Method for Genotyping-by-Sequencing of Highly Heterozygous Autotetraploid Potato

    PubMed Central

    Uitdewilligen, Jan G. A. M. L.; Wolters, Anne-Marie A.; D’hoop, Bjorn B.; Borm, Theo J. A.; Visser, Richard G. F.; van Eck, Herman J.

    2013-01-01

    Assessment of genomic DNA sequence variation and genotype calling in autotetraploids implies the ability to distinguish among five possible alternative allele copy number states. This study demonstrates the accuracy of genotyping-by-sequencing (GBS) of a large collection of autotetraploid potato cultivars using next-generation sequencing. It is still costly to reach sufficient read depths on a genome wide scale, across the cultivated gene pool. Therefore, we enriched cultivar-specific DNA sequencing libraries using an in-solution hybridisation method (SureSelect). This complexity reduction allowed us to confine our study to 807 target genes distributed across the genomes of 83 tetraploid cultivars and one reference (DM 1–3 511). Indexed sequencing libraries were paired-end sequenced in 7 pools of 12 samples using Illumina HiSeq2000. After filtering and processing the raw sequence data, 12.4 Gigabases of high-quality sequence data was obtained, which mapped to 2.1 Mb of the potato reference genome, with a median average read depth of 63× per cultivar. We detected 129,156 sequence variants and genotyped the allele copy number of each variant for every cultivar. In this cultivar panel a variant density of 1 SNP/24 bp in exons and 1 SNP/15 bp in introns was obtained. The average minor allele frequency (MAF) of a variant was 0.14. Potato germplasm displayed a large number of relatively rare variants and/or haplotypes, with 61% of the variants having a MAF below 0.05. A very high average nucleotide diversity (π = 0.0107) was observed. Nucleotide diversity varied among potato chromosomes. Several genes under selection were identified. Genotyping-by-sequencing results, with allele copy number estimates, were validated with a KASP genotyping assay. This validation showed that read depths of ∼60–80× can be used as a lower boundary for reliable assessment of allele copy number of sequence variants in autotetraploids. Genotypic data were associated with traits, and
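
    The five allele copy number states mentioned in this record (0 to 4 copies of the alternative allele) can be told apart from read counts with a simple likelihood comparison. The sketch below is a minimal binomial model, not the authors' pipeline; the function name `call_dosage` and the 1% error rate are assumptions made for illustration.

```python
from math import comb

PLOIDY = 4  # autotetraploid: 0..4 copies of the alternative allele


def call_dosage(alt_reads, total_reads, error_rate=0.01):
    """Return (most likely alt-allele copy number, likelihood of each state).

    Assumes reads are independent draws, so the alt-read count is binomial
    with success probability copies / PLOIDY, softened by a symmetric
    sequencing error rate (an assumed value).
    """
    likelihoods = []
    for copies in range(PLOIDY + 1):
        p = copies / PLOIDY
        p = p * (1 - error_rate) + (1 - p) * error_rate
        lik = (comb(total_reads, alt_reads)
               * p ** alt_reads
               * (1 - p) ** (total_reads - alt_reads))
        likelihoods.append(lik)
    best = max(range(PLOIDY + 1), key=lambda k: likelihoods[k])
    return best, likelihoods


# 16 alternative reads out of 63 is close to the 1/4 expected for one copy.
dosage, _ = call_dosage(16, 63)
print(dosage)
```

    Adjacent states differ by only 25% in expected allele fraction, so shallow coverage leaves neighbouring states hard to separate; this is consistent with the record's emphasis on read depth.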

  5. Screening of nucleotide variations in genomic sequences encoding charged protein regions in the human genome.

    PubMed

    Belmabrouk, Sabrine; Kharrat, Najla; Abdelhedi, Rania; Ben Ayed, Amine; Benmarzoug, Riadh; Rebai, Ahmed

    2017-08-08

    Studying genetic variation distribution in proteins containing charged regions, called charge clusters (CCs), is of great interest to unravel their functional role. Charge clusters are 20 to 75 residue segments with high net positive charge, high net negative charge, or high total charge relative to the overall charge composition of the protein. We previously developed a bioinformatics tool (FCCP) to detect charge clusters in proteomes and scanned the human proteome for the occurrence of CCs. In this paper we investigate the genetic variations in the human proteins harbouring CCs. We studied the coding regions of 317 positively charged clusters and 1020 negatively charged ones previously detected in human proteins. Results revealed that coding parts of CCs are richer in sequence variants than their corresponding genes, full mRNAs, and exonic + intronic sequences and that these variants are predominately rare (Minor allele frequency < 0.005). Furthermore, variants occurring in the coding parts of positively charged regions of proteins are more often pathogenic than those occurring in negatively charged ones. Classification of variants according to their types showed that substitution is the major type followed by Indels (Insertions-deletions). Concerning substitutions, it was found that within clusters of both charges, the charged amino acids were the greatest loser groups whereas polar residues were the greatest gainers. Our findings highlight the prominent features of the human charged regions from the DNA up to the protein sequence which might provide potential clues to improve the current understanding of those charged regions and their implication in the emergence of diseases.

  6. HLA DNA Sequence Variation among Human Populations: Molecular Signatures of Demographic and Selective Events

    PubMed Central

    Buhler, Stéphane; Sanchez-Mazas, Alicia

    2011-01-01

    Molecular differences between HLA alleles vary up to 57 nucleotides within the peptide binding coding region of human Major Histocompatibility Complex (MHC) genes, but it is still unclear whether this variation results from a stochastic process or from selective constraints related to functional differences among HLA molecules. Although HLA alleles are generally treated as equidistant molecular units in population genetic studies, DNA sequence diversity among populations is also crucial to interpret the observed HLA polymorphism. In this study, we used a large dataset of 2,062 DNA sequences defined for the different HLA alleles to analyze nucleotide diversity of seven HLA genes in 23,500 individuals of about 200 populations spread worldwide. We first analyzed the HLA molecular structure and diversity of these populations in relation to geographic variation and we further investigated possible departures from selective neutrality through Tajima's tests and mismatch distributions. All results were compared to those obtained by classical approaches applied to HLA allele frequencies. Our study shows that the global patterns of HLA nucleotide diversity among populations are significantly correlated to geography, although in some specific cases the molecular information reveals unexpected genetic relationships. At all loci except HLA-DPB1, populations have accumulated a high proportion of very divergent alleles, suggesting an advantage of heterozygotes expressing molecularly distant HLA molecules (asymmetric overdominant selection model). However, both different intensities of selection and unequal levels of gene conversion may explain the heterogeneous mismatch distributions observed among the loci. Also, distinctive patterns of sequence divergence observed at the HLA-DPB1 locus suggest current neutrality but old selective pressures on this gene. 
    We conclude that HLA DNA sequences advantageously complement HLA allele frequencies as a source of data used to explore the
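
    The record above mentions Tajima's tests as a check for departures from selective neutrality. As a hedged sketch, the function below computes the standard Tajima's D statistic from a list of aligned, equal-length sequences; the function name and the toy alignments are hypothetical, and gaps or missing data are not handled.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned sequences of equal length (no gap handling)."""
    n = len(seqs)
    # S: number of segregating (polymorphic) sites
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return None  # D is undefined without polymorphism
    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy data: variants carried by a single sequence vs. at intermediate frequency
rare = ["AAAA", "AAAA", "AAAA", "AAAA", "TTTT"]
balanced = ["AAAA", "AAAA", "TTTT", "TTTT"]
print(tajimas_d(rare), tajimas_d(balanced))
```

    An excess of rare variants pulls D below zero, while variants held at intermediate frequency push it above zero, as balancing selection would tend to do.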

  7. Draft genome sequence of an elite Dura palm and whole-genome patterns of DNA variation in oil palm

    PubMed Central

    Jin, Jingjing; Lee, May; Bai, Bin; Sun, Yanwei; Qu, Jing; Rahmadsyah; Alfiko, Yuzer; Lim, Chin Huat; Suwanto, Antonius; Sugiharti, Maria; Wong, Limsoon; Ye, Jian; Chua, Nam-Hai; Yue, Gen Hua

    2016-01-01

    Oil palm is the world’s leading source of vegetable oil and fat. Dura, Pisifera and Tenera are three forms of oil palm. The genome sequence of Pisifera is available whereas the Dura form has not been sequenced yet. We sequenced the genome of one elite Dura palm, and re-sequenced 17 palm genomes. The assembled genome sequence of the elite Dura tree contained 10,971 scaffolds and was 1.701 Gb in length, covering 94.49% of the oil palm genome. 36,105 genes were predicted. Re-sequencing of 17 additional palm trees identified 18.1 million SNPs. We found high genetic variation among palms from different geographical regions, but lower variation among Southeast Asian Dura and Pisifera palms. We mapped 10,000 SNPs on the linkage map of oil palm. In addition, high linkage disequilibrium (LD) was detected in the oil palms used in breeding populations of Southeast Asia, suggesting that LD mapping is likely to be practical in this important oil crop. Our data provide a valuable resource for accelerating genetic improvement and studying the mechanism underlying phenotypic variations of important oil palm traits. PMID:27426468

  8. Draft genome sequence of an elite Dura palm and whole-genome patterns of DNA variation in oil palm.

    PubMed

    Jin, Jingjing; Lee, May; Bai, Bin; Sun, Yanwei; Qu, Jing; Rahmadsyah; Alfiko, Yuzer; Lim, Chin Huat; Suwanto, Antonius; Sugiharti, Maria; Wong, Limsoon; Ye, Jian; Chua, Nam-Hai; Yue, Gen Hua

    2016-12-01

    Oil palm is the world's leading source of vegetable oil and fat. Dura, Pisifera and Tenera are three forms of oil palm. The genome sequence of Pisifera is available whereas the Dura form has not been sequenced yet. We sequenced the genome of one elite Dura palm, and re-sequenced 17 palm genomes. The assembled genome sequence of the elite Dura tree contained 10,971 scaffolds and was 1.701 Gb in length, covering 94.49% of the oil palm genome. 36,105 genes were predicted. Re-sequencing of 17 additional palm trees identified 18.1 million SNPs. We found high genetic variation among palms from different geographical regions, but lower variation among Southeast Asian Dura and Pisifera palms. We mapped 10,000 SNPs on the linkage map of oil palm. In addition, high linkage disequilibrium (LD) was detected in the oil palms used in breeding populations of Southeast Asia, suggesting that LD mapping is likely to be practical in this important oil crop. Our data provide a valuable resource for accelerating genetic improvement and studying the mechanism underlying phenotypic variations of important oil palm traits.

  9. The Chlamydophila abortus genome sequence reveals an array of variable proteins that contribute to interspecies variation

    PubMed Central

    Thomson, Nicholas R.; Yeats, Corin; Bell, Kenneth; Holden, Matthew T.G.; Bentley, Stephen D.; Livingstone, Morag; Cerdeño-Tárraga, Ana M.; Harris, Barbara; Doggett, Jon; Ormond, Doug; Mungall, Karen; Clarke, Kay; Feltwell, Theresa; Hance, Zahra; Sanders, Mandy; Quail, Michael A.; Price, Claire; Barrell, Bart G.; Parkhill, Julian; Longbottom, David

    2005-01-01

    The obligate intracellular bacterial pathogen Chlamydophila abortus strain S26/3 (formerly the abortion subtype of Chlamydia psittaci) is an important cause of late gestation abortions in ruminants and pigs. Furthermore, although relatively rare, zoonotic infection can result in acute illness and miscarriage in pregnant women. The complete genome sequence was determined and shows a high level of conservation in both sequence and overall gene content in comparison to other Chlamydiaceae. The 1,144,377-bp genome contains 961 predicted coding sequences, 842 of which are conserved with those of Chlamydophila caviae and Chlamydophila pneumoniae. Within this conserved Cp. abortus core genome we have identified the major regions of variation and have focused our analysis on these loci, several of which were found to encode highly variable protein families, such as TMH/Inc and Pmp families, which are strong candidates for the source of diversity in host tropism and disease causation in this group of organisms. Significantly, Cp. abortus lacks any toxin genes, and also lacks genes involved in tryptophan metabolism and nucleotide salvaging (guaB is present as a pseudogene), suggesting that the genetic basis of niche adaptation of this species is distinct from those previously proposed for other chlamydial species. PMID:15837807

  10. Analysis of Sequence Variation and Risk Association of Human Papillomavirus 52 Variants Circulating in Korea

    PubMed Central

    Choi, Youn Jin; Ki, Eun Young; Zhang, Chuqing; Ho, Wendy C. S.; Lee, Sung-Jong; Jeong, Min Jin

    2016-01-01

    Introduction: Human papillomavirus (HPV) 52 is a carcinogenic, high-risk genotype frequently detected in cervical cancer cases from East Asia, including Korea. Materials and Methods: Sequences of HPV52 detected in 91 cervical samples collected from women attending Seoul St. Mary’s Hospital were analyzed. HPV52 genomic sequences were obtained by polymerase chain reaction (PCR)-based sequencing and analyzed using Seq-Scape software, and phylogenetic trees were constructed using MEGA6 software. Results: Of the 91 cervical samples, 40 were normal, 22 were low-grade lesions, 21 were high-grade lesions and 7 were squamous cell carcinomas. Four HPV52 variant lineages (A, B, C and D) were identified. Lineage B was the most frequently detected lineage, followed by lineage C. By analyzing the two most frequently detected lineages (B and C), we found that distinct variations existed in each lineage. We also found that a lineage B-specific mutation K93R (A379G) was associated with an increased risk of cervical neoplasia. Conclusions: To our knowledge, we are the first to reveal the predominance of the HPV52 lineages, B and C, in Korea. We also found these lineages harbored distinct genetic alterations that may affect oncogenicity. Our findings increase our understanding of the heterogeneity of HPV52 variants, and may be useful for the development of new diagnostic assays and therapeutic vaccines. PMID:27977741

  11. HIV-1 sequence variation between isolates from mother-infant transmission pairs

    SciTech Connect

    Wike, C.M.; Daniels, M.R.; Furtado, M.; Wolinsky, M.; Korber, B.; Hutto, C.; Munoz, J.; Parks, W.; Saah, A.

    1991-01-01

    To examine the sequence diversity of human immunodeficiency virus type 1 (HIV-1) between known transmission sets, sequences from the V3 and V4-V5 region of the env gene from 4 mother-infant pairs were analyzed. The mean interpatient sequence variation between isolates from linked mother-infant pairs was comparable to the sequence diversity found between isolates from other close contacts. The mean intrapatient variation was significantly less in the infants' isolates than in the isolates from both their mothers and other characterized intrapatient sequence sets. In addition, a distinct and characteristic difference in the glycosylation pattern preceding the V3 loop was found between each linked transmission pair. These findings indicate that selection of specific genotypic variants, which may play a role in some direct transmission sets, and the duration of infection are important factors in the degree of diversity seen between the sequence sets.

  12. HIV-1 sequence variation between isolates from mother-infant transmission pairs

    SciTech Connect

    Wike, C.M.; Daniels, M.R.; Furtado, M.; Wolinsky, M.; Korber, B.; Hutto, C.; Munoz, J.; Parks, W.; Saah, A.

    1991-12-31

    To examine the sequence diversity of human immunodeficiency virus type 1 (HIV-1) between known transmission sets, sequences from the V3 and V4-V5 region of the env gene from 4 mother-infant pairs were analyzed. The mean interpatient sequence variation between isolates from linked mother-infant pairs was comparable to the sequence diversity found between isolates from other close contacts. The mean intrapatient variation was significantly less in the infants' isolates than in the isolates from both their mothers and other characterized intrapatient sequence sets. In addition, a distinct and characteristic difference in the glycosylation pattern preceding the V3 loop was found between each linked transmission pair. These findings indicate that selection of specific genotypic variants, which may play a role in some direct transmission sets, and the duration of infection are important factors in the degree of diversity seen between the sequence sets.

  13. Engineering the Dynamic Properties of Protein Networks through Sequence Variation

    PubMed Central

    2016-01-01

    The dynamic behavior of macromolecular networks dominates the mechanical properties of soft materials and influences biological processes at multiple length scales. In hydrogels prepared from self-assembling artificial proteins, stress relaxation and energy dissipation arise from the transient character of physical network junctions. Here we show that subtle changes in sequence can be used to program the relaxation behavior of end-linked networks of engineered coiled-coil proteins. Single-site substitutions in the coiled-coil domains caused shifts in relaxation time over 5 orders of magnitude as demonstrated by dynamic oscillatory shear rheometry and stress relaxation measurements. Networks with multiple relaxation time scales were also engineered. This work demonstrates how time-dependent mechanical responses of macromolecular materials can be encoded in genetic information. PMID:27924309

  14. Identification, variation and transcription of pneumococcal repeat sequences

    PubMed Central

    2011-01-01

    Background: Small interspersed repeats are commonly found in many bacterial chromosomes. Two families of repeats (BOX and RUP) have previously been identified in the genome of Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen of humans. However, little is known about the role they play in pneumococcal genetics. Results: Analysis of the genome of S. pneumoniae ATCC 700669 revealed the presence of a third repeat family, which we have named SPRITE. All three repeats are present at a reduced density in the genome of the closely related species S. mitis. However, they are almost entirely absent from all other streptococci, although a set of elements related to the pneumococcal BOX repeat was identified in the zoonotic pathogen S. suis. In conjunction with information regarding their distribution within the pneumococcal chromosome, this suggests that it is unlikely that these repeats are specialised sequences performing a particular role for the host, but rather that they constitute parasitic elements. However, comparing insertion sites between pneumococcal sequences indicates that they appear to transpose at a much lower rate than IS elements. Some large BOX elements in S. pneumoniae were found to encode open reading frames on both strands of the genome, whilst another was found to form a composite RNA structure with two T box riboswitches. In multiple cases, such BOX elements were demonstrated as being expressed using directional RNA-seq and RT-PCR. Conclusions: BOX, RUP and SPRITE repeats appear to have proliferated extensively throughout the pneumococcal chromosome during the species' past, but novel insertions are currently occurring at a relatively slow rate. Through their extensive secondary structures, they seem likely to affect the expression of genes with which they are co-transcribed. Software for annotation of these repeats is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/strep_repeats/. PMID:21333003

  15. Targeted deep sequencing of flowering regulators in Brassica napus reveals extensive copy number variation

    PubMed Central

    Schiessl, Sarah; Huettel, Bruno; Kuehn, Diana; Reinhardt, Richard; Snowdon, Rod J.

    2017-01-01

    Gene copy number variation (CNV) is increasingly implicated in control of complex trait networks, particularly in polyploid plants like rapeseed (Brassica napus L.) with an evolutionary history of genome restructuring. Here we performed sequence capture to assay nucleotide variation and CNV in a panel of central flowering time regulatory genes across a species-wide diversity set of 280 B. napus accessions. The genes were chosen based on prior knowledge from Arabidopsis thaliana and related Brassica species. Target enrichment was performed using the Agilent SureSelect technology, followed by Illumina sequencing. A bait (probe) pool was developed based on results of a preliminary experiment with representatives from different B. napus morphotypes. A very high mean target coverage of ~670x allowed reliable calling of CNV, single nucleotide polymorphisms (SNPs) and insertion-deletion (InDel) polymorphisms. No accession was free of CNV, and at least one homolog of every gene we investigated showed CNV in some accessions. Some CNVs appear more often in specific morphotypes, indicating a role in diversification. PMID:28291231

  16. Mitochondrial DNA hypervariable region-1 sequence variation and phylogeny of the concolor gibbons, Nomascus.

    PubMed

    Monda, Keri; Simmons, Rachel E; Kressirer, Philipp; Su, Bing; Woodruff, David S

    2007-11-01

    The still little known concolor gibbons are represented by 14 taxa (five species, nine subspecies) distributed parapatrically in China, Myanmar, Vietnam, Laos and Cambodia. To set the stage for a phylogeographic study of the genus we examined DNA sequences from the highly variable mitochondrial hypervariable region-1 (HVR-1 or control region) in 51 animals, mostly of unknown geographic provenance. We developed gibbon-specific primers to amplify mtDNA noninvasively and obtained >477 bp sequences from 38 gibbons in North American and European zoos and >159 bp sequences from ten Chinese museum skins. In hindsight, we believe these animals represent eight of the nine nominal subspecies and four of the five nominal species. Bayesian, maximum likelihood and maximum parsimony haplotype network analyses gave concordant results and show Nomascus to be monophyletic. Significant intraspecific variation within N. leucogenys (17 haplotypes) is comparable with that reported earlier in Hylobates lar and less than half the known interspecific pairwise distances in gibbons. Sequence data support the recognition of five species (concolor, leucogenys, nasutus, gabriellae and probably hainanus) and suggest that nasutus is the oldest and leucogenys, the youngest taxon. In contrast, the subspecies N. c. furvogaster, N. c. jingdongensis, and N. leucogenys siki, are not recognizable at this otherwise informative genetic locus. These results show that HVR-1 sequence is variable enough to define evolutionarily significant units in Nomascus and, if coupled with multilocus microsatellite or SNP genotyping, more than adequate to characterize their phylogeographic history. There is an urgent need to obtain DNA from gibbons of known geographic provenance before they are extirpated to facilitate the conservation genetic management of the surviving animals.

  17. Using XHMM software to detect copy number variation in whole-exome sequencing data

    PubMed Central

    Fromer, Menachem; Purcell, Shaun M.

    2014-01-01

    Copy number variation (CNV) has emerged as an important genetic component in human diseases, which are increasingly being studied for large numbers of samples by sequencing the coding regions of the genome, i.e., exome sequencing. Nonetheless, detecting this variation from such targeted sequencing data is a difficult task of sorting out signal from noise, for which we have recently developed a set of statistical and computational tools called XHMM. In this paper, we give detailed instructions on how to run XHMM and how to use the resulting CNV calls in biological analyses. PMID:24763994

  18. Cattle demographic history modelled from autosomal sequence variation

    PubMed Central

    Murray, Caitriona; Huerta-Sanchez, Emilia; Casey, Fergal; Bradley, Daniel G.

    2010-01-01

    The phylogeography of cattle genetic variants has been extensively described and has informed the history of domestication. However, there remains a dearth of demographic models inferred from such data. Here, we describe sequence diversity at 37 000 bp sampled from 17 genes in cattle from Africa, Europe and India. Clearly distinct population histories are suggested between Bos indicus and Bos taurus, with the former displaying higher diversity statistics. We compare the unfolded site frequency spectra in each to those simulated using a diffusion approximation method and build a best-fitting model of past demography. This implies an earlier, possibly glaciation-induced population bottleneck in B. taurus ancestry with a later, possibly domestication-associated demographic constriction in B. indicus. Strikingly, the modelled indicine history also requires a majority secondary admixture from the South Asian aurochs, indicating a complex, more diffuse domestication process. This perhaps involved multiple domestications and/or introgression from wild oxen to domestic herds; the latter is plausible from archaeological evidence of contemporaneous wild and domestic remains across different regions of South Asia. PMID:20643743
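
    The unfolded site frequency spectrum used in this record counts, for each polymorphic site, how many sampled chromosomes carry the derived (non-ancestral) allele. The sketch below shows that bookkeeping on toy data; the function name and sequences are hypothetical, and it assumes biallelic sites with a known ancestral state.

```python
from collections import Counter


def unfolded_sfs(ancestral, samples):
    """Return [x1, ..., x(n-1)], where xk is the number of sites at which
    exactly k of the n sampled sequences carry a derived allele.

    Assumes aligned, biallelic sites and a known ancestral sequence.
    Sites that are fixed in the sample (0 or n derived copies) are skipped.
    """
    n = len(samples)
    counts = Counter()
    for anc, column in zip(ancestral, zip(*samples)):
        derived = sum(1 for base in column if base != anc)
        if 0 < derived < n:
            counts[derived] += 1
    return [counts[k] for k in range(1, n)]


ancestral = "AAAA"
samples = ["TAAC", "TAGC", "AAGC", "AAAA"]
sfs = unfolded_sfs(ancestral, samples)
print(sfs)  # derived-allele counts per site are 2, 0, 2 and 3
```

    Comparing a spectrum like this with spectra simulated under candidate demographic histories is the general idea behind the model fitting the record describes; the fitting itself is beyond this sketch.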

  19. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    PubMed Central

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-01-01

    Mass spectrometry–based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications. PMID:27049631

  20. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    NASA Astrophysics Data System (ADS)

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-06-01

    Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.

  1. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation.

    PubMed

    Sheynkman, Gloria M; Shortreed, Michael R; Cesnik, Anthony J; Smith, Lloyd M

    2016-06-12

    Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.

  2. The complete nucleotide sequence of the Crossostoma lacustre mitochondrial genome: conservation and variations among vertebrates.

    PubMed Central

    Tzeng, C S; Hui, C F; Shen, S C; Huang, P C

    1992-01-01

    The complete mitochondrial (mt) genome of Crossostoma lacustre, a freshwater loach from mountain streams of Taiwan, has been cloned and sequenced. This fish mt genome, consisting of 16558 base-pairs, encodes 13 proteins, two rRNAs, and 22 tRNAs, in addition to a regulatory sequence for replication and transcription (D-loop), and is similar to those of other vertebrates in both the order and orientation of these genes. The protein-coding and ribosomal RNA genes are highly homologous, both in size and composition, to their counterparts in mammals, birds, amphibians, and invertebrates, and use essentially the same set of codons, including both the initiation and termination signals, and the tRNAs. Differences do exist, however, in the lengths and sequences of the D-loop regions, and in the spacing between genes, which account for the variations in total lengths of the genomes. Our observations provide evidence for the first time for the conservation of genetic information in the fish mitochondrial genome, especially among the vertebrates. PMID:1408800

  3. Non-coding RNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer

    PubMed Central

    Wojcik, Sylwia E.; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S.; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z.; Rai, Kanti R.; Kipps, Thomas J.; Keating, Michael J.

    2010-01-01

    Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas. PMID:19926640

  4. Role of MYOC and OPTN sequence variations in Spanish patients with Primary Open-Angle Glaucoma

    PubMed Central

    López-Martínez, Francisco; López-Garrido, María-Pilar; Sánchez-Sánchez, Francisco; Campos-Mollo, Ezequiel; Coca-Prados, Miguel

    2007-01-01

    Purpose: To retrospectively investigate the contribution of myocilin (MYOC) and optineurin (OPTN) sequence variations to adult-onset ocular hypertension (OHT) and primary open-angle glaucoma (POAG) in Spanish patients. Methods: The promoter region and the three exons of MYOC were analyzed by direct PCR DNA sequencing in 40 OHT and 110 POAG unrelated patients. We used 98 subjects in whom OHT or glaucoma had been ruled out as controls. We also screened the complete coding region of the OPTN gene (exons 4-16) in all subjects by single-stranded conformational polymorphisms (SSCPs). Results: We identified six common single nucleotide polymorphisms (SNPs) in the promoter region of MYOC (-1000C>G, -387C>T, -306G>A, -224T>C, -126T>C and -83G>A) and a polymorphic GT microsatellite (-339(GT)11-19). In addition, we detected four novel, rare DNA polymorphisms. None of these DNA sequence variations were associated with either OHT or POAG. We also found three (2.7%) POAG patients with MYOC pathogenic mutations. Two of these pathogenic mutations (Gln368Stop and Ala445Val) were previously described whereas the third (Tyr479His) was novel. Transient expression of the novel mutation in 293T cells supported its pathogenicity. Only two OPTN polymorphisms, which are not associated with the disease, were detected. Conclusions: Overall, our data show that in Spain a minority of adult-onset high-pressure POAG patients carry heterozygous disease-causing mutations in the MYOC gene and that OPTN is not involved in either OHT or POAG. PMID:17615537

  5. High compression image and image sequence coding

    NASA Technical Reports Server (NTRS)

    Kunt, Murat

    1989-01-01

    The digital representation of an image requires a very large number of bits. This number is even larger for an image sequence. The goal of image coding is to reduce this number, as much as possible, and reconstruct a faithful duplicate of the original picture or image sequence. Early efforts in image coding, solely guided by information theory, led to a plethora of methods. The compression ratio reached a plateau around 10:1 a couple of years ago. Recent progress in the study of the brain mechanism of vision and scene analysis has opened new vistas in picture coding. Directional sensitivity of the neurones in the visual pathway combined with the separate processing of contours and textures has led to a new class of coding methods capable of achieving compression ratios as high as 100:1 for images and around 300:1 for image sequences. Recent progress on some of the main avenues of object-based methods is presented. These second generation techniques make use of contour-texture modeling, new results in neurophysiology and psychophysics and scene analysis.

  6. Targeted Exome Sequencing Outcome Variations of Colorectal Tumors within and across Two Sequencing Platforms

    PubMed Central

    Ashktorab, Hassan; Azimi, Hamed; Nickerson, Michael L.; Bass, Sara; Varma, Sudhir; Brim, Hassan

    2016-01-01

    Background and Aim: Next generation sequencing (NGS) has quickly become the tool of choice for genome and exome data generation. The multitude of sequencing platforms, as well as the variability within each platform, needs to be assessed. In this paper we used two platforms (Ion Torrent and Illumina) to assess single nucleotide variants (SNVs) in colorectal cancer (CRC) specimens. Methods: CRC specimens (n = 13) collected from 6 CRC patients (cancer and matched normal) were used to establish the mutational profile using the Ion Torrent and Illumina sequencing platforms. We analyzed a set of formalin-fixed paraffin-embedded (FFPE) and fresh frozen (FF) samples on both platforms to assess the effect of sample nature (FFPE vs. FF) on sequencing outcome and to evaluate the similarity/differences of SNVs across the two platforms. In addition, duplicates of FF samples were sequenced on each platform to assess variability within platform. Results: The comparison of FF replicates to each other gave a concordance of 77% (± 15.3%) in Ion Torrent and 70% (± 3.7%) in Illumina. FFPE vs. FF replicates gave a concordance of 40% (± 32%) in Ion Torrent and 49% (± 19%) in Illumina. Cross-platform concordance averaged 75% (± 9.8%) for FFPE samples and 67% (± 32%) for FF samples, with an overall average of 70% (± 26.8%). Conclusion: Our data show significant variability within and across platforms. Also, the number of detected variants depends on the nature of the specimen (FF vs. FFPE). Validation of NGS-discovered mutations is a must to rule out false positive mutants. This validation might be performed either through a second NGS platform or through Sanger sequencing. PMID:27547838

  7. Sequence variation and methylation of the flax 5S RNA genes.

    PubMed Central

    Goldsbrough, P B; Ellis, T H; Lomonossoff, G P

    1982-01-01

    The complete sequence of the flax 5S DNA repeat is presented. Length heterogeneity is the consequence of the presence or absence of a single direct repeat and the majority of single base changes are transition mutations. No sequence variation has been found in the coding sequence. The extent of methylation of cytosines has been measured at one location in the gene and one in the spacer. The relationship between the observed sequence heterogeneity and the level of methylation is discussed in the context of the operation of a correction mechanism. PMID:6290983

  8. Copy number variation of individual cattle genomes using next-generation sequencing

    PubMed Central

    Bickhart, Derek M.; Hou, Yali; Schroeder, Steven G.; Alkan, Can; Cardone, Maria Francesca; Matukumalli, Lakshmi K.; Song, Jiuzhou; Schnabel, Robert D.; Ventura, Mario; Taylor, Jeremy F.; Garcia, Jose Fernando; Van Tassell, Curtis P.; Sonstegard, Tad S.; Eichler, Evan E.; Liu, George E.

    2012-01-01

    Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one Holstein, and one Hereford) and one indicine (Nelore) cattle. Within mapped chromosomal sequence, we identified 1265 CNV regions comprising ∼55.6-Mbp sequence—476 of which (∼38%) have not previously been reported. We validated this sequence-based CNV call set with array comparative genomic hybridization (aCGH), quantitative PCR (qPCR), and fluorescent in situ hybridization (FISH), achieving a validation rate of 82% and a false positive rate of 8%. We further estimated absolute copy numbers for genomic segments and annotated genes in each individual. Surveys of the top 25 most variable genes revealed that the Nelore individual had the lowest copy numbers in 13 cases (∼52%, χ2 test; P-value <0.05). In contrast, genes related to pathogen- and parasite-resistance, such as CATHL4 and ULBP17, were highly duplicated in the Nelore individual relative to the taurine cattle, while genes involved in lipid transport and metabolism, including APOL3 and FABP2, were highly duplicated in the beef breeds. These CNV regions also harbor genes like BPIFA2A (BSP30A) and WC1, suggesting that some CNVs may be associated with breed-specific differences in adaptation, health, and production traits. By providing the first individualized cattle CNV and segmental duplication maps and genome-wide gene copy number estimates, we enable future CNV studies into highly duplicated regions in the cattle genome. PMID:22300768
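    The read depth idea mentioned in the abstract above can be illustrated with a minimal sketch. This is a generic illustration, not the authors' pipeline: the function name, the example depths, and the use of a median over assumed diploid control windows as the baseline are all assumptions made here, and real analyses typically add steps such as GC-content correction that are omitted.

```python
from statistics import median


def estimate_copy_number(window_depths, control_depths, ploidy=2):
    """Estimate absolute copy number per window from read depth.

    Hypothetical helper: scales each window's depth by the median depth
    of control windows that are assumed to be diploid (copy number 2).
    """
    baseline = median(control_depths)
    if baseline <= 0:
        raise ValueError("control depth baseline must be positive")
    return [ploidy * depth / baseline for depth in window_depths]


# Illustrative (made-up) depths: one normal, one duplicated,
# one hemizygous-deleted and one fully deleted window.
control = [29.0, 30.0, 31.0, 30.0, 30.5]
windows = [30.0, 61.0, 14.5, 0.0]
for depth, cn in zip(windows, estimate_copy_number(windows, control)):
    print(f"depth {depth:5.1f} -> estimated copy number {cn:.2f}")
```

    Using the median of the control windows rather than the mean keeps the baseline stable if a few control windows are themselves copy-number variable.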

  9. Variation of partial transferrin sequences and phylogenetic relationships among hares (Lepus capensis, Lagomorpha) from Tunisia.

    PubMed

    Awadi, Asma; Suchentrunk, Franz; Makni, Mohamed; Ben Slimen, Hichem

    2016-10-01

    North African hares are currently included in cape hares, Lepus capensis sensu lato, a taxon that may be considered a superspecies or a complex of closely related species. The existing molecular data, however, are not unequivocal, with mtDNA control region sequences suggesting a separate species status and nuclear loci (allozymes, microsatellites) revealing conspecificity of L. capensis and L. europaeus. Here, we study sequence variation in intron 6 (468 bp) of the transferrin nuclear gene in 105 hares with different coat colours from different regions in Tunisia, with respect to genetic diversity and differentiation as well as their phylogenetic status. Forty-six haplotypes (alleles) were revealed and compared phylogenetically to all available TF haplotypes of various Lepus species retrieved from GenBank. Maximum likelihood, neighbor joining and median joining network analyses concordantly grouped all currently obtained haplotypes together with haplotypes belonging to six different Chinese hare species and the African scrub hare L. saxatilis. Moreover, two Tunisian haplotypes were shared with L. capensis, L. timidus, L. sinensis, L. yarkandensis, and L. hainanus from China. These results indicated the evolutionary complexity of the genus Lepus, with the mixing of nuclear gene haplotypes resulting from introgressive hybridization and/or shared ancestral polymorphism. We report the presence of shared ancestral polymorphism between North African and Chinese hares. This has not been detected earlier in the mtDNA sequences of the same individuals. Genetic diversity of the TF sequences from the Tunisian populations was relatively high compared to other hare populations. However, genetic differentiation and gene flow analyses (AMOVA, FST, Nm) indicated little divergence, with the absence of geographically meaningful phylogroups and a lack of clustering with coat colour types. These results confirm the presence of a single hare species in Tunisia, but a sound inference on

  10. Detecting Alu insertions from high-throughput sequencing data

    PubMed Central

    David, Matei; Mustafa, Harun; Brudno, Michael

    2013-01-01

    High-throughput sequencing technologies have allowed for the cataloguing of variation in personal human genomes. In this manuscript, we present alu-detect, a tool that combines read-pair and split-read information to detect novel Alus and their precise breakpoints directly from either whole-genome or whole-exome sequencing data while also identifying insertions directly in the vicinity of existing Alus. To set the parameters of our method, we use simulation of a faux reference, which allows us to compute the precision and recall of various parameter settings using real sequencing data. Applying our method to 100 bp paired Illumina data from seven individuals, including two trios, we detected on average 1519 novel Alus per sample. Based on the faux-reference simulation, we estimate that our method has 97% precision and 85% recall. We identify 808 novel Alus not previously described in other studies. We also demonstrate the use of alu-detect to study the local sequence and global location preferences for novel Alu insertions. PMID:23921633

  11. Variation in the prion protein sequence in Dutch goat breeds.

    PubMed

    Windig, J J; Hoving, R A H; Priem, J; Bossers, A; van Keulen, L J M; Langeveld, J P M

    2016-10-01

    Scrapie is a neurodegenerative disease occurring in goats and sheep. Several haplotypes of the prion protein increase resistance to scrapie infection and may be used in selective breeding to help eradicate scrapie. In this study, frequencies of the allelic variants of the PrP gene are determined for six goat breeds in the Netherlands. Overall frequencies in Dutch goats were determined from 768 brain tissue samples in 2005, 766 in 2008 and 300 in 2012, derived from random sampling for the national scrapie surveillance without knowledge of the breed. Breed-specific frequencies were determined in the winter of 2013/2014 by sampling 300 breeding animals from the main breeders of the different breeds. Detailed analysis of the scrapie-resistant K222 haplotype was carried out in 2014 for 220 Dutch Toggenburger goats and in 2015 for 942 goats from the Saanen-derived White Goat breed. Nine haplotypes were identified in the Dutch breeds. Frequencies for non-wild-type haplotypes were generally low. Exceptions were the K222 haplotype in the Dutch Toggenburger (29%) and the S146 haplotype in the Nubian and Boer breeds (7% and 31%, respectively). The frequency of the K222 haplotype in the Toggenburger was higher than for any other breed reported in the literature, while for the White Goat breed it was, at 3.1%, similar to frequencies of other Saanen or Saanen-derived breeds. Further evidence was found for the existence of two M142 haplotypes, M142/S240 and M142/P240. Breeds vary in haplotype frequencies, but frequencies of resistant genotypes are generally low; consequently, selective breeding for scrapie resistance can only be slow but will benefit from animals identified in this study. The unexpectedly high frequency of the K222 haplotype in the Dutch Toggenburger underlines the need for conservation of rare breeds in order to conserve genetic diversity that is rare or absent in other breeds. © 2016 Blackwell Verlag GmbH.

  12. Tough Coating Proteins: Subtle Sequence Variation Modulates Cohesion

    PubMed Central

    Das, Saurabh; Miller, Dusty R.; Kaufman, Yair; Martinez Rodriguez, Nadine R.; Pallaoro, Alessia; Harrington, Matthew J.; Gylys, Maryte; Israelachvili, Jacob N.; Waite, J. Herbert

    2015-01-01

    Mussel foot protein-1 (mfp-1) is an essential constituent of the protective cuticle covering all exposed portions of the byssus (plaque and the thread) that marine mussels use to attach to intertidal rocks. The reversible complexation of Fe3+ by the 3,4-dihydroxyphenylalanine (Dopa) side chains in mfp-1 in Mytilus californianus cuticle is responsible for its high extensibility (120%) as well as its stiffness (2 GPa) due to the formation of sacrificial bonds that help to dissipate energy and avoid accumulation of stresses in the material. We have investigated the interactions between Fe3+ and mfp-1 from two mussel species, M. californianus (Mc) and M. edulis (Me), using both surface sensitive and solution phase techniques. Our results show that although mfp-1 homologues from both species bind Fe3+, mfp-1 (Mc) contains Dopa with two distinct Fe3+-binding tendencies and prefers to form intramolecular complexes with Fe3+. In contrast, mfp-1 (Me) is better adapted to intermolecular Fe3+ binding by Dopa. Addition of Fe3+ did not significantly increase the cohesion energy between the mfp-1 (Mc) films at pH 5.5. However, iron appears to stabilize the cohesive bridging of mfp-1 (Mc) films at the physiologically relevant pH of 7.5, where most other mfps lose their ability to adhere reversibly. Understanding the molecular mechanisms underpinning the capacity of M. californianus cuticle to withstand twice the strain of M. edulis cuticle is important for engineering of tunable strain tolerant composite coatings for biomedical applications. PMID:25692318

  13. Tough coating proteins: subtle sequence variation modulates cohesion.

    PubMed

    Das, Saurabh; Miller, Dusty R; Kaufman, Yair; Martinez Rodriguez, Nadine R; Pallaoro, Alessia; Harrington, Matthew J; Gylys, Maryte; Israelachvili, Jacob N; Waite, J Herbert

    2015-03-09

    Mussel foot protein-1 (mfp-1) is an essential constituent of the protective cuticle covering all exposed portions of the byssus (plaque and the thread) that marine mussels use to attach to intertidal rocks. The reversible complexation of Fe(3+) by the 3,4-dihydroxyphenylalanine (Dopa) side chains in mfp-1 in Mytilus californianus cuticle is responsible for its high extensibility (120%) as well as its stiffness (2 GPa) due to the formation of sacrificial bonds that help to dissipate energy and avoid accumulation of stresses in the material. We have investigated the interactions between Fe(3+) and mfp-1 from two mussel species, M. californianus (Mc) and M. edulis (Me), using both surface sensitive and solution phase techniques. Our results show that although mfp-1 homologues from both species bind Fe(3+), mfp-1 (Mc) contains Dopa with two distinct Fe(3+)-binding tendencies and prefers to form intramolecular complexes with Fe(3+). In contrast, mfp-1 (Me) is better adapted to intermolecular Fe(3+) binding by Dopa. Addition of Fe(3+) did not significantly increase the cohesion energy between the mfp-1 (Mc) films at pH 5.5. However, iron appears to stabilize the cohesive bridging of mfp-1 (Mc) films at the physiologically relevant pH of 7.5, where most other mfps lose their ability to adhere reversibly. Understanding the molecular mechanisms underpinning the capacity of M. californianus cuticle to withstand twice the strain of M. edulis cuticle is important for engineering of tunable strain tolerant composite coatings for biomedical applications.

  14. Mitochondrial DNA sequence variation in Iranian native dogs.

    PubMed

    Amiri Ghanatsaman, Zeinab; Adeola, Adeniyi C; Asadi Fozi, Masood; Ma, Ya-Ping; Peng, Min-Sheng; Wang, Guo-Dong; Esmailizadeh, Ali; Zhang, Ya-Ping

    2017-03-17

    The picture of dog mtDNA diversity drawn from wide geographical sampling, but from a small number of individuals per region or breed, displayed little geographical correlation and a high degree of haplotype sharing between very distant breeds. For a clearer picture, we extensively surveyed Iranian native dogs (n = 305) in comparison with published European (n = 443) and Southwest Asian (n = 195) dogs. Twelve haplotypes related to haplogroups A, B and C were shared by Iranian, European, Southwest Asian and East Asian dogs. In Iran, haplotype and nucleotide diversities were highest in the east, southeast and northwest populations, while the western population had the lowest. Sarabi and Saluki dog populations can be assigned to haplogroups A, B, C and D; Qahderijani and Kurdi to haplogroups A, B and C; Torkaman to haplogroups A, B and D; and Sangsari and Fendo to haplogroups A and B, respectively. Evaluation of population differentiation using pairwise FST generally revealed no clear population structure in most Iranian dog populations. The genetic signal of a recent demographic expansion was detected in the East and Southeast populations. Further, in accordance with previous studies on dog-wolf hybridization for haplogroup d2 origin, the highest number of d2 haplotypes in Iranian dogs compared with other areas of the Mediterranean basin suggests Iran as the probable center of its origin. Historical evidence shows that the Silk Road linked Iran to countries in South East Asia and other parts of the world, which might have influenced effective gene flow within Iran and these regions. The medium nucleotide diversity observed in Iranian dogs calls for appropriate management techniques to increase effective population size.

  15. High-Throughput Variation Detection and Genotyping Using Microarrays

    PubMed Central

    Cutler, David J.; Zwick, Michael E.; Carrasquillo, Minerva M.; Yohn, Christopher T.; Tobin, Katherine P.; Kashuk, Carl; Mathews, Debra J.; Shah, Nila A.; Eichler, Evan E.; Warrington, Janet A.; Chakravarti, Aravinda

    2001-01-01

    The genetic dissection of complex traits may ultimately require a large number of SNPs to be genotyped in multiple individuals who exhibit phenotypic variation in a trait of interest. Microarray technology can enable rapid genotyping of variation specific to study samples. To facilitate their use, we have developed an automated statistical method (ABACUS) to analyze microarray hybridization data and applied this method to Affymetrix Variation Detection Arrays (VDAs). ABACUS provides a quality score to individual genotypes, allowing investigators to focus their attention on sites that give accurate information. We have applied ABACUS to an experiment encompassing 32 autosomal and eight X-linked genomic regions, each consisting of ∼50 kb of unique sequence spanning a 100-kb region, in 40 humans. At sufficiently high-quality scores, we are able to read ∼80% of all sites. To assess the accuracy of SNP detection, 108 of 108 SNPs have been experimentally confirmed; an additional 371 SNPs have been confirmed electronically. To assess the accuracy of diploid genotypes at segregating autosomal sites, we confirmed 1515 of 1515 homozygous calls, and 420 of 423 (99.29%) heterozygotes. In replicate experiments, consisting of independent amplification of identical samples followed by hybridization to distinct microarrays of the same design, genotyping is highly repeatable. In an autosomal replicate experiment, 813,295 of 813,295 genotypes are called identically (including 351 heterozygotes); at an X-linked locus in males (haploid), 841,236 of 841,236 sites are called identically. PMID:11691856
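    The confirmation rates quoted in the abstract above are simple proportions, and the short sketch below reproduces them from the stated counts. The helper name is hypothetical and chosen here for illustration; only the counts come from the abstract.

```python
def concordance_pct(confirmed: int, total: int) -> float:
    """Return confirmed/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * confirmed / total


# Counts quoted in the abstract.
heterozygous = concordance_pct(420, 423)
homozygous = concordance_pct(1515, 1515)
print(f"heterozygous calls confirmed: {heterozygous:.2f}%")
print(f"homozygous calls confirmed:   {homozygous:.2f}%")
```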

  16. Homologous recombination drives both sequence diversity and gene content variation in Neisseria meningitidis.

    PubMed

    Kong, Ying; Ma, Jennifer H; Warren, Keisha; Tsang, Raymond S W; Low, Donald E; Jamieson, Frances B; Alexander, David C; Hao, Weilong

    2013-01-01

    The study of genetic and phenotypic variation is fundamental for understanding the dynamics of bacterial genome evolution and untangling the evolution and epidemiology of bacterial pathogens. Neisseria meningitidis (Nm) is among the most intriguing bacterial pathogens in genomic studies due to its dynamic population structure and complex forms of pathogenicity. Extensive genomic variation within identical clonal complexes (CCs) in Nm has been recently reported and suggested to be the result of homologous recombination, but the extent to which recombination contributes to genomic variation within identical CCs has remained unclear. In this study, we sequenced two Nm strains of identical serogroup (C) and multi-locus sequence type (ST60), and conducted a systematic analysis with an additional 34 Nm genomes. Our results revealed that all gene content variation between the two ST60 genomes was introduced by homologous recombination at the conserved flanking genes, and 94.25% or more of sequence divergence was caused by homologous recombination. Recombination was found in genes associated with virulence factors, antigenic outer membrane proteins, and vaccine targets, suggesting an important role of homologous recombination in rapidly altering the pathogenicity and antigenicity of Nm. Recombination was also evident in genes of the restriction and modification systems, which may undermine barriers to DNA exchange. In conclusion, homologous recombination can drive both gene content variation and sequence divergence in Nm. These findings shed new light on the understanding of the rapid pathoadaptive evolution of Nm and other recombinogenic bacterial pathogens.

  17. Sequence variation of ribosomal internal transcribed spacers (ITS) in commercially important Phytoseiidae mites.

    PubMed

    Navajas, M; Lagnel, J; Fauvel, G; de Moraes, G

    1999-11-01

    Preliminary work is needed to assess the usefulness of different markers at different taxonomic scales when a new group is analyzed, such as the commercially important Phytoseiidae mites. We investigate here the level of sequence variation of the nuclear ribosomal spacers ITS 1 and 2 and the 5.8S gene in six species of Phytoseiidae: Neoseiulus californicus, N. fallacis, Euseius concordis, Metaseiulus occidentalis, Typhlodromus pyri and Phytoseiulus persimilis. As expected, the 5.8S gene (148 base pairs) is markedly conserved and displays little variation in between-genera comparisons. ITS1 and ITS2 show contrasting patterns: while the ITS2 is short (80-89 bp) and shows little variation, the ITS1 is longer (303-404 bp) and is very variable in sequence. This fact compromises reliable nucleotide homologies when comparing the genera. The comparison of ITS1 sequence similarity at the species level might be useful for species identification; however, the value of ITS in taxonomic studies does not extend to the level of the family. The intraspecific variations of ITS were investigated in three species: N. californicus, N. fallacis and E. concordis. The first species has identical ITS1 sequences and the last two display low polymorphism (2 nucleotide substitutions). The ITS2 and 5.8S sequences were identical in all three subspecies comparisons.

  18. Characterization of genetic sequence variation of 58 STR loci in four major population groups.

    PubMed

    Novroski, Nicole M M; King, Jonathan L; Churchill, Jennifer D; Seah, Lay Hong; Budowle, Bruce

    2016-11-01

    Massively parallel sequencing (MPS) can identify sequence variation within short tandem repeat (STR) alleles as well as their nominal allele lengths that traditionally have been obtained by capillary electrophoresis. Using the MiSeq FGx Forensic Genomics System (Illumina), STRait Razor, and in-house Excel workbooks, genetic variation was characterized within STR repeat and flanking regions of 27 autosomal, 7 X-chromosome and 24 Y-chromosome STR markers in 777 unrelated individuals from four population groups. Seven hundred and forty-six autosomal, 227 X-chromosome, and 324 Y-chromosome STR alleles were identified by sequence compared with 357 autosomal, 107 X-chromosome, and 189 Y-chromosome STR alleles that were identified by length. Within the observed sequence variation, 227 autosomal, 156 X-chromosome, and 112 Y-chromosome novel alleles were identified and described. One hundred and seventy-six autosomal, 123 X-chromosome, and 93 Y-chromosome sequence variants resided within STR repeat regions, and 86 autosomal, 39 X-chromosome, and 20 Y-chromosome variants were located in STR flanking regions. Three markers, D18S51, DXS10135, and DYS385a-b, had 1, 4, and 1 alleles, respectively, which contained both a novel repeat region variant and a flanking sequence variant in the same nucleotide sequence. There were 50 markers that demonstrated a relative increase in diversity with the variant sequence alleles compared with those of traditional nominal length alleles. These population data illustrate the genetic variation that exists in the commonly used STR markers in the selected population samples and provide allele frequencies for statistical calculations related to STR profiling with MPS data. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
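
    The difference between alleles identified by length and alleles identified by sequence can be illustrated with a minimal sketch. The three repeat sequences below are invented for illustration and are not taken from the study; the helper names are hypothetical.

```python
# Toy STR alleles (invented for illustration, not real data from the study).
# The first two have the same length but differ at one repeat unit.
alleles = [
    "TCTA" * 10,                       # 10 uninterrupted repeats, 40 bp
    "TCTA" * 4 + "TCTG" + "TCTA" * 5,  # same length, one variant repeat unit
    "TCTA" * 11,                       # 11 repeats, 44 bp
]

def count_by_length(seqs):
    """Number of distinct alleles when only fragment length is observed."""
    return len({len(s) for s in seqs})

def count_by_sequence(seqs):
    """Number of distinct alleles when the full sequence is observed."""
    return len(set(seqs))

print("alleles by length:  ", count_by_length(alleles))
print("alleles by sequence:", count_by_sequence(alleles))
```

    In this toy case, length-based typing merges the first two alleles, which is the kind of hidden diversity that sequence-based typing can separate.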

  19. Exome sequencing and arrayCGH detection of gene sequence and copy number variation between ILS and ISS mouse strains.

    PubMed

    Dumas, Laura; Dickens, C Michael; Anderson, Nathan; Davis, Jonathan; Bennett, Beth; Radcliffe, Richard A; Sikela, James M

    2014-06-01

    It has been well documented that genetic factors can influence predisposition to develop alcoholism. While the underlying genomic changes may be of several types, two of the most common and disease associated are copy number variations (CNVs) and sequence alterations of protein coding regions. The goal of this study was to identify CNVs and single-nucleotide polymorphisms that occur in gene coding regions that may play a role in influencing the risk of an individual developing alcoholism. Toward this end, two mouse strains were used that have been selectively bred based on their differential sensitivity to alcohol: the Inbred long sleep (ILS) and Inbred short sleep (ISS) mouse strains. Differences in initial response to alcohol have been linked to risk for alcoholism, and the ILS/ISS strains are used to investigate the genetics of initial sensitivity to alcohol. Array comparative genomic hybridization (arrayCGH) and exome sequencing were conducted to identify CNVs and gene coding sequence differences, respectively, between ILS and ISS mice. Mouse arrayCGH was performed using catalog Agilent 1 × 244 k mouse arrays. Subsequently, exome sequencing was carried out using an Illumina HiSeq 2000 instrument. ArrayCGH detected 74 CNVs that were strain-specific (38 ILS/36 ISS), including several ISS-specific deletions that contained genes implicated in brain function and neurotransmitter release. Among several interesting coding variations detected by exome sequencing was the gain of a premature stop codon in the alpha-amylase 2B (AMY2B) gene specifically in the ILS strain. In total, exome sequencing detected 2,597 and 1,768 strain-specific exonic gene variants in the ILS and ISS mice, respectively. This study represents the most comprehensive and detailed genomic comparison of ILS and ISS mouse strains to date. The two complementary genome-wide approaches identified strain-specific CNVs and gene coding sequence variations that should provide strong candidates to

  20. Sequence variation in two genes determines the efficacy of transmission of citrus tristeza virus by the brown citrus aphid.

    PubMed

    Harper, S J; Killiny, N; Tatineni, S; Gowda, S; Cowell, S J; Shilts, T; Dawson, W O

    2016-12-01

    Vector transmission is an important part of the viral infection cycle, yet for many viruses little is known about this process, or how viral sequence variation affects transmission efficacy. Here we examined the effect of substituting genes from the highly transmissible FS577 isolate of citrus tristeza virus (CTV) into the poorly transmissible T36-based infectious clone. We found that introducing p65 or p61 sequences from FS577 significantly increased transmission efficacy. Interestingly, replacement of both genes produced a greater increase than either gene alone, suggesting that CTV transmission requires the concerted action of co-evolved p65 and p61 proteins.

  1. Using Next-Generation Sequencing for DNA Barcoding: Capturing Allelic Variation in ITS2

    PubMed Central

    Batovska, Jana; Cogan, Noel O. I.; Lynch, Stacey E.; Blacket, Mark J.

    2016-01-01

    Internal Transcribed Spacer 2 (ITS2) is a popular DNA barcoding marker; however, in some animal species it is hypervariable and therefore difficult to sequence with traditional methods. With next-generation sequencing (NGS) it is possible to sequence all gene variants despite the presence of single nucleotide polymorphisms (SNPs), insertions/deletions (indels), homopolymeric regions, and microsatellites. Our aim was to compare the performance of Sanger sequencing and NGS amplicon sequencing in characterizing ITS2 in 26 mosquito species represented by 88 samples. The suitability of ITS2 as a DNA barcoding marker for mosquitoes, and its allelic diversity in individuals and species, was also assessed. Compared to Sanger sequencing, NGS was able to characterize the ITS2 region to a greater extent, with resolution within and between individuals and species that was previously not possible. A total of 382 unique sequences (alleles) were generated from the 88 mosquito specimens, demonstrating the diversity present that has been overlooked by traditional sequencing methods. Multiple indels and microsatellites were present in the ITS2 alleles, which were often specific to species or genera, causing variation in sequence length. As a barcoding marker, ITS2 was able to separate all of the species, apart from members of the Culex pipiens complex, providing the same resolution as the commonly used Cytochrome Oxidase I (COI). The ability to cost-effectively sequence hypervariable markers makes NGS an invaluable tool with many applications in the DNA barcoding field, and provides insights into the limitations of previous studies and techniques. PMID:27799340

  2. Analysis of the sequence variations in the Mhc DRB1-like gene of the endangered Humboldt penguin (Spheniscus humboldti).

    PubMed

    Kikkawa, Eri F; Tsuda, Tomi T; Naruse, Taeko K; Sumiyama, Daisuke; Fukuda, Michio; Kurita, Masanori; Murata, Koichi; Wilson, Rory P; LeMaho, Yvon; Tsuda, Michio; Kulski, Jerzy K; Inoko, Hidetoshi

    2005-04-01

    The Major Histocompatibility Complex (Mhc) genomic region of many vertebrates is known to contain at least one highly polymorphic class II gene that is homologous in sequence to one or other of the human Mhc DRB1 class II genes. The diversity of the avian Mhc class II gene sequences has been extensively studied in chickens, quails, and some songbirds, but has been largely ignored in the oceanic birds, including the flightless penguins. We have previously reported that several penguin species have a high degree of polymorphism on exon 2 of the Mhc class II DRB1-like gene. In this study, we present for the first time the complete nucleotide sequences of exon 2, intron 2, and exon 3 of the DRB1-like gene of 20 Humboldt penguins, a species that is presently vulnerable to the dangers of extinction. The Humboldt DRB1-like nucleotide and amino acid sequences reveal at least eight unique alleles. Phylogenetic analysis of all the available avian DRB-like sequences showed that, of five penguin species and nine other bird species, the sequences of the Humboldt penguins grouped most closely to the Little penguin and the mallard, respectively. The present analysis confirms that the sequence variations of the Mhc class II gene, DRB1, are useful for discriminating among individuals within the same penguin population as well as those within different penguin population groups and species.

  3. Repetitive sequence variation and dynamics in the ribosomal DNA array of Saccharomyces cerevisiae as revealed by whole-genome resequencing

    PubMed Central

    James, Stephen A.; O'Kelly, Michael J.T.; Carter, David M.; Davey, Robert P.; van Oudenaarden, Alexander; Roberts, Ian N.

    2009-01-01

    Ribosomal DNA (rDNA) plays a key role in ribosome biogenesis, encoding genes for the structural RNA components of this important cellular organelle. These genes are vital for efficient functioning of the cellular protein synthesis machinery and as such are highly conserved and normally present in high copy numbers. In the baker's yeast Saccharomyces cerevisiae, there are more than 100 rDNA repeats located at a single locus on chromosome XII. Stability and sequence homogeneity of the rDNA array is essential for function, and this is achieved primarily by the mechanism of gene conversion. Detecting variation within these arrays is extremely problematic due to their large size and repetitive structure. In an attempt to address this, we have analyzed over 35 Mbp of rDNA sequence obtained from whole-genome shotgun sequencing (WGSS) of 34 strains of S. cerevisiae. Contrary to expectation, we find significant rDNA sequence variation exists within individual genomes. Many of the detected polymorphisms are not fully resolved. For this type of sequence variation, we introduce the term partial single nucleotide polymorphism, or pSNP. Comparative analysis of the complete data set reveals that different S. cerevisiae genomes possess different patterns of rDNA polymorphism, with much of the variation located within the rapidly evolving nontranscribed intergenic spacer (IGS) region. Furthermore, we find that strains known to have either structured or mosaic/hybrid genomes can be distinguished from one another based on rDNA pSNP number, indicating that pSNP dynamics may provide a reliable new measure of genome origin and stability. PMID:19141593

  4. Molecular indicators for palaeoenvironmental change in a Messinian evaporitic sequence (Vena del Gesso, Italy). II: High-resolution variations in abundances and 13C contents of free and sulphur-bound carbon skeletons in a single marl bed

    NASA Technical Reports Server (NTRS)

    Kenig, F.; Damste, J. S.; Frewin, N. L.; Hayes, J. M.; De Leeuw, J. W.

    1995-01-01

    The extractable organic matter of 10 immature samples from a marl bed of one evaporitic cycle of the Vena del Gesso sediments (Gessoso-solfifera Fm., Messinian, Italy) was analyzed quantitatively for free hydrocarbons and organic sulphur compounds. Nickel boride was used as a desulphurizing agent to recover sulphur-bound lipids from the polar and asphaltene fractions. Carbon isotopic compositions (delta vs PDB) of free hydrocarbons and of S-bound hydrocarbons were also measured. Relationships between these carbon skeletons, precursor biolipids, and the organisms producing them could then be examined. Concentrations of S-bound lipids and free hydrocarbons and their delta values were plotted vs depth in the marl bed and the profiles were interpreted in terms of variations in source organisms, 13C contents of the carbon source, and environmentally induced changes in isotopic fractionation. The overall range of delta values measured was 24.7‰, from -11.6‰ for a component derived from green sulphur bacteria (Chlorobiaceae) to -36.3‰ for a lipid derived from purple sulphur bacteria (Chromatiaceae). Deconvolution of mixtures of components deriving from multiple sources (green and purple sulphur bacteria, coccolithophorids, microalgae and higher plants) was sometimes possible because both quantitative and isotopic data were available and because either the free or S-bound pool sometimes appeared to contain material from a single source. Several free n-alkanes and S-bound lipids appeared to be specific products of upper-water-column primary producers (i.e. algae and cyanobacteria). Others derived from anaerobic photoautotrophs and from heterotrophic protozoa (ciliates), which apparently fed partly on Chlorobiaceae. Four groups of n-alkanes produced by algae or cyanobacteria were also recognized based on systematic variations of abundance and isotopic composition with depth. For hydrocarbons probably derived from microalgae, isotopic variations are well correlated with

  6. Variation in conserved non-coding sequences on chromosome 5q and susceptibility to asthma and atopy

    SciTech Connect

    Donfack, Joseph; Schneider, Daniel H.; Tan, Zheng; Kurz, Thorsten; Dubchak, Inna; Frazer, Kelly A.; Ober, Carole

    2005-09-10

    Background: Evolutionarily conserved sequences likely have biological function. Methods: To determine whether variation in conserved sequences in non-coding DNA contributes to risk for human disease, we studied six conserved non-coding elements in the Th2 cytokine cluster on human chromosome 5q31 in a large Hutterite pedigree and in samples of outbred European American and African American asthma cases and controls. Results: Among six conserved non-coding elements (>100 bp, >70 percent identity; human-mouse comparison), we identified one single nucleotide polymorphism (SNP) in each of two conserved elements and six SNPs in the flanking regions of three conserved elements. We genotyped our samples for four of these SNPs and an additional three SNPs each in the IL13 and IL4 genes. While there was only modest evidence for association with single SNPs in the Hutterite and European American samples (P<0.05), there were highly significant associations in European Americans between asthma and haplotypes comprised of SNPs in the IL4 gene (P<0.001), including a SNP in a conserved non-coding element. Furthermore, variation in the IL13 gene was strongly associated with total IgE (P = 0.00022) and allergic sensitization to mold allergens (P = 0.00076) in the Hutterites, and more modestly associated with sensitization to molds in the European Americans and African Americans (P<0.01). Conclusion: These results indicate that there is overall little variation in the conserved non-coding elements on 5q31, but variation in IL4 and IL13, including possibly one SNP in a conserved element, influence asthma and atopic phenotypes in diverse populations.

  7. DNA-protein recognition and sequence-dependent variations of DNA conformational properties

    NASA Astrophysics Data System (ADS)

    Vologodskii, Alexander

    2015-03-01

    Parameters of B-DNA, the major form of the double helix, depend on its sequence. This dependence can contribute to the recognition of specific DNA sequences by proteins. Here we try to analyze this contribution quantitatively. In the first approach to this goal we used experimental data on the sequence dependence of DNA bending rigidity and its helical repeat. The solution data on these parameters of B-DNA were derived from the experiments on cyclization of short DNA fragments with specially designed sequences. The data allowed calculating the sequence variations of DNA bending energy, as well as the variations of the energy of torsional deformation of the double helix associated with a protein binding. The results show that DNA conformational parameters can have very limited influence on the sequence specificity of protein binding. In the second approach we analyzed the experimental data on the binding affinity of the nucleosome core with DNA fragments of different sequences. The conclusions derived in these two approaches are in good agreement with one another.

  8. Genetic variation among the Mapuche Indians from the Patagonian region of Argentina: mitochondrial DNA sequence variation and allele frequencies of several nuclear genes.

    PubMed

    Ginther, C; Corach, D; Penacino, G A; Rey, J A; Carnese, F R; Hutz, M H; Anderson, A; Just, J; Salzano, F M; King, M C

    1993-01-01

    DNA samples from 60 Mapuche Indians, representing 39 maternal lineages, were genetically characterized for (1) nucleotide sequences of the mtDNA control region; (2) presence or absence of a nine base duplication in mtDNA region V; (3) HLA loci DRB1 and DQA1; (4) variation at three nuclear genes with short tandem repeats; and (5) variation at the polymorphic marker D2S44. The genetic profile of the Mapuche population was compared to other Amerinds and to worldwide populations. Two highly polymorphic portions of the mtDNA control region, comprising 650 nucleotides, were amplified by the polymerase chain reaction (PCR) and directly sequenced. The 39 maternal lineages were defined by two or three generation families identified by the Mapuches. These 39 lineages included 19 different mtDNA sequences that could be grouped into four classes. The same classes of sequences appear in other Amerinds from North, Central, and South American populations separated by thousands of miles, suggesting that the origin of the mtDNA patterns predates the migration to the Americas. The mtDNA sequence similarity between Amerind populations suggests that the migration throughout the Americas occurred rapidly relative to the mtDNA mutation rate. HLA DRB1 alleles 1602 and 1402 were frequent among the Mapuches. These alleles also occur at high frequency among other Amerinds in North and South America, but not among Spanish, Chinese or African-American populations. The high frequency of these alleles throughout the Americas, and their specificity to the Americas, supports the hypothesis that Mapuches and other Amerind groups are closely related.(ABSTRACT TRUNCATED AT 250 WORDS)

  9. A worldwide survey of genome sequence variation provides insight into the evolutionary history of the honeybee Apis mellifera.

    PubMed

    Wallberg, Andreas; Han, Fan; Wellhagen, Gustaf; Dahle, Bjørn; Kawata, Masakado; Haddad, Nizar; Simões, Zilá Luz Paulino; Allsopp, Mike H; Kandemir, Irfan; De la Rúa, Pilar; Pirk, Christian W; Webster, Matthew T

    2014-10-01

    The honeybee Apis mellifera has major ecological and economic importance. We analyze patterns of genetic variation at 8.3 million SNPs, identified by sequencing 140 honeybee genomes from a worldwide sample of 14 populations at a combined total depth of 634×. These data provide insight into the evolutionary history and genetic basis of local adaptation in this species. We find evidence that population sizes have fluctuated greatly, mirroring historical fluctuations in climate, although contemporary populations have high genetic diversity, indicating the absence of domestication bottlenecks. Levels of genetic variation are strongly shaped by natural selection and are highly correlated with patterns of gene expression and DNA methylation. We identify genomic signatures of local adaptation, which are enriched in genes expressed in workers and in immune system- and sperm motility-related genes that might underlie geographic variation in reproduction, dispersal and disease resistance. This study provides a framework for future investigations into responses to pathogens and climate change in honeybees.

  10. Whole-genome sequencing reveals the diversity of cattle copy number variations and multicopy genes

    USDA-ARS's Scientific Manuscript database

    Structural and functional impacts of copy number variations (CNVs) on livestock genomes are not yet well understood. We identified 1853 CNV regions using population-scale sequencing data generated from 75 cattle representing 8 breeds (Angus, Brahman, Gir, Holstein, Jersey, Limousin, Nelore, Romagnol...

  11. Copy number variation of individual cattle genomes using next-generation sequencing

    USDA-ARS's Scientific Manuscript database

    Copy Number Variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often difficult to track. Using a read depth approach based on next generation sequencing, we examined genome-wide copy number differences among five taurine (three Angu...

  12. Copy number variation of individual cattle genomes using next-generation sequencing

    USDA-ARS's Scientific Manuscript database

    Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one ...

  13. Phase variable DNA repeats in Neisseria gonorrhoeae influence transcription, translation, and protein sequence variation

    PubMed Central

    Zelewska, Marta A.; Pulijala, Madhuri; Spencer-Smith, Russell; Mahmood, Hiba-Tun-Noor A.; Norman, Billie; Churchward, Colin P.; Calder, Alan

    2016-01-01

    There are many types of repeated DNA sequences in the genomes of the species of the genus Neisseria, from homopolymeric tracts to tandem repeats of hundreds of bases. Some of these have roles in the phase-variable expression of genes. When a repeat mediates phase variation, reversible switching between tract lengths occurs, which in the species of the genus Neisseria most often causes the gene to switch between on and off states through frame shifting of the open reading frame. Changes in repeat tract lengths may also influence the strength of transcription from a promoter. For phenotypes that can be readily observed, such as expression of the surface-expressed Opa proteins or pili, verification that repeats are mediating phase variation is relatively straightforward. For other genes, particularly those where the function has not been identified, gathering evidence of repeat tract changes can be more difficult. Here we present analysis of the repetitive sequences that could mediate phase variation in the Neisseria gonorrhoeae strain NCCP11945 genome sequence and compare these results with other gonococcal genome sequences. Evidence is presented for an updated phase-variable gene repertoire in this species, including a class of phase variation that causes amino acid changes at the C-terminus of the protein, not previously described in N. gonorrhoeae. PMID:28348872
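
    The on/off switching by frame shifting described above can be sketched in a few lines. This is an illustration only: the gene sequence, the poly-G tract, and the function names are hypothetical and are not taken from the NCCP11945 genome or from the authors' analysis.

```python
# Illustrative sketch: a homopolymeric tract inside an open reading frame
# switches the gene between "on" and "off" as the tract length changes.
# All sequences here are invented for illustration.

STOP_CODONS = {"TAA", "TAG", "TGA"}

def build_gene(tract_length: int) -> str:
    """Hypothetical gene: start codon, poly-G tract, then a fixed downstream region."""
    return "ATG" + "G" * tract_length + "CTGAAACCGTAA"

def is_in_frame(seq: str) -> bool:
    """True if the sequence is a whole number of codons and the only stop
    codon, read from the first base, is the final codon."""
    if len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    stops = [i for i, codon in enumerate(codons) if codon in STOP_CODONS]
    return stops == [len(codons) - 1]

for n in range(5, 10):
    state = "on" if is_in_frame(build_gene(n)) else "off"
    print(f"tract length {n}: gene {state}")
```

    In this toy gene, only tract lengths that keep the downstream region in the original reading frame (here, multiples of three) leave the intended stop codon as the first stop encountered.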

  14. Sequence variation in the androgen receptor gene is not a common determinant of male sexual orientation

    SciTech Connect

    Macke, J.P.; Nathans, J.; King, V.L.; Hu, N.; Hu, S.; Hamer, D.; Bailey, M.; Brown, T.

    1993-10-01

    To test the hypothesis that DNA sequence variation in the androgen receptor gene plays a causal role in the development of male sexual orientation, the authors have (1) measured the degree of concordance of androgen receptor alleles in 36 pairs of homosexual brothers, (2) compared the lengths of polyglutamine and polyglycine tracts in the amino-terminal domain of the androgen receptor in a sample of 197 homosexual males and 213 unselected subjects, and (3) screened the entire androgen receptor coding region for sequence variation by PCR and denaturing gradient-gel electrophoresis (DGGE) and/or single-strand conformation polymorphism analysis in 20 homosexual males with homosexual or bisexual brothers and one homosexual male with no homosexual brothers, and screened the amino-terminal domain of the receptor for sequence variation in an additional 44 homosexual males, 37 of whom had one or more first- or second-degree male relatives who were either homosexual or bisexual. These analyses show that (1) homosexual brothers are as likely to be discordant as concordant for androgen receptor alleles; (2) there are no large-scale differences between the distributions of polyglycine or polyglutamine tract lengths in the homosexual and control groups; and (3) coding region sequence variation is not commonly found within the androgen receptor gene of homosexual men. The DGGE screen identified two rare amino acid substitutions, ser-205-to-arg and glu-793-to-asp, the biological significance of which is unknown. 32 refs., 2 figs., 2 tabs.

  15. Sequence analysis of pooled bacterial samples enables identification of strain variation in group A streptococcus

    PubMed Central

    Weldatsadik, Rigbe G.; Wang, Jingwen; Puhakainen, Kai; Jiao, Hong; Jalava, Jari; Räisänen, Kati; Datta, Neeta; Skoog, Tiina; Vuopio, Jaana; Jokiranta, T. Sakari; Kere, Juha

    2017-01-01

    Knowledge of the genomic variation among different strains of a pathogenic microbial species can help in selecting optimal candidates for diagnostic assays and vaccine development. Pooled sequencing (Pool-seq) is a cost-effective approach for population-level genetic studies that require large numbers of samples, such as various strains of a microbe. To test the use of Pool-seq in identifying variation, we pooled DNA of 100 Streptococcus pyogenes strains of different emm types in two pools, each containing 50 strains. We used four variant calling tools (Freebayes, UnifiedGenotyper, SNVer, and SAMtools) and one emm1 strain, SF370, as a reference genome. In total, 63719 SNPs and 164 INDELs were identified in the two pools concordantly by at least two of the tools. The majority of the variants (93.4%) from six individually sequenced strains used in the pools could be identified from the two pools, and 72.3% and 97.4% of the variants in the pools could be mined from the analysis of the 44 complete Str. pyogenes genomes and 3407 sequence runs deposited in the European Nucleotide Archive, respectively. We conclude that DNA sequencing of pooled samples of large numbers of bacterial strains is a robust, rapid and cost-efficient way to discover sequence variation.
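
    Keeping only the variants reported concordantly by at least two callers can be sketched as below. The tool names come from the abstract; the variant tuples, the `concordant_variants` helper, and the `min_tools` parameter are assumptions made for illustration and do not reproduce the authors' pipeline.

```python
from collections import Counter

def concordant_variants(calls_by_tool, min_tools=2):
    """Return variants reported by at least `min_tools` callers.

    `calls_by_tool` maps a caller name to a set of variants, each variant
    being a (position, reference allele, alternate allele) tuple.
    """
    counts = Counter()
    for variants in calls_by_tool.values():
        counts.update(set(variants))  # each caller counts once per variant
    return {variant for variant, n in counts.items() if n >= min_tools}

# Toy call sets (invented positions and alleles).
calls = {
    "Freebayes":        {(101, "A", "G"), (250, "C", "T")},
    "UnifiedGenotyper": {(101, "A", "G"), (377, "G", "A")},
    "SNVer":            {(250, "C", "T"), (101, "A", "G")},
    "SAMtools":         {(912, "T", "C")},
}

print(sorted(concordant_variants(calls)))
```

    Requiring agreement between callers trades some sensitivity for fewer caller-specific false positives; the threshold of two used here mirrors the criterion stated in the abstract.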

  16. Multi-Sample Pooling and Illumina Genome Analyzer Sequencing Methods to Determine Gene Sequence Variation for Database Development

    PubMed Central

    Margraf, Rebecca L.; Durtschi, Jacob D.; Dames, Shale; Pattison, David C.; Stephens, Jack E.; Mao, Rong; Voelkerding, Karl V.

    2010-01-01

    Determination of sequence variation within a genetic locus to develop clinically relevant databases is critical for molecular assay design and clinical test interpretation, so multisample pooling for Illumina genome analyzer (GA) sequencing was investigated using the RET proto-oncogene as a model. Samples were Sanger-sequenced for RET exons 10, 11, and 13–16. Ten samples with 13 known unique variants (“singleton variants” within the pool) and seven common changes were amplified and then equimolar-pooled before sequencing on a single flow cell lane, generating 36 base reads. For comparison, a single “control” sample was run in a different lane. After alignment, a 24-base quality score-screening threshold and 3′ read end trimming of three bases yielded low background error rates with a 27% decrease in aligned read coverage. Sequencing data were evaluated using an established variant detection method (percent variant reads), by the presented subtractive correction method, and with SNPSeeker software. In total, 41 variants (of which 23 were singleton variants) were detected in the 10 pool data, which included all Sanger-identified variants. The 23 singleton variants were detected near the expected 5% allele frequency (average 5.17%±0.90% variant reads), well above the highest background error (1.25%). Based on background error rates, read coverage, simulated 30, 40, and 50 sample pool data, expected singleton allele frequencies within pools, and variant detection methods, ≥30 samples (which demonstrated a minimum 1% variant reads for singletons) could be pooled to reliably detect singleton variants by GA sequencing.
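
    The expected 5% singleton frequency follows from simple arithmetic: a heterozygous variant carried by one of ten diploid samples contributes one of twenty pooled alleles. The sketch below assumes diploid samples, a heterozygous singleton, and perfectly equimolar pooling; the function name is hypothetical.

```python
def expected_singleton_frequency(n_samples: int, ploidy: int = 2) -> float:
    """Expected allele frequency of a variant carried on a single chromosome
    copy in an equimolar pool of `n_samples` samples (assumed diploid)."""
    return 1.0 / (ploidy * n_samples)

# Pool sizes mentioned in the abstract: the sequenced 10-sample pool and the
# simulated 30-, 40-, and 50-sample pools.
for n in (10, 30, 40, 50):
    freq = expected_singleton_frequency(n)
    print(f"{n:2d} samples: expected singleton frequency {freq:.2%}")
```

    Under these idealized assumptions the expected frequency shrinks toward the reported background error as the pool grows, which is why pool size has to be weighed against error rate and read coverage.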

  17. Extensive sequence variation in the 3′ untranslated region of the KRAS gene in lung and ovarian cancer cases

    PubMed Central

    Kim, Minlee; Chen, Xiaowei; Chin, Lena J; Paranjape, Trupti; Speed, William C; Kidd, Kenneth K; Zhao, Hongyu; Weidhaas, Joanne B; Slack, Frank J

    2014-01-01

    While cancer is a serious health issue, there are very few genetic biomarkers that predict predisposition, prognosis, diagnosis, and treatment response. Recently, sequence variations that disrupt microRNA (miRNA)-mediated regulation of genes have been shown to be associated with many human diseases, including cancer. In an early example, a variant at one particular single nucleotide polymorphism (SNP) in a let-7 miRNA complementary site in the 3′ untranslated region (3′ UTR) of the KRAS gene was associated with risk and outcome of various cancers. The KRAS oncogene is an important regulator of cellular proliferation, and is frequently mutated in cancers. To discover additional sequence variants in the 3′ UTR of KRAS with the potential as genetic biomarkers, we resequenced the complete region of the 3′ UTR of KRAS in multiple non-small cell lung cancer and epithelial ovarian cancer cases either by Sanger sequencing or capture enrichment followed by high-throughput sequencing. Here we report a comprehensive list of sequence variations identified in cases, with some potentially dysregulating expression of KRAS by altering putative miRNA complementary sites. Notably, rs712, rs9266, and one novel variant may have a functional role in regulation of KRAS by disrupting complementary sites of various miRNAs, including let-7 and miR-181. PMID:24552817

  18. Analysis of exome sequencing data sets reveals structural variation in the coding region of ABO in individuals of African ancestry.

    PubMed

    Fox, Keolu; Johnsen, Jill M; Coe, Bradley P; Frazar, Chris D; Reiner, Alexander P; Eichler, Evan E; Nickerson, Deborah A

    2016-11-01

    ABO is a blood group system of high clinical significance due to the prevalence of ABO variation that can cause major, potentially life-threatening, transfusion reactions. Using multiple large-scale next-generation sequence data sets, we demonstrate the application of read-depth approaches to discover previously unsuspected structural variation (SV) in the ABO gene in individuals of African ancestry. Our analysis of SV in the ABO gene across 6432 exomes reveals a partial deletion in the ABO gene in 32 individuals of African ancestry that predicts a novel O allele. Our study demonstrates the power that analyses of large-scale sequencing data, particularly data sets containing underrepresented populations, can provide in identifying novel SVs. © 2016 AABB.
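
    A read-depth approach of the kind mentioned above can be sketched as a comparison of one sample's normalized per-region depth with a cohort median. This is a minimal illustration, not the authors' pipeline: the function names, the input layout (mean depth per region), and the 0.7 ratio threshold are all assumptions made for the example.

```python
from statistics import median


def depth_ratios(sample_depths, cohort_depths):
    """Ratio of a sample's normalized depth to the cohort median, per region.

    sample_depths: mean read depth per region for one exome.
    cohort_depths: list of such lists (same regions) for reference exomes.
    Depths are first scaled by each exome's total so that overall
    sequencing yield does not drive the ratio.
    """
    sample_total = sum(sample_depths)
    norm_sample = [d / sample_total for d in sample_depths]
    norm_cohort = [[d / sum(row) for d in row] for row in cohort_depths]
    ratios = []
    for i, value in enumerate(norm_sample):
        ref = median(row[i] for row in norm_cohort)
        ratios.append(value / ref if ref > 0 else float("nan"))
    return ratios


def flag_candidate_deletions(ratios, threshold=0.7):
    """Indices of regions whose depth ratio falls below the (hypothetical)
    threshold, suggesting a possible heterozygous or homozygous deletion."""
    return [i for i, r in enumerate(ratios) if r < threshold]


if __name__ == "__main__":
    cohort = [[100, 100, 100, 100], [200, 200, 200, 200], [50, 50, 50, 50]]
    sample = [100, 100, 50, 100]  # region 2 has about half the expected depth
    print(flag_candidate_deletions(depth_ratios(sample, cohort)))
```

    Candidate regions found this way would still need orthogonal confirmation, since capture efficiency and GC content also affect exome read depth.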

  19. Optimal assembly for high throughput shotgun sequencing

    PubMed Central

    2013-01-01

    We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier works, we design a de Bruijn graph-based assembly algorithm which can achieve very close to the lower bound for repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization. PMID:23902516
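
    For readers unfamiliar with the data structure, a de Bruijn graph over reads can be sketched in a few lines. This is a generic textbook construction, not the optimal algorithm described in the abstract; the function names and the toy reads are assumptions for illustration, and the walk stops at any branch instead of resolving repeats.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and every k-mer
    found in a read adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly
    one successor; stop at branches, dead ends, or a revisited node."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # toy overlapping reads
    graph = de_bruijn_graph(reads, k=4)
    print(walk_unambiguous(graph, "ATG"))
```

    Repeats longer than k-1 bases create branching nodes where this simple walk halts, which is why read length and repeat statistics bound what can be reconstructed.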

  20. Sequence variations of the locus-specific 5' untranslated regions of SLA class I genes and the development of a comprehensive genomic DNA-based high-resolution typing method for SLA-2.

    PubMed

    Choi, H; Le, M T; Lee, H; Choi, M-K; Cho, H-S; Nagasundarapandian, S; Kwon, O-J; Kim, J-H; Seo, K; Park, J-K; Lee, J-H; Ho, C-S; Park, C

    2015-10-01

    The genetic diversity of the major histocompatibility complex (MHC) class I molecules of pigs has not been well characterized. Therefore, the influence of MHC genetic diversity on the immune-related traits of pigs, including disease resistance and other MHC-dependent traits, is not well understood. Here, we attempted to develop an efficient method for systemic analysis of the polymorphisms in the epitope-binding region of swine leukocyte antigens (SLA) class I genes. We performed a comparative analysis of the last 92 bp of the 5' untranslated region (UTR) to the beginning of exon 4 of six SLA classical class I-related genes, SLA-1, -2, -3, -4, -5, and -9, from 36 different sequences. Based on this information, we developed a genomic polymerase chain reaction (PCR) and direct sequencing-based comprehensive typing method for SLA-2. We successfully typed SLA-2 from 400 pigs and 8 cell lines, consisting of 9 different pig breeds, and identified 49 SLA-2 alleles, including 31 previously reported alleles and 18 new alleles. We observed differences in the composition of SLA-2 alleles among different breeds. Our method can be used to study other SLA class I loci and to deepen our knowledge of MHC class I genes in pigs. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  1. Variation in the genomic locations and sequence conservation of STAR elements among staphylococcal species provides insight into DNA repeat evolution

    PubMed Central

    2012-01-01

    Background: Staphylococcus aureus Repeat (STAR) elements are a type of interspersed intergenic direct repeat. In this study the conservation and variation in these elements was explored by bioinformatic analyses of published staphylococcal genome sequences and through sequencing of specific STAR element loci from a large set of S. aureus isolates. Results: Using bioinformatic analyses, we found that the STAR elements were located in different genomic loci within each staphylococcal species. There was no correlation between the number of STAR elements in each genome and the evolutionary relatedness of staphylococcal species, however higher levels of repeats were observed in both S. aureus and S. lugdunensis compared to other staphylococcal species. Unexpectedly, sequencing of the internal spacer sequences of individual repeat elements from multiple isolates showed conservation at the sequence level within deep evolutionary lineages of S. aureus. Whilst individual STAR element loci were demonstrated to expand and contract, the sequences associated with each locus were stable and distinct from one another. Conclusions: The high degree of lineage and locus-specific conservation of these intergenic repeat regions suggests that STAR elements are maintained due to selective or molecular forces with some of these elements having an important role in cell physiology. The high prevalence in two of the more virulent staphylococcal species is indicative of a potential role for STAR elements in pathogenesis. PMID:23020678

  2. Variation in the genomic locations and sequence conservation of STAR elements among staphylococcal species provides insight into DNA repeat evolution.

    PubMed

    Purves, Joanne; Blades, Matthew; Arafat, Yasrab; Malik, Salman A; Bayliss, Christopher D; Morrissey, Julie A

    2012-09-28

    Staphylococcus aureus Repeat (STAR) elements are a type of interspersed intergenic direct repeat. In this study the conservation and variation in these elements was explored by bioinformatic analyses of published staphylococcal genome sequences and through sequencing of specific STAR element loci from a large set of S. aureus isolates. Using bioinformatic analyses, we found that the STAR elements were located in different genomic loci within each staphylococcal species. There was no correlation between the number of STAR elements in each genome and the evolutionary relatedness of staphylococcal species, however higher levels of repeats were observed in both S. aureus and S. lugdunensis compared to other staphylococcal species. Unexpectedly, sequencing of the internal spacer sequences of individual repeat elements from multiple isolates showed conservation at the sequence level within deep evolutionary lineages of S. aureus. Whilst individual STAR element loci were demonstrated to expand and contract, the sequences associated with each locus were stable and distinct from one another. The high degree of lineage and locus-specific conservation of these intergenic repeat regions suggests that STAR elements are maintained due to selective or molecular forces with some of these elements having an important role in cell physiology. The high prevalence in two of the more virulent staphylococcal species is indicative of a potential role for STAR elements in pathogenesis.

  3. Phylogenetic and functional analysis of sequence variation of human papillomavirus type 31 E6 and E7 oncoproteins.

    PubMed

    Ferenczi, Annamária; Gyöngyösi, Eszter; Szalmás, Anita; László, Brigitta; Kónya, József; Veress, György

    2016-09-01

    High-risk human papillomaviruses (HPV) are the causative agents of cervical and other anogenital cancers as well as a subset of head and neck cancers. The E6 and E7 oncoproteins of HPV contribute to oncogenesis by associating with the tumour suppressor protein p53 and pRb, respectively. For HPV types 16 and 18, intratypic sequence variation was shown to have biological and clinical significance. The functional significance of sequence variation among HPV 31 variants was studied less intensively. HPV 31 variants belonging to different variant lineages were found to have differences in persistence and in the ability to cause high grade cervical intraepithelial neoplasia. In the present study, we started to explore the functional effects of natural sequence variation of HPV 31 E6 and E7 oncoproteins. The E6 variants were tested for their effects on p53 protein stability and transcriptional activity, while the E7 variants were tested for their effects on pRb protein level and also on the transcriptional activity of E2F transcription factors. HPV 31 E7 variants displayed uniform effects on pRb stability and also on the activity of E2F transcription factors. HPV 31 E6 variants had remarkable differences in the ability to inhibit the trans-activation function of p53 but not in the ability to induce the in vivo degradation of p53. Our results indicate that natural sequence variation of the HPV 31 E6 protein may be involved in the observed differences in the oncogenic potential between HPV 31 variants.

  4. LOESS correction for length variation in gene set-based genomic sequence analysis

    PubMed Central

    Aboukhalil, Anton; Bulyk, Martha L.

    2012-01-01

    Motivation: Sequence analysis algorithms are often applied to sets of DNA, RNA or protein sequences to identify common or distinguishing features. Controlling for sequence length variation is critical to properly score sequence features and identify true biological signals rather than length-dependent artifacts. Results: Several cis-regulatory module discovery algorithms exhibit a substantial dependence between DNA sequence score and sequence length. Our newly developed LOESS method is flexible in capturing diverse score-length relationships and is more effective in correcting DNA sequence scores for length-dependent artifacts, compared with four other approaches. Application of this method to genes co-expressed during Drosophila melanogaster embryonic mesoderm development or neural development scored by the Lever motif analysis algorithm resulted in successful recovery of their biologically validated cis-regulatory codes. The LOESS length-correction method is broadly applicable, and may be useful not only for more accurate inference of cis-regulatory codes, but also for detection of other types of patterns in biological sequences. Availability: Source code and compiled code are available from http://thebrain.bwh.harvard.edu/LM_LOESS/ Contact: mlbulyk@receptor.med.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22492312
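
    The idea of removing a length-dependent trend from sequence scores can be sketched with a small locally weighted linear regression. This is not the authors' released code (see the URL in the abstract for that). It is a simplified, assumption-laden illustration that uses tricube weights, a nearest-neighbour bandwidth, and subtraction of the fitted trend as the correction; the function names and the `frac` parameter are hypothetical.

```python
def loess_at(xs, ys, x0, frac=0.5):
    """Locally weighted linear fit of ys on xs, evaluated at x0.

    The bandwidth is the distance to the k-th nearest x (k = frac * n),
    and points are weighted with the tricube kernel.
    """
    n = len(xs)
    k = min(n, max(2, round(frac * n)))
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
    weights = []
    for x in xs:
        u = abs(x - x0) / h
        weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
    sw = sum(weights)
    if sw == 0:
        return sum(ys) / n  # degenerate neighbourhood: fall back to the mean
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def length_corrected_scores(lengths, scores, frac=0.5):
    """Residual score after subtracting the fitted score-length trend."""
    return [s - loess_at(lengths, scores, length, frac)
            for length, s in zip(lengths, scores)]


if __name__ == "__main__":
    lengths = [100 * i for i in range(1, 11)]
    scores = [0.01 * length + 3 for length in lengths]  # pure length artifact
    print(length_corrected_scores(lengths, scores))
```

    In this toy case the scores depend only on length, so the corrected values are all close to zero; a sequence with genuine signal would stand out as a positive residual.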

  5. DNA Sequence Analysis of SLC26A5, Encoding Prestin, in a Patient-Control Cohort: Identification of Fourteen Novel DNA Sequence Variations

    PubMed Central

    Minor, Jacob S.; Tang, Hsiao-Yuan; Pereira, Fred A.; Alford, Raye Lynn

    2009-01-01

    Background: Prestin, encoded by the gene SLC26A5, is a transmembrane protein of the cochlear outer hair cell (OHC). Prestin is required for the somatic electromotile activity of OHCs; in mice lacking prestin, this activity is absent and severe hearing impairment results. In humans, the role of sequence variations in SLC26A5 in hearing loss is less clear. Although prestin is expected to be required for functional human OHCs, the clinical significance of reported putative mutant alleles in humans is uncertain. Methodology/Principal Findings: To explore the hypothesis that SLC26A5 may act as a modifier gene, affecting the severity of hearing loss caused by an independent etiology, a patient-control cohort was screened for DNA sequence variations in SLC26A5 using sequencing and allele-specific methods. Patients in this study carried known pathogenic or controversial sequence variations in GJB2, encoding Connexin 26, or confirmed or suspected sequence variations in SLC26A5; controls included four ethnic populations. Twenty-three different DNA sequence variations in SLC26A5, 14 of which are novel, were observed: 4 novel sequence variations were found exclusively among patients; 7 novel sequence variations were found exclusively among controls; and 12 sequence variations, 3 of which are novel, were found in both patients and controls. Twenty-one of the 23 DNA sequence variations were located in non-coding regions of SLC26A5. Two coding sequence variations, both novel, were observed only in patients and predict a silent change, p.S434S, and an amino acid substitution, p.I663V. In silico analysis of the p.I663V amino acid variation suggested this variant might be benign. Using Fisher's exact test, no statistically significant difference was observed between patients and controls in the frequency of the identified DNA sequence variations. Haplotype analysis using HaploView 4.0 software revealed the same predominant haplotype in patients and controls and derived haplotype blocks

  6. AFLP and DNA sequence variation in an Andean domesticate, pepino (Solanum muricatum, Solanaceae): implications for evolution and domestication.

    PubMed

    Blanca, José M; Prohens, Jaime; Anderson, Gregory J; Zuriaga, Elena; Cañizares, Joaquín; Nuez, Fernando

    2007-07-01

    The pepino (Solanum muricatum) is a vegetatively propagated, domesticated native of the Andes, where it grows with wild relatives. We used AFLPs and a 1-kb sequence of the 3-methylcrotonyl-CoA carboxylase gene to study variation of 27 accessions of S. muricatum and 35 collections of 10 species of wild relatives (Solanum section Basarthrum). A total of 298 AFLP fragments and 29 DNA sequence haplotypes were detected. Cluster and principal coordinate analyses and other genetic parameters estimated from both types of markers, show that S. muricatum is closely related to the species from one of the series (Caripensia) of section Basarthrum and that >90% of the variation of the cultigen is also represented in that series. Pepino is highly diverse, either because it is not monophyletic or it has been subjected to regular introgression with wild species, or both. Although a continuous distribution of the genetic variation occurred within the cultivated species, three genetic clusters were recognized. Cluster 1 is mostly centered in Ecuador, cluster 2 in Ecuador and Peru, and cluster 3 in Colombia and Ecuador. Cluster 3 also includes all modern cultivars studied. These results and other evidence suggest that northern Ecuador/southern Colombia is the main center of pepino diversity and the center of origin. The high genetic variation of this cultigen indicates that domestication does not always produce a genetic bottleneck.

  7. Rice pseudomolecule-anchored cross-species DNA sequence alignments indicate regional genomic variation in expressed sequence conservation

    PubMed Central

    Armstead, Ian; Huang, Lin; King, Julie; Ougham, Helen; Thomas, Howard; King, Ian

    2007-01-01

    Background: Various methods have been developed to explore inter-genomic relationships among plant species. Here, we present a sequence similarity analysis based upon comparison of transcript-assembly and methylation-filtered databases from five plant species and physically anchored rice coding sequences. Results: A comparison of the frequency of sequence alignments, determined by MegaBLAST, between rice coding sequences in TIGR pseudomolecules and annotations vs 4.0 and comprehensive transcript-assembly and methylation-filtered databases from Lolium perenne (ryegrass), Zea mays (maize), Hordeum vulgare (barley), Glycine max (soybean) and Arabidopsis thaliana (thale cress) was undertaken. Each rice pseudomolecule was divided into 10 segments, each containing 10% of the functionally annotated, expressed genes. This indicated a correlation between relative segment position in the rice genome and numbers of alignments with all the queried monocot and dicot plant databases. Colour-coded moving windows of 100 functionally annotated, expressed genes along each pseudomolecule were used to generate 'heat-maps'. These revealed consistent intra- and inter-pseudomolecule variation in the relative concentrations of significant alignments with the tested plant databases. Analysis of the annotations and derived putative expression patterns of rice genes from 'hot-spots' and 'cold-spots' within the heat maps indicated possible functional differences. A similar comparison relating to ancestral duplications of the rice genome indicated that duplications were often associated with 'hot-spots'. Conclusion: Physical positions of expressed genes in the rice genome are correlated with the degree of conservation of similar sequences in the transcriptomes of other plant species. This relative conservation is associated with the distribution of different sized gene families and segmentally duplicated loci and may have functional and evolutionary implications. PMID:17708759

  8. Identification of staphylococcal species based on variations in protein sequences (mass spectrometry) and DNA sequence (sodA microarray).

    PubMed

    Kooken, Jennifer; Fox, Karen; Fox, Alvin; Altomare, Diego; Creek, Kim; Wunschel, David; Pajares-Merino, Sara; Martínez-Ballesteros, Ilargi; Garaizar, Javier; Oyarzabal, Omar; Samadpour, Mansour

    2014-02-01

    This report is among the first using sequence variation in newly discovered protein markers for staphylococcal (or indeed any other bacterial) speciation. Variation, at the DNA sequence level, in the sodA gene (commonly used for staphylococcal speciation) provided excellent correlation. Relatedness among strains was also assessed by protein profiling using microcapillary electrophoresis and pulsed field electrophoresis. A total of 64 strains were analyzed, including reference strains representing the 11 staphylococcal species most commonly isolated from man (Staphylococcus aureus and 10 coagulase negative species [CoNS]). Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI TOF MS) and liquid chromatography-electrospray ionization tandem mass spectrometry (LC ESI MS/MS) were used for peptide analysis of proteins isolated from gel bands. Comparison of experimental spectra of unknowns versus spectra of peptides derived from reference strains allowed bacterial identification after MALDI TOF MS analysis. After LC-MS/MS analysis of gel bands, bacterial speciation was performed by comparing experimental spectra versus virtual spectra using the software X!Tandem. Finally, LC-MS/MS was performed on whole proteomes, with data analysis also employing X!Tandem. Aconitate hydratase and oxoglutarate dehydrogenase served as marker proteins on focused analysis after gel separation. Alternatively, on full proteomics analysis, elongation factor Tu generally provided the highest confidence in staphylococcal speciation.

  9. Advances in high throughput DNA sequence data compression.

    PubMed

    Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz

    2016-06-01

    Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.
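
    The distinction between referential and reference-free compression can be made concrete with a toy referential encoder. The sketch below is not taken from any of the reviewed tools. It assumes the target and reference have equal length and differ only by substitutions, and the function names are hypothetical; real referential compressors also handle insertions, deletions, and quality scores.

```python
def encode_against_reference(reference: str, target: str):
    """Store only the positions where the target differs from the reference.

    Assumption: equal-length sequences that differ by substitutions only.
    """
    if len(reference) != len(target):
        raise ValueError("this toy encoder requires equal-length sequences")
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]


def decode_with_reference(reference: str, diffs):
    """Rebuild the target by applying the stored substitutions."""
    bases = list(reference)
    for position, base in diffs:
        bases[position] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTACGTACGT"
    target = "ACGTTCGTACGA"
    diffs = encode_against_reference(reference, target)
    print(diffs)
    print(decode_with_reference(reference, diffs) == target)
```

    The saving comes from the decoder already holding the reference; a reference-free method has no such shared context and must instead exploit redundancy within the data itself.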

  10. Resolving postglacial phylogeography using high-throughput sequencing

    PubMed Central

    Emerson, Kevin J.; Merz, Clayton R.; Catchen, Julian M.; Hohenlohe, Paul A.; Cresko, William A.; Bradshaw, William E.; Holzapfel, Christina M.

    2010-01-01

    The distinction between model and nonmodel organisms is becoming increasingly blurred. High-throughput, second-generation sequencing approaches are being applied to organisms based on their interesting ecological, physiological, developmental, or evolutionary properties and not on the depth of genetic information available for them. Here, we illustrate this point using a low-cost, efficient technique to determine the fine-scale phylogenetic relationships among recently diverged populations in a species. This application of restriction site-associated DNA tags (RAD tags) reveals previously unresolved genetic structure and direction of evolution in the pitcher plant mosquito, Wyeomyia smithii, from a southern Appalachian Mountain refugium following recession of the Laurentide Ice Sheet at 22,000–19,000 B.P. The RAD tag method can be used to identify detailed patterns of phylogeography in any organism regardless of existing genomic data, and, more broadly, to identify incipient speciation and genome-wide variation in natural populations in general. PMID:20798348

  11. Sequence Variation in the ftsZ Gene of Bartonella henselae Isolates and Clinical Samples

    PubMed Central

    Ehrenborg, C.; Wesslén, L.; Jakobson, Å.; Friman, G.; Holmberg, M.

    2000-01-01

    In a search for methods for subtyping of Bartonella henselae in clinical samples, we amplified and sequenced a 701-bp region in the 3′ end of the ftsZ gene in 15 B. henselae isolates derived from cats and humans in the United States and Europe. The ftsZ sequence variants that were discovered were designated variants Bh ftsZ 1, 2, and 3 and were compared with 16S rRNA genotypes I and II of the same isolates. There was no ftsZ gene variation in the strains of 16S rRNA type I, all of which were Bh ftsZ 1. The type II strains constituted two groups, with nucleotide sequence variation in the ftsZ gene resulting in amino acid substitutions at three positions, one of which was shared by the two groups. One 16S rRNA type II isolate had an ftsZ gene sequence identical to those of the type I strains. Variants Bh ftsZ 1 and 2 were detected in tissue specimens from seven Swedish patients with diagnoses such as chronic multifocal osteomyelitis, cardiomyopathy, and lymphadenopathy. Patients with similar clinical entities displayed either Bh ftsZ variant. The etiological role of B. henselae in these patients was supported by positive Bartonella antibody titers and/or amplification and sequencing of a part of the B. henselae gltA gene. B. henselae ftsZ gene sequence variation may be useful in providing knowledge about the epidemiology of various B. henselae strains in clinical samples, especially when isolation attempts have failed. This report also describes manifestations of atypical Bartonella infections in Sweden. PMID:10655367

  12. Traditional karyotyping vs copy number variation sequencing for detection of chromosomal abnormalities associated with spontaneous miscarriage.

    PubMed

    Liu, S; Song, L; Cram, D S; Xiong, L; Wang, K; Wu, R; Liu, J; Deng, K; Jia, B; Zhong, M; Yang, F

    2015-10-01

    To compare the performance of traditional G-banding karyotyping with that of copy number variation sequencing (CNV-Seq) for detection of chromosomal abnormalities associated with miscarriage. Products of conception (POC) were collected from spontaneous miscarriages. Chromosomal abnormalities were detected using high-resolution G-banding karyotyping and CNV sequencing. Quantitative fluorescent polymerase chain reaction analysis of maternal and POC DNA for short tandem repeat (STR) markers was used to both monitor maternal cell contamination and confirm the chromosomal status and sex of the miscarriage tissue. A total of 64 samples of POC, comprising 16 with an abnormal and 48 with a normal karyotype, were selected and coded for analysis by CNV-Seq. CNV-Seq results were concordant for 14 (87.5%) of the 16 gross chromosomal abnormalities identified by karyotyping, including 11 autosomal trisomies and three sex chromosomal aneuploidies (45,X). Of the two discordant results, a 69,XXX polyploidy was missed by CNV-Seq, although supporting STR marker analysis confirmed the triploidy. In contrast, CNV-Seq identified a sample with 45,X karyotype as a 45,X/46,XY mosaic. In the remaining 48 samples of POC with a normal karyotype, CNV-Seq detected a 2.58-Mb 22q deletion associated with DiGeorge syndrome and nine different smaller CNVs of no apparent clinical significance. CNV-Seq used in parallel with STR profiling is a reliable and accurate alternative to karyotyping for identifying chromosome copy number abnormalities associated with spontaneous miscarriage. Copyright © 2015 ISUOG. Published by John Wiley & Sons Ltd.

  13. Sequence variation in the Mc1r gene for a group of polymorphic snakes.

    PubMed

    Cox, Christian L; Rabosky, Alison R Davis; Chippindale, Paul T

    2013-01-25

    Studying the genetic factors underlying phenotypic traits can provide insight into the dynamics of selection and the molecular basis of adaptation, but this goal can be difficult for non-model organisms without extensive genomic resources. However, sequencing candidate genes for the trait of interest can facilitate the study of evolutionary genetics in natural populations. We sequenced the melanocortin-1 receptor (Mc1r) to study the genetic basis of color polymorphism in a group of snake species with variable black banding, the genera Sonora, Chilomeniscus, and Chionactis. Mc1r is an important gene in the melanin synthesis pathway and is associated with ecologically important variation in color pattern in birds, mammals, and other squamate reptiles. We found that Mc1r nucleotide sequence was variable and that within our focal Sonora species, there are both fixed and heterozygous nucleotide substitutions that result in an amino acid change. Selection analyses indicated that Mc1r sequence was likely under purifying selection. However, we did not detect any statistical association with the presence or absence of black bands. Our results agree with other studies that have found no role for sequence variation in Mc1r and highlight the importance of comparative data for studying the phenotypic associations of candidate genes. Copyright © 2012 Elsevier B.V. All rights reserved.

  14. Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach

    NASA Astrophysics Data System (ADS)

    Hofmann, Hansjörg; Sakti, Sakriani; Hori, Chiori; Kashioka, Hideki; Nakamura, Satoshi; Minker, Wolfgang

    The performance of English automatic speech recognition systems decreases when recognizing spontaneous speech, mainly due to multiple pronunciation variants in the utterances. Previous approaches address this problem by modeling the alteration of the pronunciation on a phoneme-to-phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this article, the sequence-based pronunciation variation is modeled using a noisy channel approach where the spontaneous phoneme sequence is considered as a “noisy” string and the goal is to recover the “clean” string of the word sequence. Hereby, the whole word sequence and its effect on the alteration of the phonemes will be taken into consideration. Moreover, the system not only learns the phoneme transformation but also the mapping from the phoneme to the word directly. In this study, first the phonemes will be recognized with the present recognition system and afterwards the pronunciation variation model based on the noisy channel approach will map from the phoneme to the word level. Two well-known natural language processing approaches are adopted and derived from the noisy channel model theory: joint-sequence models and statistical machine translation. Both of them are applied and various experiments are conducted using microphone and telephone recordings of spontaneous speech.
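
    The noisy channel formulation referred to above is usually written as a Bayes decision rule. The notation below is an assumption made for illustration (the abstract does not give symbols): $\Phi$ stands for the recognized "noisy" phoneme sequence and $W$ for a candidate "clean" word sequence.

```latex
\[
\begin{aligned}
\hat{W} &= \operatorname*{arg\,max}_{W} \; P(W \mid \Phi) \\
        &= \operatorname*{arg\,max}_{W} \; \frac{P(\Phi \mid W)\, P(W)}{P(\Phi)} \\
        &= \operatorname*{arg\,max}_{W} \; P(\Phi \mid W)\, P(W)
\end{aligned}
\]
```

    Here $P(\Phi \mid W)$ plays the role of the channel (pronunciation variation) model and $P(W)$ that of the language model; $P(\Phi)$ can be dropped because it does not depend on $W$.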

  15. FEATnotator: A tool for integrated annotation of sequence features and variation, facilitating interpretation in genomics experiments.

    PubMed

    Podicheti, Ram; Mockaitis, Keithanne

    2015-06-01

    As approaches are sought for more efficient and democratized uses of non-model and expanded model genomics references, ease of integration of genomic feature datasets is especially desirable in multidisciplinary research communities. Valuable conclusions are often missed or slowed when researchers refer experimental results to a single reference sequence that lacks integrated pan-genomic and multi-experiment data in accessible formats. Association of genomic positional information, such as results from an expansive variety of next-generation sequencing experiments, with annotated reference features such as genes or predicted protein binding sites, provides the context essential for conclusions and ongoing research. When the experimental system includes polymorphic genomic inputs, rapid calculation of gene structural and protein translational effects of sequence variation from the reference can be invaluable. Here we present FEATnotator, a lightweight, fast and easy to use open source software program that integrates and reports overlap and proximity in genomic information from any user-defined datasets including those from next generation sequencing applications. We illustrate use of the tool by summarizing whole genome sequence variation of a widely used natural isolate of Arabidopsis thaliana in the context of gene models of the reference accession. Previous discovery of a protein coding deletion influencing root development is replicated rapidly. Appropriate even in investigations of a single gene or genic regions such as QTL, comprehensive reports provided by FEATnotator better prepare researchers for interpretation of their experimental results. The tool is available for download at http://featnotator.sourceforge.net.

  16. Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly

    PubMed Central

    Lam, Ernest T; Hastie, Alex; Lin, Chin; Ehrlich, Dean; Das, Somes K; Austin, Michael D; Deshpande, Paru; Cao, Han; Nagarajan, Niranjan; Xiao, Ming; Kwok, Pui-Yan

    2013-01-01

    We describe genome mapping on nanochannel arrays. In this approach, specific sequence motifs in single DNA molecules are fluorescently labeled, and the DNA molecules are uniformly stretched in thousands of silicon channels on a nanofluidic device. Fluorescence imaging allows the construction of maps of the physical distances between occurrences of the sequence motifs. We demonstrate the analysis, individually and as mixtures, of 95 bacterial artificial chromosome (BAC) clones that cover the 4.7-Mb human major histocompatibility complex region. We obtain accurate, haplotype-resolved, sequence motif maps hundreds of kilobases in length, resulting in a median coverage of 114× for the BACs. The final sequence motif map assembly contains three contigs. With an average distance of 9 kb between labels, we detect 22 haplotype differences. We also use the sequence motif maps to provide scaffolds for de novo assembly of sequencing data. Nanochannel genome mapping should facilitate de novo assembly of sequencing reads from complex regions in diploid organisms, haplotype and structural variation analysis and comparative genomics. PMID:22797562
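
    A sequence motif map of the kind described reduces, in silico, to the list of distances between consecutive occurrences of a motif. The sketch below is a simplified assumption-based illustration, not the authors' software: the motif shown is an arbitrary example, only one strand is scanned, and the function names are hypothetical.

```python
def motif_positions(sequence: str, motif: str):
    """Start positions (0-based) of every occurrence of motif,
    including overlapping ones, on the given strand only."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


def label_intervals(positions):
    """Distances in bases between consecutive labelled motif sites;
    this list is the in silico analogue of a measured motif map."""
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    motif = "GCTCTTC"  # example motif chosen for illustration
    sequence = motif + "AAAA" + motif + "AA" + motif
    sites = motif_positions(sequence, motif)
    print(sites)
    print(label_intervals(sites))
```

    Comparing such predicted interval lists with measured ones is one plausible way to anchor sequence contigs to a physical map, although real data require tolerance for stretching error and missing labels.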

  17. Using Next-Generation Sequencing for DNA Barcoding: Capturing Allelic Variation in ITS2.

    PubMed

    Batovska, Jana; Cogan, Noel O I; Lynch, Stacey E; Blacket, Mark J

    2017-01-05

    Internal Transcribed Spacer 2 (ITS2) is a popular DNA barcoding marker; however, in some animal species it is hypervariable and therefore difficult to sequence with traditional methods. With next-generation sequencing (NGS) it is possible to sequence all gene variants despite the presence of single nucleotide polymorphisms (SNPs), insertions/deletions (indels), homopolymeric regions, and microsatellites. Our aim was to compare the performance of Sanger sequencing and NGS amplicon sequencing in characterizing ITS2 in 26 mosquito species represented by 88 samples. The suitability of ITS2 as a DNA barcoding marker for mosquitoes, and its allelic diversity in individuals and species, was also assessed. Compared to Sanger sequencing, NGS was able to characterize the ITS2 region to a greater extent, with resolution within and between individuals and species that was previously not possible. A total of 382 unique sequences (alleles) were generated from the 88 mosquito specimens, demonstrating the diversity present that has been overlooked by traditional sequencing methods. Multiple indels and microsatellites were present in the ITS2 alleles, which were often specific to species or genera, causing variation in sequence length. As a barcoding marker, ITS2 was able to separate all of the species, apart from members of the Culex pipiens complex, providing the same resolution as the commonly used Cytochrome Oxidase I (COI). The ability to cost-effectively sequence hypervariable markers makes NGS an invaluable tool with many applications in the DNA barcoding field, and provides insights into the limitations of previous studies and techniques. Copyright © 2017 Batovska et al.

  18. Rethinking microbial diversity analysis in the high throughput sequencing era.

    PubMed

    Lemos, Leandro N; Fulthorpe, Roberta R; Triplett, Eric W; Roesch, Luiz F W

    2011-07-01

    The analysis of amplified and sequenced 16S rRNA genes has become the most important single approach for microbial diversity studies. The new sequencing technologies allow thousands of reads to be sequenced in a single run, and a cost-effective option is to split a single run across many samples. However, for this type of investigation, the key question is how many samples can be sequenced without biasing the results through a lack of sequence representativeness. In this work, we demonstrated that the level of sequencing effort used for analyzing soil microbial communities biases the results and determines the most effective type of analysis for small and large datasets. Many simulations were performed with four independent pyrosequencing-generated 16S rRNA gene libraries from different environments. The analysis performed here illustrates the lack of resolution of OTU-based approaches for datasets with low sequence coverage. This analysis should be performed with at least 90% sequence coverage. Diversity index values increase with sample size, making normalization of the number of sequences in all samples crucial. An important finding of this study was the advantage of phylogenetic approaches for examining microbial communities with low sequence coverage. However, if the environments being compared are closely related, deeper sequencing would be necessary to detect the variation in microbial composition. Copyright © 2011 Elsevier B.V. All rights reserved.
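
    The abstract does not say how sequence coverage was estimated or how samples were normalized. The following is a minimal sketch, assuming that coverage means Good's coverage (one minus the fraction of reads that belong to singleton OTUs) and that normalization means random subsampling of each sample to a common depth. The function names and the toy counts are hypothetical.

```python
import random
from collections import Counter


def goods_coverage(otu_counts):
    """Good's coverage: 1 - (number of singleton OTUs / total reads)."""
    total = sum(otu_counts)
    if total == 0:
        raise ValueError("sample has no reads")
    singletons = sum(1 for c in otu_counts if c == 1)
    return 1.0 - singletons / total


def rarefy(otu_counts, depth, seed=0):
    """Subsample to `depth` reads without replacement; return new per-OTU counts."""
    reads = [otu for otu, c in enumerate(otu_counts) for _ in range(c)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    picked = Counter(random.Random(seed).sample(reads, depth))
    return [picked.get(otu, 0) for otu in range(len(otu_counts))]


sample = [50, 30, 10, 5, 1, 1, 1, 1, 1]   # toy OTU counts for one sample
print(goods_coverage(sample))              # 1 - 5/100
print(rarefy(sample, depth=40))
```

    Under these assumptions, a sample would pass the 90% threshold mentioned above when `goods_coverage` returns at least 0.9.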

  19. Enterovirus D68 in Hospitalized Children: Sequence Variation, Viral Loads and Clinical Outcomes

    PubMed Central

    Salamon, Douglas; Leber, Amy; Mejias, Asuncion

    2016-01-01

    Background An outbreak of enterovirus D68 (EV-D68) caused severe respiratory illness in 2014. The disease spectrum of EV-D68 infections in children with underlying medical conditions other than asthma, the role of EV-D68 loads on clinical illness, and the variation of EV-D68 strains within the same institution over time have not been described. We sought to define the association between EV-D68 loads and sequence variation, and the clinical characteristics of hospitalized children at our institution from 2011 to 2014. Methods From May through November 2014, and from August to September of 2011 to 2013, a convenience sample of nasopharyngeal specimens from children with rhinovirus (RV)/EV respiratory infections was tested for EV-D68 by RT-PCR. Clinical data were compared between children with RV/EV-non-EV-D68 and EV-D68 infections, and among children with EV-D68 infections categorized as healthy, asthmatic, or having chronic medical conditions. EV-D68 loads were analyzed in relation to disease severity parameters, and sequence variability was characterized over time. Results In 2014, 44% (192/438) of samples tested positive for EV-D68 vs. 10% (13/130) in 2011–13 (p<0.0001). PICU admissions (p<0.0001) and non-invasive ventilation (p<0.0001) were more common in children with EV-D68 vs. RV/EV-non-EV-D68 infections. Asthmatic EV-D68+ children required supplemental oxygen administration (p = 0.03) and PICU admissions (p<0.001) more frequently than healthy children or those with chronic medical conditions; however, oxygen duration (p<0.0001) and both PICU and total hospital stay (p<0.01) were greater in children with underlying medical conditions, irrespective of viral burden. By phylogenetic analysis, the 2014 EV-D68 strains clustered into a new sublineage within clade B. Conclusions This is one of the largest pediatric cohorts described from the EV-D68 outbreak. Irrespective of viral loads, EV-D68 was associated with high morbidity in children with asthma and co-morbidities. While EV-D68

  20. Genome-wide copy number variation in the bovine genome detected using low coverage sequence of popular beef breeds

    USDA-ARS's Scientific Manuscript database

    Genomic structural variations are an important source of genetic diversity. Copy number variations (CNVs), gains and losses of large regions of genomic sequence between individuals of a species, are known to be associated with both diseases and phenotypic traits. Deeply sequenced genomes are often u...

  1. Unique Features of Germline Variation in Five Egyptian Familial Breast Cancer Families Revealed by Exome Sequencing

    PubMed Central

    Kim, Yeong C.; Soliman, Amr S.; Cui, Jian; Ramadan, Mohamed; Hablas, Ahmed; Abouelhoda, Mohamed; Hussien, Nehal; Ahmed, Ola; Zekri, Abdel-Rahman Nabawy; Seifeldin, Ibrahim A.

    2017-01-01

    Genetic predisposition increases the risk of familial breast cancer. Recent studies indicate that genetic predisposition for familial breast cancer can be ethnic-specific. However, current knowledge of genetic predisposition for the disease is predominantly derived from Western populations. Using this existing information as the sole reference to judge the predisposition in non-Western populations is not adequate and can potentially lead to misdiagnosis. Efforts are required to collect genetic predisposition data from non-Western populations. The Egyptian population has high genetic variation, reflecting its divergent ethnic origins, and the incidence rate of familial breast cancer in Egypt is also higher than the rate in many other populations. Using whole exome sequencing, we investigated genetic predisposition in five Egyptian familial breast cancer families. No pathogenic variants in BRCA1, BRCA2 and other classical breast cancer-predisposition genes were present in these five families. Comparison of the genetic variants with those in Caucasian familial breast cancer showed that variants in the Egyptian families were more variable and heterogeneous than the variants in Caucasian families. Multiple damaging variants in genes of different functional categories were identified either in a single family or shared between families. Our study demonstrates that genetic predisposition in Egyptian breast cancer families may differ from that in other disease populations, and supports a comprehensive screening of local disease families to determine the genetic predisposition in Egyptian familial breast cancer. PMID:28076423

  2. HIV-1 Tat and Viral Latency: What We Can Learn from Naturally Occurring Sequence Variations.

    PubMed

    Kamori, Doreen; Ueno, Takamasa

    2017-01-01

    Despite the effective use of antiretroviral therapy, the persistence of a latently HIV-1-infected reservoir, mainly in the resting memory CD4+ T lymphocyte subset, remains a major setback to viral eradication. While host transcriptional silencing machinery is thought to play a dominant role in HIV-1 latency, HIV-1 proteins such as Tat may affect both the establishment and the reversal of latency. Indeed, mutational studies have demonstrated that insufficient Tat transactivation activity can result in impaired transcription of viral genes and the establishment of latency in cell culture experiments. Because Tat is one of the highly variable proteins within the HIV-1 proteome, it is conceivable that naturally occurring Tat mutations may differentially modulate Tat functions, thereby influencing the establishment and/or the reversal of viral latency in vivo. In this mini review, we summarize recent findings on naturally occurring Tat polymorphisms associated with host immune responses, and we highlight the implications of Tat sequence variations for HIV latency.

  3. HIV-1 Tat and Viral Latency: What We Can Learn from Naturally Occurring Sequence Variations

    PubMed Central

    Kamori, Doreen; Ueno, Takamasa

    2017-01-01

    Despite the effective use of antiretroviral therapy, the persistence of a latently HIV-1-infected reservoir, mainly in the resting memory CD4+ T lymphocyte subset, remains a major setback to viral eradication. While host transcriptional silencing machinery is thought to play a dominant role in HIV-1 latency, HIV-1 proteins such as Tat may affect both the establishment and the reversal of latency. Indeed, mutational studies have demonstrated that insufficient Tat transactivation activity can result in impaired transcription of viral genes and the establishment of latency in cell culture experiments. Because Tat is one of the highly variable proteins within the HIV-1 proteome, it is conceivable that naturally occurring Tat mutations may differentially modulate Tat functions, thereby influencing the establishment and/or the reversal of viral latency in vivo. In this mini review, we summarize recent findings on naturally occurring Tat polymorphisms associated with host immune responses, and we highlight the implications of Tat sequence variations for HIV latency. PMID:28194140

  4. Unique Features of Germline Variation in Five Egyptian Familial Breast Cancer Families Revealed by Exome Sequencing.

    PubMed

    Kim, Yeong C; Soliman, Amr S; Cui, Jian; Ramadan, Mohamed; Hablas, Ahmed; Abouelhoda, Mohamed; Hussien, Nehal; Ahmed, Ola; Zekri, Abdel-Rahman Nabawy; Seifeldin, Ibrahim A; Wang, San Ming

    2017-01-01

    Genetic predisposition increases the risk of familial breast cancer. Recent studies indicate that genetic predisposition for familial breast cancer can be ethnic-specific. However, current knowledge of genetic predisposition for the disease is predominantly derived from Western populations. Using this existing information as the sole reference to judge the predisposition in non-Western populations is not adequate and can potentially lead to misdiagnosis. Efforts are required to collect genetic predisposition data from non-Western populations. The Egyptian population has high genetic variation, reflecting its divergent ethnic origins, and the incidence rate of familial breast cancer in Egypt is also higher than the rate in many other populations. Using whole exome sequencing, we investigated genetic predisposition in five Egyptian familial breast cancer families. No pathogenic variants in BRCA1, BRCA2 and other classical breast cancer-predisposition genes were present in these five families. Comparison of the genetic variants with those in Caucasian familial breast cancer showed that variants in the Egyptian families were more variable and heterogeneous than the variants in Caucasian families. Multiple damaging variants in genes of different functional categories were identified either in a single family or shared between families. Our study demonstrates that genetic predisposition in Egyptian breast cancer families may differ from that in other disease populations, and supports a comprehensive screening of local disease families to determine the genetic predisposition in Egyptian familial breast cancer.

  5. Characterization of a highly repeated DNA sequence family in five species of the genus Eulemur.

    PubMed

    Ventura, M; Boniotto, M; Cardone, M F; Fulizio, L; Archidiacono, N; Rocchi, M; Crovella, S

    2001-09-19

    The karyotypes of Eulemur species exhibit a high degree of variation as a consequence of Robertsonian fusion and/or centromere fission. Centromeric and pericentromeric heterochromatin of eulemurs consists of highly repeated DNA sequences (including some telomeric TTAGGG repeats), which have so far been investigated and used for the study of the systematic relationships of the different species of the genus Eulemur. In our study, we have cloned a set of repetitive pericentromeric sequences of five Eulemur species: E. fulvus fulvus (EFU), E. mongoz (EMO), E. macaco (EMA), E. rubriventer (ERU), and E. coronatus (ECO). We have characterized these clones by sequence comparison and by comparative fluorescence in situ hybridization analysis in EMA and EFU. Our results showed a high degree of sequence similarity among Eulemur species, indicating a strong conservation, within the five species, of these pericentromeric highly repeated DNA sequences.

  6. Biochemical and in vitro biological significance of natural sequence variation in the ovine leptin gene.

    PubMed

    Reicher, Shay; Gertler, Arieh; Seroussi, Eyal; Shpilman, Michal; Gootwine, Elisha

    2011-08-01

    The hormone leptin is involved in diverse biological processes, including regulation of food intake, body-weight homeostasis and energy balance. Sequence variation in the bovine leptin gene has been found to be associated with variations in carcass fat content and average daily gain, as well as in milk yield, milk somatic cell count and several traits governing reproduction. We sequenced genomic DNA and cDNA samples of individuals from three divergent sheep breeds and revealed synonymous as well as novel non-synonymous allelic variation at the third exon of the ovine leptin gene (oLEP) as compared to the sequence published at Accession No. U84247 (reference sequence). In addition, two alternatively spliced oLEP transcripts were found in the abdominal fat tissue. The biochemical and the in vitro biological significance of the sequence variation in the oLEP was examined by generating recombinant oLEP-protein variants namely: p.Q28del, p.N78S, p.R84Q, p.P99Q, p.V123L and p.R138Q, carrying the corresponding sequence variation. Surface plasmon resonance experiments revealed, in most cases, reduced affinity of the oLEP protein variants examined, to human leptin-binding domain (hLBD), relative to the reference variant, being 0.75, 0.60, 0.60, 0.89, 0.92 and 1.03, respectively. In competitive binding assays between biotinylated oLEP and the recombinant leptin protein variants, p.N78S and p.R84Q variants exhibited the lowest affinity to hLBD (0.18 and 0.41, respectively) as compared to the reference hormone. We then tested the protein variants' ability to induce proliferation in Baf-3 cells stably expressing the long form of the human leptin receptor: significant differences in proliferative activity were only found for p.N78S (1.8-fold higher) and p.R138Q (4.2-fold lower) relative to the reference oLEP variant. Copyright © 2011 Elsevier Inc. All rights reserved.

  7. Analysis of sequence variations in several human genes using phosphoramidite bond DNA fragmentation and chip-based MALDI-TOF.

    PubMed

    Smylie, Kevin J; Cantor, Charles R; Denissenko, Mikhail F

    2004-01-01

    The challenge in the postgenome era is to measure sequence variations over large genomic regions in numerous patient samples. This massive amount of work can only be completed if more accurate, cost-effective, and high-throughput solutions become available. Here we describe a novel DNA fragmentation approach for single nucleotide polymorphism (SNP) discovery and sequence validation. The base-specific cleavage is achieved by creating primer extension products, in which acid-labile phosphoramidite (P-N) bonds replace the 5' phosphodiester bonds of newly incorporated pyrimidine nucleotides. Sequence variations are detected by hydrolysis of this acid-labile bond and MALDI-TOF analysis of the resulting fragments. In this study, we developed a robust protocol for P-N-bond fragmentation and investigated additional ways to improve its sensitivity and reproducibility. We also present the analysis of several human genomic targets ranging from 100-450 bp in length. By using a semiautomated sample processing protocol, we investigated an array of SNPs within a 240-bp segment of the NFKBIA gene in 48 human DNA samples. We identified and measured frequencies for the two common SNPs in the 3'UTR of NFKBIA (separated by 123 bp) and then confirmed these values in an independent genotyping experiment. The calculated allele frequencies in white and African American groups differed significantly, yet both fit Hardy-Weinberg expectations. This demonstrates the utility and effectiveness of PN-bond DNA fragmentation and subsequent MALDI-TOF MS analysis for the high-throughput discovery and measurement of sequence variations in fragments up to 0.5 kb in length in multiple human blood DNA samples.
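
    The abstract states that allele frequencies fit Hardy-Weinberg expectations but does not describe the test. A minimal sketch, assuming a standard chi-square goodness-of-fit test with one degree of freedom for a biallelic SNP; the genotype counts below are invented for illustration and are not data from the study.

```python
def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Return (frequency of allele A, chi-square statistic) for a biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, chi2


# Hypothetical genotype counts for 48 samples (AA, AB, BB).
p, chi2 = hardy_weinberg_chi2(27, 18, 3)
print(f"allele frequency p = {p:.3f}, chi2 = {chi2:.3f}")
# 3.841 is the chi-square critical value for 1 degree of freedom at alpha = 0.05.
print("consistent with HWE" if chi2 < 3.841 else "departs from HWE")
```

    With small samples an exact test is often preferred; the chi-square form is used here only because it needs nothing beyond the standard library.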

  8. Characterization of the Polish Primitive Horse (Konik) maternal lines using mitochondrial D-loop sequence variation.

    PubMed

    Cieslak, Jakub; Wodas, Lukasz; Borowska, Alicja; Cothran, Ernest G; Khanshour, Anas M; Mackowski, Mariusz

    2017-01-01

    The Polish Primitive Horse (PPH, Konik) is a Polish native horse breed managed through a conservation program mainly due to its characteristic phenotype of a primitive horse. One of the most important goals of PPH breeding strategy is the preservation and equal development of all existing maternal lines. However, until now there was no investigation into the real genetic diversity of 16 recognized PPH dam lines using mtDNA sequence variation. Herein, we describe the phylogenetic relationships between the PPH maternal lines based upon partial mtDNA D-loop sequencing of 173 individuals. Altogether, 19 mtDNA haplotypes were detected in the PPH population. Five haplotypes were putatively novel while the remaining 14 showed 100% homology with sequences deposited in the GenBank database, represented by both modern and primitive horse breeds. Generally, comparisons found the haplotypes conformed to 10 different recognized mtDNA haplogroups (A, B, E, G, J, M, N, P, Q and R). A multi-breed analysis has indicated the phylogenetic similarity of PPH and other indigenous horse breeds derived from various geographical regions (e.g., Iberian Peninsula, Eastern Europe and Siberia) which may support the hypothesis that within the PPH breed numerous ancestral haplotypes (found all over the world) are still present. Only in the case of five maternal lines (Bona, Dzina I, Geneza, Popielica and Zaza) was the segregation of one specific mtDNA haplotype observed. The 11 remaining lines showed a higher degree of mtDNA haplotype variability (2-5 haplotypes segregating in each line). This study has revealed relatively high maternal genetic diversity in the small, indigenous PPH breed (19 haplotypes, overall HapD = 0.92). However, only some traditionally distinguished maternal lines can be treated as genetically pure. The rest show evidence of numerous mistakes recorded in the official PPH pedigrees. This study has proved the importance of maternal genetic diversity monitoring based upon
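
    The abstract reports an overall haplotype diversity (HapD) of 0.92 without giving the formula. The sketch below assumes Nei's unbiased estimator, h = n/(n-1) * (1 - sum of squared haplotype frequencies), which is a common choice; the haplotype labels and counts are hypothetical and are not the PPH data.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


# Toy sample: one haplotype label per individual (hypothetical).
sample = ["H1"] * 4 + ["H2"] * 3 + ["H3"] * 2 + ["H4"]
print(round(haplotype_diversity(sample), 3))
```

    The value approaches 1 when many haplotypes occur at similar frequencies and is 0 when every individual carries the same haplotype, which is the pattern expected within a genetically pure maternal line.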

  9. Generating barcoded libraries for multiplex high-throughput sequencing.

    PubMed

    Knapp, Michael; Stiller, Mathias; Meyer, Matthias

    2012-01-01

    Molecular barcoding is an essential tool to use the high throughput of next generation sequencing platforms optimally in studies involving more than one sample. Various barcoding strategies allow for the incorporation of short recognition sequences (barcodes) into sequencing libraries, either by ligation or polymerase chain reaction (PCR). Here, we present two approaches optimized for generating barcoded sequencing libraries from low copy number extracts and amplification products typical of ancient DNA studies.
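
    To show why barcodes let several samples share one sequencing run, the sketch below sorts reads back into samples by an exact match on a leading barcode. The barcode sequences, sample names and reads are invented, and real pipelines usually also tolerate sequencing errors in the barcode, which this sketch does not.

```python
def demultiplex(reads, barcodes):
    """Assign each read to a sample by exact match of its leading barcode.

    `barcodes` maps barcode sequence -> sample name. The barcode is trimmed
    from assigned reads; reads with no matching barcode go under the key None.
    """
    bins = {name: [] for name in barcodes.values()}
    bins[None] = []
    for read in reads:
        for barcode, name in barcodes.items():
            if read.startswith(barcode):
                bins[name].append(read[len(barcode):])
                break
        else:
            bins[None].append(read)
    return bins


# Hypothetical barcodes and reads.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTGGGTTT", "TGCAAAACCC", "GGGGACGTAA"]
print(demultiplex(reads, barcodes))
```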

  10. Magnetic susceptibility variations in Loess sequences and their relationship to astronomical forcing

    NASA Technical Reports Server (NTRS)

    Verosub, Kenneth L.; Singer, Michael J.

    1992-01-01

    The long, well-exposed and often continuous sequences of loess found throughout the world are generally thought to provide an excellent opportunity for studying long-term, large-scale environmental change during the last few million years. In recent years, the most fruitful loess studies have been those involving the deposits of the loess in China. One of the most intriguing results of that work has been the discovery of an apparent correlation between variations in the magnetic susceptibility of the loess sequence and the oxygen isotope record of the deep sea. This correlation implies that magnetic susceptibility variations are being driven by astronomical parameters. However, the basic data have been interpreted in various ways by different authors, most of whom assumed that the magnetic minerals in the loess have not been affected by post-depositional processes. Using a chemical extraction procedure that allows us to separate the contribution of secondary pedogenic magnetic minerals from primary inherited magnetic minerals, we have found that the magnetic susceptibility of the Chinese paleosols is largely due to a pedogenic component which is present to a lesser degree in the loess. We have also found that the smaller inherited component of the magnetic susceptibility is about the same in the paleosols and the loess. These results demonstrate the need for additional study of the processes that create magnetic susceptibility variations in order to interpret properly the role of astronomical forcing in producing these variations.

  11. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase.

    PubMed Central

    Clark, A G; Weiss, K M; Nickerson, D A; Taylor, S L; Buchanan, A; Stengård, J; Salomaa, V; Vartiainen, E; Perola, M; Boerwinkle, E; Sing, C F

    1998-01-01

    Allelic variation in 9.7 kb of genomic DNA sequence from the human lipoprotein lipase gene (LPL) was scored in 71 healthy individuals (142 chromosomes) from three populations: African Americans (24) from Jackson, MS; Finns (24) from North Karelia, Finland; and non-Hispanic Whites (23) from Rochester, MN. The sequences had a total of 88 variable sites, with a nucleotide diversity (site-specific heterozygosity) of 0.002 ± 0.001 across this 9.7-kb region. The frequency spectrum of nucleotide variation exhibited a slight excess of heterozygosity, but, in general, the data fit expectations of the infinite-sites model of mutation and genetic drift. Allele-specific PCR helped resolve linkage phases, and a total of 88 distinct haplotypes were identified. For 1,410 (64%) of the 2,211 site pairs, all four possible gametes were present in these haplotypes, reflecting a rich history of past recombination. Despite the strong evidence for recombination, extensive linkage disequilibrium was observed. The number of haplotypes generally is much greater than the number expected under the infinite-sites model, but there was sufficient multisite linkage disequilibrium to reveal two major clades, which appear to be very old. Variation in this region of LPL may depart from the variation expected under a simple, neutral model, owing to complex historical patterns of population founding, drift, selection, and recombination. These data suggest that the design and interpretation of disease-association studies may not be as straightforward as often is assumed. PMID:9683608
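
    The count of site pairs showing all four gametes can be sketched as a four-gamete test: for each pair of biallelic sites, collect the allele combinations seen across haplotypes and check whether all four occur. The encoding of alleles as '0' and '1' and the toy haplotypes are assumptions made for the example; the abstract does not describe the authors' implementation.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes):
    """Count pairs of biallelic sites that show all four gametes.

    `haplotypes` is a list of equal-length strings of '0'/'1' alleles.
    Returns (pairs_with_four_gametes, total_pairs).
    """
    n_sites = len(haplotypes[0])
    hits = total = 0
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        total += 1
        if len(gametes) == 4:
            hits += 1
    return hits, total


# Toy haplotypes over three sites (hypothetical).
toy = ["000", "010", "100", "110", "111"]
print(four_gamete_pairs(toy))
```

    Under the infinite-sites model without recombination, a pair of sites cannot show all four gametes, which is why a high fraction of such pairs is read as evidence of past recombination.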

  12. Impact of next generation sequencing: the 2009 Human Genome Variation Society Scientific Meeting.

    PubMed

    Oetting, William S

    2010-04-01

    The annual scientific meeting of the Human Genome Variation Society (HGVS) was held on the 20th of October, 2009, in Honolulu, Hawaii. The theme of this meeting was the "Impact of Next Generation Sequencing." Presenters spoke on issues ranging from advances in the technology of large-scale genome sequencing to how this information can be analyzed to uncover genetic variants associated with disease. Many of the challenges resulting from the implementation of these new technologies were presented, but possible solutions, or at least paths to the solutions, were also given. With the combined efforts of investigators using next-generation sequencing to help understand the impact of genetic variants on disease, the use of the personal genome in medicine will soon become a reality.

  13. Population subdivision in Europe's great bustard inferred from mitochondrial and nuclear DNA sequence variation.

    PubMed

    Pitra, C; Lieckfeldt, D; Alonso, J C

    2000-08-01

    A continent-wide survey of sequence variation in mitochondrial (mt) and nuclear (n) DNA of the endangered great bustard (Otis tarda) was conducted to assess the extent of phylogeographic structure in a morphologically monotypic bird. DNA sequence variation in a combined 809 bp segment of the mtDNA genome from 66 individuals from the last six breeding regions showed relatively low levels of intraspecific sequence diversity (π = 0.32%) but significant differences in the regional distribution of 11 haplotypes (ΦST = 0.49). Despite their exceptional potential for dispersal, a complete and long-term historical separation between the populations from the Iberian Peninsula (Spain) and mainland Europe (Hungary, Slovakia, Germany, and Russia) was demonstrated. Divergence between populations based on a 3-bp insertion-deletion polymorphism within the intron region of the nuclear CHD-Z gene was geographically concordant with the primary subdivision identified within the mtDNA sequences. Inferred aspects of phylogeography were used to formulate conservation recommendations for this endangered species.

  14. Analysis of amino acid sequence variations and immunoglobulin E-binding epitopes of German cockroach tropomyosin.

    PubMed

    Jeong, Kyoung Yong; Lee, Jongweon; Lee, In-Yong; Ree, Han-Il; Hong, Chein-Soo; Yong, Tai-Soon

    2004-09-01

    The allergenicities of tropomyosins from different organisms have been reported to vary. The cDNA encoding German cockroach tropomyosin (Bla g 7) was isolated, expressed, and characterized previously. In the present study, the amino acid sequence variations in German cockroach tropomyosin were analyzed in order to investigate its influence on allergenicity. We also undertook the identification of immunodominant peptides containing immunoglobulin E (IgE) epitopes which may facilitate the development of diagnostic and immunotherapeutic strategies based on the recombinant proteins. Two-dimensional gel electrophoresis and immunoblot analysis with mouse anti-recombinant German cockroach tropomyosin serum was performed to investigate the isoforms at the protein level. Reverse transcriptase PCR (RT-PCR) was applied to examine the sequence diversity. Eleven different variants of the deduced amino acid sequences were identified by RT-PCR. German cockroach tropomyosin has only minor sequence variations that did not seem to affect its allergenicity significantly. These results support the molecular basis underlying the cross-reactivities of arthropod tropomyosins. Recombinant fragments were also generated by PCR, and IgE-binding epitopes were assessed by enzyme-linked immunosorbent assay. Sera from seven patients revealed heterogeneous IgE-binding responses. This study demonstrates multiple IgE-binding epitope regions in a single molecule, suggesting that full-length tropomyosin should be used for the development of diagnostic and therapeutic reagents.

  15. Genetic variation in and spatial structure of natural populations of Dipterocarpus alatus (Dipterocarpaceae) determined using single sequence repeat markers.

    PubMed

    Tam, N M; Duy, V D; Duc, N M; Giap, V D; Xuan, B T T

    2014-07-24

    Dipterocarpus alatus (Dipterocarpaceae) is widely distributed in lowland forests in central and southern Vietnam, Cambodia, Laos, Myanmar, Philippines, Thailand, and India. Due to over-exploitation and habitat destruction, the species is now threatened. The genetic variation within and among populations of D. alatus was investigated on the basis of 9 microsatellite (single sequence repeat, SSR) loci. In all, 268 sampled trees from 10 populations in central and southern Vietnam were analyzed in this study. The SSR data showed a high genetic variability within populations with an average of HO = 0.209 and HE = 0.239. Genetic differentiation among populations was high (FST = 0.266), indicating limited gene flow (Nm = 0.69). Analysis of molecular variance showed that most genetic variation was within populations (74.96%). This study highlights the importance of conserving the genetic resources of D. alatus species.
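
    The reported gene flow estimate is numerically consistent with Wright's island-model approximation, Nm = (1 - FST) / (4 FST), although the abstract does not state which formula was used. The short check below applies that approximation to the reported FST; the function name is hypothetical.

```python
def gene_flow_from_fst(fst):
    """Wright's island-model approximation: Nm = (1 - FST) / (4 * FST)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)


# FST = 0.266 is the value reported in the abstract.
print(round(gene_flow_from_fst(0.266), 2))  # 0.69
```

    A value of Nm below 1 is conventionally read as less than one effective migrant per generation, which matches the abstract's description of limited gene flow.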

  16. Sequence and expression variations suggest an adaptive role for the DA1-like gene family in the evolution of soybeans.

    PubMed

    Zhao, Man; Gu, Yongzhe; He, Lingli; Chen, Qingshan; He, Chaoying

    2015-05-15

    The DA1 gene family is plant-specific and Arabidopsis DA1 regulates seed and organ size, but the functions in soybeans are unknown. The cultivated soybean (Glycine max) is believed to be domesticated from the annual wild soybeans (Glycine soja). To evaluate whether DA1-like genes were involved in the evolution of soybeans, we compared variation at both sequence and expression levels of DA1-like genes from G. max (GmaDA1) and G. soja (GsoDA1). Sequence identities were extremely high between the orthologous pairs between soybeans, while the paralogous copies in a soybean species showed a relatively high divergence. Moreover, the expression variation of DA1-like paralogous genes in soybean was much greater than the orthologous gene pairs between the wild and cultivated soybeans during development and challenging abiotic stresses such as salinity. We further found that overexpressing GsoDA1 genes did not affect seed size. Nevertheless, overexpressing them reduced transgenic Arabidopsis seed germination sensitivity to salt stress. Moreover, most of these genes could improve salt tolerance of the transgenic Arabidopsis plants, corroborated by a detection of expression variation of several key genes in the salt-tolerance pathways. Our work suggested that expression diversification of DA1-like genes is functionally associated with adaptive radiation of soybeans, reinforcing that the plant-specific DA1 gene family might have contributed to the successful adaption to complex environments and radiation of the plants.

  17. mit-o-matic: a comprehensive computational pipeline for clinical evaluation of mitochondrial variations from next-generation sequencing datasets.

    PubMed

    Vellarikkal, Shamsudheen Karuthedath; Dhiman, Heena; Joshi, Kandarp; Hasija, Yasha; Sivasubbu, Sridhar; Scaria, Vinod

    2015-04-01

    The human mitochondrial genome has been reported to have a very high mutation rate as compared with the nuclear genome. A large number of mitochondrial mutations show significant phenotypic association and are involved in a broad spectrum of diseases. In recent years, there has been remarkable progress in the understanding of mitochondrial genetics. The availability of next-generation sequencing (NGS) technologies has not only reduced sequencing cost by orders of magnitude but has also provided good-quality mitochondrial genome sequences with high coverage, thereby enabling decoding of a number of human mitochondrial diseases. In this study, we report a computational and experimental pipeline to decipher human mitochondrial DNA variations and examine them for their clinical correlation. As a proof of principle, we also present a clinical study of a patient with Leigh disease and confirm maternal inheritance of the causative allele. The pipeline is made available as a user-friendly online tool to annotate variants and find haplogroup, disease association, and heteroplasmic sites. The "mit-o-matic" computational pipeline represents a comprehensive cloud-based tool for clinical evaluation of mitochondrial genomic variations from NGS datasets. The tool is freely available at http://genome.igib.res.in/mitomatic/. © 2015 WILEY PERIODICALS, INC.

  18. High-throughput sequencing and vaccine design.

    PubMed

    Luciani, F

    2016-04-01

    Next-generation sequencing (NGS) technologies have reshaped genome research. The resulting increase in sequencing depth and resolution has led to an unprecedented level of genomic detail and thus an increasing awareness of the complexity of animal, human and pathogen genomes. This has resulted in new approaches to vaccine research. On the one hand, the increase in genome complexity challenges our ability to study and understand pathogen biology and pathogen-host interactions. On the other hand, the increase in genomic data also provides key information for developing and designing improved vaccines against pathogens that were previously extremely difficult to deal with, such as rapidly mutating RNA viruses or bacteria that have complex interactions with the host immune system. This review describes how the broad application of NGS technologies to genome research is affecting vaccine research. It focuses on implications for the field of viral genomics, and includes recent animal and human studies.

  19. Simple sequence repeat variations expedite phage divergence: Mechanisms of indels and gene mutations.

    PubMed

    Lin, Tiao-Yin

    2016-07-01

    Phages are the most abundant biological entities and influence prokaryotic communities on Earth. Comparing closely related genomes sheds light on molecular events shaping phage evolution. Simple sequence repeat (SSR) variations impart over half of the genomic changes between T7M and T3, indicating an important role of SSRs in accelerating phage genetic divergence. Differences in coding and noncoding regions of phages infecting different hosts, coliphages T7M and T3, Yersinia phage ϕYeO3-12, and Salmonella phage ϕSG-JL2, frequently arise from SSR variations. Such variations modify noncoding and coding regions; the latter efficiently changes multiple amino acids, thereby hastening protein evolution. Four classes of events are found to drive SSR variations: insertion/deletion of SSR units, expansion/contraction of SSRs without alteration of genome length, changes of repeat motifs, and generation/loss of repeats. The categorization demonstrates the ways SSRs mutate in genomes during phage evolution. Indels are common constituents of genome variations and human diseases, yet, how they occur without preexisting repeat sequence is less understood. Non-repeat-unit-based misalignment-elongation (NRUBME) is proposed to be one mechanism for indels without adjacent repeats. NRUBME or consecutive NRUBME may also change repeat motifs or generate new repeats. NRUBME invoking a non-Watson-Crick base pair explains insertions that initiate mononucleotide repeats. Furthermore, NRUBME successfully interprets many inexplicable human di- to tetranucleotide repeat generations. This study provides the first evidence of SSR variations expediting phage divergence, and enables insights into the events and mechanisms of genome evolution. NRUBME allows us to emulate natural evolution to design indels for various applications.

  20. Distinct intraspecific variations of garlic (Allium sativum L.) revealed by the exon-intron sequences of the alliinase gene.

    PubMed

    Endo, Aki; Imai, Yukiko; Nakamura, Mizuho; Yanagisawa, Eri; Taguchi, Takaaki; Torii, Kosuke; Okumura, Hidenobu; Ichinose, Koji

    2014-04-01

    Garlic (Allium sativum L.) has been used worldwide as a food and for medicinal purposes since early times. Garlic cultivars exhibit considerable morphological diversity despite the fact that they are mostly sterile and are grown only by vegetative propagation of cloves. Considerable recombination occurs in garlic genomes, including the genes involved in secondary metabolites. We examined the genomic DNAs (gDNAs) from garlic, encoding alliinase, a key enzyme involved in organosulfur metabolism in Allium plants. The 1.7-kb gDNA fragments, covering three exons (2, 3, and 4) and all four introns, were amplified from total DNAs prepared from garlic samples produced in Asia and Europe, leading to 73 sequences in total: Japan (JPN), China (CHN), India (IND), Spain (ESP), and France (FRA). The exon sequences were highly conserved among all the sequences, probably reflecting the fully functional alliinase associated with the flavor quality. Distinct intraspecific variations were detected for all four intron sequences, leading to the haplotype classifications. A close relationship between JPN and CHN was observed for all four introns, whereas IND showed a more divergent distribution. ESP and FRA afforded clearly different variants compared with those from Asian sequences. The present study provides information that could be useful in the development of an additional molecular marker for garlic authentication and quality control.

  1. Phylogeography of the endangered Cathaya argyrophylla (Pinaceae) inferred from sequence variation of mitochondrial and nuclear DNA.

    PubMed

    Wang, Hong-Wei; Ge, Song

    2006-11-01

    Cathaya argyrophylla is an endangered conifer restricted to subtropical mountains of China. To study the phylogeographical pattern and demographic history of C. argyrophylla, species-wide genetic variation was investigated using sequences of maternally inherited mtDNA and biparentally inherited nuclear DNA. Of 15 populations sampled from all four distinct regions, only three mitotypes were detected at two loci, with no single region having a mixed composition (G(ST) = 1). Average nucleotide diversity (theta(ws) = 0.0024; pi(s) = 0.0029) across eight nuclear loci is significantly lower than those found for other conifers (theta(ws) = 0.003-0.015; pi(s) = 0.002-0.012) based on estimates of multiple loci. Because it had the highest diversity among the eight nuclear loci and was evolving neutrally, one locus (2009) was further used for phylogeographical studies, and eight haplotypes resulting from 12 polymorphic sites were obtained from 98 individuals. All four distinct regions had at least four haplotypes, with the Dalou region (DL) having the highest diversity and the Bamian region (BM) the lowest, paralleling the result of the eight nuclear loci. An AMOVA revealed a significant proportion of diversity attributable to differences among regions (13.4%) and among populations within regions (8.9%). F(ST) analysis also indicated significantly high differentiation among populations (F(ST) = 0.22) and between regions (F(ST) = 0.12-0.38). Non-overlapping distribution of mitotypes and high genetic differentiation among the distinct geographical groups suggest the existence of at least four separate glacial refugia. Based on network and mismatch distribution analyses, we do not find evidence of long distance dispersal and population expansion in C. argyrophylla. Ex situ conservation and artificial crossing are recommended for the management of this endangered species.
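
    The nucleotide diversity (pi) values quoted above are averages of pairwise differences per site. As a minimal sketch of that statistic, assuming a small set of already aligned, equal-length sequences (the toy sequences and the function name `nucleotide_diversity` are illustrative and not taken from the study, which reports diversity at silent sites across eight loci):

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average proportion of differing sites over all pairs of aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical toy alignment: 4 sequences, 10 sites.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTAT"]
pi = nucleotide_diversity(toy_alignment)
print(f"pi = {pi:.4f}")  # 7 pairwise differences / (6 pairs * 10 sites)
```

    This unweighted version treats every sampled sequence as one observation; estimators used in practice typically also handle gaps and restrict the count to a chosen class of sites.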

  2. Population clustering based on copy number variations detected from next generation sequencing data

    PubMed Central

    Duan, Junbo; Zhang, Ji-Gang; Wan, Mingxi; Deng, Hong-Wen; Wang, Yu-Ping

    2015-01-01

    Copy number variations (CNVs) can be used as significant bio-markers and next generation sequencing (NGS) provides high-resolution detection of these CNVs. But how to extract features from CNVs and further apply them to genomic studies such as population clustering has become a big challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into the source matrix and weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs from each sample. Therefore, using NMF of CNVs one can differentiate samples from different ethnic groups, i.e. population clustering. To validate the approach, we applied it to the analysis of both simulation data and two real data sets from the 1000 Genomes Project. The results on simulation data demonstrate that the proposed method can recover the true common CNVs with high quality. The results on the first real data analysis show that the proposed method can cluster two family trios with different ancestries into two ethnic groups and the results on the second real data analysis show that the proposed method can be applied to the whole-genome with large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering. PMID:25152046

  3. Germline variation in cancer-susceptibility genes in a healthy, ancestrally diverse cohort: implications for individual genome sequencing.

    PubMed

    Bodian, Dale L; McCutcheon, Justine N; Kothiyal, Prachi; Huddleston, Kathi C; Iyer, Ramaswamy K; Vockley, Joseph G; Niederhuber, John E

    2014-01-01

    Technological advances coupled with decreasing costs are bringing whole genome and whole exome sequencing closer to routine clinical use. One of the hurdles to clinical implementation is the high number of variants of unknown significance. For cancer-susceptibility genes, the difficulty in interpreting the clinical relevance of the genomic variants is compounded by the fact that most of what is known about these variants comes from the study of highly selected populations, such as cancer patients or individuals with a family history of cancer. The genetic variation in known cancer-susceptibility genes in the general population has not been well characterized to date. To address this gap, we profiled the nonsynonymous genomic variation in 158 genes causally implicated in carcinogenesis using high-quality whole genome sequences from an ancestrally diverse cohort of 681 healthy individuals. We found that all individuals carry multiple variants that may impact cancer susceptibility, with an average of 68 variants per individual. Of the 2,688 allelic variants identified within the cohort, most are very rare, with 75% found in only 1 or 2 individuals in our population. Allele frequencies vary between ancestral groups, and there are 21 variants for which the minor allele in one population is the major allele in another. Detailed analysis of a selected subset of 5 clinically important cancer genes, BRCA1, BRCA2, KRAS, TP53, and PTEN, highlights differences between germline variants and reported somatic mutations. The dataset can serve as a resource of genetic variation in cancer-susceptibility genes in 6 ancestry groups, an important foundation for the interpretation of cancer risk from personal genome sequences.

  4. Cis-regulatory sequence variation and association with Mycoplasma load in natural populations of the house finch (Carpodacus mexicanus)

    PubMed Central

    Backström, Niclas; Shipilina, Daria; Blom, Mozes P K; Edwards, Scott V

    2013-01-01

    Characterization of the genetic basis of fitness traits in natural populations is important for understanding how organisms adapt to the changing environment and to novel events, such as epizootics. However, candidate fitness-influencing loci, such as regulatory regions, are usually unavailable in nonmodel species. Here, we analyze sequence data from targeted resequencing of the cis-regulatory regions of three candidate genes for disease resistance (CD74, HSP90α, and LCP1) in populations of the house finch (Carpodacus mexicanus) historically exposed (Alabama) and naïve (Arizona) to Mycoplasma gallisepticum. Our study, the first to quantify variation in regulatory regions in wild birds, reveals that the upstream regions of CD74 and HSP90α are GC-rich, with the former exhibiting unusually low sequence variation for this species. We identified two SNPs, located in a GC-rich region immediately upstream of an inferred promoter site in the gene HSP90α, that were significantly associated with Mycoplasma pathogen load in the two populations. The SNPs are closely linked and situated in potential regulatory sequences: one in a binding site for the transcription factor nuclear NFYα and the other in a dinucleotide microsatellite ((GC)6). The genotype associated with pathogen load in the putative NFYα binding site was significantly overrepresented in the Alabama birds. However, we did not see strong effects of selection at this SNP, perhaps because selection has acted on standing genetic variation over an extremely short time in a highly recombining region. Our study is a useful starting point to explore functional relationships between sequence polymorphisms, gene expression, and phenotypic traits, such as pathogen resistance that affect fitness in the wild. PMID:23532859

  5. Sequencing strategy for the whole mitochondrial genome resulting in high quality sequences

    PubMed Central

    Fendt, Liane; Zimmermann, Bettina; Daniaux, Martin; Parson, Walther

    2009-01-01

    Background: It has been demonstrated that a reliable and fail-safe sequencing strategy is mandatory for high-quality analysis of mitochondrial (mt) DNA, as the sequencing and base-calling process is prone to error. Here, we present a high-quality, reliable and easy-to-handle manual procedure for the sequencing of full mt genomes that is also appropriate for laboratories where fully automated processes are not available. Results: We amplified whole mitochondrial genomes as two overlapping PCR fragments, each about 8500 bases in length. We developed a set of 96 primers that can be applied to a (manual) 96 well-based technology, which resulted in at least double strand sequence coverage of the entire coding region (codR). Conclusion: This elaborated sequencing strategy is straightforward and allows for an unambiguous sequence analysis and interpretation including sometimes challenging phenomena such as point and length heteroplasmy that are relevant for the investigation of forensic and clinical samples. PMID:19331681

  6. Inferring the conformation of RNA base pairs and triples from patterns of sequence variation.

    PubMed

    Gautheret, D; Gutell, R R

    1997-04-15

    The success of comparative analysis in resolving RNA secondary structure and numerous tertiary interactions relies on the presence of base covariations. Although the majority of base covariations in aligned sequences is associated with Watson-Crick base pairs, many involve non-canonical or restricted base pair exchanges (e.g. only G:C/A:U), reflecting more specific structural constraints. We have developed a computer program that determines potential base pairing conformations for a given set of paired nucleotides in a sequence alignment. This program (ISOPAIR) assumes that the base pair conformation is maintained through sequence variation without significantly affecting the path of the sugar-phosphate backbone. ISOPAIR identifies such 'isomorphic' structures for any set of input base pair or base triple sequences. The program was applied to base pairs and triples with known structures and sequence exchanges. In several instances, isomorphic structures were correctly identified with ISOPAIR. Thus, ISOPAIR is useful when assessing non-canonical base pair conformations in comparative analysis. ISOPAIR applications are limited to those cases where unusual base pair exchanges indeed reflect a non-canonical conformation.

  7. PSCC: Sensitive and Reliable Population-Scale Copy Number Variation Detection Method Based on Low Coverage Sequencing

    PubMed Central

    Vogel, Ida; Choy, Kwong Wai; Chen, Fang; Christensen, Rikke; Zhang, Chunlei; Ge, Huijuan; Jiang, Haojun; Yu, Chang; Huang, Fang; Wang, Wei; Jiang, Hui; Zhang, Xiuqing

    2014-01-01

    Background: Copy number variations (CNVs) represent an important type of genetic variation that deeply impacts phenotypic polymorphisms and human diseases. The advent of high-throughput sequencing technologies provides an opportunity to revolutionize the discovery of CNVs and to explore their relationship with diseases. However, most of the existing methods depend on sequencing depth and show instability with low sequence coverage. In this study, using low coverage whole-genome sequencing (LCS) we have developed an effective population-scale CNV calling (PSCC) method. Methodology/Principal Findings: In our novel method, two-step correction was used to remove biases caused by local GC content and complex genomic characteristics. We chose a binary segmentation method to locate CNV segments and designed combined statistics tests to ensure the stable performance of the false positive control. The simulation data showed that our PSCC method could achieve 99.7%/100% and 98.6%/100% sensitivity and specificity for over 300 kb CNV calling in the condition of LCS (∼2×) and ultra LCS (∼0.2×), respectively. Finally, we applied this novel method to analyze 34 clinical samples with an average of 2× LCS. In the final results, all 31 pathogenic CNVs identified by aCGH were successfully detected. In addition, the performance comparison revealed that our method had significant advantages over existing methods using ultra LCS. Conclusions/Significance: Our study showed that PSCC can sensitively and reliably detect CNVs using low coverage or even ultra-low coverage data through population-scale sequencing. PMID:24465483
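
    The abstract names binary segmentation as the step that locates CNV segments. The sketch below shows a generic recursive binary segmentation on a vector of per-bin copy ratios. It is not the PSCC implementation: the squared-error cost, the `min_gain` stopping threshold, and the noise-free toy profile are all assumptions made for illustration, and the GC correction and statistical tests described in the abstract are omitted.

```python
import numpy as np


def best_split(x):
    """Return (index, gain) of the split that most reduces squared error around segment means."""
    total = np.sum((x - x.mean()) ** 2)
    best_i, best_gain = None, 0.0
    for i in range(1, len(x)):
        left, right = x[:i], x[i:]
        cost = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        gain = total - cost
        if gain > best_gain:
            best_i, best_gain = i, gain
    return best_i, best_gain


def binary_segmentation(x, min_gain, offset=0):
    """Recursively split x and return sorted breakpoints as indices into the original array."""
    if len(x) < 2:
        return []
    i, gain = best_split(x)
    if i is None or gain < min_gain:
        return []
    return (binary_segmentation(x[:i], min_gain, offset)
            + [offset + i]
            + binary_segmentation(x[i:], min_gain, offset + i))


# Hypothetical toy profile: 30 bins at copy ratio 1.0, bins 10..19 raised to 1.5.
ratio = np.ones(30)
ratio[10:20] = 1.5
breakpoints = binary_segmentation(ratio, min_gain=0.2)
print(breakpoints)  # [10, 20]
```

    On real low-coverage data the per-bin ratios are noisy, so the stopping rule matters far more than in this toy case; a significance test per candidate split is a common substitute for a fixed threshold.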

  8. DNA sequence variation and regulation of genes involved in pathogenesis of pulmonary tuberculosis.

    PubMed

    Qidwai, T; Jamal, F; Khan, M Y

    2012-06-01

    DNA sequence variations [copy number variations, single nucleotide polymorphisms (SNPs) and microsatellite repeats] play an important role in susceptibility/resistance to tuberculosis and other infectious diseases like malaria and HIV. Different populations exhibit variable associations with tuberculosis susceptibility and severity because of DNA sequence variations in both host and parasite. A number of genes and their polymorphisms have been identified that appear to be important in tuberculosis. In this article, several case-control studies of tuberculosis including a number of genes in different populations have been explored. Furthermore, this review summarizes the current studies of host polymorphisms and their association with tuberculosis in different populations. We have computationally predicted 275 SNPs which occur in transcription factor binding sites for transcription factors in 19 genes involved in pathogenesis of tuberculosis. Some common SNPs are rs1327474, rs755622, rs1801274, rs396991, rs5030737, rs1800451, rs1800450, rs3763313, rs9268494 and rs9268492 that have been found to play a role in disease. Presence of non-synonymous polymorphisms in coding regions might affect the structure of the protein, whereas polymorphisms in promoter regions affect the level of gene products, consequently altering the susceptibility/resistance to disease. Based on this prediction, we hypothesize that these genes play an important role in susceptibility to tuberculosis through an altered expression of gene product via the modification of transcriptional regulation of the gene. © 2012 The Authors. Scandinavian Journal of Immunology © 2012 Blackwell Publishing Ltd.

  9. Large scale DNA sequencing: new challenges emerge--the 2007 Human Genome Variation Society scientific meeting.

    PubMed

    Oetting, William S

    2008-05-01

    The annual scientific meeting of the Human Genome Variation Society (HGVS) was held on 23 October 2007, in San Diego, CA. The major theme of this meeting was "New DNA Sequencing Technologies & Human Genome Variation." A series of speakers provided information on several new technologies that produce DNA sequence data on a scale far beyond what was possible even a few years ago. These new technologies produce up to gigabases of nucleotides on a single run. Already, two individuals have had their entire genome sequenced, resulting in the identification of many novel DNA variants. Several new questions now need to be answered. What impact do these novel variants have on the phenotypes? How are we to associate private variants in a single individual with disease, especially when current association studies require genotyping thousands of individuals? Further work will be required to create methodologies to analyze these variants to determine if they are potentially disease-producing or are phenotypically silent. For the technology to be useful in a medical setting it will be crucial to answer these questions.

  10. Complete plastid genome sequence of Primula sinensis (Primulaceae): structure comparison, sequence variation and evidence for accD transfer to nucleus

    PubMed Central

    Liu, Tong-Jian; Zhang, Cai-Yun; Yan, Hai-Fei; Zhang, Lu

    2016-01-01

    The species-rich genus Primula L. is a typical plant group in which to study genetic variation between species at different levels of relationship. Chloroplast genome sequences are used as an information resource for quantifying this difference and reconstructing evolutionary history. In this study, we report the complete chloroplast genome sequence of Primula sinensis and compare it with those of related species. The chloroplast genome showed a typical circular quadripartite structure, 150,859 bp in length with a GC content of 37.2%. Two inverted repeat (IR) regions (25,535 bp each) were separated by a large single-copy region (82,064 bp) and a small single-copy region (17,725 bp). The genome consists of 112 genes, including 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Among them, seven coding genes, seven tRNA genes and four rRNA genes have two copies due to their locations in the IR regions. The accD and infA genes, which lack intact open reading frames (ORFs), were identified as pseudogenes. SSR and sequence variation analyses were also performed on the plastome of Primula sinensis, in comparison with another available plastome, that of P. poissonii. The four most variable regions, rpl36–rps8, rps16–trnQ, trnH–psbA and ndhC–trnV, were identified. Phylogenetic relationship estimates using three sub-datasets extracted from a matrix of 57 protein-coding gene sequences gave identical results that were consistent with previous studies. A transcript found in the P. sinensis transcriptome showed high similarity to the plastid accD functional region and carried a putative plastid transit peptide at the N-terminal region. The result strongly suggested that plastid accD has been functionally transferred to the nucleus in P. sinensis. PMID:27375965
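
    The region lengths and gene counts reported above can be checked against the stated totals. The short calculation below uses only the figures given in the abstract; the variable names are illustrative.

```python
# Region lengths in bp, as reported in the abstract.
lsc = 82_064   # large single-copy region
ssc = 17_725   # small single-copy region
ir = 25_535    # one inverted repeat; the quadripartite structure contains two

genome_length = lsc + ssc + 2 * ir
print(genome_length)  # 150859, matching the reported genome size

# Gene counts, as reported in the abstract.
protein_coding, trna, rrna = 78, 30, 4
total_genes = protein_coding + trna + rrna
print(total_genes)  # 112, matching the reported gene total

# Genes with two copies because they lie in the IR regions.
duplicated_in_ir = 7 + 7 + 4
print(duplicated_in_ir)  # 18
```

    The sum only matches the reported 150,859 bp if the 25,535 bp figure is the length of each inverted repeat rather than of both combined.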

  11. High-throughput sequence alignment using Graphics Processing Units.

    PubMed

    Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

    2007-12-10

    The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.
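
    The core task described above is looking up many query sequences against one indexed reference. The sketch below illustrates that idea serially on the CPU and swaps in a suffix array for the suffix tree named in the abstract; it also reports only whole-query exact matches rather than the maximal exact matches that MUMmer-style tools compute. The function names and toy sequences are hypothetical, and nothing here reflects MUMmerGPU's actual code or its CUDA kernels.

```python
def build_suffix_array(ref):
    """Return the start positions of all suffixes of ref in lexicographic order (naive build)."""
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def find_exact(ref, sa, query):
    """Return sorted start positions at which query occurs exactly in ref."""
    m = len(query)
    # Binary search for the first suffix whose m-length prefix is >= query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    # All matching suffixes are contiguous in the suffix array.
    hits = []
    while lo < len(sa) and ref[sa[lo]:sa[lo] + m] == query:
        hits.append(sa[lo])
        lo += 1
    return sorted(hits)


# Hypothetical toy reference and queries.
reference = "GATTACAGATTACA"
suffix_array = build_suffix_array(reference)
queries = ["ATTA", "ACA", "GGG"]

# Each query is independent of the others, which is what makes the
# workload easy to spread across many parallel threads.
results = {q: find_exact(reference, suffix_array, q) for q in queries}
print(results)  # {'ATTA': [1, 8], 'ACA': [4, 11], 'GGG': []}
```

    The naive sort-based index build is quadratic in the worst case and is only suitable for toy inputs; it is used here to keep the example short.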

  12. Whole-Genome Sequencing Reveals Diverse Models of Structural Variations in Esophageal Squamous Cell Carcinoma.

    PubMed

    Cheng, Caixia; Zhou, Yong; Li, Hongyi; Xiong, Teng; Li, Shuaicheng; Bi, Yanghui; Kong, Pengzhou; Wang, Fang; Cui, Heyang; Li, Yaoping; Fang, Xiaodong; Yan, Ting; Li, Yike; Wang, Juan; Yang, Bin; Zhang, Ling; Jia, Zhiwu; Song, Bin; Hu, Xiaoling; Yang, Jie; Qiu, Haile; Zhang, Gehong; Liu, Jing; Xu, Enwei; Shi, Ruyi; Zhang, Yanyan; Liu, Haiyan; He, Chanting; Zhao, Zhenxiang; Qian, Yu; Rong, Ruizhou; Han, Zhiwei; Zhang, Yanlin; Luo, Wen; Wang, Jiaqian; Peng, Shaoliang; Yang, Xukui; Li, Xiangchun; Li, Lin; Fang, Hu; Liu, Xingmin; Ma, Li; Chen, Yunqing; Guo, Shiping; Chen, Xing; Xi, Yanfeng; Li, Guodong; Liang, Jianfang; Yang, Xiaofeng; Guo, Jiansheng; Jia, JunMei; Li, Qingshan; Cheng, Xiaolong; Zhan, Qimin; Cui, Yongping

    2016-02-04

    Comprehensive identification of somatic structural variations (SVs) and understanding their mutational mechanisms in cancer might contribute to understanding biological differences and help to identify new therapeutic targets. Unfortunately, characterization of complex SVs across the whole genome and the mutational mechanisms underlying esophageal squamous cell carcinoma (ESCC) is largely unclear. To define a comprehensive catalog of somatic SVs, affected target genes, and their underlying mechanisms in ESCC, we re-analyzed whole-genome sequencing (WGS) data from 31 ESCCs using Meerkat algorithm to predict somatic SVs and Patchwork to determine copy-number changes. We found deletions and translocations with NHEJ and alt-EJ signature as the dominant SV types, and 16% of deletions were complex deletions. SVs frequently led to disruption of cancer-associated genes (e.g., CDKN2A and NOTCH1) with different mutational mechanisms. Moreover, chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally mis-arranged chromosomes that occurred in 55% of ESCCs. These genomic catastrophes led to amplification of oncogene through chromothripsis-derived double-minute chromosome formation (e.g., FGFR1 and LETM2) or BFB-affected chromosomes (e.g., CCND1, EGFR, ERBB2, MMPs, and MYC), with approximately 30% of ESCCs harboring BFB-derived CCND1 amplification. Furthermore, analyses of copy-number alterations reveal high frequency of whole-genome duplication (WGD) and recurrent focal amplification of CDCA7 that might act as a potential oncogene in ESCC. Our findings reveal molecular defects such as chromothripsis and BFB in malignant transformation of ESCCs and demonstrate diverse models of SVs-derived target genes in ESCCs. These genome-wide SV profiles and their underlying mechanisms provide preventive, diagnostic, and therapeutic implications for ESCCs.

  13. Whole-Genome Sequencing Reveals Diverse Models of Structural Variations in Esophageal Squamous Cell Carcinoma

    PubMed Central

    Cheng, Caixia; Zhou, Yong; Li, Hongyi; Xiong, Teng; Li, Shuaicheng; Bi, Yanghui; Kong, Pengzhou; Wang, Fang; Cui, Heyang; Li, Yaoping; Fang, Xiaodong; Yan, Ting; Li, Yike; Wang, Juan; Yang, Bin; Zhang, Ling; Jia, Zhiwu; Song, Bin; Hu, Xiaoling; Yang, Jie; Qiu, Haile; Zhang, Gehong; Liu, Jing; Xu, Enwei; Shi, Ruyi; Zhang, Yanyan; Liu, Haiyan; He, Chanting; Zhao, Zhenxiang; Qian, Yu; Rong, Ruizhou; Han, Zhiwei; Zhang, Yanlin; Luo, Wen; Wang, Jiaqian; Peng, Shaoliang; Yang, Xukui; Li, Xiangchun; Li, Lin; Fang, Hu; Liu, Xingmin; Ma, Li; Chen, Yunqing; Guo, Shiping; Chen, Xing; Xi, Yanfeng; Li, Guodong; Liang, Jianfang; Yang, Xiaofeng; Guo, Jiansheng; Jia, JunMei; Li, Qingshan; Cheng, Xiaolong; Zhan, Qimin; Cui, Yongping

    2016-01-01

    Comprehensive identification of somatic structural variations (SVs) and understanding their mutational mechanisms in cancer might contribute to understanding biological differences and help to identify new therapeutic targets. Unfortunately, characterization of complex SVs across the whole genome and the mutational mechanisms underlying esophageal squamous cell carcinoma (ESCC) is largely unclear. To define a comprehensive catalog of somatic SVs, affected target genes, and their underlying mechanisms in ESCC, we re-analyzed whole-genome sequencing (WGS) data from 31 ESCCs using Meerkat algorithm to predict somatic SVs and Patchwork to determine copy-number changes. We found deletions and translocations with NHEJ and alt-EJ signature as the dominant SV types, and 16% of deletions were complex deletions. SVs frequently led to disruption of cancer-associated genes (e.g., CDKN2A and NOTCH1) with different mutational mechanisms. Moreover, chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally mis-arranged chromosomes that occurred in 55% of ESCCs. These genomic catastrophes led to amplification of oncogene through chromothripsis-derived double-minute chromosome formation (e.g., FGFR1 and LETM2) or BFB-affected chromosomes (e.g., CCND1, EGFR, ERBB2, MMPs, and MYC), with approximately 30% of ESCCs harboring BFB-derived CCND1 amplification. Furthermore, analyses of copy-number alterations reveal high frequency of whole-genome duplication (WGD) and recurrent focal amplification of CDCA7 that might act as a potential oncogene in ESCC. Our findings reveal molecular defects such as chromothripsis and BFB in malignant transformation of ESCCs and demonstrate diverse models of SVs-derived target genes in ESCCs. These genome-wide SV profiles and their underlying mechanisms provide preventive, diagnostic, and therapeutic implications for ESCCs. PMID:26833333

  14. Natural Allelic Variations in Highly Polyploidy Saccharum Complex

    SciTech Connect

    Song, Jian; Yang, Xiping; Resende, Jr., Marcio F. R.; Neves, Leandro G.; Todd, James; Zhang, Jisen; Comstock, Jack C.; Wang, Jianping

    2016-06-08

    Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with a highly polyploid and complex genome. The Saccharum complex, comprising the genus Saccharum and a few related genera, is an important genetic resource for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding their allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variations in the Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variants and genotype calling was established. BWAmem and the sorghum genome proved to be an acceptable aligner and reference, respectively, for sugarcane target enrichment sequence analysis. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated to be largely homozygous. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. Furthermore, the target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes.

  15. Individual and population variation in invertebrates revealed by Inter-simple Sequence Repeats (ISSRs)

    PubMed Central

    Abbot, Patrick

    2001-01-01

    PCR-based molecular markers are well suited for questions requiring large scale surveys of plant and animal populations. Inter-simple Sequence Repeats or ISSRs are analyzed by a recently developed technique based on the amplification of the regions between inverse-oriented microsatellite loci with oligonucleotides anchored in microsatellites themselves. ISSRs have shown much promise for the study of the population biology of plants, but have not yet been explored for similar studies of animals. The value of ISSRs is demonstrated for the study of animal species with low levels of within-population variation. Sets of primers are identified which reveal variation in two aphid species, Acyrthosiphon pisum and Pemphigus obesinymphae, in the yellow fever mosquito Aedes aegypti, and in a rotifer in the genus Philodina. PMID:15455068

  16. Coordinated effects of sequence variation on DNA binding, chromatin structure, and transcription.

    PubMed

    Kilpinen, Helena; Waszak, Sebastian M; Gschwind, Andreas R; Raghav, Sunil K; Witwicki, Robert M; Orioli, Andrea; Migliavacca, Eugenia; Wiederkehr, Michaël; Gutierrez-Arcelus, Maria; Panousis, Nikolaos I; Yurovsky, Alisa; Lappalainen, Tuuli; Romano-Palumbo, Luciana; Planchon, Alexandra; Bielser, Deborah; Bryois, Julien; Padioleau, Ismael; Udin, Gilles; Thurnheer, Sarah; Hacker, David; Core, Leighton J; Lis, John T; Hernandez, Nouria; Reymond, Alexandre; Deplancke, Bart; Dermitzakis, Emmanouil T

    2013-11-08

    DNA sequence variation has been associated with quantitative changes in molecular phenotypes such as gene expression, but its impact on chromatin states is poorly characterized. To understand the interplay between chromatin and genetic control of gene regulation, we quantified allelic variability in transcription factor binding, histone modifications, and gene expression within humans. We found abundant allelic specificity in chromatin and extensive local, short-range, and long-range allelic coordination among the studied molecular phenotypes. We observed genetic influence on most of these phenotypes, with histone modifications exhibiting strong context-dependent behavior. Our results implicate transcription factors as primary mediators of sequence-specific regulation of gene expression programs, with histone modifications frequently reflecting the primary regulatory event.

  17. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

    PubMed Central

    Michaeli, Miri; Noga, Hila; Tabibian-Keissar, Hilla; Barshack, Iris; Mehr, Ramit

    2012-01-01

    High-throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig High-Throughput Sequencing Cleaner (Ig-HTS-Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig Insertion—Deletion Identifier (Ig-Indel-Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets. PMID:23293637

  18. MRI assessment of internal acoustic canal variations using 3D-FIESTA sequences.

    PubMed

    Erdogan, Nezahat; Altay, Canan; Akay, Emrah; Karakas, Levent; Uluc, Engin; Mete, Berna; Oygen, Aysegul; Oyar, Orhan; Gelal, Fazıl; Songu, Murat; Katilmis, Huseyin; Calli, Cağlar

    2013-02-01

    Magnetic resonance imaging (MRI) of the internal acoustic canal is the standard diagnostic tool for a wide range of indications in patients. This study aims to investigate the vascular variations and compression of the cranial nerves (CNs) VII and VIII at the cerebellopontine angle in patients with neuro-otologic symptoms using 3D-fast imaging employing steady-state acquisition (FIESTA) MR imaging. One hundred and eighty-seven patients (374 temporal bones) were examined on a 1.5-T MRI. In addition to conventional MR sequences, a 3D-FIESTA MR imaging was acquired. Magnetic resonance images thus obtained were evaluated with special regard to the presence of vascular contact to the CNs VII and VIII, as well as the presence of the vascular variations of the anterior inferior cerebellar artery (AICA) causing the compression of CNs. The Chi-squared test was used for statistical analysis. No statistically significant differences were found between the presence and absence of the AICA loop and/or vascular contact for the clinical symptoms of patients (P > 0.05). The cisternal and canalicular segments of CNs VII and VIII and adjacent vascular variations are well identified using 3D-FIESTA, especially by determining the relationship of the AICA variations between CNs.

  19. Capturing genomic signatures of DNA sequence variation using a standard anonymous microarray platform

    PubMed Central

    Cannon, C. H.; Kua, C. S.; Lobenhofer, E. K.; Hurban, P.

    2006-01-01

    Comparative genomics, using the model organism approach, has provided powerful insights into the structure and evolution of whole genomes. Unfortunately, only a small fraction of Earth's biodiversity will have its genome sequenced in the foreseeable future. Most wild organisms have radically different life histories and evolutionary genomics than current model systems. A novel technique is needed to expand comparative genomics to a wider range of organisms. Here, we describe a novel approach using an anonymous DNA microarray platform that gathers genomic samples of sequence variation from any organism. Oligonucleotide probe sequences placed on a custom 44 K array were 25 bp long and designed using a simple set of criteria to maximize their complexity and dispersion in sequence probability space. Using whole genomic samples from three known genomes (mouse, rat and human) and one unknown (Gonystylus bancanus), we demonstrate and validate its power, reliability, transitivity and sensitivity. Using two separate statistical analyses, a large number of genomic ‘indicator’ probes were discovered. The construction of a genomic signature database based upon this technique would allow virtual comparisons, and simple queries could generate optimal subsets of markers to be used in large-scale assays, using simple downstream techniques. Biologists from a wide range of fields, studying almost any organism, could efficiently perform genomic comparisons, at potentially any phylogenetic level after performing a small number of standardized DNA microarray hybridizations. Possibilities for refining and expanding the approach are discussed. PMID:17000641

  20. Variation in HTLV-I sequences from rabbit cell lines with diverse in vivo effects.

    PubMed

    Zhao, T M; Robinson, M A; Sawasdikosol, S; Simpson, R M; Kindt, T J

    1993-07-01

    Comparison of nucleotide sequences determined for HTLV-I integrated provirus from two rabbit cell lines, RH/K30 and RH/K34, revealed greater than 99% identity to one another. Substitutions encoding amino acid interchanges were observed in the gag, pol, and rex regions whereas the env and tax proteins were identical in the two lines. Comparison with the human prototypic HTLV-I sequence revealed considerably more variation, especially in the viral envelope region where the rabbit sequences are identical. The HTLV-I lines differed in their potential to cause disease in rabbits: injection of the RH/K34 cell line caused human adult T-cell leukemia/lymphoma-like (ATLL) disease which was fatal within 10 days, whereas all rabbits injected with the same or higher doses of RH/K30 survived with a low-grade leukemia that showed evidence of acute rejection. Correlation of lethality with viral sequence was tested by injection of rabbits with two other rabbit cell lines with HTLV-I provirus identical to RH/K34 in LTR, gag, and env regions. The fact that only one of these lines produced fatal disease suggests that pathogenic determinants lie outside of these regions or, alternatively, that the structure of the integrated virus is not the sole factor in the cell lines' ability to cause ATLL-like disease.

  1. Management of High-Throughput DNA Sequencing Projects: Alpheus

    PubMed Central

    Miller, Neil A.; Kingsmore, Stephen F.; Farmer, Andrew; Langley, Raymond J.; Mudge, Joann; Crow, John A.; Gonzalez, Alvaro J.; Schilkey, Faye D.; Kim, Ryan J.; van Velkinburgh, Jennifer; May, Gregory D.; Black, C. Forrest; Myers, M. Kathy; Utsey, John P.; Frost, Nicholas S.; Sugarbaker, David J.; Bueno, Raphael; Gullans, Stephen R.; Baxter, Susan M.; Day, Steve W.; Retzel, Ernest F.

    2009-01-01

    High-throughput DNA sequencing has enabled systems biology to begin to address areas in health, agricultural and basic biological research. Concomitant with the opportunities is an absolute necessity to manage significant volumes of high-dimensional and inter-related data and analysis. Alpheus is an analysis pipeline, database and visualization software for use with massively parallel DNA sequencing technologies that feature multi-gigabase throughput characterized by relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis), Roche-454 (pyrosequencing) and Applied Biosystem’s SOLiD (sequencing-by-ligation). Alpheus enables alignment to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression levels in transcriptome sequence. Alpheus is able to detect several types of variants, including non-synonymous and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls based on consistency, expected allele frequency, sequence quality, coverage, and variant type in order to minimize false positives while maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between cases and controls or bulk segregant pools. Sequence-based differential expression comparisons can be developed, with data export to SAS JMP Genomics for statistical analysis. PMID:20151039

  2. Cloning and characterization of a highly repetitive fish nucleotide sequence.

    PubMed

    Datta, U; Dutta, P; Mandal, R K

    1988-01-01

    We have cloned and sequenced a highly repetitive HindIII fragment of DNA from the common carp Cyprinus carpio. It represents a tandemly repeated sequence with a monomeric unit of 245 bp and comprises 8% of the fish genome. Higher units of this monomer appear as a ladder in Southern blots. The monomeric unit has been sequenced; it is A + T-rich with some direct and some inverse-repeat nucleotide clusters.

  3. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing

    PubMed Central

    Green, Richard E.; Malaspinas, Anna-Sapfo; Krause, Johannes; Briggs, Adrian W.; Johnson, Philip L. F.; Uhler, Caroline; Meyer, Matthias; Good, Jeffrey M.; Maricic, Tomislav; Stenzel, Udo; Prüfer, Kay; Siebauer, Michael; Burbano, Hernán A.; Ronan, Michael; Rothberg, Jonathan M.; Egholm, Michael; Rudan, Pavao; Brajković, Dejana; Kućan, Željko; Gušić, Ivan; Wikström, Mårten; Laakkonen, Liisa; Kelso, Janet; Slatkin, Montgomery; Pääbo, Svante

    2008-01-01

    Summary A complete mitochondrial (mt) genome sequence was reconstructed from a 38,000-year-old Neandertal individual using 8,341 mtDNA sequences identified among 4.8 Gb of DNA generated from ~0.3 grams of bone. Analysis of the assembled sequence unequivocally establishes that the Neandertal mtDNA falls outside the variation of extant human mtDNAs and allows an estimate of the divergence date between the two mtDNA lineages of 660,000±140,000 years. Of the 13 proteins encoded in the mtDNA, subunit 2 of cytochrome c oxidase of the mitochondrial electron transport chain has experienced the largest number of amino acid substitutions in human ancestors since the separation from Neandertals. There is evidence that purifying selection in the Neandertal mtDNA was reduced compared to other primate lineages suggesting that the effective population size of Neandertals was small. PMID:18692465

  4. Highly conserved repetitive DNA sequences are present at human centromeres.

    PubMed Central

    Grady, D L; Ratliff, R L; Robinson, D L; McCanlies, E C; Meyne, J; Moyzis, R K

    1992-01-01

    Highly conserved repetitive DNA sequence clones, largely consisting of (GGAAT)n repeats, have been isolated from a human recombinant repetitive DNA library by high-stringency hybridization with rodent repetitive DNA. This sequence, the predominant repetitive sequence in human satellites II and III, is similar to the essential core DNA of the Saccharomyces cerevisiae centromere, centromere DNA element (CDE) III. In situ hybridization to human telophase and Drosophila polytene chromosomes shows localization of the (GGAAT)n sequence to centromeric regions. Hyperchromicity studies indicate that the (GGAAT)n sequence exhibits unusual hydrogen bonding properties. The purine-rich strand alone has the same thermal stability as the duplex. Hyperchromicity studies of synthetic DNA variants indicate that all sequences with the composition (AATGN)n exhibit this unusual thermal stability. DNA-mobility-shift assays indicate that specific HeLa-cell nuclear proteins recognize this sequence with a relative affinity greater than 10^5. The extreme evolutionary conservation of this DNA sequence, its centromeric location, its unusual hydrogen bonding properties, its high affinity for specific nuclear proteins, and its similarity to functional centromeres isolated from yeast suggest that this sequence may be a component of the functional human centromere. PMID:1542662

  5. High Throughput Sequence Analysis for Disease Resistance in Maize

    USDA-ARS's Scientific Manuscript database

    Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...

  6. The sequence of learning cycle activities in high school chemistry

    NASA Astrophysics Data System (ADS)

    Abraham, Michael R.; Renner, John W.

    The sequence of the three phases of two high school learning cycles in chemistry was altered in order to: (1) give insights into the factors which account for the success of the learning cycle, (2) serve as an indirect test of the association between Piaget's theory and the learning cycle, and (3) compare the learning cycle with traditional instruction. Each of the six sequences (one normal and five altered) was studied with content and attitude measures. The outcomes of the study supported the contention that the normal learning cycle sequence is the optimum sequence for achievement of content knowledge.

  7. The sequence variation and functional differentiation of CRDs in a scallop multiple CRDs containing lectin.

    PubMed

    Huang, Mengmeng; Wang, Lingling; Zhang, Huan; Yang, Chuanyan; Liu, Rui; Xu, Jiachao; Jia, Zhihao; Song, Linsheng

    2017-02-01

    A C-type lectin of multiple CRDs (CfLec-4) from Chlamys farreri was selected to investigate the sequence variation and functional differentiation of its CRDs. Its four CRDs with EPD/LSD, EPN/FAD, EPN/LND and EPN/YND key motifs were recombined separately. The recombinant proteins of CRD1 and CRD2 (designated as rCRD1 and rCRD2) could bind LPS and mannan, while the recombinant proteins of CRD3 and CRD4 (designated as rCRD3 and rCRD4) could bind LPS, PGN, mannan and glucan. Moreover, rCRD3 displayed a broad microbe-binding spectrum towards the Gram-positive bacteria Staphylococcus aureus and Micrococcus luteus, the Gram-negative bacteria Escherichia coli and Vibrio anguillarum, and the fungi Pichia pastoris and Yarrowia lipolytica. These results indicated that CRD3 contributed more to CfLec-4's nonself-recognition ability. Furthermore, CRD1, CRD3 and CRD4 functioned as opsonins participating in the clearance of invaders in scallops. The sequence variation in Ca2+ binding site 2 among CRDs was suspected to be associated with such functional differentiation.

  8. Sequence Variation in TMEM18 in Association with Body Mass Index: The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Targeted Sequencing Study

    PubMed Central

    Liu, Ching-Ti; Young, Kristin L.; Brody, Jennifer; Olden, Matthias; Wojczynski, Mary K.; Heard-Costa, Nancy; Li, Guo; Morrison, Alanna C.; Muzny, Donna; Gibbs, Richard A.; Reid, Jeffrey G.; Shao, Yaming; Zhou, Yanhua; Boerwinkle, Eric; Heiss, Geraldo; Wagenknecht, Lynne; McKnight, Barbara; Borecki, Ingrid B.; Fox, Caroline S.; North, Kari E.; Cupples, L. Adrienne

    2014-01-01

    Background Genome-wide association studies (GWAS) for body mass index (BMI) previously identified a locus near TMEM18. We conducted targeted sequencing of this region to investigate the role of common, low frequency, and rare variation influencing BMI. Methods and Results We sequenced TMEM18 and regions downstream of TMEM18 on chromosome 2 in 3976 individuals of European ancestry from three community-based cohorts (Atherosclerosis Risk in Communities, Cardiovascular Health Study and Framingham Heart Study), including 200 adults selected for high BMI. We examined the association between BMI and variants identified in the region from nucleotide position 586,432 to 677,539 (hg18). Rare variants (MAF <1%) were analyzed using a burden test and the Sequence Kernel of Association Test (SKAT). Results from the three cohort studies were meta-analyzed. We estimate that mean BMI is 0.43 kg/m^2 higher for each copy of the G allele of SNP rs7596758 (MAF=29%, p=3.46 × 10^-4, using a Bonferroni threshold of p<4.6 × 10^-4). Analyses conditional on previous GWAS SNPs associated with BMI in the region led to attenuation of this signal and uncovered another independent (r^2<0.2), statistically significant association, rs186019316 (p=2.11 × 10^-4). Both rs186019316 and rs7596758 or proxies are located in transcription factor binding regions. No significant association with rare variants was found in either the exons of TMEM18 or the 3' GWAS region. Conclusions Targeted sequencing around TMEM18 identified two novel BMI variants with possible regulatory function. PMID:24951660
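
    As a worked check of the Bonferroni threshold quoted above, the sketch below back-calculates how many tests the threshold would imply. The family-wise error rate of 0.05 is an assumption, since the abstract does not state it or the number of tests, so the implied count is illustrative only.

```python
ALPHA = 0.05                 # assumed family-wise error rate; not stated in the abstract
REPORTED_THRESHOLD = 4.6e-4  # Bonferroni threshold quoted in the abstract

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold when n_tests comparisons are made."""
    return alpha / n_tests

# Number of tests the quoted threshold would imply under the assumed alpha
implied_tests = ALPHA / REPORTED_THRESHOLD

# p-values quoted in the abstract (the second comes from the conditional analysis)
reported_p = {"rs7596758": 3.46e-4, "rs186019316": 2.11e-4}
passes = {snp: p < REPORTED_THRESHOLD for snp, p in reported_p.items()}
```

    Under that assumption the threshold corresponds to roughly 109 tests, and both quoted p-values fall below it.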

  9. Exon-intron structure and sequence variation of the calreticulin gene among Rhipicephalus sanguineus group ticks.

    PubMed

    Porretta, Daniele; Latrofa, Maria Stefania; Dantas-Torres, Filipe; Mastrantonio, Valentina; Iatta, Roberta; Otranto, Domenico; Urbanelli, Sandra

    2016-12-12

    Calreticulin proteins (CRTs) are important components of tick saliva, which is involved in the blood meal success, pathogen transmission and host allergic responses. The characterization of the genes encoding for salivary proteins, such as CRTs, is pivotal to understand the mechanisms of tick-host interaction during blood meal and to develop tick control strategies based on their inhibition. In hard ticks, crt genes were shown to have only one intron with conserved position among species. In this study we investigated the exon-intron structure and variation of the crt gene in Rhipicephalus spp. ticks in order to assess the crt exon-intron structure and the potential utility of crt gene as a molecular marker. We sequenced the exon-intron region of crt gene in ticks belonging to so-called tropical and temperate lineages of Rhipicephalus sanguineus (sensu lato), Rhipicephalus sp. I, Rhipicephalus sp. III, Rhipicephalus sp. IV, R. guilhoni, R. muhsamae and R. turanicus. Genetic divergence and phylogenetic relationships between the sequences obtained were estimated. All individuals belonging to the tropical lineage of R. sanguineus (s.l.), R. guilhoni, R. muhsamae, R. turanicus, Rhipicephalus sp. III and Rhipicephalus sp. IV analysed showed crt intron-present alleles. However, both crt intron-present and intron-absent alleles were found in Rhipicephalus sp. I and the temperate lineage of R. sanguineus (s.l.), showing the occurrence of an intraspecific intron presence-absence polymorphism. Phylogenetic relationships among the crt intron-present sequences showed distinct lineages for all taxa, with the tropical and temperate lineages of R. sanguineus (s.l.) being more closely related to each other. We expanded previous studies about the characterization of crt gene in hard ticks. Our results highlighted a previously overlooked variation in the crt structure among Rhipicephalus spp., and among hard ticks in general. Notably, the intron presence/absence polymorphism

  10. Neurofibromatosis type 1 gene mutation analysis using sequence capture and high-throughput sequencing.

    PubMed

    Uusitalo, Elina; Hammais, Anna; Palonen, Elina; Brandt, Annika; Mäkelä, Ville-Veikko; Kallionpää, Roope; Jouhilahti, Eeva-Mari; Pöyhönen, Minna; Soini, Juhani; Peltonen, Juha; Peltonen, Sirkku

    2014-11-01

    Neurofibromatosis type 1 syndrome (NF1) is caused by mutations in the NF1 gene. Availability of new sequencing technology prompted us to search for an alternative method for NF1 mutation analysis. Genomic DNA was isolated from saliva avoiding invasive sampling. The NF1 exons with an additional 50bp of flanking intronic sequences were captured and enriched using the SeqCap EZ Choice Library protocol. The captured DNA was sequenced with the Roche/454 GS Junior system. The mean coverages of the targeted regions were 41x and 74x in 2 separate sets of samples. An NF1 mutation was discovered in 10 out of 16 separate patient samples. Our study provides proof of principle that the sequence capture methodology combined with high-throughput sequencing is applicable to NF1 mutation analysis. Deep intronic mutations may however remain undetectable, and change at the DNA level may not predict the outcome at the mRNA or protein levels.

  11. Characterization and complete genome sequence of a panicovirus from Bermuda grass by high-throughput sequencing.

    PubMed

    Tahir, Muhammad N; Lockhart, Ben; Grinstead, Samuel; Mollov, Dimitre

    2017-04-01

    Bermuda grass samples were examined by transmission electron microscopy and 28-30 nm spherical virus particles were observed. Total RNA from these plants was subjected to high-throughput sequencing (HTS). The nearly full genome sequence of a panicovirus was identified from one HTS scaffold. Sanger sequencing was used to confirm the HTS results and complete the genome sequence of 4404 nt. This virus was provisionally named Bermuda grass latent virus (BGLV). Its predicted open reading frames follow the typical arrangement of the genus Panicovirus. Based on sequence comparisons and phylogenetic analyses BGLV differs from other viruses and therefore taxonomically it is a new member of the genus Panicovirus, family Tombusviridae.

  12. Effects of sequence variation on differential allelic transcription factor occupancy and gene expression

    PubMed Central

    Reddy, Timothy E.; Gertz, Jason; Pauli, Florencia; Kucera, Katerina S.; Varley, Katherine E.; Newberry, Kimberly M.; Marinov, Georgi K.; Mortazavi, Ali; Williams, Brian A.; Song, Lingyun; Crawford, Gregory E.; Wold, Barbara; Willard, Huntington F.; Myers, Richard M.

    2012-01-01

    A complex interplay between transcription factors (TFs) and the genome regulates transcription. However, connecting variation in genome sequence with variation in TF binding and gene expression is challenging due to environmental differences between individuals and cell types. To address this problem, we measured genome-wide differential allelic occupancy of 24 TFs and EP300 in a human lymphoblastoid cell line GM12878. Overall, 5% of human TF binding sites have an allelic imbalance in occupancy. At many sites, TFs clustered in TF-binding hubs on the same homolog in especially open chromatin. While genetic variation in core TF binding motifs generally resulted in large allelic differences in TF occupancy, most allelic differences in occupancy were subtle and associated with disruption of weak or noncanonical motifs. We also measured genome-wide differential allelic expression of genes with and without heterozygous exonic variants in the same cells. We found that genes with differential allelic expression were overall less expressed both in GM12878 cells and in unrelated human cell lines. Comparing TF occupancy with expression, we found strong association between allelic occupancy and expression within 100 bp of transcription start sites (TSSs), and weak association up to 100 kb from TSSs. Sites of differential allelic occupancy were significantly enriched for variants associated with disease, particularly autoimmune disease, suggesting that allelic differences in TF occupancy give functional insights into intergenic variants associated with disease. Our results have the potential to increase the power and interpretability of association studies by targeting functional intergenic variants in addition to protein coding sequences. PMID:22300769
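
    Allelic imbalance at a heterozygous site is commonly assessed by asking whether reads split evenly between the two alleles. The sketch below uses an exact two-sided binomial test against a 50:50 null. This choice of test, the function name and the read counts are assumptions for illustration; the abstract does not say which statistic the authors used.

```python
from math import comb

def allelic_imbalance_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for a 50:50 allelic split."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    k = min(ref_reads, alt_reads)
    # Under p = 0.5 the distribution is symmetric, so the two-sided
    # p-value is twice the lower tail, capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)

balanced = allelic_imbalance_p(5, 5)   # even split of reads
skewed = allelic_imbalance_p(10, 0)    # every read from one allele
```

    In a genome-wide setting the resulting p-values would still need correction for multiple testing.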

  13. A molecular footprint of limb loss: sequence variation of the autopodial identity gene Hoxa-13.

    PubMed

    Kohlsdorf, Tiana; Cummings, Michael P; Lynch, Vincent J; Stopper, Geffrey F; Takahashi, Kazuhiko; Wagner, Günter P

    2008-12-01

    The homeobox gene Hoxa-13 codes for a transcription factor involved in multiple functions, including body axis and hand/foot development in tetrapods. In this study we investigate whether the loss of one function (e.g., limb loss in snakes) left a molecular footprint in exon 1 of Hoxa-13 that could be associated with the release of functional constraints caused by limb loss. Fragments of the Hoxa-13 exon 1 were sequenced from 13 species and analyzed, with additional published sequences of the same region, using relative rates and likelihood-ratio tests. Five amino acid sites in exon 1 of Hoxa-13 were detected as evolving under positive selection in the stem lineage of snakes. To further investigate whether there is an association between limb loss and sequence variation in Hoxa-13, we used the random forest method on an alignment that included shark, basal fish lineages, and "eu-tetrapods" such as mammals, turtle, alligator, and birds. The random forest method approaches the problem as one of classification, where we seek to predict the presence or absence of autopodium based on amino acid variation in Hoxa-13 sequences. Different alignments tested were associated with similar error rates (18.42%). The random forest method suggested that phenotypic states (autopodium present and absent) can often be correctly predicted based on Hoxa-13 sequences. Basal, nontetrapod gnathostomes that never had an autopodium were consistently classified as limbless together with the snakes, while eu-tetrapods without any history of limb loss in their phylogeny were also consistently classified as having a limb. Misclassifications affected mostly lizards, which, as a group, have a history of limb loss and limb re-evolution, and the urodele and caecilian in our sample. We conclude that a molecular footprint can be detected in Hoxa-13 that is associated with the lack of an autopodium; groups with classification ambiguity (lizards) are characterized by a history of repeated limb loss

  14. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    USDA-ARS's Scientific Manuscript database

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...

  15. Whole Genome Sequencing demonstrates that Geographic Variation of Escherichia coli O157 Genotypes Dominates Host Association

    PubMed Central

    Strachan, Norval J. C.; Rotariu, Ovidiu; Lopes, Bruno; MacRae, Marion; Fairley, Susan; Laing, Chad; Gannon, Victor; Allison, Lesley J.; Hanson, Mary F.; Dallman, Tim; Ashton, Philip; Franz, Eelco; van Hoek, Angela H. A. M.; French, Nigel P.; George, Tessy; Biggs, Patrick J.; Forbes, Ken J.

    2015-01-01

    Genetic variation in an infectious disease pathogen can be driven by ecological niche dissimilarities arising from different host species and different geographical locations. Whole genome sequencing was used to compare E. coli O157 isolates from host reservoirs (cattle and sheep) from Scotland and to compare genetic variation of isolates (human, animal, environmental/food) obtained from Scotland, New Zealand, Netherlands, Canada and the USA. Nei’s genetic distance calculated from core genome single nucleotide polymorphisms (SNPs) demonstrated that the animal isolates were from the same population. Investigation of the Shiga toxin bacteriophage and their insertion sites (SBI typing) revealed that cattle and sheep isolates had statistically indistinguishable rarefaction profiles, diversity and genotypes. In contrast, isolates from different countries exhibited significant differences in Nei’s genetic distance and SBI typing. Hence, after successful international transmission, which has occurred on multiple occasions, local genetic variation occurs, resulting in a global patchwork of continental and trans-continental phylogeographic clades. These findings are important for three reasons: first, understanding transmission and evolution of infectious diseases associated with multiple host reservoirs and multi-geographic locations; second, highlighting the relevance of the sheep reservoir when considering farm based interventions; and third, improving our understanding of why human disease incidence varies across the world. PMID:26442781
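
    For readers unfamiliar with the statistic, Nei's standard genetic distance is D = -ln(I), where I = Jxy / sqrt(Jx * Jy) and the J terms are sums of allele-frequency products across loci. The sketch below assumes this standard (1972) form; the abstract does not specify which variant was computed. Allele frequencies are supplied as one dictionary per SNP locus, and the input layout and example values are hypothetical.

```python
from math import inf, log, sqrt

def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance from per-locus allele frequencies."""
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    if jxy == 0.0:
        return inf  # no shared alleles at any locus
    # Averaging over loci cancels in the ratio, so raw sums are enough
    identity = jxy / sqrt(jx * jy)
    return -log(identity)

# Hypothetical frequencies at a single locus in two populations
pop_a = [{"A": 1.0}]
pop_b = [{"A": 0.5, "G": 0.5}]
d_same = nei_distance(pop_a, pop_a)
d_diff = nei_distance(pop_a, pop_b)
```

    Identical populations give a distance of zero, and the distance grows as allele frequencies diverge.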

  16. Whole Genome Sequencing demonstrates that Geographic Variation of Escherichia coli O157 Genotypes Dominates Host Association.

    PubMed

    Strachan, Norval J C; Rotariu, Ovidiu; Lopes, Bruno; MacRae, Marion; Fairley, Susan; Laing, Chad; Gannon, Victor; Allison, Lesley J; Hanson, Mary F; Dallman, Tim; Ashton, Philip; Franz, Eelco; van Hoek, Angela H A M; French, Nigel P; George, Tessy; Biggs, Patrick J; Forbes, Ken J

    2015-10-07

    Genetic variation in an infectious disease pathogen can be driven by ecological niche dissimilarities arising from different host species and different geographical locations. Whole genome sequencing was used to compare E. coli O157 isolates from host reservoirs (cattle and sheep) from Scotland and to compare genetic variation of isolates (human, animal, environmental/food) obtained from Scotland, New Zealand, Netherlands, Canada and the USA. Nei's genetic distance calculated from core genome single nucleotide polymorphisms (SNPs) demonstrated that the animal isolates were from the same population. Investigation of the Shiga toxin bacteriophage and their insertion sites (SBI typing) revealed that cattle and sheep isolates had statistically indistinguishable rarefaction profiles, diversity and genotypes. In contrast, isolates from different countries exhibited significant differences in Nei's genetic distance and SBI typing. Hence, after successful international transmission, which has occurred on multiple occasions, local genetic variation occurs, resulting in a global patchwork of continental and trans-continental phylogeographic clades. These findings are important for three reasons: first, understanding transmission and evolution of infectious diseases associated with multiple host reservoirs and multi-geographic locations; second, highlighting the relevance of the sheep reservoir when considering farm based interventions; and third, improving our understanding of why human disease incidence varies across the world.

  17. DNA sequence variation in the mitochondrial control region of red-backed voles (Clethrionomys).

    PubMed

    Matson, C W; Baker, R J

    2001-08-01

    The complete mitochondrial DNA (mtDNA) control region was sequenced for 71 individuals from five species of the rodent genus Clethrionomys both to understand patterns of variation and to explore the existence of previously described domains and other elements. Among species, the control region ranged from 942 to 971 bp in length. Our data were compatible with the proposal of three domains (extended terminal associated sequences [ETAS], central, conserved sequence blocks [CSB]) within the control region. The most conserved region in the control region was the central domain (12% of nucleotide positions variable), whereas in the ETAS and CSB domains, 22% and 40% of nucleotide positions were variable, respectively. Tandem repeats were encountered only in the ETAS domain of Clethrionomys rufocanus. This tandem repeat found in C. rufocanus was 24 bp in length and was located at the 5' end of the control region. Only two of the proposed CSB and ETAS elements appeared to be supported by our data; however, a "CSB1-like" element was also documented in the ETAS domain.
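
    The per-domain percentages of variable positions reported above can be computed from an alignment by counting columns that contain more than one base. The sketch below is illustrative only: the toy alignment and the domain boundaries are invented, and gaps and ambiguous bases are ignored by assumption.

```python
def variable_fraction(alignment, start, end):
    """Fraction of alignment columns in [start, end) with more than one base."""
    columns = zip(*(seq[start:end] for seq in alignment))
    # Gaps ("-") and ambiguous calls ("N") are not counted as variation
    variable = sum(1 for col in columns if len(set(col) - {"-", "N"}) > 1)
    return variable / (end - start)

# Toy alignment of three sequences (not real control-region data)
alignment = [
    "ACGTACGTAC",
    "ACGTACGAAC",
    "ACCTACGTAC",
]
first_half = variable_fraction(alignment, 0, 5)
second_half = variable_fraction(alignment, 5, 10)
```

    Applying the function to the coordinates of each domain would give figures comparable to the 12%, 22% and 40% quoted in the abstract.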

  18. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat

    PubMed Central

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-01-01

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches. PMID:27172215

  19. DNA sequence variation in BpMADS2 gene in two populations of Betula pendula.

    PubMed

    Järvinen, Pia; Lemmetyinen, Juha; Savolainen, Outi; Sopanen, Tuomas

    2003-02-01

    The PISTILLATA (PI) homologue, BpMADS2, was isolated from silver birch (Betula pendula Roth) and used to study nucleotide polymorphism. Two regions (together about 2450 bp) comprising mainly untranslated sequences were sequenced from 10 individuals from each of two populations in Finland. The nucleotide polymorphism was low in the BpMADS2 locus, especially in the coding region. The synonymous site overall nucleotide diversity (π_s) was 0.0043 and the nonsynonymous nucleotide diversity (π_a) was only 0.000052. For the whole region, the π values for the two populations were 0.0039 and 0.0045, and for the coding regions, the π values were only 0 and 0.00066 (for the corresponding coding regions of Arabidopsis thaliana PI, world-wide π was 0.0021). Estimates of π or θ did not differ significantly between the two populations, and the two populations were not diverged from each other. Two classes of BpMADS2 alleles were present in both populations, suggesting that this gene exhibits allelic dimorphism. In addition to the nucleotide site variation, two microsatellites were also associated within the haplotypes. This allelic dimorphism might be the result of postglacial re-colonization partly from northwestern, partly from southeastern/eastern refugia. The sequence comparison detected five recombination events in the regions studied. The large number of microsatellites in all of the three introns studied suggests that BpMADS2 is a hotspot for microsatellite formation.
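
    The nucleotide diversity statistic (pi) reported in this abstract is, in its basic form, the average number of pairwise differences per site among aligned sequences. The sketch below shows that basic calculation on toy data. The function name and the sequences are hypothetical, and the split into synonymous and nonsynonymous sites used in the study needs site annotation that this sketch does not attempt.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average pairwise differences per site for equal-length aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty and equal in length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total_diffs / (len(pairs) * length)

# Toy alignment (not data from the study).
toy_alignment = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGAAC", "ACCTACGAAC"]
pi_toy = nucleotide_diversity(toy_alignment)
```

    Gaps and missing data are treated here as ordinary characters; a real analysis would normally exclude or handle such sites explicitly.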

  20. Identification of copy number variations associated with congenital heart disease by chromosomal microarray analysis and next-generation sequencing.

    PubMed

    Zhu, Xiangyu; Li, Jie; Ru, Tong; Wang, Yaping; Xu, Yan; Yang, Ying; Wu, Xing; Cram, David S; Hu, Yali

    2016-04-01

    To determine the type and frequency of pathogenic chromosomal abnormalities in fetuses diagnosed with congenital heart disease (CHD) using chromosomal microarray analysis (CMA) and validate next-generation sequencing as an alternative diagnostic method. Chromosomal aneuploidies and submicroscopic copy number variations (CNVs) were identified in amniocyte DNA samples from CHD fetuses using high-resolution CMA and copy number variation sequencing (CNV-Seq). Overall, 21 of 115 CHD fetuses (18.3%) referred for CMA had a pathogenic chromosomal anomaly. In six of 73 fetuses (8.2%) with an isolated CHD, CMA identified two cases of DiGeorge syndrome, and one case each of 1q21.1 microdeletion, 16p11.2 microdeletion and Angelman/Prader Willi syndromes, and 22q11.21 microduplication syndrome. In 12 of 42 fetuses (28.6%) with CHD and additional structural abnormalities, CMA identified eight whole or partial trisomies (19.0%), five CNVs (11.9%) associated with DiGeorge, Wolf-Hirschhorn, Miller-Dieker, Cri du Chat and Blepharophimosis, Ptosis, and Epicanthus Inversus syndromes and four other rare pathogenic CNVs (9.5%). Overall, there was a 100% diagnostic concordance between CMA and CNV-Seq for detecting all 21 pathogenic chromosomal abnormalities associated with CHD. CMA and CNV-Seq are reliable and accurate prenatal techniques for identifying pathogenic fetal chromosomal abnormalities associated with cardiac defects. © 2016 John Wiley & Sons, Ltd.

  1. DNA Sequence and Expression Variation of Hop (Humulus lupulus) Valerophenone Synthase (VPS), a Key Gene in Bitter Acid Biosynthesis

    PubMed Central

    Castro, Consuelo B.; Whittock, Lucy D.; Whittock, Simon P.; Leggett, Grey; Koutoulis, Anthony

    2008-01-01

    Background The hop plant (Humulus lupulus) is a source of many secondary metabolites, with bitter acids essential in the beer brewing industry and others having potential applications for human health. This study investigated variation in DNA sequence and gene expression of valerophenone synthase (VPS), a key gene in the bitter acid biosynthesis pathway of hop. Methods Sequence variation was studied in 12 varieties, and expression was analysed in four of the 12 varieties in a series across the development of the hop cone. Results Nine single nucleotide polymorphisms (SNPs) were detected in VPS, seven of which were synonymous. The two non-synonymous polymorphisms did not appear to be related to typical bitter acid profiles of the varieties studied. However, real-time quantitative reverse-transcription polymerase chain reaction (qRT-PCR) analysis of VPS expression during hop cone development showed a clear link with the bitter acid content. The highest levels of VPS expression were observed in two triploid varieties, ‘Symphony’ and ‘Ember’, which typically have high bitter acid levels. Conclusions In all hop varieties studied, VPS expression was lowest in the leaves and an increase in expression was consistently observed during the early stages of cone development. PMID:18519445

  2. Global sequence variation in the histidine-rich proteins 2 and 3 of Plasmodium falciparum: implications for the performance of malaria rapid diagnostic tests

    PubMed Central

    2010-01-01

    Background Accurate diagnosis is essential for prompt and appropriate treatment of malaria. While rapid diagnostic tests (RDTs) offer great potential to improve malaria diagnosis, the sensitivity of RDTs has been reported to be highly variable. One possible factor contributing to variable test performance is the diversity of parasite antigens. This is of particular concern for Plasmodium falciparum histidine-rich protein 2 (PfHRP2)-detecting RDTs since PfHRP2 has been reported to be highly variable in isolates of the Asia-Pacific region. Methods The pfhrp2 exon 2 fragment from 458 isolates of P. falciparum collected from 38 countries was amplified and sequenced. For a subset of 80 isolates, the exon 2 fragment of histidine-rich protein 3 (pfhrp3) was also amplified and sequenced. DNA sequence and statistical analysis of the variation observed in these genes was conducted. The potential impact of the pfhrp2 variation on RDT detection rates was examined by analysing the relationship between sequence characteristics of this gene and the results of the WHO product testing of malaria RDTs: Round 1 (2008), for 34 PfHRP2-detecting RDTs. Results Sequence analysis revealed extensive variations in the number and arrangement of various repeats encoded by the genes in parasite populations world-wide. However, no statistically robust correlation between gene structure and RDT detection rate for P. falciparum parasites at 200 parasites per microlitre was identified. Conclusions The results suggest that despite extreme sequence variation, diversity of PfHRP2 does not appear to be a major cause of RDT sensitivity variation. PMID:20470441

  3. Color differences among feral pigeons (Columba livia) are not attributable to sequence variation in the coding region of the melanocortin-1 receptor gene (MC1R).

    PubMed

    Derelle, Romain; Kondrashov, Fyodor A; Arkhipov, Vladimir Y; Corbel, Hélène; Frantz, Adrien; Gasparini, Julien; Jacquin, Lisa; Jacob, Gwenaël; Thibault, Sophie; Baudry, Emmanuelle

    2013-08-05

    Genetic variation at the melanocortin-1 receptor (MC1R) gene is correlated with melanin color variation in many birds. Feral pigeons (Columba livia) show two major melanin-based colorations: a red coloration due to pheomelanic pigment and a black coloration due to eumelanic pigment. Furthermore, within each color type, feral pigeons display continuous variation in the amount of melanin pigment present in the feathers, with individuals varying from pure white to a full dark melanic color. Coloration is highly heritable and it has been suggested that it is under natural or sexual selection, or both. Our objective was to investigate whether MC1R allelic variants are associated with plumage color in feral pigeons. We sequenced 888 bp of the coding sequence of MC1R among pigeons varying both in the type, eumelanin or pheomelanin, and the amount of melanin in their feathers. We detected 10 non-synonymous substitutions and 2 synonymous substitutions, but none of them was associated with a plumage type. It remains possible that non-synonymous substitutions that influence coloration are present in the short MC1R fragment that we did not sequence, but this seems unlikely because we analyzed the entire functionally important region of the gene. Our results show that color differences among feral pigeons are probably not attributable to amino acid variation at the MC1R locus. Therefore, variation in regulatory regions of MC1R or variation in other genes may be responsible for the color polymorphism of feral pigeons.

  4. [Genetic variation of Manchurian pheasant (Phasianus colchicus pallasi Rotshild, 1903) inferred from mitochondrial DNA control region sequences].

    PubMed

    Kozyrenko, M M; Fisenko, P V; Zhuravlev, Iu N

    2009-04-01

    Sequence variation of the mitochondrial DNA control region was studied in Manchurian pheasants (Phasianus colchicus pallasi Rotshild, 1903) representing three geographic populations from the southern part of the Russian Far East. Extremely low population genetic differentiation (F(ST) = 0.0003) pointed to very high gene exchange between the populations. The combination of high haplotype diversity (0.884 to 0.913), low nucleotide diversity (0.0016 to 0.0022), low R2 values (0.1235 to 0.1337), certain patterns of pairwise-difference distributions, and the absence of phylogenetic structure suggested that the phylogenetic history of Ph. c. pallasi included passing through a bottleneck with further expansion in the postglacial period. According to the data obtained, it was suggested that differentiation between the mitochondrial lineages started approximately 100 000 years ago.
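
    Haplotype diversity, one of the summary statistics quoted in this abstract, is commonly computed with Nei's estimator H = n/(n-1) * (1 - sum of squared haplotype frequencies). The sketch below shows that estimator on toy labels. The function name and the sample are hypothetical, and whether the authors used exactly this estimator is an assumption.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity for a sample of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    # Probability that two randomly drawn haplotypes are identical.
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - homozygosity)

# Toy sample (not data from the study): four individuals, two haplotypes.
h_toy = haplotype_diversity(["H1", "H1", "H2", "H2"])
```

    Values close to 1, such as the 0.884 to 0.913 reported above, mean that two randomly sampled individuals almost always carry different haplotypes.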

  5. Contribution of rare and low-frequency whole-genome sequence variants to complex traits variation in dairy cattle.

    PubMed

    Zhang, Qianqian; Calus, Mario P L; Guldbrandtsen, Bernt; Lund, Mogens Sandø; Sahana, Goutam

    2017-08-01

    Whole-genome sequencing and imputation methodologies have enabled the study of the effects of genomic variants with low to very low minor allele frequency (MAF) on variation in complex traits. Our objective was to estimate the proportion of variance explained by imputed sequence variants classified according to their MAF compared with the variance explained by the pedigree-based additive genetic relationship matrix for 17 traits in Nordic Holstein dairy cattle. Imputed sequence variants were grouped into seven classes according to their MAF (0.001-0.01, 0.01-0.05, 0.05-0.1, 0.1-0.2, 0.2-0.3, 0.3-0.4 and 0.4-0.5). The total contribution of all imputed sequence variants to variance in deregressed estimated breeding values or proofs (DRP) for different traits ranged from 0.41 [standard error (SE) = 0.026] for temperament to 0.87 (SE = 0.011) for milk yield. The contribution of rare variants (MAF < 0.01) to the total DRP variance explained by all imputed sequence variants was relatively small (a maximum of 12.5% for the health index). Rare and low-frequency variants (MAF < 0.05) contributed a larger proportion of the explained DRP variances (>13%) for health-related traits than for production traits (<11%). However, a substantial proportion of these variance estimates across different MAF classes had large SE, especially when the variance explained by a MAF class was small. The proportion of DRP variance that was explained by all imputed whole-genome sequence variants improved slightly compared with variance explained by the 50 k Illumina markers, which are routinely used in bovine genomic prediction. However, the proportion of DRP variance explained by imputed sequence variants was lower than that explained by pedigree relationships, ranging from 1.5% for milk yield to 37.9% for the health index. Imputed sequence variants explained more of the variance in DRP than the 50 k markers for most traits, but explained less variance than that captured by pedigree
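
    The grouping of variants into seven minor allele frequency (MAF) classes described in this abstract can be sketched as a simple binning step. In the sketch below, the function name is hypothetical and the treatment of boundary values (lower bound inclusive, upper bound exclusive, except that 0.5 belongs to the last class) is an assumption, since the abstract lists only the class ranges.

```python
from bisect import bisect_right

# Class edges taken from the ranges listed in the abstract.
INNER_EDGES = [0.01, 0.05, 0.1, 0.2, 0.3, 0.4]
CLASS_LABELS = ["0.001-0.01", "0.01-0.05", "0.05-0.1", "0.1-0.2",
                "0.2-0.3", "0.3-0.4", "0.4-0.5"]

def maf_class(allele_freq):
    """Return the MAF class label for an allele frequency, or None if MAF < 0.001."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    maf = min(allele_freq, 1.0 - allele_freq)  # fold to the minor allele
    if maf < 0.001:
        return None  # below the lowest class listed in the abstract
    return CLASS_LABELS[bisect_right(INNER_EDGES, maf)]

# Toy frequencies (not data from the study).
example_classes = [maf_class(f) for f in (0.005, 0.97, 0.25, 0.5)]
```

    With this boundary convention, the first class corresponds to the rare variants (MAF < 0.01) and the first two classes together to the rare and low-frequency variants (MAF < 0.05) discussed above.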

  6. Variation in sequence and organization of splicing regulatory elements in vertebrate genes

    PubMed Central

    Yeo, Gene; Hoon, Shawn; Venkatesh, Byrappa; Burge, Christopher B.

    2004-01-01

    Although core mechanisms and machinery of pre-mRNA splicing are conserved from yeast to human, the details of intron recognition often differ, even between closely related organisms. For example, genes from the pufferfish Fugu rubripes generally contain one or more introns that are not properly spliced in mouse cells. Exploiting available genome sequence data, a battery of sequence analysis techniques was used to reach several conclusions about the organization and evolution of splicing regulatory elements in vertebrate genes. The classical splice site and putative branch site signals are completely conserved across the vertebrates studied (human, mouse, pufferfish, and zebrafish), and exonic splicing enhancers also appear broadly conserved in vertebrates. However, another class of splicing regulatory elements, the intronic splicing enhancers, appears to differ substantially between mammals and fish, with G triples (GGG) very abundant in mammalian introns but comparatively rare in fish. Conversely, short repeats of AC and GT are predicted to function as intronic splicing enhancers in fish but are not enriched in mammalian introns. Consistent with this pattern, exonic splicing enhancer-binding SR proteins are highly conserved across all vertebrates, whereas heterogeneous nuclear ribonucleoproteins, which bind many intronic sequences, vary in domain structure and even presence/absence between mammals and fish. Exploiting differences in intronic sequence composition, a statistical model was developed to predict the splicing phenotype of Fugu introns in mammalian systems and was used to engineer the spliceability of a Fugu intron in human cells by insertion of specific sequences, thereby rescuing splicing in human cells. PMID:15505203

  7. Polarimetric Variations of Binary Stars. V. Pre-Main-Sequence Spectroscopic Binaries Located in Ophiuchus and Scorpius

    NASA Astrophysics Data System (ADS)

    Manset, N.; Bastien, P.

    2003-06-01

    We present polarimetric observations of seven pre-main-sequence (PMS) spectroscopic binaries located in the ρ Ophiuchus and Upper Scorpius star-forming regions (SFRs). The average observed polarizations at 7660 Å are between 0.5% and 3.5%. After estimates of the interstellar polarization are removed, all binaries have an intrinsic polarization above 0.4%, even though most of them do not present other evidence for circumstellar dust. Two binaries, NTTS 162814-2427 and NTTS 162819-2423S, present high levels of intrinsic polarization between 1.5% and 2.1%, in agreement with the fact that other observations (photometry, spectroscopy) indicate the presence of circumstellar dust. Tests reveal that all seven PMS binaries have a statistically variable or possibly variable polarization. Combining these results with our previous sample of binaries located in the Taurus, Auriga, and Orion SFRs, 68% of the binaries have an intrinsic polarization above 0.5%, and 90% of the binaries are polarimetrically variable or possibly variable. NTTS 160814-1857, 162814-2427, and 162819-2423S are clearly polarimetrically variable. The first two also exhibit phase-locked variations over ~10 and ~40 orbits, respectively. Statistically, NTTS 160905-1859 is possibly variable, but it shows periodic variations not detected by the statistical tests; those variations are not phase-locked and are only present for short intervals of time. The amplitudes of the variations reach a few tenths of a percent, greater than for the previously studied PMS binaries located in the Taurus, Orion, and Auriga SFRs. The high-eccentricity system NTTS 162814-2427 shows single-periodic variations, in agreement with our previous numerical simulations. We compare the observations with some of our numerical simulations and also show that an analysis of the periodic polarimetric variations with the Brown, McLean, & Emslie (BME) formalism to find the orbital inclination is for the moment premature: nonperiodic events

  8. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants.

    PubMed

    Gundry, Michael; Vijg, Jan

    2012-01-03

    DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative for the genome overall, current platforms allow whole genome sequencing for less than $5000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a brief

  9. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants

    PubMed Central

    Gundry, Michael; Vijg, Jan

    2011-01-01

    DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative for the genome overall, current platforms allow whole genome sequencing for less than $5,000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a

  10. Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer

    PubMed Central

    Timofeeva, Maria N.; Kinnersley, Ben; Farrington, Susan M.; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J.; Harris, Sarah E.; Northwood, Emma L.; Barrett, Jennifer H.; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J.; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D. Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G.; Houlston, Richard S.

    2015-01-01

    Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10^-7), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts; rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10^-7); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10^-7 and OR = 1.09, P = 7.4 × 10^-8); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10^-9), all reaching exome-wide significance levels. Gene based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10^-6). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10^-4) and DNA mismatch repair genes (P = 6.1 × 10^-4) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438

  11. Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer.

    PubMed

    Timofeeva, Maria N; Kinnersley, Ben; Farrington, Susan M; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J; Harris, Sarah E; Northwood, Emma L; Barrett, Jennifer H; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G; Houlston, Richard S

    2015-11-10

    Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10^-7), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts; rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10^-7); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10^-7 and OR = 1.09, P = 7.4 × 10^-8); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10^-9), all reaching exome-wide significance levels. Gene based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10^-6). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10^-4) and DNA mismatch repair genes (P = 6.1 × 10^-4) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC.

  12. Atomic force microscopy of crystalline insulins: the influence of sequence variation on crystallization and interfacial structure.

    PubMed Central

    Yip, C M; Brader, M L; DeFelippis, M R; Ward, M D

    1998-01-01

    The self-association of proteins is influenced by amino acid sequence, molecular conformation, and the presence of molecular additives. In the presence of phenolic additives, LysB28ProB29 insulin, in which the C-terminal prolyl and lysyl residues of wild-type human insulin have been inverted, can be crystallized into forms resembling those of wild-type insulins in which the protein exists as zinc-complexed hexamers organized into well-defined layers. We describe herein tapping-mode atomic force microscopy (TMAFM) studies of single crystals of rhombohedral (R3) LysB28ProB29 that reveal the influence of sequence variation on hexamer-hexamer association at the surface of actively growing crystals. Molecular scale lattice images of these crystals were acquired in situ under growth conditions, enabling simultaneous identification of the rhombohedral LysB28ProB29 crystal form, its orientation, and its dynamic growth characteristics. The ability to obtain crystallographic parameters on multiple crystal faces with TMAFM confirmed that bovine and porcine insulins grown under these conditions crystallized into the same space group as LysB28ProB29 (R3), enabling direct comparison of crystal growth behavior and the influence of sequence variation. Real-time TMAFM revealed hexamer vacancies on the (001) terraces of LysB28ProB29, and more rounded dislocation noses and larger terrace widths for actively growing screw dislocations compared to wild-type bovine and porcine insulin crystals under identical conditions. This behavior is consistent with weaker interhexamer attachment energies for LysB28ProB29 at active growth sites. Comparison of the single crystal x-ray structures of wild-type insulins and LysB28ProB29 suggests that differences in protein conformation at the hexamer-hexamer interface and accompanying changes in interhexamer bonding are responsible for this behavior. These studies demonstrate that subtle changes in molecular conformation due to a single sequence

  13. The Landscape of Extreme Genomic Variation in the Highly Adaptable Atlantic Killifish

    PubMed Central

    Reid, Noah M.; Jackson, Craig E.; Gilbert, Don; Minx, Patrick; Montague, Michael J.; Hampton, Thomas H.; Helfrich, Lily W.; King, Benjamin L.; Nacci, Diane E.; Aluru, Neel; Karchner, Sibel I.; Colbourne, John K.; Hahn, Mark E.; Shaw, Joseph R.; Oleksiak, Marjorie F.; Crawford, Douglas L.; Warren, Wesley C.

    2017-01-01

    Understanding and predicting the fate of populations in changing environments require knowledge about the mechanisms that support phenotypic plasticity and the adaptive value and evolutionary fate of genetic variation within populations. Atlantic killifish (Fundulus heteroclitus) exhibit extensive phenotypic plasticity that supports large population sizes in highly fluctuating estuarine environments. Populations have also evolved diverse local adaptations. To yield insights into the genomic variation that supports their adaptability, we sequenced a reference genome and 48 additional whole genomes from a wild population. Evolution of genes associated with cell cycle regulation and apoptosis is accelerated along the killifish lineage, which is likely tied to adaptations for life in highly variable estuarine environments. Genome-wide standing genetic variation, including nucleotide diversity and copy number variation, is extremely high. The highest diversity genes are those associated with immune function and olfaction, whereas genes under greatest evolutionary constraint are those associated with neurological, developmental, and cytoskeletal functions. Reduced genetic variation is detected for tight junction proteins, which in killifish regulate paracellular permeability that supports their extreme physiological flexibility. Low-diversity genes engage in more regulatory interactions than high-diversity genes, consistent with the influence of pleiotropic constraint on molecular evolution. High genetic variation is crucial for continued persistence of species given the pace of contemporary environmental change. Killifish populations harbor among the highest levels of nucleotide diversity yet reported for a vertebrate species, and thus may serve as a useful model system for studying evolutionary potential in variable and changing environments. PMID:28201664

  14. Variations in a hotspot region of chloroplast DNAs among common wheat and Aegilops revealed by nucleotide sequence analysis.

    PubMed

    Guo, Chang-Hong; Terachi, Toru

    2005-08-01

    The second largest BamHI fragment (B2) of the chloroplast DNA in Triticum (wheat) and Aegilops contains a highly variable region (a hotspot), resulting in four types of B2 of different size, i.e. B2l (10.5kb), B2m (10.2kb), B2 (9.6kb) and B2s (9.4kb). In order to gain a better understanding of the molecular nature of the variations in length and explain unexpected identity among B2 of Ae. ovata, Ae. speltoides and common wheat (T. aestivum), the nucleotide sequence between a stop codon of rbcL and a HindIII site in cemA in the hotspot was determined for Ae. ovata, Ae. speltoides, Ae. caudata and Ae. mutica. The total number of nucleotides in the region was 2808, 2810, 3302, and 3594 bp, for Ae. speltoides, Ae. ovata, Ae. caudata and Ae. mutica, respectively, and the sequences were compared with the corresponding ones of Ae. crassa 4x, T. aestivum and Ae. squarrosa. Compared with the largest B2l fragment of Ae. mutica, a 791bp and a 793 bp deletion were found in Ae. speltoides and Ae. ovata, respectively, and the possible site of deletion in the two species is the same as that of T. aestivum. However, a deleted segment in Ae. ovata is 2 bp longer than that of Ae. speltoides (and T. aestivum), demonstrating that recurrent deletions had occurred in the chloroplast genomes of both species. Comparison of the sequences from Ae. caudata and Ae. crassa 4x with that of Ae. mutica revealed a 289 bp and a 61 bp deletion at the same site in Ae. caudata and Ae. crassa 4x, respectively. Sequence comparison using wild Aegilops plants showed that the large length variations in a hotspot are fixed to each species. A considerable number of polymorphisms are observed in a loop in the 3' of rbcL. The study reveals the relative importance of the large and small indels and minute inversions to account for variations in the chloroplast genomes among closely related species.

  15. Barcoding lichen-forming fungi using 454 pyrosequencing is challenged by artifactual and biological sequence variation.

    PubMed

    Mark, Kristiina; Cornejo, Carolina; Keller, Christine; Flück, Daniela; Scheidegger, Christoph

    2016-09-01

    Although lichens (lichen-forming fungi) play an important role in the ecological integrity of many vulnerable landscapes, only a minority of lichen-forming fungi have been barcoded out of the currently accepted ∼18 000 species. Regular Sanger sequencing can be problematic when analyzing lichens since saprophytic, endophytic, and parasitic fungi live intimately admixed, resulting in low-quality sequencing reads. Here, high-throughput, long-read 454 pyrosequencing in a GS FLX+ System was tested to barcode the fungal partner of 100 epiphytic lichen species from Switzerland using fungal-specific primers when amplifying the full internal transcribed spacer region (ITS). The present study shows the potential of DNA barcoding using pyrosequencing, in that the expected lichen fungus was successfully sequenced for all samples except one. Alignment solutions such as BLAST were found to be largely adequate for the generated long reads. In addition, the NCBI nucleotide database, currently the most complete database for lichen-forming fungi, can be used as a reference database when identifying common species, since the majority of analyzed lichens were identified correctly to the species or at least to the genus level. However, several issues were encountered, including a high sequencing error rate, multiple ITS versions in a genome (incomplete concerted evolution), and in some samples the presence of mixed lichen-forming fungi (possible lichen chimeras).

  16. High-Throughput Next-Generation Sequencing of Polioviruses.

    PubMed

    Montmayeur, Anna M; Ng, Terry Fei Fan; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A; Oberste, M Steven; Burns, Cara C

    2017-02-01

    The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance.

  17. Formation Sequences of Iron Minerals in the Acidic Alteration Products and Variation of Hydrothermal Fluid Conditions

    NASA Astrophysics Data System (ADS)

    Isobe, H.; Yoshizawa, M.

    2008-12-01

    Iron minerals play an important role in environmental issues not only on Earth but also on other terrestrial planets. Iron mineral species formed when primary minerals are altered by surface or subsurface fluids are characterized by the temperature, acidity and redox conditions of those fluids. Various iron-bearing alteration products occur around fumaroles in geothermal and volcanic areas. In this study, zonal structures of iron minerals in alteration products of a geothermal area were observed to elucidate temporal and spatial variation of the hydrothermal fluids. The pyroxene-amphibole andesite of Garan-dake volcano, Oita, Japan, is altered by acidic hydrothermal fluid, which leaches out elements other than Si to form cristobalite. Hand specimens with an unaltered or weakly altered core and a cristobalite crust show various sequences of layers. XRD analysis revealed that the degree of alteration is represented by the abundance of cristobalite. Intermediately altered layers are characterized by the occurrence of minerals including alunite, pyrite, kaolinite, goethite and hematite. A specimen with a reddish brown core surrounded by a cristobalite-rich white crust has brown layers at the boundary between the core and the crust. The reddish core is characterized by crystalline hematite, as shown by XRD. Another hand specimen has a light gray core, which represents reduced conditions, and a white cristobalite crust, with light brown and reddish brown layers of ferric iron minerals between the core and the crust. On the other hand, hornblende crystals, a typical ferrous iron-bearing mineral of the host rock, are well preserved in some samples with strongly decolorized cristobalite-rich groundmass. Hydrothermal alteration experiments on iron-rich basaltic material show that iron mineral species depend on the acidity and temperature of the fluid. Oxidation states of the iron-bearing mineral species are strongly influenced by the acidity and redox conditions.
Variations of alteration

  18. Library preparation for highly accurate population sequencing of RNA viruses

    PubMed Central

    Acevedo, Ashley; Andino, Raul

    2015-01-01

    Circular resequencing (CirSeq) is a novel technique for efficient and highly accurate next-generation sequencing (NGS) of RNA virus populations. The foundation of this approach is the circularization of fragmented viral RNAs, which are then redundantly encoded into tandem repeats by ‘rolling-circle’ reverse transcription. When sequenced, the redundant copies within each read are aligned to derive a consensus sequence of their initial RNA template. This process yields sequencing data with error rates far below the variant frequencies observed for RNA viruses, facilitating ultra-rare variant detection and accurate measurement of low-frequency variants. Although library preparation takes ~5 d, the high-quality data generated by CirSeq simplifies downstream data analysis, making this approach substantially more tractable for experimentalists. PMID:24967624
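
    The consensus step described above can be sketched as a per-position majority vote across the tandem copies in one read. This is a minimal illustration, not the published CirSeq software: the function name `repeat_consensus`, the `min_copies` threshold and the use of "N" for unresolved positions are assumptions, and the sketch assumes the repeat length is already known, the copies are in phase, and the read contains no insertions or deletions.

```python
from collections import Counter


def repeat_consensus(read: str, repeat_len: int, min_copies: int = 3) -> str:
    """Majority-vote consensus across the tandem copies inside one read.

    Hypothetical helper: assumes the repeat length is known and that the
    copies are in phase, with substitution errors only (no indels).
    """
    n_copies = len(read) // repeat_len
    if n_copies < min_copies:
        raise ValueError(f"need at least {min_copies} copies, found {n_copies}")

    # Cut the read into consecutive copies of the original template.
    copies = [read[i * repeat_len:(i + 1) * repeat_len] for i in range(n_copies)]

    consensus = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        # Keep a base only if a strict majority of copies agree; otherwise "N".
        consensus.append(base if 2 * count > n_copies else "N")
    return "".join(consensus)


# Three copies of an 8-nt template; the middle copy carries one error (T -> A).
read = "ACGTTGCA" + "ACGTAGCA" + "ACGTTGCA"
print(repeat_consensus(read, repeat_len=8))
```

    Because an error has to recur at the same position in most copies to survive the vote, the consensus error rate falls well below the raw per-base error rate, which is the property the abstract relies on for rare-variant detection.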

  19. High sequence conservation among cucumber mosaic virus isolates from lily.

    PubMed

    Chen, Y K; Derks, A F; Langeveld, S; Goldbach, R; Prins, M

    2001-08-01

    For classification of Cucumber mosaic virus (CMV) isolates from ornamental crops of different geographical areas, these were characterized by comparing the nucleotide sequences of RNAs 4 and the encoded coat proteins. Within the ornamental-infecting CMV viruses both subgroups were represented. CMV isolates of Alstroemeria and crocus were classified as subgroup II isolates, whereas 8 other isolates, from lily, gladiolus, amaranthus, larkspur, and lisianthus, were identified as subgroup I members. In general, nucleotide sequence comparisons correlated well with geographic distribution, with one notable exception: the analyzed nucleotide sequences of 5 lily isolates showed remarkably high homology despite different origins.

  20. Climate shaped the worldwide distribution of human mitochondrial DNA sequence variation.

    PubMed

    Balloux, François; Handley, Lori-Jayne Lawson; Jombart, Thibaut; Liu, Hua; Manica, Andrea

    2009-10-07

    There is an ongoing discussion in the literature on whether human mitochondrial DNA (mtDNA) evolves neutrally. There have been previous claims for natural selection on human mtDNA based on an excess of non-synonymous mutations and higher evolutionary persistence of specific mitochondrial mutations in Arctic populations. However, these findings were not supported by the reanalysis of larger datasets. Using a geographical framework, we perform the first direct test of the relative extent to which climate and past demography have shaped the current spatial distribution of mtDNA sequences worldwide. We show that populations living in colder environments have lower mitochondrial diversity and that the genetic differentiation between pairs of populations correlates with difference in temperature. These associations were unique to mtDNA; we could not find a similar pattern in any other genetic marker. We were able to identify two correlated non-synonymous point mutations in the ND3 and ATP6 genes characterized by a clear association with temperature, which appear to be plausible targets of natural selection producing the association with climate. The same mutations have been previously shown to be associated with variation in mitochondrial pH and calcium dynamics. Our results indicate that natural selection mediated by climate has contributed to shape the current distribution of mtDNA sequences in humans.

  1. Copy number variations in Hanwoo and Yanbian cattle genomes using the massively parallel sequencing data.

    PubMed

    Choi, Jung-Woo; Chung, Won-Hyong; Lim, Kyu-Sang; Lim, Won-Jun; Choi, Bong-Hwan; Lee, Seung-Hwan; Kim, Hyeong-Cheol; Lee, Seung-Soo; Cho, Eun-Seok; Lee, Kyung-Tai; Kim, Namshin; Kim, Jeong-Dae; Kim, Jong-Bok; Chai, Han-Ha; Cho, Yong-Min; Kim, Tae-Hun; Lim, Dajeong

    2016-09-01

    Hanwoo is an indigenous Korean beef cattle breed, and it shared an ancestor with Yanbian cattle that are found in the Northeast provinces in China until the last century. During recent decades, those cattle breeds experienced different selection pressures. Here, we present genome-wide copy number variations (CNVs) by comparing Hanwoo and Yanbian cattle sequencing data. We used ~3.12 and ~3.07 billion sequence reads from Hanwoo and Yanbian cattle, respectively. A total of 901 putative CNV regions (CNVRs) were identified throughout the genome, representing 5,513,340bp. This is a smaller number than has been reported in previous studies, indicating that Hanwoo are genetically close to Yanbian cattle. Of the CNVRs, 53.2% and 46.8% were found to be gains and losses in Hanwoo. Potential functional roles of each CNVR were assessed by annotating all CNVRs and gene ontology (GO) enrichment analysis. We found that 278 CNVRs overlapped with cattle gene-sets (genic-CNVRs) that could be promising candidates to account for economically important traits in cattle. The enrichment analysis indicated that genes were significantly over-represented in GO terms, including developmental process, multicellular organismal process, reproduction, and response to stimulus. These results provide a valuable genomic resource for determining how CNVs are associated with cattle traits. Copyright © 2016. Published by Elsevier B.V.

  2. Patchwork sequencing of tomato San Marzano and Vesuviano varieties highlights genome-wide variations

    PubMed Central

    2014-01-01

    Background: Investigation of tomato genetic resources is a crucial issue for better straight evolution and genetic studies as well as tomato breeding strategies. Traditional Vesuviano and San Marzano varieties grown in the Campania region (Southern Italy) are famous for their remarkable fruit quality. Owing to their economic and social importance, it is crucial to understand the genetic basis of their unique traits. Results: Here, we present the draft genome sequences of the tomato varieties Vesuviano and San Marzano. A 40x genome coverage was obtained from a hybrid assembly of Illumina paired-end reads that combines de novo assembly with iterative mapping to the reference S. lycopersicum genome (SL2.40). Insertions, deletions and SNP variants were carefully measured. When assessed on the basis of the reference annotation, 30% of protein-coding genes are predicted to have variants in both varieties. Gene copy number and gene location were assessed by mapping mRNA transcripts, showing a closer relationship of San Marzano with the reference genome. Distinctive variations in key genes and transcription/regulation factors related to fruit quality have been revealed for both cultivars. Conclusions: This effort highlighted relationships between the varieties and important variants in key fruit processes, useful to dissect the path from sequence variant to phenotype. PMID:24548308

  3. Sequence variations in the osteoprotegerin gene promoter in patients with postmenopausal osteoporosis.

    PubMed

    Arko, B; Prezelj, J; Komel, R; Kocijancic, A; Hudler, P; Marc, J

    2002-09-01

    Osteoprotegerin (OPG) is a recently discovered member of the TNF receptor superfamily that acts as an important paracrine regulator of bone remodeling. OPG knockout mice develop severe osteoporosis, whereas administration of OPG can prevent ovariectomy-induced bone loss. These findings implicate a role for OPG in the development of osteoporosis. In the present study, we screened the OPG gene promoter for sequence variations and examined their association with bone mineral density (BMD) in 103 osteoporotic postmenopausal women. Single-strand conformation polymorphism analysis followed by DNA sequencing revealed the presence of four nucleotide substitutions: 209 G-->A, 245 T-->G, 889 C-->T, and 950 T-->C. The frequencies of genotypes were as follows: GG (89.3%), GA (10.7%) for the 209 G-->A polymorphism; TT (89.3%), TG (10.7%) for the 245 T-->G polymorphism; and TT (25.2%), TC (53.4%), CC (21.4%) for the 950 T-->C polymorphism. Substitution 889 C-->T was found in only two patients. A statistically significant association of genotypes with BMD at the lumbar spine (P = 0.005) was observed for the 209 G-->A and 245 T-->G polymorphisms. Haplotype GATG was associated with lower BMD as compared with the GGTT haplotype. Our results suggest that the 209 G-->A and 245 T-->G polymorphisms in the OPG gene promoter may contribute to the genetic regulation of BMD.
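
    As a worked illustration of how allele frequencies follow from the reported genotype frequencies, the sketch below counts alleles for the 950 T-->C polymorphism. The genotype counts (26, 55, 22) are not given in the abstract; they are back-calculated here by rounding the reported percentages of 103 women, and the function name `allele_frequencies` is hypothetical.

```python
def allele_frequencies(genotype_counts: dict[str, int]) -> dict[str, float]:
    """Allele frequencies from counts of two-letter diploid genotypes."""
    allele_counts: dict[str, int] = {}
    for genotype, n in genotype_counts.items():
        # Each individual contributes one allele per letter of the genotype.
        for allele in genotype:
            allele_counts[allele] = allele_counts.get(allele, 0) + n
    total_alleles = 2 * sum(genotype_counts.values())
    return {allele: count / total_alleles for allele, count in allele_counts.items()}


# Assumed counts: 25.2%, 53.4% and 21.4% of 103 women, rounded to whole people.
counts_950 = {"TT": 26, "TC": 55, "CC": 22}
freqs = allele_frequencies(counts_950)
print({allele: round(f, 3) for allele, f in freqs.items()})
```

    Under these assumed counts the T allele accounts for 107 of 206 chromosomes and the C allele for 99 of 206, so the two alleles at position 950 are close to equally common in this sample.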

  4. Mitochondrial DNA sequence variation among populations and host races of Lambdina fiscellaria (Gn.) (Lepidoptera: Geometridae).

    PubMed

    Sperling, F A; Raske, A G; Otvos, I S

    1999-02-01

    The hemlock looper, Lambdina fiscellaria (Gn.), is a recurring major forest pest that is widely distributed in North America. Three subspecies (L. f. fiscellaria, L. f. lugubrosa (Hulst) and L. f. somniaria (Hulst)) have been recognized based on larval host or adult pheromone differences, but no consistent morphological differences have been reported. To clarify their taxonomic status, we surveyed mitochondrial DNA (mtDNA) sequence and restriction site variation in two protein coding genes, cytochrome oxidase I and II (COI and COII), in populations across the range of L. fiscellaria. In addition to variation in COI and COII, we found an intergenic spacer region of 20-23 bp located between the tRNA tyrosine gene and the start of COI. Of the 141 specimens of L. fiscellaria assayed, 137 were grouped into two distinct mtDNA lineages, one of which was disproportionately associated with eastern populations and one with western populations. However, single specimens and two populations in eastern Canada had mtDNA resembling that of western populations. Three divergent and rare haplotypes had basal affinities to the two common lineages. The two major lineages of L. fiscellaria were diverged by approximately 2% from each other, as well as from the mtDNA of two outgroup species, L. athasaria (Walker) and L. pellucidaria (G. & R.). The two outgroup species had essentially the same mtDNA and may be conspecific. We interpret the pattern of mtDNA variation within L. fiscellaria as indicating genetic polymorphism within a single species without clear subspecific divisions, rather than evidence of multiple cryptic species.

  5. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing.

    PubMed

    Aflitos, Saulo; Schijlen, Elio; de Jong, Hans; de Ridder, Dick; Smit, Sandra; Finkers, Richard; Wang, Jun; Zhang, Gengyun; Li, Ning; Mao, Likai; Bakker, Freek; Dirks, Rob; Breit, Timo; Gravendeel, Barbara; Huits, Henk; Struss, Darush; Swanson-Wagner, Ruth; van Leeuwen, Hans; van Ham, Roeland C H J; Fito, Laia; Guignier, Laëtitia; Sevilla, Myrna; Ellul, Philippe; Ganko, Eric; Kapur, Arvind; Reclus, Emannuel; de Geus, Bernard; van de Geest, Henri; Te Lintel Hekkert, Bas; van Haarst, Jan; Smits, Lars; Koops, Andries; Sanchez-Perez, Gabino; van Heusden, Adriaan W; Visser, Richard; Quan, Zhiwu; Min, Jiumeng; Liao, Li; Wang, Xiaoli; Wang, Guangbiao; Yue, Zhen; Yang, Xinhua; Xu, Na; Schranz, Eric; Smets, Erik; Vos, Rutger; Rauwerda, Johan; Ursem, Remco; Schuit, Cees; Kerns, Mike; van den Berg, Jan; Vriezen, Wim; Janssen, Antoine; Datema, Erwin; Jahrman, Torben; Moquet, Frederic; Bonnet, Julien; Peters, Sander

    2014-10-01

    We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio between non-synonymous and synonymous SNPs (dN/dS) in fruit diversification and plant growth genes compared to a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of single-nucleotide polymorphisms (SNPs) exceeds 10 million, i.e. 20-fold higher than found in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found for allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display a lower heterozygosity level. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies.

  6. Effect of laying sequence on egg mercury in captive zebra finches: an interpretation considering individual variation.

    PubMed

    Ou, Langbo; Varian-Ramos, Claire W; Cristol, Daniel A

    2015-08-01

    Bird eggs are used widely as noninvasive bioindicators for environmental mercury availability. Previous studies, however, have found varying relationships between laying sequence and egg mercury concentrations. Some studies have reported that the mercury concentration was higher in first-laid eggs or declined across the laying sequence, whereas in other studies mercury concentration was not related to egg order. Approximately 300 eggs (61 clutches) were collected from captive zebra finches dosed throughout their reproductive lives with methylmercury (0.3 μg/g, 0.6 μg/g, 1.2 μg/g, or 2.4 μg/g wet wt in diet); the total mercury concentration (mean ± standard deviation [SD] dry wt basis) of their eggs was 7.03 ± 1.38 μg/g, 14.15 ± 2.52 μg/g, 26.85 ± 5.85 μg/g, and 49.76 ± 10.37 μg/g, respectively (equivalent to fresh wt egg mercury concentrations of 1.24 μg/g, 2.50 μg/g, 4.74 μg/g, and 8.79 μg/g). The authors observed a significant decrease in the mercury concentration of successive eggs when compared with the first egg and notable variation between clutches within treatments. The mercury level of individual females within and among treatments did not alter this relationship. Based on the results, sampling of a single egg in each clutch from any position in the laying sequence is sufficient for purposes of population risk assessment, but it is not recommended as a proxy for individual female exposure or as an estimate of average mercury level within the clutch.

  7. Variation among Bm86 sequences in Rhipicephalus (Boophilus) microplus ticks collected from cattle across Thailand.

    PubMed

    Kaewmongkol, S; Kaewmongkol, G; Inthong, N; Lakkitjaroen, N; Sirinarumitr, T; Berry, C M; Jonsson, N N; Stich, R W; Jittapalapong, S

    2015-06-01

    Anti-tick vaccines based on recombinant homologues Bm86 and Bm95 have become a more cost-effective and sustainable alternative to chemical pesticides commonly used to control the cattle tick, Rhipicephalus (Boophilus) microplus. However, Bm86 polymorphism among geographically separate ticks is reportedly associated with reduced effectiveness of these vaccines. The purpose of this study was to investigate the variation of Bm86 among cattle ticks collected from Northern, Northeastern, Central and Southern areas across Thailand. Bm86 cDNA and deduced amino acid sequences representing 29 female tick midgut samples were 95.6-97.0 and 91.5-93.5 % identical to the nucleotide and amino acid reference sequences, respectively, of the Australian Yeerongpilly vaccine strain. Multiple sequence analyses of these Bm86 variants indicated geographical relationships and polymorphism among Thai cattle ticks. Two larger groups of cattle tick strains were discernable based on this phylogenetic analysis of Bm86, a Thai group and a Latin American group. Thai female and male cattle ticks (50 pairs) were also subjected to detailed morphological characterization to confirm their identity. The majority of female ticks had morphological features consistent with those described for R. (B.) microplus, whereas, curiously, the majority of male ticks were more consistent with the recently re-instated R. (B.) australis. A number of these ticks had features consistent with both species. Further investigations are warranted to test the efficacies of rBm86-based vaccines to homologous and heterologous challenge infestations with Thai tick strains and for in-depth study of the phylogeny of Thai cattle ticks.

  8. Detection and implication of significant temporal b-value variation during earthquake sequences

    NASA Astrophysics Data System (ADS)

    Gulia, Laura; Tormann, Thessa; Schorlemmer, Danijel; Wiemer, Stefan

    2016-04-01

    Earthquakes tend to cluster in space and time and periods of increased seismic activity are also periods of increased seismic hazard. Forecasting models currently used in statistical seismology and in Operational Earthquake Forecasting (e.g. ETAS) consider the spatial and temporal changes in the activity rates whilst the spatio-temporal changes in the earthquake size distribution, the b-value, are not included. Laboratory experiments on rock samples show an increasing relative proportion of larger events as the system approaches failure, and a sudden reversal of this trend after the main event. The increasing fraction of larger events during the stress increase period can be mathematically represented by a systematic b-value decrease, while the b-value increases immediately following the stress release. We investigate whether these lab-scale observations also apply to natural earthquake sequences and can help to improve our understanding of the physical processes generating damaging earthquakes. A number of large events nucleated in low b-value regions and spatial b-value variations have been extensively documented in the past. Detecting temporal b-value evolution with confidence is more difficult, one reason being the very different scales that have been suggested for a precursory drop in b-value, from a few days to decadal scale gradients. We demonstrate with the results of detailed case studies of the 2009 M6.3 L'Aquila and 2011 M9 Tohoku earthquakes that significant and meaningful temporal b-value variability can be detected throughout the sequences, which e.g. suggests that foreshock probabilities are not generic but subject to significant spatio-temporal variability. Such potential conclusions require and motivate the systematic study of many sequences to investigate whether general patterns exist that might eventually be useful for time-dependent or even real-time seismic hazard assessment.
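
    For readers unfamiliar with the quantity, the b-value is the slope of the Gutenberg-Richter relation log10 N = a - bM. The sketch below uses the widely cited Aki-Utsu maximum-likelihood estimator with a half-bin correction; the abstract does not state which estimator the authors applied, so this choice, the function name `b_value_mle`, and the default `bin_width` of 0.1 are assumptions for illustration only.

```python
import math


def b_value_mle(magnitudes, mc, bin_width=0.1):
    """Aki-Utsu maximum-likelihood b-value for events at or above mc.

    Hypothetical helper: mc is the completeness magnitude and bin_width
    is the magnitude rounding interval of the catalogue.
    Returns (b, sigma), where sigma = b / sqrt(N) is Aki's first-order
    uncertainty estimate.
    """
    complete = [m for m in magnitudes if m >= mc]
    if len(complete) < 2:
        raise ValueError("too few events above the completeness magnitude")

    mean_mag = sum(complete) / len(complete)
    # Half-bin correction: binned magnitudes start at mc - bin_width / 2.
    denominator = mean_mag - (mc - bin_width / 2)
    if denominator <= 0:
        raise ValueError("mean magnitude must exceed the corrected threshold")

    b = math.log10(math.e) / denominator
    sigma = b / math.sqrt(len(complete))
    return b, sigma


# Toy catalogue; the M1.0 event falls below mc and is ignored.
b, sigma = b_value_mle([1.0, 2.0, 2.0, 2.5, 2.5, 3.0], mc=2.0)
print(f"b = {b:.2f} +/- {sigma:.2f}")
```

    Tracking temporal b-value changes of the kind the abstract describes would amount to applying such an estimator in a moving window over the sequence; the window length and the choice of mc strongly affect the result and are not specified in the abstract.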

  9. A novel multi-alignment pipeline for high-throughput sequencing data.

    PubMed

    Huang, Shunping; Holt, James; Kao, Chia-Yu; McMillan, Leonard; Wang, Wei

    2014-01-01

    Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence, as is common practice, can lead to an underlying bias depending on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, which include reduced mapping ratios, shifts in read mappings and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging them afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate our advantages of more aligned reads and a higher percentage of reads with assigned origins. Database URL: http://csbio.unc.edu/CCstatus/index.py?run=Pseudo.
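
    The merging step of such a pipeline can be sketched as follows: each read has been aligned separately to every founder reference, and the merge keeps the best-scoring alignment per read and records which reference it came from. This is a hedged illustration rather than the authors' implementation; the `Alignment` record, the "higher score wins" rule and the "ambiguous" label for ties are assumptions, and translation of coordinates between references is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    read_id: str
    reference: str  # label of the founder reference this alignment came from
    position: int   # position in that reference's own coordinates
    score: int      # aligner score; higher is assumed to be better


def merge_alignments(alignments):
    """Pick one alignment per read and annotate its inferred origin."""
    by_read = {}
    for aln in alignments:
        by_read.setdefault(aln.read_id, []).append(aln)

    merged = {}
    for read_id, candidates in by_read.items():
        top = max(a.score for a in candidates)
        winners = [a for a in candidates if a.score == top]
        # A unique best hit assigns the read to that founder; ties stay ambiguous.
        origin = winners[0].reference if len(winners) == 1 else "ambiguous"
        merged[read_id] = (winners[0], origin)
    return merged


hits = [
    Alignment("r1", "founderA", 100, 50),
    Alignment("r1", "founderB", 100, 48),
    Alignment("r2", "founderA", 250, 40),
    Alignment("r2", "founderB", 250, 40),
    Alignment("r3", "founderB", 900, 37),  # maps to one reference only
]
for read_id, (aln, origin) in merge_alignments(hits).items():
    print(read_id, origin, aln.score)
```

    Read r3 illustrates the "rescue" effect the abstract mentions: a read that aligns to only one founder reference would be lost under a single-reference pipeline that happened to use the other one.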

  10. Sequence variation within the neuropeptide Y gene and obesity in Mexican Americans.

    PubMed

    Bray, M S; Boerwinkle, E; Hanis, C L

    2000-05-01

    Recently, we reported evidence for linkage between neuropeptide Y (NPY) and both obesity and several obesity-related quantitative measures in a sample of Mexican Americans from Starr County, Texas. The purpose of this study was to investigate putative variation within the coding and promoter regions of NPY. Five young, obese individuals (body mass index [BMI] 33 to 45 kg/m2, age 14 to 30 years); five adult, lean individuals (BMI 20 to 26 kg/m2, age 39 to 65 years); and five sibling pairs sharing no alleles that were identical by descent at a marker locus proximal to NPY were selected for fluorescence-based sequencing of approximately 1100 base pairs (bp) immediately 5' from the start site and all four exons of NPY. We identified a total of eight variant sites, including a 2-bp insertion/deletion (I/D) within a putative negative regulatory region (-880I/D) and a 17-bp deletion at the exon 1/intron 1 junction (69I/D). The -880I/D and 69I/D variants were typed in a separate random sample of Mexican Americans (N = 914) from Starr County, Texas. Analyses of variance resulted in a significant association between -880I/D and waist-to-hip ratio (p = 0.041) in the entire sample and between -880I/D and BMI (p = 0.031), abdominal circumference (p = 0.044), and waist-to-hip ratio (p = 0.041) in a non-obese subsample (BMI < 30 kg/m2, n = 594). The 69I/D variant was observed in only one pedigree and does not appear to segregate with obesity within this pedigree. This study reports newly identified common human sequence variation within the regulatory and coding sequence of NPY. Several variants were observed, and of those tested, the -880I/D promoter region variant may influence body fat patterning in non-obese individuals but does not appear to play a major role in the etiology of common forms of obesity in this population.

  11. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing.

    PubMed

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M G; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes

    2015-08-19

    Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material.

  12. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing

    PubMed Central

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M. G.; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes

    2015-01-01

    Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material. PMID:26285800

  13. Interhomologue sequence variation of alpha satellite DNA from human chromosome 17: evidence for concerted evolution along haplotypic lineages.

    PubMed

    Warburton, P E; Willard, H F

    1995-12-01

    Alpha satellite DNA is a family of tandemly repeated DNA found at the centromeres of all primate chromosomes. Different human chromosomes 17 in the population are characterized by distinct alpha satellite haplotypes, distinguished by the presence of variant repeat forms that have precise monomeric deletions. Pair-wise comparisons of sequence diversity between variant repeat units from each haplotype show that they are closely related in sequence. Direct sequencing of PCR-amplified alpha satellite reveals heterogeneous positions between the repeat units on a chromosome as two bands at the same position on a sequencing ladder. No variation was detected in the sequence and location of these heterogeneous positions between chromosomes 17 from the same haplotype, but distinct patterns of variation were detected between chromosomes from different haplotypes. Subsequent sequence analysis of individual repeats from each haplotype confirmed the presence of extensive haplotype-specific sequence variation. Phylogenetic inference yielded a tree that suggests these chromosome 17 repeat units evolve principally along haplotypic lineages. These studies allow insight into the relative rates and/or timing of genetic turnover processes that lead to the homogenization of tandem DNA families.

  14. Sequence variation within botulinum neurotoxin serotypes impacts antibody binding and neutralization.

    PubMed

    Smith, T J; Lou, J; Geren, I N; Forsyth, C M; Tsai, R; Laporte, S L; Tepp, W H; Bradshaw, M; Johnson, E A; Smith, L A; Marks, J D

    2005-09-01

    The botulinum neurotoxins (BoNTs) are category A biothreat agents which have been the focus of intensive efforts to develop vaccines and antibody-based prophylaxis and treatment. Such approaches must take into account the extensive BoNT sequence variability; the seven BoNT serotypes differ by up to 70% at the amino acid level. Here, we have analyzed 49 complete published sequences of BoNTs and show that all toxins also exhibit variability within serotypes ranging between 2.6 and 31.6%. To determine the impact of such sequence differences on immune recognition, we studied the binding and neutralization capacity of six BoNT serotype A (BoNT/A) monoclonal antibodies (MAbs) to BoNT/A1 and BoNT/A2, which differ by 10% at the amino acid level. While all six MAbs bound BoNT/A1 with high affinity, three of the six MAbs showed a marked reduction in binding affinity of 500- to more than 1,000-fold to BoNT/A2 toxin. Binding results predicted in vivo toxin neutralization; MAbs or MAb combinations that potently neutralized A1 toxin but did not bind A2 toxin had minimal neutralizing capacity for A2 toxin. This was most striking for a combination of three binding domain MAbs which together neutralized >40,000 mouse 50% lethal doses (LD(50)s) of A1 toxin but less than 500 LD(50)s of A2 toxin. Combining three MAbs which bound both A1 and A2 toxins potently neutralized both toxins. We conclude that sequence variability exists within all toxin serotypes, and this impacts monoclonal antibody binding and neutralization. Such subtype sequence variability must be accounted for when generating and evaluating diagnostic and therapeutic antibodies.
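
    The percent differences quoted above (for example, 10% between BoNT/A1 and BoNT/A2) are pairwise comparisons of aligned amino acid sequences. The following is a minimal sketch of such a comparison, not the authors' procedure: the function name is hypothetical, the inputs are assumed to be pre-aligned strings of equal length, and skipping gapped columns is an assumption, since the abstract does not say how gaps were treated.

```python
def percent_difference(seq_a, seq_b, gap="-"):
    """Percent of compared positions at which two aligned sequences differ.

    Assumption: columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where both sequences have a residue.
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    mismatches = sum(a != b for a, b in compared)
    return 100.0 * mismatches / len(compared)


# Toy aligned fragments (made up for illustration, not real BoNT sequences).
print(percent_difference("MKVLA", "MKILA"))
```

    Applying the function to every pair of sequences within a serotype and taking the minimum and maximum would give a range of the kind reported in the abstract.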

  15. Sequence Variation within Botulinum Neurotoxin Serotypes Impacts Antibody Binding and Neutralization

    PubMed Central

    Smith, T. J.; Lou, J.; Geren, I. N.; Forsyth, C. M.; Tsai, R.; LaPorte, S. L.; Tepp, W. H.; Bradshaw, M.; Johnson, E. A.; Smith, L. A.; Marks, J. D.

    2005-01-01

    The botulinum neurotoxins (BoNTs) are category A biothreat agents which have been the focus of intensive efforts to develop vaccines and antibody-based prophylaxis and treatment. Such approaches must take into account the extensive BoNT sequence variability; the seven BoNT serotypes differ by up to 70% at the amino acid level. Here, we have analyzed 49 complete published sequences of BoNTs and show that all toxins also exhibit variability within serotypes ranging between 2.6 and 31.6%. To determine the impact of such sequence differences on immune recognition, we studied the binding and neutralization capacity of six BoNT serotype A (BoNT/A) monoclonal antibodies (MAbs) to BoNT/A1 and BoNT/A2, which differ by 10% at the amino acid level. While all six MAbs bound BoNT/A1 with high affinity, three of the six MAbs showed a marked reduction in binding affinity of 500- to more than 1,000-fold to BoNT/A2 toxin. Binding results predicted in vivo toxin neutralization; MAbs or MAb combinations that potently neutralized A1 toxin but did not bind A2 toxin had minimal neutralizing capacity for A2 toxin. This was most striking for a combination of three binding domain MAbs which together neutralized >40,000 mouse 50% lethal doses (LD50s) of A1 toxin but less than 500 LD50s of A2 toxin. Combining three MAbs which bound both A1 and A2 toxins potently neutralized both toxins. We conclude that sequence variability exists within all toxin serotypes, and this impacts monoclonal antibody binding and neutralization. Such subtype sequence variability must be accounted for when generating and evaluating diagnostic and therapeutic antibodies. PMID:16113261

  16. High-throughput sequencing in veterinary infection biology and diagnostics.

    PubMed

    Belák, S; Karlsson, O E; Leijon, M; Granberg, F

    2013-12-01

    Sequencing methods have improved rapidly since the first versions of the Sanger techniques, facilitating the development of very powerful tools for detecting and identifying various pathogens, such as viruses, bacteria and other microbes. The ongoing development of high-throughput sequencing (HTS; also known as next-generation sequencing) technologies has resulted in a dramatic reduction in DNA sequencing costs, making the technology more accessible to the average laboratory. In this White Paper of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine (Uppsala, Sweden), several approaches and examples of HTS are summarised, and their diagnostic applicability is briefly discussed. Selected future aspects of HTS are outlined, including the need for bioinformatic resources, with a focus on improving the diagnosis and control of infectious diseases in veterinary medicine.

  17. Dissection of genomic features and variations of three pathotypes of Puccinia striiformis through whole genome sequencing

    PubMed Central

    Kiran, Kanti; Rawal, Hukam C.; Dubey, Himanshu; Jaswal, R.; Bhardwaj, Subhash C.; Prasad, P.; Pal, Dharam; Devanna, B. N.; Sharma, Tilak R.

    2017-01-01

    Stripe rust of wheat, caused by Puccinia striiformis f. sp. tritici, is one of the important diseases of wheat. We used NGS technologies to generate a draft genome sequence of two highly virulent (46S 119 and 31) and a least virulent (K) pathotypes of P. striiformis from the Indian subcontinent. We generated ~24,000–32,000 sequence contigs (N50;7.4–9.2 kb), which accounted for ~86X–105X sequence depth coverage with an estimated genome size of these pathotypes ranging from 66.2–70.2 Mb. A genome-wide analysis revealed that pathotype 46S 119 might be highly evolved among the three pathotypes in terms of year of detection and prevalence. SNP analysis revealed that ~47% of the gene sets are affected by nonsynonymous mutations. The extracellular secreted (ES) proteins presumably are well conserved among the three pathotypes, and perhaps purifying selection has an important role in differentiating pathotype 46S 119 from pathotypes K and 31. In the present study, we decoded the genomes of three pathotypes, with 81% of the total annotated genes being successfully assigned functional roles. Besides the identification of secretory genes, genes essential for pathogen-host interactions shall prove this study as a huge genomic resource for the management of this disease using host resistance. PMID:28211474
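
    The N50 values quoted above summarise assembly contiguity: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The following is a minimal sketch of that standard definition; the function name and the toy contig lengths are illustrative and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (arbitrary units), made up for illustration.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```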

  18. Dissection of genomic features and variations of three pathotypes of Puccinia striiformis through whole genome sequencing.

    PubMed

    Kiran, Kanti; Rawal, Hukam C; Dubey, Himanshu; Jaswal, R; Bhardwaj, Subhash C; Prasad, P; Pal, Dharam; Devanna, B N; Sharma, Tilak R

    2017-02-17

    Stripe rust of wheat, caused by Puccinia striiformis f. sp. tritici, is one of the important diseases of wheat. We used NGS technologies to generate a draft genome sequence of two highly virulent (46S 119 and 31) and a least virulent (K) pathotypes of P. striiformis from the Indian subcontinent. We generated ~24,000-32,000 sequence contigs (N50;7.4-9.2 kb), which accounted for ~86X-105X sequence depth coverage with an estimated genome size of these pathotypes ranging from 66.2-70.2 Mb. A genome-wide analysis revealed that pathotype 46S 119 might be highly evolved among the three pathotypes in terms of year of detection and prevalence. SNP analysis revealed that ~47% of the gene sets are affected by nonsynonymous mutations. The extracellular secreted (ES) proteins presumably are well conserved among the three pathotypes, and perhaps purifying selection has an important role in differentiating pathotype 46S 119 from pathotypes K and 31. In the present study, we decoded the genomes of three pathotypes, with 81% of the total annotated genes being successfully assigned functional roles. Besides the identification of secretory genes, genes essential for pathogen-host interactions shall prove this study as a huge genomic resource for the management of this disease using host resistance.

  19. PCR/SSCP detects reliably and efficiently DNA sequence variations in large scale screening projects.

    PubMed

    Miterski, B; Krüger, R; Wintermeyer, P; Epplen, J T

    2000-06-01

    A simple and fast method with high reliability is necessary for the identification of mutations, polymorphisms and sequence variants (MPSV) within many genes and many samples, e.g. for clarifying the genetic background of individuals with multifactorial diseases. Here we review our experience with the polymerase chain reaction/single-strand conformation polymorphism (PCR/SSCP) analysis to identify MPSV in a number of genes thought to be involved in the pathogenesis of multifactorial neurological disorders, including autoimmune diseases like multiple sclerosis (MS) and neurodegenerative disorders like Parkinson's disease (PD). The method is based on the property of the DNA that the electrophoretic mobility of single stranded nucleic acids depends not only on their size but also on their sequence. The target sequences were amplified, digested into fragments ranging from 50-240 base pairs (bp), heat-denatured and analysed on native polyacrylamide (PAA) gels of different composition. The analysis of a great number of different PCR products demonstrates that the detection rate of MPSV depends on the fragment lengths, the temperature during electrophoresis and the composition of the gel. In general, the detection of MPSV is neither influenced by their location within the DNA fragment nor by the type of substitution, i.e., transitions or transversions. The standard PCR/SSCP system described here provides high reliability and detection rates. It allows the efficient analysis of a large number of DNA samples and many different genes.

  20. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera which gives a decrease in signal.

  1. Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

    PubMed Central

    Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim

    2007-01-01

    While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434

  2. Nullomers and High Order Nullomers in Genomic Sequences

    PubMed Central

    Vergni, Davide; Santoni, Daniele

    2016-01-01

    A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers, whether their absence in genomes has just a statistical explanation or it is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied different aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless antithetical results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built on eleven species based on both the similarities between the dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally the study of mean helical rise of nullomers sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. The obtained results show that nullomers are the consequence of the peculiar structure of DNA (also including biased CpG frequency and CpG islands), so that the hypermutability model, also taking into account CpG islands, seems to be not sufficient to explain the nullomer phenomenon.
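
    The definition of a nullomer given above can be illustrated with a short sketch that lists the k-mers absent from a sequence. This is not the authors' code. The function names are hypothetical, only the given strand is scanned (reverse complements are ignored), and "mutated sequences" is read here as every single-base substitution; both choices are assumptions that the abstract does not settle.

```python
from itertools import product

BASES = "ACGT"


def find_nullomers(sequence, k):
    """Return the sorted k-mers over A, C, G, T that never occur in sequence."""
    seq = sequence.upper()
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    all_kmers = ("".join(p) for p in product(BASES, repeat=k))
    return sorted(kmer for kmer in all_kmers if kmer not in present)


def is_high_order_nullomer(kmer, nullomers):
    """True if kmer and all of its single-base substitutions are nullomers.

    Assumption: "mutated" means exactly one substituted base.
    """
    nullset = set(nullomers)
    if kmer not in nullset:
        return False
    for i, base in enumerate(kmer):
        for alt in BASES:
            if alt != base and kmer[:i] + alt + kmer[i + 1:] not in nullset:
                return False
    return True


# Toy sequence, made up for illustration.
absent = find_nullomers("ACGTACGT", 2)
print(len(absent), is_high_order_nullomer("CC", find_nullomers("AAAA", 2)))
```

    Enumerating all 4^k words is only practical for small k; for genome-scale input a k-mer counting tool would be the more realistic choice.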

  3. Exome Sequence Analysis of 14 Families With High Myopia

    PubMed Central

    Kloss, Bethany A.; Tompson, Stuart W.; Whisenhunt, Kristina N.; Quow, Krystina L.; Huang, Samuel J.; Pavelec, Derek M.; Rosenberg, Thomas; Young, Terri L.

    2017-01-01

    Purpose To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Methods Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sanger sequencing was used to confirm variants in original DNA, and to test for disease cosegregation in additional family members. Candidate genes and chromosomal loci previously associated with myopic refractive error and its endophenotypes were comprehensively screened. Results In 14 high myopia families, we identified 73 rare and 31 novel gene variants as candidates for pathogenicity. In seven of these families, two of the novel and eight of the rare variants were within known myopia loci. A total of 104 heterozygous nonsynonymous rare variants in 104 genes were identified in 10 out of 14 probands. Each variant cosegregated with affection status. No rare variants were identified in genes known to cause myopia or in genes closest to published genome-wide association study association signals for refractive error or its endophenotypes. Conclusions Whole exome sequencing was performed to determine gene variants implicated in the pathogenesis of AD high myopia. This study provides new genes for consideration in the pathogenesis of high myopia, and may aid in the development of genetic profiling of those at greatest risk for attendant ocular morbidities of this disorder. PMID:28384719

  4. Exome Sequence Analysis of 14 Families With High Myopia.

    PubMed

    Kloss, Bethany A; Tompson, Stuart W; Whisenhunt, Kristina N; Quow, Krystina L; Huang, Samuel J; Pavelec, Derek M; Rosenberg, Thomas; Young, Terri L

    2017-04-01

    To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sanger sequencing was used to confirm variants in original DNA, and to test for disease cosegregation in additional family members. Candidate genes and chromosomal loci previously associated with myopic refractive error and its endophenotypes were comprehensively screened. In 14 high myopia families, we identified 73 rare and 31 novel gene variants as candidates for pathogenicity. In seven of these families, two of the novel and eight of the rare variants were within known myopia loci. A total of 104 heterozygous nonsynonymous rare variants in 104 genes were identified in 10 out of 14 probands. Each variant cosegregated with affection status. No rare variants were identified in genes known to cause myopia or in genes closest to published genome-wide association study association signals for refractive error or its endophenotypes. Whole exome sequencing was performed to determine gene variants implicated in the pathogenesis of AD high myopia. This study provides new genes for consideration in the pathogenesis of high myopia, and may aid in the development of genetic profiling of those at greatest risk for attendant ocular morbidities of this disorder.

  5. Transcriptome sequencing reveals genome-wide variation in molecular evolutionary rate among ferns.

    PubMed

    Grusz, Amanda L; Rothfels, Carl J; Schuettpelz, Eric

    2016-08-30

    Transcriptomics in non-model plant systems has recently reached a point where the examination of nuclear genome-wide patterns in understudied groups is an achievable reality. This progress is especially notable in evolutionary studies of ferns, for which molecular resources to date have been derived primarily from the plastid genome. Here, we utilize transcriptome data in the first genome-wide comparative study of molecular evolutionary rate in ferns. We focus on the ecologically diverse family Pteridaceae, which comprises about 10% of fern diversity and includes the enigmatic vittarioid ferns, an epiphytic, tropical lineage known for dramatically reduced morphologies and radically elongated phylogenetic branch lengths. Using expressed sequence data for 2091 loci, we perform pairwise comparisons of molecular evolutionary rate among 12 species spanning the three largest clades in the family and ask whether previously documented heterogeneity in plastid substitution rates is reflected in their nuclear genomes. We then inquire whether variation in evolutionary rate is being shaped by genes belonging to specific functional categories and test for differential patterns of selection. We find significant, genome-wide differences in evolutionary rate for vittarioid ferns relative to all other lineages within the Pteridaceae, but we recover few significant correlations between faster/slower vittarioid loci and known functional gene categories. We demonstrate that the faster rates characteristic of the vittarioid ferns are likely not driven by positive selection, nor are they unique to any particular type of nucleotide substitution. Our results reinforce recently reviewed mechanisms hypothesized to shape molecular evolutionary rates in vittarioid ferns and provide novel insight into substitution rate variation both within and among fern nuclear genomes.

  6. Nucleotide sequence variation of the envelope protein gene identifies two distinct genotypes of yellow fever virus.

    PubMed

    Chang, G J; Cropp, B C; Kinney, R M; Trent, D W; Gubler, D J

    1995-09-01

    The evolution of yellow fever virus over 67 years was investigated by comparing the nucleotide sequences of the envelope (E) protein genes of 20 viruses isolated in Africa, the Caribbean, and South America. Uniformly weighted parsimony algorithm analysis defined two major evolutionary yellow fever virus lineages designated E genotypes I and II. E genotype I contained viruses isolated from East and Central Africa. E genotype II viruses were divided into two sublineages: IIA viruses from West Africa and IIB viruses from America, except for a 1979 virus isolated from Trinidad (TRINID79A). Unique signature patterns were identified at 111 nucleotide and 12 amino acid positions within the yellow fever virus E gene by signature pattern analysis. Yellow fever viruses from East and Central Africa contained unique signatures at 60 nucleotide and five amino acid positions, those from West Africa contained unique signatures at 25 nucleotide and two amino acid positions, and viruses from America contained such signatures at 30 nucleotide and five amino acid positions in the E gene. The dissemination of yellow fever viruses from Africa to the Americas is supported by the close genetic relatedness of genotype IIA and IIB viruses and genetic evidence of a possible second introduction of yellow fever virus from West Africa, as illustrated by the TRINID79A virus isolate. The E protein genes of American IIB yellow fever viruses had higher frequencies of amino acid substitutions than did genes of yellow fever viruses of genotypes I and IIA on the basis of comparisons with a consensus amino acid sequence for the yellow fever E gene. The great variation in the E proteins of American yellow fever virus probably results from positive selection imposed by virus interaction with different species of mosquitoes or nonhuman primates in the Americas.

  7. Effect of variations in peptide sequence on anti-human milk fat globule membrane antibody reactions.

    PubMed

    Xing, P X; Reynolds, K; Pietersz, G A; McKenzie, I F

    1991-02-01

    Monoclonal anti-mucin antibodies BC1, BC2 and BC3 produced using human milk fat globule membrane react with a synthetic peptide p1-24 (PDTRPAPGSTAPPAHGVTSAPDTR) representing the repeating amino acid sequence of the mucin core protein. The minimum epitope recognized by these three monoclonal antibodies (mAb) in p1-24 was contained in the five amino acids APDTR. To analyse the variation of position of the epitope, various modifications of the APDTR sequence were made by synthesizing peptides and testing by direct binding and inhibition enzyme-linked immunosorbent assays. Firstly, peptides p13-32 and C-p13-32, in which the epitope APDTR was placed in the middle instead of the C-terminal as in p1-24, were examined. These peptides had a greater reaction with mAb BC1, BC2 and BC3 compared with the reaction with p1-24. Secondly, A-p1-24 and TSA-p1-24 were made wherein two APDTR epitopes were present--these peptides were shown to bind two IgG antibody molecules. Finally, the contribution of each amino acid in the APDTR epitope was studied using the pepscan polyethylene rods, making all 20 of the amino acid substitutions in each position for SAPDTR (the minimum epitope APDTR with an adjacent amino acid S). In the 120 peptides examined there were some 'permissible' substitutions in A, D and T but not in P or R for BC1 and BC2; there were more 'permissible' substitutions for BC3; different substitution patterns were found with each antibody and some substitutions gave an increased reaction compared with the native peptide SAPDTR. The studies are of value in analysing the reaction of antibodies with epitopes expressed in breast cancer and in determining the antigenicity of synthetic peptides.
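
    The figure of 120 peptides follows from placing each of the 20 standard amino acids at each of the six positions of SAPDTR (6 x 20 = 120). The following is a minimal sketch of that enumeration; the function name is hypothetical, and the code only generates the peptide set and says nothing about how the rods were synthesized or assayed.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def substitution_scan(peptide):
    """Yield (position, residue, variant) for every residue at every position."""
    for pos in range(len(peptide)):
        for residue in AMINO_ACIDS:
            yield pos, residue, peptide[:pos] + residue + peptide[pos + 1:]


variants = list(substitution_scan("SAPDTR"))
print(len(variants))  # 6 positions x 20 residues
```

    In this enumeration the native peptide appears once per position (six times in total), so the 120 entries contain 115 distinct sequences.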

  8. Nucleotide sequence variation of the envelope protein gene identifies two distinct genotypes of yellow fever virus.

    PubMed Central

    Chang, G J; Cropp, B C; Kinney, R M; Trent, D W; Gubler, D J

    1995-01-01

    The evolution of yellow fever virus over 67 years was investigated by comparing the nucleotide sequences of the envelope (E) protein genes of 20 viruses isolated in Africa, the Caribbean, and South America. Uniformly weighted parsimony algorithm analysis defined two major evolutionary yellow fever virus lineages designated E genotypes I and II. E genotype I contained viruses isolated from East and Central Africa. E genotype II viruses were divided into two sublineages: IIA viruses from West Africa and IIB viruses from America, except for a 1979 virus isolated from Trinidad (TRINID79A). Unique signature patterns were identified at 111 nucleotide and 12 amino acid positions within the yellow fever virus E gene by signature pattern analysis. Yellow fever viruses from East and Central Africa contained unique signatures at 60 nucleotide and five amino acid positions, those from West Africa contained unique signatures at 25 nucleotide and two amino acid positions, and viruses from America contained such signatures at 30 nucleotide and five amino acid positions in the E gene. The dissemination of yellow fever viruses from Africa to the Americas is supported by the close genetic relatedness of genotype IIA and IIB viruses and genetic evidence of a possible second introduction of yellow fever virus from West Africa, as illustrated by the TRINID79A virus isolate. The E protein genes of American IIB yellow fever viruses had higher frequencies of amino acid substitutions than did genes of yellow fever viruses of genotypes I and IIA on the basis of comparisons with a consensus amino acid sequence for the yellow fever E gene. The great variation in the E proteins of American yellow fever virus probably results from positive selection imposed by virus interaction with different species of mosquitoes or nonhuman primates in the Americas. PMID:7637022

  9. Genetic variation of Trigonobalanus verticillata, a primitive species of Fagaceae, in Malaysia revealed by chloroplast sequences and AFLP markers.

    PubMed

    Kamiya, Koichi; Harada, Ko; Clyde, Mahani Mansor; Mohamed, Abdul Latiff

    2002-06-01

    The genetic variation of Trigonobalanus verticillata, the most recently described genus of Fagaceae, was studied using chloroplast DNA sequences and AFLP fingerprinting. This species has a restricted distribution that is known to include seven localities in tropical lower montane forests in Malaysia and Indonesia. A total of 75 individuals were collected from Bario, Kinabalu, and Fraser's Hill in Malaysia. The sequences of rbcL, matK, and three non-coding regions (atpB-rbcL spacer, trnL intron, and trnL-trnF spacer) were determined for 19 individuals from these populations. We found a total of 30 nucleotide substitutions and four length variations, which allowed identification of three haplotypes characterizing each population. No substitutions were detected within populations, while the tandem repeats in the trnL-trnF spacer had a variable repeat number of a 20-bp motif only in Kinabalu. The differentiation of the populations inferred from the cpDNA molecular clock calibrated with paleontological data was estimated to be 8.3 MYA between Bario and Kinabalu, and 16.7 MYA between Fraser's Hill and the other populations. In AFLP analysis, four selective primer pairs yielded a total of 431 loci, of which 340 (78.9%) were polymorphic. The results showed relatively high gene diversity (H(S) = 0.153 and H(T) = 0.198) and nucleotide diversity (pi(S) = 0.0132 and pi(T) = 0.0168) both within and among the populations. Although the cpDNA data suggest that little or no gene flow occurred between the populations via seeds, the fixation index estimated from AFLP data (F(ST) = 0.153 and N(ST) = 0.214) implies that some gene flow occurs between populations, possibly through pollen transfer.

  10. Natural allelic variations in highly polyploidy Saccharum complex

    USDA-ARS's Scientific Manuscript database

    Sugarcane (Saccharum spp.), an important sugar and biofuel crop, is highly polyploid with a complex genome. A large amount of natural phenotypic variation exists in sugarcane germplasm. Understanding its allelic variance has been challenging but is a critical foundation for discovery of the genomic seq...

  11. Mitochondrial DNA D-loop sequence variation in maternal lineages of Iranian native horses.

    PubMed

    Moridi, M; Masoudi, A A; Vaez Torshizi, R; Hill, E W

    2013-04-01

    To understand the origin and genetic diversity of Iranian native horses, mitochondrial DNA (mtDNA) D-loop sequences were generated for 95 horses from five breeds sampled in eight geographical locations in Iran. Sequence analysis of a 247-bp segment revealed a total of 27 haplotypes with 38 polymorphic sites. Twelve of 19 mtDNA haplogroups were identified in the samples. The most common haplotypes were found within haplogroup X2. Within-population haplotype and nucleotide diversities of the five breeds ranged from 0.838 ± 0.056 to 0.974 ± 0.022 and 0.011 ± 0.002 to 0.021 ± 0.001 respectively, indicating a relatively high genetic diversity in Iranian horses. The identification of several ancient sequences common between the breeds suggests that the lineage of the majority of Iranian horse breeds is old and obviously originated from a vast number of mares. We found in all native Iranian horse breeds lineages of the haplogroups D and K, which is concordant with the previous findings of Asian origins of these haplogroups. The presence of haplotypes E and K in our study also is consistent with a geographical west-east direction of increasing frequency of these haplotypes and a genetic fusion in Iranian horse breeds. © 2012 The Authors, Animal Genetics © 2012 Stichting International Foundation for Animal Genetics.
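
    The haplotype and nucleotide diversities reported above can be illustrated with a minimal sketch. It assumes the common Nei-style estimators (haplotype diversity h = n/(n-1) * (1 - sum of squared haplotype frequencies); nucleotide diversity as the mean number of pairwise differences per site); the abstract does not state which estimators or software the authors used, and the function names and toy sequences below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei-style haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site over all sequence pairs."""
    length = len(seqs[0])
    if len(seqs) < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need two or more aligned sequences of equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment (hypothetical data, not from the study): 4 sequences, 3 haplotypes.
toy = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "TCGAACGT"]
h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
print(f"h = {h:.3f}, pi = {pi:.4f}")
```

    The sketch treats every alignment column equally and does not handle gaps or missing data, which dedicated population-genetics packages do.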

  12. Sequence variation in the Toxoplasma gondii eIF4A gene among strains from different hosts and geographical locations.

    PubMed

    Chen, J; Fang, S F; Zhou, D H; Li, Z Y; Liu, G H; Zhu, X Q

    2014-04-29

    Toxoplasma gondii is an opportunistic protozoan parasite that infects a wide range of animals, including humans. The T. gondii eukaryotic translation initiation factor 4A (eIF4A) protein is expressed in the tachyzoite, but its expression is markedly downregulated in the bradyzoite, and it is therefore considered to be associated with tachyzoite virulence. The present study examined sequence variation in the eIF4A gene among nine strains of different genotypes from different hosts and geographical localities using polymerase chain reaction amplification, sequence analysis, and phylogenetic reconstruction by Bayesian inference. The complete genomic sequence of the eIF4A gene was 3156 bp in length in the strain TgCgCaI, 3153 bp in the strain MAS, 3152 bp in the strain TgPNY, and 3154 bp in the other six strains. Sequence analysis identified 29 (0-0.8%) variable nucleotide positions among all strains, with 16 of these variations located in the coding region, while the other 12 were distributed between the two introns. Phylogenetic analyses revealed that these eIF4A sequences were not effective molecular markers for intra-species phylogenetic analysis and differential identification of T. gondii strains from different hosts and geographical locations. This study demonstrated the existence of low sequence variation in the eIF4A gene, suggesting that T. gondii eIF4A may represent a suitable candidate vaccine against toxoplasmosis.

  13. Sequence variation of the 16S to 23S rRNA spacer region in Salmonella enterica.

    PubMed

    Christensen, H; Møller, P L; Vogensen, F K; Olsen, J E

    2000-01-01

    The possibility for identification of Salmonella enterica serotypes by sequence analysis of the 16S to 23S rRNA internal transcribed spacer was investigated by direct sequencing of polymerase chain reaction-amplified DNA from all operons simultaneously in a collection of 25 strains of 18 different serotypes of S. enterica, and by sequencing individual cloned operons from a single strain. It was only possible to determine the first 117 bases upstream from the 23S rRNA gene by direct sequencing because of variation between the rrn operons. Comparison of sequences from this region allowed separation of only 15 out of the 18 serotypes investigated and was not specific even at the subspecies level of S. enterica. To determine the differences between internal transcribed spacers in more detail, the individual rrn operons of strain JEO 197, serotype IV 43:z4,z23:-, were cloned and sequenced. The strain contained four short internal transcribed spacer fragments of 382-384 bases in length, which were 98.4-99.7% similar to each other and three long fragments of 505 bases with 98.0-99.8% similarity. The study demonstrated a higher degree of interbacterial variation than intrabacterial variation between operons for serotypes of S. enterica.

  14. Alignment of 700 globin sequences: extent of amino acid substitution and its correlation with variation in volume.

    PubMed Central

    Kapp, O. H.; Moens, L.; Vanfleteren, J.; Trotman, C. N.; Suzuki, T.; Vinogradov, S. N.

    1995-01-01

    Seven hundred globin sequences, including 146 nonvertebrate sequences, were aligned on the basis of conservation of secondary structure and the avoidance of gap penalties. Of the 182 positions needed to accommodate all the globin sequences, only 84 are common to all, including the absolutely conserved PheCD1 and HisF8. The mean number of amino acid substitutions per position ranges from 8 to 13 for all globins and 5 to 9 for internal positions. Although the total sequence volumes vary by approximately 2-3%, the variation in volume per position ranges from approximately 13% for the internal to approximately 21% for the surface positions. Plausible correlations exist between amino acid substitution and the variation in volume per position for the 84 common and the internal but not the surface positions. The amino acid substitution matrix derived from the 84 common positions was used to evaluate sequence similarity within the globins and between the globins and phycocyanins C and colicins A, via calculation of pairwise similarity scores. The scores for globin-globin comparisons over the 84 common positions overlap the globin-phycocyanin and globin-colicin scores, with the former being intermediate. For the subset of internal positions, overlap is minimal between the three groups of scores. These results imply a continuum of amino acid sequences able to assume the common three-on-three alpha-helical structure and suggest that the determinants of the latter include sites other than those inaccessible to solvent. PMID:8535255

  15. Sequence variation in mitochondrial cox1 and nad1 genes of ascaridoid nematodes in cats and dogs from Iran.

    PubMed

    Mikaeili, F; Mirhendi, H; Mohebali, M; Hosseini, M; Sharbatkhori, M; Zarei, Z; Kia, E B

    2015-07-01

    The study was conducted to determine the sequence variation in two mitochondrial genes, namely cytochrome c oxidase 1 (pcox1) and NADH dehydrogenase 1 (pnad1) within and among isolates of Toxocara cati, Toxocara canis and Toxascaris leonina. Genomic DNA was extracted from 32 isolates of T. cati, 9 isolates of T. canis and 19 isolates of T. leonina collected from cats and dogs in different geographical areas of Iran. Mitochondrial genes were amplified by polymerase chain reaction (PCR) and sequenced. Sequence data were aligned using the BioEdit software and compared with published sequences in GenBank. Phylogenetic analysis was performed using Bayesian inference and maximum likelihood methods. Based on pairwise comparison, intra-species genetic diversity within Iranian isolates of T. cati, T. canis and T. leonina amounted to 0-2.3%, 0-1.3% and 0-1.0% for pcox1 and 0-2.0%, 0-1.7% and 0-2.6% for pnad1, respectively. Inter-species sequence variation among the three ascaridoid nematodes was significantly higher, being 9.5-16.6% for pcox1 and 11.9-26.7% for pnad1. Sequence and phylogenetic analysis of the pcox1 and pnad1 genes indicated that there is significant genetic diversity within and among isolates of T. cati, T. canis and T. leonina from different areas of Iran, and these genes can be used for studying genetic variation of ascaridoid nematodes.
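
    The intra- and inter-species ranges quoted above come from pairwise comparison of aligned sequences. A minimal sketch of that kind of calculation follows, assuming an uncorrected percent difference that skips gapped columns; the abstract does not specify the distance measure, and the group names and toy sequences are hypothetical.

```python
from itertools import combinations, product


def percent_difference(a, b):
    """Uncorrected percent difference between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    mismatches = sum(x != y for x, y in compared)
    return 100.0 * mismatches / len(compared)


def difference_range(pairs):
    """Minimum and maximum percent difference over an iterable of sequence pairs."""
    values = [percent_difference(a, b) for a, b in pairs]
    return min(values), max(values)


# Toy aligned isolates (hypothetical data, not from the study).
toy = {
    "species_A": ["ACGTACGTAC", "ACGTACGTAT"],
    "species_B": ["ACCTTCGAAC", "ACCTTCGAAC"],
}
within_a = difference_range(combinations(toy["species_A"], 2))
between = difference_range(product(toy["species_A"], toy["species_B"]))
print("within species_A:", within_a)
print("between species:", between)
```

    Within-group ranges use all pairs inside one group, while between-group ranges use every cross-group pair, which mirrors the distinction between intra-species and inter-species variation in the abstract.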

  16. Phylogeography and population structure of the common warthog (Phacochoerus africanus) inferred from variation in mitochondrial DNA sequences and microsatellite loci.

    PubMed

    Muwanika, V B; Nyakaana, S; Siegismund, H R; Arctander, P

    2003-10-01

    Global climate fluctuated considerably throughout the Pliocene and Pleistocene, influencing the evolutionary history of a wide range of species. Using both mitochondrial sequences and microsatellites, we have investigated the evolutionary consequences of such environmental fluctuation for the patterns of genetic variation in the common warthog, sampled from 24 localities in Africa. In the sample of 181 individuals, 70 mitochondrial DNA haplotypes were identified and an overall nucleotide diversity of 4.0% was observed. The haplotypes cluster in three well-differentiated clades (estimated net sequence divergence of 3.1-6.6%) corresponding to the geographical origins of individuals (i.e. eastern, western and southern African clades). At the microsatellite loci, high polymorphism was observed both in the number of alleles per locus (6-21), and in the gene diversity (in each population 0.59-0.80). Analysis of population differentiation indicates greater subdivision at the mitochondrial loci (FST=0.85) than at nuclear loci (FST=0.20), but both mitochondrial and nuclear loci support the existence of the three warthog lineages. We interpret our results in terms of the large-scale climatic fluctuations of the Pleistocene.

  17. Genetic variation and evolutionary demography of Fenneropenaeus chinensis populations, as revealed by the analysis of mitochondrial control region sequences

    PubMed Central

    2010-01-01

    Genetic variation and evolutionary demography of the shrimp Fenneropenaeus chinensis were investigated using sequence data of the complete mitochondrial control region (CR). Fragments of 993 bp of the CR were sequenced for 93 individuals from five localities over most of the species' range in the Yellow Sea and the Bohai Sea. There were 84 variable sites defining 68 haplotypes. Haplotype diversity levels were very high (0.95 ± 0.03-0.99 ± 0.02) in F. chinensis populations, whereas those of nucleotide diversity were moderate to low (0.66 ± 0.36%-0.84 ± 0.46%). Analysis of molecular variance and conventional population statistics (FST) revealed no significant genetic structure throughout the range of F. chinensis. Mismatch distribution, estimates of population parameters and neutrality tests revealed that the significant fluctuations and shallow coalescence of mtDNA genealogies observed were coincident with estimated demographic parameters and neutrality tests, in implying important past-population size fluctuations or range expansion. Isolation with Migration (IM) coalescence results suggest that F. chinensis, distributed along the coasts of northern China and the Korean Peninsula (about 1000 km apart), diverged recently, the estimated time-split being 12,800 (7,400-18,600) years ago. PMID:21637498

  18. Comprehensive analysis of genetic variations in strictly-defined Leber congenital amaurosis with whole-exome sequencing in Chinese

    PubMed Central

    Wang, Shi-Yuan; Zhang, Qi; Zhang, Xiang; Zhao, Pei-Quan

    2016-01-01

    AIM: To comprehensively analyze the potential pathogenic genes related to Leber congenital amaurosis (LCA) in Chinese patients. METHODS: LCA subjects and their families were retrospectively collected from 2013 to 2015. First, whole-exome sequencing was performed in patients who had undergone gene mutation screening with nothing found; homozygous sites were then selected, candidate sites were annotated, and pathogenicity analysis was conducted using software including Sorting Tolerant from Intolerant (SIFT), Polyphen-2, Mutation assessor, Condel, and Functional Analysis through Hidden Markov Models (FATHMM). Furthermore, Gene Ontology function and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses of pathogenic genes were performed, followed by co-segregation analysis using Fisher's exact test. Sanger sequencing was used to validate single-nucleotide variations (SNVs). Expanded verification was performed in the remaining patients. RESULTS: In total, 51 LCA families with 53 patients and 24 family members were recruited. A total of 104 SNVs (66 LCA-related genes and 15 co-segregated genes) were submitted for expanded verification. Homozygous mutations of KRT12 and CYP1A1 were observed together in 3 families. Enrichment analysis showed that the potential pathogenic genes were mainly enriched in functions related to cell adhesion, biological adhesion, retinoid metabolic process, and eye development biological adhesion. Additionally, WFS1 and STAU2 had the highest homozygous frequencies. CONCLUSION: LCA is a highly heterogeneous disease. Mutations in KRT12, CYP1A1, WFS1, and STAU2 may be involved in the development of LCA. PMID:27672588

  19. An improved high throughput sequencing method for studying oomycete communities.

    PubMed

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-03-01

    Culture-independent studies using next generation sequencing have revolutionized microbial ecology; however, oomycete ecology in soils is severely lagging behind. The aim of this study was to improve and validate standard techniques for using high throughput sequencing as a tool for studying oomycete communities. The well-known primer sets ITS4, ITS6 and ITS7 were used in the study in a semi-nested PCR approach to target the internal transcribed spacer (ITS) 1 of ribosomal DNA in a next generation sequencing protocol. These primers have been used in similar studies before, but with limited success. We were able to increase the proportion of retrieved oomycete sequences dramatically, mainly by increasing the annealing temperature during PCR. The optimized protocol was validated using three mock communities, and the method was further evaluated using total DNA from 26 soil samples collected from different agricultural fields in Denmark, and 11 samples from carrot tissue with symptoms of Pythium infection. Sequence data from the Pythium and Phytophthora mock communities showed that our strategy successfully detected all included species. Taxonomic assignments of OTUs from the 26 soil samples showed that 95% of the sequences could be assigned to oomycetes including Pythium, Aphanomyces, Peronospora, Saprolegnia and Phytophthora. A high proportion of oomycete reads was consistently present in all 26 soil samples, showing the versatility of the strategy. A large diversity of Pythium species, including pathogenic and saprophytic species, dominated in cultivated soil. Finally, we analyzed amplicons from carrots with symptoms of cavity spot. This resulted in 94% of the reads belonging to oomycetes, with a dominance of Pythium species that are known to be involved in causing cavity spot, thus demonstrating the usefulness of the method not only in soil DNA but also in a plant DNA background. In conclusion, we demonstrate a successful approach for pyrosequencing of oomycete

  20. Metadata-driven comparative analysis tool for sequences (meta-CATS): an automated process for identifying significant sequence variations that correlate with virus attributes.

    PubMed

    Pickett, B E; Liu, M; Sadat, E L; Squires, R B; Noronha, J M; He, S; Jen, W; Zaremba, S; Gu, Z; Zhou, L; Larsen, C N; Bosch, I; Gehrke, L; McGee, M; Klem, E B; Scheuermann, R H

    2013-12-01

    The Virus Pathogen Resource (ViPR; www.viprbrc.org) and Influenza Research Database (IRD; www.fludb.org) have developed a metadata-driven Comparative Analysis Tool for Sequences (meta-CATS), which performs statistical comparative analyses of nucleotide and amino acid sequence data to identify correlations between sequence variations and virus attributes (metadata). Meta-CATS guides users through: selecting a set of nucleotide or protein sequences; dividing them into multiple groups based on any associated metadata attribute (e.g. isolation location, host species); performing a statistical test at each aligned position; and identifying all residues that significantly differ between the groups. As proofs of concept, we have used meta-CATS to identify sequence biomarkers associated with dengue viruses isolated from different hemispheres, and to identify variations in the NS1 protein that are unique to each of the 4 dengue serotypes. Meta-CATS is made freely available to virology researchers to identify genotype-phenotype correlations for development of improved vaccines, diagnostics, and therapeutics.
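
    The workflow described above (group aligned sequences by a metadata attribute, then test each aligned position) can be sketched as follows. This is not the meta-CATS implementation: it assumes a Pearson chi-square statistic as the per-position test, which the abstract does not specify, and the function name, group labels, and toy sequences are hypothetical. Only the statistic and degrees of freedom are computed; converting them to p-values would need a statistics library.

```python
from collections import Counter


def chi_square_by_position(groups):
    """Pearson chi-square statistic per variable alignment column.

    groups maps a metadata label to a list of aligned, equal-length sequences.
    Returns a list of (zero_based_position, statistic, degrees_of_freedom).
    """
    names = list(groups)
    if len(names) < 2 or any(not groups[g] for g in names):
        raise ValueError("need at least two non-empty groups")
    length = len(groups[names[0]][0])
    if any(len(seq) != length for g in names for seq in groups[g]):
        raise ValueError("all sequences must have the same aligned length")

    total = sum(len(groups[g]) for g in names)
    results = []
    for pos in range(length):
        counts = {g: Counter(seq[pos] for seq in groups[g]) for g in names}
        residues = sorted(set().union(*counts.values()))
        if len(residues) < 2:
            continue  # invariant column: nothing to test
        stat = 0.0
        for g in names:
            row_total = len(groups[g])
            for r in residues:
                col_total = sum(counts[h][r] for h in names)
                expected = row_total * col_total / total
                stat += (counts[g][r] - expected) ** 2 / expected
        dof = (len(names) - 1) * (len(residues) - 1)
        results.append((pos, stat, dof))
    return results


# Toy protein alignment split by a hypothetical metadata attribute.
toy_groups = {
    "north": ["MKTA", "MKTA", "MKSA", "MKTA"],
    "south": ["MRTA", "MRTA", "MRTA", "MKTA"],
}
results = chi_square_by_position(toy_groups)
for pos, stat, dof in results:
    print(f"position {pos}: chi2 = {stat:.3f}, dof = {dof}")
```

    With counts as small as in this toy example the chi-square approximation is unreliable; the sketch is meant only to show the shape of a per-position comparison between metadata-defined groups.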

  1. Metadata-driven Comparative Analysis Tool for Sequences (meta-CATS): an Automated Process for Identifying Significant Sequence Variations that Correlate with Virus Attributes

    PubMed Central

    Pickett, BE; Liu, M; Sadat, EL; Squires, RB; Noronha, JM; He, S; Jen, W; Zaremba, S; Gu, Z; Zhou, L; Larsen, CN; Bosch, I; Gehrke, L; McGee, M; Klem, EB; Scheuermann, RH

    2016-01-01

    The Virus Pathogen Resource (ViPR; www.viprbrc.org) and Influenza Research Database (IRD; www.fludb.org) have developed a metadata-driven Comparative Analysis Tool for Sequences (meta-CATS), which performs statistical comparative analyses of nucleotide and amino acid sequence data to identify correlations between sequence variations and virus attributes (metadata). Meta-CATS guides users through: selecting a set of nucleotide or protein sequences; dividing them into multiple groups based on any associated metadata attribute (e.g. isolation location, host species); performing a statistical test at each aligned position; and identifying all residues that significantly differ between the groups. As proofs of concept, we have used meta-CATS to identify sequence biomarkers associated with dengue viruses isolated from different hemispheres, and to identify variations in the NS1 protein that are unique to each of the 4 dengue serotypes. Meta-CATS is made freely available to virology researchers to identify genotype-phenotype correlations for development of improved vaccines, diagnostics, and therapeutics. PMID:24210098

  2. Whole genome re-sequencing reveals genome-wide variations among parental lines of 16 mapping populations in chickpea (Cicer arietinum L.).

    PubMed

    Thudi, Mahendar; Khan, Aamir W; Kumar, Vinay; Gaur, Pooran M; Katta, Krishnamohan; Garg, Vanika; Roorkiwal, Manish; Samineni, Srinivasan; Varshney, Rajeev K

    2016-01-27

    Chickpea (Cicer arietinum L.) is the second most important grain legume cultivated by resource-poor farmers in South Asia and Sub-Saharan Africa. In order to harness the untapped genetic potential available for chickpea improvement, we re-sequenced 35 chickpea genotypes representing parental lines of 16 mapping populations segregating for abiotic (drought, heat, salinity) and biotic (Fusarium wilt, Ascochyta blight, Botrytis grey mould, Helicoverpa armigera) stresses and nutritionally important (protein content) traits using a whole genome re-sequencing approach. A total of 192.19 Gb of data, comprising 973.13 million reads, was generated for the 35 genotypes, with an average sequencing depth of ~10X for each line. On average, 92.18% of reads from each genotype were aligned to the chickpea reference genome with 82.17% coverage. A total of 2,058,566 unique single nucleotide polymorphisms (SNPs) and 292,588 Indels were detected in comparison with the reference chickpea genome. The highest number of SNPs was identified on the Ca4 pseudomolecule. In addition, copy number variations (CNVs) such as gene deletions and duplications were identified across the chickpea parental genotypes, which were minimum in PI 489777 (1 gene deletion) and maximum in JG 74 (1,497). A total of 164,856 line-specific variations (144,888 SNPs and 19,968 Indels) with the highest percentage were identified in coding regions in ICC 1496 (21%) followed by ICCV 97105 (12%). Of 539 miscellaneous variations, 339, 138 and 62 were inter-chromosomal variations (CTX), intra-chromosomal variations (ITX) and inversions (INV), respectively. Genome-wide SNPs, Indels, CNVs, PAVs, and miscellaneous variations identified in different mapping populations are a valuable resource in genetic research and helpful in locating genes/genomic segments responsible for economically important traits. Further, the genome-wide variations identified in the present study can be used for developing high density SNP arrays for

  3. An Oligonucleotide Microarray for High-Throughput Sequencing of the Mitochondrial Genome

    PubMed Central

    Zhou, Shaoyu; Kassauei, Keyaunoosh; Cutler, David J.; Kennedy, Giulia C.; Sidransky, David; Maitra, Anirban; Califano, Joseph

    2006-01-01

    Previously we developed an oligonucleotide sequencing microarray (MitoChip) as an array-based sequencing platform for rapid and high-throughput analysis of mitochondrial DNA. The first generation MitoChip, however, was not tiled with probes for the noncoding D-loop region, a site frequently mutated in human cancers. Here we report the development of a second-generation MitoChip (v2.0) with oligonucleotide probes to sequence the entire mitochondrial genome. In addition, the MitoChip v2.0 contains redundant tiling of sequences for 500 of the most common haplotypes including single-nucleotide changes, insertions, and deletions. Sequencing results from 14 primary head and neck tumor tissues demonstrated that the v2.0 MitoChips detected a larger number of variants than the original version. Multiple coding region variants detected only in the second generation MitoChips, but not the earlier chip version, were further confirmed with conventional sequencing. Moreover, 31 variations in the noncoding region were identified using MitoChips v2.0. Replicate experiments demonstrated >99.99% reproducibility in the second generation MitoChip. In seven head and neck cancer samples with matched lymphocyte DNA, the MitoChip v2.0 detected at least one cancer-associated mitochondrial mutation in four (57%) samples. These results indicate that the second generation MitoChip is a high-throughput platform for identification of mitochondrial DNA mutations in primary tumors. PMID:16931588

  4. Binary interactions with high accretion rates onto main sequence stars

    NASA Astrophysics Data System (ADS)

    Shiber, Sagiv; Schreier, Ron; Soker, Noam

    2016-07-01

    Energetic outflows from main sequence stars accreting mass at very high rates might account for the powering of some eruptive objects, such as merging main sequence stars, major eruptions of luminous blue variables, e.g., the Great Eruption of Eta Carinae, and other intermediate luminosity optical transients (ILOTs; red novae; red transients). These powerful outflows could potentially also supply the extra energy required in the common envelope process and in the grazing envelope evolution of binary systems. We propose that a massive outflow/jets mediated by magnetic fields might remove energy and angular momentum from the accretion disk to allow such high accretion rate flows. By examining the possible activity of the magnetic fields of accretion disks, we conclude that indeed main sequence stars might accrete mass at very high rates, up to ≈ 10⁻² M⊙ yr⁻¹ for solar type stars, and up to ≈ 1 M⊙ yr⁻¹ for very massive stars. We speculate that magnetic fields amplified in such extreme conditions might lead to the formation of massive bipolar outflows that can remove most of the disk's energy and angular momentum. It is this energy and angular momentum removal that allows the very high mass accretion rate onto main sequence stars.

  5. Diversity and Variation of Bacterial Community Revealed by MiSeq Sequencing in Chinese Dark Teas

    PubMed Central

    Fu, Jianyu; Lv, Haipeng; Chen, Feng

    2016-01-01

    Chinese dark teas (CDTs) are now among the popular tea beverages worldwide due to their unique health benefits. Because the production of CDTs involves fermentation that is characterized by the effect of microbes, microorganisms are believed to play critical roles in the determination of the chemical characteristics of CDTs. Some dominant fungi have been identified from CDTs. In contrast, little, if anything, is known about the composition of the bacterial community in CDTs. This study set out to investigate the diversity and variation of the bacterial community in four major types of CDTs from China. First, the composition of the bacterial community of CDTs was determined using MiSeq sequencing. From the four typical CDTs, a total of 238 genera that belong to 128 families of bacteria were detected, including most of the families of beneficial bacteria known to be associated with fermented food. While different types of CDTs had generally distinct bacterial structures, the two types of brick teas produced from adjacent regions displayed strong similarity in bacterial composition, suggesting that the producing environment and processing conditions perhaps together influence bacterial succession in CDTs. The global characterization of bacterial communities in CDTs is an essential first step for us to understand their function in fermentation and their potential impact on human health. Such knowledge will be important guidance for improving the production of CDTs with higher quality and elevated health benefits. PMID:27690376

  6. Structural variation discovery in the cancer genome using next generation sequencing: Computational solutions and perspectives

    PubMed Central

    Liu, Biao; Conroy, Jeffrey M.; Morrison, Carl D.; Odunsi, Adekunle O.; Qin, Maochun; Wei, Lei; Trump, Donald L.; Johnson, Candace S.; Liu, Song; Wang, Jianmin

    2015-01-01

    Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by an NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide of relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome. PMID:25849937

  7. Structural variation discovery in the cancer genome using next generation sequencing: computational solutions and perspectives.

    PubMed

    Liu, Biao; Conroy, Jeffrey M; Morrison, Carl D; Odunsi, Adekunle O; Qin, Maochun; Wei, Lei; Trump, Donald L; Johnson, Candace S; Liu, Song; Wang, Jianmin

    2015-03-20

    Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by an NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide of relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome.

  8. Sequence and expression variation in SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1): homeolog evolution in Indian Brassicas.

    PubMed

    Sri, Tanu; Mayee, Pratiksha; Singh, Anandita

    2015-09-01

    Whole genome sequence analyses allow the evolutionary consequences of the meso-triplication event in Brassicaceae (∼14-20 million years ago (MYA)), such as differential gene fractionation and diversification in homeologous sub-genomes, to be unravelled. This study presents a simple gene-centric approach involving microsynteny and natural genetic variation analysis for understanding SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1) homeolog evolution in Brassica. Analysis of microsynteny in Brassica rapa homeologous regions containing SOC1 revealed differential gene fractionation correlating to the reported fractionation status of the sub-genomes of origin, viz. least fractionated (LF), moderately fractionated 1 (MF1) and most fractionated (MF2), respectively. Screening 18 cultivars of 6 Brassica species led to the identification of 8 genomic and 27 transcript variants of SOC1, including splice-forms. Co-occurrence of both interrupted and intronless SOC1 genes was detected in a few Brassica species. In silico analysis characterised Brassica SOC1 as a MADS intervening, K-box, C-terminal (MIKC(C)) transcription factor, with highly conserved MADS and I domains relative to the K-box and C-terminal domains. Phylogenetic analyses and multiple sequence alignments depicting a shared pattern of silent/non-silent mutations assigned Brassica SOC1 homologs into groups based on shared diploid base genome. In addition, a sub-genome structure in uncharacterised Brassica genomes was inferred. Expression analysis of putative MF2 and LF (Brassica diploid base genome A (AA)) sub-genome-specific SOC1 homeologs of Brassica juncea revealed a near identical expression pattern. However, the MF2-specific homeolog exhibited significantly higher expression, implying regulatory diversification. In conclusion, evidence for polyploidy-induced sequence and regulatory evolution in Brassica SOC1 is presented, wherein differential homeolog expression is implied in functional diversification.

  9. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Athavale, Ajay

    2012-06-01

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  10. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Athavale, Ajay [Monsanto]

    2016-07-12

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  11. Genome-wide copy number variation in the bovine genome detected using low coverage sequence of popular beef breeds

    USDA-ARS's Scientific Manuscript database

    Copy number variations (CNVs) are large insertions, deletions or duplications in the genome that vary between members of a species and are known to affect a wide variety of phenotypic traits. In this study, we identified CNVs in a population of bulls using low coverage next-generation sequence data....

  12. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    PubMed

    Rehm, Charlotte; Wurmthaler, Lena A; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S

    2015-01-01

    In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.

  13. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    PubMed Central

    Rehm, Charlotte; Wurmthaler, Lena A.; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S.

    2015-01-01

    In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1–5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6–9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria. PMID:26695179
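
    The repeat-number analysis described in the two records above can be illustrated with a short Python sketch. It is not the authors' pipeline: the function name `find_repeat_runs`, the regular expression, and the toy sequence are assumptions made for illustration, and only the given strand is scanned (a real genome scan would also need the reverse complement).

```python
import re

UNIT_LEN = 7  # length of one GGGNATC unit
# Two or more back-to-back GGGNATC units, where N is any of A, C, G, T.
RUN_PATTERN = re.compile(r"(?:GGG[ACGT]ATC){2,}")


def find_repeat_runs(sequence):
    """Return (start, unit_count) for each tandem run of >= 2 GGGNATC units."""
    runs = []
    for match in RUN_PATTERN.finditer(sequence.upper()):
        runs.append((match.start(), len(match.group()) // UNIT_LEN))
    return runs


# Hypothetical toy sequence: one run of four units and one run of two units.
toy = "TT" + "GGGAATC" * 4 + "CC" + "GGGTATCGGGCATC"
print(find_repeat_runs(toy))
```

    Counting how often each `unit_count` occurs across a genome would give the kind of repeat-number distribution in which the abstract reports a preference for four units.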

  14. Savant: genome browser for high-throughput sequencing data.

    PubMed

    Fiume, Marc; Williams, Vanessa; Brook, Andrew; Brudno, Michael

    2010-08-15

    The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals' genomes. Simultaneously the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets. We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations. Savant is freely available at http://compbio.cs.toronto.edu/savant.

  15. Variation in the Kozak sequence of WNT16 results in an increased translation and is associated with osteoporosis related parameters.

    PubMed

    Hendrickx, Gretl; Boudin, Eveline; Fijałkowski, Igor; Nielsen, Torben Leo; Andersen, Marianne; Brixen, Kim; Van Hul, Wim

    2014-02-01

    The importance of WNT16 in the regulation of bone metabolism was recently confirmed by several genome-wide association studies and by a Wnt16 (Wnt16(-/-)) knockout mouse model. The aim of this study was thus to replicate and further elucidate the effect of common genetic variation in WNT16 on osteoporosis related parameters. Hereto, we performed a WNT16 candidate gene association study in a population of healthy Caucasian men from the Odense Androgen Study (OAS). Using HapMap, five tagSNPs and one multimarker test were selected for genotyping to cover most of the common genetic variation in and around WNT16 (MAF>5%). This study confirmed previously reported associations for rs3801387 and rs2707466 with bone mineral density (BMD) at several sites. Furthermore, we additionally demonstrated that rs2908007 is strongly associated with BMD at several sites in the young, elderly and complete OAS population. The observed effect of these three associated SNPs on the respective phenotypes is comparable and we can conclude that the presence of the minor allele results in an increase in BMD. Additionally, we performed re-sequencing of WNT16 on two cohorts selected from the young OAS cohort, based on their extreme BMD values. On this basis, rs55710688 was selected for an in vitro translation experiment since it is located in the Kozak sequence of WNT16a. We observed an increased translation efficiency and thus a higher amount of WNT16a for the Kozak sequence that was significantly more prevalent in the high BMD cohort. This observation is in line with the results of the Wnt16(-/-) mice. Finally, a WNT luciferase reporter assay was performed and showed no activation of the β-catenin dependent pathway by Wnt16. We did detect a dose-dependent inhibitory effect of Wnt16 on WNT1 activation of this canonical WNT pathway. Increased translation of WNT16 can thus lead to an increased inhibitory action of WNT16 on canonical WNT signaling. This statement is in contrast with the known

  16. De novo sequencing of highly modified therapeutic oligonucleotides by hydrophobic tag sequencing coupled with LC-MS.

    PubMed

    Goto, R; Miyakawa, S; Inomata, E; Takami, T; Yamaura, J; Nakamura, Y

    2017-02-01

    Correct sequences are prerequisite for quality control of therapeutic oligonucleotides. However, there is no definitive method available for determining sequences of highly modified therapeutic RNAs, and thereby, most of the oligonucleotides have been used clinically without direct sequence determination. In this study, we developed a novel sequencing method called 'hydrophobic tag sequencing'. Highly modified oligonucleotides are sequenced by partially digesting oligonucleotides conjugated with a 5'-hydrophobic tag, followed by liquid chromatography-mass spectrometry analysis. 5'-Hydrophobic tag-printed fragments (5'-tag degradates) can be separated in order of their molecular masses from tag-free oligonucleotides by reversed-phase liquid chromatography. As models for the sequencing, the anti-VEGF aptamer (Macugen) and the highly modified 38-mer RNA sequences were analyzed under blind conditions. Most nucleotides were identified from the molecular weight of hydrophobic 5'-tag degradates calculated from monoisotopic mass in simple full mass data. When monoisotopic mass could not be assigned, the nucleotide was estimated using the molecular weight of the most abundant mass. The sequences of Macugen and 38-mer RNA perfectly matched the theoretical sequences. The hydrophobic tag sequencing worked well to obtain simple full mass data, resulting in accurate and clear sequencing. The present study provides for the first time a de novo sequencing technology for highly modified RNAs and contributes to quality control of therapeutic oligonucleotides. Copyright © 2016 John Wiley & Sons, Ltd.

  17. Microsatellites and 16S sequences corroborate phenotypic evidence of trans-Andean variation in the parasitoid Microctonus hyperodae (Hymenoptera: Braconidae).

    PubMed

    Winder, L M; Phillips, C B; Lenney-Williams, C; Cane, R P; Paterson, K; Vink, C J; Goldson, S L

    2005-08-01

    Eight geographical populations of the parasitoid Microctonus hyperodae Loan were collected in South America (Argentina, Brazil, Chile and Uruguay) and released in New Zealand for biological control of the weevil Listronotus bonariensis (Kuschel), a pest of pasture grasses and cereals. DNA sequencing (16S, COI, 28S, ITS1, beta-tubulin), RAPD, AFLP, microsatellite, SSCP and RFLP analyses were used to seek markers for discriminating between the South American populations. All of the South American populations were more homogeneous than expected. However, variation in microsatellites and 16S gene sequences corroborated morphological, allozyme and other phenotypic evidence of trans-Andes variation between the populations. The Chilean populations were the most genetically variable, while the variation present on the eastern side of the Andes mountains was a subset of that observed in Chile.

  18. De novo structure prediction of globular proteins aided by sequence variation-derived contacts.

    PubMed

    Kosciolek, Tomasz; Jones, David T

    2014-01-01

    The advent of high accuracy residue-residue intra-protein contact prediction methods enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm, FRAGFOLD, with PSICOV, a contact prediction method which uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 diverse globular target proteins, up to 266 amino acids in length, we are able to address the effectiveness and some limitations of such approaches to globular proteins in practice. Overall we find that using fragment assembly with both statistical potentials and predicted contacts is significantly better than either statistical potentials or contacts alone. Results show up to nearly 80% of correct predictions (TM-score ≥0.5) within the analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases emerged either from conformational sampling problems, or insufficient contact prediction accuracy. Nevertheless, a strong dependency of the quality of final models on the fraction of satisfied predicted long-range contacts was observed. This not only highlights the importance of these contacts on determining the protein fold, but also (combined with other ensemble-derived qualities) provides a powerful guide as to the choice of correct models and the global quality of the selected model. A proposed quality assessment scoring function achieves 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys. These findings suggest the approach is well-suited for blind predictions on a variety of globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step.

  19. De Novo Structure Prediction of Globular Proteins Aided by Sequence Variation-Derived Contacts

    PubMed Central

    Kosciolek, Tomasz; Jones, David T.

    2014-01-01

    The advent of high accuracy residue-residue intra-protein contact prediction methods enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm, FRAGFOLD, with PSICOV, a contact prediction method which uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 diverse globular target proteins, up to 266 amino acids in length, we are able to address the effectiveness and some limitations of such approaches to globular proteins in practice. Overall we find that using fragment assembly with both statistical potentials and predicted contacts is significantly better than either statistical potentials or contacts alone. Results show up to nearly 80% of correct predictions (TM-score ≥0.5) within the analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases emerged either from conformational sampling problems, or insufficient contact prediction accuracy. Nevertheless, a strong dependency of the quality of final models on the fraction of satisfied predicted long-range contacts was observed. This not only highlights the importance of these contacts on determining the protein fold, but also (combined with other ensemble-derived qualities) provides a powerful guide as to the choice of correct models and the global quality of the selected model. A proposed quality assessment scoring function achieves 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys. These findings suggest the approach is well-suited for blind predictions on a variety of globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step. PMID:24637808
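
    The precision and recall figures quoted in the two records above follow the standard definitions, with a model counted as a correct fold when its TM-score is at least 0.5. The Python sketch below shows only that arithmetic; the function name `precision_recall`, the input layout, and the example values are assumptions for illustration and are not taken from the study.

```python
TM_THRESHOLD = 0.5  # a model counts as a correct fold when TM-score >= 0.5


def precision_recall(models, threshold=TM_THRESHOLD):
    """Compute (precision, recall) for a quality-assessment classifier.

    models: iterable of (tm_score, predicted_correct) pairs, where
    predicted_correct is the classifier's call for that model.
    """
    tp = fp = fn = 0
    for tm_score, predicted_correct in models:
        is_correct = tm_score >= threshold
        if predicted_correct and is_correct:
            tp += 1  # called correct, and it is
        elif predicted_correct:
            fp += 1  # called correct, but TM-score is below threshold
        elif is_correct:
            fn += 1  # a correct fold the classifier missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical decoys: (TM-score, classifier says "correct fold").
decoys = [(0.62, True), (0.55, True), (0.41, True), (0.70, False), (0.30, False)]
precision, recall = precision_recall(decoys)
```

    In this toy set two of the three models called correct really are correct, and two of the three truly correct models are recovered, so both values come out at two thirds.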

  20. Optical transitions in highly charged californium ions with high sensitivity to variation of the fine-structure constant.

    PubMed

    Berengut, J C; Dzuba, V A; Flambaum, V V; Ong, A

    2012-08-17

    We study electronic transitions in highly charged Cf ions that are within the frequency range of optical lasers and have very high sensitivity to potential variations in the fine-structure constant, α. The transitions are in the optical range despite the large ionization energies because they lie on the level crossing of the 5f and 6p valence orbitals in the thallium isoelectronic sequence. Cf(16+) is a particularly rich ion, having several narrow lines with properties that minimize certain systematic effects. Cf(16+) has very large nuclear charge and large ionization energy, resulting in the largest α sensitivity seen in atomic systems. The lines include positive and negative shifters.

  1. Optical Transitions in Highly Charged Californium Ions with High Sensitivity to Variation of the Fine-Structure Constant

    NASA Astrophysics Data System (ADS)

    Berengut, J. C.; Dzuba, V. A.; Flambaum, V. V.; Ong, A.

    2012-08-01

    We study electronic transitions in highly charged Cf ions that are within the frequency range of optical lasers and have very high sensitivity to potential variations in the fine-structure constant, α. The transitions are in the optical range despite the large ionization energies because they lie on the level crossing of the 5f and 6p valence orbitals in the thallium isoelectronic sequence. Cf16+ is a particularly rich ion, having several narrow lines with properties that minimize certain systematic effects. Cf16+ has very large nuclear charge and large ionization energy, resulting in the largest α sensitivity seen in atomic systems. The lines include positive and negative shifters.

  2. Epilepsy-causing sequence variations in SIK1 disrupt synaptic activity response gene expression and affect neuronal morphology.

    PubMed

    Pröschel, Christoph; Hansen, Jeanne N; Ali, Adil; Tuttle, Emily; Lacagnina, Michelle; Buscaglia, Georgia; Halterman, Marc W; Paciorkowski, Alex R

    2017-02-01

    SIK1 syndrome is a newly described developmental epilepsy disorder caused by heterozygous mutations in the salt-inducible kinase SIK1. To better understand the pathophysiology of SIK1 syndrome, we studied the effects of SIK1 pathogenic sequence variations in human neurons. Primary human fetal cortical neurons were transfected with a lentiviral vector to overexpress wild-type and mutant SIK1 protein. We evaluated the transcriptional activity of known downstream gene targets in neurons expressing mutant SIK1 compared with wild type. We then assayed neuronal morphology by measuring neurite length, number and branching. Truncating SIK1 sequence variations were associated with abnormal MEF2C transcriptional activity and decreased MEF2C protein levels. Epilepsy-causing SIK1 sequence variations were associated with significantly decreased expression of ARC (activity-regulated cytoskeletal-associated) and other synaptic activity response element genes. Assay of mRNA levels for the other MEF2C target genes NR4A1 (Nur77) and NRG1 found significantly decreased expression of these genes as well. The missense p.(Pro287Thr) SIK1 sequence variation was associated with abnormal neuronal morphology, with significant decreases in mean neurite length, mean number of neurites and a significant increase in proximal branches compared with wild type. Epilepsy-causing SIK1 sequence variations resulted in abnormalities in the MEF2C-ARC pathway of neuronal development and synapse activity response. This work provides the first insights into the mechanisms of pathogenesis in SIK1 syndrome, and extends the ARX-MEF2C pathway in the pathogenesis of developmental epilepsy.

  3. High-throughput sequencing of core STR loci for forensic genetic investigations using the Roche Genome Sequencer FLX platform.

    PubMed

    Fordyce, Sarah L; Ávila-Arcos, Maria C; Rockenbauer, Eszter; Børsting, Claus; Frank-Hansen, Rune; Petersen, Frederik Torp; Willerslev, Eske; Hansen, Anders J; Morling, Niels; Gilbert, M Thomas P

    2011-08-01

    The analysis and profiling of short tandem repeat (STR) loci is routinely used in forensic genetics. Current methods to investigate STR loci, including PCR-based standard fragment analyses and capillary electrophoresis, only provide amplicon lengths that are used to estimate the number of STR repeat units. These methods do not allow for the full resolution of STR base composition that sequencing approaches could provide. Here we present an STR profiling method based on the use of the Roche Genome Sequencer (GS) FLX to simultaneously sequence multiple core STR loci. Using this method in combination with a bioinformatic tool designed specifically to analyze sequence lengths and frequencies, we found that GS FLX STR sequence data are comparable to conventional capillary electrophoresis-based STR typing. Furthermore, we found DNA base substitutions and repeat sequence variations that would not have been identified using conventional STR typing.
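
    The contrast drawn above between length-based and sequence-based STR typing can be sketched in a few lines of Python. This is not the bioinformatic tool mentioned in the abstract: the function names `tally_alleles` and `same_length_variants` and the toy reads are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def tally_alleles(reads):
    """Group reads by length, then count each distinct sequence per length."""
    by_length = defaultdict(Counter)
    for read in reads:
        by_length[len(read)][read] += 1
    return {length: dict(counts) for length, counts in by_length.items()}


def same_length_variants(tally):
    """Lengths at which more than one distinct sequence was observed.

    Fragment-length typing would report these as a single allele.
    """
    return sorted(length for length, counts in tally.items() if len(counts) > 1)


# Hypothetical reads from one locus: two share a length but differ by one base.
reads = ["TCTATCTATCTA", "TCTATCTGTCTA", "TCTATCTATCTA", "TCTATCTA"]
tally = tally_alleles(reads)
hidden = same_length_variants(tally)
```

    Here the two 12-base sequences would be indistinguishable by amplicon length alone, which is the kind of base substitution the abstract says conventional typing would miss.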

  4. Compression of Structured High-Throughput Sequencing Data

    PubMed Central

    Campagne, Fabien; Dorff, Kevin C.; Chambwe, Nyasha; Robinson, James T.; Mesirov, Jill P.

    2013-01-01

    Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution), or fail to provide state of the art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% the size of a BAM file with perfect data fidelity. Compared to the previous compression state of the art, these methods reduce dataset size more than 40% when storing exome, gene expression or DNA methylation datasets. The approaches have been integrated in a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays. PMID:24260313

  5. Minimal intraspecific variation in the sequence of the transcribed spacer regions of the ribosomal DNA of lake trout (Salvelinus namaycush).

    PubMed

    Zhuo, L; Sajdak, S L; Phillips, R B

    1994-08-01

    Intraspecific variation in the sequence of the transcribed spacer regions of the ribosomal DNA (rDNA) in lake trout was examined by restriction mapping and sequencing of these regions amplified by the polymerase chain reaction. The length of the first internal transcribed spacer region (ITS-1) was 566 bases and the second internal transcribed spacer region (ITS-2) was 368 bases in lake trout. When the 1.4-kb region including the ITS-1, the 5.8S coding region, and the ITS-2 was amplified from 12 individuals from four populations and digested with eight different enzymes only one intraindividual polymorphism was found that occurred in each population. When the amplified ITS-1 region was sequenced from an additional 10 individuals from five populations, no interindividual variation was found in the sequence. A 6-kb portion of the rDNA repeat unit including 1.6 kb of the 18S coding region, the 5' external spacer region (5' ETS), and part of the adjacent intergenic spacer was cloned and a restriction map was prepared for these regions in lake trout. No intraspecific variation was found in the region adjacent to the 18S rDNA, which includes the 5' ETS, although intraspecific and intraindividual length variation was found in the intergenic spacer region 3-6 kb from the 18S. Sequencing of a 609-b segment of the 5' ETS adjacent to the 18S coding region revealed the presence of two 41-b repeats. The 198-b sequence between the repeats had some similarity to the 18S coding region of other fishes. Primers were designed for amplification of 559 b of the 5' ETS using the polymerase chain reaction.(ABSTRACT TRUNCATED AT 250 WORDS)

  6. Color differences among feral pigeons (Columba livia) are not attributable to sequence variation in the coding region of the melanocortin-1 receptor gene (MC1R)

    PubMed Central

    2013-01-01

    Background Genetic variation at the melanocortin-1 receptor (MC1R) gene is correlated with melanin color variation in many birds. Feral pigeons (Columba livia) show two major melanin-based colorations: a red coloration due to pheomelanic pigment and a black coloration due to eumelanic pigment. Furthermore, within each color type, feral pigeons display continuous variation in the amount of melanin pigment present in the feathers, with individuals varying from pure white to a full dark melanic color. Coloration is highly heritable and it has been suggested that it is under natural or sexual selection, or both. Our objective was to investigate whether MC1R allelic variants are associated with plumage color in feral pigeons. Findings We sequenced 888 bp of the coding sequence of MC1R among pigeons varying both in the type, eumelanin or pheomelanin, and the amount of melanin in their feathers. We detected 10 non-synonymous substitutions and 2 synonymous substitutions, but none of them were associated with a plumage type. It remains possible that non-synonymous substitutions that influence coloration are present in the short MC1R fragment that we did not sequence, but this seems unlikely because we analyzed the entire functionally important region of the gene. Conclusions Our results show that color differences among feral pigeons are probably not attributable to amino acid variation at the MC1R locus. Therefore, variation in regulatory regions of MC1R or variation in other genes may be responsible for the color polymorphism of feral pigeons. PMID:23915680

  7. High precision variational calculations of few-electron atoms

    NASA Astrophysics Data System (ADS)

    Bubin, Sergiy

    2015-05-01

    High precision calculations of energy levels and other properties of small atoms and ions have been a subject of fruitful interplay between experiment and theory. However, most calculations of spectroscopic accuracy, until recently, have been possible only for two- and three-electron systems. In this talk I will report on progress toward performing high accuracy calculations of larger atomic systems (up to four-six electrons). The results of benchmark quality are attainable with the use of variational expansions in terms of all-particle explicitly correlated Gaussians, whose nonlinear variational parameters are extensively optimized. I will demonstrate what level of accuracy is available today for few-electron atoms and discuss the issues that must be overcome in order to extend the capability of the method to even larger systems. This work has been supported by the Ministry of Education and Science of Kazakhstan.

  8. Extreme size and sequence variation in the ITS rDNA of Bremia lactucae.

    PubMed

    Choi, Young-Joon; Hong, Seung-Boem; Shin, Hyeon-Dong

    2007-02-01

    Bremia lactucae Regel (Chromista, Peronosporaceae) is an economically destructive pathogen, which causes downy mildew disease on lettuce (Lactuca sativa L.) worldwide. The ribosomal internal transcribed spacer (ITS) of Bremia lactucae isolates was analyzed for the first time. The ITS region of lettuce downy mildew was observed to have a size of 2458 bp, making it one of the longest ITS regions recorded to date. Most of the extremely large ITS2 length of 2086 bp was attributed to the additional presence of nine repetitive elements with lengths of 179-194 bp, which shared a low homology of 48-69% with one another. Comparison of the ITS2 sequences with the B. lactucae isolates from other host plants showed that isolates present on Lactuca sativa were distinct from those on L. indica var. laciniata, as well as Hemistepta and Youngia. We suggest the high degree of sequence heterogeneity exhibited in the ITS2 region of B. lactucae may warrant the specific detection and diagnosis of this destructive pathogen or its division into several distinct species.

  9. Cross-amplification and sequence variation of microsatellite loci in Eurasian hard pines.

    PubMed

    González-Martínez, S C; Robledo-Arnuncio, J J; Collada, C; Díaz, A; Williams, C G; Alía, R; Cervera, M T

    2004-06-01

    Microsatellite transfer across coniferous species is a valued methodology because de novo development for each species is costly and there are many species with only a limited commodity value. Cross-species amplification of orthologous microsatellite regions provides valuable information on mutational and evolutionary processes affecting these loci. We tested 19 nuclear microsatellite markers from Pinus taeda L. (subsection Australes) and three from P. sylvestris L. (subsection Pinus) on seven Eurasian hard pine species ( P. uncinata Ram., P. sylvestris L., P. nigra Arn., P. pinaster Ait., P. halepensis Mill., P. pinea L. and P. canariensis Sm.). Transfer rates to species in subsection Pinus (36-59%) were slightly higher than those to subsections Pineae and Pinaster (32-45%). Half of the trans-specific microsatellites were found to be polymorphic over evolutionary times of approximately 100 million years (ten million generations). Sequencing of three trans-specific microsatellites showed conserved repeat and flanking regions. Both a decrease in the number of perfect repeats in the non-focal species and a polarity for mutation, the latter defined as a higher substitution rate in the flanking sequence regions close to the repeat motifs, were observed in the trans-specific microsatellites. The transfer of microsatellites among hard pine species proved to be useful for obtaining highly polymorphic markers in a wide range of species, thereby providing new tools for population and quantitative genetic studies.

  10. Transcriptome-wide comparison of sequence variation in divergent ecotypes of kokanee salmon

    PubMed Central

    2013-01-01

    Background High throughput next-generation sequencing technology has enabled the collection of genome-wide sequence data and revolutionized single nucleotide polymorphism (SNP) discovery in a broad range of species. When analyzed within a population genomics framework, SNP-based genotypic data may be used to investigate questions of evolutionary, ecological, and conservation significance in natural populations of non-model organisms. Kokanee salmon are recently diverged freshwater populations of sockeye salmon (Oncorhynchus nerka) that exhibit reproductive ecotypes (stream-spawning and shore-spawning) in lakes throughout western North America and northeast Asia. Current conservation and management strategies may treat these ecotypes as discrete stocks, however their recent divergence and low levels of gene flow make in-season genetic stock identification a challenge. The development of genome-wide SNP markers is an essential step towards fine-scale stock identification, and may enable a direct investigation of the genetic basis of ecotype divergence. Results We used pooled cDNA samples from both ecotypes of kokanee to generate 750 million base pairs of transcriptome sequence data. These raw data were assembled into 11,074 high coverage contigs from which we identified 32,699 novel single nucleotide polymorphisms. A subset of these putative SNPs was validated using high-resolution melt analysis and Sanger resequencing to genotype independent samples of kokanee and anadromous sockeye salmon. We also identified a number of contigs that were composed entirely of reads from a single ecotype, which may indicate regions of differential gene expression between the two reproductive ecotypes. In addition, we found some evidence for greater pathogen load among the kokanee sampled in stream-spawning habitats, suggesting a possible evolutionary advantage to shore-spawning that warrants further study. 
    Conclusions This study provides novel genomic resources to support population

  11. High mitochondrial sequence diversity in linguistic isolates of the Alps.

    PubMed Central

    Stenico, M.; Nigro, L.; Bertorelle, G.; Calafell, F.; Capitanio, M.; Corrain, C.; Barbujani, G.

    1996-01-01

    Segment I of the control region of mtDNA (360 bases) was sequenced in seven samples, each of 10 individuals inhabiting villages in the eastern Italian Alps (South Tyrol and Trentino). Three linguistic groups, German, Italian, and Ladin, were represented by two samples each; the seventh sample comes from an isolated group of German origin, the Mocheni, who are linguistically distinct and geographically separated from the bulk of the German speakers. Seventy-four polymorphic sites were identified, defining 63 different haplotypes. Mocheni and Ladin speakers tend to form two clusters in the evolutionary trees inferred from sequences. Analysis of molecular variance shows significant differentiation within samples, among them, and among linguistic groups. Genetic differences between the Ladins and the other groups are not much smaller than between Europeans and some Africans; variation is large within groups, as well, with the exception of only the Mocheni. In the evolutionary trees where the four alpine groups are compared with other European populations, Mocheni and especially Ladins appear as clear outliers. Romansch-speaking Swiss, who are linguistically related to Ladins, are not genetically similar to them, for this segment of DNA. Because the time elapsed since colonization of the Alps (≤ 12,000 years) is short in mutational terms, the only model accounting for the observed relationships between mtDNA variation and linguistic identity seems to be one in which a population ancestral to Ladin speakers was already differentiated long before the Alps were settled and the current linguistic affiliations were established. For the Mocheni, the results are consistent with a simpler episode of allele loss, from an original genetic pool common to the ancestors of the current German speakers. PMID:8940282

  12. Compressive biological sequence analysis and archival in the era of high-throughput sequencing technologies.

    PubMed

    Giancarlo, Raffaele; Rombo, Simona E; Utro, Filippo

    2014-05-01

    High-throughput sequencing technologies produce large collections of data, mainly DNA sequences with additional information, requiring the design of efficient and effective methodologies for both their compression and storage. In this context, we first provide a classification of the main techniques that have been proposed, according to three specific research directions that have emerged from the literature and, for each, we provide an overview of the current techniques. Finally, to make this review useful to researchers and technicians applying the existing software and tools, we include a synopsis of the main characteristics of the described approaches, including details on their implementation and availability. Performance of the various methods is also highlighted, although the state of the art does not lend itself to a consistent and coherent comparison among all the methods presented here.

  13. Sequence variation of Bemisia tabaci Chemosensory Protein 2 in cryptic species B and Q: New DNA markers for whitefly recognition.

    PubMed

    Liu, Guo-Xia; Ma, Hong-Mei; Xie, Hong-Yan; Xuan, Ning; Picimbon, Jean-François

    2016-01-15

    Bemisia tabaci Gennadius biotypes B and Q are two of the most important worldwide agricultural insect pests. Genomic sequences of Type-2 B. tabaci chemosensory protein (BtabCSP2) were cloned and sequenced in B and Q biotypes, revealing key biotype-specific variations in the intron sequence. A Q260 sequence was found specifically in Q-BtabCSP2 and Cucumis melo LN692399, suggesting ancestral horizontal gene transfer between the insect and the plant through bacteria. A cleaved amplified polymorphic sequences (CAPS) method was then developed to differentiate B and Q based on the sequence variation in the exon of the BtabCSP2 gene. The performance of CSP2-based CAPS for whitefly recognition was assessed using B. tabaci field collections from Shandong Province (P.R. China). Our SacII-based CAPS method gave the same results as the mitochondrial cytochrome oxidase-based CAPS method in the field collections. We therefore propose an explanation for CSP origin and a new, rapid, and simple molecular method based on genomic DNA and a chemosensory gene to accurately differentiate the B and Q whiteflies of the Bemisia complex around the world.

  14. Identification of conserved genomic regions and variation therein amongst Cetartiodactyla species using next generation sequencing

    USDA-ARS's Scientific Manuscript database

    Background Next Generation Sequencing has created an opportunity to genetically characterize an individual both inexpensively and comprehensively. In earlier work produced in our collaboration [1], it was demonstrated that, for animals without a reference genome, their Next Generation Sequence data ...

  15. MEGARes: an antimicrobial resistance database for high throughput sequencing

    PubMed Central

    Lakin, Steven M.; Dean, Chris; Noyes, Noelle R.; Dettenwanger, Adam; Ross, Anne Spencer; Doster, Enrique; Rovira, Pablo; Abdo, Zaid; Jones, Kenneth L.; Ruiz, Jaime; Belk, Keith E.; Morley, Paul S.; Boucher, Christina

    2017-01-01

    Antimicrobial resistance has become an imminent concern for public health. As methods for detection and characterization of antimicrobial resistance move from targeted culture and polymerase chain reaction to high throughput metagenomics, appropriate resources for the analysis of large-scale data are required. Currently, antimicrobial resistance databases are tailored to smaller-scale, functional profiling of genes using highly descriptive annotations. Such characteristics do not facilitate the analysis of large-scale, ecological sequence datasets such as those produced with the use of metagenomics for surveillance. In order to overcome these limitations, we present MEGARes (https://megares.meglab.org), a hand-curated antimicrobial resistance database and annotation structure that provides a foundation for the development of high throughput acyclical classifiers and hierarchical statistical analysis of big data. MEGARes can be browsed as a stand-alone resource through the website or can be easily integrated into sequence analysis pipelines through download. Also via the website, we provide documentation for AmrPlusPlus, a user-friendly Galaxy pipeline for the analysis of high throughput sequencing data that is pre-packaged for use with the MEGARes database. PMID:27899569

  16. MEGARes: an antimicrobial resistance database for high throughput sequencing.

    PubMed

    Lakin, Steven M; Dean, Chris; Noyes, Noelle R; Dettenwanger, Adam; Ross, Anne Spencer; Doster, Enrique; Rovira, Pablo; Abdo, Zaid; Jones, Kenneth L; Ruiz, Jaime; Belk, Keith E; Morley, Paul S; Boucher, Christina

    2017-01-04

    Antimicrobial resistance has become an imminent concern for public health. As methods for detection and characterization of antimicrobial resistance move from targeted culture and polymerase chain reaction to high throughput metagenomics, appropriate resources for the analysis of large-scale data are required. Currently, antimicrobial resistance databases are tailored to smaller-scale, functional profiling of genes using highly descriptive annotations. Such characteristics do not facilitate the analysis of large-scale, ecological sequence datasets such as those produced with the use of metagenomics for surveillance. In order to overcome these limitations, we present MEGARes (https://megares.meglab.org), a hand-curated antimicrobial resistance database and annotation structure that provides a foundation for the development of high throughput acyclical classifiers and hierarchical statistical analysis of big data. MEGARes can be browsed as a stand-alone resource through the website or can be easily integrated into sequence analysis pipelines through download. Also via the website, we provide documentation for AmrPlusPlus, a user-friendly Galaxy pipeline for the analysis of high throughput sequencing data that is pre-packaged for use with the MEGARes database.

  17. Modelling Human Regulatory Variation in Mouse: Finding the Function in Genome-Wide Association Studies and Whole-Genome Sequencing

    PubMed Central

    Schmouth, Jean-François; Bonaguro, Russell J.; Corso-Diaz, Ximena; Simpson, Elizabeth M.

    2012-01-01

    An increasing body of literature from genome-wide association studies and human whole-genome sequencing highlights the identification of large numbers of candidate regulatory variants of potential therapeutic interest in numerous diseases. Our relatively poor understanding of the functions of non-coding genomic sequence, and the slow and laborious process of experimental validation of the functional significance of human regulatory variants, limits our ability to fully benefit from this information in our efforts to comprehend human disease. Humanized mouse models (HuMMs), in which human genes are introduced into the mouse, suggest an approach to this problem. In the past, HuMMs have been used successfully to study human disease variants; e.g., the complex genetic condition arising from Down syndrome, common monogenic disorders such as Huntington disease and β-thalassemia, and cancer susceptibility genes such as BRCA1. In this commentary, we highlight a novel method for high-throughput single-copy site-specific generation of HuMMs entitled High-throughput Human Genes on the X Chromosome (HuGX). This method can be applied to most human genes for which a bacterial artificial chromosome (BAC) construct can be derived and a mouse-null allele exists. This strategy comprises (1) the use of recombineering technology to create a human variant–harbouring BAC, (2) knock-in of this BAC into the mouse genome using Hprt docking technology, and (3) allele comparison by interspecies complementation. We demonstrate the throughput of the HuGX method by generating a series of seven different alleles for the human NR2E1 gene at Hprt. In future challenges, we consider the current limitations of experimental approaches and call for a concerted effort by the genetics community, for both human and mouse, to solve the challenge of the functional analysis of human regulatory variation. PMID:22396661

  18. Fin whale MDH-1 and MPI allozyme variation is not reflected in the corresponding DNA sequences

    PubMed Central

    Olsen, Morten Tange; Pampoulie, Christophe; Daníelsdóttir, Anna K; Lidh, Emmelie; Bérubé, Martine; Víkingsson, Gísli A; Palsbøll, Per J

    2014-01-01

    The appeal of genetic inference methods to assess population genetic structure and guide management efforts is grounded in the correlation between the genetic similarity and gene flow among populations. Effects of such gene flow are typically genomewide; however, some loci may appear as outliers, displaying above or below average genetic divergence relative to the genomewide level. Above-average population genetic divergence may be due to divergent selection as a result of local adaptation. Consequently, substantial efforts have been directed toward such outlying loci in order to identify traits subject to local adaptation. Here, we report the results of an investigation into the molecular basis of the substantial degree of genetic divergence previously reported at allozyme loci among North Atlantic fin whale (Balaenoptera physalus) populations. We sequenced the exons encoding the two most divergent allozyme loci (MDH-1 and MPI) and failed to detect any nonsynonymous substitutions. Following extensive error checking and analysis of additional bioinformatic and morphological data, we hypothesize that the observed allozyme polymorphisms may reflect phenotypic plasticity at the cellular level, perhaps as a response to nutritional stress. While such plasticity is intriguing in itself, and of fundamental evolutionary interest, our key finding is that the observed allozyme variation does not appear to be a result of genetic drift, migration, or selection on the MDH-1 and MPI exons themselves, stressing the importance of interpreting allozyme data with caution. As for North Atlantic fin whale population structure, our findings support the low levels of differentiation found in previous analyses of DNA nucleotide loci. PMID:24963377

  19. Sequence variations in the collagen IX and XI genes are associated with degenerative lumbar spinal stenosis

    PubMed Central

    Noponen-Hietala, N; Kyllonen, E; Mannikko, M; Ilkko, E; Karppinen, J; Ott, J; Ala-Kokko, L

    2003-01-01

    Background: Degenerative lumbar spinal stenosis (LSS) is usually caused by disc herniation or degeneration. Several genetic factors have been implicated in disc disease. Tryptophan alleles in COL9A2 and COL9A3 have been shown to be associated with lumbar disc disease in the Finnish population, and polymorphisms in the vitamin D receptor gene (VDR) (FokI and TaqI), the matrix metalloproteinase-3 gene (MMP-3) and an aggrecan gene (AGC1) VNTR have been reported to be associated with disc degeneration. In addition, an IVS6-4 a>t polymorphism in COL11A2 has been found in connection with stenosis caused by ossification of the posterior longitudinal ligament in the Japanese population. Objective: To study the role of genetic factors in LSS. Methods: 29 Finnish probands were analysed for mutations in the genes coding for intervertebral disc matrix proteins, COL1A1, COL1A2, COL2A1, COL9A1, COL9A2, COL9A3, COL11A1, COL11A2, and AGC1. VDR and MMP-3 polymorphisms were also analysed. Sequence variations were tested in 56 Finnish controls. Results: Several disease associated alleles were identified. A splice site mutation in COL9A2 leading to a premature translation termination codon and the generation of a truncated protein was identified in one proband, another had the Trp2 allele, and four others the Trp3 allele. The frequency of the COL11A2 IVS6-4 t allele was 93.1% in the probands and 72.3% in controls (p = 0.0016). The differences in genotype frequencies for this site were less significant (p = 0.0043). Conclusions: Genetic factors have an important role in the pathogenesis of LSS. PMID:14644861

  20. Sequence Variation in Superoxide Dismutase Gene of Toxoplasma gondii among Various Isolates from Different Hosts and Geographical Regions.

    PubMed

    Wang, Shuai; Cao, Aiping; Li, Xun; Zhao, Qunli; Liu, Yuan; Cong, Hua; He, Shenyi; Zhou, Huaiyu

    2015-06-01

    Toxoplasma gondii, an obligate intracellular protozoan parasite of the phylum Apicomplexa, can infect all warm-blooded vertebrates, including humans, livestock, and marine mammals. The aim of this study was to investigate whether superoxide dismutase (SOD) of T. gondii can be used as a new marker for genetic study or a potential vaccine candidate. The partial genome region of the SOD gene was amplified and sequenced from 10 different T. gondii isolates from different parts of the world, and all the sequences were examined by PCR-RFLP, sequence analysis, and phylogenetic reconstruction. The results showed that partial SOD gene sequences ranged from 1,702 bp to 1,712 bp and A + T contents varied from 50.1% to 51.1% among all examined isolates. Sequence alignment analysis identified a total of 43 variable nucleotide positions, corresponding to 97.5% sequence similarity in the SOD gene among all examined isolates. Phylogenetic analysis revealed that these SOD sequences were not an effective molecular marker for differential identification of T. gondii strains. The research demonstrated the existence of low sequence variation in the SOD gene among T. gondii strains of different genotypes from different hosts and geographical regions.
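
The two summary statistics quoted in this abstract (A + T content and the number of variable alignment positions) can be illustrated with a minimal sketch. The abstract does not say how similarity was computed, so the function names and the simple column-wise definition below are assumptions made for illustration only. As a rough consistency check, 43 variable positions over about 1,700 aligned bases is about 2.5% of sites, which matches the reported 97.5% similarity.

```python
def at_content(seq):
    """Percentage of A and T among unambiguous bases (A, C, G, T) in seq."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "AT" for b in counted) / len(counted)


def variable_positions(alignment):
    """0-based alignment columns in which more than one character state occurs.

    Assumes the sequences are already aligned (equal length); a gap
    character is treated as just another state.
    """
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [i for i, col in enumerate(zip(*alignment)) if len(set(col)) > 1]


def percent_identical_sites(alignment):
    """Percentage of alignment columns that are identical in every sequence."""
    n_cols = len(alignment[0])
    return 100.0 * (1 - len(variable_positions(alignment)) / n_cols)


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "ACGT"]  # toy alignment, not real SOD data
    print(at_content("AATTGGCC"), variable_positions(toy),
          percent_identical_sites(toy))
```

Real analyses would normally work from a multiple sequence alignment produced by a dedicated aligner and may treat gaps and ambiguous bases differently.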

  1. Mitochondrial genome sequences of Artemia tibetiana and Artemia urmiana: assessing molecular changes for high plateau adaptation.

    PubMed

    Zhang, Hangxiao; Luo, Qibin; Sun, Jing; Liu, Fei; Wu, Gang; Yu, Jun; Wang, Weiwei

    2013-05-01

    Brine shrimps, Artemia (Crustacea, Anostraca), inhabit hypersaline environments and have a broad geographical distribution from sea level to high plateaus. Artemia therefore possess significant genetic diversity, which gives them their outstanding adaptability. To understand this remarkable plasticity, we sequenced the mitochondrial genomes of two Artemia tibetiana isolates from the Tibetan Plateau in China and one Artemia urmiana isolate from Lake Urmia in Iran and compared them with the genome of a low-altitude Artemia, A. franciscana. We compared the ratio of the rate of nonsynonymous (Ka) and synonymous (Ks) substitutions (Ka/Ks ratio) in the mitochondrial protein-coding gene sequences and found that atp8 had the highest Ka/Ks ratios in comparisons of A. franciscana with either A. tibetiana or A. urmiana and that atp6 had the highest Ka/Ks ratio between A. tibetiana and A. urmiana. Atp6 may have experienced strong selective pressure for high-altitude adaptation because although A. tibetiana and A. urmiana are closely related they live at different altitudes. We identified two extended termination-associated sequences and three conserved sequence blocks in the D-loop region of the mitochondrial genomes. We propose that sequence variations in the D-loop region and in the subunits of the respiratory chain complexes independently or collectively contribute to the adaptation of Artemia to different altitudes.

  2. High-resolution specificity from DNA sequencing highlights alternative modes of Lac repressor binding.

    PubMed

    Zuo, Zheng; Stormo, Gary D

    2014-11-01

    Knowing the specificity of transcription factors is critical to understanding regulatory networks in cells. The lac repressor-operator system has been studied for many years, but not with high-throughput methods capable of determining specificity comprehensively. Details of its binding interaction and its selection of an asymmetric binding site have been controversial. We employed a new method to accurately determine relative binding affinities to thousands of sequences simultaneously, requiring only sequencing of bound and unbound fractions. An analysis of 2560 different DNA sequence variants, including both base changes and variations in operator length, provides a detailed view of lac repressor sequence specificity. We find that the protein can bind with nearly equal affinities to operators of three different lengths, but the sequence preference changes depending on the length, demonstrating alternative modes of interaction between the protein and DNA. The wild-type operator has an odd length, causing the two monomers to bind in alternative modes, making the asymmetric operator the preferred binding site. We tested two other members of the LacI/GalR protein family and find that neither can bind with high affinity to sites with alternative lengths or shows evidence of alternative binding modes. A further comparison with known and predicted motifs suggests that the lac repressor may be unique in this ability and that this may contribute to its selection.

  3. Identifying micro-inversions using high-throughput sequencing reads.

    PubMed

    He, Feifei; Li, Yang; Tang, Yu-Hang; Ma, Jian; Zhu, Huaiqiu

    2016-01-11

    The identification of inversions of DNA segments shorter than read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging for next-generation sequencing reads. It is acknowledged that MIs are an important type of genomic variation and may play roles in causing genetic disease. However, current alignment methods are generally insensitive to MIs. Here we develop a novel tool, MID (Micro-Inversion Detector), to identify MIs in human genomes using next-generation sequencing reads. The algorithm of MID is designed based on a dynamic programming path-finding approach. What makes MID different from other variant detection tools is that MID can handle small MIs and multiple breakpoints within an unmapped read. Moreover, MID improves reliability in low coverage data by integrating multiple samples. Our evaluation demonstrated that MID outperforms Gustaf, which can currently detect inversions from 30 bp to 500 bp. To our knowledge, MID is the first method that can efficiently and reliably identify MIs from unmapped short next-generation sequencing reads. MID is reliable on low coverage data, which makes it suitable for large-scale projects such as the 1000 Genomes Project (1KGP). MID identified previously unknown MIs from the 1KGP that overlap with genes and regulatory elements in the human genome. We also identified MIs in cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE). Therefore our tool is expected to be useful for improving the study of MIs as a type of genetic variant in the human genome. The source code can be downloaded from: http://cqb.pku.edu.cn/ZhuLab/MID.

  4. Genotype-Frequency Estimation from High-Throughput Sequencing Data.

    PubMed

    Maruki, Takahiro; Lynch, Michael

    2015-10-01

    Rapidly improving high-throughput sequencing technologies provide unprecedented opportunities for carrying out population-genomic studies with various organisms. To take full advantage of these methods, it is essential to correctly estimate allele and genotype frequencies, and here we present a maximum-likelihood method that accomplishes these tasks. The proposed method fully accounts for uncertainties resulting from sequencing errors and biparental chromosome sampling and yields essentially unbiased estimates with minimal sampling variances at moderately high depths of coverage, regardless of the mating system and structure of the population. Moreover, we have developed statistical tests for examining the significance of polymorphisms and their genotypic deviations from Hardy-Weinberg equilibrium. We examine the performance of the proposed method by computer simulations and apply it to low-coverage human data generated by high-throughput sequencing. The results show that the proposed method improves our ability to carry out population-genomic analyses in important ways. The software package of the proposed method is freely available from https://github.com/Takahiro-Maruki/Package-GFE.
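
For readers unfamiliar with the quantities named in this abstract, the sketch below shows the textbook calculation of an allele frequency and a chi-square statistic for deviation from Hardy-Weinberg equilibrium, starting from already-called genotype counts. It is not the maximum-likelihood method of the paper, which works from sequencing reads and models sequencing error; the function name and inputs are hypothetical.

```python
def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Return (p, chi2) for a biallelic site given called genotype counts.

    p is the frequency of allele A; chi2 is the Pearson chi-square statistic
    comparing observed genotype counts with Hardy-Weinberg expectations
    (1 degree of freedom for a biallelic site).
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_AA + n_Aa) / (2 * n)   # each individual carries two alleles
    q = 1.0 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, chi2


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # counts exactly at HWE proportions
    print(hwe_chi_square(50, 0, 50))   # complete heterozygote deficit
```

With low-coverage data, called genotypes are themselves uncertain, which is the situation the read-based likelihood approach described in the abstract is meant to address.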

  5. Fulcrum: condensing redundant reads from high-throughput sequencing studies

    PubMed Central

    Burriesci, Matthew S.; Lehnert, Erik M.; Pringle, John R.

    2012-01-01

    Motivation: Ultra-high-throughput sequencing produces duplicate and near-duplicate reads, which can consume computational resources in downstream applications. A tool that collapses such reads should reduce storage and assembly complications and costs. Results: We developed Fulcrum to collapse identical and near-identical Illumina and 454 reads (such as those from PCR clones) into single error-corrected sequences; it can process paired-end as well as single-end reads. Fulcrum is customizable and can be deployed on a single machine, a local network or a commercially available MapReduce cluster, and it has been optimized to maximize ease-of-use, cross-platform compatibility and future scalability. Sequence datasets have been collapsed by up to 71%, and the reduced number and improved quality of the resulting sequences allow assemblers to produce longer contigs while using less memory. Availability and implementation: Source code and a tutorial are available at http://pringlelab.stanford.edu/protocols.html under a BSD-like license. Fulcrum was written and tested in Python 2.6, and the single-machine and local-network modes depend on a modified version of the Parallel Python library (provided). Contact: erik.m.lehnert@gmail.com Supplementary information: Supplementary information is available at Bioinformatics online. PMID:22419786
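
The idea of collapsing duplicate reads can be sketched in a few lines. The example below handles exact duplicates only and is an illustrative assumption, not Fulcrum's implementation: according to the abstract, Fulcrum also merges near-identical reads into error-corrected sequences and supports paired-end data, none of which is reproduced here.

```python
from collections import Counter


def collapse_exact_duplicates(reads):
    """Collapse identical reads into (sequence, count) pairs.

    Pairs are sorted by descending count, then by sequence, so the
    output is deterministic.
    """
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def fraction_collapsed(reads):
    """Fraction by which the number of records shrinks after collapsing."""
    if not reads:
        return 0.0
    return 1.0 - len(set(reads)) / len(reads)


if __name__ == "__main__":
    toy_reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGGC"]  # toy data
    print(collapse_exact_duplicates(toy_reads))
    print(fraction_collapsed(toy_reads))
```

Keeping the count alongside each unique sequence preserves abundance information that downstream tools may need.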

  6. Tandem repeat sequence variation as causative cis-eQTLs for protein-coding gene expression variation: the case of CSTB.

    PubMed

    Borel, Christelle; Migliavacca, Eugenia; Letourneau, Audrey; Gagnebin, Maryline; Béna, Frédérique; Sailani, M Reza; Dermitzakis, Emmanouil T; Sharp, Andrew J; Antonarakis, Stylianos E

    2012-08-01

    Association studies have revealed expression quantitative trait loci (eQTLs) for a large number of genes. However, the causative variants that regulate gene expression levels are generally unknown. We hypothesized that copy-number variation of sequence repeats contributes to the expression variation of some genes. Our laboratory has previously identified that the rare expansion of a repeat c.-174CGGGGCGGGGCG in the promoter region of the CSTB gene causes a silencing of the gene, resulting in progressive myoclonus epilepsy. Here, we genotyped the repeat length and quantified CSTB expression by quantitative real-time polymerase chain reaction in 173 lymphoblastoid cell lines (LCLs) and fibroblast samples from the GenCord collection. The majority of alleles contain either two or three copies of this repeat. Independent analysis revealed that the c.-174CGGGGCGGGGCG repeat length is strongly associated with CSTB expression (P = 3.14 × 10^-11) in LCLs only. Examination of both genotyped and imputed single-nucleotide polymorphisms (SNPs) within 2 Mb of CSTB revealed that the dodecamer repeat represents the strongest cis-eQTL for CSTB in LCLs. We conclude that the common two or three copy variation is likely the causative cis-eQTL for CSTB expression variation. More broadly, we propose that polymorphic tandem repeats may represent the causative variation of a fraction of cis-eQTLs in the genome.

  7. A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control.

    PubMed

    Bartha, István; Carlson, Jonathan M; Brumme, Chanson J; McLaren, Paul J; Brumme, Zabrina L; John, Mina; Haas, David W; Martinez-Picado, Javier; Dalmau, Judith; López-Galíndez, Cecilio; Casado, Concepción; Rauch, Andri; Günthard, Huldrych F; Bernasconi, Enos; Vernazza, Pietro; Klimkait, Thomas; Yerly, Sabine; O'Brien, Stephen J; Listgarten, Jennifer; Pfeifer, Nico; Lippert, Christoph; Fusi, Nicolo; Kutalik, Zoltán; Allen, Todd M; Müller, Viktor; Harrigan, P Richard; Heckerman, David; Telenti, Amalio; Fellay, Jacques

    2013-10-29

    HIV-1 sequence diversity is affected by selection pressures arising from host genomic factors. Using paired human and viral data from 1071 individuals, we ran >3000 genome-wide scans, testing for associations between host DNA polymorphisms, HIV-1 sequence variation and plasma viral load (VL), while considering human and viral population structure. We observed significant human SNP associations to a total of 48 HIV-1 amino acid variants (p < 2.4 × 10^-12). All associated SNPs mapped to the HLA class I region. Clinical relevance of host and pathogen variation was assessed using VL results. We identified two critical advantages to the use of viral variation for identifying host factors: (1) association signals are much stronger for HIV-1 sequence variants than VL, reflecting the 'intermediate phenotype' nature of viral variation; (2) association testing can be run without any clinical data. The proposed genome-to-genome approach highlights sites of genomic conflict and is a strategy generally applicable to studies of host-pathogen interaction. DOI: http://dx.doi.org/10.7554/eLife.01123.001.

  8. Sequence variation in two mitochondrial DNA regions and internal transcribed spacer among isolates of the nematode Oesophagostomum asperum originating from goats in Hunan Province, China.

    PubMed

    Li, F; Hu, T; Duan, N C; Li, W Y; Teng, Q; Li, H; Liu, W; Liu, Y; Cheng, T Y

    2016-01-01

    The present study examined sequence variability in two mitochondrial DNA (mtDNA) regions, namely cytochrome c oxidase subunit 1 (cox1) and NADH dehydrogenase subunit 1 (nad1), and internal transcribed spacer (ITS) of nuclear ribosomal DNA (rDNA) among Oesophagostomum asperum isolates from goats in Hunan Province, China. A portion of the cox1 (pcox1), nad1 (pnad1) genes and the ITS (ITS1+5.8S rDNA+ITS2) rDNA were amplified by polymerase chain reaction (PCR) separately from adult O. asperum individuals and the representative amplicons were subjected to sequencing from both directions. The lengths of pcox1, pnad1 and ITS rDNA were 366 bp, 681 bp and 785 bp, respectively. The A+T contents of gene sequences were 71.5-72% for pcox1, 73.7-74.2% for pnad1 and 58-58.8% for ITS rDNA. Intra-specific sequence variations within O. asperum were 0-1.6% for pcox1, 0-1.9% for pnad1 and 0-1.7% for ITS rDNA, while inter-specific sequence differences among members of the genus Oesophagostomum were significantly higher, being 11.1-12.5%, 13.3-17.7% and 8.5-18.6% for pcox1, pnad1 and ITS rDNA, respectively. Phylogenetic analyses using combined sequences of pcox1 and pnad1, with three different computational algorithms (Bayesian inference, maximum likelihood and maximum parsimony), revealed distinct groups with high statistical support. These findings demonstrated the existence of intra-specific variation in mtDNA and rDNA sequences among O. asperum isolates from goats in Hunan Province, China, and have implications for studying molecular epidemiology and population genetics of O. asperum.

  9. Combined examination of sequence and copy number variations in human deafness genes improves diagnosis for cases of genetic deafness

    PubMed Central

    2014-01-01

    Background Copy number variations (CNVs) are the major type of structural variation in the human genome, and are more common than DNA sequence variations in populations. CNVs are important factors for human genetic and phenotypic diversity. Many CNVs have been associated with either resistance to diseases or identified as the cause of diseases. Currently little is known about the role of CNVs in causing deafness. CNVs are currently not analyzed by conventional genetic analysis methods to study deafness. Here we detected both DNA sequence variations and CNVs affecting 80 genes known to be required for normal hearing. Methods Coding regions of the deafness genes were captured by a hybridization-based method and processed through the standard next-generation sequencing (NGS) protocol using the Illumina platform. Samples hybridized together in the same reaction were analyzed to obtain CNVs. A read depth based method was used to measure CNVs at the resolution of a single exon. Results were validated by the quantitative PCR (qPCR) based method. Results Among 79 sporadic cases clinically diagnosed with sensorineural hearing loss, we identified previously-reported disease-causing sequence mutations in 16 cases. In addition, we identified a total of 97 CNVs (72 CNV gains and 25 CNV losses) in 27 deafness genes. The CNVs included homozygous deletions which may directly give rise to deleterious effects on protein functions known to be essential for hearing, as well as heterozygous deletions and CNV gains compounded with sequence mutations in deafness genes that could potentially harm gene functions. Conclusions We studied how CNVs in known deafness genes may result in deafness. Data provided here served as a basis to explain how CNVs disrupt normal functions of deafness genes. These results may significantly expand our understanding about how various types of genetic mutations cause deafness in humans. PMID:25342930
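
The read-depth approach mentioned in the Methods can be sketched as follows. The abstract gives no algorithmic details, so the normalization scheme (scale each sample by its total depth, then compare each exon with the median across co-analyzed samples), the function name, and the toy numbers are all assumptions made for illustration.

```python
from statistics import median


def exon_copy_ratios(depths):
    """Per-exon depth ratios relative to the median of co-analyzed samples.

    depths maps a sample name to a list of per-exon read depths (same exon
    order in every sample). A ratio near 1.0 suggests the common copy
    number, near 0.5 a heterozygous loss, near 1.5 a single-copy gain.
    """
    n_exons = {len(v) for v in depths.values()}
    if len(n_exons) != 1:
        raise ValueError("all samples must cover the same exons")
    # Scale each sample by its total depth to remove library-size effects.
    norm = {s: [d / sum(v) for d in v] for s, v in depths.items()}
    k = n_exons.pop()
    # Reference profile: median normalized depth per exon across samples.
    ref = [median(norm[s][i] for s in norm) for i in range(k)]
    return {
        s: [x / r if r > 0 else float("nan") for x, r in zip(v, ref)]
        for s, v in norm.items()
    }


if __name__ == "__main__":
    toy = {  # toy depths; sample s3 has reduced depth in the second exon
        "s1": [100, 100, 100, 100],
        "s2": [100, 100, 100, 100],
        "s3": [100, 50, 100, 100],
    }
    for sample, ratios in exon_copy_ratios(toy).items():
        print(sample, [round(r, 2) for r in ratios])
```

Any thresholds used to call gains or losses from such ratios would need calibration against an orthogonal assay, such as the qPCR validation described in the abstract.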

  10. Phylogenetic Relationships and Genetic Variation in Longidorus and Xiphinema Species (Nematoda: Longidoridae) Using ITS1 Sequences of Nuclear Ribosomal DNA

    PubMed Central

    Ye, Weimin; Szalanski, Allen L.; Robbins, R. T.

    2004-01-01

    Genetic analyses using DNA sequences of nuclear ribosomal DNA ITS1 were conducted to determine the extent of genetic variation within and among Longidorus and Xiphinema species. DNA sequences were obtained from samples collected from Arkansas, California and Australia as well as 4 Xiphinema DNA sequences from GenBank. The sequences of the ITS1 region including the 3' end of the 18S rDNA gene and the 5' end of the 5.8S rDNA gene ranged from 1020 bp to 1244 bp for the 9 Longidorus species, and from 870 bp to 1354 bp for the 7 Xiphinema species. Nucleotide frequencies were: A = 25.5%, C = 21.0%, G = 26.4%, and T = 27.1%. Genetic variation between the two genera had a maximum divergence of 38.6% between X. chambersi and L. crassus. Genetic variation among Xiphinema species ranged from 3.8% between X. diversicaudatum and X. bakeri to 29.9% between X. chambersi and X. italiae. Within Longidorus, genetic variation ranged from 8.9% between L. crassus and L. grandis to 32.4% between L. fragilis and L. diadecturus. Intraspecific genetic variation in X. americanum sensu lato ranged from 0.3% to 1.9%, while genetic variation in L. diadecturus was 0.8% and in L. biformis ranged from 0.6% to 10.9%. Identical sequences were obtained between the two populations of L. grandis, and between the two populations of X. bakeri. Phylogenetic analyses based on the ITS1 DNA sequence data were conducted on each genus separately using both maximum parsimony and maximum likelihood analysis. Among the Longidorus taxa, 4 subgroups are supported: L. grandis, L. crassus, and L. elongatus are in one cluster; L. biformis and L. paralongicaudatus are in a second cluster; L. fragilis and L. breviannulatus are in a third cluster; and L. diadecturus is in a fourth cluster. Among the Xiphinema taxa, 3 subgroups are supported: X. americanum with X. chambersi, X. bakeri with X. diversicaudatum, and X. italiae and X. vuittenezi forming a sister group with X. index. The relationships observed in this study

  11. A HIGH COVERAGE GENOME SEQUENCE FROM AN ARCHAIC DENISOVAN INDIVIDUAL

    PubMed Central

    Meyer, Matthias; Kircher, Martin; Gansauge, Marie-Theres; Li, Heng; Racimo, Fernando; Mallick, Swapan; Schraiber, Joshua G.; Jay, Flora; Prüfer, Kay; de Filippo, Cesare; Sudmant, Peter H.; Alkan, Can; Fu, Qiaomei; Do, Ron; Rohland, Nadin; Tandon, Arti; Siebauer, Michael; Green, Richard E.; Bryc, Katarzyna; Briggs, Adrian W.; Stenzel, Udo; Dabney, Jesse; Shendure, Jay; Kitzman, Jacob; Hammer, Michael F.; Shunkov, Michael V.; Derevianko, Anatoli P.; Patterson, Nick; Andrés, Aida M.; Eichler, Evan E.; Slatkin, Montgomery; Reich, David; Kelso, Janet; Pääbo, Svante

    2013-01-01

    We present a DNA library preparation method that has allowed us to reconstruct a high coverage (30X) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of “missing evolution” in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans. PMID:22936568

  12. Evolutionary growth process of highly conserved sequences in vertebrate genomes.

    PubMed

    Ishibashi, Minaka; Noda, Akiko Ogura; Sakate, Ryuichi; Imanishi, Tadashi

    2012-08-01

    Genome sequence comparison between evolutionarily distant species revealed ultraconserved elements (UCEs) among mammals under strong purifying selection. Most of them were also conserved among vertebrates. Because they tend to be located in the flanking regions of developmental genes, they would have fundamental roles in creating vertebrate body plans. However, the evolutionary origin and selection mechanism of these UCEs remain unclear. Here we report that UCEs arose in primitive vertebrates, and gradually grew in vertebrate evolution. We searched for UCEs in two teleost fishes, Tetraodon nigroviridis and Oryzias latipes, and found 554 UCEs with 100% identity over 100 bp. Comparison of teleost and mammalian UCEs revealed 43 pairs of common, jawed-vertebrate UCEs (jUCE) with high sequence identities, ranging from 83.1% to 99.2%. Ten of them retain lower similarities to the Petromyzon marinus genome, and the substitution rates of four non-exonic jUCEs were reduced after the teleost-mammal divergence, suggesting that robust conservation had been acquired in the jawed vertebrate lineage. Our results indicate that prototypical UCEs originated before the divergence of jawed and jawless vertebrates and have been frozen as perfectly conserved sequences in the jawed vertebrate lineage. In addition, our comparative sequence analyses of UCEs and neighboring regions resulted in the discovery of lineage-specific conserved sequences. They were added progressively to prototypical UCEs, suggesting step-wise acquisition of novel regulatory roles. Our results indicate that conserved non-coding elements (CNEs) consist of blocks with distinct evolutionary histories, each having been frozen since a different evolutionary era along the vertebrate lineage. Copyright © 2012 Elsevier B.V. All rights reserved.
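
    The "100% identity over 100 bp" criterion in the abstract above can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned to equal length, with "-" marking gaps; the function name `identical_runs` and the toy sequences are hypothetical and are not taken from the paper, whose actual search pipeline is not described here.

```python
def identical_runs(a, b, min_len=100):
    """Return (start, end) half-open intervals of gap-free, perfectly
    identical stretches of at least min_len columns in a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    start = None  # start column of the current identical run, if any
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y and x != "-":
            if start is None:
                start = i
        else:
            # A mismatch or gap closes the current run.
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(a) - start >= min_len:
        runs.append((start, len(a)))
    return runs


# Toy alignment (hypothetical); a short threshold is used so the example is readable.
seq_a = "ACGTACGTTTGACA"
seq_b = "ACGTACGATTGA-A"
runs_min4 = identical_runs(seq_a, seq_b, min_len=4)
runs_min5 = identical_runs(seq_a, seq_b, min_len=5)
```

    Raising `min_len` keeps only the longer conserved blocks, which is the sense in which a length threshold defines an "ultraconserved" element in this sketch.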

  13. High-Throughput Sequencing of a South American Amerindian

    PubMed Central

    Almeida, Renan; Alencar, Dayse O.; Barbosa, Maria Silvanira; Gusmão, Leonor; Silva, Wilson A.; de Souza, Sandro J.; Silva, Artur; Ribeiro-dos-Santos, Ândrea; Darnet, Sylvain; Santos, Sidney

    2013-01-01

    The emergence of next-generation sequencing technologies allowed access to the vast amounts of information that are contained in the human genome. This information has contributed to the understanding of individual and population-based variability and improved the understanding of the evolutionary history of different human groups. However, the genome of a representative of the Amerindian populations had not been previously sequenced. Thus, the genome of an individual from a South American tribe was completely sequenced to further the understanding of the genetic variability of Amerindians. A total of 36.8 giga base pairs (Gbp) were sequenced and aligned with the human genome. These Gbp corresponded to 95.92% of the human genome with an estimated miscall rate of 0.0035 per sequenced bp. The data obtained from the alignment were used for SNP (single-nucleotide polymorphism) and INDEL (insertion-deletion) calling, which resulted in the identification of 502,017 polymorphisms, of which 32,275 were potentially new high-confidence SNPs and 33,795 new INDELs, specific to South Native American populations. The authenticity of the sample as a member of the South Native American populations was confirmed through the analysis of the uniparental (maternal and paternal) lineages. The autosomal comparison distinguished the investigated sample from other continental populations and revealed a close relation to the Eastern Asian and Aboriginal Australian populations. Although the findings did not discard the classical model of the settlement of America, they brought new insights to the understanding of human population history. The present study indicates a remarkable genetic variability in human populations that must still be identified and contributes to the understanding of the genetic variability of South Native American populations and of human population history. PMID:24386182

  14. Ultra-deep Illumina sequencing accurately identifies MHC class IIb alleles and provides evidence for copy number variation in the guppy (Poecilia reticulata).

    PubMed

    Lighten, Jackie; van Oosterhout, Cock; Paterson, Ian G; McMullan, Mark; Bentzen, Paul

    2014-07-01

    We address the bioinformatic issue of accurately separating amplified genes of the major histocompatibility complex (MHC) from artefacts generated during high-throughput sequencing workflows. We fit observed ultra-deep sequencing depths (hundreds to thousands of sequences per amplicon) of allelic variants to expectations from genetic models of copy number variation (CNV). We provide a simple, accurate and repeatable method for genotyping multigene families, evaluating our method via analyses of 209 bp of MHC class IIb exon 2 in guppies (Poecilia reticulata). Genotype repeatability for resequenced individuals (N = 49) was high (100%) within the same sequencing run. However, repeatability dropped to 83.7% between independent runs, either because of lower mean amplicon sequencing depth in the initial run or random PCR effects. This highlights the importance of fully independent replicates. Significant improvements in genotyping accuracy were made by greatly reducing type I genotyping error (i.e. accepting an artefact as a true allele), which may occur with the low-depth allele validation thresholds used by previous methods. Only a small amount (4.9%) of type II error (i.e. rejecting a genuine allele as an artefact) was detected through fully independent sequencing runs. We observed 1-6 alleles per individual, and evidence of sharing of alleles across loci. Variation in the total number of MHC class II loci among individuals, both among and within populations, was also observed, and some genotypes appeared to be partially hemizygous; total allelic dosage added up to an odd number of allelic copies. Collectively, observations provide evidence of MHC CNV and its complex basis in natural populations.

  15. Individual variation of human S1P₁ coding sequence leads to heterogeneity in receptor function and drug interactions.

    PubMed

    Obinata, Hideru; Gutkind, Sarah; Stitham, Jeremiah; Okuno, Toshiaki; Yokomizo, Takehiko; Hwa, John; Hla, Timothy

    2014-12-01

    Sphingosine 1-phosphate receptor 1 (S1P₁), an abundantly-expressed G protein-coupled receptor which regulates key vascular and immune responses, is a therapeutic target in autoimmune diseases. Fingolimod/Gilenya (FTY720), an oral medication for relapsing-remitting multiple sclerosis, targets S1P₁ receptors on immune and neural cells to suppress neuroinflammation. However, suppression of endothelial S1P₁ receptors is associated with cardiac and vascular adverse effects. Here we report the genetic variations of the S1P₁ coding region from exon sequencing of >12,000 individuals and their functional consequences. We conducted functional analyses of 14 nonsynonymous single nucleotide polymorphisms (SNPs) of the S1PR1 gene. One SNP mutant (Arg¹²⁰ to Pro) failed to transmit sphingosine 1-phosphate (S1P)-induced intracellular signals such as calcium increase and activation of p44/42 MAPK and Akt. Two other mutants (Ile⁴⁵ to Thr and Gly³⁰⁵ to Cys) showed normal intracellular signals but impaired S1P-induced endocytosis, which made the receptor resistant to FTY720-induced degradation. Another SNP mutant (Arg¹³ to Gly) demonstrated protection from coronary artery disease in a high cardiovascular risk population. Individuals with this mutation showed a significantly lower percentage of multi-vessel coronary obstruction in a risk factor-matched case-control study. This study suggests that individual genetic variations of S1P₁ can influence receptor function and, therefore, infer differential disease risks and interaction with S1P₁-targeted therapeutics.

  16. Characterization and Sequence Variation in the rDNA Region of Six Nematode Species of the Genus Longidorus (Nematoda)

    PubMed Central

    De Luca, F.; Reyes, A.; Grunder, J.; Kunz, P.; Agostinelli, A.; De Giorgi, C.; Lamberti, F.

    2004-01-01

    Total DNA was isolated from individual nematodes of the species Longidorus helveticus, L. macrosoma, L. arthensis, L. profundorum, L. elongatus, and L. raskii collected in Switzerland. The ITS region and D1-D2 expansion segments of the 26S rDNA were amplified and cloned. The sequences obtained were aligned in order to investigate sequence diversity and to infer the phylogenetic relationships among the six Longidorus species. D1-D2 sequences were more conserved than the ITS sequences that varied widely in primary structure and length, and no consensus was observed. Phylogenetic analyses using the neighbor-joining, maximum parsimony and maximum likelihood methods were performed with three different sequence data sets: ITS1-ITS2, 5.8S-D1-D2, and combining ITS1-ITS2+5.8S-D1-D2 sequences. All multiple alignments yielded similar basic trees supporting the existence of the six species established using morphological characters. These sequence data also provided evidence that the different regions of the rDNA are characterized by different evolution rates and by different factors associated with the generation of extreme size variation. PMID:19262800

  17. Validation of high throughput sequencing and microbial forensics applications.

    PubMed

    Budowle, Bruce; Connell, Nancy D; Bielecka-Oder, Anna; Colwell, Rita R; Corbett, Cindi R; Fletcher, Jacqueline; Forsman, Mats; Kadavy, Dana R; Markotic, Alemka; Morse, Stephen A; Murch, Randall S; Sajantila, Antti; Schmedes, Sarah E; Ternus, Krista L; Turner, Stephen D; Minot, Samuel

    2014-01-01

    High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of specific application and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security.

  18. Validation of high throughput sequencing and microbial forensics applications

    PubMed Central

    2014-01-01

    High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of specific application and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security. PMID:25101166

  19. Variation.

    ERIC Educational Resources Information Center

    Hamilton City Board of Education (Ontario).

    Suggestions for studying the topic of variation of individuals and objects (balls) to help develop elementary school students' measurement, comparison, classification, evaluation, and data collection and recording skills are made. General suggestions of variables that can be investigated are made for the study of human variation. Twelve specific…

  20. Illumina sequencing of 16S rRNA tag revealed spatial variations of bacterial communities in a mangrove wetland.

    PubMed

    Jiang, Xiao-Tao; Peng, Xin; Deng, Guan-Hua; Sheng, Hua-Fang; Wang, Yu; Zhou, Hong-Wei; Tam, Nora Fung-Yee

    2013-07-01

    The microbial community plays an essential role in the high productivity of mangrove wetlands. A proper understanding of the spatial variations of microbial communities will provide clues about the underlying mechanisms that structure microbial groups and aid the isolation of bacterial strains of interest. In the present study, the diversity and composition of the bacterial community in sediments collected from four locations, namely mudflat, edge, bulk, and rhizosphere, within the Mai Po Ramsar Wetland in Hong Kong, SAR, China were compared using the barcoded Illumina paired-end sequencing technique. Rarefaction results showed that the bulk sediment inside the mature mangrove forest had the highest bacterial α-diversity, while the mudflat sediment without vegetation had the lowest. The comparison of β-diversity using principal component analysis and principal coordinate analysis with UniFrac metrics both showed that the spatial effects on bacterial communities were significant. All sediment samples could be clustered into two major groups, inner (bulk and rhizosphere sediments collected inside the mangrove forest) and outer mangrove sediments (the sediments collected at the mudflat and the edge of the mangrove forest). With linear discriminant analysis scores larger than 3, four phyla, namely Actinobacteria, Acidobacteria, Nitrospirae, and Verrucomicrobia, were enriched in the nutrient-rich inner mangrove sediments, while abundances of Proteobacteria and Deferribacteres were higher in outer mangrove sediments. The rhizosphere effect of mangrove plants was also significant: the rhizosphere sediment had a lower α-diversity, a higher amount of Nitrospirae, and a lower abundance of Proteobacteria than the bulk sediment nearby.

  1. Allelic sequence variation of the HLA-DQ loci: relationship to serology and to insulin-dependent diabetes susceptibility.

    PubMed Central

    Horn, G T; Bugawan, T L; Long, C M; Erlich, H A

    1988-01-01

    Analysis of sequence variation in the polymorphic second exon of the major histocompatibility complex genes HLA-DQ alpha and -DQ beta has revealed 8 allelic variants at the alpha locus and 13 variants at the beta locus. Correlation of sequence variation with serologic typing suggests that the DQw2, DQw3, and DQ(blank) types are determined by the DQ beta subunit, while the DQw1 specificity is determined by DQ alpha. The nature of the amino acid at position 57 in the DQ beta subunit is correlated with susceptibility to insulin-dependent diabetes mellitus. This region of the DQ beta chain contains shared peptides with Epstein-Barr virus and rubella virus. PMID:2842756

  2. Integrating nested PCR with high-throughput sequencing to characterize mutations of HBV genome in low viral load samples.

    PubMed

    Wang, Xianjun; Xu, Lihui; Chen, Yueming; Liu, Anbing; Wang, Liqian; Xu, Peisong; Liu, Yunhui; Li, Lei; Meng, Fei

    2017-07-01

    Due to the low viral load of hepatitis B virus (HBV) in plasma samples, conventional techniques are limited in detecting antiviral resistance mutations. To solve the problem, we developed a fast, highly sensitive, and accurate method to sequence the whole HBV genome in plasma samples with viral loads ranging from very low to high. Twenty-one plasma samples were collected from patients who were carriers of HBV from the Hangzhou First People's Hospital. Two pairs of conserved, overlapping, nested primers were used to amplify and sequence the whole HBV genome in 8 plasma samples with different viral loads. High-throughput sequencing was performed on the Illumina MiSeq platform. Concomitantly, 3 samples were directly sequenced without PCR amplification. We compared amplicon-sequencing with direct sequencing to develop a method for amplifying and characterizing the whole genome of HBV. The HBV genome was amplified from all samples and verified by Sanger sequencing, regardless of the viral loads. Sequencing results revealed that only a few reads were mapped to the HBV genome following direct sequencing, while the amplicon-sequencing reads had good coverage and depth. We identified 50 intrahost single nucleotide variations (iSNVs), 14 of which were low-frequency mutations. Interestingly, iSNVs were more common in low viral load samples than in high viral load samples, and mutations in the reverse transcriptase (RT) region were most prevalent. We conclude that amplicon-sequencing is not only a practical method to detect HBV infection with high sensitivity and accuracy but also enables detection of mutations in the HBV genome in low viral load samples from HBV-infected patients. Thus, our findings provide a new method for diagnosing HBV infection that is capable of detecting low-frequency mutations in low viral load samples.

  3. Conservation of the C-type lectin fold for accommodating massive sequence variation in archaeal diversity-generating retroelements.

    PubMed

    Handa, Sumit; Paul, Blair G; Miller, Jeffery F; Valentine, David L; Ghosh, Partho

    2016-08-31

    Diversity-generating retroelements (DGRs) provide organisms with a unique means for adaptation to a dynamic environment through massive protein sequence variation. The potential scope of this variation exceeds that of the vertebrate adaptive immune system. DGRs were known to exist only in viruses and bacteria until their recent discovery in archaea belonging to the 'microbial dark matter', specifically in organisms closely related to Nanoarchaeota. However, Nanoarchaeota DGR variable proteins were unassignable to known protein folds and apparently unrelated to characterized DGR variable proteins. To address the issue of how Nanoarchaeota DGR variable proteins accommodate massive sequence variation, we determined the 2.52 Å resolution limit crystal structure of one such protein, AvpA, which revealed a C-type lectin (CLec)-fold that organizes a putative ligand-binding site that is capable of accommodating 10^13 sequences. This fold is surprisingly reminiscent of the CLec-folds of viral and bacterial DGR variable proteins, but differs sufficiently to define a new CLec-fold subclass, which is consistent with early divergence between bacterial and archaeal DGRs. The structure also enabled identification of a group of AvpA-like proteins in multiple putative DGRs from uncultivated archaea. These variable proteins may aid Nanoarchaeota and these uncultivated archaea in symbiotic relationships. Our results have uncovered the widespread conservation of the CLec-fold in viruses, bacteria, and archaea for accommodating massive sequence variation. In addition, to our knowledge, this is the first report of an archaeal CLec-fold protein.

  4. ITS2-rDNA Sequence Variation of Phlebotomus sergenti s.l. (Dip: Psychodidae) Populations in Iran

    PubMed Central

    Moin-Vaziri, Vahideh; Oshaghi, Mohammad Ali; Yaghoobi-Ershadi, Mohammad Reza; Derakhshandeh-Peykar, Pupak; Abaei, Mohammad Reza; Mohtarami, Fatemeh; Zahraei-Ramezani, Ali Reza; Nadim, Aboulhassan

    2016-01-01

    Background: Phlebotomus sergenti s.l. is considered the most likely vector of Leishmania tropica in Iran. Although two morphotypes, P. sergenti sergenti (A) and P. sergenti similis (B), have been formally described, further morphological analysis and a molecular analysis of the mitochondrial cytochrome oxidase I (mtDNA-COI) gene revealed inconsistencies and suggest that the variation between the morphotypes is intraspecific and that the morphotypes might be the same species. Methods: We examined the sequence of the ITS2-rDNA of Iranian specimens of P. sergenti s.l., comprising P. cf sergenti, P. cf similis, and intermediate morphotypes, together with available data in GenBank. Results: Sequence analysis showed 5.2% variation among P. sergenti s.l. morphotypes. Almost half of the variation was due to the number of AT microsatellite repeats in the center of the spacer. Nine haplotypes were found in the species, forming three main lineages corresponding to the origin of the colonies located in southwest (SW), northeast (NE), and northwest-center-southeast (NCS). Lineages NCS and NE included both typical P. cf sergenti and P. cf similis and intermediate morphotypes. Conclusion: Phylogenetic sequence analysis revealed that, except for one Iranian sample, which was close to the European samples, other Iranian haplotypes were associated with the northeastern Mediterranean populations including Turkey, Cyprus, Syria, and Pakistan. Similar to the sequences of the mtDNA COI gene, ITS2 sequences could not resolve P. sergenti from P. similis and did not support the possible existence of sibling species or subspecies within P. sergenti s.l. PMID:28032098

  5. Protein engineering and stabilization from sequence statistics: variation and covariation analysis.

    PubMed

    Durani, Venuka; Magliery, Thomas J

    2013-01-01

    The concepts of consensus and correlation in multiple sequence alignments (MSAs) have been used in the past to understand and engineer proteins. However, there are multiple ways of acquiring MSA databases and also numerous mathematical metrics that can be applied to calculate each of the parameters. This chapter describes an overall methodology that we have chosen to employ for acquiring and statistically analyzing MSAs. We have provided a step-by-step protocol for calculating relative entropy and mutual information metrics and describe how they can be used to predict mutations that have a high probability of stabilizing a protein. This protocol allows for flexibility for modification of formulae and parameters without using anything more complicated than Microsoft Excel. We have also demonstrated various aspects of data analysis by carrying out a sample analysis on the BPTI-Kunitz family of proteins and identified mutations that would be predicted to stabilize this protein based on consensus and correlation values. Copyright © 2013 Elsevier Inc. All rights reserved.
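
    The two metrics named in the abstract above, relative entropy and mutual information, can be sketched for a toy multiple sequence alignment. This is a minimal illustration using the standard textbook definitions, not the chapter's Excel protocol; the function names, the uniform background distribution, and the four-sequence alignment are all assumptions made for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed background: uniform over the 20 amino acids (a real analysis
# would more likely use empirical background frequencies).
UNIFORM_BACKGROUND = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column(msa, i):
    """Residues found at alignment position i, one per sequence."""
    return [seq[i] for seq in msa]


def relative_entropy(msa, i, background=UNIFORM_BACKGROUND):
    """Kullback-Leibler divergence (bits) of column i from the background."""
    n = len(msa)
    counts = Counter(column(msa, i))
    return sum((c / n) * math.log2((c / n) / background[aa])
               for aa, c in counts.items())


def mutual_information(msa, i, j):
    """Mutual information (bits) between alignment columns i and j."""
    n = len(msa)
    ci = Counter(column(msa, i))
    cj = Counter(column(msa, j))
    cij = Counter(zip(column(msa, i), column(msa, j)))
    return sum((c / n) * math.log2((c / n) / ((ci[a] / n) * (cj[b] / n)))
               for (a, b), c in cij.items())


# Toy alignment (hypothetical): column 0 is invariant,
# columns 1 and 2 covary perfectly (K with L, R with V).
toy_msa = ["AKL", "AKL", "ARV", "ARV"]
re_col0 = relative_entropy(toy_msa, 0)
mi_12 = mutual_information(toy_msa, 1, 2)
mi_01 = mutual_information(toy_msa, 0, 1)
```

    In this sketch a high relative entropy marks a strongly conserved (consensus) position, while a high mutual information marks a pair of positions that vary together, which corresponds to the consensus and correlation signals the abstract describes.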

  6. Characterization of mitochondrial control region in Merlucciidae: sequence variation and molecular phylogeny.

    PubMed

    Crous, Marta; Roldán, María I

    2015-06-12

    In order to describe the structure and evolution of the mitochondrial control region of Merlucciidae and related Gadiformes, we analysed 470 bp of 31 taxa belonging to 28 different species. The general structure and conserved sequence blocks observed in the Gadiformes mitochondrial control region are similar to those present in other teleost fishes. The length of this segment is variable among related species due to the presence of numerous indels at domain I. Domain II is the most conserved region with a high G content. The GTGGG-box is absent in all Merluccius and seven other Gadidae species. Several methods of phylogenetic analysis have revealed the monophyly of Gadiformes, Gadinae and Merlucciidae. Merlucciidae is most closely related to Gadidae. Within Merlucciidae, American and Euroafrican clades show similar levels of differentiation to those within Gadinae where Trisopterus and Micromesistius are sister taxa. Genetic distance values for Merluccius subspecies pairs are less than half of those between species, comparable to intraspecific differentiation levels in marine fish species.

  7. Variation of the internal transcribed spacer 1 sequence within individual strains and among different strains of Neospora caninum.

    PubMed

    Gondim, Luis F P; Laski, Paul; Gao, Liying; McAllister, Milton M

    2004-02-01

    Small differences have been reported in the internal transcribed spacer 1 (ITS1) region among strains of Neospora caninum. We compared ITS1 sequences among 6 N. caninum strains analyzed in our laboratory, including 2 strains that have not been examined previously (NC-Illinois and NC-Bahia). Five sequences showed 100% similarity and also were identical to 7 of 11 sequences that were previously reported by others. In contrast, initial attempts to sequence the ITS1 of NC-Bahia generated 12 nucleotide differences compared with the other 5 strains, and several ambiguous bases. However, the single band containing the ITS1 region, as observed after electrophoresis on a 2% agarose gel, became divided into 2 distinct bands when reanalyzed using 5 or 10% polyacrylamide gel electrophoresis (PAGE), and the ITS1 within these separate bands were sequenced without ambiguity. The other 5 N. caninum strains were also reexamined using PAGE, and in each strain 2 distinct bands were discovered. In comparison, 2 strains of Toxoplasma gondii continued to show only 1 band when examined using PAGE. The ITS1 sequence of NC-Bahia, from Brazil, differs in several base pairs from those of North American and European strains of N. caninum. Intrastrain variation of the ITS1 region appears to be common in N. caninum, in contrast to T. gondii.

  8. High-Throughput Sequencing: A Roadmap Toward Community Ecology

    PubMed Central

    Poisot, Timothée; Péquin, Bérangère; Gravel, Dominique

    2013-01-01

    High-throughput sequencing is becoming increasingly important in microbial ecology, yet it is surprisingly under-used to generate or test biogeographic hypotheses. In this contribution, we highlight how adding these methods to the ecologist toolbox will allow the detection of new patterns, and will help our understanding of the structure and dynamics of diversity. Starting with a review of ecological questions that can be addressed, we move on to the technical and analytical issues that will benefit from an increased collaboration between different disciplines. PMID:23610649

  9. High-resolution heteronuclear multi-dimensional NMR spectroscopy in magnetic fields with unknown spatial variations.

    PubMed

    Zhang, Zhiyong; Huang, Yuqing; Smith, Pieter E S; Wang, Kaiyu; Cai, Shuhui; Chen, Zhong

    2014-05-01

    Heteronuclear NMR spectroscopy is an extremely powerful tool for determining the structures of organic molecules and is of particular significance in the structural analysis of proteins. In order to leverage the method's potential for structural investigations, obtaining high-resolution NMR spectra is essential and this is generally accomplished by using very homogeneous magnetic fields. However, there are several situations where magnetic field distortions and thus line broadening are unavoidable; for example, the samples under investigation may be inherently heterogeneous, or the magnet's homogeneity may be poor. This line broadening can hinder resonance assignment or even render it impossible. We put forth a new class of pulse sequences for obtaining high-resolution heteronuclear spectra in magnetic fields with unknown spatial variations based on distant dipolar field modulations. This strategy's capabilities are demonstrated with the acquisition of high-resolution 2D gHSQC and gHMBC spectra. These sequences' performances are evaluated on the basis of their sensitivities and acquisition efficiencies. Moreover, we show that by encoding and decoding NMR observables spatially, as is done in ultrafast NMR, an extra dimension containing J-coupling information can be obtained without increasing the time necessary to acquire a heteronuclear correlation spectrum. Since the new sequences relax magnetic field homogeneity constraints imposed upon high-resolution NMR, they may be applied in portable NMR sensors and studies of heterogeneous chemical and biological materials.

  10. DNA sequence variation of wild barley Hordeum spontaneum (L.) across environmental gradients in Israel.

    PubMed

    Bedada, G; Westerbergh, A; Nevo, E; Korol, A; Schmid, K J

    2014-06-01

    Wild barley Hordeum spontaneum (L.) shows a wide geographic distribution and ecological diversity. A key question concerns the spatial scale at which genetic differentiation occurs and to what extent it is driven by natural selection. The Levant region exhibits a strong ecological gradient along the North-South axis, with numerous small canyons in an East-West direction and with small-scale environmental gradients on the opposing North- and South-facing slopes. We sequenced 34 short genomic regions in 54 accessions of wild barley collected throughout Israel and from the opposing slopes of two canyons. The nucleotide diversity of the total sample is 0.0042, which is about two-thirds of that of a sample from the whole species range (0.0060). Thirty accessions collected at 'Evolution Canyon' (EC) at Nahal Oren, close to Haifa, have a nucleotide diversity of 0.0036, and therefore harbor a large proportion of the genetic diversity. There is a high level of genetic clustering throughout Israel and within EC, which roughly differentiates the slopes. Accessions from the hot and dry South-facing slope have significantly reduced genetic diversity and are genetically more distinct from accessions from the North-facing slope, which are more similar to accessions from other regions in Northern Israel. Statistical population models indicate that wild barley within the EC consists of three separate genetic clusters with substantial gene flow. The data indicate a high level of population structure at large and small geographic scales that shows isolation-by-distance, and is also consistent with ongoing natural selection contributing to genetic differentiation at a small geographic scale.
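
    The nucleotide diversity values quoted in the abstract above (for example 0.0042 for the total sample) are per-site averages of pairwise differences. A minimal sketch of that calculation follows, assuming equal-length, gap-free aligned sequences; the function name and the toy sequences are hypothetical, and the study's own estimator may differ in detail, for instance in how gaps and missing data are handled.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi)
    for a list of equal-length aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total_diffs / (len(pairs) * length)


# Toy data (hypothetical): four 10-bp sequences.
toy_seqs = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
pi = nucleotide_diversity(toy_seqs)
```

    For the toy data the six pairs differ at 0, 1, 2, 1, 2 and 1 sites, so pi is 7 differences over 6 pairs and 10 sites, about 0.117; real values such as those in the abstract are far smaller because most sites are invariant.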

  11. DNA sequence variation of wild barley Hordeum spontaneum (L.) across environmental gradients in Israel

    PubMed Central

    Bedada, G; Westerbergh, A; Nevo, E; Korol, A; Schmid, K J

    2014-01-01

    Wild barley Hordeum spontaneum (L.) shows a wide geographic distribution and ecological diversity. A key question concerns the spatial scale at which genetic differentiation occurs and to what extent it is driven by natural selection. The Levant region exhibits a strong ecological gradient along the North–South axis, with numerous small canyons in an East–West direction and with small-scale environmental gradients on the opposing North- and South-facing slopes. We sequenced 34 short genomic regions in 54 accessions of wild barley collected throughout Israel and from the opposing slopes of two canyons. The nucleotide diversity of the total sample is 0.0042, which is about two-thirds of that of a sample from the whole species range (0.0060). Thirty accessions collected at ‘Evolution Canyon’ (EC) at Nahal Oren, close to Haifa, have a nucleotide diversity of 0.0036, and therefore harbor a large proportion of the genetic diversity. There is a high level of genetic clustering throughout Israel and within EC, which roughly differentiates the slopes. Accessions from the hot and dry South-facing slope have significantly reduced genetic diversity and are genetically more distinct from accessions from the North-facing slope, which are more similar to accessions from other regions in Northern Israel. Statistical population models indicate that wild barley within the EC consists of three separate genetic clusters with substantial gene flow. The data indicate a high level of population structure at large and small geographic scales that shows isolation-by-distance, and is also consistent with ongoing natural selection contributing to genetic differentiation at a small geographic scale. PMID:24619177
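
    The nucleotide diversity values quoted in the abstract above (0.0042, 0.0060, 0.0036) are per-site averages of pairwise differences. As a minimal sketch of that statistic, and not the authors' pipeline, the following Python computes it for a set of aligned sequences. The function name and the toy alignment are hypothetical, and the sketch assumes gap-free sequences of equal length with no missing data.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi).

    Assumes all sequences are aligned, equal in length, and gap-free.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment (not data from the study).
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTAT"]
pi = nucleotide_diversity(toy)

# Ratio of the two diversity values quoted in the abstract.
ratio = 0.0042 / 0.0060
```

    With the quoted values the ratio is 0.7, which is consistent with the abstract's "about two-thirds".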

  12. Overview of the creative genome: effects of genome structure and sequence on the generation of variation and evolution.

    PubMed

    Caporale, Lynn Helena

    2012-09-01

    This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.

  13. Phylogenetic lineage of Tobacco leaf curl virus in Korea and estimation of recombination events implicated in their sequence variation.

    PubMed

    Park, Jungan; Lee, Hyejung; Kim, Mi-Kyung; Kwak, Hae-Ryun; Auh, Chung-Kyoon; Lee, Kyeong-Yeoll; Kim, Sunghan; Choi, Hong-Soo; Lee, Sukchan

    2011-08-01

    New strains of Tobacco leaf curl virus (TbLCV) were isolated from tomato plants in four different local communities of Korea, and hence were designated TbLCV-Kr. Phylogenetic analysis of the sequences of the whole genome and of individual ORFs of these viruses indicated that they are closely related to the Tobacco leaf curl Japan virus (TbLCJV) cluster, which includes Honeysuckle yellow vein virus (HYVV), Honeysuckle yellow vein mosaic virus (HYVMV), and TbLCJV isolates. Four putative recombination events were recognized within these virus sequences, suggesting that the sequence variations observed in these viruses may be attributable to intraspecific and interspecific recombination events involving some TbLCV-Kr isolates, Papaya leaf curl virus (PaLCV), and a local isolate of Tomato yellow leaf curl virus (TYLCV). Copyright © 2011 Elsevier B.V. All rights reserved.

  14. The Effects of Sequence Variation on Genome-wide NRF2 Binding—New Target Genes and Regulatory SNPs

    PubMed Central

    Kuosmanen, Suvi M.; Viitala, Sari; Laitinen, Tuomo; Peräkylä, Mikael; Pölönen, Petri; Kansanen, Emilia; Leinonen, Hanna; Raju, Suresh; Wienecke-Baldacchino, Anke; Närvänen, Ale; Poso, Antti; Heinäniemi, Merja; Heikkinen, Sami; Levonen, Anna-Liisa

    2016-01-01

    Transcription factor binding specificity is crucial for proper target gene regulation. Motif discovery algorithms identify the main features of the binding patterns, but the accuracy on the lower affinity sites is often poor. Nuclear factor E2-related factor 2 (NRF2) is a ubiquitous redox-activated transcription factor having a key protective role against endogenous and exogenous oxidant and electrophile stress. Herein, we decipher the effects of sequence variation on the DNA binding sequence of NRF2, in order to identify both genome-wide binding sites for NRF2 and disease-associated regulatory SNPs (rSNPs) with drastic effects on NRF2 binding. Interactions between NRF2 and DNA were studied using molecular modelling, and NRF2 chromatin immunoprecipitation-sequence datasets together with protein binding microarray measurements were utilized to study binding sequence variation in detail. The binding model thus generated was used to identify genome-wide binding sites for NRF2, and genomic binding sites with rSNPs that have strong effects on NRF2 binding and reside on active regulatory elements in human cells. As a proof of concept, miR-126–3p and -5p were identified as NRF2 target microRNAs, and a rSNP (rs113067944) residing on NRF2 target gene (Ferritin, light polypeptide, FTL) promoter was experimentally verified to decrease NRF2 binding and result in decreased transcriptional activity. PMID:26826707

  15. Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement

    SciTech Connect

    Le Coq, Johanne; Ghosh, Partho

    2012-06-19

    Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (~16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented potential variability of TvpA of approximately 10^20. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation.

  16. A De Novo Genome Sequence Assembly of the Arabidopsis thaliana Accession Niederzenz-1 Displays Presence/Absence Variation and Strong Synteny

    PubMed Central

    Pucker, Boas; Holtgräwe, Daniela; Rosleff Sörensen, Thomas; Stracke, Ralf; Viehöver, Prisca

    2016-01-01

    Arabidopsis thaliana is the most important model organism for fundamental plant biology. The genome diversity of different accessions of this species has been intensively studied, for example in the 1001 genome project which led to the identification of many small nucleotide polymorphisms (SNPs) and small insertions and deletions (InDels). In addition, presence/absence variation (PAV), copy number variation (CNV) and mobile genetic elements contribute to genomic differences between A. thaliana accessions. To address larger genome rearrangements between the A. thaliana reference accession Columbia-0 (Col-0) and another accession of about average distance to Col-0, we created a de novo next generation sequencing (NGS)-based assembly from the accession Niederzenz-1 (Nd-1). The result was evaluated with respect to assembly strategy and synteny to Col-0. We provide a high quality genome sequence of the A. thaliana accession (Nd-1, LXSY01000000). The assembly displays an N50 of 0.590 Mbp and covers 99% of the Col-0 reference sequence. Scaffolds from the de novo assembly were positioned on the basis of sequence similarity to the reference. Errors in this automatic scaffold anchoring were manually corrected based on analyzing reciprocal best BLAST hits (RBHs) of genes. Comparison of the final Nd-1 assembly to the reference revealed duplications and deletions (PAV). We identified 826 insertions and 746 deletions in Nd-1. Randomly selected candidates of PAV were experimentally validated. Our Nd-1 de novo assembly allowed reliable identification of larger genic and intergenic variants, which was difficult or error-prone by short read mapping approaches alone. While overall sequence similarity as well as synteny is very high, we detected short and larger (affecting more than 100 bp) differences between Col-0 and Nd-1 based on bi-directional comparisons. The de novo assembly provided here and additional assemblies that will certainly be published in the future will allow to
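
    The N50 reported in the record above (0.590 Mbp) is a standard assembly contiguity statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal Python sketch of that definition follows. The function name and the toy scaffold lengths are hypothetical and are not taken from the Nd-1 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in kbp (illustrative only).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
```

    Here the total is 300, and the two longest scaffolds (80 + 70) already reach 150, so the sketch returns 70.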

  17. Multiplexed Metagenomic Deep Sequencing To Analyze the Composition of High-Priority Pathogen Reagents

    PubMed Central

    Wilson, Michael R.; Stenglein, Mark D.; Olejnik, Judith; Rennick, Linda J.; Nambulli, Sham; Feldmann, Friederike; Duprex, W. Paul

    2016-01-01

    ABSTRACT Laboratories studying high-priority pathogens need comprehensive methods to confirm microbial species and strains while also detecting contamination. Metagenomic deep sequencing (MDS) inventories nucleic acids present in laboratory stocks, providing an unbiased assessment of pathogen identity, the extent of genomic variation, and the presence of contaminants. Double-stranded cDNA MDS libraries were constructed from RNA extracted from in vitro-passaged stocks of six viruses (La Crosse virus, Ebola virus, canine distemper virus, measles virus, human respiratory syncytial virus, and vesicular stomatitis virus). Each library was dual indexed and pooled for sequencing. A custom bioinformatics pipeline determined the organisms present in each sample in a blinded fashion. Single nucleotide variant (SNV) analysis identified viral isolates. We confirmed that (i) each sample contained the expected microbe, (ii) dual indexing of the samples minimized false assignments of individual sequences, (iii) multiple viral and bacterial contaminants were present, and (iv) SNV analysis of the viral genomes allowed precise identification of the viral isolates. MDS can be multiplexed to allow simultaneous and unbiased interrogation of mixed microbial cultures and (i) confirm pathogen identity, (ii) characterize the extent of genomic variation, (iii) confirm the cell line used for virus propagation, and (iv) assess for contaminating microbes. These assessments ensure the true composition of these high-priority reagents and generate a comprehensive database of microbial genomes studied in each facility. MDS can serve as an integral part of a pathogen-tracking program which in turn will enhance sample security and increase experimental rigor and precision. IMPORTANCE Both the integrity and reproducibility of experiments using select agents depend in large part on unbiased validation to ensure the correct identity and purity of the species in question. Metagenomic deep sequencing

  18. Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data.

    PubMed

    Carver, Tim; Harris, Simon R; Berriman, Matthew; Parkhill, Julian; McQuillan, Jacqueline A

    2012-02-15

    High-throughput sequencing (HTS) technologies have made low-cost sequencing of large numbers of samples commonplace. An explosion in the type, not just number, of sequencing experiments has also taken place including genome re-sequencing, population-scale variation detection, whole transcriptome sequencing and genome-wide analysis of protein-bound nucleic acids. We present Artemis as a tool for integrated visualization and computational analysis of different types of HTS datasets in the context of a reference genome and its corresponding annotation. Artemis is freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute websites: http://www.sanger.ac.uk/resources/software/artemis/.

  19. ClinVar: public archive of relationships among sequence variation and human phenotype

    PubMed Central

    Landrum, Melissa J.; Lee, Jennifer M.; Riley, George R.; Jang, Wonhee; Rubinstein, Wendy S.; Church, Deanna M.; Maglott, Donna R.

    2014-01-01

    ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/) provides a freely available archive of reports of relationships among medically important variants and phenotypes. ClinVar accessions submissions reporting human variation, interpretations of the relationship of that variation to human health and the evidence supporting each interpretation. The database is tightly coupled with dbSNP and dbVar, which maintain information about the location of variation on human assemblies. ClinVar is also based on the phenotypic descriptions maintained in MedGen (http://www.ncbi.nlm.nih.gov/medgen). Each ClinVar record represents the submitter, the variation and the phenotype, i.e. the unit that is assigned an accession of the format SCV000000000.0. The submitter can update the submission at any time, in which case a new version is assigned. To facilitate evaluation of the medical importance of each variant, ClinVar aggregates submissions with the same variation/phenotype combination, adds value from other NCBI databases, assigns a distinct accession of the format RCV000000000.0 and reports if there are conflicting clinical interpretations. Data in ClinVar are available in multiple formats, including html, download as XML, VCF or tab-delimited subsets. Data from ClinVar are provided as annotation tracks on genomic RefSeqs and are used in tools such as Variation Reporter (http://www.ncbi.nlm.nih.gov/variation/tools/reporter), which reports what is known about variation based on user-supplied locations. PMID:24234437
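
    The abstract above describes two accession formats, SCV000000000.0 for individual submissions and RCV000000000.0 for aggregated variation/phenotype records, each carrying a version after the dot. The Python sketch below shows one way such identifiers might be split into their parts. The regular expression, the `Accession` type, and `parse_accession` are hypothetical helpers rather than an NCBI API, and the pattern assumes exactly nine digits (as in the quoted format) followed by a numeric version.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: SCV or RCV prefix, nine digits, a dot, then a version number.
ACCESSION_RE = re.compile(r"^(SCV|RCV)(\d{9})\.(\d+)$")


class Accession(NamedTuple):
    kind: str      # "SCV" (submission) or "RCV" (aggregate record)
    number: int    # numeric part of the accession
    version: int   # version after the dot


def parse_accession(text: str) -> Optional[Accession]:
    """Return the parsed accession, or None if the text does not match."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        return None
    return Accession(match.group(1), int(match.group(2)), int(match.group(3)))


# Made-up identifiers in the quoted format (not real ClinVar records).
example = parse_accession("SCV000000123.2")
```

    Returning None for non-matching text, rather than raising, keeps the helper convenient when filtering mixed identifier lists.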

  20. Reprint of "Identification of staphylococcal species based on variations in protein sequences (mass spectrometry) and DNA sequence (sodA microarray)".

    PubMed

    Kooken, Jennifer; Fox, Karen; Fox, Alvin; Altomare, Diego; Creek, Kim; Wunschel, David; Pajares-Merino, Sara; Martínez-Ballesteros, Ilargi; Garaizar, Javier; Oyarzabal, Omar; Samadpour, Mansour

    2014-01-01

    This report is among the first using sequence variation in newly discovered protein markers for staphylococcal (or indeed any other bacterial) speciation. Variation, at the DNA sequence level, in the sodA gene (commonly used for staphylococcal speciation) provided excellent correlation. Relatedness among strains was also assessed by protein profiling using microcapillary electrophoresis and pulsed field electrophoresis. A total of 64 strains were analyzed, including reference strains representing the 11 staphylococcal species most commonly isolated from man (Staphylococcus aureus and 10 coagulase negative species [CoNS]). Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI TOF MS) and liquid chromatography-electrospray ionization tandem mass spectrometry (LC ESI MS/MS) were used for peptide analysis of proteins isolated from gel bands. Comparison of experimental spectra of unknowns versus spectra of peptides derived from reference strains allowed bacterial identification after MALDI TOF MS analysis. After LC-MS/MS analysis of gel bands, bacterial speciation was performed by comparing experimental spectra versus virtual spectra using the software X!Tandem. Finally, LC-MS/MS was performed on whole proteomes, with data analysis also employing X!Tandem. Aconitate hydratase and oxoglutarate dehydrogenase served as marker proteins on focused analysis after gel separation. Alternatively, on full proteomics analysis, elongation factor Tu generally provided the highest confidence in staphylococcal speciation.

  1. Variation in Campylobacter Multilocus Sequence Typing Subtypes from Chickens as Detected on Three Plating Media.

    PubMed

    Berrang, M E; Ladely, S R; Meinersmann, R J; Line, J E; Oakley, B B; Cox, N A

    2016-11-01

    The objective of this study was to compare subtypes of Campylobacter jejuni and Campylobacter coli detected on three selective Campylobacter plating media to determine whether each medium selected for different subtypes. Fifty ceca and 50 carcasses (representing 50 flocks) were collected from the evisceration line in a commercial broiler processing plant. Campylobacter was cultured and isolated from cecal contents and carcass rinses on Campy-Cefex, Campy-Line, and RF Campylobacter jejuni/coli agars. When a positive result was obtained with all three media, one colony of the most prevalent morphology on each medium was selected for further analysis by full genome sequencing and multilocus sequence typing. Sequence types were assigned according to PubMLST. A total of 49 samples were positive for Campylobacter on all three media. Forty samples contained only C. jejuni, three had only C. coli, and both species were detected in six samples. From 71% of samples, Campylobacter isolates of the same sequence type were recovered on all three media. From 81.6% of samples, isolates were all from the same clonal complex. From significantly fewer samples (26%, P < 0.01), one medium recovered an isolate with a sequence type different from the type recovered on the other two media. When multiple sequence types were detected, six times the medium with the odd sequence type was Campy-Cefex, four times it was Campy-Line, and six times it was RF Campylobacter jejuni/coli. From one sample, three sequence types were detected. In most cases, all three plating media allowed detection of the same type of Campylobacter from complex naturally contaminated chicken samples.

  2. High-speed lossless compression for angiography image sequences

    NASA Astrophysics Data System (ADS)

    Kennedy, Jonathon M.; Simms, Michael; Kearney, Emma; Dowling, Anita; Fagan, Andrew; O'Hare, Neil J.

    2001-05-01

    High speed processing of large amounts of data is a requirement for many diagnostic quality medical imaging applications. A demanding example is the acquisition, storage and display of image sequences in angiography. The functional performance requirements for handling angiography data were identified. A new lossless image compression algorithm was developed, implemented in C++ for the Intel Pentium/MS-Windows environment and optimized for speed of operation. Speeds of up to 6M pixels per second for compression and 12M pixels per second for decompression were measured. This represents an improvement of up to 400% over the next best high-performance algorithm (LOCO-I) without significant reduction in compression ratio. Performance tests were carried out at St. James's Hospital using actual angiography data. Results were compared with the lossless JPEG standard and other leading methods such as JPEG-LS (LOCO-I) and the lossless wavelet approach proposed for JPEG 2000. Our new algorithm represents a significant improvement in the performance of lossless image compression technology without using specialized hardware. It has been applied successfully to image sequence decompression at video rate for angiography, one of the most challenging application areas in medical imaging.

  3. Optimization of shRNA inhibitors by variation of the terminal loop sequence.

    PubMed

    Schopman, Nick C T; Liu, Ying Poi; Konstantinova, Pavlina; ter Brake, Olivier; Berkhout, Ben

    2010-05-01

    Gene silencing by RNA interference (RNAi) can be achieved by intracellular expression of a short hairpin RNA (shRNA) that is processed into the effective small interfering RNA (siRNA) inhibitor by the RNAi machinery. Previous studies indicate that shRNA molecules do not always reflect the activity of corresponding synthetic siRNAs that attack the same target sequence. One obvious difference between these two effector molecules is the hairpin loop of the shRNA. Most studies use the original shRNA design of the pSuper system, but no extensive study regarding optimization of the shRNA loop sequence has been performed. We tested the impact of different hairpin loop sequences, varying in size and structure, on the activity of a set of shRNAs targeting HIV-1. We were able to transform weak inhibitors into intermediate or even strong shRNA inhibitors by replacing the loop sequence. We demonstrate that the efficacy of these optimized shRNA inhibitors is improved significantly in different cell types due to increased siRNA production. These results indicate that the loop sequence is an essential part of the shRNA design. The optimized shRNA loop sequence is generally applicable for RNAi knockdown studies, and will allow us to develop a more potent gene therapy against HIV-1.

  4. Characterizing novel endogenous retroviruses from genetic variation inferred from short sequence reads

    PubMed Central

    Mourier, Tobias; Mollerup, Sarah; Vinner, Lasse; Hansen, Thomas Arn; Kjartansdóttir, Kristín Rós; Guldberg Frøslev, Tobias; Snogdal Boutrup, Torsten; Nielsen, Lars Peter; Willerslev, Eske; Hansen, Anders J.

    2015-01-01

    From Illumina sequencing of DNA from brain and liver tissue from the lion, Panthera leo, and tumor samples from the pike-perch, Sander lucioperca, we obtained two assembled sequence contigs with similarity to known retroviruses. Phylogenetic analyses suggest that the pike-perch retrovirus belongs to the epsilonretroviruses, and the lion retrovirus to the gammaretroviruses. To determine if these novel retroviral sequences originate from an endogenous retrovirus or from a recently integrated exogenous retrovirus, we assessed the genetic diversity of the parental sequences from which the short Illumina reads are derived. First, we showed by simulations that we can robustly infer the level of genetic diversity from short sequence reads. Second, we find that the measures of nucleotide diversity inferred from our retroviral sequences significantly exceed the level observed from Human Immunodeficiency Virus infections, prompting us to conclude that the novel retroviruses are both of endogenous origin. Through further simulations, we rule out the possibility that the observed elevated levels of nucleotide diversity are the result of co-infection with two closely related exogenous retroviruses. PMID:26493184

  5. Structural mechanisms underlying sequence-dependent variations in GAG affinities of decorin binding protein A, a Borrelia burgdorferi adhesin.

    PubMed

    Morgan, Ashli M; Wang, Xu

    2015-05-01

    Decorin-binding protein A (DBPA) is an important surface adhesin of the bacterium Borrelia burgdorferi, the causative agent of Lyme disease. DBPA facilitates the bacteria's colonization of human tissue by adhering to glycosaminoglycan (GAG), a sulfated polysaccharide. Interestingly, DBPA sequence variation among different strains of Borrelia spirochetes is high, resulting in significant differences in their GAG affinities. However, the structural mechanisms contributing to these differences are unknown. We determined the solution structures of DBPAs from strain N40 of B. burgdorferi and strain PBr of Borrelia garinii, two DBPA variants whose GAG affinities deviate significantly from strain B31, the best characterized version of DBPA. Our structures revealed that significant differences exist between PBr DBPA and B31/N40 DBPAs. In particular, the C-terminus of PBr DBPA, unlike C-termini from B31 and N40 DBPAs, is positioned away from the GAG-binding pocket and the linker between helices one and two of PBr DBPA is highly structured and retracted from the GAG-binding pocket. The repositioning of the C-terminus allowed the formation of an extra GAG-binding epitope in PBr DBPA and the retracted linker gave GAG ligands more access to the GAG-binding epitopes than other DBPAs. Characterization of GAG ligands' interactions with wild-type (WT) PBr and mutants confirmed the importance of the second major GAG-binding epitope and established the fact that the two epitopes are independent of one another and the new epitope is as important to GAG binding as the traditional epitope.

  6. Widespread sequence variations in VAMP1 across vertebrates suggest a potential selective pressure from botulinum neurotoxins.

    PubMed

    Peng, Lisheng; Adler, Michael; Demogines, Ann; Borrell, Andrew; Liu, Huisheng; Tao, Liang; Tepp, William H; Zhang, Su-Chun; Johnson, Eric A; Sawyer, Sara L; Dong, Min

    2014-07-01

    Botulinum neurotoxins (BoNT/A-G), the most potent toxins known, act by cleaving three SNARE proteins required for synaptic vesicle exocytosis. Previous studies on BoNTs have generally utilized the major SNARE homologues expressed in brain (VAMP2, syntaxin 1, and SNAP-25). However, BoNTs target peripheral motor neurons and cause death by paralyzing respiratory muscles such as the diaphragm. Here we report that VAMP1, but not VAMP2, is the SNARE homologue predominantly expressed in adult rodent diaphragm motor nerve terminals and in differentiated human motor neurons. In contrast to the highly conserved VAMP2, BoNT-resistant variations in VAMP1 are widespread across vertebrates. In particular, we identified a polymorphism at position 48 of VAMP1 in rats, which renders VAMP1 either resistant (I48) or sensitive (M48) to BoNT/D. Taking advantage of this finding, we showed that rat diaphragms with I48 in VAMP1 are insensitive to BoNT/D compared to rat diaphragms with M48 in VAMP1. This unique intra-species comparison establishes VAMP1 as a physiological toxin target in diaphragm motor nerve terminals, and demonstrates that the resistance of VAMP1 to BoNTs can underlie the insensitivity of a species to members of BoNTs. Consistently, human VAMP1 contains I48, which may explain why humans are insensitive to BoNT/D. Finally, we report that residue 48 of VAMP1 varies frequently between M and I across seventeen closely related primate species, suggesting a potential selective pressure from members of BoNTs for resistance in vertebrates.

  7. [Mitochondrial DNA sequence variation, demographic history, and population structure of Amur sturgeon Acipenser schrenckii Brandt, 1869].

    PubMed

    Shedko, S V; Miroshnichenko, I L; Nemkova, G A; Koshelev, V N; Shedko, M B

    2015-02-01

    The variability of the mtDNA control region (D-loop) was examined in Amur sturgeon endemic to the Amur River. This species is also classified as critically endangered by the IUCN Red List of Threatened Species. Sequencing of 796- to 812-bp fragments of the D-loop in 112 sturgeon collected in the Lower Amur revealed 73 different genotypes. The sample was characterized by a high level of haplotypic (0.976) and nucleotide (0.0194) diversity. The identified haplotypes split into two well-defined monophyletic groups, BG (n = 39) and SM (n = 34), differing (HKY distance) on average by 3.41% of nucleotide positions upon an average level of intragroup differences of 0.54 and 1.23%, respectively. Moreover, the haplotypes of the SM group differed by the presence of a 13-14 bp deletion. Most of the samples (66 out of 112) carried BG haplotypes. Overall, the pattern of pairwise nucleotide differences and the results of neutrality tests, as well as the results of tests for compliance with the model of sudden demographic expansion or with the model of exponential growth, pointed to a past significant increase in the number of Amur sturgeon, which was most clearly manifested in the analysis of data on the BG haplogroup. The constructed Bayesian skyline plots showed that this growth began about 18 to 16 thousand years ago. At present, the effective size of the strongly reduced (due to overharvesting) population of Amur sturgeon may be equal to or even lower than it was before the beginning of this growth during the Last Glacial Maximum. The presence in the mitochondrial gene pool of Amur sturgeon of two haplogroups, their unequal evolutionary dynamics, and, judging by scanty data, their unequal representation in the Russian and Chinese parts of the Amur River basin point to the possible existence of at least two distinct populations of Amur sturgeon in the past.
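
    The haplotypic diversity quoted in the abstract above (0.976) is conventionally estimated as h = n/(n - 1) * (1 - sum of squared haplotype frequencies), where n is the sample size. The abstract does not state which estimator the authors used, so the Python sketch below is an assumption that illustrates the standard unbiased form. The function name and the toy sample are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Unbiased haplotype (gene) diversity for a sample of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    # Sum of squared sample frequencies of each distinct haplotype.
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical sample: four individuals carrying three distinct haplotypes.
toy_sample = ["H1", "H1", "H2", "H3"]
h = haplotype_diversity(toy_sample)
```

    The value approaches 1 when nearly every sampled individual carries a distinct haplotype, which fits a sample of 112 fish yielding 73 genotypes.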

  8. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon

    PubMed Central

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, ‘SCNU1154’, ‘Edisto47’, ‘MR-1’, and ‘PMR5’. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon. PMID:27311063

  9. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    PubMed

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.

  10. Accurate, high-throughput typing of copy number variation using paralogue ratios from dispersed repeats

    PubMed Central

    Armour, John A. L.; Palla, Raquel; Zeeuwen, Patrick L. J. M.; den Heijer, Martin; Schalkwijk, Joost; Hollox, Edward J.

    2007-01-01

    Recent work has demonstrated an unexpected prevalence of copy number variation in the human genome, and has highlighted the part this variation may play in predisposition to common phenotypes. Some important genes vary in number over a high range (e.g. DEFB4, which commonly varies between two and seven copies), and have posed formidable technical challenges for accurate copy number typing, so that there are no simple, cheap, high-throughput approaches suitable for large-scale screening. We have developed a simple comparative PCR method based on dispersed repeat sequences, using a single pair of precisely designed primers to amplify products simultaneously from both test and reference loci, which are subsequently distinguished and quantified via internal sequence differences. We have validated the method for the measurement of copy number at DEFB4 by comparison of results from >800 DNA samples with copy number measurements by MAPH/REDVR, MLPA and array-CGH. The new Paralogue Ratio Test (PRT) method can require as little as 10 ng genomic DNA, appears to be comparable in accuracy to the other methods, and for the first time provides a rapid, simple and inexpensive method for copy number analysis, suitable for application to typing thousands of samples in large case-control association studies. PMID:17175532
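
    The ratio idea behind the paralogue ratio test can be sketched in a few lines. This is a minimal illustration only, not the authors' procedure: it assumes the reference locus is present at a known fixed copy number (two per diploid genome here) and that the test and reference products amplify with equal efficiency. The function names and signal values are hypothetical.

```python
def estimate_copy_number(test_signal, ref_signal, ref_copies=2):
    """Estimate test-locus copy number from a test/reference product ratio.

    Assumption: the reference locus has a fixed, known copy number
    (ref_copies) and both products amplify with equal efficiency.
    """
    if ref_signal <= 0:
        raise ValueError("reference signal must be positive")
    return ref_copies * test_signal / ref_signal


def call_copy_number(test_signal, ref_signal, ref_copies=2):
    """Round the raw ratio-based estimate to the nearest integer copy number."""
    return round(estimate_copy_number(test_signal, ref_signal, ref_copies))


if __name__ == "__main__":
    # Hypothetical peak areas for one sample (arbitrary units).
    print(estimate_copy_number(300.0, 100.0))
    print(call_copy_number(345.0, 100.0))
```

    In practice the raw ratio would normally be calibrated against samples of known copy number before rounding; that step is omitted from this sketch.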

  11. Comparative analysis of methods used to define eustatic variations in outcrop: Late Cambrian interbasinal sequence development

    SciTech Connect

    Osleger, D.; Read, J.F.

    1993-03-01

    Interbasinal correlation of Late Cambrian cyclic carbonates from the Appalachian and Cordilleran passive margins, the Texas craton, and the southern Oklahoma aulacogen defines six major third-order depositional sequences. Graphic correlation of biostratigraphically constrained strata was used to establish equivalency of stratigraphic sequences between the individual sections. Relatively isochronous biomere boundaries were used as time datums for lithostratigraphic correlation. Although the individual sections are composed of different types of meter-scale cycles and component lithofacies that reflect the various environmental settings of the localities, the overall upward-shallowing character of individual sequences is evident. The sequences are: late Cedaria, mid-Crepicephalus, late Crepicephalus, Aphelaspis to earliest Elvinia, Elvinia to early Saukia, and Saukia to the Cambrian-Ordovician boundary. Interbasinal correlation of stratigraphic sequences permits an evaluation of quantitative techniques for determining accommodation history. Correlation of Fischer plots of cyclic successions from separate basins supports a eustatic control of Late Cambrian sequence development. R2/R3 curves derived from subsidence analysis of the Late Cambrian sections provide good resolution of the second- and third-order scales of accommodation change, and interbasinal correlations of R2/R3 curves also support eustatic control on sequence development. Comparing the accommodation curves and subsidence analysis with paleobathymetric trends of Late Cambrian cyclic strata suggests that the curves may approximate the form of the eustatic sea-level signal. A composite eustatic sea-level curve for Late Cambrian time in North America was created by qualitatively combining the accommodation curves defined by the different techniques for each of the four localities. 129 refs., 16 figs., 3 tabs.

  12. High order total variation method for interior tomography

    NASA Astrophysics Data System (ADS)

    Yang, Jiansheng; Yu, Hengyong; Cong, Wenxiang; Jiang, Ming; Wang, Ge

    2012-10-01

    While classic CT theory targets exact reconstruction of a whole cross-section or an entire object, practical applications often focus on a region of interest (ROI). The long-standing interior problem is well known: an internal ROI cannot be exactly reconstructed from only the truncated projection data associated with x-rays through the ROI. Although lambda tomography was developed to target gradient-like features of an internal ROI for the interior problem, it has not been well accepted in the biomedical community. On the other hand, approximate local reconstruction methods are subject to biases and artifacts. Recently, the interior problem has been revisited with appropriate prior knowledge, delivering practical results. First, the interior problem can be exactly and stably solved if a sub-region in an ROI is known. Thereafter, the sub-region knowledge can be replaced by certain rather weak constraints. For local reconstruction, a candidate image can be represented as the sum of the truth and an ambiguity component. Very surprisingly, the ROI image is proved to be the unique minimizer of the total variation (TV) or high order total variation (HOT) functional subject to the measurement, if the ROI is piece-wise constant or polynomial. Interior tomography algorithms based on HOT minimization have been developed for x-ray CT, and then extended to interior SPECT and interior differential phase-contrast tomography, respectively. In this paper, we will summarize the main theoretical and algorithmic results.
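
    The role of total variation in the abstract above can be illustrated with a toy one-dimensional example. This is a sketch under stated assumptions, not the reconstruction algorithm from the paper: the "truth" is a hypothetical piecewise-constant profile, the "ambiguity" is an arbitrary smooth ramp chosen for illustration, and TV is taken as the sum of absolute differences between neighbouring samples.

```python
def total_variation(signal):
    """Discrete 1-D total variation: sum of |x[i+1] - x[i]|."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


# Hypothetical piecewise-constant ROI profile (the "truth").
truth = [1, 1, 1, 4, 4, 4, 2, 2]

# Hypothetical smooth, non-constant ambiguity component (a linear ramp).
ambiguity = [0.1 * i for i in range(len(truth))]

# A candidate image is the truth plus the ambiguity.
candidate = [t + a for t, a in zip(truth, ambiguity)]

if __name__ == "__main__":
    # The piecewise-constant truth has the smaller TV in this example,
    # so a TV-minimizing criterion would prefer it over the candidate.
    print("TV(truth)     =", total_variation(truth))
    print("TV(candidate) =", total_variation(candidate))
```

    The example only shows why a TV penalty favours the piecewise-constant profile in this one case; the uniqueness result quoted in the abstract depends on properties of the ambiguity in interior tomography that the toy ramp does not model.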

  13. The use of museum specimens with high-throughput DNA sequencers

    PubMed Central

    Burrell, Andrew S.; Disotell, Todd R.; Bergey, Christina M.

    2015-01-01

    Natural history collections have long been used by morphologists, anatomists, and taxonomists to probe the evolutionary process and describe biological diversity. These biological archives also offer great opportunities for genetic research in taxonomy, conservation, systematics, and population biology. They allow assays of past populations, including those of extinct species, giving context to present patterns of genetic variation and direct measures of evolutionary processes. Despite this potential, museum specimens are difficult to work with because natural postmortem processes and preservation methods fragment and damage DNA. These problems have restricted geneticists’ ability to use natural history collections primarily by limiting how much of the genome can be surveyed. Recent advances in DNA sequencing technology, however, have radically changed this, making truly genomic studies from museum specimens possible. We review the opportunities and drawbacks of the use of museum specimens, and suggest how to best execute projects when incorporating such samples. Several high-throughput (HT) sequencing methodologies, including whole genome shotgun sequencing, sequence capture, and restriction digests (demonstrated here), can be used with archived biomaterials. PMID:25532801

  14. The use of museum specimens with high-throughput DNA sequencers.

    PubMed

    Burrell, Andrew S; Disotell, Todd R; Bergey, Christina M

    2015-02-01

    Natural history collections have long been used by morphologists, anatomists, and taxonomists to probe the evolutionary process and describe biological diversity. These biological archives also offer great opportunities for genetic research in taxonomy, conservation, systematics, and population biology. They allow assays of past populations, including those of extinct species, giving context to present patterns of genetic variation and direct measures of evolutionary processes. Despite this potential, museum specimens are difficult to work with because natural postmortem processes and preservation methods fragment and damage DNA. These problems have restricted geneticists' ability to use natural history collections primarily by limiting how much of the genome can be surveyed. Recent advances in DNA sequencing technology, however, have radically changed this, making truly genomic studies from museum specimens possible. We review the opportunities and drawbacks of the use of museum specimens, and suggest how to best execute projects when incorporating such samples. Several high-throughput (HT) sequencing methodologies, including whole genome shotgun sequencing, sequence capture, and restriction digests (demonstrated here), can be used with archived biomaterials. Copyright © 2014 Elsevier Ltd. All rights reserved.

  15. Rapid multiplexed genotyping of simple tandem repeats using capture and high-throughput sequencing

    PubMed Central

    Guilmatre, Audrey; Highnam, Gareth; Borel, Christelle; Mittelman, David; Sharp, Andrew J.

    2013-01-01

    Although simple tandem repeats (STRs) comprise ~2% of the human genome and represent an important source of polymorphism, this class of variation remains understudied. We have developed a cost-effective strategy for performing targeted enrichment of STR regions that utilizes capture probes targeting the flanking sequences of STR loci, enabling specific capture of DNA fragments containing STRs for subsequent high-throughput sequencing. Utilizing a capture design targeting 6,243 STR loci <94bp and multiplexing eight individuals in a single Illumina HiSeq2000 sequencing lane we were able to call genotypes in at least one individual for 67.5% of the targeted STRs. We observed a strong relationship between (G+C) content and genotyping rate. STRs with moderate (G+C) content were recovered with >90% success rate, while only 12% of STRs with ≥80% (G+C) were genotyped in our assay. Analysis of a parent-offspring trio, complete hydatidiform mole samples, repeat analyses of the same individual, and Sanger sequencing-based validation indicated genotyping error rates between 7.6–12.4%. The majority of such errors were a single repeat unit at mono- or dinucleotide repeats. Altogether, our STR capture assay represents a cost-effective method that enables multiplexed genotyping of thousands of STR loci suitable for large scale population studies. PMID:23696428

  16. Source quality variations tied to sequence development: Integration of physical and chemical aspects, Lower to Middle Triassic, western Barents Sea

    SciTech Connect

    Bohacs, K.M.; Isaksen, G.H.

    1991-03-01

    Triassic mudrocks from the Barents Sea area demonstrate the covariance of physical and chemical properties of mudrocks deposited in shelfal environments and the aspect of depositional sequences in distal settings. The tie of physical parameters to chemical character within a detailed sequence-stratigraphic framework enables the construction of depositional-facies models to predict organic-matter content and quality. This allows the explorer to more closely constrain and predict the nature of potential source rocks using seismic and well-log data. Changes in lithology, bedding geometry, sedimentary structures, body and trace-fossil assemblages, and inorganic, bulk-organic, and molecular geochemistry revealed the detailed depositional environments. The depositional environments stack predictably, according to their position in the depositional sequence: from aerobic lower-shoreface to offshore-transition environments in lowstand systems tracts to dysaerobic-anaerobic distal open-marine-shelf environments in transgressive and early highstand systems tracts. Quantitative molecular geochemistry also revealed variations within this distal setting and strong covariance with sequence position. Input of organic matter from terrigenous higher plants dominates the lowstands, whereas marine-algal organic matter is most prevalent within transgressive and highstand systems tracts. Specifically, the abundance of C30 steranes, total steranes, and moretane reflected development of the sequences.

  17. Sequence variation of a novel heptahelical leucocyte receptor through alternative transcript formation.

    PubMed Central

    Barella, L; Loetscher, M; Tobler, A; Baggiolini, M; Moser, B

    1995-01-01

    Chemoattractants, including chemokines such as interleukin 8 (IL-8) and related proteins, activate leucocytes via seven-transmembrane-domain G-protein-coupled receptors. A cDNA for a novel receptor of this kind consisting of 327 amino acids was isolated from a human blood monocyte cDNA library. The polypeptide, termed monocyte-derived receptor 15 (MDR15), is an alternative form of the Burkitt's lymphoma receptor 1 (BLR1) encoded by a human Burkitt's lymphoma cDNA [Dobner, Wolf, Emrich and Lipp (1992) Eur. J. Immunol. 22, 2795-2799]. MDR15 and BLR1 cDNAs differ in the 5' region, where the open reading frame of MDR15 is shorter by 45 codons. Southern-blot analysis indicates that the two transcripts for MDR15 and BLR1 are encoded by the same gene. Northern-blot analysis using a probe that hybridizes with both mRNAs demonstrated high-level expression in chronic B-lymphoid leukaemia and non-Hodgkin's lymphoma cells and, to a lesser extent, peripheral blood monocytes and lymphocytes. Reverse transcription-PCR studies with MDR15- and BLR1-specific primers showed similar levels of transcripts for both receptors in RNA that was positive in Northern-blot analysis. MDR15 and BLR1 have high structural similarity to receptors for human IL-8 (about 40% amino acid identity) and other chemokines. However, none of a series of radiolabelled chemokines (IL-8, NAP-2, GRO alpha, PF4, IP10, MCP-1, MCP-2, MCP-3, I-309, RANTES and MIP-1 alpha) and other ligands (C3a and leukotriene B4) bound to Jurkat transfectants that stably expressed either MDR15 or BLR1 mRNA. The fact that MDR15 and BLR1 are expressed on leucocytes and show marked sequence similarity to chemokine receptors suggests the existence of as yet unidentified chemokines. Alternative transcript formation affecting the 5'-terminal part of the coding region may be a way to modify ligand-binding selectivity. PMID:7639692

  18. Unusual and strongly structured sequence variation in a complex satellite DNA family from the nematode Meloidogyne chitwoodi.

    PubMed

    Castagnone-Sereno, P; Leroy, H; Semblat, J P; Leroy, F; Abad, P; Zijlstra, C

    1998-02-01

    An AluI satellite DNA family has been isolated in the genome of the root-knot nematode Meloidogyne chitwoodi. This repeated sequence was shown to be present at approximately 11,400 copies per haploid genome, and represents about 3.5% of the total genomic DNA. Nineteen monomers were cloned and sequenced. Their length ranged from 142 to 180 bp, and their A + T content was high (from 65.7 to 79.1%), with frequent runs of As and Ts. An unexpected heterogeneity in primary structure was observed between monomers, and multiple alignment analysis showed that the 19 repeats could be unambiguously clustered in six subfamilies. A consensus sequence has been deduced for each subfamily, within which the number of positions conserved is very high, ranging from 86.7% to 98.6%. Even though blocks of conserved regions could be observed, multiple alignment of the six consensus sequences did not enable the establishment of a general unambiguous consensus sequence. Screening of the six consensus sequences for evidence of internal repeated subunits revealed a 6-bp motif (AAATTT), present in both direct and inverted orientation. This motif was found up to nine times in the consensus sequences, also with the occurrence of degenerated subrepeats. Along with the meiotic parthenogenetic mode of reproduction of this nematode, such structural features may argue for the evolution of this satellite DNA family either (1) from a common ancestral sequence by amplification followed by mechanisms of sequence divergence, or (2) through independent mutations of the ancestral sequence in isolated amphimictic nematode populations and subsequent hybridization events. Overall, our results suggest the ancient origin of this satellite DNA family, and may reflect for M. chitwoodi a phylogenetic position close to the ancestral amphimictic forms of root-knot nematodes.

  19. Population whole-genome bisulfite sequencing across two tissues highlights the environment as the principal source of human methylome variation.

    PubMed

    Busche, Stephan; Shao, Xiaojian; Caron, Maxime; Kwan, Tony; Allum, Fiona; Cheung, Warren A; Ge, Bing; Westfall, Susan; Simon, Marie-Michelle; Barrett, Amy; Bell, Jordana T; McCarthy, Mark I; Deloukas, Panos; Blanchette, Mathieu; Bourque, Guillaume; Spector, Timothy D; Lathrop, Mark; Pastinen, Tomi; Grundberg, Elin

    2015-12-23

    CpG methylation variation is involved in human trait formation and disease susceptibility. Analyses within populations have been biased towards CpG-dense regions through the application of targeted arrays. We generate whole-genome bisulfite sequencing data for approximately 30 adipose and blood samples from monozygotic and dizygotic twins for the characterization of non-genetic and genetic effects at single-site resolution. Purely invariable CpGs display a bimodal distribution with enrichment of unmethylated CpGs and depletion of fully methylated CpGs in promoter and enhancer regions. Population-variable CpGs account for approximately 15-20 % of total CpGs per tissue, are enriched in enhancer-associated regions and depleted in promoters, and single nucleotide polymorphisms at CpGs are a frequent confounder of extreme methylation variation. Differential methylation is primarily non-genetic in origin, with non-shared environment accounting for most of the variance. These non-genetic effects are mainly tissue-specific. Tobacco smoking is associated with differential methylation in blood with no evidence of this exposure impacting cell counts. Opposite to non-genetic effects, genetic effects of CpG methylation are shared across tissues and thus limit inter-tissue epigenetic drift. CpH methylation is rare, and shows similar characteristics of variation patterns as CpGs. Our study highlights the utility of low pass whole-genome bisulfite sequencing in identifying methylome variation beyond promoter regions, and suggests that targeting the population dynamic methylome of tissues requires assessment of understudied intergenic CpGs distal to gene promoters to reveal the full extent of inter-individual variation.

  20. Assessing population-level variation in the mitochondrial genome of Euphausia superba using 454 next-generation sequencing.

    PubMed

    Johansson, Mattias; Duda, Elizabeth; Sremba, Angela; Banks, Michael; Peterson, William

    2012-05-01

    The Antarctic krill (Euphausia superba Dana 1852) is widely distributed throughout the Southern Ocean, where it provides a key link between primary producers and upper trophic levels and supports a major commercial fishery. Despite its ecological and commercial importance, genetic population structure of the Antarctic krill remains poorly described. In an attempt to illuminate genetic markers for future population and phylogenetic analysis, five E. superba mitogenomes, from samples collected west of the Antarctic Peninsula, were sequenced using new 454 next-generation sequencing techniques. The sequences, of lengths between 13,310 and 13,326 base pairs, were then analyzed in the context of two previously-published near-complete sequences. Sequences revealed relatively well-conserved partial mitochondrial genomes which included complete sequences for 11 of 13 protein-coding genes, 16 of 23 tRNAs, and the large ribosomal subunit. Partial sequences were also recovered for cox1 and the small ribosomal subunit. Sequence analysis suggested that the cox2, nad5, and nad6 genes would be the best candidates for future population genetics analyses, due to their high number of variable sites. Future work to reveal the noncoding control region remains.

  1. Metagenomic study of the oral microbiota by Illumina high-throughput sequencing

    PubMed Central

    Lazarevic, Vladimir; Whiteson, Katrine; Huse, Susan; Hernandez, David; Farinelli, Laurent; Østerås, Magne; Schrenzel, Jacques; François, Patrice

    2013-01-01

    To date, metagenomic studies have relied on the utilization and analysis of reads obtained using 454 pyrosequencing to replace conventional Sanger sequencing. After extensively scanning the 16S ribosomal RNA (rRNA) gene, we identified the V5 hypervariable region as a short region providing reliable identification of bacterial sequences available in public databases such as the Human Oral Microbiome Database. We amplified samples from the oral cavity of three healthy individuals using primers covering an ~82-base segment of the V5 loop, and sequenced using the Illumina technology in a single orientation. We identified 135 genera or higher taxonomic ranks from the resulting 1,373,824 sequences. While the abundances of the most common phyla (Firmicutes, Proteobacteria, Actinobacteria, Fusobacteria and TM7) are largely comparable to previous studies, Bacteroidetes were less present. Potential sources for this difference include classification bias in this region of the 16S rRNA gene, human sample variation, sample preparation and primer bias. Using an Illumina sequencing approach, we achieved a much greater depth of coverage than previous oral microbiota studies, allowing us to identify several taxa not yet discovered in these types of samples, and to assess that at least 30,000 additional reads would be required to identify only one additional phylotype. The evolution of high-throughput sequencing technologies, and their subsequent improvements in read length enable the utilization of different platforms for studying communities of complex flora. Access to large amounts of data is already leading to a better representation of sample diversity at a reasonable cost. PMID:19796657

  2. A combination of PhP typing and β-d-glucuronidase gene sequence variation analysis for differentiation of Escherichia coli from humans and animals.

    PubMed

    Masters, N; Christie, M; Katouli, M; Stratton, H

    2015-06-01

    We investigated the usefulness of the β-d-glucuronidase gene variance in Escherichia coli as a microbial source tracking tool using a novel algorithm for comparison of sequences from a prescreened set of host-specific isolates using a high-resolution PhP typing method. A total of 65 common biochemical phenotypes belonging to 318 E. coli strains isolated from humans and domestic and wild animals were analysed for nucleotide variations at 10 loci along a 518 bp fragment of the 1812 bp β-d-glucuronidase gene. Neighbour-joining analysis of loci variations revealed 86 (76.8%) human isolates and 91.2% of animal isolates were correctly identified. Pairwise hierarchical clustering improved assignment; where 92 (82.1%) human and 204 (99%) animal strains were assigned to their respective cluster. Our data show that initial typing of isolates and selection of common types from different hosts prior to analysis of the β-d-glucuronidase gene sequence improves source identification. We also concluded that numerical profiling of the nucleotide variations can be used as a valuable approach to differentiate human from animal E. coli. This study signifies the usefulness of the β-d-glucuronidase gene as a marker for differentiating human faecal pollution from animal sources.

  3. Associations between sequence variations in the mitochondrial DNA D-loop region and outcome of hepatocellular carcinoma

    PubMed Central

    LI, SHILAI; WAN, PEIQI; PENG, TAO; XIAO, KAIYIN; SU, MING; SHANG, LIMING; XU, BANGHAO; SU, ZHIXIONG; YE, XINPING; PENG, NING; QIN, QUANLIN; LI, LEQUN

    2016-01-01

    Associations between mitochondrial DNA (mtDNA) polymorphisms or mutations and the prognoses of cancer have been investigated previously, but the results have been ambiguous. In the present study, the associations between sequence variations in the mtDNA D-loop region and the outcomes of patients with hepatocellular carcinoma (HCC) were analysed. A total of 140 patients with HCC (123 males and 17 females), who were hospitalised to undergo radical resection, were studied. Polymerase chain reaction and direct sequencing were performed to detect the sequence variations in the mtDNA D-loop region. Multivariate and univariate analyses were conducted to determine important factors in the prognosis of HCC. A total of 150 point sequence variations were observed in the 140 cases (13 point mutations, 8 insertions, 20 deletions and 116 polymorphisms). The variation rate was 13.4% (150/1,122). mtDNA nucleotide 150 (C/T) was an independent factor in the logistic regression for early/late recurrence of HCC. Patients with 150T appeared to have later recurrences. In a Cox proportional hazards regression model, hepatitis B virus DNA, Child-Pugh class, differentiation degree, tumour-node-metastasis (TNM) stage, nucleotide 16263 (T/C) and nucleotide 315 (N/insertion C) were independent factors for tumour-free survival time. Patients with the 16263T allele had a greater tumour-free survival time than patients with the 16263C allele. Similarly, patients with 315 insertion C had a superior tumour-free survival time when compared with patients with 315 N (normal). In the Cox proportional hazards regression model, recurrence type (early/late), Child-Pugh class, TNM stage and adjuvant treatment after tumour recurrence (none or one/more than one treatment) were independent factors for overall survival. None of the mtDNA variations served as independent factors. Patients with late recurrence, Child-Pugh class A, and low TNM stages and/or those who received more than one adjuvant treatment

  4. HOXA10 and HOXA13 sequence variations in human female genital malformations including congenital absence of the uterus and vagina.

    PubMed

    Ekici, Arif B; Strissel, Pamela L; Oppelt, Patricia G; Renner, Stefan P; Brucker, Sara; Beckmann, Matthias W; Strick, Reiner

    2013-04-15

    Congenital genital malformations in the female population have an estimated incidence of 5 per 1000 and are associated with infertility, abortion, stillbirth, preterm delivery and other organ abnormalities. Complete aplasia of the uterus, cervix and upper vagina (Mayer-Rokitansky-Küster-Hauser (MRKH) syndrome) has an incidence of 1 per 4000 female live births. The molecular etiology of congenital genital malformations, including MRKH, remains unknown to date. The homeobox (HOX) genes HOXA10 and HOXA13 are involved in the development of human genitalia. In this investigation, HOXA10 and HOXA13 genes of 20 patients with the MRKH syndrome, 7 non-MRKH patients with genital malformations and 53 control women were sequenced to assess for DNA variations. A total of 14 DNA sequence variations (10 novel and 4 known) within exonic and untranslated regions were detected in HOXA10 and HOXA13 among our coh