Science.gov

Sample records for high sequence variation

  1. Application of high-throughput sequencing for studying genomic variations in congenital heart disease.

    PubMed

    Dorn, Cornelia; Grunert, Marcel; Sperling, Silke R

    2014-01-01

    Congenital heart diseases (CHD) represent the most common birth defect in humans. The majority of cases are caused by a combination of complex genetic alterations and environmental influences. In the past, many disease-causing mutations have been identified; however, there is still a large proportion of cardiac malformations with unknown precise origin. High-throughput sequencing technologies established in recent years offer novel opportunities to further study the genetic background underlying the disease. In this review, we provide a roadmap for designing and analyzing high-throughput sequencing studies focused on CHD, but also with general applicability to other complex diseases. The three main next-generation sequencing (NGS) platforms including their particular advantages and disadvantages are presented. To identify potentially disease-related genomic variations and genes, different filtering steps and gene prioritization strategies are discussed. In addition, available control datasets based on NGS are summarized. Finally, we provide an overview of current studies already using NGS technologies and showing that these techniques will help to further unravel the complex genetics underlying CHD.

  2. Comparative analysis of Mycobacterium tuberculosis pe and ppe genes reveals high sequence variation and an apparent absence of selective constraints.

    PubMed

    McEvoy, Christopher R E; Cloete, Ruben; Müller, Borna; Schürch, Anita C; van Helden, Paul D; Gagneux, Sebastien; Warren, Robin M; Gey van Pittius, Nicolaas C

    2012-01-01

    Mycobacterium tuberculosis complex (MTBC) genomes contain 2 large gene families termed pe and ppe. The function of pe/ppe proteins remains enigmatic but studies suggest that they are secreted or cell surface associated and are involved in bacterial virulence. Previous studies have also shown that some pe/ppe genes are polymorphic, a finding that suggests involvement in antigenic variation. Using comparative sequence analysis of 18 publicly available MTBC whole genome sequences, we have performed alignments of 33 pe (excluding pe_pgrs) and 66 ppe genes in order to detect the frequency and nature of genetic variation. This work has been supplemented by whole gene sequencing of 14 pe/ppe (including 5 pe_pgrs) genes in a cohort of 40 diverse and well defined clinical isolates covering all the main lineages of the M. tuberculosis phylogenetic tree. We show that nsSNPs in pe (excluding pgrs) and ppe genes are 3.0 and 3.3 times higher than in non-pe/ppe genes, respectively, and that numerous other mutation types are also present at a high frequency. It has previously been shown that non-pe/ppe M. tuberculosis genes display a remarkably low level of purifying selection. Here, we also show that compared to these genes those of the pe/ppe families show a further reduction of selection pressure that suggests neutral evolution. This is inconsistent with the positive selection pressure of "classical" antigenic variation. Finally, by analyzing such a large number of genes we were able to detect large differences in mutation type and frequency between both individual genes and gene sub-families. The high variation rates and absence of selective constraints provide valuable insights into potential pe/ppe function. Since pe/ppe proteins are highly antigenic and have been studied as potential vaccine components these results should also prove informative for aspects of M. tuberculosis vaccine design.

  3. Combining Natural Sequence Variation with High Throughput Mutational Data to Reveal Protein Interaction Sites

    PubMed Central

    Melamed, Daniel; Young, David L.; Miller, Christina R.; Fields, Stanley

    2015-01-01

    Many protein interactions are conserved among organisms despite changes in the amino acid sequences that comprise their contact sites, a property that has been used to infer the location of these sites from protein homology. In an inter-species complementation experiment, a sequence present in a homologue is substituted into a protein and tested for its ability to support function. Therefore, substitutions that inhibit function can identify interaction sites that changed over evolution. However, most of the sequence differences within a protein family remain unexplored because of the small-scale nature of these complementation approaches. Here we use existing high throughput mutational data on the in vivo function of the RRM2 domain of the Saccharomyces cerevisiae poly(A)-binding protein, Pab1, to analyze its sites of interaction. Of 197 single amino acid differences in 52 Pab1 homologues, 17 reduce the function of Pab1 when substituted into the yeast protein. The majority of these deleterious mutations interfere with the binding of the RRM2 domain to eIF4G1 and eIF4G2, isoforms of a translation initiation factor. A large-scale mutational analysis of the RRM2 domain in a two-hybrid assay for eIF4G1 binding supports these findings and identifies peripheral residues that make a smaller contribution to eIF4G1 binding. Three single amino acid substitutions in yeast Pab1 corresponding to residues from the human orthologue are deleterious and eliminate binding to the yeast eIF4G isoforms. We create a triple mutant that carries these substitutions and other humanizing substitutions that collectively support a switch in binding specificity of RRM2 from the yeast eIF4G1 to its human orthologue. Finally, we map other deleterious substitutions in Pab1 to inter-domain (RRM2–RRM1) or protein-RNA (RRM2–poly(A)) interaction sites. Thus, the combined approach of large-scale mutational data and evolutionary conservation can be used to characterize interaction sites at single

  4. Application of high-throughput genome sequencing to intrapathovar variation in Pseudomonas syringae.

    PubMed

    Studholme, David J

    2011-10-01

    One reason for the success of Pseudomonas syringae as a model pathogen has been the availability of three complete genome sequences since 2005. Now, at the beginning of 2011, more than 25 strains of P. syringae have been sequenced and many more will soon be released. To date, published analyses of P. syringae have been largely descriptive, focusing on catalogues of genetic differences among strains and between species. Numerous powerful statistical tools are now available that have yet to be applied to P. syringae genomic data for robust and quantitative reconstruction of evolutionary events. The aim of this review is to provide a snapshot of the current status of P. syringae genome sequence data resources, including very recent and unpublished studies, and thereby demonstrate the richness of resources available for this species. Furthermore, certain specific opportunities and challenges in making the best use of these data resources are highlighted.

  5. Comprehensive characterization of human genome variation by high coverage whole-genome sequencing of forty four Caucasians.

    PubMed

    Shen, Hui; Li, Jian; Zhang, Jigang; Xu, Chao; Jiang, Yan; Wu, Zikai; Zhao, Fuping; Liao, Li; Chen, Jun; Lin, Yong; Tian, Qing; Papasian, Christopher J; Deng, Hong-Wen

    2013-01-01

    Whole genome sequencing studies are essential to obtain a comprehensive understanding of the vast pattern of human genomic variations. Here we report the results of a high-coverage whole genome sequencing study for 44 unrelated healthy Caucasian adults, each sequenced to over 50-fold coverage (averaging 65.8×). We identified approximately 11 million single nucleotide polymorphisms (SNPs), 2.8 million short insertions and deletions, and over 500,000 block substitutions. We showed that, although previous studies, including the 1000 Genomes Project Phase 1 study, have catalogued the vast majority of common SNPs, many of the low-frequency and rare variants remain undiscovered. For instance, approximately 1.4 million SNPs and 1.3 million short indels that we found were novel to both the dbSNP and the 1000 Genomes Project Phase 1 data sets, the majority of which (∼96%) have a minor allele frequency less than 5%. On average, each individual genome carried ∼3.3 million SNPs and ∼492,000 indels/block substitutions, including approximately 179 variants that were predicted to cause loss of function of the gene products. Moreover, each individual genome carried an average of 44 such loss-of-function variants in a homozygous state, which would completely "knock out" the corresponding genes. Across all the 44 genomes, a total of 182 genes were "knocked out" in at least one individual genome, among which 46 genes were "knocked out" in over 30% of our samples, suggesting that a number of genes are commonly "knocked out" in general populations. Gene ontology analysis suggested that these commonly "knocked-out" genes are enriched in biological processes related to antigen processing and immune response. Our results contribute towards a comprehensive characterization of human genomic variation, especially for less-common and rare variants, and provide an invaluable resource for future genetic studies of human variation and diseases.

  6. Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    SciTech Connect

    Gordon, Sean

    2013-03-01

    Sean Gordon of the USDA on "Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions" at the 8th Annual Genomics of Energy & Environment Meeting on March 27, 2013 in Walnut Creek, Calif.

  7. Comparative Mitogenomics of the Genus Odontobutis (Perciformes: Gobioidei: Odontobutidae) Revealed Conserved Gene Rearrangement and High Sequence Variations

    PubMed Central

    Ma, Zhihong; Yang, Xuefen; Bercsenyi, Miklos; Wu, Junjie; Yu, Yongyao; Wei, Kaijian; Fan, Qixue; Yang, Ruibin

    2015-01-01

    To understand the molecular evolution of mitochondrial genomes (mitogenomes) in the genus Odontobutis, the mitogenome of Odontobutis yaluensis was sequenced and compared with those of another four Odontobutis species. Our results displayed similar mitogenome features among species in genome organization, base composition, codon usage, and gene rearrangement. The identical gene rearrangement of trnS-trnL-trnH tRNA cluster observed in mitogenomes of these five closely related freshwater sleepers suggests that this unique gene order is conserved within Odontobutis. Additionally, the present gene order and the positions of associated intergenic spacers of these Odontobutis mitogenomes indicate that this unusual gene rearrangement results from tandem duplication and random loss of large-scale gene regions. Moreover, these mitogenomes exhibit a high level of sequence variation, mainly due to the differences of corresponding intergenic sequences in gene rearrangement regions and the heterogeneity of tandem repeats in the control regions. Phylogenetic analyses support Odontobutis species with shared gene rearrangement forming a monophyletic group, and the interspecific phylogenetic relationships are associated with structural differences among their mitogenomes. The present study contributes to understanding the evolutionary patterns of Odontobutidae species. PMID:26492246

  8. Comparative Mitogenomics of the Genus Odontobutis (Perciformes: Gobioidei: Odontobutidae) Revealed Conserved Gene Rearrangement and High Sequence Variations.

    PubMed

    Ma, Zhihong; Yang, Xuefen; Bercsenyi, Miklos; Wu, Junjie; Yu, Yongyao; Wei, Kaijian; Fan, Qixue; Yang, Ruibin

    2015-10-20

    To understand the molecular evolution of mitochondrial genomes (mitogenomes) in the genus Odontobutis, the mitogenome of Odontobutis yaluensis was sequenced and compared with those of another four Odontobutis species. Our results displayed similar mitogenome features among species in genome organization, base composition, codon usage, and gene rearrangement. The identical gene rearrangement of trnS-trnL-trnH tRNA cluster observed in mitogenomes of these five closely related freshwater sleepers suggests that this unique gene order is conserved within Odontobutis. Additionally, the present gene order and the positions of associated intergenic spacers of these Odontobutis mitogenomes indicate that this unusual gene rearrangement results from tandem duplication and random loss of large-scale gene regions. Moreover, these mitogenomes exhibit a high level of sequence variation, mainly due to the differences of corresponding intergenic sequences in gene rearrangement regions and the heterogeneity of tandem repeats in the control regions. Phylogenetic analyses support Odontobutis species with shared gene rearrangement forming a monophyletic group, and the interspecific phylogenetic relationships are associated with structural differences among their mitogenomes. The present study contributes to understanding the evolutionary patterns of Odontobutidae species.

  9. Complete genome sequence analysis of goatpox virus isolated from China shows high variation.

    PubMed

    Zeng, Xiancheng; Chi, Xuelin; Li, Wei; Hao, Wenbo; Li, Ming; Huang, Xiaohong; Huang, Yifan; Rock, Daniel L; Luo, Shuhong; Wang, Shihua

    2014-09-17

    Goatpox virus (GTPV), a member of the Capripoxvirus genus of the Poxviridae family, is the causative agent of variola caprina (goatpox). GTPV can cause significant economic losses of domestic ruminants in endemic regions and can threaten breeding stocks. In this study, we report on the compilation of the complete genomic sequence of an isolated GTPV field strain FZ (GTPV_FZ). The 150,194 bp GTPV genome consists of a central coding region bounded by two identical 2,301 bp inverted terminal repeats and contains 151 putative genes. Comparative genomic analysis reveals that the apparent genetic relationships among Capripoxviruses are close, but sufficient genomic variants in the field isolate strain FZ have been identified to distinguish it from other GTPV strains and other Capripoxvirus species. Phylogenetic analysis based on the p32 and complete GTPV genome can be used to differentiate SPPVs, GTPVs and LSDVs. These data may contribute to the epidemiological study of the Chinese capripoxvirus and help to develop more specific detection methods to distinguish GTPVs, SPPVs and LSDVs.

  10. Local sequence assembly reveals a high-resolution profile of somatic structural variations in 97 cancer genomes.

    PubMed

    Zhuang, Jiali; Weng, Zhiping

    2015-09-30

    Genomic structural variations (SVs) are pervasive in many types of cancers. Characterizing their underlying mechanisms and potential molecular consequences is crucial for understanding the basic biology of tumorigenesis. Here, we engineered a local assembly-based algorithm (laSV) that detects SVs with high accuracy from paired-end high-throughput genomic sequencing data and pinpoints their breakpoints at single base-pair resolution. By applying laSV to 97 tumor-normal paired genomic sequencing datasets across six cancer types produced by The Cancer Genome Atlas Research Network, we discovered that non-allelic homologous recombination is the primary mechanism for generating somatic SVs in acute myeloid leukemia. This finding contrasts with results for the other five types of solid tumors, in which non-homologous end joining and microhomology end joining are the predominant mechanisms. We also found that the genes recursively mutated by single nucleotide alterations differed from the genes recursively mutated by SVs, suggesting that these two types of genetic alterations play different roles during cancer progression. We further characterized how the gene structures of the oncogene JAK1 and the tumor suppressors KDM6A and RB1 are affected by somatic SVs and discussed the potential functional implications of intergenic SVs.

  11. High sensitivity of the single-strand conformation polymorphism method for detecting sequence variations in the low-density lipoprotein receptor gene validated by DNA sequencing.

    PubMed

    Jensen, H K; Jensen, L G; Hansen, P S; Faergeman, O; Gregersen, N

    1996-08-01

    We designed oligonucleotide primer pairs to amplify the promoter region, the translated exon sequences, and the flanking intron sequences of all 18 exons of the LDL receptor gene to compare the ability of the PCR single-strand conformation polymorphism (PCR-SSCP) method with semiautomated solid-phase genomic DNA sequencing to detect sequence variations. In 20 apparently unrelated Danish patients with a clinical diagnosis of heterozygous familial hypercholesterolemia (FH), we identified 13 different mutations in the LDL receptor gene: two silent (C331C, N494N); five missense (W66G, E119K, T383P, W556S, T705I); one nonsense (W23X); three splice-site (313+1G-->A, 1061-8T-->C, 1846-1G-->A); and two frameshift (335del10, 1650delG) mutations. Four of these mutations, N494N, T383P, 1061-8T-->C, and W556S, have not been reported earlier. The pathogenicity of the T383P, 1061-8T-->C, and W556S mutations remains to be established by in vitro mutagenesis and transfection studies. One patient had three mutations (335del10, 1061-8T-->C, and T705I) on the same allele. Further, nine well-known polymorphisms were detectable with this methodological setup. Direct DNA sequencing of the PCR products used for the SSCP analysis did not reveal any sequence variations not detected by the PCR-SSCP method. In two patients we did not detect any mutation by either method. We conclude that the PCR-SSCP analysis, performed as described here, is as sensitive and efficient as DNA sequencing in the ability to identify the sequence variations in the LDL receptor gene of the patients with heterozygous FH of this study.

  12. High-resolution melt analysis to detect sequence variations in highly homologous gene regions: application to CYP2B6.

    PubMed

    Twist, Greyson P; Gaedigk, Roger; Leeder, J Steven; Gaedigk, Andrea

    2013-06-01

    High-resolution melt (HRM) analysis using 'release-on-demand' dyes, such as EvaGreen®, has the potential to resolve complex genotypes in situations where genotype interpretation is complicated by the presence of pseudogenes or allelic variants in close proximity to the locus of interest. We explored the utility of HRM to genotype a SNP (785A>G, K262R, rs2279343) that is located within exon 5 of the CYP2B6 gene, which contributes to the metabolism of a number of clinically used drugs. Testing of 785A>G is challenging, but crucial for accurate genotype determination. This SNP is part of multiple known CYP2B6 haplotypes and located in a region that is identical to CYP2B7, a nonfunctional pseudogene. Because small CYP2B6-specific PCR amplicons bracketing 785A>G cannot be generated, we simultaneously amplified both genes. A panel of 235 liver tissue DNAs and five Coriell samples were assessed. Eight CYP2B6/CYP2B7 diplotype combinations were found and a novel variant 769G>A (D257N) was discovered. The frequency of 785G corresponded to those reported for Caucasians and African-Americans. Assay performance was confirmed by CYP2B6 and/or CYP2B7 sequence analysis in a subset of samples, using a preamplified CYP2B6-specific long-range-PCR amplicon as HRM template. Inclusion rather than exclusion of a homologous pseudogene allowed us to devise a sensitive, reliable and affordable assay to test this CYP2B6 SNP. This assay design may be utilized to overcome the challenges and limitations of other methods. Owing to the flexibility of HRM, this assay design can easily be adapted to other gene loci of interest.

  13. Chromospheric variations in main-sequence stars

    NASA Technical Reports Server (NTRS)

    Baliunas, S. L.; Donahue, R. A.; Soon, J. H.; Horne, J. H.; Frazer, J.; Woodard-Eklund, L.; Bradford, M.; Rao, L. M.; Wilson, O. C.; Zhang, Q.

    1995-01-01

    The fluxes in passbands 0.1 nm wide and centered on the Ca II H and K emission cores have been monitored in 111 stars of spectral type F2-M2 on or near the main sequence in a continuation of an observing program started by O. C. Wilson. Most of the measurements began in 1966, with observations scheduled monthly until 1980, when observations were scheduled several times per week. The records, with a long-term precision of about 1.5%, display fluctuations that can be identified with variations on timescales similar to the 11 yr cycle of solar activity as well as axial rotation, and the growth and decay of emitting regions. We present the records of chromospheric emission and general conclusions about variations in surface magnetic activity on timescales greater than 1 yr but less than a few decades. The results for stars of spectral type G0-K5 V indicate a pattern of change in rotation and chromospheric activity on an evolutionary timescale, in which (1) young stars exhibit high average levels of activity, rapid rotation rates, no Maunder minimum phase and rarely display a smooth, cyclic variation; (2) stars of intermediate age (approximately 1-2 Gyr for 1 solar mass) have moderate levels of activity and rotation rates, and occasional smooth cycles; and (3) stars as old as the Sun and older have slower rotation rates, lower activity levels and smooth cycles with occasional Maunder minimum phases.

  14. Transcriptome analysis of the variations between autotetraploid Paulownia tomentosa and its diploid using high-throughput sequencing.

    PubMed

    Fan, Guoqiang; Wang, Limin; Deng, Minjie; Niu, Suyan; Zhao, Zhenli; Xu, Enkai; Cao, Xibin; Zhang, Xiaoshen

    2015-08-01

    Timber properties of autotetraploid Paulownia tomentosa are heritable with whole genome duplication, but the molecular mechanisms for the predominant characteristics remain unclear. To illuminate the genetic basis, high-throughput sequencing technology was used to identify the related unigenes. A total of 2677 unigenes were found to be significantly differentially expressed in autotetraploid P. tomentosa. In total, 30 photosynthesis-related, 21 transcription factor-related, and 22 lignin-related differentially expressed unigenes were detected, and the roles of the peroxidase in lignin biosynthesis, MYB DNA-binding proteins, and WRKY proteins associated with the regulation of relevant hormones are extensively discussed. The results provide transcriptome data that may bring a new perspective to explain the polyploidy mechanism in the long growth cycle of plants and may help future Paulownia breeding.

  15. High Frequency of Copy Number Variations and Sequence Variants at CYP21A2 Locus: Implication for the Genetic Diagnosis of 21-Hydroxylase Deficiency

    PubMed Central

    Parajes, Silvia; Quinteiro, Celsa; Domínguez, Fernando; Loidi, Lourdes

    2008-01-01

    Background: The systematic study of the human genome indicates that the inter-individual variability is greater than expected and it is not only related to sequence polymorphisms but also to gene copy number variants (CNVs). Congenital Adrenal Hyperplasia due to 21-hydroxylase deficiency (21OHD) is the most common autosomal recessive disorder with a carrier frequency of 1∶25 to 1∶10. The gene that encodes 21-hydroxylase enzyme, CYP21A2, is considered to be one of the most polymorphic human genes. Copy number variations, such as deletions, which are severe mutations common in 21OHD patients, or gene duplications, which have been reported as rare events, have also been described. The correct characterization of 21OHD alleles is important for disease carrier detection and genetic counselling. Methodology and Findings: CYP21A2 genotyping by sequencing has been performed in a random sample of the Spanish population, where 144 individuals recruited from university students and employees of the hospital were studied. The frequency of CYP21A2 mutated alleles in our sample was 15.3% (77.3% were mild mutations, 9% were severe mutations and 13.6% were novel variants). Gene dosage assessment was also performed when CYP21A2 gene duplication was suspected. This analysis showed that 7% of individuals bore a chromosome with a duplicated CYP21A2 gene, where one of the copies was mutated. Conclusions: As far as we know, the present study has shown the highest frequency of 21OHD carriers reported by a genotyping analysis. In addition, a high frequency of alleles with CYP21A2 duplications, which could be misinterpreted as 21OHD alleles, was found. Moreover, a high frequency of novel genetic variations with an unknown effect on 21-hydroxylase activity was also found. The high frequency of gene duplications, as well as novel variations, should be considered since they have an important involvement in carrier testing and genetic counseling. PMID:18478071

  16. Comprehensive Sequence Analysis of the Human IL23A Gene Defines New Variation Content and High Rate of Evolutionary Conservation

    PubMed Central

    Tindall, Elizabeth A.; Hayes, Vanessa M.

    2010-01-01

    A newly described heterodimeric cytokine, interleukin-23 (IL-23) is emerging as a key player in both the innate and the adaptive T helper (Th)17 driven immune response as well as an initiator of several autoimmune diseases. The rate-limiting element of IL-23 production is believed to be driven by expression of the unique p19 subunit encoded by IL23A. We set out to perform comprehensive DNA sequencing of this previously under-studied gene in 96 individuals from two evolutionary distinct human population groups, Southern African Bantu and European. We observed a total of 33 different DNA variants within these two groups, 22 (67%) of which are currently not reported in any available database. We further demonstrate both inter-population and intra-species sequence conservation within the coding and known regulatory regions of IL23A, supporting a critical physiological role for IL-23. We conclude that IL23A may have undergone positive selection pressure directed towards conservation, suggesting that functional genetic variants within IL23A will have a significant impact on the host immune response. PMID:20154336

  17. Identification of Sequence Variation in the Apolipoprotein A2 Gene and Their Relationship with Serum High-Density Lipoprotein Cholesterol Levels

    PubMed Central

    Bandarian, Fatemeh; Daneshpour, Maryam Sadat; Hedayati, Mehdi; Naseri, Mohsen; Azizi, Fereidoun

    2016-01-01

    Background: Apolipoprotein A2 (APOA2) is the second major apolipoprotein of the high-density lipoprotein cholesterol (HDL-C). The study aim was to identify APOA2 gene variation in individuals within two extreme tails of HDL-C levels and its relationship with HDL-C level. Methods: This cross-sectional survey was conducted on participants from the Tehran Lipid and Glucose Study (TLGS) at the Research Institute for Endocrine Sciences, Tehran, Iran from April 2012 to February 2013. In total, 79 individuals with extreme low HDL-C levels (≤5th percentile for age and gender) and 63 individuals with extreme high HDL-C levels (≥95th percentile for age and gender) were selected. Variants were identified using DNA amplification and direct sequencing. Results: Screening of all exons and the core promoter region of the APOA2 gene identified nine single nucleotide substitutions and one microsatellite; five of which were known and four were new variants. Of these nine variants, two were common tag single nucleotide polymorphisms (SNPs) and seven were rare SNPs. Both exonic substitutions were missense mutations and caused an amino acid change. There was a significant association between the new missense mutation (variant Chr.1:16119226, Ala98Pro) and HDL-C level. Conclusion: Neither of the two common tag SNPs, rs6413453 and rs5082, contributes to the HDL-C trait in the Iranian population, but a new missense mutation in APOA2 in our population has a significant association with HDL-C. PMID:26590203

  18. Analyzing Neisseria gonorrhoeae Pilin Antigenic Variation Using 454 Sequencing Technology

    PubMed Central

    Rotman, Ella; Webber, David M.

    2016-01-01

    Abstract: Many pathogens use homologous recombination to vary surface antigens in order to avoid immune surveillance. Neisseria gonorrhoeae, the bacterium responsible for the sexually transmitted infection gonorrhea, achieves this in part by changing the sequence of the major subunit of the type IV pilus in a process termed pilin antigenic variation (Av). The N. gonorrhoeae chromosome contains one expression locus (pilE) and many promoterless, partial-coding silent copies (pilS) that act as reservoirs for variant pilin information. Pilin Av occurs by high-frequency gene conversion reactions, which transfer pilS sequences into the pilE locus. We have developed a 454 sequencing-based assay to analyze the frequency and characteristics of pilin Av that allows a more robust analysis of pilin Av than previous assays. We used this assay to analyze mutations and conditions previously shown to affect pilin Av, confirming many but not all of the previously reported phenotypes. We show that mutations or conditions that cause growth defects can result in Av phenotypes when analyzed by phase variation-based assays. Adapting the 454 sequencing to analyze pilin Av demonstrates the utility of this technology to analyze any diversity generation system that uses recombination to develop biological diversity. Importance: Measuring and analyzing complex recombination-based systems constitute a major barrier to understanding the mechanisms used to generate diversity. We have analyzed the contributions of many gonococcal mutations or conditions to the process of pilin antigenic variation. PMID:27381912

  19. Sequence variation of 22 autosomal STR loci detected by next generation sequencing.

    PubMed

    Gettings, Katherine Butler; Kiesler, Kevin M; Faith, Seth A; Montano, Elizabeth; Baker, Christine H; Young, Brian A; Guerrieri, Richard A; Vallone, Peter M

    2016-03-01

    Sequencing short tandem repeat (STR) loci allows for determination of repeat motif variations within the STR (or entire PCR amplicon) which cannot be ascertained by size-based PCR fragment analysis. Sanger sequencing has been used in research laboratories to further characterize STR loci, but is impractical for routine forensic use due to the laborious nature of the procedure in general and additional steps required to separate heterozygous alleles. Recent advances in library preparation methods enable high-throughput next generation sequencing (NGS) and technological improvements in sequencing chemistries now offer sufficient read lengths to encompass STR alleles. Herein, we present sequencing results from 183 DNA samples, including African American, Caucasian, and Hispanic individuals, at 22 autosomal forensic STR loci using an assay designed for NGS. The resulting dataset has been used to perform population genetic analyses of allelic diversity by length compared to sequence, and exemplifies which loci are likely to achieve the greatest gains in discrimination via sequencing. Within this data set, six loci demonstrate greater than double the number of alleles obtained by sequence compared to the number of alleles obtained by length: D12S391, D2S1338, D21S11, D8S1179, vWA, and D3S1358. As expected, repeat region sequences which had not previously been reported in forensic literature were identified.

  20. Sequence variation of 22 autosomal STR loci detected by next generation sequencing

    PubMed Central

    Gettings, Katherine Butler; Kiesler, Kevin M.; Faith, Seth A.; Montano, Elizabeth; Baker, Christine H.; Young, Brian A.; Guerrieri, Richard A.; Vallone, Peter M.

    2016-01-01

    Sequencing short tandem repeat (STR) loci allows for determination of repeat motif variations within the STR (or entire PCR amplicon) which cannot be ascertained by size-based PCR fragment analysis. Sanger sequencing has been used in research laboratories to further characterize STR loci, but is impractical for routine forensic use due to the laborious nature of the procedure in general and additional steps required to separate heterozygous alleles. Recent advances in library preparation methods enable high-throughput next generation sequencing (NGS) and technological improvements in sequencing chemistries now offer sufficient read lengths to encompass STR alleles. Herein, we present sequencing results from 183 DNA samples, including African American, Caucasian, and Hispanic individuals, at 22 autosomal forensic STR loci using an assay designed for NGS. The resulting dataset has been used to perform population genetic analyses of allelic diversity by length compared to sequence, and exemplifies which loci are likely to achieve the greatest gains in discrimination via sequencing. Within this data set, six loci demonstrate greater than double the number of alleles obtained by sequence compared to the number of alleles obtained by length: D12S391, D2S1338, D21S11, D8S1179, vWA, and D3S1358. As expected, repeat region sequences which had not previously been reported in forensic literature were identified. PMID:26701720
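
    The distinction drawn in the two records above, between alleles resolved by length and alleles resolved by sequence, can be illustrated with a minimal sketch. The allele strings below are hypothetical toy repeats invented for illustration; they are not taken from the study's data, and `count_alleles` is an illustrative helper, not part of any forensic software.

```python
def count_alleles(sequences):
    """Return (number of alleles distinguishable by length,
    number of alleles distinguishable by sequence)."""
    by_length = {len(s) for s in sequences}   # what size-based fragment analysis can see
    by_sequence = set(sequences)              # what sequencing can see
    return len(by_length), len(by_sequence)


# Hypothetical toy alleles: three share a length (six 4-bp repeats)
# but differ in their internal repeat motif; one is longer.
toy_alleles = [
    "TCTA" * 4 + "TCTG" * 2,
    "TCTA" * 3 + "TCTG" * 3,
    "TCTA" * 6,
    "TCTA" * 7,
]

n_by_length, n_by_sequence = count_alleles(toy_alleles)
print(n_by_length, n_by_sequence)
```

    Alleles of equal length collapse into one class under size-based analysis, so any locus with internal motif variation gains alleles once sequence is read.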

  1. High-Throughput Sequencing Technologies

    PubMed Central

    Reuter, Jason A.; Spacek, Damek; Snyder, Michael P.

    2015-01-01

    Summary: The human genome sequence has profoundly altered our understanding of biology, human diversity and disease. The path from the first draft sequence to our nascent era of personal genomes and genomic medicine has been made possible only because of the extraordinary advancements in DNA sequencing technologies over the past ten years. Here, we discuss commonly used high-throughput sequencing platforms, the growing array of sequencing assays developed around them as well as the challenges facing current sequencing platforms and their clinical application. PMID:26000844

  2. Protein structure prediction from sequence variation

    PubMed Central

    Marks, Debora S; Hopf, Thomas A; Sander, Chris

    2015-01-01

    Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics. PMID:23138306

  3. Using chaos to generate variations on movement sequences

    NASA Astrophysics Data System (ADS)

    Bradley, Elizabeth; Stuart, Joshua

    1998-12-01

    We describe a method for introducing variations into predefined motion sequences using a chaotic symbol-sequence reordering technique. A progression of symbols representing the body positions in a dance piece, martial arts form, or other motion sequence is mapped onto a chaotic trajectory, establishing a symbolic dynamics that links the movement sequence and the attractor structure. A variation on the original piece is created by generating a trajectory with slightly different initial conditions, inverting the mapping, and using special corpus-based graph-theoretic interpolation schemes to smooth any abrupt transitions. Sensitive dependence guarantees that the variation is different from the original; the attractor structure and the symbolic dynamics guarantee that the two resemble one another in both aesthetic and mathematical senses.

  4. Unraveling genomic variation from next generation sequencing data

    PubMed Central

    2013-01-01

    Elucidating the content of a DNA sequence is critical to understanding and decoding the genetic information of any biological system. As next generation sequencing (NGS) techniques have become cheaper and more advanced in throughput over time, great innovations and breakthrough conclusions have been generated in various biological areas. A few of these areas, which are being shaped by the new technological advances, involve evolution of species, microbial mapping, population genetics, genome-wide association studies (GWAS), comparative genomics, variant analysis, gene expression, gene regulation, epigenetics and personalized medicine. While NGS techniques stand as key players in modern biological research, the analysis and interpretation of the vast amount of data produced is not an easy or trivial task and remains a great challenge in the field of bioinformatics. Therefore, efficient tools that cope with information overload, tackle the high complexity and provide meaningful visualizations to ease knowledge extraction are essential. In this article, we briefly refer to the sequencing methodologies and the available equipment to serve these analyses and we describe the data formats of the files which get produced by them. We conclude with a thorough review of tools developed to efficiently store, analyze and visualize such data with emphasis on structural variation analysis and comparative genomics. We finally comment on their functionality, strengths and weaknesses and we discuss how future applications could further develop in this field. PMID:23885890

  5. A case study on the genetic origin of the high oleic acid trait through FAD2-1 DNA sequence variation in safflower (Carthamus tinctorius L.).

    PubMed

    Rapson, Sara; Wu, Man; Okada, Shoko; Das, Alpana; Shrestha, Pushkar; Zhou, Xue-Rong; Wood, Craig; Green, Allan; Singh, Surinder; Liu, Qing

    2015-01-01

    The safflower (Carthamus tinctorius L.) is considered a strongly domesticated species with a long history of cultivation. The hybridization of safflower with its wild relatives has played an important role in the evolution of cultivars and is of particular interest with regards to their production of high quality edible oils. Original safflower varieties were all rich in linoleic acid, while varieties rich in oleic acid have risen to prominence in recent decades. The high oleic acid trait is controlled by a partially recessive allele ol at a single locus OL. The ol allele was found to be a defective microsomal oleate desaturase FAD2-1. Here we present DNA sequence data and Southern blot analysis suggesting that there has been an ancient hybridization and introgression of the FAD2-1 gene into C. tinctorius from its wild relative C. palaestinus. It is from this gene that FAD2-1Δ was derived more recently. Identification and characterization of the genetic origin and diversity of FAD2-1 could aid safflower breeders in reducing population size and generations required for the development of new high oleic acid varieties by using perfect molecular marker-assisted selection.

  6. The regions of sequence variation in caulimovirus gene VI.

    PubMed

    Sanger, M; Daubert, S; Goodman, R M

    1991-06-01

    The sequence of gene VI from figwort mosaic virus (FMV) clone x4 was determined and compared with that previously published for FMV clone DxS. Both clones originated from the same virus isolation, but the virus used to clone DxS was propagated extensively in a host of a different family prior to cloning whereas that used to clone x4 was not. Differences in the amino acid sequence inferred from the DNA sequences occurred in two clusters. An N-terminal conserved region preceded two regions of variation separated by a central conserved region. Variation in cauliflower mosaic virus (CaMV) gene VI sequences, all of which were derived from virus isolates from hosts from one host family, was similar to that seen in the FMV comparison, though the extent of variation was less. Alignment of gene VI domains from FMV and CaMV revealed regions of amino acid sequence identical in both viruses within the conserved regions. The similarity in the pattern of conserved and variable domains of these two viruses suggests common host-interactive functions in caulimovirus gene VI homologues, and possibly an analogy between caulimoviruses and certain animal viruses in the influence of the host on sequence variability of viral genes.

  7. Mapping copy number variation by population-scale genome sequencing.

    PubMed

    Mills, Ryan E; Walter, Klaudia; Stewart, Chip; Handsaker, Robert E; Chen, Ken; Alkan, Can; Abyzov, Alexej; Yoon, Seungtai Chris; Ye, Kai; Cheetham, R Keira; Chinwalla, Asif; Conrad, Donald F; Fu, Yutao; Grubert, Fabian; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Iakoucheva, Lilia M; Iqbal, Zamin; Kang, Shuli; Kidd, Jeffrey M; Konkel, Miriam K; Korn, Joshua; Khurana, Ekta; Kural, Deniz; Lam, Hugo Y K; Leng, Jing; Li, Ruiqiang; Li, Yingrui; Lin, Chang-Yun; Luo, Ruibang; Mu, Xinmeng Jasmine; Nemesh, James; Peckham, Heather E; Rausch, Tobias; Scally, Aylwyn; Shi, Xinghua; Stromberg, Michael P; Stütz, Adrian M; Urban, Alexander Eckehart; Walker, Jerilyn A; Wu, Jiantao; Zhang, Yujun; Zhang, Zhengdong D; Batzer, Mark A; Ding, Li; Marth, Gabor T; McVean, Gil; Sebat, Jonathan; Snyder, Michael; Wang, Jun; Ye, Kenny; Eichler, Evan E; Gerstein, Mark B; Hurles, Matthew E; Lee, Charles; McCarroll, Steven A; Korbel, Jan O

    2011-02-03

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.

  8. Mitochondrial sequence variation suggests an African influence in Portuguese cattle.

    PubMed Central

    Cymbron, T; Loftus, R T; Malheiro, M I; Bradley, D G

    1999-01-01

    A total of 49 samples from indigenous Portuguese cattle breeds were analysed for sequence variation in the hypervariable region of the mitochondrial DNA D-loop. Sequence comparison and phylogenetic analyses revealed that haplotypes fell into two distinct groups. These corresponded with two separate haplotype clusters into which, respectively, all African, or alternatively all sequences of European origin, have previously been shown to fall. Here, the majority of sequences of African type were encountered in three southern, as compared to three northern breeds. This pattern of African influence may reflect an intercontinental admixture in the initial origins of Iberian breeds, or it is perhaps an introgression dating from the long and influential Moorish occupation of the south of the Iberian peninsula. PMID:10212450

  9. High Throughput Sequencing: An Overview of Sequencing Chemistry.

    PubMed

    Ambardar, Sheetal; Gupta, Rikita; Trakroo, Deepika; Lal, Rup; Vakhlu, Jyoti

    2016-12-01

    In the present century, sequencing is to DNA science what gel electrophoresis was to it in the last century. From 1977 to 2016, three generations of sequencing technologies of various types have been developed. Second and third generation sequencing technologies, commonly referred to as next generation sequencing technology, have evolved significantly, with increases in sequencing speed and decreases in sequencing cost, since their inception in 2004. GS FLX by 454 Life Sciences/Roche diagnostics, Genome Analyzer, HiSeq, MiSeq and NextSeq by Illumina, Inc., SOLiD by ABI, and Ion Torrent by Life Technologies are the various types of sequencing platforms available for second generation sequencing. The platforms available for third generation sequencing are Helicos™ Genetic Analysis System by SeqLL, LLC, SMRT Sequencing by Pacific Biosciences, Nanopore sequencing by Oxford Nanopore, Complete Genomics by Beijing Genomics Institute and GnuBIO by BioRad, to name a few. The present article is an overview of the principle and the sequencing chemistry of these high throughput sequencing technologies, along with a brief comparison of the various types of sequencing platforms available.

  10. Gene sequence variations and expression patterns of mitochondrial genes are associated with the adaptive evolution of two Gynaephora species (Lepidoptera: Lymantriinae) living in different high-elevation environments.

    PubMed

    Zhang, Qi-Lin; Zhang, Li; Zhao, Tian-Xuan; Wang, Juan; Zhu, Qian-Hua; Chen, Jun-Yuan; Yuan, Ming-Long

    2017-04-30

    The adaptive evolution of animals to high-elevation environments has been extensively studied in vertebrates, while few studies have focused on insects. Gynaephora species (Lepidoptera: Lymantriinae) are endemic to the Qinghai-Tibetan Plateau (QTP) and represent an important insect pest of alpine meadows. Here, we present a detailed comparative analysis of the mitochondrial genomes (mitogenomes) of two Gynaephora species inhabiting different high-elevation environments: G. alpherakii and G. menyuanensis. The results indicated that the general mitogenomic features (genome size, nucleotide composition, codon usage and secondary structures of tRNAs) were well conserved between the two species. All of the mitochondrial protein-coding genes were evolving under purifying selection, suggesting that selection constraints may play a role in ensuring adequate energy production. However, a number of substitutions and indels were identified that altered the protein conformations of ATP8 and NAD1, which may be the result of adaptive evolution of the two Gynaephora species to different high-elevation environments. Levels of gene expression for nine mitochondrial genes in nine different developmental stages were significantly suppressed in G. alpherakii, which lives at the higher elevation (~4800 m above sea level), suggesting that gene expression patterns could be modulated by atmospheric oxygen content and environmental temperature. These results enhance our understanding of the genetic bases for the adaptive evolution of insects endemic to the QTP.

  11. STR allele sequence variation: Current knowledge and future issues.

    PubMed

    Gettings, Katherine Butler; Aponte, Rachel A; Vallone, Peter M; Butler, John M

    2015-09-01

    This article reviews what is currently known about short tandem repeat (STR) allelic sequence variation in and around the twenty-four loci most commonly used throughout the world to perform forensic DNA investigations. These STR loci include D1S1656, TPOX, D2S441, D2S1338, D3S1358, FGA, CSF1PO, D5S818, SE33, D6S1043, D7S820, D8S1179, D10S1248, TH01, vWA, D12S391, D13S317, Penta E, D16S539, D18S51, D19S433, D21S11, Penta D, and D22S1045. All known reported variant alleles are compiled along with genomic information available from GenBank, dbSNP, and the 1000 Genomes Project. Supplementary files are included which provide annotated reference sequences for each STR locus, characterize genomic variation around the STR repeat region, and compare alleles present in currently available STR kit allelic ladders. Looking to the future, STR allele nomenclature options are discussed as they relate to next generation sequencing efforts underway.

  12. An improved protocol for sequencing of repetitive genomic regions and structural variations using mutagenesis and next generation sequencing.

    PubMed

    Sipos, Botond; Massingham, Tim; Stütz, Adrian M; Goldman, Nick

    2012-01-01

    The rise of Next Generation Sequencing (NGS) technologies has transformed de novo genome sequencing into an accessible research tool, but obtaining high quality eukaryotic genome assemblies remains a challenge, mostly due to the abundance of repetitive elements. These also make it difficult to study nucleotide polymorphism in repetitive regions, including certain types of structural variations. One solution proposed for resolving such regions is Sequence Assembly aided by Mutagenesis (SAM), which relies on the fact that introducing enough random mutations breaks the repetitive structure, making assembly possible. Sequencing many different mutated copies permits the sequence of the repetitive region to be inferred by consensus methods. However, this approach relies on molecular cloning in order to isolate and amplify individual mutant copies, making it hard to scale-up the approach for use in conjunction with high-throughput sequencing technologies. To address this problem, we propose NG-SAM, a modified version of the SAM protocol that relies on PCR and dilution steps only, coupled to a NGS workflow. NG-SAM therefore has the potential to be scaled-up, e.g. using emerging microfluidics technologies. We built a realistic simulation pipeline to study the feasibility of NG-SAM, and our results suggest that under appropriate experimental conditions the approach might be successfully put into practice. Moreover, our simulations suggest that NG-SAM is capable of reconstructing robustly a wide range of potential target sequences of varying lengths and repetitive structures.
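
    The consensus step mentioned above (inferring the original sequence from many independently mutated copies) can be sketched as a per-column majority vote. This is a simplified illustration under strong assumptions: the copies are already assembled and aligned, have equal length, and differ only by substitutions. The function name `consensus` and the toy sequences are hypothetical and are not taken from the NG-SAM pipeline.

```python
from collections import Counter


def consensus(copies):
    """Majority-vote consensus of equal-length, pre-aligned sequences."""
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must be aligned to the same length")
    # zip(*copies) yields one tuple of bases per alignment column.
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*copies))


# Toy copies of the repeat ACGTACGTACGT, each carrying one random substitution.
mutated_copies = [
    "ACGTACGAACGT",
    "ACGTTCGTACGT",
    "ACCTACGTACGT",
    "ACGTACGTACGA",
    "AGGTACGTACGT",
]
recovered = consensus(mutated_copies)
```

    Because each substitution in the toy data hits a different column, the unmutated base wins every vote and the original repeat is recovered.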

  13. Comparative RNA sequencing reveals substantial genetic variation in endangered primates.

    PubMed

    Perry, George H; Melsted, Páll; Marioni, John C; Wang, Ying; Bainer, Russell; Pickrell, Joseph K; Michelini, Katelyn; Zehr, Sarah; Yoder, Anne D; Stephens, Matthew; Pritchard, Jonathan K; Gilad, Yoav

    2012-04-01

    Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of nonhuman primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps toward addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 nonhuman primates. Of the nonhuman primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5721 gene sequences per species and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats have, therefore, strong potential to achieve long-term success.

  14. Variational formulation of high performance finite elements: Parametrized variational principles

    NASA Technical Reports Server (NTRS)

    Felippa, Carlos A.; Militello, Carmello

    1991-01-01

    High performance elements are simple finite elements constructed to deliver engineering accuracy with coarse arbitrary grids. This is part of a series on the variational basis of high-performance elements, with emphasis on those constructed with the free formulation (FF) and assumed natural strain (ANS) methods. Parametrized variational principles that provide a foundation for the FF and ANS methods, as well as for a combination of both are presented.

  15. Geochemical variations during the 2012 Emilia seismic sequence

    NASA Astrophysics Data System (ADS)

    Sciarra, Alessandra; Cantucci, Barbara; Galli, Gianfranco; Cinti, Daniele; Pizzino, Luca

    2015-04-01

    , apart from one sample, are not thermally anomalous. Stable isotopes of H and O indicate the absence of mixing with connate waters, prolonged interaction with the host-rock at high temperature and/or heavy gas-water exchange at depth. Isotopic carbon composition emphasizes its organic (i.e. shallow) origin; only the "La Canonica" site, the deepest well sampled in this study, shows a probable deep(er) provenance of dissolved carbon. Waters trend away from the atmospheric end-member composition, dissolving CO2 or CH4 depending on their redox state. Dissolved radon activity is very low, likely due to the particular hydrogeological setting of the study area (i.e. the presence of waters with long residence times in the considered aquifers). The results obtained highlight a different behavior before and after the seismic events, as also shown by the different carbon isotopic signature of CH4. These variations could be produced by an increase in bacterial (e.g. peat strata) and methanogenic fermentation processes in the first meters of the soil.

  16. The Quantification of Representative Sequences pipeline for amplicon sequencing: case study on within-population ITS1 sequence variation in a microparasite infecting Daphnia.

    PubMed

    González-Tortuero, E; Rusek, J; Petrusek, A; Gießler, S; Lyras, D; Grath, S; Castro-Monzón, F; Wolinska, J

    2015-11-01

    Next generation sequencing (NGS) platforms are replacing traditional molecular biology protocols like cloning and Sanger sequencing. However, accuracy of NGS platforms has rarely been measured when quantifying relative frequencies of genotypes or taxa within populations. Here we developed a new bioinformatic pipeline (QRS) that pools similar sequence variants and estimates their frequencies in NGS data sets from populations or communities. We tested whether the estimated frequency of representative sequences, generated by 454 amplicon sequencing, differs significantly from that obtained by Sanger sequencing of cloned PCR products. This was performed by analysing sequence variation of the highly variable first internal transcribed spacer (ITS1) of the ichthyosporean Caullerya mesnili, a microparasite of cladocerans of the genus Daphnia. This analysis also serves as a case example of the usage of this pipeline to study within-population variation. Additionally, a public Illumina data set was used to validate the pipeline on community-level data. Overall, there was a good correspondence in absolute frequencies of C. mesnili ITS1 sequences obtained from Sanger and 454 platforms. Furthermore, analyses of molecular variance (AMOVA) revealed that population structure of C. mesnili differs across lakes and years independently of the sequencing platform. Our results support not only the usefulness of amplicon sequencing data for studies of within-population structure but also the successful application of the QRS pipeline on Illumina-generated data. The QRS pipeline is freely available together with its documentation under GNU Public Licence version 3 at http://code.google.com/p/quantification-representative-sequences.
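
    The idea of pooling similar sequence variants and then estimating their frequencies can be sketched as follows. This is not the QRS algorithm: it is a greedy stand-in that assigns each variant to the most abundant representative within a small Hamming distance. The function names, the `max_mismatches` parameter and the toy reads are all assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def pooled_frequencies(reads, max_mismatches=1):
    """Pool reads into representatives and return relative frequencies."""
    counts = Counter(reads)
    representatives = {}  # representative sequence -> pooled read count
    # Visit variants from most to least abundant, so that abundant
    # sequences become representatives and rare ones get pooled into them.
    for seq, n in counts.most_common():
        for rep in representatives:
            if len(rep) == len(seq) and hamming(rep, seq) <= max_mismatches:
                representatives[rep] += n
                break
        else:
            representatives[seq] = n
    total = sum(representatives.values())
    return {rep: n / total for rep, n in representatives.items()}


# Toy reads: one rare variant (AAAT) sits one mismatch away from AAAA.
toy_reads = ["AAAA"] * 6 + ["AAAT"] + ["CCCC"] * 3
frequencies = pooled_frequencies(toy_reads)
```

    With these toy reads the rare variant is absorbed into its abundant neighbour, leaving two representatives whose frequencies sum to 1.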

  17. Lysoplex: An efficient toolkit to detect DNA sequence variations in the autophagy-lysosomal pathway

    PubMed Central

    Di Fruscio, Giuseppina; Schulz, Angela; De Cegli, Rossella; Savarese, Marco; Mutarelli, Margherita; Parenti, Giancarlo; Banfi, Sandro; Braulke, Thomas; Nigro, Vincenzo; Ballabio, Andrea

    2015-01-01

    The autophagy-lysosomal pathway (ALP) regulates cell homeostasis and plays a crucial role in human diseases, such as lysosomal storage disorders (LSDs) and common neurodegenerative diseases. Therefore, the identification of DNA sequence variations in genes involved in this pathway and their association with human diseases would have a significant impact on health. To this aim, we developed Lysoplex, a targeted next-generation sequencing (NGS) approach, which allowed us to obtain a uniform and accurate coding sequence coverage of a comprehensive set of 891 genes involved in lysosomal, endocytic, and autophagic pathways. Lysoplex was successfully validated on 14 different types of LSDs and then used to analyze 48 mutation-unknown patients with a clinical phenotype of neuronal ceroid lipofuscinosis (NCL), a genetically heterogeneous subtype of LSD. Lysoplex allowed us to identify pathogenic mutations in 67% of patients, most of whom had been unsuccessfully analyzed by several sequencing approaches. In addition, in 3 patients, we found potential disease-causing variants in novel NCL candidate genes. We then compared the variant detection power of Lysoplex with data derived from public whole exome sequencing (WES) efforts. On average, a 50% higher number of validated amino acid changes and truncating variations per gene were identified. Overall, we identified 61 truncating sequence variations and 488 missense variations with a high probability to cause loss of function in a total of 316 genes. Interestingly, some loss-of-function variations of genes involved in the ALP pathway were found in homozygosity in the normal population, suggesting that their role is not essential. Thus, Lysoplex provided a comprehensive catalog of sequence variants in ALP genes and allows the assessment of their relevance in cell biology as well as their contribution to human disease. PMID:26075876

  18. Inter-specific sequence conservation and intra-individual sequence variation in a spider silk gene.

    PubMed

    Tai, Pei-Ling; Hwang, Guang-Yuh; Tso, I-Min

    2004-10-01

    Currently, studies on major ampullate spidroin 1 (MaSp1) genes of non-orb weaving spiders are few, and it is not clear whether genes of these organisms exhibit the same characteristics as those of orb-weavers. In addition, many studies have proposed that MaSp1 might be a single gene with allelic variants, but supporting evidence is still lacking. In this study, we compared partial DNA and amino acid sequences of MaSp1 cloned from different spider guilds. We also cloned partial MaSp1 sequences from genomic DNA and cDNA of the same individuals of spiders using the same primer combination to see if different molecular forms existed. In the repetitive region of partial MaSp1 sequences obtained, GGX, GA and poly-A motifs were present in all Araneomorphae and Mygalomorphae species examined. An extreme similarity in MaSp1 non-repetitive portions was found in sequences of ecribellate, cribellate and Mygalomorphae web-builders and such a result suggested that this sequence might exhibit an important function. A comparison of sequences amplified from the same individual showed that substitutions in amino acids occurred in both repetitive and non-repetitive regions, with a much higher variation in the former. These results suggest that the MaSp1 of Araneomorphae spiders exhibits several forms in an individual spider and it might be either a multiple gene or a single gene with a multiple exon/intron organization.

  19. A sparse model based detection of copy number variations from exome sequencing data

    PubMed Central

    Duan, Junbo; Wan, Mingxi; Deng, Hong-Wen; Wang, Yu-Ping

    2016-01-01

    Goal: Whole-exome sequencing provides a more cost-effective way than whole-genome sequencing for detecting genetic variants such as copy number variations (CNVs). Although a number of approaches have been proposed to detect CNVs from whole-genome sequencing, a direct adoption of these approaches to whole-exome sequencing will often fail because exons are separately located along a genome. Therefore, an appropriate method is needed to target the specific features of exome sequencing data. Methods: In this paper, a novel sparse-model-based method is proposed to discover CNVs from multiple exome sequencing data. First, exome sequencing data are represented with a penalized matrix approximation, and technical variability and random sequencing errors are assumed to follow a generalized Gaussian distribution. Second, an iteratively re-weighted least squares algorithm is used to estimate the solution. Results: The method is tested and validated on both synthetic and real data, and compared with other approaches including CoNIFER, XHMM and cn.MOPS. The test demonstrates that the proposed method outperforms the other approaches. Conclusion: The proposed sparse model can detect CNVs from exome sequencing data with high power and precision. Significance: The sparse model can target the specific features of exome sequencing data. The software codes are freely available at http://www.tulane.edu/wyp/software/ExonCNV.m PMID:26258935
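
    The iteratively re-weighted least squares (IRLS) idea named above can be illustrated on a much smaller problem than the authors' penalized matrix approximation. Under a generalized Gaussian error model, the negative log-likelihood is proportional to the sum of |residual|^p, and IRLS minimizes it by repeatedly solving a weighted least-squares problem. The sketch below estimates a single location parameter only. The function `irls_location`, its parameters and the toy depths are assumptions for illustration, not the published ExonCNV code.

```python
def irls_location(values, p=1.0, iterations=200, eps=1e-8):
    """Estimate m minimizing sum(|v - m| ** p) by IRLS."""
    m = sum(values) / len(values)  # start from the ordinary least-squares fit
    for _ in range(iterations):
        # |r| ** (p - 2) rewrites the L_p loss as a weighted squared loss;
        # eps guards against division by zero when a residual vanishes.
        weights = [max(abs(v - m), eps) ** (p - 2) for v in values]
        # Closed-form weighted least-squares update for a single location.
        m = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return m


# Toy read depths for one exon across five samples; 50.0 mimics an outlier
# such as a sample carrying a copy number gain.
depths = [10.0, 11.0, 9.0, 10.0, 50.0]
baseline = irls_location(depths)
```

    With p = 1 the estimate settles near the median of the toy depths rather than the mean, which is the robustness to outliers that motivates a heavy-tailed error model.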

  20. Protein 3D structure computed from evolutionary sequence variation.

    PubMed

    Marks, Debora S; Colwell, Lucy J; Sheridan, Robert; Hopf, Thomas A; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2011-01-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7-4.8 Å C(α)-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures.
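
    The "observed correlations" between alignment columns that the abstract refers to can be illustrated with mutual information, a simple local covariation statistic. This is deliberately not the maximum entropy model used by the authors, which is a global model fitted to the whole alignment to separate direct couplings from indirect ones. The sketch only shows the raw pairwise signal such a model starts from. The function names and the four-sequence toy alignment are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def column_pair_scores(alignment):
    """Score every pair of columns in a gap-free, equal-length alignment."""
    columns = list(zip(*alignment))
    return {(i, j): mutual_information(columns[i], columns[j])
            for i, j in combinations(range(len(columns)), 2)}


# Toy alignment: columns 0 and 1 always change together, column 2 is independent.
toy_alignment = ["AKL", "AKV", "ERL", "ERV"]
scores = column_pair_scores(toy_alignment)
top_pair = max(scores, key=scores.get)
```

    In the toy alignment only the pair of columns that vary together receives a non-zero score. On real alignments, local scores like this are confounded by transitive correlations, which is the problem the global model in the abstract addresses.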

  1. High speed nucleic acid sequencing

    SciTech Connect

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  2. CODEX: a normalization and copy number variation detection method for whole exome sequencing.

    PubMed

    Jiang, Yuchao; Oldridge, Derek A; Diskin, Sharon J; Zhang, Nancy R

    2015-03-31

    High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but detecting and characterizing CNV from exome sequencing is challenging due to the high level of biases and artifacts. We propose CODEX, a normalization and CNV calling procedure for whole exome sequencing data. The Poisson latent factor model in CODEX includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data. CODEX is compared to existing methods on a population analysis of HapMap samples from the 1000 Genomes Project, and shown to be more accurate on three microarray-based validation data sets. We further evaluate performance on 222 neuroblastoma samples with matched normals and focus on a well-studied rare somatic CNV within the ATRX gene. We show that the cross-sample normalization procedure of CODEX removes more noise than normalizing the tumor against the matched normal and that the segmentation procedure performs well in detecting CNVs with nested structures.

  3. Natural Allelic Variations in Highly Polyploidy Saccharum Complex

    PubMed Central

    Song, Jian; Yang, Xiping; Resende, Marcio F. R.; Neves, Leandro G.; Todd, James; Zhang, Jisen; Comstock, Jack C.; Wang, Jianping

    2016-01-01

    Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with highly polyploid and complex genomes. The Saccharum complex, comprising the Saccharum genus and a few related genera, is an important genetic resource for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding its allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variation in the Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set, targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variant and genotype calling was established. BWA-mem and the sorghum genome proved to be an acceptable aligner and reference, respectively, for sugarcane target enrichment sequence analysis. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated as largely homozygous. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. The target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes. PMID:27375658

  4. Natural Allelic Variations in Highly Polyploidy Saccharum Complex.

    PubMed

    Song, Jian; Yang, Xiping; Resende, Marcio F R; Neves, Leandro G; Todd, James; Zhang, Jisen; Comstock, Jack C; Wang, Jianping

    2016-01-01

    Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with highly polyploid and complex genomes. The Saccharum complex, comprising the Saccharum genus and a few related genera, is an important genetic resource for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding its allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variation in the Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set, targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variant and genotype calling was established. BWA-mem and the sorghum genome proved to be an acceptable aligner and reference, respectively, for sugarcane target enrichment sequence analysis. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated as largely homozygous. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. The target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes.

  5. FRESCO: Referential compression of highly similar sequences.

    PubMed

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios far beyond the state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.
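    Referential compression, as described above, stores only the differences between an input and a known reference. The following is a minimal sketch of that general idea, assuming a simple k-mer index and greedy longest-match encoding; it is not FRESCO's algorithm, and the names `compress` and `decompress` and the tuple encoding are hypothetical choices made for illustration.

```python
def compress(reference, target, k=4):
    """Encode target as ('M', ref_pos, length) matches and ('L', text) literals."""
    # Index every k-mer of the reference by its start positions
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)

    ops, literal, pos = [], [], 0
    while pos < len(target):
        best_len, best_ref = 0, -1
        for ref_pos in index.get(target[pos:pos + k], []):
            length = 0
            while (ref_pos + length < len(reference)
                   and pos + length < len(target)
                   and reference[ref_pos + length] == target[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_ref = length, ref_pos
        if best_len >= k:
            if literal:
                # Flush pending unmatched symbols before the match
                ops.append(("L", "".join(literal)))
                literal = []
            ops.append(("M", best_ref, best_len))
            pos += best_len
        else:
            literal.append(target[pos])
            pos += 1
    if literal:
        ops.append(("L", "".join(literal)))
    return ops

def decompress(reference, ops):
    """Rebuild the target from the reference and the edit list."""
    parts = []
    for op in ops:
        if op[0] == "M":
            _, ref_pos, length = op
            parts.append(reference[ref_pos:ref_pos + length])
        else:
            parts.append(op[1])
    return "".join(parts)

reference = "ACGTACGTTTGACCA"
target = "ACGTACGATTGACCA"   # one substitution relative to the reference
ops = compress(reference, target)
print(ops)  # [('M', 0, 7), ('L', 'A'), ('M', 8, 7)]
```

    For highly similar sequences the edit list stays short regardless of sequence length, which is where the large compression ratios come from. The choice of reference matters because a closer reference yields longer matches and fewer literals.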

  6. Analysis of sequence variation in Gnathostoma spinigerum mitochondrial DNA by single-strand conformation polymorphism analysis and DNA sequence.

    PubMed

    Ngarmamonpirat, Charinthon; Waikagul, Jitra; Petmitr, Songsak; Dekumyoy, Paron; Rojekittikhun, Wichit; Anantapruti, Malinee T

    2005-03-01

    Morphological variations were observed in the advanced third-stage larvae of Gnathostoma spinigerum collected from swamp eel (Fluta alba), the second intermediate host. Larvae of the typical type and of three atypical types were chosen for partial cytochrome c oxidase subunit I (COI) gene sequence analysis. A 450 bp polymerase chain reaction product of the COI gene was amplified from mitochondrial DNA. The variations were analyzed by single-strand conformation polymorphism and DNA sequencing. The nucleotide variations of the COI gene in the four types of larvae indicated the presence of intra-specific variation of mitochondrial DNA in the G. spinigerum population.

  7. Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction.

    PubMed

    Laehnemann, David; Borkhardt, Arndt; McHardy, Alice Carolyn

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here.

  8. Sequence variation of koala retrovirus transmembrane protein p15E among koalas from different geographic regions.

    PubMed

    Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M; Greenwood, Alex D; Roca, Alfred L

    2015-01-15

    The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in a captive Queensland and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes comprising 139 of the sequences (85%) coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development.

  9. Sequence variation of koala retrovirus transmembrane protein p15E among koalas from different geographic regions

    PubMed Central

    Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M.; Greenwood, Alex D.; Roca, Alfred L.

    2014-01-01

    The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in a captive Queensland and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes comprising 139 of the sequences (85%) coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development. PMID:25462343

  10. Ultra Deep Sequencing of a Baculovirus Population Reveals Widespread Genomic Variations

    PubMed Central

    Chateigner, Aurélien; Bézier, Annie; Labrousse, Carole; Jiolle, Davy; Barbe, Valérie; Herniou, Elisabeth A.

    2015-01-01

    Viruses rely on widespread genetic variation and large population size for adaptation. Large DNA virus populations are thought to harbor little variation though natural populations may be polymorphic. To measure the genetic variation present in a dsDNA virus population, we deep sequenced a natural strain of the baculovirus Autographa californica multiple nucleopolyhedrovirus. With 124,221X average genome coverage of our 133,926 bp long consensus, we could detect low frequency mutations (0.025%). K-means clustering was used to classify the mutations in four categories according to their frequency in the population. We found 60 high frequency non-synonymous mutations under balancing selection distributed in all functional classes. These mutants could alter viral adaptation dynamics, either through competitive or synergistic processes. Lastly, we developed a technique for the delimitation of large deletions in next generation sequencing data. We found that large deletions occur along the entire viral genome, with hotspots located in homologous repeat regions (hrs). Present in 25.4% of the genomes, these deletion mutants presumably require functional complementation to complete their infection cycle. They might thus have a large impact on the fitness of the baculovirus population. Altogether, we found a wide breadth of genomic variation in the baculovirus population, suggesting it has high adaptive potential. PMID:26198241
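    The abstract above mentions K-means clustering of mutations into four frequency categories. A minimal one-dimensional sketch of Lloyd's algorithm follows, assuming a deterministic quantile-based initialization; the function name `kmeans_1d` and the example frequencies are hypothetical and are not taken from the study.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster scalar values into k groups with Lloyd's algorithm."""
    data = sorted(values)
    # Deterministic start: pick centres spread evenly over the sorted data
    centres = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # Recompute each centre as the mean of its members (keep it if empty)
        new_centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    return centres, clusters

# Hypothetical allele frequencies spanning rare to near-fixed variants
freqs = [0.0003, 0.0004, 0.0005, 0.01, 0.012, 0.011,
         0.2, 0.22, 0.21, 0.9, 0.95, 0.92]
centres, clusters = kmeans_1d(freqs, k=4)
for centre, members in zip(centres, clusters):
    print(f"{centre:.4f}: {len(members)} mutations")
```

    With frequencies spanning several orders of magnitude, clustering on a log scale may separate the rare classes better; whether the authors did so is not stated in the abstract.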

  11. A map of human genome variation from population-scale sequencing.

    PubMed

    Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A

    2010-10-28

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

  12. Variation in the sequence and modification state of the human insulin gene flanking regions.

    PubMed

    Ullrich, A; Dull, T J; Gray, A; Philips, J A; Peter, S

    1982-04-10

    The nucleotide sequence of a highly repetitive sequence region upstream from the human insulin gene is reported. The length of this region varies between alleles in the population, and appears to be stably transmitted to the next generation in a Mendelian fashion. There is no significant correlation between the length of this sequence and two types of diabetes mellitus. We observe variation in the cleavability of a BglI recognition site downstream from the human insulin gene, which is probably due to variable nucleotide modification. This presumed modification state appears not to be inherited, and varies between tissues within an individual and between individuals for a given tissue. Both alleles in a given tissue DNA sample are modified to the same extent.

  13. Characterization of ADME gene variation in 21 populations by exome sequencing

    PubMed Central

    Hovelson, Daniel H.; Xue, Zhengyu; Zawistowski, Matthew; Ehm, Margaret G.; Harris, Elizabeth C.; Stocker, Sophie L.; Gross, Annette S.; Jang, In-Jin; Ieiri, Ichiro; Lee, Jong-Eun; Cardon, Lon R.; Chissoe, Stephanie L.; Abecasis, Gonçalo

    2017-01-01

    Objective: Proteins involved in absorption, distribution, metabolism, and excretion (ADME) play a critical role in drug pharmacokinetics. The type and frequency of genetic variation in the ADME genes differ among populations. The aim of this study was to systematically investigate common and rare ADME coding variation in diverse ethnic populations by exome sequencing. Materials and methods: Data derived from commercial exome capture arrays and next-generation sequencing were used to characterize coding variation in 298 ADME genes in 251 Northeast Asians and 1181 individuals from the 1000 Genomes Project. Results: Approximately 75% of the ADME coding sequence was captured at high quality across the joint samples harboring more than 8000 variants, with 49% of individuals carrying at least one ‘knockout’ allele. ADME genes carried 50% more nonsynonymous variation than non-ADME genes (P=8.2×10^-13) and showed significantly greater levels of population differentiation (P=7.6×10^-11). Out of the 2135 variants identified that were predicted to be deleterious, 633 were not on commercially available ADME or general-purpose genotyping arrays. Forty deleterious variants within important ADME genes, with frequencies of at least 2% in at least one population, were identified as candidates for future pharmacogenetic studies. Conclusion: Exome sequencing was effective in accurately genotyping most ADME variants important for pharmacogenetic research, in addition to identifying rare or potentially de novo coding variants that may be clinically meaningful. Furthermore, as a class, ADME genes are more variable and less sensitive to purifying selection than non-ADME genes. PMID:27984508

  14. Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

    PubMed Central

    Ferreira, Pedro G.; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R.; Rivas, Manuel A.; Esteve-Codina, Anna; Estivill, Xavier; Guigó, Roderic; Dermitzakis, Emmanouil; Antonarakis, Stylianos; Meitinger, Thomas; Strom, Tim M; Palotie, Aarno; François Deleuze, Jean; Sudbrak, Ralf; Lerach, Hans; Gut, Ivo; Syvänen, Ann-Christine; Gyllensten, Ulf; Schreiber, Stefan; Rosenstiel, Philip; Brunner, Han; Veltman, Joris; Hoen, Peter A.C.T; Jan van Ommen, Gert; Carracedo, Angel; Brazma, Alvis; Flicek, Paul; Cambon-Thomsen, Anne; Mangion, Jonathan; Bentley, David; Hamosh, Ada; Rosenstiel, Philip; Strom, Tim M; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

    2016-01-01

    Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns, and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts. PMID:27617755

  15. Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

    NASA Astrophysics Data System (ADS)

    Ferreira, Pedro G.; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R.; Rivas, Manuel A.; Esteve-Codina, Anna; Estivill, Xavier; Guigó, Roderic; Dermitzakis, Emmanouil; Antonarakis, Stylianos; Meitinger, Thomas; Strom, Tim M.; Palotie, Aarno; François Deleuze, Jean; Sudbrak, Ralf; Lerach, Hans; Gut, Ivo; Syvänen, Ann-Christine; Gyllensten, Ulf; Schreiber, Stefan; Rosenstiel, Philip; Brunner, Han; Veltman, Joris; Hoen, Peter A. C. T.; Jan van Ommen, Gert; Carracedo, Angel; Brazma, Alvis; Flicek, Paul; Cambon-Thomsen, Anne; Mangion, Jonathan; Bentley, David; Hamosh, Ada; Rosenstiel, Philip; Strom, Tim M.; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

    2016-09-01

    Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns, and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts.

  16. Spatio-temporal Variations of Characteristic Repeating Earthquake Sequences along the Middle America Trench in Mexico

    NASA Astrophysics Data System (ADS)

    Dominguez, L. A.; Taira, T.; Hjorleifsdottir, V.; Santoyo, M. A.

    2015-12-01

    Repeating earthquake sequences are sets of events that are thought to rupture the same area on the plate interface and thus provide nearly identical waveforms. We systematically analyzed seismic records from 2001 through 2014 to identify repeating earthquakes with highly correlated waveforms occurring along the subduction zone of the Cocos plate. Using the correlation coefficient (cc) and spectral coherency (coh) of the vertical components as selection criteria, we found a set of 214 sequences whose waveforms satisfy cc≥95% and coh≥95%. Spatial clustering along the trench shows large variations in repeating earthquake activity. In particular, the rupture zone of the M8.1, 1985 earthquake shows an almost complete absence of characteristic repeating earthquakes, whereas the Guerrero Gap zone and the segment of the trench close to the Guerrero-Oaxaca border show a significantly larger number of repeating earthquake sequences. Furthermore, temporal variations associated with stress changes due to major earthquakes show episodes of unlocking and healing of the interface. Understanding the different components that control the location and recurrence time of characteristic repeating sequences is a key factor to pinpoint areas where large megathrust earthquakes may nucleate and consequently to improve the seismic hazard assessment.
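    The waveform correlation criterion above (cc≥95%) can be sketched as a normalized correlation coefficient maximized over a small time shift. This is an illustrative sketch only: the function names, the synthetic waveform, and the lag search are assumptions, and the spectral coherency test the authors also apply is not implemented here.

```python
import math

def correlation_coefficient(a, b):
    """Pearson correlation between two equal-length waveform windows."""
    if len(a) != len(b) or not a:
        raise ValueError("waveforms must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0

def max_lagged_cc(a, b, max_lag):
    """Best correlation over integer sample shifts in [-max_lag, max_lag]."""
    n = len(a)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[lag:], b[:n - lag]
        else:
            x, y = a[:n + lag], b[-lag:]
        best = max(best, correlation_coefficient(x, y))
    return best

# Synthetic decaying wavelet and a delayed, amplified copy of it
wave = [math.sin(0.3 * i) * math.exp(-0.02 * i) for i in range(200)]
delayed = [0.0, 0.0, 0.0] + [2.0 * v for v in wave[:-3]]

cc = max_lagged_cc(wave, delayed, max_lag=5)
is_candidate_repeater = cc >= 0.95
print(is_candidate_repeater)
```

    Because the coefficient is normalized, it is insensitive to amplitude scaling, so events of different magnitude at the same location can still correlate highly.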

  17. Phylogenetic Sequence Variations in Bacterial rRNA Affect Species-Specific Susceptibility to Drugs Targeting Protein Synthesis

    PubMed Central

    Akshay, Subramanian; Bertea, Mihai; Hobbie, Sven N.; Oettinghaus, Björn; Shcherbakov, Dimitri; Böttger, Erik C.; Akbergenov, Rashid

    2011-01-01

    Antibiotics targeting the bacterial ribosome typically bind to highly conserved rRNA regions with only minor phylogenetic sequence variations. It is unclear whether these sequence variations affect antibiotic susceptibility or resistance development. To address this question, we have investigated the drug binding pockets of aminoglycosides and macrolides/ketolides. The binding site of aminoglycosides is located within helix 44 of the 16S rRNA (A site); macrolides/ketolides bind to domain V of the 23S rRNA (peptidyltransferase center). We have used mutagenesis of rRNA sequences in Mycobacterium smegmatis ribosomes to reconstruct the different bacterial drug binding sites and to study the effects of rRNA sequence variations on drug activity. Our results provide a rationale for differences in species-specific drug susceptibility patterns and species-specific resistance phenotypes associated with mutational alterations in the drug binding pocket. PMID:21730122

  18. Storage and retrieval of highly repetitive sequence collections.

    PubMed

    Mäkinen, Veli; Navarro, Gonzalo; Sirén, Jouni; Välimäki, Niko

    2010-03-01

    A repetitive sequence collection is a set of sequences which are small variations of each other. A prominent example is the genome sequences of individuals of the same or close species, where the differences can be expressed by short lists of basic edit operations. Flexible and efficient data analysis on such a typically huge collection is plausible using suffix trees. However, the suffix tree occupies much space, which very soon inhibits in-memory analyses. Recent advances in full-text indexing reduce the space of the suffix tree to, essentially, that of the compressed sequences, while retaining its functionality with only a polylogarithmic slowdown. However, the underlying compression model considers only the predictability of the next sequence symbol given the k previous ones, where k is a small integer. This is unable to capture longer-term repetitiveness. For example, r identical copies of an incompressible sequence will be incompressible under this model. We develop new static and dynamic full-text indexes that are capable of capturing the fact that a collection is highly repetitive, and require space basically proportional to the length of one typical sequence plus the total number of edit operations. The new indexes can be plugged into a recent dynamic fully-compressed suffix tree, achieving full functionality for sequence analysis, while retaining the reduced space and the polylogarithmic slowdown. Our experimental results confirm the practicality of our proposal.
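    The claim above that r identical copies of a sequence gain nothing under a k-th order model can be checked numerically with the empirical entropy. The sketch below is an illustration under stated assumptions: `empirical_entropy` is a hypothetical helper, and the example sequence is arbitrary rather than truly incompressible.

```python
import math
from collections import Counter, defaultdict

def empirical_entropy(seq, k=0):
    """k-th order empirical entropy of seq, in bits per symbol."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    # Count which symbol follows each length-k context
    contexts = defaultdict(Counter)
    for i in range(k, len(seq)):
        contexts[seq[i - k:i]][seq[i]] += 1
    total_bits = 0.0
    for counts in contexts.values():
        n_ctx = sum(counts.values())
        for c in counts.values():
            total_bits -= c * math.log2(c / n_ctx)
    return total_bits / len(seq)

base = "ACGTTGCAAGCTTACG"
collection = base * 8        # r = 8 identical copies

h_base = empirical_entropy(base)
h_coll = empirical_entropy(collection)
# Per-symbol entropy is unchanged, so the model's size bound grows r-fold
print(len(base) * h_base, len(collection) * h_coll)
```

    For k = 0 the per-symbol entropy of the collection equals that of a single copy exactly, so the entropy bound on total size scales linearly with r even though the collection contains only one sequence's worth of information. For small k > 0 the same holds approximately, apart from contexts at copy boundaries.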

  19. Variation in the nucleotide sequence of a prolamin gene family in wild rice.

    PubMed

    Barbier, P; Ishihama, A

    1990-07-01

    Variation in the DNA sequence of the 10 kDa prolamin gene family within the wild rice species Oryza rufipogon was probed using the direct sequencing of PCR-amplified genes. A comparison of the nucleotide and deduced amino-acid sequences of eight Asian strains of O. rufipogon and one strain of the related African species O. longistaminata is presented.

  20. Intragenomic and interspecific 5S rDNA sequence variation in five Asian pines.

    PubMed

    Liu, Zhan-Lin; Zhang, Daming; Wang, Xiao-Quan; Ma, Xiao-Fei; Wang, Xiao-Ru

    2003-01-01

    Patterns of intragenomic and interspecific variation of 5S rDNA in Pinus (Pinaceae) were studied by cloning and sequencing multiple 5S rDNA repeats from individual trees. Five pines, from both subgenera, Pinus and Strobus, were selected. The 5S rDNA repeat in pines has a conserved 120-base pair (bp) transcribed region and an intergenic spacer region of variable length (382-608 bp). The evolutionary rate in the spacer region is three- to sevenfold higher than in the genic region. We found substantial sequence divergence between the two subgenera. Intragenomic sequence heterogeneity was high for all species, and more than 86% of the clones within each individual were unique. The 5S gene tree revealed that different 5S repeats within individuals are polyphyletic, indicating that their ancestral divergence preceded the speciation events. The degrees of interspecific and intragenomic divergence among diploxylon pines are similar. The observed sequence patterns suggest that concerted evolution has been acting since the diversification of the two subgenera but has been very weak since the speciation of the four diploxylon pines. Sequence patterns in P. densata are consistent with hybrid origin. It had higher intragenomic diversity and maintained polymorphic copies of the parental types in addition to new and recombinant types unique to the hybrid.

  1. BreaKmer: detection of structural variation in targeted massively parallel sequencing data using kmers

    PubMed Central

    Abo, Ryan P.; Ducar, Matthew; Garcia, Elizabeth P.; Thorner, Aaron R.; Rojas-Rudilla, Vanesa; Lin, Ling; Sholl, Lynette M.; Hahn, William C.; Meyerson, Matthew; Lindeman, Neal I.; Van Hummelen, Paul; MacConaill, Laura E.

    2015-01-01

    Genomic structural variation (SV), a common hallmark of cancer, has important predictive and therapeutic implications. However, accurately detecting SV using high-throughput sequencing data remains challenging, especially for ‘targeted’ resequencing efforts. This is critically important in the clinical setting where targeted resequencing is frequently being applied to rapidly assess clinically actionable mutations in tumor biopsies in a cost-effective manner. We present BreaKmer, a novel approach that uses a ‘kmer’ strategy to assemble misaligned sequence reads for predicting insertions, deletions, inversions, tandem duplications and translocations at base-pair resolution in targeted resequencing data. Variants are predicted by realigning an assembled consensus sequence created from sequence reads that were abnormally aligned to the reference genome. Using targeted resequencing data from tumor specimens with orthogonally validated SV, non-tumor samples and whole-genome sequencing data, BreaKmer had a 97.4% overall sensitivity for known events and predicted 17 positively validated, novel variants. Relative to four publicly available algorithms, BreaKmer detected SV with increased sensitivity and limited calls in non-tumor samples, key features for variant analysis of tumor specimens in both the clinical and research settings. PMID:25428359
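    One way to picture a k-mer strategy for reads that do not align cleanly is to look for k-mers present in the reads but absent from the reference, since these tend to span breakpoints. The sketch below illustrates only that general idea and should not be read as BreaKmer's implementation; the function names and the toy sequences are hypothetical.

```python
from collections import Counter

def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def novel_kmers(reads, reference, k):
    """Count, per k-mer, how many reads contain a k-mer missing from the reference."""
    ref_kmers = kmers(reference, k)
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km not in ref_kmers:
                counts[km] += 1
    return counts

reference = "ACGTACGGTCAAGCTTAGGC"
# Toy read spanning a deletion of reference positions 8..11
deletion_read = reference[:8] + reference[12:]
clean_read = reference[2:15]

print(len(novel_kmers([clean_read], reference, k=6)))     # 0
print(len(novel_kmers([deletion_read], reference, k=6)))  # 5
```

    A single clean junction produces k - 1 novel k-mers, all overlapping the breakpoint, so reads sharing them can be grouped and assembled into a consensus for realignment. Sequencing errors also create novel k-mers, which is why a real tool would require support from multiple reads.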

  2. Forward Genetics by Sequencing EMS Variation-Induced Inbred Lines

    PubMed Central

    Addo-Quaye, Charles; Buescher, Elizabeth; Best, Norman; Chaikam, Vijay; Baxter, Ivan; Dilkes, Brian P.

    2016-01-01

    In order to leverage novel sequencing techniques for cloning genes in eukaryotic organisms with complex genomes, the false positive rate of variant discovery must be controlled for by experimental design and informatics. We sequenced five lines from three pedigrees of ethyl methanesulfonate (EMS)-mutagenized Sorghum bicolor, including a pedigree segregating a recessive dwarf mutant. Comparing the sequences of the lines, we were able to identify and eliminate error-prone positions. One genomic region contained EMS mutant alleles in dwarfs that were homozygous reference sequences in wild-type siblings and heterozygous in segregating families. This region contained a single nonsynonymous change that cosegregated with dwarfism in a validation population and caused a premature stop codon in the Sorghum ortholog encoding the gibberellic acid (GA) biosynthetic enzyme ent-kaurene oxidase. Application of exogenous GA rescued the mutant phenotype. Our method for mapping did not require outcrossing and introduced no segregation variance. This enables work when line crossing is complicated by life history, permitting gene discovery outside of genetic models. This inverts the historical approach of first using recombination to define a locus and then sequencing genes. Our formally identical approach first sequences all the genes and then seeks cosegregation with the trait. Mutagenized lines lacking obvious phenotypic alterations are available for an extension of this approach: mapping with a known marker set in a line that is phenotypically identical to starting material for EMS mutant generation. PMID:28040779

  3. Wide variation in microsatellite sequences within each Pfcrt mutant haplotype.

    PubMed

    Vinayak, Sumiti; Mittra, Pooja; Sharma, Yagya D

    2006-05-01

    Flanking microsatellites for each of the Pfcrt mutant haplotypes of Plasmodium falciparum remain conserved among geographical isolates. We describe here heterogeneity in the intragenic microsatellites within each Pfcrt haplotype. There were fourteen different alleles of AT repeats of intron 2 and eight alleles of TA repeats of intron 4 of the pfcrt gene among Indian isolates. This resulted in 33 different two-locus (intron 2 plus intron 4) microsatellite genotypes among 224 isolates. There were 15 different two-locus microsatellite genotypes within the South American Pfcrt haplotype (S72V73M74N75T76S220) and 11 genotypes in the southeast Asian haplotype (C72V73I74E75T76S220) in these isolates. Indian isolates with Pfcrt haplotype C72V73I74E75T76S220 shared one of their two-locus microsatellite genotypes with southeast Asian P. falciparum parasite lines from Thailand (K1) and Indochina (Dd2 and W2). Conversely, Indian isolates containing the S72V73M74N75T76S220 Pfcrt haplotype did not share any of their two-locus microsatellite genotypes with the South American parasite line 7G8 from Brazil. Significantly, a large number of newer two-locus microsatellite genotypes was detected in a 2-year time period (P<0.05). Microsatellite variation was more prominent in the areas of high malaria transmission. It is concluded that genetic recombination in the intragenic microsatellites continues in the parasite population even after the microsatellites flanking the pfcrt gene had already been fixed. The presence of various Pfcrt haplotypes and a variety of intragenic microsatellites indicates that there is a wide spectrum of chloroquine-resistant parasite populations in India. This information should be useful for malaria control programs of the country.

  4. CNV-TV: A robust method to discover copy number variation from short sequencing reads

    PubMed Central

    2013-01-01

    Background Copy number variation (CNV) is an important structural variation (SV) in the human genome. Various studies have shown that CNVs are associated with complex diseases. Traditional CNV detection methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution. The next generation sequencing (NGS) technique promises a higher resolution detection of CNVs and several methods were recently proposed for realizing such a promise. However, the performances of these methods are not robust under some conditions, e.g., some of them may fail to detect CNVs of short sizes. There has been a strong demand for reliable detection of CNVs from high resolution NGS data. Results A novel and robust method to detect CNV from short sequencing reads is proposed in this study. The detection of CNV is modeled as a change-point detection from the read depth (RD) signal derived from the NGS, which is fitted with a total variation (TV) penalized least squares model. The performance (e.g., sensitivity and specificity) of the proposed approach is evaluated by comparison with several recently published methods on both simulated and real data from the 1000 Genomes Project. Conclusion The experimental results showed that both the true positive rate and false positive rate of the proposed detection method do not change significantly for CNVs with different copy numbers and lengths, when compared with several existing methods. Therefore, our proposed approach results in a more reliable detection of CNVs than the existing methods. PMID:23634703
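    For orientation, a total variation penalized least squares fit of a one-dimensional signal is commonly written as below. The notation is an assumption and is not taken from the abstract: y_i stands for the observed read depth in window i, x_i for the fitted piecewise-constant depth, and λ for a tuning parameter that controls how many change points survive. The exact objective used by CNV-TV may differ.

```latex
\hat{x} \;=\; \arg\min_{x \in \mathbb{R}^{n}} \;
  \frac{1}{2} \sum_{i=1}^{n} \left( y_i - x_i \right)^{2}
  \;+\; \lambda \sum_{i=1}^{n-1} \left| x_{i+1} - x_i \right|
```

    The first term keeps the fit close to the data; the second penalizes every jump between neighboring windows, so the positions where the fitted signal changes value can be read as candidate CNV boundaries.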

  5. From phenotypes to causal sequences: using genome wide association studies to dissect the sequence basis for variation of plant development.

    PubMed

    Ogura, Takehiko; Busch, Wolfgang

    2015-02-01

    Tremendous natural variation of growth and development exists within species. Uncovering the molecular mechanisms that tune growth and development promises to shed light on a broad set of biological issues including genotype to phenotype relations, regulatory mechanisms of biological processes and evolutionary questions. Recent progress in sequencing and data processing capabilities has enabled Genome Wide Association Studies (GWASs) to identify DNA sequence polymorphisms that underlie the variation of biological traits. In the last years, GWASs have proven powerful in revealing the complex genetic bases of many phenotypes in various plant species. Here we highlight successful recent GWASs that uncovered mechanistic and sequence bases of trait variation related to plant growth and development and discuss important considerations for conducting successful GWASs.

  6. Sequence variation in the Tbx4 gene in marine mammals.

    PubMed

    Onbe, Kaori; Nishida, Shin; Sone, Emi; Kanda, Naohisa; Goto, Mutsuo; Pastene, Luis A; Tanabe, Shinsuke; Koike, Hiroko

    2007-05-01

    The amino-acid sequences of the T-domain region of the Tbx4 gene, which is required for hindlimb development, are 100% identical in humans and mice. Cetaceans have lost most of their hindlimb structure, although hindlimb buds are present in very early cetacean embryos. To examine whether the Tbx4 gene has the same function in cetaceans as in other mammals, we analyzed Tbx4 sequences from cetaceans, dugong, artiodactyls and marine carnivores. A total of 39 primers were designed using human and dog Tbx4 nucleotide sequences. Exons 3, 4, 5, 6, 7, and 8 of the Tbx4 genes from cetaceans, artiodactyls, and marine carnivores were sequenced. Non-synonymous substitution sites were detected in the T-domain regions from some cetacean species, but were not detected in those from artiodactyls, the dugong, or the carnivores. The C-terminal regions contained a number of non-synonymous substitutions. Although some indels were present, they were in groups of three nucleotides and therefore did not cause frame shifts. The dN/dS values for the T-domain and C-terminal regions of the cetacean and artiodactylous Tbx4 genes were much lower than 1, indicating that the Tbx4 gene maintains its function in cetaceans, although full expression leading to hindlimb development is suppressed.

  7. Variation in Symbiodinium ITS2 Sequence Assemblages among Coral Colonies

    PubMed Central

    Stat, Michael; Bird, Christopher E.; Pochon, Xavier; Chasqui, Luis; Chauka, Leonard J.; Concepcion, Gregory T.; Logan, Dan; Takabayashi, Misaki; Toonen, Robert J.; Gates, Ruth D.

    2011-01-01

    Endosymbiotic dinoflagellates in the genus Symbiodinium are fundamentally important to the biology of scleractinian corals, as well as to a variety of other marine organisms. The genus Symbiodinium is genetically and functionally diverse and the taxonomic nature of the union between Symbiodinium and corals is implicated as a key trait determining the environmental tolerance of the symbiosis. Surprisingly, the question of how Symbiodinium diversity partitions within a species across spatial scales of meters to kilometers has received little attention, but is important to understanding the intrinsic biological scope of a given coral population and adaptations to the local environment. Here we address this gap by describing the Symbiodinium ITS2 sequence assemblages recovered from colonies of the reef building coral Montipora capitata sampled across Kāne'ohe Bay, Hawai'i. A total of 52 corals were sampled in a nested design of Coral Colony(Site(Region)) reflecting spatial scales of meters to kilometers. A diversity of Symbiodinium ITS2 sequences was recovered with the majority of variance partitioning at the level of the Coral Colony. To confirm this result, the Symbiodinium ITS2 sequence diversity in six M. capitata colonies was analyzed in much greater depth with 35 to 55 clones per colony. The ITS2 sequences and quantitative composition recovered from these colonies varied significantly, indicating that each coral hosted a different assemblage of Symbiodinium. The diversity of Symbiodinium ITS2 sequence assemblages retrieved from individual colonies of M. capitata here highlights the problems inherent in interpreting multi-copy and intra-genomically variable molecular markers, and serves as a context for discussing the utility and biological relevance of assigning species names based on Symbiodinium ITS2 genotyping. PMID:21246044

  8. Variation in Symbiodinium ITS2 sequence assemblages among coral colonies.

    PubMed

    Stat, Michael; Bird, Christopher E; Pochon, Xavier; Chasqui, Luis; Chauka, Leonard J; Concepcion, Gregory T; Logan, Dan; Takabayashi, Misaki; Toonen, Robert J; Gates, Ruth D

    2011-01-05

    Endosymbiotic dinoflagellates in the genus Symbiodinium are fundamentally important to the biology of scleractinian corals, as well as to a variety of other marine organisms. The genus Symbiodinium is genetically and functionally diverse and the taxonomic nature of the union between Symbiodinium and corals is implicated as a key trait determining the environmental tolerance of the symbiosis. Surprisingly, the question of how Symbiodinium diversity partitions within a species across spatial scales of meters to kilometers has received little attention, but is important to understanding the intrinsic biological scope of a given coral population and adaptations to the local environment. Here we address this gap by describing the Symbiodinium ITS2 sequence assemblages recovered from colonies of the reef building coral Montipora capitata sampled across Kāne'ohe Bay, Hawai'i. A total of 52 corals were sampled in a nested design of Coral Colony(Site(Region)) reflecting spatial scales of meters to kilometers. A diversity of Symbiodinium ITS2 sequences was recovered with the majority of variance partitioning at the level of the Coral Colony. To confirm this result, the Symbiodinium ITS2 sequence diversity in six M. capitata colonies was analyzed in much greater depth with 35 to 55 clones per colony. The ITS2 sequences and quantitative composition recovered from these colonies varied significantly, indicating that each coral hosted a different assemblage of Symbiodinium. The diversity of Symbiodinium ITS2 sequence assemblages retrieved from individual colonies of M. capitata here highlights the problems inherent in interpreting multi-copy and intra-genomically variable molecular markers, and serves as a context for discussing the utility and biological relevance of assigning species names based on Symbiodinium ITS2 genotyping.

  9. Sequence variation of alcohol dehydrogenase (Adh) paralogs in cactophilic Drosophila.

    PubMed Central

    Matzkin, Luciano M; Eanes, Walter F

    2003-01-01

    This study focuses on the population genetics of alcohol dehydrogenase (Adh) in cactophilic Drosophila. Drosophila mojavensis and D. arizonae utilize cactus hosts, and each host contains a characteristic mixture of alcohol compounds. In these Drosophila species there are two functional Adh loci, an adult form (Adh-2) and a larval and ovarian form (Adh-1). Overall, the greater level of variation segregating in D. arizonae than in D. mojavensis suggests a larger population size for D. arizonae. There are markedly different patterns of variation between the paralogs across both species. A 16-bp intron haplotype segregates in both species at Adh-2, apparently the product of an ancient gene conversion event between the paralogs, which suggests that there is selection for the maintenance of the intron structure possibly for the maintenance of pre-mRNA structure. We observe a pattern of variation consistent with adaptive protein evolution in the D. mojavensis lineage at Adh-1, suggesting that the cactus host shift that occurred in the divergence of D. mojavensis from D. arizonae had an effect on the evolution of the larval expressed paralog. Contrary to previous work we estimate a recent time for both the divergence of D. mojavensis and D. arizonae (2.4 +/- 0.7 MY) and the age of the gene duplication (3.95 +/- 0.45 MY). PMID:12586706

  10. Draft genome sequence of an elite Dura palm and whole-genome patterns of DNA variation in oil palm

    PubMed Central

    Jin, Jingjing; Lee, May; Bai, Bin; Sun, Yanwei; Qu, Jing; Rahmadsyah; Alfiko, Yuzer; Lim, Chin Huat; Suwanto, Antonius; Sugiharti, Maria; Wong, Limsoon; Ye, Jian; Chua, Nam-Hai; Yue, Gen Hua

    2016-01-01

    Oil palm is the world’s leading source of vegetable oil and fat. Dura, Pisifera and Tenera are three forms of oil palm. The genome sequence of Pisifera is available whereas the Dura form has not been sequenced yet. We sequenced the genome of one elite Dura palm, and re-sequenced 17 palm genomes. The assembled genome sequence of the elite Dura tree contained 10,971 scaffolds and was 1.701 Gb in length, covering 94.49% of the oil palm genome. 36,105 genes were predicted. Re-sequencing of 17 additional palm trees identified 18.1 million SNPs. We found high genetic variation among palms from different geographical regions, but lower variation among Southeast Asian Dura and Pisifera palms. We mapped 10,000 SNPs on the linkage map of oil palm. In addition, high linkage disequilibrium (LD) was detected in the oil palms used in breeding populations of Southeast Asia, suggesting that LD mapping is likely to be practical in this important oil crop. Our data provide a valuable resource for accelerating genetic improvement and studying the mechanism underlying phenotypic variations of important oil palm traits. PMID:27426468

  11. Draft genome sequence of an elite Dura palm and whole-genome patterns of DNA variation in oil palm.

    PubMed

    Jin, Jingjing; Lee, May; Bai, Bin; Sun, Yanwei; Qu, Jing; Rahmadsyah; Alfiko, Yuzer; Lim, Chin Huat; Suwanto, Antonius; Sugiharti, Maria; Wong, Limsoon; Ye, Jian; Chua, Nam-Hai; Yue, Gen Hua

    2016-12-01

    Oil palm is the world's leading source of vegetable oil and fat. Dura, Pisifera and Tenera are three forms of oil palm. The genome sequence of Pisifera is available whereas the Dura form has not been sequenced yet. We sequenced the genome of one elite Dura palm, and re-sequenced 17 palm genomes. The assembled genome sequence of the elite Dura tree contained 10,971 scaffolds and was 1.701 Gb in length, covering 94.49% of the oil palm genome. 36,105 genes were predicted. Re-sequencing of 17 additional palm trees identified 18.1 million SNPs. We found high genetic variation among palms from different geographical regions, but lower variation among Southeast Asian Dura and Pisifera palms. We mapped 10,000 SNPs on the linkage map of oil palm. In addition, high linkage disequilibrium (LD) was detected in the oil palms used in breeding populations of Southeast Asia, suggesting that LD mapping is likely to be practical in this important oil crop. Our data provide a valuable resource for accelerating genetic improvement and studying the mechanism underlying phenotypic variations of important oil palm traits.

  12. A Next-Generation Sequencing Method for Genotyping-by-Sequencing of Highly Heterozygous Autotetraploid Potato

    PubMed Central

    Uitdewilligen, Jan G. A. M. L.; Wolters, Anne-Marie A.; D’hoop, Bjorn B.; Borm, Theo J. A.; Visser, Richard G. F.; van Eck, Herman J.

    2013-01-01

    Assessment of genomic DNA sequence variation and genotype calling in autotetraploids implies the ability to distinguish among five possible alternative allele copy number states. This study demonstrates the accuracy of genotyping-by-sequencing (GBS) of a large collection of autotetraploid potato cultivars using next-generation sequencing. It is still costly to reach sufficient read depths on a genome wide scale, across the cultivated gene pool. Therefore, we enriched cultivar-specific DNA sequencing libraries using an in-solution hybridisation method (SureSelect). This complexity reduction allowed to confine our study to 807 target genes distributed across the genomes of 83 tetraploid cultivars and one reference (DM 1–3 511). Indexed sequencing libraries were paired-end sequenced in 7 pools of 12 samples using Illumina HiSeq2000. After filtering and processing the raw sequence data, 12.4 Gigabases of high-quality sequence data was obtained, which mapped to 2.1 Mb of the potato reference genome, with a median average read depth of 63× per cultivar. We detected 129,156 sequence variants and genotyped the allele copy number of each variant for every cultivar. In this cultivar panel a variant density of 1 SNP/24 bp in exons and 1 SNP/15 bp in introns was obtained. The average minor allele frequency (MAF) of a variant was 0.14. Potato germplasm displayed a large number of relatively rare variants and/or haplotypes, with 61% of the variants having a MAF below 0.05. A very high average nucleotide diversity (π = 0.0107) was observed. Nucleotide diversity varied among potato chromosomes. Several genes under selection were identified. Genotyping-by-sequencing results, with allele copy number estimates, were validated with a KASP genotyping assay. This validation showed that read depths of ∼60–80× can be used as a lower boundary for reliable assessment of allele copy number of sequence variants in autotetraploids. Genotypic data were associated with traits, and
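    The five allele copy number states of an autotetraploid (0 to 4 copies of the alternative allele) can be illustrated with a minimal sketch. This is not the authors' genotype caller: the nearest-expected-fraction rule, the function names and the read counts are assumptions for illustration only. It also shows how a minor allele frequency follows from per-cultivar dosages.

```python
PLOIDY = 4  # autotetraploid: 0, 1, 2, 3 or 4 copies of the alternative allele


def call_dosage(ref_reads, alt_reads):
    """Return the alt-allele copy number whose expected read fraction
    (dosage / PLOIDY) is closest to the observed alt-read fraction."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads covering this variant")
    observed = alt_reads / total
    return min(range(PLOIDY + 1), key=lambda d: abs(observed - d / PLOIDY))


def minor_allele_frequency(dosages):
    """MAF across cultivars, given one alt-allele dosage per cultivar."""
    alt_freq = sum(dosages) / (PLOIDY * len(dosages))
    return min(alt_freq, 1 - alt_freq)


# Hypothetical (ref, alt) read counts for four cultivars at about 60x depth.
counts = [(60, 0), (47, 16), (30, 33), (0, 63)]
dosages = [call_dosage(r, a) for r, a in counts]
maf = minor_allele_frequency(dosages)
print(dosages, maf)
```

    Neighboring dosage states differ by only 0.25 in expected read fraction, which gives some intuition for why reliable dosage calls need substantial read depth.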

  13. Analysis of Sequence Variation and Risk Association of Human Papillomavirus 52 Variants Circulating in Korea

    PubMed Central

    Choi, Youn Jin; Ki, Eun Young; Zhang, Chuqing; Ho, Wendy C. S.; Lee, Sung-Jong; Jeong, Min Jin

    2016-01-01

    Introduction Human papillomavirus (HPV) 52 is a carcinogenic, high-risk genotype frequently detected in cervical cancer cases from East Asia, including Korea. Materials and Methods Sequences of HPV52 detected in 91 cervical samples collected from women attending Seoul St. Mary’s Hospital were analyzed. HPV52 genomic sequences were obtained by polymerase chain reaction (PCR)-based sequencing and analyzed using Seq-Scape software, and phylogenetic trees were constructed using MEGA6 software. Results Of the 91 cervical samples, 40 were normal, 22 were low-grade lesions, 21 were high-grade lesions and 7 were squamous cell carcinomas. Four HPV52 variant lineages (A, B, C and D) were identified. Lineage B was the most frequently detected lineage, followed by lineage C. By analyzing the two most frequently detected lineages (B and C), we found that distinct variations existed in each lineage. We also found that a lineage B-specific mutation K93R (A379G) was associated with an increased risk of cervical neoplasia. Conclusions To our knowledge, we are the first to reveal the predominance of the HPV52 lineages, B and C, in Korea. We also found these lineages harbored distinct genetic alterations that may affect oncogenicity. Our findings increase our understanding on the heterogeneity of HPV52 variants, and may be useful for the development of new diagnostic assays and therapeutic vaccines. PMID:27977741

  14. HIV-1 sequence variation between isolates from mother-infant transmission pairs

    SciTech Connect

    Wike, C.M.; Daniels, M.R.; Furtado, M.; Wolinsky, M.; Korber, B.; Hutto, C.; Munoz, J.; Parks, W.; Saah, A.

    1991-01-01

    To examine the sequence diversity of human immunodeficiency virus type 1 (HIV-1) between known transmission sets, sequences from the V3 and V4-V5 region of the env gene from 4 mother-infant pairs were analyzed. The mean interpatient sequence variation between isolates from linked mother-infant pairs was comparable to the sequence diversity found between isolates from other close contacts. The mean intrapatient variation was significantly less in the infants' isolates than the isolates from both their mothers and other characterized intrapatient sequence sets. In addition, a distinct and characteristic difference in the glycosylation pattern preceding the V3 loop was found between each linked transmission pair. These findings indicate that selection of specific genotypic variants, which may play a role in some direct transmission sets, and the duration of infection are important factors in the degree of diversity seen between the sequence sets.

  15. HIV-1 sequence variation between isolates from mother-infant transmission pairs

    SciTech Connect

    Wike, C.M.; Daniels, M.R.; Furtado, M.; Wolinsky, M.; Korber, B.; Hutto, C.; Munoz, J.; Parks, W.; Saah, A.

    1991-12-31

    To examine the sequence diversity of human immunodeficiency virus type 1 (HIV-1) between known transmission sets, sequences from the V3 and V4-V5 region of the env gene from 4 mother-infant pairs were analyzed. The mean interpatient sequence variation between isolates from linked mother-infant pairs was comparable to the sequence diversity found between isolates from other close contacts. The mean intrapatient variation was significantly less in the infants' isolates than the isolates from both their mothers and other characterized intrapatient sequence sets. In addition, a distinct and characteristic difference in the glycosylation pattern preceding the V3 loop was found between each linked transmission pair. These findings indicate that selection of specific genotypic variants, which may play a role in some direct transmission sets, and the duration of infection are important factors in the degree of diversity seen between the sequence sets.

  16. Targeted deep sequencing of flowering regulators in Brassica napus reveals extensive copy number variation

    PubMed Central

    Schiessl, Sarah; Huettel, Bruno; Kuehn, Diana; Reinhardt, Richard; Snowdon, Rod J.

    2017-01-01

    Gene copy number variation (CNV) is increasingly implicated in control of complex trait networks, particularly in polyploid plants like rapeseed (Brassica napus L.) with an evolutionary history of genome restructuring. Here we performed sequence capture to assay nucleotide variation and CNV in a panel of central flowering time regulatory genes across a species-wide diversity set of 280 B. napus accessions. The genes were chosen based on prior knowledge from Arabidopsis thaliana and related Brassica species. Target enrichment was performed using the Agilent SureSelect technology, followed by Illumina sequencing. A bait (probe) pool was developed based on results of a preliminary experiment with representatives from different B. napus morphotypes. A very high mean target coverage of ~670x allowed reliable calling of CNV, single nucleotide polymorphisms (SNPs) and insertion-deletion (InDel) polymorphisms. Every accession exhibited some CNV, and at least one homolog of every gene we investigated showed CNV in some accessions. Some CNV appear more often in specific morphotypes, indicating a role in diversification. PMID:28291231

  17. Engineering the Dynamic Properties of Protein Networks through Sequence Variation

    PubMed Central

    2016-01-01

    The dynamic behavior of macromolecular networks dominates the mechanical properties of soft materials and influences biological processes at multiple length scales. In hydrogels prepared from self-assembling artificial proteins, stress relaxation and energy dissipation arise from the transient character of physical network junctions. Here we show that subtle changes in sequence can be used to program the relaxation behavior of end-linked networks of engineered coiled-coil proteins. Single-site substitutions in the coiled-coil domains caused shifts in relaxation time over 5 orders of magnitude as demonstrated by dynamic oscillatory shear rheometry and stress relaxation measurements. Networks with multiple relaxation time scales were also engineered. This work demonstrates how time-dependent mechanical responses of macromolecular materials can be encoded in genetic information. PMID:27924309

  18. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    NASA Astrophysics Data System (ADS)

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-06-01

    Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.

  19. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation.

    PubMed

    Sheynkman, Gloria M; Shortreed, Michael R; Cesnik, Anthony J; Smith, Lloyd M

    2016-06-12

    Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.

  20. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    PubMed Central

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-01-01

    Mass spectrometry–based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications. PMID:27049631

  1. The complete nucleotide sequence of the Crossostoma lacustre mitochondrial genome: conservation and variations among vertebrates.

    PubMed Central

    Tzeng, C S; Hui, C F; Shen, S C; Huang, P C

    1992-01-01

    The complete mitochondrial (mt) genome of Crossostoma lacustre, a freshwater loach from mountain streams of Taiwan, has been cloned and sequenced. This fish mt genome, consisting of 16558 base-pairs, encodes genes for 13 proteins, two rRNAs, and 22 tRNAs, in addition to a regulatory sequence for replication and transcription (D-loop), and is similar to those of the other vertebrates in both the order and orientation of these genes. The protein-coding and ribosomal RNA genes are highly homologous, both in size and composition, to their counterparts in mammals, birds, amphibians, and invertebrates, and use essentially the same set of codons, including both the initiation and termination signals, and the tRNAs. Differences do exist, however, in the lengths and sequences of the D-loop regions, and in space between genes, which account for the variations in total lengths of the genomes. Our observations provide evidence for the first time for the conservation of genetic information in the fish mitochondrial genome, especially among the vertebrates. PMID:1408800

  2. High compression image and image sequence coding

    NASA Technical Reports Server (NTRS)

    Kunt, Murat

    1989-01-01

    The digital representation of an image requires a very large number of bits. This number is even larger for an image sequence. The goal of image coding is to reduce this number, as much as possible, and reconstruct a faithful duplicate of the original picture or image sequence. Early efforts in image coding, solely guided by information theory, led to a plethora of methods. The compression ratio reached a plateau around 10:1 a couple of years ago. Recent progress in the study of the brain mechanism of vision and scene analysis has opened new vistas in picture coding. Directional sensitivity of the neurones in the visual pathway combined with the separate processing of contours and textures has led to a new class of coding methods capable of achieving compression ratios as high as 100:1 for images and around 300:1 for image sequences. Recent progress on some of the main avenues of object-based methods is presented. These second generation techniques make use of contour-texture modeling, new results in neurophysiology and psychophysics and scene analysis.

  3. Targeted Exome Sequencing Outcome Variations of Colorectal Tumors within and across Two Sequencing Platforms

    PubMed Central

    Ashktorab, Hassan; Azimi, Hamed; Nickerson, Michael L.; Bass, Sara; Varma, Sudhir; Brim, Hassan

    2016-01-01

    Background and Aim Next generation sequencing (NGS) has quickly become the tool of choice for genome and exome data generation. The multitude of sequencing platforms as well as the variabilities within each platform need to be assessed. In this paper we used two platforms (Ion Torrent and Illumina) to assess single nucleotide variants in colorectal cancer (CRC) specimens. Methods CRC specimens (n = 13) collected from 6 CRC (cancer and matched normal) patients were used to establish the mutational profile using Ion Torrent and Illumina sequencing platforms. We analyzed a set of formalin-fixed paraffin-embedded (FFPE) and fresh frozen (FF) samples on both platforms to assess the effect of sample nature (FFPE vs. FF) on sequencing outcome and to evaluate the similarity/differences of SNVs across the two platforms. In addition, duplicates of FF samples were sequenced on each platform to assess variability within platform. Results The comparison of FF replicates to each other gave a concordance of 77% (± 15.3%) in Ion Torrent and 70% (± 3.7%) in Illumina. FFPE vs. FF replicates gave a concordance of 40% (± 32%) in Ion Torrent and 49% (± 19%) in Illumina. Cross-platform concordance averaged 75% (± 9.8%) for FFPE samples, 67% (± 32%) for FF samples, and 70% (± 26.8%) overall. Conclusion Our data show a significant variability within and across platforms. Also, the number of detected variants depends on the nature of the specimen (FF vs. FFPE). Validation of NGS-discovered mutations is a must to rule out false positive mutants. This validation might either be performed through a second NGS platform or through Sanger sequencing. PMID:27547838
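    The abstract does not state how concordance was computed, so the following is only a sketch of one common definition: the share of variants found in both call sets relative to all variants found in either. The function name and the example variants are hypothetical.

```python
def concordance(calls_a, calls_b):
    """Percent of variants shared by two call sets, relative to their union.

    This shared-over-union definition is an assumption; the study may have
    used a different denominator.
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return 100.0  # two empty call sets agree trivially
    return 100.0 * len(a & b) / len(union)


# Variants as (chromosome, position, ref, alt) tuples; values are made up.
rep1 = {("chr1", 100, "A", "G"), ("chr2", 250, "C", "T"), ("chr5", 77, "G", "A")}
rep2 = {("chr1", 100, "A", "G"), ("chr2", 250, "C", "T"), ("chr7", 12, "T", "C")}
print(concordance(rep1, rep2))
```

    With this definition, a variant called in only one replicate lowers the score regardless of which replicate reported it.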

  4. Sequence variation and methylation of the flax 5S RNA genes.

    PubMed Central

    Goldsbrough, P B; Ellis, T H; Lomonossoff, G P

    1982-01-01

    The complete sequence of the flax 5S DNA repeat is presented. Length heterogeneity is the consequence of the presence or absence of a single direct repeat and the majority of single base changes are transition mutations. No sequence variation has been found in the coding sequence. The extent of methylation of cytosines has been measured at one location in the gene and one in the spacer. The relationship between the observed sequence heterogeneity and the level of methylation is discussed in the context of the operation of a correction mechanism. PMID:6290983

  5. Copy number variation of individual cattle genomes using next-generation sequencing

    PubMed Central

    Bickhart, Derek M.; Hou, Yali; Schroeder, Steven G.; Alkan, Can; Cardone, Maria Francesca; Matukumalli, Lakshmi K.; Song, Jiuzhou; Schnabel, Robert D.; Ventura, Mario; Taylor, Jeremy F.; Garcia, Jose Fernando; Van Tassell, Curtis P.; Sonstegard, Tad S.; Eichler, Evan E.; Liu, George E.

    2012-01-01

    Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one Holstein, and one Hereford) and one indicine (Nelore) cattle. Within mapped chromosomal sequence, we identified 1265 CNV regions comprising ∼55.6-Mbp sequence—476 of which (∼38%) have not previously been reported. We validated this sequence-based CNV call set with array comparative genomic hybridization (aCGH), quantitative PCR (qPCR), and fluorescent in situ hybridization (FISH), achieving a validation rate of 82% and a false positive rate of 8%. We further estimated absolute copy numbers for genomic segments and annotated genes in each individual. Surveys of the top 25 most variable genes revealed that the Nelore individual had the lowest copy numbers in 13 cases (∼52%, χ2 test; P-value <0.05). In contrast, genes related to pathogen- and parasite-resistance, such as CATHL4 and ULBP17, were highly duplicated in the Nelore individual relative to the taurine cattle, while genes involved in lipid transport and metabolism, including APOL3 and FABP2, were highly duplicated in the beef breeds. These CNV regions also harbor genes like BPIFA2A (BSP30A) and WC1, suggesting that some CNVs may be associated with breed-specific differences in adaptation, health, and production traits. By providing the first individualized cattle CNV and segmental duplication maps and genome-wide gene copy number estimates, we enable future CNV studies into highly duplicated regions in the cattle genome. PMID:22300768
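    The read-depth idea mentioned above can be illustrated with a minimal sketch. It assumes a diploid genome and that copy number scales linearly with the mean read depth of a window relative to a control region of known copy number; this is a simplification for illustration, not the authors' pipeline, and all names and depth values are hypothetical.

```python
def estimate_copy_number(window_depth, control_depth, ploidy=2):
    """Scale the ploidy by the ratio of window depth to control-region depth."""
    if control_depth <= 0:
        raise ValueError("control_depth must be positive")
    return ploidy * window_depth / control_depth


# Hypothetical mean read depths per genomic window; the control region is assumed diploid.
depths = {"window_a": 30.0, "window_b": 45.0, "window_c": 15.0}
control = 30.0

for name, depth in depths.items():
    print(name, estimate_copy_number(depth, control))
```

    In this toy example a window at 1.5 times the control depth is read as three copies and a window at half the control depth as one copy.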

  6. Variation of partial transferrin sequences and phylogenetic relationships among hares (Lepus capensis, Lagomorpha) from Tunisia.

    PubMed

    Awadi, Asma; Suchentrunk, Franz; Makni, Mohamed; Ben Slimen, Hichem

    2016-10-01

    North African hares are currently included in cape hares, Lepus capensis sensu lato, a taxon that may be considered a superspecies or a complex of closely related species. The existing molecular data, however, are not unequivocal, with mtDNA control region sequences suggesting a separate species status and nuclear loci (allozymes, microsatellites) revealing conspecificity of L. capensis and L. europaeus. Here, we study sequence variation in the intron 6 (468 bp) of the transferrin nuclear gene, of 105 hares with different coat colour from different regions in Tunisia with respect to genetic diversity and differentiation, as well as their phylogenetic status. Forty-six haplotypes (alleles) were revealed and compared phylogenetically to all available TF haplotypes of various Lepus species retrieved from GenBank. Maximum Likelihood, neighbor joining and median joining network analyses concordantly grouped all currently obtained haplotypes together with haplotypes belonging to six different Chinese hare species and the African scrub hare L. saxatilis. Moreover, two Tunisian haplotypes were shared with L. capensis, L. timidus, L. sinensis, L. yarkandensis, and L. hainanus from China. These results indicated the evolutionary complexity of the genus Lepus with the mixing of nuclear gene haplotypes resulting from introgressive hybridization and/or shared ancestral polymorphism. We report the presence of shared ancestral polymorphism between North African and Chinese hares. This has not been detected earlier in the mtDNA sequences of the same individuals. Genetic diversity of the TF sequences from the Tunisian populations was relatively high compared to other hare populations. However, genetic differentiation and gene flow analyses (AMOVA, FST, Nm) indicated little divergence with the absence of geographically meaningful phylogroups and lack of clustering with coat colour types. These results confirm the presence of a single hare species in Tunisia, but a sound inference on

  7. Tough Coating Proteins: Subtle Sequence Variation Modulates Cohesion

    PubMed Central

    Das, Saurabh; Miller, Dusty R.; Kaufman, Yair; Martinez Rodriguez, Nadine R.; Pallaoro, Alessia; Harrington, Matthew J.; Gylys, Maryte; Israelachvili, Jacob N.; Waite, J. Herbert

    2015-01-01

    Mussel foot protein-1 (mfp-1) is an essential constituent of the protective cuticle covering all exposed portions of the byssus (plaque and the thread) that marine mussels use to attach to intertidal rocks. The reversible complexation of Fe3+ by the 3,4-dihydroxyphenylalanine (Dopa) side chains in mfp-1 in Mytilus californianus cuticle is responsible for its high extensibility (120%) as well as its stiffness (2 GPa) due to the formation of sacrificial bonds that help to dissipate energy and avoid accumulation of stresses in the material. We have investigated the interactions between Fe3+ and mfp-1 from two mussel species, M. californianus (Mc) and M. edulis (Me), using both surface sensitive and solution phase techniques. Our results show that although mfp-1 homologues from both species bind Fe3+, mfp-1 (Mc) contains Dopa with two distinct Fe3+-binding tendencies and prefers to form intramolecular complexes with Fe3+. In contrast, mfp-1 (Me) is better adapted to intermolecular Fe3+ binding by Dopa. Addition of Fe3+ did not significantly increase the cohesion energy between the mfp-1 (Mc) films at pH 5.5. However, iron appears to stabilize the cohesive bridging of mfp-1 (Mc) films at the physiologically relevant pH of 7.5, where most other mfps lose their ability to adhere reversibly. Understanding the molecular mechanisms underpinning the capacity of M. californianus cuticle to withstand twice the strain of M. edulis cuticle is important for engineering of tunable strain tolerant composite coatings for biomedical applications. PMID:25692318

  8. Variation in the prion protein sequence in Dutch goat breeds.

    PubMed

    Windig, J J; Hoving, R A H; Priem, J; Bossers, A; van Keulen, L J M; Langeveld, J P M

    2016-10-01

    Scrapie is a neurodegenerative disease occurring in goats and sheep. Several haplotypes of the prion protein increase resistance to scrapie infection and may be used in selective breeding to help eradicate scrapie. In this study, frequencies of the allelic variants of the PrP gene are determined for six goat breeds in the Netherlands. Overall frequencies in Dutch goats were determined from 768 brain tissue samples in 2005, 766 in 2008 and 300 in 2012, derived from random sampling for the national scrapie surveillance without knowledge of the breed. Breed-specific frequencies were determined in the winter of 2013/2014 by sampling 300 breeding animals from the main breeders of the different breeds. Detailed analysis of the scrapie-resistant K222 haplotype was carried out in 2014 for 220 Dutch Toggenburger goats and in 2015 for 942 goats from the Saanen-derived White Goat breed. Nine haplotypes were identified in the Dutch breeds. Frequencies for non-wild type haplotypes were generally low. Exceptions were the K222 haplotype in the Dutch Toggenburger (29%) and the S146 haplotype in the Nubian and Boer breeds (7 and 31%, respectively). The frequency of the K222 haplotype in the Toggenburger was higher than for any other breed reported in the literature, while for the White Goat breed, at 3.1%, it was similar to frequencies of other Saanen or Saanen-derived breeds. Further evidence was found for the existence of two M142 haplotypes, M142/S240 and M142/P240. Breeds vary in haplotype frequencies, but frequencies of resistant genotypes are generally low; consequently, selective breeding for scrapie resistance can only be slow but will benefit from animals identified in this study. The unexpectedly high frequency of the K222 haplotype in the Dutch Toggenburger underlines the need for conservation of rare breeds in order to conserve genetic diversity that is rare or absent in other breeds.

  9. Mitochondrial DNA sequence variation in Iranian native dogs.

    PubMed

    Amiri Ghanatsaman, Zeinab; Adeola, Adeniyi C; Asadi Fozi, Masood; Ma, Ya-Ping; Peng, Min-Sheng; Wang, Guo-Dong; Esmailizadeh, Ali; Zhang, Ya-Ping

    2017-03-17

    The picture of dog mtDNA diversity obtained from wide geographical sampling, but from a small number of individuals per region or breed, displayed little geographical correlation and a high degree of haplotype sharing between very distant breeds. For a clearer picture, we extensively surveyed Iranian native dogs (n = 305) in comparison with published European (n = 443) and Southwest Asian (n = 195) dogs. Twelve haplotypes related to haplogroups A, B and C were shared by Iranian, European, Southwest Asian and East Asian dogs. In Iran, haplotype and nucleotide diversities were highest in the east, southeast and northwest populations, while the western population had the lowest. Sarabi and Saluki dog populations can be assigned to haplogroups A, B, C and D; Qahderijani and Kurdi to haplogroups A, B and C; Torkaman to haplogroups A, B and D; and Sangsari and Fendo to haplogroups A and B, respectively. Evaluation of population differentiation using pairwise FST generally revealed no clear population structure in most Iranian dog populations. The genetic signal of a recent demographic expansion was detected in the East and Southeast populations. Further, in accordance with previous studies on dog-wolf hybridization for haplogroup d2 origin, the highest number of d2 haplotypes in Iranian dogs as compared to other areas of the Mediterranean basin suggests Iran as the probable center of its origin. Historical evidence shows that the Silk Road linked Iran to countries in South East Asia and other parts of the world, which may have influenced effective gene flow within Iran and these regions. The medium nucleotide diversity observed in Iranian dogs calls for the use of appropriate management techniques to increase effective population size.

  10. Detecting Alu insertions from high-throughput sequencing data

    PubMed Central

    David, Matei; Mustafa, Harun; Brudno, Michael

    2013-01-01

    High-throughput sequencing technologies have allowed for the cataloguing of variation in personal human genomes. In this manuscript, we present alu-detect, a tool that combines read-pair and split-read information to detect novel Alus and their precise breakpoints directly from either whole-genome or whole-exome sequencing data while also identifying insertions directly in the vicinity of existing Alus. To set the parameters of our method, we use simulation of a faux reference, which allows us to compute the precision and recall of various parameter settings using real sequencing data. Applying our method to 100 bp paired Illumina data from seven individuals, including two trios, we detected on average 1519 novel Alus per sample. Based on the faux-reference simulation, we estimate that our method has 97% precision and 85% recall. We identify 808 novel Alus not previously described in other studies. We also demonstrate the use of alu-detect to study the local sequence and global location preferences for novel Alu insertions. PMID:23921633
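    Precision and recall, as used above to evaluate parameter settings, can be computed from a set of calls and a set of known insertions. The following is a minimal sketch under the assumption that calls and truth can be matched exactly by site identifier; the function name and the site values are hypothetical, and the tool's own matching rules may differ.

```python
def precision_recall(called, truth):
    """Return (precision, recall) for called sites against known true sites."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical insertion sites; in a simulation the truth set is known by construction.
known_sites = {100, 200, 300, 400}
called_sites = {100, 200, 300, 999}

print(precision_recall(called_sites, known_sites))  # 3 of 4 calls correct, 3 of 4 sites found
```

    Precision falls when a setting produces spurious calls, while recall falls when it misses real insertions, which is why the two are reported together.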

  11. Homologous recombination drives both sequence diversity and gene content variation in Neisseria meningitidis.

    PubMed

    Kong, Ying; Ma, Jennifer H; Warren, Keisha; Tsang, Raymond S W; Low, Donald E; Jamieson, Frances B; Alexander, David C; Hao, Weilong

    2013-01-01

    The study of genetic and phenotypic variation is fundamental for understanding the dynamics of bacterial genome evolution and untangling the evolution and epidemiology of bacterial pathogens. Neisseria meningitidis (Nm) is among the most intriguing bacterial pathogens in genomic studies due to its dynamic population structure and complex forms of pathogenicity. Extensive genomic variation within identical clonal complexes (CCs) in Nm has been recently reported and suggested to be the result of homologous recombination, but the extent to which recombination contributes to genomic variation within identical CCs has remained unclear. In this study, we sequenced two Nm strains of identical serogroup (C) and multi-locus sequence type (ST60), and conducted a systematic analysis with an additional 34 Nm genomes. Our results revealed that all gene content variation between the two ST60 genomes was introduced by homologous recombination at the conserved flanking genes, and 94.25% or more of sequence divergence was caused by homologous recombination. Recombination was found in genes associated with virulence factors, antigenic outer membrane proteins, and vaccine targets, suggesting an important role of homologous recombination in rapidly altering the pathogenicity and antigenicity of Nm. Recombination was also evident in genes of the restriction and modification systems, which may undermine barriers to DNA exchange. In conclusion, homologous recombination can drive both gene content variation and sequence divergence in Nm. These findings shed new light on the understanding of the rapid pathoadaptive evolution of Nm and other recombinogenic bacterial pathogens.

  12. Sequence variation of ribosomal internal transcribed spacers (ITS) in commercially important Phytoseiidae mites.

    PubMed

    Navajas, M; Lagnel, J; Fauvel, G; de Moraes, G

    1999-11-01

    Preliminary work is needed to assess the usefulness of different markers at different taxonomic scales when a new group is analyzed, such as the commercially important Phytoseiidae mites. We investigate here the level of sequence variation of the nuclear ribosomal spacers ITS 1 and 2 and the 5.8S gene in six species of Phytoseiidae: Neoseiulus californicus, N. fallacis, Euseius concordis, Metaseiulus occidentalis, Typhlodromus pyri and Phytoseiulus persimilis. As expected, the 5.8S gene (148 base pairs) is markedly conserved and displays little variation in between-genera comparisons. ITS1 and ITS2 show contrasting patterns: while the ITS2 is short (80-89 bp) and shows little variation, the ITS1 is longer (303-404 bp) and is very variable in sequence. This fact compromises reliable nucleotide homologies when comparing the genera. The comparison of ITS1 sequence similarity at the species level might be useful for species identification; however, the value of ITS in taxonomic studies does not extend to the level of the family. The intraspecific variations of ITS were investigated in three species: N. californicus, N. fallacis and E. concordis. The first species has identical ITS1 sequences and the last two display low polymorphism (2 nucleotide substitutions). The ITS2 and 5.8S sequences were identical in all three subspecies comparisons.

  13. Characterization of genetic sequence variation of 58 STR loci in four major population groups.

    PubMed

    Novroski, Nicole M M; King, Jonathan L; Churchill, Jennifer D; Seah, Lay Hong; Budowle, Bruce

    2016-11-01

    Massively parallel sequencing (MPS) can identify sequence variation within short tandem repeat (STR) alleles as well as their nominal allele lengths that traditionally have been obtained by capillary electrophoresis. Using the MiSeq FGx Forensic Genomics System (Illumina), STRait Razor, and in-house Excel workbooks, genetic variation was characterized within STR repeat and flanking regions of 27 autosomal, 7 X-chromosome and 24 Y-chromosome STR markers in 777 unrelated individuals from four population groups. Seven hundred and forty-six autosomal, 227 X-chromosome, and 324 Y-chromosome STR alleles were identified by sequence compared with 357 autosomal, 107 X-chromosome, and 189 Y-chromosome STR alleles that were identified by length. Within the observed sequence variation, 227 autosomal, 156 X-chromosome, and 112 Y-chromosome novel alleles were identified and described. One hundred and seventy-six autosomal, 123 X-chromosome, and 93 Y-chromosome sequence variants resided within STR repeat regions, and 86 autosomal, 39 X-chromosome, and 20 Y-chromosome variants were located in STR flanking regions. Three markers, D18S51, DXS10135, and DYS385a-b had 1, 4, and 1 alleles, respectively, which contained both a novel repeat region variant and a flanking sequence variant in the same nucleotide sequence. There were 50 markers that demonstrated a relative increase in diversity with the variant sequence alleles compared with those of traditional nominal length alleles. These population data illustrate the genetic variation that exists in the commonly used STR markers in the selected population samples and provide allele frequencies for statistical calculations related to STR profiling with MPS data.
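    The reason sequencing finds more alleles than length-based typing can be shown with a minimal sketch. It assumes, as a simplification, that a length-based method can only distinguish alleles by their number of bases, whereas sequencing distinguishes every distinct string; the sequences and names below are hypothetical and are not alleles from the study.

```python
from collections import defaultdict


def group_by_length(sequences):
    """Group distinct allele sequences by their length in bases."""
    groups = defaultdict(set)
    for seq in sequences:
        groups[len(seq)].add(seq)
    return dict(groups)


# Hypothetical STR allele sequences: two share a length but differ by one base.
observed = ["TCTATCTATCTA", "TCTATCTGTCTA", "TCTATCTA"]

by_length = group_by_length(observed)
print(len(by_length))      # alleles distinguishable by length
print(len(set(observed)))  # alleles distinguishable by sequence
```

    Here two of the three sequences collapse into a single length-based allele, which mirrors the gap between sequence-based and length-based allele counts reported above.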

  14. Analysis of the sequence variations in the Mhc DRB1-like gene of the endangered Humboldt penguin (Spheniscus humboldti).

    PubMed

    Kikkawa, Eri F; Tsuda, Tomi T; Naruse, Taeko K; Sumiyama, Daisuke; Fukuda, Michio; Kurita, Masanori; Murata, Koichi; Wilson, Rory P; LeMaho, Yvon; Tsuda, Michio; Kulski, Jerzy K; Inoko, Hidetoshi

    2005-04-01

    The Major Histocompatibility Complex (Mhc) genomic region of many vertebrates is known to contain at least one highly polymorphic class II gene that is homologous in sequence to one or other of the human Mhc DRB1 class II genes. The diversity of the avian Mhc class II gene sequences have been extensively studied in chickens, quails, and some songbirds, but have been largely ignored in the oceanic birds, including the flightless penguins. We have previously reported that several penguin species have a high degree of polymorphism on exon 2 of the Mhc class II DRB1-like gene. In this study, we present for the first time the complete nucleotide sequences of exon 2, intron 2, and exon 3 of the DRB1-like gene of 20 Humboldt penguins, a species that is presently vulnerable to the dangers of extinction. The Humboldt DRB1-like nucleotide and amino acid sequences reveal at least eight unique alleles. Phylogenetic analysis of all the available avian DRB-like sequences showed that, of five penguin species and nine other bird species, the sequences of the Humboldt penguins grouped most closely to the Little penguin and the mallard, respectively. The present analysis confirms that the sequence variations of the Mhc class II gene, DRB1, are useful for discriminating among individuals within the same penguin population as well those within different penguin population groups and species.

  15. Repetitive sequence variation and dynamics in the ribosomal DNA array of Saccharomyces cerevisiae as revealed by whole-genome resequencing

    PubMed Central

    James, Stephen A.; O'Kelly, Michael J.T.; Carter, David M.; Davey, Robert P.; van Oudenaarden, Alexander; Roberts, Ian N.

    2009-01-01

    Ribosomal DNA (rDNA) plays a key role in ribosome biogenesis, encoding genes for the structural RNA components of this important cellular organelle. These genes are vital for efficient functioning of the cellular protein synthesis machinery and as such are highly conserved and normally present in high copy numbers. In the baker's yeast Saccharomyces cerevisiae, there are more than 100 rDNA repeats located at a single locus on chromosome XII. Stability and sequence homogeneity of the rDNA array is essential for function, and this is achieved primarily by the mechanism of gene conversion. Detecting variation within these arrays is extremely problematic due to their large size and repetitive structure. In an attempt to address this, we have analyzed over 35 Mbp of rDNA sequence obtained from whole-genome shotgun sequencing (WGSS) of 34 strains of S. cerevisiae. Contrary to expectation, we find significant rDNA sequence variation exists within individual genomes. Many of the detected polymorphisms are not fully resolved. For this type of sequence variation, we introduce the term partial single nucleotide polymorphism, or pSNP. Comparative analysis of the complete data set reveals that different S. cerevisiae genomes possess different patterns of rDNA polymorphism, with much of the variation located within the rapidly evolving nontranscribed intergenic spacer (IGS) region. Furthermore, we find that strains known to have either structured or mosaic/hybrid genomes can be distinguished from one another based on rDNA pSNP number, indicating that pSNP dynamics may provide a reliable new measure of genome origin and stability. PMID:19141593

  16. Using Next-Generation Sequencing for DNA Barcoding: Capturing Allelic Variation in ITS2

    PubMed Central

    Batovska, Jana; Cogan, Noel O. I.; Lynch, Stacey E.; Blacket, Mark J.

    2016-01-01

    Internal Transcribed Spacer 2 (ITS2) is a popular DNA barcoding marker; however, in some animal species it is hypervariable and therefore difficult to sequence with traditional methods. With next-generation sequencing (NGS) it is possible to sequence all gene variants despite the presence of single nucleotide polymorphisms (SNPs), insertions/deletions (indels), homopolymeric regions, and microsatellites. Our aim was to compare the performance of Sanger sequencing and NGS amplicon sequencing in characterizing ITS2 in 26 mosquito species represented by 88 samples. The suitability of ITS2 as a DNA barcoding marker for mosquitoes, and its allelic diversity in individuals and species, was also assessed. Compared to Sanger sequencing, NGS was able to characterize the ITS2 region to a greater extent, with resolution within and between individuals and species that was previously not possible. A total of 382 unique sequences (alleles) were generated from the 88 mosquito specimens, demonstrating the diversity present that has been overlooked by traditional sequencing methods. Multiple indels and microsatellites were present in the ITS2 alleles, which were often specific to species or genera, causing variation in sequence length. As a barcoding marker, ITS2 was able to separate all of the species, apart from members of the Culex pipiens complex, providing the same resolution as the commonly used Cytochrome Oxidase I (COI). The ability to cost-effectively sequence hypervariable markers makes NGS an invaluable tool with many applications in the DNA barcoding field, and provides insights into the limitations of previous studies and techniques. PMID:27799340

  17. Molecular indicators for palaeoenvironmental change in a Messinian evaporitic sequence (Vena del Gesso, Italy). II: High-resolution variations in abundances and 13C contents of free and sulphur-bound carbon skeletons in a single marl bed

    NASA Technical Reports Server (NTRS)

    Kenig, F.; Damste, J. S.; Frewin, N. L.; Hayes, J. M.; De Leeuw, J. W.

    1995-01-01

    The extractable organic matter of 10 immature samples from a marl bed of one evaporitic cycle of the Vena del Gesso sediments (Gessoso-solfifera Fm., Messinian, Italy) was analyzed quantitatively for free hydrocarbons and organic sulphur compounds. Nickel boride was used as a desulphurizing agent to recover sulphur-bound lipids from the polar and asphaltene fractions. Carbon isotopic compositions (delta vs PDB) of free hydrocarbons and of S-bound hydrocarbons were also measured. Relationships between these carbon skeletons, precursor biolipids, and the organisms producing them could then be examined. Concentrations of S-bound lipids and free hydrocarbons and their delta values were plotted vs depth in the marl bed and the profiles were interpreted in terms of variations in source organisms, 13C contents of the carbon source, and environmentally induced changes in isotopic fractionation. The overall range of delta values measured was 24.7‰, from -11.6‰ for a component derived from green sulphur bacteria (Chlorobiaceae) to -36.3‰ for a lipid derived from purple sulphur bacteria (Chromatiaceae). Deconvolution of mixtures of components deriving from multiple sources (green and purple sulphur bacteria, coccolithophorids, microalgae and higher plants) was sometimes possible because both quantitative and isotopic data were available and because either the free or S-bound pool sometimes appeared to contain material from a single source. Several free n-alkanes and S-bound lipids appeared to be specific products of upper-water-column primary producers (i.e. algae and cyanobacteria). Others derived from anaerobic photoautotrophs and from heterotrophic protozoa (ciliates), which apparently fed partly on Chlorobiaceae. Four groups of n-alkanes produced by algae or cyanobacteria were also recognized based on systematic variations of abundance and isotopic composition with depth. For hydrocarbons probably derived from microalgae, isotopic variations are well correlated with

  18. Variation in conserved non-coding sequences on chromosome 5q and susceptibility to asthma and atopy

    SciTech Connect

    Donfack, Joseph; Schneider, Daniel H.; Tan, Zheng; Kurz, Thorsten; Dubchak, Inna; Frazer, Kelly A.; Ober, Carole

    2005-09-10

    Background: Evolutionarily conserved sequences likely have biological function. Methods: To determine whether variation in conserved sequences in non-coding DNA contributes to risk for human disease, we studied six conserved non-coding elements in the Th2 cytokine cluster on human chromosome 5q31 in a large Hutterite pedigree and in samples of outbred European American and African American asthma cases and controls. Results: Among six conserved non-coding elements (>100 bp, >70 percent identity; human-mouse comparison), we identified one single nucleotide polymorphism (SNP) in each of two conserved elements and six SNPs in the flanking regions of three conserved elements. We genotyped our samples for four of these SNPs and an additional three SNPs each in the IL13 and IL4 genes. While there was only modest evidence for association with single SNPs in the Hutterite and European American samples (P<0.05), there were highly significant associations in European Americans between asthma and haplotypes comprised of SNPs in the IL4 gene (P<0.001), including a SNP in a conserved non-coding element. Furthermore, variation in the IL13 gene was strongly associated with total IgE (P = 0.00022) and allergic sensitization to mold allergens (P = 0.00076) in the Hutterites, and more modestly associated with sensitization to molds in the European Americans and African Americans (P<0.01). Conclusion: These results indicate that there is overall little variation in the conserved non-coding elements on 5q31, but variation in IL4 and IL13, including possibly one SNP in a conserved element, influences asthma and atopic phenotypes in diverse populations.

  19. DNA-protein recognition and sequence-dependent variations of DNA conformational properties

    NASA Astrophysics Data System (ADS)

    Vologodskii, Alexander

    2015-03-01

    Parameters of B-DNA, the major form of the double helix, depend on its sequence. This dependence can contribute to the recognition of specific DNA sequences by proteins. Here we try to analyze this contribution quantitatively. In the first approach to this goal we used experimental data on the sequence dependence of DNA bending rigidity and its helical repeat. The solution data on these parameters of B-DNA were derived from the experiments on cyclization of short DNA fragments with specially designed sequences. The data allowed calculating the sequence variations of DNA bending energy, as well as the variations of the energy of torsional deformation of the double helix associated with a protein binding. The results show that DNA conformational parameters can have very limited influence on the sequence specificity of protein binding. In the second approach we analyzed the experimental data on the binding affinity of the nucleosome core with DNA fragments of different sequences. The conclusions derived in these two approaches are in a good agreement with one another.

  20. Phase variable DNA repeats in Neisseria gonorrhoeae influence transcription, translation, and protein sequence variation

    PubMed Central

    Zelewska, Marta A.; Pulijala, Madhuri; Spencer-Smith, Russell; Mahmood, Hiba-Tun-Noor A.; Norman, Billie; Churchward, Colin P.; Calder, Alan

    2016-01-01

    There are many types of repeated DNA sequences in the genomes of the species of the genus Neisseria, from homopolymeric tracts to tandem repeats of hundreds of bases. Some of these have roles in the phase-variable expression of genes. When a repeat mediates phase variation, reversible switching between tract lengths occurs, which in the species of the genus Neisseria most often causes the gene to switch between on and off states through frame shifting of the open reading frame. Changes in repeat tract lengths may also influence the strength of transcription from a promoter. For phenotypes that can be readily observed, such as expression of the surface-expressed Opa proteins or pili, verification that repeats are mediating phase variation is relatively straightforward. For other genes, particularly those where the function has not been identified, gathering evidence of repeat tract changes can be more difficult. Here we present analysis of the repetitive sequences that could mediate phase variation in the Neisseria gonorrhoeae strain NCCP11945 genome sequence and compare these results with other gonococcal genome sequences. Evidence is presented for an updated phase-variable gene repertoire in this species, including a class of phase variation that causes amino acid changes at the C-terminus of the protein, not previously described in N. gonorrhoeae. PMID:28348872

  1. Sequence variation in the androgen receptor gene is not a common determinant of male sexual orientation

    SciTech Connect

    Macke, J.P.; Nathans, J.; King, V.L.; Hu, N.; Hu, S.; Hamer, D.; Bailey, M.; Brown, T.

    1993-10-01

    To test the hypothesis that DNA sequence variation in the androgen receptor gene plays a causal role in the development of male sexual orientation, the authors have (1) measured the degree of concordance of androgen receptor alleles in 36 pairs of homosexual brothers, (2) compared the lengths of polyglutamine and polyglycine tracts in the amino-terminal domain of the androgen receptor in a sample of 197 homosexual males and 213 unselected subjects, and (3) screened the entire androgen receptor coding region for sequence variation by PCR and denaturing gradient-gel electrophoresis (DGGE) and/or single-strand conformation polymorphism analysis in 20 homosexual males with homosexual or bisexual brothers and one homosexual male with no homosexual brothers, and screened the amino-terminal domain of the receptor for sequence variation in an additional 44 homosexual males, 37 of whom had one or more first- or second-degree male relatives who were either homosexual or bisexual. These analyses show that (1) homosexual brothers are as likely to be discordant as concordant for androgen receptor alleles; (2) there are no large-scale differences between the distributions of polyglycine or polyglutamine tract lengths in the homosexual and control groups; and (3) coding region sequence variation is not commonly found within the androgen receptor gene of homosexual men. The DGGE screen identified two rare amino acid substitutions, ser205-to-arg and glu793-to-asp, the biological significance of which is unknown. 32 refs., 2 figs., 2 tabs.

  2. Whole-genome sequencing reveals the diversity of cattle copy number variations and multicopy genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Structural and functional impacts of copy number variations (CNVs) on livestock genomes are not yet well understood. We identified 1853 CNV regions using population-scale sequencing data generated from 75 cattle representing 8 breeds (Angus, Brahman, Gir, Holstein, Jersey, Limousin, Nelore, Romagnol...

  3. Sequence analysis of pooled bacterial samples enables identification of strain variation in group A streptococcus

    PubMed Central

    Weldatsadik, Rigbe G.; Wang, Jingwen; Puhakainen, Kai; Jiao, Hong; Jalava, Jari; Räisänen, Kati; Datta, Neeta; Skoog, Tiina; Vuopio, Jaana; Jokiranta, T. Sakari; Kere, Juha

    2017-01-01

    Knowledge of the genomic variation among different strains of a pathogenic microbial species can help in selecting optimal candidates for diagnostic assays and vaccine development. Pooled sequencing (Pool-seq) is a cost-effective approach for population-level genetic studies that require large numbers of samples, such as various strains of a microbe. To test the use of Pool-seq in identifying variation, we pooled DNA of 100 Streptococcus pyogenes strains of different emm types in two pools, each containing 50 strains. We used four variant calling tools (Freebayes, UnifiedGenotyper, SNVer, and SAMtools) and one emm1 strain, SF370, as a reference genome. In total, 63719 SNPs and 164 INDELs were identified in the two pools concordantly by at least two of the tools. The majority of the variants (93.4%) from six individually sequenced strains used in the pools could be identified from the two pools, and 72.3% and 97.4% of the variants in the pools could be mined from the analysis of the 44 complete S. pyogenes genomes and 3407 sequence runs deposited in the European Nucleotide Archive, respectively. We conclude that DNA sequencing of pooled samples of large numbers of bacterial strains is a robust, rapid and cost-efficient way to discover sequence variation. PMID:28361960

  4. Multi-Sample Pooling and Illumina Genome Analyzer Sequencing Methods to Determine Gene Sequence Variation for Database Development

    PubMed Central

    Margraf, Rebecca L.; Durtschi, Jacob D.; Dames, Shale; Pattison, David C.; Stephens, Jack E.; Mao, Rong; Voelkerding, Karl V.

    2010-01-01

    Determination of sequence variation within a genetic locus to develop clinically relevant databases is critical for molecular assay design and clinical test interpretation, so multisample pooling for Illumina genome analyzer (GA) sequencing was investigated using the RET proto-oncogene as a model. Samples were Sanger-sequenced for RET exons 10, 11, and 13–16. Ten samples with 13 known unique variants (“singleton variants” within the pool) and seven common changes were amplified and then equimolar-pooled before sequencing on a single flow cell lane, generating 36 base reads. For comparison, a single “control” sample was run in a different lane. After alignment, a 24-base quality score-screening threshold and 3′ read-end trimming of three bases yielded low background error rates with a 27% decrease in aligned read coverage. Sequencing data were evaluated using an established variant detection method (percent variant reads), by the presented subtractive correction method, and with SNPSeeker software. In total, 41 variants (of which 23 were singleton variants) were detected in the 10-sample pool data, which included all Sanger-identified variants. The 23 singleton variants were detected near the expected 5% allele frequency (average 5.17%±0.90% variant reads), well above the highest background error (1.25%). Based on background error rates, read coverage, simulated 30-, 40-, and 50-sample pool data, expected singleton allele frequencies within pools, and variant detection methods, ≥30 samples (which demonstrated a minimum 1% variant reads for singletons) could be pooled to reliably detect singleton variants by GA sequencing. PMID:20808642
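
    The expected 5% figure quoted above is consistent with one heterozygous carrier in an equimolar pool of 10 diploid samples (1 variant allele out of 20). The following is a minimal sketch of that arithmetic, not the authors' pipeline; the function name and the assumptions of diploidy, heterozygosity, and perfectly equimolar pooling are illustrative only.

```python
def expected_singleton_frequency(n_samples: int, ploidy: int = 2) -> float:
    """Expected variant-read fraction for a variant carried on one allele
    of one sample in an equimolar pool (assumption: equal contribution
    from every sample and no amplification bias)."""
    if n_samples < 1 or ploidy < 1:
        raise ValueError("n_samples and ploidy must be positive")
    return 1.0 / (ploidy * n_samples)


# Pool sizes mentioned in the abstract: the sequenced 10-sample pool and
# the simulated 30-, 40- and 50-sample pools.
for n in (10, 30, 40, 50):
    freq = expected_singleton_frequency(n)
    print(f"{n:>2} samples: expected singleton frequency {freq:.2%}")
```

    These are idealized expectations only; the abstract's conclusions about reliable detection rest on measured background error and simulated pool data, which this sketch does not reproduce.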

  5. Mitochondrial DNA sequence variation is associated with free-living activity energy expenditure in the elderly.

    PubMed

    Tranah, Gregory J; Lam, Ernest T; Katzman, Shana M; Nalls, Michael A; Zhao, Yiqiang; Evans, Daniel S; Yokoyama, Jennifer S; Pawlikowska, Ludmila; Kwok, Pui-Yan; Mooney, Sean; Kritchevsky, Stephen; Goodpaster, Bret H; Newman, Anne B; Harris, Tamara B; Manini, Todd M; Cummings, Steven R

    2012-09-01

    The decline in activity energy expenditure underlies a range of age-associated pathological conditions, neuromuscular and neurological impairments, disability, and mortality. The majority (90%) of the energy needs of the human body are met by mitochondrial oxidative phosphorylation (OXPHOS). OXPHOS is dependent on the coordinated expression and interaction of genes encoded in the nuclear and mitochondrial genomes. We examined the role of mitochondrial genomic variation in free-living activity energy expenditure (AEE) and physical activity levels (PAL) by sequencing the entire (~16.5 kilobases) mtDNA from 138 Health, Aging, and Body Composition Study participants. Among the common mtDNA variants, the hypervariable region 2 m.185G>A variant was significantly associated with AEE (p=0.001) and PAL (p=0.0005) after adjustment for multiple comparisons. Several unique nonsynonymous variants were identified in the extremes of AEE with some occurring at highly conserved sites predicted to affect protein structure and function. Of interest is the p.T194M, CytB substitution in the lower extreme of AEE occurring at a residue in the Qi site of complex III. Among participants with low activity levels, the burden of singleton variants was 30% higher across the entire mtDNA and OXPHOS complex I when compared to those having moderate to high activity levels. A significant pooled variant association across the hypervariable 2 region was observed for AEE and PAL. These results suggest that mtDNA variation is associated with free-living AEE in older persons and may generate new hypotheses by which specific mtDNA complexes, genes, and variants may contribute to the maintenance of activity levels in late life.

  6. Phylogenetic and functional analysis of sequence variation of human papillomavirus type 31 E6 and E7 oncoproteins.

    PubMed

    Ferenczi, Annamária; Gyöngyösi, Eszter; Szalmás, Anita; László, Brigitta; Kónya, József; Veress, György

    2016-09-01

    High-risk human papillomaviruses (HPV) are the causative agents of cervical and other anogenital cancers as well as a subset of head and neck cancers. The E6 and E7 oncoproteins of HPV contribute to oncogenesis by associating with the tumour suppressor protein p53 and pRb, respectively. For HPV types 16 and 18, intratypic sequence variation was shown to have biological and clinical significance. The functional significance of sequence variation among HPV 31 variants was studied less intensively. HPV 31 variants belonging to different variant lineages were found to have differences in persistence and in the ability to cause high grade cervical intraepithelial neoplasia. In the present study, we started to explore the functional effects of natural sequence variation of HPV 31 E6 and E7 oncoproteins. The E6 variants were tested for their effects on p53 protein stability and transcriptional activity, while the E7 variants were tested for their effects on pRb protein level and also on the transcriptional activity of E2F transcription factors. HPV 31 E7 variants displayed uniform effects on pRb stability and also on the activity of E2F transcription factors. HPV 31 E6 variants had remarkable differences in the ability to inhibit the trans-activation function of p53 but not in the ability to induce the in vivo degradation of p53. Our results indicate that natural sequence variation of the HPV 31 E6 protein may be involved in the observed differences in the oncogenic potential between HPV 31 variants.

  7. LOESS correction for length variation in gene set-based genomic sequence analysis

    PubMed Central

    Aboukhalil, Anton; Bulyk, Martha L.

    2012-01-01

    Motivation: Sequence analysis algorithms are often applied to sets of DNA, RNA or protein sequences to identify common or distinguishing features. Controlling for sequence length variation is critical to properly score sequence features and identify true biological signals rather than length-dependent artifacts. Results: Several cis-regulatory module discovery algorithms exhibit a substantial dependence between DNA sequence score and sequence length. Our newly developed LOESS method is flexible in capturing diverse score-length relationships and is more effective in correcting DNA sequence scores for length-dependent artifacts, compared with four other approaches. Application of this method to genes co-expressed during Drosophila melanogaster embryonic mesoderm development or neural development scored by the Lever motif analysis algorithm resulted in successful recovery of their biologically validated cis-regulatory codes. The LOESS length-correction method is broadly applicable, and may be useful not only for more accurate inference of cis-regulatory codes, but also for detection of other types of patterns in biological sequences. Availability: Source code and compiled code are available from http://thebrain.bwh.harvard.edu/LM_LOESS/ Contact: mlbulyk@receptor.med.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22492312
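
    The general idea of a LOESS length correction can be sketched as follows: fit a smooth trend of score against sequence length, then keep the residual as the corrected score. This is an illustrative pure-Python sketch under stated assumptions (local linear fits, tricube weights, subtraction of the trend); it is not the published LM_LOESS code, and the function names and the `frac` parameter are hypothetical.

```python
def loess_fit(x, y, frac=0.5):
    """LOESS-smoothed estimate of y at each x, using local linear
    regression with tricube weights over the nearest frac*n points."""
    n = len(x)
    k = max(2, int(round(frac * n)))  # neighbours used in each local fit
    fitted = []
    for x0 in x:
        # Bandwidth = distance to the k-th nearest point (x0 itself included).
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)  # always >= 1 because x0 has weight 1
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def length_corrected_scores(lengths, scores, frac=0.5):
    """Residual of each score after removing the LOESS score-length trend."""
    trend = loess_fit(lengths, scores, frac)
    return [s - t for s, t in zip(scores, trend)]
```

    If scores depend on length only through a smooth trend, the residuals should carry little remaining length dependence; whether residuals or some other normalization is used in the published method is not stated in the abstract.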

  8. AFLP and DNA sequence variation in an Andean domesticate, pepino (Solanum muricatum, Solanaceae): implications for evolution and domestication.

    PubMed

    Blanca, José M; Prohens, Jaime; Anderson, Gregory J; Zuriaga, Elena; Cañizares, Joaquín; Nuez, Fernando

    2007-07-01

    The pepino (Solanum muricatum) is a vegetatively propagated, domesticated native of the Andes, where it grows with wild relatives. We used AFLPs and a 1-kb sequence of the 3-methylcrotonyl-CoA carboxylase gene to study variation of 27 accessions of S. muricatum and 35 collections of 10 species of wild relatives (Solanum section Basarthrum). A total of 298 AFLP fragments and 29 DNA sequence haplotypes were detected. Cluster and principal coordinate analyses and other genetic parameters estimated from both types of markers, show that S. muricatum is closely related to the species from one of the series (Caripensia) of section Basarthrum and that >90% of the variation of the cultigen is also represented in that series. Pepino is highly diverse, either because it is not monophyletic or it has been subjected to regular introgression with wild species, or both. Although a continuous distribution of the genetic variation occurred within the cultivated species, three genetic clusters were recognized. Cluster 1 is mostly centered in Ecuador, cluster 2 in Ecuador and Peru, and cluster 3 in Colombia and Ecuador. Cluster 3 also includes all modern cultivars studied. These results and other evidence suggest that northern Ecuador/southern Colombia is the main center of pepino diversity and the center of origin. The high genetic variation of this cultigen indicates that domestication does not always produce a genetic bottleneck.

  9. DNA Sequence Analysis of SLC26A5, Encoding Prestin, in a Patient-Control Cohort: Identification of Fourteen Novel DNA Sequence Variations

    PubMed Central

    Minor, Jacob S.; Tang, Hsiao-Yuan; Pereira, Fred A.; Alford, Raye Lynn

    2009-01-01

    Background Prestin, encoded by the gene SLC26A5, is a transmembrane protein of the cochlear outer hair cell (OHC). Prestin is required for the somatic electromotile activity of OHCs; in mice lacking prestin, this activity is absent and severe hearing impairment results. In humans, the role of sequence variations in SLC26A5 in hearing loss is less clear. Although prestin is expected to be required for functional human OHCs, the clinical significance of reported putative mutant alleles in humans is uncertain. Methodology/Principal Findings To explore the hypothesis that SLC26A5 may act as a modifier gene, affecting the severity of hearing loss caused by an independent etiology, a patient-control cohort was screened for DNA sequence variations in SLC26A5 using sequencing and allele specific methods. Patients in this study carried known pathogenic or controversial sequence variations in GJB2, encoding Connexin 26, or confirmed or suspected sequence variations in SLC26A5; controls included four ethnic populations. Twenty-three different DNA sequence variations in SLC26A5, 14 of which are novel, were observed: 4 novel sequence variations were found exclusively among patients; 7 novel sequence variations were found exclusively among controls; and 12 sequence variations, 3 of which are novel, were found in both patients and controls. Twenty-one of the 23 DNA sequence variations were located in non-coding regions of SLC26A5. Two coding sequence variations, both novel, were observed only in patients and predict a silent change, p.S434S, and an amino acid substitution, p.I663V. In silico analysis of the p.I663V amino acid variation suggested this variant might be benign. Using Fisher's exact test, no statistically significant difference was observed between patients and controls in the frequency of the identified DNA sequence variations. Haplotype analysis using HaploView 4.0 software revealed the same predominant haplotype in patients and controls and derived haplotype blocks

  10. Optimal assembly for high throughput shotgun sequencing

    PubMed Central

    2013-01-01

    We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier works, we design a de Bruijn graph based assembly algorithm which can achieve very close to the lower bound for repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization. PMID:23902516
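
    For readers unfamiliar with the data structure, a de Bruijn graph over reads can be sketched as below. This is a generic textbook construction, not the assembly algorithm described in the abstract; the function name, the toy reads, and the choice of k are illustrative assumptions.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that
    follow it in at least one k-mer observed in the reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return dict(graph)


# Toy example: two overlapping reads from the sequence ACGTCA.
reads = ["ACGTC", "GTCA"]
graph = de_bruijn_graph(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

    In this toy case the graph is a single path (AC, CG, GT, TC, CA), so the original sequence can be read off directly; repeats longer than k-1 bases would create branching nodes, which is where the read-length and coverage conditions discussed in the abstract become relevant.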

  11. Rice pseudomolecule-anchored cross-species DNA sequence alignments indicate regional genomic variation in expressed sequence conservation

    PubMed Central

    Armstead, Ian; Huang, Lin; King, Julie; Ougham, Helen; Thomas, Howard; King, Ian

    2007-01-01

    Background Various methods have been developed to explore inter-genomic relationships among plant species. Here, we present a sequence similarity analysis based upon comparison of transcript-assembly and methylation-filtered databases from five plant species and physically anchored rice coding sequences. Results A comparison of the frequency of sequence alignments, determined by MegaBLAST, between rice coding sequences in TIGR pseudomolecules and annotations vs 4.0 and comprehensive transcript-assembly and methylation-filtered databases from Lolium perenne (ryegrass), Zea mays (maize), Hordeum vulgare (barley), Glycine max (soybean) and Arabidopsis thaliana (thale cress) was undertaken. Each rice pseudomolecule was divided into 10 segments, each containing 10% of the functionally annotated, expressed genes. This indicated a correlation between relative segment position in the rice genome and numbers of alignments with all the queried monocot and dicot plant databases. Colour-coded moving windows of 100 functionally annotated, expressed genes along each pseudomolecule were used to generate 'heat-maps'. These revealed consistent intra- and inter-pseudomolecule variation in the relative concentrations of significant alignments with the tested plant databases. Analysis of the annotations and derived putative expression patterns of rice genes from 'hot-spots' and 'cold-spots' within the heat maps indicated possible functional differences. A similar comparison relating to ancestral duplications of the rice genome indicated that duplications were often associated with 'hot-spots'. Conclusion Physical positions of expressed genes in the rice genome are correlated with the degree of conservation of similar sequences in the transcriptomes of other plant species. This relative conservation is associated with the distribution of different sized gene families and segmentally duplicated loci and may have functional and evolutionary implications. PMID:17708759

  12. Identification of staphylococcal species based on variations in protein sequences (mass spectrometry) and DNA sequence (sodA microarray).

    PubMed

    Kooken, Jennifer; Fox, Karen; Fox, Alvin; Altomare, Diego; Creek, Kim; Wunschel, David; Pajares-Merino, Sara; Martínez-Ballesteros, Ilargi; Garaizar, Javier; Oyarzabal, Omar; Samadpour, Mansour

    2014-02-01

    This report is among the first using sequence variation in newly discovered protein markers for staphylococcal (or indeed any other bacterial) speciation. Variation, at the DNA sequence level, in the sodA gene (commonly used for staphylococcal speciation) provided excellent correlation. Relatedness among strains was also assessed by protein profiling using microcapillary electrophoresis and pulsed field electrophoresis. A total of 64 strains were analyzed including reference strains representing the 11 staphylococcal species most commonly isolated from man (Staphylococcus aureus and 10 coagulase negative species [CoNS]). Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI TOF MS) and liquid chromatography-electrospray ionization tandem mass spectrometry (LC ESI MS/MS) were used for peptide analysis of proteins isolated from gel bands. Comparison of experimental spectra of unknowns versus spectra of peptides derived from reference strains allowed bacterial identification after MALDI TOF MS analysis. After LC-MS/MS analysis of gel bands, bacterial speciation was performed by comparing experimental spectra versus virtual spectra using the software X!Tandem. Finally, LC-MS/MS was performed on whole proteomes, with data analysis also employing X!Tandem. Aconitate hydratase and oxoglutarate dehydrogenase served as marker proteins on focused analysis after gel separation. Alternatively, on full proteomics analysis, elongation factor Tu generally provided the highest confidence in staphylococcal speciation.

  13. Advances in high throughput DNA sequence data compression.

    PubMed

    Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz

    2016-06-01

    Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.

  14. Resolving postglacial phylogeography using high-throughput sequencing

    PubMed Central

    Emerson, Kevin J.; Merz, Clayton R.; Catchen, Julian M.; Hohenlohe, Paul A.; Cresko, William A.; Bradshaw, William E.; Holzapfel, Christina M.

    2010-01-01

    The distinction between model and nonmodel organisms is becoming increasingly blurred. High-throughput, second-generation sequencing approaches are being applied to organisms based on their interesting ecological, physiological, developmental, or evolutionary properties and not on the depth of genetic information available for them. Here, we illustrate this point using a low-cost, efficient technique to determine the fine-scale phylogenetic relationships among recently diverged populations in a species. This application of restriction site-associated DNA tags (RAD tags) reveals previously unresolved genetic structure and direction of evolution in the pitcher plant mosquito, Wyeomyia smithii, from a southern Appalachian Mountain refugium following recession of the Laurentide Ice Sheet at 22,000–19,000 B.P. The RAD tag method can be used to identify detailed patterns of phylogeography in any organism regardless of existing genomic data, and, more broadly, to identify incipient speciation and genome-wide variation in natural populations in general. PMID:20798348

  15. Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach

    NASA Astrophysics Data System (ADS)

    Hofmann, Hansjörg; Sakti, Sakriani; Hori, Chiori; Kashioka, Hideki; Nakamura, Satoshi; Minker, Wolfgang

    The performance of English automatic speech recognition systems decreases when recognizing spontaneous speech, mainly due to multiple pronunciation variants in the utterances. Previous approaches address this problem by modeling the alteration of the pronunciation on a phoneme-to-phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this article, the sequence-based pronunciation variation is modeled using a noisy channel approach where the spontaneous phoneme sequence is considered as a “noisy” string and the goal is to recover the “clean” string of the word sequence. Hereby, the whole word sequence and its effect on the alteration of the phonemes will be taken into consideration. Moreover, the system learns not only the phoneme transformation but also the mapping from the phoneme to the word directly. In this study, first the phonemes will be recognized with the present recognition system and afterwards the pronunciation variation model based on the noisy channel approach will map from the phoneme to the word level. Two well-known natural language processing approaches are adopted and derived from the noisy channel model theory: joint-sequence models and statistical machine translation. Both of them are applied and various experiments are conducted using microphone and telephone recordings of spontaneous speech.

  16. Sequence variation in the Mc1r gene for a group of polymorphic snakes.

    PubMed

    Cox, Christian L; Rabosky, Alison R Davis; Chippindale, Paul T

    2013-01-25

    Studying the genetic factors underlying phenotypic traits can provide insight into dynamics of selection and molecular basis of adaptation, but this goal can be difficult for non-model organisms without extensive genomic resources. However, sequencing candidate genes for the trait of interest can facilitate the study of evolutionary genetics in natural populations. We sequenced the melanocortin-1 receptor (Mc1r) to study the genetic basis of color polymorphism in a group of snake species with variable black banding, the genera Sonora, Chilomeniscus, and Chionactis. Mc1r is an important gene in the melanin synthesis pathway and is associated with ecologically important variation in color pattern in birds, mammals, and other squamate reptiles. We found that Mc1r nucleotide sequence was variable and that, within our focal Sonora species, there are both fixed and heterozygous nucleotide substitutions that result in an amino acid change. Selection analyses indicated that Mc1r sequence was likely under purifying selection. However, we did not detect any statistical association with the presence or absence of black bands. Our results agree with other studies that have found no role for sequence variation in Mc1r and highlight the importance of comparative data for studying the phenotypic associations of candidate genes.

  17. FEATnotator: A tool for integrated annotation of sequence features and variation, facilitating interpretation in genomics experiments.

    PubMed

    Podicheti, Ram; Mockaitis, Keithanne

    2015-06-01

    As approaches are sought for more efficient and democratized uses of non-model and expanded model genomics references, ease of integration of genomic feature datasets is especially desirable in multidisciplinary research communities. Valuable conclusions are often missed or slowed when researchers refer experimental results to a single reference sequence that lacks integrated pan-genomic and multi-experiment data in accessible formats. Association of genomic positional information, such as results from an expansive variety of next-generation sequencing experiments, with annotated reference features such as genes or predicted protein binding sites, provides the context essential for conclusions and ongoing research. When the experimental system includes polymorphic genomic inputs, rapid calculation of gene structural and protein translational effects of sequence variation from the reference can be invaluable. Here we present FEATnotator, a lightweight, fast and easy to use open source software program that integrates and reports overlap and proximity in genomic information from any user-defined datasets including those from next generation sequencing applications. We illustrate use of the tool by summarizing whole genome sequence variation of a widely used natural isolate of Arabidopsis thaliana in the context of gene models of the reference accession. Previous discovery of a protein coding deletion influencing root development is replicated rapidly. Appropriate even in investigations of a single gene or genic regions such as QTL, comprehensive reports provided by FEATnotator better prepare researchers for interpretation of their experimental results. The tool is available for download at http://featnotator.sourceforge.net.

  18. Notes on individual sequence variation in humans: Immunoglobulin kappa light chain

    SciTech Connect

    Kurth, J.H.; Cavalli-Sforza, L.L.

    1994-06-01

    Little is known concerning the magnitude of variability in the nucleic acid sequence of DNA at the individual level. The authors have collected a large set of sequence data from the human immunoglobulin kappa light-chain-locus constant region (10,444 bp) and subgroup IV variable region (18,580 bp). For the constant region, absolute conservation of sequence was observed, even in intron and coding-region silent sites, with the exception of one previously defined polymorphic site. For the variable region, 12 heterozygous positions were identified, giving a heterozygosity of 6 × 10^-4 per nucleotide site. The amount of nucleic acid sequence variation differs significantly (χ² = 4.88) between these two regions, and the observed variation is two orders of magnitude lower than that reported for two Drosophila melanogaster loci. These data suggest that, for at least some regions of the human genome, nucleic acid sequence may be less variable than previously estimated. 13 refs., 2 figs.
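
    The per-site heterozygosity quoted above can be checked with simple arithmetic. The sketch below assumes that the 18,580 bp figure is the total amount of variable-region sequence surveyed, as the abstract's wording suggests; the function name is illustrative.

```python
def per_site_heterozygosity(heterozygous_positions: int, surveyed_bp: int) -> float:
    """Heterozygous positions divided by the number of nucleotide sites surveyed."""
    if surveyed_bp <= 0:
        raise ValueError("surveyed_bp must be positive")
    return heterozygous_positions / surveyed_bp


# Values taken from the abstract: 12 heterozygous positions in 18,580 bp.
h_variable = per_site_heterozygosity(12, 18_580)
print(f"variable region heterozygosity: {h_variable:.2e} per site")
```

    The result is roughly 6.5 × 10^-4, in line with the 6 × 10^-4 reported to one significant figure.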

  19. Enterovirus D68 in Hospitalized Children: Sequence Variation, Viral Loads and Clinical Outcomes

    PubMed Central

    Salamon, Douglas; Leber, Amy; Mejias, Asuncion

    2016-01-01

    Background An outbreak of enterovirus D68 (EV-D68) caused severe respiratory illness in 2014. The disease spectrum of EV-D68 infections in children with underlying medical conditions other than asthma, the role of EV-D68 loads on clinical illness, and the variation of EV-D68 strains within the same institution over time have not been described. We sought to define the association between EV-D68 loads and sequence variation, and the clinical characteristics in hospitalized children at our institution from 2011 to 2014. Methods From May through November 2014, and August to September of 2011 to 2013, a convenience sample of nasopharyngeal specimens from children with rhinovirus (RV)/EV respiratory infections was tested for EV-D68 by RT-PCR. Clinical data were compared between children with RV/EV-non-EV-D68 and EV-D68 infections, and among children with EV-D68 infections categorized as healthy, asthmatics, and chronic medical conditions. EV-D68 loads were analyzed in relation to disease severity parameters and sequence variability characterized over time. Results In 2014, 44% (192/438) of samples tested positive for EV-D68 vs. 10% (13/130) in 2011–13 (p<0.0001). PICU admissions (p<0.0001) and non-invasive ventilation (p<0.0001) were more common in children with EV-D68 vs. RV/EV-non-EV-D68 infections. Asthmatic EV-D68+ children required supplemental oxygen administration (p = 0.03) and PICU admissions (p <0.001) more frequently than healthy children or those with chronic medical conditions; however, oxygen duration (p<0.0001) and both PICU and total hospital stay (p<0.01) were greater in children with underlying medical conditions, irrespective of viral burden. By phylogenetic analysis, the 2014 EV-D68 strains clustered into a new sublineage within clade B. Conclusions This is one of the largest pediatric cohorts described from the EV-D68 outbreak. Irrespective of viral loads, EV-D68 was associated with high morbidity in children with asthma and co-morbidities. While EV-D68

  20. Genome-wide copy number variation in the bovine genome detected using low coverage sequence of popular beef breeds

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomic structural variations are an important source of genetic diversity. Copy number variations (CNVs), gains and losses of large regions of genomic sequence between individuals of a species, are known to be associated with both diseases and phenotypic traits. Deeply sequenced genomes are often u...

  1. HIV-1 Tat and Viral Latency: What We Can Learn from Naturally Occurring Sequence Variations.

    PubMed

    Kamori, Doreen; Ueno, Takamasa

    2017-01-01

    Despite the effective use of antiretroviral therapy, the remainder of a latently HIV-1-infected reservoir mainly in the resting memory CD4(+) T lymphocyte subset has provided a great setback toward viral eradication. While host transcriptional silencing machinery is thought to play a dominant role in HIV-1 latency, HIV-1 protein such as Tat, may affect both the establishment and the reversal of latency. Indeed, mutational studies have demonstrated that insufficient Tat transactivation activity can result in impaired transcription of viral genes and the establishment of latency in cell culture experiments. Because Tat protein is one of highly variable proteins within HIV-1 proteome, it is conceivable that naturally occurring Tat mutations may differentially modulate Tat functions, thereby influencing the establishment and/or the reversal of viral latency in vivo. In this mini review, we summarize the recent findings of Tat naturally occurring polymorphisms associating with host immune responses and we highlight the implication of Tat sequence variations in relation to HIV latency.

  2. HIV-1 Tat and Viral Latency: What We Can Learn from Naturally Occurring Sequence Variations

    PubMed Central

    Kamori, Doreen; Ueno, Takamasa

    2017-01-01

    Despite the effective use of antiretroviral therapy, the remainder of a latently HIV-1-infected reservoir mainly in the resting memory CD4+ T lymphocyte subset has provided a great setback toward viral eradication. While host transcriptional silencing machinery is thought to play a dominant role in HIV-1 latency, HIV-1 protein such as Tat, may affect both the establishment and the reversal of latency. Indeed, mutational studies have demonstrated that insufficient Tat transactivation activity can result in impaired transcription of viral genes and the establishment of latency in cell culture experiments. Because Tat protein is one of highly variable proteins within HIV-1 proteome, it is conceivable that naturally occurring Tat mutations may differentially modulate Tat functions, thereby influencing the establishment and/or the reversal of viral latency in vivo. In this mini review, we summarize the recent findings of Tat naturally occurring polymorphisms associating with host immune responses and we highlight the implication of Tat sequence variations in relation to HIV latency. PMID:28194140

  3. Unique Features of Germline Variation in Five Egyptian Familial Breast Cancer Families Revealed by Exome Sequencing

    PubMed Central

    Kim, Yeong C.; Soliman, Amr S.; Cui, Jian; Ramadan, Mohamed; Hablas, Ahmed; Abouelhoda, Mohamed; Hussien, Nehal; Ahmed, Ola; Zekri, Abdel-Rahman Nabawy; Seifeldin, Ibrahim A.

    2017-01-01

    Genetic predisposition increases the risk of familial breast cancer. Recent studies indicate that genetic predisposition for familial breast cancer can be ethnic-specific. However, current knowledge of genetic predisposition for the disease is predominantly derived from Western populations. Using this existing information as the sole reference to judge the predisposition in non-Western populations is not adequate and can potentially lead to misdiagnosis. Efforts are required to collect genetic predisposition data from non-Western populations. The Egyptian population has high genetic variation, reflecting its divergent ethnic origins, and the incidence rate of familial breast cancer in Egypt is also higher than the rate in many other populations. Using whole exome sequencing, we investigated genetic predisposition in five Egyptian familial breast cancer families. No pathogenic variants in BRCA1, BRCA2 and other classical breast cancer-predisposition genes were present in these five families. Comparison of the genetic variants with those in Caucasian familial breast cancer showed that variants in the Egyptian families were more variable and heterogeneous than the variants in Caucasian families. Multiple damaging variants in genes of different functional categories were identified either in a single family or shared between families. Our study demonstrates that genetic predisposition in Egyptian breast cancer families may differ from that in other disease populations, and supports a comprehensive screening of local disease families to determine the genetic predisposition in Egyptian familial breast cancer. PMID:28076423

  4. Characterization of a highly repeated DNA sequence family in five species of the genus Eulemur.

    PubMed

    Ventura, M; Boniotto, M; Cardone, M F; Fulizio, L; Archidiacono, N; Rocchi, M; Crovella, S

    2001-09-19

    The karyotypes of Eulemur species exhibit a high degree of variation, as a consequence of the Robertsonian fusion and/or centromere fission. Centromeric and pericentromeric heterochromatin of eulemurs is constituted by highly repeated DNA sequences (including some telomeric TTAGGG repeats) which have so far been investigated and used for the study of the systematic relationships of the different species of the genus Eulemur. In our study, we have cloned a set of repetitive pericentromeric sequences of five Eulemur species: E. fulvus fulvus (EFU), E. mongoz (EMO), E. macaco (EMA), E. rubriventer (ERU), and E. coronatus (ECO). We have characterized these clones by sequence comparison and by comparative fluorescence in situ hybridization analysis in EMA and EFU. Our results showed a high degree of sequence similarity among Eulemur species, indicating a strong conservation, within the five species, of these pericentromeric highly repeated DNA sequences.

  5. Genetic variation in and spatial structure of natural populations of Dipterocarpus alatus (Dipterocarpaceae) determined using single sequence repeat markers.

    PubMed

    Tam, N M; Duy, V D; Duc, N M; Giap, V D; Xuan, B T T

    2014-07-24

    Dipterocarpus alatus (Dipterocarpaceae) is widely distributed in lowland forests in central and southern Vietnam, Cambodia, Laos, Myanmar, Philippines, Thailand, and India. Due to over-exploitation and habitat destruction, the species is now threatened. The genetic variation within and among populations of D. alatus was investigated on the basis of 9 microsatellite (single sequence repeat, SSR) loci. In all, 268 sampled trees from 10 populations in central and southern Vietnam were analyzed in this study. The SSR data showed a high genetic variability within populations with an average of HO = 0.209 and HE = 0.239. Genetic differentiation among populations was high (FST = 0.266), indicating limited gene flow (Nm = 0.69). Analysis of molecular variance showed that most genetic variation was within populations (74.96%). This study highlights the importance of conserving the genetic resources of D. alatus species.
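    The reported gene-flow figure can be related to the differentiation figure with a short calculation. The sketch below assumes Wright's island-model approximation, Nm ≈ (1/FST − 1)/4; the abstract does not say how Nm was obtained, so this is only a consistency check, and the function name is hypothetical.

```python
def migrants_per_generation(fst: float) -> float:
    """Island-model approximation (an assumption here): Nm = (1/FST - 1) / 4."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("FST must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


# FST value taken from the abstract above.
nm = migrants_per_generation(0.266)
print(round(nm, 2))  # 0.69
```

    Under that assumption, FST = 0.266 gives Nm ≈ 0.69, which matches the value quoted in the abstract. Values below 1 are conventionally read as limited gene flow.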

  6. mit-o-matic: a comprehensive computational pipeline for clinical evaluation of mitochondrial variations from next-generation sequencing datasets.

    PubMed

    Vellarikkal, Shamsudheen Karuthedath; Dhiman, Heena; Joshi, Kandarp; Hasija, Yasha; Sivasubbu, Sridhar; Scaria, Vinod

    2015-04-01

    The human mitochondrial genome has been reported to have a very high mutation rate as compared with the nuclear genome. A large number of mitochondrial mutations show significant phenotypic association and are involved in a broad spectrum of diseases. In recent years, there has been remarkable progress in the understanding of mitochondrial genetics. The availability of next-generation sequencing (NGS) technologies has not only reduced sequencing cost by orders of magnitude but has also provided good-quality mitochondrial genome sequences with high coverage, thereby enabling the decoding of a number of human mitochondrial diseases. In this study, we report a computational and experimental pipeline to decipher human mitochondrial DNA variations and examine them for their clinical correlation. As a proof of principle, we also present a clinical study of a patient with Leigh disease and confirm maternal inheritance of the causative allele. The pipeline is made available as a user-friendly online tool to annotate variants and find haplogroup, disease association, and heteroplasmic sites. The "mit-o-matic" computational pipeline represents a comprehensive cloud-based tool for clinical evaluation of mitochondrial genomic variations from NGS datasets. The tool is freely available at http://genome.igib.res.in/mitomatic/.

  7. Magnetic susceptibility variations in Loess sequences and their relationship to astronomical forcing

    NASA Technical Reports Server (NTRS)

    Verosub, Kenneth L.; Singer, Michael J.

    1992-01-01

    The long, well-exposed and often continuous sequences of loess found throughout the world are generally thought to provide an excellent opportunity for studying long-term, large-scale environmental change during the last few million years. In recent years, the most fruitful loess studies have been those involving the deposits of the loess in China. One of the most intriguing results of that work has been the discovery of an apparent correlation between variations in the magnetic susceptibility of the loess sequence and the oxygen isotope record of the deep sea. This correlation implies that magnetic susceptibility variations are being driven by astronomical parameters. However, the basic data have been interpreted in various ways by different authors, most of whom assumed that the magnetic minerals in the loess have not been affected by post-depositional processes. Using a chemical extraction procedure that allows us to separate the contribution of secondary pedogenic magnetic minerals from primary inherited magnetic minerals, we have found that the magnetic susceptibility of the Chinese paleosols is largely due to a pedogenic component which is present to a lesser degree in the loess. We have also found that the smaller inherited component of the magnetic susceptibility is about the same in the paleosols and the loess. These results demonstrate the need for additional study of the processes that create magnetic susceptibility variations in order to interpret properly the role of astronomical forcing in producing these variations.

  8. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase.

    PubMed Central

    Clark, A G; Weiss, K M; Nickerson, D A; Taylor, S L; Buchanan, A; Stengård, J; Salomaa, V; Vartiainen, E; Perola, M; Boerwinkle, E; Sing, C F

    1998-01-01

    Allelic variation in 9.7 kb of genomic DNA sequence from the human lipoprotein lipase gene (LPL) was scored in 71 healthy individuals (142 chromosomes) from three populations: African Americans (24) from Jackson, MS; Finns (24) from North Karelia, Finland; and non-Hispanic Whites (23) from Rochester, MN. The sequences had a total of 88 variable sites, with a nucleotide diversity (site-specific heterozygosity) of 0.002 ± 0.001 across this 9.7-kb region. The frequency spectrum of nucleotide variation exhibited a slight excess of heterozygosity, but, in general, the data fit expectations of the infinite-sites model of mutation and genetic drift. Allele-specific PCR helped resolve linkage phases, and a total of 88 distinct haplotypes were identified. For 1,410 (64%) of the 2,211 site pairs, all four possible gametes were present in these haplotypes, reflecting a rich history of past recombination. Despite the strong evidence for recombination, extensive linkage disequilibrium was observed. The number of haplotypes generally is much greater than the number expected under the infinite-sites model, but there was sufficient multisite linkage disequilibrium to reveal two major clades, which appear to be very old. Variation in this region of LPL may depart from the variation expected under a simple, neutral model, owing to complex historical patterns of population founding, drift, selection, and recombination. These data suggest that the design and interpretation of disease-association studies may not be as straightforward as often is assumed. PMID:9683608
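    The site-pair statistic in this abstract is the four-gamete test: for two biallelic sites, observing all four allele combinations among haplotypes implies recombination (or recurrent mutation) under the infinite-sites model. A minimal sketch follows, using made-up toy haplotypes rather than the LPL data; the function name is hypothetical. As a side note, 2,211 equals C(67, 2), so the pairwise analysis appears to have used 67 sites; that is an inference from the arithmetic, not something the abstract states.

```python
from itertools import combinations
from math import comb


def four_gamete_pairs(haplotypes):
    """Return (pairs showing all four gametes, total site pairs).

    Each haplotype is a string over {'0', '1'}; all sites are assumed biallelic.
    """
    n_sites = len(haplotypes[0])
    hits = 0
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # 00, 01, 10 and 11 all observed
            hits += 1
    return hits, comb(n_sites, 2)


# Toy data (not from the study): only sites 0 and 1 show all four gametes.
toy = ["000", "010", "100", "110"]
print(four_gamete_pairs(toy))

# Arithmetic behind the figures quoted in the abstract.
print(comb(67, 2), round(100 * 1410 / 2211))
```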

  9. Impact of next generation sequencing: the 2009 Human Genome Variation Society Scientific Meeting.

    PubMed

    Oetting, William S

    2010-04-01

    The annual scientific meeting of the Human Genome Variation Society (HGVS) was held on the 20th of October, 2009, in Honolulu, Hawaii. The theme of this meeting was the "Impact of Next Generation Sequencing." Presenters spoke on issues ranging from advances in the technology of large-scale genome sequencing to how this information can be analyzed to uncover genetic variants associated with disease. Many of the challenges resulting from the implementation of these new technologies were presented, but possible solutions, or at least paths to the solutions, were also given. With the combined efforts of investigators using next-generation sequencing to help understand the impact of genetic variants on disease, the use of the personal genome in medicine will soon become a reality.

  10. Analysis of amino acid sequence variations and immunoglobulin E-binding epitopes of German cockroach tropomyosin.

    PubMed

    Jeong, Kyoung Yong; Lee, Jongweon; Lee, In-Yong; Ree, Han-Il; Hong, Chein-Soo; Yong, Tai-Soon

    2004-09-01

    The allergenicities of tropomyosins from different organisms have been reported to vary. The cDNA encoding German cockroach tropomyosin (Bla g 7) was isolated, expressed, and characterized previously. In the present study, the amino acid sequence variations in German cockroach tropomyosin were analyzed in order to investigate their influence on allergenicity. We also undertook the identification of immunodominant peptides containing immunoglobulin E (IgE) epitopes, which may facilitate the development of diagnostic and immunotherapeutic strategies based on the recombinant proteins. Two-dimensional gel electrophoresis and immunoblot analysis with mouse anti-recombinant German cockroach tropomyosin serum were performed to investigate the isoforms at the protein level. Reverse transcriptase PCR (RT-PCR) was applied to examine the sequence diversity. Eleven different variants of the deduced amino acid sequences were identified by RT-PCR. German cockroach tropomyosin has only minor sequence variations that did not seem to affect its allergenicity significantly. These results support the molecular basis underlying the cross-reactivities of arthropod tropomyosins. Recombinant fragments were also generated by PCR, and IgE-binding epitopes were assessed by enzyme-linked immunosorbent assay. Sera from seven patients revealed heterogeneous IgE-binding responses. This study demonstrates multiple IgE-binding epitope regions in a single molecule, suggesting that full-length tropomyosin should be used for the development of diagnostic and therapeutic reagents.

  11. Population subdivision in Europe's great bustard inferred from mitochondrial and nuclear DNA sequence variation.

    PubMed

    Pitra, C; Lieckfeldt, D; Alonso, J C

    2000-08-01

    A continent-wide survey of sequence variation in mitochondrial (mt) and nuclear (n) DNA of the endangered great bustard (Otis tarda) was conducted to assess the extent of phylogeographic structure in a morphologically monotypic bird. DNA sequence variation in a combined 809 bp segment of the mtDNA genome from 66 individuals from the last six breeding regions showed relatively low levels of intraspecific sequence diversity (π = 0.32%) but significant differences in the regional distribution of 11 haplotypes (ΦST = 0.49). Despite their exceptional potential for dispersal, a complete and long-term historical separation between the populations from the Iberian Peninsula (Spain) and mainland Europe (Hungary, Slovakia, Germany, and Russia) was demonstrated. Divergence between populations based on a 3-bp insertion-deletion polymorphism within the intron region of the nuclear CHD-Z gene was geographically concordant with the primary subdivision identified within the mtDNA sequences. Inferred aspects of phylogeography were used to formulate conservation recommendations for this endangered species.

  12. Simple sequence repeat variations expedite phage divergence: Mechanisms of indels and gene mutations.

    PubMed

    Lin, Tiao-Yin

    2016-07-01

    Phages are the most abundant biological entities and influence prokaryotic communities on Earth. Comparing closely related genomes sheds light on molecular events shaping phage evolution. Simple sequence repeat (SSR) variations impart over half of the genomic changes between T7M and T3, indicating an important role of SSRs in accelerating phage genetic divergence. Differences in coding and noncoding regions of phages infecting different hosts, coliphages T7M and T3, Yersinia phage ϕYeO3-12, and Salmonella phage ϕSG-JL2, frequently arise from SSR variations. Such variations modify noncoding and coding regions; the latter efficiently changes multiple amino acids, thereby hastening protein evolution. Four classes of events are found to drive SSR variations: insertion/deletion of SSR units, expansion/contraction of SSRs without alteration of genome length, changes of repeat motifs, and generation/loss of repeats. The categorization demonstrates the ways SSRs mutate in genomes during phage evolution. Indels are common constituents of genome variations and human diseases, yet, how they occur without preexisting repeat sequence is less understood. Non-repeat-unit-based misalignment-elongation (NRUBME) is proposed to be one mechanism for indels without adjacent repeats. NRUBME or consecutive NRUBME may also change repeat motifs or generate new repeats. NRUBME invoking a non-Watson-Crick base pair explains insertions that initiate mononucleotide repeats. Furthermore, NRUBME successfully interprets many inexplicable human di- to tetranucleotide repeat generations. This study provides the first evidence of SSR variations expediting phage divergence, and enables insights into the events and mechanisms of genome evolution. NRUBME allows us to emulate natural evolution to design indels for various applications.

  13. Generating barcoded libraries for multiplex high-throughput sequencing.

    PubMed

    Knapp, Michael; Stiller, Mathias; Meyer, Matthias

    2012-01-01

    Molecular barcoding is an essential tool to use the high throughput of next generation sequencing platforms optimally in studies involving more than one sample. Various barcoding strategies allow for the incorporation of short recognition sequences (barcodes) into sequencing libraries, either by ligation or polymerase chain reaction (PCR). Here, we present two approaches optimized for generating barcoded sequencing libraries from low copy number extracts and amplification products typical of ancient DNA studies.

  14. Distinct intraspecific variations of garlic (Allium sativum L.) revealed by the exon-intron sequences of the alliinase gene.

    PubMed

    Endo, Aki; Imai, Yukiko; Nakamura, Mizuho; Yanagisawa, Eri; Taguchi, Takaaki; Torii, Kosuke; Okumura, Hidenobu; Ichinose, Koji

    2014-04-01

    Garlic (Allium sativum L.) has been used worldwide as a food and for medicinal purposes since early times. Garlic cultivars exhibit considerable morphological diversity despite the fact that they are mostly sterile and are grown only by vegetative propagation of cloves. Considerable recombination occurs in garlic genomes, including the genes involved in secondary metabolites. We examined the genomic DNAs (gDNAs) from garlic, encoding alliinase, a key enzyme involved in organosulfur metabolism in Allium plants. The 1.7-kb gDNA fragments, covering three exons (2, 3, and 4) and all four introns, were amplified from total DNAs prepared from garlic samples produced in Asia and Europe, leading to 73 sequences in total: Japan (JPN), China (CHN), India (IND), Spain (ESP), and France (FRA). The exon sequences were highly conserved among all the sequences, probably reflecting the fully functional alliinase associated with the flavor quality. Distinct intraspecific variations were detected for all four intron sequences, leading to the haplotype classifications. A close relationship between JPN and CHN was observed for all four introns, whereas IND showed a more divergent distribution. ESP and FRA afforded clearly different variants compared with those from Asian sequences. The present study provides information that could be useful in the development of an additional molecular marker for garlic authentication and quality control.

  15. Population clustering based on copy number variations detected from next generation sequencing data

    PubMed Central

    Duan, Junbo; Zhang, Ji-Gang; Wan, Mingxi; Deng, Hong-Wen; Wang, Yu-Ping

    2015-01-01

    Copy number variations (CNVs) can be used as significant biomarkers, and next generation sequencing (NGS) provides high-resolution detection of these CNVs. However, how to extract features from CNVs and apply them to genomic studies such as population clustering has become a big challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into a source matrix and a weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs from each sample. Therefore, using NMF of CNVs one can differentiate samples from different ethnic groups, i.e. population clustering. To validate the approach, we applied it to the analysis of both simulation data and two real data sets from the 1000 Genomes Project. The results on simulation data demonstrate that the proposed method can recover the true common CNVs with high quality. The results of the first real data analysis show that the proposed method can cluster two family trios with different ancestries into two ethnic groups, and the results of the second real data analysis show that the proposed method can be applied to the whole genome with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering. PMID:25152046
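    The decomposition described above can be illustrated on toy data. The sketch below is not the authors' implementation: it assumes a regions-by-samples feature matrix, uses the standard multiplicative-update NMF rule, and assigns each sample to the group with the largest weight. All names and the toy matrix are hypothetical.

```python
import numpy as np


def nmf(features, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor features (regions x samples) into source @ weights, both non-negative.

    Uses multiplicative updates for the squared-error objective.
    """
    rng = np.random.default_rng(seed)
    n_regions, n_samples = features.shape
    source = rng.random((n_regions, rank)) + eps   # common CNV profile per group
    weights = rng.random((rank, n_samples)) + eps  # CNV level per sample
    for _ in range(n_iter):
        weights *= (source.T @ features) / (source.T @ source @ weights + eps)
        source *= (features @ weights.T) / (source @ weights @ weights.T + eps)
    return source, weights


# Toy feature matrix: two groups of three samples with non-overlapping CNV profiles.
profile_a = np.array([2.0, 3.0, 2.0, 0.0, 0.0, 0.0])
profile_b = np.array([0.0, 0.0, 0.0, 3.0, 2.0, 3.0])
levels_a = np.array([1.0, 0.8, 1.2, 0.0, 0.0, 0.0])
levels_b = np.array([0.0, 0.0, 0.0, 1.0, 1.1, 0.9])
features = np.outer(profile_a, levels_a) + np.outer(profile_b, levels_b)

source, weights = nmf(features, rank=2)
labels = weights.argmax(axis=0)  # cluster label = dominant source component
print(labels)
```

    Which component index ends up representing which group depends on the random initialisation, so only the grouping of samples is meaningful, not the label values themselves.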

  16. Phylogeography of the endangered Cathaya argyrophylla (Pinaceae) inferred from sequence variation of mitochondrial and nuclear DNA.

    PubMed

    Wang, Hong-Wei; Ge, Song

    2006-11-01

    Cathaya argyrophylla is an endangered conifer restricted to subtropical mountains of China. To study the phylogeographical pattern and demographic history of C. argyrophylla, species-wide genetic variation was investigated using sequences of maternally inherited mtDNA and biparentally inherited nuclear DNA. Of 15 populations sampled from all four distinct regions, only three mitotypes were detected at two loci, with no single region having a mixed composition (G(ST) = 1). Average nucleotide diversity (theta(ws) = 0.0024; pi(s) = 0.0029) across eight nuclear loci is significantly lower than that found for other conifers (theta(ws) = 0.003-0.015; pi(s) = 0.002-0.012) based on multilocus estimates. Because it had the highest diversity among the eight nuclear loci and was evolving neutrally, one locus (2009) was further used for phylogeographical studies, and eight haplotypes resulting from 12 polymorphic sites were obtained from 98 individuals. All four distinct regions had at least four haplotypes, with the Dalou region (DL) having the highest diversity and the Bamian region (BM) the lowest, paralleling the result of the eight nuclear loci. An AMOVA revealed a significant proportion of diversity attributable to differences among regions (13.4%) and among populations within regions (8.9%). F(ST) analysis also indicated significantly high differentiation among populations (F(ST) = 0.22) and between regions (F(ST) = 0.12-0.38). Non-overlapping distribution of mitotypes and high genetic differentiation among the distinct geographical groups suggest the existence of at least four separate glacial refugia. Based on network and mismatch distribution analyses, we do not find evidence of long distance dispersal and population expansion in C. argyrophylla. Ex situ conservation and artificial crossing are recommended for the management of this endangered species.

  17. Cis-regulatory sequence variation and association with Mycoplasma load in natural populations of the house finch (Carpodacus mexicanus)

    PubMed Central

    Backström, Niclas; Shipilina, Daria; Blom, Mozes P K; Edwards, Scott V

    2013-01-01

    Characterization of the genetic basis of fitness traits in natural populations is important for understanding how organisms adapt to the changing environment and to novel events, such as epizootics. However, candidate fitness-influencing loci, such as regulatory regions, are usually unavailable in nonmodel species. Here, we analyze sequence data from targeted resequencing of the cis-regulatory regions of three candidate genes for disease resistance (CD74, HSP90α, and LCP1) in populations of the house finch (Carpodacus mexicanus) historically exposed (Alabama) and naïve (Arizona) to Mycoplasma gallisepticum. Our study, the first to quantify variation in regulatory regions in wild birds, reveals that the upstream regions of CD74 and HSP90α are GC-rich, with the former exhibiting unusually low sequence variation for this species. We identified two SNPs, located in a GC-rich region immediately upstream of an inferred promoter site in the gene HSP90α, that were significantly associated with Mycoplasma pathogen load in the two populations. The SNPs are closely linked and situated in potential regulatory sequences: one in a binding site for the transcription factor nuclear NFYα and the other in a dinucleotide microsatellite ((GC)6). The genotype associated with pathogen load in the putative NFYα binding site was significantly overrepresented in the Alabama birds. However, we did not see strong effects of selection at this SNP, perhaps because selection has acted on standing genetic variation over an extremely short time in a highly recombining region. Our study is a useful starting point to explore functional relationships between sequence polymorphisms, gene expression, and phenotypic traits, such as pathogen resistance that affect fitness in the wild. PMID:23532859

  18. High-throughput sequencing and vaccine design.

    PubMed

    Luciani, F

    2016-04-01

    Next-generation sequencing (NGS) technologies have reshaped genome research. The resulting increase in sequencing depth and resolution has led to an unprecedented level of genomic detail and thus an increasing awareness of the complexity of animal, human and pathogen genomes. This has resulted in new approaches to vaccine research. On the one hand, the increase in genome complexity challenges our ability to study and understand pathogen biology and pathogen-host interactions. On the other hand, the increase in genomic data also provides key information for developing and designing improved vaccines against pathogens that were previously extremely difficult to deal with, such as rapidly mutating RNA viruses or bacteria that have complex interactions with the host immune system. This review describes how the broad application of NGS technologies to genome research is affecting vaccine research. It focuses on implications for the field of viral genomics, and includes recent animal and human studies.

  19. PSCC: Sensitive and Reliable Population-Scale Copy Number Variation Detection Method Based on Low Coverage Sequencing

    PubMed Central

    Vogel, Ida; Choy, Kwong Wai; Chen, Fang; Christensen, Rikke; Zhang, Chunlei; Ge, Huijuan; Jiang, Haojun; Yu, Chang; Huang, Fang; Wang, Wei; Jiang, Hui; Zhang, Xiuqing

    2014-01-01

    Background: Copy number variations (CNVs) represent an important type of genetic variation that deeply impacts phenotypic polymorphisms and human diseases. The advent of high-throughput sequencing technologies provides an opportunity to revolutionize the discovery of CNVs and to explore their relationship with diseases. However, most of the existing methods depend on sequencing depth and show instability with low sequence coverage. In this study, using low coverage whole-genome sequencing (LCS) we have developed an effective population-scale CNV calling (PSCC) method. Methodology/Principal Findings: In our novel method, two-step correction was used to remove biases caused by local GC content and complex genomic characteristics. We chose a binary segmentation method to locate CNV segments and designed combined statistics tests to ensure the stable performance of the false positive control. The simulation data showed that our PSCC method could achieve 99.7%/100% and 98.6%/100% sensitivity and specificity for over 300 kb CNV calling in the condition of LCS (∼2×) and ultra LCS (∼0.2×), respectively. Finally, we applied this novel method to analyze 34 clinical samples with an average of 2× LCS. In the final results, all the 31 pathogenic CNVs identified by aCGH were successfully detected. In addition, the performance comparison revealed that our method had significant advantages over existing methods using ultra LCS. Conclusions/Significance: Our study showed that PSCC can sensitively and reliably detect CNVs using low coverage or even ultra-low coverage data through population-scale sequencing. PMID:24465483
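    The abstract does not spell out its two-step correction, so the following is only a generic illustration of what removing GC-content bias from read depth can look like: windows are grouped by GC fraction and each window's depth is rescaled by the ratio of the global median to its GC bin's median. The function name, bin count and toy numbers are assumptions, not part of PSCC.

```python
import numpy as np


def gc_correct(depth, gc, n_bins=10):
    """Rescale per-window read depth so every GC bin shares the global median.

    depth: read depth per genomic window; gc: GC fraction per window, in [0, 1].
    """
    depth = np.asarray(depth, dtype=float)
    gc = np.asarray(gc, dtype=float)
    # Map each GC fraction to a bin index; gc == 1.0 falls into the last bin.
    bins = np.minimum((gc * n_bins).astype(int), n_bins - 1)
    global_median = np.median(depth)
    corrected = depth.copy()
    for b in np.unique(bins):
        mask = bins == b
        bin_median = np.median(depth[mask])
        if bin_median > 0:  # leave windows in empty-coverage bins untouched
            corrected[mask] = depth[mask] * global_median / bin_median
    return corrected


# Toy example: GC-rich windows have twice the depth purely because of bias.
gc = [0.35] * 4 + [0.55] * 4
depth = [10, 10, 10, 10, 20, 20, 20, 20]
corrected = gc_correct(depth, gc)
print(corrected)
```

    After correction the toy windows share one depth level, so a later segmentation step would not mistake the GC-driven shift for a copy-number change.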

  20. Sequencing strategy for the whole mitochondrial genome resulting in high quality sequences

    PubMed Central

    Fendt, Liane; Zimmermann, Bettina; Daniaux, Martin; Parson, Walther

    2009-01-01

    Background: It has been demonstrated that a reliable and fail-safe sequencing strategy is mandatory for high-quality analysis of mitochondrial (mt) DNA, as the sequencing and base-calling process is prone to error. Here, we present a high-quality, reliable and easy-to-handle manual procedure for the sequencing of full mt genomes that is also appropriate for laboratories where fully automated processes are not available. Results: We amplified whole mitochondrial genomes as two overlapping PCR fragments, each about 8500 bases in length. We developed a set of 96 primers that can be applied to a (manual) 96 well-based technology, which resulted in at least double strand sequence coverage of the entire coding region (codR). Conclusion: This elaborated sequencing strategy is straightforward and allows for an unambiguous sequence analysis and interpretation, including sometimes challenging phenomena such as point and length heteroplasmy that are relevant for the investigation of forensic and clinical samples. PMID:19331681

  1. A survey of chromosomal and nucleotide sequence variation in Drosophila miranda.

    PubMed Central

    Yi, Soojin; Bachtrog, Doris; Charlesworth, Brian

    2003-01-01

    There have recently been several studies of the evolution of Y chromosome degeneration and dosage compensation using the neo-sex chromosomes of Drosophila miranda as a model system. To understand these evolutionary processes more fully, it is necessary to document the general pattern of genetic variation in this species. Here we report a survey of chromosomal variation, as well as polymorphism and divergence data, for 12 nuclear genes of D. miranda. These genes exhibit varying levels of DNA sequence polymorphism. Compared to its well-studied sibling species D. pseudoobscura, D. miranda has much less nucleotide sequence variation, and the effective population size of this species is inferred to be several-fold lower. Nevertheless, it harbors a few inversion polymorphisms, one of which involves the neo-X chromosome. There is no convincing evidence for a recent population expansion in D. miranda, in contrast to D. pseudoobscura. The pattern of population subdivision previously observed for the X-linked gene period is not seen for the other loci, suggesting that there is no general population subdivision in D. miranda. However, data on an additional region of period confirm population subdivision for this gene, suggesting that local selection is operating at or near period to promote differentiation between populations. PMID:12930746

  2. Large scale DNA sequencing: new challenges emerge--the 2007 Human Genome Variation Society scientific meeting.

    PubMed

    Oetting, William S

    2008-05-01

    The annual scientific meeting of the Human Genome Variation Society (HGVS) was held on 23 October 2007, in San Diego, CA. The major theme of this meeting was "New DNA Sequencing Technologies & Human Genome Variation." A series of speakers provided information on several new technologies that produce DNA sequence data on a scale far beyond what was possible even a few years ago. These new technologies produce up to gigabases of nucleotides on a single run. Already, two individuals have had their entire genome sequenced, resulting in the identification of many novel DNA variants. Several new questions now need to be answered. What impact do these novel variants have on the phenotypes? How are we to associate private variants in a single individual with disease, especially when current association studies require genotyping thousands of individuals? Further work will be required to create methodologies to analyze these variants to determine if they are potentially disease-producing or are phenotypically silent. For the technology to be useful in a medical setting, it will be crucial to answer these questions.

  3. Complete plastid genome sequence of Primula sinensis (Primulaceae): structure comparison, sequence variation and evidence for accD transfer to nucleus

    PubMed Central

    Liu, Tong-Jian; Zhang, Cai-Yun; Yan, Hai-Fei; Zhang, Lu

    2016-01-01

    The species-rich genus Primula L. is a typical plant group for understanding genetic variation between species at different levels of relationship. Chloroplast genome sequences are used as an information resource for quantifying this variation and reconstructing evolutionary history. In this study, we report the complete chloroplast genome sequence of Primula sinensis and compare it with those of related species. The chloroplast genome shows a typical circular quadripartite structure, 150,859 bp in length with a GC content of 37.2%. Two inverted repeat (IR) regions (25,535 bp each) are separated by a large single-copy region (82,064 bp) and a small single-copy region (17,725 bp). The genome consists of 112 genes, including 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Among them, seven coding genes, seven tRNA genes and four rRNA genes have two copies due to their locations in the IR regions. The accD and infA genes, which lack intact open reading frames (ORFs), were identified as pseudogenes. SSR and sequence variation analyses were also performed on the plastome of Primula sinensis, in comparison with another available plastome, that of P. poissonii. The four most variable regions, rpl36–rps8, rps16–trnQ, trnH–psbA and ndhC–trnV, were identified. Phylogenetic relationship estimates using three sub-datasets extracted from a matrix of 57 protein-coding gene sequences gave identical results that were consistent with previous studies. A transcript found in the P. sinensis transcriptome showed high similarity to the plastid accD functional region, and a putative plastid transit peptide was identified at its N-terminal region. The result strongly suggests that plastid accD has been functionally transferred to the nucleus in P. sinensis. PMID:27375965
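    The quadripartite figures in this abstract can be cross-checked with simple arithmetic: a plastome of this type is one large single-copy region, one small single-copy region and two copies of the inverted repeat. The short sketch below uses only numbers quoted in the abstract; the function name is hypothetical.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: LSC + SSC + two inverted-repeat copies."""
    return lsc + ssc + 2 * ir


total_bp = quadripartite_length(lsc=82_064, ssc=17_725, ir=25_535)
gene_count = 78 + 30 + 4  # protein-coding + tRNA + rRNA genes

print(total_bp)    # 150859
print(gene_count)  # 112
```

    Both totals agree with the reported genome length (150,859 bp) and gene count (112), which also confirms that the 25,535 bp figure refers to each inverted repeat copy rather than to both combined.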

  4. Whole-Genome Sequencing Reveals Diverse Models of Structural Variations in Esophageal Squamous Cell Carcinoma.

    PubMed

    Cheng, Caixia; Zhou, Yong; Li, Hongyi; Xiong, Teng; Li, Shuaicheng; Bi, Yanghui; Kong, Pengzhou; Wang, Fang; Cui, Heyang; Li, Yaoping; Fang, Xiaodong; Yan, Ting; Li, Yike; Wang, Juan; Yang, Bin; Zhang, Ling; Jia, Zhiwu; Song, Bin; Hu, Xiaoling; Yang, Jie; Qiu, Haile; Zhang, Gehong; Liu, Jing; Xu, Enwei; Shi, Ruyi; Zhang, Yanyan; Liu, Haiyan; He, Chanting; Zhao, Zhenxiang; Qian, Yu; Rong, Ruizhou; Han, Zhiwei; Zhang, Yanlin; Luo, Wen; Wang, Jiaqian; Peng, Shaoliang; Yang, Xukui; Li, Xiangchun; Li, Lin; Fang, Hu; Liu, Xingmin; Ma, Li; Chen, Yunqing; Guo, Shiping; Chen, Xing; Xi, Yanfeng; Li, Guodong; Liang, Jianfang; Yang, Xiaofeng; Guo, Jiansheng; Jia, JunMei; Li, Qingshan; Cheng, Xiaolong; Zhan, Qimin; Cui, Yongping

    2016-02-04

    Comprehensive identification of somatic structural variations (SVs) and understanding their mutational mechanisms in cancer might contribute to understanding biological differences and help to identify new therapeutic targets. Unfortunately, the complex SVs across the whole genome and the mutational mechanisms underlying esophageal squamous cell carcinoma (ESCC) remain largely uncharacterized. To define a comprehensive catalog of somatic SVs, affected target genes, and their underlying mechanisms in ESCC, we re-analyzed whole-genome sequencing (WGS) data from 31 ESCCs using the Meerkat algorithm to predict somatic SVs and Patchwork to determine copy-number changes. We found deletions and translocations with NHEJ and alt-EJ signatures as the dominant SV types, and 16% of deletions were complex deletions. SVs frequently led to disruption of cancer-associated genes (e.g., CDKN2A and NOTCH1) with different mutational mechanisms. Moreover, chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally mis-arranged chromosomes that occurred in 55% of ESCCs. These genomic catastrophes led to amplification of oncogenes through chromothripsis-derived double-minute chromosome formation (e.g., FGFR1 and LETM2) or BFB-affected chromosomes (e.g., CCND1, EGFR, ERBB2, MMPs, and MYC), with approximately 30% of ESCCs harboring BFB-derived CCND1 amplification. Furthermore, analyses of copy-number alterations revealed a high frequency of whole-genome duplication (WGD) and recurrent focal amplification of CDCA7, which might act as a potential oncogene in ESCC. Our findings reveal molecular defects such as chromothripsis and BFB in malignant transformation of ESCCs and demonstrate diverse models of SV-derived target genes in ESCCs. These genome-wide SV profiles and their underlying mechanisms have preventive, diagnostic, and therapeutic implications for ESCCs.

  5. Whole-Genome Sequencing Reveals Diverse Models of Structural Variations in Esophageal Squamous Cell Carcinoma

    PubMed Central

    Cheng, Caixia; Zhou, Yong; Li, Hongyi; Xiong, Teng; Li, Shuaicheng; Bi, Yanghui; Kong, Pengzhou; Wang, Fang; Cui, Heyang; Li, Yaoping; Fang, Xiaodong; Yan, Ting; Li, Yike; Wang, Juan; Yang, Bin; Zhang, Ling; Jia, Zhiwu; Song, Bin; Hu, Xiaoling; Yang, Jie; Qiu, Haile; Zhang, Gehong; Liu, Jing; Xu, Enwei; Shi, Ruyi; Zhang, Yanyan; Liu, Haiyan; He, Chanting; Zhao, Zhenxiang; Qian, Yu; Rong, Ruizhou; Han, Zhiwei; Zhang, Yanlin; Luo, Wen; Wang, Jiaqian; Peng, Shaoliang; Yang, Xukui; Li, Xiangchun; Li, Lin; Fang, Hu; Liu, Xingmin; Ma, Li; Chen, Yunqing; Guo, Shiping; Chen, Xing; Xi, Yanfeng; Li, Guodong; Liang, Jianfang; Yang, Xiaofeng; Guo, Jiansheng; Jia, JunMei; Li, Qingshan; Cheng, Xiaolong; Zhan, Qimin; Cui, Yongping

    2016-01-01

    Comprehensive identification of somatic structural variations (SVs) and understanding their mutational mechanisms in cancer might contribute to understanding biological differences and help to identify new therapeutic targets. Unfortunately, the complex SVs across the whole genome and the mutational mechanisms underlying esophageal squamous cell carcinoma (ESCC) remain largely uncharacterized. To define a comprehensive catalog of somatic SVs, affected target genes, and their underlying mechanisms in ESCC, we re-analyzed whole-genome sequencing (WGS) data from 31 ESCCs using the Meerkat algorithm to predict somatic SVs and Patchwork to determine copy-number changes. We found deletions and translocations with NHEJ and alt-EJ signatures as the dominant SV types, and 16% of deletions were complex deletions. SVs frequently led to disruption of cancer-associated genes (e.g., CDKN2A and NOTCH1) with different mutational mechanisms. Moreover, chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally mis-arranged chromosomes that occurred in 55% of ESCCs. These genomic catastrophes led to amplification of oncogenes through chromothripsis-derived double-minute chromosome formation (e.g., FGFR1 and LETM2) or BFB-affected chromosomes (e.g., CCND1, EGFR, ERBB2, MMPs, and MYC), with approximately 30% of ESCCs harboring BFB-derived CCND1 amplification. Furthermore, analyses of copy-number alterations revealed a high frequency of whole-genome duplication (WGD) and recurrent focal amplification of CDCA7, which might act as a potential oncogene in ESCC. Our findings reveal molecular defects such as chromothripsis and BFB in malignant transformation of ESCCs and demonstrate diverse models of SV-derived target genes in ESCCs. These genome-wide SV profiles and their underlying mechanisms have preventive, diagnostic, and therapeutic implications for ESCCs. PMID:26833333

  6. Individual and population variation in invertebrates revealed by Inter-simple Sequence Repeats (ISSRs)

    PubMed Central

    Abbot, Patrick

    2001-01-01

    PCR-based molecular markers are well suited for questions requiring large scale surveys of plant and animal populations. Inter-simple Sequence Repeats or ISSRs are analyzed by a recently developed technique based on the amplification of the regions between inverse-oriented microsatellite loci with oligonucleotides anchored in microsatellites themselves. ISSRs have shown much promise for the study of the population biology of plants, but have not yet been explored for similar studies of animals. The value of ISSRs is demonstrated for the study of animal species with low levels of within-population variation. Sets of primers are identified which reveal variation in two aphid species, Acyrthosiphon pisum and Pemphigus obesinymphae, in the yellow fever mosquito Aedes aegypti, and in a rotifer in the genus Philodina. PMID:15455068

  7. MRI assessment of internal acoustic canal variations using 3D-FIESTA sequences.

    PubMed

    Erdogan, Nezahat; Altay, Canan; Akay, Emrah; Karakas, Levent; Uluc, Engin; Mete, Berna; Oygen, Aysegul; Oyar, Orhan; Gelal, Fazıl; Songu, Murat; Katilmis, Huseyin; Calli, Cağlar

    2013-02-01

    Magnetic resonance imaging (MRI) of the internal acoustic canal is the standard diagnostic tool for a wide range of indications in patients. This study aims to investigate the vascular variations and compression of the cranial nerves (CNs) VII and VIII at the cerebellopontine angle in patients with neuro-otologic symptoms using 3D-fast imaging employing steady-state acquisition (FIESTA) MR imaging. One hundred and eighty-seven patients (374 temporal bones) were examined on a 1.5-T MRI. In addition to conventional MR sequences, a 3D-FIESTA MR imaging was acquired. Magnetic resonance images thus obtained were evaluated with special regard to the presence of vascular contact to the CNs VII and VIII, as well as the presence of the vascular variations of the anterior inferior cerebellar artery (AICA) causing the compression of CNs. The Chi-squared test was used for statistical analysis. No statistically significant differences were found between the presence and absence of the AICA loop and/or vascular contact for the clinical symptoms of patients (P > 0.05). The cisternal and canalicular segments of CNs VII and VIII and adjacent vascular variations are well identified using 3D-FIESTA, especially by determining the relationship of the AICA variations between CNs.

  8. Cloning and characterization of a highly repetitive fish nucleotide sequence.

    PubMed

    Datta, U; Dutta, P; Mandal, R K

    1988-01-01

    We have cloned and sequenced a highly repetitive HindIII fragment of DNA from the common carp Cyprinus carpio. It represents a tandemly repeated sequence with a monomeric unit of 245 bp and comprises 8% of the fish genome. Higher units of this monomer appear as a ladder in Southern blots. The monomeric unit has been sequenced; it is A + T-rich with some direct and some inverse-repeat nucleotide clusters.

  9. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing

    PubMed Central

    Green, Richard E.; Malaspinas, Anna-Sapfo; Krause, Johannes; Briggs, Adrian W.; Johnson, Philip L. F.; Uhler, Caroline; Meyer, Matthias; Good, Jeffrey M.; Maricic, Tomislav; Stenzel, Udo; Prüfer, Kay; Siebauer, Michael; Burbano, Hernán A.; Ronan, Michael; Rothberg, Jonathan M.; Egholm, Michael; Rudan, Pavao; Brajković, Dejana; Kućan, Željko; Gušić, Ivan; Wikström, Mårten; Laakkonen, Liisa; Kelso, Janet; Slatkin, Montgomery; Pääbo, Svante

    2008-01-01

    A complete mitochondrial (mt) genome sequence was reconstructed from a 38,000-year-old Neandertal individual using 8,341 mtDNA sequences identified among 4.8 Gb of DNA generated from ~0.3 grams of bone. Analysis of the assembled sequence unequivocally establishes that the Neandertal mtDNA falls outside the variation of extant human mtDNAs and allows an estimate of the divergence date between the two mtDNA lineages of 660,000±140,000 years. Of the 13 proteins encoded in the mtDNA, subunit 2 of cytochrome c oxidase of the mitochondrial electron transport chain has experienced the largest number of amino acid substitutions in human ancestors since the separation from Neandertals. There is evidence that purifying selection in the Neandertal mtDNA was reduced compared to other primate lineages suggesting that the effective population size of Neandertals was small. PMID:18692465

  10. Highly conserved repetitive DNA sequences are present at human centromeres.

    PubMed Central

    Grady, D L; Ratliff, R L; Robinson, D L; McCanlies, E C; Meyne, J; Moyzis, R K

    1992-01-01

    Highly conserved repetitive DNA sequence clones, largely consisting of (GGAAT)n repeats, have been isolated from a human recombinant repetitive DNA library by high-stringency hybridization with rodent repetitive DNA. This sequence, the predominant repetitive sequence in human satellites II and III, is similar to the essential core DNA of the Saccharomyces cerevisiae centromere, centromere DNA element (CDE) III. In situ hybridization to human telophase and Drosophila polytene chromosomes shows localization of the (GGAAT)n sequence to centromeric regions. Hyperchromicity studies indicate that the (GGAAT)n sequence exhibits unusual hydrogen bonding properties. The purine-rich strand alone has the same thermal stability as the duplex. Hyperchromicity studies of synthetic DNA variants indicate that all sequences with the composition (AATGN)n exhibit this unusual thermal stability. DNA-mobility-shift assays indicate that specific HeLa-cell nuclear proteins recognize this sequence with a relative affinity greater than 10^5. The extreme evolutionary conservation of this DNA sequence, its centromeric location, its unusual hydrogen bonding properties, its high affinity for specific nuclear proteins, and its similarity to functional centromeres isolated from yeast suggest that this sequence may be a component of the functional human centromere. PMID:1542662

  11. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...

  12. A molecular footprint of limb loss: sequence variation of the autopodial identity gene Hoxa-13.

    PubMed

    Kohlsdorf, Tiana; Cummings, Michael P; Lynch, Vincent J; Stopper, Geffrey F; Takahashi, Kazuhiko; Wagner, Günter P

    2008-12-01

    The homeobox gene Hoxa-13 codes for a transcription factor involved in multiple functions, including body axis and hand/foot development in tetrapods. In this study we investigate whether the loss of one function (e.g., limb loss in snakes) left a molecular footprint in exon 1 of Hoxa-13 that could be associated with the release of functional constraints caused by limb loss. Fragments of the Hoxa-13 exon 1 were sequenced from 13 species and analyzed, with additional published sequences of the same region, using relative rates and likelihood-ratio tests. Five amino acid sites in exon 1 of Hoxa-13 were detected as evolving under positive selection in the stem lineage of snakes. To further investigate whether there is an association between limb loss and sequence variation in Hoxa-13, we used the random forest method on an alignment that included shark, basal fish lineages, and "eu-tetrapods" such as mammals, turtle, alligator, and birds. The random forest method approaches the problem as one of classification, where we seek to predict the presence or absence of autopodium based on amino acid variation in Hoxa-13 sequences. Different alignments tested were associated with similar error rates (18.42%). The random forest method suggested that phenotypic states (autopodium present and absent) can often be correctly predicted based on Hoxa-13 sequences. Basal, nontetrapod gnathostomes that never had an autopodium were consistently classified as limbless together with the snakes, while eu-tetrapods without any history of limb loss in their phylogeny were also consistently classified as having a limb. Misclassifications affected mostly lizards, which, as a group, have a history of limb loss and limb re-evolution, and the urodele and caecilian in our sample. We conclude that a molecular footprint can be detected in Hoxa-13 that is associated with the lack of an autopodium; groups with classification ambiguity (lizards) are characterized by a history of repeated limb loss

  13. Whole Genome Sequencing demonstrates that Geographic Variation of Escherichia coli O157 Genotypes Dominates Host Association

    PubMed Central

    Strachan, Norval J. C.; Rotariu, Ovidiu; Lopes, Bruno; MacRae, Marion; Fairley, Susan; Laing, Chad; Gannon, Victor; Allison, Lesley J.; Hanson, Mary F.; Dallman, Tim; Ashton, Philip; Franz, Eelco; van Hoek, Angela H. A. M.; French, Nigel P.; George, Tessy; Biggs, Patrick J.; Forbes, Ken J.

    2015-01-01

    Genetic variation in an infectious disease pathogen can be driven by ecological niche dissimilarities arising from different host species and different geographical locations. Whole genome sequencing was used to compare E. coli O157 isolates from host reservoirs (cattle and sheep) from Scotland and to compare genetic variation of isolates (human, animal, environmental/food) obtained from Scotland, New Zealand, Netherlands, Canada and the USA. Nei’s genetic distance calculated from core genome single nucleotide polymorphisms (SNPs) demonstrated that the animal isolates were from the same population. Investigation of the Shiga toxin bacteriophage and their insertion sites (SBI typing) revealed that cattle and sheep isolates had statistically indistinguishable rarefaction profiles, diversity and genotypes. In contrast, isolates from different countries exhibited significant differences in Nei’s genetic distance and SBI typing. Hence, after successful international transmission, which has occurred on multiple occasions, local genetic variation occurs, resulting in a global patchwork of continental and trans-continental phylogeographic clades. These findings are important for three reasons: first, understanding transmission and evolution of infectious diseases associated with multiple host reservoirs and multi-geographic locations; second, highlighting the relevance of the sheep reservoir when considering farm based interventions; and third, improving our understanding of why human disease incidence varies across the world. PMID:26442781
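
    As an illustration of the distance measure named in this abstract, the sketch below computes Nei's standard genetic distance, D = -ln(J_xy / sqrt(J_x * J_y)), from per-locus allele frequencies. It assumes the 1972 "standard" form of the statistic; the abstract does not say which variant or software the authors used, and the two toy populations are hypothetical, not data from the study.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x and pop_y are lists with one dict per locus, mapping each
    allele to its frequency in that population (frequencies sum to 1).
    """
    jx = jy = jxy = 0.0
    for freq_x, freq_y in zip(pop_x, pop_y):
        alleles = set(freq_x) | set(freq_y)
        jx += sum(freq_x.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(freq_y.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(freq_x.get(a, 0.0) * freq_y.get(a, 0.0) for a in alleles)
    # Summing over loci instead of averaging is fine here: the number
    # of loci cancels in the ratio.
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)


# Hypothetical SNP allele frequencies at two loci (illustration only).
pop_a = [{"A": 1.0}, {"C": 0.5, "T": 0.5}]
pop_b = [{"A": 0.5, "G": 0.5}, {"C": 0.5, "T": 0.5}]

d_same = nei_standard_distance(pop_a, pop_a)
d_diff = nei_standard_distance(pop_a, pop_b)
print(d_same, d_diff)
```

    A distance near zero corresponds to the "same population" result reported for cattle and sheep isolates, while larger values correspond to the between-country differences.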

  14. Whole Genome Sequencing demonstrates that Geographic Variation of Escherichia coli O157 Genotypes Dominates Host Association.

    PubMed

    Strachan, Norval J C; Rotariu, Ovidiu; Lopes, Bruno; MacRae, Marion; Fairley, Susan; Laing, Chad; Gannon, Victor; Allison, Lesley J; Hanson, Mary F; Dallman, Tim; Ashton, Philip; Franz, Eelco; van Hoek, Angela H A M; French, Nigel P; George, Tessy; Biggs, Patrick J; Forbes, Ken J

    2015-10-07

    Genetic variation in an infectious disease pathogen can be driven by ecological niche dissimilarities arising from different host species and different geographical locations. Whole genome sequencing was used to compare E. coli O157 isolates from host reservoirs (cattle and sheep) from Scotland and to compare genetic variation of isolates (human, animal, environmental/food) obtained from Scotland, New Zealand, Netherlands, Canada and the USA. Nei's genetic distance calculated from core genome single nucleotide polymorphisms (SNPs) demonstrated that the animal isolates were from the same population. Investigation of the Shiga toxin bacteriophage and their insertion sites (SBI typing) revealed that cattle and sheep isolates had statistically indistinguishable rarefaction profiles, diversity and genotypes. In contrast, isolates from different countries exhibited significant differences in Nei's genetic distance and SBI typing. Hence, after successful international transmission, which has occurred on multiple occasions, local genetic variation occurs, resulting in a global patchwork of continental and trans-continental phylogeographic clades. These findings are important for three reasons: first, understanding transmission and evolution of infectious diseases associated with multiple host reservoirs and multi-geographic locations; second, highlighting the relevance of the sheep reservoir when considering farm based interventions; and third, improving our understanding of why human disease incidence varies across the world.

  15. DNA sequence variation in the mitochondrial control region of red-backed voles (Clethrionomys).

    PubMed

    Matson, C W; Baker, R J

    2001-08-01

    The complete mitochondrial DNA (mtDNA) control region was sequenced for 71 individuals from five species of the rodent genus Clethrionomys both to understand patterns of variation and to explore the existence of previously described domains and other elements. Among species, the control region ranged from 942 to 971 bp in length. Our data were compatible with the proposal of three domains (extended terminal associated sequences [ETAS], central, conserved sequence blocks [CSB]) within the control region. The most conserved region in the control region was the central domain (12% of nucleotide positions variable), whereas in the ETAS and CSB domains, 22% and 40% of nucleotide positions were variable, respectively. Tandem repeats were encountered only in the ETAS domain of Clethrionomys rufocanus. This tandem repeat found in C. rufocanus was 24 bp in length and was located at the 5' end of the control region. Only two of the proposed CSB and ETAS elements appeared to be supported by our data; however, a "CSB1-like" element was also documented in the ETAS domain.

  16. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat

    PubMed Central

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-01-01

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches. PMID:27172215

  17. DNA sequence variation in BpMADS2 gene in two populations of Betula pendula.

    PubMed

    Järvinen, Pia; Lemmetyinen, Juha; Savolainen, Outi; Sopanen, Tuomas

    2003-02-01

    The PISTILLATA (PI) homologue, BpMADS2, was isolated from silver birch (Betula pendula Roth) and used to study nucleotide polymorphism. Two regions (together about 2450 bp) comprising mainly untranslated sequences were sequenced from 10 individuals from each of two populations in Finland. The nucleotide polymorphism was low in the BpMADS2 locus, especially in the coding region. The overall synonymous-site nucleotide diversity (π_s) was 0.0043 and the nonsynonymous nucleotide diversity (π_a) was only 0.000052. For the whole region, the π values for the two populations were 0.0039 and 0.0045, and for the coding regions, the π values were only 0 and 0.00066 (for the corresponding coding regions of Arabidopsis thaliana PI, world-wide π was 0.0021). Estimates of π or θ did not differ significantly between the two populations, and the two populations had not diverged from each other. Two classes of BpMADS2 alleles were present in both populations, suggesting that this gene exhibits allelic dimorphism. In addition to the nucleotide site variation, two microsatellites were also associated with the haplotypes. This allelic dimorphism might be the result of postglacial re-colonization partly from northwestern, partly from southeastern/eastern refugia. The sequence comparison detected five recombination events in the regions studied. The large number of microsatellites in all of the three introns studied suggests that BpMADS2 is a hotspot for microsatellite formation.
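
    The nucleotide diversity values reported above are averages of pairwise differences per site. The following is a minimal sketch of that calculation, assuming equal-length, gap-free aligned sequences; the toy alignment is hypothetical, and the study's own estimates (including the separate synonymous and nonsynonymous values) would have required site classification that is not shown here.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi).

    seqs is a list of aligned sequences of identical length.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


# Hypothetical 10-bp alignment of four sequences (illustration only).
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
]
pi = nucleotide_diversity(toy_alignment)
print(pi)
```

    In this toy case the six pairs differ at six positions in total over 10 sites, giving a diversity of 0.1, far higher than the values of about 0.004 reported for BpMADS2.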

  18. Characterization and complete genome sequence of a panicovirus from Bermuda grass by high-throughput sequencing.

    PubMed

    Tahir, Muhammad N; Lockhart, Ben; Grinstead, Samuel; Mollov, Dimitre

    2017-04-01

    Bermuda grass samples were examined by transmission electron microscopy and 28-30 nm spherical virus particles were observed. Total RNA from these plants was subjected to high-throughput sequencing (HTS). The nearly full genome sequence of a panicovirus was identified from one HTS scaffold. Sanger sequencing was used to confirm the HTS results and complete the genome sequence of 4404 nt. This virus was provisionally named Bermuda grass latent virus (BGLV). Its predicted open reading frames follow the typical arrangement of the genus Panicovirus. Based on sequence comparisons and phylogenetic analyses BGLV differs from other viruses and therefore taxonomically it is a new member of the genus Panicovirus, family Tombusviridae.

  19. DNA Sequence and Expression Variation of Hop (Humulus lupulus) Valerophenone Synthase (VPS), a Key Gene in Bitter Acid Biosynthesis

    PubMed Central

    Castro, Consuelo B.; Whittock, Lucy D.; Whittock, Simon P.; Leggett, Grey; Koutoulis, Anthony

    2008-01-01

    Background: The hop plant (Humulus lupulus) is a source of many secondary metabolites, with bitter acids essential in the beer brewing industry and others having potential applications for human health. This study investigated variation in DNA sequence and gene expression of valerophenone synthase (VPS), a key gene in the bitter acid biosynthesis pathway of hop. Methods: Sequence variation was studied in 12 varieties, and expression was analysed in four of the 12 varieties in a series across the development of the hop cone. Results: Nine single nucleotide polymorphisms (SNPs) were detected in VPS, seven of which were synonymous. The two non-synonymous polymorphisms did not appear to be related to typical bitter acid profiles of the varieties studied. However, real-time quantitative reverse-transcription polymerase chain reaction (qRT-PCR) analysis of VPS expression during hop cone development showed a clear link with the bitter acid content. The highest levels of VPS expression were observed in two triploid varieties, ‘Symphony’ and ‘Ember’, which typically have high bitter acid levels. Conclusions: In all hop varieties studied, VPS expression was lowest in the leaves and an increase in expression was consistently observed during the early stages of cone development. PMID:18519445

  20. Global sequence variation in the histidine-rich proteins 2 and 3 of Plasmodium falciparum: implications for the performance of malaria rapid diagnostic tests

    PubMed Central

    2010-01-01

    Background: Accurate diagnosis is essential for prompt and appropriate treatment of malaria. While rapid diagnostic tests (RDTs) offer great potential to improve malaria diagnosis, the sensitivity of RDTs has been reported to be highly variable. One possible factor contributing to variable test performance is the diversity of parasite antigens. This is of particular concern for Plasmodium falciparum histidine-rich protein 2 (PfHRP2)-detecting RDTs since PfHRP2 has been reported to be highly variable in isolates of the Asia-Pacific region. Methods: The pfhrp2 exon 2 fragment from 458 isolates of P. falciparum collected from 38 countries was amplified and sequenced. For a subset of 80 isolates, the exon 2 fragment of histidine-rich protein 3 (pfhrp3) was also amplified and sequenced. DNA sequence and statistical analysis of the variation observed in these genes was conducted. The potential impact of the pfhrp2 variation on RDT detection rates was examined by analysing the relationship between sequence characteristics of this gene and the results of the WHO product testing of malaria RDTs: Round 1 (2008), for 34 PfHRP2-detecting RDTs. Results: Sequence analysis revealed extensive variations in the number and arrangement of various repeats encoded by the genes in parasite populations world-wide. However, no statistically robust correlation between gene structure and RDT detection rate for P. falciparum parasites at 200 parasites per microlitre was identified. Conclusions: The results suggest that despite extreme sequence variation, diversity of PfHRP2 does not appear to be a major cause of RDT sensitivity variation. PMID:20470441

  1. [Genetic variation of Manchurian pheasant (Phasianus colchicus pallasi Rotshild, 1903) inferred from mitochondrial DNA control region sequences].

    PubMed

    Kozyrenko, M M; Fisenko, P V; Zhuravlev, Iu N

    2009-04-01

    Sequence variation of the mitochondrial DNA control region was studied in Manchurian pheasants (Phasianus colchicus pallasi Rotshild, 1903) representing three geographic populations from the southern part of the Russian Far East. Extremely low population genetic differentiation (F(ST) = 0.0003) pointed to a very high gene exchange between the populations. Combination of such characters as high haplotype diversity (0.884 to 0.913), low nucleotide diversity (0.0016 to 0.0022), low R2 values (0.1235 to 0.1337), certain patterns of pairwise-difference distributions, and the absence of phylogenetic structure suggested that the phylogenetic history of Ph. c. pallasi included passing through a bottleneck with further expansion in the postglacial period. According to the data obtained, it was suggested that differentiation between the mitochondrial lineages started approximately 100 000 years ago.
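
    Haplotype diversity, one of the statistics quoted in this abstract, is commonly computed as H = n/(n-1) * (1 - sum of squared haplotype frequencies). The sketch below assumes that standard unbiased estimator; the abstract does not state which estimator or software was used, and the haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Unbiased haplotype (gene) diversity for a sample of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample of six individuals carrying three haplotypes.
sample = ["h1", "h1", "h2", "h3", "h3", "h3"]
h = haplotype_diversity(sample)
print(round(h, 4))
```

    Values close to 1, such as the 0.884 to 0.913 reported above, indicate that two randomly drawn individuals almost always carry different control-region haplotypes, even when those haplotypes differ at very few sites.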

  2. Polarimetric Variations of Binary Stars. V. Pre-Main-Sequence Spectroscopic Binaries Located in Ophiuchus and Scorpius

    NASA Astrophysics Data System (ADS)

    Manset, N.; Bastien, P.

    2003-06-01

    We present polarimetric observations of seven pre-main-sequence (PMS) spectroscopic binaries located in the ρ Ophiuchus and Upper Scorpius star-forming regions (SFRs). The average observed polarizations at 7660 Å are between 0.5% and 3.5%. After estimates of the interstellar polarization are removed, all binaries have an intrinsic polarization above 0.4%, even though most of them do not present other evidence for circumstellar dust. Two binaries, NTTS 162814-2427 and NTTS 162819-2423S, present high levels of intrinsic polarization between 1.5% and 2.1%, in agreement with the fact that other observations (photometry, spectroscopy) indicate the presence of circumstellar dust. Tests reveal that all seven PMS binaries have a statistically variable or possibly variable polarization. Combining these results with our previous sample of binaries located in the Taurus, Auriga, and Orion SFRs, 68% of the binaries have an intrinsic polarization above 0.5%, and 90% of the binaries are polarimetrically variable or possibly variable. NTTS 160814-1857, 162814-2427, and 162819-2423S are clearly polarimetrically variable. The first two also exhibit phase-locked variations over ~10 and ~40 orbits, respectively. Statistically, NTTS 160905-1859 is possibly variable, but it shows periodic variations not detected by the statistical tests; those variations are not phase-locked and are present only for short intervals of time. The amplitudes of the variations reach a few tenths of a percent, greater than for the previously studied PMS binaries located in the Taurus, Orion, and Auriga SFRs. The high-eccentricity system NTTS 162814-2427 shows single-periodic variations, in agreement with our previous numerical simulations. We compare the observations with some of our numerical simulations and also show that an analysis of the periodic polarimetric variations with the Brown, McLean, & Emslie (BME) formalism to find the orbital inclination is for the moment premature: nonperiodic events

  3. Variation in sequence and organization of splicing regulatory elements in vertebrate genes

    PubMed Central

    Yeo, Gene; Hoon, Shawn; Venkatesh, Byrappa; Burge, Christopher B.

    2004-01-01

    Although core mechanisms and machinery of pre-mRNA splicing are conserved from yeast to human, the details of intron recognition often differ, even between closely related organisms. For example, genes from the pufferfish Fugu rubripes generally contain one or more introns that are not properly spliced in mouse cells. Exploiting available genome sequence data, a battery of sequence analysis techniques was used to reach several conclusions about the organization and evolution of splicing regulatory elements in vertebrate genes. The classical splice site and putative branch site signals are completely conserved across the vertebrates studied (human, mouse, pufferfish, and zebrafish), and exonic splicing enhancers also appear broadly conserved in vertebrates. However, another class of splicing regulatory elements, the intronic splicing enhancers, appears to differ substantially between mammals and fish, with G triples (GGG) very abundant in mammalian introns but comparatively rare in fish. Conversely, short repeats of AC and GT are predicted to function as intronic splicing enhancers in fish but are not enriched in mammalian introns. Consistent with this pattern, exonic splicing enhancer-binding SR proteins are highly conserved across all vertebrates, whereas heterogeneous nuclear ribonucleoproteins, which bind many intronic sequences, vary in domain structure and even presence/absence between mammals and fish. Exploiting differences in intronic sequence composition, a statistical model was developed to predict the splicing phenotype of Fugu introns in mammalian systems and was used to engineer the spliceability of a Fugu intron in human cells by insertion of specific sequences, thereby rescuing splicing in human cells. PMID:15505203
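
    The contrast in G-triple abundance described above rests on counting GGG occurrences in intron sequences. The sketch below shows one simple way to do that, counting overlapping windows and normalizing per kilobase; both choices are assumptions made for illustration, the example sequences are invented, and this is not the statistical model developed in the paper.

```python
import re


def count_g_triples(seq: str) -> int:
    """Count GGG occurrences, including overlapping ones (GGGG counts twice)."""
    # A zero-width lookahead lets the regex engine match at every start position.
    return len(re.findall(r"(?=GGG)", seq.upper()))


def g_triples_per_kb(seq: str) -> float:
    """GGG occurrences per 1000 bases of sequence."""
    if not seq:
        raise ValueError("empty sequence")
    return 1000.0 * count_g_triples(seq) / len(seq)


# Invented intron fragments, for illustration only.
g_rich_intron = "GTAAGGGCTGGGGTCAGGGAACAG"
ac_rich_intron = "GTAAGACACACACTGTGTGTACAG"

print(count_g_triples(g_rich_intron), count_g_triples(ac_rich_intron))
print(g_triples_per_kb(g_rich_intron))
```

    Comparing such densities between sets of mammalian and fish introns, against a background expectation from base composition, is the kind of enrichment test the abstract alludes to.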

  4. Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer

    PubMed Central

    Timofeeva, Maria N.; Kinnersley, Ben; Farrington, Susan M.; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J.; Harris, Sarah E.; Northwood, Emma L.; Barrett, Jennifer H.; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J.; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D. Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G.; Houlston, Richard S.

    2015-01-01

    Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10^-7), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts: rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10^-7); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10^-7 and OR = 1.09, P = 7.4 × 10^-8); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10^-9), all reaching exome-wide significance levels. Gene-based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10^-6). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10^-4) and DNA mismatch repair genes (P = 6.1 × 10^-4) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and the PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438

  5. Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer.

    PubMed

    Timofeeva, Maria N; Kinnersley, Ben; Farrington, Susan M; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J; Harris, Sarah E; Northwood, Emma L; Barrett, Jennifer H; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G; Houlston, Richard S

    2015-11-10

    Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10^-7), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts: rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10^-7); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10^-7 and OR = 1.09, P = 7.4 × 10^-8); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10^-9), all reaching exome-wide significance levels. Gene-based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10^-6). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10^-4) and DNA mismatch repair genes (P = 6.1 × 10^-4) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and the PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC.

  6. Atomic force microscopy of crystalline insulins: the influence of sequence variation on crystallization and interfacial structure.

    PubMed Central

    Yip, C M; Brader, M L; DeFelippis, M R; Ward, M D

    1998-01-01

    The self-association of proteins is influenced by amino acid sequence, molecular conformation, and the presence of molecular additives. In the presence of phenolic additives, LysB28ProB29 insulin, in which the C-terminal prolyl and lysyl residues of wild-type human insulin have been inverted, can be crystallized into forms resembling those of wild-type insulins in which the protein exists as zinc-complexed hexamers organized into well-defined layers. We describe herein tapping-mode atomic force microscopy (TMAFM) studies of single crystals of rhombohedral (R3) LysB28ProB29 that reveal the influence of sequence variation on hexamer-hexamer association at the surface of actively growing crystals. Molecular scale lattice images of these crystals were acquired in situ under growth conditions, enabling simultaneous identification of the rhombohedral LysB28ProB29 crystal form, its orientation, and its dynamic growth characteristics. The ability to obtain crystallographic parameters on multiple crystal faces with TMAFM confirmed that bovine and porcine insulins grown under these conditions crystallized into the same space group as LysB28ProB29 (R3), enabling direct comparison of crystal growth behavior and the influence of sequence variation. Real-time TMAFM revealed hexamer vacancies on the (001) terraces of LysB28ProB29, and more rounded dislocation noses and larger terrace widths for actively growing screw dislocations compared to wild-type bovine and porcine insulin crystals under identical conditions. This behavior is consistent with weaker interhexamer attachment energies for LysB28ProB29 at active growth sites. Comparison of the single crystal x-ray structures of wild-type insulins and LysB28ProB29 suggests that differences in protein conformation at the hexamer-hexamer interface and accompanying changes in interhexamer bonding are responsible for this behavior. These studies demonstrate that subtle changes in molecular conformation due to a single sequence

  7. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants

    PubMed Central

    Gundry, Michael; Vijg, Jan

    2011-01-01

    DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative for the genome overall, current platforms allow whole genome sequencing for less than $5,000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a

  8. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants.

    PubMed

    Gundry, Michael; Vijg, Jan

    2012-01-03

    DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative for the genome overall, current platforms allow whole genome sequencing for less than $5000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a brief

  9. Variations in a hotspot region of chloroplast DNAs among common wheat and Aegilops revealed by nucleotide sequence analysis.

    PubMed

    Guo, Chang-Hong; Terachi, Toru

    2005-08-01

    The second largest BamHI fragment (B2) of the chloroplast DNA in Triticum (wheat) and Aegilops contains a highly variable region (a hotspot), resulting in four types of B2 of different size, i.e. B2l (10.5 kb), B2m (10.2 kb), B2 (9.6 kb) and B2s (9.4 kb). In order to gain a better understanding of the molecular nature of the variations in length and explain unexpected identity among B2 of Ae. ovata, Ae. speltoides and common wheat (T. aestivum), the nucleotide sequence between a stop codon of rbcL and a HindIII site in cemA in the hotspot was determined for Ae. ovata, Ae. speltoides, Ae. caudata and Ae. mutica. The total number of nucleotides in the region was 2808, 2810, 3302, and 3594 bp, for Ae. speltoides, Ae. ovata, Ae. caudata and Ae. mutica, respectively, and the sequences were compared with the corresponding ones of Ae. crassa 4x, T. aestivum and Ae. squarrosa. Compared with the largest B2l fragment of Ae. mutica, a 791 bp and a 793 bp deletion were found in Ae. speltoides and Ae. ovata, respectively, and the possible site of deletion in the two species is the same as that of T. aestivum. However, a deleted segment in Ae. ovata is 2 bp longer than that of Ae. speltoides (and T. aestivum), demonstrating that recurrent deletions had occurred in the chloroplast genomes of both species. Comparison of the sequences from Ae. caudata and Ae. crassa 4x with that of Ae. mutica revealed a 289 bp and a 61 bp deletion at the same site in Ae. caudata and Ae. crassa 4x, respectively. Sequence comparison using wild Aegilops plants showed that the large length variations in a hotspot are fixed to each species. A considerable number of polymorphisms are observed in a loop in the 3' of rbcL. The study reveals the relative importance of the large and small indels and minute inversions to account for variations in the chloroplast genomes among closely related species.

  10. Barcoding lichen-forming fungi using 454 pyrosequencing is challenged by artifactual and biological sequence variation.

    PubMed

    Mark, Kristiina; Cornejo, Carolina; Keller, Christine; Flück, Daniela; Scheidegger, Christoph

    2016-09-01

    Although lichens (lichen-forming fungi) play an important role in the ecological integrity of many vulnerable landscapes, only a minority of lichen-forming fungi have been barcoded out of the currently accepted ∼18 000 species. Regular Sanger sequencing can be problematic when analyzing lichens since saprophytic, endophytic, and parasitic fungi live intimately admixed, resulting in low-quality sequencing reads. Here, high-throughput, long-read 454 pyrosequencing in a GS FLX+ System was tested to barcode the fungal partner of 100 epiphytic lichen species from Switzerland using fungal-specific primers when amplifying the full internal transcribed spacer region (ITS). The present study shows the potential of DNA barcoding using pyrosequencing, in that the expected lichen fungus was successfully sequenced for all samples except one. Alignment solutions such as BLAST were found to be largely adequate for the generated long reads. In addition, the NCBI nucleotide database (currently the most complete database for lichen-forming fungi) can be used as a reference database when identifying common species, since the majority of analyzed lichens were identified correctly to the species or at least to the genus level. However, several issues were encountered, including a high sequencing error rate, multiple ITS versions in a genome (incomplete concerted evolution), and in some samples the presence of mixed lichen-forming fungi (possible lichen chimeras).

  11. Formation Sequences of Iron Minerals in the Acidic Alteration Products and Variation of Hydrothermal Fluid Conditions

    NASA Astrophysics Data System (ADS)

    Isobe, H.; Yoshizawa, M.

    2008-12-01

    Iron minerals play an important role in environmental issues not only on the Earth but also on other terrestrial planets. Iron mineral species related to alteration products of primary minerals with surface or subsurface fluids are characterized by temperature, acidity and redox conditions of the fluids. Various iron-bearing alteration products occur around fumaroles in geothermal/volcanic areas. In this study, zonal structures of iron minerals in alteration products of the geothermal area are observed to elucidate temporal and spatial variation of hydrothermal fluids. Alteration of the pyroxene-amphibole andesite of Garan-dake volcano, Oita, Japan occurs by the acidic hydrothermal fluid to form cristobalite, leaching out elements other than Si. Hand specimens with unaltered or weakly altered core and cristobalite crust show various sequences of layers. XRD analysis revealed that the alteration degree is represented by abundance of cristobalite. Intermediately altered layers are characterized by the occurrence of alunite, pyrite, kaolinite, goethite and hematite. A specimen with reddish brown core surrounded by cristobalite-rich white crust has brown colored layers at the boundary of core and the crust. Reddish core is characterized by occurrence of crystalline hematite by XRD. Another hand specimen has light gray core, which represents reduced conditions, and white cristobalite crust with light brown and reddish brown layers of ferric iron minerals between the core and the crust. On the other hand, hornblende crystals, a typical ferrous iron-bearing mineral of the host rock, are well preserved in some samples with strongly decolorized cristobalite-rich groundmass. Hydrothermal alteration experiments of iron-rich basaltic material show that iron mineral species depend on acidity and temperature of the fluid. Oxidation states of the iron-bearing mineral species are strongly influenced by the acidity and redox conditions. Variations of alteration

  12. Copy number variations in Hanwoo and Yanbian cattle genomes using the massively parallel sequencing data.

    PubMed

    Choi, Jung-Woo; Chung, Won-Hyong; Lim, Kyu-Sang; Lim, Won-Jun; Choi, Bong-Hwan; Lee, Seung-Hwan; Kim, Hyeong-Cheol; Lee, Seung-Soo; Cho, Eun-Seok; Lee, Kyung-Tai; Kim, Namshin; Kim, Jeong-Dae; Kim, Jong-Bok; Chai, Han-Ha; Cho, Yong-Min; Kim, Tae-Hun; Lim, Dajeong

    2016-09-01

    Hanwoo is an indigenous Korean beef cattle breed, and it shared an ancestor with Yanbian cattle that are found in the Northeast provinces in China until the last century. During recent decades, those cattle breeds experienced different selection pressures. Here, we present genome-wide copy number variations (CNVs) by comparing Hanwoo and Yanbian cattle sequencing data. We used ~3.12 and ~3.07 billion sequence reads from Hanwoo and Yanbian cattle, respectively. A total of 901 putative CNV regions (CNVRs) were identified throughout the genome, representing 5,513,340 bp. This is a smaller number than has been reported in previous studies, indicating that Hanwoo are genetically close to Yanbian cattle. Of the CNVRs, 53.2% and 46.8% were found to be gains and losses in Hanwoo, respectively. Potential functional roles of each CNVR were assessed by annotating all CNVRs and gene ontology (GO) enrichment analysis. We found that 278 CNVRs overlapped with cattle gene-sets (genic-CNVRs) that could be promising candidates to account for economically important traits in cattle. The enrichment analysis indicated that genes were significantly over-represented in GO terms, including developmental process, multicellular organismal process, reproduction, and response to stimulus. These results provide a valuable genomic resource for determining how CNVs are associated with cattle traits.

  13. Patchwork sequencing of tomato San Marzano and Vesuviano varieties highlights genome-wide variations

    PubMed Central

    2014-01-01

    Background: Investigation of tomato genetic resources is crucial for evolutionary and genetic studies as well as for tomato breeding strategies. Traditional Vesuviano and San Marzano varieties grown in the Campania region (Southern Italy) are famous for their remarkable fruit quality. Owing to their economic and social importance, it is crucial to understand the genetic basis of their unique traits. Results: Here, we present the draft genome sequences of the tomato varieties Vesuviano and San Marzano. A 40x genome coverage was obtained from a hybrid assembly of Illumina paired-end reads that combines de novo assembly with iterative mapping to the reference S. lycopersicum genome (SL2.40). Insertions, deletions and SNP variants were carefully measured. When assessed on the basis of the reference annotation, 30% of protein-coding genes are predicted to have variants in both varieties. Gene copy number and gene location were assessed by mapping mRNA transcripts, showing a closer relationship of San Marzano with the reference genome. Distinctive variations in key genes and transcription/regulation factors related to fruit quality have been revealed for both cultivars. Conclusions: This work highlighted relationships between the varieties and important variants in key fruit processes, useful to dissect the path from sequence variant to phenotype. PMID:24548308

  14. High-Throughput Next-Generation Sequencing of Polioviruses.

    PubMed

    Montmayeur, Anna M; Ng, Terry Fei Fan; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A; Oberste, M Steven; Burns, Cara C

    2017-02-01

    The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance.

  15. Mitochondrial DNA sequence variation among populations and host races of Lambdina fiscellaria (Gn.) (Lepidoptera: Geometridae).

    PubMed

    Sperling, F A; Raske, A G; Otvos, I S

    1999-02-01

    The hemlock looper, Lambdina fiscellaria (Gn.), is a recurring major forest pest that is widely distributed in North America. Three subspecies (L. f. fiscellaria, L. f. lugubrosa (Hulst) and L. f. somniaria (Hulst)) have been recognized based on larval host or adult pheromone differences, but no consistent morphological differences have been reported. To clarify their taxonomic status, we surveyed mitochondrial DNA (mtDNA) sequence and restriction site variation in two protein coding genes, cytochrome oxidase I and II (COI and COII), in populations across the range of L. fiscellaria. In addition to variation in COI and COII, we found an intergenic spacer region of 20-23 bp located between the tRNA tyrosine gene and the start of COI. Of the 141 specimens of L. fiscellaria assayed, 137 were grouped into two distinct mtDNA lineages, one of which was disproportionately associated with eastern populations and one with western populations. However, single specimens and two populations in eastern Canada had mtDNA resembling that of western populations. Three divergent and rare haplotypes had basal affinities to the two common lineages. The two major lineages of L. fiscellaria were diverged by approximately 2% from each other, as well as from the mtDNA of two outgroup species, L. athasaria (Walker) and L. pellucidaria (G. & R.). The two outgroup species had essentially the same mtDNA and may be conspecific. We interpret the pattern of mtDNA variation within L. fiscellaria as indicating genetic polymorphism within a single species without clear subspecific divisions, rather than evidence of multiple cryptic species.

  16. Detection and implication of significant temporal b-value variation during earthquake sequences

    NASA Astrophysics Data System (ADS)

    Gulia, Laura; Tormann, Thessa; Schorlemmer, Danijel; Wiemer, Stefan

    2016-04-01

    Earthquakes tend to cluster in space and time and periods of increased seismic activity are also periods of increased seismic hazard. Forecasting models currently used in statistical seismology and in Operational Earthquake Forecasting (e.g. ETAS) consider the spatial and temporal changes in the activity rates whilst the spatio-temporal changes in the earthquake size distribution, the b-value, are not included. Laboratory experiments on rock samples show an increasing relative proportion of larger events as the system approaches failure, and a sudden reversal of this trend after the main event. The increasing fraction of larger events during the stress increase period can be mathematically represented by a systematic b-value decrease, while the b-value increases immediately following the stress release. We investigate whether these lab-scale observations also apply to natural earthquake sequences and can help to improve our understanding of the physical processes generating damaging earthquakes. A number of large events nucleated in low b-value regions and spatial b-value variations have been extensively documented in the past. Detecting temporal b-value evolution with confidence is more difficult, one reason being the very different scales that have been suggested for a precursory drop in b-value, from a few days to decadal scale gradients. We demonstrate with the results of detailed case studies of the 2009 M6.3 L'Aquila and 2011 M9 Tohoku earthquakes that significant and meaningful temporal b-value variability can be detected throughout the sequences, which e.g. suggests that foreshock probabilities are not generic but subject to significant spatio-temporal variability. Such potential conclusions require and motivate the systematic study of many sequences to investigate whether general patterns exist that might eventually be useful for time-dependent or even real-time seismic hazard assessment.
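
    The link between a larger share of big events and a lower b-value can be illustrated with a small calculation. The sketch below uses the widely cited Aki-type maximum-likelihood estimator with a magnitude-binning correction; the abstract does not say which estimator the authors used, so the estimator choice, the function name `b_value_aki`, the completeness magnitude, and both toy catalogues are assumptions made purely for illustration.

```python
import math


def b_value_aki(magnitudes, m_c, bin_width=0.1):
    """Maximum-likelihood b-value estimate (Aki-type, with binning correction).

    magnitudes: iterable of event magnitudes (hypothetical catalogue)
    m_c:        assumed magnitude of completeness
    bin_width:  assumed magnitude binning of the catalogue
    """
    sample = [m for m in magnitudes if m >= m_c]
    if len(sample) < 2:
        raise ValueError("need at least two events at or above m_c")
    mean_mag = sum(sample) / len(sample)
    denom = mean_mag - (m_c - bin_width / 2.0)
    if denom <= 0:
        raise ValueError("mean magnitude must exceed m_c - bin_width / 2")
    return math.log10(math.e) / denom


# Two made-up catalogues: the second contains relatively more large events.
background = [2.0, 2.1, 2.2, 2.0, 2.3, 2.1, 2.5, 2.0]
pre_failure = [2.0, 2.4, 2.9, 2.2, 3.1, 2.6, 3.4, 2.0]

b_background = b_value_aki(background, m_c=2.0)
b_pre_failure = b_value_aki(pre_failure, m_c=2.0)
print(f"background b = {b_background:.2f}, pre-failure b = {b_pre_failure:.2f}")
```

    In this toy comparison the catalogue with the larger fraction of big events yields the smaller estimate, which is the direction of change the abstract describes for the stress-increase period.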

  17. Effect of laying sequence on egg mercury in captive zebra finches: an interpretation considering individual variation.

    PubMed

    Ou, Langbo; Varian-Ramos, Claire W; Cristol, Daniel A

    2015-08-01

    Bird eggs are used widely as noninvasive bioindicators for environmental mercury availability. Previous studies, however, have found varying relationships between laying sequence and egg mercury concentrations. Some studies have reported that the mercury concentration was higher in first-laid eggs or declined across the laying sequence, whereas in other studies mercury concentration was not related to egg order. Approximately 300 eggs (61 clutches) were collected from captive zebra finches dosed throughout their reproductive lives with methylmercury (0.3 μg/g, 0.6 μg/g, 1.2 μg/g, or 2.4 μg/g wet wt in diet); the total mercury concentration (mean ± standard deviation [SD] dry wt basis) of their eggs was 7.03 ± 1.38 μg/g, 14.15 ± 2.52 μg/g, 26.85 ± 5.85 μg/g, and 49.76 ± 10.37 μg/g, respectively (equivalent to fresh wt egg mercury concentrations of 1.24 μg/g, 2.50 μg/g, 4.74 μg/g, and 8.79 μg/g). The authors observed a significant decrease in the mercury concentration of successive eggs when compared with the first egg and notable variation between clutches within treatments. The mercury level of individual females within and among treatments did not alter this relationship. Based on the results, sampling of a single egg in each clutch from any position in the laying sequence is sufficient for purposes of population risk assessment, but it is not recommended as a proxy for individual female exposure or as an estimate of average mercury level within the clutch.
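
    The dry-weight and fresh-weight means quoted above can be cross-checked with simple arithmetic. The sketch below only divides the reported fresh-weight values by the reported dry-weight values; the abstract does not state the conversion factor or the egg moisture content, so the "implied moisture" figure is an inference from the quoted numbers (assuming fresh wt = dry wt × (1 − moisture fraction)), not a value reported by the authors.

```python
# Reported mean egg mercury concentrations (ug/g) for the four dose groups.
dry_wt = [7.03, 14.15, 26.85, 49.76]
fresh_wt = [1.24, 2.50, 4.74, 8.79]

# Ratio of fresh-weight to dry-weight concentration in each dose group.
ratios = [f / d for f, d in zip(fresh_wt, dry_wt)]
mean_ratio = sum(ratios) / len(ratios)

# Under the stated assumption, the ratio equals (1 - moisture fraction).
implied_moisture = 1.0 - mean_ratio

for dose, r in zip([0.3, 0.6, 1.2, 2.4], ratios):
    print(f"dose {dose} ug/g: fresh/dry ratio = {r:.4f}")
print(f"mean ratio = {mean_ratio:.4f}, implied moisture = {implied_moisture:.1%}")
```

    The four ratios agree closely with one another, which suggests that a single conversion factor was applied across dose groups.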

  18. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing.

    PubMed

    Aflitos, Saulo; Schijlen, Elio; de Jong, Hans; de Ridder, Dick; Smit, Sandra; Finkers, Richard; Wang, Jun; Zhang, Gengyun; Li, Ning; Mao, Likai; Bakker, Freek; Dirks, Rob; Breit, Timo; Gravendeel, Barbara; Huits, Henk; Struss, Darush; Swanson-Wagner, Ruth; van Leeuwen, Hans; van Ham, Roeland C H J; Fito, Laia; Guignier, Laëtitia; Sevilla, Myrna; Ellul, Philippe; Ganko, Eric; Kapur, Arvind; Reclus, Emannuel; de Geus, Bernard; van de Geest, Henri; Te Lintel Hekkert, Bas; van Haarst, Jan; Smits, Lars; Koops, Andries; Sanchez-Perez, Gabino; van Heusden, Adriaan W; Visser, Richard; Quan, Zhiwu; Min, Jiumeng; Liao, Li; Wang, Xiaoli; Wang, Guangbiao; Yue, Zhen; Yang, Xinhua; Xu, Na; Schranz, Eric; Smets, Erik; Vos, Rutger; Rauwerda, Johan; Ursem, Remco; Schuit, Cees; Kerns, Mike; van den Berg, Jan; Vriezen, Wim; Janssen, Antoine; Datema, Erwin; Jahrman, Torben; Moquet, Frederic; Bonnet, Julien; Peters, Sander

    2014-10-01

    We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio between non-synonymous and synonymous SNPs (dN/dS) in fruit diversification and plant growth genes compared to a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of single-nucleotide polymorphisms (SNPs) exceeds 10 million, i.e. 20-fold higher than found in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found for allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display a lower heterozygosity level. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies.

  19. Library preparation for highly accurate population sequencing of RNA viruses

    PubMed Central

    Acevedo, Ashley; Andino, Raul

    2015-01-01

    Circular resequencing (CirSeq) is a novel technique for efficient and highly accurate next-generation sequencing (NGS) of RNA virus populations. The foundation of this approach is the circularization of fragmented viral RNAs, which are then redundantly encoded into tandem repeats by ‘rolling-circle’ reverse transcription. When sequenced, the redundant copies within each read are aligned to derive a consensus sequence of their initial RNA template. This process yields sequencing data with error rates far below the variant frequencies observed for RNA viruses, facilitating ultra-rare variant detection and accurate measurement of low-frequency variants. Although library preparation takes ~5 d, the high-quality data generated by CirSeq simplifies downstream data analysis, making this approach substantially more tractable for experimentalists. PMID:24967624
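
    The consensus step described above can be illustrated with a minimal Python sketch. It is not the CirSeq software: it assumes the repeat length is already known, that the tandem copies sit in phase from the start of the read, and that errors are substitutions only (no indels or quality scores). The function name repeat_consensus and the min_copies parameter are hypothetical.

```python
from collections import Counter


def repeat_consensus(read: str, repeat_len: int, min_copies: int = 3) -> str:
    """Majority-vote consensus across the tandem copies found in one read.

    Assumes the copies start at position 0 and each has length repeat_len.
    """
    # Cut the read into consecutive full-length copies of the template.
    copies = [read[i:i + repeat_len]
              for i in range(0, len(read) - repeat_len + 1, repeat_len)]
    if len(copies) < min_copies:
        raise ValueError("not enough tandem copies for a consensus")

    consensus = []
    # zip(*copies) walks the copies column by column (same template position).
    for column in zip(*copies):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Three copies of an 8-base template; the third copy carries one error (G -> C).
read = "ACGTTGCA" + "ACGTTGCA" + "ACCTTGCA"
print(repeat_consensus(read, repeat_len=8))  # ACGTTGCA
```

    Because an error must recur at the same position in most copies to survive the vote, independent sequencing errors are largely filtered out, which is the intuition behind the low error rates reported in the abstract.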

  20. High sequence conservation among cucumber mosaic virus isolates from lily.

    PubMed

    Chen, Y K; Derks, A F; Langeveld, S; Goldbach, R; Prins, M

    2001-08-01

    For classification of Cucumber mosaic virus (CMV) isolates from ornamental crops of different geographical areas, these were characterized by comparing the nucleotide sequences of RNAs 4 and the encoded coat proteins. Within the ornamental-infecting CMV viruses both subgroups were represented. CMV isolates of Alstroemeria and crocus were classified as subgroup II isolates, whereas 8 other isolates, from lily, gladiolus, amaranthus, larkspur, and lisianthus, were identified as subgroup I members. In general, nucleotide sequence comparisons correlated well with geographic distribution, with one notable exception: the analyzed nucleotide sequences of 5 lily isolates showed remarkably high homology despite different origins.

  1. A novel multi-alignment pipeline for high-throughput sequencing data.

    PubMed

    Huang, Shunping; Holt, James; Kao, Chia-Yu; McMillan, Leonard; Wang, Wei

    2014-01-01

    Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence, as is common practice, can lead to an underlying bias depending on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, which include reduced mapping ratios, shifts in read mappings and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging them afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate our advantages of more aligned reads and a higher percentage of reads with assigned origins. Database URL: http://csbio.unc.edu/CCstatus/index.py?run=Pseudo.
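
    The merge step, in which each read is aligned to several founder references and then assigned an origin, can be sketched as follows. This is an illustration only and not the authors' pipeline: the input layout (a per-read dictionary of alignment scores, higher is better), the function name merge_alignments, and the founder labels are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# read_id -> {reference_name: alignment_score}; a missing reference means
# the read did not map to it. Higher scores are assumed to be better.
ScoreTable = Dict[str, Dict[str, int]]


def merge_alignments(scores: ScoreTable) -> Dict[str, Tuple[Optional[int], List[str]]]:
    """Keep the best score per read and annotate which reference(s) gave it."""
    merged = {}
    for read_id, per_ref in scores.items():
        if not per_ref:
            # The read mapped to none of the references.
            merged[read_id] = (None, [])
            continue
        best = max(per_ref.values())
        # Several references can tie; then the origin stays ambiguous.
        origins = sorted(ref for ref, s in per_ref.items() if s == best)
        merged[read_id] = (best, origins)
    return merged


example = {
    "r1": {"founderA": 60, "founderB": 55},  # origin: founderA
    "r2": {"founderA": 58, "founderB": 58},  # tie: origin ambiguous
    "r3": {"founderB": 40},                  # rescued: maps to founderB only
    "r4": {},                                # unmapped everywhere
}
for read_id, (score, origins) in merge_alignments(example).items():
    print(read_id, score, origins)
```

    Read r3 shows the rescue effect mentioned in the abstract: a read that fails against one reference can still be placed when a second reference is available.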

  2. Sequence variation within botulinum neurotoxin serotypes impacts antibody binding and neutralization.

    PubMed

    Smith, T J; Lou, J; Geren, I N; Forsyth, C M; Tsai, R; Laporte, S L; Tepp, W H; Bradshaw, M; Johnson, E A; Smith, L A; Marks, J D

    2005-09-01

    The botulinum neurotoxins (BoNTs) are category A biothreat agents which have been the focus of intensive efforts to develop vaccines and antibody-based prophylaxis and treatment. Such approaches must take into account the extensive BoNT sequence variability; the seven BoNT serotypes differ by up to 70% at the amino acid level. Here, we have analyzed 49 complete published sequences of BoNTs and show that all toxins also exhibit variability within serotypes ranging between 2.6 and 31.6%. To determine the impact of such sequence differences on immune recognition, we studied the binding and neutralization capacity of six BoNT serotype A (BoNT/A) monoclonal antibodies (MAbs) to BoNT/A1 and BoNT/A2, which differ by 10% at the amino acid level. While all six MAbs bound BoNT/A1 with high affinity, three of the six MAbs showed a marked reduction in binding affinity of 500- to more than 1,000-fold to BoNT/A2 toxin. Binding results predicted in vivo toxin neutralization; MAbs or MAb combinations that potently neutralized A1 toxin but did not bind A2 toxin had minimal neutralizing capacity for A2 toxin. This was most striking for a combination of three binding domain MAbs which together neutralized >40,000 mouse 50% lethal doses (LD(50)s) of A1 toxin but less than 500 LD(50)s of A2 toxin. Combining three MAbs which bound both A1 and A2 toxins potently neutralized both toxins. We conclude that sequence variability exists within all toxin serotypes, and this impacts monoclonal antibody binding and neutralization. Such subtype sequence variability must be accounted for when generating and evaluating diagnostic and therapeutic antibodies.

  3. Sequence Variation within Botulinum Neurotoxin Serotypes Impacts Antibody Binding and Neutralization

    PubMed Central

    Smith, T. J.; Lou, J.; Geren, I. N.; Forsyth, C. M.; Tsai, R.; LaPorte, S. L.; Tepp, W. H.; Bradshaw, M.; Johnson, E. A.; Smith, L. A.; Marks, J. D.

    2005-01-01

    The botulinum neurotoxins (BoNTs) are category A biothreat agents which have been the focus of intensive efforts to develop vaccines and antibody-based prophylaxis and treatment. Such approaches must take into account the extensive BoNT sequence variability; the seven BoNT serotypes differ by up to 70% at the amino acid level. Here, we have analyzed 49 complete published sequences of BoNTs and show that all toxins also exhibit variability within serotypes ranging between 2.6 and 31.6%. To determine the impact of such sequence differences on immune recognition, we studied the binding and neutralization capacity of six BoNT serotype A (BoNT/A) monoclonal antibodies (MAbs) to BoNT/A1 and BoNT/A2, which differ by 10% at the amino acid level. While all six MAbs bound BoNT/A1 with high affinity, three of the six MAbs showed a marked reduction in binding affinity of 500- to more than 1,000-fold to BoNT/A2 toxin. Binding results predicted in vivo toxin neutralization; MAbs or MAb combinations that potently neutralized A1 toxin but did not bind A2 toxin had minimal neutralizing capacity for A2 toxin. This was most striking for a combination of three binding domain MAbs which together neutralized >40,000 mouse 50% lethal doses (LD50s) of A1 toxin but less than 500 LD50s of A2 toxin. Combining three MAbs which bound both A1 and A2 toxins potently neutralized both toxins. We conclude that sequence variability exists within all toxin serotypes, and this impacts monoclonal antibody binding and neutralization. Such subtype sequence variability must be accounted for when generating and evaluating diagnostic and therapeutic antibodies. PMID:16113261

  4. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing

    PubMed Central

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M. G.; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes

    2015-01-01

    Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material. PMID:26285800

  5. Dissection of genomic features and variations of three pathotypes of Puccinia striiformis through whole genome sequencing

    PubMed Central

    Kiran, Kanti; Rawal, Hukam C.; Dubey, Himanshu; Jaswal, R.; Bhardwaj, Subhash C.; Prasad, P.; Pal, Dharam; Devanna, B. N.; Sharma, Tilak R.

    2017-01-01

    Stripe rust of wheat, caused by Puccinia striiformis f. sp. tritici, is one of the important diseases of wheat. We used NGS technologies to generate draft genome sequences of two highly virulent pathotypes (46S 119 and 31) and one least virulent pathotype (K) of P. striiformis from the Indian subcontinent. We generated ~24,000–32,000 sequence contigs (N50: 7.4–9.2 kb), which accounted for ~86X–105X sequence depth coverage, with estimated genome sizes of these pathotypes ranging from 66.2 to 70.2 Mb. A genome-wide analysis revealed that pathotype 46S 119 might be highly evolved among the three pathotypes in terms of year of detection and prevalence. SNP analysis revealed that ~47% of the gene sets are affected by nonsynonymous mutations. The extracellular secreted (ES) proteins presumably are well conserved among the three pathotypes, and perhaps purifying selection has an important role in differentiating pathotype 46S 119 from pathotypes K and 31. In the present study, we decoded the genomes of three pathotypes, with 81% of the total annotated genes being successfully assigned functional roles. Besides the identification of secretory genes, the genes essential for pathogen-host interactions should make this study a substantial genomic resource for managing this disease using host resistance. PMID:28211474

  6. Dissection of genomic features and variations of three pathotypes of Puccinia striiformis through whole genome sequencing.

    PubMed

    Kiran, Kanti; Rawal, Hukam C; Dubey, Himanshu; Jaswal, R; Bhardwaj, Subhash C; Prasad, P; Pal, Dharam; Devanna, B N; Sharma, Tilak R

    2017-02-17

    Stripe rust of wheat, caused by Puccinia striiformis f. sp. tritici, is one of the important diseases of wheat. We used NGS technologies to generate draft genome sequences of two highly virulent pathotypes (46S 119 and 31) and one least virulent pathotype (K) of P. striiformis from the Indian subcontinent. We generated ~24,000-32,000 sequence contigs (N50: 7.4-9.2 kb), which accounted for ~86X-105X sequence depth coverage, with estimated genome sizes of these pathotypes ranging from 66.2 to 70.2 Mb. A genome-wide analysis revealed that pathotype 46S 119 might be highly evolved among the three pathotypes in terms of year of detection and prevalence. SNP analysis revealed that ~47% of the gene sets are affected by nonsynonymous mutations. The extracellular secreted (ES) proteins presumably are well conserved among the three pathotypes, and perhaps purifying selection has an important role in differentiating pathotype 46S 119 from pathotypes K and 31. In the present study, we decoded the genomes of three pathotypes, with 81% of the total annotated genes being successfully assigned functional roles. Besides the identification of secretory genes, the genes essential for pathogen-host interactions should make this study a substantial genomic resource for managing this disease using host resistance.

  7. PCR/SSCP detects reliably and efficiently DNA sequence variations in large scale screening projects.

    PubMed

    Miterski, B; Krüger, R; Wintermeyer, P; Epplen, J T

    2000-06-01

    A simple and fast method with high reliability is necessary for the identification of mutations, polymorphisms and sequence variants (MPSV) within many genes and many samples, e.g. for clarifying the genetic background of individuals with multifactorial diseases. Here we review our experience with the polymerase chain reaction/single-strand conformation polymorphism (PCR/SSCP) analysis to identify MPSV in a number of genes thought to be involved in the pathogenesis of multifactorial neurological disorders, including autoimmune diseases like multiple sclerosis (MS) and neurodegenerative disorders like Parkinson's disease (PD). The method is based on the property of the DNA that the electrophoretic mobility of single stranded nucleic acids depends not only on their size but also on their sequence. The target sequences were amplified, digested into fragments ranging from 50-240 base pairs (bp), heat-denatured and analysed on native polyacrylamide (PAA) gels of different composition. The analysis of a great number of different PCR products demonstrates that the detection rate of MPSV depends on the fragment lengths, the temperature during electrophoresis and the composition of the gel. In general, the detection of MPSV is neither influenced by their location within the DNA fragment nor by the type of substitution, i.e., transitions or transversions. The standard PCR/SSCP system described here provides high reliability and detection rates. It allows the efficient analysis of a large number of DNA samples and many different genes.

  8. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding, which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera, which gives a decrease in signal.

  9. High-throughput sequencing in veterinary infection biology and diagnostics.

    PubMed

    Belák, S; Karlsson, O E; Leijon, M; Granberg, F

    2013-12-01

    Sequencing methods have improved rapidly since the first versions of the Sanger techniques, facilitating the development of very powerful tools for detecting and identifying various pathogens, such as viruses, bacteria and other microbes. The ongoing development of high-throughput sequencing (HTS; also known as next-generation sequencing) technologies has resulted in a dramatic reduction in DNA sequencing costs, making the technology more accessible to the average laboratory. In this White Paper of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine (Uppsala, Sweden), several approaches and examples of HTS are summarised, and their diagnostic applicability is briefly discussed. Selected future aspects of HTS are outlined, including the need for bioinformatic resources, with a focus on improving the diagnosis and control of infectious diseases in veterinary medicine.

  10. Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

    PubMed Central

    Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim

    2007-01-01

    While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434

  11. Natural allelic variations in highly polyploidy Saccharum complex

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sugarcane (Saccharum spp.), an important sugar and biofuel crop, is highly polyploid with a complex genome. A large amount of natural phenotypic variation exists in sugarcane germplasm. Understanding its allelic variance has been challenging but is a critical foundation for discovery of the genomic seq...

  12. Genetic variation of Trigonobalanus verticillata, a primitive species of Fagaceae, in Malaysia revealed by chloroplast sequences and AFLP markers.

    PubMed

    Kamiya, Koichi; Harada, Ko; Clyde, Mahani Mansor; Mohamed, Abdul Latiff

    2002-06-01

    The genetic variation of Trigonobalanus verticillata, the most recently described genus of Fagaceae, was studied using chloroplast DNA sequences and AFLP fingerprinting. This species has a restricted distribution that is known to include seven localities in tropical lower montane forests in Malaysia and Indonesia. A total of 75 individuals were collected from Bario, Kinabalu, and Fraser's Hill in Malaysia. The sequences of rbcL, matK, and three non-coding regions (atpB-rbcL spacer, trnL intron, and trnL-trnF spacer) were determined for 19 individuals from these populations. We found a total of 30 nucleotide substitutions and four length variations, which allowed identification of three haplotypes characterizing each population. No substitutions were detected within populations, while the tandem repeats in the trnL-trnF spacer had a variable repeat number of a 20-bp motif only in Kinabalu. The differentiation of the populations inferred from the cpDNA molecular clock calibrated with paleontological data was estimated to be 8.3 MYA between Bario and Kinabalu, and 16.7 MYA between Fraser's Hill and the other populations. In AFLP analysis, four selective primer pairs yielded a total of 431 loci, of which 340 (78.9%) were polymorphic. The results showed relatively high gene diversity (H(S) = 0.153 and H(T) = 0.198) and nucleotide diversity (pi(S) = 0.0132 and pi(T) = 0.0168) both within and among the populations. Although the cpDNA data suggest that little or no gene flow occurred between the populations via seeds, the fixation index estimated from AFLP data (F(ST) = 0.153 and N(ST) = 0.214) implies that some gene flow occurs between populations, possibly through pollen transfer.
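
    As a worked check on the nucleotide-diversity figures, the sketch below assumes the commonly used form N(ST) = (pi(T) - pi(S)) / pi(T), i.e. the share of total nucleotide diversity that lies among populations. The abstract does not state which estimator the authors used, so this formula is an assumption; the function name differentiation is hypothetical.

```python
def differentiation(total: float, within: float) -> float:
    """Fraction of total diversity found among (not within) populations."""
    if total <= 0:
        raise ValueError("total diversity must be positive")
    return (total - within) / total


# Values reported in the abstract: pi(T) = 0.0168, pi(S) = 0.0132.
n_st = differentiation(total=0.0168, within=0.0132)
print(round(n_st, 3))  # 0.214
```

    Under that assumption the result matches the N(ST) = 0.214 reported above.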

  13. Effect of variations in peptide sequence on anti-human milk fat globule membrane antibody reactions.

    PubMed

    Xing, P X; Reynolds, K; Pietersz, G A; McKenzie, I F

    1991-02-01

    Monoclonal anti-mucin antibodies BC1, BC2 and BC3 produced using human milk fat globule membrane react with a synthetic peptide p1-24 (PDTRPAPGSTAPPAHGVTSAPDTR) representing the repeating amino acid sequence of the mucin core protein. The minimum epitope recognized by these three monoclonal antibodies (mAb) in p1-24 was contained in the five amino acids APDTR. To analyse the variation of position of the epitope, various modifications of the APDTR sequence were made by synthesizing peptides and testing by direct binding and inhibition enzyme-linked immunosorbent assays. Firstly, peptides p13-32 and C-p13-32, in which the epitope APDTR was placed in the middle instead of the C-terminal as in p1-24, were examined. These peptides had a greater reaction with mAb BC1, BC2 and BC3 compared with the reaction with p1-24. Secondly, A-p1-24 and TSA-p1-24 were made wherein two APDTR epitopes were present; these peptides were shown to bind two IgG antibody molecules. Finally, the contribution of each amino acid in the APDTR epitope was studied using the pepscan polyethylene rods, making all 20 of the amino acid substitutions in each position for SAPDTR (the minimum epitope APDTR with an adjacent amino acid S). In the 120 peptides examined there were some 'permissible' substitutions in A, D and T but not in P or R for BC1 and BC2; there were more 'permissible' substitutions for BC3; different substitution patterns were found with each antibody and some substitutions gave an increased reaction compared with the native peptide SAPDTR. The studies are of value in analysing the reaction of antibodies with epitopes expressed in breast cancer and in determining the antigenicity of synthetic peptides.
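
    The substitution scan described above (each of the 20 standard amino acids at each of the 6 positions of SAPDTR) can be enumerated with a short Python sketch. It only reproduces the peptide count; the enumeration scheme, in which the native residue is included at every position, is an assumption that happens to give the 120 peptides mentioned in the abstract. The function name substitution_scan is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def substitution_scan(native: str):
    """Return (position, residue, peptide) for every single-position replacement."""
    peptides = []
    for pos in range(len(native)):
        for aa in AMINO_ACIDS:
            # Replace exactly one position; all other residues stay native.
            peptides.append((pos, aa, native[:pos] + aa + native[pos + 1:]))
    return peptides


scan = substitution_scan("SAPDTR")
print(len(scan))                       # 120 (6 positions x 20 residues)
print(len({p for _, _, p in scan}))    # 115 distinct sequences
```

    The distinct count is lower than 120 because the native peptide SAPDTR is regenerated once per position (6 x 19 variants plus 1 native sequence).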

  14. Nucleotide sequence variation of the envelope protein gene identifies two distinct genotypes of yellow fever virus.

    PubMed Central

    Chang, G J; Cropp, B C; Kinney, R M; Trent, D W; Gubler, D J

    1995-01-01

    The evolution of yellow fever virus over 67 years was investigated by comparing the nucleotide sequences of the envelope (E) protein genes of 20 viruses isolated in Africa, the Caribbean, and South America. Uniformly weighted parsimony algorithm analysis defined two major evolutionary yellow fever virus lineages designated E genotypes I and II. E genotype I contained viruses isolated from East and Central Africa. E genotype II viruses were divided into two sublineages: IIA viruses from West Africa and IIB viruses from America, except for a 1979 virus isolated from Trinidad (TRINID79A). Unique signature patterns were identified at 111 nucleotide and 12 amino acid positions within the yellow fever virus E gene by signature pattern analysis. Yellow fever viruses from East and Central Africa contained unique signatures at 60 nucleotide and five amino acid positions, those from West Africa contained unique signatures at 25 nucleotide and two amino acid positions, and viruses from America contained such signatures at 30 nucleotide and five amino acid positions in the E gene. The dissemination of yellow fever viruses from Africa to the Americas is supported by the close genetic relatedness of genotype IIA and IIB viruses and genetic evidence of a possible second introduction of yellow fever virus from West Africa, as illustrated by the TRINID79A virus isolate. The E protein genes of American IIB yellow fever viruses had higher frequencies of amino acid substitutions than did genes of yellow fever viruses of genotypes I and IIA on the basis of comparisons with a consensus amino acid sequence for the yellow fever E gene. The great variation in the E proteins of American yellow fever virus probably results from positive selection imposed by virus interaction with different species of mosquitoes or nonhuman primates in the Americas. PMID:7637022

  15. Nullomers and High Order Nullomers in Genomic Sequences

    PubMed Central

    Vergni, Davide; Santoni, Daniele

    2016-01-01

    A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers, asking whether their absence in genomes has just a statistical explanation or is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied different aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless antithetical results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built on eleven species based on both the similarities between the dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally the study of mean helical rise of nullomer sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. The obtained results show that nullomers are the consequence of the peculiar structure of DNA (also including biased CpG frequency and CpG islands), so that the hypermutability model, also taking into account CpG islands, seems not to be sufficient to explain the nullomer phenomenon.
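
    The two definitions above can be made concrete with a small Python sketch. It is a brute-force illustration for short k, not the authors' method, and it makes two assumptions that the abstract does not spell out: only the given strand is scanned (no reverse complement), and a "mutated sequence" means a single-base substitution. The function names are hypothetical.

```python
from itertools import product


def nullomers(seq: str, k: int) -> set:
    """All DNA words of length k that never occur in seq."""
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    every_word = {"".join(p) for p in product("ACGT", repeat=k)}
    return every_word - present


def high_order_nullomers(seq: str, k: int) -> set:
    """Nullomers whose single-substitution neighbours are all nullomers too."""
    absent = nullomers(seq, k)
    result = set()
    for word in absent:
        neighbours = (word[:i] + b + word[i + 1:]
                      for i in range(k) for b in "ACGT" if b != word[i])
        if all(n in absent for n in neighbours):
            result.add(word)
    return result


seq = "ACGTACGTTTGCA"
print(sorted(nullomers(seq, 2)))
# ['AA', 'AG', 'AT', 'CC', 'CT', 'GA', 'GG', 'TC']
print(len(high_order_nullomers("AAAA", 2)))  # 9
```

    In the second call only AA is present, so a word is a high order nullomer exactly when no single substitution can turn it into AA, i.e. when neither of its bases is A (3 x 3 = 9 words). Enumerating all 4^k words is only practical for small k; genome-scale analyses would need a different data structure.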

  16. Alignment of 700 globin sequences: extent of amino acid substitution and its correlation with variation in volume.

    PubMed Central

    Kapp, O. H.; Moens, L.; Vanfleteren, J.; Trotman, C. N.; Suzuki, T.; Vinogradov, S. N.

    1995-01-01

    Seven hundred globin sequences, including 146 nonvertebrate sequences, were aligned on the basis of conservation of secondary structure and the avoidance of gap penalties. Of the 182 positions needed to accommodate all the globin sequences, only 84 are common to all, including the absolutely conserved PheCD1 and HisF8. The mean number of amino acid substitutions per position ranges from 8 to 13 for all globins and 5 to 9 for internal positions. Although the total sequence volumes have a variation of approximately 2-3%, the variation in volume per position ranges from approximately 13% for the internal to approximately 21% for the surface positions. Plausible correlations exist between amino acid substitution and the variation in volume per position for the 84 common and the internal but not the surface positions. The amino acid substitution matrix derived from the 84 common positions was used to evaluate sequence similarity within the globins and between the globins and phycocyanins C and colicins A, via calculation of pairwise similarity scores. The scores for globin-globin comparisons over the 84 common positions overlap the globin-phycocyanin and globin-colicin scores, with the former being intermediate. For the subset of internal positions, overlap is minimal between the three groups of scores. These results imply a continuum of amino acid sequences able to assume the common three-on-three alpha-helical structure and suggest that the determinants of the latter include sites other than those inaccessible to solvent. PMID:8535255

  17. Sequence variation of the 16S to 23S rRNA spacer region in Salmonella enterica.

    PubMed

    Christensen, H; Møller, P L; Vogensen, F K; Olsen, J E

    2000-01-01

    The possibility for identification of Salmonella enterica serotypes by sequence analysis of the 16S to 23S rRNA internal transcribed spacer was investigated by direct sequencing of polymerase chain reaction-amplified DNA from all operons simultaneously in a collection of 25 strains of 18 different serotypes of S. enterica, and by sequencing individual cloned operons from a single strain. It was only possible to determine the first 117 bases upstream from the 23S rRNA gene by direct sequencing because of variation between the rrn operons. Comparison of sequences from this region allowed separation of only 15 out of the 18 serotypes investigated and was not specific even at the subspecies level of S. enterica. To determine the differences between internal transcribed spacers in more detail, the individual rrn operons of strain JEO 197, serotype IV 43:z4,z23:-, were cloned and sequenced. The strain contained four short internal transcribed spacer fragments of 382-384 bases in length, which were 98.4-99.7% similar to each other and three long fragments of 505 bases with 98.0-99.8% similarity. The study demonstrated a higher degree of interbacterial variation than intrabacterial variation between operons for serotypes of S. enterica.

  18. Sequence variation in mitochondrial cox1 and nad1 genes of ascaridoid nematodes in cats and dogs from Iran.

    PubMed

    Mikaeili, F; Mirhendi, H; Mohebali, M; Hosseini, M; Sharbatkhori, M; Zarei, Z; Kia, E B

    2015-07-01

    The study was conducted to determine the sequence variation in two mitochondrial genes, namely cytochrome c oxidase 1 (pcox1) and NADH dehydrogenase 1 (pnad1) within and among isolates of Toxocara cati, Toxocara canis and Toxascaris leonina. Genomic DNA was extracted from 32 isolates of T. cati, 9 isolates of T. canis and 19 isolates of T. leonina collected from cats and dogs in different geographical areas of Iran. Mitochondrial genes were amplified by polymerase chain reaction (PCR) and sequenced. Sequence data were aligned using the BioEdit software and compared with published sequences in GenBank. Phylogenetic analysis was performed using Bayesian inference and maximum likelihood methods. Based on pairwise comparison, intra-species genetic diversity within Iranian isolates of T. cati, T. canis and T. leonina amounted to 0-2.3%, 0-1.3% and 0-1.0% for pcox1 and 0-2.0%, 0-1.7% and 0-2.6% for pnad1, respectively. Inter-species sequence variation among the three ascaridoid nematodes was significantly higher, being 9.5-16.6% for pcox1 and 11.9-26.7% for pnad1. Sequence and phylogenetic analysis of the pcox1 and pnad1 genes indicated that there is significant genetic diversity within and among isolates of T. cati, T. canis and T. leonina from different areas of Iran, and these genes can be used for studying genetic variation of ascaridoid nematodes.
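
    The pairwise percentages quoted above are differences between aligned sequences. The sketch below shows one simple way such a figure can be computed, as an uncorrected percent difference that skips gapped columns. It is not the authors' BioEdit workflow, and the function name percent_difference and the toy sequences are hypothetical.

```python
def percent_difference(a: str, b: str) -> float:
    """Uncorrected pairwise difference (%) between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    mismatches = sum(x != y for x, y in compared)
    return 100.0 * mismatches / len(compared)


# Toy aligned fragments (not real pcox1 or pnad1 data): 2 mismatches in 10 columns.
print(percent_difference("ATGCATGCAT", "ATGAATGCTT"))  # 20.0
```

    Applying such a function to every pair of isolates within a species gives an intra-species range, and to pairs from different species an inter-species range, which is how ranges like those in the abstract are typically summarised.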

  19. Comprehensive analysis of genetic variations in strictly-defined Leber congenital amaurosis with whole-exome sequencing in Chinese

    PubMed Central

    Wang, Shi-Yuan; Zhang, Qi; Zhang, Xiang; Zhao, Pei-Quan

    2016-01-01

    AIM To make a comprehensive analysis of the potential pathogenic genes related to Leber congenital amaurosis (LCA) in Chinese patients. METHODS LCA subjects and their families were retrospectively collected from 2013 to 2015. First, whole-exome sequencing was performed in patients who had undergone gene mutation screening with nothing found; homozygous sites were then selected, candidate sites were annotated, and pathogenicity analysis was conducted using software including Sorting Tolerant from Intolerant (SIFT), Polyphen-2, Mutation assessor, Condel, and Functional Analysis through Hidden Markov Models (FATHMM). Furthermore, Gene Ontology function and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses of pathogenic genes were performed, followed by co-segregation analysis using Fisher's exact test. Sanger sequencing was used to validate single-nucleotide variations (SNVs). Expanded verification was performed in the remaining patients. RESULTS In total, 51 LCA families with 53 patients and 24 family members were recruited. A total of 104 SNVs (66 LCA-related genes and 15 co-segregated genes) were submitted for expanded verification. Homozygous mutations of KRT12 and CYP1A1 were simultaneously observed in 3 families. Enrichment analysis showed that the potential pathogenic genes were mainly enriched in functions related to cell adhesion, biological adhesion, retinoid metabolic process, and eye development. Additionally, WFS1 and STAU2 had the highest homozygous frequencies. CONCLUSION LCA is a highly heterogeneous disease. Mutations in KRT12, CYP1A1, WFS1, and STAU2 may be involved in the development of LCA. PMID:27672588

  20. Exome Sequence Analysis of 14 Families With High Myopia

    PubMed Central

    Kloss, Bethany A.; Tompson, Stuart W.; Whisenhunt, Kristina N.; Quow, Krystina L.; Huang, Samuel J.; Pavelec, Derek M.; Rosenberg, Thomas; Young, Terri L.

    2017-01-01

    Purpose To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Methods Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sanger sequencing was used to confirm variants in original DNA, and to test for disease cosegregation in additional family members. Candidate genes and chromosomal loci previously associated with myopic refractive error and its endophenotypes were comprehensively screened. Results In 14 high myopia families, we identified 73 rare and 31 novel gene variants as candidates for pathogenicity. In seven of these families, two of the novel and eight of the rare variants were within known myopia loci. A total of 104 heterozygous nonsynonymous rare variants in 104 genes were identified in 10 out of 14 probands. Each variant cosegregated with affection status. No rare variants were identified in genes known to cause myopia or in genes closest to published genome-wide association study association signals for refractive error or its endophenotypes. Conclusions Whole exome sequencing was performed to determine gene variants implicated in the pathogenesis of AD high myopia. This study provides new genes for consideration in the pathogenesis of high myopia, and may aid in the development of genetic profiling of those at greatest risk for attendant ocular morbidities of this disorder. PMID:28384719

  1. Metadata-driven comparative analysis tool for sequences (meta-CATS): an automated process for identifying significant sequence variations that correlate with virus attributes.

    PubMed

    Pickett, B E; Liu, M; Sadat, E L; Squires, R B; Noronha, J M; He, S; Jen, W; Zaremba, S; Gu, Z; Zhou, L; Larsen, C N; Bosch, I; Gehrke, L; McGee, M; Klem, E B; Scheuermann, R H

    2013-12-01

    The Virus Pathogen Resource (ViPR; www.viprbrc.org) and Influenza Research Database (IRD; www.fludb.org) have developed a metadata-driven Comparative Analysis Tool for Sequences (meta-CATS), which performs statistical comparative analyses of nucleotide and amino acid sequence data to identify correlations between sequence variations and virus attributes (metadata). Meta-CATS guides users through: selecting a set of nucleotide or protein sequences; dividing them into multiple groups based on any associated metadata attribute (e.g. isolation location, host species); performing a statistical test at each aligned position; and identifying all residues that significantly differ between the groups. As proofs of concept, we have used meta-CATS to identify sequence biomarkers associated with dengue viruses isolated from different hemispheres, and to identify variations in the NS1 protein that are unique to each of the 4 dengue serotypes. Meta-CATS is made freely available to virology researchers to identify genotype-phenotype correlations for development of improved vaccines, diagnostics, and therapeutics.
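    The workflow described above (group sequences by a metadata attribute, test each aligned position, report positions that differ between groups) can be sketched as follows. The abstract does not say which statistical test meta-CATS applies, so the Pearson chi-square statistic used here is an assumption, and the function name and toy sequences are hypothetical.

```python
from collections import Counter


def chi_square_by_position(groups):
    """Pearson chi-square statistic per variable alignment column.

    groups: dict mapping a group label (e.g. a metadata value) to a list of
            aligned sequences of equal length.
    Returns a list of (position, statistic, degrees_of_freedom).
    """
    names = list(groups)
    length = len(groups[names[0]][0])
    results = []
    for pos in range(length):
        # Residue counts at this column, one Counter per group
        counts = {g: Counter(seq[pos] for seq in groups[g]) for g in names}
        residues = sorted(set().union(*counts.values()))
        if len(residues) < 2:
            continue  # invariant column: nothing to test
        total = sum(sum(c.values()) for c in counts.values())
        stat = 0.0
        for g in names:
            row_total = sum(counts[g].values())
            for r in residues:
                col_total = sum(counts[h][r] for h in names)
                expected = row_total * col_total / total
                stat += (counts[g][r] - expected) ** 2 / expected
        dof = (len(names) - 1) * (len(residues) - 1)
        results.append((pos, stat, dof))
    return results


# Toy alignment (hypothetical): one variable column at index 1
groups = {
    "north": ["MKT", "MKT", "MKT", "MRT"],
    "south": ["MRT", "MRT", "MRT", "MRT"],
}
results = chi_square_by_position(groups)
print(results)
```

    Turning each statistic into a p-value requires the chi-square distribution from a statistics library, and with counts as small as in this toy example the chi-square approximation is unreliable; the sketch only shows the per-column bookkeeping.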

  2. Diversity and Variation of Bacterial Community Revealed by MiSeq Sequencing in Chinese Dark Teas

    PubMed Central

    Fu, Jianyu; Lv, Haipeng; Chen, Feng

    2016-01-01

    Chinese dark teas (CDTs) are now among the popular tea beverages worldwide due to their unique health benefits. Because the production of CDTs involves fermentation that is characterized by the effect of microbes, microorganisms are believed to play critical roles in the determination of the chemical characteristics of CDTs. Some dominant fungi have been identified from CDTs. In contrast, little, if anything, is known about the composition of bacterial community in CDTs. This study was set to investigate the diversity and variation of bacterial community in four major types of CDTs from China. First, the composition of the bacterial community of CDTs was determined using MiSeq sequencing. From the four typical CDTs, a total of 238 genera that belong to 128 families of bacteria were detected, including most of the families of beneficial bacteria known to be associated with fermented food. While different types of CDTs had generally distinct bacterial structures, the two types of brick teas produced from adjacent regions displayed strong similarity in bacterial composition, suggesting that the producing environment and processing condition perhaps together influence bacterial succession in CDTs. The global characterization of bacterial communities in CDTs is an essential first step for us to understand their function in fermentation and their potential impact on human health. Such knowledge will be important guidance for improving the production of CDTs with higher quality and elevated health benefits. PMID:27690376

  3. Structural variation discovery in the cancer genome using next generation sequencing: Computational solutions and perspectives

    PubMed Central

    Liu, Biao; Conroy, Jeffrey M.; Morrison, Carl D.; Odunsi, Adekunle O.; Qin, Maochun; Wei, Lei; Trump, Donald L.; Johnson, Candace S.; Liu, Song; Wang, Jianmin

    2015-01-01

    Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by a NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide of relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome. PMID:25849937

  4. An improved high throughput sequencing method for studying oomycete communities.

    PubMed

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-03-01

    Culture-independent studies using next generation sequencing have revolutionized microbial ecology; however, oomycete ecology in soils is severely lagging behind. The aim of this study was to improve and validate standard techniques for using high throughput sequencing as a tool for studying oomycete communities. The well-known primer sets ITS4, ITS6 and ITS7 were used in the study in a semi-nested PCR approach to target the internal transcribed spacer (ITS) 1 of ribosomal DNA in a next generation sequencing protocol. These primers have been used in similar studies before, but with limited success. We were able to increase the proportion of retrieved oomycete sequences dramatically, mainly by increasing the annealing temperature during PCR. The optimized protocol was validated using three mock communities and the method was further evaluated using total DNA from 26 soil samples collected from different agricultural fields in Denmark, and 11 samples from carrot tissue with symptoms of Pythium infection. Sequence data from the Pythium and Phytophthora mock communities showed that our strategy successfully detected all included species. Taxonomic assignments of OTUs from the 26 soil samples showed that 95% of the sequences could be assigned to oomycetes including Pythium, Aphanomyces, Peronospora, Saprolegnia and Phytophthora. A high proportion of oomycete reads was consistently present in all 26 soil samples, showing the versatility of the strategy. A large diversity of Pythium species, including pathogenic and saprophytic species, dominated in cultivated soil. Finally, we analyzed amplicons from carrots with symptoms of cavity spot. This resulted in 94% of the reads belonging to oomycetes with a dominance of species of Pythium that are known to be involved in causing cavity spot, thus demonstrating the usefulness of the method not only in soil DNA but also in a plant DNA background. In conclusion, we demonstrate a successful approach for pyrosequencing of oomycete

  5. Binary interactions with high accretion rates onto main sequence stars

    NASA Astrophysics Data System (ADS)

    Shiber, Sagiv; Schreier, Ron; Soker, Noam

    2016-07-01

    Energetic outflows from main sequence stars accreting mass at very high rates might account for the powering of some eruptive objects, such as merging main sequence stars, major eruptions of luminous blue variables, e.g., the Great Eruption of Eta Carinae, and other intermediate luminosity optical transients (ILOTs; red novae; red transients). These powerful outflows could potentially also supply the extra energy required in the common envelope process and in the grazing envelope evolution of binary systems. We propose that a massive outflow/jets mediated by magnetic fields might remove energy and angular momentum from the accretion disk to allow such high accretion rate flows. By examining the possible activity of the magnetic fields of accretion disks, we conclude that indeed main sequence stars might accrete mass at very high rates, up to ≈ 10⁻² M⊙ yr⁻¹ for solar type stars, and up to ≈ 1 M⊙ yr⁻¹ for very massive stars. We speculate that magnetic fields amplified in such extreme conditions might lead to the formation of massive bipolar outflows that can remove most of the disk's energy and angular momentum. It is this energy and angular momentum removal that allows the very high mass accretion rate onto main sequence stars.
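    To give a sense of scale for the accretion rates quoted above, the gravitational power released by accretion can be estimated with the textbook relation L ≈ G M Ṁ / R. The abstract gives no formula, so this is an order-of-magnitude sketch and not the authors' calculation; the constants are rounded standard values and the function name is hypothetical.

```python
# Rounded physical constants (SI units)
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
R_SUN = 6.957e8    # solar radius, m
L_SUN = 3.828e26   # solar luminosity, W
YEAR = 3.156e7     # one year, s


def accretion_power(mass_msun, radius_rsun, mdot_msun_per_yr):
    """Order-of-magnitude accretion power L = G * M * Mdot / R, in watts."""
    mass = mass_msun * M_SUN
    radius = radius_rsun * R_SUN
    mdot = mdot_msun_per_yr * M_SUN / YEAR  # convert to kg per second
    return G * mass * mdot / radius


# Solar-type star accreting at the upper rate quoted in the abstract
power = accretion_power(1.0, 1.0, 1e-2)
print(f"{power:.2e} W = {power / L_SUN:.1e} L_sun")
```

    For a solar-type star this comes out at a few times 10⁵ solar luminosities, which illustrates why such accretion episodes are discussed as power sources for luminous transients.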

  6. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Athavale, Ajay [Monsanto]

    2016-07-12

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  7. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Athavale, Ajay

    2012-06-01

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  8. Variation in the Kozak sequence of WNT16 results in an increased translation and is associated with osteoporosis related parameters.

    PubMed

    Hendrickx, Gretl; Boudin, Eveline; Fijałkowski, Igor; Nielsen, Torben Leo; Andersen, Marianne; Brixen, Kim; Van Hul, Wim

    2014-02-01

    The importance of WNT16 in the regulation of bone metabolism was recently confirmed by several genome-wide association studies and by a Wnt16 (Wnt16(-/-)) knockout mouse model. The aim of this study was thus to replicate and further elucidate the effect of common genetic variation in WNT16 on osteoporosis related parameters. Hereto, we performed a WNT16 candidate gene association study in a population of healthy Caucasian men from the Odense Androgen Study (OAS). Using HapMap, five tagSNPs and one multimarker test were selected for genotyping to cover most of the common genetic variation in and around WNT16 (MAF>5%). This study confirmed previously reported associations for rs3801387 and rs2707466 with bone mineral density (BMD) at several sites. Furthermore, we additionally demonstrated that rs2908007 is strongly associated with BMD at several sites in the young, elderly and complete OAS population. The observed effect of these three associated SNPs on the respective phenotypes is comparable and we can conclude that the presence of the minor allele results in an increase in BMD. Additionally, we performed re-sequencing of WNT16 on two cohorts selected from the young OAS cohort, based on their extreme BMD values. On this basis, rs55710688 was selected for an in vitro translation experiment since it is located in the Kozak sequence of WNT16a. We observed an increased translation efficiency and thus a higher amount of WNT16a for the Kozak sequence that was significantly more prevalent in the high BMD cohort. This observation is in line with the results of the Wnt16(-/-) mice. Finally, a WNT luciferase reporter assay was performed and showed no activation of the β-catenin dependent pathway by Wnt16. We did detect a dose-dependent inhibitory effect of Wnt16 on WNT1 activation of this canonical WNT pathway. Increased translation of WNT16 can thus lead to an increased inhibitory action of WNT16 on canonical WNT signaling. This statement is in contrast with the known

  9. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    PubMed Central

    Rehm, Charlotte; Wurmthaler, Lena A.; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S.

    2015-01-01

    In prokaryotes, simple sequence repeats (SSRs) with unit sizes of 1–5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6–9 nt are rare. In particular, G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4-forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms, repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed, with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria. PMID:26695179
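    Locating tandem runs of the GGGNATC unit and counting the units per run can be sketched with a regular expression. This is an illustration only: it scans a single strand, requires perfect units, treats N as any of A, C, G or T, and uses a made-up toy sequence; the study's actual search procedure is not described in the abstract.

```python
import re

UNIT = "GGG[ACGT]ATC"      # the 7-nt unit GGGNATC, with N = any base
UNIT_LENGTH = 7
# Two or more perfect units in direct succession
TANDEM = re.compile(rf"(?:{UNIT}){{2,}}")


def find_repeat_tracts(sequence):
    """Return (start, n_units) for each run of >= 2 consecutive GGGNATC units."""
    tracts = []
    for match in TANDEM.finditer(sequence.upper()):
        n_units = (match.end() - match.start()) // UNIT_LENGTH
        tracts.append((match.start(), n_units))
    return tracts


# Toy sequence (hypothetical): a 4-unit tract and a 2-unit tract
toy = "TTA" + "GGGAATC" * 4 + "CCGT" + "GGGTATCGGGCATC" + "AA"
tracts = find_repeat_tracts(toy)
print(tracts)
```

    A four-unit tract supplies the four consecutive G-tracts that the abstract links to G4 formation. Covering both strands would additionally require scanning the reverse complement.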

  10. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    PubMed

    Rehm, Charlotte; Wurmthaler, Lena A; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S

    2015-01-01

    In prokaryotes, simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular, G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4-forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms, repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed, with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.

  11. Optical transitions in highly charged californium ions with high sensitivity to variation of the fine-structure constant.

    PubMed

    Berengut, J C; Dzuba, V A; Flambaum, V V; Ong, A

    2012-08-17

    We study electronic transitions in highly charged Cf ions that are within the frequency range of optical lasers and have very high sensitivity to potential variations in the fine-structure constant, α. The transitions are in the optical range despite the large ionization energies because they lie on the level crossing of the 5f and 6p valence orbitals in the thallium isoelectronic sequence. Cf(16+) is a particularly rich ion, having several narrow lines with properties that minimize certain systematic effects. Cf(16+) has very large nuclear charge and large ionization energy, resulting in the largest α sensitivity seen in atomic systems. The lines include positive and negative shifters.

  12. Optical Transitions in Highly Charged Californium Ions with High Sensitivity to Variation of the Fine-Structure Constant

    NASA Astrophysics Data System (ADS)

    Berengut, J. C.; Dzuba, V. A.; Flambaum, V. V.; Ong, A.

    2012-08-01

    We study electronic transitions in highly charged Cf ions that are within the frequency range of optical lasers and have very high sensitivity to potential variations in the fine-structure constant, α. The transitions are in the optical range despite the large ionization energies because they lie on the level crossing of the 5f and 6p valence orbitals in the thallium isoelectronic sequence. Cf(16+) is a particularly rich ion, having several narrow lines with properties that minimize certain systematic effects. Cf(16+) has very large nuclear charge and large ionization energy, resulting in the largest α sensitivity seen in atomic systems. The lines include positive and negative shifters.

  13. De Novo Structure Prediction of Globular Proteins Aided by Sequence Variation-Derived Contacts

    PubMed Central

    Kosciolek, Tomasz; Jones, David T.

    2014-01-01

    The advent of high accuracy residue-residue intra-protein contact prediction methods enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm, FRAGFOLD, with PSICOV, a contact prediction method which uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 diverse globular target proteins, up to 266 amino acids in length, we are able to address the effectiveness and some limitations of such approaches to globular proteins in practice. Overall we find that using fragment assembly with both statistical potentials and predicted contacts is significantly better than either statistical potentials or contacts alone. Results show up to nearly 80% of correct predictions (TM-score ≥0.5) within the analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases emerged either from conformational sampling problems or from insufficient contact prediction accuracy. Nevertheless, a strong dependency of the quality of final models on the fraction of satisfied predicted long-range contacts was observed. This not only highlights the importance of these contacts in determining the protein fold, but also (combined with other ensemble-derived qualities) provides a powerful guide as to the choice of correct models and the global quality of the selected model. A proposed quality assessment scoring function achieves 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys. These findings suggest the approach is well-suited for blind predictions on a variety of globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step. PMID:24637808
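    The "fraction of satisfied predicted long-range contacts" mentioned above can be sketched as follows. The abstract does not define the distance cutoff or the sequence separation, so the 8 Å cutoff and the separation of at least 24 residues used as defaults here are common conventions taken as assumptions, and the function name and toy coordinates are hypothetical.

```python
import math


def fraction_satisfied(coords, contacts, cutoff=8.0, min_separation=24):
    """Fraction of predicted long-range contacts that a model satisfies.

    coords:   list of (x, y, z) positions, one representative atom per residue
    contacts: list of (i, j) residue index pairs predicted to be in contact
    cutoff:   distance in angstroms below which a contact counts as satisfied
    min_separation: minimum |i - j| for a contact to count as long-range
    Returns None when no predicted contact is long-range.
    """
    long_range = [(i, j) for i, j in contacts if abs(i - j) >= min_separation]
    if not long_range:
        return None
    satisfied = sum(
        1 for i, j in long_range if math.dist(coords[i], coords[j]) < cutoff
    )
    return satisfied / len(long_range)


# Toy model (hypothetical): five residues; separation lowered so pairs qualify
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]
predicted = [(0, 4), (0, 3), (1, 4)]
fraction = fraction_satisfied(coords, predicted, min_separation=3)
print(fraction)
```

    A score of this kind can be computed for every decoy in an ensemble, which is the sort of per-model quantity the abstract describes as a guide to choosing correct models.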

  14. De novo structure prediction of globular proteins aided by sequence variation-derived contacts.

    PubMed

    Kosciolek, Tomasz; Jones, David T

    2014-01-01

    The advent of high accuracy residue-residue intra-protein contact prediction methods enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm, FRAGFOLD, with PSICOV, a contact prediction method which uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 diverse globular target proteins, up to 266 amino acids in length, we are able to address the effectiveness and some limitations of such approaches to globular proteins in practice. Overall we find that using fragment assembly with both statistical potentials and predicted contacts is significantly better than either statistical potentials or contacts alone. Results show up to nearly 80% of correct predictions (TM-score ≥0.5) within the analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases emerged either from conformational sampling problems or from insufficient contact prediction accuracy. Nevertheless, a strong dependency of the quality of final models on the fraction of satisfied predicted long-range contacts was observed. This not only highlights the importance of these contacts in determining the protein fold, but also (combined with other ensemble-derived qualities) provides a powerful guide as to the choice of correct models and the global quality of the selected model. A proposed quality assessment scoring function achieves 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys. These findings suggest the approach is well-suited for blind predictions on a variety of globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step.

  15. Epilepsy-causing sequence variations in SIK1 disrupt synaptic activity response gene expression and affect neuronal morphology.

    PubMed

    Pröschel, Christoph; Hansen, Jeanne N; Ali, Adil; Tuttle, Emily; Lacagnina, Michelle; Buscaglia, Georgia; Halterman, Marc W; Paciorkowski, Alex R

    2017-02-01

    SIK1 syndrome is a newly described developmental epilepsy disorder caused by heterozygous mutations in the salt-inducible kinase SIK1. To better understand the pathophysiology of SIK1 syndrome, we studied the effects of SIK1 pathogenic sequence variations in human neurons. Primary human fetal cortical neurons were transfected with a lentiviral vector to overexpress wild-type and mutant SIK1 protein. We evaluated the transcriptional activity of known downstream gene targets in neurons expressing mutant SIK1 compared with wild type. We then assayed neuronal morphology by measuring neurite length, number and branching. Truncating SIK1 sequence variations were associated with abnormal MEF2C transcriptional activity and decreased MEF2C protein levels. Epilepsy-causing SIK1 sequence variations were associated with significantly decreased expression of ARC (activity-regulated cytoskeletal-associated) and other synaptic activity response element genes. Assay of mRNA levels for the other MEF2C target genes NR4A1 (Nur77) and NRG1 found significantly decreased expression of these genes as well. The missense p.(Pro287Thr) SIK1 sequence variation was associated with abnormal neuronal morphology, with significant decreases in mean neurite length and mean number of neurites and a significant increase in proximal branches compared with wild type. Epilepsy-causing SIK1 sequence variations resulted in abnormalities in the MEF2C-ARC pathway of neuronal development and synapse activity response. This work provides the first insights into the mechanisms of pathogenesis in SIK1 syndrome, and extends the ARX-MEF2C pathway in the pathogenesis of developmental epilepsy.

  16. Color differences among feral pigeons (Columba livia) are not attributable to sequence variation in the coding region of the melanocortin-1 receptor gene (MC1R)

    PubMed Central

    2013-01-01

    Background Genetic variation at the melanocortin-1 receptor (MC1R) gene is correlated with melanin color variation in many birds. Feral pigeons (Columba livia) show two major melanin-based colorations: a red coloration due to pheomelanic pigment and a black coloration due to eumelanic pigment. Furthermore, within each color type, feral pigeons display continuous variation in the amount of melanin pigment present in the feathers, with individuals varying from pure white to a full dark melanic color. Coloration is highly heritable and it has been suggested that it is under natural or sexual selection, or both. Our objective was to investigate whether MC1R allelic variants are associated with plumage color in feral pigeons. Findings We sequenced 888 bp of the coding sequence of MC1R among pigeons varying both in the type of melanin, eumelanin or pheomelanin, and in the amount of melanin in their feathers. We detected 10 non-synonymous substitutions and 2 synonymous substitutions, but none of them was associated with a plumage type. It remains possible that non-synonymous substitutions that influence coloration are present in the short MC1R fragment that we did not sequence, but this seems unlikely because we analyzed the entire functionally important region of the gene. Conclusions Our results show that color differences among feral pigeons are probably not attributable to amino acid variation at the MC1R locus. Therefore, variation in regulatory regions of MC1R or variation in other genes may be responsible for the color polymorphism of feral pigeons. PMID:23915680

  17. Next-generation sequencing: big data meets high performance computing.

    PubMed

    Schmidt, Bertil; Hildebrandt, Andreas

    2017-02-02

    The progress of next-generation sequencing has a major impact on medical and genomic research. This high-throughput technology can now produce billions of short DNA or RNA fragments in excess of a few terabytes of data in a single run. This leads to massive datasets used by a wide range of applications including personalized cancer treatment and precision medicine. In addition to the hugely increased throughput, the cost of using high-throughput technologies has been dramatically decreasing. A low sequencing cost of around US$1000 per genome has now rendered large population-scale projects feasible. However, to make effective use of the produced data, the design of big data algorithms and their efficient implementation on modern high performance computing systems is required.

  18. De novo sequencing of highly modified therapeutic oligonucleotides by hydrophobic tag sequencing coupled with LC-MS.

    PubMed

    Goto, R; Miyakawa, S; Inomata, E; Takami, T; Yamaura, J; Nakamura, Y

    2017-02-01

    Correct sequences are prerequisite for quality control of therapeutic oligonucleotides. However, there is no definitive method available for determining sequences of highly modified therapeutic RNAs, and thereby, most of the oligonucleotides have been used clinically without direct sequence determination. In this study, we developed a novel sequencing method called 'hydrophobic tag sequencing'. Highly modified oligonucleotides are sequenced by partially digesting oligonucleotides conjugated with a 5'-hydrophobic tag, followed by liquid chromatography-mass spectrometry analysis. 5'-Hydrophobic tag-printed fragments (5'-tag degradates) can be separated in order of their molecular masses from tag-free oligonucleotides by reversed-phase liquid chromatography. As models for the sequencing, the anti-VEGF aptamer (Macugen) and the highly modified 38-mer RNA sequences were analyzed under blind conditions. Most nucleotides were identified from the molecular weight of hydrophobic 5'-tag degradates calculated from monoisotopic mass in simple full mass data. When monoisotopic mass could not be assigned, the nucleotide was estimated using the molecular weight of the most abundant mass. The sequences of Macugen and 38-mer RNA perfectly matched the theoretical sequences. The hydrophobic tag sequencing worked well to obtain simple full mass data, resulting in accurate and clear sequencing. The present study provides for the first time a de novo sequencing technology for highly modified RNAs and contributes to quality control of therapeutic oligonucleotides. Copyright © 2016 John Wiley & Sons, Ltd.
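    The idea of reading a sequence from a mass ladder of 5'-tagged fragments can be sketched as follows: sort the fragment masses and assign a nucleotide to each step from the mass difference between neighbours. This is a simplified illustration, not the published procedure. The residue masses are approximate monoisotopic values for unmodified RNA nucleotides (monophosphate minus water), whereas the study deals with highly modified residues that would need their own mass table; the tag mass, tolerance and function name are hypothetical.

```python
# Approximate monoisotopic residue masses (Da) for unmodified RNA nucleotides.
# Assumption for illustration: modified residues would need additional entries.
RESIDUE_MASS = {
    "A": 329.0525,
    "C": 305.0413,
    "G": 345.0474,
    "U": 306.0253,
}


def read_ladder(fragment_masses, tolerance=0.01):
    """Infer a sequence from differences between successive ladder masses.

    fragment_masses: masses of tagged fragments, each one residue longer than
                     the previous once sorted (the shortest is the bare tag).
    Steps matching no residue, or more than one, are reported as '?'.
    """
    ordered = sorted(fragment_masses)
    sequence = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        hits = [
            base
            for base, mass in RESIDUE_MASS.items()
            if abs(mass - delta) <= tolerance
        ]
        sequence.append(hits[0] if len(hits) == 1 else "?")
    return "".join(sequence)


# Toy ladder built from a known sequence and a hypothetical tag mass
tag_mass = 500.0
ladder = [tag_mass]
for base in "GAUC":
    ladder.append(ladder[-1] + RESIDUE_MASS[base])

decoded = read_ladder(ladder)
print(decoded)
```

    C and U differ by roughly 1 Da, which is why the abstract's emphasis on monoisotopic mass assignment matters for telling such residues apart.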

  19. Cross-amplification and sequence variation of microsatellite loci in Eurasian hard pines.

    PubMed

    González-Martínez, S C; Robledo-Arnuncio, J J; Collada, C; Díaz, A; Williams, C G; Alía, R; Cervera, M T

    2004-06-01

    Microsatellite transfer across coniferous species is a valued methodology because de novo development for each species is costly and there are many species with only a limited commodity value. Cross-species amplification of orthologous microsatellite regions provides valuable information on mutational and evolutionary processes affecting these loci. We tested 19 nuclear microsatellite markers from Pinus taeda L. (subsection Australes) and three from P. sylvestris L. (subsection Pinus) on seven Eurasian hard pine species (P. uncinata Ram., P. sylvestris L., P. nigra Arn., P. pinaster Ait., P. halepensis Mill., P. pinea L. and P. canariensis Sm.). Transfer rates to species in subsection Pinus (36-59%) were slightly higher than those to subsections Pineae and Pinaster (32-45%). Half of the trans-specific microsatellites were found to be polymorphic over evolutionary times of approximately 100 million years (ten million generations). Sequencing of three trans-specific microsatellites showed conserved repeat and flanking regions. Both a decrease in the number of perfect repeats in the non-focal species and a polarity for mutation, the latter defined as a higher substitution rate in the flanking sequence regions close to the repeat motifs, were observed in the trans-specific microsatellites. The transfer of microsatellites among hard pine species proved to be useful for obtaining highly polymorphic markers in a wide range of species, thereby providing new tools for population and quantitative genetic studies.

  20. Transcriptome-wide comparison of sequence variation in divergent ecotypes of kokanee salmon

    PubMed Central

    2013-01-01

    Background: High throughput next-generation sequencing technology has enabled the collection of genome-wide sequence data and revolutionized single nucleotide polymorphism (SNP) discovery in a broad range of species. When analyzed within a population genomics framework, SNP-based genotypic data may be used to investigate questions of evolutionary, ecological, and conservation significance in natural populations of non-model organisms. Kokanee salmon are recently diverged freshwater populations of sockeye salmon (Oncorhynchus nerka) that exhibit reproductive ecotypes (stream-spawning and shore-spawning) in lakes throughout western North America and northeast Asia. Current conservation and management strategies may treat these ecotypes as discrete stocks, however their recent divergence and low levels of gene flow make in-season genetic stock identification a challenge. The development of genome-wide SNP markers is an essential step towards fine-scale stock identification, and may enable a direct investigation of the genetic basis of ecotype divergence. Results: We used pooled cDNA samples from both ecotypes of kokanee to generate 750 million base pairs of transcriptome sequence data. These raw data were assembled into 11,074 high coverage contigs from which we identified 32,699 novel single nucleotide polymorphisms. A subset of these putative SNPs was validated using high-resolution melt analysis and Sanger resequencing to genotype independent samples of kokanee and anadromous sockeye salmon. We also identified a number of contigs that were composed entirely of reads from a single ecotype, which may indicate regions of differential gene expression between the two reproductive ecotypes. In addition, we found some evidence for greater pathogen load among the kokanee sampled in stream-spawning habitats, suggesting a possible evolutionary advantage to shore-spawning that warrants further study. Conclusions: This study provides novel genomic resources to support population

  1. Sequence variation of Bemisia tabaci Chemosensory Protein 2 in cryptic species B and Q: New DNA markers for whitefly recognition.

    PubMed

    Liu, Guo-Xia; Ma, Hong-Mei; Xie, Hong-Yan; Xuan, Ning; Picimbon, Jean-François

    2016-01-15

    Bemisia tabaci Gennadius biotypes B and Q are two of the most important worldwide agricultural insect pests. Genomic sequences of Type-2 B. tabaci chemosensory protein (BtabCSP2) were cloned and sequenced in B and Q biotypes, revealing key biotype-specific variations in the intron sequence. A Q260 sequence was found specifically in Q-BtabCSP2 and Cucumis melo LN692399, suggesting ancestral horizontal transfer of the gene between the insect and the plant through bacteria. A cleaved amplified polymorphic sequences (CAPS) method was then developed to differentiate B and Q based on the sequence variation in exon of BtabCSP2 gene. The performance of CSP2-based CAPS for whitefly recognition was assessed using B. tabaci field collections from Shandong Province (P.R. China). Our SacII-based CAPS method gave the same result as the mitochondrial cytochrome oxidase-based CAPS method in the field collections. We therefore propose an explanation for CSP origin and a new, rapid and simple molecular method based on genomic DNA and a chemosensory gene to accurately differentiate the B and Q whiteflies of the Bemisia complex around the world.

  2. Identification of conserved genomic regions and variation therein amongst Cetartiodactyla species using next generation sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Background: Next Generation Sequencing has created an opportunity to genetically characterize an individual both inexpensively and comprehensively. In earlier work produced in our collaboration [1], it was demonstrated that, for animals without a reference genome, their Next Generation Sequence data ...

  3. High-throughput sequencing of core STR loci for forensic genetic investigations using the Roche Genome Sequencer FLX platform.

    PubMed

    Fordyce, Sarah L; Ávila-Arcos, Maria C; Rockenbauer, Eszter; Børsting, Claus; Frank-Hansen, Rune; Petersen, Frederik Torp; Willerslev, Eske; Hansen, Anders J; Morling, Niels; Gilbert, M Thomas P

    2011-08-01

    The analysis and profiling of short tandem repeat (STR) loci is routinely used in forensic genetics. Current methods to investigate STR loci, including PCR-based standard fragment analyses and capillary electrophoresis, only provide amplicon lengths that are used to estimate the number of STR repeat units. These methods do not allow for the full resolution of STR base composition that sequencing approaches could provide. Here we present an STR profiling method based on the use of the Roche Genome Sequencer (GS) FLX to simultaneously sequence multiple core STR loci. Using this method in combination with a bioinformatic tool designed specifically to analyze sequence lengths and frequencies, we found that GS FLX STR sequence data are comparable to conventional capillary electrophoresis-based STR typing. Furthermore, we found DNA base substitutions and repeat sequence variations that would not have been identified using conventional STR typing.

  4. Sequence variations in the collagen IX and XI genes are associated with degenerative lumbar spinal stenosis

    PubMed Central

    Noponen-Hietala, N; Kyllonen, E; Mannikko, M; Ilkko, E; Karppinen, J; Ott, J; Ala-Kokko, L

    2003-01-01

    Background: Degenerative lumbar spinal stenosis (LSS) is usually caused by disc herniation or degeneration. Several genetic factors have been implicated in disc disease. Tryptophan alleles in COL9A2 and COL9A3 have been shown to be associated with lumbar disc disease in the Finnish population, and polymorphisms in the vitamin D receptor gene (VDR) (FokI and TaqI), the matrix metalloproteinase-3 gene (MMP-3) and an aggrecan gene (AGC1) VNTR have been reported to be associated with disc degeneration. In addition, an IVS6-4 a>t polymorphism in COL11A2 has been found in connection with stenosis caused by ossification of the posterior longitudinal ligament in the Japanese population. Objective: To study the role of genetic factors in LSS. Methods: 29 Finnish probands were analysed for mutations in the genes coding for intervertebral disc matrix proteins, COL1A1, COL1A2, COL2A1, COL9A1, COL9A2, COL9A3, COL11A1, COL11A2, and AGC1. VDR and MMP-3 polymorphisms were also analysed. Sequence variations were tested in 56 Finnish controls. Results: Several disease associated alleles were identified. A splice site mutation in COL9A2 leading to a premature translation termination codon and the generation of a truncated protein was identified in one proband, another had the Trp2 allele, and four others the Trp3 allele. The frequency of the COL11A2 IVS6-4 t allele was 93.1% in the probands and 72.3% in controls (p = 0.0016). The differences in genotype frequencies for this site were less significant (p = 0.0043). Conclusions: Genetic factors have an important role in the pathogenesis of LSS. PMID:14644861

  5. Fin whale MDH-1 and MPI allozyme variation is not reflected in the corresponding DNA sequences

    PubMed Central

    Olsen, Morten Tange; Pampoulie, Christophe; Daníelsdóttir, Anna K; Lidh, Emmelie; Bérubé, Martine; Víkingsson, Gísli A; Palsbøll, Per J

    2014-01-01

    The appeal of genetic inference methods to assess population genetic structure and guide management efforts is grounded in the correlation between the genetic similarity and gene flow among populations. Effects of such gene flow are typically genomewide; however, some loci may appear as outliers, displaying above or below average genetic divergence relative to the genomewide level. Above average population, genetic divergence may be due to divergent selection as a result of local adaptation. Consequently, substantial efforts have been directed toward such outlying loci in order to identify traits subject to local adaptation. Here, we report the results of an investigation into the molecular basis of the substantial degree of genetic divergence previously reported at allozyme loci among North Atlantic fin whale (Balaenoptera physalus) populations. We sequenced the exons encoding for the two most divergent allozyme loci (MDH-1 and MPI) and failed to detect any nonsynonymous substitutions. Following extensive error checking and analysis of additional bioinformatic and morphological data, we hypothesize that the observed allozyme polymorphisms may reflect phenotypic plasticity at the cellular level, perhaps as a response to nutritional stress. While such plasticity is intriguing in itself, and of fundamental evolutionary interest, our key finding is that the observed allozyme variation does not appear to be a result of genetic drift, migration, or selection on the MDH-1 and MPI exons themselves, stressing the importance of interpreting allozyme data with caution. As for North Atlantic fin whale population structure, our findings support the low levels of differentiation found in previous analyses of DNA nucleotide loci. PMID:24963377

  6. Compression of Structured High-Throughput Sequencing Data

    PubMed Central

    Campagne, Fabien; Dorff, Kevin C.; Chambwe, Nyasha; Robinson, James T.; Mesirov, Jill P.

    2013-01-01

    Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution), or fail to provide state of the art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% the size of a BAM file with perfect data fidelity. Compared to the previous compression state of the art, these methods reduce dataset size more than 40% when storing exome, gene expression or DNA methylation datasets. The approaches have been integrated in a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays. PMID:24260313

  7. Sequence Variation in Superoxide Dismutase Gene of Toxoplasma gondii among Various Isolates from Different Hosts and Geographical Regions.

    PubMed

    Wang, Shuai; Cao, Aiping; Li, Xun; Zhao, Qunli; Liu, Yuan; Cong, Hua; He, Shenyi; Zhou, Huaiyu

    2015-06-01

    Toxoplasma gondii, an obligate intracellular protozoan parasite of the phylum Apicomplexa, can infect all warm-blooded vertebrates, including humans, livestock, and marine mammals. The aim of this study was to investigate whether superoxide dismutase (SOD) of T. gondii can be used as a new marker for genetic study or a potential vaccine candidate. The partial genome region of the SOD gene was amplified and sequenced from 10 different T. gondii isolates from different parts of the world, and all the sequences were examined by PCR-RFLP, sequence analysis, and phylogenetic reconstruction. The results showed that partial SOD gene sequences ranged from 1,702 bp to 1,712 bp and A + T contents varied from 50.1% to 51.1% among all examined isolates. Sequence alignment analysis identified a total of 43 variable nucleotide positions, corresponding to 97.5% sequence similarity of the SOD gene among all examined isolates. Phylogenetic analysis revealed that these SOD sequences were not an effective molecular marker for differential identification of T. gondii strains. The research demonstrated the existence of low sequence variation in the SOD gene among T. gondii strains of different genotypes from different hosts and geographical regions.
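
    The summary statistics quoted in this abstract (A + T content, number of variable positions, percent similarity) can be illustrated with a minimal Python sketch. The function names and the toy alignment are hypothetical and are not taken from the study; the authors' actual software and settings are not stated. As a rough check on the reported figures, 43 variable positions over roughly 1,712 aligned bases is about 2.5%, which is consistent with the reported 97.5% similarity.

```python
from typing import List


def at_content(seq: str) -> float:
    """Percentage of A and T among the unambiguous A/C/G/T bases of seq."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (s.count("A") + s.count("T")) / acgt


def variable_positions(aligned: List[str]) -> List[int]:
    """0-based alignment columns where the sequences disagree.

    Columns containing a gap ('-') are skipped; this is a simplifying
    assumption of the sketch, not a rule taken from the abstract.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    columns = []
    for i, column in enumerate(zip(*aligned)):
        bases = {b.upper() for b in column}
        if "-" in bases:
            continue
        if len(bases) > 1:
            columns.append(i)
    return columns


# Toy alignment (made-up data, three sequences of eight bases).
toy_alignment = ["ATGCATGC", "ATGCTTGC", "ATGAATGC"]
variable = variable_positions(toy_alignment)
similarity = 100.0 * (1 - len(variable) / len(toy_alignment[0]))
print(variable, similarity, at_content(toy_alignment[0]))
```

    On real data the alignment would come from an alignment program, and the treatment of gaps and ambiguous bases would need to follow the study's own conventions.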

  8. Compressive biological sequence analysis and archival in the era of high-throughput sequencing technologies.

    PubMed

    Giancarlo, Raffaele; Rombo, Simona E; Utro, Filippo

    2014-05-01

    High-throughput sequencing technologies produce large collections of data, mainly DNA sequences with additional information, requiring the design of efficient and effective methodologies for both their compression and storage. In this context, we first provide a classification of the main techniques that have been proposed, according to three specific research directions that have emerged from the literature and, for each, we provide an overview of the current techniques. Finally, to make this review useful to researchers and technicians applying the existing software and tools, we include a synopsis of the main characteristics of the described approaches, including details on their implementation and availability. Performance of the various methods is also highlighted, although the state of the art does not lend itself to a consistent and coherent comparison among all the methods presented here.

  9. High mitochondrial sequence diversity in linguistic isolates of the Alps.

    PubMed Central

    Stenico, M.; Nigro, L.; Bertorelle, G.; Calafell, F.; Capitanio, M.; Corrain, C.; Barbujani, G.

    1996-01-01

    Segment I of the control region of mtDNA (360 bases) was sequenced in seven samples, each of 10 individuals inhabiting villages in the eastern Italian Alps (South Tyrol and Trentino). Three linguistic groups, German, Italian, and Ladin, were represented by two samples each; the seventh sample comes from an isolated group of German origin, the Mocheni, who are linguistically distinct and geographically separated from the bulk of the German speakers. Seventy-four polymorphic sites were identified, defining 63 different haplotypes. Mocheni and Ladin speakers tend to form two clusters in the evolutionary trees inferred from sequences. Analysis of molecular variance shows significant differentiation within samples, among them, and among linguistic groups. Genetic differences between the Ladins and the other groups are not much smaller than between Europeans and some Africans; variation is large within groups, as well, with the exception of only the Mocheni. In the evolutionary trees where the four alpine groups are compared with other European populations, Mocheni and especially Ladins appear as clear outliers. Romansch-speaking Swiss, who are linguistically related to Ladins, are not genetically similar to them, for this segment of DNA. Because the time elapsed since colonization of the Alps (≤ 12,000 years) is short in mutational terms, the only model accounting for the observed relationships between mtDNA variation and linguistic identity seems one in which a population ancestral to Ladin speakers was already differentiated long before the Alps were settled and the current linguistic affiliations were established. For the Mocheni, the results are consistent with a simpler episode of allele loss, from an original genetic pool common to the ancestors of the current German speakers. PMID:8940282

  10. MEGARes: an antimicrobial resistance database for high throughput sequencing

    PubMed Central

    Lakin, Steven M.; Dean, Chris; Noyes, Noelle R.; Dettenwanger, Adam; Ross, Anne Spencer; Doster, Enrique; Rovira, Pablo; Abdo, Zaid; Jones, Kenneth L.; Ruiz, Jaime; Belk, Keith E.; Morley, Paul S.; Boucher, Christina

    2017-01-01

    Antimicrobial resistance has become an imminent concern for public health. As methods for detection and characterization of antimicrobial resistance move from targeted culture and polymerase chain reaction to high throughput metagenomics, appropriate resources for the analysis of large-scale data are required. Currently, antimicrobial resistance databases are tailored to smaller-scale, functional profiling of genes using highly descriptive annotations. Such characteristics do not facilitate the analysis of large-scale, ecological sequence datasets such as those produced with the use of metagenomics for surveillance. In order to overcome these limitations, we present MEGARes (https://megares.meglab.org), a hand-curated antimicrobial resistance database and annotation structure that provides a foundation for the development of high throughput acyclical classifiers and hierarchical statistical analysis of big data. MEGARes can be browsed as a stand-alone resource through the website or can be easily integrated into sequence analysis pipelines through download. Also via the website, we provide documentation for AmrPlusPlus, a user-friendly Galaxy pipeline for the analysis of high throughput sequencing data that is pre-packaged for use with the MEGARes database. PMID:27899569

  11. MEGARes: an antimicrobial resistance database for high throughput sequencing.

    PubMed

    Lakin, Steven M; Dean, Chris; Noyes, Noelle R; Dettenwanger, Adam; Ross, Anne Spencer; Doster, Enrique; Rovira, Pablo; Abdo, Zaid; Jones, Kenneth L; Ruiz, Jaime; Belk, Keith E; Morley, Paul S; Boucher, Christina

    2017-01-04

    Antimicrobial resistance has become an imminent concern for public health. As methods for detection and characterization of antimicrobial resistance move from targeted culture and polymerase chain reaction to high throughput metagenomics, appropriate resources for the analysis of large-scale data are required. Currently, antimicrobial resistance databases are tailored to smaller-scale, functional profiling of genes using highly descriptive annotations. Such characteristics do not facilitate the analysis of large-scale, ecological sequence datasets such as those produced with the use of metagenomics for surveillance. In order to overcome these limitations, we present MEGARes (https://megares.meglab.org), a hand-curated antimicrobial resistance database and annotation structure that provides a foundation for the development of high throughput acyclical classifiers and hierarchical statistical analysis of big data. MEGARes can be browsed as a stand-alone resource through the website or can be easily integrated into sequence analysis pipelines through download. Also via the website, we provide documentation for AmrPlusPlus, a user-friendly Galaxy pipeline for the analysis of high throughput sequencing data that is pre-packaged for use with the MEGARes database.

  12. Mitochondrial genome sequences of Artemia tibetiana and Artemia urmiana: assessing molecular changes for high plateau adaptation.

    PubMed

    Zhang, Hangxiao; Luo, Qibin; Sun, Jing; Liu, Fei; Wu, Gang; Yu, Jun; Wang, Weiwei

    2013-05-01

    Brine shrimps, Artemia (Crustacea, Anostraca), inhabit hypersaline environments and have a broad geographical distribution from sea level to high plateaus. Artemia therefore possess significant genetic diversity, which gives them their outstanding adaptability. To understand this remarkable plasticity, we sequenced the mitochondrial genomes of two Artemia tibetiana isolates from the Tibetan Plateau in China and one Artemia urmiana isolate from Lake Urmia in Iran and compared them with the genome of a low-altitude Artemia, A. franciscana. We compared the ratio of the rate of nonsynonymous (Ka) and synonymous (Ks) substitutions (Ka/Ks ratio) in the mitochondrial protein-coding gene sequences and found that atp8 had the highest Ka/Ks ratios in comparisons of A. franciscana with either A. tibetiana or A. urmiana and that atp6 had the highest Ka/Ks ratio between A. tibetiana and A. urmiana. Atp6 may have experienced strong selective pressure for high-altitude adaptation because although A. tibetiana and A. urmiana are closely related they live at different altitudes. We identified two extended termination-associated sequences and three conserved sequence blocks in the D-loop region of the mitochondrial genomes. We propose that sequence variations in the D-loop region and in the subunits of the respiratory chain complexes independently or collectively contribute to the adaptation of Artemia to different altitudes.

  13. High-resolution specificity from DNA sequencing highlights alternative modes of Lac repressor binding.

    PubMed

    Zuo, Zheng; Stormo, Gary D

    2014-11-01

    Knowing the specificity of transcription factors is critical to understanding regulatory networks in cells. The lac repressor-operator system has been studied for many years, but not with high-throughput methods capable of determining specificity comprehensively. Details of its binding interaction and its selection of an asymmetric binding site have been controversial. We employed a new method to accurately determine relative binding affinities to thousands of sequences simultaneously, requiring only sequencing of bound and unbound fractions. An analysis of 2560 different DNA sequence variants, including both base changes and variations in operator length, provides a detailed view of lac repressor sequence specificity. We find that the protein can bind with nearly equal affinities to operators of three different lengths, but the sequence preference changes depending on the length, demonstrating alternative modes of interaction between the protein and DNA. The wild-type operator has an odd length, causing the two monomers to bind in alternative modes, making the asymmetric operator the preferred binding site. We tested two other members of the LacI/GalR protein family and find that neither can bind with high affinity to sites with alternative lengths or shows evidence of alternative binding modes. A further comparison with known and predicted motifs suggests that the lac repressor may be unique in this ability and that this may contribute to its selection.

  14. Tandem repeat sequence variation as causative cis-eQTLs for protein-coding gene expression variation: the case of CSTB.

    PubMed

    Borel, Christelle; Migliavacca, Eugenia; Letourneau, Audrey; Gagnebin, Maryline; Béna, Frédérique; Sailani, M Reza; Dermitzakis, Emmanouil T; Sharp, Andrew J; Antonarakis, Stylianos E

    2012-08-01

    Association studies have revealed expression quantitative trait loci (eQTLs) for a large number of genes. However, the causative variants that regulate gene expression levels are generally unknown. We hypothesized that copy-number variation of sequence repeats contributes to the expression variation of some genes. Our laboratory has previously identified that the rare expansion of a repeat c.-174CGGGGCGGGGCG in the promoter region of the CSTB gene causes a silencing of the gene, resulting in progressive myoclonus epilepsy. Here, we genotyped the repeat length and quantified CSTB expression by quantitative real-time polymerase chain reaction in 173 lymphoblastoid cell lines (LCLs) and fibroblast samples from the GenCord collection. The majority of alleles contain either two or three copies of this repeat. Independent analysis revealed that the c.-174CGGGGCGGGGCG repeat length is strongly associated with CSTB expression (P = 3.14 × 10^-11) in LCLs only. Examination of both genotyped and imputed single-nucleotide polymorphisms (SNPs) within 2 Mb of CSTB revealed that the dodecamer repeat represents the strongest cis-eQTL for CSTB in LCLs. We conclude that the common two or three copy variation is likely the causative cis-eQTL for CSTB expression variation. More broadly, we propose that polymorphic tandem repeats may represent the causative variation of a fraction of cis-eQTLs in the genome.

  15. A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control.

    PubMed

    Bartha, István; Carlson, Jonathan M; Brumme, Chanson J; McLaren, Paul J; Brumme, Zabrina L; John, Mina; Haas, David W; Martinez-Picado, Javier; Dalmau, Judith; López-Galíndez, Cecilio; Casado, Concepción; Rauch, Andri; Günthard, Huldrych F; Bernasconi, Enos; Vernazza, Pietro; Klimkait, Thomas; Yerly, Sabine; O'Brien, Stephen J; Listgarten, Jennifer; Pfeifer, Nico; Lippert, Christoph; Fusi, Nicolo; Kutalik, Zoltán; Allen, Todd M; Müller, Viktor; Harrigan, P Richard; Heckerman, David; Telenti, Amalio; Fellay, Jacques

    2013-10-29

    HIV-1 sequence diversity is affected by selection pressures arising from host genomic factors. Using paired human and viral data from 1071 individuals, we ran >3000 genome-wide scans, testing for associations between host DNA polymorphisms, HIV-1 sequence variation and plasma viral load (VL), while considering human and viral population structure. We observed significant human SNP associations to a total of 48 HIV-1 amino acid variants (p < 2.4 × 10^-12). All associated SNPs mapped to the HLA class I region. Clinical relevance of host and pathogen variation was assessed using VL results. We identified two critical advantages to the use of viral variation for identifying host factors: (1) association signals are much stronger for HIV-1 sequence variants than VL, reflecting the 'intermediate phenotype' nature of viral variation; (2) association testing can be run without any clinical data. The proposed genome-to-genome approach highlights sites of genomic conflict and is a strategy generally applicable to studies of host-pathogen interaction. DOI:http://dx.doi.org/10.7554/eLife.01123.001.

  16. Sequence variation in two mitochondrial DNA regions and internal transcribed spacer among isolates of the nematode Oesophagostomum asperum originating from goats in Hunan Province, China.

    PubMed

    Li, F; Hu, T; Duan, N C; Li, W Y; Teng, Q; Li, H; Liu, W; Liu, Y; Cheng, T Y

    2016-01-01

    The present study examined sequence variability in two mitochondrial DNA (mtDNA) regions, namely cytochrome c oxidase subunit 1 (cox1) and NADH dehydrogenase subunit 1 (nad1), and internal transcribed spacer (ITS) of nuclear ribosomal DNA (rDNA) among Oesophagostomum asperum isolates from goats in Hunan Province, China. A portion of the cox1 (pcox1), nad1 (pnad1) genes and the ITS (ITS1+5.8S rDNA+ITS2) rDNA were amplified by polymerase chain reaction (PCR) separately from adult O. asperum individuals and the representative amplicons were subjected to sequencing from both directions. The lengths of pcox1, pnad1 and ITS rDNA were 366 bp, 681 bp and 785 bp, respectively. The A+T contents of gene sequences were 71.5-72% for pcox1, 73.7-74.2% for pnad1 and 58-58.8% for ITS rDNA. Intra-specific sequence variations within O. asperum were 0-1.6% for pcox1, 0-1.9% for pnad1 and 0-1.7% for ITS rDNA, while inter-specific sequence differences among members of the genus Oesophagostomum were significantly higher, being 11.1-12.5%, 13.3-17.7% and 8.5-18.6% for pcox1, pnad1 and ITS rDNA, respectively. Phylogenetic analyses using combined sequences of pcox1 and pnad1, with three different computational algorithms (Bayesian inference, maximum likelihood and maximum parsimony), revealed distinct groups with high statistical support. These findings demonstrated the existence of intra-specific variation in mtDNA and rDNA sequences among O. asperum isolates from goats in Hunan Province, China, and have implications for studying molecular epidemiology and population genetics of O. asperum.

  17. Combined examination of sequence and copy number variations in human deafness genes improves diagnosis for cases of genetic deafness

    PubMed Central

    2014-01-01

    Background: Copy number variations (CNVs) are the major type of structural variation in the human genome, and are more common than DNA sequence variations in populations. CNVs are important factors for human genetic and phenotypic diversity. Many CNVs have been associated with either resistance to diseases or identified as the cause of diseases. Currently little is known about the role of CNVs in causing deafness. CNVs are currently not analyzed by conventional genetic analysis methods to study deafness. Here we detected both DNA sequence variations and CNVs affecting 80 genes known to be required for normal hearing. Methods: Coding regions of the deafness genes were captured by a hybridization-based method and processed through the standard next-generation sequencing (NGS) protocol using the Illumina platform. Samples hybridized together in the same reaction were analyzed to obtain CNVs. A read depth based method was used to measure CNVs at the resolution of a single exon. Results were validated by the quantitative PCR (qPCR) based method. Results: Among 79 sporadic cases clinically diagnosed with sensorineural hearing loss, we identified previously-reported disease-causing sequence mutations in 16 cases. In addition, we identified a total of 97 CNVs (72 CNV gains and 25 CNV losses) in 27 deafness genes. The CNVs included homozygous deletions which may directly give rise to deleterious effects on protein functions known to be essential for hearing, as well as heterozygous deletions and CNV gains compounded with sequence mutations in deafness genes that could potentially harm gene functions. Conclusions: We studied how CNVs in known deafness genes may result in deafness. Data provided here served as a basis to explain how CNVs disrupt normal functions of deafness genes. These results may significantly expand our understanding about how various types of genetic mutations cause deafness in humans. PMID:25342930
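
    The idea behind a read-depth based, exon-level CNV call can be sketched as follows. This is a generic illustration only: the abstract does not describe the authors' normalisation or thresholds, so the function names, the total-depth normalisation, the cut-off values (0.4 and -0.6) and the toy depths below are all assumptions made for the example.

```python
import math
from typing import Dict


def exon_log2_ratios(sample: Dict[str, float],
                     reference: Dict[str, float]) -> Dict[str, float]:
    """log2 ratio of each exon's share of total depth, sample vs reference."""
    s_total = sum(sample.values())
    r_total = sum(reference.values())
    if s_total <= 0 or r_total <= 0:
        raise ValueError("total depth must be positive")
    ratios = {}
    for exon, depth in sample.items():
        if reference[exon] <= 0:
            raise ValueError(f"reference depth for {exon} must be positive")
        s_frac = depth / s_total
        r_frac = reference[exon] / r_total
        # Zero sample depth (e.g. a homozygous deletion) maps to -infinity.
        ratios[exon] = math.log2(s_frac / r_frac) if s_frac > 0 else float("-inf")
    return ratios


def call_cnv(log2_ratio: float, gain: float = 0.4, loss: float = -0.6) -> str:
    """Label one exon; the thresholds are illustrative, not from the study."""
    if log2_ratio >= gain:
        return "gain"
    if log2_ratio <= loss:
        return "loss"
    return "neutral"


# Toy depths (made-up): exon ex3 has half the depth of its neighbours.
sample_depth = {"ex1": 100, "ex2": 100, "ex3": 50, "ex4": 100}
reference_depth = {"ex1": 100, "ex2": 100, "ex3": 100, "ex4": 100}
ratios = exon_log2_ratios(sample_depth, reference_depth)
calls = {exon: call_cnv(r) for exon, r in ratios.items()}
print(calls)
```

    In practice the reference would typically be built from the other samples captured in the same hybridization reaction, and candidate calls would be confirmed independently, as the study did with qPCR.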

  18. Genotype-Frequency Estimation from High-Throughput Sequencing Data.

    PubMed

    Maruki, Takahiro; Lynch, Michael

    2015-10-01

    Rapidly improving high-throughput sequencing technologies provide unprecedented opportunities for carrying out population-genomic studies with various organisms. To take full advantage of these methods, it is essential to correctly estimate allele and genotype frequencies, and here we present a maximum-likelihood method that accomplishes these tasks. The proposed method fully accounts for uncertainties resulting from sequencing errors and biparental chromosome sampling and yields essentially unbiased estimates with minimal sampling variances with moderately high depths of coverage regardless of a mating system and structure of the population. Moreover, we have developed statistical tests for examining the significance of polymorphisms and their genotypic deviations from Hardy-Weinberg equilibrium. We examine the performance of the proposed method by computer simulations and apply it to low-coverage human data generated by high-throughput sequencing. The results show that the proposed method improves our ability to carry out population-genomic analyses in important ways. The software package of the proposed method is freely available from https://github.com/Takahiro-Maruki/Package-GFE.
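
    For orientation, the textbook test for deviation from Hardy-Weinberg equilibrium at a biallelic site can be sketched in Python as below. This is not the maximum-likelihood method of the abstract: it assumes genotypes have already been called without error, which is exactly the assumption the proposed method avoids. The function names and the example counts are hypothetical.

```python
import math
from typing import Tuple


def allele_frequency(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Frequency of allele A from genotype counts."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")
    return (2 * n_AA + n_Aa) / (2 * n)


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> Tuple[float, float]:
    """Chi-square statistic (1 degree of freedom) and p-value for HWE."""
    n = n_AA + n_Aa + n_aa
    p = allele_frequency(n_AA, n_Aa, n_aa)
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts: the first set fits HWE exactly, the second has no heterozygotes.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(50, 0, 50))
```

    With low-coverage sequencing data, called genotype counts like these are themselves uncertain, which is the motivation the abstract gives for estimating frequencies directly from the reads.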

  19. Fulcrum: condensing redundant reads from high-throughput sequencing studies

    PubMed Central

    Burriesci, Matthew S.; Lehnert, Erik M.; Pringle, John R.

    2012-01-01

    Motivation: Ultra-high-throughput sequencing produces duplicate and near-duplicate reads, which can consume computational resources in downstream applications. A tool that collapses such reads should reduce storage and assembly complications and costs. Results: We developed Fulcrum to collapse identical and near-identical Illumina and 454 reads (such as those from PCR clones) into single error-corrected sequences; it can process paired-end as well as single-end reads. Fulcrum is customizable and can be deployed on a single machine, a local network or a commercially available MapReduce cluster, and it has been optimized to maximize ease-of-use, cross-platform compatibility and future scalability. Sequence datasets have been collapsed by up to 71%, and the reduced number and improved quality of the resulting sequences allow assemblers to produce longer contigs while using less memory. Availability and implementation: Source code and a tutorial are available at http://pringlelab.stanford.edu/protocols.html under a BSD-like license. Fulcrum was written and tested in Python 2.6, and the single-machine and local-network modes depend on a modified version of the Parallel Python library (provided). Contact: erik.m.lehnert@gmail.com Supplementary information: Supplementary information is available at Bioinformatics online. PMID:22419786
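
    The simplest part of the task described above, collapsing exact duplicate reads, can be sketched in a few lines. This is an illustration only and not Fulcrum's code: it handles identical reads but does not collapse near-identical reads, correct errors, or use quality scores, and the function names are hypothetical.

```python
from collections import Counter

def collapse_identical_reads(reads):
    """Return (sequence, copy_count) pairs, most abundant first."""
    return Counter(reads).most_common()

def fraction_collapsed(reads):
    """Fraction by which the read count shrinks after collapsing duplicates."""
    unique = len(set(reads))
    return 1.0 - unique / len(reads)

reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
collapsed = collapse_identical_reads(reads)
reduction = fraction_collapsed(reads)
```

    Keeping the copy count alongside each collapsed sequence preserves abundance information that a downstream assembler or error-correction step may want.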

  20. Phylogenetic Relationships and Genetic Variation in Longidorus and Xiphinema Species (Nematoda: Longidoridae) Using ITS1 Sequences of Nuclear Ribosomal DNA

    PubMed Central

    Ye, Weimin; Szalanski, Allen L.; Robbins, R. T.

    2004-01-01

    Genetic analyses using DNA sequences of nuclear ribosomal DNA ITS1 were conducted to determine the extent of genetic variation within and among Longidorus and Xiphinema species. DNA sequences were obtained from samples collected from Arkansas, California and Australia as well as 4 Xiphinema DNA sequences from GenBank. The sequences of the ITS1 region including the 3' end of the 18S rDNA gene and the 5' end of the 5.8S rDNA gene ranged from 1020 bp to 1244 bp for the 9 Longidorus species, and from 870 bp to 1354 bp for the 7 Xiphinema species. Nucleotide frequencies were: A = 25.5%, C = 21.0%, G = 26.4%, and T = 27.1%. Genetic variation between the two genera had a maximum divergence of 38.6% between X. chambersi and L. crassus. Genetic variation among Xiphinema species ranged from 3.8% between X. diversicaudatum and X. bakeri to 29.9% between X. chambersi and X. italiae. Within Longidorus, genetic variation ranged from 8.9% between L. crassus and L. grandis to 32.4% between L. fragilis and L. diadecturus. Intraspecific genetic variation in X. americanum sensu lato ranged from 0.3% to 1.9%, while intraspecific variation was 0.8% in L. diadecturus and ranged from 0.6% to 10.9% in L. biformis. Identical sequences were obtained between the two populations of L. grandis, and between the two populations of X. bakeri. Phylogenetic analyses based on the ITS1 DNA sequence data were conducted on each genus separately using both maximum parsimony and maximum likelihood analysis. Among the Longidorus taxa, 4 subgroups are supported: L. grandis, L. crassus, and L. elongatus are in one cluster; L. biformis and L. paralongicaudatus are in a second cluster; L. fragilis and L. breviannulatus are in a third cluster; and L. diadecturus is in a fourth cluster. Among the Xiphinema taxa, 3 subgroups are supported: X. americanum with X. chambersi, X. bakeri with X. diversicaudatum, and X. italiae and X. vuittenezi forming a sister group with X. index. The relationships observed in this study

  1. Individual variation of human S1P₁ coding sequence leads to heterogeneity in receptor function and drug interactions.

    PubMed

    Obinata, Hideru; Gutkind, Sarah; Stitham, Jeremiah; Okuno, Toshiaki; Yokomizo, Takehiko; Hwa, John; Hla, Timothy

    2014-12-01

    Sphingosine 1-phosphate receptor 1 (S1P₁), an abundantly-expressed G protein-coupled receptor which regulates key vascular and immune responses, is a therapeutic target in autoimmune diseases. Fingolimod/Gilenya (FTY720), an oral medication for relapsing-remitting multiple sclerosis, targets S1P₁ receptors on immune and neural cells to suppress neuroinflammation. However, suppression of endothelial S1P₁ receptors is associated with cardiac and vascular adverse effects. Here we report the genetic variations of the S1P₁ coding region from exon sequencing of >12,000 individuals and their functional consequences. We conducted functional analyses of 14 nonsynonymous single nucleotide polymorphisms (SNPs) of the S1PR1 gene. One SNP mutant (Arg¹²⁰ to Pro) failed to transmit sphingosine 1-phosphate (S1P)-induced intracellular signals such as calcium increase and activation of p44/42 MAPK and Akt. Two other mutants (Ile⁴⁵ to Thr and Gly³⁰⁵ to Cys) showed normal intracellular signals but impaired S1P-induced endocytosis, which made the receptor resistant to FTY720-induced degradation. Another SNP mutant (Arg¹³ to Gly) demonstrated protection from coronary artery disease in a high cardiovascular risk population. Individuals with this mutation showed a significantly lower percentage of multi-vessel coronary obstruction in a risk factor-matched case-control study. This study suggests that individual genetic variations of S1P₁ can influence receptor function and, therefore, confer differential disease risks and interaction with S1P₁-targeted therapeutics.

  2. Characterization and Sequence Variation in the rDNA Region of Six Nematode Species of the Genus Longidorus (Nematoda)

    PubMed Central

    De Luca, F.; Reyes, A.; Grunder, J.; Kunz, P.; Agostinelli, A.; De Giorgi, C.; Lamberti, F.

    2004-01-01

    Total DNA was isolated from individual nematodes of the species Longidorus helveticus, L. macrosoma, L. arthensis, L. profundorum, L. elongatus, and L. raskii collected in Switzerland. The ITS region and D1-D2 expansion segments of the 26S rDNA were amplified and cloned. The sequences obtained were aligned in order to investigate sequence diversity and to infer the phylogenetic relationships among the six Longidorus species. D1-D2 sequences were more conserved than the ITS sequences that varied widely in primary structure and length, and no consensus was observed. Phylogenetic analyses using the neighbor-joining, maximum parsimony and maximum likelihood methods were performed with three different sequence data sets: ITS1-ITS2, 5.8S-D1-D2, and combining ITS1-ITS2+5.8S-D1-D2 sequences. All multiple alignments yielded similar basic trees supporting the existence of the six species established using morphological characters. These sequence data also provided evidence that the different regions of the rDNA are characterized by different evolution rates and by different factors associated with the generation of extreme size variation. PMID:19262800

  3. Variation.

    ERIC Educational Resources Information Center

    Hamilton City Board of Education (Ontario).

    Suggestions for studying the topic of variation of individuals and objects (balls) to help develop elementary school students' measurement, comparison, classification, evaluation, and data collection and recording skills are made. General suggestions of variables that can be investigated are made for the study of human variation. Twelve specific…

  4. A HIGH COVERAGE GENOME SEQUENCE FROM AN ARCHAIC DENISOVAN INDIVIDUAL

    PubMed Central

    Meyer, Matthias; Kircher, Martin; Gansauge, Marie-Theres; Li, Heng; Racimo, Fernando; Mallick, Swapan; Schraiber, Joshua G.; Jay, Flora; Prüfer, Kay; de Filippo, Cesare; Sudmant, Peter H.; Alkan, Can; Fu, Qiaomei; Do, Ron; Rohland, Nadin; Tandon, Arti; Siebauer, Michael; Green, Richard E.; Bryc, Katarzyna; Briggs, Adrian W.; Stenzel, Udo; Dabney, Jesse; Shendure, Jay; Kitzman, Jacob; Hammer, Michael F.; Shunkov, Michael V.; Derevianko, Anatoli P.; Patterson, Nick; Andrés, Aida M.; Eichler, Evan E.; Slatkin, Montgomery; Reich, David; Kelso, Janet; Pääbo, Svante

    2013-01-01

    We present a DNA library preparation method that has allowed us to reconstruct a high coverage (30X) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of “missing evolution” in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans. PMID:22936568

  5. High-Throughput Sequencing of a South American Amerindian

    PubMed Central

    Almeida, Renan; Alencar, Dayse O.; Barbosa, Maria Silvanira; Gusmão, Leonor; Silva, Wilson A.; de Souza, Sandro J.; Silva, Artur; Ribeiro-dos-Santos, Ândrea; Darnet, Sylvain; Santos, Sidney

    2013-01-01

    The emergence of next-generation sequencing technologies allowed access to the vast amounts of information that are contained in the human genome. This information has contributed to the understanding of individual and population-based variability and improved the understanding of the evolutionary history of different human groups. However, the genome of a representative of the Amerindian populations had not been previously sequenced. Thus, the genome of an individual from a South American tribe was completely sequenced to further the understanding of the genetic variability of Amerindians. A total of 36.8 giga base pairs (Gbp) were sequenced and aligned with the human genome. These Gbp corresponded to 95.92% of the human genome with an estimated miscall rate of 0.0035 per sequenced bp. The data obtained from the alignment were used for SNP (single-nucleotide) and INDEL (insertion-deletion) calling, which resulted in the identification of 502,017 polymorphisms, of which 32,275 were potentially new high-confidence SNPs and 33,795 new INDELs, specific to South Native American populations. The authenticity of the sample as a member of the South Native American populations was confirmed through the analysis of the uniparental (maternal and paternal) lineages. The autosomal comparison distinguished the investigated sample from other continental populations and revealed a close relation to the Eastern Asian populations and Aboriginal Australians. Although the findings did not rule out the classical model of the settlement of the Americas, they brought new insights into the understanding of human population history. The present study indicates a remarkable genetic variability in human populations that must still be identified and contributes to the understanding of the genetic variability of South Native American populations and of human population history. PMID:24386182

  6. Validation of high throughput sequencing and microbial forensics applications

    PubMed Central

    2014-01-01

    High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of specific application and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security. PMID:25101166

  7. Validation of high throughput sequencing and microbial forensics applications.

    PubMed

    Budowle, Bruce; Connell, Nancy D; Bielecka-Oder, Anna; Colwell, Rita R; Corbett, Cindi R; Fletcher, Jacqueline; Forsman, Mats; Kadavy, Dana R; Markotic, Alemka; Morse, Stephen A; Murch, Randall S; Sajantila, Antti; Schmedes, Sarah E; Ternus, Krista L; Turner, Stephen D; Minot, Samuel

    2014-01-01

    High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of specific application and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security.

  8. Illumina sequencing of 16S rRNA tag revealed spatial variations of bacterial communities in a mangrove wetland.

    PubMed

    Jiang, Xiao-Tao; Peng, Xin; Deng, Guan-Hua; Sheng, Hua-Fang; Wang, Yu; Zhou, Hong-Wei; Tam, Nora Fung-Yee

    2013-07-01

    The microbial community plays an essential role in the high productivity in mangrove wetlands. A proper understanding of the spatial variations of microbial communities will provide clues about the underlying mechanisms that structure microbial groups and the isolation of bacterial strains of interest. In the present study, the diversity and composition of the bacterial community in sediments collected from four locations, namely mudflat, edge, bulk, and rhizosphere, within the Mai Po Ramsar Wetland in Hong Kong, SAR, China were compared using the barcoded Illumina paired-end sequencing technique. Rarefaction results showed that the bulk sediment inside the mature mangrove forest had the highest bacterial α-diversity, while the mudflat sediment without vegetation had the lowest. The comparison of β-diversity using principal component analysis and principal coordinate analysis with UniFrac metrics both showed that the spatial effects on bacterial communities were significant. All sediment samples could be clustered into two major groups, inner (bulk and rhizosphere sediments collected inside the mangrove forest) and outer mangrove sediments (the sediments collected at the mudflat and the edge of the mangrove forest). With the linear discriminant analysis scores larger than 3, four phyla, namely Actinobacteria, Acidobacteria, Nitrospirae, and Verrucomicrobia, were enriched in the nutrient-rich inner mangrove sediments, while abundances of Proteobacteria and Deferribacteres were higher in outer mangrove sediments. The rhizosphere effect of mangrove plants was also significant, which had a lower α-diversity, a higher amount of Nitrospirae, and a lower abundance of Proteobacteria than the bulk sediment nearby.
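
    Rarefaction, as used above to compare α-diversity, means subsampling every sample to the same number of reads before counting taxa, so that deeper-sequenced samples do not look richer simply because more reads were drawn. A minimal sketch follows; the function name, the toy OTU labels, and the fixed random seed are assumptions for illustration and do not reproduce the study's analysis.

```python
import random

def rarefied_richness(otu_counts, depth, seed=0):
    """Count distinct OTUs observed in a random subsample of `depth` reads.

    otu_counts maps an OTU label to its read count; reads are drawn without
    replacement. The seed is fixed only to make the sketch reproducible.
    """
    pool = [otu for otu, count in otu_counts.items() for _ in range(count)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    subsample = rng.sample(pool, depth)
    return len(set(subsample))

# Hypothetical counts for two sediment samples of unequal sequencing depth.
bulk = {"otu1": 40, "otu2": 30, "otu3": 20, "otu4": 10}
mudflat = {"otu1": 45, "otu2": 5}
common_depth = min(sum(bulk.values()), sum(mudflat.values()))
bulk_richness = rarefied_richness(bulk, common_depth)
mudflat_richness = rarefied_richness(mudflat, common_depth)
```

    Repeating the subsampling over many seeds and a range of depths yields the rarefaction curve that studies like this one typically report.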

  9. Allelic sequence variation of the HLA-DQ loci: relationship to serology and to insulin-dependent diabetes susceptibility.

    PubMed Central

    Horn, G T; Bugawan, T L; Long, C M; Erlich, H A

    1988-01-01

    Analysis of sequence variation in the polymorphic second exon of the major histocompatibility complex genes HLA-DQ alpha and -DQ beta has revealed 8 allelic variants at the alpha locus and 13 variants at the beta locus. Correlation of sequence variation with serologic typing suggests that the DQw2, DQw3, and DQ(blank) types are determined by the DQ beta subunit, while the DQw1 specificity is determined by DQ alpha. The nature of the amino acid at position 57 in the DQ beta subunit is correlated with susceptibility to insulin-dependent diabetes mellitus. This region of the DQ beta chain contains shared peptides with Epstein-Barr virus and rubella virus. PMID:2842756

  10. ITS2-rDNA Sequence Variation of Phlebotomus sergenti s.l. (Dip: Psychodidae) Populations in Iran

    PubMed Central

    Moin-Vaziri, Vahideh; Oshaghi, Mohammad Ali; Yaghoobi-Ershadi, Mohammad Reza; Derakhshandeh-Peykar, Pupak; Abaei, Mohammad Reza; Mohtarami, Fatemeh; Zahraei-Ramezani, Ali Reza; Nadim, Aboulhassan

    2016-01-01

    Background: Phlebotomus sergenti s.l. is considered the most likely vector of Leishmania tropica in Iran. Although two morphotypes, P. sergenti sergenti (A) and P. sergenti similis (B), have been formally described, further morphological analysis and a molecular analysis of the mitochondrial cytochrome oxidase I (mtDNA-COI) gene revealed inconsistencies and suggested that the variation between the morphotypes is intraspecific and that the morphotypes might belong to the same species. Methods: We examined the sequence of the ITS2-rDNA of Iranian specimens of P. sergenti s.l., comprising P. cf sergenti, P. cf similis, and intermediate morphotypes, together with available data in GenBank. Results: Sequence analysis showed 5.2% variation among P. sergenti s.l. morphotypes. Almost half of the variation was due to the number of AT microsatellite repeats in the center of the spacer. Nine haplotypes were found in the species, forming three main lineages corresponding to the origin of the colonies located in southwest (SW), northeast (NE), and northwest-center-southeast (NCS). Lineages NCS and NE included both typical P. cf sergenti and P. cf similis and intermediate morphotypes. Conclusion: Phylogenetic sequence analysis revealed that, except for one Iranian sample, which was close to the European samples, other Iranian haplotypes were associated with the northeastern Mediterranean populations including Turkey, Cyprus, Syria, and Pakistan. Similar to the sequences of mtDNA COI gene, ITS2 sequences could not resolve P. sergenti from P. similis and did not support the possible existence of sibling species or subspecies within P. sergenti s.l. PMID:28032098

  11. High-resolution heteronuclear multi-dimensional NMR spectroscopy in magnetic fields with unknown spatial variations.

    PubMed

    Zhang, Zhiyong; Huang, Yuqing; Smith, Pieter E S; Wang, Kaiyu; Cai, Shuhui; Chen, Zhong

    2014-05-01

    Heteronuclear NMR spectroscopy is an extremely powerful tool for determining the structures of organic molecules and is of particular significance in the structural analysis of proteins. In order to leverage the method's potential for structural investigations, obtaining high-resolution NMR spectra is essential and this is generally accomplished by using very homogeneous magnetic fields. However, there are several situations where magnetic field distortions, and thus line broadening, are unavoidable; for example, the samples under investigation may be inherently heterogeneous, and the magnet's homogeneity may be poor. This line broadening can hinder resonance assignment or even render it impossible. We put forth a new class of pulse sequences for obtaining high-resolution heteronuclear spectra in magnetic fields with unknown spatial variations based on distant dipolar field modulations. This strategy's capabilities are demonstrated with the acquisition of high-resolution 2D gHSQC and gHMBC spectra. These sequences' performances are evaluated on the basis of their sensitivities and acquisition efficiencies. Moreover, we show that by encoding and decoding NMR observables spatially, as is done in ultrafast NMR, an extra dimension containing J-coupling information can be obtained without increasing the time necessary to acquire a heteronuclear correlation spectrum. Since the new sequences relax magnetic field homogeneity constraints imposed upon high-resolution NMR, they may be applied in portable NMR sensors and studies of heterogeneous chemical and biological materials.

  12. Characterization of mitochondrial control region in Merlucciidae: sequence variation and molecular phylogeny.

    PubMed

    Crous, Marta; Roldán, María I

    2015-06-12

    In order to describe the structure and evolution of the mitochondrial control region in Merlucciidae and related Gadiformes, we analysed 470 bp of 31 taxa belonging to 28 different species. The general structure and conserved sequence blocks observed in the Gadiformes mitochondrial control region are similar to those present in other teleost fishes. The length of this segment is variable among related species due to the presence of numerous indels at domain I. Domain II is the most conserved region with a high G content. The GTGGG-box is absent in all Merluccius and seven other Gadidae species. Several methods of phylogenetic analysis have revealed the monophyly of Gadiformes, Gadinae and Merlucciidae. Merlucciidae is most closely related to Gadidae. Within Merlucciidae, American and Euroafrican clades show similar levels of differentiation to those within Gadinae where Trisopterus and Micromesistius are sister taxa. Genetic distance values for Merluccius subspecies pairs are less than half of those between species, comparable to intraspecific differentiation levels in marine fish species.

  13. Variation of the internal transcribed spacer 1 sequence within individual strains and among different strains of Neospora caninum.

    PubMed

    Gondim, Luis F P; Laski, Paul; Gao, Liying; McAllister, Milton M

    2004-02-01

    Small differences have been reported in the internal transcribed spacer 1 (ITS1) region among strains of Neospora caninum. We compared ITS1 sequences among 6 N. caninum strains analyzed in our laboratory, including 2 strains that have not been examined previously (NC-Illinois and NC-Bahia). Five sequences showed 100% similarity and also were identical to 7 of 11 sequences that were previously reported by others. In contrast, initial attempts to sequence the ITS1 of NC-Bahia generated 12 nucleotide differences compared with the other 5 strains, and several ambiguous bases. However, the single band containing the ITS1 region, as observed after electrophoresis on a 2% agarose gel, became divided into 2 distinct bands when reanalyzed using 5 or 10% polyacrylamide gel electrophoresis (PAGE), and the ITS1 within these separate bands were sequenced without ambiguity. The other 5 N. caninum strains were also reexamined using PAGE, and in each strain 2 distinct bands were discovered. In comparison, 2 strains of Toxoplasma gondii continued to show only 1 band when examined using PAGE. The ITS1 sequence of NC-Bahia, from Brazil, differs in several base pairs from those of North American and European strains of N. caninum. Intrastrain variation of the ITS1 region appears to be common in N. caninum, in contrast to T. gondii.

  14. DNA sequence variation of wild barley Hordeum spontaneum (L.) across environmental gradients in Israel.

    PubMed

    Bedada, G; Westerbergh, A; Nevo, E; Korol, A; Schmid, K J

    2014-06-01

    Wild barley Hordeum spontaneum (L.) shows a wide geographic distribution and ecological diversity. A key question concerns the spatial scale at which genetic differentiation occurs and to what extent it is driven by natural selection. The Levant region exhibits a strong ecological gradient along the North-South axis, with numerous small canyons in an East-West direction and with small-scale environmental gradients on the opposing North- and South-facing slopes. We sequenced 34 short genomic regions in 54 accessions of wild barley collected throughout Israel and from the opposing slopes of two canyons. The nucleotide diversity of the total sample is 0.0042, which is about two-thirds of a sample from the whole species range (0.0060). Thirty accessions collected at 'Evolution Canyon' (EC) at Nahal Oren, close to Haifa, have a nucleotide diversity of 0.0036, and therefore harbor a large proportion of the genetic diversity. There is a high level of genetic clustering throughout Israel and within EC, which roughly differentiates the slopes. Accessions from the hot and dry South-facing slope have significantly reduced genetic diversity and are genetically more distinct from accessions from the North-facing slope, which are more similar to accessions from other regions in Northern Israel. Statistical population models indicate that wild barley within the EC consists of three separate genetic clusters with substantial gene flow. The data indicate a high level of population structure at large and small geographic scales that shows isolation-by-distance, and is also consistent with ongoing natural selection contributing to genetic differentiation at a small geographic scale.
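
    Nucleotide diversity, the statistic quoted above (0.0042, 0.0060, 0.0036), is the average number of pairwise differences per site among sampled sequences. The sketch below computes it for a small alignment. It is a simplified illustration, not the estimator used in the study: the function name is hypothetical, sequences are assumed to be pre-aligned and of equal length, and gaps or missing data are not treated specially.

```python
from itertools import combinations

def nucleotide_diversity(aligned_seqs):
    """Average pairwise differences per site across all sequence pairs."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(aligned_seqs, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(seq1, seq2)) for seq1, seq2 in pairs
    )
    return total_differences / (len(pairs) * length)

# Three toy accessions, four sites, one variable site.
pi = nucleotide_diversity(["AAAA", "AAAT", "AAAA"])
```

    Here two of the three pairs differ at one of four sites, so the statistic is 2 / (3 × 4).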

  15. Phylogenetic lineage of Tobacco leaf curl virus in Korea and estimation of recombination events implicated in their sequence variation.

    PubMed

    Park, Jungan; Lee, Hyejung; Kim, Mi-Kyung; Kwak, Hae-Ryun; Auh, Chung-Kyoon; Lee, Kyeong-Yeoll; Kim, Sunghan; Choi, Hong-Soo; Lee, Sukchan

    2011-08-01

    New strains of Tobacco leaf curl virus (TbLCV) were isolated from tomato plants in four different local communities of Korea, and hence were designated TbLCV-Kr. Phylogenetic analysis of the sequences of the whole genome and of individual ORFs of these viruses indicated that they are closely related to the Tobacco leaf curl Japan virus (TbLCJV) cluster, which includes Honeysuckle yellow vein virus (HYVV), Honeysuckle yellow vein mosaic virus (HYVMV), and TbLCJV isolates. Four putative recombination events were recognized within these virus sequences, suggesting that the sequence variations observed in these viruses may be attributable to intraspecific and interspecific recombination events involving some TbLCV-Kr isolates, Papaya leaf curl virus (PaLCV), and a local isolate of Tomato yellow leaf curl virus (TYLCV).

  16. The Effects of Sequence Variation on Genome-wide NRF2 Binding—New Target Genes and Regulatory SNPs

    PubMed Central

    Kuosmanen, Suvi M.; Viitala, Sari; Laitinen, Tuomo; Peräkylä, Mikael; Pölönen, Petri; Kansanen, Emilia; Leinonen, Hanna; Raju, Suresh; Wienecke-Baldacchino, Anke; Närvänen, Ale; Poso, Antti; Heinäniemi, Merja; Heikkinen, Sami; Levonen, Anna-Liisa

    2016-01-01

    Transcription factor binding specificity is crucial for proper target gene regulation. Motif discovery algorithms identify the main features of the binding patterns, but the accuracy on the lower affinity sites is often poor. Nuclear factor E2-related factor 2 (NRF2) is a ubiquitous redox-activated transcription factor having a key protective role against endogenous and exogenous oxidant and electrophile stress. Herein, we decipher the effects of sequence variation on the DNA binding sequence of NRF2, in order to identify both genome-wide binding sites for NRF2 and disease-associated regulatory SNPs (rSNPs) with drastic effects on NRF2 binding. Interactions between NRF2 and DNA were studied using molecular modelling, and NRF2 chromatin immunoprecipitation-sequence datasets together with protein binding microarray measurements were utilized to study binding sequence variation in detail. The binding model thus generated was used to identify genome-wide binding sites for NRF2, and genomic binding sites with rSNPs that have strong effects on NRF2 binding and reside on active regulatory elements in human cells. As a proof of concept, miR-126–3p and -5p were identified as NRF2 target microRNAs, and a rSNP (rs113067944) residing on NRF2 target gene (Ferritin, light polypeptide, FTL) promoter was experimentally verified to decrease NRF2 binding and result in decreased transcriptional activity. PMID:26826707

  17. Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement

    SciTech Connect

    Le Coq, Johanne; Ghosh, Partho

    2012-06-19

    Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (~16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented approximate 10²⁰ potential variability of TvpA. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation.

  18. A De Novo Genome Sequence Assembly of the Arabidopsis thaliana Accession Niederzenz-1 Displays Presence/Absence Variation and Strong Synteny

    PubMed Central

    Pucker, Boas; Holtgräwe, Daniela; Rosleff Sörensen, Thomas; Stracke, Ralf; Viehöver, Prisca

    2016-01-01

    Arabidopsis thaliana is the most important model organism for fundamental plant biology. The genome diversity of different accessions of this species has been intensively studied, for example in the 1001 genome project which led to the identification of many single nucleotide polymorphisms (SNPs) and small insertions and deletions (InDels). In addition, presence/absence variation (PAV), copy number variation (CNV) and mobile genetic elements contribute to genomic differences between A. thaliana accessions. To address larger genome rearrangements between the A. thaliana reference accession Columbia-0 (Col-0) and another accession of about average distance to Col-0, we created a de novo next generation sequencing (NGS)-based assembly from the accession Niederzenz-1 (Nd-1). The result was evaluated with respect to assembly strategy and synteny to Col-0. We provide a high quality genome sequence of the A. thaliana accession (Nd-1, LXSY01000000). The assembly displays an N50 of 0.590 Mbp and covers 99% of the Col-0 reference sequence. Scaffolds from the de novo assembly were positioned on the basis of sequence similarity to the reference. Errors in this automatic scaffold anchoring were manually corrected based on analyzing reciprocal best BLAST hits (RBHs) of genes. Comparison of the final Nd-1 assembly to the reference revealed duplications and deletions (PAV). We identified 826 insertions and 746 deletions in Nd-1. Randomly selected candidates of PAV were experimentally validated. Our Nd-1 de novo assembly allowed reliable identification of larger genic and intergenic variants, which was difficult or error-prone by short read mapping approaches alone. While overall sequence similarity as well as synteny is very high, we detected short and larger (affecting more than 100 bp) differences between Col-0 and Nd-1 based on bi-directional comparisons. The de novo assembly provided here and additional assemblies that will certainly be published in the future will allow to
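
    The N50 statistic quoted in this abstract (0.590 Mbp) is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of the standard calculation follows; the function name and the toy lengths are assumptions for illustration and are unrelated to the Nd-1 data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings the running sum to half
        # of the total assembly length.
        if running * 2 >= total:
            return length
    raise ValueError("no lengths given")

toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

    For the toy lengths the total is 54, and the three longest scaffolds (10, 9, 8) sum to 27, exactly half, so the N50 is 8.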

  19. High-Throughput Sequencing: A Roadmap Toward Community Ecology

    PubMed Central

    Poisot, Timothée; Péquin, Bérangère; Gravel, Dominique

    2013-01-01

    High-throughput sequencing is becoming increasingly important in microbial ecology, yet it is surprisingly under-used to generate or test biogeographic hypotheses. In this contribution, we highlight how adding these methods to the ecologist's toolbox will allow the detection of new patterns, and will help our understanding of the structure and dynamics of diversity. Starting with a review of ecological questions that can be addressed, we move on to the technical and analytical issues that will benefit from an increased collaboration between different disciplines. PMID:23610649

  20. Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.

    PubMed

    Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil

    2015-07-17

    In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.

  1. Reprint of "Identification of staphylococcal species based on variations in protein sequences (mass spectrometry) and DNA sequence (sodA microarray)".

    PubMed

    Kooken, Jennifer; Fox, Karen; Fox, Alvin; Altomare, Diego; Creek, Kim; Wunschel, David; Pajares-Merino, Sara; Martínez-Ballesteros, Ilargi; Garaizar, Javier; Oyarzabal, Omar; Samadpour, Mansour

    2014-01-01

    This report is among the first using sequence variation in newly discovered protein markers for staphylococcal (or indeed any other bacterial) speciation. Variation, at the DNA sequence level, in the sodA gene (commonly used for staphylococcal speciation) provided excellent correlation. Relatedness among strains was also assessed by protein profiling using microcapillary electrophoresis and pulsed field electrophoresis. A total of 64 strains were analyzed, including reference strains representing the 11 staphylococcal species most commonly isolated from man (Staphylococcus aureus and 10 coagulase negative species [CoNS]). Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI TOF MS) and liquid chromatography-electrospray ionization tandem mass spectrometry (LC ESI MS/MS) were used for peptide analysis of proteins isolated from gel bands. Comparison of experimental spectra of unknowns versus spectra of peptides derived from reference strains allowed bacterial identification after MALDI TOF MS analysis. After LC-MS/MS analysis of gel bands, bacterial speciation was performed by comparing experimental spectra versus virtual spectra using the software X!Tandem. Finally, LC-MS/MS was performed on whole proteomes, with data analysis also employing X!Tandem. Aconitate hydratase and oxoglutarate dehydrogenase served as marker proteins on focused analysis after gel separation. Alternatively, on full proteomics analysis, elongation factor Tu generally provided the highest confidence in staphylococcal speciation.

  2. Structural mechanisms underlying sequence-dependent variations in GAG affinities of decorin binding protein A, a Borrelia burgdorferi adhesin.

    PubMed

    Morgan, Ashli M; Wang, Xu

    2015-05-01

    Decorin-binding protein A (DBPA) is an important surface adhesin of the bacterium Borrelia burgdorferi, the causative agent of Lyme disease. DBPA facilitates the bacteria's colonization of human tissue by adhering to glycosaminoglycan (GAG), a sulfated polysaccharide. Interestingly, DBPA sequence variation among different strains of Borrelia spirochetes is high, resulting in significant differences in their GAG affinities. However, the structural mechanisms contributing to these differences are unknown. We determined the solution structures of DBPAs from strain N40 of B. burgdorferi and strain PBr of Borrelia garinii, two DBPA variants whose GAG affinities deviate significantly from strain B31, the best characterized version of DBPA. Our structures revealed that significant differences exist between PBr DBPA and B31/N40 DBPAs. In particular, the C-terminus of PBr DBPA, unlike C-termini from B31 and N40 DBPAs, is positioned away from the GAG-binding pocket and the linker between helices one and two of PBr DBPA is highly structured and retracted from the GAG-binding pocket. The repositioning of the C-terminus allowed the formation of an extra GAG-binding epitope in PBr DBPA and the retracted linker gave GAG ligands more access to the GAG-binding epitopes than other DBPAs. Characterization of GAG ligands' interactions with wild-type (WT) PBr and mutants confirmed the importance of the second major GAG-binding epitope and established the fact that the two epitopes are independent of one another and the new epitope is as important to GAG binding as the traditional epitope.

  3. Characterizing novel endogenous retroviruses from genetic variation inferred from short sequence reads

    PubMed Central

    Mourier, Tobias; Mollerup, Sarah; Vinner, Lasse; Hansen, Thomas Arn; Kjartansdóttir, Kristín Rós; Guldberg Frøslev, Tobias; Snogdal Boutrup, Torsten; Nielsen, Lars Peter; Willerslev, Eske; Hansen, Anders J.

    2015-01-01

    From Illumina sequencing of DNA from brain and liver tissue from the lion, Panthera leo, and tumor samples from the pike-perch, Sander lucioperca, we obtained two assembled sequence contigs with similarity to known retroviruses. Phylogenetic analyses suggest that the pike-perch retrovirus belongs to the epsilonretroviruses, and the lion retrovirus to the gammaretroviruses. To determine if these novel retroviral sequences originate from an endogenous retrovirus or from a recently integrated exogenous retrovirus, we assessed the genetic diversity of the parental sequences from which the short Illumina reads are derived. First, we showed by simulations that we can robustly infer the level of genetic diversity from short sequence reads. Second, we find that the measures of nucleotide diversity inferred from our retroviral sequences significantly exceed the level observed from Human Immunodeficiency Virus infections, prompting us to conclude that the novel retroviruses are both of endogenous origin. Through further simulations, we rule out the possibility that the observed elevated levels of nucleotide diversity are the result of co-infection with two closely related exogenous retroviruses. PMID:26493184

  4. Optimization of shRNA inhibitors by variation of the terminal loop sequence.

    PubMed

    Schopman, Nick C T; Liu, Ying Poi; Konstantinova, Pavlina; ter Brake, Olivier; Berkhout, Ben

    2010-05-01

    Gene silencing by RNA interference (RNAi) can be achieved by intracellular expression of a short hairpin RNA (shRNA) that is processed into the effective small interfering RNA (siRNA) inhibitor by the RNAi machinery. Previous studies indicate that shRNA molecules do not always reflect the activity of corresponding synthetic siRNAs that attack the same target sequence. One obvious difference between these two effector molecules is the hairpin loop of the shRNA. Most studies use the original shRNA design of the pSuper system, but no extensive study regarding optimization of the shRNA loop sequence has been performed. We tested the impact of different hairpin loop sequences, varying in size and structure, on the activity of a set of shRNAs targeting HIV-1. We were able to transform weak inhibitors into intermediate or even strong shRNA inhibitors by replacing the loop sequence. We demonstrate that the efficacy of these optimized shRNA inhibitors is improved significantly in different cell types due to increased siRNA production. These results indicate that the loop sequence is an essential part of the shRNA design. The optimized shRNA loop sequence is generally applicable for RNAi knockdown studies, and will allow us to develop a more potent gene therapy against HIV-1.

  5. Multiplexed Metagenomic Deep Sequencing To Analyze the Composition of High-Priority Pathogen Reagents

    PubMed Central

    Wilson, Michael R.; Stenglein, Mark D.; Olejnik, Judith; Rennick, Linda J.; Nambulli, Sham; Feldmann, Friederike; Duprex, W. Paul

    2016-01-01

    ABSTRACT Laboratories studying high-priority pathogens need comprehensive methods to confirm microbial species and strains while also detecting contamination. Metagenomic deep sequencing (MDS) inventories nucleic acids present in laboratory stocks, providing an unbiased assessment of pathogen identity, the extent of genomic variation, and the presence of contaminants. Double-stranded cDNA MDS libraries were constructed from RNA extracted from in vitro-passaged stocks of six viruses (La Crosse virus, Ebola virus, canine distemper virus, measles virus, human respiratory syncytial virus, and vesicular stomatitis virus). Each library was dual indexed and pooled for sequencing. A custom bioinformatics pipeline determined the organisms present in each sample in a blinded fashion. Single nucleotide variant (SNV) analysis identified viral isolates. We confirmed that (i) each sample contained the expected microbe, (ii) dual indexing of the samples minimized false assignments of individual sequences, (iii) multiple viral and bacterial contaminants were present, and (iv) SNV analysis of the viral genomes allowed precise identification of the viral isolates. MDS can be multiplexed to allow simultaneous and unbiased interrogation of mixed microbial cultures and (i) confirm pathogen identity, (ii) characterize the extent of genomic variation, (iii) confirm the cell line used for virus propagation, and (iv) assess for contaminating microbes. These assessments ensure the true composition of these high-priority reagents and generate a comprehensive database of microbial genomes studied in each facility. MDS can serve as an integral part of a pathogen-tracking program which in turn will enhance sample security and increase experimental rigor and precision. IMPORTANCE Both the integrity and reproducibility of experiments using select agents depend in large part on unbiased validation to ensure the correct identity and purity of the species in question. Metagenomic deep sequencing

  6. [Mitochondrial DNA sequence variation, demographic history, and population structure of Amur sturgeon Acipenser schrenckii Brandt, 1869].

    PubMed

    Shedko, S V; Miroshnichenko, I L; Nemkova, G A; Koshelev, V N; Shedko, M B

    2015-02-01

    The variability of the mtDNA control region (D-loop) was examined in Amur sturgeon endemic to the Amur River. This species is also classified as critically endangered by the IUCN Red List of Threatened Species. Sequencing of 796- to 812-bp fragments of the D-loop in 112 sturgeon collected in the Lower Amur revealed 73 different genotypes. The sample was characterized by a high level of haplotypic (0.976) and nucleotide (0.0194) diversity. The identified haplotypes split into two well-defined monophyletic groups, BG (n = 39) and SM (n = 34), differing (HKY distance) on average by 3.41% of nucleotide positions upon an average level of intragroup differences of 0.54 and 1.23%, respectively. Moreover, the haplotypes of the SM group differed by the presence of a 13-14 bp deletion. Most of the samples (66 out of 112) carried BG haplotypes. Overall, the pattern of pairwise nucleotide differences and the results of neutrality tests, as well as the results of tests for compliance with the model of sudden demographic expansion or with the model of exponential growth, pointed to a past significant increase in the number of Amur sturgeon, which was most clearly manifested in the analysis of data on the BG haplogroup. The constructed Bayesian skyline plots showed that this growth began about 18 to 16 thousand years ago. At present, the effective size of the strongly reduced (due to overharvesting) population of Amur sturgeon may be equal to or even lower than it was before the beginning of this growth during the Last Glacial Maximum. The presence in the mitochondrial gene pool of Amur sturgeon of two haplogroups, their unequal evolutionary dynamics, and, judging by scanty data, their unequal representation in the Russian and Chinese parts of the Amur River basin point to the possible existence of at least two distinct populations of Amur sturgeon in the past.
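    The two summary statistics quoted in this abstract, haplotypic and nucleotide diversity, can be illustrated with a minimal sketch. It assumes Nei's unbiased estimator for haplotype diversity and the average proportion of pairwise differences for nucleotide diversity; the abstract does not state which estimators or software the authors used. The function names and the toy sequences are hypothetical, and the sequences are assumed to be pre-aligned, equal in length, and free of gaps (the real D-loop fragments contain indels, which this sketch does not handle).

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(sequences):
    """Mean proportion of differing sites over all pairs of sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    total_diffs = 0
    pairs = 0
    for a, b in combinations(sequences, 2):
        total_diffs += sum(x != y for x, y in zip(a, b))
        pairs += 1
    return total_diffs / (pairs * length)


# Hypothetical toy alignment (not data from the study).
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
print(f"haplotype diversity = {h:.4f}, nucleotide diversity = {pi:.4f}")
```

    With real data, an alignment step and an explicit treatment of gaps would be needed before either function is meaningful.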

  7. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon

    PubMed Central

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, ‘SCNU1154’, ‘Edisto47’, ‘MR-1’, and ‘PMR5’. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon. PMID:27311063

  8. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    PubMed

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.

  9. High order total variation method for interior tomography

    NASA Astrophysics Data System (ADS)

    Yang, Jiansheng; Yu, Hengyong; Cong, Wenxiang; Jiang, Ming; Wang, Ge

    2012-10-01

    While classic CT theory targets exact reconstruction of a whole cross-section or an entire object, practical applications often focus on a region of interest (ROI). The long-standing interior problem is well known: an internal ROI cannot be exactly reconstructed only from truncated projection data associated with x-rays through the ROI. Although lambda tomography was developed to target gradient-like features of an internal ROI for the interior problem, it has not been well accepted in the biomedical community. On the other hand, approximate local reconstruction methods are subject to biases and artifacts. Recently, the interior problem has been revisited with appropriate prior knowledge, delivering practical results. First, the interior problem can be exactly and stably solved if a sub-region in an ROI is known. Thereafter, the sub-region knowledge can be replaced by certain rather weak constraints. For local reconstruction, a candidate image can be represented as the sum of the truth and an ambiguity component. Very surprisingly, the ROI image is proved to be the unique minimizer of the total variation (TV) or high order total variation (HOT) functional subject to the measurement, if the ROI is piece-wise constant or polynomial. Interior tomography algorithms based on HOT minimization have been developed for x-ray CT, and then extended for interior SPECT and interior differential phase-contrast tomography, respectively. In this paper, we will summarize the main theoretical and algorithmic results.

  10. High-speed lossless compression for angiography image sequences

    NASA Astrophysics Data System (ADS)

    Kennedy, Jonathon M.; Simms, Michael; Kearney, Emma; Dowling, Anita; Fagan, Andrew; O'Hare, Neil J.

    2001-05-01

    High speed processing of large amounts of data is a requirement for many diagnostic quality medical imaging applications. A demanding example is the acquisition, storage and display of image sequences in angiography. The functional performance requirements for handling angiography data were identified. A new lossless image compression algorithm was developed, implemented in C++ for the Intel Pentium/MS-Windows environment and optimized for speed of operation. Speeds of up to 6M pixels per second for compression and 12M pixels per second for decompression were measured. This represents an improvement of up to 400% over the next best high-performance algorithm (LOCO-I) without significant reduction in compression ratio. Performance tests were carried out at St. James's Hospital using actual angiography data. Results were compared with the lossless JPEG standard and other leading methods such as JPEG-LS (LOCO-I) and the lossless wavelet approach proposed for JPEG 2000. Our new algorithm represents a significant improvement in the performance of lossless image compression technology without using specialized hardware. It has been applied successfully to image sequence decompression at video rate for angiography, one of the most challenging application areas in medical imaging.

  11. Comparative analysis of methods used to define eustatic variations in outcrop: Late Cambrian interbasinal sequence development

    SciTech Connect

    Osleger, D.; Read, J.F.

    1993-03-01

    Interbasinal correlation of Late Cambrian cyclic carbonates from the Appalachian and Cordilleran passive margins, the Texas craton, and the southern Oklahoma aulacogen defines six major third-order depositional sequences. Graphic correlation of biostratigraphically-constrained strata was used to establish equivalency of stratigraphic sequences between the individual sections. Relatively isochronous biomere boundaries were used as time datums for lithostratigraphic correlation. Although the individual sections are composed of different types of meter-scale cycles and component lithofacies that reflect the various environmental settings of the localities, the overall upward-shallowing character of individual sequences is evident. The sequences are: late Cedaria, mid-Crepicephalus, late Crepicephalus, Aphelaspis to earliest Elvinia, Elvinia to early Saukia, and Saukia to the Cambrian-Ordovician boundary. Interbasinal correlation of stratigraphic sequences permits an evaluation of quantitative techniques for determining accommodation history. Correlation of Fischer plots of cyclic successions from separate basins supports a eustatic control of Late Cambrian sequence development. R2/R3 curves derived from subsidence analysis of the Late Cambrian sections provide good resolution of the second- and third-order scales of accommodation change, and interbasinal correlations of R2/R3 curves also support eustatic control on sequence development. Comparing the accommodation curves and subsidence analysis with paleobathymetric trends of Late Cambrian cyclic strata suggests that the curves may approximate the form of the eustatic sealevel signal. A composite eustatic sealevel curve for Late Cambrian time in North America was created by qualitatively combining the accommodation curves defined by the different techniques for each of the four localities. 129 refs., 16 figs., 3 tabs.

  12. Source quality variations tied to sequence development: Integration of physical and chemical aspects, Lower to Middle Triassic, western Barents Sea

    SciTech Connect

    Bohacs, K.M.; Isaksen, G.H.

    1991-03-01

    Triassic mudrocks from the Barents Sea area demonstrate the covariance of physical and chemical properties of mudrocks deposited in shelfal environments and the aspect of depositional sequences in distal settings. The tie of physical parameters to chemical character within a detailed sequence-stratigraphic framework enables the construction of depositional-facies models to predict organic-matter content and quality. This allows the explorer to more closely constrain and predict the nature of potential source rocks using seismic and well-log data. Changes in lithology, bedding geometry, sedimentary structures, body and trace-fossil assemblages, and inorganic, bulk-organic, and molecular geochemistry revealed the detailed depositional environments. The depositional environments stack predictably, according to their position in the depositional sequence: from aerobic lower-shoreface to offshore-transition environments in lowstand systems tracts to dysaerobic-anaerobic distal open-marine-shelf environments in transgressive and early highstand systems tracts. Quantitative molecular geochemistry also revealed variations within this distal setting and strong covariance with sequence position. Input of organic matter from terrigenous higher plants dominates the lowstands, whereas marine-algal organic matter is most prevalent within transgressive and highstand systems tracts. Specifically, the abundance of C30 steranes, total steranes, and moretane reflected development of the sequences.

  13. A combination of PhP typing and β-d-glucuronidase gene sequence variation analysis for differentiation of Escherichia coli from humans and animals.

    PubMed

    Masters, N; Christie, M; Katouli, M; Stratton, H

    2015-06-01

    We investigated the usefulness of the β-d-glucuronidase gene variance in Escherichia coli as a microbial source tracking tool using a novel algorithm for comparison of sequences from a prescreened set of host-specific isolates using a high-resolution PhP typing method. A total of 65 common biochemical phenotypes belonging to 318 E. coli strains isolated from humans and domestic and wild animals were analysed for nucleotide variations at 10 loci along a 518 bp fragment of the 1812 bp β-d-glucuronidase gene. Neighbour-joining analysis of loci variations revealed that 86 (76.8%) human isolates and 91.2% of animal isolates were correctly identified. Pairwise hierarchical clustering improved assignment, with 92 (82.1%) human and 204 (99%) animal strains assigned to their respective cluster. Our data show that initial typing of isolates and selection of common types from different hosts prior to analysis of the β-d-glucuronidase gene sequence improves source identification. We also concluded that numerical profiling of the nucleotide variations can be used as a valuable approach to differentiate human from animal E. coli. This study signifies the usefulness of the β-d-glucuronidase gene as a marker for differentiating human faecal pollution from animal sources.

  14. The use of museum specimens with high-throughput DNA sequencers

    PubMed Central

    Burrell, Andrew S.; Disotell, Todd R.; Bergey, Christina M.

    2015-01-01

    Natural history collections have long been used by morphologists, anatomists, and taxonomists to probe the evolutionary process and describe biological diversity. These biological archives also offer great opportunities for genetic research in taxonomy, conservation, systematics, and population biology. They allow assays of past populations, including those of extinct species, giving context to present patterns of genetic variation and direct measures of evolutionary processes. Despite this potential, museum specimens are difficult to work with because natural postmortem processes and preservation methods fragment and damage DNA. These problems have restricted geneticists’ ability to use natural history collections primarily by limiting how much of the genome can be surveyed. Recent advances in DNA sequencing technology, however, have radically changed this, making truly genomic studies from museum specimens possible. We review the opportunities and drawbacks of the use of museum specimens, and suggest how to best execute projects when incorporating such samples. Several high-throughput (HT) sequencing methodologies, including whole genome shotgun sequencing, sequence capture, and restriction digests (demonstrated here), can be used with archived biomaterials. PMID:25532801

  15. The use of museum specimens with high-throughput DNA sequencers.

    PubMed

    Burrell, Andrew S; Disotell, Todd R; Bergey, Christina M

    2015-02-01

    Natural history collections have long been used by morphologists, anatomists, and taxonomists to probe the evolutionary process and describe biological diversity. These biological archives also offer great opportunities for genetic research in taxonomy, conservation, systematics, and population biology. They allow assays of past populations, including those of extinct species, giving context to present patterns of genetic variation and direct measures of evolutionary processes. Despite this potential, museum specimens are difficult to work with because natural postmortem processes and preservation methods fragment and damage DNA. These problems have restricted geneticists' ability to use natural history collections primarily by limiting how much of the genome can be surveyed. Recent advances in DNA sequencing technology, however, have radically changed this, making truly genomic studies from museum specimens possible. We review the opportunities and drawbacks of the use of museum specimens, and suggest how to best execute projects when incorporating such samples. Several high-throughput (HT) sequencing methodologies, including whole genome shotgun sequencing, sequence capture, and restriction digests (demonstrated here), can be used with archived biomaterials.

  16. Rapid multiplexed genotyping of simple tandem repeats using capture and high-throughput sequencing

    PubMed Central

    Guilmatre, Audrey; Highnam, Gareth; Borel, Christelle; Mittelman, David; Sharp, Andrew J.

    2013-01-01

    Although simple tandem repeats (STRs) comprise ~2% of the human genome and represent an important source of polymorphism, this class of variation remains understudied. We have developed a cost-effective strategy for performing targeted enrichment of STR regions that utilizes capture probes targeting the flanking sequences of STR loci, enabling specific capture of DNA fragments containing STRs for subsequent high-throughput sequencing. Utilizing a capture design targeting 6,243 STR loci <94 bp and multiplexing eight individuals in a single Illumina HiSeq2000 sequencing lane, we were able to call genotypes in at least one individual for 67.5% of the targeted STRs. We observed a strong relationship between (G+C) content and genotyping rate. STRs with moderate (G+C) content were recovered with >90% success rate, while only 12% of STRs with ≥80% (G+C) were genotyped in our assay. Analysis of a parent-offspring trio, complete hydatidiform mole samples, repeat analyses of the same individual, and Sanger sequencing-based validation indicated genotyping error rates between 7.6% and 12.4%. The majority of such errors were a single repeat unit at mono- or dinucleotide repeats. Altogether, our STR capture assay represents a cost-effective method that enables multiplexed genotyping of thousands of STR loci suitable for large scale population studies. PMID:23696428
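    The reported relationship between (G+C) content and genotyping rate can be illustrated with a small sketch that splits loci at the 80% (G+C) threshold mentioned in the abstract and computes the fraction genotyped in each group. This is only an illustration under stated assumptions: the function names, the two-bin split, and the toy loci are hypothetical and are not taken from the authors' pipeline.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(b in "GC" for b in bases) / len(bases)


def genotyping_rate_by_gc(loci, threshold=0.8):
    """Return genotyping rates for loci at or above vs. below a GC threshold.

    loci: iterable of (sequence, was_genotyped) pairs.
    A rate is None when a group contains no loci.
    """
    tallies = {"high_gc": [0, 0], "other": [0, 0]}  # [genotyped, total]
    for seq, was_genotyped in loci:
        group = "high_gc" if gc_fraction(seq) >= threshold else "other"
        tallies[group][0] += int(was_genotyped)
        tallies[group][1] += 1
    return {
        group: (done / total if total else None)
        for group, (done, total) in tallies.items()
    }


# Hypothetical toy loci (not data from the study).
toy_loci = [
    ("GCGCGCGCGC", False),
    ("GCGCGCGCGG", True),
    ("ATATGCAT", True),
    ("ACACACAC", True),
]
rates = genotyping_rate_by_gc(toy_loci)
print(rates)
```

    A finer binning of (G+C) content would show the trend in more detail; two groups are used here only to mirror the threshold quoted above.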

  17. Associations between sequence variations in the mitochondrial DNA D-loop region and outcome of hepatocellular carcinoma

    PubMed Central

    LI, SHILAI; WAN, PEIQI; PENG, TAO; XIAO, KAIYIN; SU, MING; SHANG, LIMING; XU, BANGHAO; SU, ZHIXIONG; YE, XINPING; PENG, NING; QIN, QUANLIN; LI, LEQUN

    2016-01-01

    The association between mitochondrial DNA (mtDNA) polymorphisms or mutations and the prognoses of cancer has been investigated previously, but the results have been ambiguous. In the present study, the associations between sequence variations in the mtDNA D-loop region and the outcomes of patients with hepatocellular carcinoma (HCC) were analysed. A total of 140 patients with HCC (123 males and 17 females), who were hospitalised to undergo radical resection, were studied. Polymerase chain reaction and direct sequencing were performed to detect the sequence variations in the mtDNA D-loop region. Multivariate and univariate analyses were conducted to determine important factors in the prognosis of HCC. A total of 150 point sequence variations were observed in the 140 cases (13 point mutations, 8 insertions, 20 deletions and 116 polymorphisms). The variation rate was 13.4% (150/1,122). mtDNA nucleotide 150 (C/T) was an independent factor in the logistic regression for early/late recurrence of HCC. Patients with 150T appeared to have later recurrences. In a Cox proportional hazards regression model, hepatitis B virus DNA, Child-Pugh class, differentiation degree, tumour-node-metastasis (TNM) stage, nucleotide 16263 (T/C) and nucleotide 315 (N/insertion C) were independent factors for tumour-free survival time. Patients with the 16263T allele had a greater tumour-free survival time than patients with the 16263C allele. Similarly, patients with 315 insertion C had a superior tumour-free survival time when compared with patients with 315 N (normal). In the Cox proportional hazards regression model, recurrence type (early/late), Child-Pugh class, TNM stage and adjuvant treatment after tumour recurrence (none or one/more than one treatment) were independent factors for overall survival. None of the mtDNA variations served as independent factors. Patients with late recurrence, Child-Pugh class A, and low TNM stages and/or those who received more than one adjuvant treatment

  18. Estimation of Response Functions Based on Variational Bayes Algorithm in Dynamic Images Sequences

    PubMed Central

    2016-01-01

    We proposed a nonparametric Bayesian model based on the variational Bayes algorithm to estimate the response functions in dynamic medical imaging. In dynamic renal scintigraphy, the impulse response or retention functions are rather complicated and finding a suitable parametric form is problematic. In this paper, we estimated the response functions using nonparametric Bayesian priors. These priors were designed to favor desirable properties of the functions, such as sparsity or smoothness. These assumptions were used within hierarchical priors of the variational Bayes algorithm. We applied our algorithm to a real online dataset of dynamic renal scintigraphy. The results demonstrated that this algorithm improved the estimation of response functions with nonparametric priors. PMID:27631007

  19. Analysis of genetic variation within clonal lineages of grape phylloxera (Daktulosphaira vitifoliae Fitch) using AFLP fingerprinting and DNA sequencing.

    PubMed

    Vorwerk, S; Forneck, A

    2007-07-01

    Two AFLP fingerprinting methods were employed to estimate the potential of AFLP fingerprints for the detection of genetic diversity within single founder lineages of grape phylloxera (Daktulosphaira vitifoliae Fitch). Eight clonal lineages, reared under controlled conditions in a greenhouse and reproducing asexually throughout a minimum of 15 generations, were monitored and mutations were scored as polymorphisms between the founder individual and individuals of succeeding generations. Genetic variation was detected within all lineages, from early generations on. Six to 15 polymorphic loci (from a total of 141 loci) were detected within the lineages, making up 4.3% of the total amount of genetic variation. The presence of contaminating extra-genomic sequences (e.g., viral material, bacteria, or ingested chloroplast DNA) was excluded as a source of intraclonal variation. Sequencing of 37 selected polymorphic bands confirmed their origin in mostly noncoding regions of the grape phylloxera genome. AFLP techniques were revealed to be powerful for the identification of reproducible banding patterns within clonal lineages.
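
    The scoring step described above (polymorphisms between the founder and individuals of later generations) can be sketched as follows. This is only an illustration under assumptions: each fingerprint is represented as a list of band presence/absence values, and the toy lineage is hypothetical. The numbers are chosen so that 6 of 141 loci differ, which happens to give 4.3%; the abstract does not state how its own figure was derived.

```python
def polymorphic_loci(founder, descendants):
    """Return indices of loci where any descendant's band state differs from the founder's."""
    return [
        i for i, state in enumerate(founder)
        if any(d[i] != state for d in descendants)
    ]


def with_changes(profile, loci):
    """Copy a band profile and flip presence/absence at the given loci."""
    changed = list(profile)
    for i in loci:
        changed[i] = 1 - changed[i]
    return changed


# Hypothetical lineage: 141 loci, founder shows a band at every locus.
founder = [1] * 141
descendants = [
    with_changes(founder, [3, 17]),          # an early generation
    with_changes(founder, [3, 40, 41]),      # a later generation
    with_changes(founder, [17, 99, 140]),    # a still later generation
]

loci = polymorphic_loci(founder, descendants)
percent = round(100 * len(loci) / len(founder), 1)
print(len(loci), "polymorphic loci,", percent, "% of all loci")
```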

  20. Signatures of DNA flexibility, interactions and sequence-related structural variations in classical X-ray diffraction patterns

    PubMed Central

    Kornyshev, A. A.; Lee, D. J.; Wynveen, A.; Leikin, S.

    2011-01-01

    The theory of X-ray diffraction from ideal, rigid helices allowed Watson and Crick to unravel the DNA structure, thereby elucidating functions encoded in it. Yet, as we know now, the DNA double helix is neither ideal nor rigid. Its structure varies with the base pair sequence. Its flexibility leads to thermal fluctuations and allows molecules to adapt their structure to optimize their intermolecular interactions. In addition to the double helix symmetry revealed by Watson and Crick, classical X-ray diffraction patterns of DNA contain information about the flexibility, interactions and sequence-related variations encoded within the helical structure. To extract this information, we have developed a new diffraction theory that accounts for these effects. We show how double helix non-ideality and fluctuations broaden the diffraction peaks. Meridional intensity profiles of the peaks at the first three helical layer lines reveal information about structural adaptation and intermolecular interactions. The meridional width of the fifth layer line peaks is inversely proportional to the helical coherence length that characterizes sequence-related and thermal variations in the double helix structure. Analysis of measured fiber diffraction patterns based on this theory yields important parameters that control DNA structure, packing and function. PMID:21593127

  1. Signatures of DNA flexibility, interactions and sequence-related structural variations in classical X-ray diffraction patterns.

    PubMed

    Kornyshev, A A; Lee, D J; Wynveen, A; Leikin, S

    2011-09-01

    The theory of X-ray diffraction from ideal, rigid helices allowed Watson and Crick to unravel the DNA structure, thereby elucidating functions encoded in it. Yet, as we know now, the DNA double helix is neither ideal nor rigid. Its structure varies with the base pair sequence. Its flexibility leads to thermal fluctuations and allows molecules to adapt their structure to optimize their intermolecular interactions. In addition to the double helix symmetry revealed by Watson and Crick, classical X-ray diffraction patterns of DNA contain information about the flexibility, interactions and sequence-related variations encoded within the helical structure. To extract this information, we have developed a new diffraction theory that accounts for these effects. We show how double helix non-ideality and fluctuations broaden the diffraction peaks. Meridional intensity profiles of the peaks at the first three helical layer lines reveal information about structural adaptation and intermolecular interactions. The meridional width of the fifth layer line peaks is inversely proportional to the helical coherence length that characterizes sequence-related and thermal variations in the double helix structure. Analysis of measured fiber diffraction patterns based on this theory yields important parameters that control DNA structure, packing and function.

  2. Metagenomic study of the oral microbiota by Illumina high-throughput sequencing

    PubMed Central

    Lazarevic, Vladimir; Whiteson, Katrine; Huse, Susan; Hernandez, David; Farinelli, Laurent; Østerås, Magne; Schrenzel, Jacques; François, Patrice

    2013-01-01

    To date, metagenomic studies have relied on the utilization and analysis of reads obtained using 454 pyrosequencing to replace conventional Sanger sequencing. After extensively scanning the 16S ribosomal RNA (rRNA) gene, we identified the V5 hypervariable region as a short region providing reliable identification of bacterial sequences available in public databases such as the Human Oral Microbiome Database. We amplified samples from the oral cavity of three healthy individuals using primers covering an ~82-base segment of the V5 loop, and sequenced using the Illumina technology in a single orientation. We identified 135 genera or higher taxonomic ranks from the resulting 1,373,824 sequences. While the abundances of the most common phyla (Firmicutes, Proteobacteria, Actinobacteria, Fusobacteria and TM7) are largely comparable to previous studies, Bacteroidetes were less present. Potential sources for this difference include classification bias in this region of the 16S rRNA gene, human sample variation, sample preparation and primer bias. Using an Illumina sequencing approach, we achieved a much greater depth of coverage than previous oral microbiota studies, allowing us to identify several taxa not yet discovered in these types of samples, and to assess that at least 30,000 additional reads would be required to identify only one additional phylotype. The evolution of high-throughput sequencing technologies, and their subsequent improvements in read length enable the utilization of different platforms for studying communities of complex flora. Access to large amounts of data is already leading to a better representation of sample diversity at a reasonable cost. PMID:19796657

  3. The complete mitochondrial genome of a purebred Tibetan Mastiff (Canis lupus familiaris breed Tibetan Mastiff) from Lijiang, China, and comparison of genome-wide sequence variations.

    PubMed

    Deng, Li Xin; He, Cong

    2016-01-01

    In this study, the complete mitochondrial genome sequence of the Tibetan Mastiff was reported. The total length of the mitogenome is 16,729 bp. It has the typical structure, comprising 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and 1 control region, in line with other canine animals. We further identified genome-wide variations among different canine mitochondrial genomes and showed that the D-loop region harbors the most sequence variation, which will provide sequence variation information for the protection and utilization of the Tibetan Mastiff germplasm resource.

  4. High Throughput Sequencing of Extracellular RNA from Human Plasma

    PubMed Central

    Danielson, Kirsty M.; Rubio, Renee; Abderazzaq, Fieda; Das, Saumya; Wang, Yaoyu E.

    2017-01-01

    The presence and relative stability of extracellular RNAs (exRNAs) in biofluids has led to an emerging recognition of their promise as ‘liquid biopsies’ for diseases. Most prior studies on discovery of exRNAs as disease-specific biomarkers have focused on microRNAs (miRNAs) using technologies such as qRT-PCR and microarrays. The recent application of next-generation sequencing to discovery of exRNA biomarkers has revealed the presence of potential novel miRNAs as well as other RNA species such as tRNAs, snoRNAs, piRNAs and lncRNAs in biofluids. At the same time, the use of RNA sequencing for biofluids poses unique challenges, including low amounts of input RNAs, the presence of exRNAs in different compartments with varying degrees of vulnerability to isolation techniques, and the high abundance of specific RNA species (thereby limiting the sensitivity of detection of less abundant species). Moreover, discovery in human diseases often relies on archival biospecimens of varying age and limiting amounts of samples. In this study, we have tested RNA isolation methods to optimize profiling exRNAs by RNA sequencing in individuals without any known diseases. Our findings are consistent with other recent studies that detect microRNAs and ribosomal RNAs as the major exRNA species in plasma. Similar to other recent studies, we found that the landscape of biofluid microRNA transcriptome is dominated by several abundant microRNAs that appear to comprise conserved extracellular miRNAs. There is reasonable correlation of sets of conserved miRNAs across biological replicates, and even across other data sets obtained at different investigative sites. Conversely, the detection of less abundant miRNAs is far more dependent on the exact methodology of RNA isolation and profiling. This study highlights the challenges in detecting and quantifying less abundant plasma miRNAs in health and disease using RNA sequencing platforms. PMID:28060806

  5. A pedigree-based study of mitochondrial D-loop DNA sequence variation among Arabian horses.

    PubMed

    Bowling, A T; Del Valle, A; Bowling, M

    2000-02-01

    Through DNA sequence comparisons of a mitochondrial D-loop hypervariable region, we investigated matrilineal diversity for Arabian horses in the United States. Sixty-two horses were tested. From published pedigrees they traced in the maternal line to 34 mares acquired primarily in the mid to late 19th century from nomadic Bedouin tribes. Compared with the reference sequence (GenBank X79547), these samples showed 27 haplotypes with altogether 31 base substitution sites within 397 bp of sequence. Based on examination of pedigrees from a random sampling of 200 horses in current studbooks of the Arabian Horse Registry of America, we estimated that this study defined the expected mtDNA haplotypes for at least 89% of Arabian horses registered in the US. The reliability of the studbook recorded maternal lineages of Arabian pedigrees was demonstrated by haplotype concordance among multiple samplings in 14 lines. Single base differences observed within two maternal lines were interpreted as representing alternative fixations of past heteroplasmy. The study also demonstrated the utility of mtDNA sequence studies to resolve historical maternity questions without access to biological material from the horses whose relationship was in question, provided that representatives of the relevant female lines were available for comparison. The data call into question the traditional assumption that Arabian horses of the same strain necessarily share a common maternal ancestry.
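
    The comparison against a reference sequence described above can be illustrated with a short sketch. It assumes the sequences are already aligned and of equal length; the sequences and horse names are invented for illustration and are not data from the study.

```python
def substitution_sites(reference, sample):
    """List (1-based position, base) pairs where the sample differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return tuple(
        (pos, base)
        for pos, (ref_base, base) in enumerate(zip(reference, sample), start=1)
        if ref_base != base
    )


def summarize(reference, samples):
    """Group samples by haplotype and collect every position that varies."""
    haplotypes = {}
    for name, seq in samples.items():
        haplotypes.setdefault(substitution_sites(reference, seq), []).append(name)
    variable_sites = sorted({pos for hap in haplotypes for pos, _ in hap})
    return haplotypes, variable_sites


# Hypothetical 10-bp alignment (the study used 397 bp of D-loop sequence).
reference = "ACGTACGTAC"
samples = {
    "horse_a": "ACGTACGTAC",  # identical to the reference
    "horse_b": "ACATACGTAC",  # one substitution
    "horse_c": "ACATACGTAC",  # same haplotype as horse_b
    "horse_d": "ACATACGTTC",  # two substitutions
}

haplotypes, variable_sites = summarize(reference, samples)
print(len(haplotypes), "haplotypes;", len(variable_sites), "substitution sites")
```

    Samples that share a haplotype end up in the same list, which is the kind of concordance check the abstract describes for horses tracing to the same maternal line.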

  6. Oxytocin receptor gene sequences in owl monkeys and other primates show remarkable interspecific regulatory and protein coding variation.

    PubMed

    Babb, Paul L; Fernandez-Duque, Eduardo; Schurr, Theodore G

    2015-10-01

    The oxytocin (OT) hormone pathway is involved in numerous physiological processes, and one of its receptor genes (OXTR) has been implicated in pair bonding behavior in mammalian lineages. This observation is important for understanding social monogamy in primates, which occurs in only a small subset of taxa, including Azara's owl monkey (Aotus azarae). To examine the potential relationship between social monogamy and OXTR variation, we sequenced its 5' regulatory (4936bp) and coding (1167bp) regions in 25 owl monkeys from the Argentinean Gran Chaco, and examined OXTR sequences from 1092 humans from the 1000 Genomes Project. We also assessed interspecific variation of OXTR in 25 primate and rodent species that represent a set of phylogenetically and behaviorally disparate taxa. Our analysis revealed substantial variation in the putative 5' regulatory region of OXTR, with marked structural differences across primate taxa, particularly for humans and chimpanzees, which exhibited unique patterns of large motifs of dinucleotide A+T repeats upstream of the OXTR 5' UTR. In addition, we observed a large number of amino acid substitutions in the OXTR CDS region among New World primate taxa that distinguish them from Old World primates. Furthermore, primate taxa traditionally defined as socially monogamous (e.g., gibbons, owl monkeys, titi monkeys, and saki monkeys) all exhibited different amino acid motifs for their respective OXTR protein coding sequences. These findings support the notion that monogamy has evolved independently in Old World and New World primates, and that it has done so through different molecular mechanisms, not exclusively through the oxytocin pathway.

  7. High throughput sequencing reveals a novel fabavirus infecting sweet cherry.

    PubMed

    Villamor, D E V; Pillai, S S; Eastwell, K C

    2017-03-01

    The genus Fabavirus currently consists of five species represented by viruses that infect a wide range of hosts but none reported from temperate climate fruit trees. A virus with genomic features resembling fabaviruses (tentatively named Prunus virus F, PrVF) was revealed by high throughput sequencing of extracts from a sweet cherry tree (Prunus avium). PrVF was subsequently shown to be graft transmissible and further identified in three other non-symptomatic Prunus spp. from different geographical locations. Two genetic variants of RNA1 and RNA2 coexisted in the same samples. RNA1 consisted of 6,165 and 6,163 nucleotides, and RNA2 consisted of 3,622 and 3,468 nucleotides.

  8. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    PubMed

    Dilthey, Alexander T; Gourraud, Pierre-Antoine; Mentzer, Alexander J; Cereb, Nezih; Iqbal, Zamin; McVean, Gil

    2016-10-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a significant

  9. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs

    PubMed Central

    Dilthey, Alexander T.; Gourraud, Pierre-Antoine; McVean, Gil

    2016-01-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30–250 CPU hours per sample) remain a significant
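
    The third step above (choosing the most likely pair of alleles at a locus) can be sketched with a toy likelihood. This is not the HLA*PRG model: the per-base error rate, the assumption that each read spans the whole locus and comes from either allele with equal probability, and the allele names A1 to A3 are all assumptions made for illustration.

```python
import math
from itertools import combinations_with_replacement

ERROR_RATE = 0.01  # assumed per-base sequencing error rate


def read_prob(read, allele, error=ERROR_RATE):
    """Probability of observing the read if it was sequenced from this allele."""
    p = 1.0
    for r, a in zip(read, allele):
        p *= (1 - error) if r == a else error / 3
    return p


def pair_log_likelihood(reads, allele1, allele2):
    """Each read is assumed to come from either allele with probability 1/2."""
    return sum(
        math.log(0.5 * read_prob(r, allele1) + 0.5 * read_prob(r, allele2))
        for r in reads
    )


def best_pair(reads, alleles):
    """Return the (possibly homozygous) allele pair with the highest likelihood."""
    return max(
        combinations_with_replacement(sorted(alleles), 2),
        key=lambda pair: pair_log_likelihood(reads, alleles[pair[0]], alleles[pair[1]]),
    )


# Hypothetical alleles and reads for a single short locus.
alleles = {"A1": "ACGTAC", "A2": "ACGTTC", "A3": "TCGTAC"}
reads = ["ACGTAC"] * 4 + ["ACGTTC"] * 4

call = best_pair(reads, alleles)
print(call)
```

    Enumerating every pair is feasible only for a small toy allele set; with a database of thousands of alleles a real implementation would need a more economical search.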

  10. Sequence variations at the HLA-linked olfactory receptor cluster do not influence female preferences for male odors

    PubMed Central

    Thompson, Emma E; Haller, Gabe; Pinto, Jayant M; Sun, Ying; Zelano, Bethanne; Jacob, Suma; McClintock, Martha K.; Nicolae, Dan L.; Ober, Carole

    2013-01-01

    We previously reported that paternally-inherited human leukocyte antigen (HLA) alleles are a template for women's preference for male odors (P = 0.0007). However, it has been suggested that sequence variation in a nearby olfactory receptor (OR) cluster on chromosome 6p influences smell preference. To determine if the HLA-linked OR genes contribute to previously observed HLA-mediated behaviors, we use the odor preference data from our earlier study in combination with a new resequencing study of four functional HLA-linked OR genes in the same subjects. Our results indicate that OR alleles in the genes surveyed are not in linkage disequilibrium (LD) with HLA variation and do not explain the previous findings of HLA-associated odor preference. PMID:19833159

  11. Identification of eight mutations and three sequence variations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene

    SciTech Connect

    Ghanem, N.; Costes, B.; Girodon, E.; Martin, J.; Fanen, P.; Goossens, M.

    1994-05-15

    To determine cystic fibrosis (CF) defects in a sample of 224 non-ΔF508 CF chromosomes, the authors used denaturing gradient gel multiplex analysis of CF transmembrane conductance regulator gene segments, a strategy based on blind exhaustive analysis rather than a search for known mutations. This process allowed detection of 11 novel variations comprising two nonsense mutations (Q890X and W1204X), a splice defect (405+4 A→G), a frameshift (3293delA), four presumed missense mutations (S912L, H949Y, L1065P, Q1071P), and three sequence polymorphisms (R31C or 223 C/T, 3471 T/C, and T1220I or 3791 C/T). The authors describe these variations, together with the associated phenotype when defects on both CF chromosomes were identified. 8 refs., 1 fig., 1 tab.

  12. Genome diversity in Brachypodium distachyon: deep sequencing of highly diverse inbred lines.

    PubMed

    Gordon, Sean P; Priest, Henry; Des Marais, David L; Schackwitz, Wendy; Figueroa, Melania; Martin, Joel; Bragg, Jennifer N; Tyler, Ludmila; Lee, Cheng-Ruei; Bryant, Doug; Wang, Wenqin; Messing, Joachim; Manzaneda, Antonio J; Barry, Kerrie; Garvin, David F; Budak, Hikmet; Tuna, Metin; Mitchell-Olds, Thomas; Pfender, William F; Juenger, Thomas E; Mockler, Todd C; Vogel, John P

    2014-08-01

    Brachypodium distachyon is a small annual grass that has been adopted as a model for the grasses. Its small genome, high-quality reference genome, large germplasm collection, and selfing nature make it an excellent subject for studies of natural variation. We sequenced six divergent lines to identify a comprehensive set of polymorphisms and analyze their distribution and concordance with gene expression. Multiple methods and controls were utilized to identify polymorphisms and validate their quality. mRNA-Seq experiments under control and simulated drought-stress conditions identified 300 genes with a genotype-dependent treatment response. We showed that large-scale sequence variants had extremely high concordance with altered expression of hundreds of genes, including many with genotype-dependent treatment responses. We generated a deep mRNA-Seq dataset for the most divergent line and created a de novo transcriptome assembly. This led to the discovery of >2400 previously unannotated transcripts and hundreds of genes not present in the reference genome. We built a public database for visualization and investigation of sequence variants among these widely used inbred lines.

  13. Evaluation of sequencing approaches for high-throughput ...

    EPA Pesticide Factsheets

    Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. We present the evaluation of three toxicogenomics platforms for potential application to high-throughput screening: 1. TempO-Seq utilizing custom designed paired probes per gene; 2. Targeted sequencing (TSQ) utilizing Illumina’s TruSeq RNA Access Library Prep Kit containing tiled exon-specific probe sets; 3. Low coverage whole transcriptome sequencing (LSQ) using Illumina’s TruSeq Stranded mRNA Kit. Each platform was required to cover the ~20,000 genes of the full transcriptome, operate directly with cell lysates, and be automatable with 384-well plates. Technical reproducibility was assessed using MAQC control RNA samples A and B, while functional utility for chemical screening was evaluated using six treatments at a single concentration after 6 hr in MCF7 breast cancer cells: 10 µM chlorpromazine, 10 µM ciclopirox, 10 µM genistein, 100 nM sirolimus, 1 µM tanespimycin, and 1 µM trichostatin A. All RNA samples and chemical treatments were run with 5 technical replicates. The three platforms achieved different read depths, with the TempO-Seq having ~34M mapped reads per sample, while TSQ and LSQ averaged 20M and 11M aligned reads per sample, respectively. Inter-replicate correlation averaged ≥0.95 for raw log2 expression values i

  14. Patterns of structural and sequence variation within isotype lineages of the Neisseria meningitidis transferrin receptor system

    PubMed Central

    Adamiak, Paul; Calmettes, Charles; Moraes, Trevor F; Schryvers, Anthony B

    2015-01-01

    Neisseria meningitidis inhabits the human upper respiratory tract and is an important cause of sepsis and meningitis. A surface receptor composed of transferrin-binding proteins A and B (TbpA and TbpB) is responsible for acquiring iron from host transferrin. Sequence and immunological diversity divides TbpBs into two distinct lineages: isotype I and isotype II. Two representative isotype I and II strains, B16B6 and M982, differ in their dependence on TbpB for in vitro growth on exogenous transferrin. The crystal structure of TbpB and a structural model for TbpA from the representative isotype I N. meningitidis strain B16B6 were obtained. The structures were integrated with a comprehensive analysis of the sequence diversity of these proteins to probe for potential functional differences. A distinct isotype I TbpA was identified that co-varied with TbpB and lacked sequence in the region for the loop 3 α-helix that is proposed to be involved in iron removal from transferrin. The tightly associated isotype I TbpBs had a distinct anchor peptide region, a distinct, smaller linker region between the lobes and lacked the large loops in the isotype II C-lobe. Sequences of the intact TbpB, the TbpB N-lobe, the TbpB C-lobe, and TbpA were subjected to phylogenetic analyses. The phylogenetic clustering of TbpA and the TbpB C-lobe were similar with two main branches comprising the isotype I and isotype II TbpBs, possibly suggesting an association between TbpA and the TbpB C-lobe. The intact TbpB and TbpB N-lobe had 4 main branches, one consisting of the isotype I TbpBs. One isotype II TbpB cluster appeared to consist of isotype I N-lobe sequences and isotype II C-lobe sequences, indicating the swapping of N-lobes and C-lobes. Our findings should inform future studies on the interaction between TbpB and TbpA and the process of iron acquisition. PMID:25800619

  15. Chimeric proteins for detection and quantitation of DNA mutations, DNA sequence variations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    Chimeric proteins having both DNA mutation binding activity and nuclease activity are synthesized by recombinant technology. The proteins are of the general formula A-L-B and B-L-A where A is a peptide having DNA mutation binding activity, L is a linker and B is a peptide having nuclease activity. The chimeric proteins are useful for detection and identification of DNA sequence variations including DNA mutations (including DNA damage and mismatches) by binding to the DNA mutation and cutting the DNA once the DNA mutation is detected.

  16. Correlates of substitution rate variation in mammalian protein-coding sequences

    PubMed Central

    2008-01-01

    Background: Rates of molecular evolution in different lineages can vary widely, and some of this variation might be predictable from aspects of species' biology. Investigating such predictable rate variation can help us to understand the causes of molecular evolution, and could also help to improve molecular dating methods. Here we present a comprehensive study of the life history correlates of substitution rate variation across the mammals, comparing results for mitochondrial and nuclear loci, and for synonymous and non-synonymous sites. We use phylogenetic comparative methods, refined to take into account the special nature of substitution rate data. Particular attention is paid to the widespread correlations between the components of mammalian life history, which can complicate the interpretation of results. Results: We find that mitochondrial synonymous substitution rates, estimated from the 9 longest mitochondrial genes, show strong negative correlations with body mass and with maximum recorded lifespan. But lifespan is the sole variable to remain after multiple regression and model simplification. Nuclear synonymous substitution rates, estimated from 6 genes, show strong negative correlations with body mass and generation time, and a strong positive correlation with fecundity. In contrast to the mitochondrial results, the same trends are evident in rates of nonsynonymous substitution. Conclusion: A substantial proportion of variation in mammalian substitution rates can be explained by aspects of their life history, implying that molecular and life history evolution are closely interlinked in this group. The strength and consistency of the nuclear body mass effect suggests that molecular dating studies may have been systematically misled, but also that methods could be improved by incorporating the finding as a priori information. Mitochondrial synonymous rates also show the body mass effect, but for apparently quite different reasons, and the strength of the

  17. Mitochondrial DNA Sequence Variation in North Atlantic Long-Finned Pilot Whales, Globicephala melas

    DTIC Science & Technology

    1994-06-01

    Strongylocentrotus purpuratus and S. droebachiensis. Evolution 44: 403-415. Rosel, P.E. (1992). Genetic population structure and systematic relationships of ... reproduce and distribute copies of this thesis document in whole or in part. Signature of Author: Joint Program in Oceanography, Massachusetts Institute ... variation used in the studies described in this chapter include: 1) Genetic distance (d, p, S, or D) is a measure of the number of nucleotide

  18. Amplification and thrifty single-molecule sequencing of recurrent somatic structural variations

    PubMed Central

    Patel, Anand; Schwab, Richard; Liu, Yu-Tsueng; Bafna, Vineet

    2014-01-01

    Deletion of tumor-suppressor genes as well as other genomic rearrangements pervade cancer genomes across numerous types of solid tumor and hematologic malignancies. However, even for a specific rearrangement, the breakpoints may vary between individuals, such as the recurrent CDKN2A deletion. Characterizing the exact breakpoints for structural variants (SVs) is useful for designating patient-specific tumor biomarkers. We propose AmBre (Amplification of Breakpoints), a method to target SV breakpoints occurring in samples composed of heterogeneous tumor and germline DNA. Additionally, AmBre validates SVs called by whole-exome/genome sequencing and hybridization arrays. AmBre involves a PCR-based approach to amplify the DNA segment containing an SV's breakpoint and then confirms breakpoints using sequencing by Pacific Biosciences RS. To amplify breakpoints with PCR, primers tiling specified target regions are carefully selected with a simulated annealing algorithm to minimize off-target amplification and maximize efficiency at capturing all possible breakpoints within the target regions. To confirm correct amplification and obtain breakpoints, PCR amplicons are combined without barcoding and simultaneously long-read sequenced using a single SMRT cell. Our algorithm efficiently separates reads based on breakpoints. Each read group supporting the same breakpoint corresponds with an amplicon and a consensus amplicon sequence is called. AmBre was used to discover CDKN2A deletion breakpoints in cancer cell lines: A549, CEM, Detroit562, MOLT4, MCF7, and T98G. Also, we successfully assayed RUNX1–RUNX1T1 reciprocal translocations by finding both breakpoints in the Kasumi-1 cell line. AmBre successfully targets SVs where DNA harboring the breakpoints is present in 1:1000 mixtures. PMID:24307551
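
    The read-separation and consensus steps described above can be illustrated with a minimal sketch. It is not the AmBre algorithm: it assumes each read already carries an estimated breakpoint coordinate, that reads within a group are aligned to equal length, and that a fixed tolerance (a hypothetical parameter) decides whether two reads support the same breakpoint.

```python
from collections import Counter


def group_by_breakpoint(reads, tolerance=5):
    """Group (position, sequence) reads whose sorted positions lie within
    `tolerance` of the previous read in the same group (single-linkage)."""
    groups = []
    for pos, seq in sorted(reads):
        if groups and pos - groups[-1][-1][0] <= tolerance:
            groups[-1].append((pos, seq))
        else:
            groups.append([(pos, seq)])
    return groups


def consensus(sequences):
    """Majority vote per column over equal-length, pre-aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sequences))


# Hypothetical reads: estimated breakpoint coordinate and a short aligned sequence.
reads = [
    (1000, "ACGTA"),
    (1002, "ACGTA"),
    (1001, "ACCTA"),  # carries one sequencing error
    (2500, "TTGCA"),
    (2503, "TTGCA"),
]

groups = group_by_breakpoint(reads)
calls = [(g[0][0], consensus([s for _, s in g])) for g in groups]
print(calls)
```

    Each tuple in `calls` pairs the smallest coordinate in a group with that group's consensus sequence; the single erroneous base is outvoted by the other two reads.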

  19. Seasonal diversity and dynamics of haptophytes in the Skagerrak, Norway, explored by high-throughput sequencing

    PubMed Central

    Egge, Elianne Sirnæs; Johannessen, Torill Vik; Andersen, Tom; Eikrem, Wenche; Bittner, Lucie; Larsen, Aud; Sandaa, Ruth-Anne; Edvardsen, Bente

    2015-01-01

    Microalgae in the division Haptophyta play key roles in the marine ecosystem and in global biogeochemical processes. Despite their ecological importance, knowledge of seasonal dynamics, community composition and abundance at the species level is limited due to their small cell size and few morphological features visible under the light microscope. Here, we present unique data on haptophyte seasonal diversity and dynamics from two annual cycles, with the taxonomic resolution and sampling depth obtained with high-throughput sequencing. From outer Oslofjorden, S Norway, nano- and picoplanktonic samples were collected monthly for 2 years, and the haptophytes targeted by amplification of RNA/cDNA with Haptophyta-specific 18S rDNA V4 primers. We obtained 156 operational taxonomic units (OTUs), from c. 400,000 454-pyrosequencing reads, after rigorous bioinformatic filtering and clustering at 99.5%. Most OTUs represented uncultured and/or not yet 18S rDNA-sequenced species. Haptophyte OTU richness and community composition exhibited high temporal variation and significant yearly periodicity. Richness was highest in September–October (autumn) and lowest in April–May (spring). Some taxa were detected all year, such as Chrysochromulina simplex, Emiliania huxleyi and Phaeocystis cordata, whereas most calcifying coccolithophores only appeared from summer to early winter. We also revealed the seasonal dynamics of OTUs representing putative novel classes (clades HAP-3–5) or orders (clades D, E, F). Season, light and temperature accounted for 29% of the variation in OTU composition. Residual variation may be related to biotic factors, such as competition and viral infection. This study provides new, in-depth knowledge on seasonal diversity and dynamics of haptophytes in North Atlantic coastal waters. PMID:25893259
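
    To make "clustering at 99.5%" concrete, the sketch below groups reads into OTUs with a greedy centroid scheme. The abstract does not name the clustering algorithm or tool, so the greedy strategy, the assumption of pre-aligned equal-length reads, and the helper names (`identity`, `greedy_otus`, `mutate`) are illustrative only.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.995):
    """Put each read in the first OTU whose centroid it matches, else open a new OTU."""
    centroids, clusters = [], []
    for read in reads:
        for idx, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                clusters[idx].append(read)
                break
        else:  # no centroid was close enough
            centroids.append(read)
            clusters.append([read])
    return clusters


def mutate(seq, n):
    """Toy helper: substitute the first n bases with a different base."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[c] for c in seq[:n]) + seq[n:]


base = "ACGT" * 100  # 400-nt toy amplicon
reads = [base, mutate(base, 1), mutate(base, 4)]  # 100%, 99.75% and 99% identity
otus = greedy_otus(reads)
print([len(c) for c in otus])
```

    At a 99.5% threshold a 400-nt read tolerates at most two mismatches against its centroid, which is why the read with four substitutions founds a second OTU.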

  20. Population genomics of Pacific lamprey: adaptive variation in a highly dispersive species.

    PubMed

    Hess, Jon E; Campbell, Nathan R; Close, David A; Docker, Margaret F; Narum, Shawn R

    2013-06-01

    Unlike most anadromous fishes that have evolved strict homing behaviour, Pacific lamprey (Entosphenus tridentatus) seem to lack philopatry as evidenced by minimal population structure across the species range. Yet unexplained findings of within-region population genetic heterogeneity coupled with the morphological and behavioural diversity described for the species suggest that adaptive genetic variation underlying fitness traits may be responsible. We employed restriction site-associated DNA sequencing to genotype 4439 quality filtered single nucleotide polymorphism (SNP) loci for 518 individuals collected across a broad geographical area including British Columbia, Washington, Oregon and California. A subset of putatively neutral markers (N = 4068) identified a significant amount of variation among three broad populations: northern British Columbia, Columbia River/southern coast and 'dwarf' adults (F(CT) = 0.02, P ≪ 0.001). Additionally, 162 SNPs were identified as adaptive through outlier tests, and inclusion of these markers revealed a signal of adaptive variation related to geography and life history. The majority of the 162 adaptive SNPs were not independent and formed four groups of linked loci. Analyses with matsam software found that 42 of these outlier SNPs were significantly associated with geography, run timing and dwarf life history, and 27 of these 42 SNPs aligned with known genes or highly conserved genomic regions using the genome browser available for sea lamprey. This study provides both neutral and adaptive context for observed genetic divergence among collections and thus reconciles previous findings of population genetic heterogeneity within a species that displays extensive gene flow.

  1. Continuous record of the last 30 ka of Paleosecular Variation in a turbiditic marine sedimentary sequence off the NW Iberian Margin

    NASA Astrophysics Data System (ADS)

    Rey, Daniel; Mohamed, Kais Jacob; Coimbra, Rute

    2014-05-01

    Past variations of the geomagnetic field at decadal to centennial scales are recorded with exceptional quality in lava flows, but these are discontinuous and therefore high temporal resolution analyses of paleosecular variation of the geomagnetic field (PSV) are difficult. For such purposes, marine sediments hold better potential since they are often regarded as continuous sedimentary archives of a range of environmental processes, in particular PSV. While this assumption is generally valid for the deep abyss, it may not necessarily be true for marginal settings and the vicinity of seamounts, where discontinuous sedimentary flows (e.g. turbidites) occur with a relatively high frequency. In this contribution, we present results from two gravity cores (TG8 and TG10) obtained from the flanks of the Galicia Bank, a structural high in the NW Iberian Margin. These cores consist mostly of a turbiditic sequence, with pelagic sedimentation recorded continuously over the last 16 ka. Contrary to what would be expected, Alternating Field demagnetization of the NRM showed a PSV record consistent with the behaviour of the geomagnetic field in this region, which could be correlated with a published record in the adjacent Portuguese Margin (Thouveny et al., 2004). These results show that even in an unstable marine sedimentary setting, affected by discontinuous mass flows and biological activity, the delayed and gradual lock-in of the magnetization allows for a continuous record of the geomagnetic field. References: Thouveny, N., Carcaillet, J., Moreno, E., Leduc, G., Nérini, D., 2004. Geomagnetic moment variation and paleomagnetic excursions since 400 kyr BP: a stacked record from sedimentary sequences of the Portuguese Margin. Earth and Planetary Science Letters 219, 377-396.

  2. A method for high-performance sequence analysis using polyvinylidene difluoride membranes with a biphasic reaction column sequencer.

    PubMed

    Reim, D F; Speicher, D W

    1994-01-01

    Methods have been developed for high-sensitivity sequence analysis of proteins electroblotted onto polyvinylidene difluoride (PVDF) membranes using a Hewlett-Packard G1005A protein sequencer. This sequencer normally uses a biphasic (hydrophobic/hydrophilic) reaction column which was designed to accommodate loading and cleanup of samples from diverse solutions. However, the standard column, programs, and chemistry were not designed to accommodate PVDF, which has become a common sequencing support. In this study, a systematic evaluation of the suitability of this sequencer for analysis using PVDF bound samples was performed and included evaluation of: different wash and extraction solvents, multiple programming changes, two alternative formulations of coupling reagents, and the effect of direction for solvent and reagent deliveries. High-performance analysis of PVDF bound samples was achieved by: using a modified reaction column with an empty hydrophobic (top) half of the column module, program modifications for the reaction column and converter, substitution of ethyl acetate for the standard S2/3 extraction solvent and using prototype Version 2.0 formulations of the coupling reagents, R1 and R2. High-performance sequence analyses of experimental samples electroblotted from either 1D or 2D gels onto high-retention PVDF membranes were obtained with a 41-min cycle time, including experimental samples with initial coupling yields < 2 pmol. Routine sequencer performance was comparable to, or slightly better than, a conventional gas-phase sequencer which had been previously optimized by us for high-performance sequence analysis of electroblotted samples in the low pmol range.

  3. An Analysis of Stimuli that Influence Compliance during the High-Probability Instruction Sequence

    ERIC Educational Resources Information Center

    Normand, Matthew P.; Kestner, Kathryn; Jessel, Joshua

    2010-01-01

    When we evaluated variables that influence the effectiveness of the high-probability (high-p) instruction sequence, the sequence was associated with a precipitous decrease in compliance with high-"p" instructions for 1 participant, thereby precluding continued use of the sequence. We investigated the reasons for this decrease. Stimuli associated…

  4. Sequence Variation in the T-Cell Epitopes of the Plasmodium falciparum Circumsporozoite Protein among Field Isolates Is Temporally Stable: a 5-Year Longitudinal Study in Southern Vietnam

    PubMed Central

    Jalloh, Amadu; van Thien, Huynh; Ferreira, Marcelo U.; Ohashi, Jun; Matsuoka, Hiroyuki; Kanbe, Toshio; Kikuchi, Akihiko; Kawamoto, Fumihiko

    2006-01-01

    In an effort to decipher the nature and extent of antigen polymorphisms of malaria parasites in a setting where malaria is hypomesoendemic, we conducted a 5-year longitudinal study (1998 to 2003) by sequencing the Th2R and Th3R epitopes of the circumsporozoite protein (CSP) of 142 Plasmodium falciparum field isolates from Bao Loc, Vietnam. Samples were collected during the high-transmission season, September through December 1998 (n = 43), as well as from July 2000 to August 2001 (n = 34), September 2001 to July 2002 (n = 33), and August 2002 to July 2003 (n = 32). Marked sequence diversity was noted during the high-transmission season in 1998, but no significant variation in allele frequencies was observed over the years (χ2 = 70.003, degrees of freedom = 57, P = 0.116). The apparent temporal stability in allele frequency observed in this Bao Loc malaria setting may suggest that polymorphism in the Th2R and Th3R epitopes is not maintained by frequency-dependent immune selection. By including 36 isolates from Flores Island, Indonesia, and 19 isolates from Thaton, Myanmar, we investigated geographical patterns of sequence polymorphism for these epitopes in Southeast Asia; among the characterized isolates, a globally distributed variant appears to be predominant in Vietnam (75 of 142 isolates, or 52.8%) as well as in Myanmar (15 of 19 isolates, or 78.9%) and Indonesia (31 of 36 isolates, or 86.1%). Further analyses involving worldwide CSP sequences revealed distinct regional patterns, a finding which, together with the unique mutations observed here, may suggest a possible role for host or local factors in the generation of sequence diversity in the T-cell epitopes of CSP. PMID:16597843
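
    The counts quoted in the abstract can be checked with a few lines of arithmetic. The sketch below recomputes the total sample size and the share of the predominant variant per country. The final comment about allele categories is an inference that assumes the χ2 test was a standard periods-by-alleles contingency test, which the abstract does not state.

```python
# Isolates per sampling period, as listed in the abstract.
period_sizes = [43, 34, 33, 32]
total_isolates = sum(period_sizes)

# (isolates carrying the predominant variant, isolates characterized)
counts = {"Vietnam": (75, 142), "Myanmar": (15, 19), "Indonesia": (31, 36)}
percent = {site: round(100 * k / n, 1) for site, (k, n) in counts.items()}

print(total_isolates)
print(percent)

# Assumption: for an r x c contingency table, df = (r - 1) * (c - 1).
# With r = 4 periods and df = 57, that would imply c = 57 / 3 + 1 = 20 allele categories.
implied_alleles = 57 // (len(period_sizes) - 1) + 1
print(implied_alleles)
```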

  5. Characterization of mitochondrial ribosomal RNA genes in gadiformes: sequence variations, secondary structural features, and phylogenetic implications.

    PubMed

    Bakke, Ingrid; Johansen, Steinar

    2002-10-01

    Secondary structure features of mitochondrial ribosomal RNAs (mt-rRNAs) of bony fishes were investigated by a DNA sequence alignment approach. The small subunit (SSU) and large subunit (LSU) mt-rRNA genes were found to contain several additional variable regions compared to their mammalian counterparts. Fish mt-LSU rRNA genes were found to be longer than their mammalian counterparts due to increased length of some of the variable regions. The 5' and 3' ends of Atlantic cod mt-rRNAs were precisely mapped. The 3' ends of mt-SSU rRNAs were found to be homogeneous and mono-adenylated, whereas those of the mt-LSU rRNAs were heterogeneous and oligo-adenylated. The 5' ends of mt-SSU rRNAs appeared to be heterogeneous, corresponding to the presumed first and second positions of the gene. Sequences of the central domain and the D-domain of the mt-SSU and mt-LSU rRNA genes, respectively, were determined and characterized for 11 gadiform species (representing the families Gadidae, Lotidae, Ranicipitidae, Merlucciidae, Phycidae, and Macrouridae) and one Lophiidae species. Detailed secondary structure models of the RNA regions are presented for the Atlantic cod (Gadus morhua) and Roundnose grenadier (Coryphaenoides rupestris). Saturation plots revealed that DNA nucleotide positions corresponding to unpaired RNA regions become saturated with transitions at sequence divergence levels of about 0.15. Phylogenetic analyses revealed some aspects of gadiform relationships. Gadidae was identified as the most derived of the gadiform families. Lotidae was found to be the family most closely related to Gadidae, and Ranicipitidae was also recognized as a derived gadiform taxon.
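
    A saturation plot typically sets the number of transitions (and transversions) between sequence pairs against their overall divergence. The sketch below computes those quantities for one aligned pair. It is a generic illustration, not the authors' procedure, and the example sequences are invented.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv(a, b):
    """Count transitions, transversions and compared sites for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    ts = tv = sites = 0
    for x, y in zip(a, b):
        if "-" in (x, y):  # skip alignment gaps
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ts, tv, sites


# Invented 8-nt example pair.
seq_a = "AAGCTTAC"
seq_b = "AGGTTCAA"
ts, tv, sites = ts_tv(seq_a, seq_b)
p_distance = (ts + tv) / sites  # uncorrected divergence
print(ts, tv, p_distance)
```

    Plotting `ts` against `p_distance` over many sequence pairs gives the saturation curve: once transitions stop increasing while divergence keeps growing, those sites are saturated.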

  6. Variational Bayesian strategies for high-dimensional, stochastic design problems

    SciTech Connect

    Koutsourelakis, P.S.

    2016-03-01

    This paper is concerned with a lesser-studied problem in the context of model-based, uncertainty quantification (UQ), that of optimization/design/control under uncertainty. The solution of such problems is hindered not only by the usual difficulties encountered in UQ tasks (e.g. the high computational cost of each forward simulation, the large number of random variables) but also by the need to solve a nonlinear optimization problem involving large numbers of design variables and potentially constraints. We propose a framework that is suitable for a class of such problems and is based on the idea of recasting them as probabilistic inference tasks. To that end, we propose a Variational Bayesian (VB) formulation and an iterative VB–Expectation-Maximization scheme that is capable of identifying a local maximum as well as a low-dimensional set of directions in the design space, along which, the objective exhibits the largest sensitivity. We demonstrate the validity of the proposed approach in the context of two numerical examples involving thousands of random and design variables. In all cases considered the cost of the computations in terms of calls to the forward model was of the order of 100 or less. The accuracy of the approximations provided is assessed by information-theoretic metrics.

  7. Investigating peptide sequence variations for 'double-click' stapled p53 peptides.

    PubMed

    Lau, Yu Heng; de Andrade, Peterson; Sköld, Niklas; McKenzie, Grahame J; Venkitaraman, Ashok R; Verma, Chandra; Lane, David P; Spring, David R

    2014-06-28

    Stapling peptides for inhibiting the p53/MDM2 interaction is a promising strategy for developing anti-cancer therapeutic leads. We evaluate double-click stapled peptides formed from p53-based diazidopeptides with different staple positions and azido amino acid side-chain lengths, determining the impact of these variations on MDM2 binding and cellular activity. We also demonstrate a K24R mutation, necessary for cellular activity in hydrocarbon-stapled p53 peptides, is not required for analogous 'double-click' peptides.

  8. DNA sequence-dependent variation in nucleosome structure, stability, and dynamics detected by a FRET-based analysis.

    PubMed

    Kelbauskas, L; Woodbury, N; Lohr, D

    2009-02-01

    Förster resonance energy transfer (FRET) techniques provide powerful and sensitive methods for the study of conformational features in biomolecules. Here, we review FRET-based studies of nucleosomes, focusing particularly on our work comparing the widely used nucleosome standard, 5S rDNA, and 2 promoter-derived regulatory element-containing nucleosomes, mouse mammary tumor virus (MMTV)-B and GAL10. Using several FRET approaches, we detected significant DNA sequence-dependent structure, stability, and dynamics differences among the three. In particular, 5S nucleosomes and 5S H2A/H2B-depleted nucleosomal particles have enhanced stability and diminished DNA dynamics, compared with MMTV-B and GAL10 nucleosomes and particles. H2A/H2B-depleted nucleosomes are of interest because they are produced by the activities of many transcription-associated complexes. Significant location-dependent (intranucleosomal) stability and dynamics variations were also observed. These also vary among nucleosome types. Nucleosomes restrict regulatory factor access to DNA, thereby impeding genetic processes. Eukaryotic cells possess mechanisms to alter nucleosome structure, to generate DNA access, but alterations often must be targeted to specific nucleosomes on critical regulatory DNA elements. By endowing specific nucleosomes with intrinsically higher DNA accessibility and (or) enhanced facility for conformational transitions, DNA sequence-dependent nucleosome dynamics and stability variations have the potential to facilitate nucleosome recognition and, thus, aid in the crucial targeting process. This and other nucleosome structure and function conclusions from FRET analyses are discussed.

  9. Autozygome Sequencing Expands the Horizon of Human Knockout Research and Provides Novel Insights into Human Phenotypic Variation

    PubMed Central

    Anazi, Shamsa; Alshamekh, Shomoukh; Alkuraya, Fowzan S.

    2013-01-01

    The use of autozygosity as a mapping tool in the search for autosomal recessive disease genes is well established. We hypothesized that autozygosity not only unmasks the recessiveness of disease-causing variants, but can also reveal natural knockouts of genes with less obvious phenotypic consequences. To test this hypothesis, we exome sequenced 77 well-phenotyped individuals born to first-cousin parents in search of genes that are biallelically inactivated. Using a very conservative estimate, we show that each of these individuals carries biallelic inactivation of 22.8 genes on average. For many of the 169 genes that appear to be biallelically inactivated, available data support involvement in modulating metabolism, immunity, perception, external appearance and other phenotypic aspects, and these genes therefore appear to contribute to human phenotypic variation. Other genes with biallelic inactivation may contribute through as yet unknown mechanisms or may be on their way to conversion into pseudogenes due to true recent dispensability. We conclude that sequencing the autozygome is an efficient way to map the contribution of genes to human phenotypic variation that goes beyond the classical definition of disease. PMID:24367280

  10. Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

    SciTech Connect

    Leung, Elo; Huang, Amy; Cadag, Eithon; Montana, Aldrin; Soliman, Jan Lorenz; Zhou, Carol L. Ecale

    2016-01-20

    In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.

  11. Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

    DOE PAGES

    Leung, Elo; Huang, Amy; Cadag, Eithon; ...

    2016-01-20

    In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.

  12. Functional and genetic analysis of haplotypic sequence variation at the nicastrin genomic locus.

    PubMed

    Hamilton, Gillian; Killick, Richard; Lambert, Jean-Charles; Amouyel, Philippe; Carrasquillo, Minerva M; Pankratz, V Shane; Graff-Radford, Neill R; Dickson, Dennis W; Petersen, Ronald C; Younkin, Steven G; Powell, John F; Wade-Martins, Richard

    2012-08-01

    Nicastrin (NCSTN) is a component of the γ-secretase complex and therefore potentially a candidate risk gene for Alzheimer's disease. Here, we have developed a novel functional genomics methodology to express common locus haplotypes to assess functional differences. DNA recombination was used to engineer 5 bacterial artificial chromosomes (BACs) to each express a different haplotype of the NCSTN locus. Each NCSTN-BAC was delivered to knockout nicastrin (Ncstn(-/-)) cells and clonal NCSTN-BAC(+)/Ncstn(-/-) cell lines were created for functional analyses. We showed that all NCSTN-BAC haplotypes expressed nicastrin protein and rescued γ-secretase activity and amyloid beta (Aβ) production in NCSTN-BAC(+)/Ncstn(-/-) lines. We then showed that genetic variation at the NCSTN locus affected alternative splicing in human postmortem brain tissue. However, there was no robust functional difference between clonal cell lines rescued by each of the 5 different haplotypes. Finally, there was no statistically significant association of NCSTN with disease risk in the 4 cohorts. We therefore conclude that it is unlikely that common variation at the NCSTN locus is a risk factor for Alzheimer's disease.

  13. DUC-Curve, a highly compact 2D graphical representation of DNA sequences and its application in sequence alignment

    NASA Astrophysics Data System (ADS)

    Li, Yushuang; Liu, Qian; Zheng, Xiaoqi

    2016-08-01

    A highly compact and simple 2D graphical representation of DNA sequences, named DUC-Curve, is constructed by mapping the four nucleotides to a unit circle in a cyclic order. DUC-Curve can directly detect nucleotide and di-nucleotide compositions and microsatellite structure in DNA sequences. Moreover, it can also be used for DNA sequence alignment. Taking the geometric center vectors of DUC-Curves as sequence descriptors, we perform similarity analysis on the first exons of β-globin genes of 11 species, oncogene TP53 of 27 species and twenty-four Influenza A viruses, respectively. The reasonable results obtained illustrate that the proposed method is very effective in sequence comparison problems, and will at least play a complementary role in classification and clustering problems.
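
    The abstract does not give the exact DUC-Curve construction, so the sketch below shows only a generic 2D graphical walk in the same spirit, not the published method. Each nucleotide is assigned a point on the unit circle, the curve is the running sum of those unit vectors, and the "geometric center" is taken as the mean of the curve points. The angle assignment, the cumulative-walk construction and the function names are all assumptions.

```python
import math

# Assumed cyclic order A -> C -> G -> T, spaced 90 degrees apart on the unit circle.
ANGLE = {"A": 0.0, "C": math.pi / 2, "G": math.pi, "T": 3 * math.pi / 2}


def walk(seq):
    """Return the 2D curve obtained by summing one unit vector per nucleotide."""
    x = y = 0.0
    points = []
    for base in seq:
        x += math.cos(ANGLE[base])
        y += math.sin(ANGLE[base])
        points.append((x, y))
    return points


def center(seq):
    """Geometric center (mean point) of the curve, used as a 2-number descriptor."""
    points = walk(seq)
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def center_distance(a, b):
    """Euclidean distance between the geometric centers of two sequences."""
    (ax, ay), (bx, by) = center(a), center(b)
    return math.hypot(ax - bx, ay - by)


print(center("AAAA"))
print(center_distance("ACGT", "ACGT"), center_distance("AAAA", "TTTT"))
```

    A smaller `center_distance` suggests more similar sequences under this toy descriptor. Because two numbers summarize a whole sequence, distinct sequences can share a center, which fits the abstract's framing of the method as complementary to other classification and clustering approaches.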

  14. Comparative venom gland transcriptomics of Naja kaouthia (monocled cobra) from Malaysia and Thailand: elucidating geographical venom variation and insights into sequence novelty

    PubMed Central

    Chanhome, Lawan; Tan, Nget Hong

    2017-01-01

    Background: The monocled cobra (Naja kaouthia) is a medically important venomous snake in Southeast Asia. Its venom has been shown to vary geographically in relation to venom composition and neurotoxic activity, indicating vast diversity of the toxin genes within the species. To investigate the polygenic trait of the venom and its locale-specific variation, we profiled and compared the venom gland transcriptomes of N. kaouthia from Malaysia (NK-M) and Thailand (NK-T) applying next-generation sequencing (NGS) technology. Methods: The transcriptomes were sequenced on the Illumina HiSeq platform, assembled and followed by transcript clustering and annotations for gene expression and function. Pairwise or multiple sequence alignments were conducted on the toxin genes expressed. Substitution rates were studied for the major toxins co-expressed in NK-M and NK-T. Results and discussion: The toxin transcripts showed high redundancy (41–82% of the total mRNA expression) and comprised 23 gene families expressed in NK-M and NK-T, respectively (22 gene families were co-expressed). Among the venom genes, three-finger toxins (3FTxs) predominated in the expression, with multiple sequences noted. Comparative analysis and selection study revealed that 3FTxs are genetically conserved between the geographical specimens whilst demonstrating distinct differential expression patterns, implying gene up-regulation for selected principal toxins, or alternatively, enhanced transcript degradation or lack of transcription of certain traits. One of the striking features that elucidates the inter-geographical venom variation is the up-regulation of α-neurotoxins (constitutes ∼80.0% of toxin’s fragments per kilobase of exon model per million mapped reads (FPKM)), particularly the long-chain α-elapitoxin-Nk2a (48.3%) in NK-T but only 1.7% was noted in NK-M. Instead, short neurotoxin isoforms were up-regulated in NK-M (46.4%). Another distinct transcriptional pattern observed is the

  15. Fungal community analysis by high-throughput sequencing of amplified markers – a user's guide

    PubMed Central

    Lindahl, Björn D; Nilsson, R Henrik; Tedersoo, Leho; Abarenkov, Kessy; Carlsen, Tor; Kjøller, Rasmus; Kõljalg, Urmas; Pennanen, Taina; Rosendahl, Søren; Stenlid, Jan; Kauserud, Håvard

    2013-01-01

    Novel high-throughput sequencing methods outperform earlier approaches in terms of resolution and magnitude. They enable identification and relative quantification of community members and offer new insights into fungal community ecology. These methods are currently taking over as the primary tool to assess fungal communities of plant-associated endophytes, pathogens, and mycorrhizal symbionts, as well as free-living saprotrophs. Taking advantage of the collective experience of six research groups, we here review the different stages involved in fungal community analysis, from field sampling via laboratory procedures to bioinformatics and data interpretation. We discuss potential pitfalls, alternatives, and solutions. Highlighted topics are challenges involved in: obtaining representative DNA/RNA samples and replicates that encompass the targeted variation in community composition, selection of marker regions and primers, options for amplification and multiplexing, handling of sequencing errors, and taxonomic identification. Without awareness of methodological biases, limitations of markers, and bioinformatics challenges, large-scale sequencing projects risk yielding artificial results and misleading conclusions. PMID:23534863

  16. Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data

    PubMed Central

    Pan, Yonglong; Wang, Xiaoming; Liu, Lin; Wang, Hao; Luo, Meizhong

    2016-01-01

    A good physical map is essential to guide sequence assembly in de novo whole genome sequencing, especially when sequences are produced by high-throughput sequencing such as next-generation sequencing (NGS) technology. We here present a novel method, Feature sets-based Genome Mapping (FGM). With FGM, a physical map and draft whole genome sequences can be generated, anchored and integrated using the same data set of NGS sequences, independent of restriction digestion. A model of the method was created and its parameters were inspected by simulations using the Arabidopsis genome sequence. In the simulations, when a ~4.8X genome BAC library including 4,096 clones was used to sequence the whole genome, ~90% of clones were successfully connected to physical contigs, and 91.58% of genome sequences were mapped and connected to chromosomes. This method was experimentally verified using the existing physical map and genome sequence of rice. Of 4,064 clones covering 115 Mb of sequence selected from ~3 tiles of 3 chromosomes of a rice draft physical map, 3,364 clones were reconstructed into physical contigs and 98 Mb of sequence was integrated into the 3 chromosomes. The physical map-integrated draft genome sequences can provide permanent frameworks for eventually obtaining high-quality reference sequences by targeted sequencing, gap filling and combining other sequences. PMID:27611682

  17. Gene expression profile of human bone marrow stromal cells: high-throughput expressed sequence tag sequencing analysis.

    PubMed

    Jia, Libin; Young, Marian F; Powell, John; Yang, Liming; Ho, Nicola C; Hotchkiss, Robert; Robey, Pamela Gehron; Francomano, Clair A

    2002-01-01

    Human bone marrow stromal cells (HBMSC) are pluripotent cells with the potential to differentiate into osteoblasts, chondrocytes, myelosupportive stroma, and marrow adipocytes. We used high-throughput DNA sequencing analysis to generate 4258 single-pass sequencing reactions (known as expressed sequence tags, or ESTs) obtained from the 5' (97) and 3' (4161) ends of human cDNA clones from a HBMSC cDNA library. Our goal was to obtain tag sequences from the maximum number of possible genes and to deposit them in the publicly accessible database for ESTs (dbEST of the National Center for Biotechnology Information). Comparisons of our EST sequencing data with nonredundant human mRNA and protein databases showed that the ESTs represent 1860 gene clusters. The EST sequencing data analysis showed 60 novel genes found only in this cDNA library after BLAST analysis against 3.0 million ESTs in NCBI's dbEST database. The BLAST search also showed the identified ESTs that have close homology to known genes, which suggests that these may be newly recognized members of known gene families. The gene expression profile of this cell type is revealed by analyzing both the frequency with which a message is encountered and the functional categorization of expressed sequences. Comparing an EST sequence with the human genomic sequence database enables assignment of an EST to a specific chromosomal region (a process called digital gene localization) and often enables immediate partial determination of intron/exon boundaries within the genomic structure. It is expected that high-throughput EST sequencing and data mining analysis will greatly promote our understanding of gene expression in these cells and of growth and development of the skeleton.

  18. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    PubMed

    Nakazato, Takeru; Ohta, Tazro; Bono, Hidemasa

    2013-01-01

    High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest in SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between the diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.
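
    The pairing of SRA IDs with PMIDs described above can be illustrated with a minimal sketch. The abstract does not give the authors' extraction rules, so the accession pattern below (a prefix S, E or D, then R, then a type letter and six or more digits) and the function name pair_sra_with_pmid are assumptions made for illustration only.

```python
import re

# Assumed accession pattern (not taken from the abstract): S/E/D prefix,
# then "R", a type letter, and six or more digits, e.g. SRP012345.
SRA_ID = re.compile(r"\b[SED]R[APRSXZ]\d{6,}\b")

def pair_sra_with_pmid(articles):
    """Return sorted, de-duplicated (SRA ID, PMID) pairs.

    articles: dict mapping a PMID string to the article's full text.
    """
    pairs = set()
    for pmid, text in articles.items():
        for accession in SRA_ID.findall(text):
            pairs.add((accession, pmid))
    return sorted(pairs)

if __name__ == "__main__":
    demo = {"111": "Reads were deposited under SRP012345 and SRR1234567."}
    print(pair_sra_with_pmid(demo))
```

    A set is used so that an accession mentioned several times in one article yields a single pair.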

  19. Geographical distribution and oncogenic risk association of human papillomavirus type 58 E6 and E7 sequence variations

    PubMed Central

    Chan, Paul K.S.; Zhang, Chuqing; Park, Jong-Sup; Smith-McCune, Karen K.; Palefsky, Joel M.; Giovannelli, Lucia; Coutlée, Francois; Hibbitts, Samantha; Konno, Ryo; Settheetham-Ishida, Wannapa; Chu, Tang-Yuan; Ferrera, Annabelle; Picconi, María Alejandra; De Marco, Federico; Woo, Yin-Ling; Raiol, Tainá; Piña-Sánchez, Patricia; Bae, Jeong-Hoon; Wong, Martin C.S.; Chirenje, Mike Z.; Magure, Tsitsi; Moscicki, Anna-Barbara; Fiander, Alison N.; Capra, Giuseppina; Ki, Eun Young; Tan, Yi; Chen, Zigui; Burk, Robert D.; Chan, Martin C.W.; Cheung, Tak-Hong; Pim, David; Banks, Lawrence

    2014-01-01

    Human papillomavirus (HPV) 58 accounts for a notable proportion of cervical cancers in East Asia and parts of Latin America, but it is uncommon elsewhere. The reason for such ethnogeographical predilection is unknown. In our study, nucleotide sequences of E6 and E7 genes of 401 HPV58 isolates collected from 15 countries/cities across four continents were examined. Phylogenetic relationship, geographical distribution and risk association of nucleotide sequence variations were analyzed. We found that the E6 genes of HPV58 variants were more conserved than E7. Thus, E6 is a more appropriate target for type-specific detection, whereas E7 is more appropriate for strain differentiation. The frequency of sequence variation varied geographically. Africa had significantly more isolates with E6-367A (D86E) but significantly fewer isolates with E6-203G, -245G, -367C (prototype-like) than other regions (p ≤ 0.003). E7-632T, -760A (T20I, G63S) was more frequently found in Asia, and E7-793G (T74A) was more frequent in Africa (p < 0.001). Variants with T20I and G63S substitutions at E7 conferred a significantly higher risk for cervical intraepithelial neoplasia grade III and invasive cervical cancer compared to other HPV58 variants (odds ratio = 4.44, p = 0.007). In conclusion, T20I and/or G63S substitution(s) at E7 of HPV58 is/are associated with a higher risk for cervical neoplasia. These substitutions are more commonly found in Asia and the Americas, which may account for the higher disease attribution of HPV58 in these areas. PMID:23136059

  20. Serial Gene Losses and Foreign DNA Underlie Size and Sequence Variation in the Plastid Genomes of Diatoms

    PubMed Central

    Ruck, Elizabeth C.; Nakov, Teofil; Jansen, Robert K.; Theriot, Edward C.; Alverson, Andrew J.

    2014-01-01

    Photosynthesis by diatoms accounts for roughly one-fifth of global primary production, but despite this, relatively little is known about their plastid genomes. We report the completely sequenced plastid genomes for eight phylogenetically diverse diatoms and show them to be variable in size, gene and foreign sequence content, and gene order. The genomes contain a core set of 122 protein-coding genes, with 15 additional genes exhibiting complex patterns of 1) gene losses at varying phylogenetic scales, 2) functional transfers to the nucleus, 3) gene duplication, divergence, and differential retention of paralogs, and 4) acquisitions of putatively functional recombinase genes from resident plasmids. The newly sequenced genomes also contain several previously unreported genes, highlighting how poorly characterized diatom plastid genomes are overall. Genome size variation reflects major expansions of the inverted repeat region in some cases but, more commonly, large-scale expansions of intergenic regions, many of which contain unique open reading frames of likely foreign origin. Although many gene clusters are conserved across species, rearrangements appear to be frequent in most lineages. PMID:24567305

  1. CNV-CH: A Convex Hull Based Segmentation Approach to Detect Copy Number Variations (CNV) Using Next-Generation Sequencing Data.

    PubMed

    Sinha, Rituparna; Samaddar, Sandip; De, Rajat K

    2015-01-01

    Copy number variation (CNV) is a form of structural alteration in the mammalian DNA sequence, which is associated with many complex neurological diseases as well as cancer. The development of next generation sequencing (NGS) technology provides a new dimension for the detection of genomic locations with copy number variations. Here we develop an algorithm for detecting CNVs, which is based on depth of coverage data generated by NGS technology. In this work, we have used a novel way to represent the read count data as a two dimensional geometrical point. A key aspect of detecting the regions with CNVs is to devise a proper segmentation algorithm that will distinguish the genomic locations having a significant difference in read count data. We have designed a new segmentation approach in this context, using a convex hull algorithm on the geometrical representation of read count data. To our knowledge, most algorithms have used a single distribution model of read count data, but here in our approach, we have considered the read count data to follow two different distribution models independently, which adds to the robustness of detection of CNVs. In addition, our algorithm calls CNVs based on the multiple sample analysis approach, resulting in a low false discovery rate with high precision.
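
    As a rough illustration of the geometric building block named above, the sketch below computes a convex hull with Andrew's monotone chain method. It is not the CNV-CH algorithm: the abstract does not say how read counts are mapped to two-dimensional points or how hulls become segments, so treating each genomic window as a point (window index, read count) is an assumption made here.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

if __name__ == "__main__":
    # Hypothetical read counts per window; (index, count) is an assumed mapping.
    read_counts = [30, 32, 31, 60, 62, 61, 29]
    print(convex_hull(list(enumerate(read_counts))))
```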

  2. Analysis of DNA sequence variation within marine species using Beta-coalescents

    PubMed Central

    Steinrücken, Matthias; Birkner, Matthias; Blath, Jochen

    2013-01-01

    We apply recently developed inference methods based on general coalescent processes to DNA sequence data obtained from various marine species. Several of these species are believed to exhibit so-called shallow gene genealogies, potentially due to extreme reproductive behaviour, e.g. via Hedgecock’s “reproduction sweepstakes”. Besides the data analysis, in particular the inference of mutation rates and the estimation of the (real) time to the most recent common ancestor, we briefly address the question whether the genealogies might be adequately described by so-called Beta coalescents (as opposed to Kingman’s coalescent), allowing multiple mergers of genealogies. The choice of the underlying coalescent model for the genealogy has drastic implications for the estimation of the above quantities, in particular the real-time embedding of the genealogy. PMID:23376155

  3. New BZLF1 sequence variations in EBV-associated undifferentiated nasopharyngeal carcinoma in southern China.

    PubMed

    Ji, Kun-Mei; Li, Chun-Lin; Meng, Guang; Han, Ai-Dong; Wu, Xu-Li

    2008-01-01

    The viral lytic gene BZLF1 triggers replication of the Epstein-Barr virus (EBV), which is commonly found in nasopharyngeal carcinoma (NPC). Here, RT-PCR revealed five new BZLF1 variants in 8 of 12 NPC and 4 of 12 non-NPC nasopharyngeal biopsies from an NPC-endemic area in southern China. The deduced peptide sequence of the dominant BZLF1 variant differed by 11 amino acids from that of the prototypical strain B95.8 (V01555). Anti-ZEBRA antibody levels were higher in NPC than in non-NPC patients (P < 0.001). These findings demonstrated a dominant BZLF1 variant in southern Chinese EBV-associated NPC and non-NPC patients.

  4. Assessment of megabase-scale somatic copy number variation using single-cell sequencing

    PubMed Central

    Knouse, Kristin A.; Wu, Jie; Amon, Angelika

    2016-01-01

    Megabase-scale copy number variants (CNVs) can have profound phenotypic consequences. Germline CNVs of this magnitude are associated with disease and experience negative selection. However, it is unknown whether organismal function requires that every cell maintain a balanced genome. It is possible that large somatic CNVs are tolerated or even positively selected. Single-cell sequencing is a useful tool for assessing somatic genomic heterogeneity, but its performance in CNV detection has not been rigorously tested. Here, we develop an approach that allows for reliable detection of megabase-scale CNVs in single somatic cells. We discover large CNVs in 8%–9% of cells across tissues and identify two recurrent CNVs. We conclude that large CNVs can be tolerated in subpopulations of cells, and particular CNVs are relatively prevalent within and across individuals. PMID:26772196

  5. Assessment of megabase-scale somatic copy number variation using single-cell sequencing.

    PubMed

    Knouse, Kristin A; Wu, Jie; Amon, Angelika

    2016-03-01

    Megabase-scale copy number variants (CNVs) can have profound phenotypic consequences. Germline CNVs of this magnitude are associated with disease and experience negative selection. However, it is unknown whether organismal function requires that every cell maintain a balanced genome. It is possible that large somatic CNVs are tolerated or even positively selected. Single-cell sequencing is a useful tool for assessing somatic genomic heterogeneity, but its performance in CNV detection has not been rigorously tested. Here, we develop an approach that allows for reliable detection of megabase-scale CNVs in single somatic cells. We discover large CNVs in 8%-9% of cells across tissues and identify two recurrent CNVs. We conclude that large CNVs can be tolerated in subpopulations of cells, and particular CNVs are relatively prevalent within and across individuals.

  6. Weather explains high annual variation in butterfly dispersal.

    PubMed

    Kuussaari, Mikko; Rytteri, Susu; Heikkinen, Risto K; Heliölä, Janne; von Bagh, Peter

    2016-07-27

    Weather conditions fundamentally affect the activity of short-lived insects. Annual variation in weather is therefore likely to be an important determinant of their between-year variation in dispersal, but conclusive empirical studies are lacking. We studied whether the annual variation of dispersal can be explained by the flight season's weather conditions in a Clouded Apollo (Parnassius mnemosyne) metapopulation. This metapopulation was monitored using the mark-release-recapture method for 12 years. Dispersal was quantified for each monitoring year using three complementary measures: emigration rate (fraction of individuals moving between habitat patches), average residence time in the natal patch, and average distance moved. There was much variation both in dispersal and average weather conditions among the years. Weather variables significantly affected the three measures of dispersal and together with adjusting variables explained 79-91% of the variation observed in dispersal. Different weather variables became selected in the models explaining variation in three dispersal measures apparently because of the notable intercorrelations. In general, dispersal rate increased with increasing temperature, solar radiation, proportion of especially warm days, and butterfly density, and decreased with increasing cloudiness, rainfall, and wind speed. These results help to understand and model annually varying dispersal dynamics of species affected by global warming.

  7. Sequencing technologies and genome sequencing.

    PubMed

    Pareek, Chandra Shekhar; Smoczynski, Rafal; Tretyn, Andrzej

    2011-11-01

    High-throughput next-generation sequencing (HT-NGS) technologies are currently the hottest topic in human and animal genomics research; they can produce over 100 times more data than the most sophisticated capillary sequencers based on the Sanger method. With the ongoing development of high-throughput sequencing machines and the advancement of modern bioinformatics tools at an unprecedented pace, the goal of sequencing individual genomes of living organisms at a cost of $1,000 each seems realistically feasible in the near future. In the relatively short time frame since 2005, HT-NGS technologies have been revolutionizing human and animal genome research through analysis of chromatin immunoprecipitation coupled to DNA microarray (ChIP-chip) or sequencing (ChIP-seq), RNA sequencing (RNA-seq), whole genome genotyping, genome-wide structural variation, de novo assembly and re-assembly of genomes, mutation detection and carrier screening, detection of inherited disorders and complex human diseases, DNA library preparation, paired ends and genomic captures, sequencing of mitochondrial genomes, and personal genomics. In this review, we address the important features of HT-NGS, including first-generation DNA sequencers, the birth of HT-NGS, second-generation HT-NGS platforms, third-generation HT-NGS platforms (including the single-molecule Heliscope™, SMRT™ and RNAP sequencers, and Nanopore), the Archon Genomics X PRIZE foundation, a comparison of second- and third-generation HT-NGS platforms, and the applications, advances and future perspectives of sequencing technologies in human and animal genome research.

  8. Evaluation of a Pooled Strategy for High-Throughput Sequencing of Cosmid Clones from Metagenomic Libraries

    PubMed Central

    Lam, Kathy N.; Hall, Michael W.; Engel, Katja; Vey, Gregory; Cheng, Jiujun; Neufeld, Josh D.; Charles, Trevor C.

    2014-01-01

    High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones. PMID:24911009

  9. Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries.

    PubMed

    Lam, Kathy N; Hall, Michael W; Engel, Katja; Vey, Gregory; Cheng, Jiujun; Neufeld, Josh D; Charles, Trevor C

    2014-01-01

    High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones.

  10. Communicating the Benefits of a Full Sequence of High School Science Courses

    ERIC Educational Resources Information Center

    Nicholas, Catherine Marie

    2014-01-01

    High school students are generally uninformed about the benefits of enrolling in a full sequence of science courses, therefore only about a third of our nation's high school graduates have completed the science sequence of Biology, Chemistry and Physics. The lack of students completing a full sequence of science courses contributes to the deficit…

  11. Study of the seasonal ozone variations at European high latitudes

    NASA Astrophysics Data System (ADS)

    Werner, R.; Stebel, K.; Hansen, H. G.; Hoppe, U.-P.; Gausa, M.; Kivi, R.; von der Gathen, P.; Orsolini, Y.; Kilifarska, N.

    2011-02-01

    The geographic area at high latitudes beyond the polar circle is characterized by long darkness during the winter (polar night) and by long summertime insolation (polar day). Consequently, the polar vortex is formed and the surrounding strong polar jet is characterized by a strong potential vorticity gradient representing a horizontal transport barrier. The ozone dynamics of the lower and middle stratosphere is controlled both by chemical destruction processes and transport processes. To study the seasonal ozone variation at high latitudes, ozone vertical distributions are examined, collected from the Arctic Lidar Observatory for Middle Atmosphere Research (ALOMAR) (69.3°N, 16.0°E) station at Andenes and from the stations at Sodankylä (67.4°N, 26.6°E) and at Ny-Ålesund (78.9°N, 11.9°E). The data sets cover the time period from 1994 until 2004. We find a second ozone maximum near 13-15 km, between the tropopause and the absolute ozone maximum near 17-20 km. The maximum is built up by the combination of air mass transport and chemical ozone destruction, mainly caused by the NOx catalytic cycle, which begins after the polar night and intensifies with the increasing day length. Formation of a troposphere inversion layer is observed. The inversion layer is thicker and reaches higher altitudes in winter than in summer. However, the temperature inversion during summer is stronger. The formation of an enhanced ozone number density is observed during the spring-summer period. Ozone is accumulated or depleted by synoptic weather patterns just above the tropopause from spring to summer. In the seasonal average an ozone enhancement above the tropopause is obtained. The stronger temperature inversion during the summer period inhibits the vertical stratosphere-troposphere exchange. The horizontal advection in the upper troposphere and lower stratosphere is enforced during summer. The combination of these mechanisms generates a layer with a very low

  12. High resolution MICA genotyping by sequence-based typing (SBT).

    PubMed

    Zou, Yizhou; Stastny, Peter

    2012-01-01

    We have developed a MICA typing method based on polymerase chain reaction (PCR) sequence-based typing and a computer program that determines the polymorphisms and distinguishes the GCT repeats in exon 5. One PCR amplification was performed to obtain templates of 2.2 kb, including exons 2, 3, 4, and 5 of MICA to be sequenced with two forward and two reverse primers. Overlay of nucleotide sequencing signals resulting from presence of different GCT repeats in exon 5 from two different MICA alleles can be identified by a computer program that analyses the combined signal string containing the 35 bases.

  13. A 454 multiplex sequencing method for rapid and reliable genotyping of highly polymorphic genes in large-scale studies

    PubMed Central

    2010-01-01

    Background: High-throughput sequencing technologies offer new perspectives for biomedical, agronomical and evolutionary research. Promising progress now concerns the application of these technologies to large-scale studies of genetic variation. Such studies require the genotyping of high numbers of samples. This is theoretically possible using 454 pyrosequencing, which generates billions of base pairs of sequence data. However, several challenges arise: first, in the attribution of each read produced to its original sample, and second, in bioinformatic analyses to distinguish true from artifactual sequence variation. This pilot study proposes a new application for the 454 GS FLX platform, allowing the individual genotyping of thousands of samples in one run. A probabilistic model has been developed to demonstrate the reliability of this method. Results: DNA amplicons from 1,710 rodent samples were individually barcoded using a combination of tags located in forward and reverse primers. Amplicons consisted of 222 bp fragments corresponding to DRB exon 2, a highly polymorphic gene in mammals. A total of 221,789 reads were obtained, of which 153,349 were finally assigned to original samples. Rules based on a probabilistic model and a four-step procedure were developed to validate sequences and provide a confidence level for each genotype. The method gave promising results, with the genotyping of DRB exon 2 sequences for 1,407 samples from 24 different rodent species and the sequencing of 392 variants in one half of a 454 run. Using replicates, we estimated that the reproducibility of genotyping reached 95%. Conclusions: This new approach is a promising alternative to classical methods involving electrophoresis-based techniques for variant separation and cloning-sequencing for sequence determination. The 454 system is less costly and time consuming and may enhance the reliability of genotypes obtained when high numbers of samples are studied. It opens up new perspectives
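
    The attribution of reads to samples through a combination of forward and reverse tags can be sketched as a simple lookup. This is an illustration under stated assumptions, not the authors' pipeline: the tag length, the read layout (forward tag, amplicon, reverse tag, all in read orientation) and the name assign_reads are hypothetical, and no sequencing-error tolerance or probabilistic validation is modeled.

```python
def assign_reads(reads, tag_pairs, tag_len):
    """Assign reads to samples by their (forward tag, reverse tag) pair.

    reads: list of read strings.
    tag_pairs: dict mapping (forward_tag, reverse_tag) to a sample name.
    tag_len: assumed fixed length of each tag.
    Returns (assigned, unassigned); assigned maps each sample to its
    tag-trimmed inserts, unassigned lists reads with no known tag pair.
    """
    assigned, unassigned = {}, []
    for read in reads:
        key = (read[:tag_len], read[-tag_len:])
        sample = tag_pairs.get(key)
        # Reads too short to hold both tags plus an insert are rejected.
        if sample is None or len(read) <= 2 * tag_len:
            unassigned.append(read)
        else:
            assigned.setdefault(sample, []).append(read[tag_len:-tag_len])
    return assigned, unassigned

if __name__ == "__main__":
    pairs = {("ACGT", "TTAA"): "sample_1", ("ACGT", "GGCC"): "sample_2"}
    print(assign_reads(["ACGTAAAAATTAA", "TTTTAAAAATTAA"], pairs, 4))
```

    Because samples are identified by tag pairs rather than single tags, a modest set of tags can label many samples; in the example both samples share the forward tag and differ only in the reverse tag.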

  14. Variations in Opsin Coding Sequences Cause X-Linked Cone Dysfunction Syndrome with Myopia and Dichromacy

    PubMed Central

    McClements, Michelle; Davies, Wayne I. L.; Michaelides, Michel; Young, Terri; Neitz, Maureen; MacLaren, Robert E.; Moore, Anthony T.; Hunt, David M.

    2013-01-01

    Purpose. To determine the role of variant L opsin haplotypes in seven families with Bornholm Eye Disease (BED), a cone dysfunction syndrome with dichromacy and myopia. Methods. Analysis of the opsin genes within the L/M opsin array at Xq28 included cloning and sequencing of an exon 3-5 gene fragment, long range PCR to establish gene order, and quantitative PCR to establish gene copy number. In vitro expression of normal and variant opsins was performed to examine cellular trafficking and spectral sensitivity of pigments. Results. All except one of the BED families possessed L opsin genes that contained a rare exon 3 haplotype. The exception was a family with the deleterious Cys203Arg substitution. Two rare exon 3 haplotypes were found and, where determined, these variant opsin genes were in the first position in the array. In vitro expression in transfected cultured neuronal cells showed that the variant opsins formed functional pigments, which trafficked to the cell membranes. The variant opsins were, however, less stable than wild type. Conclusions. It is concluded that the variant L opsin haplotypes underlie BED. The reduction in the amount of variant opsin produced in vitro compared with wild type indicates a possible disease mechanism. Alternatively, the recently identified defective splicing of exon 3 of the variant opsin transcript may be involved. Both mechanisms explain the presence of dichromacy and cone dystrophy. Abnormal pigment may also underlie the myopia that is invariably present in BED subjects. PMID:23322568

  15. Single Cell RNA-Sequencing of Pluripotent States Unlocks Modular Transcriptional Variation

    PubMed Central

    Kolodziejczyk, Aleksandra A.; Kim, Jong Kyoung; Tsang, Jason C.H.; Ilicic, Tomislav; Henriksson, Johan; Natarajan, Kedar N.; Tuck, Alex C.; Gao, Xuefei; Bühler, Marc; Liu, Pentao; Marioni, John C.; Teichmann, Sarah A.

    2015-01-01

    Summary Embryonic stem cell (ESC) culture conditions are important for maintaining long-term self-renewal, and they influence cellular pluripotency state. Here, we report single cell RNA-sequencing of mESCs cultured in three different conditions: serum, 2i, and the alternative ground state a2i. We find that the cellular transcriptomes of cells grown in these conditions are distinct, with 2i being the most similar to blastocyst cells and including a subpopulation resembling the two-cell embryo state. Overall levels of intercellular gene expression heterogeneity are comparable across the three conditions. However, this masks variable expression of pluripotency genes in serum cells and homogeneous expression in 2i and a2i cells. Additionally, genes related to the cell cycle are more variably expressed in the 2i and a2i conditions. Mining of our dataset for correlations in gene expression allowed us to identify additional components of the pluripotency network, including Ptma and Zfp640, illustrating its value as a resource for future discovery. PMID:26431182

  16. Highly Informative Simple Sequence Repeat (SSR) Markers for Fingerprinting Hazelnut

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Simple sequence repeat (SSR) or microsatellite markers have many applications in breeding and genetic studies of plants, including fingerprinting of cultivars and investigations of genetic diversity, and therefore provide information for better management of germplasm collections. They are repeatab...

  17. HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment

    PubMed Central

    Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.

    2016-01-01

    Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175

  18. Paleomagnetic directions and thermoluminescence dating from a bread oven-floor sequence in Lübeck (Germany): A record of 450 years of geomagnetic secular variation

    NASA Astrophysics Data System (ADS)

    Schnepp, Elisabeth; Pucher, Rudolf; Goedicke, Christian; Manzano, Ana; Müller, Uwe; Lanos, Philippe

    2003-02-01

    A record of about 450 years of geomagnetic secular variation is presented from a single archaeological site in Lübeck (Germany) where a sequence of 25 bread oven floors has been preserved in a bakery from medieval times until today. The age dating of the oven-floor sequence is based on historical documents, 14C-dating and thermoluminescence dating. It constrains the time interval to about 1300 to 1800 A.D. Paleomagnetic directions have been determined from each oven floor by means of 198 oriented hand samples. After alternating field as well as thermal demagnetization experiments, the characteristic remanent magnetization direction was obtained using principal component analysis. The mean directions of 24 oven floors are characterized by high Fisherian precision parameters (>146) and small α95 confidence limits (1.2°-4.6°). To obtain a smooth curve of geomagnetic secular variation for Lübeck, a spherical spline function was fitted to the data using a Bayesian approach, which considers not only the obtained ages, but also stratigraphic order. Correlation with historical magnetic records suggests that the age estimation for the upper 10 layers was too young and that these layers must date from the end of the sixteenth to the middle of the eighteenth century. For the lowermost 14 layers, dating is reliable and provides a secular variation curve for Germany. The inclination shows a minimum in the fourteenth century and then increases by more than 10°. Declination shows a local minimum around 1400 A.D. followed by a maximum in the seventeenth century. This is followed by a movement of declination of about 30° to western directions.
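
    The Fisherian precision parameter and the α95 confidence limit quoted above can be computed from a set of sample directions with the standard Fisher (1953) estimates, k = (N - 1)/(N - R) and cos α95 = 1 - ((N - R)/R)(20^(1/(N-1)) - 1), where R is the length of the resultant of the N unit vectors. The sketch below applies these formulas; the input directions are invented for illustration and are not data from the study.

```python
import math

def fisher_mean(directions):
    """Fisher statistics for a list of (declination, inclination) in degrees.

    Returns (mean declination, mean inclination, precision parameter k,
    alpha95), with angles in degrees. Needs at least two directions that
    are not all identical (otherwise N - R is zero).
    """
    n = len(directions)
    x = y = z = 0.0
    for dec, inc in directions:
        d, i = math.radians(dec), math.radians(inc)
        x += math.cos(i) * math.cos(d)  # north component
        y += math.cos(i) * math.sin(d)  # east component
        z += math.sin(i)                # downward component
    r = math.sqrt(x * x + y * y + z * z)  # resultant length R
    mean_dec = math.degrees(math.atan2(y, x)) % 360.0
    mean_inc = math.degrees(math.asin(z / r))
    k = (n - 1) / (n - r)
    cos_a95 = 1.0 - (n - r) / r * (20.0 ** (1.0 / (n - 1)) - 1.0)
    a95 = math.degrees(math.acos(cos_a95))
    return mean_dec, mean_inc, k, a95

if __name__ == "__main__":
    # Hypothetical sample directions from one oven floor.
    samples = [(10, 60), (12, 62), (8, 58), (11, 61), (9, 59)]
    print(fisher_mean(samples))
```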

  19. Sequence Diversity and Antigenic Variation at the rag Locus of Porphyromonas gingivalis

    PubMed Central

    Hall, Lucinda M. C.; Fawell, Stuart C.; Shi, Xiaoju; Faray-Kele, Marie-Claire; Aduse-Opoku, Joseph; Whiley, Robert A.; Curtis, Michael A.

    2005-01-01

    The rag locus of Porphyromonas gingivalis W50 encodes RagA, a predicted tonB-dependent receptor protein, and RagB, a lipoprotein that constitutes an immunodominant outer membrane antigen. The low G+C content of the locus, an association with mobility elements, and an apparent restricted distribution in the species suggested that the locus had arisen by horizontal gene transfer. In the present study, we have demonstrated that there are four divergent alleles of the rag locus. The original rag allele found in W50 was renamed rag-1, while three novel alleles, rag-2 to rag-4, were found in isolates lacking rag-1. The three novel alleles encoded variants of RagA with 63 to 71% amino acid identity to RagA1 and each other and variants of RagB with 43 to 56% amino acid identity. The RagA/B proteins have homology to numerous Bacteroides proteins, including SusC/D, implicated in polysaccharide uptake. Monoclonal and polyclonal antibodies raised against RagB1 of P. gingivalis W50 did not cross-react with proteins from isolates carrying different alleles. In a laboratory collection of 168 isolates, 26% carried rag-1, 36% carried rag-2, 25% carried rag-3, and 14% carried rag-4 (including the type strain, ATCC 33277). Restriction profiles of the locus in different isolates demonstrated polymorphism within each allele, some of which is accounted for by the presence or absence of insertion sequence elements. By reference to a previously published study on virulence in a mouse model (M. L. Laine and A. J. van Winkelhoff, Oral Microbiol. Immunol. 13:322-325, 1998), isolates that caused serious disease in mice were significantly more likely to carry rag-1 than other rag alleles. PMID:15972517

  20. Contamination-controlled high-throughput whole genome sequencing for influenza A viruses using the MiSeq sequencer

    PubMed Central

    Lee, Hong Kai; Lee, Chun Kiat; Tang, Julian Wei-Tze; Loh, Tze Ping; Koay, Evelyn Siew-Chuan

    2016-01-01

    Accurate full-length genomic sequences are important for viral phylogenetic studies. We developed a targeted high-throughput whole genome sequencing (HT-WGS) method for influenza A viruses, which utilized an enzymatic cleavage-based approach, the Nextera XT DNA library preparation kit, for library preparation. The entire library preparation workflow was adapted for the Sentosa SX101, a liquid handling platform, to automate this labor-intensive step. As the enzymatic cleavage-based approach generates low coverage reads at both ends of the cleaved products, we corrected this loss of sequencing coverage at the termini by introducing modified primers during the targeted amplification step to generate full-length influenza A sequences with even coverage across the whole genome. Another challenge of targeted HTS is the risk of specimen-to-specimen cross-contamination during the library preparation step that results in the calling of false-positive minority variants. We included an in-run, negative system control to capture contamination reads that may be generated during the liquid handling procedures. The upper limits of 99.99% prediction intervals of the contamination rate were adopted as cut-off values of contamination reads. Here, 148 influenza A/H3N2 samples were sequenced using the HTS protocol and were compared against a Sanger-based sequencing method. Our data showed that the rate of specimen-to-specimen cross-contamination was highly significant in HTS. PMID:27624998

  1. Computational methods for detecting copy number variations in cancer genome using next generation sequencing: principles and challenges

    PubMed Central

    Liu, Biao; Morrison, Carl D.; Johnson, Candace S.; Trump, Donald L.; Qin, Maochun; Conroy, Jeffrey C.; Wang, Jianmin; Liu, Song

    2013-01-01

    Accurate detection of somatic copy number variations (CNVs) is an essential part of cancer genome analysis, and plays an important role in oncotarget identification. Next generation sequencing (NGS) holds the promise to revolutionize somatic CNV detection. In this review, we provide an overview of current analytic tools used for CNV detection in NGS-based cancer studies. We summarize the NGS data types used for CNV detection, decipher the principles for data preprocessing, segmentation, and interpretation, and discuss the challenges in somatic CNV detection. This review aims to provide a guide to the analytic tools used in NGS-based cancer CNV studies, and to discuss the important factors that researchers need to consider when analyzing NGS data for somatic CNV detection. PMID:24240121

  2. A high-definition view of functional genetic variation from natural yeast genomes.

    PubMed

    Bergström, Anders; Simpson, Jared T; Salinas, Francisco; Barré, Benjamin; Parts, Leopold; Zia, Amin; Nguyen Ba, Alex N; Moses, Alan M; Louis, Edward J; Mustonen, Ville; Warringer, Jonas; Durbin, Richard; Liti, Gianni

    2014-04-01

    The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies.

  3. A High-Definition View of Functional Genetic Variation from Natural Yeast Genomes

    PubMed Central

    Bergström, Anders; Simpson, Jared T.; Salinas, Francisco; Barré, Benjamin; Parts, Leopold; Zia, Amin; Nguyen Ba, Alex N.; Moses, Alan M.; Louis, Edward J.; Mustonen, Ville; Warringer, Jonas; Durbin, Richard; Liti, Gianni

    2014-01-01

    The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies. PMID:24425782

  4. Bats aloft: Variation in echolocation call structure at high altitudes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Bats alter their echolocation calls in response to changes in ecological and behavioral conditions, but little is known about how they adjust their call structure in response to changes in altitude. This study examines altitudinal variation in the echolocation calls of Brazilian free-tailed bats, T...

  5. Assessing Mitochondrial DNA Variation and Copy Number in Lymphocytes of ~2,000 Sardinians Using Tailored Sequencing Analysis Tools

    PubMed Central

    Ding, Jun; Sidore, Carlo; Butler, Thomas J.; Wing, Mary Kate; Qian, Yong; Meirelles, Osorio; Busonero, Fabio; Tsoi, Lam C.; Maschio, Andrea; Angius, Andrea; Kang, Hyun Min; Nagaraja, Ramaiah; Cucca, Francesco; Abecasis, Gonçalo R.; Schlessinger, David

    2015-01-01

    DNA sequencing identifies common and rare genetic variants for association studies, but studies typically focus on variants in nuclear DNA and ignore the mitochondrial genome. In fact, analyzing variants in mitochondrial DNA (mtDNA) sequences presents special problems, which we resolve here with a general solution for the analysis of mtDNA in next-generation sequencing studies. The new program package comprises 1) an algorithm designed to identify mtDNA variants (i.e., homoplasmies and heteroplasmies), incorporating sequencing error rates at each base in a likelihood calculation and allowing allele fractions at a variant site to differ across individuals; and 2) an estimation of mtDNA copy number in a cell directly from whole-genome sequencing data. We also apply the methods to DNA sequence from lymphocytes of ~2,000 SardiNIA Project participants. As expected, mothers and offspring share all homoplasmies but a lesser proportion of heteroplasmies. Both homoplasmies and heteroplasmies show 5-fold higher transition/transversion ratios than variants in nuclear DNA. Also, heteroplasmy increases with age, though on average only ~1 heteroplasmy reaches the 4% level between ages 20 and 90. In addition, we find that mtDNA copy number averages ~110 copies/lymphocyte and is ~54% heritable, implying substantial genetic regulation of the level of mtDNA. Copy numbers also decrease modestly but significantly with age, and females on average have significantly more copies than males. The mtDNA copy numbers are significantly associated with waist circumference (p-value = 0.0031) and waist-hip ratio (p-value = 2.4×10^-5), but not with body mass index, indicating an association with central fat distribution. To our knowledge, this is the largest population analysis to date of mtDNA dynamics, revealing the age-imposed increase in heteroplasmy, the relatively high heritability of copy number, and the association of copy number with metabolic traits. PMID:26172475
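The abstract does not spell out how copy number is estimated from whole-genome data. One commonly used approximation, shown here as a sketch and not as the authors' algorithm, scales the ratio of mitochondrial to autosomal read depth by the nuclear ploidy. The function name and the depth values are hypothetical.

```python
def mtdna_copy_number(mt_mean_depth, autosomal_mean_depth, ploidy=2):
    """Rough mtDNA copies per cell from sequencing depth.

    Assumption: each cell carries `ploidy` copies of the autosomal genome,
    so depth on the mitochondrial genome relative to autosomal depth,
    multiplied by ploidy, approximates mtDNA copies per cell.
    """
    if autosomal_mean_depth <= 0:
        raise ValueError("autosomal depth must be positive")
    return ploidy * mt_mean_depth / autosomal_mean_depth

# Illustrative depths only (not values from the study).
copies = mtdna_copy_number(mt_mean_depth=1500.0, autosomal_mean_depth=30.0)
```

With these illustrative depths the estimate is 100 copies per cell; real pipelines would also need to handle mapping biases and nuclear mitochondrial pseudogenes, which this sketch ignores.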

  6. Phylogeny and chromosomal variations in East Asian Carex, Siderostictae group (Cyperaceae), based on DNA sequences and cytological data.

    PubMed

    Yano, Okihito; Ikeda, Hiroshi; Jin, Xiao-Feng; Hoshino, Takuji

    2014-01-01

    Carex (Cyperaceae) is one of the largest genera of the flowering plants, and comprises more than 2,000 species. In Carex, section Siderostictae, with broader leaves and distributed in East Asia, is thought to be an ancestral group. We aimed to clarify the phylogenetic relationships and chromosomal variations within the section Siderostictae, and to examine the relationship of broad-leaved species of the sections Hemiscaposae and Surculosae from East Asia, inferred from DNA sequences and cytological data. Our results indicate that a monophyletic Siderostictae clade, including the sections Hemiscaposae, Siderostictae and Surculosae, is the earliest diverging group in the tribe Cariceae. Low chromosome numbers, 2n = 12 or 24, with large sizes were observed in these three sections. Our results suggest that the genus Carex might have originated in, or be relictually restricted to, East Asia. Geographical distributions of diploid species are restricted to narrower areas, while those of tetraploid species are wider in East Asia. It is concluded that chromosomal variations in the Siderostictae clade may have been caused by polyploidization and that tetraploid species may have been able to exploit their habitats through polyploidization.

  7. Spatial thickness variation of Carboniferous coal-bearing sequences: A sedimentological response to varying levels of compactional and structural control

    SciTech Connect

    Liu, Yuejin; Ferm, J.C. (Dept. of Geological Sciences)

    1992-01-01

    A study of 1,120 borehole records from Carboniferous coal-bearing rocks in a 160 square mile area in southeastern Kentucky shows that within a stratigraphic interval of about 2,000 feet, the major lithic components are coarsening-upward sequences and coal groups. The latter are groups of rocks averaging 120 feet in thickness which include coal, thin mudstone and sandstone of channel or splay origin. The coarsening-upward sequences, which consist of mudstone overlain by sandstone, are of two types: one is thick (mean thickness 170 feet/52 m) and very widespread, and the other is thin (mean thickness 70 feet/21 m) and has only local distribution. Variogram and trend surface procedures were used to characterize the dimension and areal distribution of these rock bodies. The results show that thickness variation is a product of differential compaction and movement of deep-seated structures contemporaneous with sedimentation. Structural effects on two scales can be recognized, one on the order of 6 to 10 miles and the other greater than 20 miles. Differential compaction effects are found to be closely associated with those produced by the smaller scale structures, while some of the large scale structure effects are concordant with present Alleghenian structures.

  8. De Novo Assembly of Bitter Gourd Transcriptomes: Gene Expression and Sequence Variations in Gynoecious and Monoecious Lines.

    PubMed

    Shukla, Anjali; Singh, V K; Bharadwaj, D R; Kumar, Rajesh; Rai, Ashutosh; Rai, A K; Mugasimangalam, Raja; Parameswaran, Sriram; Singh, Major; Naik, P S

    2015-01-01

    Bitter gourd (Momordica charantia L.) is a nutritious vegetable crop of Asian origin, used as a medicinal herb in Indian and Chinese traditional medicine. Molecular breeding in bitter gourd is in its infancy, due to limited molecular resources, particularly on functional markers for traits such as gynoecy. We performed de novo transcriptome sequencing of bitter gourd using an Illumina next-generation sequencer, from root, flower bud, stem and leaf samples of a gynoecious line (Gy323) and a monoecious line (DRAR1). A total of 65,540 transcripts for Gy323 and 61,490 for DRAR1 were obtained. Comparisons revealed SNP and SSR variations between these lines and allowed identification of gene classes. Based on available transcripts we identified 80 WRKY transcription factors, several of which have been reported in responses to biotic and abiotic stresses, and 56 ARF genes, which play a pivotal role in auxin-regulated gene expression and development. The data presented will be useful in both functional studies and breeding programs in bitter gourd.

  9. GLA variation p.E66Q identified as the genetic etiology of Fabry disease using exome sequencing.

    PubMed

    Peng, Hao; Xu, Xiaojuan; Zhang, Lusi; Zhang, Xuehong; Peng, Hexiang; Zheng, Yu; Luo, Sanchuan; Guo, Hui; Xia, Kun; Li, Jiada; Yao, Hongliang; Hu, Zhengmao

    2016-01-10

    Fabry disease (FD) is an X-linked lysosomal storage disorder resulting from a deficiency in glycosphingolipid catabolism caused by mutations in the α-galactosidase A gene GLA. Variant FD patients do not present with classical symptoms during childhood and may go undiagnosed or be misdiagnosed with other kidney diseases, such as chronic glomerulonephritis (CGN). In this study, we used exome sequencing and Sanger sequencing to identify the variation p.E66Q of GLA, which completely co-segregated with the disease phenotype in a Chinese family that had previously been diagnosed with possible CGN. Female patients exhibited preferential X-chromosome inactivation (XCI) of the normal p.E66 allele, as indicated by XCI analysis. By measuring α-Gal A activity, we found that male patients in the pedigree had very little enzymatic activity while female patients had residual enzymatic activity. These patients were diagnosed with renal variant FD in subsequent clinical review. Our results directly implicated the GLA mutation p.E66Q as the genetic etiology in the Chinese renal variant FD pedigree.

  10. Frequent sequence variation in the human myostatin (GDF8) gene as a marker for analysis of muscle-related phenotypes.

    PubMed

    Ferrell, R E; Conte, V; Lawrence, E C; Roth, S M; Hagberg, J M; Hurley, B F

    1999-12-01

    Myostatin is a recently identified member of the transforming growth factor-beta family of regulatory factors, also known as growth and differentiation factor 8 (GDF8). The nucleotide sequence of human myostatin was determined in 40 individuals. The invariant promoter contains a consensus MyoD binding site, and the coding sequence contains five missense substitutions in conserved amino acid residues (A55T, K153R, E164K, P198A, and I225T). Two of these, A55T in exon 1 and K153R in exon 2, are polymorphic in the general population with significantly different allele frequencies in Caucasians and African Americans (P < 0.001). Neither of the common polymorphisms had a significant impact on muscle mass response to strength training in either Caucasians or African Americans, although skewed allele frequencies preclude detection of small effects. These allelic variants provide markers for examining association between the myostatin gene and interindividual variation in muscle mass and differences in loss of muscle mass with aging.

  11. PhyPA: Phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences.

    PubMed

    Xia, Xuhua

    2016-09-01

    While pairwise sequence alignment (PSA) by dynamic programming is guaranteed to generate one of the optimal alignments, multiple sequence alignment (MSA) of highly divergent sequences often results in poorly aligned sequences, plaguing all subsequent phylogenetic analysis. One way to avoid this problem is to use only PSA to reconstruct phylogenetic trees, which can only be done with distance-based methods. I compared the accuracy of this new computational approach (named PhyPA for phylogenetics by pairwise alignment) against the maximum likelihood method using MSA (the ML+MSA approach), based on nucleotide, amino acid and codon sequences simulated with different topologies and tree lengths. I present a surprising discovery that the fast PhyPA method consistently outperforms the slow ML+MSA approach for highly diverged sequences even when all optimization options were turned on for the ML+MSA approach. Only when sequences are not highly diverged (i.e., when a reliable MSA can be obtained) does the ML+MSA approach outperform PhyPA. The true topologies are always recovered by ML with the true alignment from the simulation. However, with MSA derived from alignment programs such as MAFFT or MUSCLE, the recovered topology consistently has higher likelihood than that for the true topology. Thus, the failure to recover the true topology by the ML+MSA approach is not due to insufficient search of tree space, but to the distortion of phylogenetic signal by MSA methods. I have implemented PhyPA in DAMBE, along with two approaches that make use of multi-gene data sets to derive phylogenetic support for subtrees, equivalent to resampling techniques such as bootstrapping and jackknifing.
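The core idea described above, aligning every pair of sequences by dynamic programming and feeding the resulting distances to a distance-based tree method, can be sketched as follows. This is not the DAMBE implementation: the scoring values, the use of a simple p-distance, and all function names are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score). Scoring values are illustrative.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

def p_distance(aln_a, aln_b):
    """Proportion of differing sites among columns without gaps."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)

def pairwise_distance_matrix(seqs):
    """seqs: dict name -> sequence. Each pair is aligned independently."""
    names = list(seqs)
    dist = {p: {q: 0.0 for q in names} for p in names}
    for idx, p in enumerate(names):
        for q in names[idx + 1:]:
            aln_p, aln_q, _ = needleman_wunsch(seqs[p], seqs[q])
            dist[p][q] = dist[q][p] = p_distance(aln_p, aln_q)
    return dist

# Hypothetical toy sequences.
matrix = pairwise_distance_matrix({"a": "ACGT", "b": "ACGA", "c": "ACGT"})
```

The resulting matrix could then be passed to any distance-based tree builder such as neighbor joining; for highly diverged sequences a model-corrected distance would normally replace the raw p-distance used here.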

  12. An effective method to purify Plasmodium falciparum DNA directly from clinical blood samples for whole genome high-throughput sequencing.

    PubMed

    Auburn, Sarah; Campino, Susana; Clark, Taane G; Djimde, Abdoulaye A; Zongo, Issaka; Pinches, Robert; Manske, Magnus; Mangano, Valentina; Alcock, Daniel; Anastasi, Elisa; Maslen, Gareth; Macinnis, Bronwyn; Rockett, Kirk; Modiano, David; Newbold, Christopher I; Doumbo, Ogobara K; Ouédraogo, Jean Bosco; Kwiatkowski, Dominic P

    2011-01-01

    Highly parallel sequencing technologies permit cost-effective whole genome sequencing of hundreds of Plasmodium parasites. The ability to sequence clinical Plasmodium samples, extracted directly from patient blood without a culture step, presents a unique opportunity to sample the diversity of "natural" parasite populations in high resolution clinical and epidemiological studies. A major challenge to sequencing clinical Plasmodium samples is the abundance of human DNA, which may substantially reduce the yield of Plasmodium sequence. We tested a range of human white blood cell (WBC) depletion methods on P. falciparum-infected patient samples in search of a method displaying an optimal balance of WBC-removal efficacy, cost, simplicity, and applicability to low resource settings. In the first of a two-part study, combinations of three different WBC depletion methods were tested on 43 patient blood samples in Mali. A two-step combination of Lymphoprep plus Plasmodipur best fitted our requirements, although moderate variability was observed in human DNA quantity. This approach was further assessed in a larger sample of 76 patients from Burkina Faso. WBC-removal efficacy remained high (<30% human DNA in >70% samples) and lower variation was observed in human DNA quantities. In order to assess the Plasmodium sequence yield at different human DNA proportions, 59 samples with up to 60% human DNA contamination were sequenced on the Illumina Genome Analyzer platform. An average ~40-fold coverage of the genome was observed per lane for samples with ≤ 30% human DNA. Even in low resource settings, using a simple two-step combination of Lymphoprep plus Plasmodipur, over 70% of clinical sample preparations should exhibit sufficiently low human DNA quantities to enable ~40-fold sequence coverage of the P. falciparum genome using a single lane on the Illumina Genome Analyzer platform. This approach should greatly facilitate large-scale clinical and epidemiologic studies of P

  13. Next-gen sequencing identifies non-coding variation disrupting miRNA-binding sites in neurological disorders.

    PubMed

    Devanna, P; Chen, X S; Ho, J; Gajewski, D; Smith, S D; Gialluisi, A; Francks, C; Fisher, S E; Newbury, D F; Vernes, S C

    2017-03-14

    Understanding the genetic factors underlying neurodevelopmental and neuropsychiatric disorders is a major challenge given their prevalence and potential severity for quality of life. While large-scale genomic screens have made major advances in this area, for many disorders the genetic underpinnings are complex and poorly understood. To date the field has focused predominantly on protein coding variation, but given the importance of tightly controlled gene expression for normal brain development and disorder, variation that affects non-coding regulatory regions of the genome is likely to play an important role in these phenotypes. Herein we show the importance of 3 prime untranslated region (3'UTR) non-coding regulatory variants across neurodevelopmental and neuropsychiatric disorders. We devised a pipeline for identifying and functionally validating putatively pathogenic variants from next generation sequencing (NGS) data. We applied this pipeline to a cohort of children with severe specific language impairment (SLI) and identified a functional, SLI-associated variant affecting gene regulation in cells and post-mortem human brain. This variant and the affected gene (ARHGEF39) represent new putative risk factors for SLI. Furthermore, we identified 3'UTR regulatory variants across autism, schizophrenia and bipolar disorder NGS cohorts demonstrating their impact on neurodevelopmental and neuropsychiatric disorders. Our findings show the importance of investigating non-coding regulatory variants when determining risk factors contributing to neurodevelopmental and neuropsychiatric disorders. In the future, integration of such regulatory variation with protein coding changes will be essential for uncovering the genetic causes of complex neurological disorders and the fundamental mechanisms underlying health and disease. Molecular Psychiatry advance online publication, 14 March 2017; doi:10.1038/mp.2017.30.

  14. Persistent Sub-Yearly Chromospheric Variations in Lower Main-Sequence Stars: Tau Booe and alpha Com

    NASA Technical Reports Server (NTRS)

    Maulik, Davesh; Donahue, Robert A.; Baliunas, Sallie L.

    1997-01-01

    The recent discoveries of extrasolar planetary systems around lower main-sequence stars such as tau Booe (HD 120136) have prompted further investigation into their stellar activity. A cursory analysis of tau Booe for cyclic chromospheric activity, based on its 30-yr record of Ca II H and K fluxes obtained as part of the HK Project from Mount Wilson Observatory, finds an intermediate, sub-yearly period (approximately 117 d) in chromospheric activity in addition to, and separate from, both its rotation (3.3 d) and long-term variability. As a persistent sub-yearly period in surface magnetic activity is unprecedented, we investigate this apparent anomaly further by examining chromospheric activity levels of other stars with similar mass, searching for variability in chromospheric activity with periods of less than one year, but longer than the measured or predicted rotation. An examination of the time series of 40 mid-to-late F dwarfs yielded one other star for further analysis: alpha Com (HD 114378, P approximately 132 d). The variations for these two stars were checked for persistence and coherence. Based on these determinations, we eliminate the possibilities of rotation, long-term activity cycle, and the evolution of active regions as the cause of this variation in both stars. In particular, for tau Booe we infer that the phenomenon may be chromospheric in origin; however, beyond this, it is difficult to identify anything further regarding the cause of the activity variations, or even whether the observed modulations in the two stars have the same origin.

  15. Variation in high-frequency wave radiation from small repeating earthquakes as revealed by cross-spectral analysis

    NASA Astrophysics Data System (ADS)

    Hatakeyama, Norishige; Uchida, Naoki; Matsuzawa, Toru; Okada, Tomomi; Nakajima, Junichi; Matsushima, Takeshi; Kono, Toshio; Hirahara, Satoshi; Nakayama, Takashi

    2016-11-01

    We examined the variation in the high-frequency wave radiation for three repeating earthquake sequences (M = 3.1-4.1) in the northeastern Japan subduction zone by waveform analyses. Earthquakes in each repeating sequence are located at almost the same place and show low-angle thrust type focal mechanisms, indicating that they represent repeated ruptures of a seismic patch on the plate boundary. We calculated cross-spectra of the waveforms and obtained the phases and coherences for pairs of events in the respective repeating sequences in order to investigate the waveform differences. We used waveform data sampled at 1 kHz that were obtained from temporary seismic observations we conducted immediately after the 2011 Tohoku earthquake near the source area. For two repeating sequences, we found that the interevent delay times for the two waveforms in a frequency band higher than the corner frequencies are different from those in a lower frequency band for particular event pairs. The phases and coherences show that there are coherent high-frequency waves for almost all the repeaters regardless of the high-frequency delays. These results indicate that high-frequency waves are always radiated from the same vicinity (subpatch) for these events but the time intervals between the ruptures of the subpatch and the centroid times can vary. We classified events in the sequence into two subgroups according to the high-frequency band interevent delays relative to the low-frequency band. For one sequence, we found that all the events that occurred just after (within 11 days) larger nearby earthquakes belong to one subgroup while other events belong to the other subgroup. This suggests that the high-frequency wave differences were caused by stress perturbations due to the nearby earthquakes. In summary, our observations suggest that high-frequency waves from the repeating sequence are radiated not from everywhere but from a long-duration subpatch within the seismic slip area. The

  16. PCR Strategies for Complete Allele Calling in Multigene Families Using High-Throughput Sequencing Approaches.

    PubMed

    Marmesat, Elena; Soriano, Laura; Mazzoni, Camila J; Sommer, Simone; Godoy, José A

    2016-01-01

    The characterization of multigene families with high copy number variation is often approached through PCR amplification with highly degenerate primers to account for all expected variants flanking the region of interest. Such an approach often introduces PCR biases that result in an unbalanced representation of targets in high-throughput sequencing libraries that eventually results in incomplete detection of the targeted alleles. Here we confirm this result and propose two different amplification strategies to alleviate this problem. The first strategy (called pooled-PCRs) targets different subsets of alleles in multiple independent PCRs using different moderately degenerate primer pairs, whereas the second approach (called pooled-primers) uses a custom-made pool of non-degenerate primers in a single PCR. We compare their performance to the common use of a single PCR with highly degenerate primers using the MHC class I of the Iberian lynx as a model. We found both novel approaches to work similarly well and better than the conventional approach. They significantly scored more alleles per individual (11.33 ± 1.38 and 11.72 ± 0.89 vs 7.94 ± 1.95), yielded more complete allelic profiles (96.28 ± 8.46 and 99.50 ± 2.12 vs 63.76 ± 15.43), and revealed more alleles at a population level (13 vs 12). Finally, we could link each allele's amplification efficiency with the primer-mismatches in its flanking sequences and show that ultra-deep coverage offered by high-throughput technologies does not fully compensate for such biases, especially as real alleles may reach lower coverage than artefacts. Adopting either of the proposed amplification methods provides the opportunity to attain more complete allelic profiles at lower coverages, improving confidence over the downstream analyses and subsequent applications.
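To make the notion of primer degeneracy concrete, the sketch below expands a primer written with IUPAC ambiguity codes into every non-degenerate sequence it represents, which is the kind of explicit pool the pooled-primers strategy works with. The primer string and function names are hypothetical and are not taken from the study.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences a degenerate primer represents."""
    return prod(len(IUPAC[base]) for base in primer.upper())

def expand(primer):
    """List every non-degenerate primer encoded by an IUPAC string."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]

# Hypothetical degenerate primer (illustrative only).
primer = "ACGTRYN"
pool = expand(primer)   # 16 explicit sequences
fold = degeneracy(primer)
```

A highly degenerate primer expands to many sequences, each present at low concentration in the reaction, which is one intuition for why amplification can become unbalanced across alleles.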

  17. PCR Strategies for Complete Allele Calling in Multigene Families Using High-Throughput Sequencing Approaches

    PubMed Central

    Marmesat, Elena; Soriano, Laura; Mazzoni, Camila J.; Sommer, Simone

    2016-01-01

    The characterization of multigene families with high copy number variation is often approached through PCR amplification with highly degenerate primers to account for all expected variants flanking the region of interest. Such an approach often introduces PCR biases that result in an unbalanced representation of targets in high-throughput sequencing libraries that eventually results in incomplete detection of the targeted alleles. Here we confirm this result and propose two different amplification strategies to alleviate this problem. The first strategy (called pooled-PCRs) targets different subsets of alleles in multiple independent PCRs using different moderately degenerate primer pairs, whereas the second approach (called pooled-primers) uses a custom-made pool of non-degenerate primers in a single PCR. We compare their performance to the common use of a single PCR with highly degenerate primers using the MHC class I of the Iberian lynx as a model. We found both novel approaches to work similarly well and better than the conventional approach. They significantly scored more alleles per individual (11.33 ± 1.38 and 11.72 ± 0.89 vs 7.94 ± 1.95), yielded more complete allelic profiles (96.28 ± 8.46 and 99.50 ± 2.12 vs 63.76 ± 15.43), and revealed more alleles at a population level (13 vs 12). Finally, we could link each allele’s amplification efficiency with the primer-mismatches in its flanking sequences and show that ultra-deep coverage offered by high-throughput technologies does not fully compensate for such biases, especially as real alleles may reach lower coverage than artefacts. Adopting either of the proposed amplification methods provides the opportunity to attain more complete allelic profiles at lower coverages, improving confidence over the downstream analyses and subsequent applications. PMID:27294261

  18. Sequence variation in the cytochrome oxidase I, internal transcribed spacer 1, and Ts14 diagnostic antigen sequences of Taenia solium isolates from South and Central America, India, and Asia.

    PubMed

    Hancock, K; Broughel, D E; Moura, I N; Khan, A; Pieniazek, N J; Gonzalez, A E; Garcia, H H; Gilman, R H; Tsang, V C

    2001-12-01

    We examined the genetic variability in the pig-human tapeworm, Taenia solium, by sequencing the genes for cytochrome oxidase I, internal transcribed spacer 1, and a diagnostic antigen, Ts14, from individual cysts isolated from Peru, Colombia, Mexico, India, China, and the Philippines. For these genes, the rate of nucleotide variation was minimal. Isolates from these countries can be distinguished based on one to eight nucleotide differences in the 396 nucleotide cytochrome oxidase I (COI) sequence. However, all of the 15 isolates from within Peru had identical COI sequences. The Ts14 sequences from India and China were identical and differed from the Peru sequence by three nucleotides in 333. These data indicate that there is minimal genetic variability within the species T. solium. Minimal variability was also seen in the ITS1 sequence, but this variation was observed within the individual. Twenty-two cloned sequences from six isolates sorted into 13 unique sequences. The variability observed within the sequences from individual cysts was as great as the variability between the isolates.
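The two kinds of counts reported here, nucleotide differences between equal-length sequences and the number of unique sequences among clones, can be sketched as below. The sequences are made-up placeholders, not data from the study, and the function names are hypothetical.

```python
from collections import Counter

def count_differences(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b))

def unique_sequences(seqs):
    """Map each distinct sequence to the number of times it was observed."""
    return Counter(seqs)

# Placeholder clone sequences (illustrative only).
clones = ["ACGTAC", "ACGTTC", "ACGTAC", "ACGTTC", "ACGTAC"]
diffs = count_differences(clones[0], clones[1])
haplotypes = unique_sequences(clones)
```

Here `diffs` counts the single differing site between the first two clones and `haplotypes` groups the five clones into two unique sequences, mirroring how 22 cloned ITS1 sequences were sorted into 13 unique sequences.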

  19. Spatial stress variations in the aftershock sequence following the 2008 M6 earthquake doublet in the South Iceland Seismic Zone

    NASA Astrophysics Data System (ADS)

    Hensch, M.; Árnadóttir, Th.; Lund, B.; Brandsdóttir, B.

    2012-04-01

    The South Iceland Seismic Zone (SISZ) is an approximately 80 km wide E-W transform zone, bridging the offset between the Eastern Volcanic Zone and the Hengill triple junction to the west. The plate motion is accommodated in the brittle crust by faulting on many N-S trending right-lateral strike-slip faults of 2-5 km separation. Major sequences of large earthquakes (M>6) have occurred repeatedly in the SISZ since the settlement of Iceland more than a thousand years ago. On 29th May 2008, two M6 earthquakes hit the western part of the SISZ on two adjacent N-S faults within a few seconds. The intense aftershock sequence was recorded by the permanent Icelandic SIL network and a promptly installed temporary network of 11 portable seismometers in the source region. The network located thousands of aftershocks during the following days, illuminating a 12-17 km long region along both major fault ruptures as well as several smaller parallel faults along a diffuse E-W trending region west of the mainshock area without any preceding main rupture. This episode is suggested to be the continuation of an earthquake sequence which started with two M6.5 and several M5-6 events in June 2000. The time delay between the 2000 and 2008 events could be due to an inflation episode in Hengill during 1993-1998, which potentially locked N-S strike-slip faults in the western part of the SISZ. Around 300 focal solutions for aftershocks have been derived by analyzing P-wave polarities, showing predominantly strike-slip movements with occasional normal faulting components (unstable P-axis direction), which suggests an extensional stress regime as their driving force. A subsequent stress inversion of four different aftershock clusters reveals slight variations of the directions of the average σ3 axes. While for both southern clusters, including the E-W cluster, the σ3 axes are rather elongated perpendicular to the overall plate spreading axis, they are more northerly trending for shallower clusters

  20. SPITZER IRS SPECTRAL MAPPING OF THE TOOMRE SEQUENCE: SPATIAL VARIATIONS OF PAH, GAS, AND DUST PROPERTIES IN NEARBY MAJOR MERGERS

    SciTech Connect

    Haan, S.; Armus, L.; Laine, S.; Surace, J. A.; Diaz-Santos, T.; Beirao, P.; Stierwalt, S.; Charmandaris, V.; Smith, J. D.; Schweizer, F.; Murphy, E. J.; Brandl, B.; Evans, A. S.; Hibbard, J. E.; Yun, M.; Jarrett, T. H.

    2011-12-01

    We have mapped the key mid-IR diagnostics in eight major merger systems of the Toomre sequence (NGC 4676, NGC 7592, NGC 6621, NGC 2623, NGC 6240, NGC 520, NGC 3921, and NGC 7252) using the Spitzer Infrared Spectrograph. With these maps, we explore the variation of the ionized-gas, polycyclic aromatic hydrocarbon (PAH), and warm gas (H2) properties across the sequence and within the galaxies. While the global PAH interband strength and ionized gas flux ratios ([Ne III]/[Ne II]) are similar to those of normal star-forming galaxies, the distribution of the spatially resolved PAH and fine structure line flux ratios is significantly different from one system to the other. Rather than a constant H2/PAH flux ratio, we find that the relation between the H2 and PAH fluxes is characterized by a power law with a roughly constant exponent (0.61 ± 0.05) over all merger components and spatial scales. While following the same power law on local scales, three galaxies have a factor of 10 larger integrated (i.e., global) H2/PAH flux ratio than the rest of the sample, even larger than what it is in most nearby active galactic nuclei. These findings suggest a common dominant excitation mechanism for H2 emission over a large range of global H2/PAH flux ratios in major mergers. Early-merger systems show a different distribution between the cold (CO J = 1-0) and warm (H2) molecular gas components, which is likely due to the merger interaction. Strong evidence for buried star formation in the overlap region of the merging galaxies is found in two merger systems (NGC 6621 and NGC 7592) as seen in the PAH, [Ne II], [Ne III], and warm gas line emission, but with no apparent corresponding CO (J = 1-0) emission. The minimum of the 11.3/7.7 μm PAH interband strength ratio is typically located in the nuclei of galaxies, while the [Ne III]/[Ne II] ratio increases with distance from the nucleus. Our findings also demonstrate that the variations of

  1. [Current applications of high-throughput DNA sequencing technology in antibody drug research].

    PubMed

    Yu, Xin; Liu, Qi-Gang; Wang, Ming-Rong

    2012-03-01

    Since the publication in 2005 of a high-throughput DNA sequencing technology based on PCR carried out in oil emulsions, high-throughput DNA sequencing platforms have evolved into a robust technology for sequencing genomes and diverse DNA libraries. Antibody libraries with vast numbers of members currently serve as a foundation for discovering novel antibody drugs, and high-throughput DNA sequencing technology makes it possible to rapidly identify functional antibody variants with desired properties. Herein we present a review of current applications of high-throughput DNA sequencing technology in the analysis of antibody library diversity, sequencing of CDR3 regions, identification of potent antibodies based on sequence frequency, discovery of functional genes, and combination with various display technologies, so as to provide an alternative approach to the discovery and development of antibody drugs.

  2. iSVP: an integrated structural variant calling pipeline from high-throughput sequencing data

    PubMed Central

    2013-01-01

    Background Structural variations (SVs), such as insertions, deletions, inversions, and duplications, are a common feature in human genomes, and a number of studies have reported that such SVs are associated with human diseases. Although the progress of next generation sequencing (NGS) technologies has led to the discovery of a large number of SVs, accurate and genome-wide detection of SVs remains challenging. Thus far, various calling algorithms based on NGS data have been proposed. However, their strategies are diverse and there is no tool able to detect a full range of SVs accurately. Results We focused on evaluating the performance of existing deletion calling algorithms for various spanning ranges from low- to high-coverage simulation data. The simulation data were generated from a whole genome sequence with artificial SVs constructed based on the distribution of variants obtained from the 1000 Genomes Project. From the simulation analysis, deletion calls of various deletion sizes were obtained with each caller, and it was found that the performance differed considerably according to the type of algorithm and the targeted deletion size. Based on these results, we propose an integrated structural variant calling pipeline (iSVP) that combines existing methods with newly devised filtering and merging processes. It achieved highly accurate deletion calling with >90% precision and >90% recall on the 30× read data for a broad range of sizes. We applied iSVP to the whole-genome sequence data of a CEU HapMap sample, and detected a large number of deletions, including notable peaks around 300 bp and 6,000 bp, which corresponded to Alus and long interspersed nuclear elements, respectively. In addition, many of the predicted deletions were highly consistent with experimentally validated ones by other studies. Conclusions We present iSVP, a new deletion calling pipeline to obtain a genome-wide landscape of deletions in a highly accurate manner. From simulation and real data
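
    The precision and recall figures quoted for deletion calling can be computed as in the sketch below. The abstract does not state how a call is matched to a true deletion, so the 50% reciprocal-overlap criterion, the function names, and the example intervals are all assumptions made for illustration; this is not the iSVP evaluation code.

```python
# A deletion is represented as (chromosome, start, end) with start < end.


def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between intervals a and b."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def precision_recall(calls, truth, min_ro=0.5):
    """Precision and recall of deletion calls against a truth set.

    A call counts as correct if it reaches min_ro reciprocal overlap with
    any true deletion; a true deletion counts as found if any call does.
    """
    correct_calls = sum(
        any(reciprocal_overlap(c, t) >= min_ro for t in truth) for c in calls
    )
    found_truth = sum(
        any(reciprocal_overlap(c, t) >= min_ro for c in calls) for t in truth
    )
    precision = correct_calls / len(calls) if calls else 0.0
    recall = found_truth / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical intervals, for illustration only.
    truth = [("chr1", 100, 400), ("chr1", 1000, 7000)]
    calls = [("chr1", 110, 390), ("chr2", 100, 400)]
    print(precision_recall(calls, truth))
```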

  3. Sequence polymorphism of GroEL gene in natural population of Bacillus and Brevibacillus spp. that showed variation in thermal tolerance capacity and mRNA expression.

    PubMed

    Sen, R; Tripathy, S; Padhi, S K; Mohanty, S; Maiti, N K

    2014-10-01

    GroEL, a class I chaperonin, plays an important role in the thermal adaptation of the cell and helps to maintain the viability of the cell under heat shock conditions. The function of groEL in vivo depends on the maintenance of the proper structure of the protein, which in turn depends on the nucleotide and amino acid sequence of the gene. In this study, we investigated the changes in nucleotide and amino acid sequences of the partial groEL gene that may affect the thermotolerance capacity as well as mRNA expression of bacterial isolates. Sequences within the same species that differed at the amino acid level were identified as different alleles. The effect of allelic variation on groEL gene expression was analyzed by comparison and relative quantification of each allele under thermal shock conditions by RT-PCR. Evaluation of the Ka/Ks ratio among strains of the same species showed that the groEL gene of all the species had undergone similar functional constraint during evolution. Strains showing similar thermotolerance capacity were found to carry the same allele of the groEL gene. The isolates carrying an allele with an amino acid substitution inside the highly ATP/ADP or Mg2+-binding region could not tolerate thermal stress and showed lower expression of the groEL gene. Our results indicate that during evolution of these bacterial species the groEL gene has undergone the process of natural selection, and the isolates have evolved with the groEL allelic sequences that help them to withstand thermal stress during their interaction with the environment.

  4. Putting Physics First: Three Case Studies of High School Science Department and Course Sequence Reorganization

    ERIC Educational Resources Information Center

    Larkin, Douglas B.

    2016-01-01

    This article examines the process of shifting to a "Physics First" sequence in science course offerings in three school districts in the United States. This curricular sequence reverses the more common U.S. high school sequence of biology/chemistry/physics, and has gained substantial support in the physics education community over the…

  5. Variation in sequences containing microsatellite motifs in the perennial biomass and forage grass, Phalaris arundinacea (Poaceae).

    PubMed

    Barth, Susanne; Jankowska, Marta Jolanta; Hodkinson, Trevor Roland; Vellani, Tia; Klaas, Manfred

    2016-03-22

    Forty three microsatellite markers were developed for further genetic characterisation of a forage and biomass grass crop, for which genomic resources are currently scarce. The microsatellite markers were developed from a normalized EST-SSR library. All of the 43 markers gave a clear banding pattern on 3% Metaphor agarose gels. Eight selected SSR markers were tested in detail for polymorphism across eleven DNA samples of large geographic distribution across Europe. The new set of 43 SSR markers will help future research to characterise the genetic structure and diversity of Phalaris arundinacea, with a potential to further understand its invasive character in North American wetlands, as well as aid in breeding work for desired biomass and forage traits. P. arundinacea is particularly valued in the northern latitude as a crop with high biomass potential, even more so on marginal lands.

  6. Mitochondrial DNA sequence variation and phylogeography of Neotropic pumas (Puma concolor).

    PubMed

    Caragiulo, Anthony; Dias-Freedman, Isabela; Clark, J Alan; Rabinowitz, Salisa; Amato, George

    2014-08-01

    Pumas occupy the largest latitudinal range of any New World terrestrial mammal. Human population growth and associated habitat reduction have reduced their North American range by nearly two-thirds, but the impact of human expansion in Central and South America on puma populations is not clear. We examined mitochondrial DNA diversity of pumas across the majority of their range, with a focus on Central and South America. Four mitochondrial gene regions (1140 base pairs) revealed 16 unique haplotypes differentiating pumas into three geographic groupings: North America, Central America and South America. These groups were highly differentiated as indicated by significant pairwise FST values. North American samples were genetically homogeneous compared to Central and South American samples, and South American pumas were the most diverse and ancestral. These findings support an earlier hypothesis that North America was recolonized by founding pumas from Central and South America.

  7. Optimizing sparse sequencing of single cells for highly multiplex copy number profiling

    PubMed Central

    Baslan, Timour; Kendall, Jude; Ward, Brian; Cox, Hilary; Leotta, Anthony; Rodgers, Linda; Riggs, Michael; D'Italia, Sean; Sun, Guoli; Yong, Mao; Miskimen, Kristy; Gilmore, Hannah; Saborowski, Michael; Dimitrova, Nevenka; Krasnitz, Alexander; Harris, Lyndsay; Wigler, Michael; Hicks, James

    2015-01-01

    Genome-wide analysis at the level of single cells has recently emerged as a powerful tool to dissect genome heterogeneity in cancer, neurobiology, and development. To be truly transformative, single-cell approaches must affordably accommodate large numbers of single cells. This is feasible in the case of copy number variation (CNV), because CNV determination requires only sparse sequence coverage. We have used a combination of bioinformatic and molecular approaches to optimize single-cell DNA amplification and library preparation for highly multiplexed sequencing, yielding a method that can produce genome-wide CNV profiles of up to a hundred individual cells on a single lane of an Illumina HiSeq instrument. We apply the method to human cancer cell lines and biopsied cancer tissue, thereby illustrating its efficiency, reproducibility, and power to reveal underlying genetic heterogeneity and clonal phylogeny. The capacity of the method to facilitate the rapid profiling of hundreds to thousands of single-cell genomes represents a key step in making single-cell profiling an easily accessible tool for studying cell lineage. PMID:25858951
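
    Why sparse coverage suffices for copy number can be seen from a read-depth sketch: reads are counted in genomic bins and each bin is scaled by the median bin count. The fixed-width bins, the median normalization, the assumed ploidy of 2, and the function names are simplifying assumptions for illustration; the published method involves further steps (such as segmentation) that are not reproduced here.

```python
from statistics import median


def bin_counts(read_positions, genome_length, bin_size):
    """Count reads per fixed-width bin; positions are 0-based and < genome_length."""
    n_bins = (genome_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_positions:
        counts[pos // bin_size] += 1
    return counts


def copy_number_estimates(counts, ploidy=2):
    """Scale bin counts so that the median bin corresponds to the assumed ploidy."""
    typical = median(counts)
    if typical == 0:
        raise ValueError("median bin count is zero; coverage is too sparse")
    return [ploidy * count / typical for count in counts]


if __name__ == "__main__":
    # Hypothetical bin counts: the third bin looks amplified, the fifth looks lost.
    print(copy_number_estimates([10, 10, 20, 10, 5]))
```

    Because only the relative number of reads per bin matters, a few reads per bin across wide bins can be enough to separate gains from losses, which is what allows many cells to share one sequencing lane.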

  8. Performance and microbial ecology of a nitritation sequencing batch reactor treating high-strength ammonia wastewater

    PubMed Central

    Chen, Wenjing; Dai, Xiaohu; Cao, Dawen; Wang, Sha; Hu, Xiaona; Liu, Wenru; Yang, Dianhai

    2016-01-01

    The partial nitrification (PN) performance and the microbial community variations were evaluated in a sequencing batch reactor (SBR) for 172 days, with stepwise elevation of the ammonium concentration. Inhibition of nitrite-oxidizing bacteria (NOB) by free ammonia (FA) and low dissolved oxygen was used to achieve nitritation in the SBR. Over the 172 days of operation, the nitrogen loading rate of the SBR was finally raised to 3.6 kg N/m3/d, corresponding to an influent ammonium concentration of 1500 mg/L, with an ammonium removal efficiency and a nitrite accumulation rate of 94.12% and 83.54%, respectively, indicating that the syntrophic inhibition by FA and low dissolved oxygen contributed substantially to the stable nitrite accumulation. The results of 16S rRNA high-throughput sequencing revealed that Nitrospira, the only nitrite-oxidizing bacteria in the system, were successively inhibited and eliminated, and the SBR was finally dominated by Nitrosomonas, the ammonium-oxidizing bacteria, which had a relative abundance of 83%, indicating that Nitrosomonas played the primary role in establishing and maintaining nitritation. After Nitrosomonas, Anaerolineae (7.02%) and Saprospira (1.86%) were the other main genera in the biomass. PMID:27762325
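
    The performance figures in this abstract follow from simple mass-balance arithmetic, sketched below. The abstract does not define its terms, so the formulas are the commonly used definitions and are assumptions here: removal efficiency as the removed fraction of influent ammonium, nitrite accumulation rate as effluent nitrite over nitrite plus nitrate, and loading rate as influent concentration divided by hydraulic retention time. Function names are hypothetical.

```python
def removal_efficiency(influent_mg_l, effluent_mg_l):
    """Percentage of influent ammonium removed."""
    return 100.0 * (influent_mg_l - effluent_mg_l) / influent_mg_l


def nitrite_accumulation_rate(nitrite_mg_l, nitrate_mg_l):
    """Percentage of oxidized nitrogen in the effluent that is present as nitrite."""
    return 100.0 * nitrite_mg_l / (nitrite_mg_l + nitrate_mg_l)


def implied_retention_time_h(influent_mg_l, loading_kg_m3_d):
    """Hydraulic retention time (hours) implied by a volumetric loading rate.

    1 mg/L equals 0.001 kg/m3, so loading = concentration / retention time.
    """
    concentration_kg_m3 = influent_mg_l / 1000.0
    return 24.0 * concentration_kg_m3 / loading_kg_m3_d


if __name__ == "__main__":
    # Values from the abstract: 1500 mg/L influent at 3.6 kg N/m3/d
    # imply a retention time of about 10 hours under the stated assumption.
    print(implied_retention_time_h(1500, 3.6))
```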

  9. Optimizing sparse sequencing of single cells for highly multiplex copy number profiling.

    PubMed

    Baslan, Timour; Kendall, Jude; Ward, Brian; Cox, Hilary; Leotta, Anthony; Rodgers, Linda; Riggs, Michael; D'Italia, Sean; Sun, Guoli; Yong, Mao; Miskimen, Kristy; Gilmore, Hannah; Saborowski, Michael; Dimitrova, Nevenka; Krasnitz, Alexander; Harris, Lyndsay; Wigler, Michael; Hicks, James

    2015-05-01

    Genome-wide analysis at the level of single cells has recently emerged as a powerful tool to dissect genome heterogeneity in cancer, neurobiology, and development. To be truly transformative, single-cell approaches must affordably accommodate large numbers of single cells. This is feasible in the case of copy number variation (CNV), because CNV determination requires only sparse sequence coverage. We have used a combination of bioinformatic and molecular approaches to optimize single-cell DNA amplification and library preparation for highly multiplexed sequencing, yielding a method that can produce genome-wide CNV profiles of up to a hundred individual cells on a single lane of an Illumina HiSeq instrument. We apply the method to human cancer cell lines and biopsied cancer tissue, thereby illustrating its efficiency, reproducibility, and power to reveal underlying genetic heterogeneity and clonal phylogeny. The capacity of the method to facilitate the rapid profiling of hundreds to thousands of single-cell genomes represents a key step in making single-cell profiling an easily accessible tool for studying cell lineage.

  10. Palaeomagnetism and 40Ar/39Ar age of a Pliocene lava flow sequence in the Lesser Caucasus: record of a clockwise rotation and analysis of palaeosecular variation

    NASA Astrophysics Data System (ADS)

    Caccavari, Ana; Calvo-Rathert, Manuel; Goguitchaichvili, Avto; Huaiyu, He; Vashakidze, Goga; Vegas, Néstor

    2014-06-01

    A palaeomagnetic and rock-magnetic investigation has been carried out on a Pliocene lava flow sequence in the Djavakheti Highland in the central Lesser Caucasus in the Republic of Georgia. In addition, a 40Ar/39Ar dating and electronic microscopic studies were performed on samples of this sequence, named the Saro section, which consists of 39 successive lava flows of doleritic basalts. A characteristic magnetization could be isolated in all studied 39 flows, yielding reverse-polarity directions in all cases, a mean direction D = 202.2°, I = -60.6° (N = 39, α95 = 2.0°, k = 138) being obtained. Thermomagnetic experiments (strong-field versus temperature curves) suggested low-Ti titanomagnetites and low Curie-temperature titanomagnetites with a rather high titanium content (x ≈ 0.5-0.7) as the main carriers of remanence. Their domain structure is characterized by a mixture of single- and multidomain grains. 40Ar/39Ar dating yielded an age of 1.73 ± 0.03 Ma, interpreted as the eruption age of the uppermost lava flow of the sequence. Analysis of palaeomagnetic results and radiometric data from the present and a previous study allows two different explanations about the time of emplacement of the section: (i) The lower 36 flows of the sequence might have been emitted between the normal-polarity Reunion and Olduvai chrons, and the upper three flows after the Olduvai chron, with a long hiatus in volcanic activity of more than 150 kyr or (ii) The whole sequence has been emitted between 1.778 and 1.73 ± 0.03 Ma, after the Olduvai chron. Comparison of the palaeomagnetic results obtained in this study with the expected direction shows that while inclination values agree well, declination shows an eastward deviation of 19.2° ± 5.8°. This discrepancy can be explained with a clockwise vertical-axis rotation of the sequence, which might have been produced by extensional structures with strike-slip component, which can be found in the study area. Virtual geomagnetic pole

  11. Structure of the Bacterial Cytoskeleton Protein Bactofilin by NMR Chemical Shifts and Sequence Variation.

    PubMed

    Kassem, Maher M; Wang, Yong; Boomsma, Wouter; Lindorff-Larsen, Kresten

    2016-06-07

    Bactofilins constitute a recently discovered class of bacterial proteins that form cytoskeletal filaments. They share a highly conserved domain (DUF583) of which the structure remains unknown, in part due to the large size and noncrystalline nature of the filaments. Here, we describe the atomic structure of a bactofilin domain from Caulobacter crescentus. To determine the structure, we developed an approach that combines a biophysical model for proteins with recently obtained solid-state NMR spectroscopy data and amino acid contacts predicted from a detailed analysis of the evolutionary history of bactofilins. Our structure reveals a triangular β-helical (solenoid) conformation with conserved residues forming the tightly packed core and polar residues lining the surface. The repetitive structure explains the presence of internal repeats as well as strongly conserved positions, and is reminiscent of other fibrillar proteins. Our work provides a structural basis for future studies of bactofilin biology and for designing molecules that target them, as well as a starting point for determining the organization of the entire bactofilin filament. Finally, our approach presents new avenues for determining structures that are difficult to obtain by traditional means.

  12. (Genomic variation in maize)

    SciTech Connect

    Rivin, C.J.

    1991-01-01

    These studies have sought to learn how different DNA sequences and sequence arrangements contribute to genome plasticity in maize. We describe quantitative variation among maize inbred lines for tandemly arrayed and dispersed repeated DNA sequences and gene families, and qualitative variation for sequences homologous to the Mutator family of transposons. The potential of these sequences to undergo unequal crossing over, non-allelic (ectopic) recombination and transposition makes them a source of genome instability. We have found examples of rapid genomic change involving these sequences in F1 hybrids, tissue culture cells and regenerated plants. We describe the repetitive portion of the maize genome as composed primarily of sequences that vary markedly in copy number among different genetic stocks. The most highly variable is the 185 bp repeat associated with the heterochromatic chromosome knobs. Even in lines without visible knobs, there is a considerable quantity of tandemly arrayed repeats. We also found a high degree of variability for the tandemly arrayed 5S and ribosomal DNA repeats. While such variation might be expected as the result of unequal cross-over, we were surprised to find considerable variation among lower copy number, dispersed repeats as well. One highly repeated sequence that showed a complex tandem and dispersed arrangement stood out as showing no detectable variability among the maize lines. In striking contrast to the variability seen between the inbred stocks, individuals within a stock were indistinguishable with regard to their repeated sequence multiplicities.

  13. Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing

    PubMed Central

    Shafer, Aaron B. A.; Northrup, Joseph M.; Wikelski, Martin; Wittemyer, George; Wolf, Jochen B. W.

    2016-01-01

    Recent advancements in animal tracking technology and high-throughput sequencing are rapidly changing the questions and scope of research in the biological sciences. The integration of genomic data with high-tech animal instrumentation comes as a natural progression of traditional work in ecological genetics, and we provide a framework for linking the separate data streams from these technologies. Such a merger will elucidate the genetic basis of adaptive behaviors like migration and hibernation and advance our understanding of fundamental ecological and evolutionary processes such as pathogen transmission, population responses to environmental change, and communication in natural populations. PMID:26745372

  14. Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing.

    PubMed

    Shafer, Aaron B A; Northrup, Joseph M; Wikelski, Martin; Wittemyer, George; Wolf, Jochen B W

    2016-01-01

    Recent advancements in animal tracking technology and high-throughput sequencing are rapidly changing the questions and scope of research in the biological sciences. The integration of genomic data with high-tech animal instrumentation comes as a natural progression of traditional work in ecological genetics, and we provide a framework for linking the separate data streams from these technologies. Such a merger will elucidate the genetic basis of adaptive behaviors like migration and hibernation and advance our understanding of fundamental ecological and evolutionary processes such as pathogen transmission, population responses to environmental change, and communication in natural populations.

  15. Lithofacies variation across the Mammoth Cave-Pope Megagroup boundary -- a sequence stratigraphic approach

    SciTech Connect

    McDonald, T.A.; Tabor, E.; Marzolf, J.E. (Dept. of Geology)

    1994-04-01

    Regional stratigraphic relations in southern Illinois suggest a major unconformity near the top of the St. Genevieve Limestone. Large exposures below the unconformity within the Anna quarries display a retrogradational parasequence-stacking pattern. Eight to 12 m-thick parasequences comprise thinning-upward marine bioclastic wackestone overlain by oolitic and bioclastic thickening-upward eolian(?) grainstone. An eolian origin for the bioclastic grainstones is supported by large-scale cross stratification (0.5 to 2 m-thick sets), the reworked character of rounded, coated bioclasts, and preserved duneforms. At the quarries, the unconformity is directly overlain by mudstones and sandstones. Thinning-upward mudstones interbedded with very thin (1 to 3 cm thick) intraclastic packstone tempestites crop out in a roadcut about 500 m NE of the quarries. Small-scale ripples and the absence of trace fossils in lower mudstone units suggest an estuarine or lagoonal, brackish-water environment. The trace fossil Conostichus and horizontal burrows appear abruptly in the upper, thin mudstone units. Highly bioturbated green and red shales overlying a 1 to 4 m-thick covered interval in a roadcut 610 m farther north are interbedded with tidally deposited, medium- to coarse-grained, bioclastic grainstones. The shale-draped, medium cross-bedded grainstones document ten or more tidal bundles. The cross-bedded grainstone is overlain by wavy- to flaser-bedded very fine-grained sandstone suggestive of a sand flat origin. These sandstones are overlain by the Aux Vases Sandstone. Numerous low-angle bounding surfaces within the Aux Vases enclose low-angle, wedge-planar cross-bedding. A single irregular surface coated by a few centimeters of poorly sorted unstratified sandstone defines a ravinement surface near the base of the Aux Vases Sandstone.

  16. Phylogeography of East Asian Lespedeza buergeri (Fabaceae) based on chloroplast and nuclear ribosomal DNA sequence variations.

    PubMed

    Jin, Dong-Pil; Lee, Jung-Hyun; Xu, Bo; Choi, Byoung-Hee

    2016-09-01

    The dynamic changes in land configuration during the Quaternary that were accompanied by climatic oscillations have significantly influenced the current distribution and genetic structure of warm-temperate forests in East Asia. Although recent surveys have been conducted, the historical migration of forest species via land bridges and, especially, the origins of Korean populations remain conjectural. Here, we reveal the genetic structure of Lespedeza buergeri, a warm-temperate shrub that is disjunctively distributed around the East China Sea (ECS) in China, Korea, and Japan. Two non-coding regions (rpl32-trnL, psbA-trnH) of chloroplast DNA (cpDNA) and the internal transcribed spacer of nuclear ribosomal DNA (nrITS) were analyzed for 188 individuals from 16 populations, which covered almost all of its distribution. The nrITS data demonstrated a genetic structure that followed geographic boundaries. This examination utilized AMOVA, comparisons of genetic differentiation based on haplotype frequency/genetic mutations among haplotypes, and Mantel tests. However, the cpDNA data showed a contrasting genetic pattern, implying that this difference was due to a slower mutation rate in cpDNA than in nrITS. These results indicated frequent migration by this species via an ECS land bridge during the early Pleistocene that then tapered gradually toward the late Pleistocene. A genetic isolation between western and eastern Japan coincided with broad consensus that was suggested by the presence of other warm-temperate plants in that country. For Korean populations, high genetic diversity indicated the existence of refugia during the Last Glacial Maximum on the Korean Peninsula. However, their closeness with western Japanese populations at the level of haplotype clade implied that gene flow from western Japanese refugia was possible until post-glacial processing occurred through the Korea/Tsushima Strait land bridge.

  17. [Role of high-throughput sequencing in oncology].

    PubMed

    Rodrigues, Manuel Jorge; Gomez-Roca, Carlos

    2013-03-01

    New sequencing technologies are one of the most important technical advances in biology in the last 10 years. These technologies allow millions of DNA fragments to be sequenced in parallel, covering billions of bases in a short period of time. These techniques have led to the discovery of millions of variants whose functional and clinical value remains to be confirmed. This technology allows us to search for new constitutional and somatic mutations in various samples in a short time. Because of the complexity of data interpretation, the size of the data, and the substantial investment needed for implementation, these technologies are present only in large institutions. The objective of this article is to present the different techniques and their associated technologies and to discuss their current applications.

  18. Targeted high throughput sequencing in hereditary ataxia and spastic paraplegia

    PubMed Central

    Koht, Jeanette; Pihlstrøm, Lasse; Rengmark, Aina H.; Henriksen, Sandra P.; Tallaksen, Chantal M. E.; Toft, Mathias

    2017-01-01

    Hereditary ataxia and spastic paraplegia are heterogeneous monogenic neurodegenerative disorders. To date, a large number of individuals with such disorders remain undiagnosed. Here, we have assessed molecular diagnosis by gene panel sequencing in 105 early and late-onset hereditary ataxia and spastic paraplegia probands, in whom extensive previous investigations had failed to identify the genetic cause of disease. Pathogenic and likely-pathogenic variants were identified in 20 probands (19%) and variants of uncertain significance in ten probands (10%). Together these accounted for 30 probands (29%) and involved 18 different genes. Among several interesting findings, dominantly inherited KIF1A variants, p.(Val8Met) and p.(Ile27Thr) segregated in two independent families, both presenting with a pure spastic paraplegia phenotype. Two homozygous missense variants, p.(Gly4230Ser) and p.(Leu4221Val) were found in SACS in one consanguineous family, presenting with spastic ataxia and isolated cerebellar atrophy. The average disease duration in probands with pathogenic and likely-pathogenic variants was 31 years, ranging from 4 to 51 years. In conclusion, this study confirmed and expanded the clinical phenotypes associated with known disease genes. The results demonstrate that gene panel sequencing and similar sequencing approaches can serve as efficient diagnostic tools for different heterogeneous disorders. Early use of such strategies may help to reduce both costs and time of the diagnostic process. PMID:28362824

  19. Construction of a high-density genetic map for grape using next generation restriction-site associated DNA sequencing

    PubMed Central

    2012-01-01

    Background Genetic mapping and QTL detection are powerful methodologies in plant improvement and breeding. Construction of a high-density and high-quality genetic map would be of great benefit in the production of superior grapes to meet human demand. High throughput and low cost of the recently developed next generation sequencing (NGS) technology have resulted in its wide application in genome research. Sequencing restriction-site associated DNA (RAD) might be an efficient strategy to simplify genotyping. Combining NGS with RAD has proven to be powerful for single nucleotide polymorphism (SNP) marker development. Results An F1 population of 100 individual plants was developed. In-silico digestion-site prediction was used to select an appropriate restriction enzyme for construction of a RAD sequencing library. Next generation RAD sequencing was applied to genotype the F1 population and its parents. Applying a cluster strategy for SNP modulation, a total of 1,814 high-quality SNP markers were developed: 1,121 of these were mapped to the female genetic map, 759 to the male map, and 1,646 to the integrated map. A comparison of the genetic maps to the published Vitis vinifera genome revealed both conservation and variations. Conclusions The applicability of next generation RAD sequencing for genotyping a grape F1 population was demonstrated, leading to the successful development of a genetic map with high density and quality using our designed SNP markers. Detailed analysis revealed that this newly developed genetic map can be used for a variety of genome investigations, such as QTL detection, sequence assembly and genome comparison. PMID:22908993
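
    The in-silico digestion-site prediction step mentioned in the Results can be sketched as a scan for a recognition sequence followed by a fragment-length calculation. The abstract does not name the enzyme that was chosen, so EcoRI (recognition site GAATTC, cutting after the first G) is used purely as an assumed example, and the function names and the test sequence are hypothetical.

```python
def find_sites(sequence, recognition_site):
    """Return 0-based start positions of every occurrence of the recognition site."""
    seq = sequence.upper()
    site = recognition_site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # step by one to allow overlapping hits
    return positions


def fragment_lengths(sequence, recognition_site, cut_offset):
    """Fragment lengths after complete digestion of a linear molecule.

    cut_offset is the number of bases from the start of the site to the cut.
    """
    cuts = [p + cut_offset for p in find_sites(sequence, recognition_site)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical sequence; EcoRI cuts between G and AATTC (offset 1).
    example = "AAGAATTCTTTTGAATTCGG"
    print(find_sites(example, "GAATTC"))
    print(fragment_lengths(example, "GAATTC", 1))
```

    Running such a scan over a reference genome for several candidate enzymes gives the expected number of sites per enzyme, which is the kind of figure one would compare when choosing an enzyme for a RAD library.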

  20. Whole Genome Sequencing of Enterovirus species C Isolates by High-Throughput Sequencing: Development of Generic Primers

    PubMed Central

    Bessaud, Maël; Sadeuh-Mba, Serge A.; Joffret, Marie-Line; Razafindratsimandresy, Richter; Polston, Patsy; Volle, Romain; Rakoto-Andrianarivelo, Mala; Blondel, Bruno; Njouom, Richard; Delpeyroux, Francis

    2016-01-01

    Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C) consists of more than 20 types, including the three serotypes of polioviruses, the etiological agents of poliomyelitis. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require the sequencing of different genomic regions. A simple method was developed to quickly sequence the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to being sequenced by a high-throughput technique. The method was assessed on a panel of EV-Cs belonging to a wide range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures. By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a valuable tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses. PMID:27617004

  1. Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea

    PubMed Central

    Goldsmith, Dawn B.; Parsons, Rachel J.; Beyene, Damitu; Salamon, Peter

    2015-01-01

    Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years. PMID:26157645
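
    The pattern described in this abstract, a few large OTUs holding most reads while most OTUs are small and rare, can be summarized from a rank-abundance table. The sketch below assumes each read has already been assigned an OTU label (for example by clustering at 97% identity, which is not shown). The function names, the `top_k` parameter, and the `rare_max` threshold are illustrative assumptions rather than anything taken from the study.

```python
# Hypothetical sketch of a rank-abundance summary; thresholds are illustrative.
from collections import Counter


def rank_abundance(otu_assignments):
    """Return OTU read counts sorted from most to least abundant."""
    counts = Counter(otu_assignments)
    return sorted(counts.values(), reverse=True)


def dominance_summary(otu_assignments, top_k=3, rare_max=1):
    """Summarize richness, the read share of the top_k OTUs, and the share of rare OTUs."""
    ranked = rank_abundance(otu_assignments)
    if not ranked:
        raise ValueError("no OTU assignments supplied")
    total_reads = sum(ranked)
    return {
        "richness": len(ranked),  # number of distinct OTUs
        "top_k_fraction": sum(ranked[:top_k]) / total_reads,  # reads in the largest OTUs
        "rare_otu_fraction": sum(c <= rare_max for c in ranked) / len(ranked),
    }
```

    Computing the same summary per depth, season, and year would make it possible to check whether the same OTUs stay dominant across samples.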

  2. Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea.

    PubMed

    Goldsmith, Dawn B; Parsons, Rachel J; Beyene, Damitu; Salamon, Peter; Breitbart, Mya

    2015-01-01

    Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years.

  3. Chromosomal Instability Estimation Based on Next Generation Sequencing and Single Cell Genome Wide Copy Number Variation Analysis

    PubMed Central

    Dago, Angel E.; Leitz, Laura J.; Wang, Yipeng; Lee, Jerry; Werner, Shannon L.; Gendreau, Steven; Patel, Premal; Jia, Shidong; Zhang, Liangxuan; Tucker, Eric K.; Malchiodi, Michael; Graf, Ryon P.; Dittamore, Ryan; Marrinucci, Dena; Landers, Mark

    2016-01-01

    Genomic instability is a hallmark of cancer often associated with poor patient outcome and resistance to targeted therapy. Assessment of genomic instability in bulk tumor or biopsy can be complicated due to sample availability, surrounding tissue contamination, or tumor heterogeneity. The Epic Sciences circulating tumor cell (CTC) platform utilizes a non-enrichment based approach for the detection and characterization of rare tumor cells in clinical blood samples. Genomic profiling of individual CTCs could provide a portrait of cancer heterogeneity, identify clonal and sub-clonal drivers, and monitor disease progression. To that end, we developed a single cell Copy Number Variation (CNV) Assay to evaluate genomic instability and CNVs in patient CTCs. For proof of concept, prostate cancer cell lines, LNCaP, PC3 and VCaP, were spiked into healthy donor blood to create mock patient-like samples for downstream single cell genomic analysis. In addition, samples from seven metastatic castration resistant prostate cancer (mCRPC) patients were included to evaluate clinical feasibility. CTCs were enumerated and characterized using the Epic Sciences CTC Platform. Identified single CTCs were recovered, whole genome amplified, and sequenced using an Illumina NextSeq 500. CTCs were then analyzed for genome-wide copy number variations, followed by genomic instability analyses. Large-scale state transitions (LSTs) were measured as surrogates of genomic instability. Genomic instability scores were determined reproducibly for LNCaP, PC3, and VCaP, and were higher than white blood cell (WBC) controls from healthy donors. A wide range of LST scores were observed within and among the seven mCRPC patient samples. On the gene level, loss of the PTEN tumor suppressor was observed in PC3 and 5/7 (71%) patients. Amplification of the androgen receptor (AR) gene was observed in VCaP cells and 5/7 (71%) mCRPC patients. Using an in silico down-sampling approach, we determined that DNA copy

  4. Rapid detection of sequence variation in Clostridium difficile genes using LATE-PCR with multiple mismatch-tolerant hybridization probes.

    PubMed

    Pierce, Kenneth E; Khan, Huma; Mistry, Rohit; Goldenberg, Simon D; French, Gary L; Wangh, Lawrence J

    2012-11-01

    A novel molecular assay for Clostridium difficile was developed using Linear-After-The-Exponential polymerase chain reaction (LATE-PCR). Single-stranded DNA products generated by LATE-PCR were detected and distinguished by hybridization to fluorescent mismatch-tolerant probes, as the temperature was lowered after amplification in 5°C intervals between 65°C and 25°C. Single-tube multiplex reactions for tcdA, tcdB, tcdC, and cdtB (binary toxin) sequences were initially optimized using synthetic targets and were subsequently done using genomic DNA; each target was detected and characterized by hybridization to one or more probes of a different fluorescent color. In the case of tcdC, three probes, each labeled with a Quasar fluorophore, hybridize to different locations with known mutations, including the deletion at nucleotide 117 in ribotype 027 strains and the premature stop codon mutation at nucleotide 184 in ribotype 078 strains, each of which is associated with hypervirulent infections. These and other tcdC mutations were distinguished from the reference sequence, as well as from each other by changes in the fluorescent contour generated from the combined Quasar-labeled probes. Specific variations in tcdA and tcdB were also identified in the multiplex assay, including those that identified strains lacking toxin A production. This single closed-tube assay generates substantially more information about virulent C. difficile than currently available commercial assays and could be further expanded to provide strain typing.

  5. Sequence variation in PPP1R13L results in a novel form of cardio-cutaneous syndrome.

    PubMed

    Falik-Zaccai, Tzipora C; Barsheshet, Yiftah; Mandel, Hanna; Segev, Meital; Lorber, Avraham; Gelberg, Shachaf; Kalfon, Limor; Ben Haroush, Shani; Shalata, Adel; Gelernter-Yaniv, Liat; Chaim, Sarah; Raviv Shay, Dorith; Khayat, Morad; Werbner, Michal; Levi, Inbar; Shoval, Yishay; Tal, Galit; Shalev, Stavit; Reuveni, Eli; Avitan-Hersh, Emily; Vlodavsky, Eugene; Appl-Sarid, Liat; Goldsher, Dorit; Bergman, Reuven; Segal, Zvi; Bitterman-Deutsch, Ora; Avni, Orly

    2017-03-01

    Dilated cardiomyopathy (DCM) is a life-threatening disorder whose genetic basis is heterogeneous and mostly unknown. Five Arab Christian infants, aged 4-30 months from four families, were diagnosed with DCM associated with mild skin, teeth, and hair abnormalities. All passed away before age 3. A homozygous sequence variation creating a premature stop codon at PPP1R13L encoding the iASPP protein was identified in three infants and in the mother of the other two. Patients' fibroblasts and PPP1R13L-knocked down human fibroblasts presented higher expression levels of pro-inflammatory cytokine genes in response to lipopolysaccharide, as well as Ppp1r13l-knocked down murine cardiomyocytes and hearts of Ppp1r13l-deficient mice. The hypersensitivity to lipopolysaccharide was NF-κB-dependent, and its inducible binding activity to promoters of pro-inflammatory cytokine genes was elevated in patients' fibroblasts. RNA sequencing of Ppp1r13l-knocked down murine cardiomyocytes and of hearts derived from different stages of DCM development in Ppp1r13l-deficient mice revealed the crucial role of iASPP in dampening cardiac inflammatory response. Our results determined PPP1R13L as the gene underlying a novel autosomal-recessive cardio-cutaneous syndrome in humans and strongly suggest that the fatal DCM during infancy is a consequence of failure to regulate transcriptional pathways necessary for tuning cardiac threshold response to common inflammatory stressors.

  6. Sequence variation in couch potato and its effects on life-history traits in a northern malt fly, Drosophila montana.

    PubMed

    Kankare, Maaria; Salminen, Tiina S; Lampinen, Hanna; Hoikkala, Anneli

    2012-02-01

    Couch potato (cpo) has previously been connected to reproductive diapause in several insect species including Drosophila melanogaster, where it has been suggested to provide a link between the insulin signalling pathway and the hormonal control of diapause. In the first part of the study we sequenced nearly 3.6 kb of this gene in a northern Drosophila species (Drosophila montana) with a robust photoperiodically determined diapause and found several types of polymorphisms along the sequenced area. We also found variation among five Drosophila virilis group species in the length of the 5th exon of cpo and in the site of the stop codon at the end of this exon. The second part of the study was targeted on a deletion of six amino acids located in the last section of exon 5, which in D. melanogaster, is translated only in one short transcript lacking the following exons. The studied deletion appeared to be extremely rare in the wild D. montana population where it was found, but its frequency rapidly increased during laboratory culture. qPCR analyses showed the expression level of the deletion allele to be significantly downregulated in both the diapausing and non-diapausing females compared to the wild type allele. At the phenotypic level, the deletion and the decreased expression of cpo transcript involving it did not have direct effect on the incidence of female reproductive diapause, but it was associated with a reduction in development time under diapause-inducing conditions. This suggests that while the cpo transcript containing the prolonged version of the 5th exon with a stop codon is clearly associated with fly development time, the exons with RNA domains included in other transcripts of the gene may be more directly related to diapause regulation.

  7. A high-density remote reference magnetic variation profile in the Pacific northwest of North America

    USGS Publications Warehouse

    Hermance, J.F.; Lusi, S.; Slocum, W.; Neumann, G.A.; Green, A.W.

    1989-01-01

    During the summer of 1985, as part of the EMSLAB Project, Brown University conducted a detailed magnetic variation study of the Oregon Coast Range and Cascades volcanic system along an E-W profile in central Oregon. Comprised of a sequence of 75 remote reference magnetic variation (MV) stations spaced 3-4 km apart, the profile stretched for 225 km from Newport, on the Oregon coast, across the Coast Range, the Willamette Valley, and the High Cascades to a point approximately 50 km east of Santiam Pass. At all of the MV stations, data were collected for short periods (16-100 s), and at 17 of these stations data were also obtained at longer periods (100-1600 s). Data were monitored with a three-component ring core fluxgate magnetometer (Nanotesla), and were recorded with a microcomputer (DEC PDP 11/73) based data acquisition system. A 2-D generalized inversion of the magnetic transfer coefficients over the period range of 16-1600 s indicates four distinct conductors. First, we see the coast effect caused by a large sedimentary wedge offshore. Second, we see the effect of currents flowing in the conductive sediments of the Willamette Valley. Our inversion suggests that the Willamette Valley consists of two electrically distinct features, due perhaps to a horst-like structure imprinted on the valley sediments. Next we note an electric current system centered beneath the High Cascades. This latter feature may be associated with a sediment-filled graben beneath Santiam Pass as suggested by some of the gravity and MT results reported to date. Finally, we detect the presence of a deep conductor at mid-crustal depths which laterally extends westward from beneath the Basin and Range Province, and terminates beneath the western Cascades. One view of this last result is that it appears that modern Basin and Range structure is being imprinted on pre-existing Cascade structure.

  8. Heavy-light chain interrelations of MS-associated immunoglobulins probed by deep sequencing and rational variation.

    PubMed

    Lomakin, Yakov A; Zakharova, Maria Yu; Stepanov, Alexey V; Dronina, Maria A; Smirnov, Ivan V; Bobik, Tatyana V; Pyrkov, Andrey Yu; Tikunova, Nina V; Sharanova, Svetlana N; Boitsov, Vitali M; Vyazmin, Sergey Yu; Kabilov, Marsel R; Tupikin, Alexey E; Krasnov, Alexey N; Bykova, Nadezda A; Medvedeva, Yulia A; Fridman, Marina V; Favorov, Alexander V; Ponomarenko, Natalia A; Dubina, Michael V; Boyko, Alexey N; Vlassov, Valentin V; Belogurov, Alexey A; Gabibov, Alexander G

    2014-12-01

    The mechanisms triggering most autoimmune diseases are still obscure. Autoreactive B cells play a crucial role in the development of such pathologies and, in particular, production of autoantibodies of different specificities. The combination of deep-sequencing technology with functional studies of antibodies selected from highly representative immunoglobulin combinatorial libraries may provide unique information on specific features in the repertoires of autoreactive B cells. Here, we have analyzed cross-combinations of the variable regions of human immunoglobulins against the myelin basic protein (MBP) previously selected from a multiple sclerosis (MS)-related scFv phage-display library. In addition, we have performed deep sequencing of the sublibraries of scFvs against MBP, Epstein-Barr virus (EBV) latent membrane protein 1 (LMP1), and myelin oligodendrocyte glycoprotein (MOG). Bioinformatics analysis of sequencing data and surface plasmon resonance (SPR) studies have shown that it is the variable fragments of antibody heavy chains that mainly determine both the affinity of antibodies to the parent autoantigen and their cross-reactivity. It is suggested that LMP1-cross-reactive anti-myelin autoantibodies contain heavy chains encoded by certain germline gene segments, which may be a hallmark of the EBV-specific B cell subpopulation involved in MS triggering.

  9. From Wet‐Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing

    PubMed Central

    Laurie, Steve; Fernandez‐Callejo, Marcos; Marco‐Sola, Santiago; Trotta, Jean‐Remi; Camps, Jordi; Chacón, Alejandro; Espinosa, Antonio; Gut, Marta; Gut, Ivo; Heath, Simon

    2016-01-01

    ABSTRACT As whole genome sequencing becomes cheaper and faster, it will progressively substitute targeted next‐generation sequencing as standard practice in research and diagnostics. However, computing cost–performance ratio is not advancing at an equivalent rate. Therefore, it is essential to evaluate the robustness of the variant detection process taking into account the computing resources required. We have benchmarked six combinations of state‐of‐the‐art read aligners (BWA‐MEM and GEM3) and variant callers (FreeBayes, GATK HaplotypeCaller, SAMtools) on whole genome and whole exome sequencing data from the NA12878 human sample. Results have been compared between them and against the NIST Genome in a Bottle (GIAB) variants reference dataset. We report differences in speed of up to 20 times in some steps of the process and have observed that SNV, and to a lesser extent InDel, detection is highly consistent in 70% of the genome. SNV, and especially InDel, detection is less reliable in 20% of the genome, and almost unfeasible in the remaining 10%. These findings will aid in choosing the appropriate tools bearing in mind objectives, workload, and computing infrastructure available. PMID:27604516
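
    The kind of concordance comparison described in this abstract, between pipelines and against a reference call set, can be sketched with plain set operations. The sketch assumes each variant has been reduced to a `(chrom, pos, ref, alt)` tuple and that all call sets use the same normalized representation. The function names and the tuple layout are illustrative assumptions, not the benchmark's actual implementation.

```python
# Hypothetical sketch of call-set concordance; not the published benchmark code.

def concordance(calls, truth):
    """Compare one call set against a truth set of (chrom, pos, ref, alt) tuples."""
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)  # called and present in the truth set
    fp = len(calls - truth)  # called but absent from the truth set
    fn = len(truth - calls)  # present in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


def pairwise_jaccard(callsets):
    """Return the Jaccard similarity for every pair of named call sets."""
    names = sorted(callsets)
    result = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            a, b = set(callsets[first]), set(callsets[second])
            union = a | b
            result[(first, second)] = len(a & b) / len(union) if union else 1.0
    return result
```

    Exact tuple matching is a simplification: indels in particular can be written in several equivalent ways, so real comparisons normally normalize variants first and restrict the evaluation to the regions covered by the reference dataset.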

  10. From Wet-Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing.

    PubMed

    Laurie, Steve; Fernandez-Callejo, Marcos; Marco-Sola, Santiago; Trotta, Jean-Remi; Camps, Jordi; Chacón, Alejandro; Espinosa, Antonio; Gut, Marta; Gut, Ivo; Heath, Simon; Beltran, Sergi

    2016-12-01

    As whole genome sequencing becomes cheaper and faster, it will progressively substitute targeted next-generation sequencing as standard practice in research and diagnostics. However, computing cost-performance ratio is not advancing at an equivalent rate. Therefore, it is essential to evaluate the robustness of the variant detection process taking into account the computing resources required. We have benchmarked six combinations of state-of-the-art read aligners (BWA-MEM and GEM3) and variant callers (FreeBayes, GATK HaplotypeCaller, SAMtools) on whole genome and whole exome sequencing data from the NA12878 human sample. Results have been compared between them and against the NIST Genome in a Bottle (GIAB) variants reference dataset. We report differences in speed of up to 20 times in some steps of the process and have observed that SNV, and to a lesser extent InDel, detection is highly consistent in 70% of the genome. SNV, and especially InDel, detection is less reliable in 20% of the genome, and almost unfeasible in the remaining 10%. These findings will aid in choosing the appropriate tools bearing in mind objectives, workload, and computing infrastructure available.

  11. High-order variational perturbation theory for the free energy.

    PubMed

    Weissbach, Florian; Pelster, Axel; Hamprecht, Bodo

    2002-09-01

    In this paper we introduce a generalization to the algebraic Bender-Wu recursion relation for the eigenvalues and the eigenfunctions of the anharmonic oscillator. We extend this well known formalism to the time-dependent quantum statistical Schrödinger equation, thus obtaining the imaginary-time evolution amplitude by solving a recursive set of ordinary differential equations. This approach enables us to evaluate global and local quantum statistical quantities of the anharmonic oscillator to much higher orders than by evaluating Feynman diagrams. We probe our perturbative results by deriving a perturbative expression for the free energy, which is then subject to variational perturbation theory as developed by Kleinert, yielding convergent results for the free energy for all values of the coupling strength.

  12. Intragenomic sequence variation at the ITS1 - ITS2 region and at the 18S and 28S nuclear ribosomal DNA genes of the New Zealand mud snail, Potamopyrgus antipodarum (Hydrobiidae: mollusca)

    USGS Publications Warehouse

    Hoy, Marshal S.; Rodriguez, Rusty J.

    2013-01-01

    Molecular genetic analysis was conducted on two populations of the invasive non-native New Zealand mud snail (Potamopyrgus antipodarum), one from a freshwater ecosystem in Devil's Lake (Oregon, USA) and the other from an ecosystem of higher salinity in the Columbia River estuary (Hammond Harbor, Oregon, USA). To elucidate potential genetic differences between the two populations, three segments of nuclear ribosomal DNA (rDNA), the ITS1-ITS2 regions and the 18S and 28S rDNA genes were cloned and sequenced. Variant sequences within each individual were found in all three rDNA segments. Folding models were utilized for secondary structure analysis and results indicated that there were many sequences which contained structure-altering polymorphisms, which suggests they could be nonfunctional pseudogenes. In addition, analysis of molecular variance (AMOVA) was used for hierarchical analysis of genetic variance to estimate variation within and among populations and within individuals. AMOVA revealed significant variation in the ITS region between the populations and among clones within individuals, while in the 5.8S rDNA significant variation was revealed among individuals within the two populations. High levels of intragenomic variation were found in the ITS regions, which are known to be highly variable in many organisms. More interestingly, intragenomic variation was also found in the 18S and 28S rDNA, which has rarely been observed in animals and is so far unreported in Mollusca. We postulate that in these P. antipodarum populations the effects of concerted evolution are diminished due to the fact that not all of the rDNA genes in their polyploid genome should be essential for sustaining cellular function. This could lead to a lessening of selection pressures, allowing mutations to accumulate in some copies, changing them into variant sequences.                   

  13. Hfqs in Bacillus anthracis: Role of protein sequence variation in the structure and function of proteins in the Hfq family.

    PubMed

    Vrentas, Catherine; Ghirlando, Rodolfo; Keefer, Andrea; Hu, Zonglin; Tomczak, Aurelie; Gittis, Apostolos G; Murthi, Athulaprabha; Garboczi, David N; Gottesman, Susan; Leppla, Stephen H

    2015-11-01

    Hfq proteins in Gram-negative bacteria play important roles in bacterial physiology and virulence, mediated by binding of the Hfq hexamer to small RNAs and/or mRNAs to post-transcriptionally regulate gene expression. However, the physiological role of Hfqs in Gram-positive bacteria is less clear. Bacillus anthracis, the causative agent of anthrax, uniquely expresses three distinct Hfq proteins, two from the chromosome (Hfq1, Hfq2) and one from its pXO1 virulence plasmid (Hfq3). The protein sequences of Hfq1 and 3 are evolutionarily distinct from those of Hfq2 and of Hfqs found in other Bacilli. Here, the quaternary structure of each B. anthracis Hfq protein, as produced heterologously in Escherichia coli, was characterized. While Hfq2 adopts the expected hexamer structure, Hfq1 does not form similarly stable hexamers in vitro. The impact on the monomer-hexamer equilibrium of varying Hfq C-terminal tail length and other sequence differences among the Hfqs was examined, and a sequence region of the Hfq proteins that was involved in hexamer formation was identified. It was found that, in addition to the distinct higher-order structures of the Hfq homologs, they give rise to different phenotypes. Hfq1 has a disruptive effect on the function of E. coli Hfq in vivo, while Hfq3 expression at high levels is toxic to E. coli but also partially complements Hfq function in E. coli. These results set the stage for future studies of the roles of these proteins in B. anthracis physiology and for the identification of sequence determinants of phenotypic complementation.

  14. Fetal akinesia deformation sequence in a highly developed acardius twin.

    PubMed

    Konstantinidou, A E; Agapitos, E V; Pavlopoulos, P M; Davaris, P S

    1997-10-01

    We report a case of a holoacardius twin with extremely advanced development of the head, face, upper and lower limbs in the absence of all thoracic and upper abdominal viscera and associated with intestinal and anal atresia. The malformed fetus also had craniofacial abnormalities, hydrops, cystic hygroma of the neck, arthrogryposis and pterygia. The monozygous co-twin was found to be normal. The association of acardia with the typical characteristics of the fetal akinesia deformation sequence has not been previously described in the literature.

  15. CYP2D7 Sequence Variation Interferes with TaqMan CYP2D6*15 and *35 Genotyping

    PubMed Central

    Riffel, Amanda K.; Dehghani, Mehdi; Hartshorne, Toinette; Floyd, Kristen C.; Leeder, J. Steven; Rosenblatt, Kevin P.; Gaedigk, Andrea

    2016-01-01

    TaqMan™ genotyping assays are widely used to genotype CYP2D6, which encodes a major drug metabolizing enzyme. Assay design for CYP2D6 can be challenging owing to the presence of two pseudogenes, CYP2D7 and CYP2D8, structural and copy number variation and numerous single nucleotide polymorphisms (SNPs) some of which reflect the wild-type sequence of the CYP2D7 pseudogene. The aim of this study was to identify the mechanism causing false-positive CYP2D6*15 calls and remediate those by redesigning and validating alternative TaqMan genotype assays. Among 13,866 DNA samples genotyped by the CompanionDx® lab on the OpenArray platform, 70 samples were identified as heterozygotes for 137Tins, the key SNP of CYP2D6*15. However, only 15 samples were confirmed when tested with the Luminex xTAG CYP2D6 Kit and sequencing of CYP2D6-specific long range (XL)-PCR products. Genotype and gene resequencing of CYP2D6 and CYP2D7-specific XL-PCR products revealed a CC>GT dinucleotide SNP in exon 1 of CYP2D7 that reverts the sequence to CYP2D6 and allows a TaqMan assay PCR primer to bind. Because CYP2D7 also carries a Tins, a false-positive mutation signal is generated. This CYP2D7 SNP was also responsible for generating false-positive signals for rs769258 (CYP2D6*35) which is also located in exon 1. Although alternative CYP2D6*15 and *35 assays resolved the issue, we discovered a novel CYP2D6*15 subvariant in one sample that carries additional SNPs preventing detection with the alternate assay. The frequency of CYP2D6*15 was 0.1% in this ethnically diverse U.S. population sample. In addition, we also discovered linkage between the CYP2D7 CC>GT dinucleotide SNP and the 77G>A (rs28371696) SNP of CYP2D6*43. The frequency of this tentatively functional allele was 0.2%. Taken together, these findings emphasize that regardless of how carefully genotyping assays are designed and evaluated before being commercially marketed, rare or unknown SNPs underneath primer and/or probe regions can impact

  16. CYP2D7 Sequence Variation Interferes with TaqMan CYP2D6*15 and *35 Genotyping.

    PubMed

    Riffel, Amanda K; Dehghani, Mehdi; Hartshorne, Toinette; Floyd, Kristen C; Leeder, J Steven; Rosenblatt, Kevin P; Gaedigk, Andrea

    2015-01-01

    TaqMan™ genotyping assays are widely used to genotype CYP2D6, which encodes a major drug metabolizing enzyme. Assay design for CYP2D6 can be challenging owing to the presence of two pseudogenes, CYP2D7 and CYP2D8, structural and copy number variation and numerous single nucleotide polymorphisms (SNPs) some of which reflect the wild-type sequence of the CYP2D7 pseudogene. The aim of this study was to identify the mechanism causing false-positive CYP2D6*15 calls and remediate those by redesigning and validating alternative TaqMan genotype assays. Among 13,866 DNA samples genotyped by the CompanionDx® lab on the OpenArray platform, 70 samples were identified as heterozygotes for 137Tins, the key SNP of CYP2D6*15. However, only 15 samples were confirmed when tested with the Luminex xTAG CYP2D6 Kit and sequencing of CYP2D6-specific long range (XL)-PCR products. Genotype and gene resequencing of CYP2D6 and CYP2D7-specific XL-PCR products revealed a CC>GT dinucleotide SNP in exon 1 of CYP2D7 that reverts the sequence to CYP2D6 and allows a TaqMan assay PCR primer to bind. Because CYP2D7 also carries a Tins, a false-positive mutation signal is generated. This CYP2D7 SNP was also responsible for generating false-positive signals for rs769258 (CYP2D6*35) which is also located in exon 1. Although alternative CYP2D6*15 and *35 assays resolved the issue, we discovered a novel CYP2D6*15 subvariant in one sample that carries additional SNPs preventing detection with the alternate assay. The frequency of CYP2D6*15 was 0.1% in this ethnically diverse U.S. population sample. In addition, we also discovered linkage between the CYP2D7 CC>GT dinucleotide SNP and the 77G>A (rs28371696) SNP of CYP2D6*43. The frequency of this tentatively functional allele was 0.2%. Taken together, these findings emphasize that regardless of how carefully genotyping assays are designed and evaluated before being commercially marketed, rare or unknown SNPs underneath primer

  17. Wavefront Amplitude Variation of TPF's High Contrast Imaging Testbed: Modeling and Experiment

    NASA Technical Reports Server (NTRS)

    Shi, Fang; Lowman, Andrew E.; Moody, Dwight C.; Niessner, Albert F.; Trauger, John T.

    2005-01-01

    Knowledge of wavefront amplitude is as important as knowledge of phase for a coronagraphic high contrast imaging system. Efforts have been made to understand the various contributions to the amplitude variation in the Terrestrial Planet Finder's (TPF) High Contrast Imaging Testbed (HCIT). Modeling of HCIT with as-built mirror surfaces has shown an amplitude variation of 1.3% due to phase-amplitude mixing in the testbed's front-end optics. Experimental measurements on the testbed have shown that the amplitude variation is about 2.5%, with the testbed's illumination pattern making a major contribution in the form of low-order amplitude variation.

  18. Improved resolution of bacteria by high throughput sequence analysis of the rRNA internal transcribed spacer

    PubMed Central

    Ruegger, Paul M.; Clark, Robin T.; Weger, John R.; Braun, Jonathan; Borneman, James

    2014-01-01

    Current high throughput sequencing (HTS) methods are limited in their ability to resolve bacteria at or below the genus level. While the impact of this limitation may be relatively minor in whole-community analyses, it constrains the use of HTS as a tool for identifying and examining individual bacteria of interest. The limited resolution is a consequence of both short read lengths and insufficient sequence variation within the commonly targeted variable regions of the small-subunit rRNA (SSU) gene. The goal of this work was to improve the resolving power of bacterial HTS. We developed an assay targeting the hypervariable rRNA internal transcribed spacer (ITS) region residing between the SSU and large-subunit (LSU) rRNA genes. Comparisons of the ITS region and two SSU regions using annotated bacterial genomes in GenBank showed much greater resolving power is possible with the ITS region. This report presents a new HTS method for analyzing bacterial composition with improved capabilities. The greater resolving power enabled by the ITS region arises from its high sequence variation across a wide range of bacterial taxa and an associated decrease in taxonomic heterogeneity within its OTUs. Although the method should be adaptable to any HTS platform, this report presents PCR primers, amplification parameters, and protocols for Illumina-based analyses. PMID:25034229

  19. Improved resolution of bacteria by high throughput sequence analysis of the rRNA internal transcribed spacer.

    PubMed

    Ruegger, Paul M; Clark, Robin T; Weger, John R; Braun, Jonathan; Borneman, James

    2014-10-01

    Current high throughput sequencing (HTS) methods are limited in their ability to resolve bacteria at or below the genus level. While the impact of this limitation may be relatively minor in whole-community analyses, it constrains the use of HTS as a tool for identifying and examining individual bacteria of interest. The limited resolution is a consequence of both short read lengths and insufficient sequence variation within the commonly targeted variable regions of the small-subunit rRNA (SSU) gene. The goal of this work was to improve the resolving power of bacterial HTS. We developed an assay targeting the hypervariable rRNA internal transcribed spacer (ITS) region residing between the SSU and large-subunit (LSU) rRNA genes. Comparisons of the ITS region and two SSU regions using annotated bacterial genomes in GenBank showed much greater resolving power is possible with the ITS region. This report presents a new HTS method for analyzing bacterial composition with improved capabilities. The greater resolving power enabled by the ITS region arises from its high sequence variation across a wide range of bacterial taxa and an associated decrease in taxonomic heterogeneity within its OTUs. Although the method should be adaptable to any HTS platform, this report presents PCR primers, amplification parameters, and protocols for Illumina-based analyses.

  20. Discordant distribution of populations and genetic variation in a sea star with high dispersal potential.

    PubMed

    Keever, Carson C; Sunday, Jennifer; Puritz, Jonathan B; Addison, Jason A; Toonen, Robert J; Grosberg, Richard K; Hart, Michael W

    2009-12-01

    Patiria miniata, a broadcast-spawning sea star species with high dispersal potential, has a geographic range in the intertidal zone of the northeast Pacific Ocean from Alaska to California that is characterized by a large range gap in Washington and Oregon. We analyzed spatial genetic variation across the P. miniata range using multilocus sequence data (mtDNA, nuclear introns) and multilocus genotype data (microsatellites). We found a strong phylogeographic break at Queen Charlotte Sound in British Columbia that was not in the location predicted by the geographical distribution of the populations. However, this population genetic discontinuity does correspond to previously described phylogeographic breaks in other species. Northern populations from Alaska and Haida Gwaii were strongly differentiated from all southern populations from Vancouver Island and California. Populations from Vancouver Island and California were undifferentiated with evidence of high gene flow or very recent separation across the range disjunction between them. The surprising and discordant spatial distribution of populations and alleles suggests that historical vicariance (possibly caused by glaciations) and contemporary dispersal barriers (possibly caused by oceanographic conditions) both shape population genetic structure in this species.

  1. Targeted Capture and High-Throughput Sequencing Using Molecular Inversion Probes (MIPs).

    PubMed

    Cantsilieris, Stuart; Stessman, Holly A; Shendure, Jay; Eichler, Evan E

    2017-01-01

    Molecular inversion probes (MIPs) in combination with massively parallel DNA sequencing represent a versatile, yet economical tool for targeted sequencing of genomic DNA. Several thousand genomic targets can be selectively captured using long oligonucleotides containing unique targeting arms and universal linkers. The ability to append sequencing adaptors and sample-specific barcodes allows large-scale pooling and subsequent high-throughput sequencing at relatively low cost per sample. Here, we describe a "wet bench" protocol detailing the capture and subsequent sequencing of >2000 genomic targets from 192 samples, representative of a single lane on the Illumina HiSeq 2000 platform.

  2. Sr Isotopic Variation in Shallow Water Carbonate Sequences: Stratigraphic, Chronostratigraphic, and Eustatic Implications of the Record at Enewetak Atoll

    NASA Astrophysics Data System (ADS)

    Quinn, Terrence M.; Lohmann, K. C.; Halliday, A. N.

    1991-06-01

    Sr isotope data from two boreholes within the lagoon at Enewetak Atoll have been used to evaluate the use of such data to correlate, date, and monitor sea level change in shallow water carbonate sequences. Correlative stratigraphic intervals of relatively invariant δ87Sr, separated by abrupt transitions to lower δ87Sr with increasing depth, are recognized in both boreholes. Conversion of δ87Sr values to age via calibration with the seawater δ87Sr trend with age indicates that correlative and synchronous deposition of atoll sediments occurred at ~0.4, 1.2, and 2.1 Ma. In contrast, a ~5 m.y. hiatus is recognized in one borehole but not the other. Sr isotope stratigraphy (SIS) is a powerful stratigraphic and chronostratigraphic tool in shallow water carbonate sequences only when significant secular variation of δ87Sr occurs and retention of depositional δ87Sr values is demonstrated. The latter is best demonstrated when δ87Sr data are integrated with δ18O, δ13C, Sr content data and petrographic observations. Several diagenetically altered intervals have greater δ87Sr values, low δ13C values, and low Sr/Ca ratios relative to adjacent intervals, a combination that is consistent with open-system meteoric diagenesis. Calcite cements from these intervals have early Pleistocene (~1.2 Ma) δ87Sr values despite their occurrence well within the late Pliocene (~2.1 Ma) sequence. Thus local sedimentological and diagenetic processes have produced intralagoon variability in the SIS of the two boreholes, complicating subsurface stratigraphic correlations. The occurrence of anomalously young calcite cement relative to adjacent limestone is a direct response of the interaction of sea level change and meteoric phreatic diagenesis whereby overlying metastable carbonates, with greater δ87Sr values, are dissolved during periods of atoll emergence and sea level lowstand liberating Sr and soil-gas CO2 to the pore fluid, which is then incorporated into downflow meteoric

  3. Human skin microbiota: high diversity of DNA viruses identified on the human skin by high throughput sequencing.

    PubMed

    Foulongne, Vincent; Sauvage, Virginie; Hebert, Charles; Dereure, Olivier; Cheval, Justine; Gouilh, Meriadeg Ar; Pariente, Kevin; Segondy, Michel; Burguière, Ana; Manuguerra, Jean-Claude; Caro, Valérie; Eloit, Marc

    2012-01-01

    The human skin is a complex ecosystem that hosts a heterogeneous flora. Until recently, the diversity of the cutaneous microbiota was mainly investigated for bacteria through culture-based assays subsequently confirmed by molecular techniques. There is now considerable evidence that viruses represent a significant part of the cutaneous flora, as demonstrated by the asymptomatic carriage of beta and gamma-human papillomaviruses on the healthy skin. Furthermore, it has been recently suggested that some representatives of the Polyomavirus genus might share a similar feature. In the present study, the cutaneous virome of the surface of the normal-appearing skin from five healthy individuals and one patient with Merkel cell carcinoma was investigated through a high throughput metagenomic sequencing approach in an attempt to provide a thorough description of the cutaneous flora, with a particular focus on its viral component. The results emphasize the high diversity of the viral cutaneous flora with multiple polyomaviruses, papillomaviruses and circoviruses being detected on normal-appearing skin. Moreover, this approach resulted in the identification of new Papillomavirus and Circovirus genomes and confirmed a very low level of genetic diversity within human polyomavirus species. Although viruses are generally considered to be pathogenic agents, our findings support the existence of a complex viral flora present at the surface of healthy-appearing human skin in various individuals. The dynamics and anatomical variations of this skin virome and its variations according to pathological conditions remain to be further studied. The potential involvement of these viruses, alone or in combination, in skin proliferative disorders and oncogenesis is another crucial issue to be elucidated.

  4. Sequence variation in the coding region of the melanocortin-1 receptor gene (MC1R) is not associated with plumage variation in the blue-crowned manakin (Lepidothrix coronata).

    PubMed

    Cheviron, Z A; Hackett, Shannon J; Brumfield, Robb T

    2006-07-07

    Avian plumage traits are the targets of both natural and sexual selection. Consequently, genetic changes resulting in plumage variation among closely related taxa might represent important evolutionary events. The molecular basis of such differences, however, is unknown in most cases. Sequence variation in the melanocortin-1 receptor gene (MC1R) is associated with melanistic phenotypes in many vertebrate taxa, including several avian species. The blue-crowned manakin (Lepidothrix coronata), a widespread, sexually dichromatic passerine, exhibits striking geographic variation in male plumage colour across its range in southern Central America and western Amazonia. Northern males are black with brilliant blue crowns whereas southern males are green with lighter blue crowns. We sequenced 810 bp of the MC1R coding region in 23 individuals spanning the range of male plumage variation. The only variable sites we detected among L. coronata sequences were four synonymous substitutions, none of which were strictly associated with either plumage type. Similarly, comparative analyses showed that L. coronata sequences were monomorphic at the three amino acid sites hypothesized to be functionally important in other birds. These results demonstrate that genes other than MC1R underlie melanic plumage polymorphism in blue-crowned manakins.

  5. HetMappS: Heterozygous mapping strategy for high resolution Genotyping-by-Sequencing Markers

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Reduced representation genotyping approaches, such as genotyping-by-sequencing (GBS), provide opportunities to generate high-resolution genetic maps at a low per-sample cost. However, missing data and non-uniform sequence coverage can complicate map creation in highly heterozygous species. To facili...

  6. Nucleosome adaptability conferred by sequence and structural variations in histone H2A-H2B dimers.

    PubMed

    Shaytan, Alexey K; Landsman, David; Panchenko, Anna R

    2015-06-01

    Nucleosome variability is essential for their functions in compacting the chromatin structure and regulation of transcription, replication and cell reprogramming. The DNA molecule in nucleosomes is wrapped around an octamer composed of four types of core histones (H3, H4, H2A, H2B). Nucleosomes represent dynamic entities and may change their conformation, stability and binding properties by employing different sets of histone variants or by becoming post-translationally modified. There are many variants of histones H2A and H2B. Specific H2A and H2B variants may preferentially associate with each other resulting in different combinations of variants and leading to the increased combinatorial complexity of nucleosomes. In addition, the H2A-H2B dimer can be recognized and substituted by chaperones/remodelers as a distinct unit, can assemble independently and is stable during nucleosome unwinding. In this review we discuss how sequence and structural variations in H2A-H2B dimers may provide necessary complexity and confer the nucleosome functional variability.

  7. Simultaneous Detection of Both Single Nucleotide Variations and Copy Number Alterations by Next-Generation Sequencing in Gorlin Syndrome

    PubMed Central

    Morita, Kei-ichi; Naruto, Takuya; Tanimoto, Kousuke; Yasukawa, Chisato; Oikawa, Yu; Masuda, Kiyoshi; Imoto, Issei; Inazawa, Johji; Omura, Ken; Harada, Hiroyuki

    2015-01-01

    Gorlin syndrome (GS) is an autosomal dominant disorder that predisposes affected individuals to developmental defects and tumorigenesis, and is caused mainly by heterozygous germline PTCH1 mutations. Despite exhaustive analysis, PTCH1 mutations are unidentifiable in some patients; the failure to detect mutations is presumably due to mutations in other causative genes or outside of the analyzed regions of PTCH1, or to copy number alterations (CNAs). In this study, we subjected a cohort of GS-affected individuals from six unrelated families to next-generation sequencing (NGS) analysis for the combined screening of causative alterations in Hedgehog signaling pathway-related genes. Specific single nucleotide variations (SNVs) of PTCH1 causing inferred amino acid changes were identified in four families (seven affected individuals), whereas CNAs within or around PTCH1 were found in two families in whom possible causative SNVs were not detected. Through a targeted resequencing of all coding exons, as well as simultaneous evaluation of copy number status using the alignment map files obtained via NGS, we found that GS phenotypes could be explained by PTCH1 mutations or deletions in all affected patients. Because it is advisable to evaluate CNAs of candidate causative genes in point mutation-negative cases, NGS methodology appears to be useful for improving molecular diagnosis through the simultaneous detection of both SNVs and CNAs in the targeted genes/regions. PMID:26544948

  8. Simultaneous Detection of Both Single Nucleotide Variations and Copy Number Alterations by Next-Generation Sequencing in Gorlin Syndrome.

    PubMed

    Morita, Kei-ichi; Naruto, Takuya; Tanimoto, Kousuke; Yasukawa, Chisato; Oikawa, Yu; Masuda, Kiyoshi; Imoto, Issei; Inazawa, Johji; Omura, Ken; Harada, Hiroyuki

    2015-01-01

    Gorlin syndrome (GS) is an autosomal dominant disorder that predisposes affected individuals to developmental defects and tumorigenesis, and is caused mainly by heterozygous germline PTCH1 mutations. Despite exhaustive analysis, PTCH1 mutations are unidentifiable in some patients; the failure to detect mutations is presumably due to mutations in other causative genes or outside of the analyzed regions of PTCH1, or to copy number alterations (CNAs). In this study, we subjected a cohort of GS-affected individuals from six unrelated families to next-generation sequencing (NGS) analysis for the combined screening of causative alterations in Hedgehog signaling pathway-related genes. Specific single nucleotide variations (SNVs) of PTCH1 causing inferred amino acid changes were identified in four families (seven affected individuals), whereas CNAs within or around PTCH1 were found in two families in whom possible causative SNVs were not detected. Through a targeted resequencing of all coding exons, as well as simultaneous evaluation of copy number status using the alignment map files obtained via NGS, we found that GS phenotypes could be explained by PTCH1 mutations or deletions in all affected patients. Because it is advisable to evaluate CNAs of candidate causative genes in point mutation-negative cases, NGS methodology appears to be useful for improving molecular diagnosis through the simultaneous detection of both SNVs and CNAs in the targeted genes/regions.

  9. Sequence Variation in Amplification Target Genes and Standards Influences Interlaboratory Comparison of BK Virus DNA Load Measurement

    PubMed Central

    Solis, Morgane; Meddeb, Mariam; Sueur, Charlotte; Domingo-Calap, Pilar; Soulier, Eric; Chabaud, Angeline; Perrin, Peggy; Moulin, Bruno; Bahram, Seiamak; Stoll-Keller, Françoise; Caillard, Sophie; Barth, Heidi

    2015-01-01

    International guidelines define a BK virus (BKV) load of ≥4 log10 copies/ml as presumptive of BKV-associated nephropathy (BKVN) and a cutoff for therapeutic intervention. To investigate whether BKV DNA loads (BKVL) are comparable between laboratories, 2 panels of 15 and 8 clinical specimens (urine, whole blood, and plasma) harboring different BKV genotypes were distributed to 20 and 27 French hospital centers in 2013 and 2014, respectively. Although 68% of the reported results fell within the acceptable range of the expected result ±0.5 log10, the interlaboratory variation ranged from 1.32 to 5.55 log10. Polymorphisms specific to BKV genotypes II and IV, namely, the number and position of mutations in amplification target genes and/or deletion in standards, arose as major sources of interlaboratory disagreements. The diversity of DNA purification methods also contributed to the interlaboratory variability, in particular for urine samples. Our data strongly suggest that (i) commercial external quality controls for BKVL assessment should include all major BKV genotypes to allow a correct evaluation of BKV assays, and (ii) the BKV sequence of commercial standards should be provided to users to verify the absence of mismatches with the primers and probes of their BKV assays. Finally, the optimization of primer and probe design and standardization of DNA extraction methods may substantially decrease interlaboratory variability and allow interinstitutional studies to define a universal cutoff for presumptive BKVN and, ultimately, ensure adequate patient care. PMID:26468499
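
    The acceptance criterion and the clinical cutoff quoted above can be expressed as a small check. The Python sketch below is illustrative only: the function names and the example loads are hypothetical, and only the ±0.5 log10 tolerance and the 4 log10 copies/ml cutoff are taken from the abstract.

```python
import math

ACCEPTABLE_DELTA_LOG10 = 0.5  # tolerance around the expected result (from the abstract)
BKVN_CUTOFF_LOG10 = 4.0       # presumptive BKVN cutoff in log10 copies/ml (from the abstract)


def log10_load(copies_per_ml: float) -> float:
    """Convert a viral load in copies/ml to log10 copies/ml."""
    if copies_per_ml <= 0:
        raise ValueError("viral load must be positive")
    return math.log10(copies_per_ml)


def within_acceptable_range(reported_log10: float, expected_log10: float) -> bool:
    """True if a reported load lies within the expected result +/- 0.5 log10."""
    return abs(reported_log10 - expected_log10) <= ACCEPTABLE_DELTA_LOG10


def presumptive_bkvn(reported_log10: float) -> bool:
    """True if a reported load meets the >= 4 log10 copies/ml cutoff."""
    return reported_log10 >= BKVN_CUTOFF_LOG10


# Hypothetical specimen with an expected load of 4.3 log10 copies/ml.
expected = 4.3
lab_a = log10_load(25_000)  # about 4.4 log10: acceptable and above the cutoff
lab_b = 3.7                 # 0.6 log10 low: unacceptable and below the cutoff
```

    In this made-up case, a 0.6 log10 under-estimate is enough to move the same specimen from above to below the intervention cutoff, which is why the interlaboratory spread matters clinically.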

  10. RNA sequencing analysis reveals transcriptomic variations in tobacco (Nicotiana tabacum) leaves affected by climate, soil, and tillage factors.

    PubMed

    Lei, Bo; Lu, Kun; Ding, Fuzhang; Zhang, Kai; Chen, Yi; Zhao, Huina; Zhang, Lin; Ren, Zhu; Qu, Cunmin; Guo, Wenjing; Wang, Jing; Pan, Wenjie

    2014-04-11

    The growth and development of plants are sensitive to their surroundings. Although numerous studies have analyzed plant transcriptomic variation, few have quantified the effect of combinations of factors or identified factor-specific effects. In this study, we performed RNA sequencing (RNA-seq) analysis on tobacco leaves derived from 10 treatment combinations of three groups of ecological factors, i.e., climate factors (CFs), soil factors (SFs), and tillage factors (TFs). We detected 4980, 2916, and 1605 differentially expressed genes (DEGs) that were affected by CFs, SFs, and TFs, which included 2703, 768, and 507 specific and 703 common DEGs (simultaneously regulated by CFs, SFs, and TFs), respectively. GO and KEGG enrichment analyses showed that genes involved in abiotic stress responses and secondary metabolic pathways were overrepresented in the common and CF-specific DEGs. In addition, we noted enrichment in CF-specific DEGs related to the circadian rhythm, SF-specific DEGs involved in mineral nutrient absorption and transport, and SF- and TF-specific DEGs associated with photosynthesis. Based on these results, we propose a model that explains how plants adapt to various ecological factors at the transcriptomic level. Additionally, the identified DEGs lay the foundation for future investigations of stress resistance, circadian rhythm and photosynthesis in tobacco.
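
    The DEG counts quoted above are enough to work out the remaining regions of a three-set Venn diagram. The Python sketch below does that arithmetic under two assumptions that the abstract does not state explicitly: that "specific" means affected by exactly one factor group, and that "common" means affected by all three. The function name is hypothetical.

```python
def venn_regions(totals, specific, common):
    """Derive the pairwise-only regions of a three-set Venn diagram.

    totals   -- total DEGs per group, ordered (CF, SF, TF)
    specific -- DEGs found in exactly one group, same order (assumed meaning)
    common   -- DEGs found in all three groups (assumed meaning)
    """
    # DEGs that each group shares with exactly one other group.
    shared_once = [t - s - common for t, s in zip(totals, specific)]
    # Every pairwise-only DEG is counted in two groups, so the sum must be even.
    pair_total, remainder = divmod(sum(shared_once), 2)
    if remainder:
        raise ValueError("counts are not consistent with a three-set Venn diagram")
    regions = {
        "CF+SF only": pair_total - shared_once[2],
        "CF+TF only": pair_total - shared_once[1],
        "SF+TF only": pair_total - shared_once[0],
    }
    if min(regions.values()) < 0:
        raise ValueError("counts imply a negative region")
    regions["union"] = sum(specific) + common + pair_total
    return regions


# Counts quoted in the abstract.
regions = venn_regions(totals=(4980, 2916, 1605), specific=(2703, 768, 507), common=703)
print(regions)
```

    Under those assumptions the quoted numbers are internally consistent: they imply 1312 DEGs shared only by CFs and SFs, 262 only by CFs and TFs, 133 only by SFs and TFs, and 6388 distinct DEGs overall. These derived values are not reported in the abstract itself.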

  11. Genetic variation for high temperature tolerance in maize

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As global warming becomes inevitable, the sustainability of agricultural production in the US and worldwide faces a serious threat from extreme weather conditions, such as drought and high temperature (heat) stresses. While drought stress can be alleviated through irrigation, little can be done with high ...

  12. Standard finishing categories for high-throughput sequencing of viral genomes.

    PubMed

    Ladner, J T; Kuhn, J H; Palacios, G

    2016-04-01

    Viral genome sequencing has become the cornerstone of almost all aspects of virology. In particular, high-throughput, next-generation viral genome sequencing has become an integral part of molecular epidemiological investigations into outbreaks of viral disease, such as the recent outbreaks of Middle East respiratory syndrome, Ebola virus disease and Zika virus infection. Multiple institutes have acquired the expertise and necessary infrastructure to perform such investigations, as evidenced by the accumulation of thousands of novel viral sequences over progressively shorter time periods. The authors recently proposed a nomenclature comprising five high-throughput sequencing standard categories to describe the quality of determined viral genome sequences. These five categories (standard draft, high quality, coding complete, complete and finished) cover all levels of viral genome finishing and can be applied to sequences determined by any technology platform or assembly technique.

  13. Temporal and spatial variations in the composition of freshwater photosynthetic picoeukaryotes revealed by MiSeq sequencing from flow cytometry sorted samples.

    PubMed

    Li, Shengnan; Bronner, Gisèle; Lepère, Cécile; Kong, Fanxiang; Shi, Xiaoli

    2017-03-09

    The diversity and composition of photosynthetic picoeukaryotes (PPEs) in two large shallow lakes in China (Lake Taihu and Lake Chaohu) were investigated from flow cytometry sorted samples using MiSeq high-throughput sequencing. We collected 65 samples covering different regions of the two lakes over four seasons to unveil spatial and temporal patterns of PPE community composition. The use of flow cytometry sorting largely improved the efficiency of detecting PPE sequences, and over 70% of the retrieved reads belonged to PPEs. Chlorophyta and Bacillariophyta dominated PPEs in most of the samples. A distinct but complex seasonality of PPE composition emerged at the OTU level. NGS-based MiSeq sequencing facilitates an in-depth view of numerous rare OTUs. Nearly 80% of the PPE OTUs were rare and many of them were detected in only one season, whereas most of the abundant OTUs were frequently detected in all seasons and changed only in relative abundance. In addition, a close relative of the marine PPE species Ostreococcus sp. (OTU_1144, 99% identity) was discovered in freshwater systems for the first time and was abundant especially in winter. The diversity and community composition of PPEs were more dependent on season than on sampling site. Temperature, phycocyanin and NO3-N concentrations in Lake Taihu explained the PPE composition variations, whereas in Lake Chaohu TN/TP ratios, temperature, pH and nephelometric turbidity units (NTU) seemed to be the most important factors. In addition, a great number of OTUs belonged to nonpigmented picoeukaryotes, especially Chytridiomycota, Perkinsozoa, Ciliophora and Cercozoa, which are known to include algal parasites as well as predators. The results of the Mantel test also showed that the communities of photosynthetic and nonpigmented picoeukaryotes were significantly correlated in both lakes.

  14. Major breeding plumage color differences of male ruffs (Philomachus pugnax) are not associated with coding sequence variation in the MC1R gene.

    PubMed

    Farrell, Lindsay L; Küpper, Clemens; Burke, Terry; Lank, David B

    2015-01-01

    Sequence variation in the melanocortin-1 receptor (MC1R) gene explains color morph variation in several species of birds and mammals. Ruffs (Philomachus pugnax) exhibit major dark/light color differences in melanin-based male breeding plumage, which is closely associated with alternative reproductive behavior. A previous study identified a microsatellite marker (Ppu020) near the MC1R locus associated with the presence/absence of ornamental plumage. We investigated whether coding sequence variation in the MC1R gene explains major dark/light plumage color variation and/or the presence/absence of ornamental plumage in ruffs. Among 821 bp of the MC1R coding region from 44 male ruffs we found 3 single nucleotide polymorphisms, representing 1 nonsynonymous and 2 synonymous substitutions. None were associated with major dark/light color differences or the presence/absence of ornamental plumage. At all amino acid sites known to be functionally important in other avian species with dark/light plumage color variation, ruffs were either monomorphic or the shared polymorphism did not coincide with color morph. Neither ornamental plumage color differences nor the presence/absence of ornamental plumage in ruffs are likely to be caused entirely by amino acid variation within the coding regions of the MC1R locus. Regulatory elements and structural variation at other loci may be involved in melanin expression and contribute to the extreme plumage polymorphism observed in this species.

  15. Dissecting a bacterial collagen domain from Streptococcus pyogenes: sequence and length-dependent variations in triple helix stability and folding.

    PubMed

    Yu, Zhuoxin; Brodsky, Barbara; Inouye, Masayori

    2011-05-27

    To better investigate the relationship between sequence, stability, and folding, the Streptococcus pyogenes collagenous domain CL (Gly-Xaa-Yaa)(79) was divided to create three recombinant triple helix subdomains A, B, and C of almost equal size with distinctive amino acid features: an A domain high in polar residues, a B domain containing the highest concentration of Pro residues, and a very highly charged C domain. Each segment was expressed as a monomer, a linear dimer, and a linear trimer fused with the trimerization domain (V domain) in Escherichia coli. All recombinant proteins studied formed stable triple helical structures, but the stability varied depending on the amino acid sequence in the A, B, and C segments and increased as the triple helix got longer. V-AAA was found to melt at a much lower temperature (31.0 °C) than V-ABC (V-CL), whereas V-BBB melted at almost the same temperature (∼36-37 °C). When heat-denatured, the V domain enhanced refolding for all of the constructs; however, the folding rate was affected by their amino acid sequences and became reduced for longer constructs. The folding rates of all the other constructs were lower than that of the natural V-ABC protein. Amino acid substitution mutations at all Pro residues in the C fragment dramatically decreased stability but increased the folding rate. These results indicate that the thermostability of the bacterial collagen is dominated by the most stable domain in the same manner as found with eukaryotic collagens.

  16. Crystallographic orientation variation of isothermal pearlite under high magnetic field

    SciTech Connect

    Meng, Lan; Zhou, Xiaoling; Chen, Jianhao

    2015-07-15

    Crystallographic orientation (CO) variation of magnetic-induced pearlite (MIP) during its microstructure evolution in 19.8 T was investigated by electron back-scatter diffraction (EBSD). It is closely related to the isothermal temperatures (ITs) and the applied magnetic time (MT) during the process of MIP formation. The <100> easy magnetization direction in MIP colonies is strengthened with the MT within a certain transformed fraction of MIP (f_MIP) at the relatively lower IT (983 K) above the eutectoid temperature but below the magnetically shifted upward eutectoid temperature, while this special CO tends to be weakened at a relatively higher IT (995 K). For the same MT, the higher the IT, the larger the proportion of <100> orientation for MIP colonies at the early growth stage. These results demonstrate that the change of <100> orientation of MIP is closely related to the growth rate of pearlite ferrite (PF), and that it is strengthened mainly at the early transformation stage. When f_MIP reaches some value, the growth rate of MIP at other COs, such as <110>, even at the hard magnetization direction, begins to speed up. - Highlights: • HMF can induce pearlite with different fractions above the eutectoid temperature. • CO is closely related to isothermal temperatures and applied magnetic time. • <100> direction is related to the growth rate of PF, and strengthened at early stage. • When f_MIP reaches some value, the growth rate at other COs begins to speed up.

  17. Genome-wide synteny through highly sensitive sequence alignment: Satsuma

    PubMed Central

    Grabherr, Manfred G.; Russell, Pamela; Meyer, Miriah; Mauceli, Evan; Alföldi, Jessica; Di Palma, Federica; Lindblad-Toh, Kerstin

    2010-01-01

    Motivation: Comparative genomics heavily relies on alignments of large and often complex DNA sequences. From an engineering perspective, the problem here is to provide maximum sensitivity (to find all there is to find), specificity (to only find real homology) and speed (to accommodate the billions of base pairs of vertebrate genomes). Results: Satsuma addresses all three issues through novel strategies: (i) cross-correlation, implemented via fast Fourier transform; (ii) a match scoring scheme that eliminates almost all false hits; and (iii) an asynchronous ‘battleship’-like search that allows for aligning two entire fish genomes (470 and 217 Mb) in 120 CPU hours using 15 processors on a single machine. Availability: Satsuma is part of the Spines software package, implemented in C++ on Linux. The latest version of Spines can be freely downloaded under the LGPL license from http://www.broadinstitute.org/science/programs/genome-biology/spines/ Contact: grabherr@broadinstitute.org PMID:20208069
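
    Strategy (i) above, cross-correlation via fast Fourier transform, can be illustrated on a toy example. The Python sketch below is not Satsuma's implementation: it shows only the general technique, assuming a one-hot encoding of the four bases so that the summed cross-correlation at each offset equals the number of matching positions. All function names and the example sequences are hypothetical.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two.

    With invert=True the result is the unnormalised inverse transform.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def match_counts(target, query):
    """Return, for each offset s, how many bases of query match target[s:]."""
    size = 1
    while size < len(target) + len(query):  # zero-pad to avoid wrap-around
        size *= 2
    counts = [0.0] * size
    for base in "ACGT":
        t = [1.0 if c == base else 0.0 for c in target] + [0.0] * (size - len(target))
        q = [1.0 if c == base else 0.0 for c in query] + [0.0] * (size - len(query))
        # Cross-correlation theorem: corr = IFFT(FFT(t) * conj(FFT(q))).
        spectrum = [a * b.conjugate() for a, b in zip(fft(t), fft(q))]
        corr = fft(spectrum, invert=True)
        for s in range(size):
            counts[s] += corr[s].real / size
    return [round(c) for c in counts[: len(target) - len(query) + 1]]


def best_offset(target, query):
    """Offset with the most matching bases (first one on ties)."""
    counts = match_counts(target, query)
    return counts.index(max(counts))


print(match_counts("ACGTTGCATGCA", "TGCA"))
```

    The FFT route costs O(n log n) per base channel rather than O(n·m) for sliding the query along the target directly, which is the usual motivation for this approach when sequences are long.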

  18. Characterization of an Unusually Conserved AluI Highly Reiterated DNA Sequence Family from the Honeybee, Apis Mellifera

    PubMed Central

    Tares, S.; Cornuet, J. M.; Abad, P.

    1993-01-01

    An AluI family of highly reiterated nontranscribed sequences has been found in the genome of the honeybee Apis mellifera. This repeated sequence is shown to be present at approximately 23,000 copies per haploid genome constituting about 2% of the total genomic DNA. The nucleotide sequence of 10 monomers was determined. The consensus sequence is 176 nucleotides long and has an A + T content of 58%. There are clusters of both direct and inverted repeats. Internal subrepeating units ranging from 11 to 17 nucleotides are observed, suggesting that it could have evolved from a shorter sequence. DNA sequence data reveal that this repeat class is unusually homogeneous compared to the other class of invertebrate highly reiterated DNA sequences. The average pairwise sequence divergence between the repeats is 2.5%. In spite of this unusual homogeneity, divergence has been found in the repeated sequence hybridization ladder between four different honeybee subspecies. Therefore, the AluI highly reiterated sequences provide a new probe for fingerprinting in A. m. mellifera. PMID:8104160
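
    The copy number, monomer length and genome fraction quoted above can be cross-checked with simple arithmetic, and the "average pairwise sequence divergence" can be made concrete. The Python sketch below is illustrative: the function names are hypothetical, the divergence measure is assumed to be the proportion of differing sites between pre-aligned, equal-length sequences, and the three short sequences are invented rather than taken from the paper.

```python
from itertools import combinations


def repeat_bases(copies, monomer_length):
    """Total nucleotides occupied by a tandem repeat family."""
    return copies * monomer_length


def implied_genome_size(copies, monomer_length, percent_of_genome):
    """Haploid genome size implied by the repeat's share of the genome."""
    return repeat_bases(copies, monomer_length) * 100 / percent_of_genome


def mean_pairwise_divergence(sequences):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    total = 0.0
    for a, b in pairs:
        if len(a) != len(b):
            raise ValueError("sequences must be aligned to equal length")
        total += sum(x != y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


# Figures quoted in the abstract: ~23,000 copies of a 176-nt monomer, ~2% of the genome.
print(repeat_bases(23_000, 176))            # 4048000 nucleotides
print(implied_genome_size(23_000, 176, 2))  # roughly 2.0e8 nucleotides

# Invented toy alignment, only to show the divergence calculation.
print(mean_pairwise_divergence(["AAAA", "AAAT", "AATT"]))
```

    With the quoted figures the repeat family spans about 4 Mb, which at roughly 2% of the genome implies a haploid genome on the order of 200 Mb. That genome size is derived here, not stated in the abstract.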

  19. Limited HLA sequence variation outside of antigen recognition domain exons of 360 10 of 10 matched unrelated hematopoietic stem cell transplant donor-recipient pairs.

    PubMed

    Hou, L; Vierra-Green, C; Lazaro, A; Brady, C; Haagenson, M; Spellman, S; Hurley, C K

    2017-01-01

    Traditional DNA-based typing focuses primarily on interrogating the exons of human leukocyte antigen (HLA) genes that form the antigen recognition domain (ARD). The relevance of mismatching donor and recipient for HLA variation outside the ARD on hematopoietic stem cell transplantation (HSCT) outcomes is unknown. This study was designed to evaluate the frequency of variation outside the ARD in 10 of 10 (HLA-A, -B, -C, -DRB1, -DQB1) matched unrelated donor transplant pairs (n = 360). Next-generation DNA sequencing was used to characterize both HLA exons and introns for HLA-A, -B, -C alleles; exons 2, 3 and the intervening intron for HLA-DRB1 and exons only for HLA-DQA1 and -DQB1. Over 97% of alleles at each locus were matched for their nucleotide sequence outside of the ARD exons. Of the 4320 allele comparisons overall, only 17 allele pairs were mismatched for non-ARD exons, 41 for noncoding regions and 9 for ARD exons. The observed variation between donor and recipient usually involved a single nucleotide difference (88% of mismatches); 88% of the non-ARD exon variants impacted the amino acid sequence. The impact of amino acid sequence variation caused by substitutions in exons outside ARD regions in D-R pairs will be difficult to assess in HSCT outcome studies because these mismatches do not occur very frequently.
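
    The counts in this abstract can be tied together with a short calculation. The Python sketch below uses only the figures quoted above; the reading that 4320 comparisons correspond to six loci with two alleles each per pair (HLA-A, -B, -C, -DRB1, -DQA1, -DQB1) is an inference from the numbers, not something the abstract states, and the names are hypothetical.

```python
PAIRS = 360
ALLELE_COMPARISONS = 4320
MISMATCHES = {"non-ARD exons": 17, "noncoding regions": 41, "ARD exons": 9}


def comparisons_per_pair(total_comparisons, pairs):
    """Allele comparisons per donor-recipient pair; must divide evenly."""
    per_pair, remainder = divmod(total_comparisons, pairs)
    if remainder:
        raise ValueError("comparisons are not evenly divided among pairs")
    return per_pair


def mismatch_rate(mismatches, total_comparisons):
    """Fraction of allele comparisons with any reported mismatch."""
    return sum(mismatches.values()) / total_comparisons


per_pair = comparisons_per_pair(ALLELE_COMPARISONS, PAIRS)  # 12, i.e. 6 loci x 2 alleles
rate = mismatch_rate(MISMATCHES, ALLELE_COMPARISONS)        # 67 / 4320
print(per_pair, f"{rate:.2%}")
```

    The 67 reported mismatches amount to about 1.6% of all allele comparisons, in line with the statement that over 97% of alleles at each locus were matched.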

  20. Identification of genetic variation between obligate plant pathogens Pseudoperonospora cubensis and P. humuli using RNA sequencing and genotyping-by-sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    RNA sequencing (RNA-seq) and genotyping-by-sequencing (GBS) were used for single nucleotide polymorphism (SNP) identification from two economically important obligate plant pathogens, Pseudoperonospora cubensis and P. humuli. Twenty isolates of P. cubensis and 19 isolates of P. humuli were genotyped...

  1. Spectral shape variation of interstellar electrons at high energies

    NASA Technical Reports Server (NTRS)

    Tan, L. C.

    1985-01-01

    The high energy electron spectrum analysis has shown that the electron intensity inside the H2 cloud region, or in a spiral arm, should be much lower than that outside it and the observed electron energy spectrum should flatten again at about 1 TeV. In the framework of the leaky box model, the recently established rigidity dependence of the escape pathlength of cosmic rays would predict a high energy electron spectrum which is flatter than the observed one. This divergence is explained by assuming that the leaky box model can only apply to cosmic ray heavy nuclei, and that light nuclei and electrons in cosmic rays may behave differently in interstellar propagation. Therefore, the measured data on high energy electrons should be analyzed based on the proposed nonuniform galactic disk (NUGD) model.

  2. Extensive Profiling of a Complex Microbial Community by High-Throughput Sequencing

    PubMed Central

    Hill, Janet E.; Seipp, Robyn P.; Betts, Martin; Hawkins, Lindsay; Van Kessel, Andrew G.; Crosby, William L.; Hemmingsen, Sean M.

    2002-01-01

    Complex microbial communities remain poorly characterized despite their ubiquity and importance to human and animal health, agriculture, and industry. Attempts to describe microbial communities by either traditional microbiological methods or molecular methods have been limited in both scale and precision. The availability of genomics technologies offers an unprecedented opportunity to conduct more comprehensive characterizations of microbial communities. Here we describe the application of an established molecular diagnostic method based on the chaperonin-60 sequence, in combination with high-throughput sequencing, to the profiling of a microbial community: the pig intestinal microbial community. Four libraries of cloned cpn60 sequences were generated by two genomic DNA extraction procedures in combination with two PCR protocols. A total of 1,125 cloned cpn60 sequences from the four libraries were sequenced. Among the 1,125 cloned cpn60 sequences, we identified 398 different nucleotide sequences encoding 280 unique peptide sequences. Pairwise comparisons of the 398 unique nucleotide sequences revealed a high degree of sequence diversity within the library. Identification of the likely taxonomic origins of cloned sequences ranged from imprecise, with clones assigned to a taxonomic subclass, to precise, for cloned sequences with 100% DNA sequence identity with a species in our reference database. The compositions of the four libraries were compared and differences related to library construction parameters were observed. Our results indicate that this method is an alternative to 16S rRNA sequence-based studies which can be scaled up for the purpose of performing a potentially comprehensive assessment of a given microbial community or for comparative studies. PMID:12039767

  3. Sequence variation of the ribosomal DNA second internal transcribed spacer region in two spatially-distinct populations of Amblyomma americanum (L.) (Acari: Ixodidae).

    PubMed

    Reichard, M V; Kocan, A A; Van Den Bussche, R A; Barker, R W; Wyckoff, J H; Ewing, S A

    2005-04-01

    Sequence analysis of the ribosomal DNA second internal transcribed spacer (ITS 2) region in 2 spatially distinct populations of Amblyomma americanum (L.) revealed intraspecific variation. Nucleotide sequences from multiple DNA extractions and several polymerase chain reaction amplifications of eggs from mixed-parentage samples from both populations of ticks revealed that 12 of 1,145 (1.0%) sites varied. Three of the 12 sites of variation were distinct between the 2 A. americanum populations, which corresponded to a rate of 0.26%. Phylogenetic analysis based on ITS 2 sequences provided strong support (i.e., bootstrap value of 80%) that wild A. americanum clustered into a distinguishable group separate from those derived from colony ticks.
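
    The site counts above translate into percentages by simple division. As a minimal sketch, assuming an alignment of equal-length sequences, the number of variable sites is the number of columns containing more than one character. The function name and the toy alignment are hypothetical and are not the ITS 2 data.

```python
def variable_sites(aligned):
    """Count alignment columns in which more than one character occurs."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for column in zip(*aligned) if len(set(column)) > 1)


# Hypothetical toy alignment: columns 3 and 8 differ between sequences.
toy = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
n_var = variable_sites(toy)
print(f"{n_var} of {len(toy[0])} sites vary")

# Percentages implied by the counts reported in the abstract.
all_variable = f"{12 / 1145:.2%}"   # "1.05%", reported as 1.0%
between_pops = f"{3 / 1145:.2%}"    # "0.26%"
print(all_variable, between_pops)
```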

  4. Genome sequencing of the high oil crop sesame provides insight into oil biosynthesis

    PubMed Central

    2014-01-01

    Background Sesame, Sesamum indicum L., is considered the queen of oilseeds for its high oil content and quality, and is grown widely in tropical and subtropical areas as an important source of oil and protein. However, the molecular biology of sesame is largely unexplored. Results Here, we report a high-quality genome sequence of sesame assembled de novo with a contig N50 of 52.2 kb and a scaffold N50 of 2.1 Mb, containing an estimated 27,148 genes. The results reveal novel, independent whole genome duplication and the absence of the Toll/interleukin-1 receptor domain in resistance genes. Candidate genes and oil biosynthetic pathways contributing to high oil content were discovered by comparative genomic and transcriptomic analyses. These revealed the expansion of type 1 lipid transfer genes by tandem duplication, the contraction of lipid degradation genes, and the differential expression of essential genes in the triacylglycerol biosynthesis pathway, particularly in the early stage of seed development. Resequencing data in 29 sesame accessions from 12 countries suggested that the high genetic diversity of lipid-related genes might be associated with the wide variation in oil content. Additionally, the results shed light on the pivotal stage of seed development, oil accumulation and potential key genes for sesamin production, an important pharmacological constituent of sesame. Conclusions As an important species from the order Lamiales and a high oil crop, the sesame genome will facilitate future research on the evolution of eudicots, as well as the study of lipid biosynthesis and potential genetic improvement of sesame. PMID:24576357
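
    The contig N50 and scaffold N50 quoted above are summary statistics of assembly contiguity. As a minimal sketch (not the authors' pipeline), N50 can be computed from a list of sequence lengths as the largest length L such that sequences of length L or more contain at least half of the assembled bases. The function name `n50` and the toy lengths are assumptions for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in bp), not data from the sesame genome.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

    For the toy lengths the total is 300 bp; the two longest contigs (100 + 80) already cover half of it, so the N50 is 80.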

  5. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega

    PubMed Central

    Sievers, Fabian; Wilm, Andreas; Dineen, David; Gibson, Toby J; Karplus, Kevin; Li, Weizhong; Lopez, Rodrigo; McWilliam, Hamish; Remmert, Michael; Söding, Johannes; Thompson, Julie D; Higgins, Desmond G

    2011-01-01

    Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam. PMID:21988835

  6. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.

    PubMed

    Sievers, Fabian; Wilm, Andreas; Dineen, David; Gibson, Toby J; Karplus, Kevin; Li, Weizhong; Lopez, Rodrigo; McWilliam, Hamish; Remmert, Michael; Söding, Johannes; Thompson, Julie D; Higgins, Desmond G

    2011-10-11

    Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.

  7. High-Throughput Microdissection for Next-Generation Sequencing

    PubMed Central

    Rosenberg, Avi Z.; Armani, Michael D.; Fetsch, Patricia A.; Xi, Liqiang; Pham, Tina Thu; Raffeld, Mark; Chen, Yun; O’Flaherty, Neil; Stussman, Rebecca; Blackler, Adele R.; Du, Qiang; Hanson, Jeffrey C.; Roth, Mark J.; Filie, Armando C.; Roh, Michael H.; Emmert-Buck, Michael R.; Hipp, Jason D.; Tangrea, Michael A.

    2016-01-01

    Precision medicine promises to enhance patient treatment through the use of emerging molecular technologies, including genomics, transcriptomics, and proteomics. However, current tools in surgical pathology lack the capability to efficiently isolate specific cell populations in complex tissues/tumors, which can confound molecular results. Expression microdissection (xMD) is an immuno-based cell/subcellular isolation tool that procures targets of interest from a cytological or histological specimen. In this study, we demonstrate the accuracy and precision of xMD by rapidly isolating immunostained targets, including cytokeratin AE1/AE3, p53, and estrogen receptor (ER) positive cells and nuclei from tissue sections. Other targets procured included green fluorescent protein (GFP) expressing fibroblasts, in situ hybridization positive Epstein-Barr virus nuclei, and silver stained fungi. In order to assess the effect on molecular data, xMD was utilized to isolate specific targets from a mixed population of cells where the targets constituted only 5% of the sample. Target enrichment from this admixed cell population prior to next-generation sequencing (NGS) produced a minimum 13-fold increase in mutation allele frequency detection. These data suggest a role for xMD in a wide range of molecular pathology studies, as well as in the clinical workflow for samples where tumor cell enrichment is needed, or for those with a relative paucity of target cells. PMID:26999048

  8. Sequence Variation Analysis of Epstein-Barr Virus Nuclear Antigen 1 Gene in the Virus Associated Lymphomas of Northern China.

    PubMed

    Sun, Lingling; Zhao, Zhenzhen; Liu, Song; Liu, Xia; Sun, Zhifu; Luo, Bing

    2015-01-01

    Epstein-Barr virus (EBV) nuclear antigen 1 (EBNA1) is the only viral protein expressed in all EBV-positive tumors as it is essential for the maintenance, replication and transcription of the virus genome. According to the polymorphism of residue 487 in EBNA1 gene, EBV isolates can be classified into five subtypes: P-ala, P-thr, V-val, V-leu and V-pro. Whether these EBNA1 subtypes contribute to different tissue tropism of EBV and are consequently associated with certain malignancies remains to be determined. To elucidate the relationship, one hundred and ten EBV-positive lymphoma tissues of different types from Northern China, a non-NPC endemic area, were tested for the five subtypes by nested-PCR and DNA sequencing. In addition, EBV type 1/type 2 status was determined using standard PCR assays across type-specific regions of the EBNA3C genes. Four EBNA1 subtypes were identified: V-val (68.2%, 75/110), P-thr (15.5%, 17/110), V-leu (3.6%, 4/110) and P-ala (10.9%, 12/110). The distribution of the EBNA1 subtypes in the four lymphoma groups was not significantly different (p = 0.075), neither was that of the EBV type 1/type 2 (p = 0.089). Compared with the previous data of gastric carcinoma (GC), nasopharyngeal carcinoma (NPC) and throat washing (TW) from healthy donors, the distribution of EBNA1 subtypes in lymphoma differed significantly (p = 0.016), with a slightly higher frequency of P-ala subtype. The EBV type distribution between lymphoma and the other three groups was significantly different (p = 0.000, p = 0.000, p = 0.001, respectively). The proportion of type 1 and type 2 mixed infections was higher in lymphoma than that in GC, NPC and TW. In lymphomas, the distribution of EBNA1 subtypes in the three EBV types was not significantly different (p = 0.546). These data suggested that the variation patterns of EBNA1 gene may be geographically associated rather than tumor-specific and the role of EBNA1 gene variations in tumorigenesis needs more extensive and

  9. Identification of mitochondrial DNA sequence variation and development of single nucleotide polymorphic markers for CMS-D8 in cotton.

    PubMed

    Suzuki, Hideaki; Yu, Jiwen; Wang, Fei; Zhang, Jinfa

    2013-06-01

    Cytoplasmic male sterility (CMS), which is a maternally inherited trait and controlled by novel chimeric genes in the mitochondrial genome, plays a pivotal role in the production of hybrid seed. In cotton, no PCR-based marker has been developed to discriminate CMS-D8 (from Gossypium trilobum) from its normal Upland cotton (AD1, Gossypium hirsutum) cytoplasm. The objective of the current study was to develop PCR-based single nucleotide polymorphic (SNP) markers from mitochondrial genes for the CMS-D8 cytoplasm. DNA sequence variation in mitochondrial genes involved in the oxidative phosphorylation chain including ATP synthase subunit 1, 4, 6, 8 and 9, and cytochrome c oxidase 1, 2 and 3 subunits were identified by comparing CMS-D8, its isogenic maintainer and restorer lines on the same nuclear genetic background. An allelic specific PCR (AS-PCR) was utilized for SNP typing by incorporating artificial mismatched nucleotides into the third or fourth base from the 3' terminus in both the specific and nonspecific primers. The result indicated that the method modifying allele-specific primers was successful in obtaining eight SNP markers out of eight SNPs using eight primer pairs to discriminate two alleles between AD1 and CMS-D8 cytoplasms. Two of the SNPs for atp1 and cox1 could also be used in combination to discriminate between CMS-D8 and CMS-D2 cytoplasms. Additionally, a PCR-based marker from a nine nucleotide insertion-deletion (InDel) sequence (AATTGTTTT) at the 59-67 bp positions from the start codon of atp6, which is present in the CMS and restorer lines with the D8 cytoplasm but absent in the maintainer line with the AD1 cytoplasm, was also developed. A SNP marker for two nucleotide substitutions (AA in AD1 cytoplasm to CT in CMS-D8 cytoplasm) in the intron (1,506 bp) of cox2 gene was also developed. These PCR-based SNP markers should be useful in discriminating CMS-D8 and AD1 cytoplasms, or those with CMS-D2 cytoplasm as a rapid, simple, inexpensive, and

  10. FASTAptamer: A Bioinformatic Toolkit for High-throughput Sequence Analysis of Combinatorial Selections

    PubMed Central

    Alam, Khalid K; Chang, Jonathan L; Burke, Donald H

    2015-01-01

    High-throughput sequence (HTS) analysis of combinatorial selection populations accelerates lead discovery and optimization and offers dynamic insight into selection processes. An underlying principle is that selection enriches high-fitness sequences as a fraction of the population, whereas low-fitness sequences are depleted. HTS analysis readily provides the requisite numerical information by tracking the evolutionary trajectory of individual sequences in response to selection pressures. Unlike genomic data, for which a number of software solutions exist, user-friendly tools are not readily available for the combinatorial selections field, leading many users to create custom software. FASTAptamer was designed to address the sequence-level analysis needs of the field. The open source FASTAptamer toolkit counts, normalizes and ranks read counts in a FASTQ file, compares populations for sequence distribution, generates clusters of sequence families, calculates fold-enrichment of sequences throughout the course of a selection and searches for degenerate sequence motifs. While originally designed for aptamer selections, FASTAptamer can be applied to any selection strategy that can utilize next-generation DNA sequencing, such as ribozyme or deoxyribozyme selections, in vivo mutagenesis and various surface display technologies (peptide, antibody fragment, mRNA, etc.). FASTAptamer software, sample data and a user's guide are available for download at http://burkelab.missouri.edu/fastaptamer.html. PMID:25734917
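
    The operations listed above (counting, normalizing, ranking and fold-enrichment) can be sketched in a few lines. This is an illustration under stated assumptions, not FASTAptamer's own code or interface: it assumes plain four-line FASTQ records, normalizes to reads per million (RPM), and defines enrichment as the RPM ratio between two populations. All function names and the toy reads are hypothetical.

```python
from collections import Counter
from io import StringIO


def count_fastq(handle):
    """Count identical read sequences in a four-line-per-record FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            counts[line.strip()] += 1
    return counts


def reads_per_million(counts):
    """Normalize raw counts to reads per million (RPM) of the population."""
    total = sum(counts.values())
    return {seq: n * 1_000_000 / total for seq, n in counts.items()}


def fold_enrichment(rpm_before, rpm_after):
    """RPM ratio (after / before) for sequences present in both populations."""
    return {seq: rpm_after[seq] / rpm_before[seq]
            for seq in rpm_after if seq in rpm_before}


def toy_fastq(seqs):
    """Build an in-memory FASTQ stream from hypothetical toy reads."""
    return StringIO("".join(
        f"@r{i}\n{s}\n+\n{'I' * len(s)}\n" for i, s in enumerate(seqs)))


# Two hypothetical selection rounds with four reads each.
round1 = count_fastq(toy_fastq(["ACGT", "ACGT", "GGCC", "TTAA"]))
round2 = count_fastq(toy_fastq(["ACGT", "ACGT", "ACGT", "GGCC"]))
ranked = round2.most_common()  # sequences ranked by abundance
enrichment = fold_enrichment(reads_per_million(round1),
                             reads_per_million(round2))
print(ranked)
print(enrichment)
```

    Normalizing before taking the ratio keeps the comparison meaningful when the two populations were sequenced to different depths. Sequences absent from the earlier round are skipped here to avoid division by zero; a real analysis would need an explicit policy for them.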

  11. Trait-associated sequence variation in the bovine growth hormone receptor 1A promoter does not affect promoter activity in vitro.

    PubMed

    Zhou, Y; Jiang, H

    2005-04-01

    Growth hormone (GH) plays a central role in growth and metabolism in cattle by binding to growth hormone receptor (GHR) and stimulating production of insulin-like growth factor 1 (IGF1). Two sequence variations in the promoter transcribing a major GHR mRNA variant, GHR 1A mRNA, have been reported to be associated with quantitative differences in growth rate or blood concentration of IGF1 in cattle. One such variation is in the length of a TG-repeat, being 11 or 16-20; the other variation is in the nucleotide 155 bp upstream from the transcription start site, being G or A. In this study, we determined whether these sequence variations would affect the activity of GHR 1A promoter. We cloned GHR 1A promoters bearing different sequence variations and linked each of them to a reporter gene. Transient transfection analyses revealed that these promoter-reporter constructs did not differ in reporter gene expression. Cotransfection analyses demonstrated that they also did not differ in activation by hepatocyte nuclear factor 4alpha, hepatocyte nuclear factor 4gamma and nuclear receptor subfamily 2 group F member 2, known transcription factors for bovine GHR 1A promoter. These in vitro results, together with a previous observation that neither the nucleotide 155 bp upstream from the transcription start site nor the TG-repeat was part of the GHR 1A promoter region interacting with nuclear proteins from bovine liver, do not support a cause-effect relationship between the reported sequence variations and the associated changes in growth rate or blood IGF1 concentration in cattle.

  12. DNA capture and next-generation sequencing can recover whole mitochondrial genomes from highly degraded samples for human identification

    PubMed Central

    2013-01-01

    Background Mitochondrial DNA (mtDNA) typing can be a useful aid for identifying people from compromised samples when nuclear DNA is too damaged, degraded or below detection thresholds for routine short tandem repeat (STR)-based analysis. Standard mtDNA typing, focused on PCR amplicon sequencing of the control region (HVS I and HVS II), is limited by the resolving power of this short sequence, which misses up to 70% of the variation present in the mtDNA genome. Methods We used in-solution hybridisation-based DNA capture (using DNA capture probes prepared from modern human mtDNA) to recover mtDNA from post-mortem human remains in which the majority of DNA is both highly fragmented (<100 base pairs in length) and chemically damaged. The method ‘immortalises’ the finite quantities of DNA in valuable extracts as DNA libraries, which is followed by the targeted enrichment of endogenous mtDNA sequences and characterisation by next-generation sequencing (NGS). Results We sequenced whole mitochondrial genomes for human identification from samples where standard nuclear STR typing produced only partial profiles or demonstrably failed and/or where standard mtDNA hypervariable region sequences lacked resolving power. Multiple rounds of enrichment can substantially improve coverage and sequencing depth of mtDNA genomes from highly degraded samples. The application of this method has led to the reliable mitochondrial sequencing of human skeletal remains from unidentified World War Two (WWII) casualties approximately 70 years old and from archaeological remains (up to 2,500 years old). Conclusions This approach has potential applications in forensic science, historical human identification cases, archived medical samples, kinship analysis and population studies. In particular the methodology can be applied to any case, involving human or non-human species, where whole mitochondrial genome sequences are required to provide the highest level of maternal lineage discrimination

  13. Population variation revealed high-altitude adaptation of Tibetan mastiffs.

    PubMed

    Li, Yan; Wu, Dong-Dong; Boyko, Adam R; Wang, Guo-Dong; Wu, Shi-Fang; Irwin, David M; Zhang, Ya-Ping

    2014-05-01

    With the assistance of their human companions, dogs have dispersed into new environments during the expansion of human civilization. Tibetan Mastiff (TM), a native of the Tibetan Plateau, was derived from the domesticated Chinese native dog and, like Tibetans, has adapted to the extreme environment of high altitude. Here, we genotyped genome-wide single-nucleotide polymorphisms (SNPs) from 32 TMs and compared them with SNPs from 20 Chinese native dogs and 14 gray wolves (Canis lupus). We identified 16 genes with signals of positive selection in the TM, with 12 of these candidate genes associated with functions relevant to high-altitude adaptation, such as EPAS1, SIRT7, PLXNA4, and MAFG, which have roles in responses to hypoxia. This study provides important information on the genetic diversity of the TM and potential mechanisms for adaptation to hypoxia.

  14. Unbiased Characterization of Anopheles Mosquito Blood Meals by Targeted High-Throughput Sequencing

    PubMed Central

    Logue, Kyle; Keven, John Bosco; Cannon, Matthew V.; Reimer, Lisa; Siba, Peter; Walker, Edward D.; Zimmerman, Peter A.; Serre, David

    2016-01-01

    Understanding mosquito host choice is important for assessing vector competence or identifying disease reservoirs. Unfortunately, the availability of an unbiased method for comprehensively evaluating the composition of insect blood meals is very limited, as most current molecular assays only test for the presence of a few pre-selected species. These approaches also have limited ability to identify the presence of multiple mammalian hosts in a single blood meal. Here, we describe a novel high-throughput sequencing method that enables analysis of 96 mosquitoes simultaneously and provides a comprehensive and quantitative perspective on the composition of each blood meal. We validated in silico that universal primers targeting the mammalian mitochondrial 16S ribosomal RNA genes (16S rRNA) should amplify more than 95% of the mammalian 16S rRNA sequences present in the NCBI nucleotide database. We applied this method to 442 female Anopheles punctulatus s. l. mosquitoes collected in Papua New Guinea (PNG). While human (52.9%), dog (15.8%) and pig (29.2%) were the most common hosts identified in our study, we also detected DNA from mice, one marsupial species and two bat species. Our analyses also revealed that 16.3% of the mosquitoes fed on more than one host. Analysis of the human mitochondrial hypervariable region I in 102 human blood meals showed that 5 (4.9%) of the mosquitoes unambiguously fed on more than one person. Overall, analysis of PNG mosquitoes illustrates the potential of this approach to identify unsuspected hosts and characterize mixed blood meals, and shows how this approach can be adapted to evaluate inter-individual variations among human blood meals. Furthermore, this approach can be applied to any disease-transmitting arthropod and can be easily customized to investigate non-mammalian host sources. PMID:26963245

  15. Unbiased Characterization of Anopheles Mosquito Blood Meals by Targeted High-Throughput Sequencing.

    PubMed

    Logue, Kyle; Keven, John Bosco; Cannon, Matthew V; Reimer, Lisa; Siba, Peter; Walker, Edward D; Zimmerman, Peter A; Serre, David

    2016-03-01