Science.gov

Sample records for high sequence variation

  1. High intraindividual variation in internal transcribed spacer sequences in Aeschynanthus (Gesneriaceae): implications for phylogenetics.

    PubMed Central

    Denduangboripant, J; Cronk, Q C

    2000-01-01

    Aeschynanthus (Gesneriaceae) is a large genus of tropical epiphytes that is widely distributed from the Himalayas and China throughout South-East Asia to New Guinea and the Solomon Islands. Polymerase chain reaction (PCR) consensus sequences of the internal transcribed spacers (ITS) of Aeschynanthus nuclear ribosomal DNA showed sequence polymorphism that was difficult to interpret. Individual sequences cloned from the PCR products were used to reconstruct a phylogenetic tree of 23 Aeschynanthus species (two clones per species). Sequence divergence between the two clones from the same individual ranged from 0 to 5.01%. We suggest that this high intraindividual sequence variation results from low molecular drive in the ITS of Aeschynanthus. However, this study shows that, despite the variation found within some individuals, these data can still be used to reconstruct phylogenetic relationships among the species, suggesting that the clone variation, although persistent, does not predate the divergence of Aeschynanthus species. The analysis revealed two major clades with different but overlapping geographic distributions that reflect the morphology-based classification (particularly seed hair type). PMID:10983824
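
    The abstract reports intraindividual clone divergence as a percentage but does not say how it was computed. As a minimal sketch, assuming an uncorrected p-distance over aligned clone sequences with gapped columns skipped (our assumption, not necessarily the authors' method), the calculation looks like this; the function name `p_distance` and the toy clones are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance (in %) between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped
    (pairwise deletion); all other columns are compared base by base.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differences / compared


# Two hypothetical ITS clones from one individual (toy data, not from the study).
clone_1 = "ACGTACGTAC"
clone_2 = "ACGTTCGTAC"
print(f"{p_distance(clone_1, clone_2):.2f}%")  # one mismatch in ten sites
```

    A model-corrected distance such as Jukes–Cantor would give slightly larger values as divergence grows.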

  2. High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi

    PubMed Central

    Holt, Kathryn E; Parkhill, Julian; Mazzoni, Camila J; Roumagnac, Philippe; Weill, François-Xavier; Goodhead, Ian; Rance, Richard; Baker, Stephen; Maskell, Duncan J; Wain, John; Dolecek, Christiane; Achtman, Mark; Dougan, Gordon

    2009-01-01

    Isolates of Salmonella enterica serovar Typhi (Typhi), a human-restricted bacterial pathogen that causes typhoid, show limited genetic variation. We generated whole-genome sequences for 19 Typhi isolates using 454 (Roche) and Solexa (Illumina) technologies. Isolates, including the previously sequenced CT18 and Ty2 isolates, were selected to represent major nodes in the phylogenetic tree. Comparative analysis showed little evidence of purifying selection, antigenic variation or recombination between isolates. Rather, evolution in the Typhi population seems to be characterized by ongoing loss of gene function, consistent with a small effective population size. The lack of evidence for antigenic variation driven by immune selection is in contrast to strong adaptive selection for mutations conferring antibiotic resistance in Typhi. The observed patterns of genetic isolation and drift are consistent with the proposed key role of asymptomatic carriers of Typhi as the main reservoir of this pathogen, highlighting the need for identification and treatment of carriers. PMID:18660809

  3. The Use of High-Throughput DNA Sequencing in the Investigation of Antigenic Variation: Application to Neisseria Species

    PubMed Central

    Davies, John K.; Harrison, Paul F.; Lin, Ya-Hsun; Bartley, Stephanie; Khoo, Chen Ai; Seemann, Torsten; Ryan, Catherine S.; Kahler, Charlene M.; Hill, Stuart A.

    2014-01-01

    Antigenic variation occurs in a broad range of species. This process resembles gene conversion in that variant DNA is unidirectionally transferred from partial gene copies (or silent loci) into an expression locus. Previous studies of antigenic variation have involved amplifying and sequencing individual genes from hundreds of colonies. Using the pilE gene from Neisseria gonorrhoeae, we have demonstrated that PCR amplification, followed by high-throughput DNA sequencing and a novel assembly process, can detect individual antigenic variation events with far greater sensitivity than was previously possible. In N. gonorrhoeae most silent loci contain multiple partial gene copies. Here we show that there is a bias towards using the copy at the 3′ end of the silent loci (copy 1) as the donor sequence. The pilE gene of N. gonorrhoeae and some strains of Neisseria meningitidis encode class I pilin, but strains of N. meningitidis from clonal complexes 8 and 11 encode a class II pilin. We have confirmed that the class II pili of meningococcal strain FAM18 (clonal complex 11) are non-variable, and this is also true for the class II pili of strain NMB from clonal complex 8. In addition, when a gene encoding class I pilin was moved into the meningococcal strain NMB background, there was no evidence of antigenic variation. Finally, we investigated several members of the opa gene family of N. gonorrhoeae, where it has been suggested that limited variation occurs. Variation was detected in the opaK gene, which is located close to pilE, but not in the opaJ gene, located elsewhere in the genome. The approach described here promises to dramatically improve studies of the extent and nature of antigenic variation systems in a variety of species. PMID:24466206
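
    The abstract does not describe the assembly process used to identify donor copies. As an illustrative sketch only, one simple way to attribute a variant pilE segment to a donor is to pick the silent copy with the fewest mismatches over a pre-aligned, equal-length segment; the function `assign_donor` and the toy sequences below are hypothetical.

```python
def assign_donor(variant_segment: str, silent_copies: dict[str, str]) -> tuple[str, int]:
    """Return (copy_name, mismatches) for the silent copy closest to the segment.

    Assumes every copy is already aligned to the variant segment and has the
    same length; ties are resolved in favour of the copy listed first.
    """
    if not silent_copies:
        raise ValueError("no silent copies supplied")
    best_name, best_mismatches = "", len(variant_segment) + 1
    for name, copy in silent_copies.items():
        if len(copy) != len(variant_segment):
            raise ValueError(f"{name} is not aligned to the variant segment")
        mismatches = sum(a != b for a, b in zip(variant_segment, copy))
        if mismatches < best_mismatches:
            best_name, best_mismatches = name, mismatches
    return best_name, best_mismatches


copies = {"copy1": "ACGTAC", "copy2": "ACCTAA"}  # toy silent-copy segments
print(assign_donor("ACCTAA", copies))  # exact match to copy2
```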

  4. Comparative Analysis of Mycobacterium tuberculosis pe and ppe Genes Reveals High Sequence Variation and an Apparent Absence of Selective Constraints

    PubMed Central

    McEvoy, Christopher R. E.; Cloete, Ruben; Müller, Borna; Schürch, Anita C.; van Helden, Paul D.; Gagneux, Sebastien; Warren, Robin M.; Gey van Pittius, Nicolaas C.

    2012-01-01

    Mycobacterium tuberculosis complex (MTBC) genomes contain two large gene families termed pe and ppe. The function of pe/ppe proteins remains enigmatic, but studies suggest that they are secreted or cell-surface associated and are involved in bacterial virulence. Previous studies have also shown that some pe/ppe genes are polymorphic, a finding that suggests involvement in antigenic variation. Using comparative sequence analysis of 18 publicly available MTBC whole-genome sequences, we aligned 33 pe (excluding pe_pgrs) and 66 ppe genes to determine the frequency and nature of genetic variation. This work was supplemented by whole-gene sequencing of 14 pe/ppe genes (including 5 pe_pgrs) in a cohort of 40 diverse and well-defined clinical isolates covering all the main lineages of the M. tuberculosis phylogenetic tree. We show that nonsynonymous SNP (nsSNP) frequencies in pe (excluding pe_pgrs) and ppe genes are 3.0 and 3.3 times higher, respectively, than in non-pe/ppe genes, and that numerous other mutation types are also present at high frequency. It has previously been shown that non-pe/ppe M. tuberculosis genes display a remarkably low level of purifying selection. Here, we also show that, compared with these genes, the pe/ppe families show a further reduction in selection pressure that suggests neutral evolution. This is inconsistent with the positive selection pressure expected under “classical” antigenic variation. Finally, by analyzing such a large number of genes we were able to detect large differences in mutation type and frequency both between individual genes and between gene sub-families. The high variation rates and absence of selective constraints provide valuable insights into potential pe/ppe function. Since pe/ppe proteins are highly antigenic and have been studied as potential vaccine components, these results should also inform aspects of M. tuberculosis vaccine design. PMID:22496726
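
    Counting nsSNPs requires deciding, for each substitution in a coding sequence, whether it changes the encoded amino acid. A minimal sketch using the standard genetic code (the abstract does not describe the authors' pipeline), with the hypothetical helper `classify_snp` and a toy coding sequence:

```python
BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order for the
# first, second and third positions ('*' marks a stop codon).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a single-base substitution in an in-frame coding sequence.

    `pos` is 0-based; returns 'synonymous' or 'nonsynonymous'.
    """
    cds = cds.upper()
    codon_start = pos - pos % 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"


cds = "ATGAAA"  # toy CDS: Met-Lys
print(classify_snp(cds, 5, "G"))  # AAA -> AAG, Lys -> Lys
print(classify_snp(cds, 3, "G"))  # AAA -> GAA, Lys -> Glu
```

    Dividing nsSNP counts by gene length (or by the number of nonsynonymous sites, as in dN/dS) gives rates that can be compared between gene families.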

  5. Analysis of genetic variation and diversity of Rice stripe virus populations through high-throughput sequencing.

    PubMed

    Huang, Lingzhe; Li, Zefeng; Wu, Jianxiang; Xu, Yi; Yang, Xiuling; Fan, Longjiang; Fang, Rongxiang; Zhou, Xueping

    2015-01-01

    Plant RNA viruses often generate diverse populations in their host plants through error-prone replication and recombination. Recent studies on the genetic diversity of plant RNA viruses in various host plants have provided valuable information about RNA virus evolution and the emergence of new diseases caused by RNA viruses. Using high-throughput sequencing, we analyzed the genetic diversity of Rice stripe virus (RSV) populations in Oryza sativa (a natural host of RSV) and compared it with that of RSV populations generated during infection of Nicotiana benthamiana, an experimental host of RSV. In infected O. sativa and N. benthamiana plants, a total of 341 and 1675 site substitutions, respectively, were identified in the RSV genome, with average substitution ratios at these sites of 1.47% and 7.05%, respectively, indicating that the RSV populations from infected N. benthamiana plants are more diverse than those from infected O. sativa plants. Our results provide direct evidence that the virus may permit higher genetic diversity for host adaptation. PMID:25852724
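
    The "average substitution ratio" is not defined in the abstract. Assuming it means the mean, across substituted sites, of the percentage of reads carrying a non-reference base (our interpretation), a sketch with a hypothetical per-site record type looks like this:

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Read counts at one substituted genome site (hypothetical record layout)."""
    position: int     # 1-based position in the viral genome
    depth: int        # total reads covering the site
    substituted: int  # reads carrying a non-reference base


def mean_substitution_ratio(sites: list[SiteCounts]) -> float:
    """Mean, over covered sites, of the percentage of reads carrying a substitution."""
    ratios = [100.0 * s.substituted / s.depth for s in sites if s.depth > 0]
    if not ratios:
        raise ValueError("no covered sites")
    return sum(ratios) / len(ratios)


sites = [SiteCounts(10, 1000, 20), SiteCounts(57, 500, 5)]  # toy counts
print(f"{mean_substitution_ratio(sites):.2f}%")  # mean of 2.00% and 1.00%
```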

  6. Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    SciTech Connect

    Gordon, Sean

    2013-03-01

    Sean Gordon of the USDA on "Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions" at the 8th Annual Genomics of Energy & Environment Meeting on March 27, 2013 in Walnut Creek, Calif.

  7. Comparative Mitogenomics of the Genus Odontobutis (Perciformes: Gobioidei: Odontobutidae) Revealed Conserved Gene Rearrangement and High Sequence Variations

    PubMed Central

    Ma, Zhihong; Yang, Xuefen; Bercsenyi, Miklos; Wu, Junjie; Yu, Yongyao; Wei, Kaijian; Fan, Qixue; Yang, Ruibin

    2015-01-01

    To understand the molecular evolution of mitochondrial genomes (mitogenomes) in the genus Odontobutis, the mitogenome of Odontobutis yaluensis was sequenced and compared with those of four other Odontobutis species. The mitogenomes showed similar features across species in genome organization, base composition, codon usage and gene rearrangement. The identical rearrangement of the trnS-trnL-trnH tRNA cluster observed in the mitogenomes of these five closely related freshwater sleepers suggests that this unique gene order is conserved within Odontobutis. Additionally, the gene order and the positions of the associated intergenic spacers in these Odontobutis mitogenomes indicate that this unusual gene rearrangement results from tandem duplication and random loss of large-scale gene regions. Moreover, these mitogenomes exhibit a high level of sequence variation, mainly due to differences in the corresponding intergenic sequences in the gene rearrangement regions and the heterogeneity of tandem repeats in the control regions. Phylogenetic analyses support the Odontobutis species sharing the gene rearrangement as a monophyletic group, and the interspecific phylogenetic relationships are associated with structural differences among their mitogenomes. The present study contributes to understanding the evolutionary patterns of Odontobutidae species. PMID:26492246
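
    The tandem duplication–random loss (TDRL) model invoked in the abstract can be sketched in a few lines: a block of genes is duplicated in tandem, then one of the two copies of each gene is lost, so the survivors from the first copy precede those from the second. The example assumes the typical vertebrate order trnH-trnS-trnL as the ancestral state (an assumption for illustration); the function names are hypothetical.

```python
import random


def tdrl(order: list[str], start: int, end: int, keep_in_first: list[bool]) -> list[str]:
    """Apply one tandem duplication-random loss event to order[start:end].

    keep_in_first[i] is True if gene i of the block survives in the first
    copy of the duplicated block, False if it survives in the second copy.
    """
    block = order[start:end]
    if len(keep_in_first) != len(block):
        raise ValueError("need one keep/lose decision per gene in the block")
    first_copy = [g for g, keep in zip(block, keep_in_first) if keep]
    second_copy = [g for g, keep in zip(block, keep_in_first) if not keep]
    return order[:start] + first_copy + second_copy + order[end:]


def random_tdrl(order: list[str], start: int, end: int, rng: random.Random) -> list[str]:
    """Same event, with the surviving copy of each gene chosen at random."""
    choices = [rng.random() < 0.5 for _ in range(end - start)]
    return tdrl(order, start, end, choices)


ancestral = ["ND4", "trnH", "trnS", "trnL", "ND5"]
# trnH survives only in the second copy; trnS and trnL survive in the first.
print(tdrl(ancestral, 1, 4, [False, True, True]))
```

    With those losses the block reads trnS-trnL-trnH, the arrangement the abstract reports for Odontobutis.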

  8. Local sequence assembly reveals a high-resolution profile of somatic structural variations in 97 cancer genomes

    PubMed Central

    Zhuang, Jiali; Weng, Zhiping

    2015-01-01

    Genomic structural variations (SVs) are pervasive in many types of cancers. Characterizing their underlying mechanisms and potential molecular consequences is crucial for understanding the basic biology of tumorigenesis. Here, we engineered a local assembly-based algorithm (laSV) that detects SVs with high accuracy from paired-end high-throughput genomic sequencing data and pinpoints their breakpoints at single base-pair resolution. By applying laSV to 97 tumor-normal paired genomic sequencing datasets across six cancer types produced by The Cancer Genome Atlas Research Network, we discovered that non-allelic homologous recombination is the primary mechanism for generating somatic SVs in acute myeloid leukemia. This finding contrasts with results for the other five cancer types, all solid tumors, in which non-homologous end joining and microhomology end joining are the predominant mechanisms. We also found that the genes recurrently mutated by single nucleotide alterations differed from the genes recurrently mutated by SVs, suggesting that these two types of genetic alterations play different roles during cancer progression. We further characterized how the gene structures of the oncogene JAK1 and the tumor suppressors KDM6A and RB1 are affected by somatic SVs and discussed the potential functional implications of intergenic SVs. PMID:26283183
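
    Mechanism calls such as microhomology end joining rest on how much identical sequence the two breakpoint ends share. The abstract does not define this measure; the sketch below uses one simple definition for a deletion (our assumption): the number of bases at the start of the deleted segment that also occur immediately after it, which is exactly how far the junction can shift without changing the resulting sequence. The function name and toy sequence are hypothetical.

```python
def deletion_microhomology(ref: str, start: int, end: int) -> int:
    """Length of breakpoint microhomology for a deletion of ref[start:end].

    Counts consecutive bases i = 0, 1, ... with ref[start + i] == ref[end + i],
    capped at the deletion length; such bases make the junction ambiguous.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("invalid deletion coordinates")
    length = 0
    while (length < end - start and end + length < len(ref)
           and ref[start + length] == ref[end + length]):
        length += 1
    return length


ref = "GGGACTCCCCACTAAA"
# Deleting ref[3:10] ("ACTCCCC") leaves "ACT" at both breakpoint ends.
print(deletion_microhomology(ref, 3, 10))
```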

  9. High-Throughput Sequencing and Copy Number Variation Detection Using Formalin Fixed Embedded Tissue in Metastatic Gastric Cancer

    PubMed Central

    Hong, Min Eui; Do, In-Gu; Kang, So Young; Ha, Sang Yun; Kim, Seung Tae; Park, Se Hoon; Kang, Won Ki; Choi, Min-Gew; Lee, Jun Ho; Sohn, Tae Sung; Bae, Jae Moon; Kim, Sung; Kim, Duk-Hwan; Kim, Kyoung-Mee

    2014-01-01

    In the era of targeted therapy, mutation profiling of cancer is a crucial aspect of making therapeutic decisions. To characterize cancer at a molecular level, the use of formalin-fixed paraffin-embedded tissue is important. We tested the Ion AmpliSeq Cancer Hotspot Panel v2 and nCounter Copy Number Variation Assay in 89 formalin-fixed paraffin-embedded gastric cancer samples to determine whether they are applicable in archival clinical samples for personalized targeted therapies. We validated the results with Sanger sequencing, real-time quantitative PCR, fluorescence in situ hybridization and immunohistochemistry. Frequently detected somatic mutations included TP53 (28.17%), APC (10.1%), PIK3CA (5.6%), KRAS (4.5%), SMO (3.4%), STK11 (3.4%), CDKN2A (3.4%) and SMAD4 (3.4%). Amplifications of HER2, CCNE1, MYC, KRAS and EGFR genes were observed in 8 (8.9%), 4 (4.5%), 2 (2.2%), 1 (1.1%) and 1 (1.1%) cases, respectively. In the cases with amplification, fluorescence in situ hybridization for HER2 verified gene amplification and immunohistochemistry for HER2, EGFR and CCNE1 verified the overexpression of proteins in tumor cells. In conclusion, we successfully performed semiconductor-based sequencing and nCounter copy number variation analyses in formalin-fixed paraffin-embedded gastric cancer samples. High-throughput screening in archival clinical samples enables faster, more accurate and cost-effective detection of hotspot mutations or amplification in genes. PMID:25372287

  10. Sequence variation in nuclear ribosomal small subunit, internal transcribed spacer and large subunit regions of Rhizophagus irregularis and Gigaspora margarita is high and isolate-dependent.

    PubMed

    Thiéry, Odile; Vasar, Martti; Jairus, Teele; Davison, John; Roux, Christophe; Kivistik, Paula-Ann; Metspalu, Andres; Milani, Lili; Saks, Ülle; Moora, Mari; Zobel, Martin; Öpik, Maarja

    2016-06-01

    Arbuscular mycorrhizal (AM) fungi are known to exhibit high intra-organism genetic variation. However, information about intra- vs. interspecific variation among the genes commonly used in diversity surveys is limited. Here, the nuclear small subunit (SSU) rRNA gene, internal transcribed spacer (ITS) region and large subunit (LSU) rRNA gene portions were sequenced from 3 to 5 individual spores from each of two isolates of Rhizophagus irregularis and Gigaspora margarita. A total of 1482 Sanger sequences (0.5 Mb) from 239 clones were obtained, spanning ~4370 bp of the ribosomal operon when concatenated. Intrasporal and intra-isolate sequence variation was high for all three regions even though variant numbers were not exhausted by sequencing 12-40 clones per isolate. Intra-isolate nucleotide variation levels followed the expected order of ITS > LSU > SSU, but the values were strongly dependent on isolate identity. Single nucleotide polymorphism (SNP) densities over 4 SNP/kb in the ribosomal operon were detected in all four isolates. Automated operational taxonomic unit picking within the sequence set of known identity overestimated species richness with almost all cut-off levels, markers and isolates. Average intraspecific sequence similarity values were 99%, 96% and 94% for amplicons in SSU, LSU and ITS, respectively. The suitability of the central part of the SSU as a marker for AM fungal community surveys was further supported by its level of nucleotide variation, which is similar to that of the ITS region; its alignability across the entire phylum; its appropriate length for next-generation sequencing; and its ease of amplification in single-step PCR. PMID:27092961
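
    A SNP density such as "4 SNP/kb" can be computed from an alignment of clone sequences by counting variable columns. A minimal sketch, assuming that gaps and ambiguity codes are ignored and that the aligned length (gaps included) is the denominator; both choices are ours, not necessarily the study's, and the toy clones are hypothetical:

```python
def snp_density_per_kb(alignment: list[str]) -> float:
    """Variable (polymorphic) columns per kilobase of aligned length.

    A column counts as variable if it contains at least two different
    nucleotides (A, C, G, T); gaps and ambiguity codes are ignored.
    """
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if length == 0 or any(len(s) != length for s in alignment):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    variable = 0
    for column in zip(*alignment):
        bases = {c.upper() for c in column if c.upper() in "ACGT"}
        if len(bases) > 1:
            variable += 1
    return 1000.0 * variable / length


clones = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAT"]  # toy clone alignment
print(snp_density_per_kb(clones))  # 2 variable columns in 10 bp
```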

  11. Transcriptome analysis of the variations between autotetraploid Paulownia tomentosa and its diploid using high-throughput sequencing.

    PubMed

    Fan, Guoqiang; Wang, Limin; Deng, Minjie; Niu, Suyan; Zhao, Zhenli; Xu, Enkai; Cao, Xibin; Zhang, Xiaoshen

    2015-08-01

    Timber properties of autotetraploid Paulownia tomentosa are heritable after whole-genome duplication, but the molecular mechanisms underlying these predominant characteristics remain unclear. To illuminate the genetic basis, high-throughput sequencing was used to identify the related unigenes. A total of 2677 unigenes were significantly differentially expressed in autotetraploid P. tomentosa. Among the differentially expressed unigenes, 30 were photosynthesis-related, 21 transcription factor-related and 22 lignin-related; the roles of peroxidase in lignin biosynthesis, and of MYB DNA-binding proteins and WRKY proteins in regulating relevant hormones, are discussed extensively. The results provide transcriptome data that may offer a new perspective on the mechanism of polyploidy in plants with long growth cycles and inform future Paulownia breeding. PMID:25773315

  12. Sequence variation in ligand binding sites in proteins

    PubMed Central

    Magliery, Thomas J; Regan, Lynne

    2005-01-01

    Background The recent explosion in the availability of complete genome sequences has led to the cataloging of tens of thousands of new proteins and putative proteins. Many of these proteins can be structurally or functionally categorized from sequence conservation alone. In contrast, little attention has been given to the meaning of poorly-conserved sites in families of proteins, which are typically assumed to be of little structural or functional importance. Results Recently, using statistical free energy analysis of tetratricopeptide repeat (TPR) domains, we observed that positions in contact with peptide ligands are more variable than surface positions in general. Here we show that statistical analysis of TPRs, ankyrin repeats, Cys2His2 zinc fingers and PDZ domains accurately identifies specificity-determining positions by their sequence variation. Sequence variation is measured as deviation from a neutral reference state, and we present probabilistic and information theory formalisms that improve upon recently suggested methods such as statistical free energies and sequence entropies. Conclusion Sequence variation has been used to identify functionally-important residues in four selected protein families. With TPRs and ankyrin repeats, protein families that bind highly diverse ligands, the effect is so pronounced that sequence "hypervariation" alone can be used to predict ligand binding sites. PMID:16194281
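
    The simpler baseline the authors compare against, per-position sequence entropy, is easy to compute from a multiple sequence alignment; high-entropy columns are the "hypervariable" candidates. This sketch shows only that baseline, not the probabilistic or information-theoretic formalisms the paper introduces, and the toy alignment is hypothetical:

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residues in one alignment column; gaps ignored."""
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def most_variable_positions(alignment: list[str], top: int = 3) -> list[int]:
    """0-based alignment positions ranked by decreasing column entropy."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    entropies = [column_entropy("".join(col)) for col in zip(*alignment)]
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return ranked[:top]


alignment = ["ACDK", "ACEK", "ACFK", "ACGK"]  # toy repeat alignment
print(most_variable_positions(alignment, top=1))  # only column 2 varies
```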

  13. Lineage distribution and E2 sequence variation of high-risk human papillomavirus types isolated from patients with cervical cancer in Sichuan province, China.

    PubMed

    Wu, Haijing; Wu, Enqi; Ma, Lin; Zhang, Guonan; Shi, Yu; Huang, Jianming; Zha, Xiao

    2015-11-01

    To explore the nucleotide sequence variability of the E2 gene in high-risk HPV types in cervical cancer patients from Sichuan province, China, the E2 genes of eight high-risk HPV types were amplified and sequenced. Several novel nucleotide substitutions and deletions were observed. The lineages to which the isolates belonged were determined by phylogenetic analysis, employing the sequence of the representative lineages/sublineages in the coherent classification and nomenclature system as references. This study updates the lineage distribution data on high-risk HPV types in Southwest China and helps broaden understanding of the polymorphism of the E2 gene. PMID:26303138

  14. Identification of Sequence Variation in the Apolipoprotein A2 Gene and Their Relationship with Serum High-Density Lipoprotein Cholesterol Levels

    PubMed Central

    Bandarian, Fatemeh; Daneshpour, Maryam Sadat; Hedayati, Mehdi; Naseri, Mohsen; Azizi, Fereidoun

    2016-01-01

    Background: Apolipoprotein A2 (APOA2) is the second major apolipoprotein of high-density lipoprotein cholesterol (HDL-C). The study aimed to identify APOA2 gene variation in individuals at the two extreme tails of HDL-C levels and its relationship with HDL-C level. Methods: This cross-sectional survey was conducted on participants from the Tehran Lipid and Glucose Study (TLGS) at the Research Institute for Endocrine Sciences, Tehran, Iran, from April 2012 to February 2013. In total, 79 individuals with extremely low HDL-C levels (≤5th percentile for age and gender) and 63 individuals with extremely high HDL-C levels (≥95th percentile for age and gender) were selected. Variants were identified using DNA amplification and direct sequencing. Results: Screening of all exons and the core promoter region of the APOA2 gene identified nine single nucleotide substitutions and one microsatellite; of the substitutions, five were known and four were new variants. Of these nine variants, two were common tag single nucleotide polymorphisms (SNPs) and seven were rare SNPs. Both exonic substitutions were missense mutations causing an amino acid change. There was a significant association between the new missense mutation (variant Chr.1:16119226, Ala98Pro) and HDL-C level. Conclusion: Neither of the two common tag SNPs (rs6413453 and rs5082) contributes to the HDL-C trait in the Iranian population, but a new missense mutation in APOA2 in this population is significantly associated with HDL-C. PMID:26590203

  15. A physical map of the highly heterozygous Populus genome: integration with the genome sequence and genetic map and analysis of haplotype variation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the first maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2,...

  16. High-Throughput Sequencing Technologies

    PubMed Central

    Reuter, Jason A.; Spacek, Damek; Snyder, Michael P.

    2015-01-01

    Summary The human genome sequence has profoundly altered our understanding of biology, human diversity and disease. The path from the first draft sequence to our nascent era of personal genomes and genomic medicine has been made possible only because of the extraordinary advancements in DNA sequencing technologies over the past ten years. Here, we discuss commonly used high-throughput sequencing platforms, the growing array of sequencing assays developed around them as well as the challenges facing current sequencing platforms and their clinical application. PMID:26000844

  17. Variations on strongly lacunary quasi Cauchy sequences

    NASA Astrophysics Data System (ADS)

    Kaplan, Huseyin; Cakalli, Huseyin

    2016-08-01

    We introduce a new function space, namely the space of $N_\theta(p)$-ward continuous functions, which turns out to be a closed subspace of the space of continuous functions for each positive integer $p$. $N_\theta^\alpha(p)$-ward continuity is also introduced and investigated for any fixed $0 < \alpha \le 1$ and any fixed positive integer $p$. A real-valued function $f$ defined on a subset $A$ of $\mathbb{R}$, the set of real numbers, is $N_\theta^\alpha(p)$-ward continuous if it preserves $N_\theta^\alpha(p)$-quasi-Cauchy sequences, i.e. $(f(x_n))$ is an $N_\theta^\alpha(p)$-quasi-Cauchy sequence whenever $(x_n)$ is an $N_\theta^\alpha(p)$-quasi-Cauchy sequence of points in $A$, where a sequence $(x_k)$ of points in $\mathbb{R}$ is called $N_\theta^\alpha(p)$-quasi-Cauchy if
    $$\lim_{r \to \infty} \frac{1}{h_r^{\alpha}} \sum_{k \in I_r} |\Delta x_k|^p = 0,$$
    where $\Delta x_k = x_{k+1} - x_k$ for each positive integer $k$, $p$ is a fixed positive integer, $\alpha$ is fixed in $]0, 1]$, $I_r = (k_{r-1}, k_r]$, and $\theta = (k_r)$ is a lacunary sequence, i.e. an increasing sequence of positive integers such that $k_0 = 0$ and $h_r = k_r - k_{r-1} \to \infty$.
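
    The quasi-Cauchy condition can be explored numerically by computing the block averages $h_r^{-\alpha}\sum_{k \in I_r} |\Delta x_k|^p$ for the first few $r$. A finite computation cannot prove a limit, so this is only an illustration; the lacunary sequence $k_0 = 0$, $k_r = 2^r$ and the test sequences are our assumptions, and the function name is hypothetical.

```python
import math
from typing import Callable


def lacunary_averages(x: Callable[[int], float], p: int, alpha: float,
                      r_max: int) -> list[float]:
    """Return A_r = h_r**(-alpha) * sum over k in I_r of |x(k+1) - x(k)|**p.

    Uses the example lacunary sequence k_0 = 0, k_r = 2**r (r >= 1), so that
    I_r = (k_{r-1}, k_r] and h_r = k_r - k_{r-1}.
    """
    def k(r: int) -> int:
        return 0 if r == 0 else 2 ** r

    averages = []
    for r in range(1, r_max + 1):
        lo, hi = k(r - 1), k(r)
        h_r = hi - lo
        total = sum(abs(x(j + 1) - x(j)) ** p for j in range(lo + 1, hi + 1))
        averages.append(total / h_r ** alpha)
    return averages


print(lacunary_averages(float, 1, 1.0, 5))           # x_k = k: every A_r is 1
print(lacunary_averages(math.sqrt, 1, 1.0, 12)[-1])  # x_k = sqrt(k), alpha = 1
print(lacunary_averages(math.sqrt, 1, 0.5, 12)[-1])  # x_k = sqrt(k), alpha = 1/2
```

    For $x_k = \sqrt{k}$ with $p = 1$, the averages shrink toward 0 when $\alpha = 1$ but settle near $\sqrt{2} - 1 \approx 0.414$ when $\alpha = 1/2$, which shows how a smaller order $\alpha$ makes the condition harder to satisfy.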

  18. Protein structure prediction from sequence variation

    PubMed Central

    Marks, Debora S; Hopf, Thomas A; Sander, Chris

    2015-01-01

    Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics. PMID:23138306
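
    The simplest covariation score between two alignment columns is their mutual information. The global statistical methods mentioned in the abstract go further by separating direct from indirect couplings, so treat this sketch as the local baseline only; the function name and toy columns are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in bits) between two alignment columns.

    Each column is a string with one residue per sequence, in the same
    sequence order; frequencies are plain counts with no pseudocounts or
    sequence weighting.
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal depth")
    n = len(col_i)
    counts_i, counts_j = Counter(col_i), Counter(col_j)
    counts_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((counts_i[a] / n) * (counts_j[b] / n)))
    return mi


print(mutual_information("AAKK", "DDEE"))  # residues change together
print(mutual_information("AAKK", "DEDE"))  # independent columns
```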

  19. A variation on lacunary quasi Cauchy sequences

    NASA Astrophysics Data System (ADS)

    Cakalli, Huseyin; Et, Mikail; Sengul, Hacer

    2016-08-01

    In the present paper, we introduce a concept of ideal lacunary statistical quasi-Cauchy sequence of order $\alpha$ of real numbers in the sense that a sequence $(x_k)$ of points in $\mathbb{R}$ is called $I$-lacunary statistically quasi-Cauchy of order $\alpha$ if
    $$\left\{ r \in \mathbb{N} : \frac{1}{h_r^{\alpha}} \left| \{ k \in I_r : |\Delta x_k| \ge \varepsilon \} \right| \ge \delta \right\} \in I$$
    for each $\varepsilon > 0$ and each $\delta > 0$, where an ideal $I$ is a family of subsets of the positive integers $\mathbb{N}$ that is closed under finite unions and under taking subsets of its elements. The main purpose of this paper is to investigate ideal lacunary statistical ward continuity of order $\alpha$, where a function $f$ is called $I$-lacunary statistically ward continuous of order $\alpha$ if it preserves $I$-lacunary statistically quasi-Cauchy sequences of order $\alpha$, i.e. $(f(x_n))$ is an $S_\theta^\alpha(I)$-quasi-Cauchy sequence whenever $(x_n)$ is.

  20. Alpha-CENTAURI: assessing novel centromeric repeat sequence variation with long read sequencing

    PubMed Central

    Sevim, Volkan; Bashir, Ali; Chin, Chen-Shan; Miga, Karen H.

    2016-01-01

    Motivation: Long arrays of near-identical tandem repeats are a common feature of centromeric and subtelomeric regions in complex genomes. These sequences present a source of repeat structure diversity that is commonly ignored by standard genomic tools. Unlike reads shorter than the underlying repeat structure, which rely on indirect inference methods such as assembly, long reads allow direct inference of satellite higher order repeat structure. To automate characterization of local centromeric tandem repeat sequence variation, we have designed Alpha-CENTAURI (ALPHA satellite CENTromeric AUtomated Repeat Identification), which takes advantage of Pacific Biosciences long reads from whole-genome sequencing datasets. By operating on reads prior to assembly, our approach provides a more comprehensive set of repeat-structure variants and is not affected by rearrangements or sequence underrepresentation due to misassembly. Results: We demonstrate the utility of Alpha-CENTAURI in characterizing repeat structure for alpha satellite containing reads in the hydatidiform mole (CHM1, haploid-like) genome. The pipeline is designed to report local repeat organization summaries for each read, thereby monitoring rearrangements in repeat units, shifts in repeat orientation and sites of array transition into non-satellite DNA, typically defined by transposable element insertion. We validate the method by showing consistency with existing centromere high order repeat references. Alpha-CENTAURI can, in principle, run on any sequence data, offering a method of resolving repeat structure that could readily be applied, using available consensus sequences, to other satellite families in genomes without high-quality reference assemblies. Availability and implementation: Documentation and source code for Alpha-CENTAURI are freely available at http://github.com/volkansevim/alpha-CENTAURI. Contact: ali.bashir@mssm.edu Supplementary information: Supplementary data are available at
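
    Alpha-CENTAURI's own algorithm is documented in its repository and is not reproduced here. As a much simpler, hypothetical illustration of the underlying idea, the sketch below estimates the dominant repeat period of a single read by finding the shift at which the read best matches itself; for alpha satellite one would scan shifts around the roughly 171 bp monomer length.

```python
def dominant_period(read: str, min_period: int, max_period: int) -> tuple[int, float]:
    """Return (period, identity) for the shift d in [min_period, max_period]
    that maximizes the fraction of positions i with read[i] == read[i + d].

    Ties keep the smallest shift, so a whole multiple of the period, which
    matches equally well, does not replace it.
    """
    if min_period < 1:
        raise ValueError("min_period must be at least 1")
    best_period, best_identity = 0, 0.0
    for d in range(min_period, max_period + 1):
        overlap = len(read) - d
        if overlap <= 0:
            break
        matches = sum(1 for i in range(overlap) if read[i] == read[i + d])
        identity = matches / overlap
        if identity > best_identity:
            best_period, best_identity = d, identity
    return best_period, best_identity


read = "ACGTTGCA" * 10  # toy read: ten copies of an 8 bp unit
print(dominant_period(read, 5, 20))
```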

  1. Unraveling genomic variation from next generation sequencing data

    PubMed Central

    2013-01-01

    Elucidating the content of a DNA sequence is critical to understanding and decoding the genetic information of any biological system. As next generation sequencing (NGS) techniques have become cheaper and higher in throughput over time, they have driven major innovations and breakthrough findings in various biological areas. Among the areas shaped by these technological advances are species evolution, microbial mapping, population genetics, genome-wide association studies (GWAS), comparative genomics, variant analysis, gene expression, gene regulation, epigenetics and personalized medicine. While NGS techniques are key players in modern biological research, analyzing and interpreting the vast amount of data they produce is neither easy nor trivial and remains a major challenge in bioinformatics. Therefore, efficient tools are essential to cope with information overload, tackle the high complexity and provide meaningful visualizations that make knowledge extraction easier. In this article, we briefly cover the sequencing methodologies and the equipment available for these analyses, and we describe the formats of the data files they produce. We conclude with a thorough review of tools developed to efficiently store, analyze and visualize such data, with emphasis on structural variation analysis and comparative genomics. We finally comment on their functionality, strengths and weaknesses and discuss how future applications could further develop in this field. PMID:23885890
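
    Among the file formats such reviews cover, the Variant Call Format (VCF) is the usual container for called variants. A minimal sketch of parsing one VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), ignoring header lines and any FORMAT/sample columns; production code would use an established parser such as pysam or cyvcf2.

```python
from typing import NamedTuple, Optional


class VcfRecord(NamedTuple):
    chrom: str
    pos: int                 # 1-based position, as stored in VCF
    id: str
    ref: str
    alt: list[str]           # comma-separated alternate alleles
    qual: Optional[float]    # None when QUAL is '.'
    filter: str
    info: dict[str, object]  # key=value pairs; flags map to True


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line has at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    info_dict: dict[str, object] = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True  # flags carry no value
    return VcfRecord(chrom, int(pos), vid, ref, alt.split(","),
                     None if qual == "." else float(qual), flt, info_dict)


record = parse_vcf_line("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2")
print(record.pos, record.alt, record.info)
```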

  2. A case study on the genetic origin of the high oleic acid trait through FAD2-1 DNA sequence variation in safflower (Carthamus tinctorius L.)

    PubMed Central

    Rapson, Sara; Wu, Man; Okada, Shoko; Das, Alpana; Shrestha, Pushkar; Zhou, Xue-Rong; Wood, Craig; Green, Allan; Singh, Surinder; Liu, Qing

    2015-01-01

    The safflower (Carthamus tinctorius L.) is considered a strongly domesticated species with a long history of cultivation. The hybridization of safflower with its wild relatives has played an important role in the evolution of cultivars and is of particular interest with regard to the production of high-quality edible oils. Original safflower varieties were all rich in linoleic acid, while varieties rich in oleic acid have risen to prominence in recent decades. The high oleic acid trait is controlled by a partially recessive allele ol at a single locus OL. The ol allele was found to be a defective allele of the microsomal oleate desaturase gene FAD2-1. Here we present DNA sequence data and Southern blot analysis suggesting that there has been an ancient hybridization and introgression of the FAD2-1 gene into C. tinctorius from its wild relative C. palaestinus. It is from this gene that FAD2-1Δ was derived more recently. Identification and characterization of the genetic origin and diversity of FAD2-1 could help safflower breeders reduce the population size and number of generations required to develop new high oleic acid varieties through perfect molecular marker-assisted selection. PMID:26442008

  3. Mitochondrial sequence variation in the Guahibo Amerindian population from Venezuela.

    PubMed

    Vona, Giuseppe; Falchi, Alessandra; Moral, Pedro; Calò, Carla M; Varesi, Laurent

    2005-07-01

    New data were obtained on mitochondrial DNA (mtDNA) from Guahibo from Venezuela, a group so far not studied using molecular data. A population sample (n = 59) was analyzed for mtDNA variation in two control-region hypervariable segments (HV1 and HV2) by sequencing. The presence or absence of a 9-bp polymorphism in the COII/tRNA(Lys) region was studied by direct amplification and electrophoretic identification. Thirty-eight variable sites were detected in regions HV1 and HV2, defining 26 mtDNA lineages; 23.7% of these were present in a single individual. The 9-bp deletion was found in 3.39% of individuals. Nucleotide and haplotype diversities were relatively high compared with other New World populations. The identified sequence haplotypes were classified into four major haplogroups (A-D) according to previous studies, with high frequencies for A (47.46%) and C (49.15%), low frequency for B (3.39%), and an absence of D. PMID:15558610
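    Nucleotide and haplotype diversity are standard population-genetic summaries. The sketch below is minimal and rests on two assumptions: the HV1/HV2 sequences are already aligned to equal length, and the usual Nei estimators apply. The abstract does not say which estimators the authors used. The function names and the toy sequences are hypothetical and are not Guahibo data.

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(seqs):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(seqs)  # identical sequences = same haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)

def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (pi); alignment gaps, if any, count as differences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)

toy = ["AAA", "AAA", "AAT", "ATT"]  # hypothetical aligned haplotypes
print(round(haplotype_diversity(toy), 4))   # 0.8333
print(round(nucleotide_diversity(toy), 4))  # 0.3889
```

    The n/(n-1) factor corrects the small-sample bias of the naive heterozygosity-style estimate. It matters for a sample of n = 59 but becomes negligible for large samples.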

  4. The regions of sequence variation in caulimovirus gene VI.

    PubMed

    Sanger, M; Daubert, S; Goodman, R M

    1991-06-01

    The sequence of gene VI from figwort mosaic virus (FMV) clone x4 was determined and compared with that previously published for FMV clone DxS. Both clones originated from the same virus isolation, but the virus used to clone DxS was propagated extensively in a host of a different family prior to cloning whereas that used to clone x4 was not. Differences in the amino acid sequence inferred from the DNA sequences occurred in two clusters. An N-terminal conserved region preceded two regions of variation separated by a central conserved region. Variation in cauliflower mosaic virus (CaMV) gene VI sequences, all of which were derived from virus isolates from hosts from one host family, was similar to that seen in the FMV comparison, though the extent of variation was less. Alignment of gene VI domains from FMV and CaMV revealed regions of amino acid sequence identical in both viruses within the conserved regions. The similarity in the pattern of conserved and variable domains of these two viruses suggests common host-interactive functions in caulimovirus gene VI homologues, and possibly an analogy between caulimoviruses and certain animal viruses in the influence of the host on sequence variability of viral genes. PMID:2024500

  5. Mapping copy number variation by population-scale genome sequencing.

    PubMed

    Mills, Ryan E; Walter, Klaudia; Stewart, Chip; Handsaker, Robert E; Chen, Ken; Alkan, Can; Abyzov, Alexej; Yoon, Seungtai Chris; Ye, Kai; Cheetham, R Keira; Chinwalla, Asif; Conrad, Donald F; Fu, Yutao; Grubert, Fabian; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Iakoucheva, Lilia M; Iqbal, Zamin; Kang, Shuli; Kidd, Jeffrey M; Konkel, Miriam K; Korn, Joshua; Khurana, Ekta; Kural, Deniz; Lam, Hugo Y K; Leng, Jing; Li, Ruiqiang; Li, Yingrui; Lin, Chang-Yun; Luo, Ruibang; Mu, Xinmeng Jasmine; Nemesh, James; Peckham, Heather E; Rausch, Tobias; Scally, Aylwyn; Shi, Xinghua; Stromberg, Michael P; Stütz, Adrian M; Urban, Alexander Eckehart; Walker, Jerilyn A; Wu, Jiantao; Zhang, Yujun; Zhang, Zhengdong D; Batzer, Mark A; Ding, Li; Marth, Gabor T; McVean, Gil; Sebat, Jonathan; Snyder, Michael; Wang, Jun; Ye, Kenny; Eichler, Evan E; Gerstein, Mark B; Hurles, Matthew E; Lee, Charles; McCarroll, Steven A; Korbel, Jan O

    2011-02-01

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide-resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies. PMID:21293372

  6. Sequence variation in the human T-cell receptor loci.

    PubMed

    Mackelprang, Rachel; Carlson, Christopher S; Subrahmanyan, Lakshman; Livingston, Robert J; Eberle, Michael A; Nickerson, Deborah A

    2002-12-01

    Identifying common sequence variations known as single nucleotide polymorphisms (SNPs) in human populations is one of the current objectives of the human genome project. Nearly 3 million SNPs have been identified. Analysis of the relative allele frequency of these markers in human populations and the genetic associations between these markers, known as linkage disequilibrium, is now underway to generate a high-density genetic map. Because of the central role T cells play in immune reactivity, the T-cell receptor (TCR) loci have long been considered important candidates for common disease susceptibility within the immune system (e.g., asthma, atopy and autoimmunity). Over the past two decades, hundreds of SNPs in the TCR loci have been identified. Most studies have focused on defining SNPs in the variable gene segments which are involved in antigenic recognition. On average, the coding sequence of each TCR variable gene segment contains two SNPs, with many more found in the 5', 3' and intronic sequences of these segments. Therefore, a potentially large repertoire of functional variants exists in these loci. Association between SNPs (linkage disequilibrium) extends approximately 30 kb in the TCR loci, although a few larger regions of disequilibrium have been identified. Therefore, the SNPs found in one variable gene segment may or may not be associated with SNPs in other surrounding variable gene segments. This suggests that meaningful association studies in the TCR loci will require the analysis and typing of large marker sets to fully evaluate the role of TCR loci in common disease susceptibility in human populations. PMID:12493004

  7. Determining Word Sequence Variation Patterns in Clinical Documents using Multiple Sequence Alignment

    PubMed Central

    Meng, Frank; Morioka, Craig A.; El-Saden, Suzie

    2011-01-01

    Sentences and phrases that represent a certain meaning often exhibit patterns of variation where they differ from a basic structural form by one or two words. We present an algorithm that utilizes multiple sequence alignments (MSAs) to generate a representation of groups of phrases that possess the same semantic meaning but also share in common the same basic word sequence structure. The MSA enables the determination not only of the words that compose the basic word sequence, but also of the locations within the structure that exhibit variation. The algorithm can be utilized to generate patterns of text sequences that can be used as the basis for a pattern-based classifier, as a starting point to bootstrap the pattern building process for a regular expression-based classifier, or to reveal the variation characteristics of sentences and phrases within a particular domain. PMID:22195152
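    A pairwise alignment of word tokens can illustrate how a shared word skeleton is separated from its variable slots. This is a minimal sketch and not the authors' algorithm: it substitutes Python's standard-library difflib.SequenceMatcher for a true multiple sequence alignment, so it compares only two phrases at a time. The function name variation_pattern and the example phrases are hypothetical.

```python
from difflib import SequenceMatcher

def variation_pattern(phrase_a, phrase_b):
    """Align two phrases word by word; keep shared words, bracket differing slots as (a|b)."""
    a, b = phrase_a.split(), phrase_b.split()
    pattern = []
    # autojunk=False so common words are never silently treated as junk
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes():
        if tag == "equal":
            pattern.extend(a[i1:i2])            # part of the basic word sequence
        else:
            left = " ".join(a[i1:i2]) or "-"    # "-" marks a word gap
            right = " ".join(b[j1:j2]) or "-"
            pattern.append(f"({left}|{right})")  # a location that exhibits variation
    return " ".join(pattern)

print(variation_pattern("no acute intracranial hemorrhage",
                        "no evidence of intracranial hemorrhage"))
# no (acute|evidence of) intracranial hemorrhage
```

    A full MSA would extend this by aligning every phrase in a group against a common profile, so the variable slots can collect more than two alternatives.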

  8. Sequence variations in the FAD2 gene in seeded pumpkins.

    PubMed

    Ge, Y; Chang, Y; Xu, W L; Cui, C S; Qu, S P

    2015-01-01

    Seeded pumpkins are important economic crops; the seeds contain various unsaturated fatty acids, such as oleic acid and linoleic acid, which are crucial for human and animal nutrition. The fatty acid desaturase-2 (FAD2) gene encodes delta-12 desaturase, which converts oleic acid to linoleic acid. However, little is known about sequence variations in FAD2 in seeded pumpkins. Twenty-seven FAD2 clones from 27 accessions of Cucurbita moschata, Cucurbita maxima, Cucurbita pepo, and Cucurbita ficifolia were obtained (1152 bp in total; a single gene without introns). Nucleotide identities among the 27 FAD2 clones exceeded 90%. Nucleotide substitution, rather than nucleotide insertion and deletion, led to sequence polymorphism in the 27 FAD2 clones. Furthermore, all 27 selected FAD2 clones encoded the FAD2 enzyme (delta-12 desaturase), with amino acid sequence identities from 91.7 to 100% over 384 amino acids. The same main functional domain, between amino acids 47 and 329, was identified in all clones. The four species clustered separately based on sequence differences, using the unweighted pair group method with arithmetic mean (UPGMA). Geographic origin and species were found to be closely related to sequence variation in FAD2. PMID:26782391
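    Identity ranges such as the 91.7 to 100% reported above come from comparing every pair of aligned clones. The abstract attributes the polymorphism to substitutions rather than insertions or deletions, so this minimal sketch assumes ungapped sequences of equal length. The helper names and toy sequences are hypothetical.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percentage of identical positions between two ungapped, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (ungapped comparison assumed)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def identity_range(seqs):
    """Minimum and maximum pairwise percent identity over all pairs of sequences."""
    values = [percent_identity(a, b) for a, b in combinations(seqs, 2)]
    return min(values), max(values)

print(identity_range(["ACGT", "ACGA", "TCGA"]))  # (50.0, 75.0)
```

    If the sequences contained indels, they would first need to be aligned, and the identity denominator (aligned length versus ungapped length) would have to be chosen explicitly.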

  9. Dissecting the relationship between protein structure and sequence variation

    NASA Astrophysics Data System (ADS)

    Shahmoradi, Amir; Wilke, Claus; Wilke Lab Team

    2015-03-01

    Over the past decade several independent works have shown that some structural properties of proteins are capable of predicting protein evolution. The strength and significance of these structure-sequence relations, however, appear to vary widely among different proteins, with absolute correlation strengths ranging from 0.1 to 0.8. Here we present the results from a comprehensive search for the potential biophysical and structural determinants of protein evolution by studying more than 200 structural and evolutionary properties in a dataset of 209 monomeric enzymes. We discuss the main protein characteristics responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~10-30% of observed variation in sequence-structure relations. In addition to sequence divergence, we identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous backbone hydrogen bond energies, large fractions of helical secondary structures and low fractions of beta sheets tend to have the strongest sequence-structure relations. BEACON-NSF center for the study of evolution in action.

  10. Terminal region sequence variations in variola virus DNA.

    PubMed

    Massung, R F; Loparev, V N; Knight, J C; Totmenin, A V; Chizhikov, V E; Parsons, J M; Safronov, P F; Gutorov, V V; Shchelkunov, S N; Esposito, J J

    1996-07-15

    Genome DNA terminal region sequences were determined for a Brazilian alastrim variola minor virus strain Garcia-1966 that was associated with a 0.8% case-fatality rate and African smallpox strains Congo-1970 and Somalia-1977 associated with variola major (9.6%) and minor (0.4%) mortality rates, respectively. A base sequence identity of ≥98.8% was determined after aligning 30 kb of the left- or right-end region sequences with cognate sequences previously determined for Asian variola major strains India-1967 (31% death rate) and Bangladesh-1975 (18.5% death rate). The deduced amino acid sequences of putative proteins of ≥65 amino acids also showed relatively high identity, although the Asian and African viruses were clearly more related to each other than to alastrim virus. Alastrim virus contained only 10 of 70 proteins that were 100% identical to homologs in Asian strains, and 7 alastrim-specific proteins were noted. PMID:8661439

  11. Control for stochastic sampling variation and qualitative sequencing error in next generation sequencing

    PubMed Central

    Blomquist, Thomas; Crawford, Erin L.; Yeo, Jiyoun; Zhang, Xiaolu; Willey, James C.

    2015-01-01

    Background Clinical implementation of Next-Generation Sequencing (NGS) is challenged by poor control for stochastic sampling, library preparation biases and qualitative sequencing error. To address these challenges we developed and tested two hypotheses. Methods Hypothesis 1: Analytical variation in quantification is predicted by stochastic sampling effects at input of (a) amplifiable nucleic acid target molecules into the library preparation, (b) amplicons from library into sequencer, or (c) both. We derived equations using Monte Carlo simulation to predict assay coefficient of variation (CV) based on these three working models and tested them against NGS data from specimens with well-characterized molecule inputs and sequence counts, prepared using a competitive multiplex-PCR amplicon-based NGS library preparation method comprising synthetic internal standards (IS). Hypothesis 2: Frequencies of technically derived qualitative sequencing errors (i.e., base substitution, insertion and deletion) observed at each base position in each target native template (NT) are concordant with those observed in respective competitive synthetic IS present in the same reaction. We measured error frequencies at each base position within amplicons from each of 30 target NT, then tested whether they correspond to those within the 30 respective IS. Results For hypothesis 1, the Monte Carlo model derived from both sampling events best predicted CV and explained 74% of observed assay variance. For hypothesis 2, observed frequency and type of sequence variation at each base position within each IS was concordant with that observed in respective NTs (R² = 0.93). Conclusion In targeted NGS, synthetic competitive IS control for stochastic sampling at input of both target into library preparation and of target library product into sequencer, and control for qualitative errors generated during library preparation and sequencing. These controls enable accurate clinical diagnostic reporting of

  12. DNA Shape versus Sequence Variations in the Protein Binding Process.

    PubMed

    Chen, Chuanying; Pettitt, B Montgomery

    2016-02-01

    The binding process of a protein with DNA involves three stages: approach, encounter, and association. It has been known that the complexation of protein and DNA involves mutual conformational changes, especially for a specific sequence association. However, it is still unclear how the conformation and the information in the DNA sequences affect the binding process. What is the extent to which the DNA structure adopted in the complex is induced by protein binding, or is instead intrinsic to the DNA sequence? In this study, we used the multiscale simulation method to explore the binding process of a protein with DNA in terms of DNA sequence, conformation, and interactions. We found that in the approach stage the protein can bind both the major and minor groove of the DNA, but uses different features to locate the binding site. The intrinsic conformational properties of the DNA play a significant role in this binding stage. By comparing the specific DNA with the nonspecific in unbound, intermediate, and associated states, we found that for a specific DNA sequence, ∼40% of the bending in the association forms is intrinsic and ∼60% is induced by the protein. The protein does not induce appreciable bending of nonspecific DNA. In addition, we proposed that the DNA shape variations induced by protein binding are required in the early stage of the binding process, so that the protein is able to approach, encounter, and form an intermediate at the correct site on DNA. PMID:26840719

  13. Flagellin gene sequence variation in the genus Pseudomonas.

    PubMed

    Bellingham, N F; Morgan, J A; Saunders, J R; Winstanley, C

    2001-07-01

    Flagellin gene (fliC) sequences from 18 strains of Pseudomonas sensu stricto representing 8 different species, and 9 representative fliC sequences from other members of the gamma sub-division of proteobacteria, were compared. Analysis was performed on N-terminal, C-terminal and whole fliC sequences. The fliC analyses confirmed the inferred relationship between P. mendocina, P. oleovorans and P. aeruginosa based on 16S rRNA sequence comparisons. In addition, the analyses indicated that P. putida PRS2000 was closely related to P. fluorescens SBW25 and P. fluorescens NCIMB 9046T, but suggested that P. putida PaW8 and P. putida PRS2000 were more closely related to other Pseudomonas spp. than they were to each other. There were a number of inconsistencies in inferred evolutionary relationships between strains, depending on the analysis performed. In particular, whole flagellin gene comparisons often differed from those obtained using N- and C-terminal sequences. However, there were also inconsistencies between the terminal region analyses, suggesting that phylogenetic relationships inferred on the basis of fliC sequence should be treated with caution. Although the central domain of fliC is highly variable between Pseudomonas strains, there was evidence of sequence similarities between the central domains of different Pseudomonas fliC sequences. This indicates the possibility of recombination in the central domain of fliC genes within Pseudomonas species, and between these genes and those from other bacteria. PMID:11518318

  14. Anchored pseudo-de novo assembly of human genomes identifies extensive sequence variation from unmapped sequence reads.

    PubMed

    Faber-Hammond, Joshua J; Brown, Kim H

    2016-07-01

    The completion of the human genome reference (HGR) marked the beginning of the genomics era, yet despite its utility, its universal application is limited by the small number of individuals used in its development. This is highlighted by the presence of high-quality sequence reads failing to map within the HGR. Sequences failing to map generally represent 2-5% of total reads, which may harbor regions that would enhance our understanding of population variation, evolution, and disease. Alternatively, complete de novo assemblies can be created, but these effectively ignore the groundwork of the HGR. In an effort to find a middle ground, we developed a bioinformatic pipeline that maps paired-end reads to the HGR as separate single reads, exports unmappable reads, de novo assembles these reads per individual and then combines assemblies into a secondary reference assembly used for comparative analysis. Using 45 diverse 1000 Genomes Project individuals, we identified 351,361 contigs covering 195.5 Mb of sequence unincorporated in GRCh38. Of these, 30,879 contigs are represented in multiple individuals, with ~40% showing high sequence complexity. Genomic coordinates were generated for 99.9%, with 52.5% exhibiting high-quality mapping scores. Comparative genomic analyses with archaic humans and primates revealed significant sequence alignments, and comparisons with model organism RefSeq gene datasets identified novel human genes. If incorporated, these sequences will expand the HGR, but more importantly our data highlight that with this method low-coverage (~10-20×) next-generation sequencing can still be used to identify novel unmapped sequences to explore biological functions contributing to human phenotypic variation, disease and functionality for personal genomic medicine. PMID:27061184

  15. STR allele sequence variation: Current knowledge and future issues.

    PubMed

    Gettings, Katherine Butler; Aponte, Rachel A; Vallone, Peter M; Butler, John M

    2015-09-01

    This article reviews what is currently known about short tandem repeat (STR) allelic sequence variation in and around the twenty-four loci most commonly used throughout the world to perform forensic DNA investigations. These STR loci include D1S1656, TPOX, D2S441, D2S1338, D3S1358, FGA, CSF1PO, D5S818, SE33, D6S1043, D7S820, D8S1179, D10S1248, TH01, vWA, D12S391, D13S317, Penta E, D16S539, D18S51, D19S433, D21S11, Penta D, and D22S1045. All known reported variant alleles are compiled along with genomic information available from GenBank, dbSNP, and the 1000 Genomes Project. Supplementary files are included which provide annotated reference sequences for each STR locus, characterize genomic variation around the STR repeat region, and compare alleles present in currently available STR kit allelic ladders. Looking to the future, STR allele nomenclature options are discussed as they relate to next generation sequencing efforts underway. PMID:26197946

  16. Pyrosequencing for discovery and analysis of DNA sequence variations.

    PubMed

    Ronaghi, Mostafa; Shokralla, Shadi; Gharizadeh, Baback

    2007-10-01

    Since the invention of pyrosequencing, more than 500 articles have been published describing different applications of this technology, most notably for DNA structure variation and microbial detection. Technological advances have been made to enhance the robustness and accuracy of this technique as well as to reduce the cost and increase the throughput. This review intends to cover recent advances in this technology and discuss its application for low and high-throughput DNA variation studies. PMID:17979516

  17. Targeted capture enrichment and sequencing identifies extensive nucleotide variation in the turkey MHC-B.

    PubMed

    Reed, Kent M; Mendoza, Kristelle M; Settlage, Robert E

    2016-03-01

    Variation in the major histocompatibility complex (MHC) is increasingly associated with disease susceptibility and resistance in avian species of agricultural importance. This variation includes sequence polymorphisms but also structural differences (gene rearrangement) and copy number variation (CNV). The MHC has now been described for multiple galliform species including the best defined assemblies of the chicken (Gallus gallus) and domestic turkey (Meleagris gallopavo). Using this sequence resource, this study applied high-throughput sequencing to investigate MHC variation in turkeys of North America (NA turkeys). An MHC-specific SureSelect (Agilent) capture array was developed, and libraries were created for 14 turkeys representing domestic (commercial bred), heritage breed, and wild turkeys. In addition, a representative of the Ocellated turkey (M. ocellata) and chicken (G. gallus) was included to test cross-species applicability of the capture array allowing for identification of new species-specific polymorphisms. Libraries were hybridized to ∼12 K cRNA baits and the resulting pools were sequenced. On average, 98% of processed reads mapped to the turkey whole genome sequence and 53% to the MHC target. In addition to the MHC, capture hybridization recovered sequences corresponding to other MHC regions. Sequence alignment and de novo assembly indicated the presence of several additional BG genes in the turkey with evidence for CNV. Variant detection identified an average of 2245 polymorphisms per individual for the NA turkeys, 3012 for the Ocellated turkey, and 462 variants in the chicken (RJF-256). This study provides an extensive sequence resource for examining MHC variation and its relation to health of this agriculturally important group of birds. PMID:26729471

  18. DYZ1 arrays show sequence variation between the monozygotic males

    PubMed Central

    2014-01-01

    Background Monozygotic twins (MZT) are an important resource for genetic studies in the context of normal and diseased genomes. In the present study we used DYZ1, a satellite fraction present in the form of tandem arrays on the long arm of the human Y chromosome, as a tool to uncover sequence variations between the monozygotic males. Results We detected copy number variation, frequent insertions and deletions within the sequences of DYZ1 arrays amongst all the three sets of twins used in the present study. MZT1b showed loss of 35 bp compared to that in 1a, whereas 2a showed loss of 31 bp compared to that in 2b. Similarly, 3b showed 10 bp insertion compared to that in 3a. MZT1a germline DNA showed loss of 5 bp and 1b blood DNA showed loss of 26 bp compared to that of 1a blood and 1b germline DNA, respectively. Of the 69 restriction sites detected in DYZ1 arrays, MboII, BsrI, TspEI and TaqI sites showed frequent loss and/or gain amongst all the 3 pairs studied. MZT1 pair showed loss/gain of VspI, BsrDI, AgsI, PleI, TspDTI, TspEI, TfiI and TaqI restriction sites in both blood and germline DNA. All the three sets of MZT showed differences in the number of DYZ1 copies. FISH signals reflected somatic mosaicism of the DYZ1 copies across the cells. Conclusions DYZ1 showed both sequence and copy number variation between the MZT males. Sequence variation was also noticed between germline and blood DNA samples of the same individual, as observed in at least one set of samples. The result suggests that DYZ1 faithfully records all the genetic changes occurring after twinning, which may be ascribed to environmental factors. PMID:24495361

  19. Comparative RNA sequencing reveals substantial genetic variation in endangered primates

    PubMed Central

    Perry, George H.; Melsted, Páll; Marioni, John C.; Wang, Ying; Bainer, Russell; Pickrell, Joseph K.; Michelini, Katelyn; Zehr, Sarah; Yoder, Anne D.; Stephens, Matthew; Pritchard, Jonathan K.; Gilad, Yoav

    2012-01-01

    Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of nonhuman primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps toward addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 nonhuman primates. Of the nonhuman primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5721 gene sequences per species and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats have, therefore, strong potential to achieve long-term success. PMID:22207615

  20. The Quantification of Representative Sequences pipeline for amplicon sequencing: case study on within-population ITS1 sequence variation in a microparasite infecting Daphnia.

    PubMed

    González-Tortuero, E; Rusek, J; Petrusek, A; Gießler, S; Lyras, D; Grath, S; Castro-Monzón, F; Wolinska, J

    2015-11-01

    Next generation sequencing (NGS) platforms are replacing traditional molecular biology protocols like cloning and Sanger sequencing. However, accuracy of NGS platforms has rarely been measured when quantifying relative frequencies of genotypes or taxa within populations. Here we developed a new bioinformatic pipeline (QRS) that pools similar sequence variants and estimates their frequencies in NGS data sets from populations or communities. We tested whether the estimated frequency of representative sequences, generated by 454 amplicon sequencing, differs significantly from that obtained by Sanger sequencing of cloned PCR products. This was performed by analysing sequence variation of the highly variable first internal transcribed spacer (ITS1) of the ichthyosporean Caullerya mesnili, a microparasite of cladocerans of the genus Daphnia. This analysis also serves as a case example of the usage of this pipeline to study within-population variation. Additionally, a public Illumina data set was used to validate the pipeline on community-level data. Overall, there was a good correspondence in absolute frequencies of C. mesnili ITS1 sequences obtained from Sanger and 454 platforms. Furthermore, analyses of molecular variance (amova) revealed that population structure of C. mesnili differs across lakes and years independently of the sequencing platform. Our results support not only the usefulness of amplicon sequencing data for studies of within-population structure but also the successful application of the QRS pipeline on Illumina-generated data. The QRS pipeline is freely available together with its documentation under GNU Public Licence version 3 at http://code.google.com/p/quantification-representative-sequences. PMID:25728529
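    Pooling near-identical variants around a representative sequence and reporting relative frequencies can be illustrated in a few lines. This is a minimal sketch of generic greedy, abundance-sorted clustering and not the QRS pipeline's actual algorithm. Several details are assumptions: the function names, the max_diff threshold, the crude distance (positional mismatches plus length difference, with no indel alignment) and the toy reads are all hypothetical.

```python
from collections import Counter

def mismatch_distance(a, b):
    """Crude distance: positional mismatches plus any length difference (no indel alignment)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def pool_variants(reads, max_diff=1):
    """Greedily pool reads into representatives, most abundant first; return relative frequencies."""
    counts = Counter(reads)
    reps = {}  # representative sequence -> pooled read count
    # Most abundant sequences become representatives first; ties broken alphabetically
    for seq, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in reps:
            if mismatch_distance(seq, rep) <= max_diff:
                reps[rep] += c  # similar enough: add to existing representative
                break
        else:
            reps[seq] = c  # no close representative: start a new one
    total = sum(reps.values())
    return {rep: c / total for rep, c in reps.items()}

reads = ["AAAA"] * 6 + ["AAAT"] * 2 + ["CCCC"] * 2  # hypothetical amplicon reads
print(pool_variants(reads))  # {'AAAA': 0.8, 'CCCC': 0.2}
```

    Processing the most abundant sequences first means that rare variants, which are more likely to carry sequencing errors, are absorbed into common ones rather than the reverse.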

  1. Geochemical variations during the 2012 Emilia seismic sequence

    NASA Astrophysics Data System (ADS)

    Sciarra, Alessandra; Cantucci, Barbara; Galli, Gianfranco; Cinti, Daniele; Pizzino, Luca

    2015-04-01

    , apart from one sample, are not thermally anomalous. Stable isotopes of H and O indicate the absence of mixing with connate waters, prolonged interaction with the host rock at high temperature and/or heavy gas-water exchange at depth. The isotopic composition of carbon emphasizes its organic (i.e., shallow) origin; only the "La Canonica" site, the deepest well sampled in this study, shows a probable deep(er) provenance of dissolved carbon. Waters trend away from the atmospheric end-member composition, dissolving CO2 or CH4 depending on their redox state. Dissolved radon activity is very low, likely due to the particular hydrogeological setting of the study area (i.e., the presence of waters with long residence times in the considered aquifers). The results highlight different behavior before and after the seismic events, which is also shown by the different carbon isotopic signature of CH4. These variations could be produced by an increase in bacterial (e.g., in peat strata) and methanogenic fermentation processes in the first meters of the soil.

  2. Variational formulation of high performance finite elements: Parametrized variational principles

    NASA Technical Reports Server (NTRS)

    Felippa, Carlos A.; Militello, Carmello

    1991-01-01

    High performance elements are simple finite elements constructed to deliver engineering accuracy with coarse arbitrary grids. This is part of a series on the variational basis of high-performance elements, with emphasis on those constructed with the free formulation (FF) and assumed natural strain (ANS) methods. Parametrized variational principles that provide a foundation for the FF and ANS methods, as well as for a combination of both, are presented.

  3. Lysoplex: An efficient toolkit to detect DNA sequence variations in the autophagy-lysosomal pathway

    PubMed Central

    Di Fruscio, Giuseppina; Schulz, Angela; De Cegli, Rossella; Savarese, Marco; Mutarelli, Margherita; Parenti, Giancarlo; Banfi, Sandro; Braulke, Thomas; Nigro, Vincenzo; Ballabio, Andrea

    2015-01-01

    The autophagy-lysosomal pathway (ALP) regulates cell homeostasis and plays a crucial role in human diseases, such as lysosomal storage disorders (LSDs) and common neurodegenerative diseases. Therefore, the identification of DNA sequence variations in genes involved in this pathway and their association with human diseases would have a significant impact on health. To this aim, we developed Lysoplex, a targeted next-generation sequencing (NGS) approach, which allowed us to obtain a uniform and accurate coding sequence coverage of a comprehensive set of 891 genes involved in lysosomal, endocytic, and autophagic pathways. Lysoplex was successfully validated on 14 different types of LSDs and then used to analyze 48 mutation-unknown patients with a clinical phenotype of neuronal ceroid lipofuscinosis (NCL), a genetically heterogeneous subtype of LSD. Lysoplex allowed us to identify pathogenic mutations in 67% of patients, most of whom had been unsuccessfully analyzed by several sequencing approaches. In addition, in 3 patients, we found potential disease-causing variants in novel NCL candidate genes. We then compared the variant detection power of Lysoplex with data derived from public whole exome sequencing (WES) efforts. On average, a 50% higher number of validated amino acid changes and truncating variations per gene were identified. Overall, we identified 61 truncating sequence variations and 488 missense variations with a high probability to cause loss of function in a total of 316 genes. Interestingly, some loss-of-function variations of genes involved in the ALP pathway were found in homozygosity in the normal population, suggesting that their role is not essential. Thus, Lysoplex provided a comprehensive catalog of sequence variants in ALP genes and allows the assessment of their relevance in cell biology as well as their contribution to human disease. PMID:26075876

  4. Genotyping common and rare variation using overlapping pool sequencing

    PubMed Central

    2011-01-01

    Background: Recent advances in sequencing technologies set the stage for large, population-based studies in which the DNA or RNA of thousands of individuals will be sequenced. Currently, however, such studies are still infeasible using a straightforward sequencing approach; as a result, a few multiplexing schemes have recently been suggested, in which a small number of DNA pools are sequenced and the results are then deconvoluted using compressed sensing or similar approaches. These methods, however, are limited to the detection of rare variants. Results: In this paper we provide a new algorithm for the deconvolution of DNA pooling multiplexing schemes. The presented algorithm utilizes a likelihood model and linear programming. The approach allows for the addition of external data, particularly imputation data, resulting in a flexible environment that is suitable for different applications. Conclusions: In particular, we demonstrate that both low and high allele frequency SNPs can be accurately genotyped when the DNA pooling scheme is performed in conjunction with microarray genotyping and imputation. Additionally, we demonstrate the use of our framework for the detection of cancer fusion genes from RNA sequences. PMID:21989232
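
    To illustrate what "deconvolution" of overlapping pools means, the toy decoder below brute-forces the maximum-likelihood genotype vector for one SNP under a binomial read-count model. This is not the paper's algorithm (which combines a likelihood model with linear programming); the pool design, error rate and read counts are hypothetical, and exhaustive search only scales to a handful of individuals.

```python
import itertools
import math

def pool_log_likelihood(genotypes, design, alt_reads, depth, err=0.01):
    """Binomial log-likelihood of the pooled read counts for one SNP.

    genotypes: alt-allele copies (0, 1 or 2) per individual
    design: one tuple of individual indices per pool
    alt_reads, depth: alternative-allele and total read counts per pool
    """
    ll = 0.0
    for members, k, n in zip(design, alt_reads, depth):
        freq = sum(genotypes[i] for i in members) / (2 * len(members))
        # Blend in a small error rate so p stays strictly inside (0, 1).
        p = freq * (1 - err) + (1 - freq) * err
        ll += math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)
    return ll

def decode(design, alt_reads, depth, n_individuals):
    """Exhaustively search all 3**n genotype vectors; feasible only for tiny n."""
    return max(
        itertools.product((0, 1, 2), repeat=n_individuals),
        key=lambda g: pool_log_likelihood(g, design, alt_reads, depth),
    )

# Hypothetical overlapping design: 4 individuals, each placed in 2 of 4 pools.
design = [(0, 1), (2, 3), (0, 2), (1, 3)]
print(decode(design, alt_reads=[25, 0, 25, 0], depth=[100] * 4, n_individuals=4))
```

    Because every individual sits in a distinct combination of pools, a single heterozygote leaves a unique signature across the pools, which is what makes the decoding identifiable.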

  5. Protein 3D Structure Computed from Evolutionary Sequence Variation

    PubMed Central

    Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2011-01-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein
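
    The "noisy set of observed correlations" can be illustrated with the simplest local co-variation score, the mutual information between two alignment columns. This sketch shows the kind of pairwise correlation that a global maximum entropy model is designed to disentangle; it is not EVfold, and the toy alignment is hypothetical.

```python
import math
from collections import Counter

def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment."""
    n = len(msa)
    pi = Counter(seq[i] for seq in msa)
    pj = Counter(seq[j] for seq in msa)
    pij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

# Hypothetical alignment: columns 0 and 2 co-vary perfectly, column 1 is constant.
msa = ["AKE", "AKE", "VKD", "VKD"]
print(column_mutual_information(msa, 0, 2), column_mutual_information(msa, 0, 1))
```

    Mutual information also scores indirect (transitive) correlations highly, which is one reason a model of all pairs jointly is needed to separate direct couplings.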

  6. Mitochondrial Sequence Variation in African-American Primary Open-Angle Glaucoma Patients

    PubMed Central

    Collins, David W.; Gudiseva, Harini V.; Trachtman, Benjamin T.; Jerrehian, Matthew; Gorry, Thomasine; Merritt III, William T.; Rhodes, Allison L.; Sankar, Prithvi S.; Regina, Meredith; Miller-Ellis, Eydie; O’Brien, Joan M.

    2013-01-01

    Primary open-angle glaucoma (POAG) is a major cause of blindness and results from irreversible retinal ganglion cell damage and optic nerve degeneration. In the United States, POAG is most prevalent in African-Americans. Mitochondrial genetics and dysfunction have been implicated in POAG, and potentially pathogenic sequence variations, in particular novel transversional base substitutions, are reportedly common in mitochondrial genomes (mtDNA) from POAG patient blood. The purpose of this study was to ascertain the spectrum of sequence variation in mtDNA from African-American POAG patients and determine whether novel nonsynonymous, transversional or other potentially pathogenic sequence variations are observed more commonly in POAG cases than controls. mtDNA from African-American POAG cases (n = 22) and age-matched controls (n = 22) was analyzed by deep sequencing of a single 16,487 base pair PCR amplicon by Ion Torrent, and candidate novel variants were validated by Sanger sequencing. Sequence variants were classified and interpreted using the MITOMAP compendium of polymorphisms. 99.8% of the observed variations had been previously reported. The ratio of novel variants to POAG cases was 7-fold lower than a prior estimate. Novel mtDNA variants were present in 3 of 22 cases, novel nonsynonymous changes in 1 of 22 cases and novel transversions in 0 of 22 cases; these proportions are significantly lower (p<.0005, p<.0004, p<.0001) than estimated previously for POAG, and did not differ significantly from controls. Although it is possible that mitochondrial genetics play a role in African-Americans’ high susceptibility to POAG, it is unlikely that any mitochondrial respiratory dysfunction is due to an abnormally high incidence of novel mutations that can be detected in mtDNA from peripheral blood. PMID:24146900
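
    The transition/transversion distinction used in this study follows from base chemistry: transitions exchange a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T), while transversions cross between the two classes. A minimal classifier, assuming single-base substitutions on the DNA alphabet:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

print(substitution_type("A", "G"), substitution_type("G", "T"))
```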

  7. Mitochondrial sequence variation in African-American primary open-angle glaucoma patients.

    PubMed

    Collins, David W; Gudiseva, Harini V; Trachtman, Benjamin T; Jerrehian, Matthew; Gorry, Thomasine; Merritt, William T; Rhodes, Allison L; Sankar, Prithvi S; Regina, Meredith; Miller-Ellis, Eydie; O'Brien, Joan M

    2013-01-01

    Primary open-angle glaucoma (POAG) is a major cause of blindness and results from irreversible retinal ganglion cell damage and optic nerve degeneration. In the United States, POAG is most prevalent in African-Americans. Mitochondrial genetics and dysfunction have been implicated in POAG, and potentially pathogenic sequence variations, in particular novel transversional base substitutions, are reportedly common in mitochondrial genomes (mtDNA) from POAG patient blood. The purpose of this study was to ascertain the spectrum of sequence variation in mtDNA from African-American POAG patients and determine whether novel nonsynonymous, transversional or other potentially pathogenic sequence variations are observed more commonly in POAG cases than controls. mtDNA from African-American POAG cases (n = 22) and age-matched controls (n = 22) was analyzed by deep sequencing of a single 16,487 base pair PCR amplicon by Ion Torrent, and candidate novel variants were validated by Sanger sequencing. Sequence variants were classified and interpreted using the MITOMAP compendium of polymorphisms. 99.8% of the observed variations had been previously reported. The ratio of novel variants to POAG cases was 7-fold lower than a prior estimate. Novel mtDNA variants were present in 3 of 22 cases, novel nonsynonymous changes in 1 of 22 cases and novel transversions in 0 of 22 cases; these proportions are significantly lower (p<.0005, p<.0004, p<.0001) than estimated previously for POAG, and did not differ significantly from controls. Although it is possible that mitochondrial genetics play a role in African-Americans' high susceptibility to POAG, it is unlikely that any mitochondrial respiratory dysfunction is due to an abnormally high incidence of novel mutations that can be detected in mtDNA from peripheral blood. PMID:24146900

  8. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. A fluorescent signal is emitted via fluorescence resonance energy transfer (FRET) between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.
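
    The readout step can be sketched as follows, assuming each detected energy-transfer signal has already been reduced to a (time, dye label) pair. The dye names and the dye-to-base mapping are hypothetical. Because incorporated nucleotides pair with the template, the template (written 5' to 3') is the reverse complement of the incorporation order.

```python
# Hypothetical mapping from acceptor-dye label to the nucleotide it marks.
DYE_TO_BASE = {"dye1": "A", "dye2": "C", "dye3": "G", "dye4": "T"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def call_bases(events):
    """events: (time, dye) pairs, one per detected incorporation.

    Returns (growing strand 5'->3', template strand 5'->3').
    """
    ordered = sorted(events)  # temporal order of incorporation
    strand = "".join(DYE_TO_BASE[dye] for _, dye in ordered)
    template = strand.translate(COMPLEMENT)[::-1]  # reverse complement
    return strand, template

print(call_bases([(0.9, "dye3"), (0.1, "dye1"), (0.5, "dye2")]))
```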

  9. Natural Allelic Variations in Highly Polyploidy Saccharum Complex

    PubMed Central

    Song, Jian; Yang, Xiping; Resende, Marcio F. R.; Neves, Leandro G.; Todd, James; Zhang, Jisen; Comstock, Jack C.; Wang, Jianping

    2016-01-01

    Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with highly polyploid and complex genomes. The Saccharum complex, comprising the genus Saccharum and a few related genera, is an important genetic resource for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Although understanding its allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variation in the Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set, targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variant and genotype calling was established. BWA-mem and the sorghum genome proved to be an acceptable aligner and reference, respectively, for sugarcane target enrichment sequence analysis. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated to be largely homozygous. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. The target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes. PMID:27375658
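
    A single-dosage SNP is one where the alternative allele is carried by only one of the homologous copies. A minimal sketch of dosage estimation from read counts, assuming the ploidy is known and reads sample all copies uniformly; this simple binomial model is an illustration, not the paper's pipeline.

```python
import math

def estimate_dosage(alt_reads, depth, ploidy, err=0.01):
    """Return the alt-allele copy number (0..ploidy) with the highest binomial likelihood."""
    best_k, best_ll = None, -math.inf
    for k in range(ploidy + 1):
        # Expected alt-read fraction, clamped away from 0 and 1 to allow for errors.
        p = min(max(k / ploidy, err), 1 - err)
        ll = alt_reads * math.log(p) + (depth - alt_reads) * math.log(1 - p)
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k

# Hypothetical octoploid locus: 10 alternative reads out of 80.
print(estimate_dosage(alt_reads=10, depth=80, ploidy=8))
```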

  10. Natural Allelic Variations in Highly Polyploidy Saccharum Complex.

    PubMed

    Song, Jian; Yang, Xiping; Resende, Marcio F R; Neves, Leandro G; Todd, James; Zhang, Jisen; Comstock, Jack C; Wang, Jianping

    2016-01-01

    Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with highly polyploid and complex genomes. The Saccharum complex, comprising the genus Saccharum and a few related genera, is an important genetic resource for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Although understanding its allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variation in the Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set, targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variant and genotype calling was established. BWA-mem and the sorghum genome proved to be an acceptable aligner and reference, respectively, for sugarcane target enrichment sequence analysis. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated to be largely homozygous. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. The target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes. PMID:27375658

  11. Natural Allelic Variations in Highly Polyploidy Saccharum Complex

    DOE PAGESBeta

    Song, Jian; Yang, Xiping; Resende, Marcio F. R.; Neves, Leandro G.; Todd, James; Zhang, Jisen; Comstock, Jack C.; Wang, Jianping

    2016-06-08

    Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with highly polyploid and complex genomes. The Saccharum complex, comprising the genus Saccharum and a few related genera, is an important genetic resource for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Although understanding its allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variation in the Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set, targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variant and genotype calling was established. BWA-mem and the sorghum genome proved to be an acceptable aligner and reference, respectively, for sugarcane target enrichment sequence analysis. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated to be largely homozygous. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. The target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes.

  12. Sequence variation of koala retrovirus transmembrane protein p15E among koalas from different geographic regions.

    PubMed

    Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M; Greenwood, Alex D; Roca, Alfred L

    2015-01-15

    The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in one captive Queensland koala and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes comprising 139 of the sequences (85%) coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development. PMID:25462343
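
    The haplotype and polypeptide tallies above can be reproduced in outline by grouping identical sequences and then grouping their translations, so that synonymous haplotypes collapse onto one polypeptide. A sketch assuming in-frame, gap-free sequences and the standard genetic code; the toy sequences are hypothetical.

```python
from collections import Counter

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq):
    """Translate an in-frame DNA sequence codon by codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def summarize(sequences):
    """Count sequences per haplotype and per encoded polypeptide."""
    haplotypes = Counter(sequences)
    proteins = Counter()
    for hap, count in haplotypes.items():
        proteins[translate(hap)] += count
    return haplotypes, proteins

# GCT and GCC both encode Ala (synonymous); ACT encodes Thr (non-synonymous).
seqs = ["ATGGCT"] * 3 + ["ATGGCC", "ATGACT"]
haplotypes, proteins = summarize(seqs)
print(len(haplotypes), dict(proteins))
```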

  13. Sequence variation of koala retrovirus transmembrane protein p15E among koalas from different geographic regions

    PubMed Central

    Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M.; Greenwood, Alex D.; Roca, Alfred L.

    2014-01-01

    The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in one captive Queensland koala and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes comprising 139 of the sequences (85%) coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development. PMID:25462343

  14. Deep sequencing of the hepatitis B virus in hepatocellular carcinoma patients reveals enriched integration events, structural alterations and sequence variations.

    PubMed

    Toh, Soo Ting; Jin, Yu; Liu, Lizhen; Wang, Jingbo; Babrzadeh, Farbod; Gharizadeh, Baback; Ronaghi, Mostafa; Toh, Han Chong; Chow, Pierce Kah-Hoe; Chung, Alexander Y-F; Ooi, London L-P-J; Lee, Caroline G-L

    2013-04-01

    Chronic hepatitis B virus (HBV) infection is epidemiologically associated with hepatocellular carcinoma (HCC), but its role in HCC remains poorly understood due to technological limitations. In this study, we systematically characterize HBV in HCC patients. HBV sequences were enriched from 48 HCC patients using an oligo-bead-based strategy, pooled together and sequenced using the FLX-Genome-Sequencer. In the tumors, preferential integration of HBV into promoters of genes (P < 0.001) and significant enrichment of integration into chromosome 10 (P < 0.01) were observed. Integration into chromosome 10 was significantly associated with poorly differentiated tumors (P < 0.05). Notably, in the tumors, recurrent integration into the promoter of the human telomerase reverse transcriptase (TERT) gene was found to correlate with increased TERT expression. The preferred region within the HBV genome involved in integration and viral structural alteration is at the 3'-end of the hepatitis B virus X protein (HBx) gene, where viral replication/transcription initiates. Upon integration, the 3'-end of HBx is often deleted. HBx-human chimeric transcripts, the most common type of chimeric transcripts, can be expressed as chimeric proteins. Sequence variations resulting in non-conservative amino acid substitutions are commonly observed in the HBV genome. This study highlights HBV as highly mutable in HCC patients, with preferential regions within the host and virus genomes for HBV integration and structural alterations. PMID:23276797

  15. Rate variation of DNA sequence evolution in the Drosophila lineages.

    PubMed Central

    Takano, T S

    1998-01-01

    Rate constancy of DNA sequence evolution was examined for three species of Drosophila, using two samples: the published sequences of eight genes from regions of normal recombination and new data for the four AS-C genes (ac, sc, l'sc and ase) and the ci gene. The AS-C and ci genes were chosen because they are located in regions of greatly reduced recombination in Drosophila melanogaster and their locations remain unchanged throughout the entire lineages involved, which reduces the effect of ancestral polymorphism in the study of rate constancy. The synonymous substitution pattern of the three lineages was found to be erratic in both samples. The dispersion index for replacement substitutions was relatively high for the per, G6pd and ac genes. A significant heterogeneity in the number of synonymous substitutions in the three lineages was found between the two samples of genes with different recombination rates. This is partly due to the lack of a lineage effect in the D. melanogaster and Drosophila simulans lineages in the AS-C and ci genes, in contrast to Akashi's observation of genes in regions of normal recombination. Higher codon bias in Drosophila yakuba than in D. melanogaster and D. simulans was observed in the four AS-C genes, which suggests change(s) in the action of natural selection on codon usage in these genes. Fluctuating selection intensity may also be responsible for the observed locus-lineage interaction effects in synonymous substitution. PMID:9611206
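
    The dispersion index compares the variance of substitution counts among lineages with their mean. Under a Poisson molecular clock the variance equals the mean, so the ratio is expected to be near 1, and values well above 1 indicate an overdispersed clock. A sketch of the unweighted form, assuming the per-lineage substitution counts are already estimated; published analyses often add corrections (for example for lineage effects) that are omitted here, and the counts are illustrative.

```python
import statistics

def dispersion_index(substitutions_per_lineage):
    """R = sample variance / mean of substitution counts across lineages."""
    mean = statistics.mean(substitutions_per_lineage)
    if mean == 0:
        raise ValueError("no substitutions observed")
    return statistics.variance(substitutions_per_lineage) / mean

# Illustrative replacement-substitution counts for three lineages.
print(dispersion_index([1, 5, 9]))
```

    With only three lineages the estimate is very noisy, which is one reason comparisons across many genes, as in the study, are needed.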

  16. Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

    PubMed Central

    Laehnemann, David; Borkhardt, Arndt

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance on which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159
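
    One common family of error correctors works from the k-mer spectrum: in deeply sequenced data, a k-mer seen only rarely is likely to contain a sequencing error. A minimal sketch of the detection step; k, the count threshold and the reads are arbitrary choices for a toy example, and real tools also model coverage and quality.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def weak_positions(read, counts, k, min_count=2):
    """Start positions of k-mers in `read` seen fewer than min_count times."""
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < min_count]

# Three copies of one fragment plus a read with an A->T error at index 4.
reads = ["ACGTACGA"] * 3 + ["ACGTTCGA"]
counts = kmer_counts(reads, 4)
print(weak_positions("ACGTTCGA", counts, 4))
```

    The weak k-mers cluster around the erroneous base, so their overlap localizes the position a corrector would then try to change.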

  17. A map of human genome variation from population-scale sequencing.

    PubMed

    Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A

    2010-10-28

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research. PMID:20981092
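
    The arithmetic behind a trio-based rate estimate is a ratio of observed de novo mutations to screened sites, counting both haploid genomes of the child. A sketch with illustrative numbers (not the project's counts), ignoring the false-negative and false-positive corrections a real estimate requires:

```python
def per_generation_mutation_rate(de_novo_count, callable_bp):
    """Germline point-mutation rate per base pair per generation for one trio child.

    The child is diploid, so 2 * callable_bp haploid positions were screened.
    """
    return de_novo_count / (2 * callable_bp)

# Illustrative only: 50 de novo mutations over 2.5 Gb of callable sequence.
rate = per_generation_mutation_rate(50, 2.5e9)
print(f"{rate:.1e} per bp per generation")
```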

  18. FRESCO: Referential compression of highly similar sequences.

    PubMed

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, have gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios far beyond the state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware. PMID:24524158
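
    The core idea of referential compression (store copy operations against a reference plus literal differences) can be sketched with a naive greedy matcher over a k-mer index of the reference. This is not FRESCO's algorithm or file format; k and the sequences are illustrative.

```python
def compress(target, reference, k=8):
    """Greedy referential encoding into ('M', ref_pos, length) copies and ('L', char) literals."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], i)  # first occurrence of each k-mer
    ops, i = [], 0
    while i < len(target):
        pos = index.get(target[i:i + k])
        if pos is None:
            ops.append(("L", target[i]))  # no seed match: emit one literal
            i += 1
            continue
        length = k
        # Extend the match for as long as target and reference agree.
        while (i + length < len(target) and pos + length < len(reference)
               and target[i + length] == reference[pos + length]):
            length += 1
        ops.append(("M", pos, length))
        i += length
    return ops

def decompress(ops, reference):
    """Rebuild the target by replaying copies and literals."""
    out = []
    for op in ops:
        if op[0] == "M":
            _, pos, length = op
            out.append(reference[pos:pos + length])
        else:
            out.append(op[1])
    return "".join(out)

reference = "ACGTTGCAAGGCTTACGATCGGAT"
target = reference[:12] + "A" + reference[13:]  # one substitution
ops = compress(target, reference)
print(ops)
```

    A 24-base target with one substitution reduces to two copies and one literal; the same principle is what yields very high ratios for genomes that differ from the reference at only a small fraction of positions.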

  19. Population genetic structure of Indian shad, Tenualosa ilisha inferred from variation in mitochondrial DNA sequences.

    PubMed

    Behera, B K; Singh, N S; Paria, P; Sahoo, A K; Panda, D; Meena, D K; Das, P; Pakrashi, S; Biswas, D K; Sharma, A P

    2015-09-01

    Indian shad, Tenualosa ilisha, is a commercially important anadromous fish that represents a major part of the catch in the Indo-Pacific region. The present study evaluated a partial Cytochrome b (Cyt b) gene sequence of mtDNA in T. ilisha to determine genetic variation between populations of Bay of Bengal and Arabian Sea origin. Genomic DNA extracted from T. ilisha samples from two distant rivers in the Indian subcontinent, the Bhagirathi (the lower stretch of the Ganges) and the Tapi, was analyzed. Sequencing of a 307 bp fragment of the mtDNA Cytochrome b gene revealed the presence of 5 haplotypes, with high haplotype diversity (Hd) of 0.9048 with variance 0.103 and low nucleotide diversity (π) of 0.14301. Three population-specific haplotypes were observed in the Ganga and two in the Tapi. A neighbour-joining tree based on Cytochrome b gene sequences of T. ilisha showed that the populations of Bay of Bengal and Arabian Sea origin belonged to two distinct clusters. PMID:26521565
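
    Haplotype diversity and nucleotide diversity are standard summary statistics: Hd = n/(n-1) * (1 - sum of x_i^2), where x_i are the haplotype frequencies among n sequences, and π is the mean number of pairwise differences per site. A sketch on a toy aligned set (not the study's data):

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(seqs):
    """Unbiased haplotype (gene) diversity, Hd = n/(n-1) * (1 - sum(x_i^2))."""
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(x * x for x in freqs))

def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site; sequences must be aligned."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / length

seqs = ["AAAA", "AAAA", "AAAT", "AATT"]
print(round(haplotype_diversity(seqs), 4), round(nucleotide_diversity(seqs), 4))
```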

  20. Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing.

    PubMed

    Ferreira, Pedro G; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R; Rivas, Manuel A; Esteve-Codina, Anna; Rosenstiel, Philip; Strom, Tim M; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

    2016-01-01

    Recent advances in the cost-efficiency of sequencing technologies have enabled the combined DNA- and RNA-sequencing of human individuals at the population scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project, we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest that RNA editing can moonlight as a splicing modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq data combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing (alternative splice sites, introns, and cleavage sites), which are often rare and lowly expressed but otherwise similar to their annotated counterparts. PMID:27617755
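
    One common way to quantify how an allele changes the affinity of a core RNA element is to score the reference and alternative sequences with a position weight matrix and compare the log-odds scores. The 4-nt frequency matrix below is hypothetical, and the paper's in silico models may work differently.

```python
import math

# Hypothetical position frequency matrix for a 4-nt motif (one dict per position).
PFM = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # uniform background base frequency

def pwm_score(site):
    """Log2-odds score of a site against the uniform background."""
    return sum(math.log2(PFM[i][base] / BACKGROUND) for i, base in enumerate(site))

def allele_effect(ref_site, alt_site):
    """Score difference; negative values mean the alternative allele weakens the motif."""
    return pwm_score(alt_site) - pwm_score(ref_site)

print(round(allele_effect("GTAA", "GCAA"), 3))
```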

  1. A framework for variation discovery and genotyping using next-generation DNA sequencing data.

    PubMed

    DePristo, Mark A; Banks, Eric; Poplin, Ryan; Garimella, Kiran V; Maguire, Jared R; Hartl, Christopher; Philippakis, Anthony A; del Angel, Guillermo; Rivas, Manuel A; Hanna, Matt; McKenna, Aaron; Fennell, Tim J; Kernytsky, Andrew M; Sivachenko, Andrey Y; Cibulskis, Kristian; Gabriel, Stacey B; Altshuler, David; Daly, Mark J

    2011-05-01

    Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. Here we discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets. PMID:21478889
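
    Base quality score recalibration (step iii) can be summarized as replacing each reported quality with an empirical one, -10 * log10(observed mismatch rate), measured at positions assumed not to be true variants. A sketch binned by reported quality only; GATK also conditions on covariates such as machine cycle and sequence context, and the pseudocounts here are an assumption rather than GATK's exact formula.

```python
import math
from collections import defaultdict

def recalibrate(observations):
    """observations: (reported_quality, is_mismatch) pairs at sites assumed non-variant.

    Returns {reported_quality: empirical Phred quality}.
    """
    totals = defaultdict(lambda: [0, 0])  # quality -> [mismatches, observations]
    for q, mismatch in observations:
        totals[q][0] += int(mismatch)
        totals[q][1] += 1
    # Pseudocounts (+1, +2) keep the log finite when a bin has no mismatches.
    return {q: -10 * math.log10((mm + 1) / (n + 2)) for q, (mm, n) in totals.items()}

# Illustrative bin: bases reported as Q30 but mismatching far more often than 1 in 1000.
obs = [(30, True)] * 8 + [(30, False)] * 990
print(round(recalibrate(obs)[30], 2))
```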

  2. Spatio-temporal Variations of Characteristic Repeating Earthquake Sequences along the Middle America Trench in Mexico

    NASA Astrophysics Data System (ADS)

    Dominguez, L. A.; Taira, T.; Hjorleifsdottir, V.; Santoyo, M. A.

    2015-12-01

    Repeating earthquake sequences are sets of events that are thought to rupture the same area on the plate interface and thus produce nearly identical waveforms. We systematically analyzed seismic records from 2001 through 2014 to identify repeating earthquakes with highly correlated waveforms occurring along the subduction zone of the Cocos plate. Using the correlation coefficient (cc) and spectral coherency (coh) of the vertical components as selection criteria, we found a set of 214 sequences whose waveforms reach cc ≥ 95% and coh ≥ 95%. Spatial clustering along the trench shows large variations in repeating-earthquake activity. In particular, the rupture zone of the 1985 M8.1 earthquake shows an almost complete absence of characteristic repeating earthquakes, whereas the Guerrero Gap zone and the segment of the trench close to the Guerrero-Oaxaca border show a significantly larger number of repeating earthquake sequences. Furthermore, temporal variations associated with stress changes due to major earthquakes show episodes of unlocking and healing of the interface. Understanding the different components that control the location and recurrence time of characteristic repeating sequences is key to pinpointing areas where large megathrust earthquakes may nucleate and, consequently, to improving seismic hazard assessment.
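    The waveform-similarity criterion can be sketched as a normalized cross-correlation maximized over small time shifts. The sketch assumes two traces sampled at the same rate and roughly pre-aligned, and reuses the 0.95 threshold from the abstract; the spectral-coherency criterion is not implemented, and the function names and the default max_lag are illustrative assumptions.

```python
import math


def normalized_cc(x, y):
    """Zero-lag normalized correlation of two equal-length waveforms."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def max_lagged_cc(x, y, max_lag):
    """Best normalized correlation over integer shifts in [-max_lag, max_lag]."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:len(y) - lag]   # compare x[i + lag] with y[i]
        else:
            a, b = x[:len(x) + lag], y[-lag:]  # compare x[i] with y[i - lag]
        m = min(len(a), len(b))
        if m > 1:
            best = max(best, normalized_cc(a[:m], b[:m]))
    return best


def is_repeater(x, y, max_lag=5, threshold=0.95):
    """Apply the cc threshold to the best-aligned pair of traces."""
    return max_lagged_cc(x, y, max_lag) >= threshold


x = [math.sin(0.3 * i) for i in range(50)]
y = [0.0] * 3 + x[:47]  # the same waveform arriving 3 samples later
print(is_repeater(x, y))  # True
```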

  3. Sources of variation in ancestral sequence reconstruction for HIV-1 envelope genes

    PubMed Central

    Ross, Howard A.; Nickle, David C.; Liu, Yi; Heath, Laura; Jensen, Mark A.; Rodrigo, Allen G.; Mullins, James I.

    2007-01-01

    We characterized the variation in the reconstructed ancestor of 118 HIV-1 envelope gene sequences arising from the methods used for (a) estimating and (b) rooting the phylogenetic tree and (c) reconstructing the ancestor on that tree, as well as from (d) the sequence format and (e) the number of input sequences. The method of rooting the tree was responsible for most of the sequence variation, both among the reconstructed ancestral sequences and between the ancestral and observed sequences. Variation in predicted 3-D structural properties of the ancestors mirrored their sequence variation. The observed sequence consensus and the ancestral sequences from center-rooted trees were the most similar in all predicted attributes. Only for the predicted number of N-glycosylation sites was there a difference between the maximum parsimony (MP) and maximum likelihood (ML) methods of reconstruction. Taxon sampling effects were observed only for outgroup-rooted trees, not center-rooted ones, reflecting the occurrence of several divergent basal sequences. Thus, for sequences exhibiting a radial phylogenetic tree, as HIV-1 does, most of the variation in the estimated ancestor arises from the method of rooting the phylogenetic tree. Those investigating the ancestors of genes exhibiting such a radial tree should pay particular attention to alternative rooting methods in order to obtain a representative sample of ancestors. PMID:19455202
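    The sensitivity of the reconstructed ancestor to rooting is visible even in the simplest maximum-parsimony setting. The sketch below runs Fitch's bottom-up pass for a single site on two rootings of the same four-taxon tree; it is an illustration, not the pipeline used in the study, and the nested-tuple tree encoding is an assumption.

```python
def fitch(tree):
    """Fitch small-parsimony pass for one site.

    tree: a leaf state string (e.g. "A") or a 2-tuple (left, right) of subtrees.
    Returns (candidate_root_states, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    ls, lc = fitch(left)
    rs, rc = fitch(right)
    common = ls & rs
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, lc + rc
    # Disjoint state sets: take the union and count one change.
    return ls | rs, lc + rc + 1


# Same unrooted tree, states A, A, A, C at the tips, rooted in two places.
center = (("A", "A"), ("A", "C"))
outgroup = ("C", ("A", ("A", "A")))
print(sorted(fitch(center)[0]), fitch(center)[1])      # ['A'] 1
print(sorted(fitch(outgroup)[0]), fitch(outgroup)[1])  # ['A', 'C'] 1
```

    With the center-like root the ancestral state is unambiguously A, while rooting on the divergent C lineage leaves A and C equally parsimonious, even though the total number of changes (one) is identical.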

  4. BreaKmer: detection of structural variation in targeted massively parallel sequencing data using kmers

    PubMed Central

    Abo, Ryan P.; Ducar, Matthew; Garcia, Elizabeth P.; Thorner, Aaron R.; Rojas-Rudilla, Vanesa; Lin, Ling; Sholl, Lynette M.; Hahn, William C.; Meyerson, Matthew; Lindeman, Neal I.; Van Hummelen, Paul; MacConaill, Laura E.

    2015-01-01

    Genomic structural variation (SV), a common hallmark of cancer, has important predictive and therapeutic implications. However, accurately detecting SV using high-throughput sequencing data remains challenging, especially for ‘targeted’ resequencing efforts. This is critically important in the clinical setting, where targeted resequencing is frequently being applied to rapidly assess clinically actionable mutations in tumor biopsies in a cost-effective manner. We present BreaKmer, a novel approach that uses a ‘kmer’ strategy to assemble misaligned sequence reads for predicting insertions, deletions, inversions, tandem duplications and translocations at base-pair resolution in targeted resequencing data. Variants are predicted by realigning an assembled consensus sequence created from sequence reads that were abnormally aligned to the reference genome. Using targeted resequencing data from tumor specimens with orthogonally validated SV, non-tumor samples and whole-genome sequencing data, BreaKmer had a 97.4% overall sensitivity for known events and predicted 17 positively validated, novel variants. Relative to four publicly available algorithms, BreaKmer detected SV with increased sensitivity and limited calls in non-tumor samples, key features for variant analysis of tumor specimens in both the clinical and research settings. PMID:25428359
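    The core of a k-mer strategy, flagging read k-mers that never occur in the reference, can be sketched as follows. This is a hypothetical simplification: BreaKmer additionally assembles such reads into a consensus sequence and realigns it, which is not shown, and the function names are illustrative.

```python
def kmers(seq: str, k: int) -> set:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def novel_kmers(reads, reference: str, k: int) -> dict:
    """Count, per read, k-mers absent from the reference (candidate variant signal)."""
    ref_kmers = kmers(reference, k)
    counts = {}
    for read in reads:
        for km in kmers(read, k):
            if km not in ref_kmers:
                counts[km] = counts.get(km, 0) + 1
    return counts


reference = "AAAACCCCGGGGTTTT"
reads = ["AAAAGGGGTTTT", "AAAAGGGG"]  # both span a deletion of CCCC
print(sorted(novel_kmers(reads, reference, 4).items()))
# [('AAAG', 2), ('AAGG', 2), ('AGGG', 2)]
```

    For reads spanning a deletion junction, only the k-mers that cross the junction are novel, so they localize the breakpoint.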

  5. The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity.

    PubMed

    Petrovski, Slavé; Gussow, Ayal B; Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H; Allen, Andrew S; Goldstein, David B

    2015-09-01

    Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene's proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene's regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen's Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, nc
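    The ncRVIS idea of comparing observed with predicted standing variation can be sketched as a regression of common-variant counts on total-variant counts across genes, scoring each gene by its residual. This follows the general RVIS recipe under simplifying assumptions (ordinary least squares, residuals scaled by the residual standard error rather than fully studentized); the inputs and names are hypothetical.

```python
import math


def residual_scores(total, common):
    """Standardized OLS residuals of `common` regressed on `total`.

    total[i]: all variant sites observed in gene i's regulatory region.
    common[i]: common variant sites in the same region.
    Negative scores mean less common variation than predicted (more intolerant).
    """
    n = len(total)
    mx, my = sum(total) / n, sum(common) / n
    sxx = sum((x - mx) ** 2 for x in total)
    sxy = sum((x - mx) * (y - my) for x, y in zip(total, common))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(total, common)]
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return [r / s for r in resid]


scores = residual_scores([10, 20, 30, 40, 50], [5, 10, 3, 20, 25])
print(scores.index(min(scores)))  # 2
```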

  6. The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity

    PubMed Central

    Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H.; Allen, Andrew S.; Goldstein, David B.

    2015-01-01

    Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance

  7. GENETIC VARIATION IN CLONAL VERTEBRATES DETECTED BY SIMPLE SEQUENCE FINGERPRINTING

    EPA Science Inventory

    Measurement of clonal heterogeneity is central to understanding the evolutionary and population genetics of the roughly 50 species of vertebrates that lack effective genetic recombination. Simple-sequence DNA fingerprinting with the oligonucleotide probes (CAG)5 and (GACA)4 was used to detect hete...

  8. Storage and retrieval of highly repetitive sequence collections.

    PubMed

    Mäkinen, Veli; Navarro, Gonzalo; Sirén, Jouni; Välimäki, Niko

    2010-03-01

    A repetitive sequence collection is a set of sequences that are small variations of one another. A prominent example is the set of genome sequences of individuals of the same or closely related species, where the differences can be expressed by short lists of basic edit operations. Flexible and efficient data analysis on such a typically huge collection is feasible using suffix trees. However, the suffix tree occupies a lot of space, which soon rules out in-memory analyses. Recent advances in full-text indexing reduce the space of the suffix tree to, essentially, that of the compressed sequences, while retaining its functionality with only a polylogarithmic slowdown. However, the underlying compression model considers only the predictability of the next sequence symbol given the k previous ones, where k is a small integer. This model is unable to capture longer-term repetitiveness: for example, r identical copies of an incompressible sequence remain incompressible under it. We develop new static and dynamic full-text indexes that are capable of capturing the fact that a collection is highly repetitive, and that require space essentially proportional to the length of one typical sequence plus the total number of edit operations. The new indexes can be plugged into a recent dynamic fully-compressed suffix tree, achieving full functionality for sequence analysis while retaining the reduced space and the polylogarithmic slowdown. Our experimental results confirm the practicality of our proposal. PMID:20377446
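    The storage model described here, one reference sequence plus a short list of edit operations per individual, can be sketched with a small reconstruction function. It only illustrates why space scales with one sequence plus the number of edits; it is not the compressed full-text index proposed in the paper, and the edit encoding is an assumption.

```python
def apply_edits(reference: str, edits) -> str:
    """Rebuild a sequence from a reference and a list of edit operations.

    edits: (position, op, payload) tuples in reference coordinates, sorted and
    non-overlapping. op is "sub" (payload = replacement bases), "ins"
    (payload = inserted bases, placed before position) or "del"
    (payload = number of deleted bases).
    """
    out, cursor = [], 0
    for pos, op, payload in edits:
        out.append(reference[cursor:pos])  # copy unchanged stretch
        if op == "sub":
            out.append(payload)
            cursor = pos + len(payload)
        elif op == "ins":
            out.append(payload)
            cursor = pos
        elif op == "del":
            cursor = pos + payload
        else:
            raise ValueError(f"unknown edit op: {op}")
    out.append(reference[cursor:])
    return "".join(out)


reference = "ACGTACGT"
individual = [(1, "sub", "T"), (4, "ins", "GG"), (6, "del", 1)]
print(apply_edits(reference, individual))  # ATGTGGACT
```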

  9. From sequence to function: Insights from natural variation in budding yeasts

    PubMed Central

    Nieduszynski, Conrad A.; Liti, Gianni

    2011-01-01

    Background: Natural variation offers a powerful approach for assigning function to DNA sequence, a pressing challenge in the age of high-throughput sequencing technologies. Scope of Review: Here we review comparative genomic approaches that are bridging the sequence–function and genotype–phenotype gaps. Reverse genomic approaches aim to analyse sequence to assign function, whereas forward genomic approaches start from a phenotype and aim to identify the underlying genotype responsible. Major Conclusions: Comparative genomic approaches, pioneered in budding yeasts, have resulted in dramatic improvements in our understanding of the function of both genes and regulatory sequences. Analogous studies in other systems, including humans, demonstrate the ubiquity of comparative genomic approaches. Recently, forward genomic approaches, exploiting natural variation within yeast populations, have started to offer powerful insights into how genotype influences phenotype, and even the ability to predict phenotypes. General Significance: Comparative genomic experiments are defining the fundamental rules that govern complex traits in natural populations from yeast to humans. This article is part of a Special Issue entitled Systems Biology of Microorganisms. PMID:21320572

  10. CNV-TV: A robust method to discover copy number variation from short sequencing reads

    PubMed Central

    2013-01-01

    Background: Copy number variation (CNV) is an important form of structural variation (SV) in the human genome. Various studies have shown that CNVs are associated with complex diseases. Traditional CNV detection methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution. Next-generation sequencing (NGS) promises higher-resolution detection of CNVs, and several methods were recently proposed to realize this promise. However, the performance of these methods is not robust under some conditions; for example, some of them may fail to detect short CNVs. There has been a strong demand for reliable detection of CNVs from high-resolution NGS data. Results: A novel and robust method to detect CNVs from short sequencing reads is proposed in this study. CNV detection is modeled as change-point detection in the read depth (RD) signal derived from NGS data, which is fitted with a total variation (TV) penalized least squares model. The performance (e.g., sensitivity and specificity) of the proposed approach is evaluated by comparison with several recently published methods on both simulated data and real data from the 1000 Genomes Project. Conclusion: The experimental results showed that, compared with several existing methods, both the true positive rate and the false positive rate of the proposed method do not change significantly for CNVs with different copy numbers and lengths. Therefore, our proposed approach results in more reliable detection of CNVs than the existing methods. PMID:23634703
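    The TV-penalized least-squares fit can be sketched for a short read-depth vector. The sketch minimizes 0.5 * sum((y_i - x_i)^2) + lambda * sum(|x_{i+1} - x_i|) by projected gradient on the dual problem, a standard solver chosen here for brevity rather than the one used by CNV-TV; lambda, the iteration count and the min_jump threshold are illustrative assumptions, and the input needs at least two values.

```python
def tv_denoise(y, lam, n_iter=2000):
    """1-D total-variation penalized least squares via projected dual gradient.

    Minimizes 0.5 * sum((y - x)^2) + lam * sum(|x[i+1] - x[i]|).
    """
    n = len(y)
    z = [0.0] * (n - 1)  # one dual variable per adjacent pair
    tau = 0.25           # step size; ||D D^T|| < 4 for the difference operator D

    def primal(z):
        # x = y - D^T z, where (D^T z)[j] = z[j-1] - z[j] with zero padding.
        return [y[j] - ((z[j - 1] if j > 0 else 0.0) - (z[j] if j < n - 1 else 0.0))
                for j in range(n)]

    for _ in range(n_iter):
        x = primal(z)
        # Gradient step on the dual, then clip to the box [-lam, lam].
        z = [min(lam, max(-lam, z[i] + tau * (x[i + 1] - x[i]))) for i in range(n - 1)]
    return primal(z)


def change_points(x, min_jump):
    """Indices i where the fitted signal jumps between x[i-1] and x[i]."""
    return [i for i in range(1, len(x)) if abs(x[i] - x[i - 1]) > min_jump]


depth = [1, 1, 1, 1, 5, 5, 5, 5]
fit = tv_denoise(depth, lam=1.0)
print([round(v, 3) for v in fit])  # [1.25, 1.25, 1.25, 1.25, 4.75, 4.75, 4.75, 4.75]
print(change_points(fit, 0.5))     # [4]
```

    On this step from depth 1 to depth 5 with lambda = 1, the two plateaus are shrunk toward each other and a single change point is reported at index 4; larger lambda values merge smaller jumps.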

  11. Cytochrome b nucleotide sequence variation among the Atlantic Alcidae.

    PubMed

    Friesen, V L; Montevecchi, W A; Davidson, W S

    1993-01-01

    Analysis of cytochrome b nucleotide sequences of the six extant species of Atlantic alcids and a gull revealed an excess of adenines and cytosines and a deficit of guanines at silent sites on the coding strand. Phylogenetic analyses grouped the sequences of the common (Uria aalge) and Brünnich's (U. lomvia) guillemots, followed by the razorbill (Alca torda) and little auk (Alle alle). The black guillemot (Cepphus grylle) sequence formed a sister taxon, and the puffin (Fratercula arctica) fell outside the other alcids. Phylogenetic comparisons of substitutions indicated that mutabilities of bases did not differ, but that C was much more likely to be incorporated than was G. Imbalances in base composition appear to result from a strand bias in replication errors, which may result from selection on secondary RNA structure and/or the energetics of codon-anticodon interactions. PMID:7916741
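    Base composition at silent sites is often approximated by composition at third codon positions, since most (though not all) third-position substitutions are synonymous. The sketch below uses that approximation, which is an assumption here rather than the study's exact site classification; it expects an in-frame coding sequence with at least one complete codon, and the function name is hypothetical.

```python
from collections import Counter


def third_position_composition(cds: str) -> dict:
    """Fraction of A, C, G, T at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(b for b in thirds if b in "ACGT")  # ignore ambiguous bases
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


print(third_position_composition("ATAACCGGCTTG"))
# {'A': 0.25, 'C': 0.5, 'G': 0.25, 'T': 0.0}
```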

  12. Analysis of simian immunodeficiency virus sequence variation in tissues of rhesus macaques with simian AIDS.

    PubMed Central

    Kodama, T; Mori, K; Kawahara, T; Ringler, D J; Desrosiers, R C

    1993-01-01

    One rhesus macaque displayed severe encephalomyelitis and another displayed severe enterocolitis following infection with molecularly cloned simian immunodeficiency virus (SIV) strain SIVmac239. Little or no free anti-SIV antibody developed in these two macaques, and they died relatively quickly (4 to 6 months) after infection. Manifestation of the tissue-specific disease in these macaques was associated with the emergence of variants with high replicative capacity for macrophages and primary infection of tissue macrophages. The nature of sequence variation in the central region (vif, vpr, and vpx), the env gene, and the nef long terminal repeat (LTR) region in brain, colon, and other tissues was examined to see whether specific genetic changes were associated with SIV replication in brain or gut. Sequence analysis revealed strong conservation of the intergenic central region, nef, and the LTR. However, analysis of env sequences in these two macaques and one other revealed significant, interesting patterns of sequence variation. (i) Changes in env that were found previously to contribute to the replicative ability of SIVmac for macrophages in culture were present in the tissues of these animals. (ii) The greatest variability was located in the regions between V1 and V2 and from "V3" through C3 in gp120, which are different in location from the variable regions observed previously in animals with strong antibody responses and long-term persistent infection. (iii) The predominant sequence change of D→N at position 385 in C3 is most surprising, since this change in both SIV and human immunodeficiency virus type 1 has been associated with dramatically diminished affinity for CD4 and replication in vitro. (iv) The nature of sequence changes at some positions (146, 178, 345, 385, and "V3") suggests that viral replication in brain and gut may be facilitated by specific sequence changes in env in addition to those that impart a general ability to replicate well in

  13. Variation in Symbiodinium ITS2 sequence assemblages among coral colonies.

    PubMed

    Stat, Michael; Bird, Christopher E; Pochon, Xavier; Chasqui, Luis; Chauka, Leonard J; Concepcion, Gregory T; Logan, Dan; Takabayashi, Misaki; Toonen, Robert J; Gates, Ruth D

    2011-01-01

    Endosymbiotic dinoflagellates in the genus Symbiodinium are fundamentally important to the biology of scleractinian corals, as well as to a variety of other marine organisms. The genus Symbiodinium is genetically and functionally diverse, and the taxonomic nature of the union between Symbiodinium and corals is implicated as a key trait determining the environmental tolerance of the symbiosis. Surprisingly, the question of how Symbiodinium diversity partitions within a species across spatial scales of meters to kilometers has received little attention, but is important to understanding the intrinsic biological scope of a given coral population and adaptations to the local environment. Here we address this gap by describing the Symbiodinium ITS2 sequence assemblages recovered from colonies of the reef-building coral Montipora capitata sampled across Kāne'ohe Bay, Hawai'i. A total of 52 corals were sampled in a nested design of Coral Colony(Site(Region)) reflecting spatial scales of meters to kilometers. A diversity of Symbiodinium ITS2 sequences was recovered, with the majority of variance partitioning at the level of the Coral Colony. To confirm this result, the Symbiodinium ITS2 sequence diversity in six M. capitata colonies was analyzed in much greater depth, with 35 to 55 clones per colony. The ITS2 sequences and quantitative composition recovered from these colonies varied significantly, indicating that each coral hosted a different assemblage of Symbiodinium. The diversity of Symbiodinium ITS2 sequence assemblages retrieved from individual colonies of M. capitata here highlights the problems inherent in interpreting multi-copy and intra-genomically variable molecular markers, and serves as a context for discussing the utility and biological relevance of assigning species names based on Symbiodinium ITS2 genotyping. PMID:21246044
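    The statement that most variance partitions at the colony level can be illustrated, for a single numeric response (for example, the relative abundance of one ITS2 sequence type), by splitting the total sum of squares into among-colony and within-colony parts. This one-level, univariate sketch is an assumption made for illustration: the study's nested Coral Colony(Site(Region)) design adds further levels, and estimating variance components would additionally require expected mean squares.

```python
def partition_ss(groups):
    """Split the total sum of squares into among-group and within-group parts.

    groups: dict mapping a group label (e.g. a colony) to a list of numeric values.
    """
    values = [v for vals in groups.values() for v in vals]
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_among = 0.0
    ss_within = 0.0
    for vals in groups.values():
        m = sum(vals) / len(vals)
        ss_among += len(vals) * (m - grand) ** 2      # spread of group means
        ss_within += sum((v - m) ** 2 for v in vals)  # spread inside each group
    return {"among": ss_among, "within": ss_within, "total": ss_total,
            "among_fraction": ss_among / ss_total}


r = partition_ss({"colony_a": [1, 2, 3], "colony_b": [7, 8, 9]})
print(round(r["among_fraction"], 3))  # 0.931
```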

  14. Variation in Symbiodinium ITS2 Sequence Assemblages among Coral Colonies

    PubMed Central

    Stat, Michael; Bird, Christopher E.; Pochon, Xavier; Chasqui, Luis; Chauka, Leonard J.; Concepcion, Gregory T.; Logan, Dan; Takabayashi, Misaki; Toonen, Robert J.; Gates, Ruth D.

    2011-01-01

    Endosymbiotic dinoflagellates in the genus Symbiodinium are fundamentally important to the biology of scleractinian corals, as well as to a variety of other marine organisms. The genus Symbiodinium is genetically and functionally diverse, and the taxonomic nature of the union between Symbiodinium and corals is implicated as a key trait determining the environmental tolerance of the symbiosis. Surprisingly, the question of how Symbiodinium diversity partitions within a species across spatial scales of meters to kilometers has received little attention, but is important to understanding the intrinsic biological scope of a given coral population and adaptations to the local environment. Here we address this gap by describing the Symbiodinium ITS2 sequence assemblages recovered from colonies of the reef-building coral Montipora capitata sampled across Kāne'ohe Bay, Hawai'i. A total of 52 corals were sampled in a nested design of Coral Colony(Site(Region)) reflecting spatial scales of meters to kilometers. A diversity of Symbiodinium ITS2 sequences was recovered, with the majority of variance partitioning at the level of the Coral Colony. To confirm this result, the Symbiodinium ITS2 sequence diversity in six M. capitata colonies was analyzed in much greater depth, with 35 to 55 clones per colony. The ITS2 sequences and quantitative composition recovered from these colonies varied significantly, indicating that each coral hosted a different assemblage of Symbiodinium. The diversity of Symbiodinium ITS2 sequence assemblages retrieved from individual colonies of M. capitata here highlights the problems inherent in interpreting multi-copy and intra-genomically variable molecular markers, and serves as a context for discussing the utility and biological relevance of assigning species names based on Symbiodinium ITS2 genotyping. PMID:21246044

  15. Tandem gene arrays in Trypanosoma brucei: Comparative phylogenomic analysis of duplicate sequence variation

    PubMed Central

    Jackson, Andrew P

    2007-01-01

    Background: The genome sequence of the protistan parasite Trypanosoma brucei contains many tandem gene arrays. Gene duplicates are created through tandem duplication and are expressed through polycistronic transcription, suggesting that the primary purpose of long, tandem arrays is to increase gene dosage in an environment where individual gene promoters are absent. This report presents the first account of the tandem gene arrays in the T. brucei genome, employing several related genome sequences to establish how variation is created and removed. Results: A systematic survey of tandem gene arrays showed that substantial sequence variation existed across the genome; variation from different regions of an array often produced inconsistent phylogenetic affinities. Phylogenetic relationships of gene duplicates were consistent with concerted evolution being a widespread homogenising force. However, tandem duplicates were not usually identical; therefore, any homogenising effect was coincident with divergence among duplicates. Allelic gene conversion was detected using various criteria and was apparently able to both remove and introduce sequence variation. Tandem arrays containing structural heterogeneity demonstrated how sequence homogenisation and differentiation can occur within a single locus. Conclusion: The use of multiple genome sequences in a comparative analysis of tandem gene arrays identified substantial sequence variation among gene duplicates. The distribution of sequence variation is determined by a dynamic balance of conservative and innovative evolutionary forces. Gene trees from various species showed that intraspecific duplicates evolve in concert, perhaps through frequent gene conversion, although this does not prevent sequence divergence, especially where structural heterogeneity physically separates a duplicate from its neighbours. In describing dynamics of sequence variation that have consequences beyond gene dosage, this survey provides a basis for

  16. Sequence variations in the introns of the triosephosphate isomerase genes of Oesophagostomum dentatum and O. quadrispinulatum.

    PubMed

    Joachim, A; von Samson-Himmelstjerna, G

    2001-09-01

    Degenerate primers were used to amplify DNA fragments of the triosephosphate isomerase (TPI) gene from complementary DNA (cDNA) and from genomic DNA of two species of porcine gastrointestinal nematodes, Oesophagostomum dentatum and O. quadrispinulatum. Polymerase chain reaction (PCR) fragments amplified from cDNA were 520 bp in size for both species, while genomic fragments were 1,035 bp for O. dentatum (GC content: 45%) and 1,331 bp for O. quadrispinulatum (44%). Sequence analyses revealed blocks of high homology in the exons, interrupted by more variable intron regions. Five exons were predicted in the conserved regions of the genomic sequences, corresponding to the respective cDNA sequences with 6% interspecific differences. The predicted protein sequences (161 amino acids) were 98% similar between the species and showed 71% similarity to the putative protein of Caenorhabditis elegans. As a housekeeping gene, TPI could be amplified from cDNA of both infectious third-stage larvae and adults. Interspecific variations in the non-coding regions allow PCR-based differentiation of the two Oesophagostomum spp. PMID:11570563

  17. Highly multiplexed DNA sequencing by capillary electrophoresis

    SciTech Connect

    Yeung, E.S.; Ueno, K.; Chang, H.T.

    1994-12-31

    Irrespective of which basic technology is eventually selected to sequence the entire human genome, there are substantial gains to be made if a high degree of multiplexing of parallel runs can be implemented. Such multiplexing should not involve expensive instrumentation and should not require additional personnel; otherwise the main objective, cost reduction, will not be met even though the total time for sequencing is reduced. In the last two years, several research groups have shown that capillary electrophoresis (CE) is an attractive alternative for DNA sequencing. Part of the speed advantage of CE is offset by the inherent ability of slab gels to accommodate multiple lanes in a single run. Recently, the authors have developed several excitation schemes for highly multiplexed capillary electrophoresis. Detection at the pM level was demonstrated. The authors report here the use of a novel excitation geometry to simultaneously monitor 100 capillary tubes during electrophoresis. This represents a truly parallel multiplexing scheme for high-speed DNA sequencing.

  18. HIV-1 sequence variation between isolates from mother-infant transmission pairs

    SciTech Connect

    Wike, C.M.; Daniels, M.R.; Furtado, M.; Wolinsky, M.; Korber, B.; Hutto, C.; Munoz, J.; Parks, W.; Saah, A.

    1991-12-31

    To examine the sequence diversity of human immunodeficiency virus type 1 (HIV-1) between known transmission sets, sequences from the V3 and V4-V5 regions of the env gene from 4 mother-infant pairs were analyzed. The mean interpatient sequence variation between isolates from linked mother-infant pairs was comparable to the sequence diversity found between isolates from other close contacts. The mean intrapatient variation was significantly lower in the infants' isolates than in the isolates from their mothers and from other characterized intrapatient sequence sets. In addition, a distinct and characteristic difference in the glycosylation pattern preceding the V3 loop was found between each linked transmission pair. These findings indicate that the selection of specific genotypic variants, which may play a role in some direct transmission sets, and the duration of infection are important factors in the degree of diversity seen between the sequence sets.
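    The intra- versus interpatient comparison can be sketched with mean pairwise p-distances (the proportion of differing aligned sites). This uncorrected measure is an assumption chosen for illustration, since the abstract does not specify the distance used; gaps are not handled, and the sequences in the usage example are hypothetical.

```python
from itertools import combinations, product


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(seqs) -> float:
    """Mean pairwise p-distance among sequences from one patient (intrapatient)."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(seqs1, seqs2) -> float:
    """Mean p-distance over all cross pairs from two patients (interpatient)."""
    pairs = list(product(seqs1, seqs2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


mother = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAACT"]
infant = ["AAAAAAAAAA", "AAAAAAAAAA"]
print(round(mean_within(mother), 3), round(mean_within(infant), 3))  # 0.133 0.0
print(round(mean_between(mother, infant), 3))                        # 0.1
```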

  19. HIV-1 sequence variation between isolates from mother-infant transmission pairs

    SciTech Connect

    Wike, C.M.; Daniels, M.R.; Furtado, M.; Wolinsky, M.; Korber, B.; Hutto, C.; Munoz, J.; Parks, W.; Saah, A.

    1991-01-01

    To examine the sequence diversity of human immunodeficiency virus type 1 (HIV-1) between known transmission sets, sequences from the V3 and V4-V5 regions of the env gene from 4 mother-infant pairs were analyzed. The mean interpatient sequence variation between isolates from linked mother-infant pairs was comparable to the sequence diversity found between isolates from other close contacts. The mean intrapatient variation was significantly lower in the infants' isolates than in the isolates from their mothers and from other characterized intrapatient sequence sets. In addition, a distinct and characteristic difference in the glycosylation pattern preceding the V3 loop was found between each linked transmission pair. These findings indicate that the selection of specific genotypic variants, which may play a role in some direct transmission sets, and the duration of infection are important factors in the degree of diversity seen between the sequence sets.

  20. Mitochondrial DNA hypervariable region-1 sequence variation and phylogeny of the concolor gibbons, Nomascus.

    PubMed

    Monda, Keri; Simmons, Rachel E; Kressirer, Philipp; Su, Bing; Woodruff, David S

    2007-11-01

    The still little known concolor gibbons are represented by 14 taxa (five species, nine subspecies) distributed parapatrically in China, Myanmar, Vietnam, Laos and Cambodia. To set the stage for a phylogeographic study of the genus we examined DNA sequences from the highly variable mitochondrial hypervariable region-1 (HVR-1 or control region) in 51 animals, mostly of unknown geographic provenance. We developed gibbon-specific primers to amplify mtDNA noninvasively and obtained >477 bp sequences from 38 gibbons in North American and European zoos and >159 bp sequences from ten Chinese museum skins. In hindsight, we believe these animals represent eight of the nine nominal subspecies and four of the five nominal species. Bayesian, maximum likelihood and maximum parsimony haplotype network analyses gave concordant results and show Nomascus to be monophyletic. Significant intraspecific variation within N. leucogenys (17 haplotypes) is comparable with that reported earlier in Hylobates lar and less than half the known interspecific pairwise distances in gibbons. Sequence data support the recognition of five species (concolor, leucogenys, nasutus, gabriellae and probably hainanus) and suggest that nasutus is the oldest and leucogenys, the youngest taxon. In contrast, the subspecies N. c. furvogaster, N. c. jingdongensis, and N. leucogenys siki, are not recognizable at this otherwise informative genetic locus. These results show that HVR-1 sequence is variable enough to define evolutionarily significant units in Nomascus and, if coupled with multilocus microsatellite or SNP genotyping, more than adequate to characterize their phylogeographic history. There is an urgent need to obtain DNA from gibbons of known geographic provenance before they are extirpated to facilitate the conservation genetic management of the surviving animals. PMID:17455231

  1. Sequence variation at the major histocompatibility complex locus DQ beta in beluga whales (Delphinapterus leucas)

    PubMed

    Murray, B W; Malik, S; White, B N

    1995-07-01

    Genetic variation at the Major Histocompatibility Complex locus DQ beta was analyzed in 233 beluga whales (Delphinapterus leucas) from seven populations (St. Lawrence Estuary, eastern Beaufort Sea, eastern Chukchi Sea, western Hudson Bay, eastern Hudson Bay, southeastern Baffin Island, and High Arctic), and in 12 narwhals (Monodon monoceros) sympatric with the High Arctic beluga population. Variation was assessed by amplification of the exon coding for the peptide binding region via the polymerase chain reaction, followed by either cloning and DNA sequencing or single-stranded conformation polymorphism analysis. Five alleles were found across the beluga populations and one in the narwhal. Pairwise comparisons of these alleles showed a 5:1 ratio of nonsynonymous to synonymous substitutions per site, leading to eight amino acid differences, five of which were nonconservative substitutions, centered around positions previously shown to be important for peptide binding. Although the amount of allelic variation is low when compared with terrestrial mammals, the nature of the substitutions in the peptide binding sites indicates an important role for the DQ beta locus in the cellular immune response of beluga whales. Comparisons of allele frequencies among populations show the High Arctic population to be different (P ≤ 0.005) from the other beluga populations surveyed. In these other populations an allele, Dele-DQ beta*0101-2, was found in 98% of the animals, while in the High Arctic it was found in only 52% of the animals. Two other alleles were found at high frequencies in the High Arctic population, one being very similar to the single allele found in narwhal. PMID:7659014
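    The ratio reported here is per site, which requires counting synonymous and nonsynonymous sites (for example, with the Nei-Gojobori method). As a smaller first step, the sketch below classifies codons that differ at exactly one position as synonymous or nonsynonymous using the standard genetic code; it yields raw counts, not per-site rates, and the function name is hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_codon_differences(seq1: str, seq2: str) -> dict:
    """Count single-difference codons as synonymous or nonsynonymous.

    Codons differing at more than one position are tallied as "multi", because
    assigning them requires averaging over mutational pathways.
    """
    counts = {"syn": 0, "nonsyn": 0, "multi": 0}
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        ndiff = sum(x != y for x, y in zip(c1, c2))
        if ndiff == 0:
            continue
        if ndiff > 1:
            counts["multi"] += 1
        elif CODON_TABLE[c1] == CODON_TABLE[c2]:
            counts["syn"] += 1
        else:
            counts["nonsyn"] += 1
    return counts


# CTT->CTC (Leu->Leu), GAT->GAA (Asp->Glu), AAA unchanged, GGG->GCC (two changes)
print(classify_codon_differences("CTTGATAAAGGG", "CTCGAAAAAGCC"))
# {'syn': 1, 'nonsyn': 1, 'multi': 1}
```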

  2. Sequence Variation in the Small-Subunit rRNA Gene of Plasmodium malariae and Prevalence of Isolates with the Variant Sequence in Sichuan, China

    PubMed Central

    Liu, Qing; Zhu, Shenghua; Mizuno, Sahoko; Kimura, Masatsugu; Liu, Peina; Isomura, Shin; Wang, Xingzhen; Kawamoto, Fumihiko

    1998-01-01

    Using two PCR-based diagnostic methods, Plasmodium malariae infections have been rediscovered at two foci in the Sichuan province of China, a region where no cases of P. malariae have been officially reported for the last two decades. In addition, a variant form of P. malariae with a 19-bp deletion and seven base-pair substitutions in the target sequence of the small-subunit (SSU) rRNA gene was detected at high frequency. Alignment analysis of Plasmodium sp. SSU rRNA gene sequences revealed that the 5′ region of the variant sequence is identical to that of P. vivax or P. knowlesi and its 3′ region is identical to that of P. malariae. The same sequence variations were also found in P. malariae isolates collected along the Thai-Myanmar border, suggesting a wide distribution of this variant form from southern China to Southeast Asia. PMID:9774600

  3. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    NASA Astrophysics Data System (ADS)

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-06-01

    Mass spectrometry–based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.

  4. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation.

    PubMed

    Sheynkman, Gloria M; Shortreed, Michael R; Cesnik, Anthony J; Smith, Lloyd M

    2016-06-12

    Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications. PMID:27049631

  5. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    PubMed Central

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-01-01

    Mass spectrometry–based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications. PMID:27049631

  6. [Genuineness of Morinda officinalis How germplasm inferred from ITS sequence variation of nuclear ribosomal DNA].

    PubMed

    Ding, Ping; Liu, Jin; Qiu, Jin-Ying; Lai, Xiao-Ping

    2012-04-01

    PCR amplification and sequencing of the ITS region were used to assess the genetic diversity of different populations of Morinda officinalis How. The ITS sequence of Morinda officinalis was 567 bp long, with a G+C content of 64.5%. In this study, 17 haplotypes were obtained, which showed a high level of branching, and the haplotypes of the Guangdong population appeared to be the origin of expansion. Analysis of molecular variance (AMOVA) also showed that the percentage of variation among populations (56.65%) was greater than that within populations (43.35%). The F(ST) value was 0.5665, and the genetic divergence among populations was significant. Mantel test results also indicated that the level of gene flow was positively correlated with geographic distance (R² = 0.7211). The results showed a good correlation between genotype and geographic distribution of Morinda officinalis, and ITS sequencing could be a useful molecular method for studying the genuineness and phylogeography of Morinda officinalis. PMID:22799040

  7. Genome organization and variation in the 3'-partial sequence of garlic latent virus in China.

    PubMed

    Chen, Jiong; Zheng, Hongying; Chen, Jianping; Yang, Chongliang

    2002-08-01

    Ten different isolates of a carlavirus were detected by degenerate PCR in 12 garlic samples collected from 6 provinces in China, and the complete genome sequence of the Zhejiang isolate ZJ1 and the 3'-terminal sequences of 9 other isolates were determined. The RNA genome of isolate ZJ1 consisted of 8,363 nt excluding the 3' poly(A) tail, and its genome organization was similar to that of other carlaviruses, with 6 open reading frames encoding a replicase, TGB1, TGB2, TGB3, CP and NABP, respectively. Sequence comparisons showed that all 10 isolates were Garlic latent virus (GarLV). Variation in TGB2, TGB3 and NABP was greater than in CP. High homology was also detected between these isolates and Shallot latent virus (ShLV). Phylogenetic analysis suggested that GarLV isolates from garlic can be divided into 4 main groups, with Chinese isolates found in each group. This is the first reported molecular analysis of members of the genus Carlavirus in China. PMID:18759032

  8. Mitochondrial control-region sequence variation in aboriginal Australians.

    PubMed

    van Holst Pellekaan, S; Frommer, M; Sved, J; Boettcher, B

    1998-02-01

    The mitochondrial D-loop hypervariable segment 1 (mt HVS1) between nucleotides 15997 and 16377 has been examined in aboriginal Australian people from the Darling River region of New South Wales (riverine) and from Yuendumu in central Australia (desert). Forty-seven unique HVS1 types were identified, varying at 49 nucleotide positions. Pairwise analysis by calculation of BEPPI (between population proportion index) reveals statistically significant structure in the populations, although some identical HVS1 types are seen in the two contrasting regions. mt HVS1 types may reflect more-ancient distributions than do linguistic diversity and other culturally distinguishing attributes. Comparison with sequences from five published global studies reveals that these Australians demonstrate greatest divergence from some Africans, least from Papua New Guinea highlanders, and only slightly more from some Pacific groups (Indonesian, Asian, Samoan, and coastal Papua New Guinea), although the HVS1 types vary at different nucleotide sites. Construction of a median network, displaying three main groups, suggests that several hypervariable nucleotide sites within the HVS1 are likely to have undergone mutation independently, making phylogenetic comparison with global samples by conventional methods difficult. Specific nucleotide-site variants are major separators in median networks constructed from Australian HVS1 types alone and for one global selection. The distribution of these, requiring extended study, suggests that they may be signatures of different groups of prehistoric colonizers into Australia, for which the time of colonization remains elusive. PMID:9463317

  9. Haplotypes and Sequence Variation in the Ovine Adiponectin Gene (ADIPOQ)

    PubMed Central

    An, Qing-Ming; Zhou, Hui-Tong; Hu, Jiang; Luo, Yu-Zhu; Hickford, Jon G. H.

    2015-01-01

    The adiponectin gene (ADIPOQ) plays an important role in energy homeostasis. In this study, five separate regions (regions 1 to 5) of ovine ADIPOQ were analysed using PCR-SSCP. Four different PCR-SSCP patterns each were detected in region-1 (A1-D1) and region-2 (A2-D2), revealing seven and six SNPs, respectively. In region-3, three different patterns (A3-C3) and three SNPs were observed. Two patterns each were observed in region-4 (A4-B4) and region-5 (A5-B5), revealing two SNPs and one SNP, respectively. In total, nineteen SNPs were detected, five of them in the coding region and two (c.46T/C and c.515G/A) putatively resulting in amino acid changes (p.Tyr16His and p.Lys172Arg). Among 316 sheep from eight New Zealand breeds, variants A1, A2 and A3 were the most common in region-1, -2 and -3, respectively, although variant frequencies differed among the eight breeds. Across region-1 and region-3, nine haplotypes were identified, of which A1-A3, A1-C3, B1-A3 and B1-C3 were the most common. These results indicate that the ADIPOQ gene is polymorphic and suggest that further analysis is required to determine whether variation in the gene is associated with animal production traits. PMID:26610572

  10. Genome-Wide Characterization of Insertion and Deletion Variation in Chicken Using Next Generation Sequencing

    PubMed Central

    Yan, Yiyuan; Yi, Guoqiang; Sun, Congjiao; Qu, Lujiang; Yang, Ning

    2014-01-01

    Insertions and deletions (INDELs) are among the main events contributing to genetic and phenotypic diversity, yet they receive less attention than SNPs and large structural variation. To better understand INDEL variation in the chicken genome, we applied next generation sequencing to 12 diverse chicken breeds at an average effective depth of 8.6×. Over 1.3 million non-redundant short INDELs (1–49 bp) were obtained, the vast majority (92.48%) of which were novel. Follow-up validation assays confirmed that most (88.00%) of the randomly selected INDELs represent true variations. The majority (95.76%) of INDELs were less than 10 bp. Both the number detected and the bases affected were larger for deletions than for insertions. In total, INDELs covered 3.8 Mbp, corresponding to 0.36% of the chicken genome. The average genomic INDEL density was estimated as 0.49 per kb. INDELs were ubiquitous and distributed non-uniformly across chromosomes, with lower INDEL density in micro-chromosomes than in others, and some functional regions such as exons and UTRs contained fewer INDELs than introns and intergenic regions. In total, 620,253 INDELs fell in genic regions, 1,765 (0.28%) of which were located in exons, spanning 1,358 (7.56%) unique Ensembl genes. Many of these genes are associated with economically important traits, and some are homologues of human disease-related genes. We demonstrate that sequencing multiple individuals at medium depth offers a promising way to identify INDELs reliably. The coding INDELs are valuable candidates for further elucidating the association between genotypes and phenotypes. The chicken INDELs revealed by our study can be useful for future studies, including development of INDEL markers, construction of high-density linkage maps, INDEL array design and, hopefully, molecular breeding programs in chicken. PMID:25133774

  11. Population variation of human mtDNA control region sequences detected by enzymatic amplification and sequence-specific oligonucleotide probes.

    PubMed Central

    Stoneking, M; Hedgecock, D; Higuchi, R G; Vigilant, L; Erlich, H A

    1991-01-01

    A method for detecting sequence variation of hypervariable segments of the mtDNA control region was developed. The technique uses hybridization of sequence-specific oligonucleotide (SSO) probes to DNA sequences that have been amplified by PCR. The nucleotide sequences of the two hypervariable segments of the mtDNA control region from 52 individuals were determined; these sequences were then used to define nine regions suitable for SSO typing. A total of 23 SSO probes were used to detect sequence variants at these nine regions in 525 individuals from five ethnic groups (African, Asian, Caucasian, Japanese, and Mexican). The SSO typing revealed an enormous amount of variability, with 274 mtDNA types observed among these 525 individuals and with diversity values, for each population, exceeding .95. For each of the nine mtDNA regions significant differences in the frequencies of sequence variants were observed between these five populations. The mtDNA SSO-typing system was successfully applied to a case involving individual identification of skeletal remains; the probability of a random match was approximately 0.7%. The potential useful applications of this mtDNA SSO-typing system thus include the analysis of individual identity as well as population genetic studies. PMID:1990843

  12. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives

    PubMed Central

    2013-01-01

    Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) and genotyping arrays were the standard technologies for detecting large regions subject to copy number changes in genomes until recently, when next-generation sequencing (NGS) made it possible to analyze high-resolution sequence data. During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review recent advances in computational methods for CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development. PMID:24564169

  13. Targeted Exome Sequencing Outcome Variations of Colorectal Tumors within and across Two Sequencing Platforms

    PubMed Central

    Ashktorab, Hassan; Azimi, Hamed; Nickerson, Michael L.; Bass, Sara; Varma, Sudhir; Brim, Hassan

    2016-01-01

    Background and Aim: Next generation sequencing (NGS) has quickly become the tool of choice for genome and exome data generation. The multitude of sequencing platforms, as well as the variability within each platform, needs to be assessed. In this paper we used two platforms (Ion Torrent and Illumina) to assess single-nucleotide variants (SNVs) in colorectal cancer (CRC) specimens. Methods: CRC specimens (n = 13) collected from 6 CRC patients (cancer and matched normal) were used to establish the mutational profile using the Ion Torrent and Illumina sequencing platforms. We analyzed a set of formalin-fixed paraffin-embedded (FFPE) and fresh-frozen (FF) samples on both platforms to assess the effect of sample type (FFPE vs. FF) on sequencing outcome and to evaluate the similarities and differences in SNVs across the two platforms. In addition, duplicates of FF samples were sequenced on each platform to assess within-platform variability. Results: Comparison of FF replicates with each other gave a concordance of 77% (± 15.3%) on Ion Torrent and 70% (± 3.7%) on Illumina. FFPE vs. FF replicates gave a concordance of 40% (± 32%) on Ion Torrent and 49% (± 19%) on Illumina. Cross-platform concordance averaged 75% (± 9.8%) for FFPE samples and 67% (± 32%) for FF samples, with an overall average of 70% (± 26.8%). Conclusion: Our data show significant variability within and across platforms. The number of detected variants also depends on the nature of the specimen (FF vs. FFPE). NGS-discovered mutations must be validated to rule out false positives, either on a second NGS platform or by Sanger sequencing.

  14. Extensive sequence variation in rice blast resistance gene Pi54 makes it broad spectrum in nature

    PubMed Central

    Thakur, Shallu; Singh, Pankaj K.; Das, Alok; Rathour, R.; Variar, M.; Prashanthi, S. K.; Singh, A. K.; Singh, U. D.; Chand, Duni; Singh, N. K.; Sharma, Tilak R.

    2015-01-01

    The rice blast resistance gene Pi54, cloned from the rice line Tetep, is effective against diverse isolates of Magnaporthe oryzae. In this study, we prospected for allelic variants of this dominant blast resistance gene in a set of 92 rice lines to determine its nucleotide diversity, pattern of molecular evolution, phylogenetic relationships and evolutionary dynamics, and to develop allele-specific markers. High-quality sequences were generated for homologs of the Pi54 gene. Comparative sequence analysis revealed InDels of variable size in all the alleles. Profiling of the selected SNP (single nucleotide polymorphism) and amino acid sites (N sites ≥ 10) exhibited a constant frequency distribution of mutational and substitutional sites between the resistant and susceptible rice lines, respectively. A total of 50 new haplotypes based on nucleotide polymorphism were also identified. A unique haplotype (H_3) was found to be linked to all the resistant alleles isolated from indica rice lines. Unique leucine zipper and tyrosine sulfation sites were identified in the predicted Pi54 proteins. Selection signals were observed across the entire coding sequence of resistance alleles, as compared with the LRR domains for susceptible alleles. This is the first report of extensive variability of Pi54 alleles in different landraces and cultivated varieties, possibly accounting for its broad-spectrum resistance to Magnaporthe oryzae. Sequence variation in two consensus regions (163 and 144 bp) was used to develop allele-specific DNA markers. Validated markers can be used to select and identify better allele(s) and introgress them into commercial rice cultivars through marker-assisted selection. PMID:26052332

  15. Sequence variation in three mitochondrial DNA genes among isolates of Ascaridia galli originating from Guangdong, Hunan and Yunnan provinces, China.

    PubMed

    Li, J Y; Liu, G H; Wang, Y; Song, H Q; Lin, R Q; Zou, F C; Liu, W; Xu, M J; Zhu, X Q

    2013-09-01

    The present study examined sequence variation in three mitochondrial DNA (mtDNA) genes, namely cytochrome c oxidase subunit 3 (cox3) and NADH dehydrogenase subunits 1 and 4 (nad1 and nad4), among Ascaridia galli isolates from different geographical localities in China. Portions of the cox3 (pcox3), nad1 (pnad1) and nad4 (pnad4) genes were amplified separately by polymerase chain reaction (PCR) from individual adult A. galli, and the amplicons were sequenced in both directions. The pcox3, pnad1 and pnad4 sequences were 408 bp, 471 bp and 333 bp long, respectively. Intraspecific sequence variation within A. galli was 0-1.7% for pcox3, 0-2.8% for pnad1 and 0-3.4% for pnad4. The A+T contents of the sequences were 67.16-67.65% (pcox3), 67.09-67.94% (pnad1) and 69.91-71.77% (pnad4). Interspecific sequence differences among members of the Ascaridida were significantly higher, at 13.2-30.9%, 12.8-29.0% and 15.1-34.1% for pcox3, pnad1 and pnad4, respectively. Phylogenetic analyses of the combined pcox3, pnad1 and pnad4 sequences, using three different computational algorithms (Bayesian analysis, maximum likelihood and maximum parsimony), all revealed distinct groups with high statistical support. These findings demonstrate the existence of intraspecific variation in mtDNA sequences among A. galli isolates from different geographical regions in China, and have implications for studying the molecular epidemiology and population genetics of A. galli. PMID:23046568

  16. Validation of copy number variation sequencing for detecting chromosome imbalances in human preimplantation embryos.

    PubMed

    Wang, Li; Cram, David S; Shen, Jiandong; Wang, Xiaohong; Zhang, Jianguang; Song, Zhuo; Xu, Genming; Li, Na; Fan, Junmei; Wang, Shufang; Luo, Yaning; Wang, Jun; Yu, Li; Liu, Jiayin; Yao, Yuanqing

    2014-08-01

    Chromosome aneuploidies commonly arise in embryos produced by assisted reproductive technologies and represent a major cause of implantation failure and miscarriage. Currently, preimplantation genetic diagnosis (PGD) is performed by array-based methods to identify euploid embryos for transfer to the patient. We speculated that a combination of next-generation sequencing technologies and sophisticated bioinformatics would deliver a more comprehensive and accurate methodology to improve the overall efficacy of embryo testing. To meet this challenge, we developed a high-resolution copy number variation (CNV) sequencing pipeline suitable for single-cell analysis. In validation studies, we showed that CNV-Seq was highly sensitive and specific for detection of euploidy, aneuploidy, and segmental imbalances in 24 whole genome amplification samples from PGD embryos that were originally diagnosed by gold standard array comparative genomic hybridization. In addition, CNV-Seq was capable of detecting, mapping, and accurately quantifying terminal chromosome imbalances down to 1 Mb in size originating from abnormal segregation of translocation chromosomes. These validation studies indicate that CNV-Seq displays the hallmarks of an accurate and reliable embryo test with the potential to further improve the overall efficacy of PGD. PMID:24966395

  17. Laying-sequence-specific variation in yolk oestrogen levels, and relationship to plasma oestrogen in female zebra finches (Taeniopygia guttata)

    PubMed Central

    Williams, Tony D.; Ames, Caroline E.; Kiparissis, Yiannis; Wynne-Edwards, Katherine E.

    2005-01-01

    We investigated the relationship between plasma and yolk oestrogens in laying female zebra finches (Taeniopygia guttata) by manipulating plasma oestradiol (E2) levels, via injection of oestradiol-17β, in a sequence-specific manner to maintain chronically high plasma levels for later-developing eggs (contrasting with the endogenous pattern of decreasing plasma E2 concentrations during laying). We report systematic variation in yolk oestrogen concentrations, in relation to laying sequence, similar to that widely reported for androgenic steroids. In sham-manipulated females, yolk E2 concentrations decreased with laying sequence. However, in E2-treated females plasma E2 levels were higher during the period of rapid yolk development of later-laid eggs, compared with control females. As a consequence, we reversed the laying-sequence-specific pattern of yolk E2: in E2-treated females, yolk E2 concentrations increased with laying-sequence. In general therefore, yolk E2 levels were a direct reflection of plasma E2 levels. However, in control females there was some inter-individual variability in the endogenous pattern of plasma E2 levels through the laying cycle which could generate variation in sequence-specific patterns of yolk hormone levels even if these primarily reflect circulating steroid levels. PMID:15695208

  18. High compression image and image sequence coding

    NASA Technical Reports Server (NTRS)

    Kunt, Murat

    1989-01-01

    The digital representation of an image requires a very large number of bits. This number is even larger for an image sequence. The goal of image coding is to reduce this number, as much as possible, and reconstruct a faithful duplicate of the original picture or image sequence. Early efforts in image coding, solely guided by information theory, led to a plethora of methods. The compression ratio reached a plateau around 10:1 a couple of years ago. Recent progress in the study of the brain mechanism of vision and scene analysis has opened new vistas in picture coding. Directional sensitivity of the neurones in the visual pathway combined with the separate processing of contours and textures has led to a new class of coding methods capable of achieving compression ratios as high as 100:1 for images and around 300:1 for image sequences. Recent progress on some of the main avenues of object-based methods is presented. These second generation techniques make use of contour-texture modeling, new results in neurophysiology and psychophysics and scene analysis.

  19. Tough coating proteins: subtle sequence variation modulates cohesion.

    PubMed

    Das, Saurabh; Miller, Dusty R; Kaufman, Yair; Martinez Rodriguez, Nadine R; Pallaoro, Alessia; Harrington, Matthew J; Gylys, Maryte; Israelachvili, Jacob N; Waite, J Herbert

    2015-03-01

    Mussel foot protein-1 (mfp-1) is an essential constituent of the protective cuticle covering all exposed portions of the byssus (plaque and the thread) that marine mussels use to attach to intertidal rocks. The reversible complexation of Fe(3+) by the 3,4-dihydroxyphenylalanine (Dopa) side chains in mfp-1 in Mytilus californianus cuticle is responsible for its high extensibility (120%) as well as its stiffness (2 GPa) due to the formation of sacrificial bonds that help to dissipate energy and avoid accumulation of stresses in the material. We have investigated the interactions between Fe(3+) and mfp-1 from two mussel species, M. californianus (Mc) and M. edulis (Me), using both surface sensitive and solution phase techniques. Our results show that although mfp-1 homologues from both species bind Fe(3+), mfp-1 (Mc) contains Dopa with two distinct Fe(3+)-binding tendencies and prefers to form intramolecular complexes with Fe(3+). In contrast, mfp-1 (Me) is better adapted to intermolecular Fe(3+) binding by Dopa. Addition of Fe(3+) did not significantly increase the cohesion energy between the mfp-1 (Mc) films at pH 5.5. However, iron appears to stabilize the cohesive bridging of mfp-1 (Mc) films at the physiologically relevant pH of 7.5, where most other mfps lose their ability to adhere reversibly. Understanding the molecular mechanisms underpinning the capacity of M. californianus cuticle to withstand twice the strain of M. edulis cuticle is important for engineering of tunable strain tolerant composite coatings for biomedical applications. PMID:25692318

  20. A high-resolution cattle CNV map by population-scale genome sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Copy Number Variations (CNVs) are common genomic structural variations that have been linked to human diseases and phenotypic traits. Prior studies in cattle have produced low-resolution CNV maps. We constructed a draft, high-resolution map of cattle CNVs based on whole genome sequencing data from 7...

  1. Otopalatodigital syndrome type 2 in a male infant: A case report with a novel sequence variation

    PubMed Central

    Sankararaman, Senthilkumar; Kurepa, Dalibor; Shen, Yiping; Kakkilaya, Venkatakrishna; Ursin, Sussone; Chen, Harold

    2013-01-01

    We report a male infant with typical clinical, pathological and radiological features of otopalatodigital syndrome type 2 (OPD 2) and a novel sequence variation in the FLNA gene. His clinical manifestations include typical craniofacial features, cleft palate, hearing impairment, omphalocele, bowing of the long bones, absent fibulae and digital abnormalities consistent with OPD 2. Two hemizygous sequence variations in the FLNA gene were identified. The variation c.5290G>A/p.Ala1764Thr was previously reported in a patient with periventricular nodular heterotopia but was subsequently reported as a polymorphism. The other variation, c.613T>C/p.Cys205Arg, detected in the proband has not been previously reported, and our analysis indicates that it is a novel disease-causing mutation for OPD 2.

  2. Sequence variation of ribosomal internal transcribed spacers (ITS) in commercially important Phytoseiidae mites.

    PubMed

    Navajas, M; Lagnel, J; Fauvel, G; de Moraes, G

    1999-11-01

    Preliminary work is needed to assess the usefulness of different markers at different taxonomic scales when a new group, such as the commercially important Phytoseiidae mites, is analyzed. Here we investigate the level of sequence variation of the nuclear ribosomal spacers ITS 1 and 2 and the 5.8S gene in six species of Phytoseiidae: Neoseiulus californicus, N. fallacis, Euseius concordis, Metaseiulus occidentalis, Typhlodromus pyri and Phytoseiulus persimilis. As expected, the 5.8S gene (148 base pairs) is markedly conserved and displays little variation in between-genera comparisons. ITS1 and ITS2 show contrasting patterns: while ITS2 is short (80-89 bp) and shows little variation, ITS1 is longer (303-404 bp) and highly variable in sequence. This compromises reliable nucleotide homology assessment when comparing genera. Comparison of ITS1 sequence similarity at the species level might be useful for species identification; however, the value of ITS in taxonomic studies does not extend to the family level. Intraspecific variation of ITS was investigated in three species: N. californicus, N. fallacis and E. concordis. The first species has identical ITS1 sequences, and the other two display low polymorphism (2 nucleotide substitutions). The ITS2 and 5.8S sequences were identical in all three intraspecific comparisons. PMID:10668860

  3. Low-level sequence variation in Toxoplasma gondii calcium-dependent protein kinases among different genotypes.

    PubMed

    Wang, J L; Zhang, N Z; Huang, S Y; Xu, Y; Wang, R A; Zhu, X Q

    2015-01-01

    The causative agent of toxoplasmosis, Toxoplasma gondii, can infect virtually all nucleated cell types of warm-blooded animals. In this study, we examined sequence variation in the calcium-dependent protein kinase 2 (CDPK2) gene among 13 T. gondii strains from different hosts and geographical locations. The complete CDPK2 DNA and cDNA sequences were 3671-3673 bp and 2136 bp long, respectively, and sequence variation among T. gondii strains was 0-0.9%. Phylogenetic analysis based on CDPK2 gene sequences revealed that T. gondii strains of the same genotype were clustered in different clades. Further analysis of all the other T. gondii CDPK genes in genotype I (GT1), II (ME49) or III (VEG) strains indicated that the T. gondii CDPK gene family is quite conserved, with sequence variation ranging from 0 to 1.40%. We concluded that CDPK2, as well as all the other CDPK genes in T. gondii, is not a suitable marker for studying variants of different T. gondii genotypes from different hosts and geographical locations, but their sequence conservation may make them promising anti-T. gondii vaccine candidates for further study. PMID:25966270

  4. DNA sequence variation in a non-coding region of low recombination on the human X chromosome.

    PubMed

    Kaessmann, H; Heissig, F; von Haeseler, A; Pääbo, S

    1999-05-01

    DNA sequence variation has become a major source of insight into the origin and history of our species as well as an important tool for identifying allelic variants associated with disease. Comparative sequencing of DNA has to date focused mainly on mitochondrial (mt) DNA, which, owing to its apparent lack of recombination and high evolutionary rate, lends itself well to the study of human evolution. These advantages also entail limitations. For example, the high mutation rate of mtDNA results in multiple substitutions that make phylogenetic analysis difficult, and because mtDNA is maternally inherited, it reflects only the history of females. For the history of males, the non-recombining part of the paternally inherited Y chromosome can be studied. Variation on the Y chromosome is so low that, rather than entire sequences, typically only particular sites known to be polymorphic are typed. It is currently unclear how some forms of analysis (such as the coalescent) should be applied to such data. Furthermore, the lack of recombination means that selection at any locus affects all 59 Mb of Y-chromosomal DNA. To gauge the extent and pattern of point substitutional variation in non-coding parts of the human genome, we have sequenced 10 kb of non-coding DNA in a region of low recombination at Xq13.3. Analysis of this sequence in 69 individuals representing all major linguistic groups reveals the highest overall diversity in Africa, although deep divergences also exist in Asia. The time elapsed since the most recent common ancestor (MRCA) is 535,000 ± 119,000 years. We expect this type of nuclear locus to provide further answers about the genetic origin and history of humans. PMID:10319866

  5. Polarimetric Variations of Binary Stars. VI. Orbit-Induced Variations in the Pre-Main-Sequence Binary AK Scorpii

    NASA Astrophysics Data System (ADS)

    Manset, N.; Bastien, P.; Bertout, C.

    2005-01-01

    We present simultaneous UBV polarimetric and photometric observations of the pre-main-sequence binary AK Sco, obtained over 12 nights, slightly less than the orbital period of 13.6 days. The polarization is a sum of interstellar and intrinsic polarization, with a significant intrinsic polarization of 1% at 5250 Å, indicating the presence of circumstellar matter distributed in an asymmetric geometry. The polarization and its position angle are clearly variable on timescales of hours and nights in all three wavelengths, with a behavior related to the orbital motion. The variations have the highest amplitudes seen so far for pre-main-sequence binaries (~1% and ~30°) and are sinusoidal with periods similar to the orbital period and half of it. The polarization variations are generally correlated with the photometric ones: when the star gets fainter, it also gets redder, and its polarization increases. The (B-V, V) color-magnitude diagram exhibits a ratio of total to selective absorption R=4.3, higher than in normal interstellar clouds (R=3.1). The interpretation of the simultaneous photometric and polarimetric observations is that a cloud of circumstellar matter passes in front of the star, decreasing the amount of direct, unpolarized light and hence increasing the contribution of scattered (blue) light. We show that the large amplitude of the polarization variations cannot be reproduced with a single-scattering model and axially symmetric circumbinary or circumstellar disks. Based on observations made with the ESO telescopes at the La Silla Observatory.

  6. Temporal Stability of Epigenetic Markers: Sequence Characteristics and Predictors of Short-Term DNA Methylation Variations

    PubMed Central

    Coull, Brent A.; Tarantini, Letizia; Hou, Lifang; Bonzini, Matteo; Apostoli, Pietro; Bertazzi, Pier Alberto; Baccarelli, Andrea

    2012-01-01

    Background: DNA methylation is an epigenetic mechanism that has been increasingly investigated in observational human studies, particularly on blood leukocyte DNA. Characterizing the degree and determinants of DNA methylation stability can provide critical information for the design and conduct of human epigenetic studies. Methods: We measured DNA methylation in 12 gene-promoter regions (APC, p16, p53, RASSF1A, CDH13, eNOS, ET-1, IFNγ, IL-6, TNFα, iNOS, and hTERT) and in two non-long terminal repeat elements (L1 and Alu) in blood samples obtained from 63 healthy individuals at baseline (Day 1) and three days later (Day 4). DNA methylation was measured by bisulfite-PCR pyrosequencing. We calculated intraclass correlation coefficients (ICCs) to measure the within-individual stability of DNA methylation between Days 1 and 4, after removing pyrosequencing error and adjusting for multiple covariates. Results: Methylation markers showed different temporal behaviors, ranging from high (IL-6, ICC = 0.89) to low stability (APC, ICC = 0.08) between Days 1 and 4. Multiple sequence and marker characteristics were associated with the degree of variation. The density of CpG dinucleotides near the analyzed sequence (measured as CpG(o/e) or G+C content within ±200 bp) was positively associated with DNA methylation stability. The 3′ proximity to repeat elements and the range of DNA methylation on Day 1 were also positively associated with methylation stability. An inverted U-shaped relationship was observed between mean DNA methylation on Day 1 and stability. Conclusions: The degree of short-term DNA methylation stability is marker-dependent and associated with sequence characteristics and methylation levels. PMID:22745719

  7. Variation in conserved non-coding sequences on chromosome 5q and susceptibility to asthma and atopy

    SciTech Connect

    Donfack, Joseph; Schneider, Daniel H.; Tan, Zheng; Kurz, Thorsten; Dubchak, Inna; Frazer, Kelly A.; Ober, Carole

    2005-09-10

    Background: Evolutionarily conserved sequences likely have biological function. Methods: To determine whether variation in conserved sequences in non-coding DNA contributes to risk for human disease, we studied six conserved non-coding elements in the Th2 cytokine cluster on human chromosome 5q31 in a large Hutterite pedigree and in samples of outbred European American and African American asthma cases and controls. Results: Among six conserved non-coding elements (>100 bp, >70% identity; human-mouse comparison), we identified one single nucleotide polymorphism (SNP) in each of two conserved elements and six SNPs in the flanking regions of three conserved elements. We genotyped our samples for four of these SNPs and an additional three SNPs each in the IL13 and IL4 genes. While there was only modest evidence for association with single SNPs in the Hutterite and European American samples (P < 0.05), there were highly significant associations in European Americans between asthma and haplotypes composed of SNPs in the IL4 gene (P < 0.001), including a SNP in a conserved non-coding element. Furthermore, variation in the IL13 gene was strongly associated with total IgE (P = 0.00022) and allergic sensitization to mold allergens (P = 0.00076) in the Hutterites, and more modestly associated with sensitization to molds in the European Americans and African Americans (P < 0.01). Conclusion: These results indicate that there is little overall variation in the conserved non-coding elements on 5q31, but that variation in IL4 and IL13, possibly including one SNP in a conserved element, influences asthma and atopic phenotypes in diverse populations.

  8. Copy number variation of individual cattle genomes using next-generation sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one ...

  9. Copy number variation of individual cattle genomes using next-generation sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Copy Number Variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often difficult to track. Using a read depth approach based on next generation sequencing, we examined genome-wide copy number differences among five taurine (three Angu...

  10. Whole-genome sequencing reveals the diversity of cattle copy number variations and multicopy genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Structural and functional impacts of copy number variations (CNVs) on livestock genomes are not yet well understood. We identified 1853 CNV regions using population-scale sequencing data generated from 75 cattle representing 8 breeds (Angus, Brahman, Gir, Holstein, Jersey, Limousin, Nelore, Romagnol...

  11. Sequence variation in the androgen receptor gene is not a common determinant of male sexual orientation.

    PubMed Central

    Macke, J P; Hu, N; Hu, S; Bailey, M; King, V L; Brown, T; Hamer, D; Nathans, J

    1993-01-01

    To test the hypothesis that DNA sequence variation in the androgen receptor gene plays a causal role in the development of male sexual orientation, we have (1) measured the degree of concordance of androgen receptor alleles in 36 pairs of homosexual brothers, (2) compared the lengths of polyglutamine and polyglycine tracts in the amino-terminal domain of the androgen receptor in a sample of 197 homosexual males and 213 unselected subjects, and (3) screened the entire androgen receptor coding region for sequence variation by PCR and denaturing gradient-gel electrophoresis (DGGE) and/or single-strand conformation polymorphism analysis in 20 homosexual males with homosexual or bisexual brothers and one homosexual male with no homosexual brothers, and screened the amino-terminal domain of the receptor for sequence variation in an additional 44 homosexual males, 37 of whom had one or more first- or second-degree male relatives who were either homosexual or bisexual. These analyses show that (1) homosexual brothers are as likely to be discordant as concordant for androgen receptor alleles; (2) there are no large-scale differences between the distributions of polyglycine or polyglutamine tract lengths in the homosexual and control groups; and (3) coding region sequence variation is not commonly found within the androgen receptor gene of homosexual men. The DGGE screen identified two rare amino acid substitutions, ser205-to-arg and glu793-to-asp, the biological significance of which is unknown. PMID:8213813

  12. A sequencing strategy for identifying variation throughout the prion gene of BSE-affected cattle

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cattle prion gene (PRNP) polymorphisms have been associated with bovine spongiform encephalopathy (BSE) susceptibility. We developed a method for sequencing bovine PRNP through all exons, introns and part of the promoter (25.2 kb) that accounts for known variation. The method can be used to detect...

  13. Sequence variation in the androgen receptor gene is not a common determinant of male sexual orientation

    SciTech Connect

    Macke, J.P.; Nathans, J.; King, V.L.; Hu, N.; Hu, S.; Hamer, D.; Bailey, M.; Brown, T.

    1993-10-01

    To test the hypothesis that DNA sequence variation in the androgen receptor gene plays a causal role in the development of male sexual orientation, the authors have (1) measured the degree of concordance of androgen receptor alleles in 36 pairs of homosexual brothers, (2) compared the lengths of polyglutamine and polyglycine tracts in the amino-terminal domain of the androgen receptor in a sample of 197 homosexual males and 213 unselected subjects, and (3) screened the entire androgen receptor coding region for sequence variation by PCR and denaturing gradient-gel electrophoresis (DGGE) and/or single-strand conformation polymorphism analysis in 20 homosexual males with homosexual or bisexual brothers and one homosexual male with no homosexual brothers, and screened the amino-terminal domain of the receptor for sequence variation in an additional 44 homosexual males, 37 of whom had one or more first- or second-degree male relatives who were either homosexual or bisexual. These analyses show that (1) homosexual brothers are as likely to be discordant as concordant for androgen receptor alleles; (2) there are no large-scale differences between the distributions of polyglycine or polyglutamine tract lengths in the homosexual and control groups; and (3) coding region sequence variation is not commonly found within the androgen receptor gene of homosexual men. The DGGE screen identified two rare amino acid substitutions, ser205-to-arg and glu793-to-asp, the biological significance of which is unknown. 32 refs., 2 figs., 2 tabs.

  14. Mapping cattle copy number variation by population-scale genome sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Copy number variation (CNV) is abundant in livestock, differing from SNPs in extent, origin and functional impact. Despite progress in CNV discovery, the nucleotide resolution architecture of most CNVs remains elusive. As a pilot population study of cattle CNV, we sequenced 100 representative cattle...

  15. Attacin gene sequence variations in different ecoraces of tasar silkworm Antheraea mylitta

    PubMed Central

    Sudha, Rati; Murthy, Geetha N; Awasthi, Arvind K; Ponnuvel, Kangayam M

    2015-01-01

    The attacin gene exists as paralogs subject to gene conversion and has been used to identify strain variation in insects based on sequence variation. Hence, a study was undertaken to analyze the sequence variation of attacin gene isoforms in the tasar silkworm Antheraea mylitta, which exists as different ecoraces depending on the environment, food plant and location. Comparison of previously reported attacin sequences with the DNA sequences of the attacin A and B genes revealed six amino acid substitutions among the sequences of the ecoraces, which, however, did not affect the functional domain of Attacin. The generated dendrogram clearly indicated unique branches for each ecorace, with two separate gene clusters for attacin A and B. The Sarihan ecorace formed a separate subgroup in both gene clusters. The present study also revealed the presence of the Attacin_N superfamily domain exclusively in exon I, separate from the Attacin_C superfamily domain present in exon II and part of exon III, a prominent characteristic of the attacin gene. Phylogenetic reconstruction of the attacin gene in A. mylitta supported a common evolutionary origin of attacin genes from the Lepidopteran and Dipteran families, which formed two separate clusters. PMID:26664033

  16. Molecular indicators for palaeoenvironmental change in a Messinian evaporitic sequence (Vena del Gesso, Italy). II: High-resolution variations in abundances and 13C contents of free and sulphur-bound carbon skeletons in a single marl bed

    NASA Technical Reports Server (NTRS)

    Kenig, F.; Damste, J. S.; Frewin, N. L.; Hayes, J. M.; De Leeuw, J. W.

    1995-01-01

    The extractable organic matter of 10 immature samples from a marl bed of one evaporitic cycle of the Vena del Gesso sediments (Gessoso-solfifera Fm., Messinian, Italy) was analyzed quantitatively for free hydrocarbons and organic sulphur compounds. Nickel boride was used as a desulphurizing agent to recover sulphur-bound lipids from the polar and asphaltene fractions. Carbon isotopic compositions (δ vs PDB) of free hydrocarbons and of S-bound hydrocarbons were also measured. Relationships between these carbon skeletons, precursor biolipids, and the organisms producing them could then be examined. Concentrations of S-bound lipids and free hydrocarbons and their δ values were plotted vs depth in the marl bed, and the profiles were interpreted in terms of variations in source organisms, 13C contents of the carbon source, and environmentally induced changes in isotopic fractionation. The overall range of δ values measured was 24.7‰, from -11.6‰ for a component derived from green sulphur bacteria (Chlorobiaceae) to -36.3‰ for a lipid derived from purple sulphur bacteria (Chromatiaceae). Deconvolution of mixtures of components deriving from multiple sources (green and purple sulphur bacteria, coccolithophorids, microalgae and higher plants) was sometimes possible because both quantitative and isotopic data were available and because either the free or S-bound pool sometimes appeared to contain material from a single source. Several free n-alkanes and S-bound lipids appeared to be specific products of upper-water-column primary producers (i.e. algae and cyanobacteria). Others derived from anaerobic photoautotrophs and from heterotrophic protozoa (ciliates), which apparently fed partly on Chlorobiaceae. Four groups of n-alkanes produced by algae or cyanobacteria were also recognized based on systematic variations of abundance and isotopic composition with depth. For hydrocarbons probably derived from microalgae, isotopic variations are well correlated with

  17. Mitochondrial COI sequences in mites: evidence for variations in base composition.

    PubMed

    Navajas, M; Fournier, D; Lagnel, J; Gutierrez, J; Boursot, P

    1996-11-01

    Studies of mitochondrial DNA sequences in a variety of animals have shown important differences between phyla, including differences in the genetic codes used, and varying constraints on base composition. In that respect, little is known of mites, an important and diversified group. We sequenced a portion (340 nt) of the cytochrome oxidase subunit I (COI) encoding gene in twenty species of phytophagous mites belonging to nine genera of the two families Tetranychidae and Tenuipalpidae. The mitochondrial genetic code used in mites appeared to be the same as in insects. As is generally also the case in insects, the mite sequences were very rich in A + T (75% on average), especially at the third codon position (94%). However, important variations of base composition were observed among mite species, one of them showing as little as 69% A + T. Variations of base composition occur mostly through synonymous transitions, and do not have detectable effects on polypeptide evolution in this group. PMID:8933179

  18. Phylogenetic and functional analysis of sequence variation of human papillomavirus type 31 E6 and E7 oncoproteins.

    PubMed

    Ferenczi, Annamária; Gyöngyösi, Eszter; Szalmás, Anita; László, Brigitta; Kónya, József; Veress, György

    2016-09-01

    High-risk human papillomaviruses (HPV) are the causative agents of cervical and other anogenital cancers as well as a subset of head and neck cancers. The E6 and E7 oncoproteins of HPV contribute to oncogenesis by associating with the tumour suppressor proteins p53 and pRb, respectively. For HPV types 16 and 18, intratypic sequence variation has been shown to have biological and clinical significance, whereas the functional significance of sequence variation among HPV 31 variants has been studied less intensively. HPV 31 variants belonging to different variant lineages were found to differ in persistence and in their ability to cause high-grade cervical intraepithelial neoplasia. In the present study, we began to explore the functional effects of natural sequence variation in the HPV 31 E6 and E7 oncoproteins. The E6 variants were tested for their effects on p53 protein stability and transcriptional activity, while the E7 variants were tested for their effects on pRb protein level and on the transcriptional activity of E2F transcription factors. HPV 31 E7 variants displayed uniform effects on pRb stability and on the activity of E2F transcription factors. HPV 31 E6 variants differed markedly in their ability to inhibit the trans-activation function of p53 but not in their ability to induce the in vivo degradation of p53. Our results indicate that natural sequence variation of the HPV 31 E6 protein may contribute to the observed differences in oncogenic potential between HPV 31 variants. PMID:27197052

  19. Genome sequencing of Metrosideros polymorpha (Myrtaceae), a dominant species in various habitats in the Hawaiian Islands with remarkable phenotypic variations.

    PubMed

    Izuno, Ayako; Hatakeyama, Masaomi; Nishiyama, Tomoaki; Tamaki, Ichiro; Shimizu-Inatsugi, Rie; Sasaki, Ryuta; Shimizu, Kentaro K; Isagi, Yuji

    2016-07-01

    Whole-genome sequences, which high-throughput sequencers now make available even for non-model organisms, are valuable for understanding adaptive evolution. Metrosideros polymorpha, a tree species endemic to the Hawaiian Islands, occupies a wide range of ecological habitats and shows remarkable phenotypic polymorphism among and within populations. The biological functions of the genetic variation observed within this species could provide significant insights into adaptive radiation within a single species. Here, de novo assembled genome sequences of M. polymorpha are presented to reveal basic genomic parameters of this species and to advance our knowledge of its ecological divergence. The assembly yielded 304 Mbp of genome sequence, half of which was covered by 19 scaffolds longer than 5 Mbp, and contained 30K protein-coding genes. Demographic history inferred from genome-wide heterozygosity indicated that this species experienced a dramatic rise and fall in effective population size, possibly owing to past geographic or climatic changes in the Hawaiian Islands. This M. polymorpha genome assembly represents a high-quality genome resource useful for future functional analyses of intra- and interspecific genetic variation and for comparative genomics. PMID:27052216

  20. Nucleotide sequence variation of chitin synthase genes among ectomycorrhizal fungi and its potential use in taxonomy.

    PubMed Central

    Mehmann, B; Brunner, I; Braus, G H

    1994-01-01

    DNA sequences of single-copy genes coding for chitin synthases (UDP-N-acetyl-D-glucosamine:chitin 4-beta-N-acetylglucosaminyltransferase; EC 2.4.1.16) were used to characterize ectomycorrhizal fungi. Degenerate primers deduced from short, completely conserved amino acid stretches flanking a region of about 200 amino acids of zymogenic chitin synthases allowed the amplification of DNA fragments of several members of this gene family. Different DNA band patterns were obtained from basidiomycetes because of variation in the number and length of amplified fragments. Cloning and sequencing of the most prominent DNA fragments revealed that these differences were due to various introns at conserved positions. The presence of introns in basidiomycetous fungi therefore has potential use in identifying genera by analyzing PCR-generated DNA fragment patterns. Analyses of the nucleotide sequences of cloned fragments revealed nucleotide sequence variation of 4 to 45%. By comparison of the deduced amino acid sequences, the majority of the DNA fragments were identified as members of class II chitin synthase genes. The deduced amino acid sequences from species of the same genus differed in only one amino acid residue, whereas identity between the amino acid sequences of ascomycetous and basidiomycetous fungi within the same taxonomic class was approximately 43 to 66%. Phylogenetic analysis of the amino acid sequences of class II chitin synthase-encoding gene fragments using parsimony confirmed the current taxonomic groupings. In addition, our data revealed a fourth class of putative zymogenic chitin synthases. PMID:7944356

  1. BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations.

    PubMed

    Wang, Junbai; Batmanov, Kirill

    2015-12-01

    Sequence variations in regulatory DNA regions are known to have functionally important consequences for gene expression. DNA sequence variations may have an essential role in determining phenotypes and may be linked to disease; however, identifying them through analysis of massive genome-wide sequencing data is a great challenge. In this work, a new computational pipeline, a Bayesian method for protein-DNA interaction with binding affinity ranking (BayesPI-BAR), is proposed for quantifying the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein-DNA interactions to predict single nucleotide polymorphisms (SNPs) that cause significant changes in the binding affinity of a regulatory region for transcription factors (TFs). The method includes two new parameters (TF chemical potentials, or protein concentrations, and direct TF binding targets) that previous methods neglect. The new method is verified on 67 known human regulatory SNPs, for 47 (70%) of which the true TFs are ranked in the top 10 predictions. Importantly, BayesPI-BAR, which uses principal component analysis to integrate multiple predictions from various TF chemical potentials, performs better than existing programs such as sTRAP and is-rSNP when evaluated on the same SNPs. BayesPI-BAR is publicly available and supports parallelized computation, which helps in investigating large numbers of TFs or SNPs and in detecting disease-associated regulatory sequence variations across the vast noncoding portion of the genome. PMID:26202972

  2. Genome-wide Mycobacterium tuberculosis variation (GMTV) database: a new tool for integrating sequence variations and epidemiology

    PubMed Central

    2014-01-01

    Background: Tuberculosis (TB) poses a worldwide threat due to advancing multidrug-resistant strains and deadly co-infections with human immunodeficiency virus. Today, large amounts of Mycobacterium tuberculosis whole-genome sequencing data are being broadly assessed, yet no comprehensive online resource connects M. tuberculosis genome variants with geographic origin, drug resistance or clinical outcome. Description: Here we describe a broadly inclusive, unifying Genome-wide Mycobacterium tuberculosis Variation (GMTV) database (http://mtb.dobzhanskycenter.org) that catalogues genome variations of M. tuberculosis strains collected across Russia. GMTV contains a broad spectrum of data derived from different sources and related to M. tuberculosis molecular biology, epidemiology, TB clinical outcome, year and place of isolation, and drug resistance profiles, and it displays the variants across the genome using a dedicated genome browser. The GMTV database, which includes 1084 genomes and over 69,000 SNP or indel variants, can be queried about M. tuberculosis genome variation and putative associations with drug resistance, geographical origin, and clinical stages and outcomes. Conclusions: GMTV tracks the pattern of changes of M. tuberculosis strains in different geographical areas, facilitates the discovery of disease genes associated with drug resistance or different clinical sequelae, and automates comparative genomic analyses among M. tuberculosis strains. PMID:24767249

  3. Using evolutionary sequence variation to make inferences about protein structure and function

    NASA Astrophysics Data System (ADS)

    Colwell, Lucy

    2015-03-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. The explosive growth in the number of available protein sequences raises the possibility of using the natural variation present in homologous protein sequences to infer these constraints and thus identify residues that control different protein phenotypes. Because in many cases phenotypic changes are controlled by more than one amino acid, the mutations that separate one phenotype from another may not be independent, requiring us to understand the correlation structure of the data. To address this, we build a maximum entropy probability model for the protein sequence. The parameters of the inferred model are constrained by the statistics of a large sequence alignment. Pairs of sequence positions with the strongest interactions accurately predict contacts in protein tertiary structure, enabling all-atom structural models to be constructed. We describe the development of a theoretical inference framework that clarifies the relationship between the amount of available input data and the reliability of structural predictions.

  4. Sequence variation in ROP8 gene among Toxoplasma gondii isolates from different hosts and geographical localities.

    PubMed

    Li, Z Y; Chen, J; Lu, J; Wang, C R; Zhu, X Q

    2015-01-01

    The protozoan parasite Toxoplasma gondii has a worldwide distribution; it can cause serious diseases in humans and almost all other warm-blooded animals. Different genotypes of T. gondii cause different lesions in the same host. T. gondii rhoptry protein 8 (TgROP8) is a major determinant of acute virulence in T. gondii. We examined sequence variation in the TgROP8 gene among T. gondii isolates from different hosts and geographical localities. The TgROP8 gene was amplified from individual isolates and sequenced. A phylogenetic tree was constructed using Bayesian inference, maximum parsimony, and maximum likelihood based on the sequences obtained plus TgME49 from the ToxoDB database. The TgROP8 gene was 1728 bp long in all the examined T. gondii strains, with A+T contents of 45.37-45.95%. Sequence analysis detected 140 (0.06-5.56%) variable nucleotide positions, resulting in 96 (0-10.78%) amino acid substitutions. Sequence variations in the TgROP8 gene produced polymorphic restriction sites for the endonucleases BstBI, BsaI, and XhoI, which allowed the three classical genotypes (types I, II, and III) to be differentiated by polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP). However, phylogenetic analyses indicated that the TgROP8 gene is not a suitable genetic marker for population studies of T. gondii. PMID:26436382

  5. Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach

    NASA Astrophysics Data System (ADS)

    Hofmann, Hansjörg; Sakti, Sakriani; Hori, Chiori; Kashioka, Hideki; Nakamura, Satoshi; Minker, Wolfgang

    The performance of English automatic speech recognition (ASR) systems decreases on spontaneous speech, mainly because of multiple pronunciation variants in the utterances. Previous approaches address this problem by modeling pronunciation alteration at the phoneme-to-phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this article, sequence-based pronunciation variation is modeled using a noisy channel approach in which the spontaneous phoneme sequence is treated as a “noisy” string and the goal is to recover the “clean” string of the word sequence. In this way, the whole word sequence and its effect on the alteration of the phonemes are taken into consideration. Moreover, the system learns not only the phoneme transformation but also the mapping from phonemes directly to words. In this study, the phonemes are first recognized with the existing recognition system, and the pronunciation variation model based on the noisy channel approach then maps from the phoneme level to the word level. Two well-known natural language processing approaches derived from noisy channel model theory are adopted: joint-sequence models and statistical machine translation. Both are applied, and various experiments are conducted using microphone and telephone recordings of spontaneous speech.

  6. SoftSearch: Integration of Multiple Sequence Features to Identify Breakpoints of Structural Variations

    PubMed Central

    Hart, Steven N.; Sarangi, Vivekananda; Moore, Raymond; Baheti, Saurabh; Bhavsar, Jaysheel D.; Couch, Fergus J.; Kocher, Jean-Pierre A.

    2013-01-01

    Background Structural variation (SV) represents a significant, yet poorly understood, contribution to an individual’s genetic makeup. Advanced next-generation sequencing technologies are widely used to discover such variations, but no single detection tool is considered a community standard. In an attempt to fulfil this need, we developed an algorithm, SoftSearch, for discovering structural variant breakpoints in Illumina paired-end next-generation sequencing data. SoftSearch combines multiple strategies for detecting SV, including split-read, discordant read-pair, and unmated pairs. Co-localized split-reads and discordant read pairs are used to refine the breakpoints. Results We developed and validated SoftSearch using real and synthetic datasets. SoftSearch’s key features are 1) not requiring secondary (or exhaustive primary) alignment, 2) portability into established sequencing workflows, and 3) applicability to any DNA-sequencing experiment (e.g. whole genome, exome, custom capture, etc.). SoftSearch identifies breakpoints from a small number of soft-clipped bases from split reads and a few discordant read-pairs which on their own would not be sufficient to make an SV call. Conclusions We show that SoftSearch can identify more true SVs by combining multiple sequence features. SoftSearch was able to call clinically relevant SVs in the BRCA2 gene not reported by other tools while offering significantly improved overall performance. PMID:24358278
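    Split-read evidence starts from soft clips in the SAM CIGAR string. The sketch below is not SoftSearch's code. It assumes reads are given as (1-based leftmost position, CIGAR) pairs, and the thresholds `min_clip` and `min_support` are hypothetical. Each clip is converted to a reference coordinate, and positions supported by several reads are kept. Discordant-pair evidence is not modeled.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_breakpoint(pos, cigar, min_clip=5):
    """Reference coordinate immediately after a soft-clip junction, or None.

    `pos` is the 1-based leftmost aligned position (SAM column 4). Hard clips
    are ignored. If both ends are soft-clipped, only the leading clip is used.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return None
    if ops[0][1] == "S" and ops[0][0] >= min_clip:
        return pos  # junction lies just before the first aligned base
    if ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        ref_span = sum(n for n, op in ops if op in REF_CONSUMING)
        return pos + ref_span  # junction lies just after the last aligned base
    return None


def cluster_breakpoints(reads, min_support=2):
    """Count split-read breakpoints and keep those with enough supporting reads."""
    counts = Counter()
    for pos, cigar in reads:
        bp = clip_breakpoint(pos, cigar)
        if bp is not None:
            counts[bp] += 1
    return {bp: n for bp, n in counts.items() if n >= min_support}


reads = [(100, "20M30S"), (105, "15M35S"), (120, "10S40M"), (200, "50M"), (300, "3S47M")]
print(cluster_breakpoints(reads))  # {120: 3}
```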

  7. Sequence variation and differential splicing of the midgut cadherin gene in Trichoplusia ni.

    PubMed

    Zhang, Xin; Kain, Wendy; Wang, Ping

    2013-08-01

    The insect midgut cadherin serves as an important receptor for the Cry toxins from Bacillus thuringiensis (Bt). Variation of the cadherin in insect populations provides a genetic potential for the development of cadherin-based Bt resistance in insect populations. Sequence analysis of the cadherin from the cabbage looper, Trichoplusia ni, together with cadherins from 18 other lepidopterans showed that the phylogenetic relationship of the cadherins is similar to the phylogeny of Lepidoptera. The midgut cadherin in three laboratory populations of T. ni exhibited high variability, although the resistance to Bt toxin Cry1Ac in the T. ni strain is not genetically associated with cadherin gene mutations. A total of 142 single nucleotide polymorphisms (SNPs) were identified in the cadherin cDNAs from the T. ni strains, including 20 missense mutations. In addition, insertion and deletion polymorphisms (indels) were also identified in the cadherin alleles in T. ni. More interestingly, the results from this study reveal that differential splicing of mRNA also occurs in cadherin gene expression. Therefore, variation of the midgut cadherin in insects may be caused not only by cadherin gene mutations but also by alternative splicing of its mRNA regulated by factors acting in trans. Analysis of cadherin gene alleles in F2, F3 and F4 progenies from the cross between the Cry1Ac-resistant strain and the susceptible strain, after consecutive selections with Cry1Ac for three generations, showed that selection with Cry1Ac did not increase the frequencies of the cadherin alleles originating from the resistant strain. PMID:23743444
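    Separating missense SNPs from the total SNP count requires translating codons. The following minimal sketch classifies each single-base difference between two aligned, gap-free, in-frame coding sequences using the standard genetic code. It is an illustration and not the authors' pipeline: indels are out of scope, each SNP is scored on the reference background of its codon, and the function name `classify_snps` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snps(ref_cds, alt_cds):
    """Return (1-based position, ref base, alt base, class) for each SNP."""
    if len(ref_cds) != len(alt_cds) or len(ref_cds) % 3:
        raise ValueError("expected equal-length, in-frame coding sequences")
    calls = []
    for i in range(0, len(ref_cds), 3):
        ref_codon, alt_codon = ref_cds[i:i + 3].upper(), alt_cds[i:i + 3].upper()
        for offset in range(3):
            if ref_codon[offset] == alt_codon[offset]:
                continue
            # Apply this single change to the reference codon and translate.
            mutant = ref_codon[:offset] + alt_codon[offset] + ref_codon[offset + 1:]
            ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[mutant]
            if ref_aa == alt_aa:
                kind = "synonymous"
            elif alt_aa == "*":
                kind = "nonsense"
            else:
                kind = "missense"
            calls.append((i + offset + 1, ref_codon[offset], alt_codon[offset], kind))
    return calls


print(classify_snps("ATGAAA", "ATGAGA"))  # [(5, 'A', 'G', 'missense')]
```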

  8. Genome-wide copy number variation in the bovine genome detected using low coverage sequence of popular beef breeds

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomic structural variations are an important source of genetic diversity. Copy number variations (CNVs), gains and losses of large regions of genomic sequence between individuals of a species, are known to be associated with both diseases and phenotypic traits. Deeply sequenced genomes are often u...

  9. Optimal assembly for high throughput shotgun sequencing

    PubMed Central

    2013-01-01

    We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier work, we design a de Bruijn graph-based assembly algorithm that comes very close to the lower bound for the repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization. PMID:23902516
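    As background to the de Bruijn approach mentioned above, the sketch below shows the basic textbook construction and not the paper's near-optimal algorithm. It assumes error-free reads whose k-mers cover the genome. Each distinct k-mer becomes an edge between its (k-1)-mer prefix and suffix, and Hierholzer's algorithm finds an Eulerian path. Because k-mers are deduplicated, repeats longer than k-1 make the path ambiguous, which is exactly the repeat limitation that the paper's conditions formalize.

```python
from collections import defaultdict


def de_bruijn_assemble(reads, k):
    """Assemble error-free reads via an Eulerian path in a de Bruijn graph."""
    graph = defaultdict(list)  # (k-1)-mer -> list of successor (k-1)-mers
    indeg = defaultdict(int)
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    for kmer in sorted(kmers):
        left, right = kmer[:-1], kmer[1:]
        graph[left].append(right)
        indeg[right] += 1
    # Start where out-degree exceeds in-degree by one, else at any node.
    nodes = sorted(set(graph) | set(indeg))
    start = next((n for n in nodes if len(graph[n]) - indeg[n] == 1), min(graph))
    # Iterative Hierholzer's algorithm: walk until stuck, then backtrack.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    # Spell the sequence: first node, then the last base of each later node.
    return path[0] + "".join(node[-1] for node in path[1:])


print(de_bruijn_assemble(["ATGGCG", "GGCGTG", "CGTGCA"], 4))  # ATGGCGTGCA
```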

  10. Advances in high throughput DNA sequence data compression.

    PubMed

    Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz

    2016-06-01

    Advances in high throughput sequencing technologies and the reduction in the cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges in the storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges, and various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genomes and reads. Algorithms are categorized as referential or reference-free. Experimental results and a comparative analysis of various data compression methods are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted. PMID:26846812
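    The two categories in the review can be illustrated with the simplest possible baselines. Neither is a method from the review. The first is a reference-free 2-bit packing that stores four A/C/G/T bases per byte and does not handle N or other IUPAC codes. The second is a referential encoding for equal-length sequences that stores only substitutions relative to a reference. Function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def pack(seq):
    """Reference-free: pack an A/C/G/T string into bytes, 2 bits per base."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)  # KeyError for non-ACGT symbols
        out.append(byte)
    return bytes(out), len(seq)


def unpack(data, length):
    """Invert pack(); `length` drops the padding in the final byte."""
    bases = [BASE[(byte >> (2 * j)) & 3] for byte in data for j in range(4)]
    return "".join(bases[:length])


def ref_diff(reference, target):
    """Referential: store (position, base) only where target differs."""
    return [(i, b) for i, (a, b) in enumerate(zip(reference, target)) if a != b]


def ref_restore(reference, diffs):
    """Rebuild the target from the reference and its substitution list."""
    seq = list(reference)
    for i, b in diffs:
        seq[i] = b
    return "".join(seq)


data, n = pack("ACGTTGCAA")
print(len(data), unpack(data, n))            # 3 ACGTTGCAA
print(ref_diff("ACGTACGT", "ACGAACGA"))      # [(3, 'A'), (7, 'A')]
```

    Real tools typically add entropy coding, handle indels in the referential case, and compress quality scores and read names separately.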

  11. Detailed Analysis of Sequence Changes Occurring during vlsE Antigenic Variation in the Mouse Model of Borrelia burgdorferi Infection

    PubMed Central

    Coutte, Loïc; Botkin, Douglas J.; Gao, Lihui; Norris, Steven J.

    2009-01-01

    Lyme disease Borrelia can infect humans and animals for months to years, despite the presence of an active host immune response. The vls antigenic variation system, which expresses the surface-exposed lipoprotein VlsE, plays a major role in B. burgdorferi immune evasion. Gene conversion between vls silent cassettes and the vlsE expression site occurs at high frequency during mammalian infection, resulting in sequence variation in the VlsE product. In this study, we examined vlsE sequence variation in B. burgdorferi B31 during mouse infection by analyzing 1,399 clones isolated from bladder, heart, joint, ear, and skin tissues of mice infected for 4 to 365 days. The median number of codon changes increased progressively in C3H/HeN mice from 4 to 28 days post infection, and no clones retained the parental vlsE sequence at 28 days. In contrast, the decrease in the number of clones with the parental vlsE sequence and the increase in the number of sequence changes occurred more gradually in severe combined immunodeficiency (SCID) mice. Clones containing a stop codon were isolated, indicating that continuous expression of full-length VlsE is not required for survival in vivo; also, these clones continued to undergo vlsE recombination. Analysis of clones with apparent single recombination events indicated that recombinations into vlsE are nonselective with regard to the silent cassette utilized, as well as the length and location of the recombination event. Sequence changes as small as one base pair were common. Fifteen percent of recovered vlsE variants contained “template-independent” sequence changes, which clustered in the variable regions of vlsE. We hypothesize that the increased frequency and complexity of vlsE sequence changes observed in clones recovered from immunocompetent mice (as compared with SCID mice) is due to rapid clearance of relatively invariant clones by variable region-specific anti-VlsE antibody responses. PMID:19214205
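    The clone-level summaries above (codon changes relative to the parental vlsE, presence of stop codons, median change count) reduce to simple comparisons once sequences are aligned. The minimal sketch below assumes gap-free, in-frame, equal-length sequences and uses hypothetical toy clones rather than B31 data. It does not attempt to assign changes to specific silent cassettes.

```python
from statistics import median

STOPS = {"TAA", "TAG", "TGA"}


def codon_changes(parental, clone):
    """Number of codons in `clone` that differ from the parental sequence."""
    return sum(parental[i:i + 3] != clone[i:i + 3] for i in range(0, len(parental), 3))


def has_internal_stop(clone):
    """True if an in-frame stop codon occurs before the final codon."""
    return any(clone[i:i + 3] in STOPS for i in range(0, len(clone) - 3, 3))


parental = "ATGGCTAAAGGT"                                   # hypothetical
clones = ["ATGGCTAAAGGT", "ATGGCCAAAGGA", "ATGTAAAAAGGT"]   # hypothetical
changes = [codon_changes(parental, c) for c in clones]
print(changes, median(changes))                      # [0, 2, 1] 1
print([has_internal_stop(c) for c in clones])        # [False, False, True]
```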

  12. Effective normalization for copy number variation detection from whole genome sequencing

    PubMed Central

    2012-01-01

    Background Whole genome sequencing enables a high resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. A number of tools have been developed to infer copy number variation in the genome. These tools, while validated, also include a number of parameters that can be configured for the genome data being analyzed. These algorithms allow for normalization to account for individual and population-specific effects on individual genome CNV estimates, but the impact of these changes on the estimated CNVs is not well characterized. We evaluate in detail the effect of normalization methodologies in two CNV algorithms, FREEC and CNV-seq, using whole genome sequencing data from 8 individuals spanning four populations. Methods We apply FREEC and CNV-seq to a sequencing data set consisting of 8 genomes. We use multiple configurations corresponding to different read-count normalization methodologies in FREEC, and statistically characterize the concordance of the CNV calls between FREEC configurations and the analogous output from CNV-seq. The normalization methodologies evaluated in FREEC are GC content, mappability, and control genome. We further stratify the concordance analysis within genic, non-genic, and a collection of validated variant regions. Results The GC content normalization methodology generates the highest number of altered copy number regions. Both mappability and control genome normalization reduce the total number and length of copy number regions. Mappability normalization yields Jaccard indices in the 0.07-0.3 range, whereas control genome normalization yields Jaccard index values around 0.4 with normalization based on GC content. The most critical impact of using mappability as a normalization factor is a substantial reduction of deletion CNV calls. The output of another method based on control genome normalization, CNV-seq, resulted in comparable CNV call profiles, and substantial agreement in variable
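    Concordance between two CNV call sets is reported above as a Jaccard index. The abstract does not define it precisely, so the sketch below assumes a base-pair version: |A ∩ B| / |A ∪ B| over the genomic positions covered by each call set. It works on half-open intervals on a single chromosome, and the helper names are hypothetical.

```python
def merge(intervals):
    """Merge overlapping half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered(intervals):
    """Total number of bases covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def intersection_length(a, b):
    """Bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard(a, b):
    """Base-pair Jaccard index of two CNV call sets."""
    inter = intersection_length(a, b)
    union = covered(a) + covered(b) - inter
    return inter / union if union else 1.0


print(jaccard([(0, 100), (200, 300)], [(0, 100)]))  # 0.5
```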

  13. GeneSV - an Approach to Help Characterize Possible Variations in Genomic and Protein Sequences.

    PubMed

    Zemla, Adam; Kostova, Tanya; Gorchakov, Rodion; Volkova, Evgeniya; Beasley, David W C; Cardosa, Jane; Weaver, Scott C; Vasilakis, Nikos; Naraghi-Arani, Pejman

    2014-01-01

    A computational approach for identification and assessment of genomic sequence variability (GeneSV) is described. For a given nucleotide sequence, GeneSV collects information about the permissible nucleotide variability (changes that potentially preserve function) observed in corresponding regions in genomic sequences, and combines it with conservation/variability results from protein sequence and structure-based analyses of evaluated protein coding regions. GeneSV was used to predict effects (functional vs. non-functional) of 37 amino acid substitutions on the NS5 polymerase (RdRp) of dengue virus type 2 (DENV-2), 36 of which are not observed in any publicly available DENV-2 sequence. Thirty-two novel mutants with single amino acid substitutions in the RdRp were generated using a DENV-2 reverse genetics system. In 81% (26 of 32) of predictions tested, GeneSV correctly predicted viability of introduced mutations. In 4 of 5 (80%) mutants with double amino acid substitutions proximal in structure to one another, GeneSV was also correct in its predictions. The predictive capabilities of the system were illustrated on a dengue RNA virus, but the general approach described in the manuscript for characterizing real or theoretically possible variations in genomic and protein sequences can be applied to any organism. PMID:24453480

  14. GeneSV – an Approach to Help Characterize Possible Variations in Genomic and Protein Sequences

    PubMed Central

    Zemla, Adam; Kostova, Tanya; Gorchakov, Rodion; Volkova, Evgeniya; Beasley, David W. C.; Cardosa, Jane; Weaver, Scott C.; Vasilakis, Nikos; Naraghi-Arani, Pejman

    2014-01-01

    A computational approach for identification and assessment of genomic sequence variability (GeneSV) is described. For a given nucleotide sequence, GeneSV collects information about the permissible nucleotide variability (changes that potentially preserve function) observed in corresponding regions in genomic sequences, and combines it with conservation/variability results from protein sequence and structure-based analyses of evaluated protein coding regions. GeneSV was used to predict effects (functional vs. non-functional) of 37 amino acid substitutions on the NS5 polymerase (RdRp) of dengue virus type 2 (DENV-2), 36 of which are not observed in any publicly available DENV-2 sequence. Thirty-two novel mutants with single amino acid substitutions in the RdRp were generated using a DENV-2 reverse genetics system. In 81% (26 of 32) of predictions tested, GeneSV correctly predicted viability of introduced mutations. In 4 of 5 (80%) mutants with double amino acid substitutions proximal in structure to one another, GeneSV was also correct in its predictions. The predictive capabilities of the system were illustrated on a dengue RNA virus, but the general approach described in the manuscript for characterizing real or theoretically possible variations in genomic and protein sequences can be applied to any organism. PMID:24453480

  15. Virus Load and Sequence Variation in Simian Retrovirus Type 2 Infection

    PubMed Central

    Rosenblum, Lisa L.; Weiss, Robin A.; McClure, Myra O.

    2000-01-01

    The natural history of type D simian retrovirus (SRV) infection is poorly characterized in terms of viral load, antibody status, and sequence variation. To investigate this, blood samples were taken from a small cohort of mostly asymptomatic cynomolgus macaques (Macaca fascicularis), naturally infected with SRV type 2 (SRV-2), some of which were followed over an 8-month period with blood taken every 2 months. Provirus and RNA virus loads were obtained, the samples were screened for presence of antibodies to SRV-2 and neutralizing antibody titers to SRV-2 were assayed. env sequences were aligned to determine intra- and intermonkey variation over time. Virus loads varied greatly among cohort individuals but, conversely, remained steady for each macaque over the 8-month period, regardless of their initial levels. No significant sequence variation was found within an individual over time. No clear picture emerged from these results, which indicate that the variables of SRV-2 infection are complex, differ from those for lentivirus infection, and are not distinctly related to disease outcome. PMID:10729117

  16. mit-o-matic: a comprehensive computational pipeline for clinical evaluation of mitochondrial variations from next-generation sequencing datasets.

    PubMed

    Vellarikkal, Shamsudheen Karuthedath; Dhiman, Heena; Joshi, Kandarp; Hasija, Yasha; Sivasubbu, Sridhar; Scaria, Vinod

    2015-04-01

    The human mitochondrial genome has been reported to have a very high mutation rate compared with the nuclear genome. A large number of mitochondrial mutations show significant phenotypic association and are involved in a broad spectrum of diseases. In recent years, there has been remarkable progress in the understanding of mitochondrial genetics. The availability of next-generation sequencing (NGS) technologies has not only reduced sequencing cost by orders of magnitude but has also provided good quality mitochondrial genome sequences with high coverage, thereby enabling the decoding of a number of human mitochondrial diseases. In this study, we report a computational and experimental pipeline to decipher human mitochondrial DNA variations and examine them for their clinical correlation. As a proof of principle, we also present a clinical study of a patient with Leigh disease and confirm maternal inheritance of the causative allele. The pipeline is available as a user-friendly online tool to annotate variants and find haplogroup, disease association, and heteroplasmic sites. The "mit-o-matic" computational pipeline represents a comprehensive cloud-based tool for clinical evaluation of mitochondrial genomic variations from NGS datasets. The tool is freely available at http://genome.igib.res.in/mitomatic/. PMID:25677119
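    Finding heteroplasmic sites from NGS data comes down to comparing per-site allele fractions against thresholds. The sketch below illustrates the idea only. The depth and fraction cut-offs are illustrative assumptions, not mit-o-matic's settings. The input is a hypothetical pileup mapping each position to its reference base and read counts per base.

```python
def call_sites(pileup, min_depth=100, het_min=0.05, homo_min=0.95):
    """Classify variant sites as homoplasmic or heteroplasmic.

    pileup maps position -> (reference_base, {base: read_count}).
    Thresholds are hypothetical defaults chosen for illustration.
    """
    calls = {}
    for pos, (ref, counts) in sorted(pileup.items()):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate an allele fraction
        alt, alt_count = max(
            ((b, n) for b, n in counts.items() if b != ref),
            key=lambda item: item[1],
            default=(None, 0),
        )
        fraction = alt_count / depth
        if fraction >= homo_min:
            calls[pos] = (ref, alt, round(fraction, 3), "homoplasmic")
        elif fraction >= het_min:
            calls[pos] = (ref, alt, round(fraction, 3), "heteroplasmic")
    return calls


pileup = {  # hypothetical read counts
    1000: ("A", {"A": 700, "G": 300}),
    2000: ("A", {"G": 990, "A": 10}),
    3000: ("C", {"C": 1000}),
    4000: ("T", {"T": 40, "C": 30}),
}
print(call_sites(pileup))
```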

  17. Variation and genetic structure of Tunisian Festuca arundinacea populations based on inter-simple sequence repeat pattern.

    PubMed

    Chtourou-Ghorbel, N; Elazreg, H; Ghariani, S; Ben Mheni, N; Sekmani, M; Chakroun, M; Trifi-Farah, N

    2015-01-01

    Tunisian tall fescue (Festuca arundinacea Schreb.) is an important grass for forage and soil conservation, particularly in marginal sites. Inter-simple sequence repeats were used to estimate genetic diversity within and among 8 natural populations and 1 cultivar from Northern Tunisia. A total of 181 polymorphic inter-simple sequence repeat markers were generated using 7 primers. Shannon's index and analysis of molecular variance revealed high molecular polymorphism at the intra-specific level for wild and cultivated accessions, showing that Tunisian tall fescue germplasm constitutes an important pool of diversity. Within-population variation accounted for 39.42% of the total variation, but no regional differentiation indicating close relationships between regions was discernible. Most of the variation (GST = 67%) occurred between populations rather than within populations. The ɸST (0.60) revealed high population structuring. Additionally, the population structure was independent of geographic origin and was not affected by environmental factors. The unweighted pair group method with arithmetic mean tree based on genetic similarity and principal coordinate analysis based on coefficient similarity illustrated that continental populations from the proximate localities of Beja and Jendouba were genetically closely related, while the wild Skalba population from the littoral Tunisian locality was the most divergent from the others. Moreover, the spontaneous population Sedjnane, originating from mountain areas, showed great molecular similarity to the local cultivar Mornag. The observed genetic diversity can be used to implement conservation strategies and breeding programs for improving forage crops in Tunisia. PMID:25966071
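    For dominant markers such as ISSR bands, Shannon's index is often computed per locus from the band frequency p as I = -[p ln p + (1 - p) ln(1 - p)] and then averaged across loci. The sketch below follows that common convention. The study's exact variant (log base, per-population averaging) is not stated in the abstract, so the natural log here is an assumption, and the 0/1 score matrix is hypothetical.

```python
import math


def shannon_index(band_matrix):
    """Mean Shannon index over loci for a 0/1 band-presence matrix.

    band_matrix: one list per individual, one 0/1 score per locus.
    """
    n_ind = len(band_matrix)
    n_loci = len(band_matrix[0])
    total = 0.0
    for locus in range(n_loci):
        p = sum(ind[locus] for ind in band_matrix) / n_ind  # band frequency
        for q in (p, 1 - p):
            if q > 0:  # 0 * ln(0) is taken as 0
                total -= q * math.log(q)
    return total / n_loci


scores = [[1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 1, 1]]  # hypothetical ISSR scores
print(round(shannon_index(scores), 4))  # 0.4621
```

    The third locus is monomorphic (p = 1) and contributes zero, which is why monomorphic bands lower the average index.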

  18. Genetic variation in and spatial structure of natural populations of Dipterocarpus alatus (Dipterocarpaceae) determined using single sequence repeat markers.

    PubMed

    Tam, N M; Duy, V D; Duc, N M; Giap, V D; Xuan, B T T

    2014-01-01

    Dipterocarpus alatus (Dipterocarpaceae) is widely distributed in lowland forests in central and southern Vietnam, Cambodia, Laos, Myanmar, Philippines, Thailand, and India. Due to over-exploitation and habitat destruction, the species is now threatened. The genetic variation within and among populations of D. alatus was investigated on the basis of 9 microsatellite (single sequence repeat, SSR) loci. In all, 268 sampled trees from 10 populations in central and southern Vietnam were analyzed in this study. The SSR data showed a high genetic variability within populations with an average of HO = 0.209 and HE = 0.239. Genetic differentiation among populations was high (FST = 0.266), indicating limited gene flow (Nm = 0.69). Analysis of molecular variance showed that most genetic variation was within populations (74.96%). This study highlights the importance of conserving the genetic resources of D. alatus species. PMID:25078594
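    The reported gene-flow estimate is consistent with the widely used island-model approximation Nm = (1/FST - 1)/4. With FST = 0.266 this gives (3.759 - 1)/4 ≈ 0.69. The abstract does not state which estimator the authors used, so treat this as a consistency check rather than their method. The snippet also shows Nei's expected heterozygosity at one locus, HE = 1 - Σ p_i², using hypothetical allele frequencies.

```python
def expected_heterozygosity(allele_freqs):
    """Nei's gene diversity at one locus: HE = 1 - sum(p_i ** 2)."""
    return 1.0 - sum(p * p for p in allele_freqs)


def migrants_per_generation(fst):
    """Island-model approximation: Nm = (1 / FST - 1) / 4."""
    return (1.0 / fst - 1.0) / 4.0


print(round(migrants_per_generation(0.266), 2))       # 0.69
print(round(expected_heterozygosity([0.9, 0.1]), 2))  # 0.18 (hypothetical locus)
```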

  19. Magnetic susceptibility variations in Loess sequences and their relationship to astronomical forcing

    NASA Technical Reports Server (NTRS)

    Verosub, Kenneth L.; Singer, Michael J.

    1992-01-01

    The long, well-exposed and often continuous sequences of loess found throughout the world are generally thought to provide an excellent opportunity for studying long-term, large-scale environmental change during the last few million years. In recent years, the most fruitful loess studies have been those involving the deposits of the loess in China. One of the most intriguing results of that work has been the discovery of an apparent correlation between variations in the magnetic susceptibility of the loess sequence and the oxygen isotope record of the deep sea. This correlation implies that magnetic susceptibility variations are being driven by astronomical parameters. However, the basic data have been interpreted in various ways by different authors, most of whom assumed that the magnetic minerals in the loess have not been affected by post-depositional processes. Using a chemical extraction procedure that allows us to separate the contribution of secondary pedogenic magnetic minerals from primary inherited magnetic minerals, we have found that the magnetic susceptibility of the Chinese paleosols is largely due to a pedogenic component which is present to a lesser degree in the loess. We have also found that the smaller inherited component of the magnetic susceptibility is about the same in the paleosols and the loess. These results demonstrate the need for additional study of the processes that create magnetic susceptibility variations in order to interpret properly the role of astronomical forcing in producing these variations.

  20. Bm86 midgut protein sequence variation in South Texas cattle fever ticks

    PubMed Central

    2010-01-01

    Background Cattle fever ticks, Rhipicephalus (Boophilus) microplus and R. (B.) annulatus, vector bovine and equine babesiosis, and have significantly expanded beyond the permanent quarantine zone established in South Texas. Currently, there are no vaccines approved for use within the United States for controlling these vectors. Vaccines developed in Australia and Cuba based on the midgut antigen Bm86 have variable efficacy against cattle fever ticks. A possible explanation for this variation in vaccine efficacy is amino acid sequence divergence between the recombinant Bm86 vaccine component and native Bm86 expressed in ticks from different geographical regions of the world. Results There was 91.8% amino acid sequence identity in Bm86 among R. microplus and R. annulatus sequenced from South Texas infestations. When South Texas isolates were compared to the Australian Yeerongpilly and Cuban Camcord vaccine strains, there was 89.8% and 90.0% identity, respectively. Most of the sequence divergence was focused in one region of the protein, amino acids 206-298. Hydrophilicity profiles revealed that two short regions of Bm86 (amino acids 206-210 and 560-570) appear to be more hydrophilic in South Texas isolates compared to vaccine strains. Only one amino acid difference was found between South Texas and vaccine strains within two previously described B-cell epitopes. A total of 4 amino acid differences were observed within three peptides previously shown to induce protective immune responses in cattle. Conclusions Sequence differences between South Texas isolates and Yeerongpilly and Camcord strains are spread throughout the entire Bm86 sequence, suggesting that geographic variation does exist. Differences within previously described B-cell epitopes between South Texas isolates and vaccine strains are minimal; however, short regions of hydrophilic amino acids found unique to South Texas isolates suggest that additional unique surface exposed peptides could be targeted
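    Percent identity figures like those above depend on how aligned columns are counted. One common convention is shown in the sketch below: identical residues divided by columns where neither sequence has a gap. The abstract does not say which denominator the authors used, so this choice is an assumption, and the aligned peptide strings are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        matches += a == b
    return 100.0 * matches / compared


print(percent_identity("MKV-LA", "MRVGLA"))  # 80.0
```

    Restricting the call to a slice of the alignment gives region-level identity, which is one way to see divergence concentrated in a segment such as the 206-298 region.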

  1. Simple sequence repeat variations expedite phage divergence: Mechanisms of indels and gene mutations.

    PubMed

    Lin, Tiao-Yin

    2016-07-01

    Phages are the most abundant biological entities and influence prokaryotic communities on Earth. Comparing closely related genomes sheds light on molecular events shaping phage evolution. Simple sequence repeat (SSR) variations account for over half of the genomic changes between T7M and T3, indicating an important role of SSRs in accelerating phage genetic divergence. Differences in coding and noncoding regions of phages infecting different hosts (coliphages T7M and T3, Yersinia phage ϕYeO3-12, and Salmonella phage ϕSG-JL2) frequently arise from SSR variations. Such variations modify noncoding and coding regions; in coding regions they efficiently change multiple amino acids, thereby hastening protein evolution. Four classes of events are found to drive SSR variations: insertion/deletion of SSR units, expansion/contraction of SSRs without alteration of genome length, changes of repeat motifs, and generation/loss of repeats. This categorization demonstrates the ways SSRs mutate in genomes during phage evolution. Indels are common constituents of genome variations and human diseases, yet how they occur without a preexisting repeat sequence is less understood. Non-repeat-unit-based misalignment-elongation (NRUBME) is proposed as one mechanism for indels without adjacent repeats. NRUBME or consecutive NRUBME may also change repeat motifs or generate new repeats. NRUBME invoking a non-Watson-Crick base pair explains insertions that initiate mononucleotide repeats. Furthermore, NRUBME successfully explains many previously inexplicable human di- to tetranucleotide repeat generations. This study provides the first evidence of SSR variations expediting phage divergence, and enables insights into the events and mechanisms of genome evolution. NRUBME allows us to emulate natural evolution to design indels for various applications. PMID:27133219
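    Comparing SSRs between related genomes starts with locating them. The sketch below is a simple regex-based detector and not the study's method. It finds motifs of 1-6 nt repeated in tandem at least `min_copies` times. These parameters are assumptions, matches are non-overlapping, and the lazy motif quantifier prefers the shortest repeating unit.

```python
import re


def find_ssrs(seq, min_copies=3, max_motif=6):
    """Return (0-based start, motif, copy number) for tandem repeats in `seq`."""
    seq = seq.upper()
    # ([ACGT]{1,max}?) captures a short motif; \1{n,} requires tandem copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


print(find_ssrs("TTGATATATAGG"))  # [(3, 'AT', 3)]
print(find_ssrs("GCAAAAAAGC"))    # [(2, 'A', 6)]
```

    Running the detector on aligned orthologous regions from two phages and comparing copy numbers or motifs at matching positions would show expansions, contractions and motif changes of the kind classified in the abstract.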

  2. High levels of variation in Salix lignocellulose genes revealed using poplar genomic resources

    PubMed Central

    2013-01-01

    Background Little is known about the levels of variation in lignin or other wood related genes in Salix, a genus that is being increasingly used for biomass and biofuel production. The lignin biosynthesis pathway is well characterized in a number of species, including the model tree Populus. We aimed to transfer the genomic resources already available in Populus to its sister genus Salix to assess levels of variation within genes involved in wood formation. Results Amplification trials for 27 gene regions were undertaken in 40 Salix taxa. Twelve of these regions were sequenced. Alignment searches of the resulting sequences against reference databases, combined with phylogenetic analyses, showed the close similarity of these Salix sequences to Populus, confirming homology of the primer regions and indicating a high level of conservation within the wood formation genes. However, all sequences were found to vary considerably among Salix species, mainly as SNPs with a smaller number of insertions-deletions. Between 25 and 176 SNPs per kbp per gene region (in predicted exons) were discovered within Salix. Conclusions The variation found is sizeable but not unexpected as it is based on interspecific and not intraspecific comparison; it is comparable to interspecific variation in Populus. The characterisation of genetic variation is a key process in pre-breeding and for the conservation and exploitation of genetic resources in Salix. This study characterises the variation in several lignocellulose gene markers for such purposes. PMID:23924375
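    SNP densities such as "25 to 176 SNPs per kbp" depend on the choice of denominator. The sketch below assumes a simple convention: count polymorphic columns among gap-free columns of a multiple alignment, scaled to 1,000 columns. Indel columns are excluded, matching the abstract's separation of SNPs from insertions-deletions. The alignment is hypothetical.

```python
def snps_per_kbp(alignment, gap="-"):
    """Polymorphic gap-free columns per 1,000 gap-free aligned columns."""
    columns = list(zip(*alignment))
    gap_free = [col for col in columns if gap not in col]
    snps = sum(len(set(col)) > 1 for col in gap_free)
    return 1000.0 * snps / len(gap_free)


aligned = ["ACGTAC", "ACGTTC", "AC-TAC"]  # hypothetical taxa
print(snps_per_kbp(aligned))  # 200.0
```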

  3. Advances in DNA sequencing technologies for high resolution HLA typing.

    PubMed

    Cereb, Nezih; Kim, Hwa Ran; Ryu, Jaejun; Yang, Soo Young

    2015-12-01

    This communication describes our experience in large-scale G group-level high resolution HLA typing using three different DNA sequencing platforms: ABI 3730 xl, Illumina MiSeq and PacBio RS II. Recent advances in DNA sequencing technologies, so-called next generation sequencing (NGS), have brought breakthroughs in deciphering the genetic information in all living species at a large scale and at an affordable cost. The NGS DNA indexing system allows sequencing of multiple genes for a large number of individuals in a single run. Our laboratory has adopted and used these technologies for HLA molecular testing services. We found that each sequencing technology has its own strengths and weaknesses, and that their sequencing performances complement each other. HLA genes are highly complex, and genotyping them is quite challenging. Using these three sequencing platforms, we were able to meet all requirements for G group-level high resolution and high volume HLA typing. PMID:26423536
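    The indexing system mentioned above works because each read carries a sample barcode that is matched back to its sample after the run. The sketch below is a generic demultiplexer, not the laboratory's pipeline or any vendor's software. It assumes equal-length barcodes, assigns a read to the nearest barcode within `max_mismatch` Hamming distance, and sends ties and distant indexes to an "undetermined" bin. Barcodes and reads are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, samples, max_mismatch=1):
    """Assign (index_sequence, read) pairs to samples by closest barcode."""
    bins = {name: [] for name in samples}
    bins["undetermined"] = []
    for index_seq, read in reads:
        scored = sorted((hamming(index_seq, bc), name) for name, bc in samples.items())
        best_dist, best = scored[0]
        ambiguous = len(scored) > 1 and scored[1][0] == best_dist
        if best_dist <= max_mismatch and not ambiguous:
            bins[best].append(read)
        else:
            bins["undetermined"].append(read)
    return bins


samples = {"S1": "ACGTAC", "S2": "TGCATG"}  # hypothetical barcodes
reads = [("ACGTAC", "r1"), ("ACGTAA", "r2"), ("TGCATG", "r3"), ("GGGGGG", "r4")]
print(demultiplex(reads, samples))
# {'S1': ['r1', 'r2'], 'S2': ['r3'], 'undetermined': ['r4']}
```

    Barcode sets are normally designed with a large minimum pairwise distance, so that a one-mismatch tolerance cannot assign a read to the wrong sample.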

  4. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    PubMed Central

    2013-01-01

    Background Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Results Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li’s D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li’s D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low
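    Nucleotide diversity and sliding-window scans, as used above, can be illustrated with the simplest estimator. π is the mean number of pairwise differences per site among aligned sequences. The study's statistics (including Fu and Li's D) are more involved. This sketch assumes gap-free, equal-length sequences, and the window and step sizes are hypothetical parameters.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites (pi) among aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


def sliding_pi(seqs, window, step):
    """List of (window start, pi) along the alignment."""
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, len(seqs[0]) - window + 1, step)
    ]


seqs = ["AAAA", "AAAT", "AATT"]  # hypothetical consensus sequences
print(round(nucleotide_diversity(seqs), 4))  # 0.3333
print(sliding_pi(seqs, 2, 2))
```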

  5. Population subdivision and molecular sequence variation: theory and analysis of Drosophila ananassae data.

    PubMed Central

    Vogl, Claus; Das, Aparup; Beaumont, Mark; Mohanty, Sujata; Stephan, Wolfgang

    2003-01-01

    Population subdivision complicates analysis of molecular variation. Even if neutrality is assumed, three evolutionary forces need to be considered: migration, mutation, and drift. Simplification can be achieved by assuming that the process of migration among and drift within subpopulations is occurring fast compared to mutation and drift in the entire population. This allows a two-step approach in the analysis: (i) analysis of population subdivision and (ii) analysis of molecular variation in the migrant pool. We model population subdivision using an infinite island model, where we allow the migration/drift parameter Theta to vary among populations. Thus, central and peripheral populations can be differentiated. For inference of Theta, we use a coalescence approach, implemented via a Markov chain Monte Carlo (MCMC) integration method that allows estimation of allele frequencies in the migrant pool. The second step of this approach (analysis of molecular variation in the migrant pool) uses the estimated allele frequencies in the migrant pool for the study of molecular variation. We apply this method to a Drosophila ananassae sequence data set. We find little indication of isolation by distance, but large differences in the migration parameter among populations. The population as a whole seems to be expanding. A population from Bogor (Java, Indonesia) shows the highest variation and seems closest to the species center. PMID:14668389

  6. Association Between Sequence Variations in RCAN1 Promoter and the Risk of Sporadic Congenital Heart Disease in a Chinese Population.

    PubMed

    Li, Xiaoyong; Wang, Gang; An, Yong; Li, Hongbo; Li, Yonggang; Wu, Chun

    2015-10-01

    The pathogenesis of congenital heart disease (CHD) is unclear. CHD has a high incidence in Down syndrome, in which RCAN1 (regulator of calcineurin 1) overexpression is observed. However, whether RCAN1 plays an important role in non-syndromic CHD is unknown. This study investigates the relationship between sequence variations in the RCAN1 promoter and sporadic CHD. In this case-control study, the RCAN1 promoter was cloned and sequenced in 128 CHD patients (median age 1.1 years) and 150 normal controls (median age 3.0 years). No mutations were identified. Three single-nucleotide (C to T) polymorphisms were detected: rs193289374, rs149048873 and rs143081213. Logistic regression analysis showed no association between these polymorphisms and CHD risk. In vitro functional assays showed that, compared with the wild-type genotype, the rs149048873 polymorphism decreased and the rs143081213 polymorphism increased RCAN1 promoter activity, whereas rs193289374 had no effect. In conclusion, sequence variations in the RCAN1 promoter are not major genetic factors in sporadic CHD, at least in the population studied. PMID:25863471

  7. Chromosomal localization and sequence variation of 5S rRNA gene in five Capsicum species.

    PubMed

    Park, Y K; Park, K C; Park, C H; Kim, N S

    2000-02-29

    Chromosomal localization and sequence analysis of the 5S rRNA gene were carried out in five Capsicum species. Fluorescence in situ hybridization revealed that the 5S rRNA gene was conserved at a single locus on one chromosome, which was assigned to chromosome 1 based on its synteny relationship with tomato. Sequence analysis showed that the repeating units of the 5S rRNA genes in the Capsicum species varied in size from 278 bp to 300 bp. Compared with published sequences from other Solanaceae plants, the coding region was highly conserved, but the spacer regions varied in size and sequence. T-stretch regions, just after the end of the coding sequences, were more prominent in the Capsicum species than in the two other plants. Highly GC-rich regions, which might function similarly to the GC islands of genes transcribed by RNA polymerase II, were observed after the T-stretch region. Although no TATA-like sequences were observed, an AT-rich segment at -27 to -18 was detected in the 5S rRNA genes of the Capsicum species. Relationships among the Capsicum species were also studied by comparing their 5S rRNA gene sequences. While C. chinense, C. frutescens, and C. annuum formed one lineage, C. baccatum appeared to be intermediate between these three species and C. pubescens. PMID:10774742

  8. Complete plastid genome sequence of Primula sinensis (Primulaceae): structure comparison, sequence variation and evidence for accD transfer to nucleus

    PubMed Central

    Liu, Tong-Jian; Zhang, Cai-Yun; Yan, Hai-Fei; Zhang, Lu

    2016-01-01

    The species-rich genus Primula L. is a typical plant group for studying genetic variation between species at different levels of relationship. Chloroplast genome sequences serve as an information resource for quantifying these differences and reconstructing evolutionary history. In this study, we report the complete chloroplast genome sequence of Primula sinensis and compare it with those of related species. The chloroplast genome has a typical circular quadripartite structure, 150,859 bp long with a GC content of 37.2%. Two inverted repeat (IR) regions (25,535 bp each) are separated by a large single-copy region (82,064 bp) and a small single-copy region (17,725 bp). The genome contains 112 genes, including 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Of these, seven coding genes, seven tRNA genes and four rRNA genes have two copies because they are located in the IR regions. The accD and infA genes, which lack intact open reading frames (ORFs), were identified as pseudogenes. SSR and sequence variation analyses were also performed on the P. sinensis plastome by comparison with the available plastome of P. poissonii. The four most variable regions, rpl36–rps8, rps16–trnQ, trnH–psbA and ndhC–trnV, were identified. Phylogenetic analyses of three sub-datasets extracted from a matrix of 57 protein-coding gene sequences yielded identical results, consistent with previous studies. A transcript from the P. sinensis transcriptome showed high similarity to the functional region of plastid accD and carried a putative plastid transit peptide at its N-terminal region. This result strongly suggests that plastid accD has been functionally transferred to the nucleus in P. sinensis. PMID:27375965
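    The region lengths reported for the quadripartite plastome can be checked against the total genome length, remembering that the inverted repeat is counted twice (IRa and IRb). A minimal arithmetic sketch, using only the numbers given in the abstract; the helper name is illustrative:

    ```python
    # Region lengths (bp) as reported for the Primula sinensis plastome.
    LSC = 82_064  # large single-copy region
    SSC = 17_725  # small single-copy region
    IR = 25_535   # length of each inverted repeat (IRa and IRb)


    def quadripartite_length(lsc, ssc, ir):
        """Circular plastome length: LSC + IRb + SSC + IRa."""
        return lsc + ssc + 2 * ir


    genes = {"protein-coding": 78, "tRNA": 30, "rRNA": 4}

    total = quadripartite_length(LSC, SSC, IR)
    print(total)                # 150859, matching the reported genome length
    print(sum(genes.values()))  # 112, matching the reported gene count
    ```
    
    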

  9. Nonoverlapping clone pooling for high-throughput sequencing.

    PubMed

    Kuroshu, Reginaldo M

    2013-01-01

    Simultaneously sequencing multiple clones using second-generation sequencers can speed up many essential clone-based sequencing methods. However, in applications such as fosmid clone sequencing and full-length cDNA sequencing, it is important to create pools of clones that do not overlap on the genome for the identification of structural variations and alternatively spliced transcripts, respectively. We define the nonoverlapping clone pooling problem and provide practical solutions based on optimal graph coloring and bin-packing algorithms with constant absolute worst-case ratios, and further extend them to cope with repetitive mappings. Using theoretical analysis and experiments, we also show that the proposed methods are applicable. PMID:24384700
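    In the simplest version of the pooling problem (each clone maps uniquely to one chromosome and pool size is unlimited), clones are intervals and the overlap graph is an interval graph. A textbook greedy coloring in order of start coordinate is optimal for interval graphs: it uses exactly as many pools as the maximum number of clones covering any single position. The sketch below shows only that special case; it is not the author's algorithm, which additionally uses bin packing and handles repetitive mappings. The clone coordinates are hypothetical.

    ```python
    import heapq


    def pool_clones(clones):
        """Assign clones to pools so that no two clones in a pool overlap.

        clones: list of (start, end) mapped coordinates on one chromosome,
        inclusive at both ends. Returns (pool index per clone, number of pools).
        """
        order = sorted(range(len(clones)), key=lambda i: clones[i][0])
        free_at = []  # min-heap of (end of the last clone placed in a pool, pool id)
        assignment = [None] * len(clones)
        n_pools = 0
        for i in order:
            start, end = clones[i]
            if free_at and free_at[0][0] < start:
                # The pool whose last clone ends earliest is already clear of this clone.
                _, pool = heapq.heappop(free_at)
            else:
                pool = n_pools  # every existing pool still overlaps: open a new one
                n_pools += 1
            assignment[i] = pool
            heapq.heappush(free_at, (end, pool))
        return assignment, n_pools


    clones = [(0, 40), (10, 50), (45, 90), (60, 100), (95, 120)]  # hypothetical
    print(pool_clones(clones))  # ([0, 1, 0, 1, 0], 2)
    ```

    Because clones are processed by start position, the last clone added to a pool always has that pool's largest end coordinate, so checking only the top of the heap is enough.
    
    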

  10. Whole-Genome Sequencing Reveals Diverse Models of Structural Variations in Esophageal Squamous Cell Carcinoma.

    PubMed

    Cheng, Caixia; Zhou, Yong; Li, Hongyi; Xiong, Teng; Li, Shuaicheng; Bi, Yanghui; Kong, Pengzhou; Wang, Fang; Cui, Heyang; Li, Yaoping; Fang, Xiaodong; Yan, Ting; Li, Yike; Wang, Juan; Yang, Bin; Zhang, Ling; Jia, Zhiwu; Song, Bin; Hu, Xiaoling; Yang, Jie; Qiu, Haile; Zhang, Gehong; Liu, Jing; Xu, Enwei; Shi, Ruyi; Zhang, Yanyan; Liu, Haiyan; He, Chanting; Zhao, Zhenxiang; Qian, Yu; Rong, Ruizhou; Han, Zhiwei; Zhang, Yanlin; Luo, Wen; Wang, Jiaqian; Peng, Shaoliang; Yang, Xukui; Li, Xiangchun; Li, Lin; Fang, Hu; Liu, Xingmin; Ma, Li; Chen, Yunqing; Guo, Shiping; Chen, Xing; Xi, Yanfeng; Li, Guodong; Liang, Jianfang; Yang, Xiaofeng; Guo, Jiansheng; Jia, JunMei; Li, Qingshan; Cheng, Xiaolong; Zhan, Qimin; Cui, Yongping

    2016-02-01

    Comprehensive identification of somatic structural variations (SVs) and understanding their mutational mechanisms in cancer might contribute to understanding biological differences and help to identify new therapeutic targets. Unfortunately, complex SVs across the whole genome and their mutational mechanisms in esophageal squamous cell carcinoma (ESCC) remain largely uncharacterized. To define a comprehensive catalog of somatic SVs, affected target genes, and their underlying mechanisms in ESCC, we re-analyzed whole-genome sequencing (WGS) data from 31 ESCCs using the Meerkat algorithm to predict somatic SVs and Patchwork to determine copy-number changes. We found deletions and translocations with NHEJ and alt-EJ signatures to be the dominant SV types, and 16% of deletions were complex deletions. SVs frequently led to disruption of cancer-associated genes (e.g., CDKN2A and NOTCH1) with different mutational mechanisms. Moreover, chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally mis-arranged chromosomes that occurred in 55% of ESCCs. These genomic catastrophes led to amplification of oncogenes through chromothripsis-derived double-minute chromosome formation (e.g., FGFR1 and LETM2) or BFB-affected chromosomes (e.g., CCND1, EGFR, ERBB2, MMPs, and MYC), with approximately 30% of ESCCs harboring BFB-derived CCND1 amplification. Furthermore, analyses of copy-number alterations revealed a high frequency of whole-genome duplication (WGD) and recurrent focal amplification of CDCA7, which might act as a potential oncogene in ESCC. Our findings reveal molecular defects such as chromothripsis and BFB in the malignant transformation of ESCCs and demonstrate diverse models of SV-derived target genes in ESCCs. These genome-wide SV profiles and their underlying mechanisms have preventive, diagnostic, and therapeutic implications for ESCCs. PMID:26833333

  11. Whole-Genome Sequencing Reveals Diverse Models of Structural Variations in Esophageal Squamous Cell Carcinoma

    PubMed Central

    Cheng, Caixia; Zhou, Yong; Li, Hongyi; Xiong, Teng; Li, Shuaicheng; Bi, Yanghui; Kong, Pengzhou; Wang, Fang; Cui, Heyang; Li, Yaoping; Fang, Xiaodong; Yan, Ting; Li, Yike; Wang, Juan; Yang, Bin; Zhang, Ling; Jia, Zhiwu; Song, Bin; Hu, Xiaoling; Yang, Jie; Qiu, Haile; Zhang, Gehong; Liu, Jing; Xu, Enwei; Shi, Ruyi; Zhang, Yanyan; Liu, Haiyan; He, Chanting; Zhao, Zhenxiang; Qian, Yu; Rong, Ruizhou; Han, Zhiwei; Zhang, Yanlin; Luo, Wen; Wang, Jiaqian; Peng, Shaoliang; Yang, Xukui; Li, Xiangchun; Li, Lin; Fang, Hu; Liu, Xingmin; Ma, Li; Chen, Yunqing; Guo, Shiping; Chen, Xing; Xi, Yanfeng; Li, Guodong; Liang, Jianfang; Yang, Xiaofeng; Guo, Jiansheng; Jia, JunMei; Li, Qingshan; Cheng, Xiaolong; Zhan, Qimin; Cui, Yongping

    2016-01-01

    Comprehensive identification of somatic structural variations (SVs) and understanding their mutational mechanisms in cancer might contribute to understanding biological differences and help to identify new therapeutic targets. Unfortunately, complex SVs across the whole genome and their mutational mechanisms in esophageal squamous cell carcinoma (ESCC) remain largely uncharacterized. To define a comprehensive catalog of somatic SVs, affected target genes, and their underlying mechanisms in ESCC, we re-analyzed whole-genome sequencing (WGS) data from 31 ESCCs using the Meerkat algorithm to predict somatic SVs and Patchwork to determine copy-number changes. We found deletions and translocations with NHEJ and alt-EJ signatures to be the dominant SV types, and 16% of deletions were complex deletions. SVs frequently led to disruption of cancer-associated genes (e.g., CDKN2A and NOTCH1) with different mutational mechanisms. Moreover, chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally mis-arranged chromosomes that occurred in 55% of ESCCs. These genomic catastrophes led to amplification of oncogenes through chromothripsis-derived double-minute chromosome formation (e.g., FGFR1 and LETM2) or BFB-affected chromosomes (e.g., CCND1, EGFR, ERBB2, MMPs, and MYC), with approximately 30% of ESCCs harboring BFB-derived CCND1 amplification. Furthermore, analyses of copy-number alterations revealed a high frequency of whole-genome duplication (WGD) and recurrent focal amplification of CDCA7, which might act as a potential oncogene in ESCC. Our findings reveal molecular defects such as chromothripsis and BFB in the malignant transformation of ESCCs and demonstrate diverse models of SV-derived target genes in ESCCs. These genome-wide SV profiles and their underlying mechanisms have preventive, diagnostic, and therapeutic implications for ESCCs. PMID:26833333

  12. High-throughput sequencing and vaccine design.

    PubMed

    Luciani, F

    2016-04-01

    Next-generation sequencing (NGS) technologies have reshaped genome research. The resulting increase in sequencing depth and resolution has led to an unprecedented level of genomic detail and thus an increasing awareness of the complexity of animal, human and pathogen genomes. This has resulted in new approaches to vaccine research. On the one hand, the increase in genome complexity challenges our ability to study and understand pathogen biology and pathogen-host interactions. On the other hand, the increase in genomic data also provides key information for developing and designing improved vaccines against pathogens that were previously extremely difficult to deal with, such as rapidly mutating RNA viruses or bacteria that have complex interactions with the host immune system. This review describes how the broad application of NGS technologies to genome research is affecting vaccine research. It focuses on implications for the field of viral genomics, and includes recent animal and human studies. PMID:27217168

  13. Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV

    PubMed Central

    Sathirapongsasuti, Jarupon Fah; Lee, Hane; Horst, Basil A. J.; Brunner, Georg; Cochran, Alistair J.; Binder, Scott; Quackenbush, John; Nelson, Stanley F.

    2011-01-01

    Motivation: The ability to detect copy-number variation (CNV) and loss of heterozygosity (LOH) from exome sequencing data extends the utility of this powerful approach, which has mainly been used to detect point mutations and small insertions/deletions. Results: We present ExomeCNV, a statistical method to detect CNV and LOH from mapped short sequence reads using depth-of-coverage and B-allele frequencies, and we assess both the method's power and the effects of confounding variables. We apply our method to a cancer exome resequencing dataset. As expected, accuracy and resolution depend on depth-of-coverage and capture probe design. Availability: CRAN package ‘ExomeCNV’. Contact: fsathira@fas.harvard.edu; snelson@ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21828086
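    The two signals named in the abstract can be illustrated with a short sketch: a library-size-normalized log2 ratio of tumor to normal depth for an exon (a gain or loss shifts it away from 0), and the B-allele frequency (BAF) at a SNP, where heterozygosity in the normal sample but near-homozygosity in the tumor hints at LOH. ExomeCNV itself is an R package with its own statistical tests; this Python sketch does not reproduce them, and the function names, counts and thresholds are hypothetical.

    ```python
    import math


    def log2_depth_ratio(tumor_depth, normal_depth, tumor_total, normal_total):
        """Log2 tumor/normal coverage ratio for one exon, normalized by library size."""
        return math.log2((tumor_depth / tumor_total) / (normal_depth / normal_total))


    def b_allele_frequency(alt_reads, ref_reads):
        """Fraction of reads supporting the alternate (B) allele at a SNP."""
        return alt_reads / (alt_reads + ref_reads)


    def suggests_loh(normal_baf, tumor_baf, het_band=(0.3, 0.7), hom_margin=0.1):
        """Heterozygous in the normal sample but near-homozygous in the tumor.

        The thresholds are hypothetical placeholders, not ExomeCNV defaults.
        """
        heterozygous_normal = het_band[0] <= normal_baf <= het_band[1]
        homozygous_tumor = tumor_baf <= hom_margin or tumor_baf >= 1 - hom_margin
        return heterozygous_normal and homozygous_tumor


    # Hypothetical counts: the tumor library is twice the size of the normal one.
    print(log2_depth_ratio(400, 100, 2_000_000, 1_000_000))  # about 1.0: possible gain
    print(suggests_loh(b_allele_frequency(48, 52), b_allele_frequency(3, 57)))  # True
    ```
    
    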

  14. High-throughput sequencing of cytosine methylation in plant DNA

    PubMed Central

    2013-01-01

    Cytosine methylation is a significant and widespread regulatory factor in plant systems. Methods for the high-throughput sequencing of methylation have allowed a greatly improved characterisation of the methylome. Here we discuss currently available methods for generation and analysis of high-throughput sequencing of methylation data. We also discuss the results previously acquired through sequencing plant methylomes, and highlight remaining challenges in this field. PMID:23758782
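    As a concrete illustration of methylome analysis, the sketch below assumes bisulfite sequencing (the abstract does not name a specific method), in which unmethylated cytosines are read as T while methylated cytosines remain C. The methylation level at a cytosine is then the fraction of C calls, and plant analyses conventionally summarize it by sequence context: CG, CHG and CHH, where H is A, C or T. The reference sequence and read counts are hypothetical, and only forward-strand cytosines are handled.

    ```python
    def cytosine_context(ref, pos):
        """Plant methylation context (CG, CHG or CHH; H = A, C or T) of the C at 0-based pos.

        Forward strand only; ambiguous bases such as N are not filtered in this sketch.
        Returns None when the sequence ends too early to decide.
        """
        if ref[pos] != "C":
            raise ValueError("reference base is not C")
        if pos + 1 >= len(ref):
            return None
        if ref[pos + 1] == "G":
            return "CG"
        if pos + 2 >= len(ref):
            return None
        return "CHG" if ref[pos + 2] == "G" else "CHH"


    def methylation_level(c_reads, t_reads):
        """Bisulfite logic: methylated C is read as C, unmethylated C as T."""
        total = c_reads + t_reads
        return c_reads / total if total else None


    def mean_level_by_context(ref, calls):
        """calls: {pos: (c_reads, t_reads)} for covered cytosines; mean level per context."""
        sums, counts = {}, {}
        for pos, (c, t) in calls.items():
            ctx = cytosine_context(ref, pos)
            level = methylation_level(c, t)
            if ctx is None or level is None:
                continue
            sums[ctx] = sums.get(ctx, 0.0) + level
            counts[ctx] = counts.get(ctx, 0) + 1
        return {ctx: sums[ctx] / counts[ctx] for ctx in sums}


    ref = "ACGTCAGTCTTA"  # hypothetical reference: C at 1 (CG), 4 (CHG), 8 (CHH)
    calls = {1: (9, 1), 4: (3, 1), 8: (1, 9)}
    print(mean_level_by_context(ref, calls))  # {'CG': 0.9, 'CHG': 0.75, 'CHH': 0.1}
    ```
    
    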

  15. Capturing genomic signatures of DNA sequence variation using a standard anonymous microarray platform

    PubMed Central

    Cannon, C. H.; Kua, C. S.; Lobenhofer, E. K.; Hurban, P.

    2006-01-01

    Comparative genomics, using the model organism approach, has provided powerful insights into the structure and evolution of whole genomes. Unfortunately, only a small fraction of Earth's biodiversity will have its genome sequenced in the foreseeable future. Most wild organisms have radically different life histories and evolutionary genomics than current model systems. A novel technique is needed to expand comparative genomics to a wider range of organisms. Here, we describe a novel approach using an anonymous DNA microarray platform that gathers genomic samples of sequence variation from any organism. Oligonucleotide probe sequences placed on a custom 44 K array were 25 bp long and designed using a simple set of criteria to maximize their complexity and dispersion in sequence probability space. Using whole genomic samples from three known genomes (mouse, rat and human) and one unknown (Gonystylus bancanus), we demonstrate and validate its power, reliability, transitivity and sensitivity. Using two separate statistical analyses, a large number of genomic ‘indicator’ probes were discovered. The construction of a genomic signature database based upon this technique would allow virtual comparisons, and simple queries could generate optimal subsets of markers to be used in large-scale assays, using simple downstream techniques. Biologists from a wide range of fields, studying almost any organism, could efficiently perform genomic comparisons at potentially any phylogenetic level after performing a small number of standardized DNA microarray hybridizations. Possibilities for refining and expanding the approach are discussed. PMID:17000641

  16. A quantitative trait locus for variation in dopamine metabolism mapped in a primate model using reference sequences from related species

    PubMed Central

    Freimer, Nelson B.; Service, Susan K.; Ophoff, Roel A.; Jasinska, Anna J.; McKee, Kevin; Villeneuve, Amelie; Belisle, Alexandre; Bailey, Julia N.; Breidenthal, Sherry E.; Jorgensen, Matthew J.; Mann, J. John; Cantor, Rita M.; Dewar, Ken; Fairbanks, Lynn A.

    2007-01-01

    Non-human primates (NHP) provide crucial research models. Their strong similarities to humans make them particularly valuable for understanding complex behavioral traits and brain structure and function. We report here the genetic mapping of a biological trait of the NHP nervous system, the cerebrospinal fluid (CSF) concentration of the dopamine metabolite homovanillic acid (HVA), in an extended inbred vervet monkey (Chlorocebus aethiops sabaeus) pedigree. CSF HVA is an index of CNS dopamine activity, which is hypothesized to contribute substantially to behavioral variation in NHP and humans. For quantitative trait locus (QTL) mapping, we used a two-stage procedure. We first scanned the genome using a first-generation genetic map of short tandem repeat markers. Subsequently, using >100 SNPs within the most promising region identified by the genome scan, we mapped a QTL for CSF HVA at a genome-wide level of significance (peak logarithm of odds score >4) to a narrow, well-delineated interval (<10 Mb). The SNP discovery exploited conserved segments between human and rhesus macaque reference genome sequences. Our findings demonstrate the potential of using existing primate reference genome sequences for designing high-resolution genetic analyses applicable across a wide range of NHP species, including the many for which full genome sequences are not yet available. Leveraging genomic information from sequenced to nonsequenced species should enable the utilization of the full range of NHP diversity in behavior and disease susceptibility to determine the genetic basis of specific biological and behavioral traits. PMID:17884980
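    The significance threshold above is expressed as a LOD (logarithm of odds) score, the base-10 logarithm of the ratio between the likelihood of the data with a QTL and without one. The sketch below shows that definition and the standard conversion to the likelihood-ratio statistic, 2 ln(L1/L0) = 2 ln(10) × LOD. It does not model the variance-components or pedigree likelihoods themselves, which the abstract does not describe; the function names and likelihood values are illustrative.

    ```python
    import math


    def lod_score(loglik_qtl, loglik_null):
        """LOD = log10(L_qtl / L_null), computed from natural-log likelihoods."""
        return (loglik_qtl - loglik_null) / math.log(10)


    def lod_to_lrt(lod):
        """Likelihood-ratio statistic 2 * ln(L_qtl / L_null) corresponding to a LOD score."""
        return 2 * math.log(10) * lod


    print(round(lod_to_lrt(4.0), 2))  # 18.42
    ```

    So a peak LOD above 4 corresponds to a likelihood-ratio statistic above about 18.4.
    
    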

  17. BRCA1 and BRCA2 sequence variations detected with next-generation sequencing in patients with premature ovarian insufficiency

    PubMed Central

    Yılmaz, Nafiye Karakaş; Karagin, Peren Hatice; Terzi, Yunus Kasım; Kahyaoğlu, İnci; Yılmaz, Saynur; Erkaya, Salim; Şahin, Feride İffet

    2016-01-01

    Objective Although the association between BRCA1 and BRCA2 gene mutations and breast and ovarian cancer is known, data on premature ovarian insufficiency (POI) are insufficient. However, several studies have reported that there might be a relationship between POI and BRCA1 and BRCA2 gene mutations. Therefore, in the present study, we aimed to investigate the role of BRCA1 and BRCA2 gene mutations in the etiology of POI in a Turkish population. Material and Methods The cohort was divided into two groups: a study group of 56 individuals diagnosed with premature ovarian insufficiency (younger than 40 years of age, with an antral follicle count <3–5 and FSH levels >12 IU/L) and a control group of 45 fertile individuals. A total of 101 individuals were analyzed by next-generation sequencing to detect BRCA1 and BRCA2 gene mutations. Results We detected four previously unreported variations (p.T1246N and p.R1835Q in BRCA1 and p.I3312V and IVS-7T>A in BRCA2). Conclusion We did not find an association between BRCA1 and BRCA2 gene mutations and premature ovarian insufficiency. However, larger functional studies are needed to clarify the association. PMID:27403073

  18. Sequence Variation among Group III F-Specific RNA Coliphages from Water Samples and Swine Lagoons

    PubMed Central

    Stewart, Jill R.; Vinjé, Jan; Oudejans, Sjon J. G.; Scott, Geoff I.; Sobsey, Mark D.

    2006-01-01

    Typing of F-specific RNA (FRNA) coliphages has been proposed as a useful method for distinguishing human from animal fecal contamination in environmental samples. Group II and III FRNA coliphages are generally associated with human wastes, but several exceptions have been noted. In the present study, we have genotyped and partially sequenced group III FRNA coliphage field isolates from swine lagoons in North Carolina (NC) and South Carolina (SC), along with isolates from surface waters and municipal wastewaters. Phylogenetic analysis of a region of the 5′ end of the maturation protein gene revealed two genetically different group III FRNA subclusters with 36.6% sequence variation. The SC swine lagoon isolates were more closely related to group III prototype virus M11, whereas the isolates from a swine lagoon in NC, surface waters, and wastewaters grouped with prototype virus Q-beta. These results suggest that refining phage genotyping systems to discriminate M11-like phages from Q-beta-like phages would not necessarily provide greater discriminatory power in distinguishing human from animal sources of pollution. Within the group III subclusters, nucleotide sequence diversity ranged from 0% to 6.9% for M11-like strains and from 0% to 8.7% for Q-beta-like strains. It is demonstrated here that nucleotide sequencing of closely related FRNA strains can be used to help track sources of contamination in surface waters. A similar use of phage genomic sequence information to track fecal pollution promises more reliable results than phage typing by nucleic acid hybridization and may hold more potential for field applications. PMID:16461670
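    Within-subcluster diversity ranges such as "0% to 6.9%" are typically summaries of pairwise nucleotide differences among aligned sequences. The sketch below computes uncorrected pairwise differences (p-distances) and their range for a group of aligned fragments. Whether the study used uncorrected or model-corrected distances is not stated in the abstract, so treat this as an assumption; the sequences are hypothetical.

    ```python
    from itertools import combinations


    def percent_difference(a, b):
        """Uncorrected pairwise nucleotide difference (%) between two aligned sequences.

        Alignment columns with a gap ('-') in either sequence are skipped.
        """
        if len(a) != len(b):
            raise ValueError("sequences must be aligned to the same length")
        compared = diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x == "-" or y == "-":
                continue
            compared += 1
            diffs += x != y
        if compared == 0:
            raise ValueError("no comparable positions")
        return 100.0 * diffs / compared


    def diversity_range(aligned):
        """Minimum and maximum pairwise difference within a group of aligned sequences."""
        values = [percent_difference(a, b) for a, b in combinations(aligned, 2)]
        return min(values), max(values)


    subcluster = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]  # hypothetical aligned fragments
    print(diversity_range(subcluster))  # (0.0, 10.0)
    ```
    
    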

  19. Sequence variation within the rRNA gene loci of 12 Drosophila species

    PubMed Central

    Stage, Deborah E.; Eickbush, Thomas H.

    2007-01-01

    Concerted evolution maintains at near identity the hundreds of tandemly arrayed ribosomal RNA (rRNA) genes and their spacers present in any eukaryote. Few comprehensive attempts have been made to directly measure the identity between the rDNA units. We used the original sequencing reads (trace archives) available through the whole-genome shotgun sequencing projects of 12 Drosophila species to locate the sequence variants within the 7.8–8.2 kb transcribed portions of the rDNA units. Three to 18 variants were identified in >3% of the total rDNA units from 11 species. Species in which the rDNA units are present on multiple chromosomes exhibited only minor increases in sequence variation. Variants were 10–20 times more abundant in the noncoding than in the coding regions of the rDNA unit. Within the coding regions, variants were three to eight times more abundant in the expansion regions than in the conserved core regions. The distribution of variants was largely consistent with models of concerted evolution in which recombination is uniform across the transcribed portion of the unit and the frequency of standing variants depends on the selection pressure to preserve that sequence. However, the 28S gene was found to contain fewer variants than the 18S gene despite evolving 2.5-fold faster. We postulate that the smaller number of variants in the 28S gene results from localized gene conversion or DNA repair triggered by the activity of retrotransposable elements that are specialized for insertion into the 28S genes of these species. PMID:17989256
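    If shotgun reads sample rDNA units roughly in proportion to their copy number, the fraction of reads carrying a base at a given position estimates the fraction of units carrying it. The sketch below applies the 3% cutoff from the abstract to a per-position pileup and reports non-consensus bases above it. The pileup layout and the function name are hypothetical simplifications; they are not the authors' trace-archive pipeline.

    ```python
    from collections import Counter


    def variant_sites(columns, min_fraction=0.03):
        """Find minor alleles carried by more than min_fraction of reads at each position.

        columns[i] holds the bases observed at position i of the rDNA unit across
        all reads; reads are assumed to sample rDNA units in proportion to copy number.
        """
        variants = {}
        for pos, bases in enumerate(columns):
            counts = Counter(b for b in bases.upper() if b in "ACGT")
            depth = sum(counts.values())
            if depth == 0:
                continue
            consensus = counts.most_common(1)[0][0]
            minor = [(base, n / depth) for base, n in sorted(counts.items())
                     if base != consensus and n / depth > min_fraction]
            if minor:
                variants[pos] = minor
        return variants


    # Hypothetical pileup: position 1 carries T in 5% of reads, position 2 carries A in 1%.
    columns = ["A" * 100, "C" * 95 + "T" * 5, "G" * 99 + "A"]
    print(variant_sites(columns))  # {1: [('T', 0.05)]}
    ```
    
    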

  20. No increase in bleeding identified in type 1 VWD subjects with D1472H sequence variation.

    PubMed

    Flood, Veronica H; Friedman, Kenneth D; Gill, Joan Cox; Haberichter, Sandra L; Christopherson, Pamela A; Branchford, Brian R; Hoffmann, Raymond G; Abshire, Thomas C; Dunn, Amy L; Di Paola, Jorge A; Hoots, W Keith; Brown, Deborah L; Leissinger, Cindy; Lusher, Jeanne M; Ragni, Margaret V; Shapiro, Amy D; Montgomery, Robert R

    2013-05-01

    The diagnosis of von Willebrand disease (VWD) is complicated by issues with current laboratory testing, particularly the ristocetin cofactor activity assay (VWF:RCo). We have recently reported a sequence variation in the von Willebrand factor (VWF) A1 domain, p.D1472H (D1472H), associated with a decrease in the VWF:RCo/VWF antigen (VWF:Ag) ratio but not associated with bleeding in healthy control subjects. This report expands the previous study to include subjects with symptoms leading to the diagnosis of type 1 VWD. Type 1 VWD subjects with D1472H had a significant decrease in the VWF:RCo/VWF:Ag ratio compared with those without D1472H, similar to the findings in the healthy control population. No increase in bleeding score was observed, however, for VWD subjects with D1472H compared with those without D1472H. These results suggest that the presence of the D1472H sequence variation is not associated with a significant increase in bleeding symptoms, even in type 1 VWD subjects. PMID:23520336

  1. A worldwide survey of genome sequence variation provides insight into the evolutionary history of the honeybee Apis mellifera.

    PubMed

    Wallberg, Andreas; Han, Fan; Wellhagen, Gustaf; Dahle, Bjørn; Kawata, Masakado; Haddad, Nizar; Simões, Zilá Luz Paulino; Allsopp, Mike H; Kandemir, Irfan; De la Rúa, Pilar; Pirk, Christian W; Webster, Matthew T

    2014-10-01

    The honeybee Apis mellifera has major ecological and economic importance. We analyze patterns of genetic variation at 8.3 million SNPs, identified by sequencing 140 honeybee genomes from a worldwide sample of 14 populations at a combined total depth of 634×. These data provide insight into the evolutionary history and genetic basis of local adaptation in this species. We find evidence that population sizes have fluctuated greatly, mirroring historical fluctuations in climate, although contemporary populations have high genetic diversity, indicating the absence of domestication bottlenecks. Levels of genetic variation are strongly shaped by natural selection and are highly correlated with patterns of gene expression and DNA methylation. We identify genomic signatures of local adaptation, which are enriched in genes expressed in workers and in immune system- and sperm motility-related genes that might underlie geographic variation in reproduction, dispersal and disease resistance. This study provides a framework for future investigations into responses to pathogens and climate change in honeybees. PMID:25151355

  2. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...

  3. Effects of sequence variation on differential allelic transcription factor occupancy and gene expression.

    PubMed

    Reddy, Timothy E; Gertz, Jason; Pauli, Florencia; Kucera, Katerina S; Varley, Katherine E; Newberry, Kimberly M; Marinov, Georgi K; Mortazavi, Ali; Williams, Brian A; Song, Lingyun; Crawford, Gregory E; Wold, Barbara; Willard, Huntington F; Myers, Richard M

    2012-05-01

    A complex interplay between transcription factors (TFs) and the genome regulates transcription. However, connecting variation in genome sequence with variation in TF binding and gene expression is challenging due to environmental differences between individuals and cell types. To address this problem, we measured genome-wide differential allelic occupancy of 24 TFs and EP300 in a human lymphoblastoid cell line GM12878. Overall, 5% of human TF binding sites have an allelic imbalance in occupancy. At many sites, TFs clustered in TF-binding hubs on the same homolog in especially open chromatin. While genetic variation in core TF binding motifs generally resulted in large allelic differences in TF occupancy, most allelic differences in occupancy were subtle and associated with disruption of weak or noncanonical motifs. We also measured genome-wide differential allelic expression of genes with and without heterozygous exonic variants in the same cells. We found that genes with differential allelic expression were overall less expressed both in GM12878 cells and in unrelated human cell lines. Comparing TF occupancy with expression, we found strong association between allelic occupancy and expression within 100 bp of transcription start sites (TSSs), and weak association up to 100 kb from TSSs. Sites of differential allelic occupancy were significantly enriched for variants associated with disease, particularly autoimmune disease, suggesting that allelic differences in TF occupancy give functional insights into intergenic variants associated with disease. Our results have the potential to increase the power and interpretability of association studies by targeting functional intergenic variants in addition to protein coding sequences. PMID:22300769
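    A common way to call allelic imbalance in occupancy is to count ChIP-seq reads supporting each allele at a heterozygous SNP inside a binding site and test them against the 50:50 split expected without imbalance. The sketch below implements an exact two-sided binomial test for that comparison. The abstract does not state which statistic the authors used, and this sketch ignores reference-mapping bias; the read counts are hypothetical.

    ```python
    import math


    def binomial_two_sided_p(k, n, p=0.5):
        """Exact two-sided binomial test: total probability of outcomes no likelier than k."""
        probs = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
        cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
        return min(1.0, sum(q for q in probs if q <= cutoff))


    # Hypothetical ChIP-seq read counts at one heterozygous SNP inside a binding site.
    ref_reads, alt_reads = 30, 10
    p_value = binomial_two_sided_p(ref_reads, ref_reads + alt_reads)
    print(f"{p_value:.4f}")
    ```

    Requires Python 3.8 or later for `math.comb`.
    
    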

  4. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing

    PubMed Central

    Green, Richard E.; Malaspinas, Anna-Sapfo; Krause, Johannes; Briggs, Adrian W.; Johnson, Philip L. F.; Uhler, Caroline; Meyer, Matthias; Good, Jeffrey M.; Maricic, Tomislav; Stenzel, Udo; Prüfer, Kay; Siebauer, Michael; Burbano, Hernán A.; Ronan, Michael; Rothberg, Jonathan M.; Egholm, Michael; Rudan, Pavao; Brajković, Dejana; Kućan, Željko; Gušić, Ivan; Wikström, Mårten; Laakkonen, Liisa; Kelso, Janet; Slatkin, Montgomery; Pääbo, Svante

    2008-01-01

    Summary A complete mitochondrial (mt) genome sequence was reconstructed from a 38,000-year-old Neandertal individual using 8,341 mtDNA sequences identified among 4.8 Gb of DNA generated from ~0.3 grams of bone. Analysis of the assembled sequence unequivocally establishes that the Neandertal mtDNA falls outside the variation of extant human mtDNAs and allows an estimate of the divergence date between the two mtDNA lineages of 660,000±140,000 years. Of the 13 proteins encoded in the mtDNA, subunit 2 of cytochrome c oxidase of the mitochondrial electron transport chain has experienced the largest number of amino acid substitutions in human ancestors since the separation from Neandertals. There is evidence that purifying selection in the Neandertal mtDNA was reduced compared with other primate lineages, suggesting that the effective population size of Neandertals was small. PMID:18692465

  5. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat.

    PubMed

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-01-01

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, has successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches. PMID:27172215

  6. Deleted copy number variation of Hanwoo and Holstein using next generation sequencing at the population level

    PubMed Central

    2014-01-01

    Background: Copy number variation (CNV), a source of genetic diversity in mammals, has been shown to underlie biological functions related to production traits. Nevertheless, few studies have examined CNVs using next-generation sequencing at the population level. Results: Illumina NGS data were obtained for ten Holsteins (a dairy breed) and 22 Hanwoo (a beef breed). Sequence coverage for the 32 animals ranged from 13.58-fold to almost 20-fold. We detected a total of 6,811 deleted CNVs across the analyzed individuals (average length = 2,732.2 bp), corresponding to 0.74% of the cattle genome (18.6 Mbp of variable sequence). By examining the overlap between CNV deletion regions and genes, we selected the 30 genes with the highest deletion scores. These genes were related to the nervous system, specifically to nerve transmission, neuron motion, and neurogenesis, and we regarded them as having been affected by the domestication process. Further analysis of the CNV genotyping information revealed 94 putative selected CNVs and 954 breed-specific CNVs. Conclusions: This study provides useful information for assessing the impact of CNVs on cattle traits using NGS at the population level. PMID:24673797
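
    The reported totals are internally consistent. As a back-of-envelope check (our arithmetic, not from the paper), the number of deleted CNVs times their average length reproduces the 18.6 Mbp figure, and dividing that total by 0.74% gives the genome length implicitly used as the denominator:

```latex
6{,}811 \times 2{,}732.2\ \text{bp} \approx 1.86 \times 10^{7}\ \text{bp} = 18.6\ \text{Mbp},
\qquad
\frac{18.6\ \text{Mbp}}{0.0074} \approx 2.5\ \text{Gbp}.
```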

  8. Cytochrome Oxidase I (COI) sequence conservation and variation patterns in the yellowfin and longtail tunas.

    PubMed

    Kunal, Swaraj Priyaranjan; Kumar, Girish

    2013-01-01

    Tunas are a commercially important fishery worldwide. There are at least 13 species of tuna belonging to three genera, of which the genus Thunnus has the most, with eight species. Based on where they occur, they can be characterised as oceanic, such as Thunnus albacares (yellowfin tuna), or coastal, such as Thunnus tonggol (longtail tuna). Although these are distinct species, they can be distinguished morphologically only as mature individuals, so misidentification may produce erroneous data sets that ultimately affect conservation strategies. The mitochondrial cytochrome c oxidase subunit I (COI) gene is one of the most widely used markers for population genetic and phylogeographic studies across the animal kingdom. This study examines sequence conservation and variation in mitochondrial COI between these two tuna species. COI sequence analysis of yellowfin and longtail tuna revealed a close relationship between them within the genus Thunnus. The present study is the first direct comparison of mitochondrial COI sequences of these two tuna species. PMID:23649742

  9. Extra-binomial variation approach for analysis of pooled DNA sequencing data

    PubMed Central

    Wallace, Chris

    2012-01-01

    Motivation: The invention of next-generation sequencing technology has made it possible to study the rare variants that are more likely to pinpoint causal disease genes. To make such experiments financially viable, DNA samples from several subjects are often pooled before sequencing. This induces large between-pool variation which, together with other sources of experimental error, creates over-dispersed data. Statistical analysis of pooled sequencing data needs to appropriately model this additional variance to avoid inflating the false-positive rate. Results: We propose a new statistical method based on an extra-binomial model to address the over-dispersion and apply it to pooled case-control data. We demonstrate that our model provides a better fit to the data than either a standard binomial model or a traditional extra-binomial model proposed by Williams and can analyse both rare and common variants with lower or more variable pool depths compared to the other methods. Availability: Package ‘extraBinomial’ is on http://cran.r-project.org/ Contact: chris.wallace@cimr.cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics Online. PMID:22976083
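
    The abstract does not give the form of the proposed extra-binomial model, and the `extraBinomial` R package is not reproduced here. As a generic illustration of the over-dispersion it targets, the Python sketch below uses the standard beta-binomial distribution, a textbook over-dispersed alternative to the binomial, parameterised by a pooled allele frequency `p` and an intra-pool correlation `rho` (our own, hypothetical parameter names). It shows that the variance of alternate-allele read counts is inflated by a factor 1 + (n - 1)·rho relative to the binomial, which is why a plain binomial model can inflate the false-positive rate.

```python
from math import comb, exp, lgamma, log


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k alternate reads out of n) if reads are independent draws with frequency p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def beta_binomial_pmf(k: int, n: int, p: float, rho: float) -> float:
    """Beta-binomial P(k | n) with mean frequency p and correlation rho (0 < rho < 1).

    Its variance is n*p*(1-p)*(1 + (n-1)*rho): the binomial variance inflated
    by extra between-pool noise. This is a generic model, not the paper's.
    """
    theta = (1 - rho) / rho  # alpha + beta of the underlying Beta distribution
    alpha, beta = p * theta, (1 - p) * theta
    return comb(n, k) * exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def log_likelihood(alt_counts, depths, p: float, rho: float) -> float:
    """Beta-binomial log-likelihood of (alt_count, depth) pairs, one pair per pool."""
    return sum(log(beta_binomial_pmf(k, n, p, rho)) for k, n in zip(alt_counts, depths))


def moments(pmf, n: int):
    """Mean and variance of a distribution on 0..n given its pmf."""
    mean = sum(k * pmf(k) for k in range(n + 1))
    var = sum((k - mean) ** 2 * pmf(k) for k in range(n + 1))
    return mean, var


if __name__ == "__main__":
    n, p, rho = 30, 0.2, 0.05
    _, v_bin = moments(lambda k: binomial_pmf(k, n, p), n)
    _, v_bb = moments(lambda k: beta_binomial_pmf(k, n, p, rho), n)
    print(f"binomial variance:      {v_bin:.2f}")  # n*p*(1-p) = 4.80
    print(f"beta-binomial variance: {v_bb:.2f}")   # 4.80 * (1 + 29*0.05) = 11.76
```

Fitting `p` and `rho` by maximising `log_likelihood` over the pools (for example with a numerical optimiser) would be the natural next step; it is omitted to keep the sketch short.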

  10. Fad7 gene identification and fatty acids phenotypic variation in an olive collection by EcoTILLING and sequencing approaches.

    PubMed

    Sabetta, Wilma; Blanco, Antonio; Zelasco, Samanta; Lombardo, Luca; Perri, Enzo; Mangini, Giacomo; Montemurro, Cinzia

    2013-08-01

    The ω-3 fatty acid desaturases (FADs) are enzymes, localized in the plastid or in the endoplasmic reticulum, that catalyze the conversion of linoleic acid to α-linolenic acid. In this research we report the genotypic and phenotypic variation in fatty acid composition of Italian Olea europaea L. germplasm. The phenotypic oil characterization was followed by molecular analysis of the plastidial-type ω-3 FAD gene (fad7) (EC 1.14.19), whose full-length sequence is identified here in cultivar Leccino. The gene consists of 2,635 bp with 8 exons and 5'- and 3'-UTRs of 336 and 282 bp, respectively, and showed a high level of heterozygosity (1/110 bp). Natural allelic variation was investigated both by a LiCOR EcoTILLING assay and by direct sequencing of PCR products. Only three haplotypes were identified among the 96 analysed cultivars, highlighting the strong degree of conservation of this gene. PMID:23685785

  11. DNA Sequence and Expression Variation of Hop (Humulus lupulus) Valerophenone Synthase (VPS), a Key Gene in Bitter Acid Biosynthesis

    PubMed Central

    Castro, Consuelo B.; Whittock, Lucy D.; Whittock, Simon P.; Leggett, Grey; Koutoulis, Anthony

    2008-01-01

    Background: The hop plant (Humulus lupulus) is a source of many secondary metabolites, with bitter acids essential in the beer brewing industry and others having potential applications for human health. This study investigated variation in DNA sequence and gene expression of valerophenone synthase (VPS), a key gene in the bitter acid biosynthesis pathway of hop. Methods: Sequence variation was studied in 12 varieties, and expression was analysed in four of the 12 varieties in a series across the development of the hop cone. Results: Nine single nucleotide polymorphisms (SNPs) were detected in VPS, seven of which were synonymous. The two non-synonymous polymorphisms did not appear to be related to typical bitter acid profiles of the varieties studied. However, real-time quantitative reverse-transcription polymerase chain reaction (qRT-PCR) analysis of VPS expression during hop cone development showed a clear link with the bitter acid content. The highest levels of VPS expression were observed in two triploid varieties, ‘Symphony’ and ‘Ember’, which typically have high bitter acid levels. Conclusions: In all hop varieties studied, VPS expression was lowest in the leaves and an increase in expression was consistently observed during the early stages of cone development. PMID:18519445

  12. A Genome-Wide Survey of Genetic Variation in Gorillas Using Reduced Representation Sequencing

    PubMed Central

    Scally, Aylwyn; Yngvadottir, Bryndis; Xue, Yali; Ayub, Qasim; Durbin, Richard; Tyler-Smith, Chris

    2013-01-01

    All non-human great apes are endangered in the wild, and it is therefore important to gain an understanding of their demography and genetic diversity. Whole genome assembly projects have provided an invaluable foundation for understanding genetics in all four genera, but to date genetic studies of multiple individuals within great ape species have largely been confined to mitochondrial DNA and a small number of other loci. Here, we present a genome-wide survey of genetic variation in gorillas using a reduced representation sequencing approach, focusing on the two lowland subspecies. We identify 3,006,670 polymorphic sites in 14 individuals: 12 western lowland gorillas (Gorilla gorilla gorilla) and 2 eastern lowland gorillas (Gorilla beringei graueri). We find that the two species are genetically distinct, based on levels of heterozygosity and patterns of allele sharing. Focusing on the western lowland population, we observe evidence for population substructure, and a deficit of rare genetic variants suggesting a recent episode of population contraction. In western lowland gorillas, there is an elevation of variation towards telomeres and centromeres on the chromosomal scale. On a finer scale, we find substantial variation in genetic diversity, including a marked reduction close to the major histocompatibility locus, perhaps indicative of recent strong selection there. These findings suggest that despite their maintaining an overall level of genetic diversity equal to or greater than that of humans, population decline, perhaps associated with disease, has been a significant factor in recent and long-term pressures on wild gorilla populations. PMID:23750230

  14. Whole Genome Sequencing demonstrates that Geographic Variation of Escherichia coli O157 Genotypes Dominates Host Association

    PubMed Central

    Strachan, Norval J. C.; Rotariu, Ovidiu; Lopes, Bruno; MacRae, Marion; Fairley, Susan; Laing, Chad; Gannon, Victor; Allison, Lesley J.; Hanson, Mary F.; Dallman, Tim; Ashton, Philip; Franz, Eelco; van Hoek, Angela H. A. M.; French, Nigel P.; George, Tessy; Biggs, Patrick J.; Forbes, Ken J.

    2015-01-01

    Genetic variation in an infectious disease pathogen can be driven by ecological niche dissimilarities arising from different host species and different geographical locations. Whole genome sequencing was used to compare E. coli O157 isolates from host reservoirs (cattle and sheep) from Scotland and to compare genetic variation of isolates (human, animal, environmental/food) obtained from Scotland, New Zealand, Netherlands, Canada and the USA. Nei’s genetic distance calculated from core genome single nucleotide polymorphisms (SNPs) demonstrated that the animal isolates were from the same population. Investigation of the Shiga toxin bacteriophage and their insertion sites (SBI typing) revealed that cattle and sheep isolates had statistically indistinguishable rarefaction profiles, diversity and genotypes. In contrast, isolates from different countries exhibited significant differences in Nei’s genetic distance and SBI typing. Hence, after successful international transmission, which has occurred on multiple occasions, local genetic variation occurs, resulting in a global patchwork of continental and trans-continental phylogeographic clades. These findings are important for three reasons: first, understanding transmission and evolution of infectious diseases associated with multiple host reservoirs and multi-geographic locations; second, highlighting the relevance of the sheep reservoir when considering farm based interventions; and third, improving our understanding of why human disease incidence varies across the world. PMID:26442781
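
    The abstract compares isolate groups using Nei's genetic distance computed from core-genome SNPs but does not describe the calculation. A minimal Python sketch of Nei's standard distance follows, assuming allele frequencies have already been tabulated per population and treating each SNP as a locus; the data layout and the toy frequencies are hypothetical, and this is not the authors' pipeline.

```python
from math import inf, log, sqrt

# Hypothetical layout: one dict per locus mapping allele -> frequency in the population.
Freqs = list[dict[str, float]]


def nei_distance(x: Freqs, y: Freqs) -> float:
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    Jx, Jy and Jxy are the means over loci of sum(x_i**2), sum(y_i**2) and
    sum(x_i * y_i), where x_i and y_i are the frequencies of allele i.
    """
    if len(x) != len(y) or not x:
        raise ValueError("both populations must be scored at the same, non-empty set of loci")
    n_loci = len(x)
    jx = sum(sum(f * f for f in locus.values()) for locus in x) / n_loci
    jy = sum(sum(f * f for f in locus.values()) for locus in y) / n_loci
    jxy = sum(
        sum(fx * ly.get(allele, 0.0) for allele, fx in lx.items())
        for lx, ly in zip(x, y)
    ) / n_loci
    if jxy == 0.0:
        return inf  # no alleles shared at any locus
    # D >= 0 in exact arithmetic; clamp away -0.0 and tiny rounding errors.
    return max(0.0, -log(jxy / sqrt(jx * jy)))


if __name__ == "__main__":
    # Hypothetical frequencies at two biallelic SNP loci.
    pop_a = [{"A": 1.0}, {"C": 0.5, "T": 0.5}]
    pop_b = [{"G": 1.0}, {"C": 0.5, "T": 0.5}]
    print(nei_distance(pop_a, pop_a))  # identical populations: 0.0
    print(nei_distance(pop_a, pop_b))  # ln 3, about 1.0986
```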

  15. Global sequence variation in the histidine-rich proteins 2 and 3 of Plasmodium falciparum: implications for the performance of malaria rapid diagnostic tests

    PubMed Central

    2010-01-01

    Background: Accurate diagnosis is essential for prompt and appropriate treatment of malaria. While rapid diagnostic tests (RDTs) offer great potential to improve malaria diagnosis, the sensitivity of RDTs has been reported to be highly variable. One possible factor contributing to variable test performance is the diversity of parasite antigens. This is of particular concern for Plasmodium falciparum histidine-rich protein 2 (PfHRP2)-detecting RDTs, since PfHRP2 has been reported to be highly variable in isolates from the Asia-Pacific region. Methods: The pfhrp2 exon 2 fragment from 458 isolates of P. falciparum collected from 38 countries was amplified and sequenced. For a subset of 80 isolates, the exon 2 fragment of histidine-rich protein 3 (pfhrp3) was also amplified and sequenced. DNA sequence and statistical analyses of the variation observed in these genes were conducted. The potential impact of pfhrp2 variation on RDT detection rates was examined by analysing the relationship between sequence characteristics of this gene and the results of the WHO product testing of malaria RDTs: Round 1 (2008), for 34 PfHRP2-detecting RDTs. Results: Sequence analysis revealed extensive variation in the number and arrangement of various repeats encoded by the genes in parasite populations worldwide. However, no statistically robust correlation between gene structure and RDT detection rate for P. falciparum parasites at 200 parasites per microlitre was identified. Conclusions: The results suggest that despite extreme sequence variation, diversity of PfHRP2 does not appear to be a major cause of RDT sensitivity variation. PMID:20470441

  16. [Genetic variation of Manchurian pheasant (Phasianus colchicus pallasi Rotshild, 1903) inferred from mitochondrial DNA control region sequences].

    PubMed

    Kozyrenko, M M; Fisenko, P V; Zhuravlev, Iu N

    2009-04-01

    Sequence variation of the mitochondrial DNA control region was studied in Manchurian pheasants (Phasianus colchicus pallasi Rotshild, 1903) representing three geographic populations from the southern part of the Russian Far East. Extremely low population genetic differentiation (FST = 0.0003) pointed to very high gene flow between the populations. A combination of high haplotype diversity (0.884 to 0.913), low nucleotide diversity (0.0016 to 0.0022), low R2 values (0.1235 to 0.1337), particular patterns in the pairwise-difference distributions, and the absence of phylogenetic structure suggested that the history of Ph. c. pallasi included passage through a bottleneck followed by expansion in the postglacial period. The data suggest that differentiation between the mitochondrial lineages started approximately 100,000 years ago. PMID:19507706
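
    The inference above rests partly on the contrast between high haplotype diversity and low nucleotide diversity, that is, many distinct haplotypes that differ at only a few sites. A minimal Python sketch of the two statistics follows, assuming equal-length aligned sequences with no gaps or ambiguous bases; it is a generic illustration, not the software used in the study, and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs: list[str]) -> float:
    """Nei's unbiased haplotype diversity h = n/(n-1) * (1 - sum(p_i**2))."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(seqs)  # identical sequences count as the same haplotype
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs: list[str]) -> float:
    """Nucleotide diversity pi: mean number of pairwise differences per site.

    Assumes equal-length aligned sequences with no gaps or ambiguity codes.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


if __name__ == "__main__":
    # Hypothetical 4-bp fragments from four individuals.
    sample = ["ACGT", "ACGT", "ACGA", "TCGA"]
    print(haplotype_diversity(sample))   # 4/3 * (1 - 0.375), about 0.833
    print(nucleotide_diversity(sample))  # 7 differences / (6 pairs * 4 sites), about 0.292
```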

  17. High Throughput Sequence Analysis for Disease Resistance in Maize

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...

  18. Evolutionary sequence comparisons using high-density oligonucleotide arrays.

    PubMed

    Hacia, J G; Makalowski, W; Edgemon, K; Erdos, M R; Robbins, C M; Fodor, S P; Brody, L C; Collins, F S

    1998-02-01

    We explored the utility of high-density oligonucleotide arrays (DNA chips) for obtaining sequence information from homologous genes in closely related species. Orthologues of the human BRCA1 exon 11, all approximately 3.4 kb in length and ranging from 98.2% to 83.5% nucleotide identity, were subjected to hybridization-based and conventional dideoxysequencing analysis. Retrospective guidelines for identifying high-fidelity hybridization-based sequence calls were formulated based upon dideoxysequencing results. Prospective application of these rules yielded base-calling with at least 98.8% accuracy over orthologous sequence tracts shown to have approximately 99% identity. For higher primate sequences with greater than 97% nucleotide identity, base-calling was made with at least 99.91% accuracy covering a minimum of 97% of the sequence. Using a second-tier confirmatory hybridization chip strategy, shown in several cases to confirm the identity of predicted sequence changes, the complete sequence of the chimpanzee, gorilla and orangutan orthologues should be deducible solely through hybridization-based methodologies. Analysis of less highly conserved orthologues can still identify conserved nucleotide tracts of at least 15 nucleotides and can provide useful information for designing primers. DNA-chip based assays can be a valuable new technology for obtaining high-throughput cost-effective sequence information from related genomes. PMID:9462745

  19. Organismal, genetic, and transcriptional variation in the deeply sequenced gut microbiomes of identical twins

    PubMed Central

    Turnbaugh, Peter J.; Quince, Christopher; Faith, Jeremiah J.; McHardy, Alice C.; Yatsunenko, Tanya; Niazi, Faheem; Affourtit, Jason; Egholm, Michael; Henrissat, Bernard; Knight, Rob; Gordon, Jeffrey I.

    2010-01-01

    We deeply sampled the organismal, genetic, and transcriptional diversity in fecal samples collected from a monozygotic (MZ) twin pair and compared the results to 1,095 communities from the gut and other body habitats of related and unrelated individuals. Using a new scheme for noise reduction in pyrosequencing data, we estimated the total diversity of species-level bacterial phylotypes in the 1.2-1.5 million bacterial 16S rRNA reads obtained from each deeply sampled cotwin to be ~800 (35.9%, 49.1% detected in both). A combined 1.1 million read 16S rRNA dataset representing 281 shallowly sequenced fecal samples from 54 twin pairs and their mothers contained an estimated 4,018 species-level phylotypes, with each sample having a unique species assemblage (53.4 ± 0.6% and 50.3 ± 0.5% overlap with the deeply sampled cotwins). Of the 134 phylotypes with a relative abundance of >0.1% in the combined dataset, only 37 appeared in >50% of the samples, with one phylotype in the Lachnospiraceae family present in 99%. Nongut communities had significantly reduced overlap with the deeply sequenced twins’ fecal microbiota (18.3 ± 0.3%, 15.3 ± 0.3%). The MZ cotwins’ fecal DNA was deeply sequenced (3.8-6.3 Gbp/sample) and assembled reads were assigned to 25 genus-level phylogenetic bins. Only 17% of the genes in these bins were shared between the cotwins. Bins exhibited differences in their degree of sequence variation, gene content including the repertoire of carbohydrate active enzymes present within and between twins (e.g., predicted cellulases, dockerins), and transcriptional activities. These results provide an expanded perspective about features that make each of us unique life forms and directions for future characterization of our gut ecosystems. PMID:20363958

  20. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis

    PubMed Central

    Santana-Quintero, Luis; Dingerdissen, Hayley; Thierry-Mieg, Jean; Mazumder, Raja; Simonyan, Vahan

    2014-01-01

    Due to the size of Next-Generation Sequencing data, sequence alignment poses a major computational challenge: inexact alignment can take up to 90% of total CPU time in bioinformatics pipelines. The High-performance Integrated Virtual Environment (HIVE), a cloud-based environment optimized for storage and analysis of extra-large data, offers an algorithmic solution: the HIVE-hexagon DNA sequence aligner. HIVE-hexagon implements novel approaches that exploit characteristics of both sequence space and CPU, RAM and input/output (I/O) architecture to quickly compute accurate alignments. Key components of HIVE-hexagon include non-redundification and sorting of sequences; floating diagonals of linearized dynamic programming matrices; and consideration of cross-similarity to minimize computations. Availability: https://hive.biochemistry.gwu.edu/hive/ PMID:24918764

  1. Mitochondrial DNA control region sequence variation in migraine headache and cyclic vomiting syndrome.

    PubMed

    Wang, Qingxue; Ito, Masamichi; Adams, Kathleen; Li, B U K; Klopstock, Thomas; Maslim, Audrey; Higashimoto, Tomoyasu; Herzog, Juergen; Boles, Richard G

    2004-11-15

    Migraine headache is a very common condition, affecting about 10% of the population, that results in substantial morbidity and economic loss. The two most common variants are migraine with (MA) and without (MO) aura. Often considered to be a migraine-like variant, cyclic vomiting syndrome (CVS) is a predominantly childhood condition characterized by severe, discrete episodes of nausea, vomiting, and lethargy. Disease-associated mitochondrial DNA (mtDNA) sequence variants are suggested in common migraine and CVS based upon a strong bias towards maternal inheritance of disease, and several other factors. Temporal temperature gradient gel electrophoresis (TTGE) followed by cyclosequencing and RFLP was used to screen almost 90% of the mtDNA, including the control region (CR), for heteroplasmy in 62 children with CVS and neuromuscular disease (CVS+) and in 95 control subjects. One or two rare mtDNA-CR heteroplasmic sequence variants were found in six CVS+ subjects and in zero control subjects (P = 0.003). These variants comprised 6 point and 2 length variants in hypervariable regions 1 and 2 (HV1 and HV2, both part of the mtDNA-CR), half of which were clustered in the nt 16040-16188 segment of HV1 that includes the termination associated sequence (TAS), a functional location important in the regulation of mtDNA replication. Based upon our findings, sequencing and statistical analyses looking for homoplasmic nucleotide changes were performed in HV1 among 30 CVS+, 30 randomly-ascertained CVS (rCVS), 18 MA, 32 MO, and 35 control haplogroup H cases. Within the nt 16040-16188 segment, homoplasmic sequence variants were three-fold more common relative to control subjects in both CVS groups (P = 0.01 combined data) and in MO (P = 0.02), but not in MA (P = 0.5 vs. control subjects and 0.02 vs. MO). No group differences were noted in the remainder of HV1. We conclude that sequence variation in this small "peri-TAS" segment is associated with CVS and MO, but not MA. These variants

  2. [Sequence variation of mitochondrial cytochrome b gene and phylogenetic relationships among twelve species of Charadriiformes].

    PubMed

    Chen, Xiao-Fang; Wang, Xiang; Yuan, Xiao-Dong; Tang, Min-Qian; Li, Yu-Xiang; Guo, Yu-Mei; Li, Qing-Wei

    2003-05-01

    Studies of the phylogenetic relationships of the Charadriiformes have been largely based on conservative morphological characters. During the past 10 years, many studies on the evolutionary biology of birds have used phylogenetic information obtained from mitochondrial DNA, but little such work on the Charadriiformes has been reported to date. Therefore, the phylogenetic relationships and classification of the Charadriiformes remain controversial. In this study, we try to shed light on these relationships via DNA sequence analysis of the mitochondrial Cyt b gene in 12 species of Charadriiformes, as a preliminary study of the origin and evolution of these species using nucleotide sequence data. Using standard PCR techniques, complete mitochondrial Cyt b gene sequences were amplified and sequenced from Charadrius mongolus, Charadrius alexandrinus, Numenius madagascariensis, Numenius arquata, Numenius phaeopus, Tringa totanus, Tringa glareola, Xenus cinereus, Arenaria interpres, Calidris tenuirostris, Recurvirostra avosetta and Haematopus ostralegus. The 1,143-bp sequences obtained from these species contained 381 variable sites and no insertions or deletions. Nucleotide sequence variation in the Cyt b gene ranged from 5.16% to 16.01% among these species. Phylogenetic trees constructed using the NJ, MP and ML methods, with Ciconia ciconia as the outgroup, indicate that the 12 species of Charadriiformes examined in this study cluster in two major clades. The first clade includes T. totanus, T. glareola, A. interpres, C. tenuirostris, X. cinereus, N. madagascariensis, N. arquata and N. phaeopus. The second includes C. mongolus, C. alexandrinus, R. avosetta and H. ostralegus. Our molecular data show that the phylogenetic relationships among species of Scolopacidae are consistent with the classification based on morphological studies; R. avosetta and H. ostralegus are relatively closer

  3. Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer

    PubMed Central

    Timofeeva, Maria N.; Kinnersley, Ben; Farrington, Susan M.; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J.; Harris, Sarah E.; Northwood, Emma L.; Barrett, Jennifer H.; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J.; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D. Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G.; Houlston, Richard S.

    2015-01-01

    Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10⁻⁷), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts: rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10⁻⁷); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10⁻⁷ and OR = 1.09, P = 7.4 × 10⁻⁸); and rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10⁻⁹), all reaching exome-wide significance. Gene-based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10⁻⁶). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10⁻⁴) and DNA mismatch repair genes (P = 6.1 × 10⁻⁴), consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and the PCDHG gene cluster, as well as several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438

  4. Association Between Absolute Neutrophil Count and Variation at TCIRG1: The NHLBI Exome Sequencing Project.

    PubMed

    Rosenthal, Elisabeth A; Makaryan, Vahagn; Burt, Amber A; Crosslin, David R; Kim, Daniel Seung; Smith, Joshua D; Nickerson, Deborah A; Reiner, Alex P; Rich, Stephen S; Jackson, Rebecca D; Ganesh, Santhi K; Polfus, Linda M; Qi, Lihong; Dale, David C; Jarvik, Gail P

    2016-09-01

    Neutrophils are a key component of innate immunity. Individuals with low neutrophil count are susceptible to frequent infections. Linkage and association between congenital neutropenia and a single rare missense variant in TCIRG1 have been reported in a single family. Here, we report on nine rare missense variants at evolutionarily conserved sites in TCIRG1 that are associated with lower absolute neutrophil count (ANC; p = 0.005) in 1,058 participants from three cohorts: Atherosclerosis Risk in Communities (ARIC), Coronary Artery Risk Development in Young Adults (CARDIA), and Jackson Heart Study (JHS) of the NHLBI Grand Opportunity Exome Sequencing Project (GO ESP). These results validate the effects of TCIRG1 coding variation on ANC and suggest that this gene may be associated with a spectrum of mild to severe effects on ANC. PMID:27229898

  5. Ethnic variation in the mitochondrial targeting sequence polymorphism of MnSOD.

    PubMed

    Van Landeghem, G F; Tabatabaie, P; Kucinskas, V; Saha, N; Beckman, G

    1999-07-01

    In contrast to CuZn superoxide dismutase (SOD), only a very limited number of mutations have been described in MnSOD. One interesting example is a polymorphism (Ala-9Val) in the mitochondrial targeting sequence of this radical-scavenging enzyme. We studied the Ala-9Val polymorphism in various ethnic groups by means of the oligonucleotide ligation assay. The frequency of this polymorphism differed significantly among three language groups: Baltic (Lithuanians), Finnic (Finns and Saamis) and Germanic (Swedes). The Ala frequency in an Asian population (Chinese) was significantly lower than in most European populations. This polymorphism may affect the mitochondrial targeting rate of MnSOD, which may result in mitochondrial damage with implications for various late-onset neurological diseases. PMID:10436379

  6. Virology. Mutation rate and genotype variation of Ebola virus from Mali case sequences.

    PubMed

    Hoenen, T; Safronetz, D; Groseth, A; Wollenberg, K R; Koita, O A; Diarra, B; Fall, I S; Haidara, F C; Diallo, F; Sanogo, M; Sarro, Y S; Kone, A; Togo, A C G; Traore, A; Kodio, M; Dosseh, A; Rosenke, K; de Wit, E; Feldmann, F; Ebihara, H; Munster, V J; Zoon, K C; Feldmann, H; Sow, S

    2015-04-01

    The occurrence of Ebola virus (EBOV) in West Africa during 2013-2015 is unprecedented. Early reports suggested that in this outbreak EBOV is mutating twice as fast as previously observed, which indicates the potential for changes in transmissibility and virulence and could render current molecular diagnostics and countermeasures ineffective. We have determined additional full-length sequences from two clusters of imported EBOV infections into Mali, and we show that the nucleotide substitution rate (9.6 × 10⁻⁴ substitutions per site per year) is consistent with rates observed in Central African outbreaks. In addition, overall variation among all genotypes observed remains low. Thus, our data indicate that EBOV is not undergoing rapid evolution in humans during the current outbreak. This finding has important implications for outbreak response and public health decisions and should alleviate several previously raised concerns. PMID:25814067
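
    The reported per-site rate can be put on a per-genome scale with a single multiplication. This is an illustrative sketch only: the genome length of 18,900 nt is an assumed round figure for EBOV, not a value given in the abstract, and the doubled rate stands in for the early reports of twice-as-fast evolution.

```python
# Expected substitutions per genome under a constant per-site rate.
RATE_PER_SITE_PER_YEAR = 9.6e-4  # rate reported for the Mali sequences
ASSUMED_GENOME_LENGTH = 18_900   # assumption: approximate EBOV genome length in nucleotides


def expected_substitutions(rate: float, genome_length: int, years: float) -> float:
    """Expected number of substitutions per genome after `years` at a constant rate."""
    return rate * genome_length * years


observed = expected_substitutions(RATE_PER_SITE_PER_YEAR, ASSUMED_GENOME_LENGTH, 1.0)
doubled = expected_substitutions(2 * RATE_PER_SITE_PER_YEAR, ASSUMED_GENOME_LENGTH, 1.0)
print(f"reported rate: {observed:.1f} substitutions/genome/year")
print(f"doubled rate:  {doubled:.1f} substitutions/genome/year")
```

    Under the assumed length, a doubling of the rate would roughly double the number of new substitutions expected per genome each year, which is the difference the full-length Mali sequences were able to test.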

  7. Formation Sequences of Iron Minerals in the Acidic Alteration Products and Variation of Hydrothermal Fluid Conditions

    NASA Astrophysics Data System (ADS)

    Isobe, H.; Yoshizawa, M.

    2008-12-01

    Iron minerals play an important role in environmental issues not only on Earth but also on other terrestrial planets. Iron mineral species formed by the alteration of primary minerals by surface or subsurface fluids are characterized by the temperature, acidity and redox conditions of the fluids. Various iron-bearing alteration products occur around fumaroles in geothermal and volcanic areas. In this study, zonal structures of iron minerals in alteration products of a geothermal area are observed to elucidate temporal and spatial variation of hydrothermal fluids. Alteration of the pyroxene-amphibole andesite of Garan-dake volcano, Oita, Japan, by acidic hydrothermal fluid forms cristobalite as elements other than Si are leached out. Hand specimens with an unaltered or weakly altered core and a cristobalite crust show various sequences of layers. XRD analysis revealed that the degree of alteration is reflected in the abundance of cristobalite. Intermediately altered layers are characterized by the occurrence of alunite, pyrite, kaolinite, goethite and hematite. A specimen with a reddish brown core surrounded by a cristobalite-rich white crust has brown layers at the boundary between the core and the crust. The reddish core is characterized by crystalline hematite, as identified by XRD. Another hand specimen has a light gray core, which represents reduced conditions, and a white cristobalite crust with light brown and reddish brown layers of ferric iron minerals between the core and the crust. On the other hand, hornblende crystals, a typical ferrous iron-bearing mineral of the host rock, are well preserved in some samples with a strongly decolorized, cristobalite-rich groundmass. Hydrothermal alteration experiments on iron-rich basaltic material show that the iron mineral species formed depend on the acidity and temperature of the fluid. Oxidation states of the iron-bearing mineral species are strongly influenced by the acidity and redox conditions. Variations of alteration

  8. Extensive Variation and Rapid Shift of the MG192 Sequence in Mycoplasma genitalium Strains from Patients with Chronic Infection

    PubMed Central

    Mancuso, Miriam; Williams, James A.; Van Der Pol, Barbara; Fortenberry, J. Dennis; Jia, Qiuyao; Myers, Leann; Martin, David H.

    2014-01-01

    Mycoplasma genitalium causes persistent urogenital tract infection in humans. Antigenic variation of the protein encoded by the MG192 gene has been proposed as one of the mechanisms of persistence. The aims of this study were to determine MG192 sequence variation in patients with chronic M. genitalium infection and to analyze the structural features of the MG192 gene sequence and its encoded protein. Urogenital specimens were obtained from 13 patients who were followed for 10 days to 14 months. The variable region of the MG192 gene was PCR amplified, subcloned into plasmids, and sequenced. Sequence analysis of 220 plasmid clones yielded 97 unique MG192 variant sequences. MG192 sequence shift was identified between sequential specimens from all but one patient. Despite great variation of the MG192 gene among and within clinical specimens from different patients, MG192 sequences were more closely related within M. genitalium specimens from an individual patient than between patients. The MG192 variable region consisted of 11 discrete subvariable regions with different degrees of variability. Analysis of the two most variable regions (V4 and V6) in five sequential specimens from one patient showed that sequence changes increased over time and that most sequences were present at only one time point, suggesting immune selection. Topology analysis of the deduced MG192 protein predicted a surface-exposed membrane protein. Extensive variation of the MG192 sequence may not only change the antigenicity of the protein to allow immune evasion but also alter the mobility and adhesion ability of the organism to adapt to diverse host microenvironments, thus facilitating persistent infection. PMID:24396043

  9. Variation and association to diabetes in 2000 full mtDNA sequences mined from an exome study in a Danish population.

    PubMed

    Li, Shengting; Besenbacher, Soren; Li, Yingrui; Kristiansen, Karsten; Grarup, Niels; Albrechtsen, Anders; Sparsø, Thomas; Korneliussen, Thorfinn; Hansen, Torben; Wang, Jun; Nielsen, Rasmus; Pedersen, Oluf; Bolund, Lars; Schierup, Mikkel H

    2014-08-01

    In this paper, we mine full mtDNA sequences from an exome capture data set of 2000 Danes, showing that it is possible to obtain high-quality full-genome sequences of the mitochondrion from this resource. The sample includes 1000 individuals with type 2 diabetes and 1000 controls. We characterise the variation found in the mtDNA sequence in Danes and relate it to diabetes risk as well as to several blood phenotypes of the controls, but find no significant associations. We report 2025 polymorphisms, of which 393 have not been reported previously. These 393 variants are all very rare and are estimated to result from very recent mutations, but individuals with type 2 diabetes do not carry more of these variants. Population genetics analysis using a Bayesian skyline plot shows a recent history of rapid population growth in the Danish population, consistent with the observation that >40% of variable sites are singletons. PMID:24448545
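
    The singleton statistic quoted above (the share of variable sites whose minor allele is carried by only one individual) can be computed directly from an alignment. A minimal sketch, assuming equal-length, pre-aligned sequences with no missing data; the alignment is a hypothetical toy, and counting sites with more than two alleles as variable but never as singletons is a simplifying choice, not the authors' method.

```python
from collections import Counter


def singleton_fraction(alignment: list[str]) -> float:
    """Fraction of variable sites whose minor allele occurs in exactly one sequence.

    Assumes equal-length, pre-aligned sequences without missing data. Sites with
    more than two alleles count as variable but are never counted as singletons.
    """
    variable = singletons = 0
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) < 2:
            continue  # monomorphic site
        variable += 1
        if len(counts) == 2 and min(counts.values()) == 1:
            singletons += 1
    return singletons / variable if variable else 0.0


toy = ["ACGTA", "ACGTT", "ACCTT", "TCGTA"]  # hypothetical toy alignment
print(singleton_fraction(toy))
```

    In the toy alignment, two of the three variable columns carry an allele seen only once, so the fraction is 2/3.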

  10. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants

    PubMed Central

    Gundry, Michael; Vijg, Jan

    2011-01-01

    DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative for the genome overall, current platforms allow whole genome sequencing for less than $5,000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a

  11. Copy number variations in Hanwoo and Yanbian cattle genomes using the massively parallel sequencing data.

    PubMed

    Choi, Jung-Woo; Chung, Won-Hyong; Lim, Kyu-Sang; Lim, Won-Jun; Choi, Bong-Hwan; Lee, Seung-Hwan; Kim, Hyeong-Cheol; Lee, Seung-Soo; Cho, Eun-Seok; Lee, Kyung-Tai; Kim, Namshin; Kim, Jeong-Dae; Kim, Jong-Bok; Chai, Han-Ha; Cho, Yong-Min; Kim, Tae-Hun; Lim, Dajeong

    2016-09-01

    Hanwoo, an indigenous Korean beef cattle breed, shared a common ancestor until the last century with Yanbian cattle, which are found in the northeastern provinces of China. During recent decades, these breeds have experienced different selection pressures. Here, we present genome-wide copy number variations (CNVs) identified by comparing Hanwoo and Yanbian cattle sequencing data. We used ~3.12 and ~3.07 billion sequence reads from Hanwoo and Yanbian cattle, respectively. A total of 901 putative CNV regions (CNVRs) were identified throughout the genome, covering 5,513,340 bp. This is fewer than reported in previous studies, indicating that Hanwoo are genetically close to Yanbian cattle. Of the CNVRs, 53.2% were gains and 46.8% were losses in Hanwoo. Potential functional roles of each CNVR were assessed by annotating all CNVRs and performing gene ontology (GO) enrichment analysis. We found that 278 CNVRs overlapped with cattle gene sets (genic CNVRs), which could be promising candidates to account for economically important traits in cattle. The enrichment analysis indicated that genes were significantly over-represented in GO terms including developmental process, multicellular organismal process, reproduction, and response to stimulus. These results provide a valuable genomic resource for determining how CNVs are associated with cattle traits. PMID:27188257
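
    CNV regions are commonly defined by merging overlapping CNV calls into a single region; the abstract does not spell out the merging rule used, so the following is a sketch under that assumption. It merges half-open intervals [start, end) from one chromosome and sums their lengths, the same kind of total reported above in base pairs. The calls are hypothetical.

```python
def merge_cnv_calls(calls: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or abutting half-open intervals [start, end) into regions."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(calls):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous region: extend that region.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


calls = [(1_000, 5_000), (4_000, 9_000), (20_000, 22_000)]  # hypothetical calls on one chromosome
regions = merge_cnv_calls(calls)
total_bp = sum(end - start for start, end in regions)
print(regions, total_bp)  # [(1000, 9000), (20000, 22000)] 10000
```

    Calls from different chromosomes would need to be grouped first, since the function compares coordinates only.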

  12. Serine Hydroxymethyltransferase 1 and 2: Gene Sequence Variation and Functional Genomic Characterization

    PubMed Central

    Hebbring, Scott J.; Chai, Yubo; Ji, Yuan; Abo, Ryan P.; Jenkins, Gregory D.; Fridley, Brooke; Zhang, Jianping; Eckloff, Bruce W.; Wieben, Eric D.; Weinshilboum, Richard M.

    2012-01-01

    Serine hydroxymethyltransferase (SHMT) catalyzes the transfer of a beta carbon from serine to tetrahydrofolate (THF) to form glycine and 5,10-methylene-THF. This reaction plays an important role in neurotransmitter synthesis and metabolism. We resequenced SHMT1 and SHMT2 and then performed functional genomic studies. We identified 87 and 60 polymorphisms in SHMT1 and SHMT2, respectively. We observed no significant functional effect of the 13 nonsynonymous SNPs in these genes on either catalytic activity or protein quantity. We imputed additional variants across the two genes using “1000 Genomes” data and identified 14 variants that were significantly associated (p-value < 1.0E-10) with SHMT1 mRNA expression in lymphoblastoid cell lines. Many of these SNPs were also significantly correlated with basal SHMT1 protein expression in 268 human liver biopsy samples. Reporter gene assays suggested that the SHMT1 promoter SNP rs669340 contributed to this variation. Finally, SHMT1 and SHMT2 expression were significantly correlated with those of other Folate and Methionine Cycle genes at both the mRNA and protein levels. These experiments represent a comprehensive study of SHMT1 and SHMT2 gene sequence variation and its functional implications. In addition, we obtained preliminary indications that these genes may be co-regulated with other Folate and Methionine Cycle genes. PMID:22220685

  13. Detection and implication of significant temporal b-value variation during earthquake sequences

    NASA Astrophysics Data System (ADS)

    Gulia, Laura; Tormann, Thessa; Schorlemmer, Danijel; Wiemer, Stefan

    2016-04-01

    Earthquakes tend to cluster in space and time, and periods of increased seismic activity are also periods of increased seismic hazard. Forecasting models currently used in statistical seismology and in Operational Earthquake Forecasting (e.g. ETAS) consider the spatial and temporal changes in activity rates, whereas the spatio-temporal changes in the earthquake size distribution, the b-value, are not included. Laboratory experiments on rock samples show an increasing relative proportion of larger events as the system approaches failure, and a sudden reversal of this trend after the main event. The increasing fraction of larger events during the stress increase period can be represented mathematically by a systematic b-value decrease, while the b-value increases immediately after the stress release. We investigate whether these lab-scale observations also apply to natural earthquake sequences and can help improve our understanding of the physical processes generating damaging earthquakes. A number of large events nucleated in low b-value regions, and spatial b-value variations have been extensively documented in the past. Detecting temporal b-value evolution with confidence is more difficult, one reason being the very different time scales that have been suggested for a precursory drop in b-value, from a few days to decadal-scale gradients. We demonstrate with detailed case studies of the 2009 M6.3 L'Aquila and 2011 M9 Tohoku earthquakes that significant and meaningful temporal b-value variability can be detected throughout the sequences, suggesting, for example, that foreshock probabilities are not generic but subject to significant spatio-temporal variability. Such potential conclusions require and motivate the systematic study of many sequences to investigate whether general patterns exist that might eventually be useful for time-dependent or even real-time seismic hazard assessment.
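
    Temporal b-value changes are typically tracked by re-estimating b in windows of consecutive events. One widely used estimator is the maximum-likelihood formula of Aki (1965), b = log10(e) / (mean(M) - (Mc - ΔM/2)), where Mc is the completeness magnitude and ΔM/2 is Utsu's correction for magnitudes binned at width ΔM. The abstract does not say which estimator or window size the authors used; the function names and defaults below (100-event windows, step of 10) are illustrative assumptions.

```python
import math
import random


def b_value_aki(magnitudes: list[float], mc: float, dm: float = 0.1) -> float:
    """Maximum-likelihood b-value (Aki, 1965) with Utsu's binning correction dm/2.

    Only events with magnitude >= mc (the completeness magnitude) are used.
    Pass dm=0.0 for unbinned (continuous) magnitudes.
    """
    complete = [m for m in magnitudes if m >= mc - 1e-9]  # tolerance for float-binned values
    if not complete:
        raise ValueError("no events at or above the completeness magnitude")
    mean_m = sum(complete) / len(complete)
    return math.log10(math.e) / (mean_m - (mc - dm / 2))


def moving_b_values(magnitudes: list[float], mc: float, window: int = 100,
                    step: int = 10, dm: float = 0.1) -> list[float]:
    """b-values for sliding windows of `window` complete events, in catalogue order."""
    complete = [m for m in magnitudes if m >= mc - 1e-9]
    return [b_value_aki(complete[i:i + window], mc, dm)
            for i in range(0, len(complete) - window + 1, step)]


# Synthetic Gutenberg-Richter catalogue with true b = 1 (continuous magnitudes above Mc = 2.0).
random.seed(1)
synthetic = [2.0 + random.expovariate(math.log(10)) for _ in range(5000)]
print(round(b_value_aki(synthetic, mc=2.0, dm=0.0), 2))
print(len(moving_b_values(synthetic, mc=2.0, dm=0.0)))
```

    On the synthetic catalogue the whole-catalogue estimate lands close to the true value of 1, while individual 100-event windows scatter noticeably around it; this is why uncertainty estimates (for example by bootstrap resampling) are needed before a temporal change is read as significant.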

  14. Effect of laying sequence on egg mercury in captive zebra finches: an interpretation considering individual variation.

    PubMed

    Ou, Langbo; Varian-Ramos, Claire W; Cristol, Daniel A

    2015-08-01

    Bird eggs are used widely as noninvasive bioindicators for environmental mercury availability. Previous studies, however, have found varying relationships between laying sequence and egg mercury concentrations. Some studies have reported that the mercury concentration was higher in first-laid eggs or declined across the laying sequence, whereas in other studies mercury concentration was not related to egg order. Approximately 300 eggs (61 clutches) were collected from captive zebra finches dosed throughout their reproductive lives with methylmercury (0.3 μg/g, 0.6 μg/g, 1.2 μg/g, or 2.4 μg/g wet wt in diet); the total mercury concentration (mean ± standard deviation [SD] dry wt basis) of their eggs was 7.03 ± 1.38 μg/g, 14.15 ± 2.52 μg/g, 26.85 ± 5.85 μg/g, and 49.76 ± 10.37 μg/g, respectively (equivalent to fresh wt egg mercury concentrations of 1.24 μg/g, 2.50 μg/g, 4.74 μg/g, and 8.79 μg/g). The authors observed a significant decrease in the mercury concentration of successive eggs when compared with the first egg and notable variation between clutches within treatments. The mercury level of individual females within and among treatments did not alter this relationship. Based on the results, sampling of a single egg in each clutch from any position in the laying sequence is sufficient for purposes of population risk assessment, but it is not recommended as a proxy for individual female exposure or as an estimate of average mercury level within the clutch. PMID:25760460
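
    The paired dry-weight and fresh-weight means above imply a nearly constant conversion factor between the two bases. The sketch below recovers that factor from the reported values. The factor itself (roughly 0.176-0.177, i.e. about 18% dry matter) is an inference from these numbers, not a figure stated by the authors, and `dry_to_fresh` is a hypothetical helper.

```python
# Mean egg mercury per dose group as reported: dry-weight basis and the
# fresh-weight equivalents given in parentheses (both in μg/g).
dry = [7.03, 14.15, 26.85, 49.76]
fresh = [1.24, 2.50, 4.74, 8.79]

ratios = [f / d for f, d in zip(fresh, dry)]
mean_ratio = sum(ratios) / len(ratios)  # inferred conversion factor, not an author-stated value


def dry_to_fresh(dry_ug_per_g: float, factor: float = mean_ratio) -> float:
    """Convert a dry-weight concentration to an approximate fresh-weight equivalent."""
    return dry_ug_per_g * factor


print([round(r, 3) for r in ratios], round(mean_ratio, 3))
print(round(dry_to_fresh(10.0), 2))
```

    Because the four ratios agree to within about 0.001, a single factor reproduces each reported fresh-weight value to within rounding.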

  15. Variation among Bm86 sequences in Rhipicephalus (Boophilus) microplus ticks collected from cattle across Thailand.

    PubMed

    Kaewmongkol, S; Kaewmongkol, G; Inthong, N; Lakkitjaroen, N; Sirinarumitr, T; Berry, C M; Jonsson, N N; Stich, R W; Jittapalapong, S

    2015-06-01

    Anti-tick vaccines based on the recombinant homologues Bm86 and Bm95 have become a more cost-effective and sustainable alternative to the chemical pesticides commonly used to control the cattle tick, Rhipicephalus (Boophilus) microplus. However, Bm86 polymorphism among geographically separate ticks is reportedly associated with reduced effectiveness of these vaccines. The purpose of this study was to investigate the variation of Bm86 among cattle ticks collected from Northern, Northeastern, Central and Southern areas across Thailand. Bm86 cDNA and deduced amino acid sequences representing 29 female tick midgut samples were 95.6-97.0% and 91.5-93.5% identical, respectively, to the nucleotide and amino acid reference sequences of the Australian Yeerongpilly vaccine strain. Multiple sequence analyses of these Bm86 variants indicated geographical relationships and polymorphism among Thai cattle ticks. Two major groups of cattle tick strains were discernible in this phylogenetic analysis of Bm86: a Thai group and a Latin American group. Thai female and male cattle ticks (50 pairs) were also subjected to detailed morphological characterization to confirm their identity. The majority of female ticks had morphological features consistent with those described for R. (B.) microplus, whereas, curiously, the majority of male ticks were more consistent with the recently reinstated R. (B.) australis. A number of these ticks had features consistent with both species. Further investigations are warranted to test the efficacy of rBm86-based vaccines against homologous and heterologous challenge infestations with Thai tick strains and to study the phylogeny of Thai cattle ticks in depth. PMID:25777941
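
    Percent identity values like those above depend on how alignment gaps are treated. A minimal sketch for two pre-aligned sequences, assuming gapped columns are simply excluded from the denominator; this is one common convention among several, and the abstract does not state which the authors used.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped column: excluded from the comparison
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments: 8 matches over 9 ungapped columns.
print(round(percent_identity("ATGC-TTAGC", "ATGCATTGGC"), 1))
```

    Counting gapped columns as mismatches instead would lower the value for the same pair, so identity ranges from different studies are only comparable when the convention matches.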

  16. Synthetic promoter elements obtained by nucleotide sequence variation and selection for activity

    PubMed Central

    Edelman, Gerald M.; Meech, Robyn; Owens, Geoffrey C.; Jones, Frederick S.

    2000-01-01

    Eukaryotic transcriptional regulation in different cells involves large numbers and arrangements of cis and trans elements. To survey the number of cis regulatory elements that are active in different contexts, we have devised a high-throughput selection procedure permitting synthesis of active cis motifs that enhance the activity of a minimal promoter. This synthetic promoter construction method (SPCM) was used to identify >100 DNA sequences that showed increased promoter activity in the neuroblastoma cell line Neuro2A. After the DNA sequences of the selected synthetic promoters were determined, database searches for known elements revealed a predominance of eight motifs: AP2, CEBP, GRE, Ebox, ETS, CREB, AP1, and SP1/MAZ. The most active of the selected synthetic promoters contain composites of a number of these motifs. Assays of DNA binding and promoter activity of three exemplary motifs (ETS, CREB, and SP1/MAZ) were used to demonstrate the effectiveness of SPCM in uncovering active sequences. Up to 10% of 133 selected active sequences had no match in currently available databases, raising the possibility that SPCM may reveal new motifs and the transcriptional regulatory proteins that bind them. The method may find uses in constructing databases of active cis motifs, in diagnostics, and in gene therapy. PMID:10725347
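
    The database searches described above match selected sequences against models of known binding sites. As a rough illustration only, the sketch below scans a sequence on both strands for textbook consensus strings for four of the named motifs. The patterns are simplified assumptions (real searches use position weight matrices), and the example sequence is hypothetical.

```python
import re

# Simplified consensus strings (assumption: textbook consensus sites, not the
# position weight matrices a real database search would use).
MOTIFS = {
    "E-box": "CA..TG",
    "CRE": "TGACGTCA",
    "AP-1": "TGA[CG]TCA",
    "GC-box (SP1)": "GGGCGG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str) -> dict[str, list[tuple[int, str]]]:
    """Map each motif to sorted (forward-strand start, strand) hits.

    A palindromic site matches both strands at the same coordinates and is
    reported once, as '+'.
    """
    seq = seq.upper()
    rc = revcomp(seq)
    n = len(seq)
    hits: dict[str, list[tuple[int, str]]] = {}
    for name, pattern in MOTIFS.items():
        regex = re.compile(f"(?=({pattern}))")  # lookahead allows overlapping hits
        sites = {m.start(): "+" for m in regex.finditer(seq)}
        for m in regex.finditer(rc):
            # Convert a reverse-strand hit to forward-strand coordinates.
            start = n - m.start() - len(m.group(1))
            sites.setdefault(start, "-")
        hits[name] = sorted(sites.items())
    return hits


example = "TTCACGTGAATGACGTCATTGGGCGGA"  # hypothetical sequence
for motif, sites in find_sites(example).items():
    print(motif, sites)
```

    A sequence with no match under such a scan is not necessarily novel; it may simply fall outside these simplified consensus strings.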

  17. Mitochondrial intronic open reading frames in Podospora: Mobility and consecutive exonic sequence variations

    SciTech Connect

    Sellem, C.H.; Rossignol, M.; Belcour, L.

    1996-06-01

    The mitochondrial genome of 23 wild-type strains belonging to three different species of the filamentous fungus Podospora was examined. Among the 15 optional sequences identified are two intronic reading frames, nad1-i4-orf1 and cox1-i7-orf2. We show that the presence of these sequences was strictly correlated with tightly clustered nucleotide substitutions in the adjacent exon. This correlation applies to the presence or absence of closely related open reading frames (ORFs), found at the same genetic locations, in all the Pyrenomycete genera examined. The recent gain of these optional ORFs in the evolution of the genus Podospora probably accounts for such sequence differences. In the homoplasmic progeny from heteroplasmons constructed between Podospora strains differing by the presence of these optional ORFs, nad1-i4-orf1 and cox1-i7-orf2 appeared highly invasive. Sequence comparisons in the nad1-i4 intron of various strains of the Pyrenomycete family led us to propose a scenario for its evolution that includes several events of loss and gain of intronic ORFs. These results strongly reinforce the idea that group I intronic ORFs are mobile elements and that their transfer, and concomitant modification of the adjacent exon, could participate in the modular evolution of mitochondrial genomes. 46 refs., 5 figs., 2 tabs.

  18. LOVD: easy creation of a locus-specific sequence variation database using an "LSDB-in-a-box" approach.

    PubMed

    Fokkema, Ivo F A C; den Dunnen, Johan T; Taschner, Peter E M

    2005-08-01

    The completion of the human genome project has initiated, as well as provided the basis for, the collection and study of all sequence variation between individuals. Direct access to up-to-date information on sequence variation is currently provided most efficiently through web-based, gene-centered, locus-specific databases (LSDBs). We have developed the Leiden Open (source) Variation Database (LOVD) software, which follows the "LSDB-in-a-Box" approach for the easy creation and maintenance of a fully web-based gene sequence variation database. LOVD is platform-independent and uses only PHP and MySQL open source software. The basic gene-centered and modular design of the database follows the recommendations of the Human Genome Variation Society (HGVS) and focuses on the collection and display of DNA sequence variations. With minimal effort, the LOVD platform can be extended with clinical data. The open set-up should both facilitate and promote functional extension with scripts written by the community. The LOVD software is freely available from the Leiden Muscular Dystrophy pages (www.DMD.nl/LOVD/). To promote the use of LOVD, we currently offer curators the possibility to set up an LSDB on our Leiden server. PMID:15977173
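
    LOVD itself is written in PHP with MySQL; the Python fragment below is not part of LOVD but a hypothetical sketch of the kind of validation a gene-centred variant table needs before storing HGVS-style descriptions. It accepts only simple coding-DNA substitutions such as c.76A>T, and the gene symbol used is a placeholder.

```python
import re
from dataclasses import dataclass

# Hypothetical validator: only simple coding-DNA substitutions such as "c.76A>T".
# Intronic offsets, deletions, duplications, insertions, etc. are not handled.
_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


@dataclass(frozen=True)
class Substitution:
    position: int  # coding-DNA position (1-based)
    ref: str       # reference nucleotide
    alt: str       # variant nucleotide


def parse_substitution(description: str) -> Substitution:
    """Parse a simple HGVS-style coding substitution or raise ValueError."""
    match = _SUBSTITUTION.match(description.strip())
    if match is None:
        raise ValueError(f"unsupported or malformed description: {description!r}")
    position, ref, alt = match.groups()
    if int(position) < 1:
        raise ValueError("coding-DNA positions start at 1")
    if ref == alt:
        raise ValueError("reference and variant nucleotides must differ")
    return Substitution(int(position), ref, alt)


# A toy gene-centred table: gene symbol -> list of parsed variants.
variants_by_gene: dict[str, list[Substitution]] = {}
variants_by_gene.setdefault("GENE1", []).append(parse_substitution("c.76A>T"))  # hypothetical gene
print(variants_by_gene)
```

    Rejecting malformed descriptions at entry time keeps a locus-specific table consistent with the nomenclature it claims to follow.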

  19. Spatial and Temporal Stress Drop Variations of the 2011 Tohoku Earthquake Sequence

    NASA Astrophysics Data System (ADS)

    Miyake, H.

    2013-12-01

    The 2011 Tohoku earthquake sequence consists of foreshocks, the mainshock, aftershocks, and repeating earthquakes. Quantifying spatial and temporal stress drop variations is important for understanding M9-class megathrust earthquakes. The variability and the spatial and temporal patterns of stress drop are basic information for rupture dynamics and are also useful for source modeling. As pointed out in the ground motion prediction equations of Campbell and Bozorgnia [2008, Earthquake Spectra], mainshock-aftershock pairs often show a significant decrease in stress drop. Here we focus on strong-motion records before and after the Tohoku earthquake and analyze source spectral ratios, considering azimuth and distance dependency [Miyake et al., 2001, GRL]. Because of the limited station locations on land, spatial and temporal stress drop variations are estimated by adjusting shifts from the omega-squared source spectral model. The adjustment is based on stochastic Green's function simulations of source spectra that consider azimuth and distance dependency. Because we assume the same Green's functions for each event pair at each station, both the propagation path and site amplification effects cancel out. Although precise studies of spatial and temporal stress drop variations have been performed [e.g., Allmann and Shearer, 2007, JGR], this study targets the relations between stress drop and both the progression of slow slip prior to the Tohoku earthquake reported by Kato et al. [2012, Science] and plate structures. Acknowledgement: This study is partly supported by ERI Joint Research (2013-B-05). We used the JMA unified earthquake catalogue and K-NET, KiK-net, and F-net data provided by NIED.
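
    The cancellation argument above can be made concrete with an omega-squared source model, in which the spectrum is flat at the seismic moment M0 below the corner frequency fc and decays as f^-2 above it. A minimal sketch, assuming a Brune-type spectrum and Brune's (1970) stress-drop relation with an assumed shear-wave speed of 3.5 km/s; the abstract specifies neither choice, and the moments and corner frequencies are hypothetical.

```python
import math


def omega_squared(f: float, m0: float, fc: float) -> float:
    """Omega-squared source spectrum: flat at m0 below fc, falling as f**-2 above it."""
    return m0 / (1.0 + (f / fc) ** 2)


def spectral_ratio(f: float, m0_a: float, fc_a: float, m0_b: float, fc_b: float) -> float:
    """Source spectral ratio of event A over event B.

    If both events share the same path and site terms at a station, those terms
    divide out of the observed ratio and only this source ratio remains.
    """
    return omega_squared(f, m0_a, fc_a) / omega_squared(f, m0_b, fc_b)


def brune_stress_drop(m0: float, fc: float, beta: float = 3500.0) -> float:
    """Brune (1970) stress drop in Pa from M0 (N·m), fc (Hz) and shear-wave speed beta (m/s)."""
    radius = 2.34 * beta / (2.0 * math.pi * fc)  # Brune source radius in metres
    return 7.0 * m0 / (16.0 * radius ** 3)


# Hypothetical event pair: a larger event A (low fc) and a smaller event B (high fc).
for f in (0.1, 1.0, 10.0):
    print(f, round(spectral_ratio(f, 4e16, 0.5, 1e15, 2.0), 2))
print(f"{brune_stress_drop(1e15, 2.0) / 1e6:.2f} MPa")
```

    The ratio tends to M0a/M0b at low frequency and to (M0a·fca²)/(M0b·fcb²) at high frequency. Because the Brune stress drop scales as M0·fc³, a small relative error in the corner frequency becomes roughly three times larger in the stress drop.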

  20. Genetic variation and population structure of hair crab (Erimacrus isenbeckii) in Japan inferred from mitochondrial DNA sequence analysis.

    PubMed

    Azuma, Noriko; Kunihiro, Yasushi; Sasaki, Jun; Mihara, Eiji; Mihara, Yukio; Yasunaga, Tomoaki; Jin, Deuk-Hee; Abe, Syuiti

    2008-01-01

    Genetic variation and population structure of hair crab (Erimacrus isenbeckii) were examined using nucleotide sequence analysis of 580 base pairs (bp) in the 3' portion of the mitochondrial cytochrome c oxidase subunit I gene (COI) of 20 samples collected from 16 locales in Japan (the Hokkaido and Honshu Islands) and one in Korea. A total of 27 haplotypes were defined by 23 variable nucleotide sites in the examined COI region. Pairwise population FST estimates and a neighbor-joining tree revealed distinct genetic differentiation between the representative samples from the Pacific Ocean off eastern Hokkaido Island and those from the Sea of Japan, while the other samples were intermediate between these two groups. AMOVA also showed weak but significant differentiation among these three groups. The present results suggest a moderate population structure of hair crab, probably influenced by high gene flow between regional populations due to the sea-current-dependent larval dispersal of this species. PMID:17955293
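
    Counts such as "27 haplotypes defined by 23 variable nucleotide sites" come from tallying distinct sequences and polymorphic alignment columns. A minimal sketch, assuming pre-aligned sequences without ambiguity codes; it also reports Nei's haplotype diversity, a common companion statistic that the abstract does not mention. The toy sequences are hypothetical.

```python
from collections import Counter


def haplotype_summary(sequences: list[str]) -> tuple[int, int, float]:
    """Return (number of haplotypes, number of variable sites, haplotype diversity).

    Haplotype diversity uses Nei's estimator h = n/(n-1) * (1 - sum(p_i**2)).
    Sequences are assumed pre-aligned, of equal length and free of ambiguity codes.
    """
    n = len(sequences)
    counts = Counter(sequences)  # each distinct sequence is one haplotype
    variable_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if n < 2:
        return len(counts), variable_sites, 0.0
    h = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    return len(counts), variable_sites, h


toy_haplotypes = ["ACGT", "ACGT", "ACTT", "GCTT"]  # hypothetical aligned COI fragments
n_hap, n_var, h = haplotype_summary(toy_haplotypes)
print(n_hap, n_var, round(h, 3))
```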

  1. DNA Sequence Variation at the Period Locus Reveals the History of Species and Speciation Events in the Drosophila Virilis Group

    PubMed Central

    Hilton, H.; Hey, J.

    1996-01-01

    The virilis phylad of the Drosophila virilis group consists of five closely related taxa: D. virilis, D. lummei, D. novamexicana, D. americana americana and D. americana texana. DNA sequences from a 2.1-kb portion of the period locus were generated in four to eight individuals from each of the five taxa. We found evidence of recombination and high levels of variation within species. We found no evidence of recent natural selection. Surprisingly, there was no evidence of divergence between D. a. americana and D. a. texana, and they collectively appear to have had a large historical effective population size. The ranges of these two taxa overlap in a large hybrid zone that has been delineated in the eastern U.S. on the basis of the geographic pattern of a chromosomal fusion. Also surprisingly, D. novamexicana appears to consist of two distinct groups, each with a low population size and no gene flow between them. PMID:8913746
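
    Within-species variation of this kind is often quantified as nucleotide diversity (π), the average proportion of differing sites between two sequences drawn from the sample. The abstract does not report which statistic was used; the sketch below computes π for hypothetical aligned sequences, assuming no gaps or missing data.

```python
from itertools import combinations


def nucleotide_diversity(sequences: list[str]) -> float:
    """Mean pairwise proportion of differing sites (pi) over all sequence pairs."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    # Total number of mismatching sites summed over every pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


sample = ["AAAA", "AAAT", "AATT"]  # hypothetical aligned fragments
print(round(nucleotide_diversity(sample), 4))
```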

  2. Identification of the ovine KAP11-1 gene (KRTAP11-1) and genetic variation in its coding sequence.

    PubMed

    Gong, Hua; Zhou, Huitong; Dyer, Jolon M; Hickford, Jon G H

    2011-11-01

    Keratin-associated proteins (KAPs) are a structural component of the wool fibre and form the matrix between the keratin intermediate filaments (KIFs). The gene encoding the high-sulphur protein KAP11-1 has been identified in human, cattle and mouse, but not yet in sheep, despite the economic importance of wool. In this study, PCR using primers based on the cattle KAP11-1 gene sequence produced an amplicon of the expected size with sheep DNA. PCR-Single Stranded Conformational Polymorphism (PCR-SSCP) analysis of 260 sheep revealed six different PCR-SSCP patterns. Either one or a combination of two banding patterns was observed for each sheep, suggesting that they were either homozygous or heterozygous for this gene. Sequencing of the amplicons confirmed the occurrence of six DNA sequences. All of these were unique, and the greatest homology was with KRTAP11-1 sequences from cattle, human and mouse, suggesting that they were derived from the ovine KAP11-1 gene and were allelic variants. The ovine KAP11-1 gene had an open reading frame of 477 nucleotides encoding 159 amino acids. The putative protein was rich in serine, cysteine and threonine, which account for 18.2-18.9, 12.6 and 12.0 mol%, respectively. Approximately 20 of these serine and threonine residues might be phosphorylated. Five nucleotide substitutions were identified; one was non-synonymous and would result in an amino acid change at a potential phosphorylation site. The genetic variation found in KRTAP11-1 may influence its expression, protein structure and/or post-translational modifications, and consequently affect wool fibre structure and wool traits. PMID:21400094
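
    The mol% figures above are residue counts divided by protein length. A minimal sketch, using a hypothetical peptide rather than the ovine KAP11-1 sequence, which is not given here:

```python
from collections import Counter


def mol_percent(protein: str, residues: str = "SCT") -> dict[str, float]:
    """Mole percent of selected residues in a one-letter protein sequence."""
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol, if present
    if not protein:
        raise ValueError("empty protein sequence")
    counts = Counter(protein)
    return {aa: 100.0 * counts[aa] / len(protein) for aa in residues}


composition = mol_percent("MSCTSCTGGSA")  # hypothetical peptide, not KAP11-1
print({aa: round(value, 1) for aa, value in composition.items()})
```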

  3. Biological Processes Discovered by High-Throughput Sequencing.

    PubMed

    Reon, Brian J; Dutta, Anindya

    2016-04-01

    Advances in DNA and RNA sequencing technologies have completely transformed the field of genomics. High-throughput sequencing (HTS) is now a widely used and accessible technology that allows scientists to sequence an entire transcriptome or genome in a timely and cost-effective manner. Application of HTS techniques has led to many key discoveries, including the identification of long noncoding RNAs; microDNAs, a family of small extrachromosomal circular DNA species; and tRNA-derived fragments, a group of small RNAs, distinct from miRNAs, that are derived from tRNAs. Furthermore, public sequencing repositories provide unique opportunities for laboratories to parse large sequencing databases to identify proteins and noncoding RNAs at a scale that was not possible a decade ago. Herein, we review how HTS has led to the discovery of novel nucleic acid species and, along the way, uncovered new biological processes. PMID:26828742

  4. Haplotyping germline and cancer genomes with high-throughput linked-read sequencing.

    PubMed

    Zheng, Grace X Y; Lau, Billy T; Schnall-Levin, Michael; Jarosz, Mirna; Bell, John M; Hindson, Christopher M; Kyriazopoulou-Panagiotopoulou, Sofia; Masquelier, Donald A; Merrill, Landon; Terry, Jessica M; Mudivarti, Patrice A; Wyatt, Paul W; Bharadwaj, Rajiv; Makarewicz, Anthony J; Li, Yuan; Belgrader, Phillip; Price, Andrew D; Lowe, Adam J; Marks, Patrick; Vurens, Gerard M; Hardenbol, Paul; Montesclaros, Luz; Luo, Melissa; Greenfield, Lawrence; Wong, Alexander; Birch, David E; Short, Steven W; Bjornson, Keith P; Patel, Pranav; Hopmans, Erik S; Wood, Christina; Kaur, Sukhvinder; Lockwood, Glenn K; Stafford, David; Delaney, Joshua P; Wu, Indira; Ordonez, Heather S; Grimes, Susan M; Greer, Stephanie; Lee, Josephine Y; Belhocine, Kamila; Giorda, Kristina M; Heaton, William H; McDermott, Geoffrey P; Bent, Zachary W; Meschi, Francesca; Kondov, Nikola O; Wilson, Ryan; Bernate, Jorge A; Gauby, Shawn; Kindwall, Alex; Bermejo, Clara; Fehr, Adrian N; Chan, Adrian; Saxonov, Serge; Ness, Kevin D; Hindson, Benjamin J; Ji, Hanlee P

    2016-03-01

    Haplotyping of human chromosomes is a prerequisite for cataloguing the full repertoire of genetic variation. We present a microfluidics-based, linked-read sequencing technology that can phase and haplotype germline and cancer genomes using nanograms of input DNA. This high-throughput platform prepares barcoded libraries for short-read sequencing and computationally reconstructs long-range haplotype and structural variant information. We generate haplotype blocks in a nuclear trio that are concordant with expected inheritance patterns and phase a set of structural variants. We also resolve the structure of the EML4-ALK gene fusion in the NCI-H2228 cancer cell line using phased exome sequencing. Finally, we assign genetic aberrations to specific megabase-scale haplotypes generated from whole-genome sequencing of a primary colorectal adenocarcinoma. This approach resolves haplotype information using up to 100 times less genomic DNA than some methods and enables the accurate detection of structural variants. PMID:26829319

  5. Simultaneous alignment and folding of 28S rRNA sequences uncovers phylogenetic signal in structure variation.

    PubMed

    Letsch, Harald O; Greve, Carola; Kück, Patrick; Fleck, Günther; Stocsits, Roman R; Misof, Bernhard

    2009-12-01

    Secondary structure models of mitochondrial and nuclear (r)RNA sequences are frequently applied to aid the alignment of these molecules in phylogenetic analyses. Additionally, it is often speculated that structure variation of (r)RNA sequences might profitably be used as phylogenetic markers. The benefit of these approaches depends on the reliability of structure models. We used a recently developed approach to show that reliable inference of large (r)RNA secondary structures, a prerequisite for simultaneous sequence and structure alignment, is feasible. The approach iteratively establishes local structure constraints for each sequence and infers fully folded individual structures by constrained minimum free energy (MFE) optimization. A comparison of structure edit distances of individual constraints and fully folded structures showed pronounced phylogenetic signal in fully folded structures. As model sequences, we characterized secondary structures of 28S rRNA sequences of selected insects and examined their phylogenetic signal against established phylogenetic hypotheses. PMID:19654047
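
    The abstract compares structures using structure edit distances. A simpler example of a distance between two secondary structures is the base-pair distance, which counts the base pairs that occur in only one of the two structures. The Python sketch below computes it from dot-bracket strings. It is not the authors' method, and the function names are hypothetical. The sketch assumes that both structures are written over the same aligned positions and contain no pseudoknots.

```python
def base_pairs(dot_bracket: str) -> set[tuple[int, int]]:
    """Return the set of (i, j) base pairs (0-based) encoded by a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)  # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.add((stack.pop(), i))  # close the most recent open pair
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def base_pair_distance(s1: str, s2: str) -> int:
    """Count base pairs present in exactly one of two equal-length structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must cover the same aligned positions")
    # Symmetric difference: pairs in s1 but not s2, plus pairs in s2 but not s1.
    return len(base_pairs(s1) ^ base_pairs(s2))
```

    For example, `base_pair_distance("((..))", "(....)")` returns 1, because only the pair between 0-based positions 1 and 4 differs.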

  6. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms through binding, which increases the signal. Unlabeled DNA is used with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms through the nuclease activity of the chimera, which decreases the signal.

  7. Empirical assessment of sequencing errors for high throughput pyrosequencing data

    PubMed Central

    2013-01-01

    Background Sequencing-by-synthesis technologies offer significant improvements over the Sanger method in terms of speed and cost per base. However, they usually still fail to compete in terms of read length and quality. Current high-throughput implementations of the pyrosequencing technique yield reads whose lengths approach those of the capillary electrophoresis method. A less obvious question is whether their quality is affected by platform-specific sequencing errors. Results We present an empirical study aimed at assessing the quality and characterising the sequencing errors of high-throughput pyrosequencing data. We have developed a procedure for extracting sequencing error data from genome assemblies and studying their characteristics, in particular the length distribution of indel gaps and their relation to the sequence contexts in which they occur. We used this procedure to analyse data from three prokaryotic genomes sequenced with the GS FLX technology. We also compared two gap models previously employed with success for peptide sequence alignment. Conclusions We observed an overall very low error rate in the analysed data, with indel errors being much more abundant than substitutions. We also observed a dependence between the length of the gaps and that of the homopolymer context in which they occur. As with protein alignments, a power-law model seems to approximate the indel errors more accurately, although the results are not so conclusive as to justify a departure from the commonly used affine gap penalty scheme. In either case, however, our procedure can be used to estimate more realistic error model parameters. PMID:23339526
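
    The contrast between the affine gap penalty and a power-law model can be expressed as two probability models for indel lengths. The first is a geometric distribution, whose negative log-likelihood grows linearly with gap length, which corresponds to an affine cost. The second is a power law truncated at the longest observed gap, whose negative log-likelihood grows with the logarithm of gap length. The Python sketch below fits both models by maximum likelihood and reports their log-likelihoods. It is a minimal illustration under these stated assumptions, not the procedure used in the study. The function names and the exponent search grid are hypothetical choices.

```python
import math
from collections.abc import Sequence


def geometric_loglik(lengths: Sequence[int], p: float) -> float:
    """Log-likelihood of gap lengths under P(L) = p * (1 - p)**(L - 1), L >= 1.

    The negative log of this model is linear in L, i.e. an affine gap cost.
    """
    return sum(
        math.log(p) + ((L - 1) * math.log(1 - p) if L > 1 else 0.0)
        for L in lengths
    )


def power_law_loglik(lengths: Sequence[int], a: float, max_len: int) -> float:
    """Log-likelihood under P(L) = L**-a / Z, truncated at max_len.

    The negative log of this model grows with log(L) rather than with L.
    """
    z = sum(k ** -a for k in range(1, max_len + 1))  # normalising constant
    return sum(-a * math.log(L) - math.log(z) for L in lengths)


def fit_both(lengths: Sequence[int]) -> dict[str, float]:
    """Fit both gap-length models by maximum likelihood (hypothetical helper)."""
    mean_len = sum(lengths) / len(lengths)
    p_hat = 1.0 / mean_len  # closed-form MLE of the geometric parameter
    max_len = max(lengths)
    # The truncated power law has no closed-form MLE, so search a grid of exponents.
    grid = [0.5 + 0.01 * i for i in range(451)]  # a in [0.5, 5.0]
    a_hat = max(grid, key=lambda a: power_law_loglik(lengths, a, max_len))
    return {
        "p_hat": p_hat,
        "geometric_loglik": geometric_loglik(lengths, p_hat),
        "a_hat": a_hat,
        "power_law_loglik": power_law_loglik(lengths, a_hat, max_len),
    }
```

    A higher log-likelihood favours that model for the observed lengths. Each model has one free parameter here, so the two values can be compared directly.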

  8. Application of PCR amplicon sequencing using a single primer pair in PCR amplification to assess variations in Helicobacter pylori CagA EPIYA tyrosine phosphorylation motifs

    PubMed Central

    2010-01-01

    Background The presence of various EPIYA tyrosine phosphorylation motifs in the CagA protein of Helicobacter pylori has been suggested to contribute to pathogenesis in adults. In this study, a unique PCR assay and sequencing strategy was developed to establish the number and variation of cagA EPIYA motifs. Findings MDA-DNA derived from gastric biopsy specimens from eleven subjects with gastritis was used with M13- and T7-sequence-tagged primers for amplification of the cagA EPIYA motif region. Automated capillary electrophoresis using a high-resolution kit and amplicon sequencing confirmed variations in the cagA EPIYA motif region. In nine cases, sequencing revealed a single AB, ABC, or ABCC (Western-type) cagA EPIYA motif pattern. In two cases, double cagA EPIYA motifs were detected (ABC/ABCC or ABC/AB), indicating the presence of two H. pylori strains in the same biopsy. Conclusion Automated capillary electrophoresis and amplicon sequencing using a single M13- and T7-sequence-tagged primer pair in PCR amplification enabled rapid molecular typing of cagA EPIYA motifs. Moreover, the techniques described allowed rapid detection of mixed H. pylori strains present in the same biopsy specimen. PMID:20181142

  9. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing

    PubMed Central

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M. G.; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes

    2015-01-01

    Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA typically makes up too small a proportion of the DNA in a clinical sample to be detected. Sequence variation among virus genomes complicates the application of sequence-specific, highly sensitive PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to investigate samples from three cancer types for novel retroviruses. In accordance with recent studies, our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material. PMID:26285800

  10. High sequence conservation among cucumber mosaic virus isolates from lily.

    PubMed

    Chen, Y K; Derks, A F; Langeveld, S; Goldbach, R; Prins, M

    2001-08-01

    To classify Cucumber mosaic virus (CMV) isolates from ornamental crops in different geographical areas, the isolates were characterized by comparing the nucleotide sequences of their RNA 4 and the encoded coat proteins. Both CMV subgroups were represented among the isolates infecting ornamentals. CMV isolates from Alstroemeria and crocus were classified as subgroup II isolates, whereas 8 other isolates, from lily, gladiolus, amaranthus, larkspur, and lisianthus, were identified as subgroup I members. In general, nucleotide sequence comparisons correlated well with geographic distribution, with one notable exception: the analyzed nucleotide sequences of 5 lily isolates showed remarkably high similarity despite their different origins. PMID:11676424

  11. Genome reassembly with high-throughput sequencing data

    PubMed Central

    2013-01-01

    Motivation Recent studies in genomics have highlighted the significance of structural variation in determining individual variation. Current methods for identifying structural variation, however, predominantly focus either on assembling whole genomes from scratch or on identifying the relatively small changes between a genome and a reference sequence. While significant progress has been made in recent years on both de novo assembly and resequencing (read mapping) methods, few attempts have been made to bridge the gap between them. Results In this paper, we present a computational method for incorporating a reference sequence into an assembly algorithm. We propose a novel graph construction that builds upon the well-known de Bruijn graph to incorporate the reference, and describe a simple algorithm, based on iterative message passing, which uses this information to significantly improve assembly results. We validate our method by applying it to a series of 5 Mb simulated genomes derived from both mammalian and bacterial references. The results of applying our method to these simulated data are presented along with a discussion of the benefits and drawbacks of this technique. PMID:23368744
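
    The abstract builds on the de Bruijn graph, in which each k-mer becomes an edge from its (k-1)-length prefix to its (k-1)-length suffix. The Python sketch below shows one simple way for such a graph to carry reference information. Every edge records how many times it was seen in the reads and whether it also occurs in the reference. Edges supported only by reads then flag candidate differences from the reference. This is a hypothetical illustration of the general idea only. It does not implement the authors' graph construction or their iterative message-passing algorithm, and the function names and toy sequences are assumptions.

```python
from collections import defaultdict
from collections.abc import Iterator

Edge = tuple[str, str]


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer of seq in order."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_graph(reads: list[str], reference: str, k: int) -> dict[Edge, tuple[int, bool]]:
    """Map each edge (prefix, suffix) to (read support count, present in reference)."""
    read_count: defaultdict[Edge, int] = defaultdict(int)
    in_reference: set[Edge] = set()
    for read in reads:
        for km in kmers(read, k):
            read_count[(km[:-1], km[1:])] += 1  # one edge per k-mer occurrence
    for km in kmers(reference, k):
        in_reference.add((km[:-1], km[1:]))
    edges = set(read_count) | in_reference
    return {e: (read_count.get(e, 0), e in in_reference) for e in edges}


def read_only_edges(graph: dict[Edge, tuple[int, bool]]) -> list[Edge]:
    """Edges seen in reads but absent from the reference (candidate variants)."""
    return sorted(e for e, (count, in_ref) in graph.items() if count > 0 and not in_ref)
```

    With the read `ACGGAC`, the reference `ACGTAC` and k = 3, the edge `("CG", "GG")` is supported by the read but absent from the reference. It marks the point where the read departs from the reference path.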

  12. Nucleotide sequence variation of the envelope protein gene identifies two distinct genotypes of yellow fever virus.

    PubMed Central

    Chang, G J; Cropp, B C; Kinney, R M; Trent, D W; Gubler, D J

    1995-01-01

    The evolution of yellow fever virus over 67 years was investigated by comparing the nucleotide sequences of the envelope (E) protein genes of 20 viruses isolated in Africa, the Caribbean, and South America. Uniformly weighted parsimony algorithm analysis defined two major evolutionary yellow fever virus lineages designated E genotypes I and II. E genotype I contained viruses isolated from East and Central Africa. E genotype II viruses were divided into two sublineages: IIA viruses from West Africa and IIB viruses from America, except for a 1979 virus isolated from Trinidad (TRINID79A). Unique signature patterns were identified at 111 nucleotide and 12 amino acid positions within the yellow fever virus E gene by signature pattern analysis. Yellow fever viruses from East and Central Africa contained unique signatures at 60 nucleotide and five amino acid positions, those from West Africa contained unique signatures at 25 nucleotide and two amino acid positions, and viruses from America contained such signatures at 30 nucleotide and five amino acid positions in the E gene. The dissemination of yellow fever viruses from Africa to the Americas is supported by the close genetic relatedness of genotype IIA and IIB viruses and genetic evidence of a possible second introduction of yellow fever virus from West Africa, as illustrated by the TRINID79A virus isolate. The E protein genes of American IIB yellow fever viruses had higher frequencies of amino acid substitutions than did genes of yellow fever viruses of genotypes I and IIA on the basis of comparisons with a consensus amino acid sequence for the yellow fever E gene. The great variation in the E proteins of American yellow fever virus probably results from positive selection imposed by virus interaction with different species of mosquitoes or nonhuman primates in the Americas. PMID:7637022

  13. Paleosecular Variation Study on a Pliocene Lava Flow Sequence in the Lesser Caucasus

    NASA Astrophysics Data System (ADS)

    Caccavari, A.; Calvo-Rathert, M.; Gogichaishvili, A.; Huaiyu, H.; Vashakidze, G.; Vegas, N.; Aguilar, B.

    2013-05-01

    A paleomagnetic and rock magnetic study was carried out on 39 successive Pliocene lava flows from the Saro sequence, which is located in the Djavakheti Highland in the Lesser Caucasus in Georgia. Previous K-Ar dating by Lebedev et al. (Stratigraphy and Geological Correlation, 2008, Vol. 16, No. 2, 204-224) yielded an age of 2.2 Ma for the sequence. For the present study, new Ar-Ar dating has been performed on samples from the lower and upper parts of the section. Rock magnetic experiments were carried out to characterize the carriers of remanence and obtain information about their stability. Thermomagnetic experiments show that titanomagnetite with differing titanium content is the main carrier in the 39 lava flows. Analysis of hysteresis parameters suggests that the grain size of most studied samples corresponds to pseudo-single-domain particles, which can also be interpreted as a mixture of single-domain and multi-domain grains. Paleomagnetic experiments reveal only a single paleomagnetic component with reverse polarity in all flows (D = 205.6°, I = -60.7°, α95 = 2.0°, k = 129.6), and the calculated paleomagnetic pole lies at longitude λ = 123.1° and latitude 71.1° (α95 = 2.8°, k = 72.1). The angular distance between the Pliocene paleomagnetic pole obtained in this work and the expected one is 17°. To analyse the behaviour of paleosecular variation (PSV), the scatter of virtual geomagnetic poles (VGPs) was calculated, yielding a value of S_B = 12.9°, with an upper confidence limit S_up = 14.28° and a lower confidence limit S_low = 10.45°. This result is lower than predicted by specific models of VGP dispersion at 41°N.
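
    The scatter of virtual geomagnetic poles (VGPs) is commonly summarised as their angular standard deviation about the mean pole, S = sqrt(Σ Δ_i² / (N - 1)), where Δ_i is the angular distance of pole i from the mean pole. The Python sketch below computes this total scatter from VGP latitudes and longitudes using a vector (Fisher) mean. It is an illustrative assumption, not the study's procedure. The S_B value reported above is normally obtained after correcting for within-site scatter and is quoted with confidence limits, and this sketch computes neither. The function names are hypothetical.

```python
import math


def to_unit_vector(lat_deg: float, lon_deg: float) -> tuple[float, float, float]:
    """Convert a pole latitude/longitude in degrees to a Cartesian unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def vgp_scatter(poles: list[tuple[float, float]]) -> float:
    """Total angular standard deviation (degrees) of VGPs about their vector-mean pole.

    poles is a list of (latitude, longitude) pairs in degrees; no within-site
    correction is applied.
    """
    if len(poles) < 2:
        raise ValueError("need at least two poles")
    vectors = [to_unit_vector(lat, lon) for lat, lon in poles]
    # Vector (Fisher) mean: sum the unit vectors and renormalise.
    sx, sy, sz = (sum(v[i] for v in vectors) for i in range(3))
    r = math.sqrt(sx * sx + sy * sy + sz * sz)
    mean = (sx / r, sy / r, sz / r)
    total = 0.0
    for v in vectors:
        cos_delta = sum(a * b for a, b in zip(v, mean))
        # Clamp to [-1, 1] to guard against rounding errors before acos.
        delta = math.degrees(math.acos(max(-1.0, min(1.0, cos_delta))))
        total += delta ** 2
    return math.sqrt(total / (len(poles) - 1))
```

    For four poles at latitude 80° spaced 90° apart in longitude, the mean pole is the geographic pole. Each Δ_i is 10°, so S = sqrt(400 / 3), about 11.5°.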

  14. Structural variation detection using next-generation sequencing data: A comparative technical review.

    PubMed

    Guan, Peiyong; Sung, Wing-Kin

    2016-06-01

    Structural variations (SVs) are genomic mutations at least fifty nucleotides in size. They contribute to phenotypic differences among healthy individuals and can cause severe diseases, and even cancers, by breaking or linking genes. Thus, it is crucial to systematically profile SVs in the genome. In the past decade, many next-generation sequencing (NGS)-based SV detection methods have been proposed, owing to the significant cost reduction of NGS experiments and their ability to detect SVs without bias at base-pair resolution. These SV detection methods vary in both sensitivity and specificity, since they use different SV-property-dependent and library-property-dependent features. As a result, predictions from different SV callers are often inconsistent. In addition, noise in the data (both platform-specific sequencing errors and artificial chimeric reads) reduces the specificity of SV detection. Poorly characterized regions of the human genome (e.g., repeat regions) greatly impact read mapping and in turn affect SV calling accuracy. Calling complex SVs requires specialized SV callers. Apart from accuracy, the processing speed of an SV caller is another factor determining its usability. Knowing the pros and cons of different SV calling techniques, as well as the objectives of the biological study, is essential for biologists and bioinformaticians to make informed decisions. This paper describes the different components of the SV calling pipeline and reviews the techniques used by existing SV callers. Through a simulation study, we also demonstrate that library properties, especially insert size, greatly impact the sensitivity of different SV callers. We hope the community can benefit from this work both in designing new SV calling methods and in selecting the appropriate SV caller for specific biological studies. PMID:26845461

  15. Mitochondrial DNA D-loop sequence variation in maternal lineages of Iranian native horses.

    PubMed

    Moridi, M; Masoudi, A A; Vaez Torshizi, R; Hill, E W

    2013-04-01

    To understand the origin and genetic diversity of Iranian native horses, mitochondrial DNA (mtDNA) D-loop sequences were generated for 95 horses from five breeds sampled in eight geographical locations in Iran. Sequence analysis of a 247-bp segment revealed a total of 27 haplotypes with 38 polymorphic sites. Twelve of 19 mtDNA haplogroups were identified in the samples. The most common haplotypes were found within haplogroup X2. Within-population haplotype and nucleotide diversities of the five breeds ranged from 0.838 ± 0.056 to 0.974 ± 0.022 and from 0.011 ± 0.002 to 0.021 ± 0.001, respectively, indicating relatively high genetic diversity in Iranian horses. The identification of several ancient sequences shared between the breeds suggests that the lineage of the majority of Iranian horse breeds is old and evidently originated from a large number of mares. We found lineages of haplogroups D and K in all native Iranian horse breeds, which is concordant with previous findings of Asian origins of these haplogroups. The presence of haplogroups E and K in our study is also consistent with a west-east geographical gradient of increasing frequency of these haplogroups and with a genetic fusion in Iranian horse breeds. PMID:22732008
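
    The haplotype and nucleotide diversities quoted above are standard summary statistics. The Python sketch below assumes aligned sequences of equal length and treats every character, including gaps, as a state. Haplotype diversity is computed with Nei's unbiased estimator h = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i among n sequences. Nucleotide diversity π is computed as the mean proportion of differing sites over all pairs of sequences. The study does not state how its values or their standard deviations were obtained. This is therefore an illustration of the estimators rather than a reproduction of the analysis, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs: list[str]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i squared)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]  # haplotype frequencies
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    per_pair = [
        sum(a != b for a, b in zip(s1, s2)) / length
        for s1, s2 in combinations(seqs, 2)
    ]
    return sum(per_pair) / len(per_pair)
```

    For the toy alignment `["AAAA", "AAAA", "AAAT", "AATT"]`, h = 4/3 × 0.625, about 0.833, and π = 7/24, about 0.292.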

  16. Whole-genome sequencing and assembly with high-throughput, short-read technologies.

    PubMed

    Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim

    2007-01-01

    While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp reads with a 1% error rate and inexpensive random fragment cloning of whole mammalian genomes are feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434

  17. High-throughput sequencing in veterinary infection biology and diagnostics.

    PubMed

    Belák, S; Karlsson, O E; Leijon, M; Granberg, F

    2013-12-01

    Sequencing methods have improved rapidly since the first versions of the Sanger techniques, facilitating the development of very powerful tools for detecting and identifying various pathogens, such as viruses, bacteria and other microbes. The ongoing development of high-throughput sequencing (HTS; also known as next-generation sequencing) technologies has resulted in a dramatic reduction in DNA sequencing costs, making the technology more accessible to the average laboratory. In this White Paper of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine (Uppsala, Sweden), several approaches and examples of HTS are summarised, and their diagnostic applicability is briefly discussed. Selected future aspects of HTS are outlined, including the need for bioinformatic resources, with a focus on improving the diagnosis and control of infectious diseases in veterinary medicine. PMID:24761741

  18. High-Throughput Sequencing of Complete Mitochondrial Genomes.

    PubMed

    Briscoe, Andrew George; Hopkins, Kevin Peter; Waeschenbach, Andrea

    2016-01-01

    Next-generation sequencing has revolutionized mitogenomics, turning a cottage industry into a high-throughput process. This chapter outlines methodologies used to sequence, assemble, and annotate mitogenomes of non-model organisms using Illumina sequencing technology, utilizing either long-range PCR amplicons or gDNA as starting template. Instructions are given on how to extract DNA, conduct long-range PCR amplifications, generate short Sanger barcode tag sequences, prepare equimolar sample pools, construct library preparations and assess their quality, assemble Illumina reads using either seeded reference mapping or de novo assembly, and annotate mitogenomes in the absence of an automated pipeline. Notes and recommendations, derived from our own experience, are given throughout this chapter. PMID:27460369

  19. Sequence and expression variation in SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1): homeolog evolution in Indian Brassicas.

    PubMed

    Sri, Tanu; Mayee, Pratiksha; Singh, Anandita

    2015-09-01

    Whole-genome sequence analyses make it possible to unravel evolutionary consequences of the meso-triplication event in Brassicaceae (∼14-20 million years ago (MYA)), such as differential gene fractionation and diversification in homeologous sub-genomes. This study presents a simple gene-centric approach involving microsynteny and natural genetic variation analysis for understanding SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1) homeolog evolution in Brassica. Analysis of microsynteny in Brassica rapa homeologous regions containing SOC1 revealed differential gene fractionation correlating with the reported fractionation status of the sub-genomes of origin, viz. least fractionated (LF), moderately fractionated 1 (MF1) and most fractionated (MF2). Screening 18 cultivars of 6 Brassica species led to the identification of 8 genomic and 27 transcript variants of SOC1, including splice forms. Co-occurrence of both interrupted and intronless SOC1 genes was detected in a few Brassica species. In silico analysis characterised Brassica SOC1 as a MADS, intervening, K-box, C-terminal (MIKC(C)) transcription factor, with highly conserved MADS and I domains relative to the K-box and C-terminal domains. Phylogenetic analyses and multiple sequence alignments depicting shared patterns of silent/non-silent mutations assigned Brassica SOC1 homologs to groups based on shared diploid base genome. In addition, a sub-genome structure in uncharacterised Brassica genomes was inferred. Expression analysis of putative MF2 and LF (Brassica diploid base genome A (AA)) sub-genome-specific SOC1 homeologs of Brassica juncea revealed near-identical expression patterns. However, the MF2-specific homeolog exhibited significantly higher expression, implying regulatory diversification. In conclusion, we present evidence for polyploidy-induced sequence and regulatory evolution in Brassica SOC1, in which differential homeolog expression is implicated in functional diversification. PMID:26276216

  20. Genetic variation of Sargassum horneri populations detected by inter-simple sequence repeats.

    PubMed

    Ren, J R; Yang, R; He, Y Y; Sun, Q H

    2015-01-01

    The seaweed Sargassum horneri is an important brown alga in the marine environment and an important raw material for the alginate industry. Unfortunately, the fixed resource that was originally reported has since declined or disappeared, and growing floating populations have been reported in recent years. We sampled a floating population and 4 fixed cultivated populations of S. horneri along the coast of Zhejiang, China. Inter-simple sequence repeat (ISSR) markers were used to analyze the genetic variation between the floating population and the fixed cultivated populations of S. horneri. In total, 220 loci were amplified with 23 ISSR primers. The percentage of polymorphic loci within each population ranged from 53.64 to 95.45%. The highest diversity was observed in population 3, a local population that was suspension-cultured in the laboratory and then fixed-cultivated in the Nanji Islands before sampling. The lowest diversity was found in the floating population 4. The genetic distances among the 5 S. horneri populations ranged from 0.0819 to 0.2889, and the pattern of distances was consistent with the diversity estimates. The results suggest that the floating population had the lowest genetic diversity and did not cluster with the fixed cultivated populations. PMID:25729997
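
    For dominant markers such as ISSRs, the percentage of polymorphic loci is usually computed from a band presence/absence matrix. The Python sketch below uses the simplest criterion: a locus counts as polymorphic in a population when its band is present in some, but not all, individuals. The study does not state its exact criterion, so this is an assumption, and the function name is hypothetical. As a consistency check, if the percentages are taken over all 220 loci, the reported extremes of 53.64% and 95.45% correspond to 118 and 210 polymorphic loci.

```python
def percent_polymorphic(band_matrix: list[list[int]]) -> float:
    """Percentage of loci that are polymorphic within one population.

    band_matrix[i][j] is 1 if individual i shows the band at locus j, else 0.
    A locus counts as polymorphic if both states occur among the individuals.
    """
    if not band_matrix or not band_matrix[0]:
        raise ValueError("matrix must contain at least one individual and one locus")
    n_loci = len(band_matrix[0])
    if any(len(row) != n_loci for row in band_matrix):
        raise ValueError("every individual must be scored at every locus")
    polymorphic = sum(
        1 for j in range(n_loci) if len({row[j] for row in band_matrix}) > 1
    )
    return 100.0 * polymorphic / n_loci
```

    In a toy matrix of three individuals scored at four loci, where only the second and third loci vary, the function returns 50.0.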

  1. Structural variation discovery in the cancer genome using next generation sequencing: computational solutions and perspectives.

    PubMed

    Liu, Biao; Conroy, Jeffrey M; Morrison, Carl D; Odunsi, Adekunle O; Qin, Maochun; Wei, Lei; Trump, Donald L; Johnson, Candace S; Liu, Song; Wang, Jianmin

    2015-03-20

    Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by an NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide to relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome. PMID:25849937

  2. Structural variation discovery in the cancer genome using next generation sequencing: Computational solutions and perspectives

    PubMed Central

    Liu, Biao; Conroy, Jeffrey M.; Morrison, Carl D.; Odunsi, Adekunle O.; Qin, Maochun; Wei, Lei; Trump, Donald L.; Johnson, Candace S.; Liu, Song; Wang, Jianmin

    2015-01-01

    Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by an NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide to relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome. PMID:25849937

  3. Using Disease-Associated Coding Sequence Variation to Investigate Functional Compensation by Human Paralogous Proteins

    PubMed Central

    Miura, Sayaka; Tate, Stephanie; Kumar, Sudhir

    2015-01-01

    Gene duplication enables functional diversification in species. It is thought that duplicated genes may be able to compensate if the function of one of the gene copies is disrupted. This possibility is extensively debated, with some studies reporting proteome-wide compensation, whereas others suggest functional compensation only among recent gene duplicates or no compensation at all. We report results from a systematic molecular evolutionary analysis to test the predictions of the functional compensation hypothesis. We contrasted the density of Mendelian disease-associated single nucleotide variants (dSNVs) in proteins with no discernible paralogs (singletons) with the dSNV density in proteins found in multigene families. Under the functional compensation hypothesis, we expected to find greater numbers of dSNVs in singletons due to the lack of any compensating partners. Our analyses produced the opposite pattern: paralogs have over 35% higher dSNV density than singletons. We found that these patterns are concordant with similar differences in the rates of amino acid evolution (i.e., functional constraints), as the proteins with paralogs have evolved 33% slower than singletons. Our evolutionary constraint explanation is robust to differences in family sizes, ages (young vs. old duplicates), and degrees of amino acid sequence similarity among paralogs. Therefore, disease-associated human variation does not exhibit significant signals of functional compensation among paralogous proteins; rather, an evolutionary constraint hypothesis provides a better explanation for the observed patterns of disease-associated and neutral polymorphisms in the human genome. PMID:26604664

  4. Natural allelic variations in highly polyploid Saccharum complex

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sugarcane (Saccharum spp.), an important sugar and biofuel crop, is highly polyploid with a complex genome. A large amount of natural phenotypic variation exists in sugarcane germplasm. Understanding its allelic variance has been challenging but is a critical foundation for discovery of the genomic seq...

  5. Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases.

    PubMed

    Schadt, Eric E; Banerjee, Onureena; Fang, Gang; Feng, Zhixing; Wong, Wing H; Zhang, Xuegong; Kislyuk, Andrey; Clark, Tyson A; Luong, Khai; Keren-Paz, Alona; Chess, Andrew; Kumar, Vipin; Chen-Plotkin, Alice; Sondheimer, Neal; Korlach, Jonas; Kasarskis, Andrew

    2013-01-01

    Current-generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made in identifying small nucleotide variations and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real-time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date, no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types. PMID:23093720

  6. Ovine mitochondrial DNA sequence variation and its association with production and reproduction traits within an Afec-Assaf flock.

    PubMed

    Reicher, S; Seroussi, E; Weller, J I; Rosov, A; Gootwine, E

    2012-07-01

    Polymorphisms in mitochondrial DNA (mtDNA) protein- and tRNA-coding genes were shown to be associated with various diseases in humans as well as with production and reproduction traits in livestock. Alignment of full-length mitochondrial sequences from the 5 known ovine haplogroups, HA (n = 3), HB (n = 5), HC (n = 3), HD (n = 2), and HE (n = 2) (GenBank accession nos. HE577847-50 and 11 published complete ovine mitochondrial sequences), revealed sequence variation in 10 of the 13 protein-coding mtDNA sequences. Twenty-six of the 245 variable sites found in the protein-coding sequences represent non-synonymous mutations. Sequence variation was also observed in 8 of the 22 tRNA mtDNA sequences. On the basis of the mtDNA control region and cytochrome b partial sequences, along with information on maternal lineages within an Afec-Assaf flock, 1,126 Afec-Assaf ewes were assigned to mitochondrial haplogroups HA, HB, and HC, with frequencies of 0.43, 0.43, and 0.14, respectively. Analysis of birth weight and growth rate records of lambs (n = 1,286) and productivity from 4,993 lambing records revealed no association between mitochondrial haplogroup affiliation and female longevity, lamb perinatal survival rate, birth weight, or daily growth rate of lambs up to 150 d, which averaged 1,664 d, 88.3%, 4.5 kg, and 320 g/d, respectively. However, significant (P < 0.0001) differences among the haplogroups were found for ewe prolificacy, with prolificacies (mean ± SE) of 2.14 ± 0.04, 2.25 ± 0.04, and 2.30 ± 0.06 lambs born/ewe lambing for the HA, HB, and HC haplogroups, respectively. Our results highlight the ovine mitogenome genetic variation in protein- and tRNA-coding genes and suggest that sequence variation in ovine mtDNA is associated with variation in ewe prolificacy. PMID:22266988

  7. Apolipoprotein E Variation at the Sequence Haplotype Level: Implications for the Origin and Maintenance of a Major Human Polymorphism

    PubMed Central

    Fullerton, Stephanie M.; Clark, Andrew G.; Weiss, Kenneth M.; Nickerson, Deborah A.; Taylor, Scott L.; Stengård, Jari H.; Salomaa, Veikko; Vartiainen, Erkki; Perola, Markus; Boerwinkle, Eric; Sing, Charles F.

    2000-01-01

    Three common protein isoforms of apolipoprotein E (apoE), encoded by the ε2, ε3, and ε4 alleles of the APOE gene, differ in their association with cardiovascular and Alzheimer's disease risk. To gain a better understanding of the genetic variation underlying this important polymorphism, we identified sequence haplotype variation in 5.5 kb of genomic DNA encompassing the whole of the APOE locus and adjoining flanking regions in 96 individuals from four populations: blacks from Jackson, MS (n=48 chromosomes), Mayans from Campeche, Mexico (n=48), Finns from North Karelia, Finland (n=48), and non-Hispanic whites from Rochester, MN (n=48). In the region sequenced, 23 sites varied (21 single nucleotide polymorphisms, or SNPs, 1 diallelic indel, and 1 multiallelic indel). The 22 diallelic sites defined 31 distinct haplotypes in the sample. The estimate of nucleotide diversity (site-specific heterozygosity) for the locus was 0.0005±0.0003. Sequence analysis of the chimpanzee APOE gene showed that it was most closely related to human ε4-type haplotypes, differing from the human consensus sequence at 67 synonymous (54 substitutions and 13 indels) and 9 nonsynonymous fixed positions. The evolutionary history of allelic divergence within humans was inferred from the pattern of haplotype relationships. This analysis suggests that haplotypes defining the ε3 and ε2 alleles are derived from the ancestral ε4s and that the ε3 group of haplotypes have increased in frequency, relative to ε4s, in the past 200,000 years. Substantial heterogeneity exists within all three classes of sequence haplotypes, and there are important interpopulation differences in the sequence variation underlying the protein isoforms that may be relevant to interpreting conflicting reports of phenotypic associations with variation in the common protein isoforms. PMID:10986041
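
    The nucleotide diversity reported above (0.0005 ± 0.0003) is the average number of pairwise differences per site among the sampled sequences. A minimal Python sketch of that estimator (Nei's π) follows, assuming already aligned, equal-length haplotypes; the function name and the choice to skip gap or N positions pair by pair are illustrative assumptions rather than the authors' procedure, and the sketch does not estimate the standard error.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean proportion of differing sites over all pairs of aligned sequences.

    Positions where either sequence of a pair has a gap ('-') or an 'N'
    are skipped for that pair (an illustrative assumption).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    total = 0.0
    pairs = 0
    for a, b in combinations(seqs, 2):
        compared = diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in "-N" or y in "-N":
                continue  # uninformative position for this pair
            compared += 1
            diffs += x != y
        if compared:
            total += diffs / compared  # per-site differences for this pair
            pairs += 1
    if pairs == 0:
        raise ValueError("no comparable positions in any pair")
    return total / pairs


# Three toy 10-bp haplotypes: pairwise differences are 1, 1 and 2.
pi = nucleotide_diversity(["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC"])
```

    Averaging over all distinct pairs, rather than over all ordered pairs including self-comparisons, avoids the downward bias of the naive heterozygosity estimate in small samples.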

  8. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Athavale, Ajay

    2012-06-01

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  9. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Athavale, Ajay [Monsanto]

    2013-01-25

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  10. Color differences among feral pigeons (Columba livia) are not attributable to sequence variation in the coding region of the melanocortin-1 receptor gene (MC1R)

    PubMed Central

    2013-01-01

    Background Genetic variation at the melanocortin-1 receptor (MC1R) gene is correlated with melanin color variation in many birds. Feral pigeons (Columba livia) show two major melanin-based colorations: a red coloration due to pheomelanic pigment and a black coloration due to eumelanic pigment. Furthermore, within each color type, feral pigeons display continuous variation in the amount of melanin pigment present in the feathers, with individuals varying from pure white to a full dark melanic color. Coloration is highly heritable and it has been suggested that it is under natural or sexual selection, or both. Our objective was to investigate whether MC1R allelic variants are associated with plumage color in feral pigeons. Findings We sequenced 888 bp of the coding sequence of MC1R among pigeons varying both in the type, eumelanin or pheomelanin, and the amount of melanin in their feathers. We detected 10 non-synonymous substitutions and 2 synonymous substitutions, but none of them was associated with a plumage type. It remains possible that non-synonymous substitutions that influence coloration are present in the short MC1R fragment that we did not sequence, but this seems unlikely because we analyzed the entire functionally important region of the gene. Conclusions Our results show that color differences among feral pigeons are probably not attributable to amino acid variation at the MC1R locus. Therefore, variation in regulatory regions of MC1R or variation in other genes may be responsible for the color polymorphism of feral pigeons. PMID:23915680
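
    Classifying coding differences as synonymous or non-synonymous, as in the MC1R comparison above, amounts to translating each differing codon with the standard genetic code. The sketch below is a simplified, hypothetical helper, not the authors' pipeline: it assumes two in-frame, gap-free coding sequences of equal length and counts differing codons rather than individual sites, so a codon carrying two changes is counted once.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_differences(ref: str, alt: str) -> tuple[int, int]:
    """Return (synonymous, non_synonymous) counts of differing codons."""
    if len(ref) != len(alt) or len(ref) % 3:
        raise ValueError("sequences must be equal length and a multiple of 3")
    syn = nonsyn = 0
    for i in range(0, len(ref), 3):
        c1, c2 = ref[i:i + 3].upper(), alt[i:i + 3].upper()
        if c1 == c2:
            continue  # identical codon, nothing to classify
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1      # same amino acid (or both stop codons)
        else:
            nonsyn += 1   # amino acid replacement
    return syn, nonsyn


# Toy example: CTG->CTA keeps Leu (synonymous); AAA->AGA is Lys->Arg.
counts = classify_differences("ATGCTGAAA", "ATGCTAAGA")
```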

  11. Sequence variation of Bemisia tabaci Chemosensory Protein 2 in cryptic species B and Q: New DNA markers for whitefly recognition.

    PubMed

    Liu, Guo-Xia; Ma, Hong-Mei; Xie, Hong-Yan; Xuan, Ning; Picimbon, Jean-François

    2016-01-15

    Bemisia tabaci Gennadius biotypes B and Q are two of the most important agricultural insect pests worldwide. Genomic sequences of Type-2 B. tabaci chemosensory protein (BtabCSP2) were cloned and sequenced in B and Q biotypes, revealing key biotype-specific variations in the intron sequence. A Q260 sequence was found specifically in Q-BtabCSP2 and Cucumis melo LN692399, suggesting ancestral horizontal gene transfer between the insect and the plant through bacteria. A cleaved amplified polymorphic sequences (CAPS) method was then developed to differentiate B and Q based on the sequence variation in an exon of the BtabCSP2 gene. The performance of the CSP2-based CAPS method for whitefly recognition was assessed using B. tabaci field collections from Shandong Province (P.R. China). In the field collections, our SacII-based CAPS method gave the same results as the mitochondrial cytochrome oxidase-based CAPS method. We therefore propose an explanation for CSP origin and a new, rapid, and simple molecular method based on genomic DNA and a chemosensory gene to accurately differentiate B and Q whiteflies of the Bemisia complex around the world. PMID:26481237
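
    A CAPS assay scores an amplicon by whether a restriction enzyme cuts it, which shows up as different band sizes on a gel. The sketch below predicts top-strand fragment lengths for a SacII digest in silico, assuming the recognition site CCGCGG with the cut after its fourth base (CCGC^GG). The function name and the toy amplicons are hypothetical, and because the abstract does not say which biotype carries the site, the example does not assign one.

```python
def digest_fragment_lengths(amplicon: str, site: str = "CCGCGG",
                            cut_offset: int = 4) -> list[int]:
    """Top-strand fragment lengths after cutting at every occurrence of site.

    cut_offset is the number of bases of the site left on the upstream
    fragment (4 for SacII's CCGC^GG, an assumption stated above).
    """
    seq = amplicon.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)  # also catches overlapping sites
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical 100-bp amplicons: one carries the site, one does not.
with_site = digest_fragment_lengths("A" * 50 + "CCGCGG" + "T" * 44)
without_site = digest_fragment_lengths("A" * 100)
```

    An allele with the site yields two bands (54 and 46 bp here), whereas one without it stays uncut at 100 bp; that presence/absence pattern is what a CAPS marker reads out.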

  12. Binary interactions with high accretion rates onto main sequence stars

    NASA Astrophysics Data System (ADS)

    Shiber, Sagiv; Schreier, Ron; Soker, Noam

    2016-07-01

    Energetic outflows from main sequence stars accreting mass at very high rates might account for the powering of some eruptive objects, such as merging main sequence stars, major eruptions of luminous blue variables, e.g., the Great Eruption of Eta Carinae, and other intermediate luminosity optical transients (ILOTs; red novae; red transients). These powerful outflows could potentially also supply the extra energy required in the common envelope process and in the grazing envelope evolution of binary systems. We propose that a massive outflow/jets mediated by magnetic fields might remove energy and angular momentum from the accretion disk to allow such high accretion rate flows. By examining the possible activity of the magnetic fields of accretion disks, we conclude that indeed main sequence stars might accrete mass at very high rates, up to ≈ 10⁻² M⊙ yr⁻¹ for solar type stars, and up to ≈ 1 M⊙ yr⁻¹ for very massive stars. We speculate that magnetic fields amplified in such extreme conditions might lead to the formation of massive bipolar outflows that can remove most of the disk's energy and angular momentum. It is this energy and angular momentum removal that allows the very high mass accretion rate onto main sequence stars.

  13. Identification of conserved genomic regions and variation therein amongst Cetartiodactyla species using next generation sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Background Next Generation Sequencing has created an opportunity to genetically characterize an individual both inexpensively and comprehensively. In earlier work produced in our collaboration [1], it was demonstrated that, for animals without a reference genome, their Next Generation Sequence data ...

  14. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    PubMed

    Rehm, Charlotte; Wurmthaler, Lena A; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S

    2015-01-01

    In prokaryotes, simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular, G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4-forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms, repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed, with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and widespread quadruplex-forming repeat sequences in bacteria. PMID:26695179
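
    Tracts of the GGGNATC repeat described above can be located with a regular expression. This is a minimal sketch, not the authors' search method: it scans a single strand only (the reverse-complement unit GATNCCC would need a second pass), requires at least two consecutive units (the smallest repeat number the abstract reports), and uses a hypothetical function name.

```python
import re

# One GGGNATC unit, where N is any nucleotide.
UNIT = "GGG[ACGT]ATC"
# Two or more directly adjacent units form a tract.
TRACT = re.compile(f"(?:{UNIT}){{2,}}")


def find_repeat_tracts(genome: str) -> list[tuple[int, int]]:
    """Return (0-based start, number of 7-nt units) for each repeat tract."""
    seq = genome.upper()
    return [(m.start(), (m.end() - m.start()) // 7)
            for m in TRACT.finditer(seq)]


# Toy sequence: a four-unit tract, a spacer, then a two-unit tract.
tracts = find_repeat_tracts("TT" + "GGGAATC" * 4 + "CC" + "GGGTATCGGGCATC")
```

    Tabulating the unit counts over a whole genome would give the kind of repeat-number distribution in which the abstract reports a preference for four units.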

  15. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    PubMed Central

    Rehm, Charlotte; Wurmthaler, Lena A.; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S.

    2015-01-01

    In prokaryotes, simple sequence repeats (SSRs) with unit sizes of 1–5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6–9 nt are rare. In particular, G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4-forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms, repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed, with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and widespread quadruplex-forming repeat sequences in bacteria. PMID:26695179

  16. Optical Transitions in Highly Charged Californium Ions with High Sensitivity to Variation of the Fine-Structure Constant

    NASA Astrophysics Data System (ADS)

    Berengut, J. C.; Dzuba, V. A.; Flambaum, V. V.; Ong, A.

    2012-08-01

    We study electronic transitions in highly charged Cf ions that are within the frequency range of optical lasers and have very high sensitivity to potential variations in the fine-structure constant, α. The transitions are in the optical range despite the large ionization energies because they lie on the level crossing of the 5f and 6p valence orbitals in the thallium isoelectronic sequence. Cf¹⁶⁺ is a particularly rich ion, having several narrow lines with properties that minimize certain systematic effects. Cf¹⁶⁺ has very large nuclear charge and large ionization energy, resulting in the largest α sensitivity seen in atomic systems. The lines include positive and negative shifters.

  17. Sequence Polymorphisms at the REDUCED DORMANCY5 Pseudophosphatase Underlie Natural Variation in Arabidopsis Dormancy.

    PubMed

    Xiang, Yong; Song, Baoxing; Née, Guillaume; Kramer, Katharina; Finkemeier, Iris; Soppe, Wim J J

    2016-08-01

    Seed dormancy controls the timing of germination, which regulates the adaptation of plants to their environment and influences agricultural production. The time of germination is under strong natural selection and shows variation within species due to local adaptation. The identification of genes underlying dormancy quantitative trait loci is a major scientific challenge, which is relevant for agricultural and ecological goals. In this study, we describe the identification of the DELAY OF GERMINATION18 (DOG18) quantitative trait locus, which was identified as a factor in natural variation for seed dormancy in Arabidopsis (Arabidopsis thaliana). DOG18 encodes a member of clade A of the type 2C protein phosphatase family, which we previously identified as the REDUCED DORMANCY5 (RDO5) gene. DOG18/RDO5 shows a relatively high frequency of loss-of-function alleles in natural accessions restricted to northwestern Europe. The loss of dormancy in these loss-of-function alleles can be compensated for by genetic factors such as DOG1 and DOG6, and by environmental factors such as low temperature. RDO5 does not have detectable phosphatase activity. Analysis of the phosphoproteome in dry and imbibed seeds revealed a general decrease in protein phosphorylation during seed imbibition that is enhanced in the rdo5 mutant. We conclude that RDO5 acts as a pseudophosphatase that inhibits dephosphorylation during seed imbibition. PMID:27288362

  18. Sequence Polymorphisms at the REDUCED DORMANCY5 Pseudophosphatase Underlie Natural Variation in Arabidopsis Dormancy

    PubMed Central

    Xiang, Yong; Song, Baoxing; Née, Guillaume; Kramer, Katharina; Soppe, Wim J.J.

    2016-01-01

    Seed dormancy controls the timing of germination, which regulates the adaptation of plants to their environment and influences agricultural production. The time of germination is under strong natural selection and shows variation within species due to local adaptation. The identification of genes underlying dormancy quantitative trait loci is a major scientific challenge, which is relevant for agricultural and ecological goals. In this study, we describe the identification of the DELAY OF GERMINATION18 (DOG18) quantitative trait locus, which was identified as a factor in natural variation for seed dormancy in Arabidopsis (Arabidopsis thaliana). DOG18 encodes a member of clade A of the type 2C protein phosphatase family, which we previously identified as the REDUCED DORMANCY5 (RDO5) gene. DOG18/RDO5 shows a relatively high frequency of loss-of-function alleles in natural accessions restricted to northwestern Europe. The loss of dormancy in these loss-of-function alleles can be compensated for by genetic factors such as DOG1 and DOG6, and by environmental factors such as low temperature. RDO5 does not have detectable phosphatase activity. Analysis of the phosphoproteome in dry and imbibed seeds revealed a general decrease in protein phosphorylation during seed imbibition that is enhanced in the rdo5 mutant. We conclude that RDO5 acts as a pseudophosphatase that inhibits dephosphorylation during seed imbibition. PMID:27288362

  19. Heteroplasmy, length and sequence variation in the mtDNA control regions of three percid fish species (Perca fluviatilis, Acerina cernua, Stizostedion lucioperca).

    PubMed Central

    Nesbø, C L; Arab, M O; Jakobsen, K S

    1998-01-01

    The nucleotide sequence of the control region and flanking tRNA genes of perch (Perca fluviatilis) mtDNA was determined. The organization of this region is similar to that of other vertebrates. A tandem array of 10-bp repeats, associated with length variation and heteroplasmy, was observed in the 5' end. While the location of the array corresponds to that reported in other species, the length of the repeated unit is shorter than previously observed for tandem repeats in this region. The repeated sequence was highly similar to the Mt5 element, which has been shown to specifically bind a putative D-loop DNA termination protein. Of 149 perch analyzed, 74% showed length variation heteroplasmy. Single-cell PCR on oocytes suggested that the high level of heteroplasmy is passively maintained by maternal transmission. The array was also observed in the two other percid species, ruffe (Acerina cernua) and zander (Stizostedion lucioperca). The array and the associated length variation heteroplasmy are therefore likely to be general features of percid mtDNAs. Among the perch repeats, the mutation pattern is consistent with unidirectional slippage, and statistical analyses supported the notion that the various haplotypes are associated with different levels of heteroplasmy. The variation in array length among and within species is ascribed to differences in predicted stability of secondary structures made between repeat units. PMID:9560404

  20. Sequence variations in the collagen IX and XI genes are associated with degenerative lumbar spinal stenosis

    PubMed Central

    Noponen-Hietala, N; Kyllonen, E; Mannikko, M; Ilkko, E; Karppinen, J; Ott, J; Ala-Kokko, L

    2003-01-01

    Background: Degenerative lumbar spinal stenosis (LSS) is usually caused by disc herniation or degeneration. Several genetic factors have been implicated in disc disease. Tryptophan alleles in COL9A2 and COL9A3 have been shown to be associated with lumbar disc disease in the Finnish population, and polymorphisms in the vitamin D receptor gene (VDR) (FokI and TaqI), the matrix metalloproteinase-3 gene (MMP-3) and an aggrecan gene (AGC1) VNTR have been reported to be associated with disc degeneration. In addition, an IVS6-4 a>t polymorphism in COL11A2 has been found in connection with stenosis caused by ossification of the posterior longitudinal ligament in the Japanese population. Objective: To study the role of genetic factors in LSS. Methods: 29 Finnish probands were analysed for mutations in the genes coding for intervertebral disc matrix proteins, COL1A1, COL1A2, COL2A1, COL9A1, COL9A2, COL9A3, COL11A1, COL11A2, and AGC1. VDR and MMP-3 polymorphisms were also analysed. Sequence variations were tested in 56 Finnish controls. Results: Several disease associated alleles were identified. A splice site mutation in COL9A2 leading to a premature translation termination codon and the generation of a truncated protein was identified in one proband, another had the Trp2 allele, and four others the Trp3 allele. The frequency of the COL11A2 IVS6-4 t allele was 93.1% in the probands and 72.3% in controls (p = 0.0016). The differences in genotype frequencies for this site were less significant (p = 0.0043). Conclusions: Genetic factors have an important role in the pathogenesis of LSS. PMID:14644861
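
    The COL11A2 comparison above is a 2 × 2 contrast of allele counts. As a worked sketch, allele counts can be reconstructed from the reported percentages on the assumption of two chromosomes per person: 93.1% of 58 proband chromosomes is 54 t alleles (4 other), and 72.3% of 112 control chromosomes is 81 t alleles (31 other). The abstract does not state which test was used, so the Pearson chi-square below (1 df, no continuity correction) is an assumption, and the function name is hypothetical.

```python
import math


def allele_chi_square(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], 1 df, no correction.

    Returns (statistic, upper-tail p-value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("table has an empty row or column")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 df, chi2 = Z**2, so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Reconstructed counts (assumption): probands 54 t / 4 other, controls 81 t / 31 other.
chi2, p = allele_chi_square(54, 4, 81, 31)
```

    This gives roughly χ² ≈ 10.1 and p ≈ 0.0015, close to but not identical with the reported p = 0.0016, so the authors' exact test or correction may differ.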

  1. Fin whale MDH-1 and MPI allozyme variation is not reflected in the corresponding DNA sequences

    PubMed Central

    Olsen, Morten Tange; Pampoulie, Christophe; Daníelsdóttir, Anna K; Lidh, Emmelie; Bérubé, Martine; Víkingsson, Gísli A; Palsbøll, Per J

    2014-01-01

    The appeal of genetic inference methods to assess population genetic structure and guide management efforts is grounded in the correlation between genetic similarity and gene flow among populations. Effects of such gene flow are typically genomewide; however, some loci may appear as outliers, displaying above- or below-average genetic divergence relative to the genomewide level. Above-average population genetic divergence may be due to divergent selection as a result of local adaptation. Consequently, substantial efforts have been directed toward such outlying loci in order to identify traits subject to local adaptation. Here, we report the results of an investigation into the molecular basis of the substantial degree of genetic divergence previously reported at allozyme loci among North Atlantic fin whale (Balaenoptera physalus) populations. We sequenced the exons encoding the two most divergent allozyme loci (MDH-1 and MPI) and failed to detect any nonsynonymous substitutions. Following extensive error checking and analysis of additional bioinformatic and morphological data, we hypothesize that the observed allozyme polymorphisms may reflect phenotypic plasticity at the cellular level, perhaps as a response to nutritional stress. While such plasticity is intriguing in itself, and of fundamental evolutionary interest, our key finding is that the observed allozyme variation does not appear to be a result of genetic drift, migration, or selection on the MDH-1 and MPI exons themselves, stressing the importance of interpreting allozyme data with caution. As for North Atlantic fin whale population structure, our findings support the low levels of differentiation found in previous analyses of DNA nucleotide loci. PMID:24963377

  2. Whole genome sequencing of emerging multidrug resistant Candida auris isolates in India demonstrates low genetic variation.

    PubMed

    Sharma, C; Kumar, N; Pandey, R; Meis, J F; Chowdhary, A

    2016-09-01

    Candida auris is an emerging multidrug resistant yeast that causes nosocomial fungaemia and deep-seated infections. Notably, the emergence of this yeast is alarming as it exhibits resistance to azoles, amphotericin B and caspofungin, which may lead to clinical failure in patients. Multigene phylogeny and amplified fragment length polymorphism (AFLP) typing have characterized the C. auris population as clonal. Here, using whole genome sequencing analysis, we show for the first time that C. auris strains from four Indian hospitals were highly related, suggesting clonal transmission. Further, all C. auris isolates originated from cases of fungaemia and were resistant to fluconazole (MIC >64 mg/L). PMID:27617098

  3. High-Frequency Variation of Purine Biosynthesis Genes Is a Mechanism of Success in Campylobacter jejuni

    PubMed Central

    Cameron, Andrew; Huynh, Steven; Scott, Nichollas E.; Frirdich, Emilisa; Apel, Dmitry; Foster, Leonard J.; Parker, Craig T.

    2015-01-01

    Phenotypic variation is prevalent in the zoonotic pathogen Campylobacter jejuni, the leading agent of enterocolitis in the developed world. Heterogeneity enhances the survival and adaptive malleability of bacterial populations because variable phenotypes may allow some cells to be protected against future stress. Exposure to hyperosmotic stress previously revealed prevalent differences in growth between C. jejuni strain 81-176 colonies due to resistant or sensitive phenotypes, and these isolated colonies continued to produce progeny with differential phenotypes. In this study, whole-genome sequencing of isolated colonies identified allelic variants of two purine biosynthesis genes, purF and apt, encoding phosphoribosyltransferases that utilize a shared substrate. Genetic analyses determined that purF was essential for fitness, while apt was critical. Traditional and high-depth amplicon-sequencing analyses confirmed extensive intrapopulation genetic variation of purF and apt that resulted in viable strains bearing alleles with in-frame insertion duplications, deletions, or missense polymorphisms. Different purF and apt alleles were associated with various stress survival capabilities under several niche-relevant conditions and contributed to differential intracellular survival in an epithelial cell infection model. Amplicon sequencing revealed that intracellular survival selected for stress-fit purF and apt alleles, as did exposure to oxygen and hyperosmotic stress. Putative protein recognition direct repeat sequences were identified in purF and apt, and a DNA-protein affinity screen captured a predicted exonuclease that promoted the global spontaneous mutation rate. This work illustrates the adaptive properties of high-frequency genetic variation in two housekeeping genes, which influences C. jejuni survival under stress and promotes its success as a pathogen. PMID:26419875

  4. Generating long sequences of high-intensity femtosecond pulses.

    PubMed

    Bitter, M; Milner, V

    2016-02-01

    We present an approach to creating pulse sequences extending beyond 150 ps in duration, composed of 100 μJ femtosecond pulses. A quarter of the pulse train is produced by a high-resolution pulse shaper, which allows full control over the timing of each pulse. Two nested Michelson interferometers follow to quadruple the pulse number and the sequence duration. To boost the pulse energy, the long train is sent through a multipass Ti:sapphire amplifier, followed by an external compressor. A periodic sequence of 84 pulses of 120 fs width and an average pulse energy of 107 μJ, separated by 2 ps, is demonstrated as a proof of principle. PMID:26836087
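
    The pulse-count arithmetic above can be checked with a short timing model: 84 pulses means the shaper supplies 21, and each interferometer doubles both the pulse number and the train duration. The sketch assumes, for illustration only, that each interferometer's delay equals the length of the train entering it so the delayed copy lands immediately after the original; it models arrival times, not optics, and the function name is hypothetical.

```python
def nested_michelson_train(n_shaped: int, spacing_ps: float,
                           n_interferometers: int = 2) -> list[float]:
    """Pulse arrival times (ps) after a pulse shaper and doubling interferometers.

    Assumption: each interferometer adds a copy of the incoming train delayed
    by (number of pulses x spacing), so copies tile without overlap.
    """
    times = [i * spacing_ps for i in range(n_shaped)]
    for _ in range(n_interferometers):
        delay = len(times) * spacing_ps
        times = times + [t + delay for t in times]  # original plus delayed copy
    return times


# 21 shaped pulses at 2 ps spacing -> 84 pulses spanning 166 ps.
train = nested_michelson_train(21, 2.0)
```

    Under this assumption the 84-pulse, 2-ps-spaced train spans 166 ps from first to last pulse, consistent with the "beyond 150 ps" figure in the abstract.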

  5. High-throughput polymorphism detection and genotyping in Brassica napus using next-generation RAD sequencing

    PubMed Central

    2012-01-01

    Background The complex genome of rapeseed (Brassica napus) is not well understood despite the economic importance of the species. Good knowledge of sequence variation is needed for genetics approaches and breeding purposes. We used a diversity set of B. napus representing eight different germplasm types to sequence genome-wide distributed restriction-site associated DNA (RAD) fragments for polymorphism detection and genotyping. Results More than 113,000 RAD clusters with more than 20,000 single nucleotide polymorphisms (SNPs) and 125 insertions/deletions were detected and characterized. About one third of the RAD clusters and polymorphisms mapped to the Brassica rapa reference sequence. An even distribution of RAD clusters and polymorphisms was observed across the B. rapa chromosomes, which suggests that there might be an equal distribution over the Brassica oleracea chromosomes, too. The representation of Gene Ontology (GO) terms for unigenes with RAD clusters and polymorphisms revealed no signature of selection with respect to the distribution of polymorphisms within genes belonging to a specific GO category. Conclusions Considering the decreasing costs for next-generation sequencing, the results of our study suggest that RAD sequencing is not only a simple and cost-effective method for high-density polymorphism detection but also an alternative to SNP genotyping from transcriptome sequencing or SNP arrays, even for species with complex genomes such as B. napus. PMID:22726880
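
    RAD sequencing reads the DNA immediately flanking each occurrence of a restriction site, which is why RAD clusters are distributed genome-wide. The in silico sketch below extracts those flanks from a reference sequence. The abstract does not name the enzyme, so SbfI (CCTGCA^GG) is used purely as an assumed example; the flank length, function name, and toy sequence are hypothetical.

```python
def rad_loci(genome: str, site: str = "CCTGCAGG", cut_offset: int = 6,
             flank: int = 20) -> list[tuple[int, str, str]]:
    """Return (cut position, upstream flank, downstream flank) per site.

    Both flanks are given on the forward strand; sites too close to either
    end of the sequence to yield full-length flanks are skipped.
    """
    seq = genome.upper()
    loci = []
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset  # SbfI assumed to cut between CCTGCA and GG
        if cut - flank >= 0 and cut + flank <= len(seq):
            loci.append((cut, seq[cut - flank:cut], seq[cut:cut + flank]))
        pos = seq.find(site, pos + 1)
    return loci


# Toy reference with a single SbfI site at position 30.
loci = rad_loci("A" * 30 + "CCTGCAGG" + "T" * 30)
```

    Polymorphism detection would then compare the flanking sequences recovered at the same cut positions across accessions; that step is not shown here.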

  6. Identification of Genes Responsible for Natural Variation in Volatile Content Using Next-Generation Sequencing Technology.

    PubMed

    Amaya, Iraida; Pillet, Jeremy; Folta, Kevin M

    2016-01-01

    Identification of the genes controlling the variation of key traits remains a challenge for plant researchers and represents a goal for the development of functional markers and their implementation in marker-assisted crop breeding. As an example, we describe the identification of volatile organic compounds (VOCs) that segregate as single loci or major quantitative trait loci (QTL) in strawberry F1 segregating populations. Next, we describe a fast and efficient method for RNA extraction in strawberry that yields high-quality RNA for downstream RNA-seq analysis. Finally, we describe two alternative methods for analyzing global transcript expression in contrasting lines, in order to identify the candidate gene and genes with differential expression using RNA-seq. PMID:26577779

  7. Savant: genome browser for high-throughput sequencing data

    PubMed Central

    Fiume, Marc; Williams, Vanessa; Brook, Andrew; Brudno, Michael

    2010-01-01

    Motivation: The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals' genomes. At the same time, the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets. Results: We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations. Availability: Savant is freely available at http://compbio.cs.toronto.edu/savant Contact: savant@cs.toronto.edu PMID:20562449

  8. Salmonella Serotype Determination Utilizing High-Throughput Genome Sequencing Data

    PubMed Central

    Zhang, Shaokang; Yin, Yanlong; Jones, Marcus B.; Zhang, Zhenzhen; Deatherage Kaiser, Brooke L.; Dinsmore, Blake A.; Fitzgerald, Collette; Fields, Patricia I.

    2015-01-01

    Serotyping forms the basis of national and international surveillance networks for Salmonella, one of the most prevalent foodborne pathogens worldwide (1–3). Public health microbiology is currently being transformed by whole-genome sequencing (WGS), which opens the door to serotype determination using WGS data. SeqSero (www.denglab.info/SeqSero) is a novel Web-based tool for determining Salmonella serotypes using high-throughput genome sequencing data. SeqSero is based on curated databases of Salmonella serotype determinants (rfb gene cluster, fliC and fljB alleles) and is predicted to determine serotype rapidly and accurately for nearly the full spectrum of Salmonella serotypes (more than 2,300 serotypes), from both raw sequencing reads and genome assemblies. The performance of SeqSero was evaluated by testing (i) raw reads from genomes of 308 Salmonella isolates of known serotype; (ii) raw reads from genomes of 3,306 Salmonella isolates sequenced and made publicly available by GenomeTrakr, a U.S. national monitoring network operated by the Food and Drug Administration; and (iii) 354 other publicly available draft or complete Salmonella genomes. We also demonstrated Salmonella serotype determination from raw sequencing reads of fecal metagenomes from mice orally infected with this pathogen. SeqSero can help to maintain the well-established utility of Salmonella serotyping when integrated into a platform of WGS-based pathogen subtyping and characterization. PMID:25762776
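
    The abstract names the serotype determinants SeqSero relies on: the rfb gene cluster (which encodes the O antigen), fliC (phase-1 H antigen) and fljB (phase-2 H antigen). A minimal sketch of only the final step, assuming the three antigen calls have already been made, joins them into an O:H1:H2 antigenic formula and looks it up in a table. The function `predict_serotype` and the two-entry `ANTIGENIC_FORMULAS` table are hypothetical illustrations, not SeqSero's code or database; the real tool first matches reads or assemblies against its curated determinant databases.

```python
# Hypothetical lookup table: (O antigen, phase-1 H, phase-2 H) -> serovar name.
# Only two well-known serovars are listed, purely for illustration.
ANTIGENIC_FORMULAS = {
    ("1,4,[5],12", "i", "1,2"): "Typhimurium",
    ("1,9,12", "g,m", "-"): "Enteritidis",
}


def predict_serotype(o_antigen: str, h1: str, h2: str) -> str:
    """Join the three antigen calls into an O:H1:H2 formula and name it."""
    formula = f"{o_antigen}:{h1}:{h2}"
    name = ANTIGENIC_FORMULAS.get((o_antigen, h1, h2))
    return f"{name} ({formula})" if name else f"unrecognized ({formula})"


# The antigen calls would come from the rfb (O), fliC (H1) and fljB (H2) matches.
print(predict_serotype("1,9,12", "g,m", "-"))  # Enteritidis (1,9,12:g,m:-)
```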

  9. Mitochondrial DNA variation and phylogenetic relationships among five tuna species based on sequencing of D-loop region.

    PubMed

    Kumar, Girish; Kocour, Martin; Kunal, Swaraj Priyaranjan

    2016-05-01

    In order to assess DNA sequence variation and phylogenetic relationships among five tuna species (Auxis thazard, Euthynnus affinis, Katsuwonus pelamis, Thunnus tonggol, and T. albacares) representing all four tuna genera, partial sequences of the mitochondrial DNA (mtDNA) D-loop region were analyzed. Estimated intra-specific sequence variation in the studied species was low, ranging from 0.027 to 0.080 [Kimura's two-parameter distance (K2P)], whereas inter-specific variation ranged from 0.049 to 0.491. The longtail tuna (T. tonggol) and yellowfin tuna (T. albacares) were found to share a close relationship (K2P = 0.049), while skipjack tuna (K. pelamis) was the most divergent of the species studied. Phylogenetic analysis using Maximum-Likelihood (ML) and Neighbor-Joining (NJ) methods supported the monophyletic origin of Thunnus species. Similarly, the phylogeny of Auxis and Euthynnus species supports their monophyly. However, the results showed that K. pelamis has an origin distinct from the genus Thunnus as well as from Auxis and Euthynnus. Thus, the mtDNA D-loop region sequence data support a polyphyletic origin of tuna species. PMID:25329285
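
    The K2P distances quoted above treat transitions (A/G, C/T) and transversions separately. With P and Q the proportions of transitional and transversional differences between two aligned sequences, Kimura's two-parameter distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The sketch below is an illustrative implementation of that formula, assuming pre-aligned sequences and skipping gapped or ambiguous sites; it is not the software used in the study, and `k2p_distance` is a hypothetical name.

```python
import math

# Transitions: purine <-> purine (A/G) or pyrimidine <-> pyrimidine (C/T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites with a gap or non-ACGT character in either sequence are skipped.
    """
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a != b:
            if frozenset((a, b)) in TRANSITIONS:
                transitions += 1
            else:
                transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # P: proportion of transitional differences
    q = transversions / sites  # Q: proportion of transversional differences
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# One transition (A/G) and one transversion (C/A) over 10 sites.
print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAA"), 3))  # 0.234
```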

  10. Combined examination of sequence and copy number variations in human deafness genes improves diagnosis for cases of genetic deafness

    PubMed Central

    2014-01-01

    Background Copy number variations (CNVs) are the major type of structural variation in the human genome, and are more common than DNA sequence variations in populations. CNVs are important factors for human genetic and phenotypic diversity. Many CNVs have been associated with disease resistance or identified as causes of disease. Little is currently known about the role of CNVs in causing deafness, and conventional genetic analyses of deafness do not examine CNVs. Here we detected both DNA sequence variations and CNVs affecting 80 genes known to be required for normal hearing. Methods Coding regions of the deafness genes were captured by a hybridization-based method and processed through the standard next-generation sequencing (NGS) protocol using the Illumina platform. Samples hybridized together in the same reaction were analyzed to obtain CNVs. A read-depth-based method was used to measure CNVs at the resolution of a single exon. Results were validated by a quantitative PCR (qPCR)-based method. Results Among 79 sporadic cases clinically diagnosed with sensorineural hearing loss, we identified previously reported disease-causing sequence mutations in 16 cases. In addition, we identified a total of 97 CNVs (72 CNV gains and 25 CNV losses) in 27 deafness genes. The CNVs included homozygous deletions, which may directly cause deleterious effects on protein functions known to be essential for hearing, as well as heterozygous deletions and CNV gains compounded with sequence mutations in deafness genes that could potentially harm gene functions. Conclusions We studied how CNVs in known deafness genes may result in deafness. The data provided here serve as a basis for explaining how CNVs disrupt the normal functions of deafness genes. These results may significantly expand our understanding of how various types of genetic mutations cause deafness in humans. PMID:25342930
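
    The abstract describes CNV detection from read depth at single-exon resolution, comparing samples hybridized in the same capture reaction. A minimal sketch of that idea, assuming per-exon read counts are already available, normalizes each sample's counts to read fractions, compares the test sample with the mean of the co-hybridized samples as a log2 ratio, and thresholds that ratio. The pseudocount, the gain/loss thresholds and all function names are hypothetical choices for illustration, not the study's parameters.

```python
import math
from statistics import mean

PSEUDOCOUNT = 0.5  # hypothetical; avoids log2(0) for exons with no reads


def normalized(counts):
    """Per-exon read fractions after adding a small pseudocount."""
    adjusted = [c + PSEUDOCOUNT for c in counts]
    total = sum(adjusted)
    return [c / total for c in adjusted]


def exon_log2_ratios(test_counts, reference_counts):
    """log2(test / mean reference) for each exon.

    test_counts: read counts per exon for the sample of interest.
    reference_counts: one list of per-exon counts for each other sample
    hybridized in the same capture reaction.
    """
    test = normalized(test_counts)
    refs = [normalized(r) for r in reference_counts]
    return [math.log2(t / mean(r[i] for r in refs)) for i, t in enumerate(test)]


def call_cnv(log2_ratio, gain=0.4, loss=-0.5):
    """Hypothetical thresholds: in a diploid genome a one-copy gain gives
    a log2 ratio near 0.58 and a one-copy loss a ratio near -1."""
    if log2_ratio >= gain:
        return "gain"
    if log2_ratio <= loss:
        return "loss"
    return "normal"


ratios = exon_log2_ratios([100, 100, 50, 150], [[100, 100, 100, 100]] * 2)
print([call_cnv(r) for r in ratios])  # ['normal', 'normal', 'loss', 'gain']
```

    Normalizing by each sample's total keeps the sketch short, but a large gain slightly deflates the ratios of the other exons; real pipelines usually normalize more robustly.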

  11. Phylogenetic Relationships and Genetic Variation in Longidorus and Xiphinema Species (Nematoda: Longidoridae) Using ITS1 Sequences of Nuclear Ribosomal DNA

    PubMed Central

    Ye, Weimin; Szalanski, Allen L.; Robbins, R. T.

    2004-01-01

    Genetic analyses using DNA sequences of nuclear ribosomal DNA ITS1 were conducted to determine the extent of genetic variation within and among Longidorus and Xiphinema species. DNA sequences were obtained from samples collected in Arkansas, California and Australia, and 4 additional Xiphinema sequences were retrieved from GenBank. The sequences of the ITS1 region, including the 3' end of the 18S rDNA gene and the 5' end of the 5.8S rDNA gene, ranged from 1020 bp to 1244 bp for the 9 Longidorus species, and from 870 bp to 1354 bp for the 7 Xiphinema species. Nucleotide frequencies were: A = 25.5%, C = 21.0%, G = 26.4%, and T = 27.1%. Genetic variation between the two genera had a maximum divergence of 38.6% between X. chambersi and L. crassus. Genetic variation among Xiphinema species ranged from 3.8% between X. diversicaudatum and X. bakeri to 29.9% between X. chambersi and X. italiae. Within Longidorus, genetic variation ranged from 8.9% between L. crassus and L. grandis to 32.4% between L. fragilis and L. diadecturus. Intraspecific genetic variation in X. americanum sensu lato ranged from 0.3% to 1.9%, while intraspecific variation was 0.8% in L. diadecturus and ranged from 0.6% to 10.9% in L. biformis. The two populations of L. grandis had identical sequences, as did the two populations of X. bakeri. Phylogenetic analyses based on the ITS1 DNA sequence data were conducted on each genus separately using both maximum parsimony and maximum likelihood analyses. Among the Longidorus taxa, 4 subgroups are supported: L. grandis, L. crassus, and L. elongatus are in one cluster; L. biformis and L. paralongicaudatus are in a second cluster; L. fragilis and L. breviannulatus are in a third cluster; and L. diadecturus is in a fourth cluster. Among the Xiphinema taxa, 3 subgroups are supported: X. americanum with X. chambersi, X. bakeri with X. diversicaudatum, and X. italiae and X. vuittenezi forming a sister group with X. index. The relationships observed in this study
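
    The abstract reports pooled nucleotide frequencies and pairwise divergence as percentages but does not state which distance measure was used. A minimal sketch, assuming aligned sequences and an uncorrected p-distance (the percentage of differing sites, ignoring gaps), is shown here; the function names are hypothetical.

```python
from collections import Counter


def nucleotide_frequencies(seqs):
    """Percentage of A, C, G and T pooled over all sequences (gaps ignored)."""
    counts = Counter(b for s in seqs for b in s.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: 100 * counts[b] / total for b in "ACGT"}


def p_distance_percent(seq1, seq2):
    """Uncorrected divergence (%) over aligned sites where both bases are ACGT."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    diffs = sum(a != b for a, b in pairs)
    return 100 * diffs / len(pairs)


print(p_distance_percent("ACGT-A", "ACGA-A"))  # 20.0
```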

  12. Identification of Sequence Specificity of 5-Methylcytosine Oxidation by Tet1 Protein with High-Throughput Sequencing.

    PubMed

    Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi

    2016-03-01

    Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not completely understood. Evaluating genome-level epigenetic changes caused by Tet proteins requires unbiased identification of their highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6 × 10⁴-member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence for Tet1 protein. PMID:26715454
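
    The pulldown design can be summarized as an enrichment calculation: count the sequence context of each mC in reads from the Tet-reactive pulldown and in reads from the unselected library, then divide the two frequencies. The sketch below assumes mC is written as "M" in the reads and looks only at the base 3' of each mC (for example, MpG for mCpG); the notation and the function names are hypothetical, and this is not the authors' analysis pipeline.

```python
from collections import Counter


def mc_context_frequencies(reads):
    """Frequency of each 3' neighbour of mC, with mC written as 'M' in reads."""
    counts = Counter()
    for read in reads:
        for base, nxt in zip(read, read[1:]):
            if base == "M":
                # An mC followed by another mC is still a CpC context.
                counts["Mp" + ("C" if nxt == "M" else nxt)] += 1
    total = sum(counts.values())
    return {ctx: n / total for ctx, n in counts.items()}


def context_enrichment(pulldown_reads, input_reads):
    """Pulldown frequency divided by input frequency for each mC context."""
    pulled = mc_context_frequencies(pulldown_reads)
    background = mc_context_frequencies(input_reads)
    return {ctx: pulled.get(ctx, 0.0) / f for ctx, f in background.items()}


print(context_enrichment(["MGMG", "AMGA"], ["MGAMA", "MTMC"]))
```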

  13. High-Resolution Specificity from DNA Sequencing Highlights Alternative Modes of Lac Repressor Binding

    PubMed Central

    Zuo, Zheng; Stormo, Gary D.

    2014-01-01

    Knowing the specificity of transcription factors is critical to understanding regulatory networks in cells. The lac repressor–operator system has been studied for many years, but not with high-throughput methods capable of determining specificity comprehensively. Details of its binding interaction and its selection of an asymmetric binding site have been controversial. We employed a new method to accurately determine relative binding affinities to thousands of sequences simultaneously, requiring only sequencing of bound and unbound fractions. An analysis of 2560 different DNA sequence variants, including both base changes and variations in operator length, provides a detailed view of lac repressor sequence specificity. We find that the protein can bind with nearly equal affinities to operators of three different lengths, but the sequence preference changes depending on the length, demonstrating alternative modes of interaction between the protein and DNA. The wild-type operator has an odd length, causing the two monomers to bind in alternative modes, making the asymmetric operator the preferred binding site. We tested two other members of the LacI/GalR protein family and find that neither can bind with high affinity to sites with alternative lengths or shows evidence of alternative binding modes. A further comparison with known and predicted motifs suggests that the lac repressor may be unique in this ability and that this may contribute to its selection. PMID:25209146
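
    The abstract states that relative binding affinities were obtained only by sequencing the bound and unbound fractions. One way to see why that can suffice, assuming equilibrium binding to independent sites: for each sequence, the bound/unbound ratio is proportional to its association constant times the free-protein concentration, so dividing by the same ratio for a reference operator cancels both that concentration and the sequencing depth of each fraction. A minimal sketch of that calculation follows; the sequence labels, counts and function names are hypothetical, and the published analysis may include corrections not shown here.

```python
import math


def relative_affinities(bound, unbound, reference):
    """Affinity of each sequence relative to `reference`, from read counts.

    bound, unbound: dicts of sequence -> read count in each fraction.
    Assumes equilibrium binding: bound/unbound for a site is proportional
    to its association constant, so the free-protein concentration and the
    sequencing depth of each fraction cancel in the ratio to the reference.
    """
    ref_ratio = bound[reference] / unbound[reference]
    return {
        seq: (bound[seq] / unbound[seq]) / ref_ratio
        for seq in bound.keys() & unbound.keys()
        if unbound[seq] > 0
    }


def ddg_kt(rel_affinity):
    """Binding free-energy difference from the reference in units of kT
    (rel_affinity must be > 0)."""
    return -math.log(rel_affinity)


rel = relative_affinities({"WT": 800, "M1": 200}, {"WT": 200, "M1": 200}, "WT")
print(rel["M1"], round(ddg_kt(rel["M1"]), 2))  # 0.25 1.39
```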

  14. High-resolution specificity from DNA sequencing highlights alternative modes of Lac repressor binding.

    PubMed

    Zuo, Zheng; Stormo, Gary D

    2014-11-01

    Knowing the specificity of transcription factors is critical to understanding regulatory networks in cells. The lac repressor-operator system has been studied for many years, but not with high-throughput methods capable of determining specificity comprehensively. Details of its binding interaction and its selection of an asymmetric binding site have been controversial. We employed a new method to accurately determine relative binding affinities to thousands of sequences simultaneously, requiring only sequencing of bound and unbound fractions. An analysis of 2560 different DNA sequence variants, including both base changes and variations in operator length, provides a detailed view of lac repressor sequence specificity. We find that the protein can bind with nearly equal affinities to operators of three different lengths, but the sequence preference changes depending on the length, demonstrating alternative modes of interaction between the protein and DNA. The wild-type operator has an odd length, causing the two monomers to bind in alternative modes, making the asymmetric operator the preferred binding site. We tested two other members of the LacI/GalR protein family and find that neither can bind with high affinity to sites with alternative lengths or shows evidence of alternative binding modes. A further comparison with known and predicted motifs suggests that the lac repressor may be unique in this ability and that this may contribute to its selection. PMID:25209146

  15. A High Resolution Genetic Map Anchoring Scaffolds of the Sequenced Watermelon Genome

    PubMed Central

    Kou, Qinghe; Jiang, Jiao; Guo, Shaogui; Zhang, Haiying; Hou, Wenju; Zou, Xiaohua; Sun, Honghe; Gong, Guoyi; Levi, Amnon; Xu, Yong

    2012-01-01

    As part of our ongoing efforts to sequence and map the watermelon (Citrullus spp.) genome, we have constructed a high density genetic linkage map. The map positioned 234 watermelon genome sequence scaffolds (average size 1.41 Mb) that cover about 330 Mb and account for 93.5% of the 353 Mb of the assembled genomic sequences of the elite Chinese watermelon line 97103 (Citrullus lanatus var. lanatus). The genetic map was constructed using an F8 population of 103 recombinant inbred lines (RILs). The RILs are derived from a cross between the line 97103 and the United States Plant Introduction (PI) 296341-FR (C. lanatus var. citroides) that contains resistance to fusarium wilt (races 0, 1, and 2). The genetic map consists of eleven linkage groups that include 698 simple sequence repeat (SSR), 219 insertion-deletion (InDel) and 36 structure variation (SV) markers and spans ∼800 cM with a mean marker interval of 0.8 cM. Using fluorescent in situ hybridization (FISH) with 11 BACs that produced chromosome-specific signals, we have depicted watermelon chromosomes that correspond to the eleven linkage groups constructed in this study. The high resolution genetic map developed here should be a useful platform for the assembly of the watermelon genome, for the development of sequence-based markers used in breeding programs, and for the identification of genes associated with important agricultural traits. PMID:22247776
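
    The figures in this abstract can be cross-checked with simple arithmetic. The sketch below uses only numbers stated above, plus one labelled assumption: that the mean marker interval is the map length divided by the number of gaps between adjacent markers (markers minus linkage groups), which ignores co-segregating markers.

```python
# Values copied from the abstract above.
ssr, indel, sv = 698, 219, 36
linkage_groups = 11
map_length_cm = 800              # reported as approximately 800 cM
anchored_mb, assembly_mb = 330, 353
n_scaffolds, mean_scaffold_mb = 234, 1.41

total_markers = ssr + indel + sv
coverage_pct = 100 * anchored_mb / assembly_mb
scaffold_total_mb = n_scaffolds * mean_scaffold_mb
# Assumption: each linkage group has one fewer interval than it has markers.
mean_interval_cm = map_length_cm / (total_markers - linkage_groups)

print(total_markers)               # 953
print(round(coverage_pct, 1))      # 93.5
print(round(scaffold_total_mb))    # 330
print(round(mean_interval_cm, 2))  # 0.85
```

    Both the coverage and the scaffold total agree with the stated 330 Mb and 93.5%, and the estimated interval rounds to the reported 0.8 cM.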

  16. Individual variation of human S1P₁ coding sequence leads to heterogeneity in receptor function and drug interactions.

    PubMed

    Obinata, Hideru; Gutkind, Sarah; Stitham, Jeremiah; Okuno, Toshiaki; Yokomizo, Takehiko; Hwa, John; Hla, Timothy

    2014-12-01

    Sphingosine 1-phosphate receptor 1 (S1P₁), an abundantly expressed G protein-coupled receptor which regulates key vascular and immune responses, is a therapeutic target in autoimmune diseases. Fingolimod/Gilenya (FTY720), an oral medication for relapsing-remitting multiple sclerosis, targets S1P₁ receptors on immune and neural cells to suppress neuroinflammation. However, suppression of endothelial S1P₁ receptors is associated with cardiac and vascular adverse effects. Here we report the genetic variations of the S1P₁ coding region from exon sequencing of >12,000 individuals and their functional consequences. We conducted functional analyses of 14 nonsynonymous single nucleotide polymorphisms (SNPs) of the S1PR1 gene. One SNP mutant (Arg¹²⁰ to Pro) failed to transmit sphingosine 1-phosphate (S1P)-induced intracellular signals such as calcium increase and activation of p44/42 MAPK and Akt. Two other mutants (Ile⁴⁵ to Thr and Gly³⁰⁵ to Cys) showed normal intracellular signals but impaired S1P-induced endocytosis, which made the receptor resistant to FTY720-induced degradation. Another SNP mutant (Arg¹³ to Gly) demonstrated protection from coronary artery disease in a high cardiovascular risk population. Individuals with this mutation showed a significantly lower percentage of multi-vessel coronary obstruction in a risk factor-matched case-control study. This study suggests that individual genetic variations of S1P₁ can influence receptor function and, therefore, confer differential disease risk and differential interactions with S1P₁-targeted therapeutics. PMID:25293589

  17. Fusion genes and their discovery using high throughput sequencing.

    PubMed

    Annala, M J; Parker, B C; Zhang, W; Nykter, M

    2013-11-01

    Fusion genes are hybrid genes that combine parts of two or more original genes. They can form as a result of chromosomal rearrangements or abnormal transcription, and have been shown to act as drivers of malignant transformation and progression in many human cancers. The biological significance of fusion genes together with their specificity to cancer cells has made them into excellent targets for molecular therapy. Fusion genes are also used as diagnostic and prognostic markers to confirm cancer diagnosis and monitor response to molecular therapies. High-throughput sequencing has enabled the systematic discovery of fusion genes in a wide variety of cancer types. In this review, we describe the history of fusion genes in cancer and the ways in which fusion genes form and affect cellular function. We also describe computational methodologies for detecting fusion genes from high-throughput sequencing experiments, and the most common sources of error that lead to false discovery of fusion genes. PMID:23376639
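
    One common computational approach to detecting fusion genes from high-throughput sequencing is discordant read-pair analysis: pairs whose two mates map to different genes suggest a fusion joining those genes. A minimal sketch, assuming mate positions are already known and using a hypothetical two-gene annotation and support threshold, counts such pairs per gene pair. Practical detectors typically add split-read evidence and filters for mapping artifacts; none of that is shown here, and all names are hypothetical.

```python
from collections import Counter

# Hypothetical annotation: (chromosome, start, end, gene name).
ANNOTATION = [
    ("chr1", 100, 200, "GENE_A"),
    ("chr2", 500, 600, "GENE_B"),
]


def gene_at(position):
    """Return the annotated gene overlapping (chromosome, coordinate), if any."""
    chrom, coord = position
    for c, start, end, name in ANNOTATION:
        if c == chrom and start <= coord <= end:
            return name
    return None


def candidate_fusions(read_pairs, min_support=3):
    """Gene pairs linked by at least `min_support` discordant read pairs."""
    support = Counter()
    for mate1, mate2 in read_pairs:
        g1, g2 = gene_at(mate1), gene_at(mate2)
        if g1 and g2 and g1 != g2:
            support[tuple(sorted((g1, g2)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


pairs = [(("chr1", 150), ("chr2", 550))] * 3 + [(("chr1", 120), ("chr1", 180))]
print(candidate_fusions(pairs))  # {('GENE_A', 'GENE_B'): 3}
```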

  18. Variations in T Cell Transcription Factor Sequence and Expression Associated with Resistance to the Sheep Nematode Teladorsagia circumcincta.

    PubMed

    Wilkie, Hazel; Gossner, Anton; Bishop, Stephen; Hopkins, John

    2016-01-01

    This study used selected lambs that varied in their resistance to the gastrointestinal parasite Teladorsagia circumcincta. Infection over 12 weeks identified susceptible (high adult worm count, AWC; high fecal egg count, FEC; low body weight, BW; low IgA) and resistant sheep (no/low AWC and FEC, high BW and high IgA). Resistance is mediated largely by a Th2 response and IgA and IgE antibodies, and is a heritable characteristic. The polarization of T cells and the development of appropriate immune responses is controlled by the master regulators, T-bet (TBX21), GATA-3 (GATA3), RORγt (RORC2) and RORα (RORA); and several inflammatory diseases of humans and mice are associated with allelic or transcript variants of these transcription factors. This study tested the hypothesis that resistance of sheep to T. circumcincta is associated with variations in the structure, sequence or expression levels of individual master regulator transcripts. We have identified and sequenced one variant of sheep TBX21, two variants of GATA3 and RORC2 and five variants of RORA from lymph node mRNA. Relative RT-qPCR analysis showed that TBX21, GATA3 and RORC2 were not significantly differentially expressed between the nine most resistant (AWC, 0; FEC, 0) and the nine most susceptible sheep (AWC, mean 6078; FEC, mean 350). Absolute RT-qPCR on all 45 animals identified RORAv5 as being significantly differentially expressed (p = 0.038) between resistant, intermediate and susceptible groups; RORCv2 was not differentially expressed (p = 0.77). Spearman's rank analysis showed that RORAv5 transcript copy number was significantly negatively correlated with parameters of susceptibility, AWC and FEC; and was positively correlated with BW. RORCv2 was not correlated with AWC, FEC or BW but was significantly negatively correlated with IgA antibody levels. This study identifies the full length RORA variant (RORAv5) as important in controlling the protective immune response to T. circumcincta infection.
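
    The correlation analysis above used Spearman's rank correlation, which is the Pearson correlation computed on ranks, with tied values given their average rank. A minimal sketch follows; the function names and the five-animal data are hypothetical and do not come from the study, and no p-value is computed.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical values: transcript copies vs. fecal egg count for five animals.
copies = [520, 410, 300, 150, 90]
fec = [0, 0, 120, 300, 350]
print(round(spearman_rho(copies, fec), 3))  # -0.975
```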

  19. Variations in T Cell Transcription Factor Sequence and Expression Associated with Resistance to the Sheep Nematode Teladorsagia circumcincta

    PubMed Central

    Wilkie, Hazel; Gossner, Anton; Hopkins, John

    2016-01-01

    This study used selected lambs that varied in their resistance to the gastrointestinal parasite Teladorsagia circumcincta. Infection over 12 weeks identified susceptible (high adult worm count, AWC; high fecal egg count, FEC; low body weight, BW; low IgA) and resistant sheep (no/low AWC and FEC, high BW and high IgA). Resistance is mediated largely by a Th2 response and IgA and IgE antibodies, and is a heritable characteristic. The polarization of T cells and the development of appropriate immune responses is controlled by the master regulators, T-bet (TBX21), GATA-3 (GATA3), RORγt (RORC2) and RORα (RORA); and several inflammatory diseases of humans and mice are associated with allelic or transcript variants of these transcription factors. This study tested the hypothesis that resistance of sheep to T. circumcincta is associated with variations in the structure, sequence or expression levels of individual master regulator transcripts. We have identified and sequenced one variant of sheep TBX21, two variants of GATA3 and RORC2 and five variants of RORA from lymph node mRNA. Relative RT-qPCR analysis showed that TBX21, GATA3 and RORC2 were not significantly differentially expressed between the nine most resistant (AWC, 0; FEC, 0) and the nine most susceptible sheep (AWC, mean 6078; FEC, mean 350). Absolute RT-qPCR on all 45 animals identified RORAv5 as being significantly differentially expressed (p = 0.038) between resistant, intermediate and susceptible groups; RORCv2 was not differentially expressed (p = 0.77). Spearman’s rank analysis showed that RORAv5 transcript copy number was significantly negatively correlated with parameters of susceptibility, AWC and FEC; and was positively correlated with BW. RORCv2 was not correlated with AWC, FEC or BW but was significantly negatively correlated with IgA antibody levels. This study identifies the full length RORA variant (RORAv5) as important in controlling the protective immune response to T. circumcincta infection.

  20. Allelic sequence variation of the HLA-DQ loci: relationship to serology and to insulin-dependent diabetes susceptibility.

    PubMed Central

    Horn, G T; Bugawan, T L; Long, C M; Erlich, H A

    1988-01-01

    Analysis of sequence variation in the polymorphic second exon of the major histocompatibility complex genes HLA-DQ alpha and -DQ beta has revealed 8 allelic variants at the alpha locus and 13 variants at the beta locus. Correlation of sequence variation with serologic typing suggests that the DQw2, DQw3, and DQ(blank) types are determined by the DQ beta subunit, while the DQw1 specificity is determined by DQ alpha. The nature of the amino acid at position 57 in the DQ beta subunit is correlated with susceptibility to insulin-dependent diabetes mellitus. This region of the DQ beta chain contains shared peptides with Epstein-Barr virus and rubella virus. PMID:2842756

  1. The tryptophan repressor sequence is highly conserved among the Enterobacteriaceae.

    PubMed Central

    Arvidson, D N; Arvidson, C G; Lawson, C L; Miner, J; Adams, C; Youderian, P

    1994-01-01

    Tryptophan biosynthesis in Escherichia coli is regulated by the product of the trpR gene, the tryptophan (Trp) repressor. Trp aporepressor binds the corepressor, L-tryptophan, to form a holorepressor complex, which binds trp operator DNA tightly, and inhibits transcription of the tryptophan biosynthetic operon. The conservation of trp operator sequences among enteric Gram-negative bacteria suggests that trpR genes from other bacterial species can be cloned by complementation in E. coli. To clone trpR homologues, a deletion of the E. coli trpR gene, delta trpR504, was made on a plasmid by site-directed mutagenesis, then crossed onto the E. coli genome. Plasmid clones of the trpR genes of Enterobacter aerogenes and Enterobacter cloacae were isolated by complementation of the delta trpR504 allele, scored as the ability to repress beta-galactosidase synthesis from a prophage-borne trpE-lacZ gene fusion. The predicted amino acid sequences of four enteric TrpR proteins show differences that cluster on the back side of the folded repressor, opposite the DNA-binding helix-turn-helix substructures. These differences are predicted to have little effect on the interactions of the aporepressor with tryptophan, holorepressor with operator DNA, or tandemly bound holorepressor dimers with one another. Although there is some variation observed at the dimer interface, interactions predicted to stabilize the interface are conserved. The phylogenetic relationships revealed by the TrpR amino acid sequence alignment agree with the results of others. PMID:8208606

  2. Historical high-resolution dynamic sea level variations

    NASA Astrophysics Data System (ADS)

    Brunnabend, Sandra-Esther; Dijkstra, Henk A.; Kliphuis, Michael; van Werkhoven, Ben; Bal, Henri E.; Maassen, Jason; van Meersbergen, Maarten; Seinstra, Frank

    2014-05-01

    To investigate future changes in the dynamics of the ocean and therefore in dynamic sea level, ocean models need to be able to adequately represent oceanic dynamical processes. Therefore, resolving ocean eddies and representing boundary currents are of major importance. In this study, we investigate historical variations in dynamical sea surface height using the strongly eddying global version of the Parallel Ocean Program (POP). First, differences in high and low-resolution ocean model results (0.1 vs. 1.0 degree) were analyzed using a climatological atmospheric forcing dataset. Second, we forced the high-resolution model by atmospheric conditions over the period from 1950 to 2000 that are derived from a simulation using the ECHAM5-OM1 model (within the ESSENCE project, see www.knmi.nl/~sterl/Essence/). In general, the large-scale ocean fields of the POP model simulation agree well with those of the low-resolution ocean model (MPI-OM) results. Differences arise from the use of different models and, especially, from the ability of the high-resolution POP model to resolve eddies. A comparison of high-resolution ocean model results with in-situ measurements, such as dynamic topography provided by altimetry, and salinity and temperature provided by the WOA2013, also shows good agreement.

  3. Sequence variation in the IL4 gene and resistance to Trypanosoma cruzi infection in Bolivians

    PubMed Central

    Alvarado Arnez, Lucia Elena; Venegas, Evaristo N.; Ober, Carole; Thompson, Emma E.

    2013-01-01

    Summary Variation in the IL4 gene has been associated with parasitic infections, but has not been studied in Bolivians infected with Trypanosoma cruzi. Our results suggest that variation at IL4 influences susceptibility to T. cruzi infection in Bolivians. PMID:21211660

  4. Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement

    SciTech Connect

    Le Coq, Johanne; Ghosh, Partho

    2012-06-19

    Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (~16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented potential variability of TvpA (approximately 10²⁰). In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation.

  5. The Effects of Sequence Variation on Genome-wide NRF2 Binding—New Target Genes and Regulatory SNPs

    PubMed Central

    Kuosmanen, Suvi M.; Viitala, Sari; Laitinen, Tuomo; Peräkylä, Mikael; Pölönen, Petri; Kansanen, Emilia; Leinonen, Hanna; Raju, Suresh; Wienecke-Baldacchino, Anke; Närvänen, Ale; Poso, Antti; Heinäniemi, Merja; Heikkinen, Sami; Levonen, Anna-Liisa

    2016-01-01

    Transcription factor binding specificity is crucial for proper target gene regulation. Motif discovery algorithms identify the main features of the binding patterns, but their accuracy on lower-affinity sites is often poor. Nuclear factor E2-related factor 2 (NRF2) is a ubiquitous redox-activated transcription factor having a key protective role against endogenous and exogenous oxidant and electrophile stress. Herein, we decipher the effects of sequence variation on the DNA binding sequence of NRF2, in order to identify both genome-wide binding sites for NRF2 and disease-associated regulatory SNPs (rSNPs) with drastic effects on NRF2 binding. Interactions between NRF2 and DNA were studied using molecular modelling, and NRF2 chromatin immunoprecipitation sequencing (ChIP-seq) datasets together with protein binding microarray measurements were utilized to study binding sequence variation in detail. The binding model thus generated was used to identify genome-wide binding sites for NRF2, and genomic binding sites with rSNPs that have strong effects on NRF2 binding and reside on active regulatory elements in human cells. As a proof of concept, miR-126–3p and -5p were identified as NRF2 target microRNAs, and an rSNP (rs113067944) residing in the promoter of an NRF2 target gene (Ferritin, light polypeptide, FTL) was experimentally verified to decrease NRF2 binding and result in decreased transcriptional activity. PMID:26826707
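
    The study built its NRF2 binding model from molecular modelling, ChIP-seq and protein binding microarray data. As a simpler stand-in, the sketch below uses a position weight matrix: a sequence is scored by summing log2(frequency / background) over the motif positions on both strands, and an rSNP's effect is taken as the change in the best score when the alternative allele is substituted. The 4-bp matrix, the example sequence and the function names are hypothetical and are not the model from the paper.

```python
import math

# Hypothetical 4-bp position frequency matrix (NOT the NRF2 model from the study).
PFM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]
BACKGROUND = 0.25  # uniform base composition
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def best_site_score(seq):
    """Highest log2-odds score of any window on either strand (ACGT only)."""
    width = len(PFM)
    best = float("-inf")
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            score = sum(math.log2(PFM[i][b] / BACKGROUND)
                        for i, b in enumerate(window))
            best = max(best, score)
    return best


def snp_score_change(seq, pos, alt):
    """Change in best motif score when the base at `pos` is replaced by `alt`."""
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return best_site_score(mutated) - best_site_score(seq)


print(round(snp_score_change("AATGACAA", 3, "A"), 2))  # -4.09 (site weakened)
```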

  6. High-Throughput Sequencing of a South American Amerindian

    PubMed Central

    Almeida, Renan; Alencar, Dayse O.; Barbosa, Maria Silvanira; Gusmão, Leonor; Silva, Wilson A.; de Souza, Sandro J.; Silva, Artur; Ribeiro-dos-Santos, Ândrea; Darnet, Sylvain; Santos, Sidney

    2013-01-01

    The emergence of next-generation sequencing technologies has allowed access to the vast amounts of information contained in the human genome. This information has contributed to the understanding of individual and population-based variability and improved the understanding of the evolutionary history of different human groups. However, the genome of a representative of the Amerindian populations had not been previously sequenced. Thus, the genome of an individual from a South American tribe was completely sequenced to further the understanding of the genetic variability of Amerindians. A total of 36.8 giga base pairs (Gbp) were sequenced and aligned with the human genome. These reads covered 95.92% of the human genome, with an estimated miscall rate of 0.0035 per sequenced bp. The data obtained from the alignment were used for SNP (single-nucleotide polymorphism) and INDEL (insertion-deletion) calling, which resulted in the identification of 502,017 polymorphisms, of which 32,275 were potentially new high-confidence SNPs and 33,795 new INDELs, specific to South Native American populations. The authenticity of the sample as a member of the South Native American populations was confirmed through the analysis of the uniparental (maternal and paternal) lineages. The autosomal comparison distinguished the investigated sample from other continental populations and revealed a close relationship to East Asian populations and Aboriginal Australians. Although the findings did not rule out the classical model of the settlement of the Americas, they brought new insights into the understanding of human population history. The present study indicates a remarkable genetic variability in human populations that must still be identified and contributes to the understanding of the genetic variability of South Native American populations and of the history of human populations. PMID:24386182
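
    Two figures follow directly from the abstract: the share of called polymorphisms reported as new, and the expected number of miscalls per megabase at the stated error rate. A rough mean sequencing depth also follows if a haploid human genome length of about 3.1 Gbp is assumed; that length is an assumption not stated in the abstract, and the estimate treats every sequenced base as aligned.

```python
sequenced_bp = 36.8e9
genome_bp = 3.1e9  # assumption: approximate haploid human genome length
mean_depth = sequenced_bp / genome_bp

total_polymorphisms = 502_017
new_snps, new_indels = 32_275, 33_795
new_fraction = (new_snps + new_indels) / total_polymorphisms

miscall_rate = 0.0035  # per sequenced bp, from the abstract
expected_miscalls_per_mb = miscall_rate * 1e6

print(f"mean depth ~ {mean_depth:.1f}x")                    # mean depth ~ 11.9x
print(f"new polymorphisms: {new_fraction:.1%}")             # new polymorphisms: 13.2%
print(f"miscalls per Mb: {expected_miscalls_per_mb:.0f}")   # miscalls per Mb: 3500
```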

  7. A High-Coverage Genome Sequence from an Archaic Denisovan Individual

    PubMed Central

    Meyer, Matthias; Kircher, Martin; Gansauge, Marie-Theres; Li, Heng; Racimo, Fernando; Mallick, Swapan; Schraiber, Joshua G.; Jay, Flora; Prüfer, Kay; de Filippo, Cesare; Sudmant, Peter H.; Alkan, Can; Fu, Qiaomei; Do, Ron; Rohland, Nadin; Tandon, Arti; Siebauer, Michael; Green, Richard E.; Bryc, Katarzyna; Briggs, Adrian W.; Stenzel, Udo; Dabney, Jesse; Shendure, Jay; Kitzman, Jacob; Hammer, Michael F.; Shunkov, Michael V.; Derevianko, Anatoli P.; Patterson, Nick; Andrés, Aida M.; Eichler, Evan E.; Slatkin, Montgomery; Reich, David; Kelso, Janet; Pääbo, Svante

    2013-01-01

    We present a DNA library preparation method that has allowed us to reconstruct a high coverage (30X) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of “missing evolution” in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans. PMID:22936568

  8. Validation of high throughput sequencing and microbial forensics applications

    PubMed Central

    2014-01-01

    High-throughput sequencing (HTS) generates large amounts of high-quality sequence data for microbial genomics. The value of HTS for microbial forensics lies in the speed at which evidence can be collected and in its power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation that can significantly influence the course and/or focus of an investigation and can affect the government's response to an attack with individual, political, economic, or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays, and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads and attribution. Because HTS strategies vary, guiding principles for HTS system validation are required. Three initial aspects of HTS, irrespective of chemistry, instrumentation, or software, are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of the specific application, and the criteria described here provide a foundation for investigators to establish, validate, and implement HTS as a tool in microbial forensics, enhancing public safety and national security. PMID:25101166

  9. Advances, practice, and clinical perspectives in high-throughput sequencing.

    PubMed

    Park, S-J; Saito-Adachi, M; Komiyama, Y; Nakai, K

    2016-07-01

    Remarkable advances in high-throughput sequencing technologies have fundamentally changed our understanding of the genetic and epigenetic molecular bases underlying human health and disease. As these technologies continue to revolutionize molecular biology and open fresh perspectives, it is imperative to examine the enormous excitement surrounding them thoroughly by highlighting the characteristics of the platforms and their global trends, as well as their potential benefits and limitations. To date, across a variety of platforms, the technologies support an impressive range of applications, including sequencing of whole genomes and transcriptomes, identification of genome modifications, and profiling of protein interactions. Because these applications produce a flood of data, bioinformatics tools must be developed in parallel to handle such big data efficiently and analyze it comprehensively. This review covers the major achievements and performance of high-throughput sequencing, summarizes the characteristics of its applications, and introduces applicable bioinformatics tools. Moreover, a step-by-step procedure for a practical transcriptome analysis is described using an analytical pipeline. Clinical perspectives, with special consideration of human oral health and disease, are also covered. PMID:26602181

  10. Validation of high throughput sequencing and microbial forensics applications.

    PubMed

    Budowle, Bruce; Connell, Nancy D; Bielecka-Oder, Anna; Colwell, Rita R; Corbett, Cindi R; Fletcher, Jacqueline; Forsman, Mats; Kadavy, Dana R; Markotic, Alemka; Morse, Stephen A; Murch, Randall S; Sajantila, Antti; Schmedes, Sarah E; Ternus, Krista L; Turner, Stephen D; Minot, Samuel

    2014-01-01

    PMID:25101166

  11. Characterizing novel endogenous retroviruses from genetic variation inferred from short sequence reads

    PubMed Central

    Mourier, Tobias; Mollerup, Sarah; Vinner, Lasse; Hansen, Thomas Arn; Kjartansdóttir, Kristín Rós; Guldberg Frøslev, Tobias; Snogdal Boutrup, Torsten; Nielsen, Lars Peter; Willerslev, Eske; Hansen, Anders J.

    2015-01-01

    From Illumina sequencing of DNA from brain and liver tissue of the lion, Panthera leo, and tumor samples from the pike-perch, Sander lucioperca, we obtained two assembled sequence contigs with similarity to known retroviruses. Phylogenetic analyses suggest that the pike-perch retrovirus belongs to the epsilonretroviruses and the lion retrovirus to the gammaretroviruses. To determine whether these novel retroviral sequences originate from an endogenous retrovirus or from a recently integrated exogenous retrovirus, we assessed the genetic diversity of the parental sequences from which the short Illumina reads are derived. First, we showed by simulations that the level of genetic diversity can be robustly inferred from short sequence reads. Second, we found that the measures of nucleotide diversity inferred from our retroviral sequences significantly exceed the level observed in Human Immunodeficiency Virus infections, prompting us to conclude that both novel retroviruses are of endogenous origin. Through further simulations, we ruled out the possibility that the observed elevated levels of nucleotide diversity result from co-infection with two closely related exogenous retroviruses. PMID:26493184

  12. Characterizing novel endogenous retroviruses from genetic variation inferred from short sequence reads.

    PubMed

    Mourier, Tobias; Mollerup, Sarah; Vinner, Lasse; Hansen, Thomas Arn; Kjartansdóttir, Kristín Rós; Guldberg Frøslev, Tobias; Snogdal Boutrup, Torsten; Nielsen, Lars Peter; Willerslev, Eske; Hansen, Anders J

    2015-01-01

    PMID:26493184

  13. ClinVar: public archive of relationships among sequence variation and human phenotype

    PubMed Central

    Landrum, Melissa J.; Lee, Jennifer M.; Riley, George R.; Jang, Wonhee; Rubinstein, Wendy S.; Church, Deanna M.; Maglott, Donna R.

    2014-01-01

    ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/) provides a freely available archive of reports of relationships among medically important variants and phenotypes. ClinVar assigns accessions to submissions reporting human variation, interpretations of the relationship of that variation to human health, and the evidence supporting each interpretation. The database is tightly coupled with dbSNP and dbVar, which maintain information about the location of variation on human assemblies. ClinVar also relies on the phenotypic descriptions maintained in MedGen (http://www.ncbi.nlm.nih.gov/medgen). Each ClinVar record represents the combination of submitter, variation, and phenotype, i.e. the unit that is assigned an accession of the format SCV000000000.0. The submitter can update the submission at any time, in which case a new version is assigned. To facilitate evaluation of the medical importance of each variant, ClinVar aggregates submissions with the same variation/phenotype combination, adds value from other NCBI databases, assigns a distinct accession of the format RCV000000000.0, and reports whether there are conflicting clinical interpretations. Data in ClinVar are available in multiple formats, including HTML and downloads as XML, VCF, or tab-delimited subsets. Data from ClinVar are provided as annotation tracks on genomic RefSeqs and are used in tools such as Variation Reporter (http://www.ncbi.nlm.nih.gov/variation/tools/reporter), which reports what is known about variation based on user-supplied locations. PMID:24234437
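
    The two accession formats described in this record (SCV for an individual submitter/variation/phenotype record, RCV for the aggregated variation/phenotype record, each with a trailing version number) can be illustrated with a small parser. This is a minimal Python sketch that assumes accessions follow exactly the pattern of the quoted examples (three letters, nine digits, a dot, and a version); the helper names `parse_clinvar_accession` and `is_newer` are hypothetical and not part of any NCBI API.

```python
import re
from typing import NamedTuple

# Assumed pattern, taken only from the example formats SCV000000000.0 and
# RCV000000000.0: a three-letter prefix, nine digits, a dot, and a version.
ACCESSION_RE = re.compile(r"^(SCV|RCV)(\d{9})\.(\d+)$")


class ClinVarAccession(NamedTuple):
    kind: str     # "submission" (SCV) or "aggregate" (RCV)
    number: int   # numeric identifier without leading zeros
    version: int  # a new version is assigned when a submission is updated


def parse_clinvar_accession(text: str) -> ClinVarAccession:
    """Split an SCV/RCV-style accession into its parts (hypothetical helper)."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an SCV/RCV accession: {text!r}")
    prefix, digits, version = match.groups()
    kind = "submission" if prefix == "SCV" else "aggregate"
    return ClinVarAccession(kind, int(digits), int(version))


def is_newer(a: str, b: str) -> bool:
    """True if accession a is a later version of the same record as b."""
    pa, pb = parse_clinvar_accession(a), parse_clinvar_accession(b)
    return (pa.kind, pa.number) == (pb.kind, pb.number) and pa.version > pb.version


if __name__ == "__main__":
    print(parse_clinvar_accession("SCV000000123.2"))
    # prints ClinVarAccession(kind='submission', number=123, version=2)
```

    Keeping the version separate from the identifier makes it easy to tell whether two accessions refer to the same record at different update stages, mirroring the versioning behavior described in the abstract.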

  14. [Mitochondrial DNA sequence variation, demographic history, and population structure of Amur sturgeon Acipenser schrenckii Brandt, 1869].

    PubMed

    Shedko, S V; Miroshnichenko, I L; Nemkova, G A; Koshelev, V N; Shedko, M B

    2015-02-01

    The variability of the mtDNA control region (D-loop) was examined in Amur sturgeon, which is endemic to the Amur River. This species is also classified as critically endangered on the IUCN Red List of Threatened Species. Sequencing of 796- to 812-bp fragments of the D-loop in 112 sturgeon collected in the Lower Amur revealed 73 different genotypes. The sample was characterized by a high level of haplotypic (0.976) and nucleotide (0.0194) diversity. The identified haplotypes split into two well-defined monophyletic groups, BG (n = 39) and SM (n = 34), which differed (HKY distance) on average at 3.41% of nucleotide positions, compared with average intragroup differences of 0.54% and 1.23%, respectively. Moreover, the haplotypes of the SM group were distinguished by a 13–14 bp deletion. Most of the samples (66 out of 112) carried BG haplotypes. Overall, the pattern of pairwise nucleotide differences, the results of neutrality tests, and the results of tests for compliance with the model of sudden demographic expansion or the model of exponential growth pointed to a significant past increase in the number of Amur sturgeon, which was most clearly manifested in the analysis of data on the BG haplogroup. The constructed Bayesian skyline plots showed that this growth began about 18 to 16 thousand years ago. At present, the effective size of the strongly reduced (due to overharvesting) population of Amur sturgeon may be equal to or even lower than it was during the Last Glacial Maximum, before this growth began. The presence of two haplogroups in the mitochondrial gene pool of Amur sturgeon, their unequal evolutionary dynamics, and, judging by limited data, their unequal representation in the Russian and Chinese parts of the Amur River basin point to the possible existence of at least two distinct populations of Amur sturgeon in the past. PMID:25966586
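
    The haplotypic and nucleotide diversity values quoted in this record are standard summary statistics. A minimal Python sketch of Nei's unbiased estimators follows, assuming the input is a list of aligned sequences of equal length. Gaps are treated as ordinary characters and differences are counted as uncorrected p-distances, so this does not reproduce the HKY-based distances used in the study; the toy sequences in the usage lines are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs: list[str]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(seqs)  # identical sequences count as the same haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    # Uncorrected p-distance for each pair; no substitution model (e.g. HKY) is applied.
    pair_dists = [sum(a != b for a, b in zip(s1, s2)) / length
                  for s1, s2 in combinations(seqs, 2)]
    return sum(pair_dists) / len(pair_dists)


if __name__ == "__main__":
    toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAT"]  # invented data
    print(round(haplotype_diversity(toy), 4), round(nucleotide_diversity(toy), 4))
    # prints 0.8333 0.15
```

    Read this way, a haplotypic diversity of 0.976 means that two haplotypes drawn at random (without replacement) from the sample differ with a probability of about 97.6%.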

  15. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon

    PubMed Central

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    Powdery mildew is one of the most common fungal diseases in the world. It frequently affects melon (Cucumis melo L.) and other crops of the family Cucurbitaceae in both open-field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, ‘SCNU1154’, ‘Edisto47’, ‘MR-1’, and ‘PMR5’. To investigate the genetic variation of these accessions, whole-genome re-sequencing was performed on the Illumina HiSeq 2000 platform. A total of 754,759,704 quality-filtered reads were generated, with an average coverage of 82.64% relative to the reference genome. Comparisons of the sequences of the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis classified the detected variations into biological process, cellular component, and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them, 112 SNPs and 12 InDels were observed on powdery mildew-responsive chromosomes. Accordingly, this whole-genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew and could accelerate marker-assisted breeding in melon. PMID:27311063

  16. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    PubMed

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    PMID:27311063

  17. Contributions of 18 Additional DNA Sequence Variations in the Gene Encoding Apolipoprotein E to Explaining Variation in Quantitative Measures of Lipid Metabolism

    PubMed Central

    Stengård, Jari H.; Clark, Andrew G.; Weiss, Kenneth M.; Kardia, Sharon; Nickerson, Deborah A.; Salomaa, Veikko; Ehnholm, Christian; Boerwinkle, Eric; Sing, Charles F.

    2002-01-01

    Apolipoprotein E (ApoE) is a major constituent of many lipoprotein particles. Previous genetic studies have focused on six genotypes defined by three alleles, denoted ε2, ε3, and ε4, encoded by two variable exonic sites that segregate in most populations. We have reported studies of the distribution of alleles of 20 biallelic variable sites in the gene encoding the ApoE molecule within and among samples, ascertained without regard to health, from each of three populations: African Americans from Jackson, Miss.; Europeans from North Karelia, Finland; and non-Hispanic European Americans from Rochester, Minn. Here we ask (1) how much variation in blood levels of ApoE (lnApoE), of total cholesterol (TC), of high-density lipoprotein cholesterol (HDL-C), and of triglyceride (lnTG) is statistically explained by variation among APOE genotypes defined by the ε2, ε3, and ε4 alleles; (2) how much additional variation in these traits is explained by genotypes defined by combining the two variable sites that define these three alleles with one or more additional variable sites; and (3) what are the locations and relative allele frequencies of the sites that define multisite genotypes that significantly improve the statistical explanation of variation beyond that provided by the genotypes defined by the ε2, ε3, and ε4 alleles, separately for each of the six gender-population strata. This study establishes that the use of only genotypes defined by the ε2, ε3, and ε4 alleles gives an incomplete picture of the contribution that the variation in the APOE gene makes to the statistical explanation of interindividual variation in blood measurements of lipid metabolism. The addition of variable sites to the genotype definition significantly improved the ability to explain variation in lnApoE and in TC and resulted in the explanation of variation in HDL-C and in lnTG. The combination of additional sites that explained the greatest amount of trait variation was different for

  18. Source quality variations tied to sequence development: Integration of physical and chemical aspects, Lower to Middle Triassic, western Barents Sea

    SciTech Connect

    Bohacs, K.M.; Isaksen, G.H.

    1991-03-01

    Triassic mudrocks from the Barents Sea area demonstrate the covariance of physical and chemical properties of mudrocks deposited in shelfal environments and the aspect of depositional sequences in distal settings. Tying physical parameters to chemical character within a detailed sequence-stratigraphic framework enables the construction of depositional-facies models that predict organic-matter content and quality. This allows the explorer to constrain and predict more closely the nature of potential source rocks using seismic and well-log data. Changes in lithology, bedding geometry, sedimentary structures, body- and trace-fossil assemblages, and inorganic, bulk-organic, and molecular geochemistry revealed the detailed depositional environments. The depositional environments stack predictably according to their position in the depositional sequence: from aerobic lower-shoreface–offshore-transition environments in lowstand systems tracts to dysaerobic-anaerobic distal open-marine-shelf environments in transgressive and early highstand systems tracts. Quantitative molecular geochemistry also revealed variations within this distal setting and strong covariance with sequence position. Organic matter from terrigenous higher plants dominates the lowstands, whereas marine-algal organic matter is most prevalent within transgressive and highstand systems tracts. Specifically, the abundances of C30 steranes, total steranes, and moretane reflected the development of the sequences.

  19. The influence of aging, environmental exposures and local sequence features on the variation of DNA methylation in blood

    PubMed Central

    Langevin, Scott M; Houseman, E Andres; Christensen, Brock C; Wiencke, John K; Nelson, Heather H; Karagas, Margaret R; Marsit, Carmen J

    2011-01-01

    To properly comprehend the epigenetic dysregulation that occurs during the course of disease, there is a need to characterize the epigenetic variability that arises in healthy individuals in response to aging and exposures, and to understand such variation within the biological context of the DNA sequence. We analyzed the methylation of 26,486 autosomal CpG loci in blood from 205 healthy subjects, using three complementary approaches to assess the association between methylation, age or exposures, and local sequence features such as CpG island status, repeat sequences, location within a polycomb target gene, or proximity to a transcription factor binding site. We (1) clustered CpGs using unsupervised recursively partitioned mixture modeling (RPMM), (2) clustered CpGs using bioinformatically informed methods, and (3) employed a marginal model-based (non-clustering) approach. We observed associations of methylation with age and with hair dye use, where the direction and magnitude were contingent on the local sequence features of the CpGs. Our results demonstrate that CpGs are differentially methylated depending on the genomic features of the sequence in which they are embedded, and that CpG methylation is associated with age and hair dye use in a CpG context-dependent manner in healthy individuals. PMID:21617368

  20. Widespread Sequence Variations in VAMP1 across Vertebrates Suggest a Potential Selective Pressure from Botulinum Neurotoxins

    PubMed Central

    Peng, Lisheng; Adler, Michael; Demogines, Ann; Borrell, Andrew; Liu, Huisheng; Tao, Liang; Tepp, William H.; Zhang, Su-Chun; Johnson, Eric A.; Sawyer, Sara L.; Dong, Min

    2014-01-01

    Botulinum neurotoxins (BoNT/A-G), the most potent toxins known, act by cleaving three SNARE proteins required for synaptic vesicle exocytosis. Previous studies on BoNTs have generally utilized the major SNARE homologues expressed in brain (VAMP2, syntaxin 1, and SNAP-25). However, BoNTs target peripheral motor neurons and cause death by paralyzing respiratory muscles such as the diaphragm. Here we report that VAMP1, but not VAMP2, is the SNARE homologue predominantly expressed in adult rodent diaphragm motor nerve terminals and in differentiated human motor neurons. In contrast to the highly conserved VAMP2, BoNT-resistant variations in VAMP1 are widespread across vertebrates. In particular, we identified a polymorphism at position 48 of VAMP1 in rats, which renders VAMP1 either resistant (I48) or sensitive (M48) to BoNT/D. Taking advantage of this finding, we showed that rat diaphragms with I48 in VAMP1 are insensitive to BoNT/D compared to rat diaphragms with M48 in VAMP1. This unique intra-species comparison establishes VAMP1 as a physiological toxin target in diaphragm motor nerve terminals, and demonstrates that the resistance of VAMP1 to BoNTs can underlie the insensitivity of a species to members of BoNTs. Consistently, human VAMP1 contains I48, which may explain why humans are insensitive to BoNT/D. Finally, we report that residue 48 of VAMP1 varies frequently between M and I across seventeen closely related primate species, suggesting a potential selective pressure from members of BoNTs for resistance in vertebrates. PMID:25010769

  1. Widespread sequence variations in VAMP1 across vertebrates suggest a potential selective pressure from botulinum neurotoxins.

    PubMed

    Peng, Lisheng; Adler, Michael; Demogines, Ann; Borrell, Andrew; Liu, Huisheng; Tao, Liang; Tepp, William H; Zhang, Su-Chun; Johnson, Eric A; Sawyer, Sara L; Dong, Min

    2014-07-01

    PMID:25010769

  2. Distribution of sequence variation in the mtDNA control region of Native North Americans.

    PubMed

    Lorenz, J G; Smith, D G

    1997-12-01

    The distributions of mtDNA diversity within and/or among North American haplogroups, language groups, and tribes were used to characterize the process of tribalization that followed the colonization of the New World. Approximately 400 bp from the mtDNA control region of 1 Na-Dene and 33 Amerind individuals representing a wide variety of languages and geographic origins were sequenced. With the inclusion of data from previous studies, 225 native North American (284 bp) sequences representing 85 distinct mtDNA lineages were analyzed. Mean pairwise sequence differences between (and within) tribes and language groups were primarily due to differences in the distribution of three of the four major haplogroups that evolved before settlement of the New World. Pairwise sequence differences within each of these three haplogroups were more similar than previous studies based on restriction enzyme analysis have indicated. The mean of pairwise sequence differences between Amerind members of haplogroup A, the most common of the four haplogroups in North America, was only slightly higher than that for the Eskimo, providing no evidence of separate ancestry, but was about two-thirds higher than that for the Na-Dene. However, analysis of pairwise sequence divergence between only tribal-specific lineages, unweighted for sample size, suggests that random evolutionary processes have reduced sequence diversity within the Na-Dene and that members of all three language groups possess approximately equally diverse mtDNA lineages. Comparisons of diversity within and between specific ethnic groups with the largest sample size were also consistent with this outcome. These data are not consistent with the hypothesis that the New World was settled by more than a single migration. Because lineages tended not to cluster by tribe and because lineage sharing among linguistically unrelated groups was restricted to geographically proximate groups, the tribalization process probably did not occur

  3. Variations of the sequence stratigraphic model: Past concepts, present understandings, and future directions

    SciTech Connect

    Posamentier, H.W.; James, D.H.

    1991-03-01

    The working hypothesis on which sequence concepts are based is that relative sea-level change alters the capacity of a basin to accommodate sediment, which in turn results in a succession of sequences. The interplay among eustasy, tectonics, sediment flux, and physiography yields a predictable geologic response in carbonate, clastic, and mixed carbonate/clastic settings. The criteria for recognizing sequence boundaries can vary within a given basin as well as between basins. They include, but are not restricted to, (1) a basinward shift of facies across a sharp bedding contact, (2) onlapping stratal geometry, and (3) truncation of strata. The key to using these concepts correctly is to recognize sequence stratigraphy as an approach or a tool rather than a rigid template. Observations from the upper Albian (Cretaceous) Viking Formation of the western Canadian sedimentary basin are presented to illustrate the stratigraphic expression of clastic depositional sequences on a ramp margin. In this setting, in response to fluctuations of relative sea level, forced regressions and lowstand shorelines commonly occur, incised valleys sometimes occur, and submarine fans rarely occur. The base of the Viking Formation is sometimes characterized by relatively coarse-grained sediments sharply overlying fine-grained offshore muds and is interpreted as a third-order sequence boundary. Pebbles are occasionally observed at this contact. Subsequently, a number of higher-order sequences are observed within the lower to middle Viking; they are characterized by forced regressions and lowstand shorelines without associated incised valleys.

  4. A combination of PhP typing and β-d-glucuronidase gene sequence variation analysis for differentiation of Escherichia coli from humans and animals.

    PubMed

    Masters, N; Christie, M; Katouli, M; Stratton, H

    2015-06-01

    We investigated the usefulness of β-d-glucuronidase gene variation in Escherichia coli as a microbial source tracking tool, using a novel algorithm to compare sequences from a set of host-specific isolates prescreened with a high-resolution PhP typing method. A total of 65 common biochemical phenotypes belonging to 318 E. coli strains isolated from humans and from domestic and wild animals were analysed for nucleotide variations at 10 loci along a 518 bp fragment of the 1812 bp β-d-glucuronidase gene. Neighbour-joining analysis of the loci variations showed that 86 (76.8%) human isolates and 91.2% of animal isolates were correctly identified. Pairwise hierarchical clustering improved assignment: 92 (82.1%) human and 204 (99%) animal strains were assigned to their respective clusters. Our data show that initial typing of isolates and selection of common types from different hosts prior to analysis of the β-d-glucuronidase gene sequence improves source identification. We also conclude that numerical profiling of the nucleotide variations is a valuable approach to differentiating human from animal E. coli. This study demonstrates the usefulness of the β-d-glucuronidase gene as a marker for differentiating human faecal pollution from animal sources. PMID:25950195

  5. BM-SNP: A Bayesian Model for SNP Calling Using High Throughput Sequencing Data.

    PubMed

    Xu, Yanxun; Zheng, Xiaofeng; Yuan, Yuan; Estecio, Marcos R; Issa, Jean-Pierre; Qiu, Peng; Ji, Yuan; Liang, Shoudan

    2014-01-01

    A single-nucleotide polymorphism (SNP) is a single-base change in the DNA sequence and is the most common type of polymorphism. Detection and annotation of SNPs are among the central topics in biomedical research, as SNPs are believed to play important roles in the manifestation of phenotypes such as disease susceptibility. To take full advantage of next-generation sequencing (NGS) technology, we propose a Bayesian approach, BM-SNP, to identify SNPs based on posterior inference using NGS data. In particular, BM-SNP computes the posterior probability of nucleotide variation at each covered genomic position using the contents and frequency of the mapped short reads. A position with a high posterior probability of nucleotide variation is flagged as a potential SNP. We apply BM-SNP to NGS data from two cell lines, and the results show a high rate of overlap (>95%) with the dbSNP database. Compared with MAQ, BM-SNP identifies more SNPs that are in dbSNP, and with higher quality. SNPs called only by BM-SNP but absent from dbSNP may represent new discoveries. The proposed BM-SNP method integrates information from multiple aspects of NGS data and therefore achieves high detection power. BM-SNP is fast, capable of processing whole-genome data at 20-fold average coverage in a short time. PMID:26357041
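
    The general idea of flagging positions by posterior probability of variation can be illustrated with a deliberately simple two-hypothesis model. This is not the BM-SNP model: the binomial likelihoods, the fixed error rate, the prior, the 0.5 allele fraction, and the names posterior_variant and flag_snps are all assumptions made for illustration only.

```python
from math import comb


def posterior_variant(depth: int, alt_reads: int, error_rate: float = 0.01,
                      prior_variant: float = 0.001, alt_fraction: float = 0.5) -> float:
    """Toy posterior P(variant | reads) at one genomic position (hypothetical model).

    Reference hypothesis: alternate-allele reads arise only from sequencing
    errors, each with probability `error_rate`.
    Variant hypothesis: each read carries the alternate allele with
    probability `alt_fraction` (0.5 mimics a heterozygous site).
    """
    ways = comb(depth, alt_reads)
    like_ref = ways * error_rate ** alt_reads * (1 - error_rate) ** (depth - alt_reads)
    like_var = ways * alt_fraction ** alt_reads * (1 - alt_fraction) ** (depth - alt_reads)
    evidence = like_var * prior_variant + like_ref * (1 - prior_variant)
    return like_var * prior_variant / evidence


def flag_snps(pileup, threshold: float = 0.99):
    """Return positions whose posterior probability of variation exceeds `threshold`.

    `pileup` is an iterable of (position, depth, alt_reads) tuples.
    """
    return [pos for pos, depth, alt in pileup
            if posterior_variant(depth, alt) > threshold]


if __name__ == "__main__":
    # Hypothetical pileup: (position, read depth, reads carrying the alternate base)
    print(flag_snps([(101, 20, 0), (102, 20, 10), (103, 20, 1)]))
```

    With these toy settings, a single alternate read at depth 20 is explained far better by sequencing error, while 10 of 20 alternate reads overwhelm the small prior, so only position 102 is flagged.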

  6. Single-Molecule LATE-PCR Analysis of Human Mitochondrial Genomic Sequence Variations

    PubMed Central

    Osborne, Adam; Reis, Arthur H.; Bach, Loren; Wangh, Lawrence J.

    2009-01-01

    It is thought that changes in mitochondrial DNA are associated with many degenerative diseases, including Alzheimer's and diabetes. Much of the evidence, however, depends on correlating disease states with changing levels of heteroplasmy within populations of mitochondrial genomes, rather than individual mitochondrial genomes. Thus these measurements are likely to either overestimate the extent of heteroplasmy due to technical artifacts, or underestimate the actual level of heteroplasmy because only the most abundant changes are observable. In contrast, Single Molecule (SM) LATE-PCR analysis achieves efficient amplification of single-stranded amplicons from single target molecules. The product molecules, in turn, can be accurately sequenced using a convenient Dilute-‘N’-Go protocol, as shown here. Using these novel technologies we have rigorously analyzed levels of mitochondrial genome heteroplasmy found in single hair shafts of healthy adult individuals. Two of the single molecule sequences (7% of the samples) were found to contain mutations. Most of the mtDNA sequence changes, however, were due to the presence of laboratory contaminants. Amplification and sequencing errors did not result in mis-identification of mutations. We conclude that SM-LATE-PCR in combination with Dilute-‘N’-Go Sequencing are convenient technologies for detecting infrequent mutations in mitochondrial genomes, provided great care is taken to control and document contamination. We plan to use these technologies in the future to look for age, drug, and disease related mitochondrial genome changes in model systems and clinical samples. PMID:19461959

  7. High-throughput sequencing: a roadmap toward community ecology.

    PubMed

    Poisot, Timothée; Péquin, Bérangère; Gravel, Dominique

    2013-04-01

    High-throughput sequencing is becoming increasingly important in microbial ecology, yet it is surprisingly under-used to generate or test biogeographic hypotheses. In this contribution, we highlight how adding these methods to the ecologist toolbox will allow the detection of new patterns, and will help our understanding of the structure and dynamics of diversity. Starting with a review of ecological questions that can be addressed, we move on to the technical and analytical issues that will benefit from an increased collaboration between different disciplines. PMID:23610649

  8. High-Throughput Sequencing: A Roadmap Toward Community Ecology

    PubMed Central

    Poisot, Timothée; Péquin, Bérangère; Gravel, Dominique

    2013-01-01

    High-throughput sequencing is becoming increasingly important in microbial ecology, yet it is surprisingly under-used to generate or test biogeographic hypotheses. In this contribution, we highlight how adding these methods to the ecologist toolbox will allow the detection of new patterns, and will help our understanding of the structure and dynamics of diversity. Starting with a review of ecological questions that can be addressed, we move on to the technical and analytical issues that will benefit from an increased collaboration between different disciplines. PMID:23610649

  9. Associations between sequence variations in the mitochondrial DNA D-loop region and outcome of hepatocellular carcinoma

    PubMed Central

    LI, SHILAI; WAN, PEIQI; PENG, TAO; XIAO, KAIYIN; SU, MING; SHANG, LIMING; XU, BANGHAO; SU, ZHIXIONG; YE, XINPING; PENG, NING; QIN, QUANLIN; LI, LEQUN

    2016-01-01

    The association between mitochondrial DNA (mtDNA) polymorphisms or mutations and cancer prognosis has been investigated previously, but the results have been ambiguous. In the present study, the associations between sequence variations in the mtDNA D-loop region and the outcomes of patients with hepatocellular carcinoma (HCC) were analysed. A total of 140 patients with HCC (123 males and 17 females), who were hospitalised to undergo radical resection, were studied. Polymerase chain reaction and direct sequencing were performed to detect the sequence variations in the mtDNA D-loop region. Multivariate and univariate analyses were conducted to determine important factors in the prognosis of HCC. A total of 150 point sequence variations were observed in the 140 cases (13 point mutations, 8 insertions, 20 deletions and 116 polymorphisms). The variation rate was 13.4% (150/1,122). mtDNA nucleotide 150 (C/T) was an independent factor in the logistic regression for early/late recurrence of HCC; patients with 150T appeared to have later recurrences. In a Cox proportional hazards regression model, hepatitis B virus DNA, Child-Pugh class, differentiation degree, tumour-node-metastasis (TNM) stage, nucleotide 16263 (T/C) and nucleotide 315 (N/insertion C) were independent factors for tumour-free survival time. Patients with the 16263T allele had a longer tumour-free survival time than patients with the 16263C allele. Similarly, patients with the 315 insertion C had a longer tumour-free survival time than patients with 315 N (normal). In the Cox proportional hazards regression model, recurrence type (early/late), Child-Pugh class, TNM stage and adjuvant treatment after tumour recurrence (none or one/more than one treatment) were independent factors for overall survival. None of the mtDNA variations served as independent factors. Patients with late recurrence, Child-Pugh class A, and low TNM stages and/or those who received more than one adjuvant treatment

  10. Purifying selection, sequence composition, and context-specific indel mutations shape intraspecific variation in a bacterial endosymbiont.

    PubMed

    Williams, Laura E; Wernegreen, Jennifer J

    2012-01-01

    Comparative genomics of closely related bacterial strains can clarify mutational processes and selective forces that impact genetic variation. Among primary bacterial endosymbionts of insects, such analyses have revealed ongoing genome reduction, raising questions about the ultimate evolutionary fate of these partnerships. Here, we explored genomic variation within Blochmannia vafer, an obligate mutualist of the ant Camponotus vafer. Polymorphism analysis of the Illumina data set used previously for de novo assembly revealed a second Bl. vafer genotype. To determine why a single ant colony contained two symbiont genotypes, we examined polymorphisms in 12 C. vafer mitochondrial sequences assembled from the Illumina data; the spectrum of variants suggests that the colony contained two maternal lineages, each harboring a distinct Bl. vafer genotype. Comparing the two Bl. vafer genotypes revealed that purifying selection purged most indels and nonsynonymous differences from protein-coding genes. We also discovered that indels occur frequently in multimeric simple sequence repeats, which are relatively abundant in Bl. vafer and may play a more substantial role in generating variation in this ant mutualist than in the aphid endosymbiont Buchnera. Finally, we explored how an apparent relocation of the origin of replication in Bl. vafer and the resulting shift in strand-associated mutational pressures may have caused accelerated gene loss and an elevated rate of indel polymorphisms in the region spanning the origin relocation. Combined, these results point to significant impacts of purifying selection on genomic polymorphisms as well as distinct patterns of indels associated with unusual genomic features of Blochmannia. PMID:22117087

  11. Analysis of genetic variation within clonal lineages of grape phylloxera (Daktulosphaira vitifoliae Fitch) using AFLP fingerprinting and DNA sequencing.

    PubMed

    Vorwerk, S; Forneck, A

    2007-07-01

    Two AFLP fingerprinting methods were employed to estimate the potential of AFLP fingerprints for detecting genetic diversity within single founder lineages of grape phylloxera (Daktulosphaira vitifoliae Fitch). Eight clonal lineages, reared under controlled conditions in a greenhouse and reproducing asexually for at least 15 generations, were monitored, and mutations were scored as polymorphisms between the founder individual and individuals of succeeding generations. Genetic variation was detected within all lineages, from early generations on. Six to 15 polymorphic loci (from a total of 141 loci) were detected within the lineages, making up 4.3% of the total amount of genetic variation. The presence of contaminating extra-genomic sequences (e.g., viral material, bacteria, or ingested chloroplast DNA) was excluded as a source of intraclonal variation. Sequencing of 37 selected polymorphic bands confirmed that they originate mostly from noncoding regions of the grape phylloxera genome. AFLP techniques proved powerful for identifying reproducible banding patterns within clonal lineages. PMID:17893744
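
    Scoring mutations as polymorphisms between a founder and its descendants can be sketched on a presence/absence band matrix. This is a simplified illustration under assumed inputs (bands already called and aligned to the same loci, 1 = present, 0 = absent); the function names are hypothetical, and the band calling and reproducibility checks of a real AFLP workflow are not shown.

```python
def polymorphic_loci(founder, descendants):
    """Indices of loci where any descendant's band differs from the founder's."""
    for profile in descendants:
        if len(profile) != len(founder):
            raise ValueError("all profiles must score the same loci")
    return [i for i, band in enumerate(founder)
            if any(profile[i] != band for profile in descendants)]


def percent_polymorphic(founder, descendants) -> float:
    """Share of scored loci that are polymorphic within the lineage, in percent."""
    return 100 * len(polymorphic_loci(founder, descendants)) / len(founder)


if __name__ == "__main__":
    founder = [1, 1, 0, 1, 0]                       # hypothetical band profile
    later = [[1, 1, 0, 1, 0], [1, 0, 0, 1, 1]]      # two later generations
    print(polymorphic_loci(founder, later), percent_polymorphic(founder, later))
```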

  12. Maintenance of Sperm Variation in a Highly Promiscuous Wild Bird

    PubMed Central

    Calhim, Sara; Double, Michael C.; Margraf, Nicolas; Birkhead, Tim R.; Cockburn, Andrew

    2011-01-01

    Postcopulatory sexual selection is an important force in the evolution of reproductive traits, including sperm morphology. In birds, sperm morphology is known to be highly heritable and largely condition-independent. Theory predicts, and recent comparative work corroborates, that strong selection on such traits reduces intraspecific phenotypic variation. Here we show that some variation can be maintained despite extreme promiscuity, as a result of opposing, copulation-role-specific selection forces. After controlling for known correlates of siring success in the superb fairy-wren (Malurus cyaneus), we found that (a) lifetime extra-pair paternity success was associated with sperm with a shorter flagellum and a relatively large head, and (b) males whose sperm had a longer flagellum and a relatively smaller head achieved higher within-pair paternity. In this species, extra-pair copulations occur on the same morning as, but before, pair copulations during a female's fertile period, suggesting that shorter, relatively larger-headed sperm are most successful in securing storage (defense), whereas the opposite phenotype might be better at outcompeting stored sperm (offense). Furthermore, since cuckolding ability is a major contributor to differential male reproductive output, stronger selection on defensive sperm-competition traits might explain the short sperm of malurids relative to other promiscuous passerines. PMID:22194918

  13. Estimation of Response Functions Based on Variational Bayes Algorithm in Dynamic Images Sequences

    PubMed Central

    2016-01-01

    We propose a nonparametric Bayesian model based on the variational Bayes algorithm to estimate response functions in dynamic medical imaging. In dynamic renal scintigraphy, the impulse response (retention) functions are rather complicated, and finding a suitable parametric form is problematic. In this paper, we estimate the response functions using nonparametric Bayesian priors, designed to favor desirable properties of the functions such as sparsity or smoothness. These assumptions are incorporated into the hierarchical priors of the variational Bayes algorithm. We applied our algorithm to a real dynamic renal scintigraphy dataset available online. The results demonstrate that the algorithm with nonparametric priors improves the estimation of response functions.

  14. In silico structure-function analysis of pathological variation in the HSD11B2 gene sequence.

    PubMed

    Manning, Jonathan R; Bailey, Matthew A; Soares, Dinesh C; Dunbar, Donald R; Mullins, John J

    2010-08-01

    11beta-Hydroxysteroid dehydrogenase type 2 (11betaHSD2) is a short-chain dehydrogenase/reductase (SDR) responsible for inactivating cortisol and preventing its binding to the mineralocorticoid receptor (MR). Nonfunctional mutations in HSD11B2, the gene encoding 11betaHSD2, cause the hypertensive syndrome of apparent mineralocorticoid excess (AME). Like other such Mendelian disorders, AME is rare but has nevertheless helped to illuminate principles fundamental to the regulation of blood pressure. Furthermore, polymorphisms in HSD11B2 have been associated with salt sensitivity, a major risk factor for cardiovascular mortality. It is therefore highly likely that sequence variation in HSD11B2, having subtle functional ramifications, will affect blood pressure in the wider population. In this study, a three-dimensional homology model of 11betaHSD2 was created and used to hypothesize the functional consequences in terms of protein structure of published mutations in HSD11B2. This approach underscored the strong genotype-phenotype correlation of AME: severe forms of the disease, associated with little in vivo enzyme activity, arise from mutations occurring in invariant alignment positions. These were predicted to exert gross structural changes in the protein. In contrast, those mutations causing a mild clinical phenotype were in less conserved regions of the protein that were predicted to be relatively more tolerant to substitution. Finally, a number of pathogenic mutations are shown to be associated with regions predicted to participate in dimer formation, and in protein stabilization, which may therefore suggest molecular mechanisms of disease. PMID:20571110

  15. In silico structure-function analysis of pathological variation in the HSD11B2 gene sequence

    PubMed Central

    Bailey, Matthew A.; Soares, Dinesh C.; Dunbar, Donald R.; Mullins, John J.

    2010-01-01

    11β-Hydroxysteroid dehydrogenase type 2 (11βHSD2) is a short-chain dehydrogenase/reductase (SDR) responsible for inactivating cortisol and preventing its binding to the mineralocorticoid receptor (MR). Nonfunctional mutations in HSD11B2, the gene encoding 11βHSD2, cause the hypertensive syndrome of apparent mineralocorticoid excess (AME). Like other such Mendelian disorders, AME is rare but has nevertheless helped to illuminate principles fundamental to the regulation of blood pressure. Furthermore, polymorphisms in HSD11B2 have been associated with salt sensitivity, a major risk factor for cardiovascular mortality. It is therefore highly likely that sequence variation in HSD11B2, having subtle functional ramifications, will affect blood pressure in the wider population. In this study, a three-dimensional homology model of 11βHSD2 was created and used to hypothesize the functional consequences in terms of protein structure of published mutations in HSD11B2. This approach underscored the strong genotype-phenotype correlation of AME: severe forms of the disease, associated with little in vivo enzyme activity, arise from mutations occurring in invariant alignment positions. These were predicted to exert gross structural changes in the protein. In contrast, those mutations causing a mild clinical phenotype were in less conserved regions of the protein that were predicted to be relatively more tolerant to substitution. Finally, a number of pathogenic mutations are shown to be associated with regions predicted to participate in dimer formation, and in protein stabilization, which may therefore suggest molecular mechanisms of disease. PMID:20571110

  16. Bovine herpesvirus-1: comparison and differentiation of vaccine and field strains based on genomic sequence variation.

    PubMed

    Fulton, R W; d'Offay, J M; Eberle, R

    2013-03-01

    Bovine herpesvirus-1 (BoHV-1) causes significant disease in cattle, including respiratory disease, fetal disease, and reproductive tract infections. Control programs usually include vaccination with a modified live viral (MLV) vaccine. On occasion, BoHV-1 strains are isolated from diseased animals or fetuses postvaccination. Currently there are no markers for differentiating MLV strains from field strains of BoHV-1. In this study, several BoHV-1 strains were sequenced using whole-genome sequencing technologies, and the data were analyzed to identify single nucleotide polymorphisms (SNPs). Strains sequenced included the reference BoHV-1 Cooper strain (GenBank Accession JX898220), eight commercial MLV vaccine strains, and 14 field strains from cases presented for diagnosis. Based on SNP analyses, the viruses could be classified into groups having similar SNP patterns. The eight MLV strains could be differentiated from one another, although some were closely related. A number of field strains isolated from animals with a history of prior vaccination had SNP patterns similar to specific MLV viruses, while other field isolates were very distinct from all vaccine strains. The results indicate that some BoHV-1 isolates from clinically ill cattle/fetuses can be associated with a prior MLV vaccination history, but more information is needed on the rate of BoHV-1 genome sequence change before irrefutable associations can be drawn. PMID:23333211
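
    The abstract does not specify how SNP patterns were compared, so the following is only a sketch of one simple possibility: represent each strain by its bases at a shared set of SNP positions and match a field isolate to the vaccine profile with the smallest Hamming distance. The strain names and profiles are hypothetical, and, in line with the abstract's caution, a small distance suggests rather than proves a vaccine origin.

```python
def snp_distance(a: str, b: str) -> int:
    """Number of SNP positions at which two aligned profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same SNP positions")
    return sum(x != y for x, y in zip(a, b))


def nearest_vaccine(field_profile: str, vaccine_profiles: dict) -> tuple:
    """Return (vaccine_name, distance) for the closest vaccine strain profile."""
    return min(((name, snp_distance(field_profile, profile))
                for name, profile in vaccine_profiles.items()),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    vaccines = {"MLV_A": "ACGTAC", "MLV_B": "TCGAAG"}  # hypothetical profiles
    print(nearest_vaccine("ACGTAG", vaccines))
```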

  17. A pedigree-based study of mitochondrial D-loop DNA sequence variation among Arabian horses.

    PubMed

    Bowling, A T; Del Valle, A; Bowling, M

    2000-02-01

    Through DNA sequence comparisons of a mitochondrial D-loop hypervariable region, we investigated matrilineal diversity for Arabian horses in the United States. Sixty-two horses were tested. From published pedigrees they traced in the maternal line to 34 mares acquired primarily in the mid to late 19th century from nomadic Bedouin tribes. Compared with the reference sequence (GenBank X79547), these samples showed 27 haplotypes with altogether 31 base substitution sites within 397 bp of sequence. Based on examination of pedigrees from a random sampling of 200 horses in current studbooks of the Arabian Horse Registry of America, we estimated that this study defined the expected mtDNA haplotypes for at least 89% of Arabian horses registered in the US. The reliability of the studbook recorded maternal lineages of Arabian pedigrees was demonstrated by haplotype concordance among multiple samplings in 14 lines. Single base differences observed within two maternal lines were interpreted as representing alternative fixations of past heteroplasmy. The study also demonstrated the utility of mtDNA sequence studies to resolve historical maternity questions without access to biological material from the horses whose relationship was in question, provided that representatives of the relevant female lines were available for comparison. The data call into question the traditional assumption that Arabian horses of the same strain necessarily share a common maternal ancestry. PMID:10690354
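
    Counting haplotypes and base-substitution sites relative to a reference, as in the comparison against GenBank X79547, can be sketched as follows. This assumes the sequences are already aligned to the reference with equal lengths and no indels or ambiguous bases; the short sequences and function names are hypothetical.

```python
from collections import Counter


def substitution_sites(reference: str, samples: list) -> list:
    """0-based positions where any aligned sample differs from the reference."""
    return sorted({i for seq in samples
                   for i, (ref_base, base) in enumerate(zip(reference, seq))
                   if ref_base != base})


def haplotype_counts(samples: list) -> Counter:
    """Number of samples carrying each distinct sequence (haplotype)."""
    return Counter(samples)


if __name__ == "__main__":
    ref = "ACGTACGT"                                        # hypothetical reference
    horses = ["ACGTACGT", "ACGAACGT", "ACGAACGA", "ACGTACGT"]
    print(substitution_sites(ref, horses), len(haplotype_counts(horses)))
```

    Here two substitution sites (positions 3 and 7) define three haplotypes among four samples.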

  18. Distribution and sequence variations of selected virulence genes among group A streptococcal isolates from western Norway.

    PubMed

    Mylvaganam, H; Bjorvatn, B; Osland, A

    2000-11-01

    In order to compare the distribution of selected virulence genes among group A streptococci recovered from invasive disease and superficial infections, 42 isolates were screened for mga, speB, speA, ssa and ska, by PCR. The isolates were predominantly of the sequence types emm1, emm3 and emm6, but also included a few of the types emm22, emm28, emm75 and emm78. The phage-mediated speA seemed to be prevalent in emm types 1 and 3, and its distribution was not related to disease severity. The other genes were present in all isolates. The mga, speB and speA were further studied by sequence analysis. Although allotypic associations with invasiveness were not found, allelic specificity to the emm sequence type was observed. In addition, the mga sequences indicated two lineages, related to opacity factor production. A possible recombination between these two main divergent mga genes was observed in isolates of the types emm22 and emm75. A logical nomenclature of the alleles of mga and speB is suggested. PMID:11211972

  19. The use of museum specimens with high-throughput DNA sequencers

    PubMed Central

    Burrell, Andrew S.; Disotell, Todd R.; Bergey, Christina M.

    2015-01-01

    Natural history collections have long been used by morphologists, anatomists, and taxonomists to probe the evolutionary process and describe biological diversity. These biological archives also offer great opportunities for genetic research in taxonomy, conservation, systematics, and population biology. They allow assays of past populations, including those of extinct species, giving context to present patterns of genetic variation and direct measures of evolutionary processes. Despite this potential, museum specimens are difficult to work with because natural postmortem processes and preservation methods fragment and damage DNA. These problems have restricted geneticists’ ability to use natural history collections primarily by limiting how much of the genome can be surveyed. Recent advances in DNA sequencing technology, however, have radically changed this, making truly genomic studies from museum specimens possible. We review the opportunities and drawbacks of the use of museum specimens, and suggest how to best execute projects when incorporating such samples. Several high-throughput (HT) sequencing methodologies, including whole genome shotgun sequencing, sequence capture, and restriction digests (demonstrated here), can be used with archived biomaterials. PMID:25532801

  20. High-speed lossless compression for angiography image sequences

    NASA Astrophysics Data System (ADS)

    Kennedy, Jonathon M.; Simms, Michael; Kearney, Emma; Dowling, Anita; Fagan, Andrew; O'Hare, Neil J.

    2001-05-01

    High speed processing of large amounts of data is a requirement for many diagnostic quality medical imaging applications. A demanding example is the acquisition, storage and display of image sequences in angiography. The functional performance requirements for handling angiography data were identified. A new lossless image compression algorithm was developed, implemented in C++ for the Intel Pentium/MS-Windows environment and optimized for speed of operation. Speeds of up to 6M pixels per second for compression and 12M pixels per second for decompression were measured. This represents an improvement of up to 400% over the next best high-performance algorithm (LOCO-I) without significant reduction in compression ratio. Performance tests were carried out at St. James's Hospital using actual angiography data. Results were compared with the lossless JPEG standard and other leading methods such as JPEG-LS (LOCO-I) and the lossless wavelet approach proposed for JPEG 2000. Our new algorithm represents a significant improvement in the performance of lossless image compression technology without using specialized hardware. It has been applied successfully to image sequence decompression at video rate for angiography, one of the most challenging application areas in medical imaging.
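
    The abstract does not describe the new algorithm itself. For context, predictive lossless coders such as LOCO-I (JPEG-LS), the baseline named above, predict each pixel from its causal neighbours and encode only the residual. The sketch below shows the median edge detector (MED) predictor used by LOCO-I and an exact inverse; it omits context modelling and entropy coding, and its treatment of pixels outside the image as 0 is a simplifying assumption rather than the JPEG-LS boundary rule.

```python
def med_predict(a: int, b: int, c: int) -> int:
    """Median edge detector: a = left, b = above, c = upper-left neighbour."""
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c


def _neighbours(img, y, x):
    # Simplified boundary rule (assumption): pixels outside the image count as 0.
    a = img[y][x - 1] if x > 0 else 0
    b = img[y - 1][x] if y > 0 else 0
    c = img[y - 1][x - 1] if x > 0 and y > 0 else 0
    return a, b, c


def residuals(img):
    """Prediction residuals in raster order; these would then be entropy-coded."""
    return [[img[y][x] - med_predict(*_neighbours(img, y, x))
             for x in range(len(img[0]))] for y in range(len(img))]


def reconstruct(res):
    """Invert `residuals` exactly, rebuilding pixels in the same raster order."""
    h, w = len(res), len(res[0])
    img = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            img[y][x] = res[y][x] + med_predict(*_neighbours(img, y, x))
    return img
```

    Because the decoder sees exactly the neighbours the encoder used, reconstruct(residuals(img)) returns the original pixels, which is what makes the scheme lossless; on smooth images the residuals cluster near zero and compress well.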

  1. Metagenomic study of the oral microbiota by Illumina high-throughput sequencing

    PubMed Central

    Lazarevic, Vladimir; Whiteson, Katrine; Huse, Susan; Hernandez, David; Farinelli, Laurent; Østerås, Magne; Schrenzel, Jacques; François, Patrice

    2013-01-01

    To date, metagenomic studies have relied on the analysis of reads obtained by 454 pyrosequencing in place of conventional Sanger sequencing. After extensively scanning the 16S ribosomal RNA (rRNA) gene, we identified the V5 hypervariable region as a short region providing reliable identification of bacterial sequences available in public databases such as the Human Oral Microbiome Database. We amplified samples from the oral cavity of three healthy individuals using primers covering an ~82-base segment of the V5 loop, and sequenced them using Illumina technology in a single orientation. We identified 135 genera or higher taxonomic ranks from the resulting 1,373,824 sequences. While the abundances of the most common phyla (Firmicutes, Proteobacteria, Actinobacteria, Fusobacteria and TM7) are largely comparable to those in previous studies, Bacteroidetes were less abundant. Potential sources of this difference include classification bias in this region of the 16S rRNA gene, human sample variation, sample preparation and primer bias. Using an Illumina sequencing approach, we achieved a much greater depth of coverage than previous oral microbiota studies, allowing us to identify several taxa not yet discovered in these types of samples, and to estimate that at least 30,000 additional reads would be required to identify just one additional phylotype. The evolution of high-throughput sequencing technologies, and the subsequent improvements in read length, enable the use of different platforms for studying complex microbial communities. Access to large amounts of data is already leading to a better representation of sample diversity at a reasonable cost. PMID:19796657
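
    The abstract does not state how the 30,000-read figure was derived. One common way to reason about how many more reads are needed to find another taxon is the classical rarefaction formula, which gives the expected number of taxa in a random subsample of n reads drawn without replacement from N reads, where taxon i has N_i reads. The sketch below implements that formula; the per-taxon counts are hypothetical.

```python
from math import comb


def expected_taxa(counts, n: int) -> float:
    """Expected number of taxa in a random subsample of n reads (without replacement).

    counts: reads observed per taxon in the full sample of N reads.
    E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n))
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of reads")
    denom = comb(total, n)
    return sum(1 - comb(total - c, n) / denom for c in counts)


if __name__ == "__main__":
    counts = [500, 300, 150, 40, 8, 1, 1]  # hypothetical reads per taxon
    for n in (10, 100, 500, 1000):
        print(n, round(expected_taxa(counts, n), 2))
```

    As the curve flattens, the difference expected_taxa(counts, n + k) - expected_taxa(counts, n) shows how many extra reads k buy roughly one more taxon.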

  2. Oxytocin receptor gene sequences in owl monkeys and other primates show remarkable interspecific regulatory and protein coding variation.

    PubMed

    Babb, Paul L; Fernandez-Duque, Eduardo; Schurr, Theodore G

    2015-10-01

    The oxytocin (OT) hormone pathway is involved in numerous physiological processes, and one of its receptor genes (OXTR) has been implicated in pair bonding behavior in mammalian lineages. This observation is important for understanding social monogamy in primates, which occurs in only a small subset of taxa, including Azara's owl monkey (Aotus azarae). To examine the potential relationship between social monogamy and OXTR variation, we sequenced its 5' regulatory (4936 bp) and coding (1167 bp) regions in 25 owl monkeys from the Argentinean Gran Chaco, and examined OXTR sequences from 1092 humans from the 1000 Genomes Project. We also assessed interspecific variation of OXTR in 25 primate and rodent species that represent a set of phylogenetically and behaviorally disparate taxa. Our analysis revealed substantial variation in the putative 5' regulatory region of OXTR, with marked structural differences across primate taxa, particularly for humans and chimpanzees, which exhibited unique patterns of large motifs of dinucleotide A+T repeats upstream of the OXTR 5' UTR. In addition, we observed a large number of amino acid substitutions in the OXTR CDS region among New World primate taxa that distinguish them from Old World primates. Furthermore, primate taxa traditionally defined as socially monogamous (e.g., gibbons, owl monkeys, titi monkeys, and saki monkeys) all exhibited different amino acid motifs for their respective OXTR protein coding sequences. These findings support the notion that monogamy has evolved independently in Old World and New World primates, and that it has done so through different molecular mechanisms, not exclusively through the oxytocin pathway. PMID:26025428

  3. High-throughput sequencing of immune repertoires in multiple sclerosis.

    PubMed

    Lossius, Andreas; Johansen, Jorunn N; Vartdal, Frode; Holmøy, Trygve

    2016-04-01

    T cells and B cells are crucial in the initiation and maintenance of multiple sclerosis (MS), and the activation of these cells is believed to be mediated through specific recognition of antigens by the T- and B-cell receptors. The antigen receptors are highly polymorphic due to recombination (T- and B-cell receptors) and mutation (B-cell receptors) of the encoding genes, which can therefore be used as fingerprints to track individual T- and B-cell clones. Such studies can shed light on mechanisms driving the immune responses and provide new insights into the pathogenesis. Here, we summarize studies that have explored the T- and B-cell receptor repertoires using earlier methodological approaches, and we focus on how high-throughput sequencing has provided new knowledge by surveying the immune repertoires in MS in even greater detail and with unprecedented depth. PMID:27081660

  4. Detection of sequence variation in parasite ribosomal DNA by electrophoresis in agarose gels supplemented with a DNA-intercalating agent.

    PubMed

    Zhu, X Q; Chilton, N B; Gasser, R B

    1998-05-01

    This study evaluated the use of a commercially available DNA-intercalating agent (Resolver Gold) in agarose gels for the direct detection of sequence variation in ribosomal DNA (rDNA). This agent binds preferentially to AT sequence motifs in DNA. Regions of nuclear rDNA, known to provide genetic markers for the identification of species of parasitic ascarid nematodes (order Ascaridida), were amplified by polymerase chain reaction (PCR) and subjected to electrophoresis in standard agarose gels versus gels supplemented with Resolver Gold. Individual taxa examined could not be distinguished reliably based on the size of their amplicons in standard agarose gels, whereas they could be readily delineated based on mobility in Resolver Gold-supplemented gels. The latter was achieved because of differences (approximately 0.1-8.2%) in the AT content of the fragments among different taxa, which were associated with significant interspecific differences (approximately 11-39%) in the rDNA sequences employed. Fragments with higher AT content tended to migrate more slowly in supplemented agarose gels than those with lower AT content. The results indicate the usefulness of this electrophoretic approach for rapidly screening sequence variability within or among PCR-amplified rDNA fragments of similar size but differing AT content. Although evaluated on parasite rDNA, the approach has potential to be applied to a range of genes of different groups of infectious organisms. PMID:9629896
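
    The AT-content comparison behind this approach can be sketched in a few lines. The ordering function below encodes only the qualitative trend reported in the abstract (higher AT content, slower migration in supplemented gels) for fragments of similar size. It is not a mobility model, the function names are hypothetical, and it ignores any motif-specific binding behaviour of the intercalator.

```python
def at_content(seq: str) -> float:
    """Percentage of A and T bases in a DNA fragment."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100 * sum(base in "AT" for base in seq) / len(seq)


def expected_migration_order(fragments):
    """Order equal-length fragments from expected fastest to slowest,
    assuming only that higher AT content migrates more slowly."""
    return sorted(fragments, key=at_content)


if __name__ == "__main__":
    frags = ["ATATAT", "GCGCAT", "GGGGGG"]  # hypothetical fragments
    print([round(at_content(f), 1) for f in frags], expected_migration_order(frags))
```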

  5. Effect of Sequence Variation on the Mechanical Response of Amyloid Fibrils Probed by Steered Molecular Dynamics Simulation

    PubMed Central

    Ndlovu, Hlengisizwe; Ashcroft, Alison E.; Radford, Sheena E.; Harris, Sarah A.

    2012-01-01

    The mechanical failure of mature amyloid fibers produces fragments that act as seeds for the growth of new fibrils. Fragmentation may also be correlated with cytotoxicity. We have used steered atomistic molecular dynamics simulations to study the mechanical failure of fibrils formed by the amyloidogenic fragment of human amylin hIAPP20-29 subjected to force applied in a variety of directions. By introducing systematic variations to this peptide sequence in silico, we have also investigated the role of the amino-acid sequence in determining the mechanical stability of amyloid fibrils. Our calculations show that the force required to induce mechanical failure depends on the direction of the applied stress and upon the degree of structural order present in the β-sheet assemblies, which in turn depends on the peptide sequence. The results have implications for the importance of sequence-dependent mechanical properties on seeding the growth of new fibrils and the role of breakage events in cytotoxicity. PMID:22325282

  6. Identification of eight mutations and three sequence variations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene

    SciTech Connect

    Ghanem, N.; Costes, B.; Girodon, E.; Martin, J.; Fanen, P.; Goossens, M.

    1994-05-15

    To determine cystic fibrosis (CF) defects in a sample of 224 non-ΔF508 CF chromosomes, the authors used denaturing gradient gel multiplex analysis of CF transmembrane conductance regulator gene segments, a strategy based on blind exhaustive analysis rather than a search for known mutations. This process allowed detection of 11 novel variations comprising two nonsense mutations (Q890X and W1204X), a splice defect (405+4 A→G), a frameshift (3293delA), four presumed missense mutations (S912L, H949Y, L1065P, Q1071P), and three sequence polymorphisms (R31C or 223 C/T, 3471 T/C, and T1220I or 3791 C/T). The authors describe these variations, together with the associated phenotype when defects on both CF chromosomes were identified. 8 refs., 1 fig., 1 tab.

  7. Lack of sequence variation in sporadic bovine leucosis in regions of tumour suppressor genes p53 and p16.

    PubMed

    Mayr, B; Grüneis, C; Brem, G; Reifinger, M; Schaffner, G; Hochsteiner, W

    2001-08-01

    Regions of the promoter and exons 5-8 of the tumour suppressor gene p53 were analysed in 25 cases of sporadic bovine leucosis. The study included 17 cases of juvenile leucosis, five cases of adult leucosis and three cases of skin leucosis. Exon 2 of tumour suppressor gene p16 was also investigated in the same samples. No sequence variations were present in the analysed areas of the genes. In p53, this fact represents a clear difference in comparison with enzootic bovine leucosis. In p16, no comparative data are available. PMID:11554494

  8. The non-obese diabetic mouse sequence, annotation and variation resource: an aid for investigating type 1 diabetes

    PubMed Central

    Steward, Charles A.; Gonzalez, Jose M.; Trevanion, Steve; Sheppard, Dan; Kerry, Giselle; Gilbert, James G. R.; Wicker, Linda S.; Rogers, Jane; Harrow, Jennifer L.

    2013-01-01

    Model organisms are becoming increasingly important for the study of complex diseases such as type 1 diabetes (T1D). The non-obese diabetic (NOD) mouse is an experimental model for T1D, having been bred to develop the disease spontaneously in a process similar to that in humans. Genetic analysis of the NOD mouse has identified around 50 disease loci, which have the nomenclature Idd for insulin-dependent diabetes, distributed across at least 11 different chromosomes. In total, 21 Idd regions across 6 chromosomes that are major contributors to T1D susceptibility or resistance were selected for finished sequencing and annotation at the Wellcome Trust Sanger Institute. Here we describe the generation of 40.4 megabase pairs of finished sequence from 289 bacterial artificial chromosomes for the NOD mouse. Manual annotation has identified 738 genes in the diabetes-sensitive NOD mouse and 765 genes in homologous regions of the diabetes-resistant C57BL/6J reference mouse across 19 candidate Idd regions. This has allowed us to call variation consequences between homologous exonic sequences for all annotated regions in the two mouse strains. We demonstrate the importance of this resource further by illustrating the technical difficulties that regions of inter-strain structural variation between the NOD mouse and the C57BL/6J reference mouse can cause for current next-generation sequencing and assembly techniques. Furthermore, we have established that the variation rate in the Idd regions is 2.3 times higher than the mean found for the whole genome assembly for the NOD/ShiLtJ genome, which we suggest reflects the fact that positive selection for functional variation in immune genes is beneficial in regard to host defence. In summary, we provide an important resource, which aids the analysis of potential causative genes involved in T1D susceptibility. Database URLs: http://www.sanger.ac.uk/resources/mouse/nod/; http://vega

  9. Chimeric proteins for detection and quantitation of DNA mutations, DNA sequence variations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    Chimeric proteins having both DNA mutation binding activity and nuclease activity are synthesized by recombinant technology. The proteins are of the general formula A-L-B and B-L-A where A is a peptide having DNA mutation binding activity, L is a linker and B is a peptide having nuclease activity. The chimeric proteins are useful for detection and identification of DNA sequence variations including DNA mutations (including DNA damage and mismatches) by binding to the DNA mutation and cutting the DNA once the DNA mutation is detected.

  10. First High-Quality Draft Genome Sequence of Pasteurella multocida Sequence Type 128 Isolated from Infected Bone.

    PubMed

    Kavousi, Niloofar; Eng, Wilhelm Wei Han; Lee, Yin Peng; Tan, Lian Huat; Thuraisingham, Ravindran; Yule, Catherine M; Gan, Han Ming

    2016-01-01

    We report here the first high-quality draft genome sequence of Pasteurella multocida sequence type 128, which was isolated from the infected finger bone of an adult female who was bitten by a domestic dog. The draft genome will be a valuable addition to the scarce genomic resources available for P. multocida. PMID:26941132

  11. Patterns of structural and sequence variation within isotype lineages of the Neisseria meningitidis transferrin receptor system

    PubMed Central

    Adamiak, Paul; Calmettes, Charles; Moraes, Trevor F; Schryvers, Anthony B

    2015-01-01

    Neisseria meningitidis inhabits the human upper respiratory tract and is an important cause of sepsis and meningitis. A surface receptor composed of transferrin-binding proteins A and B (TbpA and TbpB) is responsible for acquiring iron from host transferrin. Sequence and immunological diversity divides TbpBs into two distinct lineages: isotype I and isotype II. Two representative isotype I and II strains, B16B6 and M982, differ in their dependence on TbpB for in vitro growth on exogenous transferrin. The crystal structure of TbpB and a structural model for TbpA from the representative isotype I N. meningitidis strain B16B6 were obtained. The structures were integrated with a comprehensive analysis of the sequence diversity of these proteins to probe for potential functional differences. A distinct isotype I TbpA was identified that co-varied with TbpB and lacked sequence in the region for the loop 3 α-helix that is proposed to be involved in iron removal from transferrin. The tightly associated isotype I TbpBs had a distinct anchor peptide region, a distinct, smaller linker region between the lobes, and lacked the large loops in the isotype II C-lobe. Sequences of the intact TbpB, the TbpB N-lobe, the TbpB C-lobe, and TbpA were subjected to phylogenetic analyses. The phylogenetic clustering of TbpA and the TbpB C-lobe was similar, with two main branches comprising the isotype I and isotype II TbpBs, possibly suggesting an association between TbpA and the TbpB C-lobe. The intact TbpB and TbpB N-lobe had four main branches, one consisting of the isotype I TbpBs. One isotype II TbpB cluster appeared to consist of isotype I N-lobe sequences and isotype II C-lobe sequences, indicating the swapping of N-lobes and C-lobes. Our findings should inform future studies on the interaction between TbpB and TbpA and the process of iron acquisition. PMID:25800619

  12. High-resolution heteronuclear multi-dimensional NMR spectroscopy in magnetic fields with unknown spatial variations

    NASA Astrophysics Data System (ADS)

    Zhang, Zhiyong; Huang, Yuqing; Smith, Pieter E. S.; Wang, Kaiyu; Cai, Shuhui; Chen, Zhong

    2014-05-01

    Heteronuclear NMR spectroscopy is an extremely powerful tool for determining the structures of organic molecules and is of particular significance in the structural analysis of proteins. In order to leverage the method’s potential for structural investigations, obtaining high-resolution NMR spectra is essential, and this is generally accomplished by using very homogeneous magnetic fields. However, there are several situations where magnetic field distortions, and thus line broadening, are unavoidable: for example, the samples under investigation may be inherently heterogeneous, or the magnet’s homogeneity may be poor. This line broadening can hinder resonance assignment or even render it impossible. We put forth a new class of pulse sequences, based on distant dipolar field modulations, for obtaining high-resolution heteronuclear spectra in magnetic fields with unknown spatial variations. This strategy’s capabilities are demonstrated with the acquisition of high-resolution 2D gHSQC and gHMBC spectra. These sequences’ performances are evaluated on the basis of their sensitivities and acquisition efficiencies. Moreover, we show that by encoding and decoding NMR observables spatially, as is done in ultrafast NMR, an extra dimension containing J-coupling information can be obtained without increasing the time necessary to acquire a heteronuclear correlation spectrum. Since the new sequences relax magnetic field homogeneity constraints imposed upon high-resolution NMR, they may be applied in portable NMR sensors and studies of heterogeneous chemical and biological materials.

  13. High Spatial Variation Tropospheric Model for GPS-Data Simulation

    NASA Astrophysics Data System (ADS)

    Farah, Ashraf; Moore, Terry; Hill, Chris J.

    2005-09-01

    Precise simulation of GPS data requires accurate simulation of the two major sources of error in GPS measurements, namely the ionospheric and tropospheric delays. The ionospheric delay modelling has been handled in a previous work (Farah, 2002). In this paper the simulation of the tropospheric delay is discussed. The suggested model should be accurate in estimating the tropospheric delay as well as capable of simulating high spatial variations of the troposphere, resulting in more realistic simulated GPS data. In this paper, the EGNOS tropospheric correction model is considered as a possible tool for simulating the tropospheric delay in order to obtain more realistic simulated GPS data. Comparing the total tropospheric zenith delays from the EGNOS model with the CODE tropospheric product has allowed the quality of the EGNOS model to be assessed. Four IGS tracking stations have been selected for this study. Data from four non-consecutive weeks in different seasons over a period of one year were tested to assess the seasonal variation of the weather conditions. It is shown that the EGNOS model agrees well with the CODE estimations, with a mean zenith delay difference of approximately 2 cm. The maximum zenith delay difference between the EGNOS model and the CODE estimations was in the range of 5 cm to 16 cm, which agrees well with previous studies. A second study compared the behaviour of the EGNOS model with that of other established tropospheric models, such as the Saastamoinen, the Hopfield, the Marini and the Magnet model, for three IGS stations. It can be concluded from this study that the EGNOS model shows better agreement with the IGS estimations than the Magnet model and compares well with other models. The major shortcoming of the EGNOS model is its inability to simulate the variations in the troposphere over small regions. This shortcoming could be overcome by using the theory of Gaussian Random Fields, which has been previously used to model real-life phenomena

  14. A family of differentially amplified repetitive DNA sequences in the genus Beta reveals genetic variation in Beta vulgaris subspecies and cultivars.

    PubMed

    Kubis, S; Heslop-Harrison, J S; Schmidt, T

    1997-03-01

    Members of a highly abundant restriction satellite family have been isolated from the wild beet species Beta nana. The satellite DNA sequence is characterized by a conserved RsaI restriction site and is present in three of four sections of the genus Beta, namely Nanae, Corollinae, and Beta. It was not detected in species of the evolutionarily old section Procumbentes, suggesting its amplification after separation of this section. Sequences of eight monomers were aligned, revealing a size variation from 209 to 233 bp and an AT content ranging from 56.5% to 60.5%. The similarity between monomers in B. nana varied from 77.7% to 92.2%. Diverged subfamilies were identified by sequence analysis and Southern hybridization. A comparative study of this repetitive DNA element by fluorescent in situ hybridization and Southern analyses in three representative species was performed, showing a variable genomic organization and heterogeneous localizations along metaphase chromosomes both within and between species. In B. nana the copy number of this satellite, with some 30,000 per haploid genome, is more than tenfold higher than in Beta lomatogona and up to 200 times higher than in Beta vulgaris, indicating different levels of sequence amplification during evolution in the genus Beta. In sugar beet (B. vulgaris), the large-scale organization of this tandem repeat was examined by pulsed-field gel electrophoresis. Southern hybridization to genomic DNA digested with DraI demonstrated that satellite arrays are located in AT-rich regions and that the tandem repeat is a useful probe for the detection of genetic variation in closely related B. vulgaris cultivars, accessions, and subspecies. PMID:9060397
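
    The monomer similarities quoted above (77.7% to 92.2%) are pairwise values computed from aligned sequences. The abstract does not say how gaps were scored, so the following Python sketch assumes one common convention: columns gapped in both sequences are skipped, and a gap opposite a base counts as a mismatch. The function name and gap character are illustrative assumptions.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumed convention: columns where both sequences carry a gap are skipped;
    a gap opposite a base counts as a mismatch.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap and b == gap:
            continue  # shared gap column: not informative
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned monomer fragments: 5 of 6 compared columns match.
    print(round(percent_identity("ACGT-A", "ACGTTA"), 1))  # prints: 83.3
```

    Other conventions (for example, dividing by the length of the shorter ungapped sequence) give different values for the same alignment, so similarity figures are only comparable when computed the same way.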

  15. Optical transitions in highly charged californium ions with high sensitivity to variation of the fine-structure constant.

    PubMed

    Berengut, J C; Dzuba, V A; Flambaum, V V; Ong, A

    2012-08-17

    We study electronic transitions in highly charged Cf ions that are within the frequency range of optical lasers and have very high sensitivity to potential variations in the fine-structure constant, α. The transitions are in the optical range despite the large ionization energies because they lie on the level crossing of the 5f and 6p valence orbitals in the thallium isoelectronic sequence. Cf¹⁶⁺ is a particularly rich ion, having several narrow lines with properties that minimize certain systematic effects. Cf¹⁶⁺ has a very large nuclear charge and a large ionization energy, resulting in the largest α sensitivity seen in atomic systems. The lines include positive and negative shifters. PMID:23006353

  16. The Sequence of Learning Cycle Activities in High School Chemistry.

    ERIC Educational Resources Information Center

    Abraham, Michael R.; Renner, John W.

    1986-01-01

    Different learning cycle sequences were investigated to determine factors accounting for success of the cycle, compared learning with conventional instruction, and examined relationships between Piaget's theory and learning cycles. Results show that the normal learning cycle sequence is the optimum sequence for achievement of content knowledge in…

  17. Sequence analysis and identification of new variations in the coding sequence of melatonin receptor gene (MTNR1A) of Indian Chokla sheep breed

    PubMed Central

    Saxena, Vijay Kumar; Jha, Bipul Kumar; Meena, Amar Singh; Naqvi, S.M.K.

    2014-01-01

    The melatonin receptor 1A gene encodes the prime receptor mediating the effect of melatonin at the neuroendocrine level for control of seasonal reproduction in sheep. The aims of this study were to examine the polymorphism pattern of the coding sequence of the MTNR1A gene in Chokla sheep, a breed of the Indian arid tract, and to identify new variations in relation to its aseasonal status. Genomic DNAs of 101 Chokla sheep were collected and an 824 bp coding sequence of Exon II was amplified. RFLP was performed with the enzymes RsaI and MnlI to assess the presence of polymorphism at positions C606T and G612A, respectively. Genotyping revealed a significantly higher frequency of M and R alleles than m and r alleles. The RR and MM genotypes predominated in the studied population. Cloning and sequencing of Exon II followed by mutation/polymorphism analysis revealed ten mutations, of which three were non-synonymous (G706A, C893A, G931C). G706A leads to substitution of valine by isoleucine, Val125I (U14109), in the fifth transmembrane domain. C893A leads to substitution of alanine by aspartic acid in the third extracellular loop. The G931C mutation brings about substitution of alanine by proline in the seventh transmembrane helix, which can affect the conformational stability of the molecule. Polyphen-2 analysis revealed that the polymorphism at position 931 is potentially damaging, while the mutations at positions 706 and 893 were benign. It is concluded that the G931C mutation of the MTNR1A gene may explain, in part, the importance of melatonin receptor structural integrity in influencing seasonality in sheep. PMID:25606429

  18. Advantages of Single-Molecule Real-Time Sequencing in High-GC Content Genomes

    PubMed Central

    Shin, Seung Chul; Ahn, Do Hwan; Kim, Su Jin; Lee, Hyoungseok; Oh, Tae-Jin; Lee, Jong Eun; Park, Hyun

    2013-01-01

    Next-generation sequencing has become the most widely used sequencing technology in genomics research, but it has inherent drawbacks when dealing with high-GC content genomes. Recently, single-molecule real-time sequencing technology (SMRT) was introduced as a third-generation sequencing strategy to compensate for this drawback. Here, we report that the unbiased and longer reads of SMRT sequencing markedly improved the assembly of a high-GC genome via gap filling and repeat resolution. PMID:23894349

  19. Characterizing immune repertoires by high throughput sequencing: strategies and applications

    PubMed Central

    Calis, Jorg J.A.; Rosenberg, Brad R.

    2014-01-01

    As the key cellular effectors of adaptive immunity, T and B lymphocytes utilize specialized receptors to recognize, respond to, and neutralize a diverse array of extrinsic threats. These receptors (immunoglobulins in B lymphocytes, T cell receptors in T lymphocytes) are incredibly variable, the products of specialized genetic diversification mechanisms that generate complex lymphocyte repertoires with extensive collections of antigen specificities. Recent advances in high throughput sequencing (HTS) technologies have transformed our ability to examine antigen receptor repertoires at single nucleotide, and more recently, single cell, resolution. Here we review current approaches to examining antigen receptor repertoires by HTS, and discuss inherent biological and technical challenges. We further describe emerging applications of this powerful methodology for exploring the adaptive immune system. PMID:25306219

  20. Genetic variation in Labeo fimbriatus (Cypriniformes: Cyprinidae) populations as revealed by partial cytochrome b sequences of mitochondrial DNA.

    PubMed

    Swain, Subrat Kumar; Bej, Dillip; Das, Sofia Priyadarsani; Sahoo, Lakshman; Jayasankar, Pallipuram; Das, Pratap Chandra; Das, Paramananda

    2016-05-01

    Labeo fimbriatus, a medium-sized carp, is a commercially important aquaculture species in the Indian subcontinent. In the present study, the genetic diversity and population structure of four Indian riverine populations of L. fimbriatus have been evaluated using partial cytochrome b sequences of mitochondrial DNA. Sequencing and analysis of this gene from 108 individuals defined 7 distinct haplotypes. Haplotype diversity (Hd) and nucleotide diversity (π) ranged from 0.067 to 0.405 and from 0.00023 to 0.03231, respectively. The Mahanadi population had the highest π level. Analysis of molecular variance (AMOVA) indicated that 47.36% of genetic variation was contained within populations and 53.76% among groups. Pairwise FST analysis indicated little or no genetic differentiation among populations (-0.0018 to 04572) from different geographical regions, except for the Mahanadi population. The Mahanadi population can be considered a separate stock from the other three riverine populations. Accordingly, the genetic information generated in this study can be used when formulating base populations for sustainable selective breeding programs for this species. PMID:25329277
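
    Hd and π are standard population-genetic summaries, but the abstract does not name the software or estimators used. The following Python sketch assumes Nei's unbiased haplotype diversity, Hd = n/(n−1) × (1 − Σ xᵢ²), where xᵢ is the frequency of haplotype i among n sequences, and π as the mean number of pairwise nucleotide differences per site among aligned, gap-free sequences of equal length. The function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs: list[str]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(seqs)  # identical sequences form one haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise differences per site over all pairs (aligned, equal length, no gaps assumed)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    diffs = [sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)]
    return sum(diffs) / len(diffs) / length


if __name__ == "__main__":
    # Toy sample: 4 sequences, 3 haplotypes (AAAA twice, AAAT, AATT).
    sample = ["AAAA", "AAAA", "AAAT", "AATT"]
    print(round(haplotype_diversity(sample), 4), round(nucleotide_diversity(sample), 4))
    # prints: 0.8333 0.2917
```

    The two measures can diverge: a population with many haplotypes that differ by single substitutions has high Hd but low π, which is why the abstract reports both.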

  1. Sequence variations in the Boophilus microplus Bm86 locus and implications for immunoprotection in cattle vaccinated with this antigen.

    PubMed

    García-García, J C; Gonzalez, I L; González, D M; Valdés, M; Méndez, L; Lamberti, J; D'Agostino, B; Citroni, D; Fragoso, H; Ortiz, M; Rodríguez, M; de la Fuente, J

    1999-11-01

    Cattle tick infestations constitute a major problem for the cattle industry in tropical and subtropical regions of the world. Traditional control methods have been only partially successful, hampered by the selection of chemical-resistant tick populations. The Boophilus microplus Bm86 protein was isolated from tick gut epithelial cells and shown to induce a protective response against tick infestations in vaccinated cattle. Vaccine preparations including the recombinant Bm86 are used to control cattle tick infestations in the field as an alternative measure to reduce the losses produced by this ectoparasite. Immunological control of tick infestations relies on a polyclonal antibody response against the target antigen and, therefore, should make it difficult to select for tick-resistant populations. However, sequence variations in the Bm86 locus, among other factors, could affect the effectiveness of Bm86-containing vaccines. In the present study we have addressed this issue, employing data obtained with B. microplus strains from Australia, Mexico, Cuba, Argentina and Venezuela. The results showed a tendency toward an inverse correlation between the efficacy of vaccination with Bm86 and the sequence variation in the Bm86 locus (R² = 0.7). The mutation fixation index in the Bm86 locus was calculated and shown to be between 0.02 and 0.1 amino acids per year. Possible implications of these findings for the immunoprotection of cattle against tick infestations employing the Bm86 antigen are discussed. PMID:10668863

  2. Evolution of simple sequence repeat-mediated phase variation in bacterial genomes.

    PubMed

    Bayliss, Christopher D; Palmer, Michael E

    2012-09-01

    Mutability as a mechanism for rapid adaptation to environmental challenge is an alluringly simple concept whose apotheosis is realized in simple sequence repeats (SSRs). Bacterial genomes of several species contain SSRs with a proven role in adaptation to environmental fluctuations. SSRs are hypermutable and generate reversible mutations in localized regions of bacterial genomes, leading to phase-variable ON/OFF switches in gene expression. The application of genetic, bioinformatic, and mathematical/computational modeling approaches is revolutionizing our current understanding of how genomic molecular forces and environmental factors influence SSR-mediated adaptation and have led to the evolution of this mechanism of localized hypermutation in bacterial genomes. PMID:22954215

  3. A haplotype method detects diverse scenarios of local adaptation from genomic sequence variation.

    PubMed

    Lange, Jeremy D; Pool, John E

    2016-07-01

    Identifying genomic targets of population-specific positive selection is a major goal in several areas of basic and applied biology. However, it is unclear how often such selection should act on new mutations versus standing genetic variation or recurrent mutation, and furthermore, favoured alleles may either become fixed or remain variable in the population. Very few population genetic statistics are sensitive to all of these modes of selection. Here, we introduce and evaluate the Comparative Haplotype Identity statistic (χMD), which assesses whether pairwise haplotype sharing at a locus in one population is unusually large compared with another population, relative to genomewide trends. Using simulations that emulate human and Drosophila genetic variation, we find that χMD is sensitive to a wide range of selection scenarios, and for some very challenging cases (e.g. partial soft sweeps), it outperforms other two-population statistics. We also find that, as with FST, our haplotype approach has the ability to detect surprisingly ancient selective sweeps. Particularly for the scenarios resembling human variation, we find that χMD outperforms other frequency- and haplotype-based statistics for soft and/or partial selective sweeps. Applying χMD and other between-population statistics to published population genomic data from D. melanogaster, we find both shared and unique genes and functional categories identified by each statistic. The broad utility and computational simplicity of χMD will make it an especially valuable tool in the search for genes targeted by local adaptation. PMID:27135633

  4. Seasonal diversity and dynamics of haptophytes in the Skagerrak, Norway, explored by high-throughput sequencing.

    PubMed

    Egge, Elianne Sirnaes; Johannessen, Torill Vik; Andersen, Tom; Eikrem, Wenche; Bittner, Lucie; Larsen, Aud; Sandaa, Ruth-Anne; Edvardsen, Bente

    2015-06-01

    Microalgae in the division Haptophyta play key roles in the marine ecosystem and in global biogeochemical processes. Despite their ecological importance, knowledge on seasonal dynamics, community composition and abundance at the species level is limited due to their small cell size and few morphological features visible under the light microscope. Here, we present unique data on haptophyte seasonal diversity and dynamics from two annual cycles, with the taxonomic resolution and sampling depth obtained with high-throughput sequencing. From outer Oslofjorden, S Norway, nano- and picoplanktonic samples were collected monthly for 2 years, and the haptophytes targeted by amplification of RNA/cDNA with Haptophyta-specific 18S rDNA V4 primers. We obtained 156 operational taxonomic units (OTUs), from c. 400,000 454 pyrosequencing reads, after rigorous bioinformatic filtering and clustering at 99.5%. Most OTUs represented uncultured and/or not yet 18S rDNA-sequenced species. Haptophyte OTU richness and community composition exhibited high temporal variation and significant yearly periodicity. Richness was highest in September-October (autumn) and lowest in April-May (spring). Some taxa were detected all year, such as Chrysochromulina simplex, Emiliania huxleyi and Phaeocystis cordata, whereas most calcifying coccolithophores only appeared from summer to early winter. We also revealed the seasonal dynamics of OTUs representing putative novel classes (clades HAP-3-5) or orders (clades D, E, F). Season, light and temperature accounted for 29% of the variation in OTU composition. Residual variation may be related to biotic factors, such as competition and viral infection. This study provides new, in-depth knowledge on seasonal diversity and dynamics of haptophytes in North Atlantic coastal waters. PMID:25893259

  5. Seasonal diversity and dynamics of haptophytes in the Skagerrak, Norway, explored by high-throughput sequencing

    PubMed Central

    Egge, Elianne Sirnæs; Johannessen, Torill Vik; Andersen, Tom; Eikrem, Wenche; Bittner, Lucie; Larsen, Aud; Sandaa, Ruth-Anne; Edvardsen, Bente

    2015-01-01

    Microalgae in the division Haptophyta play key roles in the marine ecosystem and in global biogeochemical processes. Despite their ecological importance, knowledge on seasonal dynamics, community composition and abundance at the species level is limited due to their small cell size and few morphological features visible under the light microscope. Here, we present unique data on haptophyte seasonal diversity and dynamics from two annual cycles, with the taxonomic resolution and sampling depth obtained with high-throughput sequencing. From outer Oslofjorden, S Norway, nano- and picoplanktonic samples were collected monthly for 2 years, and the haptophytes targeted by amplification of RNA/cDNA with Haptophyta-specific 18S rDNA V4 primers. We obtained 156 operational taxonomic units (OTUs), from c. 400,000 454 pyrosequencing reads, after rigorous bioinformatic filtering and clustering at 99.5%. Most OTUs represented uncultured and/or not yet 18S rDNA-sequenced species. Haptophyte OTU richness and community composition exhibited high temporal variation and significant yearly periodicity. Richness was highest in September–October (autumn) and lowest in April–May (spring). Some taxa were detected all year, such as Chrysochromulina simplex, Emiliania huxleyi and Phaeocystis cordata, whereas most calcifying coccolithophores only appeared from summer to early winter. We also revealed the seasonal dynamics of OTUs representing putative novel classes (clades HAP-3–5) or orders (clades D, E, F). Season, light and temperature accounted for 29% of the variation in OTU composition. Residual variation may be related to biotic factors, such as competition and viral infection. This study provides new, in-depth knowledge on seasonal diversity and dynamics of haptophytes in North Atlantic coastal waters. PMID:25893259

  6. Complete Sequence Construction of the Highly Repetitive Ribosomal RNA Gene Repeats in Eukaryotes Using Whole Genome Sequence Data.

    PubMed

    Agrawal, Saumya; Ganley, Austen R D

    2016-01-01

    The ribosomal RNA genes (rDNA) encode the major rRNA species of the ribosome, and thus are essential across life. These genes are highly repetitive in most eukaryotes, forming blocks of tandem repeats that form the core of nucleoli. The primary role of the rDNA in encoding rRNA has been long understood, but more recently the rDNA has been implicated in a number of other important biological phenomena, including genome stability, cell cycle, and epigenetic silencing. Noncoding elements, primarily located in the intergenic spacer region, appear to mediate many of these phenomena. Although sequence information is available for the genomes of many organisms, in almost all cases rDNA repeat sequences are lacking, primarily due to problems in assembling these intriguing regions during whole genome assemblies. Here, we present a method to obtain complete rDNA repeat unit sequences from whole genome assemblies. Limitations of next generation sequencing (NGS) data make them unsuitable for assembling complete rDNA unit sequences; therefore, the method we present relies on the use of Sanger whole genome sequence data. Our method makes use of the Arachne assembler, which can assemble highly repetitive regions such as the rDNA in a memory-efficient way. We provide a detailed step-by-step protocol for generating rDNA sequences from whole genome Sanger sequence data using Arachne, for refining complete rDNA unit sequences, and for validating the sequences obtained. In principle, our method will work for any species where the rDNA is organized into tandem repeats. This will help researchers working on species without a complete rDNA sequence, those working on evolutionary aspects of the rDNA, and those interested in conducting phylogenetic footprinting studies with the rDNA. PMID:27576718

  7. Haplogroup Classification of Korean Cattle Breeds Based on Sequence Variations of mtDNA Control Region

    PubMed Central

    Kim, Jae-Hwan; Lee, Seong-Su; Kim, Seung Chang; Choi, Seong-Bok; Kim, Su-Hyun; Lee, Chang Woo; Jung, Kyoung-Sub; Kim, Eun Sung; Choi, Young-Sun; Kim, Sung-Bok; Kim, Woo Hyun; Cho, Chang-Yeon

    2016-01-01

    Many studies have reported the frequency and distribution of haplogroups among various cattle breeds to verify their origins and genetic diversity. In this study, 318 complete sequences of the mtDNA control region from four Korean cattle breeds were used for haplogroup classification. A total of 71 polymorphic sites and 66 haplotypes were found in these sequences. Consistent with the genetic patterns in previous reports, four haplogroups (T1, T2, T3, and T4) were identified in Korean cattle breeds. In addition, the T1a, T3a, and T3b sub-haplogroups were classified. In the phylogenetic tree, each haplogroup formed an independent cluster. The frequencies of T3, T4, T1 (containing T1a), and T2 were 66%, 16%, 10%, and 8%, respectively. Notably, the T1 haplogroup itself was represented by only one haplotype in a single sample. All four haplogroups were found in Chikso, Jeju black and Hanwoo. However, only the T3 and T4 haplogroups appeared in Heugu, and most Chikso populations carried only a subset of the four haplogroups. These results will be useful for the stable conservation and efficient management of Korean cattle breeds. PMID:26954229

  8. Mitochondrial sequence variation in ancient horses from the Carpathian Basin and possible modern relatives.

    PubMed

    Priskin, K; Szabó, K; Tömöry, G; Bogácsi-Szabó, E; Csányi, B; Eördögh, R; Downes, C S; Raskó, I

    2010-02-01

    Movements of human populations leave their traces in the genetic makeup of the areas affected; the same applies to the horses that move with their owners. This study is concerned with the mitochondrial control region genotypes of 31 archaeological horse remains, excavated from pre-conquest Avar and post-conquest Hungarian burial sites in the Carpathian Basin dating from the sixth to the tenth century. To investigate relationships to other ancient and recent breeds, modern Hucul and Akhal Teke samples were also collected, and mtDNA control region (CR) sequences from 76 breeds representing 921 individual specimens were combined with our sequence data. Phylogenetic relationships among horse mtDNA CR haplotypes were estimated using both genetic distance and the non-dichotomous network method. Both methods indicated a separation between horses of the Avars and the Hungarians. Our results show that the ethnic changes induced by the Hungarian Conquest were accompanied by a corresponding change in the stables of the Carpathian Basin. PMID:19789983
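The genetic-distance side of such an analysis can be illustrated with the simplest measure, the uncorrected p-distance between aligned haplotypes. This is a minimal sketch that assumes pre-aligned sequences and pairwise deletion of gapped sites. The study may well have used a model-corrected distance, and the network method is not shown.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of compared sites that differ.

    Sites with a gap ('-') in either sequence are skipped (pairwise deletion),
    which is an assumption of this sketch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no gap-free sites to compare")
    diffs = sum(1 for x, y in compared if x != y)
    return diffs / len(compared)


def distance_matrix(haplotypes: dict) -> dict:
    """Pairwise p-distances for {name: aligned sequence}, keyed by sorted name pairs."""
    names = sorted(haplotypes)
    return {
        (m, n): p_distance(haplotypes[m], haplotypes[n])
        for m, n in combinations(names, 2)
    }


# Toy haplotypes (made-up, not real horse CR data)
print(distance_matrix({"h1": "ACGT", "h2": "ACGA", "h3": "TCGA"}))
# {('h1', 'h2'): 0.25, ('h1', 'h3'): 0.5, ('h2', 'h3'): 0.25}
```

A matrix like this is the usual input to distance-based tree building. Network methods instead connect haplotypes by their individual mutational steps rather than by a single summary number.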

  9. Passing faces: sequence-dependent variations in the perceptual processing of emotional faces.

    PubMed

    Karl, Christian; Hewig, Johannes; Osinsky, Roman

    2016-10-01

    There is broad evidence that contextual factors influence the processing of emotional facial expressions. Yet temporal-dynamic aspects, inter alia how face processing is influenced by the specific order of neutral and emotional facial expressions, have been largely neglected. To shed light on this topic, we recorded electroencephalograms from 168 healthy participants while they performed a gender-discrimination task with angry and neutral faces. Our event-related potential (ERP) analyses revealed a strong emotional modulation of the N170 component, indicating that the basic visual encoding and emotional analysis of a facial stimulus happen, at least partially, in parallel. While the N170 and the late positive potential (LPP; 400-600 ms) were only modestly affected by the sequence of preceding faces, we observed a strong influence of face sequences on the early posterior negativity (EPN; 200-300 ms). Finally, the differing response patterns of the EPN and LPP indicate that these two ERPs represent distinct processes during face analysis: while the former seems to represent the integration of contextual information in the perception of a current face, the latter appears to represent the net emotional interpretation of a current face. PMID:26599470

  10. Haplogroup Classification of Korean Cattle Breeds Based on Sequence Variations of mtDNA Control Region.

    PubMed

    Kim, Jae-Hwan; Lee, Seong-Su; Kim, Seung Chang; Choi, Seong-Bok; Kim, Su-Hyun; Lee, Chang Woo; Jung, Kyoung-Sub; Kim, Eun Sung; Choi, Young-Sun; Kim, Sung-Bok; Kim, Woo Hyun; Cho, Chang-Yeon

    2016-05-01

    Many studies have reported the frequency and distribution of haplogroups among various cattle breeds to verify their origins and genetic diversity. In this study, 318 complete sequences of the mtDNA control region from four Korean cattle breeds were used for haplogroup classification. Seventy-one polymorphic sites and 66 haplotypes were found in these sequences. Consistent with the genetic patterns in previous reports, four haplogroups (T1, T2, T3, and T4) were identified in Korean cattle breeds. In addition, the T1a, T3a, and T3b sub-haplogroups were classified. In the phylogenetic tree, each haplogroup formed an independent cluster. The frequencies of T3, T4, T1 (including T1a), and T2 were 66%, 16%, 10%, and 8%, respectively. Notably, the T1 haplogroup contained only one haplotype and one sample. All four haplogroups were found in Chikso, Jeju black, and Hanwoo. However, only the T3 and T4 haplogroups appeared in Heugu, and most Chikso populations showed only some of the four haplogroups. These results will be useful for the stable conservation and efficient management of Korean cattle breeds. PMID:26954229

  11. An Analysis of Stimuli that Influence Compliance during the High-Probability Instruction Sequence

    ERIC Educational Resources Information Center

    Normand, Matthew P.; Kestner, Kathryn; Jessel, Joshua

    2010-01-01

    When we evaluated variables that influence the effectiveness of the high-probability (high-p) instruction sequence, the sequence was associated with a precipitous decrease in compliance with high-p instructions for 1 participant, thereby precluding continued use of the sequence. We investigated the reasons for this decrease. Stimuli associated…

  12. High Levels of Sample-to-Sample Variation Confound Data Analysis for Non-Invasive Prenatal Screening of Fetal Microdeletions.

    PubMed

    Chu, Tianjiao; Yeniterzi, Suveyda; Yatsenko, Svetlana A; Dunkel, Mary; Shaw, Patricia A; Bunce, Kimberly D; Peters, David G

    2016-01-01

    Our goal was to test the hypothesis that inter-individual genomic copy number variation in control samples is a confounding factor in the non-invasive prenatal detection of fetal microdeletions via the sequence-based analysis of maternal plasma DNA. The database of genomic variants (DGV) was used to determine the "Genomic Variants Frequency" (GVF) for each 50 kb region in the human genome. Whole genome sequencing of fifteen karyotypically normal maternal plasma and six CVS DNA control samples was performed. The coefficient of variation of relative read counts (cv.RTC) for these samples was determined for each 50 kb region. Maternal plasma from two pregnancies affected with a chromosome 5p microdeletion was also sequenced and analyzed using the GCREM algorithm. We found a strong correlation between high variance in read counts and GVF amongst controls. Consequently, we were unable to confirm the presence of the microdeletion via sequencing of maternal plasma samples obtained from two sequential affected pregnancies. Caution should be exercised when performing NIPT for microdeletions. It is vital to develop our understanding of the factors that impact the sensitivity and specificity of these approaches. In particular, benign copy number variation amongst controls is a major confounder, and its effects should be corrected bioinformatically. PMID:27249650
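The per-region statistic described above, the coefficient of variation of relative read counts across control samples, can be sketched as follows. This assumes plain total-count normalization (no GC or mappability correction, which the abstract does not specify) and bins that have already been tallied. The function names are hypothetical, and this is not the GCREM algorithm.

```python
import statistics


def relative_counts(bin_counts):
    """Scale one sample's per-bin read counts to fractions of its total reads."""
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in bin_counts]


def cv_per_bin(samples):
    """Coefficient of variation (sample SD / mean) of relative read counts per bin.

    `samples` is a list of equal-length lists, one per control sample, each
    holding read counts for the same genomic bins (e.g. 50 kb windows).
    At least two samples are required for a sample standard deviation.
    """
    rel = [relative_counts(s) for s in samples]
    n_bins = len(rel[0])
    cvs = []
    for b in range(n_bins):
        values = [r[b] for r in rel]
        mean = statistics.mean(values)
        cvs.append(statistics.stdev(values) / mean if mean > 0 else float("nan"))
    return cvs


# Toy controls: two samples, two bins (made-up counts)
controls = [[10, 30], [30, 10]]
print(cv_per_bin(controls))
```

Bins with a high coefficient of variation among controls are the ones where benign copy number variation makes a test sample's deviation hard to interpret.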

  13. High Levels of Sample-to-Sample Variation Confound Data Analysis for Non-Invasive Prenatal Screening of Fetal Microdeletions

    PubMed Central

    Chu, Tianjiao; Yeniterzi, Suveyda; Yatsenko, Svetlana A.; Dunkel, Mary; Shaw, Patricia A.; Bunce, Kimberly D.; Peters, David G.

    2016-01-01

    Our goal was to test the hypothesis that inter-individual genomic copy number variation in control samples is a confounding factor in the non-invasive prenatal detection of fetal microdeletions via the sequence-based analysis of maternal plasma DNA. The database of genomic variants (DGV) was used to determine the “Genomic Variants Frequency” (GVF) for each 50 kb region in the human genome. Whole genome sequencing of fifteen karyotypically normal maternal plasma and six CVS DNA control samples was performed. The coefficient of variation of relative read counts (cv.RTC) for these samples was determined for each 50 kb region. Maternal plasma from two pregnancies affected with a chromosome 5p microdeletion was also sequenced and analyzed using the GCREM algorithm. We found a strong correlation between high variance in read counts and GVF amongst controls. Consequently, we were unable to confirm the presence of the microdeletion via sequencing of maternal plasma samples obtained from two sequential affected pregnancies. Caution should be exercised when performing NIPT for microdeletions. It is vital to develop our understanding of the factors that impact the sensitivity and specificity of these approaches. In particular, benign copy number variation amongst controls is a major confounder, and its effects should be corrected bioinformatically. PMID:27249650

  14. Functional and genetic analysis of haplotypic sequence variation at the nicastrin genomic locus

    PubMed Central

    Hamilton, Gillian; Killick, Richard; Lambert, Jean-Charles; Amouyel, Philippe; Carrasquillo, Minerva M.; Pankratz, V. Shane; Graff-Radford, Neill R.; Dickson, Dennis W.; Petersen, Ronald C.; Younkin, Steven G.; Powell, John F.; Wade-Martins, Richard

    2013-01-01

    Nicastrin (NCSTN) is a component of the γ-secretase complex and therefore potentially a candidate risk gene for Alzheimer's disease. Here, we have developed a novel functional genomics methodology to express common locus haplotypes to assess functional differences. DNA recombination was used to engineer 5 bacterial artificial chromosomes (BACs) to each express a different haplotype of the NCSTN locus. Each NCSTN-BAC was delivered to knockout nicastrin (Ncstn−/−) cells and clonal NCSTN-BAC+/Ncstn−/− cell lines were created for functional analyses. We showed that all NCSTN-BAC haplotypes expressed nicastrin protein and rescued γ-secretase activity and amyloid beta (Aβ) production in NCSTN-BAC+/Ncstn−/− lines. We then showed that genetic variation at the NCSTN locus affected alternative splicing in human postmortem brain tissue. However, there was no robust functional difference between clonal cell lines rescued by each of the 5 different haplotypes. Finally, there was no statistically significant association of NCSTN with disease risk in the 4 cohorts. We therefore conclude that it is unlikely that common variation at the NCSTN locus is a risk factor for Alzheimer's disease. PMID:22405046

  15. Variations of a Y chromosome repeated sequence across subspecies of Mus musculus.

    PubMed

    Boursot, P; Bonhomme, F; Catalan, J; Moriwaki, K

    1989-12-01

    The complex species Mus musculus is widespread in Eurasia and consists of four parapatric genetical entities (subspecies) that have recently radiated. Two of them (M. m. domesticus and M. m. musculus) are known to interact through a narrow zone of hybridisation across which autosomal and mitochondrial exchanges are very limited and Y chromosome exchange is absent. We extend here the study of this group by the genetical analysis of 22 Asian strains of various origins (China, Korea, Japan, Taiwan, Philippines and Indonesia). A survey of protein variation at ten polymorphic loci confirmed that these animals belong to either the subspecies M. m. musculus (northern type in Asia, ranging westwards to Eastern Europe) or to M. m. castaneus (southern Asian type) and revealed a certain degree of intergradation between the two taxa. Y chromosome variations were assessed in these strains using a Y specific DNA probe representing part of a small multigene family and also in four M. m. domesticus (the Western European house mouse) strains of various origins and one M. m. bactrianus (from Pakistan). Musculus and castaneus were identically monomorphic for one type of organisation of this Y repeated family, while domesticus and bactrianus were very similar to each other, showing slightly different types of organisation. Introgression of a bactrianus Y chromosome into the territory of castaneus was found in Indonesia. The present distribution of the Y types among the four subspecies is not phylogenetically concordant with the known distributions of autosomal and mitochondrial variants. (ABSTRACT TRUNCATED AT 250 WORDS) PMID:2575606

  16. emm and sof gene sequence variation in relation to serological typing of opacity-factor-positive group A streptococci.

    PubMed

    Beall, B; Gherardi, G; Lovgren, M; Facklam, R R; Forwick, B A; Tyrrell, G J

    2000-05-01

    Approximately 40-60% of group A streptococcal (GAS) isolates are capable of opacifying sera, owing to expression of the sof (serum opacity factor) gene. The emm (M protein gene) and sof 5' sequences were obtained from a diverse set of GAS reference strains and clinical isolates and correlated with M serotyping and anti-opacity-factor testing results. Attempts to amplify sof from strains with M serotypes or emm types historically associated with the opacity-factor-negative phenotype were negative, except for emm12 strains, which were found to contain a highly conserved sof sequence. There was a strong correlation of certain M serotypes with specific emm sequences regardless of strain background, and likewise a strong association of specific anti-opacity-factor (AOF) types with sof gene sequence types. In several examples, M type identity, or partial identity shared between strains with differing emm types, was correlated with short, highly conserved 5' emm sequences likely to encode M-type-specific epitopes. Additionally, each of three pairs of historically distinct M type reference strains found to share the same 5' emm sequence was also found to share M serotype specificity. Based upon sof sequence comparisons between strains of the same and of differing AOF types, an approximately 450-residue domain was determined to be likely to contain key epitopes required for AOF type specificity. Analysis of two Sof sequences that were not highly homologous, yet shared a common AOF type, further implicated a 107-aa portion of this 450-residue domain as putatively containing AOF-specific epitopes. Taken together, the serological data suggest that AOF-specific epitopes for all Sof proteins may reside within a region corresponding to this 107-residue sequence. The presence of specific, hypervariable emm/sof pairs within multiple isolates appears likely to be a reliable indicator of their overall genetic relatedness, and to be very useful for accurate subtyping of GAS isolates by…

  17. The thermostable direct hemolysin-related hemolysin (trh) gene of Vibrio parahaemolyticus: Sequence variation and implications for detection and function.

    PubMed

    Nilsson, William B; Turner, Jeffrey W

    2016-07-01

    Vibrio parahaemolyticus is a leading cause of bacterial food-related illness associated with the consumption of undercooked seafood. Only a small subset of strains is pathogenic. Most clinical strains encode the thermostable direct hemolysin (TDH) and/or the TDH-related hemolysin (TRH). In this work, we amplify and sequence the trh gene from over 80 trh+ strains of this bacterium and identify thirteen genetically distinct alleles, most of which had not previously been deposited in GenBank. The sequence data were used to design new primers for more reliable detection of trh by endpoint PCR. We also designed a new quantitative PCR assay targeting a more conserved gene that is genetically linked to trh. This gene, ureR, encodes the transcriptional regulator for the urease gene cluster immediately upstream of trh. We propose that this ureR assay can be a useful screening tool, serving as a surrogate for direct detection of trh that circumvents challenges associated with trh sequence variation. PMID:27094247

  18. CNV-CH: A Convex Hull Based Segmentation Approach to Detect Copy Number Variations (CNV) Using Next-Generation Sequencing Data

    PubMed Central

    De, Rajat K.

    2015-01-01

    Copy number variation (CNV) is a form of structural alteration in the mammalian DNA sequence that is associated with many complex neurological diseases as well as cancer. The development of next-generation sequencing (NGS) technology opens a new dimension for detecting genomic locations with copy number variations. Here we develop an algorithm for detecting CNVs based on depth-of-coverage data generated by NGS technology. In this work, we have used a novel way to represent the read count data as two-dimensional geometrical points. A key aspect of detecting regions with CNVs is to devise a proper segmentation algorithm that will distinguish genomic locations having a significant difference in read count data. We have designed a new segmentation approach in this context, using a convex hull algorithm on the geometrical representation of read count data. To our knowledge, most algorithms have used a single distribution model of read count data, but in our approach we consider the read count data to follow two different distribution models independently, which adds to the robustness of CNV detection. In addition, our algorithm calls CNVs based on a multiple-sample analysis approach, resulting in a low false discovery rate with high precision. PMID:26291322
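The geometric step named above, computing a convex hull over read-count data represented as 2D points, can be illustrated with Andrew's monotone chain algorithm. This is a generic sketch. The (bin index, read count) point representation is a hypothetical stand-in for the paper's own representation, and the rule CNV-CH uses to turn hull geometry into segment boundaries is not reproduced.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order.

    Collinear points on hull edges are dropped; runs in O(n log n).
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other; drop duplicates.
    return lower[:-1] + upper[:-1]


# Hypothetical usage: toy per-bin read depths turned into (bin, count) points
read_counts = [10, 11, 30, 31, 12, 10]
hull = convex_hull(list(enumerate(read_counts)))
print(hull)
```

Monotone chain was chosen here only because it is short and exact on integer coordinates; any correct hull algorithm would serve for the illustration.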

  19. Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

    DOE PAGESBeta

    Leung, Elo; Huang, Amy; Cadag, Eithon; Montana, Aldrin; Soliman, Jan Lorenz; Zhou, Carol L. Ecale

    2016-01-20

    In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein FASTA data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well-annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta-server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA format. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.

  20. Variational Bayesian strategies for high-dimensional, stochastic design problems

    NASA Astrophysics Data System (ADS)

    Koutsourelakis, P. S.

    2016-03-01

    This paper is concerned with a lesser-studied problem in the context of model-based uncertainty quantification (UQ), that of optimization/design/control under uncertainty. The solution of such problems is hindered not only by the usual difficulties encountered in UQ tasks (e.g. the high computational cost of each forward simulation, the large number of random variables) but also by the need to solve a nonlinear optimization problem involving large numbers of design variables and potentially constraints. We propose a framework that is suitable for a class of such problems and is based on the idea of recasting them as probabilistic inference tasks. To that end, we propose a Variational Bayesian (VB) formulation and an iterative VB-Expectation-Maximization scheme that is capable of identifying a local maximum as well as a low-dimensional set of directions in the design space along which the objective exhibits the largest sensitivity. We demonstrate the validity of the proposed approach in the context of two numerical examples involving thousands of random and design variables. In all cases considered, the cost of the computations in terms of calls to the forward model was of the order of 100 or less. The accuracy of the approximations provided is assessed by information-theoretic metrics.

  1. Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data.

    PubMed

    Pan, Yonglong; Wang, Xiaoming; Liu, Lin; Wang, Hao; Luo, Meizhong

    2016-01-01

    A good physical map is essential to guide sequence assembly in de novo whole genome sequencing, especially when sequences are produced by high-throughput sequencing such as next-generation sequencing (NGS) technology. We here present a novel method, Feature sets-based Genome Mapping (FGM). With FGM, a physical map and draft whole genome sequences can be generated, anchored, and integrated using the same data set of NGS sequences, independent of restriction digestion. A model of the method was created, and its parameters were examined by simulations using the Arabidopsis genome sequence. In the simulations, when a ~4.8X genome-coverage BAC library of 4,096 clones was used to sequence the whole genome, ~90% of clones were successfully connected into physical contigs, and 91.58% of genome sequences were mapped and connected to chromosomes. The method was experimentally verified using the existing physical map and genome sequence of rice. Of 4,064 clones covering 115 Mb of sequence, selected from ~3 tiles of 3 chromosomes of a rice draft physical map, 3,364 clones were reconstructed into physical contigs, and 98 Mb of sequence was integrated into the 3 chromosomes. The physical map-integrated draft genome sequences can provide permanent frameworks for eventually obtaining high-quality reference sequences by targeted sequencing, gap filling, and combining other sequences. PMID:27611682

  2. Assessment of megabase-scale somatic copy number variation using single-cell sequencing

    PubMed Central

    Knouse, Kristin A.; Wu, Jie; Amon, Angelika

    2016-01-01

    Megabase-scale copy number variants (CNVs) can have profound phenotypic consequences. Germline CNVs of this magnitude are associated with disease and experience negative selection. However, it is unknown whether organismal function requires that every cell maintain a balanced genome. It is possible that large somatic CNVs are tolerated or even positively selected. Single-cell sequencing is a useful tool for assessing somatic genomic heterogeneity, but its performance in CNV detection has not been rigorously tested. Here, we develop an approach that allows for reliable detection of megabase-scale CNVs in single somatic cells. We discover large CNVs in 8%–9% of cells across tissues and identify two recurrent CNVs. We conclude that large CNVs can be tolerated in subpopulations of cells, and particular CNVs are relatively prevalent within and across individuals. PMID:26772196

  3. DUC-Curve, a highly compact 2D graphical representation of DNA sequences and its application in sequence alignment

    NASA Astrophysics Data System (ADS)

    Li, Yushuang; Liu, Qian; Zheng, Xiaoqi

    2016-08-01

    A highly compact and simple 2D graphical representation of DNA sequences, named DUC-Curve, is constructed by mapping the four nucleotides to a unit circle in a cyclic order. DUC-Curve can directly reveal nucleotide and dinucleotide compositions and microsatellite structure in DNA sequences. Moreover, it can also be used for DNA sequence alignment. Taking the geometric center vectors of DUC-Curves as sequence descriptors, we perform similarity analyses on the first exons of the β-globin genes of 11 species, the oncogene TP53 of 27 species, and twenty-four Influenza A viruses. The reasonable results obtained indicate that the proposed method is very effective in sequence comparison problems and will at least play a complementary role in classification and clustering problems.
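The geometric-center descriptor described above can be sketched as a cumulative 2D walk. This sketch assumes a hypothetical cyclic assignment of A, C, G and T to angles of 0, 90, 180 and 270 degrees on the unit circle. The published DUC-Curve construction may use a different ordering or step rule, so treat this as an illustration of the general idea (curve, center vector, distance between centers) rather than the authors' definition.

```python
import math

# Hypothetical cyclic assignment of nucleotides to unit-circle directions;
# the published DUC-Curve mapping may differ.
ANGLES = {"A": 0.0, "C": math.pi / 2, "G": math.pi, "T": 3 * math.pi / 2}


def curve(seq: str):
    """Cumulative 2D walk: each base adds the unit vector for its angle."""
    x = y = 0.0
    points = []
    for base in seq.upper():
        if base not in ANGLES:
            raise ValueError(f"unsupported base: {base!r}")
        theta = ANGLES[base]
        x += math.cos(theta)
        y += math.sin(theta)
        points.append((x, y))
    return points


def center(seq: str):
    """Geometric center (mean of the curve points), used as a 2D descriptor."""
    points = curve(seq)
    if not points:
        raise ValueError("empty sequence")
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def dissimilarity(s1: str, s2: str) -> float:
    """Euclidean distance between the two sequences' center vectors."""
    (x1, y1), (x2, y2) = center(s1), center(s2)
    return math.hypot(x1 - x2, y1 - y2)


print(center("AC"))
print(dissimilarity("ACGTACGT", "AACCGGTT"))
```

Pairwise dissimilarities computed this way could feed a standard clustering or tree-building step, which is how the abstract frames the comparison of β-globin exons, TP53 and influenza sequences.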

  4. Variations of the ISM Compactness Across the Main Sequence of Star Forming Galaxies: Observations and Simulations

    NASA Astrophysics Data System (ADS)

    Martínez-Galarza, J. R.; Smith, H. A.; Lanz, L.; Hayward, Christopher C.; Zezas, A.; Rosenthal, L.; Weiner, A.; Hung, C.; Ashby, M. L. N.; Groves, B.

    2016-01-01

    The majority of star-forming galaxies follow a simple empirical correlation in the star formation rate (SFR) versus stellar mass ($M_*$) plane, of the form $\mathrm{SFR} \propto M_*^{\alpha}$, usually referred to as the star formation main sequence (MS). The physics that sets the properties of the MS is currently a subject of debate, and no consensus has been reached regarding the fundamental difference between members of the sequence and its outliers. Here we combine a set of hydrodynamical simulations of interacting galactic disks with state-of-the-art radiative transfer codes to analyze how the evolution of mergers is reflected in the properties of the MS. We present Chiburst, a Markov Chain Monte Carlo spectral energy distribution (SED) code that fits the multi-wavelength, broad-band photometry of galaxies and derives stellar masses, SFRs, and geometrical properties of the dust distribution. We apply this tool to the SEDs of simulated mergers and compare the derived results with the reference output from the simulations. Our results indicate that changes in the SEDs of mergers as they approach coalescence and depart from the MS are related to an evolution of dust geometry on scales larger than a few hundred parsecs. This is reflected in a correlation between the specific star formation rate and the compactness parameter $\mathcal{C}$, which parametrizes this geometry and hence the evolution of dust temperature ($T_{\mathrm{dust}}$) with time. As mergers approach coalescence, they depart from the MS and increase their compactness, which implies that moderate outliers of the MS are consistent with late-type mergers. By further applying our method to real observations of luminous infrared galaxies (LIRGs), we show that the merger scenario is unable to explain these extreme outliers of the MS. Only by significantly increasing the gas fraction in the simulations are we able to reproduce the SEDs of LIRGs.
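Because the main-sequence relation is a power law, it becomes linear in log space, log10(SFR) = α·log10(M*) + b. The exponent α can therefore be estimated by a straight-line fit, and a galaxy's offset from the MS can be read off as its residual. The sketch below uses ordinary least squares on made-up data. It illustrates the relation only: it is not the Chiburst SED fitting described in the abstract, and published MS fits often use other estimators.

```python
import math


def fit_main_sequence(stellar_mass, sfr):
    """OLS fit of log10(SFR) = alpha * log10(M*) + b; returns (alpha, b).

    All inputs must be positive; at least two distinct masses are required.
    """
    if len(stellar_mass) != len(sfr) or len(sfr) < 2:
        raise ValueError("need at least two paired measurements")
    xs = [math.log10(m) for m in stellar_mass]
    ys = [math.log10(s) for s in sfr]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("stellar masses must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    return alpha, my - alpha * mx


def ms_offset(mass, rate, alpha, b):
    """Offset in dex of one galaxy above (+) or below (-) the fitted MS."""
    return math.log10(rate) - (alpha * math.log10(mass) + b)


# Synthetic galaxies lying exactly on a power law with alpha = 0.8 (made-up numbers)
masses = [1e9, 1e10, 1e11]
rates = [10 ** (0.8 * math.log10(m) - 8.0) for m in masses]
alpha, b = fit_main_sequence(masses, rates)
print(alpha, b)
```

In this picture, the merging systems discussed above would appear as galaxies whose `ms_offset` grows as they approach coalescence.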

  5. DNA sequence variations of metalloproteinases: their role in asthma and COPD

    PubMed Central

    Sampsonas, Fotis; Kaparianos, Alexander; Lykouras, Dimosthenis; Karkoulias, Kiriakos; Spiropoulos, Kostas

    2007-01-01

    Asthma and chronic obstructive pulmonary disease (COPD) are complex genetic diseases that cause considerable morbidity and mortality worldwide. Genetic variability interacting with environmental and ethnic factors is presumed to cause tobacco smoke susceptibility and to influence asthma severity. A disintegrin and metalloproteinase 33 (ADAM33) and matrix metalloproteinase‐9 (MMP9) appear to have important roles in asthma and COPD pathogenesis. ADAM33 and MMP9 genetic alterations could possibly contribute to the establishment and progression of these multifactorial diseases, although their association with the clinical phenotypes has not yet been elucidated. However, the occurrence of these alterations does not always result in clear disease, implying that either they are an epiphenomenon or they are in proximity to the true causative alteration. This review summarises the most recent literature dealing with the genetic variations of metalloproteinases and outlines their potential pathogenetic outcome. PMID:17403951

  6. Use of robotics in high-throughput DNA sequencing.

    PubMed

    Keeney, Stephen

    2011-01-01

    Until relatively recently, full sequencing of genes consisting of more than several exons was not considered practicable within a routine diagnostic context. As a result, many approaches to unknown mutation detection in a specific gene involved a mutation pre-screening step to limit the amount of DNA sequencing required. Protocols to pre-screen for mutations and limit the amount of DNA sequencing may not localise every base change present and/or require considerable levels of manual intervention. Advances in technology, allied with careful protocol design, now permit direct DNA sequencing to be applied to larger areas of gene sequence, allowing unequivocal mutation identification in the area of a gene being analysed. The protocol described below utilises robotic systems, allied to custom-designed PCR primers, to facilitate rapid DNA sequencing of multiple gene targets. The general approach is amenable to adaptation for use with multi-channel pipettes. PMID:20938842

  7. Fungal community analysis by high-throughput sequencing of amplified markers – a user's guide

    PubMed Central

    Lindahl, Björn D; Nilsson, R Henrik; Tedersoo, Leho; Abarenkov, Kessy; Carlsen, Tor; Kjøller, Rasmus; Kõljalg, Urmas; Pennanen, Taina; Rosendahl, Søren; Stenlid, Jan; Kauserud, Håvard

    2013-01-01

    Novel high-throughput sequencing methods outperform earlier approaches in terms of resolution and magnitude. They enable identification and relative quantification of community members and offer new insights into fungal community ecology. These methods are currently taking over as the primary tool to assess fungal communities of plant-associated endophytes, pathogens, and mycorrhizal symbionts, as well as free-living saprotrophs. Taking advantage of the collective experience of six research groups, we here review the different stages involved in fungal community analysis, from field sampling via laboratory procedures to bioinformatics and data interpretation. We discuss potential pitfalls, alternatives, and solutions. Highlighted topics include challenges involved in obtaining representative DNA/RNA samples and replicates that encompass the targeted variation in community composition, selection of marker regions and primers, options for amplification and multiplexing, handling of sequencing errors, and taxonomic identification. Without awareness of methodological biases, limitations of markers, and bioinformatics challenges, large-scale sequencing projects risk yielding artificial results and misleading conclusions. PMID:23534863

  8. Fungal community analysis by high-throughput sequencing of amplified markers--a user's guide.

    PubMed

    Lindahl, Björn D; Nilsson, R Henrik; Tedersoo, Leho; Abarenkov, Kessy; Carlsen, Tor; Kjøller, Rasmus; Kõljalg, Urmas; Pennanen, Taina; Rosendahl, Søren; Stenlid, Jan; Kauserud, Håvard

    2013-07-01

    Novel high-throughput sequencing methods outperform earlier approaches in terms of resolution and magnitude. They enable identification and relative quantification of community members and offer new insights into fungal community ecology. These methods are currently taking over as the primary tool to assess fungal communities of plant-associated endophytes, pathogens, and mycorrhizal symbionts, as well as free-living saprotrophs. Taking advantage of the collective experience of six research groups, we here review the different stages involved in fungal community analysis, from field sampling via laboratory procedures to bioinformatics and data interpretation. We discuss potential pitfalls, alternatives, and solutions. Highlighted topics include challenges involved in obtaining representative DNA/RNA samples and replicates that encompass the targeted variation in community composition, selection of marker regions and primers, options for amplification and multiplexing, handling of sequencing errors, and taxonomic identification. Without awareness of methodological biases, limitations of markers, and bioinformatics challenges, large-scale sequencing projects risk yielding artificial results and misleading conclusions. PMID:23534863

  9. High-resolution transcriptome analysis with long-read RNA sequencing.

    PubMed

    Cho, Hyunghoon; Davis, Joe; Li, Xin; Smith, Kevin S; Battle, Alexis; Montgomery, Stephen B

    2014-01-01

    RNA sequencing (RNA-seq) enables characterization and quantification of individual transcriptomes as well as detection of patterns of allelic expression and alternative splicing. Current RNA-seq protocols depend on high-throughput short-read sequencing of cDNA. However, as ongoing advances are rapidly yielding increasing read lengths, a technical hurdle remains in identifying the degree to which differences in read length influence various transcriptome analyses. In this study, we generated two paired-end RNA-seq datasets of differing read lengths (2×75 bp and 2×262 bp) for lymphoblastoid cell line GM12878 and compared the effect of read length on transcriptome analyses, including read-mapping performance, gene and transcript quantification, and detection of allele-specific expression (ASE) and allele-specific alternative splicing (ASAS) patterns. Our results indicate that, while the current long-read protocol is considerably more expensive than short-read sequencing, there are important benefits that can only be achieved with longer read length, including lower mapping bias and reduced ambiguity in assigning reads to genomic elements, such as mRNA transcripts. We show that these benefits ultimately lead to improved detection of cis-acting regulatory and splicing variation effects within individuals. PMID:25251678

  10. Modeling and optimization of defense high level waste removal sequencing

    NASA Astrophysics Data System (ADS)

    Paul, Pran Krishna

    has been successfully implemented with this general scheme to sequence wastes from different waste tanks for precipitate production and provide optimized sequences to ProdMod for simulating the behavior of the SRS waste complex. Parametric studies using this optimization methodology demonstrate that the devised scheme is appropriate for the real-life operations of the SRS waste complex. The computational planning tool based on the coupled simulation and optimization methodology developed in this work is currently used to help planners process the SRS's 34 million gallons of high level radioactive waste efficiently and economically through to the cleanup of all the tanks. This methodology could also be applied directly to the Hanford Site to aid in the final design and operation of its facilities to process 55 million gallons of high level radioactive waste.

  11. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    PubMed

    Nakazato, Takeru; Ohta, Tazro; Bono, Hidemasa

    2013-01-01

    High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data are captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, double the number a year earlier. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest from SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Because one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA. PMID:24167589
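
The study pairs SRA accessions with the PubMed IDs of the articles that cite them. As a hypothetical illustration only (not the DBCLS SRA code), accession-like strings can be pulled from an article's full text with a regular expression and paired with that article's PMID. The accession pattern below is an assumption about the accession format and may need adjusting. It expects an S, E or D prefix for the NCBI, EBI or DDBJ mirror, then `R`, a record-type letter and at least six digits.

```python
import re

# Assumed accession shape, e.g. SRP012345 (study) or SRR0123456 (run):
# S/E/D (NCBI/EBI/DDBJ) + "R" + record type (A, P, R, S or X) + 6 or more digits.
SRA_ACCESSION = re.compile(r"\b[SED]R[APRSX]\d{6,}\b")


def sra_pmid_pairs(pmid: str, full_text: str) -> set[tuple[str, str]]:
    """Pair every SRA-style accession mentioned in an article with that article's PMID."""
    return {(accession, pmid) for accession in SRA_ACCESSION.findall(full_text)}


# Hypothetical article text; "0000000" is a placeholder PMID, not a real citation
text = "Raw reads were deposited in the SRA under SRP012345 (runs SRR0123456 and SRR0123457)."
for pair in sorted(sra_pmid_pairs("0000000", text)):
    print(pair)
```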

  12. BOOGIE: Predicting Blood Groups from High Throughput Sequencing Data

    PubMed Central

    Giollo, Manuel; Minervini, Giovanni; Scalzotto, Marta; Leonardi, Emanuela; Ferrari, Carlo; Tosatto, Silvio C. E.

    2015-01-01

    Over the last decade, we have witnessed an incredible growth in the amount of available genotype data due to high throughput sequencing (HTS) techniques. This information may be used to predict phenotypes of medical relevance, and pave the way towards personalized medicine. Blood phenotypes (e.g. ABO and Rh) are purely genetic traits that have been extensively studied for decades, with over thirty blood groups currently known. Given the public availability of blood group data, it is of interest to predict these phenotypes from HTS data, which may translate into more accurate blood typing in clinical practice. Here we propose BOOGIE, a fast predictor for the inference of blood groups from single nucleotide variant (SNV) databases. We focus on the prediction of thirty blood groups ranging from the well-known ABO and Rh to the less-studied Junior and Diego. BOOGIE correctly predicted the blood group with 94% accuracy for the Personal Genome Project whole genome profiles where good quality SNV annotation was available. Additionally, our tool produces high-quality haplotype phasing, which is of interest in the context of ethnicity-specific polymorphisms or traits. The versatility and simplicity of the analysis make it easily interpretable and allow easy extension of the protocol towards other phenotypes. BOOGIE can be downloaded from URL http://protein.bio.unipd.it/download/. PMID:25893845
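
BOOGIE infers blood groups from SNV data. Its own allele tables and haplotype phasing are not reproduced here. The sketch below only illustrates the final, textbook step for the ABO system, assuming the two ABO alleles have already been called as A, B or O. A and B are codominant, and O is recessive. The function name is hypothetical.

```python
def abo_phenotype(allele1: str, allele2: str) -> str:
    """ABO phenotype from two already-called alleles (A and B codominant, O recessive)."""
    alleles = {allele1.upper(), allele2.upper()}
    if not alleles <= {"A", "B", "O"}:
        raise ValueError("alleles must be A, B or O")
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"  # AA or AO
    if "B" in alleles:
        return "B"  # BB or BO
    return "O"      # OO


print(abo_phenotype("A", "O"))  # A
print(abo_phenotype("A", "B"))  # AB
```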

  13. A high-plex PCR approach for massively parallel sequencing.

    PubMed

    Nguyen-Dumont, Tú; Pope, Bernard J; Hammet, Fleur; Southey, Melissa C; Park, Daniel J

    2013-08-01

    Current methods for targeted massively parallel sequencing (MPS) have several drawbacks, including limited design flexibility, expense, and protocol complexity, which restrict their application in settings involving modest target sizes and requiring low cost and high throughput. To address this, we have developed Hi-Plex, a PCR-MPS strategy intended for high-throughput screening of multiple genomic target regions that integrates simple, automated primer design software to control product size. Featuring permissive thermocycling conditions and clamp bias reduction, our protocol is simple, cost- and time-effective, uses readily available reagents, does not require expensive instrumentation, and requires minimal optimization. In a 60-plex assay targeting the breast cancer predisposition genes PALB2 and XRCC2, we applied Hi-Plex to 100 ng LCL-derived DNA, and 100 ng and 25 ng FFPE tumor-derived DNA. Altogether, at least 86.94% of the human genome-mapped reads were on target, and 100% of targeted amplicons were represented within 25-fold of the mean. Using 25 ng FFPE-derived DNA, 95.14% of mapped reads were on-target and relative representation ranged from 10.1-fold lower to 5.8-fold higher than the mean. These results were obtained using only the initial automatically-designed primers present in equal concentration. Hi-Plex represents a powerful new approach for screening panels of genomic target regions. PMID:23931594
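
The uniformity figures quoted above are simple ratios. One is the share of mapped reads on target, and the other is whether each amplicon's depth lies within a given fold of the mean. The helper names and example depths below are hypothetical. The sketch only shows how such summaries could be computed from per-amplicon read counts.

```python
from statistics import mean


def on_target_fraction(on_target_reads: int, mapped_reads: int) -> float:
    """Share of human-genome-mapped reads that fall within targeted amplicons."""
    return on_target_reads / mapped_reads


def fraction_within_fold(depths: list[float], fold: float = 25.0) -> float:
    """Share of amplicons whose depth lies within `fold`-fold of the mean depth."""
    m = mean(depths)
    return sum(m / fold <= d <= m * fold for d in depths) / len(depths)


def fold_range(depths: list[float]) -> tuple[float, float]:
    """How many fold the shallowest amplicon lies below the mean, and the deepest above it."""
    m = mean(depths)
    return m / min(depths), max(depths) / m


depths = [10, 100, 190]  # hypothetical reads per amplicon
print(fraction_within_fold(depths))  # 1.0
print(fold_range(depths))            # (10.0, 1.9)
```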

  14. Improved detection of artifactual viral minority variants in high-throughput sequencing data.

    PubMed

    Welkers, Matthijs R A; Jonges, Marcel; Jeeninga, Rienk E; Koopmans, Marion P G; de Jong, Menno D

    2014-01-01

    High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification are limited by the capacity to distinguish biological from artificial variation. In this study, errors related to the Illumina HiSeq2000 library generation and HTS process were investigated by determining minority variant frequencies in an influenza A/WSN/1933(H1N1) virus reverse-genetics plasmid pool. Errors related to amplification and sequencing were determined using the same plasmid pool, by generating infectious virus using reverse genetics, followed by duplicate reverse-transcriptase PCR (RT-PCR) amplification and HTS in the same sequence run. Results showed that after "best practice" quality control (QC), one minority variant with a frequency >0.5% was identified within the plasmid pool, while 84 and 139 were identified in the RT-PCR amplified samples, indicating that RT-PCR amplification artificially increased variation. Detailed analysis showed that artifactual minority variants could be identified by two major technical characteristics: their predominant presence in a single read orientation and uneven distribution of mismatches over the length of the reads. We demonstrate that by adding two QC steps, 95% of the artifactual minority variants could be identified. When our analysis approach was applied to three clinical samples, 68% of the initially identified minority variants were identified as artifacts. Our study clearly demonstrated that, without additional QC steps, overestimation of viral minority variants is very likely to occur, mainly as a consequence of the required RT-PCR amplification step. The improved ability to detect and correct for artifactual minority variants increases data resolution and could aid both past and future studies incorporating HTS. The source code has been made available through Sourceforge (https://sourceforge.net/projects/mva-ngs). PMID:25657642
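
The authors describe two technical signatures of artifacts: strand imbalance and an uneven spread of mismatch positions along the reads. Both can be turned into simple per-variant flags. The sketch below is a hypothetical illustration, not the released mva-ngs code. The skew and spread thresholds are placeholders. Comparing the standard deviation of mismatch positions with that of a uniform spread is one crude choice among many.

```python
from statistics import pstdev


def strand_skew(forward: int, reverse: int) -> float:
    """Share of supporting reads on the dominant strand: 0.5 is balanced, 1.0 is one strand only."""
    return max(forward, reverse) / (forward + reverse)


def positional_spread(positions: list[int], read_length: int) -> float:
    """SD of mismatch positions within reads, relative to the SD of a uniform spread (L / sqrt(12))."""
    return pstdev(positions) / (read_length / 12 ** 0.5)


def looks_artifactual(forward: int, reverse: int, positions: list[int], read_length: int,
                      max_skew: float = 0.9, min_spread: float = 0.5) -> bool:
    """Flag a minority variant if it is strand-skewed or its mismatches cluster by read position.
    max_skew and min_spread are hypothetical placeholder thresholds."""
    return (strand_skew(forward, reverse) > max_skew
            or positional_spread(positions, read_length) < min_spread)


# Hypothetical variant seen only on forward reads: flagged by the strand check
print(looks_artifactual(50, 0, [10, 60, 110], read_length=120))               # True
# Balanced strands and mismatches spread along 120-bp reads: not flagged
print(looks_artifactual(25, 25, list(range(0, 120, 10)), read_length=120))  # False
```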

  15. Phylogeny, Floral Evolution, and Inter-Island Dispersal in Hawaiian Clermontia (Campanulaceae) Based on ISSR Variation and Plastid Spacer Sequences

    PubMed Central

    Givnish, Thomas J.; Bean, Gregory J.; Ames, Mercedes; Lyon, Stephanie P.; Sytsma, Kenneth J.

    2013-01-01

    Previous studies based on DNA restriction-site and sequence variation have shown that the Hawaiian lobeliads are monophyletic and that the two largest genera, Cyanea and Clermontia, diverged from each other ca. 9.7 Mya. Sequence divergence among species of Clermontia is quite limited, however, and extensive hybridization is suspected, which has interfered with production of a well-resolved molecular phylogeny for the genus. Clermontia is of considerable interest because several species possess petal-like sepals, raising the question of whether such a homeotic mutation has arisen once or several times. In addition, morphological and molecular studies have implied different patterns of inter-island dispersal within the genus. Here we use nuclear ISSRs (inter-simple sequence repeat polymorphisms) and five plastid non-coding sequences to derive biparental and maternal phylogenies for Clermontia. Our findings imply that (1) Clermontia is not monophyletic, with Cl. pyrularia nested within Cyanea and apparently an intergeneric hybrid; (2) the earliest-diverging clades within Clermontia are native to Kauaʻi, then Oʻahu, then Maui, supporting the progression rule of dispersal down the chain toward progressively younger islands, although that rule is violated in later-evolving taxa in the ISSR tree; (3) there is almost no sequence divergence among several Clermontia species in 4.5 kb of rapidly evolving plastid DNA; (4) there are several apparent cases of hybridization/introgression or incomplete lineage sorting (i.e., Cl. oblongifolia, peleana, persicifolia, pyrularia, samuelii, tuberculata), based on extensive conflict between the ISSR and plastid phylogenies; and (5) there were two origins and two losses of petaloid sepals or, perhaps more plausibly, a single origin and two losses of this homeotic mutation, with its introgression into Cl. persicifolia. Our phylogenies are better resolved and geographically more informative than others based on ITS and 5S-NTS sequences and nuclear SNPs, but agree

  16. Phylogeny, floral evolution, and inter-island dispersal in Hawaiian Clermontia (Campanulaceae) based on ISSR variation and plastid spacer sequences.

    PubMed

    Givnish, Thomas J; Bean, Gregory J; Ames, Mercedes; Lyon, Stephanie P; Sytsma, Kenneth J

    2013-01-01

    Previous studies based on DNA restriction-site and sequence variation have shown that the Hawaiian lobeliads are monophyletic and that the two largest genera, Cyanea and Clermontia, diverged from each other ca. 9.7 Mya. Sequence divergence among species of Clermontia is quite limited, however, and extensive hybridization is suspected, which has interfered with production of a well-resolved molecular phylogeny for the genus. Clermontia is of considerable interest because several species possess petal-like sepals, raising the question of whether such a homeotic mutation has arisen once or several times. In addition, morphological and molecular studies have implied different patterns of inter-island dispersal within the genus. Here we use nuclear ISSRs (inter-simple sequence repeat polymorphisms) and five plastid non-coding sequences to derive biparental and maternal phylogenies for Clermontia. Our findings imply that (1) Clermontia is not monophyletic, with Cl. pyrularia nested within Cyanea and apparently an intergeneric hybrid; (2) the earliest-diverging clades within Clermontia are native to Kauaʻi, then Oʻahu, then Maui, supporting the progression rule of dispersal down the chain toward progressively younger islands, although that rule is violated in later-evolving taxa in the ISSR tree; (3) there is almost no sequence divergence among several Clermontia species in 4.5 kb of rapidly evolving plastid DNA; (4) there are several apparent cases of hybridization/introgression or incomplete lineage sorting (i.e., Cl. oblongifolia, peleana, persicifolia, pyrularia, samuelii, tuberculata), based on extensive conflict between the ISSR and plastid phylogenies; and (5) there were two origins and two losses of petaloid sepals or, perhaps more plausibly, a single origin and two losses of this homeotic mutation, with its introgression into Cl. persicifolia. Our phylogenies are better resolved and geographically more informative than others based on ITS and 5S-NTS sequences and nuclear SNPs, but agree with

  17. Next-generation sequencing analysis of off-ladder alleles due to migration shift caused by sequence variation at D12S391 locus.

    PubMed

    Fujii, Koji; Watahiki, Haruhiko; Mita, Yusuke; Iwashima, Yasuki; Miyaguchi, Hajime; Kitayama, Tetsushi; Nakahara, Hiroaki; Mizuno, Natsuko; Sekiguchi, Kazumasa

    2016-09-01

    In short tandem repeat (STR) analysis, length polymorphisms are detected by capillary electrophoresis (CE). At most STR loci, mobility shift due to sequence variation in the repeat region was thought not to affect the typing results. In our recent population studies of 1501 Japanese individuals, off-ladder calls were observed at the D12S391 locus using PowerPlex Fusion in nine samples for allele 22, one sample for allele 25, and one sample for allele 26. However, these samples were typed as ordinary alleles within the bins using GlobalFiler. In this study, next-generation sequencing analysis using MiSeq was performed for the D12S391 locus from the 11 off-ladder samples and 33 other samples, as well as the allelic ladders of PowerPlex Fusion and GlobalFiler. All off-ladder allele 22 in the nine samples had [AGAT]11[AGAC]11 as a repeat structure, while the corresponding allele was [AGAT]15[AGAC]6[AGAT] for the PowerPlex Fusion ladder, and [AGAT]13[AGAC]9 for the GlobalFiler ladder. Overall, as the number of [AGAT] in the repeat structure decreased at the D12S391 locus, the peak migrated more slowly using PowerPlex Fusion, the reverse strand of which was labeled, and it migrated more rapidly using GlobalFiler, the forward strand of which was labeled. The allelic ladders of both STR kits were reamplified with our small amplicon D12S391 primers and their mobility was also examined. In conclusion, off-ladder observations of allele 22 at the D12S391 locus using PowerPlex Fusion were mainly attributed to a relatively large difference of the repeat structure between its allelic ladder and off-ladder allele 22. PMID:27591542
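
Repeat structures such as [AGAT]11[AGAC]11 are run-length summaries of the repeat region. The sketch below collapses a sequence into that bracket notation and counts its repeat units. It makes a hypothetical assumption: the input is already trimmed to the repeat region and starts in frame with the 4-bp motif, which real STR sequence callers must establish first. Applied to the three allele 22 structures above, it shows that they share a repeat count of 22 but differ in internal structure. That structural difference is what the study links to the off-ladder mobility shift.

```python
def repeat_units(seq: str, unit_len: int = 4) -> list[str]:
    """Split a repeat region into consecutive units (assumes it starts in frame)."""
    return [seq[i:i + unit_len] for i in range(0, len(seq), unit_len)]


def bracket_notation(seq: str, unit_len: int = 4) -> str:
    """Collapse runs of identical units into [UNIT]n; a single unit is written as [UNIT]."""
    units = repeat_units(seq, unit_len)
    parts = []
    i = 0
    while i < len(units):
        j = i
        while j < len(units) and units[j] == units[i]:
            j += 1
        run = j - i
        parts.append(f"[{units[i]}]{run}" if run > 1 else f"[{units[i]}]")
        i = j
    return "".join(parts)


alleles = {
    "off-ladder sample": "AGAT" * 11 + "AGAC" * 11,
    "PowerPlex Fusion ladder": "AGAT" * 15 + "AGAC" * 6 + "AGAT",
    "GlobalFiler ladder": "AGAT" * 13 + "AGAC" * 9,
}
for label, seq in alleles.items():
    print(label, bracket_notation(seq), len(repeat_units(seq)))
```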

  18. An integrative approach to predicting the functional effects of non-coding and coding sequence variation

    PubMed Central

    Shihab, Hashem A.; Rogers, Mark F.; Gough, Julian; Mort, Matthew; Cooper, David N.; Day, Ian N. M.; Gaunt, Tom R.; Campbell, Colin

    2015-01-01

    Motivation: Technological advances have enabled the identification of an increasingly large spectrum of single nucleotide variants within the human genome, many of which may be associated with monogenic disease or complex traits. Here, we propose an integrative approach, named FATHMM-MKL, to predict the functional consequences of both coding and non-coding sequence variants. Our method utilizes various genomic annotations, which have recently become available, and learns to weight the significance of each component annotation source. Results: We show that our method outperforms current state-of-the-art algorithms, CADD and GWAVA, when predicting the functional consequences of non-coding variants. In addition, FATHMM-MKL is comparable to the best of these algorithms when predicting the impact of coding variants. The method includes a confidence measure to rank order predictions. Availability and implementation: The FATHMM-MKL webserver is available at: http://fathmm.biocompute.org.uk Contact: H.Shihab@bristol.ac.uk or Mark.Rogers@bristol.ac.uk or C.Campbell@bristol.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25583119
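
In multiple kernel learning, each annotation source contributes its own similarity (kernel) matrix, and the model learns non-negative weights for combining them. The sketch below shows only the combination step, with fixed, hypothetical weights. The weight learning that FATHMM-MKL performs is omitted, and this is not the FATHMM-MKL code.

```python
Matrix = list[list[float]]


def combine_kernels(kernels: list[Matrix], weights: list[float]) -> Matrix:
    """Return K = sum_m w_m * K_m, with non-negative weights rescaled to sum to 1."""
    if len(kernels) != len(weights) or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("need one non-negative weight per kernel, not all zero")
    total = sum(weights)
    scaled = [w / total for w in weights]
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(scaled, kernels)) for j in range(n)]
            for i in range(n)]


# Two hypothetical 2x2 Gram matrices over the same pair of variants,
# e.g. one from conservation scores and one from regulatory annotations
k_conservation = [[1.0, 0.0], [0.0, 1.0]]
k_regulatory = [[1.0, 1.0], [1.0, 1.0]]
print(combine_kernels([k_conservation, k_regulatory], [1.0, 3.0]))  # [[1.0, 0.75], [0.75, 1.0]]
```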

  19. Genetic relationship of Chinese and Japanese gamecocks revealed by mtDNA sequence variation.

    PubMed

    Liu, Yi-Ping; Zhu, Qing; Yao, Yong-Gang

    2006-02-01

    Cockfighting has a very long history in China, dating back as far as 2500 years. Cockfighting was intertwined with human cultural traditions, helped disperse chickens across the world, and influenced subsequent breed selection. Therefore, tracing the origin of gamecocks could mirror the distribution of the cockfighting culture. In this study, we compared the available mtDNA control region sequences in Chinese and Japanese gamecocks to test the recently proposed hypothesis of a dual origin of the Japanese cockfighting culture (from China and Southeast Asia independently). We assigned gamecock mtDNAs to different matrilineal components (or phylogenetic clades) that emerged from the phylogenetic tree and network profile, and compared the frequency differences between Chinese and Japanese gamecocks. Among the six clades (A-F) identified, Japanese gamecocks were most frequently found in clades C and D (74%, 32/43), whereas more than half of the Chinese gamecock samples (69%, 35/51) were grouped in clades A and B. Haplotypes in Japanese gamecocks assigned to clades A, B, and E were either shared with those of the Chinese samples or differed from the closest Chinese types by no more than three mutations. This genetic pattern is in accordance with the proposed dual origin of Japanese gamecocks but leaves room for a single origin of Japanese gamecocks from China. The genetic structure of gamecocks in China and Japan might also have been influenced by subsequent breed selection and conservation after the initial gamecock introduction. PMID:16648993
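
The comparison above rests on two simple quantities. One is the share of samples that fall in particular clades. The other is the number of differences between a Japanese haplotype and its closest Chinese haplotype. A minimal sketch follows. It assumes the control-region sequences are already aligned to equal length and counts mismatched sites as the mutation distance. This is a simplification that treats each aligned indel column as one difference. The haplotype names and sequences are hypothetical.

```python
def mutation_distance(h1: str, h2: str) -> int:
    """Number of aligned sites at which two haplotypes differ."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must be aligned to equal length")
    return sum(a != b for a, b in zip(h1, h2))


def closest(query: str, references: dict[str, str]) -> tuple[str, int]:
    """Name of, and distance to, the reference haplotype nearest to the query."""
    name = min(references, key=lambda k: mutation_distance(query, references[k]))
    return name, mutation_distance(query, references[name])


# Clade shares reported in the abstract
print(f"{32 / 43:.0%} of Japanese gamecocks in clades C and D")  # 74%
print(f"{35 / 51:.0%} of Chinese gamecocks in clades A and B")   # 69%

chinese = {"CN_hap1": "ACGTACGTAC", "CN_hap2": "ACGTTTGTAC"}  # hypothetical haplotypes
print(closest("ACGAACGTAC", chinese))  # ('CN_hap1', 1)
```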

  20. Complete Haplotype Sequence of the Human Immunoglobulin Heavy-Chain Variable, Diversity, and Joining Genes and Characterization of Allelic and Copy-Number Variation

    PubMed Central

    Watson, Corey T.; Steinberg, Karyn M.; Huddleston, John; Warren, Rene L.; Malig, Maika; Schein, Jacqueline; Willsey, A. Jeremy; Joy, Jeffrey B.; Scott, Jamie K.; Graves, Tina A.; Wilson, Richard K.; Holt, Robert A.; Eichler, Evan E.; Breden, Felix

    2013-01-01

    The immunoglobulin heavy-chain locus (IGH) encodes variable (IGHV), diversity (IGHD), joining (IGHJ), and constant (IGHC) genes and is responsible for antibody heavy-chain biosynthesis, which is vital to the adaptive immune response. Programmed V-(D)-J somatic rearrangement and the complex duplicated nature of the locus have impeded attempts to reconcile its genomic organization based on traditional B-lymphocyte derived genetic material. As a result, sequence descriptions of germline variation within IGHV are lacking, haplotype inference using traditional linkage disequilibrium methods has been difficult, and the human genome reference assembly is missing several expressed IGHV genes. By using a hydatidiform mole BAC clone resource, we present the most complete haplotype of IGHV, IGHD, and IGHJ gene regions derived from a single chromosome, representing an alternate assembly of ∼1 Mbp of high-quality finished sequence. From this we add 101 kbp of previously uncharacterized sequence, including functional IGHV genes, and characterize four large germline copy-number variants (CNVs). In addition to this germline reference, we identify and characterize eight CNV-containing haplotypes from a panel of nine diploid genomes of diverse ethnic origin, discovering previously unmapped IGHV genes and an additional 121 kbp of insertion sequence. We genotype four of these CNVs by using PCR in 425 individuals from nine human populations. We find that all four are highly polymorphic and show considerable evidence of stratification (Fst = 0.3–0.5), with the greatest differences observed between African and Asian populations. These CNVs exhibit weak linkage disequilibrium with SNPs from two commercial arrays in most of the populations tested. PMID:23541343
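
The abstract does not state which Fst estimator was used. For illustration only, the sketch below assumes the simple Wright/Nei form, Fst = (H_T - mean H_S) / H_T. It applies this to the frequency of one CNV state across populations, treating the CNV as a biallelic marker with equal population weights. Sample-size-corrected estimators are generally preferred in practice. The frequencies are hypothetical.

```python
def fst(freqs: list[float]) -> float:
    """Wright/Nei-style Fst for a biallelic marker from per-population allele frequencies
    (equal population weights, no sample-size correction)."""
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)                           # expected heterozygosity of the pooled population
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population heterozygosity
    if h_t == 0:
        raise ValueError("marker is fixed in all populations")
    return (h_t - h_s) / h_t


print(round(fst([0.1, 0.9]), 2))  # 0.64: strongly differentiated populations
print(round(fst([0.5, 0.5]), 2))  # 0.0: identical frequencies
```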

  1. Weather explains high annual variation in butterfly dispersal.

    PubMed

    Kuussaari, Mikko; Rytteri, Susu; Heikkinen, Risto K; Heliölä, Janne; von Bagh, Peter

    2016-07-27

    Weather conditions fundamentally affect the activity of short-lived insects. Annual variation in weather is therefore likely to be an important determinant of their between-year variation in dispersal, but conclusive empirical studies are lacking. We studied whether the annual variation of dispersal can be explained by the flight season's weather conditions in a Clouded Apollo (Parnassius mnemosyne) metapopulation. This metapopulation was monitored using the mark-release-recapture method for 12 years. Dispersal was quantified for each monitoring year using three complementary measures: emigration rate (fraction of individuals moving between habitat patches), average residence time in the natal patch, and average distance moved. There was much variation both in dispersal and average weather conditions among the years. Weather variables significantly affected the three measures of dispersal and together with adjusting variables explained 79-91% of the variation observed in dispersal. Different weather variables were selected in the models for the three dispersal measures, apparently because of notable intercorrelations among the weather variables. In general, dispersal rate increased with increasing temperature, solar radiation, proportion of especially warm days, and butterfly density, and decreased with increasing cloudiness, rainfall, and wind speed. These results help to understand and model annually varying dispersal dynamics of species affected by global warming. PMID:27440662
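
Two of the three dispersal measures can be computed directly from capture histories. The sketch below makes hypothetical assumptions: each recapture record is a (day, patch) pair, and patch centroids are known. It computes the emigration rate as the share of recaptured individuals ever caught in more than one patch. It also computes the mean straight-line distance moved per recaptured individual. Both are illustrative definitions, not necessarily the study's exact ones. Residence time usually needs survival and recapture modelling and is left out.

```python
from math import dist


def dispersal_summary(histories: dict[str, list[tuple[int, str]]],
                      patch_xy: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Emigration rate and mean distance moved, over individuals captured at least twice."""
    recaptured = [sorted(h) for h in histories.values() if len(h) > 1]
    if not recaptured:
        raise ValueError("no recaptured individuals")
    movers, distances = 0, []
    for history in recaptured:
        patches = [patch for _, patch in history]  # patches in capture order
        steps = list(zip(patches, patches[1:]))
        movers += any(a != b for a, b in steps)
        distances.append(sum(dist(patch_xy[a], patch_xy[b]) for a, b in steps))
    return movers / len(recaptured), sum(distances) / len(distances)


patch_xy = {"A": (0.0, 0.0), "B": (300.0, 400.0)}  # hypothetical centroids in metres
histories = {
    "ind1": [(1, "A"), (3, "B")],  # moved 500 m
    "ind2": [(1, "A"), (2, "A")],  # stayed
    "ind3": [(2, "B")],            # never recaptured, excluded
}
print(dispersal_summary(histories, patch_xy))  # (0.5, 250.0)
```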

  2. Evaluation of a Pooled Strategy for High-Throughput Sequencing of Cosmid Clones from Metagenomic Libraries

    PubMed Central

    Lam, Kathy N.; Hall, Michael W.; Engel, Katja; Vey, Gregory; Cheng, Jiujun; Neufeld, Josh D.; Charles, Trevor C.

    2014-01-01

    High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones. PMID:24911009

  3. Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries.

    PubMed

    Lam, Kathy N; Hall, Michael W; Engel, Katja; Vey, Gregory; Cheng, Jiujun; Neufeld, Josh D; Charles, Trevor C

    2014-01-01

    High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones. PMID:24911009
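
The evaluation above compares pooled assemblies with barcode-based reference sequences using identity and coverage. The abstract does not give formulas, so the sketch below assumes one common pair of definitions. Identity is the share of matching bases among aligned columns where neither sequence has a gap. Coverage is the share of reference bases that are aligned. The sketch expects a pairwise alignment that has already been computed, with gaps written as '-'. The example sequences are hypothetical.

```python
def identity_and_coverage(query_aln: str, ref_aln: str) -> tuple[float, float]:
    """Fractional identity and reference coverage from two gapped, equal-length alignment rows."""
    if len(query_aln) != len(ref_aln):
        raise ValueError("alignment rows must have equal length")
    aligned = [(q, r) for q, r in zip(query_aln, ref_aln) if q != "-" and r != "-"]
    if not aligned:
        return 0.0, 0.0
    matches = sum(q == r for q, r in aligned)
    ref_bases = sum(r != "-" for r in ref_aln)
    return matches / len(aligned), len(aligned) / ref_bases


# Hypothetical pooled contig vs barcode-based reference for one clone
identity, coverage = identity_and_coverage("ACGT--GT", "ACGTACGA")
print(round(identity, 3), round(coverage, 3))  # 0.833 0.75
```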

  4. Sequence variation and linkage disequilibrium in the human T-cell receptor beta (TCRB) locus.

    PubMed

    Subrahmanyan, L; Eberle, M A; Clark, A G; Kruglyak, L; Nickerson, D A

    2001-08-01

    The T-cell receptor (TCR) plays a central role in the immune system, and >90% of human T cells present a receptor that consists of the alpha TCR subunit (TCRA) and the beta subunit (TCRB). Here we report an analysis of 63 variable genes (BV), spanning 553 kb of TCRB that yielded 279 single-nucleotide polymorphisms (SNPs). Samples were drawn from 10 individuals and represent four populations: African American, Chinese, Mexican, and Northern European. We found nine variants that produce nonfunctional BV segments, removing those genes from the TCRB genomic repertoire. There was significant heterogeneity among population samples in SNP frequency (including the BV-inactivating sites), indicating the need for multiple-population samples for adequate variant discovery. In addition, we observed considerable linkage disequilibrium (LD) (r² > 0.1) over distances of approximately 30 kb in TCRB, and, in general, the distribution of r² as a function of physical distance was in close agreement with neutral coalescent simulations. LD in TCRB showed considerable spatial variation across the locus, being concentrated in "blocks" of LD; however, coalescent simulations of the locus illustrated that the heterogeneity of LD we observed in TCRB did not differ markedly from that expected from neutral processes. Finally, examination of the extended genotypes for each subject demonstrated homozygous stretches of >100 kb in the locus of several individuals. These results provide the basis for optimization of locuswide SNP typing in TCRB for studies of genotype-phenotype association. PMID:11438886

  5. Sequence Variation and Linkage Disequilibrium in the Human T-Cell Receptor β (TCRB) Locus

    PubMed Central

    Subrahmanyan, Lakshman; Eberle, Michael A.; Clark, Andrew G.; Kruglyak, Leonid; Nickerson, Deborah A.

    2001-01-01

    The T-cell receptor (TCR) plays a central role in the immune system, and >90% of human T cells present a receptor that consists of the α TCR subunit (TCRA) and the β subunit (TCRB). Here we report an analysis of 63 variable genes (BV), spanning 553 kb of TCRB that yielded 279 single-nucleotide polymorphisms (SNPs). Samples were drawn from 10 individuals and represent four populations: African American, Chinese, Mexican, and Northern European. We found nine variants that produce nonfunctional BV segments, removing those genes from the TCRB genomic repertoire. There was significant heterogeneity among population samples in SNP frequency (including the BV-inactivating sites), indicating the need for multiple-population samples for adequate variant discovery. In addition, we observed considerable linkage disequilibrium (LD) (r² > 0.1) over distances of ∼30 kb in TCRB, and, in general, the distribution of r² as a function of physical distance was in close agreement with neutral coalescent simulations. LD in TCRB showed considerable spatial variation across the locus, being concentrated in “blocks” of LD; however, coalescent simulations of the locus illustrated that the heterogeneity of LD we observed in TCRB did not differ markedly from that expected from neutral processes. Finally, examination of the extended genotypes for each subject demonstrated homozygous stretches of >100 kb in the locus of several individuals. These results provide the basis for optimization of locuswide SNP typing in TCRB for studies of genotype-phenotype association. PMID:11438886
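
The r² statistic used above measures how well the alleles at two SNPs predict each other. The minimal sketch below computes it from phased two-SNP haplotypes coded '0'/'1', which is a hypothetical input format. The study's own haplotype inference is not reproduced. The sketch uses D = p_AB - p_A p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)).

```python
def r_squared(haplotypes: list[str]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes such as '10' (SNP1 allele, SNP2 allele)."""
    n = len(haplotypes)
    p_a = sum(h[0] == "1" for h in haplotypes) / n  # frequency of allele 1 at SNP1
    p_b = sum(h[1] == "1" for h in haplotypes) / n  # frequency of allele 1 at SNP2
    p_ab = sum(h == "11" for h in haplotypes) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                             # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


print(r_squared(["11", "00", "11", "00"]))  # 1.0: alleles always co-occur
print(r_squared(["11", "10", "01", "00"]))  # 0.0: no association
```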

  6. Paleomagnetic directions and thermoluminescence dating from a bread oven-floor sequence in Lübeck (Germany): A record of 450 years of geomagnetic secular variation

    NASA Astrophysics Data System (ADS)

    Schnepp, Elisabeth; Pucher, Rudolf; Goedicke, Christian; Manzano, Ana; Müller, Uwe; Lanos, Philippe

    2003-02-01

    A record of about 450 years of geomagnetic secular variation is presented from a single archaeological site in Lübeck (Germany) where a sequence of 25 bread oven floors has been preserved in a bakery from medieval times until today. The age dating of the oven-floor sequence is based on historical documents, 14C dating and thermoluminescence dating, and spans the interval from about 1300 to 1800 A.D. Paleomagnetic directions have been determined from each oven floor by means of 198 oriented hand samples. After alternating field as well as thermal demagnetization experiments, the characteristic remanent magnetization direction was obtained using principal component analysis. The mean directions of 24 oven floors are characterized by high Fisherian precision parameters (>146) and small α95 confidence limits (1.2°-4.6°). To obtain a smooth curve of geomagnetic secular variation for Lübeck, a spherical spline function was fitted to the data using a Bayesian approach, which considers not only the obtained ages but also the stratigraphic order. Correlation with historical magnetic records suggests that the ages estimated for the upper 10 layers were too young and that these layers must date from the end of the sixteenth to the middle of the eighteenth century. For the lowermost 14 layers, dating is reliable and provides a secular variation curve for Germany. The inclination shows a minimum in the fourteenth century and then increases by more than 10°. Declination shows a local minimum around 1400 A.D., followed by a maximum in the seventeenth century, after which it moves about 30° westward.
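
The precision parameters and α95 limits quoted above come from Fisher statistics on unit vectors. The sketch below follows the standard formulas. The resultant length R of N unit vectors gives k ≈ (N - 1)/(N - R) and cos α95 = 1 - (N - R)/R · [(1/0.05)^(1/(N - 1)) - 1]. It is not the authors' processing code, and the specimen directions are hypothetical.

```python
from math import acos, asin, atan2, cos, degrees, radians, sin, sqrt


def fisher_mean(directions: list[tuple[float, float]], p: float = 0.05):
    """Fisher mean of (declination, inclination) pairs in degrees.

    Returns mean declination, mean inclination, precision estimate k and the
    (1 - p) confidence-cone half-angle (alpha95 for p = 0.05); angles in degrees.
    """
    n = len(directions)
    if n < 2:
        raise ValueError("need at least two directions")
    x = y = z = 0.0
    for dec, inc in directions:
        d, i = radians(dec), radians(inc)
        x += cos(i) * cos(d)  # north component
        y += cos(i) * sin(d)  # east component
        z += sin(i)           # down component
    r = sqrt(x * x + y * y + z * z)  # resultant length R
    if n - r < 1e-12:
        raise ValueError("directions are identical; k is undefined")
    mean_dec = degrees(atan2(y, x)) % 360
    mean_inc = degrees(asin(z / r))
    k = (n - 1) / (n - r)
    cos_alpha = 1 - (n - r) / r * ((1 / p) ** (1 / (n - 1)) - 1)
    alpha = degrees(acos(max(-1.0, cos_alpha)))  # clamp for very scattered data
    return mean_dec, mean_inc, k, alpha


# Hypothetical specimen directions from one oven floor
samples = [(350, 60), (10, 60), (0, 55), (0, 65)]
dec, inc, k, a95 = fisher_mean(samples)
print(f"D={dec:.1f} I={inc:.1f} k={k:.0f} a95={a95:.1f}")
```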

  7. A 454 multiplex sequencing method for rapid and reliable genotyping of highly polymorphic genes in large-scale studies

    PubMed Central

    2010-01-01

    Background High-throughput sequencing technologies offer new perspectives for biomedical, agronomical and evolutionary research. Promising progress now concerns the application of these technologies to large-scale studies of genetic variation. Such studies require the genotyping of high numbers of samples. This is theoretically possible using 454 pyrosequencing, which generates billions of base pairs of sequence data. However, several challenges arise: first in the attribution of each read produced to its original sample, and second, in bioinformatic analyses to distinguish true from artifactual sequence variation. This pilot study proposes a new application for the 454 GS FLX platform, allowing the individual genotyping of thousands of samples in one run. A probabilistic model has been developed to demonstrate the reliability of this method. Results DNA amplicons from 1,710 rodent samples were individually barcoded using a combination of tags located in forward and reverse primers. Amplicons consisted of 222 bp fragments corresponding to DRB exon 2, a highly polymorphic gene in mammals. A total of 221,789 reads were obtained, of which 153,349 were finally assigned to original samples. Rules based on a probabilistic model and a four-step procedure were developed to validate sequences and provide a confidence level for each genotype. The method gave promising results, with the genotyping of DRB exon 2 sequences for 1,407 samples from 24 different rodent species and the sequencing of 392 variants in one half of a 454 run. Using replicates, we estimated that the reproducibility of genotyping reached 95%. Conclusions This new approach is a promising alternative to classical methods involving electrophoresis-based techniques for variant separation and cloning-sequencing for sequence determination. The 454 system is less costly and time consuming and may enhance the reliability of genotypes obtained when high numbers of samples are studied. It opens up new perspectives
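    The sample-assignment step rests on combinatorial tagging: with F forward and R reverse tags, F × R samples can be told apart by the tag pair found on each read. A minimal Python sketch, assuming a simplified read layout (forward tag at the 5′ end, reverse complement of the reverse tag at the 3′ end, equal tag lengths, exact matches only); the tag sequences and helper names are hypothetical, and the study's probabilistic model and four-step validation are not reproduced.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def build_sample_map(fwd_tags, rev_tags):
    """Give every (forward tag, reverse tag) pair its own sample index.

    F forward and R reverse tags distinguish F * R samples.
    """
    return {pair: idx for idx, pair in enumerate(product(fwd_tags, rev_tags))}

def assign_read(read, sample_map, tag_len):
    """Return the sample index of a read, or None if its tag pair is unknown.

    Assumes the read starts with the forward tag and ends with the reverse
    complement of the reverse tag. Reads with tag errors are discarded.
    """
    fwd = read[:tag_len]
    rev = revcomp(read[-tag_len:])
    return sample_map.get((fwd, rev))
```

    With three hypothetical forward tags and two reverse tags, six samples can be multiplexed; a read whose tags match no known pair is left unassigned rather than guessed.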

  8. Computational methods for detecting copy number variations in cancer genome using next generation sequencing: principles and challenges

    PubMed Central

    Liu, Biao; Morrison, Carl D.; Johnson, Candace S.; Trump, Donald L.; Qin, Maochun; Conroy, Jeffrey C.; Wang, Jianmin; Liu, Song

    2013-01-01

    Accurate detection of somatic copy number variations (CNVs) is an essential part of cancer genome analysis, and plays an important role in oncotarget identification. Next generation sequencing (NGS) holds the promise to revolutionize somatic CNV detection. In this review, we provide an overview of current analytic tools used for CNV detection in NGS-based cancer studies. We summarize the NGS data types used for CNV detection, decipher the principles for data preprocessing, segmentation, and interpretation, and discuss the challenges in somatic CNV detection. This review aims to provide a guide to the analytic tools used in NGS-based cancer CNV studies, and to discuss the important factors that researchers need to consider when analyzing NGS data for somatic CNV detection. PMID:24240121

  9. Assessing Mitochondrial DNA Variation and Copy Number in Lymphocytes of ~2,000 Sardinians Using Tailored Sequencing Analysis Tools.

    PubMed

    Ding, Jun; Sidore, Carlo; Butler, Thomas J; Wing, Mary Kate; Qian, Yong; Meirelles, Osorio; Busonero, Fabio; Tsoi, Lam C; Maschio, Andrea; Angius, Andrea; Kang, Hyun Min; Nagaraja, Ramaiah; Cucca, Francesco; Abecasis, Gonçalo R; Schlessinger, David

    2015-07-01

    DNA sequencing identifies common and rare genetic variants for association studies, but studies typically focus on variants in nuclear DNA and ignore the mitochondrial genome. In fact, analyzing variants in mitochondrial DNA (mtDNA) sequences presents special problems, which we resolve here with a general solution for the analysis of mtDNA in next-generation sequencing studies. The new program package comprises 1) an algorithm designed to identify mtDNA variants (i.e., homoplasmies and heteroplasmies), incorporating sequencing error rates at each base in a likelihood calculation and allowing allele fractions at a variant site to differ across individuals; and 2) an estimation of mtDNA copy number in a cell directly from whole-genome sequencing data. We also apply the methods to DNA sequence from lymphocytes of ~2,000 SardiNIA Project participants. As expected, mothers and offspring share all homoplasmies but a lesser proportion of heteroplasmies. Both homoplasmies and heteroplasmies show 5-fold higher transition/transversion ratios than variants in nuclear DNA. Also, heteroplasmy increases with age, though on average only ~1 heteroplasmy reaches the 4% level between ages 20 and 90. In addition, we find that mtDNA copy number averages ~110 copies/lymphocyte and is ~54% heritable, implying substantial genetic regulation of the level of mtDNA. Copy numbers also decrease modestly but significantly with age, and females on average have significantly more copies than males. The mtDNA copy numbers are significantly associated with waist circumference (p-value = 0.0031) and waist-hip ratio (p-value = 2.4×10⁻⁵), but not with body mass index, indicating an association with central fat distribution. To our knowledge, this is the largest population analysis to date of mtDNA dynamics, revealing the age-imposed increase in heteroplasmy, the relatively high heritability of copy number, and the association of copy number with metabolic traits. PMID:26172475
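    The transition/transversion ratio compared above counts purine-purine (A↔G) and pyrimidine-pyrimidine (C↔T) substitutions against all other single-base changes. A short, hypothetical Python sketch of that count from (reference, alternate) base pairs; it is not part of the program package described in the abstract.

```python
# Transitions: A<->G (purines) and C<->T (pyrimidines); all else is a transversion.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def ts_tv_ratio(variants):
    """Transition/transversion ratio for a list of (ref, alt) single-base changes."""
    ts = tv = 0
    for ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt:
            continue  # not a substitution
        if frozenset((ref, alt)) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")
```

    For example, the changes A>G, C>T, T>C and A>C give three transitions and one transversion, a ratio of 3.0.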

  10. High-resolution genetic mapping of maize pan-genome sequence anchors.

    PubMed

    Lu, Fei; Romay, Maria C; Glaubitz, Jeffrey C; Bradbury, Peter J; Elshire, Robert J; Wang, Tianyu; Li, Yu; Li, Yongxiang; Semagn, Kassa; Zhang, Xuecai; Hernandez, Alvaro G; Mikel, Mark A; Soifer, Ilya; Barad, Omer; Buckler, Edward S

    2015-01-01

    In addition to single-nucleotide polymorphisms, structural variation is abundant in many plant genomes. The structural variation across a species can be represented by a 'pan-genome', which is essential to fully understand the genetic control of phenotypes. However, the pan-genome's complexity hinders its accurate assembly via sequence alignment. Here we demonstrate an approach to facilitate pan-genome construction in maize. By performing 18 trillion association tests we map 26 million tags generated by reduced representation sequencing of 14,129 maize inbred lines. Using machine-learning models we select 4.4 million accurately mapped tags as sequence anchors, 1.1 million of which are presence/absence variations. Structural variations exhibit enriched association with phenotypic traits, indicating that structural variation is a significant source of adaptive variation in maize. The ability to efficiently map ultrahigh-density pan-genome sequence anchors enables fine characterization of structural variation and will advance both genetic research and breeding in many crops. PMID:25881062

  11. HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment

    PubMed Central

    Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.

    2016-01-01

    Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175

  12. High resolution clustering of Salmonella enterica serovar Montevideo strains using a next-generation sequencing approach

    PubMed Central

    2012-01-01

    Background Next-Generation Sequencing (NGS) is increasingly being used as a molecular epidemiologic tool for discerning ancestry and traceback of the most complicated, difficult to resolve bacterial pathogens. Making a linkage between possible food sources and clinical isolates requires distinguishing the suspected pathogen from an environmental background and placing the variation observed into the wider context of variation occurring within a serovar and among other closely related foodborne pathogens. Equally important is the need to validate these high resolution molecular tools for use in molecular epidemiologic traceback. Such efforts include the examination of strain cluster stability as well as the cumulative genetic effects of sub-culturing on these clusters. Numerous isolates of S. Montevideo were shotgun sequenced, including diverse lineage representatives as well as numerous replicate clones, to determine how much variability is due to bias, sequencing error, and/or the culturing of isolates. All new draft genomes were compared to 34 S. Montevideo isolates previously published during an NGS-based molecular epidemiological case study. Results Intraserovar lineages of S. Montevideo differ by thousands of SNPs, only slightly fewer than the number of SNPs observed between S. Montevideo and other distinct serovars. Much less variability was discovered within an individual S. Montevideo clade implicated in a recent foodborne outbreak, as well as among individual NGS replicates. These findings were similar to previous reports documenting homopolymeric and deletion error rates with the Roche 454 GS Titanium technology. In no case, however, did variability associated with sequencing methods or sample preparations create inconsistencies with our current phylogenetic results or the subsequent molecular epidemiological evidence gleaned from these data. Conclusions Implementation of a validated pipeline for NGS data acquisition and analysis provides highly
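    Statements such as "lineages differ by thousands of SNPs" rest on pairwise SNP distances between aligned genomes. A hypothetical Python sketch of such a distance matrix, assuming the genomes are already aligned to equal length and that 'N' and '-' mark uncalled positions, which are skipped; it is not the validated pipeline referred to in the abstract.

```python
from itertools import combinations

def snp_distance(a, b, missing="N-"):
    """Count positions where two aligned genomes carry different called bases.

    Positions with a missing or gap character in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b)
               if x != y and x not in missing and y not in missing)

def distance_matrix(genomes):
    """Symmetric pairwise SNP distances for {isolate name: aligned sequence}."""
    names = sorted(genomes)
    dist = {(n, n): 0 for n in names}
    for p, q in combinations(names, 2):
        d = snp_distance(genomes[p], genomes[q])
        dist[(p, q)] = dist[(q, p)] = d
    return dist
```

    A matrix like this can then feed a clustering or tree-building step; replicate sequencing runs of the same isolate would be expected to sit at near-zero distance.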

  13. Lack of Structural Variation but Extensive Length Polymorphisms and Heteroplasmic Length Variations in the Mitochondrial DNA Control Region of Highly Inbred Crested Ibis, Nipponia nippon.

    PubMed

    He, Xue-Lian; Ding, Chang-Qing; Han, Jian-Lin

    2013-01-01

    Animal mitochondrial DNA (mtDNA) length polymorphism and heteroplasmy are generally accepted to be universal. Here we report the lack of structural variation but the presence of length polymorphism as well as heteroplasmy in the mtDNA control region of an endangered avian species - the Crested Ibis (Nipponia nippon). The complete control region was directly sequenced, while the distribution pattern and inheritance of the length variations were examined using both direct sequencing and genotyping of the PCR fragments from captive birds with pedigrees, wild birds and a historical specimen. Our results demonstrated that there was no structural variation in the control region; however, different numbers of short tandem repeats with an identical motif of CA3CA2CA3 at the 3'-end of the control region determined the length polymorphisms among and heteroplasmy within individual birds. There were one to three predominant fragments in every bird; nevertheless, multiple minor fragments coexisted in all birds. These extremely high polymorphisms were suggested to have arisen through 'replication slippage' of a perfect microsatellite evolving under the stepwise mutation model. The patterns of heteroplasmy were found to shift between generations and among siblings but were rather stable between blood and feather samples. This study provides the first evidence of very extensive mtDNA length polymorphism and heteroplasmy in the highly inbred Crested Ibis, which carries an mtDNA genome lacking structural genetic diversity. The analysis of pedigreed samples also sheds light on the transmission of mtDNA length heteroplasmy in birds following the genetic bottleneck theory. Further research focusing on the generation and transmission of particular mtDNA heteroplasmy patterns in single germ lines of the Crested Ibis is encouraged by this study. PMID:23805212

  14. Lack of Structural Variation but Extensive Length Polymorphisms and Heteroplasmic Length Variations in the Mitochondrial DNA Control Region of Highly Inbred Crested Ibis, Nipponia nippon

    PubMed Central

    He, Xue-Lian; Ding, Chang-Qing; Han, Jian-Lin

    2013-01-01

    Animal mitochondrial DNA (mtDNA) length polymorphism and heteroplasmy are generally accepted to be universal. Here we report the lack of structural variation but the presence of length polymorphism as well as heteroplasmy in the mtDNA control region of an endangered avian species – the Crested Ibis (Nipponia nippon). The complete control region was directly sequenced, while the distribution pattern and inheritance of the length variations were examined using both direct sequencing and genotyping of the PCR fragments from captive birds with pedigrees, wild birds and a historical specimen. Our results demonstrated that there was no structural variation in the control region; however, different numbers of short tandem repeats with an identical motif of CA3CA2CA3 at the 3′-end of the control region determined the length polymorphisms among and heteroplasmy within individual birds. There were one to three predominant fragments in every bird; nevertheless, multiple minor fragments coexisted in all birds. These extremely high polymorphisms were suggested to have arisen through ‘replication slippage’ of a perfect microsatellite evolving under the stepwise mutation model. The patterns of heteroplasmy were found to shift between generations and among siblings but were rather stable between blood and feather samples. This study provides the first evidence of very extensive mtDNA length polymorphism and heteroplasmy in the highly inbred Crested Ibis, which carries an mtDNA genome lacking structural genetic diversity. The analysis of pedigreed samples also sheds light on the transmission of mtDNA length heteroplasmy in birds following the genetic bottleneck theory. Further research focusing on the generation and transmission of particular mtDNA heteroplasmy patterns in single germ lines of the Crested Ibis is encouraged by this study. PMID:23805212
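    Length variants of this kind differ in how many copies of the repeat unit a molecule carries. A hypothetical Python sketch, assuming the shorthand CA3CA2CA3 denotes the unit CAAACAACAAA (a digit repeats the preceding base) and counting only perfect, uninterrupted copies. In the study itself, length variants were scored by direct sequencing and PCR-fragment genotyping, not by code like this.

```python
import re

def expand_motif(compact):
    """Expand shorthand such as 'CA3CA2CA3' into 'CAAACAACAAA'.

    A digit after a base means that base is repeated that many times.
    """
    return "".join(base * int(n or 1)
                   for base, n in re.findall(r"([ACGT])(\d*)", compact))

def longest_tandem_run(seq, motif):
    """Largest number of consecutive, uninterrupted copies of motif in seq."""
    best = 0
    for m in re.finditer(f"(?:{re.escape(motif)})+", seq):
        best = max(best, len(m.group()) // len(motif))
    return best
```

    Applied to a sequence carrying one block of three copies and another of five, the function reports 5, the longest perfect run.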

  15. A High-Definition View of Functional Genetic Variation from Natural Yeast Genomes

    PubMed Central

    Bergström, Anders; Simpson, Jared T.; Salinas, Francisco; Barré, Benjamin; Parts, Leopold; Zia, Amin; Nguyen Ba, Alex N.; Moses, Alan M.; Louis, Edward J.; Mustonen, Ville; Warringer, Jonas; Durbin, Richard; Liti, Gianni

    2014-01-01

    The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies. PMID:24425782

  16. Integration of high-resolution seismic with core data delineates sequence stratigraphy of a shelf-edge delta complex

    SciTech Connect

    Combes, J.M.; Nissen, S.E.; Scott, R.W.

    1995-12-31

    Correlation of high resolution seismic and corehole data sets obtained offshore Louisiana by a cooperative consortium of Louisiana State University and ten petroleum industry partners has resulted in a detailed sequence stratigraphic interpretation of a Late Pleistocene shelf margin delta system. High resolution stratal geometries have been interpreted within this framework of genetically related facies and key sequence surfaces have been identified both on the high resolution seismic lines and in the core data. Regional expressions of chronostratigraphically identified sequence-bounding unconformities and transgressive ravinement surfaces emphasize the importance of these surfaces in determining stratigraphic relationships. Several key conclusions resulted from this study: (1) The optimum location for interpretation of sequence surfaces is within or near the locus of maximum deposition. (2) At a distance from a depocenter the characteristic features of sequence surfaces lose seismic resolution and minor, subtle variations in the reflection character are the only seismic indicators of major boundaries. (3) Shelf edge deltaic deposits are known to contain important hydrocarbon reservoirs and this latest Pleistocene system provides an excellent model for older Cenozoic systems. (4) Potential deep sea fan reservoirs may accumulate seaward of shelf margin deltas during both falling and rising sea level stages depending upon local sedimentological conditions.

  17. De Novo Assembly of Bitter Gourd Transcriptomes: Gene Expression and Sequence Variations in Gynoecious and Monoecious Lines.

    PubMed

    Shukla, Anjali; Singh, V K; Bharadwaj, D R; Kumar, Rajesh; Rai, Ashutosh; Rai, A K; Mugasimangalam, Raja; Parameswaran, Sriram; Singh, Major; Naik, P S

    2015-01-01

    Bitter gourd (Momordica charantia L.) is a nutritious vegetable crop of Asian origin, used as a medicinal herb in Indian and Chinese traditional medicine. Molecular breeding in bitter gourd is in its infancy, due to limited molecular resources, particularly functional markers for traits such as gynoecy. We performed de novo transcriptome sequencing of bitter gourd using an Illumina next-generation sequencer, from root, flower bud, stem and leaf samples of a gynoecious line (Gy323) and a monoecious line (DRAR1). A total of 65,540 transcripts for Gy323 and 61,490 for DRAR1 were obtained. Comparisons revealed SNP and SSR variations between these lines and enabled the identification of gene classes. Based on the available transcripts, we identified 80 WRKY transcription factors, several of which are reported to be involved in responses to biotic and abiotic stresses, and 56 ARF genes, which play a pivotal role in auxin-regulated gene expression and development. The data presented will be useful in both functional studies and breeding programs in bitter gourd. PMID:26047102

  18. De Novo Assembly of Bitter Gourd Transcriptomes: Gene Expression and Sequence Variations in Gynoecious and Monoecious Lines

    PubMed Central

    Shukla, Anjali; Singh, V. K.; Bharadwaj, D. R.; Kumar, Rajesh; Rai, Ashutosh; Rai, A. K.; Mugasimangalam, Raja; Parameswaran, Sriram; Singh, Major; Naik, P. S.

    2015-01-01

    Bitter gourd (Momordica charantia L.) is a nutritious vegetable crop of Asian origin, used as a medicinal herb in Indian and Chinese traditional medicine. Molecular breeding in bitter gourd is in its infancy, due to limited molecular resources, particularly functional markers for traits such as gynoecy. We performed de novo transcriptome sequencing of bitter gourd using an Illumina next-generation sequencer, from root, flower bud, stem and leaf samples of a gynoecious line (Gy323) and a monoecious line (DRAR1). A total of 65,540 transcripts for Gy323 and 61,490 for DRAR1 were obtained. Comparisons revealed SNP and SSR variations between these lines and enabled the identification of gene classes. Based on the available transcripts, we identified 80 WRKY transcription factors, several of which are reported to be involved in responses to biotic and abiotic stresses, and 56 ARF genes, which play a pivotal role in auxin-regulated gene expression and development. The data presented will be useful in both functional studies and breeding programs in bitter gourd. PMID:26047102

  19. Spatial thickness variation of Carboniferous coal-bearing sequences: A sedimentological response to varying levels of compactional and structural control

    SciTech Connect

    Liu, Yuejin; Ferm, J.C. (Dept. of Geological Sciences)

    1992-01-01

    A study of 1,120 borehole records from Carboniferous coal-bearing rocks in a 160 square mile area in southeastern Kentucky shows that within a stratigraphic interval of about 2,000 feet, the major lithic components are coarsening-upward sequences and coal groups. The latter are groups of rocks averaging 120 feet in thickness which include coal, thin mudstone and sandstone of channel or splay origin. The coarsening-upward sequences, which consist of mudstone overlain by sandstone, are of two types: one is thick (mean thickness 170 feet/52 m) and very widespread, and the other is thin (mean thickness 70 feet/21 m) and has only local distribution. Variogram and trend surface procedures were used to characterize the dimension and areal distribution of these rock bodies. The results show that thickness variation is a product of differential compaction and movement of deep-seated structures contemporaneous with sedimentation. Structural effects on two scales can be recognized, one on the order of 6 to 10 miles and the other greater than 20 miles. Differential compaction effects are found to be closely associated with those produced by the smaller scale structures, while some of the large scale structure effects are concordant with present Alleghenian structures.
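    Variogram analysis summarizes how dissimilar unit thicknesses become as the distance between boreholes grows. A minimal sketch of the classical (Matheron) estimator, γ(h) = Σ(z_i - z_j)² / (2N(h)) over the N(h) borehole pairs in each distance bin; the binning scheme, the function name and the toy data are assumptions, not the authors' procedure.

```python
import math
from collections import defaultdict
from itertools import combinations

def empirical_semivariogram(points, lag_width):
    """Classical (Matheron) semivariogram estimator.

    points: list of (x, y, value), e.g. borehole location and unit thickness.
    Returns {bin index: gamma}; bin b holds pair distances in
    [b * lag_width, (b + 1) * lag_width), and gamma is
    sum((z_i - z_j) ** 2) / (2 * N) over the N pairs in that bin.
    """
    sq_sums = defaultdict(float)
    counts = defaultdict(int)
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        b = int(math.dist((x1, y1), (x2, y2)) // lag_width)
        sq_sums[b] += (z1 - z2) ** 2
        counts[b] += 1
    return {b: sq_sums[b] / (2 * counts[b]) for b in sorted(counts)}
```

    Plotting γ against lag distance shows the range at which thickness values stop being correlated, which is one way structural effects at scales such as 6 to 10 miles versus more than 20 miles could show up.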

  20. The R package otu2ot for implementing the entropy decomposition of nucleotide variation in sequence data

    PubMed Central

    Ramette, Alban; Buttigieg, Pier Luigi

    2014-01-01

    Oligotyping is a novel, supervised computational method that classifies closely related sequences into “oligotypes” (OTs) based on subtle nucleotide variation (Eren et al., 2013). Its application to microbial datasets has helped reveal ecological patterns which are often hidden by the way sequence data are currently clustered to define operational taxonomic units (OTUs). Here, we implemented the OT entropy decomposition procedure and its unsupervised version, Minimal Entropy Decomposition (MED; Eren et al., 2014c), in the statistical programming language and environment R. The aim of this implementation is to facilitate the integration of computational routines, interactive statistical analyses, and visualization into a single framework. In addition, two complementary approaches are implemented: (1) an analytical method (the broken stick model) to help identify OTs of low abundance that could be generated by chance alone, and (2) a one-pass profiling (OP) method to efficiently identify the OTUs for which subsequent oligotyping would be most promising. These enhancements are especially useful for large datasets, where manual screening of entropy analysis results and the creation of a full set of OTs may not be feasible. The package and procedures are illustrated by several tutorials and examples. PMID:25452747
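    Oligotyping relies on the Shannon entropy of each alignment position: high-entropy positions carry the informative variation used to split sequences into oligotypes. The Python sketch below (not the otu2ot R API) computes per-column entropy in bits and the expected ranked proportions under the broken-stick model, E_i = (1/n) Σ_{k=i}^{n} 1/k. How otu2ot compares observed abundances against these expectations is not shown, and the function names are hypothetical.

```python
import math
from collections import Counter

def column_entropy(aligned):
    """Shannon entropy (bits) of each column of equal-length aligned reads."""
    n = len(aligned)
    entropies = []
    for column in zip(*aligned):
        counts = Counter(column)
        h = sum(-(c / n) * math.log2(c / n) for c in counts.values())
        entropies.append(h)
    return entropies

def broken_stick(n):
    """Expected relative abundances of n ranked categories (largest first).

    Broken-stick model: E_i = (1/n) * sum_{k=i}^{n} 1/k for i = 1..n.
    """
    return [sum(1.0 / k for k in range(i, n + 1)) / n for i in range(1, n + 1)]
```

    For four reads differing only at the last position (two T, two A), the entropies are 0, 0, 0 and 1 bit, so only that position would be used to separate oligotypes; for two categories the broken-stick expectations are 0.75 and 0.25.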

  1. Phylogeny and chromosomal variations in East Asian Carex, Siderostictae group (Cyperaceae), based on DNA sequences and cytological data.

    PubMed

    Yano, Okihito; Ikeda, Hiroshi; Jin, Xiao-Feng; Hoshino, Takuji

    2014-01-01

    Carex (Cyperaceae) is one of the largest genera of flowering plants and comprises more than 2,000 species. Within Carex, section Siderostictae, with broader leaves and distributed in East Asia, is thought to be an ancestral group. We aimed to clarify the phylogenetic relationships and chromosomal variations within section Siderostictae, and to examine the relationships of broad-leaved species of sections Hemiscaposae and Surculosae from East Asia, inferred from DNA sequences and cytological data. Our results indicate that a monophyletic Siderostictae clade, including sections Hemiscaposae, Siderostictae and Surculosae, is the earliest diverging group in the tribe Cariceae. Low chromosome numbers, 2n = 12 or 24, with large chromosome sizes were observed in these three sections. Our results suggest that the genus Carex might have originated in East Asia or become relictually restricted to East Asia. The geographical distributions of diploid species are restricted to narrower areas, while those of tetraploid species are wider in East Asia. It is concluded that chromosomal variations in the Siderostictae clade may have been caused by polyploidization and that tetraploid species may have been able to exploit their habitats through polyploidization. PMID:23857080

  2. Contamination-controlled high-throughput whole genome sequencing for influenza A viruses using the MiSeq sequencer.

    PubMed

    Lee, Hong Kai; Lee, Chun Kiat; Tang, Julian Wei-Tze; Loh, Tze Ping; Koay, Evelyn Siew-Chuan

    2016-01-01

    Accurate full-length genomic sequences are important for viral phylogenetic studies. We developed a targeted high-throughput whole genome sequencing (HT-WGS) method for influenza A viruses, which utilized an enzymatic cleavage-based approach, the Nextera XT DNA library preparation kit, for library preparation. The entire library preparation workflow was adapted for the Sentosa SX101, a liquid handling platform, to automate this labor-intensive step. As the enzymatic cleavage-based approach generates low coverage reads at both ends of the cleaved products, we corrected this loss of sequencing coverage at the termini by introducing modified primers during the targeted amplification step to generate full-length influenza A sequences with even coverage across the whole genome. Another challenge of targeted HTS is the risk of specimen-to-specimen cross-contamination during the library preparation step that results in the calling of false-positive minority variants. We included an in-run, negative system control to capture contamination reads that may be generated during the liquid handling procedures. The upper limits of 99.99% prediction intervals of the contamination rate were adopted as cut-off values of contamination reads. Here, 148 influenza A/H3N2 samples were sequenced using the HTS protocol and were compared against a Sanger-based sequencing method. Our data showed that the rate of specimen-to-specimen cross-contamination was highly significant in HTS. PMID:27624998

  3. Estimating copy numbers of alleles from population-scale high-throughput sequencing data

    PubMed Central

    2015-01-01

    Background With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and genotypes have been proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping of CNV loci. Results We propose a novel approach to infer copy unit alleles and their numbers in each sample simultaneously from population-scale HTS data by variational Bayesian inference on a generative probabilistic model inspired by latent Dirichlet allocation, which is a well studied model for document classification problems. In simulation studies, we evaluated concordance between inferred and true copy unit alleles for lower-, middle-, and higher-copy number datasets, in which precision and recall were ≥ 0.9 for data with mean coverage ≥ 10× per copy unit. We also applied the approach to HTS data of 1123 samples at the highly variable salivary amylase gene locus and a pseudogene locus, and confirmed consistency of the estimated alleles within samples belonging to a trio of CEPH/Utah pedigree 1463 with 11 offspring. Conclusions Our proposed approach enables detailed analysis of copy number variations, such as association studies between copy unit alleles and phenotypes or biological features including human diseases. PMID:25707811

  4. Next-generation-sequencing of recurrent childhood high hyperdiploid acute lymphoblastic leukemia reveals mutations typically associated with high risk patients.

    PubMed

    Chen, Cai; Bartenhagen, Christoph; Gombert, Michael; Okpanyi, Vera; Binder, Vera; Röttgers, Silja; Bradtke, Jutta; Teigler-Schlegel, Andrea; Harbott, Jochen; Ginzel, Sebastian; Thiele, Ralf; Husemann, Peter; Krell, Pina F I; Borkhardt, Arndt; Dugas, Martin; Hu, Jianda; Fischer, Ute

    2015-09-01

    Twenty percent of children suffering from high hyperdiploid acute lymphoblastic leukemia develop recurrent disease. The molecular mechanisms are largely unknown. Here, we analyzed the genetic landscape of five patients at relapse, who developed recurrent disease without prior high-risk indication, using whole-exome and whole-genome sequencing. Oncogenic mutations of RAS pathway genes (NRAS, KRAS, FLT3, n=4) and deactivating mutations of major epigenetic regulators (CREBBP, EP300, each n=2 and ARID4B, EZH2, MACROD2, MLL2, each n=1) were prominent in these cases and virtually absent in non-recurrent cases (n=6) or other pediatric acute lymphoblastic leukemia cases (n=18). At relapse, nucleotide variations were detected in cell fate determining transcription factors (GLIS1, AKNA). Structural genomic alterations affected genes regulating B-cell development (IKZF1, PBX1, RUNX1). Eleven novel translocations involved the genes ART4, C12orf60, MACROD2, TBL1XR1, LRRN4, KIAA1467, and ELMO1/MIR1200. Typically, patients harbored only single structural variations, except for one patient who displayed massive rearrangements in the context of a germline tumor suppressor TP53 mutation and a Li-Fraumeni syndrome-like family history. Another patient harbored a germline mutation in the DNA repair factor ATM. In summary, the relapse patients of our cohort were characterized by somatic mutations affecting the RAS pathway, epigenetic and developmental programs, and by germline mutations in DNA repair pathways. PMID:26189108

  5. Persistent Sub-Yearly Chromospheric Variations in Lower Main-Sequence Stars: Tau Booe and alpha Com

    NASA Technical Reports Server (NTRS)

    Maulik, Davesh; Donahue, Robert A.; Baliunas, Sallie L.

    1997-01-01

    The recent discoveries of extrasolar planetary systems around lower main-sequence stars such as tau Booe (HD 120136) have prompted further investigation into their stellar activity. A cursory analysis of tau Booe for cyclic chromospheric activity, based on its 30-yr record of Ca II H and K fluxes obtained as part of the HK Project at Mount Wilson Observatory, finds an intermediate, sub-yearly period (approximately 117 d) in chromospheric activity in addition to, and separate from, both its rotation (3.3 d) and long-term variability. As a persistent sub-yearly period in surface magnetic activity is unprecedented, we investigate this apparent anomaly further by examining chromospheric activity levels of other stars with similar mass, searching for variability in chromospheric activity with periods of less than one year but longer than measured or predicted rotation. An examination of the time series of 40 mid-to-late F dwarfs yielded one other star for further analysis: alpha Com (HD 114378, P approximately 132 d). The variations for these two stars were checked for persistence and coherence. Based on these determinations, we eliminate the possibilities of rotation, long-term activity cycle, and the evolution of active regions as the cause of this variation in both stars. In particular, for tau Booe we infer that the phenomenon may be chromospheric in origin; beyond this, however, it is difficult to identify anything further regarding the cause of the activity variations, or even whether the observed modulations in the two stars have the same origin.
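    Finding a period such as ~117 d in an unevenly sampled, decades-long record is typically done with a periodogram. The abstract does not name the method used, so the sketch below is an assumption: a plain-Python classical Lomb-Scargle periodogram, exercised on a synthetic sinusoid rather than on Mount Wilson data. The function name and test series are hypothetical.

```python
import math

def lomb_scargle(times, values, periods):
    """Classical Lomb-Scargle periodogram for unevenly sampled data.

    Returns one power per trial period (periods in the same unit as times),
    normalised by twice the sample variance.
    """
    n = len(values)
    mean = sum(values) / n
    y = [v - mean for v in values]
    var = sum(v * v for v in y) / (n - 1)
    powers = []
    for period in periods:
        w = 2.0 * math.pi / period
        # Offset tau makes the sine and cosine terms orthogonal at this frequency.
        tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                         sum(math.cos(2 * w * t) for t in times)) / (2 * w)
        c = [math.cos(w * (t - tau)) for t in times]
        s = [math.sin(w * (t - tau)) for t in times]
        yc = sum(a * b for a, b in zip(y, c))
        ys = sum(a * b for a, b in zip(y, s))
        p = (yc * yc / sum(a * a for a in c) + ys * ys / sum(a * a for a in s)) / (2 * var)
        powers.append(p)
    return powers
```

    Scanning trial periods between 50 and 300 d for an irregularly sampled 117 d sinusoid places the highest power at, or within a few days of, 117 d; separating such a signal from rotation and long-term cycles still requires the persistence and coherence checks the authors describe.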

  6. Highly Informative Simple Sequence Repeat (SSR) Markers for Fingerprinting Hazelnut

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Simple sequence repeat (SSR) or microsatellite markers have many applications in breeding and genetic studies of plants, including fingerprinting of cultivars and investigations of genetic diversity, and therefore provide information for better management of germplasm collections. They are repeatab...

  7. (Genomic variation in maize)

    SciTech Connect

    Rivin, C.J.

    1991-01-01

    These studies have sought to learn how different DNA sequences and sequence arrangements contribute to genome plasticity in maize. We describe quantitative variation among maize inbred lines for tandemly arrayed and dispersed repeated DNA sequences and gene families, and qualitative variation for sequences homologous to the Mutator family of transposons. The potential of these sequences to undergo unequal crossing over, non-allelic (ectopic) recombination and transposition makes them a source of genome instability. We have found examples of rapid genomic change involving these sequences in F1 hybrids, tissue culture cells and regenerated plants. We describe the repetitive portion of the maize genome as composed primarily of sequences that vary markedly in copy number among different genetic stocks. The most highly variable is the 185-bp repeat associated with the heterochromatic chromosome knobs. Even in lines without visible knobs, there is a considerable quantity of tandemly arrayed repeats. We also found a high degree of variability for the tandemly arrayed 5S and ribosomal DNA repeats. While such variation might be expected as the result of unequal crossing over, we were surprised to find considerable variation among lower-copy-number, dispersed repeats as well. One highly repeated sequence with a complex tandem and dispersed arrangement stood out as showing no detectable variability among the maize lines. In striking contrast to the variability seen between the inbred stocks, individuals within a stock were indistinguishable with regard to their repeated sequence multiplicities.

  8. Genetic variation and population differentiation in a medical herb Houttuynia cordata in China revealed by inter-simple sequence repeats (ISSRs).

    PubMed

    Wei, Lin; Wu, Xian-Jin

    2012-01-01

    Houttuynia cordata is an important traditional Chinese herb with unresolved genetics and taxonomy, which leads to potential problems in the conservation and utilization of the resource. Inter-simple sequence repeat (ISSR) markers were used to assess the level and distribution of genetic diversity in 226 individuals from 15 populations of H. cordata in China. ISSR analysis revealed low genetic variation within populations but high genetic differentiation among populations. This genetic structure probably reflects mainly the historical association among populations. Genetic cluster analysis showed that the basal clade is composed of populations from Southwest China, and that the other populations have continuous, eastward distributions. The structure of genetic diversity in H. cordata suggests that this species may have survived in Southwest China during the glacial age and subsequently undergone an eastward postglacial expansion. Based on the results of the genetic analysis, we propose that as many populations as possible be included as conservation targets. PMID:22942696

  9. Sequence polymorphism of GroEL gene in natural population of Bacillus and Brevibacillus spp. that showed variation in thermal tolerance capacity and mRNA expression.

    PubMed

    Sen, R; Tripathy, S; Padhi, S K; Mohanty, S; Maiti, N K

    2014-10-01

    GroEL, a class I chaperonin, plays an important role in the thermal adaptation of the cell and helps to maintain cell viability under heat shock. The function of GroEL in vivo depends on maintaining the proper structure of the protein, which in turn depends on the nucleotide and amino acid sequence of the gene. In this study, we investigated changes in the nucleotide and amino acid sequences of the partial groEL gene that may affect the thermotolerance capacity as well as the mRNA expression of bacterial isolates. Sequences within the same species that differed at the amino acid level were identified as different alleles. The effect of allelic variation on groEL expression was analyzed by comparison and relative quantification of each allele under thermal shock by RT-PCR. Evaluation of the Ka/Ks ratio among strains of the same species showed that the groEL gene of all the species had undergone similar functional constraint during evolution. Strains showing similar thermotolerance capacity were found to carry the same groEL allele. Isolates carrying an allele with an amino acid substitution inside the highly conserved ATP/ADP- or Mg(2+)-binding region could not tolerate thermal stress and showed lower expression of the groEL gene. Our results indicate that during the evolution of these bacterial species the groEL gene has undergone natural selection, and that the isolates have evolved groEL allelic sequences that help them withstand thermal stress during their interaction with the environment. PMID:24894903

  10. SPITZER IRS SPECTRAL MAPPING OF THE TOOMRE SEQUENCE: SPATIAL VARIATIONS OF PAH, GAS, AND DUST PROPERTIES IN NEARBY MAJOR MERGERS

    SciTech Connect

    Haan, S.; Armus, L.; Laine, S.; Surace, J. A.; Diaz-Santos, T.; Beirao, P.; Stierwalt, S.; Charmandaris, V.; Smith, J. D.; Schweizer, F.; Murphy, E. J.; Brandl, B.; Evans, A. S.; Hibbard, J. E.; Yun, M.; Jarrett, T. H.

    2011-12-01

    We have mapped the key mid-IR diagnostics in eight major merger systems of the Toomre sequence (NGC 4676, NGC 7592, NGC 6621, NGC 2623, NGC 6240, NGC 520, NGC 3921, and NGC 7252) using the Spitzer Infrared Spectrograph. With these maps, we explore the variation of the ionized-gas, polycyclic aromatic hydrocarbon (PAH), and warm gas (H₂) properties across the sequence and within the galaxies. While the global PAH interband strength and ionized gas flux ratios ([Ne III]/[Ne II]) are similar to those of normal star-forming galaxies, the distribution of the spatially resolved PAH and fine-structure line flux ratios differs significantly from one system to another. Rather than a constant H₂/PAH flux ratio, we find that the relation between the H₂ and PAH fluxes is characterized by a power law with a roughly constant exponent (0.61 ± 0.05) over all merger components and spatial scales. While following the same power law on local scales, three galaxies have a factor of 10 larger integrated (i.e., global) H₂/PAH flux ratio than the rest of the sample, even larger than that found in most nearby active galactic nuclei. These findings suggest a common dominant excitation mechanism for H₂ emission over a large range of global H₂/PAH flux ratios in major mergers. Early-merger systems show a different distribution between the cold (CO J = 1-0) and warm (H₂) molecular gas components, which is likely due to the merger interaction. Strong evidence for buried star formation in the overlap region of the merging galaxies is found in two merger systems (NGC 6621 and NGC 7592), as seen in the PAH, [Ne II], [Ne III], and warm gas line emission, but with no apparent corresponding CO (J = 1-0) emission. The minimum of the 11.3/7.7 μm PAH interband strength ratio is typically located in the nuclei of galaxies, while the [Ne III]/[Ne II] ratio increases with distance from the nucleus. Our findings also demonstrate that the variations of

  11. Spatial stress variations in the aftershock sequence following the 2008 M6 earthquake doublet in the South Iceland Seismic Zone

    NASA Astrophysics Data System (ADS)

    Hensch, M.; Árnadóttir, Th.; Lund, B.; Brandsdóttir, B.

    2012-04-01

    The South Iceland Seismic Zone (SISZ) is an approximately 80-km-wide E-W transform zone bridging the offset between the Eastern Volcanic Zone and the Hengill triple junction to the west. The plate motion is accommodated in the brittle crust by faulting on many N-S-trending right-lateral strike-slip faults with 2-5 km separation. Major sequences of large earthquakes (M>6) have occurred repeatedly in the SISZ since the settlement of Iceland more than a thousand years ago. On 29 May 2008, two M6 earthquakes struck the western part of the SISZ on two adjacent N-S faults within a few seconds of each other. The intense aftershock sequence was recorded by the permanent Icelandic SIL network and by a promptly installed temporary network of 11 portable seismometers in the source region. The network located thousands of aftershocks during the following days, illuminating a 12-17 km long region along both major fault ruptures, as well as several smaller parallel faults along a diffuse E-W-trending region west of the mainshock area without any preceding main rupture. This episode is suggested to be the continuation of an earthquake sequence that started with two M6.5 and several M5-6 events in June 2000. The time delay between the 2000 and 2008 events could be due to an inflation episode in Hengill during 1993-1998 that potentially locked N-S strike-slip faults in the western part of the SISZ. Around 300 focal mechanism solutions for aftershocks have been derived from P-wave polarities, showing predominantly strike-slip movements with occasional normal-faulting components (unstable P-axis direction), which suggests an extensional stress regime as their driving force. A subsequent stress inversion of four different aftershock clusters reveals slight variations in the directions of the average σ3 axes. While for both southern clusters, including the E-W cluster, the σ3 axes are oriented roughly perpendicular to the overall plate spreading axis, they are more northerly trending for shallower clusters

  12. Accurate and reliable high-throughput detection of copy number variation in the human genome

    PubMed Central

    Fiegler, Heike; Redon, Richard; Andrews, Dan; Scott, Carol; Andrews, Robert; Carder, Carol; Clark, Richard; Dovey, Oliver; Ellis, Peter; Feuk, Lars; French, Lisa; Hunt, Paul; Kalaitzopoulos, Dimitrios; Larkin, James; Montgomery, Lyndal; Perry, George H.; Plumb, Bob W.; Porter, Keith; Rigby, Rachel E.; Rigler, Diane; Valsesia, Armand; Langford, Cordelia; Humphray, Sean J.; Scherer, Stephen W.; Lee, Charles; Hurles, Matthew E.; Carter, Nigel P.

    2006-01-01

    This study describes a new tool for accurate and reliable high-throughput detection of copy number variation in the human genome. We have constructed a large-insert clone DNA microarray covering the entire human genome at tiling-path resolution, which we have used to identify copy number variation in human populations. Crucial to this study has been the development of a robust array platform and analytic process for the automated identification of copy number variants (CNVs). The array consists of 26,574 clones covering 93.7% of euchromatic regions. Clones were selected primarily from the published “Golden Path,” and mapping was confirmed by fingerprinting and BAC-end sequencing. Array performance was extensively tested by a series of validation assays. These included determining the hybridization characteristics of each individual clone on the array by chromosome-specific add-in experiments. Data reproducibility and false-positive/negative rates were estimated using self–self hybridizations, replicate experiments, and independent validations of CNVs. Based on these studies, we developed a variance-based automatic copy number detection analysis process (CNVfinder) and have demonstrated its robustness by comparison with the SW-ARRAY method. PMID:17122085

  13. Bats aloft: Variation in echolocation call structure at high altitudes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Bats alter their echolocation calls in response to changes in ecological and behavioral conditions, but little is known about how they adjust their call structure in response to changes in altitude. This study examines altitudinal variation in the echolocation calls of Brazilian free-tailed bats, T...

  14. Genetic structure of the widespread and common Mediterranean bryophyte Pleurochaete squarrosa (Brid.) Lindb. (Pottiaceae) - evidence from nuclear and plastidic DNA sequence variation and allozymes.

    PubMed

    Grundmann, Michael; Ansell, Stephen W; Russell, Stephen J; Koch, Marcus A; Vogel, Johannes C

    2007-02-01

    The Mediterranean Basin, as one of the world's most biologically diverse regions, provides an interesting area for the study of plant evolution and spatial structure in plant populations. The dioecious moss Pleurochaete squarrosa is a widespread and common bryophyte in the Mediterranean Basin. Thirty populations were sampled for a study on molecular diversity and genetic structure, covering most major islands and mainland populations from Europe and Africa. A significant decline in nuclear and chloroplast sequence and allozyme variation within populations from west to east was observed. While DNA sequence data showed patterns of isolation by distance, allozyme markers did not. Instead, their considerable interpopulation genetic differentiation appeared to be unrelated to geographic distance. Similarly high values for coefficients of gene diversity (G(ST)) in all data sets provided evidence of geographic isolation and limited gene flow among populations (i) within islands, (ii) within mainland areas, and (iii) between islands and mainland. Notably, populations in continental Spain are strongly genetically isolated from all other investigated areas. Surprisingly, there was no difference in gene diversity and G(ST) between islands and mainland areas. Thus, we conclude that large Mediterranean islands may function as 'mainland' for bryophytes. This hypothesis and its implications for the conservation biology of cryptogamic plants warrant further investigation. While sexually reproducing populations were found throughout the Mediterranean Basin, high levels of multilocus linkage disequilibrium provide evidence of mainly vegetative propagation, even in populations where sexual reproduction was observed. PMID:17284206

  15. PhyPA: Phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences.

    PubMed

    Xia, Xuhua

    2016-09-01

    While pairwise sequence alignment (PSA) by dynamic programming is guaranteed to generate one of the optimal alignments, multiple sequence alignment (MSA) of highly divergent sequences often results in poorly aligned sequences, plaguing all subsequent phylogenetic analysis. One way to avoid this problem is to use only PSA to reconstruct phylogenetic trees, which can only be done with distance-based methods. I compared the accuracy of this new computational approach (named PhyPA, for phylogenetics by pairwise alignment) against the maximum likelihood method using MSA (the ML+MSA approach), based on nucleotide, amino acid and codon sequences simulated with different topologies and tree lengths. I present the surprising finding that the fast PhyPA method consistently outperforms the slow ML+MSA approach for highly diverged sequences, even when all optimization options are turned on for the ML+MSA approach. Only when sequences are not highly diverged (i.e., when a reliable MSA can be obtained) does the ML+MSA approach outperform PhyPA. The true topologies are always recovered by ML with the true alignment from the simulation. However, with MSAs derived from alignment programs such as MAFFT or MUSCLE, the recovered topology consistently has a higher likelihood than the true topology. Thus, the failure of ML+MSA to recover the true topology is due not to insufficient search of tree space, but to the distortion of phylogenetic signal by MSA methods. I have implemented PhyPA in DAMBE, together with two approaches that use multi-gene data sets to derive phylogenetic support for subtrees, equivalent to resampling techniques such as bootstrapping and jackknifing. PMID:27377322
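    The core idea described above, aligning each pair of sequences independently by dynamic programming and feeding the resulting distances to a distance-based tree method, can be illustrated with a minimal sketch. This is not DAMBE's implementation: the function names, the match/mismatch/gap scores, the choice of the Jukes-Cantor (JC69) nucleotide distance, and the rule of skipping gapped columns are all illustrative assumptions.

```python
import math
from itertools import combinations


def nw_align(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences (Needleman-Wunsch, linear gap penalty).

    Scoring parameters are illustrative assumptions, not PhyPA's settings.
    Returns one optimal alignment as two equal-length gapped strings.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal path
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def jc69_distance(x, y):
    """Jukes-Cantor (JC69) distance from one aligned pair; gapped columns skipped."""
    sites = [(u, v) for u, v in zip(x, y) if u != "-" and v != "-"]
    if not sites:
        return math.inf  # no comparable sites
    p = sum(u != v for u, v in sites) / len(sites)  # proportion of differing sites
    if p >= 0.75:
        return math.inf  # saturated: the JC69 correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def pairwise_distance_matrix(seqs):
    """Align every pair independently (no MSA) and return a symmetric distance dict."""
    dist = {(name, name): 0.0 for name in seqs}
    for u, v in combinations(seqs, 2):
        x, y = nw_align(seqs[u], seqs[v])
        dist[(u, v)] = dist[(v, u)] = jc69_distance(x, y)
    return dist
```

    The resulting matrix would then be passed to a distance-based tree builder such as neighbor joining (not shown). Because every pair gets its own optimal alignment, no single multiple alignment can distort the signal, which is the property the abstract credits for PhyPA's accuracy on highly diverged sequences; amino acid or codon data would need a different distance model than JC69.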

  16. Variation in sequences containing microsatellite motifs in the perennial biomass and forage grass, Phalaris arundinacea (Poaceae).

    PubMed

    Barth, Susanne; Jankowska, Marta Jolanta; Hodkinson, Trevor Roland; Vellani, Tia; Klaas, Manfred

    2016-01-01

    Forty-three microsatellite markers were developed for further genetic characterisation of a forage and biomass grass crop for which genomic resources are currently scarce. The microsatellite markers were developed from a normalized EST-SSR library. All 43 markers gave a clear banding pattern on 3% Metaphor agarose gels. Eight selected SSR markers were tested in detail for polymorphism across eleven DNA samples from a wide geographic range across Europe. The new set of 43 SSR markers will help future research to characterise the genetic structure and diversity of Phalaris arundinacea, with the potential to further understand its invasive character in North American wetlands, as well as aid breeding work for desired biomass and forage traits. P. arundinacea is particularly valued at northern latitudes as a crop with high biomass potential, especially on marginal lands. PMID:27005474

  17. Groupwise registration of cardiac perfusion MRI sequences using normalized mutual information in high dimension

    NASA Astrophysics Data System (ADS)

    Hamrouni, Sameh; Rougon, Nicolas; Prêteux, Françoise

    2011-03-01

    In perfusion MRI (p-MRI) exams, short-axis (SA) image sequences are captured at multiple slice levels along the long axis of the heart during the transit of a vascular contrast agent (Gd-DTPA) through the cardiac chambers and muscle. Compensating for cardio-thoracic motion is a prerequisite for computer-aided quantitative assessment of myocardial ischaemia from contrast-enhanced p-MRI sequences. The classical paradigm consists of registering each sequence frame on a reference image using some intensity-based matching criterion. In this paper, we introduce a novel unsupervised method for the spatio-temporal groupwise registration of cardiac p-MRI exams based on normalized mutual information (NMI) between high-dimensional feature distributions. Here, local contrast enhancement curves are used as a dense set of spatio-temporal features and are statistically matched, through variational optimization, to a target feature distribution derived from a registered reference template. The hard problem of probability density estimation in high-dimensional state spaces is bypassed by using consistent geometric entropy estimators, allowing NMI to be computed directly from feature samples. Specifically, a computationally efficient k-th-nearest-neighbor (kNN) estimation framework is retained, leading to closed-form expressions for the gradient flow of NMI over finite- and infinite-dimensional motion spaces. This approach is applied to the groupwise alignment of cardiac p-MRI exams using a free-form deformation (FFD) model for cardio-thoracic motion. Experiments on simulated and natural datasets suggest its accuracy and robustness for registering p-MRI exams comprising more than 30 frames.

  18. High resolution sequence stratigraphic analysis of the Late Miocene Abu Madi Formation, Northern Nile Delta Basin

    NASA Astrophysics Data System (ADS)

    Sarhan, Mohammad Abdelfattah

    2015-12-01

    The Abu Madi Formation represents the Upper Miocene (Messinian) in the Nile Delta basin. It consists mainly of sandstones with shale intercalations and, because of its richness in hydrocarbons, has been subdivided by petroleum companies into Level-I, Level-II and Level-III, ordered by increasing sandstone-to-shale ratio. The Miocene cycle in the northern subsurface section of the Nile Delta encompasses three main formations, namely, from the base: the Sidi Salim Formation, the Qawasim Formation, and the Abu Madi Formation at the top. A high-resolution sequence stratigraphic analysis based on gamma-ray responses was carried out for this Late Miocene formation in the northern part of the Nile Delta subsurface section. For this purpose, the gamma-ray logs of ten deep wells, arranged in four cross-sections trending roughly north-south across the northern Nile Delta, were analyzed. The analysis revealed that the interpreted 4th-order depositional cycles within the Abu Madi Formation vary greatly in both number and gamma-ray response from well to well and cannot be traced laterally, even to the nearest well. These variations in the interpreted 4th-order depositional sequences could be attributed to normal faults buried in the areas lying between the investigated wells. This finding is consistent with the conclusion that the Abu Madi Formation is part of the Upper Miocene Nile Delta syn-rift megasequence, developed during the Upper Miocene rift phase of the Red Sea - Gulf of Suez province in Egypt. Accordingly, in sequence stratigraphic terms, the depositional history of the Abu Madi Formation was strongly overprinted by tectonic controls rather than by relative sea-level changes, which are assumed to be of secondary influence. Regarding the hydrocarbon aspects of the Abu Madi Formation, the present work recommends directing drilling efforts toward the stratigraphic traps

  19. PCR Strategies for Complete Allele Calling in Multigene Families Using High-Throughput Sequencing Approaches.

    PubMed

    Marmesat, Elena; Soriano, Laura; Mazzoni, Camila J; Sommer, Simone; Godoy, José A

    2016-01-01

    The characterization of multigene families with high copy number variation is often approached through PCR amplification with highly degenerate primers to account for all expected variants flanking the region of interest. Such an approach often introduces PCR biases that result in an unbalanced representation of targets in high-throughput sequencing libraries, which eventually leads to incomplete detection of the targeted alleles. Here we confirm this result and propose two different amplification strategies to alleviate this problem. The first strategy (called pooled-PCRs) targets different subsets of alleles in multiple independent PCRs using different moderately degenerate primer pairs, whereas the second approach (called pooled-primers) uses a custom-made pool of non-degenerate primers in a single PCR. We compare their performance to the common use of a single PCR with highly degenerate primers, using the MHC class I of the Iberian lynx as a model. We found that both novel approaches work similarly well and better than the conventional approach. They scored significantly more alleles per individual (11.33 ± 1.38 and 11.72 ± 0.89 vs 7.94 ± 1.95), yielded more complete allelic profiles (96.28 ± 8.46 and 99.50 ± 2.12 vs 63.76 ± 15.43), and revealed more alleles at the population level (13 vs 12). Finally, we could link each allele's amplification efficiency to the primer mismatches in its flanking sequences and show that the ultra-deep coverage offered by high-throughput technologies does not fully compensate for such biases, especially as real alleles may reach lower coverage than artefacts. Adopting either of the proposed amplification methods provides the opportunity to attain more complete allelic profiles at lower coverage, improving confidence in downstream analyses and subsequent applications. PMID:27294261

  20. PCR Strategies for Complete Allele Calling in Multigene Families Using High-Throughput Sequencing Approaches

    PubMed Central

    Marmesat, Elena; Soriano, Laura; Mazzoni, Camila J.; Sommer, Simone

    2016-01-01

    The characterization of multigene families with high copy number variation is often approached through PCR amplification with highly degenerate primers to account for all expected variants flanking the region of interest. Such an approach often introduces PCR biases that result in an unbalanced representation of targets in high-throughput sequencing libraries, which eventually leads to incomplete detection of the targeted alleles. Here we confirm this result and propose two different amplification strategies to alleviate this problem. The first strategy (called pooled-PCRs) targets different subsets of alleles in multiple independent PCRs using different moderately degenerate primer pairs, whereas the second approach (called pooled-primers) uses a custom-made pool of non-degenerate primers in a single PCR. We compare their performance to the common use of a single PCR with highly degenerate primers, using the MHC class I of the Iberian lynx as a model. We found that both novel approaches work similarly well and better than the conventional approach. They scored significantly more alleles per individual (11.33 ± 1.38 and 11.72 ± 0.89 vs 7.94 ± 1.95), yielded more complete allelic profiles (96.28 ± 8.46 and 99.50 ± 2.12 vs 63.76 ± 15.43), and revealed more alleles at the population level (13 vs 12). Finally, we could link each allele’s amplification efficiency to the primer mismatches in its flanking sequences and show that the ultra-deep coverage offered by high-throughput technologies does not fully compensate for such biases, especially as real alleles may reach lower coverage than artefacts. Adopting either of the proposed amplification methods provides the opportunity to attain more complete allelic profiles at lower coverage, improving confidence in downstream analyses and subsequent applications. PMID:27294261

  1. Population genomics of Pacific lamprey: adaptive variation in a highly dispersive species.

    PubMed

    Hess, Jon E; Campbell, Nathan R; Close, David A; Docker, Margaret F; Narum, Shawn R

    2013-06-01

    Unlike most anadromous fishes that have evolved strict homing behaviour, Pacific lamprey (Entosphenus tridentatus) seem to lack philopatry as evidenced by minimal population structure across the species range. Yet unexplained findings of within-region population genetic heterogeneity coupled with the morphological and behavioural diversity described for the species suggest that adaptive genetic variation underlying fitness traits may be responsible. We employed restriction site-associated DNA sequencing to genotype 4439 quality filtered single nucleotide polymorphism (SNP) loci for 518 individuals collected across a broad geographical area including British Columbia, Washington, Oregon and California. A subset of putatively neutral markers (N = 4068) identified a significant amount of variation among three broad populations: northern British Columbia, Columbia River/southern coast and 'dwarf' adults (F(CT) = 0.02, P ≪ 0.001). Additionally, 162 SNPs were identified as adaptive through outlier tests, and inclusion of these markers revealed a signal of adaptive variation related to geography and life history. The majority of the 162 adaptive SNPs were not independent and formed four groups of linked loci. Analyses with matsam software found that 42 of these outlier SNPs were significantly associated with geography, run timing and dwarf life history, and 27 of these 42 SNPs aligned with known genes or highly conserved genomic regions using the genome browser available for sea lamprey. This study provides both neutral and adaptive context for observed genetic divergence among collections and thus reconciles previous findings of population genetic heterogeneity within a species that displays extensive gene flow. PMID:23205767

  2. Structure of the Bacterial Cytoskeleton Protein Bactofilin by NMR Chemical Shifts and Sequence Variation.

    PubMed

    Kassem, Maher M; Wang, Yong; Boomsma, Wouter; Lindorff-Larsen, Kresten

    2016-06-01

    Bactofilins constitute a recently discovered class of bacterial proteins that form cytoskeletal filaments. They share a highly conserved domain (DUF583) whose structure remains unknown, in part due to the large size and noncrystalline nature of the filaments. Here, we describe the atomic structure of a bactofilin domain from Caulobacter crescentus. To determine the structure, we developed an approach that combines a biophysical model for proteins with recently obtained solid-state NMR spectroscopy data and amino acid contacts predicted from a detailed analysis of the evolutionary history of bactofilins. Our structure reveals a triangular β-helical (solenoid) conformation with conserved residues forming the tightly packed core and polar residues lining the surface. The repetitive structure explains the presence of internal repeats as well as strongly conserved positions, and is reminiscent of other fibrillar proteins. Our work provides a structural basis for future studies of bactofilin biology and for designing molecules that target them, as well as a starting point for determining the organization of the entire bactofilin filament. Finally, our approach presents new avenues for determining structures that are difficult to obtain by traditional means. PMID:27276252

  3. High-Quality Draft Genome Sequence of Bacillus subtilis Strain WAUSV36

    PubMed Central

    Town, Jennifer; Audy, Patrice; Boyetchko, Susan M.

    2016-01-01

    Bacillus subtilis strain WAUSV36 inhibits the growth of and decreases disease symptoms caused by the potato pathogen Phytophthora infestans. We determined the sequence of the 4.7-Mbp genome of this strain. WAUSV36 shared very high nucleotide sequence identity with previously sequenced strains of B. subtilis. PMID:27340068

  4. Putting Physics First: Three Case Studies of High School Science Department and Course Sequence Reorganization

    ERIC Educational Resources Information Center

    Larkin, Douglas B.

    2016-01-01

    This article examines the process of shifting to a "Physics First" sequence in science course offerings in three school districts in the United States. This curricular sequence reverses the more common U.S. high school sequence of biology/chemistry/physics, and has gained substantial support in the physics education community over the…

  5. High Precision Measurement of Stellar Radial Velocity Variations

    NASA Technical Reports Server (NTRS)

    Cochran, W. D.

    1984-01-01

    A prototype instrument for measuring stellar radial velocity variations to a precision of a few meters per second is discussed. The instrument will be used to study low-amplitude stellar non-radial oscillations, to search for binary systems with large mass ratios, and ultimately to search for extrasolar planetary systems. The instrument uses a stable Fabry-Perot etalon, in reflection, to impose a set of fixed reference absorption lines on the stellar spectrum before it enters the coude spectrograph of the McDonald Observatory 2.7-m telescope. The spectrum is recorded on the Octicon detector, which consists of eight Reticon arrays placed end to end. Radial velocity variations of the star are detected by measuring the shift of the stellar lines with respect to the artificial Fabry-Perot lines and correcting for the known motions in the solar system.
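    The measurement principle described above, converting the shift of stellar lines against fixed reference lines into a velocity and then removing the observer's own motion, can be illustrated with a minimal sketch using the first-order (non-relativistic) Doppler formula v = c Δλ/λ. This does not reproduce the McDonald instrument's actual data reduction: the function names are hypothetical, line positions are assumed to be already measured and calibrated against the Fabry-Perot lines, and the observer's line-of-sight velocity is assumed to be supplied from an external solar-system ephemeris.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum (m/s, exact SI value)


def line_velocity(observed_nm, reference_nm):
    """First-order (non-relativistic) Doppler velocity in m/s; positive = redshift."""
    return C_M_PER_S * (observed_nm - reference_nm) / reference_nm


def rv_variation(epoch_a_nm, epoch_b_nm, observer_away_a=0.0, observer_away_b=0.0):
    """Change in stellar radial velocity (m/s) from epoch A to epoch B.

    epoch_a_nm, epoch_b_nm: positions of the same stellar lines, in the same order,
        assumed already calibrated against the fixed reference (etalon) lines.
    observer_away_a/b: the observer's line-of-sight velocity away from the star at
        each epoch (m/s), assumed to come from a solar-system ephemeris.
    """
    if not epoch_a_nm or len(epoch_a_nm) != len(epoch_b_nm):
        raise ValueError("need the same non-empty set of lines at both epochs")
    # Average the per-line velocity shifts to reduce measurement noise
    shifts = [line_velocity(b, a) for a, b in zip(epoch_a_nm, epoch_b_nm)]
    measured = sum(shifts) / len(shifts)
    # Remove the part of the shift caused by the observer's own motion
    return measured - (observer_away_b - observer_away_a)
```

    For scale, a shift of 0.00001 nm in a line at 500 nm corresponds to roughly 6 m/s, which shows why a fixed, stable wavelength reference is needed to reach a precision of a few meters per second. Sign conventions for the observer-motion correction differ between software packages, so the one used here is only one possible choice.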

  6. Phylogeography of East Asian Lespedeza buergeri (Fabaceae) based on chloroplast and nuclear ribosomal DNA sequence variations.

    PubMed

    Jin, Dong-Pil; Lee, Jung-Hyun; Xu, Bo; Choi, Byoung-Hee

    2016-09-01

    The dynamic changes in land configuration during the Quaternary, accompanied by climatic oscillations, have significantly influenced the current distribution and genetic structure of warm-temperate forests in East Asia. Although recent surveys have been conducted, the historical migration of forest species via land bridges and, especially, the origins of Korean populations remain conjectural. Here, we reveal the genetic structure of Lespedeza buergeri, a warm-temperate shrub that is disjunctively distributed around the East China Sea (ECS) in China, Korea, and Japan. Two non-coding regions (rpl32-trnL, psbA-trnH) of chloroplast DNA (cpDNA) and the internal transcribed spacer of nuclear ribosomal DNA (nrITS) were analyzed for 188 individuals from 16 populations, covering almost its entire distribution. The nrITS data demonstrated a genetic structure that followed geographic boundaries. This examination used AMOVA, comparisons of genetic differentiation based on haplotype frequency and on genetic mutations among haplotypes, and Mantel tests. However, the cpDNA data showed a contrasting genetic pattern, implying that this difference was due to a slower mutation rate in cpDNA than in nrITS. These results indicated frequent migration of this species via an ECS land bridge during the early Pleistocene that then tapered off gradually toward the late Pleistocene. Genetic isolation between western and eastern Japan was consistent with the broad consensus suggested by other warm-temperate plants in that country. For Korean populations, high genetic diversity indicated the existence of refugia on the Korean Peninsula during the Last Glacial Maximum. However, their closeness to western Japanese populations at the haplotype-clade level implied that gene flow from western Japanese refugia was possible until post-glacial processing occurred through the Korea/Tsushima Strait land bridge. PMID:27206725

  7. Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea.

    PubMed

    Goldsmith, Dawn B; Parsons, Rachel J; Beyene, Damitu; Salamon, Peter; Breitbart, Mya

    2015-01-01

    Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years. PMID:26157645
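    OTUs defined at 97% sequence identity are typically built by clustering reads around centroid sequences. A minimal greedy sketch, assuming the sequences are already aligned to equal length (real pipelines align each read to candidate centroids and usually sort input by abundance first); the function names are hypothetical:

```python
from collections import Counter
from typing import List

def identity(a: str, b: str) -> float:
    # Fraction of matching positions; assumes pre-aligned, equal-length sequences.
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs: List[str], threshold: float = 0.97) -> List[int]:
    """Assign each sequence to the first centroid within `threshold` identity,
    otherwise make it the centroid of a new OTU. Returns one OTU label per sequence."""
    centroids: List[str] = []
    labels: List[int] = []
    for s in seqs:
        for k, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                labels.append(k)
                break
        else:
            centroids.append(s)
            labels.append(len(centroids) - 1)
    return labels

def otu_sizes(labels: List[int]) -> List[int]:
    # OTU sizes from largest to smallest, i.e. a rank-abundance curve.
    return sorted(Counter(labels).values(), reverse=True)
```

    The rank-abundance output is where a pattern like "a few large, common OTUs and many small, rare ones" becomes visible.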

  8. Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea

    PubMed Central

    Goldsmith, Dawn B.; Parsons, Rachel J.; Beyene, Damitu; Salamon, Peter

    2015-01-01

    Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years. PMID:26157645

  9. High natural gene expression variation in the reef-building coral Acropora millepora: potential for acclimative and adaptive plasticity

    PubMed Central

    2013-01-01

    Background Ecosystems worldwide are suffering the consequences of anthropogenic impact. The diverse ecosystem of coral reefs, for example, is globally threatened by increases in sea surface temperatures due to global warming. Studies to date have focused on determining genetic diversity, the sequence variability of genes in a species, as a proxy to estimate and predict the potential adaptive response of coral populations to environmental changes linked to climate change. However, natural gene expression variation has received less attention. This variation has been implicated as an important factor in evolutionary processes, upon which natural selection can act. Results We acclimatized coral nubbins from six colonies of the reef-building coral Acropora millepora to a common garden at Heron Island (Great Barrier Reef, GBR) for four weeks to remove any site-specific environmental effects on the physiology of the coral nubbins. Using a cDNA microarray platform, we detected a high level of gene expression variation, with 17% (488) of the unigenes differentially expressed across coral nubbins of the six colonies (jsFDR-corrected, p < 0.01). Among the main categories of biological processes found differentially expressed were transport, translation, response to stimulus, oxidation-reduction processes, and apoptosis. We found that the transcriptional profiles did not correspond to the genotype of the colony characterized using either an intron of the carbonic anhydrase gene or microsatellite loci markers. Conclusion Our results provide evidence of high inter-colony variation in A. millepora at the transcriptomic level when grown in a common garden, without a correspondence with genotypic identity. This finding highlights the importance of taking into account natural variation between reef corals when assessing experimental gene expression differences. The high transcriptional variation detected in this study is…
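    Calling hundreds of unigenes "differentially expressed" requires correcting for the number of genes tested. The abstract reports jsFDR correction; the sketch below instead uses the standard Benjamini-Hochberg step-up procedure, a swapped-in and more widely known false discovery rate method, to show the general idea:

```python
from typing import List, Sequence

def benjamini_hochberg(pvalues: Sequence[float], q: float = 0.05) -> List[int]:
    """Return indices of hypotheses rejected at false discovery rate q
    (Benjamini-Hochberg step-up; not the jsFDR method used in the study)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Find the largest rank k with p_(k) <= (k / m) * q.
        if pvalues[i] <= rank / m * q:
            k_max = rank
    return sorted(order[:k_max])
```

    Every hypothesis ranked at or below the largest passing rank is rejected, even if its own p-value missed its individual threshold; that step-up rule is what distinguishes the procedure from a per-test cutoff.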

  10. Assessing diversity of the female urine microbiota by high throughput sequencing of 16S rDNA amplicons

    PubMed Central

    2011-01-01

    Background Urine within the urinary tract is commonly regarded as "sterile" in cultivation terms. Here, we present a comprehensive in-depth study of bacterial 16S rDNA sequences associated with urine from healthy females by means of culture-independent high-throughput sequencing techniques. Results Sequencing of the V1V2 and V6 regions of the 16S ribosomal RNA gene using the 454 GS FLX system was performed to characterize the possible bacterial composition in 8 culture-negative (<100,000 CFU/ml) healthy female urine specimens. Sequences were compared to 16S rRNA databases and showed significant diversity, with the predominant genera detected being Lactobacillus, Prevotella and Gardnerella. The bacterial profiles in the female urine samples studied were complex; considerable variation between individuals was observed and a common microbial signature was not evident. Notably, a significant number of sequences belonging to bacteria with a known pathogenic potential were observed. The number of operational taxonomic units (OTUs) for individual samples varied substantially, ranging from 20 to 500. Conclusions Normal female urine displays a noticeable and variable bacterial 16S rDNA sequence richness, which includes fastidious and anaerobic bacteria previously shown to be associated with female urogenital pathology. PMID:22047020
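    Observed OTU counts such as those above depend on sequencing depth, so richness is often also summarized with an estimator that accounts for unseen OTUs. The sketch below uses the classic Chao1 estimator, a swapped-in illustration rather than a method reported in the study; the input is a hypothetical list of read counts per OTU for one sample:

```python
from typing import Sequence

def chao1(otu_counts: Sequence[int]) -> float:
    """Chao1 richness: observed OTUs plus a correction based on
    singletons (f1) and doubletons (f2)."""
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used here when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2
```

    Many singletons relative to doubletons push the estimate well above the observed count, signalling that deeper sequencing would likely reveal more OTUs.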

  11. Optimizing sparse sequencing of single cells for highly multiplex copy number profiling

    PubMed Central

    Baslan, Timour; Kendall, Jude; Ward, Brian; Cox, Hilary; Leotta, Anthony; Rodgers, Linda; Riggs, Michael; D'Italia, Sean; Sun, Guoli; Yong, Mao; Miskimen, Kristy; Gilmore, Hannah; Saborowski, Michael; Dimitrova, Nevenka; Krasnitz, Alexander; Harris, Lyndsay; Wigler, Michael; Hicks, James

    2015-01-01

    Genome-wide analysis at the level of single cells has recently emerged as a powerful tool to dissect genome heterogeneity in cancer, neurobiology, and development. To be truly transformative, single-cell approaches must affordably accommodate large numbers of single cells. This is feasible in the case of copy number variation (CNV), because CNV determination requires only sparse sequence coverage. We have used a combination of bioinformatic and molecular approaches to optimize single-cell DNA amplification and library preparation for highly multiplexed sequencing, yielding a method that can produce genome-wide CNV profiles of up to a hundred individual cells on a single lane of an Illumina HiSeq instrument. We apply the method to human cancer cell lines and biopsied cancer tissue, thereby illustrating its efficiency, reproducibility, and power to reveal underlying genetic heterogeneity and clonal phylogeny. The capacity of the method to facilitate the rapid profiling of hundreds to thousands of single-cell genomes represents a key step in making single-cell profiling an easily accessible tool for studying cell lineage. PMID:25858951
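    Sparse coverage suffices for CNV calling because copy number is read from read counts aggregated over large genomic bins rather than from individual bases. A minimal sketch, assuming fixed-width bins, a diploid baseline, and no GC correction or segmentation (the published method is more involved; these function names are hypothetical):

```python
from typing import List, Sequence

def bin_counts(read_positions: Sequence[int], genome_length: int,
               bin_size: int) -> List[int]:
    # Count reads whose mapped start position falls in each fixed-width bin.
    # Positions must lie in [0, genome_length).
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_positions:
        counts[pos // bin_size] += 1
    return counts

def copy_number_calls(counts: Sequence[int], ploidy: int = 2) -> List[int]:
    # Scale each bin by the genome-wide mean so that an average bin equals `ploidy`,
    # then round to the nearest integer copy number.
    mean = sum(counts) / len(counts)
    return [round(ploidy * c / mean) for c in counts]
```

    With only a handful of reads per bin, counting noise dominates; the usual remedy is larger bins plus segmentation across neighbouring bins, which is omitted here.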

  12. Comparison of Predicted Scaffold-Compatible Sequence Variation in the Triple-Hairpin Structure of Human Immunodeficiency Virus Type 1 gp41 with Patient Data

    PubMed Central

    Boutonnet, Nathalie; Janssens, Wouter; Boutton, Carlo; Verschelde, Jean-Luc; Heyndrickx, Leo; Beirnaert, Els; van der Groen, Guido; Lasters, Ignace

    2002-01-01

    It has been proposed that the ectodomain of human immunodeficiency virus type 1 (HIV-1) gp41 (e-gp41), involved in HIV entry into the target cell, exists in at least two conformations, a pre-hairpin intermediate and a fusion-active hairpin structure. To obtain more information on the structure-sequence relationship in e-gp41, we performed in silico a full single-amino-acid substitution analysis, resulting in a Fold Compatible Database (FCD) for each conformation. The FCD contains for each residue position in a given protein a list of values assessing the energetic compatibility (ECO) of each of the 20 natural amino acids at that position. Our results suggest that FCD predictions are in good agreement with the sequence variation observed for well-validated e-gp41 sequences. The data show that at a minECO threshold value of 5 kcal/mol, about 90% of the observed patient sequence variation is encompassed by the FCD predictions. Some inconsistent FCD predictions at N-helix positions packing against residues of the C helix suggest that packing of both peptides may involve some flexibility and may be attributed to an altered orientation of the C-helical domain versus the N-helical region. The permissiveness of sequence variation in the C helices is in agreement with FCD predictions. Comparison of N-core and triple-hairpin FCDs suggests that the N helices may impose more constraints on sequence variation than the C helices. Although the observed sequences of e-gp41 contain many multiple mutations, our method, which is based on single-point mutations, can predict the natural sequence variability of e-gp41 very well. PMID:12097573
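    The "about 90% of observed patient sequence variation encompassed at a 5 kcal/mol threshold" figure is a coverage fraction. A minimal sketch of that calculation, assuming the FCD is a mapping from (position, amino acid) to an ECO value and that lower ECO means more compatible, so a substitution counts as predicted when its ECO is at or below the threshold (both are assumptions; the data below are hypothetical):

```python
from typing import Dict, Iterable, Tuple

Substitution = Tuple[int, str]  # (residue position, amino acid); hypothetical encoding

def fraction_covered(eco: Dict[Substitution, float],
                     observed: Iterable[Substitution],
                     threshold: float = 5.0) -> float:
    """Fraction of observed substitutions whose ECO value is <= threshold.
    Substitutions missing from the FCD count as not covered."""
    obs = list(observed)
    covered = sum(1 for s in obs if s in eco and eco[s] <= threshold)
    return covered / len(obs)
```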

  13. High-speed Viterbi decoding with overlapping code sequences

    NASA Technical Reports Server (NTRS)

    Ross, Michael D.; Osborne, William P.

    1993-01-01

    The Viterbi Algorithm for decoding convolutional codes and Trellis Coded Modulation is suited to VLSI implementation but contains a feedback loop that limits the speed of pipelined architectures. The feedback loop is circumvented by decoding independent sequences simultaneously, resulting in a 5-9 fold speed-up with a two-fold hardware expansion.
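    For reference, here is a baseline hard-decision Viterbi decoder, not the overlapping-sequence architecture of the report. The code parameters (rate 1/2, constraint length 3, generators 7 and 5 in octal) are illustrative choices. The add-compare-select step inside the loop is the feedback the abstract refers to: each step's path metrics depend on the previous step's, so the step cannot be pipelined within a single sequence.

```python
from typing import List, Tuple

G0, G1 = 0b111, 0b101  # generator polynomials (7, 5 octal), constraint length 3
N_STATES = 4           # 2 ** (constraint length - 1)

def parity(x: int) -> int:
    return bin(x).count("1") & 1

def encode(bits: List[int]) -> List[Tuple[int, int]]:
    state = 0
    out = []
    for u in bits:
        reg = (u << 2) | state  # newest bit first, then the two previous bits
        out.append((parity(reg & G0), parity(reg & G1)))
        state = reg >> 1
    return out

def viterbi_decode(received: List[Tuple[int, int]]) -> List[int]:
    INF = float("inf")
    metrics = [0.0] + [INF] * (N_STATES - 1)  # encoder starts in state 0
    paths: List[List[int]] = [[] for _ in range(N_STATES)]
    for r0, r1 in received:
        new_metrics = [INF] * N_STATES
        new_paths: List[List[int]] = [[] for _ in range(N_STATES)]
        for s in range(N_STATES):
            if metrics[s] == INF:
                continue
            for u in (0, 1):
                reg = (u << 2) | s
                nxt = reg >> 1
                # Add: Hamming distance between expected and received bits.
                m = metrics[s] + (parity(reg & G0) != r0) + (parity(reg & G1) != r1)
                # Compare-select: keep the better path into each next state.
                if m < new_metrics[nxt]:
                    new_metrics[nxt] = m
                    new_paths[nxt] = paths[s] + [u]
        metrics, paths = new_metrics, new_paths
    best = min(range(N_STATES), key=lambda s: metrics[s])
    return paths[best]
```

    Because decoding independent sequences shares no metrics between them, their add-compare-select steps can be interleaved in one pipeline; this sketch decodes a single sequence and does not model that hardware scheme.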

  14. Three-Year Sequence for High School Mathematics, Course III.

    ERIC Educational Resources Information Center

    New York State Education Dept., Albany. Bureau of Curriculum Development.

    This publication is designed to aid schools in planning a course of study for students in Course III Mathematics in New York State schools. This guide continues the developmental program begun in Course I. It is intended to serve as the Regents syllabus for the third year of the academic sequence for a New York State Regents diploma. This document…

  15. The Effects of a High-Probability Request Sequencing Technique in Enhancing Transition Behaviors

    ERIC Educational Resources Information Center

    Banda, Devender R.; Kubina, Richard M., Jr.

    2006-01-01

    In this study, an autism support teacher used a high-probability request sequencing technique to help a middle-school student with autism engage in three transition behaviors. High probability request sequencing refers to a procedure in which 2 to 3 preferred questions, highly associated with compliance, are rapidly given before presenting a low…

  16. Rapid detection of sequence variation in Clostridium difficile genes using LATE-PCR with multiple mismatch-tolerant hybridization probes.

    PubMed

    Pierce, Kenneth E; Khan, Huma; Mistry, Rohit; Goldenberg, Simon D; French, Gary L; Wangh, Lawrence J

    2012-11-01

    A novel molecular assay for Clostridium difficile was developed using Linear-After-The-Exponential polymerase chain reaction (LATE-PCR). Single-stranded DNA products generated by LATE-PCR were detected and distinguished by hybridization to fluorescent mismatch-tolerant probes, as the temperature was lowered after amplification in 5 °C intervals between 65°C and 25°C. Single-tube multiplex reactions for tcdA, tcdB, tcdC, and cdtB (binary toxin) sequences were initially optimized using synthetic targets and were subsequently done using genomic DNA; each target was detected and characterized by hybridization to one or more probes of a different fluorescent color. In the case of tcdC, three probes, each labeled with a Quasar fluorophore, hybridize to different locations with known mutations, including the deletion at nucleotide 117 in ribotype 027 strains and the premature stop codon mutation at nucleotide 184 in ribotype 078 strains, each of which is associated with hypervirulent infections. These and other tcdC mutations were distinguished from the reference sequence, as well as from each other by changes in the fluorescent contour generated from the combined Quasar-labeled probes. Specific variations in tcdA and tcdB were also identified in the multiplex assay, including those that identified strains lacking toxin A production. This single closed-tube assay generates substantially more information about virulent C. difficile than currently available commercial assays and could be further expanded to provide strain typing. PMID:22982259

  17. Estimation of hominoid ancestral population sizes under bayesian coalescent models incorporating mutation rate variation and sequencing errors.

    PubMed

    Burgess, Ralph; Yang, Ziheng

    2008-09-01

    Estimation of population parameters for the common ancestors of humans and the great apes is important in understanding our evolutionary history. In particular, inference of population size for the human-chimpanzee common ancestor may shed light on the process by which the 2 species separated and on whether the human population experienced a severe size reduction in its early evolutionary history. In this study, the Bayesian method of ancestral inference of Rannala and Yang (2003. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics. 164:1645-1656) was extended to accommodate variable mutation rates among loci and random species-specific sequencing errors. The model was applied to analyze a genome-wide data set of approximately 15,000 neutral loci (7.4 Mb) aligned for human, chimpanzee, gorilla, orangutan, and macaque. We obtained robust and precise estimates for effective population sizes along the hominoid lineage extending back approximately 30 Myr to the cercopithecoid divergence. The results showed that ancestral populations were 5-10 times larger than modern humans along the entire hominoid lineage. The estimates were robust to the priors used and to model assumptions about recombination. The unusually low X chromosome divergence between human and chimpanzee could not be explained by variation in the male mutation bias or by current models of hybridization and introgression. Instead, our parameter estimates were consistent with a simple instantaneous process for human-chimpanzee speciation but showed a major reduction in X chromosome effective population size peculiar to the human-chimpanzee common ancestor, possibly due to selective sweeps on the X prior to separation of the 2 species. PMID:18603620
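    Coalescent methods of this kind estimate the scaled parameter θ rather than population size directly; converting θ to an effective population size requires an external mutation rate and generation time. A minimal sketch under the common parametrization θ = 4Nμ for autosomal loci, with μ per site per generation; the numeric inputs in the test are hypothetical, not the study's estimates:

```python
def effective_population_size(theta: float, mu_per_site_per_year: float,
                              generation_time_years: float,
                              ploidy_factor: int = 4) -> float:
    """N = theta / (ploidy_factor * mu), with mu per site per generation.
    ploidy_factor = 4 for autosomal loci in diploids (assumed parametrization)."""
    mu_per_generation = mu_per_site_per_year * generation_time_years
    return theta / (ploidy_factor * mu_per_generation)
```

    The estimate scales inversely with both the assumed mutation rate and the generation time, so ratios between ancestral and modern sizes are more robust than the absolute values when the same calibration is used for both.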

  18. Intragenomic sequence variation at the ITS1 - ITS2 region and at the 18S and 28S nuclear ribosomal DNA genes of the New Zealand mud snail, Potamopyrgus antipodarum (Hydrobiidae: mollusca)

    USGS Publications Warehouse

    Hoy, Marshal S.; Rodriguez, Rusty J.

    2013-01-01

    Molecular genetic analysis was conducted on two populations of the invasive non-native New Zealand mud snail (Potamopyrgus antipodarum), one from a freshwater ecosystem in Devil's Lake (Oregon, USA) and the other from an ecosystem of higher salinity in the Columbia River estuary (Hammond Harbor, Oregon, USA). To elucidate potential genetic differences between the two populations, three segments of nuclear ribosomal DNA (rDNA), namely the ITS1-ITS2 region and the 18S and 28S rDNA genes, were cloned and sequenced. Variant sequences within each individual were found in all three rDNA segments. Folding models were used for secondary structure analysis, and the results indicated that many sequences contained structure-altering polymorphisms, suggesting that they could be nonfunctional pseudogenes. In addition, analysis of molecular variance (AMOVA) was used for hierarchical analysis of genetic variance to estimate variation within and among populations and within individuals. AMOVA revealed significant variation in the ITS region between the populations and among clones within individuals, while in the 5.8S rDNA significant variation was revealed among individuals within the two populations. High levels of intragenomic variation were found in the ITS regions, which are known to be highly variable in many organisms. More interestingly, intragenomic variation was also found in the 18S and 28S rDNA, which has rarely been observed in animals and is so far unreported in Mollusca. We postulate that in these P. antipodarum populations the effects of concerted evolution are diminished because not all of the rDNA genes in their polyploid genome are likely to be essential for sustaining cellular function. This could lead to a lessening of selection pressures, allowing mutations to accumulate in some copies, changing them into variant sequences.
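    AMOVA partitions the total squared distance among sequences into hierarchical components. A minimal one-level sketch (populations only, without the within-individual level used in the study and without the permutation test for significance), following the usual sums-of-squared-deviations formulation; the input format is an assumption:

```python
from typing import List, Sequence

def amova_phi_st(d2: List[List[float]], groups: Sequence[int]) -> float:
    """One-level AMOVA returning Phi_ST.
    d2[i][j]: squared distance between sequences i and j (for haplotypes,
    commonly the number of pairwise differences); groups[i]: population label."""
    n_total = len(groups)
    pops = sorted(set(groups))
    p = len(pops)
    # SSD(total) = (1/N) * sum of d2 over unordered pairs.
    ssd_total = sum(d2[i][j] for i in range(n_total)
                    for j in range(i + 1, n_total)) / n_total
    ssd_within, sizes = 0.0, []
    for pop in pops:
        members = [i for i, g in enumerate(groups) if g == pop]
        sizes.append(len(members))
        pair_sum = sum(d2[a][b] for k, a in enumerate(members) for b in members[k + 1:])
        ssd_within += pair_sum / len(members)
    ssd_among = ssd_total - ssd_within
    msd_within = ssd_within / (n_total - p)
    msd_among = ssd_among / (p - 1)
    # Sample-size coefficient that handles unequal population sizes.
    n_coef = (n_total - sum(s * s for s in sizes) / n_total) / (p - 1)
    sigma2_within = msd_within
    sigma2_among = (msd_among - sigma2_within) / n_coef
    return sigma2_among / (sigma2_among + sigma2_within)
```

    Phi_ST is the share of total variance lying between populations: 1 when populations share no variation internally, near 0 (or slightly negative) when they are indistinguishable.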

  19. In vivo activity of epoxide hydrolase according to sequence variation affects the progression of human IgA nephropathy.

    PubMed

    Lee, Jung Pyo; Yang, Seung Hee; Kim, Dong Ki; Lee, Hajeong; Kim, Bora; Cho, Joo-Youn; Yu, Kyung-Sang; Paik, Jin Ho; Kim, Myounghee; Lim, Chun Soo; Kim, Yon Su

    2011-06-01

    Epoxyeicosatrienoic acid (EET) regulates the functional integrity of the endothelium. It is hypothesized that the activity of epoxide hydrolase (EPHX2), which determines EET concentration through hydrolysis, may affect the progression of glomerulonephritis. Here, we evaluated the relationship between genetic variations, the in vivo activity of EPHX2, and progression of IgA nephropathy (IgAN). Three single-nucleotide polymorphisms (SNPs) [rs41507953 (K55R), rs751141 (R287Q), and rs1042032] were traced in 401 IgAN patients and 402 normal healthy controls. The in vivo activity of EPHX2 was assessed by measuring substrates/metabolites of the enzyme. None of the polymorphism frequencies differed significantly between patients and controls. However, patients carrying the variant allele (A) of rs751141 had better kidney survival than those with the wild-type allele (G; P < 0.001). This association remained significant after adjustment for several risk factors (hazard ratio 1.83, 95% confidence interval 1.13-2.96, P = 0.014). Vascular damage was more prominent in kidney biopsies from patients carrying the G allele of rs751141. The in vivo activity of EPHX2, assessed by the epoxyoctadecenoic acid/dihydroxyoctadecenoic acid ratio using liquid chromatography/mass spectrometry analysis, was elevated in patients with the G allele. The expression of EPHX2 in the human kidney was independent of the sequence variation of the rs751141 allele. Variant rs41507953 was not present in this cohort, and rs1042032 was not associated with progression. Thus, specific measures that regulate EPHX2 activity should be designed as potential therapeutics. PMID:21429967
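    The adjusted hazard ratio, its confidence interval and the P value can be cross-checked against each other. Assuming a Wald interval computed on the log scale (an assumption; the abstract does not say how the interval was obtained), the standard error follows from the interval width and yields a two-sided P value:

```python
import math

def p_from_ratio_ci(ratio: float, lower: float, upper: float,
                    z_crit: float = 1.959964) -> float:
    """Two-sided P value for a ratio estimate from its 95% CI,
    assuming a symmetric Wald interval on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
```

    With the reported values (hazard ratio 1.83, interval 1.13 to 2.96) this gives P of about 0.014, consistent with the abstract.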

  20. Maternal phylogenetic relationships and genetic variation among Arabian horse populations using whole mitochondrial DNA D-loop sequencing

    PubMed Central

    2013-01-01

    Background Maternal inheritance is an essential point in Arabian horse population genetics and strain classification. Mitochondrial DNA (mtDNA) sequencing is a highly informative tool to investigate maternal lineages. We sequenced the whole mtDNA D-loop of 251 Arabian horses to study the genetic diversity and phylogenetic relationships of Arabian populations and to examine the traditional strain classification system that depends on maternal family lines using native Arabian horses from the Middle East. Results The variability in the upstream region of the D-loop revealed additional differences among the haplotypes that had identical sequences in the hypervariable region 1 (HVR1). While the American-Arabians showed relatively low diversity, the Syrian population was the most variable and contained a very rare and old haplogroup. Middle Eastern horses made major genetic contributions to Western horses, and there was no clear pattern of differentiation among all tested populations. Our results also showed that several individuals from different strains shared a single haplotype, and individuals from a single strain were represented in clearly separated haplogroups. Conclusions The whole mtDNA D-loop sequence was more powerful for analysis of the maternal genetic diversity in the Arabian horses than the HVR1 alone. Native populations from the Middle East, such as Syrians, could be suggested as a hot spot of genetic diversity and may help in understanding the evolutionary history of the Arabian horse breed. Most importantly, there was no evidence that the Arabian horse breed has clear subdivisions depending on the traditional maternal based strain classification system. PMID:24034565
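    Statements such as "the Syrian population was the most variable" are commonly quantified with haplotype diversity. The abstract does not name its metric, so the sketch below uses Nei's unbiased haplotype diversity, H = n/(n - 1) * (1 - sum of squared haplotype frequencies), as an illustration; the input is a hypothetical list of haplotype labels, one per horse:

```python
from collections import Counter
from typing import Sequence

def haplotype_diversity(haplotypes: Sequence[str]) -> float:
    """Nei's unbiased haplotype diversity: the probability that two
    randomly drawn sequences (without replacement) carry different haplotypes."""
    n = len(haplotypes)
    freqs = [c / n for c in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))
```

    Distinguishing more haplotypes (as the whole D-loop does relative to HVR1 alone) can only keep or raise this value, which is one way to see why the longer region resolves more diversity.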

  1. Hfqs in Bacillus anthracis: Role of protein sequence variation in the structure and function of proteins in the Hfq family.

    PubMed

    Vrentas, Catherine; Ghirlando, Rodolfo; Keefer, Andrea; Hu, Zonglin; Tomczak, Aurelie; Gittis, Apostolos G; Murthi, Athulaprabha; Garboczi, David N; Gottesman, Susan; Leppla, Stephen H

    2015-11-01

    Hfq proteins in Gram-negative bacteria play important roles in bacterial physiology and virulence, mediated by binding of the Hfq hexamer to small RNAs and/or mRNAs to post-transcriptionally regulate gene expression. However, the physiological role of Hfqs in Gram-positive bacteria is less clear. Bacillus anthracis, the causative agent of anthrax, uniquely expresses three distinct Hfq proteins, two from the chromosome (Hfq1, Hfq2) and one from its pXO1 virulence plasmid (Hfq3). The protein sequences of Hfq1 and Hfq3 are evolutionarily distinct from those of Hfq2 and of Hfqs found in other Bacilli. Here, the quaternary structure of each B. anthracis Hfq protein, as produced heterologously in Escherichia coli, was characterized. While Hfq2 adopts the expected hexamer structure, Hfq1 does not form similarly stable hexamers in vitro. The impact on the monomer-hexamer equilibrium of varying Hfq C-terminal tail length and other sequence differences among the Hfqs was examined, and a sequence region of the Hfq proteins involved in hexamer formation was identified. In addition to their distinct higher-order structures, the Hfq homologs give rise to different phenotypes. Hfq1 has a disruptive effect on the function of E. coli Hfq in vivo, while Hfq3 expression at high levels is toxic to E. coli but also partially complements Hfq function in E. coli. These results set the stage for future studies of the roles of these proteins in B. anthracis physiology and for the identification of sequence determinants of phenotypic complementation. PMID:26271475

  2. Genome-Wide Sequence Variation among Mycobacterium avium Subspecies paratuberculosis Isolates: A Better Understanding of Johne's Disease Transmission Dynamics.

    PubMed

    Hsu, Chung-Yi; Wu, Chia-Wei; Talaat, Adel M

    2011-01-01

    Mycobacterium avium subspecies paratuberculosis (M. ap), the causative agent of Johne's disease, infects many farmed ruminants and wildlife species and has recently been isolated from humans. To better understand the molecular pathogenesis of these infections, we analyzed the whole-genome sequences of several M. ap and M. avium subspecies avium (M. avium) isolates to gain insights into genomic diversity associated with variable hosts and environments. Using next-generation sequencing technology, all six M. ap isolates showed a high percentage of similarity (98%) to the reference genome sequence of M. ap K-10 isolated from cattle. However, two M. avium isolates (DT 78 and Env 77) showed significant sequence diversity (only 87 and 40% similarity, respectively) compared to the reference strain M. avium 104, a reflection of the wide environmental niches of this group of mycobacteria. Within the M. ap isolates, genomic rearrangements (insertions/deletions) were not detected, and only unique single nucleotide polymorphisms (SNPs) were observed among M. ap isolates. While most of the SNPs (~100) in M. ap genomes were non-synonymous, a total of ~6,000 SNPs were detected among M. avium genomes, most of which were synonymous, suggesting a differential selective pressure between M. ap and M. avium isolates. In addition, SNP-based phylogenomics had enough discriminatory power to differentiate between isolates from different hosts, while still suggesting a bovine source of infection for the other animals examined in this study. Interestingly, the human isolate (M. ap 4B) was closely related to an M. ap isolate from a dairy facility, suggesting a common source of infection. Overall, the identified phylogenomes further supported the idea of a common ancestor to both M. ap and M. avium isolates. Genome-wide analysis described here could provide a strong foundation for a population genetic structure that could be useful for the analysis of mycobacterial evolution and for the tracking of Johne…
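    The synonymous versus non-synonymous distinction above comes down to translating the reference and alternative codons and comparing the amino acids. A minimal sketch, assuming an in-frame coding sequence and a 0-based SNP position; it uses the standard genetic code, whose amino-acid assignments are the same as the bacterial translation table 11 (which differs only in start codons):

```python
from typing import Dict

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_snp(cds: str, pos: int, alt: str) -> str:
    """cds: in-frame coding sequence (uppercase DNA); pos: 0-based SNP
    position within cds; alt: alternative base."""
    start = pos - pos % 3
    ref_codon = cds[start:start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"
```

    Tallying these labels across all SNPs in coding regions gives the kind of synonymous/non-synonymous contrast the abstract uses to argue for different selective pressures.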

  3. Variation of b and p values from aftershocks sequences along the Mexican subduction zone and their relation to plate characteristics

    NASA Astrophysics Data System (ADS)

    Ávila-Barrientos, L.; Zúñiga, F. R.; Rodríguez-Pérez, Q.; Guzmán-Speziale, M.

    2015-11-01

    Aftershock sequences along the Mexican subduction margin (between 110°W and 91°W) were analyzed by means of the p value from the Omori-Utsu relation and the b value from the Gutenberg-Richter relation. We focused on recent medium to large (Mw > 5.6) events considered capable of generating aftershock sequences suitable for analysis. The main goal was to find a possible correlation between aftershock parameters and plate characteristics, such as displacement rate, age and segmentation. The subduction regime of Mexico is one of the most active regions of the world, with a high frequency of occurrence of medium to large events, and plate characteristics change along the subduction margin. Previous studies have observed differences in seismic source characteristics at the subduction regime, which may indicate a difference in rheology and possible segmentation. The results of the analysis of the aftershock sequences indicate a slight tendency for p values to decrease from west to east with increasing plate age, although statistical significance is undermined by the small number of aftershocks in the sequences, a feature distinctive of the region as compared to other world subduction regimes. The b values show an opposite, increasing trend towards the east, although the statistical significance is not enough to validate such a trend. A linear regression between both parameters provides additional support for the inverse relation. Moreover, we calculated the seismic coupling coefficient, showing a direct relation with the p and b values. While we cannot conclusively confirm the hypothesis that aftershock generation depends on certain tectonic characteristics (age, thickness, temperature), our results do not reject it, thus encouraging further study into this question.
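    The b value is the slope in the Gutenberg-Richter relation log10 N = a - bM, and the p value is the decay exponent in the Omori-Utsu relation n(t) = K / (t + c)^p. A minimal sketch of the widely used Aki maximum-likelihood b-value estimator with Utsu's binning correction; the completeness magnitude mc and bin width dm are inputs the analyst must choose, the abstract does not state which estimator was used, and the p value (which needs a likelihood fit of K, c and p) is not shown:

```python
import math
from typing import Sequence

def b_value_aki(magnitudes: Sequence[float], mc: float, dm: float = 0.1) -> float:
    """Aki (1965) maximum-likelihood b value for events with M >= mc.
    dm is the magnitude bin width (Utsu correction); use 0 for unbinned data."""
    above = [m for m in magnitudes if m >= mc]
    mean_m = sum(above) / len(above)
    return math.log10(math.e) / (mean_m - (mc - dm / 2))
```

    The estimator's standard error is roughly b divided by the square root of the number of events, which is why short aftershock sequences, as noted for the Mexican margin, yield trends of limited statistical significance.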

  4. Gaussian process test for high-throughput sequencing time series: application to experimental evolution

    PubMed Central

    Topa, Hande; Jónás, Ágnes; Kofler, Robert; Kosiol, Carolin; Honkela, Antti

    2015-01-01

    Motivation: Recent advances in high-throughput sequencing (HTS) have made it possible to monitor genomes in great detail. New experiments not only use HTS to measure genomic features at one time point but also monitor them changing over time with the aim of identifying significant changes in their abundance. In population genetics, for example, allele frequencies are monitored over time to detect significant frequency changes that indicate selection pressures. Previous attempts at analyzing data from HTS experiments have been limited as they could not simultaneously include data at intermediate time points, replicate experiments and sources of uncertainty specific to HTS such as sequencing depth. Results: We present the beta-binomial Gaussian process model for ranking features with significant non-random variation in abundance over time. The features are assumed to represent proportions, such as the proportion of an alternative allele in a population. We use the beta-binomial model to capture the uncertainty arising from finite sequencing depth and combine it with a Gaussian process model over the time series. In simulations that mimic the features of experimental evolution data, the proposed method clearly outperforms classical testing in average precision of finding selected alleles. We also present simulations exploring different experimental design choices and results on real data from a Drosophila experimental evolution study of temperature adaptation. Availability and implementation: R software implementing the test is available at https://github.com/handetopa/BBGP. Contact: hande.topa@aalto.fi, agnes.jonas@vetmeduni.ac.at, carolin.kosiol@vetmeduni.ac.at, antti.honkela@hiit.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25614471
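    The beta-binomial part of the model describes how many of n reads at a site carry the alternative allele when the underlying frequency is itself uncertain. A minimal sketch of the beta-binomial log-probability only (the Gaussian process over time is omitted, and this is not code from the BBGP package):

```python
import math

def log_beta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_logpmf(k: int, n: int, alpha: float, beta: float) -> float:
    """log P(k alternative reads out of depth n) when the allele frequency
    follows a Beta(alpha, beta) distribution."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
```

    At low sequencing depth n, the distribution of k is wide even for a fixed mean frequency alpha / (alpha + beta), which is the depth-dependent uncertainty the method feeds into its time-series model.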

  5. Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing.

    PubMed

    Shafer, Aaron B A; Northrup, Joseph M; Wikelski, Martin; Wittemyer, George; Wolf, Jochen B W

    2016-01-01

    Recent advancements in animal tracking technology and high-throughput sequencing are rapidly changing the questions and scope of research in the biological sciences. The integration of genomic data with high-tech animal instrumentation comes as a natural progression of traditional work in ecological genetics, and we provide a framework for linking the separate data streams from these technologies. Such a merger will elucidate the genetic basis of adaptive behaviors like migration and hibernation and advance our understanding of fundamental ecological and evolutionary processes such as pathogen transmission, population responses to environmental change, and communication in natural populations. PMID:26745372

  6. Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing

    PubMed Central

    Shafer, Aaron B. A.; Northrup, Joseph M.; Wikelski, Martin; Wittemyer, George; Wolf, Jochen B. W.

    2016-01-01

    Recent advancements in animal tracking technology and high-throughput sequencing are rapidly changing the questions and scope of research in the biological sciences. The integration of genomic data with high-tech animal instrumentation comes as a natural progression of traditional work in ecological genetics, and we provide a framework for linking the separate data streams from these technologies. Such a merger will elucidate the genetic basis of adaptive behaviors like migration and hibernation and advance our understanding of fundamental ecological and evolutionary processes such as pathogen transmission, population responses to environmental change, and communication in natural populations. PMID:26745372

  7. Construction of a high-density genetic map for grape using next generation restriction-site associated DNA sequencing

    PubMed Central

    2012-01-01

    Background Genetic mapping and QTL detection are powerful methodologies in plant improvement and breeding. Construction of a high-density and high-quality genetic map would be of great benefit in the production of superior grapes to meet human demand. High throughput and low cost of the recently developed next generation sequencing (NGS) technology have resulted in its wide application in genome research. Sequencing restriction-site associated DNA (RAD) might be an efficient strategy to simplify genotyping. Combining NGS with RAD has proven to be powerful for single nucleotide polymorphism (SNP) marker development. Results An F1 population of 100 individual plants was developed. In-silico digestion-site prediction was used to select an appropriate restriction enzyme for construction of a RAD sequencing library. Next generation RAD sequencing was applied to genotype the F1 population and its parents. Applying a cluster strategy for SNP modulation, a total of 1,814 high-quality SNP markers were developed: 1,121 of these were mapped to the female genetic map, 759 to the male map, and 1,646 to the integrated map. A comparison of the genetic maps to the published Vitis vinifera genome revealed both conservation and variation. Conclusions The applicability of next generation RAD sequencing for genotyping a grape F1 population was demonstrated, leading to the successful development of a genetic map with high density and quality using our designed SNP markers. Detailed analysis revealed that this newly developed genetic map can be used for a variety of genome investigations, such as QTL detection, sequence assembly and genome comparison. PMID:22908993
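    In-silico digestion-site prediction amounts to scanning a reference sequence for an enzyme's recognition site and tallying the resulting fragments. A minimal sketch; the abstract does not name the enzyme chosen, so the EcoRI site (GAATTC, cut after the G) in the test is only an example, and only palindromic sites like this one can be scanned on a single strand:

```python
import re
from typing import List

def find_sites(sequence: str, site: str) -> List[int]:
    # 0-based start positions of the recognition site; the lookahead
    # also reports overlapping occurrences.
    pattern = f"(?={re.escape(site.upper())})"
    return [m.start() for m in re.finditer(pattern, sequence.upper())]

def fragment_lengths(sequence: str, site: str, cut_offset: int) -> List[int]:
    # cut_offset: cut position relative to the start of the site (1 for G^AATTC).
    cuts = [p + cut_offset for p in find_sites(sequence, site)]
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

    Comparing the number of sites per megabase across candidate enzymes is a simple way to pick one that yields enough RAD tags for the marker density a map needs, without sequencing far more loci than the budget allows.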

  8. CYP2D7 Sequence Variation Interferes with TaqMan CYP2D6*15 and *35 Genotyping

    PubMed Central

    Riffel, Amanda K.; Dehghani, Mehdi; Hartshorne, Toinette; Floyd, Kristen C.; Leeder, J. Steven; Rosenblatt, Kevin P.; Gaedigk, Andrea

    2016-01-01

    TaqMan™ genotyping assays are widely used to genotype CYP2D6, which encodes a major drug-metabolizing enzyme. Assay design for CYP2D6 can be challenging owing to the presence of two pseudogenes, CYP2D7 and CYP2D8, structural and copy number variation, and numerous single nucleotide polymorphisms (SNPs), some of which reflect the wild-type sequence of the CYP2D7 pseudogene. The aim of this study was to identify the mechanism causing false-positive CYP2D6*15 calls and remediate those by redesigning and validating alternative TaqMan genotype assays. Among 13,866 DNA samples genotyped by the CompanionDx® lab on the OpenArray platform, 70 samples were identified as heterozygotes for 137Tins, the key SNP of CYP2D6*15. However, only 15 samples were confirmed when tested with the Luminex xTAG CYP2D6 Kit and sequencing of CYP2D6-specific long range (XL)-PCR products. Genotype and gene resequencing of CYP2D6 and CYP2D7-specific XL-PCR products revealed a CC>GT dinucleotide SNP in exon 1 of CYP2D7 that reverts the sequence to CYP2D6 and allows a TaqMan assay PCR primer to bind. Because CYP2D7 also carries a Tins, a false-positive mutation signal is generated. This CYP2D7 SNP was also responsible for generating false-positive signals for rs769258 (CYP2D6*35), which is also located in exon 1. Although alternative CYP2D6*15 and *35 assays resolved the issue, we discovered a novel CYP2D6*15 subvariant in one sample that carries additional SNPs preventing detection with the alternate assay. The frequency of CYP2D6*15 was 0.1% in this ethnically diverse U.S. population sample. We also discovered linkage between the CYP2D7 CC>GT dinucleotide SNP and the 77G>A (rs28371696) SNP of CYP2D6*43. The frequency of this tentatively functional allele was 0.2%. Taken together, these findings emphasize that regardless of how carefully genotyping assays are designed and evaluated before being commercially marketed, rare or unknown SNPs underneath primer and/or probe regions can impact
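    The false-positive mechanism described here depends on a primer that, after a dinucleotide change, matches the pseudogene closely enough to bind. The sketch below illustrates that idea with simple string comparison. The sequences are hypothetical placeholders, not the real CYP2D6 or CYP2D7 exon 1 sequences. The binding rule (at most one mismatch overall and none in the three 3'-terminal bases) is also an assumption, not a thermodynamic binding model.

```python
def mismatches(primer: str, site: str) -> int:
    """Count positions where a primer differs from an equal-length template site."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(p != s for p, s in zip(primer.upper(), site.upper()))


def likely_to_bind(primer: str, site: str, max_mismatches: int = 1, tail: int = 3) -> bool:
    """Heuristic binding call: few mismatches overall and a perfectly matched 3' end."""
    tail_matches = mismatches(primer[-tail:], site[-tail:]) == 0
    return tail_matches and mismatches(primer, site) <= max_mismatches


# Hypothetical sequences for illustration only.
primer = "ACGTTGCAGGCCAT"
gene_site = "ACGTTGCAGGCCAT"        # target gene: perfect match
pseudogene_site = "ACGTTGCAGGGTAT"  # pseudogene: dinucleotide difference near the 3' end
# A dinucleotide SNP in the pseudogene that restores the gene sequence at the primer site.
pseudogene_variant = pseudogene_site[:10] + "CC" + pseudogene_site[12:]

for name, site in [("gene", gene_site), ("pseudogene", pseudogene_site),
                   ("pseudogene variant", pseudogene_variant)]:
    print(name, mismatches(primer, site), likely_to_bind(primer, site))
```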

  9. CYP2D7 Sequence Variation Interferes with TaqMan CYP2D6*15 and *35 Genotyping.

    PubMed

    Riffel, Amanda K; Dehghani, Mehdi; Hartshorne, Toinette; Floyd, Kristen C; Leeder, J Steven; Rosenblatt, Kevin P; Gaedigk, Andrea

    2015-01-01

    TaqMan™ genotyping assays are widely used to genotype CYP2D6, which encodes a major drug-metabolizing enzyme. Assay design for CYP2D6 can be challenging owing to the presence of two pseudogenes, CYP2D7 and CYP2D8, structural and copy number variation, and numerous single nucleotide polymorphisms (SNPs), some of which reflect the wild-type sequence of the CYP2D7 pseudogene. The aim of this study was to identify the mechanism causing false-positive CYP2D6*15 calls and remediate those by redesigning and validating alternative TaqMan genotype assays. Among 13,866 DNA samples genotyped by the CompanionDx® lab on the OpenArray platform, 70 samples were identified as heterozygotes for 137Tins, the key SNP of CYP2D6*15. However, only 15 samples were confirmed when tested with the Luminex xTAG CYP2D6 Kit and sequencing of CYP2D6-specific long range (XL)-PCR products. Genotype and gene resequencing of CYP2D6 and CYP2D7-specific XL-PCR products revealed a CC>GT dinucleotide SNP in exon 1 of CYP2D7 that reverts the sequence to CYP2D6 and allows a TaqMan assay PCR primer to bind. Because CYP2D7 also carries a Tins, a false-positive mutation signal is generated. This CYP2D7 SNP was also responsible for generating false-positive signals for rs769258 (CYP2D6*35), which is also located in exon 1. Although alternative CYP2D6*15 and *35 assays resolved the issue, we discovered a novel CYP2D6*15 subvariant in one sample that carries additional SNPs preventing detection with the alternate assay. The frequency of CYP2D6*15 was 0.1% in this ethnically diverse U.S. population sample. We also discovered linkage between the CYP2D7 CC>GT dinucleotide SNP and the 77G>A (rs28371696) SNP of CYP2D6*43. The frequency of this tentatively functional allele was 0.2%. Taken together, these findings emphasize that regardless of how carefully genotyping assays are designed and evaluated before being commercially marketed, rare or unknown SNPs underneath primer

  10. Construction and Analysis of High-Density Linkage Map Using High-Throughput Sequencing Data

    PubMed Central

    Liu, Min; Liu, Hui; Zeng, Huaping; Deng, Dejing; Xin, Huaigen; Song, Jun; Xu, Chunhua; Sun, Xiaowen; Hou, Xilin; Wang, Xiaowu; Zheng, Hongkun

    2014-01-01

    Linkage maps enable the study of important biological questions. The construction of high-density linkage maps has become more feasible since the advent of next-generation sequencing (NGS), which eases SNP discovery and high-throughput genotyping of large populations. However, the explosion in marker numbers and the genotyping errors in NGS data challenge both the computational efficiency of linkage mapping methods and the quality of the resulting maps. Here we report the HighMap method for constructing high-density linkage maps from NGS data. HighMap employs an iterative ordering and error correction strategy based on a k-nearest neighbor algorithm and a Monte Carlo multipoint maximum likelihood algorithm. A simulation study shows that HighMap can create a linkage map with three times as many markers as ordering-only methods while offering more accurate marker orders and stable genetic distances. Using HighMap, we constructed a common carp linkage map with 10,004 markers. The singleton rate was less than one-ninth of that generated by JoinMap4.1. Its total map distance was 5,908 cM, consistent with reports on low-density maps. HighMap is an efficient method for constructing high-density, high-quality linkage maps from high-throughput population NGS data. It will facilitate genome assembly, comparative genomic analysis, and QTL studies. HighMap is available at http://highmap.biomarker.com.cn/. PMID:24905985
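    HighMap's ordering combines a k-nearest neighbor step with Monte Carlo multipoint maximum likelihood and iterative error correction. None of that is reproduced here. As a much simpler illustration of nearest-neighbor marker ordering alone, the sketch below greedily chains markers by pairwise recombination fraction and keeps the shortest chain over all starting markers. The matrix is hypothetical and the function names are illustrative only.

```python
def greedy_order(rf: list[list[float]], start: int) -> list[int]:
    """Chain markers by repeatedly stepping to the unplaced marker with the lowest
    recombination fraction to the last placed marker (ties go to the lower index)."""
    order = [start]
    remaining = set(range(len(rf))) - {start}
    while remaining:
        last = order[-1]
        nearest = min(sorted(remaining), key=lambda j: rf[last][j])
        order.append(nearest)
        remaining.remove(nearest)
    return order


def chain_length(rf: list[list[float]], order: list[int]) -> float:
    """Sum of recombination fractions between adjacent markers (a crude length proxy)."""
    return sum(rf[a][b] for a, b in zip(order, order[1:]))


def best_greedy_order(rf: list[list[float]]) -> list[int]:
    """Try every starting marker and keep the shortest greedy chain."""
    candidates = [greedy_order(rf, s) for s in range(len(rf))]
    return min(candidates, key=lambda o: chain_length(rf, o))


# Hypothetical recombination fractions for four markers whose true order is 0-2-1-3.
rf = [
    [0.0, 0.2, 0.1, 0.3],
    [0.2, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.2],
    [0.3, 0.1, 0.2, 0.0],
]
print(best_greedy_order(rf))
```

    A reversed order is equivalent on a linkage group. Greedy chaining is also sensitive to genotyping errors, which is one reason methods such as HighMap add error correction and likelihood-based refinement.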

  11. A strategy to recover a high-quality, complete plastid sequence from low-coverage whole-genome sequencing

    PubMed Central

    Garaycochea, Silvia; Speranza, Pablo; Alvarez-Valin, Fernando

    2015-01-01

    Premise of the study: We developed a bioinformatic strategy to recover and assemble a chloroplast genome using data derived from low-coverage 454 GS FLX/Roche whole-genome sequencing. Methods: A comparative genomics approach was applied to obtain the complete chloroplast genome from a weedy biotype of rice from Uruguay. We also applied appropriate filters to discriminate reads representing novel DNA transfer events between the chloroplast and nuclear genomes. Results: From a set of 295,159 reads (96 Mb data), we assembled the chloroplast genome into two contigs. This weedy rice was classified based on 23 polymorphic regions identified by comparison with reference chloroplast genomes. We detected recent and past events of genetic material transfer between the chloroplast and nuclear genomes and estimated their occurrence frequency. Discussion: We obtained a high-quality complete chloroplast genome sequence from low-coverage sequencing data. Intergenome DNA transfer appears to be more frequent than previously thought. PMID:26504677
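    One way to picture the read-filtering step is to measure how much of each read is shared with a reference plastome. The sketch below assumes exact k-mer matching and sorts reads into three groups:

    - Reads whose k-mers are nearly all found in the reference are labeled plastid.
    - Reads sharing almost none are labeled other.
    - Reads in between are set aside as partial. These are the kind of reads that might span a plastid-nuclear junction and would need closer inspection.

    The k value, thresholds, sequences, and labels are hypothetical. The authors' actual filters are not specified in the abstract.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_read(read: str, ref_kmers: set[str], k: int,
                  high: float = 0.9, low: float = 0.1) -> str:
    """Label a read by the fraction of its distinct k-mers found in the reference."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return "other"  # read shorter than k
    shared = len(read_kmers & ref_kmers) / len(read_kmers)
    if shared >= high:
        return "plastid"
    if shared <= low:
        return "other"
    return "partial"


# Hypothetical reference fragment and reads.
k = 5
reference = "ATGCGTACGTTAGCCGATCGATCGGATCCA"
ref_kmers = kmers(reference, k)
reads = {
    "inside": reference[3:20],
    "unrelated": "TTTTTTTTTTTTTTT",
    "junction": reference[:12] + "TTTTTTTTTTTT",
}
for name, read in reads.items():
    print(name, classify_read(read, ref_kmers, k))
```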

  12. A high-density remote reference magnetic variation profile in the Pacific northwest of North America

    USGS Publications Warehouse

    Hermance, J.F.; Lusi, S.; Slocum, W.; Neumann, G.A.; Green, A.W., Jr.

    1989-01-01

    During the summer of 1985, as part of the EMSLAB Project, Brown University conducted a detailed magnetic variation study of the Oregon Coast Range and Cascades volcanic system along an E-W profile in central Oregon. The profile consisted of 75 remote reference magnetic variation (MV) stations spaced 3-4 km apart. It stretched for 225 km from Newport, on the Oregon coast, across the Coast Range, the Willamette Valley, and the High Cascades to a point about 50 km east of Santiam Pass. At all of the MV stations, data were collected for short periods (16-100 s), and at 17 of these stations data were also obtained at longer periods (100-1600 s). Data were monitored with a three-component ring core fluxgate magnetometer (Nanotesla) and recorded with a microcomputer-based (DEC PDP 11/73) data acquisition system. A 2-D generalized inversion of the magnetic transfer coefficients over the period range of 16-1600 s indicates four distinct conductors. First, we see the coast effect caused by a large sedimentary wedge offshore. Second, we see the effect of currents flowing in the conductive sediments of the Willamette Valley. Our inversion suggests that the Willamette Valley consists of two electrically distinct features, due perhaps to a horst-like structure imprinted on the valley sediments. Third, we note an electric current system centered beneath the High Cascades. This feature may be associated with a sediment-filled graben beneath Santiam Pass, as suggested by some of the gravity and MT results reported to date. Finally, we detect a deep conductor at mid-crustal depths that extends laterally westward from beneath the Basin and Range Province and terminates beneath the western Cascades. One interpretation of this last result is that modern Basin and Range structure is being imprinted on pre-existing Cascade structure. © 1989.
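    Magnetic transfer coefficients are conventionally defined, at each period, by a linear relation between the vertical and horizontal field components, Hz = A*Hx + B*Hy, where A and B are complex. The sketch below estimates A and B from Fourier coefficients. When remote horizontal channels are supplied, the cross-spectra are formed with them instead of the local fields, which is the idea behind remote reference processing. The synthetic data and the function name are hypothetical. This is not the authors' processing chain or their 2-D inversion.

```python
def transfer_coefficients(hx, hy, hz, rx=None, ry=None):
    """Estimate complex A, B in Hz = A*Hx + B*Hy from spectral samples at one period.

    rx, ry: optional remote-reference horizontal fields; if omitted, the local
    horizontal fields are used, which reduces to ordinary least squares.
    """
    rx = hx if rx is None else rx
    ry = hy if ry is None else ry

    def cross(a, b):
        # Cross-spectrum summed over samples: sum of a * conj(b).
        return sum(x * y.conjugate() for x, y in zip(a, b))

    # Multiplying Hz = A*Hx + B*Hy by conj(Rx) and conj(Ry) and summing gives:
    #   <Hz Rx*> = A <Hx Rx*> + B <Hy Rx*>
    #   <Hz Ry*> = A <Hx Ry*> + B <Hy Ry*>
    a11, a12 = cross(hx, rx), cross(hy, rx)
    a21, a22 = cross(hx, ry), cross(hy, ry)
    b1, b2 = cross(hz, rx), cross(hz, ry)
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("horizontal fields are too collinear to separate A and B")
    a = (b1 * a22 - a12 * b2) / det  # Cramer's rule
    b = (a11 * b2 - a21 * b1) / det
    return a, b


# Synthetic, noise-free example with known coefficients.
hx = [1 + 0j, 0.5 + 1j, -1 + 0.3j, 2 - 1j]
hy = [0.2 + 0.5j, 1 - 1j, 0.7 + 0.7j, -0.5 + 0.2j]
a_true, b_true = 0.3 - 0.1j, -0.2 + 0.4j
hz = [a_true * x + b_true * y for x, y in zip(hx, hy)]
print(transfer_coefficients(hx, hy, hz))
```

    With real data, noise in the local horizontal channels biases the ordinary estimate. Remote channels whose noise is uncorrelated with the local noise are used to reduce that bias.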

  13. Whole Genome Sequencing of Enterovirus species C Isolates by High-Throughput Sequencing: Development of Generic Primers

    PubMed Central

    Bessaud, Maël; Sadeuh-Mba, Serge A.; Joffret, Marie-Line; Razafindratsimandresy, Richter; Polston, Patsy; Volle, Romain; Rakoto-Andrianarivelo, Mala; Blondel, Bruno; Njouom, Richard; Delpeyroux, Francis

    2016-01-01

    Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C) consists of more than 20 types, including the three poliovirus serotypes, the etiological agents of poliomyelitis. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require sequencing of different genomic regions. A simple method was developed to quickly sequence the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to being sequenced by a high-throughput technique. The method was assessed on a panel of EV-Cs belonging to a wide range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures. By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a valuable tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses. PMID:27617004
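    Because the method relies on four overlapping RT-PCR fragments, a basic design check is that consecutive amplicons overlap and that together they span the genome. The sketch below uses hypothetical amplicon coordinates (1-based, inclusive) on a hypothetical 7,400-nt genome, with a 50-nt minimum overlap. These numbers are illustrative and are not the study's primer positions.

```python
def check_tiling(amplicons, genome_length, min_overlap=50):
    """Check that sorted (start, end) amplicons overlap pairwise and span the genome.

    Returns (ok, overlaps); a negative overlap means a gap between amplicons.
    Nested amplicons are not handled specially.
    """
    spans = sorted(amplicons)
    overlaps = [prev_end - next_start + 1
                for (_, prev_end), (next_start, _) in zip(spans, spans[1:])]
    ok = (spans[0][0] <= 1
          and max(end for _, end in spans) >= genome_length
          and all(o >= min_overlap for o in overlaps))
    return ok, overlaps


# Hypothetical four-amplicon design.
design = [(1, 2000), (1800, 4000), (3850, 5900), (5700, 7400)]
print(check_tiling(design, genome_length=7400))
```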

  14. Whole Genome Sequencing of Enterovirus species C Isolates by High-Throughput Sequencing: Development of Generic Primers.

    PubMed

    Bessaud, Maël; Sadeuh-Mba, Serge A; Joffret, Marie-Line; Razafindratsimandresy, Richter; Polston, Patsy; Volle, Romain; Rakoto-Andrianarivelo, Mala; Blondel, Bruno; Njouom, Richard; Delpeyroux, Francis

    2016-01-01

    Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C) consists of more than 20 types, including the three poliovirus serotypes, the etiological agents of poliomyelitis. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require sequencing of different genomic regions. A simple method was developed to quickly sequence the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to being sequenced by a high-throughput technique. The method was assessed on a panel of EV-Cs belonging to a wide range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures. By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a valuable tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses. PMID:27617004

  15. High-throughput sequencing of complete human mtDNA genomes from the Philippines

    PubMed Central

    Gunnarsdóttir, Ellen D.; Li, Mingkun; Bauchet, Marc; Finstermeier, Knut; Stoneking, Mark

    2011-01-01

    Because of the time and cost associated with Sanger sequencing of complete human mtDNA genomes, practically all evolutionary studies have screened samples first to define haplogroups and then either selected a few samples from each haplogroup, or many samples from a particular haplogroup of interest, for complete mtDNA genome sequencing. Such biased sampling precludes many analyses of interest. Here, we used high-throughput sequencing platforms to generate, rapidly and inexpensively, 109 complete mtDNA genome sequences from random samples of individuals from three Filipino groups, including one Negrito group, the Mamanwa. We obtained on average ∼55-fold coverage per sequence, with <1% missing data per sequence. Various analyses attest to the accuracy of the sequences, including comparison to sequences of the first hypervariable segment of the control region generated by Sanger sequencing; patterns of nucleotide substitution and the distribution of polymorphic sites across the genome; and the observed haplogroups. Bayesian skyline plots of population size change through time indicate similar patterns for all three Filipino groups, but sharply contrast with such plots previously constructed from biased sampling of complete mtDNA genomes, as well as with an artificially constructed sample of sequences that mimics the biased sampling. Our results clearly demonstrate that the high-throughput sequencing platforms are the methodology of choice for generating complete mtDNA genome sequences. PMID:21147912
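    Per-sequence figures such as mean fold coverage and percentage of missing data are derived from a per-position depth profile. The sketch below assumes that a position counts as missing when its depth falls below a chosen threshold. The threshold and the toy profile are hypothetical, and the abstract does not state how missing data were defined.

```python
def coverage_summary(depths: list[int], min_depth: int = 1) -> tuple[float, float]:
    """Return (mean depth, fraction of positions below min_depth) for one sequence."""
    if not depths:
        raise ValueError("empty depth profile")
    mean_depth = sum(depths) / len(depths)
    missing = sum(1 for d in depths if d < min_depth) / len(depths)
    return mean_depth, missing


# Toy profile: 990 positions at 60x and 10 uncovered positions.
profile = [60] * 990 + [0] * 10
mean_depth, missing = coverage_summary(profile)
print(f"{mean_depth:.1f}-fold coverage, {missing:.1%} missing")
```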

  16. Tracking TCRβ Sequence Clonotype Expansions during Antiviral Therapy Using High-Throughput Sequencing of the Hypervariable Region

    PubMed Central

    Robinson, Mark W.; Hughes, Joseph; Wilkie, Gavin S.; Swann, Rachael; Barclay, Stephen T.; Mills, Peter R.; Patel, Arvind H.; Thomson, Emma C.; McLauchlan, John

    2016-01-01

    To maintain a persistent infection, viruses such as hepatitis C virus (HCV) employ a range of mechanisms that subvert protective T cell responses. The suppression of antigen-specific T cell responses by HCV hinders efforts to profile T cell responses during chronic infection and antiviral therapy. Conventional methods of detecting antigen-specific T cells utilize either antigen stimulation (e.g., ELISpot, proliferation assays, cytokine production) or antigen-loaded tetramer staining. These approaches limit the ability to profile T cell responses during chronic infection because of suppressed effector function and the requirement for prior knowledge of antigenic viral peptide sequences. Recently, high-throughput sequencing (HTS) technologies have been developed for the analysis of T cell repertoires. In the present study, we have assessed the feasibility of HTS of the TCRβ complementarity determining region (CDR)3 to track T cell expansions in an antigen-independent manner. Using sequential blood samples from HCV-infected individuals undergoing antiviral therapy, we were able to measure the population frequencies of >35,000 TCRβ sequence clonotypes in each individual over the course of 12 weeks. TRBV/TRBJ gene segment usage varied markedly between individuals but remained relatively constant within individuals across the course of therapy. Despite this stable TRBV/TRBJ gene segment usage, a number of TCRβ sequence clonotypes showed dramatic changes in read frequency. These changes could not be linked to therapy outcomes in the present study; however, the TCRβ CDR3 sequences with the largest fold changes did include sequences with identical TRBV/TRBJ gene segment usage and high junction region homology to previously published CDR3 sequences from HCV-specific T cells targeting the HLA-B*0801-restricted 1395HSKKKCDEL1403 and HLA-A*0101-restricted 1435ATDALMTGY1443 epitopes. The pipeline developed in this proof of concept study provides a platform for the design of future
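    Tracking clonotype expansions comes down to comparing each clonotype's share of reads between time points. The sketch below computes a log2 fold change with a pseudocount, so that clonotypes absent at one time point still get a finite value. The pseudocount and the placeholder clonotype labels are assumptions, and the code is not the study's pipeline.

```python
import math


def log2_fold_changes(before: dict[str, int], after: dict[str, int],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """log2(frequency after / frequency before) for every clonotype seen in either sample."""
    clonotypes = set(before) | set(after)
    total_before = sum(before.values()) + pseudocount * len(clonotypes)
    total_after = sum(after.values()) + pseudocount * len(clonotypes)
    changes = {}
    for c in clonotypes:
        freq_before = (before.get(c, 0) + pseudocount) / total_before
        freq_after = (after.get(c, 0) + pseudocount) / total_after
        changes[c] = math.log2(freq_after / freq_before)
    return changes


# Placeholder clonotype labels and read counts at two time points.
week0 = {"clonotype_A": 99, "clonotype_B": 1}
week12 = {"clonotype_A": 49, "clonotype_B": 49, "clonotype_C": 2}
fc = log2_fold_changes(week0, week12)
for c in sorted(fc, key=lambda c: abs(fc[c]), reverse=True):
    print(c, round(fc[c], 2))
```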

  17. Mitochondrial DNA sequence variation in Drosophilid species (Diptera: Drosophilidae) along altitudinal gradient from Central Himalayan region of India.

    PubMed

    Sarswat, Manisha; Dewan, Saurabh; Fartyal, Rajendra Singh

    2016-06-01

    The Central Himalayan region of India encompasses varied ecological habitats, ranging from near-tropical areas to mid-elevation forests dominated by cool-temperate taxa. In the past, we have reported several new records and novel species from Uttarakhand state of India. Here, we assessed genetic variation in three mitochondrial genes, namely 16S rRNA, cytochrome c oxidase subunit I (COI), and cytochrome c oxidase subunit II (COII), in 26 drosophilid species collected along an altitudinal transect from 550 to 2700 m above mean sea level. Overall, 543 sequences were generated (82 for 16S rRNA, 238 for COI, and 223 for COII), representing 21, 47, and 45 mitochondrial haplotypes for 16S rRNA, COI, and COII, respectively. Almost all species were represented by 2-3 unique mitochondrial haplotypes, indicating a significant impact of environmental heterogeneity along the altitudinal gradient on genetic diversity. This study also makes molecular data for some rare species publicly available for the first time, including Drosophila mukteshwarensis, Liodrosophila nitida, Lordiphosa parantillaria, Lordiphosa ayarpathaensis, Scaptomyza himalayana, Scaptomyza tistai, Zaprionus grandis, and Stegana minuta. PMID:27350680
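    Counting haplotypes amounts to collapsing identical aligned sequences. The sketch below assumes the sequences are already aligned, trimmed to a common length, and free of ambiguity codes. It also reports Nei's haplotype diversity, h = n/(n-1) * (1 - sum of p_i squared), a standard summary that the abstract itself does not report. The sequences are hypothetical.

```python
from collections import Counter


def haplotype_summary(sequences: list[str]) -> tuple[Counter, float]:
    """Collapse identical sequences into haplotypes and compute Nei's haplotype diversity."""
    counts = Counter(seq.upper() for seq in sequences)
    n = sum(counts.values())
    if n < 2:
        return counts, 0.0
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    diversity = n / (n - 1) * (1 - sum_sq)
    return counts, diversity


# Hypothetical aligned fragments from four specimens of one species.
seqs = ["ACGTTA", "ACGTTA", "ACGATA", "TCGTTA"]
haplotypes, h = haplotype_summary(seqs)
print(len(haplotypes), "haplotypes, h =", round(h, 3))
```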

  18. RNA Sequencing Analysis Reveals Transcriptomic Variations in Tobacco (Nicotiana tabacum) Leaves Affected by Climate, Soil, and Tillage Factors

    PubMed Central

    Lei, Bo; Lu, Kun; Ding, Fuzhang; Zhang, Kai; Chen, Yi; Zhao, Huina; Zhang, Lin; Ren, Zhu; Qu, Cunmin; Guo, Wenjing; Wang, Jing; Pan, Wenjie

    2014-01-01

    The growth and development of plants are sensitive to their surroundings. Although numerous studies have analyzed plant transcriptomic variation, few have quantified the effect of combinations of factors or identified factor-specific effects. In this study, we performed RNA sequencing (RNA-seq) analysis on tobacco leaves derived from 10 treatment combinations of three groups of ecological factors, i.e., climate factors (CFs), soil factors (SFs), and tillage factors (TFs). We detected 4980, 2916, and 1605 differentially expressed genes (DEGs) affected by CFs, SFs, and TFs, respectively. Of these, 2703, 768, and 507 were specific to CFs, SFs, and TFs, respectively, and 703 were common DEGs simultaneously regulated by all three factor groups. GO and KEGG enrichment analyses showed that genes involved in abiotic stress responses and secondary metabolic pathways were overrepresented in the common and CF-specific DEGs. In addition, we noted enrichment in CF-specific DEGs related to the circadian rhythm, SF-specific DEGs involved in mineral nutrient absorption and transport, and SF- and TF-specific DEGs associated with photosynthesis. Based on these results, we propose a model that explains how plants adapt to various ecological factors at the transcriptomic level. Additionally, the identified DEGs lay the foundation for future investigations of stress resistance, circadian rhythm, and photosynthesis in tobacco. PMID:24733065
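    The split into factor-specific and common DEGs is a set operation over the three DEG lists. The sketch below uses hypothetical gene IDs. Genes shared by exactly two factor groups fall into neither category. This is why, for example, the 2703 CF-specific and 703 common DEGs account for only 3406 of the 4980 CF-affected DEGs.

```python
def partition_degs(cf: set[str], sf: set[str], tf: set[str]) -> dict[str, set[str]]:
    """Split DEG sets into factor-specific genes and genes common to all three factors."""
    return {
        "CF-specific": cf - sf - tf,
        "SF-specific": sf - cf - tf,
        "TF-specific": tf - cf - sf,
        "common": cf & sf & tf,
    }


# Hypothetical gene IDs; g3 is shared by CF and SF only, so it lands in no category.
cf = {"g1", "g2", "g3", "g4"}
sf = {"g3", "g4", "g5"}
tf = {"g4", "g6"}
for label, genes in partition_degs(cf, sf, tf).items():
    print(label, sorted(genes))
```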

  19. Sequence Variation in Amplification Target Genes and Standards Influences Interlaboratory Comparison of BK Virus DNA Load Measurement

    PubMed Central

    Solis, Morgane; Meddeb, Mariam; Sueur, Charlotte; Domingo-Calap, Pilar; Soulier, Eric; Chabaud, Angeline; Perrin, Peggy; Moulin, Bruno; Bahram, Seiamak; Stoll-Keller, Françoise; Caillard, Sophie; Barth, Heidi

    2015-01-01

    International guidelines define a BK virus (BKV) load of ≥4 log10 copies/ml as presumptive of BKV-associated nephropathy (BKVN) and a cutoff for therapeutic intervention. To investigate whether BKV DNA loads (BKVL) are comparable between laboratories, 2 panels of 15 and 8 clinical specimens (urine, whole blood, and plasma) harboring different BKV genotypes were distributed to 20 and 27 French hospital centers in 2013 and 2014, respectively. Although 68% of the reported results fell within the acceptable range of the expected result ±0.5 log10, the interlaboratory variation ranged from 1.32 to 5.55 log10. Polymorphisms specific to BKV genotypes II and IV, namely, the number and position of mutations in amplification target genes and/or deletion in standards, arose as major sources of interlaboratory disagreements. The diversity of DNA purification methods also contributed to the interlaboratory variability, in particular for urine samples. Our data strongly suggest that (i) commercial external quality controls for BKVL assessment should include all major BKV genotypes to allow a correct evaluation of BKV assays, and (ii) the BKV sequence of commercial standards should be provided to users to verify the absence of mismatches with the primers and probes of their BKV assays. Finally, the optimization of primer and probe design and standardization of DNA extraction methods may substantially decrease interlaboratory variability and allow interinstitutional studies to define a universal cutoff for presumptive BKVN and, ultimately, ensure adequate patient care. PMID:26468499
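    Both the ±0.5 log10 acceptance window and the ≥4 log10 copies/ml cutoff are simple comparisons on a log scale. The function names and example results in the sketch below are hypothetical, not data from the two panels.

```python
import math

CUTOFF_COPIES_PER_ML = 10 ** 4  # 4 log10 copies/ml


def within_tolerance(reported_copies: float, expected_log10: float, tol: float = 0.5) -> bool:
    """True if a reported load lies within +/- tol log10 of the expected value."""
    return abs(math.log10(reported_copies) - expected_log10) <= tol


def above_cutoff(reported_copies: float) -> bool:
    """True if a load reaches the 4 log10 copies/ml threshold."""
    return reported_copies >= CUTOFF_COPIES_PER_ML


# Hypothetical (reported copies/ml, expected log10 copies/ml) pairs for one specimen.
results = [(10_000, 4.0), (50_000, 4.0), (2_000, 4.0), (7_000, 4.0)]
acceptable = sum(within_tolerance(r, e) for r, e in results) / len(results)
print(f"{acceptable:.0%} within range; above cutoff: {[above_cutoff(r) for r, _ in results]}")
```

    For a specimen whose expected load sits near 4 log10 copies/ml, results inside the acceptance window can still fall on either side of the cutoff, so interlaboratory differences of this size can matter clinically.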

  20. Depletion of Abundant Sequences by Hybridization (DASH): using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications.

    PubMed

    Gu, W; Crawford, E D; O'Donovan, B D; Wilson, M R; Chow, E D; Retallack, H; DeRisi, J L

    2016-01-01

    Next-generation sequencing has generated a need for a broadly applicable method to remove unwanted high-abundance species prior to sequencing. We introduce DASH (Depletion of Abundant Sequences by Hybridization). Sequencing libraries are 'DASHed' with recombinant Cas9 protein complexed with a library of guide RNAs targeting unwanted species for cleavage, thus preventing them from consuming sequencing space. We demonstrate a reduction of more than 99% in mitochondrial rRNA in HeLa cells, and enrichment of pathogen sequences in patient samples. We also demonstrate an application of DASH in cancer. This simple method can be adapted for any sample type and increases sequencing yield without additional cost. PMID:26944702
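    Guide RNAs for Streptococcus pyogenes Cas9 target a 20-nt protospacer immediately 5' of an NGG protospacer-adjacent motif (PAM). Candidate DASH targets in an unwanted sequence can therefore be enumerated by scanning both strands for that pattern. The sketch below ignores off-target checks, GC content, and other guide-design criteria, and it is not the guide-selection procedure used by the DASH authors. The test sequence is hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def candidate_protospacers(seq: str, length: int = 20) -> list[tuple[str, int, str]]:
    """List (strand, start, protospacer) for every `length`-nt site followed by NGG.

    `start` is a 0-based offset into that strand's own 5'->3' sequence.
    """
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})[ACGT]GG)")  # lookahead keeps overlaps
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq.upper()))):
        for match in pattern.finditer(s):
            hits.append((strand, match.start(), match.group(1)))
    return hits


target = "A" * 20 + "TGG"  # hypothetical: one forward-strand site
print(candidate_protospacers(target))
```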

  1. Major Breeding Plumage Color Differences of Male Ruffs (Philomachus pugnax) Are Not Associated With Coding Sequence Variation in the MC1R Gene

    PubMed Central

    Küpper, Clemens; Burke, Terry; Lank, David B.

    2015-01-01

    Sequence variation in the melanocortin-1 receptor (MC1R) gene explains color morph variation in several species of birds and mammals. Ruffs (Philomachus pugnax) exhibit major dark/light color differences in melanin-based male breeding plumage, which are closely associated with alternative reproductive behavior. A previous study identified a microsatellite marker (Ppu020) near the MC1R locus associated with the presence/absence of ornamental plumage. We investigated whether coding sequence variation in the MC1R gene explains major dark/light plumage color variation and/or the presence/absence of ornamental plumage in ruffs. Among 821 bp of the MC1R coding region from 44 male ruffs, we found 3 single nucleotide polymorphisms, representing 1 nonsynonymous and 2 synonymous substitutions. None were associated with major dark/light color differences or the presence/absence of ornamental plumage. At all amino acid sites known to be functionally important in other avian species with dark/light plumage color variation, ruffs were either monomorphic or the shared polymorphism did not coincide with color morph. Neither ornamental plumage color differences nor the presence/absence of ornamental plumage in ruffs are likely to be caused entirely by amino acid variation within the coding regions of the MC1R locus. Regulatory elements and structural variation at other loci may be involved in melanin expression and contribute to the extreme plumage polymorphism observed in this species. PMID:25534935
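    Whether a coding SNP is synonymous or nonsynonymous follows from translating the reference and alternate codons with the genetic code. The sketch below uses the standard code. The coding sequence and positions are hypothetical (not the ruff MC1R sequence), and the sequence is assumed to be in frame from its first base.

```python
BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG order of first, second, third base.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_snp(cds: str, position: int, alt: str) -> tuple[str, str, str]:
    """Classify a SNP at 0-based `position` of an in-frame coding sequence.

    Returns (effect, reference amino acid, alternate amino acid); '*' is a stop codon.
    """
    codon_start = position - position % 3
    offset = position % 3
    ref_codon = cds[codon_start:codon_start + 3].upper()
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    effect = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return effect, ref_aa, alt_aa


cds = "ATGCTGAAA"  # hypothetical: Met-Leu-Lys
print(classify_snp(cds, 5, "A"))  # third position of CTG
print(classify_snp(cds, 3, "A"))  # first position of CTG
```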

  2. Wavefront Amplitude Variation of TPF's High Contrast Imaging Testbed: Modeling and Experiment

    NASA Technical Reports Server (NTRS)

    Shi, Fang; Lowman, Andrew E.; Moody, Dwight C.; Niessner, Albert F.; Trauger, John T.

    2005-01-01

    Knowledge of wavefront amplitude is as important as knowledge of phase for a coronagraphic high contrast imaging system. Efforts have been made to understand the various contributions to amplitude variation in the High Contrast Imaging Testbed (HCIT) of the Terrestrial Planet Finder (TPF). Modeling of HCIT with as-built mirror surfaces has shown an amplitude variation of 1.3% due to phase-amplitude mixing in the testbed's front-end optics. Experimental measurements on the testbed have shown an amplitude variation of about 2.5%, with the testbed's illumination pattern contributing a major part of the low-order amplitude variation.
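    The abstract does not define how amplitude variation is quantified. One common convention, assumed in the sketch below, is the RMS deviation of field amplitude from its mean, expressed as a fraction of the mean, with amplitude taken as the square root of measured intensity. The pixel values are hypothetical.

```python
import math


def rms_amplitude_variation(intensities: list[float]) -> float:
    """RMS deviation of amplitude (sqrt of intensity) from its mean, divided by the mean."""
    amplitudes = [math.sqrt(i) for i in intensities]
    mean = sum(amplitudes) / len(amplitudes)
    rms = math.sqrt(sum((a - mean) ** 2 for a in amplitudes) / len(amplitudes))
    return rms / mean


# Hypothetical pupil-plane intensity samples (arbitrary units).
samples = [0.81, 1.21, 0.81, 1.21]
print(f"amplitude variation: {rms_amplitude_variation(samples):.1%}")
```

    Intensity goes as amplitude squared, so for small variations the fractional intensity variation is roughly twice the fractional amplitude variation. Under this convention, a 2.5% amplitude variation corresponds to roughly 5% in intensity.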

  3. Twin Mitochondrial Sequence Analysis.

    PubMed

    Bouhlal, Yosr; Martinez, Selena; Gong, Henry; Dumas, Kevin; Shieh, Joseph T C

    2013-09-01

    When applying genome-wide sequencing technologies to disease investigation, it is increasingly important to resolve sequence variation in regions of the genome that may have homologous sequences. The human mitochondrial genome challenges interpretation given the potential for heteroplasmy, somatic variation, and homologous nuclear mitochondrial sequences (numts). Identical twins share the same mitochondrial DNA (mtDNA) from early life, but whether the mitochondrial sequence remains similar is unclear. We compared an adult monozygotic twin pair using high-throughput sequencing and evaluated variants with primer extension and mitochondrial pre-enrichment. Thirty-seven variants were shared between the twin individuals, and the variants were verified on the original genomic DNA. These results support near-identical mitochondrial sequences in this twin pair. Certain low-level variant calls were of high quality and showed homology to mitochondrial DNA, and these were evaluated further. When we assessed these calls in pre-enriched mitochondrial DNA templates, we found that they may represent numts, which can be differentiated from mtDNA variation. We conclude that twin identity extends to mitochondrial DNA, and that it is critical to differentiate between numts and mtDNA in genome sequencing, particularly since significant heteroplasmy could influence genome interpretation. Further studies on mtDNA and numts will aid in understanding how variation occurs and persists. PMID:24040623
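    The numt check described here rests on one expectation. A variant carried by a nuclear copy should lose allele fraction when the template is pre-enriched for mtDNA, whereas true mtDNA variation, including heteroplasmy, should persist. The sketch below implements that comparison with a hypothetical 50% relative-drop threshold and made-up read counts.

```python
def allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads supporting the alternate allele."""
    return alt_reads / total_reads if total_reads else 0.0


def possible_numt(af_genomic: float, af_enriched: float, min_relative_drop: float = 0.5) -> bool:
    """Flag a call whose allele fraction drops by at least min_relative_drop after
    mtDNA pre-enrichment, as expected if the variant lies in a nuclear copy."""
    if af_genomic == 0:
        return False
    return (af_genomic - af_enriched) / af_genomic >= min_relative_drop


# Made-up read counts: (alt, total) in genomic DNA and in pre-enriched mtDNA.
calls = {
    "low-level call 1": ((30, 1000), (2, 2000)),
    "low-level call 2": ((50, 1000), (96, 2000)),
}
for name, (genomic, enriched) in calls.items():
    print(name, possible_numt(allele_fraction(*genomic), allele_fraction(*enriched)))
```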

  4. Very high resolution single pass HLA genotyping using amplicon sequencing on the 454 next generation DNA sequencers: Comparison with Sanger sequencing.

    PubMed

    Yamamoto, F; Höglund, B; Fernandez-Vina, M; Tyan, D; Rastrou, M; Williams, T; Moonsamy, P; Goodridge, D; Anderson, M; Erlich, H A; Holcomb, C L

    2015-12-01

    Compared to Sanger sequencing, next-generation sequencing offers advantages for high-resolution HLA genotyping, including increased throughput, lower cost, and reduced genotype ambiguity. Here we describe an enhancement of the Roche 454 GS GType HLA genotyping assay to provide very high resolution (VHR) typing, by the addition of 8 primer pairs to the original 14, to genotype 11 HLA loci. These additional amplicons help resolve common and well-documented alleles and exclude commonly found null alleles in genotype ambiguity strings. Simplification of workflow to reduce the initial preparation effort using early pooling of amplicons or the Fluidigm Access Array™ is also described. Performance of the VHR assay was evaluated on 28 well-characterized cell lines using Conexio Assign MPS software, which uses genomic, rather than cDNA, reference sequence. Concordance was 98.4%; 1.6% had no genotype assignment. Of concordant calls, 53% were unambiguous. To further assess the assay, 59 clinical samples were genotyped and results compared to unambiguous allele assignments obtained by prior sequence-based typing supplemented with SSO and/or SSP. Concordance was 98.7%, with 58.2% unambiguous calls; 1.3% could not be assigned. Our results show that the amplicon-based VHR assay is robust and can replace current Sanger methodology. Together with software enhancements, it has the potential to provide even higher resolution HLA typing. PMID:26037172
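    The concordance, no-assignment, and unambiguous percentages can be computed from per-sample ambiguity strings. The sketch below makes two assumptions: a call counts as concordant when the reference genotype is among the candidate genotypes returned, and as unambiguous when exactly one candidate remains. Sample names and genotype labels are placeholders.

```python
def typing_summary(calls: dict[str, set[str]], reference: dict[str, str]) -> dict[str, float]:
    """Summarize calls (sample -> candidate genotypes; empty set = no assignment)
    against known genotypes (sample -> genotype)."""
    n = len(reference)
    assigned = [s for s in reference if calls.get(s)]
    concordant = [s for s in assigned if reference[s] in calls[s]]
    unambiguous = [s for s in concordant if len(calls[s]) == 1]
    return {
        "concordance": len(concordant) / n,
        "no_assignment": (n - len(assigned)) / n,
        "unambiguous_among_concordant": len(unambiguous) / len(concordant) if concordant else 0.0,
    }


# Placeholder samples and genotype labels.
reference = {"s1": "g1", "s2": "g2", "s3": "g3", "s4": "g4"}
calls = {"s1": {"g1"}, "s2": {"g2", "g2b"}, "s3": {"g3"}, "s4": set()}
print(typing_summary(calls, reference))
```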

  5. HetMappsS: Heterozygous mapping strategy for high resolution Genotyping-by-Sequencing Markers

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Reduced representation genotyping approaches, such as genotyping-by-sequencing (GBS), provide opportunities to generate high-resolution genetic maps at a low per-sample cost. However, missing data and non-uniform sequence coverage can complicate map creation in highly heterozygous species. To facili...

  6. Detection of genetic variation affecting milk coagulation properties in Danish Holstein dairy cattle by analyses of pooled whole-genome sequences from phenotypically extreme samples (pool-seq).

    PubMed

    Bertelsen, H P; Gregersen, V R; Poulsen, N; Nielsen, R O; Das, A; Madsen, L B; Buitenhuis, A J; Holm, L-E; Panitz, F; Larsen, L B; Bendixen, C

    2016-04-01

    Rennet-induced milk coagulation is an important trait for cheese production. Recent studies have reported an alarming frequency of cows producing poorly coagulating milk that is unsuitable for cheese production. Several genetic factors are known to affect milk coagulation, including variation in the major milk proteins; however, recent association studies indicate genetic effects from other genomic regions as well. The aim of this study was to detect genetic variation affecting milk coagulation properties, measured as curd-firming rate (CFR) and milk pH. This was achieved by examining allele frequency differences between pooled whole-genome sequences of phenotypically extreme samples (pool-seq). Curd-firming rate and raw milk pH were measured for 415 Danish Holstein cows, and each animal was sequenced at low coverage. Pools were created containing whole-genome sequence reads from samples with "extreme" values (high or low) for both phenotypic traits. A total of 6,992,186 and 5,295,501 SNP were assessed in relation to CFR and milk pH, respectively. Allele frequency differences were calculated between pools, and 32 significantly different SNP were detected, 1 for milk pH and 31 for CFR, of which 19 are located on chromosome 6. A total of 9 significant SNP, selected based on the possible function of proximal candidate genes, were genotyped in the entire sample set (n = 415) to test for an association. The most significant SNP was located proximal to the κ-casein gene, explaining 33% of the phenotypic variance. The κ-casein gene is the most studied gene in relation to milk coagulation because of the protein's position on the surface of the casein micelles and its direct involvement in milk coagulation. Three additional SNP located on chromosome 6 showed significant associations, explaining 7, 3.6, and 1.3% of the phenotypic variance of CFR. The significant SNP on chromosome 6 were shown to be in linkage disequilibrium with the SNP peaking proximal to the κ-casein gene; however, after accounting for the genotype of
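
    For each SNP, the pool-seq comparison reduces to contrasting allele frequencies estimated from read counts in the high and low pools. The abstract does not say which statistical test was used, so the sketch below assumes a two-proportion z-test on read counts; the function names and example counts are hypothetical, and a real analysis would also have to account for coverage and the number of animals per pool.

```python
import math

def alt_frequency(ref_reads, alt_reads):
    """Alternative-allele frequency estimated from pooled read counts at one SNP."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this SNP")
    return alt_reads / depth

def pool_difference(high_pool, low_pool):
    """Return (frequency difference, two-sided p-value) for one SNP.

    Each pool is a (ref_reads, alt_reads) tuple. The p-value comes from a
    two-proportion z-test on read counts, an assumed test chosen for illustration.
    """
    n_high, n_low = sum(high_pool), sum(low_pool)
    f_high, f_low = alt_frequency(*high_pool), alt_frequency(*low_pool)
    pooled = (high_pool[1] + low_pool[1]) / (n_high + n_low)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_high + 1 / n_low))
    if se == 0:  # allele fixed or absent in both pools: no evidence of a difference
        return f_high - f_low, 1.0
    z = (f_high - f_low) / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    return f_high - f_low, math.erfc(abs(z) / math.sqrt(2))
```

    With 30 of 50 reads carrying the alternative allele in the high pool and 15 of 50 in the low pool, `pool_difference((20, 30), (35, 15))` gives a frequency difference of 0.3 and a p-value below 0.01.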

  7. Genetic variation for high temperature tolerance in maize

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As global warming becomes inevitable, the sustainability of agricultural production in the US and worldwide faces a serious threat from extreme weather conditions, such as drought and high temperature (heat) stresses. While drought stress can be alleviated through irrigation, little can be done with high ...

  8. Crystallographic orientation variation of isothermal pearlite under high magnetic field

    SciTech Connect

    Meng, Lan; Zhou, Xiaoling; Chen, Jianhao

    2015-07-15

    The crystallographic orientation (CO) variation of magnetically induced pearlite (MIP) during its microstructure evolution in a 19.8 T field was investigated by electron backscatter diffraction (EBSD). The CO variation is closely related to the isothermal temperature (IT) and the magnetic field application time (MT) during MIP formation. The <100> easy magnetization direction in MIP colonies is strengthened as MT increases, within a certain range of transformed MIP fraction (f_MIP), at a relatively low IT (983 K) that lies above the eutectoid temperature but below the magnetically raised eutectoid temperature, whereas this CO tends to be weakened at a relatively high IT (995 K). For the same MT, the higher the IT, the larger the proportion of MIP colonies with <100> orientation at the early growth stage. These results demonstrate that the change in <100> orientation of MIP is closely related to the growth rate of pearlitic ferrite (PF) and that this orientation is strengthened mainly at the early transformation stage. When f_MIP reaches a certain value, the growth rate of MIP at other COs, such as <110> and even the hard magnetization direction, begins to speed up.
    Highlights:
    • A high magnetic field (HMF) can induce pearlite with different fractions above the eutectoid temperature.
    • CO is closely related to the isothermal temperature and the magnetic field application time.
    • The <100> direction is related to the growth rate of PF and is strengthened at the early stage.
    • When f_MIP reaches a certain value, the growth rate at other COs begins to speed up.

  9. Human Skin Microbiota: High Diversity of DNA Viruses Identified on the Human Skin by High Throughput Sequencing

    PubMed Central

    Foulongne, Vincent; Sauvage, Virginie; Hebert, Charles; Dereure, Olivier; Cheval, Justine; Gouilh, Meriadeg Ar; Pariente, Kevin; Segondy, Michel; Burguière, Ana; Manuguerra, Jean-Claude; Caro, Valérie; Eloit, Marc

    2012-01-01

    The human skin is a complex ecosystem that hosts a heterogeneous flora. Until recently, the diversity of the cutaneous microbiota was investigated mainly for bacteria, through culture-based assays subsequently confirmed by molecular techniques. There is now considerable evidence that viruses represent a significant part of the cutaneous flora, as demonstrated by the asymptomatic carriage of beta- and gamma-human papillomaviruses on healthy skin. Furthermore, it has recently been suggested that some representatives of the genus Polyomavirus might share a similar feature. In the present study, the cutaneous virome of the surface of normal-appearing skin from five healthy individuals and one patient with Merkel cell carcinoma was investigated through a high-throughput metagenomic sequencing approach, in an attempt to provide a thorough description of the cutaneous flora with a particular focus on its viral component. The results emphasize the high diversity of the viral cutaneous flora, with multiple polyomaviruses, papillomaviruses and circoviruses detected on normal-appearing skin. Moreover, this approach led to the identification of new papillomavirus and circovirus genomes and confirmed a very low level of genetic diversity within human polyomavirus species. Although viruses are generally considered pathogens, our findings support the existence of a complex viral flora at the surface of healthy-appearing human skin in various individuals. The dynamics and anatomical variations of this skin virome, and its variations according to pathological conditions, remain to be studied further. The potential involvement of these viruses, alone or in combination, in proliferative skin disorders and oncogenesis is another crucial issue to be elucidated. PMID:22723863

  10. Sequence Variation and Immunologic Cross-Reactivity among Babesia bovis Merozoite Surface Antigen 1 Proteins from Vaccine Strains and Vaccine Breakthrough Isolates

    PubMed Central

    LeRoith, Tanya; Brayton, Kelly A.; Molloy, John B.; Bock, Russell E.; Hines, Stephen A.; Lew, Ala E.; McElwain, Terry F.

    2005-01-01

    The Babesia bovis merozoite surface antigen 1 (MSA-1) is an immunodominant membrane glycoprotein that is the target of invasion-blocking antibodies. While antigenic variation has been demonstrated in MSA-1 among strains from distinct geographical areas, the extent of sequence variation within a region where it is endemic and the effect of variation on immunologic cross-reactivity have not been assessed. In this study, sequencing of MSA-1 from two Australian B. bovis vaccine strains and 14 breakthrough isolates from vaccinated animals demonstrated low sequence identity in the extracellular region of the molecule, ranging from 19.8 to 46.7% between the T vaccine strain and eight T vaccine breakthrough isolates, and from 18.7 to 99% between the K vaccine strain and six K vaccine breakthrough isolates. Although MSA-1 amino acid sequence varied substantially among strains, overall predicted regions of hydrophilicity and hydrophobicity in the extracellular domain were conserved in all strains examined, suggesting a conserved functional role for MSA-1 despite sequence polymorphism. Importantly, the antigenic variation created by sequence differences resulted in a lack of immunologic cross-reactivity among outbreak strains using sera from animals infected with the B. bovis vaccine strains. Additionally, sera from cattle hyperinfected with the Mexico strain of B. bovis and shown to be clinically immune did not cross-react with MSA-1 from any other isolate tested. The results indicate that isolates of B. bovis capable of evading vaccine-induced immunity contain an msa-1 gene that is significantly different from the msa-1 of the vaccine strain, and that the difference can result in a complete lack of cross-reactivity between MSA-1 from vaccine and breakthrough strains in immunized animals. PMID:16113254
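
    Identity values such as the 19.8 to 46.7% range above depend on how aligned positions are counted, which the abstract does not specify. As an illustration only, the sketch below computes percent identity from two pre-aligned sequences, treating a column as compared unless both sequences have a gap there; this convention is an assumption, and others (for example, ignoring every gapped column) give different numbers. The example sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two sequences that are already aligned.

    Columns where both sequences have a gap are skipped; a gap opposite a
    residue counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column is empty in both sequences
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * identical / compared
```

    For example, `percent_identity("MKV-LT", "MKIALT")` compares six columns, four of them identical, giving about 66.7%.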

  11. High-throughput sequencing of black pepper root transcriptome

    PubMed Central

    2012-01-01

    Background: Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and food preservation and also has medicinal properties. Losses in production from disease are a major limitation in the cultivation of this crop. The major diseases are root rot and foot rot, which result from root infection by Fusarium solani and Phytophthora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results: The root transcriptome of black pepper was sequenced on the SOLiD NGS platform and assembled using the multiple-k method. Blast2GO and OrthoMCL were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions: This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms. PMID:22984782

  12. Variations in Natal Knowledge Among High School Students

    ERIC Educational Resources Information Center

    Decker, David L.; Caetano, Donald F.

    1977-01-01

    A questionnaire given to high school students revealed wide differences in knowledge and understanding of birth defects, conception, and childbearing between males and females and between Caucasians and members of other ethnic groups. (JD)

  13. Sequence Variation Analysis of Epstein-Barr Virus Nuclear Antigen 1 Gene in the Virus Associated Lymphomas of Northern China

    PubMed Central

    Sun, Lingling; Zhao, Zhenzhen; Liu, Song; Liu, Xia; Sun, Zhifu; Luo, Bing

    2015-01-01

    Epstein-Barr virus (EBV) nuclear antigen 1 (EBNA1) is the only viral protein expressed in all EBV-positive tumors, as it is essential for the maintenance, replication and transcription of the viral genome. According to the polymorphism at residue 487 of EBNA1, EBV isolates can be classified into five subtypes: P-ala, P-thr, V-val, V-leu and V-pro. Whether these EBNA1 subtypes contribute to different tissue tropisms of EBV and are consequently associated with certain malignancies remains to be determined. To elucidate this relationship, 110 EBV-positive lymphoma tissues of different types from Northern China, a non-NPC-endemic area, were tested for the five subtypes by nested PCR and DNA sequencing. In addition, EBV type 1 and type 2 were distinguished using standard PCR assays across type-specific regions of the EBNA3C gene. Four EBNA1 subtypes were identified: V-val (68.2%, 75/110), P-thrV (15.5%, 17/110), V-leuV (3.6%, 4/110) and P-ala (10.9%, 12/110). The distribution of the EBNA1 subtypes across the four lymphoma groups was not significantly different (p = 0.075), nor was that of EBV type 1/type 2 (p = 0.089). Compared with previous data from gastric carcinoma (GC), nasopharyngeal carcinoma (NPC) and throat washings (TW) from healthy donors, the distribution of EBNA1 subtypes in lymphoma differed significantly (p = 0.016), with a slightly higher frequency of the P-ala subtype. The EBV type distribution differed significantly between lymphoma and each of the other three groups (p < 0.001, p < 0.001 and p = 0.001, respectively). The proportion of mixed type 1 and type 2 infections was higher in lymphoma than in GC, NPC and TW. In lymphomas, the distribution of EBNA1 subtypes across the three EBV types was not significantly different (p = 0.546). These data suggest that the variation patterns of the EBNA1 gene may be geographically associated rather than tumor-specific, and that the role of EBNA1 gene variations in tumorigenesis needs more extensive and
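
    The subtype percentages reported above follow directly from the counts over the 110 lymphoma tissues. The short check below recomputes them from those counts; the dictionary and function names are illustrative.

```python
# Subtype counts reported in the abstract, out of 110 EBV-positive lymphoma tissues.
SUBTYPE_COUNTS = {"V-val": 75, "P-thrV": 17, "V-leuV": 4, "P-ala": 12}
TOTAL_SAMPLES = 110

def subtype_percentages(counts, total):
    """Percentage of samples per subtype, rounded to one decimal place."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

if __name__ == "__main__":
    for name, pct in subtype_percentages(SUBTYPE_COUNTS, TOTAL_SAMPLES).items():
        print(f"{name}: {pct}%")
    print("assigned:", sum(SUBTYPE_COUNTS.values()), "of", TOTAL_SAMPLES)
```

    The recomputed values match the reported 68.2, 15.5, 3.6 and 10.9%. The four listed counts sum to 108 rather than 110; the abstract does not state how the remaining two samples were classified.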

  14. Mitochondrial cytochrome B DNA variation in the high-fecundity atlantic cod: trans-atlantic clines and shallow gene genealogy.

    PubMed Central

    Arnason, Einar

    2004-01-01

    An analysis of sequence variation in 250 bp of the mitochondrial cytochrome b gene of 1278 Atlantic cod (Gadus morhua) ranging from Newfoundland to the Baltic shows four high-frequency (>8%) haplotypes and a number of rare and singleton haplotypes. The variation consists primarily of synonymous mutations. Natural selection acting directly on these variants is either absent or very weak. The common haplotypes show regular trans-Atlantic clines in frequency, and each of them reaches its highest frequency in a particular country. A shallow, multifurcating, constellation-like gene genealogy implies a young age and recent turnover of the polymorphism. Haplotypes characterizing populations at opposite ends of the geographic distribution, in Newfoundland and the Baltic, are mutationally closest together. The haplotypes are young and have risen rapidly in frequency. The observed differentiation among countries is due primarily to clinal variation. Hypotheses of historical isolation, and of polymorphisms balanced by local selection and gene flow, are unlikely. Instead, the results are explained by demic selection of mitochondria carried by highly fit females winning reproductive sweepstakes. By inference, the Atlantic cod, a very high-fecundity vertebrate, is characterized by a high variance in offspring number and by strong natural selection that leads to a very low ratio of effective to actual population size. PMID:15126405
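
    Sorting haplotypes into high-frequency (>8%), rare and singleton classes is a simple tally over the sampled individuals. In the sketch below only the 8% threshold comes from the abstract; treating "rare" as any non-singleton at or below that threshold is an assumption, and the haplotype labels in the example are invented.

```python
from collections import Counter

def classify_haplotypes(haplotypes, high_threshold=0.08):
    """Split haplotype labels into high-frequency, rare and singleton classes.

    Returns (dict of high-frequency haplotype -> frequency, sorted rare list,
    sorted singleton list). 'Rare' means seen more than once but at or below
    the threshold (an assumed definition).
    """
    counts = Counter(haplotypes)
    total = len(haplotypes)
    high = {h: c / total for h, c in counts.items() if c / total > high_threshold}
    rare = sorted(h for h, c in counts.items() if c > 1 and c / total <= high_threshold)
    singletons = sorted(h for h, c in counts.items() if c == 1 and c / total <= high_threshold)
    return high, rare, singletons
```

    With a hypothetical sample of 100 fish (50 "A", 30 "B", 9 "C", 5 "D" and six single occurrences), haplotypes A, B and C exceed 8%, D is rare, and the remaining six are singletons.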

  15. Relation between mRNA expression and sequence information in Desulfovibrio vulgaris: Combinatorial contributions of upstream regulatory motifs and coding sequence features to variations in mRNA abundance

    SciTech Connect

    Wu, Gang; Nie, Lei; Zhang, Weiwen

    2006-05-26

    The context-dependent expression of genes is at the core of biological activity, and significant attention has been given to identifying the various factors that contribute to gene expression at the genomic scale. However, this type of analysis has so far focused either on the relation between mRNA expression and non-coding sequence features, such as upstream regulatory motifs, or on the correlation between mRNA abundance and non-random features in coding sequences (e.g. codon usage and amino acid usage). In this study, multiple regression analyses of mRNA abundance against all sequence information in Desulfovibrio vulgaris were performed, with the goal of investigating how much coding and non-coding sequence features contribute to variation in mRNA expression, and in what manner they act together...
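
    As a minimal illustration of the multiple-regression idea (not the authors' model or feature set), the sketch below fits mRNA abundance as a linear function of per-gene sequence features by ordinary least squares and reports R², the fraction of variance explained. The features (for instance, a hypothetical upstream motif score and a codon usage index) and the data are made up, and solving the normal equations directly is adequate for a few well-conditioned features but not for large or collinear designs.

```python
def ols_fit(features, response):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + [float(v) for v in row] for row in features]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * y for r, y in zip(rows, response))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear or data are insufficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        rest = sum(aug[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (aug[i][p] - rest) / aug[i][i]
    return beta

def r_squared(features, response, beta):
    """Fraction of variance in the response explained by the fitted linear model."""
    predicted = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in features]
    mean = sum(response) / len(response)
    ss_res = sum((y - f) ** 2 for y, f in zip(response, predicted))
    ss_tot = sum((y - mean) ** 2 for y in response)
    return 1.0 - ss_res / ss_tot
```

    On noise-free toy data generated as abundance = 2 + 3·x1 − x2, the fit recovers the coefficients [2, 3, −1] and an R² of 1; on real expression data R² quantifies how much of the variation the chosen features explain together.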

  16. Genome-wide synteny through highly sensitive sequence alignment: Satsuma

    PubMed Central

    Grabherr, Manfred G.; Russell, Pamela; Meyer, Miriah; Mauceli, Evan; Alföldi, Jessica; Di Palma, Federica; Lindblad-Toh, Kerstin

    2010-01-01

    Motivation: Comparative genomics heavily relies on alignments of large and often complex DNA sequences. From an engineering perspective, the problem here is to provide maximum sensitivity (to find all there is to find), specificity (to only find real homology) and speed (to accommodate the billions of base pairs of vertebrate genomes). Results: Satsuma addresses all three issues through novel strategies: (i) cross-correlation, implemented via fast Fourier transform; (ii) a match scoring scheme that eliminates almost all false hits; and (iii) an asynchronous ‘battleship’-like search that allows for aligning two entire fish genomes (470 and 217 Mb) in 120 CPU hours using 15 processors on a single machine. Availability: Satsuma is part of the Spines software package, implemented in C++ on Linux. The latest version of Spines can be freely downloaded under the LGPL license from http://www.broadinstitute.org/science/programs/genome-biology/spines/ Contact: grabherr@broadinstitute.org PMID:20208069
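
    The cross-correlation idea in point (i) can be shown on a toy scale. Encoding each nucleotide as a 0/1 indicator track turns "how many bases match at each offset" into a sum of four cross-correlations, each computable with an FFT. The sketch below is a simplified, pure-Python illustration of that technique, not Satsuma's implementation; it omits the match scoring scheme and the search strategy, and the function names are hypothetical.

```python
import cmath

def fft(x, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def match_counts(target, query):
    """Identical-base counts for each placement of query fully inside target."""
    target, query = target.upper(), query.upper()
    size = 1
    while size < len(target) + len(query) - 1:
        size *= 2  # zero-pad so the circular correlation does not wrap around
    totals = [0.0] * size
    for base in "ACGT":
        a = [1.0 if c == base else 0.0 for c in target] + [0.0] * (size - len(target))
        q = [1.0 if c == base else 0.0 for c in query] + [0.0] * (size - len(query))
        fa, fq = fft(a), fft(q)
        # Multiplying by the complex conjugate in frequency space yields cross-correlation.
        corr = fft([u * v.conjugate() for u, v in zip(fa, fq)], invert=True)
        for k in range(size):
            totals[k] += corr[k].real / size  # inverse-FFT normalization
    return [round(totals[k]) for k in range(len(target) - len(query) + 1)]
```

    For example, `match_counts("TTACGTACGGA", "ACGG")` returns `[0, 1, 3, 0, 0, 1, 4, 1]`, so the best placement of the query is at offset 6. The FFT route costs O(n log n) per base rather than the O(nm) of checking every offset directly, which is what makes the approach attractive at genome scale.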

  17. High-resolution network biology: connecting sequence with function

    PubMed Central

    Ryan, Colm J.; Cimermančič, Peter; Szpiech, Zachary A.; Sali, Andrej; Hernandez, Ryan D.; Krogan, Nevan J.

    2014-01-01

    Proteins are not monolithic entities; rather, they can contain multiple domains that mediate distinct interactions, and their functionality can be regulated through post-translational modifications at multiple distinct sites. Traditionally, network biology has ignored such properties of proteins and has instead examined either the physical interactions of whole proteins or the consequences of removing entire genes. In this Review, we discuss experimental and computational methods to increase the resolution of protein–protein, genetic and drug–gene interaction studies to the domain and residue levels. Such work will be crucial for using interaction networks to connect sequence and structural information, and to understand the biological consequences of disease-associated mutations, which will hopefully lead to more effective therapeutic strategies. PMID:24197012

  18. High-throughput-sequencing-based identification of a grapevine fanleaf virus satellite RNA in Vitis vinifera.

    PubMed

    Chiumenti, Michela; Mohorianu, Irina; Roseti, Vincenzo; Saldarelli, Pasquale; Dalmay, Tamas; Minafra, Angelantonio

    2016-05-01

    A new satellite RNA (satRNA) of grapevine fanleaf virus (GFLV) was identified by high-throughput sequencing of high-definition (HD) adapter libraries from grapevine plants of the cultivar Panse precoce (PPE) affected by enation disease. The complete nucleotide sequence was obtained by automatic sequencing using primers designed based on next-generation sequencing (NGS) data. The full-length sequence, named satGFLV-PPE, consisted of 1119 nucleotides with a single open reading frame from position 15 to 1034. This satRNA showed a maximum nucleotide sequence identity of 87% to satArMV-86 and satGFLV-R6. Symptomatic grapevines were surveyed for the presence of the satRNA, and no correlation was found between detection of the satRNA and enation symptom expression. PMID:26873812
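
    Locating an open reading frame such as the one reported at positions 15 to 1034 amounts to scanning each forward reading frame for an ATG start codon followed by an in-frame stop codon. The sketch below returns the longest such ORF in 1-based, inclusive coordinates that include the stop codon; whether the reported coordinates include the stop codon is not stated in the abstract, and the test sequences are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Longest forward-strand ATG..stop ORF as 1-based inclusive (start, end), or None.

    RNA input is accepted (U is read as T). The end coordinate includes the stop codon.
    """
    seq = seq.upper().replace("U", "T")
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or i + 3 - start > best[1] - best[0] + 1:
                    best = (start + 1, i + 3)
                start = None  # look for the next ORF in the same frame
    return best
```

    For example, in `"GGGG" + "ATG" + "AAA" * 3 + "TAA" + "CC"` the ORF spans positions 5 to 19; the RNA form of the same sequence gives the same result.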