Neocortical malformation as consequence of nonadaptive regulation of neuronogenetic sequence
NASA Technical Reports Server (NTRS)
Caviness, V. S. Jr; Takahashi, T.; Nowakowski, R. S.
2000-01-01
Variations in the structure of the neocortex induced by single gene mutations may be extreme or subtle. They differ from variations in neocortical structure encountered across and within species in that these "normal" structural variations are adaptive (both structurally and behaviorally), whereas those associated with disorders of development are not. Here we propose that they also differ in principle in that they represent disruptions of molecular mechanisms that are not normally regulatory to variations in the histogenetic sequence. We propose an algorithm for the operation of the neuronogenetic sequence in relation to the overall neocortical histogenetic sequence and highlight the restriction point of the G1 phase of the cell cycle as the master regulatory control point for normal coordinate structural variation across species and importantly within species. From considerations based on the anatomic evidence from neocortical malformation in humans, we illustrate in principle how this overall sequence appears to be disrupted by molecular biological linkages operating principally outside the control mechanisms responsible for the normal structural variation of the neocortex. MRDD Research Reviews 6:22-33, 2000. Copyright 2000 Wiley-Liss, Inc.
Identification of structural variation in mouse genomes.
Keane, Thomas M; Wong, Kim; Adams, David J; Flint, Jonathan; Reymond, Alexandre; Yalcin, Binnaz
2014-01-01
Structural variation is variation in structure of DNA regions affecting DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variation and its association with human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of molecular architecture and functional consequences of structural variation.
Richardson, David S; Westerdahl, Helena
2003-12-01
The Great reed warbler (GRW) and the Seychelles warbler (SW) are congeners with markedly different demographic histories. The GRW is a normal outbred bird species while the SW population remains isolated and inbred after undergoing a severe population bottleneck. We examined variation at Major Histocompatibility Complex (MHC) class I exon 3 using restriction fragment length polymorphism, denaturing gradient gel electrophoresis and DNA sequencing. Although genetic variation was higher in the GRW, considerable variation has been maintained in the SW. The ten exon 3 sequences found in the SW were as diverged from each other as was a random sub-sample of the 67 sequences from the GRW. There was evidence for balancing selection in both species, and the phylogenetic analysis, which showed that the exon 3 sequences did not separate according to species, was consistent with trans-species evolution of the MHC.
Methodologic European external quality assurance for DNA sequencing: the EQUALseq program.
Ahmad-Nejad, Parviz; Dorn-Beineke, Alexandra; Pfeiffer, Ulrike; Brade, Joachim; Geilenkeuser, Wolf-Jochen; Ramsden, Simon; Pazzagli, Mario; Neumaier, Michael
2006-04-01
DNA sequencing is a key technique in molecular diagnostics, but to date no comprehensive methodologic external quality assessment (EQA) programs have been instituted. Between 2003 and 2005, the European Union funded, as specific support actions, the EQUAL initiative to develop methodologic EQA schemes for genotyping (EQUALqual), quantitative PCR (EQUALquant), and sequencing (EQUALseq). Here we report on the results of the EQUALseq program. The participating laboratories received a 4-sample set comprising 2 DNA plasmids, a PCR product, and a finished sequencing reaction to be analyzed. Data and information from detailed questionnaires were uploaded online and evaluated by use of a scoring system for technical skills and proficiency of data interpretation. Sixty laboratories from 21 European countries registered, and 43 participants (72%) returned data and samples. Capillary electrophoresis was the predominant platform (n = 39; 91%). The median contiguous correct sequence stretch was 527 nucleotides with considerable variation in quality of both primary data and data evaluation. The association between laboratory performance and the number of sequencing assays/year was statistically significant (P <0.05). Interestingly, more than 30% of participants neither added comments to their data nor made efforts to identify the gene sequences or mutational positions. Considerable variations exist even in a highly standardized methodology such as DNA sequencing. Methodologic EQAs are appropriate tools to uncover strengths and weaknesses in both technique and proficiency, and our results emphasize the need for mandatory EQAs. The results of EQUALseq should help improve the overall quality of molecular genetics findings obtained by DNA sequencing.
Equivalent Indels – Ambiguous Functional Classes and Redundancy in Databases
Assmus, Jens; Kleffe, Jürgen; Schmitt, Armin O.; Brockmann, Gudrun A.
2013-01-01
There is considerable interest in studying sequenced variations. However, while the positions of substitutions are uniquely identifiable by sequence alignment, the location of insertions and deletions still poses problems. Each insertion and deletion causes a change of sequence. Yet, due to low complexity or repetitive sequence structures, the same indel can sometimes be annotated in different ways. Two indels which differ in allele sequence and position can be one and the same, i.e. the alternative sequence of the whole chromosome is identical in both cases and, therefore, the two deletions are biologically equivalent. In such a case, it is impossible to identify the exact position of an indel merely based on sequence alignment. Thus, variation entries in a mutation database are not necessarily uniquely defined. We prove the existence of a contiguous region around an indel in which all deletions of the same length are biologically identical. Databases often show only one of several possible locations for a given variation. Furthermore, different database entries can represent equivalent variation events. We identified 1,045,590 such problematic entries of insertions and deletions out of 5,860,408 indel entries in the current human database of Ensembl. Equivalent indels are found in sequence regions of different functions like exons, introns or 5' and 3' UTRs. One and the same variation can be assigned to several different functional classifications of which only one is correct. We implemented an algorithm that determines for each indel database entry its complete set of equivalent indels which is uniquely characterized by the indel itself and a given interval of the reference sequence. PMID:23658777
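The equivalence described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0-based coordinates, and the toy sequences are assumptions made for illustration. The sketch relies only on the observation that shifting a deletion of fixed length by one base leaves the resulting sequence unchanged exactly when the base leaving the deleted window equals the base entering it.

```python
def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return the sequence obtained by deleting ref[start:start+length]."""
    return ref[:start] + ref[start + length:]


def equivalent_deletion_starts(ref: str, start: int, length: int) -> range:
    """Return the contiguous range of 0-based start positions at which a
    deletion of `length` bases yields the same alternative sequence.

    Hypothetical helper for illustration; coordinates are 0-based.
    """
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie inside the reference")

    # Shift left while the base entering the window on the left equals
    # the base leaving it on the right.
    left = start
    while left > 0 and ref[left - 1] == ref[left - 1 + length]:
        left -= 1

    # Shift right while the base leaving the window on the left equals
    # the base entering it on the right.
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1

    return range(left, right + 1)


if __name__ == "__main__":
    ref = "ACACACG"  # toy reference with a dinucleotide repeat
    starts = equivalent_deletion_starts(ref, start=1, length=2)
    for s in starts:
        # The deleted allele alternates between "AC" and "CA",
        # but the resulting sequence is always the same.
        print(s, ref[s:s + 2], apply_deletion(ref, s, 2))
```

In the toy example the deleted allele differs between positions while the alternative sequence is identical, which is the situation the abstract describes for database entries that differ in allele sequence and position.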
Deep sequencing reveals cell-type-specific patterns of single-cell transcriptome variation.
Dueck, Hannah; Khaladkar, Mugdha; Kim, Tae Kyung; Spaethling, Jennifer M; Francis, Chantal; Suresh, Sangita; Fisher, Stephen A; Seale, Patrick; Beck, Sheryl G; Bartfai, Tamas; Kuhn, Bernhard; Eberwine, James; Kim, Junhyong
2015-06-09
Differentiation of metazoan cells requires execution of different gene expression programs but recent single-cell transcriptome profiling has revealed considerable variation within cells of seemingly identical phenotype. This brings into question the relationship between transcriptome states and cell phenotypes. Additionally, single-cell transcriptomics presents unique analysis challenges that need to be addressed to answer this question. We present high quality deep read-depth single-cell RNA sequencing for 91 cells from five mouse tissues and 18 cells from two rat tissues, along with 30 control samples of bulk RNA diluted to single-cell levels. We find that transcriptomes differ globally across tissues with regard to the number of genes expressed, the average expression patterns, and within-cell-type variation patterns. We develop methods to filter genes for reliable quantification and to calibrate biological variation. All cell types include genes with high variability in expression, in a tissue-specific manner. We also find evidence that single-cell variability of neuronal genes in mice is correlated with that in rats, consistent with the hypothesis that levels of variation may be conserved. Single-cell RNA-sequencing data provide a unique view of transcriptome function; however, careful analysis is required in order to use single-cell RNA-sequencing measurements for this purpose. Technical variation must be considered in single-cell RNA-sequencing studies of expression variation. For a subset of genes, biological variability within each cell type appears to be regulated in order to perform dynamic functions, rather than reflecting solely molecular noise.
Maximizing ecological and evolutionary insight in bisulfite sequencing data sets
Lea, Amanda J.; Vilgalys, Tauras P.; Durst, Paul A.P.; Tung, Jenny
2017-01-01
Preface: Genome-scale bisulfite sequencing approaches have opened the door to ecological and evolutionary studies of DNA methylation in many organisms. These approaches can be powerful. However, they introduce new methodological and statistical considerations, some of which are particularly relevant to non-model systems. Here, we highlight how these considerations influence a study’s power to link methylation variation with a predictor variable of interest. Relative to current practice, we argue that sample sizes will need to increase to provide robust insights. We also provide recommendations for overcoming common challenges and an R Shiny app to aid in study design. PMID:29046582
The evolution of transcriptional regulation in eukaryotes
NASA Technical Reports Server (NTRS)
Wray, Gregory A.; Hahn, Matthew W.; Abouheif, Ehab; Balhoff, James P.; Pizer, Margaret; Rockman, Matthew V.; Romano, Laura A.
2003-01-01
Gene expression is central to the genotype-phenotype relationship in all organisms, and it is an important component of the genetic basis for evolutionary change in diverse aspects of phenotype. However, the evolution of transcriptional regulation remains understudied and poorly understood. Here we review the evolutionary dynamics of promoter, or cis-regulatory, sequences and the evolutionary mechanisms that shape them. Existing evidence indicates that populations harbor extensive genetic variation in promoter sequences, that a substantial fraction of this variation has consequences for both biochemical and organismal phenotype, and that some of this functional variation is sorted by selection. As with protein-coding sequences, rates and patterns of promoter sequence evolution differ considerably among loci and among clades for reasons that are not well understood. Studying the evolution of transcriptional regulation poses empirical and conceptual challenges beyond those typically encountered in analyses of coding sequence evolution: promoter organization is much less regular than that of coding sequences, and sequences required for the transcription of each locus reside at multiple other loci in the genome. Because of the strong context-dependence of transcriptional regulation, sequence inspection alone provides limited information about promoter function. Understanding the functional consequences of sequence differences among promoters generally requires biochemical and in vivo functional assays. Despite these challenges, important insights have already been gained into the evolution of transcriptional regulation, and the pace of discovery is accelerating.
Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M G; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes
2015-08-19
Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to search for novel retroviruses in samples from three cancer types. In accordance with recent studies, our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material.
High levels of variation in Salix lignocellulose genes revealed using poplar genomic resources
2013-01-01
Background: Little is known about the levels of variation in lignin or other wood related genes in Salix, a genus that is being increasingly used for biomass and biofuel production. The lignin biosynthesis pathway is well characterized in a number of species, including the model tree Populus. We aimed to transfer the genomic resources already available in Populus to its sister genus Salix to assess levels of variation within genes involved in wood formation. Results: Amplification trials for 27 gene regions were undertaken in 40 Salix taxa. Twelve of these regions were sequenced. Alignment searches of the resulting sequences against reference databases, combined with phylogenetic analyses, showed the close similarity of these Salix sequences to Populus, confirming homology of the primer regions and indicating a high level of conservation within the wood formation genes. However, all sequences were found to vary considerably among Salix species, mainly as SNPs with a smaller number of insertions-deletions. Between 25 and 176 SNPs per kbp per gene region (in predicted exons) were discovered within Salix. Conclusions: The variation found is sizeable but not unexpected as it is based on interspecific and not intraspecific comparison; it is comparable to interspecific variation in Populus. The characterisation of genetic variation is a key process in pre-breeding and for the conservation and exploitation of genetic resources in Salix. This study characterises the variation in several lignocellulose gene markers for such purposes. PMID:23924375
Museum genomics: low-cost and high-accuracy genetic data from historical specimens.
Rowe, Kevin C; Singhal, Sonal; Macmanes, Matthew D; Ayroles, Julien F; Morelli, Toni Lyn; Rubidge, Emily M; Bi, Ke; Moritz, Craig C
2011-11-01
Natural history collections are unparalleled repositories of geographical and temporal variation in faunal conditions. Molecular studies offer an opportunity to uncover much of this variation; however, genetic studies of historical museum specimens typically rely on extracting highly degraded and chemically modified DNA samples from skins, skulls or other dried samples. Despite this limitation, obtaining short fragments of DNA sequences using traditional PCR amplification of DNA has been the primary method for genetic study of historical specimens. Few laboratories have succeeded in obtaining genome-scale sequences from historical specimens and then only with considerable effort and cost. Here, we describe a low-cost approach using high-throughput next-generation sequencing to obtain reliable genome-scale sequence data from a traditionally preserved mammal skin and skull using a simple extraction protocol. We show that single-nucleotide polymorphisms (SNPs) from the genome sequences obtained independently from the skin and from the skull are highly repeatable compared to a reference genome. © 2011 Blackwell Publishing Ltd.
Ikehata, Hironobu
2018-05-31
Ultraviolet radiation (UVR) predominantly induces UV-signature mutations, C → T and CC → TT base substitutions at dipyrimidine sites, in the cellular and skin genome. I observed in our in vivo mutation studies of mouse skin that these UVR-specific mutations show a wavelength-dependent variation in their sequence-context preference. The C → T mutation occurs most frequently in the 5'-TCG-3' sequence regardless of the UVR wavelength, but is recovered more preferentially there as the wavelength increases, resulting in prominent occurrences exclusively in the TCG sequence in the UVA wavelength range, which I will designate as a "UVA signature" in this review. The preference of the UVB-induced C → T mutation for the sequence contexts shows a mixed pattern of UVC- and UVA-induced mutations, and a similar pattern is also observed for natural sunlight, in which UVB is the most genotoxic component. In addition, the CC → TT mutation hardly occurs at UVA1 wavelengths, although it is detected rarely but constantly in the UVC and UVB ranges. This wavelength-dependent variation in the sequence-context preference of the UVR-specific mutations could be explained by two different photochemical mechanisms of cyclobutane pyrimidine dimer (CPD) formation. The UV-signature mutations observed in the UVC and UVB ranges are known to be caused mainly by CPDs produced through the conventional singlet/triplet excitation of pyrimidine bases after the direct absorption of the UVC/UVB photon energy in those bases. On the other hand, a novel photochemical mechanism through the direct absorption of the UVR energy to double-stranded DNA, which is called "collective excitation", has been proposed for the UVA-induced CPD formation. 
The UVA photons directly absorbed by DNA produce CPDs with a sequence context preference different from that observed for CPDs caused by the UVC/UVB-mediated singlet/triplet excitation, causing CPD formation preferentially at thymine-containing dipyrimidine sites and probably also preferably at methyl CpG-associated dipyrimidine sites, which include the TCG sequence. In this review, I present a mechanistic consideration on the wavelength-dependent variation of the sequence context preference of the UVR-specific mutations and rationalize the proposition of the UVA-signature mutation, in addition to the UV-signature mutation.
USDA-ARS's Scientific Manuscript database
Porcine respiratory and reproductive syndrome (PRRS) continues to be an economically important disease affecting commercial pig production in the United States and worldwide. Its considerable sequence and antigenic variation, coupled with the limited protection offered by current vaccine options im...
Ma, Lijun; Lee, Letitia; Barani, Igor; Hwang, Andrew; Fogh, Shannon; Nakamura, Jean; McDermott, Michael; Sneed, Penny; Larson, David A; Sahgal, Arjun
2011-11-21
Rapid delivery of multiple shots or isocenters is one of the hallmarks of Gamma Knife radiosurgery. In this study, we investigated whether the temporal order of shots delivered with Gamma Knife Perfexion would significantly influence the biological equivalent dose for complex multi-isocenter treatments. Twenty single-target cases were selected for analysis. For each case, 3D dose matrices of individual shots were extracted and single-fraction equivalent uniform dose (sEUD) values were determined for all possible shot delivery sequences, corresponding to different patterns of temporal dose delivery within the target. We found significant variations in the sEUD values among these sequences exceeding 15% for certain cases. However, the sequences for the actual treatment delivery were found to agree (<3%) and to correlate (R² = 0.98) excellently with the sequences yielding the maximum sEUD values for all studied cases. This result is applicable for both fast and slow growing tumors with α/β values of 2 to 20 according to the linear-quadratic model. In conclusion, despite large potential variations in different shot sequences for multi-isocenter Gamma Knife treatments, current clinical delivery sequences exhibited consistent biological target dosing that approached that maximally achievable for all studied cases.
Plant centromere organization: a dynamic structure with conserved functions.
Ma, Jianxin; Wing, Rod A; Bennetzen, Jeffrey L; Jackson, Scott A
2007-03-01
Although the structural features of centromeres from most multicellular eukaryotes remain to be characterized, recent analyses of the complete sequences of two centromeric regions of rice, together with data from Arabidopsis thaliana and maize, have illuminated the considerable size variation and sequence divergence of plant centromeres. Despite the severe suppression of meiotic chromosomal exchange in centromeric and pericentromeric regions of rice, the centromere core shows high rates of unequal homologous recombination in the absence of chromosomal exchange, resulting in frequent and extensive DNA rearrangement. Not only is the sequence of centromeric tandem and non-tandem repeats highly variable but also the copy number, spacing, order and orientation, providing ample natural variation as the basis for selection of superior centromere performance. This review article focuses on the structural and evolutionary dynamics of plant centromere organization and the potential molecular mechanisms responsible for the rapid changes of centromeric components.
Liu, G H; Zhou, W; Nisbet, A J; Xu, M J; Zhou, D H; Zhao, G H; Wang, S K; Song, H Q; Lin, R Q; Zhu, X Q
2014-03-01
Trichuris trichiura and Trichuris suis parasitize (at the adult stage) the caeca of humans and pigs, respectively, causing trichuriasis. Despite these parasites being of human and animal health significance, causing considerable socio-economic losses globally, little is known of the molecular characteristics of T. trichiura and T. suis from China. In the present study, the entire first and second internal transcribed spacer (ITS-1 and ITS-2) regions of nuclear ribosomal DNA (rDNA) of T. trichiura and T. suis from China were amplified by polymerase chain reaction (PCR), the representative amplicons were cloned and sequenced, and sequence variation in the ITS rDNA was examined. The ITS rDNA sequences for the T. trichiura and T. suis samples were 1222-1267 bp and 1339-1353 bp in length, respectively. Sequence analysis revealed that the ITS-1, 5.8S and ITS-2 rDNAs of both whipworms were 600-627 bp and 655-661 bp, 154 bp, and 468-486 bp and 530-538 bp in size, respectively. Sequence variation in ITS rDNA within and among T. trichiura and T. suis was examined. Excluding nucleotide variations in the simple sequence repeats, the intra-species sequence variation in the ITS-1 was 0.2-1.7% within T. trichiura, and 0-1.5% within T. suis. For ITS-2 rDNA, the intra-species sequence variation was 0-1.3% within T. trichiura and 0.2-1.7% within T. suis. The inter-species sequence differences between the two whipworms were 60.7-65.3% for ITS-1 and 59.3-61.5% for ITS-2. These results demonstrated that the ITS rDNA sequences provide additional genetic markers for the characterization and differentiation of the two whipworms. These data should be useful for studying the epidemiology and population genetics of T. trichiura and T. suis, as well as for the diagnosis of trichuriasis in humans and pigs.
Wu, Gary D; Lewis, James D; Hoffmann, Christian; Chen, Ying-Yu; Knight, Rob; Bittinger, Kyle; Hwang, Jennifer; Chen, Jun; Berkowsky, Ronald; Nessel, Lisa; Li, Hongzhe; Bushman, Frederic D
2010-07-30
Intense interest centers on the role of the human gut microbiome in health and disease, but optimal methods for analysis are still under development. Here we present a study of methods for surveying bacterial communities in human feces using 454/Roche pyrosequencing of 16S rRNA gene tags. We analyzed fecal samples from 10 individuals and compared methods for storage, DNA purification and sequence acquisition. To assess reproducibility, we compared samples one cm apart on a single stool specimen for each individual. To analyze storage methods, we compared 1) immediate freezing at -80 degrees C, 2) storage on ice for 24 hours, and 3) storage on ice for 48 hours. For DNA purification methods, we tested three commercial kits and bead beating in hot phenol. Variations due to the different methodologies were compared to variation among individuals using two approaches: one based on presence-absence information for bacterial taxa (unweighted UniFrac) and the other taking into account their relative abundance (weighted UniFrac). In the unweighted analysis relatively little variation was associated with the different analytical procedures, and variation between individuals predominated. In the weighted analysis considerable variation was associated with the purification methods. Particularly notable was improved recovery of Firmicutes sequences using the hot phenol method. We also carried out surveys of the effects of different 454 sequencing methods (FLX versus Titanium) and amplification of different 16S rRNA variable gene segments. Based on our findings we present recommendations for protocols to collect, process and sequence bacterial 16S rDNA from fecal samples. Some major points are 1) if feasible, bead-beating in hot phenol or use of the PSP kit improves recovery; 2) storage methods can be adjusted based on experimental convenience; 3) unweighted (presence-absence) comparisons are less affected by lysis method.
LinkFinder: An expert system that constructs phylogenic trees
NASA Technical Reports Server (NTRS)
Inglehart, James; Nelson, Peter C.
1991-01-01
An expert system has been developed using the C Language Integrated Production System (CLIPS) that automates the process of constructing DNA sequence based phylogenies (trees or lineages) that indicate evolutionary relationships. LinkFinder takes as input homologous DNA sequences from distinct individual organisms. It measures variations between the sequences, selects appropriate proportionality constants, and estimates the time that has passed since each pair of organisms diverged from a common ancestor. It then designs and outputs a phylogenic map summarizing these results. LinkFinder can find genetic relationships between different species, and between individuals of the same species, including humans. It was designed to take advantage of the vast amount of sequence data being produced by the Genome Project, and should be of value to evolution theorists who wish to utilize this data, but who have no formal training in molecular genetics. Evolutionary theory holds that distinct organisms carrying a common gene inherited that gene from a common ancestor. Homologous genes vary from individual to individual and species to species, and the amount of variation is now believed to be directly proportional to the time that has passed since divergence from a common ancestor. The proportionality constant must be determined experimentally; it varies considerably with the types of organisms and DNA molecules under study. Given an appropriate constant, and the variation between two DNA sequences, a simple linear equation gives the divergence time.
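The "simple linear equation" mentioned at the end of this abstract can be sketched as follows. This is a minimal illustration and not LinkFinder's CLIPS rules: the function names, the use of the fraction of differing sites as the measure of variation, and the example sequences and proportionality constant are all assumptions.

```python
def fraction_different(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def divergence_time(seq_a: str, seq_b: str, constant: float) -> float:
    """Estimate divergence time from the linear model variation = constant * time.

    `constant` is the assumed proportionality constant (fraction of sites
    differing per unit time); the abstract notes it must be determined
    experimentally for the organisms and molecules under study.
    """
    if constant <= 0:
        raise ValueError("proportionality constant must be positive")
    return fraction_different(seq_a, seq_b) / constant


if __name__ == "__main__":
    # Hypothetical homologous sequences and a hypothetical constant of
    # 0.02 differing sites per site per million years.
    a = "ACGTACGTAC"
    b = "ACGTTCGTAA"
    print(fraction_different(a, b))
    print(divergence_time(a, b, constant=0.02))
```

The result carries the time unit of the constant; with a different constant the same pair of sequences gives a proportionally different estimate, which is why the choice of constant matters as much as the measured variation.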
NASA Astrophysics Data System (ADS)
Ma, Lijun; Lee, Letitia; Barani, Igor; Hwang, Andrew; Fogh, Shannon; Nakamura, Jean; McDermott, Michael; Sneed, Penny; Larson, David A.; Sahgal, Arjun
2011-11-01
Rapid delivery of multiple shots or isocenters is one of the hallmarks of Gamma Knife radiosurgery. In this study, we investigated whether the temporal order of shots delivered with Gamma Knife Perfexion would significantly influence the biological equivalent dose for complex multi-isocenter treatments. Twenty single-target cases were selected for analysis. For each case, 3D dose matrices of individual shots were extracted and single-fraction equivalent uniform dose (sEUD) values were determined for all possible shot delivery sequences, corresponding to different patterns of temporal dose delivery within the target. We found significant variations in the sEUD values among these sequences exceeding 15% for certain cases. However, the sequences for the actual treatment delivery were found to agree (<3%) and to correlate (R² = 0.98) excellently with the sequences yielding the maximum sEUD values for all studied cases. This result is applicable for both fast and slow growing tumors with α/β values of 2 to 20 according to the linear-quadratic model. In conclusion, despite large potential variations in different shot sequences for multi-isocenter Gamma Knife treatments, current clinical delivery sequences exhibited consistent biological target dosing that approached that maximally achievable for all studied cases.
Bashir, Ali; Bansal, Vikas; Bafna, Vineet
2010-06-18
Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The multitude of sequencing platforms, each with its own characteristics, poses a number of design challenges regarding the technology to be used and the depth of sequencing required for a particular sequencing application. Here we describe a number of analytical and empirical results to address design questions for two applications: detection of structural variations from paired-end sequencing and estimating mRNA transcript abundance. For structural variation, our results provide explicit trade-offs between the detection and resolution of rearrangement breakpoints, and the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short read data, these predictions show good concordance with Illumina 200 bp and 2 Kbp insert length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. With only 1 million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect low expressed genes with greater than 95% probability. Together, our results form a generic framework for many design considerations related to high-throughput sequencing.
We provide software tools http://bix.ucsd.edu/projects/NGS-DesignTools to derive platform independent guidelines for designing sequencing experiments (amount of sequencing, choice of insert length, mix of libraries) for novel applications of next generation sequencing.
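The depth question raised in this abstract can be illustrated with a deliberately simplified calculation. It is not the authors' method and does not reproduce their pilot-study corrections: it assumes each read is drawn independently and originates from a given transcript with a fixed probability p, so that the chance of seeing that transcript at least once in N reads is 1 - (1 - p)^N. The function names and example values are hypothetical.

```python
import math


def detection_probability(p: float, n_reads: int) -> float:
    """Probability of observing at least one read from a transcript whose
    per-read sampling probability is p, assuming independent reads."""
    if not 0.0 < p < 1.0 or n_reads < 0:
        raise ValueError("need 0 < p < 1 and n_reads >= 0")
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_reads * math.log1p(-p))


def reads_needed(p: float, target: float = 0.95) -> int:
    """Smallest number of reads giving at least `target` detection
    probability under the independent-sampling assumption."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("need 0 < p < 1 and 0 < target < 1")
    return math.ceil(math.log(1.0 - target) / math.log1p(-p))


if __name__ == "__main__":
    # Hypothetical transcript contributing one read per million on average.
    p = 1e-6
    n = reads_needed(p, target=0.95)
    print(n, detection_probability(p, n))
```

Under this model the required depth scales roughly as 3/p for a 95% target, since ln(0.05) is about -3; real libraries depart from independent sampling, which is one reason empirical corrections of the kind the abstract describes are needed.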
Genetic variation in domestic reindeer and wild caribou in Alaska
Cronin, M.; Renecker, L.; Pierson, Barbara J.; Patton, J.C.
1995-01-01
Reindeer were introduced into Alaska 100 years ago and have been maintained as semidomestic livestock. They have had contact with wild caribou herds, including deliberate cross-breeding and mixing in the wild. Reindeer have considerable potential as a domestic animal for meat or velvet antler production, and wild caribou are important to subsistence and sport hunters. Our objective was to quantify the genetic relationships of reindeer and caribou in Alaska. We identified allelic variation among five herds of wild caribou and three herds of reindeer with DNA sequencing and restriction enzymes for three loci: a DQA locus of the major histocompatibility complex (Rata-DQA1), k-casein and the D-loop of mitochondrial DNA. These loci are of interest because of their potential influence on domestic animal performance and the fitness of wild populations. There is considerable genetic variation in reindeer and caribou for all three loci, including five, three and six alleles for DQA, k-casein and D-loop respectively. Most alleles occur in both reindeer and caribou, which may be the result of recent common ancestry or genetic introgression in either direction. However, allele frequencies differ considerably between reindeer and caribou, which suggests that gene flow has been limited.
Xu, Shixia; Ju, Jianfeng; Zhou, Xuming; Wang, Lian; Zhou, Kaiya; Yang, Guang
2012-01-01
To further extend our understanding of the mechanism causing the current nearly extinct status of the baiji (Lipotes vexillifer), one of the most critically endangered species in the world, genetic diversity at the major histocompatibility complex (MHC) class II DRB locus was investigated in the baiji. Nine highly divergent DRB alleles were identified in 17 samples, with an average of 28.4 (13.2%) nucleotide differences and 16.7 (23.5%) amino acid differences between alleles. The unexpectedly high level of DRB allelic diversity in the baiji may partly be attributable to its evolutionary adaptation to the freshwater environment, which is regarded as having a higher parasite diversity than the marine environment. In addition, balancing selection was found to be the main mechanism generating sequence diversity at the baiji DRB gene. Considerable sequence variation at the adaptive MHC genes, despite a significant loss of neutral genetic variation in the baiji genome, might suggest that intense selection has overpowered random genetic drift as the main evolutionary force, which further suggests that the critically endangered or nearly extinct status of the baiji is not an outcome of genetic collapse. PMID:22272349
Rebelling for a Reason: Protein Structural “Outliers”
Arumugam, Gandhimathi; Nair, Anu G.; Hariharaputran, Sridhar; Ramanathan, Sowdhamini
2013-01-01
Analysis of structural variation in domain superfamilies can reveal constraints in protein evolution which aids protein structure prediction and classification. Structure-based sequence alignment of distantly related proteins, organized in PASS2 database, provides clues about structurally conserved regions among different functional families. Some superfamily members show large structural differences which are functionally relevant. This paper analyses the impact of structural divergence on function for multi-member superfamilies, selected from the PASS2 superfamily alignment database. Functional annotations within superfamilies, with structural outliers or ‘rebels’, are discussed in the context of structural variations. Overall, these data reinforce the idea that functional similarities cannot be extrapolated from mere structural conservation. The implication for fold-function prediction is that the functional annotations can only be inherited with very careful consideration, especially at low sequence identities. PMID:24073209
Kooshavar, Daniz; Tabatabaiefar, Mohammad Amin; Farrokhi, Effat; Abolhasani, Marziye; Noori-Daloii, Mohammad-Reza; Hashemzadeh-Chaleshtori, Morteza
2013-02-01
Autosomal recessive non-syndromic hearing loss (ARNSHL) can be caused by many genes. However, mutations in the GJB2 gene, which encodes the gap-junction (GJ) protein connexin (Cx) 26, account for a considerable proportion of cases, and this proportion differs among populations. Between 10 and 42 percent of patients with recessive GJB2 mutations carry only one mutant allele. Mutations in GJB4, GJA1, and GJC3, encoding Cx30.3, Cx43, and Cx29, respectively, can lead to hearing loss (HL). Different connexins can combine in heteromeric and heterotypic GJ assemblies. This study aims to determine whether variations in any of the genes GJB4, GJA1 or GJC3 can be the second mutant allele causing the disease in a digenic mode of inheritance in the studied GJB2 heterozygous cases. We examined 34 unrelated GJB2 heterozygous ARNSHL subjects from different geographic and ethnic areas in Iran, using polymerase chain reaction (PCR) followed by direct DNA sequencing to identify any sequence variations in these genes. Restriction fragment length polymorphism (RFLP) assays were performed on 400 normal hearing individuals. Sequence analysis of GJB4 showed five heterozygous variations, c.451C>A, c.219C>T, c.507C>G, c.155_158delTCTG and c.542C>T, with only the last of these not being detected in any of the control samples. There were three heterozygous variations in GJA1, c.758C>T, c.717G>A and c.3*dupA, in four cases. We found no variations in the GJC3 gene sequence. Our data suggest that the GJB4 c.542C>T variant and, less likely, some other variations of GJB4 and GJA1, but probably not GJC3, can be assigned to ARNSHL in GJB2 heterozygous mutation carriers, providing clues to a digenic pattern. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
2017-01-01
Mapping gene expression as a quantitative trait using whole genome-sequencing and transcriptome analysis allows the functional consequences of genetic variation to be discovered. We developed a novel method and ultra-fast software, Findr, for highly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors, which improves on current methods by taking into consideration hidden confounders and weak regulations. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr. PMID:28821014
Gould, Virginia C; Okazaki, Aki; Howe, Robin A; Avison, Matthew B
2004-08-01
To determine the level of variation in the smeDEF efflux pump and smeT transcriptional regulator genes among three defined 16S rRNA sequence subgroups of clinical Stenotrophomonas maltophilia isolates. smeDEF sequencing used a PCR genome walking approach. Determination of the sequence surrounding smeDEF used a flanking primer PCR method and specific primers anchored in smeD or smeF together with random primers. smeDEF is chromosomal and located in the same position in the chromosome in all three subgroups of isolates. Flanking smeD is a gene, smeT, encoding a putative transcriptional repressor for smeDEF. Variation at these loci among the isolates is considerably lower (up to 10%) than at intrinsic beta-lactamase loci (up to 30%) in the same isolates, implying greater functional constraint. The smeD-smeT intergenic region contains a highly conserved section, which maps with previously predicted promoter/operator regions, and a hypervariable untranslated region, which can be used to subgroup clinical isolates. These data provide further evidence that it is possible to group clinical isolates of the inherently variable species, S. maltophilia, based on genotypic properties. Isolate D457, in which most work concerning smeDEF expression has been performed, does not fall into S. maltophilia subgroup A, which is the most typical.
Comparative Analysis of Genome Sequences Covering the Seven Cronobacter Species
Cummings, Craig A.; Shih, Rita; Degoricija, Lovorka; Rico, Alain; Brzoska, Pius; Hamby, Stephen E.; Masood, Naqash; Hariri, Sumyya; Sonbol, Hana; Chuzhanova, Nadia; McClelland, Michael; Furtado, Manohar R.; Forsythe, Stephen J.
2012-01-01
Background Species of Cronobacter are widespread in the environment and are occasional food-borne pathogens associated with serious neonatal diseases, including bacteraemia, meningitis, and necrotising enterocolitis. The genus is composed of seven species: C. sakazakii, C. malonaticus, C. turicensis, C. dublinensis, C. muytjensii, C. universalis, and C. condimenti. Clinical cases are associated with three species, C. malonaticus, C. turicensis and, in particular, with C. sakazakii multilocus sequence type 4. Thus, it is plausible that virulence determinants have evolved in certain lineages. Methodology/Principal Findings We generated high quality sequence drafts for eleven Cronobacter genomes representing the seven Cronobacter species, including an ST4 strain of C. sakazakii. Comparative analysis of these genomes together with the two publicly available genomes revealed Cronobacter has over 6,000 genes in one or more strains and over 2,000 genes shared by all Cronobacter. Considerable variation in the presence of traits such as type six secretion systems, metal resistance (tellurite, copper and silver), and adhesins were found. C. sakazakii is unique in the Cronobacter genus in encoding genes enabling the utilization of exogenous sialic acid which may have clinical significance. The C. sakazakii ST4 strain 701 contained additional genes as compared to other C. sakazakii but none of them were known specific virulence-related genes. Conclusions/Significance Genome comparison revealed that pair-wise DNA sequence identity varies between 89 and 97% in the seven Cronobacter species, and also suggested various degrees of divergence. Sets of universal core genes and accessory genes unique to each strain were identified. These gene sequences can be used for designing genus/species specific detection assays. 
Genes encoding adhesins, T6SS, and metal resistance genes as well as prophages are found in only subsets of genomes and have contributed considerably to the variation of genomic content. Differences in gene content likely contribute to differences in the clinical and environmental distribution of species and sequence types. PMID:23166675
2010-01-01
Intense interest centers on the role of the human gut microbiome in health and disease, but optimal methods for analysis are still under development. Here we present a study of methods for surveying bacterial communities in human feces using 454/Roche pyrosequencing of 16S rRNA gene tags. We analyzed fecal samples from 10 individuals and compared methods for storage, DNA purification and sequence acquisition. To assess reproducibility, we compared samples one cm apart on a single stool specimen for each individual. To analyze storage methods, we compared 1) immediate freezing at -80°C, 2) storage on ice for 24 or 3) 48 hours. For DNA purification methods, we tested three commercial kits and bead beating in hot phenol. Variations due to the different methodologies were compared to variation among individuals using two approaches--one based on presence-absence information for bacterial taxa (unweighted UniFrac) and the other taking into account their relative abundance (weighted UniFrac). In the unweighted analysis relatively little variation was associated with the different analytical procedures, and variation between individuals predominated. In the weighted analysis considerable variation was associated with the purification methods. Particularly notable was improved recovery of Firmicutes sequences using the hot phenol method. We also carried out surveys of the effects of different 454 sequencing methods (FLX versus Titanium) and amplification of different 16S rRNA variable gene segments. Based on our findings we present recommendations for protocols to collect, process and sequence bacterial 16S rDNA from fecal samples--some major points are 1) if feasible, bead-beating in hot phenol or use of the PSP kit improves recovery; 2) storage methods can be adjusted based on experimental convenience; 3) unweighted (presence-absence) comparisons are less affected by lysis method. PMID:20673359
Bacterial resistance to antibodies: a model evolutionary study.
Schulman, Lawrence S
2017-03-21
The tangled nature model of evolution (reviewed in the main text) is adapted for use in the study of antibody resistance acquired by horizontal gene transfer. Exchanges of DNA and the acquisition of resistant gene sequences are considered. For the parameters used, resistant strains rapidly proliferate and dominate, although initial intense antibiotic treatment can occasionally prevent this. Variation in genome distribution appears to be long tailed. If this is reflected in nature, the occurrence of resistant bacterial strains can be expected, as well as considerable variation in patient outcomes. Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Golden, Barbara L.; Kundrot, Craig E.
2003-01-01
RNA molecules may be crystallized using variations of the methods developed for protein crystallography. As the technology has become available to synthesize and purify RNA molecules in the quantities and with the quality that is required for crystallography, the field of RNA structure has exploded. The first consideration when crystallizing an RNA is the sequence, which may be varied in a rational way to enhance crystallizability or prevent formation of alternate structures. Once a sequence has been designed, the RNA may be synthesized chemically by solid-state synthesis, or it may be produced enzymatically using RNA polymerase and an appropriate DNA template. Purification of milligram quantities of RNA can be accomplished by HPLC or gel electrophoresis. As with proteins, crystallization of RNA is usually accomplished by vapor diffusion techniques. There are several considerations that are either unique to RNA crystallization or more important for RNA crystallization. Techniques for design, synthesis, purification, and crystallization of RNAs will be reviewed here.
Deep sequencing in library selection projects: what insight does it bring?
Glanville, J; D'Angelo, S; Khan, T A; Reddy, S T; Naranjo, L; Ferrara, F; Bradbury, A R M
2015-08-01
High throughput sequencing is poised to change all aspects of the way antibodies and other binders are discovered and engineered. Millions of available sequence reads provide an unprecedented sampling depth able to guide the design and construction of effective, high quality naïve libraries containing tens of billions of unique molecules. Furthermore, during selections, high throughput sequencing enables quantitative tracing of enriched clones and position-specific guidance to amino acid variation under positive selection during antibody engineering. Successful application of the technologies relies on specific PCR reagent design, correct sequencing platform selection, and effective use of computational tools and statistical measures to remove error, identify antibodies, estimate diversity, and extract signatures of selection from the clone down to individual structural positions. Here we review these considerations and discuss some of the remaining challenges to the widespread adoption of the technology. Copyright © 2015 Elsevier Ltd. All rights reserved. PMID:26451649
Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil
2015-01-01
The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073
Keel, B N; Nonneman, D J; Rohrer, G A
2017-08-01
Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a more significant effect on phenotypic variation than do other types of genetic variants. Hence, a comprehensive list of these functional variants would be of considerable interest in swine genomic studies, particularly those targeting fertility and production traits. Whole-genome sequence was obtained from 72 of the founders of an intensely phenotyped experimental swine herd at the U.S. Meat Animal Research Center (USMARC). These animals included all 24 of the founding boars (12 Duroc and 12 Landrace) and 48 Yorkshire-Landrace composite sows. Sequence reads were mapped to the Sscrofa10.2 genome build, resulting in a mean of 6.1 fold (×) coverage per genome. A total of 22 342 915 high confidence SNPs were identified from the sequenced genomes. These included 21 million previously reported SNPs and 79% of the 62 163 SNPs on the PorcineSNP60 BeadChip assay. Variation was detected in the coding sequence or untranslated regions (UTRs) of 87.8% of the genes in the porcine genome: loss-of-function variants were predicted in 504 genes, 10 202 genes contained nonsynonymous variants, 10 773 had variation in UTRs and 13 010 genes contained synonymous variants. Approximately 139 000 SNPs were classified as loss-of-function, nonsynonymous or regulatory, which suggests that over 99% of the variation detected in our pigs could potentially be ignored, allowing us to focus on a much smaller number of functional SNPs during future analyses. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
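The proportions quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in the abstract (the functional SNP count is the abstract's own approximation of 139 000); the variable names are illustrative.

```python
# Figures as reported in the abstract.
total_snps = 22_342_915          # high confidence SNPs
functional_snps = 139_000        # approx. loss-of-function, nonsynonymous or regulatory
beadchip_snps = 62_163           # SNPs on the PorcineSNP60 BeadChip
beadchip_fraction_found = 0.79   # share of BeadChip SNPs recovered

# Share of detected SNPs classified as functional (about 0.62%).
functional_fraction = functional_snps / total_snps

# Remaining share that could be set aside in a functional-variant analysis.
ignorable_fraction = 1.0 - functional_fraction

# Approximate number of BeadChip SNPs recovered from the sequence data.
beadchip_snps_found = round(beadchip_fraction_found * beadchip_snps)
```

The functional share comes out well below 1%, which is consistent with the abstract's statement that over 99% of the detected variation could potentially be ignored.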
Trapnell, Cole; Davidson, Stuart; Pachter, Lior; Chu, Hou Cheng; Tonkin, Leath A.; Biggin, Mark D.; Eisen, Michael B.
2010-01-01
Changes in gene expression play an important role in evolution, yet the molecular mechanisms underlying regulatory evolution are poorly understood. Here we compare genome-wide binding of the six transcription factors that initiate segmentation along the anterior-posterior axis in embryos of two closely related species: Drosophila melanogaster and Drosophila yakuba. Where we observe binding by a factor in one species, we almost always observe binding by that factor to the orthologous sequence in the other species. Levels of binding, however, vary considerably. The magnitude and direction of the interspecies differences in binding levels of all six factors are strongly correlated, suggesting a role for chromatin or other factor-independent forces in mediating the divergence of transcription factor binding. Nonetheless, factor-specific quantitative variation in binding is common, and we show that it is driven to a large extent by the gain and loss of cognate recognition sequences for the given factor. We find only a weak correlation between binding variation and regulatory function. These data provide the first genome-wide picture of how modest levels of sequence divergence between highly morphologically similar species affect a system of coordinately acting transcription factors during animal development, and highlight the dominant role of quantitative variation in transcription factor binding over short evolutionary distances. PMID:20351773
Rapid evolution of cis-regulatory sequences via local point mutations
NASA Technical Reports Server (NTRS)
Stone, J. R.; Wray, G. A.
2001-01-01
Although the evolution of protein-coding sequences within genomes is well understood, the same cannot be said of the cis-regulatory regions that control transcription. Yet, changes in gene expression are likely to constitute an important component of phenotypic evolution. We simulated the evolution of new transcription factor binding sites via local point mutations. The results indicate that new binding sites appear and become fixed within populations on microevolutionary timescales under an assumption of neutral evolution. Even combinations of two new binding sites evolve very quickly. We predict that local point mutations continually generate considerable genetic variation that is capable of altering gene expression.
Aravind, Penmatsa; Wistow, Graeme; Sharma, Yogendra; Sankaranarayanan, Rajan
2008-01-01
βγ-Crystallins belong to a superfamily of proteins in prokaryotes and eukaryotes that are based on duplications of a characteristic, highly conserved Greek Key motif. Most members of the superfamily in vertebrates are structural proteins of the eye lens that contain four motifs arranged as two structural domains. Absent in melanoma-1 (AIM1), an unusual member of the superfamily whose expression is associated with suppression of malignancy in melanoma, contains 12 βγ-crystallin motifs in six domains. Some of these motifs diverge considerably from the canonical motif sequence. AIM1g1, the first βγ-crystallin domain of AIM1, is the most variant of βγ-crystallin domains currently known. In order to understand the limits of sequence variation on the structure, we report the crystal structure of AIM1g1 at 1.9Å resolution. In spite of having changes in key residues, the domain retains the overall βγ-crystallin fold. The domain also contains an unusual extended surface loop that significantly alters the shape of the domain and its charge profile. This structure illustrates the resilience of the βγ fold to considerable sequence changes and its remarkable ability to adapt for novel functions. PMID:18582473
RNA circularization reveals terminal sequence heterogeneity in a double-stranded RNA virus.
Widmer, G
1993-03-01
Double-stranded RNA viruses (dsRNA), termed LRV1, have been found in several strains of the protozoan parasite Leishmania. With the aim of constructing a full-length cDNA copy of the viral genome, including its terminal sequences, a protocol based on PCR amplification across the 3'-5' junction of circularized RNA was developed. This method proved to be applicable to dsRNA. It provided a relatively simple alternative to one-sided PCR, without loss of specificity inherent in the use of generic primers. LRV1 terminal nucleotide sequences obtained by this method showed a considerable variation in length, particularly at the 5' end of the positive strand, as well as the potential for forming 3' overhangs. The opposite genomic end terminates in 0, 1, or 2 TCA trinucleotide repeats. These results are compared with terminal sequences derived from one-sided PCR experiments.
Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil
2015-02-01
The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Amazonian phylogeography: mtDNA sequence variation in arboreal echimyid rodents (Caviomorpha).
da Silva, M N; Patton, J L
1993-09-01
Patterns of evolutionary relationships among haplotype clades of sequences of the mitochondrial cytochrome b DNA gene are examined for five genera of arboreal rodents of the Caviomorph family Echimyidae from the Amazon Basin. Data are available for 798 bp of sequence from a total of 24 separate localities in Peru, Venezuela, Bolivia, and Brazil for Mesomys, Isothrix, Makalata, Dactylomys, and Echimys. Sequence divergence, corrected for multiple hits, is extensive, ranging from less than 1% for comparisons within populations to over 20% among geographic units within genera. Both the degree of differentiation and the geographic patterning of the variation suggest that more than one species composes the Amazonian distribution of the currently recognized Mesomys hispidus, Isothrix bistriata, Makalata didelphoides, and Dactylomys dactylinus. There is general concordance in the geographic range of haplotype clades for each of these taxa, and the overall level of differentiation within them is largely equivalent. These observations suggest that a common vicariant history underlies the respective diversification of each genus. However, estimated times of divergence based on the rate of third position transversion substitutions for the major clades within each genus typically range above 1 million years. Thus, allopatric isolation precipitating divergence must have been considerably earlier than the late Pleistocene forest fragmentation events commonly invoked for Amazonian biota.
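The abstract does not say which correction for multiple hits was applied. As an illustration only, the sketch below uses the Jukes-Cantor (JC69) model, a common choice that is assumed here rather than taken from the paper, to convert an observed proportion of differing sites into an estimated number of substitutions per site.

```python
import math


def jukes_cantor_distance(p_observed: float) -> float:
    """JC69 corrected distance d = -(3/4) * ln(1 - (4/3) * p).

    p_observed is the proportion of aligned sites that differ. The model
    assumes equal base frequencies and equal substitution rates, and it is
    undefined once p_observed reaches 0.75 (saturation).
    """
    if not 0.0 <= p_observed < 0.75:
        raise ValueError("p_observed must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p_observed)


# An observed 20% difference corresponds to a somewhat larger corrected value,
# because some sites have changed more than once.
corrected = jukes_cantor_distance(0.20)
```

The correction is negligible at divergences near 1% and grows as observed divergence approaches saturation, which matters most for the deeper among-region comparisons.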
van Keulen, H; Campbell, S R; Erlandsen, S L; Jarroll, E L
1991-06-01
In an attempt to study Giardia at the DNA sequence level, the rRNA genes of three species, Giardia duodenalis, Giardia ardeae and Giardia muris were cloned and restriction enzyme maps were constructed. The rDNA repeats of these Giardia show completely different restriction enzyme recognition patterns. The size of the rDNA repeat ranges from approximately 5.6 kb in G. duodenalis to 7.6 kb in both G. muris and G. ardeae. These size differences are mainly attributable to the variation in length of the spacer. Minor differences exist among these Giardia in the sizes of their small subunit rRNA and the internal transcribed spacer between small and large subunit rRNA. The genetic maps were constructed by sequence analysis of the DNA around the 5' and 3' ends of the mature rRNA genes and between the rRNA covering the 5.8S rRNA gene and internal transcribed spacer. Comparison of the 5.8S rDNA and 3' end of large subunit rDNA from these three Giardia species showed considerable sequence variation, but the rDNA sequences of G. duodenalis and G. ardeae appear more closely related to each other than to G. muris.
Identification of medicinal plants in the family Fabaceae using a potential DNA barcode ITS2.
Gao, Ting; Yao, Hui; Song, Jingyuan; Liu, Chang; Zhu, Yingjie; Ma, Xinye; Pang, Xiaohui; Xu, Hongxi; Chen, Shilin
2010-07-06
To test whether the ITS2 region is an effective marker for authenticating species of the family Fabaceae, which contains many important medicinal plants. The ITS2 regions of 114 samples in Fabaceae were amplified. Sequences were assembled with CodonCode Aligner V3.0. In combination with sequences from public databases, the sequences were aligned by Clustal W, and genetic distances were computed using MEGA V4.0. The intra- vs. inter-specific variations were assessed by six metrics, Wilcoxon two-sample tests and "barcoding gaps". Species identification was accomplished using TaxonGAP V2.4, BLAST1 and the nearest distance method. ITS2 sequences had considerable variation at the genus and species level. The intra-specific divergence ranged from 0% to 14.4%, with an average of 1.7%, and the inter-specific divergence ranged from 0% to 63.0%, with an average of 8.6%. Twenty-four species found in the Chinese Pharmacopoeia, along with another 66 species including their adulterants, were successfully identified based on ITS2 sequences. In addition, ITS2 worked well, with over 80.0% of species and 100% of genera being correctly differentiated for the 1507 sequences derived from 1126 species belonging to 196 genera. Our findings support the notion that ITS2 can be used as an efficient and powerful marker and a potential barcode to distinguish various species in Fabaceae. Copyright (c) 2010 Elsevier Ireland Ltd. All rights reserved.
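The comparison of intra- and inter-specific divergence described in this abstract can be sketched on toy data. The example below is not the study's pipeline (which used Clustal W, MEGA and TaxonGAP): it assumes pre-aligned sequences, uses a plain uncorrected p-distance, and the species labels and sequences are invented for illustration.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in compared) / len(compared)


def split_distances(records):
    """Return (intra, inter) lists of pairwise distances for (species, sequence) records."""
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return intra, inter


# Hypothetical aligned fragments, two per invented species.
toy_records = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "ACGAACCTAT"),
    ("species_B", "ACGAACCTAC"),
]

intra, inter = split_distances(toy_records)
# A positive value means the smallest between-species distance exceeds the
# largest within-species distance, i.e. a "barcoding gap" exists in this toy set.
barcoding_gap = min(inter) - max(intra)
```

With the ranges reported in the abstract (intra-specific up to 14.4%, inter-specific down to 0%), the pooled distributions overlap, so in practice a gap of this kind is assessed per species or per genus rather than across the whole data set.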
Comparative RNA sequencing reveals substantial genetic variation in endangered primates
Perry, George H.; Melsted, Páll; Marioni, John C.; Wang, Ying; Bainer, Russell; Pickrell, Joseph K.; Michelini, Katelyn; Zehr, Sarah; Yoder, Anne D.; Stephens, Matthew; Pritchard, Jonathan K.; Gilad, Yoav
2012-01-01
Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of nonhuman primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps toward addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 nonhuman primates. Of the nonhuman primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5721 gene sequences per species and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats have, therefore, strong potential to achieve long-term success. PMID:22207615
Doanh, N Pham; Tu, A Luu; Bui, T Dung; Loan, T Ho; Nonaka, Nariaki; Horii, Yoichiro; Blair, David; Nawa, Yukifumi
2016-10-01
Paragonimus westermani is one of the most medically important lung flukes and is widely distributed in Asia. It exhibits considerable variation in morphological, genetic and biological features. In central provinces of Vietnam, a high prevalence of metacercariae of this species has been reported from the crab intermediate host, Vietopotamon aluoiense. In this study, we detected P. westermani metacercariae in two additional crab hosts, Donopotamon haii in Quang Tri Province, central Vietnam and Indochinamon tannanti in Yen Bai Province in the north. The latter is a new locality for P. westermani in a northern region of Vietnam where P. heterotremus is the only species currently known to cause human paragonimiasis. Paragonimus westermani metacercariae found in Vietnam showed considerable morphological variation but slight genetic variation based on DNA sequences from the nuclear ribosomal ITS2 region and the mitochondrial 16S gene. Co-infection of the same individual crabs with P. westermani and P. heterotremus and/or some other Paragonimus species was found frequently, suggesting potential for co-infection in humans. The findings of the present study emphasize the need for highly specific molecular and immunodiagnostic methods to differentially diagnose between P. westermani and P. heterotremus infections.
Fane, Anne; Sarovich, Derek S.; Price, Erin P.; Rush, Catherine M.; Govan, Brenda L.; Parker, Elizabeth; Mayo, Mark; Currie, Bart J.; Ketheesan, Natkunam
2017-01-01
Neurologic melioidosis is a serious, potentially fatal form of Burkholderia pseudomallei infection. Recently, we reported that a subset of clinical isolates of B. pseudomallei from Australia have heightened virulence and potential for dissemination to the central nervous system. In this study, we demonstrate that this subset has a B. mallei–like sequence variation of the actin-based motility gene, bimA. Compared with B. pseudomallei isolates having typical bimA alleles, isolates that contain the B. mallei–like variation demonstrate increased persistence in phagocytic cells and increased virulence with rapid systemic dissemination and replication within multiple tissues, including the brain and spinal cord, in an experimental model. These findings highlight the implications of bimA variation on disease progression of B. pseudomallei infection and have considerable clinical and public health implications with respect to the degree of neurotropic threat posed to human health. PMID:28418830
Morris, Jodie L; Fane, Anne; Sarovich, Derek S; Price, Erin P; Rush, Catherine M; Govan, Brenda L; Parker, Elizabeth; Mayo, Mark; Currie, Bart J; Ketheesan, Natkunam
2017-05-01
Neurologic melioidosis is a serious, potentially fatal form of Burkholderia pseudomallei infection. Recently, we reported that a subset of clinical isolates of B. pseudomallei from Australia have heightened virulence and potential for dissemination to the central nervous system. In this study, we demonstrate that this subset has a B. mallei-like sequence variation of the actin-based motility gene, bimA. Compared with B. pseudomallei isolates having typical bimA alleles, isolates that contain the B. mallei-like variation demonstrate increased persistence in phagocytic cells and increased virulence with rapid systemic dissemination and replication within multiple tissues, including the brain and spinal cord, in an experimental model. These findings highlight the implications of bimA variation on disease progression of B. pseudomallei infection and have considerable clinical and public health implications with respect to the degree of neurotropic threat posed to human health.
NASA Astrophysics Data System (ADS)
Cawood, Adam J.; Bond, Clare E.
2018-01-01
Stratigraphic influence on structural style and strain distribution in deformed sedimentary sequences is well established in models of 2D mechanical stratigraphy. In this study we attempt to refine existing models of stratigraphic-structure interaction by examining outcrop scale 3D variations in sedimentary architecture and the effects on subsequent deformation. At Monkstone Point, Pembrokeshire, SW Wales, digital mapping and virtual scanline data from a high resolution virtual outcrop have been combined with field observations, sedimentary logs and thin section analysis. Results show that significant variation in strain partitioning is controlled by changes, at a scale of tens of metres, in sedimentary architecture within Upper Carboniferous fluvio-deltaic deposits. Coupled vs uncoupled deformation of the sequence is defined by the composition and lateral continuity of mechanical units and unit interfaces. Where the sedimentary sequence is characterized by gradational changes in composition and grain size, we find that deformation structures are best characterized by patterns of distributed strain. In contrast, distinct compositional changes vertically and in laterally equivalent deposits result in highly partitioned deformation and strain. The mechanical stratigraphy of the study area is inherently 3D in nature, due to lateral and vertical compositional variability. Consideration should be given to 3D variations in mechanical stratigraphy, such as those outlined here, when predicting subsurface deformation in multi-layers.
Fernandes, E.K.K.; Moraes, A.M.L.; Pacheco, R.S.; Rangel, D.E.N.; Miller, M.P.; Bittencourt, V.R.E.P.; Roberts, D.W.
2009-01-01
Aims: The genetic diversity of Beauveria bassiana was investigated by comparing isolates of this species to each other (49 from different geographical regions of Brazil and 4 from USA) and to other Beauveria spp. Methods and Results: The isolates were examined by multilocus enzyme electrophoresis (MLEE), amplified fragment length polymorphism (AFLP), and rDNA sequencing. MLEE and AFLP revealed considerable genetic variability among B. bassiana isolates. Several isolates from South and Southeast Brazil had high similarity coefficients, providing evidence of at least one population with clonal structure. There were clear genomic differences between most Brazilian and USA B. bassiana isolates. A Mantel test using data generated by AFLP provided evidence that greater geographical distances were associated with higher genetic distances. AFLP and rDNA sequencing demonstrated notable genotypic variation between B. bassiana and other Beauveria spp. Conclusion: Geographical distance between populations apparently is an important factor influencing genotypic variability among B. bassiana populations in Brazil. Significance and Impact of the Study: This study characterized many B. bassiana isolates. The results indicate that certain Brazilian isolates are considerably different from others and possibly should be regarded as separate species from B. bassiana sensu lato. The information on genetic variation among the Brazilian isolates, therefore, will be important to comprehending the population structure of B. bassiana in Brazil. © 2009 The Society for Applied Microbiology.
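The Mantel test mentioned in this abstract correlates two distance matrices and judges significance by permuting the rows and columns of one of them. The following is a minimal sketch of that general technique, not the authors' analysis: the function name `mantel_test`, the use of Pearson correlation, the permutation count, and the seed are all assumptions made for illustration.

```python
import numpy as np

def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Simple one-sided permutation Mantel test (illustrative sketch).

    dist_a, dist_b: square symmetric distance matrices of equal size,
    e.g. geographic and genetic distances between the same isolates.
    Returns (observed Pearson r, permutation p-value).
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape != b.shape:
        raise ValueError("inputs must be square matrices of the same shape")

    n = a.shape[0]
    upper = np.triu_indices(n, k=1)          # each pair counted once
    b_vec = b[upper]
    r_obs = np.corrcoef(a[upper], b_vec)[0, 1]

    rng = np.random.default_rng(seed)
    at_least_as_large = 0
    for _ in range(permutations):
        p = rng.permutation(n)
        a_perm = a[np.ix_(p, p)]             # permute rows and columns together
        r_perm = np.corrcoef(a_perm[upper], b_vec)[0, 1]
        if r_perm >= r_obs:
            at_least_as_large += 1

    # Add-one correction so the p-value is never exactly zero.
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return r_obs, p_value
```

A positive `r_obs` with a small `p_value` would be read as evidence that larger geographic distances go with larger genetic distances, which is the pattern the abstract reports.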
Variation in promiscuity and sexual selection drives avian rate of Faster-Z evolution.
Wright, Alison E; Harrison, Peter W; Zimmer, Fabian; Montgomery, Stephen H; Pointer, Marie A; Mank, Judith E
2015-03-01
Higher rates of coding sequence evolution have been observed on the Z chromosome relative to the autosomes across a wide range of species. However, despite a considerable body of theory, we lack empirical evidence explaining variation in the strength of the Faster-Z Effect. To assess the magnitude and drivers of Faster-Z Evolution, we assembled six de novo transcriptomes, spanning 90 million years of avian evolution. Our analysis combines expression, sequence and polymorphism data with measures of sperm competition and promiscuity. In doing so, we present the first empirical evidence demonstrating the positive relationship between Faster-Z Effect and measures of promiscuity, and therefore variance in male mating success. Our results from multiple lines of evidence indicate that selection is less effective on the Z chromosome, particularly in promiscuous species, and that Faster-Z Evolution in birds is due primarily to genetic drift. Our results reveal the power of mating system and sexual selection in shaping broad patterns in genome evolution. © 2015 John Wiley & Sons Ltd.
Inter-individual variation in expression: a missing link in biomarker biology?
Little, Peter F R; Williams, Rohan B H; Wilkins, Marc R
2009-01-01
The past decade has seen an explosion of variation data demonstrating that diversity of both protein-coding sequences and of regulatory elements of protein-coding genes is common and of functional importance. In this article, we argue that genetic diversity can no longer be ignored in studies of human biology, even research projects without explicit genetic experimental design, and that this knowledge can, and must, inform research. By way of illustration, we focus on the potential role of genetic data in case-control studies to identify and validate cancer protein biomarkers. We argue that a consideration of genetics, in conjunction with proteomic biomarker discovery projects, should improve the proportion of biomarkers that can accurately classify patients.
Position specific variation in the rate of evolution in transcription factor binding sites
Moses, Alan M; Chiang, Derek Y; Kellis, Manolis; Lander, Eric S; Eisen, Michael B
2003-01-01
Background: The binding sites of sequence specific transcription factors are an important and relatively well-understood class of functional non-coding DNAs. Although a wide variety of experimental and computational methods have been developed to characterize transcription factor binding sites, they remain difficult to identify. Comparison of non-coding DNA from related species has shown considerable promise in identifying these functional non-coding sequences, even though relatively little is known about their evolution. Results: Here we analyse the genome sequences of the budding yeasts Saccharomyces cerevisiae, S. bayanus, S. paradoxus and S. mikatae to study the evolution of transcription factor binding sites. As expected, we find that both experimentally characterized and computationally predicted binding sites evolve slower than surrounding sequence, consistent with the hypothesis that they are under purifying selection. We also observe position-specific variation in the rate of evolution within binding sites. We find that the position-specific rate of evolution is positively correlated with degeneracy among binding sites within S. cerevisiae. We test theoretical predictions for the rate of evolution at positions where the base frequencies deviate from background due to purifying selection and find reasonable agreement with the observed rates of evolution. Finally, we show how the evolutionary characteristics of real binding motifs can be used to distinguish them from artefacts of computational motif finding algorithms. Conclusion: As has been observed for protein sequences, the rate of evolution in transcription factor binding sites varies with position, suggesting that some regions are under stronger functional constraint than others. This variation likely reflects the varying importance of different positions in the formation of the protein-DNA complex. The characterization of the pattern of evolution in known binding sites will likely contribute to the effective use of comparative sequence data in the identification of transcription factor binding sites and is an important step toward understanding the evolution of functional non-coding DNA. PMID:12946282
Huszar, Tunde I; Jobling, Mark A; Wetton, Jon H
2018-04-12
Short tandem repeats on the male-specific region of the Y chromosome (Y-STRs) are permanently linked as haplotypes, and therefore Y-STR sequence diversity can be considered within the robust framework of a phylogeny of haplogroups defined by single nucleotide polymorphisms (SNPs). Here we use massively parallel sequencing (MPS) to analyse the 23 Y-STRs in Promega's prototype PowerSeq™ Auto/Mito/Y System kit (containing the markers of the PowerPlex® Y23 [PPY23] System) in a set of 100 diverse Y chromosomes whose phylogenetic relationships are known from previous megabase-scale resequencing. Including allele duplications and alleles resulting from likely somatic mutation, we characterised 2311 alleles, demonstrating 99.83% concordance with capillary electrophoresis (CE) data on the same sample set. The set contains 267 distinct sequence-based alleles (an increase of 58% compared to the 169 detectable by CE), including 60 novel Y-STR variants phased with their flanking sequences which have not been reported previously to our knowledge. Variation includes 46 distinct alleles containing non-reference variants of SNPs/indels in both repeat and flanking regions, and 145 distinct alleles containing repeat pattern variants (RPV). For DYS385a,b, DYS481 and DYS390 we observed repeat count variation in short flanking segments previously considered invariable, and suggest new MPS-based structural designations based on these. We considered the observed variation in the context of the Y phylogeny: several specific haplogroup associations were observed for SNPs and indels, reflecting the low mutation rates of such variant types; however, RPVs showed less phylogenetic coherence and more recurrence, reflecting their relatively high mutation rates. In conclusion, our study reveals considerable additional diversity at the Y-STRs of the PPY23 set via MPS analysis, demonstrates high concordance with CE data, facilitates nomenclature standardisation, and places Y-STR sequence variants in their phylogenetic context. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Magnusson, P; Bäck, S A; Olsson, L E
1999-11-01
MR image nonuniformity can vary significantly with the spin-echo pulse sequence repetition time. When MR images with different nonuniformity shapes are used in a T1 calculation, the resulting T1 image becomes nonuniform. As shown in this work, the uniformity TR-dependence of the spin-echo pulse sequence is a critical property for T1 measurements in general and for ferrous sulfate dosimeter gel (FeGel) applications in particular. The purpose was to study the characteristics of the MR image plane nonuniformity in FeGel evaluation. This included studies of the possibility of decreasing nonuniformities by selecting uniformity-optimized repetition times, studies of the transmitted and received RF fields, and studies of the effectiveness of the correction methods background subtraction and quotient correction. A pronounced MR image nonuniformity variation with repetition and T1 relaxation time was observed, and was found to originate from nonuniform RF transmission in combination with the inherent differences in T1 relaxation for different repetition times. Neither the T1 calculation itself, the uniformity-optimized repetition times, nor any of the correction methods studied could sufficiently correct the nonuniformities observed in the T1 images. The nonuniformities were found to vary considerably less with inversion time for the inversion-recovery pulse sequence than with repetition time for the spin-echo pulse sequence, resulting in considerably lower T1 image nonuniformity levels.
Sakai, Hiroaki; Kanamori, Hiroyuki; Arai-Kichise, Yuko; Shibata-Hatta, Mari; Ebana, Kaworu; Oono, Youko; Kurita, Kanako; Fujisawa, Hiroko; Katagiri, Satoshi; Mukai, Yoshiyuki; Hamada, Masao; Itoh, Takeshi; Matsumoto, Takashi; Katayose, Yuichi; Wakasa, Kyo; Yano, Masahiro; Wu, Jianzhong
2014-01-01
Having a deep genetic structure evolved during its domestication and adaptation, the Asian cultivated rice (Oryza sativa) displays considerable physiological and morphological variations. Here, we describe deep whole-genome sequencing of the aus rice cultivar Kasalath by using the advanced next-generation sequencing (NGS) technologies to gain a better understanding of the sequence and structural changes among highly differentiated cultivars. The de novo assembled Kasalath sequences represented 91.1% (330.55 Mb) of the genome and contained 35 139 expressed loci annotated by RNA-Seq analysis. We detected 2 787 250 single-nucleotide polymorphisms (SNPs) and 7393 large insertion/deletion (indel) sites (>100 bp) between Kasalath and Nipponbare, and 2 216 251 SNPs and 3780 large indels between Kasalath and 93-11. Extensive comparison of the gene contents among these cultivars revealed similar rates of gene gain and loss. We detected at least 7.39 Mb of inserted sequences and 40.75 Mb of unmapped sequences in the Kasalath genome in comparison with the Nipponbare reference genome. Mapping of the publicly available NGS short reads from 50 rice accessions proved the necessity and the value of using the Kasalath whole-genome sequence as an additional reference to capture the sequence polymorphisms that cannot be discovered by using the Nipponbare sequence alone. PMID:24578372
Yokoyama, Jun; Fukuda, Tatsuya; Tsukaya, Hirokazu
2003-08-01
Morphological and molecular variation in Mitchella undulata Siebold et Zucc. was examined to evaluate the genetic basis for recognizing the dwarf variety, M. undulata var. minor Masamune. Considerable variation in leaf size in M. undulata, but no obvious morphological discontinuities, were found between the normal and dwarf varieties. Instead, a weak cline running from the Pacific Ocean to the Sea of Japan was found. Anatomical observations of leaf blades revealed that the large variation in leaf size can be attributed to variation in the number of leaf cells and not to differences in cell size. A molecular analysis based on sequences of rDNA internal transcribed spacer regions indicated that there were two major genotypes in M. undulata with minor variation in haplotypes resulting from additional substitutions or putative recombination. The dwarf form from Yakushima was neither genetically uniform nor apparently differentiated from other populations. From these results, we conclude that the dwarf form of M. undulata should be treated at the rank of forma.
Variation of 45S rDNA intergenic spacers in Arabidopsis thaliana.
Havlová, Kateřina; Dvořáčková, Martina; Peiro, Ramon; Abia, David; Mozgová, Iva; Vansáčová, Lenka; Gutierrez, Crisanto; Fajkus, Jiří
2016-11-01
Approximately seven hundred 45S rRNA genes (rDNA) in the Arabidopsis thaliana genome are organised in two 4 Mbp-long arrays of tandem repeats arranged in head-to-tail fashion separated by an intergenic spacer (IGS). These arrays make up 5% of the A. thaliana genome. IGS are rapidly evolving sequences, and frequent rearrangements inside the rDNA loci have generated considerable interspecific and even intra-individual variability, which makes it possible to distinguish among otherwise highly conserved rRNA genes. The IGS has not been comprehensively described despite its potential importance in regulation of rDNA transcription and replication. Here we describe the detailed sequence variation in the complete IGS of A. thaliana WT plants and provide the reference/consensus IGS sequence, as well as genomic DNA analysis. We further investigate mutants dysfunctional in chromatin assembly factor-1 (CAF-1) (fas1 and fas2 mutants), which are known to have a reduced number of rDNA copies, and plant lines with restored CAF-1 function (segregated from a fas1xfas2 genetic background) showing major rDNA rearrangements. The systematic rDNA loss in CAF-1 mutants leads to the decreased variability of the IGS and to the occurrence of distinct IGS variants. We present for the first time a comprehensive and representative set of complete IGS sequences, obtained by conventional cloning and by Pacific Biosciences sequencing. Our data expand the knowledge of the A. thaliana IGS sequence arrangement and variability, which has not been available in full and in detail until now. This is also the first study combining IGS sequencing data with RFLP analysis of genomic DNA.
Hysteretic energy prediction method for mainshock-aftershock sequences
NASA Astrophysics Data System (ADS)
Zhai, Changhai; Ji, Duofa; Wen, Weiping; Li, Cuihua; Lei, Weidong; Xie, Lili
2018-04-01
Structures located in seismically active regions may be subjected to mainshock-aftershock (MSAS) sequences. Strong aftershocks significantly affect the hysteretic energy demand of structures. The hysteretic energy, E_{H,seq}, is normalized by mass m and expressed in terms of the equivalent velocity, V_{D,seq}, to quantitatively investigate aftershock effects on the hysteretic energy of structures. The equivalent velocity, V_{D,seq}, is computed by analyzing the response time-history of an inelastic single-degree-of-freedom (SDOF) system with a varying vibration period subjected to 309 MSAS sequences. The present study selected two kinds of MSAS sequences, with one aftershock and two aftershocks, respectively. The aftershocks are scaled to maintain different relative intensities. The variation of the equivalent velocity, V_{D,seq}, is studied for consideration of the ductility values, site conditions, relative intensities, number of aftershocks, hysteretic models, and damping ratios. The MSAS sequence with one aftershock exhibited a 10% to 30% hysteretic energy increase, whereas the MSAS sequence with two aftershocks presented a 20% to 40% hysteretic energy increase. Finally, a hysteretic energy prediction equation is proposed as a function of the vibration period, ductility value, and damping ratio to estimate hysteretic energy for mainshock-aftershock sequences.
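The abstract above does not spell out how hysteretic energy is converted to an equivalent velocity. A minimal sketch follows, assuming the convention common in energy-based seismic design, V_D = sqrt(2 E_H / m). The function names and the example numbers are hypothetical and are not taken from the paper.

```python
import math

def equivalent_velocity(hysteretic_energy: float, mass: float) -> float:
    """Mass-normalized hysteretic energy as a velocity.

    Assumes V_D = sqrt(2 * E_H / m); the abstract does not state the formula,
    so this convention is an assumption.
    """
    if mass <= 0:
        raise ValueError("mass must be positive")
    if hysteretic_energy < 0:
        raise ValueError("hysteretic energy must be non-negative")
    return math.sqrt(2.0 * hysteretic_energy / mass)

def relative_energy_increase(e_mainshock: float, e_sequence: float) -> float:
    """Fractional increase in hysteretic energy due to aftershocks."""
    if e_mainshock <= 0:
        raise ValueError("mainshock energy must be positive")
    return (e_sequence - e_mainshock) / e_mainshock

# Hypothetical values: 50 J dissipated by a 100 kg SDOF system.
v_d = equivalent_velocity(50.0, 100.0)
increase = relative_energy_increase(100.0, 120.0)
```

Under this assumed definition, an energy increase of 10% to 40% corresponds to a smaller increase in equivalent velocity, since velocity scales with the square root of energy.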
Biodiversity of Trichoderma (Hypocreaceae) in Southern Europe and Macaronesia
Jaklitsch, W.M.; Voglmayr, H.
2015-01-01
The first large-scale survey of sexual and asexual Trichoderma morphs collected from plant and fungal materials conducted in Southern Europe and Macaronesia including a few collections from French islands east of Africa yielded more than 650 specimens identified to the species level. Routine sequencing of tef1 revealed a genetic variation among these isolates that exceeds previous experience and ca. 90 species were recognized, of which 74 are named and 17 species newly described. Aphysiostroma stercorarium is combined in Trichoderma. For the first time a sexual morph is described for T. hamatum. The hitherto most complete phylogenetic tree is presented for the entire genus Trichoderma, based on rpb2 sequences. For the first time also a genus-wide phylogenetic tree based on acl1 sequences is shown. Detailed phylogenetic analyses using tef1 sequences are presented in four separate trees representing major clades of Trichoderma. Discussions involve species composition of clades and ecological and biogeographic considerations including distribution of species. PMID:26955191
Ebolavirus comparative genomics
Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; ...
2015-07-14
The 2014 Ebola outbreak in West Africa is the largest documented for this virus. We examine the dynamics of this genome, comparing more than one hundred currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus, and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP), and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. In conclusion, this information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.
NASA Astrophysics Data System (ADS)
Kalaycıoğlu, Barış; Husnu Dirikolu, M.
2010-09-01
In this study, a Type III composite pressure vessel (ISO 11439:2000) loaded with high internal pressure is investigated in terms of the effect of the orientation of the element coordinate system while simulating the continuous variation of the fibre angle, the effect of symmetric and non-symmetric composite wall stacking sequences, and lastly, a stacking sequence evaluation for reducing the cylindrical section-end cap transition region stress concentration. The research was performed using an Ansys® model with 2.9 l volume, 6061 T6 aluminium liner/Kevlar® 49-Epoxy vessel material, and a service internal pressure loading of 22 MPa. The results show that symmetric stacking sequences give higher burst pressures by up to 15%. Stacking sequence evaluations provided a further 7% pressure-carrying capacity as well as reduced stress concentration in the transition region. Finally, the Type III vessel under consideration provides a 45% lighter construction as compared with an all metal (Type I) vessel.
Koch, Evan; Novembre, John
2017-01-01
When mutations have small effects on fitness, population size plays an important role in determining the amount and nature of deleterious genetic variation. The extent to which recent population size changes have impacted deleterious variation in humans has been a question of considerable interest and debate. An emerging consensus is that the Out-of-Africa bottleneck and subsequent growth events have been too short to cause meaningful differences in genetic load between populations; though changes in the number and average frequencies of deleterious variants have taken place. To provide more support for this view and to offer additional insight into the divergent evolution of deleterious variation across populations, we numerically solve time-inhomogeneous diffusion equations and study the temporal dynamics of the frequency spectra in models of population size change for modern humans. We observe how the response to demographic change differs by the strength of selection, and we then assess whether similar patterns are observed in exome sequence data from 33,370 and 5203 individuals of non-Finnish European and West African ancestry, respectively. Our theoretical results highlight how even simple summaries of the frequency spectrum can have complex responses to demographic change. These results support the finding that some apparent discrepancies between previous results have been driven by the behaviors of the precise summaries of deleterious variation. Further, our empirical results make clear the difficulty of inferring slight differences in frequency spectra using recent next-generation sequence data. PMID:28159863
Karyotype Analysis of Four Vicia Species using In Situ Hybridization with Repetitive Sequences
NAVRÁTILOVÁ, ALICE; NEUMANN, PAVEL; MACAS, JIŘÍ
2003-01-01
Mitotic chromosomes of four Vicia species (V. sativa, V. grandiflora, V. pannonica and V. narbonensis) were subjected to in situ hybridization with probes derived from conserved plant repetitive DNA sequences (18S–25S and 5S rDNA, telomeres) and genus‐specific satellite repeats (VicTR‐A and VicTR‐B). Numbers and positions of hybridization signals provided cytogenetic landmarks suitable for unambiguous identification of all chromosomes, and establishment of the karyotypes. The VicTR‐A and ‐B sequences, in particular, produced highly informative banding patterns that alone were sufficient for discrimination of all chromosomes. However, these patterns were not conserved among species and thus could not be employed for identification of homologous chromosomes. This fact, together with observed variations in positions and numbers of rDNA loci, suggests considerable divergence between karyotypes of the species studied. PMID:12770847
Boldogköi, Zsolt
2004-09-01
Population genetics, the mathematical theory of modern evolutionary biology, defines evolution as the alteration over time of the frequency of distinct gene variants (alleles) differing in fitness. The major problem with this view is that in gene and protein sequences we can find little evidence concerning the molecular basis of phenotypic variance, especially those that would confer adaptive benefit to the bearers. Some novel data, however, suggest that a large amount of genetic variation exists in the regulatory region of genes within populations. In addition, comparison of homologous DNA sequences of various species shows that evolution appears to depend more strongly on gene expression than on the genes themselves. Furthermore, it has been demonstrated in several systems that genes form functional networks, whose products exhibit interrelated expression profiles. Finally, it has been found that regulatory circuits of development behave as evolutionary units. These data demonstrate that our view of evolution calls for a new synthesis. In this article I propose a novel concept, termed the selfish gene network hypothesis, which is based on an overall consideration of the above findings. The major statements of this hypothesis are as follows. (1) Instead of individual genes, gene networks (GNs) are responsible for the determination of traits and behaviors. (2) The primary source of microevolution is the intraspecific polymorphism in GNs and not the allelic variation in either the coding or the regulatory sequences of individual genes. (3) GN polymorphism is generated by the variation in the regulatory regions of the component genes and not by the variance in their coding sequences. (4) Evolution proceeds through continuous restructuring of the composition of GNs rather than fixing of specific alleles or GN variants.
Wu, Jianzhong; Fujisawa, Masaki; Tian, Zhixi; Yamagata, Harumi; Kamiya, Kozue; Shibata, Michie; Hosokawa, Satomi; Ito, Yukiyo; Hamada, Masao; Katagiri, Satoshi; Kurita, Kanako; Yamamoto, Mayu; Kikuta, Ari; Machita, Kayo; Karasawa, Wataru; Kanamori, Hiroyuki; Namiki, Nobukazu; Mizuno, Hiroshi; Ma, Jianxin; Sasaki, Takuji; Matsumoto, Takashi
2009-12-01
Centromeres are sites for assembly of the chromosomal structures that mediate faithful segregation at mitosis and meiosis. This function is conserved across species, but the DNA components that are involved in kinetochore formation differ greatly, even between closely related species. To shed light on the nature, evolutionary timing and evolutionary dynamics of rice centromeres, we decoded a 2.25-Mb DNA sequence covering the centromeric region of chromosome 8 of an indica rice variety, 'Kasalath' (Kas-Cen8). Analysis of repetitive sequences in Kas-Cen8 led to the identification of 222 long terminal repeat (LTR)-retrotransposon elements and 584 CentO satellite monomers, which account for 59.2% of the region. A comparison of the Kas-Cen8 sequence with that of japonica rice 'Nipponbare' (Nip-Cen8) revealed that about 66.8% of the Kas-Cen8 sequence was collinear with that of Nip-Cen8. Although the 27 putative genes are conserved between the two subspecies, only 55.4% of the total LTR-retrotransposon elements in 'Kasalath' had orthologs in 'Nipponbare', thus reflecting recent proliferation of a considerable number of LTR-retrotransposons since the divergence of two rice subspecies of indica and japonica within Oryza sativa. Comparative analysis of the subfamilies, time of insertion, and organization patterns of inserted LTR-retrotransposons between the two Cen8 regions revealed variations between 'Kasalath' and 'Nipponbare' in the preferential accumulation of CRR elements, and the expansion of CentO satellite repeats within the core domain of Cen8. Together, the results provide insights into the recent proliferation of LTR-retrotransposons, and the rapid expansion of CentO satellite repeats, underlying the dynamic variation and plasticity of plant centromeres.
Khatri, Bhavin S.; Goldstein, Richard A.
2015-01-01
Speciation is fundamental to understanding the huge diversity of life on Earth. Although still controversial, empirical evidence suggests that the rate of speciation is larger for smaller populations. Here, we explore a biophysical model of speciation by developing a simple coarse-grained theory of transcription factor-DNA binding and how their co-evolution in two geographically isolated lineages leads to incompatibilities. To develop a tractable analytical theory, we derive a Smoluchowski equation for the dynamics of binding energy evolution that accounts for the fact that natural selection acts on phenotypes, but variation arises from mutations in sequences; the Smoluchowski equation includes selection due to both gradients in fitness and gradients in sequence entropy, which is the logarithm of the number of sequences that correspond to a particular binding energy. This simple consideration predicts that smaller populations develop incompatibilities more quickly in the weak mutation regime; this trend arises as sequence entropy poises smaller populations closer to incompatible regions of phenotype space. These results suggest a generic coarse-grained approach to evolutionary stochastic dynamics, allowing realistic modelling at the phenotypic level. PMID:25936759
Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing
Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas
2016-01-01
Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018
Li, Peng; Wang, Dechen; Yan, Jinli; Zhou, Jianuan; Deng, Yinyue; Jiang, Zide; Cao, Bihao; He, Zifu; Zhang, Lianhui
2016-01-01
Ralstonia solanacearum species complex is a devastating group of phytopathogens with an unusually wide host range and broad geographical distribution. R. solanacearum isolates may differ considerably in various properties including host range and pathogenicity, but the underlying genetic bases remain vague. Here, we conducted the genome sequencing of strain EP1 isolated from Guangdong Province of China, which belongs to phylotype I and is highly virulent to a range of solanaceous crops. Its complete genome contains a 3.95-Mb chromosome and a 2.05-Mb mega-plasmid, which is considerably bigger than reported genomes of other R. solanacearum strains. Both the chromosome and the mega-plasmid have essential house-keeping genes and many virulence genes. Comparative analysis of strain EP1 with other 3 phylotype I and 3 phylotype II, III, IV strains unveiled substantial genome rearrangements, insertions and deletions. Genome sequences are relatively conserved among the 4 phylotype I strains, but more divergent among strains of different phylotypes. Moreover, the strains exhibited considerable variations in their key virulence genes, including those encoding secretion systems and type III effectors. Our results provide valuable information for further elucidation of the genetic basis of diversified virulences and host range of R. solanacearum species. PMID:27833603
A visual tracking method based on deep learning without online model updating
NASA Astrophysics Data System (ADS)
Tang, Cong; Wang, Yicheng; Feng, Yunsong; Zheng, Chao; Jin, Wei
2018-02-01
The paper proposes a visual tracking method based on deep learning without online model updating. To exploit the advantages of deep learning in feature representation, the deep model SSD (Single Shot MultiBox Detector) is used as the object extractor in the tracking model. The color histogram feature and the HOG (Histogram of Oriented Gradients) feature are then combined to select the tracking object. During tracking, a multi-scale object searching map is built to improve the detection performance of the deep detection model and the tracking efficiency. In experiments on eight tracking video sequences from the baseline dataset, compared with six state-of-the-art methods, the proposed method is more robust to challenging tracking factors such as deformation, scale variation, rotation variation, illumination variation, and background clutter. Its overall performance is also better than that of the other six tracking methods.
Finnerty, J R; Block, B A
1992-06-01
We were able to differentiate between species of billfish (Istiophoridae family) and to detect considerable intraspecific variation in the blue marlin (Makaira nigricans) by directly sequencing a polymerase chain reaction (PCR)-amplified, 612-bp fragment of the mitochondrial cytochrome b gene. Thirteen variable nucleotide sites separated blue marlin (n = 26) into 7 genotypes. On average, these genotypes differed by 5.7 base substitutions. A smaller sample of swordfish from an equally broad geographic distribution displayed relatively little intraspecific variation, with an average of 1.3 substitutions separating different genotypes. A cladistic analysis of blue marlin cytochrome b variants indicates two major divergent evolutionary lines within the species. The frequencies of these two major evolutionary lines differ significantly between Atlantic and Pacific ocean basins. This finding is important given that the Atlantic stocks of blue marlin are considered endangered. Migration from the Pacific can help replenish the numbers of blue marlin in the Atlantic, but the loss of certain mitochondrial DNA haplotypes in the Atlantic due to overfishing probably could not be remedied by an influx of Pacific fish because of their absence in the Pacific population. Fishery management strategies should attempt to preserve the genetic diversity within the species. The detection of DNA sequence polymorphism indicates the utility of PCR technology in pelagic fishery genetics.
Modliszewski, Jennifer L; Thomas, David T; Fan, Chuanzhu; Crawford, Daniel J; Depamphilis, Claude W; Xiang, Qiu-Yun Jenny
2006-03-01
Knowledge regarding the origin and maintenance of hybrid zones is critical for understanding the evolutionary outcomes of natural hybridization. To evaluate the contribution of historical contact vs. long-distance gene flow in the formation of a broad hybrid zone in central and northern Georgia that involves Aesculus pavia, A. sylvatica, and A. flava, three cpDNA regions (matK, trnD-trnT, and trnH-trnK) were analyzed. The maternal inheritance of cpDNA in Aesculus was confirmed via sequencing of matK from progeny of controlled crosses. Restriction site analyses identified 21 unique haplotypes among 248 individuals representing 29 populations from parental species and hybrids. Haplotypes were sequenced for all cpDNA regions. Restriction site and sequence data were subjected to phylogeographic and population genetic analyses. Considerable cpDNA variation was detected in the hybrid zone, as well as ancestral cpDNA polymorphism; furthermore, the distribution of haplotypes indicates limited interpopulation gene flow via seeds. The genealogy and structure of genetic variation further support the historical presence of A. pavia in the Piedmont, although it is at present locally extinct. In conjunction with previous allozyme studies, the cpDNA data suggest that the hybrid zone originated through historical local gene flow, yet is maintained by periodic long-distance pollen dispersal.
Furney, Simon J; Turajlic, Samra; Stamp, Gordon; Nohadani, Mahrokh; Carlisle, Anna; Thomas, J Meirion; Hayes, Andrew; Strauss, Dirk; Gore, Martin; van den Oord, Joost; Larkin, James; Marais, Richard
2013-07-01
Mucosal melanoma displays distinct clinical and epidemiological features compared to cutaneous melanoma. Here we used whole genome and whole exome sequencing to characterize the somatic alterations and mutation spectra in the genomes of ten mucosal melanomas. We observed somatic mutation rates that are considerably lower than occur in sun-exposed cutaneous melanoma, but comparable to the rates seen in cancers not associated with exposure to known mutagens. In particular, the mutation signatures are not indicative of ultraviolet light- or tobacco smoke-induced DNA damage. Genes previously reported as mutated in other cancers were also mutated in mucosal melanoma. Notably, there were substantially more copy number and structural variations in mucosal melanoma than have been reported in cutaneous melanoma. Thus, mucosal and cutaneous melanomas are distinct diseases with discrete genetic features. Our data suggest that different mechanisms underlie the genesis of these diseases and that structural variations play a more important role in mucosal than in cutaneous melanomagenesis. Copyright © 2013 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.
Gross, G; Snel, J; Boekhorst, J; Smits, M A; Kleerebezem, M
2010-03-01
Recently, we have identified the mannose-specific adhesin encoding gene (msa) of Lactobacillus plantarum. In the current study, structure and function of this potentially probiotic effector gene were further investigated, exploring genetic diversity of msa in L. plantarum in relation to mannose adhesion capacity. The results demonstrate that there is considerable variation in quantitative in vitro mannose adhesion capacity, which is paralleled by msa gene sequence variation. The msa genes of different L. plantarum strains encode proteins with variable domain composition. Construction of L. plantarum 299v mutant strains revealed that the msa gene product is the key protein for mannose adhesion, also in a strain with high mannose adhering capacity. However, no straightforward correlation between adhesion capacity and domain composition of Msa in L. plantarum could be identified. Nevertheless, differences in Msa sequences in combination with the variable genetic background of specific bacterial strains appear to determine mannose adhesion capacity and potentially affect probiotic properties. These findings exemplify the strain-specificity of probiotic characteristics and illustrate the need for careful and molecular selection of new candidate probiotics.
Jorge, Paulo H; Mastrochirico-Filho, Vito A; Hata, Milene E; Mendes, Natália J; Ariede, Raquel B; de Freitas, Milena Vieira; Vera, Manuel; Porto-Foresti, Fábio; Hashimoto, Diogo T
2018-01-01
The pirapitinga, Piaractus brachypomus (Characiformes, Serrasalmidae), is a fish from the Amazon basin and is considered to be one of the main native species used in aquaculture production in South America. The objectives of this study were: (1) to perform liver transcriptome sequencing of pirapitinga through NGS and then validate a set of microsatellite markers for this species; and (2) to use polymorphic microsatellites for analysis of genetic variability in farmed stocks. The transcriptome sequencing was carried out through the Roche/454 technology, which resulted in 3,696 non-redundant contigs. Of this total, 2,568 contigs had similarity in the non-redundant (nr) protein database (GenBank) and 2,075 sequences were characterized in the categories of Gene Ontology (GO). After the validation process of 30 microsatellite loci, eight markers showed polymorphism. The analysis of these polymorphic markers in farmed stocks revealed that fish farms from North Brazil had a higher genetic diversity than fish farms from Southeast Brazil. AMOVA demonstrated that the highest proportion of variation was presented within the populations. However, when comparing different groups (1: Wild; 2: North fish farms; 3: Southeast fish farms), a considerable variation between the groups was observed. The FST values showed the occurrence of genetic structure among the broodstocks from different regions of Brazil. The transcriptome sequencing in pirapitinga provided important genetic resources for biological studies in this non-model species, and microsatellite data can be used as the framework for the genetic management of breeding stocks in Brazil, which might provide a basis for a genetic pre-breeding programme.
Bimolata, Waikhom; Kumar, Anirudh; Sundaram, Raman Meenakshi; Laha, Gouri Shankar; Qureshi, Insaf Ahmed; Reddy, Gajjala Ashok; Ghazi, Irfan Ahmad
2013-08-01
Xa27 is one of the important R-genes, effective against bacterial blight disease of rice caused by Xanthomonas oryzae pv. oryzae (Xoo). Using a natural population of Oryza, we analyzed the sequence variation in the functionally important domains of Xa27 across the Oryza species. DNA sequences of Xa27 alleles from 27 rice accessions revealed higher nucleotide diversity among the reported R-genes of rice. Sequence polymorphism analysis revealed synonymous and non-synonymous mutations in addition to a number of InDels in non-coding regions of the gene. High sequence variation was observed in the promoter region including the 5'UTR, with 'π' = 0.00916 and 'θw' = 0.01785. Comparative analysis of the identified Xa27 alleles with that of IRBB27 and IR24 indicated the operation of both positive selection (Ka/Ks > 1) and neutral selection (Ka/Ks ≈ 0). The genetic distances of alleles of the gene from Oryza nivara were nearer to IRBB27 as compared to IR24. We also found the presence of conserved and null UPT (upregulated by transcriptional activator) box in the isolated alleles. Considerable amino acid polymorphism was localized in the trans-membrane domain for which the functional significance is yet to be elucidated. However, the absence of functional UPT box in all the alleles except IRBB27 suggests the maintenance of single resistant allele throughout the natural population.
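The two diversity statistics quoted in this abstract, π and θw, can be illustrated with a minimal Python sketch. It uses the standard textbook definitions (π as the mean number of pairwise differences per site, Watterson's θ as the number of segregating sites divided by the harmonic number a_n and the alignment length); it is not the authors' pipeline. The function names and the toy alignment are hypothetical, and the sketch assumes equal-length, gap-free aligned sequences.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences per site.

    Assumes equal-length, gap-free aligned sequences (an assumption
    of this sketch, not a statement about the study's data).
    """
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

def watterson_theta(seqs):
    """theta_w per site: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    n, length = len(seqs), len(seqs[0])
    # A site is segregating if more than one base occurs in its column.
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(1 / i for i in range(1, n))
    return segregating / (a_n * length)

# Toy alignment (hypothetical data, for illustration only).
alleles = ["AAAA", "AAAT", "AATT"]
pi = nucleotide_diversity(alleles)
theta_w = watterson_theta(alleles)
```

When θw exceeds π, as in the promoter-region values reported above, the sample contains relatively many low-frequency variants; this sketch only computes the two numbers and does not test that difference.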
Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform
Wen, Chongqing; Wu, Liyou; Qin, Yujia; Van Nostrand, Joy D.; Ning, Daliang; Sun, Bo; Xue, Kai; Liu, Feifei; Deng, Ye; Liang, Yuting; Zhou, Jizhong
2017-01-01
Illumina’s MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1–3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined. 
These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility. PMID:28453559
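The contrast between unweighted and abundance-weighted OTU overlap reported above can be sketched in a few lines of Python. The abstract does not give the exact formulas, so this sketch assumes that unweighted overlap is the number of shared OTUs divided by the number of OTUs found in either replicate, and that weighted overlap is the fraction of all sequences that belong to shared OTUs. The function names, the per-replicate singleton filter, and the toy counts are hypothetical.

```python
def otu_overlap(rep_a, rep_b):
    """Return (unweighted, abundance_weighted) overlap of two replicates.

    rep_a, rep_b: dicts mapping OTU id -> sequence count.
    Definitions are assumptions of this sketch (see lead-in).
    """
    otus_a = {otu for otu, n in rep_a.items() if n > 0}
    otus_b = {otu for otu, n in rep_b.items() if n > 0}
    shared = otus_a & otus_b
    union = otus_a | otus_b
    unweighted = len(shared) / len(union) if union else 0.0
    total = sum(rep_a.values()) + sum(rep_b.values())
    in_shared = sum(rep_a[otu] + rep_b[otu] for otu in shared)
    weighted = in_shared / total if total else 0.0
    return unweighted, weighted

def drop_singletons(rep):
    """Remove OTUs represented by a single sequence in this replicate."""
    return {otu: n for otu, n in rep.items() if n > 1}

# Toy technical replicates (hypothetical counts).
rep1 = {"otu1": 90, "otu2": 9, "otu3": 1}
rep2 = {"otu1": 80, "otu2": 19, "otu4": 1}
before = otu_overlap(rep1, rep2)
after = otu_overlap(drop_singletons(rep1), drop_singletons(rep2))
```

In this toy case the rare OTUs that differ between replicates carry few sequences, so the weighted overlap is much higher than the unweighted one, and removing singletons raises both, which is the qualitative pattern the abstract describes.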
Plasmodium copy number variation scan: gene copy numbers evaluation in haploid genomes.
Beghain, Johann; Langlois, Anne-Claire; Legrand, Eric; Grange, Laura; Khim, Nimol; Witkowski, Benoit; Duru, Valentine; Ma, Laurence; Bouchier, Christiane; Ménard, Didier; Paul, Richard E; Ariey, Frédéric
2016-04-12
In eukaryotic genomes, deletion or amplification rates have been estimated to be a thousand times more frequent than single nucleotide variation. In Plasmodium falciparum, relatively few transcription factors have been identified, and the regulation of transcription is seemingly largely influenced by gene amplification events. Thus copy number variation (CNV) is a major mechanism enabling parasite genomes to adapt to new environmental changes. Currently, the detection of CNVs is based on quantitative PCR (qPCR), which is significantly limited by the relatively small number of genes that can be analysed at any one time. Technological advances that facilitate whole-genome sequencing, such as next generation sequencing (NGS), enable deeper analyses of the genomic variation to be performed. Because the characteristics of Plasmodium CNVs need special consideration in algorithms and strategies for which classical CNV detection programs are not suited, a dedicated algorithm to detect CNVs across the entire exome of P. falciparum was developed. This algorithm, called PlasmoCNVScan, is based on a custom read depth strategy applied to NGS data. The analysis of CNV identification on three genes known to have different levels of amplification and which are located either in the nuclear, apicoplast or mitochondrial genomes is presented. The results are correlated with the qPCR experiments, usually used for identification of locus specific amplification/deletion. This tool will facilitate the study of P. falciparum genomic adaptation in response to ecological changes: drug pressure, decreased transmission, reduction of the parasite population size (transition to pre-elimination endemic area).
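The general idea behind a read depth strategy can be sketched as follows. This is not the PlasmoCNVScan algorithm, whose details the abstract does not give. The sketch assumes that, in a haploid genome, a gene's mean read depth divided by a baseline depth (here the median of per-gene means) approximates its copy number. The function name, the choice of baseline, and the depth values are hypothetical.

```python
from statistics import mean, median

def copy_number_estimates(depth_by_gene):
    """Estimate per-gene copy number from per-base read depths.

    depth_by_gene: dict mapping gene name -> list of per-base depths.
    The median-of-means baseline is an assumption of this sketch.
    """
    gene_means = {gene: mean(depths)
                  for gene, depths in depth_by_gene.items() if depths}
    baseline = median(gene_means.values())
    if baseline == 0:
        raise ValueError("baseline depth is zero; cannot normalise")
    return {gene: m / baseline for gene, m in gene_means.items()}

# Hypothetical depths: geneB is covered about twice as deeply as the rest.
depths = {
    "geneA": [50, 50, 50],
    "geneB": [100, 100],
    "geneC": [50, 52, 48],
    "geneD": [49, 51],
}
estimates = copy_number_estimates(depths)
```

A practical tool would also need to correct for biases such as the extreme AT-richness of the P. falciparum genome and for the different expected depths of the nuclear, apicoplast and mitochondrial genomes; none of that is modelled here.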
Information Propagation in Developmental Enhancers
NASA Astrophysics Data System (ADS)
Jena, Siddhartha; Levine, Michael
Rather than encoding information about protein sequence, certain lengths of noncoding DNA, called enhancers, interact with protein machinery such as transcription factors to precisely regulate gene expression. Enhancers have been studied extensively in the fruit fly Drosophila melanogaster, where they regulate the expression of developmental genes that establish the blueprint of the adult fly. It has been suggested that enhancer sequences possess a specific but unknown syntax with regards to the placement and strength of transcription factor binding sites. Moreover, studies in divergent fly species have shown that compensatory evolution allows for maintenance of enhancer functionality despite considerable variation in primary DNA sequence. Here, the possible role of enhancers as signal processing modules is studied as a way of explaining these two findings. We first demonstrate how this framework can be used to explain the fine-tuned spatiotemporal dynamics of gene expression. We then explore the evolutionary pressure on enhancer sequences and the resulting emergence of enhancers that are linked by compensatory mutations. This study provides a possible mechanism for the function of multiple enhancers linked to a single gene.
Stratigraphy and structure of coalbed methane reservoirs in the United States: an overview
Pashin, J.C.
1998-01-01
Stratigraphy and geologic structure determine the shape, continuity and permeability of coal and are therefore critical considerations for designing exploration and production strategies for coalbed methane. Coal in the United States is dominantly of Pennsylvanian, Cretaceous and Tertiary age, and to date, more than 90% of the coalbed methane produced is from Pennsylvanian and Cretaceous strata of the Black Warrior and San Juan Basins. Investigations of these basins establish that sequence stratigraphy is a promising approach for regional characterization of coalbed methane reservoirs. Local stratigraphic variation within these strata is the product of sedimentologic and tectonic processes and is a consideration for selecting completion zones. Coalbed methane production in the United States is mainly from foreland and intermontane basins containing diverse compressional and extensional structures. Balanced structural models can be used to construct and validate cross sections as well as to quantify layer-parallel strain and predict the distribution of fractures. Folds and faults influence gas and water production in diverse ways. However, interwell heterogeneity related to fractures and shear structures makes the performance of individual wells difficult to predict.
Phylogeographic Differentiation of Mitochondrial DNA in Han Chinese
Yao, Yong-Gang; Kong, Qing-Peng; Bandelt, Hans-Jürgen; Kivisild, Toomas; Zhang, Ya-Ping
2002-01-01
To characterize the mitochondrial DNA (mtDNA) variation in Han Chinese from several provinces of China, we have sequenced the two hypervariable segments of the control region and the segment spanning nucleotide positions 10171–10659 of the coding region, and we have identified a number of specific coding-region mutations by direct sequencing or restriction-fragment–length–polymorphism tests. This allows us to define new haplogroups (clades of the mtDNA phylogeny) and to dissect the Han mtDNA pool on a phylogenetic basis, which is a prerequisite for any fine-grained phylogeographic analysis, the interpretation of ancient mtDNA, or future complete mtDNA sequencing efforts. Some of the haplogroups under study differ considerably in frequencies across different provinces. The southernmost provinces show more pronounced contrasts in their regional Han mtDNA pools than the central and northern provinces. These and other features of the geographical distribution of the mtDNA haplogroups observed in the Han Chinese make an initial Paleolithic colonization from south to north plausible but would suggest subsequent migration events in China that mainly proceeded from north to south and east to west. Lumping together all regional Han mtDNA pools into one fictive general mtDNA pool or choosing one or two regional Han populations to represent all Han Chinese is inappropriate for prehistoric considerations as well as for forensic purposes or medical disease studies. PMID:11836649
Devesse, Laurence; Ballard, David; Davenport, Lucinda; Riethorst, Immy; Mason-Buck, Gabriella; Syndercombe Court, Denise
2018-05-01
By using sequencing technology to genotype loci of forensic interest it is possible to simultaneously target autosomal, X and Y STRs as well as identity, ancestry and phenotypic informative SNPs, resulting in a breadth of data obtained from a single run that is considerable when compared to that generated with standard technologies. It is important, however, that this information aligns with the genotype data currently obtained using commercially available kits for CE-based investigations such that results are compatible with existing databases and hence can be of use to the forensic community. In this work, 400 samples were typed using commercially available STR kits and CE, as well as using the Illumina ForenSeq™ DNA Signature Prep Kit and MiSeq® FGx to assess concordance of autosomal STRs and population variability. Results show a concordance rate between the two technologies exceeding 99.98% while numerous novel sequence based alleles are described. In order to make use of the sequence variation observed, sequence specific allele frequencies were generated for White British and British Chinese populations. Copyright © 2017 Elsevier B.V. All rights reserved.
Candidate chemosensory ionotropic receptors in a Lepidoptera.
Olivier, V; Monsempes, C; François, M-C; Poivet, E; Jacquin-Joly, E
2011-04-01
A new family of candidate chemosensory ionotropic receptors (IRs) related to ionotropic glutamate receptors (iGluRs) was recently discovered in Drosophila melanogaster. Through BLAST analyses of an expressed sequence tag library prepared from male antennae of the noctuid moth Spodoptera littoralis, we identified 12 unigenes encoding proteins related to D. melanogaster and Bombyx mori IRs. Their full length sequences were obtained and the analyses of their expression patterns suggest that they were exclusively expressed or clearly enriched in chemosensory organs. The deduced protein sequences were more similar to B. mori and D. melanogaster IRs than to iGluRs and showed considerable variations in the predicted ligand-binding domains; none have the three glutamate-interacting residues found in iGluRs, suggesting different binding specificities. Our data suggest that we identified members of the insect IR chemosensory receptor family in S. littoralis and we report here the first demonstration of IR expression in Lepidoptera. © 2010 The Authors. Insect Molecular Biology © 2010 The Royal Entomological Society.
Diekmann, Kerstin; Hodkinson, Trevor R.; Barth, Susanne
2012-01-01
Background and Aims Lolium perenne (perennial ryegrass) is the most important forage grass species of temperate regions. We have previously released the chloroplast genome sequence of L. perenne ‘Cashel’. Here nine chloroplast microsatellite markers are published, which were designed based on knowledge about genetically variable regions within the L. perenne chloroplast genome. These markers were successfully used for characterizing the genetic diversity in Lolium and different grass species. Methods Chloroplast genomes of 14 Poaceae taxa were screened for mononucleotide microsatellite repeat regions and primers designed for their amplification from nine loci. The potential of these markers to assess genetic diversity was evaluated on a set of 16 Irish and 15 European L. perenne ecotypes, nine L. perenne cultivars, other Lolium taxa and other grass species. Key Results All analysed Poaceae chloroplast genomes contained more than 200 mononucleotide repeats (chloroplast simple sequence repeats, cpSSRs) of at least 7 bp in length, concentrated mainly in the large single copy region of the genome. Nucleotide composition varied considerably among subfamilies (with Pooideae biased towards poly A repeats). The nine new markers distinguish L. perenne from all non-Lolium taxa. TeaCpSSR28 was able to distinguish between all Lolium species and Lolium multiflorum due to an elongation of an A8 mononucleotide repeat in L. multiflorum. TeaCpSSR31 detected a considerable degree of microsatellite length variation and single nucleotide polymorphism. TeaCpSSR27 revealed variation within some L. perenne accessions due to a 44-bp indel and was hence readily detected by simple agarose gel electrophoresis. Smaller insertion/deletion events or single nucleotide polymorphisms detected by these new markers could be visualized by polyacrylamide gel electrophoresis or DNA sequencing, respectively. 
Conclusions The new markers are a valuable tool for plant breeding companies, seed testing agencies and the wider scientific community due to their ability to monitor genetic diversity within breeding pools, to trace maternal inheritance and to distinguish closely related species. PMID:22419761
Diekmann, Kerstin; Hodkinson, Trevor R; Barth, Susanne
2012-11-01
Lolium perenne (perennial ryegrass) is the most important forage grass species of temperate regions. We have previously released the chloroplast genome sequence of L. perenne 'Cashel'. Here nine chloroplast microsatellite markers are published, which were designed based on knowledge about genetically variable regions within the L. perenne chloroplast genome. These markers were successfully used for characterizing the genetic diversity in Lolium and different grass species. Chloroplast genomes of 14 Poaceae taxa were screened for mononucleotide microsatellite repeat regions and primers designed for their amplification from nine loci. The potential of these markers to assess genetic diversity was evaluated on a set of 16 Irish and 15 European L. perenne ecotypes, nine L. perenne cultivars, other Lolium taxa and other grass species. All analysed Poaceae chloroplast genomes contained more than 200 mononucleotide repeats (chloroplast simple sequence repeats, cpSSRs) of at least 7 bp in length, concentrated mainly in the large single copy region of the genome. Nucleotide composition varied considerably among subfamilies (with Pooideae biased towards poly A repeats). The nine new markers distinguish L. perenne from all non-Lolium taxa. TeaCpSSR28 was able to distinguish between all Lolium species and Lolium multiflorum due to an elongation of an A(8) mononucleotide repeat in L. multiflorum. TeaCpSSR31 detected a considerable degree of microsatellite length variation and single nucleotide polymorphism. TeaCpSSR27 revealed variation within some L. perenne accessions due to a 44-bp indel and was hence readily detected by simple agarose gel electrophoresis. Smaller insertion/deletion events or single nucleotide polymorphisms detected by these new markers could be visualized by polyacrylamide gel electrophoresis or DNA sequencing, respectively. 
The new markers are a valuable tool for plant breeding companies, seed testing agencies and the wider scientific community due to their ability to monitor genetic diversity within breeding pools, to trace maternal inheritance and to distinguish closely related species.
Mascagni, Flavia; Barghini, Elena; Giordani, Tommaso; Rieseberg, Loren H.; Cavallini, Andrea; Natali, Lucia
2015-01-01
The sunflower (Helianthus annuus) genome contains a very large proportion of transposable elements, especially long terminal repeat retrotransposons. However, knowledge on the retrotransposon-related variability within this species is still limited. We used next-generation sequencing (NGS) technologies to perform a quantitative and qualitative survey of intraspecific variation of the retrotransposon fraction of the genome across 15 genotypes—7 wild accessions and 8 cultivars—of H. annuus. By mapping the Illumina reads of the 15 genotypes onto a library of sunflower long terminal repeat retrotransposons, we observed considerable variability in redundancy among genotypes, at both superfamily and family levels. In another analysis, we mapped Illumina paired reads to two sets of sequences, that is, long terminal repeat retrotransposons and protein-encoding sequences, and evaluated the extent of retrotransposon proximity to genes in the sunflower genome by counting the number of paired reads in which one read mapped to a retrotransposon and the other to a gene. Large variability among genotypes was also ascertained for retrotransposon proximity to genes. Both long terminal repeat retrotransposon redundancy and proximity to genes varied among retrotransposon families and also between cultivated and wild genotypes. Such differences are discussed in relation to the possible role of long terminal repeat retrotransposons in the domestication of sunflower. PMID:26608057
Pseudo Steady-State Free Precession for MR-Fingerprinting.
Assländer, Jakob; Glaser, Steffen J; Hennig, Jürgen
2017-03-01
This article discusses the signal behavior when the flip angle in steady-state free precession sequences is continuously varied, as suggested for MR-fingerprinting sequences. Flip angle variations prevent the establishment of a steady state and introduce instabilities with regard to magnetic field inhomogeneities and intravoxel dephasing. We show how a pseudo steady state can be achieved, which restores the spin echo nature of steady-state free precession. Based on geometrical considerations, relationships between the flip angle, repetition and echo time are derived that suffice to establish a pseudo steady state. The theory is tested with Bloch simulations as well as phantom and in vivo experiments. A typical steady-state free precession passband can be restored with the proposed conditions. The stability of the pseudo steady state is demonstrated by comparing the evolution of the signal of a single isochromat to one resulting from a spin ensemble. As confirmed by experiments, magnetization in a pseudo steady state can be described with fewer degrees of freedom compared to the original fingerprinting and the pseudo steady state results in more reliable parameter maps. The proposed conditions restore the spin-echo-like signal behavior typical for steady-state free precession in fingerprinting sequences, making this approach more robust to B0 variations. Magn Reson Med 77:1151-1161, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
Sequencing Insights into Microbial Communities in the Water and Sediments of Fenghe River, China.
Lu, Sidan; Sun, Yujiao; Zhao, Xuan; Wang, Lei; Ding, Aizhong; Zhao, Xiaohui
2016-07-01
The connection between microbial community structure and spatial variation and pollution in river waters has been widely investigated. However, water and sediments together have rarely been explored. In this study, Illumina high-throughput sequencing was performed to analyze microbes in 24 water and sediment samples from natural to anthropogenic sources and from headstream to downstream areas. These data were used to assess variability in microbial community structure and diversity along the Fenghe River, China. The relationship between bacterial diversity and environmental parameters was statistically analyzed. An average of 1682 operational taxonomic units was obtained. Microbial diversity increased from the headstream to downstream and tended to be greater in sediment compared with water. The water samples near the headstream had relatively low Shannon and Chao1 indices. These diversity indices and the number of observed species in the water and sediment samples increase downstream. The parameters also differ in the two river tributaries. Community structures shift based on the extent of nitrogen pollution variation in the sediment and water samples. The four most dominant genera in the water community were Escherichia, Acinetobacter, Comamonadaceae, and Pseudomonas. In the sediments, the most dominant genera were Stramenopiles, Flavobacterium, Pseudomonas, and Comamonadaceae. The number of ammonia-oxidizing archaea in the headstream water slightly differed from that in the sediment but varied considerably in the downstream sediments. Statistical analysis showed that community variation is correlated with changes in ammonia nitrogen, total nitrogen, and nitrate nitrogen. This study identified different microbial community structures in river water and sediments.
Overall this study emphasized the need to elucidate spatial variations in bacterial diversity in water and sediments associated with physicochemical gradients and to show the effects of such variation on waterborne microbial community structures.
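The Shannon and Chao1 indices named in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the authors computed them; the code below assumes the standard natural-log Shannon index and the bias-corrected Chao1 estimator, and the OTU count vectors are made-up illustrative values, not data from the study.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one OTU must have a non-zero count")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)).

    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical OTU count vectors (assumed values for illustration only).
headstream_water = [120, 40, 8, 2, 1]
downstream_sediment = [30, 28, 25, 22, 20, 18, 2, 1, 1]

for name, sample in [("headstream water", headstream_water),
                     ("downstream sediment", downstream_sediment)]:
    print(name, round(shannon_index(sample), 3), round(chao1(sample), 2))
```

A more even and richer sample gives a higher Shannon value, while Chao1 adds an estimate of unseen OTUs based on how many were observed only once or twice.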
Host and Environmental Factors Affecting the Intestinal Microbiota in Chickens
Kers, Jannigje G.; Velkers, Francisca C.; Fischer, Egil A. J.; Hermes, Gerben D. A.; Stegeman, J. A.; Smidt, Hauke
2018-01-01
The initial development of intestinal microbiota in poultry plays an important role in production performance, overall health and resistance against microbial infections. Multiplexed sequencing of 16S ribosomal RNA gene amplicons is often used in studies, such as feed intervention or antimicrobial drug trials, to determine corresponding effects on the composition of intestinal microbiota. However, considerable variation of intestinal microbiota composition has been observed both within and across studies. Such variation may in part be attributed to technical factors, such as sampling procedures, sample storage, DNA extraction, the choice of PCR primers and corresponding region to be sequenced, and the sequencing platforms used. Furthermore, part of this variation in microbiota composition may also be explained by different host characteristics and environmental factors. To facilitate the improvement of design, reproducibility and interpretation of poultry microbiota studies, we have reviewed the literature on confounding factors influencing the observed intestinal microbiota in chickens. First, it has been identified that host-related factors, such as age, sex, and breed, have a large effect on intestinal microbiota. The diversity of chicken intestinal microbiota tends to increase most during the first weeks of life, and corresponding colonization patterns seem to differ between layer- and meat-type chickens. Second, it has been found that environmental factors, such as biosecurity level, housing, litter, feed access and climate also have an effect on the composition of the intestinal microbiota. As microbiota studies have to deal with many of these unknown or hidden host and environmental variables, the choice of study designs can have a great impact on study outcomes and interpretation of the data. Providing details on a broad range of host and environmental factors in articles and sequence data repositories is highly recommended. 
This creates opportunities to combine data from different studies for meta-analysis, which will facilitate scientific breakthroughs toward nutritional and husbandry associated strategies to improve animal health and performance. PMID:29503637
Host and Environmental Factors Affecting the Intestinal Microbiota in Chickens.
Kers, Jannigje G; Velkers, Francisca C; Fischer, Egil A J; Hermes, Gerben D A; Stegeman, J A; Smidt, Hauke
2018-01-01
The initial development of intestinal microbiota in poultry plays an important role in production performance, overall health and resistance against microbial infections. Multiplexed sequencing of 16S ribosomal RNA gene amplicons is often used in studies, such as feed intervention or antimicrobial drug trials, to determine corresponding effects on the composition of intestinal microbiota. However, considerable variation of intestinal microbiota composition has been observed both within and across studies. Such variation may in part be attributed to technical factors, such as sampling procedures, sample storage, DNA extraction, the choice of PCR primers and corresponding region to be sequenced, and the sequencing platforms used. Furthermore, part of this variation in microbiota composition may also be explained by different host characteristics and environmental factors. To facilitate the improvement of design, reproducibility and interpretation of poultry microbiota studies, we have reviewed the literature on confounding factors influencing the observed intestinal microbiota in chickens. First, it has been identified that host-related factors, such as age, sex, and breed, have a large effect on intestinal microbiota. The diversity of chicken intestinal microbiota tends to increase most during the first weeks of life, and corresponding colonization patterns seem to differ between layer- and meat-type chickens. Second, it has been found that environmental factors, such as biosecurity level, housing, litter, feed access and climate also have an effect on the composition of the intestinal microbiota. As microbiota studies have to deal with many of these unknown or hidden host and environmental variables, the choice of study designs can have a great impact on study outcomes and interpretation of the data. Providing details on a broad range of host and environmental factors in articles and sequence data repositories is highly recommended. 
This creates opportunities to combine data from different studies for meta-analysis, which will facilitate scientific breakthroughs toward nutritional and husbandry associated strategies to improve animal health and performance.
Śliwińska-Jewsiewicka, A; Kuciński, M; Kirtiklis, L; Dobosz, S; Ocalewicz, K; Jankun, Malgorzata
2015-08-01
Brook trout Salvelinus fontinalis (Mitchill, 1814) chromosomes have been analyzed using conventional and molecular cytogenetic techniques enabling characteristics and chromosomal location of heterochromatin, nucleolus organizer regions (NORs), ribosomal RNA-encoding genes and telomeric DNA sequences. The C-banding and chromosome digestion with the restriction endonucleases demonstrated distribution and heterogeneity of the heterochromatin in the brook trout genome. DNA sequences of the ribosomal RNA genes, namely the nucleolus-forming 28S (major) and non-nucleolus-forming 5S (minor) rDNAs, were physically mapped using fluorescence in situ hybridization (FISH) and primed in situ labelling. The minor rDNA locus was located on the subtelo-acrocentric chromosome pair No. 9, whereas the major rDNA loci were dispersed on 14 chromosome pairs, showing a considerable inter-individual variation in the number and location. The major and minor rDNA loci were located on different chromosomes. Multichromosomal location (3-6 sites) of the NORs was demonstrated by silver nitrate (AgNO3) impregnation. All Ag-positive, i.e. active, NORs corresponded to the GC-rich blocks of heterochromatin. FISH with a telomeric probe showed the presence of an interstitial telomeric site (ITS) adjacent to the NOR/28S rDNA site on chromosome 11. This ITS was presumably a remnant of the chromosome rearrangement(s) leading to the genomic redistribution of the rDNA sequences. Comparative analysis of the cytogenetic data among several related salmonid species confirmed huge variation in the number and the chromosomal location of rRNA gene clusters in the Salvelinus genome.
Genome-wide Mapping Reveals Conservation of Promoter DNA Methylation Following Chicken Domestication
Li, Qinghe; Wang, Yuanyuan; Hu, Xiaoxiang; Zhao, Yaofeng; Li, Ning
2015-01-01
It is well known that environment influences DNA methylation; however, the extent of heritable DNA methylation variation following animal domestication remains largely unknown. Using meDIP-chip we mapped the promoter methylomes for 23,316 genes in muscle tissues of ancestral and domestic chickens. We systematically examined the variation of promoter DNA methylation in terms of different breeds, differentially expressed genes, SNPs and genes undergoing genetic selection sweeps. While considerable changes in DNA sequence and gene expression programs were prevalent, we found that the inter-strain DNA methylation patterns were highly conserved in the promoter region between the wild and domestic chicken breeds. Our data suggest a global preservation of DNA methylation between the wild and domestic chicken breeds at either a genome-wide or locus-specific scale in chick muscle tissues. PMID:25735894
The BIG Data Center: from deposition to integration to translation
2017-01-01
Biological data are generated at unprecedentedly exponential rates, posing considerable challenges in big data deposition, integration and translation. The BIG Data Center, established at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, provides a suite of database resources, including (i) Genome Sequence Archive, a data repository specialized for archiving raw sequence reads, (ii) Gene Expression Nebulas, a data portal of gene expression profiles based entirely on RNA-Seq data, (iii) Genome Variation Map, a comprehensive collection of genome variations for featured species, (iv) Genome Warehouse, a centralized resource housing genome-scale data with particular focus on economically important animals and plants, (v) Methylation Bank, an integrated database of whole-genome single-base resolution methylomes and (vi) Science Wikis, a central access point for biological wikis developed for community annotations. The BIG Data Center is dedicated to constructing and maintaining biological databases through big data integration and value-added curation, conducting basic research to translate big data into big knowledge and providing freely open access to a variety of data resources in support of worldwide research activities in both academia and industry. All of these resources are publicly available and can be found at http://bigd.big.ac.cn. PMID:27899658
Sequence variation and phylogenetic analysis of envelope glycoprotein of hepatitis G virus.
Lim, M Y; Fry, K; Yun, A; Chong, S; Linnen, J; Fung, K; Kim, J P
1997-11-01
A transfusion-transmissible agent provisionally designated hepatitis G virus (HGV) was recently identified. In this study, we examined the variability of the HGV genome by analysing sequences in the putative envelope region from 72 isolates obtained from diverse geographical sources. The 1561 nucleotide sequence of the E1/E2/NS2a region of HGV was determined from 12 isolates, and compared with three published sequences. The most variability was observed in 400 nucleotides at the N terminus of E2. We next analysed this 400 nucleotide envelope variable region (EV) from an additional 60 HGV isolates. This sequence varied considerably among the 75 isolates, with overall identity ranging from 79.3% to 99.5% at the nucleotide level, and from 83.5% to 100% at the amino acid level. However, hypervariable regions were not identified. Phylogenetic analyses indicated that the 75 HGV isolates belong to a single genotype. A single-tier distribution of evolutionary distances was observed among the 15 E1/E2/NS2a sequences and the 75 EV sequences. In contrast, 11 isolates of HCV were analysed and showed a three-tiered distribution, representing genotypes, subtypes, and isolates. The 75 isolates of HGV fell into four clusters on the phylogenetic tree. Tight geographical clustering was observed among the HGV isolates from Japan and Korea.
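The identity ranges quoted in the abstract above (for example 79.3% to 99.5% at the nucleotide level) are the minimum and maximum of pairwise comparisons across isolates. The following is a minimal sketch of that calculation, assuming the sequences are already aligned to equal length and that gap columns (`-`) are skipped; the abstract does not state the authors' exact procedure, and the toy sequences are invented.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent identical positions between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are ignored (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return 100.0 * matches / compared


def identity_range(aligned_seqs):
    """Return (min, max) percent identity over all pairs of sequences."""
    values = [percent_identity(a, b) for a, b in combinations(aligned_seqs, 2)]
    return min(values), max(values)


# Toy aligned fragments (hypothetical, not HGV data).
isolates = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC", "ACG-ACGTAC"]
low, high = identity_range(isolates)
print(f"identity range: {low:.1f}% to {high:.1f}%")
```

The same function works for amino acid alignments, since it only compares characters column by column.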
Sequence dependence of electron-induced DNA strand breakage revealed by DNA nanoarrays
Keller, Adrian; Rackwitz, Jenny; Cauët, Emilie; Liévin, Jacques; Körzdörfer, Thomas; Rotaru, Alexandru; Gothelf, Kurt V.; Besenbacher, Flemming; Bald, Ilko
2014-01-01
The electronic structure of DNA is determined by its nucleotide sequence, which is for instance exploited in molecular electronics. Here we demonstrate that also the DNA strand breakage induced by low-energy electrons (18 eV) depends on the nucleotide sequence. To determine the absolute cross sections for electron induced single strand breaks in specific 13-mer oligonucleotides we used atomic force microscopy analysis of DNA origami based DNA nanoarrays. We investigated the DNA sequences 5′-TT(XYX)₃TT with X = A, G, C and Y = T, BrU (5-bromouracil) and found absolute strand break cross sections between 2.66 × 10⁻¹⁴ cm² and 7.06 × 10⁻¹⁴ cm². The highest cross section was found for 5′-TT(ATA)₃TT and 5′-TT(ABrUA)₃TT, respectively. BrU is a radiosensitizer, which was discussed to be used in cancer radiation therapy. The replacement of T by BrU into the investigated DNA sequences leads to a slight increase of the absolute strand break cross sections resulting in sequence-dependent enhancement factors between 1.14 and 1.66. Nevertheless, the variation of strand break cross sections due to the specific nucleotide sequence is considerably higher. Thus, the present results suggest the development of targeted radiosensitizers for cancer radiation therapy. PMID:25487346
Dor, Roi; Lovette, Irby J.; Safran, Rebecca J.; Billerman, Shawn M.; Huber, Gernot H.; Vortman, Yoni; Lotem, Arnon; McGowan, Andrew; Evans, Matthew R.; Cooper, Caren B.; Winkler, David W.
2011-01-01
Recent studies of several species have reported a latitudinal cline in the circadian clock gene, Clock, which influences rhythms in both physiology and behavior. Latitudinal variation in this gene may hence reflect local adaptation to seasonal variation. In some bird populations, there is also an among-individual association between Clock poly-Q genotype and clutch initiation date and incubation period. We examined Clock poly-Q allele variation in the Barn Swallow (Hirundo rustica), a species with a cosmopolitan geographic distribution and considerable variation in life-history traits that may be influenced by the circadian clock. We genotyped Barn Swallows from five populations (from three subspecies) and compared variation at the Clock locus to that at microsatellite loci and mitochondrial DNA (mtDNA). We found very low variation in the Clock poly-Q region, as >96% of individuals were homozygous, and the two other alleles at this locus were globally rare. Genetic differentiation based on the Clock poly-Q locus was not correlated with genetic differentiation based on either microsatellite loci or mtDNA sequences. Our results show that high diversity in Clock poly-Q is not general across avian species. The low Clock variation in the background of heterogeneity in microsatellite and mtDNA loci in Barn Swallows may be an outcome of stabilizing selection on the Clock locus. PMID:22216124
An evaluation of copy number variation detection tools for cancer using whole exome sequencing data.
Zare, Fatima; Dow, Michelle; Monteleone, Nicholas; Hosny, Abdelrahman; Nabavi, Sheida
2017-05-31
Recently copy number variation (CNV) has gained considerable interest as a type of genomic/genetic variation that plays an important role in disease susceptibility. Advances in sequencing technology have created an opportunity for detecting CNVs more accurately. Recently whole exome sequencing (WES) has become the primary strategy for sequencing patient samples and studying their genomic aberrations. However, compared to whole genome sequencing, WES introduces more biases and noise that make CNV detection very challenging. Additionally, tumors' complexity makes the detection of cancer specific CNVs even more difficult. Although many CNV detection tools have been developed since the introduction of NGS data, there are few tools for somatic CNV detection for WES data in cancer. In this study, we evaluated the performance of the most recent and commonly used CNV detection tools for WES data in cancer to address their limitations and provide guidelines for developing new ones. We focused on the tools that have been designed or have the ability to detect cancer somatic aberrations. We compared the performance of the tools in terms of sensitivity and false discovery rate (FDR) using real data and simulated data. Comparative analysis of the results of the tools showed that there is a low consensus among the tools in calling CNVs. Using real data, tools show moderate sensitivity (~50% - ~80%), fair specificity (~70% - ~94%) and poor FDRs (~27% - ~60%). Also, using simulated data we observed that increasing the coverage more than 10× in exonic regions does not improve the detection power of the tools significantly. The limited performance of the current CNV detection tools for WES data in cancer indicates the need for developing more efficient and precise CNV detection methods.
Due to the complexity of tumors and high level of noise and biases in WES data, employing advanced novel segmentation, normalization and de-noising techniques that are designed specifically for cancer data is necessary. Also, CNV detection development suffers from the lack of a gold standard for performance evaluation. Finally, developing tools with user-friendly user interfaces and visualization features can enhance CNV studies for a broader range of users.
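The sensitivity, specificity and FDR figures reported in the abstract above follow from the usual confusion-matrix definitions. The sketch below is only an illustration under assumed conditions: calls are scored per gene against a truth set, and the function name `evaluate_calls` and the gene identifiers are hypothetical, not taken from the study or from any CNV tool.

```python
def evaluate_calls(called, truth, universe):
    """Score a set of CNV calls against a truth set at the gene level.

    called   -- genes a tool reports as altered
    truth    -- genes known to be altered
    universe -- all genes assessed (must contain both sets)
    """
    called, truth, universe = set(called), set(truth), set(universe)
    tp = len(called & truth)             # true positives
    fp = len(called - truth)             # false positives
    fn = len(truth - called)             # false negatives
    tn = len(universe - called - truth)  # true negatives

    def ratio(num, den):
        # Undefined ratios are reported as NaN instead of raising.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),  # TN / (TN + FP)
        "fdr": ratio(fp, tp + fp),          # FP / (TP + FP)
    }


# Hypothetical example: 10 genes assessed, 4 truly altered, 4 called.
genes = {f"gene{i}" for i in range(10)}
truth = {"gene0", "gene1", "gene2", "gene3"}
called = {"gene0", "gene1", "gene2", "gene7"}
print(evaluate_calls(called, truth, genes))
```

Note that FDR depends on the number of calls made, so a tool can combine reasonable sensitivity with a poor FDR when it over-calls, which is the pattern the abstract describes.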
Bellissimo, Daniel B; Christopherson, Pamela A; Flood, Veronica H; Gill, Joan Cox; Friedman, Kenneth D; Haberichter, Sandra L; Shapiro, Amy D; Abshire, Thomas C; Leissinger, Cindy; Hoots, W Keith; Lusher, Jeanne M; Ragni, Margaret V; Montgomery, Robert R
2012-03-01
Diagnosis and classification of VWD is aided by molecular analysis of the VWF gene. Because VWF polymorphisms have not been fully characterized, we performed VWF laboratory testing and gene sequencing of 184 healthy controls with a negative bleeding history. The controls included 66 (35.9%) African Americans (AAs). We identified 21 new sequence variations, 13 (62%) of which occurred exclusively in AAs and 2 (G967D, T2666M) that were found in 10%-15% of the AA samples, suggesting they are polymorphisms. We identified 14 sequence variations reported previously as VWF mutations, the majority of which were type 1 mutations. These controls had VWF Ag levels within the normal range, suggesting that these sequence variations might not always reduce plasma VWF levels. Eleven mutations were found in AAs, and the frequency of M740I, H817Q, and R2185Q was 15%-18%. Ten AA controls had the 2N mutation H817Q; 1 was homozygous. The average factor VIII level in this group was 99 IU/dL, suggesting that this variation may confer little or no clinical symptoms. This study emphasizes the importance of sequencing healthy controls to understand ethnic-specific sequence variations so that asymptomatic sequence variations are not misidentified as mutations in other ethnic or racial groups.
Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer.
Wojcik, Sylwia E; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z; Rai, Kanti R; Kipps, Thomas J; Keating, Michael J; Croce, Carlo M; Calin, George A
2010-02-01
Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas.
PMID:19926640
Okeke, Iruka N.; Borneman, Jade A.; Shin, Sooan; Mellies, Jay L.; Quinn, Laura E.; Kaper, James B.
2001-01-01
Enteropathogenic Escherichia coli (EPEC) strains that carry the EPEC adherence factor (EAF) plasmid were screened for the presence of different EAF sequences, including those of the plasmid-encoded regulator (per). Considerable variation in gene content of EAF plasmids from different strains was seen. However, bfpA, the gene encoding the structural subunit for the bundle-forming pilus, bundlin, and per genes were found in 96.8% of strains. Sequence analysis of the per operon and its promoter region from 15 representative strains revealed that it is highly conserved. Most of the variation occurs in the 5′ two-thirds of the perA gene. In contrast, the C-terminal portion of the predicted PerA protein that contains the DNA-binding helix-turn-helix motif is 100% conserved in all strains that possess a full-length gene. In a minority of strains including the O119:H2 and canine isolates and in a subset of O128:H2 and O142:H6 strains, frameshift mutations in perA leading to premature truncation and consequent inactivation of the gene were identified. Cloned perA, -B, and -C genes from these strains, unlike those from strains with a functional operon, failed to activate the LEE1 operon and bfpA transcriptional fusions or to complement a per mutant in reference strain E2348/69. Furthermore, O119, O128, and canine strains that carry inactive per operons were deficient in virulence protein expression. The context in which the perABC operon occurs on the EAF plasmid varies. The sequence upstream of the per promoter region in EPEC reference strains E2348/69 and B171-8 was present in strains belonging to most serogroups. In a subset of O119:H2, O128:H2, and O142:H6 strains and in the canine isolate, this sequence was replaced by an IS1294-homologous sequence. PMID:11500429
Govindaraj, Mahalingam
2015-01-01
The number of sequenced crop genomes and associated genomic resources is growing rapidly with the advent of inexpensive next generation sequencing methods. Databases have become an integral part of all aspects of scientific research, including basic and applied plant and animal sciences. The importance of databases keeps increasing as the volume of datasets from direct and indirect genomics, as well as other omics approaches, has kept expanding in recent years. The databases and associated web portals provide at a minimum a uniform set of tools and automated analysis across a wide range of crop plant genomes. This paper reviews some basic terms and considerations in dealing with crop plant database utilization in the advancing genomic era. The utilization of databases for variation analysis with other comparative genomics tools, and data interpretation platforms, are well described. The major focus of this review is to provide knowledge on platforms and databases for genome-based investigations of agriculturally important crop plants. The utilization of these databases in applied crop improvement programs is still being achieved widely; otherwise, the end for sequencing is not far away. PMID:25874133
Preparation of highly multiplexed small RNA sequencing libraries.
Persson, Helena; Søkilde, Rolf; Pirona, Anna Chiara; Rovira, Carlos
2017-08-01
MicroRNAs (miRNAs) are ~22-nucleotide-long small non-coding RNAs that regulate the expression of protein-coding genes by base pairing to partially complementary target sites, preferentially located in the 3´ untranslated region (UTR) of target mRNAs. The expression and function of miRNAs have been extensively studied in human disease, as well as the possibility of using these molecules as biomarkers for prognostication and treatment guidance. To identify and validate miRNAs as biomarkers, their expression must be screened in large collections of patient samples. Here, we develop a scalable protocol for the rapid and economical preparation of a large number of small RNA sequencing libraries using dual indexing for multiplexing. Combined with the use of off-the-shelf reagents, more samples can be sequenced simultaneously on large-scale sequencing platforms at a considerably lower cost per sample. Sample preparation is simplified by pooling libraries prior to gel purification, which allows for the selection of a narrow size range while minimizing sample variation. A comparison with publicly available data from benchmarking of miRNA analysis platforms showed that this method captures absolute and differential expression as effectively as commercially available alternatives.
USDA-ARS's Scientific Manuscript database
Genomic structural variations are an important source of genetic diversity. Copy number variations (CNVs), gains and losses of large regions of genomic sequence between individuals of a species, are known to be associated with both diseases and phenotypic traits. Deeply sequenced genomes are often u...
Genetic Variation in the Acorn Barnacle from Allozymes to Population Genomics
Flight, Patrick A.; Rand, David M.
2012-01-01
Understanding the patterns of genetic variation within and among populations is a central problem in population and evolutionary genetics. We examine this question in the acorn barnacle, Semibalanus balanoides, in which the allozyme loci Mpi and Gpi have been implicated in balancing selection due to varying selective pressures at different spatial scales. We review the patterns of genetic variation at the Mpi locus, compare this to levels of population differentiation at mtDNA and microsatellites, and place these data in the context of genome-wide variation from high-throughput sequencing of population samples spanning the North Atlantic. Despite considerable geographic variation in the patterns of selection at the Mpi allozyme, this locus shows rather low levels of population differentiation at ecological and trans-oceanic scales (FST ∼ 5%). Pooled population sequencing was performed on samples from Rhode Island (RI), Maine (ME), and Southwold, England (UK). Analysis of more than 650 million reads identified approximately 335,000 high-quality SNPs in 19 million base pairs of the S. balanoides genome. Much variation is shared across the Atlantic, but there are significant examples of strong population differentiation among samples from RI, ME, and UK. An FST outlier screen of more than 22,000 contigs provided a genome-wide context for interpretation of earlier studies on allozymes, mtDNA, and microsatellites. FST values for allozymes, mtDNA and microsatellites are close to the genome-wide average for random SNPs, with the exception of the trans-Atlantic FST for mtDNA. The majority of FST outliers were unique between individual pairs of populations, but some genes show shared patterns of excess differentiation. These data indicate that gene flow is high, that selection is strong on a subset of genes, and that a variety of genes are experiencing diversifying selection at large spatial scales. This survey of polymorphism in S. balanoides provides a number of genomic tools that promise to make this a powerful model for ecological genomics of the rocky intertidal. PMID:22767487
Mascagni, Flavia; Barghini, Elena; Giordani, Tommaso; Rieseberg, Loren H; Cavallini, Andrea; Natali, Lucia
2015-11-24
The sunflower (Helianthus annuus) genome contains a very large proportion of transposable elements, especially long terminal repeat retrotransposons. However, knowledge on the retrotransposon-related variability within this species is still limited. We used next-generation sequencing (NGS) technologies to perform a quantitative and qualitative survey of intraspecific variation of the retrotransposon fraction of the genome across 15 genotypes--7 wild accessions and 8 cultivars--of H. annuus. By mapping the Illumina reads of the 15 genotypes onto a library of sunflower long terminal repeat retrotransposons, we observed considerable variability in redundancy among genotypes, at both superfamily and family levels. In another analysis, we mapped Illumina paired reads to two sets of sequences, that is, long terminal repeat retrotransposons and protein-encoding sequences, and evaluated the extent of retrotransposon proximity to genes in the sunflower genome by counting the number of paired reads in which one read mapped to a retrotransposon and the other to a gene. Large variability among genotypes was also ascertained for retrotransposon proximity to genes. Both long terminal repeat retrotransposon redundancy and proximity to genes varied among retrotransposon families and also between cultivated and wild genotypes. Such differences are discussed in relation to the possible role of long terminal repeat retrotransposons in the domestication of sunflower. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
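The paired-read proximity count described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the input format (each mate already reduced to a `(kind, name)` tuple, or `None` if unmapped), the label strings, and the function name `count_gene_proximal_pairs` are all assumptions made for illustration.

```python
from collections import Counter


def count_gene_proximal_pairs(pairs):
    """Count read pairs with one mate on a retrotransposon and the other on a gene.

    `pairs` is an iterable of (mate1, mate2). Each mate is either None
    (unmapped) or a tuple (kind, name), where kind is "retrotransposon"
    or "gene" (hypothetical labels) and name is the family or gene ID.
    Returns a Counter keyed by retrotransposon family.
    """
    counts = Counter()
    for mate1, mate2 in pairs:
        # Check both orientations: the retrotransposon hit may be either mate.
        for te_mate, gene_mate in ((mate1, mate2), (mate2, mate1)):
            if te_mate is None or gene_mate is None:
                continue
            if te_mate[0] == "retrotransposon" and gene_mate[0] == "gene":
                counts[te_mate[1]] += 1
    return counts


# Toy input: family and gene names are made up for the example.
example_pairs = [
    (("retrotransposon", "Gypsy"), ("gene", "g1")),
    (("gene", "g2"), ("retrotransposon", "Copia")),
    (("retrotransposon", "Gypsy"), ("retrotransposon", "Gypsy")),
    (("retrotransposon", "Gypsy"), None),
    (("gene", "g3"), ("retrotransposon", "Gypsy")),
]
proximity = count_gene_proximal_pairs(example_pairs)
```

Comparing such per-family counts across genotypes would require normalizing by sequencing depth, which this sketch leaves out.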
Galbany-Casals, M; Blanco-Moreno, J M; Garcia-Jacas, N; Breitwieser, I; Smissen, R D
2011-07-01
The yellow-flowered everlasting daisy Helichrysum italicum (Asteraceae, Gnaphalieae) is widely distributed in the Mediterranean basin, where it grows in continuous and widespread populations in diverse open habitats. Helichrysum italicum subsp. microphyllum has a disjunct distribution in the Balearic Islands (Majorca and Dragonera), Corsica, Sardinia, Crete and Cyprus. Numerous morphological intermediates between subsp. italicum and subsp. microphyllum are known from Corsica, where the two subspecies co-occur. The aims of the study were to investigate if subsp. microphyllum has a common origin, constituting an independent gene pool from subsp. italicum, or if the morphological differences between subsp. microphyllum and subsp. italicum have arisen independently in different locations from a common wider gene pool. Our analyses of AFLP, cpDNA sequences and morphological characters show that there is geographic structure to the genetic variation within H. italicum, with eastern and western Mediterranean groups, which do not correspond with the division into subsp. microphyllum and subsp. italicum as currently circumscribed. Local selection on quantitative trait loci provides sufficient explanation for the morphological divergence observed and is consistent with genetic data. Within the western Mediterranean group of the species we found considerable polymorphism in chloroplast DNA sequences among and within some populations. Comparison with chloroplast DNA sequences from other Helichrysum species showed that some chloroplast haplotypes are shared across species. © 2010 German Botanical Society and The Royal Botanical Society of the Netherlands.
Christoforides, Alexis; Carpten, John D; Weiss, Glen J; Demeure, Michael J; Von Hoff, Daniel D; Craig, David W
2013-05-04
The field of cancer genomics has rapidly adopted next-generation sequencing (NGS) in order to study and characterize malignant tumors with unprecedented resolution. In particular for cancer, one is often trying to identify somatic mutations, i.e., changes specific to a tumor and not within an individual's germline. However, false positive and false negative detections often result from lack of sufficient variant evidence, contamination of the biopsy by stromal tissue, sequencing errors, and the erroneous classification of germline variation as tumor-specific. We have developed a generalized Bayesian analysis framework for matched tumor/normal samples with the purpose of identifying tumor-specific alterations such as single nucleotide mutations, small insertions/deletions, and structural variation. We describe our methodology, and discuss its application to other types of paired-tissue analysis such as the detection of loss of heterozygosity as well as allelic imbalance. We also demonstrate the high level of sensitivity and specificity in discovering simulated somatic mutations, for various combinations of a) genomic coverage and b) emulated heterogeneity. We present a Java-based implementation of our methods named Seurat, which is made available for free academic use. We have demonstrated and reported on the discovery of different types of somatic change by applying Seurat to an experimentally-derived cancer dataset using our methods; and have discussed considerations and practices regarding the accurate detection of somatic events in cancer genomes. Seurat is available at https://sites.google.com/site/seuratsomatic. PMID:23642077
Genetic variability and haplotypes of Echinococcus isolates from Tunisia.
Boufana, Belgees; Lahmar, Samia; Rebaï, Waël; Ben Safta, Zoubeir; Jebabli, Leïla; Ammar, Adel; Kachti, Mahmoud; Aouadi, Soufia; Craig, Philip S
2014-11-01
The species/genotypes of Echinococcus infecting a range of intermediate, canid and human hosts were examined as well as the intraspecific variation and population structure of Echinococcus granulosus sensu lato (s.l.) within these hosts. A total of 174 Echinococcus isolates from humans and ungulate intermediate hosts and adult tapeworms from dogs and jackals were used. Genomic DNA was used to amplify a fragment within a mitochondrial gene and a nuclear gene, coding for cytochrome c oxidase subunit 1 (cox1; 828 bp) and elongation factor 1-alpha (ef1a; 656 bp), respectively. E. granulosus sensu stricto was identified from all host species examined, E. canadensis (G6) in a camel and, for the first time, fertile cysts of E. granulosus (s.s.) and E. equinus in equids (donkeys) and E. granulosus (s.s.) from wild boars and goats. Considerable genetic variation was seen only for the cox1 sequences of E. granulosus (s.s.). The pairwise fixation index (Fst) for cox1 E. granulosus (s.s.) sequences from donkeys was high and was statistically significant compared with that of E. granulosus populations from other intermediate hosts. A single haplotype (EqTu01) was identified for the cox1 nucleotide sequences of E. equinus. The role of donkeys in the epidemiology of echinococcosis in Tunisia requires further investigation. © The Author 2014. Published by Oxford University Press on behalf of Royal Society of Tropical Medicine and Hygiene. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Variation in genome-wide mutation rates within and between human families.
Conrad, Donald F; Keebler, Jonathan E M; DePristo, Mark A; Lindsay, Sarah J; Zhang, Yujun; Casals, Ferran; Idaghdour, Youssef; Hartl, Chris L; Torroja, Carlos; Garimella, Kiran V; Zilversmit, Martine; Cartwright, Reed; Rouleau, Guy A; Daly, Mark; Stone, Eric A; Hurles, Matthew E; Awadalla, Philip
2011-06-12
J.B.S. Haldane proposed in 1947 that the male germline may be more mutagenic than the female germline. Diverse studies have supported Haldane's contention of a higher average mutation rate in the male germline in a variety of mammals, including humans. Here we present, to our knowledge, the first direct comparative analysis of male and female germline mutation rates from the complete genome sequences of two parent-offspring trios. Through extensive validation, we identified 49 and 35 germline de novo mutations (DNMs) in two trio offspring, as well as 1,586 non-germline DNMs arising either somatically or in the cell lines from which the DNA was derived. Most strikingly, in one family, we observed that 92% of germline DNMs were from the paternal germline, whereas, in contrast, in the other family, 64% of DNMs were from the maternal germline. These observations suggest considerable variation in mutation rates within and between families.
Simultaneous stochastic inversion for geomagnetic main field and secular variation. II - 1820-1980
NASA Technical Reports Server (NTRS)
Bloxham, Jeremy; Jackson, Andrew
1989-01-01
With the aim of producing readable time-dependent maps of the geomagnetic field at the core-mantle boundary, the method of simultaneous stochastic inversion for the geomagnetic main field and secular variation, described by Bloxham (1987), was applied to survey data from the period 1820-1980 to yield two time-dependent geomagnetic-field models, one for the period 1900-1980 and the other for 1820-1900. Particular consideration was given to the effect of crustal fields on observations. It was found that the existing methods of accounting for these fields as sources of random noise are inadequate in two circumstances: (1) when sequences of measurements are made at one particular site, and (2) for measurements made at satellite altitude. The present model shows many of the features in the earth's magnetic field at the core-mantle boundary described by Bloxham and Gubbins (1985) and supports many of their earlier conclusions.
Cavusoglu, M; Ciloglu, T; Serinagaoglu, Y; Kamasak, M; Erogul, O; Akcam, T
2008-08-01
In this paper, 'snore regularity' is studied in terms of the variations of snoring sound episode durations, separations and average powers in simple snorers and in obstructive sleep apnoea (OSA) patients. The goal was to explore the possibility of distinguishing among simple snorers and OSA patients using only sleep sound recordings of individuals and to ultimately eliminate the need for spending a whole night in the clinic for polysomnographic recording. Sequences that contain snoring episode durations (SED), snoring episode separations (SES) and average snoring episode powers (SEP) were constructed from snoring sound recordings of 30 individuals (18 simple snorers and 12 OSA patients) who were also under polysomnographic recording in Gülhane Military Medical Academy Sleep Studies Laboratory (GMMA-SSL), Ankara, Turkey. Snore regularity is quantified in terms of mean, standard deviation and coefficient of variation values for the SED, SES and SEP sequences. In all three of these sequences, OSA patients' data displayed a higher variation than those of simple snorers. To exclude the effects of slow variations in the base-line of these sequences, new sequences that contain the coefficient of variation of the sample values in a 'short' signal frame, i.e., short time coefficient of variation (STCV) sequences, were defined. The mean, the standard deviation and the coefficient of variation values calculated from the STCV sequences displayed a stronger potential to distinguish among simple snorers and OSA patients than those obtained from the SED, SES and SEP sequences themselves. Spider charts were used to jointly visualize the three parameters, i.e., the mean, the standard deviation and the coefficient of variation values of the SED, SES and SEP sequences, and the corresponding STCV sequences as two-dimensional plots. 
Our observations showed that the statistical parameters obtained from the SED and SES sequences, and the corresponding STCV sequences, possessed a strong potential to distinguish among simple snorers and OSA patients, both marginally, i.e., when the parameters are examined individually, and jointly. The parameters obtained from the SEP sequences and the corresponding STCV sequences, on the other hand, did not have a strong discrimination capability. However, the joint behaviour of these parameters showed some potential to distinguish among simple snorers and OSA patients.
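The statistics named in the abstract above (mean, standard deviation, coefficient of variation, and the short time coefficient of variation, STCV) can be sketched in a few lines of Python. This is an illustrative sketch only: the abstract does not give the frame length, the frame overlap, or whether a population or sample standard deviation was used, so `frame_len`, `hop`, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population std is an assumption)."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return pstdev(values) / m


def stcv_sequence(sequence, frame_len, hop=1):
    """Coefficient of variation of each short frame of `sequence`.

    `frame_len` and `hop` are hypothetical parameters; consecutive frames
    start `hop` samples apart and incomplete trailing frames are dropped.
    """
    if frame_len < 1 or hop < 1:
        raise ValueError("frame_len and hop must be positive")
    return [
        coefficient_of_variation(sequence[i:i + frame_len])
        for i in range(0, len(sequence) - frame_len + 1, hop)
    ]


# Toy episode-duration sequence in seconds (values are made up).
durations = [1.0, 3.0, 1.0, 3.0]
stcv = stcv_sequence(durations, frame_len=2)
summary = (mean(stcv), pstdev(stcv))
```

Because each frame is normalized by its own mean, a slow drift in the baseline of the sequence changes the frame values much less than it changes the statistics of the whole sequence, which is the motivation the abstract gives for the STCV sequences.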
Major histocompatibility complex variation in the endangered Przewalski's horse.
Hedrick, P W; Parker, K M; Miller, E L; Miller, P S
1999-01-01
The major histocompatibility complex (MHC) is a fundamental part of the vertebrate immune system, and the high variability in many MHC genes is thought to play an essential role in recognition of parasites. The Przewalski's horse is extinct in the wild and all the living individuals descend from 13 founders, most of whom were captured around the turn of the century. One of the primary genetic concerns in endangered species is whether they have ample adaptive variation to respond to novel selective factors. In examining 14 Przewalski's horses that are broadly representative of the living animals, we found six different class II DRB major histocompatibility sequences. The sequences showed extensive nonsynonymous variation, concentrated in the putative antigen-binding sites, and little synonymous variation. Individuals had from two to four sequences as determined by single-stranded conformation polymorphism (SSCP) analysis. On the basis of the SSCP data, phylogenetic analysis of the nucleotide sequences, and segregation in a family group, we conclude that four of these sequences are from one gene (although one sequence codes for a nonfunctional allele because it contains a stop codon) and two other sequences are from another gene. The position of the stop codon is at the same amino-acid position as in a closely related sequence from the domestic horse. Because other organisms have extensive variation at homologous loci, the Przewalski's horse may have quite low variation in this important adaptive region. PMID:10430594
Geology of the Devonian black shales of the Appalachian basin
Roen, J.B.
1983-01-01
Black shales of Devonian age in the Appalachian basin are a unique rock sequence. The high content of organic matter, which imparts the characteristic lithology, has for years attracted considerable interest in the shales as a possible source of energy. Concurrent with periodic and varied economic exploitations of the black shales are geologic studies. The recent energy shortage prompted the U.S. Department of Energy through the Eastern Gas Shales Project of the Morgantown Energy Technology Center to underwrite a research program to determine the geologic, geochemical, and structural characteristics of the Devonian black shales in order to enhance the recovery of gas from the shales. Geologic studies produced a regional stratigraphic network that correlates the 15-foot sequence in Tennessee with 3,000 feet of interbedded black and gray shales in central New York. The classic Devonian black-shale sequence in New York has been correlated with the Ohio Shale of Ohio and Kentucky and the Chattanooga Shale of Tennessee and southwestern Virginia. Biostratigraphic and lithostratigraphic markers in conjunction with gamma-ray logs facilitated long range correlations within the Appalachian basin and provided a basis for correlations with the black shales of the Illinois and Michigan basins. Areal distribution of selected shale units along with paleocurrent studies, clay mineralogy, and geochemistry suggests variations in the sediment source and transport directions. Current structures, faunal evidence, lithologic variations, and geochemical studies provide evidence to support interpretation of depositional environments. In addition, organic geochemical data combined with stratigraphic and structural characteristics of the shale within the basin allow an evaluation of the resource potential of natural gas in the Devonian shale sequence.
Zimmer, Christoph T; Garrood, William T; Singh, Kumar Saurabh; Randall, Emma; Lueke, Bettina; Gutbrod, Oliver; Matthiesen, Svend; Kohler, Maxie; Nauen, Ralf; Davies, T G Emyr; Bass, Chris
2018-01-22
Gene duplication is a major source of genetic variation that has been shown to underpin the evolution of a wide range of adaptive traits [1, 2]. For example, duplication or amplification of genes encoding detoxification enzymes has been shown to play an important role in the evolution of insecticide resistance [3-5]. In this context, gene duplication performs an adaptive function as a result of its effects on gene dosage and not as a source of functional novelty [3, 6-8]. Here, we show that duplication and neofunctionalization of a cytochrome P450, CYP6ER1, led to the evolution of insecticide resistance in the brown planthopper. Considerable genetic variation was observed in the coding sequence of CYP6ER1 in populations of brown planthopper collected from across Asia, but just two sequence variants are highly overexpressed in resistant strains and metabolize imidacloprid. Both variants are characterized by profound amino-acid alterations in substrate recognition sites, and the introduction of these mutations into a susceptible P450 sequence is sufficient to confer resistance. CYP6ER1 is duplicated in resistant strains with individuals carrying paralogs with and without the gain-of-function mutations. Despite numerical parity in the genome, the susceptible and mutant copies exhibit marked asymmetry in their expression with the resistant paralogs overexpressed. In the primary resistance-conferring CYP6ER1 variant, this results from an extended region of novel sequence upstream of the gene that provides enhanced expression. Our findings illustrate the versatility of gene duplication in providing opportunities for functional and regulatory innovation during the evolution of an adaptive trait. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
Ackerman, Sara L; Koenig, Barbara A
2018-01-01
Increasingly used for clinical purposes, genome and exome sequencing can generate clinically relevant information that is not directly related to the reason for testing (incidental or secondary findings). Debates about the ethical implications of secondary findings were sparked by the American College of Medical Genetics (ACMG) 2013 policy statement, which recommended that laboratories report pathogenic alterations in 56 genes. Although wide variation in laboratories' secondary findings policies has been reported, little is known about its causes. We interviewed 18 laboratory directors and genetic counselors at 10 U.S. laboratories to investigate the motivations and interests shaping secondary findings reporting policies for clinical exome sequencing. Analysis of interview transcripts and laboratory documents was informed by sociological theories of standardization. Laboratories varied widely in terms of the types of secondary findings reported, consent-form language, and choices offered to patients. In explaining their adaptation of the ACMG report, our participants weighed genetic information's clinical, moral, professional, and commercial value in an attempt to maximize benefits for patients and families, minimize the costs of sequencing and analysis, adhere to professional norms, attract customers, and contend with the uncertain clinical implications of much of the genetic information generated. Nearly all laboratories in our study voluntarily adopted ACMG's recommendations, but their actual practices varied considerably and were informed by laboratory-specific judgments about clinical utility and patient benefit. Our findings offer a compelling example of standardization as a complex process that rarely leads simply to uniformity of practice. 
As laboratories take on a more prominent role in decisions about the return of genetic information, strategies are needed to inform patients, families, and clinicians about the differences between laboratories' practices and ensure that the consent process prompts a discussion of the value of additional genetic information for patients and their families.
Using genomics to characterize evolutionary potential for conservation of wild populations
Harrisson, Katherine A; Pavlova, Alexandra; Telonis-Scott, Marina; Sunnucks, Paul
2014-01-01
Genomics promises exciting advances towards the important conservation goal of maximizing evolutionary potential, notwithstanding associated challenges. Here, we explore some of the complexity of adaptation genetics and discuss the strengths and limitations of genomics as a tool for characterizing evolutionary potential in the context of conservation management. Many traits are polygenic and can be strongly influenced by minor differences in regulatory networks and by epigenetic variation not visible in DNA sequence. Much of this critical complexity is difficult to detect using methods commonly used to identify adaptive variation, and this needs appropriate consideration when planning genomic screens, and when basing management decisions on genomic data. When the genomic basis of adaptation and future threats are well understood, it may be appropriate to focus management on particular adaptive traits. For more typical conservation scenarios, we argue that screening genome-wide variation should be a sensible approach that may provide a generalized measure of evolutionary potential that accounts for the contributions of small-effect loci and cryptic variation and is robust to uncertainty about future change and required adaptive response(s). The best conservation outcomes should be achieved when genomic estimates of evolutionary potential are used within an adaptive management framework. PMID:25553064
Using chaos to generate variations on movement sequences
NASA Astrophysics Data System (ADS)
Bradley, Elizabeth; Stuart, Joshua
1998-12-01
We describe a method for introducing variations into predefined motion sequences using a chaotic symbol-sequence reordering technique. A progression of symbols representing the body positions in a dance piece, martial arts form, or other motion sequence is mapped onto a chaotic trajectory, establishing a symbolic dynamics that links the movement sequence and the attractor structure. A variation on the original piece is created by generating a trajectory with slightly different initial conditions, inverting the mapping, and using special corpus-based graph-theoretic interpolation schemes to smooth any abrupt transitions. Sensitive dependence guarantees that the variation is different from the original; the attractor structure and the symbolic dynamics guarantee that the two resemble one another in both aesthetic and mathematical senses.
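The reordering idea can be illustrated with a deliberately simplified sketch. It assumes a one-dimensional logistic map in place of the chaotic attractor used by the authors, attaches each symbol to the trajectory point visited at the same step, and reads a variation off a second trajectory started from a slightly perturbed initial condition by nearest-neighbour lookup. The function names, the choice of map, and the parameter values are illustrative assumptions, and the corpus-based interpolation step is omitted.

```python
def logistic_trajectory(x0, n, r=3.99):
    """Return n successive iterates of the logistic map x -> r*x*(1-x)."""
    xs = []
    x = x0
    for _ in range(n):
        xs.append(x)
        x = r * x * (1.0 - x)
    return xs


def chaotic_variation(sequence, x0=0.3, eps=1e-3):
    """Reorder `sequence` using a trajectory started at x0 + eps.

    Hypothetical helper: symbol i is attached to the i-th point of the
    reference trajectory; each point of the perturbed trajectory then
    emits the symbol of the nearest reference point.
    """
    n = len(sequence)
    reference = logistic_trajectory(x0, n)
    perturbed = logistic_trajectory(x0 + eps, n)
    variation = []
    for x in perturbed:
        # Index of the reference point closest to the perturbed point.
        nearest = min(range(n), key=lambda i: abs(reference[i] - x))
        variation.append(sequence[nearest])
    return variation


# Illustrative symbol sequence (not taken from the paper).
moves = ["plie", "tendu", "jete", "pirouette", "arabesque", "glissade"]
print(chaotic_variation(moves))
```

Because the two trajectories start close together, the first few symbols tend to match the original before sensitive dependence pulls the orderings apart; every emitted symbol still comes from the original piece.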
Genetic Variation in Cardiomyopathy and Cardiovascular Disorders.
McNally, Elizabeth M; Puckelwartz, Megan J
2015-01-01
With the wider deployment of massively-parallel, next-generation sequencing, it is now possible to survey human genome data for research and clinical purposes. The reduced cost of producing short-read sequencing has now shifted the burden to data analysis. Analysis of genome sequencing remains challenged by the complexity of the human genome, including redundancy and the repetitive nature of genome elements and the large amount of variation in individual genomes. Public databases of human genome sequences greatly facilitate interpretation of common and rare genetic variation, although linking database sequence information to detailed clinical information is limited by privacy and practical issues. Genetic variation is a rich source of knowledge for cardiovascular disease because many, if not all, cardiovascular disorders are highly heritable. The role of rare genetic variation in predicting risk and complications of cardiovascular diseases has been well established for hypertrophic and dilated cardiomyopathy, where the number of genes that are linked to these disorders is growing. Bolstered by family data, where genetic variants segregate with disease, rare variation can be linked to specific genetic variation that offers profound diagnostic information. Understanding genetic variation in cardiomyopathy is likely to help stratify forms of heart failure and guide therapy. Ultimately, genetic variation may be amenable to gene correction and gene editing strategies.
[Hydrologic variability and sensitivity based on Hurst coefficient and Bartels statistic].
Lei, Xu; Xie, Ping; Wu, Zi Yi; Sang, Yan Fang; Zhao, Jiang Yan; Li, Bin Bin
2018-04-01
Owing to global climate change and frequent human activities in recent years, the purely stochastic components of hydrological sequences are mixed with one or more variation components, including jump, trend, period and dependency. It is urgently necessary to clarify which indices should be used to quantify the degree of their variability. In this study, we defined hydrological variability based on the Hurst coefficient and the Bartels statistic, and used Monte Carlo statistical tests to analyze their sensitivity to different types of variation. When the hydrological sequence had jump or trend variation, both the Hurst coefficient and the Bartels statistic could reflect the variation, with the Hurst coefficient being more sensitive to weak jump or trend variation. When the sequence had a periodic component, only the Bartels statistic could detect the variation of the sequence. When the sequence had a dependency, both the Hurst coefficient and the Bartels statistic could reflect the variation, with the latter able to detect weaker dependent variations. For the four types of variation, both the Hurst variability and the Bartels variability increased with increasing variation range. Thus, they could be used to measure the variation intensity of the hydrological sequence. We analyzed the temperature series of different weather stations in the Lancang River basin. Results showed that the temperature at all stations showed an upward trend or jump, indicating that the entire basin had experienced warming in recent years and that the temperature variability in the upper and lower reaches was much higher. This case study showed the practicability of the proposed method.
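As a rough illustration of one of the two indices, the sketch below computes a rank-based von Neumann ratio, assuming that this is what the abstract means by the Bartels statistic. The function names are hypothetical, ties are not handled, and neither the Hurst coefficient nor the Monte Carlo testing described above is reproduced.

```python
def ranks(values):
    """Rank values from 1 to n (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def bartels_rvn(values):
    """Rank version of von Neumann's ratio.

    Sum of squared differences between successive ranks, divided by the
    sum of squared deviations of the ranks from their mean.
    """
    r = ranks(values)
    n = len(r)
    mean_rank = (n + 1) / 2.0
    numerator = sum((r[i] - r[i + 1]) ** 2 for i in range(n - 1))
    denominator = sum((ri - mean_rank) ** 2 for ri in r)
    return numerator / denominator


# Illustrative series (not data from the study).
trend = [float(t) for t in range(10)]
alternating = [1, 10, 2, 9, 3, 8, 4, 7, 5, 6]
print(bartels_rvn(trend), bartels_rvn(alternating))
```

For a purely random sequence the ratio is expected to lie near 2; values well below 2, as for the trending series here, point to trend, jump or positive dependence, while values well above 2 point to rapid alternation.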
Savary, Romain; Masclaux, Frédéric G; Wyss, Tania; Droh, Germain; Cruz Corella, Joaquim; Machado, Ana Paula; Morton, Joseph B; Sanders, Ian R
2018-01-01
Arbuscular mycorrhizal fungi (AMF; phylum Glomeromycota) associate with plants forming one of the most successful microbe-plant associations. The fungi promote plant diversity and have a potentially important role in global agriculture. Plant growth depends on both inter- and intra-specific variation in AMF. It was recently reported that an unusually large number of AMF taxa have an intercontinental distribution, suggesting long-distance gene flow for many AMF species, facilitated by either long-distance natural dispersal mechanisms or human-assisted dispersal. However, the intercontinental distribution of AMF species has been questioned because the use of very low-resolution markers may be unsuitable to detect genetic differences among geographically separated AMF, as seen with some other fungi. This has been untestable because of the lack of population genomic data, with high resolution, for any AMF taxa. Here we use phylogenetics and population genomics to test for intra-specific variation in Rhizophagus irregularis, an AMF species for which genome sequence information already exists. We used ddRAD sequencing to obtain thousands of markers distributed across the genomes of 81 R. irregularis isolates and related species. Based on 6 888 variable positions, we observed significant genetic divergence into four main genetic groups within R. irregularis, highlighting that previous studies have not captured underlying genetic variation. Despite considerable genetic divergence, surprisingly, the variation could not be explained by geographical origin, thus also supporting the hypothesis for at least one AMF species of widely dispersed AMF genotypes at an intercontinental scale. Such information is crucial for understanding AMF ecology, and how these fungi can be used in an environmentally safe way in distant locations.
Patterns of DNA barcode variation in Canadian marine molluscs.
Layton, Kara K S; Martel, André L; Hebert, Paul D N
2014-01-01
Molluscs are the most diverse marine phylum and this high diversity has resulted in considerable taxonomic problems. Because the number of species in Canadian oceans remains uncertain, there is a need to incorporate molecular methods into species identifications. A 648 base pair segment of the cytochrome c oxidase subunit I gene has proven useful for the identification and discovery of species in many animal lineages. While the utility of DNA barcoding in molluscs has been demonstrated in other studies, this is the first effort to construct a DNA barcode registry for marine molluscs across such a large geographic area. This study examines patterns of DNA barcode variation in 227 species of Canadian marine molluscs. Intraspecific sequence divergences ranged from 0-26.4% and a barcode gap existed for most taxa. Eleven cases of relatively deep (>2%) intraspecific divergence were detected, suggesting the possible presence of overlooked species. Structural variation was detected in COI with indels found in 37 species, mostly bivalves. Some indels were present in divergent lineages, primarily in the region of the first external loop, suggesting certain areas are hotspots for change. Lastly, mean GC content varied substantially among orders (24.5%-46.5%), and showed a significant positive correlation with nearest neighbour distances. DNA barcoding is an effective tool for the identification of Canadian marine molluscs and for revealing possible cases of overlooked species. Some species with deep intraspecific divergence showed a biogeographic partition between lineages on the Atlantic, Arctic and Pacific coasts, suggesting the role of Pleistocene glaciations in the subdivision of their populations. Indels were prevalent in the barcode region of the COI gene in bivalves and gastropods. This study highlights the efficacy of DNA barcoding for providing insights into sequence variation across a broad taxonomic group on a large geographic scale.
Inter- and intraspecific mitochondrial DNA variation in North American bears (Ursus)
Cronin, Matthew A.; Amstrup, Steven C.; Garner, Gerald W.; Vyse, Ernest R.
1991-01-01
We assessed mitochondrial DNA variation in North American black bears (Ursus americanus), brown bears (Ursus arctos), and polar bears (Ursus maritimus). Divergent mitochondrial DNA haplotypes (0.05 base substitutions per nucleotide) were identified in populations of black bears from Montana and Oregon. In contrast, very similar haplotypes occur in black bears across North America. This discordance of haplotype phylogeny and geographic distribution indicates that there has been maintenance of polymorphism and considerable gene flow throughout the history of the species. Intraspecific mitochondrial DNA sequence divergence in brown bears and polar bears is lower than in black bears. The two morphological forms of U. arctos, grizzly and coastal brown bears, are not in distinct mtDNA lineages. Interspecific comparisons indicate that brown bears and polar bears share similar mitochondrial DNA (0.023 base substitutions per nucleotide) which is quite divergent (0.078 base substitutions per nucleotide) from that of black bears. High mitochondrial DNA divergence within black bears and paraphyletic relationships of brown and polar bear mitochondrial DNA indicate that intraspecific variation across species' ranges should be considered in phylogenetic analyses of mitochondrial DNA.
Divergent and nonuniform gene expression patterns in mouse brain
Morris, John A.; Royall, Joshua J.; Bertagnolli, Darren; Boe, Andrew F.; Burnell, Josh J.; Byrnes, Emi J.; Copeland, Cathy; Desta, Tsega; Fischer, Shanna R.; Goldy, Jeff; Glattfelder, Katie J.; Kidney, Jolene M.; Lemon, Tracy; Orta, Geralyn J.; Parry, Sheana E.; Pathak, Sayan D.; Pearson, Owen C.; Reding, Melissa; Shapouri, Sheila; Smith, Kimberly A.; Soden, Chad; Solan, Beth M.; Weller, John; Takahashi, Joseph S.; Overly, Caroline C.; Lein, Ed S.; Hawrylycz, Michael J.; Hohmann, John G.; Jones, Allan R.
2010-01-01
Considerable progress has been made in understanding variations in gene sequence and expression level associated with phenotype, yet how genetic diversity translates into complex phenotypic differences remains poorly understood. Here, we examine the relationship between genetic background and spatial patterns of gene expression across seven strains of mice, providing the most extensive cellular-resolution comparative analysis of gene expression in the mammalian brain to date. Using comprehensive brainwide anatomic coverage (more than 200 brain regions), we applied in situ hybridization to analyze the spatial expression patterns of 49 genes encoding well-known pharmaceutical drug targets. Remarkably, over 50% of the genes examined showed interstrain expression variation. In addition, the variability was nonuniformly distributed across strain and neuroanatomic region, suggesting certain organizing principles. First, the degree of expression variance among strains mirrors genealogic relationships. Second, expression pattern differences were concentrated in higher-order brain regions such as the cortex and hippocampus. Divergence in gene expression patterns across the brain could contribute significantly to variations in behavior and responses to neuroactive drugs in laboratory mouse strains and may help to explain individual differences in human responsiveness to neuroactive drugs. PMID:20956311
Yim, Lau Chui; Hongmei, Jing; Aitchison, Jonathan C; Pointing, Stephen B
2006-07-01
We report an assessment of whole-community diversity for an extremely isolated geothermal location with considerable phylogenetic and phylogeographic novelty. We further demonstrate, using multiple statistical analyses of sequence data, that the response of community diversity is not monotonic to thermal stress along a gradient of 52-83 degrees C. A combination of domain- and division-specific PCR was used to obtain a broad spectrum of community phylotypes, which were resolved by denaturing gradient gel electrophoresis. Among 58 sequences obtained from microbial mats and streamers, some 95% suggest novel archaeal and bacterial diversity at the species level or higher. Moreover, new phylogeographic and thermally defined lineages among the Cyanobacteria, Chloroflexi, Eubacterium and Thermus are identified. Shannon-Wiener diversity estimates suggest that mats at 63 degrees C supported highest diversity, but when alternate models were applied [Average Taxonomic Distinctness (AvTD) and Variation in Taxonomic Distinctness (VarTD)] that also take into account the phylogenetic relationships between phylotypes, it is evident that greatest taxonomic diversity (AvTD) occurred in streamers at 65-70 degrees C, whereas greatest phylogenetic distance between taxa (VarTD) occurred in streamers of 83 degrees C. All models demonstrated that diversity is not related to thermal stress in a linear fashion.
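For readers unfamiliar with the first of the diversity measures mentioned, the following minimal sketch computes the Shannon-Wiener index H' = -sum(p_i * ln p_i) from phylotype abundance counts. The function name and the example counts are illustrative assumptions; the taxonomic distinctness measures (AvTD, VarTD) also require a taxonomic hierarchy and are not shown.

```python
import math


def shannon_wiener(counts):
    """Shannon-Wiener index (natural log) from a list of abundance counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:  # absent phylotypes contribute nothing
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical abundance counts for two communities.
even_community = [5, 5, 5, 5]
skewed_community = [17, 1, 1, 1]
print(shannon_wiener(even_community), shannon_wiener(skewed_community))
```

The index depends only on relative abundances, which is why the abstract contrasts it with measures that also weigh the phylogenetic relationships between phylotypes.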
Sequence variation in SORL1 and Dementia risk in Swedes
Reynolds, Chandra A.; Hong, Mun-Gwan; Eriksson, Ulrika K.; Blennow, Kaj; Johansson, Boo; Malmberg, Bo; Berg, Stig; Gatz, Margaret; Pedersen, Nancy L.; Bennet, Anna M.; Prince, Jonathan A.
2010-01-01
The gene encoding the neuronal sortilin-related receptor SORL1 has been claimed to be associated with Alzheimer Disease by independent groups and across various human populations. We evaluated six genetic markers in SORL1 in a sample of 1558 Swedish dementia cases (including 1270 Alzheimer disease cases) and 2179 controls. For both single marker and haplotype-based analyses we found no strong support for SORL1 as a dementia- or AD-risk modifying gene in our sample in isolation, nor did we observe association with AD/dementia-related traits, including CSF β-amyloid1–42, tau levels, or age-at-onset. However, meta-analyses of markers in this study together with previously published studies on SORL1 encompassing in excess of 13,000 individuals do suggest significant association with AD (best OR 1.097; 95% CI 1.038–1.158, p = 0.001). All six markers were significant in meta-analyses and it is notable that they occur in two distinct LD blocks. These data are consistent with either allelic heterogeneity or the existence of as yet untested functional variants and these will be important considerations in further attempts to evaluate the importance of sequence variation in SORL1 with AD risk. PMID:19653016
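The reported odds ratio, confidence interval and p-value can be checked against one another with a short calculation. The sketch assumes a Wald-type interval that is symmetric on the log-odds scale and a two-sided normal test; the function name is hypothetical and the meta-analysis itself is not reproduced.

```python
import math


def z_and_p_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover the z-score and two-sided p-value implied by an OR and its CI."""
    # Standard error of log(OR), from the width of the interval on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided p-value under a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


z, p = z_and_p_from_ci(1.097, 1.038, 1.158)
print(round(z, 2), round(p, 4))
```

Under these assumptions the implied p-value is on the order of 0.001, in line with the figure quoted in the abstract.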
The BIG Data Center: from deposition to integration to translation.
2017-01-04
Biological data are generated at unprecedentedly exponential rates, posing considerable challenges in big data deposition, integration and translation. The BIG Data Center, established at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, provides a suite of database resources, including (i) Genome Sequence Archive, a data repository specialized for archiving raw sequence reads, (ii) Gene Expression Nebulas, a data portal of gene expression profiles based entirely on RNA-Seq data, (iii) Genome Variation Map, a comprehensive collection of genome variations for featured species, (iv) Genome Warehouse, a centralized resource housing genome-scale data with particular focus on economically important animals and plants, (v) Methylation Bank, an integrated database of whole-genome single-base resolution methylomes and (vi) Science Wikis, a central access point for biological wikis developed for community annotations. The BIG Data Center is dedicated to constructing and maintaining biological databases through big data integration and value-added curation, conducting basic research to translate big data into big knowledge and providing freely open access to a variety of data resources in support of worldwide research activities in both academia and industry. All of these resources are publicly available and can be found at http://bigd.big.ac.cn. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Fokkema, Ivo F A C; den Dunnen, Johan T; Taschner, Peter E M
2005-08-01
The completion of the human genome project has initiated, as well as provided the basis for, the collection and study of all sequence variation between individuals. Direct access to up-to-date information on sequence variation is currently provided most efficiently through web-based, gene-centered, locus-specific databases (LSDBs). We have developed the Leiden Open (source) Variation Database (LOVD) software approaching the "LSDB-in-a-Box" idea for the easy creation and maintenance of a fully web-based gene sequence variation database. LOVD is platform-independent and uses PHP and MySQL open source software only. The basic gene-centered and modular design of the database follows the recommendations of the Human Genome Variation Society (HGVS) and focuses on the collection and display of DNA sequence variations. With minimal effort, the LOVD platform is extendable with clinical data. The open set-up should both facilitate and promote functional extension with scripts written by the community. The LOVD software is freely available from the Leiden Muscular Dystrophy pages (www.DMD.nl/LOVD/). To promote the use of LOVD, we currently offer curators the possibility to set up an LSDB on our Leiden server. (c) 2005 Wiley-Liss, Inc.
Sabree, Zakee L; Hansen, Allison K; Moran, Nancy A
2012-01-01
Starting in 2003, numerous studies using culture-independent methodologies to characterize the gut microbiota of honey bees have retrieved a consistent and distinctive set of eight bacterial species, based on near identity of the 16S rRNA gene sequences. A recent study [Mattila HR, Rios D, Walker-Sperling VE, Roeselers G, Newton ILG (2012) Characterization of the active microbiotas associated with honey bees reveals healthier and broader communities when colonies are genetically diverse. PLoS ONE 7(3): e32962], using pyrosequencing of the V1-V2 hypervariable region of the 16S rRNA gene, reported finding entirely novel bacterial species in honey bee guts, and used taxonomic assignments from these reads to predict metabolic activities based on known metabolisms of cultivable species. To better understand this discrepancy, we analyzed the Mattila et al. pyrotag dataset. In contrast to the conclusions of Mattila et al., we found that the large majority of pyrotag sequences belonged to clusters for which representative sequences were identical to sequences from previously identified core species of the bee microbiota. On average, they represent 95% of the bacteria in each worker bee in the Mattila et al. dataset, a slightly lower value than that found in other studies. Some colonies contain small proportions of other bacteria, mostly species of Enterobacteriaceae. Reanalysis of the Mattila et al. dataset also did not support a relationship between abundances of Bifidobacterium and of putative pathogens or a significant difference in gut communities between colonies from queens that were singly or multiply mated. Additionally, consistent with previous studies, the dataset supports the occurrence of considerable strain variation within core species, even within single colonies. The roles of these bacteria within bees, or the implications of the strain variation, are not yet clear.
Sénécal, Karine; Thys, Kristof; Vears, Danya F; Van Assche, Kristof; Knoppers, Bartha M; Borry, Pascal
2016-01-01
The development of next-generation sequencing (NGS) technologies is revolutionizing medical practice, facilitating more accurate, sophisticated and cost-effective genetic testing. NGS is already being implemented in the clinic assisting diagnosis and management of disorders with a strong heritable component. Although considerable attention has been paid to issues regarding return of incidental or secondary findings, matters of consent are less well explored. This is particularly important for the use of NGS in minors. Recent guidelines addressing genomic testing and screening of children and adolescents have suggested that as 'young children' lack decision-making capacity, decisions about testing must be conducted by a surrogate, namely their parents. This prompts consideration of the age at which minors can provide lawful consent to health-care interventions, and consequently NGS performed for diagnostic purposes. Here, we describe the existing legal approaches regarding the rights of minors to consent to health-care interventions, including how laws in the 28 Member States of the European Union and in Canada consider competent minors, and then apply this to the context of NGS. There is considerable variation in the rights afforded to minors across countries. Many legal systems determine that minors would be allowed, or may even be required, to make decisions about interventions such as NGS. However, minors are often considered as one single homogeneous population who always require parental consent, rather than recognizing there are different categories of 'minors' and that capacity to consent or to be involved in discussions and decision-making process is a spectrum rather than a hurdle. PMID:27302841
Genomic Sequence Variation Markup Language (GSVML).
Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi
2010-02-01
With the aim of making good use of internationally accumulated genomic sequence variation data, which is increasing rapidly due to the explosive amount of genomic research at present, the development of an interoperable data exchange format and its international standardization are necessary. Genomic Sequence Variation Markup Language (GSVML) will focus on genomic sequence variation data and human health applications, such as gene based medicine or pharmacogenomics. We developed GSVML through eight steps, based on case analysis and domain investigations. By focusing on the design scope to human health applications and genomic sequence variation, we attempted to eliminate ambiguity and to ensure practicability. We intended to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we attempted to minimize the redundancy of the data format, while maximizing the data covering range. We also attempted to ensure communication and interface ability with other Markup Languages, for exchange of omics data among various omics researchers or facilities. The interface ability with developing clinical standards, such as the Health Level Seven Genotype Information model, was analyzed. We developed the human health-oriented GSVML comprising variation data, direct annotation, and indirect annotation categories; the variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information, and have internal relationships. For designing, we examined 6 cases for three criteria as human health application and 15 data elements for three criteria as data formats for genomic sequence variation data exchange. The data format of five international SNP databases and six Markup Languages and the interface ability to the Health Level Seven Genotype Model in terms of 317 items were investigated. 
GSVML was developed as a potential data exchanging format for genomic sequence variation data exchange focusing on human health applications. The international standardization of GSVML is necessary, and is currently underway. GSVML can be applied to enhance the utilization of genomic sequence variation data worldwide by providing a communicable platform between clinical and research applications. Copyright 2009 Elsevier Ireland Ltd. All rights reserved.
Farlow, Janice L; Lin, Hai; Sauerbeck, Laura; Lai, Dongbing; Koller, Daniel L; Pugh, Elizabeth; Hetrick, Kurt; Ling, Hua; Kleinloog, Rachel; van der Vlies, Pieter; Deelen, Patrick; Swertz, Morris A; Verweij, Bon H; Regli, Luca; Rinkel, Gabriel J E; Ruigrok, Ynte M; Doheny, Kimberly; Liu, Yunlong; Broderick, Joseph; Foroud, Tatiana
2015-01-01
Genetic risk factors for intracranial aneurysm (IA) are not yet fully understood. Genomewide association studies have been successful at identifying common variants; however, the role of rare variation in IA susceptibility has not been fully explored. In this study, we report the use of whole exome sequencing (WES) in seven densely-affected families (45 individuals) recruited as part of the Familial Intracranial Aneurysm study. WES variants were prioritized by functional prediction, frequency, predicted pathogenicity, and segregation within families. Using these criteria, 68 variants in 68 genes were prioritized across the seven families. Of the genes that were expressed in IA tissue, one gene (TMEM132B) was differentially expressed in aneurysmal samples (n=44) as compared to control samples (n=16) (false discovery rate adjusted p-value=0.023). We demonstrate that sequencing of densely affected families permits exploration of the role of rare variants in a relatively common disease such as IA, although there are important study design considerations for applying sequencing to complex disorders. In this study, we explore methods of WES variant prioritization, including the incorporation of unaffected individuals, multipoint linkage analysis, biological pathway information, and transcriptome profiling. Further studies are needed to validate and characterize the set of variants and genes identified in this study.
Nadimi, Maryam; Beaudet, Denis; Forget, Lise; Hijri, Mohamed; Lang, B Franz
2012-09-01
Gigaspora rosea is a member of the arbuscular mycorrhizal fungi (AMF; Glomeromycota) and a distant relative of Glomus species that are beneficial to plant growth. To allow for a better understanding of Glomeromycota, we have sequenced the mitochondrial DNA of G. rosea. A comparison with Glomus mitochondrial genomes reveals that Glomeromycota undergo insertion and loss of mitochondrial plasmid-related sequences and exhibit considerable variation in introns. The gene order between the two species is almost completely reshuffled. Furthermore, Gigaspora has fragmented cox1 and rns genes, and an unorthodox initiator tRNA that is tailored to decoding frequent UUG initiation codons. For the fragmented cox1 gene, we provide evidence that its RNA is joined via group I-mediated trans-splicing, whereas rns RNA remains in pieces. According to our model, the two cox1 precursor RNA pieces are brought together by flanking cox1 exon sequences that form a group I intron structure, potentially in conjunction with the nad5 intron 3 sequence. Finally, we present analyses that address the controversial phylogenetic association of Glomeromycota within fungi. According to our results, Glomeromycota are not a separate group of paraphyletic zygomycetes but branch together with Mortierellales, potentially also Harpellales.
Diekmann, Kerstin; Hodkinson, Trevor R; Wolfe, Kenneth H; van den Bekerom, Rob; Dix, Philip J; Barth, Susanne
2009-06-01
Lolium perenne L. (perennial ryegrass) is globally one of the most important forage and grassland crops. We sequenced the chloroplast (cp) genome of Lolium perenne cultivar Cashel. The L. perenne cp genome is 135 282 bp with a typical quadripartite structure. It contains genes for 76 unique proteins, 30 tRNAs and four rRNAs. As in other grasses, the genes accD, ycf1 and ycf2 are absent. The genome is of average size within its subfamily Pooideae and of medium size within the Poaceae. Genome size differences are mainly due to length variations in non-coding regions. However, considerable length differences of 1-27 codons in comparison of L. perenne to other Poaceae and 1-68 codons among all Poaceae were also detected. Within the cp genome of this outcrossing cultivar, 10 insertion/deletion polymorphisms and 40 single nucleotide polymorphisms were detected. Two of the polymorphisms involve tiny inversions within hairpin structures. By comparing the genome sequence with RT-PCR products of transcripts for 33 genes, 31 mRNA editing sites were identified, five of them unique to Lolium. The cp genome sequence of L. perenne is available under Accession number AM777385 at the European Molecular Biology Laboratory, National Center for Biotechnology Information and DNA DataBank of Japan.
An improved genome assembly uncovers prolific tandem repeats in Atlantic cod.
Tørresen, Ole K; Star, Bastiaan; Jentoft, Sissel; Reinar, William B; Grove, Harald; Miller, Jason R; Walenz, Brian P; Knight, James; Ekholm, Jenny M; Peluso, Paul; Edvardsen, Rolf B; Tooming-Klunderud, Ave; Skage, Morten; Lien, Sigbjørn; Jakobsen, Kjetill S; Nederbragt, Alexander J
2017-01-18
The first Atlantic cod (Gadus morhua) genome assembly published in 2011 was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enable the generation of more contiguous genome assemblies. By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicates considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.
Molecular Population Genetics of the Alcohol Dehydrogenase Gene Region of DROSOPHILA MELANOGASTER
Aquadro, Charles F.; Desse, Susan F.; Bland, Molly M.; Langley, Charles H.; Laurie-Ahlberg, Cathy C.
1986-01-01
Variation in the DNA restriction map of a 13-kb region of chromosome II including the alcohol dehydrogenase structural gene (Adh) was examined in Drosophila melanogaster from natural populations. Detailed analysis of 48 D. melanogaster lines representing four eastern United States populations revealed extensive DNA sequence variation due to base substitutions, insertions and deletions. Cloning of this region from several lines allowed characterization of length variation as due to unique sequence insertions or deletions [nine sizes; 21–200 base pairs (bp)] or transposable element insertions (several sizes, 340 bp to 10.2 kb, representing four different elements). Despite this extensive variation in sequences flanking the Adh gene, only one length polymorphism is clearly associated with altered Adh expression (a copia element approximately 250 bp 5' to the distal transcript start site). Nonetheless, the frequency spectra of transposable elements within and between Drosophila species suggest they are slightly deleterious. Strong nonrandom associations are observed among Adh region sequence variants, ADH allozyme (Fast vs. Slow), ADH enzyme activity and the chromosome inversion In(2L)t. Phylogenetic analysis of restriction map haplotypes suggests that the major twofold component of ADH activity variation (high vs. low, typical of Fast and Slow allozymes, respectively) is due to sequence variation tightly linked to and possibly distinct from that underlying the allozyme difference. The patterns of nucleotide and haplotype variation for Fast and Slow allozyme lines are consistent with the recent increase in frequency and spread of the Fast haplotype associated with high ADH activity. These data emphasize the important role of evolutionary history and strong nonrandom associations among tightly linked sequence variation as determinants of the patterns of variation observed in natural populations. PMID:3026893
Variation, Repetition, And Choice
Abreu-Rodrigues, Josele; Lattal, Kennon A; dos Santos, Cristiano V; Matos, Ricardo A
2005-01-01
Experiment 1 investigated the controlling properties of variability contingencies on choice between repeated and variable responding. Pigeons were exposed to concurrent-chains schedules with two alternatives. In the REPEAT alternative, reinforcers in the terminal link depended on a single sequence of four responses. In the VARY alternative, a response sequence in the terminal link was reinforced only if it differed from the n previous sequences (lag criterion). The REPEAT contingency generated low, constant levels of sequence variation whereas the VARY contingency produced levels of sequence variation that increased with the lag criterion. Preference for the REPEAT alternative tended to increase directly with the degree of variation required for reinforcement. Experiment 2 examined the potential confounding effects in Experiment 1 of immediacy of reinforcement by yoking the interreinforcer intervals in the REPEAT alternative to those in the VARY alternative. Again, preference for REPEAT was a function of the lag criterion. Choice between varying and repeating behavior is discussed with respect to obtained behavioral variability, probability of reinforcement, delay of reinforcement, and switching within a sequence. PMID:15828592
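The lag criterion described in the abstract above can be illustrated with a short sketch. This is not the authors' experimental software; the function name `make_vary_schedule` and the string encoding of response sequences (for example "LRLR" for left/right key pecks) are hypothetical, and the sketch assumes that every emitted sequence, reinforced or not, enters the comparison window.

```python
from collections import deque


def make_vary_schedule(lag):
    """Build a checker for a lag-n VARY contingency (illustrative only).

    A sequence counts as reinforceable only if it differs from each of
    the `lag` most recently emitted sequences.
    """
    # Assumption: the window holds the last `lag` sequences emitted,
    # whether or not they were reinforced.
    recent = deque(maxlen=lag)

    def is_reinforced(sequence):
        seq = tuple(sequence)
        meets_criterion = seq not in recent
        recent.append(seq)  # the oldest entry drops out automatically
        return meets_criterion

    return is_reinforced


if __name__ == "__main__":
    check = make_vary_schedule(lag=2)
    for trial in ["LRLR", "LRLR", "RRLL", "LLRR", "LRLR"]:
        print(trial, check(trial))
```

With a larger lag the window grows, so a subject must emit more distinct sequences before any one of them can be repeated and still meet the criterion.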
Kim, Sang Hu; Clark, Shawn T.; Surendra, Anuradha; Copeland, Julia K.; Wang, Pauline W.; Ammar, Ron; Collins, Cathy; Tullis, D. Elizabeth; Nislow, Corey; Hwang, David M.; Guttman, David S.; Cowen, Leah E.
2015-01-01
The microbiome shapes diverse facets of human biology and disease, with the importance of fungi only beginning to be appreciated. Microbial communities infiltrate diverse anatomical sites as with the respiratory tract of healthy humans and those with diseases such as cystic fibrosis, where chronic colonization and infection lead to clinical decline. Although fungi are frequently recovered from cystic fibrosis patient sputum samples and have been associated with deterioration of lung function, understanding of species and population dynamics remains in its infancy. Here, we coupled high-throughput sequencing of the ribosomal RNA internal transcribed spacer 1 (ITS1) with phenotypic and genotypic analyses of fungi from 89 sputum samples from 28 cystic fibrosis patients. Fungal communities defined by sequencing were concordant with those defined by culture-based analyses of 1,603 isolates from the same samples. Different patients harbored distinct fungal communities. There were detectable trends, however, including colonization with Candida and Aspergillus species, which was not perturbed by clinical exacerbation or treatment. We identified considerable inter- and intra-species phenotypic variation in traits important for host adaptation, including antifungal drug resistance and morphogenesis. While variation in drug resistance was largely between species, striking variation in morphogenesis emerged within Candida species. Filamentation was uncoupled from inducing cues in 28 Candida isolates recovered from six patients. The filamentous isolates were resistant to the filamentation-repressive effects of Pseudomonas aeruginosa, implicating inter-kingdom interactions as the selective force. Genome sequencing revealed that all but one of the filamentous isolates harbored mutations in the transcriptional repressor NRG1; such mutations were necessary and sufficient for the filamentous phenotype. 
Six independent nrg1 mutations arose in Candida isolates from different patients, providing a poignant example of parallel evolution. Together, this combined clinical-genomic approach provides a high-resolution portrait of the fungal microbiome of cystic fibrosis patient lungs and identifies a genetic basis of pathogen adaptation. PMID:26588216
Veerkamp, Roel F; Bouwman, Aniek C; Schrooten, Chris; Calus, Mario P L
2016-12-01
Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data. Phenotypes were available for 5503 Holstein-Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 bull genomes project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants. The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased. Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. 
Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified.
Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A
2011-01-01
PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are tandem repeats of nucleotide motifs 1-6 bp in size and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin the phase variation and antigenic variation seen in some bacteria. Although SSR-mediated phase variation and antigenic variation have been well studied in some bacteria, many other prokaryotic species have yet to be investigated for SSR-mediated adaptive and other evolutionary advantages. As part of our ongoing studies on SSR polymorphism in prokaryotes, we compared the genome sequences of various strains and isolates available for 85 different species of prokaryotes, extracted a number of SSRs showing length variations, and created a relational database called PSSRdb. This database gives useful information such as the location of PSSRs in genomes, length variation across genomes, and the regions harboring PSSRs. The information provided in this database is useful for further research and analysis of SSRs in prokaryotes.
On the path to genetic novelties: insights from programmed DNA elimination and RNA splicing.
Catania, Francesco; Schmitz, Jürgen
2015-01-01
Understanding how genetic novelties arise is a central goal of evolutionary biology. To this end, programmed DNA elimination and RNA splicing deserve special consideration. While programmed DNA elimination reshapes genomes by eliminating chromatin during organismal development, RNA splicing rearranges genetic messages by removing intronic regions during transcription. Small RNAs help to mediate this class of sequence reorganization, which is not error-free. It is this imperfection that makes programmed DNA elimination and RNA splicing excellent candidates for generating evolutionary novelties. Leveraging a number of these two processes' mechanistic and evolutionary properties, which have been uncovered over the past years, we present recently proposed models and empirical evidence for how splicing can shape the structure of protein-coding genes in eukaryotes. We also chronicle a number of intriguing similarities between the processes of programmed DNA elimination and RNA splicing, and highlight the role that the variation in the population-genetic environment may play in shaping their target sequences. © 2015 Wiley Periodicals, Inc.
Jo, Ick Hyun; Kim, Young Chang; Kim, Dong Hwi; Kim, Kee Hong; Hyun, Tae Kyung; Ryu, Hojin; Bang, Kyong Hwan
2017-10-01
The development of molecular markers is one of the most useful methods for molecular breeding and marker-based molecular associated selections. Even though there is less information on the reference genome, molecular markers are indispensable tools for determination of genetic variation and identification of species with high levels of accuracy and reproducibility. The demand for molecular approaches for marker-based breeding and genetic discriminations in Panax species has greatly increased in recent times and has been successfully applied for various purposes. However, owing to the existence of diverse molecular techniques and differences in their principles and applications, there should be careful consideration while selecting appropriate marker types. In this review, we outline the recent status of different molecular marker applications in ginseng research and industrial fields. In addition, we discuss the basic principles, requirements, and advantages and disadvantages of the most widely used molecular markers, including restriction fragment length polymorphism, random amplified polymorphic DNA, sequence tag sites, simple sequence repeats, and single nucleotide polymorphisms.
Learning Grasp Context Distinctions that Generalize
NASA Technical Reports Server (NTRS)
Platt, Robert; Grupen, Roderic A.; Fagg, Andrew H.
2006-01-01
Control-based approaches to grasp synthesis create grasping behavior by sequencing and combining control primitives. In the absence of any other structure, these approaches must evaluate a large number of feasible control sequences as a function of object shape, object pose, and task. This work explores a new approach to grasp synthesis that limits consideration to variations on a generalized localize-reach-grasp control policy. A new learning algorithm, known as schema structured learning, is used to learn which instantiations of the generalized policy are most likely to lead to a successful grasp in different problem contexts. Two experiments are described where Dexter, a bimanual upper torso, learns to select an appropriate grasp strategy as a function of object eccentricity and orientation. In addition, it is shown that grasp skills learned in this way can generalize to new objects. Results are presented showing that after learning how to grasp a small, representative set of objects, the robot's performance quantitatively improves for similar objects that it has not experienced before.
O'Rahilly, R; Müller, F; Hutchins, G M; Moore, G W
1984-11-01
The sequence of events in the development of the brain in staged human embryos was investigated in much greater detail than in previous studies by listing 100 features in 165 embryos of the first 5 weeks. Using a computerized bubble-sort algorithm, individual embryos were ranked in ascending order of the features present. This procedure made feasible an appreciation of the slight variation found in the developmental features. The vast majority of features appeared during either one or two stages (about 2 or 3 days). In general, the soundness of the Carnegie system of embryonic staging was amply confirmed. The rhombencephalon was found to show increasing complexity around stage 13, and the postoptic portion of the diencephalon underwent considerable differentiation by stage 15. The need for similar investigations of other systems of the body is emphasized, and the importance of such studies in assessing the timing of congenital malformations and in clarifying syndromic clusters is suggested.
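The ranking step mentioned in the abstract above can be sketched as a plain bubble sort. The abstract does not give the data layout or the exact sort key, so this sketch assumes each embryo is an (identifier, set of features present) pair and that rank is determined by the number of features present; the function name and the sample data are hypothetical.

```python
def bubble_sort_by_feature_count(embryos):
    """Rank embryos in ascending order of the number of features present.

    `embryos` is a list of (embryo_id, features) pairs, where `features`
    is a set of feature names. Returns a new list; the input is untouched.
    """
    ranked = list(embryos)
    # After each pass, the embryo with the most features among the
    # unsorted part has "bubbled" to position `end`.
    for end in range(len(ranked) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if len(ranked[i][1]) > len(ranked[i + 1][1]):
                ranked[i], ranked[i + 1] = ranked[i + 1], ranked[i]
                swapped = True
        if not swapped:  # already in order, stop early
            break
    return ranked


if __name__ == "__main__":
    # Hypothetical embryos and feature names, for illustration only.
    sample = [
        ("E3", {"neural folds", "optic sulcus", "otic disc"}),
        ("E1", {"neural folds"}),
        ("E2", {"neural folds", "optic sulcus"}),
    ]
    for embryo_id, features in bubble_sort_by_feature_count(sample):
        print(embryo_id, len(features))
```

Because only strictly out-of-order neighbours are swapped, embryos with equal feature counts keep their original relative order.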
Copy number analysis reveals a novel multiexon deletion of the COLQ gene in congenital myasthenia.
Wang, Wei; Wu, Yanhong; Wang, Chen; Jiao, Jinsong; Klein, Christopher J
2016-12-01
Congenital myasthenic syndrome (CMS) is genetically and clinically heterogeneous. Despite a considerable number of causal genes discovered, many patients are left without a specific diagnosis after genetic testing. The presumption is that novel genes yet to be discovered will account for the majority of such patients. However, it is also possible that we are neglecting a type of genetic variation: copy number changes (>50 bp) as causal for some of these patients. Next-generation sequencing (NGS) can simultaneously screen all known causal genes and is increasingly being validated to have the potential to identify copy number changes. We present a CMS case who did not receive a genetic diagnosis from previous Sanger sequencing, but through a novel copy number analysis algorithm integrated into our targeted NGS panel, we discovered a novel copy number mutation in the COLQ gene and made a genetic diagnosis. This discovery expands the genotype-phenotype correlation of CMS, leads to improved genetic counseling, and allows for specific pharmacologic treatment.
Geomagnetic paleointensities from excursion sequences in lavas on Oahu, Hawaii
Coe, Robert S.; Gromme, Sherman; Mankinen, Edward A.
1984-01-01
Paleomagnetic data demonstrating three late Tertiary excursions in the direction of the geomagnetic field recorded in sequences of basaltic lavas on the island of Oahu, Hawaii were published by R. R. Doell and G. B. Dalrymple in 1973. We have determined geomagnetic paleointensities by the Thelliers' method for 14 lavas from the three sites. During these experiments, considerable difficulty was encountered because of the presence of titanomaghemite in many lavas and the contamination of natural remanent magnetization by lightning in many others. Moreover, we often observed the production of spurious high‐temperature chemical remanent magnetization during the Thellier experiments. An analysis of this particularly troublesome problem is presented. Two of the sites showed low paleointensities associated with angular departures of the paleomagnetic field direction from that of a geocentric axial dipole, which suggests that these excursions represent aborted reversals or fragments of reversals. At the third site, however, the paleointensity did not become low as the field diverged. This excursion may reflect the variation of a large nondipole source near Hawaii.
Probing energetics of Abeta fibril elongation by molecular dynamics simulations.
Takeda, Takako; Klimov, Dmitri K
2009-06-03
Using replica exchange molecular dynamics simulations and an all-atom implicit solvent model, we probed the energetics of Abeta(10-40) fibril growth. The analysis of the interactions between incoming Abeta peptides and the fibril led us to two conclusions. First, considerable variations in fibril binding propensities are observed along the Abeta sequence. The peptides in the fibril and those binding to its edge interact primarily through their N-termini. Therefore, the mutations affecting the Abeta positions 10-23 are expected to have the largest impact on fibril elongation compared with those occurring in the C-terminus and turn. Second, we performed weak perturbations of the binding free energy landscape by scanning partial deletions of side-chain interactions at various Abeta sequence positions. The results imply that strong side-chain interactions--in particular, hydrophobic contacts--impede fibril growth by favoring disordered docking of incoming peptides. Therefore, fibril elongation may be promoted by moderate reduction of Abeta hydrophobicity. The comparison with available experimental data is presented.
Theory of mind in early psychosis.
Langdon, Robyn; Still, Megan; Connors, Michael H; Ward, Philip B; Catts, Stanley V
2014-08-01
A deficit in theory of mind, the ability to infer and reason about the mental states of others, might underpin the poor social functioning of patients with psychosis. Unfortunately, however, there is considerable variation in how such a deficit is assessed. The current study compared three classic tests of theory of mind in terms of their ability to detect impairment in patients in the early stages of psychosis. Twenty-three patients within 2 years of their first psychotic episode and 19 healthy controls received picture-sequencing, joke-appreciation and story-comprehension tests of theory of mind. Whereas the picture-sequencing and joke-appreciation tests successfully detected a selective theory-of-mind deficit in patients, the story-comprehension test did not. The findings suggest that tests that place minimal demands on language processing and involve indirect, rather than explicit, instructions to assess theory of mind might be best suited to detecting theory-of-mind impairment in early stages of psychosis. © 2013 Wiley Publishing Asia Pty Ltd.
Reicher, S; Seroussi, E; Weller, J I; Rosov, A; Gootwine, E
2012-07-01
Polymorphisms in mitochondrial DNA (mtDNA) protein- and tRNA-coding genes were shown to be associated with various diseases in humans as well as with production and reproduction traits in livestock. Alignment of full length mitochondria sequences from the 5 known ovine haplogroups: HA (n = 3), HB (n = 5), HC (n = 3), HD (n = 2), and HE (n = 2; GenBank accession nos. HE577847-50 and 11 published complete ovine mitochondria sequences) revealed sequence variation in 10 out of the 13 protein coding mtDNA sequences. Twenty-six of the 245 variable sites found in the protein coding sequences represent non-synonymous mutations. Sequence variation was observed also in 8 out of the 22 tRNA mtDNA sequences. On the basis of the mtDNA control region and cytochrome b partial sequences along with information on maternal lineages within an Afec-Assaf flock, 1,126 Afec-Assaf ewes were assigned to mitochondrial haplogroups HA, HB, and HC, with frequencies of 0.43, 0.43, and 0.14, respectively. Analysis of birth weight and growth rate records of lambs (n = 1,286) and productivity from 4,993 lambing records revealed no association between mitochondrial haplogroup affiliation and female longevity, lamb perinatal survival rate, birth weight, and daily growth rate of lambs up to 150 d, which averaged 1,664 d, 88.3%, 4.5 kg, and 320 g/d, respectively. However, significant (P < 0.0001) differences among the haplogroups were found for prolificacy of ewes, with prolificacies (mean ± SE) of 2.14 ± 0.04, 2.25 ± 0.04, and 2.30 ± 0.06 lambs born/ewe lambing for the HA, HB, and HC haplogroups, respectively. Our results highlight the ovine mitogenome genetic variation in protein- and tRNA-coding genes and suggest that sequence variation in ovine mtDNA is associated with variation in ewe prolificacy.
Nichols, Krista M; Kozfkay, Christine C; Narum, Shawn R
2016-12-01
Conservation of life history variation is an important consideration for many species with trade-offs in migratory characteristics. Many salmonid species exhibit both resident and migratory strategies that capitalize on benefits in freshwater and marine environments. In this study, we investigated genomic signatures for migratory life history in collections of resident and anadromous Oncorhynchus nerka (Kokanee and Sockeye Salmon, respectively) from two lake systems, using ~2,600 SNPs from restriction-site-associated DNA sequencing (RAD-seq). Differing demographic histories were evident in the two systems where one pair was significantly differentiated (Redfish Lake, FST = 0.091 [95% confidence interval: 0.087 to 0.095]) but the other pair was not (Alturas Lake, FST = -0.007 [-0.008 to -0.006]). Outlier and association analyses identified several candidate markers in each population pair, but there was limited evidence for parallel signatures of genomic variation associated with migration. Despite lack of evidence for consistent markers associated with migratory life history in this species, candidate markers were mapped to functional genes and provide evidence for adaptive genetic variation within each lake system. Life history variation has been maintained in these nearly extirpated populations of O. nerka, and conservation efforts to preserve this diversity are important for long-term resiliency of this species.
HIV-1 sequence variation between isolates from mother-infant transmission pairs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wike, C.M.; Daniels, M.R.; Furtado, M.
1991-12-31
To examine the sequence diversity of human immunodeficiency virus type 1 (HIV-1) between known transmission sets, sequences from the V3 and V4-V5 regions of the env gene from 4 mother-infant pairs were analyzed. The mean interpatient sequence variation between isolates from linked mother-infant pairs was comparable to the sequence diversity found between isolates from other close contacts. The mean intrapatient variation was significantly less in the infants' isolates than in the isolates from both their mothers and other characterized intrapatient sequence sets. In addition, a distinct and characteristic difference in the glycosylation pattern preceding the V3 loop was found between each linked transmission pair. These findings indicate that selection of specific genotypic variants, which may play a role in some direct transmission sets, and the duration of infection are important factors in the degree of diversity seen between the sequence sets.
High levels of MHC class II allelic diversity in lake trout from Lake Superior
Dorschner, M.O.; Duris, T.; Bronte, C.R.; Burnham-Curtis, M. K.; Phillips, R.B.
2000-01-01
Sequence variation in a 216 bp portion of the major histocompatibility complex (MHC) II B1 domain was examined in 74 individual lake trout (Salvelinus namaycush) from different locations in Lake Superior. Forty-three alleles were obtained which encoded 71-72 amino acids of the mature protein. These sequences were compared with previous data obtained from five Pacific salmon species and Atlantic salmon using the same primers. Although all of the lake trout alleles clustered together in the neighbor-joining analysis of amino acid sequences, one amino acid allelic lineage was shared with Atlantic salmon (Salmo salar), a species in another genus which probably diverged from Salvelinus more than 10-20 million years ago. As shown previously in other salmonids, the level of nonsynonymous nucleotide substitution (d(N)) exceeded the level of synonymous substitution (d(S)). The level of nucleotide diversity at the MHC class II B1 locus was considerably higher in lake trout than in the Pacific salmon (genus Oncorhynchus). These results are consistent with the hypothesis that lake trout colonized Lake Superior from more than one refuge following the Wisconsin glaciation. Recent population bottlenecks may have reduced nucleotide diversity in Pacific salmon populations.
Finding cancer driver mutations in the era of big data research.
Poulos, Rebecca C; Wong, Jason W H
2018-04-02
In the last decade, the costs of genome sequencing have decreased considerably. The commencement of large-scale cancer sequencing projects has enabled cancer genomics to join the big data revolution. One of the challenges still facing cancer genomics research is determining which are the driver mutations in an individual cancer, as these contribute only a small subset of the overall mutation profile of a tumour. Focusing primarily on somatic single nucleotide mutations in this review, we consider both coding and non-coding driver mutations, and discuss how such mutations might be identified from cancer sequencing datasets. We describe some of the tools and databases that are available for the annotation of somatic variants and the identification of cancer driver genes. We also address the use of genome-wide variation in mutation load to establish background mutation rates from which to identify driver mutations under positive selection. Finally, we describe the ways in which mutational signatures can act as clues for the identification of cancer drivers, as these mutations may cause, or arise from, certain mutational processes. By defining the molecular changes responsible for driving cancer development, new cancer treatment strategies may be developed or novel preventative measures proposed.
McCutchen-Maloney, Sandra L.
2002-01-01
DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding, which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera, which gives a decrease in signal.
Setoh, Yin Xiang; Amarilla, Alberto A; Peng, Nias Y; Slonchak, Andrii; Periasamy, Parthiban; Figueiredo, Luiz T M; Aquino, Victor H; Khromykh, Alexander A
2018-01-01
Rocio virus (ROCV) is an arbovirus belonging to the genus Flavivirus, family Flaviviridae. We present an updated sequence of ROCV strain SPH 34675 (GenBank: AY632542.4), the only available full genome sequence prior to this study. Using next-generation sequencing of the entire genome, we reveal substantial sequence variation from the prototype sequence, with 30 nucleotide differences amounting to 14 amino acid changes, as well as significant changes to predicted 3'UTR RNA structures. Our results present an updated and corrected sequence of a potential emerging human-virulent flavivirus uniquely indigenous to Brazil (GenBank: MF461639).
Global variation in CYP2C8–CYP2C9 functional haplotypes
Speed, William C; Kang, Soonmo Peter; Tuck, David P; Harris, Lyndsay N; Kidd, Kenneth K
2009-01-01
We have studied the global frequency distributions of 10 single nucleotide polymorphisms (SNPs) across 132 kb of CYP2C8 and CYP2C9 in ∼2500 individuals representing 45 populations. Five of the SNPs were in noncoding sequences; the other five involved the more common missense variants (four in CYP2C8, one in CYP2C9) that change amino acids in the gene products. One haplotype containing two CYP2C8 coding variants and one CYP2C9 coding variant reaches an average frequency of 10% in Europe; a set of haplotypes with a different CYP2C8 coding variant reaches 17% in Africa. In both cases these haplotypes are found in other regions of the world at <1%. This considerable geographic variation in haplotype frequencies impacts the interpretation of CYP2C8/CYP2C9 association studies, and has pharmacogenomic implications for drug interactions. PMID:19381162
USDA-ARS's Scientific Manuscript database
Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...
Human Genome Sequencing in Health and Disease
Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.
2013-01-01
Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320
Association of Amine-Receptor DNA Sequence Variants with Associative Learning in the Honeybee.
Lagisz, Malgorzata; Mercer, Alison R; de Mouzon, Charlotte; Santos, Luana L S; Nakagawa, Shinichi
2016-03-01
Octopamine- and dopamine-based neuromodulatory systems play a critical role in learning and learning-related behaviour in insects. To further our understanding of these systems and resulting phenotypes, we quantified DNA sequence variations at six loci coding for octopamine and dopamine receptors and their association with aversive and appetitive learning traits in a population of honeybees. We identified 79 polymorphic sequence markers (mostly SNPs and a few insertions/deletions) located within or close to six candidate genes. Intriguingly, we found that levels of sequence variation in the protein-coding regions studied were low, indicating that sequence variation in the coding regions of receptor genes critical to learning and memory is strongly selected against. Non-coding and upstream regions of the same genes, however, were less conserved and sequence variations in these regions were weakly associated with between-individual differences in learning-related traits. While these associations do not directly imply a specific molecular mechanism, they suggest that the cross-talk between dopamine and octopamine signalling pathways may influence olfactory learning and memory in the honeybee.
Liu, Nian; Huang, Yuan
2010-01-01
The complete 15,599-bp mitogenome of Acrida cinerea was determined and compared with that of the other 20 orthopterans. It displays characteristic gene content, genome organization, nucleotide composition, and codon usage found in other Caelifera mitogenomes. Comparison of 21 orthopteran sequences revealed that the tRNAs encoded by the H-strand appear more conserved than those encoded by the L-strand. All tRNAs form the typical clover-leaf structure except trnS (agn), and most of the size variation among tRNAs stemmed from the length variation in the arm and loop of TΨC and the loop of DHU. The derived secondary structure models of the rrnS and rrnL from 21 orthopteran species closely resemble those from other insects on CRW except a considerably enlarged loop of helix 1399 of rrnS in Caelifera, which is a potential autapomorphy of Caelifera. In the A+T-rich region, tandem repeats are not only conserved in the closely related mitogenome but also share some conserved motifs in the same subfamily. A stem-loop structure, 16 bp or longer, is likely to be involved in replication initiation in Caelifera and Grylloidea. A long T-stretch (>17 bp) with conserved stem-loop structure next to rrnS on the H-strand, bounded by a purine at either end, exists in the three species from Tettigoniidae. PMID:21197069
Genetic variants of neurotransmitter-related genes and miRNAs in Egyptian autistic patients.
Salem, Ahmed M; Ismail, Samira; Zarouk, Waheba A; Abdul Baky, Olwya; Sayed, Ahmed A; Abd El-Hamid, Sawsan; Salem, Sohair
2013-01-01
Autism is a neurodevelopmental disorder with indisputable evidence for a genetic component. This work studied the association of autism with genetic variations in neurotransmitter-related genes, including MAOA uVNTR, MAOB rs1799836, and DRD2 TaqI A in 53 autistic patients and 30 healthy individuals. The study also analyzed sequence variations of miR-431 and miR-21. MAOA uVNTR was genotyped by PCR, MAOB and DRD2 polymorphisms were analyzed by PCR-based RFLP, and miR-431 and miR-21 were sequenced. The low-expressing allele of MAOA uVNTR was more frequent in female patients than in controls (OR = 2.25). MAOB G allele frequency was significantly higher in autistic patients than in controls (P < 0.001 for both males and females). DRD2 A1+ genotype increased autism risk (OR = 5.1). Severity of autism tends to be slightly affected by MAOA/B genotype. Plasma MAOB activity was significantly lower in males carrying the G allele than in those carrying the A allele. There was no significant difference in patient and maternal plasma MAOA/B activity compared with controls. Neither mutations nor SNPs in miR-431 and miR-21 were found among studied patients. This study sheds light on some neurotransmitter-related genes, suggesting a potential role in autism pathogenesis that warrants further study.
Strong, Kimberly A; Zusevics, Kaija L; Bick, David P; Veith, Regan
2014-10-01
Use of genome sequencing in the clinic continues to increase. In addition to its potential to provide findings of clinical benefit, it also has the potential to identify findings unrelated to the indication for testing (incidental findings). Incidental findings are the subject of considerable debate, particularly following the publication of recommendations by the American College of Medical Genetics and Genomics. This debate involves how and which results should be returned as well as stakeholders' desires for such results. Part of the difficulty in determining best practice in relation to returning incidental findings is the dearth of empirical data available regarding laypersons' attitudes and desire for the sometimes controversial information. In an effort to contribute data on views regarding the return of incidental findings following genome sequencing in a clinical setting, a survey specifically designed around the various types of incidental findings that occur, ranging from clinically actionable to nonactionable, was administered to a nonmedical population of medical coders working at a medical school (N = 97). Almost all (98%) of the respondents were women, 80% had 6 or more years of experience as a medical coder, and about three-fourths (74%) of participants reported that they had children. The group surveyed was considerably more interested in receiving all types of results for both themselves and their children than previously surveyed genetics professionals. Results from this study offer a snapshot of opinions beyond those of the professional genetic community and demonstrate a striking difference between genetic professionals and a more lay population in terms of their attitudes and desires regarding the return of incidental findings. Additional research is needed to explain the nuances in the perspectives motivating these variations.
Mtambo, Jupiter; Madder, Maxime; Van Bortel, Wim; Chaka, George; Berkvens, Dirk; Backeljau, Thierry
2007-01-01
Studies in the biology, ecology and behaviour of R. appendiculatus in Zambia have shown considerable variation within and between populations, often associated with their geographical origin. We studied variation in the mitochondrial COI (mtCOI) gene of adult R. appendiculatus ticks originating from the Eastern and Southern provinces of Zambia. Rhipicephalus appendiculatus ticks from the two provinces were placed into two groups on the mtCOI sequence data tree. One group comprised all haplotypes of specimens from the Eastern province plateau districts of Chipata and Petauke. The second group consisted of a single haplotype of specimens from the Southern province districts and Nyimba, an Eastern province district on the fringes of the valley. This variation provides additional evidence to the earlier observations in the 12S rDNA and ITS2 data for the geographic subdivision of R. appendiculatus from Southern province and Eastern province plateau. The geographic subdivision further corresponds with differences in body size and diapause between R. appendiculatus from these geographic areas. The possible implications of these findings for the epidemiology of East Coast fever (ECF), the disease for which R. appendiculatus is one of the vectors, are discussed.
Exploring molecular variation in Schistosoma japonicum in China.
Young, Neil D; Chan, Kok-Gan; Korhonen, Pasi K; Min Chong, Teik; Ee, Robson; Mohandas, Namitha; Koehler, Anson V; Lim, Yan-Lue; Hofmann, Andreas; Jex, Aaron R; Qian, Baozhen; Chilton, Neil B; Gobert, Geoffrey N; McManus, Donald P; Tan, Patrick; Webster, Bonnie L; Rollinson, David; Gasser, Robin B
2015-12-01
Schistosomiasis is a neglected tropical disease that affects more than 200 million people worldwide. The main disease-causing agents, Schistosoma japonicum, S. mansoni and S. haematobium, are blood flukes that have complex life cycles involving a snail intermediate host. In Asia, S. japonicum causes hepatointestinal disease (schistosomiasis japonica) and is challenging to control due to a broad distribution of its snail hosts and range of animal reservoir hosts. In China, extensive efforts have been underway to control this parasite, but genetic variability in S. japonicum populations could represent an obstacle to eliminating schistosomiasis japonica. Although a draft genome sequence is available for S. japonicum, there has been no previous study of molecular variation in this parasite on a genome-wide scale. In this study, we conducted the first deep genomic exploration of seven S. japonicum populations from mainland China, constructed phylogenies using mitochondrial and nuclear genomic data sets, and established considerable variation between some of the populations in genes inferred to be linked to key cellular processes and/or pathogen-host interactions. Based on the findings from this study, we propose that verifying intraspecific conservation in vaccine or drug target candidates is an important first step toward developing effective vaccines and chemotherapies against schistosomiasis.
Amaral, Danilo T; Oliveira, Gabriela; Silva, Jaqueline R; Viviani, Vadim R
2016-08-31
Bioluminescent click-beetles display a wide variation of bioluminescence colors ranging from green to orange, including an unusual intra-specific color variation in the Jamaican Pyrophorus plagiophthalamus. Recently, we collected individuals of the Pyrophorus angustus species from the Southern Amazon forest, in Brazil, which display an orange light emitting abdominal lantern. This species was also previously described from Central America, but with a bioluminescence spectrum from 536 nm (dorsal) to 578 nm (ventral). The biogeographic variation of the bioluminescence color in this species could be an adaptation to environmental reflectance and inter/intraspecific sexual competition. Here, we cloned, sequenced, characterized and performed site-directed mutagenesis of this new orange emitting luciferase. The in vitro luciferase spectrum displayed a peak at 594 nm, KM values for ATP and d-luciferin of 160 μM and 17 μM, respectively, and an optimum pH of approximately 8.5. Comparative multialignment and site-directed mutagenesis using different color emitting click-beetle luciferases from P. angustus, Fulgeochlizus bruchi and Pyrearinus termitilluminans cloned by our group showed an integral role of residue 247 in bioluminescence color modulation.
Goossens, Dirk; Moens, Lotte N; Nelis, Eva; Lenaerts, An-Sofie; Glassee, Wim; Kalbe, Andreas; Frey, Bruno; Kopal, Guido; De Jonghe, Peter; De Rijk, Peter; Del-Favero, Jurgen
2009-03-01
We evaluated multiplex PCR amplification as a front-end for high-throughput sequencing, to widen the applicability of massive parallel sequencers for the detailed analysis of complex genomes. Using multiplex PCR reactions, we sequenced the complete coding regions of seven genes implicated in peripheral neuropathies in 40 individuals on a GS-FLX genome sequencer (Roche). The resulting dataset showed highly specific and uniform amplification. Comparison of the GS-FLX sequencing data with the dataset generated by Sanger sequencing confirmed the detection of all variants present and proved the sensitivity of the method for mutation detection. In addition, we showed that we could exploit the multiplexed PCR amplicons to determine individual copy number variation (CNV), increasing the spectrum of detected variations to both genetic and genomic variants. We conclude that our straightforward procedure substantially expands the applicability of the massive parallel sequencers for sequencing projects of a moderate number of amplicons (50-500) with typical applications in resequencing exons in positional or functional candidate regions and molecular genetic diagnostics. 2008 Wiley-Liss, Inc.
NASA Astrophysics Data System (ADS)
Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.
2016-06-01
Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications. PMID:27049631
Awua, Adolf K; Adanu, Richard M K; Wiredu, Edwin K; Afari, Edwin A; Zubuch, Vanessa A; Asmah, Richard H; Severini, Alberto
2017-04-21
In addition to being useful for classification, sequence variations of human papillomavirus (HPV) genotypes have been implicated in differential oncogenic potential and a differential association with the different histological forms of invasive cervical cancer. These associations have also been indicated for HPV genotype lineages and sub-lineages. In order to better understand the potential implications of lineage variation in the occurrence of cervical cancers in Ghana, we studied the lineages of the three most prevalent HPV genotypes among women with normal cytology as a baseline for further studies. Of previously collected self- and health personnel-collected cervical specimens, 54, which were positive for HPV16, 18 and 45, were selected and the long control region (LCR) of each HPV genotype was separately amplified by a nested PCR. DNA sequences of 41 isolates obtained with the forward and reverse primers by Sanger sequencing were analysed. Nucleotide sequence variations of the HPV16 genotypes were observed at 30 positions within the LCR (7460 - 7840). Of these, 19 were the known variations for the lineages B and C (African lineages), while the other 11 positions had variations unique to the HPV16 isolates of this study. For the HPV18 isolates, the variations were at 35 positions, 22 of which were known variations of African lineages and the other 13 were unique variations observed for the isolates obtained in this study (at positions 7799 and 7813). HPV45 isolates had variations at 35 positions and 2 (positions 7114 and 97) were unique to the isolates of this study. This study provides the first data on the lineages of HPV 16, 18 and 45 isolates from Ghana. Although the study did not obtain full genome sequence data for a comprehensive comparison with known lineages, these genotypes were predominantly of the African lineages and had some unique sequence variations at positions that suggest potential oncogenic implications. These data will be useful for comparison with lineages of these genotypes from women with cervical lesions and all the forms of invasive cervical cancer.
Standish, Kristopher A; Carland, Tristan M; Lockwood, Glenn K; Pfeiffer, Wayne; Tatineni, Mahidhar; Huang, C Chris; Lamberth, Sarah; Cherkas, Yauheniya; Brodmerkel, Carrie; Jaeger, Ed; Smith, Lance; Rajagopal, Gunaretnam; Curran, Mark E; Schork, Nicholas J
2015-09-22
Next-generation sequencing (NGS) technologies have become much more efficient, allowing whole human genomes to be sequenced faster and cheaper than ever before. However, processing the raw sequence reads associated with NGS technologies requires care and sophistication in order to draw compelling inferences about phenotypic consequences of variation in human genomes. It has been shown that different approaches to variant calling from NGS data can lead to different conclusions. Ensuring appropriate accuracy and quality in variant calling can come at a computational cost. We describe our experience implementing and evaluating a group-based approach to calling variants on large numbers of whole human genomes. We explore the influence of many factors that may impact the accuracy and efficiency of group-based variant calling, including group size, the biogeographical backgrounds of the individuals who have been sequenced, and the computing environment used. We make efficient use of the Gordon supercomputer cluster at the San Diego Supercomputer Center by incorporating job-packing and parallelization considerations into our workflow while calling variants on 437 whole human genomes generated as part of a large association study. We ultimately find that our workflow resulted in high-quality variant calls in a computationally efficient manner. We argue that studies like ours should motivate further investigations combining hardware-oriented advances in computing systems with algorithmic developments to tackle emerging 'big data' problems in biomedical research brought on by the expansion of NGS technologies.
Characterization of genetic sequence variation of 58 STR loci in four major population groups.
Novroski, Nicole M M; King, Jonathan L; Churchill, Jennifer D; Seah, Lay Hong; Budowle, Bruce
2016-11-01
Massively parallel sequencing (MPS) can identify sequence variation within short tandem repeat (STR) alleles as well as their nominal allele lengths that traditionally have been obtained by capillary electrophoresis. Using the MiSeq FGx Forensic Genomics System (Illumina), STRait Razor, and in-house Excel workbooks, genetic variation was characterized within STR repeat and flanking regions of 27 autosomal, 7 X-chromosome and 24 Y-chromosome STR markers in 777 unrelated individuals from four population groups. Seven hundred and forty-six autosomal, 227 X-chromosome, and 324 Y-chromosome STR alleles were identified by sequence compared with 357 autosomal, 107 X-chromosome, and 189 Y-chromosome STR alleles that were identified by length. Within the observed sequence variation, 227 autosomal, 156 X-chromosome, and 112 Y-chromosome novel alleles were identified and described. One hundred and seventy-six autosomal, 123 X-chromosome, and 93 Y-chromosome sequence variants resided within STR repeat regions, and 86 autosomal, 39 X-chromosome, and 20 Y-chromosome variants were located in STR flanking regions. Three markers, D18S51, DXS10135, and DYS385a-b, had 1, 4, and 1 alleles, respectively, which contained both a novel repeat region variant and a flanking sequence variant in the same nucleotide sequence. There were 50 markers that demonstrated a relative increase in diversity with the variant sequence alleles compared with those of traditional nominal length alleles. These population data illustrate the genetic variation that exists in the commonly used STR markers in the selected population samples and provide allele frequencies for statistical calculations related to STR profiling with MPS data. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
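The difference between alleles identified by length and alleles identified by sequence can be illustrated with a minimal Python sketch. It is not the STRait Razor workflow used in the study: the function name `count_alleles` and the toy sequences are hypothetical, and raw string length stands in for the nominal allele length.

```python
from typing import List, Tuple


def count_alleles(observed: List[str]) -> Tuple[int, int]:
    """Return (distinct alleles by length, distinct alleles by sequence).

    Assumption: each string is the repeat region of one observed allele,
    and its character length is used as a proxy for nominal allele length.
    """
    by_length = {len(seq) for seq in observed}  # what length-based typing resolves
    by_sequence = set(observed)                 # what sequencing resolves
    return len(by_length), len(by_sequence)


# Hypothetical alleles at one locus: the first two have the same length
# but differ at one base, so only sequencing tells them apart.
example = ["TCTATCTATCTA", "TCTATCTGTCTA", "TCTATCTA"]
by_len, by_seq = count_alleles(example)
```

Two alleles of equal length that differ internally count once by length and twice by sequence, which is why sequence-based counts can only equal or exceed length-based counts.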
Variation block-based genomics method for crop plants.
Kim, Yul Ho; Park, Hyang Mi; Hwang, Tae-Young; Lee, Seuk Ki; Choi, Man Soo; Jho, Sungwoong; Hwang, Seungwoo; Kim, Hak-Min; Lee, Dongwoo; Kim, Byoung-Chul; Hong, Chang Pyo; Cho, Yun Sung; Kim, Hyunmin; Jeong, Kwang Ho; Seo, Min Jung; Yun, Hong Tai; Kim, Sun Lim; Kwon, Young-Up; Kim, Wook Han; Chun, Hye Kyung; Lim, Sang Jong; Shin, Young-Ah; Choi, Ik-Young; Kim, Young Sun; Yoon, Ho-Sung; Lee, Suk-Ha; Lee, Sunghoon
2014-06-15
In contrast with wild species, cultivated crop genomes consist of reshuffled recombination blocks, which occurred by crossing and selection processes. Accordingly, recombination block-based genomics analysis can be an effective approach for the screening of target loci for agricultural traits. We propose the variation block method, which is a three-step process for recombination block detection and comparison. The first step is to detect variations by comparing the short-read DNA sequences of the cultivar to the reference genome of the target crop. Next, sequence blocks with variation patterns are examined and defined. The boundaries between the variation-containing sequence blocks are regarded as recombination sites. All the assumed recombination sites in the cultivar set are used to split the genomes, and the resulting sequence regions are termed variation blocks. Finally, the genomes are compared using the variation blocks. The variation block method identified recurring recombination blocks accurately and successfully represented block-level diversities in the publicly available genomes of 31 soybean and 23 rice accessions. The practicality of this approach was demonstrated by the identification of a putative locus determining soybean hilum color. We suggest that the variation block method is an efficient genomics method for the recombination block-level comparison of crop genomes. We expect that this method will facilitate the development of crop genomics by bringing genomics technologies to the field of crop breeding.
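The three steps described above can be sketched in Python under strong simplifying assumptions. This is not the authors' implementation: it assumes variation detection has already been reduced to one boolean per fixed-size genomic window (True if the cultivar differs from the reference in that window), and the names `boundaries`, `variation_blocks` and `block_profile` are hypothetical.

```python
from typing import Dict, List, Tuple


def boundaries(pattern: List[bool]) -> List[int]:
    """Window indices where the variation state changes.

    These change points play the role of assumed recombination sites.
    """
    return [i for i in range(1, len(pattern)) if pattern[i] != pattern[i - 1]]


def variation_blocks(patterns: Dict[str, List[bool]]) -> List[Tuple[int, int]]:
    """Split the genome at every boundary seen in any cultivar.

    Assumes all cultivars are described over the same number of windows.
    Returns half-open (start, end) window ranges.
    """
    n_windows = len(next(iter(patterns.values())))
    cuts = sorted({b for p in patterns.values() for b in boundaries(p)})
    edges = [0] + cuts + [n_windows]
    return [(edges[i], edges[i + 1]) for i in range(len(edges) - 1)]


def block_profile(pattern: List[bool], blocks: List[Tuple[int, int]]) -> List[bool]:
    """One state per block; the state is constant inside a block by construction."""
    return [pattern[start] for start, _ in blocks]


# Hypothetical window-level variation patterns for two cultivars.
cultivars = {
    "cv1": [False, False, True, True, True, False],
    "cv2": [False, True, True, True, False, False],
}
blocks = variation_blocks(cultivars)
profiles = {name: block_profile(p, blocks) for name, p in cultivars.items()}
```

Because every cultivar's boundaries are pooled before splitting, each block is homogeneous in every cultivar, so genomes can be compared block by block rather than variant by variant.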
Ekanayake, Saliya; Ruan, Yang; Schütte, Ursel M. E.; Kaonongbua, Wittaya; Fox, Geoffrey; Ye, Yuzhen; Bever, James D.
2016-01-01
ABSTRACT Arbuscular mycorrhizal (AM) fungi form mutualisms with plant roots that increase plant growth and shape plant communities. Each AM fungal cell contains a large amount of genetic diversity, but it is unclear if this diversity varies across evolutionary lineages. We found that sequence variation in the nuclear large-subunit (LSU) rRNA gene from 29 isolates representing 21 AM fungal species generally assorted into genus- and species-level clades, with the exception of species of the genera Claroideoglomus and Entrophospora. However, there were significant differences in the levels of sequence variation across the phylogeny and between genera, indicating that it is an evolutionarily constrained trait in AM fungi. These consistent patterns of sequence variation across both phylogenetic and taxonomic groups pose challenges to interpreting operational taxonomic units (OTUs) as approximations of species-level groups of AM fungi. We demonstrate that the OTUs produced by five sequence clustering methods using 97% or equivalent sequence similarity thresholds failed to match the expected species of AM fungi, although OTUs from AbundantOTU, CD-HIT-OTU, and CROP corresponded better to species than did OTUs from mothur or UPARSE. This lack of OTU-to-species correspondence resulted both from sequences of one species being split into multiple OTUs and from sequences of multiple species being lumped into the same OTU. The OTU richness therefore will not reliably correspond to the AM fungal species richness in environmental samples. Conservatively, this error can overestimate species richness by 4-fold or underestimate richness by one-half, and the direction of this error will depend on the genera represented in the sample. IMPORTANCE Arbuscular mycorrhizal (AM) fungi form important mutualisms with the roots of most plant species. Individual AM fungi are genetically diverse, but it is unclear whether the level of this diversity differs among evolutionary lineages. 
We found that the amount of sequence variation in an rRNA gene that is commonly used to identify AM fungal species varied significantly between evolutionary groups that correspond to different genera, with the exception of two genera that are genetically indistinguishable from each other. When we clustered groups of similar sequences into operational taxonomic units (OTUs) using five different clustering methods, these patterns of sequence variation caused the number of OTUs to either over- or underestimate the actual number of AM fungal species, depending on the genus. Our results indicate that OTU-based inferences about AM fungal species composition from environmental sequences can be improved if they take these taxonomically structured patterns of sequence variation into account. PMID:27260357
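How a fixed similarity threshold turns sequences into OTUs can be shown with a deliberately simple greedy clustering sketch in Python. It does not reproduce AbundantOTU, CD-HIT-OTU, CROP, mothur or UPARSE; it assumes pre-aligned sequences of equal length, and the function names `identity` and `greedy_otus` are hypothetical.

```python
from typing import Dict, List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: Dict[str, str], threshold: float = 0.97) -> List[List[str]]:
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes a new centroid.
    Input order matters, which is one reason real tools sort by abundance.
    """
    centroids: List[Tuple[str, List[str]]] = []
    for name, seq in seqs.items():
        for centroid_seq, members in centroids:
            if identity(seq, centroid_seq) >= threshold:
                members.append(name)
                break
        else:
            centroids.append((seq, [name]))
    return [members for _, members in centroids]


# Hypothetical 100-bp sequences: "b" is 98% identical to "a", "c" only 90%.
toy = {"a": "A" * 100, "b": "A" * 98 + "CC", "c": "A" * 90 + "C" * 10}
otus = greedy_otus(toy)
```

If within-species variation exceeds 3% in one genus and stays below it between species in another, the same threshold splits the first and lumps the second, which is the mismatch the abstract reports.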
Polymorphism in the Eruption Sequence of Primary Dentition: A Cross-sectional Study
Bhojraj, Nandlal; Narayanappa
2017-01-01
Introduction Primary teeth show wide variation in eruption time among different populations. Population-specific eruption ages are given as means with standard deviations or as median ages with percentile ranges. These alone are insufficient for predicting tooth eruption sequence because they provide no information on the frequency of sequence variation within pairs of teeth. Norms of polymorphic variation in the eruption sequence can be more useful. Aim This study aims to provide norms for sequence polymorphism in primary teeth among children of the Mysore population. Materials and Methods A cross-sectional study was designed with 1392 children, recruited from December 2015 to June 2016 by simple random sampling. Each tooth was recorded as present or absent. For every possible intra-quadrant tooth pair, cases of present-present, absent-absent, present-absent and absent-present were counted and expressed as percentages. Results Sequence polymorphisms were more common in 82-84 pairs of teeth. Significant polymorphic reverse sequence was observed in 52-54 (9%) and 82-84 (35%) in males and in 82-84 (18%) in females. There was no polymorphism in the maxillary arch in females. Conclusion The present study provides baseline data for sequence variation in primary teeth eruption. To the best of the investigators' knowledge, there are no previous studies describing sequence polymorphism in primary teeth in an Indian population. The results of this study help in the assessment of eruption sequence problems in paediatric dentistry and in the evaluation and prediction of tooth eruption sequence in an individual child. PMID:28658912
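The pairwise tally described in the methods can be sketched in Python as follows. The data layout (one dictionary per child mapping a tooth number to True for present) and the function name `pair_percentages` are assumptions made for illustration, not details taken from the study.

```python
from collections import Counter
from typing import Dict, List

LABELS = {
    (True, True): "present-present",
    (False, False): "absent-absent",
    (True, False): "present-absent",
    (False, True): "absent-present",
}


def pair_percentages(children: List[Dict[str, bool]], first: str, second: str) -> Dict[str, float]:
    """Percentage of children in each present/absent combination for one tooth pair."""
    if not children:
        raise ValueError("at least one child record is required")
    counts = Counter((child[first], child[second]) for child in children)
    total = len(children)
    return {label: 100.0 * counts[key] / total for key, label in LABELS.items()}


# Hypothetical records for the 82-84 pair in four children.
records = [
    {"82": True, "84": True},
    {"82": True, "84": True},
    {"82": False, "84": False},
    {"82": False, "84": True},
]
result = pair_percentages(records, "82", "84")
```

The two discordant categories (present-absent and absent-present) are the informative ones: whichever is the rarer ordering for a given pair is the candidate polymorphic sequence.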
2010-01-01
Background Accurate diagnosis is essential for prompt and appropriate treatment of malaria. While rapid diagnostic tests (RDTs) offer great potential to improve malaria diagnosis, the sensitivity of RDTs has been reported to be highly variable. One possible factor contributing to variable test performance is the diversity of parasite antigens. This is of particular concern for Plasmodium falciparum histidine-rich protein 2 (PfHRP2)-detecting RDTs since PfHRP2 has been reported to be highly variable in isolates of the Asia-Pacific region. Methods The pfhrp2 exon 2 fragment from 458 isolates of P. falciparum collected from 38 countries was amplified and sequenced. For a subset of 80 isolates, the exon 2 fragment of histidine-rich protein 3 (pfhrp3) was also amplified and sequenced. DNA sequence and statistical analysis of the variation observed in these genes was conducted. The potential impact of the pfhrp2 variation on RDT detection rates was examined by analysing the relationship between sequence characteristics of this gene and the results of the WHO product testing of malaria RDTs: Round 1 (2008), for 34 PfHRP2-detecting RDTs. Results Sequence analysis revealed extensive variations in the number and arrangement of various repeats encoded by the genes in parasite populations world-wide. However, no statistically robust correlation between gene structure and RDT detection rate for P. falciparum parasites at 200 parasites per microlitre was identified. Conclusions The results suggest that despite extreme sequence variation, diversity of PfHRP2 does not appear to be a major cause of RDT sensitivity variation. PMID:20470441
Spuesens, Emiel B M; Oduber, Minoushka; Hoogenboezem, Theo; Sluijter, Marcel; Hartwig, Nico G; van Rossum, Annemarie M C; Vink, Cornelis
2009-07-01
The gene encoding major adhesin protein P1 of Mycoplasma pneumoniae, MPN141, contains two DNA sequence stretches, designated RepMP2/3 and RepMP4, which display variation among strains. This variation allows strains to be differentiated into two major P1 genotypes (1 and 2) and several variants. Interestingly, multiple versions of the RepMP2/3 and RepMP4 elements exist at other sites within the bacterial genome. Because these versions are closely related in sequence, but not identical, it has been hypothesized that they have the capacity to recombine with their counterparts within MPN141, and thereby serve as a source of sequence variation of the P1 protein. In order to determine the variation within the RepMP2/3 and RepMP4 elements, both within the bacterial genome and among strains, we analysed the DNA sequences of all RepMP2/3 and RepMP4 elements within the genomes of 23 M. pneumoniae strains. Our data demonstrate that: (i) recombination is likely to have occurred between two RepMP2/3 elements in four of the strains, and (ii) all previously described P1 genotypes can be explained by inter-RepMP recombination events. Moreover, the difference between the two major P1 genotypes was reflected in all RepMP elements, such that subtype 1 and 2 strains can be differentiated on the basis of sequence variation in each RepMP element. This implies that subtype 1 and subtype 2 strains represent evolutionarily diverged strain lineages. Finally, a classification scheme is proposed in which the P1 genotype of M. pneumoniae isolates can be described in a sequence-based, universal fashion.
Child Development and Structural Variation in the Human Genome
ERIC Educational Resources Information Center
Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.
2013-01-01
Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…
Mukherjee, Sanchita; Kailasam, Senthilkumar; Bansal, Manju; Bhattacharyya, Dhananjay
2014-01-01
Double helical structures of DNA and RNA are mostly determined by base pair stacking interactions, which give them the base sequence-directed features, such as small roll values for the purine-pyrimidine steps. Earlier attempts to characterize stacking interactions were mostly restricted to calculations on fiber diffraction geometries or on structures optimized by ab initio calculations, which lack the variation in geometry needed to comment on the unusually large roll values observed in the AU/AU base pair step in crystal structures of RNA double helices. We have generated stacking energy hyperspace by modeling geometries with variations along the important degrees of freedom, roll, and slide, which were chosen via statistical analysis as maximally sequence dependent. Corresponding energy contours were constructed by several quantum chemical methods including dispersion corrections. This analysis established the most suitable methods for stacked base pair systems despite the limitation that the number of atoms in a base pair step imposes on the use of a very high level of theory. All the methods predict negative roll value and near-zero slide to be most favorable for the purine-pyrimidine steps, in agreement with Calladine's steric clash based rule. Successive base pairs in RNA are always linked by sugar-phosphate backbone with C3'-endo sugars and this demands C1'-C1' distance of about 5.4 Å along the chains. Adding an energy penalty term for deviation of the C1'-C1' distance from the mean value to the recent DFT-D functionals, specifically ωB97X-D, appears to predict a reliable energy contour for the AU/AU step. Such distance-based penalty improves energy contours for the other purine-pyrimidine sequences also. © 2013 Wiley Periodicals, Inc. Biopolymers 101: 107-120, 2014. Copyright © 2013 Wiley Periodicals, Inc.
Senevirathne, Gayani; Kerney, Ryan
2017-01-01
Rhacophoridae, a family of morphologically cryptic frogs, with many genetically distinct evolutionary lineages, is understudied with respect to skeletal morphology, life history traits and skeletal ontogeny. Here we analyze two species each from two sister lineages, Taruga and Polypedates, and compare their postembryonic skeletal ontogeny, larval chondrocrania and adult osteology in the context of a well-resolved phylogeny. We further compare these ontogenetic traits with the direct-developing Pseudophilautus silus. For each species, we differentially stained a nearly complete developmental series of tadpoles from early postembryonic stages through metamorphosis to determine the intraspecific and interspecific differences of cranial and postcranial bones. Chondrocrania of the four species differ in 1) size; 2) presence/absence of anterolateral and posterior process; and 3) shape of the suprarostral cartilages. Interspecific variation of ossification sequences is limited during early stages, but conspicuous during later development. Early cranial ossification is typical of other anuran larvae, where the frontoparietal, exoccipital and parasphenoid ossify first. The ossification sequences of the cranial bones vary considerably within the four species. Both species of Taruga show a faster cranial ossification rate than Polypedates. Seven cranial bones form when larvae near metamorphic climax. Ossification of all 18 cranial bones is initiated by larval Gosner stage 46 in T. eques. However, some cranial bone formation is not initiated until after metamorphosis in the other three species. Postcranial sequence does not vary significantly. The comparison of adult osteology highlights two characters, which have not been previously recorded: presence/absence of the parieto-squamosal plates and bifurcated base of the omosternum. 
This study will provide a starting point for comparative analyses of rhacophorid skeletal ontogeny and facilitate the study of the evolution of ontogenetic repatterning associated with the life history variation in the family. PMID:28060923
Genome skimming identifies polymorphism in tern populations and species
2012-01-01
Background Terns (Charadriiformes: Sterninae) are a lineage of cosmopolitan shorebirds with a disputed evolutionary history that comprises several species of conservation concern. As a non-model system in genetics, previous study has left most of the nuclear genome unexplored, and population-level studies are limited to only 15% of the world's species of terns and noddies. Screening of polymorphic nuclear sequence markers is needed to enhance genetic resolution because of supposed low mitochondrial mutation rate, documentation of nuclear insertion of hypervariable mitochondrial regions, and limited success of microsatellite enrichment in terns. Here, we investigated the phylogenetic and population genetic utility for terns and relatives of a variety of nuclear markers previously developed for other birds and spanning the nuclear genome. Markers displaying a variety of mutation rates from both the nuclear and mitochondrial genome were tested and prioritized according to optimal cross-species amplification and extent of genetic polymorphism between (1) the main tern clades and (2) individual Royal Terns (Thalasseus maxima) breeding on the US East Coast. Results Results from this genome skimming effort yielded four new nuclear sequence-based markers for tern phylogenetics and 11 intra-specific polymorphic markers. Further, comparison between the two genomes indicated a phylogenetic conflict at the base of terns, involving the inclusion (mitochondrial) or exclusion (nuclear) of the Angel Tern (Gygis alba). Although limited mitochondrial variation was confirmed, both nuclear markers and a short tandem repeat in the mitochondrial control region indicated the presence of considerable genetic variation in Royal Terns at a regional scale. Conclusions These data document the value of intronic markers to the study of terns and allies. 
We expect that these and additional markers attained through next-generation sequencing methods will accurately map the genetic origin and species history of this group of birds. PMID:22333071
Mitochondrial DNA heteroplasmy in the emerging field of massively parallel sequencing
Just, Rebecca S.; Irwin, Jodi A.; Parson, Walther
2015-01-01
Long an important and useful tool in forensic genetic investigations, mitochondrial DNA (mtDNA) typing continues to mature. Research in the last few years has demonstrated both that data from the entire molecule will have practical benefits in forensic DNA casework, and that massively parallel sequencing (MPS) methods will make full mitochondrial genome (mtGenome) sequencing of forensic specimens feasible and cost-effective. A spate of recent studies has employed these new technologies to assess intraindividual mtDNA variation. However, in several instances, contamination and other sources of mixed mtDNA data have been erroneously identified as heteroplasmy. Well vetted mtGenome datasets based on both Sanger and MPS sequences have found authentic point heteroplasmy in approximately 25% of individuals when minor component detection thresholds are in the range of 10–20%, along with positional distribution patterns in the coding region that differ from patterns of point heteroplasmy in the well-studied control region. A few recent studies that examined very low-level heteroplasmy are concordant with these observations when the data are examined at a common level of resolution. In this review we provide an overview of considerations related to the use of MPS technologies to detect mtDNA heteroplasmy. In addition, we examine published reports on point heteroplasmy to characterize features of the data that will assist in the evaluation of future mtGenome data developed by any typing method. PMID:26009256
Liu, Di; Zhang, Xiang-Bin; Yan, Zhuan-Qiang; Chen, Feng; Ji, Jun; Qin, Jian-Ping; Li, Hai-Yan; Lu, Jun-Peng; Xue, Yu; Liu, Jia-Jia; Xie, Qing-Mei; Ma, Jing-Yun; Xue, Chun-Yi; Bee, Ying-Zuo
2013-06-01
Infectious bursal disease virus (IBDV) is a double-stranded RNA virus that causes immunosuppressive disease in young chickens. Thousands of cases of IBDV infection are reported each year in South China, and these infections can result in considerable economic losses to the poultry industry. To monitor variations of the virus during the outbreaks, 30 IBDVs were identified from vaccinated chicken flocks from nine provinces in South China in 2011. VP2 fragments from different virus strains were sequenced and analyzed by comparison with the published sequences of IBDV strains from China and around the world. Phylogenetic analysis of hypervariable regions of the VP2 (vVP2) gene showed that 29 of the isolates were very virulent (vv) IBDVs, and were closely related to vvIBDV strains from Europe and Asia. Alignment analysis of the deduced amino acid (aa) sequences of vVP2 showed that the 29 vv isolates had high uniformity, indicating low variability and slow evolution of the virus. The non-vvIBDV isolate JX2-11 was associated with higher than expected mortality, and had high deduced aa sequence similarity (99.2 %) with the attenuated vaccine strain B87 (BJ). The present study has demonstrated the continued circulation of IBDV strains in South China, and emphasizes the importance of reinforcing IBDV surveillance.
Sampson, Juliana K.; Sheth, Nihar U.; Koparde, Vishal N.; Scalora, Allison F.; Serrano, Myrna G.; Lee, Vladimir; Roberts, Catherine H.; Jameson-Lee, Max; Ferreira-Gonzalez, Andrea; Manjili, Masoud H.; Buck, Gregory A.; Neale, Michael C.; Toor, Amir A.
2016-01-01
Summary Whole exome sequencing (WES) was performed on stem cell transplant donor-recipient (D-R) pairs to determine the extent of potential antigenic variation at a molecular level. In a small cohort of D-R pairs, a high frequency of sequence variation was observed between the donor and recipient exomes independent of human leucocyte antigen (HLA) matching. Nonsynonymous, nonconservative single nucleotide polymorphisms were approximately twice as frequent in HLA-matched unrelated, compared with related D-R pairs. When mapped to individual chromosomes, these polymorphic nucleotides were uniformly distributed across the entire exome. In conclusion, WES reveals extensive nucleotide sequence variation in the exomes of HLA-matched donors and recipients. PMID:24749631
An, Z; Tang, Z; Ma, B; Mason, A S; Guo, Y; Yin, J; Gao, C; Wei, L; Li, J; Fu, D
2014-07-01
Although many studies have shown that transposable element (TE) activation is induced by hybridisation and polyploidisation in plants, much less is known on how different types of TE respond to hybridisation, and the impact of TE-associated sequences on gene function. We investigated the frequency and regularity of putative transposon activation for different types of TE, and determined the impact of TE-associated sequence variation on the genome during allopolyploidisation. We designed different types of TE primers and adopted the Inter-Retrotransposon Amplified Polymorphism (IRAP) method to detect variation in TE-associated sequences during the process of allopolyploidisation between Brassica rapa (AA) and Brassica oleracea (CC), and in successive generations of self-pollinated progeny. In addition, fragments with TE insertions were used to perform Blast2GO analysis to characterise the putative functions of the fragments with TE insertions. Ninety-two primers amplifying 548 loci were used to detect variation in sequences associated with four different orders of TE sequences. TEs could be classed in ascending frequency into LTR-REs, TIRs, LINEs, SINEs and unknown TEs. The frequency of novel variation (putative activation) detected for the four orders of TEs was highest from the F1 to F2 generations, and lowest from the F2 to F3 generations. Functional annotation of sequences with TE insertions showed that genes with TE insertions were mainly involved in metabolic processes and binding, and preferentially functioned in organelles. TE variation in our study severely disturbed the genetic compositions of the different generations, resulting in inconsistencies in genetic clustering. Different types of TE showed different patterns of variation during the process of allopolyploidisation. © 2013 German Botanical Society and The Royal Botanical Society of the Netherlands.
Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats.
Fungtammasan, Arkarachai; Tomaszkiewicz, Marta; Campos-Sánchez, Rebeca; Eckert, Kristin A; DeGiorgio, Michael; Makova, Kateryna D
2016-10-01
Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA-DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Mining sequence variations in representative polyploid sugarcane germplasm accessions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Xiping; Song, Jian; You, Qian
Sugarcane (Saccharum spp.) is one of the most important economic crops because of its high sugar production and biofuel potential. Due to the high polyploid level and complex genome of sugarcane, it has been a huge challenge to investigate genomic sequence variations, which are critical for identifying alleles contributing to important agronomic traits. In order to mine the genetic variations in sugarcane, genotyping by sequencing (GBS), was used to genotype 14 representative Saccharum complex accessions. GBS is a method to generate a large number of markers, enabled by next generation sequencing (NGS) and the genome complexity reduction using restriction enzymes. To use GBS for high throughput genotyping highly polyploid sugarcane, the GBS analysis pipelines in 14 Saccharum complex accessions were established by evaluating different alignment methods, sequence variants callers, and sequence depth for single nucleotide polymorphism (SNP) filtering. By using the established pipeline, a total of 76,251 non-redundant SNPs, 5642 InDels, 6380 presence/absence variants (PAVs), and 826 copy number variations (CNVs) were detected among the 14 accessions. In addition, non-reference based universal network enabled analysis kit and Stacks de novo called 34,353 and 109,043 SNPs, respectively. In the 14 accessions, the percentages of single dose SNPs ranged from 38.3% to 62.3% with an average of 49.6%, much more than the portions of multiple dosage SNPs. Concordantly called SNPs were used to evaluate the phylogenetic relationship among the 14 accessions. The results showed that the divergence time between the Erianthus genus and the Saccharum genus was more than 10 million years ago (MYA). The Saccharum species separated from their common ancestors ranging from 0.19 to 1.65 MYA.
The GBS pipelines including the reference sequences, alignment methods, sequence variant callers, and sequence depth were recommended and discussed for the Saccharum complex and other related species. A large number of sequence variations were discovered in the Saccharum complex, including SNPs, InDels, PAVs, and CNVs. Genome-wide SNPs were further used to illustrate sequence features of polyploid species and demonstrated the divergence of different species in the Saccharum complex. The results of this study showed that GBS was an effective NGS-based method to discover genomic sequence variations in highly polyploid and heterozygous species.
Mining sequence variations in representative polyploid sugarcane germplasm accessions
Yang, Xiping; Song, Jian; You, Qian; ...
2017-08-09
Sugarcane (Saccharum spp.) is one of the most important economic crops because of its high sugar production and biofuel potential. Due to the high polyploid level and complex genome of sugarcane, it has been a huge challenge to investigate genomic sequence variations, which are critical for identifying alleles contributing to important agronomic traits. In order to mine the genetic variations in sugarcane, genotyping by sequencing (GBS), was used to genotype 14 representative Saccharum complex accessions. GBS is a method to generate a large number of markers, enabled by next generation sequencing (NGS) and the genome complexity reduction using restriction enzymes. To use GBS for high throughput genotyping highly polyploid sugarcane, the GBS analysis pipelines in 14 Saccharum complex accessions were established by evaluating different alignment methods, sequence variants callers, and sequence depth for single nucleotide polymorphism (SNP) filtering. By using the established pipeline, a total of 76,251 non-redundant SNPs, 5642 InDels, 6380 presence/absence variants (PAVs), and 826 copy number variations (CNVs) were detected among the 14 accessions. In addition, non-reference based universal network enabled analysis kit and Stacks de novo called 34,353 and 109,043 SNPs, respectively. In the 14 accessions, the percentages of single dose SNPs ranged from 38.3% to 62.3% with an average of 49.6%, much more than the portions of multiple dosage SNPs. Concordantly called SNPs were used to evaluate the phylogenetic relationship among the 14 accessions. The results showed that the divergence time between the Erianthus genus and the Saccharum genus was more than 10 million years ago (MYA). The Saccharum species separated from their common ancestors ranging from 0.19 to 1.65 MYA.
The GBS pipelines including the reference sequences, alignment methods, sequence variant callers, and sequence depth were recommended and discussed for the Saccharum complex and other related species. A large number of sequence variations were discovered in the Saccharum complex, including SNPs, InDels, PAVs, and CNVs. Genome-wide SNPs were further used to illustrate sequence features of polyploid species and demonstrated the divergence of different species in the Saccharum complex. The results of this study showed that GBS was an effective NGS-based method to discover genomic sequence variations in highly polyploid and heterozygous species.
Kintz, Erica; Heiss, Christian; Black, Ian; ...
2017-02-06
Salmonella enterica serovar Typhi is a human-restricted Gram-negative bacterial pathogen responsible for causing an estimated 27 million cases of typhoid fever annually, leading to 217,000 deaths, and current vaccines do not offer full protection. The O-antigen side chain of the lipopolysaccharide is an immunodominant antigen, can define host-pathogen interactions, and is under consideration as a vaccine target for some Gram-negative species. The composition of the O-antigen can be modified by the activity of glycosyltransferase (gtr) operons acquired by horizontal gene transfer. Here we investigate the role of two gtr operons that we identified in the S. Typhi genome. Strains were engineered to express specific gtr operons. Full chemical analysis of the O-antigens of these strains identified gtr-dependent glucosylation and acetylation. The glucosylated form of the O-antigen mediated enhanced survival in human serum and decreased complement binding. A single nucleotide deviation from an epigenetic phase variation signature sequence rendered the expression of this glucosylating gtr operon uniform in the population. In contrast, the expression of the acetylating gtrC gene is controlled by epigenetic phase variation. Acetylation did not affect serum survival, but phase variation can be an immune evasion mechanism, and thus, this modification may contribute to persistence in a host. In murine immunization studies, both O-antigen modifications were generally immunodominant. Our results emphasize that natural O-antigen modifications should be taken into consideration when assessing responses to vaccines, especially O-antigen-based vaccines, and that the Salmonella gtr repertoire may confound the protective efficacy of broad-ranging Salmonella lipopolysaccharide conjugate vaccines.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kintz, Erica; Heiss, Christian; Black, Ian
Salmonella enterica serovar Typhi is a human-restricted Gram-negative bacterial pathogen responsible for causing an estimated 27 million cases of typhoid fever annually, leading to 217,000 deaths, and current vaccines do not offer full protection. The O-antigen side chain of the lipopolysaccharide is an immunodominant antigen, can define host-pathogen interactions, and is under consideration as a vaccine target for some Gram-negative species. The composition of the O-antigen can be modified by the activity of glycosyltransferase (gtr) operons acquired by horizontal gene transfer. Here we investigate the role of two gtr operons that we identified in the S. Typhi genome. Strains were engineered to express specific gtr operons. Full chemical analysis of the O-antigens of these strains identified gtr-dependent glucosylation and acetylation. The glucosylated form of the O-antigen mediated enhanced survival in human serum and decreased complement binding. A single nucleotide deviation from an epigenetic phase variation signature sequence rendered the expression of this glucosylating gtr operon uniform in the population. In contrast, the expression of the acetylating gtrC gene is controlled by epigenetic phase variation. Acetylation did not affect serum survival, but phase variation can be an immune evasion mechanism, and thus, this modification may contribute to persistence in a host. In murine immunization studies, both O-antigen modifications were generally immunodominant. Our results emphasize that natural O-antigen modifications should be taken into consideration when assessing responses to vaccines, especially O-antigen-based vaccines, and that the Salmonella gtr repertoire may confound the protective efficacy of broad-ranging Salmonella lipopolysaccharide conjugate vaccines.
Lacerra, Giuseppina; Fiorito, Mirella; Musollino, Gennaro; Di Noce, Francesca; Esposito, Maria; Nigro, Vincenzo; Gaudiano, Carlo; Carestia, Clementina
2004-10-01
The alpha-globin chains are encoded by two duplicated genes (HBA2 and HBA1, 5'-3') showing overall sequence homology >96% and average CG content >60%. alpha-Thalassemia, the most prevalent worldwide autosomal recessive disorder, is a hereditary anemia caused by sequence variations of these genes in about 25% of carriers. We evaluated the overall sensitivity and suitability of DHPLC and DG-DGGE in scanning both the alpha-globin genes by carrying out a retrospective analysis of 19 variant alleles in 29 genotypes. The HBA2 alleles c.1A>G, c.79G>A, and c.281T>G, and the HBA1 allele c.475C>A were new. Three pathogenic sequence variations were associated in cis with nonpathogenic variations in all families studied; they were the HBA2 variation c.2T>C associated with c.-24C>G, and the HBA2 variations c.391G>C and c.427T>C, both associated with c.565G>A. We set up original experimental conditions for DHPLC and DG-DGGE and analyzed 10 normal subjects, 46 heterozygotes, seven homozygotes, seven compound heterozygotes, and six compound heterozygotes for a hybrid gene. Both the methodologies gave reproducible results and no false-positive was detected. DHPLC showed 100% sensitivity and DG-DGGE nearly 90%. About 100% of the sequence from the cap site to the polyA addition site could be scanned by DHPLC, about 87% by DG-DGGE. It is noteworthy that the three most common pathogenic sequence variations (HBA2 alleles c.2T>C, c.95+2_95+6del, and c.523A>G) were unambiguously detected by both the methodologies. Genotype diagnosis must be confirmed with PCR sequencing of single amplicons or with an allele-specific method. This study can be helpful for scanning genes with high CG content and offers a model suitable for duplicated genes with high homology. Copyright 2004 Wiley-Liss, Inc.
Read clouds uncover variation in complex regions of the human genome
Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E.; West, Robert; Sidow, Arend; Batzoglou, Serafim
2015-01-01
Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. PMID:26286554
Braus, Michael J; Graham, Linda E; Whitman, Thea L
2017-12-01
The branched periphytic green alga Cladophora glomerata, often abundant in nearshore waters of lakes and rivers worldwide, plays important ecosystem roles, some mediated by epibiotic microbiota that benefit from host-provided surface, organic C, and O2. Previous microscopy and high-throughput sequencing studies have indicated surprising epibiont taxonomic and functional diversity, but have not included adequate consideration of sample replication or the potential for spatial and temporal variation. Here, we report the results of 16S rRNA amplicon-based phylum-to-genus taxonomic analysis of Cladophora-associated bacterial epibiota sampled in replicate from three microsites and at six times during the open-water season of 2014, from the same lake locale (Picnic Point, Lake Mendota, Dane Co., WI, USA) explored by high-throughput sequencing studies in two previous years. Statistical methods were used to test null hypotheses that the bacterial community: (i) is homogeneous across microsites tested, and (ii) does not change over the course of a growth season or among successive years. Results indicated a dynamic microbial community that is more strongly influenced by sampling day during the growth season than by microsite variation. A surprising diversity of bacterial genera known to be associated with the key function of methane oxidation (methanotrophy), including relatively high abundance of Crenothrix, Methylomonas, Methylovulum, and Methylocaldum, showed intraseasonal and interannual variability possibly related to temperature differences, and microsite preferences possibly related to variation in methane abundance. By contrast, a core assemblage of bacterial genera seems to persist over a growth season and from year to year, possibly transmitted by a persistent attached host resting stage. © 2017 Phycological Society of America.
ACTG: novel peptide mapping onto gene models.
Choi, Seunghyuk; Kim, Hyunwoo; Paek, Eunok
2017-04-15
In many proteogenomic applications, mapping peptide sequences onto genome sequences can be very useful, because it allows us to understand origins of the gene products. Existing software tools either take the genomic position of a peptide start site as an input or assume that the peptide sequence exactly matches the coding sequence of a given gene model. In case of novel peptides resulting from genomic variations, especially structural variations such as alternative splicing, these existing tools cannot be directly applied unless users supply information about the variant, either its genomic position or its transcription model. Mapping potentially novel peptides to genome sequences, while allowing certain genomic variations, requires introducing novel gene models when aligning peptide sequences to gene structures. We have developed a new tool called ACTG (Amino aCids To Genome), which maps peptides to the genome, assuming all possible single exon skipping, junction variation allowing three edit distances from the original splice sites, exon extension and frame shift. In addition, it can also consider SNVs (single nucleotide variations) during the mapping phase if a user provides a VCF (variant call format) file as an input. Available at http://prix.hanyang.ac.kr/ACTG/search.jsp. Contact: eunokpaek@hanyang.ac.kr. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
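To make the idea of "introducing novel gene models" concrete, the following is a minimal sketch of one of the variations the abstract lists, single exon skipping. It is not ACTG's implementation: the function names, the restriction to internal exons, the forward-strand-only search, and exact substring matching are all assumptions made for illustration. It builds the annotated transcript plus every transcript with one internal exon removed, translates each in three frames with the standard genetic code, and reports where a peptide is found.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    return "".join(CODON[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def single_exon_skips(exons):
    """Yield (skipped_index, transcript) for each model missing one internal exon.

    Assumption: only internal exons (not the first or last) may be skipped.
    """
    for i in range(1, len(exons) - 1):
        yield i, "".join(exons[:i] + exons[i + 1:])


def find_peptide(peptide, exons):
    """Return (skipped_index, frame) pairs whose translation contains the peptide.

    skipped_index is None for the annotated (unmodified) gene model.
    """
    models = [(None, "".join(exons))] + list(single_exon_skips(exons))
    hits = []
    for skipped, transcript in models:
        for frame in range(3):
            if peptide in translate(transcript[frame:]):
                hits.append((skipped, frame))
    return hits


# Toy gene: the peptide AKF only appears when the middle exon is skipped.
toy_exons = ["ATGGCT", "GGGGGG", "AAATTT"]
print(find_peptide("AKF", toy_exons))
```

A peptide that matches only a skipped-exon model, as in the toy gene above, is the kind of candidate a proteogenomic search would flag as evidence for an unannotated splice form.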
Fishman, G A; Stone, E M; Grover, S; Derlacki, D J; Haines, H L; Hockey, R R
1999-04-01
To report the spectrum of ophthalmic findings in patients with Stargardt dystrophy or fundus flavimaculatus who have a specific sequence variation in the ABCR gene. Twenty-nine patients with Stargardt dystrophy or fundus flavimaculatus from different pedigrees were identified with possible disease-causing sequence variations in the ABCR gene from a group of 66 patients who were screened for sequence variations in this gene. Patients underwent a routine ocular examination, including slitlamp biomicroscopy and a dilated fundus examination. Fluorescein angiography was performed on 22 patients, and electroretinographic measurements were obtained on 24 of 29 patients. Kinetic visual fields were measured with a Goldmann perimeter in 26 patients. Single-strand conformation polymorphism analysis and DNA sequencing were used to identify variations in coding sequences of the ABCR gene. Three clinical phenotypes were observed among these 29 patients. In phenotype I, 9 of 12 patients had a sequence change in exon 42 of the ABCR gene in which the amino acid glutamic acid was substituted for glycine (Gly1961Glu). In only 4 of these 9 patients was a second possible disease-causing mutation found on the other ABCR allele. In addition to an atrophic-appearing macular lesion, phenotype I was characterized by localized perifoveal yellowish white flecks, the absence of a dark choroid, and normal electroretinographic amplitudes. Phenotype II consisted of 10 patients who showed a dark choroid and more diffuse yellowish white flecks in the fundus. None exhibited the Gly1961Glu change. Phenotype III consisted of 7 patients who showed extensive atrophic-appearing changes of the retinal pigment epithelium. Electroretinographic cone and rod amplitudes were reduced. One patient showed the Gly1961Glu change. A wide variation in clinical phenotype can occur in patients with sequence changes in the ABCR gene. 
In individual patients, a certain phenotype seems to be associated with the presence of a Gly1961Glu change in exon 42 of the ABCR gene. The identification of correlations between specific mutations in the ABCR gene and clinical phenotypes will better facilitate the counseling of patients on their visual prognosis. This information will also likely be important for future therapeutic trials in patients with Stargardt dystrophy.
VaDiR: an integrated approach to Variant Detection in RNA.
Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy
2018-02-01
Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered, but its cost currently limits its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA (VaDiR) that integrates 3 variant callers, namely: SNPiR, RVBoost, and MuTect2. The combination of all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
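The tiering idea in the abstract above can be illustrated with plain set operations. The following is a minimal sketch, not VaDiR's actual implementation: it assumes each caller's output has already been reduced to a set of (chromosome, position, ref, alt) tuples, and it reads "Tier 1" as the intersection of all three callers. The function names and the toy variants are hypothetical.

```python
# Sketch only: variant calls are assumed to be pre-parsed into sets of
# (chrom, pos, ref, alt) tuples. Names and data are hypothetical.

def tier1(snpir, rvboost, mutect2):
    """Variants reported by all three callers (the abstract's 'Tier 1')."""
    return snpir & rvboost & mutect2


def precision_recall(called, truth):
    """Precision and recall of a call set against DNA-validated variants."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy call sets (hypothetical coordinates, for illustration only).
snpir = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")}
rvboost = {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")}
mutect2 = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}
dna_validated = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}

consensus = tier1(snpir, rvboost, mutect2)
print(consensus, precision_recall(consensus, dna_validated))
```

Requiring agreement among all callers shrinks the call set, which tends to raise precision at the expense of recall; relaxing the agreement requirement trades in the other direction, consistent with the pattern the abstract reports.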
Intra-isolate genome variation in arbuscular mycorrhizal fungi persists in the transcriptome.
Boon, E; Zimmerman, E; Lang, B F; Hijri, M
2010-07-01
Arbuscular mycorrhizal fungi (AMF) are heterokaryotes with an unusual genetic makeup. Substantial genetic variation occurs among nuclei within a single mycelium or isolate. AMF reproduce through spores that contain varying fractions of this heterogeneous population of nuclei. It is not clear whether this genetic variation on the genome level actually contributes to the AMF phenotype. To investigate the extent to which polymorphisms in nuclear genes are transcribed, we analysed the intra-isolate genomic and cDNA sequence variation of two genes, the large subunit ribosomal RNA (LSU rDNA) of Glomus sp. DAOM-197198 (previously known as G. intraradices) and the POL1-like sequence (PLS) of Glomus etunicatum. For both genes, we find high sequence variation at the genome and transcriptome level. Reconstruction of LSU rDNA secondary structure shows that all variants are functional. Patterns of PLS sequence polymorphism indicate that there is one functional gene copy, PLS2, which is preferentially transcribed, and one gene copy, PLS1, which is a pseudogene. This is the first study that investigates AMF intra-isolate variation at the transcriptome level. In conclusion, it is possible that, in AMF, multiple nuclear genomes contribute to a single phenotype.
Whole-Genome Sequence Variation among Multiple Isolates of Pseudomonas aeruginosa
Spencer, David H.; Kas, Arnold; Smith, Eric E.; Raymond, Christopher K.; Sims, Elizabeth H.; Hastings, Michele; Burns, Jane L.; Kaul, Rajinder; Olson, Maynard V.
2003-01-01
Whole-genome shotgun sequencing was used to study the sequence variation of three Pseudomonas aeruginosa isolates, two from clonal infections of cystic fibrosis patients and one from an aquatic environment, relative to the genomic sequence of reference strain PAO1. The majority of the PAO1 genome is represented in these strains; however, at least three prominent islands of PAO1-specific sequence are apparent. Conversely, ∼10% of the sequencing reads derived from each isolate fail to align with the PAO1 backbone. While average sequence variation among all strains is roughly 0.5%, regions of pronounced differences were evident in whole-genome scans of nucleotide diversity. We analyzed two such divergent loci, the pyoverdine and O-antigen biosynthesis regions, by complete resequencing. A thorough analysis of isolates collected over time from one of the cystic fibrosis patients revealed independent mutations resulting in the loss of O-antigen synthesis alternating with a mucoid phenotype. Overall, we conclude that most of the PAO1 genome represents a core P. aeruginosa backbone sequence while the strains addressed in this study possess additional genetic material that accounts for at least 10% of their genomes. Approximately half of these additional sequences are novel. PMID:12562802
Kohyama, Tetsuo I; Omote, Keita; Nishida, Chizuko; Takenaka, Takeshi; Saito, Keisuke; Fujimoto, Satoshi; Masuda, Ryuichi
2015-01-01
Quantifying intraspecific genetic variation in functionally important genes, such as those of the major histocompatibility complex (MHC), is important in the establishment of conservation plans for endangered species. The MHC genes play a crucial role in the vertebrate immune system and generally show high levels of diversity, which is likely due to pathogen-driven balancing selection. The endangered Blakiston's fish owl (Bubo blakistoni) has suffered marked population declines on Hokkaido Island, Japan, during the past several decades due to human-induced habitat loss and fragmentation. We investigated the spatial and temporal patterns of genetic diversity in MHC class IIβ genes in Blakiston's fish owl, using massively parallel pyrosequencing. We found that the Blakiston's fish owl genome contains at least eight MHC class IIβ loci, indicating recent gene duplications. An analysis of sequence polymorphism provided evidence that balancing selection acted in the past. The level of MHC variation, however, was low in the current fish owl populations in Hokkaido: only 19 alleles were identified from 174 individuals. We detected considerable spatial differences in MHC diversity among the geographically isolated populations. We also detected a decline of MHC diversity in some local populations during the past decades. Our study demonstrated that the current spatial patterns of MHC variation in Blakiston's fish owl populations have been shaped by loss of variation due to the decline and fragmentation of populations, and that the short-term effects of genetic drift have counteracted the long-term effects of balancing selection.
Chen, Fen; Li, Juan; Sugiyama, Hiromu; Zhou, Dong-Hui; Song, Hui-Qun; Zhao, Guang-Hui; Zhu, Xing-Quan
2015-02-01
The present study examined sequence variability in the mitochondrial (mt) protein-coding genes cytochrome b (cytb) and NADH dehydrogenase subunits 2 and 6 (nad2 and nad6) among 24 isolates of Schistosoma japonicum from different endemic regions in the Philippines, Japan and China. The complete cytb, nad2 and nad6 genes were amplified and sequenced separately from individual schistosomes. Sequence variations for isolates from the Philippines were 0-0.5% for cytb, 0-0.6% for nad2, and 0-0.9% for nad6. Variation was 0-0.5%, 0.1-0.8% and 0-0.7% for the corresponding genes in schistosome samples from mainland China. For worms in Japan, genetic variations were 0-0.2%, 0.1-0.2% and 0 for the three genes, respectively. Sequence variations were 0-1.0%, 0-1.8% and 0-1.1% for cytb, nad2 and nad6, respectively, among schistosome isolates from different geographical strains in the Philippines, Japan and China. Of the three countries, the lowest sequence variations were found between isolates from mainland China and the Philippines, and the highest were detected between Japan and the Philippines, in all three mtDNA genes. Phylogenetic analyses based on the combined sequences of cytb, nad2 and nad6 revealed that all isolates in the Philippines clustered together as a sister group to samples from Yunnan and Zhejiang provinces in China, while isolates from Yamanashi in Japan formed a solitary clade. These results demonstrated the usefulness of the combined three mtDNA sequences for studying genetic diversity and population structure among S. japonicum isolates from the Philippines, China and Japan.
Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.
2007-12-11
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.
Invasive cleavage of nucleic acids
Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.
1999-01-01
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.
Invasive cleavage of nucleic acids
Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.
2002-01-01
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.
Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.
2010-11-09
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.
Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.
2000-01-01
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.
Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann; Dahlberg, James E.
2005-04-05
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.
Jumping genes: Genomic ballast or powerhouse of biological diversification.
Choudhury, Rimjhim Roy; Parisod, Christian
2017-09-01
Studying hybridization has the potential to elucidate challenging questions in evolutionary biology such as the nature of adaptive genetic variation and reproductive isolation. A growing body of work highlights that the merging of divergent genomes goes beyond the reshuffling of standing variation from related species and promotes mutations (Abbott et al., ). However, to what extent such genome instability generates evolutionary significant variation remains largely elusive. In this issue of Molecular Ecology, Dennenmoser et al. () report considerable dynamics of transposable elements (TEs) in a recent invasive fish species of hybrid origin (Cottus; Figure ). It adds to the recent examples from plants to support TE-specific genome variation following hybridization. Insights from early, as well as established, hybrids are largely coherent with increased TE activity, and this fish system thus represents an inspiring opportunity to further address the possible association between genome dynamics and "rapid evolution of hybrid species." This work based on genome (re)sequencing contrasts with prior transcriptomics or PCR-based studies of TEs and illustrates how unprecedented amount of information promises a better understanding of the multiple patterns of variation across eukaryotic genomes; provided that we get the better of methodological advances. As discussed here, unbiased assessment of TE variation from genome surveys indeed remains a challenge precluding firm conclusions to be reached about the evolutionary significance of TEs. Despite methodological and conceptual developments that appear necessary to unambiguously uncover the unexplored iceberg below the known tip, the role of coding genes vs. TEs in promoting adaptation and speciation might be clarified in a not so remote future. © 2017 John Wiley & Sons Ltd.
Kumar, Girish; Kocour, Martin; Kunal, Swaraj Priyaranjan
2016-05-01
To assess the DNA sequence variation and phylogenetic relationships among five tuna species (Auxis thazard, Euthynnus affinis, Katsuwonus pelamis, Thunnus tonggol, and T. albacares) representing all four tuna genera, partial sequences of the mitochondrial DNA (mtDNA) D-loop region were analyzed. The estimate of intra-specific sequence variation in the studied species was low, ranging from 0.027 to 0.080 [Kimura's two parameter distance (K2P)], whereas values of inter-specific variation ranged from 0.049 to 0.491. The longtail tuna (T. tonggol) and yellowfin tuna (T. albacares) were found to share a close relationship (K2P = 0.049), while skipjack tuna (K. pelamis) was the most divergent species studied. Phylogenetic analysis using Maximum-Likelihood (ML) and Neighbor-Joining (NJ) methods supported the monophyletic origin of Thunnus species. Similarly, the phylogeny of Auxis and Euthynnus species substantiates their monophyly. However, results showed a distinct origin of K. pelamis from genus Thunnus as well as from Auxis and Euthynnus. Thus, the mtDNA D-loop region sequence data support the polyphyletic origin of tuna species.
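For readers unfamiliar with the K2P values quoted above, Kimura's two-parameter distance between two aligned sequences is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions. The sketch below is a minimal illustration of that formula, not the software used in the study; the function name is hypothetical, and it assumes pre-aligned sequences of equal length and simply skips sites with gaps or ambiguous bases.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two pre-aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    inner = (1 - 2 * p - q) * (1 - 2 * q) ** 0.5 if 1 - 2 * q > 0 else 0.0
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)


# One transition (A->G) in ten sites: P = 0.1, Q = 0.
print(round(k2p_distance("AAAAACCCCC", "GAAAACCCCC"), 4))
```

Unlike a raw proportion of differing sites, the correction weights transitions and transversions separately, which is why it is a common choice for rapidly evolving regions such as the mtDNA D-loop.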
The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes.
Adams, David J; Doran, Anthony G; Lilue, Jingtao; Keane, Thomas M
2015-10-01
The Mouse Genomes Project was initiated in 2009 with the goal of using next-generation sequencing technologies to catalogue molecular variation in the common laboratory mouse strains, and a selected set of wild-derived inbred strains. The initial sequencing and survey of sequence variation in 17 inbred strains was completed in 2011 and included a comprehensive catalogue of single nucleotide polymorphisms, short insertion/deletions, larger structural variants including their fine scale architecture and landscape of transposable element variation, and genomic sites subject to post-transcriptional alteration of RNA. From this beginning, the resource has expanded significantly to include 36 fully sequenced inbred laboratory mouse strains, a refined and updated data processing pipeline, and new variation querying and data visualisation tools which are available on the project's website (http://www.sanger.ac.uk/resources/mouse/genomes/). The focus of the project is now the completion of de novo assembled chromosome sequences and strain-specific gene structures for the core strains. We discuss how the assembled chromosomes will power comparative analysis, data access tools and future directions of mouse genetics.
Sampson, Juliana K; Sheth, Nihar U; Koparde, Vishal N; Scalora, Allison F; Serrano, Myrna G; Lee, Vladimir; Roberts, Catherine H; Jameson-Lee, Max; Ferreira-Gonzalez, Andrea; Manjili, Masoud H; Buck, Gregory A; Neale, Michael C; Toor, Amir A
2014-08-01
Whole exome sequencing (WES) was performed on stem cell transplant donor-recipient (D-R) pairs to determine the extent of potential antigenic variation at a molecular level. In a small cohort of D-R pairs, a high frequency of sequence variation was observed between the donor and recipient exomes independent of human leucocyte antigen (HLA) matching. Nonsynonymous, nonconservative single nucleotide polymorphisms were approximately twice as frequent in HLA-matched unrelated, compared with related D-R pairs. When mapped to individual chromosomes, these polymorphic nucleotides were uniformly distributed across the entire exome. In conclusion, WES reveals extensive nucleotide sequence variation in the exomes of HLA-matched donors and recipients. © 2014 John Wiley & Sons Ltd.
2011-01-01
Background: Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires an accurate identification of the different types of variation in individual genomes. Results: We report the integration of the whole genome sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of resequencing SNP detection was assessed by combining SNPs that were identified to be either in identity by descent (IBD) or in copy number variation (CNV) with results from SNP array genotyping. Coding insertions and deletions (indels) were found to be enriched for size in multiples of 3 and were located near the N- and C-termini of proteins. For larger indels, a combination of split-read and read-pair approaches proved to be complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions: Our results provide high resolution mapping of diverse classes of genomic variation in an individual bovine genome and demonstrate that structural variation surpasses sequence variation as the main component of genomic variability. Better accuracy of SNP detection was achieved with little loss of sensitivity when algorithms that implemented mapping quality were used. IBD regions were found to be instrumental for calculating resequencing SNP accuracy, while SNP detection within CNVs tended to be less reliable. CNV discovery was affected dramatically by platform resolution and coverage biases. The combined data for this study showed that at a moderate level of sequencing coverage, an ensemble of platforms and tools can be applied together to maximize the accurate detection of sequence and structural variants. PMID:22082336
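The observation that coding indels are enriched for sizes in multiples of 3 reflects the fact that such indels preserve the reading frame. As a hedged illustration (not the authors' pipeline), the sketch below classifies indels by length and reports the in-frame fraction; it assumes each indel is given as a VCF-style (ref, alt) allele pair, and the function names and toy alleles are hypothetical.

```python
def is_frame_preserving(ref, alt):
    """True if the indel changes sequence length by a nonzero multiple of 3."""
    size = len(alt) - len(ref)
    return size != 0 and size % 3 == 0


def in_frame_fraction(indels):
    """Fraction of (ref, alt) indels whose size is a multiple of 3."""
    sizes_ok = [is_frame_preserving(ref, alt) for ref, alt in indels]
    return sum(sizes_ok) / len(sizes_ok) if sizes_ok else 0.0


# Toy coding indels (hypothetical alleles, for illustration only).
coding_indels = [
    ("A", "ACGT"),   # 3 bp insertion: in frame
    ("ACGT", "A"),   # 3 bp deletion: in frame
    ("A", "AC"),     # 1 bp insertion: frameshift
    ("ACG", "A"),    # 2 bp deletion: frameshift
]
print(in_frame_fraction(coding_indels))
```

If indel sizes were unconstrained, roughly one third would be expected to fall on a multiple of 3, so a markedly higher fraction in coding sequence is the kind of enrichment the abstract describes.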
Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.
2011-02-01
Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain 31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.
Estimating nonrigid motion from inconsistent intensity with robust shape features
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Wenyang; Ruan, Dan, E-mail: druan@mednet.ucla.edu; Department of Radiation Oncology, University of California, Los Angeles, California 90095
2013-12-15
Purpose: To develop a nonrigid motion estimation method that is robust to heterogeneous intensity inconsistencies amongst the image pairs or image sequence. Methods: Intensity and contrast variations, as in dynamic contrast enhanced magnetic resonance imaging, present a considerable challenge to registration methods based on general discrepancy metrics. In this study, the authors propose and validate a novel method that is robust to such variations by utilizing shape features. The geometry of interest (GOI) is represented with a flexible zero level set, segmented via well-behaved regularized optimization. The optimization energy drives the zero level set to high image gradient regions, and regularizes it with area and curvature priors. The resulting shape exhibits high consistency even in the presence of intensity or contrast variations. Subsequently, a multiscale nonrigid registration is performed to seek a regular deformation field that minimizes shape discrepancy in the vicinity of GOIs. Results: To establish the working principle, realistic 2D and 3D images were subject to simulated nonrigid motion and synthetic intensity variations, so as to enable quantitative evaluation of registration performance. The proposed method was benchmarked against three alternative registration approaches, specifically, optical flow, B-spline based mutual information, and multimodality demons. When intensity consistency was satisfied, all methods had comparable registration accuracy for the GOIs. When intensities among registration pairs were inconsistent, however, the proposed method yielded pronounced improvement in registration accuracy, with an approximate fivefold reduction in mean absolute error (MAE = 2.25 mm, SD = 0.98 mm), compared to optical flow (MAE = 9.23 mm, SD = 5.36 mm), B-spline based mutual information (MAE = 9.57 mm, SD = 8.74 mm) and multimodality demons (MAE = 10.07 mm, SD = 4.03 mm). 
Applying the proposed method on a real MR image sequence also provided qualitatively appealing results, demonstrating good feasibility and applicability of the proposed method. Conclusions: The authors have developed a novel method to estimate the nonrigid motion of GOIs in the presence of spatial intensity and contrast variations, taking advantage of robust shape features. Quantitative analysis and qualitative evaluation demonstrated good promise of the proposed method. Further clinical assessment and validation is being performed.
Estimating nonrigid motion from inconsistent intensity with robust shape features.
Liu, Wenyang; Ruan, Dan
2013-12-01
To develop a nonrigid motion estimation method that is robust to heterogeneous intensity inconsistencies amongst the image pairs or image sequence. Intensity and contrast variations, as in dynamic contrast enhanced magnetic resonance imaging, present a considerable challenge to registration methods based on general discrepancy metrics. In this study, the authors propose and validate a novel method that is robust to such variations by utilizing shape features. The geometry of interest (GOI) is represented with a flexible zero level set, segmented via well-behaved regularized optimization. The optimization energy drives the zero level set to high image gradient regions, and regularizes it with area and curvature priors. The resulting shape exhibits high consistency even in the presence of intensity or contrast variations. Subsequently, a multiscale nonrigid registration is performed to seek a regular deformation field that minimizes shape discrepancy in the vicinity of GOIs. To establish the working principle, realistic 2D and 3D images were subject to simulated nonrigid motion and synthetic intensity variations, so as to enable quantitative evaluation of registration performance. The proposed method was benchmarked against three alternative registration approaches, specifically, optical flow, B-spline based mutual information, and multimodality demons. When intensity consistency was satisfied, all methods had comparable registration accuracy for the GOIs. When intensities among registration pairs were inconsistent, however, the proposed method yielded pronounced improvement in registration accuracy, with an approximate fivefold reduction in mean absolute error (MAE = 2.25 mm, SD = 0.98 mm), compared to optical flow (MAE = 9.23 mm, SD = 5.36 mm), B-spline based mutual information (MAE = 9.57 mm, SD = 8.74 mm) and multimodality demons (MAE = 10.07 mm, SD = 4.03 mm). 
Applying the proposed method on a real MR image sequence also provided qualitatively appealing results, demonstrating good feasibility and applicability of the proposed method. The authors have developed a novel method to estimate the nonrigid motion of GOIs in the presence of spatial intensity and contrast variations, taking advantage of robust shape features. Quantitative analysis and qualitative evaluation demonstrated good promise of the proposed method. Further clinical assessment and validation is being performed.
Karas, Vlad O; Sinnott-Armstrong, Nicholas A; Varghese, Vici; Shafer, Robert W; Greenleaf, William J; Sherlock, Gavin
2018-01-01
Much of the within-species genetic variation is in the form of single nucleotide polymorphisms (SNPs), typically detected by whole genome sequencing (WGS) or microarray-based technologies. However, WGS produces mostly uninformative reads that perfectly match the reference, while microarrays require genome-specific reagents. We have developed Diff-seq, a sequencing-based mismatch detection assay for SNP discovery without the requirement for specialized nucleic-acid reagents. Diff-seq leverages the Surveyor endonuclease to cleave mismatched DNA molecules that are generated after cross-annealing of a complex pool of DNA fragments. Sequencing libraries enriched for Surveyor-cleaved molecules result in increased coverage at the variant sites. Diff-seq detected all mismatches present in an initial test substrate, with specific enrichment dependent on the identity and context of the variation. Application to viral sequences resulted in increased observation of variant alleles in a biologically relevant context. Diff-seq has the potential to increase the sensitivity and efficiency of high-throughput sequencing in the detection of variation. PMID:29361139
RSAT 2015: Regulatory Sequence Analysis Tools
Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques
2015-01-01
RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632
Potenza, L; Cafiero, M A; Camarda, A; La Salandra, G; Cucchiarini, L; Dachà, M
2009-10-01
In the present work mites previously identified as Dermanyssus gallinae De Geer (Acari, Mesostigmata) using morphological keys were investigated by molecular tools. The complete internal transcribed spacer 1 (ITS1), 5.8S ribosomal DNA, and ITS2 region of the ribosomal DNA from mites were amplified and sequenced to examine the level of sequence variations and to explore the feasibility of using this region in the identification of this mite. Conserved primers located at the 3'end of 18S and at the 5'start of 28S rRNA genes were used first, and amplified fragments were sequenced. Sequence analyses showed no variation in 5.8S and ITS2 region while slight intraspecific variations involving substitutions as well as deletions concentrated in the ITS1 region. Based on the sequence analyses a nested PCR of the ITS2 region followed by RFLP analyses has been set up in the attempt to provide a rapid molecular diagnostic tool of D. gallinae.
Mapping and phasing of structural variation in patient genomes using nanopore sequencing.
Cretu Stancu, Mircea; van Roosmalen, Markus J; Renkens, Ivo; Nieboer, Marleen M; Middelkamp, Sjors; de Ligt, Joep; Pregno, Giulia; Giachino, Daniela; Mandrile, Giorgia; Espejo Valle-Inclan, Jose; Korzelius, Jerome; de Bruijn, Ewart; Cuppen, Edwin; Talkowski, Michael E; Marschall, Tobias; de Ridder, Jeroen; Kloosterman, Wigard P
2017-11-06
Despite improvements in genomics technology, the detection of structural variants (SVs) from short-read sequencing still poses challenges, particularly for complex variation. Here we analyse the genomes of two patients with congenital abnormalities using the MinION nanopore sequencer and a novel computational pipeline, NanoSV. We demonstrate that nanopore long reads are superior to short reads with regard to detection of de novo chromothripsis rearrangements. The long reads also enable efficient phasing of genetic variations, which we leveraged to determine the parental origin of all de novo chromothripsis breakpoints and to resolve the structure of these complex rearrangements. Additionally, genome-wide surveillance of inherited SVs reveals novel variants, missed in short-read data sets, a large proportion of which are retrotransposon insertions. We provide a first exploration of patient genome sequencing with a nanopore sequencer and demonstrate the value of long-read sequencing in mapping and phasing of SVs for both clinical and research applications.
Human structural variation: mechanisms of chromosome rearrangements
Weckselblatt, Brooke; Rudd, M. Katharine
2015-01-01
Chromosome structural variation (SV) is a normal part of variation in the human genome, but some classes of SV can cause neurodevelopmental disorders. Analysis of the DNA sequence at SV breakpoints can reveal mutational mechanisms and risk factors for chromosome rearrangement. Large-scale SV breakpoint studies have become possible recently owing to advances in next-generation sequencing (NGS) including whole-genome sequencing (WGS). These findings have shed light on complex forms of SV such as triplications, inverted duplications, insertional translocations, and chromothripsis. Sequence-level breakpoint data resolve SV structure and determine how genes are disrupted, fused, and/or misregulated by breakpoints. Recent improvements in breakpoint sequencing have also revealed non-allelic homologous recombination (NAHR) between paralogous long interspersed nuclear element (LINE) or human endogenous retrovirus (HERV) repeats as a cause of deletions, duplications, and translocations. This review covers the genomic organization of simple and complex constitutional SVs, as well as the molecular mechanisms of their formation. PMID:26209074
Zdanowski, Marek K; Weglenski, Piotr; Golik, Pawel; Sasin, Joanna M; Borsuk, Piotr; Zmuda, Magdalena J; Stankovic, Anna
2004-11-01
The total number of bacteria and culturable bacteria in Adélie penguin (Pygoscelis adeliae) guano was determined during 42 days of decomposition in a location adjacent to the rookery in Admiralty Bay, King George Island, Antarctica. Of the culturable bacteria, 72 randomly selected colonies were described using 49 morpho-physiological tests, 27 of which were subsequently considered significant in characterizing and differentiating the isolates. On the basis of the nucleotide sequence of a fragment of the 16S rRNA gene in each of 72 pure isolates, three major phylogenetic groups were identified, namely the Moraxellaceae/Pseudomonadaceae (29 isolates), the Flavobacteriaceae (14), and the Micrococcaceae (29). Grouping of the isolates on the basis of morpho-physiological tests (whether 49 or 27 parameters) showed similar results to those based on 16S rRNA gene sequences. Clusters were characterized by considerable intra-cluster variation in both 16S rRNA gene sequences and morpho-physiological responses. High diversity in abundance and morphometry of total bacterial communities during penguin guano decomposition was supported by image analysis of epifluorescence micrographs. The results indicate that the bacterial community in penguin guano is not only one of the richest in Antarctica, but is extremely diverse, both phylogenetically and morpho-physiologically.
Guo, Juanjuan; Fu, Xiaoliang; Liao, Huidan; Hu, Zhenyu; Long, Lingling; Yan, Weitao; Ding, Yanjun; Zha, Lagabaiyila; Guo, Yadong; Yan, Jie; Chang, Yunfeng; Cai, Jifeng
2016-01-01
Decomposition is a complex process involving the interaction of both biotic and abiotic factors. Microbes play a critical role in the process of carrion decomposition. In this study, we analysed bacterial communities from live rats and rat remains decomposed under natural conditions, or excluding sarcosaphagous insect interference, in China using Illumina MiSeq sequencing of 16S rRNA gene amplicons. A total of 1,394,842 high-quality sequences and 1,938 singleton operational taxonomic units were obtained. Bacterial communities showed notable variation in relative abundance and became more similar to each other across body sites during the decomposition process. As decomposition progressed, Proteobacteria (mostly Gammaproteobacteria) became the predominant phylum in both the buccal cavity and rectum, while Firmicutes and Bacteroidetes in the mouth and rectum, respectively, gradually decreased. In particular, the arrival and oviposition of sarcosaphagous insects had no obvious influence on bacterial taxa composition, but accelerated the loss of biomass. In contrast to the rectum, the microbial community structure in the buccal cavity of live rats differed considerably from that of rats immediately after death. Although this research indicates that bacterial communities can be used as a “microbial clock” for the estimation of post-mortem interval, further work is required to better understand this concept. PMID:27052375
Leoni, Gabriele; De Poli, Andrea; Mardirossian, Mario; Gambato, Stefano; Florian, Fiorella; Venier, Paola; Wilson, Daniel N; Tossi, Alessandro; Pallavicini, Alberto; Gerdol, Marco
2017-08-22
The application of high-throughput sequencing technologies to non-model organisms has brought new opportunities for the identification of bioactive peptides from genomes and transcriptomes. From this point of view, marine invertebrates represent a potentially rich, yet largely unexplored resource for de novo discovery due to their adaptation to diverse challenging habitats. Bioinformatics analyses of available genomic and transcriptomic data allowed us to identify myticalins, a novel family of antimicrobial peptides (AMPs) from the mussel Mytilus galloprovincialis, and a similar family of AMPs from Modiolus spp., named modiocalins. Their coding sequence encompasses two conserved N-terminal (signal peptide) and C-terminal (propeptide) regions and a hypervariable central cationic region corresponding to the mature peptide. Myticalins are taxonomically restricted to Mytiloida and they can be classified into four subfamilies. These AMPs are subject to considerable interindividual sequence variability and possibly to presence/absence variation. Functional assays performed on selected members of this family indicate a remarkable tissue-specific expression (in gills) and broad spectrum of activity against both Gram-positive and Gram-negative bacteria. Overall, we present the first linear AMPs ever described in marine mussels and confirm the great potential of bioinformatics tools for the de novo discovery of bioactive peptides in non-model organisms.
Read clouds uncover variation in complex regions of the human genome.
Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E; West, Robert; Sidow, Arend; Batzoglou, Serafim
2015-10-01
Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. © 2015 Bishara et al.; Published by Cold Spring Harbor Laboratory Press.
Grievink, Liat Shavit; Penny, David; Hendy, Mike D; Holland, Barbara R
2009-01-01
Correction to Shavit Grievink L, Penny D, Hendy MD, Holland BR: LineageSpecificSeqgen: generating sequence data with lineage-specific variation in the proportion of variable sites. BMC Evol Biol 2008, 8(1):317.
BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations
Wang, Junbai; Batmanov, Kirill
2015-01-01
Sequence variations in regulatory DNA regions are known to cause functionally important consequences for gene expression. DNA sequence variations may have an essential role in determining phenotypes and may be linked to disease; however, their identification through analysis of massive genome-wide sequencing data is a great challenge. In this work, a new computational pipeline, a Bayesian method for protein–DNA interaction with binding affinity ranking (BayesPI-BAR), is proposed for quantifying the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein–DNA interactions to predict single nucleotide polymorphisms (SNPs) that cause significant changes in the binding affinity of a regulatory region for transcription factors (TFs). The method includes two new parameters (TF chemical potentials or protein concentrations and direct TF binding targets) that are neglected by previous methods. The new method is verified on 67 known human regulatory SNPs, of which 47 (70%) have predicted true TFs ranked in the top 10. Importantly, the performance of BayesPI-BAR, which uses principal component analysis to integrate multiple predictions from various TF chemical potentials, is found to be better than that of existing programs, such as sTRAP and is-rSNP, when evaluated on the same SNPs. BayesPI-BAR is a publicly available tool and is able to carry out parallelized computation, which helps to investigate a large number of TFs or SNPs and to detect disease-associated regulatory sequence variations in the sea of genome-wide noncoding regions. PMID:26202972
Melendrez, Melanie C.; Lange, Rachel K.; Cohan, Frederick M.; Ward, David M.
2011-01-01
Previous research has shown that sequences of 16S rRNA genes and 16S-23S rRNA internal transcribed spacer regions may not have enough genetic resolution to define all ecologically distinct Synechococcus populations (ecotypes) inhabiting alkaline, siliceous hot spring microbial mats. To achieve higher molecular resolution, we studied sequence variation in three protein-encoding loci sampled by PCR from 60°C and 65°C sites in the Mushroom Spring mat (Yellowstone National Park, WY). Sequences were analyzed using the ecotype simulation (ES) and AdaptML algorithms to identify putative ecotypes. Between 4 and 14 times more putative ecotypes were predicted from variation in protein-encoding locus sequences than from variation in 16S rRNA and 16S-23S rRNA internal transcribed spacer sequences. The number of putative ecotypes predicted depended on the number of sequences sampled and the molecular resolution of the locus. Chao estimates of diversity indicated that few rare ecotypes were missed. Many ecotypes hypothesized by sequence analyses were different in their habitat specificities, suggesting different adaptations to temperature or other parameters that vary along the flow channel. PMID:21169433
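To make the "Chao estimates of diversity" concrete, the sketch below computes the bias-corrected Chao1 richness estimator from a list of ecotype labels, one label per sampled sequence. The abstract does not say which Chao variant the authors used, so the choice of bias-corrected Chao1 is an assumption, and the labels are invented.

```python
from collections import Counter


def chao1(labels):
    """Bias-corrected Chao1: S_obs + F1*(F1-1) / (2*(F2+1)).

    F1 and F2 are the numbers of categories seen exactly once and twice.
    """
    abundances = Counter(labels)          # sequences per ecotype
    freq = Counter(abundances.values())   # how many ecotypes have each abundance
    s_obs = len(abundances)
    f1, f2 = freq[1], freq[2]
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Invented sample: ecotype A seen 3 times, B twice, C and D once each.
estimate = chao1(["A", "A", "A", "B", "B", "C", "D"])
```

When the estimate is close to the observed count, few rare categories are likely to have been missed, which is the kind of conclusion the abstract reports.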
Adaptable gene-specific dye bias correction for two-channel DNA microarrays
Margaritis, Thanasis; Lijnzaad, Philip; van Leenen, Dik; Bouwmeester, Diane; Kemmeren, Patrick; van Hooff, Sander R; Holstege, Frank CP
2009-01-01
DNA microarray technology is a powerful tool for monitoring gene expression or for finding the location of DNA-bound proteins. DNA microarrays can suffer from gene-specific dye bias (GSDB), causing some probes to be affected more by the dye than by the sample. This results in large measurement errors, which vary considerably for different probes and also across different hybridizations. GSDB is not corrected by conventional normalization and has been difficult to address systematically because of its variance. We show that GSDB is influenced by label incorporation efficiency, explaining the variation of GSDB across different hybridizations. A correction method (Gene- And Slide-Specific Correction, GASSCO) is presented, whereby sequence-specific corrections are modulated by the overall bias of individual hybridizations. GASSCO outperforms earlier methods and works well on a variety of publicly available datasets covering a range of platforms, organisms and applications, including ChIP on chip. A sequence-based model is also presented, which predicts which probes will suffer most from GSDB, useful for microarray probe design and correction of individual hybridizations. Software implementing the method is publicly available. PMID:19401678
Scrutinizing MHC-I binding peptides and their limits of variation.
Koch, Christian P; Perna, Anna M; Pillong, Max; Todoroff, Nickolay K; Wrede, Paul; Folkers, Gerd; Hiss, Jan A; Schneider, Gisbert
2013-01-01
Designed peptides that bind to major histocompatibility protein I (MHC-I) allomorphs bear the promise of representing epitopes that stimulate a desired immune response. A rigorous bioinformatical exploration of sequence patterns hidden in peptides that bind to the mouse MHC-I allomorph H-2K(b) is presented. We exemplify and validate these motif findings by systematically dissecting the epitope SIINFEKL and analyzing the resulting fragments for their binding potential to H-2K(b) in a thermal denaturation assay. The results demonstrate that only fragments exclusively retaining the carboxy- or amino-terminus of the reference peptide exhibit significant binding potential, with the N-terminal pentapeptide SIINF as shortest ligand. This study demonstrates that sophisticated machine-learning algorithms excel at extracting fine-grained patterns from peptide sequence data and predicting MHC-I binding peptides, thereby considerably extending existing linear prediction models and providing a fresh view on the computer-based molecular design of future synthetic vaccines. The server for prediction is available at http://modlab-cadd.ethz.ch (SLiDER tool, MHC-I version 2012).
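The systematic dissection of SIINFEKL described above can be pictured as enumerating the fragments that keep one terminus of the reference peptide. The following is a minimal sketch of that enumeration only. It says nothing about binding, and `terminal_fragments` and its `min_len` parameter are hypothetical names, not part of the SLiDER tool.

```python
def terminal_fragments(peptide, min_len=1):
    """Return (N-terminal, C-terminal) truncations, excluding the full peptide."""
    lengths = range(min_len, len(peptide))
    n_terminal = [peptide[:k] for k in lengths]   # keep the amino-terminus
    c_terminal = [peptide[-k:] for k in lengths]  # keep the carboxy-terminus
    return n_terminal, c_terminal


# Fragments of length 5 to 7; the first N-terminal one is the pentapeptide SIINF.
n_frags, c_frags = terminal_fragments("SIINFEKL", min_len=5)
```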
Zhou, Y; Ingelman-Sundberg, M; Lauschke, V M
2017-10-01
Genetic polymorphisms in cytochrome P450 (CYP) genes can result in altered metabolic activity toward a plethora of clinically important medications. Thus, single nucleotide variants and copy number variations in CYP genes are major determinants of drug pharmacokinetics and toxicity and constitute pharmacogenetic biomarkers for drug dosing, efficacy, and safety. Strikingly, the distribution of CYP alleles differs considerably between populations with important implications for personalized drug therapy and healthcare programs. To provide a global distribution map of CYP alleles with clinical importance, we integrated whole-genome and exome sequencing data from 56,945 unrelated individuals of five major human populations. By combining this dataset with population-specific linkage information, we derive the frequencies of 176 CYP haplotypes, providing an extensive resource for major genetic determinants of drug metabolism. Furthermore, we aggregated this dataset into spectra of predicted functional variability in the respective populations and discuss the implications for population-adjusted pharmacological treatment strategies. © 2017 The Authors Clinical Pharmacology & Therapeutics published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erlich, H.; Zangenberg, G.; Bugawan, T.
The rate at which allelic diversity at the HLA class I and class II loci evolves has been the subject of considerable controversy, as have the mechanisms which generate new alleles. The patchwork pattern of polymorphism, particularly within the second exon of the HLA-DPB1 locus where the polymorphic sequence motifs are localized to 6 discrete regions, is consistent with the hypothesis that much of the allelic sequence variation may have been generated by segmental exchange (gene conversion). To measure the rate of new DPB1 variant generation, we have developed a strategy in which DPB1 second exon sequences are amplified from pools of FACS-sorted sperm (n=50) from a heterozygous sperm donor. Pools of sperm from these heterozygous individuals are amplified with an allele-specific primer for one allele and analyzed with sequence-specific oligonucleotide probes (SSOP) complementary to the other allele. This screening procedure, which is capable of detecting a single variant molecule in a pool of parental alleles, allows the identification of new variants that have been generated by recombination and/or gene conversion between the two parental alleles. To control for potential PCR artifacts, the same screening procedure was carried out with mixtures of sperm from DPB1 *0301/*0301 and DPB1 *0401/*0401 individuals. Pools containing putative new variant DPB1 alleles were analyzed further by cloning into M13 and sequencing the M13 clones. Our current estimate is that about 1/10,000 sperm from these heterozygous individuals represents a new DPB1 allele generated by micro-gene conversion within the second exon.
Structural and Thermodynamic Signatures of DNA Recognition by Mycobacterium tuberculosis DnaA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tsodikov, Oleg V.; Biswas, Tapan
An essential protein, DnaA, binds to 9-bp DNA sites within the origin of replication oriC. These binding events are prerequisite to forming an enigmatic nucleoprotein scaffold that initiates replication. The number, sequences, positions, and orientations of these short DNA sites, or DnaA boxes, within the oriCs of different bacteria vary considerably. To investigate features of DnaA boxes that are important for binding Mycobacterium tuberculosis DnaA (MtDnaA), we have determined the crystal structures of the DNA binding domain (DBD) of MtDnaA bound to a cognate MtDnaA-box (at 2.0 Å resolution) and to a consensus Escherichia coli DnaA-box (at 2.3 Å). These structures, complemented by calorimetric equilibrium binding studies of MtDnaA DBD in a series of DnaA-box variants, reveal the main determinants of DNA recognition and establish the [T/C][T/A][G/A]TCCACA sequence as a high-affinity MtDnaA-box. Bioinformatic and calorimetric analyses indicate that DnaA-box sequences in mycobacterial oriCs generally differ from the optimal binding sequence. This sequence variation occurs commonly at the first 2 bp, making an in vivo mycobacterial DnaA-box effectively a 7-mer and not a 9-mer. We demonstrate that the decrease in the affinity of these MtDnaA-box variants for MtDnaA DBD relative to that of the highest-affinity box TTGTCCACA is less than 10-fold. The understanding of DnaA-box recognition by MtDnaA and E. coli DnaA enables one to map DnaA-box sequences in the genomes of M. tuberculosis and other eubacteria.
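As an illustration of how the consensus quoted above could be used to map candidate boxes, the sketch below scans both strands of a DNA string for [T/C][T/A][G/A]TCCACA with a regular expression. It is a simple exact-consensus scan under stated assumptions, not the authors' mapping procedure, and it will miss the lower-affinity variants the abstract describes. The function names and the test sequence are invented.

```python
import re

# Consensus from the abstract, written as a regex. The lookahead lets
# overlapping matches be reported as well.
DNAA_BOX = re.compile(r"(?=([TC][TA][GA]TCCACA))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
BOX_LEN = 9


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def find_dnaa_boxes(seq):
    """Return sorted (start, strand, site) tuples; start is the 0-based
    leftmost forward-strand coordinate of the 9-bp site."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in DNAA_BOX.finditer(seq)]
    for m in DNAA_BOX.finditer(reverse_complement(seq)):
        # Map a reverse-strand match back to forward-strand coordinates.
        hits.append((len(seq) - m.start() - BOX_LEN, "-", m.group(1)))
    return sorted(hits)


# Invented sequence containing the highest-affinity box TTGTCCACA.
boxes = find_dnaa_boxes("GGTTGTCCACAGG")
```

Reporting the strand matters because the abstract notes that box orientation varies between origins.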
In Silico Detection of Sequence Variations Modifying Transcriptional Regulation
Andersen, Malin C; Engström, Pär G; Lithwick, Stuart; Arenillas, David; Eriksson, Per; Lenhard, Boris; Wasserman, Wyeth W; Odeberg, Jacob
2008-01-01
Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation. PMID:18208319
Brown, J. R.; Beckenbach, K.; Beckenbach, A. T.; Smith, M. J.
1996-01-01
The extent of mtDNA length variation and heteroplasmy as well as DNA sequences of the control region and two tRNA genes were determined for four North American sturgeon species: Acipenser transmontanus, A. medirostris, A. fulvescens and A. oxyrhynchus. Across the Continental Divide, a division in the occurrence of length variation and heteroplasmy was observed that was concordant with species biogeography as well as with phylogenies inferred from restriction fragment length polymorphisms (RFLP) of whole mtDNA and pairwise comparisons of unique sequences of the control region. In all species, mtDNA length variation was due to repeated arrays of 78-82-bp sequences each containing a D-loop strand synthesis termination associated sequence (TAS). Individual repeats showed greater sequence conservation within individuals and species than between species, which is suggestive of concerted evolution. Differences in the frequencies of multiple copy genomes and heteroplasmy among the four species may be ascribed to differences in the rates of recurrent mutation. A mechanism that may offset the high rate of mutation for increased copy number is suggested on the basis that an increase in the number of functional TAS motifs might reduce the frequency of successfully initiated H-strand replications. PMID:8852850
Osborne, Megan J; Turner, Thomas F
2011-06-01
The major histocompatibility complex (MHC) is a critical component of the adaptive immune response in vertebrates. Due to the role that MHC plays in immunity, absence of variation within these genes may cause species to be vulnerable to emerging diseases. The freshwater fish family Cyprinidae comprises the most diverse and species-rich group of freshwater fish in the world, but some are imperiled. Despite considerable species richness and the long evolutionary history of the family, there are very few reports of MHC sequences (apart from a few model species), and no sequences are reported from endemic North American cyprinids (subfamily Leuciscinae). Here we isolate and characterize the MH Class II beta genes from complementary DNA and genomic DNA of the non-model, endangered Rio Grande silvery minnow (Hybognathus amarus), a North American cyprinid. Phylogenetic reconstruction revealed two groups of divergent MH alleles that are paralogous to previously described loci found in deeply divergent cyprinid taxa including common carp, zebrafish, African large barb and bream. Both groups of alleles were under the influence of diversifying selection yet not all individuals had alleles belonging to both allelic groups. We concluded that the general organization and pattern of variation of MH class II genes in Rio Grande silvery minnow is similar to that identified in other cyprinid fishes studied to date, despite distant evolutionary relationships and evidence of a severe genetic bottleneck. Copyright © 2011 Elsevier Ltd. All rights reserved.
Development and evaluation of a multi-locus sequence typing scheme for Mycoplasma synoviae.
Dijkman, R; Feberwee, A; Landman, W J M
2016-08-01
Reproducible molecular Mycoplasma synoviae typing techniques with sufficient discriminatory power may help to expand knowledge on its epidemiology and contribute to the improvement of control and eradication programmes of this mycoplasma species. The present study describes the development and validation of a novel multi-locus sequence typing (MLST) scheme for M. synoviae. Thirteen M. synoviae isolates originating from different poultry categories, farms and lesions, were subjected to whole genome sequencing. Their sequences were compared to that of M. synoviae reference strain MS53. A high number of single nucleotide polymorphisms (SNPs) indicating considerable genetic diversity were identified. SNPs were present in over 40 putative target genes for MLST of which five target genes were selected (nanA, uvrA, lepA, ruvB and ugpA) for the MLST scheme. This scheme was evaluated analysing 209 M. synoviae samples from different countries, categories of poultry, farms and lesions. Eleven clonal clusters and 76 different sequence types (STs) were obtained. Clustering occurred following geographical origin, supporting the hypothesis of regional population evolution. M. synoviae samples obtained from epidemiologically linked outbreaks often harboured the same ST. In contrast, multiple M. synoviae lineages were found in samples originating from swollen joints or oviducts from hens that produce eggs with eggshell apex abnormalities indicating that further research is needed to identify the genetic factors of M. synoviae that may explain its variations in tissue tropism and disease inducing potential. Furthermore, MLST proved to have a higher discriminatory power compared to variable lipoprotein and haemagglutinin A typing, which generated 50 different genotypes on the same database.
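The core bookkeeping of an MLST scheme can be sketched as follows: each distinct sequence at a locus receives an allele number, and each distinct combination of allele numbers across the five loci receives a sequence type (ST). This is a toy sketch under stated assumptions. Real schemes take their allele and ST numbers from a curated database rather than from order of appearance, and the isolate names and one-letter "sequences" are invented.

```python
# The five target genes named in the abstract.
LOCI = ("nanA", "uvrA", "lepA", "ruvB", "ugpA")


def assign_sequence_types(isolates):
    """Map isolate name -> ST, numbering alleles and STs in order of first appearance."""
    allele_ids = {locus: {} for locus in LOCI}
    st_ids = {}
    result = {}
    for name, seqs in isolates.items():
        # Allelic profile: one allele number per locus, in LOCI order.
        profile = tuple(
            allele_ids[locus].setdefault(seqs[locus], len(allele_ids[locus]) + 1)
            for locus in LOCI
        )
        result[name] = st_ids.setdefault(profile, len(st_ids) + 1)
    return result


# Invented isolates: two identical profiles and one differing only at ugpA.
toy = {
    "iso1": dict(nanA="A", uvrA="C", lepA="G", ruvB="T", ugpA="A"),
    "iso2": dict(nanA="A", uvrA="C", lepA="G", ruvB="T", ugpA="A"),
    "iso3": dict(nanA="A", uvrA="C", lepA="G", ruvB="T", ugpA="C"),
}
types = assign_sequence_types(toy)
```

Isolates sharing an ST, as iso1 and iso2 do here, are the pattern the abstract reports for epidemiologically linked outbreaks.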
Silberhorn, Elisabeth; Schwartz, Uwe; Symelka, Anne; de Koning-Ward, Tania; Längst, Gernot
2016-01-01
The packaging and organization of genomic DNA into chromatin represents an additional regulatory layer of gene expression, with specific nucleosome positions that restrict the accessibility of regulatory DNA elements. The mechanisms that position nucleosomes in vivo are thought to depend on the biophysical properties of the histones, sequence patterns, like phased di-nucleotide repeats and the architecture of the histone octamer that folds DNA in 1.65 tight turns. Comparative studies of human and P. falciparum histones reveal that the latter have a strongly reduced ability to recognize internal sequence dependent nucleosome positioning signals. In contrast, the nucleosomes are positioned by AT-repeat sequences flanking nucleosomes in vivo and in vitro. Further, the strong sequence variations in the plasmodium histones, compared to other mammalian histones, do not present adaptations to its AT-rich genome. Human and parasite histones bind with higher affinity to GC-rich DNA and with lower affinity to AT-rich DNA. However, the plasmodium nucleosomes are overall less stable, with increased temperature induced mobility, decreased salt stability of the histones H2A and H2B and considerably reduced binding affinity to GC-rich DNA, as compared with the human nucleosomes. In addition, we show that plasmodium histone octamers form the shortest known nucleosome repeat length (155 bp) in vitro and in vivo. Our data suggest that the biochemical properties of the parasite histones are distinct from the typical characteristics of other eukaryotic histones and these properties reflect the increased accessibility of the P. falciparum genome. PMID:28033404
Mitochondrial DNA Sequence Variation in North Atlantic Long-Finned Pilot Whales, Globicephala melas
1994-06-01
Delphinapterus leucas: mitochondrial DNA sequence variation within and among North American populations. M.Sc. thesis. McMaster University. Brown, G.G... Delphinapterus leucas) (Brennin 1992), minke whales (Balaenoptera acutorostrata) (Wada et al. 1991), bottlenose dolphins (Tursiops truncatus) (Dowling & Brown
Widespread Transient Hoogsteen Base-Pairs in Canonical Duplex DNA with Variable Energetics
Alvey, Heidi S.; Gottardo, Federico L.; Nikolova, Evgenia N.; Al-Hashimi, Hashim M.
2015-01-01
Hoogsteen base-pairing involves a 180 degree rotation of the purine base relative to Watson-Crick base-pairing within DNA duplexes, creating alternative DNA conformations that can play roles in recognition, damage induction, and replication. Here, using Nuclear Magnetic Resonance R1ρ relaxation dispersion, we show that transient Hoogsteen base-pairs occur across more diverse sequence and positional contexts than previously anticipated. We observe sequence-specific variations in Hoogsteen base-pair energetic stabilities that are comparable to variations in Watson-Crick base-pair stability, with Hoogsteen base-pairs being more abundant for energetically less favorable Watson-Crick base-pairs. Our results suggest that the variations in Hoogsteen stabilities and rates of formation are dominated by variations in Watson-Crick base pair stability, suggesting a late transition state for the Watson-Crick to Hoogsteen conformational switch. The occurrence of sequence and position-dependent Hoogsteen base-pairs provide a new potential mechanism for achieving sequence-dependent DNA transactions. PMID:25185517
CNV-seq, a new method to detect copy number variation using high-throughput sequencing.
Xie, Chao; Tammi, Martti T
2009-03-06
DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations. Here, we describe a method to detect copy number variation using shotgun sequencing, CNV-seq. The method is based on a robust statistical model that describes the complete analysis procedure and allows the computation of essential confidence values for detection of CNV. Our results show that the number of reads, not the length of the reads, is the key factor determining the resolution of detection. This favors the next-generation sequencing methods that rapidly produce large amounts of short reads. Simulation of various sequencing methods with coverage between 0.1x and 8x shows overall specificity between 91.7 and 99.9%, and sensitivity between 72.2 and 96.5%. We also show the results for assessment of CNV between two individual human genomes.
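The read-count idea behind this kind of analysis can be sketched by binning read start positions into fixed windows for two samples and taking a depth-normalised log2 ratio per window. This is only a simplified illustration. It omits the statistical model and confidence values that the abstract describes as the core of CNV-seq, and the function names, window size and read positions are all assumptions.

```python
import math
from collections import Counter


def window_counts(read_starts, window_size):
    """Count reads per window, keyed by window index."""
    return Counter(pos // window_size for pos in read_starts)


def log2_ratios(test_starts, ref_starts, window_size):
    """Per-window log2(test/ref), scaled by the total read counts of each sample."""
    test = window_counts(test_starts, window_size)
    ref = window_counts(ref_starts, window_size)
    scale = len(ref_starts) / len(test_starts)
    ratios = {}
    for w in sorted(set(test) | set(ref)):
        if test[w] == 0 or ref[w] == 0:
            continue  # ratio undefined when one sample has no reads here
        ratios[w] = math.log2(test[w] / ref[w] * scale)
    return ratios


# Invented read starts: the test sample has twice the reads in window 1.
ref_reads = [0, 1, 2, 3, 10, 11, 12, 13, 20, 21, 22, 23]
test_reads = [0, 1, 2, 3, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 22, 23]
ratios = log2_ratios(test_reads, ref_reads, window_size=10)
```

Because each window's ratio depends on how many reads fall into it, more reads per window give a steadier estimate, which is consistent with the abstract's point that read number rather than read length sets the resolution.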
Molecular mechanisms of epigenetic variation in plants.
Fujimoto, Ryo; Sasaki, Taku; Ishikawa, Ryo; Osabe, Kenji; Kawanabe, Takahiro; Dennis, Elizabeth S
2012-01-01
Natural variation is defined as the phenotypic variation caused by spontaneous mutations. In general, mutations are associated with changes of nucleotide sequence, and many mutations in genes that can cause changes in plant development have been identified. Epigenetic change, which does not involve alteration to the nucleotide sequence, can also cause changes in gene activity by changing the structure of chromatin through DNA methylation or histone modifications. Now there is evidence based on induced or spontaneous mutants that epigenetic changes can alter plant phenotypes. Epigenetic changes have occurred frequently in plants, and some are heritable or metastable, causing variation in epigenetic status within or between species. Therefore, heritable epigenetic variation as well as genetic variation has the potential to drive natural variation.
The study of human Y chromosome variation through ancient DNA.
Kivisild, Toomas
2017-05-01
High throughput sequencing methods have completely transformed the study of human Y chromosome variation by offering a genome-scale view on genetic variation retrieved from ancient human remains in the context of a growing number of high coverage whole Y chromosome sequence data from living populations from across the world. The ancient Y chromosome sequences are providing us with the first exciting glimpses into the past variation of the male-specific compartment of the genome and the opportunity to evaluate models based on previously made inferences from patterns of genetic variation in living populations. Analyses of the ancient Y chromosome sequences are challenging not only because of issues generally related to ancient DNA work, such as DNA damage-induced mutations and low content of endogenous DNA in most human remains, but also because of specific properties of the Y chromosome, such as its highly repetitive nature and high homology with the X chromosome. Shotgun sequencing of uniquely mapping regions of the Y chromosomes to sufficiently high coverage is still challenging and costly in poorly preserved samples. To increase the coverage of specific target SNPs, capture-based methods have been developed and used in recent years to generate Y chromosome sequence data from hundreds of prehistoric skeletal remains. Besides the prospect of testing directly how much genetic change in a given time period has accompanied changes in material culture, the sequencing of ancient Y chromosomes also allows us to better understand the rate at which mutations accumulate and get fixed over time. This review considers genome-scale evidence on ancient Y chromosome diversity that has recently started to accumulate in geographic areas favourable to DNA preservation. More specifically, the review focuses on examples of regional continuity and change of the Y chromosome haplogroups in North Eurasia and in the New World.
Detection of nucleic acid sequences by invader-directed cleavage
Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert
1999-01-01
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.
Evolution by epigenesis: farewell to Darwinism, neo- and otherwise.
Balon, Eugene K
2004-01-01
In the last 25 years, criticism of most theories advanced by Darwin and the neo-Darwinians has increased considerably, and so has their defense. Darwinism has become an ideology, while the most significant theories of Darwin were proven unsupportable. The critics advanced other theories instead of 'natural selection' and 'the survival of the fittest'. 'Saltatory ontogeny' and 'epigenesis' are such new theories proposed to explain how variations in ontogeny and novelties in evolution are created. They are reviewed again in the present essay that also tries to explain how Darwinians, artificially kept dominant in academia and in granting agencies, are preventing their acceptance. Epigenesis, the mechanism of ontogenies, creates in every generation alternative variations in a saltatory way that enable the organisms to survive in the changing environments as either altricial or precocial forms. The constant production of two such forms and their survival in different environments makes it possible, over a sequence of generations, to introduce changes and establish novelties--the true phenomena of evolution. The saltatory units of evolution remain far-from-stable structures capable of self-organization and self-maintenance (autopoiesis).
Africa: continent of genome contrasts with implications for biomedical research and health.
Ramsay, Michèle
2012-08-31
The genomic architecture of African populations is poorly understood and there is considerable variation between ethno-linguistic groups. Genome-wide approaches have been extensively applied to search for genetic associations to complex traits in Europeans, but rarely in Africans. This is largely attributed to lower levels of funding, poor infrastructure and public health systems, and to the small pool of trained scientists. High levels of genetic variation and underlying population structure in Africans present significant challenges, but lower levels of linkage disequilibrium provide an opportunity for more effective localisation of causal variants. High throughput technologies, including dense genotyping arrays, genome sequencing and epigenome studies, together with plummeting costs, are making research more affordable, even for African scientists. Understanding the interactions between genome structure and environmental influences is essential to interpreting their contributions to the increase in infectious diseases and non-communicable diseases, exacerbated by adverse environments and lifestyle choices. The unique genome dynamics in African populations have an important role to play in understanding human health and susceptibility to disease. Copyright © 2012. Published by Elsevier B.V.
Al-Bustan, Suzanne A; Al-Serri, Ahmad; Annice, Babitha G; Alnaqeeb, Majed A; Al-Kandari, Wafa Y; Dashti, Mohammed
2018-01-01
The role interethnic genetic differences play in plasma lipid level variation across populations is a global health concern. Several genes involved in lipid metabolism and transport are strong candidates for the genetic association with lipid level variation, especially lipoprotein lipase (LPL). The objective of this study was to re-sequence the full LPL gene in Kuwaiti Arabs, analyse the sequence variation and identify variants that could contribute to variation in plasma lipid levels for further genetic association. Samples (n = 100) of an Arab ethnic group from Kuwait were analysed for sequence variation by Sanger sequencing across the 30 Kb LPL gene and its flanking sequences. A total of 293 variants including 252 single nucleotide polymorphisms (SNPs) and 39 insertions/deletions (InDels) were identified among which 47 variants (32 SNPs and 15 InDels) were novel to Kuwaiti Arabs. This study is the first to report sequence data and analysis of frequencies of variants at the LPL gene locus in an Arab ethnic group with a novel “rare” variant (LPL:g.18704C>A) significantly associated with HDL (B = -0.181; 95% CI (-0.357, -0.006); p = 0.043), TG (B = 0.134; 95% CI (0.004–0.263); p = 0.044) and VLDL (B = 0.131; 95% CI (-0.001–0.263); p = 0.043) levels. Sequence variation in Kuwaiti Arabs was compared to other populations and was found to be similar with regard to the number of SNPs, InDels and distribution of the number of variants across the LPL gene locus and minor allele frequency (MAF). Moreover, comparison of the identified variants and their MAF with other reports provided a list of 46 potential variants across the LPL gene to be considered for future genetic association studies. The findings warrant further investigation into the association of g.18704C>A with lipid levels in other ethnic groups and with clinical manifestations of dyslipidemia. PMID:29438437
Demidov, German; Simakova, Tamara; Vnuchkova, Julia; Bragin, Anton
2016-10-22
Multiplex polymerase chain reaction (PCR) is a common enrichment technique for targeted massive parallel sequencing (MPS) protocols. MPS is widely used in biomedical research and clinical diagnostics as a fast and accurate tool for the detection of short genetic variations. However, identification of larger variations such as structural variants and copy number variations (CNV) remains a challenge for targeted MPS. Some approaches and tools for structural variant detection have been proposed, but they have limitations and often require datasets of a certain type, size and expected number of amplicons affected by CNVs. In this paper, we describe a novel algorithm for high-resolution germinal CNV detection in PCR-enriched targeted sequencing data and present an accompanying tool. We have developed a machine learning algorithm for the detection of large duplications and deletions in targeted sequencing data generated with a PCR-based enrichment step. We have performed verification studies and established the algorithm's sensitivity and specificity. We have compared the developed tool with other available methods applicable to the described data and found that it performs better. We showed that our method has high specificity and sensitivity for high-resolution copy number detection in targeted sequencing data using a large cohort of samples.
Diversity and adaptive evolution of Saccharomyces wine yeast: a review
Marsit, Souhir; Dequin, Sylvie
2015-01-01
Saccharomyces cerevisiae and related species, the main workhorses of wine fermentation, have been exposed to stressful conditions for millennia, potentially resulting in adaptive differentiation. As a result, wine yeasts have recently attracted considerable interest for studying the evolutionary effects of domestication. The widespread use of whole-genome sequencing during the last decade has provided new insights into the biodiversity, population structure, phylogeography and evolutionary history of wine yeasts. Comparisons between S. cerevisiae isolates from various origins have indicated that a variety of mechanisms, including heterozygosity, nucleotide and structural variations, introgressions, horizontal gene transfer and hybridization, contribute to the genetic and phenotypic diversity of S. cerevisiae. This review will summarize the current knowledge on the diversity and evolutionary history of wine yeasts, focusing on the domestication fingerprints identified in these strains. PMID:26205244
Characterisation of the subtelomeric regions of Giardia lamblia genome isolate WBC6.
Prabhu, Anjali; Morrison, Hilary G; Martinez, Charles R; Adam, Rodney D
2007-04-01
Giardia trophozoites are polyploid and have five chromosomes. The chromosome homologues demonstrate considerable size heterogeneity due to variation in the subtelomeric regions. We used clones from the genome project with telomeric sequence at one end to identify six subtelomeric regions in addition to previously identified subtelomeric regions, to study the telomeric arrangement of the chromosomes. The subtelomeric regions included two retroposons, one retroposon pseudogene, and two vsp genes, in addition to the previously identified subtelomeric regions that include ribosomal DNA repeats. The presence of vsp genes in a subtelomeric region suggests that telomeric rearrangements may contribute to the generation of vsp diversity. These studies of the subtelomeric regions of Giardia may contribute to our understanding of the factors that maintain stability, while allowing diversity in chromosome structure.
Relations between broad-band linear polarization and Ca II H and K emission in late-type dwarf stars
NASA Technical Reports Server (NTRS)
Huovelin, Juhani; Saar, Steven H.; Tuominen, Ilkka
1988-01-01
Broadband UBV linear polarization data acquired for a sample of late-type dwarfs are compared with contemporaneous measurements of Ca II H and K line core emission. A weighted average of the largest values of the polarization degree is shown to be the best parameter for chromospheric activity diagnosis. The average maximum polarization in the UV is found to increase from late-F to late-G stars. It is noted that polarization in the U band is considerably more sensitive to activity variations than that in the B or V bands. The results indicate that stellar magnetic fields and the resulting saturation in the Zeeman-sensitive absorption lines are the most probable source of linear polarization in late-type main-sequence stars.
Foster, Charles S P; Henwood, Murray J; Ho, Simon Y W
2018-05-25
Data sets comprising small numbers of genetic markers are not always able to resolve phylogenetic relationships. This has frequently been the case in molecular systematic studies of plants, with many analyses being based on sequence data from only two or three chloroplast genes. An example of this comes from the riceflowers Pimelea Banks & Sol. ex Gaertn. (Thymelaeaceae), a large genus of flowering plants predominantly distributed in Australia. Despite the considerable morphological variation in the genus, low sequence divergence in chloroplast markers has led to the phylogeny of Pimelea remaining largely uncertain. In this study, we resolve the backbone of the phylogeny of Pimelea in comprehensive Bayesian and maximum-likelihood analyses of plastome sequences from 41 taxa. However, some relationships received only moderate to poor support, and the Pimelea clade contained extremely short internal branches. By using topology-clustering analyses, we demonstrate that conflicting phylogenetic signals can be found across the trees estimated from individual chloroplast protein-coding genes. A relaxed-clock dating analysis reveals that Pimelea arose in the mid-Miocene, with most divergences within the genus occurring during a subsequent rapid diversification. Our new phylogenetic estimate offers better resolution and is more strongly supported than previous estimates, providing a platform for future taxonomic revisions of both Pimelea and the broader subfamily. Our study has demonstrated the substantial improvements in phylogenetic resolution that can be achieved using plastome-scale data sets in plant molecular systematics. Copyright © 2018 Elsevier Inc. All rights reserved.
Mikaeili, F; Mirhendi, H; Mohebali, M; Hosseini, M; Sharbatkhori, M; Zarei, Z; Kia, E B
2015-07-01
The study was conducted to determine the sequence variation in two mitochondrial genes, namely cytochrome c oxidase 1 (pcox1) and NADH dehydrogenase 1 (pnad1) within and among isolates of Toxocara cati, Toxocara canis and Toxascaris leonina. Genomic DNA was extracted from 32 isolates of T. cati, 9 isolates of T. canis and 19 isolates of T. leonina collected from cats and dogs in different geographical areas of Iran. Mitochondrial genes were amplified by polymerase chain reaction (PCR) and sequenced. Sequence data were aligned using the BioEdit software and compared with published sequences in GenBank. Phylogenetic analysis was performed using Bayesian inference and maximum likelihood methods. Based on pairwise comparison, intra-species genetic diversity within Iranian isolates of T. cati, T. canis and T. leonina amounted to 0-2.3%, 0-1.3% and 0-1.0% for pcox1 and 0-2.0%, 0-1.7% and 0-2.6% for pnad1, respectively. Inter-species sequence variation among the three ascaridoid nematodes was significantly higher, being 9.5-16.6% for pcox1 and 11.9-26.7% for pnad1. Sequence and phylogenetic analysis of the pcox1 and pnad1 genes indicated that there is significant genetic diversity within and among isolates of T. cati, T. canis and T. leonina from different areas of Iran, and these genes can be used for studying genetic variation of ascaridoid nematodes.
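The percentage ranges quoted in this abstract come from pairwise comparison of aligned sequences. The abstract does not state the exact distance measure, so the following is only a sketch under the assumption of an uncorrected p-distance (proportion of differing sites); the function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped rather than counted as differences.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def pairwise_range(seqs):
    """Minimum and maximum pairwise p-distance, in percent, over named sequences."""
    dists = [100 * p_distance(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)]
    return min(dists), max(dists)


# Hypothetical toy alignment of three isolates (10 sites each).
toy = {"iso1": "ATGCATGCAT", "iso2": "ATGCATGCAT", "iso3": "ATGCTTGCAT"}
low, high = pairwise_range(toy)
print(f"{low:.1f}-{high:.1f}%")
```

Running the same function within one species and then across species would give the two kinds of range reported above (intra-species and inter-species).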
Zhu, X Q; Gasser, R B
1998-06-01
In this study, we assessed single-strand conformation polymorphism (SSCP)-based approaches for their capacity to fingerprint sequence variation in ribosomal DNA (rDNA) of ascaridoid nematodes of veterinary and/or human health significance. The second internal transcribed spacer region (ITS-2) of rDNA was utilised as the target region because it is known to provide species-specific markers for this group of parasites. ITS-2 was amplified by PCR from genomic DNA derived from individual parasites and subjected to analysis. Direct SSCP analysis of amplicons from seven taxa (Toxocara vitulorum, Toxocara cati, Toxocara canis, Toxascaris leonina, Baylisascaris procyonis, Ascaris suum and Parascaris equorum) showed that the single-strand (ss) ITS-2 patterns produced allowed their unequivocal identification to species. While no variation in SSCP patterns was detected in the ITS-2 within four species for which multiple samples were available, the method allowed the direct display of four distinct sequence types of ITS-2 among individual worms of T. cati. Comparison of SSCP/sequencing with the methods of dideoxy fingerprinting (ddF) and restriction endonuclease fingerprinting (REF) revealed that ddF also allowed the definition of the four sequence types, whereas REF displayed three of the four. The findings indicate the usefulness of the SSCP-based approaches for the identification of ascaridoid nematodes to species, the direct display of sequence variation in rDNA and the detection of population variation. The ability to fingerprint microheterogeneity in ITS-2 rDNA using such approaches also has implications for studying fundamental aspects relating to mutational change in rDNA.
Copy number variation of individual cattle genomes using next-generation sequencing
USDA-ARS's Scientific Manuscript database
Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one ...
A high-resolution cattle CNV map by population-scale genome sequencing
USDA-ARS's Scientific Manuscript database
Copy Number Variations (CNVs) are common genomic structural variations that have been linked to human diseases and phenotypic traits. Prior studies in cattle have produced low-resolution CNV maps. We constructed a draft, high-resolution map of cattle CNVs based on whole genome sequencing data from 7...
Maize HapMap2 identifies extant variation from a genome in flux
USDA-ARS's Scientific Manuscript database
The maize genome is the largest, most diverse and complex plant genome sequenced to date. Using high-throughput sequencing to access genetic variation and a population genetics model to score the polymorphisms, we characterize and unite the diversity of the world’s key breeding germplasm, wild rela...
RSAT 2015: Regulatory Sequence Analysis Tools.
Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques
2015-07-01
RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Benz, Matthias R; Bongartz, Georg; Froehlich, Johannes M; Winkel, David; Boll, Daniel T; Heye, Tobias
2018-07-01
The aim was to investigate the variation of the arterial input function (AIF) within and between various DCE MRI sequences. A dynamic flow-phantom and steady signal reference were scanned on a 3T MRI using fast low angle shot (FLASH) 2d, FLASH3d (parallel imaging factor (P) = P0, P2, P4), volumetric interpolated breath-hold examination (VIBE) (P = P0, P3, P2 × 2, P2 × 3, P3 × 2), golden-angle radial sparse parallel imaging (GRASP), and time-resolved imaging with stochastic trajectories (TWIST). Signal over time curves were normalized and quantitatively analyzed by full width half maximum (FWHM) measurements to assess variation within and between sequences. The coefficient of variation (CV) for the steady signal reference ranged from 0.07-0.8%. The non-accelerated gradient echo FLASH2d, FLASH3d, and VIBE sequences showed low within sequence variation with 2.1%, 1.0%, and 1.6%. The maximum FWHM CV was 3.2% for parallel imaging acceleration (VIBE P2 × 3), 2.7% for GRASP and 9.1% for TWIST. The FWHM CV between sequences ranged from 8.5-14.4% for most non-accelerated/accelerated gradient echo sequences except 6.2% for FLASH3d P0 and 0.3% for FLASH3d P2; GRASP FWHM CV was 9.9% versus 28% for TWIST. MRI acceleration techniques vary in reproducibility and quantification of the AIF. Incomplete coverage of the k-space with TWIST as a representative of view-sharing techniques showed the highest variation within sequences and might be less suited for reproducible quantification of the AIF. Copyright © 2018 Elsevier B.V. All rights reserved.
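The two quantities this abstract relies on, the full width at half maximum (FWHM) of a signal-over-time curve and the coefficient of variation (CV) across repeated measurements, can be sketched as follows. This is not the authors' analysis code: it assumes a single-peaked curve that falls below half maximum on both sides of the peak, takes the curve minimum as baseline, and uses linear interpolation between samples. The function names and toy values are hypothetical.

```python
import statistics


def fwhm(times, signal):
    """Full width at half maximum of a single-peaked curve (linear interpolation)."""
    peak, base = max(signal), min(signal)
    half = base + (peak - base) / 2
    i_peak = signal.index(peak)

    def crossing(i0, i1):
        # Time at which the straight line between samples i0 and i1 equals `half`.
        t0, t1, s0, s1 = times[i0], times[i1], signal[i0], signal[i1]
        return t0 + (half - s0) * (t1 - t0) / (s1 - s0)

    # Last sample pair before the peak that rises through the half level.
    left = next(i for i in range(i_peak, 0, -1)
                if signal[i - 1] < half <= signal[i])
    # First sample pair after the peak that falls through the half level.
    right = next(i for i in range(i_peak, len(signal) - 1)
                 if signal[i] >= half > signal[i + 1])
    return crossing(right, right + 1) - crossing(left - 1, left)


def cv_percent(values):
    """Coefficient of variation in percent: sample standard deviation / mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical toy curve: triangular bolus peaking at t = 3.
times = [0, 1, 2, 3, 4, 5, 6]
signal = [0, 0, 4, 8, 4, 0, 0]
width = fwhm(times, signal)

# Hypothetical FWHM values from repeated runs of one sequence.
repeat_cv = cv_percent([9.0, 10.0, 11.0])
print(width, repeat_cv)
```

Computing `cv_percent` over repeated runs of one sequence gives a within-sequence figure, while computing it over the mean FWHM of different sequences gives a between-sequence figure, matching the two comparisons described above.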
Schoeman, Elizna M; Lopez, Genghis H; McGowan, Eunike C; Millard, Glenda M; O'Brien, Helen; Roulis, Eileen V; Liew, Yew-Wah; Martin, Jacqueline R; McGrath, Kelli A; Powley, Tanya; Flower, Robert L; Hyland, Catherine A
2017-04-01
Blood group single nucleotide polymorphism genotyping probes for a limited range of polymorphisms. This study investigated whether massively parallel sequencing (also known as next-generation sequencing), with a targeted exome strategy, provides an extended blood group genotype and the extent to which massively parallel sequencing correctly genotypes in homologous gene systems, such as RH and MNS. Donor samples (n = 28) that were extensively phenotyped and genotyped using single nucleotide polymorphism typing, were analyzed using the TruSight One Sequencing Panel and MiSeq platform. Genes for 28 protein-based blood group systems, GATA1, and KLF1 were analyzed. Copy number variation analysis was used to characterize complex structural variants in the GYPC and RH systems. The average sequencing depth per target region was 66.2 ± 39.8. Each sample harbored on average 43 ± 9 variants, of which 10 ± 3 were used for genotyping. For the 28 samples, massively parallel sequencing variant sequences correctly matched expected sequences based on single nucleotide polymorphism genotyping data. Copy number variation analysis defined the Rh C/c alleles and complex RHD hybrids. Hybrid RHD*D-CE-D variants were correctly identified, but copy number variation analysis did not confidently distinguish between D and CE exon deletion versus rearrangement. The targeted exome sequencing strategy employed extended the range of blood group genotypes detected compared with single nucleotide polymorphism typing. This single-test format included detection of complex MNS hybrid cases and, with copy number variation analysis, defined RH hybrid genes along with the RHCE*C allele hitherto difficult to resolve by variant detection. The approach is economical compared with whole-genome sequencing and is suitable for a red blood cell reference laboratory setting. © 2017 AABB.
Oliveros, R; Cutillas, C; De Rojas, M; Arias, P
2000-12-01
Adult worms of Trichuris ovis and T. globulosa were collected from Ovis aries (sheep) and Capra hircus (goats). T. suis was isolated from Sus scrofa domestica (swine) and T. leporis was isolated from Lepus europaeus (rabbits) in Spain. Genomic DNA was isolated and a ribosomal internal transcribed spacer (ITS2) was amplified and sequenced using polymerase-chain-reaction (PCR) techniques. The ITS2 of T. ovis and T. globulosa was 407 nucleotides in length and had a GC content of about 62%. Furthermore, the ITS2 of T. suis and T. leporis was 534 and 418 nucleotides in length and had a GC content of about 64.8% and 62.4%, respectively. There was evidence of slight variation in the sequence within individuals of all species analyzed, indicating intraindividual variation in the sequence of different copies of the ribosomal DNA. Furthermore, low-level intraspecific variation was detected. Sequence analyses of ITS2 products of T. ovis and T. globulosa demonstrated no sequence difference between them. Nevertheless, differences were detected between the ITS2 sequences of T. suis, T. leporis, and T. ovis, indicating that Trichuris species can reliably be differentiated by their ITS2 sequences and PCR-linked restriction-fragment-length polymorphism (RFLP).
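The length and GC content figures reported for each ITS2 sequence are simple to compute from a sequence string. A minimal sketch follows; the function name and the short example sequences are hypothetical and are not Trichuris ITS2 data.

```python
def gc_content(seq):
    """GC content in percent, counting only unambiguous A, C, G and T bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100 * (seq.count("G") + seq.count("C")) / counted


# Hypothetical short sequences used only to show the calculation.
for name, seq in {"example_a": "GGCCAT", "example_b": "ATGCATGC"}.items():
    print(name, len(seq), f"{gc_content(seq):.1f}%")
```

Ambiguity codes and gaps are excluded from the denominator here; a different convention would shift the percentage slightly for sequences that contain them.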
A global reference for human genetic variation
2016-01-01
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. PMID:26432245
Kristensen, Anne F; Kristensen, Søren R; Falkmer, Ursula; Münster, Anna-Marie B; Pedersen, Shona
2018-05-01
The Calibrated Automated Thrombography (CAT) is an in vitro thrombin generation (TG) assay that holds promise as a valuable tool within clinical diagnostics. However, the technique has considerable analytical variation, and we therefore investigated the analytical and between-subject variation of CAT systematically. Moreover, we assessed the application of an internal standard for normalization to diminish variation. Twenty healthy volunteers each donated one blood sample, which was subsequently centrifuged, aliquoted and stored at -80 °C prior to analysis. The analytical variation was determined on eight runs, where plasma from the same seven volunteers was processed in triplicate, and for the between-subject variation, TG analysis was performed on plasma from all 20 volunteers. The trigger reagents used for the TG assays included both PPP reagent containing 5 pM tissue factor (TF) and PPPlow with 1 pM TF. Plasma drawn from a single donor was applied to all plates as an internal standard for each TG analysis and was subsequently used for normalization. The total analytical variation for TG analysis is 3-14% with PPPlow reagent and 9-13% with PPP reagent. This variation can be minimally reduced by using an internal standard, but mainly for ETP (endogenous thrombin potential). The between-subject variation is higher when using PPPlow than PPP, and this variation is considerably higher than the analytical variation. TG has a rather high inherent analytical variation, but it is considerably lower than the between-subject variation when using PPPlow as reagent.
Castro-Prieto, Aines; Wachter, Bettina; Melzheimer, Joerg; Thalwitzer, Susanne; Sommer, Simone
2011-01-01
The genes of the major histocompatibility complex (MHC) are a key component of the mammalian immune system and have become important molecular markers for fitness-related genetic variation in wildlife populations. Currently, no information about the MHC sequence variation and constitution in African leopards exists. In this study, we isolated and characterized genetic variation at the adaptively most important region of MHC class I and MHC class II-DRB genes in 25 free-ranging African leopards from Namibia and investigated the mechanisms that generate and maintain MHC polymorphism in the species. Using single-stranded conformation polymorphism analysis and direct sequencing, we detected 6 MHC class I and 6 MHC class II-DRB sequences, which likely correspond to at least 3 MHC class I and 3 MHC class II-DRB loci. Amino acid sequence variation in both MHC classes was higher or similar in comparison to other reported felids. We found signatures of positive selection shaping the diversity of MHC class I and MHC class II-DRB loci during the evolutionary history of the species. A comparison of MHC class I and MHC class II-DRB sequences of the leopard to those of other felids revealed a trans-species mode of evolution. In addition, the evolutionary relationships of MHC class II-DRB sequences between African and Asian leopard subspecies are discussed.
Parallel gene analysis with allele-specific padlock probes and tag microarrays
Banér, Johan; Isaksson, Anders; Waldenström, Erik; Jarvius, Jonas; Landegren, Ulf; Nilsson, Mats
2003-01-01
Parallel, highly specific analysis methods are required to take advantage of the extensive information about DNA sequence variation and of expressed sequences. We present a scalable laboratory technique suitable to analyze numerous target sequences in multiplexed assays. Sets of padlock probes were applied to analyze single nucleotide variation directly in total genomic DNA or cDNA for parallel genotyping or gene expression analysis. All reacted probes were then co-amplified and identified by hybridization to a standard tag oligonucleotide array. The technique was illustrated by analyzing normal and pathogenic variation within the Wilson disease-related ATP7B gene, both at the level of DNA and RNA, using allele-specific padlock probes. PMID:12930977
2014-01-01
Background Neisseria meningitidis expresses type four pili (Tfp) which are important for colonisation and virulence. Tfp have been considered as one of the most variable structures on the bacterial surface due to high frequency gene conversion, resulting in amino acid sequence variation of the major pilin subunit (PilE). Meningococci express either a class I or a class II pilE gene and recent work has indicated that class II pilins do not undergo antigenic variation, as class II pilE genes encode conserved pilin subunits. The purpose of this work was to use whole genome sequences to further investigate the frequency and variability of the class II pilE genes in meningococcal isolate collections. Results We analysed over 600 publically available whole genome sequences of N. meningitidis isolates to determine the sequence and genomic organization of pilE. We confirmed that meningococcal strains belonging to a limited number of clonal complexes (ccs, namely cc1, cc5, cc8, cc11 and cc174) harbour a class II pilE gene which is conserved in terms of sequence and chromosomal context. We also identified pilS cassettes in all isolates with class II pilE, however, our analysis indicates that these do not serve as donor sequences for pilE/pilS recombination. Furthermore, our work reveals that the class II pilE locus lacks the DNA sequence motifs that enable (G4) or enhance (Sma/Cla repeat) pilin antigenic variation. Finally, through analysis of pilin genes in commensal Neisseria species we found that meningococcal class II pilE genes are closely related to pilE from Neisseria lactamica and Neisseria polysaccharea, suggesting horizontal transfer among these species. Conclusions Class II pilins can be defined by their amino acid sequence and genomic context and are present in meningococcal isolates which have persisted and spread globally. 
The absence of G4 and Sma/Cla sequences adjacent to the class II pilE genes is consistent with the lack of pilin subunit variation in these isolates, although horizontal transfer may generate class II pilin diversity. This study supports the suggestion that high frequency antigenic variation of pilin is not universal in pathogenic Neisseria. PMID:24690385
Somatic Genetic Variation in Solid Pseudopapillary Tumor of the Pancreas by Whole Exome Sequencing
Guo, Meng; Luo, Guopei; Jin, Kaizhou; Long, Jiang; Cheng, He; Lu, Yu; Wang, Zhengshi; Yang, Chao; Xu, Jin; Ni, Quanxing; Yu, Xianjun; Liu, Chen
2017-01-01
Solid pseudopapillary tumor of the pancreas (SPT) is a rare pancreatic disease with a unique clinical manifestation. Although CTNNB1 gene mutations had been universally reported, genetic variation profiles of SPT are largely unidentified. We conducted whole exome sequencing in nine SPT patients to probe the SPT-specific insertions and deletions (indels) and single nucleotide polymorphisms (SNPs). In total, 54 SNPs and 41 indels of prominent variations were demonstrated through parallel exome sequencing. We detected that CTNNB1 mutations presented throughout all patients studied (100%), and a higher count of SNPs was particularly detected in patients with older age, larger tumor, and metastatic disease. By aggregating 95 detected variation events and viewing the interconnections among each of the genes with variations, CTNNB1 was identified as the core portion in the network, which might collaborate with other events such as variations of USP9X, EP400, HTT, MED12, and PKD1 to regulate tumorigenesis. Pathway analysis showed that the events involved in other cancers had the potential to influence the progression of the SNPs count. Our study revealed an insight into the variation of the gene encoding region underlying solid-pseudopapillary neoplasm tumorigenesis. The detection of these variations might partly reflect the potential molecular mechanism. PMID:28054945
Thermal and acid tolerant beta-xylosidases, genes encoding, related organisms, and methods
Thompson, David N [Idaho Falls, ID; Thompson, Vicki S [Idaho Falls, ID; Schaller, Kastli D [Ammon, ID; Apel, William A [Jackson, WY; Lacey, Jeffrey A [Idaho Falls, ID; Reed, David W [Idaho Falls, ID
2011-04-12
Isolated and/or purified polypeptides and nucleic acid sequences encoding polypeptides from Alicyclobacillus acidocaldarius and variations thereof are provided. Further provided are methods of at least partially degrading xylotriose and/or xylobiose using isolated and/or purified polypeptides and nucleic acid sequences encoding polypeptides from Alicyclobacillus acidocaldarius and variations thereof.
USDA-ARS's Scientific Manuscript database
Little is known about genetic variation of Lymantria dispar multiple nucleopolyhedrovirus (LdMNPV; Baculoviridae: Alphabaculovirus) at the nucleotide sequence level. To obtain a more comprehensive view of genetic diversity among isolates of LdMNPV, partial sequences of the lef-8 gene were generated...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gordon, Sean
2013-03-01
Sean Gordon of the USDA on Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions at the 8th Annual Genomics of Energy & Environment Meeting on March 27, 2013 in Walnut Creek, CA.
Sequence variation of the feline immunodeficiency virus genome and its clinical relevance.
Stickney, A L; Dunowska, M; Cave, N J
2013-06-08
The ongoing evolution of feline immunodeficiency virus (FIV) has resulted in the existence of a diverse continuum of viruses. FIV isolates differ with regard to their mutation and replication rates, plasma viral loads, cell tropism and ability to induce apoptosis. Clinical disease in FIV-infected cats is also inconsistent. Genomic sequence variation of FIV is likely to be responsible for some of the variation in viral behaviour. The specific genetic sequences that influence these key viral properties remain to be determined. With knowledge of the specific key determinants of pathogenicity, there is the potential for veterinarians in the future to apply this information for prognostic purposes. Genomic sequence variation of FIV also presents an obstacle to effective vaccine development. Most challenge studies demonstrate acceptable efficacy of a dual-subtype FIV vaccine (Fel-O-Vax FIV) against FIV infection under experimental settings; however, vaccine efficacy in the field still remains to be proven. It is important that we discover the key determinants of immunity induced by this vaccine; such data would complement vaccine field efficacy studies and provide the basis to make informed recommendations on its use.
Selection of a DNA barcode for Nectriaceae from fungal whole-genomes.
Zeng, Zhaoqing; Zhao, Peng; Luo, Jing; Zhuang, Wenying; Yu, Zhihe
2012-01-01
A DNA barcode is a short segment of sequence that is able to distinguish species. A barcode must ideally contain enough variation to distinguish every individual species and be easily obtained. Fungi of Nectriaceae are economically important and show high species diversity. To establish a standard DNA barcode for this group of fungi, the genomes of Neurospora crassa and 30 other filamentous fungi were compared. The expect value was treated as a criterion to recognize homologous sequences. Four candidate markers, Hsp90, AAC, CDC48, and EF3, were tested for their feasibility as barcodes in the identification of 34 well-established species belonging to 13 genera of Nectriaceae. Two hundred and fifteen sequences were analyzed. Intra- and inter-specific variations and the success rate of PCR amplification and sequencing were considered as important criteria for estimation of the candidate markers. Ultimately, the partial EF3 gene met the requirements for a good DNA barcode: No overlap was found between the intra- and inter-specific pairwise distances. The smallest inter-specific distance of EF3 gene was 3.19%, while the largest intra-specific distance was 1.79%. In addition, there was a high success rate in PCR and sequencing for this gene (96.3%). CDC48 showed sufficiently high sequence variation among species, but the PCR and sequencing success rate was 84% using a single pair of primers. Although the Hsp90 and AAC genes had higher PCR and sequencing success rates (96.3% and 97.5%, respectively), overlapping occurred between the intra- and inter-specific variations, which could lead to misidentification. Therefore, we propose the EF3 gene as a possible DNA barcode for the nectriaceous fungi.
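The "no overlap" criterion in the abstract above (largest intra-specific distance smaller than smallest inter-specific distance) can be sketched as follows. This is an illustration only: it assumes an uncorrected p-distance on pre-aligned sequences, whereas the abstract does not say which distance model was used. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(records):
    """records: list of (species, aligned_sequence) pairs.

    Returns (largest intra-specific distance, smallest inter-specific distance).
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra), min(inter)


# Toy alignment with invented sequences; two specimens per species.
records = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "ACGAACCTTC"),
    ("species_B", "ACGAACCTTC"),
]

max_intra, min_inter = barcode_gap(records)
has_gap = max_intra < min_inter  # True means the two distance ranges do not overlap
print(max_intra, min_inter, has_gap)
```

Applied to the figures reported for EF3, the same test is simply 1.79% < 3.19%, which holds. For Hsp90 and AAC the abstract reports overlap, so the test would fail.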
Boussaha, Mekki; Michot, Pauline; Letaief, Rabia; Hozé, Chris; Fritz, Sébastien; Grohs, Cécile; Esquerré, Diane; Duchesne, Amandine; Philippe, Romain; Blanquet, Véronique; Phocas, Florence; Floriot, Sandrine; Rocha, Dominique; Klopp, Christophe; Capitan, Aurélien; Boichard, Didier
2016-11-15
In recent years, several bovine genome sequencing projects were carried out with the aim of developing genomic tools to improve dairy and beef production efficiency and sustainability. In this study, we describe the first French cattle genome variation dataset obtained by sequencing 274 whole genomes representing several major dairy and beef breeds. This dataset contains over 28 million single nucleotide polymorphisms (SNPs) and small insertions and deletions. Comparisons between sequencing results and SNP array genotypes revealed a very high genotype concordance rate, which indicates the good quality of our data. To our knowledge, this is the first large-scale catalog of small genomic variations in French dairy and beef cattle. This resource will contribute to the study of gene functions and population structure and also help to improve traits through genotype-guided selection.
Zhao, Min; Wang, Qingguo; Wang, Quan; Jia, Peilin; Zhao, Zhongming
2013-01-01
Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) or genotyping arrays have been standard technologies to detect large regions subject to copy number changes in genomes until most recently high-resolution sequence data can be analyzed by next-generation sequencing (NGS). During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development. PMID:24564169
Identification of the sequence variations of 15 autosomal STR loci in a Chinese population.
Chen, Wenjing; Cheng, Jianding; Ou, Xueling; Chen, Yong; Tong, Dayue; Sun, Hongyu
2014-01-01
DNA sequence variation, including base changes and insertions or deletions, in the primer binding region may cause a null allele and, if it shifts the length of the amplified fragment outside the allelic ladder, off-ladder (OL) alleles may be detected. In order to provide accurate and reliable DNA evidence for forensic DNA analysis, it is essential to clarify sequence variations in prevalently used STR loci. Suspected null alleles and OL alleles of the PowerPlex® 16 System from 21,934 unrelated Chinese individuals were verified by alternative systems and sequenced. A total of 17 cases with null alleles were identified, including 12 kinds of point mutations in 16 cases and a 19-base deletion in one case. The total frequency of null alleles was 7.751 × 10⁻⁴. Eight hundred and forty-four OL alleles, classified into 97 different kinds, were observed at 15 STR loci of the PowerPlex® 16 System, all except vWA. All the frequencies of OL alleles were under 0.01. Null alleles should be confirmed by alternative primers and OL alleles should be named appropriately. Particular attention should be paid to sequence variation, since incorrect designation could lead to false conclusions.
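The reported null-allele frequency can be reproduced from the counts in the abstract above. The short check below assumes the denominator is the number of individuals typed, which the abstract does not state explicitly but which matches the reported figure. The variable names are illustrative.

```python
# Counts taken from the abstract.
point_mutation_cases = 16
deletion_cases = 1
individuals_typed = 21_934

null_allele_cases = point_mutation_cases + deletion_cases
frequency = null_allele_cases / individuals_typed  # assumed: cases per individual typed

print(f"{null_allele_cases} cases, frequency = {frequency:.3e}")
```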
Eastman, Alexander W; Heinrichs, David E; Yuan, Ze-Chun
2014-10-03
Members of the genus Paenibacillus are important plant growth-promoting rhizobacteria that can serve as bio-reactors. Paenibacillus polymyxa promotes the growth of a variety of economically important crops. Our lab recently completed the genome sequence of Paenibacillus polymyxa CR1. As of January 2014, four P. polymyxa genomes have been completely sequenced but no comparative genomic analyses have been reported. Here we report the comparative and genetic analyses of four sequenced P. polymyxa genomes, which revealed a significantly conserved core genome. Complex metabolic pathways and regulatory networks were highly conserved and allow P. polymyxa to rapidly respond to dynamic environmental cues. Genes responsible for phytohormone synthesis, phosphate solubilization, iron acquisition, transcriptional regulation, σ-factors, stress responses, transporters and biomass degradation were well conserved, indicating an intimate association with plant hosts and the rhizosphere niche. In addition, genes responsible for antimicrobial resistance and non-ribosomal peptide/polyketide synthesis are present in both the core and accessory genome of each strain. Comparative analyses also reveal variations in the accessory genome, including large plasmids present in strains M1 and SC2. Furthermore, a considerable number of strain-specific genes and genomic islands are irregularly distributed throughout each genome. Although a variety of plant-growth promoting traits are encoded by all strains, only P. polymyxa CR1 encodes the unique nitrogen fixation cluster found in other Paenibacillus sp. Our study revealed that genomic loci relevant to host interaction and ecological fitness are highly conserved within the P. polymyxa genomes analysed, despite variations in the accessory genome. This work suggests that plant-growth promotion by P. polymyxa is mediated largely through phytohormone production, increased nutrient availability and bio-control mechanisms. 
This study provides an in-depth understanding of the genome architecture of this species, thus facilitating future genetic engineering and applications in agriculture, industry and medicine. Furthermore, this study highlights the current gap in our understanding of complex plant biomass metabolism in Gram-positive bacteria.
The diploid genome sequence of an Asian individual
Wang, Jun; Wang, Wei; Li, Ruiqiang; Li, Yingrui; Tian, Geng; Goodman, Laurie; Fan, Wei; Zhang, Junqing; Li, Jun; Zhang, Juanbin; Guo, Yiran; Feng, Binxiao; Li, Heng; Lu, Yao; Fang, Xiaodong; Liang, Huiqing; Du, Zhenglin; Li, Dong; Zhao, Yiqing; Hu, Yujie; Yang, Zhenzhen; Zheng, Hancheng; Hellmann, Ines; Inouye, Michael; Pool, John; Yi, Xin; Zhao, Jing; Duan, Jinjie; Zhou, Yan; Qin, Junjie; Ma, Lijia; Li, Guoqing; Yang, Zhentao; Zhang, Guojie; Yang, Bin; Yu, Chang; Liang, Fang; Li, Wenjie; Li, Shaochuan; Li, Dawei; Ni, Peixiang; Ruan, Jue; Li, Qibin; Zhu, Hongmei; Liu, Dongyuan; Lu, Zhike; Li, Ning; Guo, Guangwu; Zhang, Jianguo; Ye, Jia; Fang, Lin; Hao, Qin; Chen, Quan; Liang, Yu; Su, Yeyang; san, A.; Ping, Cuo; Yang, Shuang; Chen, Fang; Li, Li; Zhou, Ke; Zheng, Hongkun; Ren, Yuanyuan; Yang, Ling; Gao, Yang; Yang, Guohua; Li, Zhuo; Feng, Xiaoli; Kristiansen, Karsten; Wong, Gane Ka-Shu; Nielsen, Rasmus; Durbin, Richard; Bolund, Lars; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian
2009-01-01
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics. PMID:18987735
Zhang, J R; Norris, S J
1998-08-01
The Lyme disease spirochete Borrelia burgdorferi possesses 15 silent vls cassettes and a vls expression site (vlsE) encoding a surface-exposed lipoprotein. Segments of the silent vls cassettes have been shown to recombine with the vlsE cassette region in the mammalian host, resulting in combinatorial antigenic variation. Despite promiscuous recombination within the vlsE cassette region, the 5' and 3' coding sequences of vlsE that flank the cassette region are not subject to sequence variation during these recombination events. The segments of the silent vls cassettes recombine in the vlsE cassette region through a unidirectional process such that the sequence and organization of the silent vls loci are not affected. As a result of recombination, the previously expressed segments are replaced by incoming segments and apparently degraded. These results provide evidence for a gene conversion mechanism in VlsE antigenic variation.
NASA Technical Reports Server (NTRS)
Rai, Man Mohan (Inventor); Madavan, Nateri K. (Inventor)
2007-01-01
A method and system for data modeling that incorporates the advantages of both traditional response surface methodology (RSM) and neural networks is disclosed. The invention partitions the parameters into a first set of s simple parameters, where observable data are expressible as low order polynomials, and c complex parameters that reflect more complicated variation of the observed data. Variation of the data with the simple parameters is modeled using polynomials; and variation of the data with the complex parameters at each vertex is analyzed using a neural network. Variations with the simple parameters and with the complex parameters are expressed using a first sequence of shape functions and a second sequence of neural network functions. The first and second sequences are multiplicatively combined to form a composite response surface, dependent upon the parameter values, that can be used to identify an accurate mode
Amor, Nabil; Farjallah, Sarra; Salem, Mohamed; Lamine, Dia Mamadou; Merella, Paolo; Said, Khaled; Ben Slimane, Badreddine
2011-10-01
Fasciolosis caused by Fasciola hepatica and Fasciola gigantica (Platyhelminthes: Trematoda: Digenea) is considered the most important helminth infection of ruminants in tropical countries, causing considerable socioeconomic problems. From Africa, F. gigantica has been previously characterized from Burkina Faso, Senegal, Kenya, Zambia and Mali, while F. hepatica has been reported from Morocco and Tunisia, and both species have been observed from Ethiopia and Egypt on the basis of morphometric differences, while the use of molecular markers is necessary to distinguish exactly between species. Samples identified morphologically as F. gigantica (n=60) from sheep and cattle from different geographical localities of Mauritania were genetically characterized by sequences of the first (ITS-1), the 5.8S, and second (ITS-2) Internal Transcribed Spacers (ITS) of nuclear ribosomal DNA (rDNA) genes and the mitochondrial Cytochrome c Oxidase I (COI) gene. Comparison of the sequences of the Mauritanian samples with sequences of Fasciola spp. from GenBank confirmed that all samples belong to the species F. gigantica. The nucleotide sequencing of ITS rDNA of F. gigantica showed no nucleotide variation in the ITS-1, 5.8S, and ITS-2 rDNA sequences among all samples examined and those from Burkina Faso, Kenya, Egypt and Iran. The phylogenetic trees based on the ITS-1 and ITS-2 sequences showed a close relationship of the Mauritanian samples with isolates of F. gigantica from different localities of Africa and Asia. The COI genotypes of the Mauritanian specimens of F. gigantica had a high level of diversity, and they belonged to the phylogenetically distinguishable F. gigantica clade. The present study is the first molecular characterization of F. gigantica in sheep and cattle from Mauritania, allowing a reliable approach for the genetic differentiation of Fasciola spp. and providing a basis for further studies on liver flukes in the African countries. Copyright © 2011 Elsevier Inc. All rights reserved.
Spuesens, Emiel B M; van de Kreeke, Nick; Estevão, Silvia; Hoogenboezem, Theo; Sluijter, Marcel; Hartwig, Nico G; van Rossum, Annemarie M C; Vink, Cornelis
2011-02-01
Mycoplasma pneumoniae is a human pathogen that causes a range of respiratory tract infections. The first step in infection is adherence of the bacteria to the respiratory epithelium. This step is mediated by a specialized organelle, which contains several proteins (cytadhesins) that have an important function in adherence. Two of these cytadhesins, P40 and P90, represent the proteolytic products from a single 130 kDa protein precursor, which is encoded by the MPN142 gene. Interestingly, MPN142 contains a repetitive DNA element, termed RepMP5, of which homologues are found at seven other loci within the M. pneumoniae genome. It has been hypothesized that these RepMP5 elements, which are similar but not identical in sequence, recombine with their counterpart within MPN142 and thereby provide a source of sequence variation for this gene. As this variation may give rise to amino acid changes within P40 and P90, the recombination between RepMP5 elements may constitute the basis of antigenic variation and, possibly, immune evasion by M. pneumoniae. To investigate the sequence variation of MPN142 in relation to inter-RepMP5 recombination, we determined the sequences of all RepMP5 elements in a collection of 25 strains. The results indicate that: (i) inter-RepMP5 recombination events have occurred in seven of the strains, and (ii) putative RepMP5 recombination events involving MPN142 have induced amino acid changes in a surface-exposed part of the P40 protein in two of the strains. We conclude that recombination between RepMP5 elements is a common phenomenon that may lead to sequence variation of MPN142-encoded proteins.
Dynamics of actin evolution in dinoflagellates.
Kim, Sunju; Bachvaroff, Tsvetan R; Handy, Sara M; Delwiche, Charles F
2011-04-01
Dinoflagellates have unique nuclei and intriguing genome characteristics with very high DNA content making complete genome sequencing difficult. In dinoflagellates, many genes are found in multicopy gene families, but the processes involved in the establishment and maintenance of these gene families are poorly understood. Understanding the dynamics of gene family evolution in dinoflagellates requires comparisons at different evolutionary scales. Studies of closely related species provide fine-scale information relative to species divergence, whereas comparisons of more distantly related species provide broad context. We selected the actin gene family as a highly expressed conserved gene previously studied in dinoflagellates. Of the 142 sequences determined in this study, 103 were from the two closely related species, Dinophysis acuminata and D. caudata, including full length and partial cDNA sequences as well as partial genomic amplicons. For these two Dinophysis species, at least three types of sequences could be identified. Most copies (79%) were relatively similar and in nucleotide trees, the sequences formed two bushy clades corresponding to the two species. In comparisons within species, only eight to ten nucleotide differences were found between these copies. The two remaining types formed clades containing sequences from both species. One type included the most similar sequences in between-species comparisons with as few as 12 nucleotide differences between species. The second type included the most divergent sequences in comparisons between and within species with up to 93 nucleotide differences between sequences. In all the sequences, most variation occurred in synonymous sites or the 5' untranslated region (UTR), although there was still limited amino acid variation between most sequences. 
Several potential pseudogenes were found (approximately 10% of all sequences depending on species) with incomplete open reading frames due to frameshifts or early stop codons. Overall, variation in the actin gene family fits best with the "birth and death" model of evolution based on recent duplications, pseudogenes, and incomplete lineage sorting. Divergence between species was similar to variation within species, so that actin may be too conserved to be useful for phylogenetic estimation of closely related species.
Nath, B Surendra; Gupta, S K; Bajpai, A K
2012-12-01
The life cycle, spore morphology, pathogenicity, tissue specificity, mode of transmission and small subunit rRNA (SSU-rRNA) gene sequence of five new microsporidian isolates, viz. NIWB-11bp, NIWB-12n, NIWB-13md, NIWB-14b and NIWB-15mb, identified from the silkworm, Bombyx mori, have been studied along with the type species, NIK-1s_mys. The life cycle of the microsporidians identified exhibited sequential developmental cycles similar to the general developmental cycle of the genus Nosema. The spores showed considerable variation in their shape, length and width. The pathogenicity observed was dose-dependent and differed among the microsporidian isolates; NIWB-15mb was found to be more virulent than the other isolates. All of the microsporidians were found to infect most of the tissues examined and showed gonadal infection and transovarial transmission in the infected silkworms. An SSU-rRNA sequence-based phylogenetic tree placed NIWB-14b, NIWB-12n and NIWB-11bp in a separate branch along with other Nosema species and Nosema bombycis, while NIWB-15mb and NIWB-13md together formed another cluster along with other Nosema species. NIK-1s_mys revealed a signature sequence similar to the standard type species, N. bombycis, indicating that NIK-1s_mys is similar to N. bombycis. Based on phylogenetic relationships, branch length information based on genetic distance and nucleotide differences, we conclude that the microsporidian isolates identified are distinctly different from the other known species and belong to the genus Nosema. This SSU-rRNA gene sequence analysis method was found to be a more useful approach for detecting different and closely related microsporidians of this economically important domestic insect.
Natural Allelic Variations in Highly Polyploidy Saccharum Complex
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Jian; Yang, Xiping; Resende, Jr., Marcio F. R.
2016-06-08
Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with highly polyploid and complex genomes. The Saccharum complex, comprising the genus Saccharum and a few related genera, is an important genetic resource for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding their allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variations in the Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set, targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variants and genotype calling was established. BWA-MEM and the sorghum genome proved to be an acceptable aligner and reference, respectively, for sugarcane target enrichment sequence analysis. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated to be largely homozygous. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. Furthermore, the target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes.
Williams, Tony D.; Ames, Caroline E.; Kiparissis, Yiannis; Wynne-Edwards, Katherine E.
2005-01-01
We investigated the relationship between plasma and yolk oestrogens in laying female zebra finches (Taeniopygia guttata) by manipulating plasma oestradiol (E2) levels, via injection of oestradiol-17β, in a sequence-specific manner to maintain chronically high plasma levels for later-developing eggs (contrasting with the endogenous pattern of decreasing plasma E2 concentrations during laying). We report systematic variation in yolk oestrogen concentrations, in relation to laying sequence, similar to that widely reported for androgenic steroids. In sham-manipulated females, yolk E2 concentrations decreased with laying sequence. However, in E2-treated females plasma E2 levels were higher during the period of rapid yolk development of later-laid eggs, compared with control females. As a consequence, we reversed the laying-sequence-specific pattern of yolk E2: in E2-treated females, yolk E2 concentrations increased with laying-sequence. In general therefore, yolk E2 levels were a direct reflection of plasma E2 levels. However, in control females there was some inter-individual variability in the endogenous pattern of plasma E2 levels through the laying cycle which could generate variation in sequence-specific patterns of yolk hormone levels even if these primarily reflect circulating steroid levels. PMID:15695208
Hasumi, Hisashi; Furuya, Mitsuko; Tatsuno, Kenji; Yamamoto, Shogo; Baba, Masaya; Hasumi, Yukiko; Isono, Yasuhiro; Suzuki, Kae; Jikuya, Ryosuke; Otake, Shinji; Muraoka, Kentaro; Osaka, Kimito; Hayashi, Narihiko; Makiyama, Kazuhide; Miyoshi, Yasuhide; Kondo, Keiichi; Nakaigawa, Noboru; Kawahara, Takashi; Izumi, Koji; Teranishi, Junichi; Yumura, Yasushi; Uemura, Hiroji; Nagashima, Yoji; Metwalli, Adam R; Schmidt, Laura S; Aburatani, Hiroyuki; Linehan, W Marston; Yao, Masahiro
2018-05-14
Birt-Hogg-Dubé (BHD) syndrome is a hereditary kidney cancer syndrome, which predisposes patients to develop kidney cancer, cutaneous fibrofolliculomas and pulmonary cysts. The responsible gene FLCN is a tumor suppressor for kidney cancer which plays an important role in energy homeostasis through the regulation of mitochondrial oxidative metabolism. However, the process by which FLCN-deficiency leads to renal tumorigenesis is unclear. In order to clarify molecular pathogenesis of BHD-associated kidney cancer, we conducted whole-exome sequencing analysis using next-generation sequencing technology as well as metabolite analysis using LC/MS and GC/MS. Whole-exome sequencing analysis of BHD-associated kidney cancer revealed that copy number variations (CNV) of BHD-associated kidney cancer are considerably different from those already reported in sporadic cases. In somatic variant analysis, very few variants were commonly observed in BHD-associated kidney cancer; however, variants in chromatin remodeling genes were frequently observed in BHD-associated kidney cancer (17/29 tumors, 59%). Metabolite analysis of BHD-associated kidney cancer revealed metabolic reprogramming towards upregulated redox regulation which may neutralize reactive oxygen species potentially produced from mitochondria with increased respiratory capacity under FLCN-deficiency. BHD-associated kidney cancer displays unique molecular characteristics which are completely different from sporadic kidney cancer, providing mechanistic insight into tumorigenesis under FLCN-deficiency as well as a foundation for development of novel therapeutics for kidney cancer.
Barik, Suvakanta; SarkarDas, Shabari; Singh, Archita; Gautam, Vibhav; Kumar, Pramod; Majee, Manoj; Sarkar, Ananda K
2014-01-01
Similar to the majority of the microRNAs, mature miR166s are derived from multiple members of MIR166 genes (precursors) and regulate various aspects of plant development by negatively regulating their target genes (Class III HD-ZIP). The evolutionary conservation or functional diversification of miRNA166 family members remains elusive. Here, we show the phylogenetic relationships among MIR166 precursor and mature sequences from three diverse model plant species. Despite strong conservation, some mature miR166 sequences, such as ppt-miR166m, have undergone sequence variation. Critical sequence variation in ppt-miR166m has led to functional diversification, as it targets non-HD-ZIPIII gene transcript (s). MIR166 precursor sequences have diverged in a lineage specific manner, and both precursors and mature osa-miR166i/j are highly conserved. Interestingly, polycistronic MIR166s were present in Physcomitrella and Oryza but not in Arabidopsis. The nature of cis-regulatory motifs on the upstream promoter sequences of MIR166 genes indicates their possible contribution to the functional variation observed among miR166 species. Copyright © 2013 Elsevier Inc. All rights reserved.
Liu, Siyang; Huang, Shujia; Rao, Junhua; Ye, Weijian; Krogh, Anders; Wang, Jun
2015-01-01
Comprehensive recognition of genomic variation in one individual is important for understanding disease and developing personalized medication and treatment. Many tools based on DNA re-sequencing exist for identification of single nucleotide polymorphisms, small insertions and deletions (indels) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction of population-scale pan-genomes. Our study also highlights the usefulness of the de novo assembly strategy for definition of genome structure.
Talla, Venkat; Suh, Alexander; Kalsoom, Faheema; Dincă, Vlad; Vila, Roger; Friberg, Magne; Wiklund, Christer
2017-01-01
Characterizing and quantifying genome size variation among organisms and understanding if genome size evolves as a consequence of adaptive or stochastic processes have been long-standing goals in evolutionary biology. Here, we investigate genome size variation and association with transposable elements (TEs) across lepidopteran lineages using a novel genome assembly of the common wood-white (Leptidea sinapis) and population re-sequencing data from both L. sinapis and the closely related L. reali and L. juvernica together with 12 previously available lepidopteran genome assemblies. A phylogenetic analysis confirms established relationships among species, but identifies previously unknown intraspecific structure within Leptidea lineages. The genome assembly of L. sinapis is one of the largest of any lepidopteran taxon so far (643 Mb) and genome size is correlated with abundance of TEs, both in Lepidoptera in general and within Leptidea where L. juvernica from Kazakhstan has considerably larger genome size than any other Leptidea population. Specific TE subclasses have been active in different Lepidoptera lineages with a pronounced expansion of predominantly LINEs, DNA elements, and unclassified TEs in the Leptidea lineage after the split from other Pieridae. The rate of genome expansion in Leptidea in general has been in the range of four Mb per million years (My), with an increase in a particular L. juvernica population to 72 Mb/My. The considerable differences in accumulation rates of specific TE classes in different lineages indicate that TE activity plays a major role in genome size evolution in butterflies and moths. PMID:28981642
Genetic variation patterns of American chestnut populations at EST-SSRs
Oliver Gailing; C. Dana Nelson
2017-01-01
The objective of this study is to analyze patterns of genetic variation at genic expressed sequence tag - simple sequence repeats (EST-SSRs) and at chloroplast DNA markers in populations of American chestnut (Castanea dentata Borkh.) to assist in conservation and breeding efforts. Allelic diversity at EST-SSRs decreased significantly from southwest to northeast along...
Thompson, David N; Thompson, Vicki S; Schaller, Kastli D; Apel, William A; Reed, David W; Lacey, Jeffrey A
2013-04-30
Isolated and/or purified polypeptides and nucleic acid sequences encoding polypeptides from Alicyclobacillus acidocaldarius and variations thereof are provided. Further provided are methods of at least partially degrading xylotriose, xylobiose, and/or arabinofuranose-substituted xylan using isolated and/or purified polypeptides and nucleic acid sequences encoding polypeptides from Alicyclobacillus acidocaldarius and variations thereof.
USDA-ARS's Scientific Manuscript database
Copy number variations (CNVs) are large insertions, deletions or duplications in the genome that vary between members of a species and are known to affect a wide variety of phenotypic traits. In this study, we identified CNVs in a population of bulls using low coverage next-generation sequence data....
Blake, Jonathon; Riddell, Andrew; Theiss, Susanne; Gonzalez, Alexis Perez; Haase, Bettina; Jauch, Anna; Janssen, Johannes W. G.; Ibberson, David; Pavlinic, Dinko; Moog, Ute; Benes, Vladimir; Runz, Heiko
2014-01-01
Balanced chromosome abnormalities (BCAs) occur at a high frequency in healthy and diseased individuals, but cost-efficient strategies to identify BCAs and evaluate whether they contribute to a phenotype have not yet become widespread. Here we apply genome-wide mate-pair library sequencing to characterize structural variation in a patient with unclear neurodevelopmental disease (NDD) and complex de novo BCAs at the karyotype level. Nucleotide-level characterization of the clinically described BCA breakpoints revealed disruption of at least three NDD candidate genes (LINC00299, NUP205, PSMD14) that gave rise to abnormal mRNAs and could be assumed as disease-causing. However, unbiased genome-wide analysis of the sequencing data for cryptic structural variation was key to revealing an additional submicroscopic inversion that truncates the schizophrenia- and bipolar disorder-associated brain transcription factor ZNF804A as an equally likely NDD-driving gene. Deep sequencing of fluorescent-sorted wild-type and derivative chromosomes confirmed the clinically undetected BCA. Moreover, deep sequencing further validated the high accuracy of mate-pair library sequencing in detecting structural variants larger than 10 kb, suggesting that this approach is powerful for clinical-grade genome-wide structural variant detection. Our study supports previous evidence for a role of ZNF804A in NDD and highlights the need for a more comprehensive assessment of structural variation in karyotypically abnormal individuals and patients with neurocognitive disease to avoid diagnostic deception. PMID:24625750
Goettel, Wolfgang; Xia, Eric; Upchurch, Robert; Wang, Ming-Li; Chen, Pengyin; An, Yong-Qiang Charles
2014-04-23
Variation in seed oil composition and content among soybean varieties is largely attributed to differences in transcript sequences and/or transcript accumulation of oil production related genes in seeds. Discovery and analysis of sequence and expression variations in these genes will accelerate soybean oil quality improvement. In an effort to identify these variations, we sequenced the transcriptomes of soybean seeds from nine lines varying in oil composition and/or total oil content. Our results showed that 69,338 distinct transcripts from 32,885 annotated genes were expressed in seeds. A total of 8,037 transcript expression polymorphisms and 50,485 transcript sequence polymorphisms (48,792 SNPs and 1,693 small Indels) were identified among the lines. Effects of the transcript polymorphisms on their encoded protein sequences and functions were predicted. The studies also provided independent evidence that the lack of FAD2-1A gene activity and a non-synonymous SNP in the coding sequence of FAB2C caused elevated oleic acid and stearic acid levels in soybean lines M23 and FAM94-41, respectively. As a proof-of-concept, we developed an integrated RNA-seq and bioinformatics approach to identify and functionally annotate transcript polymorphisms, and demonstrated its high effectiveness for discovery of genetic and transcript variations that result in altered oil quality traits. The collection of transcript polymorphisms coupled with their predicted functional effects will be a valuable asset for further discovery of genes, gene variants, and functional markers to improve soybean oil quality.
Firrao, Giuseppe; Torelli, Emanuela; Polano, Cesare; Ferrante, Patrizia; Ferrini, Francesca; Martini, Marta; Marcelletti, Simone; Scortichini, Marco; Ermacora, Paolo
2018-01-01
Pseudomonas syringae pv. actinidiae (Psa) biovar 3 has caused a pandemic of bacterial canker of Actinidia chinensis and Actinidia deliciosa since 2008. In Europe, the disease spread rapidly in the kiwifruit cultivation areas from a single introduction. In this study, we investigated the genomic diversity of Psa biovar 3 strains during the primary clonal expansion in Europe using single molecule real-time (SMRT), Illumina and Sanger sequencing technologies. We recorded evidence of frequent mobilization and loss of transposon Tn6212, large chromosome inversions, and ectopic integration of IS sequences (notably ISPsy31, ISPsy36, and ISPsy37). While no phenotype change associated with Tn6212 mobilization could be detected, strains CRAFRU 12.29 and CRAFRU 12.50 did not elicit the hypersensitivity response (HR) on tobacco and eggplant leaves and were limited in their growth in kiwifruit leaves due to insertion of ISPsy31 and ISPsy36 in the hrpS and hrpR genes, respectively, interrupting the hrp cluster. Both strains had been isolated from symptomatic plants, suggesting coexistence of variant strains with reduced virulence together with virulent strains in mixed populations. The structural differences caused by rearrangements of self-genetic elements between European and New Zealand strains were comparable in number and type to those occurring among the European strains, in contrast with the significant difference in terms of nucleotide polymorphisms. We hypothesize a relaxation, during clonal expansion, of the selection limiting the accumulation of deleterious mutations associated with genome structural variation due to transposition of mobile elements. This consideration may be relevant when evaluating strategies to be adopted for epidemic management.
Asgharian, Hosseinali; Sahafi, Homayoun Hosseinzadeh; Ardalan, Aria Ashja; Shekarriz, Shahrokh; Elahi, Elahe
2011-05-01
We provide cytochrome c oxidase subunit 1 (COI) barcode sequences of fishes of the Nayband National Park, Persian Gulf, Iran. Industrial activities, ecological considerations and goals of The Fish Barcode of Life campaign make it crucial that fish species residing in the park be identified. To the best of our knowledge, this is the first report of barcoding data on fishes of the Persian Gulf. We examined 187 individuals representing 76 species, 56 genera and 32 families. The data flagged potentially cryptic species of Gerres filamentosus and Plectorhinchus schotaf. 16S rDNA data on these species are provided. Exclusion of these two potential cryptic species resulted in a mean COI intraspecific distance of 0.18%, and a mean inter- to intraspecific divergence ratio of 66.7. There was no overlap between maximum Kimura 2-parameter distances among conspecifics (1.66%) and minimum distance among congeneric species (6.19%). Barcodes shared among species were not observed. Neighbour-joining analysis showed that most species formed cohesive sequence units with little variation. Finally, the comparison of 16 selected species from this study with meta-data of conspecifics from Australia, India, China and South Africa revealed high interregion divergences and potential existence of six cryptic species. Pairwise interregional comparisons were more informative than global divergence assessments with regard to detection of cryptic variation. Our analysis exemplifies optimal use of the expanding barcode data now becoming available. © 2011 Blackwell Publishing Ltd.
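The Kimura 2-parameter (K2P) distances quoted in the abstract above can be illustrated with a minimal sketch. It assumes two pre-aligned sequences of equal length and uses the standard K2P formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitional and transversional differences. The function name and the toy sequences are hypothetical and are not taken from the study.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    math.log raises ValueError if the sequences are too divergent
    (saturated) for the correction to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitions (P)
    q = transversions / compared  # proportion of transversions (Q)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example (hypothetical data): one A->G transition in 10 sites.
d = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
print(f"K2P distance: {d:.4f}")
```

A barcode gap of the kind reported would correspond to the largest such distance among conspecific pairs being smaller than the smallest distance among congeneric pairs.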
Genes mirror geography in Daphnia magna.
Fields, Peter D; Reisser, Céline; Dukić, Marinela; Haag, Christoph R; Ebert, Dieter
2015-09-01
Identifying the presence and magnitude of population genetic structure remains a major consideration in evolutionary biology as doing so allows one to understand the demographic history of a species as well as make predictions of how the evolutionary process will proceed. Next-generation sequencing methods allow us to reconsider previous ideas and conclusions concerning the distribution of genetic variation, and what this distribution implies about a given species' evolutionary history. A previous phylogeographic study of the crustacean Daphnia magna suggested that, despite strong genetic differentiation among populations at a local scale, the species shows only moderate genetic structure across its European range, with a spatially patchy occurrence of individual lineages. We apply RAD sequencing to a sample of D. magna collected across a wide swath of the species' Eurasian range and analyse the data using principal component analysis (PCA) of genetic variation and Procrustes analytical approaches, to quantify spatial genetic structure. We find remarkable consistency between the first two PCA axes and the geographic coordinates of individual sampling points, suggesting that, on a continent-wide scale, genetic differentiation is driven to a large extent by geographic distance. The observed pattern is consistent with unimpeded (i.e. no barriers, landscape or otherwise) migration at large spatial scales, despite the fragmented and patchy nature of favourable habitats at local scales. With high-resolution genetic data similar patterns may be uncovered for other species with wide geographic distributions, allowing an increased understanding of how genetic drift and selection have shaped their evolutionary history. © 2015 John Wiley & Sons Ltd.
Danic-Tchaleu, Gwenaelle; Heurtebise, Serge; Morga, Benjamin; Lapègue, Sylvie
2011-10-12
Because of its typical architecture, inheritance and small size, mitochondrial (mt) DNA is widely used for phylogenetic studies. Gene order is generally conserved in most taxa although some groups show considerable variation. This is particularly true in the phylum Mollusca, especially in the Bivalvia. During the last few years, there have been significant increases in the number of complete mitochondrial sequences available. For bivalves, 35 complete mitochondrial genomes are now available in GenBank, a number that has more than doubled in the last three years, representing 6 families and 23 genera. In the current study, we determined the complete mtDNA sequence of O. edulis, the European flat oyster. We present an analysis of features of its gene content and genome organization in comparison with other Ostrea, Saccostrea and Crassostrea species. The Ostrea edulis mt genome is 16 320 bp in length and codes for 37 genes (12 protein-coding genes, 2 rRNAs and 23 tRNAs) on the same strand. As in other Ostreidae, O. edulis mt genome contains a split of the rrnL gene and a duplication of trnM. The tRNA gene set of O. edulis, Ostrea denselamellosa and Crassostrea virginica are identical in having 23 tRNA genes, in contrast to Asian oysters, which have 25 tRNA genes (except for C. ariakensis with 24). O. edulis and O. denselamellosa share the same gene order, but differ from other Ostreidae and are closer to Crassostrea than to Saccostrea. Phylogenetic analyses reinforce the taxonomic classification of the 3 families Ostreidae, Mytilidae and Pectinidae. Within the Ostreidae family the results also reveal a closer relationship between Ostrea and Saccostrea than between Ostrea and Crassostrea. Ostrea edulis mitogenomic analyses show a high level of conservation within the genus Ostrea, whereas they show a high level of variation within the Ostreidae family. These features provide useful information for further evolutionary analysis of oyster mitogenomes.
Ryynänen, Heikki J; Primmer, Craig R
2006-01-01
Background: Single nucleotide polymorphisms (SNPs) represent the most abundant type of DNA variation in the vertebrate genome, and their applications as genetic markers in numerous studies of molecular ecology and conservation of natural populations are emerging. Recent large-scale sequencing projects in several fish species have provided a vast amount of data in public databases, which can be utilized in novel SNP discovery in salmonids. However, the suggested duplicated nature of the salmonid genome may hamper SNP characterization if the primers designed in conserved gene regions amplify multiple loci. Results: Here we introduce a new intron-primed exon-crossing (IPEC) method in an attempt to overcome this duplication problem, and also evaluate different priming methods for SNP discovery in Atlantic salmon (Salmo salar) and other salmonids. A total of 69 loci with differing priming strategies were screened in S. salar, and 27 of these produced ~13 kb of high-quality sequence data consisting of 19 SNPs or indels (one per 680 bp). The SNP frequency and the overall nucleotide diversity (3.99 × 10^-4) in S. salar were lower than reported in a majority of other organisms, which may suggest a relatively young population history for Atlantic salmon. A subset of primers used in cross-species analyses revealed considerable variation in the SNP frequencies and nucleotide diversities in other salmonids. Conclusion: Sequencing success was significantly higher with the new IPEC primers; thus the total number of loci to screen in order to identify one potential polymorphic site was six times lower with this new strategy. Given that duplication may hamper SNP discovery in some species, the IPEC method reported here is an alternative way of identifying novel polymorphisms in such cases. PMID:16872523
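The two summary statistics in the abstract above (variant density and nucleotide diversity) can be sketched as follows. This is only an illustration under stated assumptions: nucleotide diversity is computed here as the average number of pairwise differences per site over a set of equal-length aligned sequences, which may differ from the estimator the authors used, and the function names and toy alignment are hypothetical. The density check uses the rounded figures given in the abstract (~13 kb, 19 variants).

```python
from itertools import combinations


def bp_per_variant(total_bp, n_variants):
    """Average number of sequenced base pairs per observed variant."""
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    return total_bp / n_variants


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site across aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Roughly 13 kb of sequence and 19 SNPs/indels, as stated in the abstract.
print(round(bp_per_variant(13_000, 19)))

# Toy alignment (hypothetical data) with one polymorphic site.
print(nucleotide_diversity(["AAAA", "AAAT", "AAAA"]))
```

With the rounded inputs, the density comes out slightly above the "one per 680 bp" quoted, which is consistent with "~13 kb" being an approximation.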
2011-01-01
Background Because of its typical architecture, inheritance and small size, mitochondrial (mt) DNA is widely used for phylogenetic studies. Gene order is generally conserved in most taxa although some groups show considerable variation. This is particularly true in the phylum Mollusca, especially in the Bivalvia. During the last few years, there have been significant increases in the number of complete mitochondrial sequences available. For bivalves, 35 complete mitochondrial genomes are now available in GenBank, a number that has more than doubled in the last three years, representing 6 families and 23 genera. In the current study, we determined the complete mtDNA sequence of O. edulis, the European flat oyster. We present an analysis of features of its gene content and genome organization in comparison with other Ostrea, Saccostrea and Crassostrea species. Results The Ostrea edulis mt genome is 16 320 bp in length and codes for 37 genes (12 protein-coding genes, 2 rRNAs and 23 tRNAs) on the same strand. As in other Ostreidae, O. edulis mt genome contains a split of the rrnL gene and a duplication of trnM. The tRNA gene set of O. edulis, Ostrea denselamellosa and Crassostrea virginica are identical in having 23 tRNA genes, in contrast to Asian oysters, which have 25 tRNA genes (except for C. ariakensis with 24). O. edulis and O. denselamellosa share the same gene order, but differ from other Ostreidae and are closer to Crassostrea than to Saccostrea. Phylogenetic analyses reinforce the taxonomic classification of the 3 families Ostreidae, Mytilidae and Pectinidae. Within the Ostreidae family the results also reveal a closer relationship between Ostrea and Saccostrea than between Ostrea and Crassostrea. Conclusions Ostrea edulis mitogenomic analyses show a high level of conservation within the genus Ostrea, whereas they show a high level of variation within the Ostreidae family. 
These features provide useful information for further evolutionary analysis of oyster mitogenomes. PMID:21989403
A search for the binary companion of Polaris
NASA Technical Reports Server (NTRS)
Evans, Nancy Remage
1988-01-01
Polaris has a spectroscopic orbit determined from an extensive series of observations as well as a more uncertain astrometric orbit. The determination of its mass and evolutionary state is of considerable interest because it is a low-amplitude classical Cepheid with unusual period and amplitude variations. In this study, IUE spectra are investigated to search for light from the companion. The spectra of Polaris from 1600 A to 3200 A are a good match for nonvariable supergiants of similar spectral type. The lack of any excess flux at the shortest wavelengths implies that a main-sequence companion must be later than A8 V. Although this is the most likely companion, the ultraviolet observations cannot rule out a white dwarf 15,000 K or cooler. Both these companions are consistent with either an evolutionary mass or a smaller pulsation mass for the Cepheid.
Swaggart, Kayleigh A.; Pavlicev, Mihaela; Muglia, Louis J.
2015-01-01
The molecular mechanisms controlling human birth timing at term, or resulting in preterm birth, have been the focus of considerable investigation, but limited insights have been gained over the past 50 years. In part, these processes have remained elusive because of divergence in reproductive strategies and physiology shown by model organisms, making extrapolation to humans uncertain. Here, we summarize the evolution of progesterone signaling and variation in pregnancy maintenance and termination. We use this comparative physiology to support the hypothesis that selective pressure on genomic loci involved in the timing of parturition have shaped human birth timing, and that these loci can be identified with comparative genomic strategies. Previous limitations imposed by divergence of mechanisms provide an important new opportunity to elucidate fundamental pathways of parturition control through increasing availability of sequenced genomes and associated reproductive physiology characteristics across diverse organisms. PMID:25646385
RPS8—a New Informative DNA Marker for Phylogeny of Babesia and Theileria Parasites in China
Tian, Zhan-Cheng; Liu, Guang-Yuan; Yin, Hong; Luo, Jian-Xun; Guan, Gui-Quan; Luo, Jin; Xie, Jun-Ren; Shen, Hui; Tian, Mei-Yuan; Zheng, Jin-feng; Yuan, Xiao-song; Wang, Fang-fang
2013-01-01
Piroplasmosis is a serious debilitating and sometimes fatal disease. Phylogenetic relationships within piroplasmida are complex and remain unclear. We compared the intron–exon structure and DNA sequences of the RPS8 gene from Babesia and Theileria spp. isolates in China. Similar to 18S rDNA, the 40S ribosomal protein S8 gene, RPS8, including both coding and non-coding regions is a useful and novel genetic marker for defining species boundaries and for inferring phylogenies because it tends to have little intra-specific variation but considerable inter-specific difference. However, more samples are needed to verify the usefulness of the RPS8 (coding and non-coding regions) gene as a marker for the phylogenetic position and detection of most Babesia and Theileria species, particularly for some closely related species. PMID:24244571
Martikainen, Mika H; Kytövuori, Laura; Majamaa, Kari
2013-03-01
Leigh syndrome is a mitochondrial disease with considerable clinical and genetic variation. We present a 16-year-old boy with Leigh-like syndrome and broad developmental retardation, parkinsonism and hypogonadism. Sequencing of the entire mitochondrial DNA from blood revealed the m.4296G>A mutation in the MT-TI gene. The mutation was heteroplasmic with a 95% proportion of the mutant genome, while the proportion was 58% in the blood of the patient's clinically healthy mother. Our results suggest that m.4296G>A is pathogenic in humans, and that the phenotype related to this change includes Leigh-like syndrome in adolescence with parkinsonism and hypogonadism, in addition to the previously reported early infantile Leigh syndrome. Copyright © 2013 Elsevier B.V. and Mitochondria Research Society. All rights reserved.
Krieger, Jeannette; Hett, Anne Kathrin; Fuerst, Paul A; Birstein, Vadim J; Ludwig, Arne
2006-01-01
Significant intraindividual variation in the sequence of the 18S rRNA gene is unusual in animal genomes. In a previous study, multiple 18S rRNA gene sequences were observed within individuals of eight species of sturgeon from North America but not in the North American paddlefish, Polyodon spathula, in two species of Polypterus (Polypterus delhezi and Polypterus senegalus), in other primitive fishes (Erpetoichthys calabaricus, Lepisosteus osseus, Amia calva) or in a lungfish (Protopterus sp.). These observations led to the hypothesis that this unusual genetic characteristic arose within the Acipenseriformes after the presumed divergence of the sturgeon and paddlefish families. In the present study, a survey of nearly all Eurasian acipenseriform species was conducted to examine 18S rDNA variation. Intraindividual variation was not found in the polyodontid species, the Chinese paddlefish, Psephurus gladius, but variation was detected in all Eurasian acipenserid species. The comparison of sequences from two major segments of the 18S rRNA gene and identification of sites where insertion/deletion events have occurred are placed in the context of evolutionary relationships within the Acipenseriformes and the evolution of rDNA variation in this group.
Brandstätter, Anita; Peterson, Christine T; Irwin, Jodi A; Mpoke, Solomon; Koech, Davy K; Parson, Walther; Parsons, Thomas J
2004-10-01
Large forensic mtDNA databases that adhere to strict guidelines for generation and maintenance are not available for many populations outside of the United States and western Europe. We have established a high quality mtDNA control region sequence database for urban Nairobi as both a reference database for forensic investigations, and as a tool to examine the genetic variation of Kenyan sequences in the context of known African variation. The Nairobi sequences exhibited high variation and a low random match probability, indicating utility for forensic testing. Haplogroup identification and frequencies were compared with those reported from other published studies on African or African-origin populations from Mozambique, Sierra Leone, and the United States, and suggest significant differences in the mtDNA compositions of the various populations. The quality of the sequence data in our study was investigated and supported using phylogenetic measures. Our data demonstrate the diversity and distinctiveness of African populations, and underline the importance of establishing additional forensic mtDNA databases of indigenous African populations.
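The random match probability mentioned in the abstract above is commonly summarized as the sum of squared haplotype frequencies in a database. The sketch below assumes that simple definition; the study may have used a different or bias-corrected estimator, and the function name and haplotype labels are hypothetical.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in a sample.

    This is the chance that two haplotypes drawn at random (with
    replacement) from the sample are identical.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("empty sample")
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


# Toy database (hypothetical labels): frequencies 0.5, 0.25, 0.25.
sample = ["H1", "H1", "H2", "H3"]
print(random_match_probability(sample))
```

A sample in which every haplotype is unique gives the lowest possible value for its size (1/n), which is why high sequence variation and a low random match probability go together.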
2011-01-01
Background Paphiopedilum is a horticulturally and ecologically important genus of ca. 80 species of lady's slipper orchids native to Southeast Asia. These plants have long been of interest regarding their chromosomal evolution, which involves a progressive aneuploid series based on either fission or fusion of centromeres. Chromosome number is positively correlated with genome size, so rearrangement processes must include either insertion or deletion of DNA segments. We have conducted Fluorescence In Situ Hybridization (FISH) studies using 5S and 25S ribosomal DNA (rDNA) probes to survey for rearrangements, duplications, and phylogenetically-correlated variation within Paphiopedilum. We further studied sequence variation of the non-transcribed spacers of 5S rDNA (5S-NTS) to examine their complex duplication history, including the possibility that concerted evolutionary forces may homogenize diversity. Results 5S and 25S rDNA loci among Paphiopedilum species, representing all key phylogenetic lineages, exhibit a considerable diversity that correlates well with recognized evolutionary groups. 25S rDNA signals range from 2 (representing 1 locus) to 9, the latter representing hemizygosity. 5S loci display extensive structural variation, and show from 2 specific signals to many, both major and minor and highly dispersed. The dispersed signals mainly occur at centromeric and subtelomeric positions, which are hotspots for chromosomal breakpoints. Phylogenetic analysis of cloned 5S rDNA non-transcribed spacer (5S-NTS) sequences showed evidence for both ancient and recent post-speciation duplication events, as well as interlocus and intralocus diversity. Conclusions Paphiopedilum species display many chromosomal rearrangements - for example, duplications, translocations, and inversions - but only weak concerted evolutionary forces among highly duplicated 5S arrays, which suggests that double-strand break repair processes are dynamic and ongoing. 
These results make the genus a model system for the study of complex chromosomal evolution in plants. PMID:21910890
Lan, Tianying; Albert, Victor A
2011-09-12
Paphiopedilum is a horticulturally and ecologically important genus of ca. 80 species of lady's slipper orchids native to Southeast Asia. These plants have long been of interest regarding their chromosomal evolution, which involves a progressive aneuploid series based on either fission or fusion of centromeres. Chromosome number is positively correlated with genome size, so rearrangement processes must include either insertion or deletion of DNA segments. We have conducted Fluorescence In Situ Hybridization (FISH) studies using 5S and 25S ribosomal DNA (rDNA) probes to survey for rearrangements, duplications, and phylogenetically-correlated variation within Paphiopedilum. We further studied sequence variation of the non-transcribed spacers of 5S rDNA (5S-NTS) to examine their complex duplication history, including the possibility that concerted evolutionary forces may homogenize diversity. 5S and 25S rDNA loci among Paphiopedilum species, representing all key phylogenetic lineages, exhibit a considerable diversity that correlates well with recognized evolutionary groups. 25S rDNA signals range from 2 (representing 1 locus) to 9, the latter representing hemizygosity. 5S loci display extensive structural variation, and show from 2 specific signals to many, both major and minor and highly dispersed. The dispersed signals mainly occur at centromeric and subtelomeric positions, which are hotspots for chromosomal breakpoints. Phylogenetic analysis of cloned 5S rDNA non-transcribed spacer (5S-NTS) sequences showed evidence for both ancient and recent post-speciation duplication events, as well as interlocus and intralocus diversity. Paphiopedilum species display many chromosomal rearrangements--for example, duplications, translocations, and inversions--but only weak concerted evolutionary forces among highly duplicated 5S arrays, which suggests that double-strand break repair processes are dynamic and ongoing. 
These results make the genus a model system for the study of complex chromosomal evolution in plants.
Elbers, Jean P; Brown, Mary B; Taylor, Sabrina S
2018-01-19
Infectious disease is the single greatest threat to taxa such as amphibians (chytrid fungus), bats (white nose syndrome), Tasmanian devils (devil facial tumor disease), and black-footed ferrets (canine distemper virus, plague). Although understanding the genetic basis to disease susceptibility is important for the long-term persistence of these groups, most research has been limited to major-histocompatibility and Toll-like receptor genes. To better understand the genetic basis of infectious disease susceptibility in a species of conservation concern, we sequenced all known/predicted immune response genes (i.e., the immunomes) in 16 Florida gopher tortoises, Gopherus polyphemus. All tortoises produced antibodies against Mycoplasma agassizii (an etiologic agent of infectious upper respiratory tract disease; URTD) and, at the time of sampling, either had (n = 10) or lacked (n = 6) clinical signs. We found several variants associated with URTD clinical status in complement and lectin genes, which may play a role in Mycoplasma immunity. Thirty-five genes deviated from neutrality according to Tajima's D. These genes were enriched in functions relating to macromolecule and protein modifications, which are vital to immune system functioning. These results are suggestive of genetic differences that might contribute to disease severity, a finding that is consistent with other mycoplasmal diseases. This has implications for management because tortoises across their range may possess genetic variation associated with a more severe response to URTD. 
More generally: 1) this approach demonstrates that a broader consideration of immune genes is better able to identify important variants, and; 2) this data pipeline can be adopted to identify alleles associated with disease susceptibility or resistance in other taxa, and therefore provide information on a population's risk of succumbing to disease, inform translocations to increase genetic variation for disease resistance, and help to identify potential treatments.
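The neutrality test named in the abstract above, Tajima's D, can be sketched as follows. This is an illustrative implementation of the standard formula on a toy alignment, not the authors' pipeline; the function name and the input format (equal-length aligned strings) are assumptions.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for a list of equal-length aligned sequences (needs n >= 4)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("the variance term is zero for fewer than 4 sequences")
    # S: number of segregating (polymorphic) alignment columns.
    s = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if s == 0:
        return float("nan")  # D is undefined without polymorphism
    # pi: mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignments: an excess of rare variants gives D < 0,
# an excess of intermediate-frequency variants gives D > 0.
rare = ["AAAA", "TAAA", "ATAA", "AATA"]
balanced = ["A", "A", "T", "T"]
d_rare = tajimas_d(rare)
d_balanced = tajimas_d(balanced)
```

Deciding which genes "deviate from neutrality" additionally requires a significance threshold or simulation, which the abstract does not specify.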
Consensus generation and variant detection by Celera Assembler.
Denisov, Gennady; Walenz, Brian; Halpern, Aaron L; Miller, Jason; Axelrod, Nelson; Levy, Samuel; Sutton, Granger
2008-04-15
We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms. Applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the 2,033,311 detected regions of sequence variation. In 33,269 out of 460,373 detected regions of size >1 bp, it fixes the constructed errors of a mosaic haploid representation of a diploid locus as produced by the original Celera Assembler consensus algorithm. Using an optimized procedure calibrated against 1,506,344 known SNPs, it detects 438,814 new heterozygous SNPs with a false positive rate of 12%. The open source code is available at: http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/
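The windowing idea described in this abstract can be illustrated with a toy sketch. This is not the Celera Assembler code: the read representation (gapless `(start, sequence)` pairs), the function names, and the `min_support` and `max_gap` thresholds are all assumptions made for illustration.

```python
from collections import Counter


def column_counts(reads, length):
    """Per-column base counts for gapless reads given as (start, sequence)."""
    cols = [Counter() for _ in range(length)]
    for start, seq in reads:
        for i, base in enumerate(seq):
            cols[start + i][base] += 1
    return cols


def variant_windows(cols, min_support=2, max_gap=2):
    """Merge nearby variant columns into windows (lo, hi), inclusive."""
    variant = [i for i, c in enumerate(cols)
               if sum(1 for v in c.values() if v >= min_support) > 1]
    windows = []
    for pos in variant:
        if windows and pos - windows[-1][1] <= max_gap:
            windows[-1][1] = pos  # extend the current window
        else:
            windows.append([pos, pos])
    return [(lo, hi) for lo, hi in windows]


def window_alleles(reads, window, min_support=2):
    """Group reads spanning the whole window by their sequence inside it."""
    lo, hi = window
    spans = Counter()
    for start, seq in reads:
        if start <= lo and start + len(seq) > hi:
            spans[seq[lo - start:hi - start + 1]] += 1
    return {allele: n for allele, n in spans.items() if n >= min_support}


# Two haplotypes differing at columns 2 and 4, two reads each.
reads = [(0, "ACGTACGT"), (0, "ACGTACGT"), (0, "ACCTTCGT"), (0, "ACCTTCGT")]
cols = column_counts(reads, 8)
windows = variant_windows(cols)
alleles = window_alleles(reads, windows[0])
```

Because whole read segments are grouped, each reported allele is a string actually observed in the reads, whereas a per-column majority vote over the same data could mix bases from both haplotypes.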
The nuclear 18S ribosomal RNA gene as a source of phylogenetic information in the genus Taenia.
Yan, Hongbin; Lou, Zhongzi; Li, Li; Ni, Xingwei; Guo, Aijiang; Li, Hongmin; Zheng, Yadong; Dyachenko, Viktor; Jia, Wanzhong
2013-03-01
Most species of the genus Taenia are of considerable medical and veterinary significance. In this study, complete nuclear 18S rRNA gene sequences were obtained from seven members of genus Taenia [Taenia multiceps, Taenia saginata, Taenia asiatica, Taenia solium, Taenia pisiformis, Taenia hydatigena, and Taenia taeniaeformis] and a phylogeny inferred using these sequences. Most of the variable sites fall within the variable regions, V1-V5. We show that sequences from the nuclear 18S ribosomal RNA gene have considerable promise as sources of phylogenetic information within the genus Taenia. Furthermore, given that almost all the variable sites lie within defined variable portions of that gene, it will be appropriate and economical to sequence only those regions for additional species of Taenia.
Clinical Interpretation and Implications of Whole-Genome Sequencing
Dewey, Frederick E.; Grove, Megan E.; Pan, Cuiping; Goldstein, Benjamin A.; Bernstein, Jonathan A.; Chaib, Hassan; Merker, Jason D.; Goldfeder, Rachel L.; Enns, Gregory M.; David, Sean P.; Pakdaman, Neda; Ormond, Kelly E.; Caleshu, Colleen; Kingham, Kerry; Klein, Teri E.; Whirl-Carrillo, Michelle; Sakamoto, Kenneth; Wheeler, Matthew T.; Butte, Atul J.; Ford, James M.; Boxer, Linda; Ioannidis, John P. A.; Yeung, Alan C.; Altman, Russ B.; Assimes, Themistocles L.; Snyder, Michael; Ashley, Euan A.; Quertermous, Thomas
2014-01-01
IMPORTANCE Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. OBJECTIVES To examine coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data and resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. DESIGN, SETTING, AND PARTICIPANTS An exploratory study of 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings. Five physicians proposed initial clinical follow-up based on the genetic findings. MAIN OUTCOMES AND MEASURES Genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, burden of inherited disease risk and pharmacogenomic findings, and burden and interrater agreement of proposed clinical follow-up. RESULTS Depending on sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide genetic variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per genetic variant, resulted in moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of genetic variants cataloged as disease causing in mutation databases to variants of uncertain or lesser significance. 
Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in hereditary breast and ovarian cancer. Physician review of sequencing findings prompted consideration of a median of 1 to 3 initial diagnostic tests and referrals per participant, with fair interrater agreement about the suitability of WGS findings for clinical follow-up (Fleiss κ, 0.24; P < .001). CONCLUSIONS AND RELEVANCE In this exploratory study of 12 volunteer adults, the use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detection of genetic variation with the highest potential clinical effects, and uncertainty about clinically reportable findings. In certain cases, WGS will identify clinically actionable genetic variants warranting early medical intervention. These issues should be considered when determining the role of WGS in clinical medicine. PMID:24618965
Abundant mtDNA diversity and ancestral admixture in Colombian criollo cattle (Bos taurus).
Carvajal-Carmona, Luis G; Bermudez, Nelson; Olivera-Angel, Martha; Estrada, Luzardo; Ossa, Jorge; Bedoya, Gabriel; Ruiz-Linares, Andrés
2003-11-01
Various cattle populations in the Americas (known as criollo breeds) have an origin in some of the first livestock introduced to the continent early in the colonial period (16th and 17th centuries). These cattle constitute a potentially important genetic reserve as they are well adapted to local environments and show considerable variation in phenotype. To examine the genetic ancestry and diversity of Colombian criollo we obtained mitochondrial DNA control region sequence information for 110 individuals from seven breeds. Old World haplogroup T3 is the most commonly observed CR lineage in criollo (0.65), in agreement with a mostly European ancestry for these cattle. However, criollo also shows considerable frequencies of haplogroups T2 (0.09) and T1 (0.26), with T1 lineages in criollo being more diverse than those reported for West Africa. The distribution and diversity of Old World lineages suggest some North African ancestry for criollo, probably as a result of the Arab occupation of Iberia prior to the European migration to the New World. The mtDNA diversity of criollo is higher than that reported for European and African cattle and is consistent with a differentiated ancestry for some criollo breeds.
Ebolavirus comparative genomics
Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; Uberbacher, Edward C.; Land, Miriam; Zhang, Qian; Wanchai, Visanu; Chai, Juanjuan; Nielsen, Morten; Trolle, Thomas; Lund, Ole; Buzard, Gregory S.; Pedersen, Thomas D.; Wassenaar, Trudy M.; Ussery, David W.
2015-01-01
The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). PMID:26175035
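The oligomer frequency analysis mentioned in this abstract compares genomes by their k-mer composition. The abstract gives neither the oligomer length nor the distance measure, so the sketch below assumes k = 4 and a Euclidean distance between normalised frequency vectors; the sequences are short placeholders, not real viral genomes.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq, k=4):
    """Relative frequency of each k-mer made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")  # skip windows with N or other codes
    )
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}


def euclidean_distance(f1, f2):
    """Distance between two sparse k-mer frequency vectors."""
    keys = set(f1) | set(f2)
    return sqrt(sum((f1.get(key, 0.0) - f2.get(key, 0.0)) ** 2 for key in keys))


# Placeholder sequences standing in for genomes.
genome_a = "ACGTACGTACGTTTGACA"
genome_b = "ACGTACGAACGTTTGACA"
genome_c = "GGGGCCCCGGGGCCCCGG"
d_ab = euclidean_distance(kmer_frequencies(genome_a), kmer_frequencies(genome_b))
d_ac = euclidean_distance(kmer_frequencies(genome_a), kmer_frequencies(genome_c))
```

Clustering genomes on a matrix of such distances is one way a family could appear as a group distinct from other viral genomes.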
Miller, Hilary C.; O’Meally, Denis; Ezaz, Tariq; Amemiya, Chris; Marshall-Graves, Jennifer A.; Edwards, Scott
2015-01-01
Major histocompatibility complex (MHC) genes are a central component of the vertebrate immune system and usually exist in a single genomic region. However, considerable differences in MHC organization and size exist between different vertebrate lineages. Reptiles occupy a key evolutionary position for understanding how variation in MHC structure evolved in vertebrates, but information on the structure of the MHC region in reptiles is limited. In this study, we investigate the organization and cytogenetic location of MHC genes in the tuatara (Sphenodon punctatus), the sole extant representative of the early-diverging reptilian order Rhynchocephalia. Sequencing and mapping of 12 clones containing class I and II MHC genes from a bacterial artificial chromosome library indicated that the core MHC region is located on chromosome 13q. However, duplication and translocation of MHC genes outside of the core region was evident, because additional class I MHC genes were located on chromosome 4p. We found a total of seven class I sequences and 11 class II β sequences, with evidence for duplication and pseudogenization of genes within the tuatara lineage. The tuatara MHC is characterized by high repeat content and low gene density compared with other species and we found no antigen processing or MHC framework genes on the MHC gene-containing clones. Our findings indicate substantial differences in MHC organization in tuatara compared with mammalian and avian MHCs and highlight the dynamic nature of the MHC. Further sequencing and annotation of tuatara and other reptile MHCs will determine if the tuatara MHC is representative of nonavian reptiles in general. PMID:25953959
Minimal Absent Words in Four Human Genome Assemblies
Garcia, Sara P.; Pinho, Armando J.
2011-01-01
Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we aim to contribute to the catalogue of human genomic variation by investigating the variation in number and content of minimal absent words within a species, using four human genome assemblies. We compare the reference human genome GRCh37 assembly, the HuRef assembly of the genome of Craig Venter, the NA12878 assembly from cell line GM12878, and the YH assembly of the genome of a Han Chinese individual. We find the variation in number and content of minimal absent words between assemblies more significant for large and very large minimal absent words, where the biases of sequencing and assembly methodologies become more pronounced. Moreover, we find generally greater similarity between the human genome assemblies sequenced with capillary-based technologies (GRCh37 and HuRef) than between the human genome assemblies sequenced with massively parallel technologies (NA12878 and YH). Finally, as expected, we find the overall variation in number and content of minimal absent words within a species to be generally smaller than the variation between species. PMID:22220210
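For readers unfamiliar with the term, a minimal absent word is a string that does not occur in a sequence although every proper substring of it does. A brute-force sketch of that definition for short words follows; it is only practical for small inputs, considers a single strand, and is not the method used in the paper. The function name and the `max_len` bound are assumptions.

```python
def minimal_absent_words(seq, max_len, alphabet="ACGT"):
    """All minimal absent words of seq with length <= max_len (brute force)."""
    # Every substring of seq up to max_len characters long.
    factors = {
        seq[i:j]
        for i in range(len(seq))
        for j in range(i + 1, min(i + max_len, len(seq)) + 1)
    }
    # Single letters that never occur are minimal absent words of length 1.
    maws = {a for a in alphabet if a not in factors}
    # A longer word w is minimal absent iff w is absent while both
    # w[:-1] and w[1:] occur; build candidates by extending present factors.
    for left in factors:
        if len(left) >= max_len:
            continue
        for b in alphabet:
            w = left + b
            if w not in factors and w[1:] in factors:
                maws.add(w)
    return maws


# Small example over a two-letter alphabet.
words = minimal_absent_words("AAC", max_len=3, alphabet="AC")
```

For "AAC" the result is the set containing "CA", "CC" and "AAA": each is absent from the sequence, yet removing its first or last letter leaves a substring that is present.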
Missense polymorphisms in the MC1R gene of the dog, red fox, arctic fox and Chinese raccoon dog.
Nowacka-Woszuk, J; Salamon, S; Gorna, A; Switonski, M
2013-04-01
Coat colour variation is determined by many genes, one of which is the melanocortin receptor type 1 (MC1R) gene. In this study, we examined the whole coding sequence of this gene in four species belonging to the Canidae family (dog, red fox, arctic fox and Chinese raccoon dog). Although the comparative analysis of the obtained nucleotide sequences revealed a high conservation, which varied between 97.9 and 99.1%, we altogether identified 22 SNPs (10 in dogs, six in farmed red foxes, two in wild red foxes, three in arctic foxes and one in Chinese raccoon dog). Among them, seven appeared to be novel: one silent in the dog, three missense and one silent in the red fox, one in the 3'-flanking region in the arctic fox and one silent in the Chinese raccoon dog. In dogs and red foxes, the SNPs segregated as 10 and four haplotypes, respectively. Taking into consideration the published reports and results of this study, the highest number of missense polymorphisms was until now found in the dog (9) and red fox (7). © 2012 Blackwell Verlag GmbH.
Fuzzy measures on the Gene Ontology for gene product similarity.
Popescu, Mihail; Keller, James M; Mitchell, Joyce A
2006-01-01
One of the most important objects in bioinformatics is a gene product (protein or RNA). For many gene products, functional information is summarized in a set of Gene Ontology (GO) annotations. For these genes, it is reasonable to include similarity measures based on the terms found in the GO or other taxonomy. In this paper, we introduce several novel measures for computing the similarity of two gene products annotated with GO terms. The fuzzy measure similarity (FMS) has the advantage that it takes into consideration the context of both complete sets of annotation terms when computing the similarity between two gene products. When the two gene products are not annotated by common taxonomy terms, we propose a method that avoids a zero similarity result. To account for the variations in the annotation reliability, we propose a similarity measure based on the Choquet integral. These similarity measures provide extra tools for the biologist in search of functional information for gene products. The initial testing on a group of 194 sequences representing three protein families shows a higher correlation of the FMS and Choquet similarities to the BLAST sequence similarities than the traditional similarity measures such as pairwise average or pairwise maximum.
A genomic audit of newly-adopted autosomal STRs for forensic identification.
Phillips, C
2017-07-01
In preparation for the growing use of massively parallel sequencing (MPS) technology to genotype forensic STRs, a comprehensive genomic audit of 73 STRs was made in 2016 [Parson et al., Forensic Sci. Int. Genet. 22, 54-63]. The loci examined included miniSTRs that were not in widespread use, but had been incorporated into MPS kits or were under consideration for this purpose. The current study expands the genomic analysis of autosomal STRs that are not commonly used, to include the full set of developed miniSTRs and an additional 24 STRs, most of which have been recently included in several supplementary forensic multiplex kits for capillary electrophoresis. The genomic audit of these 47 newly-adopted STRs examined the linkage status of new loci on the same chromosome as established forensic STRs; analyzed world-wide population variation of the newly-adopted STRs using published data; assessed their forensic informativeness; and compiled the sequence characteristics, repeat structures and flanking regions of each STR. A further 44 autosomal STRs developed for forensic analyses but not incorporated into commercial kits, are also briefly described. Copyright © 2017 Elsevier B.V. All rights reserved.
Schadt, Eric E.; Banerjee, Onureena; Fang, Gang; Feng, Zhixing; Wong, Wing H.; Zhang, Xuegong; Kislyuk, Andrey; Clark, Tyson A.; Luong, Khai; Keren-Paz, Alona; Chess, Andrew; Kumar, Vipin; Chen-Plotkin, Alice; Sondheimer, Neal; Korlach, Jonas; Kasarskis, Andrew
2013-01-01
Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types. PMID:23093720
Artificial mismatch hybridization
Guo, Zhen; Smith, Lloyd M.
1998-01-01
An improved nucleic acid hybridization process is provided which employs a modified oligonucleotide and improves the ability to discriminate a control nucleic acid target from a variant nucleic acid target containing a sequence variation. The modified probe contains at least one artificial mismatch relative to the control nucleic acid target in addition to any mismatch(es) arising from the sequence variation. The invention has direct and advantageous application to numerous existing hybridization methods, including applications that employ, for example, the Polymerase Chain Reaction, allele-specific nucleic acid sequencing methods, and diagnostic hybridization methods.
Tandemly repeated sequences in mtDNA control region of whitefish, Coregonus lavaretus.
Brzuzan, P
2000-06-01
Length variation of the mitochondrial DNA control region was observed with PCR amplification of a sample of 138 whitefish (Coregonus lavaretus). Nucleotide sequences of representative PCR products showed that the variation was due to the presence of an approximately 100-bp motif tandemly repeated two, three, or five times in the region between the conserved sequence block-3 (CSB-3) and the gene for phenylalanine tRNA. This is the first report on the tandem array composed of long repeat units in mitochondrial DNA of salmonids.
Maghuly, Fatemeh; Jankowicz-Cieslak, Joanna; Pabinger, Stephan; Till, Bradley J; Laimer, Margit
2015-01-01
Increasing economic interest in Jatropha curcas requires a major research focus on the genetic background and geographic origin of this non-edible biofuel crop. To determine the worldwide genetic structure of this species, amplified fragment length polymorphisms, inter simple sequence repeats, and novel single nucleotide polymorphisms (SNPs) were employed for a large collection of 907 J. curcas accessions and related species (RS) from three continents, 15 countries and 53 regions. PCoA, phenogram, and cophenetic analyses separated RS from two J. curcas groups. Accessions from Mexico, Bolivia, Paraguay, Kenya, and Ethiopia with unknown origins were found in both groups. In general, there was a considerable overlap between individuals from different regions and countries. The Bayesian approach using STRUCTURE demonstrated two groups with low genetic variation. Analysis of molecular variance revealed significant variation among individuals within populations. SNPs found by in silico analyses of Δ12 fatty acid desaturase indicated possible changes in gene expression and thus in fatty acid profiles. SNP variation was higher in the curcin gene compared to genes involved in oil production. Novel SNPs allowed separating toxic, non-toxic, and Mexican accessions. The present study confirms that human activities had a major influence on the genetic diversity of J. curcas, not only because of domestication, but also because of biased selection. PMID:25511658
Oryctes virus--time for a new look at a useful biocontrol agent.
Jackson, Trevor A; Crawford, Allan M; Glare, Travis R
2005-05-01
The introduction of Oryctes virus into outbreak areas of the rhinoceros beetle, Oryctes rhinoceros (Coleoptera: Scarabaeidae), has been a major success for "classical" biocontrol with a virus and has led to a dramatic reduction in palm damage in many areas of the Asia/Pacific region. In recent years, however, there have been new reports of high levels of rhinoceros beetle damage to palms. Damage has been especially intense in SE Asia following the introduction of no-burn policies for land clearance and replanting, but outbreaks have also been reported from some Pacific Islands where control seems to have diminished over time. SE Asian studies show that there is considerable genetic variation among endemic Oryctes virus isolates and studies in new island release areas have shown rapid evolution of the virus. The consequences of such genetic variation are in need of further study. Furthermore, the taxonomic position of the virus is unclear, with its removal from the Baculoviridae to an "unassigned" virus, reflecting its novel characteristics. Genomic sequencing could help resolve the taxonomy of the virus and provide a basis for studying strain variation. Oryctes virus has achieved wide success in the past without the benefit of molecular analysis and identification techniques. In order to fully take advantage of this unique pathogen for protection of palms, a renewed, coordinated effort centered on genetic selection and distribution of effective strains is required.
Genetic and Epigenetic Variations Induced by Wheat-Rye 2R and 5R Monosomic Addition Lines
Fu, Shulan; Sun, Chuanfei; Yang, Manyu; Fei, Yunyan; Tan, Feiqun; Yan, Benju; Ren, Zhenglong; Tang, Zongxiang
2013-01-01
Background Monosomic alien addition lines (MAALs) can easily induce structural variation of chromosomes and have been used in crop breeding; however, it is unclear whether MAALs will induce drastic genetic and epigenetic alterations. Methodology/Principal Findings In the present study, wheat-rye 2R and 5R MAALs together with their selfed progeny and parental common wheat were investigated through amplified fragment length polymorphism (AFLP) and methylation-sensitive amplification polymorphism (MSAP) analyses. The MAALs in different generations displayed different genetic variations. Some progeny that only contained 42 wheat chromosomes showed great genetic/epigenetic alterations. Cryptic rye chromatin has introgressed into the wheat genome. However, one of the progeny that contained cryptic rye chromatin did not display outstanding genetic/epigenetic variation. 78 and 49 sequences were cloned from changed AFLP and MSAP bands, respectively. Blastn search indicated that almost half of them showed no significant similarity to known sequences. Retrotransposons were mainly involved in genetic and epigenetic variations. Genetic variations basically affected Gypsy-like retrotransposons, whereas epigenetic alterations affected Copia-like and Gypsy-like retrotransposons equally. Genetic and epigenetic variations seldom affected low-copy coding DNA sequences. Conclusions/Significance The results in the present study provided direct evidence to illustrate that monosomic wheat-rye addition lines could induce different and drastic genetic/epigenetic variations and these variations might not be caused by introgression of rye chromatins into wheat. Therefore, MAALs may be directly used as an effective means to broaden the genetic diversity of common wheat. PMID:23342073
NASA Astrophysics Data System (ADS)
Antoine, Pierre; Rousseau, Denis-Didier; Degeai, Jean-Philippe; Moine, Olivier; Lagroix, France; Kreutzer, Sebastian; Fuchs, Markus; Hatté, Christine; Gauthier, Caroline; Svoboda, Jiri; Lisá, Lenka
2013-05-01
High-resolution multidisciplinary investigations of key European loess-palaeosol profiles have demonstrated that loess sequences result from rapid and cyclic aeolian sedimentation, which is reflected in variations of loess grain-size indexes and correlated with Greenland ice-core dust records. This correlation suggests a global connection between North Atlantic and west-European air masses. Herein, we present a revised stratigraphy and a continuous high-resolution record of grain size, magnetic susceptibility and organic carbon δ13C of the famous Dolní Vestonice (DV) loess sequence in the Moravian region of the Czech Republic. A new set of quartz OSL ages provides a reliable and accurate chronology of the sequence's main pedosedimentary events. The grain-size record shows strongly contrasting variations with numerous abrupt coarse-grained events, especially in the upper part of the sequence between ca 20-30 ka. This time period is also characterised by a progressive coarsening of the loess deposits, as already observed in other western European sequences. The base of the DV sequence exhibits an exceptionally well-preserved soil complex composed of three chernozem soil horizons and five aeolian silt layers (marker silts). This complex is, at present, the most complete record of environmental variations and dust deposition in the European loess belt for the Weichselian Early-glacial period spanning about 110 to 70 ka, allowing correlations with various global palaeoclimatic records. OSL ages combined with sedimentological and palaeopedological observations lead to the conclusion that this soil complex recorded all of the main climatic events expressed in the North GRIP record from Greenland Interstadials (GIS) 25 to 19.
A Laboratory Exercise for Genotyping Two Human Single Nucleotide Polymorphisms
ERIC Educational Resources Information Center
Fernando, James; Carlson, Bradley; LeBard, Timothy; McCarthy, Michael; Umali, Finianne; Ashton, Bryce; Rose, Ferrill F., Jr.
2016-01-01
The dramatic decrease in the cost of sequencing a human genome is leading to an era in which a wide range of students will benefit from having an understanding of human genetic variation. Since over 90% of sequence variation between humans is in the form of single nucleotide polymorphisms (SNPs), a laboratory exercise has been devised in order to…
RSAT 2018: regulatory sequence analysis tools 20th anniversary.
Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques; Medina-Rivera, Alejandra; Thomas-Chollier, Morgane
2018-05-02
RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools to the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.
Sequence-length variation of mtDNA HVS-I C-stretch in Chinese ethnic groups.
Chen, Feng; Dang, Yong-hui; Yan, Chun-xia; Liu, Yan-ling; Deng, Ya-jun; Fulton, David J R; Chen, Teng
2009-10-01
The purpose of this study was to investigate mitochondrial DNA (mtDNA) hypervariable segment-I (HVS-I) C-stretch variations and explore the significance of these variations in forensic and population genetics studies. The C-stretch sequence variation was studied in 919 unrelated individuals from 8 Chinese ethnic groups using both direct and clone sequencing approaches. Thirty-eight C-stretch haplotypes were identified, and some novel and population-specific haplotypes were also detected. The C-stretch genetic diversity (GD) values were relatively high, and probability (P) values were low. Additionally, C-stretch length heteroplasmy was observed in approximately 9% of individuals studied. There was a significant correlation (r=-0.961, P<0.01) between the expansion of the cytosine sequence length in the C-stretch of HVS-I and a reduction in the number of upstream adenines. These results indicate that the C-stretch could be a useful genetic marker in forensic identification of Chinese populations. The results from the Fst and dA genetic distance matrix, neighbor-joining tree, and principal component map also suggest that the C-stretch could be used as a reliable genetic marker in population genetics.
VARiD: a variation detection framework for color-space and letter-space platforms.
Dalca, Adrian V; Rumble, Stephen M; Levy, Samuel; Brudno, Michael
2010-06-15
High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates, and while most HTS technologies sequence the residues of the genome directly, generating base calls for each position, the Applied Biosystems SOLiD platform generates dibase-coded (color space) sequences. While combining data from the various platforms should increase the accuracy of variation detection, to date there are only a few tools that can identify variants from color space data, and none that can analyze color space and regular (letter space) data together. We present VARiD, a probabilistic method for variation detection from both letter- and color-space reads simultaneously. VARiD is based on a hidden Markov model and uses the forward-backward algorithm to accurately identify heterozygous, homozygous and tri-allelic SNPs, as well as micro-indels. Our analysis shows that VARiD performs better than the AB SOLiD toolset at detecting variants from color-space data alone, and improves the calls dramatically when letter- and color-space reads are combined. The toolset is freely available at http://compbio.cs.utoronto.ca/varid.
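The dibase (color-space) coding mentioned in this abstract can be illustrated with a minimal Python sketch. It is not VARiD and does not implement its hidden Markov model; it only shows the standard two-base encoding, in which each color is the XOR of the 2-bit codes of two adjacent bases. The function names and the example sequences are hypothetical and chosen for illustration.

```python
# Two-bit base codes; the color of a dinucleotide is the XOR of its two codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def to_color_space(seq):
    """Encode a letter-space sequence as a list of colors (0-3)."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def from_color_space(first_base, colors):
    """Decode colors back to letters, given the known first base."""
    bases = [first_base]
    for color in colors:
        # XOR is its own inverse, so the next base follows from the previous one.
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


# Hypothetical reference and a copy carrying one SNP (G -> A at index 3).
reference = "ACGGTCA"
variant = "ACGATCA"

ref_colors = to_color_space(reference)
var_colors = to_color_space(variant)
changed = [i for i, (r, v) in enumerate(zip(ref_colors, var_colors)) if r != v]

print(ref_colors)  # [1, 3, 0, 1, 2, 1]
print(var_colors)  # [1, 3, 2, 3, 2, 1]
print(changed)     # [2, 3]
```

In this sketch an isolated SNP alters two adjacent colors, whereas a single miscalled color would alter only one and would shift every downstream base on decoding. That property is one reason color-space and letter-space reads need different handling.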
McRobie, Helen R; King, Linda M; Fanutti, Cristina; Coussons, Peter J; Moncrief, Nancy D; Thomas, Alison P M
2014-01-01
Sequence variations in the melanocortin 1 receptor (MC1R) gene are associated with melanism in many different species of mammals, birds, and reptiles. The gray squirrel (Sciurus carolinensis), found in the British Isles, was introduced from North America in the late 19th century. Melanism in the British gray squirrel is associated with a 24-bp deletion in the MC1R. To investigate the origin of this mutation, we sequenced the MC1R of 95 individuals including 44 melanic gray squirrels from both the British Isles and North America. Melanic gray squirrels of both populations had the same 24-bp deletion associated with melanism. Given the significant deletion associated with melanism in the gray squirrel, we sequenced the MC1R of both wild-type and melanic fox squirrels (Sciurus niger) (9 individuals) and red squirrels (Sciurus vulgaris) (39 individuals). Unlike the gray squirrel, no association between sequence variation in the MC1R and melanism was found in these 2 species. We conclude that the melanic gray squirrel found in the British Isles originated from one or more introductions of melanic gray squirrels from North America. We also conclude that variations in the MC1R are not associated with melanism in the fox and red squirrels.
Saving the spandrels? Adaptive genomic variation in conservation and fisheries management.
Pearse, D E
2016-12-01
As highlighted by many of the papers in this issue, research on the genomic basis of adaptive phenotypic variation in natural populations has made spectacular progress in the past few years, largely due to the advances in sequencing technology and analysis. Without question, the resulting genomic data will improve the understanding of regions of the genome under selection and extend knowledge of the genetic basis of adaptive evolution. What is far less clear, but has been the focus of active discussion, is how such information can or should transfer into conservation practice to complement more typical conservation applications of genetic data. Before such applications can be realized, the evolutionary importance of specific targets of selection relative to the genome-wide diversity of the species as a whole must be evaluated. The key issues for the incorporation of adaptive genomic variation in conservation and management are discussed here, using published examples of adaptive genomic variation associated with specific phenotypes in salmonids and other taxa to highlight practical considerations for incorporating such information into conservation programmes. Scenarios are described in which adaptive genomic data could be used in conservation or restoration, constraints on its utility and the importance of validating inferences drawn from new genomic data before applying them in conservation practice. Finally, it is argued that an excessive focus on preserving the adaptive variation that can be measured, while ignoring the vast unknown majority that cannot, is a modern twist on the adaptationist programme that Gould and Lewontin critiqued almost 40 years ago. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
Dalirsefat, Seyed Benyamin; Dong, Xianggui; Deng, Xuemei
2015-08-01
In total, 246 individuals from 8 Chinese indigenous blue- and brown-shelled chicken populations (Yimeng Blue, Wulong Blue, Lindian Blue, Dongxiang Blue, Lushi Blue, Jingmen Blue, Dongxiang Brown, and Lushi Brown) were genotyped for 21 SNP markers from the SLCO1B3 gene to evaluate phylogenetic relationships. As a representative of nonblue-shelled breeds, White Leghorn was included in the study for reference. A high proportion of SNP polymorphism was observed in Chinese chicken populations, ranging from 89% in Jingmen Blue to 100% in most populations, with a mean of 95% across all populations. The White Leghorn breed showed the lowest polymorphism, accounting for 43% of total SNPs. The mean expected heterozygosity varied from 0.11 in Dongxiang Blue to 0.46 in Yimeng Blue. Analysis of molecular variation (AMOVA) for 2 groups of Chinese chickens based on eggshell color type revealed 52% within-group and 43% between-group variations of the total genetic variation. As expected, FST and Reynolds' genetic distance were greatest between White Leghorn and Chinese chicken populations, with average values of 0.40 and 0.55, respectively. The first and second principal coordinates explained approximately 92% of the total variation and supported the clustering of the populations according to their eggshell color type and historical origins. STRUCTURE analysis showed a considerable source of variation among populations for the clustering into blue-shelled and nonblue-shelled chicken populations. The low estimation of genetic differentiation (FST) between Chinese chicken populations is possibly due to a common historical origin and high gene flow. Remarkably similar population classifications were obtained with all methods used in the study. Aligning endogenous avian retroviral (EAV)-HP insertion sequences showed no difference among the blue-shelled chickens. © 2015 Poultry Science Association Inc.
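As a reading aid for the heterozygosity values quoted above, here is a minimal Python sketch of the textbook definition of expected heterozygosity at one locus, He = 1 - sum(p_i^2). It assumes known allele frequencies and applies no sample-size correction; the abstract does not state which estimator the authors used, and the function name and frequencies below are illustrative only.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus: 1 minus the sum of squared allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical biallelic SNPs with different minor-allele frequencies.
for freqs in ([0.5, 0.5], [0.7, 0.3], [0.9, 0.1]):
    print(freqs, round(expected_heterozygosity(freqs), 2))
```

For a biallelic SNP this quantity cannot exceed 0.5, so a population mean of 0.46 is close to the maximum, while 0.11 indicates that most markers are near fixation.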
Segmental Duplications and Copy-Number Variation in the Human Genome
Sharp, Andrew J. ; Locke, Devin P. ; McGrath, Sean D. ; Cheng, Ze ; Bailey, Jeffrey A. ; Vallente, Rhea U. ; Pertz, Lisa M. ; Clark, Royden A. ; Schwartz, Stuart ; Segraves, Rick ; Oseroff, Vanessa V. ; Albertson, Donna G. ; Pinkel, Daniel ; Eichler, Evan E.
2005-01-01
The human genome contains numerous blocks of highly homologous duplicated sequence. This higher-order architecture provides a substrate for recombination and recurrent chromosomal rearrangement associated with genomic disease. However, an assessment of the role of segmental duplications in normal variation has not yet been made. On the basis of the duplication architecture of the human genome, we defined a set of 130 potential rearrangement hotspots and constructed a targeted bacterial artificial chromosome (BAC) microarray (with 2,194 BACs) to assess copy-number variation in these regions by array comparative genomic hybridization. Using our segmental duplication BAC microarray, we screened a panel of 47 normal individuals, who represented populations from four continents, and we identified 119 regions of copy-number polymorphism (CNP), 73 of which were previously unreported. We observed an equal frequency of duplications and deletions, as well as a 4-fold enrichment of CNPs within hotspot regions, compared with control BACs (P < .000001), which suggests that segmental duplications are a major catalyst of large-scale variation in the human genome. Importantly, segmental duplications themselves were also significantly enriched >4-fold within regions of CNP. Almost without exception, CNPs were not confined to a single population, suggesting that these either are recurrent events, having occurred independently in multiple founders, or were present in early human populations. Our study demonstrates that segmental duplications define hotspots of chromosomal rearrangement, likely acting as mediators of normal variation as well as genomic disease, and it suggests that the consideration of genomic architecture can significantly improve the ascertainment of large-scale rearrangements. 
Our specialized segmental duplication BAC microarray and associated database of structural polymorphisms will provide an important resource for the future characterization of human genomic disorders. PMID:15918152
Lin, Ke; Zhang, Ningwen; Severing, Edouard I; Nijveen, Harm; Cheng, Feng; Visser, Richard G F; Wang, Xiaowu; de Ridder, Dick; Bonnema, Guusje
2014-03-31
Brassica rapa is an economically important crop species. During its long breeding history, a large number of morphotypes have been generated, including leafy vegetables such as Chinese cabbage and pakchoi, turnip tuber crops and oil crops. To investigate the genetic variation underlying this morphological variation, we re-sequenced, assembled and annotated the genomes of two B. rapa subspecies, turnip crops (turnip) and a rapid-cycling line. We then analysed the two resulting genomes together with the Chinese cabbage Chiifu reference genome to obtain an impression of the B. rapa pan-genome. The number of genes with protein-coding changes between the three genotypes was lower than that among different accessions of Arabidopsis thaliana, which can be explained by the smaller effective population size of B. rapa due to its domestication. Based on orthology to a number of non-brassica species, we estimated the date of divergence among the three B. rapa morphotypes at approximately 250,000 YA, far predating Brassica domestication (5,000-10,000 YA). By analysing genes unique to turnip we found evidence for copy number differences in peroxidases, pointing to a role for the phenylpropanoid biosynthesis pathway in the generation of morphological variation. The estimated date of divergence among the three B. rapa morphotypes implies that prior to domestication there was already considerable divergence among B. rapa genotypes. Our study thus provides two new B. rapa reference genomes, delivers a set of computer tools to analyse the resulting pan-genome and uses these to shed light on genetic drivers behind the rich morphological variation found in B. rapa.
Primer and platform effects on 16S rRNA tag sequencing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tremblay, Julien; Singh, Kanwar; Fern, Alison
Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, sequencing technologies; as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6–V8, and V7–V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enabled the use of more aggressive quality control parameters over 454, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. In conclusion, beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.
Primer and platform effects on 16S rRNA tag sequencing
Tremblay, Julien; Singh, Kanwar; Fern, Alison; ...
2015-08-04
Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, sequencing technologies; as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6–V8, and V7–V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enabled the use of more aggressive quality control parameters over 454, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. In conclusion, beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.
He, Xiao-Lan; Li, Qian; Peng, Wei-Hong; Zhou, Jie; Cao, Xue-Lian; Wang, Di; Huang, Zhong-Qian; Tan, Wei; Li, Yu; Gan, Bing-Cheng
2017-06-26
The internal transcribed spacer (ITS), RNA polymerase II second largest subunit (RPB2), and elongation factor 1-alpha (EF1α) are often used in fungal taxonomy and phylogenetic analysis. As we know, an ideal molecular marker used in molecular identification and phylogenetic studies is homogeneous within species, and interspecific variation exceeds intraspecific variation. However, during our process of performing ITS, RPB2, and EF1α sequencing on the Pleurotus spp., we found that intra-isolate sequence polymorphism might be present in these genes because direct sequencing of PCR products failed in some isolates. Therefore, we detected intra- and inter-isolate variation of the three genes in Pleurotus by polymerase chain reaction amplification and cloning in this study. Results showed that intra-isolate variation of ITS was not uncommon but the polymorphic level in each isolate was relatively low in Pleurotus; intra-isolate variations of EF1α and RPB2 sequences were present in an unexpectedly high amount. The polymorphism level differed significantly between ITS, RPB2, and EF1α in the same individual, and the intra-isolate heterogeneity level of each gene varied between isolates within the same species. Intra-isolate and intraspecific variation of ITS in the tested isolates was less than interspecific variation, and intra-isolate and intraspecific variation of RPB2 was probably equal to interspecific divergence. Meanwhile, intra-isolate and intraspecific variation of EF1α could exceed interspecific divergence. These findings suggested that RPB2 and EF1α are not desirable barcoding candidates for Pleurotus. We also discuss possible reasons why rDNA and protein-coding genes showed variants within a single isolate in Pleurotus, but this question must be addressed in further research. Our study demonstrated that intra-isolate variation of ribosomal and protein-coding genes is likely widespread in fungi. 
This has implications for studies on fungal evolution, taxonomy, phylogenetics, and population genetics. More extensive sampling of these genes and other candidates will be required to ensure reliability as phylogenetic markers and DNA barcodes.
What Advances Are Being Made in DNA Sequencing?
... to identify genetic variations; both methods rely on new technologies that allow rapid sequencing of large amounts of ... describes the different sequencing technologies and what the new technologies have meant for the study of the genetic ...
2013-01-01
Background: Genetic variation at the melanocortin-1 receptor (MC1R) gene is correlated with melanin color variation in many birds. Feral pigeons (Columba livia) show two major melanin-based colorations: a red coloration due to pheomelanic pigment and a black coloration due to eumelanic pigment. Furthermore, within each color type, feral pigeons display continuous variation in the amount of melanin pigment present in the feathers, with individuals varying from pure white to a full dark melanic color. Coloration is highly heritable and it has been suggested that it is under natural or sexual selection, or both. Our objective was to investigate whether MC1R allelic variants are associated with plumage color in feral pigeons. Findings: We sequenced 888 bp of the coding sequence of MC1R among pigeons varying both in the type, eumelanin or pheomelanin, and the amount of melanin in their feathers. We detected 10 non-synonymous substitutions and 2 synonymous substitutions, but none of them were associated with a plumage type. It remains possible that non-synonymous substitutions that influence coloration are present in the short MC1R fragment that we did not sequence, but this seems unlikely because we analyzed the entire functionally important region of the gene. Conclusions: Our results show that color differences among feral pigeons are probably not attributable to amino acid variation at the MC1R locus. Therefore, variation in regulatory regions of MC1R or variation in other genes may be responsible for the color polymorphism of feral pigeons. PMID:23915680
Woo, Seonock; Yang, Shan-Hua; Chen, Hsing-Ju; Tseng, Yu-Fang; Hwang, Sung-Jin; De Palmas, Stephane; Denis, Vianney; Imahara, Yukimitsu; Iwase, Fumihito; Yum, Seungshic; Tang, Sen-Lin
2017-01-01
Environmental impacts can alter relationships between a coral and its symbiotic microbial community. Furthermore, changes in the microbial community associated with increased seawater temperatures can cause opportunistic infections, coral disease and death. Interactions between soft corals and their associated microbes are not well understood. The species Scleronephthya gracillimum is distributed in tropical to temperate zones in coral assemblages along the Kuroshio Current region. In this study we collected S. gracillimum from various sites at different latitudes, and compared the composition of their bacterial communities using Next Generation Sequencing. Coral samples from six geographically distinct areas (two sites each in Taiwan, Japan, and Korea) had considerable variation in their associated bacterial communities and diversity. Endozoicimonaceae was the dominant group in corals from Korea and Japan, whereas Mycoplasma was dominant in corals from Taiwan. Interestingly, the latter corals had lower relative abundance of Endozoicimonaceae, but greater diversity. These biogeographic differences in bacterial composition may have been due to varying environmental conditions among study locations, or because of host responses to prevailing environmental conditions. This study provided a baseline for future studies of soft coral microbiomes, and assessment of functions of host metabolites and soft coral holobionts.
Sreedhar, Reddampalli V; Venkatachalam, Lakshmanan; Bhagyalakshmi, Neelwarne
2007-08-01
Occurrence of genetic variants during micropropagation is occasionally encountered when the cultures are maintained in vitro for a long period. Therefore, the micropropagated multiple shoots of Vanilla planifolia Andrews developed from axillary bud explants established 10 years ago were used to determine somaclonal variation using random amplified polymorphic DNA (RAPD) and inter-simple sequence repeat (ISSR) markers. One thousand micro-plants were established in soil, of which 95 plantlets (consisting of four phenotypes) along with the mother plant were subjected to genetic analyses using RAPD and ISSR markers. Out of the 45 RAPD and 20 ISSR primers screened, 30 RAPD and 7 ISSR primers showed 317 clear, distinct and reproducible band classes resulting in a total of 30 115 bands. However, no difference was observed in banding patterns of any of the samples for a particular primer, indicating the absence of variation among the micropropagated plants. Our results allow us to conclude that the micropropagation protocol that we have used for in vitro proliferation of vanilla plantlets for the last 10 years might be applicable for the production of clonal plants over a considerable period of time.
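The band total reported in this abstract can be checked with a short Python calculation. It assumes, consistent with the reported absence of polymorphism, that every band class was scored as present in every sample; the abstract does not say whether the mother plant was included in the count, so both cases are shown. The variable names are illustrative.

```python
# Figures taken from the abstract.
band_classes = 317   # reproducible band classes across 30 RAPD and 7 ISSR primers
plantlets = 95       # micropropagated plantlets analysed

# If every band class is present in every sample, total bands = classes x samples.
total_bands = band_classes * plantlets
total_with_mother = band_classes * (plantlets + 1)

print(total_bands)        # 30115
print(total_with_mother)  # 30432
```

The reported total of 30 115 matches 317 band classes scored in 95 samples, which suggests the figure counts the plantlets only.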
No population genetic structure in a widespread aquatic songbird from the Neotropics
Cadena, Carlos Daniel; Gutierrez-Pinto, Natalia; Davila, Nicolas; Chesser, R. Terry
2011-01-01
Neotropical lowland organisms often show marked population genetic structure, suggesting restricted migration among populations. However, most phylogeographic studies have focused on species inhabiting humid forest interior. Little attention has been devoted to the study of species with ecologies conducive to dispersal, such as those of more open and variable environments associated with watercourses. Using mtDNA sequences, we examined patterns of genetic variation in a widely distributed Neotropical songbird of aquatic environments, the Yellow-hooded Blackbird (Icteridae, Chrysomus icterocephalus). In contrast to many forest species, Yellow-hooded Blackbirds showed no detectable genetic structure across their range, which includes lowland populations on both sides of the Andes, much of northeastern South America, Amazonia, as well as a phenotypically distinct highland population in Colombia. A coalescent-based analysis of the species indicated that its effective population size has increased considerably, suggesting a range expansion. Our results support the hypothesis that species occurring in open habitats and tracking temporally dynamic environments should show increased dispersal propensities (hence gene flow) relative to species from closed and more stable environments. The phenotypic and behavioral variation among populations of our study species appears to have arisen recently and perhaps in the face of gene flow.
Chen, Hsing-Ju; Tseng, Yu-Fang; Hwang, Sung-Jin; De Palmas, Stephane; Denis, Vianney; Imahara, Yukimitsu; Iwase, Fumihito; Yum, Seungshic; Tang, Sen-Lin
2017-01-01
Environmental impacts can alter relationships between a coral and its symbiotic microbial community. Furthermore, changes in the microbial community associated with increased seawater temperatures can cause opportunistic infections, coral disease and death. Interactions between soft corals and their associated microbes are not well understood. The species Scleronephthya gracillimum is distributed in tropical to temperate zones in coral assemblages along the Kuroshio Current region. In this study we collected S. gracillimum from various sites at different latitudes, and compared the composition of their bacterial communities using Next Generation Sequencing. Coral samples from six geographically distinct areas (two sites each in Taiwan, Japan, and Korea) had considerable variation in their associated bacterial communities and diversity. Endozoicimonaceae was the dominant group in corals from Korea and Japan, whereas Mycoplasma was dominant in corals from Taiwan. Interestingly, the latter corals had lower relative abundance of Endozoicimonaceae, but greater diversity. These biogeographic differences in bacterial composition may have been due to varying environmental conditions among study locations, or because of host responses to prevailing environmental conditions. This study provided a baseline for future studies of soft coral microbiomes, and assessment of functions of host metabolites and soft coral holobionts. PMID:28859111
Thermodynamic framework to assess low abundance DNA mutation detection by hybridization.
Willems, Hanny; Jacobs, An; Hadiwikarta, Wahyu Wijaya; Venken, Tom; Valkenborg, Dirk; Van Roy, Nadine; Vandesompele, Jo; Hooyberghs, Jef
2017-01-01
The knowledge of genomic DNA variations in patient samples has a high and increasing value for human diagnostics in its broadest sense. Although many methods and sensors to detect or quantify these variations are available or under development, the number of underlying physico-chemical detection principles is limited. One of these principles is the hybridization of sample target DNA versus nucleic acid probes. We introduce a novel thermodynamics approach and develop a framework to exploit the specific detection capabilities of nucleic acid hybridization, using generic principles applicable to any platform. As a case study, we detect point mutations in the KRAS oncogene on a microarray platform. For the given platform and hybridization conditions, we demonstrate the multiplex detection capability of hybridization and assess the detection limit using thermodynamic considerations; DNA containing point mutations in a background of wild type sequences can be identified down to at least 1% relative concentration. In order to show the clinical relevance, the detection capabilities are confirmed on challenging formalin-fixed paraffin-embedded clinical tumor samples. This enzyme-free detection framework contains the accuracy and efficiency to screen for hundreds of mutations in a single run with many potential applications in molecular diagnostics and the field of personalised medicine.
Fluorescent signatures for variable DNA sequences
Rice, John E.; Reis, Arthur H.; Rice, Lisa M.; Carver-Brown, Rachel K.; Wangh, Lawrence J.
2012-01-01
Life abounds with genetic variations writ in sequences that are often only a few hundred nucleotides long. Rapid detection of these variations for identification of genetic diseases, pathogens and organisms has become the mainstay of molecular science and medicine. This report describes a new, highly informative closed-tube polymerase chain reaction (PCR) strategy for analysis of both known and unknown sequence variations. It combines efficient quantitative amplification of single-stranded DNA targets through LATE-PCR with sets of Lights-On/Lights-Off probes that hybridize to their target sequences over a broad temperature range. Contiguous pairs of Lights-On/Lights-Off probes of the same fluorescent color are used to scan hundreds of nucleotides for the presence of mutations. Sets of probes in different colors can be combined in the same tube to analyze even longer single-stranded targets. Each set of hybridized Lights-On/Lights-Off probes generates a composite fluorescent contour, which is mathematically converted to a sequence-specific fluorescent signature. The versatility and broad utility of this new technology are illustrated in this report by characterization of variant sequences in three different DNA targets: the rpoB gene of Mycobacterium tuberculosis, a sequence in the mitochondrial cytochrome C oxidase subunit 1 gene of nematodes and the V3 hypervariable region of the bacterial 16S ribosomal RNA gene. We anticipate widespread use of these technologies for diagnostics, species identification and basic research. PMID:22879378
Wang, Yan; Liu, Guo-Hua; Li, Jia-Yuan; Xu, Min-Jun; Ye, Yong-Gang; Zhou, Dong-Hui; Song, Hui-Qun; Lin, Rui-Qing; Zhu, Xing-Quan
2013-02-01
This study examined sequence variation in three mitochondrial DNA (mtDNA) regions, namely cytochrome c oxidase subunit 1 (cox1), NADH dehydrogenase subunit 5 (nad5) and cytochrome b (cytb), among Trichuris ovis isolates from different hosts in Guangdong Province, China. A portion of the cox1 (pcox1), nad5 (pnad5) and cytb (pcytb) genes was amplified separately from individual whipworms by PCR, and was subjected to sequencing from both directions. The size of the sequences of pcox1, pnad5 and pcytb was 618, 240 and 464 bp, respectively. Although the intra-specific sequence variations within T. ovis were 0-0.8% for pcox1, 0-0.8% for pnad5 and 0-1.9% for pcytb, the inter-specific sequence differences among members of the genus Trichuris were significantly higher, being 24.3-26.5% for pcox1, 33.7-56.4% for pnad5 and 24.8-26.1% for pcytb, respectively. Phylogenetic analyses using combined sequences of pcox1, pnad5 and pcytb, with three different computational algorithms (maximum likelihood, maximum parsimony and Bayesian inference), indicated that all of the T. ovis isolates grouped together with high statistical support. These findings demonstrated the existence of intra-specific variation in mtDNA sequences among T. ovis isolates from different hosts, and have implications for studying molecular epidemiology and population genetics of T. ovis.
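The intra- and inter-specific percentages quoted above are pairwise sequence differences. As a minimal sketch of how such a figure can be computed, the function below returns the uncorrected proportion of differing sites (p-distance) between two pre-aligned sequences. The abstract does not state which distance measure or software the authors used, so the choice of p-distance, the function name `percent_difference`, and the decision to skip gap and `N` positions are all assumptions made for illustration.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Uncorrected pairwise difference (p-distance) as a percentage.

    Assumes the two sequences are already aligned to equal length.
    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped; this is an illustrative convention, not the paper's method.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # not comparable at this site
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * mismatches / compared
```

Applied to two aligned pcox1 fragments, a result below 1 would fall in the intra-specific range reported above, while values around 25 would fall in the inter-specific range.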
Complex multifractal nature in Mycobacterium tuberculosis genome
Mandal, Saurav; Roychowdhury, Tanmoy; Chirom, Keilash; Bhattacharya, Alok; Brojen Singh, R. K.
2017-01-01
The multifractal and long-range correlation (C(r)) properties of strings, such as nucleotide sequences, can be useful parameters for identifying underlying patterns and variations. In this study, C(r) and the multifractal singularity function f(α) were used to study variations in the genomes of the pathogenic bacterium Mycobacterium tuberculosis. Genomic sequences of M. tuberculosis isolates displayed significant variations in C(r) and f(α), reflecting inherent differences in sequences among isolates. M. tuberculosis isolates can be categorised into subgroups based on sensitivity to drugs: DS (drug sensitive isolates), MDR (multi-drug resistant isolates) and XDR (extremely drug resistant isolates). C(r) follows significantly different scaling rules in different subgroups of isolates, but all the isolates follow a one-parameter scaling law. The richness in complexity of each subgroup can be quantified by multifractal parameters, which display a pattern in which XDR isolates have the highest value and drug sensitive isolates the lowest. Therefore C(r) and multifractal functions can be useful parameters for analysis of genomic sequences. PMID:28440326
Rare variants and autoimmune disease.
Massey, Jonathan; Eyre, Steve
2014-09-01
The study of rare variants in monogenic forms of autoimmune disease has offered insight into the aetiology of more complex pathologies. Research in complex autoimmune disease initially focused on sequencing candidate genes, with some early successes, notably in uncovering low-frequency variation associated with Type 1 diabetes mellitus. However, other early examples have proved difficult to replicate, and a recent study across six autoimmune diseases, re-sequencing 25 autoimmune disease-associated genes in large sample sizes, failed to find any associated rare variants. The study of rare and low-frequency variation in autoimmune diseases has been made accessible by the inclusion of such variants on custom genotyping arrays (e.g. Immunochip and Exome arrays). Whole-exome sequencing approaches are now also being utilised to uncover the contribution of rare coding variants to disease susceptibility, severity and treatment response. Other sequencing strategies are starting to uncover the role of regulatory rare variation. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Hart, Reece K.; Rico, Rudolph; Hare, Emily; Garcia, John; Westbrook, Jody; Fusaro, Vincent A.
2015-01-01
Summary: Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available and comprehensive programming libraries are available. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. Availability and implementation: The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs). Contact: reecehart@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25273102
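To make concrete what "parsing, formatting and validation" of an HGVS-style variant involves, here is a minimal standard-library sketch that handles only simple single-base substitutions such as `NM_004006.2:c.4375C>T`. It is not the API of the `hgvs` package described above; the names `SimpleSubstitution` and `parse_simple_substitution` and the deliberately narrow regular expression are hypothetical, and the real HGVS recommendations cover far more (intronic offsets, deletions, insertions, protein-level changes, and so on).

```python
import re
from dataclasses import dataclass

# Narrow pattern: RefSeq-style accession, coordinate type, position, REF>ALT.
# Covers only plain substitutions; everything else is rejected.
_SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+)"   # e.g. NM_004006.2
    r":(?P<kind>[cgmn])\."                 # coding, genomic, mitochondrial, non-coding
    r"(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


@dataclass(frozen=True)
class SimpleSubstitution:
    accession: str
    kind: str
    position: int
    ref: str
    alt: str

    def format(self) -> str:
        """Render the variant back into its HGVS-style string."""
        return f"{self.accession}:{self.kind}.{self.position}{self.ref}>{self.alt}"


def parse_simple_substitution(text: str) -> SimpleSubstitution:
    """Parse and minimally validate a single-base substitution."""
    match = _SUBSTITUTION.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple HGVS substitution: {text!r}")
    if match["ref"] == match["alt"]:
        raise ValueError("reference and alternate bases are identical")
    return SimpleSubstitution(
        accession=match["accession"],
        kind=match["kind"],
        position=int(match["position"]),
        ref=match["ref"],
        alt=match["alt"],
    )
```

A round trip through `parse_simple_substitution(...).format()` should return the input unchanged, which is a convenient sanity check. For production use, the library itself is the better choice, since it also validates variants against reference sequences.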
Zhang, Zhenying; Liu, Xiaoming; Lv, Xuelian; Lin, Jingrong
2011-12-01
Sporotrichosis is usually a localized, lymphocutaneous disease, and its disseminated type is rarely reported. The main objective of this study was to identify specific DNA sequence variation and virulence of a strain of Sporothrix schenckii isolated from the lesion of disseminated cutaneous sporotrichosis. We confirmed this strain to be S. schenckii by β-tubulin and chitin synthase gene sequence analysis in addition to the routine mycological and partial ITS and NTS sequencing. We found a 10-bp deletion in the ribosomal NTS region of this strain, in reference to the sequence of control strains isolated from fixed cutaneous sporotrichosis. After inoculation into immunosuppressed mice, this strain caused more extensive systemic involvement and showed stronger virulence than the control strain isolated from a fixed cutaneous sporotrichosis. Our study thus suggests that different clinical manifestations of sporotrichosis may be associated with variation in genotype and virulence of the strain, independent of effects due to the immune status of the host.
A map of human genome variation from population-scale sequencing.
Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A
2010-10-28
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
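The per-base de novo substitution rate above translates into an expected number of new substitutions per child by simple multiplication. The sketch below does that arithmetic; the haploid genome length of roughly 3 × 10^9 bp is an assumption supplied for illustration and does not come from the abstract, and the function name is hypothetical.

```python
def expected_de_novo_substitutions(
    rate_per_bp: float,
    haploid_genome_bp: float,
    ploidy: int = 2,
) -> float:
    """Expected new base substitutions per generation.

    rate_per_bp       -- mutations per base pair per generation
    haploid_genome_bp -- length of one genome copy (assumed, not from the source)
    ploidy            -- number of genome copies inherited (2 for a diploid child)
    """
    return rate_per_bp * haploid_genome_bp * ploidy


# With the rate quoted above and an assumed 3e9 bp haploid genome,
# the expectation is on the order of 60 new substitutions per child.
estimate = expected_de_novo_substitutions(1e-8, 3e9)
```

The result scales linearly with the assumed genome length, so it should be read as an order-of-magnitude figure only.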
2014-01-01
Background Variation in seed oil composition and content among soybean varieties is largely attributed to differences in transcript sequences and/or transcript accumulation of oil production related genes in seeds. Discovery and analysis of sequence and expression variations in these genes will accelerate soybean oil quality improvement. Results In an effort to identify these variations, we sequenced the transcriptomes of soybean seeds from nine lines varying in oil composition and/or total oil content. Our results showed that 69,338 distinct transcripts from 32,885 annotated genes were expressed in seeds. A total of 8,037 transcript expression polymorphisms and 50,485 transcript sequence polymorphisms (48,792 SNPs and 1,693 small Indels) were identified among the lines. Effects of the transcript polymorphisms on their encoded protein sequences and functions were predicted. The studies also provided independent evidence that the lack of FAD2-1A gene activity and a non-synonymous SNP in the coding sequence of FAB2C caused elevated oleic acid and stearic acid levels in soybean lines M23 and FAM94-41, respectively. Conclusions As a proof-of-concept, we developed an integrated RNA-seq and bioinformatics approach to identify and functionally annotate transcript polymorphisms, and demonstrated its high effectiveness for discovery of genetic and transcript variations that result in altered oil quality traits. The collection of transcript polymorphisms coupled with their predicted functional effects will be a valuable asset for further discovery of genes, gene variants, and functional markers to improve soybean oil quality. PMID:24755115
Zapata, Luis; Ding, Jia; Willing, Eva-Maria; Hartwig, Benjamin; Bezdan, Daniela; Jiao, Wen-Biao; Patel, Vipul; Velikkakam James, Geo; Koornneef, Maarten; Ossowski, Stephan; Schneeberger, Korbinian
2016-07-12
Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the complement of large-scale rearrangements, including transpositions and inversions. Despite the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ∼3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions are not different in local divergence from colinear regions, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes that were present only in the reference sequence or the Ler assembly, and 334 single-copy orthologs that showed an additional copy in only one of the genomes. To our knowledge, this work gives first insights into the degree and type of variation that will be revealed once complete assemblies replace resequencing or other reference-dependent methods.
Establishing Quantitative Within-Subject Confidence Limits For Clinical Stereoroentgenographs
NASA Astrophysics Data System (ADS)
Korn, Edward L.; Baumrind, Sheldon; Chafetz, Neil; Curry, Sean; Moffitt, Francis
1983-07-01
It is now quite clear that under ideal conditions, discrete points can be located on x-ray films with standard deviations of less than 50 μm. However, under routine clinical conditions, such considerations as individual variation in anatomy, movement of the subject between exposures, and variations in image quality combine to produce considerable reductions in the confidence which can be placed in quantitative assessments made from stereoroentgenographic films. This paper discusses some considerations involved in designing mathematical models in such a way as to optimize the use of imperfect data in answering specific clinical questions.
A reference human genome dataset of the BGISEQ-500 sequencer.
Huang, Jie; Liang, Xinming; Xuan, Yuankai; Geng, Chunyu; Li, Yuxiang; Lu, Haorong; Qu, Shoufang; Mei, Xianglin; Chen, Hongbo; Yu, Ting; Sun, Nan; Rao, Junhua; Wang, Jiahao; Zhang, Wenwei; Chen, Ying; Liao, Sha; Jiang, Hui; Liu, Xin; Yang, Zhaopeng; Mu, Feng; Gao, Shangxian
2017-05-01
BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinational probe anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated the accuracy of the variations, and compared it to that of the variations identified from similar amounts of publicly available HiSeq2500 data. We found similar single nucleotide polymorphism (SNP) detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) compared to the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%), and better SNP detection accuracy than the PE50 data (FPR = 0.0006%, sensitivity = 94.15%). For insertions and deletions (indels), however, we found lower accuracy for BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50 respectively, sensitivity = 88.52% and 70.93%) than for the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform. © The Authors 2017. Published by Oxford University Press.
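The accuracy figures above are sensitivity and false positive rate (FPR) percentages. As a minimal sketch of how such figures follow from a comparison against a truth set, the functions below use the textbook definitions, sensitivity = TP / (TP + FN) and FPR = FP / (FP + TN). The abstract does not spell out how the authors defined their denominators, so treating every non-variant reference position as a potential true negative is an assumption, and the counts in the usage lines are invented purely for illustration.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real variants that were called: TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no real variants in the truth set")
    return true_positives / total


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Fraction of non-variant sites wrongly called: FP / (FP + TN)."""
    total = false_positives + true_negatives
    if total == 0:
        raise ValueError("no non-variant sites in the truth set")
    return false_positives / total


# Hypothetical counts, not taken from the dataset described above.
sens_pct = 100 * sensitivity(true_positives=9_620, false_negatives=380)
fpr_pct = 100 * false_positive_rate(false_positives=2, true_negatives=999_998)
```

Because the number of non-variant positions in a genome is very large, even a handful of false calls per million sites yields the tiny FPR percentages seen in comparisons of this kind, which is why sensitivity is usually the more discriminating of the two numbers.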
Cho, Anna; Seong, Moon-Woo; Lim, Byung Chan; Lee, Hwa Jeen; Byeon, Jung Hye; Kim, Seung Soo; Kim, Soo Yeon; Choi, Sun Ah; Wong, Ai-Lynn; Lee, Jeongho; Kim, Jon Soo; Ryu, Hye Won; Lee, Jin Sook; Kim, Hunmin; Hwang, Hee; Choi, Ji Eun; Kim, Ki Joong; Hwang, Young Seung; Hong, Ki Ho; Park, Seungman; Cho, Sung Im; Lee, Seung Jun; Park, Hyunwoong; Seo, Soo Hyun; Park, Sung Sup; Chae, Jong Hee
2017-05-01
Duchenne and Becker muscular dystrophies (DMD and BMD) are allelic X-linked recessive muscle diseases caused by mutations in the large and complex dystrophin gene. We analyzed the dystrophin gene in 507 Korean DMD/BMD patients by multiple ligation-dependent probe amplification and direct sequencing. Overall, 117 different deletions, 48 duplications, and 90 pathogenic sequence variations, including 30 novel variations, were identified. Deletions and duplications accounted for 65.4% and 13.3% of Korean dystrophinopathy, respectively, suggesting that the incidence of large rearrangements in dystrophin is similar among different ethnic groups. We also detected sequence variations in >100 probands. The small variations were dispersed across the whole gene, and 12.3% were nonsense mutations. Precise genetic characterization in patients with DMD/BMD is timely and important for implementing nationwide registration systems and future molecular therapeutic trials in Korea and globally. Muscle Nerve 55: 727-734, 2017. © 2016 Wiley Periodicals, Inc.
An experimental phylogeny to benchmark ancestral sequence reconstruction
Randall, Ryan N.; Radford, Caelan E.; Roof, Kelsey A.; Natarajan, Divya K.; Gaucher, Eric A.
2016-01-01
Ancestral sequence reconstruction (ASR) is a still-burgeoning method that has revealed many key mechanisms of molecular evolution. One criticism of the approach is an inability to validate its algorithms within a biological context as opposed to a computer simulation. Here we build an experimental phylogeny using the gene of a single red fluorescent protein to address this criticism. The evolved phylogeny consists of 19 operational taxonomic units (leaves) and 17 ancestral bifurcations (nodes) that display a wide variety of fluorescent phenotypes. The 19 leaves then serve as 'modern' sequences that we subject to ASR analyses using various algorithms and to benchmark against the known ancestral genotypes and ancestral phenotypes. We confirm computer simulations that show all algorithms infer ancient sequences with high accuracy, yet we also reveal wide variation in the phenotypes encoded by incorrectly inferred sequences. Specifically, Bayesian methods incorporating rate variation significantly outperform the maximum parsimony criterion in phenotypic accuracy. Subsampling of extant sequences had minor effect on the inference of ancestral sequences. PMID:27628687
Species conservation and natural variation among populations [Chapter 5]
Leonard F. Ruggiero; Michael K. Schwartz; Keith B. Aubry; Charles J. Krebs; Amanda Stanley; Steven W. Buskirk
2000-01-01
In conservation planning, the importance of natural variation is often given inadequate consideration. However, ignoring the implications of variation within species may result in conservation strategies that jeopardize, rather than conserve, target species (see Grieg 1979; Turcek 1951; Storfer 1999). Natural variation in the traits of individuals and populations is...
Shakhssalim, Nasser; Houshmand, Massoud; Kamalidehghan, Behnam; Faraji, Abolfazl; Sarhangnejad, Reza; Dadgar, Sepideh; Mobaraki, Maryam; Rosli, Rozita; Sanati, Mohammad Hossein
2013-12-05
Bladder cancer is a relatively common and potentially life-threatening neoplasm that ranks ninth in terms of worldwide cancer incidence. The aim of this study was to determine deletions and sequence variations in the mitochondrial displacement loop (D-loop) region from the blood specimens and tumoral tissues of patients with bladder cancer, compared to adjacent non-tumoral tissues. The DNA from blood, tumoral tissues and adjacent non-tumoral tissues of twenty-six patients with bladder cancer and DNA from blood of 504 healthy controls from different ethnicities were investigated to determine sequence variation in the mitochondrial D-loop region using multiplex polymerase chain reaction (PCR), DNA sequencing and Southern blotting analysis. From a total of 110 variations, 48 were reported as new mutations. No deletions were detected in tumoral tissues, adjacent non-tumoral tissues and blood samples from patients. Although the polymorphisms at loci 16189, 16261 and 16311 were not significantly correlated with bladder cancer, the C16069T variation was significantly more frequent in patient samples than in control samples (p < 0.05). Interestingly, there was no significant difference (p > 0.05) in C variations, including C7TC6, C8TC6, C9TC6 and C10TC6, in D310 mitochondrial DNA between patient and control samples. Our study suggests that 16069 mitochondrial DNA D-loop mutations may play a significant role in the etiology of bladder cancer and facilitate the definition of carcinogenesis-related mutations in human cancer.
Wu, Tsung-Jung; Shamsaddini, Amirhossein; Pan, Yang; Smith, Krista; Crichton, Daniel J; Simonyan, Vahan; Mazumder, Raja
2014-01-01
Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators has led to a rich repository of information on functional sites of genes and proteins. This information along with variation-related annotation can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform HIVE (High-performance Integrated Virtual Environment) for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identifications of novel or common nsSNVs that can be prioritized in validation studies. 
Database URL: BioMuta: http://hive.biochemistry.gwu.edu/tools/biomuta/index.php; CSR: http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr; HIVE: http://hive.biochemistry.gwu.edu.
2014-01-01
Background Next-generation sequencing has provided a wealth of plastid genome sequence data from an increasingly diverse set of green plants (Viridiplantae). Although these data have helped resolve the phylogeny of numerous clades (e.g., green algae, angiosperms, and gymnosperms), their utility for inferring relationships across all green plants is uncertain. Viridiplantae originated 700-1500 million years ago and may comprise as many as 500,000 species. This clade represents a major source of photosynthetic carbon and contains an immense diversity of life forms, including some of the smallest and largest eukaryotes. Here we explore the limits and challenges of inferring a comprehensive green plant phylogeny from available complete or nearly complete plastid genome sequence data. Results We assembled protein-coding sequence data for 78 genes from 360 diverse green plant taxa with complete or nearly complete plastid genome sequences available from GenBank. Phylogenetic analyses of the plastid data recovered well-supported backbone relationships and strong support for relationships that were not observed in previous analyses of major subclades within Viridiplantae. However, there also is evidence of systematic error in some analyses. In several instances we obtained strongly supported but conflicting topologies from analyses of nucleotides versus amino acid characters, and the considerable variation in GC content among lineages and within single genomes affected the phylogenetic placement of several taxa. Conclusions Analyses of the plastid sequence data recovered a strongly supported framework of relationships for green plants. 
This framework includes: i) the placement of Zygnematophyceae as sister to land plants (Embryophyta), ii) a clade of extant gymnosperms (Acrogymnospermae) with cycads + Ginkgo sister to remaining extant gymnosperms and with gnetophytes (Gnetophyta) sister to non-Pinaceae conifers (Gnecup trees), and iii) within the monilophyte clade (Monilophyta), Equisetales + Psilotales are sister to Marattiales + leptosporangiate ferns. Our analyses also highlight the challenges of using plastid genome sequences in deep-level phylogenomic analyses, and we provide suggestions for future analyses that will likely incorporate plastid genome sequence data for thousands of species. We particularly emphasize the importance of exploring the effects of different partitioning and character coding strategies. PMID:24533922
Singh, Satyendra K; Prasad, Kashi N; Singh, Aloukick K; Gupta, Kamlesh K; Chauhan, Ranjeet S; Singh, Amrita; Singh, Avinash; Rai, Ravi P; Pati, Binod K
2016-10-01
Taenia solium is the major cause of taeniasis and cysticercosis/neurocysticercosis (NCC) in developing countries including India, but the existence of other Taenia species and their genetic variation have not been studied in India. We therefore studied the existence of different Taenia species, and sequence variation in Taenia isolates from humans (proglottids and cysticerci) and swine (cysticerci) in North India. The cytochrome c oxidase subunit 1 gene (cox1) was amplified by polymerase chain reaction (PCR), followed by sequencing and phylogenetic analysis. We identified two species of Taenia, T. solium and Taenia asiatica, in our isolates. T. solium isolates showed similarity with the Asian genotype and nucleotide variations from 0.25% to 1.01%, whereas T. asiatica displayed nucleotide variations ranging from 0.25% to 0.5%. These findings indicate minimal genetic variation in North Indian isolates of T. solium and T. asiatica.
Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.
2013-01-01
Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204
Sequence investigation of 34 forensic autosomal STRs with massively parallel sequencing.
Zhang, Suhua; Niu, Yong; Bian, Yingnan; Dong, Rixia; Liu, Xiling; Bao, Yun; Jin, Chao; Zheng, Hancheng; Li, Chengtao
2018-05-01
STRs vary not only in the length of the repeat units and the number of repeats but also in the region with which they conform to an incremental repeat pattern. Massively parallel sequencing (MPS) offers new possibilities in the analysis of STRs since it can simultaneously sequence multiple targets in a single reaction and capture potential internal sequence variations. Here, we sequenced 34 STRs applied in the forensic community of China with a custom-designed panel. MPS performance was evaluated through sequencing read analysis, a concordance study, and sensitivity testing. High-coverage sequencing data were obtained to determine the constitute ratios and heterozygote balance. No actual inconsistent genotypes were observed between capillary electrophoresis (CE) and MPS, demonstrating the reliability of the panel and the MPS technology. With the sequencing data from the 200 investigated individuals, 346 and 418 alleles were obtained via CE and MPS technologies, respectively, at the 34 STRs, indicating that MPS technology provides higher discrimination than CE detection. The whole study demonstrated that STR genotyping with the custom panel and MPS technology has the potential not only to reveal length and sequence variations but also to satisfy the demands of high throughput and high multiplexing with acceptable sensitivity.
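The gap between the 346 CE alleles and the 418 MPS alleles follows from how each method defines an allele: CE distinguishes alleles by fragment length only, whereas MPS also separates equal-length alleles that differ in internal sequence. A minimal Python sketch of that counting logic is given below; the function name, locus names, and toy sequences are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def count_alleles(observed):
    """Count distinct alleles per locus by length and by full sequence.

    `observed` is a list of (locus, sequence) pairs. Counting by length
    mimics a CE-style readout; counting by sequence mimics an MPS-style
    readout. Both are illustrative simplifications.
    """
    by_length = defaultdict(set)
    by_sequence = defaultdict(set)
    for locus, seq in observed:
        by_length[locus].add(len(seq))
        by_sequence[locus].add(seq)
    n_length = sum(len(lengths) for lengths in by_length.values())
    n_sequence = sum(len(seqs) for seqs in by_sequence.values())
    return n_length, n_sequence


# Toy data invented for illustration: LOCUS_A carries two alleles of
# equal length (20 bases) that differ only in their internal repeats.
toy = [
    ("LOCUS_A", "TCTA" * 3 + "TCTG" * 2),
    ("LOCUS_A", "TCTA" * 4 + "TCTG" * 1),
    ("LOCUS_A", "TCTA" * 6),
    ("LOCUS_B", "GATA" * 5),
]

length_alleles, sequence_alleles = count_alleles(toy)
print(length_alleles, sequence_alleles)
```

In this toy set the sequence-based count exceeds the length-based count by one, because the two equal-length alleles at LOCUS_A collapse into a single length class.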
Küpper, Clemens; Burke, Terry; Lank, David B.
2015-01-01
Sequence variation in the melanocortin-1 receptor (MC1R) gene explains color morph variation in several species of birds and mammals. Ruffs (Philomachus pugnax) exhibit major dark/light color differences in melanin-based male breeding plumage which is closely associated with alternative reproductive behavior. A previous study identified a microsatellite marker (Ppu020) near the MC1R locus associated with the presence/absence of ornamental plumage. We investigated whether coding sequence variation in the MC1R gene explains major dark/light plumage color variation and/or the presence/absence of ornamental plumage in ruffs. Among 821 bp of the MC1R coding region from 44 male ruffs we found 3 single nucleotide polymorphisms, representing 1 nonsynonymous and 2 synonymous substitutions. None were associated with major dark/light color differences or the presence/absence of ornamental plumage. At all amino acid sites known to be functionally important in other avian species with dark/light plumage color variation, ruffs were either monomorphic or the shared polymorphism did not coincide with color morph. Neither ornamental plumage color differences nor the presence/absence of ornamental plumage in ruffs is likely to be caused entirely by amino acid variation within the coding regions of the MC1R locus. Regulatory elements and structural variation at other loci may be involved in melanin expression and contribute to the extreme plumage polymorphism observed in this species. PMID:25534935
Fornage, Myriam; Mosley, Thomas H; Jack, Clifford R; de Andrade, Mariza; Kardia, Sharon L R; Boerwinkle, Eric; Turner, Stephen T
2007-01-01
Susceptibility to ischemic damage to the subcortical white matter of the brain has a strong genetic basis. Dysregulation of matrix metalloproteinases (MMPs) contributes to loss of cerebrovascular integrity and white matter injury. We investigated whether sequence variation in the genes encoding MMP3 and MMP9 is associated with variation in leukoaraiosis volume, determined by magnetic resonance imaging, in non-Hispanic whites and African-Americans using family-based association tests. Seven hundred and fifty-six white and 671 African-American individuals from sibships ascertained through two or more siblings with hypertension were genotyped for 7 and 8 haplotype-tagging polymorphisms in the MMP3 and MMP9 genes, respectively. MMP3 sequence variation was significantly associated with variation in leukoaraiosis volume in Whites. Two common haplotypes with opposing relationships to leukoaraiosis volume were identified. MMP9 sequence variation was also significantly associated with variation in leukoaraiosis volume in both African-Americans and Whites. Different haplotypes contributed to these associations in the two racial groups. These findings add to the growing body of evidence from animal models and human clinical studies suggesting a role of MMPs in ischemic white matter injury. They provide the basis for further investigation of the role of these genes in susceptibility and/or progression to clinical disease.
Donaldson, Michael E; Rico, Yessica; Hueffer, Karsten; Rando, Halie M; Kukekova, Anna V; Kyle, Christopher J
2018-01-01
Pathogens are recognized as major drivers of local adaptation in wildlife systems. By determining which gene variants are favored in local interactions among populations with and without disease, spatially explicit adaptive responses to pathogens can be elucidated. Much of our current understanding of host responses to disease comes from a small number of genes associated with an immune response. High-throughput sequencing (HTS) technologies, such as genotype-by-sequencing (GBS), facilitate expanded explorations of genomic variation among populations. Hybridization-based GBS techniques can be leveraged in systems not well characterized for specific variants associated with disease outcome to "capture" specific genes and regulatory regions known to influence expression and disease outcome. We developed a multiplexed, sequence capture assay for red foxes to simultaneously assess ~300-kbp of genomic sequence from 116 adaptive, intrinsic, and innate immunity genes of predicted adaptive significance and their putative upstream regulatory regions along with 23 neutral microsatellite regions to control for demographic effects. The assay was applied to 45 fox DNA samples from Alaska, where three arctic rabies strains are geographically restricted and endemic to coastal tundra regions, yet absent from the boreal interior. The assay provided 61.5% on-target enrichment with relatively even sequence coverage across all targeted loci and samples (mean = 50×), which allowed us to elucidate genetic variation across introns, exons, and potential regulatory regions (4,819 SNPs). Challenges remained in accurately describing microsatellite variation using this technique; however, longer-read HTS technologies should overcome these issues. We used these data to conduct preliminary analyses and detected genetic structure in a subset of red fox immune-related genes between regions with and without endemic arctic rabies. This assay provides a template to assess immunogenetic variation in wildlife disease systems.
2013-01-01
Background: SNPs&GO is a method for the prediction of deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM) and, for a given protein, its input comprises the sequence and/or its three-dimensional structure (when available), a set of target variations, and its functional Gene Ontology (GO) terms. The output of the server provides, for each protein variation, the probability of its being associated with human disease. Results: The server consists of two main components, including updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO3d programs. Sequence- and structure-based algorithms are extensively tested on a large set of annotated variations extracted from the SwissVar database. Selecting a balanced dataset with more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, 0.61 correlation coefficient and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped on protein structures available at the Protein Data Bank (PDB), the structure-based method scores with 84% overall accuracy, 0.68 correlation coefficient, and 0.91 AUC. When tested on a new blind set of variations, the results of the server are 79% and 83% overall accuracy for the sequence-based and structure-based inputs, respectively. Conclusions: WS-SNPs&GO is a valuable tool that includes in a unique framework information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go. PMID:23819482
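The performance figures above (overall accuracy and a correlation coefficient) can be derived from a binary confusion matrix. The Python sketch below shows one way to do so, assuming that the reported "correlation coefficient" is the Matthews correlation coefficient; the abstract does not say so explicitly, and the counts used are invented for illustration.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (overall accuracy, Matthews correlation coefficient).

    tp/tn/fp/fn are counts of true positives, true negatives, false
    positives and false negatives for a disease/neutral classifier.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc


# Hypothetical balanced test set of 100 variants (not the SwissVar data).
acc, mcc = accuracy_and_mcc(tp=40, tn=40, fp=10, fn=10)
print(acc, mcc)
```

On a balanced dataset such as the one described, accuracy and MCC tend to move together; the MCC becomes more informative than accuracy when the two classes are of unequal size.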
Machczyńska, Joanna; Zimny, Janusz; Bednarek, Piotr Tomasz
2015-10-01
Plant regeneration via in vitro culture can induce genetic and epigenetic variation; however, the extent of such changes in triticale is not yet understood. In the present study, metAFLP, a variation of methylation-sensitive amplified fragment length polymorphism analysis, was used to investigate tissue culture-induced variation in triticale regenerants derived from four distinct genotypes using androgenesis and somatic embryogenesis. The metAFLP technique enabled identification of both sequence and DNA methylation pattern changes in a single experiment. Moreover, it was possible to quantify subtle effects such as sequence variation, demethylation, and de novo methylation, which affected 19%, 5.5%, and 4.5% of sites, respectively. Comparison of variation in different genotypes and with different in vitro regeneration approaches demonstrated that both the culture technique and genetic background of donor plants affected tissue culture-induced variation. The results showed that the metAFLP approach could be used for quantification of tissue culture-induced variation and provided direct evidence that in vitro plant regeneration could cause genetic and epigenetic variation.
Wyllie, David H; Sanderson, Nicholas; Myers, Richard; Peto, Tim; Robinson, Esther; Crook, Derrick W; Smith, E Grace; Walker, A Sarah
2018-06-06
Contact tracing requires reliable identification of closely related bacterial isolates. When we noticed the reporting of artefactual variation between M. tuberculosis isolates during routine next generation sequencing of Mycobacterium spp, we investigated its basis in 2,018 consecutive M. tuberculosis isolates. In the routine process used, clinical samples were decontaminated and inoculated into broth cultures; from positive broth cultures DNA was extracted, sequenced, reads mapped, and consensus sequences determined. We investigated the process of consensus sequence determination, which selects the most common nucleotide at each position. Having determined the high-quality read depth and depth of minor variants across 8,006 M. tuberculosis genomic regions, we quantified the relationship between the minor variant depth and the amount of non-Mycobacterial bacterial DNA, which originates from commensal microbes killed during sample decontamination. In the presence of non-Mycobacterial bacterial DNA, we found significant increases in minor variant frequencies of more than 1.5 fold in 242 regions covering 5.1% of the M. tuberculosis genome. Included within these were four high variation regions strongly influenced by the amount of non-Mycobacterial bacterial DNA. Excluding these four regions from pairwise distance comparisons reduced biologically implausible variation from 5.2% to 0% in an independent validation set derived from 226 individuals. Thus, we have demonstrated an approach identifying critical genomic regions contributing to clinically relevant artefactual variation in bacterial similarity searches. The approach described monitors the outputs of the complex multi-step laboratory and bioinformatics process, allows periodic process adjustments, and will have application to quality control of routine bacterial genomics. Copyright © 2018 Wyllie et al.
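The consensus step described above, which selects the most common nucleotide at each position, and the related notion of minor variant frequency can be sketched in a few lines of Python. This is a simplified illustration rather than the authors' pipeline: it assumes reads that are already aligned and of equal length, ignores base quality, and uses an invented function name and toy reads.

```python
from collections import Counter


def consensus_and_minor_fraction(reads):
    """Majority-vote consensus plus per-position minor variant fraction.

    `reads` is a list of equal-length, already aligned strings covering
    the same region. The minor fraction at a position is the share of
    reads that disagree with the most common base there.
    """
    consensus = []
    minor_fraction = []
    for column in zip(*reads):
        counts = Counter(column)
        base, depth = counts.most_common(1)[0]
        consensus.append(base)
        minor_fraction.append(1 - depth / len(column))
    return "".join(consensus), minor_fraction


# Toy pileup: one of four reads carries a T at the third position, as
# might happen if a non-target read were mis-mapped to the region.
toy_reads = ["ACGT", "ACGT", "ACTT", "ACGT"]
seq, minor = consensus_and_minor_fraction(toy_reads)
print(seq, minor)
```

The consensus is unaffected by the minority base here, but the raised minor fraction at that position is the kind of signal the study relates to the amount of non-Mycobacterial DNA in a sample.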
Salleh, Mohd Zaki; Teh, Lay Kek; Lee, Lian Shien; Ismet, Rose Iszati; Patowary, Ashok; Joshi, Kandarp; Pasha, Ayesha; Ahmed, Azni Zain; Janor, Roziah Mohd; Hamzah, Ahmad Sazali; Adam, Aishah; Yusoff, Khalid; Hoh, Boon Peng; Hatta, Fazleen Haslinda Mohd; Ismail, Mohamad Izwan; Scaria, Vinod; Sivasubbu, Sridhar
2013-01-01
With a higher throughput and lower cost in sequencing, second generation sequencing technology has immense potential for translation into clinical practice and in the realization of pharmacogenomics based patient care. The systematic analysis of whole genome sequences to assess patient to patient variability in pharmacokinetics and pharmacodynamics responses towards drugs would be the next step in future medicine in line with the vision of personalizing medicine. Genomic DNA obtained from a 55-year-old, self-declared healthy, anonymous male of Malay descent was sequenced. The subject's mother died of lung cancer, and the father had a history of schizophrenia and died at the age of 65. A systematic, intuitive computational workflow/pipeline integrating a custom algorithm in tandem with large datasets of variant annotations and gene functions for genetic variations with pharmacogenomic impact was developed. A comprehensive pathway map of drug transport, metabolism and action was used as a template to map non-synonymous variations with potential functional consequences. Over 3 million known variations and 100,898 novel variations in the Malay genome were identified. Further in-depth pharmacogenetics analysis revealed a total of 607 unique variants in 563 proteins, with the eventual identification of 4 drug transport genes, 2 drug metabolizing enzyme genes and 33 target genes harboring deleterious SNVs involved in pharmacological pathways, which could have a potential role in clinical settings. The current study successfully unravels the potential of personal genome sequencing in understanding the functionally relevant variations with potential influence on drug transport, metabolism and differential therapeutic outcomes. These will be essential for realizing personalized medicine through the use of a comprehensive computational pipeline for systematic data mining and analysis.
Norling, Martin; Bishop, Richard P; Pelle, Roger; Qi, Weihong; Henson, Sonal; Drábek, Elliott F; Tretina, Kyle; Odongo, David; Mwaura, Stephen; Njoroge, Thomas; Bongcam-Rudloff, Erik; Daubenberger, Claudia A; Silva, Joana C
2015-09-24
There are no commercially available vaccines against human protozoan parasitic diseases, despite the success of vaccination-induced long-term protection against infectious diseases. East Coast fever, caused by the protist Theileria parva, kills one million cattle each year in sub-Saharan Africa, and contributes significantly to hunger and poverty in the region. A highly effective, live, multi-isolate vaccine against T. parva exists, but its component isolates have not been characterized. Here we sequence and compare the three component T. parva stocks within this vaccine, the Muguga Cocktail, namely Muguga, Kiambu5 and Serengeti-transformed, aiming to identify genomic features that contribute to vaccine efficacy. We find that Serengeti-transformed, originally isolated from the wildlife carrier, the African Cape buffalo, is remarkably and unexpectedly similar to the Muguga isolate. The 420 detectable non-synonymous SNPs were distributed among only 53 genes, primarily subtelomeric antigens and antigenic families. The Kiambu5 isolate is considerably more divergent, with close to 40,000 SNPs relative to Muguga, including >8,500 non-synonymous mutations distributed among >1,700 (42.5 %) of the predicted genes. These genetic markers of the component stocks can be used to characterize the composition of new batches of the Muguga Cocktail. Differences among these three isolates, while extensive, represent only a small proportion of the genetic variation in the entire species. Given the efficacy of the Muguga Cocktail in inducing long-lasting protection against infections in the field, our results suggest that whole-organism vaccines against parasitic diseases can be highly efficacious despite considerable genome-wide differences relative to the isolates against which they protect.
NASA Astrophysics Data System (ADS)
Dominguez, L. A.; Taira, T.; Hjorleifsdottir, V.; Santoyo, M. A.
2015-12-01
Repeating earthquake sequences are sets of events that are thought to rupture the same area on the plate interface and thus produce nearly identical waveforms. We systematically analyzed seismic records from 2001 through 2014 to identify repeating earthquakes with highly correlated waveforms occurring along the subduction zone of the Cocos plate. Using the correlation coefficient (cc) and spectral coherency (coh) of the vertical components as selection criteria, we found a set of 214 sequences whose waveforms satisfy cc≥95% and coh≥95%. Spatial clustering along the trench shows large variations in repeating earthquake activity. In particular, the rupture zone of the M8.1 1985 earthquake shows an almost complete absence of characteristic repeating earthquakes, whereas the Guerrero Gap zone and the segment of the trench close to the Guerrero-Oaxaca border show a significantly larger number of repeating earthquake sequences. Furthermore, temporal variations associated with stress changes due to major earthquakes show episodes of unlocking and healing of the interface. Understanding the different components that control the location and recurrence time of characteristic repeating sequences is a key factor in pinpointing areas where large megathrust earthquakes may nucleate and consequently in improving seismic hazard assessment.
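The waveform-similarity criterion above can be illustrated with a short Python sketch that computes a correlation coefficient between two traces and applies the 0.95 threshold. This is a simplification under stated assumptions: it uses the zero-lag Pearson correlation of two equal-length sample lists, whereas a real search would typically also scan over time lags and apply the spectral coherency test, which is omitted here. The function names and toy traces are hypothetical.

```python
import math


def correlation_coefficient(x, y):
    """Zero-lag Pearson correlation of two equal-length waveforms."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def is_repeating_pair(x, y, threshold=0.95):
    """Apply the cc >= 0.95 criterion (coherency test not included)."""
    return correlation_coefficient(x, y) >= threshold


# Toy traces: `scaled` has the same shape as `reference` with twice the
# amplitude; `shifted` is the same oscillation moved by one sample.
reference = [0, 1, 0, -1, 0, 1, 0, -1]
scaled = [2 * v for v in reference]
shifted = [1, 0, -1, 0, 1, 0, -1, 0]

print(is_repeating_pair(reference, scaled))
print(is_repeating_pair(reference, shifted))
```

Because the correlation coefficient is insensitive to amplitude, the scaled trace passes the test, while the one-sample shift fails at zero lag, which is why practical implementations align the traces before comparing them.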
Hoy, Marshal S.; Rodriguez, Rusty J.
2013-01-01
Molecular genetic analysis was conducted on two populations of the invasive non-native New Zealand mud snail (Potamopyrgus antipodarum), one from a freshwater ecosystem in Devil's Lake (Oregon, USA) and the other from an ecosystem of higher salinity in the Columbia River estuary (Hammond Harbor, Oregon, USA). To elucidate potential genetic differences between the two populations, three segments of nuclear ribosomal DNA (rDNA), the ITS1-ITS2 regions and the 18S and 28S rDNA genes were cloned and sequenced. Variant sequences within each individual were found in all three rDNA segments. Folding models were utilized for secondary structure analysis and results indicated that there were many sequences which contained structure-altering polymorphisms, which suggests they could be nonfunctional pseudogenes. In addition, analysis of molecular variance (AMOVA) was used for hierarchical analysis of genetic variance to estimate variation within and among populations and within individuals. AMOVA revealed significant variation in the ITS region between the populations and among clones within individuals, while in the 5.8S rDNA significant variation was revealed among individuals within the two populations. High levels of intragenomic variation were found in the ITS regions, which are known to be highly variable in many organisms. More interestingly, intragenomic variation was also found in the 18S and 28S rDNA, which has rarely been observed in animals and is so far unreported in Mollusca. We postulate that in these P. antipodarum populations the effects of concerted evolution are diminished due to the fact that not all of the rDNA genes in their polyploid genome should be essential for sustaining cellular function. This could lead to a lessening of selection pressures, allowing mutations to accumulate in some copies, changing them into variant sequences.
Microfluidic droplet enrichment for targeted sequencing
Eastburn, Dennis J.; Huang, Yong; Pellegrino, Maurizio; Sciambi, Adam; Ptáček, Louis J.; Abate, Adam R.
2015-01-01
Targeted sequence enrichment enables better identification of genetic variation by providing increased sequencing coverage for genomic regions of interest. Here, we report the development of a new target enrichment technology that is highly differentiated from other approaches currently in use. Our method, MESA (Microfluidic droplet Enrichment for Sequence Analysis), isolates genomic DNA fragments in microfluidic droplets and performs TaqMan PCR reactions to identify droplets containing a desired target sequence. The TaqMan positive droplets are subsequently recovered via dielectrophoretic sorting, and the TaqMan amplicons are removed enzymatically prior to sequencing. We demonstrated the utility of this approach by generating an average 31.6-fold sequence enrichment across 250 kb of targeted genomic DNA from five unique genomic loci. Significantly, this enrichment enabled a more comprehensive identification of genetic polymorphisms within the targeted loci. MESA requires low amounts of input DNA, minimal prior locus sequence information and enriches the target region without PCR bias or artifacts. These features make it well suited for the study of genetic variation in a number of research and diagnostic applications. PMID:25873629
Typing Clostridium difficile strains based on tandem repeat sequences
2009-01-01
Background Genotyping of epidemic Clostridium difficile strains is necessary to track their emergence and spread. Portability of genotyping data is desirable to facilitate inter-laboratory comparisons and epidemiological studies. Results This report presents results from a systematic screen for variation in repetitive DNA in the genome of C. difficile. We describe two tandem repeat loci, designated 'TR6' and 'TR10', which display extensive sequence variation that may be useful for sequence-based strain typing. Based on an investigation of 154 C. difficile isolates comprising 75 ribotypes, tandem repeat sequencing demonstrated excellent concordance with widely used PCR ribotyping and equal discriminatory power. Moreover, tandem repeat sequences enabled the reconstruction of the isolates' largely clonal population structure and evolutionary history. Conclusion We conclude that sequence analysis of the two repetitive loci introduced here may be highly useful for routine typing of C. difficile. Tandem repeat sequence typing resolves phylogenetic diversity to a level equivalent to PCR ribotypes. DNA sequences may be stored in databases accessible over the internet, obviating the need for the exchange of reference strains. PMID:19133124
West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N
2014-07-01
The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of closely related organisms, and discuss how it could be extended to future studies of multilocus rDNA systems. [concerted evolution; genome hybridisation; phylogenetic analysis; ribosomal DNA; whole genome sequencing; yeast]. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Tumor Heterogeneity, Single-Cell Sequencing, and Drug Resistance.
Schmidt, Felix; Efferth, Thomas
2016-06-16
Tumor heterogeneity has been compared with Darwinian evolution and survival of the fittest. The evolutionary ecosystem of tumors consisting of heterogeneous tumor cell populations represents a considerable challenge to tumor therapy, since all genetically and phenotypically different subpopulations have to be efficiently killed by therapy. Otherwise, even small surviving subpopulations may cause repopulation and refractory tumors. Single-cell sequencing allows for a better understanding of the genomic principles of tumor heterogeneity and represents the basis for more successful tumor treatments. The isolation and sequencing of single tumor cells still represents a considerable technical challenge and consists of three major steps: (1) single cell isolation (e.g., by laser-capture microdissection, fluorescence-activated cell sorting, or micromanipulation), (2) whole genome amplification (e.g., with the help of Phi29 DNA polymerase), and (3) transcriptome-wide next generation sequencing technologies (e.g., 454 pyrosequencing, Illumina sequencing, and other systems). Data demonstrating the feasibility of single-cell sequencing for monitoring the emergence of drug-resistant cell clones in patient samples are discussed herein. It is envisioned that single-cell sequencing will be a valuable asset to assist the design of regimens for personalized tumor therapies based on tumor subpopulation-specific genetic alterations in individual patients.
VarDetect: a nucleotide sequence variation exploratory tool
Ngamphiw, Chumpol; Kulawonganunchai, Supasak; Assawamakin, Anunchai; Jenwitheesuk, Ekachai; Tongsima, Sissades
2008-01-01
Background Single nucleotide polymorphisms (SNPs) are the most commonly studied units of genetic variation. The discovery of such variation may help to identify causative gene mutations in monogenic diseases and SNPs associated with predisposing genes in complex diseases. Accurate detection of SNPs requires software that can correctly interpret chromatogram signals to nucleotides. Results We present VarDetect, a stand-alone nucleotide variation exploratory tool that automatically detects nucleotide variation from fluorescence based chromatogram traces. Accurate SNP base-calling is achieved using pre-calculated peak content ratios, and is enhanced by rules which account for common sequence reading artifacts. The proposed software tool is benchmarked against four other well-known SNP discovery software tools (PolyPhred, novoSNP, Genalys and Mutation Surveyor) using fluorescence based chromatograms from 15 human genes. These chromatograms were obtained from sequencing 16 two-pooled DNA samples; a total of 32 individual DNA samples. In this comparison of automatic SNP detection tools, VarDetect achieved the highest detection efficiency. Availability VarDetect is compatible with most major operating systems such as Microsoft Windows, Linux, and Mac OSX. The current version of VarDetect is freely available at . PMID:19091032
Model-based quality assessment and base-calling for second-generation sequencing data.
Bravo, Héctor Corrada; Irizarry, Rafael A
2010-09-01
Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T, between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base-calling, allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base-calling performance. © 2009, The International Biometric Society.
Intraspecific variation in Cryptocaryon irritans.
Diggles, B K; Adlard, R D
1997-01-01
Intraspecific variation in the ciliate Cryptocaryon irritans was examined using sequences of the first internal transcribed spacer region (ITS-1) of ribosomal DNA (rDNA) combined with developmental and morphological characters. Amplified rDNA sequences consisting of 151 bases of the flanking 18 S and 5.8 S regions, and the entire ITS-1 region (169 or 170 bases), were determined and compared for 16 isolates of C. irritans from Australia, Israel and the USA. There was one variable base between isolates in the 18 S region and 11 variable bases in the ITS-1 region. Despite their similar morphology, significant sequence variation (4.1% divergence) and developmental differences indicate that Australian C. irritans isolates from estuarine (Moreton Bay) and coral reef (Heron Island) environments are distinct. The Heron Island isolate was genetically closer to morphologically dissimilar isolates from Israel (1.8% divergence) and the USA (2.3% divergence) than it was to the Moreton Bay isolates. Three isolates maintained in our laboratory since February 1994 differed in sequence from earlier laboratory isolates (2.9% to 3.5% divergence), even though all were similar morphologically and originated from the same source. During this time the sequence of the isolates from wild fish in Moreton Bay remained unchanged. These genetic differences indicate the existence of a founder effect in laboratory populations of C. irritans. The genetic variation found here, combined with known morphological and developmental differences, is used to characterise four strains of C. irritans.
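The divergence percentages quoted above compare aligned ITS-1 sequences between isolates. The abstract does not state how divergence was calculated, so the Python sketch below assumes the simplest measure: the uncorrected proportion of differing sites between two aligned sequences, skipping positions where either sequence has a gap. The function name and sequences are invented for illustration.

```python
def percent_divergence(seq_a, seq_b, gap="-"):
    """Uncorrected pairwise divergence (%) between two aligned sequences.

    Positions where either sequence has a gap character are excluded
    from both the mismatch count and the number of sites compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    mismatches = sum(a != b for a, b in compared)
    return 100.0 * mismatches / len(compared)


# Toy alignment: one difference across ten compared sites.
isolate_1 = "ACGTACGTAC"
isolate_2 = "ACGTTCGTAC"
print(percent_divergence(isolate_1, isolate_2))
```

How gaps are treated matters for a region such as ITS-1 that varies in length between isolates (169 or 170 bases here); counting a gap as a difference would give slightly higher values than this sketch.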
FPGA implementation of predictive degradation model for engine oil lifetime
NASA Astrophysics Data System (ADS)
Idros, M. F. M.; Razak, A. H. A.; Junid, S. A. M. Al; Suliman, S. I.; Halim, A. K.
2018-03-01
This paper presents the implementation of a linear regression model for degradation prediction at the Register Transfer Level (RTL) using Quartus II. A stationary model was identified in the degradation trend of the engine oil in a vehicle using a time series method. For the RTL implementation, the degradation model is written in Verilog HDL and the input data are taken at set times. A clock divider was designed to support the timing sequence of the input data. After every five data points, a regression analysis is applied to determine the slope variation and to calculate the prediction. Only negative values are considered for prediction, which reduces the number of logic gates. The least squares method is used to obtain the best linear model based on the mean values of the time series data. The coded algorithm was implemented on an FPGA for validation. The result shows the predicted time to change the engine oil.
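The windowed regression described in this abstract can be sketched in software. This is an illustrative Python sketch, not the authors' Verilog design: it assumes non-overlapping blocks of five samples, assumes that "negative values" refers to the fitted slope, and assumes that the prediction is the time at which the fitted line crosses a threshold. The function names, the threshold, and the sample data are hypothetical.

```python
def least_squares_fit(times, values):
    """Ordinary least-squares slope and intercept of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def predict_change_time(times, values, threshold, window=5):
    """Fit each consecutive block of `window` samples and return the time at
    which the most recent negative-slope fit reaches `threshold`.
    Returns None if no block shows a downward trend."""
    prediction = None
    for start in range(0, len(times) - window + 1, window):
        slope, intercept = least_squares_fit(times[start:start + window],
                                             values[start:start + window])
        if slope < 0:  # assumption: only downward trends drive the prediction
            prediction = (threshold - intercept) / slope
    return prediction


# Hypothetical oil-quality readings falling by 2 units per time step.
hours = list(range(10))
quality = [100 - 2 * h for h in hours]
print(predict_change_time(hours, quality, threshold=60))
```

Discarding non-negative slopes means a block of flat or improving readings leaves the previous prediction unchanged, which matches the abstract's stated aim of keeping the logic small.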
Technological advances for improving adenoma detection rates: The changing face of colonoscopy.
Ishaq, Sauid; Siau, Keith; Harrison, Elizabeth; Tontini, Gian Eugenio; Hoffman, Arthur; Gross, Seth; Kiesslich, Ralf; Neumann, Helmut
2017-07-01
Worldwide, colorectal cancer is the third commonest cancer. Over 90% follow an adenoma-to-cancer sequence over many years. Colonoscopy is the gold standard method for cancer screening and early adenoma detection. However, considerable variation exists between endoscopists' detection rates. This review considers the effects of different endoscopic techniques on adenoma detection. Two areas of technological interest were considered: (1) optical technologies and (2) mechanical technologies. Optical solutions, including FICE, NBI, i-SCAN and high definition colonoscopy showed mixed results. In contrast, mechanical advances, such as cap-assisted colonoscopy, FUSE, EndoCuff and G-EYE™, showed promise, with reported detection rates of up to 69%. However, before definitive recommendations can be made for their incorporation into daily practice, further studies and comparison trials are required. Copyright © 2017 Editrice Gastroenterologica Italiana S.r.l. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Barash, M. S.
2016-02-01
In the interval of the Triassic-Jurassic boundary, 80% of the marine species became extinct. Four main hypotheses about the causes of this mass extinction are considered: volcanism, climatic oscillations, sea level variations accompanied by anoxia, and asteroid impact events. The extinction was triggered by an extensive flooding of basalts in the Central Atlantic Magmatic Province. Furthermore, a number of meteoritic craters have been found. Under the effect of cosmic causes, two main sequences of events developed on the Earth: terrestrial ones, leading to intensive volcanism, and cosmic ones (asteroid impacts). Their aftermaths, however, were similar in terms of the chemical compounds and aerosols released. As a consequence, the greenhouse effect, dimming of the atmosphere (impeding photosynthesis), ocean stagnation, and anoxia emerged. Then, biological productivity decreased and food chains were destroyed. Thus, the entire ecosystem was disturbed and a considerable part of the biota became extinct.
Mitochondrial genomes of parasitic flatworms.
Le, Thanh H; Blair, David; McManus, Donald P
2002-05-01
Complete or near-complete mitochondrial genomes are now available for 11 species or strains of parasitic flatworms belonging to the Trematoda and the Cestoda. The organization of these genomes is not strikingly different from those of other eumetazoans, although one gene (atp8) commonly found in other phyla is absent from flatworms. The gene order in most flatworms has similarities to those seen in higher protostomes such as annelids. However, the gene order has been drastically altered in Schistosoma mansoni, which obscures this possible relationship. Among the sequenced taxa, base composition varies considerably, creating potential difficulties for phylogeny reconstruction. Long non-coding regions are present in all taxa, but these vary in length from only a few hundred to approximately 10000 nucleotides. Among Schistosoma spp., the long non-coding regions are rich in repeats and length variation among individuals is known. Data from mitochondrial genomes are valuable for studies on species identification, phylogenies and biogeography.
2010-01-01
Background The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. Description The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. Conclusions Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org. PMID:20459805
Hermes Transposon Distribution and Structure in Musca domestica
Subramanian, Ramanand A.; Cathcart, Laura A.; Krafsur, Elliot S.; Atkinson, Peter W.
2009-01-01
Hermes are hAT transposons from Musca domestica that are very closely related to the hobo transposons from Drosophila melanogaster and are useful as gene vectors in a wide variety of organisms including insects, planaria, and yeast. hobo elements show distinct length variations in a rapidly evolving region of the transposase-coding region as a result of expansions and contractions of a simple repeat sequence encoding 3 amino acids threonine, proline, and glutamic acid (TPE). These variations in length may influence the function of the protein and the movement of hobo transposons in natural populations. Here, we determine the distribution of Hermes in populations of M. domestica as well as whether Hermes transposase has undergone similar sequence expansions and contractions during its evolution in this species. Hermes transposons were found in all M. domestica individuals sampled from 14 populations collected from 4 continents. All individuals with Hermes transposons had evidence for the presence of intact transposase open reading frames, and little sequence variation was observed among Hermes elements. A systematic analysis of the TPE-homologous region of the Hermes transposase-coding region revealed no evidence for length variation. The simple sequence repeat found in hobo elements is a feature of this transposon that evolved since the divergence of hobo and Hermes. PMID:19366812
Variation analysis and gene annotation of eight MHC haplotypes: The MHC Haplotype Project
Horton, Roger; Gibson, Richard; Coggill, Penny; Miretti, Marcos; Allcock, Richard J.; Almeida, Jeff; Forbes, Simon; Gilbert, James G. R.; Halls, Karen; Harrow, Jennifer L.; Hart, Elizabeth; Howe, Kevin; Jackson, David K.; Palmer, Sophie; Roberts, Anne N.; Sims, Sarah; Stewart, C. Andrew; Traherne, James A.; Trevanion, Steve; Wilming, Laurens; Rogers, Jane; de Jong, Pieter J.; Elliott, John F.; Sawcer, Stephen; Todd, John A.; Trowsdale, John
2008-01-01
The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line) designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine. PMID:18193213
Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup
2016-01-01
Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.
Doddapaneni, Harshavardhan; Yao, Jiqiang; Lin, Hong; Walker, M Andrew; Civerolo, Edwin L
2006-01-01
Background The Gram-negative, xylem-limited phytopathogenic bacterium Xylella fastidiosa is responsible for causing economically important diseases in grapevine, citrus and many other plant species. Despite its economic impact, relatively little is known about the genomic variations among strains isolated from different hosts and their influence on the population genetics of this pathogen. With the availability of genome sequence information for four strains, it is now possible to perform genome-wide analyses to identify and categorize such DNA variations and to understand their influence on strain functional divergence. Results There are 1,579 genes and 194 non-coding homologous sequences present in the genomes of all four strains, representing a 76.2% conservation of the sequenced genome. About 60% of the X. fastidiosa unique sequences exist as tandem gene clusters of 6 or more genes. Multiple alignments identified 12,754 SNPs and 14,449 INDELs in the 1528 common genes and 20,779 SNPs and 10,075 INDELs in the 194 non-coding sequences. The average SNP frequency was 1.08 × 10⁻² per base pair of DNA and the average INDEL frequency was 2.06 × 10⁻² per base pair of DNA. On average, 60.33% of the SNPs were synonymous while 39.67% were non-synonymous. Mutations, primarily in the form of external INDELs, were the main type of sequence variation. The relative similarity between the strains was discussed according to the INDEL and SNP differences. The number of genes unique to each strain was 60 (9a5c), 54 (Dixon), 83 (Ann1) and 9 (Temecula-1). A sub-set of the strain specific genes showed significant differences in terms of their codon usage and GC composition from the native genes, suggesting their xenologous origin. Tandem repeat analysis of the genomic sequences of the four strains identified associations of repeat sequences with hypothetical and phage related functions.
Conclusion INDELs and strain specific genes have been identified as the main source of variations among strains, with individual strains showing different rates of genome evolution. Based on these genome comparisons, it appears that the Pierce's disease strain Temecula-1 genome represents the ancestral genome of the X. fastidiosa. Results of this analysis are publicly available in the form of a web database. PMID:16948851
Dumas, Laura; Dickens, C Michael; Anderson, Nathan; Davis, Jonathan; Bennett, Beth; Radcliffe, Richard A; Sikela, James M
2014-06-01
It has been well documented that genetic factors can influence predisposition to develop alcoholism. While the underlying genomic changes may be of several types, two of the most common and disease associated are copy number variations (CNVs) and sequence alterations of protein coding regions. The goal of this study was to identify CNVs and single-nucleotide polymorphisms that occur in gene coding regions that may play a role in influencing the risk of an individual developing alcoholism. Toward this end, two mouse strains were used that have been selectively bred based on their differential sensitivity to alcohol: the Inbred long sleep (ILS) and Inbred short sleep (ISS) mouse strains. Differences in initial response to alcohol have been linked to risk for alcoholism, and the ILS/ISS strains are used to investigate the genetics of initial sensitivity to alcohol. Array comparative genomic hybridization (arrayCGH) and exome sequencing were conducted to identify CNVs and gene coding sequence differences, respectively, between ILS and ISS mice. Mouse arrayCGH was performed using catalog Agilent 1 × 244 k mouse arrays. Subsequently, exome sequencing was carried out using an Illumina HiSeq 2000 instrument. ArrayCGH detected 74 CNVs that were strain-specific (38 ILS/36 ISS), including several ISS-specific deletions that contained genes implicated in brain function and neurotransmitter release. Among several interesting coding variations detected by exome sequencing was the gain of a premature stop codon in the alpha-amylase 2B (AMY2B) gene specifically in the ILS strain. In total, exome sequencing detected 2,597 and 1,768 strain-specific exonic gene variants in the ILS and ISS mice, respectively. This study represents the most comprehensive and detailed genomic comparison of ILS and ISS mouse strains to date. 
The two complementary genome-wide approaches identified strain-specific CNVs and gene coding sequence variations that should provide strong candidates to contribute to the alcohol-related phenotypic differences associated with these strains.
Talla, Venkat; Suh, Alexander; Kalsoom, Faheema; Dinca, Vlad; Vila, Roger; Friberg, Magne; Wiklund, Christer; Backström, Niclas
2017-10-01
Characterizing and quantifying genome size variation among organisms and understanding if genome size evolves as a consequence of adaptive or stochastic processes have been long-standing goals in evolutionary biology. Here, we investigate genome size variation and association with transposable elements (TEs) across lepidopteran lineages using a novel genome assembly of the common wood-white (Leptidea sinapis) and population re-sequencing data from both L. sinapis and the closely related L. reali and L. juvernica together with 12 previously available lepidopteran genome assemblies. A phylogenetic analysis confirms established relationships among species, but identifies previously unknown intraspecific structure within Leptidea lineages. The genome assembly of L. sinapis is one of the largest of any lepidopteran taxon so far (643 Mb) and genome size is correlated with abundance of TEs, both in Lepidoptera in general and within Leptidea where L. juvernica from Kazakhstan has considerably larger genome size than any other Leptidea population. Specific TE subclasses have been active in different Lepidoptera lineages with a pronounced expansion of predominantly LINEs, DNA elements, and unclassified TEs in the Leptidea lineage after the split from other Pieridae. The rate of genome expansion in Leptidea in general has been in the range of four Mb/Million year (My), with an increase in a particular L. juvernica population to 72 Mb/My. The considerable differences in accumulation rates of specific TE classes in different lineages indicate that TE activity plays a major role in genome size evolution in butterflies and moths. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Global and Local Helioseismic Studies of Solar Convection Zone Dynamics Using SOI-MDI on SOHO
NASA Technical Reports Server (NTRS)
Toomre, Juri; Haber, Deborah; Hindman, Bradley; Christensen-Dalsgaard, Joergen; Gough, Douglas; Thompson, Michael
2003-01-01
Our joint collaborative analyses of global mode data to characterize the solar differential rotation (e.g. Thompson et al. 1996, Schou et al. 1998), and most recently to detect and analyze temporal variations in angular velocity Omega profiles both within the convection zone and in the deeper radiative interior (e.g. Howe et al. 2000a,b; Toomre et al. 2000), have led to a series of fascinating discoveries. These should be pursued further as the solar cycle continues. The physical deductions being made from these studies have been greatly strengthened by utilizing both SOI-MDI and GONG data in order to have two independent observational realizations of Doppler images spanning a five-year interval, using two separate procedures to determine global mode splittings, and then analyzing those splitting data sets using both RLS and SOLA inversion procedures. There are considerable subtleties in the effects of instrumental response functions and calibrations, sensitivity of peak finding algorithms and their mode leakage estimates, and stochastic variations in mode amplitudes that can all contribute to apparent changes in the Omega profiles being inferred from sequences of helioseismic data. We have come to understand the implications of many of these calibration and analysis steps, greatly aided by frequent multi-week collaborative working sessions in our Helioseismic Analysis Facility (HAF) at JILA involving many members of the SOI dynamics and inversion team, including most of our Co-Is during the summer months when we hold intensive working sessions. Considerable further focused attention is required in a collaborative setting on such global mode issues as we continue studying the changing sun.
Jelokhani-Niaraki, Saber; Tahmoorespur, Mojtaba; Bitaraf-Sani, Morteza
2015-01-01
Very little is known about LHR and FSHR genes of domestic dromedary camels. The main objective of this study was to determine and analyze partial genomic regions of FSHR and LHR genes in dromedary camels for the first time. To this end, a total of 50 DNA samples belonging to dromedary camels raised in Iran were sent for sequencing (25 samples of each gene). We compared the nucleotide sequences of Camelus dromedarius with corresponding sequences of previously published FSHR and LHR genes in bactrian camels and other species. According to the data, the same nucleotide variation was identified in both regions of the two camel species. The alignment of deduced protein sequences of the two different species revealed an amino acid variation at the FSHR region. No evidence of amino acid variation was observed, however, in LHR sequences. Phylogenetic analysis indicated that both camel species had a close relationship and clustered together in a separate branch. This was further confirmed by genetic distance values illustrating significant sequence identity between Camelus dromedarius and Camelus bactrianus. Interestingly, sequence comparisons revealed heterozygote patterns in FSHR sequences isolated from dromedary camels of Iran. In comparison to other species, this camel contains three amino acid substitutions at 5, 67, and 105 positions in the FSHR coding region. These positions are found exclusively in camels and can be considered as species specific. The results of our study can be used for hormone functionality research (FSHR and LHR) as well as reproduction-linked polymorphisms and breeding programs. PMID:27844002
Aokic, Jun-ya; Kawase, Junya; Hamada, Kazuhisa; Fujimoto, Hiroshi; Yamamoto, Ikki; Usuki, Hironori
2018-01-01
Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence to identify single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection as well as to identify functional genes for biological traits. We obtained 200 times coverage and constructed a high-quality genome assembly using next generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were lined up on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, we found differences in the genome structural variations between two greater amberjack populations using BreakDancer. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence. PMID:29785397
Marine turtle mitogenome phylogenetics and evolution.
Duchene, Sebastián; Frey, Amy; Alfaro-Núñez, Alonzo; Dutton, Peter H; Gilbert, M Thomas P; Morin, Phillip A
2012-10-01
The sea turtles are a group of Cretaceous origin containing seven recognized living species: leatherback, hawksbill, Kemp's ridley, olive ridley, loggerhead, green, and flatback. The leatherback is the single member of the Dermochelyidae family, whereas all other sea turtles belong in Cheloniidae. Analyses of partial mitochondrial sequences and some nuclear markers have revealed phylogenetic inconsistencies within Cheloniidae, especially regarding the placement of the flatback. Population genetic studies based on D-Loop sequences have shown considerable structuring in species with broad geographic distributions, shedding light on complex migration patterns and possible geographic or climatic events as driving forces of sea-turtle distribution. We have sequenced complete mitogenomes for all sea-turtle species, including samples from their geographic range extremes, and performed phylogenetic analyses to assess sea-turtle evolution with a large molecular dataset. We found variation in the length of the ATP8 gene and a highly variable site in ND4 near a proton translocation channel in the resulting protein. Complete mitogenomes show strong support and resolution for phylogenetic relationships among all sea turtles, and reveal phylogeographic patterns within globally-distributed species. Although there was clear concordance between phylogenies and geographic origin of samples in most taxa, we found evidence of more recent dispersal events in the loggerhead and olive ridley turtles, suggesting more recent migrations (<1 Myr) in these species. Overall, our results demonstrate the complexity of sea-turtle diversity, and indicate the need for further research in phylogeography and molecular evolution. Published by Elsevier Inc.
Nakao, Minoru; Li, Tiaoying; Han, Xiumin; Ma, Xiumin; Xiao, Ning; Qiu, Jiamin; Wang, Hu; Yanagida, Tetsuya; Mamuti, Wulamu; Wen, Hao; Moro, Pedro L.; Giraudoux, Patrick; Craig, Philip S.; Ito, Akira
2009-01-01
The genetic polymorphisms of Echinococcus spp. in the eastern Tibetan Plateau and the Xinjiang Uyghur Autonomous Region were evaluated by DNA sequencing analyses of genes for mitochondrial cytochrome c oxidase subunit 1 (cox1) and nuclear elongation factor-1 alpha (ef1a). We collected 68 isolates of Echinococcus granulosus sensu stricto (s.s.) from Xinjiang and 113 isolates of E. granulosus s.s., 49 isolates of Echinococcus multilocularis and 34 isolates of Echinococcus shiquicus from the Tibetan Plateau. The results of molecular identification by mitochondrial and nuclear markers were identical, suggesting the infrequency of introgressive hybridization. A considerable intraspecific variation was detected in mitochondrial cox1 sequences. The parsimonious network of cox1 haplotypes showed star-like features in E. granulosus s.s. and E. multilocularis, but a divergent feature in E. shiquicus. The cox1 neutrality indexes computed by Tajima's D and Fu's Fs tests showed high negative values in E. granulosus s.s. and E. multilocularis, indicating significant deviations from neutrality. In contrast, the low positive values of both tests were obtained in E. shiquicus. These results suggest the following hypotheses: (i) recent founder effects arose in E. granulosus and E. multilocularis after introducing particular individuals into the endemic areas by anthropogenic movement or natural migration of host mammals, and (ii) the ancestor of E. shiquicus was segregated into the Tibetan Plateau by colonizing alpine mammals and its mitochondrial locus has evolved without bottleneck effects. PMID:19800346
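Tajima's D, one of the two neutrality statistics named in this abstract, compares the mean number of pairwise differences with the number of segregating sites. The sketch below uses the standard published formula and is for illustration only: it is not the software used in the study, it assumes gap-free aligned sequences, and the function name and toy alignments are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length, gap-free sequences."""
    n = len(sequences)
    if n < 4:
        # The variance term is zero for n = 2 or 3, so D is undefined there.
        raise ValueError("need at least four sequences")
    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("D is undefined without segregating sites")
    # pi: mean number of differences over all sequence pairs.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy data: every variant is a singleton, as in a star-like haplotype network.
star_like = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]
# Toy data: two haplotypes at equal frequency.
balanced = ["AA", "AA", "TT", "TT"]
print(tajimas_d(star_like) < 0, tajimas_d(balanced) > 0)
```

An excess of rare variants pushes D below zero, which is the pattern the abstract reports for the two species with star-like networks.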
Isozyme variation in wild and cultivated pineapple
USDA-ARS's Scientific Manuscript database
Isozyme variation was studied in 161 accessions of pineapple including four species of Ananas and one of Pseudananas. Six enzyme systems (ADH, GPI, PGM, SKDH, TPI, UGPP) involving seven putative loci revealed 35 electromorphs. Considerable variation exists within and between species of Ananas. Sixt...
Caporale, Lynn Helena
2012-09-01
This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.
Wong, Gerard; Leckie, Christopher; Gorringe, Kylie L; Haviv, Izhak; Campbell, Ian G; Kowalczyk, Adam
2010-04-15
High-density single nucleotide polymorphism (SNP) genotyping arrays are efficient and cost effective platforms for the detection of copy number variation (CNV). To ensure accuracy in probe synthesis and to minimize production costs, short oligonucleotide probe sequences are used. The use of short probe sequences limits the specificity of binding targets in the human genome. The specificity of these short probeset sequences has yet to be fully analysed against a normal reference human genome. Sequence similarity can artificially elevate or suppress copy number measurements, and hence reduce the reliability of affected probe readings. For the purpose of detecting narrow CNVs reliably down to the width of a single probeset, sequence similarity is an important issue that needs to be addressed. We surveyed the Affymetrix Human Mapping SNP arrays for probeset sequence similarity against the reference human genome. Utilizing sequence similarity results, we identified a collection of fine-scaled putative CNVs between gender from autosomal probesets whose sequence matches various loci on the sex chromosomes. To detect these variations, we utilized our statistical approach, Detecting REcurrent Copy number change using rank-order Statistics (DRECS), and showed that its performance was superior and more stable than the t-test in detecting CNVs. Through the application of DRECS on the HapMap population datasets with multi-matching probesets filtered, we identified biologically relevant SNPs in aberrant regions across populations with known association to physical traits, such as height, covered by the span of a single probe. This provided empirical confirmation of the existence of naturally occurring narrow CNVs as well as the sensitivity of the Affymetrix SNP array technology in detecting them. The MATLAB implementation of DRECS is available at http://ww2.cs.mu.oz.au/~gwong/DRECS/index.html.
Analysis of human herpesvirus-6 IE1 sequence variation in clinical samples.
Stanton, Richard; Wilkinson, Gavin W G; Fox, Julie D
2003-12-01
Herpesvirus immediate early (IE) proteins are known to play key roles in establishing productive infections, regulating reactivation from latency, and creating a cellular environment favourable to viral replication. Human herpesvirus-6 (HHV-6) IE genes have not been studied as intensively as their homologues in the prototype betaherpesvirus human cytomegalovirus (HCMV). Whilst the HCMV IE1 gene is relatively conserved, early studies indicated that HHV-6 IE1 exhibited a high level of sequence variation between HHV-6A and HHV-6B isolates, although the observation was based primarily on virus stocks that had been isolated and propagated in vitro. In this study, we investigated the level of HHV-6 IE1 sequence variation in vivo by direct sequencing of circulating virus in clinical samples without prior in vitro culture. Sequences exactly matching those reported for reference HHV-6 isolates were identified in clinical samples, thus the HHV-6 laboratory strains used in the majority of in vitro studies appear to be representative of virus circulating in vivo with respect to the IE1 gene. The HHV-6 IE1 sequence is also conserved in reference strains that had been passaged extensively in vitro. The high degree of divergence between variant A and B type IE1 sequences was confirmed, but interestingly HHV-6B IE1 sequences were observed to further segregate into two distinct subgroups, with the laboratory strains Z29 and HST representative of these two subgroups. Within each HHV-6B subgroup, a remarkably high level of homology was observed. Thus the HHV-6 IE1 sequence appears highly stable, underlining its potential importance to the viral life cycle. Copyright 2003 Wiley-Liss, Inc.
Lathe, R
1985-05-05
Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. The redundancy of the genetic code dictates that a choice must be made between (1) a mixture of probes reflecting all codon combinations, and (2) a single longer "optimal" probe. The second strategy is examined in detail. The frequency of sequences matching a given probe by chance alone can be determined and also the frequency of sequences closely resembling the probe and contributing to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides, and probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone. Probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example, and considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between using multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.
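The chance-match frequency mentioned above can be sketched numerically. The following Python sketch assumes a library of independent, equiprobable nucleotides, which the abstract itself notes is not true of real gene banks, so the figures it gives are only a baseline. The function names and the two-strand site count are illustrative assumptions, not taken from the paper.

```python
from math import comb


def p_exact_match(probe_len: int) -> float:
    """Probability that one random site matches the probe exactly
    (assumption: four equiprobable, independent bases)."""
    return 0.25 ** probe_len


def p_match_with_mismatches(probe_len: int, max_mismatches: int) -> float:
    """Probability that one random site differs from the probe at no more
    than max_mismatches positions (binomial sum under the same assumption)."""
    return sum(
        comb(probe_len, k) * 0.75 ** k * 0.25 ** (probe_len - k)
        for k in range(max_mismatches + 1)
    )


def expected_chance_hits(library_bp: int, probe_len: int, max_mismatches: int = 0) -> float:
    """Expected number of chance hits in a library of library_bp bases,
    counting start positions on both strands."""
    sites = 2 * max(library_bp - probe_len + 1, 0)
    return sites * p_match_with_mismatches(probe_len, max_mismatches)
```

Under this simplified model, each added probe base cuts the expected number of exact chance hits by a factor of four, while tolerating mismatches raises it sharply. This is consistent with the abstract's point that probe length must be increased to obtain specificity.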
LenVarDB: database of length-variant protein domains.
Mutt, Eshita; Mathew, Oommen K; Sowdhamini, Ramanathan
2014-01-01
Protein domains are functionally and structurally independent modules, which add to the functional variety of proteins. This array of functional diversity has been enabled by evolutionary changes, such as amino acid substitutions or insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2 730 625 sequence homologues. Alignment of sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. Orientation of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge about location of length variations within protein domains and their visual representation will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolutionary changes bring about natural changes to proteins that may be found in many organisms. Such changes could be reflected as amino acid substitutions or insertions-deletions (indels) in protein sequences. LenVarDB is a database that provides an early overview of observed length variations that were set among 731 protein families and after examining >2 million sequences. Indels are followed up to observe if they are close to the active site such that they can affect the activity of proteins. Inclusion of such information can aid the design of bioengineering experiments.
Ginther, C; Corach, D; Penacino, G A; Rey, J A; Carnese, F R; Hutz, M H; Anderson, A; Just, J; Salzano, F M; King, M C
1993-01-01
DNA samples from 60 Mapuche Indians, representing 39 maternal lineages, were genetically characterized for (1) nucleotide sequences of the mtDNA control region; (2) presence or absence of a nine base duplication in mtDNA region V; (3) HLA loci DRB1 and DQA1; (4) variation at three nuclear genes with short tandem repeats; and (5) variation at the polymorphic marker D2S44. The genetic profile of the Mapuche population was compared to other Amerinds and to worldwide populations. Two highly polymorphic portions of the mtDNA control region, comprising 650 nucleotides, were amplified by the polymerase chain reaction (PCR) and directly sequenced. The 39 maternal lineages were defined by two or three generation families identified by the Mapuches. These 39 lineages included 19 different mtDNA sequences that could be grouped into four classes. The same classes of sequences appear in other Amerinds from North, Central, and South American populations separated by thousands of miles, suggesting that the origin of the mtDNA patterns predates the migration to the Americas. The mtDNA sequence similarity between Amerind populations suggests that the migration throughout the Americas occurred rapidly relative to the mtDNA mutation rate. HLA DRB1 alleles 1602 and 1402 were frequent among the Mapuches. These alleles also occur at high frequency among other Amerinds in North and South America, but not among Spanish, Chinese or African-American populations. The high frequency of these alleles throughout the Americas, and their specificity to the Americas, supports the hypothesis that Mapuches and other Amerind groups are closely related.(ABSTRACT TRUNCATED AT 250 WORDS)
An unsupervised classification scheme for improving predictions of prokaryotic TIS.
Tech, Maike; Meinicke, Peter
2006-03-09
Although it is not difficult for state-of-the-art gene finders to identify coding regions in prokaryotic genomes, exact prediction of the corresponding translation initiation sites (TIS) is still a challenging problem. Recently a number of post-processing tools have been proposed for improving the annotation of prokaryotic TIS. However, inherent difficulties of these approaches arise from the considerable variation of TIS characteristics across different species. Therefore prior assumptions about the properties of prokaryotic gene starts may cause suboptimal predictions for newly sequenced genomes with TIS signals differing from those of well-investigated genomes. We introduce a clustering algorithm for completely unsupervised scoring of potential TIS, based on positionally smoothed probability matrices. The algorithm requires an initial gene prediction and the genomic sequence of the organism to perform the reannotation. As compared with other methods for improving predictions of gene starts in bacterial genomes, our approach is not based on any specific assumptions about prokaryotic TIS. Despite the generality of the underlying algorithm, the prediction rate of our method is competitive on experimentally verified test data from E. coli and B. subtilis. Regarding genomes with high G+C content, in contrast to some previously proposed methods, our algorithm also provides good performance on P. aeruginosa, B. pseudomallei and R. solanacearum. On reliable test data we showed that our method provides good results in post-processing the predictions of the widely-used program GLIMMER. The underlying clustering algorithm is robust with respect to variations in the initial TIS annotation and does not require specific assumptions about prokaryotic gene starts. These features are particularly useful on genomes with high G+C content. The algorithm has been implemented in the tool "TICO" (TIs COrrector) which is publicly available from our web site.
Conard, Nicholas J.; Will, Manuel
2015-01-01
Sibudu in KwaZulu-Natal (South Africa) with its rich and high-resolution archaeological sequence provides an ideal case study to examine the causes and consequences of short-term variation in the behavior of modern humans during the Middle Stone Age (MSA). We present the results from a technological analysis of 11 stratified lithic assemblages which overlie the Howiesons Poort deposits and all date to ~58 ka. Based on technological and typological attributes, we conducted inter-assemblage comparisons to characterize the nature and tempo of cultural change in successive occupations. This work identified considerable short-term variation with clear temporal trends throughout the sequence, demonstrating that knappers at Sibudu varied their technology over short time spans. The lithic assemblages can be grouped into three cohesive units which differ from each other in the procurement of raw materials, the frequency in the methods of core reduction, the kind of blanks produced, and in the nature of tools the inhabitants of Sibudu made and used. These groups of assemblages represent different strategies of lithic technology, which build upon each other in a gradual, cumulative manner. We also identify a clear pattern of development toward what we have previously defined as the Sibudan cultural taxonomic unit. Contextualizing these results on larger geographical scales shows that the later phase of the MSA during MIS 3 in KwaZulu-Natal and southern Africa is one of dynamic cultural change rather than of stasis or stagnation as has at times been claimed. In combination with environmental, subsistence and contextual information, our high-resolution data on lithic technology suggest that short-term behavioral variability at Sibudu can be best explained by changes in technological organization and socio-economic dynamics instead of environmental forcing. PMID:26098694
Firrao, Giuseppe; Torelli, Emanuela; Polano, Cesare; Ferrante, Patrizia; Ferrini, Francesca; Martini, Marta; Marcelletti, Simone; Scortichini, Marco; Ermacora, Paolo
2018-01-01
Pseudomonas syringae pv. actinidiae (Psa) biovar 3 has caused pandemic bacterial canker of Actinidia chinensis and Actinidia deliciosa since 2008. In Europe, the disease spread rapidly in the kiwifruit cultivation areas from a single introduction. In this study, we investigated the genomic diversity of Psa biovar 3 strains during the primary clonal expansion in Europe using single molecule real-time (SMRT), Illumina and Sanger sequencing technologies. We recorded evidence of frequent mobilization and loss of transposon Tn6212, large chromosome inversions, and ectopic integration of IS sequences (notably ISPsy31, ISPsy36, and ISPsy37). While no phenotype change associated with Tn6212 mobilization could be detected, strains CRAFRU 12.29 and CRAFRU 12.50 did not elicit the hypersensitivity response (HR) on tobacco and eggplant leaves and were limited in their growth in kiwifruit leaves due to insertion of ISPsy31 and ISPsy36 in the hrpS and hrpR genes, respectively, interrupting the hrp cluster. Both strains had been isolated from symptomatic plants, suggesting coexistence of variant strains with reduced virulence together with virulent strains in mixed populations. The structural differences caused by rearrangements of self-genetic elements within European and New Zealand strains were comparable in number and type to those occurring among the European strains, in contrast with the significant difference in terms of nucleotide polymorphisms. We hypothesize a relaxation, during clonal expansion, of the selection limiting the accumulation of deleterious mutations associated with genome structural variation due to transposition of mobile elements. This consideration may be relevant when evaluating strategies to be adopted for epidemics management. PMID:29675009
Seismicity and source spectra analysis in Salton Sea Geothermal Field
NASA Astrophysics Data System (ADS)
Cheng, Y.; Chen, X.
2016-12-01
The surge of "man-made" earthquakes in recent years has led to considerable concerns about the associated hazards. Improved monitoring of small earthquakes would significantly help understand such phenomena and the underlying physical mechanisms. In the Salton Sea Geothermal Field in southern California, open access to a local borehole network provides a unique opportunity to better understand the seismicity characteristics, the related earthquake hazards, and the relationship with the geothermal system, tectonic faulting and other physical conditions. We obtain high-resolution earthquake locations in the Salton Sea Geothermal Field and analyze the characteristics of spatiotemporally isolated earthquake clusters, magnitude-frequency distributions and the spatial variation of stress drops. The analysis reveals spatially coherent distributions of different types of clustering, b-values, and stress drops. The mixture-type clusters (short-duration rapid bursts with high aftershock productivity) are predominantly located within the active geothermal field and correlate with high b-value, low stress drop microearthquake clouds, while regular aftershock sequences and swarms are distributed throughout the study area. The differences between earthquakes inside and outside the geothermal operation field suggest a possible way to distinguish directly induced seismicity due to energy operation from typical seismic-slip-driven sequences. The spatially coherent b-value distribution enables in-situ estimation of probabilities for M≥3 earthquakes, and shows that the high large-magnitude-event (LME) probability zones with high stress drop are likely associated with tectonic faulting. The high stress drop at shallow (1-3 km) depth indicates the existence of active faults, while low stress drops near injection wells likely correspond to the seismic response to fluid injection. We interpret the spatial variation of seismicity and source characteristics as the result of fluid circulation, the fracture network, and tectonic faulting.
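To illustrate how a b-value can be turned into a probability for M≥3 events, here is a minimal Python sketch. It uses two textbook forms: the Aki (1965) maximum-likelihood b-value estimator and a Gutenberg-Richter extrapolation with a Poisson occurrence model. These are assumptions for illustration. The abstract does not state which estimator or occurrence model the authors used, and the function names are hypothetical.

```python
import math


def aki_b_value(magnitudes, completeness_mag):
    """Aki (1965) maximum-likelihood b-value from events at or above the
    completeness magnitude. Assumes continuous (unbinned) magnitudes."""
    usable = [m for m in magnitudes if m >= completeness_mag]
    if not usable:
        raise ValueError("no events at or above the completeness magnitude")
    mean_excess = sum(usable) / len(usable) - completeness_mag
    if mean_excess <= 0:
        raise ValueError("mean magnitude must exceed the completeness magnitude")
    return math.log10(math.e) / mean_excess


def prob_at_least_one(n_above_mc, b, completeness_mag, target_mag=3.0):
    """Poisson probability of at least one event with M >= target_mag in a
    window where n_above_mc events with M >= completeness_mag are expected,
    extrapolating with the Gutenberg-Richter relation."""
    expected = n_above_mc * 10 ** (-b * (target_mag - completeness_mag))
    return 1.0 - math.exp(-expected)
```

With the event rate held fixed, a higher b-value lowers the extrapolated probability of larger events, which fits the abstract's contrast between high b-value microearthquake clouds and high-LME-probability zones.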
Schoening, Timm; Bergmann, Melanie; Ontrup, Jörg; Taylor, James; Dannheim, Jennifer; Gutt, Julian; Purser, Autun; Nattkemper, Tim W.
2012-01-01
Megafauna play an important role in benthic ecosystem function and are sensitive indicators of environmental change. Non-invasive monitoring of benthic communities can be accomplished by seafloor imaging. However, manual quantification of megafauna in images is labor-intensive, and therefore this organism size class is often neglected in ecosystem studies. Automated image analysis has been proposed as a possible approach to such analysis, but the heterogeneity of megafaunal communities poses a non-trivial challenge for such automated techniques. Here, the potential of a generalized object detection architecture, referred to as iSIS (intelligent Screening of underwater Image Sequences), for the quantification of a heterogeneous group of megafauna taxa is investigated. The iSIS system is tuned for a particular image sequence (i.e. a transect) using a small subset of the images, in which megafauna taxa positions were previously marked by an expert. To investigate the potential of iSIS and compare its results with those obtained from human experts, a group of eight different taxa from one camera transect of seafloor images taken at the Arctic deep-sea observatory HAUSGARTEN is used. The results show that inter- and intra-observer agreements of human experts exhibit considerable variation between the species, with a similar degree of variation apparent in the automatically derived results obtained by iSIS. Whilst some taxa (e.g. Bathycrinus stalks, Kolga hyalina, small white sea anemone) were well detected by iSIS (i.e. overall Sensitivity: 87%, overall Positive Predictive Value: 67%), some taxa such as the small sea cucumber Elpidia heckeri remain challenging, for both human observers and iSIS. PMID:22719868
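The two detection metrics quoted above follow standard definitions, sketched here in Python. The function names are illustrative, and the counts of true positives, false negatives and false positives must come from the user's own comparison against expert annotations. No counts from the study are reproduced here.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of expert-marked objects that the detector found."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no reference objects to evaluate against")
    return true_pos / total


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of the detector's reported objects that were correct."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("detector reported no objects")
    return true_pos / total
```

Reporting both values matters because a detector can raise its sensitivity simply by reporting more candidates, at the cost of a lower positive predictive value.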
Diversity of the P2 protein among nontypeable Haemophilus influenzae isolates.
Bell, J; Grass, S; Jeanteur, D; Munson, R S
1994-01-01
The genes for outer membrane protein P2 of four nontypeable Haemophilus influenzae strains were cloned and sequenced. The derived amino acid sequences were compared with the outer membrane protein P2 sequence from H. influenzae type b MinnA and the sequences of P2 from three additional nontypeable H. influenzae strains. The sequences were 76 to 94% identical. The sequences had regions with considerable variability separated by regions which were highly conserved. The variable regions mapped to putative surface-exposed loops of the protein. PMID:8188390
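Percent identity figures such as those above depend on the alignment and on how gaps are counted. The Python sketch below shows one common convention, assumed here for illustration: identical residues divided by the number of alignment columns that are not gaps in both sequences. The abstract does not say which convention the authors used, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.
    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a mismatch (one convention among several)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```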
Sampathkumar, Raghavan; Shadabi, Elnaz; Luo, Ma
2012-01-01
As of February 2012, 50 circulating recombinant forms (CRFs) have been reported for HIV-1 and one CRF for HIV-2. According to the HIV Sequence Compendium 2011, the HIV sequence database holds 414,398 sequences. The existence of CRFs that amalgamate sequences from six or more subtypes (CRF27_cpx, where cpx refers to complex, is a mosaic with sequences from 6 different subtypes besides an unclassified fragment) testifies to the continual divergent evolution of the virus, with its approximate 1% per year rate of evolution. This phenomenon poses a tremendous challenge for vaccine development against HIV/AIDS, a devastating disease that killed 1.8 million patients in 2010. Here, we explore the interaction between HIV-1 and host genetic variation in the context of HIV/AIDS and antiretroviral therapy response. PMID:22666249
Liu, Qing; Zhu, Shenghua; Mizuno, Sahoko; Kimura, Masatsugu; Liu, Peina; Isomura, Shin; Wang, Xingzhen; Kawamoto, Fumihiko
1998-01-01
By two PCR-based diagnostic methods, Plasmodium malariae infections have been rediscovered at two foci in the Sichuan province of China, a region where no cases of P. malariae have been officially reported for the last 2 decades. In addition, a variant form of P. malariae which has a deletion of 19 bp and seven substitutions of base pairs in the target sequence of the small-subunit (SSU) rRNA gene was detected with high frequency. Alignment analysis of Plasmodium sp. SSU rRNA gene sequences revealed that the 5′ region of the variant sequence is identical to that of P. vivax or P. knowlesi and its 3′ region is identical to that of P. malariae. The same sequence variations were also found in P. malariae isolates collected along the Thai-Myanmar border, suggesting a wide distribution of this variant form from southern China to Southeast Asia. PMID:9774600
A survey of tools for variant analysis of next-generation genome sequencing data
Pabinger, Stephan; Dander, Andreas; Fischer, Maria; Snajder, Rene; Sperk, Michael; Efremova, Mirjana; Krabichler, Birgit; Speicher, Michael R.; Zschocke, Johannes
2014-01-01
Recent advances in genome sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and identify mutations relevant for diagnosis and therapy. Specifically, whole-exome sequencing using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community due to the moderate costs, manageable data amounts and straightforward interpretation of analysis results. While whole-exome and, in the near future, whole-genome sequencing are becoming commodities, data analysis still poses significant challenges and led to the development of a plethora of tools supporting specific parts of the analysis workflow or providing a complete solution. Here, we surveyed 205 tools for whole-genome/whole-exome sequencing data analysis supporting five distinct analytical steps: quality assessment, alignment, variant identification, variant annotation and visualization. We report an overview of the functionality, features and specific requirements of the individual tools. We then selected 32 programs for variant identification, variant annotation and visualization, which were subjected to hands-on evaluation using four data sets: one set of exome data from two patients with a rare disease for testing identification of germline mutations, two cancer data sets for testing variant callers for somatic mutations, copy number variations and structural variations, and one semi-synthetic data set for testing identification of copy number variations. Our comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers. PMID:23341494
Sparse Tensor Decomposition for Haplotype Assembly of Diploids and Polyploids.
Hashemi, Abolfazl; Zhu, Banghua; Vikalo, Haris
2018-03-21
Haplotype assembly is the task of reconstructing haplotypes of an individual from a mixture of sequenced chromosome fragments. Haplotype information enables studies of the effects of genetic variations on an organism's phenotype. Most of the mathematical formulations of haplotype assembly are known to be NP-hard and haplotype assembly becomes even more challenging as the sequencing technology advances and the length of the paired-end reads and inserts increases. Assembly of haplotypes of polyploid organisms is considerably more difficult than in the case of diploids. Hence, scalable and accurate schemes with provable performance are desired for haplotype assembly of both diploid and polyploid organisms. We propose a framework that formulates haplotype assembly from sequencing data as a sparse tensor decomposition. We cast the problem as that of decomposing a tensor having special structural constraints and missing a large fraction of its entries into a product of two factors, U and [Formula: see text]; tensor [Formula: see text] reveals haplotype information while U is a sparse matrix encoding the origin of erroneous sequencing reads. An algorithm, AltHap, which reconstructs haplotypes of either diploid or polyploid organisms by iteratively solving this decomposition problem is proposed. The performance and convergence properties of AltHap are theoretically analyzed and, in doing so, guarantees on the achievable minimum error correction scores and correct phasing rate are established. The developed framework is applicable to diploid, biallelic and polyallelic polyploid species. The code for AltHap is freely available from https://github.com/realabolfazl/AltHap. AltHap was tested in a number of different scenarios and was shown to compare favorably to state-of-the-art methods in applications to haplotype assembly of diploids, and to significantly outperform existing techniques when applied to haplotype assembly of polyploids.
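The minimum error correction (MEC) score mentioned above can be made concrete with a short Python sketch. It is not the AltHap algorithm. It only evaluates the MEC score of a given candidate set of haplotypes, using a read representation (a dict from SNP index to observed allele) that is an assumption made here for illustration.

```python
def mec_score(reads, haplotypes):
    """Minimum error correction score of candidate haplotypes.

    reads: list of dicts mapping SNP index -> observed allele (hypothetical
           representation; only covered positions appear in each dict).
    haplotypes: list of sequences indexable by SNP index (two for a diploid,
           more for a polyploid).
    Each read is assigned to the haplotype it disagrees with least; the score
    is the total number of allele disagreements left after that assignment.
    """
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    total = 0
    for read in reads:
        total += min(
            sum(1 for pos, allele in read.items() if hap[pos] != allele)
            for hap in haplotypes
        )
    return total
```

A lower score means fewer observed alleles have to be treated as sequencing errors to make the reads consistent with the proposed haplotypes. Finding the haplotypes that minimize it is the hard part that assembly methods address.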
Exome Sequencing in the Clinical Diagnosis of Sporadic or Familial Cerebellar Ataxia
Fogel, Brent L.; Lee, Hane; Deignan, Joshua L.; Strom, Samuel P.; Kantarci, Sibel; Wang, Xizhe; Quintero-Rivera, Fabiola; Vilain, Eric; Grody, Wayne W.; Perlman, Susan; Geschwind, Daniel H.; Nelson, Stanley F.
2015-01-01
IMPORTANCE Cerebellar ataxias are a diverse collection of neurologic disorders with causes ranging from common acquired etiologies to rare genetic conditions. Numerous genetic disorders have been associated with chronic progressive ataxia and this consequently presents a diagnostic challenge for the clinician regarding how to approach and prioritize genetic testing in patients with such clinically heterogeneous phenotypes. Additionally, while the value of genetic testing in early-onset and/or familial cases seems clear, many patients with ataxia present sporadically with adult onset of symptoms and the contribution of genetic variation to the phenotype of these patients has not yet been established. OBJECTIVE To investigate the contribution of genetic disease in a population of patients with predominantly adult- and sporadic-onset cerebellar ataxia. DESIGN, SETTING, AND PARTICIPANTS We examined a consecutive series of 76 patients presenting to a tertiary referral center for evaluation of chronic progressive cerebellar ataxia. MAIN OUTCOMES AND MEASURES Next-generation exome sequencing coupled with comprehensive bioinformatic analysis, phenotypic analysis, and clinical correlation. RESULTS We identified clinically relevant genetic information in more than 60% of patients studied (n = 46), including diagnostic pathogenic gene variants in 21% (n = 16), a notable yield given the diverse genetics and clinical heterogeneity of the cerebellar ataxias. CONCLUSIONS AND RELEVANCE This study demonstrated that clinical exome sequencing in patients with adult-onset and sporadic presentations of ataxia is a high-yield test, providing a definitive diagnosis in more than one-fifth of patients and suggesting a potential diagnosis in more than one-third to guide additional phenotyping and diagnostic evaluation. Therefore, clinical exome sequencing is an appropriate consideration in the routine genetic evaluation of all patients presenting with chronic progressive cerebellar ataxia. 
PMID:25133958
Genome sequence and rapid evolution of the rice pathogen Xanthomonas oryzae pv. oryzae PXO99A
Salzberg, Steven L; Sommer, Daniel D; Schatz, Michael C; Phillippy, Adam M; Rabinowicz, Pablo D; Tsuge, Seiji; Furutani, Ayako; Ochiai, Hirokazu; Delcher, Arthur L; Kelley, David; Madupu, Ramana; Puiu, Daniela; Radune, Diana; Shumway, Martin; Trapnell, Cole; Aparna, Gudlur; Jha, Gopaljee; Pandey, Alok; Patil, Prabhu B; Ishihara, Hiromichi; Meyer, Damien F; Szurek, Boris; Verdier, Valerie; Koebnik, Ralf; Dow, J Maxwell; Ryan, Robert P; Hirata, Hisae; Tsuyumu, Shinji; Won Lee, Sang; Ronald, Pamela C; Sonti, Ramesh V; Van Sluys, Marie-Anne; Leach, Jan E; White, Frank F; Bogdanove, Adam J
2008-01-01
Background Xanthomonas oryzae pv. oryzae causes bacterial blight of rice (Oryza sativa L.), a major disease that constrains production of this staple crop in many parts of the world. We report here on the complete genome sequence of strain PXO99A and its comparison to two previously sequenced strains, KACC10331 and MAFF311018, which are highly similar to one another. Results The PXO99A genome is a single circular chromosome of 5,240,075 bp, considerably longer than the genomes of the other strains (4,941,439 bp and 4,940,217 bp, respectively), and it contains 5083 protein-coding genes, including 87 not found in KACC10331 or MAFF311018. PXO99A contains a greater number of virulence-associated transcription activator-like effector genes and has at least ten major chromosomal rearrangements relative to KACC10331 and MAFF311018. PXO99A contains numerous copies of diverse insertion sequence elements, members of which are associated with 7 out of 10 of the major rearrangements. A rapidly-evolving CRISPR (clustered regularly interspersed short palindromic repeats) region contains evidence of dozens of phage infections unique to the PXO99A lineage. PXO99A also contains a unique, near-perfect tandem repeat of 212 kilobases close to the replication terminus. Conclusion Our results provide striking evidence of genome plasticity and rapid evolution within Xanthomonas oryzae pv. oryzae. The comparisons point to sources of genomic variation and candidates for strain-specific adaptations of this pathogen that help to explain the extraordinary diversity of Xanthomonas oryzae pv. oryzae genotypes and races that have been isolated from around the world. PMID:18452608
Humble, E; Martinez-Barrio, A; Forcada, J; Trathan, P N; Thorne, M A S; Hoffmann, M; Wolf, J B W; Hoffman, J I
2016-07-01
Custom genotyping arrays provide a flexible and accurate means of genotyping single nucleotide polymorphisms (SNPs) in a large number of individuals of essentially any organism. However, validation rates, defined as the proportion of putative SNPs that are verified to be polymorphic in a population, are often very low. A number of potential causes of assay failure have been identified, but none have been explored systematically. In particular, as SNPs are often developed from transcriptomes, parameters relating to the genomic context are rarely taken into account. Here, we assembled a draft Antarctic fur seal (Arctocephalus gazella) genome (assembly size: 2.41 Gb; scaffold/contig N50: 3.1 Mb/27.5 kb). We then used this resource to map the probe sequences of 144 putative SNPs genotyped in 480 individuals. The number of probe-to-genome mappings and alignment length together explained almost a third of the variation in validation success, indicating that sequence uniqueness and proximity to intron-exon boundaries play an important role. The same pattern was found after mapping the probe sequences to the Walrus and Weddell seal genomes, suggesting that the genomes of species divergent by as much as 23 million years can hold information relevant to SNP validation outcomes. Additionally, reanalysis of genotyping data from seven previous studies found the same two variables to be significantly associated with SNP validation success across a variety of taxa. Finally, our study reveals considerable scope for validation rates to be improved, either by simply filtering for SNPs whose flanking sequences align uniquely and completely to a reference genome, or through predictive modelling. © 2015 John Wiley & Sons Ltd.
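The filtering idea in the abstract above (keep only SNPs whose probe sequences align uniquely and completely to a reference genome) can be illustrated with a minimal Python sketch. The `ProbeMapping` record, its field names, and the toy values are hypothetical and are not taken from the study; the validation rate is computed as defined in the abstract, i.e. the proportion of putative SNPs found to be polymorphic.

```python
from dataclasses import dataclass


@dataclass
class ProbeMapping:
    """Hypothetical summary of how one SNP probe aligned to a reference genome."""
    snp_id: str
    n_mappings: int      # number of locations where the probe aligns
    aligned_length: int  # length of the best alignment, in bp
    probe_length: int    # full length of the probe, in bp
    polymorphic: bool    # True if the SNP validated as polymorphic


def aligns_uniquely_and_completely(p: ProbeMapping) -> bool:
    """Keep probes with exactly one mapping that spans the whole probe."""
    return p.n_mappings == 1 and p.aligned_length == p.probe_length


def validation_rate(probes) -> float:
    """Proportion of putative SNPs verified to be polymorphic."""
    probes = list(probes)
    if not probes:
        return float("nan")
    return sum(p.polymorphic for p in probes) / len(probes)


# Toy, made-up records used only to exercise the functions.
toy = [
    ProbeMapping("snp1", 1, 71, 71, True),
    ProbeMapping("snp2", 3, 71, 71, False),  # multiple mappings: not unique
    ProbeMapping("snp3", 1, 40, 71, False),  # partial alignment, e.g. near an exon boundary
    ProbeMapping("snp4", 1, 71, 71, True),
]
kept = [p for p in toy if aligns_uniquely_and_completely(p)]
print(validation_rate(toy), validation_rate(kept))
```

On real data the two rates would be compared before and after filtering; the toy numbers here carry no empirical meaning.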
Mitochondrial Mutations in Subjects with Psychiatric Disorders
Magnan, Christophe; van Oven, Mannis; Baldi, Pierre; Myers, Richard M.; Barchas, Jack D.; Schatzberg, Alan F.; Watson, Stanley J.; Akil, Huda; Bunney, William E.; Vawter, Marquis P.
2015-01-01
A considerable body of evidence supports the role of mitochondrial dysfunction in psychiatric disorders and mitochondrial DNA (mtDNA) mutations are known to alter brain energy metabolism, neurotransmission, and cause neurodegenerative disorders. Genetic studies focusing on common nuclear genome variants associated with these disorders have produced genome wide significant results but those studies have not directly studied mtDNA variants. The purpose of this study is to investigate, using next generation sequencing, the involvement of mtDNA variation in bipolar disorder, schizophrenia, major depressive disorder, and methamphetamine use. MtDNA extracted from multiple brain regions and blood were sequenced (121 mtDNA samples with an average of 8,800x coverage) and compared to an electronic database containing 26,850 mtDNA genomes. We confirmed novel and rare variants, and confirmed next generation sequencing error hotspots by traditional sequencing and genotyping methods. We observed a significant increase of non-synonymous mutations found in individuals with schizophrenia. Novel and rare non-synonymous mutations were found in psychiatric cases in mtDNA genes: ND6, ATP6, CYTB, and ND2. We also observed mtDNA heteroplasmy in brain at a locus previously associated with schizophrenia (T16519C). Large differences in heteroplasmy levels across brain regions within subjects suggest that somatic mutations accumulate differentially in brain regions. Finally, multiplasmy, a heteroplasmic measure of repeat length, was observed in brain from selective cases at a higher frequency than controls. These results offer support for increased rates of mtDNA substitutions in schizophrenia shown in our prior results. The variable levels of heteroplasmic/multiplasmic somatic mutations that occur in brain may be indicators of genetic instability in mtDNA. PMID:26011537
Lal, Dennis; Neubauer, Bernd A.; Toliat, Mohammad R.; Altmüller, Janine; Thiele, Holger; Nürnberg, Peter; Kamrath, Clemens; Schänzer, Anne; Sander, Thomas; Hahn, Andreas; Nothnagel, Michael
2016-01-01
Massively parallel sequencing of whole genomes and exomes has facilitated a direct assessment of causative genetic variation, now enabling the identification of genetic factors involved in rare diseases (RD) with Mendelian inheritance patterns on an almost routine basis. Here, we describe the illustrative case of a single consanguineous family where this strategy was hampered by the difficulty of distinguishing, by phenotypic manifestation alone, between two etiologically distinct co-occurring disorders, namely hereditary hypophosphatemic rickets (HRR) and congenital myopathies (CM). We used parametric linkage analysis, homozygosity mapping and whole-exome sequencing to identify mutations underlying HRR and CM. We also present an approximate approach for assessing the probability of co-occurrence of two unlinked recessive RD in a single family as a function of the degree of consanguinity and the frequency of the disease-causing alleles. Linkage analysis and homozygosity mapping yielded inconclusive results when assuming a single RD, but whole-exome sequencing helped to identify two mutations in two genes, namely SLC34A3 and SEPN1, that segregated independently in this family and that have previously been linked to two etiologically different diseases. We assess the increase in chance co-occurrence of rare diseases due to consanguinity, i.e. under circumstances that generally favor linkage mapping of recessive disease, and show that this probability can increase by several orders of magnitude. We conclude that such potential co-occurrence represents an underestimated risk when analyzing rare or undefined diseases in consanguineous families and should be given more consideration in the clinical and genetic evaluation. PMID:26789268
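To make the effect of consanguinity on chance co-occurrence concrete, the following Python sketch uses the textbook population-genetics relation for a recessive allele of frequency q in an individual with inbreeding coefficient F, P(affected) = F·q + (1 − F)·q². This is an assumption chosen for illustration and is not necessarily the approximation derived in the study. Treating the two unlinked loci as independent is likewise an assumption, and the allele frequency and F used below are illustrative values only.

```python
def p_affected(q: float, F: float) -> float:
    """Probability of being homozygous for a recessive allele of frequency q,
    given inbreeding coefficient F (textbook relation, assumed here)."""
    if not (0.0 <= q <= 1.0 and 0.0 <= F <= 1.0):
        raise ValueError("q and F must both lie in [0, 1]")
    return F * q + (1.0 - F) * q * q


def p_both(q1: float, q2: float, F: float) -> float:
    """Probability of two unlinked recessive diseases co-occurring in one
    individual, assuming the two loci behave independently."""
    return p_affected(q1, F) * p_affected(q2, F)


# Illustrative values: allele frequency 0.001 at both loci; F = 1/16
# corresponds to the offspring of first cousins.
q = 0.001
outbred = p_both(q, q, 0.0)
cousins = p_both(q, q, 1.0 / 16.0)
fold_increase = cousins / outbred
print(f"outbred: {outbred:.3e}  first-cousin offspring: {cousins:.3e}  "
      f"fold increase: {fold_increase:.0f}")
```

Under these assumed values the co-occurrence probability rises by a factor of roughly four thousand, in line with the abstract's statement that the increase can span several orders of magnitude.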
Ku, Chuan; Chung, Wan-Chia; Chen, Ling-Ling; Kuo, Chih-Horng
2013-01-01
The Madagascar periwinkle (Catharanthus roseus in the family Apocynaceae) is an important medicinal plant and is the source of several widely marketed chemotherapeutic drugs. It is also commonly grown for its ornamental values and, due to ease of infection and distinctiveness of symptoms, is often used as the host for studies on phytoplasmas, an important group of uncultivated plant pathogens. To gain insights into the characteristics of apocynaceous plastid genomes (plastomes), we used a reference-assisted approach to assemble the complete plastome of C. roseus, which could be applied to other C. roseus-related studies. The C. roseus plastome is the second completely sequenced plastome in the asterid order Gentianales. We performed comparative analyses with two other representative sequences in the same order, including the complete plastome of Coffea arabica (from the basal Gentianales family Rubiaceae) and the nearly complete plastome of Asclepias syriaca (Apocynaceae). The results demonstrated considerable variations in gene content and plastome organization within Apocynaceae, including the presence/absence of three essential genes (i.e., accD, clpP, and ycf1) and large size changes in non-coding regions (e.g., rps2-rpoC2 and IRb-ndhF). To find plastome markers of potential utility for Catharanthus breeding and phylogenetic analyses, we identified 41 C. roseus-specific simple sequence repeats. Furthermore, five intergenic regions with high divergence between C. roseus and three other euasterids I taxa were identified as candidate markers. To resolve the euasterids I interordinal relationships, 82 plastome genes were used for phylogenetic inference. With the addition of representatives from Apocynaceae and sampling of most other asterid orders, a sister relationship between Gentianales and Solanales is supported. PMID:23825699
Miller, Hilary C; O'Meally, Denis; Ezaz, Tariq; Amemiya, Chris; Marshall-Graves, Jennifer A; Edwards, Scott
2015-05-07
Major histocompatibility complex (MHC) genes are a central component of the vertebrate immune system and usually exist in a single genomic region. However, considerable differences in MHC organization and size exist between different vertebrate lineages. Reptiles occupy a key evolutionary position for understanding how variation in MHC structure evolved in vertebrates, but information on the structure of the MHC region in reptiles is limited. In this study, we investigate the organization and cytogenetic location of MHC genes in the tuatara (Sphenodon punctatus), the sole extant representative of the early-diverging reptilian order Rhynchocephalia. Sequencing and mapping of 12 clones containing class I and II MHC genes from a bacterial artificial chromosome library indicated that the core MHC region is located on chromosome 13q. However, duplication and translocation of MHC genes outside of the core region was evident, because additional class I MHC genes were located on chromosome 4p. We found a total of seven class I sequences and 11 class II β sequences, with evidence for duplication and pseudogenization of genes within the tuatara lineage. The tuatara MHC is characterized by high repeat content and low gene density compared with other species and we found no antigen processing or MHC framework genes on the MHC gene-containing clones. Our findings indicate substantial differences in MHC organization in tuatara compared with mammalian and avian MHCs and highlight the dynamic nature of the MHC. Further sequencing and annotation of tuatara and other reptile MHCs will determine if the tuatara MHC is representative of nonavian reptiles in general. Copyright © 2015 Miller et al.
Allozyme variation in spineless Pejibaye (Bactris gasipaes Kunth)
USDA-ARS's Scientific Manuscript database
Isozyme variation was studied in 161 accessions of pineapple including four species of Ananas and one of Pseudananas. Six enzyme systems (ADH, GPI, PGM, SKDH, TPI, UGPP) involving seven putative loci revealed 35 electromorphs. Considerable variation exists within and between species of Ananas. Sixt...
Survey Shows Variation in Ph.D. Methods Training.
ERIC Educational Resources Information Center
Steeves, Leslie; And Others
1983-01-01
Reports on a 1982 survey of journalism graduate studies indicating considerable variation in research methods requirements and emphases in 23 universities offering doctoral degrees in mass communication. (HOD)
Bavykin, Sergei G.; Mirzabekova, legal representative, Natalia V.; Mirzabekov, deceased, Andrei D.
2007-12-04
The present invention relates to methods and compositions for using nucleotide sequence variations of 16S and 23S rRNA within the B. cereus group to discriminate a highly infectious bacterium B. anthracis from closely related microorganisms. Sequence variations in the 16S and 23S rRNA of the B. cereus subgroup including B. anthracis are utilized to construct an array that can detect these sequence variations through selective hybridizations and discriminate B. cereus group that includes B. anthracis. Discrimination of single base differences in rRNA was achieved with a microchip during analysis of B. cereus group isolates from both single and in mixed samples, as well as identification of polymorphic sites. Successful use of a microchip to determine the appropriate subgroup classification using eight reference microorganisms from the B. cereus group as a study set, was demonstrated.
Bavykin, Sergei G.; Mirzabekov, Andrei D.
2007-10-30
The present invention is directed to a novel method of discriminating a highly infectious bacterium Bacillus anthracis from a group of closely related microorganisms. Sequence variations in the 16S and 23S rRNA of the B. cereus subgroup including B. anthracis are utilized to construct an array that can detect these sequence variations through selective hybridizations. The identification and analysis of these sequence variations enables positive discrimination of isolates of the B. cereus group that includes B. anthracis. Discrimination of single base differences in rRNA was achieved with a microchip during analysis of B. cereus group isolates from both single and in mixed probes, as well as identification of polymorphic sites. Successful use of a microchip to determine the appropriate subgroup classification using eight reference microorganisms from the B. cereus group as a study set, was demonstrated.
Babak, Tomas; Garrett-Engele, Philip; Armour, Christopher D; Raymond, Christopher K; Keller, Mark P; Chen, Ronghua; Rohl, Carol A; Johnson, Jason M; Attie, Alan D; Fraser, Hunter B; Schadt, Eric E
2010-08-13
Identifying associations between genotypes and gene expression levels using microarrays has enabled systematic interrogation of regulatory variation underlying complex phenotypes. This approach has vast potential for functional characterization of disease states, but its prohibitive cost, given hundreds to thousands of individual samples from populations have to be genotyped and expression profiled, has limited its widespread application. Here we demonstrate that genomic regions with allele-specific expression (ASE) detected by sequencing cDNA are highly enriched for cis-acting expression quantitative trait loci (cis-eQTL) identified by profiling of 500 animals in parallel, with up to 90% agreement on the allele that is preferentially expressed. We also observed widespread noncoding and antisense ASE and identified several allele-specific alternative splicing variants. Monitoring ASE by sequencing cDNA from as little as one sample is a practical alternative to expression genetics for mapping cis-acting variation that regulates RNA transcription and processing.
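As a rough illustration of how allele-specific expression might be flagged at a single heterozygous site from cDNA read counts, the sketch below applies an exact two-sided binomial test against the null hypothesis that both alleles are equally represented. The use of a binomial test is an assumption made for this example and is not stated in the abstract; the read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: the total probability of all outcomes
    that are no more likely than the observed count k.

    Intended for modest read depths; very large n would need a
    log-space implementation to avoid float overflow.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    pmf = [comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1.0 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


# Hypothetical counts: 18 of 20 reads carry the reference allele.
ref_reads, total_reads = 18, 20
p_value = binomial_two_sided_p(ref_reads, total_reads)
print(f"two-sided p = {p_value:.6f}")
```

In practice such a test would be applied across many sites with a correction for multiple testing, and reference-mapping bias may shift the null proportion away from 0.5.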
Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M; Greenwood, Alex D; Roca, Alfred L
2015-01-15
The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in a captive Queensland and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes comprising 139 of the sequences (85%) coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development. Copyright © 2014 Elsevier Inc. All rights reserved.
Dissecting the relationship between protein structure and sequence variation
NASA Astrophysics Data System (ADS)
Shahmoradi, Amir; Wilke, Claus; Wilke Lab Team
2015-03-01
Over the past decade several independent works have shown that some structural properties of proteins are capable of predicting protein evolution. The strength and significance of these structure-sequence relations, however, appear to vary widely among different proteins, with absolute correlation strengths ranging from 0.1 to 0.8. Here we present the results from a comprehensive search for the potential biophysical and structural determinants of protein evolution by studying more than 200 structural and evolutionary properties in a dataset of 209 monomeric enzymes. We discuss the main protein characteristics responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~10-30% of observed variation in sequence-structure relations. In addition to sequence divergence, we identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous backbone hydrogen bond energies, large fractions of helical secondary structures and low fractions of beta sheets tend to have the strongest sequence-structure relation. BEACON-NSF center for the study of evolution in action.
The African Genome Variation Project shapes medical genetics in Africa
NASA Astrophysics Data System (ADS)
Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.
2015-01-01
Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
The African Genome Variation Project shapes medical genetics in Africa.
Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O; Choudhury, Ananyo; Ritchie, Graham R S; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N; Young, Elizabeth H; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S
2015-01-15
Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
Hartl, Daniel L.
2008-01-01
Simple models of molecular evolution assume that sequences evolve by a Poisson process in which nucleotide or amino acid substitutions occur as rare independent events. In these models, the expected ratio of the variance to the mean of substitution counts equals 1, and substitution processes with a ratio greater than 1 are called overdispersed. Comparing the genomes of 10 closely related species of Drosophila, we extend earlier evidence for overdispersion in amino acid replacements as well as in four-fold synonymous substitutions. The observed deviation from the Poisson expectation can be described as a linear function of the rate at which substitutions occur on a phylogeny, which implies that deviations from the Poisson expectation arise from gene-specific temporal variation in substitution rates. Amino acid sequences show greater temporal variation in substitution rates than do four-fold synonymous sequences. Our findings provide a general phenomenological framework for understanding overdispersion in the molecular clock. Also, the presence of substantial variation in gene-specific substitution rates has broad implications for work in phylogeny reconstruction and evolutionary rate estimation. PMID:18480070
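The variance-to-mean ratio described above can be computed directly from substitution counts. The Python sketch below is a simplified illustration: it uses the unweighted sample variance and ignores the lineage-effect corrections typically applied in molecular-clock studies, and the counts are made-up values rather than data from the Drosophila comparison.

```python
from statistics import mean, variance


def dispersion_index(counts) -> float:
    """Ratio of the sample variance to the mean of substitution counts.

    Under a Poisson process the expected ratio is 1; values above 1
    indicate overdispersion.
    """
    counts = list(counts)
    if len(counts) < 2:
        raise ValueError("need at least two counts to estimate a variance")
    m = mean(counts)
    if m == 0:
        raise ValueError("mean substitution count is zero")
    return variance(counts) / m


# Hypothetical per-lineage substitution counts for two genes.
clock_like_gene = [4, 5, 6, 5, 5]
erratic_gene = [1, 9, 2, 12, 1]
print(dispersion_index(clock_like_gene))  # well below 1
print(dispersion_index(erratic_gene))     # well above 1 (overdispersed)
```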
First known EL5 chondrite - Evidence for dual genetic sequence for enstatite chondrites
NASA Technical Reports Server (NTRS)
Sears, D. W. G.; Weeks, K. S.; Rubin, A. E.
1984-01-01
The compositionally distinct EH and EL groups together with four (3-6) petrologic types which constitute the enstatite chondrites represent increasing degrees of metamorphic alteration. Although bulk composition variations preclude a simple conversion of EH4 into EL6 material, complex models which involve simultaneous bulk composition and petrologic type variations may be implied by other classification schemes in common use. Attention is presently given to the discovery of the first EL5 chondrite, which breaks the EH3,4-EH5-EL6 sequence and indicates that the enstatite chondrites constitute the two discrete, isochemical metamorphic sequences EH3-5 and EL5-6.
van der Ley, P
1988-11-01
Gonococci express a family of related outer membrane proteins designated protein II (P.II). These surface proteins are subject to both phase variation and antigenic variation. The P.II gene repertoire of Neisseria gonorrhoeae strain JS3 was found to consist of at least ten genes, eight of which were cloned. Sequence analysis and DNA hybridization studies revealed that one particular P.II-encoding sequence is present in three distinct, but almost identical, copies in the JS3 genome. These genes encode the P.II protein that was previously identified as P.IIc. Comparison of their sequences shows that the multiple copies of this P.IIc-encoding gene might have been generated by both gene conversion and gene duplication.
Lineages of Streptococcus equi ssp. equi in the Irish equine industry.
Moloney, Emma; Kavanagh, Kerrie S; Buckley, Tom C; Cooney, Jakki C
2013-01-01
Streptococcus equi ssp. equi is the causative agent of 'Strangles' in horses. This is a debilitating condition leading to economic loss, yard closures and cancellation of equestrian events. There are multiple genotypes of S. equi ssp. equi which can cause disease, but to date there has been no systematic study of strains which are prevalent in Ireland. This study identified and classified Streptococcus equi ssp. equi strains isolated from within the Irish equine industry. Two hundred veterinary isolates were subjected to SLST (single locus sequence typing) based on an internal sequence from the seM gene of Streptococcus equi ssp. equi. Of the 171 samples which successfully gave an amplicon, 162 samples (137 Irish and 24 UK strains) gave robust DNA sequence information. Analysis of the sequences allowed division of the isolates into 19 groups, 13 of which contain at least 2 isolates and 6 groups containing single isolates. There were 19 positions where a DNA SNP (single nucleotide polymorphism) occurs, and one 3 bp insertion. All groups had multiple (2-8) SNPs. Of the SNPs, 17 would result in an amino acid change in the encoded protein. Interestingly, the single isolate EI8, which has 6 SNPs, has the three base pair insertion which is not seen in any other isolate; this would result in the insertion of an Ile residue at position 62 in that protein sequence. Comparison of the relevant region in the determined sequences with the UK Streptococcus equi seM MLST database showed that Group B (15 isolates) and Group I (2 isolates), as well as the individual isolates EI3 and EI8, are unique to Ireland, and some groups are most likely of UK origin (Groups F and M), but many more probably passed back and forth between the two countries. The strains occurring in Ireland are not clonal and there is a considerable degree of sequence variation seen in the seM gene. There are two major clades causing infection in Ireland and these strains are also common in the UK.
Ortí, G; Meyer, A
1996-04-01
The rate and pattern of DNA evolution of ependymin, a single-copy gene coding for a highly expressed glycoprotein in the brain matrix of teleost fishes, is characterized and its phylogenetic utility for fish systematics is assessed. DNA sequences were determined from catfish, electric fish, and characiforms and compared with published ependymin sequences from cyprinids, salmon, pike, and herring. Among these groups, ependymin amino acid sequences were highly divergent (up to 60% sequence difference), but had surprisingly similar hydropathy profiles and invariant glycosylation sites, suggesting that functional properties of the proteins are conserved. Comparison of base composition at third codon positions and introns revealed AT-rich introns and GC-rich third codon positions, suggesting that the biased codon usage observed might not be due to mutational bias. Phylogenetic information content of third codon positions was surprisingly high and sufficient to recover the most basal nodes of the tree, in spite of the observation that pairwise distances (at third codon positions) were well above the presumed saturation level. This finding can be explained by the high proportion of phylogenetically informative nonsynonymous changes at third codon positions among these highly divergent proteins. Ependymin DNA sequences have established the first molecular evidence for the monophyly of a group containing salmonids and esociforms. In addition, ependymin suggests a sister group relationship of electric fish (Gymnotiformes) and Characiformes, constituting a significant departure from currently accepted classifications. However, relationships among characiform lineages were not completely resolved by ependymin sequences in spite of seemingly appropriate levels of variation among taxa and considerably low levels of homoplasy in the data (consistency index = 0.7). 
If the diversification of Characiformes took place in an "explosive" manner, over a relatively short period of time, this pattern should also be observed using other phylogenetic markers. Poor conservation of ependymin's primary structure hinders the design of efficient primers for PCR that could be used in wide-ranging fish systematic studies. However, alternative methods like PCR amplification from cDNA used here should provide promising comparative sequence data for the resolution of phylogenetic relationships among other basal lineages of teleost fishes.
Lineages of Streptococcus equi ssp. equi in the Irish equine industry
2013-01-01
Background: Streptococcus equi ssp. equi is the causative agent of ‘Strangles’ in horses. This is a debilitating condition leading to economic loss, yard closures and cancellation of equestrian events. There are multiple genotypes of S. equi ssp. equi which can cause disease, but to date there has been no systematic study of strains which are prevalent in Ireland. This study identified and classified Streptococcus equi ssp. equi strains isolated from within the Irish equine industry. Results: Two hundred veterinary isolates were subjected to SLST (single locus sequence typing) based on an internal sequence from the seM gene of Streptococcus equi ssp. equi. Of the 171 samples which successfully gave an amplicon, 162 samples (137 Irish and 24 UK strains) gave robust DNA sequence information. Analysis of the sequences allowed division of the isolates into 19 groups, 13 of which contain at least 2 isolates and 6 groups containing single isolates. There were 19 positions where a DNA SNP (single nucleotide polymorphism) occurs, and one 3 bp insertion. All groups had multiple (2–8) SNPs. Of the SNPs, 17 would result in an amino acid change in the encoded protein. Interestingly, the single isolate EI8, which has 6 SNPs, has the three base pair insertion which is not seen in any other isolate; this would result in the insertion of an Ile residue at position 62 in that protein sequence. Comparison of the relevant region in the determined sequences with the UK Streptococcus equi seM MLST database showed that Group B (15 isolates) and Group I (2 isolates), as well as the individual isolates EI3 and EI8, are unique to Ireland, and some groups are most likely of UK origin (Groups F and M), but many more probably passed back and forth between the two countries. Conclusions: The strains occurring in Ireland are not clonal and there is a considerable degree of sequence variation seen in the seM gene. There are two major clades causing infection in Ireland and these strains are also common in the UK. PMID:23731628
Guo, Guo-Ye; Chen, Fang; Shi, Xiao-Dong; Tian, Yin-Shuai; Yu, Mao-Qun; Han, Xue-Qin; Yuan, Li-Chun; Zhang, Ying
2016-01-01
Genetic variation and phylogenetic relationships among 102 Jatropha curcas accessions from Asia, Africa, and the Americas were assessed using the internal transcribed spacer region of nuclear ribosomal DNA (nrDNA ITS). The average G+C content (65.04%) was considerably higher than the A+T (34.96%) content. The estimated genetic diversity revealed moderate genetic variation. The pairwise genetic divergences (GD) between haplotypes were evaluated and ranged from 0.000 to 0.017, suggesting a higher level of genetic differentiation in Mexican accessions than those of other regions. Phylogenetic relationships and intraspecific divergence were inferred by Bayesian inference (BI), maximum parsimony (MP), and median joining (MJ) network analysis and were generally resolved. The J. curcas accessions were consistently divided into three lineages, groups A, B, and C, which demonstrated distant geographical isolation and genetic divergence between American accessions and those from other regions. The MJ network analysis confirmed that Central America was the possible center of origin. The putative migration route suggested that J. curcas was distributed from Mexico or Brazil, via Cape Verde and then split into two routes. One route was dispersed to Spain, then migrated to China, eventually spreading to southeastern Asia, while the other route was dispersed to Africa, via Madagascar and migrated to China, later spreading to southeastern Asia. Copyright © 2016 Académie des sciences. Published by Elsevier SAS. All rights reserved.
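The two summary statistics mentioned above, G+C content and pairwise genetic divergence, can be sketched in a few lines of Python. The divergence here is the uncorrected p-distance (proportion of differing sites), which is an assumption: the abstract does not say which distance measure or substitution model was used. The sequences are toy strings, not ITS data from the study.

```python
def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def p_distance(a: str, b: str) -> float:
    """Uncorrected pairwise distance between two aligned sequences.

    Columns with a gap or ambiguity code in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy aligned haplotypes (hypothetical).
hap_a = "GCGCCGATGC"
hap_b = "GCGCCGATGT"
print(gc_content(hap_a))         # 0.8
print(p_distance(hap_a, hap_b))  # 0.1
```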
Phylogeny of Fomitopsis pinicola: A species complex
John Haight; Gary A. Laursen; Jessie A. Glaeser; D. Lee Taylor
2016-01-01
Fungal species with a broad distribution may exhibit considerable genetic variation over their geographic ranges. Variation may develop among populations based on geographic isolation, lack of migration, and genetic drift, though this genetic variation may not always be evident when examining phenotypic characters. Fomitopsis pinicola is an...
TUMOR HAPLOTYPE ASSEMBLY ALGORITHMS FOR CANCER GENOMICS
AGUIAR, DEREK; WONG, WENDY S.W.; ISTRAIL, SORIN
2014-01-01
The growing availability of inexpensive high-throughput sequence data is enabling researchers to sequence tumor populations within a single individual at high coverage. But cancer genome sequence evolution and mutational phenomena like driver mutations and gene fusions are difficult to investigate without first reconstructing tumor haplotype sequences. Haplotype assembly of single individual tumor populations is an exceedingly difficult task complicated by tumor haplotype heterogeneity, tumor or normal cell sequence contamination, polyploidy, and complex patterns of variation. While computational and experimental haplotype phasing of diploid genomes has seen much progress in recent years, haplotype assembly in cancer genomes remains uncharted territory. In this work, we describe HapCompass-Tumor, a computational modeling and algorithmic framework for haplotype assembly of copy number variable cancer genomes containing haplotypes at different frequencies and complex variation. We extend our polyploid haplotype assembly model and present novel algorithms for (1) complex variations, including copy number changes, as varying numbers of disjoint paths in an associated graph, (2) variable haplotype frequencies and contamination, and (3) computation of tumor haplotypes using simple cycles of the compass graph which constrain the space of haplotype assembly solutions. The model and algorithm are implemented in the software package HapCompass-Tumor, which is available for download from http://www.brown.edu/Research/Istrail_Lab/. PMID:24297529
DOE Office of Scientific and Technical Information (OSTI.GOV)
Le Coq, Johanne; Ghosh, Partho
2012-06-19
Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (∼16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented approximate 10^20 potential variability of TvpA. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation.
Le Coq, Johanne; Ghosh, Partho
2011-01-01
Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (∼16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented approximate 10^20 potential variability of TvpA. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation. PMID:21873231
Mercenaro, Luca; Nieddu, Giovanni; Porceddu, Andrea; Pezzotti, Mario; Camiolo, Salvatore
2017-01-01
The genetic diversity among grapevine (Vitis vinifera L.) cultivars that underlies differences in agronomic performance and wine quality reflects the accumulation of single nucleotide polymorphisms (SNPs) and small indels as well as larger genomic variations. A combination of high throughput sequencing and mapping against the grapevine reference genome allows the creation of comprehensive sequence variation maps. We used next generation sequencing and bioinformatics to generate an inventory of SNPs and small indels in four widely cultivated Sardinian grape cultivars (Bovale sardo, Cannonau, Carignano and Vermentino). More than 3,200,000 SNPs were identified with high statistical confidence. Some of the SNPs caused the appearance of premature stop codons and thus identified putative pseudogenes. The analysis of SNP distribution along chromosomes led to the identification of large genomic regions with uninterrupted series of homozygous SNPs. We used a digital comparative genomic hybridization approach to identify 6526 genomic regions with significant differences in copy number among the four cultivars compared to the reference sequence, including 81 regions shared between all four cultivars and 4953 specific to single cultivars (representing 1.2 and 75.9% of total copy number variation, respectively). Reads mapping at a distance that was not compatible with the insert size were used to identify a dataset of putative large deletions with cultivar Cannonau revealing the highest number. The analysis of genes mapping to these regions provided a list of candidates that may explain some of the phenotypic differences among the Bovale sardo, Cannonau, Carignano and Vermentino cultivars. PMID:28775732
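The deletion-detection step described above (read pairs mapping at a distance not compatible with the insert size) can be illustrated with a short Python sketch. The input layout, the function names and the fixed tolerance are assumptions made for illustration; the abstract does not give the authors' thresholds or tooling. The second helper simply re-derives the two percentages quoted for the copy number regions.

```python
def flag_discordant_pairs(pairs, expected_insert, tolerance):
    """Return ids of read pairs whose mapped span exceeds expected_insert + tolerance.

    pairs: iterable of (pair_id, mapped_span_bp) tuples (hypothetical input layout).
    A span much larger than the library insert size suggests that the sample
    lacks sequence present in the reference, i.e. a putative deletion.
    """
    limit = expected_insert + tolerance
    return [pair_id for pair_id, span in pairs if span > limit]


def percent_of_total(count, total):
    """Share of count in total, as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


toy_pairs = [("p1", 310), ("p2", 5200), ("p3", 280), ("p4", 4900)]
candidates = flag_discordant_pairs(toy_pairs, expected_insert=300, tolerance=150)

shared_share = percent_of_total(81, 6526)      # regions shared by all four cultivars
private_share = percent_of_total(4953, 6526)   # regions specific to a single cultivar
```

In practice a call would be supported by several independent discordant pairs clustered at the same locus rather than by a single pair; that clustering step is omitted here.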
Relating Human Genetic Variation to Variation in Drug Responses
Madian, Ashraf G.; Wheeler, Heather E.; Jones, Richard Baker; Dolan, M. Eileen
2012-01-01
Although sequencing a single human genome was a monumental effort a decade ago, more than one thousand genomes have now been sequenced. The task ahead lies in transforming this information into personalized treatment strategies that are tailored to the unique genetics of each individual. One important aspect of personalized medicine is patient-to-patient variation in drug response. Pharmacogenomics addresses this issue by seeking to identify genetic contributors to human variation in drug efficacy and toxicity. Here, we present a summary of the current status of this field, which has evolved from studies of single candidate genes to comprehensive genome-wide analyses. Additionally, we discuss the major challenges in translating this knowledge into a systems-level understanding of drug physiology with the ultimate goal of developing more effective personalized clinical treatment strategies. PMID:22840197
Adaptation of Organisms by Resonance of RNA Transcription with the Cellular Redox Cycle
NASA Technical Reports Server (NTRS)
Stolc, Viktor
2012-01-01
Sequence variation in organisms differs across the genome and the majority of mutations are caused by oxidation, yet its origin is not fully understood. It has also been shown that the reduction-oxidation reaction cycle is the fundamental biochemical cycle that coordinates the timing of all biochemical processes in that cell, including energy production, DNA replication, and RNA transcription. It is shown that the temporal resonance of transcriptome biosynthesis with the oscillating binary state of the reduction-oxidation reaction cycle serves as a basis for non-random sequence variation at specific genome-wide coordinates that change faster than by accumulation of chance mutations. This work demonstrates evidence for a universal, persistent and iterative feedback mechanism between the environment and heredity, whereby acquired variation between cell divisions can outweigh inherited variation.
Sequences for Student Investigation
ERIC Educational Resources Information Center
Barton, Jeffrey; Feil, David; Lartigue, David; Mullins, Bernadette
2004-01-01
We describe two classes of sequences that give rise to accessible problems for undergraduate research. These problems may be understood with virtually no prerequisites and are well suited for computer-aided investigation. The first sequence is a variation of one introduced by Stephen Wolfram in connection with his study of cellular automata. The…
USDA-ARS's Scientific Manuscript database
Simple sequence repeat (SSR) markers are widely used tools for inferences about genetic diversity, phylogeography and spatial genetic structure. Their applications assume that variation among alleles is essentially caused by an expansion or contraction of the number of repeats and that, accessorily,...
Complete Genome Sequences of 38 Gordonia sp. Bacteriophages
Montgomery, Matthew T.; Bonilla, J. Alfred; Dejong, Randall; Garlena, Rebecca A.; Guerrero Bustamante, Carlos; Klyczek, Karen K.; Russell, Daniel A.; Wertz, John T.; Jacobs-Sera, Deborah; Hatfull, Graham F.
2017-01-01
ABSTRACT We report here the genome sequences of 38 newly isolated bacteriophages using Gordonia terrae 3612 (ATCC 25594) and Gordonia neofelifaecis NRRL59395 as bacterial hosts. All of the phages are double-stranded DNA (dsDNA) tail phages with siphoviral morphologies, with genome sizes ranging from 17,118 bp to 93,843 bp and spanning considerable nucleotide sequence diversity. PMID:28057748
Schuller, Dorit; Pereira, Leonor; Alves, Hugo; Cambon, Brigitte; Dequin, Sylvie; Casal, Margarida
2007-08-01
One hundred isolates of the commercial Saccharomyces cerevisiae strain Zymaflore VL1 were recovered from spontaneous fermentations carried out with grapes collected from vineyards located close to wineries in the Vinho Verde wine region of Portugal. Isolates were differentiated based on their mitochondrial DNA restriction patterns and the evaluation of genetic polymorphisms was carried out by microsatellite analysis, interdelta sequence typing and pulsed-field gel electrophoresis (PFGE). Genetic patterns were compared to those obtained for 30 isolates of the original commercialized Zymaflore VL1 strain. Among the 100 recovered isolates we found a high percentage of chromosomal size variations, most evident for the smaller chromosomes III and VI. Complete loss of heterozygosity was observed for two isolates that had also lost chromosomal heteromorphism; their growth and fermentative capacity in a synthetic must medium was also affected. A considerably higher number of variant patterns for interdelta sequence amplifications was obtained for grape-derived strains compared to the original VL1 isolates. Our data show that the long-term presence of strain VL1 in natural grapevine environments induced genetic changes that can be detected using different fingerprinting methods. The observed genetic changes may reflect adaptive mechanisms to changed environmental conditions that yeast cells encounter during their existence in nature. (c) 2007 John Wiley & Sons, Ltd.
Irizarry, Kristopher J. L.; Bryant, Doug; Kalish, Jordan; Eng, Curtis; Schmidt, Peggy L.; Barrett, Gini; Barr, Margaret C.
2016-01-01
Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provides value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enable an informed approach to endangered species management. PMID:27376076
Gagliano, Sarah A; Ravji, Reena; Barnes, Michael R; Weale, Michael E; Knight, Jo
2015-08-24
Although technology has triumphed in facilitating routine genome sequencing, new challenges have been created for the data-analyst. Genome-scale surveys of human variation generate volumes of data that far exceed capabilities for laboratory characterization. By incorporating functional annotations as predictors, statistical learning has been widely investigated for prioritizing genetic variants likely to be associated with complex disease. We compared three published prioritization procedures, which use different statistical learning algorithms and different predictors with regard to the quantity, type and coding. We also explored different combinations of algorithm and annotation set. As an application, we tested which methodology performed best for prioritizing variants using data from a large schizophrenia meta-analysis by the Psychiatric Genomics Consortium. Results suggest that all methods have considerable (and similar) predictive accuracies (AUCs 0.64-0.71) in test set data, but there is more variability in the application to the schizophrenia GWAS. In conclusion, a variety of algorithms and annotations seem to have a similar potential to effectively enrich true risk variants in genome-scale datasets, however none offer more than incremental improvement in prediction. We discuss how methods might be evolved for risk variant prediction to address the impending bottleneck of the new generation of genome re-sequencing studies.
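The AUC values quoted above (0.64-0.71) measure how often a prioritization score ranks a true risk variant above a non-risk variant. A minimal Python sketch of that quantity, computed directly as the pairwise ranking probability, is given below. The scores are invented, and the function is a generic textbook formulation rather than any of the three compared procedures.

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win, which makes this equal to the area under the
    ROC curve (the normalized Mann-Whitney U statistic).
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


risk_variant_scores = [0.9, 0.6, 0.4]      # hypothetical scores for true risk variants
background_scores = [0.7, 0.3, 0.2]        # hypothetical scores for other variants
toy_auc = auc(risk_variant_scores, background_scores)   # 7 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking, so values in the 0.64-0.71 range indicate enrichment of true risk variants but leave many of them ranked below non-risk variants.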
2013-01-01
Background Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Results Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li’s D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li’s D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. 
FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low FST. Conclusions This initial survey of genetic variation within honey bee RNA viruses suggests future directions for studies examining the underlying causes of population-genetic structure in these economically important pathogens. PMID:23497218
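Nucleotide diversity, one of the summary statistics referred to above, is the average number of differences per site between pairs of sequences. The Python sketch below computes it from a small set of aligned sequences. This is a simplification: the study estimated diversity from deep-sequencing reads, and the toy alignment and function name here are assumptions for illustration only.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise differences per site across all pairs of aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if length == 0 or any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be non-empty and of equal length")
    pair_differences = [
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(aligned, 2)
    ]
    return sum(pair_differences) / len(pair_differences) / length


toy_alignment = ["ACGT", "ACGA", "ACGT"]        # hypothetical consensus sequences
toy_pi = nucleotide_diversity(toy_alignment)    # (1 + 0 + 1) / 3 pairs / 4 sites
```

Low values of this statistic across samples are what the abstract describes as low nucleotide diversity for DWV.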
Cornman, Robert Scott; Boncristiani, Humberto; Dainat, Benjamin; Chen, Yanping; vanEngelsdorp, Dennis; Weaver, Daniel; Evans, Jay D
2013-03-07
Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li's D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li's D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low FST. 
This initial survey of genetic variation within honey bee RNA viruses suggests future directions for studies examining the underlying causes of population-genetic structure in these economically important pathogens.
Asian affinities and continental radiation of the four founding Native American mtDNAs.
Torroni, A; Schurr, T G; Cabell, M F; Brown, M D; Neel, J V; Larsen, M; Smith, D G; Vullo, C M; Wallace, D C
1993-01-01
The mtDNA variation of 321 individuals from 17 Native American populations was examined by high-resolution restriction endonuclease analysis. All mtDNAs were amplified from a variety of sources by using PCR. The mtDNA of a subset of 38 of these individuals was also analyzed by D-loop sequencing. The resulting data were combined with previous mtDNA data from five other Native American tribes, as well as with data from a variety of Asian populations, and were used to deduce the phylogenetic relationships between mtDNAs and to estimate sequence divergences. This analysis revealed the presence of four haplotype groups (haplogroups A, B, C, and D) in the Amerind, but only one haplogroup (A) in the Na-Dene, and confirmed the independent origins of the Amerinds and the Na-Dene. Further, each haplogroup appeared to have been founded by a single mtDNA haplotype, a result which is consistent with a hypothesized founder effect. Most of the variation within haplogroups was tribal specific, that is, it occurred as tribal private polymorphisms. These observations suggest that the process of tribalization began early in the history of the Amerinds, with relatively little intertribal genetic exchange occurring subsequently. The sequencing of 341 nucleotides in the mtDNA D-loop revealed that the D-loop sequence variation correlated strongly with the four haplogroups defined by restriction analysis, and it indicated that the D-loop variation, like the haplotype variation, arose predominantly after the migration of the ancestral Amerinds across the Bering land bridge. PMID:7688932
NASA Astrophysics Data System (ADS)
Veglia, A. J.; Milford, C. R.; Marston, M.
2016-02-01
Viruses infecting marine Synechococcus are abundant in coastal marine environments and influence the community composition and abundance of their cyanobacterial hosts. In this study, we focused on the cyanopodoviruses which have smaller genomes and narrower host ranges relative to cyanomyoviruses. While previous studies have compared the genomes of diverse podoviruses, here we analyzed the genomic variation, host ranges, and infection kinetics of podoviruses within the same OTU. The genomes of fifty-five podoviral isolates from the coastal waters of New England were fully sequenced. Based on DNA polymerase gene sequences, these isolates fall into five discrete OTUs (termed RIP - Rhode Island Podovirus). Although all the isolates belonging to the same RIP have very similar DNA polymerase gene sequences (>98% sequence identity), differences in genome content, particularly in regions associated with tail fiber genes, were observed among isolates in the same RIP. Host range tests reveal variation both across and within RIPs. Notably within RIP1, isolates that had similar tail fiber regions also had similar host ranges. Isolates belonging to RIP4 do not contain the host-derived psbA photosynthesis gene, while isolates in the other four RIPs do possess a psbA gene. Nevertheless, infection kinetic experiments suggest that the latent period and burst size for RIP4 isolates are similar to RIP1 isolates. We are continuing to investigate the correlations among genome content, host range, and infection kinetics of isolates belonging to the same OTU. Our results to date suggest that there is substantial genomic variation within an OTU and that this variation likely influences cyanopodoviral - host interactions.
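The grouping of isolates into OTUs by DNA polymerase gene similarity (>98% identity within an OTU) can be sketched in Python as below. The abstract does not say how the OTUs were delineated, so the greedy, representative-based clustering is an assumption, as are the isolate names and the toy aligned sequences.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.98):
    """Assign each isolate to the first OTU whose representative it matches.

    seqs: dict mapping isolate name to aligned marker-gene sequence.
    An isolate joins an OTU when identity to its representative exceeds
    threshold; otherwise it founds a new OTU and becomes its representative.
    """
    otus = []                                   # list of (representative, members)
    for name, seq in seqs.items():
        for representative, members in otus:
            if identity(seqs[representative], seq) > threshold:
                members.append(name)
                break
        else:
            otus.append((name, [name]))
    return [members for _, members in otus]


toy_isolates = {                                # hypothetical 100 bp marker sequences
    "iso1": "A" * 100,
    "iso2": "A" * 99 + "C",                     # 99% identical to iso1
    "iso3": "C" * 100,                          # unrelated to iso1
}
toy_otus = greedy_otus(toy_isolates)
```

Greedy clustering depends on input order, so a real analysis would more likely use an established clustering tool or a phylogeny; the point here is only that isolates in one OTU can share a near-identical marker gene while still differing elsewhere in the genome, as the abstract reports.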
Distribution, functional impact, and origin mechanisms of copy number variation in the barley genome
2013-01-01
Background There is growing evidence for the prevalence of copy number variation (CNV) and its role in phenotypic variation in many eukaryotic species. Here we use array comparative genomic hybridization to explore the extent of this type of structural variation in domesticated barley cultivars and wild barleys. Results A collection of 14 barley genotypes including eight cultivars and six wild barleys were used for comparative genomic hybridization. CNV affects 14.9% of all the sequences that were assessed. Higher levels of CNV diversity are present in the wild accessions relative to cultivated barley. CNVs are enriched near the ends of all chromosomes except 4H, which exhibits the lowest frequency of CNVs. CNV affects 9.5% of the coding sequences represented on the array and the genes affected by CNV are enriched for sequences annotated as disease-resistance proteins and protein kinases. Sequence-based comparisons of CNV between cultivars Barke and Morex provided evidence that DNA repair mechanisms of double-strand breaks via single-stranded annealing and synthesis-dependent strand annealing play an important role in the origin of CNV in barley. Conclusions We present the first catalog of CNVs in a diploid Triticeae species, which opens the door for future genome diversity research in a tribe that comprises the economically important cereal species wheat, barley, and rye. Our findings constitute a valuable resource for the identification of CNV affecting genes of agronomic importance. We also identify potential mechanisms that can generate variation in copy number in plant genomes. PMID:23758725
No evidence that sex and transposable elements drive genome size variation in evening primroses.
Ågren, J Arvid; Greiner, Stephan; Johnson, Marc T J; Wright, Stephen I
2015-04-01
Genome size varies dramatically across species, but despite an abundance of attention there is little agreement on the relative contributions of selective and neutral processes in governing this variation. The rate of sex can potentially play an important role in genome size evolution because of its effect on the efficacy of selection and transmission of transposable elements (TEs). Here, we used a phylogenetic comparative approach and whole genome sequencing to investigate the contribution of sex and TE content to genome size variation in the evening primrose (Oenothera) genus. We determined genome size using flow cytometry for 30 species that vary in genetic system and find that variation in sexual/asexual reproduction cannot explain the almost twofold variation in genome size. Moreover, using whole genome sequences of three species of varying genome sizes and reproductive system, we found that genome size was not associated with TE abundance; instead the larger genomes had a higher abundance of simple sequence repeats. Although it has long been clear that sexual reproduction may affect various aspects of genome evolution in general and TE evolution in particular, it does not appear to have played a major role in genome size evolution in the evening primroses. © 2015 The Author(s).
Graña-Miraglia, Lucía; Lozano, Luis F.; Velázquez, Consuelo; Volkow-Fernández, Patricia; Pérez-Oseguera, Ángeles; Cevallos, Miguel A.; Castillo-Ramírez, Santiago
2017-01-01
Genome sequencing has been useful to gain an understanding of bacterial evolution. It has been used for studying the phylogeography and/or the impact of mutation and recombination on bacterial populations. However, it has rarely been used to study gene turnover at microevolutionary scales. Here, we sequenced Mexican strains of the human pathogen Acinetobacter baumannii sampled from the same locale over a 3 year period to obtain insights into the microevolutionary dynamics of gene content variability. We found that the Mexican A. baumannii population was recently founded and has been emerging due to a rapid clonal expansion. Furthermore, we noticed that on average the Mexican strains differed from each other by over 300 genes and, notably, this gene content variation has accrued more frequently and faster than the accumulation of mutations. Moreover, due to its rapid pace, gene content variation reflects the phylogeny only at very short periods of time. Additionally, we found that the external branches of the phylogeny had almost 100 more genes than the internal branches. All in all, these results show that rapid gene turnover has been of paramount importance in producing genetic variation within this population and demonstrate the utility of genome sequencing to study alternative forms of genetic variation. PMID:28979253
Human germline and pan-cancer variomes and their distinct functional profiles
Pan, Yang; Karagiannis, Konstantinos; Zhang, Haichen; Dingerdissen, Hayley; Shamsaddini, Amirhossein; Wan, Quan; Simonyan, Vahan; Mazumder, Raja
2014-01-01
Identification of non-synonymous single nucleotide variations (nsSNVs) has exponentially increased due to advances in Next-Generation Sequencing technologies. The functional impacts of these variations have been difficult to ascertain because the corresponding knowledge about sequence functional sites is quite fragmented. It is clear that mapping of variations to sequence functional features can help us better understand the pathophysiological role of variations. In this study, we investigated the effect of nsSNVs on more than 17 common types of post-translational modification (PTM) sites, active sites and binding sites. Out of 1 705 285 distinct nsSNVs on 259 216 functional sites we identified 38 549 variations that significantly affect 10 major functional sites. Furthermore, we found distinct patterns of site disruptions due to germline and somatic nsSNVs. Pan-cancer analysis across 12 different cancer types led to the identification of 51 genes with 106 nsSNV affected functional sites found in 3 or more cancer types. 13 of the 51 genes overlap with previously identified Significantly Mutated Genes (Nature. 2013 Oct 17;502(7471)). 62 mutations in these 13 genes affecting functional sites such as DNA, ATP binding and various PTM sites occur across several cancers and can be prioritized for additional validation and investigations. PMID:25232094
An Overview of Genomic Sequence Variation Markup Language (GSVML)
Nakaya, Jun; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Kimura, Michio
2006-01-01
Internationally accumulated genomic sequence variation data on humans require an interoperable data exchange format. We developed GSVML as such a format. GSVML is human health oriented and has three categories. We analyzed use cases in the human health domain and surveyed existing databases and markup languages. Its ability to interface with the Health Level Seven Genotype Model was examined. GSVML provides a sharable platform for both clinical and research applications.
Turner, Thomas L.; Stewart, Andrew D.; Fields, Andrew T.; Rice, William R.; Tarone, Aaron M.
2011-01-01
Body size is a classic quantitative trait with evolutionarily significant variation within many species. Locating the alleles responsible for this variation would help understand the maintenance of variation in body size in particular, as well as quantitative traits in general. However, successful genome-wide association of genotype and phenotype may require very large sample sizes if alleles have low population frequencies or modest effects. As a complementary approach, we propose that population-based resequencing of experimentally evolved populations allows for considerable power to map functional variation. Here, we use this technique to investigate the genetic basis of natural variation in body size in Drosophila melanogaster. Significant differentiation of hundreds of loci in replicate selection populations supports the hypothesis that the genetic basis of body size variation is very polygenic in D. melanogaster. Significantly differentiated variants are limited to single genes at some loci, allowing precise hypotheses to be formed regarding causal polymorphisms, while other significant regions are large and contain many genes. By using significantly associated polymorphisms as a priori candidates in follow-up studies, these data are expected to provide considerable power to determine the genetic basis of natural variation in body size. PMID:21437274
Bangham, Jenny; Kim, Kang-Wook; Webster, Claire L; Jiggins, Francis M
2008-04-01
In natural populations, genetic variation affects resistance to disease. Knowing how much variation exists, and understanding the genetic architecture of this variation, is important for medicine, for agriculture, and for understanding evolutionary processes. To investigate the extent and nature of genetic variation affecting resistance to pathogens, we are studying a tractable model system: Drosophila melanogaster and its natural pathogen the vertically transmitted sigma virus. We show that considerable genetic variation affects transmission of the virus from parent to offspring. However, maternal and paternal transmission of the virus is affected by different genes. Maternal transmission is a simple Mendelian trait: most of the genetic variation is explained by a polymorphism in ref(2)P, a gene already well known to affect resistance to sigma. In contrast, there is considerable genetic variation in paternal transmission that cannot be explained by ref(2)P and is caused by other loci on chromosome 2. Furthermore, we found no genetic correlation between paternal transmission of the virus and resistance to infection by the sigma virus following injection. This suggests that different loci affect viral replication and paternal transmission.
Re-sequencing and genetic variation identification of a rice line with ideal plant architecture.
Li, Shuangcheng; Xie, Kailong; Li, Wenbo; Zou, Ting; Ren, Yun; Wang, Shiquan; Deng, Qiming; Zheng, Aiping; Zhu, Jun; Liu, Huainian; Wang, Lingxia; Ai, Peng; Gao, Fengyan; Huang, Bin; Cao, Xuemei; Li, Ping
2012-12-01
The ideal plant architecture (IPA) includes several important characteristics such as low tiller numbers, few or no unproductive tillers, more grains per panicle, and thick and sturdy stems. We have developed an indica restorer line 7302R that displays the IPA phenotype in terms of tiller number, grain number, and stem strength. However, the underlying mechanism remained to be clarified. We performed re-sequencing and genome-wide variation analysis of 7302R using the Solexa sequencing technology. With the genomic sequence of the indica cultivar 9311 as reference, 307 627 SNPs, 57 372 InDels, and 3 096 SVs were identified in the 7302R genome. The 7302R-specific variations were investigated via the synteny analysis of all the SNPs of 7302R with those of the previously sequenced non-IPA-type lines IR24, MH63, and SH527. Moreover, we found 178 168 7302R-specific SNPs across the whole genome and 30 239 SNPs in the predicted mRNA regions, among which 8 517 were non-synonymous CDS SNPs. In addition, 263 large-effect SNPs that were expected to affect the integrity of encoded proteins were identified from the 7302R-specific SNPs. SNPs of several important previously cloned rice genes were also identified by aligning the 7302R sequence with other sequenced lines. Our results provide several candidates that may account for the IPA phenotype of 7302R. These results therefore lay the groundwork for long-term efforts to uncover important genes and alleles for rice plant architecture construction, and also offer useful data resources for future genetic and genomic studies in rice.
Naturally occurring variation in tadpole morphology and performance linked to predator regime
James B. Johnson; Daniel Saenz; Cory K. Adams; Toby J. Hibbitts
2015-01-01
Divergent natural selection drives a considerable amount of the phenotypic and genetic variation observed in natural populations. For example, variation in the predator community can generate conflicting selection on behavioral, life-history, morphological, and performance traits. Differences in predator regime can subsequently increase phenotypic and genetic...
A survey of copy number variation in the porcine genome detected from whole-genome sequence
USDA-ARS's Scientific Manuscript database
An important challenge to post-genomic biology is relating observed phenotypic variation to the underlying genotypic variation. Genome-wide association studies (GWAS) have made thousands of connections between single nucleotide polymorphisms (SNPs) and phenotypes, implicating regions of the genome t...
A high-resolution cattle CNV map by population-scale genome sequencing
USDA-ARS's Scientific Manuscript database
Copy Number Variations (CNVs) are common genomic structural variations that have been linked to human diseases and phenotypic traits. CNVs represent an important type of genetic variation among cattle breeds and even individual animals; however, only low-resolution maps of cattle CNVs currently exis...
Guo, Xiao-Hui; Bi, Zhe-Guang; Wu, Bi-Hua; Wang, Zhen-Zhen; Hu, Ji-Liang; Zheng, You-Liang; Liu, Deng-Cai
2013-12-01
High-molecular-weight glutenin subunits (HMW-GSs) are of considerable interest, because they play a crucial role in determining dough viscoelastic properties and end-use quality of wheat flour. In this paper, ChAy/Bx, a novel chimeric HMW-GS gene from Triticum turgidum ssp. dicoccoides (AABB, 2n=4x=28) accession D129, was isolated and characterized. Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) analysis revealed that the electrophoretic mobility of the glutenin subunit encoded by ChAy/Bx was slightly faster than that of 1Dy12. The complete ORF of ChAy/Bx contained 1,671 bp encoding a deduced polypeptide of 555 amino acid residues (or 534 amino acid residues for the mature protein), making it the smallest HMW-GS gene known from Triticum species. Sequence analysis showed that ChAy/Bx was neither a conventional x-type nor a conventional y-type subunit gene, but a novel chimeric gene. Its first 1305 nt sequence was highly homologous with the corresponding sequence of 1Ay type genes, while its final 366 nt sequence was highly homologous with the corresponding sequence of 1Bx type genes. The mature ChAy/Bx protein consisted of the N-terminus of 1Ay type subunit (the first 414 amino acid residues) and the C-terminus of 1Bx type subunit (the final 120 amino acid residues). Secondary structure prediction showed that ChAy/Bx contained some domains of 1Ay subunit and some domains of 1Bx subunit. The special structure of this HMW glutenin chimera ChAy/Bx subunit might have unique effects on the end-use quality of wheat flour. Here we propose that homoeologous recombination might be a novel pathway for allelic variation or molecular evolution of HMW-GSs. © 2013.
Sabir, Jamal S M; Arasappan, Dhivya; Bahieldin, Ahmed; Abo-Aba, Salah; Bafeel, Sameera; Zari, Talal A; Edris, Sherif; Shokry, Ahmed M; Gadalla, Nour O; Ramadan, Ahmed M; Atef, Ahmed; Al-Kordy, Magdy A; El-Domyati, Fotoh M; Jansen, Robert K
2014-01-01
Date palm is a very important crop in western Asia and northern Africa, and it is the oldest domesticated fruit tree with archaeological records dating back 5000 years. The huge economic value of this crop has generated considerable interest in breeding programs to enhance production of dates. One of the major limitations of these efforts is the uncertainty regarding the number of date palm cultivars, which are currently based on fruit shape, size, color, and taste. Whole mitochondrial and plastid genome sequences were utilized to examine single nucleotide polymorphisms (SNPs) of date palms to evaluate the efficacy of this approach for molecular characterization of cultivars. Mitochondrial and plastid genomes of nine Saudi Arabian cultivars were sequenced. For each species about 60 million 100 bp paired-end reads were generated from total genomic DNA using the Illumina HiSeq 2000 platform. For each cultivar, sequences were aligned separately to the published date palm plastid and mitochondrial reference genomes, and SNPs were identified. The results identified cultivar-specific SNPs for eight of the nine cultivars. Two previous SNP analyses of mitochondrial and plastid genomes identified substantial intra-cultivar ( = intra-varietal) polymorphisms in organellar genomes but these studies did not properly take into account the fact that nearly half of the plastid genome has been integrated into the mitochondrial genome. Filtering all sequencing reads that mapped to both organellar genomes nearly eliminated mitochondrial heteroplasmy but all plastid SNPs remained heteroplasmic. This investigation provides valuable insights into how to deal with interorganellar DNA transfer in performing SNP analyses from total genomic DNA. The results confirm recent suggestions that plastid heteroplasmy is much more common than previously thought. 
Finally, low levels of sequence variation in plastid and mitochondrial genomes argue for using nuclear SNPs for molecular characterization of date palm cultivars.
Vallée, Geneviève C; Muñoz, Daniella Santos; Sankoff, David
2016-11-11
Of the approximately two hundred sequenced plant genomes, how many and which ones were sequenced motivated by strictly or largely scientific considerations, and how many by chiefly economic, in a wide sense, incentives? And how large a role does publication opportunity play? In an integration of multiple disparate databases and other sources of information, we collect and analyze data on the size (number of species) in the plant orders and families containing sequenced genomes, on the trade value of these species, and of all the same-family or same-order species, and on the publication priority within the family and order. These data are subjected to multiple regression and other statistical analyses. We find that despite the initial importance of model organisms, it is clearly economic considerations that outweigh others in the choice of genome to be sequenced. This has important implications for generalizations about plant genomes, since human choices of plants to harvest (and cultivate) will have incurred many biases with respect to phenotypic characteristics and hence of genomic properties, and recent genomic evolution will also have been affected by human agricultural practices.
Altmüller, Janine; Budde, Birgit S; Nürnberg, Peter
2014-02-01
Targeted re-sequencing such as gene panel sequencing (GPS) has become very popular in medical genetics, both for research projects and in diagnostic settings. The technical principles of the different enrichment methods have been reviewed several times before; however, new enrichment products are constantly entering the market, and researchers are often puzzled by the need to make decisions about long-term commitments, both for the enrichment product and the sequencing technology. This review summarizes important considerations for the experimental design and provides helpful recommendations in choosing the best sequencing strategy for various research projects and diagnostic applications.
Chae, Heejoon; Lee, Sangseon; Seo, Seokjun; Jung, Daekyoung; Chang, Hyeonsook; Nephew, Kenneth P; Kim, Sun
2016-12-01
Measuring gene expression, DNA sequence variation, and DNA methylation status is routinely done using high throughput sequencing technologies. To analyze such multi-omics data and explore relationships, reliable bioinformatics systems are much needed. Existing systems are either for exploring curated data or for processing omics data in the form of a library such as R. Thus scientists have much difficulty in investigating relationships among gene expression, DNA sequence variation, and DNA methylation using multi-omics data. In this study, we report a system called BioVLAB-mCpG-SNP-EXPRESS for the integrated analysis of DNA methylation, sequence variation (SNPs), and gene expression for distinguishing cellular phenotypes at the pairwise and multiple phenotype levels. The system can be deployed on either the Amazon cloud or a publicly available high-performance computing node, and the data analysis and exploration of the analysis result can be conveniently done using a web-based interface. To reduce analysis complexity, all processes are fully automated, and a graphical workflow system is integrated to show analysis progress in real time. The BioVLAB-mCpG-SNP-EXPRESS system works in three stages. First, it processes and analyzes multi-omics data as input in the form of the raw data, i.e., FastQ files. Second, various integrated analyses such as methylation vs. gene expression and mutation vs. methylation are performed. Finally, the analysis result can be explored in a number of ways through a web interface for multi-level, multi-perspective exploration. Multi-level interpretation can be done at the gene, gene set, pathway, or network level, and multi-perspective exploration can be carried out from the gene expression, DNA methylation, sequence variation, or relationship perspective. The utility of the system is demonstrated by analyzing a data set of 30 phenotypically distinct breast cancer cell lines. 
BioVLAB-mCpG-SNP-EXPRESS is available at http://biohealth.snu.ac.kr/software/biovlab_mcpg_snp_express/. Copyright © 2016 Elsevier Inc. All rights reserved.
Yield and Economic Responses of Peanut to Crop Rotation Sequence
USDA-ARS's Scientific Manuscript database
National Peanut Research Laboratory, Dawson, GA 39842. Proper crop rotation is essential to maintaining high peanut yield and quality. However, the economic considerations of maintaining or altering crop rotation sequences must incorporate the commodity prices, production costs, and yield responses...
Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G
2014-11-29
Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. 
In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.
Variation in tooth morphology of Gorilla gorilla.
Uchida, A
1998-01-01
Gorilla gorilla exemplifies a species that shows considerable variation in habitat, behaviour, genetic structure and morphology. This study examines variation of dental morphology in gorillas. Despite the marked size dimorphism, there are no significant shape differences between the sexes within subspecies. Differences in dental morphology, including tooth cusp proportions between the western G. g. gorilla and the eastern G. g. beringei are considerable. Although more similar to G. g. beringei than to the western G. g. gorilla, G. g. graueri also shows distinct morphological features. This indicates that the morphology of G. g. graueri is not merely intermediate, and genetic isolation between the two eastern subspecies could have had a substantial influence. Such extensive variation in dental morphology in Gorilla gorilla can be considered to be the result of an interesting combination of factors, including local dietary adaptations.
SNP discovery by high-throughput sequencing in soybean
2010-01-01
Background With the advance of new massively parallel genotyping technologies, quantitative trait loci (QTL) fine mapping and map-based cloning have become more achievable for identifying genes for important and complex traits. Development of high-density genetic markers in the QTL regions of specific mapping populations is essential for fine-mapping and map-based cloning of economically important genes. Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation existing between any diverse genotypes that are usually used for QTL mapping studies. The massively parallel sequencing technologies (Roche GS/454, Illumina GA/Solexa, and ABI/SOLiD) have been widely applied to identify genome-wide sequence variations. However, it remains unclear whether sequence data at a low sequencing depth are enough to detect the variations existing in any QTL regions of interest in a crop genome, and how to prepare sequencing samples for a complex genome such as soybean. Therefore, with the aims of identifying SNP markers in a cost-effective way for fine-mapping several QTL regions, and testing the validation rate of the putative SNPs predicted with Solexa short sequence reads at a low sequencing depth, we evaluated a pooled DNA fragment reduced representation library and SNP detection methods applied to short read sequences generated by Solexa high-throughput sequencing technology. Results A total of 39,022 putative SNPs were identified by the Illumina/Solexa sequencing system using a reduced representation DNA library of two parental lines of a mapping population. The validation rates of these putative SNPs predicted with low and high stringency were 72% and 85%, respectively. One hundred sixty-four SNP markers resulting from the validation of putative SNPs were selectively chosen to target a known QTL, thereby increasing the marker density of the targeted region to one marker per 42 kb. 
Conclusions We have demonstrated how to quickly identify large numbers of SNPs for fine mapping of QTL regions by applying massively parallel sequencing combined with genome complexity reduction techniques. This SNP discovery approach is more efficient for targeting multiple QTL regions in the same genetic population and can be applied to other crops. PMID:20701770
Kimura, Yasumasa; Soma, Takahiro; Kasahara, Naoko; Delobel, Diane; Hanami, Takeshi; Tanaka, Yuki; de Hoon, Michiel J L; Hayashizaki, Yoshihide; Usui, Kengo; Harbers, Matthias
2016-01-01
Analytical PCR experiments preferably use internal probes for monitoring the amplification reaction and specific detection of the amplicon. Such internal probes have to be designed in close context with the amplification primers, and may require additional considerations for the detection of genetic variations. Here we describe Edesign, a new online and stand-alone tool for designing sets of PCR primers together with an internal probe for conducting quantitative real-time PCR (qPCR) and genotypic experiments. Edesign can be used for selecting standard DNA oligonucleotides like for instance TaqMan probes, but has been further extended with new functions and enhanced design features for Eprobes. Eprobes, with their single thiazole orange-labelled nucleotide, allow for highly sensitive genotypic assays because of their higher DNA binding affinity as compared to standard DNA oligonucleotides. Using new thermodynamic parameters, Edesign considers unique features of Eprobes during primer and probe design for establishing qPCR experiments and genotyping by melting curve analysis. Additional functions in Edesign allow probe design for effective discrimination between wild-type sequences and genetic variations either using standard DNA oligonucleotides or Eprobes. Edesign can be freely accessed online at http://www.dnaform.com/edesign2/, and the source code is available for download.
Thermodynamic framework to assess low abundance DNA mutation detection by hybridization
Willems, Hanny; Jacobs, An; Hadiwikarta, Wahyu Wijaya; Venken, Tom; Valkenborg, Dirk; Van Roy, Nadine; Vandesompele, Jo; Hooyberghs, Jef
2017-01-01
The knowledge of genomic DNA variations in patient samples has a high and increasing value for human diagnostics in its broadest sense. Although many methods and sensors to detect or quantify these variations are available or under development, the number of underlying physico-chemical detection principles is limited. One of these principles is the hybridization of sample target DNA versus nucleic acid probes. We introduce a novel thermodynamics approach and develop a framework to exploit the specific detection capabilities of nucleic acid hybridization, using generic principles applicable to any platform. As a case study, we detect point mutations in the KRAS oncogene on a microarray platform. For the given platform and hybridization conditions, we demonstrate the multiplex detection capability of hybridization and assess the detection limit using thermodynamic considerations; DNA containing point mutations in a background of wild type sequences can be identified down to at least 1% relative concentration. In order to show the clinical relevance, the detection capabilities are confirmed on challenging formalin-fixed paraffin-embedded clinical tumor samples. This enzyme-free detection framework contains the accuracy and efficiency to screen for hundreds of mutations in a single run with many potential applications in molecular diagnostics and the field of personalised medicine. PMID:28542229
Kasahara, Naoko; Delobel, Diane; Hanami, Takeshi; Tanaka, Yuki; de Hoon, Michiel J. L.; Hayashizaki, Yoshihide; Usui, Kengo; Harbers, Matthias
2016-01-01
Analytical PCR experiments preferably use internal probes for monitoring the amplification reaction and specific detection of the amplicon. Such internal probes have to be designed in close context with the amplification primers, and may require additional considerations for the detection of genetic variations. Here we describe Edesign, a new online and stand-alone tool for designing sets of PCR primers together with an internal probe for conducting quantitative real-time PCR (qPCR) and genotypic experiments. Edesign can be used for selecting standard DNA oligonucleotides like for instance TaqMan probes, but has been further extended with new functions and enhanced design features for Eprobes. Eprobes, with their single thiazole orange-labelled nucleotide, allow for highly sensitive genotypic assays because of their higher DNA binding affinity as compared to standard DNA oligonucleotides. Using new thermodynamic parameters, Edesign considers unique features of Eprobes during primer and probe design for establishing qPCR experiments and genotyping by melting curve analysis. Additional functions in Edesign allow probe design for effective discrimination between wild-type sequences and genetic variations either using standard DNA oligonucleotides or Eprobes. Edesign can be freely accessed online at http://www.dnaform.com/edesign2/, and the source code is available for download. PMID:26863543
DYZ1 arrays show sequence variation between the monozygotic males
2014-01-01
Background Monozygotic twins (MZT) are an important resource for genetic studies in the context of normal and diseased genomes. In the present study we used DYZ1, a satellite fraction present in the form of tandem arrays on the long arm of the human Y chromosome, as a tool to uncover sequence variations between the monozygotic males. Results We detected copy number variation and frequent insertions and deletions within the sequences of DYZ1 arrays amongst all three sets of twins used in the present study. MZT1b showed loss of 35 bp compared to that in 1a, whereas 2a showed loss of 31 bp compared to that in 2b. Similarly, 3b showed a 10 bp insertion compared to that in 3a. MZT1a germline DNA showed loss of 5 bp and 1b blood DNA showed loss of 26 bp compared to that of 1a blood and 1b germline DNA, respectively. Of the 69 restriction sites detected in DYZ1 arrays, MboII, BsrI, TspEI and TaqI enzymes showed frequent loss and/or gain amongst all 3 pairs studied. The MZT1 pair showed loss/gain of VspI, BsrDI, AgsI, PleI, TspDTI, TspEI, TfiI and TaqI restriction sites in both blood and germline DNA. All three sets of MZT showed differences in the number of DYZ1 copies. FISH signals reflected somatic mosaicism of the DYZ1 copies across the cells. Conclusions DYZ1 showed both sequence and copy number variation between the MZT males. Sequence variation was also noticed between germline and blood DNA samples of the same individual, as observed in at least one set of samples. The result suggests that DYZ1 faithfully records all the genetic changes occurring after twinning, which may be ascribed to environmental factors. PMID:24495361
Chaisi, Mamohale E; Collins, Nicola E; Potgieter, Fred T; Oosthuizen, Marinda C
2013-01-16
The African buffalo (Syncerus caffer) is a natural reservoir host for both pathogenic and non-pathogenic Theileria species. These often occur naturally as mixed infections in buffalo. Although the benign and mildly pathogenic forms do not have any significant economic importance, their presence could complicate the interpretation of diagnostic test results aimed at the specific diagnosis of the pathogenic Theileria parva in cattle and buffalo in South Africa. The 18S rRNA gene has been used as the target in a quantitative real-time PCR (qPCR) assay for the detection of T. parva infections. However, the extent of sequence variation within this gene in the non-pathogenic Theileria spp. of the African buffalo is not well known. The aim of this study was, therefore, to characterise the full-length 18S rRNA genes of Theileria mutans, Theileria sp. (strain MSD) and T. velifera and to determine the possible influence of any sequence variation on the specific detection of T. parva using the 18S rRNA qPCR. The reverse line blot (RLB) hybridization assay was used to select samples which either tested positive for several different Theileria spp., or which hybridised only with the Babesia/Theileria genus-specific probe and not with any of the Babesia or Theileria species-specific probes. The full-length 18S rRNA genes from 14 samples, originating from 13 buffalo and one bovine from different localities in South Africa, were amplified and cloned, and the resulting recombinants were sequenced. Variations in the 18S rRNA gene sequences were identified in T. mutans, Theileria sp. (strain MSD) and T. velifera, with the greatest diversity observed amongst the T. mutans variants. This variation possibly explained why the RLB hybridization assay failed to detect T. mutans and T. velifera in some of the analysed samples. Copyright © 2012 Elsevier B.V. All rights reserved.
Hirose, Yusuke; Onuki, Mamiko; Tenjimbayashi, Yuri; Mori, Seiichiro; Ishii, Yoshiyuki; Takeuchi, Takamasa; Tasaka, Nobutaka; Satoh, Toyomi; Morisada, Tohru; Iwata, Takashi; Miyamoto, Shingo; Matsumoto, Koji; Sekizawa, Akihiko; Kukimoto, Iwao
2018-06-15
Persistent infection with oncogenic human papillomaviruses (HPVs) causes cervical cancer, accompanied by the accumulation of somatic mutations into the host genome. There are concomitant genetic changes in the HPV genome during viral infection; however, their relevance to cervical carcinogenesis is poorly understood. Here, we explored within-host genetic diversity of HPV by performing deep-sequencing analyses of viral whole-genome sequences in clinical specimens. The whole genomes of HPV types 16, 52, and 58 were amplified by type-specific PCR from total cellular DNA of cervical exfoliated cells collected from patients with cervical intraepithelial neoplasia (CIN) and invasive cervical cancer (ICC) and were deep sequenced. After constructing a reference viral genome sequence for each specimen, nucleotide positions showing changes with >0.5% frequencies compared to the reference sequence were determined for individual samples. In total, 1,052 positions of nucleotide variations were detected in HPV genomes from 151 samples (CIN1, n = 56; CIN2/3, n = 68; ICC, n = 27), with various numbers per sample. Overall, C-to-T and C-to-A substitutions were the dominant changes observed across all histological grades. While C-to-T transitions were predominantly detected in CIN1, their prevalence was decreased in CIN2/3 and fell below that of C-to-A transversions in ICC. Analysis of the trinucleotide context encompassing substituted bases revealed that TpCpN, a preferred target sequence for cellular APOBEC cytosine deaminases, was a primary site for C-to-T substitutions in the HPV genome. These results strongly imply that the APOBEC proteins are drivers of HPV genome mutation, particularly in CIN1 lesions. IMPORTANCE HPVs exhibit surprisingly high levels of genetic diversity, including a large repertoire of minor genomic variants in each viral genotype. 
Here, by conducting deep-sequencing analyses, we show for the first time a comprehensive snapshot of the within-host genetic diversity of high-risk HPVs during cervical carcinogenesis. Quasispecies harboring minor nucleotide variations in viral whole-genome sequences were extensively observed across different grades of CIN and cervical cancer. Among the within-host variations, C-to-T transitions, a characteristic change mediated by cellular APOBEC cytosine deaminases, were predominantly detected throughout the whole viral genome, most strikingly in low-grade CIN lesions. The results strongly suggest that within-host variations of the HPV genome are primarily generated through the interaction with host cell DNA-editing enzymes and that such within-host variability is an evolutionary source of the genetic diversity of HPVs. Copyright © 2018 American Society for Microbiology.
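The two analysis steps described in this abstract (calling positions whose non-reference frequency exceeds 0.5%, and checking whether a substituted C sits in a TpCpN context) can be illustrated with a minimal sketch. The function names and the input layout (a per-position dictionary of base counts) are assumptions made for illustration and are not taken from the authors' pipeline; the sketch also treats the genome as linear and looks only at the forward strand, which is a simplification.

```python
def minor_variants(base_counts, reference_base, min_freq=0.005):
    """Return {base: frequency} for non-reference bases above min_freq.

    base_counts is a hypothetical per-position tally such as
    {"C": 990, "T": 10}; min_freq=0.005 mirrors the >0.5% cutoff.
    """
    depth = sum(base_counts.values())
    if depth == 0:
        return {}
    return {
        base: count / depth
        for base, count in base_counts.items()
        if base != reference_base and count / depth > min_freq
    }


def trinucleotide_context(reference, index):
    """Return the 3-base context centred on a 0-based index.

    Returns None at either end of the sequence (linear-genome assumption).
    """
    if index < 1 or index > len(reference) - 2:
        return None
    return reference[index - 1:index + 2]


def is_tpcpn(context):
    """True when the context is T, then the substituted C, then any base."""
    return context is not None and context[0] == "T" and context[1] == "C"


if __name__ == "__main__":
    reference = "ATCGA"  # toy sequence, not HPV data
    variants = minor_variants({"C": 990, "T": 10}, reference_base="C")
    context = trinucleotide_context(reference, 2)
    print(variants, context, is_tpcpn(context))
```

A fuller treatment would also count G-to-A changes preceded by a complementary context, since a C-to-T change on the reverse strand appears as G-to-A on the forward strand.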
Sequencing thousands of single-cell genomes with combinatorial indexing.
Vitak, Sarah A; Torkenczy, Kristof A; Rosenkrantz, Jimi L; Fields, Andrew J; Christiansen, Lena; Wong, Melissa H; Carbone, Lucia; Steemers, Frank J; Adey, Andrew
2017-03-01
Single-cell genome sequencing has proven valuable for the detection of somatic variation, particularly in the context of tumor evolution. Current technologies suffer from high library construction costs, which restrict the number of cells that can be assessed and thus impose limitations on the ability to measure heterogeneity within a tissue. Here, we present single-cell combinatorial indexed sequencing (SCI-seq) as a means of simultaneously generating thousands of low-pass single-cell libraries for detection of somatic copy-number variants. We constructed libraries for 16,698 single cells from a combination of cultured cell lines, primate frontal cortex tissue and two human adenocarcinomas, and obtained a detailed assessment of subclonal variation within a pancreatic tumor.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moore, B; Yin, F; Cai, J
Purpose: To determine the variation in tumor contrast between different MRI sequences and between patients for the purpose of MRI-based treatment planning. Methods: Multiple MRI scans of 11 patients with cancer(s) in the liver were included in this IRB-approved study. Imaging sequences consisted of T1W MRI, Contrast-Enhanced T1W MRI, T2W MRI, and T2*/T1W MRI. MRI images were acquired on a 1.5T GE Signa scanner with a four-channel torso coil. We calculated the tumor-to-tissue contrast-to-noise ratio (CNR) for each MR sequence by contouring the tumor and a region of interest (ROI) in a homogeneous region of the liver using the Eclipse treatment planning software. CNR was calculated as (I-Tum - I-ROI)/SD-ROI, where I-Tum and I-ROI are the mean values of the tumor and the ROI, respectively, and SD-ROI is the standard deviation of the ROI. The same tumor and ROI structures were used in all measurements for different MR sequences. The inter-patient coefficient of variation (CV) and the inter-sequence CV were determined. In addition, the mean and standard deviation of CNR were calculated and compared between different MR sequences. Results: Our preliminary results showed large inter-patient CV (range: 37.7% to 88%) and inter-sequence CV (range: 5.3% to 104.9%) of liver tumor CNR, indicating great variations in tumor CNR between MR sequences and between patients. Tumor CNR was found to be largest in CE-T1W (8.5±7.5), followed by T2W (4.2±2.4), T1W (3.4±2.2), and T2*/T1W (1.7±0.6) MR scans. The inter-patient CV of tumor CNR was also the largest in CE-T1W (88%), followed by T1W (64.3%), T2W (56.2%), and T2*/T1W (37.7%) MR scans. Conclusion: Large inter-sequence and inter-patient variations were observed in liver tumor CNR. CE-T1W MR images on average provided the best tumor CNR. Efforts are needed to optimize tumor contrast and its consistency for MRI-based treatment planning of cancer in the liver. This project is supported by NIH grant: 1R21CA165384.
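The CNR and CV definitions in the Methods can be written out as a short sketch. The function names and the toy intensity lists below are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, stdev


def contrast_to_noise(tumor_pixels, roi_pixels):
    """CNR = (mean tumor intensity - mean ROI intensity) / SD of the ROI."""
    return (mean(tumor_pixels) - mean(roi_pixels)) / stdev(roi_pixels)


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


if __name__ == "__main__":
    # Toy intensities, not patient data.
    tumor = [120, 130, 140]
    liver_roi = [98, 100, 102]
    print(contrast_to_noise(tumor, liver_roi))

    # Hypothetical per-patient CNR values for one MR sequence.
    print(coefficient_of_variation([2.0, 4.0, 6.0]))
```

Applying `coefficient_of_variation` across patients for a fixed sequence would give an inter-patient CV, and applying it across sequences for a fixed patient would give an inter-sequence CV, under the assumptions stated above.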