Non-coding RNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer.
Wojcik, Sylwia E; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z; Rai, Kanti R; Kipps, Thomas J; Keating, Michael J; Croce, Carlo M; Calin, George A
2010-02-01
Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNA for sequence variations in 148 microRNA (miRNA) and ultraconserved region (UCR) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger sequencing and then sought to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases and mutations of UCRs in CLLs and CRCs, and in certain instances we detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to investigate, on a genome-wide scale, the frequency of genetic variations in ncRNAs and their functional meaning, and to develop new diagnostic and prognostic markers for leukemias and carcinomas.
Non-coding RNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer
Wojcik, Sylwia E.; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S.; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z.; Rai, Kanti R.; Kipps, Thomas J.; Keating, Michael J.
2010-01-01
Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNA for sequence variations in 148 microRNA (miRNA) and ultraconserved region (UCR) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger sequencing and then sought to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases and mutations of UCRs in CLLs and CRCs, and in certain instances we detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to investigate, on a genome-wide scale, the frequency of genetic variations in ncRNAs and their functional meaning, and to develop new diagnostic and prognostic markers for leukemias and carcinomas. PMID:19926640
Setoh, Yin Xiang; Amarilla, Alberto A; Peng, Nias Y; Slonchak, Andrii; Periasamy, Parthiban; Figueiredo, Luiz T M; Aquino, Victor H; Khromykh, Alexander A
2018-01-01
Rocio virus (ROCV) is an arbovirus belonging to the genus Flavivirus, family Flaviviridae. We present an updated sequence of ROCV strain SPH 34675 (GenBank: AY632542.4), the only available full genome sequence prior to this study. Using next-generation sequencing of the entire genome, we reveal substantial sequence variation from the prototype sequence, with 30 nucleotide differences amounting to 14 amino acid changes, as well as significant changes to predicted 3'UTR RNA structures. Our results present an updated and corrected sequence of a potential emerging human-virulent flavivirus uniquely indigenous to Brazil (GenBank: MF461639).
Shakhssalim, Nasser; Houshmand, Massoud; Kamalidehghan, Behnam; Faraji, Abolfazl; Sarhangnejad, Reza; Dadgar, Sepideh; Mobaraki, Maryam; Rosli, Rozita; Sanati, Mohammad Hossein
2013-12-05
Bladder cancer is a relatively common and potentially life-threatening neoplasm that ranks ninth in terms of worldwide cancer incidence. The aim of this study was to determine deletions and sequence variations in the mitochondrial displacement loop (D-loop) region in blood specimens and tumoral tissues of patients with bladder cancer, compared to adjacent non-tumoral tissues. DNA from blood, tumoral tissues and adjacent non-tumoral tissues of twenty-six patients with bladder cancer and DNA from blood of 504 healthy controls from different ethnicities were investigated to determine sequence variation in the mitochondrial D-loop region using multiplex polymerase chain reaction (PCR), DNA sequencing and Southern blotting analysis. From a total of 110 variations, 48 were reported as new mutations. No deletions were detected in tumoral tissues, adjacent non-tumoral tissues or blood samples from patients. Although the polymorphisms at loci 16189, 16261 and 16311 were not significantly correlated with bladder cancer, the C16069T variation was significantly more frequent in patient samples than in control samples (p < 0.05). Interestingly, there was no significant difference (p > 0.05) in C variations, including C7TC6, C8TC6, C9TC6 and C10TC6, in D310 mitochondrial DNA between patient and control samples. Our study suggests that 16069 mitochondrial DNA D-loop mutations may play a significant role in the etiology of bladder cancer and facilitate the definition of carcinogenesis-related mutations in human cancer.
HIV-1 sequence variation between isolates from mother-infant transmission pairs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wike, C.M.; Daniels, M.R.; Furtado, M.
1991-12-31
To examine the sequence diversity of human immunodeficiency virus type 1 (HIV-1) between known transmission sets, sequences from the V3 and V4-V5 regions of the env gene from 4 mother-infant pairs were analyzed. The mean interpatient sequence variation between isolates from linked mother-infant pairs was comparable to the sequence diversity found between isolates from other close contacts. The mean intrapatient variation was significantly less in the infants' isolates than in the isolates from both their mothers and other characterized intrapatient sequence sets. In addition, a distinct and characteristic difference in the glycosylation pattern preceding the V3 loop was found between each linked transmission pair. These findings indicate that selection of specific genotypic variants, which may play a role in some direct transmission sets, and the duration of infection are important factors in the degree of diversity seen between the sequence sets.
Genetic variation patterns of American chestnut populations at EST-SSRs
Oliver Gailing; C. Dana Nelson
2017-01-01
The objective of this study is to analyze patterns of genetic variation at genic expressed sequence tag - simple sequence repeats (EST-SSRs) and at chloroplast DNA markers in populations of American chestnut (Castanea dentata Borkh.) to assist in conservation and breeding efforts. Allelic diversity at EST-SSRs decreased significantly from southwest to northeast along...
Mikaeili, F; Mirhendi, H; Mohebali, M; Hosseini, M; Sharbatkhori, M; Zarei, Z; Kia, E B
2015-07-01
The study was conducted to determine the sequence variation in two mitochondrial genes, namely cytochrome c oxidase 1 (pcox1) and NADH dehydrogenase 1 (pnad1) within and among isolates of Toxocara cati, Toxocara canis and Toxascaris leonina. Genomic DNA was extracted from 32 isolates of T. cati, 9 isolates of T. canis and 19 isolates of T. leonina collected from cats and dogs in different geographical areas of Iran. Mitochondrial genes were amplified by polymerase chain reaction (PCR) and sequenced. Sequence data were aligned using the BioEdit software and compared with published sequences in GenBank. Phylogenetic analysis was performed using Bayesian inference and maximum likelihood methods. Based on pairwise comparison, intra-species genetic diversity within Iranian isolates of T. cati, T. canis and T. leonina amounted to 0-2.3%, 0-1.3% and 0-1.0% for pcox1 and 0-2.0%, 0-1.7% and 0-2.6% for pnad1, respectively. Inter-species sequence variation among the three ascaridoid nematodes was significantly higher, being 9.5-16.6% for pcox1 and 11.9-26.7% for pnad1. Sequence and phylogenetic analysis of the pcox1 and pnad1 genes indicated that there is significant genetic diversity within and among isolates of T. cati, T. canis and T. leonina from different areas of Iran, and these genes can be used for studying genetic variation of ascaridoid nematodes.
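The pairwise comparison described above can be illustrated with a minimal sketch. It assumes the percentages are uncorrected proportions of differing sites (p-distances) between aligned sequences; the abstract does not state which distance measure was used, and the function names `p_distance` and `divergence_range` are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gap columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def divergence_range(aligned_seqs):
    """Return (min, max) percent divergence over all sequence pairs."""
    values = [100 * p_distance(a, b) for a, b in combinations(aligned_seqs, 2)]
    return min(values), max(values)
```

Applied to the aligned sequences of one species, `divergence_range` gives an intra-species range; applied to pairs drawn from different species, it gives the inter-species range.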
Complex multifractal nature in Mycobacterium tuberculosis genome
Mandal, Saurav; Roychowdhury, Tanmoy; Chirom, Keilash; Bhattacharya, Alok; Brojen Singh, R. K.
2017-01-01
The multifractal and long-range correlation (C(r)) properties of strings, such as nucleotide sequences, can be useful parameters for identifying underlying patterns and variations. In this study, C(r) and the multifractal singularity function f(α) were used to study variations in the genomes of the pathogenic bacterium Mycobacterium tuberculosis. Genomic sequences of M. tuberculosis isolates displayed significant variations in C(r) and f(α), reflecting inherent differences in sequences among isolates. M. tuberculosis isolates can be categorised into subgroups based on sensitivity to drugs: DS (drug sensitive isolates), MDR (multi-drug resistant isolates) and XDR (extremely drug resistant isolates). C(r) follows significantly different scaling rules in different subgroups of isolates, but all the isolates follow a one-parameter scaling law. The richness in complexity of each subgroup can be quantified by multifractal parameters, which display a pattern in which XDR isolates have the highest values and drug sensitive isolates the lowest. Therefore, C(r) and multifractal functions can be useful parameters for the analysis of genomic sequences. PMID:28440326
Complex multifractal nature in Mycobacterium tuberculosis genome
NASA Astrophysics Data System (ADS)
Mandal, Saurav; Roychowdhury, Tanmoy; Chirom, Keilash; Bhattacharya, Alok; Brojen Singh, R. K.
2017-04-01
The multifractal and long-range correlation (C(r)) properties of strings, such as nucleotide sequences, can be useful parameters for identifying underlying patterns and variations. In this study, C(r) and the multifractal singularity function f(α) were used to study variations in the genomes of the pathogenic bacterium Mycobacterium tuberculosis. Genomic sequences of M. tuberculosis isolates displayed significant variations in C(r) and f(α), reflecting inherent differences in sequences among isolates. M. tuberculosis isolates can be categorised into subgroups based on sensitivity to drugs: DS (drug sensitive isolates), MDR (multi-drug resistant isolates) and XDR (extremely drug resistant isolates). C(r) follows significantly different scaling rules in different subgroups of isolates, but all the isolates follow a one-parameter scaling law. The richness in complexity of each subgroup can be quantified by multifractal parameters, which display a pattern in which XDR isolates have the highest values and drug sensitive isolates the lowest. Therefore, C(r) and multifractal functions can be useful parameters for the analysis of genomic sequences.
NASA Astrophysics Data System (ADS)
Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.
2016-06-01
Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.
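The definition of proteogenomics given above (using nucleotide sequences to generate candidate protein sequences for database searching) can be sketched as follows. This is only an illustration under simplifying assumptions: the input is a forward-strand coding sequence in frame, variants are single-nucleotide substitutions at 0-based positions, and the standard genetic code applies. The names `translate`, `apply_snv` and `candidate_entries` are hypothetical and do not come from any tool covered by the review.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")  # X = unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_snv(cds: str, pos: int, alt: str) -> str:
    """Return the coding sequence with the base at 0-based `pos` replaced by `alt`."""
    return cds[:pos] + alt + cds[pos + 1:]


def candidate_entries(name: str, cds: str, snvs):
    """Build database entries: the reference protein plus one entry per
    substitution that changes the translated protein."""
    entries = {name: translate(cds)}
    for pos, alt in snvs:
        variant = translate(apply_snv(cds, pos, alt))
        if variant != entries[name]:  # skip synonymous changes
            entries[f"{name}_{cds[pos]}{pos + 1}{alt}"] = variant
    return entries
```

Only substitutions that alter the protein add entries, which keeps the hypothetical search database from growing with synonymous variants.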
Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.
2016-01-01
Mass spectrometry–based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications. PMID:27049631
Sequence-length variation of mtDNA HVS-I C-stretch in Chinese ethnic groups.
Chen, Feng; Dang, Yong-hui; Yan, Chun-xia; Liu, Yan-ling; Deng, Ya-jun; Fulton, David J R; Chen, Teng
2009-10-01
The purpose of this study was to investigate mitochondrial DNA (mtDNA) hypervariable segment-I (HVS-I) C-stretch variations and explore the significance of these variations in forensic and population genetics studies. The C-stretch sequence variation was studied in 919 unrelated individuals from 8 Chinese ethnic groups using both direct and clone sequencing approaches. Thirty-eight C-stretch haplotypes were identified, and some novel and population-specific haplotypes were also detected. The C-stretch genetic diversity (GD) values were relatively high, and probability (P) values were low. Additionally, C-stretch length heteroplasmy was observed in approximately 9% of individuals studied. There was a significant correlation (r=-0.961, P<0.01) between the expansion of the cytosine sequence length in the C-stretch of HVS-I and a reduction in the number of upstream adenines. These results indicate that the C-stretch could be a useful genetic marker in forensic identification of Chinese populations. The results from the Fst and dA genetic distance matrix, neighbor-joining tree, and principal component map also suggest that the C-stretch could be used as a reliable genetic marker in population genetics.
Ekanayake, Saliya; Ruan, Yang; Schütte, Ursel M. E.; Kaonongbua, Wittaya; Fox, Geoffrey; Ye, Yuzhen; Bever, James D.
2016-01-01
ABSTRACT Arbuscular mycorrhizal (AM) fungi form mutualisms with plant roots that increase plant growth and shape plant communities. Each AM fungal cell contains a large amount of genetic diversity, but it is unclear if this diversity varies across evolutionary lineages. We found that sequence variation in the nuclear large-subunit (LSU) rRNA gene from 29 isolates representing 21 AM fungal species generally assorted into genus- and species-level clades, with the exception of species of the genera Claroideoglomus and Entrophospora. However, there were significant differences in the levels of sequence variation across the phylogeny and between genera, indicating that it is an evolutionarily constrained trait in AM fungi. These consistent patterns of sequence variation across both phylogenetic and taxonomic groups pose challenges to interpreting operational taxonomic units (OTUs) as approximations of species-level groups of AM fungi. We demonstrate that the OTUs produced by five sequence clustering methods using 97% or equivalent sequence similarity thresholds failed to match the expected species of AM fungi, although OTUs from AbundantOTU, CD-HIT-OTU, and CROP corresponded better to species than did OTUs from mothur or UPARSE. This lack of OTU-to-species correspondence resulted both from sequences of one species being split into multiple OTUs and from sequences of multiple species being lumped into the same OTU. The OTU richness therefore will not reliably correspond to the AM fungal species richness in environmental samples. Conservatively, this error can overestimate species richness by 4-fold or underestimate richness by one-half, and the direction of this error will depend on the genera represented in the sample. IMPORTANCE Arbuscular mycorrhizal (AM) fungi form important mutualisms with the roots of most plant species. Individual AM fungi are genetically diverse, but it is unclear whether the level of this diversity differs among evolutionary lineages. 
We found that the amount of sequence variation in an rRNA gene that is commonly used to identify AM fungal species varied significantly between evolutionary groups that correspond to different genera, with the exception of two genera that are genetically indistinguishable from each other. When we clustered groups of similar sequences into operational taxonomic units (OTUs) using five different clustering methods, these patterns of sequence variation caused the number of OTUs to either over- or underestimate the actual number of AM fungal species, depending on the genus. Our results indicate that OTU-based inferences about AM fungal species composition from environmental sequences can be improved if they take these taxonomically structured patterns of sequence variation into account. PMID:27260357
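To make concrete how a fixed similarity threshold can split one species or lump several, here is a minimal sketch of greedy centroid-based clustering at 97% identity. It is a generic illustration, not the algorithm of AbundantOTU, CD-HIT-OTU, CROP, mothur or UPARSE, and it assumes pre-aligned, non-empty sequences of equal length; `identity` and `greedy_otus` are hypothetical names.

```python
def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def greedy_otus(sequences, threshold: float = 0.97):
    """Assign each sequence to the first OTU whose centroid it matches at or
    above `threshold`; otherwise start a new OTU with it as centroid."""
    centroids = []
    clusters = []
    for seq in sequences:
        for idx, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[idx].append(seq)
                break
        else:
            # No existing centroid is similar enough: open a new OTU.
            centroids.append(seq)
            clusters.append([seq])
    return clusters
```

Under this scheme a species whose sequences differ by more than 3% is split across OTUs, while two species differing by less than 3% share one OTU, which is the mismatch the abstract describes.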
Genetic and Epigenetic Variations Induced by Wheat-Rye 2R and 5R Monosomic Addition Lines
Fu, Shulan; Sun, Chuanfei; Yang, Manyu; Fei, Yunyan; Tan, Feiqun; Yan, Benju; Ren, Zhenglong; Tang, Zongxiang
2013-01-01
Background Monosomic alien addition lines (MAALs) can easily induce structural variation of chromosomes and have been used in crop breeding; however, it is unclear whether MAALs will induce drastic genetic and epigenetic alterations. Methodology/Principal Findings In the present study, wheat-rye 2R and 5R MAALs together with their selfed progeny and parental common wheat were investigated through amplified fragment length polymorphism (AFLP) and methylation-sensitive amplification polymorphism (MSAP) analyses. The MAALs in different generations displayed different genetic variations. Some progeny that only contained 42 wheat chromosomes showed great genetic/epigenetic alterations. Cryptic rye chromatin has introgressed into the wheat genome. However, one of the progeny that contained cryptic rye chromatin did not display outstanding genetic/epigenetic variation. 78 and 49 sequences were cloned from changed AFLP and MSAP bands, respectively. Blastn search indicated that almost half of them showed no significant similarity to known sequences. Retrotransposons were mainly involved in genetic and epigenetic variations. Genetic variations basically affected Gypsy-like retrotransposons, whereas epigenetic alterations affected Copia-like and Gypsy-like retrotransposons equally. Genetic and epigenetic variations seldom affected low-copy coding DNA sequences. Conclusions/Significance The results in the present study provided direct evidence to illustrate that monosomic wheat-rye addition lines could induce different and drastic genetic/epigenetic variations and these variations might not be caused by introgression of rye chromatins into wheat. Therefore, MAALs may be directly used as an effective means to broaden the genetic diversity of common wheat. PMID:23342073
The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes.
Adams, David J; Doran, Anthony G; Lilue, Jingtao; Keane, Thomas M
2015-10-01
The Mouse Genomes Project was initiated in 2009 with the goal of using next-generation sequencing technologies to catalogue molecular variation in the common laboratory mouse strains and a selected set of wild-derived inbred strains. The initial sequencing and survey of sequence variation in 17 inbred strains was completed in 2011 and included a comprehensive catalogue of single nucleotide polymorphisms, short insertions/deletions, larger structural variants including their fine-scale architecture, the landscape of transposable element variation, and genomic sites subject to post-transcriptional alteration of RNA. From this beginning, the resource has expanded significantly to include 36 fully sequenced inbred laboratory mouse strains, a refined and updated data processing pipeline, and new variation querying and data visualisation tools, which are available on the project's website (http://www.sanger.ac.uk/resources/mouse/genomes/). The focus of the project is now the completion of de novo assembled chromosome sequences and strain-specific gene structures for the core strains. We discuss how the assembled chromosomes will power comparative analysis, data access tools and future directions of mouse genetics.
Fornage, Myriam; Mosley, Thomas H; Jack, Clifford R; de Andrade, Mariza; Kardia, Sharon L R; Boerwinkle, Eric; Turner, Stephen T
2007-01-01
Susceptibility to ischemic damage to the subcortical white matter of the brain has a strong genetic basis. Dysregulation of matrix metalloproteinases (MMPs) contributes to loss of cerebrovascular integrity and white matter injury. We investigated whether sequence variation in the genes encoding MMP3 and MMP9 is associated with variation in leukoaraiosis volume, determined by magnetic resonance imaging, in non-Hispanic whites and African-Americans using family-based association tests. Seven hundred and fifty-six white and 671 African-American individuals from sibships ascertained through two or more siblings with hypertension were genotyped for 7 and 8 haplotype-tagging polymorphisms in the MMP3 and MMP9 genes, respectively. MMP3 sequence variation was significantly associated with variation in leukoaraiosis volume in Whites. Two common haplotypes with opposing relationships to leukoaraiosis volume were identified. MMP9 sequence variation was also significantly associated with variation in leukoaraiosis volume in both African-Americans and Whites. Different haplotypes contributed to these associations in the two racial groups. These findings add to the growing body of evidence from animal models and human clinical studies suggesting a role of MMPs in ischemic white matter injury. They provide the basis for further investigation of the role of these genes in susceptibility and/or progression to clinical disease.
Dissecting the relationship between protein structure and sequence variation
NASA Astrophysics Data System (ADS)
Shahmoradi, Amir; Wilke, Claus; Wilke Lab Team
2015-03-01
Over the past decade several independent works have shown that some structural properties of proteins are capable of predicting protein evolution. The strength and significance of these structure-sequence relations, however, appear to vary widely among different proteins, with absolute correlation strengths ranging from 0.1 to 0.8. Here we present the results from a comprehensive search for the potential biophysical and structural determinants of protein evolution by studying more than 200 structural and evolutionary properties in a dataset of 209 monomeric enzymes. We discuss the main protein characteristics responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~10-30% of observed variation in sequence-structure relations. In addition to sequence divergence, we identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous backbone hydrogen bond energies, large fractions of helical secondary structures and low fractions of beta sheets tend to have the strongest sequence-structure relation. BEACON-NSF center for the study of evolution in action.
Reicher, S; Seroussi, E; Weller, J I; Rosov, A; Gootwine, E
2012-07-01
Polymorphisms in mitochondrial DNA (mtDNA) protein- and tRNA-coding genes were shown to be associated with various diseases in humans as well as with production and reproduction traits in livestock. Alignment of full-length mitochondrial sequences from the 5 known ovine haplogroups: HA (n = 3), HB (n = 5), HC (n = 3), HD (n = 2), and HE (n = 2; GenBank accession nos. HE577847-50 and 11 published complete ovine mitochondria sequences) revealed sequence variation in 10 out of the 13 protein-coding mtDNA sequences. Twenty-six of the 245 variable sites found in the protein-coding sequences represent non-synonymous mutations. Sequence variation was also observed in 8 out of the 22 tRNA mtDNA sequences. On the basis of the mtDNA control region and cytochrome b partial sequences along with information on maternal lineages within an Afec-Assaf flock, 1,126 Afec-Assaf ewes were assigned to mitochondrial haplogroups HA, HB, and HC, with frequencies of 0.43, 0.43, and 0.14, respectively. Analysis of birth weight and growth rate records of lambs (n = 1,286) and productivity from 4,993 lambing records revealed no association between mitochondrial haplogroup affiliation and female longevity, lamb perinatal survival rate, birth weight, or daily growth rate of lambs up to 150 d, which averaged 1,664 d, 88.3%, 4.5 kg, and 320 g/d, respectively. However, significant (P < 0.0001) differences among the haplogroups were found for prolificacy of ewes, with prolificacies (mean ± SE) of 2.14 ± 0.04, 2.25 ± 0.04, and 2.30 ± 0.06 lambs born/ewe lambing for the HA, HB, and HC haplogroups, respectively. Our results highlight the ovine mitogenome genetic variation in protein- and tRNA-coding genes and suggest that sequence variation in ovine mtDNA is associated with variation in ewe prolificacy.
Polymorphism in the Eruption Sequence of Primary Dentition: A Cross-sectional Study
Bhojraj, Nandlal; Narayanappa
2017-01-01
Introduction Primary teeth have shown wide variations in their eruption time among different populations. Population-specific eruption ages are provided as means with standard deviations or median ages with their percentile ranges. These alone are insufficient for predicting tooth eruption sequence because they provide no information on the frequency of sequence variation within pairs of teeth. Norms of polymorphic variation in the eruption sequence can be more useful. Aim This study aims at providing norms for the sequence polymorphism in primary teeth among the children of the Mysore population. Materials and Methods A cross-sectional study was designed with 1392 children, recruited from December 2015 to June 2016 by simple random sampling. Each tooth was recorded as present or absent. Across all possible intra-quadrant tooth pairs, cases of present-present, absent-absent, present-absent and absent-present were counted and expressed as percentages. Results Sequence polymorphisms were more common in 82-84 pairs of teeth. Significant polymorphic reverse sequence was observed in 52-54 (9%) and 82-84 (35%) in males and in 82-84 (18%) in females. There was no polymorphism in the maxillary arch in females. Conclusion The present study provides baseline data values for sequence variation in primary teeth eruption. To the best of the investigators' knowledge, there are no previous studies describing sequence polymorphism in primary teeth in the Indian population. The results of this study help in the assessment of eruption sequence problems in paediatric dentistry and in the evaluation and prediction of tooth eruption sequence in the individual child. PMID:28658912
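The pair-wise tabulation described in the Materials and Methods can be sketched as below. This is an illustrative assumption about the bookkeeping, not the authors' procedure: each child is represented as a set of erupted tooth codes (two-digit codes such as 82 and 84, as used in the abstract), and `pair_percentages` is a hypothetical helper name.

```python
from collections import Counter

CATEGORIES = ("present-present", "present-absent", "absent-present", "absent-absent")


def pair_percentages(records, first_tooth, second_tooth):
    """Percentage of children in each eruption-status category for one tooth pair.

    `records` is an iterable of sets, each holding the tooth codes erupted
    in one child. The category label lists the first tooth's status, then
    the second tooth's status.
    """
    counts = Counter()
    for erupted in records:
        first = "present" if first_tooth in erupted else "absent"
        second = "present" if second_tooth in erupted else "absent"
        counts[f"{first}-{second}"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records supplied")
    return {category: 100 * counts[category] / total for category in CATEGORIES}
```

For example, `pair_percentages(records, 82, 84)` tabulates the 82-84 pair; the two discordant categories show how often one tooth of the pair erupts without the other.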
Camats, Núria; Fernández-Cancio, Mónica; Carrascosa, Antonio; Andaluz, Pilar; Albisu, M Ángeles; Clemente, María; Gussinyé, Miquel; Yeste, Diego; Audí, Laura
2012-10-01
Molecular causes of isolated severe growth hormone deficiency (ISGHD) in several genes have been established. The aim of this study was to analyse the contribution of growth hormone-releasing hormone receptor (GHRHR) gene sequence variation to GH deficiency in a series of prepubertal ISGHD patients and to normal adult height. A systematic GHRHR gene sequence analysis was performed in 69 ISGHD patients and 60 normal adult height controls (NAHC). Four GHRHR single-nucleotide polymorphisms (SNPs) were genotyped in 248 additional NAHC. An analysis was performed on individual SNPs and combined genotype associations with diagnosis in ISGHD patients and with height-SDS in NAHC. Twenty-one SNPs were found. P3, P13, P15 and P20 had not been previously described. Patients and controls shared 12 SNPs (P1, P2, P4-P11, P16 and P21). Significantly different frequencies of the heterozygous genotype and alternate allele were detected in P9 (exon 4, rs4988498) and P12 (intron 6, rs35609199); P9 heterozygous genotype frequencies were similar in patients and the shortest control group (heights between -2 and -1 SDS) and significantly different in controls (heights between -1 and +2 SDS). GHRHR P9 together with 4 GH1 SNP genotypes contributed to 6·2% of height-SDS variation in the entire 308 NAHC. This study established the GHRHR gene sequence variation map in ISGHD patients and NAHC. No evidence of GHRHR mutation contribution to ISGHD was found in this population, although P9 and P12 SNP frequencies were significantly different between ISGHD and NAHC. Thus, the gene sequence may contribute to normal adult height, as demonstrated in NAHC. © 2012 Blackwell Publishing Ltd.
Swallow Event Sequencing: Comparing Healthy Older and Younger Adults.
Herzberg, Erica G; Lazarus, Cathy L; Steele, Catriona M; Molfenter, Sonja M
2018-04-23
Previous research has established that a great deal of variation exists in the temporal sequence of swallowing events for healthy adults. Yet, the impact of aging on swallow event sequence is not well understood. Kendall et al. (Dysphagia 18(2):85-91, 2003) suggested there are 4 obligatory paired-event sequences in swallowing. We directly compared adherence to these sequences, as well as event latencies, and quantified the percentage of unique sequences in two samples of healthy adults: young (< 45) and old (> 65). The 8 swallowing events that contribute to the sequences were reliably identified from videofluoroscopy in a sample of 23 healthy seniors (10 male, mean age 74.7) and 20 healthy young adults (10 male, mean age 31.5) with no evidence of penetration-aspiration or post-swallow residue. Chi-square analyses compared the proportions of obligatory pairs and unique sequences by age group. Compared to the older subjects, younger subjects had significantly lower adherence to two obligatory sequences: Upper Esophageal Sphincter (UES) opening occurs before (or simultaneous with) the bolus arriving at the UES and UES maximum distention occurs before maximum pharyngeal constriction. The associated latencies were significantly different between age groups as well. Further, significantly fewer unique swallow sequences were observed in the older group (61%) compared with the young (82%) (χ² = 31.8; p < 0.001). Our findings suggest that paired swallow event sequences may not be robust across the age continuum and that variation in swallow sequences appears to decrease with aging. These findings provide normative references for comparisons to older individuals with dysphagia.
Human germline and pan-cancer variomes and their distinct functional profiles
Pan, Yang; Karagiannis, Konstantinos; Zhang, Haichen; Dingerdissen, Hayley; Shamsaddini, Amirhossein; Wan, Quan; Simonyan, Vahan; Mazumder, Raja
2014-01-01
Identification of non-synonymous single nucleotide variations (nsSNVs) has exponentially increased due to advances in Next-Generation Sequencing technologies. The functional impacts of these variations have been difficult to ascertain because the corresponding knowledge about sequence functional sites is quite fragmented. It is clear that mapping of variations to sequence functional features can help us better understand the pathophysiological role of variations. In this study, we investigated the effect of nsSNVs on more than 17 common types of post-translational modification (PTM) sites, active sites and binding sites. Out of 1 705 285 distinct nsSNVs on 259 216 functional sites we identified 38 549 variations that significantly affect 10 major functional sites. Furthermore, we found distinct patterns of site disruptions due to germline and somatic nsSNVs. Pan-cancer analysis across 12 different cancer types led to the identification of 51 genes with 106 nsSNV affected functional sites found in 3 or more cancer types. 13 of the 51 genes overlap with previously identified Significantly Mutated Genes (Nature. 2013 Oct 17;502(7471)). 62 mutations in these 13 genes affecting functional sites such as DNA, ATP binding and various PTM sites occur across several cancers and can be prioritized for additional validation and investigations. PMID:25232094
VaDiR: an integrated approach to Variant Detection in RNA.
Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy
2018-02-01
Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered, but the associated cost currently limits its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of mutations observed in non-coding regions or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA (VaDiR) that integrates 3 variant callers, namely SNPiR, RVBoost, and MuTect2. The combination of all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
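The idea of combining several callers into a high-precision tier can be sketched as a set intersection. This is a minimal illustration and not the VaDiR implementation: the variant key `(chrom, pos, ref, alt)`, the helper name `consensus`, and the toy call sets are all assumptions made for the example, and reading the higher-recall set as "calls supported by both MuTect2 and SNPiR" is one possible interpretation of the abstract.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed variant key: (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus(calls: Dict[str, Set[Variant]], callers: Iterable[str]) -> Set[Variant]:
    """Return the variants reported by every caller named in `callers`."""
    sets = [calls[name] for name in callers]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical toy call sets, invented purely for illustration.
calls = {
    "SNPiR": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")},
    "RVBoost": {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A"), ("chr4", 400, "T", "C")},
    "MuTect2": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr4", 400, "T", "C")},
}

# Strict tier: a variant must be reported by all three callers.
tier1 = consensus(calls, ["SNPiR", "RVBoost", "MuTect2"])

# Relaxed set (assumed interpretation): agreement of two callers only.
relaxed = consensus(calls, ["MuTect2", "SNPiR"])
```

Because every variant in the strict tier is also in the relaxed set, moving from the first to the second can only add calls, which is consistent with trading some precision for recall.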
Iso-Touru, T; Sahana, G; Guldbrandtsen, B; Lund, M S; Vilkki, J
2016-03-22
The Nordic Red Cattle consisting of three different populations from Finland, Sweden and Denmark are under a joint breeding value estimation system. The long history of recording of production and health traits offers a great opportunity to study production traits and identify causal variants behind them. In this study, we used whole genome sequence level data from 4280 progeny tested Nordic Red Cattle bulls to scan the genome for loci affecting milk, fat and protein yields. Using a genome-wise significance threshold, regions on Bos taurus chromosomes 5, 14, 23, 25 and 26 were associated with fat yield. Regions on chromosomes 5, 14, 16, 19, 20 and 25 were associated with milk yield and chromosomes 5, 14 and 25 had regions associated with protein yield. Significantly associated variations were found in 227 genes for fat yield, 72 genes for milk yield and 30 genes for protein yield. Ingenuity Pathway Analysis was used to identify networks connecting these genes displaying significant hits. When compared to previously mapped genomic regions associated with fertility, significantly associated variations were found in 5 genes common for fat yield and fertility, thus linking these two traits via biological networks. This is the first time when whole genome sequence data is utilized to study genomic regions affecting milk production in the Nordic Red Cattle population. Sequence level data offers the possibility to study quantitative traits in detail but still cannot unambiguously reveal which of the associated variations is causative. Linkage disequilibrium creates difficulties to pinpoint the causative genes and variations. One solution to overcome these difficulties is the identification of the functional gene networks and pathways to reveal important interacting genes as candidates for the observed effects. This information on target genomic regions may be exploited to improve genomic prediction.
Ma, Lijun; Lee, Letitia; Barani, Igor; Hwang, Andrew; Fogh, Shannon; Nakamura, Jean; McDermott, Michael; Sneed, Penny; Larson, David A; Sahgal, Arjun
2011-11-21
Rapid delivery of multiple shots or isocenters is one of the hallmarks of Gamma Knife radiosurgery. In this study, we investigated whether the temporal order of shots delivered with Gamma Knife Perfexion would significantly influence the biological equivalent dose for complex multi-isocenter treatments. Twenty single-target cases were selected for analysis. For each case, 3D dose matrices of individual shots were extracted and single-fraction equivalent uniform dose (sEUD) values were determined for all possible shot delivery sequences, corresponding to different patterns of temporal dose delivery within the target. We found significant variations in the sEUD values among these sequences exceeding 15% for certain cases. However, the sequences for the actual treatment delivery were found to agree (<3%) and to correlate (R² = 0.98) excellently with the sequences yielding the maximum sEUD values for all studied cases. This result is applicable for both fast and slow growing tumors with α/β values of 2 to 20 according to the linear-quadratic model. In conclusion, despite large potential variations in different shot sequences for multi-isocenter Gamma Knife treatments, current clinical delivery sequences exhibited consistent biological target dosing that approached that maximally achievable for all studied cases.
BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations
Wang, Junbai; Batmanov, Kirill
2015-01-01
Sequence variations in regulatory DNA regions are known to cause functionally important consequences for gene expression. DNA sequence variations may have an essential role in determining phenotypes and may be linked to disease; however, their identification through analysis of massive genome-wide sequencing data is a great challenge. In this work, a new computational pipeline, a Bayesian method for protein–DNA interaction with binding affinity ranking (BayesPI-BAR), is proposed for quantifying the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein–DNA interactions to predict single nucleotide polymorphisms (SNPs) that cause significant changes in the binding affinity of a regulatory region for transcription factors (TFs). The method includes two new parameters (TF chemical potentials or protein concentrations and direct TF binding targets) that are neglected by previous methods. The new method is verified on 67 known human regulatory SNPs, of which 47 (70%) have predicted true TFs ranked in the top 10. Importantly, the performance of BayesPI-BAR, which uses principal component analysis to integrate multiple predictions from various TF chemical potentials, is found to be better than that of existing programs, such as sTRAP and is-rSNP, when evaluated on the same SNPs. BayesPI-BAR is a publicly available tool and is able to carry out parallelized computation, which helps to investigate a large number of TFs or SNPs and to detect disease-associated regulatory sequence variations in the sea of genome-wide noncoding regions. PMID:26202972
Hoy, Marshal S.; Rodriguez, Rusty J.
2013-01-01
Molecular genetic analysis was conducted on two populations of the invasive non-native New Zealand mud snail (Potamopyrgus antipodarum), one from a freshwater ecosystem in Devil's Lake (Oregon, USA) and the other from an ecosystem of higher salinity in the Columbia River estuary (Hammond Harbor, Oregon, USA). To elucidate potential genetic differences between the two populations, three segments of nuclear ribosomal DNA (rDNA), the ITS1-ITS2 regions and the 18S and 28S rDNA genes were cloned and sequenced. Variant sequences within each individual were found in all three rDNA segments. Folding models were utilized for secondary structure analysis and results indicated that there were many sequences which contained structure-altering polymorphisms, which suggests they could be nonfunctional pseudogenes. In addition, analysis of molecular variance (AMOVA) was used for hierarchical analysis of genetic variance to estimate variation within and among populations and within individuals. AMOVA revealed significant variation in the ITS region between the populations and among clones within individuals, while in the 5.8S rDNA significant variation was revealed among individuals within the two populations. High levels of intragenomic variation were found in the ITS regions, which are known to be highly variable in many organisms. More interestingly, intragenomic variation was also found in the 18S and 28S rDNA, which has rarely been observed in animals and is so far unreported in Mollusca. We postulate that in these P. antipodarum populations the effects of concerted evolution are diminished due to the fact that not all of the rDNA genes in their polyploid genome should be essential for sustaining cellular function. This could lead to a lessening of selection pressures, allowing mutations to accumulate in some copies, changing them into variant sequences.
In Silico Detection of Sequence Variations Modifying Transcriptional Regulation
Andersen, Malin C; Engström, Pär G; Lithwick, Stuart; Arenillas, David; Eriksson, Per; Lenhard, Boris; Wasserman, Wyeth W; Odeberg, Jacob
2008-01-01
Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation. PMID:18208319
Model-based quality assessment and base-calling for second-generation sequencing data.
Bravo, Héctor Corrada; Irizarry, Rafael A
2010-09-01
Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T's, between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform.
Model parameters have a straightforward interpretation in terms of the chemistry of base-calling allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base-calling performance. © 2009, The International Biometric Society.
Ling, Juan; Zhang, Yan-Ying; Dong, Jun-De; Wang, You-Shao; Feng, Jing-Bing; Zhou, Wei-Hua
2015-10-01
Bacteria play important roles in the structure and function of marine food webs by utilizing nutrients and degrading pollutants, and their distribution is determined to a certain extent by the surrounding water chemistry. It is therefore vital to investigate the structure of the bacterial community and to identify the significant factors controlling bacterial distribution. Flow cytometry showed that the total bacterial abundance ranged from 5.27 × 10⁵ to 3.77 × 10⁶ cells/mL. A molecular fingerprinting technique, denaturing gradient gel electrophoresis (DGGE) followed by DNA sequencing, was employed to investigate the bacterial community composition. The results were then interpreted through multivariate statistical analysis to explain the relationship of community composition to environmental factors. A total of 270 bands at 83 different positions were detected in DGGE profiles and 29 distinct DGGE bands were sequenced. The predominant bacteria were related to the phyla Proteobacteria (31 %, nine sequences), Cyanobacteria (37.9 %, eleven sequences) and Actinobacteria (17.2 %, five sequences). Other phylogenetic groups identified included Firmicutes (6.9 %, two sequences), Bacteroidetes (3.5 %, one sequence) and Verrucomicrobia (3.5 %, one sequence). Canonical correspondence analysis was used to elucidate the relationships between the bacterial community composition and environmental factors. The results showed that the spatial variation in bacterial community composition was significantly related to phosphate (P = 0.002, P < 0.01), dissolved organic carbon (P = 0.004, P < 0.01), chemical oxygen demand (P = 0.010, P < 0.05) and nitrite (P = 0.016, P < 0.05). This study revealed the spatial variations of the bacterial community and the significant environmental factors driving shifts in bacterial composition.
These results may be valuable for further quantitative investigation of functional microbial structure and expression in polluted environments worldwide.
Brandstätter, Anita; Peterson, Christine T; Irwin, Jodi A; Mpoke, Solomon; Koech, Davy K; Parson, Walther; Parsons, Thomas J
2004-10-01
Large forensic mtDNA databases that adhere to strict guidelines for generation and maintenance are not available for many populations outside of the United States and western Europe. We have established a high-quality mtDNA control region sequence database for urban Nairobi, both as a reference database for forensic investigations and as a tool to examine the genetic variation of Kenyan sequences in the context of known African variation. The Nairobi sequences exhibited high variation and a low random match probability, indicating utility for forensic testing. Haplogroup identification and frequencies were compared with those reported from other published studies on African or African-origin populations from Mozambique, Sierra Leone, and the United States, and suggest significant differences in the mtDNA compositions of the various populations. The quality of the sequence data in our study was investigated and supported using phylogenetic measures. Our data demonstrate the diversity and distinctiveness of African populations, and underline the importance of establishing additional forensic mtDNA databases of indigenous African populations.
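For readers unfamiliar with the statistic, the random match probability of a haplotype database is commonly computed as the sum of squared haplotype frequencies, that is, the chance that two sequences drawn at random from the database are identical. The sketch below illustrates that standard definition only; the function name and the toy haplotype labels are hypothetical and are not taken from the Nairobi data.

```python
from collections import Counter
from typing import Hashable, Sequence


def random_match_probability(haplotypes: Sequence[Hashable]) -> float:
    """Sum of squared haplotype frequencies in a database of observed haplotypes."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    counts = Counter(haplotypes)
    return sum((count / n) ** 2 for count in counts.values())


# Hypothetical toy database: one haplotype seen twice, two seen once.
toy_database = ["H1", "H1", "H2", "H3"]
rmp = random_match_probability(toy_database)
```

A database in which every sequence is distinct reaches the minimum value of 1/n, so lower values indicate greater discriminating power for forensic comparisons.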
Minimal Absent Words in Four Human Genome Assemblies
Garcia, Sara P.; Pinho, Armando J.
2011-01-01
Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we aim to contribute to the catalogue of human genomic variation by investigating the variation in number and content of minimal absent words within a species, using four human genome assemblies. We compare the reference human genome GRCh37 assembly, the HuRef assembly of the genome of Craig Venter, the NA12878 assembly from cell line GM12878, and the YH assembly of the genome of a Han Chinese individual. We find the variation in number and content of minimal absent words between assemblies more significant for large and very large minimal absent words, where the biases of sequencing and assembly methodologies become more pronounced. Moreover, we find generally greater similarity between the human genome assemblies sequenced with capillary-based technologies (GRCh37 and HuRef) than between the human genome assemblies sequenced with massively parallel technologies (NA12878 and YH). Finally, as expected, we find the overall variation in number and content of minimal absent words within a species to be generally smaller than the variation between species. PMID:22220210
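To make the central notion concrete: a word is absent from a sequence if it never occurs in it, and it is a minimal absent word if, in addition, all of its proper factors do occur. The brute-force sketch below illustrates that definition for words of length 2 or more on a toy string. It is an assumption-laden illustration (the function name, the `max_len` cap and the exhaustive enumeration are choices made here) and is not the method used for whole-genome assemblies, where the exponential enumeration would be infeasible.

```python
from itertools import product
from typing import Set


def minimal_absent_words(seq: str, max_len: int, alphabet: str = "ACGT") -> Set[str]:
    """Brute-force minimal absent words of length 2..max_len in `seq`."""
    # All substrings (factors) of seq with length 1..max_len.
    factors = {
        seq[i:i + k]
        for k in range(1, max_len + 1)
        for i in range(len(seq) - k + 1)
    }
    result = set()
    for k in range(2, max_len + 1):
        for letters in product(alphabet, repeat=k):
            word = "".join(letters)
            # Every proper factor of `word` lies inside its longest proper
            # prefix or suffix, so checking those two is sufficient.
            if word not in factors and word[:-1] in factors and word[1:] in factors:
                result.add(word)
    return result


# Toy example: in "ACGT" only AC, CG and GT occur among the 16 dinucleotides.
toy_maws = minimal_absent_words("ACGT", max_len=3)
```

In the toy string each nucleotide occurs once, so the 13 missing dinucleotides are all minimal absent words, while no trinucleotide qualifies because none has both its prefix and suffix present without being present itself.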
An experimental phylogeny to benchmark ancestral sequence reconstruction
Randall, Ryan N.; Radford, Caelan E.; Roof, Kelsey A.; Natarajan, Divya K.; Gaucher, Eric A.
2016-01-01
Ancestral sequence reconstruction (ASR) is a still-burgeoning method that has revealed many key mechanisms of molecular evolution. One criticism of the approach is an inability to validate its algorithms within a biological context as opposed to a computer simulation. Here we build an experimental phylogeny using the gene of a single red fluorescent protein to address this criticism. The evolved phylogeny consists of 19 operational taxonomic units (leaves) and 17 ancestral bifurcations (nodes) that display a wide variety of fluorescent phenotypes. The 19 leaves then serve as 'modern' sequences that we subject to ASR analyses using various algorithms and to benchmark against the known ancestral genotypes and ancestral phenotypes. We confirm computer simulations that show all algorithms infer ancient sequences with high accuracy, yet we also reveal wide variation in the phenotypes encoded by incorrectly inferred sequences. Specifically, Bayesian methods incorporating rate variation significantly outperform the maximum parsimony criterion in phenotypic accuracy. Subsampling of extant sequences had minor effect on the inference of ancestral sequences. PMID:27628687
Zhu, X Q; Gasser, R B
1998-06-01
In this study, we assessed single-strand conformation polymorphism (SSCP)-based approaches for their capacity to fingerprint sequence variation in ribosomal DNA (rDNA) of ascaridoid nematodes of veterinary and/or human health significance. The second internal transcribed spacer region (ITS-2) of rDNA was utilised as the target region because it is known to provide species-specific markers for this group of parasites. ITS-2 was amplified by PCR from genomic DNA derived from individual parasites and subjected to analysis. Direct SSCP analysis of amplicons from seven taxa (Toxocara vitulorum, Toxocara cati, Toxocara canis, Toxascaris leonina, Baylisascaris procyonis, Ascaris suum and Parascaris equorum) showed that the single-strand (ss) ITS-2 patterns produced allowed their unequivocal identification to species. While no variation in SSCP patterns was detected in the ITS-2 within four species for which multiple samples were available, the method allowed the direct display of four distinct sequence types of ITS-2 among individual worms of T. cati. Comparison of SSCP/sequencing with the methods of dideoxy fingerprinting (ddF) and restriction endonuclease fingerprinting (REF) revealed that ddF also allowed the definition of the four sequence types, whereas REF displayed three of the four. The findings indicate the usefulness of the SSCP-based approaches for the identification of ascaridoid nematodes to species, the direct display of sequence variation in rDNA and the detection of population variation. The ability to fingerprint microheterogeneity in ITS-2 rDNA using such approaches also has implications for studying fundamental aspects relating to mutational change in rDNA.
Krieger, Jeannette; Hett, Anne Kathrin; Fuerst, Paul A; Birstein, Vadim J; Ludwig, Arne
2006-01-01
Significant intraindividual variation in the sequence of the 18S rRNA gene is unusual in animal genomes. In a previous study, multiple 18S rRNA gene sequences were observed within individuals of eight species of sturgeon from North America but not in the North American paddlefish, Polyodon spathula, in two species of Polypterus (Polypterus delhezi and Polypterus senegalus), in other primitive fishes (Erpetoichthys calabaricus, Lepisosteus osseus, Amia calva) or in a lungfish (Protopterus sp.). These observations led to the hypothesis that this unusual genetic characteristic arose within the Acipenseriformes after the presumed divergence of the sturgeon and paddlefish families. In the present study, a survey of nearly all Eurasian acipenseriform species was conducted to examine 18S rDNA variation. Intraindividual variation was not found in the polyodontid species, the Chinese paddlefish, Psephurus gladius, but variation was detected in all Eurasian acipenserid species. The comparison of sequences from two major segments of the 18S rRNA gene and identification of sites where insertion/deletion events have occurred are placed in the context of evolutionary relationships within the Acipenseriformes and the evolution of rDNA variation in this group.
NASA Astrophysics Data System (ADS)
Ma, Lijun; Lee, Letitia; Barani, Igor; Hwang, Andrew; Fogh, Shannon; Nakamura, Jean; McDermott, Michael; Sneed, Penny; Larson, David A.; Sahgal, Arjun
2011-11-01
Rapid delivery of multiple shots or isocenters is one of the hallmarks of Gamma Knife radiosurgery. In this study, we investigated whether the temporal order of shots delivered with Gamma Knife Perfexion would significantly influence the biological equivalent dose for complex multi-isocenter treatments. Twenty single-target cases were selected for analysis. For each case, 3D dose matrices of individual shots were extracted and single-fraction equivalent uniform dose (sEUD) values were determined for all possible shot delivery sequences, corresponding to different patterns of temporal dose delivery within the target. We found significant variations in the sEUD values among these sequences exceeding 15% for certain cases. However, the sequences for the actual treatment delivery were found to agree (<3%) and to correlate (R² = 0.98) excellently with the sequences yielding the maximum sEUD values for all studied cases. This result is applicable for both fast and slow growing tumors with α/β values of 2 to 20 according to the linear-quadratic model. In conclusion, despite large potential variations in different shot sequences for multi-isocenter Gamma Knife treatments, current clinical delivery sequences exhibited consistent biological target dosing that approached that maximally achievable for all studied cases.
Microfluidic droplet enrichment for targeted sequencing
Eastburn, Dennis J.; Huang, Yong; Pellegrino, Maurizio; Sciambi, Adam; Ptáček, Louis J.; Abate, Adam R.
2015-01-01
Targeted sequence enrichment enables better identification of genetic variation by providing increased sequencing coverage for genomic regions of interest. Here, we report the development of a new target enrichment technology that is highly differentiated from other approaches currently in use. Our method, MESA (Microfluidic droplet Enrichment for Sequence Analysis), isolates genomic DNA fragments in microfluidic droplets and performs TaqMan PCR reactions to identify droplets containing a desired target sequence. The TaqMan positive droplets are subsequently recovered via dielectrophoretic sorting, and the TaqMan amplicons are removed enzymatically prior to sequencing. We demonstrated the utility of this approach by generating an average 31.6-fold sequence enrichment across 250 kb of targeted genomic DNA from five unique genomic loci. Significantly, this enrichment enabled a more comprehensive identification of genetic polymorphisms within the targeted loci. MESA requires low amounts of input DNA, minimal prior locus sequence information and enriches the target region without PCR bias or artifacts. These features make it well suited for the study of genetic variation in a number of research and diagnostic applications. PMID:25873629
Baeßler, Bettina; Schaarschmidt, Frank; Stehning, Christian; Schnackenburg, Bernhard; Maintz, David; Bunck, Alexander C
2015-11-01
Previous studies showed that myocardial T2 relaxation times measured by cardiac T2-mapping vary significantly depending on sequence and field strength. Therefore, a systematic comparison of different T2-mapping sequences and the establishment of dedicated T2 reference values is mandatory for diagnostic decision-making. Phantom experiments using gel probes with a range of different T1 and T2 times were performed on a clinical 1.5T and 3T scanner. In addition, 30 healthy volunteers were examined at 1.5 and 3T in immediate succession. In each examination, three different T2-mapping sequences were performed at three short-axis slices: Multi Echo Spin Echo (MESE), T2-prepared balanced SSFP (T2prep), and Gradient Spin Echo with and without fat saturation (GraSEFS/GraSE). Segmented T2-Maps were generated according to the AHA 16-segment model and statistical analysis was performed. Significant intra-individual differences between mean T2 times were observed for all sequences. In general, T2prep resulted in lowest and GraSE in highest T2 times. A significant variation with field strength was observed for mean T2 in phantom as well as in vivo, with higher T2 values at 1.5T compared to 3T, regardless of the sequence used. Segmental T2 values for each sequence at 1.5 and 3T are presented. Despite a careful selection of sequence parameters and volunteers, significant variations of the measured T2 values were observed between field strengths, MR sequences and myocardial segments. Therefore, we present segmental T2 values for each sequence at 1.5 and 3T with the inherent potential to serve as reference values for future studies. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Webb, Kristen M; Rosenthal, Benjamin M
2011-01-01
The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution have promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been defined for 12 species and genotypes of parasites in the genus Trichinella, but its adequacy in representing the mitochondrial genome as a whole remains unclear, as the complete coding sequence has been characterized only for Trichinella spiralis. Here, we sought to comprehensively describe the extent and nature of divergence between the mitochondrial genomes of T. spiralis (which poses the most appreciable zoonotic risk owing to its capacity to establish persistent infections in domestic pigs) and Trichinella murrelli (which is the most prevalent species in North American wildlife hosts, but which poses relatively little risk to the safety of pork). Next generation sequencing methodologies and scaffold and de novo assembly strategies were employed. The entire protein-coding region was sequenced (13,917 bp), along with a portion of the highly repetitive non-coding region (1524 bp) of the mitochondrial genome of T. murrelli with a combined average read depth of 250 reads. The accuracy of base calling, estimated from coding region sequence, was found to exceed 99.3%. Genome content and gene order were not found to be significantly different from those of T. spiralis. An overall inter-species sequence divergence of 9.5% was estimated. Significant variation was identified when the amount of variation between species at each gene was compared to the average amount of variation between species across the coding region. Next generation sequencing is a highly effective means to obtain previously unknown mitochondrial genome sequence.
Particular to parasites, the extremely deep coverage achieved through this method allows for the detection of sequence heterogeneity between the multiple individuals that necessarily comprise such templates. Copyright © 2010 Elsevier B.V. All rights reserved.
Intra-specific variation in genome size in maize: cytological and phenotypic correlates
Realini, María Florencia; Poggio, Lidia; Cámara-Hernández, Julián; González, Graciela Esther
2016-01-01
Genome size variation accompanies the diversification and evolution of many plant species. Relationships between DNA amount and phenotypic and cytological characteristics form the basis of most hypotheses that ascribe a biological role to genome size. The goal of the present research was to investigate the intra-specific variation in the DNA content in maize populations from Northeastern Argentina and further explore the relationship between genome size and the phenotypic traits seed weight and length of the vegetative cycle. Moreover, cytological parameters such as the percentage of heterochromatin as well as the number, position and sequence composition of knobs were analysed and their relationships with 2C DNA values were explored. The populations analysed presented significant differences in 2C DNA amount, from 4.62 to 6.29 pg, representing 36.15 % of the inter-populational variation. Moreover, intra-populational genome size variation was found, varying from 1.08 to 1.63-fold. Variation in the percentage of knob heterochromatin as well as in the number, chromosome position and sequence composition of the knobs was detected among and within the populations. Although a positive relationship between genome size and the percentage of heterochromatin was observed, a significant correlation was not found. This confirms that other non-coding repetitive DNA sequences are contributing to the genome size variation. Although a positive relationship between DNA amount and seed weight has been reported in a large number of species, this relationship was not found in the populations studied here. The length of the vegetative cycle showed a positive correlation with the percentage of heterochromatin. This result allowed attributing an adaptive effect to heterochromatin since the length of this cycle would be optimized via selection for an appropriate percentage of heterochromatin. PMID:26644343
Al-Bustan, Suzanne A; Al-Serri, Ahmad; Annice, Babitha G; Alnaqeeb, Majed A; Al-Kandari, Wafa Y; Dashti, Mohammed
2018-01-01
The role interethnic genetic differences play in plasma lipid level variation across populations is a global health concern. Several genes involved in lipid metabolism and transport are strong candidates for the genetic association with lipid level variation, especially lipoprotein lipase (LPL). The objective of this study was to re-sequence the full LPL gene in Kuwaiti Arabs, analyse the sequence variation and identify variants that could contribute to variation in plasma lipid levels for further genetic association. Samples (n = 100) of an Arab ethnic group from Kuwait were analysed for sequence variation by Sanger sequencing across the 30 kb LPL gene and its flanking sequences. A total of 293 variants including 252 single nucleotide polymorphisms (SNPs) and 39 insertions/deletions (InDels) were identified, among which 47 variants (32 SNPs and 15 InDels) were novel to Kuwaiti Arabs. This study is the first to report sequence data and analysis of frequencies of variants at the LPL gene locus in an Arab ethnic group with a novel "rare" variant (LPL:g.18704C>A) significantly associated with HDL (B = -0.181; 95% CI (-0.357, -0.006); p = 0.043), TG (B = 0.134; 95% CI (0.004, 0.263); p = 0.044) and VLDL (B = 0.131; 95% CI (-0.001, 0.263); p = 0.043) levels. Sequence variation in Kuwaiti Arabs was compared to other populations and was found to be similar with regard to the number of SNPs, InDels and distribution of the number of variants across the LPL gene locus and minor allele frequency (MAF). Moreover, comparison of the identified variants and their MAF with other reports provided a list of 46 potential variants across the LPL gene to be considered for future genetic association studies. The findings warrant further investigation into the association of g.18704C>A with lipid levels in other ethnic groups and with clinical manifestations of dyslipidemia. PMID:29438437
Liu, G H; Zhou, W; Nisbet, A J; Xu, M J; Zhou, D H; Zhao, G H; Wang, S K; Song, H Q; Lin, R Q; Zhu, X Q
2014-03-01
Trichuris trichiura and Trichuris suis parasitize (at the adult stage) the caeca of humans and pigs, respectively, causing trichuriasis. Despite these parasites being of human and animal health significance, causing considerable socio-economic losses globally, little is known of the molecular characteristics of T. trichiura and T. suis from China. In the present study, the entire first and second internal transcribed spacer (ITS-1 and ITS-2) regions of nuclear ribosomal DNA (rDNA) of T. trichiura and T. suis from China were amplified by polymerase chain reaction (PCR), the representative amplicons were cloned and sequenced, and sequence variation in the ITS rDNA was examined. The ITS rDNA sequences for the T. trichiura and T. suis samples were 1222-1267 bp and 1339-1353 bp in length, respectively. Sequence analysis revealed that the ITS-1, 5.8S and ITS-2 rDNAs of both whipworms were 600-627 bp and 655-661 bp, 154 bp, and 468-486 bp and 530-538 bp in size, respectively. Sequence variation in ITS rDNA within and among T. trichiura and T. suis was examined. Excluding nucleotide variations in the simple sequence repeats, the intra-species sequence variation in the ITS-1 was 0.2-1.7% within T. trichiura, and 0-1.5% within T. suis. For ITS-2 rDNA, the intra-species sequence variation was 0-1.3% within T. trichiura and 0.2-1.7% within T. suis. The inter-species sequence differences between the two whipworms were 60.7-65.3% for ITS-1 and 59.3-61.5% for ITS-2. These results demonstrated that the ITS rDNA sequences provide additional genetic markers for the characterization and differentiation of the two whipworms. These data should be useful for studying the epidemiology and population genetics of T. trichiura and T. suis, as well as for the diagnosis of trichuriasis in humans and pigs.
NASA Astrophysics Data System (ADS)
Dominguez, L. A.; Taira, T.; Hjorleifsdottir, V.; Santoyo, M. A.
2015-12-01
Repeating earthquake sequences are sets of events that are thought to rupture the same area on the plate interface and thus produce nearly identical waveforms. We systematically analyzed seismic records from 2001 through 2014 to identify repeating earthquakes with highly correlated waveforms occurring along the subduction zone of the Cocos plate. Using the correlation coefficient (cc) and spectral coherency (coh) of the vertical components as selection criteria, we found a set of 214 sequences whose waveforms satisfy cc ≥ 95% and coh ≥ 95%. Spatial clustering along the trench shows large variations in repeating earthquake activity. In particular, the rupture zone of the M8.1 1985 earthquake shows an almost complete absence of characteristic repeating earthquakes, whereas the Guerrero Gap zone and the segment of the trench close to the Guerrero-Oaxaca border show a significantly larger number of repeating earthquake sequences. Furthermore, temporal variations associated with stress changes due to major earthquakes show episodes of unlocking and healing of the interface. Understanding the different components that control the location and recurrence time of characteristic repeating sequences is a key factor to pinpoint areas where large megathrust earthquakes may nucleate and consequently to improve the seismic hazard assessment.
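The waveform-similarity criterion can be illustrated with a zero-lag correlation coefficient between two equal-length traces. This is a minimal sketch under stated assumptions: the abstract does not describe windowing, filtering, or alignment, and the spectral coherency test is not shown at all. The function names and the synthetic traces are hypothetical.

```python
import math


def correlation_coefficient(a, b):
    """Zero-lag Pearson correlation between two equal-length waveforms."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("waveforms must have equal length of at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denominator == 0:
        raise ValueError("a constant waveform has no defined correlation")
    return numerator / denominator


def is_repeating_pair(a, b, threshold=0.95):
    """Apply only the cc >= threshold part of the selection criteria."""
    return correlation_coefficient(a, b) >= threshold


# Synthetic traces standing in for vertical-component records.
base = [math.sin(0.2 * i) for i in range(200)]
scaled = [2.5 * x for x in base]    # same shape, different amplitude
flipped = [-x for x in base]        # opposite polarity

print(is_repeating_pair(base, scaled))
print(is_repeating_pair(base, flipped))
```

Because the coefficient is normalized, two events with the same waveform shape but different amplitudes still pass the threshold, which is the property that makes it useful for finding repeaters of slightly different size.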
McRobie, Helen R; King, Linda M; Fanutti, Cristina; Coussons, Peter J; Moncrief, Nancy D; Thomas, Alison P M
2014-01-01
Sequence variations in the melanocortin 1 receptor (MC1R) gene are associated with melanism in many different species of mammals, birds, and reptiles. The gray squirrel (Sciurus carolinensis), found in the British Isles, was introduced from North America in the late 19th century. Melanism in the British gray squirrel is associated with a 24-bp deletion in the MC1R. To investigate the origin of this mutation, we sequenced the MC1R of 95 individuals including 44 melanic gray squirrels from both the British Isles and North America. Melanic gray squirrels of both populations had the same 24-bp deletion associated with melanism. Given the significant deletion associated with melanism in the gray squirrel, we sequenced the MC1R of both wild-type and melanic fox squirrels (Sciurus niger) (9 individuals) and red squirrels (Sciurus vulgaris) (39 individuals). Unlike the gray squirrel, no association between sequence variation in the MC1R and melanism was found in these 2 species. We conclude that the melanic gray squirrel found in the British Isles originated from one or more introductions of melanic gray squirrels from North America. We also conclude that variations in the MC1R are not associated with melanism in the fox and red squirrels.
Michael, Todd P.; Park, Sohyun; Kim, Tae-Sung; Booth, Jim; Byer, Amanda; Sun, Qi; Chory, Joanne; Lee, Kwangwon
2007-01-01
Background: WHITE COLLAR-1 (WC-1) mediates interactions between the circadian clock and the environment by acting as both a core clock component and as a blue light photoreceptor in Neurospora crassa. Loss of the amino-terminal polyglutamine (NpolyQ) domain in WC-1 results in an arrhythmic circadian clock; this data is consistent with this simple sequence repeat (SSR) being essential for clock function. Methodology/Principal Findings: Since SSRs are often polymorphic in length across natural populations, we reasoned that investigating natural variation of the WC-1 NpolyQ may provide insight into its role in the circadian clock. We observed significant phenotypic variation in the period, phase and temperature compensation of circadian regulated asexual conidiation across 143 N. crassa accessions. In addition to the NpolyQ, we identified two other simple sequence repeats in WC-1. The sizes of all three WC-1 SSRs correlated with polymorphisms in other clock genes, latitude and circadian period length. Furthermore, in a cross between two N. crassa accessions, the WC-1 NpolyQ co-segregated with period length. Conclusions/Significance: Natural variation of the WC-1 NpolyQ suggests a mechanism by which period length can be varied and selected for by the local environment that does not deleteriously affect WC-1 activity. Understanding natural variation in the N. crassa circadian clock will facilitate an understanding of how fungi exploit their environments. PMID:17726525
Genetic and epigenetic variations induced by wheat-rye 2R and 5R monosomic addition lines.
Fu, Shulan; Sun, Chuanfei; Yang, Manyu; Fei, Yunyan; Tan, Feiqun; Yan, Benju; Ren, Zhenglong; Tang, Zongxiang
2013-01-01
Monosomic alien addition lines (MAALs) can easily induce structural variation of chromosomes and have been used in crop breeding; however, it is unclear whether MAALs will induce drastic genetic and epigenetic alterations. In the present study, wheat-rye 2R and 5R MAALs together with their selfed progeny and parental common wheat were investigated through amplified fragment length polymorphism (AFLP) and methylation-sensitive amplification polymorphism (MSAP) analyses. The MAALs in different generations displayed different genetic variations. Some progeny that only contained 42 wheat chromosomes showed great genetic/epigenetic alterations. Cryptic rye chromatin has introgressed into the wheat genome. However, one of the progeny that contained cryptic rye chromatin did not display outstanding genetic/epigenetic variation. 78 and 49 sequences were cloned from changed AFLP and MSAP bands, respectively. Blastn search indicated that almost half of them showed no significant similarity to known sequences. Retrotransposons were mainly involved in genetic and epigenetic variations. Genetic variations basically affected Gypsy-like retrotransposons, whereas epigenetic alterations affected Copia-like and Gypsy-like retrotransposons equally. Genetic and epigenetic variations seldom affected low-copy coding DNA sequences. The results in the present study provided direct evidence to illustrate that monosomic wheat-rye addition lines could induce different and drastic genetic/epigenetic variations and these variations might not be caused by introgression of rye chromatins into wheat. Therefore, MAALs may be directly used as an effective means to broaden the genetic diversity of common wheat.
DNA methylation landscape of body size variation in sheep.
Cao, Jiaxue; Wei, Caihong; Liu, Dongming; Wang, Huihua; Wu, Mingming; Xie, Zhiyuan; Capellini, Terence D; Zhang, Li; Zhao, Fuping; Li, Li; Zhong, Tao; Wang, Linjie; Lu, Jian; Liu, Ruizao; Zhang, Shifang; Du, Yongfei; Zhang, Hongping; Du, Lixin
2015-10-16
Sub-populations of Chinese Mongolian sheep exhibit significant variance in body mass. In the present study, we profiled genome-wide DNA methylation in these breeds by methylated DNA immunoprecipitation sequencing (MeDIP-seq) to determine whether DNA methylation plays a role in determining the body mass of sheep. A high quality methylation map of Chinese Mongolian sheep was obtained in this study. We identified 399 different methylated regions located in 93 human orthologs, which were previously reported as body size related genes in human genome-wide association studies. We tested three regions in LTBP1, and DNA methylation of two CpG sites showed significant correlation with its RNA expression. Additionally, a particular set of differentially methylated windows enriched in the "development process" (GO: 0032502) was identified as potential candidates for association with body mass variation. Next, we validated a small part of these windows in 5 genes; DNA methylation of SMAD1, TSC1 and AKT1 showed significant difference across breeds, and six CpG sites were significantly correlated with RNA expression. Interestingly, two CpG sites showed significant correlation with TSC1 protein expression. This study provides a thorough understanding of body size variation in sheep from an epigenetic perspective.
Schadt, Eric E.; Banerjee, Onureena; Fang, Gang; Feng, Zhixing; Wong, Wing H.; Zhang, Xuegong; Kislyuk, Andrey; Clark, Tyson A.; Luong, Khai; Keren-Paz, Alona; Chess, Andrew; Kumar, Vipin; Chen-Plotkin, Alice; Sondheimer, Neal; Korlach, Jonas; Kasarskis, Andrew
2013-01-01
Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types. PMID:23093720
Wang, Yan; Liu, Guo-Hua; Li, Jia-Yuan; Xu, Min-Jun; Ye, Yong-Gang; Zhou, Dong-Hui; Song, Hui-Qun; Lin, Rui-Qing; Zhu, Xing-Quan
2013-02-01
This study examined sequence variation in three mitochondrial DNA (mtDNA) regions, namely cytochrome c oxidase subunit 1 (cox1), NADH dehydrogenase subunit 5 (nad5) and cytochrome b (cytb), among Trichuris ovis isolates from different hosts in Guangdong Province, China. A portion of the cox1 (pcox1), nad5 (pnad5) and cytb (pcytb) genes was amplified separately from individual whipworms by PCR, and was subjected to sequencing from both directions. The size of the sequences of pcox1, pnad5 and pcytb was 618, 240 and 464 bp, respectively. Although the intra-specific sequence variations within T. ovis were 0-0.8% for pcox1, 0-0.8% for pnad5 and 0-1.9% for pcytb, the inter-specific sequence differences among members of the genus Trichuris were significantly higher, being 24.3-26.5% for pcox1, 33.7-56.4% for pnad5 and 24.8-26.1% for pcytb, respectively. Phylogenetic analyses using combined sequences of pcox1, pnad5 and pcytb, with three different computational algorithms (maximum likelihood, maximum parsimony and Bayesian inference), indicated that all of the T. ovis isolates grouped together with high statistical support. These findings demonstrated the existence of intra-specific variation in mtDNA sequences among T. ovis isolates from different hosts, and have implications for studying molecular epidemiology and population genetics of T. ovis.
Ellis, Lisa L.; Huang, Wen; Quinn, Andrew M.; Ahuja, Astha; Alfrejd, Ben; Gomez, Francisco E.; Hjelmen, Carl E.; Moore, Kristi L.; Mackay, Trudy F. C.; Johnston, J. Spencer; Tarone, Aaron M.
2014-01-01
We determined female genome sizes using flow cytometry for 211 Drosophila melanogaster sequenced inbred strains from the Drosophila Genetic Reference Panel, and found significant conspecific and intrapopulation variation in genome size. We also compared several life history traits for 25 lines with large and 25 lines with small genomes in three thermal environments, and found that genome size as well as genome size by temperature interactions significantly correlated with survival to pupation and adulthood, time to pupation, female pupal mass, and female eclosion rates. Genome size accounted for up to 23% of the variation in developmental phenotypes, but the contribution of genome size to variation in life history traits was plastic and varied according to the thermal environment. Expression data implicate differences in metabolism that correspond to genome size variation. These results indicate that significant genome size variation exists within D. melanogaster and this variation may impact the evolutionary ecology of the species. Genome size variation accounts for a significant portion of life history variation in an environmentally dependent manner, suggesting that potential fitness effects associated with genome size variation also depend on environmental conditions. PMID:25057905
Ahmed, Wael A; Tsutsumi, Makiko; Nakata, Seiichi; Mori, Terumi; Nishimura, Yoichi; Fujisawa, Toshiyuki; Kato, Ichiro; Nakashima, Mayuki; Kurahashi, Hiroki; Suzuki, Kenji
2012-04-01
To evaluate the association of hypocretin neuropeptide precursor gene (HCRT) variations with obstructive sleep apnea syndrome (OSAS) in a cohort of Japanese patients and to further evaluate whether the significant HCRT variations have potential functional consequences on HCRT expression. Case-control genetic association study. We studied the genetic variations within the HCRT gene. The study population consisted of 100 OSAS patients and 100 control subjects. The HCRT gene was amplified by polymerase chain reaction in all study subjects followed by direct sequencing and analysis of sequencing data. Two genetic variations within the HCRT intron, IVS1+16T>C (rs9902709) and IVS1-69G>C, were identified with significant differences between patients and controls (P < .05). A reporter gene assay using HeLa cells showed that the construct containing the C allele of the rs9902709 variation had significantly higher luciferase activity compared with the construct containing the T allele (P = .002). Furthermore, enzyme immunoassay revealed that subjects with T/C and C/C genotypes for rs9902709 had 1.4-fold and 1.5-fold increases in sera levels of orexin-A, respectively. Our genetic association study, followed by functional and quantitative phenotyping assays, demonstrated a functional locus within the HCRT gene, which may act to increase HCRT expression and lead to a protective effect against the development of OSAS. Copyright © 2012 The American Laryngological, Rhinological, and Otological Society, Inc.
Martin-Fernandez, Laura; Gavidia-Bovadilla, Giovana; Corrales, Irene; Brunel, Helena; Ramírez, Lorena; López, Sonia; Souto, Juan Carlos; Vidal, Francisco; Soria, José Manuel
2017-01-01
Venous thromboembolism is a complex disease with a high heritability. There are significant associations between Factor XI (FXI) levels and SNPs in the KNG1 and F11 loci. Our aim was to identify the genetic variation of KNG1 and F11 that might account for the variability of FXI levels. The KNG1 and F11 loci were sequenced completely in 110 unrelated individuals from the GAIT-2 (Genetic Analysis of Idiopathic Thrombophilia 2) Project using Next Generation Sequencing on an Illumina MiSeq. The GAIT-2 Project is a study of 935 individuals in 35 extended Spanish families selected through a proband with idiopathic thrombophilia. Among the 110 individuals, a subset of 40 individuals was chosen as a discovery sample for identifying variants. A total of 762 genetic variants were detected. Several significant associations were established among common variants and low-frequency variant sets in KNG1 and F11 with FXI levels using the PLINK and SKAT packages. Among these associations, those of rs710446 and five low-frequency variant sets in KNG1 with FXI level variation were significant after multiple testing correction and permutation. Also, two putative pathogenic mutations related to high and low FXI levels were identified by data filtering and in silico predictions. This study of KNG1 and F11 loci should help to understand the connection between genotypic variation and variation in FXI levels. The functional genetic variants should be useful as markers of thromboembolic risk.
Complete Genome Sequences of Four Isolates of Plutella xylostella Granulovirus.
Spence, Robert J; Noune, Christopher; Hauxwell, Caroline
2016-06-30
Granuloviruses are widespread pathogens of Plutella xylostella L. (diamondback moth) and potential biopesticides for control of this global insect pest. We report the complete genomes of four Plutella xylostella granulovirus isolates from China, Malaysia, and Taiwan exhibiting pairs of noncoding, homologous repeat regions with significant sequence variation but equivalent length. Copyright © 2016 Spence et al.
Intraspecific variation in Cryptocaryon irritans.
Diggles, B K; Adlard, R D
1997-01-01
Intraspecific variation in the ciliate Cryptocaryon irritans was examined using sequences of the first internal transcribed spacer region (ITS-1) of ribosomal DNA (rDNA) combined with developmental and morphological characters. Amplified rDNA sequences consisting of 151 bases of the flanking 18 S and 5.8 S regions, and the entire ITS-1 region (169 or 170 bases), were determined and compared for 16 isolates of C. irritans from Australia, Israel and the USA. There was one variable base between isolates in the 18 S region and 11 variable bases in the ITS-1 region. Despite their similar morphology, significant sequence variation (4.1% divergence) and developmental differences indicate that Australian C. irritans isolates from estuarine (Moreton Bay) and coral reef (Heron Island) environments are distinct. The Heron Island isolate was genetically closer to morphologically dissimilar isolates from Israel (1.8% divergence) and the USA (2.3% divergence) than it was to the Moreton Bay isolates. Three isolates maintained in our laboratory since February 1994 differed in sequence from earlier laboratory isolates (2.9% to 3.5% divergence), even though all were similar morphologically and originated from the same source. During this time the sequence of the isolates from wild fish in Moreton Bay remained unchanged. These genetic differences indicate the existence of a founder effect in laboratory populations of C. irritans. The genetic variation found here, combined with known morphological and developmental differences, is used to characterise four strains of C. irritans.
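Percent divergence figures such as those quoted above are commonly obtained by counting mismatched positions between two aligned sequences. The abstract does not say how gaps or ambiguous bases were treated, so the following is only a sketch of an uncorrected pairwise distance, assuming pre-aligned sequences and skipping any column that contains a gap. The function name and the toy sequences are illustrative.

```python
def percent_divergence(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Uncorrected pairwise difference (%) between two aligned sequences.

    Columns where either sequence has a gap are ignored (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # skip alignment gaps
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differing / compared


# Toy aligned fragments, not real ITS-1 data.
isolate_1 = "ACGTACGTAC"
isolate_2 = "ACGTACGTAA"
print(percent_divergence(isolate_1, isolate_2))
```

With a spacer of about 170 bases, a handful of variable positions is enough to produce divergence values of a few percent, which is the scale of the differences reported between isolates.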
A survey of tools for variant analysis of next-generation genome sequencing data
Pabinger, Stephan; Dander, Andreas; Fischer, Maria; Snajder, Rene; Sperk, Michael; Efremova, Mirjana; Krabichler, Birgit; Speicher, Michael R.; Zschocke, Johannes
2014-01-01
Recent advances in genome sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and identify mutations relevant for diagnosis and therapy. Specifically, whole-exome sequencing using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community due to the moderate costs, manageable data amounts and straightforward interpretation of analysis results. While whole-exome and, in the near future, whole-genome sequencing are becoming commodities, data analysis still poses significant challenges and has led to the development of a plethora of tools supporting specific parts of the analysis workflow or providing a complete solution. Here, we surveyed 205 tools for whole-genome/whole-exome sequencing data analysis supporting five distinct analytical steps: quality assessment, alignment, variant identification, variant annotation and visualization. We report an overview of the functionality, features and specific requirements of the individual tools. We then selected 32 programs for variant identification, variant annotation and visualization, which were subjected to hands-on evaluation using four data sets: one set of exome data from two patients with a rare disease for testing identification of germline mutations, two cancer data sets for testing variant callers for somatic mutations, copy number variations and structural variations, and one semi-synthetic data set for testing identification of copy number variations. Our comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers. PMID:23341494
Khankhet, Jordan; Vanderwolf, Karen J.; McAlpine, Donald F.; McBurney, Scott; Overy, David P.; Slavic, Durda; Xu, Jianping
2014-01-01
Pseudogymnoascus destructans is the causative agent of an emerging infectious disease that threatens populations of several North American bat species. The fungal disease was first observed in 2006 and has since caused the death of nearly six million bats. The disease, commonly known as white-nose syndrome, is characterized by a cutaneous infection with P. destructans causing erosions and ulcers in the skin of nose, ears and/or wings of bats. Previous studies based on sequences from eight loci have found that isolates of P. destructans from bats in the US all belong to one multilocus genotype. Using the same multilocus sequence typing method, we found that isolates from eastern and central Canada also had the same genotype as those from the US, consistent with the clonal expansion of P. destructans into Canada. However, our PCR fingerprinting revealed that among the 112 North American isolates we analyzed, three, all from Canada, showed minor genetic variation. Furthermore, we found significant variations among isolates in mycelial growth rate; the production of mycelial exudates; and pigment production and diffusion into agar media. These phenotypic differences were influenced by culture medium and incubation temperature, indicating significant variation in environmental condition-dependent phenotypic expression among isolates of the clonal P. destructans genotype in North America. PMID:25122221
Geleta, Mulatu; Herrera, Isabel; Monzón, Arnulfo; Bryngelsson, Tomas
2012-01-01
Coffea arabica L. (arabica coffee), the only tetraploid species in the genus Coffea, represents the majority of the world's coffee production and makes a significant contribution to Nicaragua's economy. The present study was conducted to determine the genetic diversity of arabica coffee in Nicaragua for its conservation and breeding values. Twenty-six populations that represent eight varieties in Nicaragua were investigated using simple sequence repeat (SSR) markers. A total of 24 alleles were obtained from the 12 loci investigated across 260 individual plants. The total Nei's gene diversity (H_T) and the within-population gene diversity (H_S) were 0.35 and 0.29, respectively, which is comparable with that previously reported from other countries and regions. Among the varieties, the highest diversity was recorded in the variety Catimor. Analysis of molecular variance (AMOVA) revealed that about 87% of the total genetic variation was found within populations and the remaining 13% differentiated the populations (F_ST = 0.13; P < 0.001). The variation among the varieties was also significant. The genetic variation in Nicaraguan coffee is significant enough to be used in the breeding programs, and most of this variation can be conserved through ex situ conservation of a low number of populations from each variety. PMID:22701376
Wyllie, David H; Sanderson, Nicholas; Myers, Richard; Peto, Tim; Robinson, Esther; Crook, Derrick W; Smith, E Grace; Walker, A Sarah
2018-06-06
Contact tracing requires reliable identification of closely related bacterial isolates. When we noticed the reporting of artefactual variation between M. tuberculosis isolates during routine next generation sequencing of Mycobacterium spp, we investigated its basis in 2,018 consecutive M. tuberculosis isolates. In the routine process used, clinical samples were decontaminated and inoculated into broth cultures; from positive broth cultures DNA was extracted, sequenced, reads mapped, and consensus sequences determined. We investigated the process of consensus sequence determination, which selects the most common nucleotide at each position. Having determined the high-quality read depth and depth of minor variants across 8,006 M. tuberculosis genomic regions, we quantified the relationship between the minor variant depth and the amount of non-Mycobacterial bacterial DNA, which originates from commensal microbes killed during sample decontamination. In the presence of non-Mycobacterial bacterial DNA, we found significant increases in minor variant frequencies of more than 1.5 fold in 242 regions covering 5.1% of the M. tuberculosis genome. Included within these were four high variation regions strongly influenced by the amount of non-Mycobacterial bacterial DNA. Excluding these four regions from pairwise distance comparisons reduced biologically implausible variation from 5.2% to 0% in an independent validation set derived from 226 individuals. Thus, we have demonstrated an approach identifying critical genomic regions contributing to clinically relevant artefactual variation in bacterial similarity searches. The approach described monitors the outputs of the complex multi-step laboratory and bioinformatics process, allows periodic process adjustments, and will have application to quality control of routine bacterial genomics. Copyright © 2018 Wyllie et al.
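The consensus step described in the abstract above (taking the most common nucleotide at each position and tracking how much of the read depth disagrees with it) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the string-per-position input format, and the toy pileup are all assumptions made for the example.

```python
from collections import Counter


def consensus_and_minor(columns):
    """Return (consensus base, minor variant frequency) for each position.

    columns -- iterable of strings, one per genome position, each holding
               the base calls from the reads covering that position
               (a hypothetical, simplified pileup format).
    """
    result = []
    for calls in columns:
        if not calls:
            # No coverage: report an uncalled base with zero minor frequency.
            result.append(("N", 0.0))
            continue
        counts = Counter(calls)
        # most_common(1) returns the most frequent base; ties resolve to the
        # base seen first, which a real pipeline would need to handle.
        base, top = counts.most_common(1)[0]
        result.append((base, 1 - top / len(calls)))
    return result


# Toy data: position 2 carries one discordant read out of four.
pileup = ["AAAA", "CCCT", "GGGG"]
calls = consensus_and_minor(pileup)
consensus = "".join(base for base, _ in calls)
print(consensus, [round(freq, 2) for _, freq in calls])
```

In this sketch a region whose minor variant frequency rises with the amount of contaminating DNA would be a candidate for exclusion from pairwise distance comparisons, which is the kind of masking the abstract describes.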
Gencay, Mikael; Hübner, Kirsten; Gohl, Peter; Seffner, Anja; Weizenegger, Michael; Neofytos, Dionysios; Batrla, Richard; Woeste, Andreas; Kim, Hyon-suk; Westergaard, Gaston; Reinsch, Christine; Brill, Eva; Thu Thuy, Pham Thi; Hoang, Bui Huu; Sonderup, Mark; Spearman, C. Wendy; Pabinger, Stephan; Gautier, Jérémie; Brancaccio, Giuseppina; Fasano, Massimo; Santantonio, Teresa; Gaeta, Giovanni B.; Nauck, Markus; Kaminski, Wolfgang E.
2017-01-01
The diversity of the hepatitis B surface antigen (HBsAg) has a significant impact on the performance of diagnostic screening tests and the clinical outcome of hepatitis B infection. Neutralizing or diagnostic antibodies against the HBsAg are directed towards its highly conserved major hydrophilic region (MHR), in particular towards its “a” determinant subdomain. Here, we explored, on a global scale, the genetic diversity of the HBsAg MHR in a large, multi-ethnic cohort of randomly selected subjects with HBV infection from four continents. A total of 1553 HBsAg positive blood samples of subjects originating from 20 different countries across Africa, America, Asia and central Europe were characterized for amino acid variation in the MHR. Using highly sensitive ultra-deep sequencing, we found 72.8% of the successfully sequenced subjects (n = 1391) demonstrated amino acid sequence variation in the HBsAg MHR. This indicates that the global variation frequency in the HBsAg MHR is threefold higher than previously reported. The majority of the amino acid mutations were found in the HBV genotypes B (28.9%) and C (25.4%). Collectively, we identified 345 distinct amino acid mutations in the MHR. Among these, we report 62 previously unknown mutations, which extends the worldwide pool of currently known HBsAg MHR mutations by 22%. Importantly, topological analysis identified the “a” determinant upstream flanking region as the structurally most diverse subdomain of the HBsAg MHR. The highest prevalence of “a” determinant region mutations was observed in subjects from Asia, followed by the African, American and European cohorts, respectively. Finally, we found that more than half (59.3%) of all HBV subjects investigated carried multiple MHR mutations. Together, this worldwide ultra-deep sequencing based genotyping study reveals that the global prevalence and structural complexity of variation in the hepatitis B surface antigen have, to date, been significantly underappreciated. 
PMID:28472040
Graña-Miraglia, Lucía; Lozano, Luis F.; Velázquez, Consuelo; Volkow-Fernández, Patricia; Pérez-Oseguera, Ángeles; Cevallos, Miguel A.; Castillo-Ramírez, Santiago
2017-01-01
Genome sequencing has been useful to gain an understanding of bacterial evolution. It has been used for studying the phylogeography and/or the impact of mutation and recombination on bacterial populations. However, it has rarely been used to study gene turnover at microevolutionary scales. Here, we sequenced Mexican strains of the human pathogen Acinetobacter baumannii sampled from the same locale over a 3 year period to obtain insights into the microevolutionary dynamics of gene content variability. We found that the Mexican A. baumannii population was recently founded and has been emerging due to a rapid clonal expansion. Furthermore, we noticed that on average the Mexican strains differed from each other by over 300 genes and, notably, this gene content variation has accrued more frequently and faster than the accumulation of mutations. Moreover, due to its rapid pace, gene content variation reflects the phylogeny only at very short periods of time. Additionally, we found that the external branches of the phylogeny had almost 100 more genes than the internal branches. All in all, these results show that rapid gene turnover has been of paramount importance in producing genetic variation within this population and demonstrate the utility of genome sequencing to study alternative forms of genetic variation. PMID:28979253
Jelokhani-Niaraki, Saber; Tahmoorespur, Mojtaba; Bitaraf-Sani, Morteza
2015-01-01
Very little is known about LHR and FSHR genes of domestic dromedary camels. The main objective of this study was to determine and analyze partial genomic regions of FSHR and LHR genes in dromedary camels for the first time. To this end, a total of 50 DNA samples belonging to dromedary camels raised in Iran were sent for sequencing (25 samples of each gene). We compared the nucleotide sequences of Camelus dromedarius with corresponding sequences of previously published FSHR and LHR genes in bactrian camels and other species. According to the data, the same nucleotide variation was identified in both regions of the two camel species. The alignment of deduced protein sequences of the two different species revealed an amino acid variation at the FSHR region. No evidence of amino acid variation was observed, however, in LHR sequences. Phylogenetic analysis indicated that both camel species had a close relationship and clustered together in a separate branch. This was further confirmed by genetic distance values illustrating significant sequence identity between Camelus dromedarius and Camelus bactrianus. Interestingly, sequence comparisons revealed heterozygote patterns in FSHR sequences isolated from dromedary camels of Iran. In comparison to other species, this camel contains three amino acid substitutions at positions 5, 67, and 105 in the FSHR coding region. These substitutions are found exclusively in camels and can be considered species specific. The results of our study can be used for hormone functionality research (FSHR and LHR) as well as reproduction-linked polymorphisms and breeding programs. PMID:27844002
Jelokhani-Niaraki, Saber; Tahmoorespur, Mojtaba; Bitaraf-Sani, Morteza
2015-06-01
Very little is known about LHR and FSHR genes of domestic dromedary camels. The main objective of this study was to determine and analyze partial genomic regions of FSHR and LHR genes in dromedary camels for the first time. To this end, a total of 50 DNA samples belonging to dromedary camels raised in Iran were sent for sequencing (25 samples of each gene). We compared the nucleotide sequences of Camelus dromedarius with corresponding sequences of previously published FSHR and LHR genes in bactrian camels and other species. According to the data, the same nucleotide variation was identified in both regions of the two camel species. The alignment of deduced protein sequences of the two different species revealed an amino acid variation at the FSHR region. No evidence of amino acid variation was observed, however, in LHR sequences. Phylogenetic analysis indicated that both camel species had a close relationship and clustered together in a separate branch. This was further confirmed by genetic distance values illustrating significant sequence identity between Camelus dromedarius and Camelus bactrianus. Interestingly, sequence comparisons revealed heterozygote patterns in FSHR sequences isolated from dromedary camels of Iran. In comparison to other species, this camel contains three amino acid substitutions at positions 5, 67, and 105 in the FSHR coding region. These substitutions are found exclusively in camels and can be considered species specific. The results of our study can be used for hormone functionality research (FSHR and LHR) as well as reproduction-linked polymorphisms and breeding programs.
Gold nanoparticles for high-throughput genotyping of long-range haplotypes
NASA Astrophysics Data System (ADS)
Chen, Peng; Pan, Dun; Fan, Chunhai; Chen, Jianhua; Huang, Ke; Wang, Dongfang; Zhang, Honglu; Li, You; Feng, Guoyin; Liang, Peiji; He, Lin; Shi, Yongyong
2011-10-01
Completion of the Human Genome Project and the HapMap Project has led to increasing demands for mapping complex traits in humans to understand the aetiology of diseases. Identifying variations in the DNA sequence, which affect how we develop disease and respond to pathogens and drugs, is important for this purpose, but it is difficult to identify these variations in large sample sets. Here we show that through a combination of capillary sequencing and polymerase chain reaction assisted by gold nanoparticles, it is possible to identify several DNA variations that are associated with age-related macular degeneration and psoriasis on significant regions of human genomic DNA. Our method is accurate and promising for large-scale and high-throughput genetic analysis of susceptibility towards disease and drug resistance.
He, Xiao-Lan; Li, Qian; Peng, Wei-Hong; Zhou, Jie; Cao, Xue-Lian; Wang, Di; Huang, Zhong-Qian; Tan, Wei; Li, Yu; Gan, Bing-Cheng
2017-06-26
The internal transcribed spacer (ITS), RNA polymerase II second largest subunit (RPB2), and elongation factor 1-alpha (EF1α) are often used in fungal taxonomy and phylogenetic analysis. An ideal molecular marker for molecular identification and phylogenetic studies is homogeneous within species, with interspecific variation exceeding intraspecific variation. However, while sequencing ITS, RPB2, and EF1α in Pleurotus spp., we found that intra-isolate sequence polymorphism might be present in these genes because direct sequencing of PCR products failed in some isolates. Therefore, in this study we assessed intra- and inter-isolate variation of the three genes in Pleurotus by polymerase chain reaction amplification and cloning. Results showed that intra-isolate variation of ITS was not uncommon but the polymorphic level in each isolate was relatively low in Pleurotus; intra-isolate variations of EF1α and RPB2 sequences were present in an unexpectedly high amount. The polymorphism level differed significantly between ITS, RPB2, and EF1α in the same individual, and the intra-isolate heterogeneity level of each gene varied between isolates within the same species. Intra-isolate and intraspecific variation of ITS in the tested isolates was less than interspecific variation, and intra-isolate and intraspecific variation of RPB2 was probably equal to interspecific divergence. Meanwhile, intra-isolate and intraspecific variation of EF1α could exceed interspecific divergence. These findings suggested that RPB2 and EF1α are not desirable barcoding candidates for Pleurotus. We also discuss why rDNA and protein-coding genes showed variants within a single isolate in Pleurotus, although this question must be addressed in further research. Our study demonstrated that intra-isolate variation of ribosomal and protein-coding genes is likely widespread in fungi.
This has implications for studies on fungal evolution, taxonomy, phylogenetics, and population genetics. More extensive sampling of these genes and other candidates will be required to ensure reliability as phylogenetic markers and DNA barcodes.
Sun, Wei; Dong, Hui; Gao, Yue-Bo; Su, Qian-Fu; Qian, Hai-Tao; Bai, Hong-Yan; Zhang, Zhu-Ting; Cong, Bin
2015-01-01
The nonmigratory grasshopper Oedaleus infernalis Saussure (Orthoptera: Acridoidea) is an agricultural pest of crops and forage grasses over a wide natural geographical distribution in China. The genetic diversity and genetic variation among 10 geographically separated populations of O. infernalis were assessed using polymerase chain reaction-based molecular markers, including the intersimple sequence repeat and mitochondrial cytochrome oxidase sequences. A high level of genetic diversity was detected among these populations from the intersimple sequence repeat (H: 0.2628, I: 0.4129, Hs: 0.2130) and cytochrome oxidase analyses (Hd: 0.653). There was no obvious geographical structure based on an unweighted pair group method analysis and median-joining network. The values of FST, θII, and Gst estimated in this study are low, and the gene flow is high (Nm > 4). Analysis of the molecular variance suggested that most of the genetic variation occurs within populations, whereas only a small variation takes place between populations. No significant correlation was found between the genetic distance and geographical distance. Overall, our results suggest that geographical distance does not impede gene flow among O. infernalis populations. PMID:26496789
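The link between a low FST and a high Nm in the abstract above can be made concrete with Wright's island-model approximation, Nm ≈ (1 - FST) / (4 FST). The abstract does not state which estimator the authors used or give the FST values, so the function name and the input value below are hypothetical; the formula as written applies to nuclear diploid markers, and mitochondrial data call for a different scaling.

```python
def nm_from_fst(fst: float) -> float:
    """Approximate migrants per generation under Wright's island model.

    Uses Nm = (1 - FST) / (4 * FST), the form for nuclear diploid markers.
    """
    if not 0 < fst < 1:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1 - fst) / (4 * fst)


# Hypothetical FST chosen for illustration; it is not a value from the study.
example_fst = 0.05
print(f"FST = {example_fst} -> Nm = {nm_from_fst(example_fst):.2f}")
```

Under this approximation Nm exceeds 4 whenever FST falls below 1/17 (about 0.059), which is the sense in which low differentiation implies high gene flow.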
Novel rare variations of the oxytocin receptor (OXTR) gene in autism spectrum disorder individuals.
Liu, Xiaoxi; Kawashima, Minae; Miyagawa, Taku; Otowa, Takeshi; Latt, Khun Zaw; Thiri, Myo; Nishida, Hisami; Sugiyama, Toshiro; Tsurusaki, Yoshinori; Matsumoto, Naomichi; Mabuchi, Akihiko; Tokunaga, Katsushi; Sasaki, Tsukasa
2015-01-01
The oxytocin receptor (OXTR) gene has been implicated as a risk gene for autism spectrum disorder (ASD), a neurodevelopmental disorder with essential features of impairments in social communication and reciprocal interaction. The genetic associations between common variations in OXTR and ASD have been reported in multiple ethnic populations. However, little is known about the distribution of rare variations within OXTR in ASD patients. In this study, we resequenced the full length of OXTR in 105 ASD individuals using an approach that combined the power of next-generation sequencing technology, long-range PCR and DNA pooling. We demonstrated that rare variants with minor allele frequency as low as 0.05% could be reliably detected by our method. We identified 28 novel variants including potential functional variants in the intron region and one rare missense variant (R150S). We subsequently performed Sanger sequencing and validated five novel variants located in previously suggested candidate regions in ASD individuals. Further sequencing of 312 healthy subjects showed that the burden of rare variants is significantly higher in ASD individuals than in healthy individuals. Our results support the idea that rare variation in the OXTR gene might be involved in ASD.
Gupta, Anamika; Pal, Sudhir K; Pandey, Divya; Fakir, Najneen A; Rathod, Sunita; Sinha, Dhiraj; SivaKumar, S; Sinha, Pallavi; Periera, Mycal; Balgam, Shilpa; Sekar, Gomathi; UmaDevi, K R; Anupurba, Shampa; Nema, Vijay
2017-08-18
The Mycobacterium tuberculosis (M.tb) protein kinase B (PknB), which has now been proven to be essential for the growth and survival of M.tb, is a transmembrane protein with the potential to be a good drug target. However, it is not known if this target remains conserved in otherwise resistant isolates of clinical origin. The present study describes the conservation analysis of sequences covering the inhibitor binding domain of PknB to assess if it remains conserved in susceptible and resistant clinical strains of mycobacteria picked from three different geographical areas of India. A total of 116 isolates from North, South and West India, with variable profiles of susceptibility towards streptomycin, isoniazid, rifampicin, ethambutol and ofloxacin, were used in the study. Isolates were also spoligotyped in order to find out whether the conservation pattern of the pknB gene remains consistent or differs with different spoligotypes. The impact of variation found in the study was analyzed using molecular dynamics simulations. The sequencing results for 115/116 isolates revealed the conserved nature of pknB sequences irrespective of their susceptibility status and spoligotypes. The only variation found was in one strain, in which the pknB sequence had a G to A mutation at position 664, translating into an amino acid change from valine to isoleucine. After analyzing the impact of this sequence variation using molecular dynamics simulations, it was observed that the variation causes no significant change in protein structure or inhibitor binding. Hence, the study endorses that PknB is an ideal target for drug development and that there is no pre-existing or induced resistance with respect to the sequences involved in inhibitor binding. Also, if the mutation that we are reporting for the first time is found again in subsequent work, it should be checked against the phenotypic profile before drawing the conclusion that it would affect the activity in any way.
Bioinformatics analysis in our study says that it has no significant effect on the binding and hence the activity of the protein.
Donaldson, Michael E; Rico, Yessica; Hueffer, Karsten; Rando, Halie M; Kukekova, Anna V; Kyle, Christopher J
2018-01-01
Pathogens are recognized as major drivers of local adaptation in wildlife systems. By determining which gene variants are favored in local interactions among populations with and without disease, spatially explicit adaptive responses to pathogens can be elucidated. Much of our current understanding of host responses to disease comes from a small number of genes associated with an immune response. High-throughput sequencing (HTS) technologies, such as genotype-by-sequencing (GBS), facilitate expanded explorations of genomic variation among populations. Hybridization-based GBS techniques can be leveraged in systems not well characterized for specific variants associated with disease outcome to "capture" specific genes and regulatory regions known to influence expression and disease outcome. We developed a multiplexed, sequence capture assay for red foxes to simultaneously assess ~300-kbp of genomic sequence from 116 adaptive, intrinsic, and innate immunity genes of predicted adaptive significance and their putative upstream regulatory regions along with 23 neutral microsatellite regions to control for demographic effects. The assay was applied to 45 fox DNA samples from Alaska, where three arctic rabies strains are geographically restricted and endemic to coastal tundra regions, yet absent from the boreal interior. The assay provided 61.5% on-target enrichment with relatively even sequence coverage across all targeted loci and samples (mean = 50×), which allowed us to elucidate genetic variation across introns, exons, and potential regulatory regions (4,819 SNPs). Challenges remained in accurately describing microsatellite variation using this technique; however, longer-read HTS technologies should overcome these issues. We used these data to conduct preliminary analyses and detected genetic structure in a subset of red fox immune-related genes between regions with and without endemic arctic rabies. 
This assay provides a template to assess immunogenetic variation in wildlife disease systems.
Robert E. Farmer
1966-01-01
Flowering of Populus deltoides Bartr. occurred from early March to early April; differences between trees within stands accounted for 98 percent of the significant variation in dates. High correlation (r = .91 to .96) between 1963 and 1964 dates of individual trees indicated that trees within stands flower in a predictable sequence. Seed dispersal...
Saranathan, Vinodkumar; Hamilton, Deborah; Powell, George V N; Kroodsma, Donald E; Prum, Richard O
2007-09-01
Vocal learning is thought to have evolved in three clades of birds (parrots, hummingbirds, and oscine passerines), and three clades of mammals (whales, bats, and primates). Behavioural data indicate that, unlike other suboscine passerines, the three-wattled bellbird Procnias tricarunculata (Cotingidae) is capable of vocal learning. Procnias tricarunculata shows conspicuous vocal ontogeny, striking geographical variation in song, and rapid temporal change in song within a population. Deprivation studies of vocal development in P. tricarunculata are impractical. Here, we report evidence from mitochondrial DNA sequences and nuclear microsatellite loci that genetic variation within and among the four allopatric breeding populations of P. tricarunculata is not congruent with variation in vocal behaviour. Sequences of the mitochondrial DNA control region document extensive haplotype sharing among localities and song types, and no phylogenetic resolution of geographical populations or behavioural groups. The vocally differentiated, allopatric breeding populations of P. tricarunculata are only weakly genetically differentiated populations, and are not distinct taxa. Mitochondrial DNA and microsatellite variation show small (2.9% and 13.5%, respectively) but significant correlation with geographical distance, but no significant residual variation by song type. Estimates of the strength of selection that would be needed to maintain the observed geographical pattern in vocal differentiation if songs were genetically based are unreasonably high, further discrediting the hypothesis of a genetic origin of vocal variation. These data support a fourth, phylogenetically independent origin of avian vocal learning in Procnias. Geographical variations in P. tricarunculata vocal behaviour are likely culturally evolved dialects.
Viswanathan, R; Balamuralikrishnan, M; Karuppaiah, R
2008-12-01
Sugarcane yellow leaf virus (SCYLV) that causes yellow leaf disease (YLD) in sugarcane (recently reported in India) belongs to Polerovirus. Detailed studies were conducted to characterize the virus based on partial open reading frames (ORFs) 1 and 2 and complete ORFs 3 and 4 sequences in their genome. Reverse-transcriptase polymerase chain reaction (RT-PCR) was performed on 48 sugarcane leaf samples to detect the virus using a specific set of primers. Of the 48 samples, 36 samples (field samples with and without foliar symptoms) including 10 meristem culture derived plants were found to be positive to SCYLV infection. Additionally, an aphid colony collected from symptomatic sugarcane in the field was also found to be SCYLV positive. The amplicons from 22 samples were cloned, sequenced and acronymed as SCYLV-CB isolates. The nucleotide (nt) and amino acid (aa) sequence comparison showed a significant variation between SCYLV-CB and the database sequences at nt (3.7-5.1%) and aa (3.2-5.3%) sequence level in the CP coding region. However, the database sequences comprising isolates of three reported genotypes, viz., BRA, PER and REU, were observed with least nt and aa sequence dissimilarities (0.0-1.6%). The phylogenetic analyses of the overlapping ORFs (ORF 3 and ORF 4) of SCYLV encoding CP and MP determined in this study and additional sequences of 26 other isolates including an Indian isolate (SCYLV-IND) available from GenBank were distributed in four phylogenetic clusters. The SCYLV-CB isolates from this study lineated in two clusters (C1 and C2) and all the other isolates from the worldwide locations into another two clusters (C3 and C4). The sequence variation of the isolates in this study with the database isolates, even in the least variable region of the SCYLV genome, showed that the population existing in India is significantly different from rest of the world. 
Further, comparison of partial sequences encoding for ORFs 1 and 2 revealed that YLD in sugarcane in India is caused by at least three genotypes, viz., CUB, IND and BRA-PER, of which a majority of the samples were found infected with Cuban genotype (CUB) and lesser by IND and BRA-PER genotypes. The genotype IND was identified as a new genotype from this study, and this was found to have significant variation with the reported genotypes.
Quantification of the tissue-culture induced variation in barley (Hordeum vulgare L.)
Bednarek, Piotr T; Orłowska, Renata; Koebner, Robert MD; Zimny, Janusz
2007-01-01
Background: When plant tissue is passaged through in vitro culture, many regenerated plants appear to be no longer clonal copies of their donor genotype. Among the factors that affect this so-called tissue culture induced variation are explant genotype, explant tissue origin, medium composition, and the length of time in culture. Variation is understood to be generated via a combination of genetic and/or epigenetic changes. A lack of any phenotypic variation between regenerants does not necessarily imply a concomitant lack of genetic (or epigenetic) change, and it is therefore of interest to assay the outcomes of tissue culture at the genotypic level. Results: A variant of methylation sensitive AFLP, based on the isoschizomeric combinations Acc65I/MseI and KpnI/MseI was applied to analyze, at both the sequence and methylation levels, the outcomes of regeneration from tissue culture in barley. Both sequence mutation and alteration in methylation pattern were detected. Two sets of regenerants from each of five DH donor lines were compared. One set was derived via androgenesis, and the other via somatic embryogenesis, developed from immature embryos. These comparisons delivered a quantitative assessment of the various types of somaclonal variation induced. The average level of variation was 6%, of which almost 1.7% could be accounted for by nucleotide mutation, and the remainder by changes in methylation state. The nucleotide mutation rates and the rate of epimutations were substantially similar between the andro- and embryo-derived sets of regenerants across all the donors. Conclusion: We have developed an AFLP based approach that is capable of describing the qualitative and quantitative characteristics of the tissue culture-induced variation.
We believe that this approach will find particular value in the study of patterns of inheritance of somaclonal variation, since non-heritable variation is of little interest for the improvement of plant species which are sexually propagated. Of significant biological interest is the conclusion that the mode of regeneration has no significant effect on the balance between sequence and methylation state change induced by the tissue culture process. PMID:17335560
Mercenaro, Luca; Nieddu, Giovanni; Porceddu, Andrea; Pezzotti, Mario; Camiolo, Salvatore
2017-01-01
The genetic diversity among grapevine (Vitis vinifera L.) cultivars that underlies differences in agronomic performance and wine quality reflects the accumulation of single nucleotide polymorphisms (SNPs) and small indels as well as larger genomic variations. A combination of high throughput sequencing and mapping against the grapevine reference genome allows the creation of comprehensive sequence variation maps. We used next generation sequencing and bioinformatics to generate an inventory of SNPs and small indels in four widely cultivated Sardinian grape cultivars (Bovale sardo, Cannonau, Carignano and Vermentino). More than 3,200,000 SNPs were identified with high statistical confidence. Some of the SNPs caused the appearance of premature stop codons and thus identified putative pseudogenes. The analysis of SNP distribution along chromosomes led to the identification of large genomic regions with uninterrupted series of homozygous SNPs. We used a digital comparative genomic hybridization approach to identify 6526 genomic regions with significant differences in copy number among the four cultivars compared to the reference sequence, including 81 regions shared between all four cultivars and 4953 specific to single cultivars (representing 1.2 and 75.9% of total copy number variation, respectively). Reads mapping at a distance that was not compatible with the insert size were used to identify a dataset of putative large deletions with cultivar Cannonau revealing the highest number. The analysis of genes mapping to these regions provided a list of candidates that may explain some of the phenotypic differences among the Bovale sardo, Cannonau, Carignano and Vermentino cultivars. PMID:28775732
Robustness of Fat Quantification using Chemical Shift Imaging
Hansen, Katie H; Schroeder, Michael E; Hamilton, Gavin; Sirlin, Claude B; Bydder, Mark
2011-01-01
The purpose of this study was to investigate the effect of parameter changes that can potentially lead to unreliable measurements in fat quantification. Chemical shift imaging was performed using spoiled gradient echo sequences with systematic variations in the following: 2D/3D sequence, number of echoes, delta echo time, fractional echo factor, slice thickness, repetition time, flip angle, bandwidth, matrix size, flow compensation and field strength. Results indicated no significant (or significant but small) changes in fat fraction with changes in these parameters. The significant changes can be attributed to known effects of T1 bias and the two forms of noise bias. PMID:22055856
2017 Valparaíso earthquake sequence and the megathrust patchwork of central Chile
NASA Astrophysics Data System (ADS)
Nealy, Jennifer L.; Herman, Matthew W.; Moore, Ginevra L.; Hayes, Gavin P.; Benz, Harley M.; Bergman, Eric A.; Barrientos, Sergio E.
2017-09-01
In April 2017, a sequence of earthquakes offshore Valparaíso, Chile, raised concerns of a potential megathrust earthquake in the near future. The largest event in the 2017 sequence was a M6.9 on 24 April, seemingly colocated with the last great-sized earthquake in the region—a M8.0 in March 1985. The history of large earthquakes in this region shows significant variation in rupture size and extent, typically highlighted by a juxtaposition of large ruptures interspersed with smaller magnitude sequences. We show that the 2017 sequence ruptured an area between the two main slip patches during the 1985 earthquake, rerupturing a patch that had previously slipped during the October 1973 M6.5 earthquake sequence. A significant gap in historic ruptures exists directly to the south of the 2017 sequence, with large enough moment deficit to host a great-sized earthquake in the near future, if it is locked.
West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N
2014-07-01
The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. 
We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of closely related organisms, and discuss how it could be extended to future studies of multilocus rDNA systems. [concerted evolution; genome hybridisation; phylogenetic analysis; ribosomal DNA; whole genome sequencing; yeast]. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Bellissimo, Daniel B; Christopherson, Pamela A; Flood, Veronica H; Gill, Joan Cox; Friedman, Kenneth D; Haberichter, Sandra L; Shapiro, Amy D; Abshire, Thomas C; Leissinger, Cindy; Hoots, W Keith; Lusher, Jeanne M; Ragni, Margaret V; Montgomery, Robert R
2012-03-01
Diagnosis and classification of VWD is aided by molecular analysis of the VWF gene. Because VWF polymorphisms have not been fully characterized, we performed VWF laboratory testing and gene sequencing of 184 healthy controls with a negative bleeding history. The controls included 66 (35.9%) African Americans (AAs). We identified 21 new sequence variations, 13 (62%) of which occurred exclusively in AAs and 2 (G967D, T2666M) that were found in 10%-15% of the AA samples, suggesting they are polymorphisms. We identified 14 sequence variations reported previously as VWF mutations, the majority of which were type 1 mutations. These controls had VWF Ag levels within the normal range, suggesting that these sequence variations might not always reduce plasma VWF levels. Eleven mutations were found in AAs, and the frequency of M740I, H817Q, and R2185Q was 15%-18%. Ten AA controls had the 2N mutation H817Q; 1 was homozygous. The average factor VIII level in this group was 99 IU/dL, suggesting that this variation may confer little or no clinical symptoms. This study emphasizes the importance of sequencing healthy controls to understand ethnic-specific sequence variations so that asymptomatic sequence variations are not misidentified as mutations in other ethnic or racial groups.
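The H817Q figures in the abstract above can be turned into a carrier frequency and an allele frequency. This is a hedged sketch: it assumes all 66 African American controls were genotyped at this site, that the locus is autosomal and diploid, and that the quoted 15%-18% refers to carriers rather than alleles. The function name is hypothetical.

```python
def carrier_and_allele_frequency(n_individuals, n_carriers, n_homozygous):
    """Return (carrier frequency, allele frequency) for a diploid autosomal locus."""
    n_heterozygous = n_carriers - n_homozygous
    variant_alleles = n_heterozygous + 2 * n_homozygous
    carrier_freq = n_carriers / n_individuals
    allele_freq = variant_alleles / (2 * n_individuals)
    return carrier_freq, allele_freq


# 66 AA controls, 10 carriers of H817Q, 1 of them homozygous (from the abstract).
carrier_freq, allele_freq = carrier_and_allele_frequency(66, 10, 1)
print(f"carrier frequency: {carrier_freq:.1%}")
print(f"allele frequency:  {allele_freq:.1%}")
```

Under these assumptions the carrier frequency is about 15%, in line with the range quoted in the abstract, while the allele frequency is roughly half that.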
Novel variable number of tandem repeats of gibbon MAOA gene and its evolutionary significance.
Choi, Yuri; Jung, Yi-Deun; Ayarpadikannan, Selvam; Koga, Akihiko; Imai, Hiroo; Hirai, Hirohisa; Roos, Christian; Kim, Heui-Soo
2014-08-01
Variable number of tandem repeats (VNTRs) are scattered throughout the primate genome, and genetic variation in these VNTRs has accumulated during primate radiation. Here, we analyzed VNTRs upstream of the monoamine oxidase A (MAOA) gene in 11 different gibbon species. An abundance of truncated VNTR sequences and copy number differences were observed compared to those of human VNTR sequences. To better understand the biological role of these VNTRs, a luciferase activity assay was conducted and results indicated that selected VNTR sequences of the MAOA gene from human and three different gibbon species (Hylobates klossii, Hylobates lar, and Nomascus concolor) showed silencing ability. Together, these data could be useful for understanding the evolutionary history and functional significance of MAOA VNTR sequences in gibbon species.
Erickson, Robert P
2016-01-01
The advent of next generation sequencing (NGS), which uses massively parallel sequencing to perform total genome sequencing (TGS) or whole exome sequencing (WES), has led to the discovery of many causative mutations in patients with pediatric neurological disease. A surprisingly high number of these are de novo mutations which have not been inherited from either parent. For epilepsy, autism spectrum disorders (ASDs), and neuromotor disorders, including cerebral palsy, initial estimates put the frequency of causative de novo mutations at about 15% and about 10% of these are somatic. There are some shared mutated genes between these three classes of disease. Studies of copy number variation by comparative genomic hybridization (CGH) preceded the NGS approaches but they also detect de novo variation which is especially important for ASDs. There are interesting differences between the mutated genes detected by CGH and NGS. In summary, de novo mutations cause a very significant proportion of pediatric neurological disease. Copyright © 2015 Elsevier B.V. All rights reserved.
USDA-ARS's Scientific Manuscript database
Bacterial cold water disease (BCWD) causes significant economic loss in salmonid aquaculture. Using microsatellites genome scan we have previously detected significant and suggestive QTL with major effects on the phenotypic variation of survival following challenge with Flavobacterium psychrophilum...
On the variability of LSI+61 deg 303 (identical with GT 0236)
NASA Technical Reports Server (NTRS)
Tanzi, E. G.; Bignami, G. F.; Caraveo, P. A.; Maraschi, L.; Sormani, F.; Treves, A.
1982-01-01
Out of six long and six short wavelength observations, one spectrum exhibits a significant photometric variation of approximately 20%. Interpreting the continuum as due to superposition of an early B main sequence star plus a gaseous component contributing at lambda 2000 A, the wavelength dependence of the variation suggests that it derives from the latter component. The data indicate that if the observed variation is phase dependent, a minimum should occur between phases 0.8 and 0.2. However, since the variation is observed in only one spectrum, it may well be erratic.
USDA-ARS's Scientific Manuscript database
Genomic structural variations are an important source of genetic diversity. Copy number variations (CNVs), gains and losses of large regions of genomic sequence between individuals of a species, are known to be associated with both diseases and phenotypic traits. Deeply sequenced genomes are often u...
Yao, Hui; Song, Jing-Yuan; Ma, Xin-Ye; Liu, Chang; Li, Ying; Xu, Hong-Xi; Han, Jian-Ping; Duan, Li-Sheng; Chen, Shi-Lin
2009-05-01
DNA barcoding is a novel technology that uses a standard DNA sequence to facilitate species identification. Although a consensus has not been reached regarding which DNA sequences can be used as the best plant barcodes, the psbA-trnH spacer region has been tested extensively in recent years. In this study, we hypothesize that the psbA-trnH spacer regions are also effective barcodes for Dendrobium species. We have sequenced the chloroplast psbA-trnH intergenic spacers of 17 Dendrobium species to test this hypothesis. The sequences were found to be significantly different from those of other species, with percentages of variation ranging from 0.3 % to 2.3 % and an average of 1.2 %. In contrast, the intraspecific variation among the Dendrobium species studied ranged from 0 % to 0.1 %. The sequence difference between the psbA-trnH sequences of 17 Dendrobium species and one Bulbophyllum odoratissimum ranged from 2.0 % to 3.1 %, with an average of 2.5 %. Our results support the notion that the psbA-trnH intergenic spacer region could be used as a barcode to distinguish various Dendrobium species and to differentiate Dendrobium species from other adulterating species. Copyright Georg Thieme Verlag KG Stuttgart. New York.
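The percentages of variation in the abstract above are pairwise sequence differences. A minimal sketch of one common way to compute such a figure (an uncorrected p-distance over aligned positions) is given below. The abstract does not state which distance measure the authors used, so this is an assumption, and the toy sequences are invented for illustration.

```python
def percent_divergence(seq_a, seq_b, gap="-"):
    """Uncorrected pairwise difference (%) between two aligned sequences.

    Positions where either sequence has a gap are skipped (an assumption;
    other treatments of gaps are possible).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


# Hypothetical aligned fragments, not real psbA-trnH data.
toy_a = "ACGTACGTAC"
toy_b = "ACGTACGTAT"
toy_c = "ACG-ACGTAC"

print(percent_divergence(toy_a, toy_b))
print(percent_divergence(toy_a, toy_c))
```

A barcode is informative when the values between species (0.3% to 2.3% in the abstract) clearly exceed the values within a species (0% to 0.1%).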
Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing
NASA Astrophysics Data System (ADS)
Ferreira, Pedro G.; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R.; Rivas, Manuel A.; Esteve-Codina, Anna; Estivill, Xavier; Guigó, Roderic; Dermitzakis, Emmanouil; Antonarakis, Stylianos; Meitinger, Thomas; Strom, Tim M.; Palotie, Aarno; François Deleuze, Jean; Sudbrak, Ralf; Lerach, Hans; Gut, Ivo; Syvänen, Ann-Christine; Gyllensten, Ulf; Schreiber, Stefan; Rosenstiel, Philip; Brunner, Han; Veltman, Joris; Hoen, Peter A. C. T.; Jan van Ommen, Gert; Carracedo, Angel; Brazma, Alvis; Flicek, Paul; Cambon-Thomsen, Anne; Mangion, Jonathan; Bentley, David; Hamosh, Ada; Rosenstiel, Philip; Strom, Tim M.; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael
2016-09-01
Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns, and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts.
Cánovas, A; Rincón, G; Islas-Trejo, A; Jimenez-Flores, R; Laubscher, A; Medrano, J F
2013-04-01
The technological properties of milk have significant importance for the dairy industry. Citrate, a normal constituent of milk, forms one of the main buffer systems that regulate the equilibrium between Ca(2+) and H(+) ions. Higher-than-normal citrate content is associated with poor coagulation properties of milk. To identify the genes responsible for the variation of citrate content in milk in dairy cattle, the metabolic steps involved in citrate and fatty acid synthesis pathways in ruminant mammary tissue using RNA sequencing were studied. Genetic markers that could influence milk citrate content in Holstein cows were used in a marker-trait association study to establish the relationship between 74 single nucleotide polymorphisms (SNP) in 20 candidate genes and citrate content in 250 Holstein cows. This analysis revealed 6 SNP in key metabolic pathway genes [isocitrate dehydrogenase 1 (NADP+), soluble (IDH1); pyruvate dehydrogenase (lipoamide) β (PDHB); pyruvate kinase (PKM2); and solute carrier family 25 (mitochondrial carrier; citrate transporter), member 1 (SLC25A1)] significantly associated with increased milk citrate content. The amount of the phenotypic variation explained by the 6 SNP ranged from 10.1 to 13.7%. Also, genotype-combination analysis revealed the highest phenotypic variation was explained combining IDH1_23211, PDHB_5562, and SLC25A1_4446 genotypes. This specific genotype combination explained 21.3% of the phenotypic variation. The largest citrate associated effect was in the 3' untranslated region of the SLC25A1 gene, which is responsible for the transport of citrate across the mitochondrial inner membrane. This study provides an approach using RNA sequencing, metabolic pathway analysis, and association studies to identify genetic variation in functional target genes determining complex trait phenotypes. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Doddapaneni, Harshavardhan; Yao, Jiqiang; Lin, Hong; Walker, M Andrew; Civerolo, Edwin L
2006-01-01
Background: The Gram-negative, xylem-limited phytopathogenic bacterium Xylella fastidiosa is responsible for causing economically important diseases in grapevine, citrus and many other plant species. Despite its economic impact, relatively little is known about the genomic variations among strains isolated from different hosts and their influence on the population genetics of this pathogen. With the availability of genome sequence information for four strains, it is now possible to perform genome-wide analyses to identify and categorize such DNA variations and to understand their influence on strain functional divergence. Results: There are 1,579 genes and 194 non-coding homologous sequences present in the genomes of all four strains, representing a 76.2% conservation of the sequenced genome. About 60% of the X. fastidiosa unique sequences exist as tandem gene clusters of 6 or more genes. Multiple alignments identified 12,754 SNPs and 14,449 INDELs in the 1528 common genes and 20,779 SNPs and 10,075 INDELs in the 194 non-coding sequences. The average SNP frequency was 1.08 × 10⁻² per base pair of DNA and the average INDEL frequency was 2.06 × 10⁻² per base pair of DNA. On average, 60.33% of the SNPs were synonymous while 39.67% were non-synonymous. Mutations, primarily in the form of external INDELs, were the main type of sequence variation. The relative similarity between the strains was discussed according to the INDEL and SNP differences. The number of genes unique to each strain was 60 (9a5c), 54 (Dixon), 83 (Ann1) and 9 (Temecula-1). A subset of the strain-specific genes showed significant differences in terms of their codon usage and GC composition from the native genes, suggesting their xenologous origin. Tandem repeat analysis of the genomic sequences of the four strains identified associations of repeat sequences with hypothetical and phage-related functions.
Conclusion: INDELs and strain-specific genes have been identified as the main source of variation among strains, with individual strains showing different rates of genome evolution. Based on these genome comparisons, it appears that the Pierce's disease strain Temecula-1 genome represents the ancestral genome of X. fastidiosa. Results of this analysis are publicly available in the form of a web database. PMID:16948851
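The split between synonymous and non-synonymous SNPs reported above depends on whether a single-base change alters the encoded amino acid. The sketch below shows that classification for one codon, assuming the standard genetic code. It is not the authors' pipeline, and the function and constant names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon, alt_codon):
    """Label a single-base codon change as synonymous or non-synonymous."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    differences = sum(1 for r, a in zip(ref_codon, alt_codon) if r != a)
    if differences != 1:
        raise ValueError("expected codons differing at exactly one position")
    same_residue = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same_residue else "non-synonymous"


print(classify_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_snp("GAA", "GTA"))  # Glu -> Val
```

Counting each label over all coding SNPs would give proportions of the kind quoted in the abstract (60.33% and 39.67%).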
Nacheva, Elizabeth; Mokretar, Katya; Soenmez, Aynur; Pittman, Alan M; Grace, Colin; Valli, Roberto; Ejaz, Ayesha; Vattathil, Selina; Maserati, Emanuela; Houlden, Henry; Taanman, Jan-Willem; Schapira, Anthony H; Proukakis, Christos
2017-01-01
Potential bias introduced during DNA isolation is inadequately explored, although it could have significant impact on downstream analysis. To investigate this in human brain, we isolated DNA from cerebellum and frontal cortex using spin columns under different conditions, and salting-out. We first analysed DNA using array CGH, which revealed a striking wave pattern suggesting primarily GC-rich cerebellar losses, even against matched frontal cortex DNA, with a similar pattern on a SNP array. The aCGH changes varied with the isolation protocol. Droplet digital PCR of two genes also showed protocol-dependent losses. Whole genome sequencing showed GC-dependent variation in coverage with spin column isolation from cerebellum. We also extracted and sequenced DNA from substantia nigra using salting-out and phenol / chloroform. The mtDNA copy number, assessed by reads mapping to the mitochondrial genome, was higher in substantia nigra when using phenol / chloroform. We thus provide evidence for significant method-dependent bias in DNA isolation from human brain, as reported in rat tissues. This may contribute to array "waves", and could affect copy number determination, particularly if mosaicism is being sought, and sequencing coverage. Variations in isolation protocol may also affect apparent mtDNA abundance. PMID:28683077
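GC-dependent variation in coverage, as described in the abstract above, is usually examined by pairing the GC fraction of each genomic window with its mean read depth. The following is a minimal sketch under that assumption. The window size, the input layout (one depth value per base) and all names are hypothetical, and the toy data are invented.

```python
def gc_fraction(seq):
    """Fraction of G/C among called bases (A, C, G, T); None if no bases are called."""
    called = [b for b in seq.upper() if b in "ACGT"]
    if not called:
        return None
    return sum(b in "GC" for b in called) / len(called)


def gc_and_coverage_by_window(sequence, coverage, window):
    """Return (start, GC fraction, mean depth) for consecutive non-overlapping windows."""
    if len(sequence) != len(coverage):
        raise ValueError("sequence and coverage must have the same length")
    rows = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        depths = coverage[start:start + window]
        rows.append((start, gc_fraction(chunk), sum(depths) / window))
    return rows


# Toy example: a GC-rich window with low depth followed by an AT-rich window.
toy_sequence = "GGCCGGCCATATATAT"
toy_coverage = [5] * 8 + [10] * 8
rows = gc_and_coverage_by_window(toy_sequence, toy_coverage, window=8)
for row in rows:
    print(row)
```

Plotting mean depth against GC fraction across many windows would reveal the kind of protocol-dependent trend the abstract reports.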
Kinkar, Liina; Laurimäe, Teivi; Simsek, Sami; Balkaya, Ibrahim; Casulli, Adriano; Manfredi, Maria Teresa; Ponce-Gordo, Francisco; Varcasia, Antonio; Lavikainen, Antti; González, Luis Miguel; Rehbein, Steffen; van der Giessen, Joke; Sprong, Hein; Saarma, Urmas
2016-11-01
Echinococcus granulosus is the causative agent of cystic echinococcosis. The disease is a significant global public health concern and human infections are most commonly associated with E. granulosus sensu stricto (s. s.) genotype G1. The objectives of this study were to: (i) analyse the genetic variation and phylogeography of E. granulosus s. s. G1 in part of its main distribution range in Europe using 8274 bp of mtDNA; (ii) compare the results with those derived from previously used shorter mtDNA sequences and highlight the major differences. We sequenced a total of 91 E. granulosus s. s. G1 isolates from six different intermediate host species, including humans. The isolates originated from seven countries, primarily Turkey, Italy and Spain. A few samples were also from Albania, Greece, Romania and from a patient originating from Algeria, but diagnosed in Finland. The analysed 91 sequences were divided into 83 haplotypes, revealing complex phylogeography and high genetic variation of E. granulosus s. s. G1 in Europe, particularly in the high-diversity domestication centre of western Asia. Comparisons with shorter mtDNA datasets revealed that 8274 bp sequences provided significantly higher phylogenetic resolution and thus more power to reveal the genetic relations between different haplotypes.
Cavusoglu, M; Ciloglu, T; Serinagaoglu, Y; Kamasak, M; Erogul, O; Akcam, T
2008-08-01
In this paper, 'snore regularity' is studied in terms of the variations of snoring sound episode durations, separations and average powers in simple snorers and in obstructive sleep apnoea (OSA) patients. The goal was to explore the possibility of distinguishing among simple snorers and OSA patients using only sleep sound recordings of individuals and to ultimately eliminate the need for spending a whole night in the clinic for polysomnographic recording. Sequences that contain snoring episode durations (SED), snoring episode separations (SES) and average snoring episode powers (SEP) were constructed from snoring sound recordings of 30 individuals (18 simple snorers and 12 OSA patients) who were also under polysomnographic recording in Gülhane Military Medical Academy Sleep Studies Laboratory (GMMA-SSL), Ankara, Turkey. Snore regularity is quantified in terms of mean, standard deviation and coefficient of variation values for the SED, SES and SEP sequences. In all three of these sequences, OSA patients' data displayed a higher variation than those of simple snorers. To exclude the effects of slow variations in the base-line of these sequences, new sequences that contain the coefficient of variation of the sample values in a 'short' signal frame, i.e., short time coefficient of variation (STCV) sequences, were defined. The mean, the standard deviation and the coefficient of variation values calculated from the STCV sequences displayed a stronger potential to distinguish among simple snorers and OSA patients than those obtained from the SED, SES and SEP sequences themselves. Spider charts were used to jointly visualize the three parameters, i.e., the mean, the standard deviation and the coefficient of variation values of the SED, SES and SEP sequences, and the corresponding STCV sequences as two-dimensional plots. 
Our observations showed that the statistical parameters obtained from the SED and SES sequences, and the corresponding STCV sequences, possessed a strong potential to distinguish among simple snorers and OSA patients, both marginally, i.e., when the parameters are examined individually, and jointly. The parameters obtained from the SEP sequences and the corresponding STCV sequences, on the other hand, did not have a strong discrimination capability. However, the joint behaviour of these parameters showed some potential to distinguish among simple snorers and OSA patients.
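The regularity measures described above (mean, standard deviation and coefficient of variation of a sequence, plus a short time coefficient of variation computed frame by frame) can be sketched as follows. The abstract does not give the frame length, the overlap between frames, or whether a population or sample standard deviation was used, so those choices here are assumptions, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / m


def stcv_sequence(values, frame_length, hop=1):
    """Coefficient of variation in each short frame of a sequence (STCV).

    frame_length and hop are illustrative parameters, not values from the paper.
    """
    return [
        coefficient_of_variation(values[i:i + frame_length])
        for i in range(0, len(values) - frame_length + 1, hop)
    ]


# Invented snoring episode durations in seconds, for illustration only.
regular = [1.0, 1.0, 1.0, 1.0]
irregular = [1.0, 3.0, 1.0, 3.0]

print(coefficient_of_variation(regular))
print(coefficient_of_variation(irregular))
print(stcv_sequence(irregular, frame_length=2))
```

The mean, standard deviation and coefficient of variation of the resulting STCV list would then serve as features, which is how the abstract describes removing slow baseline drift from the comparison.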
Karpyak, Victor M; Kim, Jeong-Hyun; Biernacka, Joanna M; Wieben, Eric D; Mrazek, David A; Black, John L; Choi, Doo-Sup
2009-04-01
Mpdz gene variations are known contributors to acute alcohol withdrawal severity and seizures in mice. To investigate the relevance of these findings for human alcoholism, we resequenced 46 exons, exon-intron boundaries, and 2 kilobases in the 5' region of the human MPDZ gene in 61 subjects with a history of alcohol withdrawal seizures (AWS), 59 subjects with a history of alcohol withdrawal without AWS, and 64 Coriell samples from self-reported nonalcoholic subjects [all European American (EA) ancestry] and compared with the Mpdz sequences of 3 mouse strains with different propensity to AWS. To explore potential associations of the human MPDZ gene with alcoholism and AWS, single SNP and haplotype analyses were performed using 13 common variants. Sixty-seven new, mostly rare variants were discovered in the human MPDZ gene. Sequence comparison revealed that the human gene does not have variations identical to those comprising the Mpdz gene haplotype associated with AWS in mice. We also found no significant association between MPDZ haplotypes and AWS in humans. However, a global test of haplotype association revealed a significant difference in haplotype frequencies between alcohol-dependent subjects without AWS and Coriell controls (p = 0.015), suggesting a potential role of MPDZ in alcoholism and/or related phenotypes other than AWS. Haplotype-specific tests for the most common haplotypes (frequency > 0.05) revealed a specific high-risk haplotype (p = 0.006, maximum statistic p = 0.051) containing the rs13297480G allele, which was also found to be significantly more prevalent in alcoholics without AWS compared with nonalcoholic Coriell subjects (p = 0.019). Sequencing of the MPDZ gene in individuals with EA ancestry revealed no variations in the sites identical to those associated with AWS in mice. Exploratory haplotype and single SNP association analyses suggest a possible association between the MPDZ gene and alcohol dependence but not AWS. 
Further functional genomic analysis of MPDZ variants and investigation of their association with a broader array of alcoholism-related phenotypes could reveal additional genetic markers of alcoholism.
RefSeq microbial genomes database: new representation and annotation strategy.
Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris; O'Neill, Kathleen; Tolstoy, Igor
2014-01-01
The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.
Zhou, Daniel X M; Chan, Paul K S; Zhang, Tiejun; Tully, Damien C; Tam, John S
2010-10-01
Studies on the association between sequence variability of the interferon sensitivity-determining region (ISDR) of hepatitis C virus and the outcome of treatment have reached conflicting results. In this study, 25 patients infected with HCV 6a who had received interferon-alpha/ribavirin combination treatment were analyzed for sequence variations. Fourteen of them had full genome sequences obtained from a previous study, whereas the other 11 samples were sequenced for the extended ISDR (eISDR). This eISDR fragment covers 192 bp (64 amino acids) upstream and 201 bp (67 amino acids) downstream from the ISDR previously defined for HCV 1b. The comparison between interferon-alpha resistance and response groups for the amino acid mutations located in the full genome (6 and 8 patients respectively) as well as the mutations located in the eISDR (10 and 15 patients respectively) showed that the mutations I2160V, I2256V, V2292I (P<0.05) within eISDR were significantly associated with resistance to treatment. However, the extent of amino acid variations within previously defined ISDR was not associated with resistance to treatment as previously reported. Four amino acid variations, I248V (P=0.03-0.06) within E1, R445K (P=0.02-0.05) and S747T (P=0.03) within E2, and I861V (P=0.01) within NS2, which are located outside the eISDR, may also be associated with treatment outcome, as identified by a prescreening of variations within 14 HCV 6a full genomes. (c) 2010 Elsevier B.V. All rights reserved.
Chaisi, Mamohale E; Collins, Nicola E; Potgieter, Fred T; Oosthuizen, Marinda C
2013-01-16
The African buffalo (Syncerus caffer) is a natural reservoir host for both pathogenic and non-pathogenic Theileria species. These often occur naturally as mixed infections in buffalo. Although the benign and mildly pathogenic forms do not have any significant economic importance, their presence could complicate the interpretation of diagnostic test results aimed at the specific diagnosis of the pathogenic Theileria parva in cattle and buffalo in South Africa. The 18S rRNA gene has been used as the target in a quantitative real-time PCR (qPCR) assay for the detection of T. parva infections. However, the extent of sequence variation within this gene in the non-pathogenic Theileria spp. of the African buffalo is not well known. The aim of this study was, therefore, to characterise the full-length 18S rRNA genes of Theileria mutans, Theileria sp. (strain MSD) and T. velifera and to determine the possible influence of any sequence variation on the specific detection of T. parva using the 18S rRNA qPCR. The reverse line blot (RLB) hybridization assay was used to select samples which either tested positive for several different Theileria spp., or which hybridised only with the Babesia/Theileria genus-specific probe and not with any of the Babesia or Theileria species-specific probes. The full-length 18S rRNA genes from 14 samples, originating from 13 buffalo and one bovine from different localities in South Africa, were amplified, cloned and the resulting recombinants sequenced. Variations in the 18S rRNA gene sequences were identified in T. mutans, Theileria sp. (strain MSD) and T. velifera, with the greatest diversity observed amongst the T. mutans variants. This variation possibly explained why the RLB hybridization assay failed to detect T. mutans and T. velifera in some of the analysed samples. Copyright © 2012 Elsevier B.V. All rights reserved.
Major histocompatibility complex variation in the endangered Przewalski's horse.
Hedrick, P W; Parker, K M; Miller, E L; Miller, P S
1999-01-01
The major histocompatibility complex (MHC) is a fundamental part of the vertebrate immune system, and the high variability in many MHC genes is thought to play an essential role in recognition of parasites. The Przewalski's horse is extinct in the wild and all the living individuals descend from 13 founders, most of whom were captured around the turn of the century. One of the primary genetic concerns in endangered species is whether they have ample adaptive variation to respond to novel selective factors. In examining 14 Przewalski's horses that are broadly representative of the living animals, we found six different class II DRB major histocompatibility sequences. The sequences showed extensive nonsynonymous variation, concentrated in the putative antigen-binding sites, and little synonymous variation. Individuals had from two to four sequences as determined by single-stranded conformation polymorphism (SSCP) analysis. On the basis of the SSCP data, phylogenetic analysis of the nucleotide sequences, and segregation in a family group, we conclude that four of these sequences are from one gene (although one sequence codes for a nonfunctional allele because it contains a stop codon) and two other sequences are from another gene. The position of the stop codon is at the same amino-acid position as in a closely related sequence from the domestic horse. Because other organisms have extensive variation at homologous loci, the Przewalski's horse may have quite low variation in this important adaptive region. PMID:10430594
Fogt-Wyrwas, R; Mizgajska-Wiktor, H; Pacoń, J; Jarosz, W
2013-12-01
Some parasitic nematodes can inhabit different definitive hosts, which raises the question of the intraspecific variability of the nematode genotype affecting their preferences to choose particular species as hosts. Additionally, the issue of a possible intraspecific DNA microheterogeneity in specimens from different parts of the world seems to be interesting, especially from the evolutionary point of view. The problem was analysed in three related species - Toxocara canis, Toxocara cati and Toxascaris leonina - specimens originating from Central Europe (Poland). Using specific primers for species identification, internal transcribed spacer (ITS)-1 and ITS-2 regions were amplified and then sequenced. The sequences obtained were compared with sequences previously described for specimens originating from other geographical locations. No differences in nucleotide sequences were established in T. canis isolated from two different hosts (dogs and foxes). A comparison of ITS sequences of T. canis from Poland with sequences deposited in GenBank showed that the scope of intraspecific variability of the species did not exceed 0.4%, while in T. cati the differences did not exceed 2%. Significant differences were found in T. leonina, where ITS-1 differed by 3% and ITS-2 by as much as 7.4% in specimens collected from foxes in Poland and dogs in Australia. Such scope of differences in the nucleotide sequence seems to exceed the intraspecific variation of the species.
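The percentages quoted in this abstract (0.4%, 2%, 3%, 7.4%) are the kind of figure obtained by comparing aligned sequences column by column. The following is a minimal Python sketch under stated assumptions: the sequences are already aligned to equal length, gapped columns are skipped, and the function name and toy sequences are hypothetical. The abstract does not say how the authors computed their values, so this is an illustration rather than their procedure.

```python
def percent_difference(seq_a, seq_b, gap="-"):
    """Percentage of mismatching columns between two pre-aligned sequences.

    Columns where either sequence has a gap are ignored (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * mismatches / compared


# Toy ITS fragments (made up): one mismatch in ten columns.
its_local = "ACGTACGTAC"
its_database = "ACGTACGTAT"
print(percent_difference(its_local, its_database))
```

Under this convention, a 7.4% difference over an ITS-2 region would correspond to roughly 7 mismatching columns per 100 compared.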
Using chaos to generate variations on movement sequences
NASA Astrophysics Data System (ADS)
Bradley, Elizabeth; Stuart, Joshua
1998-12-01
We describe a method for introducing variations into predefined motion sequences using a chaotic symbol-sequence reordering technique. A progression of symbols representing the body positions in a dance piece, martial arts form, or other motion sequence is mapped onto a chaotic trajectory, establishing a symbolic dynamics that links the movement sequence and the attractor structure. A variation on the original piece is created by generating a trajectory with slightly different initial conditions, inverting the mapping, and using special corpus-based graph-theoretic interpolation schemes to smooth any abrupt transitions. Sensitive dependence guarantees that the variation is different from the original; the attractor structure and the symbolic dynamics guarantee that the two resemble one another in both aesthetic and mathematical senses.
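The mapping described in this abstract can be illustrated with a small Python sketch. It rests on several assumptions that are not taken from the paper: the Lorenz system stands in for the chaotic attractor, each symbol is attached to one sampled point of a reference trajectory, and a perturbed trajectory is decoded by nearest-neighbour lookup. The corpus-based interpolation step is omitted, and all function names and parameter values are hypothetical.

```python
def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system by one fourth-order Runge-Kutta step."""
    def f(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shifted(s, k, h):
        return tuple(a + h * b for a, b in zip(s, k))

    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def trajectory(start, n_points, stride=20):
    """Sample n_points states, taking `stride` integration steps between samples."""
    points = []
    state = start
    for _ in range(n_points):
        for _ in range(stride):
            state = lorenz_step(state)
        points.append(state)
    return points


def chaotic_variation(sequence, start=(1.0, 1.0, 20.0), eps=1e-3, stride=20):
    """Reorder `sequence` by following a trajectory with a perturbed start."""
    # Reference trajectory: point i carries symbol i of the original piece.
    reference = trajectory(start, len(sequence), stride)
    # Variation trajectory: same system, slightly different initial condition.
    perturbed = trajectory((start[0] + eps, start[1], start[2]), len(sequence), stride)
    variation = []
    for point in perturbed:
        # Invert the mapping: emit the symbol of the closest reference point.
        nearest = min(
            range(len(reference)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(point, reference[i])),
        )
        variation.append(sequence[nearest])
    return variation


moves = ["bow", "step", "turn", "kick", "block", "leap", "spin", "rest"]
print(chaotic_variation(moves))
```

With `eps=0.0` the two trajectories coincide and the original sequence is returned unchanged. How far a variation departs from the original depends on `eps`, `stride` and the sequence length, so those values would need tuning for any real piece.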
Cornman, Robert Scott; Boncristiani, Humberto; Dainat, Benjamin; Chen, Yanping; vanEngelsdorp, Dennis; Weaver, Daniel; Evans, Jay D
2013-03-07
Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li's D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li's D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low FST.
This initial survey of genetic variation within honey bee RNA viruses suggests future directions for studies examining the underlying causes of population-genetic structure in these economically important pathogens. PMID:23497218
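The per-site FST values mentioned in this abstract compare the diversity within populations to the diversity of the pooled population. As a rough illustration only, here is a Python sketch of the textbook definition FST = (H_T - H_S) / H_T for one biallelic site and two equally weighted populations. The function names are hypothetical, and the abstract does not state which estimator the authors applied to their read data.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) for a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Per-site FST for two equally weighted populations (an assumption)."""
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0  # mean within-population diversity
    p_bar = (p1 + p2) / 2.0                                # pooled allele frequency
    h_t = heterozygosity(p_bar)                            # total diversity
    if h_t == 0.0:
        return 0.0  # monomorphic site: no differentiation to measure
    return (h_t - h_s) / h_t


print(fst_two_populations(0.5, 0.5))  # identical frequencies
print(fst_two_populations(0.0, 1.0))  # fixed for different alleles
print(fst_two_populations(0.2, 0.8))  # intermediate differentiation
```

Sites where both populations carry similar allele frequencies give values near 0, which is the sense in which the IAPV outlier sites have "very low FST".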
Genetic Variation in Cardiomyopathy and Cardiovascular Disorders.
McNally, Elizabeth M; Puckelwartz, Megan J
2015-01-01
With the wider deployment of massively-parallel, next-generation sequencing, it is now possible to survey human genome data for research and clinical purposes. The reduced cost of producing short-read sequencing has now shifted the burden to data analysis. Analysis of genome sequencing remains challenged by the complexity of the human genome, including redundancy and the repetitive nature of genome elements and the large amount of variation in individual genomes. Public databases of human genome sequences greatly facilitate interpretation of common and rare genetic variation, although linking database sequence information to detailed clinical information is limited by privacy and practical issues. Genetic variation is a rich source of knowledge for cardiovascular disease because many, if not all, cardiovascular disorders are highly heritable. The role of rare genetic variation in predicting risk and complications of cardiovascular diseases has been well established for hypertrophic and dilated cardiomyopathy, where the number of genes that are linked to these disorders is growing. Bolstered by family data, where genetic variants segregate with disease, rare variation can be linked to specific genetic variation that offers profound diagnostic information. Understanding genetic variation in cardiomyopathy is likely to help stratify forms of heart failure and guide therapy. Ultimately, genetic variation may be amenable to gene correction and gene editing strategies.
[Hydrologic variability and sensitivity based on Hurst coefficient and Bartels statistic].
Lei, Xu; Xie, Ping; Wu, Zi Yi; Sang, Yan Fang; Zhao, Jiang Yan; Li, Bin Bin
2018-04-01
Due to global climate change and frequent human activities in recent years, the purely stochastic component of a hydrological sequence is mixed with one or several variation components, including jump, trend, period and dependency. It is urgently necessary to clarify which indices should be used to quantify the degree of this variability. In this study, we defined hydrological variability based on the Hurst coefficient and the Bartels statistic, and used Monte Carlo statistical tests to analyze their sensitivity to different types of variation. When the hydrological sequence had jump or trend variation, both the Hurst coefficient and the Bartels statistic could reflect the variation, with the Hurst coefficient being more sensitive to weak jump or trend variation. When the sequence had a period, only the Bartels statistic could detect the variation of the sequence. When the sequence had a dependency, both the Hurst coefficient and the Bartels statistic could reflect the variation, and the latter could detect weaker dependent variations. For all four types of variation, both the Hurst variability and the Bartels variability increased as the variation range increased. Thus, they could be used to measure the variation intensity of a hydrological sequence. We analyzed the temperature series of different weather stations in the Lancang River basin. Results showed that the temperature at all stations showed an upward trend or jump, indicating that the entire basin had experienced warming in recent years and that the temperature variability in the upper and lower reaches was much higher. This case study showed the practicability of the proposed method.
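The two indices named in this abstract can be computed in several ways; the abstract does not give the authors' exact definitions of "Hurst variability" or "Bartels variability". The Python sketch below shows one common form of each, as an assumption: the Hurst coefficient estimated by rescaled-range (R/S) analysis, and the Bartels statistic as the rank version of von Neumann's ratio. Function names and window sizes are hypothetical.

```python
import math


def hurst_rs(series, window_sizes=(8, 16, 32, 64)):
    """Estimate the Hurst coefficient as the slope of log(R/S) against log(n)."""
    log_n, log_rs = [], []
    for n in window_sizes:
        ratios = []
        for start in range(0, len(series) - n + 1, n):
            chunk = series[start:start + n]
            mean = sum(chunk) / n
            cum, walk = 0.0, []
            for value in chunk:
                cum += value - mean          # cumulative deviation from the chunk mean
                walk.append(cum)
            r = max(walk) - min(walk)        # range of the cumulative deviations
            s = math.sqrt(sum((v - mean) ** 2 for v in chunk) / n)
            if s > 0:
                ratios.append(r / s)
        if ratios:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(ratios) / len(ratios)))
    if len(log_n) < 2:
        raise ValueError("need at least two usable window sizes")
    mx = sum(log_n) / len(log_n)
    my = sum(log_rs) / len(log_rs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_n, log_rs))
    sxx = sum((x - mx) ** 2 for x in log_n)
    return sxy / sxx                         # least-squares slope


def bartels_rvn(series):
    """Rank von Neumann ratio; values near 2 are expected for a random sequence."""
    n = len(series)
    order = sorted(range(n), key=lambda i: series[i])
    ranks = [0.0] * n
    i = 0
    while i < n:                             # assign average ranks to ties
        j = i
        while j + 1 < n and series[order[j + 1]] == series[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    mean_rank = (n + 1) / 2.0
    numerator = sum((ranks[t] - ranks[t + 1]) ** 2 for t in range(n - 1))
    denominator = sum((r - mean_rank) ** 2 for r in ranks)
    if denominator == 0:
        raise ValueError("series is constant")
    return numerator / denominator


trend = [float(t) for t in range(128)]       # a pure upward trend
print(hurst_rs(trend), bartels_rvn(trend))
```

For a pure linear trend the R/S slope comes out close to 1 and the rank ratio close to 0, both far from the values expected for an uncorrelated series (roughly 0.5 and 2). That departure is what a variability index built on either statistic would pick up.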
Diamant, Eran; Palti, Yniv; Gur-Arie, Riva; Cohen, Helit; Hallerman, Eric M.; Kashi, Yechezkel
2004-01-01
Multilocus sequencing of housekeeping genes has been used previously for bacterial strain typing and for inferring evolutionary relationships among strains of Escherichia coli. In this study, we used shorter intergenic sequences that contained simple sequence repeats (SSRs) of repeating mononucleotide motifs (mononucleotide repeats [MNRs]) to infer the phylogeny of pathogenic and commensal E. coli strains. Seven noncoding loci (four MNRs and three non-SSRs) were sequenced in 27 strains, including enterohemorrhagic (six isolates of O157:H7), enteropathogenic, enterotoxigenic, B, and K-12 strains. The four MNRs were also sequenced in 20 representative strains of the E. coli reference (ECOR) collection. Sequence polymorphism was significantly higher at the MNR loci, including the flanking sequences, indicating a higher mutation rate in the sequences flanking the MNR tracts. The four MNR loci were amplifiable by PCR in the standard ECOR A, B1, and D groups, but only one (yaiN) in the B2 group was amplified, which is consistent with previous studies that suggested that B2 is the most ancient group. High sequence compatibility was found between the four MNR loci, indicating that they are in the same clonal frame. The phylogenetic trees that were constructed from the sequence data were in good agreement with those of previous studies that used multilocus enzyme electrophoresis. The results demonstrate that MNR loci are useful for inferring phylogenetic relationships and provide much higher sequence variation than housekeeping genes. Therefore, the use of MNR loci for multilocus sequence typing should prove efficient for clinical diagnostics, epidemiology, and evolutionary study of bacteria. PMID:15066845
Xia, Shu; Kohli, Manish; Du, Meijun; Dittmar, Rachel L; Lee, Adam; Nandy, Debashis; Yuan, Tiezheng; Guo, Yongchen; Wang, Yuan; Tschannen, Michael R; Worthey, Elizabeth; Jacob, Howard; See, William; Kilari, Deepak; Wang, Xuexia; Hovey, Raymond L; Huang, Chiang-Ching; Wang, Liang
2015-06-30
Liquid biopsies, examinations of tumor components in body fluids, have shown promise for predicting clinical outcomes. To evaluate tumor-associated genomic and genetic variations in plasma cell-free DNA (cfDNA) and their associations with treatment response and overall survival, we applied whole genome and targeted sequencing to examine the plasma cfDNAs derived from 20 patients with advanced prostate cancer. Sequencing-based genomic abnormality analysis revealed locus-specific gains or losses that were common in prostate cancer, such as 8q gains, AR amplifications, PTEN losses and TMPRSS2-ERG fusions. To estimate tumor burden in cfDNA, we developed a Plasma Genomic Abnormality (PGA) score by summing the most significant copy number variations. Cox regression analysis showed that PGA scores were significantly associated with overall survival (p < 0.04). After androgen deprivation therapy or chemotherapy, targeted sequencing showed significant mutational profile changes in genes involved in androgen biosynthesis, AR activation, DNA repair, and chemotherapy resistance. These changes may reflect the dynamic evolution of heterozygous tumor populations in response to these treatments. These results strongly support the feasibility of using non-invasive liquid biopsies as potential tools to study biological mechanisms underlying therapy-specific resistance and to predict disease progression in advanced prostate cancer.
Shang, Zhi-Yuan; Wang, Jian; Zhang, Wen; Li, Yan-Yan; Cui, Ming-Xing; Chen, Zhen-Ju; Zhao, Xing-Yun
2013-01-01
Tree ring stable carbon isotope ratio (delta13C) and tree ring width were measured in the vertical direction in Pinus sylvestris var. mongolica in the northern Daxing'an Mountains of Northeast China, and the relationship between the vertical variations of tree ring delta13C and tree ring width was analyzed. In the whole ring of xylem, earlywood (EW) and bark endodermis, the delta13C first increased from the top toward the base, reached its maximum at the bottom of the tree crown, and then decreased rapidly downward to the minimum. The ratio of average EW to latewood (LW) ring width increased from the base to the top. The average annual sequence of delta13C in the vertical direction showed an obvious inverse correspondence with the average annual sequence of tree ring width, and above the tree crown its trend was broadly in line with the average annual sequence of the EW-to-LW ring width ratio. Analysis of variance showed significant differences in the sequences of tree ring delta13C and ring width in the vertical direction, and the magnitude of vertical delta13C variability was basically the same as that of the inter-annual delta13C variability. The year-to-year variation trend of the vertical delta13C sequence was approximately identical. For each sample, the delta13C sequence at a given height was negatively correlated with the ring width sequence at the same height, but the statistical significance differed with tree height.
Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G
2014-11-29
Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. 
In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.
Arias-Pulido, Hugo; Peyton, Cheri L; Torrez-Martínez, Norah; Anderson, D Nelson; Wheeler, Cosette M
2005-07-20
While HPV 16 variant lineages have been well characterized, the knowledge about HPV 18 variants is limited. In this study, HPV 18 nucleotide variations in the E2 hinge region were characterized by sequence analysis in 47 control and 51 tumor specimens. Fifty of these specimens were randomly selected for sequencing of an LCR-E6 segment and 20 samples representative of LCR-E6 and E2 sequence variants were examined across the L1 region. A total of 2770 nucleotides per HPV 18 variant genome were considered in this study. HPV 18 variant nucleotides were linked among all gene segments analyzed and grouped into three main branches: Asian-American (AA), European (E), and African (Af). These three branches were equally distributed among controls and cases and when stratified by Hispanic and non-Hispanic ethnicities. Among invasive cervical cancer cases, no significant differences in the three HPV variant branches were observed among ethnic groups or when stratified by histopathology (squamous vs. adenocarcinoma). The Af branch showed the greatest nucleotide variability when compared to the HPV 18 reference sequence and was more closely related to HPV 45 than either AA or E branches. Our data also characterize nucleotide and amino acid variations in the L1 capsid gene among HPV 18 variants, which may be relevant to vaccine strategies and subsequent studies of naturally occurring HPV 18 variants. Several novel HPV 18 nucleotide variations were identified in this study.
Nesbitt, T Clint; Tanksley, Steven D
2002-01-01
Sequence variation was sampled in cultivated and related wild forms of tomato at fw2.2--a fruit weight QTL key to the evolution of domesticated tomatoes. Variation at fw2.2 was contrasted with variation at four other loci not involved in fruit weight determination. Several conclusions could be reached: (1) Fruit weight variation attributable to fw2.2 is not caused by variation in the FW2.2 protein sequence; more likely, it is due to transcriptional variation associated with one or more of eight nucleotide changes unique to the promoter of large-fruit alleles; (2) fw2.2 and loci not involved in fruit weight have not evolved at distinguishably different rates in cultivated and wild tomatoes, despite the fact that fw2.2 was likely a target of selection during domestication; (3) molecular-clock-based estimates suggest that the large-fruit allele of fw2.2, now fixed in most cultivated tomatoes, arose in tomato germplasm long before domestication; (4) extant accessions of L. esculentum var. cerasiforme, the subspecies thought to be the most likely wild ancestor of domesticated tomatoes, appear to be an admixture of wild and cultivated tomatoes rather than a transitional step from wild to domesticated tomatoes; and (5) despite the fact that cerasiforme accessions are polymorphic for large- and small-fruit alleles at fw2.2, no significant association was detected between fruit size and fw2.2 genotypes in the subspecies--as tested by association genetic studies in the relatively small sample studied--suggesting the role of other fruit weight QTL in fruit weight variation in cerasiforme. PMID:12242247
Keel, B N; Nonneman, D J; Rohrer, G A
2017-08-01
Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a more significant effect on phenotypic variation than do other types of genetic variants. Hence, a comprehensive list of these functional variants would be of considerable interest in swine genomic studies, particularly those targeting fertility and production traits. Whole-genome sequence was obtained from 72 of the founders of an intensely phenotyped experimental swine herd at the U.S. Meat Animal Research Center (USMARC). These animals included all 24 of the founding boars (12 Duroc and 12 Landrace) and 48 Yorkshire-Landrace composite sows. Sequence reads were mapped to the Sscrofa10.2 genome build, resulting in a mean of 6.1 fold (×) coverage per genome. A total of 22 342 915 high confidence SNPs were identified from the sequenced genomes. These included 21 million previously reported SNPs and 79% of the 62 163 SNPs on the PorcineSNP60 BeadChip assay. Variation was detected in the coding sequence or untranslated regions (UTRs) of 87.8% of the genes in the porcine genome: loss-of-function variants were predicted in 504 genes, 10 202 genes contained nonsynonymous variants, 10 773 had variation in UTRs and 13 010 genes contained synonymous variants. Approximately 139 000 SNPs were classified as loss-of-function, nonsynonymous or regulatory, which suggests that over 99% of the variation detected in our pigs could potentially be ignored, allowing us to focus on a much smaller number of functional SNPs during future analyses. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
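The "over 99%" figure in this abstract follows directly from the counts it reports. The short Python check below uses only numbers stated in the abstract; the variable names are illustrative.

```python
total_snps = 22_342_915        # high confidence SNPs identified
functional_snps = 139_000      # "approximately 139 000" loss-of-function, nonsynonymous or regulatory
chip_snps = 62_163             # SNPs on the PorcineSNP60 BeadChip
chip_fraction_detected = 0.79  # share of chip SNPs found in the sequence data

functional_share = functional_snps / total_snps
print(f"functional share of all SNPs: {functional_share:.2%}")
print(f"share that could be set aside: {1 - functional_share:.2%}")
print(f"chip SNPs recovered (approx.): {round(chip_snps * chip_fraction_detected)}")
```

The functional classes make up roughly 0.6% of the detected SNPs, so more than 99% of the variation falls outside them, as the abstract states.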
A wide extent of inter-strain diversity in virulent and vaccine strains of alphaherpesviruses.
Szpara, Moriah L; Tafuri, Yolanda R; Parsons, Lance; Shamim, S Rafi; Verstrepen, Kevin J; Legendre, Matthieu; Enquist, L W
2011-10-01
Alphaherpesviruses are widespread in the human population, and include herpes simplex virus 1 (HSV-1) and 2, and varicella zoster virus (VZV). These viral pathogens cause epithelial lesions, and then infect the nervous system to cause lifelong latency, reactivation, and spread. A related veterinary herpesvirus, pseudorabies (PRV), causes similar disease in livestock that result in significant economic losses. Vaccines developed for VZV and PRV serve as useful models for the development of an HSV-1 vaccine. We present full genome sequence comparisons of the PRV vaccine strain Bartha, and two virulent PRV isolates, Kaplan and Becker. These genome sequences were determined by high-throughput sequencing and assembly, and present new insights into the attenuation of a mammalian alphaherpesvirus vaccine strain. We find many previously unknown coding differences between PRV Bartha and the virulent strains, including changes to the fusion proteins gH and gB, and over forty other viral proteins. Inter-strain variation in PRV protein sequences is much closer to levels previously observed for HSV-1 than for the highly stable VZV proteome. Almost 20% of the PRV genome contains tandem short sequence repeats (SSRs), a class of nucleic acids motifs whose length-variation has been associated with changes in DNA binding site efficiency, transcriptional regulation, and protein interactions. We find SSRs throughout the herpesvirus family, and provide the first global characterization of SSRs in viruses, both within and between strains. We find SSR length variation between different isolates of PRV and HSV-1, which may provide a new mechanism for phenotypic variation between strains. Finally, we detected a small number of polymorphic bases within each plaque-purified PRV strain, and we characterize the effect of passage and plaque-purification on these polymorphisms. 
These data add to growing evidence that even plaque-purified stocks of stable DNA viruses exhibit limited sequence heterogeneity, which likely seeds future strain evolution.
He, W; Zhao, S; Liu, X; Dong, S; Lv, J; Liu, D; Wang, J; Meng, Z
2013-12-04
Large-scale next-generation sequencing (NGS)-based resequencing detects sequence variations, constructs evolutionary histories, and identifies phenotype-related genotypes. However, NGS-based resequencing studies generate extraordinarily large amounts of data, making computations difficult. Effective use and analysis of these data for NGS-based resequencing studies remains a difficult task for individual researchers. Here, we introduce ReSeqTools, a full-featured toolkit for NGS (Illumina sequencing)-based resequencing analysis, which processes raw data, interprets mapping results, and identifies and annotates sequence variations. ReSeqTools provides abundant scalable functions for routine resequencing analysis in different modules to facilitate customization of the analysis pipeline. ReSeqTools is designed to use compressed data files as input or output to save storage space and facilitates faster and more computationally efficient large-scale resequencing studies in a user-friendly manner. It offers abundant practical functions and generates useful statistics during the analysis pipeline, which significantly simplifies resequencing analysis. Its integrated algorithms and abundant sub-functions provide a solid foundation for special demands in resequencing projects. Users can combine these functions to construct their own pipelines for other purposes.
Methodologic European external quality assurance for DNA sequencing: the EQUALseq program.
Ahmad-Nejad, Parviz; Dorn-Beineke, Alexandra; Pfeiffer, Ulrike; Brade, Joachim; Geilenkeuser, Wolf-Jochen; Ramsden, Simon; Pazzagli, Mario; Neumaier, Michael
2006-04-01
DNA sequencing is a key technique in molecular diagnostics, but to date no comprehensive methodologic external quality assessment (EQA) programs have been instituted. Between 2003 and 2005, the European Union funded, as specific support actions, the EQUAL initiative to develop methodologic EQA schemes for genotyping (EQUALqual), quantitative PCR (EQUALquant), and sequencing (EQUALseq). Here we report on the results of the EQUALseq program. The participating laboratories received a 4-sample set comprising 2 DNA plasmids, a PCR product, and a finished sequencing reaction to be analyzed. Data and information from detailed questionnaires were uploaded online and evaluated by use of a scoring system for technical skills and proficiency of data interpretation. Sixty laboratories from 21 European countries registered, and 43 participants (72%) returned data and samples. Capillary electrophoresis was the predominant platform (n = 39; 91%). The median contiguous correct sequence stretch was 527 nucleotides with considerable variation in quality of both primary data and data evaluation. The association between laboratory performance and the number of sequencing assays/year was statistically significant (P <0.05). Interestingly, more than 30% of participants neither added comments to their data nor made efforts to identify the gene sequences or mutational positions. Considerable variations exist even in a highly standardized methodology such as DNA sequencing. Methodologic EQAs are appropriate tools to uncover strengths and weaknesses in both technique and proficiency, and our results emphasize the need for mandatory EQAs. The results of EQUALseq should help improve the overall quality of molecular genetics findings obtained by DNA sequencing.
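A "contiguous correct sequence stretch" such as the 527-nucleotide median reported here can be thought of as the longest unbroken run of base calls that agree with the known reference. The Python sketch below shows that idea under assumptions not taken from the abstract: the submitted read is compared position by position with the reference, without gaps, and ambiguous `N` calls count as errors. The function name and example sequences are hypothetical, and the EQUALseq scoring rules themselves are not described in the abstract.

```python
def longest_correct_stretch(submitted, reference):
    """Length of the longest run of positions where the call matches the reference."""
    best = 0
    current = 0
    for called, expected in zip(submitted.upper(), reference.upper()):
        if called == expected and called != "N":
            current += 1
            best = max(best, current)
        else:
            current = 0  # a miscall or ambiguous base ends the run
    return best


reference = "ACGTACGTAAGC"
submitted = "ACGTNCGTAAGG"  # one ambiguous call and one miscall at the end
print(longest_correct_stretch(submitted, reference))
```

A real assessment would first have to align the read to the reference to handle insertions and deletions, which this sketch deliberately leaves out.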
The Relevance of HLA Sequencing in Population Genetics Studies
Sanchez-Mazas, Alicia
2014-01-01
Next generation sequencing (NGS) is currently being adapted by different biotechnological platforms to the standard typing method for HLA polymorphism, the huge diversity of which makes this initiative particularly challenging. Boosting the molecular characterization of the HLA genes through efficient, rapid, and low-cost technologies is expected to amplify the success of tissue transplantation by enabling us to find donor-recipient matching for rare phenotypes. But the application of NGS technologies to the molecular mapping of the MHC region also anticipates essential changes in population genetic studies. Huge amounts of HLA sequence data will be available in the next years for different populations, with the potential to change our understanding of HLA variation in humans. In this review, we first explain how HLA sequencing allows a better assessment of the HLA diversity in human populations, taking also into account the methodological difficulties it introduces at the statistical level; secondly, we show how analyzing HLA sequence variation may improve our comprehension of population genetic relationships by facilitating the identification of demographic events that marked human evolution; finally, we discuss the interest of both HLA and genome-wide sequencing and genotyping in detecting functionally significant SNPs in the MHC region, the latter having also contributed to the makeup of the HLA molecular diversity observed today. PMID:25126587
The relevance of HLA sequencing in population genetics studies.
Sanchez-Mazas, Alicia; Meyer, Diogo
2014-01-01
Next generation sequencing (NGS) is currently being adapted by different biotechnological platforms to the standard typing method for HLA polymorphism, the huge diversity of which makes this initiative particularly challenging. Boosting the molecular characterization of the HLA genes through efficient, rapid, and low-cost technologies is expected to amplify the success of tissue transplantation by enabling us to find donor-recipient matching for rare phenotypes. But the application of NGS technologies to the molecular mapping of the MHC region also anticipates essential changes in population genetic studies. Huge amounts of HLA sequence data will be available in the next years for different populations, with the potential to change our understanding of HLA variation in humans. In this review, we first explain how HLA sequencing allows a better assessment of the HLA diversity in human populations, taking also into account the methodological difficulties it introduces at the statistical level; secondly, we show how analyzing HLA sequence variation may improve our comprehension of population genetic relationships by facilitating the identification of demographic events that marked human evolution; finally, we discuss the interest of both HLA and genome-wide sequencing and genotyping in detecting functionally significant SNPs in the MHC region, the latter having also contributed to the makeup of the HLA molecular diversity observed today.
Fokkema, Ivo F A C; den Dunnen, Johan T; Taschner, Peter E M
2005-08-01
The completion of the human genome project has initiated, as well as provided the basis for, the collection and study of all sequence variation between individuals. Direct access to up-to-date information on sequence variation is currently provided most efficiently through web-based, gene-centered, locus-specific databases (LSDBs). We have developed the Leiden Open (source) Variation Database (LOVD) software approaching the "LSDB-in-a-Box" idea for the easy creation and maintenance of a fully web-based gene sequence variation database. LOVD is platform-independent and uses PHP and MySQL open source software only. The basic gene-centered and modular design of the database follows the recommendations of the Human Genome Variation Society (HGVS) and focuses on the collection and display of DNA sequence variations. With minimal effort, the LOVD platform is extendable with clinical data. The open set-up should both facilitate and promote functional extension with scripts written by the community. The LOVD software is freely available from the Leiden Muscular Dystrophy pages (www.DMD.nl/LOVD/). To promote the use of LOVD, we currently offer curators the possibility to set up an LSDB on our Leiden server. (c) 2005 Wiley-Liss, Inc.
Ekblom, Robert; Farrell, Lindsay L; Lank, David B; Burke, Terry
2012-01-01
By next generation transcriptome sequencing, it is possible to obtain data on both nucleotide sequence variation and gene expression. We have used this approach (RNA-Seq) to investigate the genetic basis for differences in plumage coloration and mating strategies in a non-model bird species, the ruff (Philomachus pugnax). Ruff males show enormous variation in the coloration of ornamental feathers, used for individual recognition. This polymorphism is linked to reproductive strategies, with dark males (Independents) defending territories on leks against other Independents, whereas white morphs (Satellites) co-occupy Independents' courts without agonistic interactions. Previous work found a strong genetic component for mating strategy, but the genes involved were not identified. We present feather transcriptome data of more than 6,000 de-novo sequenced ruff genes (although with limited coverage for many of them). None of the identified genes showed significant expression divergence between males, but many genetic markers showed nucleotide differentiation between different color morphs and mating strategies. These include several feather keratin genes, splicing factors, and the Xg blood-group gene. Many of the genes with significant genetic structure between mating strategies have not yet been annotated and their functions remain to be elucidated. We also conducted in-depth investigations of 28 pre-identified coloration candidate genes. Two of these (EDNRB and TYR) were specifically expressed in black- and rust-colored males, respectively. We have demonstrated the utility of next generation transcriptome sequencing for identifying and genotyping large numbers of genetic markers in a non-model species without previous genomic resources, and highlight the potential of this approach for addressing the genetic basis of ecologically important variation. PMID:23145334
Neocortical malformation as consequence of nonadaptive regulation of neuronogenetic sequence
NASA Technical Reports Server (NTRS)
Caviness, V. S. Jr; Takahashi, T.; Nowakowski, R. S.
2000-01-01
Variations in the structure of the neocortex induced by single gene mutations may be extreme or subtle. They differ from variations in neocortical structure encountered across and within species in that these "normal" structural variations are adaptive (both structurally and behaviorally), whereas those associated with disorders of development are not. Here we propose that they also differ in principle in that they represent disruptions of molecular mechanisms that are not normally regulatory to variations in the histogenetic sequence. We propose an algorithm for the operation of the neuronogenetic sequence in relation to the overall neocortical histogenetic sequence and highlight the restriction point of the G1 phase of the cell cycle as the master regulatory control point for normal coordinate structural variation across species and importantly within species. From considerations based on the anatomic evidence from neocortical malformation in humans, we illustrate in principle how this overall sequence appears to be disrupted by molecular biological linkages operating principally outside the control mechanisms responsible for the normal structural variation of the neocortex. MRDD Research Reviews 6:22-33, 2000. Copyright 2000 Wiley-Liss, Inc.
Verma, Kapil; Sharma, Sapna; Sharma, Arun; Dalal, Jyoti; Bhardwaj, Tapeshwar
2018-06-01
Genetic variations among humans occur both within and among populations and range from single nucleotide changes to multiple-nucleotide variants. These multiple-nucleotide variants are useful for studying the relationships among individuals or various population groups. The study of human genetic variations can help scientists understand how different population groups are biologically related to one another. Sequence analysis of hypervariable regions of human mitochondrial DNA (mtDNA) has been successfully used for the genetic characterization of different population groups for forensic purposes. It is well established that different ethnic or population groups differ significantly in their mtDNA distributions. In the last decade, very little research has been conducted on mtDNA variations in the Indian population, although such data would be useful for elucidating the history of human population expansion across the world. Moreover, forensic studies on mtDNA variations in the Indian subcontinent are also scarce, particularly in the northern part of India. In this report, variations in the hypervariable regions of mtDNA were analyzed in the Yadav population of Haryana. Different molecular diversity indices were computed. Further, the obtained haplotypes were classified into different haplogroups and the phylogenetic relationship between different haplogroups was inferred.
Buxbaum, Joseph D; Daly, Mark J; Devlin, Bernie; Lehner, Thomas; Roeder, Kathryn; State, Matthew W
2012-12-20
Research during the past decade has seen significant progress in the understanding of the genetic architecture of autism spectrum disorders (ASDs), with gene discovery accelerating as the characterization of genomic variation has become increasingly comprehensive. At the same time, this research has highlighted ongoing challenges. Here we address the enormous impact of high-throughput sequencing (HTS) on ASD gene discovery, outline a consensus view for leveraging this technology, and describe a large multisite collaboration developed to accomplish these goals. Similar approaches could prove effective for severe neurodevelopmental disorders more broadly. Copyright © 2012 Elsevier Inc. All rights reserved.
Genomic Sequence Variation Markup Language (GSVML).
Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi
2010-02-01
With the aim of making good use of internationally accumulated genomic sequence variation data, which is increasing rapidly due to the explosive amount of genomic research at present, the development of an interoperable data exchange format and its international standardization are necessary. Genomic Sequence Variation Markup Language (GSVML) will focus on genomic sequence variation data and human health applications, such as gene based medicine or pharmacogenomics. We developed GSVML through eight steps, based on case analysis and domain investigations. By focusing on the design scope to human health applications and genomic sequence variation, we attempted to eliminate ambiguity and to ensure practicability. We intended to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we attempted to minimize the redundancy of the data format, while maximizing the data covering range. We also attempted to ensure communication and interface ability with other Markup Languages, for exchange of omics data among various omics researchers or facilities. The interface ability with developing clinical standards, such as the Health Level Seven Genotype Information model, was analyzed. We developed the human health-oriented GSVML comprising variation data, direct annotation, and indirect annotation categories; the variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information, and have internal relationships. For designing, we examined 6 cases for three criteria as human health application and 15 data elements for three criteria as data formats for genomic sequence variation data exchange. The data format of five international SNP databases and six Markup Languages and the interface ability to the Health Level Seven Genotype Model in terms of 317 items were investigated. 
GSVML was developed as a potential data exchanging format for genomic sequence variation data exchange focusing on human health applications. The international standardization of GSVML is necessary, and is currently underway. GSVML can be applied to enhance the utilization of genomic sequence variation data worldwide by providing a communicable platform between clinical and research applications. Copyright 2009 Elsevier Ireland Ltd. All rights reserved.
Methods and statistics for combining motif match scores.
Bailey, T L; Gribskov, M
1998-01-01
Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores and evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http://www.sdsc.edu/MEME.
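The product-of-p-values idea (method 3) can be illustrated with a short sketch. It assumes the individual motif p-values are independent and uniformly distributed under the null hypothesis; in that case the probability of seeing a product at least as small as the observed product p of n p-values has the closed form p * sum over i < n of (-ln p)^i / i!. The function name is hypothetical, and this is an illustration of the statistic, not the MAST implementation.

```python
import math


def product_pvalue(pvalues):
    """Significance of the product of n independent, uniform p-values.

    Returns P(product of n uniform(0,1) variables <= observed product).
    Assumes independence between the individual motif p-values.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(p < 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")

    prod = math.prod(pvalues)
    if prod == 0.0:
        return 0.0

    log_term = -math.log(prod)      # -ln(p), non-negative
    total = 0.0
    term = 1.0                      # (-ln p)^i / i!, starting at i = 0
    for i in range(n):
        if i > 0:
            term *= log_term / i
        total += term
    return min(1.0, prod * total)


if __name__ == "__main__":
    # Three motif matches, each only mildly significant on its own.
    print(product_pvalue([0.05, 0.1, 0.2]))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the formula.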
Rostami, S; Salavati, R; Beech, R N; Babaei, Z; Sharbatkhori, M; Baneshi, M R; Hajialilo, E; Shad, H; Harandi, M F
2015-03-01
Although Taenia hydatigena is one of the most prevalent taeniid species of livestock, very little molecular genetic information exists for this parasite. Up to 100 sheep isolates of T. hydatigena were collected from 19 abattoirs located in the provinces of Tehran, Alborz and Kerman. A calibrated microscope was used to measure the larval rostellar hook lengths. Following DNA extraction, fragments of cytochrome c oxidase 1 (CO1) and 12S rRNA genes were amplified by the polymerase chain reaction method and the amplicons were subjected to sequencing. The mean total length of large and small hooks was 203.4 μm and 135.9 μm, respectively. Forty CO1 and 39 12S rRNA sequence haplotypes were obtained in the study. The levels of pairwise nucleotide variation between individual haplotypes of CO1 and 12S rRNA genes were determined to be between 0.3-3.4% and 0.2-2.1%, respectively. The overall nucleotide variation among all the CO1 haplotypes was 9.7%, and for all the 12S rRNA haplotypes it was 10.1%. A significant difference was observed between rostellar hook morphometry and both CO1 and 12S rRNA sequence variability. A significantly high level of genetic variation was observed in the present study. The results showed that the 12S rRNA gene is more variable than CO1.
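Pairwise nucleotide variation between haplotypes, as reported above in percent, is commonly computed as an uncorrected p-distance over aligned sequences. The sketch below assumes pre-aligned sequences of equal length and skips positions with gaps or ambiguous bases; the abstract does not state which distance measure or software the authors used, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance_percent(seq_a, seq_b):
    """Uncorrected pairwise distance (percent of differing sites).

    Only positions where both aligned sequences carry A, C, G or T
    are compared; gaps and ambiguity codes are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differences / compared


def pairwise_range(haplotypes):
    """Minimum and maximum p-distance over all haplotype pairs."""
    distances = [
        p_distance_percent(a, b) for a, b in combinations(haplotypes, 2)
    ]
    return min(distances), max(distances)


if __name__ == "__main__":
    # Toy aligned haplotypes, not data from the study.
    toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
    print(pairwise_range(toy))
```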
Zhang, Yanying; Yang, Qingsong; Ling, Juan; Van Nostrand, Joy D.; Shi, Zhou; Zhou, Jizhong; Dong, Junde
2016-01-01
The coral holobiont often resides in oligotrophic waters; both coral cells and their symbiotic dinoflagellates possess ammonium assimilation enzymes and potentially benefit from the nitrogen fixation of coral-associated diazotrophs. However, the seasonal dynamics of coral-associated diazotrophs are not well characterized. Here, the seasonal variations of diazotrophic communities associated with three corals, Galaxea astreata, Pavona decussata, and Porites lutea, were studied using nifH gene amplicon pyrosequencing techniques. Our results revealed a great diversity of coral-associated diazotrophs. nifH sequences related to Alphaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria were ubiquitous and dominant in all corals in two seasons. In contrast with the coral P. decussata, both G. astreata and P. lutea showed significant seasonal changes in the diazotrophic communities and nifH gene abundance. Variable diazotroph groups accounted for a range from 11 to 49% within individual coral samples. Most of the variable diazotrophic groups from P. decussata were species-specific, however, the majority of overlapping variable groups in G. astreata and P. lutea showed the same seasonal variation characteristics. Rhodopseudomonas palustris- and Gluconacetobacter diazotrophicus-affiliated sequences were relatively abundant in the summer, whereas a nifH sequence related to Halorhodospira halophila was relatively abundant in spring G. astreata and P. lutea. The seasonal variations of all diazotrophic communities were significantly correlated with the seasonal shifts of ammonium and nitrate, suggesting that diazotrophs play an important role in the nitrogen cycle of the coral holobiont. PMID:27920768
Differences and Similarities between Summer and Winter Temperatures and Winds during MaCWAVE
NASA Technical Reports Server (NTRS)
Schmidlin, F. J.; Goldberg, R. A.
2008-01-01
The Mountain and Convective Waves Ascending Vertically Experiment (MaCWAVE) was carried out in two sequences. The first was a summer sequence from the Andoya Rocket Range (69N) during July 2002 to examine convective initiation of gravity waves. The second was a winter sequence from ESRANGE (68N) during January 2003 to examine mountain-initiated waves. Inflatable falling spheres released from small meteorological rockets provided significant information about the variation of temperature and wind from 50 km and higher. The small rocket launch activity was restricted to 12-hour periods, which prevented observation of a full diurnal cycle; nonetheless, the time history of the measurements has provided information about tidal motion. During summer, temperature variation was smaller than that observed during winter, when peak differences reached 15-20 K at 80-85 km. Zonal winds varied by up to more than 100 m/s in summer and winter. Times of peak wind vs. altitude showed that the peak zonal component occurred approximately two hours ahead of the peak meridional wind. Measurement details and the observed variations are discussed.
Differences and Similarities in MaCWAVE Summer and Winter Temperatures and Winds
NASA Technical Reports Server (NTRS)
Schmidlin, F. J.; Goldberg, R. A.
2008-01-01
Small meteorological rockets released inflatable falling spheres during the MaCWAVE Campaign. The Mountain and Convective Waves Ascending Vertically Experiment (MaCWAVE) was carried out in two parts: a summer sequence from Andoya Rocket Range (69N) during July 2002 to examine convective initiation of gravity waves, and a winter sequence from ESRANGE (68N) during January 2003 to examine mountain-terrain initiated gravity waves. The sphere-tracked data provided significant information about the variation of temperature and wind from 70 km and above. The changes observed may be considered akin to tidal motion; unfortunately, the launch activity was restricted to 12-hour periods, so a full diurnal cycle could not be observed. During summer, temperature variation was smaller than that observed during winter, when peak to null differences reached 15-20 K at 80-85 km. The zonal winds varied by up to 100+ m/s in summer and winter. Examination of the times of peak wind vs. altitude showed that the peak zonal wind occurred approximately two hours ahead of the peak meridional wind. We provide details about the measurements and observed variations.
Wiel, Laurens; Venselaar, Hanka; Veltman, Joris A.; Vriend, Gert
2017-01-01
Whole exomes of patients with a genetic disorder are nowadays routinely sequenced but interpretation of the identified genetic variants remains a major challenge. The increased availability of population‐based human genetic variation has given rise to measures of genetic tolerance that have been used, for example, to predict disease‐causing genes in neurodevelopmental disorders. Here, we investigated whether combining variant information from homologous protein domains can improve variant interpretation. For this purpose, we developed a framework that maps population variation and known pathogenic mutations onto 2,750 “meta‐domains.” These meta‐domains consist of 30,853 homologous Pfam protein domain instances that cover 36% of all human protein coding sequences. We find that genetic tolerance is consistent across protein domain homologues, and that patterns of genetic tolerance faithfully mimic patterns of evolutionary conservation. Furthermore, for a significant fraction (68%) of the meta‐domains high‐frequency population variation re‐occurs at the same positions across domain homologues more often than expected. In addition, we observe that the presence of pathogenic missense variants at an aligned homologous domain position is often paired with the absence of population variation and vice versa. The use of these meta‐domains can improve the interpretation of genetic variation. PMID:28815929
Wang, Haonan; Bangerter, Neal K; Park, Daniel J; Adluru, Ganesh; Kholmovski, Eugene G; Xu, Jian; DiBella, Edward
2015-10-01
Highly undersampled three-dimensional (3D) saturation-recovery sequences are affected by k-space trajectory since the magnetization does not reach steady state during the acquisition and the slab excitation profile yields different flip angles in different slices. This study compares centric and reverse-centric 3D cardiac perfusion imaging. An undersampled (98 phase encodes) 3D ECG-gated saturation-recovery sequence that alternates centric and reverse-centric acquisitions each time frame was used to image phantoms and in vivo subjects. Flip angle variation across the slices was measured, and contrast with each trajectory was analyzed via Bloch simulation. Significant variations in flip angle were observed across slices, leading to larger signal variation across slices for the centric acquisition. In simulation, severe transient artifacts were observed when using the centric trajectory with higher flip angles, placing practical limits on the maximum flip angle used. The reverse-centric trajectory provided less contrast, but was more robust to flip angle variations. Both of the k-space trajectories can provide reasonable image quality. The centric trajectory can have higher CNR, but is more sensitive to flip angle variation. The reverse-centric trajectory is more robust to flip angle variation. © 2014 Wiley Periodicals, Inc.
Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data
Degner, Jacob F.; Marioni, John C.; Pai, Athma A.; Pickrell, Joseph K.; Nkadori, Everlyne; Gilad, Yoav; Pritchard, Jonathan K.
2009-01-01
Motivation: Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here, we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE). Results: We generated 16 million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias toward higher mapping rates of the allele in the reference sequence, compared with the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, ∼5–10% of SNPs still have an inherent bias toward more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data. Availability: Scripts, written in Perl and R, for simulating short reads, masking SNP variation in a reference genome and analyzing the simulation output are available upon request from JFD. Raw short read data were deposited in GEO (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE18156. Contact: jdegner@uchicago.edu; marioni@uchicago.edu; gilad@uchicago.edu; pritch@uchicago.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19808877
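One common way to test a heterozygous SNP for allele-specific expression is an exact binomial test on reference versus alternative read counts. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the abstract does not name the test used, and the function name and the `expected_ref_fraction` parameter are hypothetical. The parameter is there because, as the abstract reports, read-mapping bias means the null reference fraction may not be exactly 0.5; a value estimated from simulated reads could be supplied instead.

```python
import math


def ase_binomial_pvalue(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Two-sided exact binomial p-value for allelic imbalance at one SNP.

    Sums the probabilities of all outcomes that are no more likely than
    the observed reference read count under the null fraction.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    p = expected_ref_fraction
    pmf = [
        math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        for k in range(n + 1)
    ]
    observed = pmf[ref_reads]
    # Small relative tolerance guards against floating-point ties.
    total = sum(x for x in pmf if x <= observed * (1.0 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Same counts, tested against an unbiased and a reference-biased null.
    print(ase_binomial_pvalue(30, 15))
    print(ase_binomial_pvalue(30, 15, expected_ref_fraction=0.6))
```

Testing the same counts against a null fraction above 0.5 shows how a reference-mapping bias, if ignored, can inflate apparent signals of allele-specific expression.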
NASA Astrophysics Data System (ADS)
Fanini, L.; Zampicinini, G.; Tsigenopoulos, C. S.; Barboza, F. R.; Lozoya, J. P.; Gómez, J.; Celentano, E.; Lercari, D.; Marchetti, G. M.; Defeo, O.
2017-03-01
Life-history, substrate choice and Cytochrome Oxidase I (COI) sequences were analysed in populations of two peracaridans, the supralittoral talitrid Atlantorchestoidea brasiliensis and the intertidal cirolanid Excirolana armata. Three populations of each species, from beaches with similar grain size and located at different points along the natural gradient generated by the Rio de la Plata estuary were analysed. Abundance of E. armata increased with distance from the estuary, while the opposite trend was observed for A. brasiliensis. The proportion of females decreased towards high salinities for both species, significantly for E. armata. A test on substrate salinity preference revealed the absence of patterns due to active choice in E. armata. By contrast, A. brasiliensis showed no preference for the population closer to the estuary, while individuals from the other two sites significantly preferred high salinity substrates. Mitochondrial COI sequences were obtained from A. brasiliensis specimens tested for behaviour. Sequence analysis showed the population from the intermediate site to differ significantly from the other two. By contrast, no significant genetic differentiation was found between populations from the two most distant sites, nor between individuals that expressed different salinity preference. Results showed that diverse sets of traits at the population level enable sandy beach species to cope with local environmental changes: life-history and behavioural traits appear to change in response to different ecological conditions, and, in the case of A. brasiliensis, independently of the population structure inferred from COI sequence variation. Information from multiple traits allowed detection of population profiles, highlighting the relevance of multidisciplinary information and the concurrent analysis of field data and laboratory experiments, to detect responses of resident biota to environmental changes.
Markovska-Simoska, S; Pop-Jordanova, N
2009-07-01
(Full text is available at http://www.manu.edu.mk/prilozi). Continuous Performance Tests (CPTs) form a group of paradigms for the evaluation of attention and, to a lesser degree, the response inhibition (or disinhibition) component of executive control. The object of this study was to compare performance on a CPT using both visual and emotional tasks in 46 normal adult subjects. In particular, it was to examine the effects of the type of task (VCPT or ECPT), sequence of presentation, and gender/age on performance as measured by errors of omission, errors of commission, reaction time and variation of reaction time. The results indicate significantly worse performance parameters for ECPT than for VCPT tasks, probably because emotional stimuli influence attention and information processing, and no significant effect of order of presentation or gender on performance. Older groups made significantly more omission errors, showing better attention in younger subjects. Key words: VCPT, ECPT, omission errors, commission errors, reaction time, variation of reaction time, normal adults.
Molecular Population Genetics of the Alcohol Dehydrogenase Gene Region of Drosophila melanogaster
Aquadro, Charles F.; Desse, Susan F.; Bland, Molly M.; Langley, Charles H.; Laurie-Ahlberg, Cathy C.
1986-01-01
Variation in the DNA restriction map of a 13-kb region of chromosome II including the alcohol dehydrogenase structural gene (Adh) was examined in Drosophila melanogaster from natural populations. Detailed analysis of 48 D. melanogaster lines representing four eastern United States populations revealed extensive DNA sequence variation due to base substitutions, insertions and deletions. Cloning of this region from several lines allowed characterization of length variation as due to unique sequence insertions or deletions [nine sizes; 21–200 base pairs (bp)] or transposable element insertions (several sizes, 340 bp to 10.2 kb, representing four different elements). Despite this extensive variation in sequences flanking the Adh gene, only one length polymorphism is clearly associated with altered Adh expression (a copia element approximately 250 bp 5' to the distal transcript start site). Nonetheless, the frequency spectra of transposable elements within and between Drosophila species suggest they are slightly deleterious. Strong nonrandom associations are observed among Adh region sequence variants, ADH allozyme (Fast vs. Slow), ADH enzyme activity and the chromosome inversion In(2L)t. Phylogenetic analysis of restriction map haplotypes suggests that the major twofold component of ADH activity variation (high vs. low, typical of Fast and Slow allozymes, respectively) is due to sequence variation tightly linked to and possibly distinct from that underlying the allozyme difference. The patterns of nucleotide and haplotype variation for Fast and Slow allozyme lines are consistent with the recent increase in frequency and spread of the Fast haplotype associated with high ADH activity. These data emphasize the important role of evolutionary history and strong nonrandom associations among tightly linked sequence variation as determinants of the patterns of variation observed in natural populations. PMID:3026893
Variation, Repetition, And Choice
Abreu-Rodrigues, Josele; Lattal, Kennon A; dos Santos, Cristiano V; Matos, Ricardo A
2005-01-01
Experiment 1 investigated the controlling properties of variability contingencies on choice between repeated and variable responding. Pigeons were exposed to concurrent-chains schedules with two alternatives. In the REPEAT alternative, reinforcers in the terminal link depended on a single sequence of four responses. In the VARY alternative, a response sequence in the terminal link was reinforced only if it differed from the n previous sequences (lag criterion). The REPEAT contingency generated low, constant levels of sequence variation whereas the VARY contingency produced levels of sequence variation that increased with the lag criterion. Preference for the REPEAT alternative tended to increase directly with the degree of variation required for reinforcement. Experiment 2 examined the potential confounding effects in Experiment 1 of immediacy of reinforcement by yoking the interreinforcer intervals in the REPEAT alternative to those in the VARY alternative. Again, preference for REPEAT was a function of the lag criterion. Choice between varying and repeating behavior is discussed with respect to obtained behavioral variability, probability of reinforcement, delay of reinforcement, and switching within a sequence. PMID:15828592
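The lag criterion used in the VARY alternative can be read as a simple rule over the most recent response sequences. The following is a minimal sketch of that rule only, not the authors' experimental software; the class name `LagSchedule`, the method `respond`, and the encoding of a sequence as a string of L/R key pecks are assumptions made for illustration.

```python
from collections import deque


class LagSchedule:
    """Hypothetical model of a lag-n variability contingency.

    A response sequence is reinforced only if it differs from each of
    the n most recent sequences.
    """

    def __init__(self, lag):
        # Keep only the last `lag` sequences; older ones are dropped.
        self.history = deque(maxlen=lag)

    def respond(self, sequence):
        """Return True if `sequence` meets the lag criterion."""
        reinforced = sequence not in self.history
        # Every emitted sequence enters the history, reinforced or not.
        self.history.append(sequence)
        return reinforced


schedule = LagSchedule(lag=2)
outcomes = [schedule.respond(s) for s in ["LLRR", "LRLR", "LLRR", "RRLL", "LRLR"]]
```

Raising `lag` makes the criterion stricter, because a sequence must then differ from a longer run of recent sequences.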
Feng, Zhao; Kui-Dong, Xu; Zhao-Cui, Meng
2012-12-01
Using denaturing gradient gel electrophoresis (DGGE) and sequencing together with the Ludox-QPS method, an investigation was made of ciliate diversity and its spatiotemporal variation in the surface sediments at three sites in the Yangtze River estuary hypoxic zone in April and August 2011. ANOSIM analysis indicated that ciliate diversity differed significantly among sites (R = 0.896, P = 0.0001) but differed little between seasons (R = 0.043, P = 0.207). Sequencing of 18S rDNA DGGE bands revealed that the most predominant groups were planktonic Choreotrichia and Oligotrichia. The Ludox-QPS method showed that the species number and abundance of active ciliates were maintained at a high level and increased 2-5 times in summer compared with spring. Both the Ludox-QPS method and the DGGE technique detected a similar trend in ciliate diversity across the three sites, and the Ludox-QPS method detected significant variation in ciliate species number and abundance between seasons. The species number detected by the Ludox-QPS method was higher than that detected from DGGE bands. Our study indicated that the ciliates in the Yangtze River estuary hypoxic zone had high diversity and abundance, with the potential to supply food for jellyfish polyps.
Lashkari, Mohammadreza; Manzari, Shahab; Sahragard, Ahad; Malagnini, Valeria; Boykin, Laura M; Hosseini, Reza
2014-07-01
The Asian citrus psyllid, Diaphorina citri Kuwayama (Hemiptera: Liviidae), is one of the most serious pests of citrus in the world, because it transmits the pathogen that causes citrus greening disease. To determine genetic variation among geographic populations of D. citri, microsatellite markers, mitochondrial gene cytochrome oxidase I (mtCOI) and the Wolbachia-Diaphorina, wDi, gene wsp sequence data were used to characterize Iranian and Pakistani populations. Also, a Bayesian phylogenetic technique was utilized to elucidate the relationships among the sequence data in this study and all mtCOI and wsp sequence data available in GenBank and the Wolbachia database. Microsatellite markers revealed significant genetic differentiation among Iranian populations, as well as between Iranian and Pakistani populations (FST = 0.0428, p < 0.01). Within Iran, the Sistan-Baluchestan population is significantly different from the Hormozgan (Fareghan) and Fars populations. By contrast, mtCOI data revealed two polymorphic sites separating the sequences from Iran and Pakistan. Global phylogenetic analyses showed that D. citri populations in Iran, India, Saudi Arabia, Brazil, Mexico, Florida and Texas (USA) are similar. Wolbachia, wDi, wsp sequences were similar among Iranian populations, but different between Iranian and Pakistani populations. The South West Asia (SWA) group is the most likely source of the introduced Iranian populations of D. citri. This assertion is also supported by the sequence similarity of the Wolbachia, wDi, strains from the Florida, USA and Iranian D. citri. These results should be considered when looking for biological controls in either country. © 2013 Society of Chemical Industry.
Iskow, Rebecca C.; Austermann, Christian; Scharer, Christopher D.; Raj, Towfique; Boss, Jeremy M.; Sunyaev, Shamil; Price, Alkes; Stranger, Barbara; Simon, Viviana; Lee, Charles
2013-01-01
Ancient population structure shaping contemporary genetic variation has been recently appreciated and has important implications regarding our understanding of the structure of modern human genomes. We identified a ∼36-kb DNA segment in the human genome that displays an ancient substructure. The variation at this locus exists primarily as two highly divergent haplogroups. One of these haplogroups (the NE1 haplogroup) aligns with the Neandertal haplotype and contains a 4.6-kb deletion polymorphism in perfect linkage disequilibrium with 12 single nucleotide polymorphisms (SNPs) across diverse populations. The other haplogroup, which does not contain the 4.6-kb deletion, aligns with the chimpanzee haplotype and is likely ancestral. Africans have higher overall pairwise differences with the Neandertal haplotype than Eurasians do for this NE1 locus (p < 10^−15). Moreover, the nucleotide diversity at this locus is higher in Eurasians than in Africans. These results mimic signatures of recent Neandertal admixture contributing to this locus. However, an in-depth assessment of the variation in this region across multiple populations reveals that African NE1 haplotypes, albeit rare, harbor more sequence variation than NE1 haplotypes found in Europeans, indicating an ancient African origin of this haplogroup and refuting recent Neandertal admixture. Population genetic analyses of the SNPs within each of these haplogroups, along with genome-wide comparisons revealed significant FST (p = 0.00003) and positive Tajima's D (p = 0.00285) statistics, pointing to non-neutral evolution of this locus. The NE1 locus harbors no protein-coding genes, but contains transcribed sequences as well as sequences with putative regulatory function based on bioinformatic predictions and in vitro experiments. We postulate that the variation observed at this locus predates Human–Neandertal divergence and is evolving under balancing selection, especially among European populations. PMID:23593015
A comparative molecular analysis of water-filled limestone sinkholes in north-eastern Mexico.
Sahl, Jason W; Gary, Marcus O; Harris, J Kirk; Spear, John R
2011-01-01
Sistema Zacatón in north-eastern Mexico is host to several deep, water-filled, anoxic, karstic sinkholes (cenotes). These cenotes were explored, mapped, and geochemically and microbiologically sampled by the autonomous underwater vehicle deep phreatic thermal explorer (DEPTHX). The community structure of the filterable fraction of the water column and extensive microbial mats that coat the cenote walls was investigated by comparative analysis of small-subunit (SSU) 16S rRNA gene sequences. Full-length Sanger gene sequence analysis revealed novel microbial diversity that included three putative bacterial candidate phyla and three additional groups that showed high intra-clade distance with poorly characterized bacterial candidate phyla. Limited functional gene sequence analysis in these anoxic environments identified genes associated with methanogenesis, sulfate reduction and anaerobic ammonium oxidation. A directed, barcoded amplicon, multiplex pyrosequencing approach was employed to compare ∼100,000 bacterial SSU gene sequences from water column and wall microbial mat samples from five cenotes in Sistema Zacatón. A new, high-resolution sequence distribution profile (SDP) method identified changes in specific phylogenetic types (phylotypes) in microbial mats at varied depths; Mantel tests showed a correlation of the genetic distances between mat communities in two cenotes and the geographic location of each cenote. Community structure profiles from the water column of three neighbouring cenotes showed distinct variation; statistically significant differences in the concentration of geochemical constituents suggest that the variation observed in microbial communities between neighbouring cenotes is due to geochemical variation. © 2010 Society for Applied Microbiology and Blackwell Publishing Ltd.
Molecular identification based on ITS sequences for Kappaphycus and Eucheuma cultivated in China
NASA Astrophysics Data System (ADS)
Zhao, Sufen; He, Peimin
2011-11-01
The systematic classification of the Eucheumatoideae is difficult because of their variable morphology and interpretation of reproductive structures. Kappaphycus and Eucheuma specimens cultivated on the Hainan and Fujian coast of China were introduced from Vietnam, the Philippines and Indonesia. Combined with morphological characteristics, all Kappaphycus and Eucheuma cultivated strains were identified by internal transcribed spacer (ITS) sequences. The phylogenetic tree was constructed using neighbor-joining and maximum likelihood methods. The results indicate that different ITS sequence lengths occurred in the different genera and species. An obvious difference in morphology could be found in the protuberance shape between Kappaphycus and Eucheuma. The protuberance in Eucheuma was thorn-like and in Kappaphycus was wartlike or papillate. Their ITS sequence lengths differed significantly in nucleotide variation rates up to 58.55%-63.90%. All nucleotide variations occurred in the ITS1 and ITS2 regions except for five nucleotide transversions in the 5.8S rDNA region. In addition, the difference was at the branches among congeneric species. Kappaphycus sp. had branches with small buds, while K. alvarezii did not have such a feature. The nucleotide variation rates varied from 7.02% to 7.48% among species; within the same species of the clades it was <1.20%. Eucheumatoideae algae cultivated in China consisted of three clades, K. alvarezii, Kappaphycus sp., and E. denticulatum. The results indicate that ITS sequence analysis was an effective way for identification of interspecies and intraspecies phylogenetic relationships and might provide a clue for molecular identification of algal Eucheumatoideae.
Chemical-biogeographic survey of secondary metabolism in soil.
Charlop-Powers, Zachary; Owen, Jeremy G; Reddy, Boojala Vijay B; Ternei, Melinda A; Brady, Sean F
2014-03-11
In this study, we compare biosynthetic gene richness and diversity of 96 soil microbiomes from diverse environments found throughout the southwestern and northeastern regions of the United States. The 454 pyrosequencing of nonribosomal peptide adenylation (AD) and polyketide ketosynthase (KS) domain fragments amplified from these microbiomes provides a means to evaluate the variation of secondary metabolite biosynthetic diversity in different soil environments. Through soil composition and AD- and KS-amplicon richness analysis, we identify soil types with elevated biosynthetic potential. In general, arid soils show the richest observed biosynthetic diversity, whereas brackish sediments and pine forest soils show the least. By mapping individual environmental amplicon sequences to sequences derived from functionally characterized biosynthetic gene clusters, we identified conserved soil type-specific secondary metabolome enrichment patterns despite significant sample-to-sample sequence variation. These data are used to create chemical biogeographic distribution maps for biomedically valuable families of natural products in the environment that should prove useful for directing the discovery of bioactive natural products in the future.
Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A
2011-01-01
PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are tandem repeats of nucleotide motifs of 1-6 bp and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin the phase variation and antigenic variation seen in some bacteria. Although SSR-mediated phase variation and antigenic variation have been well studied in some bacteria, many other prokaryotic species have yet to be investigated for SSR-mediated adaptive and other evolutionary advantages. As part of our ongoing studies on SSR polymorphism in prokaryotes, we compared the genome sequences of various strains and isolates available for 85 different species of prokaryotes, extracted a number of SSRs showing length variations, and created a relational database called PSSRdb. This database gives useful information such as the location of PSSRs in genomes, length variation across genomes, the regions harboring PSSRs, etc. The information provided in this database is very useful for further research and analysis of SSRs in prokaryotes.
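The definition of an SSR given above (tandem repeats of a 1-6 bp motif) can be illustrated with a short regular-expression scan. This is a minimal sketch and not the pipeline used to build PSSRdb; the function name `find_ssrs` and the default of at least three tandem copies are assumptions chosen for illustration.

```python
import re


def find_ssrs(seq, min_repeats=3, max_motif=6):
    """Return (start, motif, copies) for each tandem repeat found in `seq`.

    `start` is 0-based. The shortest motif (1..max_motif bp) that yields
    at least `min_repeats` tandem copies at a position is reported.
    """
    # Group 1 is the motif (lazy, so short motifs are tried first);
    # \1{k,} requires at least k further copies directly after it.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


ssrs = find_ssrs("TTGACACACACGG")
```

Running the same scan on the corresponding region of several strains and comparing the `copies` values is one simple way to flag a repeat as length-polymorphic.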
Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.
2005-01-01
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248
DNA Barcode Identification of Podocarpaceae—The Second Largest Conifer Family
Little, Damon P.; Knopf, Patrick; Schulz, Christian
2013-01-01
We have generated matK, rbcL, and nrITS2 DNA barcodes for 320 specimens representing all 18 extant genera of the conifer family Podocarpaceae. The sample includes 145 of the 198 recognized species. Comparative analyses of sequence quality and species discrimination were conducted on the 159 individuals from which all three markers were recovered (representing 15 genera and 97 species). The vast majority of sequences were of high quality (B 30 = 0.596–0.989). Even the lowest quality sequences exceeded the minimum requirements of the BARCODE data standard. In the few instances that low quality sequences were generated, the responsible mechanism could not be discerned. There were no statistically significant differences in the discriminatory power of markers or marker combinations (p = 0.05). The discriminatory power of the barcode markers individually and in combination is low (56.7% of species at maximum). In some instances, species discrimination failed in spite of ostensibly useful variation being present (genotypes were shared among species), but in many cases there was simply an absence of sequence variation. Barcode gaps (maximum intraspecific p–distance > minimum interspecific p–distance) were observed in 50.5% of species when all three markers were considered simultaneously. The presence of a barcode gap was not predictive of discrimination success (p = 0.02) and there was no statistically significant difference in the frequency of barcode gaps among markers (p = 0.05). In addition, there was no correlation between number of individuals sampled per species and the presence of a barcode gap (p = 0.27). PMID:24312258
Kropáčková, Lucie; Těšický, Martin; Albrecht, Tomáš; Kubovčiak, Jan; Čížková, Dagmar; Tomášek, Oldřich; Martin, Jean-François; Bobek, Lukáš; Králová, Tereza; Procházka, Petr; Kreisinger, Jakub
2017-10-01
Vertebrate gut microbiota (GM) is comprised of a taxonomically diverse consortium of symbiotic and commensal microorganisms that have a pronounced effect on host physiology, immune system function and health status. Despite much research on interactions between hosts and their GM, the factors affecting inter- and intraspecific GM variation in wild populations are still poorly known. We analysed data on faecal microbiota composition in 51 passerine species (319 individuals) using Illumina MiSeq sequencing of bacterial 16S rRNA (V3-V4 variable region). Despite pronounced interindividual variation, GM composition exhibited significant differences at the interspecific level, accounting for approximately 20%-30% of total GM variation. We also observed a significant correlation between GM composition divergence and host's phylogenetic divergence, with strength of correlation higher than that of GM vs. ecological or life history traits and geographic variation. The effect of host's phylogeny on GM composition was significant, even after statistical control for these confounding factors. Hence, our data do not support codiversification of GM and passerine phylogeny solely as a by-product of their ecological divergence. Furthermore, our findings do not support that GM vs. host's phylogeny codiversification is driven primarily through trans-generational GM transfer as the GM vs. phylogeny correlation does not increase with higher sequence similarity used when delimiting operational taxonomic units. Instead, we hypothesize that the GM vs. phylogeny correlation may arise as a consequence of interspecific divergence of genes that directly or indirectly modulate composition of GM. © 2017 John Wiley & Sons Ltd.
Najafi, Nargess; Akmali, Vahid; Sharifi, Mozafar
2018-04-26
Molecular phylogeography and species distribution modelling (SDM) suggest that late Quaternary glacial cycles have played a significant role in shaping current population genetic structure and diversity. Based on phylogenetic relationships inferred by Bayesian inference and maximum likelihood from 535 bp of mtDNA (D-loop) and 745 bp of mtDNA (Cytb) in 62 individuals of the Mediterranean Horseshoe Bat, Rhinolophus euryale, from 13 different localities in Iran, we identified two subspecific populations with differing population genetic structure, distributed in the southern Zagros Mts. and the northern Elburz Mts. Analysis of molecular variance (AMOVA) of D-loop sequences indicates that 21.18% of sequence variation is distributed among populations and 10.84% within them. Moreover, a degree of genetic subdivision, mainly attributable to significant variance between the two regions, is shown (θCT = 0.68, p = .005). A positive and significant correlation between geographic and genetic distances (R² = 0.28, r = 0.529, p = .000) is obtained after controlling for environmental distance. The spatial distribution of haplotypes indicates that marginal populations in the southern part of the species range occupied this area as a glacial refugium. However, this genetic variation, in conjunction with the results of the SDM, shows a massive postglacial range expansion for R. euryale towards higher latitudes in Iran.
Application of a mitochondrial DNA control region frequency database for UK domestic cats.
Ottolini, Barbara; Lall, Gurdeep Matharu; Sacchini, Federico; Jobling, Mark A; Wetton, Jon H
2017-03-01
DNA variation in 402bp of the mitochondrial control region flanked by repeat sequences RS2 and RS3 was evaluated by Sanger sequencing in 152 English domestic cats, in order to determine the significance of matching DNA sequences between hairs found with a victim's body and the suspect's pet cat. Whilst 95% of English cats possessed one of the twelve globally widespread mitotypes, four new variants were observed, the most common of which (2% frequency) was shared with the evidential samples. No significant difference in mitotype frequency was seen between 32 individuals from the locality of the crime and 120 additional cats from the rest of England, suggesting a lack of local population structure. However, significant differences were observed in comparison with frequencies in other countries, including the closely neighbouring Netherlands, highlighting the importance of appropriate genetic databases when determining the evidential significance of mitochondrial DNA evidence. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Masking as an effective quality control method for next-generation sequencing data analysis.
Yun, Sajung; Yun, Sijung
2014-12-13
Next generation sequencing produces base calls with low quality scores that can affect the accuracy of identifying simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low quality base calls with 'N's (undetermined bases), whereas trimming removes low quality bases, which results in shorter read lengths. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, neither preprocessing method affected the false-negative rate in SNP calling with statistical significance compared to the data analysis without preprocessing. False-positive and false-negative rates for small insertions and deletions did not differ between masking and trimming. We recommend masking over trimming as a more effective preprocessing method for next generation sequencing data analysis, since masking reduces the false-positive rate in SNP calling without sacrificing the false-negative rate, although trimming is currently more commonly used in the field. The perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).
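The difference between the two preprocessing methods can be sketched on a single read. This is an illustration only and not the authors' Perl script; the function names, the Phred threshold of 20, the Phred+33 quality encoding, and the choice of simple 3'-end trimming are all assumptions.

```python
def phred_scores(quality, offset=33):
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def mask_read(seq, quality, threshold=20):
    """Replace every base scoring below `threshold` with 'N'.

    The read keeps its length, so downstream coordinates are unchanged.
    """
    scores = phred_scores(quality)
    return "".join(b if q >= threshold else "N" for b, q in zip(seq, scores))


def trim_read(seq, quality, threshold=20):
    """Drop low-quality bases from the 3' end until a good base is reached.

    The read gets shorter; low-quality bases inside the read are kept.
    """
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], quality[:end]


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
masked = mask_read("ACGTACGT", "II#IIII#")
trimmed = trim_read("ACGTACGT", "II#IIII#")
```

In this toy read, masking hides both low-quality calls, whereas 3'-end trimming removes only the final base and leaves the internal low-quality call in place.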
Li, Dora A; Walker, Esther; Francki, Michael G
2015-12-01
Carotenoids (especially lutein) are known to be the pigment source for flour b* colour in bread wheat. Flour b* colour variation is controlled by a quantitative trait locus (QTL) on wheat chromosome 7AL and one gene from the carotenoid pathway, phytoene synthase, was functionally associated with the QTL on 7AL in some, but not all, wheat genotypes. A SNP marker within a sequence similar to catalase (Cat3-A1snp) derived from full-length (FL) cDNA (AK332460), however, was consistently associated with the QTL on 7AL and implicated in regulating hydrogen peroxide (H2O2) to control carotenoid accumulation affecting flour b* colour. The number of catalase genes on chromosome 7AL was investigated in this study to identify which gene may be implicated in flour b* variation and two were identified through interrogation of the draft wheat genome survey sequence consisting of five exons and a further two members having eight exons identified through comparative analysis with the single catalase gene on rice chromosome 6, PCR amplification and sequencing. It was evident that the catalase genes on chromosome 7A had duplicated and diverged during evolution relative to its counterpart on rice chromosome 6. The detection of transcripts in seeds, the co-location with Cat3-A1snp marker and maximised alignment of FL-cDNA (AK332460) with cognate genomic sequence indicated that TaCat3-A1 was the member of the catalase gene family associated with flour b* colour variation. Re-sequencing identified three alleles from three wheat varieties, TaCat3-A1a, TaCat3-A1b and TaCat3-A1c, and their predicted protein identified differences in peroxisomal targeting signal tri-peptide domain in the carboxyl terminal end providing new insights into their potential role in regulating cellular H2O2 that contribute to flour b* colour variation.
Gimenez, Magalí Diana; Yañez-Santos, Anahí Mara; Paz, Rosalía Cristina; Quiroga, Mariana Paola; Marfil, Carlos Federico; Conci, Vilma Cecilia; García-Lampasona, Sandra Claudia
2016-01-01
This is the first report assessing epigenetic variation in garlic. High genetic and epigenetic polymorphism during in vitro culture was detected. Sequencing of MSAP fragments revealed homology with ESTs. Garlic (Allium sativum) is a worldwide crop of economic importance susceptible to viral infections that can cause significant yield losses. Meristem tissue culture is the most widely employed method to sanitize elite cultivars. Often the virus-free garlic plants obtained are multiplied in vitro (micropropagation). However, it has been reported that micropropagation frequently produces somaclonal variation at the phenotypic level, which is an undesirable trait when breeders are seeking to maintain varietal stability. We employed amplified fragment length polymorphism (AFLP) and methylation-sensitive amplified polymorphism (MSAP) methodologies to assess genetic and epigenetic modifications in two culture systems: virus-free plants obtained by meristem culture followed by in vitro multiplication, and field culture. Our results suggest that garlic exhibits genetic and epigenetic polymorphism under field growing conditions. However, during in vitro culture both kinds of polymorphism intensify, indicating that this system induces somaclonal variation. Furthermore, while genetic changes accumulated over the time of in vitro culture, epigenetic polymorphism reached its greatest variation at 6 months and then stabilized, with demethylation and CG methylation being the principal conversions. Cloning and sequencing of differentially methylated MSAP fragments allowed us to identify coding and unknown sequences of A. sativum, including sequences belonging to LTR Gypsy retrotransposons. Together, our results highlight that the main changes occur in the initial 6 months of micropropagation. To the best of our knowledge, this is the first report on epigenetic assessment in garlic.
The genetic diversity and epizootiology of infectious hematopoietic necrosis virus
Oshima, Kevin H.; Arakawa, Cindy K.; Higman, Keith H.; Landolt, Marsha L.; Nichol, Stuart T.; Winton, James R.
1994-01-01
Infectious hematopoietic necrosis virus (IHNV) is a rhabdovirus which causes a serious disease in salmonid fish. The T1 ribonuclease fingerprinting method was used to compare the RNA genomes of 26 isolates of IHNV recovered from sockeye salmon (Oncorhynchus nerka), chinook salmon (O. tshawytscha), and steelhead trout (O. mykiss) throughout the enzootic portion of western North America. Most of the isolates were from a single year (1987) to limit time of isolation as a source of genetic variation. In addition, isolates from different years collected at three sites were analyzed to investigate genetic drift or evolution of IHNV within specific locations. All of the isolates examined by T1 fingerprint analysis contained less than a 50% variation in spot location and were represented by a single fingerprint group. The observed variation was estimated to correspond to less than 5% variation in the nucleic acid sequence. However, sufficient variation was detected to separate the isolates into four subgroups which appeared to correlate to different geographic regions. Host species appeared not to be a significant source of variation. The evolutionary and epizootiologic significance of these findings and their relationship to other evidence of genetic variation in IHNV isolates are discussed.
Sequence characterization of 5S ribosomal RNA from eight gram positive procaryotes
NASA Technical Reports Server (NTRS)
Woese, C. R.; Luehrsen, K. R.; Pribula, C. D.; Fox, G. E.
1976-01-01
Complete nucleotide sequences are presented for 5S rRNA from Bacillus subtilis, B. firmus, B. pasteurii, B. brevis, Lactobacillus brevis, and Streptococcus faecalis, and 5S rRNA oligonucleotide catalogs and partial sequence data are given for B. cereus and Sporosarcina ureae. These data demonstrate a striking consistency of 5S rRNA primary and secondary structure within a given bacterial grouping. An exception is B. brevis, in which the 5S rRNA sequence varies significantly from that of other bacilli in the tuned helix and the procaryotic loop. The localization of these variations suggests that B. brevis occupies an ecological niche that selects such changes. It is noted that this organism produces antibiotics which affect ribosome function.
Equivalent Indels – Ambiguous Functional Classes and Redundancy in Databases
Assmus, Jens; Kleffe, Jürgen; Schmitt, Armin O.; Brockmann, Gudrun A.
2013-01-01
There is considerable interest in studying sequenced variations. However, while the positions of substitutions are uniquely identifiable by sequence alignment, the location of insertions and deletions still poses problems. Each insertion and deletion causes a change of sequence. Yet, due to low complexity or repetitive sequence structures, the same indel can sometimes be annotated in different ways. Two indels which differ in allele sequence and position can be one and the same, i.e. the alternative sequence of the whole chromosome is identical in both cases and, therefore, the two deletions are biologically equivalent. In such a case, it is impossible to identify the exact position of an indel merely based on sequence alignment. Thus, variation entries in a mutation database are not necessarily uniquely defined. We prove the existence of a contiguous region around an indel in which all deletions of the same length are biologically identical. Databases often show only one of several possible locations for a given variation. Furthermore, different data base entries can represent equivalent variation events. We identified 1,045,590 such problematic entries of insertions and deletions out of 5,860,408 indel entries in the current human database of Ensembl. Equivalent indels are found in sequence regions of different functions like exons, introns or 5' and 3' UTRs. One and the same variation can be assigned to several different functional classifications of which only one is correct. We implemented an algorithm that determines for each indel database entry its complete set of equivalent indels which is uniquely characterized by the indel itself and a given interval of the reference sequence. PMID:23658777
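The contiguous region of equivalent deletions described above can be found by sliding a deletion left and right for as long as the resulting sequence stays the same. This is a minimal sketch of that idea for deletions only, not the authors' published algorithm or any Ensembl API; the function names and the 0-based coordinates are assumptions.

```python
def apply_deletion(ref, start, length):
    """Return `ref` with `length` bases removed beginning at 0-based `start`."""
    return ref[:start] + ref[start + length:]


def equivalent_deletion_range(ref, start, length):
    """Return (leftmost, rightmost) start positions of equivalent deletions.

    Every deletion of `length` bases starting inside this closed interval
    produces the same alternative sequence as the deletion at `start`.
    """
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie inside the reference")
    # Moving one base left is neutral when the base entering the deleted
    # window on the left equals the base leaving it on the right.
    left = start
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1
    # Moving one base right is neutral under the mirrored condition.
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right


reference = "GGCACACAT"
interval = equivalent_deletion_range(reference, 4, 2)
```

In this toy reference, a 2-bp deletion annotated at position 4 could equally be reported at any start from 2 to 6, which is why two database entries with different coordinates and allele strings may describe one event.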
McCutchen-Maloney, Sandra L.
2002-01-01
DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding, which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera, which gives a decrease in signal.
Ries, David; Holtgräwe, Daniela; Viehöver, Prisca; Weisshaar, Bernd
2016-03-15
The combination of bulk segregant analysis (BSA) and next generation sequencing (NGS), also known as mapping by sequencing (MBS), has been shown to significantly accelerate the identification of causal mutations for species with a reference genome sequence. The usual approach is to cross homozygous parents that differ for the monogenic trait to address, to perform deep sequencing of DNA from F2 plants pooled according to their phenotype, and subsequently to analyze the allele frequency distribution based on a marker table for the parents studied. The method has been successfully applied for EMS induced mutations as well as natural variation. Here, we show that pooling genetically diverse breeding lines according to a contrasting phenotype also allows high resolution mapping of the causal gene in a crop species. The test case was the monogenic locus causing red vs. green hypocotyl color in Beta vulgaris (R locus). We determined the allele frequencies of polymorphic sequences using sequence data from two diverging phenotypic pools of 180 B. vulgaris accessions each. A single interval of about 31 kbp among the nine chromosomes was identified which indeed contained the causative mutation. By applying a variation of the mapping by sequencing approach, we demonstrated that phenotype-based pooling of diverse accessions from breeding panels and subsequent direct determination of the allele frequency distribution can be successfully applied for gene identification in a crop species. Our approach made it possible to identify a small interval around the causative gene. Sequencing of parents or individual lines was not necessary. Whenever the appropriate plant material is available, the approach described saves time compared to the generation of an F2 population. In addition, we provide clues for planning similar experiments with regard to pool size and the sequencing depth required.
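The comparison of allele frequencies between two phenotypic pools can be sketched roughly as follows. This is only an illustration under stated assumptions: the read counts are invented, the function names `delta_allele_frequency` and `best_window` are hypothetical, and the fixed-size window scan is a simple stand-in rather than the procedure used in the study.

```python
def delta_allele_frequency(pool_a, pool_b):
    """Absolute allele-frequency difference per marker.

    pool_a, pool_b: lists of (alt_reads, total_reads) tuples, one per marker,
    in the same marker order. (Hypothetical input layout.)"""
    if len(pool_a) != len(pool_b):
        raise ValueError("pools must cover the same markers")
    deltas = []
    for (alt_a, tot_a), (alt_b, tot_b) in zip(pool_a, pool_b):
        if tot_a <= 0 or tot_b <= 0:
            raise ValueError("every marker needs read coverage in both pools")
        deltas.append(abs(alt_a / tot_a - alt_b / tot_b))
    return deltas


def best_window(deltas, size):
    """Index of the first marker of the window with the highest mean delta."""
    if size <= 0 or size > len(deltas):
        raise ValueError("window size must be between 1 and len(deltas)")
    means = [sum(deltas[i:i + size]) / size
             for i in range(len(deltas) - size + 1)]
    return max(range(len(means)), key=means.__getitem__)


# Invented counts for four consecutive markers in the two pools.
red_pool = [(5, 50), (48, 50), (50, 50), (10, 50)]
green_pool = [(6, 50), (2, 50), (0, 50), (12, 50)]
deltas = delta_allele_frequency(red_pool, green_pool)
peak = best_window(deltas, 2)
```

Markers linked to the causal locus are expected to show allele frequencies that diverge between the pools, so the window with the largest mean difference points to the candidate interval.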
Integrating evolutionary and functional approaches to infer adaptation at specific loci.
Storz, Jay F; Wheat, Christopher W
2010-09-01
Inferences about adaptation at specific loci are often exclusively based on the static analysis of DNA sequence variation. Ideally, population-genetic evidence for positive selection serves as a stepping-off point for experimental studies to elucidate the functional significance of the putatively adaptive variation. We argue that inferences about adaptation at specific loci are best achieved by integrating the indirect, retrospective insights provided by population-genetic analyses with the more direct, mechanistic insights provided by functional experiments. Integrative studies of adaptive genetic variation may sometimes be motivated by experimental insights into molecular function, which then provide the impetus to perform population genetic tests to evaluate whether the functional variation is of adaptive significance. In other cases, studies may be initiated by genome scans of DNA variation to identify candidate loci for recent adaptation. Results of such analyses can then motivate experimental efforts to test whether the identified candidate loci do in fact contribute to functional variation in some fitness-related phenotype. Functional studies can provide corroborative evidence for positive selection at particular loci, and can potentially reveal specific molecular mechanisms of adaptation.
Kodama, T; Mori, K; Kawahara, T; Ringler, D J; Desrosiers, R C
1993-01-01
One rhesus macaque displayed severe encephalomyelitis and another displayed severe enterocolitis following infection with molecularly cloned simian immunodeficiency virus (SIV) strain SIVmac239. Little or no free anti-SIV antibody developed in these two macaques, and they died relatively quickly (4 to 6 months) after infection. Manifestation of the tissue-specific disease in these macaques was associated with the emergence of variants with high replicative capacity for macrophages and primary infection of tissue macrophages. The nature of sequence variation in the central region (vif, vpr, and vpx), the env gene, and the nef long terminal repeat (LTR) region in brain, colon, and other tissues was examined to see whether specific genetic changes were associated with SIV replication in brain or gut. Sequence analysis revealed strong conservation of the intergenic central region, nef, and the LTR. However, analysis of env sequences in these two macaques and one other revealed significant, interesting patterns of sequence variation. (i) Changes in env that were found previously to contribute to the replicative ability of SIVmac for macrophages in culture were present in the tissues of these animals. (ii) The greatest variability was located in the regions between V1 and V2 and from "V3" through C3 in gp120, which are different in location from the variable regions observed previously in animals with strong antibody responses and long-term persistent infection. (iii) The predominant sequence change of D-->N at position 385 in C3 is most surprising, since this change in both SIV and human immunodeficiency virus type 1 has been associated with dramatically diminished affinity for CD4 and replication in vitro. 
(iv) The nature of sequence changes at some positions (146, 178, 345, 385, and "V3") suggests that viral replication in brain and gut may be facilitated by specific sequence changes in env in addition to those that impart a general ability to replicate well in macrophages. These results demonstrate that complex selective pressures, including immune responses and varying cell and tissue specificity, can influence the nature of sequence changes in env. PMID:8411355
Kim, Daniel Seung; Crosslin, David R; Auer, Paul L; Suzuki, Stephanie M; Marsillach, Judit; Burt, Amber A; Gordon, Adam S; Meschia, James F; Nalls, Mike A; Worrall, Bradford B; Longstreth, W T; Gottesman, Rebecca F; Furlong, Clement E; Peters, Ulrike; Rich, Stephen S; Nickerson, Deborah A; Jarvik, Gail P
2014-06-01
HDL-associated paraoxonase-1 (PON1) is an enzyme whose activity is associated with cerebrovascular disease. Common PON1 genetic variants have not been consistently associated with cerebrovascular disease. Rare coding variation that likely alters PON1 enzyme function may be more strongly associated with stroke. The National Heart, Lung, and Blood Institute Exome Sequencing Project sequenced the coding regions (exomes) of the genome for heart, lung, and blood-related phenotypes (including ischemic stroke). In this sample of 4,204 unrelated participants, 496 had verified, noncardioembolic ischemic stroke. After filtering, 28 nonsynonymous PON1 variants were identified. Analysis with the sequence kernel association test, adjusted for covariates, identified significant associations between PON1 variants and ischemic stroke (P = 3.01 × 10^-3). Stratified analyses demonstrated a stronger association of PON1 variants with ischemic stroke in African ancestry (AA) participants (P = 5.03 × 10^-3). Ethnic differences in the association between PON1 variants and stroke could be due to the effects of PON1Val109Ile (overall P = 7.88 × 10^-3; AA P = 6.52 × 10^-4), found at higher frequency in AA participants (1.16% vs. 0.02%) and whose protein is less stable than the common allele. In summary, rare genetic variation in PON1 was associated with ischemic stroke, with stronger associations identified in those of AA. Increased focus on PON1 enzyme function and its role in cerebrovascular disease is warranted.
Sinha, Rituparna; Samaddar, Sandip; De, Rajat K.
2015-01-01
Copy number variation (CNV) is a form of structural alteration in the mammalian DNA sequence that is associated with many complex neurological diseases as well as cancer. The development of next generation sequencing (NGS) technology provides a new dimension for the detection of genomic locations with copy number variations. Here we develop an algorithm for detecting CNVs, which is based on depth of coverage data generated by NGS technology. In this work, we have used a novel way to represent the read count data as a two-dimensional geometrical point. A key aspect of detecting the regions with CNVs is to devise a proper segmentation algorithm that will distinguish the genomic locations having a significant difference in read count data. We have designed a new segmentation approach in this context, using a convex hull algorithm on the geometrical representation of read count data. To our knowledge, most algorithms have used a single distribution model of read count data, but in our approach we have considered the read count data to follow two different distribution models independently, which adds to the robustness of CNV detection. In addition, our algorithm calls CNVs based on the multiple sample analysis approach, resulting in a low false discovery rate with high precision. PMID:26291322
USDA-ARS's Scientific Manuscript database
Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...
Human Genome Sequencing in Health and Disease
Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.
2013-01-01
Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320
Association of Amine-Receptor DNA Sequence Variants with Associative Learning in the Honeybee.
Lagisz, Malgorzata; Mercer, Alison R; de Mouzon, Charlotte; Santos, Luana L S; Nakagawa, Shinichi
2016-03-01
Octopamine- and dopamine-based neuromodulatory systems play a critical role in learning and learning-related behaviour in insects. To further our understanding of these systems and resulting phenotypes, we quantified DNA sequence variations at six loci encoding octopamine and dopamine receptors and their association with aversive and appetitive learning traits in a population of honeybees. We identified 79 polymorphic sequence markers (mostly SNPs and a few insertions/deletions) located within or close to six candidate genes. Intriguingly, we found that levels of sequence variation in the protein-coding regions studied were low, indicating that sequence variation in the coding regions of receptor genes critical to learning and memory is strongly selected against. Non-coding and upstream regions of the same genes, however, were less conserved and sequence variations in these regions were weakly associated with between-individual differences in learning-related traits. While these associations do not directly imply a specific molecular mechanism, they suggest that the cross-talk between dopamine and octopamine signalling pathways may influence olfactory learning and memory in the honeybee.
Leung, Ross Ka-Kit; Dong, Zhi Qiang; Sa, Fei; Chong, Cheong Meng; Lei, Si Wan; Tsui, Stephen Kwok-Wing; Lee, Simon Ming-Yuen
2014-02-01
Minor variants have significant implications in quasispecies evolution, early cancer detection and non-invasive fetal genotyping but their accurate detection by next-generation sequencing (NGS) is hampered by sequencing errors. We generated sequencing data from mixtures at predetermined ratios in order to provide insight into sequencing errors and variations that can arise for which simulation cannot be performed. The information also enables better parameterization in depth of coverage, read quality and heterogeneity, library preparation techniques, technical repeatability for mathematical modeling, theory development and simulation experimental design. We devised minor variant authentication rules that achieved 100% accuracy in both testing and validation experiments. The rules are free from tedious inspection of alignment accuracy, sequencing read quality or errors introduced by homopolymers. The authentication processes only require minor variants to: (1) have minimum depth of coverage larger than 30; (2) be reported by (a) four or more variant callers, or (b) DiBayes or LoFreq, plus SNVer (or BWA when no results are returned by SNVer), and with the interassay coefficient of variation (CV) no larger than 0.1. Quantification accuracy undermined by sequencing errors could neither be overcome by ultra-deep sequencing, nor recruiting more variant callers to reach a consensus, such that consistent underestimation and overestimation (i.e. low CV) were observed. To accommodate stochastic error and adjust the observed ratio within a specified accuracy, we presented a proof of concept for the use of a double calibration curve for quantification, which provides an important reference towards potential industrial-scale fabrication of calibrants for NGS.
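The authentication rules listed above translate fairly directly into a predicate. The sketch below is one reading of the abstract, not the authors' code: the function name `authenticate_minor_variant` and its parameters are hypothetical, and it assumes that the CV threshold applies to both branch (a) and branch (b), which the abstract's wording leaves ambiguous.

```python
def authenticate_minor_variant(depth, callers, cv, snver_returned_results=True):
    """Apply the stated rules to one candidate minor variant.

    depth: minimum depth of coverage at the site.
    callers: names of the tools that reported the variant.
    cv: interassay coefficient of variation.
    snver_returned_results: False when SNVer produced no output at all,
        in which case BWA takes its place (as the abstract describes).
    Assumption: the CV limit is applied to both branches of rule (2)."""
    if depth <= 30:          # rule (1): depth must be larger than 30
        return False
    if cv > 0.1:             # CV must be no larger than 0.1
        return False

    reported = set(callers)
    if len(reported) >= 4:   # rule (2a): four or more variant callers
        return True

    # rule (2b): DiBayes or LoFreq, plus SNVer (or BWA as the fallback)
    confirmer = "SNVer" if snver_returned_results else "BWA"
    return bool(reported & {"DiBayes", "LoFreq"}) and confirmer in reported


accepted = authenticate_minor_variant(45, ["LoFreq", "SNVer"], 0.05)
```

Keeping the fallback as an explicit flag avoids guessing from the caller list whether SNVer was silent or simply disagreed.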
Joardar, Vinita; Abrams, Natalie F; Hostetler, Jessica; Paukstelis, Paul J; Pakala, Suchitra; Pakala, Suman B; Zafar, Nikhat; Abolude, Olukemi O; Payne, Gary; Andrianopoulos, Alex; Denning, David W; Nierman, William C
2012-12-12
The genera Aspergillus and Penicillium include some of the most beneficial as well as the most harmful fungal species such as the penicillin-producer Penicillium chrysogenum and the human pathogen Aspergillus fumigatus, respectively. Their mitochondrial genomic sequences may hold vital clues into the mechanisms of their evolution, population genetics, and biology, yet only a handful of these genomes have been fully sequenced and annotated. Here we report the complete sequence and annotation of the mitochondrial genomes of six Aspergillus and three Penicillium species: A. fumigatus, A. clavatus, A. oryzae, A. flavus, Neosartorya fischeri (A. fischerianus), A. terreus, P. chrysogenum, P. marneffei, and Talaromyces stipitatus (P. stipitatum). The accompanying comparative analysis of these and related publicly available mitochondrial genomes reveals wide variation in size (25-36 Kb) among these closely related fungi. The sources of genome expansion include group I introns and accessory genes encoding putative homing endonucleases, DNA and RNA polymerases (presumed to be of plasmid origin) and hypothetical proteins. The two smallest sequenced genomes (A. terreus and P. chrysogenum) do not contain introns in protein-coding genes, whereas the largest genome (T. stipitatus), contains a total of eleven introns. All of the sequenced genomes have a group I intron in the large ribosomal subunit RNA gene, suggesting that this intron is fixed in these species. Subsequent analysis of several A. fumigatus strains showed low intraspecies variation. This study also includes a phylogenetic analysis based on 14 concatenated core mitochondrial proteins. The phylogenetic tree has a different topology from published multilocus trees, highlighting the challenges still facing the Aspergillus systematics. 
The study expands the genomic resources available to fungal biologists by providing mitochondrial genomes with consistent annotations for future genetic, evolutionary and population studies. Despite the conservation of the core genes, the mitochondrial genomes of the Aspergillus and Penicillium species examined here exhibit a significant amount of interspecies variation. Most of this variation can be attributed to accessory genes and mobile introns, presumably acquired by horizontal gene transfer of mitochondrial plasmids and intron homing.
2010-01-01
Genetic variation and evolutionary demography of the shrimp Fenneropenaeus chinensis were investigated using sequence data of the complete mitochondrial control region (CR). Fragments of 993 bp of the CR were sequenced for 93 individuals from five localities over most of the species' range in the Yellow Sea and the Bohai Sea. There were 84 variable sites defining 68 haplotypes. Haplotype diversity levels were very high (0.95 ± 0.03-0.99 ± 0.02) in F. chinensis populations, whereas those of nucleotide diversity were moderate to low (0.66 ± 0.36%-0.84 ± 0.46%). Analysis of molecular variance and conventional population statistics (FST) revealed no significant genetic structure throughout the range of F. chinensis. Mismatch distribution, estimates of population parameters and neutrality tests revealed that the significant fluctuations and shallow coalescence of mtDNA genealogies observed were coincident with estimated demographic parameters and neutrality tests, in implying important past-population size fluctuations or range expansion. Isolation with Migration (IM) coalescence results suggest that F. chinensis, distributed along the coasts of northern China and the Korean Peninsula (about 1000 km apart), diverged recently, the estimated time-split being 12,800 (7,400-18,600) years ago. PMID:21637498
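The two diversity statistics quoted above can be computed from aligned sequences as in the following minimal sketch. It uses the standard textbook definitions (haplotype diversity as n/(n-1) times one minus the sum of squared haplotype frequencies, and nucleotide diversity as the mean number of pairwise differences per site); the function names and the four toy sequences are assumptions for illustration and are not data from the study.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site across all sequence pairs."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty and equal in length")
    diffs = sum(sum(a != b for a, b in zip(s, t))
                for s, t in combinations(seqs, 2))
    pairs = n * (n - 1) / 2
    return diffs / (pairs * length)


# Toy alignment: three distinct haplotypes among four individuals.
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
```

A sample with many distinct haplotypes that differ at only a few sites gives a high haplotype diversity together with a low nucleotide diversity, which is the pattern the abstract reports.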
Alkan, Can; Kavak, Pinar; Somel, Mehmet; Gokcumen, Omer; Ugurlu, Serkan; Saygi, Ceren; Dal, Elif; Bugra, Kuyas; Güngör, Tunga; Sahinalp, S Cenk; Özören, Nesrin; Bekpen, Cemalettin
2014-11-07
Turkey is a crossroads of major population movements throughout history and has been a hotspot of cultural interactions. Several studies have investigated the complex population history of Turkey through a limited set of genetic markers. However, to date, there have been no studies to assess the genetic variation at the whole genome level using whole genome sequencing. Here, we present whole genome sequences of 16 Turkish individuals resequenced at high coverage (32×-48×). We show that the genetic variation of the contemporary Turkish population clusters with South European populations, as expected, but also shows signatures of relatively recent contribution from ancestral East Asian populations. In addition, we document a significant enrichment of non-synonymous private alleles, consistent with recent observations in European populations. A number of variants associated with skin color and total cholesterol levels show frequency differentiation between the Turkish populations and European populations. Furthermore, we have analyzed the 17q21.31 inversion polymorphism region (MAPT locus) and found an increased allele frequency of 31.25% for the H1/H2 inversion polymorphism when compared to European populations, which show an allele frequency of about 25%. This study provides the first map of common genetic variation from 16 western Asian individuals and thus helps fill an important geographical gap in analyzing natural human variation and human migration. Our data will help develop population-specific experimental designs for studies investigating disease associations and demographic history in Turkey.
CNV-TV: a robust method to discover copy number variation from short sequencing reads.
Duan, Junbo; Zhang, Ji-Gang; Deng, Hong-Wen; Wang, Yu-Ping
2013-05-02
Copy number variation (CNV) is an important structural variation (SV) in the human genome. Various studies have shown that CNVs are associated with complex diseases. Traditional CNV detection methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution. The next generation sequencing (NGS) technique promises a higher resolution detection of CNVs and several methods were recently proposed for realizing such a promise. However, the performances of these methods are not robust under some conditions, e.g., some of them may fail to detect CNVs of short sizes. There has been a strong demand for reliable detection of CNVs from high resolution NGS data. A novel and robust method to detect CNV from short sequencing reads is proposed in this study. The detection of CNV is modeled as a change-point detection from the read depth (RD) signal derived from the NGS, which is fitted with a total variation (TV) penalized least squares model. The performance (e.g., sensitivity and specificity) of the proposed approach is evaluated by comparison with several recently published methods on both simulated and real data from the 1000 Genomes Project. The experimental results showed that both the true positive rate and false positive rate of the proposed detection method do not change significantly for CNVs with different copy numbers and lengths, when compared with several existing methods. Therefore, our proposed approach results in a more reliable detection of CNVs than the existing methods.
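For orientation, a total variation penalized least squares fit of a read depth signal is commonly written in the generic form below. This is the textbook one-dimensional formulation, given here as an assumption; the exact objective, weighting, and choice of penalty parameter used by CNV-TV may differ. Here y_i denotes the observed read depth in bin i, x_i the fitted piecewise-constant signal, and \lambda a hypothetical tuning parameter.

```latex
\hat{x} \;=\; \arg\min_{x \in \mathbb{R}^{n}} \;
\frac{1}{2} \sum_{i=1}^{n} \left( y_i - x_i \right)^2
\;+\; \lambda \sum_{i=1}^{n-1} \left| x_{i+1} - x_i \right|
```

The first term keeps the fit close to the observed read depth, while the second penalizes jumps between neighboring bins, so the solution is piecewise constant and its jump locations serve as candidate change points, that is, candidate CNV boundaries. A larger \lambda yields fewer, longer segments.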
Goossens, Dirk; Moens, Lotte N; Nelis, Eva; Lenaerts, An-Sofie; Glassee, Wim; Kalbe, Andreas; Frey, Bruno; Kopal, Guido; De Jonghe, Peter; De Rijk, Peter; Del-Favero, Jurgen
2009-03-01
We evaluated multiplex PCR amplification as a front-end for high-throughput sequencing, to widen the applicability of massive parallel sequencers for the detailed analysis of complex genomes. Using multiplex PCR reactions, we sequenced the complete coding regions of seven genes implicated in peripheral neuropathies in 40 individuals on a GS-FLX genome sequencer (Roche). The resulting dataset showed highly specific and uniform amplification. Comparison of the GS-FLX sequencing data with the dataset generated by Sanger sequencing confirmed the detection of all variants present and proved the sensitivity of the method for mutation detection. In addition, we showed that we could exploit the multiplexed PCR amplicons to determine individual copy number variation (CNV), increasing the spectrum of detected variations to both genetic and genomic variants. We conclude that our straightforward procedure substantially expands the applicability of the massive parallel sequencers for sequencing projects of a moderate number of amplicons (50-500) with typical applications in resequencing exons in positional or functional candidate regions and molecular genetic diagnostics. 2008 Wiley-Liss, Inc.
PGen: large-scale genomic variations analysis workflow and browser in SoyKB.
Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti
2016-10-06
With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub ( https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow ) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB), ( http://soykb.org/Pegasus/index.php ). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects adding to 500+ soybean germplasm lines in total have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser ( http://soykb.org/NGS_Resequence/NGS_index.php ) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers. 
PGen workflow has been optimized for the most efficient analysis of soybean data using thorough testing and validation. This research serves as an example of best practices for development of genomics data analysis workflows by integrating remote HPC resources and efficient data management with ease of use for biological users. PGen workflow can also be easily customized for analysis of data in other species.
Awua, Adolf K; Adanu, Richard M K; Wiredu, Edwin K; Afari, Edwin A; Zubuch, Vanessa A; Asmah, Richard H; Severini, Alberto
2017-04-21
In addition to being useful for classification, sequence variations of human Papillomavirus (HPV) genotypes have been implicated in differential oncogenic potential and a differential association with the different histological forms of invasive cervical cancer. These associations have also been indicated for HPV genotype lineages and sub-lineages. In order to better understand the potential implications of lineage variation in the occurrence of cervical cancers in Ghana, we studied the lineages of the three most prevalent HPV genotypes among women with normal cytology as a baseline for further studies. Of previously collected self- and health personnel-collected cervical specimens, 54, which were positive for HPV16, 18 and 45, were selected and the long control region (LCR) of each HPV genotype was separately amplified by a nested PCR. DNA sequences of 41 isolates obtained with the forward and reverse primers by Sanger sequencing were analysed. Nucleotide sequence variations of the HPV16 genotypes were observed at 30 positions within the LCR (7460 - 7840). Of these, 19 were the known variations for the lineages B and C (African lineages), while the other 11 positions had variations unique to the HPV16 isolates of this study. For the HPV18 isolates, the variations were at 35 positions, 22 of which were known variations of African lineages and the other 13 were unique variations observed for the isolates obtained in this study (at positions 7799 and 7813). HPV45 isolates had variations at 35 positions and 2 (positions 7114 and 97) were unique to the isolates of this study. This study provides the first data on the lineages of HPV 16, 18 and 45 isolates from Ghana. Although the study did not obtain full genome sequence data for a comprehensive comparison with known lineages, these genotypes were predominantly of the African lineages and had some unique sequence variations at positions that suggest potential oncogenic implications. 
These data will be useful for comparison with lineages of these genotypes from women with cervical lesion and all the forms of invasive cervical cancers.
Global Genomic Diversity of Oryza sativa Varieties Revealed by Comparative Physical Mapping
Wang, Xiaoming; Kudrna, David A.; Pan, Yonglong; Wang, Hao; Liu, Lin; Lin, Haiyan; Zhang, Jianwei; Song, Xiang; Goicoechea, Jose Luis; Wing, Rod A.; Zhang, Qifa; Luo, Meizhong
2014-01-01
Bacterial artificial chromosome (BAC) physical maps embedding a large number of BAC end sequences (BESs) were generated for Oryza sativa ssp. indica varieties Minghui 63 (MH63) and Zhenshan 97 (ZS97) and were compared with the genome sequences of O. sativa ssp. japonica cv. Nipponbare and O. sativa ssp. indica cv. 93-11. The comparisons exhibited substantial diversities in terms of large structural variations and small substitutions and indels. Genome-wide BAC-sized and contig-sized structural variations were detected, and the shared variations were analyzed. In the expansion regions of the Nipponbare reference sequence, in comparison to the MH63 and ZS97 physical maps, as well as to the previously constructed 93-11 physical map, the amounts and types of the repeat contents, and the outputs of gene ontology analysis, were significantly different from those of the whole genome. Using the physical maps of four wild Oryza species from OMAP (http://www.omap.org) as a control, we detected many conserved and divergent regions related to the evolution process of O. sativa. Between the BESs of MH63 and ZS97 and the two reference sequences, a total of 1532 polymorphic simple sequence repeats (SSRs), 71,383 SNPs, 1767 multiple nucleotide polymorphisms, 6340 insertions, and 9137 deletions were identified. This study provides independent whole-genome resources for intra- and intersubspecies comparisons and functional genomics studies in O. sativa. Both the comparative physical maps and the GBrowse, which integrated the QTL and molecular markers from GRAMENE (http://www.gramene.org) with our physical maps and analysis results, are open to the public through our Web site (http://gresource.hzau.edu.cn/resource/resource.html). PMID:24424778
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Chengyuan; De Grijs, Richard; Deng, Licai, E-mail: joshuali@pku.edu.cn, E-mail: grijs@pku.edu.cn
2014-04-01
Using a combination of high-resolution Hubble Space Telescope/Wide-Field and Planetary Camera-2 observations, we explore the physical properties of the stellar populations in two intermediate-age star clusters, NGC 1831 and NGC 1868, in the Large Magellanic Cloud based on their color-magnitude diagrams. We show that both clusters exhibit extended main-sequence turn-offs. To explain the observations, we consider variations in helium abundance, binarity, age dispersions, and the fast rotation of the clusters' member stars. The observed narrow main sequence excludes significant variations in helium abundance in both clusters. We first establish the clusters' main-sequence binary fractions using the bulk of the clusters' main-sequence stellar populations ≳ 1 mag below their turn-offs. The extent of the turn-off regions in color-magnitude space, corrected for the effects of binarity, implies that age spreads of order 300 Myr may be inferred for both clusters if the stellar distributions in color-magnitude space were entirely due to the presence of multiple populations characterized by an age range. Invoking rapid rotation of the population of cluster members characterized by a single age also allows us to match the observed data in detail. However, when taking into account the extent of the red clump in color-magnitude space, we encounter an apparent conflict for NGC 1831 between the age dispersion derived from the extent of the main-sequence turn-off and that implied by the compact red clump. We therefore conclude that, for this cluster, variations in stellar rotation rate are preferred over an age dispersion. For NGC 1868, both models perform equally well.
Characterization of genetic sequence variation of 58 STR loci in four major population groups.
Novroski, Nicole M M; King, Jonathan L; Churchill, Jennifer D; Seah, Lay Hong; Budowle, Bruce
2016-11-01
Massively parallel sequencing (MPS) can identify sequence variation within short tandem repeat (STR) alleles as well as their nominal allele lengths that traditionally have been obtained by capillary electrophoresis. Using the MiSeq FGx Forensic Genomics System (Illumina), STRait Razor, and in-house Excel workbooks, genetic variation was characterized within STR repeat and flanking regions of 27 autosomal, 7 X-chromosome and 24 Y-chromosome STR markers in 777 unrelated individuals from four population groups. Seven hundred and forty-six autosomal, 227 X-chromosome, and 324 Y-chromosome STR alleles were identified by sequence compared with 357 autosomal, 107 X-chromosome, and 189 Y-chromosome STR alleles that were identified by length. Within the observed sequence variation, 227 autosomal, 156 X-chromosome, and 112 Y-chromosome novel alleles were identified and described. One hundred and seventy-six autosomal, 123 X-chromosome, and 93 Y-chromosome sequence variants resided within STR repeat regions, and 86 autosomal, 39 X-chromosome, and 20 Y-chromosome variants were located in STR flanking regions. Three markers, D18S51, DXS10135, and DYS385a-b, had 1, 4, and 1 alleles, respectively, which contained both a novel repeat region variant and a flanking sequence variant in the same nucleotide sequence. There were 50 markers that demonstrated a relative increase in diversity with the variant sequence alleles compared with those of traditional nominal length alleles. These population data illustrate the genetic variation that exists in the commonly used STR markers in the selected population samples and provide allele frequencies for statistical calculations related to STR profiling with MPS data. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
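The difference between counting alleles by length and by sequence can be illustrated with a minimal sketch. The function name and the three example repeat strings are hypothetical and are not taken from the study; the sketch only shows why sequence-based typing can resolve more alleles than length-based typing at the same marker.

```python
from collections import defaultdict


def count_alleles(observed_sequences):
    """Return (alleles distinguished by length, alleles distinguished by sequence)."""
    by_length = defaultdict(set)
    for seq in observed_sequences:
        # Alleles of the same length collapse into one class under length-based typing.
        by_length[len(seq)].add(seq)
    n_by_length = len(by_length)
    n_by_sequence = sum(len(seqs) for seqs in by_length.values())
    return n_by_length, n_by_sequence


# Hypothetical repeat regions: the first two share a length but differ at one base.
reads = ["TCTATCTATCTATCTA", "TCTATCTGTCTATCTA", "TCTATCTATCTA"]
length_alleles, sequence_alleles = count_alleles(reads)
```

Here the two 16-base alleles are indistinguishable by length, so length-based typing reports fewer alleles than sequence-based typing does.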
Variation block-based genomics method for crop plants.
Kim, Yul Ho; Park, Hyang Mi; Hwang, Tae-Young; Lee, Seuk Ki; Choi, Man Soo; Jho, Sungwoong; Hwang, Seungwoo; Kim, Hak-Min; Lee, Dongwoo; Kim, Byoung-Chul; Hong, Chang Pyo; Cho, Yun Sung; Kim, Hyunmin; Jeong, Kwang Ho; Seo, Min Jung; Yun, Hong Tai; Kim, Sun Lim; Kwon, Young-Up; Kim, Wook Han; Chun, Hye Kyung; Lim, Sang Jong; Shin, Young-Ah; Choi, Ik-Young; Kim, Young Sun; Yoon, Ho-Sung; Lee, Suk-Ha; Lee, Sunghoon
2014-06-15
In contrast with wild species, cultivated crop genomes consist of reshuffled recombination blocks, which occurred by crossing and selection processes. Accordingly, recombination block-based genomics analysis can be an effective approach for the screening of target loci for agricultural traits. We propose the variation block method, which is a three-step process for recombination block detection and comparison. The first step is to detect variations by comparing the short-read DNA sequences of the cultivar to the reference genome of the target crop. Next, sequence blocks with variation patterns are examined and defined. The boundaries between the variation-containing sequence blocks are regarded as recombination sites. All the assumed recombination sites in the cultivar set are used to split the genomes, and the resulting sequence regions are termed variation blocks. Finally, the genomes are compared using the variation blocks. The variation block method identified recurring recombination blocks accurately and successfully represented block-level diversities in the publicly available genomes of 31 soybean and 23 rice accessions. The practicality of this approach was demonstrated by the identification of a putative locus determining soybean hilum color. We suggest that the variation block method is an efficient genomics method for the recombination block-level comparison of crop genomes. We expect that this method will facilitate the development of crop genomics by bringing genomics technologies to the field of crop breeding.
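The splitting step of the variation block method can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each cultivar is assumed to be already summarized as a list of (start, end) intervals of variation-containing sequence on one chromosome, and the names `variation_blocks` and `cultivar_blocks` are hypothetical.

```python
def variation_blocks(chrom_length, cultivar_blocks):
    """Split [0, chrom_length) at every block boundary seen in any cultivar.

    cultivar_blocks maps a cultivar name to a list of (start, end) intervals
    of variation-containing sequence; interval edges are treated as assumed
    recombination sites.
    """
    boundaries = {0, chrom_length}
    for intervals in cultivar_blocks.values():
        for start, end in intervals:
            boundaries.update((start, end))
    # Keep only boundaries that fall on the chromosome, in genomic order.
    cuts = sorted(b for b in boundaries if 0 <= b <= chrom_length)
    # Consecutive boundaries delimit one variation block each.
    return list(zip(cuts[:-1], cuts[1:]))


# Hypothetical input: two cultivars with overlapping variation-containing blocks.
cultivar_blocks = {"cultivar_A": [(100, 400)], "cultivar_B": [(250, 700)]}
blocks = variation_blocks(1000, cultivar_blocks)
```

Because the boundaries from all cultivars are pooled before splitting, every cultivar is described on the same set of blocks, which is what makes block-level comparison between genomes possible.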
Sawle, Lucas; Ghosh, Kingshuk
2015-08-28
A general formalism to compute configurational properties of proteins and other heteropolymers with an arbitrary sequence of charges and non-uniform excluded volume interaction is presented. A variational approach is utilized to predict average distance between any two monomers in the chain. The presented analytical model, for the first time, explicitly incorporates the role of sequence charge distribution to determine relative sizes between two sequences that vary not only in total charge composition but also in charge decoration (even when charge composition is fixed). Furthermore, the formalism is general enough to allow variation in excluded volume interactions between two monomers. Model predictions are benchmarked against the all-atom Monte Carlo studies of Das and Pappu [Proc. Natl. Acad. Sci. U. S. A. 110, 13392 (2013)] for 30 different synthetic sequences of polyampholytes. These sequences possess an equal number of glutamic acid (E) and lysine (K) residues but differ in the patterning within the sequence. Without any fit parameter, the model captures the strong sequence dependence of the simulated values of the radius of gyration with a correlation coefficient of R² = 0.9. The model is then applied to real proteins to compare the unfolded state dimensions of 540 orthologous pairs of thermophilic and mesophilic proteins. The excluded volume parameters are assumed similar under denatured conditions, and only electrostatic effects encoded in the sequence are accounted for. With these assumptions, thermophilic proteins are found, with high statistical significance, to have more compact disordered ensembles than their mesophilic counterparts. The method presented here, due to its analytical nature, is capable of such high-throughput analysis of multiple proteins and will have broad applications in proteomic studies as well as in other heteropolymeric systems.
Reliable Detection of Herpes Simplex Virus Sequence Variation by High-Throughput Resequencing.
Morse, Alison M; Calabro, Kaitlyn R; Fear, Justin M; Bloom, David C; McIntyre, Lauren M
2017-08-16
High-throughput sequencing (HTS) has resulted in data for a number of herpes simplex virus (HSV) laboratory strains and clinical isolates. The knowledge of these sequences has been critical for investigating viral pathogenicity. However, the assembly of complete herpesviral genomes, including HSV, is complicated due to the existence of large repeat regions and arrays of smaller reiterated sequences that are commonly found in these genomes. In addition, the inherent genetic variation in populations of isolates for viruses and other microorganisms presents an additional challenge to many existing HTS sequence assembly pipelines. Here, we evaluate two approaches for the identification of genetic variants in HSV1 strains using Illumina short read sequencing data. The first, a reference-based approach, identifies variants from reads aligned to a reference sequence and the second, a de novo assembly approach, identifies variants from reads aligned to de novo assembled consensus sequences. Of critical importance for both approaches is the reduction in the number of low complexity regions through the construction of a non-redundant reference genome. We compared variants identified in the two methods. Our results indicate that approximately 85% of variants are identified regardless of the approach. The reference-based approach to variant discovery captures an additional 15% representing variants divergent from the HSV1 reference possibly due to viral passage. Reference-based approaches are significantly less labor-intensive and identify variants across the genome where de novo assembly-based approaches are limited to regions where contigs have been successfully assembled. In addition, regions of poor quality assembly can lead to false variant identification in de novo consensus sequences. For viruses with a well-assembled reference genome, a reference-based approach is recommended.
Bazsalovicsová, Eva; Králová-Hromadová, Ivica; Stefka, Jan; Scholz, Tomáš
2012-05-01
Sequence structure of complete internal transcribed spacer 1 and 2 (ITS1 and ITS2) of the ribosomal DNA region and partial mitochondrial cytochrome c oxidase subunit I (cox1) gene sequences were studied in the monozoic tapeworm Atractolytocestus sagittatus (Kulakovskaya et Akhmerov, 1965) (Cestoda: Caryophyllidea), a parasite of common carp (Cyprinus carpio carpio L.). Intraindividual sequence diversity was observed in both ribosomal spacers. In ITS1, a total of 19 recombinant clones yielded eight different sequence types (pairwise sequence identity, 99.7-100%) which, however, did not resemble the structure typical for divergent intragenomic ITS copies (paralogues). Polymorphism was displayed by several single nucleotide mutations present exclusively in single clones, but variation in the number of short repetitive motifs was not observed. In ITS2, a total of 21 recombinant clones yielded ten different sequence types (pairwise sequence identity, 97.5-100%). They were mostly characterized by a varying number of (TCGT)n repeats resulting in assortment of ITS2 sequences into two sequence variants, which reflected the structure specific for ITS paralogues. The third DNA region analysed, the mitochondrial cox1 gene (669 bp), was 100% identical in all studied A. sagittatus individuals. Comparison of molecular data on A. sagittatus with those on Atractolytocestus huronensis Anthony, 1958, an invasive parasite of common carp, has shown that interspecific differences significantly exceeded intraspecific variation in both ribosomal spacers (81.4-82.5% in ITS1, 74.4-75.2% in ITS2) as well as in mitochondrial cox1, which confirms the validity of both congeneric tapeworms parasitic in the same fish host.
Taylor, E B; Pollard, S; Louie, D
1999-07-01
Bull trout, Salvelinus confluentus (Salmonidae), are distributed in northwestern North America from Nevada to Yukon Territory, largely in interior drainages. The species is of conservation concern owing to declines in abundance, particularly in southern portions of its range. To investigate phylogenetic structure within bull trout that might form the basis for the delineation of major conservation units, we conducted a mitochondrial DNA (mtDNA) survey in bull trout from throughout its range. Restriction fragment length polymorphism (RFLP) analysis of four segments of the mtDNA genome with 11 restriction enzymes resolved 21 composite haplotypes that differed by an average of 0.5% in sequence. One group of haplotypes predominated in 'coastal' areas (west of the coastal mountain ranges) while another predominated in 'interior' regions (east of the coastal mountains). The two putative lineages differed by 0.8% in sequence and were also resolved by sequencing a portion of the ND1 gene in a representative of each RFLP haplotype. Significant variation existed within individual sample sites (12% of total variation) and among sites within major geographical regions (33%), but most variation (55%) was associated with differences between coastal and interior regions. We concluded that: (i) bull trout are subdivided into coastal and interior lineages; (ii) this subdivision reflects recent historical isolation in two refugia south of the Cordilleran ice sheet during the Pleistocene: the Chehalis and Columbia refugia; and (iii) most of the molecular variation resides at the interpopulation and inter-region levels. Conservation efforts, therefore, should focus on maintaining as many populations as possible across as many geographical regions as possible within both coastal and interior lineages.
Norris, Steven J.
2015-01-01
Summary Spirochetes that cause Lyme borreliosis (also called Lyme disease) possess the vls locus, encoding an elaborate antigenic variation system. This locus contains the expression site vlsE as well as a contiguous array of vls silent cassettes, which contain variations of the central cassette region of vlsE. The locus is present on one of the many linear plasmids in the organism, e.g. plasmid lp28-1 in the strain B. burgdorferi B31. Changes in the sequence of vlsE occur continuously during mammalian infection and consist of random, segmental, unidirectional recombination events between the silent cassettes and the cassette region of vlsE. These gene conversion events do not occur during in vitro culture or the tick portion of the infection cycle of Borrelia burgdorferi or the other related Borrelia species that cause Lyme disease. The mechanism of recombination is largely unknown, but requires the RuvAB Holliday junction branch migrase. Other features of the vls locus also appear to be required, including cis locations of vlsE and the silent cassettes and high G+C content and GC skew. The vls system is required for long-term survival of Lyme Borrelia in infected mammals and represents an important mechanism of immune evasion. In addition to sequence variation, immune selection also results in significant heterogeneity in the sequence of the surface lipoprotein VlsE. Despite antigenic variation, VlsE generates a robust antibody response, and both full length VlsE and the C6 peptide (corresponding to invariant region 6) are widely used in immunodiagnostic tests for Lyme disease. PMID:26104445
2010-01-01
Background: Accurate diagnosis is essential for prompt and appropriate treatment of malaria. While rapid diagnostic tests (RDTs) offer great potential to improve malaria diagnosis, the sensitivity of RDTs has been reported to be highly variable. One possible factor contributing to variable test performance is the diversity of parasite antigens. This is of particular concern for Plasmodium falciparum histidine-rich protein 2 (PfHRP2)-detecting RDTs since PfHRP2 has been reported to be highly variable in isolates of the Asia-Pacific region. Methods: The pfhrp2 exon 2 fragment from 458 isolates of P. falciparum collected from 38 countries was amplified and sequenced. For a subset of 80 isolates, the exon 2 fragment of histidine-rich protein 3 (pfhrp3) was also amplified and sequenced. DNA sequence and statistical analysis of the variation observed in these genes was conducted. The potential impact of the pfhrp2 variation on RDT detection rates was examined by analysing the relationship between sequence characteristics of this gene and the results of the WHO product testing of malaria RDTs: Round 1 (2008), for 34 PfHRP2-detecting RDTs. Results: Sequence analysis revealed extensive variations in the number and arrangement of various repeats encoded by the genes in parasite populations world-wide. However, no statistically robust correlation between gene structure and RDT detection rate for P. falciparum parasites at 200 parasites per microlitre was identified. Conclusions: The results suggest that despite extreme sequence variation, diversity of PfHRP2 does not appear to be a major cause of RDT sensitivity variation. PMID:20470441
Spuesens, Emiel B M; Oduber, Minoushka; Hoogenboezem, Theo; Sluijter, Marcel; Hartwig, Nico G; van Rossum, Annemarie M C; Vink, Cornelis
2009-07-01
The gene encoding major adhesin protein P1 of Mycoplasma pneumoniae, MPN141, contains two DNA sequence stretches, designated RepMP2/3 and RepMP4, which display variation among strains. This variation allows strains to be differentiated into two major P1 genotypes (1 and 2) and several variants. Interestingly, multiple versions of the RepMP2/3 and RepMP4 elements exist at other sites within the bacterial genome. Because these versions are closely related in sequence, but not identical, it has been hypothesized that they have the capacity to recombine with their counterparts within MPN141, and thereby serve as a source of sequence variation of the P1 protein. In order to determine the variation within the RepMP2/3 and RepMP4 elements, both within the bacterial genome and among strains, we analysed the DNA sequences of all RepMP2/3 and RepMP4 elements within the genomes of 23 M. pneumoniae strains. Our data demonstrate that: (i) recombination is likely to have occurred between two RepMP2/3 elements in four of the strains, and (ii) all previously described P1 genotypes can be explained by inter-RepMP recombination events. Moreover, the difference between the two major P1 genotypes was reflected in all RepMP elements, such that subtype 1 and 2 strains can be differentiated on the basis of sequence variation in each RepMP element. This implies that subtype 1 and subtype 2 strains represent evolutionarily diverged strain lineages. Finally, a classification scheme is proposed in which the P1 genotype of M. pneumoniae isolates can be described in a sequence-based, universal fashion.
Child Development and Structural Variation in the Human Genome
ERIC Educational Resources Information Center
Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.
2013-01-01
Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…
Sampson, Juliana K.; Sheth, Nihar U.; Koparde, Vishal N.; Scalora, Allison F.; Serrano, Myrna G.; Lee, Vladimir; Roberts, Catherine H.; Jameson-Lee, Max; Ferreira-Gonzalez, Andrea; Manjili, Masoud H.; Buck, Gregory A.; Neale, Michael C.; Toor, Amir A.
2016-01-01
Summary Whole exome sequencing (WES) was performed on stem cell transplant donor-recipient (D-R) pairs to determine the extent of potential antigenic variation at a molecular level. In a small cohort of D-R pairs, a high frequency of sequence variation was observed between the donor and recipient exomes independent of human leucocyte antigen (HLA) matching. Nonsynonymous, nonconservative single nucleotide polymorphisms were approximately twice as frequent in HLA-matched unrelated, compared with related D-R pairs. When mapped to individual chromosomes, these polymorphic nucleotides were uniformly distributed across the entire exome. In conclusion, WES reveals extensive nucleotide sequence variation in the exomes of HLA-matched donors and recipients. PMID:24749631
An, Z; Tang, Z; Ma, B; Mason, A S; Guo, Y; Yin, J; Gao, C; Wei, L; Li, J; Fu, D
2014-07-01
Although many studies have shown that transposable element (TE) activation is induced by hybridisation and polyploidisation in plants, much less is known on how different types of TE respond to hybridisation, and the impact of TE-associated sequences on gene function. We investigated the frequency and regularity of putative transposon activation for different types of TE, and determined the impact of TE-associated sequence variation on the genome during allopolyploidisation. We designed different types of TE primers and adopted the Inter-Retrotransposon Amplified Polymorphism (IRAP) method to detect variation in TE-associated sequences during the process of allopolyploidisation between Brassica rapa (AA) and Brassica oleracea (CC), and in successive generations of self-pollinated progeny. In addition, fragments with TE insertions were used to perform Blast2GO analysis to characterise the putative functions of the fragments with TE insertions. Ninety-two primers amplifying 548 loci were used to detect variation in sequences associated with four different orders of TE sequences. TEs could be classed in ascending frequency into LTR-REs, TIRs, LINEs, SINEs and unknown TEs. The frequency of novel variation (putative activation) detected for the four orders of TEs was highest from the F1 to F2 generations, and lowest from the F2 to F3 generations. Functional annotation of sequences with TE insertions showed that genes with TE insertions were mainly involved in metabolic processes and binding, and preferentially functioned in organelles. TE variation in our study severely disturbed the genetic compositions of the different generations, resulting in inconsistencies in genetic clustering. Different types of TE showed different patterns of variation during the process of allopolyploidisation. © 2013 German Botanical Society and The Royal Botanical Society of the Netherlands.
Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats.
Fungtammasan, Arkarachai; Tomaszkiewicz, Marta; Campos-Sánchez, Rebeca; Eckert, Kristin A; DeGiorgio, Michael; Makova, Kateryna D
2016-10-01
Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA-DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Effective de novo assembly of fish genome using haploid larvae.
Iwasaki, Yuki; Nishiki, Issei; Nakamura, Yoji; Yasuike, Motoshige; Kai, Wataru; Nomura, Kazuharu; Yoshida, Kazunori; Nomura, Yousuke; Fujiwara, Atushi; Kobayashi, Takanori; Ototake, Mitsuru
2016-02-01
Recent improvements in next-generation sequencing technology have made whole-genome sequencing possible even for non-model eukaryote species with no available reference genomes. However, de novo assembly of diploid genomes is still a big challenge because of allelic variation. The aim of this study was to determine the feasibility of utilizing the genome of haploid fish larvae for de novo assembly of whole-genome sequences. We compared the efficiency of assembly using the haploid genome of yellowtail (Seriola quinqueradiata) with that using the diploid genome obtained from the dam. De novo assembly from the haploid and the diploid sequence reads (100 million reads per dataset) generated by the Ion Proton sequencer (200 bp) was done under two different assembly algorithms, namely overlap-layout-consensus (OLC) and de Bruijn graph (DBG). This revealed that the assembly of the haploid genome significantly reduced (approximately 22% for OLC, 9% for DBG) the total number of contigs (with longer average and N50 contig lengths) when compared to the diploid genome assembly. The haploid assembly also improved the quality of the scaffolds by reducing the number of regions with unassigned nucleotides (Ns) (total length of Ns; 45,331,916 bp for haploids and 67,724,360 bp for diploids) in OLC-based assemblies. It appears clear that the haploid genome assembly is better because the allelic variation in the diploid genome disrupts the extension of contigs during the assembly process. Our results indicate that utilizing the genome of haploid larvae leads to a significant improvement in the de novo assembly process, thus providing a novel strategy for the construction of reference genomes from non-model diploid organisms such as fish. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
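The N50 contig length used above to compare assemblies has a standard definition: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch follows; the contig lengths are made-up illustrative values, not data from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the assembly is the N50 contig.
        if running >= half_total:
            return length
    raise ValueError("n50 requires at least one contig")


# Illustrative contig lengths (bp); total is 300, so the half-way point is 150.
example_n50 = n50([100, 80, 50, 30, 20, 20])
```

A more fragmented assembly of the same total length has more, shorter contigs and therefore a lower N50, which is why N50 is reported alongside the contig count.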
Mining sequence variations in representative polyploid sugarcane germplasm accessions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Xiping; Song, Jian; You, Qian; ...
2017-08-09
Sugarcane (Saccharum spp.) is one of the most important economic crops because of its high sugar production and biofuel potential. Due to the high polyploid level and complex genome of sugarcane, it has been a huge challenge to investigate genomic sequence variations, which are critical for identifying alleles contributing to important agronomic traits. In order to mine the genetic variations in sugarcane, genotyping by sequencing (GBS), was used to genotype 14 representative Saccharum complex accessions. GBS is a method to generate a large number of markers, enabled by next generation sequencing (NGS) and the genome complexity reduction using restriction enzymes. To use GBS for high throughput genotyping highly polyploid sugarcane, the GBS analysis pipelines in 14 Saccharum complex accessions were established by evaluating different alignment methods, sequence variants callers, and sequence depth for single nucleotide polymorphism (SNP) filtering. By using the established pipeline, a total of 76,251 non-redundant SNPs, 5642 InDels, 6380 presence/absence variants (PAVs), and 826 copy number variations (CNVs) were detected among the 14 accessions. In addition, non-reference based universal network enabled analysis kit and Stacks de novo called 34,353 and 109,043 SNPs, respectively. In the 14 accessions, the percentages of single dose SNPs ranged from 38.3% to 62.3% with an average of 49.6%, much more than the portions of multiple dosage SNPs. Concordantly called SNPs were used to evaluate the phylogenetic relationship among the 14 accessions. The results showed that the divergence time between the Erianthus genus and the Saccharum genus was more than 10 million years ago (MYA). The Saccharum species separated from their common ancestors ranging from 0.19 to 1.65 MYA. The GBS pipelines including the reference sequences, alignment methods, sequence variant callers, and sequence depth were recommended and discussed for the Saccharum complex and other related species. A large number of sequence variations were discovered in the Saccharum complex, including SNPs, InDels, PAVs, and CNVs. Genome-wide SNPs were further used to illustrate sequence features of polyploid species and demonstrated the divergence of different species in the Saccharum complex. The results of this study showed that GBS was an effective NGS-based method to discover genomic sequence variations in highly polyploid and heterozygous species.
Deep sequencing reveals cell-type-specific patterns of single-cell transcriptome variation.
Dueck, Hannah; Khaladkar, Mugdha; Kim, Tae Kyung; Spaethling, Jennifer M; Francis, Chantal; Suresh, Sangita; Fisher, Stephen A; Seale, Patrick; Beck, Sheryl G; Bartfai, Tamas; Kuhn, Bernhard; Eberwine, James; Kim, Junhyong
2015-06-09
Differentiation of metazoan cells requires execution of different gene expression programs, but recent single-cell transcriptome profiling has revealed considerable variation within cells of seemingly identical phenotype. This brings into question the relationship between transcriptome states and cell phenotypes. Additionally, single-cell transcriptomics presents unique analysis challenges that need to be addressed to answer this question. We present high quality deep read-depth single-cell RNA sequencing for 91 cells from five mouse tissues and 18 cells from two rat tissues, along with 30 control samples of bulk RNA diluted to single-cell levels. We find that transcriptomes differ globally across tissues with regard to the number of genes expressed, the average expression patterns, and within-cell-type variation patterns. We develop methods to filter genes for reliable quantification and to calibrate biological variation. All cell types include genes with high variability in expression, in a tissue-specific manner. We also find evidence that single-cell variability of neuronal genes in mice is correlated with that in rats, consistent with the hypothesis that levels of variation may be conserved. Single-cell RNA-sequencing data provide a unique view of transcriptome function; however, careful analysis is required in order to use single-cell RNA-sequencing measurements for this purpose. Technical variation must be considered in single-cell RNA-sequencing studies of expression variation. For a subset of genes, biological variability within each cell type appears to be regulated in order to perform dynamic functions, rather than reflecting solely molecular noise.
Simultaneous Differentiation and Typing of Entamoeba histolytica and Entamoeba dispar
Zaki, Mehreen; Meelu, Parool; Sun, Wei; Clark, C. Graham
2002-01-01
Sequences corresponding to some of the polymorphic loci previously reported from Entamoeba histolytica have been detected in Entamoeba dispar. Comparison of nucleotide sequences of two loci between E. dispar strain SAW760 and E. histolytica strain HM-1:IMSS revealed significant differences in both repeat and flanking regions. The tandem repeat units varied not only in sequence but also in number and arrangement between the two species at both the loci. Using the sequences obtained, primer pairs aimed at amplifying species-specific products were designed and tested on a variety of E. histolytica and E. dispar samples. Amplification results were in complete agreement with the original species classification in all cases, and the PCR products displayed discernible size and pattern variations among the isolates. PMID:11923344
Bertelsen, H P; Gregersen, V R; Poulsen, N; Nielsen, R O; Das, A; Madsen, L B; Buitenhuis, A J; Holm, L-E; Panitz, F; Larsen, L B; Bendixen, C
2016-04-01
Rennet-induced milk coagulation is an important trait for cheese production. Recent studies have reported an alarming frequency of cows producing poorly coagulating milk unsuitable for cheese production. Several genetic factors are known to affect milk coagulation, including variation in the major milk proteins; however, recent association studies indicate genetic effects from other genomic regions as well. The aim of this study was to detect genetic variation affecting milk coagulation properties, measured as curd-firming rate (CFR) and milk pH. This was achieved by examining allele frequency differences between pooled whole-genome sequences of phenotypically extreme samples (pool-seq). Curd-firming rate and raw milk pH were measured for 415 Danish Holstein cows, and each animal was sequenced at low coverage. Pools were created containing whole genome sequence reads from samples with "extreme" values (high or low) for both phenotypic traits. A total of 6,992,186 and 5,295,501 SNP were assessed in relation to CFR and milk pH, respectively. Allele frequency differences were calculated between pools and 32 significantly different SNP were detected, 1 for milk pH and 31 for CFR, of which 19 are located on chromosome 6. A total of 9 significant SNP, which were selected based on the possible function of proximal candidate genes, were genotyped in the entire sample set (n = 415) to test for an association. The most significant SNP was located proximal to , explaining 33% of the phenotypic variance. , coding for κ-casein, is the most studied in relation to milk coagulation due to its position on the surface of the casein micelles and the direct involvement in milk coagulation. Three additional SNP located on chromosome 6 showed significant associations explaining 7, 3.6, and 1.3% of the phenotypic variance of CFR. 
The significant SNP on chromosome 6 were shown to be in linkage disequilibrium with the SNP peaking proximal to ; however, after accounting for the genotype of the peak SNP within this QTL, significant effects (P-value < 0.1) could still be detected for 2 of the SNP accounting for 2 and 1% of the phenotypic variance. These 2 SNP were located within introns of or proximal to the candidate genes solute carrier family 4 (sodium bicarbonate cotransporter), member 4 () and LIM and calponin homology domains 1 (), respectively, making them interesting targets for further analysis.
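The pool-seq comparison in this abstract rests on per-SNP allele frequency differences between phenotypically extreme pools. A minimal Python sketch of that quantity follows, assuming read counts per allele are already available for each pool. The function names are hypothetical, and the significance test used in the study is not reproduced here.

```python
def allele_frequency(alt_reads, total_reads):
    """Fraction of reads in one pool that carry the alternative allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def frequency_difference(high_pool, low_pool):
    """Absolute allele frequency difference between two pools at one SNP.

    Each pool is an (alt_reads, total_reads) tuple; the layout is an
    assumption made for this illustration.
    """
    return abs(allele_frequency(*high_pool) - allele_frequency(*low_pool))
```

A SNP with a large difference between the "high" and "low" pools would be a candidate for follow-up genotyping in individual animals, as the abstract describes for 9 SNP.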
NASA Technical Reports Server (NTRS)
Mohr, Joseph J.; Fabricant, Daniel G.; Geller, Margaret J.
1993-01-01
We use the moments of the X-ray surface brightness distribution to constrain the dynamical state of a galaxy cluster. Using X-ray observations from the Einstein Observatory IPC, we measure the first moment FM, the ellipsoidal orientation angle, and the axial ratio at a sequence of radii in the cluster. We argue that a significant variation in the image centroid FM as a function of radius is evidence for a nonequilibrium feature in the intracluster medium (ICM) density distribution. In simple terms, centroid shifts indicate that the center of mass of the ICM varies with radius. This variation is a tracer of continuing dynamical evolution. For each cluster, we evaluate the significance of variations in the centroid of the IPC image by computing the same statistics on an ensemble of simulated cluster images. In producing these simulated images we include X-ray point source emission, telescope vignetting, Poisson noise, and characteristics of the IPC. Application of this new method to five Abell clusters reveals that the core of each one has significant substructure. In addition, we find significant variations in the orientation angle and the axial ratio for several of the clusters.
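The centroid-shift idea in this abstract can be illustrated with a short Python sketch: compute the intensity-weighted first moment of an image inside circular apertures of increasing radius and watch whether it moves. This is a simplified illustration under stated assumptions (a plain list-of-rows image, a fixed aperture center, no background or vignetting correction), not the authors' procedure; the function names are hypothetical.

```python
def centroid_within_radius(image, center, radius):
    """Intensity-weighted centroid of the pixels within `radius` of `center`.

    `image` is a list of rows of non-negative counts and `center` is an
    (x, y) position in pixel units.
    """
    cx, cy = center
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                total += value
                sum_x += value * x
                sum_y += value * y
    if total == 0:
        raise ValueError("no counts inside the aperture")
    return (sum_x / total, sum_y / total)


def centroids_by_radius(image, center, radii):
    """Centroid measured inside each aperture radius, in the order given."""
    return [centroid_within_radius(image, center, r) for r in radii]
```

A centroid that stays put as the radius grows is consistent with a symmetric, relaxed distribution; a centroid that drifts is the kind of variation the abstract interprets as substructure, once its significance has been checked against simulated images.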
Lacerra, Giuseppina; Fiorito, Mirella; Musollino, Gennaro; Di Noce, Francesca; Esposito, Maria; Nigro, Vincenzo; Gaudiano, Carlo; Carestia, Clementina
2004-10-01
The alpha-globin chains are encoded by two duplicated genes (HBA2 and HBA1, 5'-3') showing overall sequence homology >96% and average CG content >60%. alpha-Thalassemia, the most prevalent worldwide autosomal recessive disorder, is a hereditary anemia caused by sequence variations of these genes in about 25% of carriers. We evaluated the overall sensitivity and suitability of DHPLC and DG-DGGE in scanning both the alpha-globin genes by carrying out a retrospective analysis of 19 variant alleles in 29 genotypes. The HBA2 alleles c.1A>G, c.79G>A, and c.281T>G, and the HBA1 allele c.475C>A were new. Three pathogenic sequence variations were associated in cis with nonpathogenic variations in all families studied; they were the HBA2 variation c.2T>C associated with c.-24C>G, and the HBA2 variations c.391G>C and c.427T>C, both associated with c.565G>A. We set up original experimental conditions for DHPLC and DG-DGGE and analyzed 10 normal subjects, 46 heterozygotes, seven homozygotes, seven compound heterozygotes, and six compound heterozygotes for a hybrid gene. Both the methodologies gave reproducible results and no false-positive was detected. DHPLC showed 100% sensitivity and DG-DGGE nearly 90%. About 100% of the sequence from the cap site to the polyA addition site could be scanned by DHPLC, about 87% by DG-DGGE. It is noteworthy that the three most common pathogenic sequence variations (HBA2 alleles c.2T>C, c.95+2_95+6del, and c.523A>G) were unambiguously detected by both the methodologies. Genotype diagnosis must be confirmed with PCR sequencing of single amplicons or with an allele-specific method. This study can be helpful for scanning genes with high CG content and offers a model suitable for duplicated genes with high homology. Copyright 2004 Wiley-Liss, Inc.
Read clouds uncover variation in complex regions of the human genome
Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E.; West, Robert; Sidow, Arend; Batzoglou, Serafim
2015-01-01
Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. PMID:26286554
Blanchard, Adam M; Jolley, Keith A; Maiden, Martin C J; Coffey, Tracey J; Maboni, Grazieli; Staley, Ceri E; Bollard, Nicola J; Warry, Andrew; Emes, Richard D; Davies, Peers L; Tötemeyer, Sabine
2018-01-01
Dichelobacter nodosus (D. nodosus) is the causative pathogen of ovine footrot, a disease that has a significant welfare and financial impact on the global sheep industry. Previous studies into the phylogenetics of D. nodosus have focused on Australia and Scandinavia, meaning the current diversity in the United Kingdom (U.K.) population, and its relationship to the global population, is poorly understood. Numerous epidemiological methods are available for bacterial typing; however, few account for whole genome diversity or provide the opportunity for future application of new computational techniques. Multilocus sequence typing (MLST) measures nucleotide variations within several loci with slow accumulation of variation to enable the designation of allele numbers to determine a sequence type. The use of whole genome sequence data enables the application of MLST, but also core and whole genome MLST for higher levels of strain discrimination with a negligible increase in experimental cost. An MLST database was developed alongside a seven loci scheme using publicly available whole genome data from the sequence read archive. Sequence type designation and strain discrimination were compared to previously published data to ensure reproducibility. Multiple D. nodosus isolates from U.K. farms were directly compared to populations from other countries. The U.K. isolates define new clades within the global population of D. nodosus and predominantly consist of serogroups A, B and H; however, serogroups C, D, E, and I were also found. The scheme is publicly available at https://pubmlst.org/dnodosus/.
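The MLST principle summarized in this abstract (each distinct sequence at a locus receives an allele number, and the combination of allele numbers defines a sequence type) can be sketched in a few lines of Python. This is a toy illustration only: the class name, locus names, and numbering-by-first-appearance are assumptions, whereas a real scheme such as the one hosted on PubMLST relies on curated allele and profile databases.

```python
class MlstScheme:
    """Toy MLST scheme that numbers alleles per locus and profiles per isolate."""

    def __init__(self, loci):
        self.loci = list(loci)
        # For each locus: sequence -> allele number (assigned on first sight).
        self.alleles = {locus: {} for locus in self.loci}
        # Tuple of allele numbers -> sequence type (assigned on first sight).
        self.profiles = {}

    def allele_number(self, locus, sequence):
        known = self.alleles[locus]
        if sequence not in known:
            known[sequence] = len(known) + 1
        return known[sequence]

    def sequence_type(self, isolate):
        """Return the sequence type for a dict mapping locus -> sequence."""
        profile = tuple(
            self.allele_number(locus, isolate[locus]) for locus in self.loci
        )
        if profile not in self.profiles:
            self.profiles[profile] = len(self.profiles) + 1
        return self.profiles[profile]
```

Two isolates share a sequence type only when they carry identical alleles at every locus in the scheme, which is why adding loci (core or whole genome MLST) increases strain discrimination.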
Foust, C M; Preite, V; Schrey, A W; Alvarez, M; Robertson, M H; Verhoeven, K J F; Richards, C L
2016-04-01
While traits and trait plasticity are partly genetically based, investigating epigenetic mechanisms may provide more nuanced understanding of the mechanisms underlying response to environment. Using AFLP and methylation-sensitive AFLP, we tested the hypothesis that differentiation to habitats along natural salt marsh environmental gradients occurs at epigenetic, but not genetic loci in two salt marsh perennials. We detected significant genetic and epigenetic structure among populations and among subpopulations, but we found multilocus patterns of differentiation to habitat type only in epigenetic variation for both species. In addition, more epigenetic than genetic loci were correlated with habitat in both species. When we analysed genetic and epigenetic variation simultaneously with partial Mantel, we found no correlation between genetic variation and habitat and a significant correlation between epigenetic variation and habitat in Spartina alterniflora. In Borrichia frutescens, we found significant correlations between epigenetic and/or genetic variation and habitat in four of five populations when populations were analysed individually, but there was no significant correlation between genetic or epigenetic variation and habitat when analysed jointly across the five populations. These analyses suggest that epigenetic mechanisms are involved in the response to salt marsh habitats, but also that the relationships among genetic and epigenetic variation and habitat vary by species. Site-specific conditions may also cloud our ability to detect response in replicate populations with similar environmental gradients. Future studies analysing sequence data and the correlation between genetic variation and DNA methylation will be powerful to identify the contributions of genetic and epigenetic response to environmental gradients. © 2016 John Wiley & Sons Ltd.
ACTG: novel peptide mapping onto gene models.
Choi, Seunghyuk; Kim, Hyunwoo; Paek, Eunok
2017-04-15
In many proteogenomic applications, mapping peptide sequences onto genome sequences can be very useful, because it allows us to understand origins of the gene products. Existing software tools either take the genomic position of a peptide start site as an input or assume that the peptide sequence exactly matches the coding sequence of a given gene model. In case of novel peptides resulting from genomic variations, especially structural variations such as alternative splicing, these existing tools cannot be directly applied unless users supply information about the variant, either its genomic position or its transcription model. Mapping potentially novel peptides to genome sequences, while allowing certain genomic variations, requires introducing novel gene models when aligning peptide sequences to gene structures. We have developed a new tool called ACTG (Amino aCids To Genome), which maps peptides to the genome, assuming all possible single exon skipping, junction variation allowing three edit distances from the original splice sites, exon extension and frame shift. In addition, it can also consider SNVs (single nucleotide variations) during the mapping phase if a user provides a VCF (variant call format) file as an input. Available at http://prix.hanyang.ac.kr/ACTG/search.jsp. Contact: eunokpaek@hanyang.ac.kr. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity
Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H.; Allen, Andrew S.; Goldstein, David B.
2015-01-01
Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. 
In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find both scores are significantly predictive of human dosage sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease. PMID:26332131
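The abstract describes ncRVIS as a comparison of observed and predicted levels of standing variation. One way to picture such a score is as a regression residual: predict the count of common variants from the total count of variants and keep the deviation for each gene. The Python sketch below assumes a simple least-squares line, in the spirit of the original RVIS; the exact model, variant definitions, and residual scaling used for ncRVIS may differ, and the function name is hypothetical.

```python
def regression_residuals(total_variants, common_variants):
    """Residuals of common-variant counts after a least-squares fit on totals.

    A negative residual means fewer common variants than predicted, which
    under this sketch's assumption marks a more intolerant sequence.
    """
    n = len(total_variants)
    if n != len(common_variants) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(total_variants) / n
    mean_y = sum(common_variants) / n
    sxx = sum((x - mean_x) ** 2 for x in total_variants)
    if sxx == 0:
        raise ValueError("total_variants must not all be equal")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(total_variants, common_variants)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [
        y - (intercept + slope * x)
        for x, y in zip(total_variants, common_variants)
    ]
```

Ranking genes by such residuals gives a relative intolerance ordering that can then be compared against gene lists like the four described in the abstract.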
Nabavi, Reza; Conneely, Brendan; McCarthy, Elaine; Good, Barbara; Shayan, Parviz; DE Waal, Theo
2014-09-01
Accurate identification of sheep nematodes is a critical point in epidemiological studies and monitoring of drug resistance in flocks. However, due to a close morphological similarity between the eggs and larval stages of many of these nematodes, such identification is not a trivial task. A number of studies have shown that molecular targets in ribosomal DNA (internal transcribed spacers 1 and 2 and the intergenic spacer) are suitable for accurate identification of sheep bursate nematodes. The objective of the present study was to compare the ITS1, ITS2 and IGS regions of common Iranian bursate nematodes in order to choose the best target for specific identification methods. The first and second internal transcribed spacers (ITS1 and ITS2) and intergenic spacer (IGS) of the ribosomal DNA (rDNA) of 5 common Iranian bursate nematodes of sheep were sequenced. The sequences of some non-Iranian isolates were used for comparison in order to evaluate the variation in sequence homology between geographically different nematode populations. Comparison of the ITS1 and ITS2 sequences of Iranian nematodes showed the greatest similarity between Teladorsagia circumcincta and Marshallagia marshalli, of 94% and 88%, respectively, while Trichostrongylus colubriformis and M. marshalli showed the highest homology (99%) in the IGS sequences. Comparison of the spacer sequences of Iranian with non-Iranian isolates showed significantly higher variation in Haemonchus contortus compared to the other species. Both the ITS1 and ITS2 sequences are convenient targets for species-specific identification of Iranian bursate nematodes. The IGS region, on the other hand, may be a less suitable molecular target.
Usage of mitochondrial D-loop variation to predict risk for Huntington disease.
Mousavizadeh, Kazem; Rajabi, Peyman; Alaee, Mahsa; Dadgar, Sepideh; Houshmand, Massoud
2015-08-01
Huntington's disease (HD) is an inherited autosomal neurodegenerative disease caused by the abnormal expansion of the CAG repeats in the Huntingtin (Htt) gene. It has been proven that mitochondrial dysfunction contributes to the pathogenesis of Huntington's disease. The mitochondrial displacement loop (D-loop) is proven to accumulate mutations at a higher rate than other regions of mtDNA. Thus, we hypothesized that specific SNPs in the D-loop may contribute to the pathogenesis of Huntington's disease. In the present study, 30 patients with Huntington's disease and 463 healthy controls were evaluated for mitochondrial mutation sites within the D-loop region using a PCR-sequencing method. Sequence analysis revealed 35 variations in the HD group compared with the Cambridge Mitochondrial Sequences. A significant difference (p < 0.05) was seen between patients and the control group in eight SNPs. Polymorphisms at C16069T, T16126C, T16189C, T16519C and C16223T were correlated with an increased risk of HD, while SNPs at C16150T, T16086C and T16195C were associated with a decreased risk of Huntington's disease.
Jørgensen, Agnete; Fagerheim, Toril; Rand-Hendriksen, Svend; Lunde, Per I; Vorren, Torgrim O; Pepin, Melanie G; Leistritz, Dru F; Byers, Peter H
2015-01-01
Vascular Ehlers–Danlos Syndrome (vEDS), also known as EDS type IV, is considered to be an autosomal dominant disorder caused by sequence variants in COL3A1, which encodes the chains of type III procollagen. We identified a family in which there was marked clinical variation with the earliest death due to extensive aortic dissection at age 15 years and other family members in their eighties with no complications. The proband was born with right-sided clubfoot but was otherwise healthy until he died unexpectedly at 15 years. His sister, in addition to signs consistent with vascular EDS, had bilateral frontal and parietal polymicrogyria. The proband and his sister each had two COL3A1 sequence variants, c.1786C>T, p.(Arg596*) in exon 26 and c.3851G>A, p.(Gly1284Glu) in exon 50 on different alleles. Cells from the compound heterozygote produced a reduced amount of type III procollagen, all the chains of which had abnormal electrophoretic mobility. Biallelic sequence variants have a significantly worse outcome than heterozygous variants for either null mutations or missense mutations, and frontoparietal polymicrogyria may be an added phenotype feature. This genetic constellation provides a very rare explanation for marked intrafamilial clinical variation due to sequence variants in COL3A1. PMID:25205403
Development of Genomic Simple Sequence Repeats (SSR) by Enrichment Libraries in Date Palm.
Al-Faifi, Sulieman A; Migdadi, Hussein M; Algamdi, Salem S; Khan, Mohammad Altaf; Al-Obeed, Rashid S; Ammar, Megahed H; Jakse, Jerenj
2017-01-01
Development of highly informative markers such as simple sequence repeats (SSR) for cultivar identification and germplasm characterization and management is essential for date palm genetic studies. The present study documents the development of SSR markers and assesses genetic relationships of commonly grown date palm (Phoenix dactylifera L.) cultivars in different geographical regions of Saudi Arabia. A total of 93 novel simple sequence repeat (SSR) markers were screened for their ability to detect polymorphism in date palm. Around 71% of genomic SSRs are dinucleotide, 25% trinucleotide, 3% tetranucleotide, and 1% pentanucleotide motifs and show 100% polymorphism. The Unweighted Pair Group Method with Arithmetic Mean (UPGMA) cluster analysis illustrates that cultivars tend to group according to their class of maturity, region of cultivation, and fruit color. Analysis of molecular variance (AMOVA) reveals genetic variation among and within cultivars of 27% and 73%, respectively, according to the geographical distribution of the cultivars. The developed microsatellite markers are of additional value to date palm characterization and can be used by researchers in population genetics, cultivar identification, and genetic resource exploration and management. The cultivars tested exhibited a significant amount of genetic diversity and could be suitable for successful breeding programs. Genomic sequences generated from this study are available at the National Center for Biotechnology Information (NCBI), Sequence Read Archive (accession number LIBGSS_039019).
Fishman, G A; Stone, E M; Grover, S; Derlacki, D J; Haines, H L; Hockey, R R
1999-04-01
To report the spectrum of ophthalmic findings in patients with Stargardt dystrophy or fundus flavimaculatus who have a specific sequence variation in the ABCR gene. Twenty-nine patients with Stargardt dystrophy or fundus flavimaculatus from different pedigrees were identified with possible disease-causing sequence variations in the ABCR gene from a group of 66 patients who were screened for sequence variations in this gene. Patients underwent a routine ocular examination, including slitlamp biomicroscopy and a dilated fundus examination. Fluorescein angiography was performed on 22 patients, and electroretinographic measurements were obtained on 24 of 29 patients. Kinetic visual fields were measured with a Goldmann perimeter in 26 patients. Single-strand conformation polymorphism analysis and DNA sequencing were used to identify variations in coding sequences of the ABCR gene. Three clinical phenotypes were observed among these 29 patients. In phenotype I, 9 of 12 patients had a sequence change in exon 42 of the ABCR gene in which the amino acid glutamic acid was substituted for glycine (Gly1961Glu). In only 4 of these 9 patients was a second possible disease-causing mutation found on the other ABCR allele. In addition to an atrophic-appearing macular lesion, phenotype I was characterized by localized perifoveal yellowish white flecks, the absence of a dark choroid, and normal electroretinographic amplitudes. Phenotype II consisted of 10 patients who showed a dark choroid and more diffuse yellowish white flecks in the fundus. None exhibited the Gly1961Glu change. Phenotype III consisted of 7 patients who showed extensive atrophic-appearing changes of the retinal pigment epithelium. Electroretinographic cone and rod amplitudes were reduced. One patient showed the Gly1961Glu change. A wide variation in clinical phenotype can occur in patients with sequence changes in the ABCR gene. 
In individual patients, a certain phenotype seems to be associated with the presence of a Gly1961Glu change in exon 42 of the ABCR gene. The identification of correlations between specific mutations in the ABCR gene and clinical phenotypes will better facilitate the counseling of patients on their visual prognosis. This information will also likely be important for future therapeutic trials in patients with Stargardt dystrophy.
Technical variations in low-input RNA-seq methodologies.
Bhargava, Vipul; Head, Steven R; Ordoukhanian, Phillip; Mercola, Mark; Subramaniam, Shankar
2014-01-14
Recent advances in RNA-seq methodologies from limiting amounts of mRNA have facilitated the characterization of rare cell-types in various biological systems. So far, however, technical variations in these methods have not been adequately characterized, vis-à-vis sensitivity, starting with reduced levels of mRNA. Here, we generated sequencing libraries from limiting amounts of mRNA using three amplification-based methods, viz. Smart-seq, DP-seq and CEL-seq, and demonstrated significant technical variations in these libraries. Reduction in mRNA levels led to inefficient amplification of the majority of low to moderately expressed transcripts. Furthermore, noise in primer hybridization and/or enzyme incorporation was magnified during the amplification step resulting in significant distortions in fold changes of the transcripts. Consequently, the majority of the differentially expressed transcripts identified were either high-expressed and/or exhibited high fold changes. High technical variations ultimately masked subtle biological differences mandating the development of improved amplification-based strategies for quantitative transcriptomics from limiting amounts of mRNA.
USDA-ARS's Scientific Manuscript database
Magnaporthe oryzae, the rice blast pathogen, causes significant annual yield loss of rice worldwide. Currently, the most effective disease control approach is deployment of host resistance through introduction of resistance (R) genes into elite cultivars. The function of each R gene relies on the sp...
Borovsky, Yelena; Sharma, Vinod K; Verbakel, Henk; Paran, Ilan
2015-06-01
The APETALA2 transcription factor homolog CaAP2 is a candidate gene for a flowering repressor in pepper, as revealed by induced-mutation phenotype, and a candidate underlying a major QTL controlling natural variation in flowering time. To decipher the genetic control of transition to flowering in pepper (Capsicum spp.) and determine the extent of gene function conservation compared to model species, we isolated and characterized several ethyl methanesulfonate (EMS)-induced mutants that vary in their flowering time compared to the wild type. In the present study, we report on the isolation of an early-flowering mutant that flowers after four leaves on the primary stem compared to nine leaves in the wild-type 'Maor'. By genetic mapping and sequencing of putative candidate genes linked to the mutant phenotype, we identified a member of the APETALA2 (AP2) transcription factor family, CaAP2, which was disrupted in the early-flowering mutant. CaAP2 is a likely ortholog of AP2 that functions as a repressor of flowering in Arabidopsis. To test whether CaAP2 has an effect on controlling natural variation in the transition to flowering in pepper, we performed QTL mapping for flowering time in a cross between early and late-flowering C. annuum accessions. We identified a major QTL in a region of chromosome 2 in which CaAP2 was the most significant marker, explaining 52 % of the phenotypic variation of the trait. Sequence comparison of the CaAP2 open reading frames in the two parents used for QTL mapping did not reveal significant variation. In contrast, significant differences in expression level of CaAP2 were detected between near-isogenic lines that differ for the flowering time QTL, supporting the putative function of CaAP2 as a major repressor of flowering in pepper.
Intra-isolate genome variation in arbuscular mycorrhizal fungi persists in the transcriptome.
Boon, E; Zimmerman, E; Lang, B F; Hijri, M
2010-07-01
Arbuscular mycorrhizal fungi (AMF) are heterokaryotes with an unusual genetic makeup. Substantial genetic variation occurs among nuclei within a single mycelium or isolate. AMF reproduce through spores that contain varying fractions of this heterogeneous population of nuclei. It is not clear whether this genetic variation on the genome level actually contributes to the AMF phenotype. To investigate the extent to which polymorphisms in nuclear genes are transcribed, we analysed the intra-isolate genomic and cDNA sequence variation of two genes, the large subunit ribosomal RNA (LSU rDNA) of Glomus sp. DAOM-197198 (previously known as G. intraradices) and the POL1-like sequence (PLS) of Glomus etunicatum. For both genes, we find high sequence variation at the genome and transcriptome level. Reconstruction of LSU rDNA secondary structure shows that all variants are functional. Patterns of PLS sequence polymorphism indicate that there is one functional gene copy, PLS2, which is preferentially transcribed, and one gene copy, PLS1, which is a pseudogene. This is the first study that investigates AMF intra-isolate variation at the transcriptome level. In conclusion, it is possible that, in AMF, multiple nuclear genomes contribute to a single phenotype.
Whole-Genome Sequence Variation among Multiple Isolates of Pseudomonas aeruginosa
Spencer, David H.; Kas, Arnold; Smith, Eric E.; Raymond, Christopher K.; Sims, Elizabeth H.; Hastings, Michele; Burns, Jane L.; Kaul, Rajinder; Olson, Maynard V.
2003-01-01
Whole-genome shotgun sequencing was used to study the sequence variation of three Pseudomonas aeruginosa isolates, two from clonal infections of cystic fibrosis patients and one from an aquatic environment, relative to the genomic sequence of reference strain PAO1. The majority of the PAO1 genome is represented in these strains; however, at least three prominent islands of PAO1-specific sequence are apparent. Conversely, ∼10% of the sequencing reads derived from each isolate fail to align with the PAO1 backbone. While average sequence variation among all strains is roughly 0.5%, regions of pronounced differences were evident in whole-genome scans of nucleotide diversity. We analyzed two such divergent loci, the pyoverdine and O-antigen biosynthesis regions, by complete resequencing. A thorough analysis of isolates collected over time from one of the cystic fibrosis patients revealed independent mutations resulting in the loss of O-antigen synthesis alternating with a mucoid phenotype. Overall, we conclude that most of the PAO1 genome represents a core P. aeruginosa backbone sequence while the strains addressed in this study possess additional genetic material that accounts for at least 10% of their genomes. Approximately half of these additional sequences are novel. PMID:12562802
Chen, Fen; Li, Juan; Sugiyama, Hiromu; Zhou, Dong-Hui; Song, Hui-Qun; Zhao, Guang-Hui; Zhu, Xing-Quan
2015-02-01
The present study examined sequence variability in the mitochondrial (mt) protein-coding genes cytochrome b (cytb) and NADH dehydrogenase subunits 2 and 6 (nad2 and nad6) among 24 isolates of Schistosoma japonicum from different endemic regions in the Philippines, Japan and China. The complete cytb, nad2 and nad6 genes were amplified and sequenced separately from individual schistosomes. Sequence variations for isolates from the Philippines were 0-0.5% for cytb, 0-0.6% for nad2, and 0-0.9% for nad6. Variation was 0-0.5%, 0.1-0.8% and 0-0.7% for the corresponding genes for schistosome samples from mainland China. For worms in Japan, genetic variations were 0-0.2%, 0.1-0.2% and 0 for the three genes, respectively. Sequence variations were 0-1.0%, 0-1.8% and 0-1.1% for cytb, nad2 and nad6, respectively, among schistosome isolates from different geographical strains in the Philippines, Japan and China. Among the three countries, the lowest sequence variation in the three mtDNA genes was found between isolates from mainland China and the Philippines, and the highest between isolates from Japan and the Philippines. Phylogenetic analyses based on the combined sequences of cytb, nad2 and nad6 revealed that all isolates from the Philippines clustered together as a sister group to samples from Yunnan and Zhejiang provinces in China, while isolates from Yamanashi in Japan formed a solitary clade. These results demonstrated the usefulness of the combined three mtDNA sequences for studying genetic diversity and population structure among S. japonicum isolates from the Philippines, China and Japan.
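Percentages such as "0-0.5% for cytb" are typically uncorrected pairwise differences between aligned sequences. The abstract does not state how the values were computed, so the following Python sketch is only an assumption about the general approach; the function name and the toy sequences are hypothetical.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences differ.

    Assumes the sequences are already aligned and of equal length;
    gap handling is deliberately left out of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions where the two bases disagree (case-insensitive).
    mismatches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    return 100.0 * mismatches / len(seq_a)


if __name__ == "__main__":
    # Toy 10-base fragments, not real cytb data.
    print(percent_difference("ACGTACGTAC", "ACGTACGTAA"))
```

Applied to every pair of isolates for one gene, the minimum and maximum of these values would give a range of the kind reported above.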
The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape.
Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir
2015-08-06
Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates' conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water-land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods' enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Mutations in the C-terminus of CDKL5: proceed with caution.
Diebold, Bertrand; Delépine, Chloé; Gataullina, Svetlana; Delahaye, Andrée; Nectoux, Juliette; Bienvenu, Thierry
2014-02-01
Mutations in the cyclin-dependent kinase-like 5 (CDKL5) gene have been described in girls with Rett-like features and early-onset epileptic encephalopathy including infantile spasms. Milder phenotypes have been associated with sequence variations in the 3'-end of the CDKL5 gene. The identification of novel CDKL5 transcripts encoding isoforms characterized by an altered C-terminal region strongly questions the possible pathogenicity of sequence variations located in the 3'-end of the gene. We investigated a group of 30 female patients with a clinically heterogeneous phenotype ranging from nonspecific intellectual disability to a severe neonatal encephalopathy and identified two heterozygous CDKL5 missense mutations, the previously reported p.Val999Met and the novel mutation p.Pro944Thr. However, these mutations have also been detected in their healthy father. Considering our results and all data from the literature, we suggest that genetic variations beyond codon 938 in the human CDKL5(115) protein may have minor or no significance. It is probable that screening of exons 19-21 of the CDKL5 gene is not useful in practical molecular diagnosis of atypical Rett syndrome.
Castiglioni, Emanuela; Finazzi, Dario; Goldwurm, Stefano; Pezzoli, Gianni; Forni, Gianluca; Girelli, Domenico; Maccarinelli, Federica; Poli, Maura; Ferrari, Maurizio; Cremonesi, Laura; Arosio, Paolo
2011-01-01
The capacity to act as an electron donor and acceptor makes iron an essential cofactor of many vital processes. Its balance in the body has to be tightly regulated since its excess can be harmful by favouring oxidative damage, while its deficiency can impair fundamental activities like erythropoiesis. In the brain, an accumulation of iron or an increase in its availability has been associated with the development and/or progression of different degenerative processes, including Parkinson's disease, while iron paucity seems to be associated with cognitive deficits, motor dysfunction, and restless legs syndrome. In the search of DNA sequence variations affecting the individual predisposition to develop movement disorders, we scanned by DHPLC the exons and intronic boundary regions of ceruloplasmin, iron regulatory protein 2, hemopexin, hepcidin and hemojuvelin genes in cohorts of subjects affected by Parkinson's disease and idiopathic neurodegeneration with brain iron accumulation (NBIA). Both novel and known sequence variations were identified in most of the genes, but none of them seemed to be significantly associated to the movement diseases of interest. PMID:20981230
Whole genome re-sequencing of date palms yields insights into diversification of a fruit tree crop.
Hazzouri, Khaled M; Flowers, Jonathan M; Visser, Hendrik J; Khierallah, Hussam S M; Rosas, Ulises; Pham, Gina M; Meyer, Rachel S; Johansen, Caryn K; Fresquez, Zoë A; Masmoudi, Khaled; Haider, Nadia; El Kadri, Nabila; Idaghdour, Youssef; Malek, Joel A; Thirkhill, Deborah; Markhand, Ghulam S; Krueger, Robert R; Zaid, Abdelouahhab; Purugganan, Michael D
2015-11-09
Date palms (Phoenix dactylifera) are the most significant perennial crop in arid regions of the Middle East and North Africa. Here, we present a comprehensive catalogue of approximately seven million single nucleotide polymorphisms in date palms based on whole genome re-sequencing of a collection of 62 cultivars. Population structure analysis indicates a major genetic divide between North Africa and the Middle East/South Asian date palms, with evidence of admixture in cultivars from Egypt and Sudan. Genome-wide scans for selection suggest at least 56 genomic regions associated with selective sweeps that may underlie geographic adaptation. We report candidate mutations for trait variation, including nonsense polymorphisms and presence/absence variation in gene content in pathways for key agronomic traits. We also identify a copia-like retrotransposon insertion polymorphism in the R2R3 myb-like orthologue of the oil palm virescens gene associated with fruit colour variation. This analysis documents patterns of post-domestication diversification and provides a genomic resource for this economically important perennial tree crop.
Whole genome re-sequencing of date palms yields insights into diversification of a fruit tree crop
Hazzouri, Khaled M.; Flowers, Jonathan M.; Visser, Hendrik J.; Khierallah, Hussam S. M.; Rosas, Ulises; Pham, Gina M.; Meyer, Rachel S.; Johansen, Caryn K.; Fresquez, Zoë A.; Masmoudi, Khaled; Haider, Nadia; El Kadri, Nabila; Idaghdour, Youssef; Malek, Joel A.; Thirkhill, Deborah; Markhand, Ghulam S.; Krueger, Robert R.; Zaid, Abdelouahhab; Purugganan, Michael D.
2015-01-01
Date palms (Phoenix dactylifera) are the most significant perennial crop in arid regions of the Middle East and North Africa. Here, we present a comprehensive catalogue of approximately seven million single nucleotide polymorphisms in date palms based on whole genome re-sequencing of a collection of 62 cultivars. Population structure analysis indicates a major genetic divide between North Africa and the Middle East/South Asian date palms, with evidence of admixture in cultivars from Egypt and Sudan. Genome-wide scans for selection suggest at least 56 genomic regions associated with selective sweeps that may underlie geographic adaptation. We report candidate mutations for trait variation, including nonsense polymorphisms and presence/absence variation in gene content in pathways for key agronomic traits. We also identify a copia-like retrotransposon insertion polymorphism in the R2R3 myb-like orthologue of the oil palm virescens gene associated with fruit colour variation. This analysis documents patterns of post-domestication diversification and provides a genomic resource for this economically important perennial tree crop. PMID:26549859
Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.
2007-12-11
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.
Invasive cleavage of nucleic acids
Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.
1999-01-01
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.
Invasive cleavage of nucleic acids
Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.
2002-01-01
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.
Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.
2010-11-09
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.
Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.
2000-01-01
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.
Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann; Dahlberg, James E.
2005-04-05
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.
Blanc, Hervé; Bordería, Antonio V.; Díaz, Gisell; Henningsson, Rasmus; Gonzalez, Daniel; Santana, Emidalys; Alvarez, Mayling; Castro, Osvaldo; Fontes, Magnus; Vignuzzi, Marco; Guzman, Maria G.
2016-01-01
ABSTRACT During the dengue virus type 3 (DENV-3) epidemic that occurred in Havana in 2001 to 2002, severe disease was associated with the infection sequence DENV-1 followed by DENV-3 (DENV-1/DENV-3), while the sequence DENV-2/DENV-3 was associated with mild/asymptomatic infections. To determine the role of the virus in the increasing severity demonstrated during the epidemic, serum samples collected at different time points were studied. A total of 22 full-length sequences were obtained using a deep-sequencing approach. Bayesian phylogenetic analysis of consensus sequences revealed that two DENV-3 lineages were circulating in Havana at that time, both grouped within genotype III. The predominant lineage is closely related to Peruvian and Ecuadorian strains, while the minor lineage is related to Venezuelan strains. According to consensus sequences, relatively few nonsynonymous mutations were observed; only one was fixed during the epidemic at position 4380 in the NS2B gene. Intrahost genetic analysis indicated that a significant minor population was selected and became predominant toward the end of the epidemic. In conclusion, greater variability was detected during the epidemic's progression in terms of significant minority variants, particularly in the nonstructural genes. An increasing trend of genetic diversity toward the end of the epidemic was observed only for synonymous variant allele rates, with higher variability in secondary cases. Remarkably, significant intrahost genetic variation was demonstrated within the same patient during the course of secondary infection with DENV-1/DENV-3, including changes in the structural proteins premembrane (PrM) and envelope (E). Therefore, the dynamic of evolving viral populations in the context of heterotypic antibodies could be related to the increasing clinical severity observed during the epidemic. 
IMPORTANCE Based on the evidence that DENV fitness is context dependent, our research has focused on the study of viral factors associated with intraepidemic increasing severity in a unique epidemiological setting. Here, we investigated the intrahost genetic diversity in acute human samples collected at different time points during the DENV-3 epidemic that occurred in Cuba in 2001 to 2002 using a deep-sequencing approach. We concluded that greater variability in significant minor populations occurred as the epidemic progressed, particularly in the nonstructural genes, with higher variability observed in secondary infection cases. Remarkably, for the first time significant intrahost genetic variation was demonstrated within the same patient during the course of secondary infection with DENV-1/DENV-3, including changes in structural proteins. These findings indicate that high-resolution approaches are needed to unravel molecular mechanisms involved in dengue pathogenesis. PMID:26889031
Kim, Daniel Seung; Crosslin, David R.; Auer, Paul L.; Suzuki, Stephanie M.; Marsillach, Judit; Burt, Amber A.; Gordon, Adam S.; Meschia, James F.; Nalls, Mike A.; Worrall, Bradford B.; Longstreth, W. T.; Gottesman, Rebecca F.; Furlong, Clement E.; Peters, Ulrike; Rich, Stephen S.; Nickerson, Deborah A.; Jarvik, Gail P.
2014-01-01
HDL-associated paraoxonase-1 (PON1) is an enzyme whose activity is associated with cerebrovascular disease. Common PON1 genetic variants have not been consistently associated with cerebrovascular disease. Rare coding variation that likely alters PON1 enzyme function may be more strongly associated with stroke. The National Heart, Lung, and Blood Institute Exome Sequencing Project sequenced the coding regions (exomes) of the genome for heart, lung, and blood-related phenotypes (including ischemic stroke). In this sample of 4,204 unrelated participants, 496 had verified, noncardioembolic ischemic stroke. After filtering, 28 nonsynonymous PON1 variants were identified. Analysis with the sequence kernel association test, adjusted for covariates, identified significant associations between PON1 variants and ischemic stroke (P = 3.01 × 10^-3). Stratified analyses demonstrated a stronger association of PON1 variants with ischemic stroke in African ancestry (AA) participants (P = 5.03 × 10^-3). Ethnic differences in the association between PON1 variants with stroke could be due to the effects of PON1 Val109Ile (overall P = 7.88 × 10^-3; AA P = 6.52 × 10^-4), found at higher frequency in AA participants (1.16% vs. 0.02%) and whose protein is less stable than the common allele. In summary, rare genetic variation in PON1 was associated with ischemic stroke, with stronger associations identified in those of AA. Increased focus on PON1 enzyme function and its role in cerebrovascular disease is warranted. PMID:24711634
Ghalehnoei, Hossein; Ahmadzadeh, Alireza; Farzi, Nastaran; Alebouyeh, Masoud; Aghdaei, Hamid Asadzadeh; Azimzadeh, Pendram; Molaei, Mahsa; Zali, Mohammad Reza
2016-01-01
Some studies have shown an association between the severity of Helicobacter pylori-induced diseases and the virulence properties of the colonizing strains. Urease has been demonstrated to be a potent virulence factor for H. pylori. The main aim of this study was to investigate the relationships of ureB sequence diversity, urease activity and virulence genotypes of different H. pylori strains with histopathological changes of gastric tissue in infected patients suffering from different gastric disorders. Analysis of the virulence genotypes in the isolated strains indicated significant associations between the presence of severe active gastritis and cagA+ (P = 0.039) or cagA/iceA1 genotypes (P = 0.026), and between intestinal metaplasia and vacA m1 (P = 0.008) or vacA s1/m2 (P = 0.001) genotypes. Our results showed a 2.4-fold increased risk of peptic ulcer (95% CI: 0.483-11.93), compared with gastritis, in the infected patients who had dupA-positive strains; however, this association was not statistically significant. Urease activity showed a significant mean difference between strains isolated from patients with PUD and those from patients with NUD (P = 0.034). This activity was relatively higher among patients with intestinal metaplasia. A significant association was also found between the lack of cagA and increased urease activity among the isolated strains (P = 0.036). While the greatest sequence variation of ureB was detected in a strain from a patient with intestinal metaplasia, the sole amino acid change identified in the UreB sequence (Ala201Thr, 30%) showed no influence on urease activity. In conclusion, this study postulates a role for H. pylori urease in peptic ulcer formation and in the progression of intestinal metaplasia. Higher urease activity in colonizing H. pylori strains that carry specific virulence factors was indicated as a risk factor for histopathological changes of gastric tissue that advance gastric malignancy.
Yang, Jie; Wang, Zhen Long; Zhao, Xin Quan; Wang, De Peng; Qi, De Lin; Xu, Bao Hong; Ren, Yong Hong; Tian, Hui Fang
2008-01-01
Background Environmental stress can accelerate the evolutionary rate of specific stress-response proteins and create new functions specialized for different environments, enhancing an organism's fitness to stressful environments. Pikas (order Lagomorpha), endemic, non-hibernating mammals in the modern Holarctic Region, live in cold regions at either high altitudes or high latitudes and have a maximum distribution of species diversification confined to the Qinghai-Tibet Plateau. Because they live in cold environments, their variations in energy metabolism are remarkable. Leptin, an adipocyte-derived hormone, plays important roles in energy homeostasis. Methodology/Principal Findings To examine the extent of leptin variations within the Ochotona family, we cloned the entire coding sequence of pika leptin from 6 species in two regions (Qinghai-Tibet Plateau and Inner Mongolia steppe in China) and the leptin sequences of plateau pikas (O. curzoniae) from different altitudes on the Qinghai-Tibet Plateau. We carried out molecular evolution analyses of both DNA and amino acid sequences and compared modeled spatial structures. Our results show that positive selection (PS) acts on pika leptin, with nine PS sites located within the functionally significant segment 85-119 of leptin and one unique motif that appeared only in pika lineages, the ATP synthase α and β subunit signature site. To reveal the environmental factors affecting sequence evolution of pika leptin, a relative rate test was performed in pikas from different altitudes. Stepwise multiple regression shows that temperature is significantly and negatively correlated with the rates of non-synonymous substitution (Ka) and amino acid substitution (Aa), whereas altitude does not significantly affect synonymous substitution (Ks), Ka and Aa.
Conclusions/Significance Our findings support the viewpoint that adaptive evolution may occur in pika leptin, which may play important roles in pikas' ecological adaptation to extreme environmental stress. We speculate that cold, and probably not hypoxia, may be the primary environmental factor for driving adaptive evolution of pika leptin. PMID:18213380
Kumar, Girish; Kocour, Martin; Kunal, Swaraj Priyaranjan
2016-05-01
To assess the DNA sequence variation and phylogenetic relationships among five tuna species (Auxis thazard, Euthynnus affinis, Katsuwonus pelamis, Thunnus tonggol, and T. albacares) representing all four tuna genera, partial sequences of the mitochondrial DNA (mtDNA) D-loop region were analyzed. The estimates of intra-specific sequence variation in the studied species were low, ranging from 0.027 to 0.080 [Kimura's two-parameter distance (K2P)], whereas values of inter-specific variation ranged from 0.049 to 0.491. The longtail tuna (T. tonggol) and yellowfin tuna (T. albacares) were found to share a close relationship (K2P = 0.049), while skipjack tuna (K. pelamis) was the most divergent species studied. Phylogenetic analysis using Maximum-Likelihood (ML) and Neighbor-Joining (NJ) methods supported the monophyletic origin of Thunnus species. Similarly, the phylogeny of Auxis and Euthynnus species substantiates their monophyly. However, results showed a distinct origin of K. pelamis from genus Thunnus as well as from Auxis and Euthynnus. Thus, the mtDNA D-loop region sequence data support the polyphyletic origin of tuna species.
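For readers unfamiliar with the K2P values quoted above, the standard Kimura two-parameter distance is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions between two aligned sequences. The Python sketch below illustrates that textbook formula only; it is not the authors' pipeline, and the function name and toy sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Positions where either sequence has a non-ACGT symbol (gap, N, ...)
    are skipped; this is an assumption made for the sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if 1 - 2 * q > 0 else 0.0
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)


if __name__ == "__main__":
    # Toy sequences with a single A<->G transition in 10 sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Because the correction weights transitions and transversions separately, the distance exceeds the raw proportion of differing sites once sequences diverge, which matters at the upper end of the inter-specific range reported above.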
Sampson, Juliana K; Sheth, Nihar U; Koparde, Vishal N; Scalora, Allison F; Serrano, Myrna G; Lee, Vladimir; Roberts, Catherine H; Jameson-Lee, Max; Ferreira-Gonzalez, Andrea; Manjili, Masoud H; Buck, Gregory A; Neale, Michael C; Toor, Amir A
2014-08-01
Whole exome sequencing (WES) was performed on stem cell transplant donor-recipient (D-R) pairs to determine the extent of potential antigenic variation at a molecular level. In a small cohort of D-R pairs, a high frequency of sequence variation was observed between the donor and recipient exomes independent of human leucocyte antigen (HLA) matching. Nonsynonymous, nonconservative single nucleotide polymorphisms were approximately twice as frequent in HLA-matched unrelated, compared with related D-R pairs. When mapped to individual chromosomes, these polymorphic nucleotides were uniformly distributed across the entire exome. In conclusion, WES reveals extensive nucleotide sequence variation in the exomes of HLA-matched donors and recipients. © 2014 John Wiley & Sons Ltd.
2011-01-01
Background Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires an accurate identification of the different types of variation in individual genomes. Results We report the integration of the whole genome sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of resequencing SNP detection was assessed by combining SNPs that were identified to be either in identity by descent (IBD) or in copy number variation (CNV) with results from SNP array genotyping. Coding insertions and deletions (indels) were found to be enriched for size in multiples of 3 and were located near the N- and C-termini of proteins. For larger indels, a combination of split-read and read-pair approaches proved to be complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions Our results provide high resolution mapping of diverse classes of genomic variation in an individual bovine genome and demonstrate that structural variation surpasses sequence variation as the main component of genomic variability. Better accuracy of SNP detection was achieved with little loss of sensitivity when algorithms that implemented mapping quality were used. IBD regions were found to be instrumental for calculating resequencing SNP accuracy, while SNP detection within CNVs tended to be less reliable. CNV discovery was affected dramatically by platform resolution and coverage biases. The combined data for this study showed that at a moderate level of sequencing coverage, an ensemble of platforms and tools can be applied together to maximize the accurate detection of sequence and structural variants. PMID:22082336
Liu, Ying; Tang, Yuanman; Qin, Xiyun; Yang, Liang; Jiang, Gaofei; Li, Shili; Ding, Wei
2017-01-01
Ralstonia solanacearum, an agent of bacterial wilt, is a highly variable species with a broad host range and wide geographic distribution. As a species complex, it has extensive genetic diversity and occupies varied environments, including both lowland and highland areas, so more genomes are needed for studying population evolution and environmental adaptation. In this paper, we report the genome sequencing of R. solanacearum strain CQPS-1, isolated from wilted tobacco in Pengshui, Chongqing, China, a highland area with severely acidified soil and continuous cropping of tobacco for more than 20 years. A comparative genomic analysis among different R. solanacearum strains was also performed. The completed genome of CQPS-1 was 5.89 Mb in size and comprised the chromosome (3.83 Mb) and the megaplasmid (2.06 Mb). A total of 5229 coding sequences were predicted (the chromosome and megaplasmid encoded 3573 and 1656 genes, respectively). A comparative analysis with eight strains from four phylotypes showed that there was some variation among the species, e.g., a large set of specific genes in CQPS-1. The type III secretion system gene cluster (hrp gene cluster) was conserved in CQPS-1 compared with the reference strain GMI1000. In addition, most genes coding for core type III effectors were also conserved with GMI1000, but significant variation was found in the gene ripAA: its identity with strain GMI1000 was 75%, and the upstream hrpII box promoter was significantly mutated. This study provides a potential resource for further understanding of the relationship between variation of pathogenicity factors and adaptation to the host environment. PMID:28620361
Genetics of Inflammatory Bowel Diseases
McGovern, Dermot; Kugathasan, Subra; Cho, Judy H.
2015-01-01
In this Review, we provide an update on genome-wide association studies (GWAS) in inflammatory bowel disease (IBD). In addition, we summarize progress in defining the functional consequences of associated alleles for coding and non-coding genetic variation. In the small minority of loci where major association signals correspond to non-synonymous variation, we summarize studies defining their functional effects and implications for therapeutic targeting. Importantly, the large majority of GWAS-associated loci involve non-coding variation, many of which modulate levels of gene expression. Recent expression quantitative trait loci (eQTL) studies have established that expression of the large majority of human genes is regulated by non-coding genetic variation. Significant advances in defining the epigenetic landscape have demonstrated that IBD GWAS signals are highly enriched within cell-specific active enhancer marks. Studies in European ancestry populations have dominated the landscape of IBD genetics studies, but increasingly, studies in Asian and African-American populations are being reported. Common variation accounts for only a modest fraction of the predicted heritability and the role of rare genetic variation of higher effects (i.e. odds ratios markedly deviating from one) is increasingly being identified through sequencing efforts. These sequencing studies have been particularly productive in very-early onset, more severe cases. A major challenge in IBD genetics will be harnessing the vast array of genetic discovery for clinical utility, through emerging precision medicine initiatives. We discuss the rapidly evolving area of direct to consumer genetic testing, as well as the current utility of clinical exome sequencing, especially in very early onset, severe IBD cases. We summarize recent progress in the pharmacogenetics of IBD with respect to partitioning patient responses to anti-TNF and thiopurine therapies. Highly collaborative studies across research centers and across subspecialties and disciplines will be required to fully realize the promise of genetic discovery in IBD. PMID:26255561
An Exome Sequencing Study to Assess the Role of Rare Genetic Variation in Pulmonary Fibrosis.
Petrovski, Slavé; Todd, Jamie L; Durheim, Michael T; Wang, Quanli; Chien, Jason W; Kelly, Fran L; Frankel, Courtney; Mebane, Caroline M; Ren, Zhong; Bridgers, Joshua; Urban, Thomas J; Malone, Colin D; Finlen Copeland, Ashley; Brinkley, Christie; Allen, Andrew S; O'Riordan, Thomas; McHutchison, John G; Palmer, Scott M; Goldstein, David B
2017-07-01
Idiopathic pulmonary fibrosis (IPF) is an increasingly recognized, often fatal lung disease of unknown etiology. The aim of this study was to use whole-exome sequencing to improve understanding of the genetic architecture of pulmonary fibrosis. We performed a case-control exome-wide collapsing analysis including 262 unrelated individuals with pulmonary fibrosis clinically classified as IPF according to American Thoracic Society/European Respiratory Society/Japanese Respiratory Society/Latin American Thoracic Association guidelines (81.3%), usual interstitial pneumonia secondary to autoimmune conditions (11.5%), or fibrosing nonspecific interstitial pneumonia (7.2%). The majority (87%) of case subjects reported no family history of pulmonary fibrosis. We searched 18,668 protein-coding genes for an excess of rare deleterious genetic variation using whole-exome sequence data from 262 case subjects with pulmonary fibrosis and 4,141 control subjects drawn from among a set of individuals of European ancestry. Comparing genetic variation across 18,668 protein-coding genes, we found a study-wide significant (P < 4.5 × 10^-7) case enrichment of qualifying variants in TERT, RTEL1, and PARN. A model qualifying ultrarare, deleterious, nonsynonymous variants implicated TERT and RTEL1, and a model specifically qualifying loss-of-function variants implicated RTEL1 and PARN. A subanalysis of 186 case subjects with sporadic IPF confirmed TERT, RTEL1, and PARN as study-wide significant contributors to sporadic IPF. Collectively, 11.3% of case subjects with sporadic IPF carried a qualifying variant in one of these three genes compared with the 0.3% carrier rate observed among control subjects (odds ratio, 47.7; 95% confidence interval, 21.5-111.6; P = 5.5 × 10^-22). We identified TERT, RTEL1, and PARN (three telomere-related genes previously implicated in familial pulmonary fibrosis) as significant contributors to sporadic IPF. These results support the idea that telomere dysfunction is involved in IPF pathogenesis.
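As a rough check on the effect size quoted in this abstract, an odds ratio can be approximated from the two carrier rates alone. This is only a sketch and not the authors' calculation: the abstract does not give the underlying carrier counts, the function name is illustrative, and the rounded 0.3% control rate means the result cannot reproduce the published 47.7 exactly.

```python
def odds_ratio_from_rates(case_rate: float, control_rate: float) -> float:
    """Odds ratio implied by two carrier proportions, each strictly between 0 and 1."""
    for rate in (case_rate, control_rate):
        if not 0.0 < rate < 1.0:
            raise ValueError("rates must lie strictly between 0 and 1")
    case_odds = case_rate / (1.0 - case_rate)
    control_odds = control_rate / (1.0 - control_rate)
    return case_odds / control_odds


# Rounded rates quoted in the abstract: 11.3% of sporadic IPF cases, 0.3% of controls.
approx_or = odds_ratio_from_rates(0.113, 0.003)
print(f"approximate odds ratio from rounded rates: {approx_or:.1f}")
```

With these rounded inputs the estimate comes out near 42, the same order of magnitude as the published value. The remaining gap is presumably due to rounding of the control rate and to the estimation method applied to the real counts.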
Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schutzer, S. E.; Dunn, J.; Fraser-Liggett, C. M.
2011-02-01
Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain B31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.
Comparative Analysis of Genome Sequences Covering the Seven Cronobacter Species
Cummings, Craig A.; Shih, Rita; Degoricija, Lovorka; Rico, Alain; Brzoska, Pius; Hamby, Stephen E.; Masood, Naqash; Hariri, Sumyya; Sonbol, Hana; Chuzhanova, Nadia; McClelland, Michael; Furtado, Manohar R.; Forsythe, Stephen J.
2012-01-01
Background: Species of Cronobacter are widespread in the environment and are occasional food-borne pathogens associated with serious neonatal diseases, including bacteraemia, meningitis, and necrotising enterocolitis. The genus is composed of seven species: C. sakazakii, C. malonaticus, C. turicensis, C. dublinensis, C. muytjensii, C. universalis, and C. condimenti. Clinical cases are associated with three species, C. malonaticus, C. turicensis and, in particular, with C. sakazakii multilocus sequence type 4. Thus, it is plausible that virulence determinants have evolved in certain lineages. Methodology/Principal Findings: We generated high-quality sequence drafts for eleven Cronobacter genomes representing the seven Cronobacter species, including an ST4 strain of C. sakazakii. Comparative analysis of these genomes together with the two publicly available genomes revealed that Cronobacter has over 6,000 genes in one or more strains and over 2,000 genes shared by all Cronobacter. Considerable variation was found in the presence of traits such as type six secretion systems, metal resistance (tellurite, copper and silver), and adhesins. C. sakazakii is unique in the Cronobacter genus in encoding genes enabling the utilization of exogenous sialic acid, which may have clinical significance. The C. sakazakii ST4 strain 701 contained additional genes as compared to other C. sakazakii, but none of them were known specific virulence-related genes. Conclusions/Significance: Genome comparison revealed that pair-wise DNA sequence identity varies between 89 and 97% in the seven Cronobacter species, and also suggested various degrees of divergence. Sets of universal core genes and accessory genes unique to each strain were identified. These gene sequences can be used for designing genus/species specific detection assays. Genes encoding adhesins, T6SS, and metal resistance genes as well as prophages are found in only subsets of genomes and have contributed considerably to the variation of genomic content. Differences in gene content likely contribute to differences in the clinical and environmental distribution of species and sequence types. PMID:23166675
Zhang, N N; Hu, J W; Liu, H H; Xu, H Y; He, H; Li, L
2015-12-29
Tyrosinase, encoded by the TYR gene, is the rate-limiting enzyme in the production of melanin pigment. In this study, plumage color separation was observed in Cherry Valley duck line D and F1 and F2 hybrid generations of Liancheng white ducks. Gene sequencing and bioinformatic analysis were applied to the 5'-regulatory region of TYR, to explore the connection between TYR sequence variation and duck plumage color. Four SNPs were found in the 5'-regulatory region. The SNPs were in tight linkage and formed three haplotypes. However, the genotype distribution in groups with different plumage color was not significantly different, and there were no changes in the transcription factor binding sites between the different genotypes. In conclusion, these SNP variations may not cause the differences in feather color observed in this test group.
NASA Astrophysics Data System (ADS)
Cawood, Adam J.; Bond, Clare E.
2018-01-01
Stratigraphic influence on structural style and strain distribution in deformed sedimentary sequences is well established in models of 2D mechanical stratigraphy. In this study we attempt to refine existing models of stratigraphic-structure interaction by examining outcrop scale 3D variations in sedimentary architecture and the effects on subsequent deformation. At Monkstone Point, Pembrokeshire, SW Wales, digital mapping and virtual scanline data from a high resolution virtual outcrop have been combined with field observations, sedimentary logs and thin section analysis. Results show that significant variation in strain partitioning is controlled by changes, at a scale of tens of metres, in sedimentary architecture within Upper Carboniferous fluvio-deltaic deposits. Coupled vs uncoupled deformation of the sequence is defined by the composition and lateral continuity of mechanical units and unit interfaces. Where the sedimentary sequence is characterized by gradational changes in composition and grain size, we find that deformation structures are best characterized by patterns of distributed strain. In contrast, distinct compositional changes, both vertically and in laterally equivalent deposits, result in highly partitioned deformation and strain. The mechanical stratigraphy of the study area is inherently 3D in nature, due to lateral and vertical compositional variability. Consideration should be given to 3D variations in mechanical stratigraphy, such as those outlined here, when predicting subsurface deformation in multi-layers.
Conformation and Stability of Intramolecular Telomeric G-Quadruplexes: Sequence Effects in the Loops
Sattin, Giovanna; Artese, Anna; Nadai, Matteo; Costa, Giosuè; Parrotta, Lucia; Alcaro, Stefano; Palumbo, Manlio; Richter, Sara N.
2013-01-01
Telomeres are guanine-rich sequences that protect the ends of chromosomes. These regions can fold into G-quadruplex structures and their stabilization by G-quadruplex ligands has been employed as an anticancer strategy. Genetic analysis in human telomeres revealed extensive allelic variation restricted to loop bases, indicating that the variant telomeric sequences maintain the ability to fold into G-quadruplex. To assess the effect of mutations in loop bases on G-quadruplex folding and stability, we performed a comprehensive analysis of mutant telomeric sequences by spectroscopic techniques, molecular dynamics simulations and gel electrophoresis. We found that when the first position in the loop was mutated from T to C or A the resulting structure adopted a less stable antiparallel topology; when the second position was mutated to C or A, lower thermal stability and no evident conformational change were observed; in contrast, substitution of the third position from A to C induced a more stable and original hybrid conformation, while mutation to T did not significantly affect G-quadruplex topology and stability. Our results indicate that allelic variations generate G-quadruplex telomeric structures with variable conformation and stability. This aspect needs to be taken into account when designing new potential anticancer molecules. PMID:24367632
Mitochondrial DNA variation of indigenous goats in Narok and Isiolo counties of Kenya.
Kibegwa, F M; Githui, K E; Jung'a, J O; Badamana, M S; Nyamu, M N
2016-06-01
Phylogenetic relationships among and genetic variability within 60 goats from two different indigenous breeds in Narok and Isiolo counties in Kenya and 22 published goat samples were analysed using mitochondrial control region sequences. The results showed that there were 54 polymorphic sites in a 481-bp sequence and 29 haplotypes were determined. The mean haplotype diversity and nucleotide diversity were 0.981 ± 0.006 and 0.019 ± 0.001, respectively. The phylogenetic analysis in combination with goat haplogroup reference sequences from GenBank showed that all goat sequences were clustered into two haplogroups (A and G), of which haplogroup A was the commonest in the two populations. A very high percentage (99.90%) of the genetic variation was distributed within the regions, and a smaller percentage (0.10%) was distributed among regions, as revealed by the analysis of molecular variance (AMOVA). These AMOVA results showed that the divergence between regions was not statistically significant. We concluded that the high levels of intrapopulation diversity in Isiolo and Narok goats and the weak phylogeographic structuring suggested that there existed strong gene flow among goat populations, probably caused by extensive transportation of goats in history. © 2015 Blackwell Verlag GmbH.
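The two summary statistics reported in this abstract, haplotype diversity and nucleotide diversity, can be illustrated with a short sketch. It assumes the standard definitions (Nei's unbiased haplotype diversity and the mean pairwise proportion of differing sites); the abstract does not state which software or estimator variant the authors used, and the function names and four toy sequences are hypothetical, not data from the study.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


def nucleotide_diversity(sequences):
    """Mean proportion of differing sites over all pairs of aligned, equal-length sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    diffs = [sum(a != b for a, b in zip(s, t)) / length
             for s, t in combinations(sequences, 2)]
    return sum(diffs) / len(diffs)


# Hypothetical toy alignment: three distinct haplotypes among four sequences.
toy = ["ACGTAC", "ACGTAC", "ACGTTC", "ATGTAC"]
h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
print(f"haplotype diversity = {h:.3f}, nucleotide diversity = {pi:.3f}")
```

A haplotype diversity close to 1, such as the 0.981 reported here, means that two randomly drawn sequences almost always carry different haplotypes.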
Alverson, Andrew J.; Wei, XiaoXin; Rice, Danny W.; Stern, David B.; Barry, Kerrie; Palmer, Jeffrey D.
2010-01-01
The mitochondrial genomes of seed plants are unusually large and vary in size by at least an order of magnitude. Much of this variation occurs within a single family, the Cucurbitaceae, whose genomes range from an estimated 390 to 2,900 kb in size. We sequenced the mitochondrial genomes of Citrullus lanatus (watermelon: 379,236 nt) and Cucurbita pepo (zucchini: 982,833 nt)—the two smallest characterized cucurbit mitochondrial genomes—and determined their RNA editing content. The relatively compact Citrullus mitochondrial genome actually contains more and longer genes and introns, longer segmental duplications, and more discernibly nuclear-derived DNA. The large size of the Cucurbita mitochondrial genome reflects the accumulation of unprecedented amounts of both chloroplast sequences (>113 kb) and short repeated sequences (>370 kb). A low mutation rate has been hypothesized to underlie increases in both genome size and RNA editing frequency in plant mitochondria. However, despite its much larger genome, Cucurbita has a significantly higher synonymous substitution rate (and presumably mutation rate) than Citrullus but comparable levels of RNA editing. The evolution of mutation rate, genome size, and RNA editing are apparently decoupled in Cucurbitaceae, reflecting either simple stochastic variation or governance by different factors. PMID:20118192
Karas, Vlad O; Sinnott-Armstrong, Nicholas A; Varghese, Vici; Shafer, Robert W; Greenleaf, William J; Sherlock, Gavin
2018-01-01
Much of the within-species genetic variation is in the form of single nucleotide polymorphisms (SNPs), typically detected by whole genome sequencing (WGS) or microarray-based technologies. However, WGS produces mostly uninformative reads that perfectly match the reference, while microarrays require genome-specific reagents. We have developed Diff-seq, a sequencing-based mismatch detection assay for SNP discovery without the requirement for specialized nucleic-acid reagents. Diff-seq leverages the Surveyor endonuclease to cleave mismatched DNA molecules that are generated after cross-annealing of a complex pool of DNA fragments. Sequencing libraries enriched for Surveyor-cleaved molecules result in increased coverage at the variant sites. Diff-seq detected all mismatches present in an initial test substrate, with specific enrichment dependent on the identity and context of the variation. Application to viral sequences resulted in increased observation of variant alleles in a biologically relevant context. Diff-seq has the potential to increase the sensitivity and efficiency of high-throughput sequencing in the detection of variation. PMID:29361139
RSAT 2015: Regulatory Sequence Analysis Tools
Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques
2015-01-01
RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632
Potenza, L; Cafiero, M A; Camarda, A; La Salandra, G; Cucchiarini, L; Dachà, M
2009-10-01
In the present work, mites previously identified as Dermanyssus gallinae De Geer (Acari, Mesostigmata) using morphological keys were investigated by molecular tools. The complete internal transcribed spacer 1 (ITS1), 5.8S ribosomal DNA, and ITS2 region of the ribosomal DNA from mites were amplified and sequenced to examine the level of sequence variation and to explore the feasibility of using this region in the identification of this mite. Conserved primers located at the 3' end of the 18S and at the 5' start of the 28S rRNA genes were used first, and amplified fragments were sequenced. Sequence analyses showed no variation in the 5.8S and ITS2 regions, while slight intraspecific variations involving substitutions as well as deletions were concentrated in the ITS1 region. Based on the sequence analyses, a nested PCR of the ITS2 region followed by RFLP analyses has been set up in an attempt to provide a rapid molecular diagnostic tool for D. gallinae.
Mapping and phasing of structural variation in patient genomes using nanopore sequencing.
Cretu Stancu, Mircea; van Roosmalen, Markus J; Renkens, Ivo; Nieboer, Marleen M; Middelkamp, Sjors; de Ligt, Joep; Pregno, Giulia; Giachino, Daniela; Mandrile, Giorgia; Espejo Valle-Inclan, Jose; Korzelius, Jerome; de Bruijn, Ewart; Cuppen, Edwin; Talkowski, Michael E; Marschall, Tobias; de Ridder, Jeroen; Kloosterman, Wigard P
2017-11-06
Despite improvements in genomics technology, the detection of structural variants (SVs) from short-read sequencing still poses challenges, particularly for complex variation. Here we analyse the genomes of two patients with congenital abnormalities using the MinION nanopore sequencer and a novel computational pipeline, NanoSV. We demonstrate that nanopore long reads are superior to short reads with regard to detection of de novo chromothripsis rearrangements. The long reads also enable efficient phasing of genetic variations, which we leveraged to determine the parental origin of all de novo chromothripsis breakpoints and to resolve the structure of these complex rearrangements. Additionally, genome-wide surveillance of inherited SVs reveals novel variants, missed in short-read data sets, a large proportion of which are retrotransposon insertions. We provide a first exploration of patient genome sequencing with a nanopore sequencer and demonstrate the value of long-read sequencing in mapping and phasing of SVs for both clinical and research applications.
Human structural variation: mechanisms of chromosome rearrangements
Weckselblatt, Brooke; Rudd, M. Katharine
2015-01-01
Chromosome structural variation (SV) is a normal part of variation in the human genome, but some classes of SV can cause neurodevelopmental disorders. Analysis of the DNA sequence at SV breakpoints can reveal mutational mechanisms and risk factors for chromosome rearrangement. Large-scale SV breakpoint studies have become possible recently owing to advances in next-generation sequencing (NGS) including whole-genome sequencing (WGS). These findings have shed light on complex forms of SV such as triplications, inverted duplications, insertional translocations, and chromothripsis. Sequence-level breakpoint data resolve SV structure and determine how genes are disrupted, fused, and/or misregulated by breakpoints. Recent improvements in breakpoint sequencing have also revealed non-allelic homologous recombination (NAHR) between paralogous long interspersed nuclear element (LINE) or human endogenous retrovirus (HERV) repeats as a cause of deletions, duplications, and translocations. This review covers the genomic organization of simple and complex constitutional SVs, as well as the molecular mechanisms of their formation. PMID:26209074
Read clouds uncover variation in complex regions of the human genome.
Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E; West, Robert; Sidow, Arend; Batzoglou, Serafim
2015-10-01
Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. © 2015 Bishara et al.; Published by Cold Spring Harbor Laboratory Press.
Grievink, Liat Shavit; Penny, David; Hendy, Mike D; Holland, Barbara R
2009-01-01
Correction to Shavit Grievink L, Penny D, Hendy MD, Holland BR: LineageSpecificSeqgen: generating sequence data with lineage-specific variation in the proportion of variable sites. BMC Evol Biol 2008, 8(1):317.
Melendrez, Melanie C.; Lange, Rachel K.; Cohan, Frederick M.; Ward, David M.
2011-01-01
Previous research has shown that sequences of 16S rRNA genes and 16S-23S rRNA internal transcribed spacer regions may not have enough genetic resolution to define all ecologically distinct Synechococcus populations (ecotypes) inhabiting alkaline, siliceous hot spring microbial mats. To achieve higher molecular resolution, we studied sequence variation in three protein-encoding loci sampled by PCR from 60°C and 65°C sites in the Mushroom Spring mat (Yellowstone National Park, WY). Sequences were analyzed using the ecotype simulation (ES) and AdaptML algorithms to identify putative ecotypes. Between 4 and 14 times more putative ecotypes were predicted from variation in protein-encoding locus sequences than from variation in 16S rRNA and 16S-23S rRNA internal transcribed spacer sequences. The number of putative ecotypes predicted depended on the number of sequences sampled and the molecular resolution of the locus. Chao estimates of diversity indicated that few rare ecotypes were missed. Many ecotypes hypothesized by sequence analyses were different in their habitat specificities, suggesting different adaptations to temperature or other parameters that vary along the flow channel. PMID:21169433
Strachan, Norval J C; Rotariu, Ovidiu; Lopes, Bruno; MacRae, Marion; Fairley, Susan; Laing, Chad; Gannon, Victor; Allison, Lesley J; Hanson, Mary F; Dallman, Tim; Ashton, Philip; Franz, Eelco; van Hoek, Angela H A M; French, Nigel P; George, Tessy; Biggs, Patrick J; Forbes, Ken J
2015-10-07
Genetic variation in an infectious disease pathogen can be driven by ecological niche dissimilarities arising from different host species and different geographical locations. Whole genome sequencing was used to compare E. coli O157 isolates from host reservoirs (cattle and sheep) from Scotland and to compare genetic variation of isolates (human, animal, environmental/food) obtained from Scotland, New Zealand, Netherlands, Canada and the USA. Nei's genetic distance calculated from core genome single nucleotide polymorphisms (SNPs) demonstrated that the animal isolates were from the same population. Investigation of the Shiga toxin bacteriophage and their insertion sites (SBI typing) revealed that cattle and sheep isolates had statistically indistinguishable rarefaction profiles, diversity and genotypes. In contrast, isolates from different countries exhibited significant differences in Nei's genetic distance and SBI typing. Hence, after successful international transmission, which has occurred on multiple occasions, local genetic variation occurs, resulting in a global patchwork of continental and trans-continental phylogeographic clades. These findings are important for three reasons: first, understanding transmission and evolution of infectious diseases associated with multiple host reservoirs and multi-geographic locations; second, highlighting the relevance of the sheep reservoir when considering farm based interventions; and third, improving our understanding of why human disease incidence varies across the world.
NASA Astrophysics Data System (ADS)
Telesca, Luciano; Lovallo, Michele; Lopez, Carmen; Marti Molist, Joan
2016-03-01
A detailed statistical investigation of the seismicity that occurred at El Hierro volcano (Canary Islands) from 2011 to 2014 has been performed by analysing the time variation of four parameters: the Gutenberg-Richter b-value, the local coefficient of variation, the scaling exponent of the magnitude distribution and the main periodicity of the earthquake sequence calculated using Schuster's test. These four parameters are good descriptors of the time and magnitude distributions of the seismic sequence, and their variations indicate dynamical changes in the volcanic system. These variations can be attributed to the causes and types of seismicity, making it possible to distinguish between different host-rock fracturing processes caused by intrusions of magma at different depths and overpressures. The statistical patterns observed among the studied unrest episodes, and between them and the eruptive episode of 2011-2012, indicate that the response of the host rock to the deformation imposed by magma intrusion did not differ significantly from one episode to the other, thus suggesting that the local stress changes induced by magma intrusion did not differ significantly among the episodes. Therefore, although the studied unrest episodes were caused by intrusions of magma at different depths and locations below El Hierro island, the mechanical response of the lithosphere was similar in all cases. This suggests that the reason why the first unrest culminated in an eruption while the others did not may be related to the role of the regional/local tectonics acting at that moment, rather than to the force of the magma intrusion.
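The first of the four parameters named in this abstract, the Gutenberg-Richter b-value, is commonly estimated with Aki's maximum-likelihood formula. The sketch below assumes that estimator together with the usual half-bin correction for binned magnitudes; the abstract does not say which estimator the authors used, and the function name, the completeness magnitude and the toy catalogue are illustrative only.

```python
import math
import statistics


def b_value_aki(magnitudes, completeness_mag, bin_width=0.1):
    """Aki maximum-likelihood b-value with a half-bin correction.

    b = log10(e) / (mean(M) - (Mc - bin_width / 2)), using only events with M >= Mc.
    """
    usable = [m for m in magnitudes if m >= completeness_mag]
    if not usable:
        raise ValueError("no events at or above the completeness magnitude")
    mean_mag = statistics.fmean(usable)
    denominator = mean_mag - (completeness_mag - bin_width / 2.0)
    if denominator <= 0.0:
        raise ValueError("mean magnitude must exceed the corrected completeness magnitude")
    return math.log10(math.e) / denominator


# Hypothetical toy catalogue (not El Hierro data), magnitudes binned to 0.1 units.
toy_magnitudes = [1.5, 1.6, 1.8, 2.0, 2.3, 1.5, 1.7, 2.9, 1.6, 1.9]
b = b_value_aki(toy_magnitudes, completeness_mag=1.5)
print(f"b-value = {b:.2f}")
```

Tracking how such an estimate changes over successive time windows is one way to follow the temporal variation described above. The window length and overlap would be further assumptions.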
Variations of bacteria and fungi in PM2.5 in Beijing, China
NASA Astrophysics Data System (ADS)
Du, Pengrui; Du, Rui; Ren, Weishan; Lu, Zedong; Zhang, Yang; Fu, Pingqing
2018-01-01
Bacteria and fungi present in airborne fine particulate matter (PM2.5) play important roles in the atmosphere and have significant impacts on human health. However, variations in their species composition and community structure are not well understood. In this study, we sampled PM2.5 in suburban Beijing and analyzed the bacterial and fungal composition during different seasons and at different air pollution levels using gene sequencing methods. The results showed that the species richness and diversity of bacterial communities declined as air pollution worsened. Additionally, the bacterial communities in spring samples showed the highest species richness, with average richness estimators, ACE and Chao 1, up to 14,649 and 7608, respectively, followed by winter samples (7690 and 5031, respectively) and autumn samples (4368 and 3438, respectively), whereas summer samples exhibited the lowest average ACE and Chao 1 indexes (2916 and 1900, respectively). The species richness of fungal communities followed the same seasonal pattern. The community structure of bacteria and the species composition of fungi in PM2.5 showed significant seasonal variations. The dominant bacteria were Actinobacteria (33.89%), Proteobacteria (25.72%), Firmicutes (19.87%), Cyanobacteria/Chloroplast (15.34%), and Bacteroidetes (3.19%), and Ascomycota, with an average abundance of 74.68% of all sequences, were the most abundant fungi. At the genus level, as many as 791 bacterial genera and 517 fungal genera were identified in PM2.5. The results advance our understanding of the distribution and variation of airborne microorganisms in the areas surrounding a metropolis.
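The Chao 1 index quoted in this abstract extrapolates total richness from the number of taxa seen exactly once and exactly twice. The sketch below uses the widely used bias-corrected form; whether the authors used this form or the classic one is not stated, and the function name and abundance vector are hypothetical.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of taxa observed exactly once and exactly twice.
    """
    observed = [c for c in otu_counts if c > 0]
    s_obs = len(observed)
    singletons = sum(1 for c in observed if c == 1)
    doubletons = sum(1 for c in observed if c == 2)
    return s_obs + singletons * (singletons - 1) / (2.0 * (doubletons + 1))


# Hypothetical read counts per taxon in one sample (a zero means the taxon was not seen).
toy_counts = [10, 5, 2, 2, 1, 1, 1, 1, 0]
richness = chao1(toy_counts)
print(f"observed taxa = 8, Chao1 estimate = {richness:.1f}")
```

Many singletons relative to doubletons push the estimate well above the observed count, which is why rare taxa dominate differences in richness between samples.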
Wang, Q Z; Huang, M; Downie, S R; Chen, Z X
2016-05-23
Invasive plants tend to spread aggressively in new habitats and an understanding of their genetic diversity and population structure is useful for their management. In this study, expressed sequence tag-simple sequence repeat (EST-SSR) markers were developed for the invasive plant species Praxelis clematidea (Asteraceae) from 5548 Stevia rebaudiana (Asteraceae) expressed sequence tags (ESTs). A total of 133 microsatellite-containing ESTs (2.4%) were identified, of which 56 (42.1%) were hexanucleotide repeat motifs and 50 (37.6%) were trinucleotide repeat motifs. Of the 24 primer pairs designed from these 133 ESTs, 7 (29.2%) resulted in significant polymorphisms. The number of alleles per locus ranged from 5 to 9. The relatively high genetic diversity (H = 0.2667, I = 0.4212, and P = 100%) of P. clematidea was related to high gene flow (Nm = 1.4996) among populations. The coefficient of population differentiation (GST = 0.2500) indicated that most genetic variation occurred within populations. A Mantel test suggested that there was significant correlation between genetic distance and geographical distribution (r = 0.3192, P = 0.012). These results further support the transferability of EST-SSR markers between closely related genera of the same family.
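The proportions in this abstract, and the link between GST and gene flow, can be checked with simple arithmetic. The sketch assumes the common estimator Nm = 0.5 × (1 − GST) / GST; the abstract does not say how Nm was obtained, so the formula and the function name are assumptions made for illustration.

```python
def gene_flow_from_gst(gst: float) -> float:
    """Estimate Nm as 0.5 * (1 - GST) / GST (assumed estimator, see lead-in)."""
    if not 0.0 < gst < 1.0:
        raise ValueError("GST must lie strictly between 0 and 1")
    return 0.5 * (1.0 - gst) / gst


# Counts and GST as quoted in the abstract.
ssr_fraction = 133 / 5548      # microsatellite-containing ESTs among all ESTs
hexa_fraction = 56 / 133       # hexanucleotide motifs among SSR-containing ESTs
tri_fraction = 50 / 133        # trinucleotide motifs among SSR-containing ESTs
polymorphic_fraction = 7 / 24  # polymorphic primer pairs among those designed
nm = gene_flow_from_gst(0.2500)

print(f"{ssr_fraction:.1%} {hexa_fraction:.1%} {tri_fraction:.1%} {polymorphic_fraction:.1%}")
print(f"Nm from GST = {nm:.4f}")
```

Under this assumption a GST of 0.2500 gives Nm = 1.5, close to the reported 1.4996. The small difference would be consistent with GST having been rounded to four decimals.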
Brown, J. R.; Beckenbach, K.; Beckenbach, A. T.; Smith, M. J.
1996-01-01
The extent of mtDNA length variation and heteroplasmy as well as DNA sequences of the control region and two tRNA genes were determined for four North American sturgeon species: Acipenser transmontanus, A. medirostris, A. fulvescens and A. oxyrhynchus. Across the Continental Divide, a division in the occurrence of length variation and heteroplasmy was observed that was concordant with species biogeography as well as with phylogenies inferred from restriction fragment length polymorphisms (RFLP) of whole mtDNA and pairwise comparisons of unique sequences of the control region. In all species, mtDNA length variation was due to repeated arrays of 78-82-bp sequences each containing a D-loop strand synthesis termination associated sequence (TAS). Individual repeats showed greater sequence conservation within individuals and species rather than between species, which is suggestive of concerted evolution. Differences in the frequencies of multiple copy genomes and heteroplasmy among the four species may be ascribed to differences in the rates of recurrent mutation. A mechanism that may offset the high rate of mutation for increased copy number is suggested on the basis that an increase in the number of functional TAS motifs might reduce the frequency of successfully initiated H-strand replications. PMID:8852850
Sequence dependent aggregation of peptides and fibril formation
NASA Astrophysics Data System (ADS)
Hung, Nguyen Ba; Le, Duy-Manh; Hoang, Trinh X.
2017-09-01
Deciphering the links between amino acid sequence and amyloid fibril formation is key for understanding protein misfolding diseases. Here we use Monte Carlo simulations to study the aggregation of short peptides in a coarse-grained model with hydrophobic-polar (HP) amino acid sequences and correlated side chain orientations for hydrophobic contacts. A significant heterogeneity is observed in the aggregate structures and in the thermodynamics of aggregation for systems of different HP sequences and different numbers of peptides. Fibril-like ordered aggregates are found for several sequences that contain the common HPH pattern, while other sequences may form helix bundles or disordered aggregates. A wide variation of the aggregation transition temperatures among sequences, even among those of the same hydrophobic fraction, indicates that not all sequences undergo aggregation at a presumable physiological temperature. The transition is found to be the most cooperative for sequences forming fibril-like structures. For a fibril-prone sequence, it is shown that fibril formation follows the nucleation and growth mechanism. Interestingly, a binary mixture of peptides of an aggregation-prone and a non-aggregation-prone sequence shows the association and conversion of the latter to the fibrillar structure. Our study highlights the role of a sequence in selecting fibril-like aggregates and also the impact of a structural template on fibril formation by peptides of unrelated sequences.
NASA Technical Reports Server (NTRS)
Bond, Gerard C.; Beavan, John; Kominz, Michelle A.; Devlin, William
1992-01-01
Spectral analyses of two sequences of shallow marine sedimentary cycles that were deposited between 510 and 530 million years ago were completed. One sequence is from Middle Cambrian rocks in southern Utah and the other is from Upper Cambrian rocks in the southern Canadian Rockies. In spite of the antiquity of these strata, and even though there are differences in the age, location, and cycle facies between the two sequences, both records have distinct spectral peaks with surprisingly similar periodicities. A null model constructed to test for significance of the spectral peaks and circularity in the methodology indicates that all but one of the spectral peaks are significant at the 90 percent confidence level. When the ratios between the statistically significant peaks are measured, we find a consistent relation to orbital forcing; specifically, the spectral peak ratios in both the Utah and Canadian examples imply that a significant amount of the variance in the cyclic records is driven by the short eccentricity (approximately 109 ky) and by the precessional (approximately 21 ky) components of the Earth's orbital variations. Neither section contains a significant component of variance at the period of the obliquity cycle, however.
Differential expression of a WRKY gene between wild and cultivated soybeans correlates to seed size.
Gu, Yongzhe; Li, Wei; Jiang, Hongwei; Wang, Yan; Gao, Huihui; Liu, Miao; Chen, Qingshan; Lai, Yongcai; He, Chaoying
2017-05-17
Soybean (Glycine max) probably originated from the wild soybean (Glycine soja). Glycine max has a significantly larger seed size, but the underlying genomic changes are largely unknown. Candidate regulatory genes were preliminarily proposed by co-localizing RNA sequencing data with the quantitative trait loci (QTLs) for seed size. The soybean gene locus SoyWRKY15a and its orthologous genes from G. max (GmWRKY15a) and G. soja (GsWRKY15a) were analyzed in detail. The coding sequences were nearly identical between the two orthologs, but GmWRKY15a was significantly more highly expressed than GsWRKY15a. Four haplotypes (H1-H4) were found and they varied in the size of a CT-core microsatellite locus in the 5'-untranslated region of this gene. H1 (with six CT-repeats) was the only allelic version found in G. max, while H3 (with five CT-repeats) was the dominant G. soja allele. Differential expression of this gene in soybean pods was correlated with CT-repeat variation, and manipulation of the CT copy number altered the reporter gene expression, suggesting a regulatory role for the simple sequence repeats. Seed weight of wild soybeans harboring H1 was significantly greater than that of soybeans having haplotypes H2, H3, or H4, and seed weight was correlated with gene expression, suggesting the influence of GsWRKY15a in controlling seed size. However, the seed size might be refractory to increased SoyWRKY15a expression in cultivated soybeans. The evolutionary significance of SoyWRKY15a variation in soybean seed domestication is discussed. © The Author 2017. Published by Oxford University Press on behalf of the Society for Experimental Biology.
Cingoz, Sultan; Agilkaya, Sinem; Oztura, Ibrahim; Eroglu, Secil; Karadeniz, Derya; Evlice, Ahmet; Altungoz, Oguz; Yilmaz, Hikmet; Baklan, Baris
2014-04-01
The HLA-DQB1*06:02 allele across all ethnic groups and the rs5770917 variation between CPT1B and CHKB genes in Japanese and Koreans are common genetic susceptibility factors for narcolepsy. This comprehensive genetic study sought to assess variations in CHKB and CPT1B susceptibility genes and HLA-DQB1*06:02 allele status in Turkish patients with narcolepsy and healthy persons. CHKB/CPT1B genes were sequenced in patients with narcolepsy (n=37) and healthy persons (n=100) to detect variations. The HLA-DQB1*06:02 allele status was determined by sequence-specific polymerase chain reaction. The HLA-DQB1*06:02 allele was significantly more frequent in narcoleptic patients than in healthy persons (p=2×10^-7) and in patients with narcolepsy and cataplexy than in those without (p=0.018). The mean of the multiple sleep latency test, sleep-onset rapid eye movement periods, and frequency of sleep paralysis significantly differed in the HLA-DQB1*06:02-positive patients. rs5770917, rs5770911, rs2269381, and rs2269382 were detected together as a haplotype in three patients and 11 healthy persons. In addition to this haplotype, the indel variation (rs144647670) was detected in the 5' upstream region of the human CHKB gene in the patients and healthy persons carrying four variants together. This study identified a novel haplotype consisting of the indel variation, which had not been detected in previous studies in Japanese and Korean populations, and observed four single-nucleotide polymorphisms in CHKB/CPT1B. The study confirmed the association of the HLA-DQB1*06:02 allele with narcolepsy and cataplexy susceptibility. The findings suggest that the presence of HLA-DQB1*06:02 may be a predictor of cataplexy in narcoleptic patients and could therefore be used as an additional diagnostic marker alongside hypocretin.
Whole-genome CNV analysis: advances in computational approaches.
Pirooznia, Mehdi; Goes, Fernando S; Zandi, Peter P
2015-01-01
Accumulating evidence indicates that DNA copy number variation (CNV) is likely to make a significant contribution to human diversity and also play an important role in disease susceptibility. Recent advances in genome sequencing technologies have enabled the characterization of a variety of genomic features, including CNVs. This has led to the development of several bioinformatics approaches to detect CNVs from next-generation sequencing data. Here, we review recent advances in CNV detection from whole genome sequencing. We discuss the informatics approaches and current computational tools that have been developed as well as their strengths and limitations. This review will assist researchers and analysts in choosing the most suitable tools for CNV analysis as well as provide suggestions for new directions in future development.
Mitochondrial DNA Sequence Variation in North Atlantic Long-Finned Pilot Whales, Globicephala melas
1994-06-01
Delphinapterus leucas: mitochondrial DNA sequence variation within and among North American populations. M.Sc. thesis. McMaster University. Brown, G.G...Delphinapterus leucas) (Brennin 1992), minke whales (Balaenoptera acutorostrata) (Wada et al. 1991), bottlenose dolphins (Tursiops truncatus) (Dowling & Brown
Widespread Transient Hoogsteen Base-Pairs in Canonical Duplex DNA with Variable Energetics
Alvey, Heidi S.; Gottardo, Federico L.; Nikolova, Evgenia N.; Al-Hashimi, Hashim M.
2015-01-01
Hoogsteen base-pairing involves a 180 degree rotation of the purine base relative to Watson-Crick base-pairing within DNA duplexes, creating alternative DNA conformations that can play roles in recognition, damage induction, and replication. Here, using Nuclear Magnetic Resonance R1ρ relaxation dispersion, we show that transient Hoogsteen base-pairs occur across more diverse sequence and positional contexts than previously anticipated. We observe sequence-specific variations in Hoogsteen base-pair energetic stabilities that are comparable to variations in Watson-Crick base-pair stability, with Hoogsteen base-pairs being more abundant for energetically less favorable Watson-Crick base-pairs. Our results suggest that the variations in Hoogsteen stabilities and rates of formation are dominated by variations in Watson-Crick base-pair stability, suggesting a late transition state for the Watson-Crick to Hoogsteen conformational switch. The occurrence of sequence- and position-dependent Hoogsteen base-pairs provides a new potential mechanism for achieving sequence-dependent DNA transactions. PMID:25185517
CNV-seq, a new method to detect copy number variation using high-throughput sequencing.
Xie, Chao; Tammi, Martti T
2009-03-06
DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations. Here, we describe a method to detect copy number variation using shotgun sequencing, CNV-seq. The method is based on a robust statistical model that describes the complete analysis procedure and allows the computation of essential confidence values for detection of CNV. Our results show that the number of reads, not the length of the reads, is the key factor determining the resolution of detection. This favors next-generation sequencing methods that rapidly produce large amounts of short reads. Simulation of various sequencing methods with coverage from 0.1x to 8x shows overall specificity between 91.7% and 99.9%, and sensitivity between 72.2% and 96.5%. We also show the results for assessment of CNV between two individual human genomes.
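The core idea of read-count-based CNV detection can be illustrated with a minimal Python sketch. It is not the CNV-seq statistical model described in the abstract; it only assumes that reads from a test and a reference sample are counted in fixed-size windows and compared as a normalised log2 ratio. The function names, the window size and the choice to skip empty windows are all hypothetical.

```python
import math
from collections import Counter

def window_counts(read_starts, window_size):
    """Count reads per fixed-size window, keyed by window index."""
    return Counter(pos // window_size for pos in read_starts)

def log2_ratios(test_starts, ref_starts, window_size):
    """Return {window_index: log2 ratio} of test to reference read counts,
    normalised by the total number of reads in each sample."""
    test = window_counts(test_starts, window_size)
    ref = window_counts(ref_starts, window_size)
    n_test, n_ref = len(test_starts), len(ref_starts)
    ratios = {}
    for w in sorted(set(test) | set(ref)):
        if test[w] == 0 or ref[w] == 0:
            continue  # ratio undefined when either sample has no reads here
        ratios[w] = math.log2((test[w] / n_test) / (ref[w] / n_ref))
    return ratios

# Toy read start positions (assumed data, not from the paper)
ratios = log2_ratios([10, 20, 30, 40, 110, 120], [15, 25, 115, 125], 100)
```

A positive ratio in a window hints at a copy-number gain in the test sample relative to the reference, a negative one at a loss. Because the estimate depends only on how many reads fall in each window, more reads give tighter ratios regardless of read length, which is the point the abstract makes about resolution.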
Molecular mechanisms of epigenetic variation in plants.
Fujimoto, Ryo; Sasaki, Taku; Ishikawa, Ryo; Osabe, Kenji; Kawanabe, Takahiro; Dennis, Elizabeth S
2012-01-01
Natural variation is defined as the phenotypic variation caused by spontaneous mutations. In general, mutations are associated with changes of nucleotide sequence, and many mutations in genes that can cause changes in plant development have been identified. Epigenetic change, which does not involve alteration to the nucleotide sequence, can also cause changes in gene activity by changing the structure of chromatin through DNA methylation or histone modifications. There is now evidence, based on induced or spontaneous mutants, that epigenetic changes can alter plant phenotypes. Epigenetic changes have occurred frequently in plants, and some are heritable or metastable, causing variation in epigenetic status within or between species. Therefore, heritable epigenetic variation as well as genetic variation has the potential to drive natural variation.
The study of human Y chromosome variation through ancient DNA.
Kivisild, Toomas
2017-05-01
High-throughput sequencing methods have completely transformed the study of human Y chromosome variation by offering a genome-scale view of genetic variation retrieved from ancient human remains, in the context of a growing number of high-coverage whole Y chromosome sequences from living populations across the world. The ancient Y chromosome sequences are providing us with the first exciting glimpses into the past variation of the male-specific compartment of the genome and the opportunity to evaluate models based on previously made inferences from patterns of genetic variation in living populations. Analyses of the ancient Y chromosome sequences are challenging not only because of issues generally related to ancient DNA work, such as DNA damage-induced mutations and low content of endogenous DNA in most human remains, but also because of specific properties of the Y chromosome, such as its highly repetitive nature and high homology with the X chromosome. Shotgun sequencing of uniquely mapping regions of the Y chromosome to sufficiently high coverage is still challenging and costly in poorly preserved samples. To increase the coverage of specific target SNPs, capture-based methods have been developed and used in recent years to generate Y chromosome sequence data from hundreds of prehistoric skeletal remains. Besides the prospect of directly testing how much genetic change in a given time period has accompanied changes in material culture, the sequencing of ancient Y chromosomes also allows us to better understand the rate at which mutations accumulate and become fixed over time. This review considers genome-scale evidence on ancient Y chromosome diversity that has recently started to accumulate in geographic areas favourable to DNA preservation. More specifically, the review focuses on examples of regional continuity and change of the Y chromosome haplogroups in North Eurasia and in the New World.
Detection of nucleic acid sequences by invader-directed cleavage
Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert
1999-01-01
The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.
Motamayor, Juan C; Mockaitis, Keithanne; Schmutz, Jeremy; Haiminen, Niina; Livingstone, Donald; Cornejo, Omar; Findley, Seth D; Zheng, Ping; Utro, Filippo; Royaert, Stefan; Saski, Christopher; Jenkins, Jerry; Podicheti, Ram; Zhao, Meixia; Scheffler, Brian E; Stack, Joseph C; Feltus, Frank A; Mustiga, Guiliana M; Amores, Freddy; Phillips, Wilbert; Marelli, Jean Philippe; May, Gregory D; Shapiro, Howard; Ma, Jianxin; Bustamante, Carlos D; Schnell, Raymond J; Main, Dorrie; Gilbert, Don; Parida, Laxmi; Kuhn, David N
2013-06-03
Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.
2013-01-01
Background: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results: We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. Conclusions: We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits. PMID:23731509
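The contig and scaffold N50 values quoted above follow a standard definition: the length L such that sequences of length L or longer together make up at least half of the total assembly length. A minimal Python sketch of that definition follows; the function name and the toy lengths are illustrative and are not taken from the Matina 1-6 assembly.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")

# Toy contig lengths (assumed data)
toy_n50 = n50([2, 3, 4, 5, 6])
```

Because the statistic is driven by the longest sequences, a higher N50 at a similar total length indicates a more contiguous assembly, which is the comparison the abstract draws against Criollo.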
Bhattacharjee, Bornali; Sengupta, Sharmila
2006-02-01
We evaluated the status of the HPV16 E2 gene (disrupted or intact), nucleotide sequence alterations within intact E2 genes and LCR of HPV16 isolates in a group of CaCx cases (invasive squamous cell carcinomas, n = 81) and population controls (normal cervical scrapes, n = 27) from Indian women. E2 disruption was detected by amplifying the entire E2 gene with a single set of primers, while overlapping primers were used to determine whether any particular region was selectively disrupted. Nucleotide variations in E2 and LCR were analyzed by PCR amplification followed by bi-directional sequencing. The associations between the viral factors and CaCx were analyzed using Fisher's Exact or Chi-squared test and interpreted as OR (95% CI) and P values. E2 disruption was significantly higher among the cases [3.38 (1.07-10.72); P = 0.02], and was most frequent in the region between nucleotides 3650 and 3872 (DNA-binding region). The European (E) variant was found to be the prevalent subgroup (87.76% among cases and 96.30% among the controls), and the remaining samples were Asian-American variants. Among the E subgroup, variation at position 7450 (T > C) within the E2-binding site-IV was found to be significantly higher among the E2 undisrupted cases (21/37; 56.76%), compared to controls (5/18; 27.78%) [3.41 (1.01-11.55); P = 0.03]. Besides HPV16 E2 disruption, LCR 7450T > C variation within undisrupted E2 of E subgroup appears to be a major factor contributing to the risk of CaCx development in Indian women. Furthermore, polymorphisms in the E2 gene of HPV16 may not be significant for disease risk.
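The odds ratio and interval reported for the 7450T>C variant can be checked from the counts given in the abstract (21 of 37 cases versus 5 of 18 controls). The Python sketch below assumes a Woolf (log-normal) 95% confidence interval; the abstract does not say which interval method the authors used, so that choice and the function name are assumptions.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-normal) confidence interval for a 2x2 table:
    a = cases with variant, b = cases without,
    c = controls with variant, d = controls without."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# 7450T>C in the E subgroup: 21 of 37 E2-undisrupted cases, 5 of 18 controls
or_, lo, hi = odds_ratio_ci(21, 37 - 21, 5, 18 - 5)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Under that assumption the result comes out at roughly 3.41 (1.01-11.55), in line with the reported values. The lower bound sits just above 1, which shows how narrowly the association reaches significance with these sample sizes.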
Identification of structural variation in mouse genomes.
Keane, Thomas M; Wong, Kim; Adams, David J; Flint, Jonathan; Reymond, Alexandre; Yalcin, Binnaz
2014-01-01
Structural variation is variation in structure of DNA regions affecting DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variation and their respective association to human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of molecular architecture and functional consequences of structural variation.
Giannelli, Marco; Diciotti, Stefano; Tessa, Carlo; Mascalchi, Mario
2010-01-01
Although in EPI-fMRI analyses typical acquisition parameters (TR, TE, matrix, slice thickness, etc.) are generally employed, various readout bandwidth (BW) values are used as a function of gradients characteristics of the MR scanner. Echo spacing (ES) is another fundamental parameter of EPI-fMRI acquisition sequences but the employed ES value is not usually reported in fMRI studies. In the present work, the authors investigated the effect of ES and BW on basic performances of EPI-fMRI sequences in terms of temporal stability and overall image quality of time series acquisition. EPI-fMRI acquisitions of the same water phantom were performed using two clinical MR scanner systems (scanners A and B) with different gradient characteristics and functional designs of radiofrequency coils. For both scanners, the employed ES values ranged from 0.75 to 1.33 ms. The used BW values ranged from 125.0 to 250.0 kHz/64pixels and from 78.1 to 185.2 kHz/64pixels for scanners A and B, respectively. The temporal stability of EPI-fMRI sequence was assessed measuring the signal-to-fluctuation noise ratio (SFNR) and signal drift (DR), while the overall image quality was assessed evaluating the signal-to-noise ratio (SNR(ts)) and nonuniformity (NU(ts)) of the time series acquisition. For both scanners, no significant effect of ES and BW on signal drift was revealed. The SFNR, NU(ts) and SNR(ts) values of scanner A did not significantly vary with ES. On the other hand, the SFNR, NU(ts), and SNR(ts) values of scanner B significantly varied with ES. SFNR (5.8%) and SNR(ts) (5.9%) increased with increasing ES. SFNR (25% scanner A, 32% scanner B) and SNR(ts) (26.2% scanner A, 30.1% scanner B) values of both scanners significantly decreased with increasing BW. NU(ts) values of scanners A and B were less than 3% for all BW and ES values. 
Nonetheless, scanner A was characterized by a significant upward trend (3% percentage of variation) of time series nonuniformity with increasing BW while NU(ts) of scanner B significantly increased (19% percentage of variation) with increasing ES. Temporal stability (SFNR and DR) and overall image quality (NU(ts) and SNR(ts)) of EPI-fMRI time series can significantly vary with echo spacing and readout bandwidth. The specific pattern of variation may depend on the performance of each single MR scanner system in terms of gradients characteristics, EPI sequence calibrations (eddy currents, shimming, etc.), and functional design of radiofrequency coil. Our results indicate that the employment of low BW improves not only the signal-to-noise ratio of EPI-fMRI time series but also the temporal stability of functional acquisitions. The use of minimum ES values is not entirely advantageous when the MR scanner system is characterized by gradients with low performances and suboptimal EPI sequence calibration. Since differences in basic performances of MR scanner system are potential source of variability for fMRI activation, phantom measurements of SFNR, DR, NU(ts), and SNR(ts) can be executed before subjects acquisitions to monitor the stability of MR scanner performances in clinical group comparison and longitudinal studies.
Mourier, Tobias; Mollerup, Sarah; Vinner, Lasse; Hansen, Thomas Arn; Kjartansdóttir, Kristín Rós; Guldberg Frøslev, Tobias; Snogdal Boutrup, Torsten; Nielsen, Lars Peter; Willerslev, Eske; Hansen, Anders J.
2015-01-01
From Illumina sequencing of DNA from brain and liver tissue from the lion, Panthera leo, and tumor samples from the pike-perch, Sander lucioperca, we obtained two assembled sequence contigs with similarity to known retroviruses. Phylogenetic analyses suggest that the pike-perch retrovirus belongs to the epsilonretroviruses, and the lion retrovirus to the gammaretroviruses. To determine if these novel retroviral sequences originate from an endogenous retrovirus or from a recently integrated exogenous retrovirus, we assessed the genetic diversity of the parental sequences from which the short Illumina reads are derived. First, we showed by simulations that we can robustly infer the level of genetic diversity from short sequence reads. Second, we find that the measures of nucleotide diversity inferred from our retroviral sequences significantly exceed the level observed from Human Immunodeficiency Virus infections, prompting us to conclude that the novel retroviruses are both of endogenous origin. Through further simulations, we rule out the possibility that the observed elevated levels of nucleotide diversity are the result of co-infection with two closely related exogenous retroviruses. PMID:26493184
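Inferring genetic diversity from short reads can be sketched as a per-site calculation over the bases that cover each position. The Python example below is an illustration under stated assumptions, not the authors' estimator: it uses the standard probability that two distinct reads at a site carry different bases, averaged over sites with at least two reads, and it ignores sequencing error. The function names and toy pileup are hypothetical.

```python
from collections import Counter

def site_diversity(bases):
    """Probability that two distinct reads covering a site carry different bases."""
    n = len(bases)
    if n < 2:
        return None  # undefined with fewer than two reads
    counts = Counter(bases)
    same = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same / (n * (n - 1))

def nucleotide_diversity(pileup):
    """Average per-site diversity over all sites covered by at least two reads."""
    values = [d for d in (site_diversity(col) for col in pileup) if d is not None]
    return sum(values) / len(values) if values else 0.0

# Each string holds the bases observed at one site (assumed toy data)
pi = nucleotide_diversity(["AAAA", "AACC", "A"])
```

A value well above what is seen within a single exogenous infection is the kind of signal the abstract uses to argue for an endogenous origin, where reads derive from several diverged genomic copies.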
Demidov, German; Simakova, Tamara; Vnuchkova, Julia; Bragin, Anton
2016-10-22
Multiplex polymerase chain reaction (PCR) is a common enrichment technique for targeted massive parallel sequencing (MPS) protocols. MPS is widely used in biomedical research and clinical diagnostics as a fast and accurate tool for the detection of short genetic variations. However, identification of larger variations such as structural variants and copy number variations (CNVs) remains a challenge for targeted MPS. Some approaches and tools for structural variant detection have been proposed, but they have limitations and often require datasets of a certain type, size and expected number of amplicons affected by CNVs. In this paper, we describe a novel algorithm for high-resolution germinal CNV detection in PCR-enriched targeted sequencing data and present an accompanying tool. We have developed a machine learning algorithm for the detection of large duplications and deletions in targeted sequencing data generated with a PCR-based enrichment step. We have performed verification studies and established the algorithm's sensitivity and specificity. We have compared the developed tool with other available methods applicable to the described data and found that it performs better. We showed that our method has high specificity and sensitivity for high-resolution copy number detection in targeted sequencing data using a large cohort of samples.
Extent, Causes, and Consequences of Small RNA Expression Variation in Human Adipose Tissue
Knights, Andrew J.; Abreu-Goodger, Cei; van de Bunt, Martijn; Guerra-Assunção, José Afonso; Bartonicek, Nenad; van Dongen, Stijn; Mägi, Reedik; Nisbet, James; Barrett, Amy; Rantalainen, Mattias; Nica, Alexandra C.; Quail, Michael A.; Small, Kerrin S.; Glass, Daniel; Enright, Anton J.; Winn, John; Deloukas, Panos; Dermitzakis, Emmanouil T.; McCarthy, Mark I.; Spector, Timothy D.; Durbin, Richard; Lindgren, Cecilia M.
2012-01-01
Small RNAs are functional molecules that modulate mRNA transcripts and have been implicated in the aetiology of several common diseases. However, little is known about the extent of their variability within the human population. Here, we characterise the extent, causes, and effects of naturally occurring variation in expression and sequence of small RNAs from adipose tissue in relation to genotype, gene expression, and metabolic traits in the MuTHER reference cohort. We profiled the expression of 15 to 30 base pair RNA molecules in subcutaneous adipose tissue from 131 individuals using high-throughput sequencing, and quantified levels of 591 microRNAs and small nucleolar RNAs. We identified three genetic variants and three RNA editing events. Highly expressed small RNAs are more conserved within mammals than average, as are those with highly variable expression. We identified 14 genetic loci significantly associated with nearby small RNA expression levels, seven of which also regulate an mRNA transcript level in the same region. In addition, these loci are enriched for variants significant in genome-wide association studies for body mass index. Contrary to expectation, we found no evidence for negative correlation between expression level of a microRNA and its target mRNAs. Trunk fat mass, body mass index, and fasting insulin were associated with more than twenty small RNA expression levels each, while fasting glucose had no significant associations. This study highlights the similar genetic complexity and shared genetic control of small RNA and mRNA transcripts, and gives a quantitative picture of small RNA expression variation in the human population. PMID:22589741
Zhu, X Q; Chilton, N B; Gasser, R B
1998-05-01
This study evaluated the use of a commercially available DNA intercalating agent (Resolver Gold) in agarose gels for the direct detection of sequence variation in ribosomal DNA (rDNA). This agent binds preferentially to AT sequence motifs in DNA. Regions of nuclear rDNA, known to provide genetic markers for the identification of species of parasitic ascarid nematodes (order Ascaridida), were amplified by polymerase chain reaction (PCR) and subjected to electrophoresis in standard agarose gels versus gels supplemented with Resolver Gold. Individual taxa examined could not be distinguished reliably based on the size of their amplicons in standard agarose gels, whereas they could be readily delineated based on mobility using Resolver Gold-supplemented gels. The latter was achieved because of differences (approximately 0.1-8.2%) in the AT content of the fragments among different taxa, which were associated with significant interspecific differences (approximately 11-39%) in the rDNA sequences employed. There was a tendency for fragments with higher AT content to migrate slower in supplemented agarose gels compared with those of lower AT content. The results indicate the usefulness of this electrophoretic approach to rapidly screen for sequence variability within or among PCR-amplified rDNA fragments of similar sizes but differing AT contents. Although evaluated on rDNA of parasites, the approach has potential to be applied to a range of genes of different groups of infectious organisms.
Wang, Peikun; Lin, Lulu; Li, Haijuan; Yang, Yongli; Huang, Teng; Wei, Ping
2018-02-01
ALV-J has caused the most serious losses to the poultry industry in China. The gp85-coding sequence of ALV-J is known to be prone to mutation, but any association between the gp85 gene and breed of chicken remains unclear. A comprehensive and systematic study of the evolutionary process of ALV-J in China is needed. In this study, we compared and analyzed gp85 gene sequences from 198 ALV-J isolates, originating from China, USA, UK and France during 1989-2016. These were sorted into five clusters. Cluster 1, 2, 3, 4 and 5 included isolates from chicken types of different genetic backgrounds, e.g. white-feather broiler, Guangxi indigenous chicken breeds, Yellow chickens and layer chickens respectively. A correlation comparison of amino acid sequence similarities in the gp85 protein among the five clusters showed significant differences (P < 0.01) with the exception being when the third and fifth cluster were compared (P > 0.05). Results of entropy analysis of the gp85 sequences revealed that cluster 3 had the largest variation and cluster 1 had the least variation. The N-glycosylation sites in the majority of isolates numbered 14, 16, 17, 16 and 16, respectively, with regards to clusters 1-5. In addition, 5 isolates from cluster 3 had one more glycosylation site than the other isolates from cluster 3. Our study provides evidence that there were five extremely different ALV-J clusters during 1989-2016 and that the gp85 genes isolated from indigenous chicken breed isolates had the largest variation.
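The entropy analysis mentioned above is commonly done as a per-position Shannon entropy over an alignment. The following Python sketch assumes that approach; the abstract does not specify the exact method or software used, and the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def alignment_entropy(sequences):
    """Per-position entropy for a set of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    return [column_entropy(col) for col in zip(*sequences)]

# Toy aligned fragments (assumed data, not real gp85 sequences)
entropies = alignment_entropy(["ACD", "ACE", "AGF", "AGG"])
```

Fully conserved positions score 0 bits and the score rises as more residues share a position, so summing or averaging the values per cluster gives one way to rank clusters by variability.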
Bosch, Jason; Noubiap, Jean Jacques N; Dandara, Collet; Makubalo, Nomlindo; Wright, Galen; Entfellner, Jean-Baka Domelevo; Tiffin, Nicki; Wonkam, Ambroise
2014-11-01
Mutations in the GJB2 gene, encoding connexin 26, could account for 50% of congenital, nonsyndromic, recessive deafness cases in some Caucasian/Asian populations. There is a scarcity of published data in sub-Saharan Africans. We Sanger sequenced the coding region of the GJB2 gene in 205 Cameroonian and Xhosa South Africans with congenital, nonsyndromic deafness, and performed bioinformatic analysis of variations in the GJB2 gene, incorporating data from the 1000 Genomes Project. Amongst Cameroonian patients, 26.1% were familial. The majority of patients (70%) suffered from sensorineural hearing loss. Ten GJB2 genetic variants were detected by sequencing. A previously reported pathogenic mutation, g.3741_3743delTTC (p.F142del), and a putative pathogenic mutation, g.3816G>A (p.V167M), were identified in single heterozygous samples. Amongst the eight remaining variants, two novel variants, g.3318-41G>A and g.3332G>A, were reported. There were no statistically significant differences in allele frequencies between cases and controls. Principal Components Analyses differentiated between Africans, Asians, and Europeans, but only explained 40% of the variation. The present study is the first to compare African GJB2 sequences with the data from the 1000 Genomes Project and has revealed low variation between population groups. This finding supports the hypothesis that the prevalence of mutations in GJB2 in nonsyndromic deafness amongst European and Asian populations is due to founder effects arising after these individuals migrated out of Africa, and not to a putative "protective" variant in the genomic structure of GJB2 in Africans. Our results confirm that mutations in GJB2 are not associated with nonsyndromic deafness in Africans.
Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform
Van Nostrand, Joy D.; Ning, Daliang; Sun, Bo; Xue, Kai; Liu, Feifei; Deng, Ye; Liang, Yuting; Zhou, Jizhong
2017-01-01
Illumina’s MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1–3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined. 
These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility. PMID:28453559
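The two kinds of OTU overlap reported above can be illustrated with a short Python sketch. It is not the authors' pipeline: it assumes that the unweighted overlap is the share of OTUs found in both replicates out of all OTUs found in either, and that the abundance-weighted overlap is the share of sequences belonging to those shared OTUs. The function names, the per-replicate definition of a singleton, and the toy counts are all hypothetical.

```python
def otu_overlap(rep_a, rep_b):
    """Return (unweighted, weighted) OTU overlap in percent for two replicates.

    rep_a, rep_b: dict mapping OTU id -> sequence count (assumed input format).
    """
    a = {otu: n for otu, n in rep_a.items() if n > 0}
    b = {otu: n for otu, n in rep_b.items() if n > 0}
    union = a.keys() | b.keys()
    if not union:
        raise ValueError("both replicates are empty")
    shared = a.keys() & b.keys()
    # Unweighted: shared OTUs as a fraction of all OTUs seen in either replicate.
    unweighted = 100.0 * len(shared) / len(union)
    # Weighted: sequences in shared OTUs as a fraction of all sequences.
    total_seqs = sum(a.values()) + sum(b.values())
    shared_seqs = sum(a[otu] + b[otu] for otu in shared)
    weighted = 100.0 * shared_seqs / total_seqs
    return unweighted, weighted


def drop_singletons(rep):
    """Remove OTUs with a single sequence in this replicate (assumed definition)."""
    return {otu: n for otu, n in rep.items() if n > 1}


# Toy counts, invented for illustration only.
rep1 = {"OTU1": 500, "OTU2": 300, "OTU3": 1, "OTU4": 2}
rep2 = {"OTU1": 480, "OTU2": 310, "OTU5": 1}
print(otu_overlap(rep1, rep2))
print(otu_overlap(drop_singletons(rep1), drop_singletons(rep2)))
```

In the toy data the rare OTUs dominate the unweighted figure but contribute almost nothing to the weighted one, which is the same qualitative pattern the abstract describes.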
Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform.
Wen, Chongqing; Wu, Liyou; Qin, Yujia; Van Nostrand, Joy D; Ning, Daliang; Sun, Bo; Xue, Kai; Liu, Feifei; Deng, Ye; Liang, Yuting; Zhou, Jizhong
2017-01-01
Illumina's MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1-3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined. 
These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility.
Significance of Pharmacogenetics and Pharmacogenomics Research in Current Medical Practice.
Prakash, Swayam; Agrawal, Suraksha
2016-01-01
Human genome sequencing highlights the contribution of genetic variation to differential risk of human diseases, the presence of different phenotypes, and the response to pharmacological elements. This brings the field of personalized medicine to the forefront in the era of modern health care. Numerous recent approaches have shown how variation in the genome at the single-nucleotide level can be used in pharmacological research. The two broad aspects that deal with pharmacological research are pharmacogenetics and pharmacogenomics. This review covers how these variations have created the basis of pharmacogenetics and pharmacogenomics research and the important milestones accomplished in these two fields in different diseases. It further discusses at length their importance in disease diagnosis, drug response, and various treatment modalities on the basis of genetic determinants.
Copy number variation of individual cattle genomes using next-generation sequencing
USDA-ARS's Scientific Manuscript database
Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one ...
Copy number variation of individual cattle genomes using next-generation sequencing
USDA-ARS's Scientific Manuscript database
Copy Number Variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often difficult to track. Using a read depth approach based on next generation sequencing, we examined genome-wide copy number differences among five taurine (three Angu...
A high-resolution cattle CNV map by population-scale genome sequencing
USDA-ARS's Scientific Manuscript database
Copy Number Variations (CNVs) are common genomic structural variations that have been linked to human diseases and phenotypic traits. Prior studies in cattle have produced low-resolution CNV maps. We constructed a draft, high-resolution map of cattle CNVs based on whole genome sequencing data from 7...
Maize HapMap2 identifies extant variation from a genome in flux
USDA-ARS's Scientific Manuscript database
The maize genome is the largest, most diverse and complex plant genome sequenced to date. Using high-throughput sequencing to access genetic variation and a population genetics model to score the polymorphisms, we characterize and unite the diversity of the world’s key breeding germplasm, wild rela...
RSAT 2015: Regulatory Sequence Analysis Tools.
Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques
2015-07-01
RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Benz, Matthias R; Bongartz, Georg; Froehlich, Johannes M; Winkel, David; Boll, Daniel T; Heye, Tobias
2018-07-01
The aim was to investigate the variation of the arterial input function (AIF) within and between various DCE MRI sequences. A dynamic flow-phantom and steady signal reference were scanned on a 3T MRI using fast low angle shot (FLASH) 2d, FLASH3d (parallel imaging factor (P) = P0, P2, P4), volumetric interpolated breath-hold examination (VIBE) (P = P0, P3, P2 × 2, P2 × 3, P3 × 2), golden-angle radial sparse parallel imaging (GRASP), and time-resolved imaging with stochastic trajectories (TWIST). Signal over time curves were normalized and quantitatively analyzed by full width half maximum (FWHM) measurements to assess variation within and between sequences. The coefficient of variation (CV) for the steady signal reference ranged from 0.07-0.8%. The non-accelerated gradient echo FLASH2d, FLASH3d, and VIBE sequences showed low within sequence variation with 2.1%, 1.0%, and 1.6%. The maximum FWHM CV was 3.2% for parallel imaging acceleration (VIBE P2 × 3), 2.7% for GRASP and 9.1% for TWIST. The FWHM CV between sequences ranged from 8.5-14.4% for most non-accelerated/accelerated gradient echo sequences except 6.2% for FLASH3d P0 and 0.3% for FLASH3d P2; GRASP FWHM CV was 9.9% versus 28% for TWIST. MRI acceleration techniques vary in reproducibility and quantification of the AIF. Incomplete coverage of the k-space with TWIST as a representative of view-sharing techniques showed the highest variation within sequences and might be less suited for reproducible quantification of the AIF. Copyright © 2018 Elsevier B.V. All rights reserved.
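The two summary quantities used in the abstract above, the full width at half maximum (FWHM) of a signal-time curve and the coefficient of variation (CV) across repeated measurements, can be sketched in Python as follows. This is a minimal illustration, not the study's analysis code: it assumes a single-peaked curve, takes the curve minimum as the baseline, locates the half-maximum crossings by linear interpolation, and uses the sample standard deviation for the CV. All names and numbers are hypothetical.

```python
from statistics import mean, stdev


def fwhm(times, signal):
    """FWHM of a single-peaked curve, using linear interpolation at the crossings."""
    peak, base = max(signal), min(signal)
    half = base + (peak - base) / 2.0
    i_peak = signal.index(peak)
    left = right = None
    # Walk left from the peak to the first interval that crosses the half level.
    for i in range(i_peak, 0, -1):
        if signal[i - 1] < half <= signal[i]:
            frac = (half - signal[i - 1]) / (signal[i] - signal[i - 1])
            left = times[i - 1] + frac * (times[i] - times[i - 1])
            break
    # Walk right from the peak to the first interval that crosses the half level.
    for i in range(i_peak, len(signal) - 1):
        if signal[i] >= half > signal[i + 1]:
            frac = (signal[i] - half) / (signal[i] - signal[i + 1])
            right = times[i] + frac * (times[i + 1] - times[i])
            break
    if left is None or right is None:
        raise ValueError("curve does not cross the half maximum on both sides")
    return right - left


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


# Toy triangular bolus curve and invented repeat measurements.
width = fwhm([0, 1, 2, 3, 4], [0.0, 1.0, 2.0, 1.0, 0.0])
print(width, cv_percent([9.0, 10.0, 11.0]))
```

A within-sequence CV would be obtained by passing the FWHM values of repeated acquisitions of one sequence to `cv_percent`; a between-sequence CV would pool FWHM values from different sequences.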
EEG and ECG changes during simulator operation reflect mental workload and vigilance.
Dussault, Caroline; Jouanin, Jean-Claude; Philippe, Matthieu; Guezennec, Charles-Yannick
2005-04-01
Performing mission tasks in a simulator influences many neurophysiological measures. Quantitative assessments of electroencephalography (EEG) and electrocardiography (ECG) have made it possible to develop indicators of mental workload and to estimate relative physiological responses to cognitive requirements. To evaluate the effects of mental workload without actual physical risk, we studied the cortical and cardiovascular changes that occurred during simulated flight. There were 12 pilots (8 novices and 4 experts) who simulated a flight composed of 10 sequences that induced several different mental workload levels. EEG was recorded at 12 electrode sites during rest and flight sequences; ECG activity was also recorded. Subjective tests were used to evaluate anxiety and vigilance levels. Theta band activity was lower during the two simulated flight rest sequences than during visual and instrument flight sequences at central, parietal, and occipital sites (p < 0.05). On the other hand, rest sequences resulted in higher beta (at the C4 site; p < 0.05) and gamma (at the central, parietal, and occipital sites; p < 0.05) power than active segments. The mean heart rate (HR) was not significantly different during any simulated flight sequence, but HR was lower for expert subjects than for novices. The subjective tests revealed no significant anxiety and high values for vigilance levels before and during flight. The different flight sequences performed on the simulator resulted in electrophysiological changes that expressed variations in mental workload. These results corroborate those found during study of real flights, particularly during sequences requiring the heaviest mental workload.
Schoeman, Elizna M; Lopez, Genghis H; McGowan, Eunike C; Millard, Glenda M; O'Brien, Helen; Roulis, Eileen V; Liew, Yew-Wah; Martin, Jacqueline R; McGrath, Kelli A; Powley, Tanya; Flower, Robert L; Hyland, Catherine A
2017-04-01
Blood group single nucleotide polymorphism genotyping probes for a limited range of polymorphisms. This study investigated whether massively parallel sequencing (also known as next-generation sequencing), with a targeted exome strategy, provides an extended blood group genotype and the extent to which massively parallel sequencing correctly genotypes in homologous gene systems, such as RH and MNS. Donor samples (n = 28) that were extensively phenotyped and genotyped using single nucleotide polymorphism typing, were analyzed using the TruSight One Sequencing Panel and MiSeq platform. Genes for 28 protein-based blood group systems, GATA1, and KLF1 were analyzed. Copy number variation analysis was used to characterize complex structural variants in the GYPC and RH systems. The average sequencing depth per target region was 66.2 ± 39.8. Each sample harbored on average 43 ± 9 variants, of which 10 ± 3 were used for genotyping. For the 28 samples, massively parallel sequencing variant sequences correctly matched expected sequences based on single nucleotide polymorphism genotyping data. Copy number variation analysis defined the Rh C/c alleles and complex RHD hybrids. Hybrid RHD*D-CE-D variants were correctly identified, but copy number variation analysis did not confidently distinguish between D and CE exon deletion versus rearrangement. The targeted exome sequencing strategy employed extended the range of blood group genotypes detected compared with single nucleotide polymorphism typing. This single-test format included detection of complex MNS hybrid cases and, with copy number variation analysis, defined RH hybrid genes along with the RHCE*C allele hitherto difficult to resolve by variant detection. The approach is economical compared with whole-genome sequencing and is suitable for a red blood cell reference laboratory setting. © 2017 AABB.
Castro-Prieto, Aines; Wachter, Bettina; Melzheimer, Joerg; Thalwitzer, Susanne; Hofer, Heribert; Sommer, Simone
2012-01-01
Background Genes under selection provide ecologically important information useful for conservation issues. Major histocompatibility complex (MHC) class I and II genes are essential for the immune defence against pathogens from intracellular (e.g. viruses) and extracellular (e.g. helminths) origins, respectively. Serosurvey studies in Namibian cheetahs (Acinonyx jubatus) revealed higher exposure to viral pathogens in individuals from north-central than east-central regions. Here we examined whether the observed differences in exposure to viruses influence the patterns of genetic variation and differentiation at MHC loci in 88 free-ranging Namibian cheetahs. Methodology/Principal Findings Genetic variation at MHC I and II loci was assessed through single-stranded conformation polymorphism (SSCP) analysis and sequencing. While the overall allelic diversity did not differ, we observed a high genetic differentiation at MHC class I loci between cheetahs from north-central and east-central Namibia. No such differentiation in MHC class II and neutral markers was found. Conclusions/Significance Our results suggest that MHC class I variation mirrors the variation in selection pressure imposed by viruses in free-ranging cheetahs across Namibian farmland. This is of high significance for future management and conservation programs of this species. PMID:23145096
Buhler, Stéphane; Sanchez-Mazas, Alicia
2011-01-01
Molecular differences between HLA alleles vary up to 57 nucleotides within the peptide binding coding region of human Major Histocompatibility Complex (MHC) genes, but it is still unclear whether this variation results from a stochastic process or from selective constraints related to functional differences among HLA molecules. Although HLA alleles are generally treated as equidistant molecular units in population genetic studies, DNA sequence diversity among populations is also crucial to interpret the observed HLA polymorphism. In this study, we used a large dataset of 2,062 DNA sequences defined for the different HLA alleles to analyze nucleotide diversity of seven HLA genes in 23,500 individuals of about 200 populations spread worldwide. We first analyzed the HLA molecular structure and diversity of these populations in relation to geographic variation and we further investigated possible departures from selective neutrality through Tajima's tests and mismatch distributions. All results were compared to those obtained by classical approaches applied to HLA allele frequencies. Our study shows that the global patterns of HLA nucleotide diversity among populations are significantly correlated to geography, although in some specific cases the molecular information reveals unexpected genetic relationships. At all loci except HLA-DPB1, populations have accumulated a high proportion of very divergent alleles, suggesting an advantage of heterozygotes expressing molecularly distant HLA molecules (asymmetric overdominant selection model). However, both different intensities of selection and unequal levels of gene conversion may explain the heterogeneous mismatch distributions observed among the loci. Also, distinctive patterns of sequence divergence observed at the HLA-DPB1 locus suggest current neutrality but old selective pressures on this gene. 
We conclude that HLA DNA sequences advantageously complement HLA allele frequencies as a source of data used to explore the genetic history of human populations, and that their analysis allows a more thorough investigation of human MHC molecular evolution. PMID:21408106
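Tajima's test, used in the abstract above to probe departures from selective neutrality, compares the mean number of pairwise differences with the diversity expected from the number of segregating sites. The Python sketch below implements the standard Tajima's D formula for a small ungapped alignment. It is an illustration under stated assumptions, not the authors' workflow: the function name is hypothetical, gaps and ambiguous bases are not handled, and the four toy sequences are invented.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences (no gap handling)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance term is zero below that)")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    # S: number of alignment columns with more than one distinct base.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of differences over all sequence pairs.
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)
    ) / n_pairs

    # Constants from Tajima's variance derivation.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy alignment in which every variant is a singleton, so D comes out negative.
print(tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"]))
```

A negative value indicates an excess of rare variants relative to neutral expectation, while a positive value indicates an excess of intermediate-frequency, divergent alleles, the pattern the abstract associates with balancing selection.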
Oliveros, R; Cutillas, C; De Rojas, M; Arias, P
2000-12-01
Adult worms of Trichuris ovis and T. globulosa were collected from Ovis aries (sheep) and Capra hircus (goats). T. suis was isolated from Sus scrofa domestica (swine) and T. leporis was isolated from Lepus europaeus (rabbits) in Spain. Genomic DNA was isolated and a ribosomal internal transcribed spacer (ITS2) was amplified and sequenced using polymerase-chain-reaction (PCR) techniques. The ITS2 of T. ovis and T. globulosa was 407 nucleotides in length and had a GC content of about 62%. Furthermore, the ITS2 of T. suis and T. leporis was 534 and 418 nucleotides in length and had a GC content of about 64.8% and 62.4%, respectively. There was evidence of slight variation in the sequence within individuals of all species analyzed, indicating intraindividual variation in the sequence of different copies of the ribosomal DNA. Furthermore, low-level intraspecific variation was detected. Sequence analyses of ITS2 products of T. ovis and T. globulosa demonstrated no sequence difference between them. Nevertheless, differences were detected between the ITS2 sequences of T. suis, T. leporis, and T. ovis, indicating that Trichuris species can reliably be differentiated by their ITS2 sequences and PCR-linked restriction-fragment-length polymorphism (RFLP).
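The length and GC content figures quoted for the ITS2 sequences above are simple to reproduce for any sequence. The following Python sketch shows one way to do it; the function name is hypothetical, the example string is invented rather than taken from the study, and characters other than A, C, G and T are ignored by assumption.

```python
def gc_content(seq):
    """GC content in percent, counting only unambiguous A/C/G/T bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Invented fragment, for illustration only.
fragment = "GGCCAT"
print(len(fragment), gc_content(fragment))
```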
Bhatia, Shipra; Gordon, Christopher T.; Foster, Robert G.; Melin, Lucie; Abadie, Véronique; Baujat, Geneviève; Vazquez, Marie-Paule; Amiel, Jeanne; Lyonnet, Stanislas; van Heyningen, Veronica; Kleinjan, Dirk A.
2015-01-01
Disruption of gene regulation by sequence variation in non-coding regions of the genome is now recognised as a significant cause of human disease and disease susceptibility. Sequence variants in cis-regulatory elements (CREs), the primary determinants of spatio-temporal gene regulation, can alter transcription factor binding sites. While technological advances have led to easy identification of disease-associated CRE variants, robust methods for discerning functional CRE variants from background variation are lacking. Here we describe an efficient dual-colour reporter transgenesis approach in zebrafish, simultaneously allowing detailed in vivo comparison of spatio-temporal differences in regulatory activity between putative CRE variants and assessment of altered transcription factor binding potential of the variant. We validate the method on known disease-associated elements regulating SHH, PAX6 and IRF6 and subsequently characterise novel, ultra-long-range SOX9 enhancers implicated in the craniofacial abnormality Pierre Robin Sequence. The method provides a highly cost-effective, fast and robust approach for simultaneously unravelling in a single assay whether, where and when in embryonic development a disease-associated CRE-variant is affecting its regulatory function. PMID:26030420
Espin‐Garcia, Osvaldo; Craiu, Radu V.
2017-01-01
We evaluate two‐phase designs to follow‐up findings from genome‐wide association studies (GWAS) when the cost of regional sequencing in the entire cohort is prohibitive. We develop novel expectation‐maximization‐based inference under a semiparametric maximum likelihood formulation tailored for post‐GWAS inference. A GWAS‐SNP (where SNP is single nucleotide polymorphism) serves as a surrogate covariate in inferring association between a sequence variant and a normally distributed quantitative trait (QT). We assess test validity and quantify efficiency and power of joint QT‐SNP‐dependent sampling and analysis under alternative sample allocations by simulations. Joint allocation balanced on SNP genotype and extreme‐QT strata yields significant power improvements compared to marginal QT‐ or SNP‐based allocations. We illustrate the proposed method and evaluate the sensitivity of sample allocation to sampling variation using data from a sequencing study of systolic blood pressure. PMID:29239496
Dingley, Stephen D.; Polyak, Erzsebet; Ostrovsky, Julian; Srinivasan, Satish; Lee, Icksoo; Rosenfeld, Amy B.; Tsukikawa, Mai; Xiao, Rui; Selak, Mary A.; Coon, Joshua J.; Hebert, Alexander S.; Grimsrud, Paul A.; Kwon, Young Joon; Pagliarini, David J.; Gai, Xiaowu; Schurr, Theodore G.; Hüttemann, Maik; Nakamaru-Ogiso, Eiko; Falk, Marni J.
2014-01-01
Mitochondrial DNA (mtDNA) sequence variation can influence the penetrance of complex diseases and climatic adaptation. While studies in geographically defined human populations suggest that mtDNA mutations become fixed when they have conferred metabolic capabilities optimally suited for a specific environment, it has been challenging to definitively assign adaptive functions to specific mtDNA sequence variants in mammals. We investigated whether mtDNA genome variation functionally influences Caenorhabditis elegans wild isolates of distinct mtDNA lineages and geographic origins. We found that, relative to N2 (England) wild-type nematodes, CB4856 wild isolates from a warmer native climate (Hawaii) had a unique p.A12S amino acid substitution in the mtDNA-encoded COX1 core catalytic subunit of mitochondrial complex IV (CIV). Relative to N2, CB4856 worms grown at 20 °C had significantly increased CIV enzyme activity, mitochondrial matrix oxidant burden, and sensitivity to oxidative stress but had significantly reduced lifespan and mitochondrial membrane potential. Interestingly, mitochondrial membrane potential was significantly increased in CB4856 grown at its native temperature of 25 °C. A transmitochondrial cybrid worm strain, chpIR (M, CB4856 > N2), was bred as homoplasmic for the CB4856 mtDNA genome in the N2 nuclear background. The cybrid strain also displayed significantly increased CIV activity, demonstrating that this difference results from the mtDNA-encoded p.A12S variant. However, chpIR (M, CB4856 > N2) worms had significantly reduced median and maximal lifespan relative to CB4856, which may relate to their nuclear–mtDNA genome mismatch. Overall, these data suggest that C. elegans wild isolates of varying geographic origins may adapt to environmental challenges through mtDNA variation to modulate critical aspects of mitochondrial energy metabolism. PMID:24534730
A global reference for human genetic variation
2016-01-01
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. PMID:26432245
High levels of variation in Salix lignocellulose genes revealed using poplar genomic resources
2013-01-01
Background Little is known about the levels of variation in lignin or other wood related genes in Salix, a genus that is being increasingly used for biomass and biofuel production. The lignin biosynthesis pathway is well characterized in a number of species, including the model tree Populus. We aimed to transfer the genomic resources already available in Populus to its sister genus Salix to assess levels of variation within genes involved in wood formation. Results Amplification trials for 27 gene regions were undertaken in 40 Salix taxa. Twelve of these regions were sequenced. Alignment searches of the resulting sequences against reference databases, combined with phylogenetic analyses, showed the close similarity of these Salix sequences to Populus, confirming homology of the primer regions and indicating a high level of conservation within the wood formation genes. However, all sequences were found to vary considerably among Salix species, mainly as SNPs with a smaller number of insertions-deletions. Between 25 and 176 SNPs per kbp per gene region (in predicted exons) were discovered within Salix. Conclusions The variation found is sizeable but not unexpected as it is based on interspecific and not intraspecific comparison; it is comparable to interspecific variation in Populus. The characterisation of genetic variation is a key process in pre-breeding and for the conservation and exploitation of genetic resources in Salix. This study characterises the variation in several lignocellulose gene markers for such purposes. PMID:23924375
Castro-Prieto, Aines; Wachter, Bettina; Melzheimer, Joerg; Thalwitzer, Susanne; Sommer, Simone
2011-01-01
The genes of the major histocompatibility complex (MHC) are a key component of the mammalian immune system and have become important molecular markers for fitness-related genetic variation in wildlife populations. Currently, no information about the MHC sequence variation and constitution in African leopards exists. In this study, we isolated and characterized genetic variation at the adaptively most important region of MHC class I and MHC class II-DRB genes in 25 free-ranging African leopards from Namibia and investigated the mechanisms that generate and maintain MHC polymorphism in the species. Using single-stranded conformation polymorphism analysis and direct sequencing, we detected 6 MHC class I and 6 MHC class II-DRB sequences, which likely correspond to at least 3 MHC class I and 3 MHC class II-DRB loci. Amino acid sequence variation in both MHC classes was higher or similar in comparison to other reported felids. We found signatures of positive selection shaping the diversity of MHC class I and MHC class II-DRB loci during the evolutionary history of the species. A comparison of MHC class I and MHC class II-DRB sequences of the leopard to those of other felids revealed a trans-species mode of evolution. In addition, the evolutionary relationships of MHC class II-DRB sequences between African and Asian leopard subspecies are discussed.
MACHADO, HEATHER E.; BERGLAND, ALAN O.; O’BRIEN, KATHERINE R.; BEHRMAN, EMILY L.; SCHMIDT, PAUL S.; PETROV, DMITRI A.
2016-01-01
Examples of clinal variation in phenotypes and genotypes across latitudinal transects have served as important models for understanding how spatially varying selection and demographic forces shape variation within species. Here, we examine the selective and demographic contributions to latitudinal variation through the largest comparative genomic study to date of Drosophila simulans and Drosophila melanogaster, with genomic sequence data from 382 individual fruit flies, collected across a spatial transect of 19 degrees latitude and at multiple time points over 2 years. Consistent with phenotypic studies, we find less clinal variation in D. simulans than D. melanogaster, particularly for the autosomes. Moreover, we find that clinally varying loci in D. simulans are less stable over multiple years than comparable clines in D. melanogaster. D. simulans shows a significantly weaker pattern of isolation by distance than D. melanogaster and we find evidence for a stronger contribution of migration to D. simulans population genetic structure. While population bottlenecks and migration can plausibly explain the differences in stability of clinal variation between the two species, we also observe a significant enrichment of shared clinal genes, suggesting that the selective forces associated with climate are acting on the same genes and phenotypes in D. simulans and D. melanogaster. PMID:26523848
ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis
2011-01-01
Background Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor binding and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. Results Both technologies produce highly reproducible profiles within each platform. ChIP-seq generally produces profiles with a better signal-to-noise ratio and allows detection of more peaks and narrower peaks. The sets of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from differences in both experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high quality input DNA data for normalization in ChIP-seq analysis. Conclusions Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis. PMID:21356108
Fullerton, Stephanie M; Clark, Andrew G; Weiss, Kenneth M; Taylor, Scott L; Stengård, Jari H; Salomaa, Veikko; Boerwinkle, Eric; Nickerson, Deborah A
2002-07-01
A 3.3-kb region, encompassing the APOA2 gene and 2 kb of 5' and 3' flanking DNA, was re-sequenced in a "core" sample of 24 individuals, sampled without regard to health, from each of three populations: African-Americans from Jackson (Miss., USA), Europeans from North Karelia (Finland), and non-Hispanic European-Americans from Rochester (Minn., USA). Fifteen variable sites were identified (14 SNPs and one multi-allelic microsatellite, all silent), and these sites segregated as 18 sequence haplotypes (or nine, if SNPs only are considered). The haplotype distribution in the core African-American sample was unusual, with a deficit of particular haplotypes compared with those found in the other two samples, and a significantly (P<0.05) low level of nucleotide diversity relative to patterns of polymorphism and divergence at other human loci. Six of the 14 SNPs, whose variation captured the haplotype structure of the core data, were then genotyped by oligonucleotide ligation assay in an additional 2183 individuals from the same three populations (n=843, n=452, and n=888, respectively). All six sites varied in each of the larger "epidemiological" samples, and together, they defined 19 SNP haplotypes, seven with relative frequencies greater than 1% in the total sample; all of these common haplotypes had been identified earlier in the core re-sequencing survey. Here also, the African-American sample showed significantly lower SNP heterozygosity and haplotype diversity than the other two samples. The deficit of polymorphism is consistent with a population-specific non-neutral increase in the relative frequency of several haplotypes in Jackson.
van der Walt, Elizna M; Smuts, Izelle; Taylor, Robert W; Elson, Joanna L; Turnbull, Douglass M; Louw, Roan; van der Westhuizen, Francois H
2012-06-01
Mitochondrial disease can be attributed to both mitochondrial and nuclear gene mutations. It has a heterogeneous clinical and biochemical profile, which is compounded by the diversity of the genetic background. Disease-based epidemiological information has expanded significantly in recent decades, but little is known that clarifies the aetiology in African patients. The aim of this study was to investigate mitochondrial DNA variation and pathogenic mutations in the muscle of diagnosed paediatric patients from South Africa. A cohort of 71 South African paediatric patients was included and a high-throughput nucleotide sequencing approach was used to sequence full-length muscle mtDNA. The average coverage of the mtDNA genome was 81±26 per position. After assigning haplogroups, it was determined that although the nature of non-haplogroup-defining variants was similar in African and non-African haplogroup patients, the number of substitutions was significantly higher in African patients. We describe previously reported disease-associated and novel variants in this cohort. We observed a general lack of commonly reported syndrome-associated mutations, which supports clinical observations and confirms general observations in African patients when using single mutation screening strategies based on (predominantly non-African) mtDNA disease-based information. It is finally concluded that this first extensive report on muscle mtDNA sequences in African paediatric patients highlights the need for a full-length mtDNA sequencing strategy, which applies to all populations where specific mutations are not present. This, in addition to nuclear DNA gene mutation and pathogenicity evaluations, will be required to better unravel the aetiology of these disorders in African patients.
Differences in a ribosomal DNA sequence of Strongylus species allows identification of single eggs.
Campbell, A J; Gasser, R B; Chilton, N B
1995-03-01
In the current study, molecular techniques were evaluated for the species identification of individual strongyle eggs. Adult worms of Strongylus edentatus, S. equinus and S. vulgaris were collected at necropsy from horses from Australia and the U.S.A. Genomic DNA was isolated and a ribosomal transcribed spacer (ITS-2) amplified and sequenced using polymerase chain reaction (PCR) techniques. The length of the ITS-2 sequence of S. edentatus, S. equinus and S. vulgaris ranged between 217 and 235 nucleotides. Extensive sequence analysis demonstrated a low degree (0-0.9%) of intraspecific variation in the ITS-2 for the Strongylus species examined, whereas the levels of interspecific differences (13-29%) were significantly greater. Interspecific differences in the ITS-2 sequences allowed unequivocal species identification of single worms and eggs using PCR-linked restriction fragment length polymorphism. These results demonstrate the potential of the ribosomal spacers as genetic markers for species identification of single strongyle eggs from horse faeces.
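The intraspecific (0-0.9%) and interspecific (13-29%) figures above are percentage differences between aligned sequences. A minimal sketch of such a calculation follows; it assumes two pre-aligned sequences of equal length, skips any column containing a gap, and uses a hypothetical function name. The abstract does not state how gaps were treated, so that choice is an assumption.

```python
def percent_difference(seq_a, seq_b, gap="-"):
    """Percentage of differing positions between two pre-aligned sequences.

    Assumption: columns where either sequence has a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differences / compared


if __name__ == "__main__":
    # Toy 10-nt alignment with one substitution (illustrative data only).
    print(percent_difference("ACGTACGTAC", "ACGTACGTAA"))
```

Applied to every pair of sequences within and between species, a function like this would give the two ranges that the abstract contrasts.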
Onofri, Chiara; de Meijer, Etienne P M; Mandolino, Giuseppe
2015-08-01
Sequence variants of THCA- and CBDA-synthases were isolated from different Cannabis sativa L. strains expressing various wild-type and mutant chemical phenotypes (chemotypes). Expressed and complete sequences were obtained from mature inflorescences. Each strain was shown to have a different specificity and/or ability to convert the precursor CBGA into CBDA and/or THCA type products. The comparison of the expressed sequences led to the identification of different mutations, all of them due to SNPs. These SNPs were found to relate to the cannabinoid composition of the inflorescence at maturity and are therefore proposed to have a functional significance. The amount of variation was found to be higher within the CBDAS sequence family than in the THCAS family, suggesting a more recent evolution of THCA-forming enzymes from the CBDAS group. We therefore consider CBDAS as the ancestral type of these synthases. Copyright © 2015 Elsevier Ltd. All rights reserved.
PyEvolve: a toolkit for statistical modelling of molecular evolution.
Butterfield, Andrew; Vedagiri, Vivek; Lang, Edward; Lawrence, Cath; Wakefield, Matthew J; Isaev, Alexander; Huttley, Gavin A
2004-01-05
Examining the distribution of variation has proven an extremely profitable technique in the effort to identify sequences of biological significance. Most approaches in the field, however, evaluate only the conserved portions of sequences - ignoring the biological significance of sequence differences. A suite of sophisticated likelihood based statistical models from the field of molecular evolution provides the basis for extracting the information from the full distribution of sequence variation. The number of different problems to which phylogeny-based maximum likelihood calculations can be applied is extensive. Available software packages that can perform likelihood calculations suffer from a lack of flexibility and scalability, or employ error-prone approaches to model parameterisation. Here we describe the implementation of PyEvolve, a toolkit for the application of existing, and development of new, statistical methods for molecular evolution. We present the object architecture and design schema of PyEvolve, which includes an adaptable multi-level parallelisation schema. The approach for defining new methods is illustrated by implementing a novel dinucleotide model of substitution that includes a parameter for mutation of methylated CpG's, which required 8 lines of standard Python code to define. Benchmarking was performed using either a dinucleotide or codon substitution model applied to an alignment of BRCA1 sequences from 20 mammals, or a 10 species subset. Up to five-fold parallel performance gains over serial were recorded. Compared to leading alternative software, PyEvolve exhibited significantly better real world performance for parameter rich models with a large data set, reducing the time required for optimisation from approximately 10 days to approximately 6 hours. PyEvolve provides flexible functionality that can be used either for statistical modelling of molecular evolution, or the development of new methods in the field. 
The toolkit can be used interactively or by writing and executing scripts. The toolkit uses efficient processes for specifying the parameterisation of statistical models, and implements numerous optimisations that make highly parameter rich likelihood functions solvable within hours on multi-cpu hardware. PyEvolve can be readily adapted in response to changing computational demands and hardware configurations to maximise performance. PyEvolve is released under the GPL and can be downloaded from http://cbis.anu.edu.au/software.
The evolution of transcriptional regulation in eukaryotes
NASA Technical Reports Server (NTRS)
Wray, Gregory A.; Hahn, Matthew W.; Abouheif, Ehab; Balhoff, James P.; Pizer, Margaret; Rockman, Matthew V.; Romano, Laura A.
2003-01-01
Gene expression is central to the genotype-phenotype relationship in all organisms, and it is an important component of the genetic basis for evolutionary change in diverse aspects of phenotype. However, the evolution of transcriptional regulation remains understudied and poorly understood. Here we review the evolutionary dynamics of promoter, or cis-regulatory, sequences and the evolutionary mechanisms that shape them. Existing evidence indicates that populations harbor extensive genetic variation in promoter sequences, that a substantial fraction of this variation has consequences for both biochemical and organismal phenotype, and that some of this functional variation is sorted by selection. As with protein-coding sequences, rates and patterns of promoter sequence evolution differ considerably among loci and among clades for reasons that are not well understood. Studying the evolution of transcriptional regulation poses empirical and conceptual challenges beyond those typically encountered in analyses of coding sequence evolution: promoter organization is much less regular than that of coding sequences, and sequences required for the transcription of each locus reside at multiple other loci in the genome. Because of the strong context-dependence of transcriptional regulation, sequence inspection alone provides limited information about promoter function. Understanding the functional consequences of sequence differences among promoters generally requires biochemical and in vivo functional assays. Despite these challenges, important insights have already been gained into the evolution of transcriptional regulation, and the pace of discovery is accelerating.
Mutations in the C-terminus of CDKL5: proceed with caution
Diebold, Bertrand; Delépine, Chloé; Gataullina, Svetlana; Delahaye, Andrée; Nectoux, Juliette; Bienvenu, Thierry
2014-01-01
Mutations in the cyclin-dependent kinase-like 5 (CDKL5) gene have been described in girls with Rett-like features and early-onset epileptic encephalopathy including infantile spasms. Milder phenotypes have been associated with sequence variations in the 3′-end of the CDKL5 gene. Identification of novel CDKL5 transcripts coding isoforms characterized by an altered C-terminal region strongly questions the eventual pathogenicity of sequence variations located in the 3′-end of the gene. We investigated a group of 30 female patients with a clinically heterogeneous phenotype ranging from nonspecific intellectual disability to a severe neonatal encephalopathy and identified two heterozygous CDKL5 missense mutations, the previously reported p.Val999Met and the novel mutation p.Pro944Thr. However, these mutations have also been detected in their healthy father. Considering our results and all data from the literature, we suggest that genetic variations beyond the codon 938 in human CDKL5115 protein may have minor or no significance. It is probable that screening of exons 19–21 of the CDKL5 gene is not useful in practical molecular diagnosis of atypical Rett syndrome. PMID:23756444
Genetic variation among wild lake trout populations: the 'wanted' and the 'unwanted'
Burnham-Curtis, Mary K.; Kallemeyn, Larry W.; Bronte, Charles R.; Greswell, Robert E.; Dwyer, Pat; Hamre, R.H.
1997-01-01
In this study we examine genetic variation within and among self-sustaining lake trout populations from the Great Lakes basin, the Rainy Lake basin, and Yellowstone Lake. We used RFLP analysis and direct sequencing to examine DNA sequence variation among several mitochondrial and nuclear genes, including highly conserved loci (e.g. cytochrome b, nuclear exon regions) and highly variable loci (e.g. mitochondrial d-loop and nuclear intron regions). Native Lake Superior lake trout populations show high levels of genetic diversity, while populations from the Rainy Lake basin show little or none. The lake trout population sampled from Yellowstone Lake shows moderate genetic diversity, possibly representative of a relatively large source population closely related to lake trout from Lewis Lake, Wyoming. There has been significant social and management controversy involving these lake trout populations, particularly those that are located in National Parks. In the Great Lakes and Rainy Lake basins, the controversy involves the degree to which hatchery supplementation can contribute to or negatively impact self-sustaining populations which are highly desired by recreational and commercial fisheries. In Yellowstone Lake, the lake trout are viewed as an undesirable intruder that may interfere with resident populations of highly prized native cutthroat trout.
Building blocks of a fish head: Developmental and variational modularity in a complex system.
Lehoux, Caroline; Cloutier, Richard
2015-11-01
Evolution of the vertebrate skull is developmentally constrained by the interactions among its anatomical systems, such as the dermatocranium and the sensory system. The interaction between the dermal bones and lateral line canals has been debated for decades but their morphological integration has never been tested. An ontogenetic series of 97 juvenile and adult Amia calva (Actinopterygii) was used to describe the patterning and modularity of sensory lateral line canals and their integration with supporting cranial bones. Developmental modules were tested for the otic canal and supratemporal commissure by computing correlations in the branching sequence of groups of pores. Landmarks were digitized on 25 specimens to test a priori hypotheses of variational and developmental modularity at the level of canals and dermal bones. Branching sequence suggests a specific patterning supported by significant positive correlations in the sequence of appearance of branches between bilateral sides. Differences in patterning between the otic canal and the supratemporal commissure and tests of modularity with geometric morphometrics suggest that both canals form distinct modules. The integration between bones and canals was insufficient to detect a module. However, both components were not independent. Groups of pores tended to disappear without affecting other groups of pores suggesting that they are quasi-independent units acting as modules. This study provides evidence of a hierarchical organization for the modular sensory system that could explain variation of pattern of canals among species and their association with dermal bones. © 2015 Wiley Periodicals, Inc.
Representations of Pitch and Timbre Variation in Human Auditory Cortex
2017-01-01
Pitch and timbre are two primary dimensions of auditory perception, but how they are represented in the human brain remains a matter of contention. Some animal studies of auditory cortical processing have suggested modular processing, with different brain regions preferentially coding for pitch or timbre, whereas other studies have suggested a distributed code for different attributes across the same population of neurons. This study tested whether variations in pitch and timbre elicit activity in distinct regions of the human temporal lobes. Listeners were presented with sequences of sounds that varied in either fundamental frequency (eliciting changes in pitch) or spectral centroid (eliciting changes in brightness, an important attribute of timbre), with the degree of pitch or timbre variation in each sequence parametrically manipulated. The BOLD responses from auditory cortex increased with increasing sequence variance along each perceptual dimension. The spatial extent, region, and laterality of the cortical regions most responsive to variations in pitch or timbre at the univariate level of analysis were largely overlapping. However, patterns of activation in response to pitch or timbre variations were discriminable in most subjects at an individual level using multivoxel pattern analysis, suggesting a distributed coding of the two dimensions bilaterally in human auditory cortex. SIGNIFICANCE STATEMENT Pitch and timbre are two crucial aspects of auditory perception. Pitch governs our perception of musical melodies and harmonies, and conveys both prosodic and (in tone languages) lexical information in speech. Brightness—an aspect of timbre or sound quality—allows us to distinguish different musical instruments and speech sounds. Frequency-mapping studies have revealed tonotopic organization in primary auditory cortex, but the use of pure tones or noise bands has precluded the possibility of dissociating pitch from brightness. 
Our results suggest a distributed code, with no clear anatomical distinctions between auditory cortical regions responsive to changes in either pitch or timbre, but also reveal a population code that can differentiate between changes in either dimension within the same cortical regions. PMID:28025255
Timofeeva, Maria N.; Kinnersley, Ben; Farrington, Susan M.; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J.; Harris, Sarah E.; Northwood, Emma L.; Barrett, Jennifer H.; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J.; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D. Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G.; Houlston, Richard S.
2015-01-01
Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10^−7), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts; rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10^−7); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10^−7 and OR = 1.09, P = 7.4 × 10^−8); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10^−9), all reaching exome-wide significance levels. Gene based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10^−6). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10^−4) and DNA mismatch repair genes (P = 6.1 × 10^−4) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and the PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438
Parallel gene analysis with allele-specific padlock probes and tag microarrays
Banér, Johan; Isaksson, Anders; Waldenström, Erik; Jarvius, Jonas; Landegren, Ulf; Nilsson, Mats
2003-01-01
Parallel, highly specific analysis methods are required to take advantage of the extensive information about DNA sequence variation and of expressed sequences. We present a scalable laboratory technique suitable to analyze numerous target sequences in multiplexed assays. Sets of padlock probes were applied to analyze single nucleotide variation directly in total genomic DNA or cDNA for parallel genotyping or gene expression analysis. All reacted probes were then co-amplified and identified by hybridization to a standard tag oligonucleotide array. The technique was illustrated by analyzing normal and pathogenic variation within the Wilson disease-related ATP7B gene, both at the level of DNA and RNA, using allele-specific padlock probes. PMID:12930977
2014-01-01
Background Neisseria meningitidis expresses type four pili (Tfp) which are important for colonisation and virulence. Tfp have been considered as one of the most variable structures on the bacterial surface due to high frequency gene conversion, resulting in amino acid sequence variation of the major pilin subunit (PilE). Meningococci express either a class I or a class II pilE gene and recent work has indicated that class II pilins do not undergo antigenic variation, as class II pilE genes encode conserved pilin subunits. The purpose of this work was to use whole genome sequences to further investigate the frequency and variability of the class II pilE genes in meningococcal isolate collections. Results We analysed over 600 publicly available whole genome sequences of N. meningitidis isolates to determine the sequence and genomic organization of pilE. We confirmed that meningococcal strains belonging to a limited number of clonal complexes (ccs, namely cc1, cc5, cc8, cc11 and cc174) harbour a class II pilE gene which is conserved in terms of sequence and chromosomal context. We also identified pilS cassettes in all isolates with class II pilE; however, our analysis indicates that these do not serve as donor sequences for pilE/pilS recombination. Furthermore, our work reveals that the class II pilE locus lacks the DNA sequence motifs that enable (G4) or enhance (Sma/Cla repeat) pilin antigenic variation. Finally, through analysis of pilin genes in commensal Neisseria species we found that meningococcal class II pilE genes are closely related to pilE from Neisseria lactamica and Neisseria polysaccharea, suggesting horizontal transfer among these species. Conclusions Class II pilins can be defined by their amino acid sequence and genomic context and are present in meningococcal isolates which have persisted and spread globally. 
The absence of G4 and Sma/Cla sequences adjacent to the class II pilE genes is consistent with the lack of pilin subunit variation in these isolates, although horizontal transfer may generate class II pilin diversity. This study supports the suggestion that high frequency antigenic variation of pilin is not universal in pathogenic Neisseria. PMID:24690385
Somatic Genetic Variation in Solid Pseudopapillary Tumor of the Pancreas by Whole Exome Sequencing
Guo, Meng; Luo, Guopei; Jin, Kaizhou; Long, Jiang; Cheng, He; Lu, Yu; Wang, Zhengshi; Yang, Chao; Xu, Jin; Ni, Quanxing; Yu, Xianjun; Liu, Chen
2017-01-01
Solid pseudopapillary tumor of the pancreas (SPT) is a rare pancreatic disease with a unique clinical manifestation. Although CTNNB1 gene mutations have been universally reported, the genetic variation profiles of SPT remain largely unidentified. We conducted whole exome sequencing in nine SPT patients to probe the SPT-specific insertions and deletions (indels) and single nucleotide polymorphisms (SNPs). In total, 54 SNPs and 41 indels were identified as prominent variations through parallel exome sequencing. CTNNB1 mutations were present in all patients studied (100%), and a higher SNP count was detected particularly in patients with older age, larger tumors, and metastatic disease. By aggregating the 95 detected variation events and examining the interconnections among the genes carrying variations, we identified CTNNB1 as the core of the network, which might act together with other events such as variations of USP9X, EP400, HTT, MED12, and PKD1 to regulate tumorigenesis. Pathway analysis showed that the events involved in other cancers had the potential to influence the progression of the SNP count. Our study provides insight into the variation of the gene-encoding region underlying the tumorigenesis of solid pseudopapillary neoplasms. The detection of these variations might partly reflect the potential molecular mechanism. PMID:28054945
Carvalho, Alexandra T P; Gouveia, Leonor; Kanna, Charan Raju; Wärmländer, Sebastian K T S; Platts, Jamie A; Kamerlin, Shina Caroline Lynn
2014-01-01
We report a series of molecular dynamics (MD) simulations of up to a microsecond combined simulation time designed to probe epigenetically modified DNA sequences. More specifically, by monitoring the effects of methylation and hydroxymethylation of cytosine in different DNA sequences, we show, for the first time, that DNA epigenetic modifications change the molecule's dynamical landscape, increasing the propensity of DNA toward different values of twist and/or roll/tilt angles (in relation to the unmodified DNA) at the modification sites. Moreover, both the extent and position of different modifications have significant effects on the amount of structural variation observed. We propose that these conformational differences, which are dependent on the sequence environment, can provide specificity for protein binding. PMID:25625845
A filtering method to generate high quality short reads using illumina paired-end technology.
Eren, A Murat; Vineis, Joseph H; Morrison, Hilary G; Sogin, Mitchell L
2013-01-01
Consensus between independent reads improves the accuracy of genome and transcriptome analyses; however, lack of consensus between very similar sequences in metagenomic studies can and often does represent natural variation of biological significance. The machine-assigned quality scores commonly used on next generation platforms do not necessarily correlate with accuracy. Here, we describe using the overlap of paired-end, short sequence reads to identify error-prone reads in marker gene analyses and their contribution to spurious OTUs following clustering analysis using QIIME. Our approach can also reduce error in shotgun sequencing data generated from libraries with small, tightly constrained insert sizes. The open-source implementation of this algorithm in the Python programming language, with user instructions, can be obtained from https://github.com/meren/illumina-utils.
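The idea of using paired-end overlap to flag error-prone reads can be illustrated with a short sketch. This is not the illumina-utils implementation: the function names are hypothetical, the overlap length is assumed to be known in advance, and quality scores and alignment of the overlap are ignored. It only shows the core check, which is that the 3' end of the forward read should agree with the reverse complement of the reverse read.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_mismatches(read1, read2, overlap_len):
    """Count disagreements in the region where the two reads overlap.

    Assumption: the last `overlap_len` bases of read1 cover the same
    template positions as the first `overlap_len` bases of the reverse
    complement of read2.
    """
    if overlap_len <= 0 or overlap_len > min(len(read1), len(read2)):
        raise ValueError("overlap_len must be between 1 and the read length")
    tail = read1[-overlap_len:]
    head = reverse_complement(read2)[:overlap_len]
    return sum(a != b for a, b in zip(tail, head))


def passes_overlap_filter(read1, read2, overlap_len, max_mismatches=0):
    """Keep a read pair only if its overlap has few enough mismatches."""
    return overlap_mismatches(read1, read2, overlap_len) <= max_mismatches


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTA"                # toy 14-nt template
    r1 = fragment[:10]                         # forward read
    r2 = reverse_complement(fragment[4:])      # reverse read, 6-nt overlap
    print(passes_overlap_filter(r1, r2, 6))            # clean pair
    print(passes_overlap_filter(r1[:-1] + "T", r2, 6)) # error in overlap
```

A stricter or looser `max_mismatches` threshold trades read retention against error removal; the value used here is illustrative only.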
Analysis on the DNA Fingerprinting of Aspergillus Oryzae Mutant Induced by High Hydrostatic Pressure
NASA Astrophysics Data System (ADS)
Wang, Hua; Zhang, Jian; Yang, Fan; Wang, Kai; Shen, Si-Le; Liu, Bing-Bing; Zou, Bo; Zou, Guang-Tian
2011-01-01
The mutant strains of Aspergillus oryzae (HP300a) are screened under 300 MPa for 20 min. Compared with the control strains, the screened mutant strains have unique properties such as genetic stability, rapid growth, abundant spores, and high protease activity. Random amplified polymorphic DNA (RAPD) and inter simple sequence repeats (ISSR) are used to analyze the DNA fingerprinting of HP300a and the control strains. These two markers yield 67.9% and 51.3% polymorphic bands, respectively, indicating significant genetic variation between HP300a and the control strains. In addition, the genetic distances between HP300a and the control strains, based on the random sequence and simple sequence repeat markers, are 0.51 and 0.34, respectively.
Two sampling methods yield distinct microbial signatures in the nasopharynges of asthmatic children.
Pérez-Losada, Marcos; Crandall, Keith A; Freishtat, Robert J
2016-06-16
The nasopharynx is a reservoir for pathogens associated with respiratory illnesses, such as asthma. Next-generation sequencing (NGS) has been used to characterize the nasopharyngeal microbiome during health and disease. Most studies so far have surveyed the nasopharynx as a whole; however, less is known about spatial variation (biogeography) in nasal microenvironments and how sampling techniques may capture that microbial diversity. We used targeted 16S rRNA MiSeq sequencing and two different sampling strategies [nasal washes (NW) and nasal brushes (NB)] to characterize the nasopharyngeal microbiota in 30 asthmatic children. Nasal brushing is more abrasive than nasal washing and targeted the inner portion of the inferior turbinate. This region is expected to be different from other nasal microenvironments. Nasal washing is not spatially specific. Our 30 × 2 nasal microbiomes generated 1,474,497 sequences, from which we identified an average of 157 and 186 OTUs per sample in the NW and NB groups, respectively. Microbiotas from NB showed significantly higher alpha-diversity than microbiotas from NW. Similarly, both nasal microbiotas were distinct from each other (PCoA) and significantly differed in their community composition and abundance in at least 9 genera (effective size ≥1 %). Nasopharyngeal microenvironments in asthmatic children contain microbiotas with different diversity and structure. Nasal washes and brushes capture that diversity differently. Future microbial studies of the nasopharynx need to be aware of potential spatial variation (biogeography).
Accuracy of abdominal auscultation for bowel obstruction.
Breum, Birger Michael; Rud, Bo; Kirkegaard, Thomas; Nordentoft, Tyge
2015-09-14
To investigate the accuracy and inter-observer variation of bowel sound assessment in patients with clinically suspected bowel obstruction. Bowel sounds were recorded in patients with suspected bowel obstruction using a Littmann(®) Electronic Stethoscope. The recordings were processed to yield 25-s sound sequences in random order on PCs. Observers, recruited from doctors within the department, classified the sound sequences as either normal or pathological. The reference tests for bowel obstruction were intraoperative and endoscopic findings and clinical follow up. Sensitivity and specificity were calculated for each observer and compared between junior and senior doctors. Interobserver variation was measured using the Kappa statistic. Bowel sound sequences from 98 patients were assessed by 53 (33 junior and 20 senior) doctors. Laparotomy was performed in 47 patients, 35 of whom had bowel obstruction. Two patients underwent colorectal stenting due to large bowel obstruction. The median sensitivity and specificity was 0.42 (range: 0.19-0.64) and 0.78 (range: 0.35-0.98), respectively. There was no significant difference in accuracy between junior and senior doctors. The median frequency with which doctors classified bowel sounds as abnormal did not differ significantly between patients with and without bowel obstruction (26% vs 23%, P = 0.08). The 53 doctors made up 1378 unique pairs and the median Kappa value was 0.29 (range: -0.15-0.66). Accuracy and inter-observer agreement was generally low. Clinical decisions in patients with possible bowel obstruction should not be based on auscultatory assessment of bowel sounds.
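The statistics named in this abstract can be sketched as follows. The function names and the toy ratings are assumptions for illustration only and are not the study's data; the one figure taken from the abstract is the number of observers (53), from which the count of unique observer pairs follows.

```python
from math import comb

def sensitivity_specificity(truth, calls):
    """truth and calls are lists of booleans (True = obstruction / 'pathological')."""
    tp = sum(t and c for t, c in zip(truth, calls))
    fn = sum(t and not c for t, c in zip(truth, calls))
    tn = sum(not t and not c for t, c in zip(truth, calls))
    fp = sum(not t and c for t, c in zip(truth, calls))
    return tp / (tp + fn), tn / (tn + fp)

def cohen_kappa(a, b):
    """Cohen's kappa for two observers giving binary ratings of the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)  # chance agreement
    return (observed - expected) / (1 - expected)

# Hypothetical reference standard and two hypothetical observers (8 recordings).
truth = [True, True, True, True, False, False, False, False]
obs1 = [True, True, False, False, False, False, False, True]
obs2 = [True, False, False, False, False, False, True, True]

print(sensitivity_specificity(truth, obs1))
print(cohen_kappa(obs1, obs2))
print(comb(53, 2))  # unique pairs among 53 observers
```

The last line reproduces the 1378 unique pairs reported for 53 doctors, since 53 × 52 / 2 = 1378.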
Thermal and acid tolerant beta-xylosidases, genes encoding, related organisms, and methods
Thompson, David N [Idaho Falls, ID; Thompson, Vicki S [Idaho Falls, ID; Schaller, Kastli D [Ammon, ID; Apel, William A [Jackson, WY; Lacey, Jeffrey A [Idaho Falls, ID; Reed, David W [Idaho Falls, ID
2011-04-12
Isolated and/or purified polypeptides and nucleic acid sequences encoding polypeptides from Alicyclobacillus acidocaldarius and variations thereof are provided. Further provided are methods of at least partially degrading xylotriose and/or xylobiose using isolated and/or purified polypeptides and nucleic acid sequences encoding polypeptides from Alicyclobacillus acidocaldarius and variations thereof.
USDA-ARS's Scientific Manuscript database
Little is known about genetic variation of Lymantria dispar multiple nucleopolyhedrovirus (LdMNPV; Baculoviridae: Alphabaculovirus) at the nucleotide sequence level. To obtain a more comprehensive view of genetic diversity among isolates of LdMNPV, partial sequences of the lef-8 gene were generated...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gordon, Sean
2013-03-01
Sean Gordon of the USDA on Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions at the 8th Annual Genomics of Energy & Environment Meeting on March 27, 2013 in Walnut Creek, CA.
Sequence variation of the feline immunodeficiency virus genome and its clinical relevance.
Stickney, A L; Dunowska, M; Cave, N J
2013-06-08
The ongoing evolution of feline immunodeficiency virus (FIV) has resulted in the existence of a diverse continuum of viruses. FIV isolates differ with regard to their mutation and replication rates, plasma viral loads, cell tropism and the ability to induce apoptosis. Clinical disease in FIV-infected cats is also inconsistent. Genomic sequence variation of FIV is likely to be responsible for some of the variation in viral behaviour. The specific genetic sequences that influence these key viral properties remain to be determined. With knowledge of the specific key determinants of pathogenicity, there is the potential for veterinarians in the future to apply this information for prognostic purposes. Genomic sequence variation of FIV also presents an obstacle to effective vaccine development. Most challenge studies demonstrate acceptable efficacy of a dual-subtype FIV vaccine (Fel-O-Vax FIV) against FIV infection under experimental settings; however, vaccine efficacy in the field still remains to be proven. It is important that we discover the key determinants of immunity induced by this vaccine; such data would complement vaccine field efficacy studies and provide the basis to make informed recommendations on its use.
2012-01-01
Background The cuticle is an important adaptive structure whose origin played a crucial role in the transition of plants from aqueous to terrestrial conditions. HvABCG31/Eibi1 is an ABCG transporter gene, involved in cuticle formation that was recently identified in wild barley (Hordeum vulgare ssp. spontaneum). To study the genetic variation of HvABCG31 in different habitats, its 2 kb promoter region was sequenced from 112 wild barley accessions collected from five natural populations from southern and northern Israel. The sites included three mesic and two xeric habitats, and differed in annual rainfall, soil type, and soil water capacity. Results Phylogenetic analysis of the aligned HvABCG31 promoter sequences clustered the majority of accessions (69 out of 71) from the three northern mesic populations into one cluster, while all 21 accessions from the Dead Sea area, a xeric southern population, and two isolated accessions (one from a xeric population at Mitzpe Ramon and one from the xeric ‘African Slope’ of “Evolution Canyon”) formed the second cluster. The southern arid populations included six haplotypes, but they differed from the consensus sequence at a large number of positions, while the northern mesic populations included 15 haplotypes that were, on average, more similar to the consensus sequence. Most of the haplotypes (20 of 22) were unique to a population. Interestingly, higher genetic variation occurred within populations (54.2%) than among populations (45.8%). Analysis of the promoter region detected a large number of transcription factor binding sites: 121–128 and 121–134 sites in the two southern arid populations, and 123–128,125–128, and 123–125 sites in the three northern mesic populations. Three types of TFBSs were significantly enriched: those related to GA (gibberellin), Dof (DNA binding with one finger), and light. 
Conclusions Drought stress and adaptive natural selection may have been important determinants in the observed sequence variation of HvABCG31 promoter. Abiotic stresses may be involved in the HvABCG31 gene transcription regulations, generating more protective cuticles in plants under stresses. PMID:23006777
Gene copy number variation and its significance in cyanobacterial phylogeny
2012-01-01
Background In eukaryotes, variation in gene copy numbers is often associated with deleterious effects, but may also have positive effects. For prokaryotes, studies on gene copy number variation are rare. Previous studies have suggested that high numbers of rRNA gene copies can be advantageous in environments with changing resource availability, but further associations between gene copies and phenotypic traits are not documented. We used one of the morphologically most diverse prokaryotic phyla to test whether numbers of gene copies are associated with levels of cell differentiation. Results We implemented a search algorithm that identified 44 genes with highly conserved copies across 22 fully sequenced cyanobacterial taxa. For two very basal cyanobacterial species, Gloeobacter violaceus and a thermophilic Synechococcus species, distinct phylogenetic positions previously found were supported by identical protein coding gene copy numbers. Furthermore, we found that increased ribosomal gene copy numbers showed a strong correlation to cyanobacteria capable of terminal cell differentiation. Additionally, we detected extremely low variation of 16S rRNA sequence copies within the cyanobacteria. We compared our results for 16S rRNA to three other eubacterial phyla (Chloroflexi, Spirochaetes and Bacteroidetes). Based on Bayesian phylogenetic inference and the comparisons of genetic distances, we could confirm that cyanobacterial 16S rRNA paralogs and orthologs show significantly stronger conservation than found in other eubacterial phyla. Conclusions A higher number of ribosomal operons could potentially provide an advantage to terminally differentiated cyanobacteria. Furthermore, we suggest that 16S rRNA gene copies in cyanobacteria are homogenized by both concerted evolution and purifying selection.
In addition, the small ribosomal subunit in cyanobacteria appears to evolve at extraordinarily slow rates, an observation that has been made previously for morphological characteristics of cyanobacteria. PMID:22894826
Dual Transcriptomic Profiling of Host and Microbiota during Health and Disease in Pediatric Asthma.
Pérez-Losada, Marcos; Castro-Nallar, Eduardo; Bendall, Matthew L; Freishtat, Robert J; Crandall, Keith A
2015-01-01
High-throughput sequencing (HTS) analysis of microbial communities from the respiratory airways has heavily relied on the 16S rRNA gene. Given the intrinsic limitations of this approach, airway microbiome research has focused on assessing bacterial composition during health and disease, and its variation in relation to clinical and environmental factors, or other microbiomes. Consequently, very little effort has been dedicated to describing the functional characteristics of the airway microbiota and even less to explore the microbe-host interactions. Here we present a simultaneous assessment of microbiome and host functional diversity and host-microbe interactions from the same RNA-seq experiment, while accounting for variation in clinical metadata. Transcriptomic (host) and metatranscriptomic (microbiota) sequences from the nasal epithelium of 8 asthmatics and 6 healthy controls were separated in silico and mapped to available human and NCBI-NR protein reference databases. Human genes differentially expressed in asthmatics and controls were then used to infer upstream regulators involved in immune and inflammatory responses. Concomitantly, microbial genes were mapped to metabolic databases (COG, SEED, and KEGG) to infer microbial functions differentially expressed in asthmatics and controls. Finally, multivariate analysis was applied to find associations between microbiome characteristics and host upstream regulators while accounting for clinical variation. Our study showed significant differences in the metabolism of microbiomes from asthmatic and non-asthmatic children for up to 25% of the functional properties tested. Enrichment analysis of 499 differentially expressed host genes for inflammatory and immune responses revealed 43 upstream regulators differentially activated in asthma. 
Microbial adhesion (virulence) and Proteobacteria abundance were significantly associated with variation in the expression of the upstream regulator IL1A; suggesting that microbiome characteristics modulate host inflammatory and immune systems during asthma.
Selection of a DNA barcode for Nectriaceae from fungal whole-genomes.
Zeng, Zhaoqing; Zhao, Peng; Luo, Jing; Zhuang, Wenying; Yu, Zhihe
2012-01-01
A DNA barcode is a short segment of sequence that is able to distinguish species. A barcode must ideally contain enough variation to distinguish every individual species and be easily obtained. Fungi of Nectriaceae are economically important and show high species diversity. To establish a standard DNA barcode for this group of fungi, the genomes of Neurospora crassa and 30 other filamentous fungi were compared. The expect value was treated as a criterion to recognize homologous sequences. Four candidate markers, Hsp90, AAC, CDC48, and EF3, were tested for their feasibility as barcodes in the identification of 34 well-established species belonging to 13 genera of Nectriaceae. Two hundred and fifteen sequences were analyzed. Intra- and inter-specific variations and the success rate of PCR amplification and sequencing were considered as important criteria for estimation of the candidate markers. Ultimately, the partial EF3 gene met the requirements for a good DNA barcode: No overlap was found between the intra- and inter-specific pairwise distances. The smallest inter-specific distance of EF3 gene was 3.19%, while the largest intra-specific distance was 1.79%. In addition, there was a high success rate in PCR and sequencing for this gene (96.3%). CDC48 showed sufficiently high sequence variation among species, but the PCR and sequencing success rate was 84% using a single pair of primers. Although the Hsp90 and AAC genes had higher PCR and sequencing success rates (96.3% and 97.5%, respectively), overlapping occurred between the intra- and inter-specific variations, which could lead to misidentification. Therefore, we propose the EF3 gene as a possible DNA barcode for the nectriaceous fungi.
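The "no overlap" criterion described above (often called a barcoding gap) can be sketched as a simple comparison of the largest intra-specific and the smallest inter-specific pairwise distance. Only the two extreme values for EF3 (1.79% and 3.19%) come from the abstract; the function name and the remaining distances are hypothetical placeholders.

```python
def barcoding_gap(intra_distances, inter_distances):
    """Return (gap, has_gap) for pairwise distances given in percent.

    gap is the smallest inter-specific distance minus the largest
    intra-specific distance; a positive gap means the two
    distributions do not overlap.
    """
    gap = min(inter_distances) - max(intra_distances)
    return gap, gap > 0

# Extremes (1.79 and 3.19) are from the abstract; other values are invented.
ef3_intra = [0.00, 0.45, 1.79]
ef3_inter = [3.19, 5.60, 8.20]

gap, has_gap = barcoding_gap(ef3_intra, ef3_inter)
print(f"gap = {gap:.2f} percentage points, non-overlapping: {has_gap}")
```

A marker such as Hsp90 or AAC, for which the abstract reports overlap, would give a non-positive gap under the same check.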
Jones, J.W.; Neves, R.J.; Ahlstedt, S.A.; Hallerman, E.M.
2006-01-01
Species in the genus Epioblasma have specialized life history requirements and represent the most endangered genus of freshwater mussels (Unionidae) in the world. A genetic characterization of extant populations of the oyster mussel E. capsaeformis and tan riffleshell E. florentina walkeri sensu lato was conducted to assess taxonomic validity and to resolve conservation issues for recovery planning. These mussel species exhibit pronounced phenotypic variation, but were difficult to characterize phylogenetically using DNA sequences. Monophyletic lineages, congruent with phenotypic variation among species, were obtained only after extensive analysis of combined mitochondrial (1396 bp of 16S, cytochrome-b, and ND1) and nuclear (515 bp of ITS-1) DNA sequences. In contrast, analysis of variation at 10 hypervariable DNA microsatellite loci showed moderately to highly diverged populations based on FST and RST values, which ranged from 0.12 to 0.39 and 0.15 to 0.71, respectively. Quantitative variation between species was observed in fish-host specificity, with transformation success of glochidia of E. capsaeformis significantly greater (P<0.05) on greenside darter Etheostoma blennioides, and that of E. f. walkeri significantly greater (P<0.05) on fantail darter Etheostoma flabellare. Lengths of glochidia differed significantly (P<0.001) among species and populations, with mean sizes ranging from 241 to 272 µm. The texture and colour of the mantle-pad of E. capsaeformis sensu stricto is smooth and bluish-white, whereas that of E. f. walkeri is pustuled and brown, with tan mottling. Based on extensive molecular, morphological and life history data, the population of E. capsaeformis from the Duck River, Tennessee, USA is proposed as a separate species, and the population of E. f. walkeri from Indian Creek, upper Clinch River, Virginia, USA is proposed as a distinct subspecies.
Shi, Liyu; Weng, Jianfeng; Liu, Changlin; Song, Xinyuan; Miao, Hongqin; Hao, Zhuanfang; Xie, Chuanxiao; Li, Mingshun; Zhang, Degui; Bai, Li; Pan, Guangtang; Li, Xinhai; Zhang, Shihuang
2013-04-01
Maize rough dwarf disease (MRDD), a viral disease, causes significant grain yield losses, yet its genetic basis is largely unknown. Based on comparative genomics, eukaryotic translation initiation factor 4E (eIF4E) was considered a candidate gene for MRDD resistance; validating it will help to clarify the possible genetic mechanism of this disease. ZmeIF4E (the maize ortholog of eIF4E) harbors five exons and encodes a protein of 218 amino acids, and no variation in the cDNA sequence was identified between the resistant inbred line X178 and the susceptible line Ye478. ZmeIF4E expression differed between plants of the two lines treated with three plant hormones (ethylene, salicylic acid, and jasmonates) at the V3 developmental stage, suggesting that ZmeIF4E is likely to be involved in the regulation of defense gene expression and the induction of local and systemic resistance. Moreover, four cis-acting elements related to plant defense responses (DOFCOREZM, EECCRCAH1, GT1GAMSCAM4, and GT1CONSENSUS) that harbor sequence variation between the two lines were detected in the ZmeIF4E promoter. Association analysis with 163 inbred lines revealed that one SNP in EECCRCAH1 was significantly associated with the CSI of MRDD in two environments, explaining 3.33% and 9.04% of phenotypic variation, respectively. Meanwhile, one SNP in the GT-1 motif was found to affect MRDD resistance in only one of the two environments, explaining 5.17% of phenotypic variation. Collectively, the regulatory motifs harboring the two significant SNPs in the ZmeIF4E promoter could be involved in the defense process of maize after viral infection. These results contribute to the understanding of maize defense mechanisms against maize rough dwarf virus.
Wagner Mackenzie, Brett; Waite, David W; Taylor, Michael W
2015-01-01
The human gut contains dense and diverse microbial communities which have profound influences on human health. Gaining meaningful insights into these communities requires provision of high quality microbial nucleic acids from human fecal samples, as well as an understanding of the sources of variation and their impacts on the experimental model. We present here a systematic analysis of commonly used microbial DNA extraction methods, and identify significant sources of variation. Five extraction methods (Human Microbiome Project protocol, MoBio PowerSoil DNA Isolation Kit, QIAamp DNA Stool Mini Kit, ZR Fecal DNA MiniPrep, phenol:chloroform-based DNA isolation) were evaluated based on the following criteria: DNA yield, quality and integrity, and microbial community structure based on Illumina amplicon sequencing of the V4 region of bacterial and archaeal 16S rRNA genes. Our results indicate that the largest portion of variation within the model was attributed to differences between subjects (biological variation), with a smaller proportion of variation associated with DNA extraction method (technical variation) and intra-subject variation. A comprehensive understanding of the potential impact of technical variation on the human gut microbiota will help limit preventable bias, enabling more accurate diversity estimates.
Boussaha, Mekki; Michot, Pauline; Letaief, Rabia; Hozé, Chris; Fritz, Sébastien; Grohs, Cécile; Esquerré, Diane; Duchesne, Amandine; Philippe, Romain; Blanquet, Véronique; Phocas, Florence; Floriot, Sandrine; Rocha, Dominique; Klopp, Christophe; Capitan, Aurélien; Boichard, Didier
2016-11-15
In recent years, several bovine genome sequencing projects were carried out with the aim of developing genomic tools to improve dairy and beef production efficiency and sustainability. In this study, we describe the first French cattle genome variation dataset obtained by sequencing 274 whole genomes representing several major dairy and beef breeds. This dataset contains over 28 million single nucleotide polymorphisms (SNPs) and small insertions and deletions. Comparisons between sequencing results and SNP array genotypes revealed a very high genotype concordance rate, which indicates the good quality of our data. To our knowledge, this is the first large-scale catalog of small genomic variations in French dairy and beef cattle. This resource will contribute to the study of gene functions and population structure and also help to improve traits through genotype-guided selection.
Richardson, David S; Westerdahl, Helena
2003-12-01
The Great reed warbler (GRW) and the Seychelles warbler (SW) are congeners with markedly different demographic histories. The GRW is a normal outbred bird species while the SW population remains isolated and inbred after undergoing a severe population bottleneck. We examined variation at Major Histocompatibility Complex (MHC) class I exon 3 using restriction fragment length polymorphism, denaturing gradient gel electrophoresis and DNA sequencing. Although genetic variation was higher in the GRW, considerable variation has been maintained in the SW. The ten exon 3 sequences found in the SW were as diverged from each other as were a random sub-sample of the 67 sequences from the GRW. There was evidence for balancing selection in both species, and the phylogenetic analysis showing that the exon 3 sequences did not separate according to species, was consistent with transspecies evolution of the MHC.
Zhao, Min; Wang, Qingguo; Wang, Quan; Jia, Peilin; Zhao, Zhongming
2013-01-01
Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) or genotyping arrays have been standard technologies to detect large regions subject to copy number changes in genomes until most recently high-resolution sequence data can be analyzed by next-generation sequencing (NGS). During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development. PMID:24564169
Identification of the sequence variations of 15 autosomal STR loci in a Chinese population.
Chen, Wenjing; Cheng, Jianding; Ou, Xueling; Chen, Yong; Tong, Dayue; Sun, Hongyu
2014-01-01
DNA sequence variation, including base changes and insertions or deletions, in the primer binding region may cause a null allele and, if it shifts the length of the amplified fragment outside the allelic ladder, off-ladder (OL) alleles may be detected. In order to provide accurate and reliable DNA evidence for forensic DNA analysis, it is essential to clarify sequence variations in prevalently used STR loci. Suspected null alleles and OL alleles of the PowerPlex® 16 System from 21,934 unrelated Chinese individuals were verified by alternative systems and sequenced. A total of 17 cases with null alleles were identified, including 12 kinds of point mutations in 16 cases and a 19-base deletion in one case. The total frequency of null alleles was 7.751 × 10^-4. Eight hundred and forty-four OL alleles, classified into 97 different kinds, were observed at 15 STR loci of the PowerPlex® 16 System except vWA. All the frequencies of OL alleles were under 0.01. Null alleles should be confirmed by alternative primers and OL alleles should be named appropriately. Particular attention should be paid to sequence variation, since incorrect designation could lead to false conclusions.
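As a quick arithmetic check, the reported null-allele frequency appears consistent with dividing the 17 null-allele cases by the 21,934 individuals typed. That denominator is an inference from the figures in the abstract, not a stated method, and the variable names below are illustrative.

```python
# Figures taken from the abstract; the choice of denominator is an assumption.
null_allele_cases = 17
individuals_typed = 21_934

null_allele_frequency = null_allele_cases / individuals_typed
print(f"{null_allele_frequency:.3e}")
```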
The diploid genome sequence of an Asian individual
Wang, Jun; Wang, Wei; Li, Ruiqiang; Li, Yingrui; Tian, Geng; Goodman, Laurie; Fan, Wei; Zhang, Junqing; Li, Jun; Zhang, Juanbin; Guo, Yiran; Feng, Binxiao; Li, Heng; Lu, Yao; Fang, Xiaodong; Liang, Huiqing; Du, Zhenglin; Li, Dong; Zhao, Yiqing; Hu, Yujie; Yang, Zhenzhen; Zheng, Hancheng; Hellmann, Ines; Inouye, Michael; Pool, John; Yi, Xin; Zhao, Jing; Duan, Jinjie; Zhou, Yan; Qin, Junjie; Ma, Lijia; Li, Guoqing; Yang, Zhentao; Zhang, Guojie; Yang, Bin; Yu, Chang; Liang, Fang; Li, Wenjie; Li, Shaochuan; Li, Dawei; Ni, Peixiang; Ruan, Jue; Li, Qibin; Zhu, Hongmei; Liu, Dongyuan; Lu, Zhike; Li, Ning; Guo, Guangwu; Zhang, Jianguo; Ye, Jia; Fang, Lin; Hao, Qin; Chen, Quan; Liang, Yu; Su, Yeyang; san, A.; Ping, Cuo; Yang, Shuang; Chen, Fang; Li, Li; Zhou, Ke; Zheng, Hongkun; Ren, Yuanyuan; Yang, Ling; Gao, Yang; Yang, Guohua; Li, Zhuo; Feng, Xiaoli; Kristiansen, Karsten; Wong, Gane Ka-Shu; Nielsen, Rasmus; Durbin, Richard; Bolund, Lars; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian
2009-01-01
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics. PMID:18987735
Analysis of temporal variation in human masticatory cycles during gum chewing.
Crane, Elizabeth A; Rothman, Edward D; Childers, David; Gerstner, Geoffrey E
2013-10-01
The study investigated modulation of fast and slow opening (FO, SO) and closing (FC, SC) chewing cycle phases using gum-chewing sequences in humans. Twenty-two healthy adult subjects participated by chewing gum for at least 20 s on the right side and at least 20 s on the left side while jaw movements were tracked with a 3D motion analysis system. Jaw movement data were digitized, and chewing cycle phases were identified and analysed for all chewing cycles in a complete sequence. All four chewing cycle phase durations were more variant than total cycle durations, a result found in other non-human primates. Significant negative correlations existed between the opening phases, SO and FO, and between the closing phases, SC and FC; however, there was less consistency in terms of which phases were negatively correlated both between subjects, and between chewing sides within subjects, compared with results reported in other species. The coordination of intra-cycle phases appears to be flexible and to follow complex rules during gum-chewing in humans. Alternatively, the observed intra-cycle phase relationships could simply reflect: (1) variation in jaw kinematics due to variation in how gum was handled by the tongue on a chew-by-chew basis in our experimental design or (2) variation due to data sampling noise and/or how phases were defined and identified. Copyright © 2013 Elsevier Ltd. All rights reserved.
Reuter, Miriam S.; Walker, Susan; Thiruvahindrapuram, Bhooma; Whitney, Joe; Cohn, Iris; Sondheimer, Neal; Yuen, Ryan K.C.; Trost, Brett; Paton, Tara A.; Pereira, Sergio L.; Herbrick, Jo-Anne; Wintle, Richard F.; Merico, Daniele; Howe, Jennifer; MacDonald, Jeffrey R.; Lu, Chao; Nalpathamkalam, Thomas; Sung, Wilson W.L.; Wang, Zhuozhi; Patel, Rohan V.; Pellecchia, Giovanna; Wei, John; Strug, Lisa J.; Bell, Sherilyn; Kellam, Barbara; Mahtani, Melanie M.; Bassett, Anne S.; Bombard, Yvonne; Weksberg, Rosanna; Shuman, Cheryl; Cohn, Ronald D.; Stavropoulos, Dimitri J.; Bowdin, Sarah; Hildebrandt, Matthew R.; Wei, Wei; Romm, Asli; Pasceri, Peter; Ellis, James; Ray, Peter; Meyn, M. Stephen; Monfared, Nasim; Hosseini, S. Mohsen; Joseph-George, Ann M.; Keeley, Fred W.; Cook, Ryan A.; Fiume, Marc; Lee, Hin C.; Marshall, Christian R.; Davies, Jill; Hazell, Allison; Buchanan, Janet A.; Szego, Michael J.; Scherer, Stephen W.
2018-01-01
BACKGROUND: The Personal Genome Project Canada is a comprehensive public data resource that integrates whole genome sequencing data and health information. We describe genomic variation identified in the initial recruitment cohort of 56 volunteers. METHODS: Volunteers were screened for eligibility and provided informed consent for open data sharing. Using blood DNA, we performed whole genome sequencing and identified all possible classes of DNA variants. A genetic counsellor explained the implication of the results to each participant. RESULTS: Whole genome sequencing of the first 56 participants identified 207 662 805 sequence variants and 27 494 copy number variations. We analyzed a prioritized disease-associated data set (n = 1606 variants) according to standardized guidelines, and interpreted 19 variants in 14 participants (25%) as having obvious health implications. Six of these variants (e.g., in BRCA1 or mosaic loss of an X chromosome) were pathogenic or likely pathogenic. Seven were risk factors for cancer, cardiovascular or neurobehavioural conditions. Four other variants — associated with cancer, cardiac or neurodegenerative phenotypes — remained of uncertain significance because of discrepancies among databases. We also identified a large structural chromosome aberration and a likely pathogenic mitochondrial variant. There were 172 recessive disease alleles (e.g., 5 individuals carried mutations for cystic fibrosis). Pharmacogenomics analyses revealed another 3.9 potentially relevant genotypes per individual. INTERPRETATION: Our analyses identified a spectrum of genetic variants with potential health impact in 25% of participants. When also considering recessive alleles and variants with potential pharmacologic relevance, all 56 participants had medically relevant findings. Although access is mostly limited to research, whole genome sequencing can provide specific and novel information with the potential of major impact for health care. PMID:29431110
Zumaraga, Mark Pretzel; Medina, Paul Julius; Recto, Juan Miguel; Abrahan, Lauro; Azurin, Edelyn; Tanchoco, Celeste C; Jimeno, Cecilia A; Palmes-Saloma, Cynthia
2017-03-01
This study aimed to discover genetic variants associated with vitamin D deficiency in the entire 101-kb vitamin D receptor (VDR) gene in a group of postmenopausal Filipino women, using a targeted next-generation sequencing (TNGS) approach in a case-control study design. A total of 50 women with and without osteoporotic fracture seen at the Philippine Orthopedic Center were included. Blood samples were collected for determination of serum vitamin D, calcium, phosphorus, glucose, blood urea nitrogen, creatinine, aspartate aminotransferase and alanine aminotransferase, and served as the primary source for targeted VDR gene sequencing using the Ion Torrent Personal Genome Machine. Variant calling was based on the GATK best practice workflow, and variants were annotated using the ANNOVAR tool. A total of 1496 unique variants in the whole 101-kb VDR gene were identified. Novel sequence variations not registered in the dbSNP database were found among cases and controls at a rate of 23.1% and 16.6% of total discovered variants, respectively. One disease-associated enhancer showed a statistically significant association with low serum 25-hydroxy vitamin D levels (Pearson chi-square P-value=0.009). The transcription factor binding site prediction program PROMO predicted the disruption of three transcription factor binding sites in this enhancer region. These findings show the power of TNGS in identifying sequence variations in a very large gene, and the surprising results obtained in this study greatly expand the catalog of known VDR sequence variants that may represent an important clue in the emergence of vitamin D deficiency. Such information will also provide the additional guidance necessary for personalized nutritional advice to reach sufficient vitamin D status. Copyright © 2016 Elsevier Inc. All rights reserved.
Reuter, Miriam S; Walker, Susan; Thiruvahindrapuram, Bhooma; Whitney, Joe; Cohn, Iris; Sondheimer, Neal; Yuen, Ryan K C; Trost, Brett; Paton, Tara A; Pereira, Sergio L; Herbrick, Jo-Anne; Wintle, Richard F; Merico, Daniele; Howe, Jennifer; MacDonald, Jeffrey R; Lu, Chao; Nalpathamkalam, Thomas; Sung, Wilson W L; Wang, Zhuozhi; Patel, Rohan V; Pellecchia, Giovanna; Wei, John; Strug, Lisa J; Bell, Sherilyn; Kellam, Barbara; Mahtani, Melanie M; Bassett, Anne S; Bombard, Yvonne; Weksberg, Rosanna; Shuman, Cheryl; Cohn, Ronald D; Stavropoulos, Dimitri J; Bowdin, Sarah; Hildebrandt, Matthew R; Wei, Wei; Romm, Asli; Pasceri, Peter; Ellis, James; Ray, Peter; Meyn, M Stephen; Monfared, Nasim; Hosseini, S Mohsen; Joseph-George, Ann M; Keeley, Fred W; Cook, Ryan A; Fiume, Marc; Lee, Hin C; Marshall, Christian R; Davies, Jill; Hazell, Allison; Buchanan, Janet A; Szego, Michael J; Scherer, Stephen W
2018-02-05
The Personal Genome Project Canada is a comprehensive public data resource that integrates whole genome sequencing data and health information. We describe genomic variation identified in the initial recruitment cohort of 56 volunteers. Volunteers were screened for eligibility and provided informed consent for open data sharing. Using blood DNA, we performed whole genome sequencing and identified all possible classes of DNA variants. A genetic counsellor explained the implication of the results to each participant. Whole genome sequencing of the first 56 participants identified 207 662 805 sequence variants and 27 494 copy number variations. We analyzed a prioritized disease-associated data set (n = 1606 variants) according to standardized guidelines, and interpreted 19 variants in 14 participants (25%) as having obvious health implications. Six of these variants (e.g., in BRCA1 or mosaic loss of an X chromosome) were pathogenic or likely pathogenic. Seven were risk factors for cancer, cardiovascular or neurobehavioural conditions. Four other variants - associated with cancer, cardiac or neurodegenerative phenotypes - remained of uncertain significance because of discrepancies among databases. We also identified a large structural chromosome aberration and a likely pathogenic mitochondrial variant. There were 172 recessive disease alleles (e.g., 5 individuals carried mutations for cystic fibrosis). Pharmacogenomics analyses revealed another 3.9 potentially relevant genotypes per individual. Our analyses identified a spectrum of genetic variants with potential health impact in 25% of participants. When also considering recessive alleles and variants with potential pharmacologic relevance, all 56 participants had medically relevant findings. Although access is mostly limited to research, whole genome sequencing can provide specific and novel information with the potential of major impact for health care. © 2018 Joule Inc. or its licensors.
Cenik, Can; Cenik, Elif Sarinay; Byeon, Gun W.; Grubert, Fabian; Candille, Sophie I.; Spacek, Damek; Alsallakh, Bilal; Tilgner, Hagen; Araya, Carlos L.; Tang, Hua; Ricci, Emiliano; Snyder, Michael P.
2015-01-01
Elucidating the consequences of genetic differences between humans is essential for understanding phenotypic diversity and personalized medicine. Although variation in RNA levels, transcription factor binding, and chromatin have been explored, little is known about global variation in translation and its genetic determinants. We used ribosome profiling, RNA sequencing, and mass spectrometry to perform an integrated analysis in lymphoblastoid cell lines from a diverse group of individuals. We find significant differences in RNA, translation, and protein levels suggesting diverse mechanisms of personalized gene expression control. Combined analysis of RNA expression and ribosome occupancy improves the identification of individual protein level differences. Finally, we identify genetic differences that specifically modulate ribosome occupancy—many of these differences lie close to start codons and upstream ORFs. Our results reveal a new level of gene expression variation among humans and indicate that genetic variants can cause changes in protein levels through effects on translation. PMID:26297486
Zhang, J R; Norris, S J
1998-08-01
The Lyme disease spirochete Borrelia burgdorferi possesses 15 silent vls cassettes and a vls expression site (vlsE) encoding a surface-exposed lipoprotein. Segments of the silent vls cassettes have been shown to recombine with the vlsE cassette region in the mammalian host, resulting in combinatorial antigenic variation. Despite promiscuous recombination within the vlsE cassette region, the 5' and 3' coding sequences of vlsE that flank the cassette region are not subject to sequence variation during these recombination events. The segments of the silent vls cassettes recombine in the vlsE cassette region through a unidirectional process such that the sequence and organization of the silent vls loci are not affected. As a result of recombination, the previously expressed segments are replaced by incoming segments and apparently degraded. These results provide evidence for a gene conversion mechanism in VlsE antigenic variation.
NASA Technical Reports Server (NTRS)
Rai, Man Mohan (Inventor); Madavan, Nateri K. (Inventor)
2007-01-01
A method and system for data modeling that incorporates the advantages of both traditional response surface methodology (RSM) and neural networks is disclosed. The invention partitions the parameters into a first set of s simple parameters, where observable data are expressible as low order polynomials, and c complex parameters that reflect more complicated variation of the observed data. Variation of the data with the simple parameters is modeled using polynomials; and variation of the data with the complex parameters at each vertex is analyzed using a neural network. Variations with the simple parameters and with the complex parameters are expressed using a first sequence of shape functions and a second sequence of neural network functions. The first and second sequences are multiplicatively combined to form a composite response surface, dependent upon the parameter values, that can be used to identify an accurate mode
Spuesens, Emiel B M; van de Kreeke, Nick; Estevão, Silvia; Hoogenboezem, Theo; Sluijter, Marcel; Hartwig, Nico G; van Rossum, Annemarie M C; Vink, Cornelis
2011-02-01
Mycoplasma pneumoniae is a human pathogen that causes a range of respiratory tract infections. The first step in infection is adherence of the bacteria to the respiratory epithelium. This step is mediated by a specialized organelle, which contains several proteins (cytadhesins) that have an important function in adherence. Two of these cytadhesins, P40 and P90, represent the proteolytic products from a single 130 kDa protein precursor, which is encoded by the MPN142 gene. Interestingly, MPN142 contains a repetitive DNA element, termed RepMP5, of which homologues are found at seven other loci within the M. pneumoniae genome. It has been hypothesized that these RepMP5 elements, which are similar but not identical in sequence, recombine with their counterpart within MPN142 and thereby provide a source of sequence variation for this gene. As this variation may give rise to amino acid changes within P40 and P90, the recombination between RepMP5 elements may constitute the basis of antigenic variation and, possibly, immune evasion by M. pneumoniae. To investigate the sequence variation of MPN142 in relation to inter-RepMP5 recombination, we determined the sequences of all RepMP5 elements in a collection of 25 strains. The results indicate that: (i) inter-RepMP5 recombination events have occurred in seven of the strains, and (ii) putative RepMP5 recombination events involving MPN142 have induced amino acid changes in a surface-exposed part of the P40 protein in two of the strains. We conclude that recombination between RepMP5 elements is a common phenomenon that may lead to sequence variation of MPN142-encoded proteins.
Dynamics of actin evolution in dinoflagellates.
Kim, Sunju; Bachvaroff, Tsvetan R; Handy, Sara M; Delwiche, Charles F
2011-04-01
Dinoflagellates have unique nuclei and intriguing genome characteristics with very high DNA content making complete genome sequencing difficult. In dinoflagellates, many genes are found in multicopy gene families, but the processes involved in the establishment and maintenance of these gene families are poorly understood. Understanding the dynamics of gene family evolution in dinoflagellates requires comparisons at different evolutionary scales. Studies of closely related species provide fine-scale information relative to species divergence, whereas comparisons of more distantly related species provide broad context. We selected the actin gene family as a highly expressed conserved gene previously studied in dinoflagellates. Of the 142 sequences determined in this study, 103 were from the two closely related species, Dinophysis acuminata and D. caudata, including full length and partial cDNA sequences as well as partial genomic amplicons. For these two Dinophysis species, at least three types of sequences could be identified. Most copies (79%) were relatively similar and in nucleotide trees, the sequences formed two bushy clades corresponding to the two species. In comparisons within species, only eight to ten nucleotide differences were found between these copies. The two remaining types formed clades containing sequences from both species. One type included the most similar sequences in between-species comparisons with as few as 12 nucleotide differences between species. The second type included the most divergent sequences in comparisons between and within species with up to 93 nucleotide differences between sequences. In all the sequences, most variation occurred in synonymous sites or the 5' untranslated region (UTR), although there was still limited amino acid variation between most sequences.
Several potential pseudogenes were found (approximately 10% of all sequences depending on species) with incomplete open reading frames due to frameshifts or early stop codons. Overall, variation in the actin gene family fits best with the "birth and death" model of evolution based on recent duplications, pseudogenes, and incomplete lineage sorting. Divergence between species was similar to variation within species, so that actin may be too conserved to be useful for phylogenetic estimation of closely related species.
Mellows, Andrew; Barnett, Ross; Dalén, Love; Sandoval-Castellanos, Edson; Linderholm, Anna; McGovern, Thomas H.; Church, Mike J.; Larson, Greger
2012-01-01
Previous studies have suggested that the presence of sea ice is an important factor in facilitating migration and determining the degree of genetic isolation among contemporary arctic fox populations. Because the extent of sea ice is dependent upon global temperatures, periods of significant cooling would have had a major impact on fox population connectivity and genetic variation. We tested this hypothesis by extracting and sequencing mitochondrial control region sequences from 17 arctic foxes excavated from two late-ninth-century to twelfth-century AD archaeological sites in northeast Iceland, both of which predate the Little Ice Age (approx. sixteenth to nineteenth century). Despite the fact that five haplotypes have been observed in modern Icelandic foxes, a single haplotype was shared among all of the ancient individuals. Results from simulations within an approximate Bayesian computation framework suggest that the rapid increase in Icelandic arctic fox haplotype diversity can only be explained by sea-ice-mediated fox immigration facilitated by the Little Ice Age. PMID:22977155
Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder
Yuen, Ryan KC; Merico, Daniele; Bookman, Matt; Howe, Jennifer L; Thiruvahindrapuram, Bhooma; Patel, Rohan V; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A; Walker, Susan; Marshall, Christian R; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D’Abate, Lia; Chan, Ada JS; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R; Nalpathamkalam, Thomas; Sung, Wilson WL; Tsoi, Fiona J; Wei, John; Xu, Lizhen; Tasse, Anne-Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie MacKinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A; Parr, Jeremy R; Spence, Sarah J; Vorstman, Jacob; Frey, Brendan J; Robinson, James T; Strug, Lisa J; Fernandez, Bridget A; Elsabbagh, Mayada; Carter, Melissa T; Hallmayer, Joachim; Knoppers, Bartha M; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H; Glazer, David; Pletcher, Mathew T; Scherer, Stephen W
2017-01-01
We are performing whole genome sequencing (WGS) of families with Autism Spectrum Disorder (ASD) to build a resource, named MSSNG, to enable the sub-categorization of phenotypes and underlying genetic factors involved. Here, we report WGS of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible in a cloud platform, and through an internet portal with controlled access. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertion/deletions (indels) or copy number variations (CNVs) per ASD subject. We identified 18 new candidate ASD-risk genes such as MED13 and PHF3, and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (p = 6×10^-4). In 294/2,620 (11.2%) of ASD cases, a molecular basis could be determined and 7.2% of these carried CNV/chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD. PMID:28263302
Mellows, Andrew; Barnett, Ross; Dalén, Love; Sandoval-Castellanos, Edson; Linderholm, Anna; McGovern, Thomas H; Church, Mike J; Larson, Greger
2012-11-22
Previous studies have suggested that the presence of sea ice is an important factor in facilitating migration and determining the degree of genetic isolation among contemporary arctic fox populations. Because the extent of sea ice is dependent upon global temperatures, periods of significant cooling would have had a major impact on fox population connectivity and genetic variation. We tested this hypothesis by extracting and sequencing mitochondrial control region sequences from 17 arctic foxes excavated from two late-ninth-century to twelfth-century AD archaeological sites in northeast Iceland, both of which predate the Little Ice Age (approx. sixteenth to nineteenth century). Despite the fact that five haplotypes have been observed in modern Icelandic foxes, a single haplotype was shared among all of the ancient individuals. Results from simulations within an approximate Bayesian computation framework suggest that the rapid increase in Icelandic arctic fox haplotype diversity can only be explained by sea-ice-mediated fox immigration facilitated by the Little Ice Age.
Horizontal gene transfer of chromosomal Type II toxin-antitoxin systems of Escherichia coli.
Ramisetty, Bhaskar Chandra Mohan; Santhosh, Ramachandran Sarojini
2016-02-01
Type II toxin-antitoxin systems (TAs) are small autoregulated bicistronic operons that encode a toxin protein with the potential to inhibit metabolic processes and an antitoxin protein to neutralize the toxin. Most of the bacterial genomes encode multiple TAs. However, the diversity and accumulation of TAs on bacterial genomes and its physiological implications are highly debated. Here we provide evidence that Escherichia coli chromosomal TAs (encoding RNase toxins) are 'acquired' DNA likely originated from heterologous DNA and are the smallest known autoregulated operons with the potential for horizontal propagation. Sequence analyses revealed that integration of TAs into the bacterial genome is unique and contributes to variations in the coding and/or regulatory regions of flanking host genome sequences. Plasmids and genomes encoding identical TAs of natural isolates are mutually exclusive. Chromosomal TAs might play significant roles in the evolution and ecology of bacteria by contributing to host genome variation and by moderation of plasmid maintenance. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Yang, Yingjie; Ren, Jie; Zhang, Qizhu
2016-02-01
HPV-16 varies geographically and is correlated with cervical cancer genesis and progression. This study aimed to determine the distribution of HPV-16 E6/E7 genetic variation in patients with invasive cervical cancer or precancer in Guizhou Province, China. A case-control study was designed, and the distribution of HPV-16 E6/E7 genetic variation was compared among women with cervical cancer, women with precancer, and sexually active women without cervical lesions. HPV infection was detected through flow-through hybridization and gene chip techniques to determine the prevalence of HPV-16 E6/E7 genetic variation. Among 90 specimens (30 cervical cancer, 30 precancer, 30 controls), 81 were subjected to HPV-16 E6/E7 gene sequencing. The rates of DNA sequence mutation and amino acid mutation were 76.5% (62/81) and 66.7% (54/81), respectively. Both E6 and E7 genes showed a higher mutation rate than their prototypes. The prevalence of E6/E7 mutation differed significantly between the cervical cancer group and the controls (P < 0.05) and between the cervical precancer group and the controls (P < 0.05). Mutations were simultaneously detected at the E6-D32E (T96A) and E7-M28V (A82G)/L94P (T281C) sites of the amino acid sequence. The most common genetic variation was D32E/M28V/L94P, which accounted for 35.8% of the cases (29/81). The D32E/M28V/L94P mutation was more frequent in cervical cancer and precancer than the prototype. HPV-16 E6/E7 genetic variations, such as D32E/M28V/L94P, are more prevalent in cervical cancer or precancer than in the controls. The possible correlation between genetic variation and carcinogenesis may be used to design an HPV vaccine for cervical carcinoma. © 2015 Wiley Periodicals, Inc.
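The percentages in the abstract above follow directly from the reported counts out of 81 sequenced specimens. As a quick arithmetic check, the short sketch below recomputes them; the dictionary labels and the helper name are illustrative only, and the counts are the ones stated in the abstract.

```python
def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 1)


# Counts as reported in the abstract (out of 81 sequenced specimens).
SEQUENCED = 81
reported_counts = {
    "DNA sequence mutation": 62,
    "amino acid mutation": 54,
    "D32E/M28V/L94P variant": 29,
}

for label, count in reported_counts.items():
    print(f"{label}: {count}/{SEQUENCED} = {percent(count, SEQUENCED)}%")
```

The recomputed values agree with the 76.5%, 66.7% and 35.8% given in the text.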
MacCluskie, Margaret C.; Flint, Paul L.; Sedinger, James S.
1997-01-01
We investigated factors affecting incubation time and metabolic rates of Mallard (Anas platyrhynchos) eggs incubated under constant environmental conditions. Time required to reach the star-pipped stage of hatch varied significantly among females, but not with laying sequence or egg size. Metabolic rate of eggs varied positively with position in the laying sequence and tended to vary among females. Metabolic rate did not vary with egg volume or incubation length. Our results indicate metabolic rate may act as one synchronization mechanism for hatch. The role of maternal effects in development time should be considered in subsequent studies of incubation time in ducks.
Real-time myocardium segmentation for the assessment of cardiac function variation
NASA Astrophysics Data System (ADS)
Zoehrer, Fabian; Huellebrand, Markus; Chitiboi, Teodora; Oechtering, Thekla; Sieren, Malte; Frahm, Jens; Hahn, Horst K.; Hennemuth, Anja
2017-03-01
Recent developments in MRI enable the acquisition of image sequences with high spatio-temporal resolution. Cardiac motion can be captured without gating and triggering. Image size and contrast relations differ from conventional cardiac MRI cine sequences requiring new adapted analysis methods. We suggest a novel segmentation approach utilizing contrast invariant polar scanning techniques. It has been tested with 20 datasets of arrhythmia patients. The results do not differ significantly more between automatic and manual segmentations than between observers. This indicates that the presented solution could enable clinical applications of real-time MRI for the examination of arrhythmic cardiac motion in the future.
Natural Allelic Variations in Highly Polyploidy Saccharum Complex
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Jian; Yang, Xiping; Resende, Jr., Marcio F. R.
Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with a highly polyploid and complex genome. The Saccharum complex, comprising the Saccharum genus and a few related genera, is an important genetic resource for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding their allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variations in the Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variants and genotype calling was established. BWA-MEM and the sorghum genome proved to be an acceptable aligner and reference, respectively, for sugarcane target enrichment sequence analysis. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated as largely homozygous. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. Furthermore, the target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes.
Natural Allelic Variations in Highly Polyploidy Saccharum Complex
Song, Jian; Yang, Xiping; Resende, Jr., Marcio F. R.; ...
2016-06-08
Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with a highly polyploid and complex genome. The Saccharum complex, comprising the Saccharum genus and a few related genera, is an important genetic resource for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding their allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variations in the Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variants and genotype calling was established. BWA-MEM and the sorghum genome proved to be an acceptable aligner and reference, respectively, for sugarcane target enrichment sequence analysis. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated as largely homozygous. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. Furthermore, the target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes.
Bautista-de Los Santos, Quyen Melina; Schroeder, Joanna L; Blakemore, Oliver; Moses, Jonathan; Haffey, Mark; Sloan, William; Pinto, Ameet J
2016-03-01
High-throughput and deep DNA sequencing, particularly amplicon sequencing, is being increasingly utilized to reveal spatial and temporal dynamics of bacterial communities in drinking water systems. Whilst the sampling and methodological biases associated with PCR and sequencing have been studied in other environments, they have not been quantified for drinking water. These biases are likely to have the greatest effect on the ability to characterize subtle spatio-temporal patterns influenced by process/environmental conditions. In such cases, intra-sample variability may swamp any underlying small, systematic variation. To evaluate this, we undertook a study with replication at multiple levels including sampling sites, sample collection, PCR amplification, and high throughput sequencing of 16S rRNA amplicons. The variability inherent to the PCR amplification and sequencing steps is significant enough to mask differences between bacterial communities from replicate samples. This was largely driven by greater variability in detection of rare bacteria (relative abundance <0.01%) across PCR/sequencing replicates as compared to replicate samples. Despite this, we captured significant changes in bacterial community over diurnal time-scales and find that the extent and pattern of diurnal changes is specific to each sampling location. Further, we find diurnal changes in bacterial community arise due to differences in the presence/absence of the low abundance bacteria and changes in the relative abundance of dominant bacteria. Finally, we show that bacterial community composition is significantly different across sampling sites for time-periods during which there are typically rapid changes in water use. This suggests hydraulic changes (driven by changes in water demand) contribute to shaping the bacterial community in bulk drinking water over diurnal time-scales. Copyright © 2015 Elsevier Ltd. All rights reserved.
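The drinking water abstract above defines rare bacteria by a relative abundance below 0.01%. A minimal sketch of how such a threshold could be applied to per-taxon read counts from one sample follows; the function name, taxon labels and counts are hypothetical and do not come from the study.

```python
def rare_taxa(read_counts, threshold_percent=0.01):
    """Return {taxon: relative abundance in %} for detected taxa below the threshold."""
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("sample contains no reads")
    rare = {}
    for taxon, reads in read_counts.items():
        relative_abundance = 100 * reads / total_reads
        # Taxa with zero reads are undetected rather than rare, so skip them.
        if reads > 0 and relative_abundance < threshold_percent:
            rare[taxon] = relative_abundance
    return rare


# Hypothetical 16S rRNA amplicon read counts for a single sample.
sample_counts = {
    "taxon_A": 60000,
    "taxon_B": 39995,
    "taxon_C": 5,
    "taxon_D": 0,
}
for taxon, abundance in rare_taxa(sample_counts).items():
    print(f"{taxon}: {abundance:.4f}% of reads")
```

Comparing the set of rare taxa detected across PCR or sequencing replicates of the same sample is one simple way to see the replicate-level variability the abstract describes.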
Williams, Tony D.; Ames, Caroline E.; Kiparissis, Yiannis; Wynne-Edwards, Katherine E.
2005-01-01
We investigated the relationship between plasma and yolk oestrogens in laying female zebra finches (Taeniopygia guttata) by manipulating plasma oestradiol (E2) levels, via injection of oestradiol-17β, in a sequence-specific manner to maintain chronically high plasma levels for later-developing eggs (contrasting with the endogenous pattern of decreasing plasma E2 concentrations during laying). We report systematic variation in yolk oestrogen concentrations, in relation to laying sequence, similar to that widely reported for androgenic steroids. In sham-manipulated females, yolk E2 concentrations decreased with laying sequence. However, in E2-treated females plasma E2 levels were higher during the period of rapid yolk development of later-laid eggs, compared with control females. As a consequence, we reversed the laying-sequence-specific pattern of yolk E2: in E2-treated females, yolk E2 concentrations increased with laying-sequence. In general therefore, yolk E2 levels were a direct reflection of plasma E2 levels. However, in control females there was some inter-individual variability in the endogenous pattern of plasma E2 levels through the laying cycle which could generate variation in sequence-specific patterns of yolk hormone levels even if these primarily reflect circulating steroid levels. PMID:15695208
Prediction of phenotypes of missense mutations in human proteins from biological assemblies.
Wei, Qiong; Xu, Qifang; Dunbrack, Roland L
2013-02-01
Single nucleotide polymorphisms (SNPs) are the most frequent variation in the human genome. Nonsynonymous SNPs that lead to missense mutations can be neutral or deleterious, and several computational methods have been presented that predict the phenotype of human missense mutations. These methods use sequence-based and structure-based features in various combinations, relying on different statistical distributions of these features for deleterious and neutral mutations. One structure-based feature that has not been studied significantly is the accessible surface area within biologically relevant oligomeric assemblies. These assemblies are different from the crystallographic asymmetric unit for more than half of X-ray crystal structures. We find that mutations in the core of proteins or in the interfaces in biological assemblies are significantly more likely to be disease-associated than those on the surface of the biological assemblies. For structures with more than one protein in the biological assembly (whether the same sequence or different), we find the accessible surface area from biological assemblies provides a statistically significant improvement in prediction over the accessible surface area of monomers from protein crystal structures (P = 6e-5). When adding this information to sequence-based features such as the difference between wildtype and mutant position-specific profile scores, the improvement from biological assemblies is statistically significant but much smaller (P = 0.018). Combining this information with sequence-based features in a support vector machine leads to 82% accuracy on a balanced dataset of 50% disease-associated mutations from SwissVar and 50% neutral mutations from human/primate sequence differences in orthologous proteins. Copyright © 2012 Wiley Periodicals, Inc.
Barik, Suvakanta; SarkarDas, Shabari; Singh, Archita; Gautam, Vibhav; Kumar, Pramod; Majee, Manoj; Sarkar, Ananda K
2014-01-01
Similar to the majority of the microRNAs, mature miR166s are derived from multiple members of MIR166 genes (precursors) and regulate various aspects of plant development by negatively regulating their target genes (Class III HD-ZIP). The evolutionary conservation or functional diversification of miRNA166 family members remains elusive. Here, we show the phylogenetic relationships among MIR166 precursor and mature sequences from three diverse model plant species. Despite strong conservation, some mature miR166 sequences, such as ppt-miR166m, have undergone sequence variation. Critical sequence variation in ppt-miR166m has led to functional diversification, as it targets non-HD-ZIPIII gene transcript(s). MIR166 precursor sequences have diverged in a lineage-specific manner, and both precursors and mature osa-miR166i/j are highly conserved. Interestingly, polycistronic MIR166s were present in Physcomitrella and Oryza but not in Arabidopsis. The nature of cis-regulatory motifs on the upstream promoter sequences of MIR166 genes indicates their possible contribution to the functional variation observed among miR166 species. Copyright © 2013 Elsevier Inc. All rights reserved.
Liu, Siyang; Huang, Shujia; Rao, Junhua; Ye, Weijian; Krogh, Anders; Wang, Jun
2015-01-01
Comprehensive recognition of genomic variation in one individual is important for understanding disease and developing personalized medication and treatment. Many tools based on DNA re-sequencing exist for identification of single nucleotide polymorphisms, small insertions and deletions (indels) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population with high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction of population-scale pan-genomes. Our study also highlights the usefulness of the de novo assembly strategy for definition of genome structure.
Thompson, David N; Thompson, Vicki S; Schaller, Kastli D; Apel, William A; Reed, David W; Lacey, Jeffrey A
2013-04-30
Isolated and/or purified polypeptides and nucleic acid sequences encoding polypeptides from Alicyclobacillus acidocaldarius and variations thereof are provided. Further provided are methods of at least partially degrading xylotriose, xylobiose, and/or arabinofuranose-substituted xylan using isolated and/or purified polypeptides and nucleic acid sequences encoding polypeptides from Alicyclobacillus acidocaldarius and variations thereof.
USDA-ARS's Scientific Manuscript database
Copy number variations (CNVs) are large insertions, deletions or duplications in the genome that vary between members of a species and are known to affect a wide variety of phenotypic traits. In this study, we identified CNVs in a population of bulls using low coverage next-generation sequence data....
Restructuring of the Aquatic Bacterial Community by Hydric Dynamics Associated with Superstorm Sandy
Ulrich, Nikea; Rosenberger, Abigail; Brislawn, Colin; Wright, Justin; Kessler, Collin; Toole, David; Solomon, Caroline; Strutt, Steven; McClure, Erin
2016-01-01
ABSTRACT Bacterial community composition and longitudinal fluctuations were monitored in a riverine system during and after Superstorm Sandy to better characterize inter- and intracommunity responses associated with the disturbance associated with a 100-year storm event. High-throughput sequencing of the 16S rRNA gene was used to assess microbial community structure within water samples from Muddy Creek Run, a second-order stream in Huntingdon, PA, at 12 different time points during the storm event (29 October to 3 November 2012) and under seasonally matched baseline conditions. High-throughput sequencing of the 16S rRNA gene was used to track changes in bacterial community structure and divergence during and after Superstorm Sandy. Bacterial community dynamics were correlated to measured physicochemical parameters and fecal indicator bacteria (FIB) concentrations. Bioinformatics analyses of 2.1 million 16S rRNA gene sequences revealed a significant increase in bacterial diversity in samples taken during peak discharge of the storm. Beta-diversity analyses revealed longitudinal shifts in the bacterial community structure. Successional changes were observed, in which Betaproteobacteria and Gammaproteobacteria decreased in 16S rRNA gene relative abundance, while the relative abundance of members of the Firmicutes increased. Furthermore, 16S rRNA gene sequences matching pathogenic bacteria, including strains of Legionella, Campylobacter, Arcobacter, and Helicobacter, as well as bacteria of fecal origin (e.g., Bacteroides), exhibited an increase in abundance after peak discharge of the storm. This study revealed a significant restructuring of in-stream bacterial community structure associated with hydric dynamics of a storm event. IMPORTANCE In order to better understand the microbial risks associated with freshwater environments during a storm event, a more comprehensive understanding of the variations in aquatic bacterial diversity is warranted. 
This study investigated the bacterial communities during and after Superstorm Sandy to provide fine time point resolution of dynamic changes in bacterial composition. This study adds to the current literature by revealing the variation in bacterial community structure during the course of a storm. This study employed high-throughput DNA sequencing, which generated a deep analysis of inter- and intracommunity responses during a significant storm event. This study has highlighted the utility of applying high-throughput sequencing for water quality monitoring purposes, as this approach enabled a more comprehensive investigation of the bacterial community structure. Altogether, these data suggest a drastic restructuring of the stream bacterial community during a storm event and highlight the potential of high-throughput sequencing approaches for assessing the microbiological quality of our environment. PMID:27060115
2012-01-01
Saccharomyces cerevisiae CEN.PK113-7D is widely used for metabolic engineering and systems biology research in industry and academia. We sequenced, assembled, annotated and analyzed its genome. Single-nucleotide variations (SNVs), insertions/deletions (indels) and differences in genome organization compared to the reference strain S. cerevisiae S288C were analyzed. In addition to a few large deletions and duplications, nearly 3000 indels were identified in the CEN.PK113-7D genome relative to S288C. These differences were overrepresented in genes whose functions are related to transcriptional regulation and chromatin remodelling. Some of these variations were caused by unstable tandem repeats, suggesting an innate evolvability of the corresponding genes. Besides a previously characterized mutation in adenylate cyclase, the CEN.PK113-7D genome sequence revealed a significant enrichment of non-synonymous mutations in genes encoding components of the cAMP signalling pathway. Some phenotypic characteristics of the CEN.PK113-7D strain were explained by the presence of additional specific metabolic genes relative to S288C. In particular, the presence of the BIO1 and BIO6 genes correlated with the biotin prototrophy of CEN.PK113-7D. Furthermore, the copy number, chromosomal location and sequences of the MAL loci were resolved. The assembled sequence reveals that CEN.PK113-7D has a mosaic genome that combines characteristics of laboratory strains and wild-industrial strains. PMID:22448915
Blake, Jonathon; Riddell, Andrew; Theiss, Susanne; Gonzalez, Alexis Perez; Haase, Bettina; Jauch, Anna; Janssen, Johannes W. G.; Ibberson, David; Pavlinic, Dinko; Moog, Ute; Benes, Vladimir; Runz, Heiko
2014-01-01
Balanced chromosome abnormalities (BCAs) occur at a high frequency in healthy and diseased individuals, but cost-efficient strategies to identify BCAs and evaluate whether they contribute to a phenotype have not yet become widespread. Here we apply genome-wide mate-pair library sequencing to characterize structural variation in a patient with unclear neurodevelopmental disease (NDD) and complex de novo BCAs at the karyotype level. Nucleotide-level characterization of the clinically described BCA breakpoints revealed disruption of at least three NDD candidate genes (LINC00299, NUP205, PSMD14) that gave rise to abnormal mRNAs and could be assumed to be disease-causing. However, unbiased genome-wide analysis of the sequencing data for cryptic structural variation was key to revealing an additional submicroscopic inversion that truncates the schizophrenia- and bipolar disorder-associated brain transcription factor ZNF804A as an equally likely NDD-driving gene. Deep sequencing of fluorescent-sorted wild-type and derivative chromosomes confirmed the clinically undetected BCA. Moreover, deep sequencing further validated the high accuracy of mate-pair library sequencing for detecting structural variants larger than 10 kb, suggesting that this approach is powerful for clinical-grade genome-wide structural variant detection. Our study supports previous evidence for a role of ZNF804A in NDD and highlights the need for a more comprehensive assessment of structural variation in karyotypically abnormal individuals and patients with neurocognitive disease to avoid diagnostic deception. PMID:24625750
Goettel, Wolfgang; Xia, Eric; Upchurch, Robert; Wang, Ming-Li; Chen, Pengyin; An, Yong-Qiang Charles
2014-04-23
Variation in seed oil composition and content among soybean varieties is largely attributed to differences in transcript sequences and/or transcript accumulation of oil production related genes in seeds. Discovery and analysis of sequence and expression variations in these genes will accelerate soybean oil quality improvement. In an effort to identify these variations, we sequenced the transcriptomes of soybean seeds from nine lines varying in oil composition and/or total oil content. Our results showed that 69,338 distinct transcripts from 32,885 annotated genes were expressed in seeds. A total of 8,037 transcript expression polymorphisms and 50,485 transcript sequence polymorphisms (48,792 SNPs and 1,693 small Indels) were identified among the lines. Effects of the transcript polymorphisms on their encoded protein sequences and functions were predicted. The studies also provided independent evidence that the lack of FAD2-1A gene activity and a non-synonymous SNP in the coding sequence of FAB2C caused elevated oleic acid and stearic acid levels in soybean lines M23 and FAM94-41, respectively. As a proof-of-concept, we developed an integrated RNA-seq and bioinformatics approach to identify and functionally annotate transcript polymorphisms, and demonstrated its high effectiveness for discovery of genetic and transcript variations that result in altered oil quality traits. The collection of transcript polymorphisms coupled with their predicted functional effects will be a valuable asset for further discovery of genes, gene variants, and functional markers to improve soybean oil quality.
Chen, Weidong; Pan, Yongbo; Yu, Lingyu; Yang, Jun; Zhang, Wenjing
2017-01-01
Microeukaryotes play key roles in the structure and functioning of marine ecosystems. Little is known about the relative importance of the processes that drive planktonic and benthic microeukaryotic biogeography in subtropical offshore areas. This study compares the microeukaryotic community compositions (MCCs) from offshore waters (n = 12) and intertidal sediments (n = 12) around Xiamen Island, southern China, using high-throughput sequencing of 18S rDNA. This work further quantifies the relative contributions of spatial and environmental variables on the distribution of marine MCCs (including total, dominant, rare and conditionally rare taxa). Our results showed that planktonic and benthic MCCs were significantly different, and the benthic richness (6627 OTUs) was much higher than that for plankton (4044 OTUs) with the same sequencing effort. Further, we found that benthic MCCs exhibited a significant distance-decay relationship, whereas the planktonic communities did not. After removing two unique sites (N2 and N3), however, 72% variation in planktonic community was explained well by stochastic processes. More importantly, both the environmental and spatial factors played significant roles in influencing the biogeography of total and dominant planktonic and benthic microeukaryotic communities, although their relative effects on these community variations were different. However, a high proportion of unexplained variation in the rare taxa (78.1–97.4%) and conditionally rare taxa (49.0–81.0%) indicated that more complex mechanisms may influence the assembly of the rare subcommunity. These results demonstrate that patterns and processes in marine microeukaryotic community assembly differ among the different habitats (coastal water vs. intertidal sediment) and different communities (total, dominant, rare and conditionally rare microeukaryotes), and provide novel insight on the microeukaryotic biogeography and ecological mechanisms in coastal waters and intertidal sediments at local scale. PMID:29075237
Zhao, Jiaojiao; Huang, Li; Ren, Xiaoping; Pandey, Manish K; Wu, Bei; Chen, Yuning; Zhou, Xiaojing; Chen, Weigang; Xia, Youlin; Li, Zeqing; Luo, Huaiyong; Lei, Yong; Varshney, Rajeev K; Liao, Boshou; Jiang, Huifang
2017-01-01
Cultivated peanut (Arachis hypogaea L.) is an allotetraploid (AABB, 2n = 4x = 40) valued for its edible oil and digestible protein. Seed size and weight are important agronomic traits that significantly influence the yield and nutritional composition of peanut. However, the genetic basis of seed-related traits remains unclear. Association mapping is a powerful approach for quickly and efficiently exploring the genetic basis of important traits in plants. In this study, a total of 104 peanut accessions were used to identify molecular markers associated with seed-related traits using 554 single-locus simple sequence repeat (SSR) markers. Most of the accessions in the panel had no or only weak relatedness to one another. Linkage disequilibrium (LD) decayed within a genetic distance of 1 cM at the genome level, and LD in the B subgenome decayed faster than in the A subgenome. Large phenotypic variation was observed for four seed-related traits in the association panel. Using a mixed linear model accounting for population structure and kinship, a total of 30 SSR markers were significantly associated with the four seed-related traits (P < 1.81 × 10^-3) in different environments, explaining 11.22-32.30% of the phenotypic variation for each trait. The marker AHGA44686 was simultaneously and repeatedly associated with seed length and hundred-seed weight in multiple environments and explained a large share of the phenotypic variance (26.23-32.30%). The favorable alleles of the associated markers for each seed-related trait, and the optimal combinations of these alleles, were identified as significantly enhancing trait performance, revealing the potential of these markers for use in peanut breeding programs.
Identification and removal of low-complexity sites in allele-specific analysis of ChIP-seq data.
Waszak, Sebastian M; Kilpinen, Helena; Gschwind, Andreas R; Orioli, Andrea; Raghav, Sunil K; Witwicki, Robert M; Migliavacca, Eugenia; Yurovsky, Alisa; Lappalainen, Tuuli; Hernandez, Nouria; Reymond, Alexandre; Dermitzakis, Emmanouil T; Deplancke, Bart
2014-01-15
High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays. The R package absfilter for library clonality simulations and detection of amplification-biased sites is available from http://updepla1srv1.epfl.ch/waszaks/absfilter
Hysteretic energy prediction method for mainshock-aftershock sequences
NASA Astrophysics Data System (ADS)
Zhai, Changhai; Ji, Duofa; Wen, Weiping; Li, Cuihua; Lei, Weidong; Xie, Lili
2018-04-01
Structures located in seismically active regions may be subjected to mainshock-aftershock (MSAS) sequences. Strong aftershocks significantly affect the hysteretic energy demand of structures. The hysteretic energy, E_H,seq, is normalized by mass m and expressed in terms of the equivalent velocity, V_D,seq, to quantitatively investigate aftershock effects on the hysteretic energy of structures. The equivalent velocity, V_D,seq, is computed by analyzing the response time-history of an inelastic single-degree-of-freedom (SDOF) system with a varying vibration period subjected to 309 MSAS sequences. The present study selected two kinds of MSAS sequences, with one aftershock and two aftershocks, respectively. The aftershocks are scaled to maintain different relative intensities. The variation of the equivalent velocity, V_D,seq, is studied with respect to the ductility values, site conditions, relative intensities, number of aftershocks, hysteretic models, and damping ratios. The MSAS sequence with one aftershock exhibited a 10% to 30% hysteretic energy increase, whereas the MSAS sequence with two aftershocks presented a 20% to 40% hysteretic energy increase. Finally, a hysteretic energy prediction equation is proposed as a function of the vibration period, ductility value, and damping ratio to estimate hysteretic energy for mainshock-aftershock sequences.
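The normalization described in this abstract can be sketched numerically. The snippet below is only an illustration: it assumes the conventional energy-equivalent velocity, V_D = sqrt(2 E_H / m), which the abstract itself does not spell out, and the function names and input values are hypothetical.

```python
import math


def equivalent_velocity(hysteretic_energy, mass):
    """Equivalent velocity V_D for hysteretic energy E_H and mass m.

    Assumption: the conventional definition V_D = sqrt(2 * E_H / m);
    the abstract does not state the formula explicitly.
    """
    return math.sqrt(2.0 * hysteretic_energy / mass)


def relative_increase(mainshock_value, sequence_value):
    """Fractional increase of a sequence demand over the mainshock-only demand."""
    return (sequence_value - mainshock_value) / mainshock_value


# Hypothetical numbers, chosen only to exercise the functions.
v_d = equivalent_velocity(hysteretic_energy=50.0, mass=1.0)
gain = relative_increase(mainshock_value=100.0, sequence_value=125.0)
```

Under this definition the velocity scales with the square root of the energy, so a given percentage increase in hysteretic energy corresponds to a smaller percentage increase in V_D.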
2012-01-01
Abstract Introduction Pre-clinical data suggest p53-dependent anthracycline-induced apoptosis and p53-independent taxane activity. However, dedicated clinical research has not defined a predictive role for TP53 gene mutations. The aim of the current study was to retrospectively explore the prognostic and predictive values of TP53 somatic mutations in the BIG 02-98 randomized phase III trial in which women with node-positive breast cancer were treated with adjuvant doxorubicin-based chemotherapy with or without docetaxel. Methods The prognostic and predictive values of TP53 were analyzed in tumor samples by gene sequencing within exons 5 to 8. Patients were classified according to p53 protein status predicted from TP53 gene sequence, as wild-type (no TP53 variation or TP53 variations which are predicted not to modify p53 protein sequence) or mutant (p53 nonsynonymous mutations). Mutations were subcategorized as missense or truncating mutations. Survival analyses were performed using the Kaplan-Meier method and log-rank test. Cox-regression analysis was used to identify independent predictors of outcome. Results TP53 gene status was determined for 18% (520 of 2887) of the women enrolled in BIG 02-98. TP53 gene variations were found in 17% (90 of 520). Nonsynonymous p53 mutations, found in 16.3% (85 of 520), were associated with older age, ductal morphology, higher grade and hormone-receptor negativity. Of the nonsynonymous mutations, 12.3% (64 of 520) were missense and 3.6% were truncating (19 of 520). Only truncating mutations showed significant independent prognostic value, with an increased recurrence risk compared to patients with non-modified p53 protein (hazard ratio = 3.21, 95% confidence interval = 1.740 to 5.935, P = 0.0002). p53 status had no significant predictive value for response to docetaxel. Conclusions p53 truncating mutations were uncommon but associated with poor prognosis. No significant predictive role for p53 status was detected. 
Trial registration ClinicalTrials.gov NCT00174655 PMID:22551440
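For readers unfamiliar with the Kaplan-Meier method named in the Methods of this abstract, a minimal product-limit estimator is sketched below. It is a generic textbook illustration, not the trial's analysis code, and the follow-up times and event flags are invented toy values.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time of each patient
    events : 1 if the event (e.g. recurrence) was observed, 0 if censored
    """
    at_risk = len(times)
    event_counts = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # events plus censorings at each time
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = event_counts.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy data (hypothetical): five patients, two of them censored.
toy_curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
```

Censored patients never lower the curve directly; they only shrink the risk set used at later event times.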
Baker, Christa A.; Ma, Lisa; Casareale, Chelsea R.
2016-01-01
In many sensory pathways, central neurons serve as temporal filters for timing patterns in communication signals. However, how a population of neurons with diverse temporal filtering properties codes for natural variation in communication signals is unknown. Here we addressed this question in the weakly electric fish Brienomyrus brachyistius, which varies the time intervals between successive electric organ discharges to communicate. These fish produce an individually stereotyped signal called a scallop, which consists of a distinctive temporal pattern of ∼8–12 electric pulses. We manipulated the temporal structure of natural scallops during behavioral playback and in vivo electrophysiology experiments to probe the temporal sensitivity of scallop encoding and recognition. We found that presenting time-reversed, randomized, or jittered scallops increased behavioral response thresholds, demonstrating that fish's electric signaling behavior was sensitive to the precise temporal structure of scallops. Next, using in vivo intracellular recordings and discriminant function analysis, we found that the responses of interval-selective midbrain neurons were also sensitive to the precise temporal structure of scallops. Subthreshold changes in membrane potential recorded from single neurons discriminated natural scallops from time-reversed, randomized, and jittered sequences. Pooling the responses of multiple neurons improved the discriminability of natural sequences from temporally manipulated sequences. Finally, we found that single-neuron responses were sensitive to interindividual variation in scallop sequences, raising the question of whether fish may analyze scallop structure to gain information about the sender. Collectively, these results demonstrate that a population of interval-selective neurons can encode behaviorally relevant temporal patterns with millisecond precision. 
SIGNIFICANCE STATEMENT The timing patterns of action potentials, or spikes, play important roles in representing information in the nervous system. However, how these temporal patterns are recognized by downstream neurons is not well understood. Here we use the electrosensory system of mormyrid weakly electric fish to investigate how a population of neurons with diverse temporal filtering properties encodes behaviorally relevant input timing patterns, and how this relates to behavioral sensitivity. We show that fish are behaviorally sensitive to millisecond variations in natural, temporally patterned communication signals, and that the responses of individual midbrain neurons are also sensitive to variation in these patterns. In fact, the output of single neurons contains enough information to discriminate stereotyped communication signals produced by different individuals. PMID:27559179
Qin, Yanhong; Wang, Li; Zhang, Zhenchen; Qiao, Qi; Zhang, Desheng; Tian, Yuting; Wang, Shuang; Wang, Yongjiang; Yan, Zhaoling
2014-01-01
Background Sweet potato chlorotic stunt virus (SPCSV; family Closteroviridae, genus Crinivirus) features a large bipartite, single-stranded, positive-sense RNA genome. To date, only three complete genomic sequences of SPCSV can be accessed through GenBank. SPCSV was first detected in China in 2011, and only partial genomic sequences have been determined in the country. No report is available on the complete genomic sequence and genome structure of Chinese SPCSV isolates or on the genetic relationship between isolates from China and other countries. Methodology/Principal Findings The complete genomic sequences of five isolates from different areas in China were characterized. This study is the first to report the complete genome sequences of SPCSV from whitefly vectors. Genome structure analysis showed that isolates of WA and EA strains from China have the same coding protein as isolates Can181-9 and m2-47, respectively. Twenty cp genes and four RNA1 partial segments were sequenced and analyzed, and the nucleotide identities of complete genomic, cp, and RNA1 partial sequences were determined. Results indicated high conservation among strains and significant differences between WA and EA strains. Genetic analysis demonstrated that, except for isolates from Guangdong Province, SPCSVs from other areas belong to the WA strain. Genome organization analysis showed that the isolates in this study lack the p22 gene. Conclusions/Significance We presented the complete genome sequences of SPCSV in China. Comparison of nucleotide identities and genome structures between these isolates and previously reported isolates showed slight differences. The nucleotide identities of different SPCSV isolates showed high conservation among strains and significant differences between strains. All nine isolates in this study lacked the p22 gene. WA strains were more extensively distributed than EA strains in China. 
These data provide important insights into the molecular variation and genomic structure of SPCSV in China as well as genetic relationships among isolates from China and other countries. PMID:25170926
Dailey, Wendy A; Gryc, Wojciech; Garg, Pooja G; Drenser, Kimberly A
2015-09-01
To present the association between mutations affecting the Wnt-signaling receptor protein (FZD4), inherited vitreoretinopathies, and retinopathy of prematurity (ROP). Retrospective analysis of prospective samples at a tertiary referral center. Patients referred to our practice for management of a variety of pediatric vitreoretinopathies were offered participation in an ophthalmic biobank (421 participants with vitreoretinopathies were included in this study). Full-term healthy infants (n = 98) were recruited to the study as controls. Patients with various vitreoretinopathies were prospectively enrolled in an ophthalmic biobank, approved by the Human Investigation Committee at William Beaumont Hospital. Retrospective genetic analysis of the FZD4 gene was performed (Sanger sequencing). Participants with a diagnosis of familial exudative vitreoretinopathy (FEVR), Norrie disease, Coats' disease, bilateral persistent fetal vasculature, and ROP were reviewed for the presence of a FZD4 variant. Data retrieval included status of retinopathy (including staging when possible), gestational age (GA), birth weight (BW) (when available), and family and birth histories. The association of FZD4 variants with the presence of vitreoretinopathy. The sequence variation p.[P33S(;)P168S] is the most prevalent FZD4 variant and is statistically significant for ROP and FEVR (P = 4.6E-04 and P = 2.4E-03, respectively) compared with full-term newborns (P = 1.7E-01). In addition, infants expressing the sequence variation tended to have significantly lower BWs for respective GA (P = 0.04). This suggests that the FZD4 p.[P33S(;)P168S] variant may be a risk factor for retinopathy and restricted intrauterine growth. Testing for FZD4 gene mutations is useful in patients with suspected FEVR and ROP. The relatively high prevalence of the p.[P33S(;)P168S] variant in ROP and intrauterine growth restriction suggests that it also may be a marker for increased risk of developing ROP and preterm birth. 
Copyright © 2015 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
Gu, Hai Ting; Xie, Ping; Sang, Yan Fang; Wu, Zi Yi
2018-04-01
Abrupt change is an important form of dramatic variation in hydrological processes under global climate change, and recognizing it accurately is important both for understanding changes in hydrological processes and for practical hydrological and water resources work. Traditional change-point tests are unreliable near both ends of a sample, and the results of different methods are often inconsistent. To address this problem, we propose a comprehensive weighted recognition method for hydrological abrupt change, in which weights are assigned by comparing 12 commonly used change-point tests. The reliability of the method was verified by a Monte Carlo statistical test. The results showed that the efficiency of the 12 methods was influenced by factors including the coefficient of variation (Cv) and deviation coefficient (Cs) before the change point, the mean value difference coefficient, the Cv difference coefficient and the Cs difference coefficient, but had no significant relationship with the mean value of the sequence. Based on the performance of each method in the statistical test, a weight was assigned to each test method. The sliding rank sum test and the sliding run test had the highest weights, whereas the RS test had the lowest weight. When the results of the different methods are inconsistent, the change point with the largest comprehensive weight is selected as the final result. The method was used to analyze the daily maximum sequences (1-day, 3-day, 5-day, 7-day and 1-month) of Jiajiu station in the lower reaches of the Lancang River. The results showed that each sequence had an obvious jump in 2004, which agreed with the physical causes of hydrological process change and with water conservancy construction. The rationality and reliability of the proposed method were thus verified.
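The selection rule in this abstract (take the change point with the largest combined weight when the tests disagree) can be sketched as a weighted vote. Everything concrete below is an assumption made for illustration: the method names are shortened labels, the weights are invented, and the abstract does not say how agreement between nearby years is handled, so only exact matches are pooled here.

```python
from collections import defaultdict


def weighted_change_point(detections, weights):
    """Pick the candidate change point with the largest summed weight.

    detections : dict mapping method name -> detected change point,
                 or None if the method found no change point
    weights    : dict mapping method name -> weight of that method
    """
    score = defaultdict(float)
    for method, point in detections.items():
        if point is not None:
            score[point] += weights[method]
    if not score:
        return None
    return max(score, key=score.get)


# Hypothetical detections and weights for four of the tests.
detections = {"sliding_rank_sum": 2004, "sliding_run": 2004,
              "rs": 1998, "other_test": 1998}
weights = {"sliding_rank_sum": 0.30, "sliding_run": 0.30,
           "rs": 0.05, "other_test": 0.20}
best = weighted_change_point(detections, weights)
```

With these toy inputs the two highly weighted tests outvote the other two, so the combined result is 2004.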
Poon, Art F. Y; Kosakovsky Pond, Sergei L.; Bennett, Phil; Richman, Douglas D; Leigh Brown, Andrew J.; Frost, Simon D. W
2007-01-01
CD8+ cytotoxic T-lymphocytes (CTLs) perform a critical role in the immune control of viral infections, including those caused by human immunodeficiency virus type 1 (HIV-1) and hepatitis C virus (HCV). As a result, genetic variation at CTL epitopes is strongly influenced by host-specific selection for either escape from the immune response, or reversion due to the replicative costs of escape mutations in the absence of CTL recognition. Under strong CTL-mediated selection, codon positions within epitopes may immediately “toggle” in response to each host, such that genetic variation in the circulating virus population is shaped by rapid adaptation to immune variation in the host population. However, this hypothesis neglects the substantial genetic variation that accumulates in virus populations within hosts. Here, we evaluate this quantity for a large number of HIV-1– (n ≥ 3,000) and HCV-infected patients (n ≥ 2,600) by screening bulk RT-PCR sequences for sequencing “mixtures” (i.e., ambiguous nucleotides), which act as site-specific markers of genetic variation within each host. We find that nonsynonymous mixtures are abundant and significantly associated with codon positions under host-specific CTL selection, which should deplete within-host variation by driving the fixation of the favored variant. Using a simple model, we demonstrate that this apparently contradictory outcome can be explained by the transmission of unfavorable variants to new hosts before they are removed by selection, which occurs more frequently when selection and transmission occur on similar time scales. Consequently, the circulating virus population is shaped by the transmission rate and the disparity in selection intensities for escape or reversion as much as it is shaped by the immune diversity of the host population, with potentially serious implications for vaccine design. PMID:17397261
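The screening step described in this abstract, finding sequencing "mixtures" in bulk sequences, amounts to locating IUPAC ambiguity codes. The sketch below is a simplified illustration rather than the authors' pipeline; treating N as a failed base call instead of a mixture, and assuming the sequence starts in reading frame, are both assumptions.

```python
# IUPAC codes that stand for two or three possible nucleotides.
# Assumption: N is excluded because it usually marks an uncalled base.
AMBIGUITY_CODES = set("RYSWKMBDHV")


def mixture_positions(sequence):
    """Return 0-based nucleotide positions that carry an ambiguity code."""
    return [i for i, base in enumerate(sequence.upper())
            if base in AMBIGUITY_CODES]


def mixture_codons(sequence):
    """Return sorted 0-based codon indices containing at least one mixture.

    Assumes the sequence begins at the first position of a codon.
    """
    return sorted({i // 3 for i in mixture_positions(sequence)})


# Toy sequence (hypothetical): R and Y are mixtures, N is ignored.
positions = mixture_positions("ACGRTTYAN")
codons = mixture_codons("ACGRTTYAN")
```

Per-codon indices like these could then be compared with the codon positions of known CTL epitopes, which is the kind of association the abstract reports.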
Kuramae, Eiko E.; Hillekens, Remy; de Hollander, Mattias; Kiers, E. Toby; Röling, Wilfred F. M.; Kowalchuk, George A.; van der Heijden, Marcel G. A.
2012-01-01
The cultivation of genetically modified (GM) crops has increased significantly over the last decades. However, concerns have been raised that some GM traits may negatively affect beneficial soil biota, such as arbuscular mycorrhizal fungi (AMF), potentially leading to alterations in soil functioning. Here, we test two maize varieties expressing the Bacillus thuringiensis Cry1Ab endotoxin (Bt maize) for their effects on soil AM fungal communities. We target both fungal DNA and RNA, which is new for AM fungi, and we use two strategies as an inclusive and robust way of detecting community differences: (i) 454 pyrosequencing using general fungal rRNA gene-directed primers and (ii) terminal restriction fragment length polymorphism (T-RFLP) profiling using AM fungus-specific markers. Potential GM-induced effects were compared to the normal natural variation of AM fungal communities across 15 different agricultural fields. AM fungi were found to be abundant in the experiment, accounting for 8% and 21% of total recovered DNA- and RNA-derived fungal sequences, respectively, after 104 days of plant growth. RNA- and DNA-based sequence analyses yielded most of the same AM fungal lineages. Our research yielded three major conclusions. First, no consistent differences were detected between AM fungal communities associated with GM plants and non-GM plants. Second, temporal variation in AMF community composition (between two measured time points) was bigger than GM trait-induced variation. Third, natural variation of AMF communities across 15 agricultural fields in The Netherlands, as well as within-field temporal variation, was much higher than GM-induced variation. In conclusion, we found no indication that Bt maize cultivation poses a risk for AMF. PMID:22885748
Genetic variants of neurotransmitter-related genes and miRNAs in Egyptian autistic patients.
Salem, Ahmed M; Ismail, Samira; Zarouk, Waheba A; Abdul Baky, Olwya; Sayed, Ahmed A; Abd El-Hamid, Sawsan; Salem, Sohair
2013-01-01
Autism is a neurodevelopmental disorder with indisputable evidence for a genetic component. This work studied the association of autism with genetic variations in neurotransmitter-related genes, including MAOA uVNTR, MAOB rs1799836, and DRD2 TaqI A, in 53 autistic patients and 30 healthy individuals. The study also analyzed sequence variations of miR-431 and miR-21. MAOA uVNTR was genotyped by PCR, MAOB and DRD2 polymorphisms were analyzed by PCR-based RFLP, and miR-431 and miR-21 were sequenced. The low-expressing allele of MAOA uVNTR was more frequent in female patients than in controls (OR = 2.25). The MAOB G allele frequency was significantly higher in autistic patients than in controls (P < 0.001 for both males and females). The DRD2 A1+ genotype increased autism risk (OR = 5.1). Severity of autism tended to be slightly affected by MAOA/B genotype. Plasma MAOB activity was significantly lower in males carrying the G allele than in those carrying the A allele. There was no significant difference in patient or maternal plasma MAOA/B activity compared to controls. Neither mutations nor SNPs in miR-431 and miR-21 were found among the studied patients. This study sheds light on some neurotransmitter-related genes, suggesting a potential role in autism pathogenesis that warrants further study.
Consensus generation and variant detection by Celera Assembler.
Denisov, Gennady; Walenz, Brian; Halpern, Aaron L; Miller, Jason; Axelrod, Nelson; Levy, Samuel; Sutton, Granger
2008-04-15
We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms. When applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the 2,033,311 detected regions of sequence variation. In 33,269 out of 460,373 detected regions of size >1 bp, it fixes the constructed errors of a mosaic haploid representation of a diploid locus as produced by the original Celera Assembler consensus algorithm. Using an optimized procedure calibrated against 1,506,344 known SNPs, it detects 438,814 new heterozygous SNPs with a false positive rate of 12%. The open source code is available at: http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/
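The contrast between column-by-column consensus and window-based allele detection can be illustrated with a toy example. The sketch below is not the Celera Assembler code: it assumes gap-free reads given as hypothetical (offset, sequence) pairs, a fixed window, and a simple read-count threshold for confirming an allele.

```python
from collections import Counter


def column_consensus(reads, start, end):
    """Majority base per column over [start, end), each column taken independently."""
    out = []
    for pos in range(start, end):
        column = Counter(
            seq[pos - offset]
            for offset, seq in reads
            if offset <= pos < offset + len(seq)
        )
        out.append(column.most_common(1)[0][0])
    return "".join(out)


def window_alleles(reads, start, end, min_support=2):
    """Haplotype strings over [start, end) seen in at least min_support spanning reads."""
    counts = Counter(
        seq[start - offset:end - offset]
        for offset, seq in reads
        if offset <= start and offset + len(seq) >= end
    )
    return [allele for allele, n in counts.most_common() if n >= min_support]


# Two alleles (ACGTA and ATGCA) plus partial reads that tip single columns.
reads = [
    (0, "ACGTA"), (0, "ACGTA"),
    (0, "ATGCA"), (0, "ATGCA"),
    (0, "AC"), (3, "CA"),
]
print(column_consensus(reads, 1, 4))        # CGC, a mosaic found in no spanning read
print(sorted(window_alleles(reads, 1, 4)))  # ['CGT', 'TGC']
```

Here the per-column majority yields a string that matches neither allele, whereas grouping the reads that span the whole window recovers both haplotypes.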
Ross, Callum F; Iriarte-Diaz, Jose; Reed, David A; Stewart, Thomas A; Taylor, Andrea B
2016-09-01
It has been hypothesized that mandibular corpus morphology of primates is related to the material properties of the foods that they chew. However, chewing foods with different material properties is accompanied by low levels of variation in mandibular strain patterns in macaques. We hypothesized that if variation in primate mandible form reflects adaptations to feeding on foods with different material and geometric properties, then this variation will be driven primarily by differences in oral food processing behavior rather than differences in chewing per se. To test this hypothesis, we recorded in vivo bone strain data from the lateral and medial surfaces of the mandibular corpus during complete feeding sequences in three adult male Sapajus as they fed on foods with a range of sizes and material properties. We assessed whether variation in mandibular corpus strain regimes is associated with variation in feeding behaviors and/or chewing on different foods, and we quantified the relative variation in mandibular corpus strain regimes associated with chewing on foods of different material properties versus a range of oral food processing behaviors (incisor, premolar, and molar biting; pulling on incisors; mastication). Feeding behavior had a significant effect on mandibular corpus strain regimes, as did chewing side and the cycle number in a feeding sequence. However, food type had weaker effects and usually only through interaction effects with chewing side and/or cycle type. Strain regimes varied most across different chew sides, then across different behaviors, and lastly between mastication cycles on different foods. Strain magnitudes associated with premolar, molar, and incisor biting were larger than those recorded during mastication. 
These data suggest that intra- and inter-specific variation in mandible morphology is a trade-off between performance requirements of different oral food processing behaviors and of variation in chewing side, with direct effects of food type being less important. Copyright © 2016 Elsevier Ltd. All rights reserved.
Xue, Angli; Wang, Hongcheng; Zhu, Jun
2017-09-28
Startle behavior is important for survival, and abnormal startle responses are related to several neurological diseases. Drosophila melanogaster provides a powerful system to investigate the genetic underpinnings of variation in startle behavior, since mechanically induced startle responses can be readily quantified and environmental conditions can be precisely controlled. The 156 wild-derived, fully sequenced lines of the Drosophila Genetic Reference Panel (DGRP) were used to identify SNPs and transcripts associated with variation in startle behavior. The results validated highly significant effects of 33 quantitative trait SNPs (QTSs) and 81 quantitative trait transcripts (QTTs) directly associated with phenotypic variation of startle response. We also detected QTT variation controlled by 20 QTSs (tQTSs) and 73 transcripts (tQTTs). Association mapping based on genomic and transcriptomic data enabled us to construct a complex genetic network that underlies variation in startle behavior. Based on principles of evolutionary conservation, human orthologous genes could be superimposed on this network. This study provided both genetic and biological insights into the variation of startle response behavior of Drosophila melanogaster, and highlighted the importance of genetic networks for understanding the genetic architecture of complex traits.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wong, O; Lo, G; Yuan, J
Purpose: There is growing interest in applying MR-simulators (MR-sim) in radiotherapy, but MR images are subject to hardware-, patient- and pulse-sequence-dependent geometric distortion that may influence target definition. This study aimed to evaluate the influence of two T1-weighted (T1w) MR sequences on head-and-neck tissue delineation, in terms of positional and volumetric variability, on a 1.5T MR-sim. Methods: Four healthy volunteers were scanned (4 scans each, on different days) using both spin-echo (3DCUBE, TR/TE=500/14ms, TA=183s) and gradient-echo (3DFSPGR, TR/TE=7/4ms, TA=173s) sequences with identical coverage, voxel size (0.8×0.8×1.0mm3), receiver bandwidth (62.5kHz/pix) and geometric correction on a 1.5T MR-sim, immobilized with a personalized thermoplastic cast and head-rest. Under this setting, similar T1w contrast and signal-to-noise ratio were obtained, and factors other than sequence that might bias image distortion and tissue delineation were minimized. VOIs of the parotid glands (PGR, PGL), pituitary gland (PIT) and eyeballs (EyeL, EyeR) were carefully drawn, and the inter-scan coefficient of variation (CV) of VOI centroid position and volume was calculated for each subject. Mean and standard deviation (SD) of the CVs for the four subjects were compared between sequences using the Wilcoxon rank-sum test. Results: The mean positional (<4%) and volumetric (<7%) CVs varied between tissues, depending mainly on inherent tissue properties such as volume, location, mobility and deformability. A smaller mean volumetric CV was found with 3DCUBE, probably because it is less prone to tissue susceptibility effects, but only PGL showed a significant difference (P<0.05). Positional CVs showed no significant differences between sequences for any VOI (P>0.05), suggesting that volumetric variation might be more sensitive to sequence-dependent delineation differences. Conclusion: Although 3DCUBE is considered less prone to tissue susceptibility-induced artifact and distortion, our preliminary data showed that the two sequences did not differ significantly in positional and volumetric CV in most head-and-neck tissues, except for PGL. This study is mainly limited by its small sample size. Influences of image contrast (T1w vs. T2w) and inter-observer differences have to be further investigated.
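The inter-scan coefficient of variation used in this abstract is the spread of a repeated measurement relative to its mean. A minimal sketch, assuming the sample standard deviation is intended (the abstract does not say which estimator was used); the volumes are hypothetical values, not data from the study:

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Inter-scan CV in percent: sample standard deviation divided by the mean."""
    if len(values) < 2:
        raise ValueError("at least two repeated measurements are required")
    return 100.0 * stdev(values) / mean(values)


# Hypothetical parotid gland volumes (cm^3) from four scans of one volunteer.
volumes = [25.1, 24.3, 26.0, 24.8]
print(f"volumetric CV = {coefficient_of_variation(volumes):.2f}%")
```

One CV is obtained per subject, VOI and sequence; the per-subject CVs of the two sequences are then the inputs to the rank-sum comparison.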
Artificial mismatch hybridization
Guo, Zhen; Smith, Lloyd M.
1998-01-01
An improved nucleic acid hybridization process is provided which employs a modified oligonucleotide and improves the ability to discriminate a control nucleic acid target from a variant nucleic acid target containing a sequence variation. The modified probe contains at least one artificial mismatch relative to the control nucleic acid target in addition to any mismatch(es) arising from the sequence variation. The invention has direct and advantageous application to numerous existing hybridization methods, including applications that employ, for example, the polymerase chain reaction, allele-specific nucleic acid sequencing methods, and diagnostic hybridization methods.
Tandemly repeated sequences in mtDNA control region of whitefish, Coregonus lavaretus.
Brzuzan, P
2000-06-01
Length variation of the mitochondrial DNA control region was observed with PCR amplification of a sample of 138 whitefish (Coregonus lavaretus). Nucleotide sequences of representative PCR products showed that the variation was due to the presence of an approximately 100-bp motif tandemly repeated two, three, or five times in the region between the conserved sequence block-3 (CSB-3) and the gene for phenylalanine tRNA. This is the first report on the tandem array composed of long repeat units in mitochondrial DNA of salmonids.
NASA Astrophysics Data System (ADS)
Antoine, Pierre; Rousseau, Denis-Didier; Degeai, Jean-Philippe; Moine, Olivier; Lagroix, France; Kreutzer, Sebastian; Fuchs, Markus; Hatté, Christine; Gauthier, Caroline; Svoboda, Jiri; Lisá, Lenka
2013-05-01
High-resolution multidisciplinary investigations of key European loess-palaeosol profiles have demonstrated that loess sequences result from rapid and cyclic aeolian sedimentation, which is reflected in variations of loess grain-size indexes and correlated with Greenland ice-core dust records. This correlation suggests a global connection between North Atlantic and west-European air masses. Herein, we present a revised stratigraphy and a continuous high-resolution record of grain size, magnetic susceptibility and organic carbon δ13C of the famous Dolní Vestonice (DV) loess sequence in the Moravian region of the Czech Republic. A new set of quartz OSL ages provides a reliable and accurate chronology of the sequence's main pedosedimentary events. The grain-size record shows strongly contrasting variations with numerous abrupt coarse-grained events, especially in the upper part of the sequence between ca 20-30 ka. This time period is also characterised by a progressive coarsening of the loess deposits, as already observed in other western European sequences. The base of the DV sequence exhibits an exceptionally well-preserved soil complex composed of three chernozem soil horizons and five aeolian silt layers (marker silts). This complex is, at present, the most complete record of environmental variations and dust deposition in the European loess belt for the Weichselian Early-glacial period spanning about 110 to 70 ka, allowing correlations with various global palaeoclimatic records. OSL ages combined with sedimentological and palaeopedological observations lead to the conclusion that this soil complex recorded all of the main climatic events expressed in the North GRIP record from Greenland Interstadials (GIS) 25 to 19.
A Laboratory Exercise for Genotyping Two Human Single Nucleotide Polymorphisms
ERIC Educational Resources Information Center
Fernando, James; Carlson, Bradley; LeBard, Timothy; McCarthy, Michael; Umali, Finianne; Ashton, Bryce; Rose, Ferrill F., Jr.
2016-01-01
The dramatic decrease in the cost of sequencing a human genome is leading to an era in which a wide range of students will benefit from having an understanding of human genetic variation. Since over 90% of sequence variation between humans is in the form of single nucleotide polymorphisms (SNPs), a laboratory exercise has been devised in order to…
RSAT 2018: regulatory sequence analysis tools 20th anniversary.
Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques; Medina-Rivera, Alejandra; Thomas-Chollier, Morgane
2018-05-02
RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools to the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.
VARiD: a variation detection framework for color-space and letter-space platforms.
Dalca, Adrian V; Rumble, Stephen M; Levy, Samuel; Brudno, Michael
2010-06-15
High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates, and while most HTS technologies sequence the residues of the genome directly, generating base calls for each position, the Applied Biosystem's SOLiD platform generates dibase-coded (color space) sequences. While combining data from the various platforms should increase the accuracy of variation detection, to date there are only a few tools that can identify variants from color space data, and none that can analyze color space and regular (letter space) data together. We present VARiD--a probabilistic method for variation detection from both letter- and color-space reads simultaneously. VARiD is based on a hidden Markov model and uses the forward-backward algorithm to accurately identify heterozygous, homozygous and tri-allelic SNPs, as well as micro-indels. Our analysis shows that VARiD performs better than the AB SOLiD toolset at detecting variants from color-space data alone, and improves the calls dramatically when letter- and color-space reads are combined. The toolset is freely available at http://compbio.cs.utoronto.ca/varid.
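The dibase (color-space) coding mentioned in this abstract can be sketched with the usual two-bit description, in which each color is the XOR of the codes of two adjacent bases. This illustrates the encoding only, not VARiD's hidden Markov model; the function names and the example sequences are hypothetical.

```python
# Two-bit base codes; the color of a dinucleotide is the XOR of its two codes.
_BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_color_space(seq):
    """Translate a letter-space sequence into its list of colors (0-3)."""
    return [_BASE_CODE[a] ^ _BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def changed_colors(ref, alt):
    """Indices at which the color encodings of two equal-length sequences differ."""
    pairs = zip(to_color_space(ref), to_color_space(alt))
    return [i for i, (x, y) in enumerate(pairs) if x != y]


reference = "ATGCA"
variant = "ATACA"  # single substitution G -> A at index 2
print(to_color_space(reference))           # [3, 1, 3, 1]
print(to_color_space(variant))             # [3, 3, 1, 1]
print(changed_colors(reference, variant))  # [1, 2]
```

Because every base contributes to two overlapping colors, an isolated substitution changes two adjacent colors, whereas a single changed color is more likely a sequencing error. This is why color-space reads need dedicated variant callers.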
Kelleher, Raymond J; Geigenmüller, Ute; Hovhannisyan, Hayk; Trautman, Edwin; Pinard, Robert; Rathmell, Barbara; Carpenter, Randall; Margulies, David
2012-01-01
Identification of common molecular pathways affected by genetic variation in autism is important for understanding disease pathogenesis and devising effective therapies. Here, we test the hypothesis that rare genetic variation in the metabotropic glutamate-receptor (mGluR) signaling pathway contributes to autism susceptibility. Single-nucleotide variants in genes encoding components of the mGluR signaling pathway were identified by high-throughput multiplex sequencing of pooled samples from 290 non-syndromic autism cases and 300 ethnically matched controls on two independent next-generation platforms. This analysis revealed significant enrichment of rare functional variants in the mGluR pathway in autism cases. Higher burdens of rare, potentially deleterious variants were identified in autism cases for three pathway genes previously implicated in syndromic autism spectrum disorder, TSC1, TSC2, and SHANK3, suggesting that genetic variation in these genes also contributes to risk for non-syndromic autism. In addition, our analysis identified HOMER1, which encodes a postsynaptic density-localized scaffolding protein that interacts with Shank3 to regulate mGluR activity, as a novel autism-risk gene. Rare, potentially deleterious HOMER1 variants identified uniquely in the autism population affected functionally important protein regions or regulatory sequences and co-segregated closely with autism among children of affected families. We also identified rare ASD-associated coding variants predicted to have damaging effects on components of the Ras/MAPK cascade. Collectively, these findings suggest that altered signaling downstream of mGluRs contributes to the pathogenesis of non-syndromic autism. PMID:22558107
Sauber, Jeannine; Grothe, Jessica; Behm, Maria; Scherag, André; Grallert, Harald; Illig, Thomas; Hinney, Anke; Hebebrand, Johannes; Wiegand, Susanna; Grüters, Annette; Krude, Heiko; Biebermann, Heike
2010-08-01
In the past 20 years, obesity has become a major health problem due to associated diseases like type 2 diabetes mellitus. The gastric inhibitory polypeptide receptor (GIPR) modulates body weight and glucose homeostasis and, therefore, represents an interesting candidate gene for obesity and the comorbidity impaired glucose homeostasis. Recently, a GIPR variation was found to be associated with impaired insulin response in humans. In this study, we screened the GIPR gene for mutations and examined the association between three single-nucleotide polymorphisms (SNPs; rs8111428, rs2302382, rs1800437) and childhood obesity, as well as impaired glucose homeostasis. The coding region of the GIPR was screened for mutations by direct sequencing. We genotyped three known SNPs in 2280 healthy normal weight (1696) and obese (584) children and adolescents. Genotyping was performed using the SNaPshot protocol, the iplex, and matrix-assisted laser desorption ionization time-of-flight spectrometry technique. Obesity was defined by a body mass index SDS above 2; homeostatic model assessment was calculated. No evidence for an association was found between the SNPs and the obesity phenotype. Significant association was found between the minor allele C of the SNP rs1800437 and elevated homeostasis model of insulin resistance values (P=0.001). No further sequence variations in the GIPR were found to be associated with childhood obesity. Variations of the GIPR sequence are not associated with childhood obesity. This study points to a potential role for rs1800437 in glucose homeostasis. Further studies are necessary to confirm these results.
Does the Genetic Code Have A Eukaryotic Origin?
Zhang, Zhang; Yu, Jun
2013-01-01
In the RNA world, RNA is assumed to be the dominant macromolecule performing most, if not all, core “house-keeping” functions. The ribo-cell hypothesis suggests that the genetic code and the translation machinery may both be born of the RNA world, and the introduction of DNA to ribo-cells may take over the informational role of RNA gradually, such as a mature set of genetic code and mechanism enabling stable inheritance of sequence and its variation. In this context, we modeled the genetic code in two content variables (GC and purine contents) of protein-coding sequences and measured the purine content sensitivities for each codon when the sensitivity (% usage) is plotted as a function of GC content variation. The analysis leads to a new pattern, the symmetric pattern, where the sensitivity of purine content variation shows diagonal symmetry in the codon table, more significantly in the two GC content invariable quarters, in addition to the two existing patterns where the table is divided into either four GC content sensitivity quarters or two amino acid diversity halves. The most insensitive codon sets are GUN (valine) and CAN (CAR for asparagine and CAY for aspartic acid) and the most biased amino acid is valine (always over-estimated) followed by alanine (always under-estimated). The unique position of valine and its codons suggests its key roles in the final recruitment of the complete codon set of the canonical table. The distinct choice may only be attributable to sequence signatures or signals of splice sites for spliceosomal introns shared by all extant eukaryotes. PMID:23402863
Mangrauthia, Satendra K; Malathi, P; Agarwal, Surekha; Ramkumar, G; Krishnaveni, D; Neeraja, C N; Madhav, M Sheshu; Ladhalakshmi, D; Balachandran, S M; Viraktamath, B C
2012-06-01
Rice tungro disease, one of the major constraints to rice production in South and Southeast Asia, is caused by a combination of two viruses: Rice tungro spherical virus (RTSV) and Rice tungro bacilliform virus (RTBV). The present study was undertaken to determine the genetic variation of the RTSV population present in tungro-endemic states of the Indian subcontinent. Phylogenetic analysis based on coat protein sequences showed distinct divergence of Indian RTSV isolates into two groups: one consisted of isolates from Hyderabad (Andhra Pradesh), Cuttack (Orissa), and Puducherry, and the other of isolates from West Bengal, Coimbatore (Tamil Nadu), and Kanyakumari (Tamil Nadu). The results obtained from the phylogenetic study were further supported by SNP (single nucleotide polymorphism), INDEL (insertion and deletion) and evolutionary distance analyses. In addition, the sequence difference count matrix revealed 2-68 nucleotide differences among all the Indian RTSV isolates included in this study. However, at the protein level these differences were not significant, as revealed by Ka/Ks ratio calculation. Sequence identity at the nucleotide and amino acid levels was 92-100% and 97-100%, respectively, among Indian isolates of RTSV. Understanding the population structure of RTSV from tungro-endemic regions of India would potentially provide insights into the molecular diversification of this virus.
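A sequence difference count matrix and percent identity, as reported in this abstract, can be computed from aligned sequences by counting mismatched columns. A minimal sketch, assuming gap-free alignments of equal length; the isolate names and sequences are hypothetical toy data, not RTSV coat protein sequences:

```python
from itertools import combinations


def difference_counts(aligned):
    """Pairwise mismatch counts for a dict of name -> aligned sequence."""
    counts = {}
    for (name1, seq1), (name2, seq2) in combinations(aligned.items(), 2):
        if len(seq1) != len(seq2):
            raise ValueError("sequences must be aligned to the same length")
        counts[(name1, name2)] = sum(a != b for a, b in zip(seq1, seq2))
    return counts


def percent_identity(seq1, seq2):
    """Percentage of aligned positions at which two sequences agree."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(a == b for a, b in zip(seq1, seq2)) / len(seq1)


isolates = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "ACCTACGA"}
print(difference_counts(isolates))
print(percent_identity(isolates["iso1"], isolates["iso2"]))  # 87.5
```

Real analyses also have to decide how gaps and ambiguous bases are counted, which this sketch ignores.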
Finnerty, J R; Block, B A
1992-06-01
We were able to differentiate between species of billfish (Istiophoridae family) and to detect considerable intraspecific variation in the blue marlin (Makaira nigricans) by directly sequencing a polymerase chain reaction (PCR)-amplified, 612-bp fragment of the mitochondrial cytochrome b gene. Thirteen variable nucleotide sites separated blue marlin (n = 26) into 7 genotypes. On average, these genotypes differed by 5.7 base substitutions. A smaller sample of swordfish from an equally broad geographic distribution displayed relatively little intraspecific variation, with an average of 1.3 substitutions separating different genotypes. A cladistic analysis of blue marlin cytochrome b variants indicates two major divergent evolutionary lines within the species. The frequencies of these two major evolutionary lines differ significantly between Atlantic and Pacific ocean basins. This finding is important given that the Atlantic stocks of blue marlin are considered endangered. Migration from the Pacific can help replenish the numbers of blue marlin in the Atlantic, but the loss of certain mitochondrial DNA haplotypes in the Atlantic due to overfishing probably could not be remedied by an influx of Pacific fish because of their absence in the Pacific population. Fishery management strategies should attempt to preserve the genetic diversity within the species. The detection of DNA sequence polymorphism indicates the utility of PCR technology in pelagic fishery genetics.
Hand, Melanie L.; Spangenberg, German C.; Forster, John W.; Cogan, Noel O. I.
2013-01-01
Chloroplast genome sequences are of broad significance in plant biology, due to frequent use in molecular phylogenetics, comparative genomics, population genetics, and genetic modification studies. The present study used a second-generation sequencing approach to determine and assemble the plastid genomes (plastomes) of four representatives from the agriculturally important Lolium-Festuca species complex of pasture grasses (Lolium multiflorum, Festuca pratensis, Festuca altissima, and Festuca ovina). Total cellular DNA was extracted from either roots or leaves, was sequenced, and the output was filtered for plastome-related reads. A comparison between sources revealed fewer plastome-related reads from root-derived template but an increase in incidental bacterium-derived sequences. Plastome assembly and annotation indicated high levels of sequence identity and a conserved organization and gene content between species. However, frequent deletions within the F. ovina plastome appeared to contribute to a smaller plastid genome size. Comparative analysis with complete plastome sequences from other members of the Poaceae confirmed conservation of most grass-specific features. Detailed analysis of the rbcL–psaI intergenic region, however, revealed a “hot-spot” of variation characterized by independent deletion events. The evolutionary implications of this observation are discussed. The complete plastome sequences are anticipated to provide the basis for potential organelle-specific genetic modification of pasture grasses. PMID:23550121
What Advances Are Being Made in DNA Sequencing?
... to identify genetic variations; both methods rely on new technologies that allow rapid sequencing of large amounts of ... describes the different sequencing technologies and what the new technologies have meant for the study of the genetic ...
Bashir, Ali; Bansal, Vikas; Bafna, Vineet
2010-06-18
Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The multitude of sequencing platforms, each with their unique characteristics, pose a number of design challenges, regarding the technology to be used and the depth of sequencing required for a particular sequencing application. Here we describe a number of analytical and empirical results to address design questions for two applications: detection of structural variations from paired-end sequencing and estimating mRNA transcript abundance. For structural variation, our results provide explicit trade-offs between the detection and resolution of rearrangement breakpoints, and the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short read data, these predictions show good concordance with Illumina 200 bp and 2 Kbp insert length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. With only 1 Million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect low expressed genes with greater than 95% probability. Together, our results form a generic framework for many design considerations related to high-throughput sequencing. 
We provide software tools (http://bix.ucsd.edu/projects/NGS-DesignTools) to derive platform-independent guidelines for designing sequencing experiments (amount of sequencing, choice of insert length, mix of libraries) for novel applications of next-generation sequencing.
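The depth question for transcriptome sequencing can be illustrated with a much simpler model than the one in the paper. The sketch below assumes that every read independently comes from the transcript of interest with a fixed probability p, so that the transcript is detected once at least one read hits it; the function names are hypothetical and the authors' corrections for the expression distribution are not reproduced.

```python
import math


def detection_probability(p, n_reads):
    """Probability that at least one of n_reads independent reads hits the transcript."""
    return -math.expm1(n_reads * math.log1p(-p))


def reads_needed(p, detect_prob=0.95):
    """Smallest read count giving at least detect_prob chance of one or more hits."""
    if not 0.0 < p < 1.0 or not 0.0 < detect_prob < 1.0:
        raise ValueError("p and detect_prob must lie strictly between 0 and 1")
    return math.ceil(math.log1p(-detect_prob) / math.log1p(-p))


# A transcript contributing one read per million on average.
n = reads_needed(1e-6)
print(n, round(detection_probability(1e-6, n), 4))
```

Under this model the required depth scales roughly as 3/p for 95% detection, which explains why rare transcripts dominate the sequencing budget.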
A candidate gene for choanal atresia in alpaca.
Reed, Kent M; Bauer, Miranda M; Mendoza, Kristelle M; Armién, Aníbal G
2010-03-01
Choanal atresia (CA) is a common nasal craniofacial malformation in New World domestic camelids (alpaca and llama). CA results from abnormal development of the nasal passages and is especially debilitating to newborn crias. CA in camelids shares many of the clinical manifestations of a similar condition in humans (CHARGE syndrome). Herein we report on the regulatory gene CHD7 of alpaca, whose homologue in humans is most frequently associated with CHARGE. Sequence of the CHD7 coding region was obtained from a non-affected cria. The complete coding region was 9003 bp, corresponding to a translated amino acid sequence of 3000 aa. Additional genomic sequences corresponding to a significant portion of the CHD7 gene were identified and assembled from the 2x alpaca whole genome sequence, providing confirmatory sequence for much of the CHD7 coding region. The alpaca CHD7 mRNA sequence was 97.9% similar to the human sequence, with the greatest sequence difference being an insertion in exon 38 that results in a polyalanine repeat (A12). Polymorphism in this repeat was tested for association with CA in alpaca by cloning and sequencing the repeat from both affected and non-affected individuals. Variation in length of the poly-A repeat was not associated with CA. Complete sequencing of the CHD7 gene will be necessary to determine whether other mutations in CHD7 are the cause of CA in camelids.
Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M G; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes
2015-08-19
Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material.
2013-01-01
Background Genetic variation at the melanocortin-1 receptor (MC1R) gene is correlated with melanin color variation in many birds. Feral pigeons (Columba livia) show two major melanin-based colorations: a red coloration due to pheomelanic pigment and a black coloration due to eumelanic pigment. Furthermore, within each color type, feral pigeons display continuous variation in the amount of melanin pigment present in the feathers, with individuals varying from pure white to a full dark melanic color. Coloration is highly heritable and it has been suggested that it is under natural or sexual selection, or both. Our objective was to investigate whether MC1R allelic variants are associated with plumage color in feral pigeons. Findings We sequenced 888 bp of the coding sequence of MC1R among pigeons varying both in the type, eumelanin or pheomelanin, and the amount of melanin in their feathers. We detected 10 non-synonymous substitutions and 2 synonymous substitutions, but none of them were associated with a plumage type. It remains possible that non-synonymous substitutions that influence coloration are present in the short MC1R fragment that we did not sequence, but this seems unlikely because we analyzed the entire functionally important region of the gene. Conclusions Our results show that color differences among feral pigeons are probably not attributable to amino acid variation at the MC1R locus. Therefore, variation in regulatory regions of MC1R or variation in other genes may be responsible for the color polymorphism of feral pigeons. PMID:23915680
Bimolata, Waikhom; Kumar, Anirudh; Sundaram, Raman Meenakshi; Laha, Gouri Shankar; Qureshi, Insaf Ahmed; Reddy, Gajjala Ashok; Ghazi, Irfan Ahmad
2013-08-01
Xa27 is one of the important R-genes, effective against bacterial blight disease of rice caused by Xanthomonas oryzae pv. oryzae (Xoo). Using natural populations of Oryza, we analyzed the sequence variation in the functionally important domains of Xa27 across the Oryza species. DNA sequences of Xa27 alleles from 27 rice accessions revealed higher nucleotide diversity among the reported R-genes of rice. Sequence polymorphism analysis revealed synonymous and non-synonymous mutations in addition to a number of InDels in non-coding regions of the gene. High sequence variation was observed in the promoter region including the 5'UTR, with π = 0.00916 and θw = 0.01785. Comparative analysis of the identified Xa27 alleles with that of IRBB27 and IR24 indicated the operation of both positive selection (Ka/Ks > 1) and neutral selection (Ka/Ks ≈ 0). The genetic distances of alleles of the gene from Oryza nivara were nearer to IRBB27 as compared to IR24. We also found the presence of conserved and null UPT (upregulated by transcriptional activator) box in the isolated alleles. Considerable amino acid polymorphism was localized in the trans-membrane domain for which the functional significance is yet to be elucidated. However, the absence of functional UPT box in all the alleles except IRBB27 suggests the maintenance of single resistant allele throughout the natural population.
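The two diversity statistics quoted in the abstract above, π (nucleotide diversity) and θw (Watterson's estimator), can be illustrated with a minimal sketch. The toy alignment below is invented for illustration and is not Xa27 data; the function names are hypothetical, and the sketch assumes gap-free aligned sequences of equal length.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Pi: mean number of pairwise differences per site."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def watterson_theta(seqs):
    """Theta_w: segregating sites per site, scaled by the harmonic number a_n."""
    length = len(seqs[0])
    # A column is segregating if more than one base occurs in it.
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating / (a_n * length)


# Toy alignment (hypothetical): four sequences of ten sites, two segregating sites.
alleles = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
pi = nucleotide_diversity(alleles)
theta_w = watterson_theta(alleles)
print(f"pi = {pi:.5f}, theta_w = {theta_w:.5f}")
```

Because θw depends only on the number of segregating sites while π weights variants by their frequency, a θw larger than π (as reported above) generally points to an excess of low-frequency variants.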
Merson, Samuel D.; Ouwerkerk, Diane; Gulino, Lisa-Maree; Klieve, Athol; Bonde, Robert K.; Burgess, Elizabeth A.; Lanyon, Janet M.
2014-01-01
The Florida manatee, Trichechus manatus latirostris, is a hindgut-fermenting herbivore. In winter, manatees migrate to warm water overwintering sites where they undergo dietary shifts and may suffer from cold-induced stress. Given these seasonally induced changes in diet, the present study aimed to examine variation in the hindgut bacterial communities of wild manatees overwintering at Crystal River, west Florida. Faeces were sampled from 36 manatees of known sex and body size in early winter when manatees were newly arrived and then in mid-winter and late winter when diet had probably changed and environmental stress may have increased. Concentrations of faecal cortisol metabolite, an indicator of a stress response, were measured by enzyme immunoassay. Using 454-pyrosequencing, 2027 bacterial operational taxonomic units were identified in manatee faeces following amplicon pyrosequencing of the 16S rRNA gene V3/V4 region. Classified sequences were assigned to eight previously described bacterial phyla; only 0.36% of sequences could not be classified to phylum level. Five core phyla were identified in all samples. The majority (96.8%) of sequences were classified as Firmicutes (77.3 ± 11.1% of total sequences) or Bacteroidetes (19.5 ± 10.6%). Alpha-diversity measures trended towards higher diversity of hindgut microbiota in manatees in mid-winter compared to early and late winter. Beta-diversity measures, analysed through permanova, also indicated significant differences in bacterial communities based on the season.
Zhang, Quan; Zhu, Feng; Liu, Long; Zheng, Chuan Wei; Wang, De He; Hou, Zhuo Cheng; Ning, Zhong Hua
2015-01-01
Eggshell damages lead to economic losses in the egg production industry and are a threat to human health. We examined 49-wk-old Rhode Island White hens (Gallus gallus) that laid eggs having shells with significantly different strengths and thicknesses. We used HiSeq 2000 (Illumina) sequencing to characterize the chicken transcriptome and whole genome to identify the key genes and genetic mutations associated with eggshell calcification. We identified a total of 14,234 genes expressed in the chicken uterus, representing 89% of all annotated chicken genes. A total of 889 differentially expressed genes were identified by comparing low eggshell strength (LES) and normal eggshell strength (NES) genomes. The DEGs are enriched in calcification-related processes, including calcium ion transport and calcium signaling pathways as revealed by gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis. Some important matrix proteins, such as OC-116, LTF and SPP1, were also expressed differentially between two groups. A total of 3,671,919 single-nucleotide polymorphisms (SNPs) and 508,035 Indels were detected in protein coding genes by whole-genome re-sequencing, including 1775 non-synonymous variations and 19 frame-shift Indels in DEGs. SNPs and Indels found in this study could be further investigated for eggshell traits. This is the first report to integrate the transcriptome and genome re-sequencing to target the genetic variations which decreased the eggshell qualities. These findings further advance our understanding of eggshell calcification in the chicken uterus.
Van Wyngaarden, Mallory; Snelgrove, Paul V R; DiBacco, Claudio; Hamilton, Lorraine C; Rodríguez-Ezpeleta, Naiara; Zhan, Luyao; Beiko, Robert G; Bradbury, Ian R
2018-03-01
Environmental factors can influence diversity and population structure in marine species and accurate understanding of this influence can both improve fisheries management and help predict responses to environmental change. We used 7163 SNPs derived from restriction site-associated DNA sequencing genotyped in 245 individuals of the economically important sea scallop, Placopecten magellanicus, to evaluate the correlations between oceanographic variation and a previously identified latitudinal genomic cline. Sea scallops span a broad latitudinal area (>10 degrees), and we hypothesized that climatic variation significantly drives clinal trends in allele frequency. Using a large environmental dataset, including temperature, salinity, chlorophyll a, and nutrient concentrations, we identified a suite of SNPs (285-621, depending on analysis and environmental dataset) potentially under selection through correlations with environmental variation. Principal components analysis of different outlier SNPs and environmental datasets revealed similar northern and southern clusters, with significant associations between the first axes of each (adjusted R² = .66-.79). Multivariate redundancy analysis of outlier SNPs and the environmental principal components indicated that environmental factors explained more than 32% of the variance. Similarly, multiple linear regressions and random-forest analysis identified winter average and minimum ocean temperatures as significant parameters in the link between genetic and environmental variation. This work indicates that oceanographic variation is associated with the observed genomic cline in this species and that seasonal periods of extreme cold may restrict gene flow along a latitudinal gradient in this marine benthic bivalve. Incorporating this finding into management may improve accuracy of management strategies and future predictions.
Fluorescent signatures for variable DNA sequences
Rice, John E.; Reis, Arthur H.; Rice, Lisa M.; Carver-Brown, Rachel K.; Wangh, Lawrence J.
2012-01-01
Life abounds with genetic variations writ in sequences that are often only a few hundred nucleotides long. Rapid detection of these variations for identification of genetic diseases, pathogens and organisms has become the mainstay of molecular science and medicine. This report describes a new, highly informative closed-tube polymerase chain reaction (PCR) strategy for analysis of both known and unknown sequence variations. It combines efficient quantitative amplification of single-stranded DNA targets through LATE-PCR with sets of Lights-On/Lights-Off probes that hybridize to their target sequences over a broad temperature range. Contiguous pairs of Lights-On/Lights-Off probes of the same fluorescent color are used to scan hundreds of nucleotides for the presence of mutations. Sets of probes in different colors can be combined in the same tube to analyze even longer single-stranded targets. Each set of hybridized Lights-On/Lights-Off probes generates a composite fluorescent contour, which is mathematically converted to a sequence-specific fluorescent signature. The versatility and broad utility of this new technology is illustrated in this report by characterization of variant sequences in three different DNA targets: the rpoB gene of Mycobacterium tuberculosis, a sequence in the mitochondrial cytochrome C oxidase subunit 1 gene of nematodes and the V3 hypervariable region of the bacterial 16S ribosomal RNA gene. We anticipate widespread use of these technologies for diagnostics, species identification and basic research. PMID:22879378
Rare variants and autoimmune disease.
Massey, Jonathan; Eyre, Steve
2014-09-01
The study of rare variants in monogenic forms of autoimmune disease has offered insight into the aetiology of more complex pathologies. Research in complex autoimmune disease initially focused on sequencing candidate genes, with some early successes, notably in uncovering low-frequency variation associated with Type 1 diabetes mellitus. However, other early examples have proved difficult to replicate, and a recent study across six autoimmune diseases, re-sequencing 25 autoimmune disease-associated genes in large sample sizes, failed to find any associated rare variants. The study of rare and low-frequency variation in autoimmune diseases has been made accessible by the inclusion of such variants on custom genotyping arrays (e.g. Immunochip and Exome arrays). Whole-exome sequencing approaches are now also being utilised to uncover the contribution of rare coding variants to disease susceptibility, severity and treatment response. Other sequencing strategies are starting to uncover the role of regulatory rare variation. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Hart, Reece K; Rico, Rudolph; Hare, Emily; Garcia, John; Westbrook, Jody; Fusaro, Vincent A
2015-01-15
Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available and comprehensive programming libraries are available. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs). Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Hart, Reece K.; Rico, Rudolph; Hare, Emily; Garcia, John; Westbrook, Jody; Fusaro, Vincent A.
2015-01-01
Summary: Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available and comprehensive programming libraries are available. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. Availability and implementation: The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs). Contact: reecehart@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25273102
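To make the parse/format/validate round trip described above concrete, here is a minimal standard-library sketch. It is not the hgvs package and does not use its API; it handles only simple single-nucleotide substitutions, and the names `SimpleSubstitution`, `parse_substitution` and `format_substitution`, as well as the placeholder accession `NM_000000.0`, are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class SimpleSubstitution(NamedTuple):
    accession: str   # versioned reference sequence, e.g. "NM_000000.0" (placeholder)
    seq_type: str    # "c" (coding), "g" (genomic), "m" (mitochondrial) or "n" (non-coding)
    position: int
    ref: str
    alt: str


# Only the simplest HGVS form is covered: <accession>:<type>.<pos><ref>><alt>
_PATTERN = re.compile(
    r"^(?P<ac>[A-Z]{2}_\d+\.\d+):(?P<type>[cgmn])\.(?P<pos>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_substitution(text: str) -> SimpleSubstitution:
    """Parse a simple substitution string, raising ValueError if it is malformed."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple HGVS substitution: {text!r}")
    return SimpleSubstitution(
        match["ac"], match["type"], int(match["pos"]), match["ref"], match["alt"]
    )


def format_substitution(variant: SimpleSubstitution) -> str:
    """Format a parsed variant back into its canonical string form."""
    return (
        f"{variant.accession}:{variant.seq_type}."
        f"{variant.position}{variant.ref}>{variant.alt}"
    )


variant = parse_substitution("NM_000000.0:c.64C>T")
print(variant.position, variant.ref, variant.alt)
print(format_substitution(variant))
```

A full implementation, such as the library the abstract reports, must additionally cover intronic offsets, indels, duplications, protein-level variants and validation against reference sequences, none of which this sketch attempts.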
Zhang, Zhenying; Liu, Xiaoming; Lv, Xuelian; Lin, Jingrong
2011-12-01
Sporotrichosis is usually a localized, lymphocutaneous disease; its disseminated type has rarely been reported. The main objective of this study was to identify specific DNA sequence variation and virulence of a strain of Sporothrix schenckii isolated from the lesion of disseminated cutaneous sporotrichosis. We confirmed this strain to be S. schenckii by β-tubulin and chitin synthase gene sequence analysis in addition to the routine mycological and partial ITS and NTS sequencing. We found a 10-bp deletion in the ribosomal NTS region of this strain, in reference to the sequence of control strains isolated from fixed cutaneous sporotrichosis. After inoculation into immunosuppressed mice, this strain caused more extensive systemic involvement and showed stronger virulence than the control strain isolated from a fixed cutaneous sporotrichosis. Our study thus suggests that different clinical manifestations of sporotrichosis may be associated with variation in genotype and virulence of the strain, independent of effects due to the immune status of the host.
Osborne, A J; Zavodna, M; Chilvers, B L; Robertson, B C; Negro, S S; Kennedy, M A; Gemmell, N J
2013-01-01
Marine mammals are often reported to possess reduced variation of major histocompatibility complex (MHC) genes compared with their terrestrial counterparts. We evaluated diversity at two MHC class II B genes, DQB and DRB, in the New Zealand sea lion (Phocarctos hookeri, NZSL) a species that has suffered high mortality owing to bacterial epizootics, using Sanger sequencing and haplotype reconstruction, together with next-generation sequencing. Despite this species' prolonged history of small population size and highly restricted distribution, we demonstrate extensive diversity at MHC DRB with 26 alleles, whereas MHC DQB is dimorphic. We identify four DRB codons, predicted to be involved in antigen binding, that are evolving under adaptive evolution. Our data suggest diversity at DRB may be maintained by balancing selection, consistent with the role of this locus as an antigen-binding region and the species' recent history of mass mortality during a series of bacterial epizootics. Phylogenetic analyses of DQB and DRB sequences from pinnipeds and other carnivores revealed significant allelic diversity, but little phylogenetic depth or structure among pinniped alleles; thus, we could neither confirm nor refute the possibility of trans-species polymorphism in this group. The phylogenetic pattern observed however, suggests some significant evolutionary constraint on these loci in the recent past, with the pattern consistent with that expected following an epizootic event. These data may help further elucidate some of the genetic factors underlying the unusually high susceptibility to bacterial infection of the threatened NZSL, and help us to better understand the extent and pattern of MHC diversity in pinnipeds. PMID:23572124
Rennick, Linda J; Duprex, W Paul; Rima, Bert K
2007-10-01
Transcription from morbillivirus genomes commences at a single promoter in the 3' non-coding terminus, with the six genes being transcribed sequentially. The 3' and 5' untranslated regions (UTRs) of the genes (mRNA sense), together with the intergenic trinucleotide spacer, comprise the non-coding sequences (NCS) of the virus and contain the conserved gene end and gene start signals, respectively. Bicistronic minigenomes containing transcription units (TUs) encoding autofluorescent reporter proteins separated by measles virus (MV) NCS were used to give a direct estimation of gene expression in single, living cells by assessing the relative amounts of each fluorescent protein in each cell. Initially, five minigenomes containing each of the MV NCS were generated. Assays were developed to determine the amount of each fluorescent protein in cells at both cell population and single-cell levels. This revealed significant variations in gene expression between cells expressing the same NCS-containing minigenome. The minigenome containing the M/F NCS produced significantly lower amounts of fluorescent protein from the second TU (TU2), compared with the other minigenomes. A minigenome with a truncated F 5' UTR had increased expression from TU2. This UTR is 524 nt longer than the other MV 5' UTRs. Insertions into the 5' UTR of the enhanced green fluorescent protein gene in the minigenome containing the N/P NCS showed that specific sequences, rather than just the additional length of F 5' UTR, govern this decreased expression from TU2.
Accuracy of abdominal auscultation for bowel obstruction
Breum, Birger Michael; Rud, Bo; Kirkegaard, Thomas; Nordentoft, Tyge
2015-01-01
AIM: To investigate the accuracy and inter-observer variation of bowel sound assessment in patients with clinically suspected bowel obstruction. METHODS: Bowel sounds were recorded in patients with suspected bowel obstruction using a Littmann® Electronic Stethoscope. The recordings were processed to yield 25-s sound sequences in random order on PCs. Observers, recruited from doctors within the department, classified the sound sequences as either normal or pathological. The reference tests for bowel obstruction were intraoperative and endoscopic findings and clinical follow up. Sensitivity and specificity were calculated for each observer and compared between junior and senior doctors. Interobserver variation was measured using the Kappa statistic. RESULTS: Bowel sound sequences from 98 patients were assessed by 53 (33 junior and 20 senior) doctors. Laparotomy was performed in 47 patients, 35 of whom had bowel obstruction. Two patients underwent colorectal stenting due to large bowel obstruction. The median sensitivity and specificity were 0.42 (range: 0.19-0.64) and 0.78 (range: 0.35-0.98), respectively. There was no significant difference in accuracy between junior and senior doctors. The median frequency with which doctors classified bowel sounds as abnormal did not differ significantly between patients with and without bowel obstruction (26% vs 23%, P = 0.08). The 53 doctors made up 1378 unique pairs and the median Kappa value was 0.29 (range: -0.15-0.66). CONCLUSION: Accuracy and inter-observer agreement were generally low. Clinical decisions in patients with possible bowel obstruction should not be based on auscultatory assessment of bowel sounds. PMID:26379407
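The per-observer accuracy measures and the pairwise agreement statistic described above can be sketched as follows. The ratings are invented toy data, not the study's recordings, and the use of Cohen's kappa for each observer pair is an assumption (the abstract says only "the Kappa statistic"). The final line checks the pair count: 53 observers give 53 × 52 / 2 = 1378 unique pairs.

```python
from math import comb


def sensitivity_specificity(truth, calls):
    """Return (sensitivity, specificity) for binary labels (1 = obstruction/pathological)."""
    tp = sum(t == 1 and c == 1 for t, c in zip(truth, calls))
    fn = sum(t == 1 and c == 0 for t, c in zip(truth, calls))
    tn = sum(t == 0 and c == 0 for t, c in zip(truth, calls))
    fp = sum(t == 0 and c == 1 for t, c in zip(truth, calls))
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on binary labels.

    Undefined (division by zero) when expected agreement equals 1.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)


# Toy data (hypothetical): reference standard and two observers for ten recordings.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
observer_a = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
observer_b = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]

sens_a, spec_a = sensitivity_specificity(truth, observer_a)
kappa_ab = cohen_kappa(observer_a, observer_b)
n_pairs = comb(53, 2)  # number of unique observer pairs among 53 doctors
print(sens_a, spec_a, kappa_ab, n_pairs)
```

In this toy example the two observers agree on 6 of 10 recordings, yet kappa is close to zero because nearly that much agreement is expected by chance alone, which is why kappa rather than raw agreement is reported.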
Sequence variations of the bovine prion protein gene (PRNP) in native Korean Hanwoo cattle
Choi, Sangho
2012-01-01
Bovine spongiform encephalopathy (BSE) is one of the fatal neurodegenerative diseases known as transmissible spongiform encephalopathies (TSEs) caused by infectious prion proteins. Genetic variations correlated with susceptibility or resistance to TSE in humans and sheep have not been reported for bovine strains including those from Holstein, Jersey, and Japanese Black cattle. Here, we investigated bovine prion protein gene (PRNP) variations in Hanwoo cattle [Bos (B.) taurus coreanae], a native breed in Korea. We identified mutations and polymorphisms in the coding region of PRNP, determined their frequency, and evaluated their significance. We identified four synonymous polymorphisms and two non-synonymous mutations in PRNP, but found no novel polymorphisms. The sequence and number of octapeptide repeats were completely conserved, and the haplotype frequency of the coding region was similar to that of other B. taurus strains. When we examined the 23-bp and 12-bp insertion/deletion (indel) polymorphisms in the non-coding region of PRNP, Hanwoo cattle had a lower deletion allele and 23-bp del/12-bp del haplotype frequency than healthy and BSE-affected animals of other strains. Thus, Hanwoo are seemingly less susceptible to BSE than other strains due to the 23-bp and 12-bp indel polymorphisms. PMID:22705734
Wu, N; Qin, H; Wang, M; Bian, Y; Dong, B; Sun, G; Zhao, W; Chang, G; Xu, Q; Chen, G
2017-04-01
1. Endothelin receptor B subtype 2 (EDNRB2) is a paralog of EDNRB, which encodes a 7-transmembrane G-protein coupled receptor. Previous studies reported that EDNRB was essential for melanoblast migration in mammals and ducks. 2. Muscovy ducks have different plumage colour phenotypes. Variations in EDNRB2 coding sequences (CDSs) and mRNA expression levels were investigated in 4 different Muscovy duck plumage colour phenotypes, including black, black mutant, silver and white head. 3. The EDNRB2 gene from Muscovy duck was cloned; it had a length of 6435 bp and encoded 437 amino acids. The coding region was screened and potential single nucleotide polymorphisms were identified. Eight mutations were obtained, including one missense variant (c.64C > T) and 7 synonymous substitutions. The substitutions were associated with plumage colour phenotypes. 4. The EDNRB2 mRNA expression levels were compared between feather pulp from black birds and black mutant birds. The results indicated that EDNRB2 transcripts in feather pulp were significantly higher in black feathers than in white feathers. 5. The results determined the variation of EDNRB2 CDS and mRNA expression in Muscovy ducks of various plumage colours.
Blacket, Mark J; Malipatil, Mali B; Semeraro, Linda; Gillespie, Peter S; Dominiak, Bernie C
2017-04-01
Understanding the relationship between incursions of insect pests and established populations is critical to implementing effective control. Studies of genetic variation can provide powerful tools to examine potential invasion pathways and longevity of individual pest outbreaks. The major fruit fly pest in eastern Australia, Queensland fruit fly Bactrocera tryoni (Froggatt), has been subject to significant long-term quarantine and population reduction control measures in the major horticulture production areas of southeastern Australia, at the species' southern range limit. Previous studies have employed microsatellite markers to estimate gene flow between populations across this region. In this study, we used an independent genetic marker, mitochondrial DNA (mtDNA) sequences, to screen genetic variation in established and adjacent outbreak populations in southeastern Australia. During the study period, favorable environmental conditions resulted in multiple outbreaks, which appeared genetically distinctive and relatively geographically localized, implying minimal dispersal between simultaneous outbreaks. Populations in established regions were found to occur over much larger areas. Screening mtDNA (female) lineages proved to be an effective alternative genetic tool to assist in understanding fruit fly population dynamics and provide another possible molecular method that could now be employed for better understanding of the ecology and evolution of this and other pest species.
Boufana, Belgees; Scala, Antonio; Lahmar, Samia; Pointing, Steve; Craig, Philip S; Dessì, Giorgia; Zidda, Antonella; Pipia, Anna Paola; Varcasia, Antonio
2015-11-30
Cysticercosis caused by the metacestode stage of Taenia hydatigena is endemic in Sardinia. Information on the genetic variation of this parasite is important for epidemiological studies and implementation of control programs. Using two mitochondrial genes, the cytochrome c oxidase subunit 1 (cox1) and the NADH dehydrogenase subunit 1 (ND1) we investigated the genetic variation and population structure of Cysticercus tenuicollis from Sardinian intermediate hosts and compared it to that from other hosts from various geographical regions. The parsimony cox1 network analysis indicated the existence of a common lineage for T. hydatigena and the overall diversity and neutrality indices indicated demographic expansion. Using the cox1 sequences, low pairwise fixation index (Fst) values were recorded for Sardinian, Iranian and Palestinian sheep C. tenuicollis which suggested the absence of genetic differentiation. Using the ND1 sequences, C. tenuicollis from Sardinian sheep appeared to be differentiated from those of goat and pig origin. In addition, goat C. tenuicollis were genetically different from adult T. hydatigena as indicated by the statistically significant Fst value. Our results are consistent with biochemical and morphological studies that suggest the existence of variants of T. hydatigena. Copyright © 2015 Elsevier B.V. All rights reserved.
Kalinowski, Steven T; Andrews, Tessa M; Leonard, Mary J; Snodgrass, Meagan
2012-01-01
Many students do not recognize that individual organisms within populations vary, and this may make it difficult for them to recognize the essential role variation plays in natural selection. Also, many students have weak scientific reasoning skills, and this makes it difficult for them to recognize misconceptions they might have. This paper describes a 2-h laboratory for college students that introduces them to genetic diversity and gives them practice using hypothetico-deductive reasoning. In brief, the lab presents students with DNA sequences from Africans, Europeans, and Asians, and asks students to determine whether people from each continent qualify as distinct "races." Comparison of the DNA sequences shows that people on each continent are not more similar to one another than to people on other continents, and therefore do not qualify as distinct races. Ninety-four percent of our students reported that the laboratory was interesting, and 79% reported that it was a valuable learning experience. We developed and used a survey to measure the extent to which students recognized variation and its significance within populations and showed that the lab increased student awareness of variation. We also showed that the lab improved the ability of students to construct hypothetico-deductive arguments.
The contribution of alu elements to mutagenic DNA double-strand break repair.
Morales, Maria E; White, Travis B; Streva, Vincent A; DeFreece, Cecily B; Hedges, Dale J; Deininger, Prescott L
2015-03-01
Alu elements make up the largest family of human mobile elements, numbering 1.1 million copies and comprising 11% of the human genome. As a consequence of evolution and genetic drift, Alu elements of various sequence divergence exist throughout the human genome. Alu/Alu recombination has been shown to cause approximately 0.5% of new human genetic diseases and contribute to extensive genomic structural variation. To begin understanding the molecular mechanisms leading to these rearrangements in mammalian cells, we constructed Alu/Alu recombination reporter cell lines containing Alu elements ranging in sequence divergence from 0%-30% that allow detection of both Alu/Alu recombination and large non-homologous end joining (NHEJ) deletions that range from 1.0 to 1.9 kb in size. Introduction of as little as 0.7% sequence divergence between Alu elements resulted in a significant reduction in recombination, which indicates even small degrees of sequence divergence reduce the efficiency of homology-directed DNA double-strand break (DSB) repair. Further reduction in recombination was observed in a sequence divergence-dependent manner for diverged Alu/Alu recombination constructs with up to 10% sequence divergence. With greater levels of sequence divergence (15%-30%), we observed a significant increase in DSB repair due to a shift from Alu/Alu recombination to variable-length NHEJ which removes sequence between the two Alu elements. This increase in NHEJ deletions depends on the presence of Alu sequence homeology (similar but not identical sequences). Analysis of recombination products revealed that Alu/Alu recombination junctions occur more frequently in the first 100 bp of the Alu element within our reporter assay, just as they do in genomic Alu/Alu recombination events. 
This is the first extensive study characterizing the influence of Alu element sequence divergence on DNA repair, which will inform predictions regarding the effect of Alu element sequence divergence on both the rate and nature of DNA repair events.
A map of human genome variation from population-scale sequencing.
Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A
2010-10-28
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
2014-01-01
Background Variation in seed oil composition and content among soybean varieties is largely attributed to differences in transcript sequences and/or transcript accumulation of oil production related genes in seeds. Discovery and analysis of sequence and expression variations in these genes will accelerate soybean oil quality improvement. Results In an effort to identify these variations, we sequenced the transcriptomes of soybean seeds from nine lines varying in oil composition and/or total oil content. Our results showed that 69,338 distinct transcripts from 32,885 annotated genes were expressed in seeds. A total of 8,037 transcript expression polymorphisms and 50,485 transcript sequence polymorphisms (48,792 SNPs and 1,693 small Indels) were identified among the lines. Effects of the transcript polymorphisms on their encoded protein sequences and functions were predicted. The studies also provided independent evidence that the lack of FAD2-1A gene activity and a non-synonymous SNP in the coding sequence of FAB2C caused elevated oleic acid and stearic acid levels in soybean lines M23 and FAM94-41, respectively. Conclusions As a proof-of-concept, we developed an integrated RNA-seq and bioinformatics approach to identify and functionally annotate transcript polymorphisms, and demonstrated its high effectiveness for discovery of genetic and transcript variations that result in altered oil quality traits. The collection of transcript polymorphisms coupled with their predicted functional effects will be a valuable asset for further discovery of genes, gene variants, and functional markers to improve soybean oil quality. PMID:24755115
Zapata, Luis; Ding, Jia; Willing, Eva-Maria; Hartwig, Benjamin; Bezdan, Daniela; Jiao, Wen-Biao; Patel, Vipul; Velikkakam James, Geo; Koornneef, Maarten; Ossowski, Stephan; Schneeberger, Korbinian
2016-07-12
Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the complement of large-scale rearrangements, including transpositions and inversions. Besides the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ∼3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions are not different in local divergence from colinear regions, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes, which were only present in the reference sequence or the Ler assembly, and 334 single-copy orthologs, which showed an additional copy in only one of the genomes. To our knowledge, this work gives first insights into the degree and type of variation that will be revealed once complete assemblies replace resequencing or other reference-dependent methods.
Espin-Garcia, Osvaldo; Craiu, Radu V; Bull, Shelley B
2018-02-01
We evaluate two-phase designs to follow up findings from genome-wide association studies (GWAS) when the cost of regional sequencing in the entire cohort is prohibitive. We develop novel expectation-maximization-based inference under a semiparametric maximum likelihood formulation tailored for post-GWAS inference. A GWAS-SNP (where SNP is single nucleotide polymorphism) serves as a surrogate covariate in inferring association between a sequence variant and a normally distributed quantitative trait (QT). We assess test validity and quantify efficiency and power of joint QT-SNP-dependent sampling and analysis under alternative sample allocations by simulations. Joint allocation balanced on SNP genotype and extreme-QT strata yields significant power improvements compared to marginal QT- or SNP-based allocations. We illustrate the proposed method and evaluate the sensitivity of sample allocation to sampling variation using data from a sequencing study of systolic blood pressure. © 2017 The Authors. Genetic Epidemiology Published by Wiley Periodicals, Inc.
Matsuo, Kumihiro; Tanahashi, Yusuke; Mukai, Tokuo; Suzuki, Shigeru; Tajima, Toshihiro; Azuma, Hiroshi; Fujieda, Kenji
2016-07-01
Dual oxidase 2 (DUOX2) mutations are a cause of dyshormonogenesis (DH) and have been identified in patients with permanent congenital hypothyroidism (PH) and with transient hypothyroidism (TH). We aimed to elucidate the prevalence and phenotypical variations of DUOX2 mutations. Forty-eight Japanese DH patients were enrolled and analysed for sequence variants of DUOX2, DUOXA2, and TPO using polymerase chain reaction-amplified direct sequencing. Fourteen sequence variants of DUOX2, including 10 novel variants, were identified in 11 patients. DUOX2 variants were more prevalent (11/48, 22.9%) than TPO variants (3/48, 6.3%) (p=0.020). The prevalence of DUOX2 variants in TH was slightly, but not significantly, higher than in PH. Furthermore, one patient had digenic heterozygous sequence variants of both DUOX2 and TPO. Our results suggest that DUOX2 mutations might be the most common cause of both PH and TH, and that phenotypes of these mutations might be milder than those of other causes.
Combining stress transfer and source directivity: the case of the 2012 Emilia seismic sequence
Convertito, Vincenzo; Catalli, Flaminia; Emolo, Antonio
2013-01-01
The Emilia seismic sequence (Northern Italy) started in May 2012 and caused 17 casualties, severe damage to dwellings and forced the closure of several factories. The total number of events recorded in one month was about 2100, with local magnitude ranging between 1.0 and 5.9. We investigate potential mechanisms (static and dynamic triggering) that may describe the evolution of the sequence. We consider rupture directivity in the dynamic strain field and observe that, for each main earthquake, its aftershocks and the subsequent large event occurred in an area characterized by higher dynamic strains and corresponding to the dominant rupture direction. We find that static stress redistribution alone is not capable of explaining the locations of subsequent events. We conclude that dynamic triggering played a significant role in driving the sequence. This triggering was also associated with a variation in permeability and a pore pressure increase in an area characterized by a massive presence of fluids. PMID:24177982
Zhang, Kai; Fan, Guangyu; Zhang, Xinxin; Zhao, Fang; Wei, Wei; Du, Guohua; Feng, Xiaolei; Wang, Xiaoming; Wang, Feng; Song, Guoliang; Zou, Hongfeng; Zhang, Xiaolei; Li, Shuangdong; Ni, Xuemei; Zhang, Gengyun; Zhao, Zhihai
2017-01-01
Foxtail millet (Setaria italica) is an important crop possessing C4 photosynthesis capability. The S. italica genome was de novo sequenced in 2012, but the sequence lacked high-density genetic maps with agronomic and yield trait linkages. In the present study, we resequenced a foxtail millet population of 439 recombinant inbred lines (RILs) and developed a high-resolution bin map and high-density SNP markers, which could provide an effective approach for gene identification. A total of 59 QTL for 14 agronomic traits in plants grown under long- and short-day photoperiods were identified. The phenotypic variation explained ranged from 4.9 to 43.94%. In addition, we suggest that there may be segregation distortion on chromosome 6 that is significantly distorted toward Zhang gu. The newly identified QTL will provide a platform for sequence-based research on the S. italica genome, and for molecular marker-assisted breeding. PMID:28364039
A reference human genome dataset of the BGISEQ-500 sequencer.
Huang, Jie; Liang, Xinming; Xuan, Yuankai; Geng, Chunyu; Li, Yuxiang; Lu, Haorong; Qu, Shoufang; Mei, Xianglin; Chen, Hongbo; Yu, Ting; Sun, Nan; Rao, Junhua; Wang, Jiahao; Zhang, Wenwei; Chen, Ying; Liao, Sha; Jiang, Hui; Liu, Xin; Yang, Zhaopeng; Mu, Feng; Gao, Shangxian
2017-05-01
BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinational probe anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated the accuracy of the variations, and compared it to that of the variations identified from similar amounts of publicly available HiSeq2500 data. We found similar single nucleotide polymorphism (SNP) detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) compared to the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%), and better SNP detection accuracy than the PE50 data (FPR = 0.0006%, sensitivity = 94.15%). For insertions and deletions (indels), however, we found lower accuracy for the BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50 respectively, sensitivity = 88.52% and 70.93%) than for the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform. © The Authors 2017. Published by Oxford University Press.
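The accuracy figures in the BGISEQ-500 record above are the usual confusion-matrix ratios. As a minimal sketch, assuming sensitivity is TP / (TP + FN) and FPR is FP / (FP + TN) over the assessed reference positions (the abstract does not spell out its denominators), they could be computed as follows. The counts are made up for illustration and are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class CallCounts:
    """Variant-call counts against a truth set (hypothetical container)."""
    true_pos: int   # variants called and present in the truth set
    false_pos: int  # variants called but absent from the truth set
    false_neg: int  # truth-set variants that were missed
    true_neg: int   # non-variant positions correctly left uncalled


def sensitivity(c: CallCounts) -> float:
    """Fraction of true variants that were detected: TP / (TP + FN)."""
    return c.true_pos / (c.true_pos + c.false_neg)


def false_positive_rate(c: CallCounts) -> float:
    """Fraction of non-variant positions wrongly called: FP / (FP + TN)."""
    return c.false_pos / (c.false_pos + c.true_neg)


# Illustrative counts only (assumption, not data from the paper).
example = CallCounts(true_pos=962, false_pos=2, false_neg=38, true_neg=999_998)
sens = sensitivity(example)
fpr = false_positive_rate(example)
print(f"sensitivity = {sens:.2%}, FPR = {fpr:.5%}")
```

Because non-variant positions vastly outnumber variants in a genome, an FPR that looks tiny as a percentage can still correspond to many false calls in absolute terms.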
Sequence variation in SORL1 and Dementia risk in Swedes
Reynolds, Chandra A.; Hong, Mun-Gwan; Eriksson, Ulrika K.; Blennow, Kaj; Johansson, Boo; Malmberg, Bo; Berg, Stig; Gatz, Margaret; Pedersen, Nancy L.; Bennet, Anna M.; Prince, Jonathan A.
2010-01-01
The gene encoding the neuronal sortilin-related receptor SORL1 has been claimed to be associated with Alzheimer disease (AD) by independent groups and across various human populations. We evaluated six genetic markers in SORL1 in a sample of 1558 Swedish dementia cases (including 1270 Alzheimer disease cases) and 2179 controls. For both single marker and haplotype-based analyses we found no strong support for SORL1 as a dementia- or AD-risk modifying gene in our sample in isolation, nor did we observe association with AD/dementia-related traits, including CSF β-amyloid1–42, tau levels, or age-at-onset. However, meta-analyses of markers in this study together with previously published studies on SORL1, encompassing in excess of 13,000 individuals, do suggest significant association with AD (best OR 1.097; 95% CI 1.038–1.158, p = 0.001). All six markers were significant in meta-analyses and it is notable that they occur in two distinct LD blocks. These data are consistent with either allelic heterogeneity or the existence of as yet untested functional variants and these will be important considerations in further attempts to evaluate the importance of sequence variation in SORL1 with AD risk. PMID:19653016
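The odds ratio, confidence interval, and p-value reported for the SORL1 meta-analysis can be checked against each other. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale (an assumption; the abstract does not state how the interval was built) and recovers an approximate z-score and two-sided p-value from the three reported numbers.

```python
import math


def p_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float,
                 z_crit: float = 1.959964) -> tuple[float, float]:
    """Approximate z and two-sided p from an OR and its 95% CI.

    Assumes the CI is exp(log(OR) +/- z_crit * SE), i.e. a Wald interval.
    """
    # Standard error of log(OR), recovered from the width of the log-scale CI.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


z, p = p_from_or_ci(1.097, 1.038, 1.158)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Under that assumption the result lands close to the reported p = 0.001, so the three figures are mutually consistent.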
Ruttanajit, Tida; Chanchamroen, Sujin; Cram, David S; Sawakwongpra, Kritchakorn; Suksalak, Wanwisa; Leng, Xue; Fan, Junmei; Wang, Li; Yao, Yuanqing; Quangkananurug, Wiwat
2016-02-01
Currently, our understanding of the nature and reproductive potential of blastocysts associated with trophectoderm (TE) lineage chromosomal mosaicism is limited. The objective of this study was first to validate copy number variation sequencing (CNV-Seq) for measuring the level of mosaicism and second to examine the nature and level of mosaicism in TE biopsies of patients' blastocysts. TE biopsy samples were analysed by array comparative genomic hybridization (CGH) and CNV-Seq to discriminate between euploid, aneuploid and mosaic blastocysts. Using artificial models of TE mosaicism for five different chromosomes, CNV-Seq accurately and reproducibly quantitated mosaicism at levels of 50% and 20%. In a comparative 24-chromosome study of 49 blastocysts by array CGH and CNV-Seq, 43 blastocysts (87.8%) had a concordant diagnosis and 6 blastocysts (12.2%) were discordant. The discordance was attributed to low to medium levels of chromosomal mosaicism (30-70%) not detected by array CGH. In an expanded study of 399 blastocysts using CNV-Seq as the sole diagnostic method, the proportion of diploid-aneuploid mosaics (34, 8.5%) was significantly higher than that of aneuploid mosaics (18, 4.5%) (p < 0.02). Mosaicism is a significant chromosomal abnormality associated with the TE lineage of human blastocysts that can be reliably and accurately detected by CNV-Seq. © 2015 John Wiley & Sons, Ltd.
Cho, Anna; Seong, Moon-Woo; Lim, Byung Chan; Lee, Hwa Jeen; Byeon, Jung Hye; Kim, Seung Soo; Kim, Soo Yeon; Choi, Sun Ah; Wong, Ai-Lynn; Lee, Jeongho; Kim, Jon Soo; Ryu, Hye Won; Lee, Jin Sook; Kim, Hunmin; Hwang, Hee; Choi, Ji Eun; Kim, Ki Joong; Hwang, Young Seung; Hong, Ki Ho; Park, Seungman; Cho, Sung Im; Lee, Seung Jun; Park, Hyunwoong; Seo, Soo Hyun; Park, Sung Sup; Chae, Jong Hee
2017-05-01
Duchenne and Becker muscular dystrophies (DMD and BMD) are allelic X-linked recessive muscle diseases caused by mutations in the large and complex dystrophin gene. We analyzed the dystrophin gene in 507 Korean DMD/BMD patients by multiplex ligation-dependent probe amplification and direct sequencing. Overall, 117 different deletions, 48 duplications, and 90 pathogenic sequence variations, including 30 novel variations, were identified. Deletions and duplications accounted for 65.4% and 13.3% of Korean dystrophinopathy, respectively, suggesting that the incidence of large rearrangements in dystrophin is similar among different ethnic groups. We also detected sequence variations in >100 probands. The small variations were dispersed across the whole gene, and 12.3% were nonsense mutations. Precise genetic characterization in patients with DMD/BMD is timely and important for implementing nationwide registration systems and future molecular therapeutic trials in Korea and globally. Muscle Nerve 55: 727-734, 2017. © 2016 Wiley Periodicals, Inc.
Genetic variations of the SLCO1B1 gene in the Chinese, Malay and Indian populations of Singapore.
Ho, Woon Fei; Koo, Seok Hwee; Yee, Jie Yin; Lee, Edmund Jon Deoon
2008-01-01
OATP1B1 is a liver-specific transporter that mediates the uptake of various endogenous and exogenous compounds including many clinically used drugs from blood into hepatocytes. This study aims to identify genetic variations of SLCO1B1 gene in three distinct ethnic groups of the Singaporean population (n=288). The coding region of the gene encoding the transporter protein was screened for genetic variations in the study population by denaturing high-performance liquid chromatography and DNA sequencing. Twenty-five genetic variations of SLCO1B1, including 10 novel ones, were found: 13 in the coding exons (9 nonsynonymous and 4 synonymous variations), 6 in the introns, and 6 in the 3' untranslated region. Four novel nonsynonymous variations: 633A>G (Ile211Met), 875C>T (Ala292Val), 1837T>C (Cys613Arg), and 1877T>A (Leu626Stop) were detected as heterozygotes. Among the novel nonsynonymous variations, 633A>G, 1837T>C, and 1877T>A were predicted to be functionally significant. These data would provide fundamental and useful information for pharmacogenetic studies on drugs that are substrates of OATP1B1 in Asians.
Jackson, J.V.; Talbot, S.L.; Farley, S.
2008-01-01
We collected data from 20 biparentally inherited microsatellite loci, and nucleotide sequence from the maternally inherited mitochondrial DNA (mtDNA) control region, to determine levels of genetic variation of the brown bears (Ursus arctos L., 1758) of the Kenai Peninsula, south central Alaska. Nuclear genetic variation was similar to that observed in other Alaskan peninsular populations. We detected no significant inbreeding and found no evidence of population substructuring on the Kenai Peninsula. We observed a genetic signature of a bottleneck under the infinite alleles model (IAM), but not under the stepwise mutation model (SMM) or the two-phase model (TPM) of microsatellite mutation. Kenai brown bears have lower levels of mtDNA haplotypic diversity relative to most other brown bear populations in Alaska. © 2008 NRC.
Wu, Tsung-Jung; Shamsaddini, Amirhossein; Pan, Yang; Smith, Krista; Crichton, Daniel J; Simonyan, Vahan; Mazumder, Raja
2014-01-01
Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators have led to a rich repository of information on functional sites of genes and proteins. This information along with variation-related annotation can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform HIVE (High-performance Integrated Virtual Environment) for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identification of novel or common nsSNVs that can be prioritized in validation studies.
Database URL: BioMuta: http://hive.biochemistry.gwu.edu/tools/biomuta/index.php; CSR: http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr; HIVE: http://hive.biochemistry.gwu.edu.
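The core step the BioMuta record describes, checking whether non-synonymous variants fall on annotated functional sites, amounts to an interval lookup. The following is a minimal sketch of that idea only; it is not BioMuta's or HIVE's code, and the record types, protein identifiers, and site labels are all hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    protein: str   # hypothetical protein identifier
    position: int  # 1-based residue position
    ref_aa: str    # reference amino acid
    alt_aa: str    # observed amino acid


class Site(NamedTuple):
    protein: str
    start: int     # first residue of the annotated site (inclusive)
    end: int       # last residue of the annotated site (inclusive)
    label: str     # e.g. "active site"


def variants_in_sites(variants, sites):
    """Return (variant, site label) pairs for nsSNVs inside annotated sites."""
    sites_by_protein = {}
    for site in sites:
        sites_by_protein.setdefault(site.protein, []).append(site)

    hits = []
    for v in variants:
        if v.ref_aa == v.alt_aa:
            continue  # synonymous change: the residue is unchanged
        for site in sites_by_protein.get(v.protein, []):
            if site.start <= v.position <= site.end:
                hits.append((v, site.label))
    return hits


# Made-up annotations and calls, for illustration only.
sites = [Site("PROT_A", 50, 60, "active site"),
         Site("PROT_A", 110, 130, "binding region")]
variants = [Variant("PROT_A", 57, "R", "H"),   # non-synonymous, in a site
            Variant("PROT_A", 120, "L", "L"),  # synonymous, skipped
            Variant("PROT_B", 10, "G", "D")]   # no annotated sites
hits = variants_in_sites(variants, sites)
print(hits)
```

A linear scan per protein is enough for a sketch; at the scale the abstract mentions, an interval tree or a sorted index would be the more natural choice.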
Homburger, Julian R.; Green, Eric M.; Caleshu, Colleen; Sunitha, Margaret S.; Taylor, Rebecca E.; Ruppel, Kathleen M.; Metpally, Raghu Prasad Rao; Colan, Steven D.; Michels, Michelle; Day, Sharlene M.; Olivotto, Iacopo; Bustamante, Carlos D.; Dewey, Frederick E.; Ho, Carolyn Y.; Spudich, James A.; Ashley, Euan A.
2016-01-01
Myosin motors are the fundamental force-generating elements of muscle contraction. Variation in the human β-cardiac myosin heavy chain gene (MYH7) can lead to hypertrophic cardiomyopathy (HCM), a heritable disease characterized by cardiac hypertrophy, heart failure, and sudden cardiac death. How specific myosin variants alter motor function or clinical expression of disease remains incompletely understood. Here, we combine structural models of myosin from multiple stages of its chemomechanical cycle, exome sequencing data from two population cohorts of 60,706 and 42,930 individuals, and genetic and phenotypic data from 2,913 patients with HCM to identify regions of disease enrichment within β-cardiac myosin. We first developed computational models of the human β-cardiac myosin protein before and after the myosin power stroke. Then, using a spatial scan statistic modified to analyze genetic variation in protein 3D space, we found significant enrichment of disease-associated variants in the converter, a kinetic domain that transduces force from the catalytic domain to the lever arm to accomplish the power stroke. Focusing our analysis on surface-exposed residues, we identified a larger region significantly enriched for disease-associated variants that contains both the converter domain and residues on a single flat surface on the myosin head described as the myosin mesa. Notably, patients with HCM with variants in the enriched regions have earlier disease onset than patients who have HCM with variants elsewhere. Our study provides a model for integrating protein structure, large-scale genetic sequencing, and detailed phenotypic data to reveal insight into time-shifted protein structures and genetic disease. PMID:27247418
Transcriptome sequencing reveals genome-wide variation in molecular evolutionary rate among ferns.
Grusz, Amanda L; Rothfels, Carl J; Schuettpelz, Eric
2016-08-30
Transcriptomics in non-model plant systems has recently reached a point where the examination of nuclear genome-wide patterns in understudied groups is an achievable reality. This progress is especially notable in evolutionary studies of ferns, for which molecular resources to date have been derived primarily from the plastid genome. Here, we utilize transcriptome data in the first genome-wide comparative study of molecular evolutionary rate in ferns. We focus on the ecologically diverse family Pteridaceae, which comprises about 10% of fern diversity and includes the enigmatic vittarioid ferns, an epiphytic, tropical lineage known for dramatically reduced morphologies and radically elongated phylogenetic branch lengths. Using expressed sequence data for 2091 loci, we perform pairwise comparisons of molecular evolutionary rate among 12 species spanning the three largest clades in the family and ask whether previously documented heterogeneity in plastid substitution rates is reflected in their nuclear genomes. We then inquire whether variation in evolutionary rate is being shaped by genes belonging to specific functional categories and test for differential patterns of selection. We find significant, genome-wide differences in evolutionary rate for vittarioid ferns relative to all other lineages within the Pteridaceae, but we recover few significant correlations between faster/slower vittarioid loci and known functional gene categories. We demonstrate that the faster rates characteristic of the vittarioid ferns are likely not driven by positive selection, nor are they unique to any particular type of nucleotide substitution. Our results reinforce recently reviewed mechanisms hypothesized to shape molecular evolutionary rates in vittarioid ferns and provide novel insight into substitution rate variation both within and among fern nuclear genomes.
Grobler, P.J.; Jones, J.W.; Johnson, N.A.; Beaty, B.; Struthers, J.; Neves, R.J.; Hallerman, E.M.
2006-01-01
The restoration and recovery of imperiled mussel species will require the re-establishment of populations into historically occupied habitats. The possible existence of genetic differentiation among populations should be considered before inter-basin transfers are made. Eighty individuals of the federal candidate species Lexingtonia dolabelloides were sampled from populations in the North Fork Holston, Middle Fork Holston, Clinch, Paint Rock and Duck rivers of the Tennessee River basin in the southeastern United States. We sequenced 603 base-pairs of a mitochondrial DNA gene (ND-1) and 512 base-pairs of a nuclear DNA gene (ITS-1). Analysis of molecular variance (AMOVA) values for both genes indicated that the majority of variation in L. dolabelloides resided within populations (82.9-88.3%), with 11.7-17.1% of variation among populations. Haplotype frequencies differed significantly among populations for both genes sequenced. Clustering of haplotypes in minimum-spanning networks did not conform stringently to population boundaries, reflecting high within-population and low between-population variability. Maximum parsimony analysis did not identify any population as a monophyletic lineage. A Mantel test showed no significant correlation between geographical stream distance and genetic distance, thus not supporting a pattern of isolation-by-distance. Overall, results provided support to manage fragmented populations of L. dolabelloides in the Tennessee River drainage as two management units (MUs), but did not provide evidence for the existence of evolutionarily significant units (ESUs) following published molecular criteria. © The Author 2005. Published by Oxford University Press on behalf of The Malacological Society of London, all rights reserved.
Ayllon, Fernando; Kjærner-Semb, Erik; Furmanek, Tomasz; Wennevik, Vidar; Solberg, Monica F; Dahle, Geir; Taranger, Geir Lasse; Glover, Kevin A; Almén, Markus Sällman; Rubin, Carl J; Edvardsen, Rolf B; Wargelius, Anna
2015-11-01
Wild and domesticated Atlantic salmon males display large variation in sea age at sexual maturation, which ranges from 1 to 5 years. Previous studies have uncovered a genetic predisposition for variation of age at maturity with moderate heritability, thus suggesting a polygenic or complex nature of this trait. The aim of this study was to identify associated genetic loci, genes and ultimately specific sequence variants conferring sea age at maturity in salmon. We performed a genome wide association study (GWAS) using a pool sequencing approach (20 individuals per river and phenotype) of male salmon returning to rivers as sexually mature either after one sea winter (2009) or three sea winters (2011) in six rivers in Norway. The study revealed one major selective sweep, which covered 76 significant SNPs, of which 74 were found in a 370 kb region of chromosome 25. Genotyping other smolt year classes of wild and domesticated salmon confirmed this finding. Genotyping domesticated fish narrowed the haplotype region to four SNPs covering 2386 bp, containing the vgll3 gene, including two missense mutations explaining 33-36% of phenotypic variation. A single locus was found to have a highly significant role in governing sea age at maturation in this species. The SNPs identified may be used both as markers to guide breeding for late maturity in salmon aquaculture and in monitoring programs of wild salmon. Interestingly, a SNP in proximity of the VGLL3 gene in humans (Homo sapiens) has previously been linked to age at puberty, suggesting a conserved mechanism for timing of puberty in vertebrates.
Wang, Mi
2017-01-01
Polymorphism in cis-regulatory sequences can lead to different levels of expression for the two alleles of a gene, providing a starting point for the evolution of gene expression. Little is known about the genome-wide abundance of genetic variation in gene regulation in natural populations but analysis of allele-specific expression (ASE) provides a means for investigating such variation. We performed RNA-seq of multiple tissues from population samples of two closely related flycatcher species and developed a Bayesian algorithm that maximizes data usage by borrowing information from the whole data set and combines several SNPs per transcript to detect ASE. Of 2,576 transcripts analyzed in collared flycatcher, ASE was detected in 185 (7.2%) and a similar frequency was seen in the pied flycatcher. Transcripts with statistically significant ASE commonly showed the major allele in >90% of the reads, reflecting that power was highest when expression was heavily biased toward one of the alleles. This would suggest that the observed frequencies of ASE likely are underestimates. The proportion of ASE transcripts varied among tissues, being lowest in testis and highest in muscle. Individuals often showed ASE of particular transcripts in more than one tissue (73.4%), consistent with a genetic basis for regulation of gene expression. The results suggest that genetic variation in regulatory sequences commonly affects gene expression in natural populations and that it provides a seedbed for phenotypic evolution via divergence in gene expression. PMID:28453623
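To make the notion of allele-specific expression concrete, the sketch below tests a single heterozygous SNP for allelic imbalance with an exact two-sided binomial test against a 50:50 expectation. This is a deliberately simpler stand-in, not the Bayesian multi-SNP algorithm the flycatcher study developed, and the read counts are invented.

```python
import math


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one (the usual 'minimum likelihood' definition).
    """
    probs = [math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are not lost to floating-point rounding.
    total = sum(q for q in probs if q <= observed * (1.0 + 1e-9))
    return min(1.0, total)


# Hypothetical read counts at one heterozygous SNP.
biased_p = binom_two_sided_p(k=45, n=50)    # 90% of reads carry the major allele
balanced_p = binom_two_sided_p(k=25, n=50)  # perfectly balanced expression
print(f"biased: p = {biased_p:.2e}; balanced: p = {balanced_p:.2f}")
```

The example also illustrates the abstract's point about power: with modest coverage, only strongly skewed ratios produce small p-values, so milder imbalance tends to go undetected.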
Plasmodium copy number variation scan: gene copy numbers evaluation in haploid genomes.
Beghain, Johann; Langlois, Anne-Claire; Legrand, Eric; Grange, Laura; Khim, Nimol; Witkowski, Benoit; Duru, Valentine; Ma, Laurence; Bouchier, Christiane; Ménard, Didier; Paul, Richard E; Ariey, Frédéric
2016-04-12
In eukaryotic genomes, deletion or amplification rates have been estimated to be a thousand times more frequent than single nucleotide variation. In Plasmodium falciparum, relatively few transcription factors have been identified, and the regulation of transcription is seemingly largely influenced by gene amplification events. Thus copy number variation (CNV) is a major mechanism enabling parasite genomes to adapt to new environmental changes. Currently, the detection of CNVs is based on quantitative PCR (qPCR), which is significantly limited by the relatively small number of genes that can be analysed at any one time. Technological advances that facilitate whole-genome sequencing, such as next generation sequencing (NGS), enable deeper analyses of genomic variation to be performed. Because the characteristics of Plasmodium CNVs need special consideration in algorithms and strategies, for which classical CNV detection programs are not suited, a dedicated algorithm to detect CNVs across the entire exome of P. falciparum was developed. This algorithm, called PlasmoCNVScan, is based on a custom read depth strategy applied to NGS data. The analysis of CNV identification on three genes known to have different levels of amplification and which are located either in the nuclear, apicoplast or mitochondrial genomes is presented. The results are correlated with the qPCR experiments usually used for identification of locus-specific amplification/deletion. This tool will facilitate the study of P. falciparum genomic adaptation in response to ecological changes: drug pressure, decreased transmission, reduction of the parasite population size (transition to pre-elimination endemic area).
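A read-depth strategy, in its simplest form, compares each gene's coverage with a genome-wide baseline. The sketch below is a generic illustration under stated assumptions, not the PlasmoCNVScan algorithm: it assumes a haploid genome in which the median gene sits at one copy, uses per-gene mean depths that are invented, and applies an arbitrary threshold of 1.5 to flag amplification.

```python
from statistics import median


def copy_number_estimates(gene_depth: dict[str, float],
                          baseline_copies: float = 1.0) -> dict[str, float]:
    """Estimate copy number per gene as depth relative to the median gene.

    Assumes most genes are single-copy, so the median depth represents
    `baseline_copies` (1 for a haploid genome).
    """
    reference_depth = median(gene_depth.values())
    return {gene: baseline_copies * depth / reference_depth
            for gene, depth in gene_depth.items()}


def amplified(estimates: dict[str, float], threshold: float = 1.5) -> list[str]:
    """Genes whose estimated copy number meets an (arbitrary) threshold."""
    return sorted(g for g, cn in estimates.items() if cn >= threshold)


# Hypothetical mean read depths per gene.
depths = {"gene_A": 100.0, "gene_B": 98.0, "gene_C": 102.0,
          "gene_D": 305.0, "gene_E": 100.0}
estimates = copy_number_estimates(depths)
flagged = amplified(estimates)
print(estimates, flagged)
```

Organellar genes would need their own baseline, since the apicoplast and mitochondrial genomes are present at different copy numbers than the nuclear genome; real data would also call for correcting coverage biases such as base composition before taking ratios.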
Gourraud, P A; Karaouni, A; Woo, J M; Schmidt, T; Oksenberg, J R; Hecht, F M; Liegler, T J; Barbour, J D
2011-03-01
We examined single nucleotide polymorphisms (SNPs) in the APOBEC3 locus on chromosome 22, paired with population sequences of pro-viral human immunodeficiency virus-1 (HIV-1) vif from peripheral blood mononuclear cells, from 96 recently HIV-1-infected treatment-naive adults. We found evidence for the existence of an APOBEC3H linkage disequilibrium (LD) block associated with variation in GA → AA, or APOBEC3F/H signature, sequence changes in pro-viral HIV-1 vif sequence (top 10 significant SNPs with a significant p = 4.8 × 10⁻³). We identified a common five position risk haplotype distal to APOBEC3H (A3Hrh). These markers were in high LD (D' = 1; r² = 0.98) to a previously described A3H "RED" haplotype containing a variant (E121) with enhanced susceptibility to HIV-1 Vif. This association was confirmed by a haplotype analysis. Homozygote carriers of the A3Hrh had lower GA → AA (A3F/H) sequence editing upon pro-viral HIV-1 vif sequence (p = 0.01), and lower HIV-1 RNA levels over time during early, untreated HIV-1 infection (p = 0.015, mixed effects model). This effect may be due to enhanced susceptibility of A3H forms to HIV-1 Vif mediated viral suppression of sequence editing activity, slowing viral diversification and escape from immune responses. Copyright © 2011 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
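The two LD measures quoted in the APOBEC3H record, D' and r², follow from haplotype and allele frequencies at a pair of biallelic loci. The sketch below uses the standard definitions; the frequencies are hypothetical and chosen only to show that D' can equal 1 while r² stays below 1 when the allele frequencies at the two loci differ.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d_prime, r2


# Hypothetical case: every A-bearing haplotype also carries B,
# but B is slightly more common than A.
d_prime, r2 = ld_stats(p_ab=0.30, p_a=0.30, p_b=0.32)
print(f"|D'| = {d_prime:.2f}, r2 = {r2:.2f}")
```

D' reaches 1 whenever at least one of the four possible haplotypes is absent, whereas r² reaches 1 only if the two loci also have matching allele frequencies, which is why a pair of values such as D' = 1 with r² just below 1 is common.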
Assembly-history dynamics of a pitcher-plant protozoan community in experimental microcosms.
Kadowaki, Kohmei; Inouye, Brian D; Miller, Thomas E
2012-01-01
History drives community assembly through differences both in density (density effects) and in the sequence in which species arrive (sequence effects). Density effects arise from predictable population dynamics, which are free of history, but sequence effects are due to a density-free mechanism, arising solely from the order and timing of immigration events. Few studies have determined how components of immigration history (timing, number of individuals, frequency) alter local dynamics to determine community assembly, beyond addressing when immigration history produces historically contingent assembly. We varied density and sequence effects independently in a two-way factorial design to follow community assembly in a three-species aquatic protozoan community. A superior competitor, Colpoda steinii, mediated alternative community states; early arrival or high introduction density allowed this species to outcompete or suppress the other competitors (Poterioochromonas malhamensis and Eimeriidae gen. sp.). Multivariate analysis showed that density effects caused greater variation in community states, whereas sequence effects altered the mean community composition. A significant interaction between density and sequence effects suggests that we should refine our understanding of priority effects. These results highlight a practical need to understand not only the "ingredients" (species) in ecological communities but their "recipes" as well.
Organization and evolution of highly repeated satellite DNA sequences in plant chromosomes.
Sharma, S; Raina, S N
2005-01-01
A major component of the plant nuclear genome is constituted by different classes of repetitive DNA sequences. The structural, functional and evolutionary aspects of the satellite repetitive DNA families, and their organization in the chromosomes, are reviewed. The tandem satellite DNA sequences exhibit characteristic chromosomal locations, usually at subtelomeric and centromeric regions. The repetitive DNA family(ies) may be widely distributed in a taxonomic family or a genus, or may be specific for a species, genome or even a chromosome. They may acquire large-scale variations in their sequence and copy number over an evolutionary time-scale. These features have formed the basis of extensive utilization of repetitive sequences for taxonomic and phylogenetic studies. Hybrid polyploids have especially proven to be excellent models for studying the evolution of repetitive DNA sequences. Recent studies explicitly show that some repetitive DNA families localized at the telomeres and centromeres have acquired important structural and functional significance. The repetitive elements are under different evolutionary constraints as compared to the genes. Satellite DNA families are thought to arise de novo as a consequence of molecular mechanisms such as unequal crossing over, rolling circle amplification, replication slippage and mutation that constitute "molecular drive". Copyright 2005 S. Karger AG, Basel.
Singh, Satyendra K; Prasad, Kashi N; Singh, Aloukick K; Gupta, Kamlesh K; Chauhan, Ranjeet S; Singh, Amrita; Singh, Avinash; Rai, Ravi P; Pati, Binod K
2016-10-01
Taenia solium is the major cause of taeniasis and cysticercosis/neurocysticercosis (NCC) in developing countries including India, but the existence of other Taenia species and genetic variation have not been studied in India. We therefore studied the existence of different Taenia species, and sequence variation in Taenia isolates from human (proglottids and cysticerci) and swine (cysticerci) in North India. Amplification of the cytochrome c oxidase subunit 1 gene (cox1) was done by polymerase chain reaction (PCR) followed by sequencing and phylogenetic analysis. We identified two species of Taenia, i.e. T. solium and Taenia asiatica, in our isolates. T. solium isolates showed similarity with the Asian genotype and nucleotide variations from 0.25 to 1.01 %, whereas T. asiatica displayed nucleotide variations ranging from 0.25 to 0.5 %. These findings indicate minimal genetic variation in North Indian isolates of T. solium and T. asiatica.
Understanding the mechanisms of protein-DNA interactions
NASA Astrophysics Data System (ADS)
Lavery, Richard
2004-03-01
Structural, biochemical and thermodynamic data on protein-DNA interactions show that specific recognition cannot be reduced to a simple set of binary interactions between the partners (such as hydrogen bonds, ion pairs or steric contacts). The mechanical properties of the partners also play a role and, in the case of DNA, variations in both conformation and flexibility as a function of base sequence can be a significant factor in guiding a protein to the correct binding site. All-atom molecular modeling offers a means of analyzing the role of different binding mechanisms within protein-DNA complexes of known structure. This however requires estimating the binding strengths for the full range of sequences with which a given protein can interact. Since this number grows exponentially with the length of the binding site it is necessary to find a method to accelerate the calculations. We have achieved this by using a multi-copy approach (ADAPT) which allows us to build a DNA fragment with a variable base sequence. The results obtained with this method correlate well with experimental consensus binding sequences. They enable us to show that indirect recognition mechanisms involving the sequence dependent properties of DNA play a significant role in many complexes. This approach also offers a means of predicting protein binding sites on the basis of binding energies, which is complementary to conventional lexical techniques.
Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.
2013-01-01
Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204
Sequence investigation of 34 forensic autosomal STRs with massively parallel sequencing.
Zhang, Suhua; Niu, Yong; Bian, Yingnan; Dong, Rixia; Liu, Xiling; Bao, Yun; Jin, Chao; Zheng, Hancheng; Li, Chengtao
2018-05-01
STRs vary not only in the length of the repeat units and the number of repeats but also in the region with which they conform to an incremental repeat pattern. Massively parallel sequencing (MPS) offers new possibilities in the analysis of STRs since it can simultaneously sequence multiple targets in a single reaction and capture potential internal sequence variations. Here, we sequenced 34 STRs applied in the forensic community of China with a custom-designed panel. MPS performance was evaluated from sequencing reads analysis, concordance study and sensitivity testing. High coverage sequencing data were obtained to determine the constitute ratios and heterozygous balance. No actual inconsistent genotypes were observed between capillary electrophoresis (CE) and MPS, demonstrating the reliability of the panel and the MPS technology. With the sequencing data from the 200 investigated individuals, 346 and 418 alleles were obtained via CE and MPS technologies at the 34 STRs, indicating MPS technology provides higher discrimination than CE detection. The whole study demonstrated that STR genotyping with the custom panel and MPS technology has the potential not only to reveal length and sequence variations but also to satisfy the demands of high throughput and high multiplexing with acceptable sensitivity.
Küpper, Clemens; Burke, Terry; Lank, David B.
2015-01-01
Sequence variation in the melanocortin-1 receptor (MC1R) gene explains color morph variation in several species of birds and mammals. Ruffs (Philomachus pugnax) exhibit major dark/light color differences in melanin-based male breeding plumage which is closely associated with alternative reproductive behavior. A previous study identified a microsatellite marker (Ppu020) near the MC1R locus associated with the presence/absence of ornamental plumage. We investigated whether coding sequence variation in the MC1R gene explains major dark/light plumage color variation and/or the presence/absence of ornamental plumage in ruffs. Among 821bp of the MC1R coding region from 44 male ruffs we found 3 single nucleotide polymorphisms, representing 1 nonsynonymous and 2 synonymous amino acid substitutions. None were associated with major dark/light color differences or the presence/absence of ornamental plumage. At all amino acid sites known to be functionally important in other avian species with dark/light plumage color variation, ruffs were either monomorphic or the shared polymorphism did not coincide with color morph. Neither ornamental plumage color differences nor the presence/absence of ornamental plumage in ruffs are likely to be caused entirely by amino acid variation within the coding regions of the MC1R locus. Regulatory elements and structural variation at other loci may be involved in melanin expression and contribute to the extreme plumage polymorphism observed in this species. PMID:25534935
An evaluation of copy number variation detection tools for cancer using whole exome sequencing data.
Zare, Fatima; Dow, Michelle; Monteleone, Nicholas; Hosny, Abdelrahman; Nabavi, Sheida
2017-05-31
Recently copy number variation (CNV) has gained considerable interest as a type of genomic/genetic variation that plays an important role in disease susceptibility. Advances in sequencing technology have created an opportunity for detecting CNVs more accurately. Recently whole exome sequencing (WES) has become the primary strategy for sequencing patient samples and studying their genomic aberrations. However, compared to whole genome sequencing, WES introduces more biases and noise that make CNV detection very challenging. Additionally, tumors' complexity makes the detection of cancer specific CNVs even more difficult. Although many CNV detection tools have been developed since the introduction of NGS data, there are few tools for somatic CNV detection for WES data in cancer. In this study, we evaluated the performance of the most recent and commonly used CNV detection tools for WES data in cancer to address their limitations and provide guidelines for developing new ones. We focused on the tools that have been designed or have the ability to detect cancer somatic aberrations. We compared the performance of the tools in terms of sensitivity and false discovery rate (FDR) using real data and simulated data. Comparative analysis of the results of the tools showed that there is a low consensus among the tools in calling CNVs. Using real data, tools show moderate sensitivity (~50% - ~80%), fair specificity (~70% - ~94%) and poor FDRs (~27% - ~60%). Also, using simulated data we observed that increasing the coverage more than 10× in exonic regions does not improve the detection power of the tools significantly. The limited performance of the current CNV detection tools for WES data in cancer indicates the need for developing more efficient and precise CNV detection methods.
Due to the complexity of tumors and high level of noise and biases in WES data, employing advanced novel segmentation, normalization and de-noising techniques that are designed specifically for cancer data is necessary. Also, CNV detection development suffers from the lack of a gold standard for performance evaluation. Finally, developing tools with user-friendly user interfaces and visualization features can enhance CNV studies for a broader range of users.
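The three performance measures named in this abstract (sensitivity, specificity and FDR) follow directly from a confusion matrix. The sketch below uses the standard definitions; the function name `detection_metrics` and the counts are hypothetical and are not results from the study.

```python
def detection_metrics(tp, fp, fn, tn):
    """Compute sensitivity, specificity and false discovery rate (FDR).

    tp, fp, fn, tn are counts of true positive, false positive,
    false negative and true negative calls (for example per exon or
    per segment). The counts used below are made up for illustration.
    """
    sensitivity = tp / (tp + fn)   # fraction of real CNVs that were called
    specificity = tn / (tn + fp)   # fraction of non-CNV regions left uncalled
    fdr = fp / (tp + fp)           # fraction of calls that are wrong
    return {"sensitivity": sensitivity, "specificity": specificity, "fdr": fdr}


if __name__ == "__main__":
    # Hypothetical tool: 60 of 100 true CNVs found, 40 spurious calls,
    # 160 of 200 negative regions correctly left uncalled.
    print(detection_metrics(tp=60, fp=40, fn=40, tn=160))
```

With these illustrative counts a tool can show fair specificity while 40% of its calls are false, which is the kind of combination the abstract reports.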
2013-01-01
Background: SNPs&GO is a method for the prediction of deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM) and for a given protein, its input comprises: the sequence and/or its three-dimensional structure (when available), a set of target variations and its functional Gene Ontology (GO) terms. The output of the server provides, for each protein variation, the probability of association with human disease. Results: The server consists of two main components, including updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO3d programs. Sequence and structure based algorithms are extensively tested on a large set of annotated variations extracted from the SwissVar database. Selecting a balanced dataset with more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, 0.61 correlation coefficient and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped on protein structures available at the Protein Data Bank (PDB), the structure-based method scores with 84% overall accuracy, 0.68 correlation coefficient, and 0.91 AUC. When tested on a new blind set of variations, the results of the server are 79% and 83% overall accuracy for the sequence-based and structure-based inputs, respectively. Conclusions: WS-SNPs&GO is a valuable tool that includes in a unique framework information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go. PMID:23819482
Mutsaerts, Henri J M M; van Osch, Matthias J P; Zelaya, Fernando O; Wang, Danny J J; Nordhøy, Wibeke; Wang, Yi; Wastling, Stephen; Fernandez-Seara, Maria A; Petersen, E T; Pizzini, Francesca B; Fallatah, Sameeha; Hendrikse, Jeroen; Geier, Oliver; Günther, Matthias; Golay, Xavier; Nederveen, Aart J; Bjørnerud, Atle; Groote, Inge R
2015-06-01
A main obstacle impeding standardized clinical and research applications of arterial spin labeling (ASL) is the substantial difference between the commercial implementations of ASL from major MRI vendors. In this study, we compare a single identical 2D gradient-echo EPI pseudo-continuous ASL (PCASL) sequence implemented on 3T scanners from three vendors (General Electric Healthcare, Philips Healthcare and Siemens Healthcare) within the same center and with the same subjects. Fourteen healthy volunteers (50% male, age 26.4 ± 4.7 years) were scanned twice on each scanner in an interleaved manner within 3 h. Because of differences in gradient and coil specifications, two separate studies were performed with slightly different sequence parameters, with one scanner used across both studies for comparison. Reproducibility was evaluated by means of quantitative cerebral blood flow (CBF) agreement and inter-session variation, both on a region-of-interest (ROI) and voxel level. In addition, a qualitative similarity comparison of the CBF maps was performed by three experienced neuro-radiologists. There were no CBF differences between vendors in study 1 (p>0.1), but there were CBF differences of 2-19% between vendors in study 2 (p<0.001 in most gray matter ROIs) and 10-22% difference in CBF values obtained with the same vendor between studies (p<0.001 in most gray matter ROIs). The inter-vendor inter-session variation was not significantly larger than the intra-vendor variation in all (p>0.1) but one of the ROIs (p<0.001). This study demonstrates the possibility to acquire comparable cerebral CBF maps on scanners of different vendors. Small differences in sequence parameters can have a larger effect on the reproducibility of ASL than hardware or software differences between vendors. These results suggest that researchers should strive to employ identical labeling and readout strategies in multi-center ASL studies. Copyright © 2015 Elsevier Inc. All rights reserved.
Albayrak, Levent; Khanipov, Kamil; Pimenova, Maria; Golovko, George; Rojas, Mark; Pavlidis, Ioannis; Chumakov, Sergei; Aguilar, Gerardo; Chávez, Arturo; Widger, William R; Fofanov, Yuriy
2016-12-12
Low-abundance mutations in mitochondrial populations (mutations with minor allele frequency ≤ 1%) are associated with cancer, aging, and neurodegenerative disorders. While recent progress in high-throughput sequencing technology has significantly improved the heteroplasmy identification process, the ability of this technology to detect low-abundance mutations can be affected by the presence of similar sequences originating from nuclear DNA (nDNA). To determine to what extent nDNA can cause false positive low-abundance heteroplasmy calls, we have identified mitochondrial locations of all subsequences that are common or similar (one mismatch allowed) between nDNA and mitochondrial DNA (mtDNA). The analysis revealed up to a 25-fold variation in the lengths of the longest common and longest similar (one mismatch allowed) subsequences across the mitochondrial genome. The size of the longest subsequences shared between nDNA and mtDNA in several regions of the mitochondrial genome was found to be as low as 11 bases, which not only allows using these regions to design new, very specific PCR primers, but also supports the hypothesis of the non-random introduction of mtDNA into the human nuclear DNA. Analysis of the mitochondrial locations of the subsequences shared between nDNA and mtDNA suggested that even very short (36 bases) single-end sequencing reads can be used to identify low-abundance variation in 20.4% of the mitochondrial genome. For longer (76 and 150 bases) reads, the proportion of the mitochondrial genome where nDNA presence will not interfere was found to be 44.5 and 67.9%, respectively, whereas low-abundance mutations at 100% of locations can be identified using 417-base-long single reads. This observation suggests that the analysis of low-abundance variations in mitochondrial populations can be extended to a variety of large data collections such as NCBI Sequence Read Archive, European Nucleotide Archive, The Cancer Genome Atlas, and International Cancer Genome Consortium.
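The idea of locating mitochondrial positions whose read-length subsequences also occur in nuclear DNA can be sketched in a few lines. This is a simplified illustration, not the authors' pipeline: it handles exact matches only (no one-mismatch search, no reverse complement), and the function names and toy sequences are hypothetical.

```python
def shared_kmer_positions(mt_seq, nuc_seq, k):
    """Return start positions in mt_seq whose k-mer also occurs in nuc_seq.

    Exact matching only; the one-mismatch case described in the abstract
    is not implemented in this sketch.
    """
    # Index every k-mer of the nuclear sequence for constant-time lookup.
    nuc_kmers = {nuc_seq[i:i + k] for i in range(len(nuc_seq) - k + 1)}
    return [i for i in range(len(mt_seq) - k + 1)
            if mt_seq[i:i + k] in nuc_kmers]


def unambiguous_fraction(mt_seq, nuc_seq, k):
    """Fraction of k-mer start positions in mt_seq with no exact nuclear copy."""
    total = len(mt_seq) - k + 1
    if total <= 0:
        return 0.0
    shared = len(shared_kmer_positions(mt_seq, nuc_seq, k))
    return (total - shared) / total


if __name__ == "__main__":
    mt = "ACGTACGGA"    # toy "mitochondrial" sequence
    nuc = "TTACGTTT"    # toy "nuclear" sequence
    print(shared_kmer_positions(mt, nuc, 4))
    print(unambiguous_fraction(mt, nuc, 4))
```

Increasing `k` (the read length) can only shrink the set of shared positions, which matches the trend reported above of a larger usable fraction of the mitochondrial genome with longer reads.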
Cappi, C; Brentani, H; Lima, L; Sanders, S J; Zai, G; Diniz, B J; Reis, V N S; Hounie, A G; Conceição do Rosário, M; Mariani, D; Requena, G L; Puga, R; Souza-Duran, F L; Shavitt, R G; Pauls, D L; Miguel, E C; Fernandez, T V
2016-01-01
Studies of rare genetic variation have identified molecular pathways conferring risk for developmental neuropsychiatric disorders. To date, no published whole-exome sequencing studies have been reported in obsessive-compulsive disorder (OCD). We sequenced all the genome coding regions in 20 sporadic OCD cases and their unaffected parents to identify rare de novo (DN) single-nucleotide variants (SNVs). The primary aim of this pilot study was to determine whether DN variation contributes to OCD risk. To this aim, we evaluated whether there is an elevated rate of DN mutations in OCD, which would justify this approach toward gene discovery in larger studies of the disorder. Furthermore, to explore functional molecular correlations among genes with nonsynonymous DN SNVs in OCD probands, a protein–protein interaction (PPI) network was generated based on databases of direct molecular interactions. We applied Degree-Aware Disease Gene Prioritization (DADA) to rank the PPI network genes based on their relatedness to a set of OCD candidate genes from two OCD genome-wide association studies (Stewart et al., 2013; Mattheisen et al., 2014). In addition, we performed a pathway analysis with genes from the PPI network. The rate of DN SNVs in OCD was 2.51 × 10⁻⁸ per base per generation, significantly higher than a previous estimated rate in unaffected subjects using the same sequencing platform and analytic pipeline. Several genes harboring DN SNVs in OCD were highly interconnected in the PPI network and ranked high in the DADA analysis. Nearly all the DN SNVs in this study are in genes expressed in the human brain, and a pathway analysis revealed enrichment in immunological and central nervous system functioning and development.
The results of this pilot study indicate that further investigation of DN variation in larger OCD cohorts is warranted to identify specific risk genes and to confirm our preliminary finding with regard to PPI network enrichment for particular biological pathways and functions. PMID:27023170
He, Weiguo; Qin, Qinbo; Liu, Shaojun; Li, Tangluo; Wang, Jing; Xiao, Jun; Xie, Lihua; Zhang, Chun; Liu, Yun
2012-01-01
Through distant crossing, diploid, triploid and tetraploid hybrids of red crucian carp (Carassius auratus red var., RCC♀, Cyprininae, 2n = 100) × topmouth culter (Erythroculter ilishaeformis Bleeker, TC♂, Cultrinae, 2n = 48) were successfully produced. Diploid hybrids possessed 74 chromosomes with one set from RCC and one set from TC; triploid hybrids harbored 124 chromosomes with two sets from RCC and one set from TC; tetraploid hybrids had 148 chromosomes with two sets from RCC and two sets from TC. The 5S rDNA of the three different ploidy-level hybrids and their parents were sequenced and analyzed. There were three monomeric 5S rDNA classes (designated class I: 203 bp; class II: 340 bp; and class III: 477 bp) in RCC and two monomeric 5S rDNA classes (designated class IV: 188 bp, and class V: 286 bp) in TC. In the hybrid offspring, diploid hybrids inherited three 5S rDNA classes from their female parent (RCC) and only class IV from their male parent (TC). Triploid hybrids inherited class II and class III from their female parent (RCC) and class IV from their male parent (TC). Tetraploid hybrids gained class II and class III from their female parent (RCC), and generated a new 5S rDNA sequence (designated class I-N). The specific paternal 5S rDNA sequence of class V was not found in the hybrid offspring. Sequence analysis of 5S rDNA revealed the influence of hybridization and polyploidization on the organization and variation of 5S rDNA in fish. This is the first report on the coexistence in vertebrates of viable diploid, triploid and tetraploid hybrids produced by crossing parents with different chromosome numbers, and these new hybrids are novel specimens for studying the genomic variation in the first generation of interspecific hybrids, which has significance for evolution and fish genetics.
Blanco-Marchite, Cristina; Sánchez-Sánchez, Francisco; López-Garrido, María-Pilar; Iñigez-de-Onzoño, Mercedes; López-Martínez, Francisco; López-Sánchez, Enrique; Alvarez, Lydia; Rodríguez-Calvo, Pedro-Pablo; Méndez-Hernández, Carmen; Fernández-Vega, Luis; García-Sánchez, Julián; Coca-Prados, Miguel; García-Feijoo, Julián
2011-01-01
Purpose. To investigate the role of WDR36 and P53 sequence variations in POAG susceptibility. Methods. The authors performed a case-control genetic association study in 268 unrelated Spanish patients (POAG1) and 380 control subjects matched for sex, age, and ethnicity. WDR36 sequence variations were screened by either direct DNA sequencing or denaturing high-performance liquid chromatography. P53 polymorphisms p.R72P and c.97–147ins16bp were analyzed by single-nucleotide polymorphism (SNP) genotyping and PCR, respectively. Positive SNP and haplotype associations were reanalyzed in a second sample of 211 patients and in combined cases (n = 479). Results. The authors identified almost 50 WDR36 sequence variations, of which approximately two-thirds were rare and one-third were polymorphisms. Approximately half the variants were novel. Eight patients (2.9%) carried rare mutations that were not identified in the control group (P = 0.001). Six Tag SNPs were expected to be structured in three common haplotypes. Haplotype H2 was consistently associated with the disease (P = 0.0024 in combined cases). According to a dominant model, genotypes containing allele P of the P53 p.R72P SNP slightly increased glaucoma risk. Glaucoma susceptibility associated with different WDR36 genotypes also increased significantly in combination with the P53 RP risk genotype, indicating the existence of a genetic interaction. For instance, the OR of the H2 diplotype estimated for POAG1 and combined cases rose approximately 1.6 times in the two-locus genotype H2/RP. Conclusions. Rare WDR36 variants and the P53 p.R72P polymorphism behaved as moderate glaucoma risk factors in Spanish patients. The authors provide evidence for a genetic interaction between WDR36 and P53 variants in POAG susceptibility, although this finding must be confirmed in other populations. PMID:21931130
Machczyńska, Joanna; Zimny, Janusz; Bednarek, Piotr Tomasz
2015-10-01
Plant regeneration via in vitro culture can induce genetic and epigenetic variation; however, the extent of such changes in triticale is not yet understood. In the present study, metAFLP, a variation of methylation-sensitive amplified fragment length polymorphism analysis, was used to investigate tissue culture-induced variation in triticale regenerants derived from four distinct genotypes using androgenesis and somatic embryogenesis. The metAFLP technique enabled identification of both sequence and DNA methylation pattern changes in a single experiment. Moreover, it was possible to quantify subtle effects such as sequence variation, demethylation, and de novo methylation, which affected 19, 5.5, and 4.5% of sites, respectively. Comparison of variation in different genotypes and with different in vitro regeneration approaches demonstrated that both the culture technique and genetic background of donor plants affected tissue culture-induced variation. The results showed that the metAFLP approach could be used for quantification of tissue culture-induced variation and provided direct evidence that in vitro plant regeneration could cause genetic and epigenetic variation.
Wu, Gary D; Lewis, James D; Hoffmann, Christian; Chen, Ying-Yu; Knight, Rob; Bittinger, Kyle; Hwang, Jennifer; Chen, Jun; Berkowsky, Ronald; Nessel, Lisa; Li, Hongzhe; Bushman, Frederic D
2010-07-30
Intense interest centers on the role of the human gut microbiome in health and disease, but optimal methods for analysis are still under development. Here we present a study of methods for surveying bacterial communities in human feces using 454/Roche pyrosequencing of 16S rRNA gene tags. We analyzed fecal samples from 10 individuals and compared methods for storage, DNA purification and sequence acquisition. To assess reproducibility, we compared samples 1 cm apart on a single stool specimen for each individual. To analyze storage methods, we compared 1) immediate freezing at -80 °C, 2) storage on ice for 24 hours, or 3) storage on ice for 48 hours. For DNA purification methods, we tested three commercial kits and bead beating in hot phenol. Variations due to the different methodologies were compared to variation among individuals using two approaches--one based on presence-absence information for bacterial taxa (unweighted UniFrac) and the other taking into account their relative abundance (weighted UniFrac). In the unweighted analysis relatively little variation was associated with the different analytical procedures, and variation between individuals predominated. In the weighted analysis considerable variation was associated with the purification methods. Particularly notable was improved recovery of Firmicutes sequences using the hot phenol method. We also carried out surveys of the effects of different 454 sequencing methods (FLX versus Titanium) and amplification of different 16S rRNA variable gene segments. Based on our findings we present recommendations for protocols to collect, process and sequence bacterial 16S rDNA from fecal samples--some major points are 1) if feasible, bead-beating in hot phenol or use of the PSP kit improves recovery; 2) storage methods can be adjusted based on experimental convenience; 3) unweighted (presence-absence) comparisons are less affected by lysis method.
Salleh, Mohd Zaki; Teh, Lay Kek; Lee, Lian Shien; Ismet, Rose Iszati; Patowary, Ashok; Joshi, Kandarp; Pasha, Ayesha; Ahmed, Azni Zain; Janor, Roziah Mohd; Hamzah, Ahmad Sazali; Adam, Aishah; Yusoff, Khalid; Hoh, Boon Peng; Hatta, Fazleen Haslinda Mohd; Ismail, Mohamad Izwan; Scaria, Vinod; Sivasubbu, Sridhar
2013-01-01
With a higher throughput and lower cost in sequencing, second generation sequencing technology has immense potential for translation into clinical practice and in the realization of pharmacogenomics based patient care. The systematic analysis of whole genome sequences to assess patient to patient variability in pharmacokinetic and pharmacodynamic responses towards drugs would be the next step in future medicine in line with the vision of personalizing medicine. Genomic DNA obtained from a 55-year-old, self-declared healthy, anonymous male of Malay descent was sequenced. The subject's mother died of lung cancer and the father had a history of schizophrenia and died at the age of 65. A systematic, intuitive computational workflow/pipeline integrating a custom algorithm in tandem with large datasets of variant annotations and gene functions for genetic variations with pharmacogenomics impact was developed. A comprehensive pathway map of drug transport, metabolism and action was used as a template to map non-synonymous variations with potential functional consequences. Over 3 million known variations and 100,898 novel variations in the Malay genome were identified. Further in-depth pharmacogenetics analysis revealed a total of 607 unique variants in 563 proteins, with the eventual identification of 4 drug transport genes, 2 drug metabolizing enzyme genes and 33 target genes harboring deleterious SNVs involved in pharmacological pathways, which could have a potential role in clinical settings. The current study successfully unravels the potential of personal genome sequencing in understanding the functionally relevant variations with potential influence on drug transport, metabolism and differential therapeutic outcomes. These will be essential for realizing personalized medicine through the use of a comprehensive computational pipeline for systematic data mining and analysis.
Turnbaugh, Peter J.; Quince, Christopher; Faith, Jeremiah J.; McHardy, Alice C.; Yatsunenko, Tanya; Niazi, Faheem; Affourtit, Jason; Egholm, Michael; Henrissat, Bernard; Knight, Rob; Gordon, Jeffrey I.
2010-01-01
We deeply sampled the organismal, genetic, and transcriptional diversity in fecal samples collected from a monozygotic (MZ) twin pair and compared the results to 1,095 communities from the gut and other body habitats of related and unrelated individuals. Using a new scheme for noise reduction in pyrosequencing data, we estimated the total diversity of species-level bacterial phylotypes in the 1.2-1.5 million bacterial 16S rRNA reads obtained from each deeply sampled cotwin to be ~800 (35.9%, 49.1% detected in both). A combined 1.1 million read 16S rRNA dataset representing 281 shallowly sequenced fecal samples from 54 twin pairs and their mothers contained an estimated 4,018 species-level phylotypes, with each sample having a unique species assemblage (53.4 ± 0.6% and 50.3 ± 0.5% overlap with the deeply sampled cotwins). Of the 134 phylotypes with a relative abundance of >0.1% in the combined dataset, only 37 appeared in >50% of the samples, with one phylotype in the Lachnospiraceae family present in 99%. Nongut communities had significantly reduced overlap with the deeply sequenced twins’ fecal microbiota (18.3 ± 0.3%, 15.3 ± 0.3%). The MZ cotwins’ fecal DNA was deeply sequenced (3.8-6.3 Gbp/sample) and assembled reads were assigned to 25 genus-level phylogenetic bins. Only 17% of the genes in these bins were shared between the cotwins. Bins exhibited differences in their degree of sequence variation, gene content including the repertoire of carbohydrate active enzymes present within and between twins (e.g., predicted cellulases, dockerins), and transcriptional activities. These results provide an expanded perspective about features that make each of us unique life forms and directions for future characterization of our gut ecosystems. PMID:20363958
Steinberg, Karyn Meltz; Ramachandran, Dhanya; Patel, Viren C; Shetty, Amol C; Cutler, David J; Zwick, Michael E
2012-09-28
Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. We found seven rare variants at evolutionary conserved sites in our study population. Functional analyses of the three 3' UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects.
2012-01-01
Background Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. Methods We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. Results We found seven rare variants at evolutionary conserved sites in our study population. Functional analyses of the three 3’ UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. Conclusions These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects. PMID:23020841
Iwanowicz, Luke; Work, Thierry M.; Fahsbender, Elizabeth; Breitbart, Mya; Adams, Cynthia; Iwanowicz, Deb; Sanders, Lakyn; Ackermann, Mathias; Cornman, Robert S.
2018-01-01
Chelonid alphaherpesvirus 5 (ChHV5) is a herpesvirus associated with fibropapillomatosis (FP) in sea turtles worldwide. Single-locus typing has previously shown differentiation between Atlantic and Pacific strains of this virus, with low variation within each geographic clade. However, a lack of multi-locus genomic sequence data hinders understanding of the rate and mechanisms of ChHV5 evolutionary divergence, as well as how these genomic changes may contribute to differences in disease manifestation. To assess genomic variation in ChHV5 among five Hawaii and three Florida green sea turtles, we used high-throughput short-read sequencing of long-range PCR products amplified from tumor tissue using primers designed from the single available ChHV5 reference genome from a Hawaii green sea turtle. This strategy recovered sequence data from both geographic regions for approximately 75% of the predicted ChHV5 coding sequences. The average nucleotide divergence between geographic populations was 1.5%; most of the substitutions were fixed differences between regions. Protein divergence was generally low (average 0.08%), and ranged between 0 and 5.3%. Several atypical genes originally identified and annotated in the reference genome were confirmed in ChHV5 genomes from both geographic locations. Unambiguous recombination events between geographic regions were identified, and clustering of private alleles suggests the prevalence of recombination in the evolutionary history of ChHV5. This study significantly increased the amount of sequence data available from ChHV5 strains, enabling informed selection of loci for future population genetic and natural history studies, and suggesting the (possibly latent) co-infection of individuals by well-differentiated geographic variants. PMID:29479497
Omaleki, Lida; Browning, Glenn F; Barber, Stuart R; Allen, Joanne L; Srikumaran, Subramaniam; Markham, Philip F
2014-11-07
Species within the genus Mannheimia are among the most important causes of ovine mastitis. Isolates of these species can express leukotoxin A (LktA), a primary virulence factor of these bacteria. To examine the significance of variation in the LktA, the sequences of the lktA genes in a panel of isolates from cases of ovine mastitis were compared. The cross-neutralising capacities of rat antisera raised against LktA of one Mannheimia glucosida, one haemolytic Mannheimia ruminalis, and two Mannheimia haemolytica isolates were also examined to assess the effect that variation in the lktA gene can have on protective immunity against leukotoxins with differing sequences. The lktA nucleotide distance between the M. haemolytica isolates was greater than between the M. glucosida isolates, with the M. haemolytica isolates divisible into two groups based on their lktA sequences. Comparison of the topology of phylogenetic trees of 16S rDNA and lktA sequences revealed differences in the relationships between some isolates, suggesting horizontal gene transfer. Cross neutralisation data obtained with monospecific anti-LktA rat sera were used to derive antigenic similarity coefficients for LktA from the four Mannheimia species isolates. Similarity coefficients indicated that LktA of the two M. haemolytica isolates were least similar, while LktA from M. glucosida was most similar to those for one of the M. haemolytica isolates and the haemolytic M. ruminalis isolate. The results suggested that vaccination with the M. glucosida leukotoxin would generate the greatest cross-protection against ovine mastitis caused by Mannheimia species with these alleles. Copyright © 2014 Elsevier B.V. All rights reserved.
Assessing the Fidelity of Ancient DNA Sequences Amplified From Nuclear Genes
Binladen, Jonas; Wiuf, Carsten; Gilbert, M. Thomas P.; Bunce, Michael; Barnett, Ross; Larson, Greger; Greenwood, Alex D.; Haile, James; Ho, Simon Y. W.; Hansen, Anders J.; Willerslev, Eske
2006-01-01
To date, the field of ancient DNA has relied almost exclusively on mitochondrial DNA (mtDNA) sequences. However, a number of recent studies have reported the successful recovery of ancient nuclear DNA (nuDNA) sequences, thereby allowing the characterization of genetic loci directly involved in phenotypic traits of extinct taxa. It is well documented that postmortem damage in ancient mtDNA can lead to the generation of artifactual sequences. However, as yet no one has thoroughly investigated the damage spectrum in ancient nuDNA. By comparing clone sequences from 23 fossil specimens, recovered from environments ranging from permafrost to desert, we demonstrate the presence of miscoding lesion damage in both the mtDNA and nuDNA, resulting in insertion of erroneous bases during amplification. Interestingly, no significant differences in the frequency of miscoding lesion damage are recorded between mtDNA and nuDNA despite great differences in cellular copy numbers. For both mtDNA and nuDNA, we find significant positive correlations between total sequence heterogeneity and the rates of type 1 transitions (adenine → guanine and thymine → cytosine) and type 2 transitions (cytosine → thymine and guanine → adenine), respectively. Type 2 transitions are by far the most dominant and increase relative to those of type 1 with damage load. The results suggest that the deamination of cytosine (and 5-methyl cytosine) to uracil (and thymine) is the main cause of miscoding lesions in both ancient mtDNA and nuDNA sequences. We argue that the problems presented by postmortem damage, as well as problems with contamination from exogenous sources of conserved nuclear genes, allelic variation, and the reliance on single nucleotide polymorphisms, call for great caution in studies relying on ancient nuDNA sequences. PMID:16299392
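The type 1 and type 2 transition categories defined in this abstract can be tallied mechanically once a clone sequence is compared with its consensus. The sketch below is illustrative only: the function names are hypothetical, the two sequences are assumed to be aligned and of equal length, and sites that are not unambiguous bases are ignored.

```python
from collections import Counter

# Categories as defined in the abstract above.
TYPE1 = {("A", "G"), ("T", "C")}  # adenine -> guanine, thymine -> cytosine
TYPE2 = {("C", "T"), ("G", "A")}  # cytosine -> thymine, guanine -> adenine


def classify_substitution(ref: str, obs: str) -> str:
    """Label a single mismatch as 'type1', 'type2', or 'other'."""
    pair = (ref.upper(), obs.upper())
    if pair in TYPE1:
        return "type1"
    if pair in TYPE2:
        return "type2"
    return "other"  # transversions and anything else


def count_damage(consensus: str, clone: str) -> Counter:
    """Count mismatches between a consensus and one clone, by category."""
    if len(consensus) != len(clone):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for ref, obs in zip(consensus.upper(), clone.upper()):
        if ref not in "ACGT" or obs not in "ACGT":
            continue  # skip gaps and ambiguous bases
        if ref != obs:
            counts[classify_substitution(ref, obs)] += 1
    return counts
```

For example, comparing the consensus `ACGTAC` with the clone `ATGTGC` yields one type 2 change (C to T) and one type 1 change (A to G). Summing such counts over many clones gives the per-category rates that the abstract correlates with total sequence heterogeneity.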
Typing Clostridium difficile strains based on tandem repeat sequences
2009-01-01
Background Genotyping of epidemic Clostridium difficile strains is necessary to track their emergence and spread. Portability of genotyping data is desirable to facilitate inter-laboratory comparisons and epidemiological studies. Results This report presents results from a systematic screen for variation in repetitive DNA in the genome of C. difficile. We describe two tandem repeat loci, designated 'TR6' and 'TR10', which display extensive sequence variation that may be useful for sequence-based strain typing. Based on an investigation of 154 C. difficile isolates comprising 75 ribotypes, tandem repeat sequencing demonstrated excellent concordance with widely used PCR ribotyping and equal discriminatory power. Moreover, tandem repeat sequences enabled the reconstruction of the isolates' largely clonal population structure and evolutionary history. Conclusion We conclude that sequence analysis of the two repetitive loci introduced here may be highly useful for routine typing of C. difficile. Tandem repeat sequence typing resolves phylogenetic diversity to a level equivalent to PCR ribotypes. DNA sequences may be stored in databases accessible over the internet, obviating the need for the exchange of reference strains. PMID:19133124
2012-01-01
Background The central role of the somatotrophic axis in animal post-natal growth, development and fertility is well established. Therefore, the identification of genetic variants affecting quantitative traits within this axis is an attractive goal. However, large sample numbers are a pre-requisite for the identification of genetic variants underlying complex traits and although technologies are improving rapidly, high-throughput sequencing of large numbers of complete individual genomes remains prohibitively expensive. Therefore using a pooled DNA approach coupled with target enrichment and high-throughput sequencing, the aim of this study was to identify polymorphisms and estimate allele frequency differences across 83 candidate genes of the somatotrophic axis, in 150 Holstein-Friesian dairy bulls divided into two groups divergent for genetic merit for fertility. Results In total, 4,135 SNPs and 893 indels were identified during the resequencing of the 83 candidate genes. Nineteen percent (n = 952) of variants were located within 5' and 3' UTRs. Seventy-two percent (n = 3,612) were intronic and 9% (n = 464) were exonic, including 65 indels and 236 SNPs resulting in non-synonymous substitutions (NSS). Significant (P < 0.01) mean allele frequency differentials between the low and high fertility groups were observed for 720 SNPs (58 NSS). Allele frequencies for 43 of the SNPs were also determined by genotyping the 150 individual animals (Sequenom® MassARRAY). No significant differences (P > 0.1) were observed between the two methods for any of the 43 SNPs across both pools (i.e., 86 tests in total). Conclusions The results of the current study support previous findings of the use of DNA sample pooling and high-throughput sequencing as a viable strategy for polymorphism discovery and allele frequency estimation. 
Using this approach we have characterised the genetic variation within genes of the somatotrophic axis and related pathways, central to mammalian post-natal growth and development and subsequent lactogenesis and fertility. We have identified a large number of variants segregating at significantly different frequencies between cattle groups divergent for calving interval plausibly harbouring causative variants contributing to heritable variation. To our knowledge, this is the first report describing sequencing of targeted genomic regions in any livestock species using groups with divergent phenotypes for an economically important trait. PMID:22235840
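The pooled-DNA strategy described above estimates allele frequencies from read counts rather than from individual genotypes. A minimal sketch of that idea follows, under the simplifying assumption that each animal contributes equally to its pool and that the read counts are unbiased; the function names and the example counts are hypothetical and do not come from the study.

```python
def pooled_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Estimate the alternate-allele frequency in a pool from read counts."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def frequency_differential(high_pool, low_pool) -> float:
    """Difference in estimated allele frequency between two pools.

    Each argument is an (alt_reads, total_reads) pair for one SNP.
    """
    return pooled_allele_frequency(*high_pool) - pooled_allele_frequency(*low_pool)
```

With hypothetical counts of 75 alternate reads out of 100 in one pool and 25 out of 100 in the other, the differential is 0.5. Testing whether such a differential is significant, as the study does, would require an additional statistical test that this sketch leaves out.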
Significance of genetic variants in DLC1 and their association with hepatocellular carcinoma
XIE, CHENG-RONG; SUN, HONG-GUANG; SUN, YU; ZHAO, WEN-XIU; ZHANG, SHENG; WANG, XIAO-MIN; YIN, ZHEN-YU
2015-01-01
DLC1 has been shown to be downregulated or absent in hepatocellular carcinoma (HCC) and is associated with tumorigenesis and development. However, only a small number of studies have focused on genetic variations of DLC1. The present study performed exon sequencing for the DLC1 gene in HCC tissue samples from 105 patients to identify functional genetic variation of DLC1 and its association with HCC susceptibility, clinicopathological features and prognosis. A novel missense mutation and four non-synonymous single nucleotide polymorphisms (SNPs; rs3816748, rs11203495, rs3816747 and rs532841) were identified. A significant correlation of rs3816747 polymorphisms with HCC susceptibility was identified. Compared to individuals with the GG genotype of rs3816747, those with the GA (odds ratio (OR)=0.486; P=0.037) or GA+AA genotype (OR=0.51; P=0.039) were associated with a significantly decreased HCC risk. Furthermore, patients with the GC+CC genotype of rs3816748, the TC+CC genotype of rs11203495 or the GA+AA genotype of rs3816747 had small-sized tumors compared with those carrying the wild-type genotype. No significant association of DLC1 SNPs with the patients' prognosis was found. These results indicated that genetic variations in the DLC1 gene may confer a risk for HCC. PMID:26095787
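The odds ratios quoted above compare the odds of carrying a genotype among cases with the odds among controls, so a value below 1 indicates a lower risk relative to the reference genotype. The sketch below shows the basic 2x2 calculation only; the abstract does not report the underlying counts, so the numbers in the usage note are hypothetical, and the function name is an assumption of this example.

```python
def odds_ratio(carrier_cases: int, reference_cases: int,
               carrier_controls: int, reference_controls: int) -> float:
    """Odds ratio from a 2x2 genotype-by-status table.

    OR = (carrier_cases / reference_cases) / (carrier_controls / reference_controls)
       = (carrier_cases * reference_controls) / (reference_cases * carrier_controls)
    """
    counts = (carrier_cases, reference_cases, carrier_controls, reference_controls)
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    if reference_cases == 0 or carrier_controls == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (carrier_cases * reference_controls) / (reference_cases * carrier_controls)
```

For instance, with hypothetical counts of 20 carriers and 80 reference-genotype individuals among cases, and 40 carriers and 60 reference-genotype individuals among controls, the odds ratio is 0.375. Confidence intervals and P values, as reported in the abstract, would need a further step such as a logistic regression or an exact test.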
Uroz, Stéphane; Ioannidis, Panos; Lengelle, Juliette; Cébron, Aurélie; Morin, Emmanuelle; Buée, Marc; Martin, Francis
2013-01-01
In temperate ecosystems, acidic forest soils are among the most nutrient-poor terrestrial environments. In this context, the long-term differentiation of the forest soils into horizons may impact the assembly and the functions of the soil microbial communities. To gain a more comprehensive understanding of the ecology and functional potentials of these microbial communities, a suite of analyses including comparative metagenomics was applied to independent soil samples from a spruce plantation (Breuil-Chenue, France). The objectives were to assess whether the decreasing nutrient bioavailability and pH variations that naturally occur between the organic and mineral horizons affect the soil microbial functional biodiversity. The 14 Gbp of pyrosequencing and Illumina sequences generated in this study revealed complex microbial communities dominated by bacteria. Detailed analyses showed that the organic soil horizon was significantly enriched in sequences related to Bacteria, Chordata, Arthropoda and Ascomycota. In contrast, the mineral horizon was significantly enriched in sequences related to Archaea. Our analyses also highlighted that the microbial communities inhabiting the two soil horizons differed significantly in their functional potentials according to functional assays and MG-RAST analyses, suggesting a functional specialisation of these microbial communities. Consistent with this specialisation, our shotgun metagenomic approach revealed a significant increase in the relative abundance of sequences related to glycoside hydrolases in the organic horizon compared to the mineral horizon, which was significantly enriched in glycoside transferases. This functional stratification according to the soil horizon was also confirmed by a significant correlation between the functional assays performed in this study and the functional metagenomic analyses. 
Together, our results suggest that the soil stratification and particularly the soil resource availability impact the functional diversity and to a lesser extent the taxonomic diversity of the bacterial communities. PMID:23418476
de Muinck, Eric J; Trosvik, Pål; Gilfillan, Gregor D; Hov, Johannes R; Sundaram, Arvind Y M
2017-07-06
Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Nonetheless, the need remains to improve on the techniques used for gathering such data, including increasing throughput while lowering cost and benchmarking the techniques so that potential sources of bias can be better characterized. We present a triple-index amplicon sequencing strategy to sequence large numbers of samples at significantly lower cost and in a shorter timeframe compared to existing methods. The design employs a two-stage PCR protocol, incorporating three barcodes to each sample, with the possibility to add a fourth index. It also includes heterogeneity spacers to overcome low complexity issues faced when sequencing amplicons on Illumina platforms. The library preparation method was extensively benchmarked through analysis of a mock community in order to assess biases introduced by sample indexing, number of PCR cycles, and template concentration. We further evaluated the method through re-sequencing of a standardized environmental sample. Finally, we evaluated our protocol on a set of fecal samples from a small cohort of healthy adults, demonstrating good performance in a realistic experimental setting. Between-sample variation was mainly related to batch effects, such as DNA extraction, while sample indexing was also a significant source of bias. PCR cycle number strongly influenced chimera formation and affected relative abundance estimates of species with high GC content. Libraries were sequenced using the Illumina HiSeq and MiSeq platforms to demonstrate that this protocol is highly scalable to sequence thousands of samples at a very low cost. Here, we provide the most comprehensive study of performance and bias inherent to a 16S rRNA gene amplicon sequencing method to date. 
Triple-indexing greatly reduces the number of long custom DNA oligos required for library preparation, while the inclusion of variable length heterogeneity spacers minimizes the need for PhiX spike-in. This design results in a significant cost reduction of highly multiplexed amplicon sequencing. The biases we characterize highlight the need for highly standardized protocols. Reassuringly, we find that the biological signal is a far stronger structuring factor than the various sources of bias.
Tempo and mode of genomic mutations unveil human evolutionary history.
Hara, Yuichiro
2015-01-01
Mutations that have occurred in human genomes provide insight into various aspects of evolutionary history such as speciation events and degrees of natural selection. Comparing genome sequences between human and great apes or among humans is a feasible approach for inferring human evolutionary history. Recent advances in high-throughput or so-called 'next-generation' DNA sequencing technologies have enabled the sequencing of thousands of individual human genomes, as well as a variety of reference genomes of hominids, many of which are publicly available. These sequence data can help to unveil the detailed demographic history of the lineage leading to humans as well as the explosion of modern human population size in the last several thousand years. In addition, high-throughput sequencing illustrates the tempo and mode of de novo mutations, which are producing human genetic variation at this moment. Pedigree-based human genome sequencing has shown that mutation rates vary significantly across the human genome. These studies have also provided an improved timescale of human evolution, because the mutation rate estimated from pedigree analysis is half that estimated from traditional analyses based on molecular phylogeny. Because of the dramatic reduction in sequencing cost, sequencing on-demand samples designed for specific studies is now also becoming popular. To produce data of sufficient quality to meet the requirements of the study, it is necessary to set an explicit sequencing plan that includes the choice of sample collection methods, sequencing platforms, and number of sequence reads.
Gifford, Robert J.; Rhee, Soo-Yon; Eriksson, Nicolas; Liu, Tommy F.; Kiuchi, Mark; Das, Amar K.; Shafer, Robert W.
2008-01-01
Design Promiscuous guanine (G) to adenine (A) substitutions catalysed by apolipoprotein B RNA-editing catalytic component (APOBEC) enzymes are observed in a proportion of HIV-1 sequences in vivo and can introduce artifacts into some genetic analyses. The potential impact of undetected lethal editing on genotypic estimation of transmitted drug resistance was assessed. Methods Classifiers of lethal, APOBEC-mediated editing were developed by analysis of lentiviral pol gene sequence variation and evaluated using control sets of HIV-1 sequences. The potential impact of sequence editing on genotypic estimation of drug resistance was assessed in sets of sequences obtained from 77 studies of 25 or more therapy-naive individuals, using mixture modelling approaches to determine the maximum likelihood classification of sequences as lethally edited as opposed to viable. Results Analysis of 6437 protease and reverse transcriptase sequences from therapy-naive individuals using a novel classifier of lethal, APOBEC3G-mediated sequence editing, the ‘APOBEC3G-mediated defectives (A3GD) index’, detected lethal editing in association with spurious ‘transmitted drug resistance’ in nearly 3% of proviral sequences obtained from whole blood and 0.2% of samples obtained from plasma. Conclusion Screening for lethally edited sequences in datasets containing a proportion of proviral DNA, such as those likely to be obtained for epidemiological surveillance of transmitted drug resistance in the developing world, can eliminate rare but potentially significant errors in genotypic estimation of transmitted drug resistance. PMID:18356601
Epigenetic Variation in Mangrove Plants Occurring in Contrasting Natural Environment
Lira-Medeiros, Catarina Fonseca; Parisod, Christian; Fernandes, Ricardo Avancini; Mata, Camila Souza; Cardoso, Monica Aires; Ferreira, Paulo Cavalcanti Gomes
2010-01-01
Background Epigenetic modifications, such as cytosine methylation, are inherited in plant species and may occur in response to biotic or abiotic stress, affecting gene expression without changing genome sequence. Laguncularia racemosa, a mangrove species, occurs in naturally contrasting habitats where it is subjected daily to salinity and nutrient variations leading to morphological differences. This work aims at unraveling how CpG-methylation variation is distributed among individuals from two nearby habitats, at a riverside (RS) or near a salt marsh (SM), with different environmental pressures and how this variation is correlated with the observed morphological variation. Principal Findings Significant differences were observed in morphological traits such as tree height, tree diameter, leaf width and leaf area between plants from RS and SM locations, resulting in smaller plants and smaller leaf size in SM plants. Methyl-Sensitive Amplified Polymorphism (MSAP) was used to assess genetic and epigenetic (CpG-methylation) variation in L. racemosa genomes from these populations. SM plants were hypomethylated (14.6% of loci had methylated samples) in comparison to RS (32.1% of loci had methylated samples). Within-population diversity was significantly greater for epigenetic than genetic data in both locations, but SM also had less epigenetic diversity than RS. Frequency-based (GST) and multivariate (βST) methods that estimate population structure showed significantly greater differentiation among locations for epigenetic than genetic data. Co-Inertia analysis, exploring jointly the genetic and epigenetic data, showed that individuals with similar genetic profiles presented divergent epigenetic profiles that were characteristic of the population in a particular environment, suggesting that CpG-methylation changes may be associated with environmental heterogeneity. Conclusions In spite of significant morphological dissimilarities, individuals of L. 
racemosa from salt marsh and riverside presented little genetic but abundant DNA methylation differentiation, suggesting that epigenetic variation in natural plant populations has an important role in helping individuals to cope with different environments. PMID:20436669
Yu, Teng-Lang; Lin, Hung-Du; Weng, Ching-Feng
2014-01-01
Aim To comprehend the phylogeographic patterns of genetic variation in anurans at Taiwan Island, this study attempted to examine (1) the existence of various geological barriers (Central Mountain Ranges, CMRs); and (2) the genetic variation of Bufo bankorensis using mtDNA sequences among populations located in different regions of Taiwan, characterized by different climates and existing under extreme conditions when compared with available sequences of the related species B. gargarizans of mainland China. Methodology/Principal Findings Phylogenetic analyses of the dataset with mitochondrial DNA (mtDNA) D-loop gene (348 bp) recovered a close relationship between B. bankorensis and B. gargarizans and identified three distinct lineages. Furthermore, the network of mtDNA D-loop gene (564 bp) amplified (279 individuals, 27 localities) from Taiwan Island indicated three divergent clades within B. bankorensis (Clade W, E and S), corresponding to the geography, thereby verifying the importance of the CMRs and Kaoping River drainage as major biogeographic barriers. Mismatch distribution analysis, neutrality tests and Bayesian skyline plots revealed that a significant population expansion occurred for the total population and Clade W, with horizons dated to approximately 0.08 and 0.07 Mya, respectively. These results suggest that the population expansion of the Taiwan Island species B. bankorensis might have resulted from the release of available habitat in post-glacial periods, with the genetic variation in mtDNA showing habitat selection, subsequent population dispersal, and co-distribution among clades. Conclusions The multiple origins (different clades) of B. bankorensis mtDNA sequences were evident for the first time in this study. The divergent genetic clades found within B. bankorensis could reflect independent colonization by previously diverged lineages, implying that B. bankorensis originated from B. 
gargarizans of mainland China and that dispersal was followed by isolation within Taiwan Island. 
The high divergence between clades W and E of B. bankorensis implies that the CMRs serve as a genetic barrier and separate the whole island into western and eastern phylogroups. PMID:24853679
Sun, Lingling; Che, Kui; Zhao, Zhenzhen; Liu, Song; Xing, Xiaoming; Luo, Bing
2015-09-04
NK/T cell lymphoma is an aggressive lymphoma almost always associated with EBV. BamHI-A rightward open reading frame 1 (BARF1) and BamHI-H rightward open reading frame 1 (BHRF1) are two EBV early genes, which may be involved in the oncogenicity of EBV. It has been found that V29A strains, a BARF1 mutant subtype, showed higher prevalence in NPC, which may suggest the association between this variation and nasopharyngeal carcinoma (NPC). To characterize the sequence variation patterns of the Epstein-Barr virus (EBV) early genes and to elucidate their association with NK/T cell lymphoma, we analyzed the sequences of BARF1 and BHRF1 in EBV-positive NK/T cell lymphoma samples from Northern China. In situ hybridization (ISH) performed for EBV-encoded small RNA1 (EBER1) with specific digoxigenin-labeled probes was used to select the EBV positive lymphoma samples. Nested-polymerase chain reaction (nested-PCR) and DNA sequence analysis technique were used to obtain the sequences of BARF1 and BHRF1. The polymorphisms of these two genes were classified according to the signature changes and compared with the known corresponding EBV gene variation data. Two major subtypes of BARF1 gene, designated as B95-8 and V29A subtype, were identified. B95-8 subtype was the dominant subtype. The V29A subtype had one consistent amino acid change at amino acid residue 29 (V → A). Compared with B95-8, AA change at 88 (L → V) of BHRF1 was found in the majority of the isolates, and AA79 (V → L) mutation in a few isolates. Functional domains of BARF1 and BHRF1 were highly conserved. The distributions of BARF1 and BHRF1 subtypes had no significant differences among different EBV-associated malignancies and healthy donors. The sequences of BARF1 and BHRF1 are highly conserved which may contribute to maintain the biological function of these two genes. There is no evidence that particular EBV substrains of BARF1 or BHRF1 is region-restricted or disease-specific.
VarDetect: a nucleotide sequence variation exploratory tool
Ngamphiw, Chumpol; Kulawonganunchai, Supasak; Assawamakin, Anunchai; Jenwitheesuk, Ekachai; Tongsima, Sissades
2008-01-01
Background Single nucleotide polymorphisms (SNPs) are the most commonly studied units of genetic variation. The discovery of such variation may help to identify causative gene mutations in monogenic diseases and SNPs associated with predisposing genes in complex diseases. Accurate detection of SNPs requires software that can correctly interpret chromatogram signals to nucleotides. Results We present VarDetect, a stand-alone nucleotide variation exploratory tool that automatically detects nucleotide variation from fluorescence based chromatogram traces. Accurate SNP base-calling is achieved using pre-calculated peak content ratios, and is enhanced by rules which account for common sequence reading artifacts. The proposed software tool is benchmarked against four other well-known SNP discovery software tools (PolyPhred, novoSNP, Genalys and Mutation Surveyor) using fluorescence based chromatograms from 15 human genes. These chromatograms were obtained from sequencing 16 two-pooled DNA samples; a total of 32 individual DNA samples. In this comparison of automatic SNP detection tools, VarDetect achieved the highest detection efficiency. Availability VarDetect is compatible with most major operating systems such as Microsoft Windows, Linux, and Mac OSX. The current version of VarDetect is freely available at . PMID:19091032
Shafer, Robert W.; Hertogs, Kurt; Zolopa, Andrew R.; Warford, Ann; Bloor, Stuart; Betts, Bradley J.; Merigan, Thomas C.; Harrigan, Richard; Larder, Brendon A.
2001-01-01
We assessed the reproducibility of human immunodeficiency virus type 1 (HIV-1) reverse transcriptase (RT) and protease sequencing using cryopreserved plasma aliquots obtained from 46 heavily treated HIV-1-infected individuals in two laboratories using dideoxynucleotide sequencing. The rates of complete sequence concordance between the two laboratories were 99.1% for the protease sequence and 99.0% for the RT sequence. Approximately 90% of the discordances were partial, defined as one laboratory detecting a mixture and the second laboratory detecting only one of the mixture's components. Only 0.1% of the nucleotides were completely discordant between the two laboratories, and these were significantly more likely to occur in plasma samples with lower plasma HIV-1 RNA levels. Nucleotide mixtures were detected at approximately 1% of the nucleotide positions, and in every case in which one laboratory detected a mixture, the second laboratory either detected the same mixture or detected one of the mixture's components. The high rate of concordance in detecting mixtures and the fact that most discordances between the two laboratories were partial suggest that most discordances were caused by variation in sampling of the HIV-1 quasispecies by PCR rather than by technical errors in the sequencing process itself. PMID:11283081
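The concordance categories described in this abstract can be illustrated with a minimal Python sketch. It assumes that nucleotide mixtures are written as standard IUPAC ambiguity codes and that the two laboratories' sequences are already aligned to equal length; the function names `classify_position` and `summarize` are hypothetical and are not taken from the study. Any disagreement that does not fit the abstract's definition of a partial discordance is counted here as complete, which is an assumption of this sketch.

```python
# IUPAC nucleotide codes mapped to the set of bases each one represents.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def classify_position(base_lab1, base_lab2):
    """Classify one aligned position as 'concordant', 'partial' or 'complete'."""
    a = IUPAC[base_lab1.upper()]
    b = IUPAC[base_lab2.upper()]
    if a == b:
        return "concordant"
    # Partial discordance: one laboratory reports a mixture and the other
    # reports a single base that is one of that mixture's components.
    if len(a) > 1 and len(b) == 1 and b < a:
        return "partial"
    if len(b) > 1 and len(a) == 1 and a < b:
        return "partial"
    # Assumption: every other disagreement is treated as complete discordance.
    return "complete"


def summarize(seq_lab1, seq_lab2):
    """Count the three categories over two aligned sequences of equal length."""
    if len(seq_lab1) != len(seq_lab2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"concordant": 0, "partial": 0, "complete": 0}
    for x, y in zip(seq_lab1, seq_lab2):
        counts[classify_position(x, y)] += 1
    return counts


if __name__ == "__main__":
    # R (A/G mixture) versus A is a partial discordance.
    print(summarize("ACGRT", "ACGAT"))
```

Dividing each count by the sequence length gives per-position rates comparable in kind to the percentages quoted in the abstract.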
Creze, Maud; Versheure, Leslie; Besson, Pierre; Sauvage, Chloe; Leclerc, Xavier; Jissendi-Tchofo, Patrice
2014-06-01
Brain functional and cytoarchitectural maturation continue until adulthood, but little is known about the evolution of the regional pattern of cortical thickness (CT), complexity (CC), and intensity or gradient (CG) in young adults. We attempted to detect global and regional age- and gender-related variations of brain CT, CC, and CG in 28 healthy young adults (19-33 years) using a three-dimensional T1-weighted magnetic resonance imaging sequence and surface-based methods. Whole brain interindividual variations of CT and CG were similar to those reported in the literature. As a new finding, age- and gender-related variations significantly affected brain complexity (P < 0.01) in the posterior cingulate and middle temporal cortices (age) and the fronto-orbital cortex (gender), all in the right hemisphere. Region-of-interest analyses showed a significant age-by-gender interaction (P < 0.05) in the temporopolar, inferior, and middle temporal-entorhinal cortices bilaterally, as well as in the left inferior parietal cortex. In addition, we found significant inverse correlations between CT and CC and between CT and CG over the whole brain, most markedly in precentral and occipital areas. Our findings differ in detail from previous reports and may correlate with late brain maturation and learning plasticity in young adults' brains in the third decade. Copyright © 2013 Wiley Periodicals, Inc.
Analysis of methylated patterns and quality-related genes in tobacco (Nicotiana tabacum) cultivars.
Jiao, Junna; Jia, Yanlong; Lv, Zhuangwei; Sun, Chuanfei; Gao, Lijie; Yan, Xiaoxiao; Cui, Liusu; Tang, Zongxiang; Yan, Benju
2014-08-01
Methylation-sensitive amplified polymorphism was used in this study to investigate epigenetic information of four tobacco cultivars: Yunyan 85, NC89, K326, and Yunyan 87. The DNA fragments with methylated information were cloned by reamplified PCR and sequenced. The results of Blast alignments showed that the genes with methylation information included chitinase, nitrate reductase, chloroplast DNA, mitochondrial DNA, ornithine decarboxylase, ribulose carboxylase, and promoter sequences. Homologous comparison in three cloned gene sequences (nitrate reductase, ornithine decarboxylase, and ribulose decarboxylase) indicated that geographic factors had significant influence on the whole genome methylation. Introns also contained different information in different tobacco cultivars. These findings suggest that synthetic mechanisms for tobacco aromatic components could be affected by different environmental factors leading to variation of noncoding regions in the genome, which finally results in different fragrance and taste in different tobacco cultivars.
Turlapati, Swathi A; Minocha, Rakesh; Long, Stephanie; Ramsdell, Jordan; Minocha, Subhash C
2015-01-01
The impact of chronic nitrogen amendments on bacterial communities was evaluated at Harvard Forest, Petersham, MA, USA. Thirty soil samples (3 treatments × 2 soil horizons × 5 subplots) were collected in 2009 from untreated (control), low nitrogen-amended (LN; 50 kg NH4NO3 ha(-1) yr(-1)) and high nitrogen-amended (HN; 150 kg NH4NO3 ha(-1) yr(-1)) plots. PCR-amplified partial 16S rRNA gene sequences made from soil DNA were subjected to pyrosequencing (Turlapati et al., 2013) and analyses using oligotyping. The parameters M (the minimum count of the most abundant unique sequence in an oligotype) and s (the minimum number of samples in which an oligotype is expected to be present) had to be optimized for forest soils because of high diversity and the presence of rare organisms. Comparative analyses of the pyrosequencing data by oligotyping and operational taxonomic unit clustering tools indicated that the former yields more refined units of taxonomy with sequence similarity of ≥99.5%. Sequences affiliated with four new phyla and 73 genera were identified in the present study as compared to 27 genera reported earlier from the same data (Turlapati et al., 2013). Significant rearrangements in the bacterial community structure were observed with N-amendments revealing the presence of additional genera in N-amended plots with the absence of some that were present in the control plots. Permutational MANOVA analyses indicated significant variation associated with soil horizon and N treatment for a majority of the phyla. In most cases soil horizon partitioned more variation relative to treatment and treatment effects were more evident for the organic (Org) horizon. Mantel test results for Org soil showed significant positive correlations between bacterial communities and most soil parameters including NH4 and NO3. In mineral soil, correlations were seen only with pH, NH4, and NO3. 
Regardless of the pipeline used, a major hindrance for such a study remains the lack of reference databases for forest soils.
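The roles of the two thresholds defined in this abstract, M and s, can be sketched in Python. This is only an illustration of the filtering logic as the abstract words it, not the oligotyping pipeline's own code; the data layout (a dictionary with `max_unique_count` and `sample_counts` keys) and the function name `filter_oligotypes` are hypothetical.

```python
def filter_oligotypes(oligotypes, min_abundance, min_samples):
    """Keep oligotypes that pass both thresholds.

    min_abundance plays the role of M: the minimum count of the most
    abundant unique sequence in an oligotype.
    min_samples plays the role of s: the minimum number of samples in
    which an oligotype must be present.
    """
    kept = {}
    for name, info in oligotypes.items():
        # Number of samples in which this oligotype has at least one read.
        n_present = sum(1 for count in info["sample_counts"].values() if count > 0)
        if info["max_unique_count"] >= min_abundance and n_present >= min_samples:
            kept[name] = info
    return kept


# Hypothetical toy data: three oligotypes across three samples.
example = {
    "oligo_A": {"max_unique_count": 25, "sample_counts": {"s1": 30, "s2": 12, "s3": 0}},
    "oligo_B": {"max_unique_count": 4, "sample_counts": {"s1": 3, "s2": 2, "s3": 1}},
    "oligo_C": {"max_unique_count": 40, "sample_counts": {"s1": 45, "s2": 0, "s3": 0}},
}

if __name__ == "__main__":
    # oligo_B fails the M-like threshold; oligo_C fails the s-like threshold.
    print(sorted(filter_oligotypes(example, min_abundance=10, min_samples=2)))
```

Lowering either threshold retains more rare oligotypes, which is the kind of trade-off the abstract says had to be tuned for highly diverse forest soils.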
2010-01-01
Background The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. Description The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. Conclusions Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org. PMID:20459805
Hermes Transposon Distribution and Structure in Musca domestica
Subramanian, Ramanand A.; Cathcart, Laura A.; Krafsur, Elliot S.; Atkinson, Peter W.
2009-01-01
Hermes are hAT transposons from Musca domestica that are very closely related to the hobo transposons from Drosophila melanogaster and are useful as gene vectors in a wide variety of organisms including insects, planaria, and yeast. hobo elements show distinct length variations in a rapidly evolving region of the transposase-coding region as a result of expansions and contractions of a simple repeat sequence encoding 3 amino acids threonine, proline, and glutamic acid (TPE). These variations in length may influence the function of the protein and the movement of hobo transposons in natural populations. Here, we determine the distribution of Hermes in populations of M. domestica as well as whether Hermes transposase has undergone similar sequence expansions and contractions during its evolution in this species. Hermes transposons were found in all M. domestica individuals sampled from 14 populations collected from 4 continents. All individuals with Hermes transposons had evidence for the presence of intact transposase open reading frames, and little sequence variation was observed among Hermes elements. A systematic analysis of the TPE-homologous region of the Hermes transposase-coding region revealed no evidence for length variation. The simple sequence repeat found in hobo elements is a feature of this transposon that evolved since the divergence of hobo and Hermes. PMID:19366812
Cenik, Can; Cenik, Elif Sarinay; Byeon, Gun W; Grubert, Fabian; Candille, Sophie I; Spacek, Damek; Alsallakh, Bilal; Tilgner, Hagen; Araya, Carlos L; Tang, Hua; Ricci, Emiliano; Snyder, Michael P
2015-11-01
Elucidating the consequences of genetic differences between humans is essential for understanding phenotypic diversity and personalized medicine. Although variation in RNA levels, transcription factor binding, and chromatin have been explored, little is known about global variation in translation and its genetic determinants. We used ribosome profiling, RNA sequencing, and mass spectrometry to perform an integrated analysis in lymphoblastoid cell lines from a diverse group of individuals. We find significant differences in RNA, translation, and protein levels suggesting diverse mechanisms of personalized gene expression control. Combined analysis of RNA expression and ribosome occupancy improves the identification of individual protein level differences. Finally, we identify genetic differences that specifically modulate ribosome occupancy--many of these differences lie close to start codons and upstream ORFs. Our results reveal a new level of gene expression variation among humans and indicate that genetic variants can cause changes in protein levels through effects on translation. © 2015 Cenik et al.; Published by Cold Spring Harbor Laboratory Press.
Variation analysis and gene annotation of eight MHC haplotypes: The MHC Haplotype Project
Horton, Roger; Gibson, Richard; Coggill, Penny; Miretti, Marcos; Allcock, Richard J.; Almeida, Jeff; Forbes, Simon; Gilbert, James G. R.; Halls, Karen; Harrow, Jennifer L.; Hart, Elizabeth; Howe, Kevin; Jackson, David K.; Palmer, Sophie; Roberts, Anne N.; Sims, Sarah; Stewart, C. Andrew; Traherne, James A.; Trevanion, Steve; Wilming, Laurens; Rogers, Jane; de Jong, Pieter J.; Elliott, John F.; Sawcer, Stephen; Todd, John A.; Trowsdale, John
2008-01-01
The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line) designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine. PMID:18193213
Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup
2016-01-01
Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.
2010-01-01
Background Thoroughbred horses have been selected for traits contributing to speed and stamina for centuries. It is widely recognized that inherited variation in physical and physiological characteristics is responsible for variation in individual aptitude for race distance, and that muscle phenotypes in particular are important. Results A genome-wide SNP-association study for optimum racing distance was performed using the EquineSNP50 Bead Chip genotyping array in a cohort of n = 118 elite Thoroughbred racehorses divergent for race distance aptitude. In a cohort-based association test we evaluated genotypic variation at 40,977 SNPs between horses suited to short distance (≤ 8 f) and middle-long distance (> 8 f) races. The most significant SNP was located on chromosome 18: BIEC2-417495 ~690 kb from the gene encoding myostatin (MSTN) [P(unadj.) = 6.96 × 10⁻⁶]. Considering best race distance as a quantitative phenotype, a peak of association on chromosome 18 (chr18:65809482-67545806) comprising eight SNPs encompassing a 1.7 Mb region was observed. Again, similar to the cohort-based analysis, the most significant SNP was BIEC2-417495 (P(unadj.) = 1.61 × 10⁻⁹; P(Bonf.) = 6.58 × 10⁻⁵). In a candidate gene study we have previously reported a SNP (g.66493737C>T) in MSTN associated with best race distance in Thoroughbreds; however, its functional and genome-wide relevance were uncertain. Additional re-sequencing in the flanking regions of the MSTN gene revealed four novel 3' UTR SNPs and a 227 bp SINE insertion polymorphism in the 5' UTR promoter sequence. Linkage disequilibrium was highest between g.66493737C>T and BIEC2-417495 (r² = 0.86). Conclusions Comparative association tests consistently demonstrated the g.66493737C>T SNP as the superior variant in the prediction of distance aptitude in racehorses (g.66493737C>T, P = 1.02 × 10⁻¹⁰; BIEC2-417495, P(unadj.) = 1.61 × 10⁻⁹).
Functional investigations will be required to determine whether this polymorphism affects putative transcription-factor binding and gives rise to variation in gene and protein expression. Nonetheless, this study demonstrates that the g.66493737C>T SNP provides the most powerful genetic marker for prediction of race distance aptitude in Thoroughbreds. PMID:20932346
Hill, Emmeline W; McGivney, Beatrice A; Gu, Jingjing; Whiston, Ronan; Machugh, David E
2010-10-11
Thoroughbred horses have been selected for traits contributing to speed and stamina for centuries. It is widely recognized that inherited variation in physical and physiological characteristics is responsible for variation in individual aptitude for race distance, and that muscle phenotypes in particular are important. A genome-wide SNP-association study for optimum racing distance was performed using the EquineSNP50 Bead Chip genotyping array in a cohort of n = 118 elite Thoroughbred racehorses divergent for race distance aptitude. In a cohort-based association test we evaluated genotypic variation at 40,977 SNPs between horses suited to short distance (≤ 8 f) and middle-long distance (> 8 f) races. The most significant SNP was located on chromosome 18: BIEC2-417495 ~690 kb from the gene encoding myostatin (MSTN) [P(unadj.) = 6.96 x 10⁻⁶]. Considering best race distance as a quantitative phenotype, a peak of association on chromosome 18 (chr18:65809482-67545806) comprising eight SNPs encompassing a 1.7 Mb region was observed. Again, similar to the cohort-based analysis, the most significant SNP was BIEC2-417495 (P(unadj.) = 1.61 x 10⁻⁹; P(Bonf.) = 6.58 x 10⁻⁵). In a candidate gene study we have previously reported a SNP (g.66493737C>T) in MSTN associated with best race distance in Thoroughbreds; however, its functional and genome-wide relevance were uncertain. Additional re-sequencing in the flanking regions of the MSTN gene revealed four novel 3' UTR SNPs and a 227 bp SINE insertion polymorphism in the 5' UTR promoter sequence. Linkage disequilibrium was highest between g.66493737C>T and BIEC2-417495 (r² = 0.86). Comparative association tests consistently demonstrated the g.66493737C>T SNP as the superior variant in the prediction of distance aptitude in racehorses (g.66493737C>T, P = 1.02 x 10⁻¹⁰; BIEC2-417495, P(unadj.) = 1.61 x 10⁻⁹). 
Functional investigations will be required to determine whether this polymorphism affects putative transcription-factor binding and gives rise to variation in gene and protein expression. Nonetheless, this study demonstrates that the g.66493737C>T SNP provides the most powerful genetic marker for prediction of race distance aptitude in Thoroughbreds.
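The relation between the unadjusted and Bonferroni-adjusted P values quoted for BIEC2-417495 can be checked with a short Python sketch. It assumes that the correction multiplies the unadjusted P value by the 40,977 SNPs tested; the abstract does not state the exact number of tests used, so this is an assumption. The helper name `bonferroni` is illustrative.

```python
def bonferroni(p_unadjusted, n_tests):
    """Bonferroni-adjusted P value: multiply by the number of tests, capped at 1."""
    return min(1.0, p_unadjusted * n_tests)


# Assumption: the number of tests equals the 40,977 SNPs evaluated in the study.
N_SNPS = 40977

# Unadjusted P value reported for BIEC2-417495 in the quantitative analysis.
p_bonf = bonferroni(1.61e-9, N_SNPS)

if __name__ == "__main__":
    print(f"{p_bonf:.2e}")
```

Under this assumption the product is about 6.6 × 10⁻⁵, close to the reported P(Bonf.) of 6.58 × 10⁻⁵; the small gap is consistent with the unadjusted value having been rounded to three significant figures before publication.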
Holistic and component plant phenotyping using temporal image sequence.
Das Choudhury, Sruti; Bashyam, Srinidhi; Qiu, Yumou; Samal, Ashok; Awada, Tala
2018-01-01
Image-based plant phenotyping facilitates the noninvasive extraction of traits by analyzing a large number of plants in a relatively short period of time. It has the potential to compute advanced phenotypes by considering the whole plant as a single object (holistic phenotypes) or as individual components, i.e., leaves and the stem (component phenotypes), to investigate the biophysical characteristics of the plants. The emergence timing, the total number of leaves present at any point in time, and the growth of individual leaves during the vegetative stage of the maize life cycle are significant phenotypic expressions that contribute to assessing plant vigor. However, an image-based automated solution to this novel problem is yet to be explored. A set of new holistic and component phenotypes is introduced in this paper. To compute the component phenotypes, it is essential to detect the individual leaves and the stem. Thus, the paper introduces a novel method to reliably detect the leaves and the stem of maize plants by analyzing 2-dimensional visible light image sequences captured from the side using a graph-based approach. The total number of leaves is counted and the length of each leaf is measured for all images in the sequence to monitor leaf growth. To evaluate the performance of the proposed algorithm, we introduce the University of Nebraska-Lincoln Component Plant Phenotyping Dataset (UNL-CPPD) and provide ground truth to facilitate new algorithm development and uniform comparison. The temporal variation of the component phenotypes regulated by genotypes and environment (i.e., greenhouse) is experimentally demonstrated for the maize plants on UNL-CPPD. Statistical models are applied to analyze the impact of the greenhouse environment and to demonstrate the genetic regulation of the temporal variation of the holistic phenotypes on the public dataset called Panicoid Phenomap-1.
The central contribution of the paper is a novel computer vision based algorithm for automated detection of individual leaves and the stem to compute new component phenotypes along with a public release of a benchmark dataset, i.e., UNL-CPPD. Detailed experimental analyses are performed to demonstrate the temporal variation of the holistic and component phenotypes in maize regulated by environment and genetic variation with a discussion on their significance in the context of plant science.
Dumas, Laura; Dickens, C Michael; Anderson, Nathan; Davis, Jonathan; Bennett, Beth; Radcliffe, Richard A; Sikela, James M
2014-06-01
It has been well documented that genetic factors can influence predisposition to develop alcoholism. While the underlying genomic changes may be of several types, two of the most common and disease associated are copy number variations (CNVs) and sequence alterations of protein coding regions. The goal of this study was to identify CNVs and single-nucleotide polymorphisms that occur in gene coding regions that may play a role in influencing the risk of an individual developing alcoholism. Toward this end, two mouse strains were used that have been selectively bred based on their differential sensitivity to alcohol: the Inbred long sleep (ILS) and Inbred short sleep (ISS) mouse strains. Differences in initial response to alcohol have been linked to risk for alcoholism, and the ILS/ISS strains are used to investigate the genetics of initial sensitivity to alcohol. Array comparative genomic hybridization (arrayCGH) and exome sequencing were conducted to identify CNVs and gene coding sequence differences, respectively, between ILS and ISS mice. Mouse arrayCGH was performed using catalog Agilent 1 × 244 k mouse arrays. Subsequently, exome sequencing was carried out using an Illumina HiSeq 2000 instrument. ArrayCGH detected 74 CNVs that were strain-specific (38 ILS/36 ISS), including several ISS-specific deletions that contained genes implicated in brain function and neurotransmitter release. Among several interesting coding variations detected by exome sequencing was the gain of a premature stop codon in the alpha-amylase 2B (AMY2B) gene specifically in the ILS strain. In total, exome sequencing detected 2,597 and 1,768 strain-specific exonic gene variants in the ILS and ISS mice, respectively. This study represents the most comprehensive and detailed genomic comparison of ILS and ISS mouse strains to date. 
The two complementary genome-wide approaches identified strain-specific CNVs and gene coding sequence variations that should provide strong candidates to contribute to the alcohol-related phenotypic differences associated with these strains.
Aokic, Jun-ya; Kawase, Junya; Hamada, Kazuhisa; Fujimoto, Hiroshi; Yamamoto, Ikki; Usuki, Hironori
2018-01-01
Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence to identify single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection as well as to identify functional genes for biological traits. We obtained 200 times coverage and constructed a high-quality genome assembly using next generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were lined up on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, we found differences in the genome structural variations between two greater amberjack populations using BreakDancer. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence. PMID:29785397
Genetic Alterations Affecting Cholesterol Metabolism and Human Fertility
DeAngelis, Anthony M.; Roy-O'Reilly, Meaghan; Rodriguez, Annabelle
2014-01-01
ABSTRACT Single nucleotide polymorphisms (SNPs) represent genetic variations among individuals in a population. In medicine, these small variations in the DNA sequence may significantly impact an individual's response to certain drugs or influence the risk of developing certain diseases. In the field of reproductive medicine, a significant amount of research has been devoted to identifying polymorphisms which may impact steroidogenesis and fertility. This review discusses current understanding of the effects of genetic variations in cholesterol metabolic pathways on human fertility that bridge novel linkages between cholesterol metabolism and reproductive health. For example, the role of the low-density lipoprotein receptor (LDLR) in cellular metabolism and human reproduction has been well studied, whereas there is now an emerging body of research on the role of the high-density lipoprotein (HDL) receptor scavenger receptor class B type I (SR-BI) in human lipid metabolism and female reproduction. Identifying and understanding how polymorphisms in the SCARB1 gene or other genes related to lipid metabolism impact human physiology is essential and will play a major role in the development of personalized medicine for improved diagnosis and treatment of infertility. PMID:25122065
A novel enterovirus species identified from severe diarrheal goats.
Wang, Mingyue; He, Jia; Lu, Haibing; Liu, Yajing; Deng, Yingrui; Zhu, Lisai; Guo, Changming; Tu, Changchun; Wang, Xinping
2017-01-01
The Enterovirus genus of the family Picornaviridae consists of 9 species of Enteroviruses and 3 species of Rhinoviruses based on the latest virus taxonomy. These viruses contribute significantly to respiratory and digestive disorders in humans and animals. Of the 9 Enterovirus species, Enterovirus E-G are closely associated with diseases affecting the livestock industry. While enterovirus infection has been increasingly reported in cattle and swine, enterovirus infections in small ruminants remain largely unknown. Virological, molecular and bioinformatics methods were employed to characterize a novel enterovirus, CEV-JL14, from goats in China manifesting severe diarrhea with morbidity and mortality of up to 84% and 54%, respectively. CEV-JL14 was defined and proposed as a new Enterovirus species L within the genus Enterovirus of the family Picornaviridae. CEV-JL14 had a complete genome sequence of 7461 nucleotides with an ORF encoding 2172 amino acids, and shared 77.1% genomic sequence identity with TB4-OEV, an ovine enterovirus. Comparison of the 5'-UTR and structural genes of CEV-JL14 with those of known Enterovirus species revealed high genetic variation between CEV-JL14 and known Enterovirus species. VP1 nucleotide sequence identities of CEV-JL14 were 51.8%-53.5% with those of Enterovirus E and F, 30.9%-65.3% with Enterovirus G, and 43.8%-51.5% with Enterovirus A-D, respectively. CEV-JL14 was proposed as a novel species within the genus Enterovirus according to the current ICTV demarcation criteria for enteroviruses. CEV-JL14 clustered phylogenetically with neither Enterovirus E and F nor Enterovirus G. It was defined and proposed as novel species L within the genus Enterovirus. This is the first report of caprine enterovirus in China and the first complete genomic sequence of a caprine enterovirus; it reveals significant genetic variation between ovine and caprine enteroviruses, thus broadening the current understanding of enteroviruses.
A novel enterovirus species identified from severe diarrheal goats
Liu, Yajing; Deng, Yingrui; Zhu, Lisai; Guo, Changming; Tu, Changchun; Wang, Xinping
2017-01-01
Background: The Enterovirus genus of the family Picornaviridae consists of 9 Enterovirus species and 3 Rhinovirus species according to the latest virus taxonomy. These viruses contribute significantly to respiratory and digestive disorders in humans and animals. Of the 9 Enterovirus species, Enterovirus E-G are closely associated with diseases affecting the livestock industry. While enterovirus infection has been increasingly reported in cattle and swine, enterovirus infections in small ruminants remain largely unknown. Methods: Virological, molecular and bioinformatics methods were employed to characterize a novel enterovirus, CEV-JL14, from goats in China manifesting severe diarrhea, with morbidity and mortality of up to 84% and 54%, respectively. Results: CEV-JL14 was defined and proposed as a new Enterovirus species L within the genus Enterovirus of the family Picornaviridae. CEV-JL14 had a complete genome sequence of 7461 nucleotides with an ORF encoding 2172 amino acids, and shared 77.1% genomic sequence identity with TB4-OEV, an ovine enterovirus. Comparison of the 5’-UTR and structural genes of CEV-JL14 with those of known Enterovirus species revealed high genetic variation. VP1 nucleotide sequence identities of CEV-JL14 were 51.8%-53.5% with those of Enterovirus E and F, 30.9%-65.3% with Enterovirus G, and 43.8%-51.5% with Enterovirus A-D. CEV-JL14 was proposed as a novel species within the genus Enterovirus according to the current ICTV demarcation criteria for enteroviruses. Conclusions: CEV-JL14 clustered phylogenetically with neither Enterovirus E and F nor Enterovirus G. It was defined and proposed as novel species L within the genus Enterovirus. This is the first report of caprine enterovirus in China and the first complete genomic sequence of a caprine enterovirus, and it reveals significant genetic variation between ovine and caprine enteroviruses, thus broadening the current understanding of enteroviruses. PMID:28376123
Murat, Claude; Zampieri, Elisa; Vallino, Marta; Daghino, Stefania; Perotto, Silvia; Bonfante, Paola
2011-05-01
Characterization of genomic variation among different microbial species, or different strains of the same species, is a field of significant interest with a wide range of potential applications. We have investigated the genomic variation in mycorrhizal fungal genomes through genomic suppressive subtractive hybridization. The comparison was between phylogenetically distant and close truffle species (Tuber spp.), and between isolates of the ericoid mycorrhizal fungus Oidiodendron maius featuring different degrees of metal tolerance. In the interspecies experiment, almost all the sequences that were identified in the Tuber melanosporum genome and absent in Tuber borchii and Tuber indicum corresponded to transposable elements. In the intraspecies comparison, some specific sequences corresponded to regions coding for enzymes, among them a glutathione synthetase known to be involved in metal tolerance. This approach is a quick and rather inexpensive tool to develop molecular markers for tracking and barcoding mycorrhizal fungi, to identify functional genes, and to investigate genome plasticity, adaptation and evolution. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
2010-01-01
Bombyx mori and Bombyx mandarina are morphologically and physiologically similar. In this study, we compared the nucleotide variations in the complete mitochondrial (mt) genomes between the domesticated silkmoth, B. mori, and its wild ancestors, Chinese B. mandarina (ChBm) and Japanese B. mandarina (JaBm). The sequence divergence and transition mutation ratio between B. mori and ChBm are significantly smaller than those observed between B. mori and JaBm. The strand preference of transitions between B. mori and ChBm is consistent with that between B. mori and JaBm; however, the regional variation in nucleotide substitution rate shows a different pattern. These results suggest that the ChBm mt genome is not undergoing the same evolutionary process as JaBm, providing evidence for selection on mtDNA. Moreover, investigation of the nucleotide sequence divergence in the A+T-rich region of Bombyx mt genomes also supports the assumption that the A+T-rich region might not be the fastest evolving region of insect mtDNA. PMID:21637625
Michmerhuizen, Nicole L.; Birkeland, Andrew C.; Bradford, Carol R.; Brenner, J. Chad
2016-01-01
While sequencing studies have provided an improved understanding of the genetic landscape of head and neck squamous cell carcinomas (HNSCC), there remains a significant lack of genetic data derived from non-Caucasian cohorts. Additionally, there is wide variation in HNSCC incidence and mortality worldwide both between and within various geographic regions. These epidemiologic differences are in part accounted for by varying exposure to environmental risk factors such as tobacco, alcohol, high risk human papilloma viruses and betel quid. However, inherent genetic factors may also play an important role in this variability. As limited sequencing data is available for many populations, the involvement of unique genetic factors in HNSCC pathogenesis from epidemiologically diverse groups is unknown. Here, we review current knowledge about the epidemiologic, environmental, and genetic variation in HNSCC cohorts globally and discuss future studies necessary to further our understanding of these differences. Long-term, a more complete understanding of the genetic drivers found in diverse HNSCC cohorts may help the development of personalized medicine protocols for patients with rare or complex genetic events. PMID:27551333
A Late Glacial to Holocene record of environmental change from Lake Dojran (Macedonia, Greece)
NASA Astrophysics Data System (ADS)
Francke, A.; Wagner, B.; Leng, M. J.; Rethemeyer, J.
2013-02-01
A Late Glacial to Holocene sediment sequence (Co1260, 717 cm) from Lake Dojran, located at the border of the F.Y.R. of Macedonia and Greece, has been investigated to provide information on climate variability in the Balkan region. A robust age-model was established from 13 radiocarbon ages, and indicates that the base of the sequence was deposited at ca. 12 500 cal yr BP, when the lake-level was low. Variations in sedimentological (H2O, TOC, CaCO3, TS, TOC/TN, TOC/TS, grain-size, XRF, δ18Ocarb, δ13Ccarb, δ13Corg) data were linked to hydro-acoustic data and indicate that warmer and more humid climate conditions characterised the remaining period of the Younger Dryas until the beginning of the Holocene. The Holocene exhibits significant environmental variations, including the 8.2 and 4.2 ka cooling events, the Medieval Warm Period and the Little Ice Age. Human-induced erosion processes in the catchment of Lake Dojran intensified after 2800 cal yr BP.
Skelly, Daniel A.; Johansson, Marnie; Madeoy, Jennifer; Wakefield, Jon; Akey, Joshua M.
2011-01-01
Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes. PMID:21873452