Hsing, Michael; Cherkasov, Artem
2008-06-25
Insertions and deletions (indels) represent a common type of sequence variations, which are less studied and pose many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless many structural and functional characteristics of indels remain less researched or unknown. We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilized large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three dimensional structures. By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites.
Equivalent Indels – Ambiguous Functional Classes and Redundancy in Databases
Assmus, Jens; Kleffe, Jürgen; Schmitt, Armin O.; Brockmann, Gudrun A.
2013-01-01
There is considerable interest in studying sequenced variations. However, while the positions of substitutions are uniquely identifiable by sequence alignment, the location of insertions and deletions still poses problems. Each insertion and deletion causes a change of sequence. Yet, due to low complexity or repetitive sequence structures, the same indel can sometimes be annotated in different ways. Two indels which differ in allele sequence and position can be one and the same, i.e. the alternative sequence of the whole chromosome is identical in both cases and, therefore, the two deletions are biologically equivalent. In such a case, it is impossible to identify the exact position of an indel merely based on sequence alignment. Thus, variation entries in a mutation database are not necessarily uniquely defined. We prove the existence of a contiguous region around an indel in which all deletions of the same length are biologically identical. Databases often show only one of several possible locations for a given variation. Furthermore, different database entries can represent equivalent variation events. We identified 1,045,590 such problematic entries of insertions and deletions out of 5,860,408 indel entries in the current human database of Ensembl. Equivalent indels are found in sequence regions of different functions like exons, introns or 5' and 3' UTRs. One and the same variation can be assigned to several different functional classifications of which only one is correct. We implemented an algorithm that determines for each indel database entry its complete set of equivalent indels which is uniquely characterized by the indel itself and a given interval of the reference sequence. PMID:23658777
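The contiguous region of equivalent deletions described in this abstract can be illustrated with a short sketch. This is not the authors' implementation: it covers deletions only, assumes a plain-string reference with 0-based coordinates, and the function names are hypothetical. It relies on the observation that a deletion of length k starting at position s can be moved one base to the left without changing the resulting sequence exactly when ref[s-1] equals ref[s+k-1], and one base to the right exactly when ref[s] equals ref[s+k].

```python
def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return the sequence obtained by deleting ref[start:start+length]."""
    return ref[:start] + ref[start + length:]


def equivalent_deletion_starts(ref: str, start: int, length: int) -> range:
    """Return all 0-based start positions whose deletion of `length` bases
    yields the same alternative sequence as the deletion at `start`.

    Hypothetical helper for illustration; deletions only.
    """
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie inside the reference")

    # Shift left while the base entering the deleted window on the left
    # equals the base leaving it on the right.
    left = start
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1

    # Shift right while the base leaving the window on the left
    # equals the base entering it on the right.
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1

    return range(left, right + 1)


# Example: a 2-bp deletion inside the repeat "CACACA".
reference = "GATCACACAGT"
starts = equivalent_deletion_starts(reference, 5, 2)
alternatives = {apply_deletion(reference, s, 2) for s in starts}
```

In this example the deletion reported at position 5 is equivalent to deletions starting anywhere from position 3 to position 7, and `alternatives` contains a single sequence. Reporting the leftmost start of the range is one common way to make database entries comparable.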
LenVarDB: database of length-variant protein domains.
Mutt, Eshita; Mathew, Oommen K; Sowdhamini, Ramanathan
2014-01-01
Protein domains are functionally and structurally independent modules, which add to the functional variety of proteins. This array of functional diversity has been enabled by evolutionary changes, such as amino acid substitutions or insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2 730 625 sequence homologues. Alignment of sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. Orientation of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge about location of length variations within protein domains and their visual representation will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolutionary changes bring about natural changes to proteins that may be found in many organisms. Such changes could be reflected as amino acid substitutions or insertions-deletions (indels) in protein sequences. LenVarDB is a database that provides an early overview of observed length variations that were set among 731 protein families and after examining >2 million sequences. Indels are followed up to observe if they are close to the active site such that they can affect the activity of proteins. Inclusion of such information can aid the design of bioengineering experiments.
Doddapaneni, Harshavardhan; Yao, Jiqiang; Lin, Hong; Walker, M Andrew; Civerolo, Edwin L
2006-01-01
Background The Gram-negative, xylem-limited phytopathogenic bacterium Xylella fastidiosa is responsible for causing economically important diseases in grapevine, citrus and many other plant species. Despite its economic impact, relatively little is known about the genomic variations among strains isolated from different hosts and their influence on the population genetics of this pathogen. With the availability of genome sequence information for four strains, it is now possible to perform genome-wide analyses to identify and categorize such DNA variations and to understand their influence on strain functional divergence. Results There are 1,579 genes and 194 non-coding homologous sequences present in the genomes of all four strains, representing a 76.2% conservation of the sequenced genome. About 60% of the X. fastidiosa unique sequences exist as tandem gene clusters of 6 or more genes. Multiple alignments identified 12,754 SNPs and 14,449 INDELs in the 1528 common genes and 20,779 SNPs and 10,075 INDELs in the 194 non-coding sequences. The average SNP frequency was 1.08 × 10^-2 per base pair of DNA and the average INDEL frequency was 2.06 × 10^-2 per base pair of DNA. On an average, 60.33% of the SNPs were synonymous type while 39.67% were non-synonymous type. The mutation frequency, primarily in the form of external INDELs was the main type of sequence variation. The relative similarity between the strains was discussed according to the INDEL and SNP differences. The number of genes unique to each strain were 60 (9a5c), 54 (Dixon), 83 (Ann1) and 9 (Temecula-1). A sub-set of the strain specific genes showed significant differences in terms of their codon usage and GC composition from the native genes suggesting their xenologous origin. Tandem repeat analysis of the genomic sequences of the four strains identified associations of repeat sequences with hypothetical and phage related functions.
Conclusion INDELs and strain specific genes have been identified as the main source of variations among strains, with individual strains showing different rates of genome evolution. Based on these genome comparisons, it appears that the Pierce's disease strain Temecula-1 genome represents the ancestral genome of the X. fastidiosa. Results of this analysis are publicly available in the form of a web database. PMID:16948851
A System for Dosage-Based Functional Genomics in Poplar
DOE Office of Scientific and Technical Information (OSTI.GOV)
Henry, Isabelle M.; Zinkgraf, Matthew S.; Groover, Andrew T.
Altering gene dosage through variation in gene copy number is a powerful approach to addressing questions regarding gene regulation, quantitative trait loci, and heterosis, but one that is not easily applied to sexually transmitted species. Elite poplar (Populus spp) varieties are created through interspecific hybridization, followed by clonal propagation. Altered gene dosage relationships are believed to contribute to hybrid performance. Clonal propagation allows for replication and maintenance of meiotically unstable ploidy or structural variants and provides an alternative approach to investigating gene dosage effects not possible in sexually propagated species. Here, we built a genome-wide structural variation system for dosage-based functional genomics and breeding of poplar. We pollinated Populus deltoides with gamma-irradiated Populus nigra pollen to produce >500 F1 seedlings containing dosage lesions in the form of deletions and insertions of chromosomal segments (indel mutations). Using high-precision dosage analysis, we detected indel mutations in ~55% of the progeny. These indels varied in length, position, and number per individual, cumulatively tiling >99% of the genome, with an average of 10 indels per gene. Combined with future phenotype and transcriptome data, this population will provide an excellent resource for creating and characterizing dosage-based variation in poplar, including the contribution of dosage to quantitative traits and heterosis.
A System for Dosage-Based Functional Genomics in Poplar
Henry, Isabelle M.; Zinkgraf, Matthew S.; Groover, Andrew T.; ...
2015-08-28
Altering gene dosage through variation in gene copy number is a powerful approach to addressing questions regarding gene regulation, quantitative trait loci, and heterosis, but one that is not easily applied to sexually transmitted species. Elite poplar (Populus spp) varieties are created through interspecific hybridization, followed by clonal propagation. Altered gene dosage relationships are believed to contribute to hybrid performance. Clonal propagation allows for replication and maintenance of meiotically unstable ploidy or structural variants and provides an alternative approach to investigating gene dosage effects not possible in sexually propagated species. Here, we built a genome-wide structural variation system for dosage-based functional genomics and breeding of poplar. We pollinated Populus deltoides with gamma-irradiated Populus nigra pollen to produce >500 F1 seedlings containing dosage lesions in the form of deletions and insertions of chromosomal segments (indel mutations). Using high-precision dosage analysis, we detected indel mutations in ~55% of the progeny. These indels varied in length, position, and number per individual, cumulatively tiling >99% of the genome, with an average of 10 indels per gene. Combined with future phenotype and transcriptome data, this population will provide an excellent resource for creating and characterizing dosage-based variation in poplar, including the contribution of dosage to quantitative traits and heterosis.
Wu, Jian; Wei, Keyun; Cheng, Feng; Li, Shikai; Wang, Qian; Zhao, Jianjun; Bonnema, Guusje; Wang, Xiaowu
2012-08-28
Flowering time is an important trait in Brassica rapa crops. FLOWERING LOCUS C (FLC) is a MADS-box transcription factor that acts as a potent repressor of flowering. Expression of FLC is silenced when plants are exposed to low temperature, which activates flowering. There are four copies of FLC in B. rapa. Analyses of different segregating populations have suggested that BraA.FLC.a (BrFLC1) and BraA.FLC.b (BrFLC2) play major roles in controlling flowering time in B. rapa. We analyzed the BrFLC2 sequence in nine B. rapa accessions, and identified a 57-bp insertion/deletion (InDel) across exon 4 and intron 4 resulting in a non-functional allele. In total, three types of transcripts were identified for this mutated BrFLC2 allele. The InDel was used to develop a PCR-based marker, which was used to screen a collection of 159 B. rapa accessions. The deletion genotype was present only in oil-type B. rapa, including ssp. oleifera and ssp. tricolaris, and not in other subspecies. The deletion genotype was significantly correlated with variation in flowering time. In contrast, the reported splicing site variation in BrFLC1, which also leads to a non-functional locus, was detected but not correlated with variation in flowering time in oil-type B. rapa, although it was correlated with variation in flowering time in vegetable-type B. rapa. Our results suggest that the naturally occurring deletion mutation across exon 4 and intron 4 in BrFLC2 gene contributes greatly to variation in flowering time in oil-type B. rapa. The observed different relationship between BrFLC1 or BrFLC2 and flowering time variation indicates that the control of flowering time has evolved separately between oil-type and vegetable-type B. rapa groups.
Li, Shan; Zheng, Yun-Chao; Cui, Hai-Rui; Fu, Hao-Wei; Shu, Qing-Yao; Huang, Jian-Zhong
Mutation breeding is based on the induction of genetic variations; hence knowledge of the frequency and type of induced mutations is of paramount importance for the design and implementation of a mutation breeding program. Although γ ray irradiation has been widely used since the 1960s in the breeding of about 200 economically important plant species, molecular elucidation of its genetic effects has so far been achieved largely by analysis of target genes or genomic regions. In the present study, the whole genomes of six γ-irradiated M2 rice plants were sequenced; a total of 144-188 million high-quality (Q>20) reads were generated for each M2 plant, resulting in genome coverage of >45 times for each plant. Single base substitution (SBS) and short insertion/deletion (Indel) mutations were detected at the average frequency of 7.5 × 10^-6 to 9.8 × 10^-6 in the six M2 rice plants (SBS being about 4 times more frequent than Indels). Structural and copy number variations, though less frequent than SBS and Indel, were also identified and validated. The mutations were scattered in all genomic regions across 12 rice chromosomes without apparent hotspots. The present study is the first genome-wide single-nucleotide resolution study on the feature and frequency of γ irradiation-induced mutations in a seed propagated crop; the findings are of practical importance for mutation breeding of rice and other crop species.
Indel variant analysis of short-read sequencing data with Scalpel
Fang, Han; Bergmann, Ewa A; Arora, Kanika; Vacic, Vladimir; Zody, Michael C; Iossifov, Ivan; O’Rawe, Jason A; Wu, Yiyang; Barron, Laura T Jimenez; Rosenbaum, Julie; Ronemus, Michael; Lee, Yoon-ha; Wang, Zihua; Dikoglu, Esra; Jobanputra, Vaidehi; Lyon, Gholson J; Wigler, Michael; Schatz, Michael C; Narzisi, Giuseppe
2017-01-01
As the second most common type of variation in the human genome, insertions and deletions (indels) have been linked to many diseases, but the discovery of indels of more than a few bases in size from short-read sequencing data remains challenging. Scalpel (http://scalpel.sourceforge.net) is an open-source software for reliable indel detection based on the microassembly technique. It has been successfully used to discover mutations in novel candidate genes for autism, and it is extensively used in other large-scale studies of human diseases. This protocol gives an overview of the algorithm and describes how to use Scalpel to perform highly accurate indel calling from whole-genome and whole-exome sequencing data. We provide detailed instructions for an exemplary family-based de novo study, but we also characterize the other two supported modes of operation: single-sample and somatic analysis. Indel normalization, visualization and annotation of the mutations are also illustrated. Using a standard server, indel discovery and characterization in the exonic regions of the example sequencing data can be completed in ~5 h after read mapping. PMID:27854363
2014-01-01
Background Small insertion and deletion polymorphisms (Indels) are the second most common mutations in the human genome, after Single Nucleotide Polymorphisms (SNPs). Recent studies have shown that they have significant influence on genetic variation by altering human traits and can cause multiple human diseases. In particular, many Indels that occur in protein coding regions are known to impact the structure or function of the protein. A major challenge is to predict the effects of these Indels and to distinguish between deleterious and neutral variants. When an Indel occurs within a coding region, it can be either frameshifting (FS) or non-frameshifting (NFS). FS-Indels either modify the complete C-terminal region of the protein or result in premature termination of translation. NFS-Indels insert/delete multiples of three nucleotides leading to the insertion/deletion of one or more amino acids. Results In order to study the relationships between NFS-Indels and Mendelian diseases, we characterized NFS-Indels according to numerous structural, functional and evolutionary parameters. We then used these parameters to identify specific characteristics of disease-causing and neutral NFS-Indels. Finally, we developed a new machine learning approach, KD4i, that can be used to predict the phenotypic effects of NFS-Indels. Conclusions We demonstrate in a large-scale evaluation that the accuracy of KD4i is comparable to existing state-of-the-art methods. However, a major advantage of our approach is that we also provide the reasons for the predictions, in the form of a set of rules. The rules are interpretable by non-expert humans and they thus represent new knowledge about the relationships between the genotype and phenotypes of NFS-Indels and the causative molecular perturbations that result in the disease. PMID:24742296
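The frameshifting versus non-frameshifting distinction described above reduces to a length check. The following is a minimal sketch under stated assumptions: the indel is given as a pair of reference and alternative allele strings, it lies entirely within a coding exon, and the function name is hypothetical. It is not part of KD4i and ignores effects such as disrupted splice sites or start and stop codons.

```python
def classify_coding_indel(ref_allele: str, alt_allele: str) -> str:
    """Classify a coding indel as frameshifting ("FS") or
    non-frameshifting ("NFS") from the allele length difference.

    Hypothetical helper; assumes the whole event lies inside a coding exon.
    """
    delta = len(alt_allele) - len(ref_allele)
    if delta == 0:
        raise ValueError("alleles have equal length: not an indel")
    # A net change that is a multiple of three keeps the reading frame,
    # so only whole amino acids are inserted or deleted.
    return "NFS" if delta % 3 == 0 else "FS"
```

For instance, replacing "A" with "ACGT" inserts three bases and is classified as NFS, whereas replacing "ACG" with "A" deletes two bases and is classified as FS.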
Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup
2016-01-01
Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.
Accurate indel prediction using paired-end short reads
2013-01-01
Background One of the major open challenges in next generation sequencing (NGS) is the accurate identification of structural variants such as insertions and deletions (indels). Current methods for indel calling assign scores to different types of evidence or counter-evidence for the presence of an indel, such as the number of split read alignments spanning the boundaries of a deletion candidate or reads that map within a putative deletion. Candidates with a score above a manually defined threshold are then predicted to be true indels. As a consequence, structural variants detected in this manner contain many false positives. Results Here, we present a machine learning based method which is able to discover and distinguish true from false indel candidates in order to reduce the false positive rate. Our method identifies indel candidates using a discriminative classifier based on features of split read alignment profiles and trained on true and false indel candidates that were validated by Sanger sequencing. We demonstrate the usefulness of our method with paired-end Illumina reads from 80 genomes of the first phase of the 1001 Genomes Project (http://www.1001genomes.org) in Arabidopsis thaliana. Conclusion In this work we show that indel classification is a necessary step to reduce the number of false positive candidates. We demonstrate that missing classification may lead to spurious biological interpretations. The software is available at: http://agkb.is.tuebingen.mpg.de/Forschung/SV-M/. PMID:23442375
2011-01-01
Background Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires an accurate identification of the different types of variation in individual genomes. Results We report the integration of the whole genome sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of resequencing SNP detection was assessed by combining SNPs that were identified to be either in identity by descent (IBD) or in copy number variation (CNV) with results from SNP array genotyping. Coding insertions and deletions (indels) were found to be enriched for size in multiples of 3 and were located near the N- and C-termini of proteins. For larger indels, a combination of split-read and read-pair approaches proved to be complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions Our results provide high resolution mapping of diverse classes of genomic variation in an individual bovine genome and demonstrate that structural variation surpasses sequence variation as the main component of genomic variability. Better accuracy of SNP detection was achieved with little loss of sensitivity when algorithms that implemented mapping quality were used. IBD regions were found to be instrumental for calculating resequencing SNP accuracy, while SNP detection within CNVs tended to be less reliable. CNV discovery was affected dramatically by platform resolution and coverage biases. The combined data for this study showed that at a moderate level of sequencing coverage, an ensemble of platforms and tools can be applied together to maximize the accurate detection of sequence and structural variants. PMID:22082336
Pena, Heloisa B.; Pena, Sérgio D. J.
2012-01-01
Objective Short insertion-deletion polymorphisms (indels) are the second most abundant form of genetic variations in humans after SNPs. Since indel alleles differ in size, they can be typed using the same methodological approaches and equipment currently utilized for microsatellite genotyping, which is already operational in forensic laboratories. We have previously shown that a panel of 40 carefully chosen indels has excellent potential for forensic identification, with combined probability of identity (match probability) of 7.09 × 10^-17 for Europeans. Methods We describe the successful development of a multiplex system for genotyping the 40-indel panel in long thin denaturing polyacrylamide gels with silver staining. We also demonstrate that the system can be easily fully automated with a simple large scanner and commercial software. Results and Conclusion The great advantage of the new system of typing is its very low cost. The total price for laboratory equipment is less than EUR 10,000, and genotyping of an individual patient will cost less than EUR 10 in supplies. Thus, the 40-indel panel described here and the newly developed ‘low-tech’ analysis platform represent useful new tools for forensic identification and kinship analysis in laboratories with limited budgets, especially in developing countries. PMID:22851937
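A combined probability of identity of the kind quoted above is conventionally obtained by multiplying per-locus match probabilities. The sketch below is an illustration only and does not use the panel's actual allele frequencies, which are not given in this abstract. It assumes biallelic loci, Hardy-Weinberg proportions, and independence between loci; the function names are hypothetical.

```python
import math
from typing import Iterable


def locus_match_probability(p: float) -> float:
    """Probability that two unrelated individuals share a genotype at one
    biallelic locus with allele frequencies p and 1 - p (Hardy-Weinberg
    proportions assumed): the sum of squared genotype frequencies.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return p ** 4 + (2.0 * p * q) ** 2 + q ** 4


def combined_match_probability(short_allele_freqs: Iterable[float]) -> float:
    """Product of per-locus match probabilities, assuming independent loci."""
    return math.prod(locus_match_probability(p) for p in short_allele_freqs)


# Idealized panel: 40 loci, each with both alleles at frequency 0.5.
idealized = combined_match_probability([0.5] * 40)
```

With both alleles at frequency 0.5 the per-locus value is 0.375, its minimum for a biallelic marker, so 40 such loci give a combined value of about 9 × 10^-18. A real panel with less balanced frequencies gives a larger value, which is consistent with the reported 7.09 × 10^-17.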
Vishwakarma, Manish K.; Kale, Sandip M.; Sriswathi, Manda; Naresh, Talari; Shasidhar, Yaduru; Garg, Vanika; Pandey, Manish K.; Varshney, Rajeev K.
2017-01-01
Small insertions and deletions (InDels) are the second most prevalent and the most abundant structural variations in plant genomes. In order to deploy these genetic variations for genetic analysis in genus Arachis, we conducted comparative analysis of the draft genome assemblies of both the diploid progenitor species of cultivated tetraploid groundnut (Arachis hypogaea L.) i.e., Arachis duranensis (A subgenome) and Arachis ipaënsis (B subgenome) and identified 515,223 InDels. These InDels include 269,973 insertions identified in A. ipaënsis against A. duranensis while 245,250 deletions in A. duranensis against A. ipaënsis. The majority of the InDels were of single bp (43.7%) and 2–10 bp (39.9%) while the remaining were >10 bp (16.4%). Phylogenetic analysis using genotyping data for 86 (40.19%) polymorphic markers grouped 96 diverse Arachis accessions into eight clusters mostly by the affinity of their genome. This study also provided evidence for the existence of “K” genome, although distinct from both the “A” and “B” genomes, but more similar to “B” genome. The complete homology between A. monticola and A. hypogaea tetraploid taxa showed a very similar genome composition. The above analysis has provided greater insights into the phylogenetic relationship among accessions, genomes, sub species and sections. These InDel markers are very useful resource for groundnut research community for genetic analysis and breeding applications. PMID:29312366
Garzón-Martínez, Gina A.; Osorio-Guarín, Jaime A.; Delgadillo-Durán, Paola; Mayorga, Franklin; Enciso-Rodríguez, Felix E.; Landsman, David
2015-01-01
The genus Physalis is common in the Americas and includes several economically important species, among them Physalis peruviana that produces appetizing edible fruits. We studied the genetic diversity and population structure of P. peruviana and characterized 47 accessions of this species along with 13 accessions of related taxa consisting of 222 individuals from the Colombian Corporation of Agricultural Research (CORPOICA) germplasm collection, using Conserved Orthologous Sequences (COSII) and Immunity Related Genes (IRGs). In addition, 642 Single Nucleotide Polymorphism (SNPs) markers were identified and used for the genetic diversity analysis. A total of 121 alleles were detected in 24 InDels loci ranging from 2 to 9 alleles per locus, with an average of 5.04 alleles per locus. The average number of alleles in the SNP markers was two. The observed heterozygosity for P. peruviana with InDel and SNP markers was higher (0.48 and 0.59) than the expected heterozygosity (0.30 and 0.41). Interestingly, the observed heterozygosity in related taxa (0.4 and 0.12) was lower than the expected heterozygosity (0.59 and 0.25). The coefficient of population differentiation FST was 0.143 (InDels) and 0.038 (SNPs), showing a relatively low level of genetic differentiation among P. peruviana and related taxa. Higher levels of genetic variation were instead observed within populations based on the AMOVA analysis. Population structure analysis supported the presence of two main groups and PCA analysis based on SNP markers revealed two distinct clusters in the P. peruviana accessions corresponding to their state of cultivation. In this study, we identified molecular markers useful to detect genetic variation in Physalis germplasm for assisting conservation and crossbreeding strategies. PMID:26550601
Garzón-Martínez, Gina A; Osorio-Guarín, Jaime A; Delgadillo-Durán, Paola; Mayorga, Franklin; Enciso-Rodríguez, Felix E; Landsman, David; Mariño-Ramírez, Leonardo; Barrero, Luz Stella
2015-12-01
The genus Physalis is common in the Americas and includes several economically important species, among them Physalis peruviana that produces appetizing edible fruits. We studied the genetic diversity and population structure of P. peruviana and characterized 47 accessions of this species along with 13 accessions of related taxa consisting of 222 individuals from the Colombian Corporation of Agricultural Research (CORPOICA) germplasm collection, using Conserved Orthologous Sequences (COSII) and Immunity Related Genes (IRGs). In addition, 642 Single Nucleotide Polymorphism (SNPs) markers were identified and used for the genetic diversity analysis. A total of 121 alleles were detected in 24 InDels loci ranging from 2 to 9 alleles per locus, with an average of 5.04 alleles per locus. The average number of alleles in the SNP markers was two. The observed heterozygosity for P. peruviana with InDel and SNP markers was higher (0.48 and 0.59) than the expected heterozygosity (0.30 and 0.41). Interestingly, the observed heterozygosity in related taxa (0.4 and 0.12) was lower than the expected heterozygosity (0.59 and 0.25). The coefficient of population differentiation FST was 0.143 (InDels) and 0.038 (SNPs), showing a relatively low level of genetic differentiation among P. peruviana and related taxa. Higher levels of genetic variation were instead observed within populations based on the AMOVA analysis. Population structure analysis supported the presence of two main groups and PCA analysis based on SNP markers revealed two distinct clusters in the P. peruviana accessions corresponding to their state of cultivation. In this study, we identified molecular markers useful to detect genetic variation in Physalis germplasm for assisting conservation and crossbreeding strategies.
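The observed and expected heterozygosity values compared in this abstract follow standard definitions, which can be sketched for a single locus. This is an illustration under stated assumptions, not the software used in the study: genotypes are diploid and given as pairs of allele labels, expected heterozygosity is the uncorrected value 1 minus the sum of squared allele frequencies, and the function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def heterozygosity(
    genotypes: Sequence[Tuple[Hashable, Hashable]]
) -> Tuple[float, float]:
    """Return (observed, expected) heterozygosity for one diploid locus.

    observed: fraction of individuals carrying two different alleles.
    expected: 1 - sum(p_i ** 2) over allele frequencies p_i, with no
    small-sample correction (an assumption of this sketch).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")

    observed = sum(a != b for a, b in genotypes) / n

    # Count every allele copy; each individual contributes two.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected
```

A sample in which observed exceeds expected heterozygosity, as reported here for P. peruviana, has more heterozygous individuals than its allele frequencies alone would predict; the reverse pattern, reported for the related taxa, indicates a deficit of heterozygotes.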
Xing, Libo; Zhang, Dong; Song, Xiaomin; Weng, Kai; Shen, Yawen; Li, Youmei; Zhao, Caiping; Ma, Juanjuan; An, Na; Han, Mingyu
2016-01-01
Apple (Malus domestica Borkh.) is a commercially important fruit worldwide. Detailed information on genomic DNA polymorphisms, which are important for understanding phenotypic traits, is lacking for the apple. We re-sequenced two elite apple varieties, ‘Nagafu No. 2’ and ‘Qinguan,’ which have different characteristics. We identified many genomic variations, including 2,771,129 single nucleotide polymorphisms (SNPs), 82,663 structural variations (SVs), and 1,572,803 insertion/deletions (INDELs) in ‘Nagafu No. 2’ and 2,262,888 SNPs, 63,764 SVs, and 1,294,060 INDELs in ‘Qinguan.’ The ‘SNP,’ ‘INDEL,’ and ‘SV’ distributions were non-random, with variation-rich or -poor regions throughout the genomes. In ‘Nagafu No. 2’ and ‘Qinguan’ there were 171,520 and 147,090 non-synonymous SNPs spanning 23,111 and 21,400 genes, respectively; 3,963 and 3,196 SVs in 3,431 and 2,815 genes, respectively; and 1,834 and 1,451 INDELs in 1,681 and 1,345 genes, respectively. Genetic linkage maps of 190 flowering genes associated with multiple flowering pathways in ‘Nagafu No. 2,’ ‘Qinguan,’ and ‘Golden Delicious,’ identified complex regulatory mechanisms involved in floral induction, flower bud formation, and flowering characteristics, which might reflect the genetic variation of the flowering genes. Expression profiling of key flowering genes in buds and leaves suggested that the photoperiod and autonomous flowering pathways are major contributors to the different floral-associated traits between ‘Nagafu No. 2’ and ‘Qinguan.’ The genome variation data provided a foundation for the further exploration of apple diversity and gene–phenotype relationships, and for future research on molecular breeding to improve apple and related species. PMID:27446138
Cho, Kwang-Soo; Yun, Bong-Kyoung; Yoon, Young-Ho; Hong, Su-Young; Mekapogu, Manjulatha; Kim, Kyung-Hee; Yang, Tae-Jin
2015-01-01
We report the chloroplast (cp) genome sequence of tartary buckwheat (Fagopyrum tataricum) obtained by next-generation sequencing technology and compare it with the previously reported common buckwheat (F. esculentum ssp. ancestrale) cp genome. The cp genome of F. tataricum has a total sequence length of 159,272 bp, which is 327 bp shorter than the common buckwheat cp genome. The cp gene content, order, and orientation are similar to those of common buckwheat, but with some structural variation at tandem and palindromic repeat frequencies and junction areas. A total of seven InDels (around 100 bp) were found within the intergenic sequences and the ycf1 gene. The copy number of the 21-bp tandem repeat differed between F. tataricum (four repeats) and F. esculentum (one repeat), and the InDel of the ycf1 gene was 63 bp long. Coding sequences are highly conserved at both the nucleotide and amino acid levels, with about 98% homology, and four genes (rpoC2, ycf3, accD, and clpP) have high synonymous (Ks) values. PCR-based InDel markers were applied to diverse genetic resources of F. tataricum and F. esculentum, and the amplicon size was identical to that expected in silico. Therefore, these InDel markers are informative biomarkers to practically distinguish raw or processed buckwheat products derived from F. tataricum and F. esculentum. PMID:25966355
Pan-cancer analysis reveals technical artifacts in TCGA germline variant calls.
Buckley, Alexandra R; Standish, Kristopher A; Bhutani, Kunal; Ideker, Trey; Lasken, Roger S; Carter, Hannah; Harismendy, Olivier; Schork, Nicholas J
2017-06-12
Cancer research to date has largely focused on somatically acquired genetic aberrations. In contrast, the degree to which germline, or inherited, variation contributes to tumorigenesis remains unclear, possibly due to a lack of accessible germline variant data. Here we called germline variants on 9618 cases from The Cancer Genome Atlas (TCGA) database representing 31 cancer types. We identified batch effects affecting loss of function (LOF) variant calls that can be traced back to differences in the way the sequence data were generated both within and across cancer types. Overall, LOF indel calls were more sensitive to technical artifacts than LOF Single Nucleotide Variant (SNV) calls. In particular, whole genome amplification of DNA prior to sequencing led to an artificially increased burden of LOF indel calls, which confounded association analyses relating germline variants to tumor type despite stringent indel filtering strategies. The samples affected by these technical artifacts include all acute myeloid leukemia and practically all ovarian cancer samples. We demonstrate how technical artifacts induced by whole genome amplification of DNA can lead to false positive germline-tumor type associations and suggest TCGA whole genome amplified samples be used with caution. This study draws attention to the need to be sensitive to problems associated with a lack of uniformity in data generation in TCGA data.
Jiang, Yue; Turinsky, Andrei L.; Brudno, Michael
2015-01-01
With the development of High-Throughput Sequencing (HTS) thousands of human genomes have now been sequenced. Whenever different studies analyze the same genome they usually agree on the amount of single-nucleotide polymorphisms, but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing the indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulties in indel detection: the insufficient coverage, read length or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels exist in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them. PMID:26130710
Pereira, R; Alves, C; Aler, M; Amorim, A; Arévalo, C; Betancor, E; Braganholi, D; Bravo, M L; Brito, P; Builes, J J; Burgos, G; Carvalho, E F; Castillo, A; Catanesi, C I; Cicarelli, R M B; Coufalova, P; Dario, P; D'Amato, M E; Davison, S; Ferragut, J; Fondevila, M; Furfuro, S; García, O; Gaviria, A; Gomes, I; González, E; Gonzalez-Liñan, A; Gross, T E; Hernández, A; Huang, Q; Jiménez, S; Jobim, L F; López-Parra, A M; Marino, M; Marques, S; Martínez-Cortés, G; Masciovecchio, V; Parra, D; Penacino, G; Pinheiro, M F; Porto, M J; Posada, Y; Restrepo, C; Ribeiro, T; Rubio, L; Sala, A; Santurtún, A; Solís, L S; Souto, L; Streitemberger, E; Torres, A; Vilela-Lamego, C; Yunis, J J; Yurrebaso, I; Gusmão, L
2018-01-01
A collaborative effort was carried out by the Spanish and Portuguese Speaking Working Group of the International Society for Forensic Genetics (GHEP-ISFG) to promote knowledge exchange between associate laboratories interested in the implementation of indel-based methodologies and build allele frequency databases of 38 indels for forensic applications. These databases include populations from different countries that are relevant for identification and kinship investigations undertaken by the participating laboratories. Before compiling population data, participants were asked to type the 38 indels in blind samples from annual GHEP-ISFG proficiency tests, using an amplification protocol previously described. Only laboratories that reported correct results contributed population data to this study. A total of 5839 samples were genotyped from 45 different populations from Africa, America, East Asia, Europe and Middle East. Population differentiation analysis showed significant differences between most populations studied from Africa and America, as well as between two Asian populations from China and East Timor. Low FST values were detected among most European populations. Overall diversities and parameters of forensic efficiency were high in populations from all continents. Copyright © 2017 Elsevier B.V. All rights reserved.
Indel detection from DNA and RNA sequencing data with transIndel.
Yang, Rendong; Van Etten, Jamie L; Dehm, Scott M
2018-04-19
Insertions and deletions (indels) are a major class of genomic variation associated with human disease. Indels are primarily detected from DNA sequencing (DNA-seq) data but their transcriptional consequences remain unexplored due to challenges in discriminating medium-sized and large indels from splicing events in RNA-seq data. Here, we developed transIndel, a splice-aware algorithm that parses the chimeric alignments predicted by a short read aligner and reconstructs the mid-sized insertions and large deletions based on the linear alignments of split reads from DNA-seq or RNA-seq data. TransIndel exhibits competitive or superior performance over eight state-of-the-art indel detection tools on benchmarks using both synthetic and real DNA-seq data. Additionally, we applied transIndel to DNA-seq and RNA-seq datasets from 333 primary prostate cancer patients from The Cancer Genome Atlas (TCGA) and 59 metastatic prostate cancer patients from AACR-PCF Stand-Up-To-Cancer (SU2C) studies. TransIndel enhanced the taxonomy of DNA- and RNA-level alterations in prostate cancer by identifying recurrent FOXA1 indels as well as exitron splicing in genes implicated in disease progression. Our study demonstrates that transIndel is a robust tool for elucidation of medium- and large-sized indels from DNA-seq and RNA-seq data. Including RNA-seq in indel discovery efforts leads to significant improvements in sensitivity for identification of medium-sized and large indels missed by DNA-seq, and reveals non-canonical RNA-splicing events in genes associated with disease pathology.
Yeo, Zhen Xuan; Wong, Joshua Chee Leong; Rozen, Steven G; Lee, Ann Siew Gek
2014-06-24
The Ion Torrent PGM is a popular benchtop sequencer that shows promise in replacing conventional Sanger sequencing as the gold standard for mutation detection. Despite the PGM's reported high accuracy in calling single nucleotide variations, it tends to generate many false positive calls in detecting insertions and deletions (indels), which may hinder its utility for clinical genetic testing. Recently, the proprietary analytical workflow for the Ion Torrent sequencer, Torrent Suite (TS), underwent a series of upgrades. We evaluated three major upgrades of TS by calling indels in the BRCA1 and BRCA2 genes. Our analysis revealed that false negative indels could be generated by TS under both default calling parameters and parameters adjusted for maximum sensitivity. However, indel calling with the same data using the open source variant callers, GATK and SAMtools showed that false negatives could be minimised with the use of appropriate bioinformatics analysis. Furthermore, we identified two variant calling measures, Quality-by-Depth (QD) and VARiation of the Width of gaps and inserts (VARW), which substantially reduced false positive indels, including non-homopolymer associated errors without compromising sensitivity. In our best case scenario that involved the TMAP aligner and SAMtools, we achieved 100% sensitivity, 99.99% specificity and 29% False Discovery Rate (FDR) in indel calling from all 23 samples, which is a good performance for mutation screening using PGM. New versions of TS, BWA and GATK have shown improvements in indel calling sensitivity and specificity over their older counterpart. However, the variant caller of TS exhibits a lower sensitivity than GATK and SAMtools. 
Our findings demonstrate that although indel calling from PGM sequences may appear to be noisy at first glance, proper computational indel calling analysis is able to maximize both the sensitivity and specificity at the single base level, paving the way for the usage of this technology for future clinical genetic testing.
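The abstract above quotes sensitivity, specificity and False Discovery Rate together, and it may not be obvious how a specificity near 100% can coexist with an FDR of 29%. The sketch below uses the standard confusion-matrix definitions to show this. The counts are hypothetical, chosen only to give figures of the same order as those quoted, and are not the study's data; the function name is likewise an assumption.

```python
def calling_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, FDR) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true indels that were called
    specificity = tn / (tn + fp)   # share of non-variant sites left uncalled
    fdr = fp / (fp + tp)           # share of calls that are false
    return sensitivity, specificity, fdr

# Hypothetical counts (illustration only): true indels are rare, while
# non-variant positions are very numerous.
sens, spec, fdr = calling_metrics(tp=5, fp=2, tn=19998, fn=0)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} FDR={fdr:.0%}")
```

Because true indels are so much rarer than non-variant positions, even a handful of false positives forms a large share of all calls while barely affecting specificity.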
Indels, structural variation, and recombination drive genomic diversity in Plasmodium falciparum
Miles, Alistair; Iqbal, Zamin; Vauterin, Paul; Pearson, Richard; Campino, Susana; Theron, Michel; Gould, Kelda; Mead, Daniel; Drury, Eleanor; O'Brien, John; Ruano Rubio, Valentin; MacInnis, Bronwyn; Mwangi, Jonathan; Samarakoon, Upeka; Ranford-Cartwright, Lisa; Ferdig, Michael; Hayton, Karen; Su, Xin-zhuan; Wellems, Thomas; Rayner, Julian; McVean, Gil; Kwiatkowski, Dominic
2016-01-01
The malaria parasite Plasmodium falciparum has a great capacity for evolutionary adaptation to evade host immunity and develop drug resistance. Current understanding of parasite evolution is impeded by the fact that a large fraction of the genome is either highly repetitive or highly variable and thus difficult to analyze using short-read sequencing technologies. Here, we describe a resource of deep sequencing data on parents and progeny from genetic crosses, which has enabled us to perform the first genome-wide, integrated analysis of SNP, indel and complex polymorphisms, using Mendelian error rates as an indicator of genotypic accuracy. These data reveal that indels are exceptionally abundant, being more common than SNPs and thus the dominant mode of polymorphism within the core genome. We use the high density of SNP and indel markers to analyze patterns of meiotic recombination, confirming a high rate of crossover events and providing the first estimates for the rate of non-crossover events and the length of conversion tracts. We observe several instances of meiotic recombination within copy number variants associated with drug resistance, demonstrating a mechanism whereby fitness costs associated with resistance mutations could be compensated and greater phenotypic plasticity could be acquired. PMID:27531718
Wang, Qi; Heizer, Esley; Rosa, Bruce A.; Wildman, Scott A.; Janetka, James W.; Mitreva, Makedonka
2016-01-01
Insertions and deletions (indels) are important sequence variants that are considered as phylogenetic markers that reflect evolutionary adaptations in different species. In an effort to systematically study indels specific to the phylum Nematoda and their structural impact on the proteins bearing them, we examined over 340,000 polypeptides from 21 nematode species spanning the phylum, compared them to non-nematodes and identified indels unique to nematode proteins in more than 3,000 protein families. Examination of the amino acid composition revealed uneven usage of amino acids for insertions and deletions. The amino acid composition and cost, along with the secondary structure constitution of the indels, were analyzed in the context of their biological pathway associations. Species-specific indels could enable indel-based targeting for drug design in pathogens/parasites. Therefore, we screened the spatial locations of the indels in the parasite’s protein 3D structures, determined the location of the indel and identified potential unique drug targeting sites. These indels could be confirmed by RNA-Seq data. Examples are presented that illustrate the close proximity of the indel to established small-molecule binding pockets that can potentially facilitate selective targeting to the parasites and bypassing their host, thus reducing or eliminating the toxicity of the potential drugs. The study presents an approach for understanding the adaptation of pathogens/parasites at a molecular level, and outlines a strategy to identify such nematode-selective targets that remain essential to the organism. With further experimental characterization and validation, it opens a possible channel for the development of novel treatments with high target specificity, addressing both host toxicity and resistance concerns. PMID:26829384
Wang, Qi; Heizer, Esley; Rosa, Bruce A; Wildman, Scott A; Janetka, James W; Mitreva, Makedonka
2016-04-01
Insertions and deletions (indels) are important sequence variants that are considered as phylogenetic markers that reflect evolutionary adaptations in different species. In an effort to systematically study indels specific to the phylum Nematoda and their structural impact on the proteins bearing them, we examined over 340,000 polypeptides from 21 nematode species spanning the phylum, compared them to non-nematodes and identified indels unique to nematode proteins in more than 3000 protein families. Examination of the amino acid composition revealed uneven usage of amino acids for insertions and deletions. The amino acid composition and cost, along with the secondary structure constitution of the indels, were analyzed in the context of their biological pathway associations. Species-specific indels could enable indel-based targeting for drug design in pathogens/parasites. Therefore, we screened the spatial locations of the indels in the parasite's protein 3D structures, determined the location of the indel and identified potential unique drug targeting sites. These indels could be confirmed by RNA-Seq data. Examples are presented illustrating the close proximity of some indels to established small-molecule binding pockets that can potentially facilitate selective targeting to the parasites and bypassing their host, thus reducing or eliminating the toxicity of the potential drugs. This study presents an approach for understanding the adaptation of pathogens/parasites at a molecular level, and outlines a strategy to identify such nematode-selective targets that remain essential to the organism. With further experimental characterization and validation, it opens a possible channel for the development of novel treatments with high target specificity, addressing both host toxicity and resistance concerns. Copyright © 2016 Elsevier B.V. All rights reserved.
Patterns of DNA barcode variation in Canadian marine molluscs.
Layton, Kara K S; Martel, André L; Hebert, Paul D N
2014-01-01
Molluscs are the most diverse marine phylum and this high diversity has resulted in considerable taxonomic problems. Because the number of species in Canadian oceans remains uncertain, there is a need to incorporate molecular methods into species identifications. A 648 base pair segment of the cytochrome c oxidase subunit I gene has proven useful for the identification and discovery of species in many animal lineages. While the utility of DNA barcoding in molluscs has been demonstrated in other studies, this is the first effort to construct a DNA barcode registry for marine molluscs across such a large geographic area. This study examines patterns of DNA barcode variation in 227 species of Canadian marine molluscs. Intraspecific sequence divergences ranged from 0-26.4% and a barcode gap existed for most taxa. Eleven cases of relatively deep (>2%) intraspecific divergence were detected, suggesting the possible presence of overlooked species. Structural variation was detected in COI with indels found in 37 species, mostly bivalves. Some indels were present in divergent lineages, primarily in the region of the first external loop, suggesting certain areas are hotspots for change. Lastly, mean GC content varied substantially among orders (24.5%-46.5%), and showed a significant positive correlation with nearest neighbour distances. DNA barcoding is an effective tool for the identification of Canadian marine molluscs and for revealing possible cases of overlooked species. Some species with deep intraspecific divergence showed a biogeographic partition between lineages on the Atlantic, Arctic and Pacific coasts, suggesting the role of Pleistocene glaciations in the subdivision of their populations. Indels were prevalent in the barcode region of the COI gene in bivalves and gastropods. This study highlights the efficacy of DNA barcoding for providing insights into sequence variation across a broad taxonomic group on a large geographic scale.
Forensic applicability of multi-allelic InDels with mononucleotide homopolymer structures.
Zhang, Shu; Zhu, Qiang; Chen, Xiaogang; Zhao, Yuancun; Zhao, Xiaohong; Yang, Yiwen; Gao, Zehua; Fang, Ting; Wang, Yufang; Zhang, Ji
2018-04-27
Insertion/deletion polymorphisms (InDels), which possess the characteristics of low mutation rates and a short amplicon size, have been regarded as promising markers for forensic DNA analysis. InDels can be classified as bi-allelic or multi-allelic, depending on the number of alleles. Many studies have explored the use of bi-allelic InDels in forensic applications, such as individual identification and ancestry inference. However, multi-allelic InDels have received relatively little attention. In this study, InDels with 2-6 alleles and a minor allele frequency ≥0.01, in Chinese Southern Han (CHS), were retrieved from the 1000 Genomes Project Phase III. Based on the structural analysis of all retrieved InDels, 17 multi-allelic markers with mononucleotide homopolymer structures were selected and combined in one multiplex PCR reaction system. Sensitivity, species specificity and applicability in forensic case work of the multiplex were analyzed. A total of 218 unrelated individuals from a Chinese Han population were genotyped. The combined discriminatory power (CDP), the combined match probability (CMP) and the cumulative probability of exclusion (CPE) were 0.9999999999609, 3.91E-13 and 0.9956, respectively. The results demonstrated that this InDel multiplex panel was highly informative in the investigated population and most of the 26 populations of the 1000 Genomes Project. The data also suggested that multi-allelic InDel markers with monomeric base pair expansions are useful for forensic applications. This article is protected by copyright. All rights reserved.
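As a rough illustration of how panel-level statistics such as CMP and CDP are commonly derived, the sketch below assumes the conventional definitions: the match probability of a locus is the sum of its squared genotype frequencies, the combined match probability is the product over independent loci, and the combined discriminatory power is one minus that product. The abstract does not state which formulas were used, so this is an assumption, and the genotype frequencies and function names are hypothetical.

```python
from math import prod

def match_probability(genotype_freqs):
    """Chance that two random individuals share a genotype at one locus."""
    return sum(f * f for f in genotype_freqs)

def combined_stats(loci):
    """Return (CMP, CDP) for a panel of loci assumed to be independent."""
    cmp_value = prod(match_probability(freqs) for freqs in loci)
    return cmp_value, 1.0 - cmp_value

# Hypothetical genotype frequencies for a two-locus panel (illustration only).
panel = [
    [0.25, 0.50, 0.25],  # locus 1: three genotypes
    [0.50, 0.50],        # locus 2: two genotypes
]
cmp_value, cdp = combined_stats(panel)
print(f"CMP = {cmp_value:.4f}, CDP = {cdp:.4f}")
```

Each added independent locus multiplies the CMP by a factor below one, which is why a 17-marker multiplex can reach very small match probabilities.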
Fu, Chong-Yun; Liu, Wu-Ge; Liu, Di-Lin; Li, Ji-Hua; Zhu, Man-Shan; Liao, Yi-Long; Liu, Zhen-Rong; Zeng, Xue-Qin; Wang, Feng
2016-03-01
Next-generation sequencing technologies provide opportunities to further understand genetic variation, even within closely related cultivars. We performed whole genome resequencing of two elite indica rice varieties, RGD-7S and Taifeng B, whose F1 progeny showed hybrid weakness and hybrid vigor when grown in the early- and late-cropping seasons, respectively. Approximately 150 million 100-bp pair-end reads were generated, which covered ∼86% of the rice (Oryza sativa L. japonica 'Nipponbare') reference genome. A total of 2,758,740 polymorphic sites including 2,408,845 SNPs and 349,895 InDels were detected in RGD-7S and Taifeng B, respectively. Applying stringent parameters, we identified 961,791 SNPs and 46,640 InDels between RGD-7S and Taifeng B (RGD-7S/Taifeng B). The density of DNA polymorphisms was 256.8 SNPs and 12.5 InDels per 100 kb for RGD-7S/Taifeng B. Copy number variations (CNVs) were also investigated. In RGD-7S, 1989 of 2727 CNVs were overlapped in 218 genes, and 1231 of 2010 CNVs were annotated in 175 genes in Taifeng B. In addition, we verified a subset of InDels in the interval of hybrid weakness genes, Hw3 and Hw4, and obtained some polymorphic InDel markers, which will provide a sound foundation for cloning hybrid weakness genes. Analysis of genomic variations will also contribute to understanding the genetic basis of hybrid weakness and heterosis.
Length and sequence variability in mitochondrial control region of the milkfish, Chanos chanos.
Ravago, Rachel G; Monje, Virginia D; Juinio-Meñez, Marie Antonette
2002-01-01
Extensive length variability was observed in the mitochondrial control region of the milkfish, Chanos chanos. The nucleotide sequence of the control region and flanking regions was determined. Length variability and heteroplasmy was due to the presence of varying numbers of a 41-bp tandemly repeated sequence and a 48-bp insertion/deletion (indel). The structure and organization of the milkfish control region is similar to that of other teleost fish and vertebrates. However, extensive variation in the copy number of tandem repeats (4-20 copies) and the presence of a relatively large (48-bp) indel, are apparently uncommon in teleost fish control region sequences reported to date. High sequence variability of control region peripheral domains indicates the potential utility of selected regions as markers for population-level studies.
Corradi, Nicolas; Sanders, Ian R
2006-03-10
The P-type II ATPase gene family encodes proteins with an important role in adaptation of the cell to variation in external K+, Ca2+ and Na+ concentrations. The presence of P-type II gene subfamilies that are specific for certain kingdoms has been reported but was sometimes contradicted by discovery of previously unknown homologous sequences in newly sequenced genomes. Members of this gene family have been sampled in all of the fungal phyla except the arbuscular mycorrhizal fungi (AMF; phylum Glomeromycota), which are known to play a key role in terrestrial ecosystems and to be genetically highly variable within populations. Here we used highly degenerate primers on AMF genomic DNA to increase the sampling of fungal P-type II ATPases and to test previous predictions about their evolution. In parallel, homologous sequences of the P-type II ATPases have been used to determine the nature and amount of polymorphism that is present at these loci among isolates of Glomus intraradices harvested from the same field. In this study, four P-type II ATPase sub-families have been isolated from three AMF species. We show that, contrary to previous predictions, P-type IIC ATPases are present in all basal fungal taxa. Additionally, P-type IIE ATPases should no longer be considered as exclusive to the Ascomycota and the Basidiomycota, since we also demonstrate their presence in the Zygomycota. Finally, a comparison of homologous sequences encoding P-type IID ATPases showed unexpectedly that indel mutations among coding regions, as well as specific gene duplications, occur among AMF individuals within the same field. On the basis of these results we suggest that the diversification of P-type IIC and E ATPases followed the diversification of the extant fungal phyla with independent events of gene gains and losses.
Consistent with recent findings on the human genome, but at a much smaller geographic scale, we provided evidence that structural genomic changes, such as exonic indel mutations and gene duplications are less rare than previously thought and that these also occur within fungal populations.
Zhao, Xiaohong; Chen, Xiaogang; Zhao, Yuancun; Zhang, Shu; Gao, Zehua; Yang, Yiwen; Wang, Yufang; Zhang, Ji
2018-05-01
Insertion/deletion polymorphisms (indels), which combine the advantages of both short tandem repeats and single-nucleotide polymorphisms, are suitable for parentage testing. To overcome the limitations of the low polymorphism of di-allelic indels, we constructed a set of haplotypes with physically linked, multi-allelic indels. Candidate haplotypes were selected from the 1000 Genomes Project database, and were subject to the following criteria for inclusion: (i) each marker must have a minimum allele frequency (MAF) of ≥0.1 in the Han population of China; (ii) markers must exist in a non-coding region; (iii) the physical distance between a pair of candidate indels must be <500 bp; (iv) the allele length variation of each indel must range from 1 to 20 bp; (v) different haplotypes must be located on different chromosomes or chromosomal arms, or be more than 10 Mb apart if on the same chromosomal arm; and (vi) they must not be located across a recombination hotspot. A multiplex system with 11 haplotype markers, comprising 22 tri-allelic indel loci distributed over 10 chromosomes, was developed. To validate the multiplex panel, we investigated the haplotype distribution in sets of two- and three-generation pedigrees. The results demonstrated that the haplotypes consisting of multi-allelic indel markers exhibited higher polymorphism than a single indel locus, and thus provide supplementary information for forensic kinship identification. Copyright © 2018 Elsevier B.V. All rights reserved.
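Inclusion criteria (i) to (iv) above can be sketched as a simple filter over pairs of candidate indels. This is only an illustration under stated assumptions: the `Indel` record, its field names and the example values are hypothetical, and criteria (v) and (vi), which need panel-wide and recombination-map information, are left out.

```python
from dataclasses import dataclass

@dataclass
class Indel:
    chrom: str       # chromosome name
    pos: int         # position in bp
    maf: float       # allele frequency in the target population
    noncoding: bool  # True if the marker lies in a non-coding region
    min_len: int     # smallest allele length difference, in bp
    max_len: int     # largest allele length difference, in bp

def passes_marker_criteria(m):
    """Criteria (i), (ii) and (iv) for a single marker."""
    return m.maf >= 0.1 and m.noncoding and m.min_len >= 1 and m.max_len <= 20

def is_candidate_pair(a, b):
    """Criterion (iii): both markers qualify and lie less than 500 bp apart."""
    return (passes_marker_criteria(a) and passes_marker_criteria(b)
            and a.chrom == b.chrom and abs(a.pos - b.pos) < 500)

# Hypothetical markers (illustration only).
m1 = Indel("chr1", 1_000_100, 0.25, True, 1, 4)
m2 = Indel("chr1", 1_000_350, 0.18, True, 2, 6)
m3 = Indel("chr1", 1_002_000, 0.30, True, 1, 3)
print(is_candidate_pair(m1, m2), is_candidate_pair(m1, m3))
```

Here the first pair qualifies, while the second is rejected because the two markers are more than 500 bp apart.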
Zhang, Tongwu; Hu, Songnian; Zhang, Guangyu; Pan, Linlin; Zhang, Xiaowei; Al-Mssallem, Ibrahim S.; Yu, Jun
2012-01-01
Hassawi rice (Oryza sativa L.) is a landrace adapted to the climate of Saudi Arabia, characterized by its strong resistance to soil salinity and drought. Using high quality sequencing reads extracted from raw data of a whole genome sequencing project, we assembled both chloroplast (cp) and mitochondrial (mt) genomes of the wild-type Hassawi rice (Hassawi-1) and its dwarf hybrid (Hassawi-2). We discovered 16 InDels (insertions and deletions) but no SNP (single nucleotide polymorphism) is present between the two Hassawi cp genomes. We identified 48 InDels and 26 SNPs in the two Hassawi mt genomes and a new type of sequence variation, termed reverse complementary variation (RCV) in the rice cp genomes. There are two and four RCVs identified in Hassawi-1 when compared to 93–11 (indica) and Nipponbare (japonica), respectively. Microsatellite sequence analysis showed there are more SSRs in the genic regions of both cp and mt genomes in the Hassawi rice than in the other rice varieties. There are also large repeats in the Hassawi mt genomes, with the longest length of 96,168 bp and 96,165 bp in Hassawi-1 and Hassawi-2, respectively. We believe that frequent DNA rearrangement in the Hassawi mt and cp genomes indicate ongoing dynamic processes to reach genetic stability under strong environmental pressures. Based on sequence variation analysis and the breeding history, we suggest that both Hassawi-1 and Hassawi-2 originated from the Indonesian variety Peta since genetic diversity between the two Hassawi cultivars is very low albeit an unknown historic origin of the wild-type Hassawi rice. PMID:22870184
PyRAD: assembly of de novo RADseq loci for phylogenetic analyses.
Eaton, Deren A R
2014-07-01
Restriction-site-associated genomic markers are a powerful tool for investigating evolutionary questions at the population level, but are limited in their utility at deeper phylogenetic scales where fewer orthologous loci are typically recovered across disparate taxa. While this limitation stems in part from mutations to restriction recognition sites that disrupt data generation, an additional source of data loss comes from the failure to identify homology during bioinformatic analyses. Clustering methods that allow for lower similarity thresholds and the inclusion of indel variation will perform better at assembling RADseq loci at the phylogenetic scale. PyRAD is a pipeline to assemble de novo RADseq loci with the aim of optimizing coverage across phylogenetic datasets. It uses a wrapper around an alignment-clustering algorithm, which allows for indel variation within and between samples, as well as for incomplete overlap among reads (e.g. paired-end). Here I compare PyRAD with the program Stacks in their performance analyzing a simulated RADseq dataset that includes indel variation. Indels disrupt clustering of homologous loci in Stacks but not in PyRAD, such that the latter recovers more shared loci across disparate taxa. I show through reanalysis of an empirical RADseq dataset that indels are a common feature of such data, even at shallow phylogenetic scales. PyRAD uses parallel processing as well as an optional hierarchical clustering method, which allows it to rapidly assemble phylogenetic datasets with hundreds of sampled individuals. Software is written in Python and freely available at http://www.dereneaton.com/software/. © The Author 2014. Published by Oxford University Press. All rights reserved.
2013-01-01
Background Candida albicans is a ubiquitous opportunistic fungal pathogen that afflicts immunocompromised human hosts. With rare and transient exceptions the yeast is diploid, yet despite its clinical relevance the respective sequences of its two homologous chromosomes have not been completely resolved. Results We construct a phased diploid genome assembly by deep sequencing a standard laboratory wild-type strain and a panel of strains homozygous for particular chromosomes. The assembly has 700-fold coverage on average, allowing extensive revision and expansion of the number of known SNPs and indels. This phased genome significantly enhances the sensitivity and specificity of allele-specific expression measurements by enabling pooling and cross-validation of signal across multiple polymorphic sites. Additionally, the diploid assembly reveals pervasive and unexpected patterns in allelic differences between homologous chromosomes. Firstly, we see striking clustering of indels, concentrated primarily in the repeat sequences in promoters. Secondly, both indels and their repeat-sequence substrate are enriched near replication origins. Finally, we reveal an intimate link between repeat sequences and indels, which argues that repeat length is under selective pressure for most eukaryotes. This connection is described by a concise one-parameter model that explains repeat-sequence abundance in C. albicans as a function of the indel rate, and provides a general framework to interpret repeat abundance in species ranging from bacteria to humans. Conclusions The phased genome assembly and insights into repeat plasticity will be valuable for better understanding allele-specific phenomena and genome evolution. PMID:24025428
Comparative genomic analysis of Mycobacterium tuberculosis clinical isolates.
Liu, Fei; Hu, Yongfei; Wang, Qi; Li, Hong Min; Gao, George F; Liu, Cui Hua; Zhu, Baoli
2014-06-13
Due to excessive antibiotic use, drug-resistant Mycobacterium tuberculosis has become a serious public health threat and a major obstacle to disease control in many countries. To better understand the evolution of drug-resistant M. tuberculosis strains, we performed whole genome sequencing for 7 M. tuberculosis clinical isolates with different antibiotic resistance profiles and conducted comparative genomic analysis of gene variations among them. We observed that all 7 M. tuberculosis clinical isolates with different levels of drug resistance harbored similar numbers of SNPs, ranging from 1409 to 1464. The numbers of insertion/deletions (Indels) identified in the 7 isolates were also similar, ranging from 56 to 101. A total of 39 types of mutations were identified in drug resistance-associated loci, including 14 previously reported ones and 25 newly identified ones. Sixteen of the identified large Indels spanned PE-PPE-PGRS genes, which represent a major source of antigenic variability. Aside from SNPs and Indels, a CRISPR locus with varied spacers was observed in all 7 clinical isolates, suggesting that such loci might play an important role in plasticity of the M. tuberculosis genome. The nucleotide diversity (π value) and selection intensity (dN/dS value) of the whole genome sequences of the 7 isolates were similar. The dN/dS values were less than 1 for all 7 isolates (range from 0.608885 to 0.637365), supporting the notion that M. tuberculosis genomes undergo purifying selection. The π values and dN/dS values were comparable between drug-susceptible and drug-resistant strains. In this study, we show that clinical M. tuberculosis isolates exhibit distinct variations in terms of the distribution of SNPs, Indels and the CRISPR-cas locus, as well as the nucleotide diversity and selection intensity, but there are no generalizable differences between drug-susceptible and drug-resistant isolates on the genomic scale. 
Our study provides evidence strengthening the notion that the evolution of drug resistance among clinical M. tuberculosis isolates is clearly a complex and diversified process.
Association of insertion-deletions polymorphisms with colorectal cancer risk and clinical features.
Marques, Diego; Ferreira-Costa, Layse Raynara; Ferreira-Costa, Lorenna Larissa; Correa, Romualdo da Silva; Borges, Aline Maciel Pinheiro; Ito, Fernanda Ribeiro; Ramos, Carlos Cesar de Oliveira; Bortolin, Raul Hernandes; Luchessi, André Ducati; Ribeiro-Dos-Santos, Ândrea; Santos, Sidney; Silbiger, Vivian Nogueira
2017-10-07
To investigate the association between 16 insertion-deletion (INDEL) polymorphisms, colorectal cancer (CRC) risk and clinical features in an admixed population. One hundred and forty patients with CRC and 140 cancer-free subjects were examined. Genomic DNA was extracted from peripheral blood samples. Polymorphisms and genomic ancestry distribution were assayed by multiplex PCR, separated by capillary electrophoresis on the ABI 3130 Genetic Analyzer instrument and analyzed on GeneMapper ID v3.2. Clinicopathological data were obtained by consulting the patients' clinical charts, intra-operative documentation, and pathology scoring. Logistic regression analysis showed that polymorphism variations in the IL4 gene were associated with increased CRC risk, while those in the TYMS and UCP2 genes were associated with decreased risk. With regard to the anatomical localization of the tumor, the Del alleles of NFKB1 and CASP8 were associated with more colon-related incidents than rectosigmoid ones. In relation to the INDEL association with tumor node metastasis (TNM) stage risk, the Ins alleles of ACE, HLAG and TP53 (6 bp INDEL) were associated with higher TNM stage. Furthermore, regarding INDEL association with relapse risk, the Ins alleles of ACE, HLAG, and UGT1A1 were associated with early relapse risk, as was the Del allele of TYMS. Regarding INDEL association with death risk before 10 years, the Ins alleles of SGSM3 and UGT1A1 were associated with death risk. The INDEL variations in ACE, UCP2, TYMS, IL4, NFKB1, CASP8, TP53, HLAG, UGT1A1, and SGSM3 were associated with CRC risk and clinical features in an admixed population. These data suggest that this cancer panel might be useful as a complementary tool for better clinical management, and more studies need to be conducted to confirm these findings.
Edvardsen, Rolf B; Leininger, Sven; Kleppe, Lene; Skaftnesmo, Kai Ove; Wargelius, Anna
2014-01-01
Understanding the biological function behind key proteins is of great concern in Atlantic salmon, both due to a high commercial importance and an interesting life history. Until recently, functional studies in salmonids appeared to be difficult. However, the recent discovery of targeted mutagenesis using the CRISPR/Cas9 (clustered regularly interspaced palindromic repeats/CRISPR-associated) system enables performing functional studies in Atlantic salmon to a great extent. We used the CRISPR/Cas9 system to target two genes involved in pigmentation, tyrosinase (tyr) and solute carrier family 45, member 2 (slc45a2). Embryos were assayed for mutation rates at the 17 somite stage, where 40 and 22% of all injected embryos showed a high degree of mutation induction for slc45a2 and tyr, respectively. At hatching this mutation frequency was also visible for both targeted genes, displaying a graded phenotype ranging from complete lack of pigmentation to partial loss and normal pigmentation. CRISPRslc45a2/Cas9 injected embryos showing a complete lack of pigmentation or just a few spots of pigments also lacked wild type sequences when assaying more than 80 (slc45a2) sequence clones from whole embryos. This indicates that CRISPR/Cas9 can induce double-allelic knockout in the F0 generation. However, types and frequency of indels might affect the phenotype. Therefore, the variation of indels was assayed in the graded pigmentation phenotypes produced by CRISPR/Cas9-slc45a2. The results show a tendency for fewer types of indels formed in juveniles completely lacking pigmentation compared to juveniles displaying partial pigmentation. Another interesting observation was a high degree of the same indel type in different juveniles. This study shows for the first time successful use of the CRISPR/Cas9 technology in a marine cold water species. 
Targeted double-allelic mutations were obtained and, though the level of mosaicism has to be considered, we demonstrate that F0 fish can be used for functional studies in Atlantic salmon.
Shirasawa, Kenta; Hirakawa, Hideki; Nunome, Tsukasa; Tabata, Satoshi; Isobe, Sachiko
2016-01-01
Genome-wide mutations induced by ethyl methanesulfonate (EMS) and gamma irradiation in the tomato Micro-Tom genome were identified by a whole-genome shotgun sequencing analysis to estimate the spectrum and distribution of whole-genome DNA mutations and the frequency of deleterious mutations. A total of ~370 Gb of paired-end reads for four EMS-induced mutants and three gamma-ray-irradiated lines as well as a wild-type line were obtained by next-generation sequencing technology. Using bioinformatics analyses, we identified 5920 induced single nucleotide variations and insertion/deletion (indel) mutations. The predominant mutations in the EMS mutants were C/G to T/A transitions, while in the gamma-ray mutants, C/G to T/A transitions, A/T to T/A transversions, A/T to G/C transitions and deletion mutations were equally common. Biases in the base composition flanking mutations differed between the mutagenesis types. Regarding the effects of the mutations on gene function, >90% of the mutations were located in intergenic regions, and only 0.2% were deleterious. In addition, we detected 1,140,687 spontaneous single nucleotide polymorphisms and indel polymorphisms in wild-type Micro-Tom lines. We also found copy number variation, deletions and insertions of chromosomal segments in both the mutant and wild-type lines. The results provide helpful information not only for mutation research, but also for mutant screening methodology with reverse-genetic approaches. © 2015 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
PExFInS: An Integrative Post-GWAS Explorer for Functional Indels and SNPs
Cheng, Zhongshan; Chu, Hin; Fan, Yanhui; Li, Cun; Song, You-Qiang; Zhou, Jie; Yuen, Kwok-Yung
2015-01-01
Expression quantitative trait loci (eQTLs) mapping and linkage disequilibrium (LD) analysis have been widely employed to interpret findings of genome-wide association studies (GWAS). With the availability of deep sequencing data of 423 lymphoblastoid cell lines (LCLs) from six global populations and the microarray expression data, we performed eQTL analysis, identified more than 228 K SNP cis-eQTLs and 21 K indel cis-eQTLs and generated a LCL cis-eQTL database. We demonstrate that the percentages of population-shared and population-specific cis-eQTLs are comparable; while indel cis-eQTLs in the population-specific subsection make more contribution to gene expression variations than those in the population-shared subsection. We found cis-eQTLs, especially the population-shared cis-eQTLs are significantly enriched toward transcription start site. Moreover, the National Human Genome Research Institute cataloged GWAS SNPs are enriched for LCL cis-eQTLs. Specifically, 32.8% GWAS SNPs are LCL cis-eQTLs, among which 12.5% can be tagged by indel cis-eQTLs, suggesting the fundamental contribution of indel cis-eQTLs to GWAS association signals. To search for functional indels and SNPs tagging GWAS SNPs, a pipeline Post-GWAS Explorer for Functional Indels and SNPs (PExFInS) has been developed, integrating LD analysis, functional annotation from public databases, cis-eQTL mapping with our LCL cis-eQTL database and other published cis-eQTL datasets. PMID:26612672
Association of insertion-deletions polymorphisms with colorectal cancer risk and clinical features
Marques, Diego; Ferreira-Costa, Layse Raynara; Ferreira-Costa, Lorenna Larissa; Correa, Romualdo da Silva; Borges, Aline Maciel Pinheiro; Ito, Fernanda Ribeiro; Ramos, Carlos Cesar de Oliveira; Bortolin, Raul Hernandes; Luchessi, André Ducati; Ribeiro-dos-Santos, Ândrea; Santos, Sidney; Silbiger, Vivian Nogueira
2017-01-01
AIM To investigate the association between 16 insertion-deletion (INDEL) polymorphisms, colorectal cancer (CRC) risk and clinical features in an admixed population. METHODS One hundred and forty patients with CRC and 140 cancer-free subjects were examined. Genomic DNA was extracted from peripheral blood samples. Polymorphisms and genomic ancestry distribution were assayed by multiplex PCR, separated by capillary electrophoresis on the ABI 3130 Genetic Analyzer instrument and analyzed on GeneMapper ID v3.2. Clinicopathological data were obtained by consulting the patients’ clinical charts, intra-operative documentation, and pathology scoring. RESULTS Logistic regression analysis showed that polymorphism variations in the IL4 gene were associated with increased CRC risk, while those in the TYMS and UCP2 genes were associated with decreased risk. With regard to the anatomical localization of the tumor, the Del alleles of NFKB1 and CASP8 were associated with more colon-related incidents than rectosigmoid ones. In relation to the INDEL association with tumor node metastasis (TNM) stage risk, the Ins alleles of ACE, HLAG and TP53 (6 bp INDEL) were associated with higher TNM stage. Furthermore, regarding INDEL association with relapse risk, the Ins alleles of ACE, HLAG, and UGT1A1 were associated with early relapse risk, as was the Del allele of TYMS. Regarding INDEL association with death risk before 10 years, the Ins alleles of SGSM3 and UGT1A1 were associated with death risk. CONCLUSION The INDEL variations in ACE, UCP2, TYMS, IL4, NFKB1, CASP8, TP53, HLAG, UGT1A1, and SGSM3 were associated with CRC risk and clinical features in an admixed population. These data suggest that this cancer panel might be useful as a complementary tool for better clinical management, and more studies need to be conducted to confirm these findings. PMID:29085228
Jiang, Jianping; Gao, Yahui; Hou, Yali; Li, Wenhui; Zhang, Shengli; Zhang, Qin; Sun, Dongxiao
2016-01-01
The use of whole-genome resequencing to obtain more information on genetic variation could produce a range of benefits for the dairy cattle industry, especially with regard to increasing milk production and improving milk composition. In this study, we sequenced the genomes of eight Holstein bulls from four half- or full-sib families, with high and low estimated breeding values (EBVs) of milk protein percentage and fat percentage at an average effective depth of 10×, using Illumina sequencing. Over 0.9 million nonredundant short insertions and deletions (indels) [1-49 base pairs (bp)] were obtained. Among them, 3,625 indels that were polymorphic between the high and low groups of bulls were revealed and subjected to further analysis. The vast majority (76.67%) of these indels were novel. Follow-up validation assays confirmed that most (70%) of the randomly selected indels represented true variations. The indels that were polymorphic between the two groups were annotated based on the cattle genome sequence assembly (UMD3.1.69); as a result, nearly 1,137 of them were found to be located within 767 annotated genes, only 5 (0.138%) of which were located in exons. Then, by integrated analysis of the 767 genes with known quantitative trait loci (QTL); significant single-nucleotide polymorphisms (SNPs) previously identified by genome-wide association studies (GWASs) to be associated with bovine milk protein and fat traits; and the well-known pathways involved in protein, fat synthesis, and metabolism, we identified a total of 11 promising candidate genes potentially affecting milk composition traits. These were FCGR2B, CENPE, RETSAT, ACSBG2, NFKB2, TBC1D1, NLK, MAP3K1, SLC30A2, ANGPT1 and UGDH. Our findings provide a basis for further study and reveal key genes for milk composition traits in dairy cattle.
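The abstract above does not say which total the 0.138% exonic figure is a share of. A quick arithmetic check, using only the counts given in the abstract, suggests it is a share of the 3,625 polymorphic indels rather than of the 1,137 genic indels. This reading is an inference from the numbers, not something the authors state.

```python
# Counts quoted in the abstract.
polymorphic_indels = 3625   # indels polymorphic between high and low groups
genic_indels = 1137         # of those, located within annotated genes
exonic_indels = 5           # of those, located in exons

share_of_all = 100 * exonic_indels / polymorphic_indels
share_of_genic = 100 * exonic_indels / genic_indels

print(f"{share_of_all:.3f}% of all polymorphic indels")
print(f"{share_of_genic:.3f}% of genic indels")
```

Only the first ratio reproduces the reported 0.138%.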
Phylogenetic study of Class Armophorea (Alveolata, Ciliophora) based on 18S-rDNA data.
da Silva Paiva, Thiago; do Nascimento Borges, Bárbara; da Silva-Neto, Inácio Domingos
2013-12-01
The 18S rDNA phylogeny of Class Armophorea, a group of anaerobic ciliates, is proposed based on an analysis of 44 sequences (out of 195) retrieved from the NCBI/GenBank database. Emphasis was placed on the use of two nucleotide alignment criteria that involved variation in the gap-opening and gap-extension parameters and the use of rRNA secondary structure to orientate multiple-alignment. A sensitivity analysis of 76 data sets was run to assess the effect of variations in indel parameters on tree topologies. Bayesian inference, maximum likelihood and maximum parsimony phylogenetic analyses were used to explore how different analytic frameworks influenced the resulting hypotheses. A sensitivity analysis revealed that the relationships among higher taxa of the Intramacronucleata were dependent upon how indels were determined during multiple-alignment of nucleotides. The phylogenetic analyses rejected the monophyly of the Armophorea most of the time and consistently indicated that the Metopidae and Nyctotheridae were related to the Litostomatea. There was no consensus on the placement of the Caenomorphidae, which could be a sister group of the Metopidae + Nyctotheridae, or could have diverged at the base of the Spirotrichea branch or the Intramacronucleata tree.
Phylogenetic study of Class Armophorea (Alveolata, Ciliophora) based on 18S-rDNA data
da Silva Paiva, Thiago; do Nascimento Borges, Bárbara; da Silva-Neto, Inácio Domingos
2013-01-01
The 18S rDNA phylogeny of Class Armophorea, a group of anaerobic ciliates, is proposed based on an analysis of 44 sequences (out of 195) retrieved from the NCBI/GenBank database. Emphasis was placed on the use of two nucleotide alignment criteria that involved variation in the gap-opening and gap-extension parameters and the use of rRNA secondary structure to orientate multiple-alignment. A sensitivity analysis of 76 data sets was run to assess the effect of variations in indel parameters on tree topologies. Bayesian inference, maximum likelihood and maximum parsimony phylogenetic analyses were used to explore how different analytic frameworks influenced the resulting hypotheses. A sensitivity analysis revealed that the relationships among higher taxa of the Intramacronucleata were dependent upon how indels were determined during multiple-alignment of nucleotides. The phylogenetic analyses rejected the monophyly of the Armophorea most of the time and consistently indicated that the Metopidae and Nyctotheridae were related to the Litostomatea. There was no consensus on the placement of the Caenomorphidae, which could be a sister group of the Metopidae + Nyctotheridae, or could have diverged at the base of the Spirotrichea branch or the Intramacronucleata tree. PMID:24385862
Huszar, Tunde I; Jobling, Mark A; Wetton, Jon H
2018-04-12
Short tandem repeats on the male-specific region of the Y chromosome (Y-STRs) are permanently linked as haplotypes, and therefore Y-STR sequence diversity can be considered within the robust framework of a phylogeny of haplogroups defined by single nucleotide polymorphisms (SNPs). Here we use massively parallel sequencing (MPS) to analyse the 23 Y-STRs in Promega's prototype PowerSeq™ Auto/Mito/Y System kit (containing the markers of the PowerPlex® Y23 [PPY23] System) in a set of 100 diverse Y chromosomes whose phylogenetic relationships are known from previous megabase-scale resequencing. Including allele duplications and alleles resulting from likely somatic mutation, we characterised 2311 alleles, demonstrating 99.83% concordance with capillary electrophoresis (CE) data on the same sample set. The set contains 267 distinct sequence-based alleles (an increase of 58% compared to the 169 detectable by CE), including 60 novel Y-STR variants phased with their flanking sequences which have not been reported previously to our knowledge. Variation includes 46 distinct alleles containing non-reference variants of SNPs/indels in both repeat and flanking regions, and 145 distinct alleles containing repeat pattern variants (RPV). For DYS385a,b, DYS481 and DYS390 we observed repeat count variation in short flanking segments previously considered invariable, and suggest new MPS-based structural designations based on these. We considered the observed variation in the context of the Y phylogeny: several specific haplogroup associations were observed for SNPs and indels, reflecting the low mutation rates of such variant types; however, RPVs showed less phylogenetic coherence and more recurrence, reflecting their relatively high mutation rates. 
In conclusion, our study reveals considerable additional diversity at the Y-STRs of the PPY23 set via MPS analysis, demonstrates high concordance with CE data, facilitates nomenclature standardisation, and places Y-STR sequence variants in their phylogenetic context. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.
Doan, Ryan; Cohen, Noah D; Sawyer, Jason; Ghaffari, Noushin; Johnson, Charlie D; Dindot, Scott V
2012-02-17
The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.
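The sequencing figures above can be related through the usual definition of average coverage: total sequenced bases divided by genome length. The sketch below back-calculates the genome length implied by the two numbers in the abstract. It is a rough consistency check only, since the abstract does not say whether coverage was computed from all reads or only mapped reads.

```python
# Figures quoted in the abstract.
total_bases_gb = 59.6    # Gb of DNA sequence generated
mean_coverage = 24.7     # reported average sequence coverage (X)

# coverage = total bases / genome length  =>  genome length = total bases / coverage
implied_genome_gb = total_bases_gb / mean_coverage

print(f"Implied genome length: {implied_genome_gb:.2f} Gb")
```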
2009-01-01
Background Polymerase chain reaction (PCR) is very useful in many areas of molecular biology research. It is commonly observed that PCR success is critically dependent on design of an effective primer pair. Current tools for primer design do not adequately address the problem of PCR failure due to mis-priming on target-related sequences and structural variations in the genome. Methods We have developed an integrated graphical web-based application for primer design, called RExPrimer, which was written in Python. The software uses Primer3 as the core primer design algorithm. Locally stored sequence information and genomic variant information were hosted on MySQL v5.0 and were incorporated into RExPrimer. Results RExPrimer provides many functionalities for improved PCR primer design. Several databases, namely annotated human SNP databases, an insertion/deletion (indel) polymorphism database, a pseudogene database, and structural genomic variation databases, were integrated into RExPrimer, enabling effective validation of the resulting primers without leaving the website. By incorporating these databases, the primers reported by RExPrimer avoid mis-priming to related sequences (e.g. pseudogene, segmental duplication) as well as possible PCR failure because of structural polymorphisms (SNP, indel, and copy number variation (CNV)). To prevent mismatching caused by unexpected SNPs in the designed primers, in particular at the 3' end (SNP-in-Primer), several SNP databases covering a broad range of population-specific SNP information are used to report SNPs present in the primer sequences. Population-specific SNP information also helps customize primer design for a specific population. Furthermore, RExPrimer offers a user-friendly graphical interface that uses scalable vector graphics to present the resulting primers intuitively along with the corresponding gene structure. 
In this study, we demonstrated the program effectiveness in successfully generating primers for strong homologous sequences. Conclusion The improvements for primer design incorporated into RExPrimer were demonstrated to be effective in designing primers for challenging PCR experiments. Integration of SNP and structural variation databases allows for robust primer design for a variety of PCR applications, irrespective of the sequence complexity in the region of interest. This software is freely available at http://www4a.biotec.or.th/rexprimer. PMID:19958502
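The SNP-in-Primer check described above can be sketched as a simple interval test. This is a minimal illustration, not RExPrimer's implementation: the `Primer` type, the `snps_in_primer` function, the 5-base 3'-end window and all coordinates are hypothetical, and the sketch assumes a forward-strand primer with 1-based positions.

```python
from typing import List, NamedTuple, Tuple


class Primer(NamedTuple):
    start: int    # 1-based position of the primer's 5' end (forward strand assumed)
    length: int   # primer length in bases


def snps_in_primer(primer: Primer,
                   snp_positions: List[int],
                   three_prime_window: int = 5) -> Tuple[List[int], List[int]]:
    """Return (SNPs anywhere in the primer, SNPs within the 3'-end window)."""
    end = primer.start + primer.length - 1   # position of the 3'-most base
    inside = [p for p in snp_positions if primer.start <= p <= end]
    near_3p = [p for p in inside if end - p < three_prime_window]
    return inside, near_3p


# Hypothetical primer covering positions 1000..1019 and four known SNPs.
primer = Primer(start=1000, length=20)
inside, near_3p = snps_in_primer(primer, [990, 1003, 1017, 1025])

print("SNPs inside primer:", inside)
print("SNPs near 3' end:", near_3p)
```

SNPs near the 3' end are reported separately because, as the abstract notes, mismatches there are the main concern. A reverse-strand primer would need the window measured from the other end.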
Srivastava, Rishi; Singh, Mohar; Bajaj, Deepak; Parida, Swarup K.
2016-01-01
Development and large-scale genotyping of user-friendly informative genome/gene-derived InDel markers in natural and mapping populations is vital for accelerating genomics-assisted breeding applications of chickpea with minimal resource expenses. The present investigation employed a high-throughput whole genome next-generation resequencing strategy in low and high pod number parental accessions and homozygous individuals constituting the bulks from each of two inter-specific mapping populations [(Pusa 1103 × ILWC 46) and (Pusa 256 × ILWC 46)] to develop non-erroneous InDel markers at a genome-wide scale. Comparing these high-quality genomic sequences, 82,360 InDel markers with reference to kabuli genome and 13,891 InDel markers exhibiting differentiation between low and high pod number parental accessions and bulks of aforementioned mapping populations were developed. These informative markers were structurally and functionally annotated in diverse coding and non-coding sequence components of genome/genes of kabuli chickpea. The functional significance of regulatory and coding (frameshift and large-effect mutations) InDel markers for establishing marker-trait linkages through association/genetic mapping was apparent. The markers detected a greater amplification (97%) and intra-specific polymorphic potential (58–87%) among a diverse panel of cultivated desi, kabuli, and wild accessions even by using a simpler cost-efficient agarose gel-based assay implicating their utility in large-scale genetic analysis especially in domesticated chickpea with narrow genetic base. Two high-density inter-specific genetic linkage maps generated using aforesaid mapping populations were integrated to construct a consensus 1479 InDel markers-anchored high-resolution (inter-marker distance: 0.66 cM) genetic map for efficient molecular mapping of major QTLs governing pod number and seed yield per plant in chickpea. 
Using these high-density genetic maps as anchors, three major genomic regions harboring robust QTLs for each of pod number and seed yield (15–28% phenotypic variation explained) were identified on chromosomes 2, 4, and 6. Integrating the genetic and physical maps at these QTLs narrowed the long major QTL intervals to short, high-resolution physical intervals (0.89–2.94 Mb) for pod number and seed yield, which were validated in the multiple genetic backgrounds of the two chickpea mapping populations. The genome-wide InDel markers, including natural allelic variants and genomic loci/genes delineated at the six major QTLs, and especially at one novel colocalized QTL for both pod number and seed yield, mapped on a high-density consensus genetic map were found to be most promising in chickpea. These functionally relevant molecular tags can drive marker-assisted genetic enhancement to develop high-yielding cultivars with increased seed/pod number and yield in chickpea. PMID:27695461
Molecular spectrum of somaclonal variation in regenerated rice revealed by whole-genome sequencing.
Miyao, Akio; Nakagome, Mariko; Ohnuma, Takako; Yamagata, Harumi; Kanamori, Hiroyuki; Katayose, Yuichi; Takahashi, Akira; Matsumoto, Takashi; Hirochika, Hirohiko
2012-01-01
Somaclonal variation is a phenomenon that results in the phenotypic variation of plants regenerated from cell culture. One of the causes of somaclonal variation in rice is the transposition of retrotransposons. However, many aspects of the mechanisms that result in somaclonal variation remain undefined. To detect genome-wide changes in regenerated rice, we analyzed the whole-genome sequences of three plants independently regenerated from cultured cells originating from a single seed stock. Many single-nucleotide polymorphisms (SNPs) and insertions and deletions (indels) were detected in the genomes of the regenerated plants. The transposition of only Tos17 among 43 transposons examined was detected in the regenerated plants. Therefore, the SNPs and indels contribute to the somaclonal variation in regenerated rice in addition to the transposition of Tos17. The observed molecular spectrum was similar to that of the spontaneous mutations in Arabidopsis thaliana. However, the base change ratio was estimated to be 1.74 × 10⁻⁶ base substitutions per site per regeneration, which is 248-fold greater than the spontaneous mutation rate of A. thaliana.
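The 248-fold comparison above implies a baseline A. thaliana spontaneous mutation rate, which the abstract does not quote. The sketch below back-calculates it from the two figures given. The result is an inferred value, and its unit (presumably per site per generation) is an assumption.

```python
# Figures quoted in the abstract.
regeneration_rate = 1.74e-6   # base substitutions per site per regeneration
fold_increase = 248           # relative to the A. thaliana spontaneous rate

# Baseline rate implied by the comparison (not stated in the abstract).
implied_spontaneous_rate = regeneration_rate / fold_increase

print(f"Implied A. thaliana rate: {implied_spontaneous_rate:.2e} per site")
```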
A global reference for human genetic variation
2016-01-01
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. PMID:26432245
Altools: a user friendly NGS data analyser.
Camiolo, Salvatore; Sablok, Gaurav; Porceddu, Andrea
2016-02-17
Genotyping by re-sequencing has become a standard approach to estimate single nucleotide polymorphism (SNP) diversity, haplotype structure and the biodiversity and has been defined as an efficient approach to address geographical population genomics of several model species. To access core SNPs and insertion/deletion polymorphisms (indels), and to infer the phyletic patterns of speciation, most such approaches map short reads to the reference genome. Variant calling is important to establish patterns of genome-wide association studies (GWAS) for quantitative trait loci (QTLs), and to determine the population and haplotype structure based on SNPs, thus allowing content-dependent trait and evolutionary analysis. Several tools have been developed to investigate such polymorphisms as well as more complex genomic rearrangements such as copy number variations, presence/absence variations and large deletions. The programs available for this purpose have different strengths (e.g. accuracy, sensitivity and specificity) and weaknesses (e.g. low computation speed, complex installation procedure and absence of a user-friendly interface). Here we introduce Altools, a software package that is easy to install and use, which allows the precise detection of polymorphisms and structural variations. Altools uses the BWA/SAMtools/VarScan pipeline to call SNPs and indels, and the dnaCopy algorithm to achieve genome segmentation according to local coverage differences in order to identify copy number variations. It also uses insert size information from the alignment of paired-end reads and detects potential large deletions. A double mapping approach (BWA/BLASTn) identifies precise breakpoints while ensuring rapid elaboration. Finally, Altools implements several processes that yield deeper insight into the genes affected by the detected polymorphisms. 
Altools was used to analyse both simulated and real next-generation sequencing (NGS) data and performed satisfactorily in terms of positive predictive values, sensitivity, the identification of large deletion breakpoints and copy number detection. Altools is fast, reliable and easy to use for the mining of NGS data. The software package also attempts to link identified polymorphisms and structural variants to their biological functions thus providing more valuable information than similar tools.
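The insert-size idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the Altools implementation, which the abstract does not describe in detail; the function name, the threshold rule (library mean plus k standard deviations) and the example numbers are all assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_large_deletion_pairs(insert_sizes, library_sizes, k=3.0):
    """Return indices of read pairs whose mapped insert size is suspiciously large.

    insert_sizes  : mapped distances (bp) of the read pairs being screened
    library_sizes : insert sizes (bp) from pairs assumed to map normally,
                    used to estimate the library's mean and spread
    k             : hypothetical cutoff, in standard deviations above the mean
    """
    mu = mean(library_sizes)
    sd = stdev(library_sizes)
    threshold = mu + k * sd
    # A pair spanning a deletion in the sample maps further apart on the
    # reference than the library insert size would predict.
    return [i for i, size in enumerate(insert_sizes) if size > threshold]


# Hypothetical library with inserts around 500 bp.
library = [480, 500, 520, 490, 510]
candidates = flag_large_deletion_pairs([505, 2300, 495, 560], library)
```

A real pipeline would additionally cluster the flagged pairs by position and refine breakpoints, which is the role the abstract assigns to the BWA/BLASTn double mapping.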
Sequence data and association statistics from 12,940 type 2 diabetes cases and controls.
Flannick, Jason; Fuchsberger, Christian; Mahajan, Anubha; Teslovich, Tanya M; Agarwala, Vineeta; Gaulton, Kyle J; Caulkins, Lizz; Koesterer, Ryan; Ma, Clement; Moutsianas, Loukas; McCarthy, Davis J; Rivas, Manuel A; Perry, John R B; Sim, Xueling; Blackwell, Thomas W; Robertson, Neil R; Rayner, N William; Cingolani, Pablo; Locke, Adam E; Tajes, Juan Fernandez; Highland, Heather M; Dupuis, Josee; Chines, Peter S; Lindgren, Cecilia M; Hartl, Christopher; Jackson, Anne U; Chen, Han; Huyghe, Jeroen R; van de Bunt, Martijn; Pearson, Richard D; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M; Gamazon, Eric R; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A; Below, Jennifer E; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L; Pasko, Dorota; Parker, Stephen C J; Varga, Tibor V; Green, Todd; Beer, Nicola L; Day-Williams, Aaron G; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F; Han, Bok-Ghee; Jenkinson, Christopher P; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C Y; Palmer, Nicholette D; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D; Neale, Benjamin M; Purcell, Shaun; Butterworth, Adam S; Howson, Joanna M M; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K L; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H T; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E; Rybin, Dennis; Farook, Vidya S; Fowler, Sharon P; Freedman, Barry I; 
Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; Loh, Marie; Musani, Solomon K; Puppala, Sobha; Scott, William R; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C; Mangino, Massimo; Bonnycastle, Lori L; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L; Herder, Christian; Groves, Christopher J; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A; Doney, Alex S F; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeri; Hollensted, Mette; Jørgensen, Marit E; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H; Stirrups, Kathleen; Wood, Andrew R; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N A; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M; Syvänen, Ann-Christine; Bergman, Richard N; Bharadwaj, Dwaipayan; Bottinger, Erwin P; Cho, Yoon Shin; Chandak, Giriraj R; Chan, Juliana Cn; Chia, Kee Seng; Daly, Mark J; Ebrahim, Shah B; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A; Lehman, Donna M; Jia, Weiping; Ma, Ronald C W; Pollin, Toni I; 
Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J F; Small, Kerrin S; Ried, Janina S; DeFronzo, Ralph A; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R; Gloyn, Anna L; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D; Hattersley, Andrew T; Bowden, Donald W; Collins, Francis S; Atzmon, Gil; Chambers, John C; Spector, Timothy D; Laakso, Markku; Strom, Tim M; Bell, Graeme I; Blangero, John; Duggirala, Ravindranath; Tai, E Shyong; McVean, Gilean; Hanis, Craig L; Wilson, James G; Seielstad, Mark; Frayling, Timothy M; Meigs, James B; Cox, Nancy J; Sladek, Rob; Lander, Eric S; Gabriel, Stacey; Mohlke, Karen L; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Scott, Laura J; Morris, Andrew P; Kang, Hyun Min; Altshuler, David; Burtt, Noël P; Florez, Jose C; Boehnke, Michael; McCarthy, Mark I
2017-12-19
To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.
Sequence data and association statistics from 12,940 type 2 diabetes cases and controls
Jason, Flannick; Fuchsberger, Christian; Mahajan, Anubha; Teslovich, Tanya M.; Agarwala, Vineeta; Gaulton, Kyle J.; Caulkins, Lizz; Koesterer, Ryan; Ma, Clement; Moutsianas, Loukas; McCarthy, Davis J.; Rivas, Manuel A.; Perry, John R. B.; Sim, Xueling; Blackwell, Thomas W.; Robertson, Neil R.; Rayner, N William; Cingolani, Pablo; Locke, Adam E.; Tajes, Juan Fernandez; Highland, Heather M.; Dupuis, Josee; Chines, Peter S.; Lindgren, Cecilia M.; Hartl, Christopher; Jackson, Anne U.; Chen, Han; Huyghe, Jeroen R.; van de Bunt, Martijn; Pearson, Richard D.; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M.; Gamazon, Eric R.; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A.; Below, Jennifer E.; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L.; Pasko, Dorota; Parker, Stephen C. J.; Varga, Tibor V.; Green, Todd; Beer, Nicola L.; Day-Williams, Aaron G.; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J.; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P.; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F.; Han, Bok-Ghee; Jenkinson, Christopher P.; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C. Y.; Palmer, Nicholette D.; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E.; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D.; Neale, Benjamin M.; Purcell, Shaun; Butterworth, Adam S.; Howson, Joanna M. M.; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K. L.; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H. 
T.; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E.; Rybin, Dennis; Farook, Vidya S.; Fowler, Sharon P.; Freedman, Barry I.; Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J.; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; Loh, Marie; Musani, Solomon K.; Puppala, Sobha; Scott, William R.; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A.; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C.; Mangino, Massimo; Bonnycastle, Lori L.; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L.; Herder, Christian; Groves, Christopher J.; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A.; Doney, Alex S. F.; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J.; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeri; Hollensted, Mette; Jørgensen, Marit E.; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H.; Stirrups, Kathleen; Wood, Andrew R.; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O.; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P.; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B.; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N. 
A.; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M.; Syvänen, Ann-Christine; Bergman, Richard N.; Bharadwaj, Dwaipayan; Bottinger, Erwin P.; Cho, Yoon Shin; Chandak, Giriraj R.; Chan, Juliana CN; Chia, Kee Seng; Daly, Mark J.; Ebrahim, Shah B.; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A.; Lehman, Donna M.; Jia, Weiping; Ma, Ronald C. W.; Pollin, Toni I.; Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J. F.; Small, Kerrin S.; Ried, Janina S.; DeFronzo, Ralph A.; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J.; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W.; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R.; Gloyn, Anna L.; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D.; Hattersley, Andrew T.; Bowden, Donald W.; Collins, Francis S.; Atzmon, Gil; Chambers, John C.; Spector, Timothy D.; Laakso, Markku; Strom, Tim M.; Bell, Graeme I.; Blangero, John; Duggirala, Ravindranath; Tai, E. Shyong; McVean, Gilean; Hanis, Craig L.; Wilson, James G.; Seielstad, Mark; Frayling, Timothy M.; Meigs, James B.; Cox, Nancy J.; Sladek, Rob; Lander, Eric S.; Gabriel, Stacey; Mohlke, Karen L.; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Scott, Laura J.; Morris, Andrew P.; Kang, Hyun Min; Altshuler, David; Burtt, Noël P.; Florez, Jose C.; Boehnke, Michael; McCarthy, Mark I.
2017-01-01
To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1–5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D. PMID:29257133
Martínez-Cortés, G; Gusmão, L; Pereira, R; Salcido, V H; Favela-Mendoza, A F; Muñoz-Valle, J F; Inclán-Sánchez, A; López-Hernández, L B; Rangel-Villalobos, H
2015-07-01
Insertion-deletions for human identification purposes (HID-Indels) offer advantages in resolving particular forensic situations and complex paternity cases. In Mexico, the admixed population known as Mestizos is the largest (∼90%), alongside a number of Amerindian groups (∼10%), which have not been studied with HID-Indels. For this reason, allele frequencies and forensic parameters for 38 HID-Indels were estimated in 531 unrelated individuals from one Amerindian (Purépecha) and seven Mestizo populations from different regions of the country. Genotype distribution was in agreement with Hardy-Weinberg expectations in almost all loci/populations. The linkage disequilibrium (LD) test did not reveal associations between loci pairs in any of the eight Mexican populations. The combined power of discrimination was high in all populations (PD >99.99999999998%). However, the power of exclusion of the 38 HID-Indel system (PE >99.6863%) was lower than that of most autosomal STR kits. The assessment of genetic structure (AMOVA) and relationships between populations (FST) demonstrated significant differences among Mexican populations, mainly involving the Purépecha Amerindian group. Among Mexican-Mestizos, three population clusters consistent with geography were defined: (i) North-West region: Chihuahua, Sinaloa, and Jalisco; (ii) Central-Southern region: Mexico City, Veracruz and Yucatan; (iii) South region: Chiapas. In brief, this report validates the inclusion of the 38 HID-Indel system in forensic casework and paternity cases in seven Mexican-Mestizo populations from different regions, and in one Mexican Amerindian group. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
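Agreement with Hardy-Weinberg expectations at a diallelic indel locus can be checked with a simple goodness-of-fit statistic. The sketch below is an illustration only: the abstract does not say which test the authors used (an exact test is also common), and the function name and genotype counts are hypothetical.

```python
def hwe_chi_square(n_ins_ins, n_ins_del, n_del_del):
    """Chi-square statistic for Hardy-Weinberg fit at one diallelic locus.

    Arguments are observed genotype counts: insertion homozygotes,
    heterozygotes and deletion homozygotes.
    """
    n = n_ins_ins + n_ins_del + n_del_del
    p = (2 * n_ins_ins + n_ins_del) / (2 * n)  # insertion allele frequency
    q = 1.0 - p                                # deletion allele frequency
    observed = (n_ins_ins, n_ins_del, n_del_del)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical counts: 30 / 40 / 30 gives p = q = 0.5 and expected 25 / 50 / 25.
stat = hwe_chi_square(30, 40, 30)
```

With one degree of freedom, a statistic above 3.841 corresponds to p < 0.05 before any correction for multiple testing across loci and populations.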
Mercenaro, Luca; Nieddu, Giovanni; Porceddu, Andrea; Pezzotti, Mario; Camiolo, Salvatore
2017-01-01
The genetic diversity among grapevine (Vitis vinifera L.) cultivars that underlies differences in agronomic performance and wine quality reflects the accumulation of single nucleotide polymorphisms (SNPs) and small indels as well as larger genomic variations. A combination of high throughput sequencing and mapping against the grapevine reference genome allows the creation of comprehensive sequence variation maps. We used next generation sequencing and bioinformatics to generate an inventory of SNPs and small indels in four widely cultivated Sardinian grape cultivars (Bovale sardo, Cannonau, Carignano and Vermentino). More than 3,200,000 SNPs were identified with high statistical confidence. Some of the SNPs caused the appearance of premature stop codons and thus identified putative pseudogenes. The analysis of SNP distribution along chromosomes led to the identification of large genomic regions with uninterrupted series of homozygous SNPs. We used a digital comparative genomic hybridization approach to identify 6526 genomic regions with significant differences in copy number among the four cultivars compared to the reference sequence, including 81 regions shared between all four cultivars and 4953 specific to single cultivars (representing 1.2 and 75.9% of total copy number variation, respectively). Reads mapping at a distance that was not compatible with the insert size were used to identify a dataset of putative large deletions with cultivar Cannonau revealing the highest number. The analysis of genes mapping to these regions provided a list of candidates that may explain some of the phenotypic differences among the Bovale sardo, Cannonau, Carignano and Vermentino cultivars. PMID:28775732
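The two percentages quoted for the copy number regions follow directly from the counts in the abstract; a short arithmetic check (the helper name is only illustrative):

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


total_regions = 6526       # regions with significant copy number differences
shared_by_all_four = 81    # regions shared between all four cultivars
cultivar_specific = 4953   # regions specific to a single cultivar

shared_pct = percent(shared_by_all_four, total_regions)    # 1.2
specific_pct = percent(cultivar_specific, total_regions)   # 75.9
```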
Gupta, R S
1998-12-01
The presence of shared conserved insertion or deletions (indels) in protein sequences is a special type of signature sequence that shows considerable promise for phylogenetic inference. An alternative model of microbial evolution based on the use of indels of conserved proteins and the morphological features of prokaryotic organisms is proposed. In this model, extant archaebacteria and gram-positive bacteria, which have a simple, single-layered cell wall structure, are termed monoderm prokaryotes. They are believed to be descended from the most primitive organisms. Evidence from indels supports the view that the archaebacteria probably evolved from gram-positive bacteria, and I suggest that this evolution occurred in response to antibiotic selection pressures. Evidence is presented that diderm prokaryotes (i.e., gram-negative bacteria), which have a bilayered cell wall, are derived from monoderm prokaryotes. Signature sequences in different proteins provide a means to define a number of different taxa within prokaryotes (namely, low G+C and high G+C gram-positive, Deinococcus-Thermus, cyanobacteria, chlamydia-cytophaga related, and two different groups of Proteobacteria) and to indicate how they evolved from a common ancestor. Based on phylogenetic information from indels in different protein sequences, it is hypothesized that all eukaryotes, including amitochondriate and aplastidic organisms, received major gene contributions from both an archaebacterium and a gram-negative eubacterium. In this model, the ancestral eukaryotic cell is a chimera that resulted from a unique fusion event between the two separate groups of prokaryotes followed by integration of their genomes.
Gupta, Radhey S.
1998-01-01
The presence of shared conserved insertion or deletions (indels) in protein sequences is a special type of signature sequence that shows considerable promise for phylogenetic inference. An alternative model of microbial evolution based on the use of indels of conserved proteins and the morphological features of prokaryotic organisms is proposed. In this model, extant archaebacteria and gram-positive bacteria, which have a simple, single-layered cell wall structure, are termed monoderm prokaryotes. They are believed to be descended from the most primitive organisms. Evidence from indels supports the view that the archaebacteria probably evolved from gram-positive bacteria, and I suggest that this evolution occurred in response to antibiotic selection pressures. Evidence is presented that diderm prokaryotes (i.e., gram-negative bacteria), which have a bilayered cell wall, are derived from monoderm prokaryotes. Signature sequences in different proteins provide a means to define a number of different taxa within prokaryotes (namely, low G+C and high G+C gram-positive, Deinococcus-Thermus, cyanobacteria, chlamydia-cytophaga related, and two different groups of Proteobacteria) and to indicate how they evolved from a common ancestor. Based on phylogenetic information from indels in different protein sequences, it is hypothesized that all eukaryotes, including amitochondriate and aplastidic organisms, received major gene contributions from both an archaebacterium and a gram-negative eubacterium. In this model, the ancestral eukaryotic cell is a chimera that resulted from a unique fusion event between the two separate groups of prokaryotes followed by integration of their genomes. PMID:9841678
Liu, Wen; Ghouri, Fozia; Yu, Hang; Li, Xiang; Yu, Shuhong; Shahid, Muhammad Qasim; Liu, Xiangdong
2017-01-01
Common wild rice (Oryza rufipogon Griff.) is an important germplasm for rice breeding, which contains many resistance genes. Re-sequencing provides an unprecedented opportunity to explore the abundant useful genes at whole genome level. Here, we identified the nucleotide-binding site leucine-rich repeat (NBS-LRR) encoding genes by re-sequencing of two wild rice lines (i.e. Huaye 1 and Huaye 2) that were developed from common wild rice. We obtained 128 to 147 million reads with approximately 32.5-fold coverage depth, and uniquely covered more than 89.6% (≥1-fold) of the reference genomes. The two wild rice lines showed a high rate of SNP (single-nucleotide polymorphism) variation across the 12 chromosomes against the reference genomes of Nipponbare (japonica cultivar) and 93-11 (indica cultivar). The count-length distribution of InDels (insertion/deletion polymorphisms) followed a normal distribution in the two lines, and most InDels ranged from -5 to 5 bp. With reference to the Nipponbare genome sequence, we detected a total of 1,209,308 SNPs, 161,117 InDels and 4,192 SVs (structural variations) in Huaye 1, and 1,387,959 SNPs, 180,226 InDels and 5,305 SVs in Huaye 2. A total of 44.9% and 46.9% of genes exhibited sequence variations in the two wild rice lines compared to the Nipponbare and 93-11 reference genomes, respectively. Analysis of NBS-LRR mutant candidate genes showed that they were mainly distributed on chromosome 11, and that the NBS domain was more conserved than the LRR domain in both wild rice lines. NBS genes showed higher levels of genetic diversity in Huaye 1 than in Huaye 2. Furthermore, protein-protein interaction analysis showed that NBS genes mostly interacted with the cytochrome C protein (Os05g0420600, Os01g0885000 and BGIOSGA038922), while some NBS genes interacted with heat shock protein, DNA-binding activity, Phosphoinositide 3-kinase and a coiled coil region.
We explored abundant NBS-LRR encoding genes in two common wild rice lines through genome wide re-sequencing, which proved to be a useful tool to exploit elite NBS-LRR genes in wild rice. The data here provide a foundation for future work aimed at dissecting the genetic basis of disease resistance in rice, and the two wild rice lines will be useful germplasm for the molecular improvement of cultivated rice.
Yu, Hang; Li, Xiang; Yu, Shuhong; Shahid, Muhammad Qasim
2017-01-01
Common wild rice (Oryza rufipogon Griff.) is an important germplasm for rice breeding, which contains many resistance genes. Re-sequencing provides an unprecedented opportunity to explore the abundant useful genes at whole genome level. Here, we identified the nucleotide-binding site leucine-rich repeat (NBS-LRR) encoding genes by re-sequencing of two wild rice lines (i.e. Huaye 1 and Huaye 2) that were developed from common wild rice. We obtained 128 to 147 million reads with approximately 32.5-fold coverage depth, and uniquely covered more than 89.6% (≥1-fold) of the reference genomes. The two wild rice lines showed a high rate of SNP (single-nucleotide polymorphism) variation across the 12 chromosomes against the reference genomes of Nipponbare (japonica cultivar) and 93–11 (indica cultivar). The count-length distribution of InDels (insertion/deletion polymorphisms) followed a normal distribution in the two lines, and most InDels ranged from -5 to 5 bp. With reference to the Nipponbare genome sequence, we detected a total of 1,209,308 SNPs, 161,117 InDels and 4,192 SVs (structural variations) in Huaye 1, and 1,387,959 SNPs, 180,226 InDels and 5,305 SVs in Huaye 2. A total of 44.9% and 46.9% of genes exhibited sequence variations in the two wild rice lines compared to the Nipponbare and 93–11 reference genomes, respectively. Analysis of NBS-LRR mutant candidate genes showed that they were mainly distributed on chromosome 11, and that the NBS domain was more conserved than the LRR domain in both wild rice lines. NBS genes showed higher levels of genetic diversity in Huaye 1 than in Huaye 2. Furthermore, protein-protein interaction analysis showed that NBS genes mostly interacted with the cytochrome C protein (Os05g0420600, Os01g0885000 and BGIOSGA038922), while some NBS genes interacted with heat shock protein, DNA-binding activity, Phosphoinositide 3-kinase and a coiled coil region.
We explored abundant NBS-LRR encoding genes in two common wild rice lines through genome wide re-sequencing, which proved to be a useful tool to exploit elite NBS-LRR genes in wild rice. The data here provide a foundation for future work aimed at dissecting the genetic basis of disease resistance in rice, and the two wild rice lines will be useful germplasm for the molecular improvement of cultivated rice. PMID:28700714
Glaser-Schmitt, Amanda; Duchen, Pablo; Parsch, John
2016-01-01
Insertions and deletions (indels) are a major source of genetic variation within species and may result in functional changes to coding or regulatory sequences. In this study we report that an indel polymorphism in the 3' untranslated region (UTR) of the metallothionein gene MtnA is associated with gene expression variation in natural populations of Drosophila melanogaster. A derived allele of MtnA with a 49-bp deletion in the 3' UTR segregates at high frequency in populations outside of sub-Saharan Africa. The frequency of the deletion increases with latitude across multiple continents and approaches 100% in northern Europe. Flies with the deletion have more than 4-fold higher MtnA expression than flies with the ancestral sequence. Using reporter gene constructs in transgenic flies, we show that the 3' UTR deletion significantly contributes to the observed expression difference. Population genetic analyses uncovered signatures of a selective sweep in the MtnA region within populations from northern Europe. We also find that the 3' UTR deletion is associated with increased oxidative stress tolerance. These results suggest that the 3' UTR deletion has been a target of selection for its ability to confer increased levels of MtnA expression in northern European populations, likely due to a local adaptive advantage of increased oxidative stress tolerance. PMID:27120580
Liu, Siyang; Huang, Shujia; Rao, Junhua; Ye, Weijian; Krogh, Anders; Wang, Jun
2015-01-01
Comprehensive recognition of genomic variation in one individual is important for understanding disease and developing personalized medication and treatment. Many tools based on DNA re-sequencing exist for identification of single nucleotide polymorphisms, small insertions and deletions (indels) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction of population-scale pan-genomes. Our study also highlights the usefulness of the de novo assembly strategy for definition of genome structure.
Forensic efficiency and genetic variation of 30 InDels in Vietnamese and Nigerian populations.
Du, Weian; Peng, Zhiyong; Feng, Chunlei; Zhu, Bofeng; Wang, Bangchao; Wang, Yue; Liu, Chao; Chen, Ling
2017-10-24
Insertion/deletion polymorphisms (InDels) are ubiquitous diallelic genetic markers that have drawn increasing attention from forensic researchers. Here, we investigated 30 InDel loci in Vietnamese and Nigerian populations and evaluated their usefulness in forensic genetics. The polymorphic information content of these populations ranged, respectively, from 0.164 to 0.375 and from 0.090 to 0.375 across loci. After Bonferroni correction, no significant deviation from Hardy-Weinberg equilibrium was found, except for HLD97 in the Nigerian population. The cumulative power of exclusion for all 30 loci in the Vietnamese and Nigerian populations was 0.9870 and 0.9676, respectively, indicating that this InDel set is not suitable for paternity testing in these populations, but could be included as a supplement. For the Vietnamese and the Nigerian populations, the mean observed heterozygosity was 0.5917 and 0.6268, and the combined discrimination power of the 30 loci was 0.9999999999767 and 0.9999999999603, respectively. These findings indicated that these InDels may be suitable for personal forensic identification in the studied populations. The results of DA distance, phylogenetic tree, principal component, and cluster analyses were consistent and indicated a clear pattern of regional distribution. Moreover, the Vietnamese population was shown to have close genetic relationships with the Guangdong Han and Shanghai Han populations.
Forensic efficiency and genetic variation of 30 InDels in Vietnamese and Nigerian populations
Du, Weian; Peng, Zhiyong; Feng, Chunlei; Zhu, Bofeng; Wang, Bangchao; Wang, Yue; Liu, Chao; Chen, Ling
2017-01-01
Insertion/deletion polymorphisms (InDels) are ubiquitous diallelic genetic markers that have drawn increasing attention from forensic researchers. Here, we investigated 30 InDel loci in Vietnamese and Nigerian populations and evaluated their usefulness in forensic genetics. The polymorphic information content of these populations ranged, respectively, from 0.164 to 0.375 and from 0.090 to 0.375 across loci. After Bonferroni correction, no significant deviation from Hardy-Weinberg equilibrium was found, except for HLD97 in the Nigerian population. The cumulative power of exclusion for all 30 loci in the Vietnamese and Nigerian populations was 0.9870 and 0.9676, respectively, indicating that this InDel set is not suitable for paternity testing in these populations, but could be included as a supplement. For the Vietnamese and the Nigerian populations, the mean observed heterozygosity was 0.5917 and 0.6268, and the combined discrimination power of the 30 loci was 0.9999999999767 and 0.9999999999603, respectively. These findings indicated that these InDels may be suitable for personal forensic identification in the studied populations. The results of DA distance, phylogenetic tree, principal component, and cluster analyses were consistent and indicated a clear pattern of regional distribution. Moreover, the Vietnamese population was shown to have close genetic relationships with the Guangdong Han and Shanghai Han populations. PMID:29179488
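The upper bound of 0.375 reported for the polymorphic information content is what the standard PIC formula gives for a diallelic marker with equal allele frequencies. The sketch below uses the textbook formulas for PIC, per-locus power of discrimination and their combination across independent loci. It is not the authors' code, the function names are illustrative, and the per-locus PD here assumes Hardy-Weinberg genotype proportions, whereas a study would normally use observed genotype frequencies.

```python
import math


def pic_biallelic(p):
    """Polymorphic information content for a two-allele locus.

    PIC = 1 - (p^2 + q^2) - 2 p^2 q^2, with q = 1 - p.
    """
    q = 1.0 - p
    return 1.0 - (p * p + q * q) - 2.0 * p * p * q * q


def pd_biallelic_hwe(p):
    """Power of discrimination, 1 - sum of squared genotype frequencies,
    with genotype frequencies taken from Hardy-Weinberg proportions."""
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    return 1.0 - sum(g * g for g in genotype_freqs)


def combined_pd(per_locus_pd):
    """Combined power of discrimination over independent loci:
    1 - product of (1 - PD_i)."""
    return 1.0 - math.prod(1.0 - pd for pd in per_locus_pd)


max_pic = pic_biallelic(0.5)                        # 0.375
best_case = combined_pd([pd_biallelic_hwe(0.5)] * 30)
```

Because each diallelic locus is capped at PIC = 0.375 and PD = 0.625 under these assumptions, a panel needs many loci before the combined value approaches 1, which is why such sets are reported as 30- or 38-locus systems.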
Stafuzza, Nedenia Bonvino; Zerlotini, Adhemar; Lobo, Francisco Pereira; Yamagishi, Michel Eduardo Beleza; Chud, Tatiane Cristina Seleguim; Caetano, Alexandre Rodrigues; Munari, Danísio Prado; Garrick, Dorian J; Machado, Marco Antonio; Martins, Marta Fonseca; Carvalho, Maria Raquel; Cole, John Bruce; Barbosa da Silva, Marcos Vinicius Gualberto
2017-01-01
Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer generated 10.7- to 16.4-fold genome coverage for each animal. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. The annotation of variants identified numerous non-synonymous SNVs and frameshift InDels which could affect phenotypic variation. Functional enrichment analysis revealed that variants in the olfactory transduction pathway were overrepresented in all four cattle breeds, while the ECM-receptor interaction pathway was overrepresented in the Girolando and Guzerat breeds, the ABC transporters pathway was overrepresented only in the Holstein breed, and the metabolic pathways were overrepresented only in the Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers and their associated molecular mechanisms that impact economically important traits for Gyr, Girolando, Guzerat and Holstein breeding programs. PMID:28323836
Somatic Genetic Variation in Solid Pseudopapillary Tumor of the Pancreas by Whole Exome Sequencing
Guo, Meng; Luo, Guopei; Jin, Kaizhou; Long, Jiang; Cheng, He; Lu, Yu; Wang, Zhengshi; Yang, Chao; Xu, Jin; Ni, Quanxing; Yu, Xianjun; Liu, Chen
2017-01-01
Solid pseudopapillary tumor of the pancreas (SPT) is a rare pancreatic disease with a unique clinical manifestation. Although CTNNB1 gene mutations had been universally reported, genetic variation profiles of SPT are largely unidentified. We conducted whole exome sequencing in nine SPT patients to probe the SPT-specific insertions and deletions (indels) and single nucleotide polymorphisms (SNPs). In total, 54 SNPs and 41 indels of prominent variations were demonstrated through parallel exome sequencing. We detected that CTNNB1 mutations presented throughout all patients studied (100%), and a higher count of SNPs was particularly detected in patients with older age, larger tumor, and metastatic disease. By aggregating 95 detected variation events and viewing the interconnections among each of the genes with variations, CTNNB1 was identified as the core portion in the network, which might collaborate with other events such as variations of USP9X, EP400, HTT, MED12, and PKD1 to regulate tumorigenesis. Pathway analysis showed that the events involved in other cancers had the potential to influence the progression of the SNPs count. Our study revealed an insight into the variation of the gene encoding region underlying solid-pseudopapillary neoplasm tumorigenesis. The detection of these variations might partly reflect the potential molecular mechanism. PMID:28054945
Xie, Tong; Guo, Yuxin; Chen, Ling; Fang, Yating; Tai, Yunchun; Zhou, Yongsong; Qiu, Pingming; Zhu, Bofeng
2018-07-01
In recent years, insertion/deletion (InDel) markers have become a promising and useful supporting tool in forensic identification cases and in biogeographic research. In this study, 30 InDel loci were explored to reveal the genetic diversity of the Chinese Xinjiang Hui group and its genetic relationships with 25 previously reported populations using various biostatistical methods such as forensic statistical parameter analysis, phylogenetic reconstruction, multi-dimensional scaling, principal component analysis, and STRUCTURE analysis. No deviation from Hardy-Weinberg equilibrium was found at any of the 30 loci in the Chinese Xinjiang Hui group. The observed heterozygosity and expected heterozygosity ranged from 0.1971 (HLD118) to 0.5092 (HLD92) and from 0.2222 (HLD118) to 0.5000 (HLD6), respectively. The cumulative probability of exclusion and combined power of discrimination were 0.988849 and 0.99999999999378, respectively, which indicated that these 30 loci qualify for personal identification and can be used as complementary genetic markers for paternity tests in forensic cases. The results of the present research, based on different methods of population genetic analysis, revealed that the Chinese Xinjiang Hui group had close relationships with most Chinese groups, especially Han populations. Nevertheless, for a better understanding of the genetic background of the Chinese Xinjiang Hui group, more molecular genetic markers such as ancestry informative markers, single nucleotide polymorphisms (SNPs), and copy number variations will be examined in future studies. Copyright © 2018 Elsevier B.V. All rights reserved.
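As an illustration of the heterozygosity and Hardy-Weinberg figures above, the following sketch computes observed and expected heterozygosity and a chi-square goodness-of-fit statistic from genotype counts at one diallelic locus. It is a minimal sketch under stated assumptions: the abstract does not say which test the authors used (exact tests are common in practice), the expected heterozygosity here is the plain 2pq without small-sample correction, and the function name and genotype counts are hypothetical.

```python
def hwe_summary(n_ii, n_id, n_dd):
    """Summarise one diallelic InDel locus from genotype counts.

    n_ii, n_id, n_dd: counts of insertion/insertion, insertion/deletion
    and deletion/deletion genotypes (hypothetical inputs).
    """
    n = n_ii + n_id + n_dd
    p = (2 * n_ii + n_id) / (2 * n)  # frequency of the insertion allele
    q = 1.0 - p
    observed = (n_ii, n_id, n_dd)
    expected = (p * p * n, 2 * p * q * n, q * q * n)  # Hardy-Weinberg counts
    # Pearson chi-square statistic (1 degree of freedom for two alleles).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return {"p_ins": p, "Ho": n_id / n, "He": 2 * p * q, "chi2": chi2}


if __name__ == "__main__":
    # Counts that sit exactly on Hardy-Weinberg proportions at p = 0.5.
    print(hwe_summary(25, 50, 25))
```

For a two-allele locus the expected heterozygosity 2pq cannot exceed 0.5, which is consistent with the maximum of 0.5000 reported for HLD6.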
Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees.
Mahmud, Md Pavel; Wiedenhoeft, John; Schliep, Alexander
2012-09-15
Mapping billions of reads from next generation sequencing experiments to reference genomes is a crucial task, which can require hundreds of hours of running time on a single CPU even for the fastest known implementations. Traditional approaches have difficulties dealing with matches of large edit distance, particularly in the presence of frequent or large insertions and deletions (indels). This is a serious obstacle both in determining the spectrum and abundance of genetic variations and in personal genomics. For the first time, we adopt the approximate string matching paradigm of geometric embedding to read mapping, thus rephrasing it to nearest neighbor queries in a q-gram frequency vector space. Using the L1 distance between frequency vectors has the benefit of providing lower bounds for an edit distance with affine gap costs. Using a cache-oblivious kd-tree, we achieve running times that match the state of the art. Additionally, running time and memory requirements are about constant for read lengths between 100 and 1000 bp. We provide a first proof-of-concept that geometric embedding is a promising paradigm for read mapping and that L1 distance might serve to detect structural variations. TreQ, our initial implementation of that concept, performs more accurately than many popular read mappers over a wide range of structural variants. TreQ will be released under the GNU Public License (GPL), and precomputed genome indices will be provided for download at http://treq.sf.net. Contact: pavelm@cs.rutgers.edu. Supplementary data are available at Bioinformatics online.
Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees
Mahmud, Md Pavel; Wiedenhoeft, John; Schliep, Alexander
2012-01-01
Motivation: Mapping billions of reads from next generation sequencing experiments to reference genomes is a crucial task, which can require hundreds of hours of running time on a single CPU even for the fastest known implementations. Traditional approaches have difficulties dealing with matches of large edit distance, particularly in the presence of frequent or large insertions and deletions (indels). This is a serious obstacle both in determining the spectrum and abundance of genetic variations and in personal genomics. Results: For the first time, we adopt the approximate string matching paradigm of geometric embedding to read mapping, thus rephrasing it to nearest neighbor queries in a q-gram frequency vector space. Using the L1 distance between frequency vectors has the benefit of providing lower bounds for an edit distance with affine gap costs. Using a cache-oblivious kd-tree, we achieve running times that match the state of the art. Additionally, running time and memory requirements are about constant for read lengths between 100 and 1000 bp. We provide a first proof-of-concept that geometric embedding is a promising paradigm for read mapping and that L1 distance might serve to detect structural variations. TreQ, our initial implementation of that concept, performs more accurately than many popular read mappers over a wide range of structural variants. Availability and implementation: TreQ will be released under the GNU Public License (GPL), and precomputed genome indices will be provided for download at http://treq.sf.net. Contact: pavelm@cs.rutgers.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22962448
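The embedding idea described in this abstract can be sketched in a few lines: each sequence is mapped to a vector of trinucleotide (q = 3) counts, and the L1 distance between two such vectors bounds the edit distance from below. The sketch below is not TreQ's implementation. It uses the classical unit-cost argument (one substitution, insertion or deletion changes at most 2q q-gram counts) rather than the affine gap costs mentioned in the abstract, and it leaves out the kd-tree index entirely. All function names are illustrative.

```python
from collections import Counter


def qgram_profile(seq, q=3):
    """Count every overlapping q-gram in seq (q = 3 gives trinucleotides)."""
    return Counter(seq[i:i + q] for i in range(len(seq) - q + 1))


def l1_distance(a, b):
    """L1 distance between two q-gram count vectors stored as Counters."""
    return sum(abs(a[k] - b[k]) for k in a.keys() | b.keys())


def edit_lower_bound(s, t, q=3):
    """Lower bound on the unit-cost edit distance between s and t.

    A single edit operation changes the L1 distance by at most 2q,
    so ceil(L1 / (2q)) edits are needed at minimum.
    """
    d = l1_distance(qgram_profile(s, q), qgram_profile(t, q))
    return -(-d // (2 * q))  # integer ceiling division


if __name__ == "__main__":
    read = "ACGTACGT"
    candidate = "ACGTTCGT"  # differs from read by one substitution
    print(l1_distance(qgram_profile(read), qgram_profile(candidate)))
    print(edit_lower_bound(read, candidate))
```

A bound of this kind lets a mapper discard candidate locations whose frequency vectors are too far from the read's vector without running a full alignment.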
Verlengia, Rozangela; Rebelo, Ana C; Crisp, Alex H; Kunz, Vandeni C; Dos Santos Carneiro Cordeiro, Marco A; Hirata, Mario H; Crespo Hirata, Rosario D; Silva, Ester
2014-09-01
Polymorphisms at the angiotensin-converting enzyme gene (ACE), such as the indel [rs1799752] variant in intron 16, have been shown to be associated with aerobic performance of athletes and non-athletes. However, the relationship between ACE indel polymorphism and cardiorespiratory fitness has not always been demonstrated. The relationship between ACE indel polymorphism and cardiorespiratory fitness was investigated in a sample of young Caucasian Brazilian women. This study investigated 117 healthy women (aged 18 to 30 years) who were grouped as physically active (n = 59) or sedentary (n = 58). All subjects performed an incremental exercise test (ramp protocol) on a cycle-ergometer with 20-25 W/min increments. Blood samples were obtained for DNA extraction and to analyze metabolic and hormonal profiles. ACE indel polymorphism was determined by polymerase chain reaction (PCR) and fragment size analysis. The physically active group had higher values of peak oxygen uptake (VO2 peak), carbon dioxide output (VCO2), ventilation (VE) and power output than the sedentary group (P < 0.05) at the peak of the exercise test. However, heart rate (HR), systolic blood pressure (SBP) and diastolic blood pressure (DBP) did not differ between groups. There was no relationship between ACE indel polymorphism and cardiorespiratory variables during the test in either the physically active or the sedentary group, even when the dominant (DD vs. DI + II) and recessive (II vs. DI + DD) models of inheritance were tested. These results do not support the concept that the genetic variation at the ACE locus contributes to the cardiorespiratory responses at the peak of exercise test in physically active or sedentary healthy women. This indicates that other factors might mediate these responses, including the physical training level of the women.
Verlengia, Rozangela; Rebelo, Ana C.; Crisp, Alex H.; Kunz, Vandeni C.; dos Santos Carneiro Cordeiro, Marco A.; Hirata, Mario H.; Crespo Hirata, Rosario D.; Silva, Ester
2014-01-01
Background: Polymorphisms at the angiotensin-converting enzyme gene (ACE), such as the indel [rs1799752] variant in intron 16, have been shown to be associated with aerobic performance of athletes and non-athletes. However, the relationship between ACE indel polymorphism and cardiorespiratory fitness has not always been demonstrated. Objectives: The relationship between ACE indel polymorphism and cardiorespiratory fitness was investigated in a sample of young Caucasian Brazilian women. Patients and Methods: This study investigated 117 healthy women (aged 18 to 30 years) who were grouped as physically active (n = 59) or sedentary (n = 58). All subjects performed an incremental exercise test (ramp protocol) on a cycle-ergometer with 20-25 W/min increments. Blood samples were obtained for DNA extraction and to analyze metabolic and hormonal profiles. ACE indel polymorphism was determined by polymerase chain reaction (PCR) and fragment size analysis. Results: The physically active group had higher values of peak oxygen uptake (VO2 peak), carbon dioxide output (VCO2), ventilation (VE) and power output than the sedentary group (P < 0.05) at the peak of the exercise test. However, heart rate (HR), systolic blood pressure (SBP) and diastolic blood pressure (DBP) did not differ between groups. There was no relationship between ACE indel polymorphism and cardiorespiratory variables during the test in either the physically active or the sedentary group, even when the dominant (DD vs. DI + II) and recessive (II vs. DI + DD) models of inheritance were tested. Conclusions: These results do not support the concept that the genetic variation at the ACE locus contributes to the cardiorespiratory responses at the peak of exercise test in physically active or sedentary healthy women. This indicates that other factors might mediate these responses, including the physical training level of the women. PMID:25520764
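The dominant and recessive models of inheritance tested in this study amount to two ways of collapsing the three ACE genotypes into two groups before comparison. A minimal sketch of that grouping follows, assuming the usual coding in which the dominant model compares DD against carriers of the insertion allele (DI + II) and the recessive model compares II against the rest (DI + DD). The function name is hypothetical and the statistical comparison itself is not shown.

```python
def encode_genotype(genotype, model):
    """Collapse an ACE indel genotype into a 0/1 group label.

    genotype: "DD", "DI" (or "ID") or "II"
    model:    "dominant"  -> 0 for DD, 1 for DI and II
              "recessive" -> 1 for II, 0 for DI and DD
    """
    g = "".join(sorted(genotype.upper()))  # "ID" and "DI" both become "DI"
    if g not in ("DD", "DI", "II"):
        raise ValueError(f"unknown genotype: {genotype!r}")
    insertions = g.count("I")  # number of insertion alleles carried
    if model == "dominant":
        return int(insertions >= 1)
    if model == "recessive":
        return int(insertions == 2)
    raise ValueError(f"unknown model: {model!r}")


if __name__ == "__main__":
    for g in ("DD", "DI", "II"):
        print(g, encode_genotype(g, "dominant"), encode_genotype(g, "recessive"))
```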
Rosse, Izinara da Cruz; Steinberg, Raphael da Silva; Coimbra, Roney Santos; Peixoto, Maria Gabriela Campolina Diniz; Verneque, Rui Silva; Machado, Marco Antonio; Fonseca, Cleusa Graça; Carvalho, Maria Raquel Santos
2014-07-01
The diacylglycerol-O-acyltransferase (DGAT1) gene encodes the rate-limiting enzyme of triglyceride synthesis. A polymorphism in this gene, DGAT1 K232A, has been associated with milk production and composition in taurine breeds. However, this polymorphism is not a good tool for ascertaining the effects of this QTL in Bos indicus (Zebu), since the frequency of the DGAT1 232A allele is too low in these breeds. We sequenced the 3'-untranslated region of the DGAT1 gene in a sample of bulls of the breeds Guzerá (Bos indicus) and Holstein (Bos taurus) and, using in silico analysis, we searched for genetic variation, evolutionary conservation, regulatory elements, and possible substitution effects. Six single nucleotide polymorphisms (SNPs) and one insertion-deletion (INDEL) polymorphism were found in the Guzerá bulls. Additionally, we conducted a preliminary association study, using this INDEL polymorphism as a genetic marker. A significant association was detected (P ≤ 0.05) between the INDEL (DGAT1 3'UTR INDEL) and the breeding values (BV) for protein, fat, and milk yields over a 305-day lactation period. The DGAT1 3'UTR INDEL genotype I/I (I, for insertion) was associated with lower BVs (-38.77 kg for milk, -1.86 kg for fat, and -1.48 kg for protein yields) when compared to the genotype I/D (D, for deletion). The I/D genotype was in turn associated with lower BVs than the D/D genotype (-34.98 kg milk, -1.73 kg fat, and -1.09 kg protein yields). This study reports the first polymorphism of the DGAT1 3'UTR in the Guzerá breed, as well as its association with BV for milk protein, fat, and milk yields.
Hehir-Kwa, Jayne Y; Marschall, Tobias; Kloosterman, Wigard P; Francioli, Laurent C; Baaijens, Jasmijn A; Dijkstra, Louis J; Abdellaoui, Abdel; Koval, Vyacheslav; Thung, Djie Tjwan; Wardenaar, René; Renkens, Ivo; Coe, Bradley P; Deelen, Patrick; de Ligt, Joep; Lameijer, Eric-Wubbo; van Dijk, Freerk; Hormozdiari, Fereydoun; Uitterlinden, André G; van Duijn, Cornelia M; Eichler, Evan E; de Bakker, Paul I W; Swertz, Morris A; Wijmenga, Cisca; van Ommen, Gert-Jan B; Slagboom, P Eline; Boomsma, Dorret I; Schönhuth, Alexander; Ye, Kai; Guryev, Victor
2016-10-06
Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, the majority of studies provide neither a global view of the full spectrum of these variants nor integrate them into reference panels of genetic variation. Here, we analyse whole genome sequencing data of 769 individuals from 250 Dutch families, and provide a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels, and retrotransposition-mediated insertions of mobile elements and processed RNAs. A large proportion are previously under reported variants sized between 21 and 100 bp. We detect 4 megabases of novel sequence, encoding 11 new transcripts. Finally, we show 191 known, trait-associated SNPs to be in strong linkage disequilibrium with SVs and demonstrate that our panel facilitates accurate imputation of SVs in unrelated individuals.
Avelange-Macherel, Marie-Hélène; Payet, Nicole; Lalanne, David; Neveu, Martine; Tolleter, Dimitri; Burstin, Judith; Macherel, David
2015-07-01
LEAM, a late embryogenesis abundant protein, and HSP22, a small heat shock protein, were shown to accumulate in the mitochondria during pea (Pisum sativum L.) seed development, where they are expected to contribute to desiccation tolerance. Here, their expression was examined in seeds of 89 pea genotypes by Western blot analysis. All genotypes expressed LEAM and HSP22 in similar amounts. In contrast with HSP22, LEAM displayed different isoforms according to apparent molecular mass. Each of the 89 genotypes harboured a single LEAM isoform. Genomic and RT-PCR analysis revealed four LEAM genes differing by a small variable indel in the coding region. These variations were consistent with the apparent molecular mass of each isoform. Indels, which occurred in repeated domains, did not alter the main properties of LEAM. Structural modelling indicated that the class A α-helix structure, which allows interactions with the mitochondrial inner membrane in the dry state, was preserved in all isoforms, suggesting functionality is maintained. The overall results point out the essential character of LEAM and HSP22 in pea seeds. LEAM variability is discussed in terms of pea breeding history as well as LEA gene evolution mechanisms. © 2014 John Wiley & Sons Ltd.
Sakai, Hiroaki; Kanamori, Hiroyuki; Arai-Kichise, Yuko; Shibata-Hatta, Mari; Ebana, Kaworu; Oono, Youko; Kurita, Kanako; Fujisawa, Hiroko; Katagiri, Satoshi; Mukai, Yoshiyuki; Hamada, Masao; Itoh, Takeshi; Matsumoto, Takashi; Katayose, Yuichi; Wakasa, Kyo; Yano, Masahiro; Wu, Jianzhong
2014-01-01
Having a deep genetic structure evolved during its domestication and adaptation, the Asian cultivated rice (Oryza sativa) displays considerable physiological and morphological variations. Here, we describe deep whole-genome sequencing of the aus rice cultivar Kasalath by using the advanced next-generation sequencing (NGS) technologies to gain a better understanding of the sequence and structural changes among highly differentiated cultivars. The de novo assembled Kasalath sequences represented 91.1% (330.55 Mb) of the genome and contained 35 139 expressed loci annotated by RNA-Seq analysis. We detected 2 787 250 single-nucleotide polymorphisms (SNPs) and 7393 large insertion/deletion (indel) sites (>100 bp) between Kasalath and Nipponbare, and 2 216 251 SNPs and 3780 large indels between Kasalath and 93-11. Extensive comparison of the gene contents among these cultivars revealed similar rates of gene gain and loss. We detected at least 7.39 Mb of inserted sequences and 40.75 Mb of unmapped sequences in the Kasalath genome in comparison with the Nipponbare reference genome. Mapping of the publicly available NGS short reads from 50 rice accessions proved the necessity and the value of using the Kasalath whole-genome sequence as an additional reference to capture the sequence polymorphisms that cannot be discovered by using the Nipponbare sequence alone. PMID:24578372
Moghaddam, Samira Mafi; Song, Qijian; Mamidi, Sujan; Schmutz, Jeremy; Lee, Rian; Cregan, Perry; Osorno, Juan M; McClean, Phillip E
2014-01-01
Next generation sequence data provides valuable information and tools for genetic and genomic research and offers new insights useful for marker development. This data is useful for the design of accurate and user-friendly molecular tools. Common bean (Phaseolus vulgaris L.) is a diverse crop in which separate domestication events happened in each gene pool followed by race and market class diversification that has resulted in different morphological characteristics in each commercial market class. This has led to essentially independent breeding programs within each market class which in turn has resulted in limited within market class sequence variation. Sequence data from selected genotypes of five bean market classes (pinto, black, navy, and light and dark red kidney) were used to develop InDel-based markers specific to each market class. Design of the InDel markers was conducted through a combination of assembly, alignment and primer design software using 1.6× to 5.1× coverage of Illumina GAII sequence data for each of the selected genotypes. The procedure we developed for primer design is fast, accurate, less error prone, and higher throughput than when they are designed manually. All InDel markers are easy to run and score with no need for PCR optimization. A total of 2687 InDel markers distributed across the genome were developed. To highlight their usefulness, they were employed to construct a phylogenetic tree and a genetic map, showing that InDel markers are reliable, simple, and accurate.
Moghaddam, Samira Mafi; Song, Qijian; Mamidi, Sujan; Schmutz, Jeremy; Lee, Rian; Cregan, Perry; Osorno, Juan M.; McClean, Phillip E.
2013-01-01
Next generation sequence data provides valuable information and tools for genetic and genomic research and offers new insights useful for marker development. This data is useful for the design of accurate and user-friendly molecular tools. Common bean (Phaseolus vulgaris L.) is a diverse crop in which separate domestication events happened in each gene pool followed by race and market class diversification that has resulted in different morphological characteristics in each commercial market class. This has led to essentially independent breeding programs within each market class which in turn has resulted in limited within market class sequence variation. Sequence data from selected genotypes of five bean market classes (pinto, black, navy, and light and dark red kidney) were used to develop InDel-based markers specific to each market class. Design of the InDel markers was conducted through a combination of assembly, alignment and primer design software using 1.6× to 5.1× coverage of Illumina GAII sequence data for each of the selected genotypes. The procedure we developed for primer design is fast, accurate, less error prone, and higher throughput than when they are designed manually. All InDel markers are easy to run and score with no need for PCR optimization. A total of 2687 InDel markers distributed across the genome were developed. To highlight their usefulness, they were employed to construct a phylogenetic tree and a genetic map, showing that InDel markers are reliable, simple, and accurate. PMID:24860578
Multiplex pyrosequencing of InDel markers for forensic DNA analysis.
Bus, Magdalena M; Karas, Ognjen; Allen, Marie
2016-12-01
The capillary electrophoresis (CE) technology is commonly used for fragment length separation of markers in forensic DNA analysis. In this study, pyrosequencing technology was used as an alternative and rapid tool for the analysis of biallelic InDel (insertion/deletion) markers for individual identification. The DNA typing is based on a subset of the InDel markers that are included in the Investigator ® DIPplex Kit, which are sequenced in a multiplex pyrosequencing analysis. To facilitate the analysis of degraded DNA, the polymerase chain reaction (PCR) fragments were kept short in the primer design. Samples from individuals of Swedish origin were genotyped using the pyrosequencing strategy and analysis of the Investigator ® DIPplex markers with CE. A comparison between the pyrosequencing and CE data revealed concordant results demonstrating a robust and correct genotyping by pyrosequencing. Using optimal marker combination and a directed dispensation strategy, five markers could be multiplexed and analyzed simultaneously. In this proof-of-principle study, we demonstrate that multiplex InDel pyrosequencing analysis is possible. However, further studies on degraded samples, lower DNA quantities, and mixtures will be required to fully optimize InDel analysis by pyrosequencing for forensic applications. Overall, although CE analysis is implemented in most forensic laboratories, multiplex InDel pyrosequencing offers a cost-effective alternative for some applications. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
DCJ-indel and DCJ-substitution distances with distinct operation costs
2013-01-01
Background: Classical approaches to compute the genomic distance are usually limited to genomes with the same content and take into consideration only rearrangements that change the organization of the genome (i.e. positions and orientation of pieces of DNA, number and type of chromosomes, etc.), such as inversions, translocations, fusions and fissions. These operations are generically represented by the double-cut and join (DCJ) operation. The distance between two genomes, in terms of number of DCJ operations, can be computed in linear time. In order to handle genomes with distinct contents, insertions and deletions of fragments of DNA – named indels – must also be allowed. More powerful than an indel is a substitution of a fragment of DNA by another fragment of DNA. Indels and substitutions are called content-modifying operations. It has been shown that both the DCJ-indel and the DCJ-substitution distances can also be computed in linear time, assuming that the same cost is assigned to any DCJ or content-modifying operation. Results: In the present study we extend the DCJ-indel and the DCJ-substitution models, considering that the content-modifying cost is distinct from and upper bounded by the DCJ cost, and show that the distance in both models can still be computed in linear time. Although the triangle inequality can be disrupted in both models, we also show how to efficiently fix this problem a posteriori. PMID:23879938
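The linear-time DCJ distance mentioned in the background can be sketched for the simplest case: two genomes with identical gene content, each consisting of one circular chromosome of signed genes without duplicates. Under those assumptions the well-known result is that the distance equals the number of genes minus the number of cycles formed by alternating between the adjacencies of the two genomes. The sketch does not cover linear chromosomes, indels, substitutions or the distinct operation costs studied in this paper, and the function names are illustrative.

```python
def adjacencies(genome):
    """Map each gene extremity to its neighbour in one circular chromosome.

    genome is a list of signed integers, e.g. [1, -2, 3]. Each gene g has
    a tail (abs(g), "t") and a head (abs(g), "h"); a negative sign means
    the gene is read in reverse.
    """
    partner = {}
    n = len(genome)
    for i in range(n):
        a, b = genome[i], genome[(i + 1) % n]
        right = (abs(a), "h" if a > 0 else "t")  # end where gene a is left
        left = (abs(b), "t" if b > 0 else "h")   # end where gene b is entered
        partner[right] = left
        partner[left] = right
    return partner


def dcj_distance(genome_a, genome_b):
    """DCJ distance for two circular genomes with the same gene set."""
    pa, pb = adjacencies(genome_a), adjacencies(genome_b)
    if pa.keys() != pb.keys():
        raise ValueError("this sketch requires identical gene content")
    seen, cycles = set(), 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:   # follow one A-adjacency, then one B-adjacency
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return len(genome_a) - cycles


if __name__ == "__main__":
    print(dcj_distance([1, 2, 3], [1, -2, 3]))  # one inversion apart
```

Each extremity is visited once, so the running time is linear in the number of genes, in line with the complexity stated in the abstract for the basic DCJ model.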
Yang, Rendong; Nelson, Andrew C; Henzler, Christine; Thyagarajan, Bharat; Silverstein, Kevin A T
2015-12-07
Comprehensive identification of insertions/deletions (indels) across the full size spectrum from second generation sequencing is challenging due to the relatively short read length inherent in the technology. Different indel calling methods exist but are limited in detection to specific sizes with varying accuracy and resolution. We present ScanIndel, an integrated framework for detecting indels with multiple heuristics including gapped alignment, split reads and de novo assembly. Using simulation data, we demonstrate ScanIndel's superior sensitivity and specificity relative to several state-of-the-art indel callers across various coverage levels and indel sizes. ScanIndel yields higher predictive accuracy with lower computational cost compared with existing tools for both targeted resequencing data from tumor specimens and high coverage whole-genome sequencing data from the human NIST standard NA12878. Thus, we anticipate ScanIndel will improve indel analysis in both clinical and research settings. ScanIndel is implemented in Python, and is freely available for academic use at https://github.com/cauyrd/ScanIndel.
A Stochastic Evolutionary Model for Protein Structure Alignment and Phylogeny
Challis, Christopher J.; Schmidler, Scott C.
2012-01-01
We present a stochastic process model for the joint evolution of protein primary and tertiary structure, suitable for use in alignment and estimation of phylogeny. Indels arise from a classic Links model, and mutations follow a standard substitution matrix, whereas backbone atoms diffuse in three-dimensional space according to an Ornstein–Uhlenbeck process. The model allows for simultaneous estimation of evolutionary distances, indel rates, structural drift rates, and alignments, while fully accounting for uncertainty. The inclusion of structural information enables phylogenetic inference on time scales not previously attainable with sequence evolution models. The model also provides a tool for testing evolutionary hypotheses and improving our understanding of protein structural evolution. PMID:22723302
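The structural part of this model, in which backbone atoms diffuse according to an Ornstein–Uhlenbeck process, can be illustrated with a one-coordinate simulation. The sketch below uses the exact transition distribution of a scalar process dX = -theta (X - mu) dt + sigma dW. It is only an illustration of the process itself: the parameter names and values are assumptions, and the paper's coupling of this diffusion to indels, substitutions and alignment is not reproduced.

```python
import math
import random


def ou_step(x, dt, theta, mu, sigma, rng):
    """Advance one coordinate of an Ornstein-Uhlenbeck process by time dt.

    theta: strength of the pull toward mu (must be > 0)
    mu:    long-run mean position
    sigma: diffusion scale
    rng:   a random.Random instance supplying Gaussian noise
    """
    decay = math.exp(-theta * dt)
    # Standard deviation of the exact transition distribution.
    sd = sigma * math.sqrt((1.0 - decay * decay) / (2.0 * theta))
    return mu + (x - mu) * decay + sd * rng.gauss(0.0, 1.0)


def simulate(x0, steps, dt, theta, mu, sigma, seed=0):
    """Return a trajectory of length steps + 1 starting at x0."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(steps):
        path.append(ou_step(path[-1], dt, theta, mu, sigma, rng))
    return path


if __name__ == "__main__":
    # Hypothetical parameters: a coordinate drifting back toward 0.
    print(simulate(x0=5.0, steps=5, dt=0.1, theta=1.0, mu=0.0, sigma=0.5))
```

Unlike plain Brownian motion, the pull toward mu keeps the variance bounded over long times, which is one reason such a process is a natural choice for modelling structural drift.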
Flores-Martínez, Silvia Esperanza; Castro-Martínez, Anna Gabriela; López-Quintero, Andrés; García-Zapién, Alejandra Guadalupe; Torres-Rodríguez, Ruth Noemí; Sánchez-Corona, José
2015-01-01
Polycystic ovary syndrome is a complex and heterogeneous disease involving both reproductive and metabolic problems. A genetic predisposition has been suggested in the etiology of this syndrome. The identification of the calpain-10 gene (CAPN10) as the first candidate gene for type 2 diabetes mellitus has focused interest on investigating its possible relation with polycystic ovary syndrome, because this syndrome is associated with hyperinsulinemia and insulin resistance, two metabolic abnormalities associated with type 2 diabetes mellitus. The objective was to investigate whether there is an association between SNP-63 and the indel-19 variant of the CAPN10 gene and polycystic ovary syndrome in women of reproductive age. This study included 101 women (55 with polycystic ovary syndrome and 46 without polycystic ovary syndrome). The genetic variant indel-19 was identified by electrophoresis of the fragments amplified by PCR, and SNP-63 by PCR-RFLP. The allele and genotype frequencies of the two variants did not differ significantly between women with polycystic ovary syndrome and the control group. The haplotype 21 (defined by the insertion allele of the indel-19 variant and the C allele of SNP-63) was the most frequent in both study groups and was more frequent in the polycystic ovary syndrome group; however, this difference was not statistically significant (p = 0.8353). The results suggest that SNP-63 and the indel-19 variant of the CAPN10 gene do not represent a risk factor for polycystic ovary syndrome in our patient group. Copyright © 2015. Published by Masson Doyma México S.A.
Insertion/Deletion Within the KDM6A Gene Is Significantly Associated With Litter Size in Goat
Cui, Yang; Yan, Hailong; Wang, Ke; Xu, Han; Zhang, Xuelian; Zhu, Haijing; Liu, Jinwang; Qu, Lei; Lan, Xianyong; Pan, Chuanying
2018-01-01
A previous whole-genome association analysis identified lysine demethylase 6A (KDM6A), which encodes a type of histone demethylase, as a candidate gene associated with goat fecundity. Knockout of the KDM6A gene in mice disrupts gametophyte development, suggesting that it has a critical role in reproduction. In this study, goat KDM6A mRNA expression profiles were determined, insertion/deletion (indel) variants in the gene were identified, the effect of the indel variants on KDM6A gene expression was assessed, and their association with first-born litter size was analyzed in 2326 healthy female Shaanbei white cashmere goats. KDM6A mRNA was expressed in all tissues tested (heart, liver, spleen, lung, kidney, muscle, brain, skin and testis); the expression levels in testes at different developmental stages [1-week-old (wk), 2, 3 wk, 1-month-old (mo), 1.5 and 2 mo] indicated a potential association with the mitosis-to-meiosis transition, implying that KDM6A may have an essential role in goat fertility. Meanwhile, two novel intronic indels of 16 bp and 5 bp were identified. Statistical analysis revealed that only the 16 bp indel was associated with first-born litter size (P < 0.01), and the average first-born litter size of individuals with the insertion/insertion genotype was higher than that of those with the deletion/deletion genotype (P < 0.05). There was also a significant difference in genotype distributions of the 16 bp indel between mothers of single-lamb and multi-lamb litters in the studied goat population (P = 0.001). Consistently, the 16 bp indel also had a significant effect on KDM6A gene expression. Additionally, there was no significant linkage disequilibrium (LD) between these two indel loci, consistent with the association analysis results. Together, these findings suggest that the 16 bp indel in KDM6A may be useful for marker-assisted selection (MAS) of goats. PMID:29616081
Local Structural Differences in Homologous Proteins: Specificities in Different SCOP Classes
Joseph, Agnel Praveen; Valadié, Hélène; Srinivasan, Narayanaswamy; de Brevern, Alexandre G.
2012-01-01
The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable in designing and developing methods for comparison, modelling and prediction of protein structures. These approaches for structure analysis can be directly implicated in studying protein function and for drug design. The backbone of a protein structure favours certain local conformations which include α-helices, β-strands and turns. Libraries of a limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Block (PB) is one such Structural Alphabet that gave a reasonable structure approximation of 0.42 Å. In this study, we use PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions, among groups of related folds. This knowledge can be utilized in improving tools for structure comparison that work by analysing local structure similarities. Conformational differences between homologous proteins are known to occur often in the regions comprising turns and loops. Interestingly, these differences are found to have specific preferences depending upon the structural classes of proteins. Such class-specific preferences are mainly seen in the all-β class with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that the use of knowledge on the class-specific variations can improve the performance of a PB based structure comparison approach. The preference for the indel sites also seems to be confined to a few backbone conformations involving β-turns and helix C-caps. These are mainly associated with short loops joining the regular secondary structures that mediate a reversal in the chain direction. Rare β-turns of type I’ and II’ are also identified as preferred sites for insertions. PMID:22745680
Wild-Type Measles Viruses with Non-Standard Genome Lengths
Bankamp, Bettina; Liu, Chunyu; Rivailler, Pierre; Bera, Jayati; Shrivastava, Susmita; Kirkness, Ewen F.; Bellini, William J.; Rota, Paul A.
2014-01-01
The length of the single stranded, negative sense RNA genome of measles virus (MeV) is highly conserved at 15,894 nucleotides (nt). MeVs can be grouped into 24 genotypes based on the highly variable 450 nucleotides coding for the carboxyl-terminus of the nucleocapsid protein (N-450). Here, we report the genomic sequences of 2 wild-type viral isolates of genotype D4 with genome lengths of 15,900 nt. Both genomes had a 7 nt insertion in the 3′ untranslated region (UTR) of the matrix (M) gene and a 1 nt deletion in the 5′ UTR of the fusion (F) gene. The net gain of 6 nt complies with the rule-of-six required for replication competency of the genomes of morbilliviruses. The insertions and deletion (indels) were confirmed in a patient sample that was the source of one of the viral isolates. The positions of the indels were identical in both viral isolates, even though epidemiological data and the 3 nt differences in N-450 between the two genomes suggested that the viruses represented separate chains of transmission. Identical indels were found in the M-F intergenic regions of 14 additional genotype D4 viral isolates that were imported into the US during 2007–2010. Viral isolates with and without indels produced plaques of similar size and replicated efficiently in A549/hSLAM and Vero/hSLAM cells. This is the first report of wild-type MeVs with genome lengths other than 15,894 nt and demonstrates that the length of the M-F UTR of wild-type MeVs is flexible. PMID:24748123
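The rule-of-six arithmetic described in the abstract above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical and not from the paper, and the only inputs are the lengths stated in the abstract (15,894 nt standard length, a 7 nt insertion and a 1 nt deletion).

```python
# Standard MeV genome length in nucleotides, as stated in the abstract.
STANDARD_LENGTH = 15_894


def variant_length(base_length, insertions, deletions):
    """Genome length after applying insertions and deletions (sizes in nt)."""
    return base_length + sum(insertions) - sum(deletions)


def obeys_rule_of_six(genome_length):
    """True if the genome length is an exact multiple of six."""
    return genome_length % 6 == 0


# 7 nt insertion in the M gene 3' UTR and 1 nt deletion in the F gene 5' UTR.
length = variant_length(STANDARD_LENGTH, insertions=[7], deletions=[1])
print(length, obeys_rule_of_six(length))
```

A net gain of 6 nt keeps the length a multiple of six, whereas the insertion or the deletion alone would not.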
Genome Analysis of the Domestic Dog (Korean Jindo) by Massively Parallel Sequencing
Kim, Ryong Nam; Kim, Dae-Soo; Choi, Sang-Haeng; Yoon, Byoung-Ha; Kang, Aram; Nam, Seong-Hyeuk; Kim, Dong-Wook; Kim, Jong-Joo; Ha, Ji-Hong; Toyoda, Atsushi; Fujiyama, Asao; Kim, Aeri; Kim, Min-Young; Park, Kun-Hyang; Lee, Kang Seon; Park, Hong-Seog
2012-01-01
Although pioneering sequencing projects have shed light on the boxer and poodle genomes, a number of challenges need to be met before the sequencing and annotation of the dog genome can be considered complete. Here, we present the DNA sequence of the Jindo dog genome, sequenced to 45-fold average coverage using Illumina massively parallel sequencing technology. A comparison of the sequence to the reference boxer genome led to the identification of 4 675 437 single nucleotide polymorphisms (SNPs, including 3 346 058 novel SNPs), 71 642 indels and 8131 structural variations. Of these, 339 non-synonymous SNPs and 3 indels are located within coding sequences (CDS). In particular, 3 non-synonymous SNPs and a 26-bp deletion occur in the TCOF1 locus, implying that the difference observed in cranial facial morphology between Jindo and boxer dogs might be influenced by those variations. Through the annotation of the Jindo olfactory receptor gene family, we found 2 unique olfactory receptor genes and 236 olfactory receptor genes harbouring non-synonymous homozygous SNPs that are likely to affect smelling capability. In addition, we determined the DNA sequence of the Jindo dog mitochondrial genome and identified Jindo dog-specific mtDNA genotypes. This Jindo genome data upgrade our understanding of dog genomic architecture and will be a very valuable resource for investigating not only dog genetics and genomics but also human and dog disease genetics and comparative genomics. PMID:22474061
Chloroplast DNA Structural Variation, Phylogeny, and Age of Divergence among Diploid Cotton Species.
Chen, Zhiwen; Feng, Kun; Grover, Corrinne E; Li, Pengbo; Liu, Fang; Wang, Yumei; Xu, Qin; Shang, Mingzhao; Zhou, Zhongli; Cai, Xiaoyan; Wang, Xingxing; Wendel, Jonathan F; Wang, Kunbo; Hua, Jinping
2016-01-01
The cotton genus (Gossypium spp.) contains 8 monophyletic diploid genome groups (A, B, C, D, E, F, G, K) and a single allotetraploid clade (AD). To gain insight into the phylogeny of Gossypium and molecular evolution of the chloroplast genome in this group, we performed a comparative analysis of 19 Gossypium chloroplast genomes, six reported here for the first time. Nucleotide distance in non-coding regions was about three times that of coding regions. As expected, distances were smaller within than among genome groups. Phylogenetic topologies based on nucleotide and indel data support the resolution of the 8 genome groups into 6 clades. Phylogenetic analysis of indel distribution among the 19 genomes demonstrates contrasting evolutionary dynamics in different clades, with a parallel genome downsizing in two genome groups and a biased accumulation of insertions in the clade containing the cultivated cottons leading to large (for Gossypium) chloroplast genomes. Divergence time estimates derived from the cpDNA sequence suggest that the major diploid clades had diverged approximately 10 to 11 million years ago. The complete nucleotide sequences of 6 cpDNA genomes are provided, offering a resource for cytonuclear studies in Gossypium. PMID:27309527
Al-Bustan, Suzanne A; Al-Serri, Ahmad; Annice, Babitha G; Alnaqeeb, Majed A; Al-Kandari, Wafa Y; Dashti, Mohammed
2018-01-01
The role interethnic genetic differences play in plasma lipid level variation across populations is a global health concern. Several genes involved in lipid metabolism and transport are strong candidates for the genetic association with lipid level variation, especially lipoprotein lipase (LPL). The objective of this study was to re-sequence the full LPL gene in Kuwaiti Arabs, analyse the sequence variation and identify variants that could contribute to variation in plasma lipid levels for further genetic association. Samples (n = 100) of an Arab ethnic group from Kuwait were analysed for sequence variation by Sanger sequencing across the 30 Kb LPL gene and its flanking sequences. A total of 293 variants including 252 single nucleotide polymorphisms (SNPs) and 39 insertions/deletions (InDels) were identified among which 47 variants (32 SNPs and 15 InDels) were novel to Kuwaiti Arabs. This study is the first to report sequence data and analysis of frequencies of variants at the LPL gene locus in an Arab ethnic group with a novel “rare” variant (LPL:g.18704C>A) significantly associated with HDL (B = -0.181; 95% CI (-0.357, -0.006); p = 0.043), TG (B = 0.134; 95% CI (0.004–0.263); p = 0.044) and VLDL (B = 0.131; 95% CI (-0.001–0.263); p = 0.043) levels. Sequence variation in Kuwaiti Arabs was compared to other populations and was found to be similar with regards to the number of SNPs, InDels and distribution of the number of variants across the LPL gene locus and minor allele frequency (MAF). Moreover, comparison of the identified variants and their MAF with other reports provided a list of 46 potential variants across the LPL gene to be considered for future genetic association studies. The findings warrant further investigation into the association of g.18704C>A with lipid levels in other ethnic groups and with clinical manifestations of dyslipidemia. PMID:29438437
Schlick-Steiner, Birgit C; Arthofer, Wolfgang; Moder, Karl; Steiner, Florian M
2015-01-01
Today, the comparative analysis of DNA molecules mainly uses information inferred from nucleotide substitutions. Insertion/deletion (INDEL) mutations, in contrast, are largely considered uninformative and discarded, due to our lacking knowledge on their evolution. However, including rather than discarding INDELs would be relevant to any research area in ecology and evolution that uses molecular data. As a practical approach to better understanding INDEL evolution in general, we propose the study of recent INDEL (reINDEL) mutations – mutations where both ancestral and derived state are seen in the sample. The precondition for reINDEL identification is knowledge about the pedigree of the individuals sampled. Sound reINDEL knowledge will allow the improved modeling needed for including INDELs in the downstream analysis of molecular data. Both microsatellites, currently still the predominant marker system in the analysis of populations, and sequences generated by next-generation sequencing, a promising and rapidly developing range of technologies, offer the opportunity for reINDEL identification. However, a 2013 sample of animal microsatellite studies contained unexpectedly few reINDELs identified. As most likely explanation, we hypothesize that reINDELs are underreported rather than absent and that this underreporting stems from common reINDEL unawareness. If our hypothesis applies, increased reINDEL awareness should allow gathering data rapidly. We recommend the routine reporting of either the absence or presence of reINDELs together with standardized key information on the nature of mutations when they are detected and the use of the keyword “reINDEL” to increase visibility in both instances of successful and unsuccessful search. PMID:25628861
Ferlaino, Michael; Rogers, Mark F.; Shihab, Hashem A.; Mort, Matthew; Cooper, David N.; Gaunt, Tom R.; Campbell, Colin
2018-01-01
Background: Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. Results: We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. Conclusions: FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome. PMID:28985712
Ferlaino, Michael; Rogers, Mark F; Shihab, Hashem A; Mort, Matthew; Cooper, David N; Gaunt, Tom R; Campbell, Colin
2017-10-06
Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome.
Patterns of diversity and recombination along chromosome 1 of maize (Zea mays ssp. mays L.).
Tenaillon, Maud I; Sawkins, Mark C; Anderson, Lorinda K; Stack, Stephen M; Doebley, John; Gaut, Brandon S
2002-01-01
We investigate the interplay between genetic diversity and recombination in maize (Zea mays ssp. mays). Genetic diversity was measured in three types of markers: single-nucleotide polymorphisms, indels, and microsatellites. All three were examined in a sample of previously published DNA sequences from 21 loci on maize chromosome 1. Small indels (1-5 bp) were numerous and far more common than large indels. Furthermore, large indels (>100 bp) were infrequent in the population sample, suggesting they are slightly deleterious. The 21 loci also contained 47 microsatellites, of which 33 were polymorphic. Diversity in SNPs, indels, and microsatellites was compared to two measures of recombination: C (=4Nc) estimated from DNA sequence data and R based on a quantitative recombination nodule map of maize synaptonemal complex 1. SNP diversity was correlated with C (r = 0.65; P = 0.007) but not with R (r = -0.10; P = 0.69). Given the lack of correlation between R and SNP diversity, the correlation between SNP diversity and C may be driven by demography. In contrast to SNP diversity, microsatellite diversity was correlated with R (r = 0.45; P = 0.004) but not C (r = -0.025; P = 0.55). The correlation could arise if recombination is mutagenic for microsatellites, or it may be consistent with background selection that is apparent only in this class of rapidly evolving markers. PMID:12454083
Zhang, Quan; Zhu, Feng; Liu, Long; Zheng, Chuan Wei; Wang, De He; Hou, Zhuo Cheng; Ning, Zhong Hua
2015-01-01
Eggshell damages lead to economic losses in the egg production industry and are a threat to human health. We examined 49-wk-old Rhode Island White hens (Gallus gallus) that laid eggs having shells with significantly different strengths and thicknesses. We used HiSeq 2000 (Illumina) sequencing to characterize the chicken transcriptome and whole genome to identify the key genes and genetic mutations associated with eggshell calcification. We identified a total of 14,234 genes expressed in the chicken uterus, representing 89% of all annotated chicken genes. A total of 889 differentially expressed genes were identified by comparing low eggshell strength (LES) and normal eggshell strength (NES) genomes. The DEGs are enriched in calcification-related processes, including calcium ion transport and calcium signaling pathways as revealed by gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis. Some important matrix proteins, such as OC-116, LTF and SPP1, were also expressed differentially between two groups. A total of 3,671,919 single-nucleotide polymorphisms (SNPs) and 508,035 Indels were detected in protein coding genes by whole-genome re-sequencing, including 1775 non-synonymous variations and 19 frame-shift Indels in DEGs. SNPs and Indels found in this study could be further investigated for eggshell traits. This is the first report to integrate the transcriptome and genome re-sequencing to target the genetic variations which decreased the eggshell qualities. These findings further advance our understanding of eggshell calcification in the chicken uterus.
Inácio, Ana; Costa, Heloísa Afonso; da Silva, Cláudia Vieira; Ribeiro, Teresa; Porto, Maria João; Santos, Jorge Costa; Igrejas, Gilberto; Amorim, António
2017-05-01
The migratory phenomenon in Portugal has become one of the main factors for the genetic variability. In the last few years, a new class of autosomal insertion/deletion (InDel) markers has attracted interest in forensic genetics. Since there are no data on InDel markers for immigrants from Portuguese-speaking African countries (PALOP) living in Lisboa, our aim is to characterize those groups of individuals by typing them with at least 30 InDel markers and to compare different groups of individuals/populations. We studied 454 bloodstain samples belonging to immigrant individuals from Angola, Guinea-Bissau, and Mozambique. DNA extraction was performed with the Chelex® 100 method. After extraction, all samples were typed with the Investigator® DIPplex method. The allelic frequencies obtained show that all markers are in Hardy-Weinberg equilibrium, and we can confirm that those populations show significant genetic distances among themselves and from the host Lisboa population. Because of this, they introduce genetic variability into the Lisboa population.
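As an illustration of the Hardy-Weinberg check mentioned in the abstract above, the following minimal sketch computes a chi-square goodness-of-fit statistic for a single biallelic InDel marker. The abstract does not say which test the authors used, so the chi-square approach, the function name and the genotype counts are all assumptions made for illustration; no data from the study are reproduced here.

```python
# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1 = 3.841


def hwe_chi_square(n_ins_ins, n_ins_del, n_del_del):
    """Chi-square statistic comparing observed genotype counts of a biallelic
    InDel marker with Hardy-Weinberg expectations.

    Assumes both alleles are present (otherwise an expected count is zero).
    """
    n = n_ins_ins + n_ins_del + n_del_del
    p = (2 * n_ins_ins + n_ins_del) / (2 * n)  # insertion allele frequency
    q = 1.0 - p                                # deletion allele frequency
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_ins_ins, n_ins_del, n_del_del)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical genotype counts (insertion/insertion, heterozygote, deletion/deletion).
stat = hwe_chi_square(25, 50, 25)
print(stat, "consistent with HWE" if stat < CHI2_CRITICAL_DF1 else "deviates from HWE")
```

For small samples or rare alleles an exact test is usually preferred over the chi-square approximation.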
Wollinger, L M; Dal Bosco, S M; Rempe, C; Almeida, S E M; Berlese, D B; Castoldi, R P; Arndt, M E; Contini, V; Genro, J P
2015-12-29
The aim of the current study was to investigate the association between the InDel polymorphism in the angiotensin I-converting enzyme gene (ACE) and the rs699 polymorphism in the angiotensinogen gene (AGT) and diabetes mellitus type 2 (DM2) in a sample population from Southern Brazil. A case-control study was conducted with 228 patients with DM2 and 183 controls without DM2. The ACE InDel polymorphism was genotyped by polymerase chain reaction (PCR) with specific primers, followed by electrophoresis on 1.5% agarose gel. The AGT rs699 polymorphism was genotyped using a real-time PCR assay. No significant association between the ACE InDel polymorphism and DM2 was detected (P = 0.97). However, regarding the AGT rs699 polymorphism, DM2 patients had a significantly higher frequency of the AG genotype and lower frequency of the GG genotype when compared to the controls (P = 0.03). Our results suggest that there is an association between the AGT rs699 polymorphism and DM2 in a Brazilian sample.
Guo, Yuxin; Shen, Chunmei; Meng, Haotian; Dong, Qian; Kong, Tingting; Yang, Chunhua; Wang, Hongdan; Jin, Rui; Zhu, Bofeng
2016-12-01
In recent years, Insertion/Deletion (InDel) polymorphisms have become a hot area of forensic research. In this study, 30 InDel loci were selected to investigate the genetic polymorphisms of Tibetan groups, which are from Tibet Autonomous Region and Qinghai province of China, and explore the genetic relationships between Tibetan groups and other groups. Allele frequencies of the 30 InDel loci ranged from 0.1219 (HLD111) to 0.5609 (HLD57) in the Tibet Tibetan group and 0.1639 (HLD118) to 0.5655 (HLD124) in the Qinghai Tibetan group. The combined power of discrimination, matching probability, and power of exclusion were 0.999999999986, 0.999999988, and 0.9913 in the Tibet Tibetan group, respectively, and 0.99999999999204, 0.9999999796, and 0.9862 in the Qinghai Tibetan group. The results of principal component analysis, phylogenetic tree, and population structure demonstrated that the four Tibetan groups (Tibetan1, Tibetan2, Tibet, and Qinghai Tibetan groups) clustered together and had relatively close genetic relationships with nine Asian groups and then European and Amerindian groups.
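The combined power of discrimination reported in the abstract above is conventionally obtained by multiplying per-locus matching probabilities. The sketch below shows one common way to do this for biallelic InDel loci, assuming Hardy-Weinberg genotype proportions and independent loci. The function names and allele frequencies are hypothetical, and the study itself may have used observed genotype frequencies instead.

```python
import math


def match_probability(p_insertion):
    """Probability that two random individuals share a genotype at one
    biallelic locus, assuming Hardy-Weinberg genotype proportions."""
    q = 1.0 - p_insertion
    genotype_freqs = (p_insertion ** 2, 2 * p_insertion * q, q ** 2)
    return sum(g * g for g in genotype_freqs)


def combined_power_of_discrimination(insertion_freqs):
    """1 minus the product of per-locus matching probabilities
    (assumes the loci are independent)."""
    return 1.0 - math.prod(match_probability(p) for p in insertion_freqs)


# Hypothetical insertion-allele frequencies for three loci.
hypothetical_freqs = [0.5, 0.4, 0.3]
print(combined_power_of_discrimination(hypothetical_freqs))
```

Because each locus contributes a matching probability well below 1, the combined value approaches 1 quickly as more loci are added, which is why a 30-locus panel can reach the values quoted above.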
Global Genomic Diversity of Oryza sativa Varieties Revealed by Comparative Physical Mapping
Wang, Xiaoming; Kudrna, David A.; Pan, Yonglong; Wang, Hao; Liu, Lin; Lin, Haiyan; Zhang, Jianwei; Song, Xiang; Goicoechea, Jose Luis; Wing, Rod A.; Zhang, Qifa; Luo, Meizhong
2014-01-01
Bacterial artificial chromosome (BAC) physical maps embedding a large number of BAC end sequences (BESs) were generated for Oryza sativa ssp. indica varieties Minghui 63 (MH63) and Zhenshan 97 (ZS97) and were compared with the genome sequences of O. sativa ssp. japonica cv. Nipponbare and O. sativa ssp. indica cv. 93-11. The comparisons exhibited substantial diversities in terms of large structural variations and small substitutions and indels. Genome-wide BAC-sized and contig-sized structural variations were detected, and the shared variations were analyzed. In the expansion regions of the Nipponbare reference sequence, in comparison to the MH63 and ZS97 physical maps, as well as to the previously constructed 93-11 physical map, the amounts and types of the repeat contents, and the outputs of gene ontology analysis, were significantly different from those of the whole genome. Using the physical maps of four wild Oryza species from OMAP (http://www.omap.org) as a control, we detected many conserved and divergent regions related to the evolution process of O. sativa. Between the BESs of MH63 and ZS97 and the two reference sequences, a total of 1532 polymorphic simple sequence repeats (SSRs), 71,383 SNPs, 1767 multiple nucleotide polymorphisms, 6340 insertions, and 9137 deletions were identified. This study provides independent whole-genome resources for intra- and intersubspecies comparisons and functional genomics studies in O. sativa. Both the comparative physical maps and the GBrowse, which integrated the QTL and molecular markers from GRAMENE (http://www.gramene.org) with our physical maps and analysis results, are open to the public through our Web site (http://gresource.hzau.edu.cn/resource/resource.html). PMID:24424778
Liu, Hanmei; Wang, Xuewen; Wei, Bin; Wang, Yongbin; Liu, Yinghong; Zhang, Junjie; Hu, Yufeng; Yu, Guowu; Li, Jian; Xu, Zhanbin; Huang, Yubi
2016-01-01
In southwest China, some maize landraces have long been isolated geographically, and have phenotypes that differ from those of widely grown cultivars. These landraces may harbor rich genetic variation responsible for those phenotypes. Four-row Wax is one such landrace, with four rows of kernels on the cob. We resequenced the genome of Four-row Wax, obtaining 50.46 Gb sequence at 21.87× coverage, then identified and characterized 3,252,194 SNPs, 213,181 short InDels (1–5 bp) and 39,631 structural variations (greater than 5 bp). Of those, 312,511 (9.6%) SNPs were novel compared to the most detailed haplotype map (HapMap) SNP database of maize. Characterization of variations in reported kernel row number (KRN) related genes and KRN QTL regions revealed potential causal mutations in fea2, td1, kn1, and te1. Genome-wide comparisons revealed abundant genetic variations in Four-row Wax, which may be associated with environmental adaptation. The sequence and SNP variations described here enrich genetic resources of maize, and provide guidance into study of seed numbers for crop yield improvement. PMID:27242868
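The figures quoted in the abstract above can be sanity-checked with simple arithmetic. This sketch assumes that coverage is defined as total sequenced bases divided by reference genome size, which the abstract does not state explicitly; the variable names are illustrative.

```python
# Values taken from the abstract.
total_sequence_gb = 50.46   # total sequence obtained (Gb)
coverage = 21.87            # reported fold coverage
total_snps = 3_252_194      # SNPs identified
novel_snps = 312_511        # SNPs absent from the maize HapMap database

# Assumption: coverage = total bases / genome size, so the implied
# reference size is total bases / coverage.
implied_genome_size_gb = total_sequence_gb / coverage

# Fraction of SNPs reported as novel.
novel_fraction = novel_snps / total_snps

print(f"implied genome size: {implied_genome_size_gb:.2f} Gb")
print(f"novel SNPs: {novel_fraction:.1%}")
```

The novel-SNP percentage computed this way agrees with the 9.6% given in the abstract.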
Somatic Mosaicism: Implications for Disease and Transmission Genetics
Campbell, Ian M.; Shaw, Chad A.; Stankiewicz, Pawel; Lupski, James R.
2015-01-01
Nearly all of the genetic material among cells within an organism is identical. However, single nucleotide variants (SNVs), indels, copy number variants (CNVs), and other structural variants (SVs) continually accumulate as cells divide during development. This process results in an organism composed of countless cells, each with its own unique personal genome. Thus, every human is undoubtedly mosaic. Mosaic mutations can go unnoticed, underlie genetic disease or normal human variation, and may be transmitted to the next generation as constitutional variants. Here, we review the influence of the developmental timing of mutations, the mechanisms by which they arise, methods for detecting mosaic variants, and the risk of passing these mutations on to the next generation. PMID:25910407
Dong, Chun-nan; Yang, Ya-dong; Li, Shu-jin; Yang, Ya-ran; Zhang, Xiao-jing; Fang, Xiang-dong; Yan, Jiang-wei; Cong, Bin
2016-01-01
In the case of mass disasters, missing persons and forensic caseworks, highly degraded biological samples are often encountered. It can be a challenge to analyze and interpret the DNA profiles from these samples. Here we provide a new strategy to solve the problem by taking advantage of the intrinsic structural properties of DNA. We have assessed the in vivo positions of more than 35 million putative nucleosome cores in human leukocytes using high-throughput whole genome sequencing, and identified 2,462 single nucleotide variations (SNVs) and 128 insertion-deletion polymorphisms (indels). After comparing the sequence reads with 44 STR loci commonly used in forensics, five STRs (TH01, TPOX, D18S51, DYS391, and D10S1248) were matched. We compared these “nucleosome protected STRs” (NPSTRs) with five other non-NPSTRs using mini-STR primer design, real-time PCR, and capillary gel electrophoresis on artificially degraded DNA. Moreover, genotyping performance of the five NPSTRs and five non-NPSTRs was also tested with real casework samples. All results show that loci located in nucleosomes are more likely to be successfully genotyped in degraded samples. In conclusion, after further strict validation, these markers could be incorporated into future forensic and paleontology identification kits, resulting in higher discriminatory power for certain degraded sample types. PMID:27189082
Pinto, Pablo; Salgado, Claudio; Santos, Ney Pereira Carneiro; Santos, Sidney; Ribeiro-dos-Santos, Ândrea
2015-01-01
Leprosy is an insidious infectious disease caused by the obligate intracellular bacterium Mycobacterium leprae, and host genetic factors can modulate the immune response and generate distinct categories of leprosy susceptibility that are also influenced by genetic ancestry. We investigated the possible effects of CYP19A1 [rs11575899], NFKβ1 [rs28362491], IL1α [rs3783553], CASP8 [rs3834129], UGT1A1 [rs8175347], PAR1 [rs11267092], CYP2E1 [INDEL 96 bp] and IL4 [rs79071878] genes in a group of 141 leprosy patients and 180 healthy individuals. The INDELs were typed by PCR Multiplex in ABI PRISM 3130 and analyzed with GeneMapper ID v3.2. The NFKβ1, CASP8, PAR1 and IL4 INDELs were associated with leprosy susceptibility, while NFKβ1, CASP8, PAR1 and CYP19A1 were associated with the MB (Multibacillary) clinical form of leprosy. NFKβ1 [rs28362491], CASP8 [rs3834129], PAR1 [rs11267092] and IL4 [rs79071878] genes are potential markers for susceptibility to leprosy development, while the INDELs in NFKβ1, CASP8, PAR1 and CYP19A1 (rs11575899) are potential markers for the severe clinical form MB. Moreover, all of these markers are influenced by genetic ancestry: European contribution increases the risk of leprosy development, whereas an increase in African contribution confers protection against leprosy.
SvABA: genome-wide detection of structural variants and indels by local assembly.
Wala, Jeremiah A; Bandopadhayay, Pratiti; Greenwald, Noah F; O'Rourke, Ryan; Sharpe, Ted; Stewart, Chip; Schumacher, Steve; Li, Yilong; Weischenfeldt, Joachim; Yao, Xiaotong; Nusbaum, Chad; Campbell, Peter; Getz, Gad; Meyerson, Matthew; Zhang, Cheng-Zhong; Imielinski, Marcin; Beroukhim, Rameen
2018-04-01
Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA's performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs and substantially improves detection performance for variants in the 20-300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (<1000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types and found that short templated-sequence insertions occur in ∼4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized (50-300 bp) SVs. © 2018 Wala et al.; Published by Cold Spring Harbor Laboratory Press.
Assessing interethnic admixture using an X-linked insertion-deletion multiplex.
Ribeiro-Rodrigues, Elzemar Martins; dos Santos, Ney Pereira Carneiro; dos Santos, Andrea Kely Campos Ribeiro; Pereira, Rui; Amorim, António; Gusmão, Leonor; Zago, Marco Antonio; dos Santos, Sidney Emanuel Batista
2009-01-01
In this study, a PCR multiplex was optimized, allowing the simultaneous analysis of 13 X-chromosome insertion/deletion polymorphisms (INDELs). Genetic variation observed in Africans, Europeans, and Native Americans reveals high inter-population variability. The estimated proportions of X-chromosomes in an admixed population from the Brazilian Amazon region show a predominant Amerindian contribution (approximately 41%), followed by European (approximately 32%) and African (approximately 27%) contributions. The proportion of Amerindian contribution based on X-linked data is similar to the expected value based on mtDNA and Y-chromosome information. The accuracy for assessing interethnic admixture, and the high differentiation between African, European, and Native American populations, demonstrate the suitability of this INDEL set for measuring ancestry proportions in three-hybrid populations, as is the case for Latin American populations.
Liu, Long; Zheng, Chuan Wei; Wang, De He; Hou, Zhuo Cheng; Ning, Zhong Hua
2015-01-01
Eggshell damage leads to economic losses in the egg production industry and is a threat to human health. We examined 49-wk-old Rhode Island White hens (Gallus gallus) that laid eggs having shells with significantly different strengths and thicknesses. We used HiSeq 2000 (Illumina) sequencing to characterize the chicken transcriptome and whole genome to identify the key genes and genetic mutations associated with eggshell calcification. We identified a total of 14,234 genes expressed in the chicken uterus, representing 89% of all annotated chicken genes. A total of 889 differentially expressed genes (DEGs) were identified by comparing low eggshell strength (LES) and normal eggshell strength (NES) genomes. The DEGs are enriched in calcification-related processes, including calcium ion transport and calcium signaling pathways, as revealed by gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. Some important matrix proteins, such as OC-116, LTF and SPP1, were also expressed differentially between the two groups. A total of 3,671,919 single-nucleotide polymorphisms (SNPs) and 508,035 Indels were detected in protein coding genes by whole-genome re-sequencing, including 1775 non-synonymous variations and 19 frame-shift Indels in DEGs. SNPs and Indels found in this study could be further investigated for eggshell traits. This is the first report to integrate transcriptome and genome re-sequencing to target the genetic variations that decrease eggshell quality. These findings further advance our understanding of eggshell calcification in the chicken uterus. PMID:25974068
Whole-genome analyses of Korean native and Holstein cattle breeds by massively parallel sequencing.
Choi, Jung-Woo; Liao, Xiaoping; Stothard, Paul; Chung, Won-Hyong; Jeon, Heoyn-Jeong; Miller, Stephen P; Choi, So-Young; Lee, Jeong-Koo; Yang, Bokyoung; Lee, Kyung-Tai; Han, Kwang-Jin; Kim, Hyeong-Cheol; Jeong, Dongkee; Oh, Jae-Don; Kim, Namshin; Kim, Tae-Hun; Lee, Hak-Kyo; Lee, Sung-Jin
2014-01-01
A main goal of cattle genomics is to identify DNA differences that account for variations in economically important traits. In this study, we performed whole-genome analyses of three important cattle breeds in Korea—Hanwoo, Jeju Heugu, and Korean Holstein—using the Illumina HiSeq 2000 sequencing platform. We achieved 25.5-, 29.6-, and 29.5-fold coverage of the Hanwoo, Jeju Heugu, and Korean Holstein genomes, respectively, and identified a total of 10.4 million single nucleotide polymorphisms (SNPs), of which 54.12% were found to be novel. We also detected 1,063,267 insertions–deletions (InDels) across the genomes (78.92% novel). Annotations of the datasets identified a total of 31,503 nonsynonymous SNPs and 859 frameshift InDels that could affect phenotypic variations in traits of interest. Furthermore, genome-wide copy number variation regions (CNVRs) were detected by comparing the Hanwoo, Jeju Heugu, and previously published Chikso genomes against that of Korean Holstein. A total of 992, 284, and 1881 CNVRs, respectively, were detected throughout the genome. Moreover, 53, 65, 45, and 82 putative regions of homozygosity (ROH) were identified in Hanwoo, Jeju Heugu, Chikso, and Korean Holstein respectively. The results of this study provide a valuable foundation for further investigations to dissect the molecular mechanisms underlying variation in economically important traits in cattle and to develop genetic markers for use in cattle breeding. PMID:24992012
PolyTB: A genomic variation map for Mycobacterium tuberculosis
Coll, Francesc; Preston, Mark; Guerra-Assunção, José Afonso; Hill-Cawthorn, Grant; Harris, David; Perdigão, João; Viveiros, Miguel; Portugal, Isabel; Drobniewski, Francis; Gagneux, Sebastien; Glynn, Judith R.; Pain, Arnab; Parkhill, Julian; McNerney, Ruth; Martin, Nigel; Clark, Taane G.
2014-01-01
Summary Tuberculosis (TB) caused by Mycobacterium tuberculosis (Mtb) is the second major cause of death from an infectious disease worldwide. Recent advances in DNA sequencing are leading to the ability to generate whole genome information in clinical isolates of M. tuberculosis complex (MTBC). The identification of informative genetic variants such as phylogenetic markers and those associated with drug resistance or virulence will help barcode Mtb in the context of epidemiological, diagnostic and clinical studies. Mtb genomic datasets are increasingly available as raw sequences, which are potentially difficult and computer intensive to process, and compare across studies. Here we have processed the raw sequence data (>1500 isolates, eight studies) to compile a catalogue of SNPs (n = 74,039, 63% non-synonymous, 51.1% in more than one isolate, i.e. non-private), small indels (n = 4810) and larger structural variants (n = 800). We have developed the PolyTB web-based tool (http://pathogenseq.lshtm.ac.uk/polytb) to visualise the resulting variation and important meta-data (e.g. in silico inferred strain-types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest, as well as examine the genomic diversity and distribution of strains. PolyTB source code is freely available to researchers wishing to develop similar tools for their pathogen of interest. PMID:24637013
Hou, Xiaoyang; Yu, Xiaoning; Du, Binghai; Liu, Kai; Yao, Liangtong; Zhang, Sicheng; Selin, C; Fernando, W G D; Wang, Chengqiang; Ding, Yanqin
2016-01-01
Sporulating bacteria such as Bacillus subtilis and Paenibacillus polymyxa exhibit sporulation deficiencies during their lifetime in a laboratory environment. In this study, the spontaneous mutants SC2-M1 and SC2-M2 of P. polymyxa SC2 lost the ability to form endospores. A global genetic and transcriptomic analysis of wild-type SC2 and the spontaneous mutants was carried out. Genome resequencing analysis revealed 14 variants in the genome of SC2-M1, including three insertions and deletions (indels), 10 single nucleotide variations (SNVs) and one intrachromosomal translocation (ITX). There were nine variants in the genome of SC2-M2, including two indels and seven SNVs. Transcriptomic analysis revealed that 266 and 272 genes showed significant differences in expression in SC2-M1 and SC2-M2, respectively, compared with the wild-type SC2. Besides sporulation-related genes, genes related to exopolysaccharide biosynthesis (eps), antibiotic (fusaricidin) synthesis, motility (flgB) and other functions were also affected in these mutants. In SC2-M2, reversion of spo0A resulted in the complete recovery of sporulation. This is the first global analysis of mutations related to sporulation deficiency in P. polymyxa. Our results demonstrate that an SNV within spo0A caused the sporulation deficiency of SC2-M2 and provide strong evidence that an arginine residue at position 211 is essential for the function of Spo0A. Copyright © 2016 The Author(s). Published by Elsevier Masson SAS. All rights reserved.
Comprehensive analysis of Arabidopsis expression level polymorphisms with simple inheritance
Plantegenet, Stephanie; Weber, Johann; Goldstein, Darlene R; Zeller, Georg; Nussbaumer, Cindy; Thomas, Jérôme; Weigel, Detlef; Harshman, Keith; Hardtke, Christian S
2009-01-01
In Arabidopsis thaliana, gene expression level polymorphisms (ELPs) between natural accessions that exhibit simple, single locus inheritance are promising quantitative trait locus (QTL) candidates to explain phenotypic variability. It is assumed that such ELPs overwhelmingly represent regulatory element polymorphisms. However, comprehensive genome-wide analyses linking expression level, regulatory sequence and gene structure variation are missing, preventing definite verification of this assumption. Here, we analyzed ELPs observed between the Eil-0 and Lc-0 accessions. Compared with non-variable controls, 5′ regulatory sequence variation in the corresponding genes is indeed increased. However, ∼42% of all the ELP genes also carry major transcription unit deletions in one parent as revealed by genome tiling arrays, representing a >4-fold enrichment over controls. Within the subset of ELPs with simple inheritance, this proportion is even higher and deletions are generally more severe. Similar results were obtained from analyses of the Bay-0 and Sha accessions, using alternative technical approaches. Collectively, our results suggest that drastic structural changes are a major cause for ELPs with simple inheritance, corroborating experimentally observed indel preponderance in cloned Arabidopsis QTL. PMID:19225455
Olsson, Sanna; Kaasalainen, Ulla; Rikkinen, Jouko
2012-02-01
In this study we reconstruct the structural evolution of the hyper-variable P6b region of the group I trnLeu intron in a monophyletic group of lichen-symbiotic Nostoc strains and establish it as a useful marker in the phylogenetic analysis of these organisms. The studied cyanobacteria occur as photosynthetic and/or nitrogen-fixing symbionts in lichen species of the diverse Nephroma guild. Phylogenetic analyses and secondary structure reconstructions are used to improve the understanding of the replication mechanisms in the P6b stem-loop and to explain the observed distribution patterns of indels. The variants of the P6b region in the Nostoc clade studied consist of different combinations of five sequence modules. The distribution of indels together with the ancestral character reconstruction performed enables the interpretation of the evolution of each sequence module. Our results indicate that the indel events are usually associated with single nucleotide changes in the P6b region and have occurred several times independently. In spite of their homoplasy, they provide phylogenetic information for closely related taxa. Thus we recognize that features of the P6b region can be used as molecular markers for species identification and phylogenetic studies involving symbiotic Nostoc cyanobacteria.
Liu, Tian-Jia; Li, Yong-Ping; Zhou, Jing-Jing; Hu, Chun-Gen; Zhang, Jin-Zhi
2018-03-01
The comprehensive genetic variation of two citrus species was analyzed at the genome and transcriptome levels. A total of 1090 differentially expressed genes were found during fruit development by RNA-sequencing. Fruit size (fruit equatorial diameter) and weight (fresh weight) are the two most important components determining yield and consumer acceptability for many horticultural crops. However, little is known about the genetic control of these traits. Here, we performed whole-genome resequencing to reveal the comprehensive genetic variation of fruit development between kumquat (Citrus japonica) and Clementine mandarin (Citrus clementina). In total, 5,865,235 single-nucleotide polymorphisms (SNPs) and 414,447 insertions/deletions (InDels) were identified in the two citrus species. Based on integrative analysis of the genome and transcriptome of fruit, 640,801 SNPs and 20,733 InDels were identified. The features, genomic distribution, functional effect, and other characteristics of these genetic variations were explored. RNA-sequencing identified 1090 differentially expressed genes (DEGs) during fruit development of kumquat and Clementine mandarin. Gene Ontology analysis revealed that these genes were involved in various molecular functions and biological processes. In addition, the genetic variation of 939 DEGs and 74 multiple fruit development pathway genes from previous reports was also identified. A global survey identified 24,237 specific alternative splicing events in the two citrus species and showed that intron retention is the most prevalent pattern of alternative splicing. These genome variation data provide a foundation for further exploration of citrus diversity and gene-phenotype relationships and for future research on molecular breeding to improve kumquat, Clementine mandarin and related species.
CRAVAT is an easy to use web-based tool for analysis of cancer variants (missense, nonsense, in-frame indel, frameshift indel, splice site). CRAVAT provides scores and a variety of annotations that assist in identification of important variants. Results are provided in an interactive, highly graphical webpage and include annotated 3D structure visualization. CRAVAT is also available for local or cloud-based installation as a Docker container. MuPIT provides 3D visualization of mutation clusters and functional annotation and is now integrated with CRAVAT.
Xu, Yao; Jiang, Yu; Shi, Tao; Cai, Hanfang; Lan, Xianyong; Zhao, Xin; Plath, Martin; Chen, Hong
2017-01-01
Whole-genome sequencing provides a powerful tool to obtain more genetic variability that could produce a range of benefits for the cattle breeding industry. Nanyang (Bos indicus) and Qinchuan (Bos taurus) are two important Chinese indigenous cattle breeds with distinct phenotypes. To identify the genetic characteristics responsible for variation in phenotypes between the two breeds, in the present study we sequenced, for the first time, the genomes of four Nanyang and four Qinchuan cattle at 10- to 12-fold depth on average, covering 97.86% and 98.98% of the genomes, respectively. Comparison with the Bos_taurus_UMD_3.1 reference assembly yielded 9,010,096 SNPs for Nanyang and 6,965,062 for Qinchuan cattle, 51% and 29% of which were novel SNPs, respectively. A total of 154,934 and 115,032 small indels (1 to 3 bp) were found in the Nanyang and Qinchuan genomes, respectively. The SNP and indel distribution revealed that Nanyang showed higher genetic diversity than Qinchuan cattle. Furthermore, a total of 2,907 putative cases of copy number variation (CNV) were identified by aligning the Nanyang to the Qinchuan genome, 783 of which (27%) encompassed the coding regions of 495 functional genes. The gene ontology (GO) analysis revealed that many CNV genes were enriched in the immune system and environment adaptability. Among several CNV genes related to lipid transport and fat metabolism, the leptin receptor gene (LEPR), overlapping with CNV_1815, showed remarkably higher copy number in Qinchuan than in Nanyang (log2 (ratio) = -2.34988; P value = 1.53E-102). Further qPCR and association analysis showed that the copy number of the LEPR gene was positively correlated with transcriptional expression and phenotypic traits, suggesting the LEPR CNV may contribute to the higher fat deposition in muscles of Qinchuan cattle. Our findings provide evidence that the distinct phenotypes of Nanyang and Qinchuan breeds may be due to the different genetic variations including SNPs, indels and CNV. PMID:28841720
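As a quick arithmetic check on the LEPR value reported above, the following sketch converts log2(ratio) = -2.34988 back into a linear ratio. The abstract does not state the direction of the ratio; the sketch assumes Nanyang over Qinchuan, which is consistent with the higher copy number reported for Qinchuan. Variable names are illustrative only.

```python
# Convert the reported log2 copy-number ratio for CNV_1815 (LEPR) to a linear ratio.
# Assumption: ratio = Nanyang / Qinchuan (the direction is not stated in the abstract).
log2_ratio = -2.34988

linear_ratio = 2 ** log2_ratio      # Nanyang signal relative to Qinchuan
fold_difference = 1 / linear_ratio  # Qinchuan signal relative to Nanyang

print(f"linear ratio: {linear_ratio:.3f}")
print(f"fold difference: {fold_difference:.1f}")
```

Under that assumption, the Nanyang signal is roughly one fifth of the Qinchuan signal, i.e. about a five-fold difference.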
Sequence variations of the bovine prion protein gene (PRNP) in native Korean Hanwoo cattle
Choi, Sangho
2012-01-01
Bovine spongiform encephalopathy (BSE) is one of the fatal neurodegenerative diseases known as transmissible spongiform encephalopathies (TSEs) caused by infectious prion proteins. Genetic variations correlated with susceptibility or resistance to TSE in humans and sheep have not been reported for bovine strains including those from Holstein, Jersey, and Japanese Black cattle. Here, we investigated bovine prion protein gene (PRNP) variations in Hanwoo cattle [Bos (B.) taurus coreanae], a native breed in Korea. We identified mutations and polymorphisms in the coding region of PRNP, determined their frequency, and evaluated their significance. We identified four synonymous polymorphisms and two non-synonymous mutations in PRNP, but found no novel polymorphisms. The sequence and number of octapeptide repeats were completely conserved, and the haplotype frequency of the coding region was similar to that of other B. taurus strains. When we examined the 23-bp and 12-bp insertion/deletion (indel) polymorphisms in the non-coding region of PRNP, Hanwoo cattle had a lower deletion allele and 23-bp del/12-bp del haplotype frequency than healthy and BSE-affected animals of other strains. Thus, Hanwoo are seemingly less susceptible to BSE than other strains due to the 23-bp and 12-bp indel polymorphisms. PMID:22705734
Walkowiak, Sean; Rowland, Owen; Rodrigue, Nicolas; Subramaniam, Rajagopal
2016-12-09
The Fusarium graminearum species complex is composed of many distinct fungal species that cause several diseases in economically important crops, including Fusarium Head Blight of wheat. Despite being closely related, these species and individuals within species have distinct phenotypic differences in toxin production and pathogenicity, with some isolates reported as non-pathogenic on certain hosts. In this report, we compare genomes and gene content of six new isolates from the species complex, including the first available genomes of F. asiaticum and F. meridionale, with four other genomes reported in previous studies. A comparison of genome structure and gene content revealed a 93-99% overlap across all ten genomes. We identified more than 700 k base pairs (kb) of single nucleotide polymorphisms (SNPs), insertions, and deletions (indels) within common regions of the genome, which validated the species and genetic populations reported within species. We constructed a non-redundant pan gene list containing 15,297 genes from the ten genomes and among them 1827 genes or 12% were absent in at least one genome. These genes were co-localized in telomeric regions and select regions within chromosomes with a corresponding increase in SNPs and indels. Many are also predicted to encode for proteins involved in secondary metabolism and other functions associated with disease. Genes that were common between isolates contained high levels of nucleotide variation and may be pseudogenes, allelic, or under diversifying selection. The genomic resources we have contributed will be useful for the identification of genes that contribute to the phenotypic variation and niche specialization that have been reported among members of the F. graminearum species complex.
Douville, Christopher; Masica, David L.; Stenson, Peter D.; Cooper, David N.; Gygax, Derek M.; Kim, Rick; Ryan, Michael
2015-01-01
ABSTRACT Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features—DNA and protein sequence conservation, indel length, and occurrence in repeat regions—are useful for inference of protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificities, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in‐frame and frameshift indels (VEST‐indel) as pathogenic or benign. We apply 24 features, including a new “PubMed” feature, to estimate a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false‐positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta‐predictors that can be obtained from combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta‐predictor with improved performance over any individual method. PMID:26442818
Douville, Christopher; Masica, David L; Stenson, Peter D; Cooper, David N; Gygax, Derek M; Kim, Rick; Ryan, Michael; Karchin, Rachel
2016-01-01
Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features--DNA and protein sequence conservation, indel length, and occurrence in repeat regions--are useful for inference of protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificities, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in-frame and frameshift indels (VEST-indel) as pathogenic or benign. We apply 24 features, including a new "PubMed" feature, to estimate a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false-positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta-predictors that can be obtained from combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta-predictor with improved performance over any individual method. © 2015 The Authors. Human Mutation published by Wiley Periodicals, Inc.
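The idea of building meta-predictors from Boolean conjunctions and disjunctions can be illustrated with a minimal sketch. It is not the authors' procedure: it only enumerates the pure AND and pure OR of every subset of two or more classifiers, which is a simplification of "all possible" Boolean combinations, and the classifier names, calls and labels are hypothetical toy data.

```python
from itertools import combinations

# Hypothetical binary calls (True = pathogenic) from four placeholder classifiers
# on six toy variants; none of these values come from the paper.
calls = {
    "clf_a": [True, True, False, True, False, False],
    "clf_b": [True, False, False, True, True, False],
    "clf_c": [True, True, True, False, False, False],
    "clf_d": [True, True, False, False, False, True],
}
truth = [True, True, True, False, False, False]  # toy labels


def specificity(pred, truth):
    """Fraction of truly benign variants that are predicted benign."""
    benign = [p for p, t in zip(pred, truth) if not t]
    return sum(not p for p in benign) / len(benign)


def sensitivity(pred, truth):
    """Fraction of truly pathogenic variants that are predicted pathogenic."""
    pathogenic = [p for p, t in zip(pred, truth) if t]
    return sum(pathogenic) / len(pathogenic)


def meta_predictors(calls):
    """Yield (name, predictions) for the AND and the OR of every subset of size >= 2."""
    names = sorted(calls)
    for k in range(2, len(names) + 1):
        for subset in combinations(names, k):
            columns = list(zip(*(calls[n] for n in subset)))
            yield " AND ".join(subset), [all(c) for c in columns]
            yield " OR ".join(subset), [any(c) for c in columns]


results = {
    name: (sensitivity(pred, truth), specificity(pred, truth))
    for name, pred in meta_predictors(calls)
}
for name, (sens, spec) in sorted(results.items()):
    print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

Conjunctions tend to trade sensitivity for specificity and disjunctions do the reverse, which is why a search over combinations can find a better operating point than any single classifier.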
Gong, Chenrui; Du, Qingzhang; Xie, Jianbo; Quan, Mingyang; Chen, Beibei; Zhang, Deqiang
2018-01-01
Short insertions and deletions (InDels) are one of the major genetic variants and are distributed widely across the genome; however, few investigations of InDels have been conducted in long-lived perennial plants. Here, we employed a combination of RNA-seq and population resequencing to identify InDels within differentially expressed (DE) genes underlying wood formation in a natural population of Populus tomentosa (435 individuals) and utilized InDel-based association mapping to detect the causal variants under additive, dominance, and epistasis underlying growth and wood properties. In the present paper, 5,482 InDels detected from 629 DE genes showed uneven distributions throughout all 19 chromosomes, and 95.9% of these loci were diallelic InDels. Seventy-four InDels (positive false discovery rate q ≤ 0.10) from 68 genes exhibited significant additive/dominant effects on 10 growth and wood-properties, with an average of 14.7% phenotypic variance explained. Potential pleiotropy was observed in one-third of the InDels (representing 24 genes). Seven genes exhibited significantly differential expression among the genotypic classes of associated InDels, indicating possible important roles for these InDels. Epistasis analysis showed that overlapping interacting genes formed unique interconnected networks for each trait, supporting the putative biochemical links that control quantitative traits. Therefore, the identification and utilization of InDels in trees will be recognized as an effective marker system for molecular marker-assisted breeding applications, and further facilitate our understanding of quantitative genomics. PMID:29403506
Singh, Vikas K; Khan, Aamir W; Saxena, Rachit K; Sinha, Pallavi; Kale, Sandip M; Parupalli, Swathi; Kumar, Vinay; Chitikineni, Annapurna; Vechalapu, Suryanarayana; Sameer Kumar, Chanda Venkata; Sharma, Mamta; Ghanta, Anuradha; Yamini, Kalinati Narasimhan; Muniswamy, Sonnappa; Varshney, Rajeev K
2017-07-01
Identification of candidate genomic regions associated with target traits using conventional mapping methods is challenging and time-consuming. In recent years, a number of single nucleotide polymorphism (SNP)-based mapping approaches have been developed and used for identification of candidate/putative genomic regions. However, in the majority of these studies, insertion-deletions (Indels) were largely ignored. For efficient use of Indels in mapping target traits, we propose the Indel-seq approach, which is a combination of whole-genome resequencing (WGRS) and bulked segregant analysis (BSA) and relies on the Indel frequencies in extreme bulks. Deployment of the Indel-seq approach for identification of candidate genomic regions associated with fusarium wilt (FW) and sterility mosaic disease (SMD) resistance in pigeonpea has identified 16 Indels affecting 26 putative candidate genes. Of these 26 affected putative candidate genes, 24 were affected upstream or downstream of the genic region and two were affected within the gene itself. Validation of these 16 candidate Indels in other FW- and SMD-resistant and FW- and SMD-susceptible genotypes revealed a significant association of five Indels (three for FW and two for SMD resistance). Comparative analysis of Indel-seq with other genetic mapping approaches highlighted the importance of the approach in identification of significant genomic regions associated with target traits. Therefore, the Indel-seq approach can be used for quick and precise identification of candidate genomic regions for any target traits in any crop species. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
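The reliance on Indel frequencies in extreme bulks can be sketched as a per-site frequency difference between the two bulks. The abstract does not give the actual statistic used by Indel-seq, so this is a generic bulked-segregant-style illustration; the site names, read counts and cutoff are hypothetical.

```python
# Hypothetical read counts supporting the indel allele in two extreme bulks.
# Tuples are (alt_reads, total_reads); the sites and counts are made up.
resistant_bulk = {"site_1": (45, 50), "site_2": (24, 48), "site_3": (3, 60)}
susceptible_bulk = {"site_1": (4, 52), "site_2": (25, 50), "site_3": (2, 55)}


def allele_frequency(alt_reads, total_reads):
    """Fraction of reads carrying the indel allele at one site."""
    return alt_reads / total_reads


def delta_frequency(bulk_a, bulk_b):
    """Per-site difference in indel allele frequency between two bulks."""
    return {
        site: allele_frequency(*bulk_a[site]) - allele_frequency(*bulk_b[site])
        for site in bulk_a
        if site in bulk_b
    }


deltas = delta_frequency(resistant_bulk, susceptible_bulk)
THRESHOLD = 0.5  # arbitrary illustrative cutoff, not taken from the paper
candidates = [site for site, d in deltas.items() if abs(d) >= THRESHOLD]
print(candidates)
```

Sites where the indel allele is common in one bulk and rare in the other stand out with a large absolute difference, while sites unlinked to the trait stay near zero.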
Chen, Qi; Thomas, Joseph T; Giménez-Lirola, Luis G; Hardham, John M; Gao, Qinshan; Gerber, Priscilla F; Opriessnig, Tanja; Zheng, Ying; Li, Ganwu; Gauger, Phillip C; Madson, Darin M; Magstadt, Drew R; Zhang, Jianqiang
2016-04-05
At least two genetically different porcine epidemic diarrhea virus (PEDV) strains have been identified in the United States (U.S. PEDV prototype and S-INDEL-variant strains). The current serological assays offered at veterinary diagnostic laboratories for detection of PEDV-specific antibody are based on the U.S. PEDV prototype strain. The objectives of this study were: 1) isolate the U.S. PEDV S-INDEL-variant strain in cell culture; 2) generate antisera against the U.S. PEDV prototype and S-INDEL-variant strains by experimentally infecting weaned pigs; 3) determine if the various PEDV serological assays could detect antibodies against the U.S. PEDV S-INDEL-variant strain and vice versa. A U.S. PEDV S-INDEL-variant strain was isolated in cell culture in this study. Three groups of PEDV-negative, 3-week-old pigs (five pigs per group) were inoculated orally with a U.S. PEDV prototype isolate (previously isolated in our lab), an S-INDEL-variant isolate or virus-negative culture medium. Serum samples collected at 0, 7, 14, 21 and 28 days post inoculation were evaluated by the following PEDV serological assays: 1) indirect fluorescent antibody (IFA) assays using the prototype and S-INDEL-variant strains as indicator viruses; 2) virus neutralization (VN) tests against the prototype and S-INDEL-variant viruses; 3) PEDV prototype strain whole virus based ELISA; 4) PEDV prototype strain S1-based ELISA; and 5) PEDV S-INDEL-variant strain S1-based ELISA. The positive antisera against the prototype strain reacted to and neutralized both prototype and S-INDEL-variant viruses, and the positive antisera against the S-INDEL-variant strain also reacted to and neutralized both prototype and S-INDEL-variant viruses, as examined by IFA antibody assays and VN tests. Antibodies against the two PEDV strains could be detected by all three ELISAs although detection rates varied to some degree. These data indicate that the antibodies against U.S. PEDV prototype and S-INDEL-variant strains cross-reacted and cross-neutralized both strains in vitro. The current serological assays based on U.S. PEDV prototype strain can detect antibodies against both U.S. PEDV strains.
Genetic variation of six desaturase genes in flax and their impact on fatty acid composition.
Thambugala, Dinushika; Duguid, Scott; Loewen, Evelyn; Rowland, Gordon; Booker, Helen; You, Frank M; Cloutier, Sylvie
2013-10-01
Flax (Linum usitatissimum L.) is one of the richest plant sources of omega-3 fatty acids praised for their health benefits. In this study, the extent of the genetic variability of genes encoding stearoyl-ACP desaturase (SAD), and fatty acid desaturase 2 (FAD2) and 3 (FAD3) was determined by sequencing the six paralogous genes from 120 flax accessions representing a broad range of germplasm including some EMS mutant lines. A total of 6 alleles for sad1 and sad2, 21 for fad2a, 5 for fad2b, 15 for fad3a and 18 for fad3b were identified. Deduced amino acid sequences of the alleles predicted 4, 2, 3, 4, 6 and 7 isoforms, respectively. Allele frequencies varied greatly across genes. Fad3a, with 110 SNPs and 19 indels, and fad3b, with 50 SNPs and 5 indels, showed the highest levels of genetic variations. While most of the SNPs and all the indels were silent mutations, both genes carried nonsense SNP mutations resulting in premature stop codons, a feature not observed in sad and fad2 genes. Some alleles and isoforms discovered in induced mutant lines were absent in the natural germplasm. Correlation of these genotypic data with fatty acid composition data of 120 flax accessions phenotyped in six field experiments revealed statistically significant effects of some of the SAD and FAD isoforms on fatty acid composition, oil content and iodine value. The novel allelic variants and isoforms identified for the six desaturases will be a resource for the development of oilseed flax with unique and useful fatty acid profiles.
Comeron, Josep M; Reed, Jordan; Christie, Matthew; Jacobs, Julia S; Dierdorff, Jason; Eberl, Daniel F; Manak, J Robert
2016-04-05
Accurate and rapid identification or confirmation of single nucleotide polymorphisms (SNPs), point mutations and other human genomic variation facilitates understanding the genetic basis of disease. We have developed a new methodology (called MENA (Mismatch EndoNuclease Array)) pairing DNA mismatch endonuclease enzymology with tiling microarray hybridization in order to genotype both known point mutations (such as SNPs) as well as identify previously undiscovered point mutations and small indels. We show that our assay can rapidly genotype known SNPs in a human genomic DNA sample with 99% accuracy, in addition to identifying novel point mutations and small indels with a false discovery rate as low as 10%. Our technology provides a platform for a variety of applications, including: (1) genotyping known SNPs as well as confirming newly discovered SNPs from whole genome sequencing analyses; (2) identifying novel point mutations and indels in any genomic region from any organism for which genome sequence information is available; and (3) screening panels of genes associated with particular diseases and disorders in patient samples to identify causative mutations. As a proof of principle for using MENA to discover novel mutations, we report identification of a novel allele of the beethoven (btv) gene in Drosophila, which encodes a ciliary cytoplasmic dynein motor protein important for auditory mechanosensation.
Genetic Differentiation of North-East Argentina Populations Based on 30 Binary X Chromosome Markers.
Di Santo Meztler, Gabriela P; Del Palacio, Santiago; Esteban, María E; Armoa, Isaías; Argüelles, Carina F; Catanesi, Cecilia I
2018-01-01
Alu insertions, INDELs, and SNPs in the X chromosome can be useful not only for revealing relationships among populations but also for identification purposes. We present data of 10 Alu insertions, 5 INDELs, and 15 SNPs of X-chromosome from three Argentinian north-east cities in order to gain insight into the genetic diversity of the X chromosome within this region of the country. Data from 198 unrelated individuals belonging to Posadas, Corrientes, and Eldorado cities were genotyped for Ya5DP62, Yb8DP49, Ya5DP3, Ya5NBC37, Ya5DP77, Ya5NBC491, Ya5DP4, Ya5DP13, Yb8NBC634, and Yb8NBC102 Alu insertions, for MID193, MID1705, MID3754, MID3756 and MID1540 Indels and for rs6639398, rs5986751, rs5964206, rs9781645, rs2209420, rs1299087, rs318173, rs933315, rs1991961, rs4825889, rs1781116, rs1937193, rs1781104, rs149910, and rs652 SNPs. No deviations from Hardy-Weinberg equilibrium were observed for Posadas and Corrientes. However, Eldorado showed significant values, and it was found to have an internal substructuring with two groups of different origin, one showing higher similarity with European countries, and the other with more similarities to Posadas and Corrientes. FST pairwise genetic distances emerged for some markers among the studied populations and also between our data and those from other countries and continents. Of particular interest, Alu insertions demonstrated the most differences, and could be of use in ancestry studies for these populations, while INDELs and SNPs variation were informative for differentiation within the country.
Huang, Y; Zheng, J; Hu, J D; Wu, Y A; Zheng, X Y; Liu, T B; Chen, F L
2014-02-19
We performed whole-exome sequencing in samples representing accelerated phase (AP) and blastic crisis (BC) in a subject with chronic myeloid leukemia (CML). A total of 12.74 Gb clean data were generated, achieving a mean depth coverage of 64.45 and 69.53 for AP and BC samples, respectively, of the target region. A total of 148 somatic variants were detected, including 76 insertions and deletions (indels), 64 single-nucleotide variations (SNV), and 8 structural variations (SV). On the basis of annotation and functional prediction analysis, we identified 3 SNVs and 6 SVs that showed a potential association with CML progression. Among the genes that harbor the identified variants, GATA2 has previously been reported to play important roles in the progression from AP to BC in CML. Identification of these genes will allow us to gain a better understanding of the pathological mechanism of CML and represents a critical advance toward new molecular diagnostic tests for the development of potential therapies for CML.
Jiang, Qian; Meng, Xing; Meng, Lingwei; Chang, Nannan; Xiong, Jingwei; Cao, Huiqing; Liang, Zicai
2014-01-01
MicroRNA knockout by genome editing technologies is promising. In order to extend the application of the technology and to investigate the function of a specific miRNA, we used CRISPR/Cas9 to deplete human miR-93 from a cluster by targeting its 5' region in HeLa cells. Various small indels were induced in the targeted region containing the Drosha processing site and seed sequences. Interestingly, we found that even a single nucleotide deletion led to complete knockout of the target miRNA with high specificity. Functional knockout was confirmed by phenotype analysis. Furthermore, de novo microRNAs were not found by RNA-seq. Nevertheless, expression of the pri-microRNAs was increased. When combined with structural analysis, the data indicated that biogenesis was impaired. Altogether, we showed that small indels in the 5' region of a microRNA result in sequence depletion as well as impaired Drosha processing.
Restricted DCJ-indel model: sorting linear genomes with DCJ and indels
2012-01-01
Background: The double-cut-and-join (DCJ) is a model that is able to efficiently sort a genome into another, generalizing the typical mutations (inversions, fusions, fissions, translocations) to which genomes are subject, but allowing the existence of circular chromosomes at the intermediate steps. In the general model many circular chromosomes can coexist in some intermediate step. However, when the compared genomes are linear, it is more plausible to use the so-called restricted DCJ model, in which a circular chromosome is reincorporated immediately after its creation. These two consecutive DCJ operations, which create and reincorporate a circular chromosome, mimic a transposition or a block-interchange. When the compared genomes have the same content, it is known that the genomic distance for the restricted DCJ model is the same as the distance for the general model. If the genomes have unequal contents, in addition to DCJ it is necessary to consider indels, which are insertions and deletions of DNA segments. Linear time algorithms were proposed to compute the distance and to find a sorting scenario in a general, unrestricted DCJ-indel model that considers DCJ and indels. Results: In the present work we consider the restricted DCJ-indel model for sorting linear genomes with unequal contents. We allow DCJ operations and indels with the following constraint: if a circular chromosome is created by a DCJ, it has to be reincorporated in the next step (no other DCJ or indel can be applied between the creation and the reincorporation of a circular chromosome). We then develop a sorting algorithm and give a tight upper bound for the restricted DCJ-indel distance. Conclusions: We have given a tight upper bound for the restricted DCJ-indel distance. The question whether this bound can be reduced so that both the general and the restricted DCJ-indel distances are equal remains open. PMID:23281630
PolyTB: a genomic variation map for Mycobacterium tuberculosis.
Coll, Francesc; Preston, Mark; Guerra-Assunção, José Afonso; Hill-Cawthorn, Grant; Harris, David; Perdigão, João; Viveiros, Miguel; Portugal, Isabel; Drobniewski, Francis; Gagneux, Sebastien; Glynn, Judith R; Pain, Arnab; Parkhill, Julian; McNerney, Ruth; Martin, Nigel; Clark, Taane G
2014-05-01
Tuberculosis (TB) caused by Mycobacterium tuberculosis (Mtb) is the second major cause of death from an infectious disease worldwide. Recent advances in DNA sequencing are leading to the ability to generate whole genome information in clinical isolates of M. tuberculosis complex (MTBC). The identification of informative genetic variants such as phylogenetic markers and those associated with drug resistance or virulence will help barcode Mtb in the context of epidemiological, diagnostic and clinical studies. Mtb genomic datasets are increasingly available as raw sequences, which are potentially difficult and computer intensive to process, and compare across studies. Here we have processed the raw sequence data (>1500 isolates, eight studies) to compile a catalogue of SNPs (n = 74,039, 63% non-synonymous, 51.1% in more than one isolate, i.e. non-private), small indels (n = 4810) and larger structural variants (n = 800). We have developed the PolyTB web-based tool (http://pathogenseq.lshtm.ac.uk/polytb) to visualise the resulting variation and important meta-data (e.g. in silico inferred strain-types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest, as well as examine the genomic diversity and distribution of strains. PolyTB source code is freely available to researchers wishing to develop similar tools for their pathogen of interest. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
Kepler, Thomas B.; Liao, Hua-Xin; Alam, S. Munir; Bhaskarabhatla, Rekha; Zhang, Ruijun; Stewart, Shelley; Anasti, Kara; Kelsoe, Garnett; Parks, Robert; Lloyd, Krissey E.; Stolarchuk, Christina; Pritchett, Jamie; Solomon, Erika; Friberg, Emma; Morris, Lynn; Karim, Salim S. Abdool; Cohen, Myron S.; Walter, Emmanuel; Moody, M. Anthony; Wu, Xueling; Altae-Tran, Han R.; Georgiev, Ivelin S.; Kwong, Peter D.; Boyd, Scott D.; Fire, Andrew Z.; Mascola, John R.; Haynes, Barton F.
2014-01-01
Summary Induction of HIV-1 broad neutralizing antibodies (bnAbs) is a goal of HIV-1 vaccine development but has remained challenging partially due to unusual traits of bnAbs, including high somatic hypermutation (SHM) frequencies and in-frame insertions and deletions (indels). Here we examined the propensity and functional requirement for indels within HIV-1 bnAbs. High-throughput sequencing of the immunoglobulin (Ig) VHDJH genes in HIV-1 infected and uninfected individuals revealed that the indel frequency was elevated among HIV-1-infected subjects, with no unique properties attributable to bnAb-producing individuals. This increased indel occurrence depended only on the frequency of SHM point-mutations. Indel-encoded regions were generally proximal to antigen binding sites. Additionally, reconstruction of a HIV-1 CD4-binding site bnAb clonal lineage revealed that a large compound VHDJH indel was required for bnAb activity. Thus, vaccine development should focus on designing regimens targeted at sustained activation of bnAb lineages to achieve the required SHM and indel events. PMID:25211073
Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs.
Saunders, Christopher T; Wong, Wendy S W; Swamy, Sajani; Becq, Jennifer; Murray, Lisa J; Cheetham, R Keira
2012-07-15
Whole genome and exome sequencing of matched tumor-normal sample pairs is becoming routine in cancer research. The consequent increased demand for somatic variant analysis of paired samples requires methods specialized to model this problem so as to sensitively call variants at any practical level of tumor impurity. We describe Strelka, a method for somatic SNV and small indel detection from sequencing data of matched tumor-normal samples. The method uses a novel Bayesian approach which represents continuous allele frequencies for both tumor and normal samples, while leveraging the expected genotype structure of the normal. This is achieved by representing the normal sample as a mixture of germline variation with noise, and representing the tumor sample as a mixture of the normal sample with somatic variation. A natural consequence of the model structure is that sensitivity can be maintained at high tumor impurity without requiring purity estimates. We demonstrate that the method has superior accuracy and sensitivity on impure samples compared with approaches based on either diploid genotype likelihoods or general allele-frequency tests. The Strelka workflow source code is available at ftp://strelka@ftp.illumina.com/. Contact: csaunders@illumina.com
USDA-ARS's Scientific Manuscript database
The PCR-based Escherichia coli O157 (O157) strain typing system, Polymorphic Amplified Typing Sequences (PATS), targets insertions-deletions (Indels) and single nucleotide polymorphisms (SNPs) at the XbaI and AvrII(BlnI) restriction enzyme sites, respectively, besides amplifying four known virulenc...
Unique haplotypes of cacao trees as revealed by trnH-psbA chloroplast DNA
Gutiérrez-López, Nidia; Ovando-Medina, Isidro; Salvador-Figueroa, Miguel; Molina-Freaner, Francisco; Avendaño-Arrazate, Carlos H.
2016-01-01
Cacao trees have been cultivated in Mesoamerica for at least 4,000 years. In this study, we analyzed sequence variation in the chloroplast DNA trnH-psbA intergenic spacer from 28 cacao trees from different farms in the Soconusco region in southern Mexico. Genetic relationships were established by two analysis approaches based on geographic origin (five populations) and genetic origin (based on a previous study). We identified six polymorphic sites, including five insertion/deletion (indels) types and one transversion. The overall nucleotide diversity was low for both approaches (geographic = 0.0032 and genetic = 0.0038). Conversely, we obtained moderate to high haplotype diversity (0.66 and 0.80) with 10 and 12 haplotypes, respectively. The common haplotype (H1) for both networks included cacao trees from all geographic locations (geographic approach) and four genetic groups (genetic approach). This common haplotype (ancient) derived a set of intermediate haplotypes and singletons interconnected by one or two mutational steps, which suggested directional selection and event purification from the expansion of narrow populations. Cacao trees from Soconusco region were grouped into one cluster without any evidence of subclustering based on AMOVA (FST = 0) and SAMOVA (FST = 0.04393) results. One population (Mazatán) showed a high haplotype frequency; thus, this population could be considered an important reservoir of genetic material. The indels located in the trnH-psbA intergenic spacer of cacao trees could be useful as markers for the development of DNA barcoding. PMID:27076998
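The diversity statistics quoted in the abstract above can be illustrated with a minimal Python sketch. It assumes Nei's standard estimators for haplotype diversity and nucleotide diversity, compares aligned sequences character by character (so a gap is treated as just another character, which is a simplification), and uses invented toy sequences. The function names are hypothetical and this is not the authors' analysis pipeline.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    counts = Counter(seqs)  # identical sequences count as one haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site over all sequence pairs."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


# Toy aligned sequences (made up for illustration only).
toy = ["AAAA", "AAAA", "AAAT", "AATT"]
h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
```

A data set with few variable sites but many distinct combinations of them gives a low `pi` and a comparatively high `h`, which is the pattern the abstract reports.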
UCHIDA, Leo; HERIYANTO, Agus; THONGCHAI, Chalermchaikit; HANH, Tran Thi; HORIUCHI, Motohiro; ISHIHARA, Kanako; TAMURA, Yutaka; MURAMATSU, Yasukazu
2014-01-01
ABSTRACT There has been an accumulation of information on frequencies of insertion/deletion (indel) polymorphisms within the bovine prion protein gene (PRNP) and on the number of octapeptide repeats and single nucleotide polymorphisms (SNPs) in the coding region of bovine PRNP related to bovine spongiform encephalopathy (BSE) susceptibility. We investigated the frequencies of 23-bp indel polymorphism in the promoter region (23indel) and 12-bp indel polymorphism in intron 1 region (12indel), octapeptide repeat polymorphisms and SNPs in the bovine PRNP of cattle and water buffaloes in Vietnam, Indonesia and Thailand. The frequency of the deletion allele in the 23indel site was significantly low in cattle of Indonesia and Thailand and water buffaloes. The deletion allele frequency in the 12indel site was significantly low in all of the cattle and buffaloes categorized in each subgroup. In both indel sites, the deletion allele has been reported to be associated with susceptibility to classical BSE. In some Indonesian local cattle breeds, the frequency of the allele with 5 octapeptide repeats was significantly high despite the fact that the allele with 6 octapeptide repeats has been reported to be most frequent in many breeds of cattle. Four SNPs observed in Indonesian local cattle have not been reported for domestic cattle. This study provided information on PRNP of livestock in these Southeast Asian countries. PMID:24705506
Rajkumar, Sankaranarayanan; Vasavada, Abhay R.; Praveen, Mamidipudi R.; Ananthan, Rajendran; Reddy, Geereddy B.; Tripathi, Harsha; Ganatra, Darshini A.; Arora, Anshul I.; Patel, Alpesh R.
2013-01-01
Purpose. To explore different molecular factors impairing the activities of superoxide dismutase (SOD) isoforms in senile cataractous lenses. Methods. Enzyme activity of SOD isoforms, levels of their corresponding cofactors copper (Cu), manganese (Mn), zinc (Zn), and expression of mRNA transcripts and proteins were determined in the lenses of human subjects with and without cataract. DNA from lens epithelium (LE) and peripheral blood was isolated. Polymerase chain reaction–single strand conformation polymorphism (PCR-SSCP) followed by sequencing was carried out to screen somatic mutations. The impact of intronic insertion/deletion (INDEL) variations on the splicing process and on the resultant transcript was evaluated. Genotyping of IVS4+42delG polymorphism of SOD1 gene was done by PCR–restriction fragment length polymorphism (RFLP). Results. A significant decrease in Cu/Zn- and Mn-SOD activity (P < 0.001) and in Cu/Zn-SOD transcript (P < 0.001) and its protein (P < 0.05) were found in cataractous lenses. No significant change in the level of copper (P = 0.36) and an increase in the level of manganese (P = 0.01) and zinc (P = 0.02) were observed in cataractous lenses. A significant positive correlation between the level of Cu/Zn-SOD activity and the levels of Cu (P = 0.003) and Zn (P = 0.005) was found in the cataractous lenses. DNA sequencing revealed three intronic INDEL variations in exon4 of SOD1 gene. Splice-junction analysis showed the potential of IVS4+42delG in creating a new cryptic acceptor site. If it is involved in alternate splicing, it could result in generation of SOD1 mRNA transcripts lacking exon4 region. Transcript analysis revealed the presence of complete SOD1 mRNA transcripts. Genotyping revealed the presence of IVS4+42delG polymorphism in all subjects. Conclusions. 
The decrease in the activity of SOD1 isoform in cataractous lenses was associated with the decreased level of mRNA transcripts and their protein expression and was not associated with either modulation in the level of enzyme cofactors or with INDEL variations. PMID:23970468
Deep whole-genome sequencing of 90 Han Chinese genomes.
Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen
2017-09-01
Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. 
Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects. © The Authors 2017. Published by Oxford University Press.
Ajawatanawong, Pravech; Atkinson, Gemma C; Watson-Haigh, Nathan S; Mackenzie, Bryony; Baldauf, Sandra L
2012-07-01
Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.
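To make the idea of extracting indel regions from an alignment concrete, here is a minimal Python sketch. It is not SeqFIRE's algorithm: it simply assumes that an indel region is a maximal run of alignment columns in which at least one sequence has a gap character, and it computes per-column Shannon entropy over non-gap residues as a crude variability measure. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def find_indel_regions(alignment, gap="-"):
    """Return (start, end) column ranges, 0-based and end-exclusive,
    covering maximal runs of columns that contain at least one gap."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    regions = []
    start = None
    for col in range(length):
        has_gap = any(seq[col] == gap for seq in alignment)
        if has_gap and start is None:
            start = col  # a gapped run begins here
        elif not has_gap and start is not None:
            regions.append((start, col))  # the run ended at the previous column
            start = None
    if start is not None:
        regions.append((start, length))  # run extends to the end of the alignment
    return regions


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the non-gap residues in one alignment column."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


# Toy protein alignment (made up for illustration only).
toy_alignment = ["MKT--LV", "MKTAALV", "MK-AALV"]
regions = find_indel_regions(toy_alignment)
```

A real tool would additionally require conserved flanking columns and let the user tune thresholds, as the abstract describes; those refinements are omitted here.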
Genome-wide variation within and between wild and domestic yak.
Wang, Kun; Hu, Quanjun; Ma, Hui; Wang, Lizhong; Yang, Yongzhi; Luo, Wenchun; Qiu, Qiang
2014-07-01
The yak is one of the few animals that can thrive in the harsh environment of the Qinghai-Tibetan Plateau and adjacent Alpine regions. Yak provides essential resources allowing Tibetans to live at high altitudes. However, genetic variation within and between wild and domestic yak remains unknown. Here, we present a genome-wide study of the genetic variation within and between wild and domestic yak. Using next-generation sequencing technology, we resequenced three wild and three domestic yak with a mean of fivefold coverage using our published domestic yak genome as a reference. We identified a total of 8.38 million SNPs (7.14 million novel), 383,241 InDels and 126,352 structural variants between the six yak. We observed higher linkage disequilibrium in domestic yak than in wild yak and a modest but distinct genetic divergence between these two groups. We further identified more than a thousand potential selected regions (PSRs) for the three domestic yak by scanning the whole genome. These genomic resources can be further used to study genetic diversity and select superior breeds of yak and other bovid species. © 2014 John Wiley & Sons Ltd.
indCAPS: A tool for designing screening primers for CRISPR/Cas9 mutagenesis events.
Hodgens, Charles; Nimchuk, Zachary L; Kieber, Joseph J
2017-01-01
Genetic manipulation of organisms using CRISPR/Cas9 technology generally produces small insertions/deletions (indels) that can be difficult to detect. Here, we describe a technique to easily and rapidly identify such indels. Sequence-identified mutations that alter a restriction enzyme recognition site can be readily distinguished from wild-type alleles using a cleaved amplified polymorphic sequence (CAPS) technique. If a restriction site is created or altered by the mutation such that only one allele contains the restriction site, a polymerase chain reaction (PCR) followed by a restriction digest can be used to distinguish the two alleles. However, in the case of most CRISPR-induced alleles, no such restriction sites are present in the target sequences. In this case, a derived CAPS (dCAPS) approach can be used in which mismatches are purposefully introduced in the oligonucleotide primers to create a restriction site in one, but not both, of the amplified templates. Web-based tools exist to aid dCAPS primer design, but when supplied sequences that include indels, the current tools often fail to suggest appropriate primers. Here, we report the development of a Python-based, species-agnostic web tool, called indCAPS, suitable for the design of PCR primers used in dCAPS assays that is compatible with indels. This tool should have wide utility for screening editing events following CRISPR/Cas9 mutagenesis as well as for identifying specific editing events in a pool of CRISPR-mediated mutagenesis events. This tool was field-tested in a CRISPR mutagenesis experiment targeting a cytokinin receptor (AHK3) in Arabidopsis thaliana. The tool suggested primers that successfully distinguished between wild-type and edited alleles of a target locus and facilitated the isolation of two novel ahk3 null alleles. Users can access indCAPS and design PCR primers to employ dCAPS to identify CRISPR/Cas9 alleles at http://indcaps.kieber.cloudapps.unc.edu/.
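The CAPS principle described in the abstract above (a restriction site present in only one of two alleles) can be sketched in a few lines of Python. This is only an illustration of that check, not the indCAPS tool and not a dCAPS primer designer. The function names and the toy sequences are hypothetical; the only external fact assumed is that EcoRI recognises GAATTC and BamHI recognises GGATCC.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_site(sequence, site):
    """True if the recognition site occurs on either strand of the sequence."""
    sequence = sequence.upper()
    site = site.upper()
    return site in sequence or reverse_complement(site) in sequence


def caps_distinguishes(wild_type, mutant, site):
    """True if exactly one of the two alleles carries the recognition site,
    so a digest of the PCR product would cut one allele but not the other."""
    return has_site(wild_type, site) != has_site(mutant, site)


# Toy amplicons (made up): the mutant carries a 1-bp deletion inside GAATTC.
wild_type = "ATGCGAATTCGGA"
mutant = "ATGCGATTCGGA"
usable_ecori = caps_distinguishes(wild_type, mutant, "GAATTC")
usable_bamhi = caps_distinguishes(wild_type, mutant, "GGATCC")
```

When no enzyme passes this check for a given edit, the dCAPS approach in the abstract applies: mismatches are introduced through the primer so that a site appears in one amplified allele only. Designing those primers is the part this sketch leaves out.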
19 CFR 134.43 - Methods of marking specific articles.
Code of Federal Regulations, 2014 CFR
2014-04-01
... origin by cutting, die-sinking, engraving, stamping, or some other permanent method. The indelible... metal or plastic tag indelibly marked with the country of origin and permanently attached to the article... crafts must be indelibly marked with the country of origin by means of cutting, die-sinking, engraving...
19 CFR 134.43 - Methods of marking specific articles.
Code of Federal Regulations, 2013 CFR
2013-04-01
... origin by cutting, die-sinking, engraving, stamping, or some other permanent method. The indelible... metal or plastic tag indelibly marked with the country of origin and permanently attached to the article... crafts must be indelibly marked with the country of origin by means of cutting, die-sinking, engraving...
Hwang, Ui Wook
2007-04-01
The phylogenetic position of the diplomonad protist Giardia, a principal cause of diarrhea, among eukaryotes has been vigorously debated. Through comparisons of the primary and secondary structures of the SSU rRNAs of G. intestinalis, G. microti, G. ardeae, and G. muris, I found two major indel regions (a 6-nt indel and a 22-26-nt indel), which correspond to helix 10 of the V2 region and helices E23-8 to E23-9 of the V4 region, respectively. As generally shown in eukaryotes, G. intestinalis and G. microti share a relatively longer helix 10 (a 7-bp stem and a 4-nt loop), and also the eukaryote-specific helices E23-6 to E23-9. On the other hand, G. muris and G. ardeae have a shorter helix 10: a 2-bp stem and a 6-nt loop in G. ardeae and a 3-bp stem and a 6-nt loop in G. muris. In the V4 region, they have a single long helix (like the P23-1 helix in prokaryotes) instead of helices E23-6 to E23-9. Among the four Giardia species, the co-appearance of prokaryote- and eukaryote-typical features might be significant evidence to suggest that Giardia (Archezoa) is a living fossil showing an "intermediate stage" during the evolution from prokaryotes to eukaryotes.
Jiang, Shu-Ye; Ma, Ali; Ramamoorthy, Rengasamy; Ramachandran, Srinivasan
2013-01-01
Expression profiling is one of the most important tools for dissecting biological functions of genes and the upregulation or downregulation of gene expression is sufficient for recreating phenotypic differences. Expression divergence of genes significantly contributes to phenotypic variations. However, little is known about the molecular basis of expression divergence and evolution among rice genotypes with contrasting phenotypes. In this study, we have implemented an integrative approach using bioinformatics and experimental analyses to provide insights into genomic variation, expression divergence, and evolution between salinity-sensitive rice variety Nipponbare and tolerant rice line Pokkali under normal and high salinity stress conditions. We have detected thousands of differentially expressed genes between these two genotypes and thousands of up- or downregulated genes under high salinity stress. Many genes were first detected with expression evidence using custom microarray analysis. Some gene families were preferentially regulated by high salinity stress and might play key roles in stress-responsive biological processes. Genomic variations in promoter regions resulting from single nucleotide polymorphisms, indels (1–10 bp of insertion/deletion), and structural variations significantly contributed to the expression divergence and regulation. Our data also showed that tandem and segmental duplication, CACTA and hAT elements played roles in the evolution of gene expression divergence and regulation between these two contrasting genotypes under normal or high salinity stress conditions. PMID:24121498
Cingoz, Sultan; Agilkaya, Sinem; Oztura, Ibrahim; Eroglu, Secil; Karadeniz, Derya; Evlice, Ahmet; Altungoz, Oguz; Yilmaz, Hikmet; Baklan, Baris
2014-04-01
The HLA-DQB1*06:02 allele across all ethnic groups and the rs5770917 variation between CPT1B and CHKB genes in Japanese and Koreans are common genetic susceptibility factors for narcolepsy. This comprehensive genetic study sought to assess variations in CHKB and CPT1B susceptibility genes and HLA-DQB1*06:02 allele status in Turkish patients with narcolepsy and healthy persons. CHKB/CPT1B genes were sequenced in patients with narcolepsy (n=37) and healthy persons (n=100) to detect variations. The HLA-DQB1*06:02 allele status was determined by sequence specific polymerase chain reaction. The HLA-DQB1*06:02 allele was significantly more frequent in narcoleptic patients than in healthy persons (p=2×10(-7)) and in patients with narcolepsy and cataplexy than in those without (p=0.018). The mean of the multiple sleep latency test, sleep-onset rapid eye movement periods, and frequency of sleep paralysis significantly differed in the HLA-DQB1*06:02-positive patients. rs5770917, rs5770911, rs2269381, and rs2269382 were detected together as a haplotype in three patients and 11 healthy persons. In addition to this haplotype, the indel variation (rs144647670) was detected in the 5' upstream region of the human CHKB gene in the patients and healthy persons carrying four variants together. This study identified a novel haplotype consisting of the indel variation, which had not been detected in previous studies in Japanese and Korean populations, and observed four single-nucleotide polymorphisms in CHKB/CPT1B. The study confirmed the association of the HLA-DQB1*06:02 allele with narcolepsy and cataplexy susceptibility. The findings suggest that the presence of HLA-DQB1*06:02 may be a predictor of cataplexy in narcoleptic patients and could therefore be used as an additional diagnostic marker alongside hypocretin.
Carneiro, Miguel; Ferrand, Nuno
2007-01-01
Kappa-casein (CSN3) plays an important role in stabilising the Ca-sensitive caseins in the micelle. The European rabbit (Oryctolagus cuniculus) CSN3 has previously been shown to possess two alleles (A and B), which differ deeply in their intronic regions (indels of 100 and 1550 nucleotides in introns 1 and 4, respectively). Furthermore, a correlation between several reproductive performance traits and the different alleles was described. However, all these data were exclusively collected in rabbit domestic breeds, preventing a deeper understanding of the extensive polymorphism observed in the CSN3 gene. Additionally, the techniques available for the typing of both indel polymorphisms were until now not suitable for large-scale studies. In this report, we describe a simple, PCR-based typing method to distinguish rabbit CSN3 alleles. We analyse both ancient wild rabbit populations from the Iberian Peninsula and France, and the more recently derived English wild rabbits and domestic stocks. A new allele (C) showing another major indel (250 bp) in intron 1 was found, but exclusively detected in Iberian wild rabbits. In addition, our survey revealed the occurrence of new haplotypes in wild populations, suggesting that intragenic recombination is important in creating genetic diversity at this locus. This easy and low cost single-step PCR-based method results in an improvement over previous described techniques, can be easily set up in a routine molecular laboratory and would probably be a valuable tool in the management of rabbit domestic breeds. PMID:17433245
The Genome of the Netherlands: design, and project goals.
Boomsma, Dorret I; Wijmenga, Cisca; Slagboom, Eline P; Swertz, Morris A; Karssen, Lennart C; Abdellaoui, Abdel; Ye, Kai; Guryev, Victor; Vermaat, Martijn; van Dijk, Freerk; Francioli, Laurent C; Hottenga, Jouke Jan; Laros, Jeroen F J; Li, Qibin; Li, Yingrui; Cao, Hongzhi; Chen, Ruoyan; Du, Yuanping; Li, Ning; Cao, Sujie; van Setten, Jessica; Menelaou, Androniki; Pulit, Sara L; Hehir-Kwa, Jayne Y; Beekman, Marian; Elbers, Clara C; Byelas, Heorhiy; de Craen, Anton J M; Deelen, Patrick; Dijkstra, Martijn; den Dunnen, Johan T; de Knijff, Peter; Houwing-Duistermaat, Jeanine; Koval, Vyacheslav; Estrada, Karol; Hofman, Albert; Kanterakis, Alexandros; Enckevort, David van; Mai, Hailiang; Kattenberg, Mathijs; van Leeuwen, Elisabeth M; Neerincx, Pieter B T; Oostra, Ben; Rivadeneira, Fernanodo; Suchiman, Eka H D; Uitterlinden, Andre G; Willemsen, Gonneke; Wolffenbuttel, Bruce H; Wang, Jun; de Bakker, Paul I W; van Ommen, Gert-Jan; van Duijn, Cornelia M
2014-02-01
Within the Netherlands a national network of biobanks has been established (Biobanking and Biomolecular Research Infrastructure-Netherlands (BBMRI-NL)) as a national node of the European BBMRI. One of the aims of BBMRI-NL is to enrich biobanks with different types of molecular and phenotype data. Here, we describe the Genome of the Netherlands (GoNL), one of the projects within BBMRI-NL. GoNL is a whole-genome-sequencing project in a representative sample consisting of 250 trio-families from all provinces in the Netherlands, which aims to characterize DNA sequence variation in the Dutch population. The parent-offspring trios include adult individuals ranging in age from 19 to 87 years (mean=53 years; SD=16 years) from birth cohorts 1910-1994. Sequencing was done on blood-derived DNA from uncultured cells and accomplished coverage was 14-15x. The family-based design represents a unique resource to assess the frequency of regional variants, accurately reconstruct haplotypes by family-based phasing, characterize short indels and complex structural variants, and establish the rate of de novo mutational events. GoNL will also serve as a reference panel for imputation in the available genome-wide association studies in Dutch and other cohorts to refine association signals and uncover population-specific variants. GoNL will create a catalog of human genetic variation in this sample that is uniquely characterized with respect to micro-geographic location and a wide range of phenotypes. The resource will be made available to the research and medical community to guide the interpretation of sequencing projects. The present paper summarizes the global characteristics of the project.
Wang, Xin; Hu, Tixu; Zhang, Fengxia; Wang, Bing; Li, Changxin; Yang, Tianxia; Li, Hanxia; Lu, Yongen; Ye, Zhibiao
2017-01-01
Deciphering the mechanism of malate accumulation in plants would contribute to a greater understanding of plant chemistry, which has implications for improving flavor quality in crop species and enhancing human health benefits. However, the regulation of malate metabolism is poorly understood in crops such as tomato (Solanum lycopersicum). Here, we integrated a metabolite-based genome-wide association study with linkage mapping and gene functional studies to characterize the genetics of malate accumulation in a global collection of tomato accessions with broad genetic diversity. We report that TFM6 (tomato fruit malate 6), which corresponds to Al-ACTIVATED MALATE TRANSPORTER9 (Sl-ALMT9 in tomato), is the major quantitative trait locus responsible for variation in fruit malate accumulation among tomato genotypes. A 3-bp indel in the promoter region of Sl-ALMT9 was linked to high fruit malate content. Further analysis indicated that this indel disrupts a W-box binding site in the Sl-ALMT9 promoter, which prevents binding of the WRKY transcription repressor Sl-WRKY42, thereby alleviating the repression of Sl-ALMT9 expression and promoting high fruit malate accumulation. Evolutionary analysis revealed that this highly expressed Sl-ALMT9 allele was selected for during tomato domestication. Furthermore, vacuole membrane-localized Sl-ALMT9 increases in abundance following Al treatment, thereby elevating malate transport and enhancing Al resistance. PMID:28814642
Annamalai, Thavamathi; Lin, Chun-Ming; Gao, Xiang; Liu, Xinsheng; Lu, Zhongyan; Saif, Linda J; Wang, Qiuhong
2017-10-06
We investigated cross-protective immunity of a US spike-insertion deletion porcine epidemic diarrhea virus (PEDV) Iowa106 (S-INDEL) strain against the original US PEDV (PC21A) strain in nursing piglets. Piglets were inoculated orally with S-INDEL, PC21A or mock. At 20-29 days post-inoculation (dpi), all pigs were challenged with the PC21A strain. The S-INDEL-inoculated pigs had lower ileal IgA antibody secreting cells, serum IgA and neutralizing antibody titers compared with PC21A-inoculated pigs. No pigs in the PC21A-group developed diarrhea, whereas 81 and 100% of pigs in the S-INDEL and mock-groups had diarrhea post challenge, respectively. S-INDEL induced partial protective immunity against the original US PEDV strain.
Colinet, F G; Vanderick, S; Charloteaux, B; Eggen, A; Gengler, N; Renaville, B; Brasseur, R; Portetelle, D; Renaville, Robert
2009-01-01
The growth hormone secretagogue receptor (GHSR) is involved in the regulation of energetic homeostasis and GH secretion. In this study, the bovine GHSR gene was mapped to BTA1 between BL26 and BMS4004. Two different bovine GHSR CDS (GHSR1a and GHSR1b) were sequenced. Six polymorphisms (five SNPs and one 3-bp indel) were also identified, three of them leading to amino acid variations L24V, D194N, and Del R242. These variations are located in the extracellular N-terminal end, the exoloop 2, and the cytoloop 3 of the receptor, respectively.
Pramanik, Sreemanta; Li, Honghua
2002-01-01
Direct polymerase chain reaction (PCR) detection of insertion/deletion (indel) polymorphisms requires sample homozygosity. For the indel polymorphisms that have the deletion allele with a relatively low frequency in the autosomal regions, direct PCR detection becomes difficult or impossible. The present study is, to our knowledge, the first designed to directly detect indel polymorphisms in a human autosomal region (i.e., the immunoglobulin VH region), through use of single haploid sperm cells as subjects. Unique marker sequences (n=32), spaced at ∼5-kb intervals, were selected near the 3′ end of the VH region. A two-round multiplex PCR protocol was used to amplify these sequences from single sperm samples from nine unrelated healthy donors. The parental haplotypes of the donors were determined by examining the presence or absence of these markers. Seven clustered markers in 6 of the 18 haplotypes were missing and likely represented a 35–40-kb indel polymorphism. The genotypes of the donors, with respect to this polymorphism, perfectly matched the expectation under Hardy-Weinberg equilibrium. Three VH gene segments, of which two are functional, are affected by this polymorphism. According to these results, >10% of individuals in the human population may not have these gene segments in their genome, and ∼44% may have only one copy of these gene segments. The biological impact of this polymorphism would be very interesting to study. The approach used in the present study could be applied to understand the physical structure and diversity of all other autosomal regions. PMID:12442231
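The population figures quoted in the abstract above follow from Hardy-Weinberg proportions applied to the observed haplotype counts (6 deletion haplotypes out of 18). The sketch below reproduces that arithmetic; the function name and dictionary keys are illustrative and do not come from the study.

```python
def hardy_weinberg(deletion_haplotypes: int, total_haplotypes: int) -> dict:
    """Expected genotype frequencies for a two-allele indel under Hardy-Weinberg equilibrium."""
    q = deletion_haplotypes / total_haplotypes  # frequency of the deletion allele
    p = 1.0 - q                                 # frequency of the insertion (present) allele
    return {
        "no_copy": q * q,       # homozygous for the deletion
        "one_copy": 2 * p * q,  # heterozygous
        "two_copies": p * p,    # homozygous for the present allele
    }

# Counts taken from the abstract: 6 of 18 haplotypes lacked the clustered markers.
freqs = hardy_weinberg(6, 18)
for genotype, freq in freqs.items():
    print(f"{genotype}: {freq:.3f}")
```

With a deletion allele frequency of 1/3, the expected share of individuals with no copy is 1/9 (about 11%, consistent with ">10%") and the share with one copy is 4/9 (about 44%).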
Low incidence of SNVs and indels in trio genomes of Cas9-mediated multiplex edited sheep.
Wang, Xiaolong; Liu, Jing; Niu, Yiyuan; Li, Yan; Zhou, Shiwei; Li, Chao; Ma, Baohua; Kou, Qifang; Petersen, Bjoern; Sonstegard, Tad; Huang, Xingxu; Jiang, Yu; Chen, Yulin
2018-05-25
The simplicity of the CRISPR/Cas9 system has enabled its widespread applications in generating animal models, functional genomic screening and in treating genetic and infectious diseases. However, unintended mutations produced by off-target CRISPR/Cas9 nuclease activity may lead to negative consequences. In particular, a very recent study reporting that gene editing can introduce hundreds of unintended mutations into the genome has attracted wide attention. To address the off-target concerns, characterization of CRISPR/Cas9-mediated off-target mutagenesis is urgently needed. Here we took advantage of our previously generated gene-edited sheep and performed family trio-based whole genome sequencing, which is capable of discriminating variants in the edited progenies that are inherited, naturally generated, or induced by genetic modification. Three family trios were re-sequenced at a high average depth of genomic coverage (~25.8×). After developing a pipeline to comprehensively analyze the sequence data for de novo single nucleotide variants, indels and structural variations from the genome, we found only a single unintended event, in the form of a 2.4 kb inversion induced by site-specific double-strand breaks between two sgRNA targeting sites at the MSTN locus, with a low incidence. We provide the first report on the fidelity of CRISPR-based modification for sheep genomes targeted simultaneously for gene breaks at three coding sequence locations. The trio-based sequencing approach revealed almost negligible off-target modifications, providing timely evidence of the safe application of genome editing in vivo with CRISPR/Cas9.
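The core idea behind trio-based filtering can be sketched in a few lines: a variant allele seen in the offspring but in neither parent is a candidate de novo event. This is a minimal illustration only, not the authors' pipeline; the function name, the genotype encoding (tuples of allele labels) and the category names are assumptions, and the sketch does not attempt to separate spontaneous mutations from editing-induced ones.

```python
def classify_variant(child, mother, father, alt="1"):
    """Classify one site from trio genotypes given as tuples of allele labels, e.g. ("0", "1")."""
    if alt not in child:
        return "absent_in_child"
    if alt in mother or alt in father:
        return "inherited"
    # Present in the child only: spontaneous or induced, to be resolved by further evidence.
    return "candidate_de_novo"

# Hypothetical sites: (child, mother, father)
sites = [
    (("0", "1"), ("0", "1"), ("0", "0")),  # alt allele carried by the mother
    (("0", "1"), ("0", "0"), ("0", "0")),  # alt allele seen only in the child
    (("0", "0"), ("0", "1"), ("0", "0")),  # child does not carry the alt allele
]
labels = [classify_variant(c, m, f) for c, m, f in sites]
print(labels)
```

In practice such a filter would also need depth and genotype-quality thresholds in all three family members, since a parental allele missed at low coverage looks identical to a de novo call.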
Zhang, Yanghai; Cui, Yang; Zhang, Xuelian; Wang, Yimin; Gao, Jiayang; Yu, Ting; Lv, Xiaoyan; Pan, Chuanying
2018-05-31
Steroidogenic acute regulatory protein (StAR), primarily expressed in Leydig cells (LCs) in the mammalian testes, is essential for testosterone biosynthesis and male fertility. However, no previous reports have explored the expression profiles, alternative splicing and genetic variations of the StAR gene in pig. The aim of the current study was to explore the expression profiles in different tissues and different types of testicular cells (LCs; spermatogonial stem cells, SSCs; Sertoli cells, SCs), to identify different splice variants and their expression levels, as well as to detect the indel polymorphism in the pig StAR gene. Expression analysis results revealed that StAR was widely expressed in all tested tissues and the expression level in testis was significantly higher than that in other tissues (P < 0.01); among different types of testicular cells, the StAR mRNA expression level was significantly higher in LCs than others (P < 0.05). Furthermore, three splice variants, StAR-a, StAR-b and StAR-c, were first found in pig. Further study showed StAR-a was highly expressed in both testis and LCs when compared with other variants (P < 0.01), suggesting StAR-a was the primary variant at StAR gene post-transcription and may facilitate the combination and transportation of cholesterol with StAR. In addition, a 5-bp duplicated deletion (NC_010457.5:g.5524-5528 delACTTG) was verified in the porcine StAR gene, which was closely related to male testicular morphology traits (P < 0.05), and we speculated that the allele "D" of the StAR gene might be a positive allele. Briefly, the current findings suggest that StAR and StAR-a play imperative roles in male fertility and the 5-bp indel can be a potential DNA marker for the marker-assisted selection in boar. Copyright © 2018 Elsevier Inc. All rights reserved.
Lin, Chun-Ming; Annamalai, Thavamathi; Liu, Xinsheng; Gao, Xiang; Lu, Zhongyan; El-Tholoth, Mohamed; Hu, Hui; Saif, Linda J; Wang, Qiuhong
2015-11-20
Although the original US porcine epidemic diarrhea virus (PEDV) was confirmed as highly virulent by multiple studies, the virulence of spike-insertion deletion (S-INDEL) PEDV strains is undefined. In this study, 3-4 day-old conventional suckling piglets were inoculated with S-INDEL PEDV Iowa106 (4 pig litters) to study its virulence. Two litters of age-matched piglets were inoculated with either the original US PEDV PC21A or mock as positive and negative controls, respectively. Subsequently, all pigs were challenged with the original US PEDV PC21A on 21-29 days post-inoculation (dpi) to assess cross-protection. All S-INDEL Iowa106- and the original US PC21A-inoculated piglets developed diarrhea. However, the severity of clinical signs, mortality (0-75%) and fecal PEDV RNA shedding titers varied among the four S-INDEL Iowa106-inoculated litters. Compared with the original PC21A, piglets euthanized/died acutely from S-INDEL Iowa106 infection had relatively milder villous atrophy, lower antigen scores and more limited intestinal infection. Two of four S-INDEL Iowa106-infected sows and the original PC21A-infected sow showed anorexia and watery diarrhea for 1-4 days. After the original PC21A challenge, a subset (13/16) of S-INDEL Iowa106-inoculated piglets developed diarrhea, whereas all (5/5) and no (0/4) pigs in the mock and original PC21A-inoculated pigs had diarrhea, respectively. Our results suggest that the virulence of S-INDEL PEDV Iowa106 was less than the original US PEDV PC21A in suckling pigs, with 100% morbidity and 18% (6/33) overall (0-75%) mortality in suckling pigs depending on factors such as the sow's health and lactation and the piglets' birth weight. Prior infection by S-INDEL Iowa106 provided partial cross-protection to piglets against the original PC21A challenge at 21-29 dpi.
García-Lor, Andrés; Luro, François; Navarro, Luis; Ollitrault, Patrick
2012-01-01
Genetic stratification associated with domestication history is a key parameter for estimating the pertinence of genetic association study within a gene pool. Previous molecular and phenotypic studies have shown that most of the diversity of cultivated citrus results from recombination between three main species: C. medica (citron), C. reticulata (mandarin) and C. maxima (pummelo). However, the precise contribution of each of these basic species to the genomes of secondary cultivated species, such as C. sinensis (sweet orange), C. limon (lemon), C. aurantium (sour orange), C. paradisi (grapefruit) and recent hybrids is unknown. Our study focused on: (1) the development of insertion-deletion (InDel) markers and their comparison with SSR markers for use in genetic diversity and phylogenetic studies; (2) the analysis of the contributions of basic taxa to the genomes of secondary species and modern cultivars and (3) the description of the organisation of the Citrus gene pool, to evaluate how genetic association studies should be done at the cultivated Citrus gene pool level. InDel markers appear to be better phylogenetic markers for tracing the contributions of the three ancestral species, whereas SSR markers are more useful for intraspecific diversity analysis. Most of the genetic organisation of the Citrus gene pool is related to the differentiation between C. reticulata, C. maxima and C. medica. High and generalised LD was observed, probably due to the initial differentiation between the basic species and a limited number of interspecific recombinations. This structure precludes association genetic studies at the genus level without developing additional recombinant populations from interspecific hybrids. Association genetic studies should also be affordable at intraspecific level in a less structured pool such as C. reticulata.
Metzger, Julia; Tonda, Raul; Beltran, Sergi; Agueda, Lídia; Gut, Marta; Distl, Ottmar
2014-07-04
Domestication has shaped the horse and led to a group of many different types. Some have been under strong human selection while others developed in close relationship with nature. The aim of our study was to perform next generation sequencing of breed and non-breed horses to provide an insight into genetic influences on selective forces. Whole genome sequencing of five horses of four different populations revealed 10,193,421 single nucleotide polymorphisms (SNPs) and 1,361,948 insertion/deletion polymorphisms (indels). In comparison to horse variant databases and previous reports, we were able to identify 3,394,883 novel SNPs and 868,525 novel indels. We analyzed the distribution of individual variants and found significant enrichment of private mutations in coding regions of genes involved in primary metabolic processes, anatomical structures, morphogenesis and cellular components in non-breed horses and, in contrast, of private mutations in genes affecting cell communication, lipid metabolic process, neurological system process, muscle contraction, ion transport, developmental processes of the nervous system and ectoderm in breed horses. Our next generation sequencing data constitute an important first step for the characterization of non-breed in comparison to breed horses and provide a large number of novel variants for future analyses. Functional annotations suggest specific variants that could play a role for the characterization of breed or non-breed horses.
Kong, Tingting; Chen, Yahao; Guo, Yuxin; Wei, Yuanyuan; Jin, Xiaoye; Xie, Tong; Mu, Yuling; Dong, Qian; Wen, Shaoqing; Zhou, Boyan; Zhang, Li; Shen, Chunmei; Zhu, Bofeng
2017-01-01
In the present study, we assessed the genetic diversities of the Chinese Kazak ethnic group on the basis of 30 well-chosen autosomal insertion and deletion loci and explored the genetic relationships between Kazak and 23 reference groups. We detected the level of the expected heterozygosity ranging from 0.3605 at HLD39 locus to 0.5000 at HLD136 locus and the observed heterozygosity ranging from 0.3548 at HLD39 locus to 0.5283 at HLD136 locus. The combined power of discrimination and the combined power of exclusion for all 30 loci in the studied Kazak group were 0.999999999999128 and 0.9945, respectively. The dataset generated in this study indicated the panel of 30 InDels was highly efficient in forensic individual identification but may not have enough power in paternity cases. The results of the interpopulation differentiations, PCA plots, phylogenetic trees and STRUCTURE analyses showed a close genetic affiliation between the Kazak and Uigur groups. PMID:28915619
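The forensic statistics reported above rest on standard formulas, sketched here under stated assumptions: expected heterozygosity for a biallelic locus is 1 - p^2 - q^2, the per-locus power of discrimination is one minus the sum of squared genotype frequencies, and the combined power across independent loci is one minus the product of the per-locus match probabilities. The function names and the example frequencies are illustrative and are not data from the study.

```python
from math import prod

def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity of a biallelic locus with allele frequencies p and 1 - p."""
    q = 1.0 - p
    return 1.0 - p * p - q * q

def power_of_discrimination(genotype_freqs) -> float:
    """Probability that two random individuals differ in genotype at one locus."""
    return 1.0 - sum(f * f for f in genotype_freqs)

def combined_power(per_locus_powers) -> float:
    """Combine per-locus powers, assuming the loci are independent."""
    return 1.0 - prod(1.0 - pd for pd in per_locus_powers)

# A biallelic indel reaches its maximum expected heterozygosity at p = 0.5.
he_max = expected_heterozygosity(0.5)

# Hypothetical locus in Hardy-Weinberg proportions with p = 0.5.
pd_one = power_of_discrimination([0.25, 0.5, 0.25])

# Thirty such loci combined.
cpd_30 = combined_power([pd_one] * 30)
print(he_max, pd_one, cpd_30)
```

The ceiling of 0.5 on expected heterozygosity for two alleles matches the value reported for HLD136, and it illustrates why a biallelic panel needs many loci to reach a high combined power.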
Chen, Qi; Gauger, Phillip C; Stafne, Molly R; Thomas, Joseph T; Madson, Darin M; Huang, Haiyan; Zheng, Ying; Li, Ganwu; Zhang, Jianqiang
2016-05-01
At least two genetically different porcine epidemic diarrhoea virus (PEDV) strains have been identified in the USA: US PEDV prototype and S-INDEL-variant strains. The objective of this study was to compare the pathogenicity differences of the US PEDV prototype and S-INDEL-variant strains in conventional neonatal piglets under experimental infections. Fifty PEDV-negative 5-day-old pigs were divided into five groups of ten pigs each and were inoculated orogastrically with three US PEDV prototype isolates (IN19338/2013, NC35140/2013 and NC49469/2013), an S-INDEL-variant isolate (IL20697/2014), and virus-negative culture medium, respectively, with virus titres of 10^4 TCID50 ml^-1, 10 ml per pig. All three PEDV prototype isolates tested in this study, regardless of their phylogenetic clades, had similar pathogenicity and caused severe enteric disease in 5-day-old pigs as evidenced by clinical signs, faecal virus shedding, and gross and histopathological lesions. Compared with pigs inoculated with the three US PEDV prototype isolates, pigs inoculated with the S-INDEL-variant isolate had significantly diminished clinical signs, virus shedding in faeces, gross lesions in small intestines, caeca and colons, histopathological lesions in small intestines, and immunohistochemistry staining in ileum. However, the US PEDV prototype and the S-INDEL-variant strains induced similar viraemia levels in inoculated pigs. Whole genome sequences of the PEDV prototype and S-INDEL-variant strains were determined, but the molecular basis of virulence differences between these PEDV strains remains to be elucidated using a reverse genetics approach.
Pearce, J.M.
2006-01-01
Insertions and deletions (indels) result in sequences of various lengths when homologous gene regions are compared among individuals or species. Although indels are typically phylogenetically informative, occurrence and incorporation of these characters as gaps in intraspecific population genetic data sets are rarely discussed. Moreover, the impact of gaps on estimates of fixation indices, such as FST, has not been reviewed. Here, I summarize the occurrence and population genetic signal of indels among 60 published studies that involved alignments of multiple sequences from the mitochondrial DNA (mtDNA) control region of vertebrate taxa. Among 30 studies observing indels, an average of 12% of both variable and parsimony-informative sites were composed of these sites. There was no consistent trend between levels of population differentiation and the number of gap characters in a data block. Across all studies, the average influence on estimates of ΦST was small, explaining only an additional 1.8% of among population variance (range 0.0-8.0%). Studies most likely to observe an increase in ΦST with the inclusion of gap characters were those with < 20 variable sites, but a near equal number of studies with few variable sites did not show an increase. In contrast to studies at interspecific levels, the influence of indels for intraspecific population genetic analyses of control region DNA appears small, dependent upon total number of variable sites in the data block, and related to species-specific characteristics and the spatial distribution of mtDNA lineages that contain indels. © 2006 Blackwell Publishing Ltd.
The potential European genetic predisposition for non-contact anterior cruciate ligament injury.
Astur, Diego Costa; Andrade, Edilson; Arliani, Gustavo Gonçalves; Debieux, Pedro; Loyola, Leonor Casilla; Dos Santos, Sidney Emanuel Batista; Burbano, Rommel Mario Rodriguez; Leal, Mariana Ferreira; Cohen, Moises
2018-05-04
Previous research has provided evidence of a hereditary predisposition for anterior cruciate ligament (ACL) injury. The purpose of this study was to evaluate the association between ancestral population genetics and risk of non-contact ACL injuries. Blood samples were collected from 177 individuals with a history of non-contact ACL injury and 556 non-injured control individuals for analysis of the genetic material through the use of a panel of 48 INDEL ancestry genetic markers from three ancestral origins. Among patients with non-contact ACL injury, 82% were male and 18% were female. In the control group, 78% were male, and 22% were female. The mean age of the non-contact ACL injury group was 31.7 years (± 10.2), and the control group was 33.8 years (± 13.2). The individual genetic contribution from INDELs of each ancestral origin varied considerably: ranging between 1.5-94.8% contribution for INDELs of African origin (mean of 21.4% of INDELs); between 2 and 96.1% contribution for INDELs of European origin (mean of 66.7% of INDELs); and between 1.3-96.4% contribution for INDELs of Amerindian origin (mean of 11.7% of INDELs). When comparing paired subjects from the non-contact ACL and control groups, the genetic analysis showed that the European ancestry score was higher in the non-contact ACL group than control group (0.70 ± 0.21 vs 0.63 ± 0.22 respectively, p < 0.001), whereas African ancestry scores (ACL group 0.18 ± 0.18 vs control group 0.24 ± 0.21, p < 0.001) and Amerindian ancestry scores (ACL group 0.11 ± 0.09 vs control group 0.12 ± 0.10, n.s.) were lower among the non-contact ACL group than in controls. European INDEL markers were found to represent a potential genetic predisposition for non-contact ACL injuries when compared to African and Amerindian INDELs. This study has the potential to correlate a measurable and distinct genetic marker with risk of a non-contact ACL injury. Thus, it adds to the knowledge base of molecular and genetic factors associated with this pathology. Furthermore, this study provides guidance and evidence for the development of genetic risk-screening panels for non-contact ACL injury. Level III Diagnostic Study.
Kurokawa, S; Shibaike, H; Akiyama, H; Yoshimura, Y
2004-12-01
A comparison of chloroplast DNA (cpDNA) sequences was carried out between the crop and weed types of Abutilon theophrasti to clarify the seed source of the present weedy velvetleaf in Japan. A sequencing analysis of approx. 6% of the chloroplast genome (ca 10 kbp) detected three nucleotide substitutions, one six-base-pair insertion/deletion (indel) and one 30-base pair inversion, which distinguish two haplotypes of cpDNA. A PCR-based survey of the indel and the inversion revealed that the 93 accessions of velvetleaf collected from the world could be divided into two groups. A morphological marker (capsule color) could be used to discriminate the crop type and the weed type, and hence, along with cpDNA haplotype, to distinguish three genotypes (Type I, II, and III). All Japanese cultivars and crop accessions from other countries were Type I. Weed types were divided into Type II and III. All of the samples from the USA, and the samples taken from grain imports to Japan were Type III. Since most of the weedy types distributed in Japan were of Type III, it is argued that they were introduced as seeds in the imported grain. We also found that the Type II plants sporadically occurred in Japan. It is suggested that they originated as hybrids, with indigenous cultivars as the maternal ancestor. Such hybrids must have survived since the cessation of velvetleaf cultivation about a century ago.
Identification of genomic indels and structural variations using split reads
2011-01-01
Background Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection. Results We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. 
Conclusions Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole size spectrum for deletions. Moreover, with the advent of the third-generation sequencing technologies that produce longer reads, we expect our method to be even more useful. PMID:21787423
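The calibration step described in the abstract above can be illustrated with a small sketch: calls made on simulated reads are compared with the known simulated events to obtain sensitivity and positive predictive value, which can then be used to adjust an observed count. This is a simplified illustration, not the SRiC implementation; the event encoding, the function names and the correction formula (observed × PPV / sensitivity) are assumptions made for the example.

```python
def calibrate(true_events: set, called_events: set):
    """Return (sensitivity, positive predictive value) of calls against a simulated truth set."""
    true_positives = len(true_events & called_events)
    sensitivity = true_positives / len(true_events) if true_events else 0.0
    ppv = true_positives / len(called_events) if called_events else 0.0
    return sensitivity, ppv

def corrected_count(observed: int, sensitivity: float, ppv: float) -> float:
    """Adjust an observed event count for false positives and missed events."""
    return observed * ppv / sensitivity

# Hypothetical simulated events encoded as (type, position, length).
truth = {("del", 100, 5), ("ins", 200, 3), ("del", 300, 10), ("del", 400, 2)}
calls = {("del", 100, 5), ("ins", 200, 3), ("del", 300, 10)}

sens, ppv = calibrate(truth, calls)
estimate = corrected_count(300, sens, ppv)
print(sens, ppv, estimate)
```

Running the same comparison separately per event class (for example long deletions versus short insertions) gives class-specific correction factors, which is the kind of stratification the abstract describes.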
Population-based structural variation discovery with Hydra-Multi.
Lindberg, Michael R; Hall, Ira M; Quinlan, Aaron R
2015-04-15
Current strategies for SNP and INDEL discovery incorporate sequence alignments from multiple individuals to maximize sensitivity and specificity. It is widely accepted that this approach also improves structural variant (SV) detection. However, multisample SV analysis has been stymied by the fundamental difficulties of SV calling, e.g. library insert size variability, SV alignment signal integration and detecting long-range genomic rearrangements involving disjoint loci. Extant tools suffer from poor scalability, which limits the number of genomes that can be co-analyzed and complicates analysis workflows. We have developed an approach that enables multisample SV analysis in hundreds to thousands of human genomes using commodity hardware. Here, we describe Hydra-Multi and measure its accuracy, speed and scalability using publicly available datasets provided by The 1000 Genomes Project and by The Cancer Genome Atlas (TCGA). Hydra-Multi is written in C++ and is freely available at https://github.com/arq5x/Hydra. Contact: aaronquinlan@gmail.com or ihall@genome.wustl.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Natural Allelic Variations in Highly Polyploidy Saccharum Complex
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Jian; Yang, Xiping; Resende, Jr., Marcio F. R.
Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with highly polyploid and complex genomes. The Saccharum complex, comprising the Saccharum genus and a few related genera, is an important genetic resource for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding their allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variations in the Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variants and genotype calling was established. BWAmem and the sorghum genome proved to be an acceptable aligner and reference for sugarcane target enrichment sequence analysis, respectively. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated as largely homozygotes. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. Furthermore, the target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes.
Natural Allelic Variations in Highly Polyploidy Saccharum Complex
Song, Jian; Yang, Xiping; Resende, Jr., Marcio F. R.; ...
2016-06-08
Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with highly polyploid and complex genomes. The Saccharum complex, comprising the genus Saccharum and a few related genera, is an important genetic resource for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Although understanding this allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variation in the Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and the sugarcane unigene set, targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variant and genotype calling was established. BWA-MEM and the sorghum genome proved to be an acceptable aligner and reference, respectively, for sugarcane target enrichment sequence analysis. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated as largely homozygotes. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. Furthermore, the target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes.
Hamilton, Natasha A.; Tammen, Imke; Raadsma, Herman W.
2013-01-01
Angiotensin converting enzyme (ACE) is essential for control of blood pressure. The human ACE gene contains an intronic Alu indel (I/D) polymorphism that has been associated with variation in serum enzyme levels, although the functional mechanism has not been identified. The polymorphism has also been associated with cardiovascular disease, type II diabetes, renal disease and elite athleticism. We have characterized the ACE gene in horses of breeds selected for differing physical abilities. The equine gene has a similar structure to that of all known mammalian ACE genes. Nine common single nucleotide polymorphisms (SNPs) discovered in pooled DNA were found to be inherited in nine haplotypes. Three of these SNPs were located in intron 16, homologous to that containing the Alu polymorphism in the human. A highly conserved 18 bp sequence, also within that intron, was identified as being a potential binding site for the transcription factors Oct-1, HFH-1 and HNF-3β, and lies within a larger area of higher than normal homology. This putative regulatory element may contribute to regulation of the documented inter-individual variation in human circulating enzyme levels, for which a functional mechanism is yet to be defined. Two equine SNPs occurred within the conserved area in intron 16, although neither of them disrupted the putative binding site. We propose a possible regulatory mechanism of the ACE gene in mammalian species which was previously unknown. This advance will allow further analysis leading to a better understanding of the mechanisms underpinning the associations seen between the human Alu polymorphism and enzyme levels, cardiovascular disease states and elite athleticism. PMID:23408978
The Genome of the Netherlands: design, and project goals
Boomsma, Dorret I; Wijmenga, Cisca; Slagboom, Eline P; Swertz, Morris A; Karssen, Lennart C; Abdellaoui, Abdel; Ye, Kai; Guryev, Victor; Vermaat, Martijn; van Dijk, Freerk; Francioli, Laurent C; Hottenga, Jouke Jan; Laros, Jeroen F J; Li, Qibin; Li, Yingrui; Cao, Hongzhi; Chen, Ruoyan; Du, Yuanping; Li, Ning; Cao, Sujie; van Setten, Jessica; Menelaou, Androniki; Pulit, Sara L; Hehir-Kwa, Jayne Y; Beekman, Marian; Elbers, Clara C; Byelas, Heorhiy; de Craen, Anton J M; Deelen, Patrick; Dijkstra, Martijn; den Dunnen, Johan T; de Knijff, Peter; Houwing-Duistermaat, Jeanine; Koval, Vyacheslav; Estrada, Karol; Hofman, Albert; Kanterakis, Alexandros; Enckevort, David van; Mai, Hailiang; Kattenberg, Mathijs; van Leeuwen, Elisabeth M; Neerincx, Pieter B T; Oostra, Ben; Rivadeneira, Fernanodo; Suchiman, Eka H D; Uitterlinden, Andre G; Willemsen, Gonneke; Wolffenbuttel, Bruce H; Wang, Jun; de Bakker, Paul I W; van Ommen, Gert-Jan; van Duijn, Cornelia M
2014-01-01
Within the Netherlands a national network of biobanks has been established (Biobanking and Biomolecular Research Infrastructure-Netherlands (BBMRI-NL)) as a national node of the European BBMRI. One of the aims of BBMRI-NL is to enrich biobanks with different types of molecular and phenotype data. Here, we describe the Genome of the Netherlands (GoNL), one of the projects within BBMRI-NL. GoNL is a whole-genome-sequencing project in a representative sample consisting of 250 trio-families from all provinces in the Netherlands, which aims to characterize DNA sequence variation in the Dutch population. The parent–offspring trios include adult individuals ranging in age from 19 to 87 years (mean=53 years; SD=16 years) from birth cohorts 1910–1994. Sequencing was done on blood-derived DNA from uncultured cells and accomplished coverage was 14–15x. The family-based design represents a unique resource to assess the frequency of regional variants, accurately reconstruct haplotypes by family-based phasing, characterize short indels and complex structural variants, and establish the rate of de novo mutational events. GoNL will also serve as a reference panel for imputation in the available genome-wide association studies in Dutch and other cohorts to refine association signals and uncover population-specific variants. GoNL will create a catalog of human genetic variation in this sample that is uniquely characterized with respect to micro-geographic location and a wide range of phenotypes. The resource will be made available to the research and medical community to guide the interpretation of sequencing projects. The present paper summarizes the global characteristics of the project. PMID:23714750
The evolution of small insertions and deletions in the coding genes of Drosophila melanogaster.
Chong, Zechen; Zhai, Weiwei; Li, Chunyan; Gao, Min; Gong, Qiang; Ruan, Jue; Li, Juan; Jiang, Lan; Lv, Xuemei; Hungate, Eric; Wu, Chung-I
2013-12-01
Studies of protein evolution have focused on amino acid substitutions, with much less systematic analysis of insertions and deletions (indels) in protein-coding genes. We hence surveyed 7,500 genes between Drosophila melanogaster and D. simulans, using D. yakuba as an outgroup for this purpose. The evolutionary rate of coding indels is indeed low, at only 3% of that of nonsynonymous substitutions. As coding indels follow a geometric distribution in size and tend to fall in low-complexity regions of proteins, it is unclear whether selection or mutation underlies this low rate. To resolve the issue, we collected genomic sequences from an isogenic African line of D. melanogaster (ZS30) at a high coverage of 70× and analyzed indel polymorphism between ZS30 and the reference genome. In comparing polymorphism and divergence, we found that the divergence to polymorphism ratio (i.e., fixation index) for smaller indels (size ≤ 10 bp) is very similar to that for synonymous changes, suggesting that most of the within-species polymorphism and between-species divergence for indels are selectively neutral. Interestingly, deletions of larger sizes (size ≥ 11 bp and ≤ 30 bp) have a much higher fixation index than synonymous mutations, and 44.4% of fixed middle-sized deletions are estimated to be adaptive. To our surprise, this pattern is not found for insertions. Protein indel evolution appears to be in a dynamic flux of neutrally driven expansion (insertions) together with adaptively driven contraction (deletions), and these observations provide important insights for understanding the fitness of new mutations as well as the evolutionary driving forces for genomic evolution in Drosophila species.
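The comparison of divergence and polymorphism described in this abstract can be illustrated with a small sketch. It assumes a McDonald-Kreitman-style estimator in which synonymous changes serve as the neutral reference class; the abstract does not say which estimator the authors used, and the function names and all counts below are hypothetical.

```python
def dp_ratio(fixed: int, polymorphic: int) -> float:
    """Divergence-to-polymorphism ratio for one class of mutation."""
    if polymorphic <= 0:
        raise ValueError("need at least one polymorphic site")
    return fixed / polymorphic


def adaptive_fraction(fixed_test: int, poly_test: int,
                      fixed_neutral: int, poly_neutral: int) -> float:
    """McDonald-Kreitman-style alpha: the share of fixed test-class changes
    in excess of what the neutral reference class would predict."""
    neutral = dp_ratio(fixed_neutral, poly_neutral)
    test = dp_ratio(fixed_test, poly_test)
    return 1.0 - neutral / test


# Hypothetical counts, not taken from the study.
alpha = adaptive_fraction(fixed_test=30, poly_test=20,
                          fixed_neutral=400, poly_neutral=400)
print(f"estimated adaptive fraction: {alpha:.3f}")
```

When the test class has the same ratio as the neutral class, the estimate is zero, which corresponds to the pattern the abstract reports for small indels.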
Wendt, Frank R; Warshauer, David H; Zeng, Xiangpei; Churchill, Jennifer D; Novroski, Nicole M M; Song, Bing; King, Jonathan L; LaRue, Bobby L; Budowle, Bruce
2016-11-01
Short tandem repeat (STR) loci are the traditional markers used for kinship, missing persons, and direct comparison human identity testing. These markers hold considerable value due to their highly polymorphic nature, amplicon size, and ability to be multiplexed. However, many STRs are still too large for use in analysis of highly degraded DNA. Small bi-allelic polymorphisms, such as insertions/deletions (INDELs), may be better suited for analyzing compromised samples, and their allele size differences are amenable to analysis by capillary electrophoresis. The INDEL marker allelic states range in size from 2 to 6 base pairs, enabling small amplicon size. In addition, heterozygote balance may be increased by minimizing preferential amplification of the smaller allele, as is more common with STR markers. Multiplexing a large number of INDELs allows for generating panels with high discrimination power. The Nextera™ Rapid Capture Custom Enrichment Kit (Illumina, Inc., San Diego, CA) and massively parallel sequencing (MPS) on the Illumina MiSeq were used to sequence 68 well-characterized INDELs in four major US population groups. In addition, the STR Allele Identification Tool: Razor (STRait Razor) was used in a novel way to analyze INDEL sequences and detect adjacent single nucleotide polymorphisms (SNPs) and other polymorphisms. This application enabled the discovery of unique allelic variants, which increased the discrimination power and decreased the single-locus random match probabilities (RMPs) of 22 of these well-characterized INDELs which can be considered as microhaplotypes. These findings suggest that additional microhaplotypes containing human identification (HID) INDELs may exist elsewhere in the genome. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
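Why extra sequence variants lower a single-locus random match probability can be shown with a short sketch. It assumes Hardy-Weinberg genotype proportions and uses hypothetical allele frequencies; it is not the calculation performed in the study, and the function name is illustrative only.

```python
from itertools import combinations


def genotype_match_probability(allele_freqs):
    """Probability that two unrelated individuals share a genotype at one
    locus, assuming Hardy-Weinberg proportions."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Homozygote i/i has frequency p_i^2; heterozygote i/j has 2 * p_i * p_j.
    homozygotes = sum((p * p) ** 2 for p in allele_freqs)
    heterozygotes = sum((2 * p * q) ** 2
                        for p, q in combinations(allele_freqs, 2))
    return homozygotes + heterozygotes


# Hypothetical bi-allelic INDEL typed by length only.
length_only = genotype_match_probability([0.5, 0.5])
# Same locus when an adjacent SNP splits one allele into two variants.
as_microhaplotype = genotype_match_probability([0.5, 0.25, 0.25])
print(length_only, as_microhaplotype)
```

Under these assumptions, splitting one allele into two sequence variants reduces the match probability, which is the direction of change the abstract describes for the 22 microhaplotype loci.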
USDA-ARS's Scientific Manuscript database
Peanut diseases, such as leaf spot and spotted wilt caused by Tomato Spotted Wilt Virus, can significantly reduce yield and quality. Application of marker assisted plant breeding requires the development and validation of different types of DNA molecular markers. Nearly 10,000 SSR-based molecular ...
PGen: large-scale genomic variations analysis workflow and browser in SoyKB.
Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti
2016-10-06
With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub (https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB) (http://soykb.org/Pegasus/index.php). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects adding to 500+ soybean germplasm lines in total have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser (http://soykb.org/NGS_Resequence/NGS_index.php) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers.
PGen workflow has been optimized for the most efficient analysis of soybean data using thorough testing and validation. This research serves as an example of best practices for development of genomics data analysis workflows by integrating remote HPC resources and efficient data management with ease of use for biological users. PGen workflow can also be easily customized for analysis of data in other species.
Jaramillo-Correa, J P; Bousquet, J; Beaulieu, J; Isabel, N; Perron, M; Bouillé, M
2003-05-01
Primers previously developed to amplify specific non-coding regions of the mitochondrial genome in angiosperms, and new primers for additional non-coding mtDNA regions, were tested for their ability to direct DNA amplification in 12 conifer taxa and to detect sequence-tagged-site (STS) polymorphisms within and among eight species in Picea. Out of 12 primer pairs, nine were successful at amplifying mtDNA in most of the taxa surveyed. In conifers, indels and substitutions were observed for several loci, making it possible to distinguish between families, genera and, in some cases, between species within genera. In Picea, interspecific polymorphism was detected for four loci, while intraspecific variation was observed for three of the mtDNA regions studied. One of these (SSU rRNA V1 region) exhibited indel polymorphisms, and the two others (nad1 intron b/c and nad5 intron 1) revealed restriction differences after digestion with Sau3AI (PCR-RFLP). A fourth locus, the nad4L-orf25 intergenic region, showed a multibanding pattern for most of the spruce species, suggesting a possible gene duplication. Maternal inheritance, expected for mtDNA in conifers, was observed for all polymorphic markers except the intergenic region nad4L-orf25. Pooling of the variation observed with the remaining three markers resulted in two to six different mtDNA haplotypes within the different species of Picea. Evidence for intra-genomic recombination was observed in at least two taxa. Thus, these mitotypes are likely to be more informative than single-locus haplotypes. They should be particularly useful for the study of biogeography and the dynamics of hybrid zones.
Alsmadi, Osama; Hebbar, Prashantha; Antony, Dinu; Behbehani, Kazem; Thanaraj, Thangavel Alphonse
2014-01-01
The population of the State of Kuwait is composed of three genetic subgroups of inferred Persian, Saudi Arabian tribe and Bedouin ancestry. The Saudi Arabian tribe subgroup traces its origin to the Najd region of Saudi Arabia. By sequencing two whole genomes and thirteen exomes from this subgroup at high coverage (>40X), we identify 4,950,724 Single Nucleotide Polymorphisms (SNPs), 515,802 indels and 39,762 structural variations. Of the identified variants, 10,098 (8.3%) exomic SNPs, 139,923 (2.9%) non-exomic SNPs, 5,256 (54.3%) exomic indels, and 374,959 (74.08%) non-exomic indels are ‘novel’. Up to 8,070 (79.9%) of the reported novel biallelic exomic SNPs are seen in low frequency (minor allele frequency <5%). We observe 5,462 known and 1,004 novel potentially deleterious nonsynonymous SNPs. Allele frequencies of common SNPs from the 15 exomes are significantly correlated with those from genotype data of a larger cohort of 48 individuals (Pearson correlation coefficient, 0.91; p < 2.2×10^-16). A set of 2,485 SNPs show significantly different allele frequencies when compared to populations from other continents. Two notable variants having risk alleles in high frequencies in this subgroup are: a nonsynonymous deleterious SNP (rs2108622 [19:g.15990431C>T] from CYP4F2 gene [MIM:*604426]) associated with warfarin dosage levels [MIM:#122700] required to elicit normal anticoagulant response; and a 3′ UTR SNP (rs6151429 [22:g.51063477T>C] from ARSA gene [MIM:*607574]) associated with Metachromatic Leukodystrophy [MIM:#250100]. Hemoglobin Riyadh variant (identified for the first time in a Saudi Arabian woman) is observed in the exome data. The mitochondrial haplogroup profiles of the 15 individuals are consistent with the haplogroup diversity seen in Saudi Arabian natives, who are believed to have received substantial gene flow from Africa and eastern provenance.
We present the first genome resource imperative for designing future genetic studies in Saudi Arabian tribe subgroup. The full-length genome sequences and the identified variants are available at ftp://dgr.dasmaninstitute.org and http://dgr.dasmaninstitute.org/DGR/gb.html. PMID:24896259
Dillon, Marcus M; Sung, Way; Sebra, Robert; Lynch, Michael; Cooper, Vaughn S
2017-01-01
The vast diversity in nucleotide composition and architecture among bacterial genomes may be partly explained by inherent biases in the rates and spectra of spontaneous mutations. Bacterial genomes with multiple chromosomes are relatively unusual but some are relevant to human health, none more so than the causative agent of cholera, Vibrio cholerae. Here, we present the genome-wide mutation spectra in wild-type and mismatch repair (MMR) defective backgrounds of two Vibrio species, the low-%GC squid symbiont V. fischeri and the pathogen V. cholerae, collected under conditions that greatly minimize the efficiency of natural selection. In apparent contrast to their high diversity in nature, both wild-type V. fischeri and V. cholerae have among the lowest rates for base-substitution mutations (bpsms) and insertion-deletion mutations (indels) that have been measured, below 10^-3/genome/generation. Vibrio fischeri and V. cholerae have distinct mutation spectra, but both are AT-biased and produce a surprising number of multi-nucleotide indels. Furthermore, the loss of a functional MMR system caused the mutation spectra of these species to converge, implying that the MMR system itself contributes to species-specific mutation patterns. Bpsm and indel rates varied among genome regions, but do not explain the more rapid evolutionary rates of genes on chromosome 2, which likely result from weaker purifying selection. More generally, the very low mutation rates of Vibrio species correlate inversely with their immense population sizes and suggest that selection may not only have maximized replication fidelity but also optimized other polygenic traits relative to the constraints of genetic drift. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Digital PCR analysis of maternal plasma for noninvasive detection of sickle cell anemia.
Barrett, Angela N; McDonnell, Thomas C R; Chan, K C Allen; Chitty, Lyn S
2012-06-01
Cell-free fetal DNA (cffDNA) constitutes approximately 10% of the cell-free DNA in maternal plasma and is a suitable source of fetal genetic material for noninvasive prenatal diagnosis (NIPD). The objective of this study was to determine the feasibility of using digital PCR for NIPD in pregnancies at risk of sickle cell anemia. Minor-groove binder (MGB) TaqMan probes were designed to discriminate between wild-type hemoglobin A and mutant (hemoglobin S) alleles encoded by the HBB (hemoglobin, beta) gene in cffDNA isolated from maternal plasma samples obtained from pregnancies at risk of sickle cell anemia. The fractional fetal DNA concentration was assessed in male-bearing pregnancies with a digital PCR assay for the Y chromosome-specific marker DYS14. In pregnancies with a female fetus, a panel of biallelic insertion/deletion polymorphism (indel) markers was developed for the quantification of the fetal DNA fraction. We used digital real-time PCR to analyze the dosage of the variant encoding hemoglobin S relative to that encoding wild-type hemoglobin A. The sickle cell genotype was correctly determined in 82% (37 of 45) of male fetuses and 75% (15 of 20) of female fetuses. Mutation status was determined correctly in 100% of the cases (25 samples) with fractional fetal DNA concentrations >7%. The panel of indels was informative in 65% of the female-bearing pregnancies. Digital PCR can be used to determine the genotype of fetuses at risk for sickle cell anemia. Optimization of the fractional fetal DNA concentration is essential. More-informative indel markers are needed for this assay's comprehensive use in cases of a female fetus.
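How a fractional fetal DNA concentration might be derived from digital PCR counts at an informative indel marker can be sketched as follows. This is a minimal illustration, assuming the usual Poisson correction for partitions and a marker at which the mother is homozygous and the fetus heterozygous; the function names and partition counts are hypothetical and are not taken from the study.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean number of target copies per partition."""
    if not 0 <= positive < total:
        raise ValueError("positive must satisfy 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def fetal_fraction(fetal_specific: float, shared: float) -> float:
    """Fetal fraction at a marker where only the fetus carries one allele.

    The fetal-specific allele represents half of the fetal DNA, so its
    share of all copies at the locus is doubled.
    """
    return 2.0 * fetal_specific / (fetal_specific + shared)


# Hypothetical counts of positive partitions for each allele.
paternal_allele = copies_per_partition(positive=30, total=765)
maternal_type_allele = copies_per_partition(positive=400, total=765)
fraction = fetal_fraction(paternal_allele, maternal_type_allele)
print(f"fractional fetal DNA concentration: {fraction:.1%}")
```

A value computed this way could then be compared with the 7% level above which the abstract reports that mutation status was always determined correctly.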
Wang, Quanxiu; Zhao, Hu; Jiang, Junpeng; Xu, Jiuyue; Xie, Weibo; Fu, Xiangkui; Liu, Chang; He, Yuqing; Wang, Gongwei
2017-01-01
The photoprotective processes conferred by nonphotochemical quenching (NPQ) serve fundamental roles in maintaining plant fitness and sustainable yield. So far, few loci have been reported to be involved in natural variation of NPQ capacity in rice (Oryza sativa), and the extents of variation explored are very limited. Here we conducted a genome-wide association study (GWAS) for NPQ capacity using a diverse worldwide collection of 529 O. sativa accessions. A total of 33 significant association loci were identified. To check the validity of the GWAS signals, three F2 mapping populations with parents selected from the association panel were constructed and assayed. All QTLs detected in mapping populations could correspond to at least one GWAS signal, indicating the GWAS results were quite reliable. OsPsbS1 was repeatedly detected and explained more than 40% of the variation in the whole association population in two years, and demonstrated to be a common major QTL in all three mapping populations derived from inter-group crosses. We revealed 43 single nucleotide polymorphisms (SNPs) and 7 insertions and deletions (InDels) within a 6,997-bp DNA fragment of OsPsbS1, but found no non-synonymous SNPs or InDels in the coding region, indicating the PsbS1 protein sequence is highly conserved. Haplotypes with the 2,674-bp insertion in the promoter region exhibited significantly higher NPQ values and higher expression levels of OsPsbS1. The OsPsbS1 RNAi plants and CRISPR/Cas9 mutants exhibited drastically decreased NPQ values. OsPsbS1 had specific and high-level expression in green tissues of rice. However, we didn't find significant function for OsPsbS2, the other rice PsbS homologue. Manipulation of the significant loci or candidate genes identified may enhance photoprotection and improve photosynthesis and yield in rice. PMID:29081789
Walter, Vonn; Patel, Nirali M.; Eberhard, David A.; Hayward, Michele C.; Salazar, Ashley H.; Jo, Heejoon; Soloway, Matthew G.; Wilkerson, Matthew D.; Parker, Joel S.; Yin, Xiaoying; Zhang, Guosheng; Siegel, Marni B.; Rosson, Gary B.; Earp, H. Shelton; Sharpless, Norman E.; Gulley, Margaret L.; Weck, Karen E.
2015-01-01
The recent FDA approval of the MiSeqDx platform provides a unique opportunity to develop targeted next generation sequencing (NGS) panels for human disease, including cancer. We have developed a scalable, targeted panel-based assay termed UNCseq, which involves a NGS panel of over 200 cancer-associated genes and a standardized downstream bioinformatics pipeline for detection of single nucleotide variations (SNV) as well as small insertions and deletions (indel). In addition, we developed a novel algorithm, NGScopy, designed for samples with sparse sequencing coverage to detect large-scale copy number variations (CNV), similar to human SNP Array 6.0 as well as small-scale intragenic CNV. Overall, we applied this assay to 100 snap-frozen lung cancer specimens lacking same-patient germline DNA (07–0120 tissue cohort) and validated our results against Sanger sequencing, SNP Array, and our recently published integrated DNA-seq/RNA-seq assay, UNCqeR, where RNA-seq of same-patient tumor specimens confirmed SNV detected by DNA-seq, if RNA-seq coverage depth was adequate. In addition, we applied the UNCseq assay on an independent lung cancer tumor tissue collection with available same-patient germline DNA (11–1115 tissue cohort) and confirmed mutations using assays performed in a CLIA-certified laboratory. We conclude that UNCseq can identify SNV, indel, and CNV in tumor specimens lacking germline DNA in a cost-efficient fashion. PMID:26076459
GAPTrap: A Simple Expression System for Pluripotent Stem Cells and Their Derivatives.
Kao, Tim; Labonne, Tanya; Niclis, Jonathan C; Chaurasia, Ritu; Lokmic, Zerina; Qian, Elizabeth; Bruveris, Freya F; Howden, Sara E; Motazedian, Ali; Schiesser, Jacqueline V; Costa, Magdaline; Sourris, Koula; Ng, Elizabeth; Anderson, David; Giudice, Antonietta; Farlie, Peter; Cheung, Michael; Lamande, Shireen R; Penington, Anthony J; Parish, Clare L; Thomson, Lachlan H; Rafii, Arash; Elliott, David A; Elefanty, Andrew G; Stanley, Edouard G
2016-09-13
The ability to reliably express fluorescent reporters or other genes of interest is important for using human pluripotent stem cells (hPSCs) as a platform for investigating cell fates and gene function. We describe a simple expression system, designated GAPTrap (GT), in which reporter genes, including GFP, mCherry, mTagBFP2, luc2, Gluc, and lacZ are inserted into the GAPDH locus in hPSCs. Independent clones harboring variations of the GT vectors expressed remarkably consistent levels of the reporter gene. Differentiation experiments showed that reporter expression was reliably maintained in hematopoietic cells, cardiac mesoderm, definitive endoderm, and ventral midbrain dopaminergic neurons. Similarly, analysis of teratomas derived from GT-lacZ hPSCs showed that β-galactosidase expression was maintained in a spectrum of cell types representing derivatives of the three germ layers. Thus, the GAPTrap vectors represent a robust and straightforward tagging system that enables indelible labeling of PSCs and their differentiated derivatives. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
2013-01-01
Background: Rice blast, caused by the fungus Magnaporthe oryzae, is an important disease in virtually every rice growing region of the world and leads to significant annual decreases in grain quality and yield. To prevent disease, resistance genes in rice have been cloned and introduced into susceptible cultivars. However, introduced resistance can often be broken within a few years of release, often due to mutation of cognate avirulence genes in fungal field populations. Results: To better understand the pattern of mutation of M. oryzae field isolates under natural selection forces, we used a next generation sequencing approach to analyze the genomes of two field isolates, FJ81278 and HN19311, as well as the transcriptome of FJ81278. By comparing the de novo genome assemblies of the two isolates against the finished reference strain 70–15, we identified extensive polymorphisms including unique genes, SNPs (single nucleotide polymorphisms) and indels, structural variations, copy number variations, and loci under strong positive selection. The 1.75 MB of isolate-specific genome content carrying 118 novel genes from FJ81278, and 0.83 MB from HN19311, were also identified. By analyzing secreted proteins carrying polymorphisms, a total of 256 candidate virulence effectors were found and 6 were chosen for functional characterization. Conclusions: We provide results from genome comparison analysis showing extensive genome variation, and generated a list of M. oryzae candidate virulence effectors for functional characterization. PMID:24341723
Hussing, C; Bytyci, R; Huber, C; Morling, N; Børsting, C
2018-05-24
Some STR loci have internal sequence variations, which are not revealed by the standard STR typing methods used in forensic genetics (PCR and fragment length analysis by capillary electrophoresis (CE)). Typing of STRs with next-generation sequencing (NGS) uncovers the sequence variation in the repeat region and in the flanking regions. In this study, 363 Danish individuals were typed for 56 STRs (26 autosomal STRs, 24 Y-STRs, and 6 X-STRs) using the ForenSeq™ DNA Signature Prep Kit to establish a Danish STR sequence database. Increased allelic diversity was observed in 34 STRs by the PCR-NGS assay. The largest increases were found in DYS389II and D12S391, where the numbers of sequenced alleles were around four times larger than the numbers of alleles determined by repeat length alone. Thirteen SNPs and one InDel were identified in the flanking regions of 12 STRs. Furthermore, 36 single positions and five longer stretches in the STR flanking regions were found to have dubious genotyping quality. The combined match probability of the 26 autosomal STRs was 10,000 times larger using the PCR-NGS assay than by using PCR-CE. The typical paternity indices for trios and duos were 500 and 100 times larger, respectively, than those obtained with PCR-CE. The assay also amplified 94 SNPs selected for human identification. Eleven of these loci were not in Hardy-Weinberg equilibrium in the Danish population, most likely because the minimum threshold for allele calling (30 reads) in the ForenSeq™ Universal Analysis Software was too low and frequent allele dropouts were not detected.
Santos, C; Fondevila, M; Ballard, D; Banemann, R; Bento, A M; Børsting, C; Branicki, W; Brisighelli, F; Burrington, M; Capal, T; Chaitanya, L; Daniel, R; Decroyer, V; England, R; Gettings, K B; Gross, T E; Haas, C; Harteveld, J; Hoff-Olsen, P; Hoffmann, A; Kayser, M; Kohler, P; Linacre, A; Mayr-Eduardoff, M; McGovern, C; Morling, N; O'Donnell, G; Parson, W; Pascali, V L; Porto, M J; Roseth, A; Schneider, P M; Sijen, T; Stenzl, V; Court, D Syndercombe; Templeton, J E; Turanska, M; Vallone, P M; Oorschot, R A H van; Zatkalikova, L; Carracedo, Á; Phillips, C
2015-11-01
There is increasing interest in forensic ancestry tests, which are part of a growing number of DNA analyses that can enhance routine profiling by obtaining additional genetic information about unidentified DNA donors. Nearly all ancestry tests use single nucleotide polymorphisms (SNPs), but these currently rely on SNaPshot single base extension chemistry that can fail to detect mixed DNA. Insertion-deletion polymorphism (Indel) tests have been developed using dye-labeled primers that allow direct capillary electrophoresis detection of PCR products (PCR-to-CE). PCR-to-CE maintains the direct relationship between input DNA and signal strength as each marker is detected with a single dye, so mixed DNA is more reliably detected. We report the results of a collaborative inter-laboratory exercise of 19 participants (15 from the EDNAP European DNA Profiling group) that assessed a 34-plex SNP test using SNaPshot and a 46-plex Indel test using PCR-to-CE. Laboratories were asked to type five samples with different ancestries and detect an additional mixed DNA sample. Statistical inference of ancestry was made by participants using the Snipper online Bayes analysis portal plus an optional PCA module that analyzes the genotype data alongside calculation of Bayes likelihood ratios. Exercise results indicated consistent genotyping performance from both tests, reaching a particularly high level of reliability for the Indel test. SNP genotyping gave 93.5% concordance (compared to the organizing laboratory's data) that rose to 97.3% excluding one laboratory with a large number of miscalled genotypes. Indel genotyping gave a higher concordance rate of 99.8% and a reduced no-call rate compared to SNP analysis. 
All participants detected the mixture from their Indel peak height data and successfully assigned the correct ancestry to the other samples using Snipper, with the exception of one laboratory with SNP miscalls that incorrectly assigned ancestry of two samples and did not obtain informative likelihood ratios for a third. Therefore, successful ancestry assignments were achieved by participants in 92 of 95 Snipper analyses. This exercise demonstrates that ancestry inference tests based on binary marker sets can be readily adopted by laboratories that already have well-established CE regimes in place. The Indel test proved to be easy to use and allowed all exercise participants to detect the DNA mixture as well as achieving complete and concordant profiles in nearly all cases. Lastly, two participants successfully ran parallel next-generation sequencing analyses (each using different systems) and achieved high levels of genotyping concordance using the exercise PCR primer mixes unmodified. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Zheng, Jun; Liu, Hong; Wang, Yuquan; Wang, Lanfen; Chang, Xiaoping; Jing, Ruilian; Hao, Chenyang; Zhang, Xueyong
2014-01-01
In this study, TaTEF-7A, a member of the transcript elongation factor gene family, and its flanking sequences were isolated. TaTEF-7A was located on chromosome 7A and was flanked by markers Xwmc83 and XP3156.3. Subcellular localization revealed that TaTEF-7A protein was localized in the nucleus. This gene was expressed in all organs, but the highest expression occurred in young spikes and developing seeds. Overexpression of TaTEF-7A in Arabidopsis thaliana produced pleiotropic effects on vegetative and reproductive development that enhanced grain length, silique number, and silique length. No diversity was found in the coding region of TaTEF-7A, but 16 single nucleotide polymorphisms and Indels were detected in the promoter regions of different cultivars. Markers based on sequence variations in the promoter regions (InDel-629 and InDel-604) were developed, and three haplotypes were identified based on those markers. Haplotype–trait association analysis of the Chinese wheat mini core collection revealed that TaTEF-7A was significantly associated with grain number per spike. Phenotyping of near-isogenic lines (NILs) confirmed that TaTEF-7A increases potential grain yield and yield-related traits. Frequency changes in favoured haplotypes gradually increased in cultivars released in China from the 1940s. Geographic distributions of favoured haplotypes were characterized in six major wheat production regions worldwide. The presence of Hap-7A-3, the favoured haplotype, showed a positive correlation with yield in a global set of breeding lines. These results suggest that TaTEF-7A is a functional regulatory factor for grain number per spike and provide a basis for marker-assisted selection. PMID:25056774
Alvarado, David M; Yang, Ping; Druley, Todd E; Lovett, Michael; Gurnett, Christina A
2014-06-01
Despite declining sequencing costs, few methods are available for cost-effective single-nucleotide polymorphism (SNP), insertion/deletion (INDEL) and copy number variation (CNV) discovery in a single assay. Commercially available methods require a high investment to a specific region and are only cost-effective for large samples. Here, we introduce a novel, flexible approach for multiplexed targeted sequencing and CNV analysis of large genomic regions called multiplexed direct genomic selection (MDiGS). MDiGS combines biotinylated bacterial artificial chromosome (BAC) capture and multiplexed pooled capture for SNP/INDEL and CNV detection of 96 multiplexed samples on a single MiSeq run. MDiGS is advantageous over other methods for CNV detection because pooled sample capture and hybridization to large contiguous BAC baits reduces sample and probe hybridization variability inherent in other methods. We performed MDiGS capture for three chromosomal regions consisting of ∼ 550 kb of coding and non-coding sequence with DNA from 253 patients with congenital lower limb disorders. PITX1 nonsense and HOXC11 S191F missense mutations were identified that segregate in clubfoot families. Using a novel pooled-capture reference strategy, we identified recurrent chromosome chr17q23.1q23.2 duplications and small HOXC 5' cluster deletions (51 kb and 12 kb). Given the current interest in coding and non-coding variants in human disease, MDiGS fulfills a niche for comprehensive and low-cost evaluation of CNVs, coding, and non-coding variants across candidate regions of interest. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Hope, Trust, and Dreaming Big: Student-Athlete Identity and Athletic Divisional Reclassification
ERIC Educational Resources Information Center
Kissinger, Daniel B.; Newman, Richard E.; Miller, Michael T.
2015-01-01
Intercollegiate athletics are an indelible aspect of American higher education, and many collegiate athletes, particularly those at the Division I level, view their college careers as an extension of and springboard toward a professional sports career. This study is based on a series of semi-structured interviews with men's athletic administrators…
[Genetic polymorphism and forensic application of 30 InDel loci of Han population in Beijing].
Bai, Ru-Feng; Jiang, Li-Zhe; Zhang, Zhong; Shi, Mei-Sen
2013-12-01
To study the genetic diversity of 30 insertion-deletion (InDel) polymorphism loci in the Han population of Beijing, and to evaluate their forensic application, 210 unrelated healthy Han individuals from Beijing were genotyped with the Investigator DIP system to determine the allele frequency distributions. The PCR products were detected with an ABI 3130 XL Genetic Analyzer. Forensic parameters were calculated with relevant statistical analysis software. After Bonferroni correction at a 95% significance level, there were no significant departures from Hardy-Weinberg equilibrium and no significant linkage disequilibrium between the loci. The power of discrimination (DP) varied between 0.2690 (HLD118) and 0.6330 (HLD45), and the combined discrimination power (TDP) for the 30 InDel loci was 0.999999999985. The combined power of exclusion was 0.98771049 in trio cases (CPE(trio)) and 0.94579456 in duo cases (CPE(duo)). Parentage testing of 32 cases revealed no mutations at the 30 InDel loci. Multiplex detection of the 30 InDel loci revealed a highly polymorphic genetic distribution in the Beijing Han population; the panel represents a complementary tool in human identification studies, especially in challenging DNA cases.
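As a rough illustration of how a combined discrimination power relates to per-locus values, the sketch below uses the standard product rule, TDP = 1 - prod(1 - DP_i), which assumes the loci are independent. The abstract does not state which formula or software the authors used, so the function name and the example values other than the two reported DP extremes are hypothetical.

```python
from math import prod


def combined_discrimination_power(per_locus_dp):
    """Combine per-locus DP values, assuming independent loci.

    Uses the product rule: TDP = 1 - prod(1 - DP_i).
    """
    values = list(per_locus_dp)
    if any(not 0.0 <= dp <= 1.0 for dp in values):
        raise ValueError("each DP must lie in [0, 1]")
    return 1.0 - prod(1.0 - dp for dp in values)


# Illustrative input only: the two reported extremes plus one made-up value.
example_dp = [0.2690, 0.6330, 0.5]
print(combined_discrimination_power(example_dp))
```

With 30 loci whose DP values mostly exceed 0.5, the product term shrinks quickly, which is how a panel of individually weak biallelic markers can reach a combined value very close to 1.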
40 CFR 63.11588 - What definitions apply to this subpart?
Code of Federal Regulations, 2013 CFR
2013-07-01
...-containing products or intermediates other than indelible ink, India ink, writing ink, and stamp pad ink. Indelible ink, India ink, writing ink, and stamp pad ink manufacturing operations are subject to regulation...
40 CFR 63.11588 - What definitions apply to this subpart?
Code of Federal Regulations, 2014 CFR
2014-07-01
...-containing products or intermediates other than indelible ink, India ink, writing ink, and stamp pad ink. Indelible ink, India ink, writing ink, and stamp pad ink manufacturing operations are subject to regulation...
Barrett, Angela N; Xiong, Li; Tan, Tuan Z; Advani, Henna V; Hua, Rui; Laureano-Asibal, Cecille; Soong, Richie; Biswas, Arijit; Nagarajan, Niranjan; Choolani, Mahesh
2017-01-01
Cell-free DNA from maternal plasma can be used for non-invasive prenatal testing for aneuploidies and single gene disorders, and also has applications as a biomarker for monitoring high-risk pregnancies, such as those at risk of pre-eclampsia. On average, the fractional cell-free fetal DNA concentration in plasma is approximately 15%, but can vary from less than 4% to greater than 30%. Although quantification of cell-free fetal DNA is straightforward in the case of a male fetus, there is no universal fetal marker; in a female fetus measurement is more challenging. We have developed a panel of multiplexed insertion/deletion polymorphisms that can measure fetal fraction in all pregnancies in a simple, targeted sequencing reaction. A multiplex panel of primers was designed for 35 indels plus a ZFX/ZFY amplicon. cfDNA was extracted from plasma from 157 pregnant women, and maternal genomic DNA was extracted for 20 of these samples for panel validation. Sixty-one samples from pregnancies with a male fetus were subjected to whole genome sequencing on the Ion Proton sequencing platform, and fetal fraction derived from Y chromosome counts was compared to fetal fraction measured using the indel panel. A total of 157 cell-free DNA samples were sequenced using the indel panel, and informativity was assessed, along with the proportion of fetal DNA. Using gDNA we optimised the indel panel, removing amplicons giving rise to PCR bias. Good correlation was found between fetal fraction using indels and using whole genome sequencing of the Y chromosome (Spearman's r = 0.69). A median of 12 indels were informative per sample. The indel panel was informative in 157/157 cases (mean fetal fraction 14.4% (±0.58%)). Using our targeted next generation sequencing panel we can readily assess the fetal DNA percentage in male and female pregnancies.
Xiong, Li; Tan, Tuan Z.; Advani, Henna V.; Hua, Rui; Laureano-Asibal, Cecille; Soong, Richie; Biswas, Arijit; Nagarajan, Niranjan; Choolani, Mahesh
2017-01-01
Objective Cell-free DNA from maternal plasma can be used for non-invasive prenatal testing for aneuploidies and single gene disorders, and also has applications as a biomarker for monitoring high-risk pregnancies, such as those at risk of pre-eclampsia. On average, the fractional cell-free fetal DNA concentration in plasma is approximately 15%, but can vary from less than 4% to greater than 30%. Although quantification of cell-free fetal DNA is straightforward in the case of a male fetus, there is no universal fetal marker; in a female fetus measurement is more challenging. We have developed a panel of multiplexed insertion/deletion polymorphisms that can measure fetal fraction in all pregnancies in a simple, targeted sequencing reaction. Methods A multiplex panel of primers was designed for 35 indels plus a ZFX/ZFY amplicon. cfDNA was extracted from plasma from 157 pregnant women, and maternal genomic DNA was extracted for 20 of these samples for panel validation. Sixty-one samples from pregnancies with a male fetus were subjected to whole genome sequencing on the Ion Proton sequencing platform, and fetal fraction derived from Y chromosome counts was compared to fetal fraction measured using the indel panel. A total of 157 cell-free DNA samples were sequenced using the indel panel, and informativity was assessed, along with the proportion of fetal DNA. Results Using gDNA we optimised the indel panel, removing amplicons giving rise to PCR bias. Good correlation was found between fetal fraction using indels and using whole genome sequencing of the Y chromosome (Spearman's r = 0.69). A median of 12 indels were informative per sample. The indel panel was informative in 157/157 cases (mean fetal fraction 14.4% (±0.58%)). Conclusions Using our targeted next generation sequencing panel we can readily assess the fetal DNA percentage in male and female pregnancies. PMID:29084245
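The abstract does not give the calculation used to turn indel read counts into a fetal fraction. A minimal sketch of one commonly used approach is shown below, under the assumption that an informative locus is one where the mother is homozygous and the fetus carries one paternally inherited allele, so the fetal-specific allele makes up half of the fetal DNA at that locus. The function names, the median summary, and the read counts are all hypothetical.

```python
from statistics import median


def locus_fetal_fraction(maternal_allele_reads, fetal_specific_reads):
    """Estimate fetal fraction at one informative locus.

    Assumes a homozygous mother and a heterozygous fetus, so the
    fetal-specific allele accounts for half of the fetal DNA:
    FF = 2 * fetal_specific / (maternal_allele + fetal_specific).
    """
    total = maternal_allele_reads + fetal_specific_reads
    if total <= 0:
        raise ValueError("locus has no reads")
    return 2.0 * fetal_specific_reads / total


def panel_fetal_fraction(informative_loci):
    """Summarise per-locus estimates with the median (an assumed choice)."""
    estimates = [locus_fetal_fraction(m, f) for m, f in informative_loci]
    if not estimates:
        raise ValueError("no informative loci")
    return median(estimates)


# Hypothetical (maternal allele reads, fetal-specific reads) per locus.
loci = [(930, 70), (950, 50), (900, 100)]
print(panel_fetal_fraction(loci))
```

A median is used here only because it is robust to a single amplicon with PCR bias; the published panel may summarise loci differently.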
Kuwaiti population subgroup of nomadic Bedouin ancestry—Whole genome sequence and analysis
John, Sumi Elsa; Thareja, Gaurav; Hebbar, Prashantha; Behbehani, Kazem; Thanaraj, Thangavel Alphonse; Alsmadi, Osama
2014-01-01
The native Kuwaiti population comprises three distinct genetic subgroups: Persian, “city-dwelling” Saudi Arabian tribe, and nomadic “tent-dwelling” Bedouin ancestry. The Bedouin subgroup is characterized by the presence of 17% African ancestry; it owes its origin to nomadic tribes of the deserts of the Arabian Peninsula and North Africa. By sequencing the whole genome of a Kuwaiti male from this subgroup at 41X coverage, we report 3,752,878 SNPs, 411,839 indels, and 8451 structural variations. A neighbor-joining tree, based on shared variant positions carrying disease-risk alleles between the Bedouin and other continental genomes, places the Bedouin genome at the nexus of African, Asian, and European genomes, in concordance with the geographical location of Kuwait and the Peninsula. In congruence with the participant's medical history of morbid obesity and bronchial asthma, risk alleles are seen at deleterious SNPs associated with obesity and asthma. Many of the observed deleterious ‘novel’ variants lie in genes associated with autosomal recessive disorders characteristic of the region. PMID:26484159
On the inversion-indel distance
2013-01-01
Background The inversion distance, that is the distance between two unichromosomal genomes with the same content allowing only inversions of DNA segments, can be computed thanks to a pioneering approach of Hannenhalli and Pevzner in 1995. In 2000, El-Mabrouk extended the inversion model to allow the comparison of unichromosomal genomes with unequal contents, thus insertions and deletions of DNA segments besides inversions. However, an exact algorithm was presented only for the case in which we have insertions alone and no deletion (or vice versa), while a heuristic was provided for the symmetric case, that allows both insertions and deletions and is called the inversion-indel distance. In 2005, Yancopoulos, Attie and Friedberg started a new branch of research by introducing the generic double cut and join (DCJ) operation, that can represent several genome rearrangements (including inversions). Among others, the DCJ model gave rise to two important results. First, it has been shown that the inversion distance can be computed in a simpler way with the help of the DCJ operation. Second, the DCJ operation originated the DCJ-indel distance, that allows the comparison of genomes with unequal contents, considering DCJ, insertions and deletions, and can be computed in linear time. Results In the present work we put these two results together to solve an open problem, showing that, when the graph that represents the relation between the two compared genomes has no bad components, the inversion-indel distance is equal to the DCJ-indel distance. We also give a lower and an upper bound for the inversion-indel distance in the presence of bad components. PMID:24564182
Etard, Christelle; Joshi, Swarnima; Stegmaier, Johannes; Mikut, Ralf; Strähle, Uwe
2017-12-01
A bottleneck in CRISPR/Cas9 genome editing is variable efficiencies of in silico-designed gRNAs. We evaluated the sensitivity of the TIDE method (Tracking of Indels by DEcomposition) introduced by Brinkman et al. in 2014 for assessing the cutting efficiencies of gRNAs in zebrafish. We show that this simple method, which involves bulk polymerase chain reaction amplification and Sanger sequencing, is highly effective in tracking well-performing gRNAs in pools of genomic DNA derived from injected embryos. The method is equally effective for tracing INDELs in heterozygotes.
Santurtún, Ana; Riancho, José A; Arozamena, Jana; López-Duarte, Mónica; Zarrabeitia, María T
2017-01-01
Several methods have been developed to determine genetic profiles from mixed samples and to analyze chimerism in transplanted patients. The aim of this study was to explore the effectiveness of droplet digital PCR (ddPCR) for detecting mixed chimerism (a mixture of genetic profiles resulting from allogeneic hematopoietic stem cell transplantation (HSCT)). We analyzed 25 DNA samples from patients who had undergone HSCT and compared the performance of ddPCR with two established methods for chimerism detection, based on Indel and STR analysis, respectively. Additionally, eight artificial DNA mixtures were created to evaluate the sensitivity of ddPCR. Our results show that the chimerism percentages estimated by the analysis of a single Indel using ddPCR were very similar to those calculated by the amplification of 15 STRs (r² = 0.970) and to the results obtained by the amplification of 38 Indels (r² = 0.975). Moreover, the amplification of a single Indel by ddPCR was sensitive enough to detect a minor DNA contributor comprising as little as 0.5% of the sample. We conclude that ddPCR can be a powerful tool for determining the genetic profile of forensic mixtures and for clinical chimerism analysis when traditional techniques are not sensitive enough.
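The abstract does not describe how droplet counts were converted to chimerism percentages. The sketch below shows the usual Poisson correction applied in ddPCR quantification, followed by a simple ratio. It assumes the two contributors are each homozygous for a different allele of the indel, so one assay channel reports each contributor; the function names and droplet counts are hypothetical.

```python
import math


def copies_per_droplet(positive_droplets, total_droplets):
    """Poisson-corrected mean number of target copies per droplet.

    lambda = -ln(1 - positive / total); undefined when every droplet
    is positive, so that case is rejected.
    """
    if total_droplets <= 0 or not 0 <= positive_droplets < total_droplets:
        raise ValueError("need 0 <= positive_droplets < total_droplets")
    return -math.log(1.0 - positive_droplets / total_droplets)


def minor_contributor_percent(minor_positive, major_positive, total_droplets):
    """Percentage of target copies that come from the minor contributor."""
    minor = copies_per_droplet(minor_positive, total_droplets)
    major = copies_per_droplet(major_positive, total_droplets)
    if minor + major == 0:
        raise ValueError("no positive droplets in either channel")
    return 100.0 * minor / (minor + major)


# Hypothetical droplet counts for one well.
print(minor_contributor_percent(minor_positive=60,
                                major_positive=9000,
                                total_droplets=15000))
```

The Poisson step matters most for the abundant allele, where many droplets hold more than one copy and a raw positive-droplet ratio would overstate the minor contributor.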
Fundamental Bounds for Sequence Reconstruction from Nanopore Sequencers.
Magner, Abram; Duda, Jarosław; Szpankowski, Wojciech; Grama, Ananth
2016-06-01
Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to accurately reconstruct the true sequence with high probability? Our results provide a number of important insights: (i) the probability of accurate reconstruction of a sequence from a single sample in the presence of indel errors tends quickly (i.e., exponentially) to zero as the length of the sequence increases; and (ii) replicated extrusion is an effective technique for accurate reconstruction. We show that for typical distributions of indel errors, the required number of replicas is a slow function (polylogarithmic) of sequence length - implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Moreover, we show that in certain cases, the required number of replicas can be related to information-theoretic parameters of the indel error distributions.
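To make the exponential decay in result (i) concrete, the sketch below uses a deliberately simplified model that is not taken from the paper: each base independently suffers an indel with a fixed probability, and a read counts as accurate only if it contains no indel at all. The function name and the rate of 0.05 are illustrative assumptions.

```python
def prob_indel_free_read(sequence_length, indel_rate):
    """Probability that a single read contains no indel.

    Simplified i.i.d. model: each of the `sequence_length` positions
    independently suffers an indel with probability `indel_rate`.
    """
    if sequence_length < 0:
        raise ValueError("sequence_length must be non-negative")
    if not 0.0 <= indel_rate <= 1.0:
        raise ValueError("indel_rate must lie in [0, 1]")
    return (1.0 - indel_rate) ** sequence_length


# Illustrative per-base indel rate; the decay is geometric in the length.
for length in (10, 100, 1000):
    print(length, prob_indel_free_read(length, 0.05))
```

Under this toy model the single-read success probability falls geometrically with length, which is the intuition behind combining multiple replicas of the same strand.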
Development of Genetic Markers in Eucalyptus Species by Target Enrichment and Exome Sequencing
Dasgupta, Modhumita Ghosh; Dharanishanthi, Veeramuthu; Agarwal, Ishangi; Krutovsky, Konstantin V.
2015-01-01
The advent of next-generation sequencing has facilitated large-scale discovery, validation and assessment of genetic markers for high density genotyping. The present study was undertaken to identify markers in genes supposedly related to wood property traits in three Eucalyptus species. Ninety-four genes involved in xylogenesis were selected for hybridization probe-based nuclear genomic DNA target enrichment and exome sequencing. Genomic DNA was isolated from leaf tissue and used for on-array probe hybridization followed by Illumina sequencing. The raw sequence reads were trimmed, high-quality reads were mapped to the E. grandis reference sequence, and single nucleotide variants (SNVs) and insertions/deletions (InDels) were identified across the three species. The average read coverage was 216X, and a total of 2294 SNVs and 479 InDels were discovered in E. camaldulensis, 2383 SNVs and 518 InDels in E. tereticornis, and 1228 SNVs and 409 InDels in E. grandis. Additionally, SNV calling and InDel detection were conducted in pair-wise comparisons of E. tereticornis vs. E. grandis, E. camaldulensis vs. E. tereticornis and E. camaldulensis vs. E. grandis. This study presents an efficient and high-throughput method for the development of genetic markers for family-based QTL and association analysis in Eucalyptus. PMID:25602379
Ragupathy, Raja; Naeem, Hamid A; Reimer, Elsa; Lukow, Odean M; Sapirstein, Harry D; Cloutier, Sylvie
2008-01-01
Sequencing of a BAC clone encompassing the Glu-B1 locus in Glenlea revealed a 10.3 Kb segmental duplication including the Bx7 gene and flanking an LTR retroelement. To better understand the evolution of this locus, two collections of wheat were surveyed. The first consisted of 96 diploid and tetraploid species accessions while the second consisted of 316 Triticum aestivum cultivars and landraces from 41 countries. The genotypes were first characterized by SDS-PAGE, and a total of 40 of the 316 T. aestivum accessions were found to display the overexpressed Bx7 phenotype (Bx7OE). Three lines from the 96 diploid/tetraploid collection also displayed the stronger intensity staining characteristic of the Bx7OE subunit. The relative amounts of the Bx7 subunit to total HMW-GS were quantified by RP-HPLC for all Bx7OE accessions and a number of checks. The entire collection was assessed for the presence of four DNA markers, namely an 18 bp indel of the coding region of Bx7 variant alleles, a 43 bp indel of the 5'-region, and the left and right junctions of the LTR retrotransposon borders and the duplicated segment. All 43 accessions found to have the Bx7OE subunit by SDS-PAGE and RP-HPLC produced the four diagnostic PCR amplicons. None of the lines without the Bx7OE had the LTR retroelement/duplication genomic structure. However, the 18 and 43 bp indels were found in accessions other than Bx7OE. These results indicate that the overexpression of the Bx7 HMW-GS is likely the result of a single event, i.e., a gene duplication at the Glu-B1 locus mediated by the insertion of a retroelement. Also, the 18 and 43 bp indels pre-date the duplication event. Allelic variants Bx7*, Bx7 with and without the 43 bp insert, and Bx7OE were found in both tetraploid and hexaploid collections and shared the same genomic organization. Though the possibility of introgression from T. aestivum to T. turgidum cannot be ruled out, the three structural genomic changes of the B-genome taken together support the hypothesis of multiple polyploidization events involving different tetraploid progenitors.
Garcia-Lor, Andres; Curk, Franck; Snoussi-Trifa, Hager; Morillon, Raphael; Ancillo, Gema; Luro, François; Navarro, Luis; Ollitrault, Patrick
2013-01-01
Background and Aims Despite differences in morphology, the genera representing ‘true citrus fruit trees’ are sexually compatible, and their phylogenetic relationships remain unclear. Most of the important commercial ‘species’ of Citrus are believed to be of interspecific origin. By studying polymorphisms of 27 nuclear genes, the average molecular differentiation between species was estimated and some phylogenetic relationships between ‘true citrus fruit trees’ were clarified. Methods Sanger sequencing of PCR-amplified fragments from 18 genes involved in metabolite biosynthesis pathways and nine putative genes for salt tolerance was performed for 45 genotypes of Citrus and relatives of Citrus to mine single nucleotide polymorphisms (SNPs) and indel polymorphisms. Fifty nuclear simple sequence repeats (SSRs) were also analysed. Key Results A total of 16 238 kb of DNA was sequenced for each genotype, and 1097 single nucleotide polymorphisms (SNPs) and 50 indels were identified. These polymorphisms were more valuable than SSRs for inter-taxon differentiation. Nuclear phylogenetic analysis revealed that Citrus reticulata and Fortunella form a cluster that is differentiated from the clade that includes three other basic taxa of cultivated citrus (C. maxima, C. medica and C. micrantha). These results confirm the taxonomic subdivision between the subgenera Metacitrus and Archicitrus. A few genes displayed positive selection patterns within or between species, but most of them displayed neutral patterns. The phylogenetic inheritance patterns of the analysed genes were inferred for commercial Citrus spp. Conclusions Numerous molecular polymorphisms (SNPs and indels), which are potentially useful for the analysis of interspecific genetic structures, have been identified. The nuclear phylogenetic network for Citrus and its sexually compatible relatives was consistent with the geographical origins of these genera. 
The positive selection observed for a few genes will help further works to analyse the molecular basis of the variability of the associated traits. This study presents new insights into the origin of C. sinensis. PMID:23104641
Garcia-Lor, Andres; Curk, Franck; Snoussi-Trifa, Hager; Morillon, Raphael; Ancillo, Gema; Luro, François; Navarro, Luis; Ollitrault, Patrick
2013-01-01
Despite differences in morphology, the genera representing 'true citrus fruit trees' are sexually compatible, and their phylogenetic relationships remain unclear. Most of the important commercial 'species' of Citrus are believed to be of interspecific origin. By studying polymorphisms of 27 nuclear genes, the average molecular differentiation between species was estimated and some phylogenetic relationships between 'true citrus fruit trees' were clarified. Sanger sequencing of PCR-amplified fragments from 18 genes involved in metabolite biosynthesis pathways and nine putative genes for salt tolerance was performed for 45 genotypes of Citrus and relatives of Citrus to mine single nucleotide polymorphisms (SNPs) and indel polymorphisms. Fifty nuclear simple sequence repeats (SSRs) were also analysed. A total of 16 238 kb of DNA was sequenced for each genotype, and 1097 single nucleotide polymorphisms (SNPs) and 50 indels were identified. These polymorphisms were more valuable than SSRs for inter-taxon differentiation. Nuclear phylogenetic analysis revealed that Citrus reticulata and Fortunella form a cluster that is differentiated from the clade that includes three other basic taxa of cultivated citrus (C. maxima, C. medica and C. micrantha). These results confirm the taxonomic subdivision between the subgenera Metacitrus and Archicitrus. A few genes displayed positive selection patterns within or between species, but most of them displayed neutral patterns. The phylogenetic inheritance patterns of the analysed genes were inferred for commercial Citrus spp. Numerous molecular polymorphisms (SNPs and indels), which are potentially useful for the analysis of interspecific genetic structures, have been identified. The nuclear phylogenetic network for Citrus and its sexually compatible relatives was consistent with the geographical origins of these genera. 
The positive selection observed for a few genes will help further works to analyse the molecular basis of the variability of the associated traits. This study presents new insights into the origin of C. sinensis.
Singh, Jaya; Mishra, Avshesh; Pandian, Arunachalam Jayamuruga; Mallipatna, Ashwin C.; Khetan, Vikas; Sripriya, S.; Kapoor, Suman; Agarwal, Smita; Sankaran, Satish; Katragadda, Shanmukh; Veeramachaneni, Vamsi; Hariharan, Ramesh; Subramanian, Kalyanasundaram
2016-01-01
Purpose Retinoblastoma (Rb) is the most common primary intraocular cancer of childhood and one of the major causes of blindness in children. India has the highest number of patients with Rb in the world. Mutations in the RB1 gene are the primary cause of Rb, and heterogeneous mutations are distributed throughout the entire length of the gene. Therefore, genetic testing requires screening of the entire gene, which by conventional sequencing is time consuming and expensive. Methods In this study, we screened the RB1 gene in the DNA isolated from blood or saliva samples of 50 unrelated patients with Rb using the TruSight Cancer panel. Next-generation sequencing (NGS) was done on the Illumina MiSeq platform. Genetic variations were identified using the Strand NGS software and interpreted using the StrandOmics platform. Results We were able to detect germline pathogenic mutations in 66% (33/50) of the cases, 12 of which were novel. We were able to detect all types of mutations, including missense, nonsense, splice site, indel, and structural variants. When we considered bilateral Rb cases only, the mutation detection rate increased to 100% (22/22). In unilateral Rb cases, the mutation detection rate was 30% (6/20). Conclusions Our study suggests that NGS-based approaches increase the sensitivity of mutation detection in the RB1 gene, making it fast and cost-effective compared to the conventional tests performed in a reflex-testing mode. PMID:27582626
Evolution of Protein Domain Repeats in Metazoa
Schüler, Andreas; Bornberg-Bauer, Erich
2016-01-01
Repeats are ubiquitous elements of proteins and they play important roles for cellular function and during evolution. Repeats are, however, also notoriously difficult to capture computationally, and large-scale studies have so far had difficulties in linking genetic causes, structural properties and evolutionary trajectories of protein repeats. Here we apply recently developed methods for repeat detection and analysis to a large dataset comprising over a hundred metazoan genomes. We find that repeats in larger protein families generally experience very few insertions or deletions (indels) of repeat units but there is also a significant fraction of noteworthy volatile outliers with very high indel rates. Analysis of structural data indicates that repeats with an open structure and independently folding units are more volatile and more likely to be intrinsically disordered. Such disordered repeats are also significantly enriched in sites with a high functional potential such as linear motifs. Furthermore, the most volatile repeats have a high sequence similarity between their units. Since many volatile repeats also show signs of recombination, we conclude they are often shaped by concerted evolution. Intriguingly, many of these conserved yet volatile repeats are involved in host-pathogen interactions where they might foster fast but subtle adaptation in biological arms races. Key Words: protein evolution, domain rearrangements, protein repeats, concerted evolution. PMID:27671125
Stavropoulos, Dimitri J; Merico, Daniele; Jobling, Rebekah; Bowdin, Sarah; Monfared, Nasim; Thiruvahindrapuram, Bhooma; Nalpathamkalam, Thomas; Pellecchia, Giovanna; Yuen, Ryan K C; Szego, Michael J; Hayeems, Robin Z; Shaul, Randi Zlotnik; Brudno, Michael; Girdea, Marta; Frey, Brendan; Alipanahi, Babak; Ahmed, Sohnee; Babul-Hirji, Riyana; Porras, Ramses Badilla; Carter, Melissa T; Chad, Lauren; Chaudhry, Ayeshah; Chitayat, David; Doust, Soghra Jougheh; Cytrynbaum, Cheryl; Dupuis, Lucie; Ejaz, Resham; Fishman, Leona; Guerin, Andrea; Hashemi, Bita; Helal, Mayada; Hewson, Stacy; Inbar-Feigenberg, Michal; Kannu, Peter; Karp, Natalya; Kim, Raymond H; Kronick, Jonathan; Liston, Eriskay; MacDonald, Heather; Mercimek-Mahmutoglu, Saadet; Mendoza-Londono, Roberto; Nasr, Enas; Nimmo, Graeme; Parkinson, Nicole; Quercia, Nada; Raiman, Julian; Roifman, Maian; Schulze, Andreas; Shugar, Andrea; Shuman, Cheryl; Sinajon, Pierre; Siriwardena, Komudi; Weksberg, Rosanna; Yoon, Grace; Carew, Chris; Erickson, Raith; Leach, Richard A; Klein, Robert; Ray, Peter N; Meyn, M Stephen; Scherer, Stephen W; Cohn, Ronald D; Marshall, Christian R
2016-01-01
The standard of care for first-tier clinical investigation of the aetiology of congenital malformations and neurodevelopmental disorders is chromosome microarray analysis (CMA) for copy-number variations (CNVs), often followed by gene(s)-specific sequencing searching for smaller insertion–deletions (indels) and single-nucleotide variant (SNV) mutations. Whole-genome sequencing (WGS) has the potential to capture all classes of genetic variation in one experiment; however, the diagnostic yield for mutation detection of WGS compared to CMA, and other tests, needs to be established. In a prospective study we utilised WGS and comprehensive medical annotation to assess 100 patients referred to a paediatric genetics service and compared the diagnostic yield versus standard genetic testing. WGS identified genetic variants meeting clinical diagnostic criteria in 34% of cases, representing a fourfold increase in diagnostic rate over CMA (8%; P value=1.42E−05) alone and more than twofold increase in CMA plus targeted gene sequencing (13%; P value=0.0009). WGS identified all rare clinically significant CNVs that were detected by CMA. In 26 patients, WGS revealed indel and missense mutations presenting in a dominant (63%) or a recessive (37%) manner. We found four subjects with mutations in at least two genes associated with distinct genetic disorders, including two cases harbouring a pathogenic CNV and SNV. When considering medically actionable secondary findings in addition to primary WGS findings, 38% of patients would benefit from genetic counselling. Clinical implementation of WGS as a primary test will provide a higher diagnostic yield than conventional genetic testing and potentially reduce the time required to reach a genetic diagnosis. PMID:28567303
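The fold increases quoted in this abstract follow directly from the reported diagnostic rates (34% for WGS, 8% for CMA alone, 13% for CMA plus targeted gene sequencing). A minimal sketch of that arithmetic is given here; the function name `fold_increase` is an illustrative assumption and is not taken from the study.

```python
def fold_increase(new_rate: float, baseline_rate: float) -> float:
    """Return how many times larger new_rate is than baseline_rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return new_rate / baseline_rate


# Diagnostic rates as reported in the abstract (fractions of the 100 patients).
wgs = 0.34
cma = 0.08
cma_plus_targeted = 0.13

print(f"WGS vs CMA alone: {fold_increase(wgs, cma):.2f}-fold")
print(f"WGS vs CMA + targeted sequencing: {fold_increase(wgs, cma_plus_targeted):.2f}-fold")
```

The first ratio works out to a little over four and the second to a little over two and a half, consistent with the "fourfold" and "more than twofold" wording.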
Stavropoulos, Dimitri J; Merico, Daniele; Jobling, Rebekah; Bowdin, Sarah; Monfared, Nasim; Thiruvahindrapuram, Bhooma; Nalpathamkalam, Thomas; Pellecchia, Giovanna; Yuen, Ryan K C; Szego, Michael J; Hayeems, Robin Z; Shaul, Randi Zlotnik; Brudno, Michael; Girdea, Marta; Frey, Brendan; Alipanahi, Babak; Ahmed, Sohnee; Babul-Hirji, Riyana; Porras, Ramses Badilla; Carter, Melissa T; Chad, Lauren; Chaudhry, Ayeshah; Chitayat, David; Doust, Soghra Jougheh; Cytrynbaum, Cheryl; Dupuis, Lucie; Ejaz, Resham; Fishman, Leona; Guerin, Andrea; Hashemi, Bita; Helal, Mayada; Hewson, Stacy; Inbar-Feigenberg, Michal; Kannu, Peter; Karp, Natalya; Kim, Raymond; Kronick, Jonathan; Liston, Eriskay; MacDonald, Heather; Mercimek-Mahmutoglu, Saadet; Mendoza-Londono, Roberto; Nasr, Enas; Nimmo, Graeme; Parkinson, Nicole; Quercia, Nada; Raiman, Julian; Roifman, Maian; Schulze, Andreas; Shugar, Andrea; Shuman, Cheryl; Sinajon, Pierre; Siriwardena, Komudi; Weksberg, Rosanna; Yoon, Grace; Carew, Chris; Erickson, Raith; Leach, Richard A; Klein, Robert; Ray, Peter N; Meyn, M Stephen; Scherer, Stephen W; Cohn, Ronald D; Marshall, Christian R
2016-01-13
The standard of care for first-tier clinical investigation of the etiology of congenital malformations and neurodevelopmental disorders is chromosome microarray analysis (CMA) for copy number variations (CNVs), often followed by gene(s)-specific sequencing searching for smaller insertion-deletions (indels) and single nucleotide variant (SNV) mutations. Whole genome sequencing (WGS) has the potential to capture all classes of genetic variation in one experiment; however, the diagnostic yield for mutation detection of WGS compared to CMA, and other tests, needs to be established. In a prospective study we utilized WGS and comprehensive medical annotation to assess 100 patients referred to a paediatric genetics service and compared the diagnostic yield versus standard genetic testing. WGS identified genetic variants meeting clinical diagnostic criteria in 34% of cases, representing a 4-fold increase in diagnostic rate over CMA (8%) (p-value = 1.42e-05) alone and >2-fold increase in CMA plus targeted gene sequencing (13%) (p-value = 0.0009). WGS identified all rare clinically significant CNVs that were detected by CMA. In 26 patients, WGS revealed indel and missense mutations presenting in a dominant (63%) or a recessive (37%) manner. We found four subjects with mutations in at least two genes associated with distinct genetic disorders, including two cases harboring a pathogenic CNV and SNV. When considering medically actionable secondary findings in addition to primary WGS findings, 38% of patients would benefit from genetic counseling. Clinical implementation of WGS as a primary test will provide a higher diagnostic yield than conventional genetic testing and potentially reduce the time required to reach a genetic diagnosis.
2012-01-01
Saccharomyces cerevisiae CEN.PK 113-7D is widely used for metabolic engineering and systems biology research in industry and academia. We sequenced, assembled, annotated and analyzed its genome. Single-nucleotide variations (SNV), insertions/deletions (indels) and differences in genome organization compared to the reference strain S. cerevisiae S288C were analyzed. In addition to a few large deletions and duplications, nearly 3000 indels were identified in the CEN.PK113-7D genome relative to S288C. These differences were overrepresented in genes whose functions are related to transcriptional regulation and chromatin remodelling. Some of these variations were caused by unstable tandem repeats, suggesting an innate evolvability of the corresponding genes. Besides a previously characterized mutation in adenylate cyclase, the CEN.PK113-7D genome sequence revealed a significant enrichment of non-synonymous mutations in genes encoding for components of the cAMP signalling pathway. Some phenotypic characteristics of the CEN.PK113-7D strains were explained by the presence of additional specific metabolic genes relative to S288C. In particular, the presence of the BIO1 and BIO6 genes correlated with a biotin prototrophy of CEN.PK113-7D. Furthermore, the copy number, chromosomal location and sequences of the MAL loci were resolved. The assembled sequence reveals that CEN.PK113-7D has a mosaic genome that combines characteristics of laboratory strains and wild-industrial strains. PMID:22448915
Xu, Qin; Xiong, Guanjun; Li, Pengbo; He, Fei; Huang, Yi; Wang, Kunbo; Li, Zhaohu; Hua, Jinping
2012-01-01
Background Cotton (Gossypium spp.) is a model system for the analysis of polyploidization. Although ascertaining the donor species of allotetraploid cotton has been intensively studied, sequence comparison of Gossypium chloroplast genomes is still of interest to understand the mechanisms underlying the evolution of Gossypium allotetraploids, while it is generally accepted that the parents were A- and D-genome containing species. Here we performed a comparative analysis of 13 Gossypium chloroplast genomes, twelve of which are presented here for the first time. Methodology/Principal Findings The size of 12 chloroplast genomes under study varied from 159,959 bp to 160,433 bp. The chromosomes were highly similar, having >98% sequence identity. They encoded the same set of 112 unique genes which occurred in a uniform order with only slightly different boundary junctions. Divergence due to indels as well as substitutions was examined separately for genome, coding and noncoding sequences. The genome divergence was estimated as 0.374% to 0.583% between allotetraploid species and A-genome, and 0.159% to 0.454% within allotetraploids. Forty protein-coding genes were completely identical at the protein level, and 20 intergenic sequences were completely conserved. The 9 allotetraploids shared 5 insertions and 9 deletions in whole genome, and 7-bp substitutions in protein-coding genes. The phylogenetic tree confirmed a close relationship between allotetraploids and the ancestor of A-genome, and the allotetraploids were divided into four separate groups. Progenitor allotetraploid cotton originated 0.43–0.68 million years ago (MYA). Conclusion Despite the high degree of conservation between the Gossypium chloroplast genomes, sequence variations among species could still be detected. Gossypium chloroplast genomes showed a preference for 5-bp indels, and 1–3-bp indels were mainly attributed to SSR polymorphisms. 
This study supports that the common ancestor of diploid A-genome species in Gossypium is the maternal source of extant allotetraploid species and allotetraploids have a monophyletic origin. G. hirsutum AD1 lineages have experienced more sequence variations than other allotetraploids in intergenic regions. The available complete nucleotide sequences of 12 Gossypium chloroplast genomes should facilitate studies to uncover the molecular mechanisms of compartmental co-evolution and speciation of Gossypium allotetraploids. PMID:22876273
Mangrauthia, Satendra K; Malathi, P; Agarwal, Surekha; Ramkumar, G; Krishnaveni, D; Neeraja, C N; Madhav, M Sheshu; Ladhalakshmi, D; Balachandran, S M; Viraktamath, B C
2012-06-01
Rice tungro disease, one of the major constraints to rice production in South and Southeast Asia, is caused by a combination of two viruses: Rice tungro spherical virus (RTSV) and Rice tungro bacilliform virus (RTBV). The present study was undertaken to determine the genetic variation of the RTSV population present in tungro-endemic states of the Indian subcontinent. Phylogenetic analysis based on coat protein sequences showed distinct divergence of Indian RTSV isolates into two groups; one consisted of isolates from Hyderabad (Andhra Pradesh), Cuttack (Orissa), and Puducherry and the other of isolates from West Bengal, Coimbatore (Tamil Nadu), and Kanyakumari (Tamil Nadu). The results obtained from the phylogenetic study were further supported by SNP (single nucleotide polymorphism), INDEL (insertion and deletion) and evolutionary distance analyses. In addition, the sequence difference count matrix revealed 2-68 nucleotide differences among all the Indian RTSV isolates included in this study. However, at the protein level these differences were not significant, as revealed by Ka/Ks ratio calculation. Sequence identity at the nucleotide and amino acid levels was 92-100% and 97-100%, respectively, among Indian isolates of RTSV. Understanding the population structure of RTSV from tungro-endemic regions of India would potentially provide insights into the molecular diversification of this virus.
Mining sequence variations in representative polyploid sugarcane germplasm accessions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Xiping; Song, Jian; You, Qian
Sugarcane (Saccharum spp.) is one of the most important economic crops because of its high sugar production and biofuel potential. Due to the high polyploid level and complex genome of sugarcane, it has been a huge challenge to investigate genomic sequence variations, which are critical for identifying alleles contributing to important agronomic traits. In order to mine the genetic variations in sugarcane, genotyping by sequencing (GBS) was used to genotype 14 representative Saccharum complex accessions. GBS is a method to generate a large number of markers, enabled by next generation sequencing (NGS) and the genome complexity reduction using restriction enzymes. To use GBS for high throughput genotyping of highly polyploid sugarcane, the GBS analysis pipelines in 14 Saccharum complex accessions were established by evaluating different alignment methods, sequence variant callers, and sequence depth for single nucleotide polymorphism (SNP) filtering. By using the established pipeline, a total of 76,251 non-redundant SNPs, 5642 InDels, 6380 presence/absence variants (PAVs), and 826 copy number variations (CNVs) were detected among the 14 accessions. In addition, the non-reference based Universal Network Enabled Analysis Kit (UNEAK) and Stacks de novo called 34,353 and 109,043 SNPs, respectively. In the 14 accessions, the percentages of single dose SNPs ranged from 38.3% to 62.3% with an average of 49.6%, much more than the portions of multiple dosage SNPs. Concordantly called SNPs were used to evaluate the phylogenetic relationship among the 14 accessions. The results showed that the divergence time between the Erianthus genus and the Saccharum genus was more than 10 million years ago (MYA). The Saccharum species separated from their common ancestors ranging from 0.19 to 1.65 MYA. 
The GBS pipelines including the reference sequences, alignment methods, sequence variant callers, and sequence depth were recommended and discussed for the Saccharum complex and other related species. A large number of sequence variations were discovered in the Saccharum complex, including SNPs, InDels, PAVs, and CNVs. Genome-wide SNPs were further used to illustrate sequence features of polyploid species and demonstrated the divergence of different species in the Saccharum complex. The results of this study showed that GBS was an effective NGS-based method to discover genomic sequence variations in highly polyploid and heterozygous species.
Mining sequence variations in representative polyploid sugarcane germplasm accessions
Yang, Xiping; Song, Jian; You, Qian; ...
2017-08-09
Sugarcane (Saccharum spp.) is one of the most important economic crops because of its high sugar production and biofuel potential. Due to the high polyploid level and complex genome of sugarcane, it has been a huge challenge to investigate genomic sequence variations, which are critical for identifying alleles contributing to important agronomic traits. In order to mine the genetic variations in sugarcane, genotyping by sequencing (GBS) was used to genotype 14 representative Saccharum complex accessions. GBS is a method to generate a large number of markers, enabled by next generation sequencing (NGS) and the genome complexity reduction using restriction enzymes. To use GBS for high throughput genotyping of highly polyploid sugarcane, the GBS analysis pipelines in 14 Saccharum complex accessions were established by evaluating different alignment methods, sequence variant callers, and sequence depth for single nucleotide polymorphism (SNP) filtering. By using the established pipeline, a total of 76,251 non-redundant SNPs, 5642 InDels, 6380 presence/absence variants (PAVs), and 826 copy number variations (CNVs) were detected among the 14 accessions. In addition, the non-reference based Universal Network Enabled Analysis Kit (UNEAK) and Stacks de novo called 34,353 and 109,043 SNPs, respectively. In the 14 accessions, the percentages of single dose SNPs ranged from 38.3% to 62.3% with an average of 49.6%, much more than the portions of multiple dosage SNPs. Concordantly called SNPs were used to evaluate the phylogenetic relationship among the 14 accessions. The results showed that the divergence time between the Erianthus genus and the Saccharum genus was more than 10 million years ago (MYA). The Saccharum species separated from their common ancestors ranging from 0.19 to 1.65 MYA. 
The GBS pipelines including the reference sequences, alignment methods, sequence variant callers, and sequence depth were recommended and discussed for the Saccharum complex and other related species. A large number of sequence variations were discovered in the Saccharum complex, including SNPs, InDels, PAVs, and CNVs. Genome-wide SNPs were further used to illustrate sequence features of polyploid species and demonstrated the divergence of different species in the Saccharum complex. The results of this study showed that GBS was an effective NGS-based method to discover genomic sequence variations in highly polyploid and heterozygous species.
2013-01-01
Background Salmonella enterica serovar Typhimurium (or simply Typhimurium) is the most common serovar in both human infections and farm animals in Australia and many other countries. Typhimurium is a broad host range serovar but has also evolved into host-adapted variants (i.e. isolated from a particular host such as pigeons). Six Typhimurium strains of different phage types (defined by patterns of susceptibility to lysis by a set of bacteriophages) were analysed using Illumina high-throughput genome sequencing. Results Variations between strains were mainly due to single nucleotide polymorphisms (SNPs) with an average of 611 SNPs per strain, ranging from 391 SNPs to 922 SNPs. There were seven insertions/deletions (indels) involving whole or partial gene deletions, four inactivation events due to IS200 insertion and 15 pseudogenes due to early termination. Four of these inactivated or deleted genes may be virulence related. Nine prophage or prophage remnants were identified in the six strains. Gifsy-1, Gifsy-2 and the sopE2 and sspH2 phage remnants were present in all six genomes while Fels-1, Fels-2, ST64B, ST104 and CP4-57 were variably present. Four strains carried the 90-kb plasmid pSLT which contains several known virulence genes. However, two strains were found to lack the plasmid. In addition, one strain had a novel plasmid similar to Typhi strain CT18 plasmid pHCM2. Conclusion The genome data suggest that variations between strains were mainly due to accumulation of SNPs, some of which resulted in gene inactivation. Unique genetic elements that were common between host-adapted phage types were not found. This study advanced our understanding on the evolution and adaptation of Typhimurium at genomic level. PMID:24138507
Brunwasser-Meirom, Michal; Pollak, Yaroslav; Goldberg, Sarah; Levy, Lior; Atar, Orna; Amit, Roee
2016-01-01
We explore a model for ‘quenching-like' repression by studying synthetic bacterial enhancers, each characterized by a different binding site architecture. To do so, we take a three-pronged approach: first, we compute the probability that a protein-bound dsDNA molecule will loop. Second, we use hundreds of synthetic enhancers to test the model's predictions in bacteria. Finally, we verify the mechanism bioinformatically in native genomes. Here we show that excluded volume effects generated by DNA-bound proteins can generate substantial quenching. Moreover, the type and extent of the regulatory effect depend strongly on the relative arrangement of the binding sites. The implications of these results are that enhancers should be insensitive to 10–11 bp insertions or deletions (INDELs) and sensitive to 5–6 bp INDELs. We test this prediction on 61 σ54-regulated qrr genes from the Vibrio genus and confirm the tolerance of these enhancers' sequences to the DNA's helical repeat. PMID:26832446
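The predicted contrast between 10–11 bp and 5–6 bp indels can be illustrated by how far an indel rotates one binding site around the double helix relative to another. The sketch below is illustrative only: it assumes an average helical repeat of 10.5 bp per turn for B-form DNA, and the function names are hypothetical and not taken from the paper's model.

```python
HELICAL_REPEAT_BP = 10.5  # assumed average bp per helical turn of B-form DNA


def helical_phase_shift(indel_bp: int, repeat: float = HELICAL_REPEAT_BP) -> float:
    """Rotation in degrees (0 to 360) introduced between two sites by an indel."""
    return (abs(indel_bp) / repeat * 360.0) % 360.0


def angular_offset(indel_bp: int, repeat: float = HELICAL_REPEAT_BP) -> float:
    """Angular distance from the original helical face: 0 = same face, 180 = opposite."""
    shift = helical_phase_shift(indel_bp, repeat)
    return min(shift, 360.0 - shift)


for n in (5, 6, 10, 11):
    print(f"{n:2d} bp indel: offset of about {angular_offset(n):.1f} degrees")
```

Under this assumption, indels of 10 or 11 bp leave the sites on nearly the same face of the helix, while indels of 5 or 6 bp move them close to opposite faces.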
Brunwasser-Meirom, Michal; Pollak, Yaroslav; Goldberg, Sarah; Levy, Lior; Atar, Orna; Amit, Roee
2016-02-02
We explore a model for 'quenching-like' repression by studying synthetic bacterial enhancers, each characterized by a different binding site architecture. To do so, we take a three-pronged approach: first, we compute the probability that a protein-bound dsDNA molecule will loop. Second, we use hundreds of synthetic enhancers to test the model's predictions in bacteria. Finally, we verify the mechanism bioinformatically in native genomes. Here we show that excluded volume effects generated by DNA-bound proteins can generate substantial quenching. Moreover, the type and extent of the regulatory effect depend strongly on the relative arrangement of the binding sites. The implications of these results are that enhancers should be insensitive to 10-11 bp insertions or deletions (INDELs) and sensitive to 5-6 bp INDELs. We test this prediction on 61 σ54-regulated qrr genes from the Vibrio genus and confirm the tolerance of these enhancers' sequences to the DNA's helical repeat.
Gehlot, Praveen; Singh, S K; Pathak, Rakesh
2012-09-01
Taxonomy of the fungus Pestalotiopsis based on morphological characters has been equivocal. Molecular characterization of ten Pestalotiopsis species was done based on nuclear ribosomal DNA internal transcribed spacer (ITS) amplifications. Results of the analyses showed that species of the genus Pestalotiopsis are monophyletic. We report ITS length variations, single nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs) among ten species of Pestalotiopsis that did not cause any phylogenetic error at either the genus or species designation level. New gene sequences have been assigned accession numbers (GenBank accession numbers HM190146 to HM190155) by the National Center for Biotechnology Information, USA.
Gogniashvili, Mari; Jinjikhadze, Tamar; Maisaia, Inesa; Akhalkatsi, Maia; Kotorashvili, Adam; Kotaria, Nato; Beridze, Tengiz; Dudnikov, Alexander Ju
2016-11-01
Hexaploid wheat (Triticum aestivum L., genomes AABBDD) originated in South Caucasus by allopolyploidization of the cultivated Emmer wheat T. dicoccum (genomes AABB) with the Caucasian Ae. tauschii ssp. strangulata (genomes DD). Genetic variation of Ae. tauschii is an important natural resource, which is why it is of particular importance to investigate how this variation was formed during the evolutionary history of Ae. tauschii and how it is distributed across the species' range. The D genome is also found in tetraploid Ae. cylindrica Host (2n = 28, CCDD). The plasmon diversity that exists in Triticum and Aegilops species is of great significance for understanding the evolution of these genera. In the present investigation the complete nucleotide sequences of plasmon D (chloroplast DNA) of nine accessions of Ae. tauschii and two accessions of Ae. cylindrica are presented. Twenty-eight SNPs are characteristic for both TauL1 and TauL2 accessions of Ae. tauschii using TauL3 as a reference. Four additional SNPs are observed for the TauL2 lineage. The longest (27 bp) indel is located in the intergenic spacer Rps15-ndhF of the SSC. This indel can be used for simple determination of the TauL3 lineage among Ae. tauschii accessions. In the case of Ae. cylindrica, 7 additional SNPs were observed. The phylogeny tree shows that the chloroplast DNA of TauL1 and TauL2 diverged from the TauL3 lineage. The TauL1 lineage is relatively older than TauL2. The position of Ae. cylindrica accessions on the Ae. tauschii phylogeny tree constructed from chloroplast DNA variation data is intermediate between TauL1 and TauL2. The complete nucleotide sequences of chloroplast DNA of Ae. tauschii and Ae. cylindrica allow refinement of the origin and evolution of the D plasmon of the genus Aegilops.
Although exome sequencing data are generated primarily to detect single-nucleotide variants and indels, they can also be used to identify a subset of genomic rearrangements whose breakpoints are located in or near exons. Using >4,600 tumor and normal pairs across 15 cancer types, we identified over 9,000 high confidence somatic rearrangements, including a large number of gene fusions.
Maw, Aye Aye; Shimogiri, Takeshi; Riztyan; Kawabe, Kotaro; Kawamoto, Yasuhiro; Okamoto, Shin
2012-01-01
The efficiency of insertion and/or deletion (indel) polymorphisms as genetic markers was evaluated by genotyping 102 indel loci in native chicken populations from Myanmar and Indonesia as well as Red jungle fowls and Green jungle fowls from Java Island. Out of the 102 indel markers, 97 were polymorphic. The average observed and expected heterozygosities were 0.206 to 0.268 and 0.229 to 0.284 in native chicken populations and 0.003 to 0.101 and 0.012 to 0.078 in jungle fowl populations. The coefficients of genetic differentiation (Gst) of the native chicken populations from Myanmar and Indonesia were 0.041 and 0.098, respectively. The genetic variability was higher among native chicken populations than among jungle fowl populations. A high Gst value was found between native chicken populations and jungle fowl populations. A neighbor-joining tree based on genetic distance revealed that the native chickens from the two countries were genetically close to each other and remote from the Red and Green jungle fowls of Java Island. PMID:25049646
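Observed and expected heterozygosity at a single indel locus can be computed from genotype data as sketched below. This is a minimal illustration, not the authors' procedure: it assumes diploid genotypes given as allele pairs, uses the simple expected heterozygosity 1 minus the sum of squared allele frequencies (without a small-sample correction), and the allele labels "I" (insertion) and "D" (deletion) and the sample genotypes are invented.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of 2-tuples of allele labels, e.g. ("I", "D").
    Expected heterozygosity is 1 - sum(p_i ** 2) over allele frequencies p_i.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(a != b for a, b in genotypes) / n
    # Expected: based on allele frequencies pooled over all 2n chromosomes.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected


# Invented example: 6 I/I, 3 I/D and 1 D/D individuals.
sample = [("I", "I")] * 6 + [("I", "D")] * 3 + [("D", "D")]
ho, he = heterozygosity(sample)
print(f"Ho = {ho:.3f}, He = {he:.3f}")
```

Averaging these per-locus values over all genotyped loci gives population-level figures of the kind reported in the abstract.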
Maw, Aye Aye; Shimogiri, Takeshi; Riztyan; Kawabe, Kotaro; Kawamoto, Yasuhiro; Okamoto, Shin
2012-07-01
The efficiency of insertion and/or deletion (indel) polymorphisms as genetic markers was evaluated by genotyping 102 indel loci in native chicken populations from Myanmar and Indonesia as well as Red jungle fowls and Green jungle fowls from Java Island. Out of the 102 indel markers, 97 were polymorphic. The average observed and expected heterozygosities were 0.206 to 0.268 and 0.229 to 0.284 in native chicken populations and 0.003 to 0.101 and 0.012 to 0.078 in jungle fowl populations. The coefficients of genetic differentiation (Gst) of the native chicken populations from Myanmar and Indonesia were 0.041 and 0.098, respectively. The genetic variability was higher among native chicken populations than among jungle fowl populations. A high Gst value was found between native chicken populations and jungle fowl populations. A neighbor-joining tree based on genetic distance revealed that the native chickens from the two countries were genetically close to each other and remote from the Red and Green jungle fowls of Java Island.
Strandbygaard, Bertel; Lavazza, Antonio; Lelli, Davide; Blanchard, Yannick; Grasland, Béatrice; Poder, Sophie Le; Rose, Nicolas; Steinbach, Falko; van der Poel, Wim H M; Widén, Frederik; Belsham, Graham J; Bøtner, Anette
2016-12-25
Porcine epidemic diarrhea virus (PEDV) has caused extensive economic losses to pig producers in many countries. It was recently introduced, for the first time, into North America and outbreaks have occurred again in multiple countries within Europe as well. To assess the properties of various diagnostic assays for the detection of PEDV infection, multiple panels of porcine sera have been shared and tested for the presence of antibodies against PEDV in an inter-laboratory ring trial. Different laboratories have used a variety of "in house" ELISAs and also one commercial assay. The sensitivity and specificity of each assay has been estimated using a Bayesian analysis applied to the ring trial results obtained with the different assays in the absence of a gold standard. Although different characteristics were found, it can be concluded that each of the assays used can detect infection of pigs at a herd level by either the early European strains of PEDV or the recently circulating strains (INDEL and non-INDEL). However, not all the assays seem suitable for demonstrating freedom from disease in a country. The results from individual animals, especially when the infection has occurred within an experimental situation, show more variation. Copyright © 2016. Published by Elsevier B.V.
Identification of single nucleotide polymorphism in ginger using expressed sequence tags
Chandrasekar, Arumugam; Riju, Aikkal; Sithara, Kandiyl; Anoop, Sahadevan; Eapen, Santhosh J
2009-01-01
Ginger (Zingiber officinale Rosc) (Family: Zingiberaceae) is a herbaceous perennial, the rhizomes of which are used as a spice. Ginger is also well known for its medicinal applications. EST-derived SNPs are a free by-product of the currently expanding EST (Expressed Sequence Tag) databases. The development of high-throughput methods for the detection of SNPs (Single Nucleotide Polymorphisms) and small indels (insertions/deletions) has led to a revolution in their use as molecular markers. The available ginger EST sequences (38139) were mined from dbEST of NCBI. The CAP3 program was used to assemble EST sequences into contigs. Candidate SNPs and indel polymorphisms were detected using the Perl script AutoSNP version 1.0, which used 31905 ESTs for detecting SNP and indel sites. We found 64026 SNP sites and 7034 indel polymorphisms, with a frequency of 0.84 SNPs per 100 bp. Among the three tissues from which the EST libraries had been generated, rhizomes had the highest frequency, 1.08 SNPs/indels per 100 bp, whereas leaves had the lowest frequency, 0.63 per 100 bp, and roots showed a frequency of 0.82 per 100 bp. The transition to transversion ratio was 0.90; that is, among all detected SNPs, transversions outnumbered transitions. These detected SNPs can be used as markers for genetic studies. Availability The results of the present study are hosted on our web server www.spices.res.in/spicesnip PMID:20198184
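The two summary statistics used in this abstract, the transition to transversion ratio and the variant frequency per 100 bp, can be computed as sketched below. This is an illustrative sketch only and does not reproduce AutoSNP; the function names and the example substitutions are invented.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref: str, alt: str) -> str:
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref == alt or ref not in bases or alt not in bases:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps) -> float:
    """Ratio of transitions to transversions for an iterable of (ref, alt) pairs."""
    snps = list(snps)
    ts = sum(classify(r, a) == "transition" for r, a in snps)
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


def per_100bp(n_variants: int, total_bp: int) -> float:
    """Variant frequency expressed per 100 bp of sequence."""
    return 100.0 * n_variants / total_bp


example = [("A", "G"), ("C", "T"), ("A", "T"), ("G", "C")]  # invented data
print(ts_tv_ratio(example))
print(per_100bp(84, 10_000))
```

A ratio below 1, such as the 0.90 reported here, means that transversions are more numerous than transitions.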
Hefke, Gwynneth; Davison, Sean; D'Amato, Maria Eugenia
2015-12-01
The utilization of binary markers in human individual identification is gaining ground in forensic genetics. We analyzed the polymorphisms from the first commercial indel kit Investigator DIPplex (Qiagen) in 512 individuals from Afrikaner, Indian, admixed Cape Colored, and the native Bantu Xhosa and Zulu origin in South Africa and evaluated forensic and population genetics parameters for their forensic application in South Africa. The levels of genetic diversity in population and forensic parameters in South Africa are similar to other published data, with lower diversity values for the native Bantu. Departures from Hardy-Weinberg expectations were observed in HLD97 in Indians, Admixed and Bantus, along with 6.83% null homozygotes in the Bantu populations. Sequencing of the flanking regions showed a previously reported transition G>A in rs17245568. Strong population structure was detected with Fst, AMOVA, and the Bayesian unsupervised clustering method in STRUCTURE. Therefore we evaluated the efficiency of individual assignments to population groups using the ancestral membership proportions from STRUCTURE and the Bayesian classification algorithm in Snipper App Suite. Both methods showed low cross-assignment error (0-4%) between Bantus and either Afrikaners or Indians. The differentiation between populations seems to be driven by four loci under positive selection pressure. Based on these results, we draw recommendations for the application of this kit in SA. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
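A departure from Hardy-Weinberg expectations at a biallelic indel locus, as reported here for HLD97, can be screened with a chi-square goodness-of-fit statistic. The sketch below is an assumption for illustration, since the abstract does not state which test the authors applied; the genotype counts are invented, and 3.84 is the standard 5% critical value for one degree of freedom.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic comparing genotype counts with Hardy-Weinberg expectations."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


CRITICAL_5PCT_1DF = 3.84  # chi-square critical value, 1 degree of freedom

for counts in [(25, 50, 25), (50, 0, 50)]:  # invented genotype counts
    stat = hwe_chi_square(*counts)
    verdict = "departs from" if stat > CRITICAL_5PCT_1DF else "is consistent with"
    print(counts, f"chi2 = {stat:.2f}:", verdict, "HWE")
```

A deficit of heterozygotes, as in the second example, inflates the statistic; null alleles are one possible cause of such a deficit.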
Udagawa, Chihiro; Tada, Naomi; Asano, Junzo; Ishioka, Katsumi; Ochiai, Kazuhiko; Bonkobara, Makoto; Tsuchida, Shuichi; Omi, Toshinori
2014-12-11
The uncoupling proteins (UCPs) in the mitochondrial inner membrane are members of the mitochondrial anion carrier protein family that play an important role in energy homeostasis. Genetic association studies have shown that human UCP2 and UCP3 variants (SNPs and indels) are associated with obesity, insulin resistance, type 2 diabetes mellitus, and metabolic syndrome. The aim of this study was to examine the genetic association between polymorphisms in UCP2 and UCP3 and metabolic data in dogs. We identified 10 SNPs (9 intronic and 1 exonic) and 4 indels (intronic) in UCP2, and 13 SNPs (11 intronic and 2 exonic) and one indel (exonic) in UCP3, by DNA sequence analysis of 11 different dog breeds (n=119). An association study between these UCP2 and UCP3 variants and the biochemical parameters of glucose, total cholesterol, lactate dehydrogenase and triglyceride in Labrador Retrievers (n=50) showed that none of the UCP2 polymorphisms were significantly associated with the levels of these parameters. However, four UCP3 SNPs (intron 1) were significantly associated with total cholesterol levels. In addition, the allele frequencies of two of the four SNPs associated with higher total cholesterol levels differed between a breed that is susceptible to hypercholesterolemia (Shetland Sheepdogs, n=30) and the control breed (Shiba, n=30). The results obtained from a limited number of individuals suggest that the UCP3 gene in dogs may be associated with total cholesterol levels. The examination of larger sample sizes and further analysis will lead to increased precision of these results.
Vidal Arboleda, Juana L; Ortiz Roman, Luisa F; Olivera Angel, Martha
2017-12-22
Brucella canis is a facultative intracellular pathogen responsible for canine brucellosis, a zoonotic disease that affects canines, causing abortions and reproductive failure, and produces non-specific symptoms in humans. In 2005 the presence of B. canis in Antioquia was demonstrated and the strains were identified as type 2. The sequencing of the genome of a field strain, denoted Brucella canis str. Oliveri, showed species-specific indel events, which led us to investigate the genomic characteristics of the isolated B. canis strain and to establish the phylogenetic relationships and the divergence time of B. canis str. Oliveri. Conventional PCR sequencing was performed in 30 field strains, identifying 5 indel events recognized in B. canis str. Oliveri. DNA from Brucella suis, Brucella melitensis and vaccine strains of Brucella abortus was used as control, and it was determined that all of the studied field strains shared 4 out of the 5 indels of the sequenced Oliveri strain, indicating the presence of more than one strain circulating in the region. Phylogenetic analysis was performed with 24 strains of Brucella using concatenated sequences of genetic markers for species differentiation. The molecular clock hypothesis and Tajima's relative rate test were tested, showing that the Oliveri strain, similarly to other canis species, diverged from B. suis. The molecular clock hypothesis between Brucella species was rejected and an evolution rate and a similar genetic distance among the B. canis strains were demonstrated. Copyright © 2017 Asociación Argentina de Microbiología. Published by Elsevier España, S.L.U. All rights reserved.
Tan, Shu; Cheng, Jiao-Wen; Zhang, Li; Qin, Cheng; Nong, Ding-Guo; Li, Wei-Peng; Tang, Xin; Wu, Zhi-Ming; Hu, Kai-Lin
2015-01-01
Re-sequencing permits the mining of genome-wide variations on a large scale and provides excellent resources for the research community. To accelerate the development and application of molecular markers and identify the QTLs affecting the flowering time-related trait in pepper, a total of 1,038 pairs of InDel and 674 SSR primers from different sources were used for genetic mapping using the F2 population (n = 154) derived from a cross between BA3 (C. annuum) and YNXML (C. frutescens). Of these, a total of 224 simple PCR-based markers, including 129 InDels and 95 SSRs, were validated and integrated into a map, which was designated as the BY map. The BY map consisted of 13 linkage groups (LGs) and spanned a total genetic distance of 1,249.77 cM with an average marker distance of 5.60 cM. Comparative analysis of the genetic and physical map based on the anchored markers showed that the BY map covered nearly the whole pepper genome. Based on the BY map, one major and five minor QTLs affecting the number of leaves on the primary axis (Nle) were detected on chromosomes P2, P7, P10 and P11 in 2012. The major QTL on P2 was confirmed based on another subset of the same F2 population (n = 147) in 2014 with selective genotyping of markers from the BY map. With the accomplishment of pepper whole genome sequencing and annotations (release 2.0), 153 candidate genes were predicted to embed in the Nle2.2 region, of which 12 important flowering related genes were obtained. The InDel/SSR-based interspecific genetic map, QTLs and candidate genes obtained by the present study will be useful for the downstream isolation of flowering time-related gene and other genetic applications for pepper.
Ho, Sherry Sze Yee; Barrett, Angela; Thadani, Henna; Asibal, Cecille Laureano; Koay, Evelyn Siew-Chuan; Choolani, Mahesh
2015-07-01
Prenatal diagnosis of sex-linked disorders requires invasive procedures, carrying a risk of miscarriage of up to 1%. Cell-free fetal DNA (cffDNA) present in cell-free DNA (cfDNA) from maternal plasma offers a non-invasive source of fetal genetic material for analysis. Detection of Y-chromosome sequences in cfDNA indicates presence of a male fetus; in the absence of a Y-chromosome signal a female fetus is inferred. We aimed to validate the clinical utility of insertion-deletion polymorphisms (INDELs) to confirm presence of a female fetus using cffDNA. Quantitative real-time PCR (qPCR) for the Y-chromosome-specific sequence, SRY, was performed on cfDNA from 82 samples at 6-39 gestational weeks. In samples without detectable SRY, qPCRs for eight INDELs were performed on maternal genomic DNA and cfDNA. Detection of paternally inherited fetal alleles in cfDNA negative for SRY confirmed a female fetus. Fetal sex was correctly determined in 77/82 (93.9%) cfDNA samples. SRY was detected in all 39 samples from male-bearing pregnancies, and none of the 43 female-bearing pregnancies (sensitivity and specificity of SRY qPCR is therefore 100%; 95% CI 91%-100%). Paternally inherited fetal alleles were detected in 38/43 samples with no SRY signal, confirming the presence of a female fetus (INDEL assay sensitivity is therefore 88.4%; 95% CI 74.1%-95.6%). Since paternally inherited fetal INDELs were not used in women bearing male fetuses, the specificity of INDELs cannot be calculated. Five cfDNA samples were negative for both SRY and INDELS. We have validated a non-invasive prenatal test to confirm fetal sex as early as 6 gestational weeks using cffDNA from maternal plasma.
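The proportions quoted in this abstract (for example 77/82 and 38/43) and their confidence intervals can be reproduced in outline with the sketch below. The abstract does not state which interval method the authors used; the plain Wilson score interval here is an assumption, so its bounds need not match the published ones exactly. The function names are hypothetical.

```python
import math

# Illustrative sketch; the interval method is an assumption (plain Wilson
# score interval), not necessarily the one used in the study.


def proportion(successes: int, total: int) -> float:
    """Simple proportion, e.g. sensitivity = true positives / all positives."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successes / total


def wilson_interval(successes: int, total: int, z: float = 1.959964):
    """Two-sided Wilson score interval (default z gives about 95%)."""
    p = proportion(successes, total)
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Counts taken from the abstract.
    print(f"overall accuracy: {100 * proportion(77, 82):.1f}%")
    print(f"INDEL sensitivity: {100 * proportion(38, 43):.1f}%")
    print("Wilson interval for 39/39:", wilson_interval(39, 39))
```

A score-based interval is a common choice when the observed proportion is at or near 100%, because the simpler normal-approximation interval collapses to zero width there.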
Khattak, Shahryar; Schuez, Maritta; Richter, Tobias; Knapp, Dunja; Haigo, Saori L.; Sandoval-Guzmán, Tatiana; Hradlikova, Kristyna; Duemmler, Annett; Kerney, Ryan; Tanaka, Elly M.
2013-01-01
The salamander is the only tetrapod that regenerates complex body structures throughout life. Deciphering the underlying molecular processes of regeneration is fundamental for regenerative medicine and developmental biology, but the model organism had limited tools for molecular analysis. We describe a comprehensive set of germline transgenic strains in the laboratory-bred salamander Ambystoma mexicanum (axolotl) that open up the cellular and molecular genetic dissection of regeneration. We demonstrate tissue-dependent control of gene expression in nerve, Schwann cells, oligodendrocytes, muscle, epidermis, and cartilage. Furthermore, we demonstrate the use of tamoxifen-induced Cre/loxP-mediated recombination to indelibly mark different cell types. Finally, we inducibly overexpress the cell-cycle inhibitor p16INK4a, which negatively regulates spinal cord regeneration. These tissue-specific germline axolotl lines and tightly inducible Cre drivers and LoxP reporter lines render this classical regeneration model molecularly accessible. PMID:24052945
Phylogenetic origin of limes and lemons revealed by cytoplasmic and nuclear markers.
Curk, Franck; Ollitrault, Frédérique; Garcia-Lor, Andres; Luro, François; Navarro, Luis; Ollitrault, Patrick
2016-04-01
The origin of limes and lemons has been a source of conflicting taxonomic opinions. Biochemical studies, numerical taxonomy and recent molecular studies suggested that cultivated Citrus species result from interspecific hybridization between four basic taxa (C. reticulata, C. maxima, C. medica and C. micrantha). However, the origin of most lemons and limes remains controversial or unknown. The aim of this study was to perform extended analyses of the diversity, genetic structure and origin of limes and lemons. The study was based on 133 Citrus accessions. It combined maternal phylogeny studies based on mitochondrial and chloroplastic markers, and nuclear structure analysis based on the evaluation of ploidy level and the use of 123 markers, including 73 basic taxa diagnostic single nucleotide polymorphism (SNP) and indel markers. The lime and lemon horticultural group appears to be highly polymorphic, with diploid, triploid and tetraploid varieties, and to result from many independent reticulation events which defined the sub-groups. Maternal phylogeny involves four cytoplasmic types out of the six encountered in the Citrus genus. All lime and lemon accessions were highly heterozygous, with interspecific admixture of two, three and even the four ancestral taxa genomes. Molecular polymorphism between varieties of the same sub-group was very low. Citrus medica contributed to all limes and lemons and was the direct male parent for the main sub-groups in combination with C. micrantha or close papeda species (for C. aurata, C. excelsa, C. macrophylla and C. aurantifolia--'Mexican' lime types of Tanaka's taxa), C. reticulata (for C. limonia, C. karna and C. jambhiri varieties of Tanaka's taxa, including popular citrus rootstocks such as 'Rangpur' lime, 'Volkamer' and 'Rough' lemons), C. aurantium (for C. limetta and C. limon--yellow lemon types--varieties of Tanaka's taxa) or the C. maxima × C. reticulata hybrid (for C. limettioides--'Palestine sweet' lime types--and C. meyeri). Among triploid limes, C. latifolia accessions ('Tahiti' and 'Persian' lime types) result from the fertilization of a haploid ovule of C. limon by a diploid gamete of C. aurantifolia, while C. aurantifolia triploid accessions ('Tanepao' lime types and 'Madagascar' lemon) probably result from an interspecific backcross (a diploid ovule of C. aurantifolia fertilized by C. medica). As limes and lemons were vegetatively propagated (apomixis, horticultural practices) the intra-sub-group phenotypic diversity results from asexual variations. © The Author 2016. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Phylogenetic origin of limes and lemons revealed by cytoplasmic and nuclear markers
Curk, Franck; Ollitrault, Frédérique; Garcia-Lor, Andres; Luro, François; Navarro, Luis; Ollitrault, Patrick
2016-01-01
Background and Aims The origin of limes and lemons has been a source of conflicting taxonomic opinions. Biochemical studies, numerical taxonomy and recent molecular studies suggested that cultivated Citrus species result from interspecific hybridization between four basic taxa (C. reticulata, C. maxima, C. medica and C. micrantha). However, the origin of most lemons and limes remains controversial or unknown. The aim of this study was to perform extended analyses of the diversity, genetic structure and origin of limes and lemons. Methods The study was based on 133 Citrus accessions. It combined maternal phylogeny studies based on mitochondrial and chloroplastic markers, and nuclear structure analysis based on the evaluation of ploidy level and the use of 123 markers, including 73 basic taxa diagnostic single nucleotide polymorphism (SNP) and indel markers. Key Results The lime and lemon horticultural group appears to be highly polymorphic, with diploid, triploid and tetraploid varieties, and to result from many independent reticulation events which defined the sub-groups. Maternal phylogeny involves four cytoplasmic types out of the six encountered in the Citrus genus. All lime and lemon accessions were highly heterozygous, with interspecific admixture of two, three and even the four ancestral taxa genomes. Molecular polymorphism between varieties of the same sub-group was very low. Conclusions Citrus medica contributed to all limes and lemons and was the direct male parent for the main sub-groups in combination with C. micrantha or close papeda species (for C. aurata, C. excelsa, C. macrophylla and C. aurantifolia – ‘Mexican’ lime types of Tanaka’s taxa), C. reticulata (for C. limonia, C. karna and C. jambhiri varieties of Tanaka’s taxa, including popular citrus rootstocks such as ‘Rangpur’ lime, ‘Volkamer’ and ‘Rough’ lemons), C. aurantium (for C. limetta and C. limon – yellow lemon types – varieties of Tanaka’s taxa) or the C. maxima × C. reticulata hybrid (for C. limettioides – ‘Palestine sweet’ lime types – and C. meyeri). Among triploid limes, C. latifolia accessions (‘Tahiti’ and ‘Persian’ lime types) result from the fertilization of a haploid ovule of C. limon by a diploid gamete of C. aurantifolia, while C. aurantifolia triploid accessions (‘Tanepao’ lime types and ‘Madagascar’ lemon) probably result from an interspecific backcross (a diploid ovule of C. aurantifolia fertilized by C. medica). As limes and lemons were vegetatively propagated (apomixis, horticultural practices) the intra-sub-group phenotypic diversity results from asexual variations. PMID:26944784
Soheili, Fariborz; Jalili, Zahra; Rahbar, Mahtab; Khatooni, Zahed; Mashayekhi, Amir; Jafari, Hossein
2018-03-01
Mutations in the GATA4 gene induce inherited atrial and ventricular septation defects, which are the most frequent forms of congenital heart defects (CHDs), constituting about half of all cases. We performed high-resolution melting (HRM) mutation scanning of the GATA4 coding exons in 100 nonsyndromic patients as a case group, including 39 with atrial septal defects (ASD), 57 with ventricular septal defects (VSD) and four with both defects, and in 50 healthy individuals as a control group. Samples were categorized according to their HRM graphs. Sequencing was performed for 15 control samples and for 25 patient samples whose HRM results were similar to those of healthy subjects for each exon. PolyPhen-2 and MUpro were used to predict the pathogenicity of GATA4 sequence variations and their effect on structural stability. HRM curve analysis showed that 21 patients and 3 normal samples had deviated curves for GATA4 coding exons. Sequencing revealed 12 nonsynonymous mutations, all of which were predicted to affect the structural stability of the protein; 10 of them were predicted to be pathogenic and 2 benign. We also found two nucleotide deletions, one of which was novel, one new indel mutation resulting in a frameshift, and 4 synonymous variations or polymorphisms in 6 patients and 3 normal individuals. Six (about 50%) of the nonsynonymous mutations have not been previously reported. Our results show that there is a spectrum of GATA4 mutations resulting in septal defects. © 2018 Wiley Periodicals, Inc.
Hu, Peinan; Zhao, Xueying; Zhang, Qinghua; Li, Weiming; Zu, Yao
2018-01-01
The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system has been proven to be an efficient and precise genome editing technology in various organisms. However, the gene editing efficiencies of Cas9 proteins with a nuclear localization signal (NLS) fused to different termini and Cas9 mRNA have not been systematically compared. Here, we compared the ability of Cas9 proteins with NLS fused to the N-, C-, or both the N- and C-termini and N-NLS-Cas9-NLS-C mRNA to target two sites in the tyr gene and two sites in the gol gene related to pigmentation in zebrafish. Phenotypic analysis revealed that all types of Cas9 led to hypopigmentation in similar proportions of injected embryos. Genome analysis by T7 Endonuclease I (T7E1) assays demonstrated that all types of Cas9 similarly induced mutagenesis in four target sites. Sequencing results further confirmed that a high frequency of indels occurred in the target sites (tyr1 > 66%, tyr2 > 73%, gol1 > 50%, and gol2 > 35%), as well as various types (more than six) of indel mutations observed in all four types of Cas9-injected embryos. Furthermore, all types of Cas9 showed efficient targeted mutagenesis on multiplex genome editing, resulting in multiple phenotypes simultaneously. Collectively, we conclude that various NLS-fused Cas9 proteins and Cas9 mRNAs have similar genome editing efficiencies on targeting single or multiple genes, suggesting that the efficiency of CRISPR/Cas9 genome editing is highly dependent on guide RNAs (gRNAs) and gene loci. These findings may help to simplify the selection of Cas9 for gene editing using the CRISPR/Cas9 system. PMID:29295818
Brunelle, Brian W; Greenlee, Justin J; Seabury, Christopher M; Brown, Charles E; Nicholson, Eric M
2008-01-01
Background Transmissible spongiform encephalopathies (TSEs) are neurodegenerative diseases that affect several mammalian species. At least three factors related to the host prion protein are known to modulate susceptibility or resistance to a TSE: amino acid sequence, atypical number of octapeptide repeats, and expression level. These factors have been extensively studied in breeds of Bos taurus cattle in relation to classical bovine spongiform encephalopathy (BSE). However, little is currently known about these factors in Bos indicus purebred or B. indicus × B. taurus composite cattle. The goal of our study was to establish the frequency of markers associated with enhanced susceptibility or resistance to classical BSE in B. indicus purebred and composite cattle. Results No novel or TSE-associated PRNP-encoded amino acid polymorphisms were observed for B. indicus purebred and composite cattle, and all had the typical number of octapeptide repeats. However, differences were observed in the frequencies of the 23-bp and 12-bp insertion/deletion (indel) polymorphisms associated with two bovine PRNP transcription regulatory sites. Compared to B. taurus, B. indicus purebred and composite cattle had a significantly lower frequency of 23-bp insertion alleles and homozygous genotypes. Conversely, B. indicus purebred cattle had a significantly higher frequency of 12-bp insertion alleles and homozygous genotypes in relation to both B. taurus and composite cattle. The origin of these disparities can be attributed to a significantly different haplotype structure within each species. Conclusion The frequencies of the 23-bp and 12-bp indels were significantly different between B. indicus and B. taurus cattle. No other known or potential risk factors were detected for the B. indicus purebred and composite cattle. To date, no consensus exists regarding which bovine PRNP indel region is more influential with respect to classical BSE. 
Should one particular indel region and associated genotypes prove more influential with respect to the incidence of classical BSE, differences regarding overall susceptibility and resistance for B. indicus and B. taurus cattle may be elucidated. PMID:18808703
Re-sequencing and genetic variation identification of a rice line with ideal plant architecture.
Li, Shuangcheng; Xie, Kailong; Li, Wenbo; Zou, Ting; Ren, Yun; Wang, Shiquan; Deng, Qiming; Zheng, Aiping; Zhu, Jun; Liu, Huainian; Wang, Lingxia; Ai, Peng; Gao, Fengyan; Huang, Bin; Cao, Xuemei; Li, Ping
2012-12-01
The ideal plant architecture (IPA) includes several important characteristics such as low tiller numbers, few or no unproductive tillers, more grains per panicle, and thick and sturdy stems. We have developed an indica restorer line 7302R that displays the IPA phenotype in terms of tiller number, grain number, and stem strength. However, the underlying mechanism remained to be clarified. We performed re-sequencing and genome-wide variation analysis of 7302R using the Solexa sequencing technology. With the genomic sequence of the indica cultivar 9311 as reference, 307 627 SNPs, 57 372 InDels, and 3 096 SVs were identified in the 7302R genome. The 7302R-specific variations were investigated via the synteny analysis of all the SNPs of 7302R with those of the previously sequenced non-IPA-type lines IR24, MH63, and SH527. Moreover, we found 178 168 7302R-specific SNPs across the whole genome and 30 239 SNPs in the predicted mRNA regions, among which 8 517 were Non-syn CDS. In addition, 263 large-effect SNPs that were expected to affect the integrity of encoded proteins were identified from the 7302R-specific SNPs. SNPs of several important previously cloned rice genes were also identified by aligning the 7302R sequence with other sequenced lines. Our results provide several candidates that may account for the IPA phenotype of 7302R. These results therefore lay the groundwork for long-term efforts to uncover important genes and alleles for rice plant architecture construction, and also offer useful data resources for future genetic and genomic studies in rice.
Barik, Saumya Ranjan; Sahoo, Ambika; Mohapatra, Sudipti; Nayak, Deepak Kumar; Mahender, Anumalla; Meher, Jitandriya; Anandan, Annamalai
2016-01-01
Rice exhibits enormous genetic diversity, population structure and molecular marker-traits associated with abiotic stress tolerance to high temperature stress. A set of breeding lines and landraces representing 240 germplasm lines were studied. Based on spikelet fertility percent under high temperature, tolerant genotypes were broadly classified into four classes. Genetic diversity indicated a moderate level of genetic base of the population for the trait studied. Wright’s F statistic estimates showed a deviation of Hardy-Weinberg expectation in the population. The analysis of molecular variance revealed 25 percent variation between population, 61 percent among individuals and 14 percent within individuals in the set. The STRUCTURE analysis categorized the entire population into three sub-populations and suggested that most of the landraces in each sub-population had a common primary ancestor with few admix individuals. The composition of materials in the panel showed the presence of many QTLs representing the entire genome for the expression of tolerance. The strongly associated marker RM547 tagged with spikelet fertility under stress and the markers like RM228, RM205, RM247, RM242, INDEL3 and RM314 indirectly controlling the high temperature stress tolerance were detected through both mixed linear model and general linear model TASSEL analysis. These markers can be deployed as a resource for marker-assisted breeding program of high temperature stress tolerance. PMID:27494320
Pradhan, Sharat Kumar; Barik, Saumya Ranjan; Sahoo, Ambika; Mohapatra, Sudipti; Nayak, Deepak Kumar; Mahender, Anumalla; Meher, Jitandriya; Anandan, Annamalai; Pandit, Elssa
2016-01-01
Rice exhibits enormous genetic diversity, population structure and molecular marker-traits associated with abiotic stress tolerance to high temperature stress. A set of breeding lines and landraces representing 240 germplasm lines were studied. Based on spikelet fertility percent under high temperature, tolerant genotypes were broadly classified into four classes. Genetic diversity indicated a moderate level of genetic base of the population for the trait studied. Wright's F statistic estimates showed a deviation of Hardy-Weinberg expectation in the population. The analysis of molecular variance revealed 25 percent variation between population, 61 percent among individuals and 14 percent within individuals in the set. The STRUCTURE analysis categorized the entire population into three sub-populations and suggested that most of the landraces in each sub-population had a common primary ancestor with few admix individuals. The composition of materials in the panel showed the presence of many QTLs representing the entire genome for the expression of tolerance. The strongly associated marker RM547 tagged with spikelet fertility under stress and the markers like RM228, RM205, RM247, RM242, INDEL3 and RM314 indirectly controlling the high temperature stress tolerance were detected through both mixed linear model and general linear model TASSEL analysis. These markers can be deployed as a resource for marker-assisted breeding program of high temperature stress tolerance.
A small indel mutation in an anthocyanin transporter causes variegated colouration of peach flowers.
Cheng, Jun; Liao, Liao; Zhou, Hui; Gu, Chao; Wang, Lu; Han, Yuepeng
2015-12-01
The ornamental peach cultivar 'Hongbaihuatao (HBH)' can simultaneously bear pink, red, and variegated flowers on a single tree. Anthocyanin content in pink flowers is extremely low, being only 10% that of a red flower. Surprisingly, the expression of anthocyanin structural and potential regulatory genes in white flowers was not significantly lower than that in both pink and red flowers. However, proteomic analysis revealed a GST encoded by a gene-regulator involved in anthocyanin transport (Riant)-which is expressed in the red flower, but almost undetectable in the variegated flower. The Riant gene contains an insertion-deletion (indel) polymorphism in exon 3. In white flowers, the Riant gene is interrupted by a 2-bp insertion in the last exon, which causes a frameshift and a premature stop codon. In contrast, both pink and red flowers that arise from bud sports are heterozygous for the Riant locus, with one functional allele due to the 2-bp deletion or a novel 1-bp insertion. Southern blot analysis indicated that the Riant gene occurs in a single copy in the peach genome and it is not interrupted by a transposon. The function of the Riant gene was confirmed by its ectopic expression in the Arabidopsis tt19 mutant, where it complements the anthocyanin phenotype, but not the proanthocyanidin pigmentation in seed coat. Collectively,these results indicate that a small indel mutation in the Riant gene, which is not the result of a transposon insertion or excision, causes variegated colouration of peach flowers. © The Author 2015. Published by Oxford University Press on behalf of the Society for Experimental Biology.
A small indel mutation in an anthocyanin transporter causes variegated colouration of peach flowers
Cheng, Jun; Liao, Liao; Zhou, Hui; Gu, Chao; Wang, Lu; Han, Yuepeng
2015-01-01
The ornamental peach cultivar ‘Hongbaihuatao (HBH)’ can simultaneously bear pink, red, and variegated flowers on a single tree. Anthocyanin content in pink flowers is extremely low, being only 10% that of a red flower. Surprisingly, the expression of anthocyanin structural and potential regulatory genes in white flowers was not significantly lower than that in both pink and red flowers. However, proteomic analysis revealed a GST encoded by a gene—regulator involved in anthocyanin transport (Riant)—which is expressed in the red flower, but almost undetectable in the variegated flower. The Riant gene contains an insertion-deletion (indel) polymorphism in exon 3. In white flowers, the Riant gene is interrupted by a 2-bp insertion in the last exon, which causes a frameshift and a premature stop codon. In contrast, both pink and red flowers that arise from bud sports are heterozygous for the Riant locus, with one functional allele due to the 2-bp deletion or a novel 1-bp insertion. Southern blot analysis indicated that the Riant gene occurs in a single copy in the peach genome and it is not interrupted by a transposon. The function of the Riant gene was confirmed by its ectopic expression in the Arabidopsis tt19 mutant, where it complements the anthocyanin phenotype, but not the proanthocyanidin pigmentation in seed coat. Collectively,these results indicate that a small indel mutation in the Riant gene, which is not the result of a transposon insertion or excision, causes variegated colouration of peach flowers. PMID:26357885
FROG - Fingerprinting Genomic Variation Ontology
Bhardwaj, Anshu
2015-01-01
Genetic variations play a crucial role in differential phenotypic outcomes. Given the complexity in establishing this correlation and the enormous data available today, it is imperative to design machine-readable, efficient methods to store, label, search and analyze this data. A semantic approach, FROG: “FingeRprinting Ontology of Genomic variations”, is implemented to label variation data based on its location, function and interactions. FROG has six levels to describe the variation annotation, namely, chromosome, DNA, RNA, protein, variations and interactions. Each level is a conceptual aggregation of logically connected attributes, each of which comprises various properties for the variant. For example, at the chromosome level, one of the attributes is the location of the variation, which has two properties, allosomes or autosomes. Another attribute is variation kind, which has four properties, namely, indel, deletion, insertion and substitution. Likewise, there are 48 attributes and 278 properties to capture the variation annotation across six levels. Each property is then assigned a bit score, which in turn leads to generation of a binary fingerprint based on the combination of these properties (mostly taken from existing variation ontologies). FROG is a novel and unique method designed for the purpose of labeling the entire variation data generated to date for efficient storage, search and analysis. A web-based platform is designed as a test case for users to navigate sample datasets and generate fingerprints. The platform is available at http://ab-openlab.csir.res.in/frog. PMID:26244889
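The idea of turning annotation properties into a binary fingerprint can be sketched as follows. This is only a toy illustration under stated assumptions: the real FROG bit layout, property names and ordering are not given in the abstract, so the six-entry catalogue below (built from the two chromosome-level attributes the abstract mentions) and both function names are hypothetical.

```python
# Toy sketch of a property-based binary fingerprint.
# The catalogue, its ordering and all names are hypothetical; the actual
# FROG scheme covers 48 attributes and 278 properties.

PROPERTIES = [
    ("location", "allosome"),
    ("location", "autosome"),
    ("variation_kind", "indel"),
    ("variation_kind", "deletion"),
    ("variation_kind", "insertion"),
    ("variation_kind", "substitution"),
]
BIT_INDEX = {prop: i for i, prop in enumerate(PROPERTIES)}


def fingerprint(annotation: dict) -> str:
    """Set one bit per (attribute, property) pair present in the annotation."""
    bits = ["0"] * len(PROPERTIES)
    for attribute, value in annotation.items():
        bits[BIT_INDEX[(attribute, value)]] = "1"
    return "".join(bits)


def shared_properties(fp_a: str, fp_b: str) -> int:
    """Count positions where both fingerprints have a set bit."""
    return sum(1 for a, b in zip(fp_a, fp_b) if a == b == "1")


if __name__ == "__main__":
    autosomal_deletion = fingerprint(
        {"location": "autosome", "variation_kind": "deletion"}
    )
    print(autosomal_deletion)
```

A fixed bit position per property makes two variants comparable with simple bitwise operations, which is one plausible reason for choosing a fingerprint representation for storage and search.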
Yang, Qing; Zhang, Sihuan; Liu, Liangliang; Cao, Xiukai; Lei, Chuzhao; Qi, Xinglei; Lin, Fengpeng; Qu, Weidong; Qi, Xingshan; Liu, Jiming; Wang, Rongmin; Chen, Hong; Lan, Xianyong
2016-09-02
The detection method based on the mathematical expectation (ME) strategy is fast and accurate for low-frequency mutation screening in large samples. Previous studies have found that the 14-bp insertion/deletion (indel) variant in the 3' untranslated region (3' UTR) of the bovine PRNP gene occurs at low frequency (≤5%) in global breeds outside China, but it has not yet been characterized in Chinese cattle breeds. Therefore, this study aimed to identify the 14-bp indel within the PRNP gene in 5 major Chinese indigenous cattle breeds and to evaluate its associations with phenotypic traits. This is the first use of the ME strategy to detect low-frequency indel polymorphisms; the minor allele frequency was 0.038 (Qinchuan), 0.033 (Xianan), 0.013 (Nanyang), 0.003 (Jiaxian), and zero (Ji'an). Compared with the traditional detection method, in which samples are screened one by one, the ME method decreased the reaction time by 62.5%, 64.9%, 77.6%, 88.9% and 66.4%, respectively. In addition, the 14-bp indel was significantly associated with growth traits in 2 cattle breeds: body length in Qinchuan cattle, and body weight and waistline in Xianan cattle. Our results show that the ME-based method is rapid, reliable, and cost-effective for detecting low-frequency mutations, and our findings provide a potentially valuable theoretical basis for marker-assisted selection (MAS) in beef cattle.
Omer, Sumita; Lavi, Bar; Mieczkowski, Piotr A.; Covo, Shay; Hazkani-Covo, Einat
2017-01-01
Okazaki fragments that are formed during lagging strand DNA synthesis include an initiating primer consisting of both RNA and DNA. The RNA fragment must be removed before the fragments are joined. In Saccharomyces cerevisiae, a key player in this process is the structure-specific flap endonuclease, Rad27p (human homolog FEN1). To obtain a genomic view of the mutational consequence of loss of RAD27, a S. cerevisiae rad27Δ strain was subcultured for 25 generations and sequenced using Illumina paired-end sequencing. Out of the 455 changes observed in 10 colonies isolated the two most common types of events were insertions or deletions (INDELs) in simple sequence repeats (SSRs) and INDELs mediated by short direct repeats. Surprisingly, we also detected a previously neglected class of 21 template-switching events. These events were presumably generated by quasi-palindrome to palindrome correction, as well as palindrome elongation. The formation of these events is best explained by folding back of the stalled nascent strand and resumption of DNA synthesis using the same nascent strand as a template. Evidence of quasi-palindrome to palindrome correction that could be generated by template switching appears also in yeast genome evolution. Out of the 455 events, 55 events appeared in multiple isolates; further analysis indicates that these loci are mutational hotspots. Since Rad27 acts on the lagging strand when the leading strand should not contain any gaps, we propose a mechanism favoring intramolecular strand switching over an intermolecular mechanism. We note that our results open new ways of understanding template switching that occurs during genome instability and evolution. PMID:28974572
Yiu, Glenn; Tieu, Eric; Nguyen, Anthony T; Wong, Brittany; Smit-McBride, Zeljka
2016-10-01
To employ type II clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 endonuclease to suppress ocular angiogenesis by genomic disruption of VEGF-A in human RPE cells. CRISPR sequences targeting exon 1 of human VEGF-A were computationally identified based on predicted Cas9 on- and off-target probabilities. Single guide RNA (gRNA) cassettes with these target sequences were cloned into lentiviral vectors encoding the Streptococcus pyogenes Cas9 endonuclease (SpCas9) gene. The lentiviral vectors were used to infect ARPE-19 cells, a human RPE cell line. Frequency of insertion or deletion (indel) mutations was assessed by T7 endonuclease 1 mismatch detection assay; mRNA levels were assessed with quantitative real-time PCR; and VEGF-A protein levels were determined by ELISA. In vitro angiogenesis was measured using an endothelial cell tube formation assay. Five gRNAs targeting VEGF-A were selected based on the highest predicted on-target probabilities, lowest off-target probabilities, or combined average of both scores. Lentiviral delivery of the top-scoring gRNAs with SpCas9 resulted in indel formation in the VEGF-A gene at frequencies up to 37.0% ± 4.0% with corresponding decreases in secreted VEGF-A protein up to 41.2% ± 7.4% (P < 0.001), and reduction of endothelial tube formation up to 39.4% ± 9.8% (P = 0.02). No significant indel formation in the top three putative off-target sites tested was detected. The CRISPR-Cas9 endonuclease system may reduce VEGF-A secretion from human RPE cells and suppress angiogenesis, supporting the possibility of employing gene editing for antiangiogenesis therapy in ocular diseases.
p53 inhibits CRISPR-Cas9 engineering in human pluripotent stem cells.
Ihry, Robert J; Worringer, Kathleen A; Salick, Max R; Frias, Elizabeth; Ho, Daniel; Theriault, Kraig; Kommineni, Sravya; Chen, Julie; Sondey, Marie; Ye, Chaoyang; Randhawa, Ranjit; Kulkarni, Tripti; Yang, Zinger; McAllister, Gregory; Russ, Carsten; Reece-Hoyes, John; Forrester, William; Hoffman, Gregory R; Dolmetsch, Ricardo; Kaykas, Ajamete
2018-06-11
CRISPR/Cas9 has revolutionized our ability to engineer genomes and conduct genome-wide screens in human cells [1-3]. Whereas some cell types are amenable to genome engineering, genomes of human pluripotent stem cells (hPSCs) have been difficult to engineer, with reduced efficiencies relative to tumour cell lines or mouse embryonic stem cells [3-13]. Here, using hPSC lines with stable integration of Cas9 or transient delivery of Cas9-ribonucleoproteins (RNPs), we achieved an average insertion or deletion (indel) efficiency greater than 80%. This high efficiency of indel generation revealed that double-strand breaks (DSBs) induced by Cas9 are toxic and kill most hPSCs. In previous studies, the toxicity of Cas9 in hPSCs was less apparent because of low transfection efficiency and subsequently low DSB induction [3]. The toxic response to DSBs was P53/TP53-dependent, such that the efficiency of precise genome engineering in hPSCs with a wild-type P53 gene was severely reduced. Our results indicate that Cas9 toxicity creates an obstacle to the high-throughput use of CRISPR/Cas9 for genome engineering and screening in hPSCs. Moreover, as hPSCs can acquire P53 mutations [14], cell replacement therapies using CRISPR/Cas9-engineered hPSCs should proceed with caution, and such engineered hPSCs should be monitored for P53 function.
Muver, a computational framework for accurately calling accumulated mutations.
Burkholder, Adam B; Lujan, Scott A; Lavender, Christopher A; Grimm, Sara A; Kunkel, Thomas A; Fargo, David C
2018-05-09
Identification of mutations from next-generation sequencing data typically requires a balance between sensitivity and accuracy. This is particularly true of DNA insertions and deletions (indels), that can impart significant phenotypic consequences on cells but are harder to call than substitution mutations from whole genome mutation accumulation experiments. To overcome these difficulties, we present muver, a computational framework that integrates established bioinformatics tools with novel analytical methods to generate mutation calls with the extremely low false positive rates and high sensitivity required for accurate mutation rate determination and comparison. Muver uses statistical comparison of ancestral and descendant allelic frequencies to identify variant loci and assigns genotypes with models that include per-sample assessments of sequencing errors by mutation type and repeat context. Muver identifies maximally parsimonious mutation pathways that connect these genotypes, differentiating potential allelic conversion events and delineating ambiguities in mutation location, type, and size. Benchmarking with a human gold standard father-son pair demonstrates muver's sensitivity and low false positive rates. In DNA mismatch repair (MMR) deficient Saccharomyces cerevisiae, muver detects multi-base deletions in homopolymers longer than the replicative polymerase footprint at rates greater than predicted for sequential single-base deletions, implying a novel multi-repeat-unit slippage mechanism. Benchmarking results demonstrate the high accuracy and sensitivity achieved with muver, particularly for indels, relative to available tools. Applied to an MMR-deficient Saccharomyces cerevisiae system, muver mutation calls facilitate mechanistic insights into DNA replication fidelity.
Structural analysis of polarizing indels: an emerging consensus on the root of the tree of life
2009-01-01
Background The root of the tree of life has been a holy grail ever since Darwin first used the tree as a metaphor for evolution. New methods seek to narrow down the location of the root by excluding it from branches of the tree of life. This is done by finding traits that must be derived, and excluding the root from the taxa those traits cover. However the two most comprehensive attempts at this strategy, performed by Cavalier-Smith and Lake et al., have excluded each other's rootings. Results The indel polarizations of Lake et al. rely on high quality alignments between paralogs that diverged before the last universal common ancestor (LUCA). Therefore, sequence alignment artifacts may skew their conclusions. We have reviewed their data using protein structure information where available. Several of the conclusions are quite different when viewed in the light of structure which is conserved over longer evolutionary time scales than sequence. We argue there is no polarization that excludes the root from all Gram-negatives, and that polarizations robustly exclude the root from the Archaea. Conclusion We conclude that there is no contradiction between the polarization datasets. The combination of these datasets excludes the root from every possible position except near the Chloroflexi. Reviewers This article was reviewed by Greg Fournier (nominated by J. Peter Gogarten), Purificación López-García, and Eugene Koonin. PMID:19706177
Yang, Qing; Zhang, Sihuan; Liu, Liangliang; Cao, Xiukai; Lei, Chuzhao; Qi, Xinglei; Lin, Fengpeng; Qu, Weidong; Qi, Xingshan; Liu, Jiming; Wang, Rongmin; Chen, Hong; Lan, Xianyong
2016-01-01
The detection method based on the mathematical expectation (ME) strategy is fast and accurate for low-frequency mutation screening in large samples. Previous studies have found that the 14-bp insertion/deletion (indel) variant in the 3′ untranslated region (3′ UTR) of the bovine PRNP gene occurs at low frequency (≤5%) in global breeds outside China, but its frequency has not yet been determined in Chinese cattle breeds. Therefore, this study aimed to identify the 14-bp indel within the PRNP gene in 5 major Chinese indigenous cattle breeds and to evaluate its associations with phenotypic traits. This is the first use of the ME strategy to detect low-frequency indel polymorphisms; the minor allele frequency was 0.038 (Qinchuan), 0.033 (Xianan), 0.013 (Nanyang), 0.003 (Jiaxian), and zero (Ji'an). Compared with the traditional detection method, in which samples are screened one by one, the ME method decreased the reaction time by 62.5%, 64.9%, 77.6%, 88.9% and 66.4%, respectively. In addition, the 14-bp indel was significantly associated with growth traits in 2 cattle breeds: body length in Qinchuan cattle, and body weight and waistline in Xianan cattle. Our results show that the method based on the ME strategy is rapid, reliable, and cost-effective for detecting low-frequency mutations, and our findings provide a potentially valuable theoretical basis for marker-assisted selection (MAS) in beef cattle. PMID:27580010
Yamagishi, Hiroshi; Tanaka, Yoshiyuki; Terachi, Toru
2014-11-01
Crop species of Brassica (Brassicaceae) consist of three monogenomic species and three amphidiploid species resulting from interspecific hybridizations among them. Until now, mitochondrial genome sequences were available for only five of these species. We sequenced the mitochondrial genome of the sixth species, Brassica nigra (nuclear genome constitution BB), and compared it with those of Brassica oleracea (CC) and Brassica carinata (BBCC). The genome was assembled into a 232 145 bp circular sequence that is slightly larger than that of B. oleracea (219 952 bp). The genome of B. nigra contained 33 protein-coding genes, 3 rRNA genes, and 17 tRNA genes. The cox2-2 gene present in B. oleracea was absent in B. nigra. Although the nucleotide sequences of 52 genes were identical between B. nigra and B. carinata, the second exon of rps3 showed differences including an insertion/deletion (indel) and nucleotide substitutions. A PCR test to detect the indel revealed intraspecific variation in rps3, and in one line of B. nigra it amplified a DNA fragment of the size expected for B. carinata. In addition, the B. carinata lines tested here produced DNA fragments of the size expected for B. nigra. The results indicate that at least two mitotypes of B. nigra were present in the maternal parents of B. carinata.
Huotari, Tea; Korpelainen, Helena
2013-01-01
Non-indigenous species (NIS) are species living outside their historic or native range. Invasive NIS often cause severe environmental impacts, and may have large economic and social consequences. Elodea (Hydrocharitaceae) is a New World genus with at least five submerged aquatic angiosperm species living in freshwater environments. Our aim was to survey the geographical distribution of cpDNA haplotypes within the native and introduced ranges of the invasive aquatic weeds Elodea canadensis and E. nuttallii and to reconstruct the spreading histories of these invasive species. In order to reveal informative chloroplast (cp) genome regions for phylogeographic analyses, we compared the plastid sequences of native and introduced individuals of E. canadensis. In total, we found 235 variable sites (186 SNPs, 47 indels and two inversions) between the two plastid sequences consisting of 112,193 bp and developed primers flanking the most variable genomic areas. These 29 primer pairs were used to compare the level and pattern of intraspecific variation within E. canadensis to interspecific variation between E. canadensis and E. nuttallii. Nine potentially informative primer pairs were used to analyze the phylogeographic structure of both Elodea species, based on 70 E. canadensis and 25 E. nuttallii individuals covering native and introduced distributions. On the whole, the level of variation between the two Elodea species was 53% higher than that within E. canadensis. In our phylogeographic analysis, only a single haplotype was found in the introduced range in both species. These haplotypes, H1 (E. canadensis) and A (E. nuttallii), were also widespread in the native range, covering the majority of native populations analyzed. Therefore, we were able neither to identify the geographic origin of the introduced populations nor to test the hypothesis of single versus multiple introductions. The divergence between E. canadensis haplotypes was surprisingly high, and future research may clarify mechanisms that structure native E. canadensis populations. PMID:23620722
Aoki, Kimiko; Tanaka, Hiroyuki; Kawahara, Takashi
2018-07-01
The standard method for personal identification and verification of urine samples in doping control is short tandem repeat (STR) analysis using nuclear DNA (nDNA). The DNA concentration of urine is very low and decreases under most conditions used for sample storage; therefore, the amount of DNA from cryopreserved urine samples may be insufficient for STR analysis. We aimed to establish a multiplexed assay for urine mitochondrial DNA typing containing only trace amounts of DNA, particularly for Japanese populations. A multiplexed suspension-array assay using oligo-tagged microspheres (Luminex MagPlex-TAG) was developed to measure C-stretch length in hypervariable region 1 (HV1) and 2 (HV2), five single nucleotide polymorphisms (SNPs), and one polymorphic indel. Based on these SNPs and the indel, the Japanese population can be classified into five major haplogroups (D4, B, M7a, A, D5). The assay was applied to DNA samples from urine cryopreserved for 1 - 1.5 years (n = 63) and fresh blood (n = 150). The assay with blood DNA enabled Japanese subjects to be categorized into 62 types, exhibiting a discriminatory power of 0.960. The detection limit for cryopreserved urine was 0.005 ng of nDNA. Profiling of blood and urine pairs revealed that 5 of 63 pairs showed different C-stretch patterns in HV1 or HV2. The assay described here yields valuable information in terms of the verification of urine sample sources employing only trace amounts of recovered DNA. However, blood cannot be used as a reference sample.
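The discriminatory power reported above can be illustrated with a commonly used definition: the probability that two randomly drawn individuals have different types, 1 minus the sum of squared type frequencies. The abstract does not say which estimator the authors used, so this is a sketch under that assumption, and the sample labels are hypothetical.

```python
from collections import Counter


def discriminatory_power(types):
    """1 - sum(p_i^2), where p_i is the observed frequency of each type."""
    n = len(types)
    counts = Counter(types)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical toy sample of four individuals and three observed types.
sample = ["D4", "D4", "B", "A"]
dp = discriminatory_power(sample)
```

The value approaches 1 as the types become more numerous and more evenly distributed, which is why a 62-type panel can reach a figure such as 0.960.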
Huang, Yong-Zhen; Zhang, En-Ping; Wang, Jing; Huai, Yong-Tao; Ma, Liang; Chen, Fu-Ying; Lan, Xian-Yong; Lei, Chu-Zhao; Fang, Xing-Tang; Wang, Ju-Qiang; Chen, Hong
2011-03-01
Adipocyte determination and differentiation-dependent factor 1/sterol regulatory element-binding protein-1c (ADD1/SREBP1c) is a major determinant of tissue differential lipogenic capacity in mammalian and avian species. The objectives of the present study were to focus on insertion-deletion polymorphism (indel) in the bovine ADD1/SREBP1c gene, and analyzing its effect on growth traits in a sample of 1035 cattle belonging to four Chinese cattle breeds. PCR-SSCP, DNA sequencing and agarose electrophoresis methods were used. The 778 bp PCR products of ADD1/SREBP1c gene exhibited three genotypes and two alleles were revealed: W and D. Frequencies of the W allele varied from 0.8651 to 1.000. The associations of the 84 bp indel mutation of ADD1/SREBP1c gene with growth traits of 265 Nanyang cows were analyzed. The animals with genotype WD had significantly higher birth weight, body weight, average daily gain than those with genotype WW at birth, 6-, 12-, 18-, and 24-month old (P < 0.05 or P < 0.01). These results suggest that the indel mutation of bovine ADD1/SREBP1c gene may influence the growth traits in cattle.
Raimondi, Daniele; Gazzo, Andrea M; Rooman, Marianne; Lenaerts, Tom; Vranken, Wim F
2016-06-15
There are now many predictors capable of identifying the likely phenotypic effects of single nucleotide variants (SNVs) or short in-frame Insertions or Deletions (INDELs) on the increasing amount of genome sequence data. Most of these predictors focus on SNVs and use a combination of features related to sequence conservation, biophysical, and/or structural properties to link the observed variant to either neutral or disease phenotype. Despite notable successes, the mapping between genetic variants and their phenotypic effects is riddled with levels of complexity that are not yet fully understood and that are often not taken into account in the predictions, despite their promise of significantly improving the prediction of deleterious mutants. We present DEOGEN, a novel variant effect predictor that can handle both missense SNVs and in-frame INDELs. By integrating information from different biological scales and mimicking the complex mixture of effects that lead from the variant to the phenotype, we obtain significant improvements in the variant-effect prediction results. Next to the typical variant-oriented features based on the evolutionary conservation of the mutated positions, we added a collection of protein-oriented features that are based on functional aspects of the gene affected. We cross-validated DEOGEN on 36 825 polymorphisms, 20 821 deleterious SNVs, and 1038 INDELs from SwissProt. The multilevel contextualization of each (variant, protein) pair in DEOGEN provides a 10% improvement of MCC with respect to current state-of-the-art tools. The software and the data presented here are publicly available at http://ibsquare.be/deogen. Contact: wvranken@vub.ac.be. Supplementary data are available at Bioinformatics online.
Carvalho, Darlen C; Wanderley, Alayde V; Amador, Marcos A T; Fernandes, Marianne R; Cavalcante, Giovanna C; Pantoja, Karla B C C; Mello, Fernando A R; de Assumpção, Paulo P; Khayat, André S; Ribeiro-Dos-Santos, Ândrea; Santos, Sidney; Dos Santos, Ney P C
2015-08-20
Acute lymphoblastic leukemia (ALL) is a malignant tumor common in children. Studies of genetic susceptibility to cancer using biallelic insertion/deletion (INDEL) type polymorphisms associated with cancer development pathways may help to clarify the etiology of ALL. In this study, we investigate the role of eight functional INDEL polymorphisms and the influence of genetic ancestry on B-cell ALL susceptibility in children of the Brazilian Amazon population, which has a high degree of inter-ethnic admixture. Ancestry was estimated using a panel of 48 autosomal ancestry informative markers. 130 B-cell ALL patients and 125 healthy controls were included in this study. The odds ratios and 95% confidence intervals were adjusted for confounders. The results indicated an association between the investigated INDEL polymorphisms in the CASP8 (rs3834129), CYP19A1 (rs11575899) and XRCC1 (rs3213239) genes and the development of B-cell ALL. Carriers of the Insertion/Insertion (Ins/Ins) genotype of the polymorphism in the CASP8 gene had a reduced chance of developing B-cell ALL (P=0.001; OR=0.353; 95% CI=0.192-0.651). The Deletion/Deletion (Del/Del) genotype of the polymorphism in the CYP19A1 gene was associated with a lower chance of developing B-cell ALL (P=3.35×10⁻⁶; OR=0.121; 95% CI=0.050-0.295), while the Del/Del genotype of the polymorphism in the XRCC1 gene was associated with a higher chance of developing B-cell ALL (P=2.01×10⁻⁴; OR=6.559; 95% CI=2.433-17.681). We also found that Amerindian ancestry correlates with the risk of B-cell ALL. Each 10% increase in Amerindian ancestry results in a 1.4-fold increase in the chance of developing B-cell ALL (OR=1.406; 95% CI=1.123-1.761), while each 10% increase in European ancestry has a protective effect against the development of B-cell ALL (OR=0.666; 95% CI=0.536-0.827). The results suggest that genetic factors influence leukemogenesis and might be explored in the stratification of B-cell ALL risk in admixed populations.
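The per-10% ancestry odds ratios can be extrapolated to other increments if one assumes that the log-odds are linear in ancestry proportion, as in an ordinary logistic regression. The abstract does not state the model, so this is an illustrative assumption, and `scale_odds_ratio` is a hypothetical helper.

```python
def scale_odds_ratio(or_per_step, steps):
    """Odds ratio for `steps` increments, assuming multiplicative odds."""
    return or_per_step ** steps


# Two 10% increments (a 20% difference in ancestry proportion),
# using the point estimates reported in the abstract.
amerindian_20pct = scale_odds_ratio(1.406, 2)
european_20pct = scale_odds_ratio(0.666, 2)
```

Under that assumption a 20% difference in Amerindian ancestry corresponds to roughly a twofold change in odds; the confidence limits would scale the same way.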
Ivancic-Jelecki, Jelena; Slovic, Anamarija; Šantak, Maja; Tešović, Goran; Forcic, Dubravko
2016-07-29
The canonical genome organization of measles virus (MV) is characterized by a total size of 15 894 nucleotides (nts) and a defined length of every genomic region, both coding and non-coding. Only rarely have reports of strains possessing non-canonical genomic properties (possessing indels, with or without a change of total genome length) been published. The observed mutations are mutually compensatory in the sense that the total genome length remains polyhexameric. Although programmed and highly precise pseudo-templated nucleotide additions during transcription are inherent to polymerases of all viruses belonging to the family Paramyxoviridae, a similar mechanism that would serve to non-randomly correct genome length, if an indel has occurred during replication, has so far not been described in the context of a complete virus genome. We compiled all complete MV genomic sequences (64 in total) available in open access sequence databases. Multiple sequence comparisons and phylogenetic analyses were performed with the aim of exploring whether non-recombinant, evolutionarily unlinked measles strains that show deviations from canonical genome organization possess a common genetic characteristic. In 11 MV sequences we detected deviations from canonical genome organization due to short indels located within homopolymeric stretches or next to them. In nine out of 11 identified non-canonical MV sequences, a common feature was observed: one mutation, either an insertion or a deletion, was located in a 28-nt-long region in the F gene 5' untranslated region (positions 5051-5078 in genomic cDNA of canonical strains). This segment is composed of five tandemly linked homopolymeric stretches; its consensus sequence is G6-7C7-8A6-7G1-3C5-6. Although none of the mononucleotide repeats within this segment has a fixed length, the total number of nts in canonical strains is always 28.
These nine non-canonical strains, as well as the tenth (not mutated in the 5051-5078 segment), can be grouped in three clusters, based on their passage histories, epidemiological data and genetic similarities. There are no indications that the 3 clusters are evolutionarily linked, other than the fact that they all belong to clade D. A common narrow genomic region was found to be mutated in different, unrelated wild-type strains, suggesting that this region might have a function in non-random genome length corrections occurring during MV replication.
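The polyhexameric-length constraint and the 28-nt consensus described above can be expressed as simple checks. This is a sketch only: the function names are hypothetical, the indel sizes in the example are invented for illustration, and the regular expression encodes just the consensus quoted in the abstract.

```python
import re

CANONICAL_LENGTH = 15894  # nt, as stated in the abstract

# Consensus of the F gene 5' UTR segment: G6-7 C7-8 A6-7 G1-3 C5-6
SEGMENT = re.compile(r"G{6,7}C{7,8}A{6,7}G{1,3}C{5,6}")


def is_polyhexameric(length):
    """True when the genome length is a multiple of six."""
    return length % 6 == 0


def length_after_indels(indels, base=CANONICAL_LENGTH):
    """Apply signed indel sizes (+ insertion, - deletion) to the base length."""
    return base + sum(indels)


def is_canonical_segment(segment):
    """True when the segment fits the consensus and is exactly 28 nt long."""
    return len(segment) == 28 and SEGMENT.fullmatch(segment) is not None


# Hypothetical cases: a lone 1-nt insertion breaks the rule,
# while a compensating 1-nt deletion elsewhere restores it.
lone = length_after_indels([+1])
compensated = length_after_indels([+1, -1])
example_segment = "G" * 6 + "C" * 8 + "A" * 6 + "G" * 3 + "C" * 5
```

A single extra G in the first run still matches the consensus pattern but yields 29 nt, which is why the length test is kept separate from the pattern test.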
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya
The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in a natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.
Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya; ...
2015-10-20
The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in a natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.
Genomic diversity in switchgrass (Panicum virgatum): from the continental scale to a dune landscape
Morris, Geoffrey P.; Grabowski, Paul; Borevitz, Justin O.
2011-01-01
Connecting broad-scale patterns of genetic variation and population structure to genetic diversity on a landscape is a key step towards understanding historical processes of migration and adaptation. New genomic approaches can be used to increase the resolution of phylogeographic studies while reducing locus sampling effects and circumventing ascertainment bias. Here, we use a novel approach based on high-throughput sequencing to characterize genetic diversity in complete chloroplast genomes and >10,000 nuclear loci in switchgrass, across a continental and landscape scale. Switchgrass is a North American tallgrass species, which is widely used in conservation and perennial biomass production, and shows strong ecotypic adaptation and population structure across the continental range. We sequenced 40.9 billion base pairs from 24 individuals from across the species’ range and 20 individuals from the Indiana Dunes. Analysis of plastome sequence revealed 203 variable SNP sites that define eight haplogroups, which are differentiated by 4 to 127 SNPs and confirmed by patterns of indel variation. These include three deeply divergent haplogroups, which correspond to the previously described lowland-upland ecotypic split and a novel upland haplogroup split that dates to the mid-Pleistocene. Most of the plastome haplogroup diversity present in the northern switchgrass range, including in the Indiana Dunes, originated in the mid- or upper-Pleistocene prior to the most recent postglacial recolonization. Furthermore, a recently colonized landscape feature (~150 ya) in the Indiana Dunes contains several deeply divergent upland haplogroups. Nuclear markers also support a deep lowland-upland split, followed by limited gene flow, and show extensive gene flow in the local population of the Indiana Dunes. PMID:22060816
VARiD: a variation detection framework for color-space and letter-space platforms.
Dalca, Adrian V; Rumble, Stephen M; Levy, Samuel; Brudno, Michael
2010-06-15
High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates, and while most HTS technologies sequence the residues of the genome directly, generating base calls for each position, the Applied Biosystem's SOLiD platform generates dibase-coded (color space) sequences. While combining data from the various platforms should increase the accuracy of variation detection, to date there are only a few tools that can identify variants from color space data, and none that can analyze color space and regular (letter space) data together. We present VARiD--a probabilistic method for variation detection from both letter- and color-space reads simultaneously. VARiD is based on a hidden Markov model and uses the forward-backward algorithm to accurately identify heterozygous, homozygous and tri-allelic SNPs, as well as micro-indels. Our analysis shows that VARiD performs better than the AB SOLiD toolset at detecting variants from color-space data alone, and improves the calls dramatically when letter- and color-space reads are combined. The toolset is freely available at http://compbio.cs.utoronto.ca/varid.
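To make the distinction between letter space and color space concrete, the sketch below implements the commonly described two-base (dibase) encoding, in which each color is the XOR of the 2-bit codes of adjacent bases. It illustrates the encoding only and is not VARiD's code; the example sequences are invented.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def to_color_space(seq):
    """Encode a DNA string as its first base followed by one color per base pair."""
    colors = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(str(c) for c in colors)


def from_color_space(cs):
    """Decode by walking the colors from the known first base."""
    bases = [cs[0]]
    for color in cs[1:]:
        bases.append(BASES[CODE[bases[-1]] ^ int(color)])
    return "".join(bases)


reference = to_color_space("ATGGCA")
variant = to_color_space("ATGACA")  # single-base substitution G -> A
changed = sum(r != v for r, v in zip(reference, variant))
```

An isolated substitution alters two adjacent colors, whereas a single miscalled color corrupts every decoded base after it; this asymmetry is one reason color-space and letter-space reads carry complementary evidence.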
Dereeper, Alexis; Nicolas, Stéphane; Le Cunff, Loïc; Bacilieri, Roberto; Doligez, Agnès; Peros, Jean-Pierre; Ruiz, Manuel; This, Patrice
2011-05-05
High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data. In this context, we developed SNiPlay, a flexible, user-friendly and integrative web-based tool dedicated to polymorphism discovery and analysis. It integrates: 1) a pipeline, freely accessible through the internet, combining existing software with new tools to detect SNPs and to compute different types of statistical indices and graphical layouts for SNP data. From standard sequence alignments, genotyping data or Sanger sequencing traces given as input, SNiPlay detects SNP and indel events and outputs submission files for the design of Illumina's SNP chips. Subsequently, it sends sequences and genotyping data into a series of modules in charge of various processes: physical mapping to a reference genome, annotation (genomic position, intron/exon location, synonymous/non-synonymous substitutions), SNP frequency determination in user-defined groups, haplotype reconstruction and network, linkage disequilibrium evaluation, and diversity analysis (Pi, Watterson's Theta, Tajima's D). Furthermore, the pipeline allows the use of external data (such as phenotype, geographic origin, taxa, stratification) to define groups and compare statistical indices. 2) a database storing polymorphisms, genotyping data and grapevine sequences released by public and private projects.
It allows the user to retrieve SNPs using various filters (such as genomic position, missing data, polymorphism type, allele frequency), to compare SNP patterns between populations, and to export genotyping data or sequences in various formats. Our experiments on grapevine genetic projects showed that SNiPlay allows geneticists to rapidly obtain advanced results in several key research areas of plant genetic diversity. Both the management and treatment of large amounts of SNP data are rendered considerably easier for end-users through automation and integration. Current developments are taking into account new advances in high-throughput technologies. SNiPlay is available at: http://sniplay.cirad.fr/.
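As an illustration of two of the diversity indices named in the SNiPlay abstract (Pi and Watterson's Theta), the following is a minimal sketch computed per locus from a small alignment. It is not SNiPlay's implementation; the function names are hypothetical, the sequences are assumed to be aligned, of equal length, and free of gaps or missing data, and Tajima's D is left out because it needs additional variance terms.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that carry more than one distinct base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Pi: mean number of differences over all pairs of sequences (per locus)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def watterson_theta(seqs):
    """Watterson's theta: S divided by the harmonic number a1 = sum(1/i), i < n."""
    n = len(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a1


if __name__ == "__main__":
    alignment = ["ACGT", "ACGA", "ACTA", "ACGT"]  # toy data, not from the paper
    print(segregating_sites(alignment))
    print(nucleotide_diversity(alignment))
    print(watterson_theta(alignment))
```

Dividing either value by the alignment length would give the per-site version; a real pipeline would also need a policy for gaps, which matters when indels are present.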
MySSP: Non-stationary evolutionary sequence simulation, including indels
Rosenberg, Michael S.
2007-01-01
MySSP is a new program for the simulation of DNA sequence evolution across a phylogenetic tree. Although many programs are available for sequence simulation, MySSP is unique in its inclusion of indels, flexibility in allowing for non-stationary patterns, and output of ancestral sequences. Some of these features can individually be found in existing programs, but they have not all been previously available in a single package. PMID:19325855
The rate of meiotic gene conversion varies by sex and age
Halldorsson, Bjarni V.; Hardarson, Marteinn T.; Kehr, Birte; Styrkarsdottir, Unnur; Gylfason, Arnaldur; Thorleifsson, Gudmar; Zink, Florian; Jonasdottir, Adalbjorg; Jonasdottir, Aslaug; Sulem, Patrick; Masson, Gisli; Thorsteinsdottir, Unnur; Helgason, Agnar; Kong, Augustine; Gudbjartsson, Daniel F.; Stefansson, Kari
2016-01-01
Meiotic recombination involves a combination of gene conversion and crossover events that along with mutations produce germline genetic diversity. Here, we report the discovery of 3,176 SNP and 61 indel gene conversions. Our estimate of the non-crossover (NCO) gene conversion rate (G) is 7.0 for SNPs and 5.8 for indels per Mb per generation, and the GC bias is 67.6%. For indels we demonstrate a 65.6% preference for the shorter allele. NCO gene conversions from mothers are longer than those from fathers and G is 2.17 times greater in mothers. Notably, G increases with the age of mothers, but not fathers. A disproportionate number of NCO gene conversions in older mothers occur outside double strand break (DSB) regions and in regions with relatively low GC content. This points to age-related changes in the mechanisms of meiotic gene conversions in oocytes. PMID:27643539
A High-Definition View of Functional Genetic Variation from Natural Yeast Genomes
Bergström, Anders; Simpson, Jared T.; Salinas, Francisco; Barré, Benjamin; Parts, Leopold; Zia, Amin; Nguyen Ba, Alex N.; Moses, Alan M.; Louis, Edward J.; Mustonen, Ville; Warringer, Jonas; Durbin, Richard; Liti, Gianni
2014-01-01
The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies. PMID:24425782
Characterization of phenylpropanoid pathway genes within European maize (Zea mays L.) inbreds
Andersen, Jeppe Reitan; Zein, Imad; Wenzel, Gerhard; Darnhofer, Birte; Eder, Joachim; Ouzunova, Milena; Lübberstedt, Thomas
2008-01-01
Background: Forage quality of maize is influenced by both the content and structure of lignins in the cell wall. Biosynthesis of monolignols, constituting the complex structure of lignins, is catalyzed by enzymes in the phenylpropanoid pathway. Results: In the present study we have amplified partial genomic fragments of six putative phenylpropanoid pathway genes in a panel of elite European inbred lines of maize (Zea mays L.) contrasting in forage quality traits. Six loci, encoding C4H, 4CL1, 4CL2, C3H, F5H, and CAD, displayed different levels of nucleotide diversity and linkage disequilibrium (LD), possibly reflecting different levels of selection. Associations with forage quality traits were identified for several individual polymorphisms within the 4CL1, C3H, and F5H genomic fragments when controlling for both overall population structure and relative kinship. A 1-bp indel in 4CL1 was associated with in vitro digestibility of organic matter (IVDOM), a non-synonymous SNP in C3H was associated with IVDOM, and an intron SNP in F5H was associated with neutral detergent fiber. However, the C3H and F5H associations did not remain significant when controlling for multiple testing. Conclusion: While the number of lines included in this study limits the power of the association analysis, our results imply that genetic variation for forage quality traits can be mined in phenylpropanoid pathway genes of elite breeding lines of maize. PMID:18173847
Hayakawa, Takashi; Sugawara, Tohru; Go, Yasuhiro; Udono, Toshifumi; Hirai, Hirohisa; Imai, Hiroo
2012-01-01
Chimpanzees (Pan troglodytes) show region-specific differences in dietary repertoires from East to West across tropical Africa. Such differences may result from different genetic backgrounds in addition to cultural variations. We analyzed the sequences of all bitter taste receptor genes (cTAS2Rs) in a total of 59 chimpanzees, including 4 putative subspecies. We identified genetic variations including single-nucleotide variations (SNVs), insertions and deletions (indels), gene-conversion variations, and copy-number variations (CNVs) in cTAS2Rs. Approximately two-thirds of all cTAS2R haplotypes in the amino acid sequence were unique to each subspecies. We analyzed the evolutionary backgrounds of natural selection behind such diversification. Our previous study concluded that diversification of cTAS2Rs in western chimpanzees (P. t. verus) may have resulted from balancing selection. In contrast, the present study found that purifying selection dominates as the evolutionary form of diversification of the so-called human cluster of cTAS2Rs in eastern chimpanzees (P. t. schweinfurthii) and that the other cTAS2Rs were under no obvious selection as a whole. Such marked diversification of cTAS2Rs with different evolutionary backgrounds among subspecies of chimpanzees probably reflects their subspecies-specific dietary repertoires. PMID:22916235
Bargues, M Dolores; Zuriaga, M Angeles; Mas-Coma, Santiago
2014-01-01
A pseudogene, paralogous to rDNA 5.8S and ITS-2, is described in Meccus dimidiata dimidiata, M. d. capitata, M. d. maculipennis, M. d. hegneri, M. sp. aff. dimidiata, M. p. phyllosoma, M. p. longipennis, M. p. pallidipennis, M. p. picturata, M. p. mazzottii, Triatoma mexicana, Triatoma nitida and Triatoma sanguisuga, covering North America, Central America and northern South America. Such a nuclear rDNA pseudogene is very rare. In the 5.8S gene, criteria for pseudogene identification included length variability, lower GC content, mutations regarding the functional uniform sequence, and relatively high base substitutions in evolutionarily conserved sites. At the ITS-2 level, criteria were the shorter sequence and large proportion of insertions and deletions (indels). Pseudogenic 5.8S and ITS-2 secondary structures were different from the functional foldings and from one another, showing less negative values for minimum free energy (mfe) and centroid predictions, and lower fit between mfe, partition function, and centroid structures. A complete characterization indicated a processed pseudogenic unit of the ghost type, escaping from rDNA concerted evolution and with functionality subject to constraints instead of evolving freely by neutral drift. Despite a high indel number, low mutation number and an evolutionary rate similar to the functional ITS-2, that pseudogene distinguishes different taxa and furnishes coherent phylogenetic topologies with resolution similar to the functional ITS-2. The discovery of a pseudogene in many phylogenetically related species is unique in animals and allowed for an estimation of its palaeobiogeographical origin based on molecular clock data, inheritance pathways, evolutionary rate and pattern, and geographical spread.
In addition to the technical risk to be considered henceforth, this relict pseudogene, designated as "ps(5.8S+ITS-2)", proves to be a valuable marker for specimen classification, phylogenetic analyses, and systematic/taxonomic studies. It opens a new research field, Chagas disease epidemiology and control included, given its potential relationships with triatomine fitness, behaviour and adaptability. Copyright © 2013 Elsevier B.V. All rights reserved.
Bassil, Nahla V; Davis, Thomas M; Zhang, Hailong; Ficklin, Stephen; Mittmann, Mike; Webster, Teresa; Mahoney, Lise; Wood, David; Alperin, Elisabeth S; Rosyara, Umesh R; Koehorst-van Putten, Herma; Monfort, Amparo; Sargent, Daniel J; Amaya, Iraida; Denoyes, Beatrice; Bianco, Luca; van Dijk, Thijs; Pirani, Ali; Iezzoni, Amy; Main, Dorrie; Peace, Cameron; Yang, Yilong; Whitaker, Vance; Verma, Sujeet; Bellon, Laurent; Brew, Fiona; Herrera, Raul; van de Weg, Eric
2015-03-07
A high-throughput genotyping platform is needed to enable marker-assisted breeding in the allo-octoploid cultivated strawberry Fragaria × ananassa. Short-read sequences from one diploid and 19 octoploid accessions were aligned to the diploid Fragaria vesca 'Hawaii 4' reference genome to identify single nucleotide polymorphisms (SNPs) and indels for incorporation into a 90 K Affymetrix® Axiom® array. We report the development and preliminary evaluation of this array. About 36 million sequence variants were identified in a 19 member, octoploid germplasm panel. Strategies and filtering pipelines were developed to identify and incorporate markers of several types: di-allelic SNPs (66.6%), multi-allelic SNPs (1.8%), indels (10.1%), and ploidy-reducing "haploSNPs" (11.7%). The remaining SNPs included those discovered in the diploid progenitor F. iinumae (3.9%), and speculative "codon-based" SNPs (5.9%). In genotyping 306 octoploid accessions, SNPs were assigned to six classes with Affymetrix's "SNPolisher" R package. The highest quality classes, PolyHigh Resolution (PHR), No Minor Homozygote (NMH), and Off-Target Variant (OTV) comprised 25%, 38%, and 1% of array markers, respectively. These markers were suitable for genetic studies as demonstrated in the full-sib family 'Holiday' × 'Korona' with the generation of a genetic linkage map consisting of 6,594 PHR SNPs evenly distributed across 28 chromosomes with an average density of approximately one marker per 0.5 cM, thus exceeding our goal of one marker per cM. The Affymetrix IStraw90 Axiom array is the first high-throughput genotyping platform for cultivated strawberry and is commercially available to the worldwide scientific community. The array's high success rate is likely driven by the presence of naturally occurring variation in ploidy level within the nominally octoploid genome, and by effectiveness of the employed array design and ploidy-reducing strategies. 
This array enables genetic analyses including generation of high-density linkage maps, identification of quantitative trait loci for economically important traits, and genome-wide association studies, thus providing a basis for marker-assisted breeding in this high value crop.
The complete chloroplast genome sequence of Actinidia arguta using the PacBio RS II platform
Lin, Miaomiao; Qi, Xiujuan; Chen, Jinyong; Sun, Leiming; Zhong, Yunpeng; Fang, Jinbao; Hu, Chungen
2018-01-01
Actinidia arguta is the most basal species in a phylogenetically and economically important genus in the family Actinidiaceae. To better understand the molecular basis of the Actinidia arguta chloroplast (cp), we sequenced the complete cp genome from A. arguta using Illumina and PacBio RS II sequencing technologies. The cp genome from A. arguta was 157,611 bp in length and composed of a pair of 24,232 bp inverted repeats (IRs) separated by a 20,463 bp small single copy region (SSC) and an 88,684 bp large single copy region (LSC). Overall, the cp genome contained 113 unique genes. The cp genomes from A. arguta and three other Actinidia species from GenBank were subjected to a comparative analysis. Indel mutation events and high frequencies of base substitution were identified, and the accD and ycf2 genes showed a high degree of variation within Actinidia. Forty-seven simple sequence repeats (SSRs) and 155 repetitive structures were identified, further demonstrating the rapid evolution in Actinidia. The cp genome analysis and the identification of variable loci provide vital information for understanding the evolution and function of the chloroplast and for characterizing Actinidia population genetics. PMID:29795601
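The region sizes reported in the abstract above can be checked against the stated total, since a quadripartite chloroplast genome consists of one large single copy region, one small single copy region, and two copies of the inverted repeat. The short sketch below performs that arithmetic; the function name `quadripartite_length` is hypothetical, and the numbers are the ones given in the abstract.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a circular cp genome: LSC + SSC + two inverted repeats."""
    return lsc + ssc + 2 * ir


if __name__ == "__main__":
    # Region sizes (bp) as reported for Actinidia arguta in the abstract.
    total = quadripartite_length(lsc=88_684, ssc=20_463, ir=24_232)
    print(total)  # 157611, matching the reported genome length
```

The same check is a quick sanity test for any annotated chloroplast genome before comparative analysis.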
Hu, Hao; Wienker, Thomas F; Musante, Luciana; Kalscheuer, Vera M; Kahrizi, Kimia; Najmabadi, Hossein; Ropers, H Hilger
2014-12-01
Next-generation sequencing has greatly accelerated the search for disease-causing defects, but even for experts the data analysis can be a major challenge. To facilitate the data processing in a clinical setting, we have developed a novel medical resequencing analysis pipeline (MERAP). MERAP assesses the quality of sequencing, and has optimized capacity for calling variants, including single-nucleotide variants, insertions and deletions, copy-number variation, and other structural variants. MERAP identifies polymorphic and known causal variants by filtering against public domain databases, and flags nonsynonymous and splice-site changes. MERAP uses a logistic model to estimate the causal likelihood of a given missense variant. MERAP considers the relevant information such as phenotype and interaction with known disease-causing genes. MERAP compares favorably with GATK, one of the widely used tools, because of its higher sensitivity for detecting indels, its easy installation, and its economical use of computational resources. Upon testing more than 1,200 individuals with mutations in known and novel disease genes, MERAP proved highly reliable, as illustrated here for five families with disease-causing variants. We believe that the clinical implementation of MERAP will expedite the diagnostic process of many disease-causing defects. © 2014 WILEY PERIODICALS, INC.
Bergman, C M; Kreitman, M
2001-08-01
Comparative genomic approaches to gene and cis-regulatory prediction are based on the principle that differential DNA sequence conservation reflects variation in functional constraint. Using this principle, we analyze noncoding sequence conservation in Drosophila for 40 loci with known or suspected cis-regulatory function encompassing >100 kb of DNA. We estimate the fraction of noncoding DNA conserved in both intergenic and intronic regions and describe the length distribution of ungapped conserved noncoding blocks. On average, 22%-26% of noncoding sequences surveyed are conserved in Drosophila, with median block length approximately 19 bp. We show that point substitution in conserved noncoding blocks exhibits transition bias as well as lineage effects in base composition, and occurs more than an order of magnitude more frequently than insertion/deletion (indel) substitution. Overall, patterns of noncoding DNA structure and evolution differ remarkably little between intergenic and intronic conserved blocks, suggesting that the effects of transcription per se contribute minimally to the constraints operating on these sequences. The results of this study have implications for the development of alignment and prediction algorithms specific to noncoding DNA, as well as for models of cis-regulatory DNA sequence evolution.
Performing monkeys of Bangladesh: characterizing their source and genetic variation.
Hasan, M Kamrul; Feeroz, M Mostafa; Jones-Engel, Lisa; Engel, Gregory A; Akhtar, Sharmin; Kanthaswamy, Sree; Smith, David Glenn
2016-04-01
The acquisition and training of monkeys to perform is a centuries-old tradition in South Asia, resulting in a large number of rhesus macaques kept in captivity for this purpose. The performing monkeys are reportedly collected from free-ranging populations, and may escape from their owners or may be released into other populations. In order to determine whether this tradition involving the acquisition and movement of animals has influenced the population structure of free-ranging rhesus macaques in Bangladesh, we first characterized the source of these monkeys. Biological samples from 65 performing macaques collected between January 2010 and August 2013 were analyzed for genetic variation using 716 base pairs of mitochondrial DNA. Performing monkey sequences were compared with those of free-ranging rhesus macaque populations in Bangladesh, India and Myanmar. Forty-five haplotypes with 116 (16 %) polymorphic nucleotide sites were detected among the performing monkeys. As for the free-ranging rhesus population, most of the substitutions (89 %) were transitions, and no indels (insertion/deletion) were observed. The estimate of the mean number of pair-wise differences for the performing monkey population was 10.1264 ± 4.686, compared to 14.076 ± 6.363 for the free-ranging population. Fifteen free-ranging rhesus macaque populations were identified as the source of performing monkeys in Bangladesh; several of these populations were from areas where active provisioning has resulted in a large number of macaques. The collection of performing monkeys from India was also evident.
Performing monkeys of Bangladesh: characterizing their source and genetic variation
Hasan, M Kamrul; Feeroz, M Mostafa; Jones-Engel, Lisa; Engel, Gregory A; Akhtar, Sharmin; Kanthaswamy, Sree; Smith, David Glenn
2016-01-01
The acquisition and training of monkeys to perform is a centuries-old tradition in South Asia, resulting in a large number of rhesus macaques kept in captivity for this purpose. The performing monkeys are reportedly collected from free-ranging populations and may escape from their owners or be released into other populations. In order to determine whether this tradition, which involves the acquisition and movement of animals, has influenced the population structure of free-ranging rhesus macaques in Bangladesh, we first characterized the source of these monkeys. Biological samples from 65 performing macaques, collected between January 2010 and August 2013, were analyzed for genetic variation using 716 base pairs of mitochondrial DNA. Performing monkey sequences were compared with those of free-ranging rhesus macaque populations in Bangladesh, India and Myanmar. Forty-five haplotypes with 116 (16%) polymorphic nucleotide sites were detected among the performing monkeys. As for the free-ranging rhesus population, most of the substitutions (89%) were transitions and no indels (insertion/deletion) were observed. The estimate of the mean number of pair-wise differences for the performing monkey population was 10.1264 ± 4.686, compared to 14.076 ± 6.363 for the free-ranging population. Fifteen free-ranging rhesus macaque populations were identified as the source of performing monkeys in Bangladesh; several of these populations were from areas where active provisioning has resulted in a large number of macaques. Collection of performing monkeys from India was also evident. PMID:26758818
Whole-genome sequencing of Atacama skeleton shows novel mutations linked with dysplasia
Bhattacharya, Sanchita; Li, Jian; Sockell, Alexandra; Kan, Matthew J.; Bava, Felice A.; Chen, Shann-Ching; Ávila-Arcos, María C.; Ji, Xuhuai; Smith, Emery; Asadi, Narges B.; Lachman, Ralph S.; Lam, Hugo Y.K.; Bustamante, Carlos D.; Butte, Atul J.; Nolan, Garry P.
2018-01-01
Over a decade ago, the Atacama humanoid skeleton (Ata) was discovered in the Atacama region of Chile. The Ata specimen carried a strange phenotype—6-in stature, fewer than expected ribs, elongated cranium, and accelerated bone age—leading to speculation that this was a preserved nonhuman primate, human fetus harboring genetic mutations, or even an extraterrestrial. We previously reported that it was human by DNA analysis with an estimated bone age of about 6–8 yr at the time of demise. To determine the possible genetic drivers of the observed morphology, DNA from the specimen was subjected to whole-genome sequencing using the Illumina HiSeq platform with an average 11.5× coverage of 101-bp, paired-end reads. In total, 3,356,569 single nucleotide variations (SNVs) were found as compared to the human reference genome, 518,365 insertions and deletions (indels), and 1047 structural variations (SVs) were detected. Here, we present the detailed whole-genome analysis showing that Ata is a female of human origin, likely of Chilean descent, and its genome harbors mutations in genes (COL1A1, COL2A1, KMT2D, FLNB, ATR, TRIP11, PCNT) previously linked with diseases of small stature, rib anomalies, cranial malformations, premature joint fusion, and osteochondrodysplasia (also known as skeletal dysplasia). Together, these findings provide a molecular characterization of Ata's peculiar phenotype, which likely results from multiple known and novel putative gene mutations affecting bone development and ossification. PMID:29567674
A universal method for automated gene mapping
Zipperlen, Peder; Nairz, Knud; Rimann, Ivo; Basler, Konrad; Hafen, Ernst; Hengartner, Michael; Hajnal, Alex
2005-01-01
Small insertions or deletions (InDels) constitute a ubiquitous class of sequence polymorphisms found in eukaryotic genomes. Here, we present an automated high-throughput genotyping method that relies on the detection of fragment-length polymorphisms (FLPs) caused by InDels. The protocol utilizes standard sequencers and genotyping software. We have established genome-wide FLP maps for both Caenorhabditis elegans and Drosophila melanogaster that facilitate genetic mapping with a minimum of manual input and at comparatively low cost. PMID:15693948
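The principle behind fragment-length polymorphism genotyping can be sketched in a few lines: an InDel inside a PCR amplicon shifts the fragment length by the size of the InDel, so the two alleles appear as peaks of different size. The code below is only an illustration under stated assumptions, not the authors' protocol or software; the function names, the 0.5 bp tolerance, and the example sizes are all hypothetical.

```python
def expected_fragment_lengths(ref_len, indel_size):
    """Expected amplicon sizes for both alleles.

    indel_size is positive for an insertion and negative for a deletion,
    relative to the reference allele.
    """
    return {"ref": ref_len, "alt": ref_len + indel_size}


def call_genotype(peaks, expected, tol=0.5):
    """Classify a sample from its observed fragment sizes (in bp).

    tol is an assumed sizing tolerance; real instruments need calibration.
    """
    has_ref = any(abs(p - expected["ref"]) <= tol for p in peaks)
    has_alt = any(abs(p - expected["alt"]) <= tol for p in peaks)
    if has_ref and has_alt:
        return "het"
    if has_ref:
        return "hom_ref"
    if has_alt:
        return "hom_alt"
    return "no_call"


if __name__ == "__main__":
    expected = expected_fragment_lengths(ref_len=200, indel_size=-3)
    print(call_genotype([200.1, 197.2], expected))  # both alleles present
    print(call_genotype([196.8], expected))         # only the deletion allele
```

Because the call depends only on fragment size, very small InDels require sizing resolution finer than the InDel length, which is why the tolerance parameter matters in practice.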
Atak, Zeynep Kalender; Gianfelici, Valentina; Hulselmans, Gert; De Keersmaecker, Kim; Devasia, Arun George; Geerdens, Ellen; Mentens, Nicole; Chiaretti, Sabina; Durinck, Kaat; Uyttebroeck, Anne; Vandenberghe, Peter; Wlodarska, Iwona; Cloos, Jacqueline; Foà, Robin; Speleman, Frank; Cools, Jan; Aerts, Stein
2013-01-01
RNA-seq is a promising technology to re-sequence protein coding genes for the identification of single nucleotide variants (SNV), while simultaneously obtaining information on structural variations and gene expression perturbations. We asked whether RNA-seq is suitable for the detection of driver mutations in T-cell acute lymphoblastic leukemia (T-ALL). These leukemias are caused by a combination of gene fusions, over-expression of transcription factors and cooperative point mutations in oncogenes and tumor suppressor genes. We analyzed 31 T-ALL patient samples and 18 T-ALL cell lines by high-coverage paired-end RNA-seq. First, we optimized the detection of SNVs in RNA-seq data by comparing the results with exome re-sequencing data. We identified known driver genes with recurrent protein altering variations, as well as several new candidates including H3F3A, PTK2B, and STAT5B. Next, we determined accurate gene expression levels from the RNA-seq data through normalizations and batch effect removal, and used these to classify patients into T-ALL subtypes. Finally, we detected gene fusions, of which several can explain the over-expression of key driver genes such as TLX1, PLAG1, LMO1, or NKX2-1; and others result in novel fusion transcripts encoding activated kinases (SSBP2-FER and TPM3-JAK2) or involving MLLT10. In conclusion, we present novel analysis pipelines for variant calling, variant filtering, and expression normalization on RNA-seq data, and successfully applied these for the detection of translocations, point mutations, INDELs, exon-skipping events, and expression perturbations in T-ALL.
Castro-Martínez, Anna Gabriela; Sánchez-Corona, José; Vázquez-Vargas, Adriana Patricia; García-Zapién, Alejandra Guadalupe; López-Quintero, Andres; Villalpando-Velazco, Héctor Javier; Flores-Martínez, Silvia Esperanza
2018-02-28
Gestational diabetes mellitus (GDM) is a metabolically complex disease with major genetic determinants. GDM has been associated with insulin resistance and dysfunction of pancreatic beta cells, so the GDM candidate genes are those that encode proteins modulating the function and secretion of insulin, such as that for calpain 10 (CAPN10). This study aimed to assess whether single nucleotide polymorphism (SNP)-43, SNP-44, SNP-63, and the indel-19 variant, and specific haplotypes of the CAPN10 gene were associated with gestational diabetes mellitus. We studied 116 patients with gestational diabetes mellitus and 83 women with normal glucose tolerance. Measurements of anthropometric and biochemical parameters were performed. SNP-43, SNP-44, and SNP-63 were identified by polymerase chain reaction (PCR)-restriction fragment length polymorphisms, while the indel-19 variant was detected by TaqMan qPCR assays. The allele, genotype, and haplotype frequencies of the four variants did not differ significantly between women with gestational diabetes mellitus and controls. However, among women with gestational diabetes mellitus, glucose levels were significantly higher in those bearing the 3R/3R genotype than in carriers of the 3R/2R genotype of the indel-19 variant (p = 0.006). In conclusion, the 3R/3R genotype of the indel-19 variant of the CAPN10 gene was associated with increased glucose levels in these Mexican women with gestational diabetes mellitus.
Losada, Liliana; Varga, John J.; Hostetler, Jessica; Radune, Diana; Kim, Maria; Durkin, Scott; Schneewind, Olaf; Nierman, William C.
2011-01-01
Yersinia pestis is the causative agent of the plague. Y. pestis KIM 10+ strain was passaged and selected for loss of the 102 kb pgm locus, resulting in an attenuated strain, KIM D27. In this study, whole genome sequencing was performed on KIM D27 in order to identify any additional differences. Initial assemblies of 454 data were highly fragmented, and various bioinformatic tools detected between 15 and 465 SNPs and INDELs when comparing both strains, the vast majority associated with A or T homopolymer sequences. Consequently, Illumina sequencing was performed to improve the quality of the assembly. Hybrid sequence assemblies were performed and a total of 56 validated SNP/INDELs and 5 repeat differences were identified in the D27 strain relative to the published KIM 10+ sequence. However, further analysis showed that 55 of these SNP/INDELs and 3 repeats were errors in the KIM 10+ reference sequence. We conclude that both 454 and Illumina sequencing were required to obtain the most accurate and rapid sequence results for Y. pestis KIM D27. SNP and INDEL calls were most accurate when both Newbler and CLC Genomics Workbench were employed. For purposes of obtaining high quality genome sequence differences between strains, any identified differences should be verified in both the new and reference genomes. PMID:21559501
Fisher, Colleen A.; Bhattarai, Eric K.; Osterstock, Jason B.; Dowd, Scot E.; Seabury, Paul M.; Vikram, Meenu; Whitlock, Robert H.; Schukken, Ynte H.; Schnabel, Robert D.; Taylor, Jeremy F.; Womack, James E.; Seabury, Christopher M.
2011-01-01
Members of the Toll-like receptor (TLR) gene family occupy key roles in the mammalian innate immune system by functioning as sentries for the detection of invading pathogens, thereafter provoking host innate immune responses. We utilized a custom next-generation sequencing approach and allele-specific genotyping assays to detect and validate 280 biallelic variants across all 10 bovine TLR genes, including 71 nonsynonymous single nucleotide polymorphisms (SNPs) and one putative nonsense SNP. Bayesian haplotype reconstructions and median joining networks revealed haplotype sharing between Bos taurus taurus and Bos taurus indicus breeds at every locus, and specialized beef and dairy breeds could not be differentiated despite an average polymorphism density of 1 marker/158 bp. Collectively, 160 tagSNPs and two tag insertion-deletion mutations (indels) were sufficient to predict 100% of the variation at 280 variable sites for both Bos subspecies and their hybrids, whereas 118 tagSNPs and 1 tagIndel predictively captured 100% of the variation at 235 variable sites for B. t. taurus. Polyphen and SIFT analyses of amino acid (AA) replacements encoded by bovine TLR SNPs indicated that up to 32% of the AA substitutions were expected to impact protein function. Classical and newly developed tests of diversity provide strong support for balancing selection operating on TLR3 and TLR8, and purifying selection acting on TLR10. An investigation of the persistence and continuity of linkage disequilibrium (r2≥0.50) between adjacent variable sites also supported the presence of selection acting on TLR3 and TLR8. A case-control study employing validated variants from bovine TLR genes recognizing bacterial ligands revealed six SNPs potentially eliciting small effects on susceptibility to Mycobacterium avium spp paratuberculosis infection in dairy cattle. 
The results of this study will broadly impact domestic cattle research by providing the necessary foundation to explore several avenues of bovine translational genomics, and the potential for marker-assisted vaccination. PMID:22164200
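The linkage disequilibrium threshold quoted in this abstract (r² ≥ 0.50 between adjacent variable sites) can be illustrated with a minimal sketch. It assumes phased haplotypes at two biallelic sites are already available, each allele coded 0 or 1; the function name and the toy haplotypes are hypothetical and are not taken from the study, which used Bayesian haplotype reconstruction.

```python
def ld_r_squared(haplotypes):
    """Pairwise r^2 between two biallelic sites.

    haplotypes: list of (a, b) tuples, one per phased chromosome,
    where a and b are 0/1 allele codes at the first and second site.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    return d * d / denom


# Toy data (hypothetical): complete association vs. no association.
perfect = ld_r_squared([(1, 1), (1, 1), (0, 0), (0, 0)])
independent = ld_r_squared([(1, 1), (1, 0), (0, 1), (0, 0)])
```

Under this definition, a pair of adjacent sites would pass the abstract's threshold when the returned value is at least 0.50.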
Hu, Peinan; Zhao, Xueying; Zhang, Qinghua; Li, Weiming; Zu, Yao
2018-03-02
The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system has been proven to be an efficient and precise genome editing technology in various organisms. However, the gene editing efficiencies of Cas9 proteins with a nuclear localization signal (NLS) fused to different termini and Cas9 mRNA have not been systematically compared. Here, we compared the ability of Cas9 proteins with NLS fused to the N-, C-, or both the N- and C-termini and N-NLS-Cas9-NLS-C mRNA to target two sites in the tyr gene and two sites in the gol gene related to pigmentation in zebrafish. Phenotypic analysis revealed that all types of Cas9 led to hypopigmentation in similar proportions of injected embryos. Genome analysis by T7 Endonuclease I (T7E1) assays demonstrated that all types of Cas9 similarly induced mutagenesis in four target sites. Sequencing results further confirmed that a high frequency of indels occurred in the target sites (tyr1 > 66%, tyr2 > 73%, gol1 > 50%, and gol2 > 35%), and that various types (more than six) of indel mutations were observed in all four types of Cas9-injected embryos. Furthermore, all types of Cas9 showed efficient targeted mutagenesis on multiplex genome editing, resulting in multiple phenotypes simultaneously. Collectively, we conclude that various NLS-fused Cas9 proteins and Cas9 mRNAs have similar genome editing efficiencies on targeting single or multiple genes, suggesting that the efficiency of CRISPR/Cas9 genome editing is highly dependent on guide RNAs (gRNAs) and gene loci. These findings may help to simplify the selection of Cas9 for gene editing using the CRISPR/Cas9 system. Copyright © 2018 Hu et al.
The haplotype-resolved genome and epigenome of the aneuploid HeLa cancer cell line.
Adey, Andrew; Burton, Joshua N; Kitzman, Jacob O; Hiatt, Joseph B; Lewis, Alexandra P; Martin, Beth K; Qiu, Ruolan; Lee, Choli; Shendure, Jay
2013-08-08
The HeLa cell line was established in 1951 from cervical cancer cells taken from a patient, Henrietta Lacks. This was the first successful attempt to immortalize human-derived cells in vitro. The robust growth and unrestricted distribution of HeLa cells resulted in its broad adoption--both intentionally and through widespread cross-contamination--and for the past 60 years it has served a role analogous to that of a model organism. The cumulative impact of the HeLa cell line on research is demonstrated by its occurrence in more than 74,000 PubMed abstracts (approximately 0.3%). The genomic architecture of HeLa remains largely unexplored beyond its karyotype, partly because like many cancers, its extensive aneuploidy renders such analyses challenging. We carried out haplotype-resolved whole-genome sequencing of the HeLa CCL-2 strain, examined point- and indel-mutation variations, mapped copy-number variations and loss of heterozygosity regions, and phased variants across full chromosome arms. We also investigated variation and copy-number profiles for HeLa S3 and eight additional strains. We find that HeLa is relatively stable in terms of point variation, with few new mutations accumulating after early passaging. Haplotype resolution facilitated reconstruction of an amplified, highly rearranged region of chromosome 8q24.21 at which integration of the human papilloma virus type 18 (HPV-18) genome occurred and that is likely to be the event that initiated tumorigenesis. We combined these maps with RNA-seq and ENCODE Project data sets to phase the HeLa epigenome. This revealed strong, haplotype-specific activation of the proto-oncogene MYC by the integrated HPV-18 genome approximately 500 kilobases upstream, and enabled global analyses of the relationship between gene dosage and expression. 
These data provide an extensively phased, high-quality reference genome for past and future experiments relying on HeLa, and demonstrate the value of haplotype resolution for characterizing cancer genomes and epigenomes.
JOINT AND INDIVIDUAL VARIATION EXPLAINED (JIVE) FOR INTEGRATED ANALYSIS OF MULTIPLE DATA TYPES.
Lock, Eric F; Hoadley, Katherine A; Marron, J S; Nobel, Andrew B
2013-03-01
Research in several fields now requires the analysis of datasets in which multiple high-dimensional types of data are available for a common set of objects. In particular, The Cancer Genome Atlas (TCGA) includes data from several diverse genomic technologies on the same cancerous tumor samples. In this paper we introduce Joint and Individual Variation Explained (JIVE), a general decomposition of variation for the integrated analysis of such datasets. The decomposition consists of three terms: a low-rank approximation capturing joint variation across data types, low-rank approximations for structured variation individual to each data type, and residual noise. JIVE quantifies the amount of joint variation between data types, reduces the dimensionality of the data, and provides new directions for the visual exploration of joint and individual structure. The proposed method represents an extension of Principal Component Analysis and has clear advantages over popular two-block methods such as Canonical Correlation Analysis and Partial Least Squares. A JIVE analysis of gene expression and miRNA data on Glioblastoma Multiforme tumor samples reveals gene-miRNA associations and provides better characterization of tumor types.
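The three-term decomposition described in this abstract can be written compactly. The notation below is an assumption made for illustration (the abstract itself gives no symbols): X_i is the data matrix for data type i, with columns indexed by the common set of objects, J_i is the part of the joint structure belonging to data type i, A_i is the structure individual to data type i, and ε_i is residual noise.

```latex
X_i = J_i + A_i + \varepsilon_i, \qquad i = 1, \dots, k,
\qquad
\operatorname{rank}\!\begin{pmatrix} J_1 \\ \vdots \\ J_k \end{pmatrix} = r,
\qquad
\operatorname{rank}(A_i) = r_i .
```

Here r and r_i are the (low) ranks of the joint and individual approximations. Any further constraints needed to make the split between joint and individual structure unique are not stated in the abstract and are left out of this sketch.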
Polymorphism of the prion protein gene (PRNP) in two Chinese indigenous cattle breeds.
Qin, L H; Zhao, Y M; Bao, Y H; Bai, W L; Chong, J; Zhang, G L; Zhang, J B; Zhao, Z H
2011-08-01
The prion protein (PRNP) gene has been mapped to position q17 of chromosome 13 in cattle. Polymorphisms of the PRNP gene might be associated with BSE susceptibility. In the present work, we investigated polymorphisms of the PRNP gene, including SNPs in exon 3, a 23-bp indel in the promoter region, and a 12-bp indel in intron 1, in 2 Chinese indigenous cattle breeds of northeast China. Eighty-six animals from the Yanbian (34) and Chinese Red Steppes (52) breeds were genotyped at the PRNP locus by analyzing genomic DNA. A total of 4 single nucleotide polymorphism (SNP) sites were revealed in PRNP exon 3 of the 2 cattle breeds investigated. Three of these SNPs were non-synonymous mutations that resulted in amino acid exchanges (K119N, S154N, and M177V), and one was a silent nucleotide substitution (A234G). The two amino acid mutations S154N and M177V were detected only in Yanbian, at a very low frequency (0.0147), and they appear to be absent in Chinese Red Steppes. The average gene heterozygosity (He), effective allele number (Ne), Shannon's information index (I) and polymorphism information content (PIC) were 0.3088, 1.5013, 0.3814 and 0.2000 in Yanbian, respectively, all relatively higher than those of Chinese Red Steppes (0.2885, 1.4985, 0.3462 and 0.1873, respectively). At the 23-bp and 12-bp indel loci, three different genotypes were identified in both the Yanbian and Chinese Red Steppes breeds. Based on the 23- and 12-bp indels, four haplotypes were constructed in the 2 Chinese cattle breeds, of which 23-bp (-)/12-bp (-) was the main haplotype, accounting for more than 50% of the total in both breeds. These results might be useful in understanding the genetic characteristics of the PRNP gene in Chinese indigenous cattle breeds.
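The four diversity statistics reported in this abstract (He, Ne, I and PIC) are usually computed per locus from allele frequencies. The sketch below uses the common textbook definitions; whether the authors used exactly these estimators (for example, with or without a small-sample correction) is an assumption, and the function name is hypothetical.

```python
import math
from itertools import combinations


def diversity_indices(freqs):
    """Per-locus diversity statistics from a list of allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    he = 1.0 - homozygosity                       # expected heterozygosity
    ne = 1.0 / homozygosity                       # effective number of alleles
    shannon = -sum(p * math.log(p) for p in freqs if p > 0)
    # PIC subtracts the fraction of uninformative heterozygote matings from He.
    pic = he - sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return {"He": he, "Ne": ne, "I": shannon, "PIC": pic}


# A biallelic locus with equal frequencies gives the maximum values.
balanced = diversity_indices([0.5, 0.5])

# The rare-allele frequency quoted in the abstract (0.0147), used here only
# as an input value; the result is not a figure reported by the study.
rare = diversity_indices([0.0147, 0.9853])
```

The averages quoted in the abstract would then correspond to the mean of such per-locus values over the sites genotyped in each breed.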
Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred
2014-01-01
Usher syndrome is an autosomal recessive disorder characterized both by deafness and blindness. For the three clinical subtypes of Usher syndrome causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is predestined for a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach compared to Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield. PMID:25333064
Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred
2014-09-01
Usher syndrome is an autosomal recessive disorder characterized both by deafness and blindness. For the three clinical subtypes of Usher syndrome causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is predestined for a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach compared to Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield.
Zhang, Bao; Liu, Chao; Wang, Yaqin; Yao, Xuan; Wang, Fang; Wu, Jiangsheng; King, Graham J; Liu, Kede
2015-06-01
In Brassica napus, yellow petals had a much higher content of carotenoids than white petals present in a small number of lines, with violaxanthin identified as the major carotenoid compound in yellow petals of rapeseed lines. Using positional cloning we identified a carotenoid cleavage dioxygenase 4 gene, BnaC3.CCD4, responsible for the formation of flower colour, with preferential expression in petals of white-flowered B. napus lines. Insertion of a CACTA-like transposable element 1 (TE1) into the coding region of BnaC3.CCD4 had disrupted its expression in yellow-flowered rapeseed lines. α-Ionone was identified as the major volatile apocarotenoid released from white petals but not from yellow petals. We speculate that BnaC3.CCD4 may use δ- and/or α-carotene as substrates. Four variations, including two CACTA-like TEs (alleles M1 and M4) and two insertion/deletions (INDELs, alleles M2 and M3), were identified in yellow-flowered Brassica oleracea lines. The two CACTA-like TEs were also identified in the coding region of BcaC3.CCD4 in Brassica carinata. However, the two INDELs were not detected in B. napus and B. carinata. We demonstrate that the insertions of TEs in BolC3.CCD4 predated the formation of the two allotetraploids. © 2015 The Authors New Phytologist © 2015 New Phytologist Trust.
Carrier, Gregory; Garnier, Matthieu; Le Cunff, Loïc; Bougaran, Gaël; Probert, Ian; De Vargas, Colomban; Corre, Erwan; Cadoret, Jean-Paul; Saint-Jean, Bruno
2014-01-01
The applied exploitation of microalgae cultures has to date almost exclusively involved the use of wild type strains, deposited over decades in dedicated culture collections. Concomitantly, the concept of improving algae with selection programs for specific purposes is slowly emerging. Having studied the economically and ecologically important haptophyte Tisochrysis lutea (Tiso) for a decade, we took advantage of the availability of wild type (Tiso-Wt) and selected (Tiso-S2M2) strains to conduct a study of molecular variation. This endeavour presented substantial challenges: the genome assembly was not yet available, the life cycle was unknown and the genetic diversity of Tiso-Wt was poorly documented. This study provides the first molecular data needed to set up a selection strategy for this microalga. Following high-throughput Illumina sequencing, transcriptomes of Tiso-Wt and Tiso-S2M2 were de novo assembled and annotated. Genetic diversity between both strains was analyzed and revealed a clear conservation, while a comparison of transcriptomes allowed identification of polymorphisms resulting from the selection program. Of 34,374 transcripts, 291 were differentially expressed and 165 contained positional polymorphisms (SNP, Indel). We focused on lipid over-accumulation of the Tiso-S2M2 strain, and 8 candidate genes were identified by combining analysis of positional polymorphism, differential expression levels, selection signature and putative gene function. Moreover, genetic analysis also suggests the existence of a sexual cycle and genetic recombination in Tisochrysis lutea. PMID:24489800
Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder
Yuen, Ryan KC; Merico, Daniele; Bookman, Matt; Howe, Jennifer L; Thiruvahindrapuram, Bhooma; Patel, Rohan V; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A; Walker, Susan; Marshall, Christian R; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D’Abate, Lia; Chan, Ada JS; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R; Nalpathamkalam, Thomas; Sung, Wilson WL; Tsoi, Fiona J; Wei, John; Xu, Lizhen; Tasse, Anne-Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie MacKinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A; Parr, Jeremy R; Spence, Sarah J; Vorstman, Jacob; Frey, Brendan J; Robinson, James T; Strug, Lisa J; Fernandez, Bridget A; Elsabbagh, Mayada; Carter, Melissa T; Hallmayer, Joachim; Knoppers, Bartha M; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H; Glazer, David; Pletcher, Mathew T; Scherer, Stephen W
2017-01-01
We are performing whole genome sequencing (WGS) of families with Autism Spectrum Disorder (ASD) to build a resource, named MSSNG, to enable the sub-categorization of phenotypes and underlying genetic factors involved. Here, we report WGS of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible in a cloud platform, and through an internet portal with controlled access. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertion/deletions (indels) or copy number variations (CNVs) per ASD subject. We identified 18 new candidate ASD-risk genes such as MED13 and PHF3, and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (p = 6×10⁻⁴). In 294/2,620 (11.2%) of ASD cases, a molecular basis could be determined and 7.2% of these carried CNV/chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD. PMID:28263302
Wang, Chuan; Kokkonen, Heidi; Sandling, Johanna K; Johansson, Martin; Seddighzadeh, Maria; Padyukov, Leonid; Rantapää-Dahlqvist, Solbritt; Syvänen, Ann-Christine
2011-10-01
Two interferon regulatory factor 5 (IRF5) gene variants were examined for association with rheumatoid arthritis (RA). A total of 2300 patients with RA and 1836 controls were recruited from 2 independent RA studies in Sweden. One insertion-deletion polymorphism (CGGGG indel) and one single-nucleotide polymorphism (rs10488631) in the IRF5 gene were genotyped and analyzed within RA subgroups stratified by rheumatoid factor (RF) and anticitrullinated peptide antibodies (ACPA). The CGGGG indel was preferentially associated with the RF-negative (OR 1.29, p = 7.9 × 10⁻⁵) and ACPA-negative (OR 1.27, p = 7.3 × 10⁻⁵) RA subgroups compared to the seropositive counterparts. rs10488631 was exclusively associated within the seronegative RA subgroups (RF-negative: OR 1.24, p = 0.016; ACPA-negative: OR 1.27, p = 4.1 × 10⁻³). Both the CGGGG indel and rs10488631 are relevant for RA susceptibility, especially for seronegative RA.
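The odds ratios (OR) quoted in this abstract summarize how much more common an allele is among cases than among controls. As a minimal sketch, an allelic OR with an approximate 95% confidence interval (Woolf's log method) can be computed from a 2×2 table of allele counts. The counts below are hypothetical, chosen only to make the arithmetic easy to follow; the abstract does not report the underlying counts, and the study's actual test may have differed.

```python
import math


def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Allelic odds ratio with a Woolf (log-based) confidence interval.

    Arguments are allele counts: risk (alt) and other (ref) alleles
    in cases and in controls. z=1.96 gives an approximate 95% interval.
    """
    or_value = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(or_value) - z * se)
    upper = math.exp(math.log(or_value) + z * se)
    return or_value, lower, upper


# Hypothetical allele counts, not taken from the study.
or_value, lower, upper = odds_ratio(case_alt=60, case_ref=40, ctrl_alt=50, ctrl_ref=50)
```

An OR above 1 whose interval excludes 1 would point in the same direction as the associations reported for the seronegative subgroups.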
Luo, Zhijing; Chen, Mingjiao; Zhao, Xiangxiang; Zhang, Dabing; Qi, Yiping; Yuan, Zheng
2016-01-01
Rapid and accurate genome-wide marker detection is essential to the marker-assisted breeding and functional genomics studies. In this work, we developed an integrated software, AgroMarker Finder (AMF: http://erp.novelbio.com/AMF), for providing graphical user interface (GUI) to facilitate the recently developed restriction-site associated DNA (RAD) sequencing data analysis in rice. By application of AMF, a total of 90,743 high-quality markers (82,878 SNPs and 7,865 InDels) were detected between rice varieties JP69 and Jiaoyuan5A. The density of the identified markers is 0.2 per Kb for SNP markers, and 0.02 per Kb for InDel markers. Sequencing validation revealed that the accuracy of genome-wide marker detection by AMF is 93%. In addition, a validated subset of 82 SNPs and 31 InDels were found to be closely linked to 117 important agronomic trait genes, providing a basis for subsequent marker-assisted selection (MAS) and variety identification. Furthermore, we selected 12 markers from 31 validated InDel markers to identify seed authenticity of variety Jiaoyuanyou69, and we also identified 10 markers closely linked to the fragrant gene BADH2 to minimize linkage drag for Wuxiang075 (BADH2 donor)/Jiachang1 recombinants selection. Therefore, this software provides an efficient approach for marker identification from RAD-seq data, and it would be a valuable tool for plant MAS and variety protection. PMID:26799713
Fan, Wei; Zong, Jie; Luo, Zhijing; Chen, Mingjiao; Zhao, Xiangxiang; Zhang, Dabing; Qi, Yiping; Yuan, Zheng
2016-01-01
Rapid and accurate genome-wide marker detection is essential to the marker-assisted breeding and functional genomics studies. In this work, we developed an integrated software, AgroMarker Finder (AMF: http://erp.novelbio.com/AMF), for providing graphical user interface (GUI) to facilitate the recently developed restriction-site associated DNA (RAD) sequencing data analysis in rice. By application of AMF, a total of 90,743 high-quality markers (82,878 SNPs and 7,865 InDels) were detected between rice varieties JP69 and Jiaoyuan5A. The density of the identified markers is 0.2 per Kb for SNP markers, and 0.02 per Kb for InDel markers. Sequencing validation revealed that the accuracy of genome-wide marker detection by AMF is 93%. In addition, a validated subset of 82 SNPs and 31 InDels were found to be closely linked to 117 important agronomic trait genes, providing a basis for subsequent marker-assisted selection (MAS) and variety identification. Furthermore, we selected 12 markers from 31 validated InDel markers to identify seed authenticity of variety Jiaoyuanyou69, and we also identified 10 markers closely linked to the fragrant gene BADH2 to minimize linkage drag for Wuxiang075 (BADH2 donor)/Jiachang1 recombinants selection. Therefore, this software provides an efficient approach for marker identification from RAD-seq data, and it would be a valuable tool for plant MAS and variety protection.
Mutt, Eshita; Sowdhamini, Ramanathan
2016-01-01
Insertions/deletions are common evolutionary tools employed to alter the structural and functional repertoire of protein domains. An insert situated proximal to the active site or ligand binding site frequently impacts protein function; however, the effect of distal indels on protein activity and/or stability are often not studied. In this paper, we have investigated a distal insert, which influences the function and stability of a unique DNA polymerase, called terminal deoxynucleotidyl transferase (TdT). TdT (EC:2.7.7.31) is a monomeric 58 kDa protein belonging to family X of eukaryotic DNA polymerases and known for its role in V(D)J recombination as well as in non-homologous end-joining (NHEJ) pathways. Two murine isoforms of TdT, with a length difference of twenty residues and having different biochemical properties, have been studied. All-atom molecular dynamics simulations at different temperatures and interaction network analyses were performed on the short and long-length isoforms. We observed conformational changes in the regions distal to the insert position (thumb subdomain) in the longer isoform, which indirectly affects the activity and stability of the enzyme through a mediating loop (Loop1). A structural rationale could be provided to explain the reduced polymerization rate as well as increased thermosensitivity of the longer isoform caused by peripherally located length variations within a DNA polymerase. These observations increase our understanding of the roles of length variants in introducing functional diversity in protein families in general. PMID:27311013
2011-01-01
Background: Stenospermocarpy is a mechanism through which certain genotypes of Vitis vinifera L. such as Sultanina produce berries with seeds reduced in size. Stenospermocarpy has not yet been characterized at the molecular level. Results: Genetic and physical maps were integrated with the public genomic sequence of Vitis vinifera L. to improve QTL analysis for seedlessness and berry size in experimental progeny derived from a cross of two seedless genotypes. Major QTLs co-positioning for both traits on chromosome 18 defined a 92-kb confidence interval. Functional information from model species including Vitis suggested that VvAGL11, included in this confidence interval, might be the main positional candidate gene responsible for seed and berry development. Characterization of VvAGL11 at the sequence level in the experimental progeny identified several SNPs and INDELs in both regulatory and coding regions. In association analyses performed over three seasons, these SNPs and INDELs explained up to 78% and 44% of the phenotypic variation in seed and berry weight, respectively. Moreover, genetic experiments indicated that the regulatory region has a larger effect on the phenotype than the coding region. Transcriptional analysis lent additional support to the putative role of VvAGL11's regulatory region, as its expression is abolished in seedless genotypes at key stages of seed development. These results transform VvAGL11 into a functional candidate gene for further analyses based on genetic transformation. For breeding purposes, intragenic markers were tested individually for marker assisted selection, and the best markers were those closest to the transcription start site. Conclusion: We propose that VvAGL11 is the major functional candidate gene for seedlessness, and we provide experimental evidence suggesting that the seedless phenotype might be caused by variations in its promoter region.
Current knowledge of the function of its orthologous genes, its expression profile in Vitis varieties and the strong association between its sequence variation and the degree of seedlessness together indicate that the D-lineage MADS-box gene VvAGL11 corresponds to the Seed Development Inhibitor locus described earlier as a major locus for seedlessness. These results provide new hypotheses for further investigations of the molecular mechanisms involved in seed and berry development. PMID:21447172
Ji, Jianling; Lee, Hane; Argiropoulos, Bob; Dorrani, Naghmeh; Mann, John; Martinez-Agosto, Julian A; Gomez-Ospina, Natalia; Gallant, Natalie; Bernstein, Jonathan A; Hudgins, Louanne; Slattery, Leah; Isidor, Bertrand; Le Caignec, Cédric; David, Albert; Obersztyn, Ewa; Wiśniowiecka-Kowalnik, Barbara; Fox, Michelle; Deignan, Joshua L; Vilain, Eric; Hendricks, Emily; Horton Harr, Margaret; Noon, Sarah E; Jackson, Jessi R; Wilkens, Alisha; Mirzaa, Ghayda; Salamon, Noriko; Abramson, Jeff; Zackai, Elaine H; Krantz, Ian; Innes, A Micheil; Nelson, Stanley F; Grody, Wayne W; Quintero-Rivera, Fabiola
2015-01-01
Dual-specificity tyrosine-(Y)-phosphorylation-regulated kinase 1 A (DYRK1A) is a highly conserved gene located in the Down syndrome critical region. It has an important role in early development and regulation of neuronal proliferation. Microdeletions of chromosome 21q22.12q22.3 that include DYRK1A (21q22.13) are rare and only a few pathogenic single-nucleotide variants (SNVs) in the DYRK1A gene have been described, so the landscape of DYRK1A disruptions and their associated phenotype has not yet been fully explored. We have identified 14 individuals with de novo heterozygous variants of DYRK1A; five with microdeletions, three with small insertions or deletions (INDELs) and six with deleterious SNVs. The analysis of our cohort and comparison with published cases reveals that phenotypes are consistent among individuals with the 21q22.12q22.3 microdeletion and those with translocation, SNVs, or INDELs within DYRK1A. All individuals shared congenital microcephaly at birth, intellectual disability, developmental delay, severe speech impairment, short stature, and distinct facial features. The severity of the microcephaly varied from −2 SD to −5 SD. Seizures, structural brain abnormalities, eye defects, ataxia/broad-based gait, intrauterine growth restriction, minor skeletal abnormalities, and feeding difficulties were present in two-thirds of all affected individuals. Our study demonstrates that haploinsufficiency of DYRK1A results in a new recognizable syndrome, which should be considered in individuals with Angelman syndrome-like features and distinct facial features. Our report represents the largest cohort of individuals with DYRK1A disruptions to date, and is the first attempt to define consistent genotype–phenotype correlations among subjects with 21q22.13 microdeletions and DYRK1A SNVs or small INDELs. PMID:25944381
Using Next Generation Sequencing for Multiplexed Trait-Linked Markers in Wheat
Bernardo, Amy; Wang, Shan; St. Amand, Paul; Bai, Guihua
2015-01-01
With the advent of next generation sequencing (NGS) technologies, single nucleotide polymorphisms (SNPs) have become the major type of marker for genotyping in many crops. However, the availability of SNP markers for important traits of bread wheat (Triticum aestivum L.) that can be effectively used in marker-assisted selection (MAS) is still limited, and SNP assays for MAS are usually uniplex. A shift from uniplex to multiplex assays will allow the simultaneous analysis of multiple markers and increase MAS efficiency. We designed 33 locus-specific markers from SNP or indel-based marker sequences linked to 20 different quantitative trait loci (QTL) or genes of agronomic importance in wheat, and analyzed the amplicon sequences using an Ion Torrent Proton Sequencer and a custom allele detection pipeline to determine the genotypes of 24 selected germplasm accessions. Among the 33 markers, 27 were successfully multiplexed and 23 had 100% SNP call rates. Results from analysis of "kompetitive allele-specific PCR" (KASP) and sequence tagged site (STS) markers developed from the same loci fully verified the genotype calls of 23 markers. The NGS-based multiplexed assay developed in this study is suitable for rapid and high-throughput screening of SNPs and some indel-based markers in wheat. PMID:26625271
NASA Astrophysics Data System (ADS)
Dumke, Ines; Klaucke, Ingo; Berndt, Christian; Bialas, Jörg
2014-06-01
Cold seeps on the Hikurangi Margin off New Zealand exhibit various seabed morphologies producing different intensity patterns in sidescan backscatter images. Acoustic backscatter characteristics of 25 investigated seep sites fall into four distinct types characterised by variations in backscatter intensity, distribution and inferred structural heights. The types reflect different carbonate morphologies including up to 20-m-high structures (type 1), low-relief crusts (type 2), scattered blocks (type 3) and carbonate-free sites (type 4). Each seep corresponds to a single type; intermediates were not observed. This correlates well with published data on seep fauna at each site, with the four types representing three different faunal habitats of successive stages of seep development. Backscatter signatures in sidescan sonar images of cold seeps may therefore serve as a convenient proxy for variations in faunal habitats.
Xie, Jing; Lu, Xiongxiong; Wu, Xue; Lin, Xiaoyi; Zhang, Chao; Huang, Xiaofang; Chang, Zhili; Wang, Xinjing; Wen, Chenlei; Tang, Xiaomei; Shi, Minmin; Zhan, Qian; Chen, Hao; Deng, Xiaxing; Peng, Chenghong; Li, Hongwei; Fang, Yuan; Shao, Yang; Shen, Baiyong
2016-05-01
Targeted therapies, including monoclonal antibodies and small molecule inhibitors, have dramatically changed the treatment of cancer over the past 10 years. Their therapeutic advantages are greater tumor specificity and fewer side effects. To precisely tailor available targeted therapies to each individual or subset of cancer patients, next-generation sequencing (NGS) has been utilized as a promising diagnostic tool owing to its accuracy, sensitivity, and high throughput. We developed and validated an NGS-based cancer genomic diagnostic targeting 115 prognosis- and therapy-relevant genes on multiple specimen types, including blood, tumor tissue, and body fluid, from 10 patients with different cancer types. The sequencing data were then analyzed by clinically applicable analytical pipelines developed in house. We assessed the analytical sensitivity, specificity, and accuracy of the NGS-based molecular diagnosis. Our analytical pipelines were also capable of detecting base substitutions, indels, and gene copy number variations (CNVs). For instance, several actionable mutations in EGFR, PIK3CA, TP53, and KRAS were detected, indicating drug susceptibility and resistance in cases of lung cancer. Our study shows that NGS-based molecular diagnosis detects genomic alterations in cancer more sensitively and comprehensively, and supports its direct clinical use for guiding targeted therapy.
Kazama, Yusuke; Ishii, Kotaro; Hirano, Tomonari; Wakana, Taeko; Yamada, Mieko; Ohbu, Sumie; Abe, Tomoko
2017-12-01
Heavy-ion irradiation is a powerful mutagen that possesses high linear energy transfer (LET). Several studies have indicated that the value of LET affects DNA lesion formation in several ways, including the efficiency and the density of double-stranded break induction along the particle path. We assumed that the mutation type can be altered by selecting an appropriate LET value. Here, we quantitatively demonstrate differences in the mutation type induced by irradiation with two representative ions, Ar ions (LET: 290 keV μm⁻¹) and C ions (LET: 30.0 keV μm⁻¹), by whole-genome resequencing of the Arabidopsis mutants produced by these irradiations. Ar ions caused chromosomal rearrangements or large deletions (≥100 bp) more frequently than C ions, with 10.2 and 2.3 per mutant genome under Ar- and C-ion irradiation, respectively. Conversely, C ions induced more single-base substitutions and small indels (<100 bp) than Ar ions, with 28.1 and 56.9 per mutant genome under Ar- and C-ion irradiation, respectively. Moreover, the rearrangements induced by Ar-ion irradiation were more complex than those induced by C-ion irradiation, and tended to accompany single base substitutions or small indels located close by. In conjunction with the detection of causative genes through high-throughput sequencing, selective irradiation by beams with different effects will be a powerful tool for forward genetics as well as studies on chromosomal rearrangements. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
Yellapu, Nandakumar; Mahto, Manoj Kumar; Valasani, Koteswara Rao; Sarma, P V G K; Matcha, Bhaskar
2015-01-01
Mutations in the glucokinase (GK) gene play a critical role in the establishment of type 2 diabetes. In our earlier study, the R308K mutation in GK in a clinically proven type 2 diabetic patient showed structural and functional variations that contributed immensely to the hyperglycemic condition. In an extension of this work, a cohort of 30 patients with established type 2 diabetes was chosen, and exons 10 and 11 of GK were PCR-amplified and sequenced. The sequence alignment showed A379S, D400Y, E300A, E395A, E395G, H380N, I348N, L301M, M298I, M381G, M402R, R308K, R394P, R397S, and S398R mutations in 12 different patients. The structural analysis of these mutated GKs showed a variable number of β-α-β units, hairpins, β-bulges, strands, helices, helix-helix interactions, β-turns, and γ-turns along with RMSD variations when compared to wild-type GK. Molecular modeling studies revealed that the substrate showed variable binding orientations and could not fit into the active site of these mutated structures; moreover, it was expelled from the conformations. Therefore, these structural variations in GK due to mutations could be one of the strongest reasons for the hyperglycemic levels in these type 2 diabetic patients.
Phylogenetic inference under varying proportions of indel-induced alignment gaps
Dwivedi, Bhakti; Gadagkar, Sudhindra R
2009-01-01
Background The effect of alignment gaps on phylogenetic accuracy has been the subject of numerous studies. In this study, we investigated the relationship between the total number of gapped sites and phylogenetic accuracy, when the gaps were introduced (by means of computer simulation) to reflect indel (insertion/deletion) events during the evolution of DNA sequences. The resulting (true) alignments were subjected to commonly used gap treatment and phylogenetic inference methods. Results (1) In general, there was a strong – almost deterministic – relationship between the amount of gap in the data and the level of phylogenetic accuracy when the alignments were very "gappy", (2) gaps resulting from deletions (as opposed to insertions) contributed more to the inaccuracy of phylogenetic inference, (3) the probabilistic methods (Bayesian, PhyML & "MLε," a method implemented in DNAML in PHYLIP) performed better at most levels of gap percentage when compared to parsimony (MP) and distance (NJ) methods, with Bayesian analysis being clearly the best, (4) methods that treat gapped sites as missing data yielded less accurate trees when compared to those that attribute phylogenetic signal to the gapped sites (by coding them as binary character data – presence/absence, or as in the MLε method), and (5) in general, the accuracy of phylogenetic inference depended upon the amount of available data when the gaps resulted from mainly deletion events, and the amount of missing data when insertion events were equally likely to have caused the alignment gaps.
Conclusion When gaps in an alignment are a consequence of indel events in the evolution of the sequences, the accuracy of phylogenetic analysis is likely to improve if: (1) alignment gaps are categorized as arising from insertion events or deletion events and then treated separately in the analysis, (2) the evolutionary signal provided by indels is harnessed in the phylogenetic analysis, and (3) methods that utilize the phylogenetic signal in indels are developed for distance methods too. When the true homology is known and the amount of gaps is 20 percent of the alignment length or less, the methods used in this study are likely to yield trees with 90–100 percent accuracy. PMID:19698168
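The binary presence/absence coding of gapped sites mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `code_gaps_as_binary`, the toy alignment, and the choice to code each gapped column independently (rather than grouping adjacent gap columns into single indel events) are assumptions made purely for illustration.

```python
def code_gaps_as_binary(alignment):
    """Return one presence/absence string per sequence.

    Each output character corresponds to one alignment column that
    contains a gap in at least one sequence: '1' = residue present,
    '0' = gap. (Hypothetical helper; simple per-column coding.)
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    names = list(alignment)

    # Keep only the columns in which at least one sequence has a gap.
    gapped_cols = [
        i for i in range(n_cols)
        if any(alignment[name][i] == "-" for name in names)
    ]

    return {
        name: "".join("0" if alignment[name][i] == "-" else "1"
                      for i in gapped_cols)
        for name in names
    }


# Toy alignment (invented for illustration only).
toy_alignment = {
    "taxonA": "ACGT-ACG",
    "taxonB": "ACGTTACG",
    "taxonC": "AC--TACG",
}
binary_matrix = code_gaps_as_binary(toy_alignment)
print(binary_matrix)
```

A matrix of this kind could be appended to the nucleotide data as an extra character partition, so that the gaps contribute phylogenetic signal instead of being treated as missing data.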
Whole-genome sequencing of Atacama skeleton shows novel mutations linked with dysplasia.
Bhattacharya, Sanchita; Li, Jian; Sockell, Alexandra; Kan, Matthew J; Bava, Felice A; Chen, Shann-Ching; Ávila-Arcos, María C; Ji, Xuhuai; Smith, Emery; Asadi, Narges B; Lachman, Ralph S; Lam, Hugo Y K; Bustamante, Carlos D; Butte, Atul J; Nolan, Garry P
2018-04-01
Over a decade ago, the Atacama humanoid skeleton (Ata) was discovered in the Atacama region of Chile. The Ata specimen carried a strange phenotype (6-in stature, fewer than expected ribs, elongated cranium, and accelerated bone age), leading to speculation that this was a preserved nonhuman primate, a human fetus harboring genetic mutations, or even an extraterrestrial. We previously reported that it was human by DNA analysis with an estimated bone age of about 6-8 yr at the time of demise. To determine the possible genetic drivers of the observed morphology, DNA from the specimen was subjected to whole-genome sequencing using the Illumina HiSeq platform with an average 11.5× coverage of 101-bp, paired-end reads. In total, 3,356,569 single nucleotide variations (SNVs), 518,365 insertions and deletions (indels), and 1047 structural variations (SVs) were detected as compared to the human reference genome. Here, we present the detailed whole-genome analysis showing that Ata is a female of human origin, likely of Chilean descent, and its genome harbors mutations in genes (COL1A1, COL2A1, KMT2D, FLNB, ATR, TRIP11, PCNT) previously linked with diseases of small stature, rib anomalies, cranial malformations, premature joint fusion, and osteochondrodysplasia (also known as skeletal dysplasia). Together, these findings provide a molecular characterization of Ata's peculiar phenotype, which likely results from multiple known and novel putative gene mutations affecting bone development and ossification. © 2018 Bhattacharya et al.; Published by Cold Spring Harbor Laboratory Press.
Kobayashi, Masaaki; Nagasaki, Hideki; Garcia, Virginie; Just, Daniel; Bres, Cécile; Mauxion, Jean-Philippe; Le Paslier, Marie-Christine; Brunel, Dominique; Suda, Kunihiro; Minakuchi, Yohei; Toyoda, Atsushi; Fujiyama, Asao; Toyoshima, Hiromi; Suzuki, Takayuki; Igarashi, Kaori; Rothan, Christophe; Kaminuma, Eli; Nakamura, Yasukazu; Yano, Kentaro; Aoki, Koh
2014-02-01
Tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. The genome sequencing of the tomato cultivar 'Heinz 1706' was recently completed. To accelerate the progress of tomato genomics studies, systematic bioresources, such as mutagenized lines and full-length cDNA libraries, have been established for the cultivar 'Micro-Tom'. However, these resources cannot be utilized to their full potential without the completion of the genome sequencing of 'Micro-Tom'. We undertook the genome sequencing of 'Micro-Tom' and here report the identification of single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) between 'Micro-Tom' and 'Heinz 1706'. The analysis demonstrated the presence of 1.23 million SNPs and 0.19 million indels between the two cultivars. The density of SNPs and indels was high in chromosomes 2, 5 and 11, but was low in chromosomes 6, 8 and 10. Three known mutations of 'Micro-Tom' were localized on chromosomal regions where the density of SNPs and indels was low, which was consistent with the fact that these mutations were relatively new and introgressed into 'Micro-Tom' during the breeding of this cultivar. We also report SNP analysis for two 'Micro-Tom' varieties that have been maintained independently in Japan and France, both of which have served as standard lines for 'Micro-Tom' mutant collections. Approximately 28,000 SNPs were identified between these two 'Micro-Tom' lines. These results provide high-resolution DNA polymorphic information on 'Micro-Tom' and represent a valuable contribution to the 'Micro-Tom'-based genomics resources.
An 8bp indel in exon 1 of Ghrelin gene associated with chicken growth.
Fang, Meixia; Nie, Qinghua; Luo, Chenglong; Zhang, Dexiang; Zhang, Xiquan
2007-04-01
Ghrelin, which acts as the endogenous ligand for the growth hormone secretagogue receptor (GHS-R), is a novel growth hormone (GH) releasing peptide with reported effects on food intake in chickens. In this study, an 8 bp indel polymorphism in exon 1 of the chicken Ghrelin (cGHRL) gene was genotyped in an F2 designed full-sib population to analyze its associations with chicken growth and carcass traits. Later, the mRNA level in the proventriculus was determined by real-time PCR to reveal the expression pattern of the cGHRL gene. Results showed that this 8 bp indel was significantly associated with body weight at the age of 28 days (BW28) and 56 days (BW56), eviscerated weight (EW) and leg muscle weight (LMW) (P<0.05), and highly significantly associated with hatch weight (HW), BW14, 21, 35, 42, 49, 90 and body length (BL), dressed weight (DW), eviscerated weight with giblet (EWG), wing weight (WW), breast muscle weight (BMW) and head and neck weight (HNW) (P<0.01). Meanwhile, the A allele (with 'CTAACCTG') was positive for chicken growth, as individuals with the AA genotype had the highest value of all traits. Analysis of the cGhrelin mRNA level revealed that it differed significantly among individuals with the three genotypes (P<0.05). Individuals with the AB genotype had the highest mRNA level, whereas those with AA had the lowest. It was concluded that this 8 bp indel of the cGHRL gene was significantly associated with most body weight and body composition traits, and a negative effect of endogenous Ghrelin on chicken growth was indicated by this study.
Chiou, H-Y; Huang, Y-L; Deng, M-C; Chang, C-Y; Jeng, C-R; Tsai, P-S; Yang, C; Pang, V F; Chang, H-W
2017-02-01
New variants of porcine epidemic diarrhoea virus (PEDV), which emerged in Taiwan in late 2013, have caused a high morbidity and mortality in neonatal piglets. To investigate the molecular characteristics of the spike (S) gene of the emerging Taiwan PEDV strains for a better understanding of the genetic diversity and relationship among the Taiwan new variants and the global PEDVs, full-length S genes of PEDVs from nine 1-7 day-old piglets from three pig farms in the central and southern Taiwan were sequenced and analysed. The result of phylogenetic analysis of the S gene showed that all the Taiwan PEDV strains were closely related to the non-S INDEL strains from US, Canada and China, suggesting a common ancestor for these strains. As compared with the historic PEDVs and CV777-based vaccine strains, the nine Taiwan PEDV variants shared almost the same genetic signatures as the global non-S INDEL strains, including a series of insertions, deletions and mutations in the amino terminal as well as identical mutations in the neutralizing epitopes of the S gene. The high similarity of the S protein among the Taiwan and the globally emerged non-S INDEL PEDV strains suggests that the Taiwan new variants may share similar pathogenesis and immunogenicity as the global outbreak variants. The development of a novel vaccine based on the Taiwan or the global non-S INDEL strains may be contributive to the control of the current global porcine epidemic diarrhoea outbreaks. © 2015 Blackwell Verlag GmbH.
Sajjad, Muhammad; Ma, Xiaoling; Habibullah Khan, Sultan; Shoaib, Muhammad; Song, Yanhong; Yang, Wenlong; Zhang, Aimin; Liu, Dongcheng
2017-10-16
The Flo2 gene is a member of a conserved gene family in plants. This gene has been found to be related to thousand grain weight (TGW) in rice. Its orthologs in hexaploid wheat were cloned, and the haplotype variation in TaFlo2-A1 was tested for association with TGW. The cloned sequences of TaFlo2-A1, TaFlo2-B1 and TaFlo2-D1 contained 23, 23 and 24 exons, respectively. The deduced proteins of TaFlo2-A1 (1734 aa), TaFlo2-B1 (1698 aa) and TaFlo2-D1 (1682 aa) were highly similar (>94%) and exhibited >77% similarity with the rice FLO2 protein. Like the rice FLO2 protein, four tetratricopeptide repeat (TPR) motifs were observed in the deduced TaFLO2 protein. An 8-bp InDel (-10 to -17 bp) in the promoter region and five SNPs in first intron of TaFlo2-A1 together formed two haplotypes, TaFlo2-A1a and TaFlo2-A1b, in bread wheat. TaFlo2 was located on homeologous group 2 chromosomes. TaFlo2-A1 was inferred to be located on deletion bin '2AL1-0.85-1.00'. The TaFlo2-A1 haplotypes were characterized in the Chinese Micro Core Collection (MCC) and Pakistani wheat collection using the molecular marker TaFlo2-Indel8. TaFlo2-A1 was found to be associated with TGW but not with grain number per spike (GpS) in both the MCC and Pakistani wheat collections. The frequency of TaFlo2-A1b (positive haplotype) was low in commercial wheat cultivars; thus this haplotype can be selected to improve grain weight without negatively affecting GpS. The expression level of TaFlo2-A1 in developing grains at 5 DAF (days after flowering) was positively correlated with TGW in cultivars carrying the positive haplotype. This study will likely lead to additional investigations to understand the regulatory mechanism of the Flo2 gene in hexaploid wheat. Furthermore, the newly developed molecular marker 'TaFlo2-InDel8' could be incorporated into the kit of wheat breeders for use in marker-assisted selection.
Genome-wide characterization of vibrio phage ϕpp2 with unique arrangements of the mob-like genes
2012-01-01
Background Vibrio parahaemolyticus is associated with gastroenteritis, wound infections, and septicemia in humans and animals. Phages can control the population of the pathogen. So far, the only reported genome among giant vibriophages is KVP40: 244,835 bp with 26% coding regions that have T4 homologs. Putative homing endonucleases (HE) were found in Vibrio phage KVP40 bearing one segD and Vibrio cholerae phage ICP1 carrying one mobC/E and one segG. Results A newly isolated Vibrio phage ϕpp2, which was specific to the hosts of V. parahaemolyticus and V. alginolyticus, featured a long nonenveloped head of ~90 × 150 nm and tail of ~110 nm. The phage can survive at 50°C for more than one hour. The genome of the phage ϕpp2 was sequenced to be 246,421 bp, which is 1587 bp larger than KVP40. 383 protein-encoding genes (PEGs) and 30 tRNAs were found in the phage ϕpp2. Between the genomes of ϕpp2 and KVP40, 254 genes including 29 PEGs for viral structure were of high similarity, whereas 17 PEGs of KVP40 and 21 PEGs of ϕpp2 were unmatched. In both genomes, the capsid and tail genes have been identified, as well as the extensive representation of the DNA replication, recombination, and repair enzymes. In addition to the three giant indels of 1098, 1143 and 3330 nt, ϕpp2 possessed unique proteins involved in potassium channel, gp2 (DNA end protector), tRNA nucleotidyltransferase, and mob-type HEs, which were not reported in KVP40. The ϕpp2 PEG274, with strong promoters and translational initiation, was identified to be a mobE type, flanked by NrdA and NrdB/C homologs. Coincidentally, several pairs of HE-flanking homologs with empty center were found in the Vibrio phages ϕpp2 and KVP40, as well as in Aeromonas phages (Aeh1 and Ae65), and cyanophage P-SSM2. Conclusions Vibrio phage ϕpp2 was characterized by morphology, growth, and genomics with three giant indels and different types of HEs. The gene analysis on the required elements for transcription and translation suggested that the ϕpp2 PEG274 was an active mobE gene. The phage was indicated to be a new T4-related species, differing from KVP40. PMID:22676552
Mariman, Edwin C M; Bouwman, Freek G; Aller, Erik E J G; van Baak, Marleen A; Wang, Ping
2015-06-01
The hypothalamus is important for regulation of energy intake. Mutations in genes involved in the function of the hypothalamus can lead to early-onset severe obesity. To look further into this, we have followed a strategy that allowed us to identify rare and common gene variants as candidates for the background of extreme obesity from a relatively small cohort. For that we focused on subjects with a well-selected phenotype and on a defined gene set and used a rich source of genetic data with stringent cut-off values. A list of 166 genes functionally related to the hypothalamus was generated. In those genes complete exome sequence data from 30 extreme obese subjects (60 genomes) were screened for novel rare indel, nonsense, and missense variants with a predicted negative impact on protein function. In addition, (moderately) common variants in those genes were analyzed for allelic association using the general population as reference (false discovery rate<0.05). Six novel rare deleterious missense variants were found in the genes for BAIAP3, NBEA, PRRC2A, RYR1, SIM1, and TRH, and a novel indel variant in LEPR. Common variants in the six genes for MBOAT4, NPC1, NPW, NUCB2, PER1, and PRRC2A showed significant allelic association with extreme obesity. Our findings underscore the complexity of the genetic background of extreme obesity involving rare and common variants of genes from defined metabolic and physiologic processes, in particular regulation of the circadian rhythm of food intake and hypothalamic signaling. Copyright © 2015 the American Physiological Society.
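The abstract above reports allelic associations at a false discovery rate below 0.05 but does not say which FDR procedure was applied. As a sketch only, the widely used Benjamini-Hochberg step-up procedure is assumed here; the function name and the p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    under the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= alpha * k / m.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Invented p-values for four hypothetical variants (illustration only).
toy_p_values = [0.001, 0.04, 0.03, 0.20]
significant = benjamini_hochberg(toy_p_values, alpha=0.05)
print(significant)
```

With these toy values only the first variant passes, even though three of the four raw p-values are below 0.05, which shows why a multiple-testing correction matters when many variants across many genes are screened.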
Landscape of Insertion Polymorphisms in the Human Genome
Onozawa, Masahiro; Goldberg, Liat; Aplan, Peter D.
2015-01-01
Nucleotide substitutions, small (<50 bp) insertions or deletions (indels), and large (>50 bp) deletions are well-known causes of genetic variation within the human genome. We recently reported a previously unrecognized form of polymorphic insertions, termed templated sequence insertion polymorphism (TSIP), in which the inserted sequence was templated from a distant genomic region, and was inserted in the genome through reverse transcription of an RNA intermediate. TSIPs can be grouped into two classes based on nucleotide sequence features at the insertion junctions; class 1 TSIPs show target site duplication, polyadenylation, and preference for insertion at a 5′-TTTT/A-3′ sequence, suggesting a LINE-1 based insertion mechanism, whereas class 2 TSIPs show features consistent with repair of a DNA double strand break by nonhomologous end joining. To gain a more complete picture of TSIPs throughout the human population, we evaluated whole-genome sequence from 52 individuals, and identified 171 TSIPs. Most individuals had 25–30 TSIPs, and common (present in >20% of individuals) TSIPs were found in individuals throughout the world, whereas rare TSIPs tended to cluster in specific geographic regions. The number of rare TSIPs was greater than the number of common TSIPs, suggesting that TSIP generation is an ongoing process. Intriguingly, mitochondrial sequences were a frequent template for class 2 insertions, used more commonly than any nuclear chromosome. Similar to single nucleotide polymorphisms and indels, we suspect that these TSIPs may be important for the generation of human diversity and genetic diseases, and can be useful in tracking historical migration of populations. PMID:25745018
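The size categories named at the start of the abstract above (substitutions, small indels under 50 bp, larger events) can be expressed as a small Python helper. This is a sketch under stated assumptions: the function name, the allele-string interface, and the treatment of an event of exactly 50 bp as "large" are illustrative choices, since the abstract leaves that boundary case unspecified.

```python
def classify_variant(ref, alt, small_limit=50):
    """Classify a variant from its reference and alternate allele strings.

    Length changes below small_limit are called 'small'; everything else
    is 'large'. (Hypothetical helper; boundary handling is an assumption.)
    """
    if len(ref) == len(alt):
        return "substitution"
    size = abs(len(ref) - len(alt))
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    scale = "small" if size < small_limit else "large"
    return f"{scale} {kind}"


print(classify_variant("A", "G"))
print(classify_variant("A", "ATTT"))
print(classify_variant("A" * 61, "A"))
```

A templated sequence insertion as described in the abstract would additionally require tracing the inserted sequence back to a distant genomic template, which a length-based rule like this cannot do.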
Merkle, Florian T; Neuhausser, Werner M; Santos, David; Valen, Eivind; Gagnon, James A; Maas, Kristi; Sandoe, Jackson; Schier, Alexander F; Eggan, Kevin
2015-05-12
The CRISPR-Cas9 system has the potential to revolutionize genome editing in human pluripotent stem cells (hPSCs), but its advantages and pitfalls are still poorly understood. We systematically tested the ability of CRISPR-Cas9 to mediate reporter gene knockin at 16 distinct genomic sites in hPSCs. We observed efficient gene targeting but found that targeted clones carried an unexpectedly high frequency of insertion and deletion (indel) mutations at both alleles of the targeted gene. These indels were induced by Cas9 nuclease, as well as Cas9-D10A single or dual nickases, and often disrupted gene function. To overcome this problem, we designed strategies to physically destroy or separate CRISPR target sites at the targeted allele and developed a bioinformatic pipeline to identify and eliminate clones harboring deleterious indels at the other allele. This two-pronged approach enables the reliable generation of knockin hPSC reporter cell lines free of unwanted mutations at the targeted locus. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Subtelomeric Rearrangements and Copy Number Variations in People with Intellectual Disabilities
ERIC Educational Resources Information Center
Christofolini, D. M.; De Paula Ramos, M. A.; Kulikowski, L. D.; Da Silva Bellucco, F. T.; Belangero, S. I. N.; Brunoni, D.; Melaragno, M. I.
2010-01-01
Background: The most prevalent type of structural variation in the human genome is represented by copy number variations that can affect transcription levels, sequence, structure and function of genes. Method: In the present study, we used the multiplex ligation-dependent probe amplification (MLPA) technique and quantitative PCR for the detection…
Progress in Understanding and Sequencing the Genome of Brassica rapa
Hong, Chang Pyo; Kwon, Soo-Jin; Kim, Jung Sun; Yang, Tae-Jin; Park, Beom-Seok; Lim, Yong Pyo
2008-01-01
Brassica rapa, which is closely related to Arabidopsis thaliana, is an important crop and a model plant for studying genome evolution via polyploidization. We report the current understanding of the genome structure of B. rapa and efforts for the whole-genome sequencing of the species. The tribe Brassicaceae, which comprises ca. 240 species, descended from a common hexaploid ancestor with a basic genome similar to that of Arabidopsis. Chromosome rearrangements, including fusions and/or fissions, resulted in the present-day “diploid” Brassica species with variation in chromosome number and phenotype. Triplicated genomic segments of B. rapa are collinear to those of A. thaliana with InDels. The genome triplication has led to an approximately 1.7-fold increase in the B. rapa gene number compared to that of A. thaliana. Repetitive DNA of B. rapa has also been extensively amplified and has diverged from that of A. thaliana. For its whole-genome sequencing, the Brassica rapa Genome Sequencing Project (BrGSP) consortium has developed suitable genomic resources and constructed genetic and physical maps. Ten chromosomes of B. rapa are being allocated to BrGSP consortium participants, and each chromosome will be sequenced by a BAC-by-BAC approach. Genome sequencing of B. rapa will offer a new perspective for plant biology and evolution in the context of polyploidization. PMID:18288250
Comparative analysis of Dendrobium plastomes and utility of plastomic mutational hotspots.
Zhitao, Niu; Shuying, Zhu; Jiajia, Pan; Ludan, Li; Jing, Sun; Xiaoyu, Ding
2017-05-18
Dendrobium is one of the largest genera in Orchidaceae, comprising about 800-1500 species mainly distributed in tropical Asia, Australasia, and Australia. There are 74 species and two varieties of this genus in China. Because of their ornamental and commercial value, Dendrobium orchids have been studied at low taxonomic levels. However, structural changes and effective mutational hotspots of Dendrobium plastomes have rarely been documented. Here, 30 Dendrobium plastomes were compared, comprising 25 newly sequenced in this study and five previously published. Except for their differences in NDH genes, these plastomes shared identical gene content and order. Comparative analyses revealed that the variation in size of Dendrobium plastomes was associated with dramatically changed length of InDels. Furthermore, ten loci were identified as the top-ten mutational hotspots, whose sequence variability was almost unchanged with more than 10 plastomes sampled, suggesting that they may be powerful markers for Dendrobium species. In addition, primer pairs of 47 polymorphic microsatellites were developed. After assessing the mean BS values of all combinations derived from the top-ten hotspots, we recommend that the combination of five hotspots (trnT-trnL, rpl32-trnL, clpP-psbB, trnL intron, and rps16-trnQ) should be used in the phylogenetic and identification studies of Dendrobium.
Variation and Mathematics Pedagogy
ERIC Educational Resources Information Center
Leung, Allen
2012-01-01
This discussion paper puts forward variation as a theme to structure mathematical experience and mathematics pedagogy. Patterns of variation from Marton's Theory of Variation are understood and developed as types of variation interaction that enhance mathematical understanding. An idea of a discernment unit comprising mutually supporting variation…
Mutational Dynamics of Aroid Chloroplast Genomes
Ahmed, Ibrar; Biggs, Patrick J.; Matthews, Peter J.; Collins, Lesley J.; Hendy, Michael D.; Lockhart, Peter J.
2012-01-01
A characteristic feature of eukaryote and prokaryote genomes is the co-occurrence of nucleotide substitution and insertion/deletion (indel) mutations. Although similar observations have also been made for chloroplast DNA, genome-wide associations have not been reported. We determined the chloroplast genome sequences for two morphotypes of taro (Colocasia esculenta; family Araceae) and compared these with four publicly available aroid chloroplast genomes. Here, we report the extent of genome-wide association between direct and inverted repeats, indels, and substitutions in these aroid chloroplast genomes. We suggest that alternative but not mutually exclusive hypotheses explain the mutational dynamics of chloroplast genome evolution. PMID:23204304
Merschel, Andrew; Heyerdahl, Emily K.; Spies, Thomas A; Loehman, Rachel A.
2018-01-01
Context In the interior Northwest, debate over restoring mixed-conifer forests after a century of fire exclusion is hampered by poor understanding of the pattern and causes of spatial variation in historical fire regimes. Objectives To identify the roles of topography, landscape structure, and forest type in driving spatial variation in historical fire regimes in mixed-conifer forests of central Oregon. Methods We used tree rings to reconstruct multicentury fire and forest histories at 105 plots over 10,393 ha. We classified fire regimes into four types and assessed whether they varied with topography, the location of fuel-limited pumice basins that inhibit fire spread, and an updated classification of forest type. Results We identified four fire-regime types and six forest types. Although surface fires were frequent and often extensive, severe fires were rare in all four types. Fire regimes varied with some aspects of topography (elevation), but not others (slope or aspect) and with the distribution of pumice basins. Fire regimes did not strictly co-vary with mixed-conifer forest types. Conclusions Our work reveals the persistent influence of landscape structure on spatial variation in historical fire regimes and can help inform discussions about appropriate restoration of fire-excluded forests in the interior Northwest. Where the goal is to restore historical fire regimes at landscape scales, managers may want to consider the influence of topoedaphic and vegetation patch types that could affect fire spread and ignition frequency.
Cho, Kwang-Soo; Cheon, Kyeong-Sik; Hong, Su-Young; Cho, Ji-Hong; Im, Ju-Seong; Mekapogu, Manjulatha; Yu, Yei-Soo; Park, Tae-Ho
2016-10-01
The chloroplast genomes of Solanum commersonii and Solanum tuberosum were completely sequenced, and Indel markers were successfully applied to distinguish chlorotypes, demonstrating that the chloroplast genome was randomly distributed during protoplast fusion. Somatic hybridization has been widely employed for the introgression of resistance to several diseases from wild Solanum species to overcome sexual barriers in potato breeding. Solanum commersonii is a major resource used as a parent line in somatic hybridization to improve bacterial wilt resistance in interspecies transfer to cultivated potato (S. tuberosum). Here, we sequenced the complete chloroplast genomes of Lz3.2 (S. commersonii) and S. tuberosum (PT56), which were used to develop fusion products, then compared them with those of five members of the Solanaceae family, S. tuberosum, Capsicum annuum, S. lycopersicum, S. bulbocastanum and S. nigrum, and Coffea arabica as an out-group. We then developed Indel markers for application in chloroplast genotyping. The complete chloroplast genome of Lz3.2 is composed of 155,525 bp, which is larger than the PT56 genome with 155,296 bp. Gene content, order and orientation of the S. commersonii chloroplast genome were highly conserved with those of other Solanaceae species, and the phylogenetic tree revealed that S. commersonii is located within the same node as S. tuberosum. However, sequence alignment revealed nine Indels between S. commersonii and S. tuberosum in their chloroplast genomes, allowing two Indel markers to be developed. The markers could distinguish the two species and were successfully applied to chloroplast genotyping (chlorotype) in somatic hybrids and their progenies. The results obtained in this study confirmed the random distribution of the chloroplast genome during protoplast fusion and its maternal inheritance and can be applied to select proper plastid genotypes in potato breeding programs.
Association of COL1A1 polymorphisms with osteoporosis: a meta-analysis of clinical studies
Xie, Peigen; Liu, Bin; Zhang, Liangming; Chen, Ruiqiang; Yang, Bu; Dong, Jianwen; Rong, Limin
2015-01-01
Objective: To conduct a meta-analysis of all association studies on two of the collagen 1 alpha 1 (COL1A1) gene polymorphisms, the -1997G/T (rs1107946) and the -1663indelT (rs2412298) polymorphisms, and osteoporosis/BMD and fracture. Methods: PubMed/Medline and Web of Knowledge were searched for relevant association studies published in English. Pooled OR and its corresponding 95% CI or pooled MD and its corresponding 95% CI were calculated with the Cochrane Review Manager (Revman, version 5.2) using a random-effect or a fixed-effect model. Results: No significant association was found between the -1997G/T polymorphism and Lumbar Spine (LS) and Femoral Neck (FN) BMD, except in the Caucasian subpopulation, wherein the T allele of the -1997G/T polymorphism was associated with significantly higher LS BMD. Our analysis did reveal that women, especially postmenopausal or perimenopausal women with the GG genotype, had significantly higher Total Hip (TH) BMD than those with the GT genotype. Additionally, our meta-analysis did not show a significant association between the -1997G/T polymorphism and risk of fracture, between the -1663indelT polymorphism and LS BMD in postmenopausal or perimenopausal women, or between the -1663indelT polymorphism and the risk of fracture. Conclusions: Our results suggested the possibility of the COL1A1 -1997G/T and the -1663indelT polymorphisms individually playing very little role in osteoporosis and fracture, although more studies are needed, especially for the analysis of association between these two polymorphisms and fracture. Haplotype studies may become one important future direction of study to further elucidate whether and how various COL1A1 polymorphisms affect bone health, osteoporosis and fracture. PMID:26628959
Smith, Graham D.; Robinson, Caroline; Stewart, Andrew P.; Edwards, Emily L.; Karet, Hannah I.; Norden, Anthony G. W.; Sandford, Richard N.
2011-01-01
Summary Background and objectives In a single-center renal clinic, we have established routine mutation testing to diagnose UMOD-associated kidney disease (UAKD), an autosomal dominant disorder typically characterized by gout, hyperuricemia, and renal failure in the third to sixth decades. Design, setting, participants, & measurements Four probands and their multigeneration kindreds were assessed by clinical, historical, and biochemical means. Diagnostic UMOD sequencing was performed, and mutant uromodulin was characterized in vitro. Results All available affected members of the four kindreds harbored the same complex indel change in UMOD, which was associated with almost complete absence of gout and a later onset of CKD; the youngest age at ESRD or death was 38 years (range, 38 to 68 years) compared with 3 to 70 years in other reports. Three mutation carriers (all ≤35 years) are currently asymptomatic. The indel sequence (c.278_289del TCTGCCCCGAAGinsCCGCCTCCT; p.V93_G97del/ins AASC) results in the replacement of five amino acids, including one cysteine, by four novel residues, also including a cysteine. Uromodulin staining of the only available patient biopsy suggested disorganized intracellular trafficking with cellular accumulation. Functional characterization of the mutant isoform revealed retarded intracellular trafficking associated with endoplasmic reticulum (ER) retention and reduced secretion into cell culture media, but to a lesser extent than we observed with the previously reported C150S mutation. Conclusions The indel mutation is associated with a relatively mild clinical UAKD phenotype, consistent with our in vitro analysis. UAKD should be routinely considered as a causative gene for ESRD of unknown cause, especially where there is an associated family history or where biopsy reveals interstitial fibrosis. PMID:22034507
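The arithmetic behind the complex indel in the abstract above (c.278_289del TCTGCCCCGAAGinsCCGCCTCCT; five residues replaced by four) can be checked with a short Python sketch. The function name is hypothetical, and the calculation assumes the usual coding-DNA convention in which position 1 is the first base of the start codon, so that codon n covers positions 3n-2 to 3n.

```python
def delins_summary(del_start, del_end, inserted):
    """Summarize a coding deletion-insertion given 1-based c. positions.

    Hypothetical helper: assumes c.1 is the first base of the start codon.
    """
    deleted_len = del_end - del_start + 1
    net_change = len(inserted) - deleted_len
    first_codon = (del_start - 1) // 3 + 1
    last_codon = (del_end - 1) // 3 + 1
    codons_affected = last_codon - first_codon + 1
    in_frame = net_change % 3 == 0
    return {
        "deleted_nt": deleted_len,
        "inserted_nt": len(inserted),
        "net_change_nt": net_change,
        "first_codon": first_codon,
        "last_codon": last_codon,
        "codons_affected": codons_affected,
        "in_frame": in_frame,
        # Only meaningful for in-frame events.
        "residues_after": codons_affected + net_change // 3 if in_frame else None,
    }


deleted_sequence = "TCTGCCCCGAAG"   # as given in the abstract
inserted_sequence = "CCGCCTCCT"     # as given in the abstract
summary = delins_summary(278, 289, inserted_sequence)
print(summary)
```

The result agrees with the protein-level description in the abstract: 12 nucleotides are deleted and 9 inserted, the net loss of 3 nucleotides keeps the reading frame, and codons 93 to 97 (five residues) are replaced by four residues.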
Variation analysis and gene annotation of eight MHC haplotypes: The MHC Haplotype Project
Horton, Roger; Gibson, Richard; Coggill, Penny; Miretti, Marcos; Allcock, Richard J.; Almeida, Jeff; Forbes, Simon; Gilbert, James G. R.; Halls, Karen; Harrow, Jennifer L.; Hart, Elizabeth; Howe, Kevin; Jackson, David K.; Palmer, Sophie; Roberts, Anne N.; Sims, Sarah; Stewart, C. Andrew; Traherne, James A.; Trevanion, Steve; Wilming, Laurens; Rogers, Jane; de Jong, Pieter J.; Elliott, John F.; Sawcer, Stephen; Todd, John A.; Trowsdale, John
2008-01-01
The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line) designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine. PMID:18193213
Kristjansdottir, G; Sandling, J K; Bonetti, A; Roos, I M; Milani, L; Wang, C; Gustafsdottir, S M; Sigurdsson, S; Lundmark, A; Tienari, P J; Koivisto, K; Elovaara, I; Pirttilä, T; Reunanen, M; Peltonen, L; Saarela, J; Hillert, J; Olsson, T; Landegren, U; Alcina, A; Fernández, O; Leyva, L; Guerrero, M; Lucas, M; Izquierdo, G; Matesanz, F; Syvänen, A-C
2008-01-01
Background: IRF5 is a transcription factor involved both in the type I interferon and the toll-like receptor signalling pathways. Previously, IRF5 has been found to be associated with systemic lupus erythematosus, rheumatoid arthritis and inflammatory bowel diseases. Here we investigated whether polymorphisms in the IRF5 gene would be associated with yet another disease with features of autoimmunity, multiple sclerosis (MS). Methods: We genotyped nine single nucleotide polymorphisms and one insertion-deletion polymorphism in the IRF5 gene in a collection of 2337 patients with MS and 2813 controls from three populations: two case–control cohorts from Spain and Sweden, and a set of MS trio families from Finland. Results: Two single nucleotide polymorphisms (SNPs) (rs4728142, rs3807306) and a 5 bp insertion-deletion polymorphism located in the promoter and first intron of the IRF5 gene showed association signals with values of p<0.001 when the data from all cohorts were combined. The predisposing alleles were present on the same common haplotype in all populations. Using electrophoretic mobility shift assays we observed allele specific differences in protein binding for the SNP rs4728142 and the 5 bp indel, and by a proximity ligation assay we demonstrated increased binding of the transcription factor SP1 to the risk allele of the 5 bp indel. Conclusion: These findings add IRF5 to the short list of genes shown to be associated with MS in more than one population. Our study adds to the evidence that there might be genes or pathways that are common in multiple autoimmune diseases, and that the type I interferon system is likely to be involved in the development of these diseases. PMID:18285424
Guo, Chunli; Yang, Xuqin; Wang, Yunli; Nie, Jingtao; Yang, Yi; Sun, Jingxian; Du, Hui; Zhu, Wenying; Pan, Jian; Chen, Yue; Lv, Duo; He, Huanle; Lian, Hongli; Pan, Junsong; Cai, Run
2018-01-01
Using map-based cloning of the ts gene, we identified a new kind of gene involved in the initiation of multicellular tender spines in cucumber. The cucumber (Cucumis sativus L.) fruit bears spines on its surface, a highly valuable quality trait that affects consumer choice. In this study, we examined cucumber line NC072, which has wild type (WT) hard fruit spines, and its spontaneous mutant NC073, which has tender and soft spines on its fruits. The mutant trait was named tender spines (ts) and is controlled by a single recessive nuclear gene. We identified the gene ts by map-based cloning with an F2 segregating population of 721 individuals generated from NC073 and WT line SA419-2. It was located between two markers, Indel6239679 and Indel6349344, a physical distance of 109.7 kb on chromosome 1 containing fifteen putative genes. Based on sequencing and quantitative reverse transcription-polymerase chain reaction analysis, the Csa1G056960 gene was considered the most likely candidate gene for ts. In the mutant, Csa1G056960 has a nucleotide change in the 5' splicing site of the second intron, which alters splicing so that the second exon is deleted, resulting in an N-terminal deletion in the predicted amino acid sequence. The gene encodes a C-type lectin receptor-like tyrosine-protein kinase which would play an important role in the formation of cucumber fruit. This is the first report of a receptor kinase gene regulating the development of multicellular spines/trichomes in plants. The ts allele could accelerate the molecular breeding of cucumber soft spines.
Optimization of sequence alignment for simple sequence repeat regions.
Jighly, Abdulqader; Hamwieh, Aladdin; Ogbonnaya, Francis C
2011-07-20
Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSRs have been used as molecular markers because they are easy to detect and serve a range of applications, including genetic diversity, genome mapping, and marker assisted selection. They are also very mutable because of slippage of the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDEL) mutation frequency to a high ratio, more than other types of molecular markers such as single nucleotide polymorphisms (SNPs). SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units, which result in false evolutionary relationships. To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using a PERL script with a Tk graphical interface. This program is based on aligning sequences after first determining the repeated units and the positions of the last SSR nucleotides. This results in a shifting process according to the inserted repeated unit type. When studying the phylogenetic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, smaller distances between different lineages were observed after applying the new algorithm. The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic phylogenetic relationship.
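The first step described above, locating the repeated unit of an SSR before aligning, can be illustrated with a minimal sketch. This is not the authors' PERL program; the function name `find_ssrs`, the regular expression, and the `MIN_REPEATS` threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical threshold: a motif must occur at least this many times in tandem.
MIN_REPEATS = 3

# A motif of 1-6 bases (lazy, so the shortest unit wins) followed by
# at least MIN_REPEATS - 1 further copies of itself.
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (MIN_REPEATS - 1))


def find_ssrs(seq):
    """Return (start, motif, copies) for each tandem repeat found in seq."""
    hits = []
    for m in SSR_PATTERN.finditer(seq.upper()):
        motif = m.group(1)
        copies = len(m.group(0)) // len(motif)
        hits.append((m.start(), motif, copies))
    return hits


print(find_ssrs("GGCACACACATT"))
```

Knowing the motif and copy number at each locus is what would let an aligner shift whole repeat units rather than individual bases; how the published tool performs that shift is not specified in the abstract.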
Tóth, Annamária; Hausknecht, Anton; Krisai-Greilhuber, Irmgard; Papp, Tamás; Vágvölgyi, Csaba; Nagy, László G.
2013-01-01
Reconciling traditional classifications, morphology, and the phylogenetic relationships of brown-spored agaric mushrooms has proven difficult in many groups, due to extensive convergence in morphological features. Here, we address the monophyly of the Bolbitiaceae, a family with over 700 described species and examine the higher-level relationships within the family using a newly constructed multilocus dataset (ITS, nrLSU rDNA and EF1-alpha). We tested whether the fast-evolving Internal Transcribed Spacer (ITS) sequences can be accurately aligned across the family, by comparing the outcome of two iterative alignment refining approaches (an automated and a manual) and various indel-treatment strategies. We used PRANK to align sequences in both cases. Our results suggest that – although PRANK successfully evades overmatching of gapped sites, referred previously to as alignment overmatching – it infers an unrealistically high number of indel events with natively generated guide-trees. This 'alignment undermatching' could be avoided by using more rigorous (e.g. ML) guide trees. The trees inferred in this study support the monophyly of the core Bolbitiaceae, with the exclusion of Panaeolus, Agrocybe, and some of the genera formerly placed in the family. Bolbitius and Conocybe were found monophyletic, however, Pholiotina and Galerella require redefinition. The phylogeny revealed that stipe coverage type is a poor predictor of phylogenetic relationships, indicating the need for a revision of the intrageneric relationships within Conocybe. PMID:23418526
CRISPR/Cas9 Inhibits Multiple Steps of HIV-1 Infection.
Yin, Lijuan; Hu, Siqi; Mei, Shan; Sun, Hong; Xu, Fengwen; Li, Jian; Zhu, Weijun; Liu, Xiaoman; Zhao, Fei; Zhang, Di; Cen, Shan; Liang, Chen; Guo, Fei
2018-05-09
CRISPR/Cas9 is an adaptive immune system that bacteria and archaea have evolved to resist invading viruses and plasmid DNA by creating site-specific double-strand breaks in DNA. This study tested this gene editing system in inhibiting human immunodeficiency virus type 1 (HIV-1) infection by targeting the viral long terminal repeat and the gene coding sequences. Strong inhibition of HIV-1 infection by Cas9/gRNA was observed, which resulted not only from insertions and deletions (indels) that were introduced into viral DNA due to Cas9 cleavage, but also from the marked decrease in the levels of the late viral DNA products and the integrated viral DNA. This latter defect might have reflected the degradation of viral DNA that has not been immediately repaired after Cas9 cleavage. It was further observed that Cas9, when solely located in the cytoplasm, inhibits HIV-1 as strongly as the nuclear Cas9, except that the cytoplasmic Cas9 does not act on the integrated HIV-1 DNA and thus cannot be used to excise the latent provirus. Together, the results suggest that Cas9/gRNA is able to target and edit HIV-1 DNA both in the cytoplasm and in the nucleus. The inhibitory effect of Cas9 on HIV-1 is attributed to both the indels in viral DNA and the reduction in the levels of viral DNA.
Targeted genome editing in a quail cell line using a customized CRISPR/Cas9 system.
Ahn, Jinsoo; Lee, Joonbum; Park, Ju Yeon; Oh, Keon Bong; Hwang, Seongsoo; Lee, Chang-Won; Lee, Kichoon
2017-05-01
Soon after RNA-guided Cas9 (CRISPR-associated protein 9) endonuclease opened a new era of targeted genome editing, the CRISPR/Cas9 platform began to be extensively used to modify genes in various types of cells and organisms. However, successful CRISPR/Cas9-mediated insertion/deletion (indel) mutation remains to be demonstrated in avian cell lines. The objective of this study was to design a poultry-specific CRISPR/Cas9 system to efficiently introduce targeted deletion mutation in chromosomes of the quail muscle clone 7 (QM7) cell line using a customized quail CRISPR vector. In this study, two avian-specific promoters, quail 7SK (q7SK) promoter and CBh promoter, the hybrid form of cytomegalovirus and chicken β-actin promoters, were cloned into a CRISPR vector for the expression of guide RNA and Cas9 protein, respectively. Then, guide RNA, which was designed to target 20-base pair (bp) nucleotides in the quail melanophilin (MLPH) locus, was ligated to the modified CRISPR vector and transfected to QM7 cells. Our results showed multiple indel mutations in the quail MLPH locus in nearly half of the alleles being tested, suggesting the high efficiency of the system for targeted gene modification. The new CRISPR vector developed from this study has the potential application to generate knockout avian cell lines and knockout poultry. © 2016 Poultry Science Association Inc.
Observation of Children's Teeth as a Diagnostic Aid
Gibson, Wm. M.; Conchie, John M.
1964-01-01
Current interest in tetracycline staining of teeth and other enamel defects led to this review. In the handicapped child structural defects that were seen in the dental enamel may provide a most accurate etiological clue. The method of determining the time of insult is described. Comments are made on seven states in which enamel dysplasia may be frequently observed. A simple means of identifying tetracycline pigment incorporated in dental enamel is outlined. Bilirubin staining of teeth is also shown and warnings are given about the indelible nature of these pigments. PMID:14118684
2014-01-01
Background Modern watermelon (Citrullus lanatus L.) cultivars share a narrow genetic base due to many years of selection for desirable horticultural qualities. Wild subspecies within C. lanatus are important potential sources of novel alleles for watermelon breeding, but successful trait introgression into elite cultivars has had limited success. The application of marker assisted selection (MAS) in watermelon is yet to be realized, mainly due to the past lack of high quality genetic maps. Recently, a number of useful maps have become available, however these maps have few common markers, and were constructed using different marker sets, thus, making integration and comparative analysis among maps difficult. The objective of this research was to use single-nucleotide polymorphism (SNP) anchor markers to construct an integrated genetic map for C. lanatus. Results Under the framework of the high density genetic map, an integrated genetic map was constructed by merging data from four independent mapping experiments using a genetically diverse array of parental lines, which included three subspecies of watermelon. The 698 simple sequence repeat (SSR), 219 insertion-deletion (InDel), 36 structure variation (SV) and 386 SNP markers from the four maps were used to construct an integrated map. This integrated map contained 1339 markers, spanning 798 cM with an average marker interval of 0.6 cM. Fifty-eight previously reported quantitative trait loci (QTL) for 12 traits in these populations were also integrated into the map. In addition, new QTL identified for brix, fructose, glucose and sucrose were added. Some QTL associated with economically important traits detected in different genetic backgrounds mapped to similar genomic regions of the integrated map, suggesting that such QTL are responsible for the phenotypic variability observed in a broad array of watermelon germplasm. 
Conclusions The integrated map described herein enhances the utility of genomic tools over previous watermelon genetic maps. A large proportion of the markers in the integrated map are SSRs, InDels and SNPs, which are easily transferable across laboratories. Moreover, the populations used to construct the integrated map include all three watermelon subspecies, making this integrated map useful for the selection of breeding traits, identification of QTL, MAS, analysis of germplasm and commercial hybrid seed detection. PMID:24443961
Evolutionary inference via the Poisson Indel Process
Bouchard-Côté, Alexandre; Jordan, Michael I.
2013-01-01
We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114–124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments. PMID:23275296
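The difference in how insertions are treated can be made concrete with a simplified single-branch simulation. This is a sketch under stated assumptions, not the authors' inference code: insertions arrive at a constant rate `lam` that does not depend on sequence length, each character is deleted independently at rate `mu`, substitutions are omitted, and inserted characters are placed at a uniformly chosen slot among the surviving characters. The function name and parameters are hypothetical.

```python
import math
import random


def simulate_pip_branch(seq, t, lam, mu, alphabet="ACGT", rng=None):
    """Evolve seq along one branch of length t (insertions and deletions only)."""
    rng = rng or random.Random()

    # An ancestral character survives the whole branch with probability exp(-mu * t).
    survive_full = math.exp(-mu * t)
    out = [c for c in seq if rng.random() < survive_full]

    # Insertion times form a Poisson process of rate lam on [0, t];
    # generate them from exponential waiting times.
    if lam > 0:
        s = rng.expovariate(lam)
        while s <= t:
            # A character inserted at time s must survive the remaining t - s.
            if rng.random() < math.exp(-mu * (t - s)):
                pos = rng.randint(0, len(out))  # simplification: uniform slot
                out.insert(pos, rng.choice(alphabet))
            s += rng.expovariate(lam)
    return "".join(out)


print(simulate_pip_branch("ACGTACGT", t=0.5, lam=2.0, mu=1.0, rng=random.Random(42)))
```

Because the insertion rate here is a property of the branch rather than of the sequence, the number of insertions is Poisson with mean `lam * t`, which is the kind of decoupling the abstract credits for the linear-time likelihood.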
Yuri, Tamaki; Kimball, Rebecca T.; Harshman, John; Bowie, Rauri C. K.; Braun, Michael J.; Chojnowski, Jena L.; Han, Kin-Lan; Hackett, Shannon J.; Huddleston, Christopher J.; Moore, William S.; Reddy, Sushma; Sheldon, Frederick H.; Steadman, David W.; Witt, Christopher C.; Braun, Edward L.
2013-01-01
Insertion/deletion (indel) mutations, which are represented by gaps in multiple sequence alignments, have been used to examine phylogenetic hypotheses for some time. However, most analyses combine gap data with the nucleotide sequences in which they are embedded, probably because most phylogenetic datasets include few gap characters. Here, we report analyses of 12,030 gap characters from an alignment of avian nuclear genes using maximum parsimony (MP) and a simple maximum likelihood (ML) framework. Both trees were similar, and they exhibited almost all of the strongly supported relationships in the nucleotide tree, although neither gap tree supported many relationships that have proven difficult to recover in previous studies. Moreover, independent lines of evidence typically corroborated the nucleotide topology instead of the gap topology when they disagreed, although the number of conflicting nodes with high bootstrap support was limited. Filtering to remove short indels did not substantially reduce homoplasy or reduce conflict. Combined analyses of nucleotides and gaps resulted in the nucleotide topology, but with increased support, suggesting that gap data may prove most useful when analyzed in combination with nucleotide substitutions. PMID:24832669
Bavarva, Jasmin H.; Tae, Hongseok; McIver, Lauren; Garner, Harold R.
2014-01-01
Although the connection between cancer and cigarette smoke is well established, nicotine is not characterized as a carcinogen. Here, we used exome sequencing to identify nicotine and oxidative stress-induced somatic mutations in normal human epithelial cells and their correlation with cancer. We identified over 6,400 SNVs, indels and microsatellites in each of the stress exposed cells relative to the control, of which 2,159 were consistently observed at all nicotine doses. These included 429 nsSNVs, including 158 novel and 79 cancer-associated. Over 80% of consistently nicotine induced variants overlap with variations detected in oxidative stressed cells, indicating that nicotine induced genomic alterations could be mediated through oxidative stress. Nicotine induced mutations were distributed across 1,585 genes, of which 49% were associated with cancer. MUC family genes were among the top mutated genes. Analysis of 591 lung carcinoma tumor exomes from The Cancer Genome Atlas (TCGA) revealed that 20% of non-small-cell lung cancer tumors in smokers have mutations in at least one of the MUC4, MUC6 or MUC12 genes in contrast to only 6% in non-smokers. These results indicate that nicotine induces genomic variations, promotes instability potentially mediated by oxidative stress, implicating nicotine in carcinogenesis, and establishes MUC genes as potential targets. PMID:24947164
Shin, Saeam; Kim, Yoonjung; Chul Oh, Seoung; Yu, Nae; Lee, Seung-Tae; Rak Choi, Jong; Lee, Kyung-A
2017-05-23
In this study, we validated the analytical performance of BRCA1/2 sequencing using Ion Torrent's new bench-top sequencer with amplicon panel with optimized bioinformatics pipelines. Using 43 samples that were previously validated by Illumina's MiSeq platform and/or by Sanger sequencing/multiplex ligation-dependent probe amplification, we amplified the target with the Oncomine™ BRCA Research Assay and sequenced on Ion Torrent S5 XL (Thermo Fisher Scientific, Waltham, MA, USA). We compared two bioinformatics pipelines for optimal processing of S5 XL sequence data: the Torrent Suite with a plug-in Torrent Variant Caller (Thermo Fisher Scientific), and commercial NextGENe software (Softgenetics, State College, PA, USA). All expected 681 single nucleotide variants, 15 small indels, and three copy number variants were correctly called, except one common variant adjacent to a rare variant on the primer-binding site. The sensitivity, specificity, false positive rate, and accuracy for detection of single nucleotide variant and small indels of S5 XL sequencing were 99.85%, 100%, 0%, and 99.99% for the Torrent Variant Caller and 99.85%, 99.99%, 0.14%, and 99.99% for NextGENe, respectively. The reproducibility of variant calling was 100%, and the precision of variant frequency also showed good performance with coefficients of variation between 0.32 and 5.29%. We obtained highly accurate data through uniform and sufficient coverage depth over all target regions and through optimization of the bioinformatics pipeline. We confirmed that our platform is accurate and practical for diagnostic BRCA1/2 testing in a clinical laboratory.
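The performance figures above follow from a confusion matrix of variant calls. The sketch below shows the usual definitions; the function name is hypothetical, the true-negative count in the usage line is invented purely for illustration, and the definition of false positive rate used here (FP / (FP + TN)) is an assumption that may differ from the one used in the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard performance metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),                 # true positive rate
        "specificity": tn / (tn + fp),                 # true negative rate
        "false_positive_rate": fp / (fp + tn),         # assumed definition
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# 681 SNVs + 15 small indels were expected and one was missed (from the abstract);
# tn=10_000 is a made-up placeholder, not a reported value.
metrics = diagnostic_metrics(tp=695, fp=0, tn=10_000, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4%}")
```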
Senra, Marcus V X; Sung, Way; Ackerman, Matthew; Miller, Samuel F; Lynch, Michael; Soares, Carlos Augusto G
2018-03-01
Mutations contribute to genetic variation in all living systems. Thus, precise estimates of mutation rates and spectra across a diversity of organisms are required for a full comprehension of evolution. Here, a mutation-accumulation (MA) assay was carried out on the endosymbiotic bacterium Teredinibacter turnerae. After ∼3,025 generations, base-pair substitutions (BPSs) and insertion-deletion (indel) events were characterized by whole-genome sequencing analysis of 47 independent MA lines, yielding a BPS rate of 1.14 × 10⁻⁹ per site per generation and indel rate of 1.55 × 10⁻¹⁰ events per site per generation, which are among the highest within free-living and facultative intracellular bacteria. As in other endosymbionts, a significant bias of BPSs toward A/T and an excess of deletion mutations over insertion mutations are observed for these MA lines. However, even with a deletion bias, the genome remains relatively large (∼5.2 Mb) for an endosymbiotic bacterium. The estimate of the effective population size (Ne) in T. turnerae is quite high and comparable to free-living bacteria (∼4.5 × 10⁷), suggesting that the heavy bottlenecking associated with many endosymbiotic relationships is not prevalent during the life of this endosymbiont. The efficiency of selection scales with increasing Ne and such strong selection may have been operating against the deletion bias, preventing genome erosion. The observed mutation rate in this endosymbiont is of the same order of magnitude of those with similar Ne, consistent with the idea that population size is a primary determinant of mutation-rate evolution within endosymbionts, and that not all endosymbionts have low Ne.
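A per-site, per-generation rate from a mutation-accumulation experiment is commonly computed as the number of observed mutations divided by the product of lines, generations and analysed sites. The sketch below applies that formula; the function names are hypothetical, and treating ∼3,025 as the generations per line and ∼5.2 Mb as the number of analysed sites are assumptions, so the back-calculated mutation count is illustrative rather than a reported figure.

```python
def mutation_rate(n_mutations, n_lines, generations, sites):
    """Mutations per site per generation, pooled over all MA lines."""
    return n_mutations / (n_lines * generations * sites)


def implied_mutation_count(rate, n_lines, generations, sites):
    """Invert the formula: how many mutations a given rate would imply."""
    return rate * n_lines * generations * sites


N_LINES = 47            # from the abstract
GENERATIONS = 3025      # assumed to be per line
SITES = 5.2e6           # assumed: whole genome analysed

bps = implied_mutation_count(1.14e-9, N_LINES, GENERATIONS, SITES)
indels = implied_mutation_count(1.55e-10, N_LINES, GENERATIONS, SITES)
print(round(bps), round(indels))
```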
Gong, Wen-Bing; Li, Lei; Zhou, Yan; Bian, Yin-Bing; Kwan, Hoi-Shan; Cheung, Man-Kit; Xiao, Yang
2016-06-01
To provide a better understanding of the genetic architecture of fruiting body formation of Lentinula edodes, quantitative trait loci (QTLs) mapping was employed to uncover the loci underlying seven fruiting body-related traits (FBRTs). An improved L. edodes genetic linkage map, comprising 572 markers on 12 linkage groups with a total map length of 983.7 cM, was constructed by integrating 82 genomic sequence-based insertion-deletion (InDel) markers into a previously published map. We then detected a total of 62 QTLs for seven target traits across two segregating testcross populations, with individual QTLs contributing 5.5 %-30.2 % of the phenotypic variation. Fifty-three out of the 62 QTLs were clustered in six QTL hotspots, suggesting the existence of main genomic regions regulating the morphological characteristics of fruiting bodies in L. edodes. A stable QTL hotspot on MLG2, containing QTLs for all investigated traits, was identified in both testcross populations. QTLs for related traits were frequently co-located on the linkage groups, demonstrating the genetic basis for phenotypic correlation of traits. Meta-QTL (mQTL) analysis was performed and identified 16 mQTLs with refined positions and narrow confidence intervals (CIs). Nine genes, including those encoding MAP kinase, blue-light photoreceptor, riboflavin-aldehyde-forming enzyme and cyclopropane-fatty-acyl-phospholipid synthase, and cytochrome P450s, were likely to be candidate genes controlling the shape of fruiting bodies. The study has improved our understanding of the genetic architecture of fruiting body formation in L. edodes. To our knowledge, this is the first genome-wide QTL detection of FBRTs in L. edodes. The improved genetic map, InDel markers and QTL hotspot regions revealed here will assist considerably in the conduct of future genetic and breeding studies of L. edodes.
McGowen, Michael R; Clark, Clay; Gatesy, John
2008-08-01
The macroevolutionary transition of whales (cetaceans) from a terrestrial quadruped to an obligate aquatic form involved major changes in sensory abilities. Compared to terrestrial mammals, the olfactory system of baleen whales is dramatically reduced, and in toothed whales is completely absent. We sampled the olfactory receptor (OR) subgenomes of eight cetacean species from four families. A multigene tree of 115 newly characterized OR sequences from these eight species and published data for Bos taurus revealed a diverse array of class II OR paralogues in Cetacea. Evolution of the OR gene superfamily in toothed whales (Odontoceti) featured a multitude of independent pseudogenization events, supporting anatomical evidence that odontocetes have lost their olfactory sense. We explored the phylogenetic utility of OR pseudogenes in Cetacea, concentrating on delphinids (oceanic dolphins), the product of a rapid evolutionary radiation that has been difficult to resolve in previous studies of mitochondrial DNA sequences. Phylogenetic analyses of OR pseudogenes using both gene-tree reconciliation and supermatrix methods yielded fully resolved, consistently supported relationships among members of four delphinid subfamilies. Alternative minimizations of gene duplications, gene duplications plus gene losses, deep coalescence events, and nucleotide substitutions plus indels returned highly congruent phylogenetic hypotheses. Novel DNA sequence data for six single-copy nuclear loci and three mitochondrial genes (> 5000 aligned nucleotides) provided an independent test of the OR trees. Nucleotide substitutions and indels in OR pseudogenes showed a very low degree of homoplasy in comparison to mitochondrial DNA and, on average, provided more variation than single-copy nuclear DNA. 
Our results suggest that phylogenetic analysis of the large OR superfamily will be effective for resolving relationships within Cetacea whether supermatrix or gene-tree reconciliation procedures are used.
Genomic profiling of plastid DNA variation in the Mediterranean olive tree
2011-01-01
Background Characterisation of plastid genome (or cpDNA) polymorphisms is commonly used for phylogeographic, population genetic and forensic analyses in plants, but detecting cpDNA variation is sometimes challenging, limiting the applications of such an approach. In the present study, we screened cpDNA polymorphism in the olive tree (Olea europaea L.) by sequencing the complete plastid genome of trees with a distinct cpDNA lineage. Our objective was to develop new markers for a rapid genomic profiling (by Multiplex PCRs) of cpDNA haplotypes in the Mediterranean olive tree. Results Eight complete cpDNA genomes of Olea were sequenced de novo. The nucleotide divergence between olive cpDNA lineages was low and not exceeding 0.07%. Based on these sequences, markers were developed for studying two single nucleotide substitutions and length polymorphism of 62 regions (with variable microsatellite motifs or other indels). They were then used to genotype the cpDNA variation in cultivated and wild Mediterranean olive trees (315 individuals). Forty polymorphic loci were detected on this sample, allowing the distinction of 22 haplotypes belonging to the three Mediterranean cpDNA lineages known as E1, E2 and E3. The discriminating power of cpDNA variation was particularly low for the cultivated olive tree with one predominating haplotype, but more diversity was detected in wild populations. Conclusions We propose a method for a rapid characterisation of the Mediterranean olive germplasm. The low variation in the cultivated olive tree indicated that the utility of cpDNA variation for forensic analyses is limited to rare haplotypes. In contrast, the high cpDNA variation in wild populations demonstrated that our markers may be useful for phylogeographic and populations genetic studies in O. europaea. PMID:21569271
Karlsson, Edvin; Svensson, Kerstin; Lindgren, Petter; Byström, Mona; Sjödin, Andreas; Forsman, Mats; Johansson, Anders
2013-02-01
Previous studies of the causative agent of tularaemia, Francisella tularensis have identified phylogeographic patterns suggestive of environmental maintenance reservoirs. To investigate the phylogeography of tularaemia in Sweden, we selected 163 clinical isolates obtained during 1995-2009 in 10 counties and sequenced one isolate's genome to identify new genetic markers. An improved typing scheme based on two indels and nine SNPs was developed using hydrolysis or TaqMan MGB probe assays. The results showed that much of the known global genetic diversity of F. tularensis subsp. holarctica is present in Sweden. Thirteen of the 163 isolates belonged to a new genetic group that is basal to all other known members of the major genetic clade B.I, which is spread across the Eurosiberian region. One hundred and twenty-five of the 163 Swedish isolates belonged to B.I, but individual clades' frequencies differed from county to county (P < 0.001). Subsequent analyses revealed a correlation between genotype variation over time and recurrent outbreaks at specific places, supporting the 'maintenance reservoir' environmental maintenance hypothesis. Most importantly, the findings reveal the presence of diverse source populations of F. tularensis subsp. holarctica in Sweden and suggest a historical spread of the disease from Scandinavia to other parts of Eurosiberia. © 2012 Society for Applied Microbiology and Blackwell Publishing Ltd.
NASA Astrophysics Data System (ADS)
Niinemets, Ülo; Keenan, Trevor
2017-04-01
Major light gradients, characteristically 10- to 50-fold, constitute the most prominent feature of plant canopies. These gradients drive within-canopy variation in foliage structural, chemical and physiological traits. As a key acclimation response to variation in light availability, foliage photosynthetic capacity per area (Aarea) increases with increasing light availability within the canopy, maximizing whole canopy photosynthesis. Recently, a worldwide database including 831 within-canopy gradients with standardized light estimates for 304 species belonging to major vascular plant functional types was constructed and within-canopy variation in photosynthetic acclimation was characterized (Niinemets Ü, Keenan TF, Hallik L (2015) Tansley review. A worldwide analysis of within-canopy variations in leaf structural, chemical and physiological traits across plant functional types. The New Phytologist 205: 973-993). However, the understanding of how within-canopy photosynthetic gradients vary during the growing season and in response to site and stand characteristics is still limited. Here we analyzed temporal, environmental and site (nutrient availability, stand density, ambient CO2 concentration, water availability) sources of variation in within-canopy photosynthetic acclimation in different plant functional types. Variation in key structural (leaf dry mass per unit area, MA), chemical (nitrogen content per dry mass, NM, and area, NA) and physiological (photosynthetic nitrogen use efficiency, EN; photosynthetic capacity per dry mass, Amass, and area, Aarea) traits was examined. The analysis demonstrates major, typically 1.5-2-fold, time-, environment- and site-dependent modifications in within-canopy variation in foliage photosynthetic capacity. However, the magnitude and direction of temporal and environmental variations in plasticity varied significantly among functional types.
Species with longer leaf life span and low rates of canopy expansion or flush-type canopy formation had lower within canopy plasticity during the growing season and in response to environmental and site modifications than species with high rates of canopy expansion and leaf turnover. The fast canopy-expanding species that grow in highly dynamic light environments, actively modified Aarea by nitrogen reallocation among and partitioning within leaves. In contrast, species with low rate of leaf turnover generally exhibited a passive acclimation response with variation in Aarea primarily determined by light-dependent modifications in leaf structure during leaf growth. Due to limited reacclimation capacity in species with low leaf turnover, within-canopy variation in Aarea decreased with increasing leaf age in these species. Furthermore, the plasticity responded less to modifications in environmental and site characteristics than in species with faster leaf turnover. This analysis concludes that the rate of leaf turnover is the key trait determining the temporal variation and environmental responses of canopy photosynthetic acclimation.
Yi, Xuan; Gao, Lei; Wang, Bo; Su, Ying-Juan; Wang, Ting
2013-01-01
We have determined the complete chloroplast (cp) genome sequence of Cephalotaxus oliveri. The genome is 134,337 bp in length, encodes 113 genes, and lacks inverted repeat (IR) regions. Genome-wide mutational dynamics have been investigated through comparative analysis of the cp genomes of C. oliveri and C. wilsoniana. Gene order transformation analyses indicate that when distinct isomers are considered as alternative structures for the ancestral cp genome of cupressophyte and Pinaceae lineages, it is not possible to distinguish between hypotheses favoring retention of the same IR region in cupressophyte and Pinaceae cp genomes from a hypothesis proposing independent loss of IRA and IRB. Furthermore, in cupressophyte cp genomes, the highly reduced IRs are replaced by short repeats that have the potential to mediate homologous recombination, analogous to the situation in Pinaceae. The importance of repeats in the mutational dynamics of cupressophyte cp genomes is also illustrated by the accD reading frame, which has undergone extreme length expansion in cupressophytes. This has been caused by a large insertion comprising multiple repeat sequences. Overall, we find that the distribution of repeats, indels, and substitutions is significantly correlated in Cephalotaxus cp genomes, consistent with a hypothesis that repeats play a role in inducing substitutions and indels in conifer cp genomes.
Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf
2014-01-01
CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB PMID:25281234
Liu, Hai-Rui; Gao, Qing-Bo; Zhang, Fa-Qi; Khan, Gulzar; Chen, Shi-Long
2018-01-01
The varying topography and environment that resulted from paleoorogeny and climate fluctuations of the Himalaya-Hengduan Mountains (HHM) areas had a considerable impact on the evolution of biota during the Quaternary. To understand the phylogeographic pattern and historical dynamics of Triosteum himalayanum (Caprifoliaceae), we sequenced three chloroplast DNA fragments (rbcL-accD, rps15-ycf1, and trnH-psbA) from 238 individuals representing 20 populations. Nineteen haplotypes (H1-H19) were identified based on 23 single-site mutations and eight indels. Most haplotypes were restricted to a single population or neighboring populations. Analysis of molecular variance revealed that variation among populations was much higher than that within populations for the overall gene pool, as well as for the East Himalayan group (EH group) and the North Hengduan group (NHM group), but not for the Hengduan Mountains group (HM group). Ecoregions representing relatively high genetic diversity or high frequencies of private haplotypes were discovered, suggesting that this alpine herbaceous plant underwent enhanced allopatric divergence in isolated and fragmented locations during the Quaternary glaciations. The current phylogeographic structure of T. himalayanum might be due to heterogeneous habitats and Quaternary climatic oscillations. Based on the phylogeographic structure of T. himalayanum populations, the phylogenetic relationship of identified haplotypes and palaeodistributional reconstruction, we postulated both westwards and northwards expansion from the HM group for this species. The westwards dispersal corridor could be long, narrow mountain areas and/or the Yarlung Zangbo Valley, while the northwards movement path could be south-north oriented mountains and low-elevation valleys.
Mosher, Jennifer J; Findlay, Robert H
2011-11-01
A correlative study was performed to determine if variation in streambed microbial community structure in low-order forested streams can be directly or indirectly linked to the chemical nature of the parental bedrock of the environments through which the streams flow. Total microbial and photosynthetic biomass (phospholipid phosphate [PLP] and chlorophyll a), community structure (phospholipid fatty acid analysis), and physical and chemical parameters were measured in six streams, three located in sandstone and three in limestone regions of the Bankhead National Forest in northern Alabama. Although stream water flowing through the two different bedrock types differed significantly in chemical composition, there were no significant differences in total microbial and photosynthetic biomass in the sediments. In contrast, sedimentary microbial community structure differed between the bedrock types and was significantly correlated with stream water ion concentrations. A pattern of seasonal variation in microbial community structure was also observed. Further statistical analysis indicated dissolved organic matter (DOM) quality, which was previously shown to be influenced by geological variation, correlated with variation in bacterial community structure. These results indicate that the geology of underlying bedrock influences benthic microbial communities directly via changes in water chemistry and also indirectly via stream water DOM quality.
NASA Astrophysics Data System (ADS)
Hai, X.; Porcher, F.; Mayer, C.; Miraglia, S.
2018-02-01
Steady state and in-situ neutron powder diffraction on selected compositions of the magneto-caloric (La,Ce)(Fe,Si)13CxHy compounds has been used to locate the sites accommodated by the interstitial species and to reveal the structural modifications (breathing) that occur upon metal substitution and/or interstitial insertion. The latter type of measurement, in which the sequential filling of interstitial sites is followed, allows one to extract some useful hydrogenation kinetics data. This structural investigation has allowed us to specify the deformations undergone by the complex metallic alloys La(Fe,Si)13 when subjected to light interstitial insertion or rare earth substitution at the cation site of the NaZn13-structure type. We attempt to correlate hydrogenation kinetics variations (depression or enhancement of the hydrogen absorption rate) with a particular inhomogeneous cell variation (breathing) and bonding of the NaZn13 structure-type.
Germline sequence variants in TGM3 and RGS22 confer risk of basal cell carcinoma
Stacey, Simon N.; Sulem, Patrick; Gudbjartsson, Daniel F.; Jonasdottir, Aslaug; Thorleifsson, Gudmar; Gudjonsson, Sigurjon A.; Masson, Gisli; Gudmundsson, Julius; Sigurgeirsson, Bardur; Benediktsdottir, Kristrun R.; Thorisdottir, Kristin; Ragnarsson, Rafn; Fuentelsaz, Victoria; Corredera, Cristina; Grasa, Matilde; Planelles, Dolores; Sanmartin, Onofre; Rudnai, Peter; Gurzau, Eugene; Koppova, Kvetoslava; Hemminki, Kari; Nexø, Bjørn A; Tjønneland, Anne; Overvad, Kim; Johannsdottir, Hrefna; Helgadottir, Hafdis T.; Thorsteinsdottir, Unnur; Kong, Augustine; Vogel, Ulla; Kumar, Rajiv; Nagore, Eduardo; Mayordomo, José I.; Rafnar, Thorunn; Olafsson, Jon H.; Stefansson, Kari
2014-01-01
To search for new sequence variants that confer risk of cutaneous basal cell carcinoma (BCC), we conducted a genome-wide association study of 38.5 million single nucleotide polymorphisms (SNPs) and small indels identified through whole-genome sequencing of 2230 Icelanders. We imputed genotypes for 4208 BCC patients and 109 408 controls using Illumina SNP chip typing data, carried out association tests and replicated the findings in independent population samples. We found new BCC susceptibility loci at TGM3 (rs214782[G], P = 5.5 × 10−17, OR = 1.29) and RGS22 (rs7006527[C], P = 8.7 × 10−13, OR = 0.77). TGM3 encodes transglutaminase type 3, which plays a key role in production of the cornified envelope during epidermal differentiation. PMID:24403052
Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample
Gilks, William P.; Pennell, Tanya M.; Flis, Ilona; Webster, Matthew T.; Morrow, Edward H.
2016-01-01
As part of a study into the molecular genetics of sexually dimorphic complex traits, we used high-throughput sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genome of 220 hemiclonal females that were heterozygous for the same Berkeley reference line genome (BDGP6/dm6), and a unique haplotype from the outbred base population (LHM). The use of a static and known genetic background enabled us to obtain sequences from whole-genome phased haplotypes. We used a BWA-Picard-GATK pipeline for mapping sequence reads to the dm6 reference genome assembly, at a median depth of coverage of 31X, and have made the resulting data publicly available in the NCBI Short Read Archive (Accession number SRP058502). We used Haplotype Caller to discover and genotype 1,726,931 small genomic variants (SNPs and indels, <200bp). Additionally we detected and genotyped 167 large structural variants (1-100Kb in size) using GenomeStrip/2.0. Sequence and genotype data are publicly available at the corresponding NCBI databases: Short Read Archive, dbSNP and dbVar (BioProject PRJNA282591). We have also released the unfiltered genotype data, and the code and logs for data processing and summary statistics (https://zenodo.org/communities/sussex_drosophila_sequencing/). PMID:27928499
Whole-Exome Sequencing Identifies Novel Variants for Tooth Agenesis.
Dinckan, N; Du, R; Petty, L E; Coban-Akdemir, Z; Jhangiani, S N; Paine, I; Baugh, E H; Erdem, A P; Kayserili, H; Doddapaneni, H; Hu, J; Muzny, D M; Boerwinkle, E; Gibbs, R A; Lupski, J R; Uyguner, Z O; Below, J E; Letra, A
2018-01-01
Tooth agenesis is a common craniofacial abnormality in humans and represents failure to develop 1 or more permanent teeth. Tooth agenesis is complex, and variations in about a dozen genes have been reported as contributing to the etiology. Here, we combined whole-exome sequencing, array-based genotyping, and linkage analysis to identify putative pathogenic variants in candidate disease genes for tooth agenesis in 10 multiplex Turkish families. Novel homozygous and heterozygous variants in LRP6, DKK1, LAMA3, and COL17A1 genes, as well as known variants in WNT10A, were identified as likely pathogenic in isolated tooth agenesis. Novel variants in KREMEN1 were identified as likely pathogenic in 2 families with suspected syndromic tooth agenesis. Variants in more than 1 gene were identified segregating with tooth agenesis in 2 families, suggesting oligogenic inheritance. Structural modeling of missense variants suggests deleterious effects to the encoded proteins. Functional analysis of an indel variant (c.3607+3_6del) in LRP6 suggested that the predicted resulting mRNA is subject to nonsense-mediated decay. Our results support a major role for WNT pathways genes in the etiology of tooth agenesis while revealing new candidate genes. Moreover, oligogenic cosegregation was suggestive for complex inheritance and potentially complex gene product interactions during development, contributing to improved understanding of the genetic etiology of familial tooth agenesis.
Goettel, Wolfgang; Xia, Eric; Upchurch, Robert; Wang, Ming-Li; Chen, Pengyin; An, Yong-Qiang Charles
2014-04-23
Variation in seed oil composition and content among soybean varieties is largely attributed to differences in transcript sequences and/or transcript accumulation of oil production related genes in seeds. Discovery and analysis of sequence and expression variations in these genes will accelerate soybean oil quality improvement. In an effort to identify these variations, we sequenced the transcriptomes of soybean seeds from nine lines varying in oil composition and/or total oil content. Our results showed that 69,338 distinct transcripts from 32,885 annotated genes were expressed in seeds. A total of 8,037 transcript expression polymorphisms and 50,485 transcript sequence polymorphisms (48,792 SNPs and 1,693 small Indels) were identified among the lines. Effects of the transcript polymorphisms on their encoded protein sequences and functions were predicted. The studies also provided independent evidence that the lack of FAD2-1A gene activity and a non-synonymous SNP in the coding sequence of FAB2C caused elevated oleic acid and stearic acid levels in soybean lines M23 and FAM94-41, respectively. As a proof-of-concept, we developed an integrated RNA-seq and bioinformatics approach to identify and functionally annotate transcript polymorphisms, and demonstrated its high effectiveness for discovery of genetic and transcript variations that result in altered oil quality traits. The collection of transcript polymorphisms coupled with their predicted functional effects will be a valuable asset for further discovery of genes, gene variants, and functional markers to improve soybean oil quality.
Population genetic study of 34 X-Chromosome markers in 5 main ethnic groups of China.
Zhang, Suhua; Bian, Yingnan; Li, Li; Sun, Kuan; Wang, Zheng; Zhao, Qi; Zha, Lagabaiyila; Cai, Jifeng; Gao, Yuzhen; Ji, Chaoneng; Li, Chengtao
2015-12-04
As a multi-ethnic country, China has some indigenous population groups which vary in culture and social customs, perhaps as a result of geographic isolation and different traditions. However, upon close interactions and intermarriage, admixture of different gene pools among these ethnic groups may occur. In order to gain more insight into the genetic background of the X-Chromosome in these ethnic groups, a set of X-markers (18 X-STRs and 16 X-Indels) was genotyped in 5 main ethnic groups of China (HAN, HUI, Uygur, Mongolian, Tibetan). Twenty-three private alleles were detected in HAN, Uygur, Tibetan and Mongolian. Significant differences (p < 0.0001) were observed for all 3 parameters of heterozygosity (Ho, He and UHe) among the 5 ethnic groups. The highest values of Nei genetic distance were always observed for the HUI-Uygur pair, whether X-STRs and X-Indels were analyzed separately or combined. Phylogenetic tree and PCA analyses revealed a clear pattern of population differentiation of HUI and Uygur. However, the HAN, Tibetan and Mongolian ethnic groups were closely clustered. Eighteen X-Indels exhibited in general congruent phylogenetic signal and similar clustering among the 5 ethnic groups compared with 16 X-STRs. The aforementioned results demonstrate the genetic polymorphism and potential of the 34 X-markers in the 5 ethnic groups.
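The three heterozygosity parameters named in the abstract above (Ho, He and UHe) can be illustrated with a minimal Python sketch. It uses the standard textbook definitions, not the authors' software or data: He is one minus the sum of squared allele frequencies, UHe applies the usual n/(n-1) small-sample correction where n is the number of sampled gene copies, and Ho is the fraction of diploid individuals carrying two different alleles (for X-linked markers this assumes only females are counted). All function names and the toy alleles are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """He = 1 - sum(p_i^2), with p_i the frequency of allele i in the sample."""
    n = len(alleles)
    if n == 0:
        raise ValueError("need at least one sampled allele")
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def unbiased_expected_heterozygosity(alleles):
    """UHe = n / (n - 1) * He, where n is the number of sampled gene copies."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    return n / (n - 1) * expected_heterozygosity(alleles)


def observed_heterozygosity(genotypes):
    """Ho = fraction of diploid genotypes (pairs of alleles) that are heterozygous."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    return sum(a != b for a, b in genotypes) / len(genotypes)


if __name__ == "__main__":
    # Toy data for one marker: two individuals, four gene copies (hypothetical).
    genotypes = [("A", "B"), ("A", "B")]
    alleles = [allele for pair in genotypes for allele in pair]
    print(observed_heterozygosity(genotypes))
    print(expected_heterozygosity(alleles))
    print(unbiased_expected_heterozygosity(alleles))
```

Per-locus values computed this way would typically be averaged over loci before populations are compared; the abstract does not say how the authors aggregated them.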
Identification and validation of loss of function variants in clinical contexts.
Lescai, Francesco; Marasco, Elena; Bacchelli, Chiara; Stanier, Philip; Mantovani, Vilma; Beales, Philip
2014-01-01
The choice of an appropriate variant calling pipeline for exome sequencing data is becoming increasingly more important in translational medicine projects and clinical contexts. Within GOSgene, which facilitates genetic analysis as part of a joint effort of the University College London and the Great Ormond Street Hospital, we aimed to optimize a variant calling pipeline suitable for our clinical context. We implemented the GATK/Queue framework and evaluated the performance of its two callers: the classical UnifiedGenotyper and the new variant discovery tool HaplotypeCaller. We performed an experimental validation of the loss-of-function (LoF) variants called by the two methods using Sequenom technology. UnifiedGenotyper showed a total validation rate of 97.6% for LoF single-nucleotide polymorphisms (SNPs) and 92.0% for insertions or deletions (INDELs), whereas HaplotypeCaller was 91.7% for SNPs and 55.9% for INDELs. We confirm that GATK/Queue is a reliable pipeline in translational medicine and clinical context. We conclude that in our working environment, UnifiedGenotyper is the caller of choice, being an accurate method, with a high validation rate of error-prone calls like LoF variants. We finally highlight the importance of experimental validation, especially for INDELs, as part of a standard pipeline in clinical environments.
Salehi, Samaneh; Emadi-Baygi, Modjtaba; Rezaei, Majdaddin; Kelishadi, Roya; Nikpour, Parvaneh
2017-01-01
Metabolic syndrome (MetS) is a common disorder which is a constellation of clinical features including abdominal obesity, increased levels of serum triglycerides (TGs), decreased serum high-density lipoprotein-cholesterol (HDL-C), elevated blood pressure, and glucose intolerance. The apolipoprotein A5 (APOA5) is involved in lipid metabolism, influencing the level of plasma TG and HDL-C. In the present study, we aimed to investigate the associations between four INDEL variants of the APOA5 gene and MetS risk. In this case-control study, we genotyped 116 Iranian children and adolescents with/without MetS for these INDELs by Sanger sequencing. Then, we explored the association of the INDELs with MetS risk and its clinical components by logistic regression and one-way analysis of variance. We identified a novel insertion polymorphism, c. *282-283 insAG/c. *282-283 insG variant, which appears among case and control groups. rs72525532 showed a significant difference for TG levels between various genotype groups. In addition, there were significant associations between newly identified single-nucleotide polymorphism (SNP) and rs72525532 with MetS risk. These results show that rs72525532 and the newly identified SNP may influence the susceptibility of the individuals to MetS.
Maheshwari, Shamoni; Barbash, Daniel A.
2012-01-01
Hybrid incompatibility (HI) genes are frequently observed to be rapidly evolving under selection. This observation has led to the attractive conjecture that selection-derived protein-sequence divergence is culpable for incompatibilities in hybrids. The Drosophila simulans HI gene Lethal hybrid rescue (Lhr) is an intriguing case, because despite having experienced rapid sequence evolution, its HI properties are a shared function inherited from the ancestral state. Using an unusual D. simulans Lhr hybrid rescue allele, Lhr2, we here identify a conserved stretch of 10 amino acids in the C terminus of LHR that is critical for causing hybrid incompatibility. Altering these 10 amino acids weakens or abolishes the ability of Lhr to suppress the hybrid rescue alleles Lhr1 or Hmr1, respectively. Besides single-amino-acid substitutions, Lhr orthologs differ by a 16-aa indel polymorphism, with the ancestral deletion state fixed in D. melanogaster and the derived insertion state at very high frequency in D. simulans. Lhr2 is a rare D. simulans allele that has the ancestral deletion state of the 16-aa polymorphism. Through a series of transgenic constructs we demonstrate that the ancestral deletion state contributes to the rescue activity of Lhr2. This indel is thus a polymorphism that can affect the HI function of Lhr. PMID:22865735
Li, Juan; Chen, Fen; Sugiyama, Hiromu; Blair, David; Lin, Rui-Qing; Zhu, Xing-Quan
2015-07-01
In the present study, near-complete mitochondrial (mt) genome sequences for Schistosoma japonicum from different regions in the Philippines and Japan were amplified and sequenced. Comparisons among S. japonicum from the Philippines, Japan, and China revealed a geographically based length difference in mt genomes, but the mt genomic organization and gene arrangement were the same. Sequence differences among samples from the Philippines and all samples from the three endemic areas were 0.57-2.12 and 0.76-3.85 %, respectively. The most variable part of the mt genome was the non-coding region. In the coding portion of the genome, protein-coding genes varied more than rRNA genes and tRNAs. The near-complete mt genome sequences for Philippine specimens were identical in length (14,091 bp) which was 4 bp longer than those of S. japonicum samples from Japan and China. This indel provides a unique genetic marker for S. japonicum samples from the Philippines. Phylogenetic analyses based on the concatenated amino acids of 12 protein-coding genes showed that samples of S. japonicum clustered according to their geographical origins. The identified mitochondrial indel marker will be useful for tracing the source of S. japonicum infection in humans and animals in Southeast Asia.
Luo, Ruibang; Wong, Yiu-Lun; Law, Wai-Chun; Lee, Lap-Kei; Cheung, Jeanno; Liu, Chi-Man; Lam, Tak-Wah
2014-01-01
This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPUs and intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA's speed is rooted in its parallel algorithms, which effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates app development for downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.
Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling
Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien
2012-01-01
The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697
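The idea in the abstract above of reading a heterozygous base-call string as two distinct sequences compared against a reference can be illustrated with a minimal Python sketch. This is not MSR's algorithm, which the abstract does not describe. It assumes the mixed calls are already written as two-base IUPAC ambiguity codes, that the read and reference are the same length and aligned, and that only substitutions are present (no indels, which shift the two traces against each other). The function name and toy sequences are hypothetical, and the split does not establish phase between distant sites.

```python
# Standard IUPAC nucleotide codes for single bases and two-base mixtures.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def split_mixed_calls(mixed, reference):
    """Split a mixed base-call string into two sequences.

    At each heterozygous position the reference base, if it is one of the
    two mixed bases, goes to the first sequence and the other base goes to
    the second. If the reference base is absent, the two bases are assigned
    in alphabetical order (an arbitrary choice for this sketch).
    """
    if len(mixed) != len(reference):
        raise ValueError("mixed read and reference must have equal length")
    first, second = [], []
    for call, ref in zip(mixed.upper(), reference.upper()):
        bases = IUPAC[call]  # KeyError for unsupported codes such as N
        if len(bases) == 1:
            first.append(bases)
            second.append(bases)
        elif ref in bases:
            first.append(ref)
            second.append(bases.replace(ref, ""))
        else:
            first.append(bases[0])
            second.append(bases[1])
    return "".join(first), "".join(second)


if __name__ == "__main__":
    # R = A/G mixture at the third position; the reference carries G there.
    print(split_mixed_calls("ACRT", "ACGT"))
```

Handling heterozygous indels, as the programs mentioned in the abstract do, needs an additional step that finds the offset between the two overlapping traces; that step is outside the scope of this sketch.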
Rennert, Hanna; Eng, Kenneth; Zhang, Tuo; Tan, Adrian; Xiang, Jenny; Romanel, Alessandro; Kim, Robert; Tam, Wayne; Liu, Yen-Chun; Bhinder, Bhavneet; Cyrta, Joanna; Beltran, Himisha; Robinson, Brian; Mosquera, Juan Miguel; Fernandes, Helen; Demichelis, Francesca; Sboner, Andrea; Kluk, Michael; Rubin, Mark A; Elemento, Olivier
2016-01-01
We describe Exome Cancer Test v1.0 (EXaCT-1), the first New York State-Department of Health-approved whole-exome sequencing (WES)-based test for precision cancer care. EXaCT-1 uses HaloPlex (Agilent) target enrichment followed by next-generation sequencing (Illumina) of tumour and matched constitutional control DNA. We present a detailed clinical development and validation pipeline suitable for simultaneous detection of somatic point/indel mutations and copy-number alterations (CNAs). A computational framework for data analysis, reporting and sign-out is also presented. For the validation, we tested EXaCT-1 on 57 tumours covering five distinct clinically relevant mutations. Results demonstrated elevated and uniform coverage compatible with clinical testing as well as complete concordance in variant quality metrics between formalin-fixed paraffin embedded and fresh-frozen tumours. Extensive sensitivity studies identified limits of detection threshold for point/indel mutations and CNAs. Prospective analysis of 337 cancer cases revealed mutations in clinically relevant genes in 82% of tumours, demonstrating that EXaCT-1 is an accurate and sensitive method for identifying actionable mutations, with reasonable costs and time, greatly expanding its utility for advanced cancer care. PMID:28781886
Genome-wide DNA polymorphisms in two cultivars of mei (Prunus mume sieb. et zucc.).
Sun, Lidan; Zhang, Qixiang; Xu, Zongda; Yang, Weiru; Guo, Yu; Lu, Jiuxing; Pan, Huitang; Cheng, Tangren; Cai, Ming
2013-10-06
Mei (Prunus mume Sieb. et Zucc.) is a famous ornamental plant and fruit crop grown in East Asian countries. Limited genetic resources, especially molecular markers, have hindered the progress of mei breeding projects. Here, we performed low-depth whole-genome sequencing of Prunus mume 'Fenban' and Prunus mume 'Kouzi Yudie' to identify high-quality polymorphic markers between the two cultivars on a large scale. A total of 1464.1 Mb and 1422.1 Mb of 'Fenban' and 'Kouzi Yudie' sequencing data were uniquely mapped to the mei reference genome with about 6-fold coverage, respectively. We detected a large number of putative polymorphic markers from the 196.9 Mb of sequencing data shared by the two cultivars, which together contained 200,627 SNPs, 4,900 InDels, and 7,063 SSRs. Among these markers, 38,773 SNPs, 174 InDels, and 418 SSRs were distributed in the 22.4 Mb CDS region, and 63.0% of these marker-containing CDS sequences were assigned to GO terms. Subsequently, 670 selected SNPs were validated using an Agilent's SureSelect solution phase hybridization assay. A subset of 599 SNPs was used to assess the genetic similarity of a panel of mei germplasm samples and a plum (P. salicina) cultivar, producing a set of informative diversity data. We also analyzed the frequency and distribution of detected InDels and SSRs in mei genome and validated their usefulness as DNA markers. These markers were successfully amplified in the cultivars and in their segregating progeny. A large set of high-quality polymorphic SNPs, InDels, and SSRs were identified in parallel between 'Fenban' and 'Kouzi Yudie' using low-depth whole-genome sequencing. The study presents extensive data on these polymorphic markers, which can be useful for constructing high-resolution genetic maps, performing genome-wide association studies, and designing genomic selection strategies in mei.
Analyzing Somatic Genome Rearrangements in Human Cancers by Using Whole-Exome Sequencing
Yang, Lixing; Lee, Mi-Sook; Lu, Hengyu; Oh, Doo-Yi; Kim, Yeon Jeong; Park, Donghyun; Park, Gahee; Ren, Xiaojia; Bristow, Christopher A.; Haseley, Psalm S.; Lee, Soohyun; Pantazi, Angeliki; Kucherlapati, Raju; Park, Woong-Yang; Scott, Kenneth L.; Choi, Yoon-La; Park, Peter J.
2016-01-01
Although exome sequencing data are generated primarily to detect single-nucleotide variants and indels, they can also be used to identify a subset of genomic rearrangements whose breakpoints are located in or near exons. Using >4,600 tumor and normal pairs across 15 cancer types, we identified over 9,000 high confidence somatic rearrangements, including a large number of gene fusions. We find that the 5′ fusion partners of functional fusions are often housekeeping genes, whereas the 3′ fusion partners are enriched in tyrosine kinases. We establish the oncogenic potential of ROR1-DNAJC6 and CEP85L-ROS1 fusions by showing that they can promote cell proliferation in vitro and tumor formation in vivo. Furthermore, we found that ∼4% of the samples have massively rearranged chromosomes, many of which are associated with upregulation of oncogenes such as ERBB2 and TERT. Although the sensitivity of detecting structural alterations from exomes is considerably lower than that from whole genomes, this approach will be fruitful for the multitude of exomes that have been and will be generated, both in cancer and in other diseases. PMID:27153396
Evo-devo models of tooth development and the origin of hominoid molar diversity
Bailey, Shara E.; Schwartz, Gary T.; Skinner, Matthew M.
2018-01-01
The detailed anatomical features that characterize fossil hominin molars figure prominently in the reconstruction of their taxonomy, phylogeny, and paleobiology. Despite the prominence of molar form in human origins research, the underlying developmental mechanisms generating the diversity of tooth crown features remain poorly understood. A model of tooth morphogenesis—the patterning cascade model (PCM)—provides a developmental framework to explore how and why the varying molar morphologies arose throughout human evolution. We generated virtual maps of the inner enamel epithelium—an indelibly preserved record of enamel knot arrangement—in 17 living and fossil hominoid species to investigate whether the PCM explains the expression of all major accessory cusps. We found that most of the variation and evolutionary changes in hominoid molar morphology followed the general developmental rule shared by all mammals, outlined by the PCM. Our results have implications for the accurate interpretation of molar crown configuration in hominoid systematics. PMID:29651459
Large-scale whole-genome sequencing of the Icelandic population.
Gudbjartsson, Daniel F; Helgason, Hannes; Gudjonsson, Sigurjon A; Zink, Florian; Oddson, Asmundur; Gylfason, Arnaldur; Besenbacher, Soren; Magnusson, Gisli; Halldorsson, Bjarni V; Hjartarson, Eirikur; Sigurdsson, Gunnar Th; Stacey, Simon N; Frigge, Michael L; Holm, Hilma; Saemundsdottir, Jona; Helgadottir, Hafdis Th; Johannsdottir, Hrefna; Sigfusson, Gunnlaugur; Thorgeirsson, Gudmundur; Sverrisson, Jon Th; Gretarsdottir, Solveig; Walters, G Bragi; Rafnar, Thorunn; Thjodleifsson, Bjarni; Bjornsson, Einar S; Olafsson, Sigurdur; Thorarinsdottir, Hildur; Steingrimsdottir, Thora; Gudmundsdottir, Thora S; Theodors, Asgeir; Jonasson, Jon G; Sigurdsson, Asgeir; Bjornsdottir, Gyda; Jonsson, Jon J; Thorarensen, Olafur; Ludvigsson, Petur; Gudbjartsson, Hakon; Eyjolfsson, Gudmundur I; Sigurdardottir, Olof; Olafsson, Isleifur; Arnar, David O; Magnusson, Olafur Th; Kong, Augustine; Masson, Gisli; Thorsteinsdottir, Unnur; Helgason, Agnar; Sulem, Patrick; Stefansson, Kari
2015-05-01
Here we describe the insights gained from sequencing the whole genomes of 2,636 Icelanders to a median depth of 20×. We found 20 million SNPs and 1.5 million insertions-deletions (indels). We describe the density and frequency spectra of sequence variants in relation to their functional annotation, gene position, pathway and conservation score. We demonstrate an excess of homozygosity and rare protein-coding variants in Iceland. We imputed these variants into 104,220 individuals down to a minor allele frequency of 0.1% and found a recessive frameshift mutation in MYL4 that causes early-onset atrial fibrillation, several mutations in ABCB4 that increase risk of liver diseases and an intronic variant in GNAS associating with increased thyroid-stimulating hormone levels when maternally inherited. These data provide a study design that can be used to determine how variation in the sequence of the human genome gives rise to human diversity.
Kim, Joo-Sung; Artymovich, Katherine A.; Hall, David F.; Smith, Eric J.; Fulton, Richard; Bell, Julia; Dybas, Leslie; Mansfield, Linda S.; Tempelman, Robert; Wilson, David L.; Linz, John E.
2012-01-01
Human illness due to Campylobacter jejuni infection is closely associated with consumption of poultry products. We previously demonstrated a 50% shift in allele frequency (phase variation) in contingency gene Cj1139 (wlaN) during passage of C. jejuni NCTC11168 populations through Ross 308 broiler chickens. We hypothesized that phase variation in contingency genes during chicken passage could promote subsequent colonization and disease in humans. To test this hypothesis, we passaged C. jejuni strains NCTC11168, 33292, 81-176, KanR4 and CamR2 through broiler chickens and analysed the ability of passaged and non-passaged populations to colonize C57BL6 IL-10-deficient mice, our model for human colonization and disease. We utilized fragment analysis and nucleotide sequence analysis to measure phase variation in contingency genes. Passage through the chicken reservoir promoted phase variation in five specific contingency genes, and these ‘successful’ populations colonized mice. When phase variation did not occur in these same five contingency genes during chicken passage, these ‘unsuccessful’ populations failed to colonize mice. Phase variation during chicken passage generated small insertions or deletions (indels) in the homopolymeric tract (HT) in contingency genes. Single-colony isolates of C. jejuni strain KanR4 carrying an allele of contingency gene Cj0170 with a 10G HT colonized mice at high frequency and caused disease symptoms, whereas single-colony isolates carrying the 9G allele failed to colonize mice. Supporting results were observed for the successful 9G allele of Cj0045 in strain 33292. These data suggest that phase variation in Cj0170 and Cj0045 is strongly associated with mouse colonization and disease, and that the chicken reservoir can play an active role in natural selection, phase variation and disease. PMID:22343355
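The 10G versus 9G comparison in the abstract above comes down to measuring the length of a homopolymeric tract. A minimal sketch of that measurement follows; the two toy alleles are invented for illustration and are not real Cj0170 sequence, and the function names are hypothetical. The frame check only encodes the general point that a tract-length change that is not a multiple of three shifts the downstream reading frame when the tract lies inside a coding region.

```python
import re


def longest_homopolymer(seq: str, base: str = "G") -> int:
    """Return the length of the longest uninterrupted run of `base` in `seq`."""
    runs = re.findall(f"{re.escape(base)}+", seq.upper())
    return max((len(run) for run in runs), default=0)


def tract_keeps_frame(tract_len: int, reference_len: int) -> bool:
    """True when the tract differs from the reference length by a multiple of 3."""
    return (tract_len - reference_len) % 3 == 0


# Hypothetical toy alleles: identical flanks, tract of 10 or 9 G residues.
allele_10g = "ATGACC" + "G" * 10 + "TTACGA"
allele_9g = "ATGACC" + "G" * 9 + "TTACGA"

len_10g = longest_homopolymer(allele_10g)
len_9g = longest_homopolymer(allele_9g)
same_frame = tract_keeps_frame(len_9g, len_10g)
```

Here the two alleles differ by a single base in the tract, so `same_frame` is false: the 9G and 10G variants would be read in different frames downstream of the tract.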
Waardenburg syndrome: Novel mutations in a large Brazilian sample.
Bocángel, Magnolia Astrid Pretell; Melo, Uirá Souto; Alves, Leandro Ucela; Pardono, Eliete; Lourenço, Naila Cristina Vilaça; Marcolino, Humberto Vicente Cezar; Otto, Paulo Alberto; Mingroni-Netto, Regina Célia
2018-06-01
This paper deals with the molecular investigation of Waardenburg syndrome (WS) in a sample of 49 clinically diagnosed probands (most from southeastern Brazil), 24 of them having the type 1 (WS1) variant (10 familial and 14 isolated cases) and 25 being affected by the type 2 (WS2) variant (five familial and 20 isolated cases). Sequential Sanger sequencing of all coding exons of PAX3, MITF, EDN3, EDNRB, SOX10 and SNAI2 genes, followed by CNV detection by MLPA of PAX3, MITF and SOX10 genes in selected cases revealed many novel pathogenic variants. Molecular screening, performed in all patients, revealed 19 causative variants (19/49 = 38.8%), six of them being large whole-exon deletions detected by MLPA, seven (four missense and three nonsense substitutions) resulting from single nucleotide substitutions (SNV), and six representing small indels. A pair of dizygotic affected female twins presented the c.430delC variant in SOX10, but the mutation, imputed to gonadal mosaicism, was not found in their unaffected parents. At least 10 novel causative mutations, described in this paper, were found in this Brazilian sample. Copy-number-variation detected by MLPA identified the causative mutation in 12.2% of our cases, corresponding to 31.6% of all causative mutations. In the majority of cases, the deletions were sporadic, since they were not present in the parents of isolated cases. Our results, as a whole, reinforce the fact that the screening of copy-number-variants by MLPA is a powerful tool to identify the molecular cause in WS patients. Copyright © 2018 Elsevier Masson SAS. All rights reserved.
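The percentages quoted in the abstract above can be re-derived from the counts it reports. The short check below uses only those counts (49 probands; 6 whole-exon deletions, 7 single nucleotide substitutions and 6 small indels); the variable names are illustrative.

```python
# Counts taken from the abstract above.
probands = 49
causative = {
    "whole-exon deletions (MLPA)": 6,
    "single nucleotide substitutions": 7,
    "small indels": 6,
}

total_causative = sum(causative.values())

# Share of probands with an identified causative variant.
detection_rate = round(100 * total_causative / probands, 1)

# MLPA-detected deletions as a share of all cases and of all causative variants.
mlpa = causative["whole-exon deletions (MLPA)"]
mlpa_share_of_cases = round(100 * mlpa / probands, 1)
mlpa_share_of_causative = round(100 * mlpa / total_causative, 1)
```

The three derived values agree with the 38.8%, 12.2% and 31.6% given in the text.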
Neutral theory, microbial practice: challenges in bacterial population genetics.
Rocha, Eduardo P C
2018-04-19
Kimura's outstanding contributions to population genetics included many elegant theoretical results on the vagaries of alleles in populations. Once polymorphism data showed extensive variation in natural populations, these results led naturally to the Neutral Theory. In this article, I'll depart from some of these results to focus on four major open problems in microbial population genetics with direct implications to the study of molecular evolution: the lack of neutral polymorphism, the modeling of genetic exchanges, the population genetics of ill-defined populations, and the difficulty of untangling selection and demography in the light of the previous issues. Whilst studies in population genetics usually focus on single nucleotide polymorphism and allelic recombination, ignoring even small indels, a large fraction of genetic diversification in Bacteria results from horizontal gene transfer. Ignoring this fact defeats the purpose of population genetics: to characterize the genetic variation in populations and their adaptive effects. I'll argue that, following on Kimura's life work, one may need to develop new approaches to study microbes that reproduce asexually but are able to engage in gene exchanges with very distantly related organisms in a context where random sampling is often unachievable, populations are ill-defined, genetic linkage is strong, and random drift is rare.
An integrated map of genetic variation from 1,092 human genomes
2012-01-01
Summary Through characterising the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help understand the genetic contribution to disease. We describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methodologies to integrate information across multiple algorithms and diverse data sources we provide a validated haplotype map of 38 million SNPs, 1.4 million indels and over 14 thousand larger deletions. We show that individuals from different populations carry different profiles of rare and common variants and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways and that each individual harbours hundreds of rare non-coding variants at conserved sites, such as transcription-factor-motif disrupting changes. This resource, which captures up to 98% of accessible SNPs at a frequency of 1% in populations of medical genetics focus, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations. PMID:23128226
A high-resolution cattle CNV map by population-scale genome sequencing
USDA-ARS's Scientific Manuscript database
Copy Number Variations (CNVs) are common genomic structural variations that have been linked to human diseases and phenotypic traits. CNVs represent an important type of genetic variation among cattle breeds and even individual animals; however, only low-resolution maps of cattle CNVs currently exis...
[Analysis of chloroplast rpS16 intron sequences in Lemnaceae].
Martirosian, E V; Ryzhova, N N; Kochieva, E Z; Skriabin, K G
2009-01-01
Chloroplast rpS16 gene intron sequences were determined and characterized for twenty-five Lemnaceae accessions representing nine duckweed species. Nucleotide substitutions were detected in every Lemnaceae species, and different indels were detected in Lemna minor, Lemna aequinoctialis and Wolffia arrhiza. Most indels were found in Wolffia arrhiza and Lemna aequinoctialis. Analysis of intraspecific polymorphism identified several haplotypes in L. gibba and L. trisulca. Lemnaceae phylogenetic relationships based on rpS16 intron variability data revealed significant differences between L. aequinoctialis and other Lemna species. Genetic distance values corroborated the separation of Landoltia punctata from Spirodela into an independent genus. The suitability of rpS16 intron sequences for phylogenetic studies in Lemnaceae was demonstrated.
VCGDB: a dynamic genome database of the Chinese population
2014-01-01
Background The data released by the 1000 Genomes Project contain an increasing number of genome sequences from different nations and populations with a large number of genetic variations. As a result, the focus of human genome studies is changing from single and static to complex and dynamic. The currently available human reference genome (GRCh37) is based on sequencing data from 13 anonymous Caucasian volunteers, which might limit the scope of genomics, transcriptomics, epigenetics, and genome wide association studies. Description We used the massive amount of sequencing data published by the 1000 Genomes Project Consortium to construct the Virtual Chinese Genome Database (VCGDB), a dynamic genome database of the Chinese population based on the whole genome sequencing data of 194 individuals. VCGDB provides dynamic genomic information, which contains 35 million single nucleotide variations (SNVs), 0.5 million insertions/deletions (indels), and 29 million rare variations, together with genomic annotation information. VCGDB also provides a highly interactive user-friendly virtual Chinese genome browser (VCGBrowser) with functions like seamless zooming and real-time searching. In addition, we have established three population-specific consensus Chinese reference genomes that are compatible with mainstream alignment software. Conclusions VCGDB offers a feasible strategy for processing big data to keep pace with the biological data explosion by providing a robust resource for genomics studies; in particular, studies aimed at finding regions of the genome associated with diseases. PMID:24708222
Friedman, Esther M.; Montez, Jennifer Karas; Sheehan, Connor McDevitt; Guenewald, Tara L.; Seeman, Teresa E.
2015-01-01
Objective Adverse events in childhood can indelibly influence adult health. While evidence for this association has mounted, a fundamental set of questions about how to operationalize adverse events has been understudied. Method We used data from the National Survey of Midlife Development in the United States to examine how quantity, timing, and types of adverse events in childhood are associated with adult cardiometabolic health. Results The best-fitting specification of quantity of events was a linear measure reflecting a dose–response relationship. Timing of event mattered less than repeated exposure to events. Regarding the type of event, academic interruptions and sexual/physical abuse were most important. Adverse childhood events elevated the risk of diabetes and obesity similarly for men and women but had a greater impact on women’s risk of heart disease. Discussion Findings demonstrate the insights that can be gleaned about the early-life origins of adult health by examining operationalization of childhood exposures. PMID:25903978
2012-01-01
Background The central role of the somatotrophic axis in animal post-natal growth, development and fertility is well established. Therefore, the identification of genetic variants affecting quantitative traits within this axis is an attractive goal. However, large sample numbers are a pre-requisite for the identification of genetic variants underlying complex traits and although technologies are improving rapidly, high-throughput sequencing of large numbers of complete individual genomes remains prohibitively expensive. Therefore using a pooled DNA approach coupled with target enrichment and high-throughput sequencing, the aim of this study was to identify polymorphisms and estimate allele frequency differences across 83 candidate genes of the somatotrophic axis, in 150 Holstein-Friesian dairy bulls divided into two groups divergent for genetic merit for fertility. Results In total, 4,135 SNPs and 893 indels were identified during the resequencing of the 83 candidate genes. Nineteen percent (n = 952) of variants were located within 5' and 3' UTRs. Seventy-two percent (n = 3,612) were intronic and 9% (n = 464) were exonic, including 65 indels and 236 SNPs resulting in non-synonymous substitutions (NSS). Significant (P < 0.01) mean allele frequency differentials between the low and high fertility groups were observed for 720 SNPs (58 NSS). Allele frequencies for 43 of the SNPs were also determined by genotyping the 150 individual animals (Sequenom® MassARRAY). No significant differences (P > 0.1) were observed between the two methods for any of the 43 SNPs across both pools (i.e., 86 tests in total). Conclusions The results of the current study support previous findings of the use of DNA sample pooling and high-throughput sequencing as a viable strategy for polymorphism discovery and allele frequency estimation. 
Using this approach we have characterised the genetic variation within genes of the somatotrophic axis and related pathways, central to mammalian post-natal growth and development and subsequent lactogenesis and fertility. We have identified a large number of variants segregating at significantly different frequencies between cattle groups divergent for calving interval plausibly harbouring causative variants contributing to heritable variation. To our knowledge, this is the first report describing sequencing of targeted genomic regions in any livestock species using groups with divergent phenotypes for an economically important trait. PMID:22235840
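The variant breakdown reported in the Results above is internally consistent, which a few lines of arithmetic confirm. The sketch uses only the counts from the abstract; the dictionary keys and helper name are illustrative.

```python
# Counts taken from the abstract above.
snps, indels = 4135, 893
by_location = {"UTR": 952, "intronic": 3612, "exonic": 464}

total_variants = snps + indels


def percent_breakdown(counts: dict, total: int) -> dict:
    """Return each category as a whole-number percentage of `total`."""
    return {name: round(100 * n / total) for name, n in counts.items()}


shares = percent_breakdown(by_location, total_variants)
locations_sum_to_total = sum(by_location.values()) == total_variants
```

The three location classes add up to the 5,028 variants found, and the rounded shares reproduce the 19%, 72% and 9% stated in the text.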
Differential evolution-simulated annealing for multiple sequence alignment
NASA Astrophysics Data System (ADS)
Addawe, R. C.; Addawe, J. M.; Sueño, M. R. K.; Magadia, J. C.
2017-10-01
Multiple sequence alignments (MSA) are used in the analysis of molecular evolution and sequence structure relationships. In this paper, a hybrid algorithm, Differential Evolution - Simulated Annealing (DESA) is applied in optimizing multiple sequence alignments (MSAs) based on structural information, non-gaps percentage and totally conserved columns. DESA is a robust algorithm characterized by self-organization, mutation, crossover, and SA-like selection scheme of the strategy parameters. Here, the MSA problem is treated as a multi-objective optimization problem of the hybrid evolutionary algorithm, DESA. Thus, we name the algorithm as DESA-MSA. Simulated sequences and alignments were generated to evaluate the accuracy and efficiency of DESA-MSA using different indel sizes, sequence lengths, deletion rates and insertion rates. The proposed hybrid algorithm obtained acceptable solutions particularly for the MSA problem evaluated based on the three objectives.
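Two of the objectives named in the abstract above, non-gaps percentage and totally conserved columns, are simple to compute for a given alignment. The sketch below uses common definitions of both, which is an assumption: the abstract does not give the exact formulas used in DESA-MSA, and the structural-information objective is not covered. The toy alignment and function names are made up for illustration.

```python
def non_gap_percentage(alignment: list[str], gap: str = "-") -> float:
    """Percentage of alignment cells that hold a residue rather than a gap."""
    total = sum(len(row) for row in alignment)
    non_gaps = sum(ch != gap for row in alignment for ch in row)
    return 100.0 * non_gaps / total


def totally_conserved_columns(alignment: list[str], gap: str = "-") -> int:
    """Number of columns with the same residue in every row and no gaps."""
    return sum(
        1
        for column in zip(*alignment)
        if gap not in column and len(set(column)) == 1
    )


# Hypothetical toy alignment: three sequences, six columns.
toy_alignment = [
    "ACG-TA",
    "ACGGTA",
    "AC--TA",
]

ngp = non_gap_percentage(toy_alignment)
tcc = totally_conserved_columns(toy_alignment)
```

In this toy case 15 of the 18 cells are residues and columns 1, 2, 5 and 6 are fully conserved. An optimizer of the kind described would score each candidate alignment with functions like these and try to improve all objectives together.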
Salces-Ortiz, Judit; Ramón, Manuel; González, Carmen; Pérez-Guzmán, M Dolores; Garde, J Julián; García-Álvarez, Olga; Maroto-Morales, Alejandro; Calvo, Jorge H; Serrano, M Magdalena
2015-01-01
Heat shock (HS) is one of the best-studied exogenous cellular stresses. Almost all tissues, cell types, metabolic pathways and biochemical reactions are affected to a greater or lesser extent by HS. However, some cell types, such as mammalian male germ cells, are especially thermosensitive. The present study examined the role of three INDELs, in conjunction with the -660G/C polymorphism located in the HSP90AA1 promoter region, in the gene expression rate under HS. Specifically, the -668insC INDEL, which is very close to the -660G/C transversion, is a good candidate to be involved in the transcriptional regulation of the gene, either by itself or cooperatively with this SNP. Animals carrying the genotype II-668 showed higher transcription rates than those with ID-668 (FC = 3.07) and DD-668 (FC = 3.40) genotypes for samples collected under HS. A link between gene expression and sperm DNA fragmentation was also found. When HS conditions were present throughout or in some stages of spermatogenesis, alternative genotypes of the -668insC and -660G/C mutations were involved in the effect of HS on sperm DNA fragmentation. Thus, unfavorable genotypes in terms of gene expression induction (ID-668GC-660 and DD-668GG-660) do not produce enough mRNA (stored as messenger ribonucleoprotein particles) and Hsp90α protein to cope with future thermal stress, which might occur in later stages when transcriptional activity is reduced and cell types and molecular processes are more sensitive to heat (spermatocytes in pachytene and spermatid protamination). This would impair DNA packaging and consequently compromise the events occurring shortly after fertilization and during embryonic development. 
In the short term, assessing the relationship between sperm DNA fragmentation sensitivity and ram fertility will help to better understand the mechanisms of response to HS and its consequences for animal production and reproductive performance.
dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms.
Puritz, Jonathan B; Hollenbeck, Christopher M; Gold, John R
2014-01-01
Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both paired-end reads from RADseq data to efficiently produce population-informative variant calls, especially for non-model organisms with large effective population sizes and high levels of genetic polymorphism. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is because dDocent quality trims instead of filtering and incorporates both forward and reverse reads (including reads with INDEL polymorphisms) in assembly, mapping, and SNP calling. The pipeline and a comprehensive user guide can be found at http://dDocent.wordpress.com. PMID:24949246
Effective screen of CRISPR/Cas9-induced mutants in rice by single-strand conformation polymorphism.
Zheng, Xuelian; Yang, Shixin; Zhang, Dengwei; Zhong, Zhaohui; Tang, Xu; Deng, Kejun; Zhou, Jianping; Qi, Yiping; Zhang, Yong
2016-07-01
A method based on DNA single-strand conformation polymorphism is demonstrated for effective genotyping of CRISPR/Cas9-induced mutants in rice. Clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated 9 (Cas9) has been widely adopted for genome editing in many organisms. A large proportion of mutations generated by CRISPR/Cas9 are very small insertions and deletions (indels), presumably because Cas9 generates blunt-ended double-strand breaks which are subsequently repaired without extensive end-processing. CRISPR/Cas9 is highly effective for targeted mutagenesis in the important crop, rice. For example, homozygous mutant seedlings are commonly recovered from CRISPR/Cas9-treated calli. However, many current mutation detection methods are poorly suited to screening homozygous mutants that typically carry small indels. In this study, we tested a mutation detection method based on single-strand conformational polymorphism (SSCP). We found it can effectively detect small indels in pilot experiments. By applying the SSCP method to CRISPR/Cas9-mediated targeted mutagenesis in rice, we successfully identified multiple mutants of OsROC5 and OsDEP1. In conclusion, SSCP analysis will be a useful genotyping method for rapid identification of CRISPR/Cas9-induced mutants, including the most desirable homozygous mutants. The method also has high potential for similar applications in other plant species.
Population genetic study of 34 X-Chromosome markers in 5 main ethnic groups of China
Zhang, Suhua; Bian, Yingnan; Li, Li; Sun, Kuan; Wang, Zheng; Zhao, Qi; Zha, Lagabaiyila; Cai, Jifeng; Gao, Yuzhen; Ji, Chaoneng; Li, Chengtao
2015-01-01
As a multi-ethnic country, China has some indigenous population groups which vary in culture and social customs, perhaps as a result of geographic isolation and different traditions. However, through close interaction and intermarriage, admixture of different gene pools among these ethnic groups may occur. To gain more insight into the genetic background of the X chromosome in these ethnic groups, a set of X-markers (18 X-STRs and 16 X-Indels) was genotyped in 5 main ethnic groups of China (HAN, HUI, Uygur, Mongolian, Tibetan). Twenty-three private alleles were detected in HAN, Uygur, Tibetan and Mongolian. Significant differences (p < 0.0001) were observed among the 5 ethnic groups for all 3 heterozygosity parameters (Ho, He and UHe). The highest values of Nei's genetic distance were always observed for the HUI-Uygur pair, whether X-STRs and X-Indels were analyzed separately or combined. Phylogenetic tree and PCA analyses revealed a clear pattern of population differentiation of HUI and Uygur. However, the HAN, Tibetan and Mongolian ethnic groups were closely clustered. Eighteen X-Indels exhibited in general congruent phylogenetic signal and similar clustering among the 5 ethnic groups compared with 16 X-STRs. These results demonstrate the genetic polymorphism and potential of the 34 X-markers in the 5 ethnic groups. PMID:26634331
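The three heterozygosity parameters named in the abstract above (Ho, He and UHe) have standard textbook definitions, sketched below for a single locus. This is an illustration under assumptions, not the study's own procedure: it treats every sample as carrying two alleles, which for X-chromosome markers holds only for females, and it uses the common small-sample correction UHe = (2N / (2N - 1)) * He. The genotypes and the function name are invented.

```python
from collections import Counter


def heterozygosity(genotypes: list[tuple[str, str]]) -> tuple[float, float, float]:
    """Return (Ho, He, UHe) for one locus from two-allele genotypes."""
    n = len(genotypes)

    # Ho: observed fraction of individuals whose two alleles differ.
    ho = sum(a != b for a, b in genotypes) / n

    # He: expected heterozygosity, 1 minus the sum of squared allele frequencies.
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_alleles = 2 * n
    he = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())

    # UHe: He with a small-sample correction (assumed form, see lead-in).
    uhe = (2 * n / (2 * n - 1)) * he
    return ho, he, uhe


# Hypothetical toy genotypes at one biallelic indel locus.
toy_genotypes = [("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")]
ho, he, uhe = heterozygosity(toy_genotypes)
```

With two alleles at equal frequency, Ho and He are both 0.5 in this toy sample, and the correction raises UHe slightly above He because only four individuals were sampled.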
Katie Greenberg; Beverly S. Collins; Henry McNab; Douglas K. Miller; Gary R. Wein
2015-01-01
EXCERPT FROM: Natural Disturbances and Historic Range Variation 2015. Throughout the history of upland hardwood forests of the Central Hardwood Region, natural disturbances have been integral to shaping forest structure and composition, and essential in maintaining diverse biotic...
2013-01-01
Background Color traits in animals play crucial roles in thermoregulation, photoprotection, camouflage, and visual communication, and are amenable to objective quantification and modeling. However, the extensive variation in non-melanic pigments and structural colors in squamate reptiles has been largely disregarded. Here, we used an integrated approach to investigate the morphological basis and physical mechanisms generating variation in color traits in tropical day geckos of the genus Phelsuma. Results Combining histology, optics, mass spectrometry, and UV and Raman spectroscopy, we found that the extensive variation in color patterns within and among Phelsuma species is generated by complex interactions between, on the one hand, chromatophores containing yellow/red pteridine pigments and, on the other hand, iridophores producing structural color by constructive interference of light with guanine nanocrystals. More specifically, we show that 1) the hue of the vivid dorsolateral skin is modulated both by variation in geometry of structural, highly ordered narrowband reflectors, and by the presence of yellow pigments, and 2) that the reflectivity of the white belly and of dorsolateral pigmentary red marks, is increased by underlying structural disorganized broadband reflectors. Most importantly, these interactions require precise colocalization of yellow and red chromatophores with different types of iridophores, characterized by ordered and disordered nanocrystals, respectively. We validated these results through numerical simulations combining pigmentary components with a multilayer interferential optical model. Finally, we show that melanophores form dark lateral patterns but do not significantly contribute to variation in blue/green or red coloration, and that changes in the pH or redox state of pigments provide yet another source of color variation in squamates. 
Conclusions Precisely colocalized interacting pigmentary and structural elements generate extensive variation in lizard color patterns. Our results indicate the need to identify the developmental mechanisms responsible for the control of the size, shape, and orientation of nanocrystals, and the superposition of specific chromatophore types. This study opens up new perspectives on Phelsuma lizards as models in evolutionary developmental biology. PMID:24099066
Koçer, Zeynep A; Fan, Yiping; Huether, Robert; Obenauer, John; Webby, Richard J; Zhang, Jinghui; Webster, Robert G; Wu, Gang
2014-12-12
Most influenza pandemics have been caused by H1N1 viruses of purely or partially avian origin. Here, using a Cox proportional hazards model, we attempt to identify the genetic variations in the whole genome of wild-type North American avian H1N1 influenza A viruses that are associated with their virulence in mice by residue variations, host origins of virus (Anseriformes-ducks or Charadriiformes-shorebirds), and host-residue interactions. In addition, through structural modeling, we predicted that several polymorphic sites associated with pathogenicity were located in structurally important sites, especially in the polymerase complex and NS genes. Our study introduces a new approach to identify pathogenic variations in wild-type viruses circulating in the natural reservoirs and ultimately to understand their infectious risks to humans as part of risk assessment efforts towards the emergence of future pandemic strains.
Zhang, Haiyang; Wang, Lina; Zheng, Shuangshuang; Liu, Zezhou; Wu, Xiaoqin; Gao, Zhihui; Cao, Chenxing; Li, Qiang; Ren, Zhonghai
2016-07-01
The indel in the promoter of CsHDZIV11 co-segregates with fruit spine density and could be used for molecular breeding in cucumber. Fruit spine density is an important quality trait for marketing in cucumber (Cucumis sativus L.). However, the molecular basis of fruit spine density in cucumber remains unclear. In this study, we isolated a mutant, few spines 1 (fs1), from CNS2 (wild type, WT), a North China-type cucumber with a high density of fruit spines. Genetic analysis showed that fs1 was controlled by a single recessive Mendelian factor. Bulked segregant analysis combined with genome resequencing were used for mapping fs1 in the F2 population derived from a cross between the fs1 mutant and WT, and it was located on chromosome 6 through association analysis. To develop more polymorphic markers to locate fs1, another F2 population was constructed from the cross between fs1 and 'Chinese long' 9930. Then, fs1 was narrowed down to a 110.4-kb genomic region containing 25 annotated genes. A fragment substitution was identified in the promoter region of Csa6M514870 between fs1 and WT. This fragment in fs1 was also present in wild cucumber. Csa6M514870 encodes a PDF2-related protein, a homeodomain-leucine zipper IV transcription factor (CsHDZIV11/CsGL3) sharing high identity and similarity with proteins related to trichome formation or epidermal cell differentiation. Quantitative reverse-transcription PCR revealed a higher expression level of CsHDZIV11 in young fruits from fs1 compared to WT. A molecular marker based on this indel co-segregated with the spine density. This work provides a solid foundation not only for understanding the molecular mechanism of fruit spine density, but also for molecular breeding in cucumber.
Polfus, Linda M; Boerwinkle, Eric; Gibbs, Richard A; Metcalf, Ginger; Muzny, Donna; Veeraraghavan, Narayanan; Grove, Megan; Shete, Sanjay; Wallace, Stephanie; Milewicz, Dianna; Hanchard, Neil; Lupski, James R; Hashmi, Syed Shahrukh; Gupta-Malhotra, Monesha
2016-11-01
To comprehensively evaluate a European-American child with severe hypertension, whole-exome sequencing (WES) was performed on the child and parents, which identified causal variation of the proband's early-onset disease. The proband's hypertension was resistant to treatment, requiring a multiple drug regimen including amiloride, spironolactone, and hydrochlorothiazide. We suspected a monogenic form of hypertension because of the persistent hypokalemia with low plasma levels of renin and aldosterone. To address this, we focused on rare functional variants and indels, and performed gene-based tests incorporating linkage scores and allele frequency and filtered on deleterious functional mutations. Drawing upon clinical presentation, 27 genes with evidence of causing monogenic hypertension were selected and matched to the gene-based results. This resulted in the identification of a stop-gain mutation in an epithelial sodium channel (ENaC), SCNN1B, an established Liddle syndrome gene, shared by the child and her father. Interestingly, the father also harbored a missense mutation (p.Trp552Arg) in the α-subunit of the ENaC trimer, SCNN1A, possibly pointing to pseudohypoaldosteronism type I. This case is unique in that we present the early-onset disease and treatment response caused by a canonical stop-gain mutation (p.Arg566*) as well as ENaC digenic hits in the father, emphasizing the utility of WES informing precision medicine.
Park, Hyun-Chul; Kim, Kicheol; Nam, Younhyoung; Park, Jihye; Lee, Jinmyung; Lee, Hyehyeon; Kwon, Hansol; Jin, Hanjun; Kim, Wook; Kim, Won; Lim, Sikeun
2016-07-01
Allele frequencies for 23 autosomal short tandem repeat loci (D3S1358, vWA, D16S539, CSF1PO, TPOX, D8S1179, D21S11, D18S51, TH01, FGA, D5S818, D13S317, D7S820, D2S441, D19S433, D22S1045, D10S1248, D1S1656, D12S391, D2S1338, SE33, Penta D, Penta E), 1 Y-chromosome short tandem repeat locus (DYS391) and Y indel were obtained from 1000 unrelated individuals of the Korean population. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Genetic diversity of 38 insertion-deletion polymorphisms in Jewish populations.
Ferragut, J F; Pereira, R; Castro, J A; Ramon, C; Nogueiro, I; Amorim, A; Picornell, A
2016-03-01
Population genetic data of 38 non-coding biallelic autosomal indels are reported for 466 individuals, representing six populations with Jewish ancestry (Ashkenazim, Mizrahim, Sephardim, North African, Chuetas and Bragança crypto-Jews). Intra-population diversity and forensic parameter values showed that this set of indels was highly informative for forensic applications in the Jewish populations studied. Genetic distance analysis demonstrated that this set of markers efficiently separates populations from different continents, but does not seem effective for molecular anthropology studies in the Mediterranean region. Finally, it is important to highlight that although the genetic distances between Jewish populations were small, significant differences were observed for Chuetas and Bragança Jews, and therefore, specific databases must be used for these populations. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
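The forensic parameters mentioned above are not spelled out in the abstract. As a minimal sketch, assuming Hardy-Weinberg proportions and independent markers, the usual per-locus quantities for a biallelic indel can be computed from the insertion-allele frequency alone. The function names and the example frequencies below are hypothetical and are not taken from the study.

```python
from math import prod


def expected_heterozygosity(p_insertion):
    """Expected heterozygosity 2pq for a biallelic marker."""
    return 2.0 * p_insertion * (1.0 - p_insertion)


def match_probability(p_insertion):
    """Probability that two random individuals share a genotype.

    Sum of squared genotype frequencies (p^2, 2pq, q^2) under
    Hardy-Weinberg proportions (an assumption of this sketch).
    """
    p = p_insertion
    q = 1.0 - p
    return (p * p) ** 2 + (2.0 * p * q) ** 2 + (q * q) ** 2


def combined_power_of_discrimination(insertion_freqs):
    """1 minus the product of per-locus match probabilities.

    Assumes the markers are independent (unlinked).
    """
    return 1.0 - prod(match_probability(p) for p in insertion_freqs)


# Hypothetical insertion-allele frequencies for three indel markers.
example_freqs = [0.5, 0.5, 0.3]
combined_pd = combined_power_of_discrimination(example_freqs)
```

A marker with both alleles at frequency 0.5 gives the lowest match probability (0.375) a biallelic locus can reach, which is why panels of this kind combine several dozen markers.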
Pandey, Ram Vinay; Kofler, Robert; Orozco-terWengel, Pablo; Nolte, Viola; Schlötterer, Christian
2011-03-02
The enormous potential of natural variation for the functional characterization of genes has been neglected for a long time. Only recently have functional geneticists started to account for natural variation in their analyses. With the new sequencing technologies it has become feasible to collect sequence information for multiple individuals on a genomic scale. In particular, sequencing pooled DNA samples has been shown to provide a cost-effective approach for characterizing variation in natural populations. While a range of software tools have been developed for mapping these reads onto a reference genome and extracting SNPs, linking this information to population genetic estimators and functional information still poses a major challenge to many researchers. We developed PoPoolation DB, a user-friendly integrated database. PoPoolation DB links variation in natural populations with functional information, allowing a wide range of researchers to take advantage of population genetic data. PoPoolation DB provides the user with population genetic parameters (Watterson's θ or Tajima's π), Tajima's D, SNPs, allele frequencies and indels in regions of interest. The database can be queried by gene name, chromosomal position, or a user-provided query sequence or GTF file. We anticipate that PoPoolation DB will be a highly versatile tool for functional geneticists as well as evolutionary biologists. PoPoolation DB, available at http://www.popoolation.at/pgt, provides an integrated platform for researchers to investigate natural polymorphism and associated functional annotations from UCSC and Flybase genome browsers, population genetic estimators and RNA-seq information.
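For readers unfamiliar with the estimators named above, the following minimal sketch computes Watterson's θ and Tajima's π for a small alignment of individual haplotypes. It uses the textbook definitions and is not the PoPoolation DB implementation; pooled sequencing data need corrected estimators that this sketch does not attempt. The toy alignment is hypothetical, and both values are per locus rather than per site.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that carry more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def wattersons_theta(seqs):
    """Watterson's estimator: S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


def nucleotide_diversity(seqs):
    """Tajima's pi: mean number of differences over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


# Hypothetical alignment of four haplotypes, eight sites each.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]
theta_w = wattersons_theta(toy_alignment)
pi = nucleotide_diversity(toy_alignment)
```

Tajima's D is built from the difference between π and θ scaled by its standard deviation, so under neutrality the two estimates above are expected to be close.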
2014-01-01
Background Variation in seed oil composition and content among soybean varieties is largely attributed to differences in transcript sequences and/or transcript accumulation of oil production related genes in seeds. Discovery and analysis of sequence and expression variations in these genes will accelerate soybean oil quality improvement. Results In an effort to identify these variations, we sequenced the transcriptomes of soybean seeds from nine lines varying in oil composition and/or total oil content. Our results showed that 69,338 distinct transcripts from 32,885 annotated genes were expressed in seeds. A total of 8,037 transcript expression polymorphisms and 50,485 transcript sequence polymorphisms (48,792 SNPs and 1,693 small Indels) were identified among the lines. Effects of the transcript polymorphisms on their encoded protein sequences and functions were predicted. The studies also provided independent evidence that the lack of FAD2-1A gene activity and a non-synonymous SNP in the coding sequence of FAB2C caused elevated oleic acid and stearic acid levels in soybean lines M23 and FAM94-41, respectively. Conclusions As a proof-of-concept, we developed an integrated RNA-seq and bioinformatics approach to identify and functionally annotate transcript polymorphisms, and demonstrated its high effectiveness for discovery of genetic and transcript variations that result in altered oil quality traits. The collection of transcript polymorphisms coupled with their predicted functional effects will be a valuable asset for further discovery of genes, gene variants, and functional markers to improve soybean oil quality. PMID:24755115
Molecular evolution of the mitochondrial 12S rRNA in Ungulata (mammalia).
Douzery, E; Catzeflis, F M
1995-11-01
The complete 12S rRNA gene has been sequenced in 4 Ungulata (hoofed eutherians) and 1 marsupial and compared to 38 available mammalian sequences in order to investigate the molecular evolution of the mitochondrial small-subunit ribosomal RNA molecule. Ungulata were represented by one artiodactyl (the collared peccary, Tayassu tajacu, suborder Suiformes), two perissodactyls (the Grevy's zebra, Equus grevyi, suborder Hippomorpha; the white rhinoceros, Ceratotherium simum, suborder Ceratomorpha), and one hyracoid (the tree hyrax, Dendrohyrax dorsalis). The fifth species was a marsupial, the eastern gray kangaroo (Macropus giganteus). Several transition/transversion biases characterized the pattern of changes between mammalian 12S rRNA molecules. A bias toward transitions was found among 12S rRNA sequences of Ungulata, illustrating the general bias exhibited by ribosomal and protein-encoding genes of the mitochondrial genome. The derivation of a mammalian 12S rRNA secondary structure model from the comparison of 43 eutherian and marsupial sequences evidenced a pronounced bias against transversions in stems. Moreover, transversional compensatory changes were rare events within double-stranded regions of the ribosomal RNA. Evolutionary characteristics of the 12S rRNA were compared with those of the nuclear 18S and 28S rRNAs. From a phylogenetic point of view, transitions, transversions and indels in stems as well as transversion and indel events in loops gave congruent results for comparisons within orders. Some compensatory changes in double-stranded regions and some indels in single-stranded regions also constituted diagnostic events. The 12S rRNA molecule confirmed the monophyly of infraorder Pecora and order Cetacea, but the monophyly of the suborder Ruminantia was not supported and the branching pattern between Cetacea and the artiodactyl suborders Ruminantia and Suiformes was not established. 
The monophyly of the order Perissodactyla was evidenced, but the relationships between Artiodactyla, Cetacea, and Perissodactyla remained unresolved. Nevertheless, we found no support for a Perissodactyla + Hyracoidea clade, neither with distance approach, nor with parsimony reconstruction. The 12S rRNA was useful to solve intraordinal relationships among Ungulata, but it seemed to harbor too few informative positions to decipher the bushlike radiation of some Ungulata orders, an event which has most probably occurred in a short span of time between 55 and 70 MYA.
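The transition/transversion bias discussed in this abstract rests on a simple classification of pairwise differences. The sketch below counts transitions (purine to purine, or pyrimidine to pyrimidine), transversions, and gapped columns for two pre-aligned DNA sequences. It is an illustration only: the function name and example sequences are hypothetical, and it ignores stem/loop context and ambiguous bases.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def count_ts_tv(seq_a, seq_b):
    """Return (transitions, transversions, gapped_columns).

    Both inputs must already be aligned to the same length, with '-'
    marking gaps. Columns containing anything other than A, C, G, T
    or '-' are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = gaps = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            if a != b:
                gaps += 1  # gap in exactly one sequence: an indel column
            continue
        if a not in VALID or b not in VALID or a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # change within the same base class
        else:
            tv += 1  # purine <-> pyrimidine change
    return ts, tv, gaps


# Hypothetical aligned fragments (not real 12S rRNA data).
example = count_ts_tv("ACGTTGCA-A", "GCATCGTTCA")
```

Here `example` evaluates to `(4, 1, 1)`: four transitions, one transversion, and one indel column.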
Beyond bilateral symmetry: geometric morphometric methods for any type of symmetry
2011-01-01
Background Studies of symmetric structures have made important contributions to evolutionary biology, for example, by using fluctuating asymmetry as a measure of developmental instability or for investigating the mechanisms of morphological integration. Most analyses of symmetry and asymmetry have focused on organisms or parts with bilateral symmetry. This is not the only type of symmetry in biological shapes, however, because a multitude of other types of symmetry exists in plants and animals. For instance, some organisms have two axes of reflection symmetry (biradial symmetry; e.g. many algae, corals and flowers) or rotational symmetry (e.g. sea urchins and many flowers). So far, there is no general method for the shape analysis of these types of symmetry. Results We generalize the morphometric methods currently used for the shape analysis of bilaterally symmetric objects so that they can be used for analyzing any type of symmetry. Our framework uses a mathematical definition of symmetry based on the theory of symmetry groups. This approach can be used to divide shape variation into a component of symmetric variation among individuals and one or more components of asymmetry. We illustrate this approach with data from a colonial coral that has ambiguous symmetry and thus can be analyzed in multiple ways. Our results demonstrate that asymmetric variation predominates in this dataset and that its amount depends on the type of symmetry considered in the analysis. Conclusions The framework for analyzing symmetry and asymmetry is suitable for studying structures with any type of symmetry in two or three dimensions. Studies of complex symmetries are promising for many contexts in evolutionary biology, such as fluctuating asymmetry, because these structures can potentially provide more information than structures with bilateral symmetry. PMID:21958045
Guo, Xiao-Hui; Wu, Bi-Hua; Hu, Xi-Gui; Bi, Zhe-Guang; Wang, Zhen-Zhen; Liu, Deng-Cai; Zheng, You-Liang
2013-03-01
Two y-type high molecular weight glutenin subunits (HMW-GSs) 1Ay12 and 1Ay8 from the two accessions PI560720 and PI345186 of cultivated einkorn wheat (Triticum monococcum ssp. monococcum, AA, 2n=2x=14), were identified by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). The mobility of 1Ay12 and 1Ay8 was similar to that of 1Dy12 and 1By8 from common wheat Chinese Spring, respectively. Their ORFs respectively consisted of 1812bp and 1935bp, encoding 602 and 643 amino acid residues with the four typical structural domains of HMW-GS including signal peptide, conserved N-, and C-terminal and central repetitive domains. Compared with the most similar active 1Ay alleles previously published, there were a total of 15 SNPs and 2 InDels in them. Their encoding functions were confirmed by successful heterogeneous expression. The two novel 1Ay alleles were named as 1Ay12 and 1Ay8 with the accession No. JQ318694 and JQ318695 in GenBank, respectively. The two alleles were classed into the two distinct groups, Phe-type and Cys-type, which might be relevant to the differentiation of Glu-A1-2 alleles. Of these, 1Ay8 belonged to the Cys-type group, and its protein possessed an additional conserved cysteine residue in the central repetitive region besides the six common ones in the N- and C-terminal regions of the Phe-type group, and was the second longest among all the known active 1Ay alleles. These results suggested that the subunit 1Ay8 of cultivated einkorn wheat accession PI345186 might have a potential ability to strengthen the gluten polymer interactions and be a valuable genetic resource for wheat quality improvement. Copyright © 2012 Elsevier B.V. All rights reserved.
Towards Structural Analysis of Audio Recordings in the Presence of Musical Variations
NASA Astrophysics Data System (ADS)
Müller, Meinard; Kurth, Frank
2006-12-01
One major goal of structural analysis of an audio recording is to automatically extract the repetitive structure or, more generally, the musical form of the underlying piece of music. Recent approaches to this problem work well for music where the repetitions largely agree with respect to instrumentation and tempo, as is typically the case for popular music. For other classes of music such as Western classical music, however, musically similar audio segments may exhibit significant variations in parameters such as dynamics, timbre, execution of note groups, modulation, articulation, and tempo progression. In this paper, we propose a robust and efficient algorithm for audio structure analysis, which allows musically similar segments to be identified even in the presence of large variations in these parameters. To account for such variations, our main idea is to incorporate invariance at various levels simultaneously: we design a new type of statistical features to absorb microvariations, introduce an enhanced local distance measure to account for local variations, and describe a new strategy for structure extraction that can cope with the global variations. Our experimental results with classical and popular music show that our algorithm performs successfully even in the presence of significant musical variations.
Canafoglia, Laura; Gennaro, Elena; Capovilla, Giuseppe; Gobbi, Giuseppe; Boni, Antonella; Beccaria, Francesca; Viri, Maurizio; Michelucci, Roberto; Agazzi, Pamela; Assereto, Stefania; Coviello, Domenico A; Di Stefano, Maria; Rossi Sebastiano, Davide; Franceschetti, Silvana; Zara, Federico
2012-12-01
Unverricht-Lundborg disease (EPM1A) is frequently due to an unstable expansion of a dodecamer repeat in the CSTB gene, whereas other types of mutations are rare. EPM1A due to homozygous expansion has a rather stereotyped presentation with prominent action myoclonus. We describe eight patients with five different compound heterozygous CSTB point or indel mutations in order to highlight their particular phenotypical presentations and evaluate their genotype-phenotype relationships. We screened CSTB mutations by means of Southern blotting and the sequencing of the genomic DNA of each proband. CSTB messenger RNA (mRNA) aberrations were characterized by sequencing the complementary DNA (cDNA) of lymphoblastoid cells, and assessing the protein concentrations in the lymphoblasts. The patient evaluations included the use of a simplified myoclonus severity rating scale, multiple neurophysiologic tests, and electroencephalography (EEG)-polygraphic recordings. To highlight the particular clinical features and disease time-course in compound heterozygous patients, we compared some of their characteristics with those observed in a series of 40 patients carrying the common homozygous expansion mutation observed at the C. Besta Foundation, Milan, Italy. The eight compound heterozygous patients belong to six EPM1A families (out of 52; 11.5%) diagnosed at the Laboratory of Genetics of the Galliera Hospitals in Genoa, Italy. They segregated five different heterozygous point or indel mutations in association with the common dodecamer expansion. Four patients from three families had previously reported CSTB mutations (c.67-1G>C and c.168+1_18del); one had a novel nonsense mutation at the first exon (c.133C>T) leading to a premature stop codon predicting a short peptide; the other three patients from two families had a complex novel indel mutation involving the donor splice site of intron 2 (c.168+2_169+21delinsAA) and leading to an aberrant transcript with a partially retained intron. 
The protein dose (cystatin B/β-actin) in our heterozygous patients was 0.24 ± 0.02, which is not different from that assessed in patients bearing the homozygous dodecamer expansion. The compound heterozygous patients had a significantly earlier disease onset (7.4 ± 1.7 years) than the homozygous patients, and their disease presentations included frequent myoclonic seizures and absences, often occurring in clusters throughout the course of the disease. The seizures were resistant to the pharmacologic treatments that usually lead to complete seizure control in homozygous patients. EEG-polygraphy allowed repeated seizures to be recorded. Action myoclonus progressively worsened and all of the heterozygous patients older than 30 years were in wheelchairs. Most of the patients showed moderate to severe cognitive impairment, and six had psychiatric symptoms. EPM1A due to compound heterozygous CSTB mutations presents with variable but often markedly severe and particular phenotypes. Most of our patients presented with the electroclinical features of severe epilepsy, which is unexpected in homozygous patients, and showed frequent seizures resistant to pharmacologic treatment. The presence of variable phenotypes (even in siblings) suggests interactions with other genetic factors influencing the final disease presentation. Wiley Periodicals, Inc. © 2012 International League Against Epilepsy.
Pereira, Rui; Phillips, Christopher; Pinto, Nádia; Santos, Carla; dos Santos, Sidney Emanuel Batista; Amorim, António; Carracedo, Ángel; Gusmão, Leonor
2012-01-01
Ancestry-informative markers (AIMs) show high allele frequency divergence between different ancestral or geographically distant populations. These genetic markers are especially useful in inferring the likely ancestral origin of an individual or estimating the apportionment of ancestry components in admixed individuals or populations. The study of AIMs is of great interest in clinical genetics research, particularly to detect and correct for population substructure effects in case-control association studies, but also in population and forensic genetics studies. This work presents a set of 46 ancestry-informative insertion deletion polymorphisms selected to efficiently measure population admixture proportions of four different origins (African, European, East Asian and Native American). All markers are analyzed in short fragments (under 230 basepairs) through a single PCR followed by capillary electrophoresis (CE) allowing a very simple one tube PCR-to-CE approach. HGDP-CEPH diversity panel samples from the four groups, together with Oceanians, were genotyped to evaluate the efficiency of the assay in clustering populations from different continental origins and to establish reference databases. In addition, other populations from diverse geographic origins were tested using the HGDP-CEPH samples as reference data. The results revealed that the AIM-INDEL set developed is highly efficient at inferring the ancestry of individuals and provides good estimates of ancestry proportions at the population level. In conclusion, we have optimized the multiplexed genotyping of 46 AIM-INDELs in a simple and informative assay, enabling a more straightforward alternative to the commonly available AIM-SNP typing methods dependent on complex, multi-step protocols or implementation of large-scale genotyping technologies. PMID:22272242
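To make the idea of ancestry inference from biallelic indels concrete, the sketch below assigns a genotype profile to the reference population under which it is most likely, assuming Hardy-Weinberg proportions and independent markers. This is a deliberately simplified stand-in: it performs single-population assignment, not the admixture-proportion estimation described in the abstract, and the population labels, frequencies and genotypes are all hypothetical.

```python
import math


def genotype_log_likelihood(genotypes, insertion_freqs):
    """Log-likelihood of a profile given one population's frequencies.

    genotypes: insertion-allele counts per marker (0, 1 or 2).
    insertion_freqs: insertion-allele frequency per marker, strictly
    between 0 and 1 so that every genotype has non-zero probability.
    """
    log_lik = 0.0
    for count, p in zip(genotypes, insertion_freqs):
        q = 1.0 - p
        genotype_prob = {0: q * q, 1: 2.0 * p * q, 2: p * p}[count]
        log_lik += math.log(genotype_prob)
    return log_lik


def most_likely_population(genotypes, reference):
    """Return the reference population with the highest log-likelihood."""
    return max(
        reference,
        key=lambda pop: genotype_log_likelihood(genotypes, reference[pop]),
    )


# Hypothetical reference frequencies for three markers in two groups.
reference_freqs = {
    "POP_A": [0.9, 0.8, 0.1],
    "POP_B": [0.2, 0.3, 0.7],
}
profile = [2, 2, 0]  # homozygous insertion, homozygous insertion, homozygous deletion
assigned = most_likely_population(profile, reference_freqs)
```

Markers whose frequencies differ strongly between groups, as in this toy table, contribute the most to separating the likelihoods, which is the property that defines an ancestry-informative marker.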
Impact of inhomogeneity on SH-type wave propagation in an initially stressed composite structure
NASA Astrophysics Data System (ADS)
Saha, S.; Chattopadhyay, A.; Singh, A. K.
2018-02-01
The present analysis examines the influence of distinct forms of inhomogeneity in a composite structure, comprising double superficial layers lying over a half-space, on the phase velocity of an SH-type wave propagating through it. Propagation of the SH-type wave in this structure has been examined in four distinct cases of inhomogeneity, viz. when inhomogeneity in the double superficial layers is due to exponential variation in density only (Case I); when inhomogeneity in the double superficial layers is due to exponential variation in rigidity only (Case II); when inhomogeneity in the double superficial layers is due to exponential variation in rigidity, density and initial stress (Case III); and when inhomogeneity in the double superficial layers is due to linear variation in rigidity, density and initial stress (Case IV). Closed-form expressions of the dispersion relation have been obtained for all four aforementioned cases through extensive application of Debye asymptotic analysis. The deduced dispersion relations for all the cases are found to be in good agreement with the classical Love-wave equation. Numerical computation has been carried out to graphically demonstrate the effect of the inhomogeneity parameters, the initial stress parameters, and the width ratio associated with the double superficial layers in the composite structure on the dispersion curve for each of the four aforesaid cases. The distinct cases of inhomogeneity and initial stress in the context of the considered problem have been examined in detail using a comparative approach.
Raman structural studies of the nickel electrode
NASA Technical Reports Server (NTRS)
Cornilsen, B. C.
1985-01-01
Raman spectroscopy is sensitive to empirically controlled nickel electrode structural variations, and has unique potential for structural characterization of these materials. How the structure relates to electrochemical properties is examined so that the latter can be more completely understood, controlled, and optimized. Electrodes were impregnated and cycled, and cyclic voltammetry is being used for electrochemical characterization. Structural variation was observed which has escaped detection using other methods. Structural changes are induced by: (1) cobalt doping, (2) the state of charge or discharge, (3) the preparation conditions and type of buffer used, and (4) the formation process. Charged active mass has an NiOOH-type structure, agreeing with X-ray diffraction results. Discharged active mass, however, is not isostructural with beta-Ni(OH)2. Chemically prepared alpha phases are not isostructural either. A disordered structural model, containing point defects, is proposed for the cycled materials. This model explains K(+) incorporation. Band assignments were made and spectra interpreted for beta-Ni(OH)2, electrochemical NiOOH and chemically precipitated NiOOH.
Niinemets, Ülo; Keenan, Trevor F.; Hallik, Lea
2018-01-01
Summary Extensive within-canopy light gradients importantly affect photosynthetic productivity of leaves in different canopy positions and lead to light-dependent increases in foliage photosynthetic capacity per area (AA). However, the controls on AA variations by changes in underlying traits are poorly known. We constructed an unprecedented worldwide database including 831 within-canopy gradients with standardized light estimates for 304 species belonging to major vascular plant functional types, and analyzed within-canopy variations in 12 key foliage structural, chemical and physiological traits by quantitatively separating the contributions of different traits to photosynthetic acclimation. Although the light-dependent increase in AA is surprisingly similar in different plant functional types, they fundamentally differ in the share of the controls on AA by constituent traits. Species with high rates of canopy development and leaf turnover exhibiting highly dynamic light environments, actively change AA by nitrogen reallocation among and partitioning within leaves. In contrast, species with slow leaf turnover exhibit a passive AA acclimation response primarily determined by acclimation of leaf structure to growth light. This review emphasizes that different combinations of traits are responsible for within-canopy photosynthetic acclimation in different plant functional types and solves an old enigma of the role of mass- vs. area-based traits in vegetation acclimation. PMID:25318596
Zhang, Qi-Lin; Zhang, Li; Zhao, Tian-Xuan; Wang, Juan; Zhu, Qian-Hua; Chen, Jun-Yuan; Yuan, Ming-Long
2017-04-30
The adaptive evolution of animals to high-elevation environments has been extensively studied in vertebrates, while few studies have focused on insects. Gynaephora species (Lepidoptera: Lymantriinae) are endemic to the Qinghai-Tibetan Plateau (QTP) and represent an important insect pest of alpine meadows. Here, we present a detailed comparative analysis of the mitochondrial genomes (mitogenomes) of two Gynaephora species inhabiting different high-elevation environments: G. alpherakii and G. menyuanensis. The results indicated that the general mitogenomic features (genome size, nucleotide composition, codon usage and secondary structures of tRNAs) were well conserved between the two species. All of mitochondrial protein-coding genes were evolving under purifying selection, suggesting that selection constraints may play a role in ensuring adequate energy production. However, a number of substitutions and indels were identified that altered the protein conformations of ATP8 and NAD1, which may be the result of adaptive evolution of the two Gynaephora species to different high-elevation environments. Levels of gene expression for nine mitochondrial genes in nine different developmental stages were significantly suppressed in G. alpherakii, which lives at the higher elevation (~4800m above sea level), suggesting that gene expression patterns could be modulated by atmospheric oxygen content and environmental temperature. These results enhance our understanding of the genetic bases for the adaptive evolution of insects endemic to the QTP. Copyright © 2017 Elsevier B.V. All rights reserved.
Ryynänen, Heikki J; Primmer, Craig R
2006-01-01
Background Single nucleotide polymorphisms (SNPs) represent the most abundant type of DNA variation in the vertebrate genome, and their applications as genetic markers in numerous studies of molecular ecology and conservation of natural populations are emerging. Recent large-scale sequencing projects in several fish species have provided a vast amount of data in public databases, which can be utilized in novel SNP discovery in salmonids. However, the suggested duplicated nature of the salmonid genome may hamper SNP characterization if the primers designed in conserved gene regions amplify multiple loci. Results Here we introduce a new intron-primed exon-crossing (IPEC) method in an attempt to overcome this duplication problem, and also evaluate different priming methods for SNP discovery in Atlantic salmon (Salmo salar) and other salmonids. A total of 69 loci with differing priming strategies were screened in S. salar, and 27 of these produced ~13 kb of high-quality sequence data consisting of 19 SNPs or indels (one per 680 bp). The SNP frequency and the overall nucleotide diversity (3.99 × 10⁻⁴) in S. salar were lower than reported in a majority of other organisms, which may suggest a relatively young population history for Atlantic salmon. A subset of primers used in cross-species analyses revealed considerable variation in the SNP frequencies and nucleotide diversities in other salmonids. Conclusion Sequencing success was significantly higher with the new IPEC primers; thus the total number of loci to screen in order to identify one potential polymorphic site was six times lower with this new strategy. Given that duplication may hamper SNP discovery in some species, the IPEC method reported here is an alternative way of identifying novel polymorphisms in such cases. PMID:16872523
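The reported variant density follows directly from the two counts given in the abstract. The short sketch below reproduces that arithmetic, assuming the approximate figure of 13 kb is taken as exactly 13,000 bp. The function name is illustrative.

```python
def bases_per_variant(sequence_bp, n_variants):
    """Average number of sequenced bases per observed variant."""
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    return sequence_bp / n_variants


# 19 SNPs or indels in roughly 13 kb of high-quality sequence.
# 13,000 bp is an assumed exact value for the abstract's "~13 kb".
spacing = bases_per_variant(13_000, 19)
```

The result is about 684 bp per variant, consistent with the rounded "one per 680 bp" in the abstract given that the sequence length itself is approximate.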
Jeong, Hyeonsoo; Song, Ki-Duk; Seo, Minseok; Caetano-Anollés, Kelsey; Kim, Jaemin; Kwak, Woori; Oh, Jae-Don; Kim, EuiSoo; Jeong, Dong Kee; Cho, Seoae; Kim, Heebal; Lee, Hak-Kyo
2015-08-20
Natural and artificial selection following domestication has led to the existence of more than a hundred pig breeds, as well as incredible variation in phenotypic traits. Berkshire pigs are regarded as having superior meat quality compared to other breeds. As the meat production industry seeks selective breeding approaches to improve profitable traits such as meat quality, information about genetic determinants of these traits is in high demand. However, most of the studies have been performed using trained sensory panel analysis without investigating the underlying genetic factors. Here we investigate the relationship between genomic composition and this phenotypic trait by scanning for signatures of positive selection in whole-genome sequencing data. We generated genomes of 10 Berkshire pigs at a total of 100.6 coverage depth, using the Illumina HiSeq 2000 platform. Along with the genomes of 11 Landrace and 13 Yorkshire pigs, we identified 18.9 million SNVs and 3.4 million indels in the mapped regions. By applying between-population statistical tests (XP-EHH and XP-CLR), we identified several genes related to lipid metabolism, intramuscular fatty acid deposition, and muscle fiber type that contribute to pork quality (TG, FABP1, AKIRIN2, GLP2R, TGFBR3, JPH3, ICAM2, and ERN1). A statistical enrichment test was also conducted to detect breed-specific genetic variation. In addition, a de novo short-read assembly strategy identified several candidate genes (SLC25A14, IGF1, PI4KA, CACNA1A) that also contribute to lipid metabolism. Results revealed several candidate genes involved in Berkshire meat quality; most of these genes are involved in lipid metabolism and intramuscular fat deposition. These results can provide a basis for future research on the genomic characteristics of Berkshire pigs.
Fiotti, Nicola; Calvagna, Cristiano; Sgorlon, Giada; Altamura, Nicola; Pitacco, Paola; Zamolo, Francesca; Di Girolamo, Filippo Giorgio; Chiarandini, Stefano; Biolo, Gianni; Adovasio, Roberto
2018-06-01
The objective of this study was to assess whether functional genetic polymorphisms of matrix metalloproteinases (MMPs) 1, 3, 9, and 12 are associated with arterial enlargements or aneurysms of the thoracic aorta or popliteal arteries in patients with abdominal aortic aneurysm (AAA). The associations between MMP1 (-1607 G in/del, rs1799750), MMP3 (-1171 A in/del, rs35068180), MMP9 (13-26 CA repeats around -90, rs2234681, rs917576, rs917577), and MMP12 (G/T missense variation, rs652438) polymorphisms and enlargements or aneurysms of the thoracic aorta and popliteal arteries were tested in 169 consecutive AAA patients. Thoracic aorta enlargement or aneurysm (TE/A; maximum diameter, >35 mm) was detected in 34 patients (20.1% prevalence). MMP9 rs2234681 microsatellite was the only genetic determinant of TE/A in AAA patients (P = .003), followed by hypercholesterolemia and antiplatelet use. Carriers of both alleles with ≥22 CA repeats had 5.9-fold (95% confidence interval, 1.9-18.6; P < .0001) increased odds of TE/A, and a score considering all three variables showed 98% negative predictive value and 30% positive predictive value for thoracic aortic aneurysm detection. Eighty-two popliteal artery enlargements or aneurysms (diameter >10 mm) occurred in 55 patients (33.1% prevalence). Carriers of the MMP12 rs652438 C allele showed an 18% (P = .006) increased diameter in popliteal arteries and 2.8-fold (95% confidence interval, 1.3-6; P = .008) increased odds of popliteal artery enlargement or aneurysm compared with the TT genotype. Among patients with AAA, carriers of homozygous ≥22 CA repeats in MMP9 rs2234681 and of the C allele in MMP12 rs652438 have a substantial risk of carrying thoracic and popliteal enlargements, respectively. Copyright © 2017 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.
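Odds ratios with 95% confidence intervals, as quoted above, are commonly derived from a 2x2 table of carriers and non-carriers. The abstract does not state how its estimates were obtained (they may come from a regression model), so the following is only a minimal sketch of the textbook Wald interval. The function name and the counts in the usage line are hypothetical and are not data from the study.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: carriers with the outcome      b: carriers without the outcome
    c: non-carriers with the outcome  d: non-carriers without the outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
print(odds_ratio_ci(10, 10, 5, 20))
```

The interval is symmetric on the log scale, which is why published intervals such as 1.9-18.6 look lopsided around the point estimate.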
Defying stereotypes: the elusive search for a universal model of LysR-type regulation.
Momany, Cory; Neidle, Ellen L
2012-02-01
LysR-type transcriptional regulators (LTTRs) compose the largest family of homologous regulators in bacteria. Considering their prevalence, it is not surprising that LTTRs control diverse metabolic functions. Arguably, the most unexpected aspect of LTTRs is the paucity of available structural information. Solubility issues are notoriously problematic, and structural studies have only recently begun to flourish. In this issue of Molecular Microbiology, Taylor et al. (2012) present the structure of AphB, a LysR-type regulator of virulence in Vibrio cholerae. This contribution adds significantly to the group of known full-length atomic LTTR structures, which remains small. Importantly, this report also describes an active-form variant. Small conformational changes in the effector-binding domain translate to global reorganization of the DNA-binding domain. Emerging from these results is a model of theme-and-variation among LTTRs rather than a unified regulatory scheme. Despite common structural folds, LTTRs exhibit differences in oligomerization, promoter recognition and communication with RNA polymerase. Such variation mirrors the diversity in sequence and function associated with members of this very large family. © 2012 Blackwell Publishing Ltd.
46 CFR 160.021-5 - Labeling and marking.
Code of Federal Regulations, 2010 CFR
2010-10-01
..., showing in clear, indelible black lettering on a red background, the following wording and information...: Stand with back to wind and point away from body when igniting or flare is burning. Service Life...
50 CFR 600.507 - Recordkeeping.
Code of Federal Regulations, 2010 CFR
2010-10-01
... indelible ink, with corrections to be accomplished by lining out and rewriting, rather than erasure. (i) Alternative log formats. As an alternative to the use of the specific formats provided, a Nation may submit a...
50 CFR 600.507 - Recordkeeping.
Code of Federal Regulations, 2011 CFR
2011-10-01
... indelible ink, with corrections to be accomplished by lining out and rewriting, rather than erasure. (i) Alternative log formats. As an alternative to the use of the specific formats provided, a Nation may submit a...
Cardoso, Sergio; Sevillano, Rubén; Gamarra, David; Santurtún, Ana; Martínez-Jarreta, Begoña; de Pancorbo, Marian M
2017-03-01
Insertion-deletion polymorphisms (indels) have been reported to be very useful markers for forensic purposes. To explore this further, 38 non-coding bi-allelic autosomal indels were analyzed in 575 individuals representing six populations from the northern fringe of the Iberian Peninsula. Autochthonous populations from the Basque Country, northern Navarre, the Pas Valley in Cantabria and Aragon were analyzed, together with non-autochthonous populations from the Basque Country and northern Navarre. At the intra-population level, all loci analyzed were in Hardy-Weinberg equilibrium except for marker rs33917182 in autochthonous Basques. The linkage disequilibrium (LD) test did not reveal statistically significant allelic association between loci pairs in any of the six populations. Forensic parameters proved to be highly informative in the six populations analyzed, even if a scenario with population substructure and local inbreeding was considered for match probability calculations, and the potential of this indel set to be used in combination with other genetic markers is remarkable. As for inter-population analyses, in general terms the six populations showed low but statistically significant genetic distances. However, although this indel set efficiently differentiates between main ancestries, it does not allow accurate separation at a local level and, for the time being, its combination with other informative markers is needed to maximize the power to accurately differentiate populations with close genetic ancestry. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
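For a bi-allelic indel, a Hardy-Weinberg check compares observed genotype counts with those expected from the allele frequencies. The abstract does not say which test was used (exact tests are common in forensic work), so the sketch below shows the simpler chi-square version as an illustration. The function name and the genotype counts are hypothetical.

```python
def hwe_chi_square(n_ins_ins: int, n_ins_del: int, n_del_del: int) -> float:
    """Chi-square statistic (1 degree of freedom) for Hardy-Weinberg
    equilibrium at a bi-allelic locus, from the three genotype counts."""
    n = n_ins_ins + n_ins_del + n_del_del
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_ins_ins + n_ins_del) / (2 * n)  # insertion allele frequency
    q = 1 - p                                  # deletion allele frequency
    if p == 0 or q == 0:
        raise ValueError("locus is monomorphic; the test is undefined")
    observed = (n_ins_ins, n_ins_del, n_del_del)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_1DF_ALPHA_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05

# Hypothetical genotype counts, for illustration only.
chi2 = hwe_chi_square(30, 50, 20)
print("deviates from HWE" if chi2 > CRITICAL_1DF_ALPHA_05 else "consistent with HWE")
```

With 38 loci tested in six populations, a multiple-testing correction would normally be applied before declaring any single locus out of equilibrium.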
Torres, Sandra Regina Rachadel; Uehara, Clineu Julien Seki; Sutter-Latorre, Ana Frederica; de Almeida, Bibiana Sgorla; Sauerbier, Tania Streck; Muniz, Yara Costa Netto; Marrero, Andrea Rita; de Souza, Ilíada Rainha
2014-08-01
The application of DNA technology in forensic investigations has grown rapidly in the last 25 years, with an exponential increase in short tandem repeat (STR) data, usually presented as allele frequencies, that may later be used as databases for forensic and population genetics purposes. In addition, other classes of molecular markers such as single nucleotide polymorphisms and insertions/deletions (InDels) have been presented as alternative genetic marker sets. These markers can be used in paternity cases, when mutations in STR polymorphisms are present, as well as in highly degraded DNA analysis. In the present study, the allele frequencies and heterozygosity (H) of a set of 30 InDel markers were determined and the forensic efficacy was evaluated through estimation of discrimination power (DP), match probability, typical paternity index and power of paternity exclusion in 108 unrelated volunteers from the State of Santa Catarina (South Brazil). The observed H per locus ranged between 0.370 and 0.574 (mean = 0.479). HLD128 was the locus with the highest DP (DP = 0.656). DP for all markers combined was greater than 99.9999999999646%, which provides satisfactory levels of information for forensic demands. Genetic comparisons (exact tests of population differentiation and pairwise genetic distances) revealed that the population of Santa Catarina State differs from the Korean and USA Afro-American populations but is similar to the Portuguese, German, Polish, Spanish and Basque populations.
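The combined discrimination power of a marker panel is usually obtained by multiplying the per-locus match probabilities (1 - DP) under the assumption that loci are independent. The sketch below illustrates that standard formula. The function name is hypothetical and the per-locus values are placeholders, not the study's data (only the maximum, 0.656 at HLD128, is given in the abstract).

```python
import math


def combined_discrimination_power(per_locus_dp):
    """Combined DP = 1 - product of (1 - DP_i), assuming independent loci."""
    dps = list(per_locus_dp)
    if not dps:
        raise ValueError("at least one locus is required")
    if any(not 0.0 <= dp <= 1.0 for dp in dps):
        raise ValueError("each DP must lie between 0 and 1")
    combined_match_probability = math.prod(1.0 - dp for dp in dps)
    return 1.0 - combined_match_probability


# Placeholder panel: 30 loci that each have DP = 0.6 (assumed, not from the study).
print(combined_discrimination_power([0.6] * 30))
```

Even modest per-locus values compound quickly, which is how 30 bi-allelic markers with DP near 0.6 can reach a combined value with many leading nines.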
On the challenges of using field spectroscopy to measure the impact of soil type on leaf traits
NASA Astrophysics Data System (ADS)
Nunes, Matheus H.; Davey, Matthew P.; Coomes, David A.
2017-07-01
Understanding the causes of variation in functional plant traits is a central issue in ecology, particularly in the context of global change. Spectroscopy is increasingly used for rapid and non-destructive estimation of foliar traits, but few studies have evaluated its accuracy when assessing phenotypic variation in multiple traits. Working with 24 chemical and physical leaf traits of six European tree species growing on strongly contrasting soil types (i.e. deep alluvium versus nearby shallow chalk), we asked (i) whether variability in leaf traits is greater between tree species or soil type, and (ii) whether field spectroscopy is effective at predicting intraspecific variation in leaf traits as well as interspecific differences. Analysis of variance showed that interspecific differences in traits were generally much stronger than intraspecific differences related to soil type, accounting for 25 % versus 5 % of total trait variation, respectively. Structural traits, phenolic defences and pigments were barely affected by soil type. In contrast, foliar concentrations of rock-derived nutrients did vary: P and K concentrations were lower on chalk than alluvial soils, while Ca, Mg, B, Mn and Zn concentrations were all higher, consistent with the findings of previous ecological studies. Foliar traits were predicted from 400 to 2500 nm reflectance spectra collected by field spectroscopy using partial least square regression, a method that is commonly employed in chemometrics. Pigments were best modelled using reflectance data from the visible region (400-700 nm), while all other traits were best modelled using reflectance data from the shortwave infrared region (1100-2500 nm). Spectroscopy delivered accurate predictions of species-level variation in traits. However, it was ineffective at detecting intraspecific variation in rock-derived nutrients (with the notable exception of P). The explanation for this failure is that rock-derived elements do not have absorption features in the 400-2500 nm region, and their estimation is indirect, relying on elemental concentrations covarying with structural traits that do have absorption features in that spectral region (constellation effects). Since the structural traits did not vary with soil type, it was impossible for our regression models to predict intraspecific variation in rock-derived nutrients via constellation effects. This study demonstrates the value of spectroscopy for rapid, non-destructive estimation of foliar traits across species, but highlights problems with predicting intraspecific variation indirectly. We discuss the implications of these findings for mapping functional traits by airborne imaging spectroscopy.
Positive selection in the SLC11A1 gene in the family Equidae.
Bayerova, Zuzana; Janova, Eva; Matiasovic, Jan; Orlando, Ludovic; Horin, Petr
2016-05-01
Immunity-related genes are a suitable model for studying effects of selection at the genomic level. Some of them are highly conserved due to functional constraints and purifying selection, while others are variable and change quickly to cope with the variation of pathogens. The SLC11A1 gene encodes a transporter protein mediating antimicrobial activity of macrophages. Little is known about the patterns of selection shaping this gene during evolution. Although it is a typical evolutionarily conserved gene, functionally important polymorphisms associated with various diseases were identified in humans and other species. We analyzed the genomic organization, genetic variation, and evolution of the SLC11A1 gene in the family Equidae to identify patterns of selection within this important gene. SLC11A1 nucleotide sequences were shown to be highly conserved in ten equid species, with more than 97 % sequence identity across the family. Single nucleotide polymorphisms (SNPs) were found in the coding and noncoding regions of the gene. Seven codon sites were identified to be under strong purifying selection. Codons located in three regions, including the glycosylated extracellular loop, were shown to be under diversifying selection. A 3-bp indel resulting in a deletion of amino acid 321 in the predicted protein was observed in all horses, while this residue has been maintained in all other equid species. This codon, which is part of an N-glycosylation site, was found to be under positive selection. Interspecific variation in the presence of predicted N-glycosylation sites was observed.
Bison PRNP genotyping and potential association with Brucella spp. seroprevalence
Seabury, C.M.; Halbert, N.D.; Gogan, P.J.P.; Templeton, J.W.; Derr, J.N.
2005-01-01
The implication that host cellular prion protein (PrPC) may function as a cell surface receptor and/or portal protein for Brucella abortus in mice prompted an evaluation of nucleotide and amino acid variation within exon 3 of the prion protein gene (PRNP) for six US bison populations. A non-synonymous single nucleotide polymorphism (T50C), resulting in the predicted amino acid replacement M17T (Met → Thr), was identified in each population. To date, no variation (T50: Met) has been detected at the corresponding exon 3 nucleotide and/or amino acid position for domestic cattle. Notably, 80% (20 of 25) of the Yellowstone National Park bison possessing the C/C genotype were Brucella spp. seropositive, representing a significant (P = 0.021) association between seropositivity and the C/C genotypic class. Moreover, significant differences in the distribution of PRNP exon 3 alleles and genotypes were detected between Yellowstone National Park bison and three bison populations that were either founded from seronegative stock or previously subjected to test-and-slaughter management to eradicate brucellosis. Unlike domestic cattle, no indel polymorphisms were detected within the corresponding regions of the putative bison PRNP promoter, intron 1, octapeptide repeat region or 3′-untranslated region for any population examined. This study provides the first evidence of a potential association between nucleotide variation within PRNP exon 3 and the presence of Brucella spp. antibodies in bison, implicating PrPC in the natural resistance of bison to brucellosis infection. © 2005 International Society for Animal Genetics.
Abdallah, Abdallah M.; Hill-Cawthorne, Grant A.; Otto, Thomas D.; Coll, Francesc; Guerra-Assunção, José Afonso; Gao, Ge; Naeem, Raeece; Ansari, Hifzur; Malas, Tareq B.; Adroub, Sabir A.; Verboom, Theo; Ummels, Roy; Zhang, Huoming; Panigrahi, Aswini Kumar; McNerney, Ruth; Brosch, Roland; Clark, Taane G.; Behr, Marcel A.; Bitter, Wilbert; Pain, Arnab
2015-01-01
Although Bacillus Calmette-Guérin (BCG) vaccines against tuberculosis have been available for more than 90 years, their effectiveness has been hindered by variable protective efficacy and a lack of lasting memory responses. One factor contributing to this variability may be the diversity of the BCG strains that are used around the world, in part from genomic changes accumulated during vaccine production and their resulting differences in gene expression. We have compared the genomes and transcriptomes of a global collection of fourteen of the most widely used BCG strains at single base-pair resolution. We have also used quantitative proteomics to identify key differences in expression of proteins across five representative BCG strains of the four tandem duplication (DU) groups. We provide a comprehensive map of single nucleotide polymorphisms (SNPs), copy number variation and insertions and deletions (indels) across fourteen BCG strains. Genome-wide SNP characterization allowed the construction of a new and robust phylogenic genealogy of BCG strains. Transcriptional and proteomic profiling revealed a metabolic remodeling in BCG strains that may be reflected by altered immunogenicity and possibly vaccine efficacy. Together, these integrated-omic data represent the most comprehensive catalogue of genetic variation across a global collection of BCG strains. PMID:26487098
The humankind genome: from genetic diversity to the origin of human diseases.
Belizário, Jose E
2013-12-01
Genome-wide association studies have failed to establish common variant risk for the majority of common human diseases. The underlying reasons for this failure are explained by recent studies of resequencing and comparison of over 1200 human genomes and 10 000 exomes, together with the delineation of DNA methylation patterns (epigenome) and full characterization of coding and noncoding RNAs (transcriptome) being transcribed. These studies have provided the most comprehensive catalogues of functional elements and genetic variants that are now available for global integrative analysis and experimental validation in prospective cohort studies. With these datasets, researchers will have unparalleled opportunities for the alignment, mining, and testing of hypotheses for the roles of specific genetic variants, including copy number variations, single nucleotide polymorphisms, and indels as the cause of specific phenotypes and diseases. Through the use of next-generation sequencing technologies for genotyping and standardized ontological annotation to systematically analyze the effects of genomic variation on humans and model organism phenotypes, we will be able to find candidate genes and new clues for disease's etiology and treatment. This article describes essential concepts in genetics and genomic technologies as well as the emerging computational framework to comprehensively search websites and platforms available for the analysis and interpretation of genomic data.
Ferret: a user-friendly Java tool to extract data from the 1000 Genomes Project.
Limou, Sophie; Taverner, Andrew M; Winkler, Cheryl A
2016-07-15
The 1000 Genomes (1KG) Project provides a near-comprehensive resource on human genetic variation in worldwide reference populations. 1KG variants can be accessed through a browser and through the raw and annotated data that are regularly released on an ftp server. We developed Ferret, a user-friendly Java tool, to easily extract genetic variation information from these large and complex data files. From a locus, gene(s) or SNP(s) of interest, Ferret retrieves genotype data for 1KG SNPs and indels, and computes allelic frequencies for 1KG populations and optionally, for the Exome Sequencing Project populations. By converting the 1KG data into files that can be imported into popular pre-existing tools (e.g. PLINK and HaploView), Ferret offers a straightforward way, even for non-bioinformatics specialists, to manipulate, explore and merge 1KG data with the user's dataset, as well as visualize linkage disequilibrium pattern, infer haplotypes and design tagSNPs. Availability: the Ferret tool and source code are publicly available at http://limousophie35.github.io/Ferret/. Contact: ferret@nih.gov. Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
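Computing allelic frequencies from genotype data reduces to counting alleles across individuals. The sketch below is not Ferret's code (Ferret is a Java tool whose internals are not described here); it is a minimal Python illustration that assumes genotypes are given as VCF-style GT strings such as "0|1" or "0/1", with "." marking a missing allele. The function name and the sample genotypes are hypothetical.

```python
import re
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from VCF-style GT strings (e.g. "0|1", "1/1").

    Alleles are split on the phased ("|") or unphased ("/") separator;
    missing alleles (".") are ignored.
    """
    counts = Counter()
    for gt in genotypes:
        for allele in re.split(r"[|/]", gt):
            if allele != ".":
                counts[allele] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no called alleles")
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical genotypes for one variant in one population.
print(allele_frequencies(["0|0", "0|1", "1|1", "./."]))
```

Running the same function separately on each population's samples would give the per-population frequencies that the abstract describes.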
Jheng, Cheng-Fong; Chen, Tien-Chih; Lin, Jhong-Yi; Chen, Ting-Chieh; Wu, Wen-Luan; Chang, Ching-Chun
2012-07-01
The chloroplast genome of Phalaenopsis equestris was determined and compared to those of Phalaenopsis aphrodite and Oncidium Gower Ramsey in Orchidaceae. The chloroplast genome of P. equestris is 148,959 bp, and a pair of inverted repeats (25,846 bp) separates the genome into large single-copy (85,967 bp) and small single-copy (11,300 bp) regions. The genome encodes 109 genes, including 4 rRNA, 30 tRNA and 75 protein-coding genes, but loses four ndh genes (ndhA, E, F and H) and seven other ndh genes are pseudogenes. The rate of inter-species variation between the two moth orchids was 0.74% (1107 sites) for single nucleotide substitution and 0.24% for insertions (161 sites; 1388 bp) and deletions (189 sites; 1393 bp). The IR regions have a lower rate of nucleotide substitution (3.5-5.8-fold) and indels (4.3-7.1-fold) than single-copy regions. The intergenic spacers are the most divergent, and based on the length variation of the three intergenic spacers, 11 native Phalaenopsis orchids could be successfully distinguished. The coding genes, IR junction and RNA editing sites are relatively more conserved between the two moth orchids than between those of Phalaenopsis and Oncidium spp. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Chen, Jun; Wang, Bo; Zhang, Yueli; Yue, Xiaopeng; Li, Zhaohong; Liu, Kede
2017-06-01
Rapeseed (Brassica napus L.) is one of the most important oil crops worldwide. Seed-related traits, including oil content (OC), silique length (SL), seeds per silique (SS), and seed weight (SW), are primary targets for oil yield improvement. To dissect the genetic basis of these traits, 192 recombinant inbred lines (RILs) were derived from two parents with distinct oil content and silique length. A high-density linkage map with a total length of 1610.4 cM was constructed using 1,329 double-digestion restriction site associated DNA (ddRAD) markers, 107 insertion/deletion (INDEL) markers, and 90 well-distributed simple sequence repeat (SSR) markers. A total of 37 consensus quantitative trait loci (QTLs) were detected for the four traits, with individual QTLs explaining 3.1-12.8% of the phenotypic variation. Interestingly, one OC consensus QTL (cqOCA10b) on chromosome A10 was consistently detected in all three environments, and explained 9.8% to 12.8% of the OC variation. The locus was further delimited to an approximately 614 kb genomic region, in which the flanking markers could be further evaluated for marker-assisted selection in rapeseed OC improvement and the candidate genes targeted for map-based cloning and genetic manipulation.
Quantifying side-chain conformational variations in protein structure
Miao, Zhichao; Cao, Yang
2016-01-01
Protein side-chain conformations are closely related to biological function. Side-chain prediction is a key step in protein design, protein docking and structure optimization. However, side-chain polymorphism is widespread in proteins, takes various forms, and has long been overlooked by side-chain prediction. Moreover, such conformational variations have not been quantitatively studied, and the correlations between these variations and residue features are vague. Here, we performed statistical analyses on large-scale data sets and found that side-chain conformational flexibility is closely related to solvent exposure, degrees of freedom and hydrophilicity. These analyses allowed us to quantify different types of side-chain variability in the PDB. The results underscore that protein side-chain conformation prediction is not a single-answer problem, leading us to reconsider the assessment approaches of side-chain prediction programs. PMID:27845406
Shimizu, Tokurou; Kitajima, Akira; Nonaka, Keisuke; Yoshioka, Terutaka; Ohta, Satoshi; Goto, Shingo; Toyoda, Atsushi; Fujiyama, Asao; Mochizuki, Takako; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu
2016-01-01
Most indigenous citrus varieties are assumed to be natural hybrids, but their parentage has so far been determined in only a few cases because of their wide genetic diversity and the low transferability of DNA markers. Here we infer the parentage of indigenous citrus varieties using simple sequence repeat and indel markers developed from various citrus genome sequence resources. Parentage tests with 122 known hybrids using the selected DNA markers certify their transferability among those hybrids. Identity tests confirm that most variant strains are selected mutants, but we find four types of kunenbo (Citrus nobilis) and three types of tachibana (Citrus tachibana) for which we suggest different origins. Structure analysis with DNA markers that are in Hardy-Weinberg equilibrium deduces three basic taxa coinciding with the current understanding of citrus ancestors. Genotyping analysis of 101 indigenous citrus varieties with 123 selected DNA markers infers the parentages of 22 indigenous citrus varieties including Satsuma, Temple, and iyo, and single parents of 45 indigenous citrus varieties, including kunenbo, C. ichangensis, and Ichang lemon by allele-sharing and parentage tests. Genotyping analysis of chloroplast and mitochondrial genomes using 11 DNA markers classifies their cytoplasmic genotypes into 18 categories and deduces the combination of seed and pollen parents. Likelihood ratio analysis verifies the inferred parentages with significant scores. The reconstructed genealogy identifies 12 types of varieties consisting of Kishu, kunenbo, yuzu, koji, sour orange, dancy, kobeni mikan, sweet orange, tachibana, Cleopatra, willowleaf mandarin, and pummelo, which have played pivotal roles in the occurrence of these indigenous varieties. The inferred parentage of the indigenous varieties confirms their hybrid origins, as found by recent studies. PMID:27902727
Ward, Jodie; Gilmore, Simon R; Robertson, James; Peakall, Rod
2009-11-01
Plant material is frequently encountered in criminal investigations but often overlooked as potential evidence. We designed a DNA-based molecular identification system for 100 Australian grasses that consisted of a series of polymerase chain reaction assays that enabled the progressive identification of grasses to different taxonomic levels. The identification system was based on DNA sequence variation at four chloroplast and two mitochondrial loci. Seventeen informative indels and 68 single-nucleotide polymorphisms were utilized as molecular markers for subfamily to species-level identification. Identifying an unknown sample to subfamily level required a minimum of four markers, and identifying it to species level required nine. The accuracy of the system was confirmed by blind tests. We have demonstrated "proof of concept" of a molecular identification system for trace botanical samples. Our evaluation suggests that the adoption of a system that combines this approach with DNA sequencing could assist the morphological identification of grasses found as forensic evidence.
46 CFR 160.037-5 - Labeling and marking.
Code of Federal Regulations, 2010 CFR
2010-10-01
..., showing in clear, indelible black lettering on an orange background, the following wording and information.... Caution: Stand with back to wind and point away from body when igniting or signal is burning. Service Life...
NASA Astrophysics Data System (ADS)
2018-05-01
Human manipulation of hydrocarbons — as fuel and raw materials for modern society — has changed our world and the indelible imprint we will leave in the rock record. Plastics alone have permeated our lives and every corner of our planet.
[A new type of flagellar structure. Type 9+n]
1977-01-01
The ultrastructural study of the Eoacanthocephala sperm cell shows a variation from 0 to 5 in the number of the axial fibers in the axoneme. All the species of the order Eoacanthocephala available to us show this variation; moreover, every individual possesses simultaneously several different structural types. So, we are dealing with a new flagellar organization: 9+n, with 0 ≤ n ≤ 5. In the Quadrigyridae and the Tenuisentidae families, n varies from 0 to 4, with a maximum of 2 for most individuals, exceptionally at 1 for some individuals. In the Neoechinorhynchidae family, n varies from 0 to 5 with a conspicuous prevalence of 3 (from 84 to 99%, according to the individual). These results prompted us to reexamine the two other orders of Acanthocephala in which the structural types 9+2 or 9+0 have been considered as fixed. Indeed, we have found a few flagella the structure of which is different from the prevalent one. It seems, therefore, that the number of the central fibers of the axoneme in the Acanthocephala sperm cell is never absolutely fixed. PMID:557042
Su, Rina; Cheng, Junhui; Chen, Dima; Bai, Yongfei; Jin, Hua; Chao, Lumengqiqige; Wang, Zhijun; Li, Junqing
2017-02-28
Grasslands worldwide are suffering from overgrazing, which greatly alters plant community structure and ecosystem functioning. However, the general effects of grazing on community structure and ecosystem function at spatial and temporal scales have rarely been examined synchronously in the same grassland. Here, during 2011-2013, we investigated community structure (cover, height, and species richness) and aboveground biomass (AGB) using 250 paired field sites (grazed vs. fenced) across three vegetation types (meadow, typical, and desert steppes) on the Inner Mongolian Plateau. Grazing, vegetation type, and year all had significant effects on cover, height, species richness, and AGB, although the primary factor influencing variations in these variables was vegetation type. Spatially, grazing significantly reduced the measured variables in meadow and typical steppes, whereas no changes were observed in desert steppe. Temporally, both linear and quadratic relationships were detected between growing season precipitation and cover, height, richness, or AGB, although specific relationships varied among observation years and grazing treatments. In each vegetation type, the observed community properties were significantly correlated with each other, and the shape of the relationship was unaffected by grazing treatment. These findings indicate that vegetation type is the most important factor to be considered in grazing management for this semi-arid grassland.
NASA Astrophysics Data System (ADS)
Randall, Jan A.; McCowan, Brenda; Collins, Kellie C.; Hooper, Stacie L.; Rogovin, Konstantin
2005-10-01
The great gerbil, Rhombomys opimus, is a highly social rodent that usually lives in family groups consisting of related females, their offspring, and an adult male. The gerbils emit alarm vocalizations in the presence of diverse predators with different hunting tactics. Alarm calls were recorded in response to three predators, a monitor lizard, hunting dog, and human, to determine whether the most common call type, the rhythmic call, is functionally referential with regard to type of predator. Results show variation in the alarm calls of both adults and subadults with the type of predator. Discriminant function analysis classified an average of 70% of calls to predator type. Call variation, however, was not limited to the predator context, because signal structure also differed by sex, age, individual callers, and family groups. These variations illustrate the flexibility of the rhythmic alarm call of the great gerbil and how it might have multiple functions and communicate in multiple contexts. Three alarm calls, variation in the rhythmic call, and vibrational signals generated from foot-drumming provide the gerbils with a varied and multi-channel acoustic repertoire.
Rostgaard Nielsen, Lene; Brandes, Ursula; Dahl Kjaer, Erik; Fjellheim, Siri
2016-06-01
Cytisus scoparius is a global invasive species that affects local flora and fauna at the intercontinental level. Its natural distribution spans across Europe, but seeds have also been moved among countries, mixing plants of native and non-native genetic origins. Hybridization between the introduced and native gene pool is likely to threaten both the native gene pool and the local flora. In this study, we address the potential threat of invasive C. scoparius to local gene pools in vulnerable heathlands. We used nuclear single nucleotide polymorphic (SNP) and simple sequence repeat (SSR) markers together with plastid SSR and indel markers to investigate the level and direction of gene flow between invasive and native heathland C. scoparius. Analyses of population structures confirmed the presence of two gene pools: one native and the other invasive. The nuclear genome of the native types was highly introgressed with the invasive genome, and we observed advanced-generation hybrids, suggesting that hybridization has been occurring for several generations. There is asymmetrical gene flow from the invasive to the native gene pool, which can be attributed to higher fecundity in the invasive individuals, measured by the number of flowers and seed pods. Strong spatial genetic structure in plastid markers and weaker structure in nuclear markers suggest that seeds spread over relatively short distances and that gene flow over longer distances is mainly facilitated by pollen dispersal. We further show that the growth habits of heathland plants become more vigorous with increased introgression from the invaders. Implications of the findings are discussed in relation to future management of invading C. scoparius. © 2016 John Wiley & Sons Ltd.
Wade, M; Tucker, I; Cunningham, P; Skinner, R; Bell, F; Lyons, T; Patten, K; Gonzalez, L; Wess, T
2013-10-01
Human hair is a major determinant of visual ethnic differentiation. Although hair types are celebrated as part of our ethnic diversity, the approach to hair care has made the assumption that hair types are structurally and chemically similar. Although this is clearly not the case at the macroscopic level, the intervention of many hair treatments is at the nanoscopic and molecular levels. The purpose of the work presented here is to identify the main nanoscopic and molecular hierarchical differences across five different ethnic hair types from hair fibres taken exclusively from the scalp. These are Afro (subdivided into elastic 'rubber' and softer non-elastic 'soft'), Chinese, European and Mulatto (mixed race). Small angle X-Ray scattering (SAXS) is a technique capable of resolving nanostructural variations in complex materials. Individual hair fibres from different ethnic hair types were used to investigate structural features found in common and also specific to each type. Simultaneous wide angle X-Ray scattering (WAXS) was used to analyse the submolecular level structure of the fibrous keratin present. The data sets from both techniques were analysed with principal component analysis (PCA) to identify underlying variables. Principal component analysis of both SAXS and WAXS data was shown to discriminate the scattering signal between different hair types. The X-ray scattering results show a common underlying keratin intermediate filament (KIF) structure. However, distinct differences were observed in the preferential orientation and intensity signal from the lipid component of the hair. In addition, differences were observed in the intensity distribution of the very low-angle sample-dependent diffuse scatter surrounding the 'beamstop.' The results indicate that the fibrous keratin scaffold remains consistent between ethnic hair types.
The hierarchies made by these may be modulated by variation in the content of keratin-associated proteins (KAPs) and lipids that alter the interfacial structures and lead to macroscopic differences in hair morphology. © 2013 Society of Cosmetic Scientists and the Société Française de Cosmétologie.
Niinemets, Ülo; Keenan, Trevor F; Hallik, Lea
2015-02-01
Extensive within-canopy light gradients importantly affect the photosynthetic productivity of leaves in different canopy positions and lead to light-dependent increases in foliage photosynthetic capacity per area (AA). However, the controls on AA variations by changes in underlying traits are poorly known. We constructed an unprecedented worldwide database including 831 within-canopy gradients with standardized light estimates for 304 species belonging to major vascular plant functional types, and analyzed within-canopy variations in 12 key foliage structural, chemical and physiological traits by quantitative separation of the contributions of different traits to photosynthetic acclimation. Although the light-dependent increase in AA is surprisingly similar in different plant functional types, they differ fundamentally in the share of the controls on AA by constituent traits. Species with high rates of canopy development and leaf turnover, exhibiting highly dynamic light environments, actively change AA by nitrogen reallocation among and partitioning within leaves. By contrast, species with slow leaf turnover exhibit a passive AA acclimation response, primarily determined by the acclimation of leaf structure to growth light. This review emphasizes that different combinations of traits are responsible for within-canopy photosynthetic acclimation in different plant functional types, and solves an old enigma of the role of mass- vs area-based traits in vegetation acclimation. © 2014 The Authors. New Phytologist © 2014 New Phytologist Trust.
2011-01-01
Background: A number of molecular marker linkage maps have been developed for melon (Cucumis melo L.) over the last two decades. However, these maps were constructed using different marker sets, thus, making comparative analysis among maps difficult. In order to solve this problem, a consensus genetic map in melon was constructed using primarily highly transferable anchor markers that have broad potential use for mapping, synteny, and comparative quantitative trait loci (QTL) analysis, increasing breeding effectiveness and efficiency via marker-assisted selection (MAS). Results: Under the framework of the International Cucurbit Genomics Initiative (ICuGI, http://www.icugi.org), an integrated genetic map has been constructed by merging data from eight independent mapping experiments using a genetically diverse array of parental lines. The consensus map spans 1150 cM across the 12 melon linkage groups and is composed of 1592 markers (640 SSRs, 330 SNPs, 252 AFLPs, 239 RFLPs, 89 RAPDs, 15 IMAs, 16 indels and 11 morphological traits) with a mean marker density of 0.72 cM/marker. One hundred and ninety-six of these markers (157 SSRs, 32 SNPs, 6 indels and 1 RAPD) were newly developed, mapped or provided by industry representatives as released markers, including 27 SNPs and 5 indels from genes involved in the organic acid metabolism and transport, and 58 EST-SSRs. Additionally, 85 of 822 SSR markers contributed by Syngenta Seeds were included in the integrated map. In addition, 370 QTL controlling 62 traits from 18 previously reported mapping experiments using genetically diverse parental genotypes were also integrated into the consensus map. Some QTL associated with economically important traits detected in separate studies mapped to similar genomic positions.
For example, independently identified QTL controlling fruit shape were mapped on similar genomic positions, suggesting that such QTL are possibly responsible for the phenotypic variability observed for this trait in a broad array of melon germplasm. Conclusions: Even though relatively unsaturated genetic maps in a diverse set of melon market types have been published, the integrated saturated map presented herein should be considered the initial reference map for melon. Most of the mapped markers contained in the reference map are polymorphic in diverse collection of germplasm, and thus are potentially transferrable to a broad array of genetic experimentation (e.g., integration of physical and genetic maps, colinearity analysis, map-based gene cloning, epistasis dissection, and marker-assisted selection). PMID:21797998
Diaz, Aurora; Fergany, Mohamed; Formisano, Gelsomina; Ziarsolo, Peio; Blanca, José; Fei, Zhanjun; Staub, Jack E; Zalapa, Juan E; Cuevas, Hugo E; Dace, Gayle; Oliver, Marc; Boissot, Nathalie; Dogimont, Catherine; Pitrat, Michel; Hofstede, René; van Koert, Paul; Harel-Beja, Rotem; Tzuri, Galil; Portnoy, Vitaly; Cohen, Shahar; Schaffer, Arthur; Katzir, Nurit; Xu, Yong; Zhang, Haiying; Fukino, Nobuko; Matsumoto, Satoru; Garcia-Mas, Jordi; Monforte, Antonio J
2011-07-28
A number of molecular marker linkage maps have been developed for melon (Cucumis melo L.) over the last two decades. However, these maps were constructed using different marker sets, thus, making comparative analysis among maps difficult. In order to solve this problem, a consensus genetic map in melon was constructed using primarily highly transferable anchor markers that have broad potential use for mapping, synteny, and comparative quantitative trait loci (QTL) analysis, increasing breeding effectiveness and efficiency via marker-assisted selection (MAS). Under the framework of the International Cucurbit Genomics Initiative (ICuGI, http://www.icugi.org), an integrated genetic map has been constructed by merging data from eight independent mapping experiments using a genetically diverse array of parental lines. The consensus map spans 1150 cM across the 12 melon linkage groups and is composed of 1592 markers (640 SSRs, 330 SNPs, 252 AFLPs, 239 RFLPs, 89 RAPDs, 15 IMAs, 16 indels and 11 morphological traits) with a mean marker density of 0.72 cM/marker. One hundred and ninety-six of these markers (157 SSRs, 32 SNPs, 6 indels and 1 RAPD) were newly developed, mapped or provided by industry representatives as released markers, including 27 SNPs and 5 indels from genes involved in the organic acid metabolism and transport, and 58 EST-SSRs. Additionally, 85 of 822 SSR markers contributed by Syngenta Seeds were included in the integrated map. In addition, 370 QTL controlling 62 traits from 18 previously reported mapping experiments using genetically diverse parental genotypes were also integrated into the consensus map. Some QTL associated with economically important traits detected in separate studies mapped to similar genomic positions. 
For example, independently identified QTL controlling fruit shape were mapped on similar genomic positions, suggesting that such QTL are possibly responsible for the phenotypic variability observed for this trait in a broad array of melon germplasm. Even though relatively unsaturated genetic maps in a diverse set of melon market types have been published, the integrated saturated map presented herein should be considered the initial reference map for melon. Most of the mapped markers contained in the reference map are polymorphic in diverse collection of germplasm, and thus are potentially transferrable to a broad array of genetic experimentation (e.g., integration of physical and genetic maps, colinearity analysis, map-based gene cloning, epistasis dissection, and marker-assisted selection).
Indel Group in Genomes (IGG) Molecular Genetic Markers
Burkart-Waco, Diana; Kuppu, Sundaram; Britt, Anne; Chetelat, Roger
2016-01-01
Genetic markers are essential when developing or working with genetically variable populations. Indel Group in Genomes (IGG) markers are primer pairs that amplify single-locus sequences that differ in size for two or more alleles. They are attractive for their ease of use for rapid genotyping and their codominant nature. Here, we describe a heuristic algorithm that uses a k-mer-based approach to search two or more genome sequences to locate polymorphic regions suitable for designing candidate IGG marker primers. As input to the IGG pipeline software, the user provides genome sequences and the desired amplicon sizes and size differences. Primer sequences flanking polymorphic insertions/deletions are produced as output. IGG marker files for three sets of genomes, Solanum lycopersicum/Solanum pennellii, Arabidopsis (Arabidopsis thaliana) Columbia-0/Landsberg erecta-0 accessions, and S. lycopersicum/S. pennellii/Solanum tuberosum (three-way polymorphic) are included. PMID:27436831
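The abstract above describes a k-mer-based search for size-polymorphic regions only at a high level. As a rough illustration of the general idea, and not the IGG pipeline's actual algorithm, the following Python sketch uses k-mers that occur exactly once in each of two sequences as anchors and reports adjacent anchor pairs whose spacing differs between the sequences. The function names, the parameters `k` and `min_diff`, and the toy sequences are all assumptions made for this example; primer design and amplicon-size constraints are not covered.

```python
from collections import Counter


def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def candidate_indel_regions(seq_a, seq_b, k=8, min_diff=3):
    """Return (a_start, a_end, b_start, b_end, size_diff) tuples.

    Anchors are k-mers unique in both sequences. Two anchors that are
    adjacent in seq_a and collinear in seq_b flank a candidate indel when
    the distance between them differs by at least min_diff bases.
    size_diff is positive when the region is longer in seq_a.
    """
    pos_a = unique_kmer_positions(seq_a, k)
    pos_b = unique_kmer_positions(seq_b, k)
    # Shared unique k-mers, ordered by their position in seq_a.
    anchors = sorted((pos_a[m], pos_b[m]) for m in pos_a.keys() & pos_b.keys())
    regions = []
    for (a1, b1), (a2, b2) in zip(anchors, anchors[1:]):
        if b2 <= b1:
            continue  # anchors not collinear in seq_b; skip (possible rearrangement)
        size_diff = (a2 - a1) - (b2 - b1)
        if abs(size_diff) >= min_diff:
            regions.append((a1, a2 + k, b1, b2 + k, size_diff))
    return regions


# Toy example (made-up sequences): genome_a carries a 10-base insertion.
left = "ACGTTGCATCGGACCAGGAC"
right = "GCAGGCACATTCGATGCCTA"
genome_a = left + "TTTTTTTTTT" + right
genome_b = left + right
for region in candidate_indel_regions(genome_a, genome_b):
    print(region)
```

In a marker-design setting, the flanking anchor coordinates returned here would be the natural places to look for primer sites, since they are single-copy in both genomes; that step is left out of the sketch.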
Genetic and Functional Dissection of HTRA1 and LOC387715 in Age-Related Macular Degeneration
Zeng, Jiexi; Lu, Fang; Sun, Xufang; Zhao, Chao; Wang, Kevin; Davey, Lisa; Chen, Haoyu; London, Nyall; Muramatsu, Daisuke; Salasar, Francesca; Carmona, Ruben; Kasuga, Daniel; Wang, Xiaolei; Bedell, Matthew; Dixie, Manjuxia; Zhao, Peiquan; Yang, Ruifu; Gibbs, Daniel; Liu, Xiaoqi; Li, Yan; Li, Cai; Li, Yuanfeng; Campochiaro, Betsy; Constantine, Ryan; Zack, Donald J.; Campochiaro, Peter; Fu, Yinbin; Li, Dean Y.; Katsanis, Nicholas; Zhang, Kang
2010-01-01
A common haplotype on 10q26 influences the risk of age-related macular degeneration (AMD) and encompasses two genes, LOC387715 and HTRA1. Recent data have suggested that loss of LOC387715, mediated by an insertion/deletion (in/del) that destabilizes its message, is causally related with the disorder. Here we show that loss of LOC387715 is insufficient to explain AMD susceptibility, since a nonsense mutation (R38X) in this gene that leads to loss of its message resides in a protective haplotype. At the same time, the common disease haplotype tagged by the in/del and rs11200638 has an effect on the transcriptional upregulation of the adjacent gene, HTRA1. These data implicate increased HTRA1 expression in the pathogenesis of AMD and highlight the importance of exploring multiple functional consequences of alleles in haplotypes that confer susceptibility to complex traits. PMID:20140183
Françoso, Elaine; Gomes, Fernando; Arias, Maria Cristina
2016-07-01
Nuclear mitochondrial DNA insertions (NUMTs) are mitochondrial DNA sequences that have been transferred into the nucleus and are recognized by the presence of indels and stop codons. Although NUMTs have been identified in a diverse range of species, their discovery was frequently accidental. Here, our initial goal was to develop and standardize a simple method for isolating NUMTs from the nuclear genome of a single bee. Subsequently, we tested our new protocol by determining whether the indels and stop codons of the cytochrome c oxidase subunit I (COI) sequence of Melipona flavolineata are of nuclear origin. The new protocol successfully demonstrated the presence of a COI NUMT. In addition to NUMT investigations, the protocol described here will also be very useful for studying mitochondrial mutations related to diseases and for sequencing complete mitochondrial genomes with high read coverage by Next-Generation technology.
Ferreira Palha, Teresinha de Jesus Brabo; Ribeiro Rodrigues, Elzemar Martins; Cavalcante, Giovanna Chaves; Marrero, Andrea; de Souza, Ilíada Rainha; Seki Uehara, Clineu Julien; Silveira da Motta, Carlos Henrique Ares; Koshikene, Daniela; da Silva, Dayse Aparecida; de Carvalho, Elizeu Fagundes; Chemale, Gustavo; Freitas, Jorge M; Alexandre, Lídia; Paranaiba, Renato T F; Soler, Mirella Perruccio; Santos, Sidney
2015-11-01
The aim of this study was to estimate the diversity of 30 insertion/deletion (INDEL) markers (Investigator® DIPplex kit) in a sample of 519 individuals from six Brazilian states and to evaluate their applicability in forensic genetics. All INDEL markers were found to be highly polymorphic in the Brazilian population and were in Hardy-Weinberg equilibrium. To determine their forensic suitability in the Brazilian population, the markers were evaluated for discrimination power, match probability and exclusion power. The combined discrimination power (CDP), combined match power (CMP) and combined power of exclusion (CPE) were higher than 0.999999, 3.4 × 10⁻¹³ and 0.9973, respectively. Further comparison of 29 worldwide populations revealed significant genetic differences between continental populations and a closer relationship between the Brazilian and European populations. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
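To make the combined statistics quoted above more concrete, here is a minimal Python sketch of how a combined match probability and combined discrimination power can be computed for independent biallelic markers in Hardy-Weinberg equilibrium. It uses the standard textbook definitions (match probability as the sum of squared genotype frequencies, multiplied across markers); it is not the study's own software, the allele frequencies are hypothetical, and the power of exclusion is not covered.

```python
from math import prod


def match_probability(p):
    """Match probability of one biallelic marker with allele frequency p.

    Assumes Hardy-Weinberg equilibrium, so the genotype frequencies are
    p^2, 2pq and q^2; two random individuals match when they share a genotype.
    """
    q = 1.0 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(g * g for g in genotype_freqs)


def combined_match_probability(allele_freqs):
    """Product of per-marker match probabilities (markers assumed independent)."""
    return prod(match_probability(p) for p in allele_freqs)


def combined_discrimination_power(allele_freqs):
    """Probability that two random individuals differ at one marker or more."""
    return 1.0 - combined_match_probability(allele_freqs)


# Hypothetical best case: 30 markers, each with allele frequency 0.5.
best_case = [0.5] * 30
print(combined_match_probability(best_case))
print(combined_discrimination_power(best_case))
```

With both alleles at frequency 0.5, each marker has a match probability of 0.375, so 30 such markers give a combined value of roughly 1.7 × 10⁻¹³. This is the lowest value a 30-marker biallelic panel can reach under these assumptions, which puts the reported figure of 3.4 × 10⁻¹³ in context.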
Optimized MOL-PCR for Characterization of Microbial Pathogens.
Wuyts, Véronique; Roosens, Nancy H C; Bertrand, Sophie; Marchal, Kathleen; De Keersmaecker, Sigrid C J
2016-01-06
Characterization of microbial pathogens is necessary for surveillance, outbreak detection, and tracing of outbreak sources. This unit describes a multiplex oligonucleotide ligation-PCR (MOL-PCR) optimized for characterization of microbial pathogens. With MOL-PCR, different types of markers, like unique sequences, single-nucleotide polymorphisms (SNPs) and indels, can be simultaneously analyzed in one assay. This assay consists of a multiplex ligation for detection of the markers, a singleplex PCR for signal amplification, and hybridization to MagPlex-TAG beads for readout on a Luminex platform after fluorescent staining. The current protocol describes the MOL-PCR, as well as methods for DNA isolation, probe design, and data interpretation and it is based on an optimized MOL-PCR assay for subtyping of Salmonella Typhimurium. Copyright © 2016 John Wiley & Sons, Inc.
Cas9-nickase-mediated genome editing corrects hereditary tyrosinemia in rats.
Shao, Yanjiao; Wang, Liren; Guo, Nana; Wang, Shengfei; Yang, Lei; Li, Yajing; Wang, Mingsong; Yin, Shuming; Han, Honghui; Zeng, Li; Zhang, Ludi; Hui, Lijian; Ding, Qiurong; Zhang, Jiqin; Geng, Hongquan; Liu, Mingyao; Li, Dali
2018-05-04
Hereditary tyrosinemia type I (HTI) is a metabolic genetic disorder caused by mutation of fumarylacetoacetate hydrolase (FAH). Because of the accumulation of toxic metabolites, HTI causes severe liver cirrhosis, liver failure, and even hepatocellular carcinoma. HTI is an ideal model for gene therapy, and several strategies have been shown to ameliorate HTI symptoms in animal models. Although CRISPR/Cas9-mediated genome editing is able to correct the Fah mutation in mouse models, WT Cas9 induces numerous undesired mutations that have raised safety concerns for clinical applications. To develop a new method for gene correction with high fidelity, we generated a Fah mutant rat model to investigate whether Cas9 nickase (Cas9n)-mediated genome editing can efficiently correct the Fah mutation. First, we confirmed that Cas9n rarely induces indels in both on-target and off-target sites in cell lines. Using WT Cas9 as a positive control, we delivered Cas9n and the repair donor template/single guide (sg)RNA through adenoviral vectors into HTI rats. Analyses of the initial genome editing efficiency indicated that only WT Cas9 but not Cas9n causes indels at the on-target site in the liver tissue. After receiving either Cas9n or WT Cas9-mediated gene correction therapy, HTI rats gained weight steadily and survived. Fah-expressing hepatocytes occupied over 95% of the liver tissue 9 months after the treatment. Moreover, CRISPR/Cas9-mediated gene therapy prevented the progression of liver cirrhosis, a phenotype that could not be recapitulated in the HTI mouse model. These results strongly suggest that Cas9n-mediated genome editing is a valuable and safe gene therapy strategy for this genetic disease. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.
McRobb, Evan; Sarovich, Derek S; Price, Erin P; Kaestli, Mirjam; Mayo, Mark; Keim, Paul; Currie, Bart J
2015-04-01
Melioidosis, a disease of public health importance in Southeast Asia and northern Australia, is caused by the Gram-negative soil bacillus Burkholderia pseudomallei. Melioidosis is typically acquired through environmental exposure, and case clusters are rare, even in regions where the disease is endemic. B. pseudomallei is classed as a tier 1 select agent by the Centers for Disease Control and Prevention; from a biodefense perspective, source attribution is vital in an outbreak scenario to rule out a deliberate release. Two cases of melioidosis within a 3-month period at a residence in rural northern Australia prompted an investigation to determine the source of exposure. B. pseudomallei isolates from the property's groundwater supply matched the multilocus sequence type of the clinical isolates. Whole-genome sequencing confirmed the water supply as the probable source of infection in both cases, with the clinical isolates differing from the likely infecting environmental strain by just one single nucleotide polymorphism (SNP) each. For the first time, we report a phylogenetic analysis of genomewide insertion/deletion (indel) data, an approach conventionally viewed as problematic due to high mutation rates and homoplasy. Our whole-genome indel analysis was concordant with the SNP phylogeny, and these two combined data sets provided greater resolution and a better fit with our epidemiological chronology of events. Collectively, this investigation represents a highly accurate account of source attribution in a melioidosis outbreak and gives further insight into a frequently overlooked reservoir of B. pseudomallei. Our methods and findings have important implications for outbreak source tracing of this bacterium and other highly recombinogenic pathogens. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
McRobb, Evan; Kaestli, Mirjam; Mayo, Mark; Keim, Paul
2015-01-01
Melioidosis, a disease of public health importance in Southeast Asia and northern Australia, is caused by the Gram-negative soil bacillus Burkholderia pseudomallei. Melioidosis is typically acquired through environmental exposure, and case clusters are rare, even in regions where the disease is endemic. B. pseudomallei is classed as a tier 1 select agent by the Centers for Disease Control and Prevention; from a biodefense perspective, source attribution is vital in an outbreak scenario to rule out a deliberate release. Two cases of melioidosis within a 3-month period at a residence in rural northern Australia prompted an investigation to determine the source of exposure. B. pseudomallei isolates from the property's groundwater supply matched the multilocus sequence type of the clinical isolates. Whole-genome sequencing confirmed the water supply as the probable source of infection in both cases, with the clinical isolates differing from the likely infecting environmental strain by just one single nucleotide polymorphism (SNP) each. For the first time, we report a phylogenetic analysis of genomewide insertion/deletion (indel) data, an approach conventionally viewed as problematic due to high mutation rates and homoplasy. Our whole-genome indel analysis was concordant with the SNP phylogeny, and these two combined data sets provided greater resolution and a better fit with our epidemiological chronology of events. Collectively, this investigation represents a highly accurate account of source attribution in a melioidosis outbreak and gives further insight into a frequently overlooked reservoir of B. pseudomallei. Our methods and findings have important implications for outbreak source tracing of this bacterium and other highly recombinogenic pathogens. PMID:25631791
AAuAl (A = Ca, Sc, and Ti): Peierls Distortion, Atomic Coloring, and Structural Competition
Pham, Joyce; Miller, Gordon J.
2018-04-02
Using density functional theory, the crystal structure variation of AAuAl (A = Ca, Sc, and Ti) from orthorhombic Co₂Si-type to distorted hexagonal Fe₂P-type and then Ni₂In-type structures is shown to correlate with their electronic structures and valence electron counts, sizes of the active metals A, and site preferences for Au and Al atoms, which are arranged to maximize Au–Al nearest neighbor contacts. An evaluation of chemical pressure imposed by the varying A metals using total energy vs volume calculations indicates that larger unit cell volumes favor the orthorhombic structure, whereas smaller volumes favor the hexagonal structures. The electronic origin of the Mg₂Ga-type crystal structure of ScAuAl, refined as a distorted Fe₂P-type supercell doubled along the c-axis, indicates a Peierls-type distortion mechanism of the Au chains along the c-axis.
AAuAl (A = Ca, Sc, and Ti): Peierls Distortion, Atomic Coloring, and Structural Competition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pham, Joyce; Miller, Gordon J.
Using density functional theory, the crystal structure variation of AAuAl (A = Ca, Sc, and Ti) from orthorhombic Co₂Si-type to distorted hexagonal Fe₂P-type and then Ni₂In-type structures is shown to correlate with their electronic structures and valence electron counts, sizes of the active metals A, and site preferences for Au and Al atoms, which are arranged to maximize Au–Al nearest neighbor contacts. An evaluation of chemical pressure imposed by the varying A metals using total energy vs volume calculations indicates that larger unit cell volumes favor the orthorhombic structure, whereas smaller volumes favor the hexagonal structures. The electronic origin of the Mg₂Ga-type crystal structure of ScAuAl, refined as a distorted Fe₂P-type supercell doubled along the c-axis, indicates a Peierls-type distortion mechanism of the Au chains along the c-axis.
Frahry, Matthew Blake; Sun, Cheng; Chong, Rebecca A; Mueller, Rachel Lockridge
2015-02-01
Across the tree of life, species vary dramatically in nuclear genome size. Mutations that add or remove sequences from genomes (insertions or deletions, or indels) are the ultimate source of this variation. Differences in the tempo and mode of insertion and deletion across taxa have been proposed to contribute to evolutionary diversity in genome size. Among vertebrates, most of the largest genomes are found within the salamanders, an amphibian clade with genome sizes ranging from ~14 to ~120 Gb. Salamander genomes have been shown to experience slower rates of DNA loss through small (i.e., <30 bp) deletions than do other vertebrate genomes. However, no studies have addressed DNA loss from salamander genomes resulting from larger deletions. Here, we focus on one type of large deletion: ectopic-recombination-mediated removal of LTR retrotransposon sequences. In ectopic recombination, double-strand breaks are repaired using a "wrong" (i.e., ectopic, or non-allelic) template sequence, typically another locus of similar sequence. When breaks occur within the LTR portions of LTR retrotransposons, ectopic-recombination-mediated repair can produce deletions that remove the internal transposon sequence and the equivalent of one of the two LTR sequences. These deletions leave a signature in the genome: a solo LTR sequence. We compared levels of solo LTRs in the genomes of four salamander species with levels present in five vertebrates with smaller genomes. Our results demonstrate that salamanders have low levels of solo LTRs, suggesting that ectopic-recombination-mediated deletion of LTR retrotransposons occurs more slowly than in other vertebrates with smaller genomes.
Comparison of Genomic and Epigenomic Expression in Monozygotic Twins Discordant for Rett Syndrome
Miyake, Kunio; Yang, Chunshu; Minakuchi, Yohei; Ohori, Kenta; Soutome, Masaki; Hirasawa, Takae; Kazuki, Yasuhiro; Adachi, Noboru; Suzuki, Seiko; Itoh, Masayuki; Goto, Yu-ichi; Andoh, Tomoko; Kurosawa, Hiroshi; Akamatsu, Wado; Ohyama, Manabu; Okano, Hideyuki; Oshimura, Mitsuo; Sasaki, Masayuki; Toyoda, Atsushi; Kubota, Takeo
2013-01-01
Monozygotic (identical) twins have been widely used in genetic studies to determine the relative contributions of heredity and the environment in human diseases. Discordance in disease manifestation between affected monozygotic twins has been attributed to either environmental factors or different patterns of X chromosome inactivation (XCI). However, recent studies have identified genetic and epigenetic differences between monozygotic twins, thereby challenging the accepted experimental model for distinguishing the effects of nature and nurture. Here, we report the genomic and epigenomic sequences in skin fibroblasts of a discordant monozygotic twin pair with Rett syndrome, an X-linked neurodevelopmental disorder characterized by autistic features, epileptic seizures, gait ataxia and stereotypical hand movements. The twins shared the same de novo mutation in exon 4 of the MECP2 gene (G269AfsX288), which was paternal in origin and occurred during spermatogenesis. The XCI patterns in the twins did not differ in lymphocytes, skin fibroblasts, and hair cells (which originate from ectoderm as does neuronal tissue). No reproducible differences were detected between the twins in single nucleotide polymorphisms (SNPs), insertion-deletion polymorphisms (indels), or copy number variations. Differences in DNA methylation between the twins were detected in fibroblasts in the upstream regions of genes involved in brain function and skeletal tissues such as Mohawk Homeobox (MKX), Brain-type Creatine Kinase (CKB), and FYN Tyrosine Kinase Protooncogene (FYN). The level of methylation in these upstream regions was inversely correlated with the level of gene expression. Thus, differences in DNA methylation patterns likely underlie the discordance in Rett phenotypes between the twins. PMID:23805272
Comparison of Genomic and Epigenomic Expression in Monozygotic Twins Discordant for Rett Syndrome.
Miyake, Kunio; Yang, Chunshu; Minakuchi, Yohei; Ohori, Kenta; Soutome, Masaki; Hirasawa, Takae; Kazuki, Yasuhiro; Adachi, Noboru; Suzuki, Seiko; Itoh, Masayuki; Goto, Yu-Ichi; Andoh, Tomoko; Kurosawa, Hiroshi; Oshimura, Mitsuo; Sasaki, Masayuki; Toyoda, Atsushi; Kubota, Takeo
2013-01-01
Monozygotic (identical) twins have been widely used in genetic studies to determine the relative contributions of heredity and the environment in human diseases. Discordance in disease manifestation between affected monozygotic twins has been attributed to either environmental factors or different patterns of X chromosome inactivation (XCI). However, recent studies have identified genetic and epigenetic differences between monozygotic twins, thereby challenging the accepted experimental model for distinguishing the effects of nature and nurture. Here, we report the genomic and epigenomic sequences in skin fibroblasts of a discordant monozygotic twin pair with Rett syndrome, an X-linked neurodevelopmental disorder characterized by autistic features, epileptic seizures, gait ataxia and stereotypical hand movements. The twins shared the same de novo mutation in exon 4 of the MECP2 gene (G269AfsX288), which was paternal in origin and occurred during spermatogenesis. The XCI patterns in the twins did not differ in lymphocytes, skin fibroblasts, and hair cells (which originate from ectoderm as does neuronal tissue). No reproducible differences were detected between the twins in single nucleotide polymorphisms (SNPs), insertion-deletion polymorphisms (indels), or copy number variations. Differences in DNA methylation between the twins were detected in fibroblasts in the upstream regions of genes involved in brain function and skeletal tissues such as Mohawk Homeobox (MKX), Brain-type Creatine Kinase (CKB), and FYN Tyrosine Kinase Protooncogene (FYN). The level of methylation in these upstream regions was inversely correlated with the level of gene expression. Thus, differences in DNA methylation patterns likely underlie the discordance in Rett phenotypes between the twins.
Meshach Paul, D; Chadah, Tania; Senthilkumar, B; Sethumadhavan, Rao; Rajasekaran, R
2017-11-03
The major candidate for multiple sulfatase deficiency is a defective formylglycine-generating enzyme (FGE). Though adequately produced, mutations in FGE stall the activation of sulfatases and prevent their activity. Missense mutations, viz. E130D, S155P, A177P, W179S, C218Y, R224W, N259I, P266L, A279V, C336R, R345C, A348P, R349Q and R349W associated with multiple sulfatase deficiency are yet to be computationally studied. Aforementioned mutants were initially screened through ws-SNPs&GO 3D program. Mutant R345C acquired the highest score, and hence was studied in detail. Discrete molecular dynamics explored structural distortions due to amino acid substitution. Therein, comparative analyses of wild type and mutant were carried out. Changes in structural contours were observed between wild type and mutant. Mutant had low conformational fluctuation, high atomic mobility and more compactness than wild type. Moreover, free energy landscape showed mutant to vary in terms of its conformational space as compared to wild type. Subsequently, wild type and mutant were subjected to single-model analyses. Mutant had lesser intra molecular interactions than wild type suggesting variations pertaining to its secondary structure. Furthermore, simulated thermal denaturation showed dissimilar pattern of hydrogen bond dilution. Effects of these variations were observed as changes in elements of secondary structure. Docking studies of mutant revealed less favourable binding energy towards its substrate as compared to wild type. Therefore, theoretical explanations for structural distortions of mutant R345C leading to multiple sulfatase deficiency were revealed. The protocol of the study could be useful to examine the effectiveness of pharmacological chaperones prior to experimental studies.
Bataillon, Thomas; Duan, Jinjie; Hvilsom, Christina; Jin, Xin; Li, Yingrui; Skov, Laurits; Glemin, Sylvain; Munch, Kasper; Jiang, Tao; Qian, Yu; Hobolth, Asger; Wang, Jun; Mailund, Thomas; Siegismund, Hans R; Schierup, Mikkel H
2015-03-30
We study genome-wide nucleotide diversity in three subspecies of extant chimpanzees using exome capture. After strict filtering, Single Nucleotide Polymorphisms and indels were called and genotyped for greater than 50% of exons at a mean coverage of 35× per individual. Central chimpanzees (Pan troglodytes troglodytes) are the most polymorphic (nucleotide diversity, θw = 0.0023 per site) followed by Eastern (P. t. schweinfurthii) chimpanzees (θw = 0.0016) and Western (P. t. verus) chimpanzees (θw = 0.0008). A demographic scenario of divergence without gene flow fits the patterns of autosomal synonymous nucleotide diversity well except for a signal of recent gene flow from Western into Eastern chimpanzees. The striking contrast in X-linked versus autosomal polymorphism and divergence previously reported in Central chimpanzees is also found in Eastern and Western chimpanzees. We show that the direction of selection statistic exhibits a strong nonmonotonic relationship with the strength of purifying selection S, making it inappropriate for estimating S. We instead use counts in synonymous versus nonsynonymous frequency classes to infer the distribution of S coefficients acting on nonsynonymous mutations in each subspecies. The strength of purifying selection we infer is congruent with the differences in effective sizes of each subspecies: Central chimpanzees are undergoing the strongest purifying selection followed by Eastern and Western chimpanzees. Coding indels show stronger selection against indels changing the reading frame than observed in human populations. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
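The θw values quoted in this abstract are per-site Watterson estimates of nucleotide diversity. The abstract does not describe how they were computed, so the following is only a minimal sketch of the textbook estimator, θw = S / (a_n · L) with a_n = 1 + 1/2 + ... + 1/(n-1), where S is the number of segregating sites, n the number of sampled sequences and L the number of sites surveyed. The function name and the example numbers are hypothetical and are not taken from the study.

```python
def watterson_theta(segregating_sites: int, n_sequences: int, n_sites: int) -> float:
    """Per-site Watterson estimator: S / (a_n * L).

    segregating_sites: number of polymorphic sites observed (S)
    n_sequences: number of sampled sequences, i.e. chromosomes (n)
    n_sites: number of sites that passed filtering (L)
    """
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    # a_n is the (n-1)-th harmonic number
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * n_sites)


if __name__ == "__main__":
    # Hypothetical illustration: 11 segregating sites among 4 sequences over 1,000 sites
    print(watterson_theta(11, 4, 1000))
```

Because a_n grows with sample size, the estimator corrects for the fact that larger samples reveal more segregating sites, which is what makes values from subspecies sampled at different depths comparable.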
Chintalapati, Manjusha; Dannemann, Michael; Prüfer, Kay
2017-08-04
Small insertions and deletions occur in humans at a lower rate compared to nucleotide changes, but evolve under more constraint than nucleotide changes. While the evolution of insertions and deletions has been investigated using ape outgroups, the now available genome of a Neandertal can shed light on the evolution of indels in more recent times. We used the Neandertal genome together with several primate outgroup genomes to differentiate between human insertion/deletion changes that likely occurred before the split from Neandertals and those that likely arose later. Changes that pre-date the split from Neandertals show a smaller proportion of deletions than those that occurred later. The presence of a Neandertal-shared allele in Europeans or Asians but the absence in Africans was used to detect putatively introgressed indels in Europeans and Asians. A larger proportion of these variants reside in intergenic regions compared to other modern human variants, and some variants are linked to SNPs that have been associated with traits in modern humans. Our results are in agreement with earlier results that suggested that deletions evolve under more constraint than insertions. When considering Neandertal introgressed variants, we find some evidence that negative selection affected these variants more than other variants segregating in modern humans. Among introgressed variants we also identify indels that may influence the phenotype of their carriers. In particular an introgressed deletion associated with a decrease in the time to menarche may constitute an example of a former Neandertal-specific trait contributing to modern human phenotypic diversity.
The Role of Genetic Ancestry in Brazilian Patients With Primary Congenital Glaucoma.
Rolim, Hévila; Cronemberger, Sebastião; Rangel, Hayana; Batista, Wagner D; Bastos-Rodrigues, Luciana; De Marco, Luiz
2016-01-01
The relationship between clinical data and genetic ancestry in Brazilian patients with primary congenital glaucoma (PCG) was studied. Thirty patients with PCG and 60 unrelated controls underwent a complete ophthalmological examination. The PCG inclusion criterion was prior surgery with a minimum follow-up of 6 months after the last surgical procedure. Clinical data were recorded and DNA from each individual was extracted and genotyped for a panel of 40 validated ancestry-informative insertion-deletion DNA polymorphisms (indels). Eighteen (60%) children had bilateral disease and 16 (53.3%) were male. The mean age at diagnosis was 6.3 months and surgical follow-up time varied from 8 to 85 months. For the PCG group, the proportion of Europeans, Africans, and Amerindians was 0.784±0.044 (mean±SEM), 0.149±0.035, and 0.067±0.023, respectively, whereas for the control group it was 0.730±0.048, 0.132±0.034, and 0.138±0.032, respectively. An increased proportion of African indels was associated with worse surgical prognosis (P=0.036). There was also a statistically significant (P<0.05) positive correlation between axial length and African component (initial: R=0.625; final: R=0.567). An increased proportion of African indels was associated with worse prognosis for PCG in a mixed population. Genetic ancestry markers may be helpful in assessing risk factors for surgical outcomes in PCG. Further studies are needed to unveil the role of ancestry in heterogeneous populations such as Brazilians with PCG.
Lara-Romero, Rocío; Gómez-Núñez, Luis; Cerriteño-Sánchez, José Luis; Márquez-Valdelamar, Laura; Mendoza-Elvira, Susana; Ramírez-Mendoza, Humberto; Rivera-Benítez, José Francisco
2018-04-01
In Mexico, the first outbreaks suggestive of the circulation of the porcine epidemic diarrhea virus (PEDV) were identified at the beginning of July 2013. To identify the molecular characteristics of the PEDV Spike (S) gene in Mexico, 116 samples of the intestine and diarrhea of piglets with clinical signs of porcine epidemic diarrhea (PED) were obtained. Samples were collected from 14 farms located in six states of Mexico (Jalisco, Puebla, Sonora, Veracruz, Guanajuato, and Michoacán) from 2013 to 2016. To identify PEDV, we used real-time RT-PCR to discriminate between non-INDEL and INDEL strains. We chose samples according to state and year to characterize the S gene. After amplification of the S gene, the obtained products were sequenced and assembled. The complete amino acid sequences of the spike protein were used to perform an epitope analysis, which was used to determine null mutations in regions SS2, SS6, and 2C10 compared to the sequences of G2. A phylogenetic analysis determined the circulation of G2b and INDEL strains in Mexico. However, several mutations were recorded in the collagenase equivalent (COE) region that were related to the change in polarity and charge of the amino acid residues. The PEDV strain circulating in Jalisco in 2016 has an insertion of three amino acids (232LGL234) and one change in the antigenic site of the COE region, and strains from the years 2015 and 2016 changed the index of the surface probability, which could be related to the re-emergence of disease outbreaks.
14 CFR 26.43 - Holders of and applicants for type certificates-Repairs.
Code of Federal Regulations, 2014 CFR
2014-01-01
... payload capacity of 7,500 pounds or more. (b) List of fatigue critical baseline structure. For airplanes...) Identify fatigue critical baseline structure for all airplane model variations and derivatives approved... affects fatigue critical baseline structure identified under paragraph (b)(1) of this section; (2) Perform...
14 CFR 26.43 - Holders of and applicants for type certificates-Repairs.
Code of Federal Regulations, 2011 CFR
2011-01-01
... payload capacity of 7,500 pounds or more. (b) List of fatigue critical baseline structure. For airplanes...) Identify fatigue critical baseline structure for all airplane model variations and derivatives approved... affects fatigue critical baseline structure identified under paragraph (b)(1) of this section; (2) Perform...
14 CFR 26.43 - Holders of and applicants for type certificates-Repairs.
Code of Federal Regulations, 2012 CFR
2012-01-01
... payload capacity of 7,500 pounds or more. (b) List of fatigue critical baseline structure. For airplanes...) Identify fatigue critical baseline structure for all airplane model variations and derivatives approved... affects fatigue critical baseline structure identified under paragraph (b)(1) of this section; (2) Perform...
14 CFR 26.43 - Holders of and applicants for type certificates-Repairs.
Code of Federal Regulations, 2013 CFR
2013-01-01
... payload capacity of 7,500 pounds or more. (b) List of fatigue critical baseline structure. For airplanes...) Identify fatigue critical baseline structure for all airplane model variations and derivatives approved... affects fatigue critical baseline structure identified under paragraph (b)(1) of this section; (2) Perform...
14 CFR 26.43 - Holders of and applicants for type certificates-Repairs.
Code of Federal Regulations, 2010 CFR
2010-01-01
... payload capacity of 7,500 pounds or more. (b) List of fatigue critical baseline structure. For airplanes...) Identify fatigue critical baseline structure for all airplane model variations and derivatives approved... affects fatigue critical baseline structure identified under paragraph (b)(1) of this section; (2) Perform...
Spiral phyllotaxis underlies constrained variation in Anemone (Ranunculaceae) tepal arrangement.
Kitazawa, Miho S; Fujimoto, Koichi
2018-05-01
Stabilization and variation of floral structures are indispensable for plant reproduction and evolution; however, the developmental mechanism regulating their structural robustness is largely unknown. To investigate this mechanism, we examined positional arrangement (aestivation) of excessively produced perianth organs (tepals) of six- and seven-tepaled (lobed) flowers in six Anemone species (Ranunculaceae). We found that the tepal arrangement that occurred in nature varied intraspecifically between spiral and whorled arrangements. Moreover, among the studied species, variation was commonly limited to three types, including whorls, despite five geometrically possible arrangements in six-tepaled flowers and two types among six possibilities in seven-tepaled flowers. A spiral arrangement, on the other hand, was unique to five-tepaled flowers. A spiral phyllotaxis model with stochasticity on initiating excessive primordia accounted for these limited variations in arrangement in cases when the divergence angle between preexisting primordia was less than 144°. Moreover, interspecific differences in the frequency of the observed arrangements were explained by the change of model parameters that represent meristematic growth and differential organ growth. These findings suggest that the phyllotaxis parameters are responsible for not only intraspecific stability but also interspecific difference of floral structure. Decreasing arrangements from six-tepaled to seven-tepaled Anemone flowers demonstrate that the stabilization occurs as development proceeds to increase the component (organ) number, in contrast to the intuition that the variation will be larger due to the increasing number of possible states (arrangements).
Cellulose microfibril structure: inspirations from plant diversity
NASA Astrophysics Data System (ADS)
Roberts, A. W.
2018-03-01
Cellulose microfibrils are synthesized at the plasma membrane by cellulose synthase catalytic subunits that associate to form cellulose synthesis complexes. Variation in the organization of these complexes underlies the variation in cellulose microfibril structure among diverse organisms. However, little is known about how the catalytic subunits interact to form complexes with different morphologies. We are using an evolutionary approach to investigate the roles of different catalytic subunit isoforms in organisms that have rosette-type cellulose synthesis complexes.
Caporale, Lynn Helena
2012-09-01
This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.
Röschenbleck, Joachim; Weinl, Stefan; Kudla, Jörg; Müller, Kai F.
2017-01-01
Geraniaceae are known for their unusual plastid genomes (plastomes), with the genus Pelargonium being most conspicuous with regard to plastome size and gene organization as judged by the sequenced plastomes of P. x hortorum and P. alternans. However, the hybrid origin of P. x hortorum and the uncertain phylogenetic position of P. alternans obscure the events that led to these extraordinary plastomes. Here, we examine all plastid reconfiguration hotspots for 60 Pelargonium species across all subgenera using a PCR and sequencing approach. Our reconstruction of the rearrangement history revealed four distinct plastome types. The ancestral plastome configuration in the two subgenera Magnipetala and Pelargonium is consistent with that of the P. alternans plastome, whereas that of the subgenus Parvulipetala deviates from this organization by one synapomorphic inversion in the trnNGUU–ndhF region. The plastome of P. x hortorum resembles those of one group of the subgenus Paucisignata, but differs from a second group by another inversion in the psaI–psaJ region. The number of microstructural changes and amount of repetitive DNA are generally elevated in all inverted regions. Nucleotide substitution rates correlate positively with the number of indels in all regions across the different subgenera. We also observed lineage- and species-specific changes in the gene content, including gene duplications and fragmentations. For example, the plastid rbcL–psaI region of Pelargonium contains a highly variable accD-like region. Our results suggest alternative evolutionary paths under possibly changing modes of plastid transmission and indicate the non-functionalization of the plastid accD gene in Pelargonium. PMID:28172771
Adaptive potential of genomic structural variation in human and mammalian evolution.
Radke, David W; Lee, Charles
2015-09-01
Because phenotypic innovations must be genetically heritable for biological evolution to proceed, it is natural to consider new mutation events as well as standing genetic variation as sources for their birth. Previous research has identified a number of single-nucleotide polymorphisms that underlie a subset of adaptive traits in organisms. However, another well-known class of variation, genomic structural variation, could have even greater potential to produce adaptive phenotypes, due to the variety of possible types of alterations (deletions, insertions, duplications, among others) at different genomic positions and with variable lengths. It is from these dramatic genomic alterations, and selection on their phenotypic consequences, that adaptations leading to biological diversification could be derived. In this review, using studies in humans and other mammals, we highlight examples of how phenotypic variation from structural variants might become adaptive in populations and potentially enable biological diversification. Phenotypic change arising from structural variants will be described according to their immediate effect on organismal metabolic processes, immunological response and physical features. Study of population dynamics of segregating structural variation can therefore provide a window into understanding current and historical biological diversification. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Evidence of Dynamic Crustal Deformation in Tohoku, Japan, From Time-Varying Receiver Functions
NASA Astrophysics Data System (ADS)
Porritt, R. W.; Yoshioka, S.
2017-10-01
Temporal variation of crustal structure is key to our understanding of Earth processes on human timescales. Often, we expect that the most significant structural variations are caused by strong ground shaking associated with large earthquakes, and recent studies seem to confirm this. Here we test the possibility of using P receiver functions (PRF) to isolate structural variations over time. Synthetic receiver function tests indicate that structural variation could produce PRF changes on the same order of magnitude as random noise or contamination by local earthquakes. Nonetheless, we find significant variability in observed receiver functions over time at several stations located in northeastern Honshu. Immediately following the Tohoku-oki earthquake, we observe high PRF variation clustering spatially, especially in two regions near the beginning and end of the rupture plane. Due to the depth sensitivity of PRF and the timescales over which this variability is observed, we infer this effect is primarily due to fluid migration in volcanic regions and shear stress/strength reorganization. While the noise levels in PRF are high for this type of analysis, by sampling small data sets, the computational cost is lower than other methods, such as ambient noise, thereby making PRF a useful tool for estimating temporal variations in crustal structure.
Structural basis of glycan specificity in neonate-specific bovine-human reassortant rotavirus
Hu, Liya; Ramani, Sasirekha; Czako, Rita; ...
2015-09-30
We report that strain-dependent variation of glycan recognition during initial cell attachment of viruses is a critical determinant of host specificity, tissue-tropism and zoonosis. Rotaviruses (RVs), which cause life-threatening gastroenteritis in infants and children, display significant genotype-dependent variations in glycan recognition resulting from sequence alterations in the VP8* domain of the spike protein VP4. The structural basis of this genotype-dependent glycan specificity, particularly in human RVs, remains poorly understood. Here, from crystallographic studies, we show how genotypic variations configure a novel binding site in the VP8* of a neonate-specific bovine-human reassortant to uniquely recognize either type I or type II precursor glycans, and to restrict type II glycan binding in the bovine counterpart. In conclusion, such a distinct glycan-binding site that allows differential recognition of the precursor glycans, which are developmentally regulated in the neonate gut and abundant in bovine and human milk, provides a basis for age-restricted tropism and zoonotic transmission of G10P[11] rotaviruses.
Structural basis of glycan specificity in neonate-specific bovine-human reassortant rotavirus
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hu, Liya; Ramani, Sasirekha; Czako, Rita
We report that strain-dependent variation of glycan recognition during initial cell attachment of viruses is a critical determinant of host specificity, tissue-tropism and zoonosis. Rotaviruses (RVs), which cause life-threatening gastroenteritis in infants and children, display significant genotype-dependent variations in glycan recognition resulting from sequence alterations in the VP8* domain of the spike protein VP4. The structural basis of this genotype-dependent glycan specificity, particularly in human RVs, remains poorly understood. Here, from crystallographic studies, we show how genotypic variations configure a novel binding site in the VP8* of a neonate-specific bovine-human reassortant to uniquely recognize either type I or type II precursor glycans, and to restrict type II glycan binding in the bovine counterpart. In conclusion, such a distinct glycan-binding site that allows differential recognition of the precursor glycans, which are developmentally regulated in the neonate gut and abundant in bovine and human milk, provides a basis for age-restricted tropism and zoonotic transmission of G10P[11] rotaviruses.
Bi-Hamiltonian Structure in 2-d Field Theory
NASA Astrophysics Data System (ADS)
Ferapontov, E. V.; Galvão, C. A. P.; Mokhov, O. I.; Nutku, Y.
We exhibit the bi-Hamiltonian structure of the equations of associativity (Witten-Dijkgraaf-Verlinde-Verlinde-Dubrovin equations) in 2-d topological field theory, which reduce to a single equation of Monge-Ampère type, $f_{ttt} = f_{xxt}^{2} - f_{xxx} f_{xtt}$, in the case of three primary fields. The first Hamiltonian structure of this equation is based on its representation as a 3-component system of hydrodynamic type and the second Hamiltonian structure follows from its formulation in terms of a variational principle with a degenerate Lagrangian.
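The "3-component system of hydrodynamic type" mentioned in this abstract can be written out with a standard substitution. The sketch below assumes the Monge-Ampère-type equation in the form $f_{ttt} = f_{xxt}^{2} - f_{xxx} f_{xtt}$; the labels a, b, c are introduced here for illustration and do not appear in the abstract. The first two equations are compatibility conditions on mixed derivatives, and the third is the x-derivative of the associativity equation itself.

```latex
\begin{align}
  a &= f_{xxx}, \qquad b = f_{xxt}, \qquad c = f_{xtt}, \\
  a_t &= b_x, \qquad b_t = c_x, \qquad c_t = \left(b^{2} - a\,c\right)_x .
\end{align}
```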
Litter quality versus soil microbial community controls over decomposition: a quantitative analysis
Cleveland, Cory C.; Reed, Sasha C.; Keller, Adrienne B.; Nemergut, Diana R.; O'Neill, Sean P.; Ostertag, Rebecca; Vitousek, Peter M.
2014-01-01
The possible effects of soil microbial community structure on organic matter decomposition rates have been widely acknowledged, but are poorly understood. Understanding these relationships is complicated by the fact that microbial community structure and function are likely to both affect and be affected by organic matter quality and chemistry, thus it is difficult to draw mechanistic conclusions from field studies. We conducted a reciprocal soil inoculum × litter transplant laboratory incubation experiment using samples collected from a set of sites that have similar climate and plant species composition but vary significantly in bacterial community structure and litter quality. The results showed that litter quality explained the majority of variation in decomposition rates under controlled laboratory conditions: over the course of the 162-day incubation, litter quality explained nearly two-thirds (64 %) of variation in decomposition rates, and a smaller proportion (25 %) was explained by variation in the inoculum type. In addition, the relative importance of inoculum type on soil respiration increased over the course of the experiment, and was significantly higher in microcosms with lower litter quality relative to those with higher quality litter. We also used molecular phylogenetics to examine the relationships between bacterial community composition and soil respiration in samples through time. Pyrosequencing revealed that bacterial community composition explained 32 % of the variation in respiration rates. However, equal portions (i.e., 16 %) of the variation in bacterial community composition were explained by inoculum type and litter quality, reflecting the importance of both the meta-community and the environment in bacterial assembly. 
Taken together, these results indicate that the effects of changing microbial community composition on decomposition are likely to be smaller than the potential effects of climate change and/or litter quality changes in response to increasing atmospheric CO2 concentrations or atmospheric nutrient deposition.
Single nucleotide polymorphism discovery in bovine liver using RNA-seq technology.
Pareek, Chandra Shekhar; Błaszczyk, Paweł; Dziuba, Piotr; Czarnik, Urszula; Fraser, Leyland; Sobiech, Przemysław; Pierzchała, Mariusz; Feng, Yaping; Kadarmideen, Haja N; Kumar, Dibyendu
2017-01-01
RNA-seq is a useful next-generation sequencing (NGS) technology that has been widely used to understand mammalian transcriptome architecture and function. In this study, a breed-specific RNA-seq experiment was utilized to detect putative single nucleotide polymorphisms (SNPs) in liver tissue of young bulls of the Polish Red, Polish Holstein-Friesian (HF) and Hereford breeds, and to understand the genomic variation in the three cattle breeds that may reflect differences in production traits. The RNA-seq experiment on bovine liver produced 1,071,144,072 raw paired-end reads, with an average of approximately 60 million paired-end reads per library. Breed-wise, a total of 345.06, 290.04 and 436.03 million paired-end reads were obtained from the Polish Red, Polish HF, and Hereford breeds, respectively. Burrows-Wheeler Aligner (BWA) read alignments showed that 81.35%, 82.81% and 84.21% of the mapped sequencing reads were properly paired to the Polish Red, Polish HF, and Hereford breeds, respectively. This study identified 5,641,401 SNPs and insertion and deletion (indel) positions expressed in the bovine liver, with an average of 313,411 SNPs and indels per young bull. Following the removal of the indel mutations, a total of 1,953,804, 1,527,120 and 2,053,184 raw SNPs expressed in bovine liver were identified for the Polish Red, Polish HF, and Hereford breeds, respectively. Breed-wise, three highly reliable breed-specific SNP-databases (SNP-dbs) with 31,562, 24,945 and 28,194 SNP records were constructed for the Polish Red, Polish HF, and Hereford breeds, respectively. Using a combination of stringent parameters (a depth of ≥10 mapping reads supporting the polymorphic nucleotide base and a 100% SNP ratio), 4,368, 3,780 and 3,800 SNP records were detected in the Polish Red, Polish HF, and Hereford breeds, respectively. The SNP detections using RNA-seq data were successfully validated by Kompetitive allele-specific PCR (KASP™) SNP genotyping assay.
The comprehensive QTL/CG analysis of 110 QTL/CG with RNA-seq data identified 20 monomorphic SNP hit loci (CARTPT, GAD1, GDF5, GHRH, GHRL, GRB10, IGFBPL1, IGFL1, LEP, LHX4, MC4R, MSTN, NKAIN1, PLAG1, POU1F1, SDR16C5, SH2B2, TOX, UCP3 and WNT10B) in all three cattle breeds. However, six SNP loci (CCSER1, GHR, KCNIP4, MTSS1, EGFR and NSMCE2) were identified as highly polymorphic among the cattle breeds. This study identified breed-specific SNPs with greater SNP ratio and excellent mapping coverage, as well as monomorphic and highly polymorphic putative SNP loci within QTL/CGs of bovine liver tissue. A breed-specific SNP-db constructed for bovine liver yielded nearly six million SNPs. In addition, a KASP™ SNP genotyping assay, as a reliable cost-effective method, successfully validated the breed-specific putative SNPs originating from the RNA-seq experiments.
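The stringent filter described in this abstract (at least 10 mapped reads supporting the polymorphic base, and a 100% SNP ratio) can be illustrated with a small sketch. The abstract does not define "SNP ratio" or name the filtering tool, so this example assumes the ratio is the fraction of reads at a position that carry the variant base. The `SnpCall` record, its field names and the sample loci are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical per-site summary of an RNA-seq SNP call."""
    locus: str        # e.g. "chr1:100"
    alt_reads: int    # reads supporting the polymorphic base
    total_reads: int  # all reads covering the position


def passes_filter(call: SnpCall, min_alt_reads: int = 10, min_ratio: float = 1.0) -> bool:
    """Keep a call only if it has enough supporting reads and a high enough SNP ratio."""
    if call.total_reads <= 0:
        return False
    ratio = call.alt_reads / call.total_reads
    return call.alt_reads >= min_alt_reads and ratio >= min_ratio


def filter_calls(calls: Iterable[SnpCall]) -> List[SnpCall]:
    """Apply the default stringent thresholds to a collection of calls."""
    return [c for c in calls if passes_filter(c)]


if __name__ == "__main__":
    calls = [
        SnpCall("chr1:100", alt_reads=12, total_reads=12),  # deep and fully supported
        SnpCall("chr1:250", alt_reads=9, total_reads=9),    # fully supported but too shallow
        SnpCall("chr2:40", alt_reads=15, total_reads=20),   # deep but ratio below 100%
    ]
    for call in filter_calls(calls):
        print(call.locus)
```

Requiring both conditions together is what shrinks the breed-specific sets so sharply: depth alone admits heterozygous or noisy sites, while a 100% ratio alone admits sites covered by only a handful of reads.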
von Kohn, Christopher; Kiełkowska, Agnieszka; Havey, Michael J
2013-12-01
Male-sterile (S) cytoplasm of onion is an alien cytoplasm introgressed into onion in antiquity and is widely used for hybrid seed production. Owing to the biennial generation time of onion, classical crossing takes at least 4 years to classify cytoplasms as S or normal (N) male-fertile. Molecular markers in the organellar DNAs that distinguish N and S cytoplasms are useful to reduce the time required to classify onion cytoplasms. In this research, we completed next-generation sequencing of the chloroplast DNAs of N- and S-cytoplasmic onions; we assembled and annotated the genomes in addition to identifying polymorphisms that distinguish these cytoplasms. The sizes (153 538 and 153 355 base pairs) and GC contents (36.8%) were very similar for the chloroplast DNAs of N and S cytoplasms, respectively, as expected given their close phylogenetic relationship. The size difference was primarily due to small indels in intergenic regions and a deletion in the accD gene of N-cytoplasmic onion. The structures of the onion chloroplast DNAs were similar to those of most land plants with large and small single copy regions separated by inverted repeats. Twenty-eight single nucleotide polymorphisms, two polymorphic restriction-enzyme sites, and one indel distributed across 20 chloroplast genes in the large and small single copy regions were selected and validated using diverse onion populations previously classified as N or S cytoplasmic using restriction fragment length polymorphisms. Although cytoplasmic male sterility is likely associated with the mitochondrial DNA, maternal transmission of the mitochondrial and chloroplast DNAs allows for polymorphisms in either genome to be useful for classifying onion cytoplasms to aid the development of hybrid onion cultivars.
Guo, Xianwu; Castillo-Ramírez, Santiago; González, Víctor; Bustos, Patricia; Luís Fernández-Vázquez, José; Santamaría, Rosa Isela; Arellano, Jesús; Cevallos, Miguel A; Dávila, Guillermo
2007-01-01
Background Fabaceae (legumes) is one of the largest families of flowering plants, and some members are important crops. In contrast to what we know about their great diversity or economic importance, our knowledge at the genomic level of chloroplast genomes (cpDNAs or plastomes) for these crops is limited. Results We sequenced the complete genome of the common bean (Phaseolus vulgaris cv. Negro Jamapa) chloroplast. The plastome of P. vulgaris is a 150,285 bp circular molecule. It has gene content similar to that of other legume plastomes, but contains two pseudogenes, rpl33 and rps16. A distinct inversion occurred at the junction points of trnH-GUG/rpl14 and rps19/rps8, as in adzuki bean [1]. These two pseudogenes and the inversion were confirmed in 10 varieties representing the two domestication centers of the bean. Genomic comparative analysis indicated that inversions generally occur in legume plastomes and the magnitude and localization of insertions/deletions (indels) also vary. The analysis of repeat sequences demonstrated that patterns and sequences of tandem repeats had an important impact on sequence diversification between legume plastomes and tandem repeats did not belong to dispersed repeats. Interestingly, P. vulgaris plastome had higher evolutionary rates of change on both genomic and gene levels than G. max, which could be the consequence of pressure from both mutation and natural selection. Conclusion Legume chloroplast genomes are widely diversified in gene content, gene order, indel structure, abundance and localization of repetitive sequences, intracellular sequence exchange and evolutionary rates. The P. vulgaris plastome is a rapidly evolving genome. PMID:17623083
Genome Features of “Dark-Fly”, a Drosophila Line Reared Long-Term in a Dark Environment
Zhou, Jun; Sugiyama, Yuzo; Nishimura, Osamu; Aizu, Tomoyuki; Toyoda, Atsushi; Fujiyama, Asao; Agata, Kiyokazu
2012-01-01
Organisms are remarkably adapted to diverse environments by specialized metabolisms, morphology, or behaviors. To address the molecular mechanisms underlying environmental adaptation, we have utilized a Drosophila melanogaster line, termed “Dark-fly”, which has been maintained in constant dark conditions for 57 years (1400 generations). We found that Dark-fly exhibited higher fecundity in dark than in light conditions, indicating that Dark-fly possesses some traits advantageous in darkness. Using next-generation sequencing technology, we determined the whole genome sequence of Dark-fly and identified approximately 220,000 single nucleotide polymorphisms (SNPs) and 4,700 insertions or deletions (InDels) in the Dark-fly genome compared to the genome of the Oregon-R-S strain, a control strain. 1.8% of SNPs were classified as non-synonymous SNPs (nsSNPs: i.e., they alter the amino acid sequence of gene products). Among them, we detected 28 nonsense mutations (i.e., they produce a stop codon in the protein sequence) in the Dark-fly genome. These included genes encoding an olfactory receptor and a light receptor. We also searched runs of homozygosity (ROH) regions as putative regions selected during the population history, and found 21 ROH regions in the Dark-fly genome. We identified 241 genes carrying nsSNPs or InDels in the ROH regions. These include a cluster of alpha-esterase genes that are involved in detoxification processes. Furthermore, analysis of structural variants in the Dark-fly genome showed the deletion of a gene related to fatty acid metabolism. Our results revealed unique features of the Dark-fly genome and provided a list of potential candidate genes involved in environmental adaptation. PMID:22432011
William J. Brennan Jr.: Judicial Architect of Affirmative Action.
ERIC Educational Resources Information Center
Eisler, Kim Isaac
1997-01-01
More than any other Supreme Court Justice, William Brennan worked to level the playing field for black Americans. As the recognized architect of affirmative action strategies for higher education, he left an indelible imprint on equal education in America. (SLD)
Code of Federal Regulations, 2010 CFR
2010-10-01
... issuing country must be written legibly and indelibly on the outside of the package. (ix) Customs forms... permit holder must email, fax, or mail a copy of the completed consignment document and re-export...
A reference human genome dataset of the BGISEQ-500 sequencer.
Huang, Jie; Liang, Xinming; Xuan, Yuankai; Geng, Chunyu; Li, Yuxiang; Lu, Haorong; Qu, Shoufang; Mei, Xianglin; Chen, Hongbo; Yu, Ting; Sun, Nan; Rao, Junhua; Wang, Jiahao; Zhang, Wenwei; Chen, Ying; Liao, Sha; Jiang, Hui; Liu, Xin; Yang, Zhaopeng; Mu, Feng; Gao, Shangxian
2017-05-01
BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinational probe anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated the accuracy of the variations, and compared it to that of the variations identified from similar amounts of publicly available HiSeq2500 data. We found similar single nucleotide polymorphism (SNP) detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) compared to the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%), and better SNP detection accuracy than for the PE50 data (FPR = 0.0006%, sensitivity = 94.15%). But for insertions and deletions (indels), we found lower accuracy for BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50 respectively, sensitivity = 88.52% and 70.93%) than for the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform. © The Authors 2017. Published by Oxford University Press.
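The two accuracy measures quoted in this abstract follow from a standard confusion matrix of variant calls against a truth set. The sketch below uses the textbook definitions; whether the authors counted true negatives over all reference positions or in some other way is not stated, so the counts in the usage note are purely hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real variants that were called: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def false_positive_rate(false_pos, true_neg):
    """Fraction of non-variant positions wrongly called: FP / (FP + TN)."""
    return false_pos / (false_pos + true_neg)


def as_percent(fraction, digits=5):
    """Format a fraction as a percentage string, e.g. 0.000002 -> '0.00020%'."""
    return f"{100 * fraction:.{digits}f}%"
```

With hypothetical counts of 962 true positives and 38 false negatives, `sensitivity` returns 0.962; with 2 false positives among 1,000,000 non-variant positions, `as_percent(false_positive_rate(2, 999_998))` gives `'0.00020%'`. Because non-variant positions vastly outnumber variants in a genome, FPR values of this tiny magnitude are expected.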
La Cognata, Valentina; Morello, Giovanna; D'Agata, Velia; Cavallaro, Sebastiano
2017-01-01
Parkinson's disease (PD), the second most common progressive neurodegenerative disorder of aging, was long believed to be a sporadic syndrome of non-genetic origin. The proof that several genetic loci are responsible for rare Mendelian forms has represented a revolutionary breakthrough, making it possible to reveal the molecular mechanisms underlying this debilitating and still incurable condition. While single nucleotide polymorphisms (SNPs) and small indels constitute the most commonly investigated DNA variations, accounting for only a limited number of PD cases, larger genomic molecular rearrangements have emerged as significant PD-causing mutations, including submicroscopic Copy Number Variations (CNVs). CNVs constitute a prevalent source of genomic variations and substantially participate in each individual's genomic makeup and phenotypic outcome. However, the majority of genetic studies have focused their attention on single candidate-gene mutations or on common variants reaching a significant statistical level of acceptance. This gene-centric approach is insufficient to uncover the genetic background of polygenic multifactorial disorders like PD, and potentially masks rare individual CNVs that together might contribute to disease development or progression. In this review, we will discuss literature and bioinformatic data describing the involvement of CNVs in PD pathobiology. We will analyze the most frequent copy number changes in familial PD genes and provide a "systems biology" overview of rare individual rearrangements that could functionally act on commonly deregulated molecular pathways. Assessing the global genome-wide burden of CNVs in PD patients may reveal new disease-related molecular mechanisms, and open the window to a new possible genetic scenario in the unsolved PD puzzle.
Withler, Ruth E.
2017-01-01
Population structure of three ecotypes of Oncorhynchus nerka (sea-type Sockeye Salmon, lake-type Sockeye Salmon, and Kokanee) in the Fraser River and Columbia River drainages was examined with microsatellite variation, with the main focus as to whether Kokanee population structure within the Fraser River drainage suggested either a monophyletic or polyphyletic origin of the ecotype within the drainage. Variation at 14 microsatellite loci was surveyed for sea-type and lake-type Sockeye Salmon and Kokanee sampled from 121 populations in the two river drainages. An index of genetic differentiation, FST, over all populations and loci was 0.087, with individual locus values ranging from 0.031 to 0.172. Standardized to an ecotype sample size of 275 individuals, the least genetically diverse ecotype was sea-type Sockeye Salmon with 203 alleles, whereas Kokanee displayed the greatest number of alleles (260 alleles), with lake-type Sockeye Salmon intermediate (241 alleles). Kokanee populations from the Columbia River drainage (Okanagan Lake, Kootenay Lake), the South Thompson River (a major Fraser River tributary) drainage populations, and the mid-Fraser River populations all clustered together in a neighbor-joining analysis, indicative of a monophyletic origin of the Kokanee ecotype in these regions, likely reflecting the origin of salmon radiating from a refuge after the last glaciation period. However, upstream of the mid-Fraser River populations, there were closer relationships between the lake-type Sockeye Salmon ecotype and the Kokanee ecotype, indicative of the Kokanee ecotype evolving independently from the lake-type Sockeye Salmon ecotype in parallel radiation. Kokanee population structure within the entire Fraser River drainage suggested a polyphyletic origin of the ecotype within the drainage. Studies employing geographically restricted population sampling may not outline accurately the phylogenetic history of salmonid ecotypes. PMID:28886033
Dikshit, Vishwesh; Nagalingam, Arun Prasanth; Yap, Yee Ling; Sing, Swee Leong; Yeong, Wai Yee; Wei, Jun
2017-01-01
The objective of this investigation was to determine the quasi-static indentation response and failure mode in three-dimensional (3D) printed trapezoidal core structures, and to characterize the energy absorbed by the structures. In this work, the trapezoidal sandwich structure was designed in the following two ways. Firstly, the trapezoidal core along with its facesheet was 3D printed as a single element comprising a single material for both core and facesheet (type A); Secondly, the trapezoidal core along with facesheet was 3D printed, but with variation in facesheet materials (type B). Quasi-static indentation was carried out using three different indenters, namely standard hemispherical, conical, and flat indenters. Acoustic emission (AE) technique was used to capture brittle cracking in the specimens during indentation. The major failure modes were found to be brittle failure and quasi-brittle fractures. The measured indentation energy was at a maximum when using a conical indenter at 9.40 J and 9.66 J and was at a minimum when using a hemispherical indenter at 6.87 J and 8.82 J for type A and type B series specimens respectively. The observed maximum indenter displacements at failure were the effect of material variations and composite configurations in the facesheet. PMID:28772649
Dikshit, Vishwesh; Nagalingam, Arun Prasanth; Yap, Yee Ling; Sing, Swee Leong; Yeong, Wai Yee; Wei, Jun
2017-03-14
Siggs, Owen M; Javadiyan, Shari; Sharma, Shiwani; Souzeau, Emmanuelle; Lower, Karen M; Taranath, Deepa A; Black, Jo; Pater, John; Willoughby, John G; Burdon, Kathryn P; Craig, Jamie E
2017-01-01
Congenital cataract is a rare but severe paediatric visual impediment, often caused by variants in one of several crystallin genes that produce the bulk of structural proteins in the lens. Here we describe a pedigree with autosomal dominant isolated congenital cataract and linkage to the crystallin gene cluster on chromosome 22. No rare single nucleotide variants or short indels were identified by exome sequencing, yet copy number variant analysis revealed a duplication spanning both CRYBB1 and CRYBA4. While the CRYBA4 duplication was complete, the CRYBB1 duplication was not, with the duplicated CRYBB1 product predicted to create a gain of function allele. This association suggests a new genetic mechanism for the development of isolated congenital cataract. PMID:28272538
Probabilistic evaluation of fuselage-type composite structures
NASA Technical Reports Server (NTRS)
Shiao, Michael C.; Chamis, Christos C.
1992-01-01
A methodology is developed to computationally simulate the uncertain behavior of composite structures. The uncertain behavior includes buckling loads, natural frequencies, displacements, stress/strain etc., which are the consequences of the random variation (scatter) of the primitive (independent random) variables in the constituent, ply, laminate and structural levels. This methodology is implemented in the IPACS (Integrated Probabilistic Assessment of Composite Structures) computer code. A fuselage-type composite structure is analyzed to demonstrate the code's capability. The probability distribution functions of the buckling loads, natural frequency, displacement, strain and stress are computed. The sensitivity of each primitive (independent random) variable to a given structural response is also identified from the analyses.
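The idea of propagating scatter in primitive variables through to a response distribution can be illustrated with a plain Monte Carlo simulation. The sketch below is not IPACS and does not model a laminate: it substitutes the Euler buckling load of a pinned column for the structural model, and the nominal values and scatter of `E`, `I` and `L` are invented for the example.

```python
import bisect
import math
import random


def euler_buckling_load(E, I, L):
    """Euler critical load of a pinned-pinned column: pi^2 * E * I / L^2."""
    return math.pi ** 2 * E * I / L ** 2


def simulate_buckling_loads(n, seed=0):
    """Draw n samples of the primitive variables (assumed independent and
    normally distributed) and return the sorted buckling loads in newtons."""
    rng = random.Random(seed)
    loads = []
    for _ in range(n):
        E = rng.gauss(70e9, 0.05 * 70e9)      # modulus in Pa, 5% scatter (assumed)
        I = rng.gauss(8.0e-9, 0.03 * 8.0e-9)  # second moment in m^4, 3% scatter (assumed)
        L = rng.gauss(1.0, 0.01)              # length in m, 1% scatter (assumed)
        loads.append(euler_buckling_load(E, I, L))
    return sorted(loads)


def empirical_cdf(sorted_samples, x):
    """Estimated probability that the response is <= x."""
    return bisect.bisect_right(sorted_samples, x) / len(sorted_samples)
```

Calling `empirical_cdf(simulate_buckling_loads(5000), 5000.0)` estimates the probability that the buckling load falls at or below 5 kN under these assumed inputs. A sensitivity study in the same spirit would repeat the simulation with the scatter of one variable at a time set to zero.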
Li, Yan-li; Li, Yan-fen; Xu, Zong-xue
2015-01-01
In May-June 2012, macroinvertebrates were investigated at 66 sampling sites in the Huntai River basin in Northeast China. A total of 72 macrobenthos species were collected, of which 51 species (70.83%) were aquatic insects, 10 species (13.89%) were mollusks, 7 species (9.72%) were annelids, and 4 species (5.56%) were arthropods. First, 13 candidate metrics (EPT taxa, Dominant taxon%, Ephemeroptera%, Trichoptera%, mollusks%, Heptageniidae/Ephemeroptera, Hydropsychidae/Trichoptera, Oligochaeta%, intolerant taxon%, tolerant taxon%, Collector%, Clingers%, Shannon-Wiener index), which belonged to six types, were chosen by correlation analysis to represent macroinvertebrate community structure. Then, relationships between anthropogenic and physiographic pressures and macroinvertebrate community structure variables were measured using redundancy analysis. Next, the study compared the relative influences of anthropogenic and physiographic pressures on macroinvertebrate community structure, and the relative influences of anthropogenic pressures at reach, riparian and catchment scales, by partial redundancy analysis (pRDA). The results showed that all environmental factors together explained 72.23% of the variation in macroinvertebrate community structure. A large proportion of the explained variability was related to anthropogenic pressures (48.9%), compared with physiographic variables (11.8%); anthropogenic pressures at the reach scale had the most significant influence, explaining 35.3% of the variation. pH, habitat, TN, CODMn, hardness, conductivity, total dissolved particle and ammonia respectively explained 4%, 3.6%, 1.8%, 1.7%, 1.7%, 0.9%, 0.9% and 0.9% of the variation. Land use at the riparian and catchment scales explained 10% and 7% of the variation, respectively.
Finally, the relationships between land use at catchment and riparian scales and water quality factors, hydrological indicators, habitat and substrate types were analyzed. This study supports the idea that the effects of human pressures on river macroinvertebrate communities are linked across spatial scales and must be considered jointly.
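Two of the quantities in this abstract are simple to reproduce: the percentage composition of the 72 species and the Shannon-Wiener index listed among the candidate metrics. The sketch below uses the standard definition H' = -Σ p_i ln p_i; the natural logarithm is an assumption, since the abstract does not state the base.

```python
import math


def percent_composition(counts):
    """Map each group to its share of the total, in percent (2 decimals)."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 2) for group, n in counts.items()}


def shannon_wiener(abundances):
    """Shannon-Wiener diversity H' = -sum(p_i * ln p_i) over non-zero taxa."""
    total = sum(abundances)
    return -sum((n / total) * math.log(n / total) for n in abundances if n > 0)


# Species counts per group as reported in the abstract.
species_by_group = {"aquatic insects": 51, "mollusks": 10, "annelids": 7, "arthropods": 4}
composition = percent_composition(species_by_group)
```

`composition` reproduces the reported 70.83%, 13.89%, 9.72% and 5.56%. For the index itself, per-site abundance counts would be needed; a site with four equally abundant taxa gives H' = ln 4, while a site with a single taxon gives 0.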
Cheng, Hui; Li, Jinfeng; Zhang, Hong; Cai, Binhua; Gao, Zhihong
2017-01-01
Compared with other members of the family Rosaceae, the chloroplast genomes of Fragaria species exhibit low variation, and this situation has limited phylogenetic analyses; thus, complete chloroplast genome sequencing of Fragaria species is needed. In this study, we sequenced the complete chloroplast genome of F. × ananassa ‘Benihoppe’ using the Illumina HiSeq 2500-PE150 platform and then performed a combination of de novo assembly and reference-guided mapping of contigs to generate complete chloroplast genome sequences. The chloroplast genome exhibits a typical quadripartite structure with a pair of inverted repeats (IRs, 25,936 bp) separated by large (LSC, 85,531 bp) and small (SSC, 18,146 bp) single-copy (SC) regions. The length of the F. × ananassa ‘Benihoppe’ chloroplast genome is 155,549 bp, representing the smallest Fragaria chloroplast genome observed to date. The genome encodes 112 unique genes, comprising 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Comparative analysis of the overall nucleotide sequence identity among ten complete chloroplast genomes confirmed that for both coding and non-coding regions in Rosaceae, SC regions exhibit higher sequence variation than IRs. The Ka/Ks ratio of most genes was less than 1, suggesting that most genes are under purifying selection. Moreover, the mVISTA results also showed a high degree of conservation in genome structure, gene order and gene content in Fragaria, particularly among three octoploid strawberries which were F. × ananassa ‘Benihoppe’, F. chiloensis (GP33) and F. virginiana (O477). However, when the sequences of the coding and non-coding regions of F. × ananassa ‘Benihoppe’ were compared in detail with those of F. chiloensis (GP33) and F. virginiana (O477), a number of SNPs and InDels were revealed by MEGA 7. 
Six non-coding regions (trnK-matK, trnS-trnG, atpF-atpH, trnC-petN, trnT-psbD and trnP-psaJ) with a percentage of variable sites greater than 1% and no less than five parsimony-informative sites were identified and may be useful for phylogenetic analysis of the genus Fragaria. PMID:29038765
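The reported region lengths can be checked against the total genome size, because a quadripartite plastome consists of one LSC, one SSC and two IR copies. The sketch below also includes the conventional reading of a Ka/Ks ratio; the function names and the tolerance used to call a ratio "neutral" are illustrative choices, not taken from the study.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


def selection_regime(ka_ks, tolerance=0.05):
    """Conventional interpretation of a Ka/Ks ratio for one gene."""
    if ka_ks < 1 - tolerance:
        return "purifying"
    if ka_ks > 1 + tolerance:
        return "positive"
    return "neutral"


# Region lengths (bp) as reported for F. x ananassa 'Benihoppe'.
total = quadripartite_length(lsc=85_531, ssc=18_146, ir=25_936)
print(total)  # 155549, matching the reported genome length
```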
Comparative structural analysis of Bru1 region homeologs in Saccharum spontaneum and S. officinarum
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jisen; Sharma, Anupma; Yu, Qingyi
Sugarcane is a major sugar and biofuel crop, but genomic research and molecular breeding have lagged behind other major crops due to the complexity of auto-allopolyploid genomes. Sugarcane cultivars are frequently aneuploid with chromosome number ranging from 100 to 130, consisting of 70-80% S. officinarum, 10-20% S. spontaneum, and 10% recombinants between these two species. Analysis of a genomic region in the progenitor autoploid genomes of sugarcane hybrid cultivars will reveal the nature and divergence of homologous chromosomes. To investigate the origin and evolution of haplotypes in the Bru1 genomic regions in sugarcane cultivars, we identified two BAC clones from S. spontaneum and four from S. officinarum and compared them to seven haplotype sequences from sugarcane hybrid R570. The results clarified the origin of the seven homologous haplotypes in R570: four haplotypes originated from S. officinarum, two from S. spontaneum and one was a recombinant. With retrotransposon insertions and sequence variations among the homologous haplotypes, sequence divergence ranged from 18.2% to 60.5%, with an average of 33.7%. Gene content and gene structure were relatively well conserved among the homologous haplotypes. Exon splitting occurred in haplotypes of the hybrid genome but not in its progenitor genomes. Tajima's D analysis revealed that S. spontaneum haplotypes in the Bru1 genomic regions were under strong directional selection. Numerous inversions, deletions, insertions and translocations were found between haplotypes within each genome. In conclusion, this is the first comparison among haplotypes of a modern sugarcane hybrid and its two progenitors. Tajima's D results emphasized the crucial role of this fungal disease resistance gene for enhancing the fitness of this species and indicated that the brown rust resistance gene in R570 is from S. spontaneum.
Species-specific InDels, sequence similarity and phylogenetic analysis of homologous genes can be used for identifying the origin of S. spontaneum and S. officinarum haplotypes in Saccharum hybrids. Comparison of exon splitting among the homologous haplotypes suggested that the genome rearrangements in Saccharum hybrids S. officinarum would be sufficient for proper genome assembly of this autopolyploid genome. The sequence divergence among the homologous haplotypes, reflecting retrotransposon insertions and sequence variations, may allow sequencing and assembling the autopolyploid Saccharum genomes and the auto-allopolyploid hybrid genomes using whole genome shotgun sequencing.
Comparative structural analysis of Bru1 region homeologs in Saccharum spontaneum and S. officinarum
Zhang, Jisen; Sharma, Anupma; Yu, Qingyi; ...
2016-06-10
Genomic Variants Revealed by Invariably Missing Genotypes in Nelore Cattle
da Silva, Joaquim Manoel; Giachetto, Poliana Fernanda; da Silva, Luiz Otávio Campos; Cintra, Leandro Carrijo; Paiva, Samuel Rezende; Caetano, Alexandre Rodrigues; Yamagishi, Michel Eduardo Beleza
2015-01-01
High density genotyping panels have been used in a wide range of applications. From population genetics to genome-wide association studies, this technology still offers the lowest cost and the most consistent solution for generating SNP data. However, regardless of the application, part of the generated data is always discarded from final datasets based on quality control criteria used to remove unreliable markers. Some discarded data consist of markers that failed to generate genotypes, labeled as missing genotypes. A subset of missing genotypes that occur in the whole population under study may be caused by technical issues but can also be explained by the presence of genomic variations that are in the vicinity of the assayed SNP and that prevent genotyping probes from annealing. The latter case may contain relevant information because these missing genotypes might be used to identify population-specific genomic variants. In order to assess which case is more prevalent, we used Illumina HD Bovine chip genotypes from 1,709 Nelore (Bos indicus) samples. We found 3,200 missing genotypes among the whole population. NGS re-sequencing data from 8 sires were used to verify the presence of genomic variations within their flanking regions in 81.56% of these missing genotypes. Furthermore, we discovered 3,300 novel SNPs/Indels, 31% of which are located in genes that may affect traits of importance for the genetic improvement of cattle production. PMID:26305794
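Selecting the markers that fail in every sample, as opposed to those that fail sporadically, is the first step this abstract describes. A minimal sketch follows, assuming a hypothetical in-memory layout in which each marker maps to one call per sample and `None` stands for a missing genotype; real chip exports would need parsing first.

```python
def invariably_missing(genotypes):
    """Return, sorted, the markers whose genotype is missing in every sample.

    genotypes: dict mapping marker ID -> list of calls (one per sample),
    where None marks a failed call. Markers with no samples are skipped.
    """
    return sorted(
        marker
        for marker, calls in genotypes.items()
        if calls and all(call is None for call in calls)
    )


def missing_rate(calls):
    """Fraction of samples with a missing call for one marker."""
    return sum(call is None for call in calls) / len(calls)
```

Markers returned by `invariably_missing` are the candidates whose flanking regions would then be examined in re-sequencing data for variants that could block probe annealing.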
Segtor: Rapid Annotation of Genomic Coordinates and Single Nucleotide Variations Using Segment Trees
Renaud, Gabriel; Neves, Pedro; Folador, Edson Luiz; Ferreira, Carlos Gil; Passetti, Fabio
2011-01-01
Various research projects often involve determining the relative position of genomic coordinates, intervals, single nucleotide variations (SNVs), insertions, deletions and translocations with respect to genes and their potential impact on protein translation. Due to the tremendous increase in throughput brought by the use of next-generation sequencing, investigators are routinely faced with the need to annotate very large datasets. We present Segtor, a tool to annotate large sets of genomic coordinates, intervals, SNVs, indels and translocations. Our tool uses segment trees built using the start and end coordinates of the genomic features the user wishes to use instead of storing them in a database management system. The software also produces annotation statistics to allow users to visualize how many coordinates were found within various portions of genes. Our system currently can be made to work with any species available on the UCSC Genome Browser. Segtor is a suitable tool for groups, especially those with limited access to programmers or with interest to analyze large amounts of individual genomes, who wish to determine the relative position of very large sets of mapped reads and subsequently annotate observed mutations between the reads and the reference. Segtor (http://lbbc.inca.gov.br/segtor/) is an open-source tool that can be freely downloaded for non-profit use. We also provide a web interface for testing purposes. PMID:22069465
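The segment-tree lookup mentioned in this abstract can be illustrated with a small static tree that answers "which features contain this coordinate?". This is a generic sketch of the data structure, not Segtor's implementation: the class name, the closed integer coordinates and the gene labels in the usage note are assumptions.

```python
from bisect import bisect_right


class SegmentTree:
    """Static segment tree for stabbing queries over closed intervals."""

    def __init__(self, intervals):
        # intervals: iterable of (label, start, end) with start <= end.
        intervals = list(intervals)
        # Sorted boundaries of the elementary half-open segments [b[i], b[i+1]).
        self.bounds = sorted(
            {start for _, start, _ in intervals} | {end + 1 for _, _, end in intervals}
        )
        self.n = max(len(self.bounds) - 1, 0)      # number of leaves
        self.nodes = [[] for _ in range(2 * self.n)]
        for label, start, end in intervals:
            self._insert(label, start, end)

    def _insert(self, label, start, end):
        # Store the label on the O(log n) canonical nodes covering [start, end + 1).
        lo = bisect_right(self.bounds, start) - 1 + self.n
        hi = bisect_right(self.bounds, end + 1) - 1 + self.n
        while lo < hi:
            if lo & 1:
                self.nodes[lo].append(label)
                lo += 1
            if hi & 1:
                hi -= 1
                self.nodes[hi].append(label)
            lo >>= 1
            hi >>= 1

    def query(self, pos):
        """Return the sorted labels of all intervals containing pos."""
        leaf = bisect_right(self.bounds, pos) - 1
        if leaf < 0 or leaf >= self.n:
            return []
        hits = []
        node = leaf + self.n
        while node >= 1:  # walk from the leaf up to the root
            hits.extend(self.nodes[node])
            node >>= 1
        return sorted(hits)
```

For example, a tree built from `[("geneA", 100, 200), ("geneB", 150, 300), ("geneC", 400, 500)]` returns `["geneA", "geneB"]` for position 175 and an empty list for position 350. Building the tree once and querying it for each coordinate avoids a database round trip per lookup.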
Varietal Tracing of Virgin Olive Oils Based on Plastid DNA Variation Profiling
Pérez-Jiménez, Marga; Besnard, Guillaume; Dorado, Gabriel; Hernandez, Pilar
2013-01-01
Olive oil traceability remains a challenge nowadays. DNA analysis is the preferred approach to an effective varietal identification, without any environmental influence. Specifically, olive organelle genomics is the most promising approach for setting up a suitable set of markers as they would not interfere with the pollinator variety DNA traces. Unfortunately, plastid DNA (cpDNA) variation of the cultivated olive has been reported to be low. This feature could be a limitation for the use of cpDNA polymorphisms in forensic analyses or oil traceability, but rare cpDNA haplotypes may be useful as they can help to efficiently discriminate some varieties. Recently, the sequencing of olive plastid genomes has allowed the generation of novel markers. In this study, the performance of cpDNA markers on olive oil matrices, and their applicability on commercial Protected Designation of Origin (PDO) oils were assessed. By using a combination of nine plastid loci (including multi-state microsatellites and short indels), it is possible to fingerprint six haplotypes (in 17 Spanish olive varieties), which can discriminate high-value commercialized cultivars with PDO. In particular, a rare haplotype was detected in genotypes used to produce a regional high-value commercial oil. We conclude that plastid haplotypes can help oil traceability in commercial PDO oils and set up an experimental methodology suitable for organelle polymorphism detection in the complex olive oil matrices. PMID:23950947
Cui, Yubao; Yu, Lili
2016-12-01
The clustered regularly-interspaced short palindromic repeats (CRISPR) structural family functions as an acquired immune system in prokaryotes. Gene editing techniques have co-opted CRISPR and the associated Cas nucleases to allow for the precise genetic modification of human cells, zebrafish, mice, and other eukaryotes. Indeed, this approach has been used to induce a variety of modifications including directed insertion/deletion (InDel) of bases, gene knock-in, introduction of mutations in both alleles of a target gene, and deletion of small DNA fragments. Thus, CRISPR technology offers a precise molecular tool for directed genome modification with a range of potential applications; further, its high mutation efficiency, simple process, and low cost provide additional advantages over prior editing techniques. This paper will provide an overview of the basic structure and function of the CRISPR gene editing system as well as current and potential applications to research on parasites. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Multiplexed fragaria chloroplast genome sequencing
W. Njuguna; A. Liston; R. Cronn; N.V. Bassil
2010-01-01
A method to sequence multiple chloroplast genomes using ultra high throughput sequencing technologies was recently described. Complete chloroplast genome sequences can resolve phylogenetic relationships at low taxonomic levels and identify informative point mutations and indels. The objective of this research was to sequence multiple Fragaria...
Code of Federal Regulations, 2013 CFR
2013-01-01
... leaders in every aspect of American life—in government and industry, science and medicine, the arts and... world. This month, we reflect on the indelible ways AAPI communities have shaped our national life. As...
NASA Astrophysics Data System (ADS)
Jain, Aayushi; Dixit, R. C.
2018-05-01
The pressure-induced structural phase transition from the NaCl-type (B1) to the CsCl-type (B2) structure in sodium chloride (NaCl) is presented. An effective interionic interaction potential (EIOP) with long-range Coulomb and van der Waals (vdW) interactions and the short-range repulsive interaction up to second-neighbor ions within the Hafemeister and Flygare approach with modified ionic charge is reported here. The calculated value of the phase transition pressure (Pt) and the magnitude of the discontinuity in volume at the transition pressure are consistent with reported data. The variations of elastic constants and their combinations with pressure follow ordered behavior. The present approach has also succeeded in predicting the Born and relative stability criteria.
Manel, Stéphanie; Couvreur, Thomas L P; Munoz, François; Couteron, Pierre; Hardy, Olivier J; Sonké, Bonaventure
2014-01-01
Tropical rain forests, the richest terrestrial ecosystems in biodiversity on Earth, are highly threatened by global changes. This paper aims to infer the mechanisms governing species tree assemblages by characterizing the phylogenetic structure of a tropical rain forest in a protected area of the Congo Basin, the Dja Faunal Reserve (Cameroon). We re-analyzed a dataset of 11538 individuals belonging to 372 taxa found along nine transects spanning five habitat types. We generated a dated phylogenetic tree including all sampled taxa to partition the phylogenetic diversity of the nine transects into alpha and beta components at the level of the transects and of the habitat types. The variation in phylogenetic composition among transects did not deviate from a random pattern at the scale of the Dja Faunal Reserve, probably due to a common history and weak environmental variation across the park. This lack of phylogenetic structure combined with an isolation-by-distance pattern of taxonomic diversity suggests that neutral dispersal limitation is a major driver of community assembly in the Dja. To assess any lack of sensitivity to the variation in habitat types, we restricted the analyses of transects to the terra firme primary forest and found results consistent with those of the whole dataset at the level of the transects. In addition to previous analyses, we detected a weak but significant phylogenetic turnover among habitat types, suggesting that species sort in varying environments, even though this sorting does not dominate the overall phylogenetic structure. Finer analyses of clades indicated a signal of clustering for species from the Annonaceae family, while species from the Apocynaceae family indicated overdispersion. These results can contribute to the conservation of the park by improving our understanding of the processes dictating community assembly in these hyperdiverse but threatened regions of the world.
Shivakumar, Venkataram; Debnath, Monojit; Venugopal, Deepthi; Rajasekaran, Ashwini; Kalmady, Sunil V; Subbanna, Manjula; Narayanaswamy, Janardhanan C; Amaresha, Anekal C; Venkatasubramanian, Ganesan
2018-07-01
Converging evidence suggests important implications of the immuno-inflammatory pathway in the risk and progression of schizophrenia. Prenatal infection resulting in maternal immune activation and developmental neuroinflammation reportedly increases the risk of schizophrenia in the offspring by generating pro-inflammatory cytokines including IL-6. However, it is not known how prenatal infection can induce immuno-inflammatory responses despite the presence of immuno-inhibitory Human Leukocyte Antigen-G (HLA-G) molecules. To address this, the present study aimed to examine the correlation between the 14 bp Insertion/Deletion (INDEL) polymorphism of HLA-G and IL-6 gene expression in schizophrenia patients. The 14 bp INDEL polymorphism was studied by PCR amplification/direct sequencing and IL-6 gene expression was quantified by real-time RT-PCR in 56 schizophrenia patients and 99 healthy controls. We observed significantly lower IL6 gene expression in the peripheral blood mononuclear cells (PBMCs) of schizophrenia patients (t = 3.8, p = .004) compared to the controls. In addition, schizophrenia patients carrying the Del/Del genotype of the HLA-G 14 bp INDEL exhibited significantly lower IL6 gene expression (t = 3.1; p = .004) than the Del/Ins as well as Ins/Ins carriers. Our findings suggest that the presence of the "high-expressor" HLA-G 14 bp Del/Del genotype in schizophrenia patients could attenuate IL-6 mediated inflammation in schizophrenia. Based on these findings it can be assumed that HLA-G and cytokine interactions might play an important role in the immunological underpinnings of schizophrenia. Copyright © 2017 Elsevier Ltd. All rights reserved.
Fernandez-San Jose, Patricia; Liu, Yichuan; March, Michael; Pellegrino, Renata; Golhar, Ryan; Corton, Marta; Blanco-Kelly, Fiona; López-Molina, Maria Isabel; García-Sandoval, Blanca; Guo, Yiran; Tian, Lifeng; Liu, Xuanzhu; Guan, Liping; Zhang, Jianguo; Keating, Brendan; Xu, Xun
2015-01-01
This study aimed to identify the genetics underlying dominant forms of inherited retinal dystrophies using whole exome sequencing (WES) in six families extensively screened for known mutations or genes. Thirty-eight individuals were subjected to WES. Causative variants were searched among single nucleotide variants (SNVs) and insertion/deletion variants (indels) and whenever no potential candidate emerged, copy number variant (CNV) analysis was performed. Variants or regions harboring a candidate variant were prioritized and segregation of the variant with the disease was further assessed using Sanger sequencing in case of SNVs and indels, and quantitative PCR (qPCR) for CNVs. SNV and indel analysis led to the identification of a previously reported mutation in PRPH2. Two additional mutations linked to different forms of retinal dystrophies were identified in two families: a known frameshift deletion in RPGR, a gene responsible for X-linked retinitis pigmentosa and p.Ser163Arg in C1QTNF5 associated with Late-Onset Retinal Degeneration. A novel heterozygous deletion spanning the entire region of PRPF31 was also identified in the affected members of a fourth family, which was confirmed with qPCR. This study allowed the identification of the genetic cause of the retinal dystrophy and the establishment of a correct diagnosis in four families, including a large heterozygous deletion in PRPF31, typically considered one of the pitfalls of this method. Since all findings in this study are restricted to known genes, we propose that targeted sequencing using gene-panel is an optimal first approach for the genetic screening and that once known genetic causes are ruled out, WES might be used to uncover new genes involved in inherited retinal dystrophies. PMID:26197217
Modares Sadeghi, Mehran; Shariati, Laleh; Hejazi, Zahra; Shahbazi, Mansoureh; Tabatabaiefar, Mohammad Amin; Khanahmad, Hossein
2018-03-01
β-thalassemia is a common autosomal recessive disorder characterized by a deficiency in the synthesis of β-chains. Evidence shows that increased HbF levels improve the symptoms in patients with β-thalassemia or sickle cell anemia. In this study, ZFN technology was applied to induce a mutation in the binding domain region of SOX6 to reactivate γ-globin expression. The sequences coding for ZFP arrays were designed and subcloned in TDH plus as a transfer vector. ZFN expression was confirmed using Western blot analysis. In the next step, using the site-directed mutagenesis strategy through overlap PCR, a missense mutation (D64V) was induced in the catalytic domain of the integrase gene in the packaging plasmid and verified using DNA sequencing. Then, the integrase minus lentivirus containing the ZFN cassette was packaged. K562 cells were transduced with this virus, and a mutation detection assay was performed. The indel percentage of the cells transduced with the lentivirus containing ZFN was 31%. After 5 days of erythroid differentiation with 15 μg/mL cisplatin, the levels of γ-globin mRNA were sixfold higher in the cells treated with ZFN than in untreated cells. In the meantime, the measurement of HbF expression levels was carried out using hemoglobin electrophoresis and showed the same results. Integrase minus lentivirus can provide a useful tool for efficient transient gene expression and helps avoid disadvantages of gene targeting using the native virus. The ZFN strategy applied here to induce indels in the SOX6 gene in adult erythroid progenitors may provide a method to activate fetal hemoglobin expression in individuals with β-thalassemia. © 2017 Wiley Periodicals, Inc.
Correlational study on mitochondrial DNA mutations as potential risk factors in breast cancer.
Li, Linhai; Chen, Lidan; Li, Jun; Zhang, Weiyun; Liao, Yang; Chen, Jianyun; Sun, Zhaohui
2016-05-24
The present study performed an mtDNA genome-wide association analysis to screen the peripheral blood of breast cancer patients for high-risk germline mutations. Unlike previous studies, which have used breast tissue in analyzing somatic mutations, we looked for germline mutations in our study, since they are better predictors of breast cancer in high-risk groups, facilitate early, non-invasive diagnoses of breast cancer and may provide a broader spectrum of therapeutic options. The data comprised 22 samples from a healthy group and 83 samples from breast cancer patients. The sequencing data showed 170 mtDNA mutations in the healthy group and 393 mtDNA mutations in the disease group. Of these, 283 mtDNA mutations (88 in the healthy group and 232 in the disease group) had never been reported in the literature. Moreover, correlation analysis indicated there was a significant difference in 32 mtDNA mutations. According to our relative risk analysis of these 32 mtDNA mutations, 27 had odds ratios (ORs) of less than 1, meaning that these mutations potentially play a protective role in breast cancer. The remaining 5 mtDNA mutations, RNR2-2463 indelA, COX1-6296 C>A, COX1-6298 indelT, ATP6-8860 A>G, and ND5-13327 indelA, whose ORs were 8.050, 4.464, 4.464, 5.254 and 4.853, respectively, were regarded as risk factors for breast cancer. The five mutations identified here may serve as novel indicators of breast cancer and may have future therapeutic applications. In addition, the use of peripheral blood samples was procedurally simple and could be applied as a non-invasive diagnostic technique.
Ma, Ruilin; Shen, Chunmei; Wei, Yuanyuan; Jin, Xiaoye; Guo, Yuxin; Mu, Yuling; Sun, Siqi; Chen, Chong; Cui, Wei; Wei, Zhaoming; Lian, Zhenmin
2018-06-20
The present study investigated the genetic diversity of the 30 autosomal insertion and deletion (InDel) loci of the Investigator DIPplex kit (Qiagen) in the Chinese Salar ethnic minority and explored the genetic relationships between the studied Salar group and other populations. The frequencies of deletion alleles at the 30 InDel loci were in the range of 0.1739 (HLD64) to 0.8478 (HLD39). The discrimination power, polymorphism information content and probability of exclusion ranged from 0.4101 (HLD39) to 0.6447 (HLD136), 0.2247 (HLD39) to 0.3750 (HLD92) and 0.0400 (HLD39) to 0.2806 (HLD92), respectively. The observed and expected heterozygosity were in the range of 0.2348 (HLD39) to 0.5913 (HLD92), and 0.2580 (HLD39) to 0.5000 (HLD92), respectively. The cumulative discrimination power and probability of exclusion of the 30 loci reached 0.999999999993418 and 0.99039, respectively. The results of population genetic differentiation comparisons revealed that the Salar group had allele distributions similar to those of the Qinghai Tibetan, Xibe and Yi groups. Population Bayesian cluster analysis showed that the Salar group and most Chinese populations had similar ancestry components. In addition, the principal components analysis and phylogenetic reconstructions further indicated that the Salar group had close genetic relationships with the Qinghai Tibetan and Xibe groups. In short, the results of the current study indicated that the 30 InDel loci showed relatively high genetic polymorphism in the Salar group and could be used in forensic individual identification and as a supplementary tool for complex paternity testing. Copyright © 2018 Elsevier B.V. All rights reserved.
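The per-locus statistics quoted in the abstract above can be approximated from a single allele frequency. The sketch below uses the standard textbook formulas for a biallelic locus and assumes Hardy-Weinberg proportions. The abstract does not say how the authors computed their values, and the function names are illustrative.

```python
from math import prod


def expected_heterozygosity(p):
    """Expected heterozygosity of a biallelic locus with allele frequencies p and 1 - p."""
    q = 1.0 - p
    return 2.0 * p * q


def polymorphism_information_content(p):
    """PIC of a biallelic locus: 1 - (p^2 + q^2) - 2 * p^2 * q^2."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q) - 2.0 * p * p * q * q


def cumulative_power(per_locus_values):
    """Combine independent per-locus powers: 1 - product of (1 - value_i)."""
    return 1.0 - prod(1.0 - v for v in per_locus_values)


# Deletion-allele frequency reported for HLD39 in the abstract.
he_hld39 = expected_heterozygosity(0.8478)
pic_hld39 = polymorphism_information_content(0.8478)
```

Under these assumptions the HLD39 frequency gives values close to the reported 0.2580 and 0.2247, and a locus with equal allele frequencies gives 0.5 and 0.375, the maxima quoted for HLD92. Small differences in the last digit are expected because the published frequency is rounded.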
Lu, Jun-Xia; Bayro, Marvin J.; Tycko, Robert
2016-01-01
We present the results of solid state nuclear magnetic resonance (NMR) experiments on HIV-1 capsid protein (CA) assemblies with three different morphologies, namely wild-type CA (WT-CA) tubes with 35–60 nm diameters, planar sheets formed by the Arg18-Leu mutant (R18L-CA), and R18L-CA spheres with 20–100 nm diameters. The experiments are intended to elucidate molecular structural variations that underlie these variations in CA assembly morphology. We find that multidimensional solid state NMR spectra of 15N,13C-labeled CA assemblies are remarkably similar for the three morphologies, with only small differences in 15N and 13C chemical shifts, no significant differences in NMR line widths, and few differences in the number of detectable NMR cross-peaks. Thus, the pronounced differences in morphology do not involve major differences in the conformations and identities of structurally ordered protein segments. Instead, morphological variations are attributable to variations in conformational distributions within disordered segments, which do not contribute to the solid state NMR spectra. Variations in solid state NMR signals from certain amino acid side chains are also observed, suggesting differences in the intermolecular dimerization interface between curved and planar CA lattices, as well as possible differences in intramolecular helix-helix packing. PMID:27129282
Magnetic contributions in Bekenstein type models
NASA Astrophysics Data System (ADS)
Kraiselburd, Lucila; Castillo, Florencia L.; Mosquera, Mercedes E.; Vucetich, Héctor
2018-02-01
In this work, we analyze the spatial and time variation of the fine structure constant (α ) upon the theoretical framework developed by Bekenstein (Phys. Rev. D 66, 123514 (2002), 10.1103/PhysRevD.66.123514). We have computed the field ψ related to α at first order of the weak-field approximation and have also improved the estimation of the nuclear magnetic energy and, therefore, their contributions to the source term in the equation of motion of ψ . We obtained that the results are similar to the ones published in L. Kraiselburd and H. Vucetich, Int. J. Mod. Phys. E 20, 101 (2011) which were computed using the zero order of the approximation, showing that one can neglect the first order contribution to the variation of the fine structure constant. Through the comparison between our theoretical results and the observational data of the Eötvös-type experiments or the time variation of α over the cosmological time scale, we set constraints on the free parameter of the Bekenstein model, namely the Bekenstein length.
Comfort, Leeann N; Shortell, Stephen M; Rodriguez, Hector P; Colla, Carrie H
2018-01-31
To examine whether an empirically derived taxonomy of Accountable Care Organizations (ACOs) is associated with quality and spending performance among patients of ACOs in the Medicare Shared Savings Program (MSSP). Three waves of the National Survey of ACOs and corresponding publicly available Centers for Medicare & Medicaid Services performance data for NSACO respondents participating in the MSSP (N = 204); SK&A Office Based Physicians Database from QuintilesIMS. We compare the performance of three ACO types (physician-led, integrated, and hybrid) for three domains: quality, spending, and likelihood of achieving savings. Sources of performance variation within and between ACO types are compared for each performance measure. There is greater heterogeneity within ACO types than between ACO types. There were no consistent differences in quality by ACO type, nor were there differences in likelihood of achieving savings or overall spending per-person-year. There was evidence for higher spending on physician services for physician-led ACOs. ACOs of diverse structures perform comparably on core MSSP quality and spending measures. CMS should maintain its flexibility and continue to support participation of diverse ACOs. Future research to identify modifiable organizational factors that account for performance variation within ACO types may provide insight as to how best to improve ACO performance based on organizational structure and ownership. © Health Research and Educational Trust.
Nguyen, Van Binh; Park, Hyun-Seung; Lee, Sang-Choon; Lee, Junki; Park, Jee Young; Yang, Tae-Jin
2017-08-02
Ginseng represents a set of high-value medicinal plants of different species: Panax ginseng (Asian ginseng), Panax quinquefolius (American ginseng), Panax notoginseng (Chinese ginseng), Panax japonicus (Bamboo ginseng), and Panax vietnamensis (Vietnamese ginseng). Each species is pharmacologically and economically important, with differences in efficacy and price. Accordingly, an authentication system is needed to combat economically motivated adulteration of Panax products. We conducted comparative analysis of the chloroplast genome sequences of these five species, identifying 34-124 InDels and 141-560 SNPs. Fourteen InDel markers were developed to authenticate the Panax species. Among these, eight were species-unique markers that successfully differentiated one species from the others. We generated at least one species-unique marker for each of the five species, and any of the species can be authenticated by selection among these markers. The markers are reliable, easily detectable, and valuable for applications in the ginseng industry as well as in related research.
Ferragut, J F; Bentayebi, K; Pereira, R; Castro, J A; Amorim, A; Ramon, C; Picornell, A
2017-11-01
Population genetic data for 53 X-chromosome markers (32 X-indels, 9 X-Alu insertions and 12 X-STRs) are reported for five populations with Jewish ancestry (Sephardim, North African Jews, Middle Eastern Jews, Ashkenazim, and Chuetas) and Majorca, as the host population of Chuetas. Genetic distances between these populations demonstrated significant differences, except between Sephardic and North African Jews, with the Chuetas as the most differentiated group, in accordance with the particular demographic history of this population. X-chromosome analysis and a comparison with autosomal data suggest a generally sex-biased demographic history in Jewish populations. Asymmetry was found between female and male effective population sizes both in the admixture processes between Jewish communities, and between them and their respective non-Jewish host populations. Results further show that these X-linked markers are highly informative for forensic purposes, and highlight the need for specific databases for differentiated Jewish populations. Copyright © 2017 Elsevier B.V. All rights reserved.
Numerical simulation of incidence and sweep effects on delta wing vortex breakdown
NASA Technical Reports Server (NTRS)
Ekaterinaris, J. A.; Schiff, Lewis B.
1994-01-01
The structure of the vortical flowfield over delta wings at high angles of attack was investigated. Three-dimensional Navier-Stokes numerical simulations were carried out to predict the complex leeward-side flowfield characteristics, including leading-edge separation, secondary separation, and vortex breakdown. Flows over a 75- and a 63-deg sweep delta wing with sharp leading edges were investigated and compared with available experimental data. The effect of variation of circumferential grid resolution in the vicinity of the wing leading edge on the accuracy of the solutions was addressed. Furthermore, the effect of turbulence modeling on the solutions was investigated. The effects of variation of angle of attack on the computed vortical flow structure for the 75-deg sweep delta wing were examined. At moderate angles of attack no vortex breakdown was observed. When a critical angle of attack was reached, bubble-type vortex breakdown was found. With further increase in angle of attack, a change from bubble-type breakdown to spiral-type vortex breakdown was predicted by the numerical solution. The effects of variation of sweep angle and freestream Mach number were addressed with the solutions on a 63-deg sweep delta wing.
Genome editing with CompoZr custom zinc finger nucleases (ZFNs).
Hansen, Keith; Coussens, Matthew J; Sago, Jack; Subramanian, Shilpi; Gjoka, Monika; Briner, Dave
2012-06-14
Genome editing is a powerful technique that can be used to elucidate gene function and the genetic basis of disease. Traditional gene editing methods such as chemical-based mutagenesis or random integration of DNA sequences confer indiscriminate genetic changes in an overall inefficient manner and require incorporation of undesirable synthetic sequences or use of aberrant culture conditions, potentially confusing biological study. By contrast, transient ZFN expression in a cell can facilitate precise, heritable gene editing in a highly efficient manner without the need for administration of chemicals or integration of synthetic transgenes. Zinc finger nucleases (ZFNs) are enzymes which bind and cut distinct sequences of double-stranded DNA (dsDNA). A functional CompoZr ZFN unit consists of two individual monomeric proteins that bind a DNA "half-site" of approximately 15-18 nucleotides (see Figure 1). When two ZFN monomers "home" to their adjacent target sites the DNA-cleavage domains dimerize and create a double-strand break (DSB) in the DNA. Introduction of ZFN-mediated DSBs in the genome lays a foundation for highly efficient genome editing. Imperfect repair of DSBs in a cell via the non-homologous end-joining (NHEJ) DNA repair pathway can result in small insertions and deletions (indels). Creation of indels within the gene coding sequence of a cell can result in frameshift and subsequent functional knockout of a gene locus at high efficiency. While this protocol describes the use of ZFNs to create a gene knockout, integration of transgenes may also be conducted via homology-directed repair at the ZFN cut site. The CompoZr Custom ZFN Service represents a systematic, comprehensive, and well-characterized approach to targeted gene editing for the scientific community with ZFN technology. 
Sigma scientists work closely with investigators to 1) perform due diligence analysis including analysis of relevant gene structure, biology, and model system pursuant to the project goals, 2) apply this knowledge to develop a sound targeting strategy, 3) then design, build, and functionally validate ZFNs for activity in a relevant cell line. The investigator receives positive control genomic DNA and primers, and ready-to-use ZFN reagents supplied in both plasmid DNA and in-vitro transcribed mRNA format. These reagents may then be delivered for transient expression in the investigator's cell line or cell type of choice. Samples are then tested for gene editing at the locus of interest by standard molecular biology techniques including PCR amplification, enzymatic digest, and electrophoresis. After positive signal for gene editing is detected in the initial population, cells are single-cell cloned and genotyped for identification of mutant clones/alleles.
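The knockout logic in the protocol above rests on a simple rule: an indel inside coding sequence shifts the reading frame unless its net length change is a multiple of three. A minimal sketch of that check follows. The function name and the allele strings are hypothetical, and the check ignores splice sites, stop codons and the position of the indel within the gene.

```python
def causes_frameshift(ref_allele: str, alt_allele: str) -> bool:
    """Return True if the net length change is not a multiple of three.

    Assumption: both alleles lie entirely within the coding sequence.
    """
    return (len(alt_allele) - len(ref_allele)) % 3 != 0


# Illustrative alleles only: a 1 bp insertion, a 2 bp deletion and a 3 bp deletion.
one_bp_insertion = causes_frameshift("A", "AC")
two_bp_deletion = causes_frameshift("ACG", "A")
three_bp_deletion = causes_frameshift("ACGT", "A")
```

In-frame indels such as the 3 bp deletion can still disrupt function, so genotyping of clones as described in the protocol remains necessary.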
Impact of genetic variation on three dimensional structure and function of proteins
Bhattacharya, Roshni; Rose, Peter W.; Burley, Stephen K.
2017-01-01
The Protein Data Bank (PDB; http://wwpdb.org) was established in 1971 as the first open access digital data resource in biology with seven protein structures as its initial holdings. The global PDB archive now contains more than 126,000 experimentally determined atomic level three-dimensional (3D) structures of biological macromolecules (proteins, DNA, RNA), all of which are freely accessible via the Internet. Knowledge of the 3D structure of the gene product can help in understanding its function and role in disease. Of particular interest in the PDB archive are proteins for which 3D structures of genetic variant proteins have been determined, thus revealing atomic-level structural differences caused by the variation at the DNA level. Herein, we present a systematic and qualitative analysis of such cases. We observe a wide range of structural and functional changes caused by single amino acid differences, including changes in enzyme activity, aggregation propensity, structural stability, binding, and dissociation, some in the context of large assemblies. Structural comparison of wild type and mutated proteins, when both are available, provide insights into atomic-level structural differences caused by the genetic variation. PMID:28296894
Bakker, Theo C M; Giger, Thomas; Frommen, Joachim G; Largiadèr, Carlo R
2017-08-01
There is a need for rapid and reliable molecular sexing of three-spined sticklebacks, Gasterosteus aculeatus, the supermodel species for evolutionary biology. A DNA region at the 5' end of the sex-linked microsatellite Gac4202 was sequenced for the X chromosome of six females and the Y chromosome of five males from three populations. The Y chromosome contained two large insertions, which did not recombine with the phenotype of sex in a cross of 322 individuals. Genetic variation (SNPs and indels) within the insertions was smaller than on flanking DNA sequences. Three molecular PCR-based sex tests were developed, in which the first, the second or both insertions were covered. In five European populations (from DE, CH, NL, GB) of three-spined sticklebacks, tests with both insertions combined showed two clearly separated bands on agarose minigels in males and one band in females. The tests with the separate insertions gave similar results. Thus, the new molecular sexing method gave rapid and reliable results for sexing three-spined sticklebacks and is an improvement and/or alternative to existing methods.
Yellapu, Nanda Kumar; Kandlapalli, Kalpana; Valasani, Koteswara Rao; Sarma, P V G K; Matcha, Bhaskar
2013-01-01
Glucokinase (GK) is the predominant hexokinase that acts as a glucose sensor and catalyses the formation of glucose-6-phosphate. Mutations in the GK gene influence the affinity for glucose and lead to altered blood glucose levels, causing maturity onset diabetes of the young type 2 (MODY2), which is one of the prominent causes of the type 2 diabetic condition. In view of the importance of mutated GK resulting in the hyperglycemic condition, in the present study, molecular dynamics simulations were carried out on intact and 256 E-K mutated GK structures and their energy values and conformational variations were correlated. Energy variations were observed in the mutated GK (3500 Kcal/mol) structure with respect to intact GK (5000 Kcal/mol), and it showed increased γ-turns, decreased β-turns, and more helix-helix interactions that affected the substrate binding region, where its volume increased from 1089.152 Å(2) to 1246.353 Å(2). A molecular docking study revealed variation in docking scores (intact = -12.199 and mutated = -8.383) and in the binding mode of glucose in the active site of mutated GK, where the involvement of A53, S54, K56, K256, D262 and Q286 resulted in poor glucose binding, which probably explains the loss of catalytic activity and the consequent high glucose levels in the MODY2 condition.
2013-01-01
Background: Copy number variation (CNV), an important source of diversity in genomic structure, is frequently found in clusters called CNV regions (CNVRs). CNVRs are strongly associated with segmental duplications (SDs), but the composition of these complex repetitive structures remains unclear. Results: We conducted self-comparative-plot analysis of all mouse chromosomes using the high-speed and large-scale-homology search algorithm SHEAP. For eight chromosomes, we identified various types of large SD as tartan-checked patterns within the self-comparative plots. A complex arrangement of diagonal split lines in the self-comparative-plots indicated the presence of large homologous repetitive sequences. We focused on one SD on chromosome 13 (SD13M), and developed SHEPHERD, a stepwise ab initio method, to extract longer repetitive elements and to characterize repetitive structures in this region. Analysis using SHEPHERD showed the existence of 60 core elements, which were expected to be the basic units that form SDs within the repetitive structure of SD13M. The demonstration that sequences homologous to the core elements (>70% homology) covered approximately 90% of the SD13M region indicated that our method can characterize the repetitive structure of SD13M effectively. Core elements were composed largely of fragmented repeats of a previously identified type, such as long interspersed nuclear elements (LINEs), together with partial genic regions. Comparative genome hybridization array analysis showed that whereas 42 core elements were components of CNVR that varied among mouse strains, 8 did not vary among strains (constant type), and the status of the others could not be determined. The CNV-type core elements contained significantly larger proportions of long terminal repeat (LTR) types of retrotransposon than the constant-type core elements, which had no CNV.
The higher divergence rates observed in the CNV-type core elements than in the constant type indicate that the CNV-type core elements have a longer evolutionary history than constant-type core elements in SD13M. Conclusions: Our methodology for the identification of repetitive core sequences simplifies characterization of the structures of large SDs and detailed analysis of CNV. The results of detailed structural and quantitative analyses in this study might help to elucidate the biological role of one of the SDs on chromosome 13. PMID:23834397