Science.gov

Sample records for sequence including maps

  1. Genetic mapping and DNA sequencing

    SciTech Connect

    Speed, T.; Waterman, M.S.

    1996-12-31

    The Human Genome Initiative has as its primary objective the characterization of the human genome. High-resolution linkage maps of genetic markers will play an important role in completing the human genome project. This is one of two volumes based on the proceedings of the 1994 IMA Summer Program on Molecular Biology and comprises Weeks 1 and 2 of the four-week program. This volume focuses on genetic mapping and DNA sequencing. Selected papers are indexed separately for inclusion in the Energy Science and Technology Database.

  2. Benchmarking short sequence mapping tools.

    PubMed

    Hatem, Ayat; Bozdağ, Doruk; Toland, Amanda E; Çatalyürek, Ümit V

    2013-06-07

    The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time consuming and demands the development of fast and accurate alignment tools. However, the current proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked while comparing the performance of a newly developed tool to the state of the art. Therefore, there is a need for an objective evaluation method that covers all the aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison. We applied our benchmarking tests on 9 well known mapping tools, namely, Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST) using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be further applied to others. The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, there is no tool that outperforms all of the others in all the tests. Therefore, the end user should clearly specify his needs in order to choose the tool that provides the best results.
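
    The abstract above distinguishes tools that hash the reads from tools that index the reference genome. As a rough illustration of the second idea only, here is a minimal sketch that indexes reference k-mers in a dictionary and maps a read by seed lookup followed by a mismatch check. It is a toy under stated assumptions and does not reproduce the algorithm of any benchmarked tool; the function names `build_kmer_index` and `map_read`, and the parameters `k` and `max_mismatches`, are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return sorted (position, mismatches) pairs for ungapped placements."""
    hits = {}
    # Use non-overlapping seeds taken from the read.
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for seed_pos in index.get(seed, []):
            start = seed_pos - offset
            # Skip placements that fall off the reference or are already accepted.
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"  # made-up toy reference
    index = build_kmer_index(reference, 4)
    print(map_read("TAGCCGAT", reference, index, 4))
    print(map_read("TAGCCGTT", reference, index, 4))
```

    Raising `max_mismatches` or shortening `k` makes the toy mapper more sensitive but slower, which is a small-scale version of the accuracy versus speed compromise the abstract describes.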

  3. Benchmarking short sequence mapping tools

    PubMed Central

    2013-01-01

    Background The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time consuming and demands the development of fast and accurate alignment tools. However, the current proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked while comparing the performance of a newly developed tool to the state of the art. Therefore, there is a need for an objective evaluation method that covers all the aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison. Results We applied our benchmarking tests on 9 well known mapping tools, namely, Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST) using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be further applied to others. Conclusion The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, there is no tool that outperforms all of the others in all the tests. Therefore, the end user should clearly specify his needs in order to choose the tool that provides the best results. PMID:23758764

  4. Sequence finishing and mapping of Drosophila melanogaster heterochromatin

    SciTech Connect

    Hoskins, Roger A.; Carlson, Joseph W.; Kennedy, Cameron; Acevedo, David; Evans-Holm, Martha; Frise, Erwin; Wan, Kenneth H.; Park, Soo; Mendez-Lago, Maria; Rossi, Fabrizio; Villasante, Alfredo; Dimitri, Patrizio; Karpen, Gary H.; Celniker, Susan E.

    2007-06-15

    Genome sequences for most metazoans are incomplete due to the presence of repeated DNA in the pericentromeric heterochromatin. The heterochromatic regions of D. melanogaster contain 20 Mb of sequence amenable to mapping, sequence assembly and finishing. Here we describe the generation of 15 Mb of finished or improved heterochromatic sequence using available clone resources and assembly and mapping methods. We also constructed a BAC-based physical map that spans approximately 13 Mb of the pericentromeric heterochromatin, and a cytogenetic map that positions approximately 11 Mb of BAC contigs and sequence scaffolds in specific chromosomal locations. The integrated sequence assembly and maps greatly improve our understanding of the structure and composition of this poorly understood fraction of a metazoan genome and provide a framework for functional analyses.

  5. Validation of rice genome sequence by optical mapping

    PubMed Central

    Zhou, Shiguo; Bechner, Michael C; Place, Michael; Churas, Chris P; Pape, Louise; Leong, Sally A; Runnheim, Rod; Forrest, Dan K; Goldstein, Steve; Livny, Miron; Schwartz, David C

    2007-01-01

    Background Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap closing efforts require purely independent means for accurate finishing of sequence build data. Results To facilitate ongoing sequencing finishing and validation efforts, we have constructed a whole-genome SwaI optical restriction map of the rice genome. The physical map consists of 14 contigs, covering 12 chromosomes, with a total genome size of 382.17 Mb; this value is about 11% smaller than original estimates. 9 of the 14 optical map contigs are without gaps, covering chromosomes 1, 2, 3, 4, 5, 7, 8, 10, and 12 in their entirety – including centromeres and telomeres. Alignments between optical and in silico restriction maps constructed from IRGSP (International Rice Genome Sequencing Project) and TIGR (The Institute for Genomic Research) genome sequence sources are comprehensive and informative, evidenced by map coverage across virtually all published gaps, discovery of new ones, and characterization of sequence misassemblies; all totalling ~14 Mb. Furthermore, since optical maps are ordered restriction maps, identified discordances are pinpointed on a reliable physical scaffold providing an independent resource for closure of gaps and rectification of misassemblies. Conclusion Analysis of sequence and optical mapping data effectively validates genome sequence assemblies constructed from large, repeat-rich genomes. Given this conclusion we envision new applications of such single molecule analysis that will merge advantages offered by high-resolution optical maps with inexpensive, but short sequence reads generated by emerging sequencing platforms. Lastly, the map construction techniques presented here point the way to new types of comparative genome analysis that would focus on discernment of structural differences
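
    The in silico restriction maps mentioned above are ordered lists of fragment sizes predicted from a sequence. A minimal sketch of that computation follows, assuming a linear molecule and the SwaI recognition site ATTTAAAT with a blunt cut in the middle (ATTT^AAAT). The function name `in_silico_fragments` and the toy sequence are hypothetical, and this is not the authors' pipeline.

```python
SWAI_SITE = "ATTTAAAT"
SWAI_CUT_OFFSET = 4  # blunt cut between ATTT and AAAT


def in_silico_fragments(sequence, site=SWAI_SITE, cut_offset=SWAI_CUT_OFFSET):
    """Return ordered fragment lengths (bp) from digesting a linear sequence."""
    sequence = sequence.upper()
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        # Advance by one base so overlapping sites are also found.
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "GG" + SWAI_SITE + "CCCC" + SWAI_SITE + "G"  # made-up 23 bp sequence
    print(in_silico_fragments(toy))
```

    Comparing such a predicted fragment list with the measured optical map is, in spirit, how discordances such as gaps or misassemblies would show up as fragments whose sizes or order disagree.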

  6. From sequence mapping to genome assemblies.

    PubMed

    Otto, Thomas D

    2015-01-01

    The development of "next-generation" high-throughput sequencing technologies has made it possible for many labs to undertake sequencing-based research projects that were unthinkable just a few years ago. Although the scientific applications are diverse (e.g., new genome projects, gene expression analysis, genome-wide functional screens, or epigenetics), the sequence data are usually processed in one of two ways: sequence reads are either mapped to an existing reference sequence, or they are built into a new sequence ("de novo assembly"). In this chapter, we first discuss some limitations of the mapping process and how these may be overcome through local sequence assembly. We then introduce the concept of de novo assembly and describe essential assembly improvement procedures such as scaffolding, contig ordering, gap closure, error evaluation, gene annotation transfer and ab initio gene annotation. The results are high-quality draft assemblies that will facilitate informative downstream analyses.

  7. [Mapping and human genome sequence program].

    PubMed

    Weissenbach, J

    1997-03-01

    Until recently, human genome programs focused primarily on establishing maps that would provide signposts to researchers seeking to identify genes responsible for inherited diseases, as well as a basis for genome sequencing studies. Preestablished gene mapping goals have been reached. The over 7,000 microsatellite markers identified to date provide a map of sufficient density to allow localization of the gene of a monogenic disease with a precision of 1 to 2 million base pairs. The physical map, based on systematically arranged overlapping sets of artificial yeast chromosomes (YACs), has also made considerable headway during the last few years. The most recently published map covers more than 90% of the genome. However, currently available physical maps cannot be used for sequencing studies because multiple rearrangements occur in YACs. The recently developed sets of radioinduced hybrids are extremely useful for incorporating genes into existing maps. A network of American and European laboratories has successfully used these radioinduced hybrids to map 15,000 gene tags from large-scale cDNA library sequencing programs. There are increasingly pressing reasons for initiating large scale human genome sequencing studies.

  8. Sequence-based mapping of the polyploid wheat genome.

    PubMed

    Saintenac, Cyrille; Jiang, Dayou; Wang, Shichen; Akhunov, Eduard

    2013-07-08

    The emergence of new sequencing technologies has provided fast and cost-efficient strategies for high-resolution mapping of complex genomes. Although these approaches hold great promise to accelerate genome analysis, their application in studying genetic variation in wheat has been hindered by the complexity of its polyploid genome. Here, we applied the next-generation sequencing of a wheat doubled-haploid mapping population for high-resolution gene mapping and tested its utility for ordering shotgun sequence contigs of a flow-sorted wheat chromosome. A bioinformatical pipeline was developed for reliable variant analysis of sequence data generated for polyploid wheat mapping populations. The results of variant mapping were consistent with the results obtained using the wheat 9000 SNP iSelect assay. A reference map of the wheat genome integrating 2740 gene-associated single-nucleotide polymorphisms from the wheat iSelect assay, 1351 diversity array technology, 118 simple sequence repeat/sequence-tagged sites, and 416,856 genotyping-by-sequencing markers was developed. By analyzing the sequenced megabase-size regions of the wheat genome we showed that mapped markers are located within 40-100 kb from genes providing a possibility for high-resolution mapping at the level of a single gene. In our population, gene loci controlling a seed color phenotype cosegregated with 2459 markers including one that was located within the red seed color gene. We demonstrate that the high-density reference map presented here is a useful resource for gene mapping and linking physical and genetic maps of the wheat genome.

  9. A Statistical Approach for Ambiguous Sequence Mappings

    USDA-ARS's Scientific Manuscript database

    When attempting to map RNA sequences to a reference genome, high percentages of short sequence reads are often assigned to multiple genomic locations. One approach to handling these “ambiguous mappings” has been to discard them. This results in a loss of data, which can sometimes be as much as 45% o...

  10. Mapping Replication Origin Sequences in Eukaryotic Chromosomes

    PubMed Central

    Fu, Haiqing; Besnard, Emilie; Desprat, Romain; Ryan, Michael; Kahli, Malik; Lemaitre, Jean-Marc; Aladjem, Mirit I.

    2014-01-01

    Recent advances in genome sequencing technology have led towards the complete mapping of DNA replication initiation sites in the human genome. This thorough origin mapping facilitates the understanding of the relationship between replication initiation events, transcription and chromatin modifications and allows the characterization of consensus sequences of potential replication origins. This unit provides a detailed protocol for isolation and sequence analyses of nascent DNA strands. Two variations of the protocol based on non-overlapping assumptions are described below, addressing potential bias issues for whole genome analyses. PMID:25447077

  11. A Teaching-Learning Sequence about Weather Map Reading

    ERIC Educational Resources Information Center

    Mandrikas, Achilleas; Stavrou, Dimitrios; Skordoulis, Constantine

    2017-01-01

    In this paper a teaching-learning sequence (TLS) introducing pre-service elementary teachers (PET) to weather map reading, with emphasis on wind assignment, is presented. The TLS includes activities about recognition of wind symbols, assignment of wind direction and wind speed on a weather map and identification of wind characteristics in a…

  12. From mapping to sequencing, post-sequencing and beyond.

    PubMed

    Sasaki, Takuji; Matsumoto, Takashi; Antonio, Baltazar A; Nagamura, Yoshiaki

    2005-01-01

    The Rice Genome Research Program (RGP) in Japan has been collaborating with the international community in elucidating a complete high-quality sequence of the rice genome. As the pioneer in large-scale analysis of the rice genome, the RGP has successfully established the fundamental tools for genome research such as a genetic map, a yeast artificial chromosome (YAC)-based physical map, a transcript map and a phage P1 artificial chromosome (PAC)/bacterial artificial chromosome (BAC) sequence-ready physical map, which serve as common resources for genome sequencing. Among the 12 rice chromosomes, the RGP is in charge of sequencing six chromosomes covering 52% of the 390 Mb total length of the genome. The contribution of the RGP to the realization of decoding the rice genome sequence with high accuracy and deciphering the genetic information in the genome will have a great impact in understanding the biology of the rice plant that provides a major food source for almost half of the world's population. A high-quality draft sequence (phase 2) was completed in December 2002. Since then, much of the finished quality sequence (phase 3) has become available in public databases. With the completion of sequencing in December 2004, it is expected that the genome sequence would facilitate innovative research in functional and applied genomics. A map-based genome sequence is indispensable for further improvement of current rice varieties and for development of novel varieties carrying agronomically important traits such as high yield potential and tolerance to both biotic and abiotic stresses. In addition to genome sequencing, various related projects have been initiated to generate valuable resources, which could serve as indispensable tools in clarifying the structure and function of the rice genome. These resources have been made available to the scientific community through the Rice Genome Resource Center (RGRC) of the National Institute of Agrobiological Sciences (NIAS) to

  13. Reading sequence-directed computational nucleosome maps.

    PubMed

    Nibhani, Reshma; Trifonov, Edward N

    2015-01-01

    The recently developed latest version of sequence-directed single-base resolution nucleosome mapping reveals the existence of strong nucleosomes and chromatin columnar structures (columns). Broad application of this simple technique for further studies of chromatin and chromosome structure requires some basic understanding of how it works and what information it affords. The paper provides such an introduction to the method. The oscillating maps of singular nucleosomes, of short and long oligonucleosome columns, are explained, as well as maps of chromatin on satellite DNA and occurrences of counter-phase (antiparallel) nucleosome neighbors.

  14. Interior view looking SW includes map hanging from ceiling and ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Interior view looking SW includes map hanging from ceiling and edge of fire finder stand on right. - Badger Mountain Lookout, .125 mile northwest of Badger Mountain summit, East Wenatchee, Douglas County, WA

  15. Sequence analysis by iterated maps, a review.

    PubMed

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
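
    For readers unfamiliar with the Chaos Game Representation named above, the iterated map can be sketched in a few lines: starting from the centre of the unit square, each nucleotide moves the current point halfway toward the corner assigned to that nucleotide. The corner assignment below is one common convention and is an assumption here, as is the function name `cgr_points`; the review itself does not prescribe this code.

```python
# Corner assignment is an assumed convention, chosen for illustration.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence, start=(0.5, 0.5)):
    """Return one CGR coordinate per nucleotide of the sequence."""
    x, y = start
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for base, point in zip("ACGT", cgr_points("ACGT")):
        print(base, point)
```

    Because each step halves the distance to a corner, the position after n bases encodes the last n bases at progressively finer resolution, which is one way to read the scale-free property mentioned in the abstract.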

  16. Getting started in mapping-by-sequencing.

    PubMed

    Candela, Héctor; Casanova-Sáez, Rubén; Micol, José Luis

    2015-07-01

    Next-generation sequencing (NGS) technologies allow the cost-effective sequencing of whole genomes and have expanded the scope of genomics to novel applications, such as the genome-wide characterization of intraspecific polymorphisms and the rapid mapping and identification of point mutations. Next-generation sequencing platforms, such as the Illumina HiSeq2000 platform, are now commercially available at affordable prices and routinely produce an enormous amount of sequence data, but their wide use is often hindered by a lack of knowledge on how to manipulate and process the information produced. In this review, we focus on the strategies that are available to geneticists who wish to incorporate these novel approaches into their research but who are not familiar with the necessary bioinformatic concepts and computational tools. In particular, we comprehensively summarize case studies where the use of NGS technologies has led to the identification of point mutations, a strategy that has been dubbed "mapping-by-sequencing", and review examples from plants and other model species such as Caenorhabditis elegans, Saccharomyces cerevisiae, and Drosophila melanogaster. As these technologies are becoming cheaper and more powerful, their use is also expanding to allow mutation identification in species with larger genomes, such as many crop plants. © 2014 Institute of Botany, Chinese Academy of Sciences.

  17. Mapping and sequencing the human genome

    SciTech Connect

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  18. Mapping and Sequencing the Human Genome

    DOE R&D Accomplishments Database

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  19. Using the NCBI Map Viewer to browse genomic sequence data.

    PubMed

    Wolfsberg, Tyra G

    2011-04-01

    This unit includes a basic protocol with an introduction to the Map Viewer, describing how to perform a simple text-based search of genome annotations to view the genomic context of a gene, navigate along a chromosome, zoom in and out, and change the displayed maps to hide and show information. It also describes some of NCBI's sequence-analysis tools, which are provided as links from the Map Viewer. The alternate protocols describe different ways to query the genome sequence, and also illustrate additional features of the Map Viewer. Alternate Protocol 1 shows how to perform and interpret the results of a BLAST search against the human genome. Alternate Protocol 2 demonstrates how to retrieve a list of all genes between two STS markers. Finally, Alternate Protocol 3 shows how to find all annotated members of a gene family. © 2011 by John Wiley & Sons, Inc.

  20. Using the NCBI map viewer to browse genomic sequence data.

    PubMed

    Wolfsberg, Tyra G

    2010-03-01

    This unit includes a Basic Protocol with an introduction to the Map Viewer, describing how to perform a simple text-based search of genome annotations to view the genomic context of a gene, navigate along a chromosome, zoom in and out, and change the displayed maps to hide and show information. It also describes some of NCBI's sequence-analysis tools, which are provided as links from the Map Viewer. The Alternate Protocols describe different ways to query the genome sequence, and also illustrate additional features of the Map Viewer. Alternate Protocol 1 shows how to perform and interpret the results of a BLAST search against the human genome. Alternate Protocol 2 demonstrates how to retrieve a list of all genes between two STS markers. Finally, Alternate Protocol 3 shows how to find all annotated members of a gene family. (c) 2010 by John Wiley & Sons, Inc.

  1. Using the NCBI Map Viewer to browse genomic sequence data.

    PubMed

    Wolfsberg, Tyra G

    2007-01-01

    This unit includes an introduction to the Map Viewer, which describes how to perform a simple text-based search of genome annotations to view the genomic context of a gene, navigate along a chromosome, zoom in and out, and change the displayed maps to hide and show information. It also describes some of NCBI's sequence-analysis tools, which are provided as links from the Map Viewer. The Alternate Protocols describe different ways to query the genome sequence, and also illustrate additional features of the Map Viewer. Alternate Protocol 1 shows how to perform and interpret the results of a BLAST search against the human genome. Alternate Protocol 2 demonstrates how to retrieve a list of all genes between two STS markers. Finally, Alternate Protocol 3 shows how to find all annotated members of a gene family.

  2. Does cortical mapping protect naming if surgery includes hippocampal resection?

    PubMed Central

    Hamberger, Marla J.; Seidel, William T.; Goodman, Robert R.; McKhann, Guy M.

    2009-01-01

    Objective Pre-resection electrical stimulation mapping is frequently used to identify cortical sites critical for visual object naming. These sites are typically spared from surgical resection with the goal of preserving postoperative language. Recent studies, however, suggest a potential role of the hippocampus in naming, although this is inconsistent with neurocognitive models of language and memory. We sought to determine whether preservation of visual naming sites identified via cortical stimulation mapping protects against naming decline when resection includes the hippocampal region. Methods We assessed postoperative changes in visual naming in 33 patients, 14 who underwent left temporal resection including hippocampal removal and 19 patients who had left temporal resection without hippocampal removal. All patients had preresection cortical language mapping. Visual object naming sites identified via electrical stimulation were always preserved. Results Patients without hippocampal resection showed no significant naming decline, suggesting a clinical benefit from cortical mapping. In contrast, patients who had hippocampal resection exhibited significant postoperative naming decline, despite pre-resection mapping and preservation of all visual naming sites (P ≤ .02). These group effects were also evident in individual patients (P = .02). More detailed, post hoc examination of patients who had hippocampal resection revealed that overall, patients who declined were those with a preoperative, structurally intact hippocampus, whereas patients with preoperative hippocampal sclerosis did not exhibit significant decline. Interpretation Despite cortical language mapping with preservation of visual naming sites from resection, removal of an intact dominant hippocampus will likely result in visual naming decline postoperatively. PMID:20373346

  3. Sequence analysis by iterated maps, a review

    PubMed Central

    2014-01-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, ‘Chaos Game Representation’. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results. PMID:24162172

  4. Survey sequencing and radiation hybrid mapping to construct comparative maps

    PubMed Central

    Hitte, Christophe; Kirkness, Ewen F.; Ostrander, Elaine A; Galibert, Francis

    2008-01-01

    Radiation hybrid (RH) mapping has become one of the most well established techniques for economically and efficiently navigating genomes of interest. The success of the technique relies on random chromosome breakage of a target genome, which is then captured by recipient cells missing a pre-selected marker. Selection for hybrid cells that have DNA fragments bearing the marker of choice, plus a random set of DNA fragments from the initial irradiation, generates a set of cell lines that recapitulates the genome of the target organism several-fold. Markers or genes of interest are analyzed by PCR using DNA isolated from each cell line. Statistical tools are applied to determine both the linear order of markers on each chromosome, and the confidence of each placement. The resolution of the resulting map relies on many factors, most notably the degree of breakage from the initial radiation as well as the number of hybrid clones and mean retention value. A high resolution RH map of a genome derived from low pass or survey sequencing (coverage from 1 to 2x) can provide essentially the same comparative data on gene order that is derived from high-coverage (greater than 7x) genome sequencing. When combined with Fluorescence in Situ Hybridization (FISH), RH maps are complete and ordered blueprints for each chromosome. They give information about the relative order and spacing of genes and markers, and allow investigators to move between target and reference genomes, such as those of mouse or human, with ease although the approach is not limited to mammal genomes. PMID:18629661

  5. 1. MAP OF THE OHIO CANAL, INCLUDING LOCK #37 (14 ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    1. MAP OF THE OHIO CANAL, INCLUDING LOCK #37 (14 MILE LOCK). MADE UNDER THE DIRECTION OF THE BOARD OF PUBLIC WORKS, DECEMBER 1912. SCALE 80'=1'. PROPERTY OF AMERICAN STEEL AND WIRE COMPANY, CLEVELAND, OHIO. - Ohio & Erie Canal, Lock No. 37, At Canal & Fitzwater Roads, Valley View, Cuyahoga County, OH

  6. Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data

    PubMed Central

    Pan, Yonglong; Wang, Xiaoming; Liu, Lin; Wang, Hao; Luo, Meizhong

    2016-01-01

    A good physical map is essential to guide sequence assembly in de novo whole genome sequencing, especially when sequences are produced by high-throughput sequencing such as next-generation-sequencing (NGS) technology. We here present a novel method, Feature sets-based Genome Mapping (FGM). With FGM, physical map and draft whole genome sequences can be generated, anchored and integrated using the same data set of NGS sequences, independent of restriction digestion. Method model was created and parameters were inspected by simulations using the Arabidopsis genome sequence. In the simulations, when ~4.8X genome BAC library including 4,096 clones was used to sequence the whole genome, ~90% of clones were successfully connected to physical contigs, and 91.58% of genome sequences were mapped and connected to chromosomes. This method was experimentally verified using the existing physical map and genome sequence of rice. Of 4,064 clones covering 115 Mb sequence selected from ~3 tiles of 3 chromosomes of a rice draft physical map, 3,364 clones were reconstructed into physical contigs and 98 Mb sequences were integrated into the 3 chromosomes. The physical map-integrated draft genome sequences can provide permanent frameworks for eventually obtaining high-quality reference sequences by targeted sequencing, gap filling and combining other sequences. PMID:27611682

  7. Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data.

    PubMed

    Pan, Yonglong; Wang, Xiaoming; Liu, Lin; Wang, Hao; Luo, Meizhong

    2016-01-01

    A good physical map is essential to guide sequence assembly in de novo whole genome sequencing, especially when sequences are produced by high-throughput sequencing such as next-generation-sequencing (NGS) technology. We here present a novel method, Feature sets-based Genome Mapping (FGM). With FGM, physical map and draft whole genome sequences can be generated, anchored and integrated using the same data set of NGS sequences, independent of restriction digestion. Method model was created and parameters were inspected by simulations using the Arabidopsis genome sequence. In the simulations, when ~4.8X genome BAC library including 4,096 clones was used to sequence the whole genome, ~90% of clones were successfully connected to physical contigs, and 91.58% of genome sequences were mapped and connected to chromosomes. This method was experimentally verified using the existing physical map and genome sequence of rice. Of 4,064 clones covering 115 Mb sequence selected from ~3 tiles of 3 chromosomes of a rice draft physical map, 3,364 clones were reconstructed into physical contigs and 98 Mb sequences were integrated into the 3 chromosomes. The physical map-integrated draft genome sequences can provide permanent frameworks for eventually obtaining high-quality reference sequences by targeted sequencing, gap filling and combining other sequences.

  8. Analecta of structures formed during the 28 June 1992 Landers-Big Bear, California earthquake sequence (including maps of shear zones, belts of shear zones, tectonic ridge, duplex en echelon fault, fault elements, and thrusts in restraining steps)

    SciTech Connect

    Johnson, A.M.; Johnson, N.A.; Johnson, K.M.; Wei, W.; Fleming, R.W.; Cruikshank, K.M.; Martosudarmo, S.Y.

    1997-12-31

    The June 28, 1992, M_s 7.5 earthquake at Landers, California, which occurred about 10 km north of the community of Yucca Valley, California, produced spectacular ground rupturing more than 80 km in length (Hough and others, 1993). The ground rupturing, which was dominated by right-lateral shearing, extended along at least four distinct faults arranged broadly en echelon. The faults were connected through wide transfer zones by stepovers, consisting of right-lateral fault zones and tension cracks. The Landers earthquakes occurred in the desert of southeastern California, where details of ruptures were well preserved, and patterns of rupturing were generally unaffected by urbanization. The structures were varied and well-displayed and, because the differential displacements were so large, spectacular. The scarcity of vegetation, the aridity of the area, the compactness of the alluvium and bedrock, and the relative isotropy and brittleness of surficial materials collaborated to provide a marvelous visual record of the character of the deformation zones. The authors present a series of analecta -- that is, verbal clips or snippets -- dealing with a variety of structures, including belts of shear zones, segmentation of ruptures, rotating fault block, en echelon fault zones, releasing duplex structures, spines, and ramps. All of these structures are documented with detailed maps in text figures or in plates (in pocket). The purpose is to describe the structures and to present an understanding of the mechanics of their formation. Hence, most descriptions focus on structures where the authors have information on differential displacements as well as spatial data on the position and orientation of fractures.

  9. Next-Generation Technologies for Multiomics Approaches Including Interactome Sequencing

    PubMed Central

    Ohashi, Hiroyuki; Miyamoto-Sato, Etsuko

    2015-01-01

    The development of high-speed analytical techniques such as next-generation sequencing and microarrays allows high-throughput analysis of biological information at a low cost. These techniques contribute to medical and bioscience advancements and provide new avenues for scientific research. Here, we outline a variety of new innovative techniques and discuss their use in omics research (e.g., genomics, transcriptomics, metabolomics, proteomics, and interactomics). We also discuss the possible applications of these methods, including an interactome sequencing technology that we developed, in future medical and life science research. PMID:25649523

  10. A teaching-learning sequence about weather map reading

    NASA Astrophysics Data System (ADS)

    Mandrikas, Achilleas; Stavrou, Dimitrios; Skordoulis, Constantine

    2017-07-01

    In this paper a teaching-learning sequence (TLS) introducing pre-service elementary teachers (PET) to weather map reading, with emphasis on wind assignment, is presented. The TLS includes activities about recognition of wind symbols, assignment of wind direction and wind speed on a weather map and identification of wind characteristics in a weather forecast. Sixty PET capabilities and difficulties in understanding weather maps were investigated, using inquiry-based learning activities. The results show that most PET became more capable of reading weather maps and assigning wind direction and speed on them. Our results also show that PET could be guided to understand meteorology concepts useful in everyday life and in teaching their future students.

  11. Target Enrichment Improves Mapping of Complex Traits by Deep Sequencing.

    PubMed

    Guo, Jianjun; Fan, Jue; Hauser, Bernard A; Rhee, Seung Y

    2015-11-03

    Complex traits such as crop performance and human diseases are controlled by multiple genetic loci, many of which have small effects and often go undetected by traditional quantitative trait locus (QTL) mapping. Recently, bulked segregant analysis with large F2 pools and genome-level markers (named extreme-QTL or X-QTL mapping) has been used to identify many QTL. To estimate parameters impacting QTL detection for X-QTL mapping, we simulated the effects of population size, marker density, and sequencing depth of markers on QTL detectability for traits with differing heritabilities. These simulations indicate that a high (>90%) chance of detecting QTL with at least 5% effect requires 5000× sequencing depth for a trait with heritability of 0.4-0.7. For most eukaryotic organisms, whole-genome sequencing at this depth is not economically feasible. Therefore, we tested and confirmed the feasibility of applying deep sequencing of target-enriched markers for X-QTL mapping. We used two traits in Arabidopsis thaliana with different heritabilities: seed size (H^2 = 0.61) and seedling greening in response to salt (H^2 = 0.94). We used a modified G test to identify QTL regions and developed a model-based statistical framework to resolve individual peaks by incorporating recombination rates. Multiple QTL were identified for both traits, including previously undiscovered QTL. We call our method target-enriched X-QTL (TEX-QTL) mapping; this mapping approach is not limited by the genome size or the availability of recombinant inbred populations and should be applicable to many organisms and traits.
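
    As a rough illustration of the kind of test mentioned above, the following sketch computes the standard (unmodified) G statistic for allele read counts in two bulks at a single marker. It is not the authors' modified G test; the function name and the read counts are hypothetical.

```python
import math

def g_statistic(table):
    """Standard G statistic for a contingency table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # the term 0 * ln(0) is taken as 0
            expected = row_totals[i] * col_totals[j] / total
            g += observed * math.log(observed / expected)
    return 2.0 * g

# Hypothetical read counts for (allele 1, allele 2) at one marker.
high_bulk = [2900, 2100]
low_bulk = [2500, 2500]
g = g_statistic([high_bulk, low_bulk])
print(f"G = {g:.2f}")
```

    For a 2 x 2 table the statistic is conventionally compared with a chi-square distribution with one degree of freedom (critical value about 3.84 at the 5% level); scanning many markers would additionally require a multiple-testing correction.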

  12. Simulation of Accident Sequences Including Emergency Operating Procedures

    SciTech Connect

    Queral, Cesar; Exposito, Antonio; Hortal, Javier

    2004-07-01

    Operator actions play an important role in accident sequences. However, design analysis (Safety Analysis Report, SAR) seldom includes consideration of operator actions, although they are required by compulsory Emergency Operating Procedures (EOP) to perform some checks and actions from the very beginning of the accident. The basic aim of the project is to develop a procedure validation system which consists of the combination of three elements: a plant transient simulation code, TRETA (a C-based modular program), developed by the CSN; a computerized procedure system, COPMA-III (a Java technology-based program), developed by the OECD-Halden Reactor Project and adapted for simulation with the contribution of our group; and a software interface that provides the communication between COPMA-III and TRETA. The new combined system is going to be applied in a pilot study in order to analyze sequences initiated by secondary side breaks in a Pressurized Water Reactor (PWR) plant. (authors)

  13. Mapping copy number variation by population-scale genome sequencing.

    PubMed

    Mills, Ryan E; Walter, Klaudia; Stewart, Chip; Handsaker, Robert E; Chen, Ken; Alkan, Can; Abyzov, Alexej; Yoon, Seungtai Chris; Ye, Kai; Cheetham, R Keira; Chinwalla, Asif; Conrad, Donald F; Fu, Yutao; Grubert, Fabian; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Iakoucheva, Lilia M; Iqbal, Zamin; Kang, Shuli; Kidd, Jeffrey M; Konkel, Miriam K; Korn, Joshua; Khurana, Ekta; Kural, Deniz; Lam, Hugo Y K; Leng, Jing; Li, Ruiqiang; Li, Yingrui; Lin, Chang-Yun; Luo, Ruibang; Mu, Xinmeng Jasmine; Nemesh, James; Peckham, Heather E; Rausch, Tobias; Scally, Aylwyn; Shi, Xinghua; Stromberg, Michael P; Stütz, Adrian M; Urban, Alexander Eckehart; Walker, Jerilyn A; Wu, Jiantao; Zhang, Yujun; Zhang, Zhengdong D; Batzer, Mark A; Ding, Li; Marth, Gabor T; McVean, Gil; Sebat, Jonathan; Snyder, Michael; Wang, Jun; Ye, Kenny; Eichler, Evan E; Gerstein, Mark B; Hurles, Matthew E; Lee, Charles; McCarroll, Steven A; Korbel, Jan O

    2011-02-03

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.

  14. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

    1993-02-16

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a pu... GOVERNMENT RIGHTS: This application was funded under Department of Energy Contract DE-AC02-76ER01338. The U.S. Government has certain rights under this application and any patent issuing thereon.

  15. Halvade: scalable sequence analysis with MapReduce

    PubMed Central

    Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan

    2015-01-01

    Motivation: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine. Results: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50× coverage) in <3 h with very high parallel efficiency. Even on a single, multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading. Availability and implementation: Halvade is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR. Its source is available at http://bioinformatics.intec.ugent.be/halvade under GPL license. Contact: jan.fostier@intec.ugent.be Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25819078

  16. A maize map standard with sequenced core markers, grass genome reference points and 932 expressed sequence tagged sites (ESTs) in a 1736-locus map.

    PubMed Central

    Davis, G L; McMullen, M D; Baysdorfer, C; Musket, T; Grant, D; Staebell, M; Xu, G; Polacco, M; Koster, L; Melia-Hancock, S; Houchins, K; Chao, S; Coe, E H

    1999-01-01

    We have constructed a 1736-locus maize genome map containing 1156 loci probed by cDNAs, 545 probed by random genomic clones, 16 by simple sequence repeats (SSRs), 14 by isozymes, and 5 by anonymous clones. Sequence information is available for 56% of the loci with 66% of the sequenced loci assigned functions. A total of 596 new ESTs were mapped from a B73 library of 5-wk-old shoots. The map contains 237 loci probed by barley, oat, wheat, rice, or tripsacum clones, which serve as grass genome reference points in comparisons between maize and other grass maps. Ninety core markers selected for low copy number, high polymorphism, and even spacing along the chromosome delineate the 100 bins on the map. The average bin size is 17 cM. Use of bin assignments enables comparison among different maize mapping populations and experiments including those involving cytogenetic stocks, mutants, or quantitative trait loci. Integration of nonmaize markers in the map extends the resources available for gene discovery beyond the boundaries of maize mapping information into the expanse of map, sequence, and phenotype information from other grass species. This map provides a foundation for numerous basic and applied investigations including studies of gene organization, gene and genome evolution, targeted cloning, and dissection of complex traits. PMID:10388831

  17. A maize map standard with sequenced core markers, grass genome reference points and 932 expressed sequence tagged sites (ESTs) in a 1736-locus map.

    PubMed

    Davis, G L; McMullen, M D; Baysdorfer, C; Musket, T; Grant, D; Staebell, M; Xu, G; Polacco, M; Koster, L; Melia-Hancock, S; Houchins, K; Chao, S; Coe, E H

    1999-07-01

    We have constructed a 1736-locus maize genome map containing 1156 loci probed by cDNAs, 545 probed by random genomic clones, 16 by simple sequence repeats (SSRs), 14 by isozymes, and 5 by anonymous clones. Sequence information is available for 56% of the loci with 66% of the sequenced loci assigned functions. A total of 596 new ESTs were mapped from a B73 library of 5-wk-old shoots. The map contains 237 loci probed by barley, oat, wheat, rice, or tripsacum clones, which serve as grass genome reference points in comparisons between maize and other grass maps. Ninety core markers selected for low copy number, high polymorphism, and even spacing along the chromosome delineate the 100 bins on the map. The average bin size is 17 cM. Use of bin assignments enables comparison among different maize mapping populations and experiments including those involving cytogenetic stocks, mutants, or quantitative trait loci. Integration of nonmaize markers in the map extends the resources available for gene discovery beyond the boundaries of maize mapping information into the expanse of map, sequence, and phenotype information from other grass species. This map provides a foundation for numerous basic and applied investigations including studies of gene organization, gene and genome evolution, targeted cloning, and dissection of complex traits.

  18. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

    1999-05-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74--79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli. 12 figs.

  19. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

    1995-03-21

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.

  20. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

    1999-05-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.

  1. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

    1995-03-21

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1,018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74--79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli. 11 figures.

  2. cDNA encoding a polypeptide including a hevein sequence

    SciTech Connect

    Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

    2000-07-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74--79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.

  3. A Probabilistic Approach for Improved Sequence Mapping in Metatranscriptomic Studies

    USDA-ARS's Scientific Manuscript database

    Mapping millions of short DNA sequences to a reference genome is a necessary step in many experiments designed to investigate the expression of genes involved in disease resistance. This is a difficult task in which several challenges often arise resulting in a suboptimal mapping. This mapping process ...

  4. Microbial genome sequencing using optical mapping and Illumina sequencing

    USDA-ARS's Scientific Manuscript database

    Introduction Optical mapping is a technique in which strands of genomic DNA are digested with one or more restriction enzymes, and a physical map of the genome constructed from the resulting image. In outline, genomic DNA is extracted from a pure culture, linearly arrayed on a specialized glass sli...

  5. [Recent progress in gene mapping through high-throughput sequencing technology and forward genetic approaches].

    PubMed

    Lu, Cairui; Zou, Changsong; Song, Guoli

    2015-08-01

    Traditional gene mapping using forward genetic approaches is conducted primarily through construction of a genetic linkage map, a process that is tedious and time-consuming and often results in low mapping accuracy and large mapping intervals. With the rapid development of high-throughput sequencing technology and the decreasing cost of sequencing, a variety of simple and quick methods of gene mapping through sequencing have been developed, including direct sequencing of the mutant genome, sequencing of selective mutant DNA pools, genetic map construction through sequencing of individuals in a population, and sequencing of the transcriptome or a partial genome. These methods can be used to identify mutations at the nucleotide level and have been applied in complex genetic backgrounds. Recent reports have shown that sequencing-based mapping can even be done without a reference genome sequence, hybridization, or genetic linkage information, which makes it possible to perform forward genetic studies in many non-model species. In this review, we summarize these new technologies and their application in gene mapping.

  6. JVM: Java Visual Mapping tool for next generation sequencing read.

    PubMed

    Yang, Ye; Liu, Juan

    2015-01-01

    We developed a program, JVM (Java Visual Mapping), for mapping next-generation sequencing reads to a reference sequence. The program is implemented in Java and is designed to handle millions of short reads generated by Illumina sequencing technology. It employs a seed index strategy and octal encoding operations for sequence alignment. JVM is useful for DNA-Seq and RNA-Seq when dealing with single-end resequencing. JVM is a desktop application that supports read sets from 1 MB to 10 GB.
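
    The abstract does not describe JVM's seed index in detail. As a generic sketch of the idea only, the following builds a k-mer index of a reference and uses the seeds of a read to propose candidate mapping positions. The function names, the toy sequences, and the choice of k are hypothetical, and this is not JVM's implementation (the octal encoding is not reproduced).

```python
from collections import defaultdict

def build_seed_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def candidate_positions(read, index, k):
    """Reference start positions implied by exact matches of the read's k-mer seeds."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            start = pos - offset  # where the read would begin on the reference
            if start >= 0:
                candidates.add(start)
    return sorted(candidates)

reference = "ACGTACGTTAGC"  # toy reference
read = "CGTTAG"             # toy read
index = build_seed_index(reference, 3)
hits = candidate_positions(read, index, 3)
print(hits)
```

    Each candidate would then be verified by aligning the full read at that position; seeding only narrows down where to look.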

  7. Trace and antitrace maps for aperiodic sequences: Extensions and applications

    NASA Astrophysics Data System (ADS)

    Wang, Xiaoguang; Grimm, Uwe; Schreiber, Michael

    2000-12-01

    We study aperiodic systems based on substitution rules by means of a transfer-matrix approach. In addition to the well-known trace map, we investigate the so-called "antitrace" map, which is the corresponding map for the difference of the off-diagonal elements of the 2×2 transfer matrix. The antitrace maps are obtained for various binary, ternary, and quaternary aperiodic sequences, such as the Fibonacci, Thue-Morse, period-doubling, Rudin-Shapiro sequences, and certain generalizations. For arbitrary substitution rules, we show that not only trace maps, but also antitrace maps exist. The dimension of our antitrace map is r(r+1)/2, where r denotes the number of basic letters in the aperiodic sequence. Analogous maps for specific matrix elements of the transfer matrix can also be constructed, but the maps for the off-diagonal elements and for the difference of the diagonal elements coincide with the antitrace map. Thus, from the trace and antitrace map, we can determine any physical quantity related to the global transfer matrix of the system. As examples, we employ these dynamical maps to compute the transmission coefficients for optical multilayers, harmonic chains, and electronic systems.
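
    For orientation, the well-known trace map for the Fibonacci sequence can be checked numerically. With unimodular 2×2 transfer matrices obeying M_{n+1} = M_n M_{n-1} and x_n = tr(M_n)/2, the map reads x_{n+1} = 2 x_n x_{n-1} - x_{n-2}. The sketch below verifies this with exact arithmetic; the two starting matrices are arbitrary determinant-one examples chosen for illustration, and the antitrace map of the paper is not reproduced here.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )

def half_trace(m):
    """x = tr(M) / 2, kept exact for integer matrices."""
    return Fraction(m[0][0] + m[1][1], 2)

def fibonacci_half_traces(m0, m1, generations):
    """Half-traces of M_0, M_1, M_2, ... with M_{n+1} = M_n M_{n-1}."""
    prev, curr = m0, m1
    xs = [half_trace(prev), half_trace(curr)]
    for _ in range(generations):
        prev, curr = curr, matmul(curr, prev)
        xs.append(half_trace(curr))
    return xs

M_B = ((1, 1), (0, 1))  # hypothetical transfer matrix for letter B, det = 1
M_A = ((2, 1), (1, 1))  # hypothetical transfer matrix for letter A, det = 1
xs = fibonacci_half_traces(M_B, M_A, 8)

# Trace map: x_{n+1} = 2 x_n x_{n-1} - x_{n-2}
trace_map_holds = all(
    xs[k] == 2 * xs[k - 1] * xs[k - 2] - xs[k - 3] for k in range(3, len(xs))
)
print(trace_map_holds)
```

    The recursion relies on the matrices having unit determinant, which is why the example matrices were chosen that way.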

  8. QTL mapping using high-throughput sequencing

    USDA-ARS's Scientific Manuscript database

    Quantitative trait locus (QTL) mapping in plants dates to the 1980s, but earlier studies were often hindered by the expense and time required to identify large numbers of polymorphic genetic markers that differentiated the parental genotypes and then to genotype them on large segregating mapping po...

  9. A novel fluence map optimization model incorporating leaf sequencing constraints.

    PubMed

    Jin, Renchao; Min, Zhifang; Song, Enmin; Liu, Hong; Ye, Yinyu

    2010-02-21

    A novel fluence map optimization model incorporating leaf sequencing constraints is proposed to overcome the drawbacks of the current objective inside smoothing models. Instead of adding a smoothing item to the objective function, we add the total number of monitor units (TNMU) requirement directly to the constraints, which serves as an important factor to balance the fluence map optimization and leaf sequencing optimization process at the same time. Consequently, we formulate the fluence map optimization models for the trailing (left) leaf synchronized, leading (right) leaf synchronized and the interleaf motion constrained non-synchronized leaf sweeping schemes, respectively. In those schemes, the leaves are all swept unidirectionally from left to right. Each of those models is turned into a linearly constrained quadratic programming model which can be solved effectively by the interior point method. Those new models are evaluated with two publicly available clinical treatment datasets including a head-neck case and a prostate case. As shown by the empirical results, our models perform much better in comparison with two recently emerged smoothing models (the total variance smoothing model and the quadratic smoothing model). For all three leaf sweeping schemes, our objective dose deviation functions increase much more slowly than those in the above two smoothing models as the TNMU decreases. While keeping plans at a similar conformity level, our new models perform much better at reducing TNMU.

  10. Restoration of distorted depth maps calculated from stereo sequences

    NASA Technical Reports Server (NTRS)

    Damour, Kevin; Kaufman, Howard

    1991-01-01

    A model-based Kalman estimator is developed for spatial-temporal filtering of noise and other degradations in velocity and depth maps derived from image sequences or cinema. As an illustration of the proposed procedures, edge information from image sequences of rigid objects is used in the processing of the velocity maps by selecting from a series of models for directional adaptive filtering. Adaptive filtering then allows for noise reduction while preserving sharpness in the velocity maps. Results from several synthetic and real image sequences are given.
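
    The abstract does not give the estimator's model. As a minimal sketch of the underlying predict-update idea only, the following scalar Kalman filter smooths a noisy value (for example, the depth at one pixel over successive frames) under an assumed random-walk model. The function name, the noise variances, and the measurements are hypothetical, and the edge-adaptive model selection described above is not reproduced.

```python
def kalman_1d(measurements, process_var, measurement_var, initial_estimate, initial_var):
    """Scalar Kalman filter assuming the true value follows a random walk."""
    estimate, var = initial_estimate, initial_var
    filtered = []
    for z in measurements:
        # Predict: the state is unchanged, but its uncertainty grows.
        var += process_var
        # Update: blend the prediction and the measurement using the Kalman gain.
        gain = var / (var + measurement_var)
        estimate += gain * (z - estimate)
        var *= 1.0 - gain
        filtered.append(estimate)
    return filtered

# Hypothetical noisy depth readings around a true value of 5.0.
readings = [5.3, 4.6, 5.1, 4.9, 5.4, 4.7, 5.0, 5.2]
smoothed = kalman_1d(readings, process_var=0.01, measurement_var=1.0,
                     initial_estimate=readings[0], initial_var=1.0)
print([round(v, 3) for v in smoothed])
```

    A larger process variance makes the filter follow the measurements more closely, while a larger measurement variance produces heavier smoothing.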

  11. Congenic mapping and sequence analysis of the Renin locus

    PubMed Central

    Flister, Michael J.; Hoffman, Matthew J.; Reddy, Prajwal; Jacob, Howard J.; Moreno, Carol

    2013-01-01

    Renin was the first blood pressure (BP) quantitative trait locus (QTL) mapped by linkage analysis in the rat. Subsequent BP linkage and congenic studies capturing different portions of the renin region have returned conflicting results, suggesting that multiple interdependent BP loci may be residing in the chromosome 13 BP QTL that includes Renin. We used SS-13BN congenic strains to map 2 BP loci in the Renin region (chr13:45.2–49.0 Mb). We identified a 1.1 Mb protective Brown Norway (BN) region around Renin (chr13:46.1–47.2 Mb) that significantly decreased BP by 32 mmHg. The Renin protective BP locus was offset by an adjacent hypertensive locus (chr13:47.2–49.0 Mb) that significantly increased BP by 29 mmHg. Sequence analysis of the protective and hypertensive BP loci revealed 1,433 and 2,063 variants between Dahl salt-sensitive/Mcwi (SS) and BN rats, respectively. To further reduce the list of candidate variants, we re-genotyped an overlapping SS-13SR congenic strain (S/renrr) with a previously reported BP phenotype. Sequence comparison between SS, Dahl R (SR), and BN reduced the number of candidate variants in the 2 BP loci by 42% for further study. Combined with previous studies, these data suggest that at least 4 BP loci reside within the 30 cM chromosome 13 BP QTL that includes Renin. PMID:23460292

  12. Application of RAD-sequencing to linkage mapping in citrus

    USDA-ARS's Scientific Manuscript database

    High density linkage maps can be developed for modest cost using high-throughput DNA sequencing to genotype a defined fraction (representation) of the genome. We developed linkage maps in two citrus populations using the RAD (Restriction site Associated DNA) genotyping method which involves restrict...

  13. Universal full-length nucleosome mapping sequence probe.

    PubMed

    Tripathi, Vijay; Salih, Bilal; Trifonov, Edward N

    2015-01-01

    For the computational sequence-directed mapping of the nucleosomes, the knowledge of the nucleosome positioning motifs - 10-11 base long sequences - and respective matrices of bendability, is not sufficient, since there is no justified way to fuse these motifs in one continuous nucleosome DNA sequence. Discovery of the strong nucleosome (SN) DNA sequences, with visible sequence periodicity allows derivation of the full-length nucleosome DNA bendability pattern as matrix or consensus sequence. The SN sequences of three species (A. thaliana, C. elegans, and H. sapiens) are aligned (512 sequences for each species), and long (115 dinucleotides) matrices of bendability derived for the species. The matrices have strong common property - alternation of runs of purine-purine (RR) and pyrimidine-pyrimidine (YY) dinucleotides, with average period 10.4 bases. On this basis the universal [R,Y] consensus of the nucleosome DNA sequence is derived, with exactly defined positions of respective penta- and hexamers RRRRR, RRRRRR, YYYYY, and YYYYYY.
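
    The alternation of purine and pyrimidine runs mentioned above can be inspected with a small sketch like the following. It recodes a DNA string into the two-letter [R,Y] alphabet and lists the run lengths; the function names are hypothetical and this is not the authors' matrix-derivation procedure.

```python
from itertools import groupby

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def to_ry(seq):
    """Recode A/G as R (purine) and C/T as Y (pyrimidine)."""
    out = []
    for base in seq.upper():
        if base in PURINES:
            out.append("R")
        elif base in PYRIMIDINES:
            out.append("Y")
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return "".join(out)


def ry_runs(seq):
    """Return (symbol, run_length) pairs for consecutive R or Y runs."""
    return [(sym, len(list(group))) for sym, group in groupby(to_ry(seq))]


if __name__ == "__main__":
    print(ry_runs("AAGGGCTTTTGAGAA"))
```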

  14. Simple sequence repeat map of the sunflower genome.

    PubMed

    Tang, S.; Yu, J.-K.; Slabaugh, B.; Shintani, K.; Knapp, J.

    2002-12-01

    Several independent molecular genetic linkage maps of varying density and completeness have been constructed for cultivated sunflower (Helianthus annuus L.). Because of the dearth of sequence and probe-specific DNA markers in the public domain, the various genetic maps of sunflower have not been integrated and a single reference map has not emerged. Moreover, comparisons between maps have been confounded by multiple linkage group nomenclatures and the lack of common DNA markers. The goal of the present research was to construct a dense molecular genetic linkage map for sunflower using simple sequence repeat (SSR) markers. First, 879 SSR markers were developed by identifying 1,093 unique SSR sequences in the DNA sequences of 2,033 clones isolated from genomic DNA libraries enriched for (AC)(n) or (AG)(n) and screening 1,000 SSR primer pairs; 579 of the newly developed SSR markers (65.9% of the total) were polymorphic among four elite inbred lines (RHA280, RHA801, PHA and PHB). The genetic map was constructed using 94 RHA280 x RHA801 F(7) recombinant inbred lines (RILs) and 408 polymorphic SSR markers (462 SSR marker loci segregated in the mapping population). Of the latter, 459 coalesced into 17 linkage groups presumably corresponding to the 17 chromosomes in the haploid sunflower genome (x = 17). The map was 1,368.3-cM long and had a mean density of 3.1 cM per locus. The SSR markers described herein supply a critical mass of DNA markers for constructing genetic maps of sunflower and create the basis for unifying and cross-referencing the multitude of genetic maps developed for wild and cultivated sunflowers.
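
    The summary figures in this abstract can be checked with a few lines of arithmetic. The abstract does not say how the mean density was computed; the sketch assumes it is the map length divided by the number of marker intervals (mapped loci minus linkage groups), which is one common convention. The function names are illustrative.

```python
def mean_interval_cm(map_length_cm, n_loci, n_groups):
    """Mean spacing between adjacent loci, assuming each linkage group
    with k loci contributes k - 1 intervals (an assumption, see above)."""
    intervals = n_loci - n_groups
    return map_length_cm / intervals


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Values taken from the abstract: 1,368.3 cM, 459 loci, 17 groups.
    print(round(mean_interval_cm(1368.3, 459, 17), 1))
    # 579 polymorphic markers out of 879 developed.
    print(round(percent(579, 879), 1))
```

    Under this assumption the two printed values agree with the reported 3.1 cM per locus and 65.9%.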

  15. Probe mapping to facilitate transposon-based DNA sequencing

    SciTech Connect

    Strausbaugh, L.D.; Bourke, M.T.; Sommer, M.T.; Coon, M.E.; Berg, C.M.

    1990-08-01

    A promising strategy for DNA sequencing exploits transposons to provide mobile sites for the binding of sequencing primers. For such a strategy to be maximally efficient, the location and orientation of the transposon must be readily determined and the insertion sites should be randomly distributed. The authors demonstrate an efficient probe-based method for the localization and orientation of transposon-borne primer sites, which is adaptable to large-scale sequencing strategies. This approach requires no prior restriction enzyme mapping or knowledge of the cloned sequence and eliminates the inefficiency inherent in totally random sequencing methods. To test the efficiency of probe mapping, 49 insertions of the transposon γδ (Tn1000) in a cloned fragment of Drosophila melanogaster DNA were mapped and oriented. In addition, oligonucleotide primers specific for unique subterminal γδ segments were used to prime dideoxynucleotide double-stranded sequencing. These data provided an opportunity to rigorously examine γδ insertion sites. The insertions were quite randomly distributed, even though the target DNA fragment had both A+T-rich and G+C-rich regions; in G+C-rich DNA, the insertions were found in A+T-rich valleys. These data demonstrate that γδ is an excellent choice for supplying mobile primer binding sites to cloned DNA and that transposon-based probe mapping permits the sequences of large cloned segments to be determined without any subcloning.

  16. Rapid SNP Discovery and Genetic Mapping Using Sequenced RAD Markers

    PubMed Central

    Atwood, Tressa S.; Currey, Mark C.; Shiver, Anthony L.; Lewis, Zachary A.; Selker, Eric U.; Cresko, William A.; Johnson, Eric A.

    2008-01-01

    Single nucleotide polymorphism (SNP) discovery and genotyping are essential to genetic mapping. There remains a need for a simple, inexpensive platform that allows high-density SNP discovery and genotyping in large populations. Here we describe the sequencing of restriction-site associated DNA (RAD) tags, which identified more than 13,000 SNPs, and mapped three traits in two model organisms, using less than half the capacity of one Illumina sequencing run. We demonstrated that different marker densities can be attained by choice of restriction enzyme. Furthermore, we developed a barcoding system for sample multiplexing and fine mapped the genetic basis of lateral plate armor loss in threespine stickleback by identifying recombinant breakpoints in F2 individuals. Barcoding also facilitated mapping of a second trait, a reduction of pelvic structure, by in silico re-sorting of individuals. To further demonstrate the ease of the RAD sequencing approach we identified polymorphic markers and mapped an induced mutation in Neurospora crassa. Sequencing of RAD markers is an integrated platform for SNP discovery and genotyping. This approach should be widely applicable to genetic mapping in a variety of organisms. PMID:18852878
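
    The in silico re-sorting of barcoded individuals can be pictured with a minimal demultiplexing sketch. It assumes every read starts with an exact, fixed-length sample barcode; the barcode sequences, sample names, and the function `demultiplex` are invented for illustration and are not taken from the paper.

```python
def demultiplex(reads, barcode_to_sample):
    """Assign reads to samples by their leading barcode.

    Assumptions for this sketch: all barcodes have the same length and
    must match exactly. Returns (bins, unassigned), where bins maps each
    sample name to its reads with the barcode trimmed off.
    """
    lengths = {len(b) for b in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("barcodes must all have the same length")
    barcode_len = lengths.pop()

    bins = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        sample = barcode_to_sample.get(read[:barcode_len])
        if sample is None:
            unassigned.append(read)
        else:
            bins[sample].append(read[barcode_len:])
    return bins, unassigned


if __name__ == "__main__":
    barcodes = {"ACGT": "fish_01", "TGCA": "fish_02"}  # hypothetical
    reads = ["ACGTGGATCC", "TGCAGGATTC", "NNNNGGATCC"]
    print(demultiplex(reads, barcodes))
```

    Because the barcode travels with each read, individuals can be regrouped by any phenotype after sequencing simply by merging their bins.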

  17. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, N.V.; Broekaert, W.F.; Namhai Chua; Kush, A.

    1993-02-16

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1,018 nucleotides long and includes an open reading frame of 204 amino acids.

  18. Data repository mapping for influenza protein sequence analysis

    NASA Astrophysics Data System (ADS)

    Pellegrino, Donald; Chen, Chaomei

    2011-01-01

    This paper introduces a new method for creating an interactive sequence similarity map of all known influenza virus protein sequences and integrating the map with existing general purpose analytical tools. The NCBI data model was designed to provide a high degree of interconnectedness amongst data objects. Substantial and continuous increase in data volume has led to a large and highly connected information space. Researchers seeking to explore this space are challenged to identify a starting point. They often choose data that is popular in the literature. Reference in the literature follow a power law distribution and popular data points may bias explorers toward paths that lead only to a dead-end of what is already known. To help discover the unexpected we developed an interactive visual analytics system to map the information space of influenza protein sequence data. The design is motivated by the needs of eScience researchers.

  19. Genetic interaction mapping with microfluidic-based single cell sequencing.

    PubMed

    Haliburton, John R; Shao, Wenjun; Deutschbauer, Adam; Arkin, Adam; Abate, Adam R

    2017-01-01

    Genetic interaction mapping is useful for understanding the molecular basis of cellular decision making, but elucidating interactions genome-wide is challenging due to the massive number of gene combinations that must be tested. Here, we demonstrate a simple approach to thoroughly map genetic interactions in bacteria using microfluidic-based single cell sequencing. Using single cell PCR in droplets, we link distinct genetic information into single DNA sequences that can be decoded by next generation sequencing. Our approach is scalable and theoretically enables the pooling of entire interaction libraries to interrogate multiple pairwise genetic interactions in a single culture. The speed, ease, and low-cost of our approach makes genetic interaction mapping viable for routine characterization, allowing the interaction network to be used as a universal read out for a variety of biology experiments, and for the elucidation of interaction networks in non-model organisms.

  20. Genetic interaction mapping with microfluidic-based single cell sequencing

    PubMed Central

    Haliburton, John R.; Shao, Wenjun; Deutschbauer, Adam; Arkin, Adam; Abate, Adam R.

    2017-01-01

    Genetic interaction mapping is useful for understanding the molecular basis of cellular decision making, but elucidating interactions genome-wide is challenging due to the massive number of gene combinations that must be tested. Here, we demonstrate a simple approach to thoroughly map genetic interactions in bacteria using microfluidic-based single cell sequencing. Using single cell PCR in droplets, we link distinct genetic information into single DNA sequences that can be decoded by next generation sequencing. Our approach is scalable and theoretically enables the pooling of entire interaction libraries to interrogate multiple pairwise genetic interactions in a single culture. The speed, ease, and low-cost of our approach makes genetic interaction mapping viable for routine characterization, allowing the interaction network to be used as a universal read out for a variety of biology experiments, and for the elucidation of interaction networks in non-model organisms. PMID:28170417

  1. Decoding the cognitive map: ensemble hippocampal sequences and decision making.

    PubMed

    Wikenheiser, Andrew M; Redish, A David

    2015-06-01

    Tolman proposed that complex animal behavior is mediated by the cognitive map, an integrative learning system that allows animals to reconfigure previous experience in order to compute predictions about the future. The discovery of place cells in the rodent hippocampus immediately suggested a plausible neural mechanism to fulfill the 'map' component of Tolman's theory. Recent work examining hippocampal representations occurring at fast time scales suggests that these sequences might be important for supporting the inferential mental operations associated with the cognitive map function. New findings that hippocampal sequences play an important causal role in mediating adaptive behavior on a moment-by-moment basis suggest specific neural processes that may underlie Tolman's cognitive map framework. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. A physical map of the papaya genome with integrated genetic map and genome sequence

    PubMed Central

    2009-01-01

    Background: Papaya is a major fruit crop in tropical and subtropical regions worldwide and has primitive sex chromosomes controlling sex determination in this trioecious species. The papaya genome was recently sequenced because of its agricultural importance, unique biological features, and successful application of transgenic papaya for resistance to papaya ringspot virus. As a part of the genome sequencing project, we constructed a BAC-based physical map using a high information-content fingerprinting approach to assist whole genome shotgun sequence assembly. Results: The physical map consists of 963 contigs, representing 9.4× genome equivalents, and was integrated with the genetic map and genome sequence using BAC end sequences and a sequence-tagged high-density genetic map. The estimated genome coverage of the physical map is about 95.8%, while 72.4% of the genome was aligned to the genetic map. A total of 1,181 high quality overgo (overlapping oligonucleotide) probes representing conserved sequences in Arabidopsis and genetically mapped loci in Brassica were anchored on the physical map, which provides a foundation for comparative genomics in the Brassicales. The integrated genetic and physical map aligned with the genome sequence revealed recombination hotspots as well as regions suppressed for recombination across the genome, particularly on the recently evolved sex chromosomes. Suppression of recombination spread to the adjacent region of the male specific region of the Y chromosome (MSY), and recombination rates were recovered gradually and then exceeded the genome average. Recombination hotspots were observed at about 10 Mb away on both sides of the MSY, showing 7-fold increase compared with the genome wide average, demonstrating the dynamics of recombination of the sex chromosomes. Conclusion: A BAC-based physical map of papaya was constructed and integrated with the genetic map and genome sequence. The integrated map facilitated the draft genome assembly

  3. Mapping DNA methylation with high-throughput nanopore sequencing.

    PubMed

    Rand, Arthur C; Jain, Miten; Eizenga, Jordan M; Musselman-Brown, Audrey; Olsen, Hugh E; Akeson, Mark; Paten, Benedict

    2017-04-01

    DNA chemical modifications regulate genomic function. We present a framework for mapping cytosine and adenosine methylation with the Oxford Nanopore Technologies MinION using this nanopore sequencer's ionic current signal. We map three cytosine variants and two adenine variants. The results show that our model is sensitive enough to detect changes in genomic DNA methylation levels as a function of growth phase in Escherichia coli.

  4. Complete MHC Haplotype Sequencing for Common Disease Gene Mapping

    PubMed Central

    Stewart, C. Andrew; Horton, Roger; Allcock, Richard J.N.; Ashurst, Jennifer L.; Atrazhev, Alexey M.; Coggill, Penny; Dunham, Ian; Forbes, Simon; Halls, Karen; Howson, Joanna M.M.; Humphray, Sean J.; Hunt, Sarah; Mungall, Andrew J.; Osoegawa, Kazutoyo; Palmer, Sophie; Roberts, Anne N.; Rogers, Jane; Sims, Sarah; Wang, Yu; Wilming, Laurens G.; Elliott, John F.; de Jong, Pieter J.; Sawcer, Stephen; Todd, John A.; Trowsdale, John; Beck, Stephan

    2004-01-01

    The future systematic mapping of variants that confer susceptibility to common diseases requires the construction of a fully informative polymorphism map. Ideally, every base pair of the genome would be sequenced in many individuals. Here, we report 4.75 Mb of contiguous sequence for each of two common haplotypes of the major histocompatibility complex (MHC), to which susceptibility to >100 diseases has been mapped. The autoimmune disease-associated-haplotypes HLA-A3-B7-Cw7-DR15 and HLA-A1-B8-Cw7-DR3 were sequenced in their entirety through a bacterial artificial chromosome (BAC) cloning strategy using the consanguineous cell lines PGF and COX, respectively. The two sequences were annotated to encompass all described splice variants of expressed genes. We defined the complete variation content of the two haplotypes, revealing >18,000 variations between them. Average SNP densities ranged from less than one SNP per kilobase to >60. Acquisition of complete and accurate sequence data over polymorphic regions such as the MHC from large-insert cloned DNA provides a definitive resource for the construction of informative genetic maps, and avoids the limitation of chromosome regions that are refractory to PCR amplification. PMID:15140828

  5. Indexing a sequence for mapping reads with a single mismatch

    PubMed Central

    Crochemore, Maxime; Langiu, Alessio; Rahman, M. Sohel

    2014-01-01

    Mapping reads against a genome sequence is an interesting and useful problem in computational molecular biology and bioinformatics. In this paper, we focus on the problem of indexing a sequence for mapping reads with a single mismatch. We first focus on a simpler problem where the length of the pattern is given beforehand during the data structure construction. This version of the problem is interesting in its own right in the context of the next generation sequencing. In the sequel, we show how to solve the more general problem. In both cases, our algorithm can construct an efficient data structure in time and space and can answer subsequent queries in time. Here, n is the length of the sequence, m is the length of the read, 0<ε<1 and is the optimal output size. PMID:24751874
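
    To make the problem statement concrete, the following sketch reports every position where a read matches the sequence with at most one mismatch. It is a deliberately naive quadratic scan with no index, so it is not the data structure proposed in the paper; the function name is an assumption for this example.

```python
def map_with_one_mismatch(text, read):
    """Return (start, mismatches) for every alignment of `read` against
    `text` with Hamming distance 0 or 1. Naive O(n * m) scan."""
    m = len(read)
    hits = []
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[start:start + m], read):
            if a != b:
                mismatches += 1
                if mismatches > 1:
                    break  # too many differences at this position
        if mismatches <= 1:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    print(map_with_one_mismatch("ACGTACGA", "ACGT"))
```

    An index of the kind the abstract describes exists to avoid rescanning the whole sequence for every read.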

  6. Mapping DNA polymerase errors by single-molecule sequencing

    SciTech Connect

    Lee, David F.; Lu, Jenny; Chang, Seungwoo; Loparo, Joseph J.; Xie, Xiaoliang S.

    2016-05-16

    Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replication product is tagged with a unique nucleotide sequence before amplification. Here, this allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases.
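
    The barcoding idea can be sketched as a per-barcode majority vote: reads that share a tag come from the same replication product, so a base seen in only a minority of them is more likely a sequencing error than a polymerase error. The sketch assumes the reads in a group are already aligned and of equal length; names and data are illustrative, not the authors' pipeline.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(tagged_reads):
    """Collapse (barcode, sequence) pairs into one consensus per barcode.

    Assumes reads sharing a barcode are pre-aligned; columns are compared
    position by position, and zip() stops at the shortest read.
    """
    groups = defaultdict(list)
    for barcode, seq in tagged_reads:
        groups[barcode].append(seq)

    consensus = {}
    for barcode, seqs in groups.items():
        # Majority base in each column of the group.
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


if __name__ == "__main__":
    reads = [
        ("tag1", "ACGT"),
        ("tag1", "ACGT"),
        ("tag1", "ACTT"),  # lone disagreement, treated as sequencing error
        ("tag2", "GGGG"),
    ]
    print(consensus_by_barcode(reads))
```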

  7. Automatic identification of highly conserved family regions and relationships in genome wide datasets including remote protein sequences.

    PubMed

    Doğan, Tunca; Karaçalı, Bilge

    2013-01-01

    Identifying shared sequence segments along amino acid sequences generally requires a collection of closely related proteins, most often curated manually from the sequence datasets to suit the purpose at hand. Currently developed statistical methods are strained, however, when the collection contains remote sequences with poor alignment to the rest, or sequences containing multiple domains. In this paper, we propose a completely unsupervised and automated method to identify the shared sequence segments observed in a diverse collection of protein sequences including those present in a smaller fraction of the sequences in the collection, using a combination of sequence alignment, residue conservation scoring and graph-theoretical approaches. Since shared sequence fragments often imply conserved functional or structural attributes, the method produces a table of associations between the sequences and the identified conserved regions that can reveal previously unknown protein families as well as new members to existing ones. We evaluated the biological relevance of the method by clustering the proteins in gold standard datasets and assessing the clustering performance in comparison with previous methods from the literature. We have then applied the proposed method to a genome wide dataset of 17793 human proteins and generated a global association map to each of the 4753 identified conserved regions. Investigations on the major conserved regions revealed that they corresponded strongly to annotated structural domains. This suggests that the method can be useful in predicting novel domains on protein sequences.

  8. Sniper: improved SNP discovery by multiply mapping deep sequenced reads.

    PubMed

    Simola, Daniel F; Kim, Junhyong

    2011-06-20

    SNP (single nucleotide polymorphism) discovery using next-generation sequencing data remains difficult primarily because of redundant genomic regions, such as interspersed repetitive elements and paralogous genes, present in all eukaryotic genomes. To address this problem, we developed Sniper, a novel multi-locus Bayesian probabilistic model and a computationally efficient algorithm that explicitly incorporates sequence reads that map to multiple genomic loci. Our model fully accounts for sequencing error, template bias, and multi-locus SNP combinations, maintaining high sensitivity and specificity under a broad range of conditions. An implementation of Sniper is freely available at http://kim.bio.upenn.edu/software/sniper.shtml.

  9. EST2Prot: Mapping EST sequences to proteins

    PubMed Central

    Shafer, Paul; Lin, David M; Yona, Golan

    2006-01-01

    Background: EST libraries are used in various biological studies, from microarray experiments to proteomic and genetic screens. These libraries usually contain many uncharacterized ESTs that are typically ignored since they cannot be mapped to known genes. Consequently, new discoveries are possibly overlooked. Results: We describe a system (EST2Prot) that uses multiple elements to map EST sequences to their corresponding protein products. EST2Prot uses UniGene clusters, substring analysis, information about protein coding regions in existing DNA sequences and protein database searches to detect protein products related to a query EST sequence. Gene Ontology terms, Swiss-Prot keywords, and protein similarity data are used to map the ESTs to functional descriptors. Conclusion: EST2Prot extends and significantly enriches the popular UniGene mapping by utilizing multiple relations between known biological entities. It produces a mapping between ESTs and proteins in real-time through a simple web-interface. The system is part of the Biozon database and is accessible at . PMID:16515706

  10. Sequence of the WT1 upstream region including the Wit-1 gene

    SciTech Connect

    Gessler, M.; Bruns, G.A.P.

    1993-08-01

    The Wilms tumor gene WT1 encodes a Cys2His2-type zinc finger protein that can bind DNA and function as a transcriptional regulator. The pathological spectrum of tumorigenesis and various developmental defects produced by different WT1 alterations suggests that WT1 controls a number of subsequent effector genes. To define the role of WT1 in these developmental processes it will be important to elucidate mechanisms that govern expression of WT1 itself. To facilitate mapping of the WT1 promoter region and 5′ control elements the authors have determined the sequence upstream of the WT1 transcription unit. This includes the Wit-1 gene that is transcribed in the opposite direction. 11 refs., 3 figs.

  11. Evolutionary optimization of biopolymers and sequence structure maps

    SciTech Connect

    Reidys, C.M.; Kopp, S.; Schuster, P.

    1996-06-01

    Searching for biopolymers having a predefined function is a core problem of biotechnology, biochemistry and pharmacy. On the level of RNA sequences and their corresponding secondary structures we show that this problem can be analyzed mathematically. The strategy will be to study the properties of the RNA sequence to secondary structure mapping that is essential for the understanding of the search process. We show that to each secondary structure s there exists a neutral network consisting of all sequences folding into s. This network can be modeled as a random graph and has the following generic properties: it is dense and has a giant component within the graph of compatible sequences. The neutral network percolates sequence space and any two neutral nets come close in terms of Hamming distance. We investigate the distribution of the orders of neutral nets and show that above a certain threshold the topology of neutral nets allows one to find practically all frequent secondary structures.

  12. Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing.

    PubMed

    Misra, Sanchit; Agrawal, Ankit; Liao, Wei-keng; Choudhary, Alok

    2011-01-15

    Recently, a number of programs have been proposed for mapping short reads to a reference genome. Many of them are heavily optimized for short-read mapping and hence are very efficient for shorter queries, but that makes them inefficient or not applicable for reads longer than 200 bp. However, many sequencers are already generating longer reads and more are expected to follow. For long read sequence mapping, there are limited options; BLAT, SSAHA2, FANGS and BWA-SW are among the popular ones. However, resequencing and personalized medicine need much faster software to map these long sequencing reads to a reference genome to identify SNPs or rare transcripts. We present AGILE (AliGnIng Long rEads), a hash table based high-throughput sequence mapping algorithm for longer 454 reads that uses diagonal multiple seed-match criteria, customized q-gram filtering and a dynamic incremental search approach among other heuristics to optimize every step of the mapping process. In our experiments, we observe that AGILE is more accurate than BLAT, and comparable to BWA-SW and SSAHA2. For practical error rates (< 5%) and read lengths (200-1000 bp), AGILE is significantly faster than BLAT, SSAHA2 and BWA-SW. Even for the other cases, AGILE is comparable to BWA-SW and several times faster than BLAT and SSAHA2. http://www.ece.northwestern.edu/~smi539/agile.html.
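
    A hash-table seed-and-vote scheme of the general kind described here can be sketched in a few lines. The reference is indexed by its q-grams, and each q-gram of the read votes for the diagonal (reference position minus read position) on which it matches; diagonals with several votes are candidate mapping locations. This is a generic illustration with hypothetical function names, not AGILE's actual filtering or search heuristics.

```python
from collections import Counter, defaultdict


def build_index(reference, q):
    """Hash table from each q-gram to its start positions in the reference."""
    index = defaultdict(list)
    for i in range(len(reference) - q + 1):
        index[reference[i:i + q]].append(i)
    return index


def candidate_diagonals(read, index, q, min_seeds=2):
    """Count seed matches per diagonal and keep well-supported ones.

    Returns (diagonal, seed_count) pairs, most supported first.
    """
    votes = Counter()
    for j in range(len(read) - q + 1):
        for i in index.get(read[j:j + q], ()):
            votes[i - j] += 1
    return [(d, n) for d, n in votes.most_common() if n >= min_seeds]


if __name__ == "__main__":
    reference = "GGGGGACGTTAGCCAAAAA"
    read = "ACGTTAGCC"
    print(candidate_diagonals(read, build_index(reference, 4), 4))
```

    Requiring multiple seeds on one diagonal is what lets such mappers tolerate sequencing errors: an error destroys a few seeds but rarely all of them.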

  13. Fractal MapReduce decomposition of sequence alignment

    PubMed Central

    2012-01-01

    Background: The dramatic fall in the cost of genomic sequencing, and the increasing convenience of distributed cloud computing resources, positions the MapReduce coding pattern as a cornerstone of scalable bioinformatics algorithm development. In some cases an algorithm will find a natural distribution via use of map functions to process vectorized components, followed by a reduce of aggregate intermediate results. However, for some data analysis procedures such as sequence analysis, a more fundamental reformulation may be required. Results: In this report we describe a solution to sequence comparison that can be thoroughly decomposed into multiple rounds of map and reduce operations. The route taken makes use of iterated maps, a fractal analysis technique, that has been found to provide an "alignment-free" solution to sequence analysis and comparison. That is, a solution that does not require dynamic programming, relying on a numeric Chaos Game Representation (CGR) data structure. This claim is demonstrated in this report by calculating the length of the longest similar segment by inspecting only the USM coordinates of two analogous units: with no resort to dynamic programming. Conclusions: The procedure described is an attempt at extreme decomposition and parallelization of sequence alignment in anticipation of a volume of genomic sequence data that cannot be met by current algorithmic frameworks. The solution found is delivered with a browser-based application (webApp), highlighting the browser's emergence as an environment for high performance distributed computing. Availability: Public distribution of accompanying software library with open source and version control at http://usm.github.com. Also available as a webApp through Google Chrome's WebStore http://chrome.google.com/webstore: search with "usm". PMID:22551205
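
    For readers unfamiliar with the Chaos Game Representation, the sketch below computes the classic CGR coordinates of a DNA string: starting from the centre of the unit square, each base moves the point halfway toward that base's corner. The corner assignment used here is one common convention and is an assumption of this example; the USM library and its MapReduce decomposition are not reproduced.

```python
# One common corner convention for the unit square (assumed here).
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr(seq):
    """Return the list of CGR points visited while reading `seq`."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        # Iterated map: move halfway toward the corner of this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    print(cgr("ACGT"))
```

    Because each step halves the distance to a corner, sequences that share a long suffix end at nearby points, which is the property that makes alignment-free comparison possible.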

  14. Fast and sensitive mapping of nanopore sequencing reads with GraphMap

    PubMed Central

    Sović, Ivan; Šikić, Mile; Wilm, Andreas; Fenlon, Shannon Nicole; Chen, Swaine; Nagarajan, Niranjan

    2016-01-01

    Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads, which progressively refines candidate alignments to robustly handle potentially high-error rates and a fast graph traversal to align long reads with speed and high precision (>95%). Evaluation on MinION sequencing data sets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by 10–80% and maps >95% of bases. GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100 bp to 4 kbp, and species and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap. PMID:27079541

  15. DNA methylation mapping by tag-modified bisulfite genomic sequencing.

    PubMed

    Han, Weiguo; Cauchi, Stephane; Herman, James G; Spivack, Simon D

    2006-08-01

    A tag-modified bisulfite genomic sequencing (tBGS) method employing direct cycle sequencing of polymerase chain reaction (PCR) products at kilobase scale, without conventional DNA fragment cloning, was developed for simplified evaluation of DNA methylation sites. The method entails subjecting bisulfite-modified genomic DNA to a second-round PCR amplification employing GC-tagged primers. Qualitative results from tBGS closely correlated with those from conventional BGS (R=0.935, p=0.002). In application, the intertissue and interindividual CpG methylation differences in promoter sequence for two genes, CYP1B1 and GSTP1, were then explored across four human tissue types (peripheral blood cells, exfoliated buccal cells, paired nontumor-tumor lung tissues), and two lung cell types in culture (normal NHBE and malignant A549). Predominantly conserved methylation maps for the two gene promoters were apparent across donors and tissues. At any given CpG site, variation in the degree of methylation could be determined by the relative height of C and T peaks in the sequencing trace. Methylation maps for the GSTP1 promoter diverged between NHBE (unmethylated) and A549 (completely methylated) cells in a previously unexplored upstream region, correlating with a 2.7-fold difference in GSTP1 mRNA expression (p<0.01). The tBGS method simplifies detailed methylation scanning of kilobase-scale genomic DNA, facilitating more ambitious genomic methylation mapping studies.
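
    The peak-height comparison mentioned above reduces to a simple ratio: after bisulfite conversion an unmethylated cytosine is read as T while a methylated one is still read as C, so the C share of the combined signal estimates the degree of methylation at a CpG site. The function name and the peak values below are hypothetical, and the sketch ignores trace normalisation and incomplete conversion.

```python
def methylation_fraction(c_peak, t_peak):
    """Estimate the methylated fraction at one CpG site from the heights
    of the C and T peaks in a bisulfite sequencing trace."""
    total = c_peak + t_peak
    if total <= 0:
        raise ValueError("no signal at this position")
    return c_peak / total


if __name__ == "__main__":
    # Hypothetical peak heights in arbitrary trace units.
    print(methylation_fraction(30.0, 10.0))
```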

  16. A probe-based mapping strategy for DNA sequencing with mobile primers

    SciTech Connect

    Strausbaugh, L.D.; Berg, C.M.

    1991-01-01

    Research on DNA sequencing continued. The specific areas of research targeted for the period of this Progress Report included three general phases: (1) optimization of probe-mapping by both the development of new transposons and the design of stream-lined methods for mapping; (2) application of transposon-based methods to larger plasmids and cosmids; and (3) initiation of PCR-based applications of transposons.

  17. A probe-based mapping strategy for DNA sequencing with mobile primers. Progress report

    SciTech Connect

    Strausbaugh, L.D.; Berg, C.M.

    1991-12-31

    Research on DNA sequencing continued. The specific areas of research targeted for the period of this Progress Report included three general phases: (1) optimization of probe-mapping by both the development of new transposons and the design of stream-lined methods for mapping; (2) application of transposon-based methods to larger plasmids and cosmids; and (3) initiation of PCR-based applications of transposons.

  18. DNA sequence mapping by fluorescence in situ hybridization

    SciTech Connect

    Brandriff, B.F.; Gordon, L.A.; Trask, B.J.

    1991-01-01

    Various types of DNA probes, such as total genomic DNA, repetitive sequences, unique sequences, and composites of chromosome-specific DNA probes, can be used with fluorescence in situ hybridization (FISH) techniques to address research questions having to do with localization, mapping, and distribution of DNA in situ. FISH involves the formation of a heteroduplex between such DNA probes and chromatin targets on a microscope slide, which can be visualized with fluorescent reporter molecules. Three chromatin targets - metaphase chromosomes, somatic interphases, and zygote interphases - offer increasingly extended states of chromatin which can be strategically selected, individually or in combination, to address specific research questions of interest.

  19. Simple sequence repeat marker development and genetic mapping in quinoa (Chenopodium quinoa Willd.).

    PubMed

    Jarvis, D E; Kopp, O R; Jellen, E N; Mallory, M A; Pattee, J; Bonifacio, A; Coleman, C E; Stevens, M R; Fairbanks, D J; Maughan, P J

    2008-04-01

    Quinoa is a regionally important grain crop in the Andean region of South America. Recently quinoa has gained international attention for its high nutritional value and tolerances of extreme abiotic stresses. DNA markers and linkage maps are important tools for germplasm conservation and crop improvement programmes. Here we report the development of 216 new polymorphic SSR (simple sequence repeat) markers from libraries enriched for GA, CAA and AAT repeats, as well as 6 SSR markers developed from bacterial artificial chromosome-end sequences (BES-SSRs). Heterozygosity (H) values of the SSR markers range from 0.12 to 0.90, with an average value of 0.57. A linkage map was constructed for a newly developed recombinant inbred line (RIL) population using these SSR markers. Additional markers, including amplified fragment length polymorphisms (AFLPs), two 11S seed storage protein loci, and the nucleolar organizing region (NOR), were also placed on the linkage map. The linkage map presented here is the first SSR-based map in quinoa and contains 275 markers, including 200 SSRs. The map consists of 38 linkage groups (LGs) covering 913 cM. Segregation distortion was observed in the mapping population for several marker loci, indicating possible chromosomal regions associated with selection or gametophytic lethality. As this map is based primarily on simple and easily transferable SSR markers, it will be particularly valuable for research in laboratories in Andean regions of South America.
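
The abstract reports heterozygosity (H) values per marker without saying how they were computed. A common definition, assumed here, is expected heterozygosity, H = 1 - sum(p_i^2), where p_i is the frequency of allele i at the marker. The sketch below applies that definition to a hypothetical list of SSR allele sizes; the function name and data are illustrative only.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Return 1 - sum(p_i^2) over the allele frequencies observed at one marker."""
    counts = Counter(alleles)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no alleles observed")
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical SSR allele sizes (bp) scored across four chromosomes.
h = expected_heterozygosity(["150", "150", "154", "158"])  # frequencies 0.5, 0.25, 0.25
```

A marker with a single allele gives H = 0, and H approaches 1 as the number of equally frequent alleles grows.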

  20. A fine-scale chimpanzee genetic map from population sequencing.

    PubMed

    Auton, Adam; Fledel-Alon, Adi; Pfeifer, Susanne; Venn, Oliver; Ségurel, Laure; Street, Teresa; Leffler, Ellen M; Bowden, Rory; Aneas, Ivy; Broxholme, John; Humburg, Peter; Iqbal, Zamin; Lunter, Gerton; Maller, Julian; Hernandez, Ryan D; Melton, Cord; Venkat, Aarti; Nobrega, Marcelo A; Bontrop, Ronald; Myers, Simon; Donnelly, Peter; Przeworski, Molly; McVean, Gil

    2012-04-13

    To study the evolution of recombination rates in apes, we developed methodology to construct a fine-scale genetic map from high-throughput sequence data from 10 Western chimpanzees, Pan troglodytes verus. Compared to the human genetic map, broad-scale recombination rates tend to be conserved, but with exceptions, particularly in regions of chromosomal rearrangements and around the site of ancestral fusion in human chromosome 2. At fine scales, chimpanzee recombination is dominated by hotspots, which show no overlap with those of humans even though rates are similarly elevated around CpG islands and decreased within genes. The hotspot-specifying protein PRDM9 shows extensive variation among Western chimpanzees, and there is little evidence that any sequence motifs are enriched in hotspots. The contrasting locations of hotspots provide a natural experiment, which demonstrates the impact of recombination on base composition.

  1. MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping.

    PubMed

    Lee, Wan-Ping; Stromberg, Michael P; Ward, Alistair; Stewart, Chip; Garrison, Erik P; Marth, Gabor T

    2014-01-01

    MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK assigned and actual mapping qualities greater than 0.98. In order to ensure that studies of any genome are supported, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
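
The Smith-Waterman step mentioned in this abstract is a standard local-alignment recurrence. The sketch below computes only the best local alignment score with a linear gap penalty; the scoring values are assumptions chosen for illustration, and neither MOSAIK's hash clustering nor its actual scoring scheme is reproduced.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


read = "ACGT"
reference_window = "TTACGTTT"
score = smith_waterman_score(read, reference_window)
```

Keeping only two rows bounds memory by the length of the second sequence, which is enough when the score alone is needed; recovering the alignment itself would require the full matrix or a traceback structure.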

  2. DOE project on genome mapping and sequencing. Progress report, 1992

    SciTech Connect

    Evans, G.A.

    1992-12-31

    These efforts on the human genome project were initiated in September, 1990, to contribute towards completion of the human genome project physical mapping effort. In the original application, the authors proposed a novel strategy for constructing a physical map of human chromosome 11, based upon techniques derived in this group and by others. The original goals were to (1) produce a set of cosmid reference clones mapped to specific sites by high resolution fluorescence in situ hybridization, (2) produce a set of associated STS sequences and PCR primers for each site, (3) isolate YAC clones corresponding to each STS, and (4) construct YAC contigs such that > 90% of the chromosome would be covered by contigs of 2 Mb or greater. Since that time, and with the advent of new technology and reagents, the strategy has been modified slightly but still retains the same goals as originally proposed. The authors have added a project to produce chromosome 11-specific cDNAs and determine the map location and DNA sequence of a selected portion of them.

  3. Genetic Linkage Maps of the Red Flour Beetle, Tribolium castaneum, Based on Bacterial Artificial Chromosomes and Expressed Sequence Tags

    PubMed Central

    Lorenzen, Marcé D.; Doyungan, Zaldy; Savard, Joel; Snow, Kathy; Crumly, Lindsey R.; Shippy, Teresa D.; Stuart, Jeffrey J.; Brown, Susan J.; Beeman, Richard W.

    2005-01-01

    A genetic linkage map was constructed in a backcross family of the red flour beetle, Tribolium castaneum, based largely on sequences from bacterial artificial chromosome (BAC) ends and untranslated regions from random cDNAs. In most cases, dimorphisms were detected using heteroduplex or single-strand conformational polymorphism analysis after specific PCR amplification. The map incorporates a total of 424 markers, including 190 BACs and 165 cDNAs, as well as 69 genes, transposon insertion sites, sequence-tagged sites, microsatellites, and amplified fragment-length polymorphisms. Mapped loci are distributed along 571 cM, spanning all 10 linkage groups at an average marker separation of 1.3 cM. This genetic map provides a framework for positional cloning and a scaffold for integration of the emerging physical map and genome sequence assembly. The map and corresponding sequences can be accessed through BeetleBase (http://www.bioinformatics.ksu.edu/BeetleBase/). PMID:15834150

  4. Mapping DNA polymerase errors by single-molecule sequencing

    PubMed Central

    Lee, David F.; Lu, Jenny; Chang, Seungwoo; Loparo, Joseph J.; Xie, Xiaoliang S.

    2016-01-01

    Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replication product is tagged with a unique nucleotide sequence before amplification. This allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases. PMID:27185891
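
The barcoding idea in this abstract can be sketched as a per-barcode majority vote. This is a simplified illustration, not the authors' pipeline: it assumes reads sharing a barcode are already trimmed to the same length and aligned position by position, and the function names, the `min_reads` threshold, and the example reads are all hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_reads=3):
    """Group (barcode, sequence) pairs and take a per-position majority vote."""
    groups = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)
    consensus = {}
    for barcode, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few reads to separate sequencing errors from real ones
        if any(len(s) != len(seqs[0]) for s in seqs):
            raise ValueError(f"reads for barcode {barcode} differ in length")
        # Ties are broken by first occurrence, a simplification of this sketch.
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


def mismatches_to_template(consensus, template):
    """List (position, template_base, observed_base) for each consensus sequence."""
    return {
        barcode: [(i, t, c) for i, (t, c) in enumerate(zip(template, seq)) if t != c]
        for barcode, seq in consensus.items()
    }


reads = [
    ("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"),  # one read has a sequencing error
    ("GGC", "ACCT"), ("GGC", "ACCT"), ("GGC", "ACCT"),  # change present in every read
    ("TTA", "ACGT"),                                    # single read, discarded
]
consensus = consensus_by_barcode(reads)
errors = mismatches_to_template(consensus, "ACGT")
```

A difference seen in only one read of a barcode group is voted out as a sequencing error, while a difference shared by all reads of a group survives and is attributed to the replication product itself.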

  5. Mapping DNA polymerase errors by single-molecule sequencing

    DOE PAGES

    Lee, David F.; Lu, Jenny; Chang, Seungwoo; ...

    2016-05-16

    Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replication product is tagged with a unique nucleotide sequence before amplification. Here, this allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases.

  6. BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data

    PubMed Central

    Ji, Yuan; Xu, Yanxun; Zhang, Qiong; Tsui, Kam-Wah; Yuan, Yuan; Norris, Clift; Liang, Shoudan; Liang, Han

    2011-01-01

    Summary: Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information for various aspects of cellular activities and biological functions. A key step in NGS applications (e.g., RNA-Seq) is to map short reads to correct genomic locations within the source genome. While most reads are mapped to a unique location, a significant proportion of reads align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping the multireads may lead to bias in downstream analyses. Currently, most practitioners discard the multireads in their analysis, resulting in a loss of valuable information, especially for the genes with similar sequences. To refine the read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. The probabilities are used for downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA-Seq analysis of real life data that the Bayesian method yields better mapping than the current leading methods. We provide a C++ program for download that is being packaged into user-friendly software. PMID:21517792
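
The idea of assigning a posterior probability to each candidate location of a multiread can be illustrated with plain Bayes' rule. The sketch below is not the BM-Map model: it assumes a single independent per-base error rate, so the likelihood of a location with m mismatches over a read of length L is e^m (1 - e)^(L - m), and it uses uniform priors unless others are given. All names and numbers are hypothetical.

```python
def multiread_posteriors(mismatches, read_length, error_rate=0.01, priors=None):
    """Posterior probability of each candidate location for one multiread."""
    if not mismatches:
        raise ValueError("at least one candidate location is required")
    if priors is None:
        priors = [1.0 / len(mismatches)] * len(mismatches)
    # Unnormalized posterior: prior times likelihood of the observed mismatches.
    weights = [
        p * (error_rate ** m) * ((1.0 - error_rate) ** (read_length - m))
        for p, m in zip(priors, mismatches)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# A hypothetical 50-base read aligning to three locations with 1, 1 and 2 mismatches.
posteriors = multiread_posteriors([1, 1, 2], read_length=50)
```

Instead of discarding the read, a downstream step could credit each location with its posterior weight, for example when accumulating expression counts.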

  7. Sequence, molecular properties, and chromosomal mapping of mouse lumican

    NASA Technical Reports Server (NTRS)

    Funderburgh, J. L.; Funderburgh, M. L.; Hevelone, N. D.; Stech, M. E.; Justice, M. J.; Liu, C. Y.; Kao, W. W.; Conrad, G. W.; Spooner, B. S. (Principal Investigator)

    1995-01-01

    PURPOSE. Lumican is a major proteoglycan of vertebrate cornea. This study characterizes mouse lumican, its molecular form, cDNA sequence, and chromosomal localization. METHODS. Lumican sequence was determined from cDNA clones selected from a mouse corneal cDNA expression library using a bovine lumican cDNA probe. Tissue expression and size of lumican mRNA were determined using Northern hybridization. Glycosidase digestion followed by Western blot analysis provided characterization of molecular properties of purified mouse corneal lumican. Chromosomal mapping of the lumican gene (Lcn) used Southern hybridization of a panel of genomic DNAs from an interspecific murine backcross. RESULTS. Mouse lumican is a 338-amino acid protein with high-sequence identity to bovine and chicken lumican proteins. The N-terminus of the lumican protein contains consensus sequences for tyrosine sulfation. A 1.9-kb lumican mRNA is present in cornea and several other tissues. Antibody against bovine lumican reacted with recombinant mouse lumican expressed in Escherichia coli and also detected high molecular weight proteoglycans in extracts of mouse cornea. Keratanase digestion of corneal proteoglycans released lumican protein, demonstrating the presence of sulfated keratan sulfate chains on mouse corneal lumican in vivo. The lumican gene (Lcn) was mapped to the distal region of mouse chromosome 10. The Lcn map site is in the region of a previously identified developmental mutant, eye blebs, affecting corneal morphology. CONCLUSIONS. This study demonstrates sulfated keratan sulfate proteoglycan in mouse cornea and describes the tools (antibodies and cDNA) necessary to investigate the functional role of this important corneal molecule using naturally occurring and induced mutants of the murine lumican gene.

  9. Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm

    PubMed Central

    Glunčić, Matko; Paar, Vladimir

    2013-01-01

    The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. The efficacy is due to the use of a complete K-string ensemble, which enables a new method of direct mapping of symbolic DNA sequence into frequency domain, with straightforward identification of repeats as peaks in the GRM diagram. In this way, we obtain a very fast, efficient and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use, in order to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of Period 3 pattern in exons, implementation of ‘magnifying glass’ effect, identification of complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences and identification of long segmental duplications. The GRM algorithm is convenient for use, in particular, in cases of large repeat units, of highly mutated and/or complex repeats, and of global repeat maps for large genomic sequences (chromosomes and genomes). PMID:22977183
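
The abstract does not give the details of the GRM computation, so the following is only a loosely related illustration of how repeat periods can show up as peaks when a sequence is summarized by its K-strings. It is not the GRM algorithm: it simply histograms the distance between consecutive occurrences of each K-string, and the function name, the value of k, and the test sequence are assumptions.

```python
from collections import Counter


def kmer_distance_spectrum(seq: str, k: int = 4) -> Counter:
    """Count distances between consecutive occurrences of every k-mer in seq."""
    last_seen = {}        # k-mer -> start position of its most recent occurrence
    spectrum = Counter()  # distance -> number of times it was observed
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            spectrum[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return spectrum


# A hypothetical tandem repeat: an 8-base unit copied five times.
sequence = "ACGTTGCA" * 5
spectrum = kmer_distance_spectrum(sequence, k=4)
dominant_period = spectrum.most_common(1)[0][0]
```

For a tandem repeat the histogram concentrates on the repeat unit length; mutated or dispersed repeats would spread the counts, which is the kind of situation the published method is reported to handle.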

  10. Mapping by sequencing the Pneumocystis genome using the ordering DNA sequences V3 tool.

    PubMed Central

    Xu, Zheng; Lance, Britton; Vargas, Claudia; Arpinar, Budak; Bhandarkar, Suchendra; Kraemer, Eileen; Kochut, Krys J; Miller, John A; Wagner, Jeff R; Weise, Michael J; Wunderlich, John K; Stringer, James; Smulian, George; Cushion, Melanie T; Arnold, Jonathan

    2003-01-01

    A bioinformatics tool called ODS3 has been created for mapping by sequencing. The tool allows the creation of integrated genomic maps from genetic, physical mapping, and sequencing data and permits an integrated genome map to be stored, retrieved, viewed, and queried in a stand-alone capacity, in a client/server relationship with the Fungal Genome Database (FGDB), and as a web-browsing tool for the FGDB. In that ODS3 is programmed in Java, the tool promotes platform independence and supports export of integrated genome-mapping data in the extensible markup language (XML) for data interchange with other genome information systems. The tool ODS3 is used to create an initial integrated genome map of the AIDS-related fungal pathogen, Pneumocystis carinii. Contig dynamics would indicate that this physical map is approximately 50% complete with approximately 200 contigs. A total of 10 putative multigene families were found. Two of these putative families were previously characterized in P. carinii, namely the major surface glycoproteins (MSGs) and HSP70 proteins; three of these putative families (not previously characterized in P. carinii) were found to be similar to families encoding the HSP60 in Schizosaccharomyces pombe, the heat-shock psi protein in S. pombe, and the RNA synthetase family (i.e., MES1) in Saccharomyces cerevisiae. Physical mapping data are consistent with the 16S, 5.8S, and 26S rDNA genes being single copy in P. carinii. No other fungus outside this genus is known to have the rDNA genes in single copy. PMID:12702676

  11. Random-breakage mapping method applied to human DNA sequences

    NASA Technical Reports Server (NTRS)

    Lobrich, M.; Rydberg, B.; Cooper, P. K.; Chatterjee, A. (Principal Investigator)

    1996-01-01

    The random-breakage mapping method [Game et al. (1990) Nucleic Acids Res., 18, 4453-4461] was applied to DNA sequences in human fibroblasts. The methodology involves NotI restriction endonuclease digestion of DNA from irradiated cells, followed by pulsed-field gel electrophoresis, Southern blotting and hybridization with DNA probes recognizing the single copy sequences of interest. The Southern blots show a band for the unbroken restriction fragments and a smear below this band due to radiation-induced random breaks. This smear pattern contains two discontinuities in intensity at positions that correspond to the distance of the hybridization site to each end of the restriction fragment. By analyzing the positions of those discontinuities we confirmed the previously mapped position of the probe DXS1327 within a NotI fragment on the X chromosome, thus demonstrating the validity of the technique. We were also able to position the probes D21S1 and D21S15 with respect to the ends of their corresponding NotI fragments on chromosome 21. A third chromosome 21 probe, D21S11, has previously been reported to be close to D21S1, although an uncertainty about a second possible location existed. Since both probes D21S1 and D21S11 hybridized to a single NotI fragment and yielded a similar smear pattern, this uncertainty is removed by the random-breakage mapping method.

  13. Rational experiment design for sequencing-based RNA structure mapping.

    PubMed

    Aviran, Sharon; Pachter, Lior

    2014-12-01

    Structure mapping is a classic experimental approach for determining nucleic acid structure that has gained renewed interest in recent years following advances in chemistry, genomics, and informatics. The approach encompasses numerous techniques that use different means to introduce nucleotide-level modifications in a structure-dependent manner. Modifications are assayed via cDNA fragment analysis, using electrophoresis or next-generation sequencing (NGS). The recent advent of NGS has dramatically increased the throughput, multiplexing capacity, and scope of RNA structure mapping assays, thereby opening new possibilities for genome-scale, de novo, and in vivo studies. From an informatics standpoint, NGS is more informative than prior technologies by virtue of delivering direct molecular measurements in the form of digital sequence counts. Motivated by these new capabilities, we introduce a novel model-based in silico approach for quantitative design of large-scale multiplexed NGS structure mapping assays, which takes advantage of the direct and digital nature of NGS readouts. We use it to characterize the relationship between controllable experimental parameters and the precision of mapping measurements. Our results highlight the complexity of these dependencies and shed light on relevant tradeoffs and pitfalls, which can be difficult to discern by intuition alone. We demonstrate our approach by quantitatively assessing the robustness of SHAPE-Seq measurements, obtained by multiplexing SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) chemistry in conjunction with NGS. We then utilize it to elucidate design considerations in advanced genome-wide approaches for probing the transcriptome, which recently obtained in vivo information using dimethyl sulfate (DMS) chemistry.

  14. MAP Estimation of Chin and Cheek Contours in Video Sequences

    NASA Astrophysics Data System (ADS)

    Kampmann, Markus

    2004-12-01

    An algorithm for the estimation of chin and cheek contours in video sequences is proposed. This algorithm exploits a priori knowledge about shape and position of chin and cheek contours in images. Exploiting knowledge about the shape, a parametric 2D model representing chin and cheek contours is introduced. Exploiting knowledge about the position, a MAP estimator is developed taking into account the observed luminance gradient as well as a priori probabilities of chin and cheek contours positions. The proposed algorithm was tested with head and shoulder video sequences (image resolution CIF). In nearly 70% of all investigated video frames, a subjectively error free estimation could be achieved. The 2D estimate error is measured as on average between 2.4 and [InlineEquation not available: see fulltext.].

  15. BS-RNA: An efficient mapping and annotation tool for RNA bisulfite sequencing data.

    PubMed

    Liang, Fang; Hao, Lili; Wang, Jinyue; Shi, Shuo; Xiao, Jingfa; Li, Rujiao

    2016-12-01

    Cytosine methylation is one of the most important RNA epigenetic modifications. With the development of experimental technology, scientists attach more importance to RNA cytosine methylation and find that bisulfite sequencing is an effective experimental method for studying it. However, only a few tools can deal directly and efficiently with RNA bisulfite sequencing data. Herein, we developed a specialized tool, BS-RNA, which can analyze cytosine methylation of RNA based on bisulfite sequencing data and supports both paired-end and single-end sequencing reads from directional bisulfite libraries. For paired-end reads, simply removing the biased positions from the 5' end may result in "dovetailing" reads, where one or both reads seem to extend past the start of the mate read. BS-RNA can map "dovetailing" reads successfully. The annotation result of BS-RNA is exported in BED (.bed) format, including locations, sequence context types (CG/CHG/CHH, H=A,T, or C), reference sequencing depths, cytosine sequencing depths, and methylation levels of covered cytosine sites on both Watson and Crick strands. BS-RNA is an efficient, specialized and highly automated mapping and annotation tool for RNA bisulfite sequencing data. It performs better than the existing program in terms of accuracy and efficiency. BS-RNA is written in Perl, and the source code is freely available from the website: http://bs-rna.big.ac.cn. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
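
The sequence context types listed in this abstract (CG/CHG/CHH, with H standing for A, T, or C) can be assigned from the two bases that follow a cytosine. The sketch below shows that classification on a reference string; it is written in Python for illustration and is not taken from BS-RNA, whose source is Perl. The function name and the example sequence are hypothetical.

```python
H_BASES = {"A", "C", "T"}  # IUPAC code H: any base except G


def cytosine_context(seq: str, pos: int):
    """Return 'CG', 'CHG' or 'CHH' for the cytosine at seq[pos], or None if undetermined."""
    if seq[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    downstream = seq[pos + 1:pos + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2:
        return None  # too close to the end of the sequence to classify
    if downstream[0] in H_BASES and downstream[1] == "G":
        return "CHG"
    if downstream[0] in H_BASES and downstream[1] in H_BASES:
        return "CHH"
    return None  # ambiguous bases such as N


reference = "ACGCAGCTTC"
contexts = {i: cytosine_context(reference, i) for i, base in enumerate(reference) if base == "C"}
```

Cytosines on the opposite strand would be classified the same way on the reverse complement of the reference.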

  16. Mapping and sequencing of structural variation from eight human genomes

    PubMed Central

    Kidd, Jeffrey M.; Cooper, Gregory M.; Donahue, William F.; Hayden, Hillary S.; Sampas, Nick; Graves, Tina; Hansen, Nancy; Teague, Brian; Alkan, Can; Antonacci, Francesca; Haugen, Eric; Zerr, Troy; Yamada, N. Alice; Tsang, Peter; Newman, Tera L.; Tüzün, Eray; Cheng, Ze; Ebling, Heather M.; Tusneem, Nadeem; David, Robert; Gillett, Will; Phelps, Karen A.; Weaver, Molly; Saranga, David; Brand, Adrianne; Tao, Wei; Gustafson, Erik; McKernan, Kevin; Chen, Lin; Malig, Maika; Smith, Joshua D.; Korn, Joshua M.; McCarroll, Steven A.; Altshuler, David A.; Peiffer, Daniel A.; Dorschner, Michael; Stamatoyannopoulos, John; Schwartz, David; Nickerson, Deborah A.; Mullikin, James C.; Wilson, Richard K.; Bruhn, Laurakay; Olson, Maynard V.; Kaul, Rajinder; Smith, Douglas R.; Eichler, Evan E.

    2008-01-01

    Genetic variation among individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single nucleotide changes. Here we explore variation on an intermediate scale—particularly insertions, deletions and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1,695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number between individuals. Complete sequencing of 261 structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation—a standard for genotyping platforms and a prelude to future individual genome sequencing projects. PMID:18451855

  17. Mapping the zebrafish brain methylome using reduced representation bisulfite sequencing

    PubMed Central

    Chatterjee, Aniruddha; Ozaki, Yuichi; Stockwell, Peter A; Horsfield, Julia A; Morison, Ian M; Nakagawa, Shinichi

    2013-01-01

    Reduced representation bisulfite sequencing (RRBS) has been used to profile DNA methylation patterns in mammalian genomes such as human, mouse and rat. The methylome of the zebrafish, an important animal model, has not yet been characterized at base-pair resolution using RRBS. Therefore, we evaluated the technique of RRBS in this model organism by generating four single-nucleotide resolution DNA methylomes of adult zebrafish brain. We performed several simulations to show the distribution of fragments and enrichment of CpGs in different in silico reduced representation genomes of zebrafish. Four RRBS brain libraries generated 98 million sequenced reads and had higher frequencies of multiple mapping than equivalent human RRBS libraries. The zebrafish methylome indicates there is higher global DNA methylation in the zebrafish genome compared with its equivalent human methylome. This observation was confirmed by RRBS of zebrafish liver. High coverage CpG dinucleotides are enriched in CpG island shores more than in the CpG island core. We found that 45% of the mapped CpGs reside in gene bodies, and 7% in gene promoters. This analysis provides a roadmap for generating reproducible base-pair level methylomes for zebrafish using RRBS and our results provide the first evidence that RRBS is a suitable technique for global methylation analysis in zebrafish. PMID:23975027
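
An in silico reduced representation genome of the kind mentioned in this abstract can be sketched as a restriction digest followed by size selection. The abstract does not name the enzyme or the size window, so both are assumptions here: MspI (cutting C^CGG) and a 40-220 bp window are common RRBS choices, used purely for illustration. The function name and the toy sequence are hypothetical.

```python
import re


def mspi_fragments(genome: str, min_len: int = 40, max_len: int = 220):
    """Return size-selected fragments lying between consecutive MspI cut sites."""
    # MspI cuts between the two Cs of CCGG, so each cut falls one base into the site.
    cuts = [m.start() + 1 for m in re.finditer(r"(?=CCGG)", genome)]
    # Only fragments flanked by two cut sites are kept; sequence ends are ignored.
    fragments = [genome[start:end] for start, end in zip(cuts, cuts[1:])]
    return [f for f in fragments if min_len <= len(f) <= max_len]


toy_genome = "AAAACCGG" + "T" * 50 + "CCGG" + "AAAA"
selected = mspi_fragments(toy_genome)
cpg_count = sum(fragment.count("CG") for fragment in selected)
```

Because every retained fragment starts inside a CCGG site, each one carries at least one CpG, which is the enrichment effect that makes the reduced representation useful for methylation profiling.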

  18. Murine Brca2: Sequence, map position, and expression pattern

    SciTech Connect

    Sharan, S.K.; Bradley, A.

    1997-03-01

    Mutations in the human BRCA2 gene are responsible for about 45% of hereditary early onset breast cancer. Recently, the human BRCA2 gene was cloned, and several germline mutations were identified. Here we describe the cloning of the mouse homologue of BRCA2. The mouse cDNA sequence predicts a 3328-amino-acid Brca2 protein, 90 amino acids shorter than the human protein. The overall identity between the mouse and the human proteins is 59%, while the similarity is 72%. At the nucleotide level the homology is 74%. By comparing the amino acid sequences of the two homologues we have identified five highly conserved novel domains that may be functionally significant. Brca2 has been mapped to the distal end of mouse chromosome 5, a region of the mouse genome that contains other genes that also map to human chromosome 13q12-q13, confirming the conservation of this linkage group between the two species. Expression of Brca2 was detected in midgestation embryos and adult testis, thymus, and ovary. 21 refs., 5 figs.

  19. An Autotetraploid Linkage Map of Rose (Rosa hybrida) Validated Using the Strawberry (Fragaria vesca) Genome Sequence

    PubMed Central

    Gar, Oron; Sargent, Daniel J.; Tsai, Ching-Jung; Pleban, Tzili; Shalev, Gil; Byrne, David H.; Zamir, Dani

    2011-01-01

    Polyploidy is a pivotal process in plant evolution as it increases gene redundancy and morphological intricacy, but due to the complexity of polysomic inheritance we have only a few genetic maps of autopolyploid organisms. A robust mapping framework is particularly important in polyploid crop species, rose included (2n = 4x = 28), where the objective is to study multiallelic interactions that control traits of value for plant breeding. From a cross between the garden, peach red and fragrant cultivar Fragrant Cloud (FC) and a cut-rose yellow cultivar Golden Gate (GG), we generated an autotetraploid GGFC mapping population consisting of 132 individuals. For the map we used 128 sequence-based markers, 141 AFLP, 86 SSR and three morphological markers. Seven linkage groups were resolved for FC (Total 632 cM) and GG (616 cM) which were validated by markers that segregated in both parents as well as the diploid integrated consensus map. The release of the Fragaria vesca genome, which also belongs to the Rosoideae, allowed us to place 70 rose sequenced markers on the seven strawberry pseudo-chromosomes. Synteny between Rosa and Fragaria was high with an estimated four major translocations and six inversions required to place the 17 non-collinear markers in the same order. Based on a verified linear order of the rose markers, we could further partition each of the parents into its four homologous groups, thus providing an essential framework to aid the sequencing of an autotetraploid genome. PMID:21647382

  20. An autotetraploid linkage map of rose (Rosa hybrida) validated using the strawberry (Fragaria vesca) genome sequence.

    PubMed

    Gar, Oron; Sargent, Daniel J; Tsai, Ching-Jung; Pleban, Tzili; Shalev, Gil; Byrne, David H; Zamir, Dani

    2011-01-01

    Polyploidy is a pivotal process in plant evolution as it increases gene redundancy and morphological intricacy, but due to the complexity of polysomic inheritance we have only a few genetic maps of autopolyploid organisms. A robust mapping framework is particularly important in polyploid crop species, rose included (2n = 4x = 28), where the objective is to study multiallelic interactions that control traits of value for plant breeding. From a cross between the garden, peach red and fragrant cultivar Fragrant Cloud (FC) and a cut-rose yellow cultivar Golden Gate (GG), we generated an autotetraploid GGFC mapping population consisting of 132 individuals. For the map we used 128 sequence-based markers, 141 AFLP, 86 SSR and three morphological markers. Seven linkage groups were resolved for FC (Total 632 cM) and GG (616 cM) which were validated by markers that segregated in both parents as well as the diploid integrated consensus map. The release of the Fragaria vesca genome, which also belongs to the Rosoideae, allowed us to place 70 rose sequenced markers on the seven strawberry pseudo-chromosomes. Synteny between Rosa and Fragaria was high with an estimated four major translocations and six inversions required to place the 17 non-collinear markers in the same order. Based on a verified linear order of the rose markers, we could further partition each of the parents into its four homologous groups, thus providing an essential framework to aid the sequencing of an autotetraploid genome.

  1. BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes.

    PubMed

    Staňková, Helena; Hastie, Alex R; Chan, Saki; Vrána, Jan; Tulpová, Zuzana; Kubaláková, Marie; Visendi, Paul; Hayashi, Satomi; Luo, Mingcheng; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana

    2016-07-01

    The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
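
    The N50 statistic quoted for the genome map (371 contigs, N50 of 1.3 Mb) can be computed as in the following minimal sketch. The contig lengths used here are toy values chosen for illustration; the paper's per-contig lengths are not given in the abstract, and the function name is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    Contigs are taken from longest to shortest; the N50 is the length of the
    contig at which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # arbitrary units, total 32
toy_n50 = n50(toy_contigs)              # the two 8-unit contigs already cover half
```

    Because the statistic depends only on the sorted lengths, a few very long contigs can give a high N50 even when most contigs are short, which is why it is usually reported together with the contig count.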

  2. A physical map of the highly heterozygous Populus genome: integration with the genome sequence and genetic map

    SciTech Connect

    Kelleher, Colin; Chiu, R.; Shin, H.; Krywinski, Martin; Fjell, Chris; Wilkin, Jennifer; Yin, Tongming; Difazio, Stephen P.

    2007-01-01

    As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the few such maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2802 contigs, was constructed from fingerprinted bacterial artificial chromosome (BAC) clones. The map represents approximately 9.4-fold coverage of the Populus genome, which has been estimated from the genome sequence assembly to be 485 ± 10 Mb in size. BAC ends were sequenced to assist long-range assembly of whole-genome shotgun sequence scaffolds and to anchor the physical map to the genome sequence. Simple sequence repeat-based markers were derived from the end sequences and used to initiate integration of the BAC and genetic maps. A total of 2411 physical map contigs, representing 97% of all clones assigned to contigs, were aligned to the sequence assembly (JGI Populus trichocarpa, version 1.0). These alignments represent a total coverage of 384 Mb (79%) of the entire poplar sequence assembly and 295 Mb (96%) of linkage group sequence assemblies. A striking result of the physical map contig alignments to the sequence assembly was the co-localization of multiple contigs across numerous regions of the 19 linkage groups. Targeted sequencing of BAC clones and genetic analysis in a small number of representative regions showed that these co-aligning contigs represent distinct haplotypes in the heterozygous individual sequenced, and revealed the nature of these haplotype sequence differences.

  3. Rapid multipoint linkage analysis of recessive traits in nuclear families, including homozygosity mapping

    SciTech Connect

    Kruglyak, L.; Daly, M.J.; Lander, E.S.

    1995-02-01

    Homozygosity mapping is a powerful strategy for mapping rare recessive traits in children of consanguineous marriages. Practical applications of this strategy are currently limited by the inability of conventional linkage analysis software to compute, in reasonable time, multipoint LOD scores for pedigrees with inbreeding loops. We have developed a new algorithm for rapid multipoint likelihood calculations in small pedigrees, including those with inbreeding loops. The running time of the algorithm grows, at most, linearly with the number of loci considered simultaneously. The running time is not sensitive to the presence of inbreeding loops, missing genotype information, and highly polymorphic loci. We have incorporated this algorithm into a software package, MAPMAKER/HOMOZ, that allows very rapid multipoint mapping of disease genes in nuclear families, including homozygosity mapping. Multipoint analysis with dozens of markers can be carried out in minutes on a personal workstation. 23 refs., 4 figs., 1 tab.
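
    For readers unfamiliar with LOD scores, the quantity itself can be shown with a much simpler case than the one this abstract addresses. The sketch below computes a two-point LOD score for phase-known, fully informative meioses; it is a textbook simplification, not the multipoint algorithm implemented in MAPMAKER/HOMOZ, and the function name and counts are hypothetical.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """LOD score comparing recombination fraction theta against free recombination.

    LOD(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    for k recombinants among n phase-known informative meioses.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    k, n = recombinants, meioses
    log_l_linked = k * math.log10(theta) + (n - k) * math.log10(1 - theta)
    log_l_unlinked = n * math.log10(0.5)
    return log_l_linked - log_l_unlinked


# Hypothetical data: 1 recombinant observed in 10 meioses, evaluated at theta = 0.1.
example_lod = two_point_lod(1, 10, 0.1)
```

    A multipoint analysis replaces the single marker likelihood with a likelihood over all markers jointly, which is where the running-time concerns described in the abstract arise.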

  4. Microsatellite Discovery from BAC End Sequences and Genetic Mapping to Anchor the Soybean Physical and Genetic Maps

    USDA-ARS's Scientific Manuscript database

    Physical maps can be an invaluable resource for improving and assessing the quality of a whole-genome sequence assembly. Here we report the identification and screening of 3,290 microsatellites (SSRs) identified from BAC end sequences of clones comprising the physical map of the cultivar Williams 8...

  5. Single-Molecule Real-Time Sequencing Combined with Optical Mapping Yields Completely Finished Fungal Genome

    PubMed Central

    Faino, Luigi; Seidl, Michael F.; Datema, Erwin; van den Berg, Grardy C. M.; Janssen, Antoine; Wittenberg, Alexander H. J.

    2015-01-01

    Next-generation sequencing (NGS) technologies have increased the scalability, speed, and resolution of genomic sequencing and, thus, have revolutionized genomic studies. However, eukaryotic genome sequencing initiatives typically yield considerably fragmented genome assemblies. Here, we assessed various state-of-the-art sequencing and assembly strategies in order to produce a contiguous and complete eukaryotic genome assembly, focusing on the filamentous fungus Verticillium dahliae. Compared with Illumina-based assemblies of the V. dahliae genome, hybrid assemblies that also include PacBio-generated long reads establish superior contiguity. Intriguingly, provided that sufficient sequence depth is reached, assemblies solely based on PacBio reads outperform hybrid assemblies and even result in fully assembled chromosomes. Furthermore, the addition of optical map data allowed us to produce a gapless and complete V. dahliae genome assembly of the expected eight chromosomes from telomere to telomere. Consequently, we can now study genomic regions that were previously not assembled or poorly assembled, including regions that are populated by repetitive sequences, such as transposons, allowing us to fully appreciate an organism’s biological complexity. Our data show that a combination of PacBio-generated long reads and optical mapping can be used to generate complete and gapless assemblies of fungal genomes. PMID:26286689

  6. A high resolution genetic map anchoring scaffolds of the sequenced watermelon genome

    USDA-ARS's Scientific Manuscript database

    As part of our ongoing efforts to sequence and map the watermelon (Citrullus spp.) genome, we have constructed a high-density genetic linkage map. The map positioned 234 watermelon genome sequence scaffolds (an average size of 1.41 Mb) that cover about 330 Mb and account for 93.5% of the 353 Mb of ...

  7. Mapping protein-DNA interactions using ChIP-sequencing.

    PubMed

    Massie, Charles E; Mills, Ian G

    2012-01-01

    Chromatin immunoprecipitation (ChIP) allows enrichment of genomic regions which are associated with specific transcription factors, histone modifications, and indeed any other epitopes which are present on chromatin. The original ChIP methods used site-specific PCR and Southern blotting to confirm which regions of the genome were enriched, on a candidate basis. The combination of ChIP with genomic tiling arrays (ChIP-chip) allowed a more unbiased approach to map ChIP-enriched sites. However, limitations of microarray probe design and probe number have a detrimental impact on the coverage, resolution, sensitivity, and cost of whole-genome tiling microarray sets for higher eukaryotes with large genomes. The combination of ChIP with high-throughput sequencing technology has allowed more comprehensive surveys of genome occupancy, greater resolution, and lower cost for whole genome coverage. Herein, we provide a comparison of high-throughput sequencing platforms and a survey of ChIP-seq analysis tools, discuss experimental design, and describe a detailed ChIP-seq method.

  8. Nondestructive, in situ, cellular-scale mapping of elemental abundances including organic carbon in permineralized fossils

    PubMed Central

    Boyce, C. K.; Hazen, R. M.; Knoll, A. H.

    2001-01-01

    The electron microprobe allows elemental abundances to be mapped at the μm scale, but until now high resolution mapping of light elements has been challenging. Modifications of electron microprobe procedure permit fine-scale mapping of carbon. When applied to permineralized fossils, this technique allows simultaneous mapping of organic material, major matrix-forming elements, and trace elements with μm-scale resolution. The resulting data make it possible to test taphonomic hypotheses for the formation of anatomically preserved silicified fossils, including the role of trace elements in the initiation of silica precipitation and in the prevention of organic degradation. The technique allows one to understand the localization of preserved organic matter before undertaking destructive chemical analyses and, because it is nondestructive, offers a potentially important tool for astrobiological investigations of samples returned from Mars or other solar system bodies. PMID:11371632

  9. Nondestructive, in situ, cellular-scale mapping of elemental abundances including organic carbon in permineralized fossils.

    PubMed

    Boyce, C K; Hazen, R M; Knoll, A H

    2001-05-22

    The electron microprobe allows elemental abundances to be mapped at the μm scale, but until now high resolution mapping of light elements has been challenging. Modifications of electron microprobe procedure permit fine-scale mapping of carbon. When applied to permineralized fossils, this technique allows simultaneous mapping of organic material, major matrix-forming elements, and trace elements with μm-scale resolution. The resulting data make it possible to test taphonomic hypotheses for the formation of anatomically preserved silicified fossils, including the role of trace elements in the initiation of silica precipitation and in the prevention of organic degradation. The technique allows one to understand the localization of preserved organic matter before undertaking destructive chemical analyses and, because it is nondestructive, offers a potentially important tool for astrobiological investigations of samples returned from Mars or other solar system bodies.

  10. Linkage maps of the Atlantic salmon (Salmo salar) genome derived from RAD sequencing

    PubMed Central

    2014-01-01

    Background: Genetic linkage maps are useful tools for mapping quantitative trait loci (QTL) influencing variation in traits of interest in a population. Genotyping-by-sequencing approaches such as Restriction-site Associated DNA sequencing (RAD-Seq) now enable the rapid discovery and genotyping of genome-wide SNP markers suitable for the development of dense SNP linkage maps, including in non-model organisms such as Atlantic salmon (Salmo salar). This paper describes the development and characterisation of a high density SNP linkage map based on SbfI RAD-Seq SNP markers from two Atlantic salmon reference families. Results: Approximately 6,000 SNPs were assigned to 29 linkage groups, utilising markers from known genomic locations as anchors. Linkage maps were then constructed for the four mapping parents separately. Overall map lengths were comparable between male and female parents, but the distribution of the SNPs showed sex-specific patterns with a greater degree of clustering of sire-segregating SNPs to single chromosome regions. The maps were integrated with the Atlantic salmon draft reference genome contigs, allowing the unique assignment of ~4,000 contigs to a linkage group. 112 genome contigs mapped to two or more linkage groups, highlighting regions of putative homeology within the salmon genome. A comparative genomics analysis with the stickleback reference genome identified putative genes closely linked to approximately half of the ordered SNPs and demonstrated blocks of orthology between the Atlantic salmon and stickleback genomes. A subset of 47 RAD-Seq SNPs were successfully validated using a high-throughput genotyping assay, with a correspondence of 97% between the two assays. Conclusions: This Atlantic salmon RAD-Seq linkage map is a resource for salmonid genomics research as genotyping-by-sequencing becomes increasingly common. This is aided by the integration of the SbfI RAD-Seq SNPs with existing reference maps and the draft reference genome, as well

  11. Linkage maps of the Atlantic salmon (Salmo salar) genome derived from RAD sequencing.

    PubMed

    Gonen, Serap; Lowe, Natalie R; Cezard, Timothé; Gharbi, Karim; Bishop, Stephen C; Houston, Ross D

    2014-02-27

    Genetic linkage maps are useful tools for mapping quantitative trait loci (QTL) influencing variation in traits of interest in a population. Genotyping-by-sequencing approaches such as Restriction-site Associated DNA sequencing (RAD-Seq) now enable the rapid discovery and genotyping of genome-wide SNP markers suitable for the development of dense SNP linkage maps, including in non-model organisms such as Atlantic salmon (Salmo salar). This paper describes the development and characterisation of a high density SNP linkage map based on SbfI RAD-Seq SNP markers from two Atlantic salmon reference families. Approximately 6,000 SNPs were assigned to 29 linkage groups, utilising markers from known genomic locations as anchors. Linkage maps were then constructed for the four mapping parents separately. Overall map lengths were comparable between male and female parents, but the distribution of the SNPs showed sex-specific patterns with a greater degree of clustering of sire-segregating SNPs to single chromosome regions. The maps were integrated with the Atlantic salmon draft reference genome contigs, allowing the unique assignment of ~4,000 contigs to a linkage group. 112 genome contigs mapped to two or more linkage groups, highlighting regions of putative homeology within the salmon genome. A comparative genomics analysis with the stickleback reference genome identified putative genes closely linked to approximately half of the ordered SNPs and demonstrated blocks of orthology between the Atlantic salmon and stickleback genomes. A subset of 47 RAD-Seq SNPs were successfully validated using a high-throughput genotyping assay, with a correspondence of 97% between the two assays. This Atlantic salmon RAD-Seq linkage map is a resource for salmonid genomics research as genotyping-by-sequencing becomes increasingly common. This is aided by the integration of the SbfI RAD-Seq SNPs with existing reference maps and the draft reference genome, as well as the identification of

  12. A High-Density Genetic Map for Soybean Based on Specific Length Amplified Fragment Sequencing

    PubMed Central

    Zhu, Rongsheng; Xin, Dawei; Liu, Chunyan; Han, Xue; Jiang, Hongwei; Hong, Weiguo; Hu, Guohua; Zheng, Hongkun; Chen, Qingshan

    2014-01-01

    Soybean is an important oil seed crop, but very few high-density genetic maps have been published for this species. Specific length amplified fragment sequencing (SLAF-seq) is a recently developed high-resolution strategy for large scale de novo discovery and genotyping of single nucleotide polymorphisms. SLAF-seq was employed in this study to obtain sufficient markers to construct a high-density genetic map for soybean. In total, 33.10 Gb of data containing 171,001,333 paired-end reads were obtained after preprocessing. The average sequencing depth was 42.29 in the Dongnong594, 56.63 in the Charleston, and 3.92 in each progeny. In total, 164,197 high-quality SLAFs were detected, of which 12,577 SLAFs were polymorphic, and 5,308 of the polymorphic markers met the requirements for use in constructing a genetic map. The final map included 5,308 markers on 20 linkage groups and was 2,655.68 cM in length, with an average distance of 0.5 cM between adjacent markers. To our knowledge, this map has the shortest average distance of adjacent markers for soybean. We report here a high-density genetic map for soybean. The map was constructed using a recombinant inbred line population and the SLAF-seq approach, which allowed the efficient development of a large number of polymorphic markers in a short time. Results of this study will not only provide a platform for gene/quantitative trait loci fine mapping, but will also serve as a reference for molecular breeding of soybean. PMID:25118194
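
    The reported average spacing of 0.5 cM can be checked directly from the figures in this abstract. The short calculation below uses only the stated map length, marker count and number of linkage groups; the variable names are illustrative, and the interval count assumes that every linkage group carries at least one marker.

```python
# Figures quoted in the abstract.
map_length_cm = 2655.68
markers = 5308
linkage_groups = 20
total_slafs = 164_197
polymorphic_slafs = 12_577

# A linkage group with m markers has m - 1 intervals between adjacent markers,
# so the whole map has (markers - linkage_groups) intervals.
intervals = markers - linkage_groups
avg_interval_cm = map_length_cm / intervals

# Simpler approximation: total length divided by the number of markers.
cm_per_marker = map_length_cm / markers

# Share of detected SLAFs that were polymorphic, and share of those used in the map.
polymorphic_fraction = polymorphic_slafs / total_slafs
mapped_fraction = markers / polymorphic_slafs

print(f"{avg_interval_cm:.3f} cM per interval, {cm_per_marker:.3f} cM per marker")
print(f"{polymorphic_fraction:.1%} polymorphic, {mapped_fraction:.1%} of those mapped")
```

    Both ways of counting round to the 0.5 cM given in the abstract, so the reported value does not depend on which convention the authors used.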

  13. Mapping-by-sequencing in complex polyploid genomes using genic sequence capture: a case study to map yellow rust resistance in hexaploid wheat.

    PubMed

    Gardiner, Laura-Jayne; Bansept-Basler, Pauline; Olohan, Lisa; Joynson, Ryan; Brenchley, Rachel; Hall, Neil; O'Sullivan, Donal M; Hall, Anthony

    2016-08-01

    Previously we extended the utility of mapping-by-sequencing by combining it with sequence capture and mapping sequence data to pseudo-chromosomes that were organized using wheat-Brachypodium synteny. This, with a bespoke haplotyping algorithm, enabled us to map the flowering time locus in the diploid wheat Triticum monococcum L. identifying a set of deleted genes (Gardiner et al., 2014). Here, we develop this combination of gene enrichment and sliding window mapping-by-synteny analysis to map the Yr6 locus for yellow stripe rust resistance in hexaploid wheat. A 110 MB NimbleGen capture probe set was used to enrich and sequence a doubled haploid mapping population of hexaploid wheat derived from an Avalon and Cadenza cross. The Yr6 locus was identified by mapping to the POPSEQ chromosomal pseudomolecules using a bespoke pipeline and algorithm (Chapman et al., 2015). Furthermore, the same locus was identified using newly developed pseudo-chromosome sequences as a mapping reference that are based on the genic sequence used for sequence enrichment. The pseudo-chromosomes allow us to demonstrate the application of mapping-by-sequencing to even poorly defined polyploid genomes where chromosomes are incomplete and sub-genome assemblies are collapsed. This analysis uniquely enabled us to: compare wheat genome annotations; identify the Yr6 locus - defining a smaller genic region than was previously possible; associate the interval with one wheat sub-genome and increase the density of SNP markers associated. Finally, we built the pipeline in iPlant, making it a user-friendly community resource for phenotype mapping. © 2016 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.

  14. High-throughput physical map anchoring via BAC-pool sequencing.

    PubMed

    Cviková, Kateřina; Cattonaro, Federica; Alaux, Michael; Stein, Nils; Mayer, Klaus Fx; Doležel, Jaroslav; Bartoš, Jan

    2015-04-11

    Physical maps created from large insert DNA libraries, typically cloned in a BAC vector, are valuable resources for map-based cloning and de novo genome sequencing. The maps are most useful if contigs of overlapping DNA clones are anchored to chromosome(s), and ordered along them using molecular markers. Here we present a novel approach for anchoring physical maps, based on sequencing three-dimensional pools of BAC clones from the minimum tiling path. We used the physical map of wheat chromosome arm 3DS to validate the method with two different DNA sequence datasets. The first comprised 567 genes ordered along the chromosome arm based on syntenic relationship of wheat with the sequenced genomes of Brachypodium, rice and sorghum. The second dataset consisted of 7,136 SNP-containing sequences, which were mapped genetically in Aegilops tauschii, the donor of the wheat D genome. Mapping of sequence reads from individual BAC pools to the first and the second datasets enabled unambiguous anchoring of 447 and 311 3DS-specific sequences, respectively, or 758 in total. We demonstrate the utility of the novel approach for BAC contig anchoring based on massively parallel sequencing of three-dimensional pools prepared from the minimum tiling path of the physical map. The existing genetic markers as well as any other DNA sequence could be mapped to BAC clones in a single in silico experiment. The approach significantly reduces the cost and time needed for anchoring and is applicable to any genomic project involving the construction of an anchored physical map.
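
    The idea behind three-dimensional pooling can be sketched as follows. Each clone sits at a (plate, row, column) coordinate and contributes its DNA to one pool per axis; a sequence detected in exactly one pool on each axis points to a single clone. This is a simplified illustration under assumed names and toy data, not the authors' pipeline, and it treats any sequence with more than one hit on an axis as ambiguous.

```python
def build_pools(clone_content):
    """Combine clone contents into plate, row and column pools.

    clone_content maps (plate, row, col) -> set of sequence identifiers
    carried by the clone at that coordinate.
    """
    pools = {}
    for (plate, row, col), seqs in clone_content.items():
        for key in (("plate", plate), ("row", row), ("col", col)):
            pools.setdefault(key, set()).update(seqs)
    return pools


def deconvolute(seq_id, pools):
    """Return the unique (plate, row, col) for seq_id, or None if absent or ambiguous."""
    hits = {"plate": [], "row": [], "col": []}
    for (axis, index), seqs in pools.items():
        if seq_id in seqs:
            hits[axis].append(index)
    if all(len(found) == 1 for found in hits.values()):
        return (hits["plate"][0], hits["row"][0], hits["col"][0])
    return None


# Hypothetical 2 x 2 x 2 layout; "geneD" lies in two clones and cannot be resolved.
clones = {
    (0, 0, 1): {"geneA", "geneD"},
    (0, 1, 0): {"geneB"},
    (1, 0, 0): {"geneC"},
    (1, 1, 1): {"geneD"},
}
pools = build_pools(clones)
```

    With P plates, R rows and C columns, only P + R + C pools need to be sequenced instead of P × R × C individual clones, which is the source of the cost saving described in the abstract.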

  15. rnaSeqMap: a Bioconductor package for RNA sequencing data exploration

    PubMed Central

    2011-01-01

    Background: The throughput of commercially available sequencers has recently increased significantly. It has reached the point where measuring RNA expression by the depth of coverage has become feasible even for the largest genomes. The development of software tools is constantly following the progress of biological hardware. In particular, genome browsers, exon junction tools and statistical tools operating on counts of reads in predefined regions can all be regarded as RNA sequencing software. The library rnaSeqMap, freely available via Bioconductor, is RNA sequencing software that is independent of any biological hardware platform. It is based upon standard Bioconductor infrastructure for sequencing data and includes several novel features focused on deeper understanding of coverage expression profiles and discovery of novel transcription regions. Results: rnaSeqMap is a toolbox for analyses that may be performed with the use of gene annotations or alternatively, in an unsupervised mode, on any genomic region to find novel or non-standard transcripts. The data back-end may be a MySQL database or a set of files in standard BAM format. The processing in R can be run on a machine without any particular hardware requirements, and scales linearly with the number of genomic loci and number of samples analyzed. The main features of rnaSeqMap include coverage operations, discovering irreducible regions of high expression, significance search and splicing analyses with nucleotide granularity. Conclusions: This software may be used for a range of applications related to RNA sequencing by building customized analysis pipelines. The applicability and precision are expected to increase in parallel with the progress of the genome coverage in sequencers. PMID:21612622
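
    The coverage operations and the search for regions of high expression described above can be illustrated generically. The sketch below is plain Python rather than R, does not use or reproduce the rnaSeqMap API, and relies on hypothetical function names, toy read intervals and a simple fixed threshold; the package's own definition of an irreducible region may differ.

```python
def coverage(reads, region_start, region_end):
    """Per-base read depth over [region_start, region_end).

    reads is a list of half-open (start, end) alignment intervals.
    """
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos - region_start] += 1
    return depth


def high_coverage_regions(depth, threshold, offset=0):
    """Maximal runs of consecutive positions with depth >= threshold, as half-open intervals."""
    regions, run_start = [], None
    for i, d in enumerate(depth):
        if d >= threshold and run_start is None:
            run_start = i
        elif d < threshold and run_start is not None:
            regions.append((run_start + offset, i + offset))
            run_start = None
    if run_start is not None:
        regions.append((run_start + offset, len(depth) + offset))
    return regions


toy_reads = [(0, 5), (3, 8), (4, 6), (12, 15)]  # hypothetical alignments
toy_depth = coverage(toy_reads, 0, 15)
```

    Calling high_coverage_regions(toy_depth, 2) on this toy profile isolates the stretch where reads overlap, while a threshold of 1 returns every covered stretch; no gene annotation is needed, which corresponds to the unsupervised mode mentioned in the abstract.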

  16. Automatic phylogenetic classification of bacterial beta-lactamase sequences including structural and antibiotic substrate preference information.

    PubMed

    Ma, Jianmin; Eisenhaber, Frank; Maurer-Stroh, Sebastian

    2013-12-01

    Beta lactams comprise the largest and still most effective group of antibiotics, but bacteria can gain resistance through different beta lactamases that can degrade these antibiotics. We developed a user friendly tree building web server that allows users to assign beta lactamase sequences to their respective molecular classes and subclasses. Further clinically relevant information includes if the gene is typically chromosomal or transferable through plasmids as well as listing the antibiotics which the most closely related reference sequences are known to target and cause resistance against. This web server can automatically build three phylogenetic trees: the first tree with closely related sequences from a Tachyon search against the NCBI nr database, the second tree with curated reference beta lactamase sequences, and the third tree built specifically from substrate binding pocket residues of the curated reference beta lactamase sequences. We show that the latter is better suited to recover antibiotic substrate assignments through nearest neighbor annotation transfer. The users can also choose to build a structural model for the query sequence and view the binding pocket residues of their query relative to other beta lactamases in the sequence alignment as well as in the 3D structure relative to bound antibiotics. This web server is freely available at http://blac.bii.a-star.edu.sg/.

  17. Construction of a map-based reference genome sequence for barley, Hordeum vulgare L.

    PubMed Central

    Beier, Sebastian; Himmelbach, Axel; Colmsee, Christian; Zhang, Xiao-Qi; Barrero, Roberto A.; Zhang, Qisen; Li, Lin; Bayer, Micha; Bolser, Daniel; Taudien, Stefan; Groth, Marco; Felder, Marius; Hastie, Alex; Šimková, Hana; Staňková, Helena; Vrána, Jan; Chan, Saki; Muñoz-Amatriaín, María; Ounit, Rachid; Wanamaker, Steve; Schmutzer, Thomas; Aliyeva-Schnorr, Lala; Grasso, Stefano; Tanskanen, Jaakko; Sampath, Dharanya; Heavens, Darren; Cao, Sujie; Chapman, Brett; Dai, Fei; Han, Yong; Li, Hua; Li, Xuan; Lin, Chongyun; McCooke, John K.; Tan, Cong; Wang, Songbo; Yin, Shuya; Zhou, Gaofeng; Poland, Jesse A.; Bellgard, Matthew I.; Houben, Andreas; Doležel, Jaroslav; Ayling, Sarah; Lonardi, Stefano; Langridge, Peter; Muehlbauer, Gary J.; Kersey, Paul; Clark, Matthew D.; Caccamo, Mario; Schulman, Alan H.; Platzer, Matthias; Close, Timothy J.; Hansson, Mats; Zhang, Guoping; Braumann, Ilka; Li, Chengdao; Waugh, Robbie; Scholz, Uwe; Stein, Nils; Mascher, Martin

    2017-01-01

    Barley (Hordeum vulgare L.) is a cereal grass mainly used as animal fodder and raw material for the malting industry. The map-based reference genome sequence of barley cv. ‘Morex’ was constructed by the International Barley Genome Sequencing Consortium (IBSC) using hierarchical shotgun sequencing. Here, we report the experimental and computational procedures to (i) sequence and assemble more than 80,000 bacterial artificial chromosome (BAC) clones along the minimum tiling path of a genome-wide physical map, (ii) find and validate overlaps between adjacent BACs, (iii) construct 4,265 non-redundant sequence scaffolds representing clusters of overlapping BACs, and (iv) order and orient these BAC clusters along the seven barley chromosomes using positional information provided by dense genetic maps, an optical map and chromosome conformation capture sequencing (Hi-C). Integrative access to these sequence and mapping resources is provided by the barley genome explorer (BARLEX). PMID:28448065

  18. Including non-public data and studies in systematic reviews and systematic maps.

    PubMed

    Haddaway, Neal R; Collins, Alexandra M; Coughlin, Deborah; Kohl, Christian

    2017-02-01

    Systematic reviews and maps should be based on the best available evidence, and reviewers should make all reasonable efforts to source and include potentially relevant studies. However, reviewers may not be able to consider all existing evidence, since some data and studies may not be publicly available. Including non-public studies in reviews provides a valuable opportunity to increase systematic review/map comprehensiveness, potentially mitigating negative impacts of publication bias. Studies may be non-public for many reasons: some may still be in the process of being published (publication can take a long time); some may not be published due to author/publisher restrictions; publication bias may make it difficult to publish non-significant or negative results. Here, we consider what forms these non-public studies may take and the implications of including them in systematic reviews and maps. Reviewers should carefully consider the advantages and disadvantages of including non-public studies, weighing risks of bias against benefits of increased comprehensiveness. As with all systematic reviews and maps, reviewers must be transparent about methods used to obtain data and avoid risks of bias in their synthesis. We make tentative suggestions for reviewers in situations where non-public data may be present in an evidence base.

  19. A sequence-based genetic map of Medicago truncatula and comparison of marker colinearity with M. sativa.

    PubMed Central

    Choi, Hong-Kyu; Kim, Dongjin; Uhm, Taesik; Limpens, Eric; Lim, Hyunju; Mun, Jeong-Hwan; Kalo, Peter; Penmetsa, R Varma; Seres, Andrea; Kulikova, Olga; Roe, Bruce A; Bisseling, Ton; Kiss, Gyorgy B; Cook, Douglas R

    2004-01-01

    A core genetic map of the legume Medicago truncatula has been established by analyzing the segregation of 288 sequence-characterized genetic markers in an F2 population composed of 93 individuals. These molecular markers correspond to 141 ESTs, 80 BAC end sequence tags, and 67 resistance gene analogs, covering 513 cM. In the case of EST-based markers we used an intron-targeted marker strategy with primers designed to anneal in conserved exon regions and to amplify across intron regions. Polymorphisms were significantly more frequent in intron vs. exon regions, thus providing an efficient mechanism to map transcribed genes. Genetic and cytogenetic analysis produced eight well-resolved linkage groups, which have been previously correlated with eight chromosomes by means of FISH with mapped BAC clones. We anticipated that mapping of conserved coding regions would have utility for comparative mapping among legumes; thus 60 of the EST-based primer pairs were designed to amplify orthologous sequences across a range of legume species. As an initial test of this strategy, we used primers designed against M. truncatula exon sequences to rapidly map genes in M. sativa. The resulting comparative map, which includes 68 bridging markers, indicates that the two Medicago genomes are highly similar and establishes the basis for a Medicago composite map. PMID:15082563

  20. Integrated genome sequence and linkage map of physic nut (Jatropha curcas L.), a biodiesel plant.

    PubMed

    Wu, Pingzhi; Zhou, Changpin; Cheng, Shifeng; Wu, Zhenying; Lu, Wenjia; Han, Jinli; Chen, Yanbo; Chen, Yan; Ni, Peixiang; Wang, Ying; Xu, Xun; Huang, Ying; Song, Chi; Wang, Zhiwen; Shi, Nan; Zhang, Xudong; Fang, Xiaohua; Yang, Qing; Jiang, Huawu; Chen, Yaping; Li, Meiru; Wang, Ying; Chen, Fan; Wang, Jun; Wu, Guojiang

    2015-03-01

    The family Euphorbiaceae includes some of the most efficient biomass accumulators. Whole genome sequencing and the development of genetic maps of these species are important components in molecular breeding and genetic improvement. Here we report the draft genome of physic nut (Jatropha curcas L.), a biodiesel plant. The assembled genome has a total length of 320.5 Mbp and contains 27,172 putative protein-coding genes. We established a linkage map containing 1208 markers and anchored the genome assembly (81.7%) to this map to produce 11 pseudochromosomes. After gene family clustering, 15,268 families were identified, of which 13,887 existed in the castor bean genome. Analysis of the genome highlighted specific expansion and contraction of a number of gene families during the evolution of this species, including the ribosome-inactivating proteins and oil biosynthesis pathway enzymes. The genomic sequence and linkage map provide a valuable resource not only for fundamental and applied research on physic nut but also for evolutionary and comparative genomics analysis, particularly in the Euphorbiaceae.

  1. Single-Molecule Real-Time Sequencing Combined with Optical Mapping Yields Completely Finished Fungal Genome.

    PubMed

    Faino, Luigi; Seidl, Michael F; Datema, Erwin; van den Berg, Grardy C M; Janssen, Antoine; Wittenberg, Alexander H J; Thomma, Bart P H J

    2015-08-18

    Next-generation sequencing (NGS) technologies have increased the scalability, speed, and resolution of genomic sequencing and, thus, have revolutionized genomic studies. However, eukaryotic genome sequencing initiatives typically yield considerably fragmented genome assemblies. Here, we assessed various state-of-the-art sequencing and assembly strategies in order to produce a contiguous and complete eukaryotic genome assembly, focusing on the filamentous fungus Verticillium dahliae. Compared with Illumina-based assemblies of the V. dahliae genome, hybrid assemblies that also include PacBio-generated long reads establish superior contiguity. Intriguingly, provided that sufficient sequence depth is reached, assemblies solely based on PacBio reads outperform hybrid assemblies and even result in fully assembled chromosomes. Furthermore, the addition of optical map data allowed us to produce a gapless and complete V. dahliae genome assembly of the expected eight chromosomes from telomere to telomere. Consequently, we can now study genomic regions that were previously not assembled or poorly assembled, including regions that are populated by repetitive sequences, such as transposons, allowing us to fully appreciate an organism's biological complexity. Our data show that a combination of PacBio-generated long reads and optical mapping can be used to generate complete and gapless assemblies of fungal genomes. Studying whole-genome sequences has become an important aspect of biological research. The advent of next-generation sequencing (NGS) technologies has nowadays brought genomic science within reach of most research laboratories, including those that study nonmodel organisms. However, most genome sequencing initiatives typically yield (highly) fragmented genome assemblies. Nevertheless, considerable relevant information related to genome structure and evolution is likely hidden in those nonassembled regions. Here, we investigated a diverse set of strategies to obtain

  2. Integration of the Rat Recombination and EST Maps in the Rat Genomic Sequence and Comparative Mapping Analysis With the Mouse Genome

    PubMed Central

    Wilder, Steven P.; Bihoreau, Marie-Thérèse; Argoud, Karène; Watanabe, Takeshi K.; Lathrop, Mark; Gauguier, Dominique

    2004-01-01

    Inbred strains of the laboratory rat are widely used for identifying genetic regions involved in the control of complex quantitative phenotypes of biomedical importance. The draft genomic sequence of the rat now provides essential information for annotating rat quantitative trait locus (QTL) maps. Following the survey of unique rat microsatellite (11,585 including 1648 new markers) and EST (10,067) markers currently available, we have incorporated a selection of 7952 rat EST sequences in an improved version of the integrated linkage-radiation hybrid map of the rat containing 2058 microsatellite markers which provided over 10,000 potential anchor points between rat QTL and the genomic sequence of the rat. A total of 996 genetic positions were resolved (avg. spacing 1.77 cM) in a single large intercross and anchored in the rat genomic sequence (avg. spacing 1.62 Mb). Comparative genome maps between rat and mouse were constructed by successful computational alignment of 6108 mapped rat ESTs in the mouse genome. The integration of rat linkage maps in the draft genomic sequence of the rat and that of other species represents an essential step for translating rat QTL intervals into human chromosomal targets. PMID:15060020

  3. Chromosome mapping of repetitive sequences in four Serrasalmidae species (Characiformes)

    PubMed Central

    Ribeiro, Leila Braga; Matoso, Daniele Aparecida; Feldberg, Eliana

    2014-01-01

    The Serrasalmidae family is composed of a number of commercially interesting species, mainly in the Amazon region where most of these fishes occur. In the present study, we investigated the genomic organization of the 18S and 5S rDNA and telomeric sequences in mitotic chromosomes of four species from the basal clade of the Serrasalmidae family: Colossoma macropomum, Mylossoma aureum, M. duriventre, and Piaractus mesopotamicus, in order to understand the chromosomal evolution in the family. All the species studied had diploid numbers 2n = 54 and exclusively biarmed chromosomes, but variations of the karyotypic formulas were observed. C-banding resulted in similar patterns among the analyzed species, with heterochromatic blocks mainly present in centromeric regions. The 18S rDNA mapping of C. macropomum and P. mesopotamicus revealed multiple sites of this gene; 5S rDNA sites were detected in two chromosome pairs in all species, although not all of them were homeologs. Hybridization with a telomeric probe revealed signals in the terminal portions of chromosomes in all the species and an interstitial signal was observed in one pair of C. macropomum. PMID:24688290

  4. Periodic sequences of simple maps can support chaos

    NASA Astrophysics Data System (ADS)

    Cánovas, Jose S.

    2017-01-01

    In this paper, we explore the Parrondo's paradox when several dynamically simple maps are combined in a periodic way, producing chaotic dynamics. We show that the paradox is not commutative, that is, it depends on the way that the maps are iterated. We also see that the paradox happens more frequently when the number of maps that we iterate increases.
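
    One common way to check numerically whether a periodic combination of maps behaves chaotically is to estimate the Lyapunov exponent of the combined orbit. The sketch below is only an illustration under stated assumptions: the logistic family, the starting point, and the function names logistic and lyapunov_periodic are hypothetical choices, not taken from the paper.

```python
import math


def logistic(r):
    """Return (f, df) for the logistic map x -> r*x*(1-x) and its derivative.

    The logistic family is an illustrative assumption, not the paper's choice.
    """
    def f(x):
        return r * x * (1.0 - x)

    def df(x):
        return r * (1.0 - 2.0 * x)

    return f, df


def lyapunov_periodic(maps, x0=0.3, burn_in=1000, steps=20000):
    """Estimate the Lyapunov exponent of the orbit obtained by applying
    maps[0], maps[1], ... cyclically. A positive value suggests chaos."""
    n = len(maps)
    x = x0
    # Discard a transient so the estimate reflects long-run behaviour.
    for i in range(burn_in):
        x = maps[i % n][0](x)
    total = 0.0
    for i in range(steps):
        f, df = maps[(burn_in + i) % n]
        # Guard against log(0) if the orbit lands exactly on a critical point.
        total += math.log(max(abs(df(x)), 1e-300))
        x = f(x)
    return total / steps


# Example: alternate two logistic maps with period 2 (parameters are arbitrary).
exponent = lyapunov_periodic([logistic(3.2), logistic(3.9)])
```

    Passing a list with a single map recovers the usual estimate for that map alone, which gives a baseline to compare against the periodic combination.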

  5. Modeling RNA Secondary Structure with Sequence Comparison and Experimental Mapping Data.

    PubMed

    Tan, Zhen; Sharma, Gaurav; Mathews, David H

    2017-07-25

    Secondary structure prediction is an important problem in RNA bioinformatics because knowledge of structure is critical to understanding the functions of RNA sequences. Significant improvements in prediction accuracy have recently been demonstrated through the incorporation of experimentally obtained structural information, for instance using selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) mapping. However, such mapping data is currently available only for a limited number of RNA sequences. In this article, we present a method for extending the benefit of experimental mapping data in secondary structure prediction to homologous sequences. Specifically, we propose a method for integrating experimental mapping data into a comparative sequence analysis algorithm for secondary structure prediction of multiple homologs, whereby the mapping data benefits not only the prediction for the specific sequence that was mapped but also other homologs. The proposed method is realized by modifying the TurboFold II algorithm for prediction of RNA secondary structures to utilize basepairing probabilities guided by SHAPE experimental data when such data are available. The SHAPE-mapping-guided basepairing probabilities are obtained using the RSample method. Results demonstrate that the SHAPE mapping data for a sequence improves structure prediction accuracy of other homologous sequences beyond the accuracy obtained by sequence comparison alone (TurboFold II). The updated version of TurboFold II is freely available as part of the RNAstructure software package. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  6. Integrated and sequence-ordered BAC- and YAC-based physical maps for the rat genome.

    PubMed

    Krzywinski, Martin; Wallis, John; Gösele, Claudia; Bosdet, Ian; Chiu, Readman; Graves, Tina; Hummel, Oliver; Layman, Dan; Mathewson, Carrie; Wye, Natasja; Zhu, Baoli; Albracht, Derek; Asano, Jennifer; Barber, Sarah; Brown-John, Mabel; Chan, Susanna; Chand, Steve; Cloutier, Alison; Davito, Jonathon; Fjell, Chris; Gaige, Tony; Ganten, Detlev; Girn, Noreen; Guggenheimer, Kurtis; Himmelbauer, Heinz; Kreitler, Thomas; Leach, Stephen; Lee, Darlene; Lehrach, Hans; Mayo, Michael; Mead, Kelly; Olson, Teika; Pandoh, Pawan; Prabhu, Anna-Liisa; Shin, Heesun; Tänzer, Simone; Thompson, Jason; Tsai, Miranda; Walker, Jason; Yang, George; Sekhon, Mandeep; Hillier, LaDeana; Zimdahl, Heike; Marziali, Andre; Osoegawa, Kazutoyo; Zhao, Shaying; Siddiqui, Asim; de Jong, Pieter J; Warren, Wes; Mardis, Elaine; McPherson, John D; Wilson, Richard; Hübner, Norbert; Jones, Steven; Marra, Marco; Schein, Jacqueline

    2004-04-01

    As part of the effort to sequence the genome of Rattus norvegicus, we constructed a physical map comprised of fingerprinted bacterial artificial chromosome (BAC) clones from the CHORI-230 BAC library. These BAC clones provide approximately 13-fold redundant coverage of the genome and have been assembled into 376 fingerprint contigs. A yeast artificial chromosome (YAC) map was also constructed and aligned with the BAC map via fingerprinted BAC and P1 artificial chromosome clones (PACs) sharing interspersed repetitive sequence markers with the YAC-based physical map. We have annotated 95% of the fingerprint map clones in contigs with coordinates on the version 3.1 rat genome sequence assembly, using BAC-end sequences and in silico mapping methods. These coordinates have allowed anchoring 358 of the 376 fingerprint map contigs onto the sequence assembly. Of these, 324 contigs are anchored to rat genome sequences localized to chromosomes, and 34 contigs are anchored to unlocalized portions of the rat sequence assembly. The remaining 18 contigs, containing 54 clones, still require placement. The fingerprint map is a high-resolution integrative data resource that provides genome-ordered associations among BAC, YAC, and PAC clones and the assembled sequence of the rat genome.

  7. HetMappsS: Heterozygous mapping strategy for high resolution Genotyping-by-Sequencing Markers

    USDA-ARS's Scientific Manuscript database

    Reduced representation genotyping approaches, such as genotyping-by-sequencing (GBS), provide opportunities to generate high-resolution genetic maps at a low per-sample cost. However, missing data and non-uniform sequence coverage can complicate map creation in highly heterozygous species. To facili...

  8. Sequencing the Pig Genome Using a Mapped BAC by BAC Approach

    USDA-ARS's Scientific Manuscript database

    We have generated a highly contiguous physical map covering >98% of the pig genome in just 176 contigs. The map is localised to the genome through integration with the UIUC RH map as well as BAC end sequence alignments to the human genome. Over 265k HindIII restriction digest fingerprints totalling 1...

  9. The widely used Nicotiana benthamiana 16c line has an unusual T-DNA integration pattern including a transposon sequence.

    PubMed

    Philips, Joshua G; Naim, Fatima; Lorenc, Michał T; Dudley, Kevin J; Hellens, Roger P; Waterhouse, Peter M

    2017-01-01

    Nicotiana benthamiana is employed around the world for many types of research and one transgenic line has been used more extensively than any other. This line, 16c, expresses the Aequorea victoria green fluorescent protein (GFP), highly and constitutively, and has been a major resource for visualising the mobility and actions of small RNAs. Insights into the mechanisms studied at a molecular level in N. benthamiana 16c are likely to be deeper and more accurate with a greater knowledge of the GFP gene integration site. Therefore, using next generation sequencing, genome mapping and local alignment, we identified the location and characteristics of the integrated T-DNA. As suggested from previous molecular hybridisation and inheritance data, the transgenic line contains a single GFP-expressing locus. However, the GFP coding sequence differs from that originally reported. Furthermore, a 3.2 kb portion of a transposon appears to have co-integrated with the T-DNA. The location of the integration mapped to a region of the genome represented by Nbv0.5scaffold4905 in the www.benthgenome.com assembly, and with less integrity to Niben101Scf03641 in the www.solgenomics.net assembly. The transposon is not endogenous to laboratory strains of N. benthamiana or Agrobacterium tumefaciens strain GV3101 (MP90), which was reportedly used in the generation of line 16c. However, it is present in the popular LBA4404 strain. The integrated transposon sequence includes its 5' terminal repeat and a transposase gene, and is immediately adjacent to the GFP gene. This unexpected genetic arrangement may contribute to the characteristics that have made the 16c line such a popular research tool, and it alerts researchers taking transgenic plants to commercial release to be aware of this genomic hitchhiker.

  10. The widely used Nicotiana benthamiana 16c line has an unusual T-DNA integration pattern including a transposon sequence

    PubMed Central

    Lorenc, Michał T.; Dudley, Kevin J.; Hellens, Roger P.

    2017-01-01

    Nicotiana benthamiana is employed around the world for many types of research and one transgenic line has been used more extensively than any other. This line, 16c, expresses the Aequorea victoria green fluorescent protein (GFP), highly and constitutively, and has been a major resource for visualising the mobility and actions of small RNAs. Insights into the mechanisms studied at a molecular level in N. benthamiana 16c are likely to be deeper and more accurate with a greater knowledge of the GFP gene integration site. Therefore, using next generation sequencing, genome mapping and local alignment, we identified the location and characteristics of the integrated T-DNA. As suggested from previous molecular hybridisation and inheritance data, the transgenic line contains a single GFP-expressing locus. However, the GFP coding sequence differs from that originally reported. Furthermore, a 3.2 kb portion of a transposon appears to have co-integrated with the T-DNA. The location of the integration mapped to a region of the genome represented by Nbv0.5scaffold4905 in the www.benthgenome.com assembly, and with less integrity to Niben101Scf03641 in the www.solgenomics.net assembly. The transposon is not endogenous to laboratory strains of N. benthamiana or Agrobacterium tumefaciens strain GV3101 (MP90), which was reportedly used in the generation of line 16c. However, it is present in the popular LBA4404 strain. The integrated transposon sequence includes its 5’ terminal repeat and a transposase gene, and is immediately adjacent to the GFP gene. This unexpected genetic arrangement may contribute to the characteristics that have made the 16c line such a popular research tool, and it alerts researchers taking transgenic plants to commercial release to be aware of this genomic hitchhiker. PMID:28231340

  11. Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus

    PubMed Central

    Raubeson, Linda A; Peery, Rhiannon; Chumley, Timothy W; Dziubek, Chris; Fourcade, H Matthew; Boore, Jeffrey L; Jansen, Robert K

    2007-01-01

    Background: The number of completely sequenced plastid genomes available is growing rapidly. This array of sequences presents new opportunities to perform comparative analyses. In comparative studies, it is often useful to compare across wide phylogenetic spans and, within angiosperms, to include representatives from basally diverging lineages such as the genomes reported here: Nuphar advena (from a basal-most lineage) and Ranunculus macranthus (a basal eudicot). We report these two new plastid genome sequences and make comparisons (within angiosperms, seed plants, or all photosynthetic lineages) to evaluate features such as the status of ycf15 and ycf68 as protein coding genes, the distribution of simple sequence repeats (SSRs) and longer dispersed repeats (SDR), and patterns of nucleotide composition. Results: The Nuphar [GenBank:NC_008788] and Ranunculus [GenBank:NC_008796] plastid genomes share characteristics of gene content and organization with many other chloroplast genomes. Like other plastid genomes, these genomes are A+T-rich, except for rRNA and tRNA genes. Detailed comparisons of Nuphar with Nymphaea, another Nymphaeaceae, show that more than two-thirds of these genomes exhibit at least 95% sequence identity and that most SSRs are shared. In broader comparisons, SSRs vary among genomes in terms of abundance and length and most contain repeat motifs based on A and T nucleotides. Conclusion: SSR and SDR abundance varies by genome and, for SSRs, is proportional to genome size. Long SDRs are rare in the genomes assessed. SSRs occur less frequently than predicted and, although the majority of the repeat motifs do include A and T nucleotides, the A+T bias in SSRs is less than that predicted from the underlying genomic nucleotide composition. In codon usage third positions show an A+T bias; however, variation in codon usage does not correlate with differences in A+T-richness. Thus, although plastome nucleotide composition shows "A+T richness", an A+T bias is not

  12. Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus.

    PubMed

    Raubeson, Linda A; Peery, Rhiannon; Chumley, Timothy W; Dziubek, Chris; Fourcade, H Matthew; Boore, Jeffrey L; Jansen, Robert K

    2007-06-15

    The number of completely sequenced plastid genomes available is growing rapidly. This array of sequences presents new opportunities to perform comparative analyses. In comparative studies, it is often useful to compare across wide phylogenetic spans and, within angiosperms, to include representatives from basally diverging lineages such as the genomes reported here: Nuphar advena (from a basal-most lineage) and Ranunculus macranthus (a basal eudicot). We report these two new plastid genome sequences and make comparisons (within angiosperms, seed plants, or all photosynthetic lineages) to evaluate features such as the status of ycf15 and ycf68 as protein coding genes, the distribution of simple sequence repeats (SSRs) and longer dispersed repeats (SDR), and patterns of nucleotide composition. The Nuphar [GenBank:NC_008788] and Ranunculus [GenBank:NC_008796] plastid genomes share characteristics of gene content and organization with many other chloroplast genomes. Like other plastid genomes, these genomes are A+T-rich, except for rRNA and tRNA genes. Detailed comparisons of Nuphar with Nymphaea, another Nymphaeaceae, show that more than two-thirds of these genomes exhibit at least 95% sequence identity and that most SSRs are shared. In broader comparisons, SSRs vary among genomes in terms of abundance and length and most contain repeat motifs based on A and T nucleotides. SSR and SDR abundance varies by genome and, for SSRs, is proportional to genome size. Long SDRs are rare in the genomes assessed. SSRs occur less frequently than predicted and, although the majority of the repeat motifs do include A and T nucleotides, the A+T bias in SSRs is less than that predicted from the underlying genomic nucleotide composition. In codon usage third positions show an A+T bias; however, variation in codon usage does not correlate with differences in A+T-richness. Thus, although plastome nucleotide composition shows "A+T richness", an A+T bias is not apparent upon more in

  13. Mapping Nucleotide Sequences that Encode Complex Binary Disease Traits with HapMap

    PubMed Central

    Cui, Yuehua; Fu, Wenjiang; Sun, Kelian; Romero, Roberto; Wu, Rongling

    2007-01-01

    Detecting the patterns of DNA sequence variants across the human genome is a crucial step for unraveling the genetic basis of complex human diseases. The human HapMap constructed by single nucleotide polymorphisms (SNPs) provides efficient sequence variation information that can speed up the discovery of genes related to common diseases. In this article, we present a generalized linear model for identifying specific nucleotide variants that encode complex human diseases. A novel approach is derived to group haplotypes to form composite diplotypes, which largely reduces the model degrees of freedom for an association test and hence increases the power when multiple SNP markers are involved. An efficient two-stage estimation procedure based on the expectation-maximization (EM) algorithm is derived to estimate parameters. Non-genetic environmental or clinical risk factors can also be fitted into the model. Computer simulations show that our model has reasonable power and type I error rate with appropriate sample size. It is also suggested through simulations that a balanced design with approximately equal number of cases and controls should be preferred to maintain small estimation bias and reasonable testing power. To illustrate the utility, we apply the method to a genetic association study of large for gestational age (LGA) neonates. The model provides a powerful tool for elucidating the genetic basis of complex binary diseases. PMID:19384427

  14. Mapping membrane activity in undiscovered peptide sequence space using machine learning.

    PubMed

    Lee, Ernest Y; Fulan, Benjamin M; Wong, Gerard C L; Ferguson, Andrew L

    2016-11-29

    There are some ∼1,100 known antimicrobial peptides (AMPs), which permeabilize microbial membranes but have diverse sequences. Here, we develop a support vector machine (SVM)-based classifier to investigate α-helical AMPs and the interrelated nature of their functional commonality and sequence homology. SVM is used to search the undiscovered peptide sequence space and identify Pareto-optimal candidates that simultaneously maximize the distance σ from the SVM hyperplane (thus maximize its "antimicrobialness") and its α-helicity, but minimize mutational distance to known AMPs. By calibrating SVM machine learning results with killing assays and small-angle X-ray scattering (SAXS), we find that the SVM metric σ correlates not with a peptide's minimum inhibitory concentration (MIC), but rather its ability to generate negative Gaussian membrane curvature. This surprising result provides a topological basis for membrane activity common to AMPs. Moreover, we highlight an important distinction between the maximal recognizability of a sequence to a trained AMP classifier (its ability to generate membrane curvature) and its maximal antimicrobial efficacy. As mutational distances are increased from known AMPs, we find AMP-like sequences that are increasingly difficult for nature to discover via simple mutation. Using the sequence map as a discovery tool, we find an unexpectedly diverse taxonomy of sequences that are just as membrane-active as known AMPs, but with a broad range of primary functions distinct from AMP functions, including endogenous neuropeptides, viral fusion proteins, topogenic peptides, and amyloids. The SVM classifier is useful as a general detector of membrane activity in peptide sequences.
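
    The Pareto-optimal selection described in this abstract can be illustrated with a minimal sketch. It assumes each candidate has already been scored as a tuple (σ, helicity, mutational distance); the scores below are made-up placeholders, and the function names dominates and pareto_front are hypothetical, not the authors' code.

```python
def dominates(a, b):
    """Return True if candidate a dominates candidate b.

    Each candidate is (sigma, helicity, distance): sigma and helicity are
    maximized, mutational distance is minimized.
    """
    no_worse = a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Placeholder scores for four hypothetical peptides (not real data).
scores = [
    (1.0, 0.90, 3),
    (0.8, 0.95, 2),
    (0.5, 0.50, 5),
    (1.0, 0.90, 4),
]
front = pareto_front(scores)
```

    The pairwise comparison is quadratic in the number of candidates, which is adequate for a sketch but would need a faster method for a large sequence search.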

  15. A sequencing-based linkage map of cucumber

    USDA-ARS's Scientific Manuscript database

    Genetic maps are important tools for molecular breeding, gene cloning, and study of meiotic recombination. In cucumber (Cucumis sativus L.), the marker density, resolution and genome coverage of previously developed genetic maps using PCR-based molecular markers are relatively low. In this study we ...

  16. A mapping of an ensemble of mitochondrial sequences for various organisms into 3D space based on the word composition.

    PubMed

    Aita, Takuyo; Nishigaki, Koichi

    2012-11-01

    To visualize a bird's-eye view of an ensemble of mitochondrial genome sequences for various species, we recently developed a novel method of mapping a biological sequence ensemble into three-dimensional (3D) vector space. First, we represented a biological sequence of a species s by a word-composition vector x(s), where its length |x(s)| represents the sequence length, its unit vector x(s)/|x(s)| represents the relative composition of the K-tuple words through the sequence, and the size of the dimension, N = 4^K, is the number of all possible words of length K. Second, we mapped the vector x(s) to the 3D position vector y(s), based on the following two simple principles: (1) |y(s)| = |x(s)| and (2) the angle between y(s) and y(t) maximally correlates with the angle between x(s) and x(t). The mitochondrial genome sequences for 311 species, including 177 Animalia, 85 Fungi and 49 Green plants, were mapped into 3D space by using K=7. The mapping was successful because the angles between vectors before and after the mapping highly correlated with each other (correlation coefficients were 0.92-0.97). Interestingly, the Animalia kingdom is distributed along a single arc belt (just like the Milky Way on a Celestial Globe), and the Fungi and Green plant kingdoms are distributed in a similar arc belt. These two arc belts intersect at their respective middle regions and form a cross structure just like a jet aircraft fuselage and its wings. This new mapping method will allow researchers to intuitively interpret the visual information presented in the maps in a highly effective manner.
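
    The first step of this method, building a word-composition vector and comparing two sequences by the angle between their vectors, can be sketched as follows. This is an illustration under assumptions: it uses raw overlapping K-mer counts over the alphabet ACGT, the abstract does not say which norm the authors use for the vector length, and the 3D embedding step is not reproduced. The names word_composition and angle are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def word_composition(seq, k):
    """Count vector over all 4**k DNA words of length k.

    Words are counted in overlapping windows; windows containing letters
    other than A, C, G, T are ignored.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ("".join(p) for p in product("ACGT", repeat=k))
    return [counts.get(w, 0) for w in words]


def angle(u, v):
    """Angle in radians between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to absorb floating-point rounding before acos.
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


# Two toy sequences (placeholders, not real mitochondrial genomes).
x_s = word_composition("ACGTACGTAC", 2)
x_t = word_composition("ACGTTTGTAC", 2)
theta = angle(x_s, x_t)
```

    With K=7, as in the abstract, each vector has 4^7 = 16,384 components, so a sparse representation may be preferable for many genomes.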

  17. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats.

    PubMed

    Baud, Amelie; Hermsen, Roel; Guryev, Victor; Stridh, Pernilla; Graham, Delyth; McBride, Martin W; Foroud, Tatiana; Calderari, Sophie; Diez, Margarita; Ockinger, Johan; Beyeen, Amennai D; Gillett, Alan; Abdelmagid, Nada; Guerreiro-Cacais, Andre Ortlieb; Jagodic, Maja; Tuncel, Jonatan; Norin, Ulrika; Beattie, Elisabeth; Huynh, Ngan; Miller, William H; Koller, Daniel L; Alam, Imranul; Falak, Samreen; Osborne-Pellegrin, Mary; Martinez-Membrives, Esther; Canete, Toni; Blazquez, Gloria; Vicens-Costa, Elia; Mont-Cardona, Carme; Diaz-Moran, Sira; Tobena, Adolf; Hummel, Oliver; Zelenika, Diana; Saar, Kathrin; Patone, Giannino; Bauerfeind, Anja; Bihoreau, Marie-Therese; Heinig, Matthias; Lee, Young-Ae; Rintisch, Carola; Schulz, Herbert; Wheeler, David A; Worley, Kim C; Muzny, Donna M; Gibbs, Richard A; Lathrop, Mark; Lansu, Nico; Toonen, Pim; Ruzius, Frans Paul; de Bruijn, Ewart; Hauser, Heidi; Adams, David J; Keane, Thomas; Atanur, Santosh S; Aitman, Tim J; Flicek, Paul; Malinauskas, Tomas; Jones, E Yvonne; Ekman, Diana; Lopez-Aumatell, Regina; Dominiczak, Anna F; Johannesson, Martina; Holmdahl, Rikard; Olsson, Tomas; Gauguier, Dominique; Hubner, Norbert; Fernandez-Teruel, Alberto; Cuppen, Edwin; Mott, Richard; Flint, Jonathan

    2013-07-01

    Genetic mapping on fully sequenced individuals is transforming understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We identify 35 causal genes involved in 31 phenotypes, implicating new genes in models of anxiety, heart disease and multiple sclerosis. The relationship between sequence and genetic variation is unexpectedly complex: at approximately 40% of quantitative trait loci, a single sequence variant cannot account for the phenotypic effect. Using comparable sequence and mapping data from mice, we show that the extent and spatial pattern of variation in inbred rats differ substantially from those of inbred mice and that the genetic variants in orthologous genes rarely contribute to the same phenotype in both species.

  18. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats

    PubMed Central

    Baud, Amelie; Hermsen, Roel; Guryev, Victor; Stridh, Pernilla; Graham, Delyth; McBride, Martin W.; Foroud, Tatiana; Calderari, Sophie; Diez, Margarita; Ockinger, Johan; Beyeen, Amennai D.; Gillett, Alan; Abdelmagid, Nada; Guerreiro-Cacais, Andre Ortlieb; Jagodic, Maja; Tuncel, Jonatan; Norin, Ulrika; Beattie, Elisabeth; Huynh, Ngan; Miller, William H.; Koller, Daniel L.; Alam, Imranul; Falak, Samreen; Osborne-Pellegrin, Mary; Martinez-Membrives, Esther; Canete, Toni; Blazquez, Gloria; Vicens-Costa, Elia; Mont-Cardona, Carme; Diaz-Moran, Sira; Tobena, Adolf; Hummel, Oliver; Zelenika, Diana; Saar, Kathrin; Patone, Giannino; Bauerfeind, Anja; Bihoreau, Marie-Therese; Heinig, Matthias; Lee, Young-Ae; Rintisch, Carola; Schulz, Herbert; Wheeler, David A.; Worley, Kim C.; Muzny, Donna M.; Gibbs, Richard A.; Lathrop, Mark; Lansu, Nico; Toonen, Pim; Ruzius, Frans Paul; de Bruijn, Ewart; Hauser, Heidi; Adams, David J.; Keane, Thomas; Atanur, Santosh S.; Aitman, Tim J.; Flicek, Paul; Malinauskas, Tomas; Jones, E. Yvonne; Ekman, Diana; Lopez-Aumatell, Regina; Dominiczak, Anna F; Johannesson, Martina; Holmdahl, Rikard; Olsson, Tomas; Gauguier, Dominique; Hubner, Norbert; Fernandez-Teruel, Alberto; Cuppen, Edwin; Mott, Richard; Flint, Jonathan

    2013-01-01

    Genetic mapping on fully sequenced individuals is transforming our understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We identify 35 causal genes involved in 31 phenotypes, implicating novel genes in models of anxiety, heart disease and multiple sclerosis. The relation between sequence and genetic variation is unexpectedly complex: at approximately 40% of quantitative trait loci a single sequence variant cannot account for the phenotypic effect. Using comparable sequence and mapping data from mice, we show the extent and spatial pattern of variation in inbred rats differ significantly from those of inbred mice, and that the genetic variants in orthologous genes rarely contribute to the same phenotype in both species. PMID:23708188

  19. The organization of biological sequences into constrained and unconstrained parts determines fundamental properties of genotype–phenotype maps

    PubMed Central

    Greenbury, S. F.; Ahnert, S. E.

    2015-01-01

    Biological information is stored in DNA, RNA and protein sequences, which can be understood as genotypes that are translated into phenotypes. The properties of genotype–phenotype (GP) maps have been studied in great detail for RNA secondary structure. These include a highly biased distribution of genotypes per phenotype, negative correlation of genotypic robustness and evolvability, positive correlation of phenotypic robustness and evolvability, shape-space covering, and a roughly logarithmic scaling of phenotypic robustness with phenotypic frequency. More recently, similar properties have been discovered in other GP maps, suggesting that they may be fundamental to biological GP maps in general, rather than specific to the RNA secondary structure map. Here we propose that the above properties arise from the fundamental organization of biological information into ‘constrained’ and ‘unconstrained’ sequences, in the broadest possible sense. As ‘constrained’ we describe sequences that affect the phenotype more immediately, and are therefore more sensitive to mutations, such as protein-coding DNA or the stems in RNA secondary structure. ‘Unconstrained’ sequences, on the other hand, can mutate more freely without affecting the phenotype, such as intronic or intergenic DNA or the loops in RNA secondary structure. To test our hypothesis we consider a highly simplified GP map that has genotypes with ‘coding’ and ‘non-coding’ parts. We term this the Fibonacci GP map, as it is equivalent to the Fibonacci code in information theory. Despite its simplicity the Fibonacci GP map exhibits all the above properties of much more complex and biologically realistic GP maps. These properties are therefore likely to be fundamental to many biological GP maps. PMID:26609063
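
    The abstract does not spell out the rule that splits a genotype into coding and non-coding parts. The Python sketch below is one plausible reading, assuming binary genotypes and assuming that the coding part ends at the first occurrence of "11", as codewords do in Fibonacci coding. The function name and the genotype length are hypothetical choices for illustration, not the authors' definition. It shows how such a rule yields a strongly biased number of genotypes per phenotype.

```python
from collections import Counter
from itertools import product

def phenotype(genotype: str) -> str:
    """Assumed rule: the coding part is the prefix up to and including the
    first '11'; genotypes without any '11' map to the empty phenotype."""
    stop = genotype.find("11")
    return genotype[:stop + 2] if stop != -1 else ""

# Enumerate every binary genotype of a small, arbitrary length.
length = 6
counts = Counter(phenotype("".join(bits)) for bits in product("01", repeat=length))

# Short coding parts leave more freely mutable positions, so they are
# realised by more genotypes than long coding parts.
for pheno, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(repr(pheno), n)
```

    Under this assumed rule, each extra coding position halves the number of genotypes that map to a phenotype, which is the kind of bias the abstract describes.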

  20. The organization of biological sequences into constrained and unconstrained parts determines fundamental properties of genotype-phenotype maps.

    PubMed

    Greenbury, S F; Ahnert, S E

    2015-12-06

    Biological information is stored in DNA, RNA and protein sequences, which can be understood as genotypes that are translated into phenotypes. The properties of genotype-phenotype (GP) maps have been studied in great detail for RNA secondary structure. These include a highly biased distribution of genotypes per phenotype, negative correlation of genotypic robustness and evolvability, positive correlation of phenotypic robustness and evolvability, shape-space covering, and a roughly logarithmic scaling of phenotypic robustness with phenotypic frequency. More recently, similar properties have been discovered in other GP maps, suggesting that they may be fundamental to biological GP maps in general, rather than specific to the RNA secondary structure map. Here we propose that the above properties arise from the fundamental organization of biological information into 'constrained' and 'unconstrained' sequences, in the broadest possible sense. As 'constrained' we describe sequences that affect the phenotype more immediately, and are therefore more sensitive to mutations, such as protein-coding DNA or the stems in RNA secondary structure. 'Unconstrained' sequences, on the other hand, can mutate more freely without affecting the phenotype, such as intronic or intergenic DNA or the loops in RNA secondary structure. To test our hypothesis we consider a highly simplified GP map that has genotypes with 'coding' and 'non-coding' parts. We term this the Fibonacci GP map, as it is equivalent to the Fibonacci code in information theory. Despite its simplicity the Fibonacci GP map exhibits all the above properties of much more complex and biologically realistic GP maps. These properties are therefore likely to be fundamental to many biological GP maps.

  1. Genomics and introgression: discovery and mapping of thousands of species-diagnostic SNPs using RAD sequencing

    USGS Publications Warehouse

    Hand, Brian K; Hether, Tyler D; Kovach, Ryan P.; Muhlfeld, Clint C.; Amish, Stephen J.; Boyer, Matthew C.; O’Rourke, Sean M.; Miller, Michael R.; Lowe, Winsor H.; Hohenlohe, Paul A.; Luikart, Gordon

    2015-01-01

    Invasive hybridization and introgression pose a serious threat to the persistence of many native species. Understanding the effects of hybridization on native populations (e.g., fitness consequences) requires numerous species-diagnostic loci distributed genome-wide. Here we used RAD sequencing to discover thousands of single-nucleotide polymorphisms (SNPs) that are diagnostic between rainbow trout (RBT, Oncorhynchus mykiss), the world’s most widely introduced fish, and native westslope cutthroat trout (WCT, O. clarkii lewisi) in the northern Rocky Mountains, USA. We advanced previous work that identified 4,914 species-diagnostic loci by using longer sequence reads (100 bp vs. 60 bp) and a larger set of individuals (n = 84). We sequenced RAD libraries for individuals from diverse sampling sources, including native populations of WCT and hatchery broodstocks of WCT and RBT. We also took advantage of a newly released reference genome assembly for RBT to align our RAD loci. In total, we discovered 16,788 putatively diagnostic SNPs, 10,267 of which we mapped to anchored chromosome locations on the RBT genome. A small portion of previously discovered putative diagnostic loci (325 of 4,914) were no longer diagnostic (i.e., fixed between species) based on our wider survey of non-hybridized RBT and WCT individuals. Our study suggests that RAD loci mapped to a draft genome assembly could provide the marker density required to identify genes and chromosomal regions influencing selection in admixed populations of conservation concern and evolutionary interest.
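
    A minimal Python sketch of what "species-diagnostic" means in the abstract above: a locus counts as diagnostic when each species is fixed for a different allele across all non-hybridized individuals sampled. The data layout, function name and toy genotypes are hypothetical illustrations, not the authors' pipeline.

```python
from typing import Dict, List

def diagnostic_loci(rbt: Dict[str, List[str]], wct: Dict[str, List[str]]) -> List[str]:
    """Return loci where each species is fixed for a different allele.

    rbt / wct map a locus name to the alleles observed in non-hybridized
    individuals of each species (hypothetical layout for illustration).
    """
    result = []
    for locus in sorted(rbt.keys() & wct.keys()):
        rbt_alleles = set(rbt[locus])
        wct_alleles = set(wct[locus])
        # Fixed within each species, and the two fixed alleles differ.
        if len(rbt_alleles) == 1 and len(wct_alleles) == 1 and rbt_alleles != wct_alleles:
            result.append(locus)
    return result

# Toy data: locus_2 stops being diagnostic once a wider survey finds "A" in WCT.
rbt_calls = {"locus_1": ["A", "A", "A"], "locus_2": ["A", "A", "A"]}
wct_calls = {"locus_1": ["G", "G", "G"], "locus_2": ["T", "T", "A"]}
print(diagnostic_loci(rbt_calls, wct_calls))
```

    The second toy locus mirrors the situation reported for 325 of the 4,914 earlier loci: sampling more individuals can reveal a shared allele and remove a locus from the diagnostic set.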

  2. The evolution of morbilliviruses: a comparison of nucleocapsid gene sequences including a porpoise morbillivirus.

    PubMed

    Rima, B K; Wishaupt, R G; Welsh, M J; Earle, J A

    1995-05-01

    Sequence data for the nucleocapsid protein (N) gene of the porpoise morbillivirus, including the very conserved middle section of the protein and the hypervariable C terminus, are reported. Analysis of dissimilarity indices based on an alignment of the N proteins of various morbilliviruses identifies a variable region of the N protein from amino acid residues 121 to 145 and a hypervariable part from amino acids 400 to 517. This type of analysis can be usefully applied when protein sequences of five or more morbillivirus species are available. Regions of variability between species identified by this index also represent regions of variation within one species, e.g. measles virus (MV). Hence, comparative analysis of different morbilliviruses provides an insight into the potentially variable parts of viral proteins. From the great and unexplained nucleotide sequence conservation observed within MV, it would appear that the various morbilliviruses diverged from each other a very long time ago. However, the data do not yet allow us to estimate the time span of these divergences. The relatedness and the number of different morbillivirus species provide a unique database for study of the evolution of RNA viruses.
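
    The abstract does not define the dissimilarity index it uses. As a rough illustration of how variable regions can be read off an alignment, the Python sketch below assumes a simple stand-in measure: the fraction of sequence pairs that differ at each alignment column. The function name and toy alignment are hypothetical.

```python
from itertools import combinations
from typing import List

def column_dissimilarity(aligned: List[str]) -> List[float]:
    """Fraction of sequence pairs that differ at each alignment column."""
    if len(aligned) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(combinations(range(len(aligned)), 2))
    scores = []
    for col in range(length):
        mismatches = sum(aligned[i][col] != aligned[j][col] for i, j in pairs)
        scores.append(mismatches / len(pairs))
    return scores

# Toy protein alignment: column 2 is fully variable, column 4 partly variable.
toy_alignment = ["MKTAL", "MKSAL", "MKGAV"]
print(column_dissimilarity(toy_alignment))
```

    Runs of high-scoring columns would mark candidate variable regions; with only a few sequences the score is coarse, which fits the abstract's remark that five or more species are needed.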

  3. Theoretical description of depth pulse sequences, on and off resonance, including improvements and extensions thereof.

    PubMed

    Bendall, M R; Pegg, D T

    1985-04-01

    A general mathematical description of depth pulse sequences in terms of rotation matrices permits a single matrix, known as a cycle matrix, to be written down for each phase-cycled pulse in the overall sequence, such that the result for the total phase-cycled sequence is the product of the individual cycle matrices. It is straightforward to include the effect of the tilted rf axis off resonance and obtain exact solutions. The two types of phase-cycled pulse used in a depth pulse scheme are 2θ[±x] and 2θ[±x, ±y] and, for the general off-resonance case, four of the off-diagonal elements in the 2θ[±x] cycle matrix, and all of the off-diagonal elements in the 2θ[±x, ±y] cycle matrix, are zero. These simplifications enable important improvements of depth pulse schemes for the elimination of high-flux signals, the reduction of signals from sample regions experiencing pulse angles differing from 90°, and the avoidance of deleterious off-resonance effects such as the production of dispersion signals. In all cases, the dependence of signal intensity off resonance can be easily and exactly calculated. There are important applications in in vivo spectroscopy.

  4. Old can be new again: HAPPY whole genome sequencing, mapping and assembly.

    PubMed

    Jiang, Zhihua; Rokhsar, Daniel S; Harland, Richard M

    2009-01-01

    During the last three decades, both genome mapping and sequencing methods have advanced significantly to provide a foundation for scientists to understand genome structures and functions in many species. Generally speaking, genome mapping relies on genome sequencing to provide basic materials, such as DNA probes and markers for their localizations, thus constructing the maps. On the other hand, genome sequencing often requires a high-resolution map as a skeleton for whole genome assembly. However, genome mapping and sequencing have never been combined in one pipeline. After reviewing mapping and next-generation sequencing methods, we would like to share our thoughts with the genome community on how to combine the HAPPY mapping technique with next-generation sequencing, thus integrating two systems into one pipeline, called the HAPPY pipeline. The pipeline starts with preparation of a HAPPY panel, followed by multiple displacement amplification for producing a relatively large quantity of DNA. Instead of conventional marker genotyping, the amplified panel DNA samples are subjected to next-generation sequencing with a barcode method, which allows us to determine the presence/absence of a sequence contig as a traditional marker in the HAPPY panel. Statistical analysis will then be performed to infer how close or how far away from each other these contigs are within a genome and to order the whole genome sequence assembly as well. We believe that such a universal approach will play an important role in genome sequencing, mapping, and assembly of many species, thus advancing genome science and its applications in biomedicine and agriculture.
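
    The statistical step is not specified in the abstract. The Python sketch below illustrates the underlying idea under a simplifying assumption: contigs that lie close together tend to be present in the same panel members, so a co-occurrence score can rank candidate neighbours. The Jaccard index used here, the function name and the toy panel are hypothetical stand-ins rather than the authors' statistic.

```python
from typing import Dict, List, Set, Tuple

def cosegregation(presence: Dict[str, Set[int]]) -> List[Tuple[str, str, float]]:
    """Score each contig pair by the Jaccard index of the panel members
    (identified by integer IDs) in which the contigs were detected."""
    names = sorted(presence)
    scores = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            union = presence[a] | presence[b]
            shared = presence[a] & presence[b]
            score = len(shared) / len(union) if union else 0.0
            scores.append((a, b, score))
    # Highest co-segregation first: the best candidates for being neighbours.
    return sorted(scores, key=lambda item: item[2], reverse=True)

# Toy presence/absence calls from barcoded sequencing of eight panel members.
panel = {
    "contig_A": {1, 2, 3, 5},
    "contig_B": {1, 2, 3, 6},
    "contig_C": {4, 7, 8},
}
for a, b, score in cosegregation(panel):
    print(a, b, round(score, 2))
```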

  5. DNA sequence analyses of blended herbal products including synthetic cannabinoids as designer drugs.

    PubMed

    Ogata, Jun; Uchiyama, Nahoko; Kikura-Hanajiri, Ruri; Goda, Yukihiro

    2013-04-10

    In recent years, various herbal products adulterated with synthetic cannabinoids have been distributed worldwide via the Internet. These herbal products are mostly sold as incense, and advertised as not for human consumption. Although their labels indicate that they contain mixtures of several potentially psychoactive plants, and numerous studies have reported that they contain a variety of synthetic cannabinoids, their exact botanical contents are not always clear. In this study, we investigated the origins of botanical materials in 62 Spice-like herbal products distributed on the illegal drug market in Japan, by DNA sequence analyses and BLAST searches. The nucleotide sequences of four regions were analyzed to identify the origins of each plant species in the herbal mixtures. The sequences of "Damiana" (Turnera diffusa) and Lamiaceae herbs (Melissa, Mentha and Thymus) were frequently detected in a number of products. However, the sequences of other plant species indicated on the packaging labels were not detected. In a few products, DNA fragments of potent psychotropic plants were found, including marijuana (Cannabis sativa), "Diviner's Sage" (Salvia divinorum) and "Kratom" (Mitragyna speciosa). Their active constituents were also confirmed using gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS), although these plant names were never indicated on the labels. Most plant species identified in the products were different from the plants indicated on the labels. The plant materials would be used mainly as diluents for the psychoactive synthetic compounds, because no reliable psychoactive effects have been reported for most of the identified plants, with the exception of the psychotropic plants named above.

  6. Secondary Structure Predictions for Long RNA Sequences Based on Inversion Excursions and MapReduce.

    PubMed

    Yehdego, Daniel T; Zhang, Boyu; Kodimala, Vikram K R; Johnson, Kyle L; Taufer, Michela; Leung, Ming-Ying

    2013-05-01

    Secondary structures of ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Experimental observations and computing limitations suggest that we can approach the secondary structure prediction problem for long RNA sequences by segmenting them into shorter chunks, predicting the secondary structures of each chunk individually using existing prediction programs, and then assembling the results to give the structure of the original sequence. The selection of cutting points is a crucial component of the segmenting step. Noting that stem-loops and pseudoknots always contain an inversion, i.e., a stretch of nucleotides followed closely by its inverse complementary sequence, we developed two cutting methods for segmenting long RNA sequences based on inversion excursions: the centered and optimized methods. Each step of searching for inversions, chunking, and predictions can be performed in parallel. In this paper we use a MapReduce framework, i.e., Hadoop, to extensively explore meaningful inversion stem lengths and gap sizes for the segmentation and identify correlations between chunking methods and prediction accuracy. We show that for a set of long RNA sequences in the RFAM database, whose secondary structures are known to contain pseudoknots, our approach predicts secondary structures more accurately than methods that do not segment the sequence, when the latter predictions are possible computationally. We also show that, as sequences exceed certain lengths, some programs cannot computationally predict pseudoknots while our chunking methods can. Overall, our predicted structures still retain the accuracy level of the original prediction programs when compared with known experimental secondary structure.
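
    The definition of an inversion given above (a stretch of nucleotides followed closely by its inverse complementary sequence) can be sketched as a brute-force search. The Python below is only an illustration under stated assumptions: exact Watson-Crick matches (no G-U wobble pairs), a fixed stem length and a maximum gap, with hypothetical function names. It is not the excursion-based scoring or the Hadoop implementation the authors describe.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna: str) -> str:
    """Inverse complementary sequence of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]

def find_inversions(rna: str, stem_len: int, max_gap: int) -> List[Tuple[int, int]]:
    """Return (start, gap) pairs where rna[start:start + stem_len] is followed,
    after `gap` unpaired bases, by its reverse complement."""
    hits = []
    for start in range(len(rna) - 2 * stem_len + 1):
        target = reverse_complement(rna[start:start + stem_len])
        for gap in range(max_gap + 1):
            right = start + stem_len + gap
            if right + stem_len > len(rna):
                break  # the second arm would run past the end of the sequence
            if rna[right:right + stem_len] == target:
                hits.append((start, gap))
    return hits

# Toy sequence: GGGC ... GCCC can pair to form a stem around a UUUU loop.
toy = "AAGGGCUUUUGCCCAA"
print(find_inversions(toy, stem_len=4, max_gap=6))
```

    Because each start position is examined independently, the search splits naturally across workers, which is consistent with the abstract's point that the inversion search can be run in parallel.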

  7. Analysis of the primary sequence and microtubule-binding region of the Drosophila 205K MAP

    PubMed Central

    1990-01-01

    We have sequenced cDNA clones encoding the Drosophila 205K microtubule-associated protein (MAP), a protein that may be the species-specific homologue of mammalian MAP4. The peptide sequence deduced from the longest open-reading frame reveals a hydrophilic protein, which has basic and acidic regions that are similar in organization to mammalian MAP2. Using truncated forms of the 205K MAP, a 232-amino acid region could be defined that is necessary for microtubule binding. The amino acid sequence of this region shares no similarity with the binding motif of MAP2 or tau. We also analyzed several embryonic cDNA clones, which show the existence of differentially spliced mRNAs. Finally, we identified several potential protein kinase target sequences. One of these is distal to the microtubule-binding site and fits the phosphorylation consensus sequence of proteins phosphorylated by the mitosis-specific protein kinase cdc2. Our data suggest that the 205K MAP uses a microtubule-binding motif unlike that found in other MAPs, and also raise the possibility that the activities of the 205K MAP may be regulated by alternative splicing and phosphorylation. PMID:1703540

  8. Physical mapping of complex genomes by sampled sequencing: A theoretical analysis

    SciTech Connect

    Kupfer, K.; Smith, M.; Quackenbush, J.

    1995-05-01

    A method for high-throughput, high-resolution physical mapping of complex genomes and human chromosomes called Genomic Sequence Sampling (GSS) has recently been proposed. This mapping strategy employs high-density cosmid contig assembly over 200-kb to 1-Mb regions of the target genome coupled with DNA sequencing of the cosmid ends. The relative order and spacing of the sequence fragments is determined from the template contig, resulting in a physical map of 1- to 5-kb resolution that contains a substantial portion of the entire sequence at one-pass accuracy. The purpose of this paper is to determine the theoretical parameters for GSS mapping, to evaluate the effectiveness of the contig-building strategy, and to calculate the expected fraction of the target genome that can be recovered as mapped sequence. A novel aspect of the cosmid fingerprinting and contig-building strategy involves determining the orientation of the genomic inserts relative to the cloning vectors, so that the sampled sequence fragments can be mapped with high resolution. The algorithm is based upon complete restriction enzyme digestion, contig assembly by matching fragments, and end-orientation of individual cosmids by determining the best consistent fit of the labeled cosmid end fragments in the consensus restriction map. 32 refs., 7 figs.

  9. Mapping and sequencing DNA using nanopores and nanodetectors.

    PubMed

    Thompson, John F; Oliver, John S

    2012-12-01

    Even prior to the introduction of capillary DNA sequencers, nanopores were discussed as a low-cost, high-throughput substrate for sequencing. Since then, other next-generation sequencing technologies have been developed and achieved widespread use, but nanopores have lagged behind due to difficulties in generating usable sequence data. The practical and theoretical issues of translocation speed and signal detection encountered when attempting to sequence DNA with nanopores are discussed. Various methods that different laboratories have used to overcome difficulties in biologically based and solid-state nanopores are also presented. Different approaches designed to circumvent the overriding issue of detecting signals from individual bases in a time-resolved manner in nanopores are described. For example, genomic positional sequencing utilizes hybridization of short oligonucleotide probes to very long DNA templates and then detects these probes by variations in current blockade in solid-state nanodetectors. The positions of the probes relative to each other and relative to the ends of the DNA are determined by measuring the time between current blockade peaks. By assembling many such measurements, it is possible to overcome the problems encountered when attempting to sequence DNA at high speed in nanopores, providing the potential for true de novo sequencing of large genomes on a routine basis. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly

    PubMed Central

    Lam, Ernest T; Hastie, Alex; Lin, Chin; Ehrlich, Dean; Das, Somes K; Austin, Michael D; Deshpande, Paru; Cao, Han; Nagarajan, Niranjan; Xiao, Ming; Kwok, Pui-Yan

    2013-01-01

    We describe genome mapping on nanochannel arrays. In this approach, specific sequence motifs in single DNA molecules are fluorescently labeled, and the DNA molecules are uniformly stretched in thousands of silicon channels on a nanofluidic device. Fluorescence imaging allows the construction of maps of the physical distances between occurrences of the sequence motifs. We demonstrate the analysis, individually and as mixtures, of 95 bacterial artificial chromosome (BAC) clones that cover the 4.7-Mb human major histocompatibility complex region. We obtain accurate, haplotype-resolved, sequence motif maps hundreds of kilobases in length, resulting in a median coverage of 114× for the BACs. The final sequence motif map assembly contains three contigs. With an average distance of 9 kb between labels, we detect 22 haplotype differences. We also use the sequence motif maps to provide scaffolds for de novo assembly of sequencing data. Nanochannel genome mapping should facilitate de novo assembly of sequencing reads from complex regions in diploid organisms, haplotype and structural variation analysis and comparative genomics. PMID:22797562
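
    A sequence motif map, as described above, is essentially an ordered list of distances between occurrences of a labeled motif. The Python sketch below computes such a map in silico from a known sequence so that it could be compared with a measured one. The function names, the motif and the toy sequence are illustrative assumptions, and the sketch ignores the opposite strand and the measurement error of real imaging data.

```python
from typing import List

def motif_positions(sequence: str, motif: str) -> List[int]:
    """Return 0-based start positions of every (possibly overlapping) motif occurrence."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions

def label_spacings(positions: List[int]) -> List[int]:
    """Distances in bases between consecutive labels; the ordered list is the motif map."""
    return [b - a for a, b in zip(positions, positions[1:])]

toy_motif = "GCTCTTC"  # chosen for illustration only
toy_sequence = "AA" + toy_motif + "AAAAA" + toy_motif + "AA" + toy_motif + "AA"
sites = motif_positions(toy_sequence, toy_motif)
print(sites, label_spacings(sites))
```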

  11. A physical map of the highly heterozygous Populus genome: integration with the genome sequence and genetic map and analysis of haplotype variation

    SciTech Connect

    Kelleher, Colin; Chiu, Readman; Shin, Heesun; Bosdet, Ian; Krywinski, Martin; Fjell, Chris; Wilkin, Jennifer; Yin, Tongming; DiFazio, Stephen P; Ali, Johar; Asano, Jennifer; Chan, Susanna; Cloutier, Alison; Girn, Noreen; Leach, Stephen; Lee, Darlene; Mathewson, Carrie; Olson, Teika; O'Connor, Katie; Prabhu, Anna-Liisa; Smailus, Duane; Stott, Jeffery; Tsai, Miranda; Wye, Natasaja; Yang, George; Zhuang, Jun; Holt, Robert A.; Putnam, Nicholas; Vrebalov, Julia; Giovannoni, James; Grimwood, Jane; Schmutz, Jeremy; Rokhsar, Daniel; Jones, Steven; Marra, Marco; Tuskan, Gerald A; Bohlmann, J.; Ellis, Brian; Ritland, Kermit; Douglas, Carl; Schein, Jacqueline

    2007-01-01

    As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the first maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2,802 contigs, was constructed from fingerprinted bacterial artificial chromosome (BAC) clones. The map represents approximately 9.4-fold coverage of the 485 ± 10 Mb Populus genome, as estimated from the genome sequence assembly. BAC ends were sequenced to aid in long-range assembly of whole genome shotgun sequence scaffolds and to anchor the physical map to the genome sequence. Simple sequence repeat (SSR)-based markers were derived from the end sequences and used to initiate integration of the BAC and genetic maps. 2,411 physical map contigs, representing 97% of all clones assigned to contigs, were aligned to the sequence assembly (JGI Populus trichocarpa v1.0). These alignments represent a total coverage of 384 Mb (79%) of the entire poplar sequence assembly and 295 Mb (96%) of linkage group sequence assemblies. A striking result of the physical map contig alignments to the sequence assembly was the co-localization of multiple contigs across numerous regions of the 19 linkage groups. Targeted sequencing of BAC clones and genetic analysis in a small number of representative regions showed that these co-aligning contigs represent distinct haplotypes in the heterozygous individual sequenced, and revealed the nature of these haplotype sequence differences.
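
    The coverage percentages quoted above can be cross-checked with simple arithmetic. The short Python check below assumes that the 79% figure is taken relative to a total of about 485 Mb, which the abstract does not state explicitly, and derives the linkage-group total implied by the 96% figure. The variable names are illustrative.

```python
genome_mb = 485.0           # genome size estimate quoted in the abstract (Mb)
aligned_mb = 384.0          # sequence covered by aligned contigs (Mb)
linkage_aligned_mb = 295.0  # covered sequence within linkage groups (Mb)

# Share of the assumed 485 Mb total covered by the aligned contigs.
fraction_of_genome = aligned_mb / genome_mb

# If 295 Mb is 96% of the linkage-group assemblies, this is their implied size.
implied_linkage_total_mb = linkage_aligned_mb / 0.96

print(f"{fraction_of_genome:.1%}")
print(f"{implied_linkage_total_mb:.0f} Mb")
```

    Under that assumption the first figure comes out close to the quoted 79%, so the reported numbers are mutually consistent.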

  12. Construction of an integrated high density simple sequence repeat linkage map in cultivated strawberry (Fragaria × ananassa) and its applicability.

    PubMed

    Isobe, Sachiko N; Hirakawa, Hideki; Sato, Shusei; Maeda, Fumi; Ishikawa, Masami; Mori, Toshiki; Yamamoto, Yuko; Shirasawa, Kenta; Kimura, Mitsuhiro; Fukami, Masanobu; Hashizume, Fujio; Tsuji, Tomoko; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Tsuruoka, Hisano; Minami, Chiharu; Takahashi, Chika; Wada, Tsuyuko; Ono, Akiko; Kawashima, Kumiko; Nakazaki, Naomi; Kishida, Yoshie; Kohara, Mitsuyo; Nakayama, Shinobu; Yamada, Manabu; Fujishiro, Tsunakazu; Watanabe, Akiko; Tabata, Satoshi

    2013-02-01

    The cultivated strawberry (Fragaria × ananassa) is an octoploid (2n = 8x = 56) of the Rosaceae family whose genomic architecture is still controversial. Several recent studies support the AAA'A'BBB'B' model, but its complexity has hindered genetic and genomic analysis of this important crop. To overcome this difficulty and to assist genome-wide analysis of F. × ananassa, we constructed an integrated linkage map by organizing a total of 4474 simple sequence repeat (SSR) markers collected from published Fragaria sequences, including 3746 SSR markers [Fragaria vesca expressed sequence tag (EST)-derived SSR markers] derived from F. vesca ESTs, 603 markers (F. × ananassa EST-derived SSR markers) from F. × ananassa ESTs, and 125 markers (F. × ananassa transcriptome-derived SSR markers) from F. × ananassa transcripts. Along with the previously published SSR markers, these markers were mapped onto five parent-specific linkage maps derived from three mapping populations, which were then assembled into an integrated linkage map. The constructed map consists of 1856 loci in 28 linkage groups (LGs) that total 2364.1 cM in length. Macrosynteny at the chromosome level was observed between the LGs of F. × ananassa and the genome of F. vesca. Variety distinction on 129 F. × ananassa lines was demonstrated using 45 selected SSR markers.

  13. Development of a high density integrated reference genetic linkage map for the multinational Brassica rapa Genome Sequencing Project.

    PubMed

    Li, Xiaonan; Ramchiary, Nirala; Choi, Su Ryun; Van Nguyen, Dan; Hossain, Md Jamil; Yang, Hyeon Kook; Lim, Yong Pyo

    2010-11-01

    We constructed a high-density Brassica rapa integrated linkage map by combining a reference genetic map of 78 doubled haploid lines derived from Chiifu-401-42 × Kenshin (CKDH) and a new map of 190 F2 lines derived from Chiifu-401-42 × rapid cycling B. rapa (CRF2). The integrated map contains 1017 markers and covers 1262.0 cM of the B. rapa genome, with an average interlocus distance of 1.24 cM. High similarity of marker order and position was observed among the linkage groups of the maps with few short-distance inversions. In total, 155 simple sequence repeat (SSR) markers anchored to 102 new bacterial artificial chromosomes (BACs) and 146 intron polymorphic (IP) markers were mapped in the integrated map, which would be helpful to align the sequenced BACs in the ongoing multinational Brassica rapa Genome Sequencing Project (BrGSP). Further, comparison of the B. rapa consensus map with the 10 B. juncea A-genome linkage groups by using 98 common IP markers showed high-degree colinearity between the A-genome linkage groups, except for a few markers showing inversion or translocation, suggesting that chromosomes are highly conserved between these Brassica species, although they evolved independently after divergence. The sequence information coming out of BrGSP would be useful for B. juncea breeding, and the identified Arabidopsis chromosomal blocks and known quantitative trait loci (QTL) information of B. juncea could be applied to improve other Brassica crops including B. rapa.
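
    The quoted average interlocus distance can be checked directly from the two figures in the abstract. The short Python check below is only an illustrative calculation; the abstract does not say whether the map length was divided by the number of markers or by the number of intervals between them, so both are shown.

```python
map_length_cm = 1262.0  # total map length from the abstract (cM)
markers = 1017          # number of mapped markers from the abstract

per_marker = map_length_cm / markers          # length divided by marker count
per_interval = map_length_cm / (markers - 1)  # length divided by gaps on one linear map

print(round(per_marker, 2), round(per_interval, 2))
```

    Both conventions round to the reported 1.24 cM, so the abstract's figure is consistent with either.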

  14. Effort to map and sequence the human genome makes significant progress

    SciTech Connect

    Borman, S.

    1994-11-07

    The Human Genome Project, an international research effort to map and sequence the genomes of humans and selected model organisms, is making significant progress toward its goals four years into its projected 15-year life. A detailed human genetic linkage map has been developed ahead of time. A physical map, consisting of overlapping pieces of DNA, is only slightly behind schedule. Base-by-base sequencing of the human genome is lagging, but sequencing of model organisms is moving along very well, with the first complete eukaryotic genome likely to be completed within two years. Human Genome Project sponsorship of a map that would show the location of expressed human genes is still in the planning stage. However, such maps have been and are being produced privately on a large scale, a state of affairs that has stirred up considerable controversy about whether the "market" for such data is being cornered by proprietary interests.

  15. Sequencing of cDNA Clones from the Genetic Map of Tomato (Lycopersicon esculentum)

    PubMed Central

    Ganal, Martin W.; Czihal, Rosemarie; Hannappel, Ulrich; Kloos, Dorothee-U.; Polley, Andreas; Ling, Hong-Qing

    1998-01-01

    The dense RFLP linkage map of tomato (Lycopersicon esculentum) contains >300 anonymous cDNA clones. Of those clones, 272 were partially or completely sequenced. The sequences were compared at the DNA and protein level to known genes in databases. For 57% of the clones, a significant match to previously described genes was found. The information will permit the conversion of those markers to STS markers and allow their use in PCR-based mapping experiments. Furthermore, it will facilitate the comparative mapping of genes across distantly related plant species by direct comparison of DNA sequences and map positions. [cDNA sequence data reported in this paper have been submitted to the EMBL database under accession nos. AA824695–AA825005 and the dbEST_Id database under accession nos. 1546519–1546862.] PMID:9724330

  16. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

    2000-07-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.

  17. cljam: a library for handling DNA sequence alignment/map (SAM) with parallel processing.

    PubMed

    Takeuchi, Toshiki; Yamada, Atsuo; Aoki, Takashi; Nishimura, Kunihiro

    2016-01-01

    Next-generation sequencing can determine DNA bases and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format and the compressed binary version (BAM) of it. SAMtools is a typical tool for dealing with files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and can execute fast. However, SAMtools requires an additional implementation to be used in parallel with, for example, OpenMP (Open Multi-Processing) libraries. For the accumulation of next-generation sequencing data, a simple parallelization program, which can support cloud and PC cluster environments, is required. We have developed cljam using the Clojure programming language, which simplifies parallel programming, to handle SAM/BAM data. Cljam can run in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure. Cljam can process and analyze SAM/BAM files in parallel and at high speed. The execution time with cljam is almost the same as with SAMtools. The cljam code is written in Clojure and has fewer lines than other similar tools.
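
    The chunk-and-merge idea behind parallel SAM processing can be sketched as follows. This is a minimal illustration in Python, not cljam's API (cljam itself is written in Clojure); the function names, the chunk size, and the toy records are assumptions made for the example. It relies only on the SAM text layout: header lines start with "@", fields are tab-separated, and bit 0x4 of the FLAG field marks an unmapped read.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

UNMAPPED_FLAG = 0x4  # SAM FLAG bit that is set when the read is unmapped


def count_mapped(lines):
    """Count mapped reads per reference name in one chunk of SAM lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]  # columns 2 and 3: FLAG, RNAME
        if not flag & UNMAPPED_FLAG:
            counts[rname] += 1
    return counts


def parallel_count(lines, workers=4, chunk_size=2):
    """Split the lines into chunks, count each chunk in a worker, then merge."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_mapped, chunks):
            total.update(partial)
    return total


# Toy records (hypothetical data): three mapped reads and one unmapped read.
sam_lines = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t20\t60\t4M\t*\t0\t0\tTTGA\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",
    "r4\t0\tchr2\t5\t30\t4M\t*\t0\t0\tCATG\tIIII",
]
result = parallel_count(sam_lines)
print(dict(result))
```

    Threads are used here only to keep the sketch self-contained; for CPU-bound work in Python a process pool would be the more likely choice, and the per-chunk counting and merging would stay the same.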

  18. Chromosomal structures and repetitive sequences divergence in Cucumis species revealed by comparative cytogenetic mapping.

    PubMed

    Zhang, Yunxia; Cheng, Chunyan; Li, Ji; Yang, Shuqiong; Wang, Yunzhu; Li, Ziang; Chen, Jinfeng; Lou, Qunfeng

    2015-09-25

    Differentiation and copy number of repetitive sequences directly affect chromosome structure, which contributes to reproductive isolation and speciation. Comparative cytogenetic mapping has proven an efficient tool to elucidate the differentiation and distribution of repetitive sequences in the genome. In the present study, the distinct chromosomal structures of five Cucumis species were revealed through genomic in situ hybridization (GISH) technique and comparative cytogenetic mapping of major satellite repeats. Chromosome structures of five Cucumis species were investigated using GISH and comparative mapping of specific satellites. Southern hybridization was employed to study the proliferation of satellites, whose structural characteristics were helpful for analyzing chromosome evolution. Preferential distribution of repetitive DNAs at the subtelomeric regions was found in C. sativus, C. hystrix and C. metuliferus, while the majority was positioned at the pericentromeric heterochromatin regions in C. melo and C. anguria. Further, comparative GISH (cGISH) using genomic DNA of other species as probes revealed high homology of repeats between C. sativus and C. hystrix. Specific satellites including 45S rDNA, Type I/II, Type III, Type IV, CentM and telomeric repeat were then comparatively mapped in these species. Type I/II and Type IV produced bright signals at the subtelomeric regions of C. sativus and C. hystrix simultaneously, which might explain the significance of their amplification in the divergence of Cucumis subgenus from the ancient ancestor. Unique positioning of Type III and CentM only at the centromeric domains of C. sativus and C. melo, respectively, combined with unique Southern bands, revealed rapid evolutionary patterns of centromeric DNA in Cucumis. Obvious interstitial telomeric repeats were observed in chromosomes 1 and 2 of C. sativus, which might provide evidence of the fusion hypothesis of chromosome evolution from x = 12 to x = 7 in

  19. Draft genome sequence, and a sequence-defined genetic linkage map of the legume crop species Lupinus angustifolius L.

    PubMed

    Yang, Huaan; Tao, Ye; Zheng, Zequn; Zhang, Qisen; Zhou, Gaofeng; Sweetingham, Mark W; Howieson, John G; Li, Chengdao

    2013-01-01

    Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species.

  20. Draft Genome Sequence, and a Sequence-Defined Genetic Linkage Map of the Legume Crop Species Lupinus angustifolius L

    PubMed Central

    Yang, Huaan; Tao, Ye; Zheng, Zequn; Zhang, Qisen; Zhou, Gaofeng; Sweetingham, Mark W.; Howieson, John G.; Li, Chengdao

    2013-01-01

    Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species. PMID:23734219

  1. A Physical Map, Including a BAC/PAC Clone Contig, of the Williams-Beuren Syndrome–Deletion Region at 7q11.23

    PubMed Central

    Peoples, Risa; Franke, Yvonne; Wang, Yu-Ker; Pérez-Jurado, Luis; Paperna, Tamar; Cisco, Michael; Francke, Uta

    2000-01-01

    Williams-Beuren syndrome (WBS) is a developmental disorder caused by haploinsufficiency for genes in a 2-cM region of chromosome band 7q11.23. With the exception of vascular stenoses due to deletion of the elastin gene, the various features of WBS have not yet been attributed to specific genes. Although ⩾16 genes have been identified within the WBS deletion, completion of a physical map of the region has been difficult because of the large duplicated regions flanking the deletion. We present a physical map of the WBS deletion and flanking regions, based on assembly of a bacterial artificial chromosome/P1-derived artificial chromosome contig, analysis of high-throughput genome-sequence data, and long-range restriction mapping of genomic and cloned DNA by pulsed-field gel electrophoresis. Our map encompasses 3 Mb, including 1.6 Mb within the deletion. Two large duplicons, flanking the deletion, of ⩾320 kb contain unique sequence elements from the internal border regions of the deletion, such as sequences from GTF2I (telomeric) and FKBP6 (centromeric). A third copy of this duplicon exists in inverted orientation distal to the telomeric flanking one. These duplicons show stronger sequence conservation with regard to each other than to the presumptive ancestral loci within the common deletion region. Sequence elements originating from beyond 7q11.23 are also present in these duplicons. Although the duplicons are not present in mice, the order of the single-copy genes in the conserved syntenic region of mouse chromosome 5 is inverted relative to the human map. A model is presented for a mechanism of WBS-deletion formation, based on the orientation of duplicons' components relative to each other and to the ancestral elements within the deletion region. PMID:10631136

  2. A physical map, including a BAC/PAC clone contig, of the Williams-Beuren syndrome--deletion region at 7q11.23.

    PubMed

    Peoples, R; Franke, Y; Wang, Y K; Pérez-Jurado, L; Paperna, T; Cisco, M; Francke, U

    2000-01-01

    Williams-Beuren syndrome (WBS) is a developmental disorder caused by haploinsufficiency for genes in a 2-cM region of chromosome band 7q11.23. With the exception of vascular stenoses due to deletion of the elastin gene, the various features of WBS have not yet been attributed to specific genes. Although ≥16 genes have been identified within the WBS deletion, completion of a physical map of the region has been difficult because of the large duplicated regions flanking the deletion. We present a physical map of the WBS deletion and flanking regions, based on assembly of a bacterial artificial chromosome/P1-derived artificial chromosome contig, analysis of high-throughput genome-sequence data, and long-range restriction mapping of genomic and cloned DNA by pulsed-field gel electrophoresis. Our map encompasses 3 Mb, including 1.6 Mb within the deletion. Two large duplicons, flanking the deletion, of ≥320 kb contain unique sequence elements from the internal border regions of the deletion, such as sequences from GTF2I (telomeric) and FKBP6 (centromeric). A third copy of this duplicon exists in inverted orientation distal to the telomeric flanking one. These duplicons show stronger sequence conservation with regard to each other than to the presumptive ancestral loci within the common deletion region. Sequence elements originating from beyond 7q11.23 are also present in these duplicons. Although the duplicons are not present in mice, the order of the single-copy genes in the conserved syntenic region of mouse chromosome 5 is inverted relative to the human map. A model is presented for a mechanism of WBS-deletion formation, based on the orientation of duplicons' components relative to each other and to the ancestral elements within the deletion region.

  3. Use of Composite Protein Database including Search Result Sequences for Mass Spectrometric Analysis of Cell Secretome

    PubMed Central

    Shin, Jihye; Kim, Gamin; Kabir, Mohammad Humayun; Park, Seong Jun; Lee, Seoung Taek; Lee, Cheolju

    2015-01-01

    Mass spectrometric (MS) data of human cell secretomes are usually run through the conventional human database for identification. However, the search may result in false identifications due to contamination of the secretome with fetal bovine serum (FBS) proteins. To overcome this challenge, here we provide a composite protein database including human as well as 199 FBS protein sequences for MS data search of human cell secretomes. Searching against the human-FBS database returned more reliable results with fewer false-positive and false-negative identifications compared to using either a human only database or a human-bovine database. Furthermore, the improved results validated our strategy without complex experiments like SILAC. We expect our strategy to improve the accuracy of human secreted protein identification and to also add value for general use. PMID:25822838
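
    Assembling a composite search database of this kind amounts to concatenating FASTA records from each source while keeping track of where every entry came from. The sketch below is one possible way to do that, not the authors' pipeline: the function names, the "LABEL|" header prefix, and the toy sequences are all assumptions made for illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def build_composite(sources):
    """Merge several FASTA texts, prefixing each header with its source label."""
    records = []
    for label, text in sources:
        for header, seq in parse_fasta(text):
            records.append((f"{label}|{header}", seq))
    return records


def to_fasta(records):
    """Serialize (header, sequence) pairs back to FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# Toy inputs (hypothetical names and sequences, not real database entries).
human_fasta = ">protA\nMKTAYIAK\nQRQISFVK\n>protB\nMSLLTEVE\n"
fbs_fasta = ">serumX\nMKWVTFIS\n"

composite = build_composite([("HUMAN", human_fasta), ("FBS", fbs_fasta)])
print(to_fasta(composite))
```

    Tagging each header with its source makes it straightforward to separate serum-derived identifications from human ones after the search.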

  4. A Time Sequence-Oriented Concept Map Approach to Developing Educational Computer Games for History Courses

    ERIC Educational Resources Information Center

    Chu, Hui-Chun; Yang, Kai-Hsiang; Chen, Jing-Hong

    2015-01-01

    Concept maps have been recognized as an effective tool for students to organize their knowledge; however, in history courses, it is important for students to learn and organize historical events according to the time of their occurrence. Therefore, in this study, a time sequence-oriented concept map approach is proposed for developing a game-based…

  5. Construction of a SNP and SSR linkage map in autotetraploid blueberry using genotyping by sequencing

    USDA-ARS's Scientific Manuscript database

    A mapping population developed from a cross between two key highbush blueberry cultivars, Draper × Jewel (Vaccinium corymbosum), segregating for a number of important phenotypic traits, has been utilized to produce a genetic linkage map. Data on 233 simple sequence repeat (SSR) markers and 1794 sing...

  6. Comparison and quantitative verification of mapping algorithms for whole genome bisulfite sequencing

    USDA-ARS's Scientific Manuscript database

    Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitat...

  7. Sex Differences in Infants' Mapping of Complex Occlusion Sequences: Further Evidence

    ERIC Educational Resources Information Center

    Wilcox, Teresa

    2007-01-01

    Recently, infant researchers have reported sex differences in infants' capacity to map their representation of an occlusion sequence onto a subsequent no-occlusion display. The research reported here sought to identify the extent to which these sex differences are observed in event-mapping tasks and to identify the underlying basis for these…

  9. Genetic Linkage Map will aid the Whole genome Sequence Assembly

    USDA-ARS's Scientific Manuscript database

    The allotetraploid peanut genome assembly will be a valuable resource to researchers studying polyploid species, peanut genome evolution, and domestication, in addition to facilitating QTL analysis and providing tools for marker-assisted breeding. Therefore, a peanut linkage map will aid genome ...

  10. Comparative organization of cattle chromosome 5 revealed by comparative mapping by annotation and sequence similarity and radiation hybrid mapping.

    PubMed

    Ozawa, A; Band, M R; Larson, J H; Donovan, J; Green, C A; Womack, J E; Lewin, H A

    2000-04-11

    A whole genome cattle-hamster radiation hybrid cell panel was used to construct a map of 54 markers located on bovine chromosome 5 (BTA5). Of the 54 markers, 34 are microsatellites selected from the cattle linkage map and 20 are genes. Among the 20 mapped genes, 10 are new assignments that were made by using the comparative mapping by annotation and sequence similarity strategy. A LOD-3 radiation hybrid framework map consisting of 21 markers was constructed. The relatively low retention frequency of markers on this chromosome (19%) prevented unambiguous ordering of the other 33 markers. The length of the map is 398.7 cR, corresponding to a ratio of approximately 2.8 cR(5,000)/cM. Type I genes were binned for comparison of gene order among cattle, humans, and mice. Multiple internal rearrangements within conserved syntenic groups were apparent upon comparison of gene order on BTA5 and HSA12 and HSA22. A similarly high number of rearrangements were observed between BTA5 and MMU6, MMU10, and MMU15. The detailed comparative map of BTA5 should facilitate identification of genes affecting economically important traits that have been mapped to this chromosome and should contribute to our understanding of mammalian chromosome evolution.
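
    The figures in this abstract can be cross-checked with simple arithmetic. The sketch below uses only numbers reported above; the genetic length in cM is not stated in the abstract and is merely what the reported cR-to-cM ratio implies, so it should be read as an approximation.

```python
# Values reported in the abstract
total_markers = 54
microsatellites = 34
genes = 20
framework_markers = 21
map_length_cr = 398.7   # radiation hybrid map length in centirays
cr_per_cm = 2.8         # approximate ratio of cR(5,000) to cM

# Markers that could not be placed unambiguously on the framework map
unordered_markers = total_markers - framework_markers

# Genetic length implied by the reported ratio (derived here, not reported)
implied_length_cm = map_length_cr / cr_per_cm

print(microsatellites + genes == total_markers)
print(unordered_markers)
print(round(implied_length_cm, 1))
```

    The marker counts are internally consistent (34 + 20 = 54, and 54 - 21 = 33 unordered markers), and the ratio implies a genetic length of roughly 142 cM for the mapped region.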

  11. Construction of an integrated genetic linkage map for the A genome of Brassica napus using SSR markers derived from sequenced BACs in B. rapa

    PubMed Central

    2010-01-01

    Background: The Multinational Brassica rapa Genome Sequencing Project (BrGSP) has developed valuable genomic resources, including BAC libraries, BAC-end sequences, genetic and physical maps, and seed BAC sequences for Brassica rapa. An integrated linkage map between the amphidiploid B. napus and diploid B. rapa will facilitate the rapid transfer of these valuable resources from B. rapa to B. napus (Oilseed rape, Canola). Results: In this study, we identified over 23,000 simple sequence repeats (SSRs) from 536 sequenced BACs. 890 SSR markers (designated as BrGMS) were developed and used for the construction of an integrated linkage map for the A genome in B. rapa and B. napus. Two hundred and nineteen BrGMS markers were integrated to an existing B. napus linkage map (BnaNZDH). Among these mapped BrGMS markers, 168 were only distributed on the A genome linkage groups (LGs), 18 distributed both on the A and C genome LGs, and 33 only distributed on the C genome LGs. Most of the A genome LGs in B. napus were collinear with the homoeologous LGs in B. rapa, although minor inversions or rearrangements occurred on A2 and A9. The mapping of these BAC-specific SSR markers enabled assignment of 161 sequenced B. rapa BACs, as well as the associated BAC contigs to the A genome LGs of B. napus. Conclusion: The genetic mapping of SSR markers derived from sequenced BACs in B. rapa enabled direct links to be established between the B. napus linkage map and a B. rapa physical map, and thus the assignment of B. rapa BACs and the associated BAC contigs to the B. napus linkage map. This integrated genetic linkage map will facilitate exploitation of the B. rapa annotated genomic resources for gene tagging and map-based cloning in B. napus, and for comparative analysis of the A genome within Brassica species. PMID:20969760

  12. How to include the variability of TMS responses in simulations: a speech mapping case study

    NASA Astrophysics Data System (ADS)

    De Geeter, N.; Lioumis, P.; Laakso, A.; Crevecoeur, G.; Dupré, L.

    2016-11-01

    When delivered over a specific cortical site, TMS can temporarily disrupt the ongoing process in that area. This allows mapping of speech-related areas for preoperative evaluation purposes. We numerically explore the observed variability of TMS responses during a speech mapping experiment performed with a neuronavigation system. We selected four cases with very small perturbations in coil position and orientation. In one case (E) a naming error occurred, while in the other cases (NEA, B, C) the subject appointed the images as smoothly as without TMS. A realistic anisotropic head model was constructed of the subject from T1-weighted and diffusion-weighted MRI. The induced electric field distributions were computed, associated to the coil parameters retrieved from the neuronavigation system. Finally, the membrane potentials along relevant white matter fibre tracts, extracted from DTI-based tractography, were computed using a compartmental cable equation. While only minor differences could be noticed between the induced electric field distributions of the four cases, computing the corresponding membrane potentials revealed different subsets of tracts were activated. A single tract was activated for all coil positions. Another tract was only triggered for case E. NEA induced action potentials in 13 tracts, while NEB stimulated 11 tracts and NEC one. The calculated results are certainly sensitive to the coil specifications, demonstrating the observed variability in this study. However, even though a tract connecting Broca’s with Wernicke’s area is only triggered for the error case, further research is needed on other study cases and on refining the neural model with synapses and network connections. Case- and subject-specific modelling that includes both electromagnetic fields and neuronal activity enables demonstration of the variability in TMS experiments and can capture the interaction with complex neural networks.

  13. A map of human genome variation from population scale sequencing

    PubMed Central

    2011-01-01

    The 1000 Genomes Project aims to provide a deep characterisation of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. We present results of the pilot phase of the project, designed to develop and compare different strategies for genome wide sequencing with high throughput sequencing platforms. We undertook three projects: low coverage whole genome sequencing of 179 individuals from four populations, high coverage sequencing of two mother-father-child trios, and exon targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million SNPs, 1 million short insertions and deletions and 20,000 structural variants, the majority of which were previously undescribed. We show that over 95% of the currently accessible variants found in any individual are present in this dataset; on average, each person carries approximately 250 to 300 loss of function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We find many putative functional variants with large allele frequency differences between populations. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research. PMID:20981092
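
    To give a sense of scale for the reported de novo substitution rate, the following back-of-the-envelope sketch converts it into an expected number of new substitutions per generation. The haploid genome size of about 3 billion base pairs is an assumption introduced here (it is not stated in the abstract), and the result is an order-of-magnitude illustration rather than a figure from the study.

```python
# Reported in the abstract: about 1e-8 substitutions per base pair per generation
mutation_rate_per_bp = 1e-8

# Assumption for illustration only: haploid human genome of roughly 3e9 base pairs
haploid_genome_bp = 3e9

# Expected new substitutions inherited via one gamete, and via both parents
expected_per_haploid = mutation_rate_per_bp * haploid_genome_bp
expected_per_diploid = 2 * expected_per_haploid

print(f"per haploid genome: about {expected_per_haploid:.0f}")
print(f"per diploid genome: about {expected_per_diploid:.0f}")
```

    Under these assumptions, a child would be expected to carry on the order of tens of new base substitutions relative to its parents.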

  14. KvDB; mining and mapping sequence variants in voltage-gated potassium channels.

    PubMed

    Stead, Lucy F; Wood, Ian C; Westhead, David R

    2010-08-01

    We have created KvDB: a voltage-gated potassium (Kv) channel-specific database that houses natural and experimental variant data and includes highly curated multiple sequence alignments and additional analytical tools, such as structural variant mapping and transmembrane segment prediction. KvDB is available at www.bioinformatics.leeds.ac.uk/KvDB. Analyzing the characterized gene variants in terms of topological location revealed the following. The S4, S4-S5, S5, S5-S6, and S6 segments are most likely to house disease-causing variants. Neurological disorders are more likely to be caused by variants affecting voltage sensing, whereas cardiac disorders are more likely to be caused by variants in the pore. Long QT Syndrome 2 (LQT2) is more often caused by N-terminus variation, a region containing a domain that affects deactivation, suggesting a potential disease mechanism. Conversely, a higher proportion of LQT1-causing variants reside in S4-S5, suggesting communication of voltage-sensing to the pore as a disease mechanism. By structurally mapping functionally characterized variants, we also provide mechanistic insight into Kv channel function; identifying an intersubunit interaction that may be partly responsible for setting activation voltage. Investigating phenotypically characterized variants that map to the same position as functionally characterized ones indicates only weak association between locations that cause disease and those that alter electrophysiological properties.

  15. Identification of mesoderm development (mesd) candidate genes by comparative mapping and genome sequence analysis.

    PubMed

    Wines, M E; Lee, L; Katari, M S; Zhang, L; DeRossi, C; Shi, Y; Perkins, S; Feldman, M; McCombie, W R; Holdener, B C

    2001-02-15

    The proximal albino deletions identify several functional regions on mouse Chromosome 7 critical for differentiation of mesoderm (mesd), development of the hypothalamus neuroendocrine lineage (nelg), and function of the liver (hsdr1). Using comparative mapping and genomic sequence analysis, we have identified four novel genes and Il16 in the mesd deletion interval. Two of the novel genes, mesdc1 and mesdc2, are located within the mesd critical region defined by BAC transgenic rescue. We have investigated the fetal role of genes located outside the mesd critical region using BAC transgenic complementation of the mesd early embryonic lethality. Using human radiation hybrid mapping and BAC contig construction, we have identified a conserved region of human chromosome 15 homologous to the mesd, nelg, and hsdr1 functional regions. Three human diseases cosegregate with microsatellite markers used in construction of the human BAC/YAC physical map, including autosomal dominant nocturnal frontal lobe epilepsy (ENFL2; also known as ADNFLE), a syndrome of mental retardation, spasticity, and tapetoretinal degeneration (MRST); and a pyogenic arthritis, pyoderma gangrenosum, and acne syndrome (PAPA).

  16. Discovery of Candidate Disease Genes in ENU–Induced Mouse Mutants by Large-Scale Sequencing, Including a Splice-Site Mutation in Nucleoredoxin

    PubMed Central

    Wilming, Laurens G.; Liu, Bin; Probst, Frank J.; Harrow, Jennifer; Grafham, Darren; Hentges, Kathryn E.; Woodward, Lanette P.; Maxwell, Andrea; Mitchell, Karen; Risley, Michael D.; Johnson, Randy; Hirschi, Karen; Lupski, James R.; Funato, Yosuke; Miki, Hiroaki; Marin-Garcia, Pablo; Matthews, Lucy; Coffey, Alison J.; Parker, Anne; Hubbard, Tim J.; Rogers, Jane; Bradley, Allan; Adams, David J.; Justice, Monica J.

    2009-01-01

    An accurate and precisely annotated genome assembly is a fundamental requirement for functional genomic analysis. Here, the complete DNA sequence and gene annotation of mouse Chromosome 11 was used to test the efficacy of large-scale sequencing for mutation identification. We re-sequenced the 14,000 annotated exons and boundaries from over 900 genes in 41 recessive mutant mouse lines that were isolated in an N-ethyl-N-nitrosourea (ENU) mutation screen targeted to mouse Chromosome 11. Fifty-nine sequence variants were identified in 55 genes from 31 mutant lines. 39% of the lesions lie in coding sequences and create primarily missense mutations. The other 61% lie in noncoding regions, many of them in highly conserved sequences. A lesion in the perinatal lethal line l11Jus13 alters a consensus splice site of nucleoredoxin (Nxn), inserting 10 amino acids into the resulting protein. We conclude that point mutations can be accurately and sensitively recovered by large-scale sequencing, and that conserved noncoding regions should be included for disease mutation identification. Only seven of the candidate genes we report have been previously targeted by mutation in mice or rats, showing that despite ongoing efforts to functionally annotate genes in the mammalian genome, an enormous gap remains between phenotype and function. Our data show that the classical positional mapping approach of disease mutation identification can be extended to large target regions using high-throughput sequencing. PMID:20011118

  17. High-density linkage map construction and mapping of seed trait QTLs in chickpea (Cicer arietinum L.) using Genotyping-by-Sequencing (GBS)

    PubMed Central

    Verma, Subodh; Gupta, Shefali; Bandhiwal, Nitesh; Kumar, Tapan; Bharadwaj, Chellapilla; Bhatia, Sabhyata

    2015-01-01

    This study reports the use of Genotyping-by-Sequencing (GBS) for large-scale SNP discovery and simultaneous genotyping of recombinant inbred lines (RILs) of an intra-specific mapping population of chickpea contrasting for seed traits. A total of 119,672 raw SNPs were discovered, which after stringent filtering revealed 3,977 high quality SNPs of which 39.5% were present in genic regions. Comparative analysis using physically mapped marker loci revealed a higher degree of synteny with Medicago in comparison to soybean. The SNP genotyping data was utilized to construct one of the most saturated intra-specific genetic linkage maps of chickpea having 3,363 mapped positions including 3,228 SNPs on 8 linkage groups spanning 1006.98 cM at an average inter marker distance of 0.33 cM. The map was utilized to identify 20 quantitative trait loci (QTLs) associated with seed traits accounting for phenotypic variations ranging from 9.97% to 29.71%. Analysis of the genomic sequence corresponding to five robust QTLs led to the identification of 684 putative candidate genes whose expression profiling revealed that 101 genes exhibited seed specific expression. The integrated approach utilizing the identified QTLs along with the available genome and transcriptome could serve as a platform for candidate gene identification for molecular breeding of chickpea. PMID:26631981

  18. High-density linkage map construction and mapping of seed trait QTLs in chickpea (Cicer arietinum L.) using Genotyping-by-Sequencing (GBS).

    PubMed

    Verma, Subodh; Gupta, Shefali; Bandhiwal, Nitesh; Kumar, Tapan; Bharadwaj, Chellapilla; Bhatia, Sabhyata

    2015-12-03

    This study reports the use of Genotyping-by-Sequencing (GBS) for large-scale SNP discovery and simultaneous genotyping of recombinant inbred lines (RILs) of an intra-specific mapping population of chickpea contrasting for seed traits. A total of 119,672 raw SNPs were discovered, which after stringent filtering revealed 3,977 high quality SNPs of which 39.5% were present in genic regions. Comparative analysis using physically mapped marker loci revealed a higher degree of synteny with Medicago in comparison to soybean. The SNP genotyping data was utilized to construct one of the most saturated intra-specific genetic linkage maps of chickpea having 3,363 mapped positions including 3,228 SNPs on 8 linkage groups spanning 1006.98 cM at an average inter marker distance of 0.33 cM. The map was utilized to identify 20 quantitative trait loci (QTLs) associated with seed traits accounting for phenotypic variations ranging from 9.97% to 29.71%. Analysis of the genomic sequence corresponding to five robust QTLs led to the identification of 684 putative candidate genes whose expression profiling revealed that 101 genes exhibited seed specific expression. The integrated approach utilizing the identified QTLs along with the available genome and transcriptome could serve as a platform for candidate gene identification for molecular breeding of chickpea.

  19. Human insulin genome sequence map, biochemical structure of insulin for recombinant DNA insulin.

    PubMed

    Chakraborty, Chiranjib; Mungantiwar, Ashish A

    2003-08-01

    Insulin is an essential molecule for type I diabetes that is marketed by very few companies. It was the first molecule made by recombinant technology, but the commercialization process is very difficult. Knowledge of the biochemical structure of insulin and of the human insulin genome sequence map is pivotal to large-scale manufacturing of recombinant DNA insulin. This paper reviews the human insulin genome sequence map, the amino acid sequence of porcine insulin, the crystal structure of porcine insulin, the insulin monomer, aggregation surfaces of insulin, conformational variation in the insulin monomer, and insulin X-ray structures for recombinant DNA technology in the synthesis of human insulin in Escherichia coli.

  20. Toward a physical map of Drosophila buzzatii. Use of randomly amplified polymorphic DNA polymorphisms and sequence-tagged site landmarks.

    PubMed Central

    Laayouni, H; Santos, M; Fontdevila, A

    2000-01-01

    We present a physical map based on RAPD polymorphic fragments and sequence-tagged sites (STSs) for the repleta group species Drosophila buzzatii. One hundred forty-four RAPD markers have been used as probes for in situ hybridization to the polytene chromosomes, and positive results allowing the precise localization of 108 RAPDs were obtained. Of these, 73 behave as effectively unique markers for physical map construction, and in 9 additional cases the probes gave two hybridization signals, each on a different chromosome. Most markers (68%) are located on chromosomes 2 and 4, which partially agree with previous estimates on the distribution of genetic variation over chromosomes. One RAPD maps close to the proximal breakpoint of inversion 2z(3) but is not included within the inverted fragment. However, it was possible to conclude from this RAPD that the distal breakpoint of 2z(3) had previously been wrongly assigned. A total of 39 cytologically mapped RAPDs were converted to STSs and yielded an aggregate sequence of 28,431 bp. Thirty-six RAPDs (25%) did not produce any detectable hybridization signal, and we obtained the DNA sequence from three of them. Further prospects toward obtaining a more developed genetic map than the one currently available for D. buzzatii are discussed. PMID:11102375

  1. MySSP: non-stationary evolutionary sequence simulation, including indels.

    PubMed

    Rosenberg, Michael S

    2007-02-26

    MySSP is a new program for the simulation of DNA sequence evolution across a phylogenetic tree. Although many programs are available for sequence simulation, MySSP is unique in its inclusion of indels, flexibility in allowing for non-stationary patterns, and output of ancestral sequences. Some of these features can individually be found in existing programs, but they have not all been previously available in a single package.
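
    The three features named in this abstract (indels, parameters that change across the tree, and ancestral-sequence output) can be illustrated with a toy simulator. This is only a sketch under simple assumptions, not MySSP's actual model: the per-branch substitution and indel rates, the single-base indels, and the function names `evolve` and `simulate` are all hypothetical, and letting rates differ by branch is a crude stand-in for non-stationarity.

```python
import random

def evolve(seq, sub_rate, indel_rate, rng):
    """Evolve one branch: each site may be deleted, followed by an insertion, or substituted."""
    out = []
    for base in seq:
        r = rng.random()
        if r < indel_rate / 2:
            continue  # single-base deletion
        if r < indel_rate:
            out.append(base)
            out.append(rng.choice("ACGT"))  # single-base insertion after this site
            continue
        if rng.random() < sub_rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)

def simulate(tree, root_seq, rng):
    """tree maps parent -> [(child, sub_rate, indel_rate), ...]; returns every node's sequence."""
    seqs = {"root": root_seq}
    stack = ["root"]
    while stack:
        parent = stack.pop()
        for child, sub_rate, indel_rate in tree.get(parent, []):
            seqs[child] = evolve(seqs[parent], sub_rate, indel_rate, rng)
            stack.append(child)
    return seqs

# Hypothetical three-branch tree with different rates on each branch.
tree = {"root": [("anc", 0.0, 0.0)], "anc": [("A", 0.1, 0.05), ("B", 0.2, 0.0)]}
seqs = simulate(tree, "ACGTACGT", random.Random(1))
```

    Because `seqs` keeps internal nodes such as `anc` alongside the tips, the ancestral sequences are available for inspection after the run.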

  2. MySSP: Non-stationary evolutionary sequence simulation, including indels

    PubMed Central

    Rosenberg, Michael S.

    2007-01-01

    MySSP is a new program for the simulation of DNA sequence evolution across a phylogenetic tree. Although many programs are available for sequence simulation, MySSP is unique in its inclusion of indels, flexibility in allowing for non-stationary patterns, and output of ancestral sequences. Some of these features can individually be found in existing programs, but they have not all been previously available in a single package. PMID:19325855

  3. A Topographic Image Map of the Sabrina Valles Region Including Information on Large Martian Impact Craters

    NASA Astrophysics Data System (ADS)

    Gehrke, S.; Köhring, R.; Barlow, N. G.; Gwinner, K.; Scholten, F.; Lehmann, H.; Albertz, J.

    2007-03-01

    The Catalog of Large Martian Impact Craters provides detailed information on 42,283 craters >5 km; it is planned to be integrated in the Topographic Image Map Mars 1:200,000 series. Such an update is shown in a special target map, based on HRSC data.

  4. ZOOM Lite: next-generation sequencing data mapping and visualization software.

    PubMed

    Zhang, Zefeng; Lin, Hao; Ma, Bin

    2010-07-01

    High-throughput next-generation sequencing technologies pose increasing demands on the efficiency, accuracy and usability of data analysis software. In this article, we present ZOOM Lite, software for efficient read mapping and result visualization. With a kernel capable of mapping tens of millions of Illumina or AB SOLiD sequencing reads efficiently and accurately, and an intuitive graphical user interface, ZOOM Lite integrates read mapping and result visualization into an easy-to-use pipeline on a desktop PC. The software handles both single-end and paired-end reads, and can output either the unique mapping result or the top N mapping results for each read. Additionally, the software takes a variety of input file formats and outputs to several commonly used result formats. The software is freely available at http://bioinfor.com/zoom/lite/.

  5. The hidden perils of read mapping as a quality assessment tool in genome sequencing

    PubMed Central

    Lehri, B.; Seddon, A. M.; Karlyshev, A. V.

    2017-01-01

    This article provides a comparative analysis of the various methods of genome sequencing focusing on verification of the assembly quality. The results of a comparative assessment of various de novo assembly tools, as well as sequencing technologies, are presented using a recently completed sequence of the genome of Lactobacillus fermentum 3872. In particular, quality of assemblies is assessed by using CLC Genomics Workbench read mapping and Optical mapping developed by OpGen. Over-extension of contigs without prior knowledge of contig location can lead to misassembled contigs, even when commonly used quality indicators such as read mapping suggest that a contig is well assembled. Precautions must also be undertaken when using long read sequencing technology, which may also lead to misassembled contigs. PMID:28225089

  6. Structure map including off-stoichiometric and ternary sp-d-valent compounds

    NASA Astrophysics Data System (ADS)

    Hammerschmidt, T.; Bialon, A. F.; Drautz, R.

    2017-10-01

    Structure maps predict the crystal structure of a compound from the knowledge of constituent elements and chemical composition. We recently developed a highly predictive, three-dimensional structure map for stoichiometric binary sp- d-valent compounds. Here we show that the descriptors of this structure map are transferable to off-stoichiometric compounds with similar predictive power. We furthermore demonstrate that the descriptors are suitable for ternary prototypes. In particular, we construct a three-dimensional structure map for 129 prototypical crystal structures for ternary compounds. The crystal structure is predicted correctly with a probability of 78%. With a confidence of 95% the correct crystal structure is among the three most likely crystal structures predicted by the structure map.

  7. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    PubMed Central

    2013-01-01

    Background: Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density, but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results: Genotyping by Sequencing (GBS) was used to produce highly saturated maps for a R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values, and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM respectively over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions: GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation distortion in R. idaeus, which

  8. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation.

    PubMed

    Ward, Judson A; Bhangoo, Jasbir; Fernández-Fernández, Felicidad; Moore, Patrick; Swanson, J D; Viola, Roberto; Velasco, Riccardo; Bassil, Nahla; Weber, Courtney A; Sargent, Daniel J

    2013-01-16

    Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density, but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Genotyping by Sequencing (GBS) was used to produce highly saturated maps for a R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values, and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM respectively over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation distortion in R. idaeus, which may help to identify

  9. High-Throughput Mapping of Single-Neuron Projections by Sequencing of Barcoded RNA.

    PubMed

    Kebschull, Justus M; Garcia da Silva, Pedro; Reid, Ashlan P; Peikon, Ian D; Albeanu, Dinu F; Zador, Anthony M

    2016-09-07

    Neurons transmit information to distant brain regions via long-range axonal projections. In the mouse, area-to-area connections have only been systematically mapped using bulk labeling techniques, which obscure the diverse projections of intermingled single neurons. Here we describe MAPseq (Multiplexed Analysis of Projections by Sequencing), a technique that can map the projections of thousands or even millions of single neurons by labeling large sets of neurons with random RNA sequences ("barcodes"). Axons are filled with barcode mRNA, each putative projection area is dissected, and the barcode mRNA is extracted and sequenced. Applying MAPseq to the locus coeruleus (LC), we find that individual LC neurons have preferred cortical targets. By recasting neuroanatomy, which is traditionally viewed as a problem of microscopy, as a problem of sequencing, MAPseq harnesses advances in sequencing technology to permit high-throughput interrogation of brain circuits.
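
    The readout step described here, counting each barcode in each dissected area, amounts to building a barcode-by-area matrix. The sketch below is a minimal illustration under assumed inputs, not the published MAPseq pipeline: the `(area, barcode)` read pairs, the area names and the function names are hypothetical, and real data would also need error correction and normalization.

```python
from collections import Counter, defaultdict

def projection_matrix(reads):
    """Build {barcode: Counter({area: read_count})} from (area, barcode) pairs."""
    matrix = defaultdict(Counter)
    for area, barcode in reads:
        matrix[barcode][area] += 1
    return matrix

def preferred_target(matrix, barcode):
    """Return the area with the most reads for this barcode."""
    return matrix[barcode].most_common(1)[0][0]

# Toy data: hypothetical area names and barcodes, not taken from the study.
reads = [
    ("cortex_A", "ACGTAC"), ("cortex_A", "ACGTAC"), ("cortex_B", "ACGTAC"),
    ("cortex_B", "TTGCAA"), ("cortex_B", "TTGCAA"),
]
matrix = projection_matrix(reads)
```

    Each row of `matrix` then stands for one labeled neuron, and `preferred_target` picks the area where its barcode is most abundant.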

  10. A sequence-ready map for human chromosome 12q15-21.

    PubMed

    Lee, S G; Cho, K A; Choi, Y H; Montgomery, K; Lee, E; Miller, A; Kucherlapati, R; Song, K

    2000-01-01

    Construction of sequence-ready clone map is an essential step toward sequencing the human genome. We chose a region that is frequently amplified in liposarcoma between D12S350 and D12S106 in chromosome 12q15-21 to build a PAC/BAC clone contig map. This region was spanned by 4 YACs and contained 30 STS on the YAC and radiation hybrid (RH) framework maps, providing an average STS spacing of 160 kb if each YAC is approximately 1.2 Mb in size. To convert a STS-based YAC map to a STS-based contig map of bacterial clones, 22 non-polymorphic STS markers were used as probes to screen the high density gridded arrays of PAC and BAC clones by filter hybridizations, followed by assembly of clones into contigs by marker content. Contigs have been extended and joined by direct end sequencing of appropriate clones, generating new STSs and rescreening the library as necessary. Using these approaches, we have constructed 5 contigs covering the region with the largest single contig being 1.4 Mb and a final size estimation of 3.6 Mb. The map is comprised of 17 YACs, 187 PACs, 160 BACs, and 17 cosmids; onto this, 6 polymorphic, 97 non-polymorphic, 24 ESTs, and 4 gene-based markers are now placed in a unique order, providing an average resolution of approximately 28 kb. Of a total of 131 markers, 97 were developed in the present study. The sequence-ready map should provide a framework to generate complete DNA sequence and ultimately gene map of this segment of chromosome 12.
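
    The spacing and resolution figures quoted in this abstract follow from dividing region size by marker count. The short check below uses only numbers from the abstract (4 YACs of about 1.2 Mb, 30 STSs, a 3.6 Mb final estimate and 131 markers); the function name is hypothetical.

```python
def average_spacing_kb(region_mb: float, n_markers: int) -> float:
    """Average marker spacing in kb for a region of region_mb megabases."""
    return region_mb * 1000.0 / n_markers

# Initial framework: 4 YACs of ~1.2 Mb each carrying 30 STSs.
initial_kb = average_spacing_kb(4 * 1.2, 30)

# Final map: 6 + 97 + 24 + 4 = 131 markers over an estimated 3.6 Mb.
final_kb = average_spacing_kb(3.6, 6 + 97 + 24 + 4)

print(f"initial: {initial_kb:.0f} kb, final: {final_kb:.1f} kb")
```

    The first value reproduces the 160 kb STS spacing, and the second comes out at roughly 27.5 kb, in line with the reported resolution of approximately 28 kb.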

  11. A sequence-based variation map of zebrafish.

    PubMed

    Patowary, Ashok; Purkanti, Ramya; Singh, Meghna; Chauhan, Rajendra; Singh, Angom Ramcharan; Swarnkar, Mohit; Singh, Naresh; Pandey, Vikas; Torroja, Carlos; Clark, Matthew D; Kocher, Jean-Pierre; Clark, Karl J; Stemple, Derek L; Klee, Eric W; Ekker, Stephen C; Scaria, Vinod; Sivasubbu, Sridhar

    2013-03-01

    Zebrafish (Danio rerio) is a popular vertebrate model organism largely deployed using outbred laboratory animals. The nonisogenic nature of the zebrafish as a model system offers the opportunity to understand natural variations and their effect in modulating phenotype. In an effort to better characterize the range of natural variation in this model system and to complement the zebrafish reference genome project, the whole genome sequence of a wild zebrafish at 39-fold genome coverage was determined. Comparative analysis with the zebrafish reference genome revealed approximately 5.2 million single nucleotide variations and over 1.6 million insertion-deletion variations. This dataset thus represents a new catalog of genetic variations in the zebrafish genome. Further analysis revealed selective enrichment for variations in genes involved in immune function and response to the environment, suggesting genome-level adaptations to environmental niches. We also show that human disease gene orthologs in the sequenced wild zebrafish genome show a lower ratio of nonsynonymous to synonymous single nucleotide variations.

  12. Synchronous imitation of continuous action sequences: The role of spatial and topological mapping.

    PubMed

    Ramenzoni, Verónica C; Sebanz, Natalie; Knoblich, Günther

    2015-10-01

    What are the mapping mechanisms that enable people to synchronously imitate continuous action sequences observed in others? We investigated this question in 4 experiments that used a tapping task where participants synchronously performed alternating bimanual hand movements with a model presented in an egocentric or allocentric orientation. Their task was to tap in synchrony, with each hand matching the movements of the ipsilateral model hand as closely as possible. The results show that automatic establishment of topological mappings, where the performer's hand is mapped onto the model's anatomically matching hand even if the 2 are spatially misaligned, can interfere with maintaining spatial mappings (Experiments 1 and 2). The interference was particularly strong in musicians who have expertise in establishing topological mappings in continuous performance (Experiment 4). Adopting an unusual body posture greatly interfered with establishing spatial as well as topological mappings (Experiment 3). Together, the results suggest that synchronous imitation of continuous action sequences depends on flexible predictive models that simultaneously apply spatial and topological mapping constraints to enable an actor to act in synchrony with observed action sequences.

  13. Continuous intensity map optimization (CIMO): a novel approach to leaf sequencing in step and shoot IMRT.

    PubMed

    Cao, Daliang; Earl, Matthew A; Luan, Shuang; Shepard, David M

    2006-04-01

    A new leaf-sequencing approach has been developed that is designed to reduce the number of required beam segments for step-and-shoot intensity modulated radiation therapy (IMRT). This approach to leaf sequencing is called continuous-intensity-map-optimization (CIMO). Using a simulated annealing algorithm, CIMO seeks to minimize differences between the optimized and sequenced intensity maps. Two distinguishing features of the CIMO algorithm are (1) CIMO does not require that each optimized intensity map be clustered into discrete levels and (2) CIMO is not rule-based but rather simultaneously optimizes both the aperture shapes and weights. To test the CIMO algorithm, ten IMRT patient cases were selected (four head-and-neck, two pancreas, two prostate, one brain, and one pelvis). For each case, the optimized intensity maps were extracted from the Pinnacle3 treatment planning system. The CIMO algorithm was applied, and the optimized aperture shapes and weights were loaded back into Pinnacle. A final dose calculation was performed using Pinnacle's convolution/superposition based dose calculation. On average, the CIMO algorithm provided a 54% reduction in the number of beam segments as compared with Pinnacle's leaf sequencer. The plans sequenced using the CIMO algorithm also provided improved target dose uniformity and a reduced discrepancy between the optimized and sequenced intensity maps. For ten clinical intensity maps, comparisons were performed between the CIMO algorithm and the power-of-two reduction algorithm of Xia and Verhey [Med. Phys. 25(8), 1424-1434 (1998)]. When the constraints of a Varian Millennium multileaf collimator were applied, the CIMO algorithm resulted in a 26% reduction in the number of segments. For an Elekta multileaf collimator, the CIMO algorithm resulted in a 67% reduction in the number of segments. An average leaf sequencing time of less than one minute per beam was observed.
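
    The core idea, annealing aperture shapes and weights together so that the delivered map approaches a continuous target without first clustering it into levels, can be shown on a one-dimensional toy profile. This is a hedged sketch, not the CIMO implementation: the single-row geometry, the linear cooling schedule, the move set and all names are assumptions, and no multileaf collimator constraints are modeled.

```python
import math
import random

def delivered(segments, n):
    """Sum the weights of all apertures (left inclusive, right exclusive) per bixel."""
    profile = [0.0] * n
    for left, right, weight in segments:
        for i in range(left, right):
            profile[i] += weight
    return profile

def error(segments, target):
    """Squared difference between the delivered and the target intensity profile."""
    return sum((d - t) ** 2 for d, t in zip(delivered(segments, len(target)), target))

def anneal(target, n_segments=3, steps=5000, t0=1.0, seed=0):
    """Jointly perturb aperture edges and weights; return the best segments found."""
    rng = random.Random(seed)
    n = len(target)
    current = [(0, n, 1.0)] * n_segments
    cur_err = error(current, target)
    best, best_err = list(current), cur_err
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9  # linear cooling (assumed schedule)
        k = rng.randrange(n_segments)
        left, right, weight = current[k]
        move = rng.choice(("left", "right", "weight"))
        if move == "left":
            left = min(max(0, left + rng.choice((-1, 1))), right - 1)
        elif move == "right":
            right = max(min(n, right + rng.choice((-1, 1))), left + 1)
        else:
            weight = max(0.0, weight + rng.uniform(-0.5, 0.5))
        trial = current[:k] + [(left, right, weight)] + current[k + 1:]
        trial_err = error(trial, target)
        # Accept improvements always, and worse moves with Boltzmann probability.
        if trial_err < cur_err or rng.random() < math.exp((cur_err - trial_err) / temp):
            current, cur_err = trial, trial_err
            if cur_err < best_err:
                best, best_err = list(current), cur_err
    return best, best_err

target = [0.0, 1.0, 3.0, 3.0, 2.0, 0.0]  # continuous target, no discrete levels
best, best_err = anneal(target)
```

    Fixing `n_segments` in advance mirrors the goal of limiting the number of beam segments; the optimizer then spends that budget on whichever shapes and weights fit the target best.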

  14. Software tools for motif and pattern scanning: program descriptions including a universal sequence reading algorithm.

    PubMed

    Cockwell, K Y; Giles, I G

    1989-07-01

    Two programs, MOTIF and PATTERN, that scan sequences for matches to user-defined motifs and patterns of motifs based on identity and set membership are described. The programs use a simple and logical notation to define motifs, and may be used either interactively or by using command line parameters (suitable for batch processing). The two programs described also incorporate a simple, yet reliable, algorithm that automatically detects in which of six possible formats the sequence entry is written.
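
    Matching by identity and set membership can be sketched as follows. The notation here is invented for illustration and is not the notation used by MOTIF or PATTERN: a letter matches itself, `[..]` matches any listed residue, and a lowercase `x` matches anything.

```python
def parse_motif(spec):
    """Parse a motif such as 'Nx[ST]' into a list of allowed-residue sets.

    Hypothetical notation: a letter matches itself, [..] matches any listed
    residue, and lowercase 'x' matches any residue (stored as None).
    """
    positions, i = [], 0
    while i < len(spec):
        if spec[i] == "[":
            j = spec.index("]", i)
            positions.append(set(spec[i + 1:j]))
            i = j + 1
        else:
            positions.append(None if spec[i] == "x" else {spec[i]})
            i += 1
    return positions

def scan(sequence, motif):
    """Return 0-based start positions where every motif position accepts the residue."""
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        if all(allowed is None or ch in allowed for ch, allowed in zip(window, motif)):
            hits.append(start)
    return hits

hits = scan("MKNGSANETQ", parse_motif("Nx[ST]"))
```

    In this toy sequence the motif matches at the two positions where N is followed two residues later by S or T.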

  15. A High-Density Genetic Linkage Map for Cucumber (Cucumis sativus L.): Based on Specific Length Amplified Fragment (SLAF) Sequencing and QTL Analysis of Fruit Traits in Cucumber

    PubMed Central

    Zhu, Wen-Ying; Huang, Long; Chen, Long; Yang, Jian-Tao; Wu, Jia-Ni; Qu, Mei-Ling; Yao, Dan-Qing; Guo, Chun-Li; Lian, Hong-Li; He, Huan-Le; Pan, Jun-Song; Cai, Run

    2016-01-01

    High-density genetic linkage maps play an important role in genome assembly and quantitative trait locus (QTL) fine mapping. Next-generation sequencing simplifies SNP discovery and high-throughput genotyping, making the construction of high-density linkage maps much more convenient and practical. In this research, a high-density linkage map of cucumber was constructed by specific length amplified fragment (SLAF) sequencing of an F2 population of 153 individuals from the cross S1000 × S1002. The map comprised 3,057 SLAFs, including 4,475 SNP markers, on seven chromosomes and spanned 1061.19 cM, with an average genetic distance of 0.35 cM. Based on this map, QTL analysis was performed on two cucumber fruit traits, fruit length and fruit diameter, and 15 QTLs for the two traits were detected. PMID:27148281

  16. Mapping of repetitive bovine DNA sequences on cattle Y chromosomes.

    PubMed

    Schwerin, M; Gallagher, D S; Miller, J R; Thomsen, P D

    1992-01-01

    Three male-specific PCR products of the sequences BC1.2, lambda ES6.0, and BRY.1 were used as probes for Southern blot analyses. Each of these probes generated a complex male-specific band pattern, which showed some quantitative variations among bulls. Hybridization patterns obtained with the BC1.2 and lambda ES6.0 PCR products were interrelated. Chromosomal locations of these repeats were determined by hybridizing the tritiated PCR products in situ to male metaphase spreads. The BC1.2 and lambda ES6.0 PCR products hybridized to Yp13-->p12, whereas the BRY.1 PCR product hybridized over the entire Y chromosome. In addition, the BC1.2 and lambda ES6.0 PCR products hybridized to the distal half of the acrocentric Y chromosome of Bos indicus, indicating that the short arm of the B. taurus Y chromosome is homologous with the telomeric end of the B. indicus Y and supporting the notion that the Y chromosomes of these two species differ by a pericentric inversion.

  17. Comparative mapping of human alphoid centromeric sequences in great apes

    SciTech Connect

    Archidiacono, N.; Antonacci, R.; Marzella, R.

    1994-09-01

    Metaphase spreads from chimpanzees (Pan troglodytes and Pan paniscus) and gorilla (Gorilla gorilla) have been hybridized in situ with 27 alphoid DNA probes specific for the centromere of human chromosomes, to investigate the evolutionary relationship between centromeric regions of human and great apes. The results showed that most human probes do not recognize their corresponding homologs in great apes. Chromosome X is the only chromosome showing localization consistency in all four species. Each suprachromosomal family (SCF) exhibits a distinct and peculiar evolutionary history. SCF1 (chromosomes 1, 3, 6, 7, 19, 12, 16) is very heterogeneous: some probes gave intense signals, but always on non-homologous chromosomes; others did not produce any hybridization signal. All probes localized on SCF2 (chromosomes 2, 4, 8, 9, 13, 14, 15, 18, 20, 21, and 22) recognize a single chromosome: chromosome 11 (phylogenetic IX) in PTR and PPA; chromosome 4 (phylogenetic V) in GGO. SCF3 subsets (chromosomes 1, 11, 17, X) are substantially conserved in PTR and PPA, but not in GGO, with the exception restricted to chromosome X. No signals have been detected on PPA chromosomes I, III, IV, V, VI and on PTR chromosome V, suggesting that the centromeric regions of some chromosomes have probably lost homology with human alphoid sequences.

  18. The application of next-generation sequencing in the autozygosity mapping of human recessive diseases.

    PubMed

    Alkuraya, Fowzan S

    2013-11-01

    Autozygosity, or the inheritance of two copies of an ancestral allele, has the potential to not only reveal phenotypes caused by biallelic mutations in autosomal recessive genes, but to also facilitate the mapping of such mutations by flagging the surrounding haplotypes as tractable runs of homozygosity (ROH), a process known as autozygosity mapping. Since SNPs replaced microsatellites as markers for the purpose of genomewide identification of ROH, autozygosity mapping of Mendelian genes has witnessed a significant acceleration. Historically, successful mapping traditionally required favorable family structure that permits the identification of an autozygous interval that is amenable to candidate gene selection and confirmation by Sanger sequencing. This requirement presented a major bottleneck that hindered the utilization of simplex cases and many multiplex families with autosomal recessive phenotypes. However, the advent of next-generation sequencing that enables massively parallel sequencing of DNA has largely bypassed this bottleneck and thus ushered in an era of unprecedented pace of Mendelian disease gene discovery. The ability to identify a single causal mutation among a massive number of variants that are uncovered by next-generation sequencing can be challenging, but applying autozygosity as a filter can greatly enhance the enrichment process and its throughput. This review will discuss the power of combining the best of both techniques in the mapping of recessive disease genes and offer some tips to troubleshoot potential limitations.
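
    Using autozygosity as a filter starts with finding runs of homozygosity and then keeping the regions shared by affected relatives. The sketch below illustrates this on SNP indices only. It is an assumption-laden toy, not a method from the review: the `min_snps` threshold, the two-character genotype strings and the function names are hypothetical, and real analyses also filter on physical length and tolerate genotyping errors.

```python
def runs_of_homozygosity(genotypes, min_snps=3):
    """Return (start, end) index pairs (inclusive) of consecutive homozygous SNPs.

    genotypes: list of 2-character strings such as 'AA' or 'AG', in map order.
    min_snps: hypothetical minimum run length in SNPs.
    """
    runs, start = [], None
    for i, g in enumerate(genotypes):
        homozygous = g[0] == g[1]
        if homozygous and start is None:
            start = i
        elif not homozygous and start is not None:
            if i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs

def shared_roh(affected_runs):
    """Return sorted SNP indices that lie inside an ROH in every affected individual."""
    sets = [set(i for s, e in runs for i in range(s, e + 1)) for runs in affected_runs]
    return sorted(set.intersection(*sets)) if sets else []

# Toy genotypes for two affected relatives (hypothetical data).
patient1 = ["AG", "AA", "GG", "CC", "TT", "AG", "CC"]
patient2 = ["AA", "AA", "GG", "CC", "AG", "TT", "CC"]
runs1 = runs_of_homozygosity(patient1)
runs2 = runs_of_homozygosity(patient2)
candidates = shared_roh([runs1, runs2])
```

    Variants from sequencing that fall outside `candidates` could then be deprioritized, which is the filtering role the abstract describes.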

  19. Evaluation of preprocessing, mapping and postprocessing algorithms for analyzing whole genome bisulfite sequencing data.

    PubMed

    Tsuji, Junko; Weng, Zhiping

    2016-11-01

    Cytosine methylation regulates many biological processes such as gene expression, chromatin structure and chromosome stability. The whole genome bisulfite sequencing (WGBS) technique measures the methylation level at each cytosine throughout the genome. There are an increasing number of publicly available pipelines for analyzing WGBS data, reflecting many choices of read mapping algorithms as well as preprocessing and postprocessing methods. We simulated single-end and paired-end reads based on three experimental data sets, and comprehensively evaluated 192 combinations of three preprocessing, five postprocessing and five widely used read mapping algorithms. We also compared paired-end data with single-end data at the same sequencing depth for performance of read mapping and methylation level estimation. Bismark and LAST were the most robust mapping algorithms. We found that Mott trimming and quality filtering individually improved the performance of both read mapping and methylation level estimation, but combining them did not lead to further improvement. Furthermore, we confirmed that paired-end sequencing reduced error rate and enhanced sensitivity for both read mapping and methylation level estimation, especially for short reads and in repetitive regions of the human genome.
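
    Methylation level estimation at a single cytosine reduces to a ratio of read counts: bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C. A minimal sketch follows; the per-site counts and the function name are hypothetical, and strand handling and mapping are left out.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads supporting methylation at one cytosine: C / (C + T).

    Returns None when the site has no coverage.
    """
    total = c_reads + t_reads
    return c_reads / total if total else None

# Hypothetical per-site counts: position -> (C reads, T reads).
counts = {101: (8, 2), 154: (0, 10), 230: (0, 0)}
levels = {pos: methylation_level(c, t) for pos, (c, t) in counts.items()}
```

    Returning `None` for uncovered sites keeps them distinct from sites that are covered but fully unmethylated.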

  20. Draft Genome Sequences of Nine Pseudomonas aeruginosa Strains, Including Eight Clinical Isolates

    PubMed Central

    Cunningham, Scott A.; Quest, Daniel; Sikkink, Robert A.; O’Brien, Daniel; Eckloff, Bruce W.; Patel, Robin

    2015-01-01

    We report on nine draft genomes of Pseudomonas aeruginosa isolates, assembled using a hybrid paired-end and Nextera mate-pair library approach. Eight are of clinical origin, and one is the ATCC 27853 strain. We also report their multilocus sequence types. PMID:26450729

  1. CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping.

    PubMed

    Nguyen, Tung; Shi, Weisong; Ruden, Douglas

    2011-06-06

    Research in genetics has developed rapidly recently due to the aid of next generation sequencing (NGS). However, massively-parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has been adopted by many organizations recently, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not provide adequate primary functions like bisulfite, pair-end mapping, etc., in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and, therefore, are inefficient. Last, it is difficult for a majority of biologists untrained in programming skills to use these tools because most were developed on Linux with a command line interface. To urge the trend of using Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner is from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http://cloudaligner.sourceforge.net/ and its web version is at http
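
    The map-only design credited here with the performance gain can be pictured as independent map tasks whose outputs are simply concatenated, with no reduce phase to merge or regroup them. The sketch below is a plain-Python illustration, not CloudAligner's code or the Hadoop API: the k-mer seed index, the Hamming-distance check and every name are assumptions, and only the reads are partitioned, whereas the abstract says the reference is partitioned as well.

```python
def build_index(reference, k):
    """Map every k-mer of the reference to its 0-based start positions."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index

def map_task(reads, reference, index, k, max_mismatches=1):
    """One map task: seed each read with its first k bases, then verify by Hamming distance."""
    results = []
    for read in reads:
        for pos in index.get(read[:k], []):
            window = reference[pos:pos + len(read)]
            if len(window) == len(read):
                mismatches = sum(a != b for a, b in zip(read, window))
                if mismatches <= max_mismatches:
                    results.append((read, pos, mismatches))
    return results

def run_map_only(reads, reference, k=4, n_splits=2):
    """Split the reads, run each split through map_task, and concatenate the outputs."""
    index = build_index(reference, k)
    splits = [reads[i::n_splits] for i in range(n_splits)]
    output = []
    for split in splits:  # on a cluster these tasks would run in parallel
        output.extend(map_task(split, reference, index, k))
    return output

alignments = run_map_only(["ACGGTC", "GTCAGA"], "ACGTACGGTCAGT")
```

    Since each read is aligned entirely within one map task, nothing needs to be shuffled between tasks, which is why dropping the reduce phase loses no information in this toy setting.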

  2. CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping

    PubMed Central

    2011-01-01

    Background: Research in genetics has developed rapidly recently due to the aid of next generation sequencing (NGS). However, massively-parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has been adopted by many organizations recently, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not provide adequate primary functions like bisulfite, pair-end mapping, etc., in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and, therefore, are inefficient. Last, it is difficult for a majority of biologists untrained in programming skills to use these tools because most were developed on Linux with a command line interface. Results: To urge the trend of using Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner is from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http://cloudaligner.sourceforge.net/ and its web version

  3. High resolution genetic mapping by genome sequencing reveals genome duplication and tetraploid genetic structure of the diploid Miscanthus sinensis.

    PubMed

    Ma, Xue-Feng; Jensen, Elaine; Alexandrov, Nickolai; Troukhan, Maxim; Zhang, Liping; Thomas-Jones, Sian; Farrar, Kerrie; Clifton-Brown, John; Donnison, Iain; Swaller, Timothy; Flavell, Richard

    2012-01-01

    We have created a high-resolution linkage map of Miscanthus sinensis, using genotyping-by-sequencing (GBS), identifying all 19 linkage groups for the first time. The result is technically significant since Miscanthus has a very large and highly heterozygous genome, but little or no genomic information has been available for it to date. The composite linkage map containing markers from both parental linkage maps is composed of 3,745 SNP markers spanning 2,396 cM on 19 linkage groups with a 0.64 cM average resolution. Comparative genomics analyses of the M. sinensis composite linkage map to the genomes of sorghum, maize, rice, and Brachypodium distachyon indicate that sorghum has the closest syntenic relationship to Miscanthus compared to other species. The comparative results revealed that each pair of the 19 M. sinensis linkage groups aligned to one sorghum chromosome, except for LG8, which mapped to two sorghum chromosomes (4 and 7), presumably due to a chromosome fusion event after genome duplication. The data also revealed several other chromosome rearrangements relative to sorghum, including two telomere-centromere inversions of the sorghum syntenic chromosome 7 in LG8 of M. sinensis and two paracentric inversions of sorghum syntenic chromosome 4 in LG7 and LG8 of M. sinensis. The results clearly demonstrate, for the first time, that the diploid M. sinensis is of tetraploid origin, consisting of two sub-genomes. This complete and high resolution composite linkage map will not only serve as a useful resource for novel QTL discoveries, but also enable informed deployment of the wealth of existing genomics resources of other species to the improvement of Miscanthus as a high biomass energy crop. In addition, it has utility as a reference for genome sequence assembly for the forthcoming whole genome sequencing of the Miscanthus genus.
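
    The 0.64 cM average resolution quoted above can be checked from the other figures in the abstract. The sketch below is only an arithmetic illustration; the function names are hypothetical, and the second function shows an alternative convention (mean gap between adjacent markers) that the abstract does not claim to use.

```python
# Figures quoted in the abstract above.
N_MARKERS = 3745         # SNP markers on the composite map
MAP_LENGTH_CM = 2396.0   # total map length in centimorgans
N_LINKAGE_GROUPS = 19


def average_resolution(map_length_cm: float, n_markers: int) -> float:
    """Mean map length per marker, in cM."""
    return map_length_cm / n_markers


def average_interval(map_length_cm: float, n_markers: int, n_groups: int) -> float:
    """Mean gap between adjacent markers, in cM.

    A linkage group holding k markers has k - 1 intervals, so the whole
    map has n_markers - n_groups intervals.
    """
    return map_length_cm / (n_markers - n_groups)


resolution = average_resolution(MAP_LENGTH_CM, N_MARKERS)
interval = average_interval(MAP_LENGTH_CM, N_MARKERS, N_LINKAGE_GROUPS)
print(round(resolution, 2), round(interval, 2))
```

    Both conventions round to the same two-decimal value here because the number of markers is far larger than the number of linkage groups.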

  4. Whole genome mapping as a fast-track tool to assess genomic stability of sequenced Staphylococcus aureus strains.

    PubMed

    Sabirova, Julia S; Xavier, Basil Britto; Ieven, Margareta; Goossens, Herman; Malhotra-Kumar, Surbhi

    2014-10-08

    Whole genome (optical) mapping (WGM), a state-of-the-art mapping technology based on the generation of high resolution restriction maps, has so far been used for typing clinical outbreak strains and for mapping de novo sequence contigs in genome sequencing projects. We employed WGM to assess the genomic stability of previously sequenced Staphylococcus aureus strains that are commonly used in laboratories as reference standards. S. aureus strains (n = 12) were mapped on the Argus™ Optical Mapping System (Opgen Inc, Gaithersburg, USA). Assembly of NcoI-restricted DNA molecules, visualization, and editing of whole genome maps were performed using MapManager and MapSolver software (Opgen Inc). In silico whole genome NcoI-restricted maps were also generated from available sequence data, and compared to the laboratory-generated maps. Strains showing differences between the two maps were resequenced using Nextera XT DNA Sample Preparation Kit and Miseq Reagent Kit V2 (MiSeq, Illumina) and de novo assembled into sequence contigs using the Velvet assembly tool. Sequence data were correlated with corresponding whole genome maps to perform contig mapping and genome assembly using MapSolver. Of the twelve strains tested, one (USA300_FPR3757) showed a 19-kbp deletion on WGM compared to its in silico generated map and reference sequence data. Resequencing of USA300_FPR3757 identified the deleted fragment to be a 13 kbp-long integrative conjugative element ICE6013. Frequent subculturing and inter-laboratory transfers can induce genomic, and therefore phenotypic, changes that could compromise the utility of standard reference strains. WGM can thus be used as a rapid genome screening method to identify genomic rearrangements whose size and type can be confirmed by sequencing.
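
    An in silico restriction map of the kind described above can be sketched in a few lines. This is a minimal illustration, not the MapSolver procedure: it treats the sequence as linear (bacterial chromosomes are circular), ignores fragment-size resolution limits of optical mapping, and uses a made-up toy sequence. The NcoI recognition site CCATGG and its cut position after the first C are standard enzyme properties.

```python
import re

NCOI_SITE = "CCATGG"   # NcoI recognition sequence
NCOI_CUT_OFFSET = 1    # NcoI cuts after the first C: C^CATGG


def in_silico_digest(sequence, site=NCOI_SITE, cut_offset=NCOI_CUT_OFFSET):
    """Return ordered fragment lengths (bp) for a LINEAR sequence.

    Simplification: a circular genome would join the first and last
    fragments into one.
    """
    seq = sequence.upper()
    # A lookahead pattern reports every site start, including overlapping ones.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical toy sequence with two NcoI sites.
toy = "AAAA" + "CCATGG" + "TTTTTT" + "CCATGG" + "GG"
fragments = in_silico_digest(toy)
print(fragments)
```

    Comparing such an ordered fragment list against the laboratory map is what exposes a missing or shortened fragment, such as the deletion reported for USA300_FPR3757.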

  5. OmniMapFree: A unified tool to visualise and explore sequenced genomes

    PubMed Central

    2011-01-01

    • Background Acquiring and exploring whole genome sequence information for a species under investigation is now a routine experimental approach. On most genome browsers, typically, only the DNA sequence, EST support, motif search results, and GO annotations are displayed. However, for many species, a growing volume of additional experimental information is available but this is rarely searchable within the landscape of the entire genome. • Results We have developed generic software that permits users to view a single genome in its entirety, either within its chromosome or supercontig context, within a single window. This software permits the genome to be displayed at any scale and with any features. Different data types and data sets are displayed onto the genome, which have been acquired from other types of studies including classical genetics, forward and reverse genetics, transcriptomics, proteomics and improved annotation from alternative sources. In each display, different types of information can be overlapped, then retrieved in the desired combinations and scales and used in follow up analyses. The displays generated are of publication quality. • Conclusions OmniMapFree provides a unified, versatile and easy-to-use software tool for studying a single genome in association with all the other datasets and data types available for the organism. PMID:22085540

  6. Sequence and genetic map of Meloidogyne hapla: A compact nematode genome for plant parasitism

    PubMed Central

    Opperman, Charles H.; Bird, David M.; Williamson, Valerie M.; Rokhsar, Dan S.; Burke, Mark; Cohn, Jonathan; Cromer, John; Diener, Steve; Gajan, Jim; Graham, Steve; Houfek, T. D.; Liu, Qingli; Mitros, Therese; Schaff, Jennifer; Schaffer, Reenah; Scholl, Elizabeth; Sosinski, Bryon R.; Thomas, Varghese P.; Windham, Eric

    2008-01-01

    We have established Meloidogyne hapla as a tractable model plant-parasitic nematode amenable to forward and reverse genetics, and we present a complete genome sequence. At 54 Mbp, M. hapla represents not only the smallest nematode genome yet completed, but also the smallest metazoan, and defines a platform to elucidate mechanisms of parasitism by what is the largest uncontrolled group of plant pathogens worldwide. The M. hapla genome encodes significantly fewer genes than does the free-living nematode Caenorhabditis elegans (most notably through a reduction of odorant receptors and other gene families), yet it has acquired horizontally from other kingdoms numerous genes suspected to be involved in adaptations to parasitism. In some cases, amplification and tandem duplication have occurred with genes suspected of being acquired horizontally and involved in parasitism of plants. Although M. hapla and C. elegans diverged >500 million years ago, many developmental and biochemical pathways, including those for dauer formation and RNAi, are conserved. Although overall genome organization is not conserved, there are areas of microsynteny that may suggest a primary biological function in nematodes for those genes in these areas. This sequence and map represent a wealth of biological information on both the nature of nematode parasitism of plants and its evolution. PMID:18809916

  7. Next-Gen Sequencing-Based Mapping and Identification of Ethyl Methanesulfonate-Induced Mutations in Arabidopsis thaliana.

    PubMed

    Zhang, Xue-Cheng; Millet, Yves; Ausubel, Frederick M; Borowsky, Mark

    2014-10-01

    Forward genetic analysis using ethyl methanesulfonate (EMS) mutagenesis has proven to be a powerful tool in biological research, but identification and cloning of causal mutations by conventional genetic mapping approaches is a painstaking process. Recent advances in next-gen sequencing have greatly invigorated the process of identifying EMS-induced mutations corresponding to a specific phenotype in model genetic hosts, including the plant Arabidopsis thaliana and the nematode Caenorhabditis elegans. Next-gen sequencing of bulked F2 mutant recombinants produces a wealth of high-resolution genetic data, provides enhanced delimitation of the genomic location of mutations, and greatly reduces hands-on time while maintaining high accuracy and reproducibility. In this unit, a detailed procedure to simultaneously map and identify EMS mutations in Arabidopsis is described.

  8. Fast and cost-effective genetic mapping in apple using next-generation sequencing.

    PubMed

    Gardner, Kyle M; Brown, Patrick; Cooke, Thomas F; Cann, Scott; Costa, Fabrizio; Bustamante, Carlos; Velasco, Riccardo; Troggio, Michela; Myles, Sean

    2014-07-16

    Next-generation DNA sequencing (NGS) produces vast amounts of DNA sequence data, but it is not specifically designed to generate data suitable for genetic mapping. Recently developed DNA library preparation methods for NGS have helped solve this problem, however, by combining the use of reduced representation libraries with DNA sample barcoding to generate genome-wide genotype data from a common set of genetic markers across a large number of samples. Here we use such a method, called genotyping-by-sequencing (GBS), to produce a data set for genetic mapping in an F1 population of apples (Malus × domestica) segregating for skin color. We show that GBS produces a relatively large, but extremely sparse, genotype matrix: over 270,000 SNPs were discovered but most SNPs have too much missing data across samples to be useful for genetic mapping. After filtering for genotype quality and missing data, only 6% of the 85 million DNA sequence reads contributed to useful genotype calls. Despite this limitation, using existing software and a set of simple heuristics, we generated a final genotype matrix containing 3967 SNPs from 89 DNA samples from a single lane of Illumina HiSeq and used it to create a saturated genetic linkage map and to identify a known QTL underlying apple skin color. We therefore demonstrate that GBS is a cost-effective method for generating genome-wide SNP data suitable for genetic mapping in a highly diverse and heterozygous agricultural species. We anticipate future improvements to the GBS analysis pipeline presented here that will enhance the utility of next-generation DNA sequence data for the purposes of genetic mapping across diverse species.
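
    The filtering step described above, dropping SNPs with too much missing data, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the 20% missingness threshold, the genotype encoding, and the toy matrix are all assumptions made for the example. The read count uses the 6% and 85 million figures from the abstract.

```python
TOTAL_READS = 85_000_000   # reads reported in the abstract
USEFUL_PERCENT = 6         # share contributing to useful genotype calls

# Integer arithmetic avoids floating-point rounding.
useful_reads = TOTAL_READS * USEFUL_PERCENT // 100


def missing_fraction(calls):
    """Fraction of samples with no genotype call (None) at one SNP."""
    return sum(call is None for call in calls) / len(calls)


def filter_snps(genotypes, max_missing=0.2):
    """Keep SNPs whose missing fraction is at most max_missing.

    genotypes maps a SNP id to a list of per-sample calls.
    max_missing=0.2 is a hypothetical threshold, not the paper's value.
    """
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if missing_fraction(calls) <= max_missing
    }


# Hypothetical sparse genotype matrix: 3 SNPs across 5 samples.
toy = {
    "snp_a": ["AA", "AB", "AB", "AA", "BB"],
    "snp_b": [None, None, "AB", None, "AA"],
    "snp_c": ["AB", None, "AA", "AB", "AB"],
}
kept = filter_snps(toy)
print(sorted(kept), useful_reads)
```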

  9. Mapping sensorimotor sequences to word sequences: a connectionist model of language acquisition and sentence generation.

    PubMed

    Takac, Martin; Benuskova, Lubica; Knott, Alistair

    2012-11-01

    In this article we present a neural network model of sentence generation. The network has both technical and conceptual innovations. Its main technical novelty is in its semantic representations: the messages which form the input to the network are structured as sequences, so that message elements are delivered to the network one at a time. Rather than learning to linearise a static semantic representation as a sequence of words, our network rehearses a sequence of semantic signals, and learns to generate words from selected signals. Conceptually, the network's use of rehearsed sequences of semantic signals is motivated by work in embodied cognition, which posits that the structure of semantic representations has its origin in the serial structure of sensorimotor processing. The rich sequential structure of the network's semantic inputs also allows it to incorporate certain Chomskyan ideas about innate syntactic knowledge and parameter-setting, as well as a more empiricist account of the acquisition of idiomatic syntactic constructions. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. Mapping Sensorimotor Sequences to Word Sequences: A Connectionist Model of Language Acquisition and Sentence Generation

    ERIC Educational Resources Information Center

    Takac, Martin; Benuskova, Lubica; Knott, Alistair

    2012-01-01

    In this article we present a neural network model of sentence generation. The network has both technical and conceptual innovations. Its main technical novelty is in its semantic representations: the messages which form the input to the network are structured as sequences, so that message elements are delivered to the network one at a time. Rather…

  12. Fine-mapping diabetes-related traits, including insulin resistance, in heterogeneous stock rats.

    PubMed

    Solberg Woods, Leah C; Holl, Katie L; Oreper, Daniel; Xie, Yuying; Tsaih, Shirng-Wern; Valdar, William

    2012-11-01

    Type 2 diabetes (T2D) is a disease of relative insulin deficiency resulting from both insulin resistance and beta cell failure. We have previously used heterogeneous stock (HS) rats to fine-map a locus for glucose tolerance. We show here that glucose intolerance in the founder strains of the HS colony is mediated by different mechanisms: insulin resistance in WKY and an insulin secretion defect in ACI, and we demonstrate a high degree of variability for measures of insulin resistance and insulin secretion in HS rats. As such, our goal was to use HS rats to fine-map several diabetes-related traits within a region on rat chromosome 1. We measured blood glucose and plasma insulin levels after a glucose tolerance test in 782 male HS rats. Using 97 SSLP markers, we genotyped a 68 Mb region on rat chromosome 1 previously implicated in glucose and insulin regulation. We used linkage disequilibrium mapping by mixed model regression with inferred descent to identify a region from 198.85 to 205.9 Mb that contains one or more quantitative trait loci (QTL) for fasting insulin and a measure of insulin resistance, the quantitative insulin sensitivity check index. This region also encompasses loci identified for fasting glucose and Insulin_AUC (area under the curve). A separate <3 Mb QTL was identified for body weight. Using a novel penalized regression method we then estimated effects of alternative haplotype pairings under each locus. These studies highlight the utility of HS rats for fine-mapping genetic loci involved in the underlying causes of T2D.
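
    The quantitative insulin sensitivity check index (QUICKI) named above is commonly defined as 1 / (log10(fasting insulin) + log10(fasting glucose)), with insulin in µU/mL and glucose in mg/dL. The sketch below applies that standard definition; the abstract does not state which units or variant the authors used, so the units and the example values are assumptions.

```python
import math


def quicki(fasting_insulin_uU_per_mL: float, fasting_glucose_mg_per_dL: float) -> float:
    """Standard QUICKI: 1 / (log10(insulin) + log10(glucose)).

    Lower values indicate lower insulin sensitivity (more resistance).
    """
    return 1.0 / (
        math.log10(fasting_insulin_uU_per_mL)
        + math.log10(fasting_glucose_mg_per_dL)
    )


# Hypothetical values chosen so the logarithms are whole numbers.
example = quicki(10.0, 100.0)   # 1 / (1 + 2)
print(round(example, 3))
```

    Because the index is a reciprocal of summed logarithms, higher fasting insulin at the same glucose level always lowers it.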

  13. Methylation-sensitive linking libraries enhance gene-enriched sequencing of complex genomes and map DNA methylation domains

    PubMed Central

    Nelson, William; Luo, Meizhong; Ma, Jianxin; Estep, Matt; Estill, James; He, Ruifeng; Talag, Jayson; Sisneros, Nicholas; Kudrna, David; Kim, HyeRan; Ammiraju, Jetty SS; Collura, Kristi; Bharti, Arvind K; Messing, Joachim; Wing, Rod A; SanMiguel, Phillip; Bennetzen, Jeffrey L; Soderlund, Carol

    2008-01-01

    Background Many plant genomes are resistant to whole-genome assembly due to an abundance of repetitive sequence, leading to the development of gene-rich sequencing techniques. Two such techniques are hypomethylated partial restriction (HMPR) and methylation spanning linker libraries (MSLL). These libraries differ from other gene-rich datasets in having larger insert sizes, and the MSLL clones are designed to provide reads localized to "epigenetic boundaries" where methylation begins or ends. Results A large-scale study in maize generated 40,299 HMPR sequences and 80,723 MSLL sequences, including MSLL clones exceeding 100 kb. The paired end reads of MSLL and HMPR clones were shown to be effective in linking existing gene-rich sequences into scaffolds. In addition, it was shown that the MSLL clones can be used for anchoring these scaffolds to a BAC-based physical map. The MSLL end reads effectively identified epigenetic boundaries, as indicated by their preferential alignment to regions upstream and downstream from annotated genes. The ability to precisely map long stretches of fully methylated DNA sequence is a unique outcome of MSLL analysis, and was also shown to provide evidence for errors in gene identification. MSLL clones were observed to be significantly more repeat-rich in their interiors than in their end reads, confirming the correlation between methylation and retroelement content. Both MSLL and HMPR reads were found to be substantially gene-enriched, with the SalI MSLL libraries being the most highly enriched (31% align to an EST contig), while the HMPR clones exhibited exceptional depletion of repetitive DNA (to ~11%). These two techniques were compared with other gene-enrichment methods, and shown to be complementary. Conclusion MSLL technology provides an unparalleled approach for mapping the epigenetic status of repetitive blocks and for identifying sequences mis-identified as genes. 
Although the types and natures of epigenetic boundaries are barely

  14. Evaluation and assessment of read-mapping by multiple next-generation sequencing aligners based on genome-wide characteristics.

    PubMed

    Thankaswamy-Kosalai, Subazini; Sen, Partho; Nookaew, Intawat

    2017-07-01

    Massive data produced by next-generation sequencing (NGS) technology are widely used for biological research and medical diagnosis. The crucial step in NGS analysis is read alignment or mapping, which is computationally intensive and complex. Mapping bias tends to affect the downstream analysis, including detection of polymorphisms. To provide guidelines to biologists for the selection of suitable aligners, we have evaluated and benchmarked 5 different aligners (BWA, Bowtie2, NovoAlign, Smalt and Stampy) and their mapping bias based on characteristics of 5 microbial genomes. Two million simulated read pairs of various sizes (36bp, 50bp, 72bp, 100bp, 125bp, 150bp, 200bp, 250bp and 300bp) were aligned. Specific alignment features such as sensitivity of mapping, percentage of properly paired reads, alignment time and effect of tandem repeats on incorrectly mapped reads were evaluated. BWA showed the fastest alignment, followed by Bowtie2 and Smalt. NovoAlign and Stampy were comparatively slower. Most of the aligners showed high sensitivity when mapping long reads (>100bp). NovoAlign, on the other hand, showed higher sensitivity when mapping both short reads (36bp, 50bp, 72bp) and long reads (>100bp); it also showed higher sensitivity when mapping a complex genome like Plasmodium falciparum. The percentages of properly paired reads aligned by NovoAlign, BWA and Stampy were markedly higher. None of the aligners outperforms the others across the benchmark; however, the aligners perform differently depending on genome characteristics. We expect that the results from this study will help the end user choose an aligner and thus enhance the accuracy of read mapping. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. Curriculum Mapping in Higher Education: A Case Study and Proposed Content Scope and Sequence Mapping Tool

    ERIC Educational Resources Information Center

    Arafeh, Sousan

    2016-01-01

    Best practice in curriculum development and implementation requires that discipline-based standards or requirements embody both curricular and programme scopes and sequences. Ensuring these are present and aligned in course/programme content, activities and assessments to support student success requires formalised and systematised review and…

  16. Hiding message into DNA sequence through DNA coding and chaotic maps.

    PubMed

    Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

    2014-09-01

    The paper proposes an improved reversible substitution method to hide data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance the robustness and enlarge the hiding capacity: encoding the secret message by DNA coding, encrypting it with a pseudo-random sequence, generating the relative hiding locations with a piecewise linear chaotic map, and embedding the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than the competing methods with respect to robustness and capacity.
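
    Two of the building blocks named above, DNA coding and a piecewise linear chaotic map (PWLCM), can be sketched as follows. This is an illustration under stated assumptions, not the paper's scheme: the 2-bit coding rule (00→A, 01→C, 10→G, 11→T) is one of several possible rules, the PWLCM is the textbook form with control parameter p in (0, 0.5), and `chaotic_offsets`, the seed, and `max_gap` are hypothetical. The encryption and complementary-rule embedding steps are not shown.

```python
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}  # assumed coding rule
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_dna(message: bytes) -> str:
    """Encode bytes as DNA, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in message)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_dna(dna: str) -> bytes:
    """Invert encode_dna."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def pwlcm(x: float, p: float) -> float:
    """One iteration of the piecewise linear chaotic map on [0, 1)."""
    if x >= 0.5:
        x = 1.0 - x              # the map is symmetric about 0.5
    if x < p:
        return x / p
    return (x - p) / (0.5 - p)


def chaotic_offsets(x0: float, p: float, count: int, max_gap: int):
    """Derive `count` relative hiding offsets in 1..max_gap from the map."""
    offsets, x = [], x0
    for _ in range(count):
        x = pwlcm(x, p)
        offsets.append(1 + int(x * max_gap) % max_gap)
    return offsets


encoded = encode_dna(b"Hi")
offsets = chaotic_offsets(x0=0.3141, p=0.27, count=len(encoded), max_gap=8)
print(encoded, offsets)
```

    Because the offsets depend only on the seed `x0` and parameter `p`, a receiver who knows both can regenerate the same locations, which is why such parameters can serve as part of the key.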

  17. Substrate-Driven Mapping of the Degradome by Comparison of Sequence Logos

    PubMed Central

    Fuchs, Julian E.; von Grafenstein, Susanne; Huber, Roland G.; Kramer, Christian; Liedl, Klaus R.

    2013-01-01

    Sequence logos are frequently used to illustrate substrate preferences and specificity of proteases. Here, we employed the compiled substrates of the MEROPS database to introduce a novel metric for comparison of protease substrate preferences. The constructed similarity matrix of 62 proteases can be used to intuitively visualize similarities in protease substrate readout via principal component analysis and construction of protease specificity trees. Since our new metric is solely based on substrate data, we can engraft the protease tree including proteolytic enzymes of different evolutionary origin. Thereby, our analyses confirm pronounced overlaps in substrate recognition not only between proteases closely related by sequence but also between proteolytic enzymes of different evolutionary origin and catalytic type. To illustrate the applicability of our approach we analyze the distribution of targets of small molecules from the ChEMBL database in our substrate-based protease specificity trees. We observe a striking clustering of annotated targets in tree branches even though these grouped targets do not necessarily share similarity at the protein sequence level. This highlights the value and applicability of knowledge acquired from peptide substrates in drug design of small molecules, e.g., for the prediction of off-target effects or drug repurposing. Consequently, our similarity metric allows mapping of the degradome and its associated drug target network via comparison of known substrate peptides. The substrate-driven view of protein-protein interfaces is not limited to the field of proteases but can be applied to any target class where a sufficient amount of known substrate data is available. PMID:24244149

  18. Rapid restriction mapping of cosmids by sequence-specific triple-helix-mediated affinity capture

    SciTech Connect

    Ji, Huamin; Francisco, T.; Smith, L.M.; Guilfoyle, R.A.

    1996-01-15

    A simple and rapid strategy for restriction mapping based on sequence-specific triple-helix affinity capture (TAC) was developed. The strategy was applied to the analysis of cosmid clones by the construction of a new cosmid vector, ScosTriplex-II, containing two different triple-helix-forming sequences flanking the cloning site of the original SuperCos-1 cosmid vector. For restriction mapping, the recombinant cosmid DNA is digested with NotI restriction enzyme or with one of four intron-encoded endonucleases for excision of intact inserts followed by controlled partial digestion with a mapping enzyme used in conjunction with the corresponding methyltransferase. The partial digestion products are combined with biotinylated triple-helix-forming oligonucleotides to form a triple-helical complex. The triple-helix complexes are immobilized on streptavidin-coated magnetic beads, washed, and eluted with pH 9 buffer solution. The fragments are separated and directly sized by agarose gel electrophoresis. Bidirectional maps are obtained simultaneously by binding to the two different triple-helix-forming oligonucleotides. No probe labeling, gel drying, blotting to membranes, hybridization, or autoradiography is necessary. Also, TAC conditions that permit gel-free isolation of the terminal restriction fragments from cosmid inserts were found. These advantages afforded by ScosTriplex-II should facilitate the automation of cosmid restriction site fingerprinting needed for large-scale mapping and sequencing projects. 24 refs., 5 figs.
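
    The mapping logic described above can be sketched briefly. Every captured partial-digest fragment contains the anchored end of the insert, so its length gives the distance from that end to a restriction site, and capture from both ends yields two maps that should agree. The sketch below is an illustration only: the function names, the 40 kb insert, the fragment sizes, and the 500 bp sizing tolerance are all hypothetical.

```python
def sites_from_anchored_fragments(fragment_sizes, insert_length, from_right=False):
    """Convert end-anchored fragment sizes (bp) to site positions from the left end."""
    sites = []
    for size in fragment_sizes:
        if size >= insert_length:
            continue  # the undigested full-length insert marks no internal site
        sites.append(insert_length - size if from_right else size)
    return sorted(sites)


def merge_maps(left_sites, right_sites, tolerance):
    """Cluster positions from both maps that lie within `tolerance` bp and average them."""
    clusters = []
    for pos in sorted(left_sites + right_sites):
        if clusters and pos - clusters[-1][-1] <= tolerance:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return [sum(cluster) / len(cluster) for cluster in clusters]


INSERT_LENGTH = 40_000                      # hypothetical cosmid insert, bp
left_sizes = [5_000, 12_000, 30_000, 40_000]     # captured via the left-end oligo
right_sizes = [10_200, 27_900, 35_000, 40_000]   # captured via the right-end oligo

left_map = sites_from_anchored_fragments(left_sizes, INSERT_LENGTH)
right_map = sites_from_anchored_fragments(right_sizes, INSERT_LENGTH, from_right=True)
consensus = merge_maps(left_map, right_map, tolerance=500)
print(left_map, right_map, consensus)
```

    Averaging the two directions is one simple way to reduce gel sizing error; sites seen from only one end would show up as single-member clusters.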

  19. Simple sequence repeat-based consensus linkage map of Bombyx mori.

    PubMed

    Miao, Xue-Xia; Xu, Shi-Jie; Li, Ming-Hui; Li, Mu-Wang; Huang, Jian-Hua; Dai, Fang-Yin; Marino, Susan W; Mills, David R; Zeng, Peiyu; Mita, Kazuei; Jia, Shi-Hai; Zhang, Yong; Liu, Wen-Bin; Xiang, Hui; Guo, Qiu-Hong; Xu, An-Ying; Kong, Xiang-Yin; Lin, Hong-Xuan; Shi, Yao-Zhou; Lu, Gang; Zhang, Xianglin; Huang, Wei; Yasukochi, Yuji; Sugasaki, Toshiyuki; Shimada, Toru; Nagaraju, Javaregowda; Xiang, Zhong-Huai; Wang, Sheng-Yue; Goldsmith, Marian R; Lu, Cheng; Zhao, Guo-Ping; Huang, Yong-Ping

    2005-11-08

    We established a genetic linkage map employing 518 simple sequence repeat (SSR, or microsatellite) markers for Bombyx mori (silkworm), the economically and culturally important lepidopteran insect, as part of an international genomics program. A survey of six representative silkworm strains using 2,500 (CA)n- and (CT)n-based SSR markers revealed 17-24% polymorphism, indicating a high degree of homozygosity resulting from a long history of inbreeding. Twenty-nine SSR linkage groups were established in well characterized Dazao and C108 strains based on genotyping of 189 backcross progeny derived from an F(1) male mated with a C108 female. The clustering was further focused to 28 groups by genotyping 22 backcross progeny derived from an F(1) female mated with a C108 male. This set of SSR linkage groups was further assigned to the 28 chromosomes (established linkage groups) of silkworm aided by visible mutations and cleaved amplified polymorphic sequence markers developed from previously mapped genes, cDNA sequences, and cloned random amplified polymorphic DNAs. By integrating a visible mutation p (plain, larval marking) and 29 well conserved genes of insects onto this SSR-based linkage map, a second generation consensus silkworm genetic map with a range of 7-40 markers per linkage group and a total map length of approximately 3431.9 cM was constructed and its high efficiency for genotyping and potential application for synteny studies of Lepidoptera and other insects was demonstrated.

  20. Linkage Mapping and Comparative Genomics of Red Drum (Sciaenops ocellatus) Using Next-Generation Sequencing.

    PubMed

    Hollenbeck, Christopher M; Portnoy, David S; Wetzel, Dana; Sherwood, Tracy A; Samollow, Paul B; Gold, John R

    2017-03-10

    Developments in next-generation sequencing allow genotyping of thousands of genetic markers across hundreds of individuals in a cost-effective manner. Because of this, it is now possible to rapidly produce dense genetic linkage maps for nonmodel species. Here, we report a dense genetic linkage map for red drum, a marine fish species of considerable economic importance in the southeastern United States and elsewhere. We used a prior microsatellite-based linkage map as a framework and incorporated 1794 haplotyped contigs derived from high-throughput, reduced representation DNA sequencing to produce a linkage map containing 1794 haplotyped restriction-site associated DNA (RAD) contigs, 437 anonymous microsatellites, and 44 expressed sequence-tag-linked microsatellites (EST-SSRs). A total of 274 candidate genes, identified from transcripts from a preliminary hydrocarbon exposure study, were localized to specific chromosomes, using a shared synteny approach. The linkage map will be a useful resource for red drum commercial and restoration aquaculture, and for better understanding and managing populations of red drum in the wild.

  1. Linkage Mapping and Comparative Genomics of Red Drum (Sciaenops ocellatus) Using Next-Generation Sequencing

    PubMed Central

    Hollenbeck, Christopher M.; Portnoy, David S.; Wetzel, Dana; Sherwood, Tracy A.; Samollow, Paul B.; Gold, John R.

    2017-01-01

    Developments in next-generation sequencing allow genotyping of thousands of genetic markers across hundreds of individuals in a cost-effective manner. Because of this, it is now possible to rapidly produce dense genetic linkage maps for nonmodel species. Here, we report a dense genetic linkage map for red drum, a marine fish species of considerable economic importance in the southeastern United States and elsewhere. We used a prior microsatellite-based linkage map as a framework and incorporated 1794 haplotyped contigs derived from high-throughput, reduced representation DNA sequencing to produce a linkage map containing 1794 haplotyped restriction-site associated DNA (RAD) contigs, 437 anonymous microsatellites, and 44 expressed sequence-tag-linked microsatellites (EST-SSRs). A total of 274 candidate genes, identified from transcripts from a preliminary hydrocarbon exposure study, were localized to specific chromosomes, using a shared synteny approach. The linkage map will be a useful resource for red drum commercial and restoration aquaculture, and for better understanding and managing populations of red drum in the wild. PMID:28122951

  2. How Children Aged Seven to Twelve Organize the Opening Sequence in a Map Task

    ERIC Educational Resources Information Center

    Filipi, Anna

    2016-01-01

    Using the methods of conversation analysis, the opening sequences of a map task in the interactions of sixteen children aged seven to twelve were analyzed. The analytical concerns driving the study were who started, how they started, and how children dealt with differential access to information and the identification of phases within the opening.…

  3. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    USDA-ARS's Scientific Manuscript database

    Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker d...

  4. Mapping sequenced E.coli genes by computer: software, strategies and examples.

    PubMed Central

    Rudd, K E; Miller, W; Werner, C; Ostell, J; Tolstoshev, C; Satterfield, S G

    1991-01-01

    Methods are presented for organizing and integrating DNA sequence data, restriction maps, and genetic maps for the same organism but from a variety of sources (databases, publications, personal communications). Proper software tools are essential for successful organization of such diverse data into an ordered, cohesive body of information, and a suite of novel software to support this endeavor is described. Though these tools automate much of the task, a variety of strategies is needed to cope with recalcitrant cases. We describe such strategies and illustrate their application with numerous examples. These strategies have allowed us to order, analyze, and display over one megabase of E. coli DNA sequence information. The integration task often exposes inconsistencies in the available data, perhaps caused by strain polymorphisms or human oversight, necessitating the application of sound biological judgment. The examples illustrate both the level of expertise required of the database curator and the knowledge gained as apparent inconsistencies are resolved. The software and mapping methods are applicable to the study of any genome for which a high resolution restriction map is available. They were developed to support a weakly coordinated sequencing effort involving many laboratories, but would also be useful for highly orchestrated sequencing projects. PMID:2011534

  5. Raman-based system for DNA sequencing-mapping and other separations

    DOEpatents

    Vo-Dinh, Tuan

    1994-01-01

    DNA sequencing and mapping are performed by using a Raman spectrometer with a surface enhanced Raman scattering (SERS) substrate to enhance the Raman signal. A SERS label is attached to a DNA fragment and then analyzed with the Raman spectrometer to identify the DNA fragment according to characteristics of the Raman spectrum generated.

  6. How Children Aged Seven to Twelve Organize the Opening Sequence in a Map Task

    ERIC Educational Resources Information Center

    Filipi, Anna

    2016-01-01

    Using the methods of conversation analysis, the opening sequences of a map task in the interactions of sixteen children aged seven to twelve were analyzed. The analytical concerns driving the study were who started, how they started, and how children dealt with differential access to information and the identification of phases within the opening.…

  7. A physical map of the X chromosome of Drosophila melanogaster: Cosmid contigs and sequence tagged sites

    SciTech Connect

    Madueno, E.; Modolell, J.; Papagiannakis, G.

    1995-04-01

    A physical map of the euchromatic X chromosome of Drosophila melanogaster has been constructed by assembling contiguous arrays of cosmids that were selected by screening a library with DNA isolated from microamplified chromosomal divisions. This map, consisting of 893 cosmids, covers ~64% of the euchromatic part of the chromosome. In addition, 568 sequence tagged sites (STS), in aggregate representing 120 kb of sequenced DNA, were derived from selected cosmids. Most of these STSs, spaced at an average distance of ~35 kb along the euchromatic region of the chromosome, represent DNA tags that can be used as entry points to the fruitfly genome. Furthermore, 42 genes have been placed on the physical map, either through the hybridization of specific probes to the cosmids or through the fact that they were represented among the STSs. These provide a link between the physical and the genetic maps of D. melanogaster. Nine novel genes have been tentatively identified in Drosophila on the basis of matches between STS sequences and sequences from other species. 32 refs., 3 figs., 4 tabs.

  8. Raman-based system for DNA sequencing-mapping and other separations

    DOEpatents

    Vo-Dinh, T.

    1994-04-26

    DNA sequencing and mapping are performed by using a Raman spectrometer with a surface enhanced Raman scattering (SERS) substrate to enhance the Raman signal. A SERS label is attached to a DNA fragment and then analyzed with the Raman spectrometer to identify the DNA fragment according to characteristics of the Raman spectrum generated. 11 figures.

  9. A Physical Map of the X Chromosome of Drosophila Melanogaster: Cosmid Contigs and Sequence Tagged Sites

    PubMed Central

    Madueno, E.; Papagiannakis, G.; Rimmington, G.; Saunders, RDC.; Savakis, C.; Siden-Kiamos, I.; Skavdis, G.; Spanos, L.; Trenear, J.; Adam, P.; Ashburner, M.; Benos, P.; Bolshakov, V. N.; Coulson, D.; Glover, D. M.; Herrmann, S.; Kafatos, F. C.; Louis, C.; Majerus, T.; Modolell, J.

    1995-01-01

    A physical map of the euchromatic X chromosome of Drosophila melanogaster has been constructed by assembling contiguous arrays of cosmids that were selected by screening a library with DNA isolated from microamplified chromosomal divisions. This map, consisting of 893 cosmids, covers ~64% of the euchromatic part of the chromosome. In addition, 568 sequence tagged sites (STS), in aggregate representing 120 kb of sequenced DNA, were derived from selected cosmids. Most of these STSs, spaced at an average distance of ~35 kb along the euchromatic region of the chromosome, represent DNA tags that can be used as entry points to the fruitfly genome. Furthermore, 42 genes have been placed on the physical map, either through the hybridization of specific probes to the cosmids or through the fact that they were represented among the STSs. These provide a link between the physical and the genetic maps of D. melanogaster. Nine novel genes have been tentatively identified in Drosophila on the basis of matches between STS sequences and sequences from other species. PMID:7789765

  10. A complete DNA sequence map of the ovine Major Histocompatibility Complex

    PubMed Central

    2010-01-01

    Background: The ovine Major Histocompatibility Complex (MHC) harbors clusters of genes involved in overall resistance/susceptibility of an animal to infectious pathogens. However, only a limited number of ovine MHC genes have been identified and no adequate sequence information is available, as compared to those of swine and bovine. We previously constructed a BAC clone-based physical map that covers the entire class I, class II and class III regions of the ovine MHC. Here we describe the assembling of a complete DNA sequence map for the ovine MHC by shotgun sequencing of 26 overlapping BAC clones. Results: DNA shotgun sequencing generated approximately 8-fold genome equivalent data that were successfully assembled into a finished sequence map of the ovine MHC. The sequence map spans approximately 2,434,000 nucleotides in length, covering almost all of the MHC loci currently known in the sheep and cattle. Gene annotation resulted in the identification of 177 protein-coding genes/ORFs, among which 145 were not previously reported in the sheep, and 10 were ovine species specific, absent in cattle or other mammals. A comparative sequence analysis among human, sheep and cattle revealed a high conservation in the MHC structure and loci order except for the class II region, which was divided into IIa and IIb subregions in the sheep and cattle, separated by a large piece of non-MHC autosome of approximately 18.5 Mb. In addition, a total of 18 non-protein-coding microRNAs were predicted in the ovine MHC region for the first time. Conclusion: An ovine MHC DNA sequence map was successfully assembled by shotgun sequencing of 26 overlapping BAC clones. This makes the sheep the second ruminant species for which the complete MHC sequence information is available for evolution and functional studies, following that of the bovine. The results of the comparative analysis support a hypothesis that an inversion of the ancestral chromosome containing the MHC has shaped the MHC structures of ruminants.

  11. Quantifying the mapping precision of genome-wide association studies using whole-genome sequencing data.

    PubMed

    Wu, Yang; Zheng, Zhili; Visscher, Peter M; Yang, Jian

    2017-05-16

    Understanding the mapping precision of genome-wide association studies (GWAS), that is, the physical distances between the top associated single-nucleotide polymorphisms (SNPs) and the causal variants, is essential to design fine-mapping experiments for complex traits and diseases. Using simulations based on whole-genome sequencing (WGS) data from 3642 unrelated individuals of European descent, we show that the association signals at rare causal variants (minor allele frequency ≤ 0.01) are very unlikely to be mapped to common variants in GWAS using either WGS data or imputed data and vice versa. We predict that at least 80% of the common variants identified from published GWAS using imputed data are within 33.5 Kbp of the causal variants, a resolution that is comparable with that using WGS data. Mapping precision at these loci will improve with increasing sample sizes of GWAS in the future. For rare variants, the mapping precision of GWAS using WGS data is extremely high, suggesting WGS is an efficient strategy to detect and fine-map rare variants simultaneously. We further assess the mapping precision by linkage disequilibrium between GWAS hits and causal variants and develop an online tool (gwasMP) to query our results with different thresholds of physical distance and/or linkage disequilibrium (http://cnsgenomics.com/shiny/gwasMP). Our findings provide a benchmark to inform future design and development of fine-mapping experiments and technologies to pinpoint the causal variants at GWAS loci.
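
    The notion of mapping precision used in this abstract (the share of top SNPs lying within a given physical distance of the causal variant) can be illustrated with a minimal sketch. The function names and the distances below are hypothetical and made up for illustration; they are not taken from the study or from the gwasMP tool.

```python
import math

def fraction_within(distances_bp, threshold_bp):
    """Share of GWAS hits whose top SNP lies within threshold_bp of the causal variant."""
    if not distances_bp:
        raise ValueError("need at least one distance")
    hits = sum(1 for d in distances_bp if abs(d) <= threshold_bp)
    return hits / len(distances_bp)

def distance_quantile(distances_bp, q):
    """Smallest distance that covers at least a fraction q of the hits."""
    ordered = sorted(abs(d) for d in distances_bp)
    k = max(math.ceil(q * len(ordered)) - 1, 0)
    return ordered[k]

# Made-up distances (bp) between top SNPs and causal variants, illustration only.
example = [1000, 5000, 20000, 33000, 90000]
print(fraction_within(example, 33500))
print(distance_quantile(example, 0.8))
```

    Reporting the distance quantile rather than a mean keeps a few very distant hits from dominating the summary.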

  12. Image Encryption Algorithm Based on Hyperchaotic Maps and Nucleotide Sequences Database

    PubMed Central

    2017-01-01

    Image encryption technology is one of the main means to ensure the safety of image information. Using the characteristics of chaos, such as randomness, regularity, ergodicity, and sensitivity to initial values, combined with the unique space conformation of DNA molecules and their unique information storage and processing ability, an efficient method for image encryption based on the chaos theory and a DNA sequence database is proposed. In this paper, digital image encryption employs a process of transforming the image pixel gray value by using chaotic sequence scrambling image pixel location and establishing superchaotic mapping, which maps quaternary sequences and DNA sequences, and by combining with the logic of the transformation between DNA sequences. The bases are replaced under the displaced rules by using DNA coding in a certain number of iterations that are based on the enhanced quaternary hyperchaotic sequence; the sequence is generated by Chen chaos. The cipher feedback mode and chaos iteration are employed in the encryption process to enhance the confusion and diffusion properties of the algorithm. Theoretical analysis and experimental results show that the proposed scheme not only demonstrates excellent encryption but also effectively resists chosen-plaintext attack, statistical attack, and differential attack. PMID:28392799

  13. Image Encryption Algorithm Based on Hyperchaotic Maps and Nucleotide Sequences Database.

    PubMed

    Niu, Ying; Zhang, Xuncai; Han, Feng

    2017-01-01

    Image encryption technology is one of the main means to ensure the safety of image information. Using the characteristics of chaos, such as randomness, regularity, ergodicity, and sensitivity to initial values, combined with the unique space conformation of DNA molecules and their unique information storage and processing ability, an efficient method for image encryption based on the chaos theory and a DNA sequence database is proposed. In this paper, digital image encryption employs a process of transforming the image pixel gray value by using chaotic sequence scrambling image pixel location and establishing superchaotic mapping, which maps quaternary sequences and DNA sequences, and by combining with the logic of the transformation between DNA sequences. The bases are replaced under the displaced rules by using DNA coding in a certain number of iterations that are based on the enhanced quaternary hyperchaotic sequence; the sequence is generated by Chen chaos. The cipher feedback mode and chaos iteration are employed in the encryption process to enhance the confusion and diffusion properties of the algorithm. Theoretical analysis and experimental results show that the proposed scheme not only demonstrates excellent encryption but also effectively resists chosen-plaintext attack, statistical attack, and differential attack.
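
    The two ingredients named in this abstract, a quaternary (DNA) coding of pixel values and a chaotic keystream, can be sketched roughly as follows. This is not the authors' scheme: a one-dimensional logistic map is swapped in for the Chen hyperchaotic system, the coding rule 00->A, 01->C, 10->G, 11->T is an assumption, and the cipher feedback mode and pixel scrambling steps are omitted. It is an illustration only and offers no real security.

```python
BASES = "ACGT"  # assumed coding rule: 00->A, 01->C, 10->G, 11->T

def byte_to_dna(value):
    """Encode one 8-bit pixel value as four bases, most significant bit pair first."""
    return "".join(BASES[(value >> shift) & 0b11] for shift in (6, 4, 2, 0))

def dna_to_byte(bases):
    """Inverse of byte_to_dna."""
    value = 0
    for base in bases:
        value = (value << 2) | BASES.index(base)
    return value

def logistic_keystream(x0, n, r=3.99, burn_in=100):
    """Bytes derived from the logistic map x -> r*x*(1-x); stands in for Chen chaos."""
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = r * x * (1.0 - x)
    stream = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        stream.append(int(x * 256) % 256)
    return stream

def xor_cipher(pixels, x0):
    """XOR pixels with the keystream; applying it twice with the same x0 restores them."""
    keystream = logistic_keystream(x0, len(pixels))
    return [p ^ k for p, k in zip(pixels, keystream)]

pixels = [0, 27, 128, 255]  # made-up gray values
cipher = xor_cipher(pixels, x0=0.3)
print([byte_to_dna(c) for c in cipher])
print(xor_cipher(cipher, x0=0.3) == pixels)
```

    The initial value x0 plays the role of the key here, which is where the sensitivity to initial values mentioned in the abstract comes in.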

  14. Distribution of genotype network sizes in sequence-to-structure genotype-phenotype maps.

    PubMed

    Manrubia, Susanna; Cuesta, José A

    2017-04-01

    An essential quantity to ensure evolvability of populations is the navigability of the genotype space. Navigability, understood as the ease with which alternative phenotypes are reached, relies on the existence of sufficiently large and mutually attainable genotype networks. The size of genotype networks (e.g. the number of RNA sequences folding into a particular secondary structure or the number of DNA sequences coding for the same protein structure) is astronomically large in all functional molecules investigated: an exhaustive experimental or computational study of all RNA folds or all protein structures becomes impossible even for moderately long sequences. Here, we analytically derive the distribution of genotype network sizes for a hierarchy of models which successively incorporate features of increasingly realistic sequence-to-structure genotype-phenotype maps. The main feature of these models relies on the characterization of each phenotype through a prototypical sequence whose sites admit a variable fraction of letters of the alphabet. Our models interpolate between two limit distributions: a power-law distribution, when the ordering of sites in the prototypical sequence is strongly constrained, and a lognormal distribution, as suggested for RNA, when different orderings of the same set of sites yield different phenotypes. Our main result is the qualitative and quantitative identification of those features of sequence-to-structure maps that lead to different distributions of genotype network sizes. © 2017 The Author(s).

  15. Trajectory design strategies applied to temporary comet capture including Poincaré maps and invariant manifolds

    NASA Astrophysics Data System (ADS)

    Haapala, A. F.; Howell, K. C.

    2013-07-01

    Temporary satellite capture (TSC) of Jupiter-family comets has been a focus of investigation within the astronomy community for decades. More recently, TSC has been approached from the perspective of dynamical systems theory, within the context of the circular restricted three-body problem (CR3BP). Thus, this problem serves as a testbed for exploring techniques that support trajectory design in similar dynamical regimes. In particular, an association between the invariant manifolds of libration point orbits and the paths of comets that experience TSC has been explored. In this investigation, TSC is further examined from the perspective of transit, that is, transition through the gateways associated with the collinear libration points, in the three-body problem. Periapsis Poincaré maps, previously employed for trajectory design in several investigations, are used to deliver insight into the nature of transit trajectories for energy levels near those associated with several Jupiter-family comets. The evolution of transit trajectories with increasing energy is explored, and the existence of solutions with similar characteristics to the paths of comets P/1996 R2, 82P/Gehrels 3, and 147P/Kushida-Muramatsu is demonstrated within the context of the planar CR3BP using planar periapsis maps. During TSC, the path of comet 111P/Helin-Roman-Crockett is highly inclined with respect to Jupiter; the motion of this comet is examined relative to invariant manifolds in the spatial CR3BP. A method to display the information contained in higher-dimensional Poincaré maps is also demonstrated, and is employed to locate a trajectory possessing the same qualitative characteristics as the path of 111P/Helin-Roman-Crockett.

  16. A Moldable Online Scheduling Algorithm and Its Application to Parallel Short Sequence Mapping

    NASA Astrophysics Data System (ADS)

    Saule, Erik; Bozdağ, Doruk; Catalyurek, Umit V.

    A crucial step in DNA sequence analysis is mapping short sequences generated by next-generation instruments to a reference genome. In this paper, we focus on efficient online scheduling of multi-user parallel short sequence mapping queries on a multiprocessor system. With the availability of parallel execution models, the problem at hand becomes a moldable task scheduling problem where the number of processors needed to execute a task is determined by the scheduler. We propose an online scheduling algorithm to minimize the stretch of the tasks in the system. This metric provides improved fairness to small tasks compared to the flow time metric and suits the nature of the problem well. Experimental evaluation on two workload scenarios indicates that the algorithm results in significantly smaller stretch compared to a recent algorithm and that it is fairer to small tasks.
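
    The difference between the two metrics named in this abstract can be shown with a small sketch. It assumes the usual definitions: flow time is completion time minus release time, and stretch is flow time divided by the time the task would need if it ran alone. The Task type and the numbers are hypothetical, and the paper's scheduling algorithm itself is not reproduced.

```python
from dataclasses import dataclass

@dataclass
class Task:
    release: float      # time the query enters the system
    completion: float   # time the query finishes
    alone_time: float   # processing time if the task had the machine to itself

def flow_time(task):
    """Time spent in the system."""
    return task.completion - task.release

def stretch(task):
    """Flow time normalized by the task's own processing time."""
    return flow_time(task) / task.alone_time

# Made-up schedule: a small query waits behind a large one.
small = Task(release=0.0, completion=10.0, alone_time=1.0)
large = Task(release=0.0, completion=100.0, alone_time=100.0)

for name, task in (("small", small), ("large", large)):
    print(name, flow_time(task), stretch(task))
```

    In this example the large task has the worse flow time, yet the small task has the worse stretch, which is why minimizing stretch favors small tasks.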

  17. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs.

    PubMed

    Huang, Liang-Tsung; Wu, Chao-Chin; Lai, Lien-Fu; Li, Yun-Ju

    2015-01-01

    Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We performed and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed. The results showed that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs given a proper allocation of block and thread numbers.
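
    For readers unfamiliar with the algorithm being mapped onto GPUs, a minimal CPU-only sketch of the Smith-Waterman recurrence follows. It uses a linear gap penalty and arbitrary scoring values chosen for illustration; it does not reproduce the shared-memory GPU mapping described in the abstract or the scoring used by CUDASW++.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two strings (linear gap penalty)."""
    prev = [0] * (len(target) + 1)  # previous row of the scoring matrix
    best = 0
    for q in query:
        curr = [0]  # first column is always zero
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # zero floor makes the alignment local
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

    Keeping only two rows of the matrix is enough to obtain the score; recovering the alignment itself would require storing the full matrix or traceback pointers.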

  18. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs

    PubMed Central

    Huang, Liang-Tsung; Wu, Chao-Chin; Lai, Lien-Fu; Li, Yun-Ju

    2015-01-01

    Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We performed and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed. The results showed that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs given a proper allocation of block and thread numbers. PMID:26339591

  19. Dual-pathway multi-echo sequence for simultaneous frequency and T2 mapping

    PubMed Central

    Cheng, Cheng-Chieh; Mei, Chang-Sheng; Duryea, Jeffrey; Chung, Hsiao-Wen; Chao, Tzu-Cheng; Panych, Lawrence P.; Madore, Bruno

    2016-01-01

    Purpose: To present a dual-pathway multi-echo steady state sequence and reconstruction algorithm to capture T2, T2* and field map information. Methods: Typically, pulse sequences based on spin echoes are needed for T2 mapping while gradient echoes are needed for field mapping, making it difficult to jointly acquire both types of information. A dual-pathway multi-echo pulse sequence is employed here to generate T2 and field maps from the same acquired data. The approach might be used, for example, to obtain both thermometry and tissue damage information during thermal therapies, or susceptibility and T2 information from the same head scan, or to generate bonus T2 maps during a knee scan. Results: Quantitative T2, T2* and field maps were generated in gel phantoms, ex vivo bovine muscle, and twelve volunteers. T2 results were validated against a spin-echo reference standard: A linear regression based on ROI analysis in phantoms provided close agreement (slope/R2 = 0.99/0.998). A pixel-wise in vivo Bland-Altman analysis of R2=1/T2 showed a bias of 0.034 Hz (about 0.3%), as averaged over four volunteers. Ex vivo results, with and without motion, suggested that tissue damage detection based on T2 rather than temperature-dose measurements might prove more robust to motion. Conclusion: T2, T2* and field maps were obtained simultaneously, from the same datasets, in thermometry, susceptibility-weighted imaging and knee-imaging contexts. PMID:26923150
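
    The Bland-Altman comparison of relaxation rates mentioned in the results can be sketched as below, assuming the standard definition: the bias is the mean of the paired differences and the limits of agreement are the bias plus or minus 1.96 standard deviations. The function names and the T2 values are made up for illustration and are not data from the study.

```python
from statistics import mean, stdev

def t2_to_r2(t2_seconds):
    """Convert T2 values in seconds to relaxation rates R2 = 1/T2 in Hz."""
    return [1.0 / t2 for t2 in t2_seconds]

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread

# Made-up pixel values (seconds): new sequence vs. spin-echo reference.
t2_new = [0.100, 0.090, 0.080, 0.075]
t2_ref = [0.101, 0.089, 0.081, 0.074]
bias, lower, upper = bland_altman(t2_to_r2(t2_new), t2_to_r2(t2_ref))
print(f"bias = {bias:.3f} Hz, limits of agreement = [{lower:.3f}, {upper:.3f}] Hz")
```

    Working in R2 rather than T2 gives the bias in Hz, which matches the units reported in the abstract.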

  20. Dual-pathway multi-echo sequence for simultaneous frequency and T2 mapping

    NASA Astrophysics Data System (ADS)

    Cheng, Cheng-Chieh; Mei, Chang-Sheng; Duryea, Jeffrey; Chung, Hsiao-Wen; Chao, Tzu-Cheng; Panych, Lawrence P.; Madore, Bruno

    2016-04-01

    Purpose: To present a dual-pathway multi-echo steady state sequence and reconstruction algorithm to capture T2, T2∗ and field map information. Methods: Typically, pulse sequences based on spin echoes are needed for T2 mapping while gradient echoes are needed for field mapping, making it difficult to jointly acquire both types of information. A dual-pathway multi-echo pulse sequence is employed here to generate T2 and field maps from the same acquired data. The approach might be used, for example, to obtain both thermometry and tissue damage information during thermal therapies, or susceptibility and T2 information from the same head scan, or to generate bonus T2 maps during a knee scan. Results: Quantitative T2, T2∗ and field maps were generated in gel phantoms, ex vivo bovine muscle, and twelve volunteers. T2 results were validated against a spin-echo reference standard: A linear regression based on ROI analysis in phantoms provided close agreement (slope/R2 = 0.99/0.998). A pixel-wise in vivo Bland-Altman analysis of R2 = 1/T2 showed a bias of 0.034 Hz (about 0.3%), as averaged over four volunteers. Ex vivo results, with and without motion, suggested that tissue damage detection based on T2 rather than temperature-dose measurements might prove more robust to motion. Conclusion: T2, T2∗ and field maps were obtained simultaneously, from the same datasets, in thermometry, susceptibility-weighted imaging and knee-imaging contexts.

  1. SNP identification from RNA sequencing and linkage map construction of rubber tree for anchoring the draft genome.

    PubMed

    Shearman, Jeremy R; Sangsrakru, Duangjai; Jomchai, Nukoon; Ruang-Areerate, Panthita; Sonthirod, Chutima; Naktang, Chaiwat; Theerawattanasuk, Kanikar; Tragoonrung, Somvong; Tangphatsornruang, Sithichoke

    2015-01-01

    Hevea brasiliensis, or rubber tree, is an important crop species that accounts for the majority of natural latex production. The rubber tree nuclear genome consists of 18 chromosomes and is roughly 2.15 Gb. The current rubber tree reference genome assembly consists of 1,150,326 scaffolds ranging from 200 to 531,465 bp and totalling 1.1 Gb. Only 143 scaffolds, totalling 7.6 Mb, have been placed into linkage groups. We have performed RNA-seq on 6 varieties of rubber tree to identify SNPs and InDels and used this information to perform target sequence enrichment and high throughput sequencing to genotype a set of SNPs in 149 rubber tree offspring from a cross between RRIM 600 and RRII 105 rubber tree varieties. We used this information to generate a linkage map allowing for the anchoring of 24,424 contigs from 3,009 scaffolds, totalling 115 Mb or 10.4% of the published sequence, into 18 linkage groups. Each linkage group contains between 319 and 1367 SNPs, or 60 to 194 non-redundant marker positions, and ranges from 156 to 336 cM in length. This linkage map includes 20,143 of the 69,300 predicted genes from rubber tree and will be useful for mapping studies and improving the reference genome assembly.

  2. Identification and Mapping of the Edwards Stratigraphic Sequence in the State of Chihuahua Assisted by ten ArcMap Based Layers

    NASA Astrophysics Data System (ADS)

    Martinez-Pina, C.; Granados, A.; Goodell, P.

    2007-05-01

    Edwards Formation is a reef limestone that hosts one of the largest aquifers of the State of Texas. In 2004 the United States and Mexico signed an agreement intended to characterize and identify the shared binational underground resources. Texas Water Development Board Report 360 established for the Edwards Aquifer an area of more than 31,000 km2, half of which is in the State of Coahuila, Mexico (the agreement did not include the State of Chihuahua). This led to the idea that Chihuahua may also have hydrologic potential in the Edwards equivalent, where numerous large cavern systems are already recognized (Naica's Sword Cavern, and the Coyame, Nombre de Dios and Bocagrande Caverns). The objective of this study is to establish the existence, in the State of Chihuahua, of the stratigraphic sequence and geohydrologic properties such as faulting, sinkholes, and springs, within the Edwards equivalent. The Consejo de Recursos Minerales geologic map, INEGI's hydrologic study, petroleum, mining and hydrogeology studies of Chihuahua, and many others, constitute the database used. ArcMap is used to define the geologic framework and construct different thematic layers (structural, lithological, hydrological) that would aid in the identification of the stratigraphic sequence. The results show that all the Edwards Stratigraphic Sequence (ESS) exists in Chihuahua; that there are isolated areas of groundwater production in eastern Chihuahua possibly from ESS but this is not well established. Overall the ESS presents an unusual opportunity as a potentially productive aquifer in the State of Chihuahua.

  3. Geologic map of the southern Funeral Mountains including nearby groundwater discharge sites in Death Valley National Park, California and Nevada

    USGS Publications Warehouse

    Fridrich, C.J.; Thompson, R.A.; Slate, J.L.; Berry, M.E.; Machette, M.N.

    2012-01-01

    This 1:50,000-scale geologic map covers the southern part of the Funeral Mountains, and adjoining parts of four structural basins—Furnace Creek, Amargosa Valley, Opera House, and central Death Valley—in California and Nevada. It extends over three full 7.5-minute quadrangles, and parts of eleven others—an area of about 1,000 square kilometers (km2). The boundaries of this map were drawn to include all of the known proximal hydrogeologic features that may affect the flow of groundwater that discharges from springs of the Furnace Creek basin, in the west-central part of the map. These springs provide the main potable water supply for Death Valley National Park. Major hydrogeologic features shown on this map include: (1) springs of the Furnace Creek basin, (2) a large Pleistocene groundwater discharge mound in the northeastern part of the map, (3) the exposed extent of limestones and dolomites that constitute the Paleozoic carbonate aquifer, and (4) the exposed extent of the alluvial conglomerates that constitute the Funeral Formation aquifer.

  4. FANSe: an accurate algorithm for quantitative mapping of large scale sequencing reads

    PubMed Central

    Zhang, Gong; Fedyunin, Ivan; Kirchner, Sebastian; Xiao, Chuanle; Valleriani, Angelo; Ignatova, Zoya

    2012-01-01

    The most crucial step in data processing from high-throughput sequencing applications is the accurate and sensitive alignment of the sequencing reads to reference genomes or transcriptomes. The accurate detection of insertions and deletions (indels) and errors introduced by the sequencing platform or by misreading of modified nucleotides is essential for the quantitative processing of the RNA-based sequencing (RNA-Seq) datasets and for the identification of genetic variations and modification patterns. We developed a new, fast and accurate algorithm for nucleic acid sequence analysis, FANSe, with adjustable mismatch allowance settings and ability to handle indels to accurately and quantitatively map millions of reads to small or large reference genomes. It is a seed-based algorithm which uses the whole read information for mapping and high sensitivity and low ambiguity are achieved by using short and non-overlapping reads. Furthermore, FANSe uses a hotspot score to prioritize the processing of highly possible matches and implements a modified Smith-Waterman refinement with a reduced scoring matrix to accelerate the calculation without compromising its sensitivity. The FANSe algorithm stably processes datasets from various sequencing platforms, masked or unmasked and small or large genomes. It shows a remarkable coverage of low-abundance mRNAs which is important for quantitative processing of RNA-Seq datasets. PMID:22379138

  5. FANSe: an accurate algorithm for quantitative mapping of large scale sequencing reads.

    PubMed

    Zhang, Gong; Fedyunin, Ivan; Kirchner, Sebastian; Xiao, Chuanle; Valleriani, Angelo; Ignatova, Zoya

    2012-06-01

    The most crucial step in data processing from high-throughput sequencing applications is the accurate and sensitive alignment of the sequencing reads to reference genomes or transcriptomes. The accurate detection of insertions and deletions (indels) and errors introduced by the sequencing platform or by misreading of modified nucleotides is essential for the quantitative processing of the RNA-based sequencing (RNA-Seq) datasets and for the identification of genetic variations and modification patterns. We developed a new, fast and accurate algorithm for nucleic acid sequence analysis, FANSe, with adjustable mismatch allowance settings and ability to handle indels to accurately and quantitatively map millions of reads to small or large reference genomes. It is a seed-based algorithm which uses the whole read information for mapping and high sensitivity and low ambiguity are achieved by using short and non-overlapping reads. Furthermore, FANSe uses a hotspot score to prioritize the processing of highly possible matches and implements a modified Smith-Waterman refinement with a reduced scoring matrix to accelerate the calculation without compromising its sensitivity. The FANSe algorithm stably processes datasets from various sequencing platforms, masked or unmasked and small or large genomes. It shows a remarkable coverage of low-abundance mRNAs which is important for quantitative processing of RNA-Seq datasets.

  6. Sequence analysis and mapping of the Sry gene in species of the subfamily Arvicolinae (rodentia).

    PubMed

    Acosta, M J; Marchal, J A; Romero-Fernández, I; Megías-Nogales, B; Modi, W S; Sánchez Baca, Antonio

    2010-01-01

    The rodent subfamily Arvicolinae, which contains about 125 species, presents some interesting exceptions concerning Sry, the sex determining gene in mammals. In some species multiple Sry copies have been described on the Y chromosome and in the Iberian vole, Microtus cabrerae, several Sry sequences have been cloned and mapped not only on the Y but also on the X chromosome. Here we present a comparative analysis of Sry sequences from a total of 22 species. Our study demonstrates for the first time that for most North American species, as previously reported for the European species, multiple copies of the Sry gene exist on the Y chromosome. Furthermore, we have sequenced and analyzed the full sequence of Sry from several European species, showing that the sequence and structure of the gene in this group of species present the main features described for Sry in other mammals. Finally, FISH analyses on some of these species demonstrated that all Sry sequences, despite their functional status, mapped on the euchromatic short arm of the Y chromosome.

  7. Nested Association Mapping of Stem Rust Resistance in Wheat Using Genotyping by Sequencing

    PubMed Central

    Rouse, Matthew N.; Tsilo, Toi J.; Macharia, Godwin K.; Bhavani, Sridhar; Jin, Yue; Anderson, James A.

    2016-01-01

    We combined the recently developed genotyping by sequencing (GBS) method with joint mapping (also known as nested association mapping) to dissect and understand the genetic architecture controlling stem rust resistance in wheat (Triticum aestivum). Ten stem rust resistant wheat varieties were crossed to the susceptible line LMPG-6 to generate F6 recombinant inbred lines. The recombinant inbred line populations were phenotyped in Kenya, South Africa, and St. Paul, Minnesota, USA. By joint mapping of the 10 populations, we identified 59 minor and medium-effect QTL (explained phenotypic variance range of 1% – 20%) on 20 chromosomes that contributed towards adult plant resistance to North American Pgt races as well as the highly virulent Ug99 race group. Fifteen of the 59 QTL were detected in multiple environments. No epistatic relationship was detected among the QTL. While these numerous small- to medium-effect QTL are shared among the families, the founder parents were found to have different allelic effects for the QTL. Fourteen QTL identified by joint mapping were also detected in single-population mapping. As these QTL were mapped using SNP markers with known locations on the physical chromosomes, the genomic regions identified with QTL could be explored more in depth to discover candidate genes for stem rust resistance. The use of GBS-derived de novo SNPs in mapping resistance to stem rust shown in this study could be used as a model to conduct similar marker-trait association studies in other plant species. PMID:27186883

  8. Genome Assembly Improvement and Mapping Convergently Evolved Skeletal Traits in Sticklebacks with Genotyping-by-Sequencing.

    PubMed

    Glazer, Andrew M; Killingbeck, Emily E; Mitros, Therese; Rokhsar, Daniel S; Miller, Craig T

    2015-06-03

    Marine populations of the threespine stickleback (Gasterosteus aculeatus) have repeatedly colonized and rapidly adapted to freshwater habitats, providing a powerful system to map the genetic architecture of evolved traits. Here, we developed and applied a binned genotyping-by-sequencing (GBS) method to build dense genome-wide linkage maps of sticklebacks using two large marine by freshwater F2 crosses of more than 350 fish each. The resulting linkage maps significantly improve the genome assembly by anchoring 78 new scaffolds to chromosomes, reorienting 40 scaffolds, and rearranging scaffolds in 4 locations. In the revised genome assembly, 94.6% of the assembly was anchored to a chromosome. To assess linkage map quality, we mapped quantitative trait loci (QTL) controlling lateral plate number, which mapped as expected to a 200-kb genomic region containing Ectodysplasin, as well as a chromosome 7 QTL overlapping a previously identified modifier QTL. Finally, we mapped eight QTL controlling convergently evolved reductions in gill raker length in the two crosses, which revealed that this classic adaptive trait has a surprisingly modular and nonparallel genetic basis. Copyright © 2015 Glazer et al.
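
    As a rough illustration of how a linkage map can suggest scaffold orientation, the sketch below compares the physical order of markers on one scaffold with their genetic positions. The function name, the pairwise-concordance rule, and the marker values are all hypothetical; the record does not describe the authors' actual procedure.

```python
def scaffold_orientation(markers):
    """Guess scaffold orientation from (physical_bp, genetic_cM) marker pairs.

    Returns '+' if genetic position mostly increases with physical position,
    '-' if it mostly decreases, and '?' if the map gives no direction.
    Illustrative rule only, not the published pipeline.
    """
    concordant = discordant = 0
    for i in range(len(markers)):
        for j in range(i + 1, len(markers)):
            dp = markers[j][0] - markers[i][0]   # physical difference (bp)
            dg = markers[j][1] - markers[i][1]   # genetic difference (cM)
            if dp * dg > 0:
                concordant += 1
            elif dp * dg < 0:
                discordant += 1
    if concordant > discordant:
        return "+"
    if discordant > concordant:
        return "-"
    return "?"


# Hypothetical markers on two scaffolds.
forward = [(1000, 10.0), (50000, 10.8), (120000, 12.1)]
flipped = [(1000, 12.1), (50000, 10.8), (120000, 10.0)]
print(scaffold_orientation(forward), scaffold_orientation(flipped))
```

    Counting concordant and discordant marker pairs, rather than comparing only the two end markers, makes the call less sensitive to a single misplaced marker.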

  9. Mapping wide row crops with video sequences acquired from a tractor moving at treatment speed.

    PubMed

    Sainz-Costa, Nadir; Ribeiro, Angela; Burgos-Artizzu, Xavier P; Guijarro, María; Pajares, Gonzalo

    2011-01-01

    This paper presents a mapping method for wide row crop fields. The resulting map shows the crop rows and weeds present in the inter-row spacing. Because field videos are acquired with a camera mounted on top of an agricultural vehicle, a method for image sequence stabilization was needed and consequently designed and developed. The proposed stabilization method uses the centers of some crop rows in the image sequence as features to be tracked, which compensates for the lateral movement (sway) of the camera and leaves the pitch unchanged. A region of interest is selected using the tracked features, and an inverse perspective technique transforms the selected region into a bird's-eye view that is centered on the image and that enables map generation. The algorithm developed has been tested on several video sequences of different fields recorded at different times and under different lighting conditions, with good initial results. Indeed, lateral displacements of up to 66% of the inter-row spacing were suppressed through the stabilization process, and crop rows in the resulting maps appear straight.

  10. Mapping sequence differences between thimet oligopeptidase and neurolysin implicates key residues in substrate recognition.

    PubMed

    Ray, Kallol; Hines, Christina S; Rodgers, David W

    2002-09-01

    The highly homologous endopeptidases thimet oligopeptidase and neurolysin are both restricted to short peptide substrates and share many of the same cleavage sites on bioactive and synthetic peptides. They sometimes target different sites on the same peptide, however, and defining the determinants of differential recognition will help us to understand how both enzymes specifically target a wide variety of cleavage site sequences. We have mapped the positions of the 224 surface residues that differ in sequence between the two enzymes onto the surface of the neurolysin crystal structure. Although the deep active site channel accounts for about one quarter of the total surface area, only 11% of the residue differences map to this region. Four isolated sequence changes (R470/E469, R491/M490, N496/H495, and T499/R498; neurolysin residues given first) are well positioned to affect recognition of substrate peptides, and differences in cleavage site specificity can be largely rationalized on the basis of these changes. We also mapped the positions of three cysteine residues believed to be responsible for multimerization of thimet oligopeptidase, a process that inactivates the enzyme. These residues are clustered on the outside of one channel wall, where multimerization via disulfide formation is unlikely to block the substrate-binding site. Finally, we mapped the regulatory phosphorylation site in thimet oligopeptidase to a location on the outside of the molecule well away from the active site, which indicates this modification has an indirect effect on activity.

  11. TCGA's Pan-Cancer Efforts and Expansion to Include Whole Genome Sequence - TCGA

    Cancer.gov

    Carolyn Hutter, Ph.D., Program Director of NHGRI's Division of Genomic Medicine, discusses the expansion of TCGA's Pan-Cancer efforts to include the Pan-Cancer Analysis of Whole Genomes (PAWG) project.

  12. Initial sequence characterization of the rhabdoviruses of squamate reptiles, including a novel rhabdovirus from a caiman lizard (Dracaena guianensis).

    PubMed

    Wellehan, James F X; Pessier, Allan P; Archer, Linda L; Childress, April L; Jacobson, Elliott R; Tesh, Robert B

    2012-08-17

    Rhabdoviruses infect a variety of hosts, including non-avian reptiles. Consensus PCR techniques were used to obtain partial RNA-dependent RNA polymerase gene sequence from five rhabdoviruses of South American lizards; Marco, Chaco, Timbo, Sena Madureira, and a rhabdovirus from a caiman lizard (Dracaena guianensis). The caiman lizard rhabdovirus formed inclusions in erythrocytes, which may be a route for infecting hematophagous insects. This is the first information on behavior of a rhabdovirus in squamates. We also obtained sequence from two rhabdoviruses of Australian lizards, confirming previous Charleville virus sequence and finding that, unlike a previous sequence report but in agreement with serologic reports, Almpiwar virus is clearly distinct from Charleville virus. Bayesian and maximum likelihood phylogenetic analysis revealed that most known rhabdoviruses of squamates cluster in the Almpiwar subgroup. The exception is Marco virus, which is found in the Hart Park group. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. A genetic map of melon highly enriched with fruit quality QTLs and EST markers, including sugar and carotenoid metabolism genes.

    PubMed

    Harel-Beja, R; Tzuri, G; Portnoy, V; Lotan-Pompan, M; Lev, S; Cohen, S; Dai, N; Yeselson, L; Meir, A; Libhaber, S E; Avisar, E; Melame, T; van Koert, P; Verbakel, H; Hofstede, R; Volpin, H; Oliver, M; Fougedoire, A; Stalh, C; Fauve, J; Copes, B; Fei, Z; Giovannoni, J; Ori, N; Lewinsohn, E; Sherman, A; Burger, J; Tadmor, Y; Schaffer, A A; Katzir, N

    2010-08-01

    A genetic map of melon enriched for fruit traits was constructed, using a recombinant inbred (RI) population developed from a cross between representatives of the two subspecies of Cucumis melo L.: PI 414723 (subspecies agrestis) and 'Dulce' (subspecies melo). Phenotyping of 99 RI lines was conducted over three seasons in two locations in Israel and the US. The map includes 668 DNA markers (386 SSRs, 76 SNPs, six INDELs and 200 AFLPs), of which 160 were newly developed from fruit ESTs. These ESTs include candidate genes encoding for enzymes of sugar and carotenoid metabolic pathways that were cloned from melon cDNA or identified through mining of the International Cucurbit Genomics Initiative database (http://www.icugi.org/). The map covers 1,222 cM with an average of 2.672 cM between markers. In addition, a skeleton physical map was initiated and 29 melon BACs harboring fruit ESTs were localized to the 12 linkage groups of the map. Altogether, 44 fruit QTLs were identified: 25 confirming QTLs described using other populations and 19 newly described QTLs. The map includes QTLs for fruit sugar content, particularly sucrose, the major sugar affecting sweetness in melon fruit. Six QTLs interacting in an additive manner account for nearly all the difference in sugar content between the two genotypes. Three QTLs for fruit flesh color and carotenoid content were identified. Interestingly, no clear colocalization of QTLs for either sugar or carotenoid content was observed with over 40 genes encoding for enzymes involved in their metabolism. The RI population described here provides a useful resource for further genomics and metabolomics studies in melon, as well as useful markers for breeding for fruit quality.

  14. A High-Density Linkage Map for Astyanax mexicanus Using Genotyping-by-Sequencing Technology

    PubMed Central

    Carlson, Brian M.; Onusko, Samuel W.; Gross, Joshua B.

    2014-01-01

    The Mexican tetra, Astyanax mexicanus, is a unique model system consisting of cave-adapted and surface-dwelling morphotypes that diverged >1 million years (My) ago. This remarkable natural experiment has enabled powerful genetic analyses of cave adaptation. Here, we describe the application of next-generation sequencing technology to the creation of a high-density linkage map. Our map comprises more than 2200 markers populating 25 linkage groups constructed from genotypic data generated from a single genotyping-by-sequencing project. We leveraged emergent genomic and transcriptomic resources to anchor hundreds of anonymous Astyanax markers to the genome of the zebrafish (Danio rerio), the most closely related model organism to our study species. This facilitated the identification of 784 distinct connections between our linkage map and the Danio rerio genome, highlighting several regions of conserved genomic architecture between the two species despite ∼150 My of divergence. Using a Mendelian cave-associated trait as a proof-of-principle, we successfully recovered the genomic position of the albinism locus near the gene Oca2. Further, our map successfully informed the positions of unplaced Astyanax genomic scaffolds within particular linkage groups. This ability to identify the relative location, orientation, and linear order of unaligned genomic scaffolds will facilitate ongoing efforts to improve on the current early draft and assemble future versions of the Astyanax physical genome. Moreover, this improved linkage map will enable higher-resolution genetic analyses and catalyze the discovery of the genetic basis for cave-associated phenotypes. PMID:25520037

  15. A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM).

    PubMed

    Iwasaki, Yuki; Abe, Takashi; Wada, Kennosuke; Wada, Yoshiko; Ikemura, Toshimichi

    2013-11-20

    With the remarkable increase of genomic sequence data of microorganisms, novel tools are needed for comprehensive analyses of the big sequence data available. The self-organizing map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition on one map. By modifying the conventional SOM, we developed batch-learning SOM (BLSOM), which allowed classification of sequence fragments (e.g., 1 kb) according to phylotypes, solely depending on oligonucleotide composition. Metagenomics studies of uncultivable microorganisms in clinical and environmental samples should allow extensive surveys of genes important in life sciences. BLSOM is most suitable for phylogenetic assignment of metagenomic sequences, because fragmental sequences can be clustered according to phylotypes, solely depending on oligonucleotide composition. We first constructed oligonucleotide BLSOMs for all available sequences from genomes of known species, and by mapping metagenomic sequences on these large-scale BLSOMs, we can predict phylotypes of individual metagenomic sequences, revealing a microbial community structure of uncultured microorganisms, including viruses. BLSOM has shown that influenza viruses isolated from humans and birds clearly differ in oligonucleotide composition. Based on this host-dependent oligonucleotide composition, we have proposed strategies for predicting directional changes of virus sequences and for surveilling potentially hazardous strains when introduced into humans from non-human sources.
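
    The input to a composition-based map of this kind is a vector of oligonucleotide frequencies per sequence fragment. A minimal sketch of that step follows, assuming plain k-mer counting over the four bases; the function name and the choice of k are illustrative assumptions, and the SOM training itself is not shown.

```python
from collections import Counter
from itertools import product


def oligo_composition(fragment, k=4):
    """Return relative frequencies of all 4**k DNA k-mers in a fragment.

    Windows containing characters other than A, C, G, T are ignored.
    """
    fragment = fragment.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(fragment[i:i + k] for i in range(len(fragment) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Toy fragment; dinucleotides keep the vector short (16 entries).
vec = oligo_composition("ACGTACGTAC", k=2)
print(len(vec), round(sum(vec), 6))
```

    Each fragment becomes a fixed-length vector regardless of its sequence length, which is what lets fragments from different genomes be placed on one map.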

  16. A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM)

    PubMed Central

    Iwasaki, Yuki; Abe, Takashi; Wada, Kennosuke; Wada, Yoshiko; Ikemura, Toshimichi

    2013-01-01

    With the remarkable increase of genomic sequence data of microorganisms, novel tools are needed for comprehensive analyses of the big sequence data available. The self-organizing map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition on one map. By modifying the conventional SOM, we developed batch-learning SOM (BLSOM), which allowed classification of sequence fragments (e.g., 1 kb) according to phylotypes, solely depending on oligonucleotide composition. Metagenomics studies of uncultivable microorganisms in clinical and environmental samples should allow extensive surveys of genes important in life sciences. BLSOM is most suitable for phylogenetic assignment of metagenomic sequences, because fragmental sequences can be clustered according to phylotypes, solely depending on oligonucleotide composition. We first constructed oligonucleotide BLSOMs for all available sequences from genomes of known species, and by mapping metagenomic sequences on these large-scale BLSOMs, we can predict phylotypes of individual metagenomic sequences, revealing a microbial community structure of uncultured microorganisms, including viruses. BLSOM has shown that influenza viruses isolated from humans and birds clearly differ in oligonucleotide composition. Based on this host-dependent oligonucleotide composition, we have proposed strategies for predicting directional changes of virus sequences and for surveilling potentially hazardous strains when introduced into humans from non-human sources. PMID:27694768

  17. Heterozygous Mapping Strategy (HetMappS) for High Resolution Genotyping-By-Sequencing Markers: A Case Study in Grapevine

    PubMed Central

    Hyma, Katie E.; Barba, Paola; Wang, Minghui; Londo, Jason P.; Acharya, Charlotte B.; Mitchell, Sharon E.; Sun, Qi; Reisch, Bruce; Cadle-Davidson, Lance

    2015-01-01

    Genotyping by sequencing (GBS) provides opportunities to generate high-resolution genetic maps at a low genotyping cost, but for highly heterozygous species, missing data and heterozygote undercalling complicate the creation of GBS genetic maps. To overcome these issues, we developed a publicly available, modular approach called HetMappS, which functions independently of parental genotypes and corrects for genotyping errors associated with heterozygosity. For linkage group formation, HetMappS includes both a reference-guided synteny pipeline and a reference-independent de novo pipeline. The de novo pipeline can be utilized for under-characterized or high diversity families that lack an appropriate reference. We applied both HetMappS pipelines in five half-sib F1 families involving genetically diverse Vitis spp. Starting with at least 116,466 putative SNPs per family, the HetMappS pipelines identified 10,440 to 17,267 phased pseudo-testcross (Pt) markers and generated high-confidence maps. Pt marker density exceeded crossover resolution in all cases; up to 5,560 non-redundant markers were used to generate parental maps ranging from 1,047 cM to 1,696 cM. The number of markers used was strongly correlated with family size in both de novo and synteny maps (r = 0.92 and 0.91, respectively). Comparisons between allele and tag frequencies suggested that many markers were in tandem repeats and mapped as single loci, while markers in regions of more than two repeats were removed during map curation. Both pipelines generated similar genetic maps, and genetic order was strongly correlated with the reference genome physical order in all cases. Independently created genetic maps from shared parents exhibited nearly identical results. Flower sex was mapped in three families and correctly localized to the known sex locus in all cases. 
    The HetMappS pipeline could have wide application for genetic mapping in highly heterozygous species, and its modularity provides opportunities to

  18. Heterozygous Mapping Strategy (HetMappS) for High Resolution Genotyping-By-Sequencing Markers: A Case Study in Grapevine.

    PubMed

    Hyma, Katie E; Barba, Paola; Wang, Minghui; Londo, Jason P; Acharya, Charlotte B; Mitchell, Sharon E; Sun, Qi; Reisch, Bruce; Cadle-Davidson, Lance

    2015-01-01

    Genotyping by sequencing (GBS) provides opportunities to generate high-resolution genetic maps at a low genotyping cost, but for highly heterozygous species, missing data and heterozygote undercalling complicate the creation of GBS genetic maps. To overcome these issues, we developed a publicly available, modular approach called HetMappS, which functions independently of parental genotypes and corrects for genotyping errors associated with heterozygosity. For linkage group formation, HetMappS includes both a reference-guided synteny pipeline and a reference-independent de novo pipeline. The de novo pipeline can be utilized for under-characterized or high diversity families that lack an appropriate reference. We applied both HetMappS pipelines in five half-sib F1 families involving genetically diverse Vitis spp. Starting with at least 116,466 putative SNPs per family, the HetMappS pipelines identified 10,440 to 17,267 phased pseudo-testcross (Pt) markers and generated high-confidence maps. Pt marker density exceeded crossover resolution in all cases; up to 5,560 non-redundant markers were used to generate parental maps ranging from 1,047 cM to 1,696 cM. The number of markers used was strongly correlated with family size in both de novo and synteny maps (r = 0.92 and 0.91, respectively). Comparisons between allele and tag frequencies suggested that many markers were in tandem repeats and mapped as single loci, while markers in regions of more than two repeats were removed during map curation. Both pipelines generated similar genetic maps, and genetic order was strongly correlated with the reference genome physical order in all cases. Independently created genetic maps from shared parents exhibited nearly identical results. Flower sex was mapped in three families and correctly localized to the known sex locus in all cases. 
    The HetMappS pipeline could have wide application for genetic mapping in highly heterozygous species, and its modularity provides opportunities to

  19. Construction of an integrated pepper map using RFLP, SSR, CAPS, AFLP, WRKY, rRAMP, and BAC end sequences.

    PubMed

    Lee, Heung-Ryul; Bae, Ik-Hyun; Park, Soung-Woo; Kim, Hyoun-Joung; Min, Woong-Ki; Han, Jung-Heon; Kim, Ki-Taek; Kim, Byung-Dong

    2009-01-31

    Map-based cloning to find genes of interest, marker-assisted selection (MAS), and marker-assisted breeding (MAB) all require good genetic maps with highly reproducible markers. For map construction as well as chromosome assignment, the development of single-copy PCR-based markers and a map integration process are necessary. In this study, 132 markers (57 STS from BAC-end sequences, 13 STS from RFLP, and 62 SSR) were newly developed as single-copy PCR-based markers. They were used together with 1830 markers previously developed in our lab to construct an integrated map with the JoinMap 3.0 program. This integrated map contained 169 SSR, 354 RFLP, 23 STS from BAC-end sequences, 6 STS from RFLP, 152 AFLP, 51 WRKY, and 99 rRAMP markers on 12 chromosomes. The integrated map contained four genetic maps of two interspecific (Capsicum annuum 'TF68' and C. chinense 'Habanero') and two intraspecific (C. annuum 'CM334' and C. annuum 'Chilsungcho') populations of peppers. This constructed integrated map consisted of 805 markers (map distance of 1858 cM) in interspecific populations and 745 markers (map distance of 1892 cM) in intraspecific populations. The pepper STS used were first developed from end sequences of BAC clones from Capsicum annuum 'CM334'. This integrated map will provide useful information for construction of future pepper genetic maps and for assignment of linkage groups to pepper chromosomes.

  20. A sequence-based map of Arabidopsis genes with mutant phenotypes.

    PubMed

    Meinke, David W; Meinke, Laura K; Showalter, Thomas C; Schissel, Anna M; Mueller, Lukas A; Tzafrir, Iris

    2003-02-01

    The classical genetic map of Arabidopsis contains 462 genes with mutant phenotypes. Chromosomal locations of these genes have been determined over the past 25 years based on recombination frequencies with visible and molecular markers. The most recent update of the classical map was published in a special genome issue of Science that dealt with Arabidopsis (D.W. Meinke, J.M. Cherry, C. Dean, S.D. Rounsley, M. Koornneef [1998] Science 282: 662-682). We present here a comprehensive list and sequence-based map of 620 cloned genes with mutant phenotypes. This map documents for the first time the exact locations of large numbers of Arabidopsis genes that give a phenotype when disrupted by mutation. Such a community-based physical map should have broad applications in Arabidopsis research and should serve as a replacement for the classical genetic map in the future. Assembling a comprehensive list of genes with a loss-of-function phenotype will also focus attention on essential genes that are not functionally redundant and ultimately contribute to the identification of the minimal gene set required to make a flowering plant.

  1. Using pressure map sequences for recognition of on bed rehabilitation exercises.

    PubMed

    Huang, Ming-Chun; Liu, Jason J; Xu, Wenyao; Alshurafa, Nabil; Zhang, Xiaoyi; Sarrafzadeh, Majid

    2014-03-01

    Physical rehabilitation is an important process for patients recovering after surgery. In this paper, we propose and develop a framework to monitor on-bed range of motion exercises that allows physical therapists to evaluate patient adherence to set exercise programs. Using a dense pressure-sensitive bedsheet, a sequence of pressure maps is produced and analyzed using manifold learning techniques. We compare two methods, Locally Linear Embedding and Isomap, to reduce the dimensionality of the pressure map data. Once the image sequences are converted into a low-dimensional manifold, the manifolds can be compared to expected prior data for the rehabilitation exercises. Furthermore, a measure to compare the similarity of manifolds is presented along with experimental results for five on-bed rehabilitation exercises. The evaluation of this framework shows that exercise compliance can be tracked accurately according to prescribed treatment programs.

  2. Nucleotide-resolution DNA double-strand break mapping by next-generation sequencing.

    PubMed

    Crosetto, Nicola; Mitra, Abhishek; Silva, Maria Joao; Bienko, Magda; Dojer, Norbert; Wang, Qi; Karaca, Elif; Chiarle, Roberto; Skrzypczak, Magdalena; Ginalski, Krzysztof; Pasero, Philippe; Rowicka, Maga; Dikic, Ivan

    2013-04-01

    We present a genome-wide approach to map DNA double-strand breaks (DSBs) at nucleotide resolution by a method we termed BLESS (direct in situ breaks labeling, enrichment on streptavidin and next-generation sequencing). We validated and tested BLESS using human and mouse cells and different DSBs-inducing agents and sequencing platforms. BLESS was able to detect telomere ends, Sce endonuclease-induced DSBs and complex genome-wide DSB landscapes. As a proof of principle, we characterized the genomic landscape of sensitivity to replication stress in human cells, and we identified >2,000 nonuniformly distributed aphidicolin-sensitive regions (ASRs) overrepresented in genes and enriched in satellite repeats. ASRs were also enriched in regions rearranged in human cancers, with many cancer-associated genes exhibiting high sensitivity to replication stress. Our method is suitable for genome-wide mapping of DSBs in various cells and experimental conditions, with a specificity and resolution unachievable by current techniques.

  3. Molecular cytogenetic mapping of Cucumis sativus and C. melo using highly repetitive DNA sequences.

    PubMed

    Koo, Dal-Hoe; Nam, Young-Woo; Choi, Doil; Bang, Jae-Wook; de Jong, Hans; Hur, Yoonkang

    2010-04-01

    Chromosomes often serve as one of the most important molecular aspects of studying the evolution of species. Indeed, most of the crucial mutations that led to differentiation of species during the evolution have occurred at the chromosomal level. Furthermore, the analysis of pachytene chromosomes appears to be an invaluable tool for the study of evolution due to its effectiveness in chromosome identification and precise physical gene mapping. By applying fluorescence in situ hybridization of 45S rDNA and CsCent1 probes to cucumber pachytene chromosomes, here, we demonstrate that cucumber chromosomes 1 and 2 may have evolved from fusions of ancestral karyotype with chromosome number n = 12. This conclusion is further supported by the centromeric sequence similarity between cucumber and melon, which suggests that these sequences evolved from a common ancestor. It may be after or during speciation that these sequences were specifically amplified, after which they diverged and specific sequence variants were homogenized. Additionally, a structural change on the centromeric region of cucumber chromosome 4 was revealed by fiber-FISH using the mitochondrial-related repetitive sequences, BAC-E38 and CsCent1. These showed the former sequences being integrated into the latter in multiple regions. The data presented here are useful resources for comparative genomics and cytogenetics of Cucumis and, in particular, the ongoing genome sequencing project of cucumber.

  4. A high-density SNP Map of sunflower derived from RAD-sequencing facilitating fine-mapping of the rust resistance gene R12.

    PubMed

    Talukder, Zahirul I; Gong, Li; Hulke, Brent S; Pegadaraju, Venkatramana; Song, Qijian; Schultz, Quentin; Qi, Lili

    2014-01-01

    A high-resolution genetic map of sunflower was constructed by integrating SNP data from three F2 mapping populations (HA 89/RHA 464, B-line/RHA 464, and CR 29/RHA 468). The consensus map spanned a total length of 1443.84 cM, and consisted of 5,019 SNP markers derived from RAD tag sequencing and 118 publicly available SSR markers distributed in 17 linkage groups, corresponding to the haploid chromosome number of sunflower. The maximum interval between markers in the consensus map is 12.37 cM and the average distance is 0.28 cM between adjacent markers. Despite a few short-distance inversions in marker order, the consensus map showed high levels of collinearity among individual maps with an average Spearman's rank correlation coefficient of 0.972 across the genome. The order of the SSR markers on the consensus map was also in agreement with the order of the individual map and with previously published sunflower maps. Three individual and one consensus maps revealed the uneven distribution of markers across the genome. Additionally, we performed fine mapping and marker validation of the rust resistance gene R12, providing closely linked SNP markers for marker-assisted selection of this gene in sunflower breeding programs. This high resolution consensus map will serve as a valuable tool to the sunflower community for studying marker-trait association of important agronomic traits, marker assisted breeding, map-based gene cloning, and comparative mapping.
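
    The collinearity measure and the marker density quoted above can be illustrated with a short sketch. The marker positions are hypothetical, and dividing map length by marker count is a simplification of "average distance between adjacent markers"; only the totals (1443.84 cM, 5,019 SNPs, 118 SSRs) come from the record.

```python
def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical positions (cM) of the same six markers on two maps;
# the third and fourth markers are inverted on the consensus map.
individual = [0.0, 1.2, 2.5, 3.1, 4.8, 6.0]
consensus = [0.0, 1.0, 2.9, 2.4, 5.1, 6.3]
rho = spearman(individual, consensus)

# Marker density from the totals quoted in the abstract.
total_markers = 5019 + 118
mean_spacing = 1443.84 / total_markers
print(round(rho, 3), round(mean_spacing, 2))
```

    A single short-distance inversion lowers rho only slightly, which is consistent with a high genome-wide coefficient despite a few local order differences.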

  5. Genetic Mapping and Exome Sequencing Identify Variants Associated with Five Novel Diseases

    PubMed Central

    Puffenberger, Erik G.; Jinks, Robert N.; Sougnez, Carrie; Cibulskis, Kristian; Willert, Rebecca A.; Achilly, Nathan P.; Cassidy, Ryan P.; Fiorentini, Christopher J.; Heiken, Kory F.; Lawrence, Johnny J.; Mahoney, Molly H.; Miller, Christopher J.; Nair, Devika T.; Politi, Kristin A.; Worcester, Kimberly N.; Setton, Roni A.; DiPiazza, Rosa; Sherman, Eric A.; Eastman, James T.; Francklyn, Christopher; Robey-Bond, Susan; Rider, Nicholas L.; Gabriel, Stacey; Morton, D. Holmes; Strauss, Kevin A.

    2012-01-01

    The Clinic for Special Children (CSC) has integrated biochemical and molecular methods into a rural pediatric practice serving Old Order Amish and Mennonite (Plain) children. Among the Plain people, we have used single nucleotide polymorphism (SNP) microarrays to genetically map recessive disorders to large autozygous haplotype blocks (mean = 4.4 Mb) that contain many genes (mean = 79). For some, uninformative mapping or large gene lists preclude disease-gene identification by Sanger sequencing. Seven such conditions were selected for exome sequencing at the Broad Institute; all had been previously mapped at the CSC using low density SNP microarrays coupled with autozygosity and linkage analyses. Using between 1 and 5 patient samples per disorder, we identified sequence variants in the known disease-causing genes SLC6A3 and FLVCR1, and present evidence to strongly support the pathogenicity of variants identified in TUBGCP6, BRAT1, SNIP1, CRADD, and HARS. Our results reveal the power of coupling new genotyping technologies to population-specific genetic knowledge and robust clinical data. PMID:22279524

  6. HomozygosityMapper2012--bridging the gap between homozygosity mapping and deep sequencing.

    PubMed

    Seelow, Dominik; Schuelke, Markus

    2012-07-01

    Homozygosity mapping is a common method to map recessive traits in consanguineous families. To facilitate these analyses, we have developed HomozygosityMapper, a web-based approach to homozygosity mapping. HomozygosityMapper allows researchers to directly upload the genotype files produced by the major genotyping platforms as well as deep sequencing data. It detects stretches of homozygosity shared by the affected individuals and displays them graphically. Users can interactively inspect the underlying genotypes, manually refine these regions and eventually submit them to our candidate gene search engine GeneDistiller to identify the most promising candidate genes. Here, we present the new version of HomozygosityMapper. The most striking new feature is the support of Next Generation Sequencing *.vcf files as input. Upon users' requests, we have implemented the analysis of common experimental rodents as well as of important farm animals. Furthermore, we have extended the options for single families and loss of heterozygosity studies. Another new feature is the export of *.bed files for targeted enrichment of the potential disease regions for deep sequencing strategies. HomozygosityMapper also generates files for conventional linkage analyses which are already restricted to the possible disease regions, hence superseding CPU-intensive genome-wide analyses. HomozygosityMapper is freely available at http://www.homozygositymapper.org/.
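
    The core idea of finding stretches of homozygosity shared by affected individuals can be sketched as follows. This is a simplified illustration with hypothetical genotype codes ("AA", "AB", "BB") and a hypothetical minimum run length; it is not HomozygosityMapper's scoring method, which the record does not describe.

```python
def shared_homozygous_runs(genotypes, min_markers=3):
    """Find marker runs where all individuals share one homozygous genotype.

    genotypes: one list of calls per affected individual, all equal length.
    Returns (start, end) marker indices, end inclusive.
    """
    n_markers = len(genotypes[0])
    runs, start = [], None
    for pos in range(n_markers):
        calls = {ind[pos] for ind in genotypes}
        shared = len(calls) == 1 and next(iter(calls)) in ("AA", "BB")
        if shared and start is None:
            start = pos                      # a candidate run begins
        elif not shared and start is not None:
            if pos - start >= min_markers:   # keep only long enough runs
                runs.append((start, pos - 1))
            start = None
    if start is not None and n_markers - start >= min_markers:
        runs.append((start, n_markers - 1))
    return runs


# Two hypothetical affected siblings typed at eight consecutive markers.
affected = [
    ["AB", "AA", "AA", "BB", "AA", "AB", "BB", "BB"],
    ["AA", "AA", "AA", "BB", "AA", "AA", "BB", "AB"],
]
regions = shared_homozygous_runs(affected)
print(regions)
```

    Requiring the same homozygous genotype in every affected individual reflects the expectation that a recessive variant is inherited identical by descent in a consanguineous family.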

  7. Genetic mapping and exome sequencing identify variants associated with five novel diseases.

    PubMed

    Puffenberger, Erik G; Jinks, Robert N; Sougnez, Carrie; Cibulskis, Kristian; Willert, Rebecca A; Achilly, Nathan P; Cassidy, Ryan P; Fiorentini, Christopher J; Heiken, Kory F; Lawrence, Johnny J; Mahoney, Molly H; Miller, Christopher J; Nair, Devika T; Politi, Kristin A; Worcester, Kimberly N; Setton, Roni A; Dipiazza, Rosa; Sherman, Eric A; Eastman, James T; Francklyn, Christopher; Robey-Bond, Susan; Rider, Nicholas L; Gabriel, Stacey; Morton, D Holmes; Strauss, Kevin A

    2012-01-01

    The Clinic for Special Children (CSC) has integrated biochemical and molecular methods into a rural pediatric practice serving Old Order Amish and Mennonite (Plain) children. Among the Plain people, we have used single nucleotide polymorphism (SNP) microarrays to genetically map recessive disorders to large autozygous haplotype blocks (mean = 4.4 Mb) that contain many genes (mean = 79). For some, uninformative mapping or large gene lists preclude disease-gene identification by Sanger sequencing. Seven such conditions were selected for exome sequencing at the Broad Institute; all had been previously mapped at the CSC using low density SNP microarrays coupled with autozygosity and linkage analyses. Using between 1 and 5 patient samples per disorder, we identified sequence variants in the known disease-causing genes SLC6A3 and FLVCR1, and present evidence to strongly support the pathogenicity of variants identified in TUBGCP6, BRAT1, SNIP1, CRADD, and HARS. Our results reveal the power of coupling new genotyping technologies to population-specific genetic knowledge and robust clinical data.

  8. Quantitative Trait Locus Mapping and Candidate Gene Analysis for Plant Architecture Traits Using Whole Genome Re-Sequencing in Rice

    PubMed Central

    Lim, Jung-Hyun; Yang, Hyun-Jung; Jung, Ki-Hong; Yoo, Soo-Cheul; Paek, Nam-Chon

    2014-01-01

    Plant breeders have focused on improving plant architecture as an effective means to increase crop yield. Here, we identify the main-effect quantitative trait loci (QTLs) for plant shape-related traits in rice (Oryza sativa) and find candidate genes by applying whole genome re-sequencing of two parental cultivars using next-generation sequencing. To identify QTLs influencing plant shape, we analyzed six traits: plant height, tiller number, panicle diameter, panicle length, flag leaf length, and flag leaf width. We performed QTL analysis with 178 F7 recombinant in-bred lines (RILs) from a cross of japonica rice line ‘SNUSG1’ and indica rice line ‘Milyang23’. Using 131 molecular markers, including 28 insertion/deletion markers, we identified 11 main- and 16 minor-effect QTLs for the six traits with a threshold LOD value > 2.8. Our sequence analysis identified fifty-four candidate genes for the main-effect QTLs. By further comparison of coding sequences and meta-expression profiles between japonica and indica rice varieties, we finally chose 15 strong candidate genes for the 11 main-effect QTLs. Our study shows that the whole-genome sequence data substantially enhanced the efficiency of polymorphic marker development for QTL fine-mapping and the identification of possible candidate genes. This yields useful genetic resources for breeding high-yielding rice cultivars with improved plant architecture. PMID:24599000

  9. Genomic shotgun array: a procedure linking large-scale DNA sequencing with regional transcript mapping.

    PubMed

    Li, Ling-Hui; Li, Jian-Chiuan; Lin, Yung-Feng; Lin, Chung-Yen; Chen, Chung-Yung; Tsai, Shih-Feng

    2004-02-11

    To facilitate transcript mapping and to investigate alterations in genomic structure and gene expression in a defined genomic target, we developed a novel microarray-based method to detect transcriptional activity of the human chromosome 4q22-24 region. Loss of heterozygosity of human 4q22-24 is frequently observed in hepatocellular carcinoma (HCC). One hundred and eighteen well-characterized genes have been identified from this region. We took previously sequenced shotgun subclones as templates to amplify overlapping sequences for the genomic segment and constructed a chromosome-region-specific microarray. Using genomic DNA fragments as probes, we detected transcriptional activity from within this region among five different tissues. The hybridization results indicate that there are new transcripts that have not yet been identified by other methods. The existence of new transcripts encoded by genes in this region was confirmed by PCR cloning or cDNA library screening. The procedure reported here allows coupling of shotgun sequencing with transcript mapping and, potentially, detailed analysis of gene expression and chromosomal copy of the genomic sequence for the putative HCC tumor suppressor gene(s) in the 4q candidate region.

  10. SAGETTARIUS: a program to reduce the number of tags mapped to multiple transcripts and to plan SAGE sequencing stages

    PubMed Central

    Bianchetti, Laurent; Wu, Yan; Guerin, Eric; Plewniak, Frédéric; Poch, Olivier

    2007-01-01

    SAGE (Serial Analysis of Gene Expression) experiments generate short nucleotide sequences called ‘tags’ which are assumed to map unambiguously to their original transcripts (1 tag to 1 transcript mapping). Nevertheless, many tags are generated that do not map to any transcript or map to multiple transcripts. Current bioinformatics resources, such as SAGEmap and TAGmapper, have focused on reducing the number of unmapped tags. Here, we describe SAGETTARIUS, a new high-throughput program that performs successive precise Nla3 and Sau3A tag to transcript mapping, based on specifically designed Virtual Tag (VT) libraries. First, SAGETTARIUS decreases the number of tags mapped to multiple transcripts. Among the various mapping resources compared, SAGETTARIUS performed the best in this respect by decreasing up to 11% the number of multiply mapped tags. Second, SAGETTARIUS allows the establishment of a guideline for SAGE experiment sequencing efforts through efficient mapping of the CRT (Cytoplasmic Ribosomal protein Transcripts)-specific tags. Using all publicly available human and mouse Nla3 SAGE experiments, we show that sequencing 100 000 tags is sufficient to map almost all CRT-specific tags and that four sequencing stages can be identified when carrying out a human or mouse SAGE project. SAGETTARIUS is web interfaced and freely accessible to academic users. PMID:17884916
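
    The notion of a virtual tag library, and of tags that map to multiple transcripts, can be sketched as follows. This is an illustrative toy, not the SAGETTARIUS implementation: it assumes 10-base tags taken immediately downstream of the 3'-most NlaIII site (CATG), ignores the Sau3A step, and uses made-up transcript names and sequences.

```python
from collections import defaultdict
from typing import Dict, List, Optional

ANCHOR_SITE = "CATG"  # NlaIII recognition site
TAG_LENGTH = 10       # assumed short-SAGE tag length


def virtual_tag(transcript: str) -> Optional[str]:
    """Return the tag following the 3'-most anchoring site, or None."""
    pos = transcript.rfind(ANCHOR_SITE)
    if pos == -1:
        return None
    begin = pos + len(ANCHOR_SITE)
    tag = transcript[begin:begin + TAG_LENGTH]
    # Discard tags truncated by the end of the transcript.
    return tag if len(tag) == TAG_LENGTH else None


def build_tag_index(transcripts: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each virtual tag to the transcripts that would produce it."""
    index = defaultdict(list)
    for name, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            index[tag].append(name)
    return dict(index)


# Hypothetical transcripts; tx_a and tx_b share the same 3'-most tag.
transcripts = {
    "tx_a": "GGGCATGAAACCCGGGTTTAAA",
    "tx_b": "TTCATGAAACCCGGGTCC",
    "tx_c": "CATGTTTTGGGGCCAA",
    "tx_d": "ACGTACGT",  # no anchoring site, so no tag
}
index = build_tag_index(transcripts)
ambiguous = {tag: names for tag, names in index.items() if len(names) > 1}
print(ambiguous)
```

    Tags in `ambiguous` are the multiply mapped tags that the abstract aims to reduce; a second enzyme with a different recognition site is one way such ties might be broken.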

  11. SAGETTARIUS: a program to reduce the number of tags mapped to multiple transcripts and to plan SAGE sequencing stages.

    PubMed

    Bianchetti, Laurent; Wu, Yan; Guerin, Eric; Plewniak, Frédéric; Poch, Olivier

    2007-01-01

    SAGE (Serial Analysis of Gene Expression) experiments generate short nucleotide sequences called 'tags' which are assumed to map unambiguously to their original transcripts (1 tag to 1 transcript mapping). Nevertheless, many tags are generated that do not map to any transcript or map to multiple transcripts. Current bioinformatics resources, such as SAGEmap and TAGmapper, have focused on reducing the number of unmapped tags. Here, we describe SAGETTARIUS, a new high-throughput program that performs successive precise Nla3 and Sau3A tag to transcript mapping, based on specifically designed Virtual Tag (VT) libraries. First, SAGETTARIUS decreases the number of tags mapped to multiple transcripts. Among the various mapping resources compared, SAGETTARIUS performed the best in this respect by decreasing up to 11% the number of multiply mapped tags. Second, SAGETTARIUS allows the establishment of a guideline for SAGE experiment sequencing efforts through efficient mapping of the CRT (Cytoplasmic Ribosomal protein Transcripts)-specific tags. Using all publicly available human and mouse Nla3 SAGE experiments, we show that sequencing 100,000 tags is sufficient to map almost all CRT-specific tags and that four sequencing stages can be identified when carrying out a human or mouse SAGE project. SAGETTARIUS is web interfaced and freely accessible to academic users.

  12. Improving Transmission Efficiency of Large Sequence Alignment/Map (SAM) Files

    PubMed Central

    Sakib, Muhammad Nazmus; Tang, Jijun; Zheng, W. Jim; Huang, Chin-Tser

    2011-01-01

    Research in bioinformatics primarily involves collection and analysis of a large volume of genomic data. Naturally, it demands efficient storage and transfer of this huge amount of data. In recent years, some research has been done to find efficient compression algorithms to reduce the size of various sequencing data. One way to improve the transmission time of large files is to apply a maximum lossless compression on them. In this paper, we present SAMZIP, a specialized encoding scheme, for sequence alignment data in SAM (Sequence Alignment/Map) format, which improves the compression ratio of existing compression tools available. In order to achieve this, we exploit the prior knowledge of the file format and specifications. Our experimental results show that our encoding scheme improves compression ratio, thereby reducing overall transmission time significantly. PMID:22164252
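
    One way to exploit prior knowledge of the SAM format is to split records into their eleven mandatory tab-separated fields and compress each field stream separately. The sketch below only illustrates that general idea with zlib on synthetic records; it is not the SAMZIP encoding scheme, and the helper names and sample data are assumptions. It ignores header lines and optional fields.

```python
import zlib
from typing import List

# The eleven mandatory SAM alignment fields, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def split_columns(lines: List[str]) -> List[List[str]]:
    """Turn alignment lines into one list of values per mandatory field."""
    columns: List[List[str]] = [[] for _ in SAM_FIELDS]
    for line in lines:
        fields = line.split("\t")
        for i in range(len(SAM_FIELDS)):
            columns[i].append(fields[i])
    return columns


def join_columns(columns: List[List[str]]) -> List[str]:
    """Inverse of split_columns for records with only mandatory fields."""
    return ["\t".join(row) for row in zip(*columns)]


def rowwise_size(lines: List[str]) -> int:
    """Compressed size when the file is treated as one text stream."""
    return len(zlib.compress("\n".join(lines).encode(), 9))


def columnwise_size(lines: List[str]) -> int:
    """Total compressed size when each field is compressed on its own."""
    return sum(len(zlib.compress("\n".join(col).encode(), 9))
               for col in split_columns(lines))


# Synthetic, highly regular records (hypothetical data, not real alignments).
records = [
    "\t".join([f"read{i}", "0", "chr1", str(1000 + 5 * i), "60", "36M",
               "*", "0", "0", "ACGT" * 9, "I" * 36])
    for i in range(200)
]
restored = join_columns(split_columns(records))
print(rowwise_size(records), columnwise_size(records))
```

    Which layout compresses better depends on the data and the compressor, so the two sizes should be measured on real files rather than assumed.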

  13. Improving transmission efficiency of large sequence alignment/map (SAM) files.

    PubMed

    Sakib, Muhammad Nazmus; Tang, Jijun; Zheng, W Jim; Huang, Chin-Tser

    2011-01-01

    Research in bioinformatics primarily involves collection and analysis of a large volume of genomic data. Naturally, it demands efficient storage and transfer of this huge amount of data. In recent years, some research has been done to find efficient compression algorithms to reduce the size of various sequencing data. One way to improve the transmission time of large files is to apply a maximum lossless compression on them. In this paper, we present SAMZIP, a specialized encoding scheme, for sequence alignment data in SAM (Sequence Alignment/Map) format, which improves the compression ratio of existing compression tools available. In order to achieve this, we exploit the prior knowledge of the file format and specifications. Our experimental results show that our encoding scheme improves compression ratio, thereby reducing overall transmission time significantly.

  14. Transcriptome sequencing to produce SNP-based genetic maps of onion.

    PubMed

    Duangjit, J; Bohanec, B; Chan, A P; Town, C D; Havey, M J

    2013-08-01

    We used the Roche-454 platform to sequence from normalized cDNA libraries from each of two inbred lines of onion (OH1 and 5225). From approximately 1.6 million reads from each inbred, 27,065 and 33,254 cDNA contigs were assembled from OH1 and 5225, respectively. In total, 3,364 well supported single nucleotide polymorphisms (SNPs) on 1,716 cDNA contigs were identified between these two inbreds. One SNP on each of 1,256 contigs was randomly selected for genotyping. OH1 and 5225 were crossed and 182 gynogenic haploids extracted from hybrid plants were used for SNP mapping. A total of 597 SNPs segregated in the OH1 × 5225 haploid family and a genetic map of ten linkage groups (LOD ≥8) was constructed. Three hundred and thirty-nine of the newly identified SNPs were also mapped using a previously developed segregating family from BYG15-23 × AC43, and 223 common SNPs were used to join the two maps. Because these new SNPs are in expressed regions of the genome and commonly occur among onion germplasms, they will be useful for genetic mapping, gene tagging, marker-aided selection, quality control of seed lots, and fingerprinting of cultivars.

  15. Mapping the sequence of brain events in response to disgusting food.

    PubMed

    Pujol, Jesus; Blanco-Hinojo, Laura; Coronas, Ramón; Esteba-Castillo, Susanna; Rigla, Mercedes; Martínez-Vilavella, Gerard; Deus, Joan; Novell, Ramón; Caixàs, Assumpta

    2017-10-11

    Warning signals indicating that a food is potentially dangerous may evoke a response that is not limited to the feeling of disgust. We investigated the sequence of brain events in response to visual representations of disgusting food using a dynamic image analysis. Functional MRI was acquired in 30 healthy subjects while they were watching a movie showing disgusting food scenes interspersed with the scenes of appetizing food. Imaging analysis included the identification of the global brain response and the generation of frame-by-frame activation maps at the temporal resolution of 2 s. Robust activations were identified in brain structures conventionally associated with the experience of disgust, but our analysis also captured a variety of other brain elements showing distinct temporal evolutions. The earliest events included transient changes in the orbitofrontal cortex and visual areas, followed by a more durable engagement of the periaqueductal gray, a pivotal element in the mediation of responses to threat. A subsequent core phase was characterized by the activation of subcortical and cortical structures directly concerned not only with the emotional dimension of disgust (e.g., amygdala-hippocampus, insula), but also with the regulation of food intake (e.g., hypothalamus). In a later phase, neural excitement extended to broad cortical areas, the thalamus and cerebellum, and finally to the default mode network that signaled the progressive termination of the evoked response. The response to disgusting food representations is not limited to the emotional domain of disgust, and may sequentially involve a variety of broadly distributed brain networks. Hum Brain Mapp, 2017. © 2017 Wiley Periodicals, Inc.

  16. Multiparametric magnetic resonance imaging including oxygenation mapping of experimental ischaemic stroke.

    PubMed

    Boisserand, Ligia Simões Braga; Lemasson, Benjamin; Hirschler, Lydiane; Moisan, Anaïck; Hubert, Violaine; Barbier, Emmanuel L; Rémy, Chantal; Detante, Olivier

    2017-06-01

    Recent advances in MRI methodology, such as microvascular and brain oxygenation (StO2) imaging, may prove useful in obtaining information about the severity of acute stroke. We assessed the potential of StO2 to detect the ischaemic core in the acute phase compared to apparent diffusion coefficient and to predict the final necrosis. Sprague-Dawley rats (n = 38) were imaged during acute stroke (D0) and 21 days after (D21). A multiparametric MRI protocol was performed at 4.7T to characterize brain damage within three regions of interest: 'LesionD0' (diffusion), 'Mismatch' representing penumbra (perfusion/diffusion) and 'Hypoxia' (voxels < 40% of StO2 within the region of interest LesionD0). Voxel-based analysis of stroke revealed heterogeneity of the region of interest LesionD0, which included voxels with different degrees of oxygenation decrease. This finding was supported by a dramatic decrease of vascular and perfusion parameters within the region of interest Hypoxia. This zone presented the lowest values of almost all parameters analysed, indicating a higher severity. Our study demonstrates the potential of StO2 magnetic resonance imaging to more accurately detect the ischaemic core without the inclusion of any reversible ischaemic damage. Our follow-up study indicates that apparent diffusion coefficient imaging overestimated the final necrosis while StO2 imaging did not.

  17. ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence

    PubMed Central

    2011-01-01

    Background: The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. Moreover, the combination of NGS sequencing and affordable high-throughput genotyping technologies is facilitating the rapid discovery and use of SNPs in non-model species. However, this abundance of sequences and polymorphisms creates new software needs. To fulfill these needs, we have developed a powerful, yet easy-to-use application. Results: The ngs_backbone software is a parallel pipeline capable of analyzing Sanger, 454, Illumina and SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequence reads. Its main supported analyses are: read cleaning, transcriptome assembly and annotation, read mapping and single nucleotide polymorphism (SNP) calling and selection. In order to build a truly useful tool, the software development was paired with a laboratory experiment. All public tomato Sanger EST reads plus 14.2 million Illumina reads were employed to test the tool and predict polymorphism in tomato. The cleaned reads were mapped to the SGN tomato transcriptome obtaining a coverage of 4.2 for Sanger and 8.5 for Illumina. 23,360 single nucleotide variations (SNVs) were predicted. A total of 76 SNVs were experimentally validated, and 85% were found to be real. Conclusions: ngs_backbone is a new software package capable of analyzing sequences produced by NGS technologies and predicting SNVs with great accuracy. In our tomato example, we created a highly polymorphic collection of SNVs that will be a useful resource for tomato researchers and breeders. The software developed along with its documentation is freely available under the AGPL license and can be downloaded from http://bioinf.comav.upv.es/ngs_backbone/ or http://github.com/JoseBlanca/franklin. PMID:21635747

  18. Description of durum wheat linkage map and comparative sequence analysis of wheat mapped DArT markers with rice and Brachypodium genomes

    PubMed Central

    2013-01-01

    Background: The importance of wheat to the world economy, together with progress in high-throughput next-generation DNA sequencing, has accelerated initiatives of genetic research for wheat improvement. The availability of high density linkage maps is crucial to identify genotype-phenotype associations, but also for anchoring BAC contigs to genetic maps, a strategy followed for sequencing the wheat genome. Results: Here we report a genetic linkage map in a durum wheat segregating population and the study of mapped DArT markers. The linkage map consists of 126 gSSR, 31 EST-SSR and 351 DArT markers distributed in 24 linkage groups for a total length of 1,272 cM. Through bioinformatic approaches we have analysed 327 DArT clones to reveal their redundancy, syntenic and functional aspects. The DNA sequences of 174 DArT markers were assembled into a non-redundant set of 60 marker clusters. This explained the generation of clusters in very small chromosome regions across genomes. Of these DArT markers, 61 showed highly significant (Expectation < E-10) BLAST similarity to gene sequences in public databases of model species such as Brachypodium and rice. Based on sequence alignments, the analysis revealed a mosaic gene conservation, with 54 and 72 genes present in rice and Brachypodium species, respectively. Conclusions: In the present manuscript we provide a detailed characterization of DArT markers and the basis for future efforts in durum wheat map comparison. PMID:24304553

  19. Mapping and Sequencing of the Canine NRAMP1 Gene and Identification of Mutations in Leishmaniasis-Susceptible Dogs

    PubMed Central

    Altet, Laura; Francino, Olga; Solano-Gallego, Laia; Renier, Corinne; Sánchez, Armand

    2002-01-01

    The NRAMP1 gene (Slc11a1) encodes an ion transporter protein involved in the control of intraphagosomal replication of parasites and in macrophage activation. It has been described in mice as the determinant of natural resistance or susceptibility to infection with antigenically unrelated pathogens, including Leishmania. Our aims were to sequence and map the canine Slc11a1 gene and to identify mutations that may be associated with resistance or susceptibility to Leishmania infection. The canine Slc11a1 gene has been mapped to dog chromosome CFA37 and covers 9 kb, including a 700-bp promoter region, 15 exons, and a polymorphic microsatellite in intron 1. It encodes a 547-amino-acid protein that has over 87% identity with the Slc11a1 proteins of different mammalian species. A case-control study with 33 resistant and 84 susceptible dogs showed an association between allele 145 of the microsatellite and susceptible dogs. Sequence variant analysis was performed by direct sequencing of the cDNA and the promoter region of four unrelated beagles experimentally infected with Leishmania infantum to search for possible functional mutations. Two of the dogs were classified as susceptible and the other two were classified as resistant based on their immune responses. Two important mutations were found in susceptible dogs: a G-rich region in the promoter that was common to both animals and a complete deletion of exon 11, which encodes the consensus transport motif of the protein, in the unique susceptible dog that needed an additional and prolonged treatment to avoid continuous relapses. A study with a larger dog population would be required to prove the association of these sequence variants with disease susceptibility. PMID:12010961

  20. Mapping and sequencing of the canine NRAMP1 gene and identification of mutations in leishmaniasis-susceptible dogs.

    PubMed

    Altet, Laura; Francino, Olga; Solano-Gallego, Laia; Renier, Corinne; Sánchez, Armand

    2002-06-01

    The NRAMP1 gene (Slc11a1) encodes an ion transporter protein involved in the control of intraphagosomal replication of parasites and in macrophage activation. It has been described in mice as the determinant of natural resistance or susceptibility to infection with antigenically unrelated pathogens, including Leishmania. Our aims were to sequence and map the canine Slc11a1 gene and to identify mutations that may be associated with resistance or susceptibility to Leishmania infection. The canine Slc11a1 gene has been mapped to dog chromosome CFA37 and covers 9 kb, including a 700-bp promoter region, 15 exons, and a polymorphic microsatellite in intron 1. It encodes a 547-amino-acid protein that has over 87% identity with the Slc11a1 proteins of different mammalian species. A case-control study with 33 resistant and 84 susceptible dogs showed an association between allele 145 of the microsatellite and susceptible dogs. Sequence variant analysis was performed by direct sequencing of the cDNA and the promoter region of four unrelated beagles experimentally infected with Leishmania infantum to search for possible functional mutations. Two of the dogs were classified as susceptible and the other two were classified as resistant based on their immune responses. Two important mutations were found in susceptible dogs: a G-rich region in the promoter that was common to both animals and a complete deletion of exon 11, which encodes the consensus transport motif of the protein, in the unique susceptible dog that needed an additional and prolonged treatment to avoid continuous relapses. A study with a larger dog population would be required to prove the association of these sequence variants with disease susceptibility.

  1. Genetic Mapping and QTL Analysis of Growth-Related Traits in Pinctada fucata Using Restriction-Site Associated DNA Sequencing

    PubMed Central

    Li, Yaoguo; He, Maoxian

    2014-01-01

    The pearl oyster, Pinctada fucata (P. fucata), is one of the marine bivalves that is predominantly cultured for pearl production. To obtain more genetic information for breeding purposes, we constructed a high-density linkage map of P. fucata and identified quantitative trait loci (QTL) for growth-related traits. One F1 family, which included the two parents, 48 largest progeny and 50 smallest progeny, was sampled to construct a linkage map using restriction site-associated DNA sequencing (RAD-Seq). With low coverage data, 1956.53 million clean reads and 86,342 candidate RAD loci were generated. A total of 1373 segregating SNPs were used to construct a sex-average linkage map. This spanned 1091.81 centimorgans (cM), with 14 linkage groups and an average marker interval of 1.41 cM. The genetic linkage map coverage, Coa, was 97.24%. Thirty-nine QTL-peak loci, for seven growth-related traits, were identified using the single-marker analysis, nonparametric mapping Kruskal-Wallis (KW) test. Parameters included three for shell height, six for shell length, five for shell width, four for hinge length, 11 for total weight, eight for soft tissue weight and two for shell weight. The QTL peak loci for shell height, shell length and shell weight were all located in linkage group 6. The genotype frequencies of most QTL peak loci showed significant differences between the large subpopulation and the small subpopulation (P<0.05). These results highlight the effectiveness of RAD-Seq as a tool for generation of QTL-targeted and genome-wide marker data in the non-model animal, P. fucata, and its possible utility in marker-assisted selection (MAS). PMID:25369421
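
    The single-marker Kruskal-Wallis analysis mentioned above compares a trait across the genotype classes at one marker using ranks. The following is a minimal pure-Python sketch of the H statistic, not the authors' mapping software; it omits the tie correction, and the genotype classes and shell-height values are hypothetical.

```python
import math
from typing import Dict, List


def kruskal_wallis_h(groups: Dict[str, List[float]]) -> float:
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    pooled = sorted(v for values in groups.values() for v in values)
    n_total = len(pooled)
    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    weighted = 0.0
    for values in groups.values():
        rank_sum = sum(rank_of[v] for v in values)
        weighted += rank_sum ** 2 / len(values)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical shell heights (mm) grouped by genotype at a single marker.
shell_height = {
    "AA": [40.1, 41.3, 42.0],
    "AB": [43.5, 44.2, 45.0],
    "BB": [46.8, 47.1, 48.4],
}
h = kruskal_wallis_h(shell_height)
# Chi-square approximation; this closed form holds only for 2 degrees of
# freedom (three groups) and is rough for samples this small.
p_value = math.exp(-h / 2)
print(round(h, 3), round(p_value, 4))
```

    Repeating the test at every marker and recording where the statistic peaks gives the kind of QTL-peak loci the abstract reports, subject to a multiple-testing threshold that this sketch does not address.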

  2. Sequencing Spo11 Oligonucleotides for Mapping Meiotic DNA Double-Strand Breaks in Yeast.

    PubMed

    Lam, Isabel; Mohibullah, Neeman; Keeney, Scott

    2017-01-01

    Meiosis is a specialized form of cell division resulting in reproductive cells with a reduced, usually haploid, genome complement. A key step after premeiotic DNA replication is the occurrence of homologous recombination at multiple places throughout the genome, initiated with the formation of DNA double-strand breaks (DSBs) catalyzed by the topoisomerase-like protein Spo11. DSBs are distributed non-randomly in genomes, and understanding the mechanisms that shape this distribution is important for understanding how meiotic recombination influences heredity and genome evolution. Several methods exist for mapping where Spo11 acts. Of these, sequencing of Spo11-associated oligonucleotides (Spo11 oligos) is the most precise, specifying the locations of DNA breaks to the base pair. In this chapter we detail the steps involved in Spo11-oligo mapping in the SK1 strain of budding yeast Saccharomyces cerevisiae, from harvesting cells of highly synchronous meiotic cultures, through preparation of sequencing libraries, to the mapping pipeline used for processing the data.

  3. A third-generation microsatellite-based linkage map of the honey bee, Apis mellifera, and its comparison with the sequence-based physical map.

    PubMed

    Solignac, Michel; Mougel, Florence; Vautrin, Dominique; Monnerot, Monique; Cornuet, Jean-Marie

    2007-01-01

    The honey bee is a key model for social behavior and this feature led to the selection of the species for genome sequencing. A genetic map is a necessary companion to the sequence. In addition, because there was originally no physical map for the honey bee genome project, a meiotic map was the only resource for organizing the sequence assembly on the chromosomes. We present the genetic (meiotic) map here and describe the main features that emerged from comparison with the sequence-based physical map. The genetic map of the honey bee is saturated and the chromosomes are oriented from the centromeric to the telomeric regions. The map is based on 2,008 markers and is about 40 Morgans (M) long, resulting in a marker density of one every 2.05 centiMorgans (cM). For the 186 megabases (Mb) of the genome mapped and assembled, this corresponds to a very high average recombination rate of 22.04 cM/Mb. Honey bee meiosis shows a relatively homogeneous recombination rate along and across chromosomes, as well as within and between individuals. Interference is higher than inferred from the Kosambi function of distance. In addition, numerous recombination hotspots are dispersed over the genome. The very large genetic length of the honey bee genome, its small physical size and an almost complete genome sequence with a relatively low number of genes suggest a very promising future for association mapping in the honey bee, particularly as the existence of haploid males allows easy bulk segregant analysis.
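
    The figures quoted above can be cross-checked with simple arithmetic, and the Kosambi function mentioned in the abstract has a standard closed form. The sketch below uses only numbers stated in the abstract plus the textbook Kosambi formula; the variable and function names are illustrative.

```python
import math

markers = 2008                  # markers on the map
density_cm = 2.05               # reported spacing: one marker every 2.05 cM
physical_mb = 186               # megabases mapped and assembled
rate_cm_per_mb = 22.04          # reported average recombination rate

# Two independent ways to recover the genetic map length from the abstract.
length_from_density = markers * density_cm
length_from_rate = physical_mb * rate_cm_per_mb
print(round(length_from_density, 1), round(length_from_rate, 2))
# Both values are close to 4,100 cM, consistent with "about 40 Morgans".


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


print(round(kosambi_cm(0.1), 2))
```

    The Kosambi function assumes a particular level of crossover interference; the abstract's statement that interference is higher than it implies means observed distances would deviate from this curve.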

  4. Radiation hybrid mapping and comparative sequence analysis of bovine RIG-I and MAVS genes.

    PubMed

    Cargill, Edward J; Paetzold, Li; Womack, James E

    2006-08-01

    Retinoic acid inducible gene I (RIG-I) and mitochondrial antiviral signaling (MAVS) proteins have recently been found to operate in a pathway for the detection and subsequent elimination of replicating viral genomes. Because of this innate immunity role, RIG-I and MAVS are candidates for studies of disease resistance. The objectives of this work were to (1) radiation hybrid (RH) map bovine RIG-I and MAVS and (2) perform comparative sequence analysis of partial genomic sequence from each gene. Using a bovine 5000(rad) RH panel, RIG-I was localized to BTA08 (LOD > 12) and MAVS was localized to BTA13 (LOD > 12). RIG-I exon 14 and partial MAVS exon five were sequenced in nine breeds and compared with available sequence from the Bovine Genome Project. RIG-I exon 14 and partial MAVS exon five were conserved in all samples examined. One T-A transversion SNP was found in intronic sequence downstream of RIG-I exon 14.

  5. Sequencing and mapping hemoglobin gene clusters in the Australian model dasyurid marsupial Sminthopsis macroura

    SciTech Connect

    De Leo, A.A.; Wheeler, D.; Lefevre, C.; Cheng, Jan-Fang; Hope, R.; Kuliwaba, J.; Nicholas, K.R.; Westermanc, M.; Graves, J.A.M.

    2004-07-26

    Comparing globin genes and their flanking sequences across many species has allowed globin gene evolution to be reconstructed in great detail. Marsupial globin sequences have proved to be of exceptional significance. A previous finding of a beta-like omega gene in the alpha cluster in the tammar wallaby suggested that the alpha and beta cluster evolved via genome duplication and loss rather than tandem duplication. To confirm and extend this important finding we isolated and sequenced BACs containing the alpha and beta loci from the distantly related Australian marsupial Sminthopsis macroura. We report that the alpha gene lies in the same BAC as the beta-like omega gene, implying that the alpha-omega juxtaposition is likely to be conserved in all marsupials. The LUC7L gene was found 3' of the S. macroura alpha locus, a gene order shared with humans but not mouse, chicken or fugu. Sequencing a BAC contig that contained the S. macroura beta globin and epsilon globin loci showed that the globin cluster is flanked by olfactory genes, demonstrating a gene arrangement conserved for over 180 MY. Analysis of the region 5' to the S. macroura epsilon globin gene revealed a region similar to the eutherian LCR, containing sequences and potential transcription factor binding sites with homology to eutherian hypersensitive sites 1 to 5. FISH mapping of BACs containing S. macroura alpha and beta globin genes located the beta globin cluster on chromosome 3q and the alpha locus close to the centromere on 1q, resolving contradictory map locations obtained by previous radioactive in situ hybridization.

  6. Chromosomal mapping, sequence and transcription analysis of the porcine fertilin beta gene (ADAM2).

    PubMed

    Day, A E; Quilter, C R; Sargent, C A; Mileham, A J

    2003-10-01

    Fertilin beta (ADAM2) forms a part of the heterodimeric surface protein fertilin, found on the plasma membrane of mammalian sperm, and has been implicated in the process of sperm-egg fusion. Analysis of cDNA products obtained from adult porcine testis mRNA has presented a sequence corresponding to 2620 bp of the ADAM2 gene. This sequence contained an open reading frame encoding a 735-amino acid protein and homologous to ADAM2 genes known in other mammalian species. Polymerase chain reaction (PCR) analysis of genomic DNA showed that the 2620 bp of cDNA sequence comprises at least 21 exons and spans approximately 76 kb of genomic DNA, with its size and structure being relatively conserved between mouse, human and pig. Fluorescence in situ hybridization was used to map ADAM2 to chromosome 15 of the pig, using a bacterial artificial chromosome clone from the PigE BAC library. This finding is consistent with comparative mapping experiments performed between pig and human chromosomes. Analysis of nine mRNA samples, by reverse transcriptase-PCR, from different porcine tissues has also suggested that expression of ADAM2 is limited to the testis, a finding that is consistent with other mammalian species.

  7. A 1463 gene cattle-human comparative map with anchor points defined by human genome sequence coordinates.

    PubMed

    Everts-van der Wind, Annelie; Kata, Srinivas R; Band, Mark R; Rebeiz, Mark; Larkin, Denis M; Everts, Robin E; Green, Cheryl A; Liu, Lei; Natarajan, Shreedhar; Goldammer, Tom; Lee, Jun Heon; McKay, Stephanie; Womack, James E; Lewin, Harris A

    2004-07-01

    A second-generation 5000 rad radiation hybrid (RH) map of the cattle genome was constructed primarily using cattle ESTs that were targeted to gaps in the existing cattle-human comparative map, as well as to sparsely populated map intervals. A total of 870 targeted markers were added, bringing the number of markers mapped on the RH(5000) panel to 1913. Of these, 1463 have significant BLASTN hits (E < e(-5)) against the human genome sequence. A cattle-human comparative map was created using human genome sequence coordinates of the paired orthologs. One-hundred and ninety-five conserved segments (defined by two or more genes) were identified between the cattle and human genomes, of which 31 are newly discovered and 34 were extended singletons on the first-generation map. The new map represents an improvement of 20% genome-wide comparative coverage compared with the first-generation map. Analysis of gene content within human genome regions where there are gaps in the comparative map revealed gaps with both significantly greater and significantly lower gene content. The new, more detailed cattle-human comparative map provides an improved resource for the analysis of mammalian chromosome evolution, the identification of candidate genes for economically important traits, and for proper alignment of sequence contigs on cattle chromosomes. Copyright 2004 Cold Spring Harbor Laboratory Press ISSN

  8. Mitochondrial DNA sequence evolution and phylogeny of the Atlantic Alcidae, including the extinct great auk (Pinguinus impennis).

    PubMed

    Moum, Truls; Arnason, Ulfur; Arnason, Einar

    2002-09-01

    The Atlantic auk assemblage includes four extant species, razorbill (Alca torda), dovekie (Alle alle), common murre (Uria aalge), and thick-billed murre (U. lomvia), and one recently extinct species, the flightless great auk (Pinguinus impennis). To determine the phylogenetic relationships among the species, a contiguous 4.2-kb region of the mitochondrial genome from the extant species was amplified using PCR. This region included one ribosomal RNA gene, four transfer RNA genes, two protein-coding genes, the control region, and intergenic spacers. Sets of PCR primers for amplifying the same region from great auk were designed from sequences of the extant species. The authenticity of the great auk sequence was ascertained by alternative amplifications, cloning, and separate analyses in an independent laboratory. Phylogenetic analyses of the entire assemblage, made possible by the great auk sequence, fully resolved the phylogenetic relationships and split it into two primary lineages, Uria versus Alle, Alca, and Pinguinus. A sister group relationship was identified between Alca and Pinguinus to the exclusion of Alle. Phylogenetically, the flightless great auk originated late relative to other divergences within the assemblage. This suggests that three highly divergent species in terms of adaptive specializations, Alca, Alle, and Pinguinus, evolved from a single lineage in the Atlantic Ocean, in a process similar to the initial adaptive radiation of alcids in the Pacific Ocean.

  9. Differentiation of strains in Mycobacterium tuberculosis complex by DNA sequence polymorphisms, including rapid identification of M. bovis BCG.

    PubMed Central

    Frothingham, R

    1995-01-01

    The Mycobacterium tuberculosis complex includes M. tuberculosis, M. bovis, M. microti, and M. africanum. Seven strains of the M. tuberculosis complex were sequenced in a region of about 300 bp which contains multiple 15-bp tandem repeats and which is part of a 1,551-bp open reading frame. Four distinct sequences were obtained, each defining a sequevar. A sequevar includes the strain or strains with a given sequence. The type strain M. tuberculosis TMC 102 (H37Rv) was designated sequevar MED-G. When compared to MED-G, sequevar LONG had an insertion of one 15-bp tandem repeat and sequevar SHORT had a deletion of one tandem repeat. Sequevar MED-C had a G-->C substitution, coding for the conservative change Ser-->Thr. BanI cuts only sequevar MED-C at the site of the substitution. PCR-restriction enzyme analysis was used to determine the sequevars of 92 M. tuberculosis complex strains. All 23 M. bovis BCG strains belonged to sequevar MED-C. The M. africanum type strain was sequevar SHORT. The remaining 68 strains of M. tuberculosis, M. bovis (not BCG), and M. microti were sequevars LONG (3 strains) or MED-G (65 strains). PCR-restriction enzyme analysis was applied to reference strains and clinical isolates with a worldwide distribution. This method provides rapid, sensitive, and specific identification of the important vaccine strain M. bovis BCG. PMID:7790448
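    The decision logic of the PCR-restriction enzyme analysis described above can be sketched as follows. This is only an illustrative sketch: the function name, its parameters, and the idea of passing in a measured amplicon length are assumptions made for the example, not details taken from the study. It relies solely on the stated facts that sequevars LONG and SHORT differ from MED-G by one 15-bp tandem repeat and that BanI cuts only sequevar MED-C.

```python
def classify_sequevar(amplicon_len: int, med_g_len: int, bani_cuts: bool,
                      repeat_len: int = 15) -> str:
    """Assign a sequevar from amplicon size and BanI digestion outcome.

    amplicon_len, med_g_len and bani_cuts are hypothetical inputs chosen for
    this sketch; med_g_len is the amplicon length of the MED-G type strain.
    """
    delta = amplicon_len - med_g_len
    if delta == 0:
        # Same length as MED-G: only the G->C substitution (BanI site) differs.
        return "MED-C" if bani_cuts else "MED-G"
    if bani_cuts:
        # BanI is reported to cut only MED-C, so this combination is unexpected.
        return "unclassified"
    if delta == repeat_len:
        return "LONG"    # one extra 15-bp tandem repeat
    if delta == -repeat_len:
        return "SHORT"   # one 15-bp tandem repeat deleted
    return "unclassified"
```

    For instance, an amplicon of the same length as the MED-G reference that is cut by BanI would be reported as MED-C, the sequevar to which all 23 M. bovis BCG strains belonged.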

  10. Construction of genotyping-by-sequencing based high-density genetic maps and QTL mapping for fusarium wilt resistance in pigeonpea.

    PubMed

    Saxena, Rachit K; Singh, Vikas K; Kale, Sandip M; Tathineni, Revathi; Parupalli, Swathi; Kumar, Vinay; Garg, Vanika; Das, Roma R; Sharma, Mamta; Yamini, K N; Muniswamy, S; Ghanta, Anuradha; Rathore, Abhishek; Kumar, C V Sameer; Saxena, K B; Kishor, P B Kavi; Varshney, Rajeev K

    2017-05-15

    Fusarium wilt (FW) is one of the most important biotic stresses causing yield losses in pigeonpea. Genetic improvement of pigeonpea through genomics-assisted breeding (GAB) is an economically feasible option for the development of high yielding FW resistant genotypes. In this context, two recombinant inbred line (RIL) populations (ICPB 2049 × ICPL 99050 designated as PRIL_A and ICPL 20096 × ICPL 332 designated as PRIL_B) and one F2 population (ICPL 85063 × ICPL 87119) were used for the development of high density genetic maps. A genotyping-by-sequencing (GBS) approach was used to identify and genotype SNPs in the three mapping populations. As a result, three high density genetic maps with 964, 1101 and 557 SNPs with an average marker distance of 1.16, 0.84 and 2.60 cM were developed in PRIL_A, PRIL_B and F2, respectively. Based on the multi-location and multi-year phenotypic data of FW resistance, a total of 14 quantitative trait loci (QTLs), including six major QTLs with >10% phenotypic variance explained (PVE), were identified. Comparative analysis across the populations has revealed three important QTLs (qFW11.1, qFW11.2 and qFW11.3) with up to 56.45% PVE for FW resistance. This is the first report of QTL mapping for FW resistance in pigeonpea, and the identified genomic regions could be utilized in GAB.

  11. Construction of a high-density genetic map for sesame based on large scale marker development by specific length amplified fragment (SLAF) sequencing

    PubMed Central

    2013-01-01

    Background The genetics and molecular biology of sesame has only recently begun to be studied even though sesame is an important oil seed crop. A high-density genetic map for sesame has not been published yet due to a lack of sufficient molecular markers. Specific length amplified fragment sequencing (SLAF-seq) is a recently developed high-resolution strategy for large-scale de novo SNP discovery and genotyping. SLAF-seq was employed in this study to obtain sufficient markers to construct a high-density genetic map for sesame. Results In total, 28.21 Gb of data containing 201,488,285 pair-end reads was obtained after sequencing. The average coverage for each SLAF marker was 23.48-fold in the male parent, 23.38-fold in the female parent, and 14.46-fold average in each F2 individual. In total, 71,793 high-quality SLAFs were detected of which 3,673 SLAFs were polymorphic and 1,272 of the polymorphic markers met the requirements for use in the construction of a genetic map. The final map included 1,233 markers on the 15 linkage groups (LGs) and was 1,474.87 cM in length with an average distance of 1.20 cM between adjacent markers. To our knowledge, this map is the densest genetic linkage map to date for sesame. 'SNP_only' markers accounted for 87.51% of the markers on the map. A total of 205 markers on the map showed significant (P < 0.05) segregation distortion. Conclusions We report here the first high-density genetic map for sesame. The map was constructed using an F2 population and the SLAF-seq approach, which allowed the efficient development of a large number of polymorphic markers in a short time. Results of this study will not only provide a platform for gene/QTL fine mapping, map-based gene isolation, and molecular breeding for sesame, but will also serve as a reference for positioning sequence scaffolds on a physical map, to assist in the process of assembling the sesame genome sequence. PMID:24060091

  12. Construction of a high-density genetic map for sesame based on large scale marker development by specific length amplified fragment (SLAF) sequencing.

    PubMed

    Zhang, Yanxin; Wang, Linhai; Xin, Huaigen; Li, Donghua; Ma, Chouxian; Ding, Xia; Hong, Weiguo; Zhang, Xiurong

    2013-09-24

    The genetics and molecular biology of sesame has only recently begun to be studied even though sesame is an important oil seed crop. A high-density genetic map for sesame has not been published yet due to a lack of sufficient molecular markers. Specific length amplified fragment sequencing (SLAF-seq) is a recently developed high-resolution strategy for large-scale de novo SNP discovery and genotyping. SLAF-seq was employed in this study to obtain sufficient markers to construct a high-density genetic map for sesame. In total, 28.21 Gb of data containing 201,488,285 pair-end reads was obtained after sequencing. The average coverage for each SLAF marker was 23.48-fold in the male parent, 23.38-fold in the female parent, and 14.46-fold average in each F2 individual. In total, 71,793 high-quality SLAFs were detected of which 3,673 SLAFs were polymorphic and 1,272 of the polymorphic markers met the requirements for use in the construction of a genetic map. The final map included 1,233 markers on the 15 linkage groups (LGs) and was 1,474.87 cM in length with an average distance of 1.20 cM between adjacent markers. To our knowledge, this map is the densest genetic linkage map to date for sesame. 'SNP_only' markers accounted for 87.51% of the markers on the map. A total of 205 markers on the map showed significant (P < 0.05) segregation distortion. We report here the first high-density genetic map for sesame. The map was constructed using an F2 population and the SLAF-seq approach, which allowed the efficient development of a large number of polymorphic markers in a short time. Results of this study will not only provide a platform for gene/QTL fine mapping, map-based gene isolation, and molecular breeding for sesame, but will also serve as a reference for positioning sequence scaffolds on a physical map, to assist in the process of assembling the sesame genome sequence.
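    Two of the summary figures in this abstract can be illustrated with a short sketch. The average marker spacing follows from the reported map length and marker count. The segregation distortion check below is an assumption: the abstract does not name the test used, so a chi-square goodness-of-fit test against the 1:2:1 ratio expected for a codominant marker in an F2 population is shown as one common choice. All function names are hypothetical.

```python
# Critical value of the chi-square distribution for df = 2 at P = 0.05.
CHI2_CRIT_DF2_P05 = 5.991


def average_spacing(map_length_cm: float, n_markers: int) -> float:
    """Average distance between markers, taken as map length / marker count."""
    return map_length_cm / n_markers


def f2_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic against the 1:2:1 ratio expected in an F2."""
    total = n_aa + n_ab + n_bb
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_distorted(n_aa: int, n_ab: int, n_bb: int) -> bool:
    """Flag a marker whose genotype counts deviate from 1:2:1 at P < 0.05."""
    return f2_chi_square(n_aa, n_ab, n_bb) > CHI2_CRIT_DF2_P05


# Reported values: 1,474.87 cM over 1,233 markers.
spacing = average_spacing(1474.87, 1233)
```

    With the reported values, the spacing rounds to the 1.20 cM stated in the abstract. The genotype counts passed to `is_distorted` would come from the F2 individuals scored at each marker.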

  13. Personal sleep pattern visualization using sequence-based kernel self-organizing map on sound data.

    PubMed

    Wu, Hongle; Kato, Takafumi; Yamada, Tomomi; Numao, Masayuki; Fukui, Ken-Ichi

    2017-07-01

    We propose a method to discover sleep patterns via clustering of sound events recorded during sleep. The proposed method extends the conventional self-organizing map algorithm by kernelization and sequence-based technologies to obtain a fine-grained map that visualizes the distribution and changes of sleep-related events. We introduced features widely applied in sound processing and popular kernel functions to the proposed method to evaluate and compare performance. The proposed method provides a new aspect of sleep monitoring because the results demonstrate that sound events can be directly correlated to an individual's sleep patterns. In addition, by visualizing the transition of cluster dynamics, sleep-related sound events were found to relate to the various stages of sleep. Therefore, these results empirically warrant future study into the assessment of personal sleep quality using sound data. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Geologic map of southwestern Sequoia National Park and vicinity, Tulare County, California, including the Mineral King metamorphic pendant

    NASA Astrophysics Data System (ADS)

    Sisson, T. W.; Moore, J. G.

    2012-12-01

    From the late 1940s to the early 1990s, scientists of the U.S. Geological Survey (USGS) mapped the geology of most of Sequoia and Kings Canyon National Parks, California, and published the results as a series of 15-minute (1:62,500 scale) Geologic Quadrangles. The southwest corner of Sequoia National Park, encompassing the Mineral King and eastern edge of the Kaweah 15-minute topographic quadrangles, however, remained unfinished. At the request of the National Park Service's Geologic Resources Division (NPS-GRD), the USGS has mapped the geology of that area using 7.5-minute (1:24,000 scale) topographic bases and high-resolution ortho-imagery. With partial support from NPS-GRD, the major plutons in the map area were dated by the U-Pb zircon method with the Stanford-USGS SHRIMP-RG ion microprobe. Highlights include: (1) Identification of the Early Cretaceous volcano-plutonic suite of Mineral King (informally named), consisting of three deformed granodiorite plutons and the major metarhyolite tuffs of the Mineral King metamorphic pendant. Members of the suite erupted or intruded at 130-140 Ma (pluton ages: this study; rhyolite ages: lower-intercept concordia from zircon results of Busby-Spera, 1983, Princeton Ph.D. thesis, and from Klemetti et al., 2011, AGU abstract) during the pause of igneous activity between emplacement of the Jurassic and Cretaceous Sierran batholiths. (2) Some of the deformation of the Mineral King metamorphic pendant is demonstrably Cretaceous, with evidence including map-scale folding of Early Cretaceous metarhyolite tuff, and an isoclinally folded aplite dike dated at 98 Ma, concurrent with the large 98-Ma granodiorite of Castle Creek that intruded the Mineral King pendant on the west. (3) A 21-km-long magmatic synform within the 99-100 Ma granite of Coyote Pass that is defined both by inward-dipping mafic inclusions (enclaves) and by sporadic, cm-thick, sharply defined mineral layering. 
    The west margin of the granite of Coyote Pass overlies

  15. Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads—a baiting and iterative mapping approach

    PubMed Central

    Hahn, Christoph; Bachmann, Lutz; Chevreux, Bastien

    2013-01-01

    We present an in silico approach for the reconstruction of complete mitochondrial genomes of non-model organisms directly from next-generation sequencing (NGS) data—mitochondrial baiting and iterative mapping (MITObim). The method is straightforward even if only (i) distantly related mitochondrial genomes or (ii) mitochondrial barcode sequences are available as starting-reference sequences or seeds, respectively. We demonstrate the efficiency of the approach in case studies using real NGS data sets of the two monogenean ectoparasites species Gyrodactylus thymalli and Gyrodactylus derjavinoides including their respective teleost hosts European grayling (Thymallus thymallus) and Rainbow trout (Oncorhynchus mykiss). MITObim appeared superior to existing tools in terms of accuracy, runtime and memory requirements and fully automatically recovered mitochondrial genomes exceeding 99.5% accuracy from total genomic DNA derived NGS data sets in <24 h using a standard desktop computer. The approach overcomes the limitations of traditional strategies for obtaining mitochondrial genomes for species with little or no mitochondrial sequence information at hand and represents a fast and highly efficient in silico alternative to laborious conventional strategies relying on initial long-range PCR. We furthermore demonstrate the applicability of MITObim for metagenomic/pooled data sets using simulated data. MITObim is an easy to use tool even for biologists with modest bioinformatics experience. The software is made available as open source pipeline under the MIT license at https://github.com/chrishah/MITObim. PMID:23661685

  16. Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads--a baiting and iterative mapping approach.

    PubMed

    Hahn, Christoph; Bachmann, Lutz; Chevreux, Bastien

    2013-07-01

    We present an in silico approach for the reconstruction of complete mitochondrial genomes of non-model organisms directly from next-generation sequencing (NGS) data: mitochondrial baiting and iterative mapping (MITObim). The method is straightforward even if only (i) distantly related mitochondrial genomes or (ii) mitochondrial barcode sequences are available as starting-reference sequences or seeds, respectively. We demonstrate the efficiency of the approach in case studies using real NGS data sets of the two monogenean ectoparasites species Gyrodactylus thymalli and Gyrodactylus derjavinoides including their respective teleost hosts European grayling (Thymallus thymallus) and Rainbow trout (Oncorhynchus mykiss). MITObim appeared superior to existing tools in terms of accuracy, runtime and memory requirements and fully automatically recovered mitochondrial genomes exceeding 99.5% accuracy from total genomic DNA derived NGS data sets in <24 h using a standard desktop computer. The approach overcomes the limitations of traditional strategies for obtaining mitochondrial genomes for species with little or no mitochondrial sequence information at hand and represents a fast and highly efficient in silico alternative to laborious conventional strategies relying on initial long-range PCR. We furthermore demonstrate the applicability of MITObim for metagenomic/pooled data sets using simulated data. MITObim is an easy to use tool even for biologists with modest bioinformatics experience. The software is made available as open source pipeline under the MIT license at https://github.com/chrishah/MITObim.
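    The general idea of baiting and iterative mapping can be illustrated with a toy sketch. This is not the MITObim implementation: exact k-mer sharing stands in for the baiting step and greedy suffix/prefix overlap stands in for mapping and assembly, and every function name and parameter below is hypothetical. Starting from a seed, reads that share a k-mer with the current reference are collected, the reference is extended with them, and the cycle repeats until nothing changes.

```python
def kmers(seq: str, k: int) -> set:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bait(reads: list, reference: str, k: int) -> list:
    """Keep reads that share at least one exact k-mer with the reference."""
    ref_kmers = kmers(reference, k)
    return [r for r in reads if kmers(r, k) & ref_kmers]


def extend_once(reference: str, baited: list, min_overlap: int) -> str:
    """Extend the reference at either end, using each baited read at most once."""
    for read in baited:
        if read in reference:
            continue  # read is already fully covered
        longest = min(len(read), len(reference)) - 1
        for ov in range(longest, min_overlap - 1, -1):
            if reference.endswith(read[:ov]):      # read hangs off the right end
                reference += read[ov:]
                break
            if reference.startswith(read[-ov:]):   # read hangs off the left end
                reference = read[:-ov] + reference
                break
    return reference


def iterative_mapping(seed: str, reads: list, k: int = 8,
                      min_overlap: int = 8, max_iterations: int = 20) -> str:
    """Alternate baiting and extension until the reference stops growing."""
    reference = seed
    for _ in range(max_iterations):
        extended = extend_once(reference, bait(reads, reference, k), min_overlap)
        if extended == reference:
            break
        reference = extended
    return reference
```

    The sketch ignores sequencing errors, reverse-complement reads and repeats, all of which a real tool has to handle; it is meant only to show why a short seed such as a barcode sequence can be enough to start the process.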

  17. Transcriptome sequencing for high throughput SNP development and genetic mapping in Pea

    PubMed Central

    2014-01-01

    Background Pea has a complex genome of 4.3 Gb for which only limited genomic resources are available to date. Although SNP markers are now highly valuable for research and modern breeding, only a few are described and used in pea for genetic diversity and linkage analysis. Results We developed a large resource by cDNA sequencing of 8 genotypes representative of modern breeding material using the Roche 454 technology, combining both long reads (400 bp) and high coverage (3.8 million reads, reaching a total of 1,369 megabases). Sequencing data were assembled and generated a 68 K unigene set, from which 41 K were annotated from their best blast hit against the model species Medicago truncatula. Annotated contigs showed an even distribution along M. truncatula pseudochromosomes, suggesting a good representation of the pea genome. 10 K pea contigs were found to be polymorphic among the genetic material surveyed, corresponding to 35 K SNPs. We validated a subset of 1538 SNPs through the GoldenGate assay, proving their ability to structure a diversity panel of breeding germplasm. Among them, 1340 were genetically mapped and used to build a new consensus map comprising a total of 2070 markers. Based on blast analysis, we could establish 1252 bridges between our pea consensus map and the pseudochromosomes of M. truncatula, which provides new insight on synteny between the two species. Conclusions Our approach created significant new resources in pea, i.e. the most comprehensive genetic map to date tightly linked to the model species M. truncatula and a large SNP resource for both academic research and breeding. PMID:24521263

  18. Mapping and sequencing the human genome: Science, ethics, and public policy. Final report

    SciTech Connect

    McInerney, J.D.

    1993-03-31

    Development of Mapping and Sequencing the Human Genome: Science, Ethics, and Public Policy followed the standard process of curriculum development at the Biological Sciences Curriculum Study (BSCS); the process is described. The production of this module was a collaborative effort between BSCS and the American Medical Association (AMA). Appendix A contains a copy of the module. Copies of reports sent to the Department of Energy (DOE) during the development process are contained in Appendix B; all reports should be on file at DOE. Appendix B also contains copies of status reports submitted to the BSCS Board of Directors.

  19. Physical mapping and complete nucleotide sequence of the denV gene of bacteriophage T4.

    PubMed Central

    Radany, E H; Naumovski, L; Love, J D; Gutekunst, K A; Hall, D H; Friedberg, E C

    1984-01-01

    Phage T4 deletion mutants that are folate analog resistant (far) and contain deletions in the region of the T4 genome near denV have been isolated previously. We showed that one of these mutants (T4farP12) expressed normal denV gene activity, whereas another mutant (T4farP13) was defective in the denV gene. The rII-distal (right) physical endpoints of these deletions defined the limits of the interval in which the rII-proximal (left) endpoint of the denV gene should be located. The deletion endpoints were identified by restriction and Southern hybridization analyses of phage derivatives containing deoxycytidine instead of hydroxymethyldeoxycytidine in their DNAs. The results of these analyses localized the rII-proximal (left) end of the denV gene to a region between 62.4 and 64.3 kilobases on the T4 physical map. denV+ phage resulted from marker rescue with two of five denV- alleles tested, using plasmids containing a 1.8-kilobase fragment from this region or a 179-base-pair terminal fragment derived from it. Sequencing of the 179-base-pair fragment from wild-type DNA showed a 130-base-pair open reading frame with its termination codon at the rII-proximal end. Confirmation that this open reading frame is part of the denV coding sequence was obtained by identifying a TAG amber codon in the homologous DNA derived from a denV amber mutant strain. This mutant strain rescued the denV+ allele from plasmids containing the wild-type sequence. An adjacent overlapping restriction fragment was also cloned, permitting determination of the remaining denV gene sequence. Based on these results, the 3' end of the coding region of the denV locus was mapped to kilobase position 64.07 on the T4 physical map, and the 5' end was mapped to position 64.48. PMID:6092716

  20. Small genomes: New initiatives in mapping and sequencing. Workshop summary report

    SciTech Connect

    McKenney, K.; Robb, F.

    1993-12-31

    The workshop was held 5--7 July 1993 at the Center for Advanced Research in Biotechnology (CARB) and hosted by the University of Maryland Biotechnology Institute (UMBI) and the National Institute of Standards and Technology (NIST). The objective of this workshop was to bring together individuals interested in DNA technologies and to determine the impact of these current and potential improvements of the speed and cost-effectiveness of mapping and sequencing on the planning of future small genome projects. A major goal of the workshop was to spur the collaboration of more diverse groups of scientists working on this topic, and to minimize competitiveness as an inhibitory factor to progress.

  1. Heterozygous mapping strategy (HetMapps) for high resolution genotyping-by-sequencing markers: a case study in grapevine

    USDA-ARS's Scientific Manuscript database

    Genotyping by sequencing (GBS) provides opportunities to generate high-resolution genetic maps at a low per-sample genotyping cost, but missing data and under-calling of heterozygotes complicate the creation of GBS linkage maps for highly heterozygous species. To overcome these issues, we developed ...

  2. Genetic linkage map of Chinese native variety faba bean (Vicia faba L.) based on simple sequence repeat markers

    USDA-ARS's Scientific Manuscript database

    Simple sequence repeat (SSR) marker is a powerful tool for construction of genetic linkage map which can be applied for locating quantitative trait loci (QTL) and marker-assisted selection (MAS). In this study, a genetic map of faba bean was constructed with SSR markers using a population of 129 F2 ...

  3. Genetic validation of whole-transcriptome sequencing for mapping expression affected by cis-regulatory variation

    PubMed Central

    2010-01-01

    Background Identifying associations between genotypes and gene expression levels using microarrays has enabled systematic interrogation of regulatory variation underlying complex phenotypes. This approach has vast potential for functional characterization of disease states, but its prohibitive cost, given hundreds to thousands of individual samples from populations have to be genotyped and expression profiled, has limited its widespread application. Results Here we demonstrate that genomic regions with allele-specific expression (ASE) detected by sequencing cDNA are highly enriched for cis-acting expression quantitative trait loci (cis-eQTL) identified by profiling of 500 animals in parallel, with up to 90% agreement on the allele that is preferentially expressed. We also observed widespread noncoding and antisense ASE and identified several allele-specific alternative splicing variants. Conclusion Monitoring ASE by sequencing cDNA from as little as one sample is a practical alternative to expression genetics for mapping cis-acting variation that regulates RNA transcription and processing. PMID:20707912
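    One simple way to flag allele-specific expression at a heterozygous site is to ask whether the cDNA read counts for the two alleles depart from the 50:50 split expected without allelic bias. The abstract does not state which statistic the authors used, so the exact two-sided binomial test below is an assumption chosen for illustration, and the function name and parameters are hypothetical.

```python
from math import comb


def binom_two_sided_p(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Exact two-sided binomial P value for the observed allele read counts.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null hypothesis of balanced expression (p = 0.5).
    """
    n = ref_reads + alt_reads
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[ref_reads]
    # Small tolerance so that ties are included despite floating-point error.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))
```

    For example, 10 reads from one allele and none from the other gives P = 2/1024, whereas a 5:5 split gives P = 1. In practice such a test would be applied per site with correction for multiple testing and for reference-mapping bias.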

  4. Variant mapping and mutation discovery in inbred mice using next-generation sequencing.

    PubMed

    Gallego-Llamas, Jabier; Timms, Andrew E; Geister, Krista A; Lindsay, Anna; Beier, David R

    2015-11-09

    The development of powerful new methods for DNA sequencing enables the discovery of sequence variants, their utilization for the mapping of mutant loci, and the identification of causal variants in a single step. We have applied this approach for the analysis of ENU-mutagenized mice maintained on an inbred background. We ascertained ENU-induced variants in four different phenotypically mutant lines. These were then used as informative markers for positional cloning of the mutated genes. We tested both whole genome (WGS) and whole exome (WES) datasets. Both approaches were successful as a means to localize a region of homozygosity, as well as identifying mutations of candidate genes, which could be individually assessed. As expected, the WGS strategy was more reliable, since many more ENU-induced variants were ascertained.

  5. Bayesian segmental models with multiple sequence alignment profiles for protein secondary structure and contact map prediction.

    PubMed

    Chu, Wei; Ghahramani, Zoubin; Podtelezhnikov, Alexei; Wild, David L

    2006-01-01

    In this paper, we develop a segmental semi-Markov model (SSMM) for protein secondary structure prediction which incorporates multiple sequence alignment profiles with the purpose of improving the predictive performance. The segmental model is a generalization of the hidden Markov model where a hidden state generates segments of various length and secondary structure type. A novel parameterized model is proposed for the likelihood function that explicitly represents multiple sequence alignment profiles to capture the segmental conformation. Numerical results on benchmark data sets show that incorporating the profiles results in substantial improvements and the generalization performance is promising. By incorporating the information from long range interactions in beta-sheets, this model is also capable of carrying out inference on contact maps. This is an important advantage of probabilistic generative models over the traditional discriminative approach to protein secondary structure prediction. The Web server of our algorithm and supplementary materials are available at http://public.kgi.edu/-wild/bsm.html.

  6. High Density Linkage Map Construction and Mapping of Yield Trait QTLs in Maize (Zea mays) Using the Genotyping-by-Sequencing (GBS) Technology

    PubMed Central

    Su, Chengfu; Wang, Wei; Gong, Shunliang; Zuo, Jinghui; Li, Shujiang; Xu, Shizhong

    2017-01-01

    Increasing grain yield is the ultimate goal for maize breeding. High resolution quantitative trait loci (QTL) mapping can help us understand the molecular basis of phenotypic variation of yield and thus facilitate marker assisted breeding. The aim of this study is to use genotyping-by-sequencing (GBS) for large-scale SNP discovery and simultaneous genotyping of all F2 individuals from a cross between two varieties of maize that are in clear contrast in yield and related traits. A set of 199 F2 progeny derived from the cross of varieties SG-5 and SG-7 were generated and genotyped by GBS. A total of 1,046,524,604 reads with an average of 5,258,918 reads per F2 individual were generated. This number of reads represents an approximately 0.36-fold coverage of the maize reference genome Zea_mays.AGPv3.29 for each F2 individual. A total of 68,882 raw SNPs were discovered in the F2 population, which, after stringent filtering, led to a total of 29,927 high quality SNPs. Comparative analysis using these physically mapped marker loci revealed a higher degree of synteny with the reference genome. The SNP genotype data were utilized to construct an intra-specific genetic linkage map of maize consisting of 3,305 bins on 10 linkage groups spanning 2,236.66 cM at an average distance of 0.68 cM between consecutive markers. From this map, we identified 28 QTLs associated with yield traits (100-kernel weight, ear length, ear diameter, cob diameter, kernel row number, corn grains per row, ear weight, and grain weight per plant) using the composite interval mapping (CIM) method and 29 QTLs using the least absolute shrinkage selection operator (LASSO) method. QTLs identified by the CIM method account for 6.4% to 19.7% of the phenotypic variation. 
    Small intervals of three QTLs (qCGR-1, qKW-2, and qGWP-4) contain several genes, including one gene (GRMZM2G139872) encoding the F-box protein, three genes (GRMZM2G180811, GRMZM5G828139, and GRMZM5G873194) encoding the WD40-repeat protein, and

  7. Including Faults Detected By Near-Surface Seismic Methods in the USGS National Seismic Hazard Maps - Some Restrictions Apply

    NASA Astrophysics Data System (ADS)

    Williams, R. A.; Haller, K. M.

    2014-12-01

    Every 6 years, the USGS updates the National Seismic Hazard Maps (new version released July 2014) that are intended to help society reduce risk from earthquakes. These maps affect hundreds of billions of dollars in construction costs each year as they are used to develop seismic-design criteria of buildings, bridges, highways, railroads, and provide data for risk assessment that help determine insurance rates. Seismic source characterization, an essential component of hazard model development, ranges from detailed trench excavations across faults at the ground surface to less detailed analysis of broad regions defined mainly on the basis of historical seismicity. Though it is a priority for the USGS to discover new Quaternary fault sources, the discovered faults only become a part of the hazard model if there are corresponding constraints on their geometry (length and depth extent) and slip-rate (or recurrence interval). When combined with fault geometry and slip-rate constraints, near-surface seismic studies that detect young (Quaternary) faults have become important parts of the hazard source model. Examples of seismic imaging studies with significant hazard impact include the Southern Whidbey Island fault, Washington; Santa Monica fault, San Andreas fault, and Palos Verdes fault zone, California; and Commerce fault, Missouri. There are many more faults in the hazard model in the western U.S. than in the expansive region east of the Rocky Mountains due to the higher rate of tectonic deformation, frequent surface-rupturing earthquakes and, in some cases, lower erosion rates. However, the recent increase in earthquakes in the central U.S. has revealed previously unknown faults for which we need additional constraints before we can include them in the seismic hazard maps. Some of these new faults may be opportunities for seismic imaging studies to provide basic data on location, dip, style of faulting, and recurrence.

  8. Delimitation of the Thoracosphaeraceae (Dinophyceae), including the calcareous dinoflagellates, based on large amounts of ribosomal RNA sequence data.

    PubMed

    Gottschling, Marc; Soehner, Sylvia; Zinssmeister, Carmen; John, Uwe; Plötner, Jörg; Schweikert, Michael; Aligizaki, Katerina; Elbrächter, Malte

    2012-01-01

    The phylogenetic relationships of the Dinophyceae (Alveolata) are not sufficiently resolved at present. The Thoracosphaeraceae (Peridiniales) are the only group of the Alveolata that include members with calcareous coccoid stages; this trait is considered apomorphic. Although the coccoid stage apparently is not calcareous, Bysmatrum has been assigned to the Thoracosphaeraceae based on thecal morphology. We tested the monophyly of the Thoracosphaeraceae using large sets of ribosomal RNA sequence data of the Alveolata including the Dinophyceae. Phylogenetic analyses were performed using Maximum Likelihood and Bayesian approaches. The Thoracosphaeraceae were monophyletic, but included also a number of non-calcareous dinophytes (such as Pentapharsodinium and Pfiesteria) and even parasites (such as Duboscquodinium and Tintinnophagus). Bysmatrum had an isolated and uncertain phylogenetic position outside the Thoracosphaeraceae. The phylogenetic relationships among calcareous dinophytes appear complex, and the assumption of the single origin of the potential to produce calcareous structures is challenged. The application of concatenated ribosomal RNA sequence data may prove promising for phylogenetic reconstructions of the Dinophyceae in future. Copyright © 2011 Elsevier GmbH. All rights reserved.

  9. Linking the human cytogenetic map with nucleotide sequence: the CCAP clone set.

    PubMed

    Jang, Wonhee; Yonescu, Raluca; Knutsen, Turid; Brown, Theresa; Reppert, Tricia; Sirotkin, Karl; Schuler, Gregory D; Ried, Thomas; Kirsch, Ilan R

    2006-07-15

    We present the completed dataset and clone repository of the Cancer Chromosome Aberration Project (CCAP), an initiative developed and funded through the intramural program of the U.S. National Cancer Institute, to provide seamless linkage of human cytogenetic markers with the primary nucleotide sequence of the human genome. Spaced at 1-2 Mb intervals across the human genome, 1,339 bacterial artificial chromosome (BAC) clones have been localized to chromosomal bands through high-resolution fluorescence in situ hybridization (FISH) mapping. Of these clones, 99.8% can be positioned on the primary human genome sequence and 95% are placed at or close to their precise nucleotide starts and stops. This dataset can be studied and manipulated within generally available public Web sites. The clones are available from a commercial repository. The CCAP BAC clone set provides anchors for the interrogation of gene and sequence involvement in oncogenic and developmental disorders when the starting point is the recognition of a structural, numerical, or interstitial chromosomal aberration. This dataset also provides a current view of the quality and coherence of the available genome sequence and insight into the nucleotide and three-dimensional structures that manifest as Giemsa light and dark chromosomal banding patterns.

  10. SBH and the integration of complementary approaches in the mapping, sequencing, and understanding of complex genomes

    SciTech Connect

    Drmanac, R.; Drmanac, S.; Labat, I.; Vicentic, A.; Gemmell, A.; Stavropoulos, N.; Jarvis, J.

    1992-12-01

    A variant of sequencing by hybridization (SBH) is being developed with a potential to inexpensively determine up to 100 million base pairs per year. The method comprises (1) arraying short clones in 864-well plates; (2) growth of the M13 clones or PCR of the inserts; (3) automated spotting of DNAs by corresponding pin-arrays; (4) hybridization of dotted samples with 200-3000 ³²P- or ³³P-labeled 6- to 8-mer probes; and (5) scoring hybridization signals using storage phosphor plates. Some 200 7- to 8-mers can provide an inventory of the genes if cDNA clones are hybridized, or can define the order of 2-kb genomic clones, creating physical and structural maps with 100-bp resolution; the distribution of G+C, LINEs, SINEs, and gene families would be revealed. cDNAs that represent new genes and genomic clones in regions of interest selected by SBH can be sequenced by a gel method. Uniformly distributed clones from the previous step will be hybridized with 2000-3000 6- to 8-mers. As a result, approximately 50-60% of the genomic regions containing members of large repetitive and gene families and those families represented in GenBank would be completely sequenced. In the less redundant regions, every base pair is expected to be read with 3-4 probes, but the complete sequence cannot be reconstructed. Such partial sequences allow the inference of similarity and the recognition of coding, regulatory, and repetitive sequences, as well as study of the evolutionary processes all the way up to the species delineation.
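
    The idea that each base in a clone is "read" by several short probes can be illustrated with a small sketch. This is not the authors' scoring procedure: it assumes exact-match hybridization on one strand only, and the clone sequence, probe set, and function names are hypothetical.

```python
def probe_hits(clone, probes):
    """Map each probe to the start positions where it matches the clone exactly."""
    hits = {}
    for probe in probes:
        positions = [
            i
            for i in range(len(clone) - len(probe) + 1)
            if clone.startswith(probe, i)
        ]
        if positions:
            hits[probe] = positions
    return hits


def per_base_coverage(clone, probes):
    """Count, for every base of the clone, how many probe matches overlap it."""
    coverage = [0] * len(clone)
    for probe, positions in probe_hits(clone, probes).items():
        for start in positions:
            for j in range(start, start + len(probe)):
                coverage[j] += 1
    return coverage


# Hypothetical 12-base clone and three 6-mer probes (assumed, not from the source).
clone = "ACGTACGTTGCA"
probes = ["ACGTAC", "GTACGT", "TTGCAA"]
print(probe_hits(clone, probes))
print(per_base_coverage(clone, probes))
```

    A real SBH experiment would also have to handle the reverse strand, mismatched hybridization, and noisy signal intensities, none of which this sketch models.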

  11. SBH and the integration of complementary approaches in the mapping, sequencing, and understanding of complex genomes

    SciTech Connect

    Drmanac, R.; Drmanac, S.; Labat, I.; Vicentic, A.; Gemmell, A.; Stavropoulos, N.; Jarvis, J.

    1992-01-01

    A variant of sequencing by hybridization (SBH) is being developed with a potential to inexpensively determine up to 100 million base pairs per year. The method comprises (1) arraying short clones in 864-well plates; (2) growth of the M13 clones or PCR of the inserts; (3) automated spotting of DNAs by corresponding pin-arrays; (4) hybridization of dotted samples with 200-3000 ³²P- or ³³P-labeled 6- to 8-mer probes; and (5) scoring hybridization signals using storage phosphor plates. Some 200 7- to 8-mers can provide an inventory of the genes if cDNA clones are hybridized, or can define the order of 2-kb genomic clones, creating physical and structural maps with 100-bp resolution; the distribution of G+C, LINEs, SINEs, and gene families would be revealed. cDNAs that represent new genes and genomic clones in regions of interest selected by SBH can be sequenced by a gel method. Uniformly distributed clones from the previous step will be hybridized with 2000-3000 6- to 8-mers. As a result, approximately 50-60% of the genomic regions containing members of large repetitive and gene families and those families represented in GenBank would be completely sequenced. In the less redundant regions, every base pair is expected to be read with 3-4 probes, but the complete sequence cannot be reconstructed. Such partial sequences allow the inference of similarity and the recognition of coding, regulatory, and repetitive sequences, as well as study of the evolutionary processes all the way up to the species delineation.

  12. rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison

    PubMed Central

    Hahn, Lars; Leimeister, Chris-André; Morgenstern, Burkhard

    2016-01-01

    Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don’t-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/ PMID:27760124
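
    To make the notion of a binary pattern with match and don't-care positions concrete, the following minimal sketch extracts spaced words and counts pattern-based matches between two sequences. It is only an illustration and not the rasbhari implementation; the pattern, the sequences, and the function names are assumptions chosen for the example.

```python
from collections import Counter


def spaced_words(seq, pattern):
    """Yield (start, word) pairs, keeping only bases at '1' (match) positions.

    Positions marked '0' in the pattern are don't-care positions and are skipped.
    """
    match_positions = [i for i, flag in enumerate(pattern) if flag == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield start, "".join(seq[start + i] for i in match_positions)


def count_spaced_matches(seq_a, seq_b, pattern):
    """Count pairs of windows (one per sequence) that agree at all match positions."""
    words_a = Counter(word for _, word in spaced_words(seq_a, pattern))
    words_b = Counter(word for _, word in spaced_words(seq_b, pattern))
    return sum(count * words_b[word] for word, count in words_a.items())


# The two hypothetical sequences differ only at index 2 (G versus T).
seq_a, seq_b = "ACGTAC", "ACTTAC"
print(count_spaced_matches(seq_a, seq_b, "1101"))  # spaced pattern
print(count_spaced_matches(seq_a, seq_b, "1111"))  # contiguous words
```

    With the spaced pattern, the window starting at index 0 still matches because the substitution falls on the don't-care position; with contiguous 4-mers, every window overlaps the substitution and no match survives.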

  13. OligoHeatMap (OHM): an online tool to estimate and display hybridizations of oligonucleotides onto DNA sequences.

    PubMed

    Croce, Olivier; Chevenet, François; Christen, Richard

    2008-07-01

    The efficiency of molecular methods involving DNA/DNA hybridizations depends on the accurate prediction of the melting temperature (T(m)) of the duplex. Many software packages are available for T(m) calculations, but difficulties arise when one wishes to check whether a given oligomer (PCR primer or probe) hybridizes well on more than a single sequence. Moreover, the presence of mismatches within the duplex is not sufficient to estimate specificity as it does not always significantly decrease the T(m). OHM (OligoHeatMap) is an online tool able to provide estimates of T(m) for a set of oligomers and a set of aligned sequences, not only as text files of complete results but also in a graphical way: T(m) values are translated into colors and displayed as a heat map image, either standalone or to be used by software such as TreeDyn to be included in a phylogenetic tree. OHM is freely available at http://bioinfo.unice.fr/ohm/, with links to the full source code and online help.

  14. Construction of a high-density genetic map for grape using next generation restriction-site associated DNA sequencing

    PubMed Central

    2012-01-01

    Background: Genetic mapping and QTL detection are powerful methodologies in plant improvement and breeding. Construction of a high-density and high-quality genetic map would be of great benefit in the production of superior grapes to meet human demand. High throughput and low cost of the recently developed next generation sequencing (NGS) technology have resulted in its wide application in genome research. Sequencing restriction-site associated DNA (RAD) might be an efficient strategy to simplify genotyping. Combining NGS with RAD has proven to be powerful for single nucleotide polymorphism (SNP) marker development. Results: An F1 population of 100 individual plants was developed. In-silico digestion-site prediction was used to select an appropriate restriction enzyme for construction of a RAD sequencing library. Next generation RAD sequencing was applied to genotype the F1 population and its parents. Applying a cluster strategy for SNP modulation, a total of 1,814 high-quality SNP markers were developed: 1,121 of these were mapped to the female genetic map, 759 to the male map, and 1,646 to the integrated map. A comparison of the genetic maps to the published Vitis vinifera genome revealed both conservation and variations. Conclusions: The applicability of next generation RAD sequencing for genotyping a grape F1 population was demonstrated, leading to the successful development of a genetic map with high density and quality using our designed SNP markers. Detailed analysis revealed that this newly developed genetic map can be used for a variety of genome investigations, such as QTL detection, sequence assembly and genome comparison. PMID:22908993

  15. A High-Density SNP Map of Sunflower Derived from RAD-Sequencing Facilitating Fine-Mapping of the Rust Resistance Gene R12

    PubMed Central

    Talukder, Zahirul I.; Gong, Li; Hulke, Brent S.; Pegadaraju, Venkatramana; Song, Qijian; Schultz, Quentin; Qi, Lili

    2014-01-01

    A high-resolution genetic map of sunflower was constructed by integrating SNP data from three F2 mapping populations (HA 89/RHA 464, B-line/RHA 464, and CR 29/RHA 468). The consensus map spanned a total length of 1443.84 cM, and consisted of 5,019 SNP markers derived from RAD tag sequencing and 118 publicly available SSR markers distributed in 17 linkage groups, corresponding to the haploid chromosome number of sunflower. The maximum interval between markers in the consensus map is 12.37 cM and the average distance is 0.28 cM between adjacent markers. Despite a few short-distance inversions in marker order, the consensus map showed high levels of collinearity among individual maps with an average Spearman's rank correlation coefficient of 0.972 across the genome. The order of the SSR markers on the consensus map was also in agreement with the order of the individual map and with previously published sunflower maps. Three individual and one consensus maps revealed the uneven distribution of markers across the genome. Additionally, we performed fine mapping and marker validation of the rust resistance gene R12, providing closely linked SNP markers for marker-assisted selection of this gene in sunflower breeding programs. This high resolution consensus map will serve as a valuable tool to the sunflower community for studying marker-trait association of important agronomic traits, marker assisted breeding, map-based gene cloning, and comparative mapping. PMID:25014030
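
    The reported average marker spacing can be checked from the figures given in the abstract. The sketch below is a hedged illustration: the abstract does not say whether the average was taken per marker or per inter-marker interval, so both conventions are computed, and the function name is hypothetical.

```python
def average_marker_spacing(map_length_cm, n_markers, n_linkage_groups=0):
    """Mean distance between adjacent markers, in cM.

    With n_linkage_groups=0 the map length is divided by the marker count;
    otherwise it is divided by the number of intervals (markers minus groups).
    """
    intervals = n_markers - n_linkage_groups
    return map_length_cm / intervals


# Figures taken from the abstract: 5,019 SNP + 118 SSR markers, 1443.84 cM, 17 groups.
n_markers = 5019 + 118
per_marker = average_marker_spacing(1443.84, n_markers)
per_interval = average_marker_spacing(1443.84, n_markers, n_linkage_groups=17)
print(f"{per_marker:.2f} cM per marker, {per_interval:.2f} cM per interval")
```

    Both conventions round to the 0.28 cM stated in the abstract, so the reported density is consistent with the marker counts and map length.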

  16. Progress towards the construction of a sequence-ready physical map of the 3AS chromosome arm of hexaploid wheat

    USDA-ARS's Scientific Manuscript database

    The large genome size (~17 Gb), polyploid nature, and repetitive sequence content (>90%) of hexaploid wheat (Triticum aestivum) present a challenge for constructing physical maps, which are fundamental resources to aid genomic sequencing and annotation, and gene cloning. One approach to reduce the c...

  17. Information on a Major New Initiative: Mapping and Sequencing the Human Genome (1986 DOE Memorandum)

    DOE R&D Accomplishments Database

    DeLisi, Charles (Associate Director, Health and Environmental Research, DOE Office of Energy Research)

    1986-05-06

    In the history of the Human Genome Program, Dr. Charles DeLisi and Dr. Alvin Trivelpiece of the Department of Energy (DOE) were instrumental in moving the seeds of the program forward. This May 1986 memo from DeLisi to Trivelpiece, Director of DOE's Office of Energy Research, documents this fact. Following the March 1986 Santa Fe workshop on the subject of mapping and sequencing the human genome, DeLisi's memo outlines workshop conclusions, explains the relevance of this project to DOE and the importance of the Department's laboratories and capabilities, notes the critical experience of DOE in managing projects of this scale and potential magnitude, and recognizes the fact that the project will impact biomedical science in ways which could not be fully anticipated at the time. Subsequently, program guidance was further sought from the DOE Health Effects Research Advisory Committee (HERAC) and the April 1987 HERAC report recommended that DOE and the nation commit to a large, multidisciplinary, scientific and technological undertaking to map and sequence the human genome.

  18. Information on a Major New Initiative: Mapping and Sequencing the Human Genome (1986 DOE Memorandum)

    SciTech Connect

    DeLisi, Charles

    1986-05-06

    In the history of the Human Genome Program, Dr. Charles DeLisi and Dr. Alvin Trivelpiece of the Department of Energy (DOE) were instrumental in moving the seeds of the program forward. This May 1986 memo from DeLisi to Trivelpiece, director of DOE's Office of Energy Research, documents this fact. Following the March 1986 Santa Fe workshop on the subject of mapping and sequencing the human genome, DeLisi's memo outlines workshop conclusions, explains the relevance of this project to DOE and the importance of the Department's laboratories and capabilities, notes the critical experience of DOE in managing projects of this scale and potential magnitude, and recognizes the fact that the project will impact biomedical science in ways which could not be fully anticipated at the time. Subsequently, program guidance was further sought from the DOE Health Effects Research Advisory Committee (HERAC) and the April 1987 HERAC report recommended that DOE and the nation commit to a large, multidisciplinary, scientific and technological undertaking to map and sequence the human genome.

  19. Ordered shotgun sequencing of a 135 kb Xq25 YAC containing ANT2 and four possible genes, including three confirmed by EST matches.

    PubMed Central

    Chen, C N; Su, Y; Baybayan, P; Siruno, A; Nagaraja, R; Mazzarella, R; Schlessinger, D; Chen, E

    1996-01-01

    Ordered shotgun sequencing (OSS) has been successfully carried out with an Xq25 YAC substrate. yWXD703 DNA was subcloned into lambda phage and sequences of insert ends of the lambda subclones were used to generate a map to select a minimum tiling path of clones to be completely sequenced. The sequence of 135 038 nt contains the entire ANT2 cDNA as well as four other candidates suggested by computer-assisted analyses. One of the putative genes is homologous to a gene implicated in Graves' disease and it, ANT2 and two others are confirmed by EST matches. The results suggest that OSS can be applied to YACs in accord with earlier simulations and further indicate that the sequence of the YAC accurately reflects the sequence of uncloned human DNA. PMID:8918809
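
    Selecting a minimum tiling path from a map of overlapping subclones can be sketched as a greedy interval cover. This is a generic textbook approach offered under stated assumptions, not the procedure the authors used; the clone names and coordinates (in kb) are hypothetical.

```python
def minimum_tiling_path(clones, target_start, target_end):
    """Pick the fewest clones whose intervals cover [target_start, target_end].

    clones is a list of (name, start, end) tuples. At each step the clone that
    starts at or before the covered position and reaches furthest is chosen.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered = target_start
    i = 0
    while covered < target_end:
        best = None
        # Consider every not-yet-scanned clone that overlaps the covered region.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"no clone extends coverage past position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical subclone map spanning a 135 kb insert (coordinates are assumed).
clones = [
    ("A", 0, 40), ("B", 10, 50), ("C", 30, 90),
    ("D", 45, 100), ("E", 85, 135), ("F", 95, 120),
]
print(minimum_tiling_path(clones, 0, 135))
```

    In practice the map is built from end sequences with uncertain overlaps, so a real selection would also need a minimum-overlap requirement between neighbouring clones.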

  20. Isolation and refined regional mapping of expressed sequences from human chromosome 21

    SciTech Connect

    Kao, F.T.; Yu, J.; Patterson, D.

    1994-10-01

    To increase candidate genes from human chromosome 21 for the analysis of Down syndrome and other genetic diseases localized on this chromosome, we have isolated and studied 9 cDNA clones encoded by chromosome 21. For isolating cDNAs, single-copy microclones from a chromosome 21 microdissection library were used in direct screening of various cDNA libraries. Seven of the cDNA clones have been regionally mapped on chromosome 21 using a comprehensive hybrid mapping panel comprising 24 cell hybrids that divide the chromosome into 33 subregions. These cDNA clones with refined mapping positions should be useful for identification and cloning of genes responsible for the specific component phenotypes of Down syndrome and other diseases on chromosome 21, including progressive myoclonus epilepsy in 21q22.3. 12 refs., 2 figs., 1 tab.

  1. A Whole-Genome DNA Marker Map for Cotton Based on the D-Genome Sequence of Gossypium raimondii L.

    PubMed Central

    Wang, Zining; Zhang, Dong; Wang, Xiyin; Tan, Xu; Guo, Hui; Paterson, Andrew H.

    2013-01-01

    We constructed a very-high-density, whole-genome marker map (WGMM) for cotton by using 18,597 DNA markers corresponding to 48,958 loci that were aligned to both a consensus genetic map and a reference genome sequence. The WGMM has a density of one locus per 15.6 kb, or an average of 1.3 loci per gene. The WGMM was anchored by the use of colinear markers to a detailed genetic map, providing recombinational information. Mapped markers occurred at relatively greater physical densities in distal chromosomal regions and lower physical densities in the central regions, with all 1 Mb bins having at least nine markers. Hotspots for quantitative trait loci and resistance gene analog clusters were aligned to the map and DNA markers identified for targeting of these regions of high practical importance. Based on the cotton D genome reference sequence, the locations of chromosome structural rearrangements plotted on the map facilitate its translation to other Gossypium genome types. The WGMM is a versatile genetic map for marker assisted breeding, fine mapping and cloning of genes and quantitative trait loci, developing new genetic markers and maps, genome-wide association mapping, and genome evolution studies. PMID:23979945

  2. Mapping autosomal recessive intellectual disability: combined microarray and exome sequencing identifies 26 novel candidate genes in 192 consanguineous families.

    PubMed

    Harripaul, R; Vasli, N; Mikhailov, A; Rafiq, M A; Mittal, K; Windpassinger, C; Sheikh, T I; Noor, A; Mahmood, H; Downey, S; Johnson, M; Vleuten, K; Bell, L; Ilyas, M; Khan, F S; Khan, V; Moradi, M; Ayaz, M; Naeem, F; Heidari, A; Ahmed, I; Ghadami, S; Agha, Z; Zeinali, S; Qamar, R; Mozhdehipanah, H; John, P; Mir, A; Ansar, M; French, L; Ayub, M; Vincent, J B

    2017-04-11

    Approximately 1% of the global population is affected by intellectual disability (ID), and the majority receive no molecular diagnosis. Previous studies have indicated high levels of genetic heterogeneity, with estimates of more than 2500 autosomal ID genes, the majority of which are autosomal recessive (AR). Here, we combined microarray genotyping, homozygosity-by-descent (HBD) mapping, copy number variation (CNV) analysis, and whole exome sequencing (WES) to identify disease genes/mutations in 192 multiplex Pakistani and Iranian consanguineous families with non-syndromic ID. We identified definite or candidate mutations (or CNVs) in 51% of families in 72 different genes, including 26 not previously reported for ARID. The new ARID genes include nine with loss-of-function mutations (ABI2, MAPK8, MPDZ, PIDD1, SLAIN1, TBC1D23, TRAPPC6B, UBA7 and USP44), and missense mutations include the first reports of variants in BDNF or TET1 associated with ID. The genes identified also showed overlap with de novo gene sets for other neuropsychiatric disorders. Transcriptional studies showed prominent expression in the prenatal brain. The high yield of AR mutations for ID indicated that this approach has excellent clinical potential and should inform clinical diagnostics, including clinical whole exome and genome sequencing, for populations in which consanguinity is common. As with other AR disorders, the relevance will also apply to outbred populations. Molecular Psychiatry advance online publication, 11 April 2017; doi:10.1038/mp.2017.60.

  3. High-density genetic map construction and gene mapping of pericarp color in wax gourd using specific-locus amplified fragment (SLAF) sequencing.

    PubMed

    Jiang, Biao; Liu, Wenrui; Xie, Dasen; Peng, Qingwu; He, Xiaoming; Lin, Yu'e; Liang, Zhaojun

    2015-12-09

    A high-density map is a valuable tool for genetic and genomic analysis. Although wax gourd is a widely distributed vegetable of Cucurbitaceae and has important medicinal and health value, no genetic map has been constructed because of the lack of efficient markers. Specific-locus amplified fragment sequencing (SLAF-seq) is a newly developed high-throughput strategy for large-scale single nucleotide polymorphism (SNP) discovery and genotyping. In our present study, we constructed a high-density genetic map by using SLAF-seq and identified a locus controlling pericarp color in wax gourd. An F2 population of 140 individuals and their two parents were subjected to SLAF-seq. A total of 143.38 M paired-end reads were generated. The average sequencing depth was 26.51 in the maternal line (B214), 27.01 in the paternal line (B227), and 5.11 in each F2 individual. After filtering low-depth SLAF tags, a total of 142,653 high-quality SLAFs were detected, and 22,151 of them were polymorphic, with a polymorphism rate of 15.42%. Finally, 4,607 of the polymorphic markers were selected for genetic map construction, and 12 linkage groups (LGs) were generated. The map spanned 2,172.86 cM with an average distance between adjacent markers of 0.49 cM. The inheritance of pericarp color was also studied, which showed that the pericarp color was controlled by a single gene. Based on the newly constructed high-density map, a single locus located on chromosome 5 was identified as controlling the pericarp color of wax gourd. This is the first report of high-density genetic map construction and gene mapping in wax gourd, which will serve as an invaluable tool for gene mapping, marker assisted breeding, map-based gene cloning, comparative mapping and draft genome assembly of wax gourd.

  4. Connectivity mapping using a combined gene signature from multiple colorectal cancer datasets identified candidate drugs including existing chemotherapies

    PubMed Central

    2015-01-01

    Background: While the discovery of new drugs is a complex, lengthy and costly process, identifying new uses for existing drugs is a cost-effective approach to therapeutic discovery. Connectivity mapping integrates gene expression profiling with advanced algorithms to connect genes, diseases and small molecule compounds and has been applied in a large number of studies to identify potential drugs, particularly to facilitate drug repurposing. Colorectal cancer (CRC) is a commonly diagnosed cancer with high mortality rates, presenting a worldwide health problem. With the advancement of high throughput omics technologies, a number of large scale gene expression profiling studies have been conducted on CRCs, providing multiple datasets in gene expression data repositories. In this work, we systematically apply gene expression connectivity mapping to multiple CRC datasets to identify candidate therapeutics to this disease. Results: We developed a robust method to compile a combined gene signature for colorectal cancer across multiple datasets. Connectivity mapping analysis with this signature of 148 genes identified 10 candidate compounds, including irinotecan and etoposide, which are chemotherapy drugs currently used to treat CRCs. These results indicate that we have discovered high quality connections between the CRC disease state and the candidate compounds, and that the gene signature we created may be used as a potential therapeutic target in treating the disease. The method we proposed is highly effective in generating quality gene signature through multiple datasets; the publication of the combined CRC gene signature and the list of candidate compounds from this work will benefit both cancer and systems biology research communities for further development and investigations. PMID:26356760

  5. High density linkage mapping of genomic and transcriptomic SNPs for synteny analysis and anchoring the genome sequence of chickpea

    PubMed Central

    Gaur, Rashmi; Jeena, Ganga; Shah, Niraj; Gupta, Shefali; Pradhan, Seema; Tyagi, Akhilesh K; Jain, Mukesh; Chattopadhyay, Debasis; Bhatia, Sabhyata

    2015-01-01

    This study presents genome-wide discovery of SNPs through next generation sequencing of the genome of Cicer reticulatum. Mapping of the C. reticulatum sequenced reads onto the draft genome assembly of C. arietinum (desi chickpea) resulted in identification of 842,104 genomic SNPs which were utilized along with an additional 36,446 genic SNPs identified from transcriptome sequences of the aforementioned varieties. Two new chickpea Oligo Pool All (OPAs) each having 3,072 SNPs were designed and utilized for SNP genotyping of 129 Recombinant Inbred Lines (RILs). Using Illumina GoldenGate technology, genotyping data for 5,041 SNPs were generated and combined with the 1,673 marker data from previously published studies to generate a high resolution linkage map. The map comprised 6,698 markers distributed on eight linkage groups spanning 1083.93 cM with an average inter-marker distance of 0.16 cM. Utility of the present map was demonstrated for improving the anchoring of the earlier reported draft genome sequence of desi chickpea by ~30% and that of kabuli chickpea by 18%. The genetic map reported in this study represents the densest linkage map of chickpea, with the potential to facilitate efficient anchoring of the draft genome sequences of desi as well as kabuli chickpea varieties. PMID:26303721

  6. High density linkage mapping of genomic and transcriptomic SNPs for synteny analysis and anchoring the genome sequence of chickpea.

    PubMed

    Gaur, Rashmi; Jeena, Ganga; Shah, Niraj; Gupta, Shefali; Pradhan, Seema; Tyagi, Akhilesh K; Jain, Mukesh; Chattopadhyay, Debasis; Bhatia, Sabhyata

    2015-08-25

    This study presents genome-wide discovery of SNPs through next generation sequencing of the genome of Cicer reticulatum. Mapping of the C. reticulatum sequenced reads onto the draft genome assembly of C. arietinum (desi chickpea) resulted in identification of 842,104 genomic SNPs which were utilized along with an additional 36,446 genic SNPs identified from transcriptome sequences of the aforementioned varieties. Two new chickpea Oligo Pool All (OPAs) each having 3,072 SNPs were designed and utilized for SNP genotyping of 129 Recombinant Inbred Lines (RILs). Using Illumina GoldenGate technology, genotyping data for 5,041 SNPs were generated and combined with the 1,673 marker data from previously published studies to generate a high resolution linkage map. The map comprised 6,698 markers distributed on eight linkage groups spanning 1083.93 cM with an average inter-marker distance of 0.16 cM. Utility of the present map was demonstrated for improving the anchoring of the earlier reported draft genome sequence of desi chickpea by ~30% and that of kabuli chickpea by 18%. The genetic map reported in this study represents the densest linkage map of chickpea, with the potential to facilitate efficient anchoring of the draft genome sequences of desi as well as kabuli chickpea varieties.

  7. HOXA10 and HOXA13 sequence variations in human female genital malformations including congenital absence of the uterus and vagina.

    PubMed

    Ekici, Arif B; Strissel, Pamela L; Oppelt, Patricia G; Renner, Stefan P; Brucker, Sara; Beckmann, Matthias W; Strick, Reiner

    2013-04-15

    Congenital genital malformations occurring in the female population are estimated to be 5 per 1000 and associate with infertility, abortion, stillbirth, preterm delivery and other organ abnormalities. Complete aplasia of the uterus, cervix and upper vagina (Mayer-Rokitansky-Küster-Hauser (MRKH) syndrome) has an incidence of 1 per 4000 female live births. The molecular etiology of congenital genital malformations including MRKH is unknown to date. The homeobox (HOX) genes HOXA10 and HOXA13 are involved in the development of human genitalia. In this investigation, HOXA10 and HOXA13 genes of 20 patients with the MRKH syndrome, 7 non-MRKH patients with genital malformations and 53 control women were sequenced to assess for DNA variations. A total of 14 DNA sequence variations (10 novel and 4 known) within exonic and untranslated regions were detected in HOXA10 and HOXA13 among our cohorts. Four HOXA10 and two HOXA13 DNA sequence variations were found solely in patients with genital malformations. In addition to mutations resulting in synonymous amino acid substitutions, in the HOXA10 gene a missense mutation was identified and predicted by computer analysis as probably damaging to protein function in two non-MRKH patients, one with a bicornuate and the other with a septate uterus. A novel exonic HOXA10 cytosine deletion was also identified in a non-MRKH patient with a septate uterus and renal malformations, resulting in a premature stop codon and loss of the homeodomain helix 3/4. This cytosine deletion and the missense mutation in HOXA10 were analysed by real time PCR and sequencing, respectively, in two additional larger cohorts of 103 patients with MRKH and 109 non-MRKH patients with genital malformations. No other patients were found with the cytosine deletion; however, one additional patient was identified with the missense mutation. Rare DNA sequence variations in the HOXA10 gene could contribute to the misdevelopment of female internal genitalia

  8. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes.

    PubMed

    Kalbfleisch, Ted; Heaton, Michael P

    2013-01-01

    Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these species have provided unique insights into mammalian gene function. However, the number of species with reference genomes is small compared to those needed for studying molecular evolutionary relationships in the tree of life. For example, among the even-toed ungulates there are approximately 300 species whose phylogenetic relationships have been calculated in the 10k trees project. Only six of these have reference genomes: cattle, swine, sheep, goat, water buffalo, and bison. Although reference sequences will eventually be developed for additional hoof stock, the resources in terms of time, money, infrastructure and expertise required to develop a quality reference genome may be unattainable for most species for at least another decade. In this work we mapped 35 Gb of next generation sequence data of a Katahdin sheep to its own species' reference genome (Ovis aries Oar3.1) and to that of a species that diverged 15 to 30 million years ago (Bos taurus UMD3.1). In total, 56% of reads covered 76% of UMD3.1 to an average depth of 6.8 reads per site, 83 million variants were identified, of which 78 million were homozygous and likely represent interspecies nucleotide differences. Excluding repeat regions and sex chromosomes, nearly 3.7 million heterozygous sites were identified in this animal vs. bovine UMD3.1, representing polymorphisms occurring in sheep. Of these, 41% could be readily mapped to orthologous positions in ovine Oar3.1 with 80% corroborated as heterozygous. These variant sites, identified via interspecies mapping, could be used for comparative genomics, disease association studies, and ultimately to understand mammalian gene

  9. Quantitative susceptibility mapping using principles of echo shifting with a train of observations sequence on 1.5T MRI.

    PubMed

    Kan, Hirohito; Arai, Nobuyuki; Kasai, Harumasa; Kunitomo, Hiroshi; Hirose, Yasujiro; Shibamoto, Yuta

    2017-10-01

    To evaluate the accuracy of susceptibility estimated from the principles of echo shifting with a train of observations (PRESTO) sequence using a 1.5T MRI system, we conducted experiments on the human brain using the PRESTO sequence and compared our results with the susceptibility obtained from a spoiled gradient-recalled echo (GRE) sequence with flow compensation using quantitative susceptibility mapping (QSM) reconstruction. Experiments on the human brain were conducted on 12 healthy volunteers (27 ± 4 years) using PRESTO and spoiled GRE sequences on a 1.5T scanner. The PRESTO sequence is an echo-shifted gradient echo sequence that allows high susceptibility sensitivity and rapid acquisition because of TE>TR compared with the spoiled GRE sequence. QSM analysis was performed on the obtained phase images using the iLSQR method. Estimated susceptibility maps were used for region of interest analyses and estimation of line profiles through iron-rich tissue and major vessels. Our results demonstrated that susceptibility maps were accurately estimated, without error, by QSM analysis of PRESTO and spoiled GRE sequences. Acquisition time in the PRESTO sequence was reduced by 43% compared with that in the spoiled GRE sequence. Differences did exist between susceptibility maps in PRESTO and spoiled GRE sequences for visualization and quantitative values of major blood vessels and the areas around them. CONCLUSION: The PRESTO sequence enables correct estimation of tissue susceptibility with rapid acquisition and may be useful for QSM analysis of clinical use of 1.5T scanners. Copyright © 2017 Elsevier Inc. All rights reserved.

  10. Unified tests for fine-scale mapping and identifying sparse high-dimensional sequence associations

    PubMed Central

    Cao, Shaolong; Qin, Huaizhen; Gossmann, Alexej; Deng, Hong-Wen; Wang, Yu-Ping

    2016-01-01

    Motivation: In searching for genetic variants for complex diseases with deep sequencing data, genomic marker sets of high-dimensional genotypic data and sparse functional variants are quite common. Existing sequence association tests are incapable of identifying such marker sets or individual causal loci, although they appeared powerful to identify small marker sets with dense functional variants. In sequence association studies of admixed individuals, cryptic relatedness and population structure are known to confound the association analyses. Method: We here propose a unified marker wise test (uFineMap) to accurately localize causal loci and a unified high-dimensional set based test (uHDSet) to identify high-dimensional sparse associations in deep sequencing genomic data of multi-ethnic individuals with random relatedness. These two novel tests are based on scaled sparse linear mixed regressions with Lp (0 < p < 1) norm regularization. They jointly adjust for cryptic relatedness, population structure and other confounders to prevent false discoveries and improve statistical power for identifying promising individual markers and marker sets that harbor functional genetic variants of a complex trait. Results: With large scale simulation data and real data analyses, the proposed tests appropriately controlled Type I error rates and appeared to be more powerful than several prominent methods. We illustrated their practical utilities by the applications to DNA sequence data of Framingham Heart Study for osteoporosis. The proposed tests identified 11 novel significant genes that were missed by the prominent famSKAT and GEMMA. In particular, four out of six most significant pathways identified by the uHDSet but missed by famSKAT have been reported to be related to BMD or osteoporosis in the literature. Availability and implementation: The computational toolkit is available for academic use: https://sites.google.com/site/shaolongscode/home/uhdset Contact: wyp

  11. Crop Type Mapping from a Sequence of Terrasar-X Images with Dynamic Conditional Random Fields

    NASA Astrophysics Data System (ADS)

    Kenduiywo, B. K.; Bargiel, D.; Soergel, U.

    2016-06-01

    Crop phenology is dynamic as it changes with times of the year. Such biophysical processes also look spectrally different to remote sensing satellites. Some crops may depict similar spectral properties if their phenologies coincide, but differ later when their phenologies diverge. Thus, conventional approaches that select only images from phenological stages where crops are distinguishable for classification have low discrimination. In contrast, stacking images within a cropping season limits discrimination to a single feature space that can suffer from overlapping classes. Since crop backscatter varies with time, it can aid discrimination. Therefore, our main objective is to develop a crop sequence classification method using multitemporal TerraSAR-X images. We adopt a first-order Markov assumption in an undirected temporal graph sequence. This property is exploited to implement Dynamic Conditional Random Fields (DCRFs). Our DCRFs model has a repeated structure of temporally connected Conditional Random Fields (CRFs). Each node in the sequence is connected to its predecessor via a conditional probability matrix. The matrix is computed using posterior class probabilities from the association potential. This way, there is a mutual temporal exchange of phenological information observed in TerraSAR-X images. When compared to independent epoch classification, the designed DCRF model improved crop discrimination at each epoch in the sequence. However, government, insurers, agricultural market traders and other stakeholders are interested in the quantity of a certain crop in a season. Therefore, we further develop a DCRF ensemble classifier. The ensemble produces an optimal crop map by maximizing over posterior class probabilities selected from the sequence based on maximum F1-score and weighted by correctness. Our ensemble technique is compared to the standard approach of stacking all images as bands for classification using Maximum Likelihood Classifier (MLC) and standard CRFs. It

  12. A high-density genetic map of cucumber derived from Specific Length Amplified Fragment sequencing (SLAF-seq)

    PubMed Central

    Xu, Xuewen; Xu, Ruixue; Zhu, Biyun; Yu, Ting; Qu, Wenqin; Lu, Lu; Xu, Qiang; Qi, Xiaohua; Chen, Xuehao

    2015-01-01

    A high-density genetic map provides an essential framework for accurate and efficient genome assembly and QTL fine mapping. Construction of high-density genetic maps appears more feasible since the advent of next-generation sequencing (NGS), which eases SNP discovery and high-throughput genotyping of large populations. In this research, a high-density genetic map of cucumber (Cucumis sativus L.) was successfully constructed across an F2 population by a recently developed Specific Length Amplified Fragment sequencing (SLAF-seq) method. In total, 18.69 GB of data containing 93,460,000 paired-end reads were obtained after preprocessing. The average sequencing depth was 44.92 in the D8 (female parent), 42.16 in the Jin5-508 (male parent), and 5.01 in each progeny. 79,092 high-quality SLAFs were detected, of which 6784 SLAFs were polymorphic, and 1892 of the polymorphic markers met the requirements for constructing the genetic map. The genetic map spanned 845.87 cM with an average genetic distance of 0.45 cM. It is a reliable linkage map for fine mapping and molecular breeding of cucumber for its high marker density and well-ordered markers. PMID:25610449
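
The reported average genetic distance can be checked against the other figures in this abstract. The following is a minimal sketch in Python, assuming the average is simply the total map length in centiMorgans divided by the number of mapped markers; the authors may have used a slightly different denominator (for example, the number of marker intervals per linkage group), and the variable names are illustrative only.

```python
# Figures taken from the abstract above.
map_length_centimorgans = 845.87  # total genetic length of the map
n_markers = 1892                  # polymorphic markers used to build the map

# Assumption: average spacing = total map length / number of markers.
avg_spacing = map_length_centimorgans / n_markers

print(f"average marker spacing: {avg_spacing:.2f} cM")
```

Under that assumption the result rounds to the 0.45 reported in the abstract.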

  13. A consensus linkage map for sugi (Cryptomeria japonica) from two pedigrees, based on microsatellites and expressed sequence tags.

    PubMed

    Tani, Naoki; Takahashi, Tomokazu; Iwata, Hiroyoshi; Mukai, Yuzuru; Ujino-Ihara, Tokuko; Matsumoto, Asako; Yoshimura, Kensuke; Yoshimaru, Hiroshi; Murai, Masafumi; Nagasaka, Kazutoshi; Tsumura, Yoshihiko

    2003-11-01

    A consensus map for sugi (Cryptomeria japonica) was constructed by integrating linkage data from two unrelated third-generation pedigrees, one derived from a full-sib cross and the other by self-pollination of F1 individuals. The progeny segregation data of the first pedigree were derived from cleaved amplified polymorphic sequences, microsatellites, restriction fragment length polymorphisms, and single nucleotide polymorphisms. The data of the second pedigree were derived from cleaved amplified polymorphic sequences, isozyme markers, morphological traits, random amplified polymorphic DNA markers, and restriction fragment length polymorphisms. Linkage analyses were done for the first pedigree with JoinMap 3.0, using its parameter set for progeny derived by cross-pollination, and for the second pedigree with the parameter set for progeny derived from selfing of F1 individuals. The 11 chromosomes of C. japonica are represented in the consensus map. A total of 438 markers were assigned to 11 large linkage groups, 1 small linkage group, and 1 nonintegrated linkage group from the second pedigree; their total length was 1372.2 cM. On average, the consensus map showed 1 marker every 3.0 cM. PCR-based codominant DNA markers such as cleaved amplified polymorphic sequences and microsatellite markers were distributed in all linkage groups and occupied about half of mapped loci. These markers are very useful for integration of different linkage maps, QTL mapping, and comparative mapping for evolutional study, especially for species with a large genome size such as conifers.

  14. SNP discovery and genetic mapping using genotyping by sequencing of whole genome genomic DNA from a pea RIL population.

    PubMed

    Boutet, Gilles; Alves Carvalho, Susete; Falque, Matthieu; Peterlongo, Pierre; Lhuillier, Emeline; Bouchez, Olivier; Lavaud, Clément; Pilet-Nayel, Marie-Laure; Rivière, Nathalie; Baranger, Alain

    2016-02-18

    Progress in genetics and breeding in pea still suffers from the limited availability of molecular resources. SNP markers that can be identified through affordable sequencing processes, without the need for prior genome reduction or a reference genome to assemble sequencing data would allow the discovery and genetic mapping of thousands of molecular markers. Such an approach could significantly speed up genetic studies and marker assisted breeding for non-model species. A total of 419,024 SNPs were discovered using HiSeq whole genome sequencing of four pea lines, followed by direct identification of SNP markers without assembly using the discoSnp tool. Subsequent filtering led to the identification of 131,850 highly designable SNPs, polymorphic between at least two of the four pea lines. A subset of 64,754 SNPs was called and genotyped by short read sequencing on a subpopulation of 48 RILs from the cross 'Baccara' x 'PI180693'. This data was used to construct a WGGBS-derived pea genetic map comprising 64,263 markers. This map is collinear with previous pea consensus maps and therefore with the Medicago truncatula genome. Sequencing of four additional pea lines showed that 33 % to 64 % of the mapped SNPs, depending on the pairs of lines considered, are polymorphic and can therefore be useful in other crosses. The subsequent genotyping of a subset of 1000 SNPs, chosen for their mapping positions using a KASP™ assay, showed that almost all generated SNPs are highly designable and that most (95 %) deliver highly qualitative genotyping results. Using rather low sequencing coverages in SNP discovery and in SNP inferring did not hinder the identification of hundreds of thousands of high quality SNPs. The development and optimization of appropriate tools in SNP discovery and genetic mapping have allowed us to make available a massive new genomic resource in pea. It will be useful for both fine mapping within chosen QTL confidence intervals and marker assisted breeding for

  15. A high-density gene map of loblolly pine (Pinus taeda L.) based on exome sequence capture genotyping.

    PubMed

    Neves, Leandro Gomide; Davis, John M; Barbazuk, William B; Kirst, Matias

    2014-01-10

    Loblolly pine (Pinus taeda L.) is an economically and ecologically important conifer for which a suite of genomic resources is being generated. Despite recent attempts to sequence the large genome of conifers, their assembly and the positioning of genes remains largely incomplete. The interspecific synteny in pines suggests that a gene-based map would be useful to support genome assemblies and analysis of conifers. To establish a reference gene-based genetic map, we performed exome sequencing of 14729 genes on a mapping population of 72 haploid samples, generating a resource of 7434 sequence variants segregating for 3787 genes. Most markers are single-nucleotide polymorphisms, although short insertions/deletions and multiple nucleotide polymorphisms also were used. Marker segregation in the population was used to generate a high-density, gene-based genetic map. A total of 2841 genes were mapped to pine's 12 linkage groups with an average of one marker every 0.58 cM. Capture data were used to detect gene presence/absence variations and position 65 genes on the map. We compared the marker order of genes previously mapped in loblolly pine and found high agreement. We estimated that 4123 genes had enough sequencing depth for reliable detection of markers, suggesting a high marker conversion rate of 92% (3787/4123). This is possible because a significant portion of the gene is captured and sequenced, increasing the chances of identifying a polymorphic site for characterization and mapping. This sub-centiMorgan genetic map provides a valuable resource for gene positioning on chromosomes and guide for the assembly of a reference pine genome.

  16. Addition of the microchromosome GGA25 to the chicken genome sequence assembly through radiation hybrid and genetic mapping

    PubMed Central

    Douaud, Marine; Fève, Katia; Gerus, Marie; Fillon, Valérie; Bardes, Suzanne; Gourichon, David; Dawson, Deborah A; Hanotte, Olivier; Burke, Terry; Vignoles, Florence; Morisson, Mireille; Tixier-Boichard, Michèle; Vignal, Alain; Pitel, Frédérique

    2008-01-01

    Background: The first draft chicken sequence assembly was published in 2004 and updated in 2006. However, this does not constitute a definitive and complete sequence of the chicken genome, since the microchromosomes are notably under-represented. In an effort to develop maps for the microchromosomes absent from the chicken genome assembly, we developed radiation hybrid (RH) and genetic maps with markers isolated from sequence currently assigned to "chromosome Unknown" (chrUn). The chrUn is composed of sequence contigs not assigned to named chromosomes. To identify and map sequence belonging to the microchromosomes we used a comparative mapping strategy, and we focused on the small linkage group E26C13. Results: In total, 139 markers were analysed with the chickRH6 panel, of which 120 were effectively assigned to the E26C13 linkage group, the remainder mapping elsewhere in the genome. The final RH map is composed of 22 framework markers extending over a 245.6 cR distance. A corresponding genetic map was developed, whose length is 103 cM in the East Lansing reference population. The E26C13 group was assigned to GGA25 (Gallus gallus chromosome 25) by FISH (fluorescence in situ hybridisation) mapping. Conclusion: The high-resolution RH framework map obtained here covers the entire chicken chromosome 25 and reveals the existence of a high number of intrachromosomal rearrangements when compared to the human genome. The strategy used here for the characterization of GGA25 could be used to improve knowledge on the other uncharacterized small, yet gene-rich microchromosomes. PMID:18366813

  17. Transposon Tc1-derived, sequence-tagged sites in Caenorhabditis elegans as markers for gene mapping

    PubMed Central

    Korswagen, Hendrik C.; Durbin, Richard M.; Smits, Miriam T.; Plasterk, Ronald H. A.

    1996-01-01

    We present an approach to map large numbers of Tc1 transposon insertions in the genome of Caenorhabditis elegans. Strains have been described that contain up to 500 polymorphic Tc1 insertions. From these we have cloned and shotgun sequenced over 2000 Tc1 flanks, resulting in an estimated set of 400 or more distinct Tc1 insertion alleles. Alignment of these sequences revealed a weak Tc1 insertion site consensus sequence that was symmetric around the invariant TA target site and reads CAYATATRTG. The Tc1 flanking sequences were compared with 40 Mbp of a C. elegans genome sequence. We found 151 insertions within the sequenced area, a density of ≈1 Tc1 insertion in every 265 kb. As the rest of the C. elegans genome sequence is obtained, remaining Tc1 alleles will fall into place. These mapped Tc1 insertions can serve two functions: (i) insertions in or near genes can be used to isolate deletion derivatives that have that gene mutated; and (ii) they represent a dense collection of polymorphic sequence-tagged sites. We demonstrate a strategy to use these Tc1 sequence-tagged sites in fine-mapping mutations. PMID:8962114
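
The consensus CAYATATRTG uses IUPAC ambiguity codes (Y = C or T, R = A or G), and both the symmetry claim and the density figure in this abstract can be checked directly. The sketch below is an illustration in Python, not the authors' analysis code; the helper names and the toy sequence are hypothetical.

```python
import re

# IUPAC codes appearing in the consensus, expanded to regex character classes.
IUPAC_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}
# Complement table extended to the two ambiguity codes used here.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "Y": "R", "R": "Y"}

CONSENSUS = "CAYATATRTG"  # from the abstract


def reverse_complement(seq):
    """Reverse-complement a sequence written with the codes in COMPLEMENT."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_sites(sequence, consensus=CONSENSUS):
    """Return 0-based start positions of consensus matches (overlaps allowed)."""
    body = "".join(IUPAC_REGEX[base] for base in consensus)
    # A lookahead lets overlapping matches be reported as well.
    return [m.start() for m in re.finditer(f"(?={body})", sequence)]


# A site that equals its own reverse complement is symmetric around its centre,
# which here is the invariant TA dinucleotide.
is_symmetric = reverse_complement(CONSENSUS) == CONSENSUS

# Density check: 151 insertions found in 40 Mbp of sequence.
kb_per_insertion = 40_000_000 / 151 / 1000

toy_sequence = "GGCACATATGTGTTCATATATATGAA"  # hypothetical, for illustration
print(is_symmetric, find_sites(toy_sequence), round(kb_per_insertion))
```

The density works out to roughly one insertion per 265 kb, in line with the figure quoted in the abstract.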

  18. A de novo next generation genomic sequence assembler based on string graph and MapReduce cloud computing framework

    PubMed Central

    2012-01-01

    Background: State-of-the-art high-throughput sequencers, e.g., the Illumina HiSeq series, generate sequencing reads that are longer than 150 bp up to a total of 600 Gbp of data per run. The high-throughput sequencers generate lengthier reads with greater sequencing depth than those generated by previous technologies. Two major challenges exist in using the high-throughput technology for de novo assembly of genomes. First, the amount of physical memory may be insufficient to store the data structure of the assembly algorithm, even for high-end multicore processors. Moreover, the graph-theoretical model used to capture intersection relationships of the reads may contain structural defects that are not well managed by existing assembly algorithms. Results: We developed a distributed genome assembler based on string graphs and MapReduce framework, known as the CloudBrush. The assembler includes a novel edge-adjustment algorithm to detect structural defects by examining the neighboring reads of a specific read for sequencing errors and adjusting the edges of the string graph, if necessary. CloudBrush is evaluated against GAGE benchmarks to compare its assembly quality with the other assemblers. The results show that our assemblies have a moderate N50, a low misassembly rate of misjoins, and indels of > 5 bp. In addition, we have introduced two measures, known as precision and recall, to address the issues of faithfully aligned contigs to target genomes. Compared with the assembly tools used in the GAGE benchmarks, CloudBrush is shown to produce contigs with high precision and recall. We also verified the effectiveness of the edge-adjustment algorithm using simulated datasets and ran CloudBrush on a nematode dataset using a commercial cloud. CloudBrush assembler is available at https://github.com/ice91/CloudBrush. PMID:23282094
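
The N50 statistic mentioned in this abstract has a simple standard definition: the contig length at which contigs of that length or longer account for at least half of the total assembled bases. The sketch below implements that generic definition in Python; it is not taken from CloudBrush or the GAGE scripts, which may apply corrections (for example, against a known genome size), and the contig lengths are made up.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths: total 290, so half is 145; 80 + 70 = 150 crosses it.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```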

  19. Genome-Wide Single-Nucleotide Polymorphisms Discovery and High-Density Genetic Map Construction in Cauliflower Using Specific-Locus Amplified Fragment Sequencing

    PubMed Central

    Zhao, Zhenqing; Gu, Honghui; Sheng, Xiaoguang; Yu, Huifang; Wang, Jiansheng; Huang, Long; Wang, Dan

    2016-01-01

    Molecular markers and genetic maps play an important role in plant genomics and breeding studies. Cauliflower is an important and distinctive vegetable; however, very few molecular resources have been reported for this species. In this study, a novel, specific-locus amplified fragment (SLAF) sequencing strategy was employed for large-scale single nucleotide polymorphism (SNP) discovery and high-density genetic map construction in a double-haploid, segregating population of cauliflower. A total of 12.47 Gb raw data containing 77.92 M pair-end reads were obtained after processing and 6815 polymorphic SLAFs between the two parents were detected. The average sequencing depths reached 52.66-fold for the female parent and 49.35-fold for the male parent. Subsequently, these polymorphic SLAFs were used to genotype the population and further filtered based on several criteria to construct a genetic linkage map of cauliflower. Finally, 1776 high-quality SLAF markers, including 2741 SNPs, constituted the linkage map with average data integrity of 95.68%. The final map spanned a total genetic length of 890.01 cM with an average marker interval of 0.50 cM, and covered 364.9 Mb of the reference genome. The markers and genetic map developed in this study could provide an important foundation not only for comparative genomics studies within Brassica oleracea species but also for quantitative trait loci identification and molecular breeding of cauliflower. PMID:27047515

  20. Genome-Wide Single-Nucleotide Polymorphisms Discovery and High-Density Genetic Map Construction in Cauliflower Using Specific-Locus Amplified Fragment Sequencing.

    PubMed

    Zhao, Zhenqing; Gu, Honghui; Sheng, Xiaoguang; Yu, Huifang; Wang, Jiansheng; Huang, Long; Wang, Dan

    2016-01-01

    Molecular markers and genetic maps play an important role in plant genomics and breeding studies. Cauliflower is an important and distinctive vegetable; however, very few molecular resources have been reported for this species. In this study, a novel, specific-locus amplified fragment (SLAF) sequencing strategy was employed for large-scale single nucleotide polymorphism (SNP) discovery and high-density genetic map construction in a double-haploid, segregating population of cauliflower. A total of 12.47 Gb raw data containing 77.92 M pair-end reads were obtained after processing and 6815 polymorphic SLAFs between the two parents were detected. The average sequencing depths reached 52.66-fold for the female parent and 49.35-fold for the male parent. Subsequently, these polymorphic SLAFs were used to genotype the population and further filtered based on several criteria to construct a genetic linkage map of cauliflower. Finally, 1776 high-quality SLAF markers, including 2741 SNPs, constituted the linkage map with average data integrity of 95.68%. The final map spanned a total genetic length of 890.01 cM with an average marker interval of 0.50 cM, and covered 364.9 Mb of the reference genome. The markers and genetic map developed in this study could provide an important foundation not only for comparative genomics studies within Brassica oleracea species but also for quantitative trait loci identification and molecular breeding of cauliflower.

  1. Processing of the precursor of protamine P2 in mouse. Peptide mapping and N-terminal sequence analysis of intermediates.

    PubMed Central

    Carré-Eusèbe, D; Lederer, F; Lê, K H; Elsevier, S M

    1991-01-01

    Protamine P2, the major basic chromosomal protein of mouse spermatozoa, is synthesized as a precursor almost twice as long as the mature protein, its extra length arising from an N-terminal extension of 44 amino acid residues. This precursor is integrated into chromatin of spermatids, and the extension is processed during chromatin condensation in the haploid cells. We have studied processing in the mouse and have identified two intermediates generated by proteolytic cleavage of the precursor. H.p.l.c. separated protamine P2 from four other spermatid proteins, including the precursor and three proteins known to possess physiological characteristics expected of processing intermediates. Peptide mapping indicated that all of these proteins were structurally similar. Two major proteins were further purified by PAGE, transferred to poly(vinylidene difluoride) membranes and submitted to automated N-terminal sequence analysis. Both sequences were found within the deduced sequence of the precursor extension. The N-terminus of the larger intermediate, PP2C, was Gly-12, whereas the N-terminus of the smaller, PP2D, was His-21. Both processing sites involved a peptide bond in which the carbonyl function was contributed by an acidic amino acid. PMID:1854346

  2. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

    PubMed

    McKenna, Aaron; Hanna, Matthew; Banks, Eric; Sivachenko, Andrey; Cibulskis, Kristian; Kernytsky, Andrew; Garimella, Kiran; Altshuler, David; Gabriel, Stacey; Daly, Mark; DePristo, Mark A

    2010-09-01

    Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS--the 1000 Genome pilot alone includes nearly five terabases--make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
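
To make the MapReduce idea concrete, a coverage calculator of the kind mentioned in this abstract can be phrased as a map step that emits one count per covered base and a reduce step that sums the counts per position. The sketch below is a toy Python illustration of that pattern only; GATK itself is a Java framework, none of the names below are GATK APIs, and the reads are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical aligned reads as (start, end) half-open intervals on one contig.
reads = [(0, 4), (2, 6), (3, 5)]


def map_read(read):
    """Map step: emit a (position, 1) pair for every base the read covers."""
    start, end = read
    return [(pos, 1) for pos in range(start, end)]


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each position."""
    depth = defaultdict(int)
    for pos, count in pairs:
        depth[pos] += count
    return dict(depth)


# Each read is mapped independently, so the map step could run in parallel.
coverage = reduce_counts(chain.from_iterable(map(map_read, reads)))
print(coverage)
```

Because the map step touches one read at a time and the reduce step only needs addition, the analysis logic stays separate from how the reads are sharded or traversed, which is the separation of concerns the abstract describes.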

  3. elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling

    PubMed Central

    Decap, Dries; Fostier, Jan; Reumers, Joke

    2015-01-01

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost. PMID:26182406
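
The single-pass idea can be illustrated with filtering steps alone: instead of traversing the data once per step, all step predicates are applied to each record during one traversal. The Python sketch below shows only that general pattern under simplified assumptions; it is not elPrep's implementation (steps such as sorting or duplicate marking need more than a per-record predicate), and the record layout and function names are hypothetical.

```python
# Hypothetical alignment records: (read name, mapping quality, is_duplicate flag).
records = [("r1", 60, False), ("r2", 5, False), ("r3", 40, True), ("r4", 30, False)]


def keep_mapq_at_least(threshold):
    """Build a filter that keeps records with mapping quality >= threshold."""
    return lambda rec: rec[1] >= threshold


def drop_duplicates(rec):
    """Filter that keeps only records not flagged as duplicates."""
    return not rec[2]


def single_pass(records, filters):
    """Apply every filter to each record during one traversal of the data."""
    return [rec for rec in records if all(f(rec) for f in filters)]


def multi_pass(records, filters):
    """Baseline: one full traversal and one intermediate list per filter."""
    for f in filters:
        records = [rec for rec in records if f(rec)]
    return records


filters = [keep_mapq_at_least(20), drop_duplicates]
kept = single_pass(records, filters)
print(kept)
```

Both functions return the same records; the difference is that the single-pass version never materialises intermediate results between steps, which is where repeated file I/O would occur in a tool-per-step pipeline.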

  4. Mapping vaccinia virus DNA replication origins at nucleotide level by deep sequencing.

    PubMed

    Senkevich, Tatiana G; Bruno, Daniel; Martens, Craig; Porcella, Stephen F; Wolf, Yuri I; Moss, Bernard

    2015-09-01

    Poxviruses reproduce in the host cytoplasm and encode most or all of the enzymes and factors needed for expression and synthesis of their double-stranded DNA genomes. Nevertheless, the mode of poxvirus DNA replication and the nature and location of the replication origins remain unknown. A current but unsubstantiated model posits only leading strand synthesis starting at a nick near one covalently closed end of the genome and continuing around the other end to generate a concatemer that is subsequently resolved into unit genomes. The existence of specific origins has been questioned because any plasmid can replicate in cells infected by vaccinia virus (VACV), the prototype poxvirus. We applied directional deep sequencing of short single-stranded DNA fragments enriched for RNA-primed nascent strands isolated from the cytoplasm of VACV-infected cells to pinpoint replication origins. The origins were identified as the switching points of the fragment directions, which correspond to the transition from continuous to discontinuous DNA synthesis. Origins containing a prominent initiation point mapped to a sequence within the hairpin loop at one end of the VACV genome and to the same sequence within the concatemeric junction of replication intermediates. These findings support a model for poxvirus genome replication that involves leading and lagging strand synthesis and is consistent with the requirements for primase and ligase activities as well as earlier electron microscopic and biochemical studies implicating a replication origin at the end of the VACV genome.
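
The notion of an origin as a "switching point of the fragment directions" can be sketched computationally: scan along the genome and report positions where the predominant strand of the sequenced fragments flips. The Python example below is only a toy illustration of that idea; the fixed-window binning, the simple majority rule, and the counts are assumptions made here and are not described in the abstract.

```python
def switching_points(windows):
    """Find windows where the predominant fragment direction changes.

    windows: list of (window_start, forward_count, reverse_count), sorted by
    position. Returns the starts of windows whose majority strand differs from
    that of the last window that had a clear majority.
    """
    points = []
    previous_sign = 0
    for start, forward, reverse in windows:
        # +1 if forward fragments dominate, -1 if reverse, 0 if balanced.
        sign = (forward > reverse) - (forward < reverse)
        if sign == 0:
            continue  # a balanced window carries no direction information
        if previous_sign != 0 and sign != previous_sign:
            points.append(start)
        previous_sign = sign
    return points


# Hypothetical per-window strand counts: reverse-dominated, then forward-dominated.
toy_windows = [(0, 2, 9), (100, 1, 8), (200, 7, 2), (300, 9, 1)]
print(switching_points(toy_windows))
```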

  5. Refined mapping of X-linked reticulate pigmentary disorder and sequencing of candidate genes

    PubMed Central

    2009-01-01

    X-linked reticulate pigmentary disorder with systemic manifestations in males (PDR) is very rare. Affected males are characterized by cutaneous and visceral symptoms suggestive of abnormally regulated inflammation. A genetic linkage study of a large Canadian kindred previously mapped the PDR gene to a greater than 40 Mb interval of Xp22–p21. The aim of this study was to identify the causative gene for PDR. The Canadian pedigree was expanded and additional PDR families recruited. Genetic linkage was performed using newer microsatellite markers. Positional and functional candidate genes were screened by PCR and sequencing of coding exons in affected males. The location of the PDR gene was narrowed to a ~4.9 Mb interval of Xp22.11–p21.3 between markers DXS1052 and DXS1061. All annotated coding exons within this interval were sequenced in one affected male from each of the three multiplex families as well as one singleton, but no causative mutation was identified. Sequencing of other X-linked genes outside of the linked interval also failed to identify the cause of PDR but revealed a novel nonsynonymous cSNP in the GRPR gene in the Maltese population. PDR is most likely due to a mutation within the linked interval not affecting currently annotated coding exons. PMID:18404279

  6. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    PubMed

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.

  7. Whole-Genome Restriction Mapping by "Subhaploid"-Based RAD Sequencing: An Efficient and Flexible Approach for Physical Mapping and Genome Scaffolding.

    PubMed

    Dou, Jinzhuang; Dou, Huaiqian; Mu, Chuang; Zhang, Lingling; Li, Yangping; Wang, Jia; Li, Tianqi; Li, Yuli; Hu, Xiaoli; Wang, Shi; Bao, Zhenmin

    2017-07-01

    Assembly of complex genomes using short reads remains a major challenge, which usually yields highly fragmented assemblies. Generation of ultradense linkage maps is promising for anchoring such assemblies, but traditional linkage mapping methods are hindered by the infrequency and unevenness of meiotic recombination that limit attainable map resolution. Here we develop a sequencing-based "in vitro" linkage mapping approach (called RadMap), where chromosome breakage and segregation are realized by generating hundreds of "subhaploid" fosmid/bacterial-artificial-chromosome clone pools, and by restriction site-associated DNA sequencing of these clone pools to produce an ultradense whole-genome restriction map to facilitate genome scaffolding. A bootstrap-based minimum spanning tree algorithm is developed for grouping and ordering of genome-wide markers and is implemented in a user-friendly, integrated software package (AMMO). We perform extensive analyses to validate the power and accuracy of our approach in the model plant Arabidopsis thaliana and human. We also demonstrate the utility of RadMap for enhancing the contiguity of a variety of whole-genome shotgun assemblies generated using either short Illumina reads (300 bp) or long PacBio reads (6-14 kb), with up to 15-fold improvement of N50 (∼816 kb-3.7 Mb) and high scaffolding accuracy (98.1-98.5%). RadMap outperforms BioNano and Hi-C when input assembly is highly fragmented (contig N50 = 54 kb). RadMap can capture wide-range contiguity information and provide an efficient and flexible tool for high-resolution physical mapping and scaffolding of highly fragmented assemblies. Copyright © 2017 Dou et al.
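
    The contiguity figures above are reported as N50 values. As a reminder of what that metric measures, here is a minimal sketch of the standard N50 definition: the length of the contig (or scaffold) at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name and the example lengths are illustrative only and are not taken from the RadMap study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # First length at which the cumulative sum covers half the assembly
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one non-zero length")


example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

    Because N50 depends only on the length distribution, a higher value indicates better contiguity but says nothing about whether the joins are correct, which is why scaffolding accuracy is reported separately.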

  8. Whole-Genome Restriction Mapping by “Subhaploid”-Based RAD Sequencing: An Efficient and Flexible Approach for Physical Mapping and Genome Scaffolding

    PubMed Central

    Dou, Jinzhuang; Dou, Huaiqian; Mu, Chuang; Zhang, Lingling; Li, Yangping; Wang, Jia; Li, Tianqi; Li, Yuli; Hu, Xiaoli; Wang, Shi; Bao, Zhenmin

    2017-01-01

    Assembly of complex genomes using short reads remains a major challenge, which usually yields highly fragmented assemblies. Generation of ultradense linkage maps is promising for anchoring such assemblies, but traditional linkage mapping methods are hindered by the infrequency and unevenness of meiotic recombination that limit attainable map resolution. Here we develop a sequencing-based “in vitro” linkage mapping approach (called RadMap), where chromosome breakage and segregation are realized by generating hundreds of “subhaploid” fosmid/bacterial-artificial-chromosome clone pools, and by restriction site-associated DNA sequencing of these clone pools to produce an ultradense whole-genome restriction map to facilitate genome scaffolding. A bootstrap-based minimum spanning tree algorithm is developed for grouping and ordering of genome-wide markers and is implemented in a user-friendly, integrated software package (AMMO). We perform extensive analyses to validate the power and accuracy of our approach in the model plant Arabidopsis thaliana and human. We also demonstrate the utility of RadMap for enhancing the contiguity of a variety of whole-genome shotgun assemblies generated using either short Illumina reads (300 bp) or long PacBio reads (6–14 kb), with up to 15-fold improvement of N50 (∼816 kb-3.7 Mb) and high scaffolding accuracy (98.1–98.5%). RadMap outperforms BioNano and Hi-C when input assembly is highly fragmented (contig N50 = 54 kb). RadMap can capture wide-range contiguity information and provide an efficient and flexible tool for high-resolution physical mapping and scaffolding of highly fragmented assemblies. PMID:28468906

  9. Successful cord blood transplantation for a CHARGE syndrome with CHD7 mutation showing DiGeorge sequence including hypoparathyroidism.

    PubMed

    Inoue, Hirosuke; Takada, Hidetoshi; Kusuda, Takeshi; Goto, Takako; Ochiai, Masayuki; Kinjo, Tadamune; Muneuchi, Jun; Takahata, Yasushi; Takahashi, Naomi; Morio, Tomohiro; Kosaki, Kenjiro; Hara, Toshiro

    2010-07-01

    It is rare that coloboma, heart anomalies, choanal atresia, retarded growth and development, and genital and ear anomalies (CHARGE) syndrome patients have DiGeorge sequence showing severe immunodeficiency due to the defect of the thymus. Although the only treatment to achieve immunological recovery for these patients in countries where thymic transplantation is not ethically approved would be hematopoietic cell transplantation, long-term survival has not been obtained in most patients. On the other hand, it is still not clarified whether hypoparathyroidism is one of the manifestations of CHARGE syndrome. We observed a CHARGE syndrome patient with chromodomain helicase DNA-binding protein 7 mutation showing DiGeorge sequence including the defect of T cells accompanied with the aplasia of the thymus, severe hypoparathyroidism, and conotruncal cardiac anomaly. He received unrelated cord blood transplantation without conditioning at 4 months of age. Recovery of T cell number and of proliferative response against mitogens was achieved by peripheral expansion of mature T cells in cord blood without thymic output. Although he is still suffering from severe hypoparathyroidism, he is alive without serious infections for 10 months.

  10. A High-Density SNP-Based Linkage Map of the Chicken Genome Reveals Sequence Features Correlated With Recombination Rate

    USDA-ARS's Scientific Manuscript database

    The resolution of the widely used chicken consensus linkage map was highly enlarged by genotyping a total of 12,945 SNPs on the three existing mapping populations in chicken; the Wageningen (WU), East Lansing (EL) and Uppsala (UPP) mapping populations. A total of 8608 SNPs could be included on the m...

  11. A girl with 15q overgrowth syndrome and dup(15)(q24q26.3) that included telomeric sequences.

    PubMed

    Gutiérrez-Franco, María de Los Angeles; Madariaga-Campos, María de la Luz; Vásquez-Velásquez, Ana I; Matute, Esmeralda; Guevara-Yáñez, Roberto; Rivera, Horacio

    2010-06-01

    Distal 15q trisomy or tetrasomy is associated with a characteristic phenotype that includes mild to moderate intellectual disability, abnormal behavior, speech impairment, overgrowth, hyperlaxity, long face, prominent nose, puffy cheeks, pointed chin, small ears, and hand anomalies (mainly arachno- and camptodactyly). We present the case of a 13-yr-old girl with the main clinical features of 15q overgrowth syndrome and a 46,XX,dup(15)(q24q26.3)[117]/46,XX[3].ish dup(15)(q24q26.3) (SNRPN+,PML+,subtel++,tel++) de novo karyotype. The findings in this case are consistent with those in the previous distal 15q trisomy cases that presented with overgrowth and mental retardation. Further, the rearranged chromosome had a double set of directly oriented telomeric and subtelomeric sequences.

  12. Microbial genome program report: Optical approaches for physical mapping and sequence assembly of the Deinococcus radiodurans chromosome

    SciTech Connect

    Schwartz, David C.

    1999-11-23

    Maps of genomic or cloned DNA are frequently constructed by analyzing the cleavage patterns produced by restriction enzymes. Restriction enzymes are remarkable reagents that faithfully cleave only at specific sequences of between 4 and 8 nucleotides, which vary according to the specific enzymes. Restriction enzymes are reliable, numerous, and easily obtainable, and presently there are approximately 250 different sequences represented among thousands of enzymes. Restriction maps characterize gene structure and even entire genomes. Furthermore, such maps provide a useful scaffold for the alignment and verification of sequence data. Restriction maps generated by computer and predicted from the sequence are aligned with the actual restriction map. Restriction enzyme action has traditionally been assayed by gel electrophoresis. This technique separates cleaved molecules on the basis of their mobilities under the influence of an applied electrical field, within a gel separation matrix (small fragments have a greater mobility than large ones). Although gel electrophoresis distinguishes different sized DNA fragments (known as a fingerprint), the original order of these fragments remains unknown. The subsequent task of determining the order of such fragments is a labor-intensive task, especially when making restriction maps of whole genomes, and therefore, despite its obvious utility to genome analysis, it is not widely used.
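
    The distinction between an ordered restriction map and a gel fingerprint can be illustrated with a minimal in silico digest. The sketch below assumes a linear molecule and a single enzyme; the function name, the toy sequence, and the use of the EcoRI site (GAATTC, cut after the first G) are illustrative choices and are not taken from the report.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths, in map order, for a linear sequence.

    site       -- recognition sequence, e.g. "GAATTC" for EcoRI
    cut_offset -- cut position within the site (1 for EcoRI: G^AATTC)
    """
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)  # allow overlapping sites
    bounds = [0] + cuts + [len(seq)]
    return [stop - start for start, stop in zip(bounds, bounds[1:])]


toy = "AAGAATTCTTTTGAATTCCC"                  # hypothetical 20-bp molecule
ordered_map = digest(toy, "GAATTC", 1)        # fragment order is preserved
fingerprint = sorted(ordered_map)             # a gel reports sizes only
```

    The ordered list corresponds to a restriction map predicted from sequence, while the sorted list mimics what electrophoresis yields: the same fragment sizes with their original order lost.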

  13. The Iccare web server: an attempt to merge sequence and mapping information for plant and animal species.

    PubMed

    Muller, Cédric; Denis, Mathieu; Gentzbittel, Laurent; Faraut, Thomas

    2004-07-01

    The Iccare web server, http://genopole.toulouse.inra.fr/bioinfo/Iccare, provides a simple yet efficient tool for crude EST (expressed sequence tag) annotation specifically dedicated to comparative mapping approaches. Iccare uses all the EST and mRNA sequences from public databases for an organism of interest (query species) and compares them to all the transcripts of one reference organism (Homo sapiens or Arabidopsis thaliana). The results are displayed according to the location of the genes on the chromosomes of the reference organism. Gene structure information and sequence similarities are combined in a graphical representation in order to pinpoint the nature of the transcript query sequence. The user can subsequently design primers or probes for the purpose of physical or genetic mapping. In addition to the query organisms already available in Iccare, users can perform a tailor-made search with their own sequences against the animal or plant reference organism genes.

  14. Application of the High Resolution Melting analysis for genetic mapping of Sequence Tagged Site markers in narrow-leafed lupin (Lupinus angustifolius L.).

    PubMed

    Kamel, Katarzyna A; Kroc, Magdalena; Święcicki, Wojciech

    2015-01-01

    Sequence tagged site (STS) markers are valuable tools for genetic and physical mapping that can be successfully used in comparative analyses among related species. Current challenges for molecular markers genotyping in plants include the lack of fast, sensitive and inexpensive methods suitable for sequence variant detection. In contrast, high resolution melting (HRM) is a simple and high-throughput assay, which has been widely applied in sequence polymorphism identification as well as in the studies of genetic variability and genotyping. The present study is the first attempt to use the HRM analysis to genotype STS markers in narrow-leafed lupin (Lupinus angustifolius L.). The sensitivity and utility of this method was confirmed by the sequence polymorphism detection based on melting curve profiles in the parental genotypes and progeny of the narrow-leafed lupin mapping population. Application of different approaches, including amplicon size and a simulated heterozygote analysis, has allowed for successful genetic mapping of 16 new STS markers in the narrow-leafed lupin genome.

  15. Drawing a high-resolution functional map of adeno-associated virus capsid by massively parallel sequencing.

    PubMed

    Adachi, Kei; Enoki, Tatsuji; Kawano, Yasuhiro; Veraz, Michael; Nakai, Hiroyuki

    2014-01-01

    Adeno-associated virus (AAV) capsid engineering is an emerging approach to advance gene therapy. However, a systematic analysis on how each capsid amino acid contributes to multiple functions remains challenging. Here we show proof-of-principle and successful application of a novel approach, termed AAV Barcode-Seq, that allows us to characterize phenotypes of hundreds of different AAV strains in a high-throughput manner and therefore overcomes technical difficulties in the systematic analysis. In this approach, we generate DNA barcode-tagged AAV libraries and determine a spectrum of phenotypes of each AAV strain by Illumina barcode sequencing. By applying this method to AAV capsid mutant libraries tagged with DNA barcodes, we can draw a high-resolution map of AAV capsid amino acids important for the structural integrity and functions including receptor binding, tropism, neutralization and blood clearance. Thus, Barcode-Seq provides a new tool to generate a valuable resource for virus and gene therapy research.

  16. Read-mapping using personalized diploid reference genome for RNA sequencing data reduced bias for detecting allele-specific expression

    PubMed Central

    Yuan, Shuai; Qin, Zhaohui

    2014-01-01

    Next generation sequencing (NGS) technologies have been applied extensively in many areas of genetics and genomics research. A fundamental problem when it comes to analyzing NGS data is mapping short sequencing reads back to the reference genome. Most existing software packages rely on a single uniform reference genome and do not automatically take genetic variants into consideration. On the other hand, large proportions of incorrectly mapped reads affect the correct interpretation of the NGS experimental results. As an example, Degner et al. showed that detecting allele-specific expression from RNA sequencing data was biased toward the reference allele. In this study, we developed a method that utilizes the parallel computing power of a DirectX 11 enabled graphics processing unit (GPU) to produce a personalized diploid reference genome based on all known genetic variants of that particular individual. We show that using such a personalized diploid reference genome can improve mapping accuracy and significantly reduce the bias toward the reference allele in allele-specific expression analysis. Our method can be applied to any individual that has genotype information obtained either from array-based genotyping or resequencing. Besides the reference genome, no additional changes to the alignment algorithm are needed for read mapping; therefore, one can utilize any of the existing read mapping tools and achieve improved read mapping results. C++ and GPU compute shader source code of the software program is available at: http://code.google.com/p/diploid-mapping/downloads/list. PMID:25621316
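
    The core idea of a personalized diploid reference can be sketched in a few lines. This is a deliberately simplified, CPU-only illustration and not the authors' GPU implementation: it handles single-nucleotide variants only, uses 0-based positions, and assumes the two alleles at each site are already assigned to haplotypes. The function name and the toy data are hypothetical.

```python
def diploid_reference(reference, genotypes):
    """Build two haplotype sequences from a reference and SNP genotypes.

    genotypes maps a 0-based position to a pair (allele_a, allele_b);
    positions not listed keep the reference base on both haplotypes.
    """
    hap_a, hap_b = list(reference), list(reference)
    for pos, (allele_a, allele_b) in genotypes.items():
        hap_a[pos] = allele_a
        hap_b[pos] = allele_b
    return "".join(hap_a), "".join(hap_b)


# Heterozygous C/T at position 1, homozygous alternate A/A at position 6
hap_a, hap_b = diploid_reference("ACGTACGT", {1: ("C", "T"), 6: ("A", "A")})
```

    Reads can then be aligned against both haplotype sequences with any existing mapper, so that a read carrying the non-reference allele is no longer penalized by a mismatch at that site.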

  17. Read-mapping using personalized diploid reference genome for RNA sequencing data reduced bias for detecting allele-specific expression.

    PubMed

    Yuan, Shuai; Qin, Zhaohui

    2012-10-01

    Next generation sequencing (NGS) technologies have been applied extensively in many areas of genetics and genomics research. A fundamental problem when it comes to analyzing NGS data is mapping short sequencing reads back to the reference genome. Most existing software packages rely on a single uniform reference genome and do not automatically take genetic variants into consideration. On the other hand, large proportions of incorrectly mapped reads affect the correct interpretation of the NGS experimental results. As an example, Degner et al. showed that detecting allele-specific expression from RNA sequencing data was biased toward the reference allele. In this study, we developed a method that utilizes the parallel computing power of a DirectX 11 enabled graphics processing unit (GPU) to produce a personalized diploid reference genome based on all known genetic variants of that particular individual. We show that using such a personalized diploid reference genome can improve mapping accuracy and significantly reduce the bias toward the reference allele in allele-specific expression analysis. Our method can be applied to any individual that has genotype information obtained either from array-based genotyping or resequencing. Besides the reference genome, no additional changes to the alignment algorithm are needed for read mapping; therefore, one can utilize any of the existing read mapping tools and achieve improved read mapping results. C++ and GPU compute shader source code of the software program is available at: http://code.google.com/p/diploid-mapping/downloads/list.

  18. Mapping specificity landscapes of RNA-protein interactions by high throughput sequencing.

    PubMed

    Jankowsky, Eckhard; Harris, Michael E

    2017-03-02

    To function in a biological setting, RNA binding proteins (RBPs) have to discriminate between alternative binding sites in RNAs. This discrimination can occur in the ground state of an RNA-protein binding reaction, in its transition state, or in both. The extent by which RBPs discriminate at these reaction states defines RBP specificity landscapes. Here, we describe the HiTS-Kin and HiTS-EQ techniques, which combine kinetic and equilibrium binding experiments with high throughput sequencing to quantitatively assess substrate discrimination for large numbers of substrate variants at ground and transition states of RNA-protein binding reactions. We discuss experimental design, practical considerations and data analysis and outline how a combination of HiTS-Kin and HiTS-EQ allows the mapping of RBP specificity landscapes.

  19. Mapping of attenuating sequences of an avirulent poliovirus type 2 strain.

    PubMed

    Moss, E G; O'Neill, R E; Racaniello, V R

    1989-05-01

    A mouse model for poliomyelitis was used to identify genomic sequences that attenuate neurovirulence of poliovirus strain P2/P712. This type 2 strain is avirulent in primates and mice yet grows as well as virulent strains in cell culture. The approach used was to exchange portions of the genome of the mouse-virulent P2/Lansing strain with the corresponding region from P2/P712 to identify sequences that could attenuate Lansing neurovirulence in mice. A full-length infectious cDNA of P2/P712 was assembled and used to construct recombinants between P2/P712 and P2/Lansing. The results of neurovirulence testing of 11 recombinants indicated that strong attenuating determinants are located in the 5' noncoding region of P2/P712 and a region encoding capsid protein VP1 and 2Apro, 2B, and part of 2C. An attenuating determinant was further localized to between nucleotides 456 and 628 of P2/P712. A third sequence from P2/P712, nucleotides 752 to 2268, encoding VP4, VP2, and part of VP3, was weakly attenuating. The sequence from nucleotide 4454, approximately halfway through the 2C-coding region, to the end of the P2/P712 genome did not contain attenuating determinants. Nucleotide sequence analysis revealed that P2/P712 differs from the type 2 Sabin vaccine strain by only 22 nucleotides. Six differences lead to amino acid changes in the coding region, and four differences are in the 5' noncoding region. These studies show that, like the type 1 and type 3 Sabin vaccine strains, the attenuated type 2 strain P712 contains multiple attenuating sequences, including strongly attenuating sequences in the 5' noncoding region of the genome.

  20. Mapping Reads on a Genomic Sequence: An Algorithmic Overview and a Practical Comparative Analysis

    PubMed Central

    Martin, Véronique; Zytnicki, Matthias; Fayolle, Julien; Loux, Valentin; Gibrat, Jean-François

    2012-01-01

    Mapping short reads against a reference genome is classically the first step of many next-generation sequencing data analyses, and it should be as accurate as possible. Because of the large number of reads to handle, numerous sophisticated algorithms have been developed in the last 3 years to tackle this problem. In this article, we first review the underlying algorithms used in most of the existing mapping tools, and then we compare the performance of nine of these tools on a well-controlled benchmark built for this purpose. We built a set of reads that exist in single or multiple copies in a reference genome and for which there is no mismatch, and a set of reads with three mismatches. We considered as reference genome both the human genome and a concatenation of all complete bacterial genomes. On each dataset, we quantified the capacity of the different tools to retrieve all the occurrences of the reads in the reference genome. Special attention was paid to reads uniquely reported and to reads with multiple hits. PMID:22506536
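
    A benchmark of this kind needs a ground truth: every position where a read occurs in the reference within a given number of mismatches. The following brute-force scan is a minimal sketch of how such a truth set could be computed for small inputs. It is not one of the indexed algorithms reviewed in the article, it considers substitutions only (no indels), and the function name and toy sequences are hypothetical.

```python
def find_occurrences(read, reference, max_mismatches):
    """Return (start, mismatches) for every window within max_mismatches."""
    hits = []
    n = len(read)
    for start in range(len(reference) - n + 1):
        mismatches = 0
        for a, b in zip(read, reference[start:start + n]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many substitutions, abandon this window
        else:
            # Loop finished without break: window is an acceptable hit
            hits.append((start, mismatches))
    return hits


exact = find_occurrences("ACGT", "ACGTACCT", max_mismatches=0)
fuzzy = find_occurrences("ACGT", "ACGTACCT", max_mismatches=1)
```

    Comparing a mapper's reported positions against such a list separates reads that were reported uniquely from reads with multiple valid hits, which is the distinction the benchmark emphasizes.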

  1. A simple sequence repeat- and single-nucleotide polymorphism-based genetic linkage map of the brown planthopper, Nilaparvata lugens.

    PubMed

    Jairin, Jirapong; Kobayashi, Tetsuya; Yamagata, Yoshiyuki; Sanada-Morimura, Sachiyo; Mori, Kazuki; Tashiro, Kosuke; Kuhara, Satoru; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Yamamoto, Kimiko; Matsumura, Masaya; Yasui, Hideshi

    2013-02-01

    In this study, we developed the first genetic linkage map for the major rice insect pest, the brown planthopper (BPH, Nilaparvata lugens). The linkage map was constructed by integrating linkage data from two backcross populations derived from three inbred BPH strains. The consensus map consists of 474 simple sequence repeats, 43 single-nucleotide polymorphisms, and 1 sequence-tagged site, for a total of 518 markers at 472 unique positions in 17 linkage groups. The linkage groups cover 1093.9 cM, with an average distance of 2.3 cM between loci. The average number of marker loci per linkage group was 27.8. The sex-linkage group was identified by exploiting X-linked and Y-specific markers. Our linkage map and the newly developed markers used to create it constitute an essential resource and a useful framework for future genetic analyses in BPH.

  2. Construction of a high-density genetic map and the X/Y sex-determining gene mapping in spinach based on large-scale markers developed by specific-locus amplified fragment sequencing (SLAF-seq).

    PubMed

    Qian, Wei; Fan, Guiyan; Liu, Dandan; Zhang, Helong; Wang, Xiaowu; Wu, Jian; Xu, Zhaosheng

    2017-04-04

    Cultivated spinach (Spinacia oleracea L.) is one of the most widely cultivated types of leafy vegetable in the world, and it has a high nutritional value. Spinach is also an ideal plant for investigating the mechanism of sex determination because it is a dioecious species with separate male and female plants. Some reports on the sex labeling and localization of spinach in the study of molecular markers have surfaced. However, there have only been two reports completed on the genetic map of spinach. The lack of rich and reliable molecular markers and the shortage of high-density linkage maps are important constraints in spinach research work. In this study, a high-density genetic map of spinach based on the Specific-locus Amplified Fragment Sequencing (SLAF-seq) technique was constructed; the sex-determining gene was also finely mapped. Through bio-information analysis, 50.75 Gb of data in total was obtained, including 207.58 million paired-end reads. Finally, 145,456 high-quality SLAF markers were obtained, with 27,800 polymorphic markers and 4080 SLAF markers were finally mapped onto the genetic map after linkage analysis. The map spanned 1,125.97 cM with an average distance of 0.31 cM between the adjacent marker loci. It was divided into 6 linkage groups corresponding to the number of spinach chromosomes. Besides, the combination of Bulked Segregation Analysis (BSA) with SLAF-seq technology (super-BSA) was employed to generate the linkage markers with the sex-determining gene. Combined with the high-density genetic map of spinach, the sex-determining gene X/Y was located at the position of the linkage group (LG) 4 (66.98 cM-69.72 cM and 75.48 cM-92.96 cM), which may be the ideal region for the sex-determining gene. A high-density genetic map of spinach based on the SLAF-seq technique was constructed with a backcross (BC1) population (which is the highest density genetic map of spinach reported at present). At the same time, the sex-determining gene X/Y was mapped

  3. Development of polymorphic expressed sequence tag-derived microsatellites for the extension of the genetic linkage map of the black tiger shrimp (Penaeus monodon).

    PubMed

    Maneeruttanarungroj, C; Pongsomboon, S; Wuthisuthimethavee, S; Klinbunga, S; Wilson, K J; Swan, J; Li, Y; Whan, V; Chu, K-H; Li, C P; Tong, J; Glenn, K; Rothschild, M; Jerry, D; Tassanakajon, A

    2006-08-01

    In this study, microsatellite markers were developed for the genetic linkage mapping and breeding program of the black tiger shrimp Penaeus monodon. A total of 997 unique microsatellite-containing expressed sequence tags (ESTs) were identified from 10 100 EST sequences in the P. monodon EST database. AT-rich microsatellite types were predominant in the EST sequences. Homology searching by the blastn and blastx programs revealed that these 997 ESTs represented 8.6% known gene products, 27.8% hypothetical proteins and 63.6% unknown gene products. Characterization of 50 markers on a panel of 35-48 unrelated shrimp indicated an average number of alleles of 12.6 and an average polymorphic information content of 0.723. These EST microsatellite markers along with 208 other markers (185 amplified fragment length polymorphisms, one exon-primed intron-crossing, six single strand conformation polymorphisms, one single nucleotide polymorphism, 13 non-EST-associated microsatellites and two EST-associated microsatellites) were analysed across the international P. monodon mapping family. A total of 144 new markers were added to the P. monodon maps, including 36 of the microsatellite-containing ESTs. The current P. monodon male and female linkage maps have 47 and 36 linkage groups respectively with coverage across half the P. monodon genome.
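
    The abstract reports an average polymorphic information content (PIC) of 0.723 but does not say how it was computed. A minimal sketch using the commonly cited definition (Botstein et al., 1980), PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, is shown below; that the study used exactly this formula is an assumption, and the allele frequencies are illustrative only.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphic information content from allele frequencies summing to 1."""
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction for matings that are uninformative despite heterozygosity
    cross_term = sum(2 * (p * p) * (q * q)
                     for p, q in combinations(allele_freqs, 2))
    return 1 - homozygosity - cross_term


two_allele_pic = pic([0.5, 0.5])   # upper bound for a biallelic marker
```

    A marker with two equally frequent alleles cannot exceed 0.375 under this definition, so an average of 0.723 is only reachable with many alleles per locus, which is consistent with the reported average of 12.6 alleles.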

  4. Mapping DNA methylation by transverse current sequencing: Reduction of noise from neighboring nucleotides

    NASA Astrophysics Data System (ADS)

    Alvarez, Jose; Massey, Steven; Kalitsov, Alan; Velev, Julian

    Nanopore sequencing via transverse current has emerged as a competitive candidate for mapping DNA methylation without needing bisulfite treatment, fluorescent tags, or PCR amplification. By eliminating the error-producing amplification step, long read lengths become feasible, which greatly simplifies the assembly process and reduces the time and the cost inherent in current technologies. However, due to the large error rates of nanopore sequencing, single base resolution has not been reached. A very important source of noise is the intrinsic structural noise in the electric signature of the nucleotide arising from the influence of neighboring nucleotides. In this work we perform calculations of the tunneling current through DNA molecules in nanopores using the non-equilibrium electron transport method within an effective multi-orbital tight-binding model derived from first-principles calculations. We develop a base-calling algorithm accounting for the correlations of the current through neighboring bases, which in principle can reduce the error rate below any desired precision. Using this method we show that we can clearly distinguish DNA methylation and other base modifications based on the reading of the tunneling current.

  5. Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus).

    PubMed

    Dong, Yang; Xie, Min; Jiang, Yu; Xiao, Nianqing; Du, Xiaoyong; Zhang, Wenguang; Tosser-Klopp, Gwenola; Wang, Jinhuan; Yang, Shuang; Liang, Jie; Chen, Wenbin; Chen, Jing; Zeng, Peng; Hou, Yong; Bian, Chao; Pan, Shengkai; Li, Yuxiang; Liu, Xin; Wang, Wenliang; Servin, Bertrand; Sayre, Brian; Zhu, Bin; Sweeney, Deacon; Moore, Rich; Nie, Wenhui; Shen, Yongyi; Zhao, Ruoping; Zhang, Guojie; Li, Jinquan; Faraut, Thomas; Womack, James; Zhang, Yaping; Kijas, James; Cockett, Noelle; Xu, Xun; Zhao, Shuhong; Wang, Jun; Wang, Wen

    2013-02-01

    We report the ∼2.66-Gb genome sequence of a female Yunnan black goat. The sequence was obtained by combining short-read sequencing data and optical mapping data from a high-throughput whole-genome mapping instrument. The whole-genome mapping data facilitated the assembly of super-scaffolds >5× longer by the N50 metric than scaffolds augmented by fosmid end sequencing (scaffold N50 = 3.06 Mb, super-scaffold N50 = 16.3 Mb). Super-scaffolds are anchored on chromosomes based on conserved synteny with cattle, and the assembly is well supported by two radiation hybrid maps of chromosome 1. We annotate 22,175 protein-coding genes, most of which were recovered in the RNA-seq data of ten tissues. Comparative transcriptomic analysis of the primary and secondary follicles of a cashmere goat reveal 51 genes that are differentially expressed between the two types of hair follicles. This study, whose results will facilitate goat genomics, shows that whole-genome mapping technology can be used for the de novo assembly of large genomes.

  6. The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats

    PubMed Central

    van der Weide, Robin H.; Simonis, Marieke; Hermsen, Roel; Toonen, Pim; Cuppen, Edwin; de Ligt, Joep

    2016-01-01

    Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts. PMID:27501045

  7. The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats.

    PubMed

    van der Weide, Robin H; Simonis, Marieke; Hermsen, Roel; Toonen, Pim; Cuppen, Edwin; de Ligt, Joep

    2016-01-01

    Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts.

  8. Integrated Georeferencing of Stereo Image Sequences Captured with a Stereovision Mobile Mapping System - Approaches and Practical Results

    NASA Astrophysics Data System (ADS)

    Eugster, H.; Huber, F.; Nebiker, S.; Gisi, A.

    2012-07-01

    Stereovision based mobile mapping systems enable the efficient capturing of directly georeferenced stereo pairs. With today's camera and onboard storage technologies imagery can be captured at high data rates resulting in dense stereo sequences. These georeferenced stereo sequences provide a highly detailed and accurate digital representation of the roadside environment which builds the foundation for a wide range of 3d mapping applications and image-based geo web-services. Georeferenced stereo images are ideally suited for the 3d mapping of street furniture and visible infrastructure objects, pavement inspection, asset management tasks or image based change detection. As in most mobile mapping systems, the georeferencing of the mapping sensors and observations - in our case of the imaging sensors - normally relies on direct georeferencing based on INS/GNSS navigation sensors. However, in urban canyons the achievable direct georeferencing accuracy of the dynamically captured stereo image sequences is often insufficient or at least degraded. Furthermore, many of the mentioned application scenarios require homogeneous georeferencing accuracy within a local reference frame over the entire mapping perimeter. To meet these demands, georeferencing approaches are presented and cost efficient workflows are discussed that allow validating and updating the INS/GNSS based trajectory with independently estimated positions in cases of prolonged GNSS signal outages in order to increase the georeferencing accuracy up to the project requirements.

  9. AlignerBoost: A Generalized Software Toolkit for Boosting Next-Gen Sequencing Mapping Accuracy Using a Bayesian-Based Mapping Quality Framework

    PubMed Central

    Zheng, Qi; Grice, Elizabeth A.

    2016-01-01

    Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost’s algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost. PMID:27706155
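
    The abstract above does not spell out AlignerBoost's model, so the following is only a generic sketch of what a Bayesian mapping quality can look like, not the toolkit's implementation. It assumes a uniform prior over candidate hits and takes a hypothetical list of per-hit log-likelihoods; the function name and the cap of 60 are illustrative choices.

```python
import math

def posterior_mapq(log_likelihoods, max_q=60):
    """Phred-scaled posterior that the best-scoring hit is the true origin.

    Assumes a uniform prior over all candidate hits (an illustrative
    assumption, not something stated in the abstract).
    """
    best = max(log_likelihoods)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(ll - best) for ll in log_likelihoods]
    p_best = 1.0 / sum(weights)      # the best hit has weight exp(0) = 1
    p_error = 1.0 - p_best
    if p_error <= 0.0:
        return max_q                 # unique hit: cap the quality
    return min(max_q, int(round(-10.0 * math.log10(p_error))))
```

    Under these assumptions a read with two equally likely hits receives a low quality (error probability 0.5), while a read with a single hit receives the cap.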

  10. AlignerBoost: A Generalized Software Toolkit for Boosting Next-Gen Sequencing Mapping Accuracy Using a Bayesian-Based Mapping Quality Framework.

    PubMed

    Zheng, Qi; Grice, Elizabeth A

    2016-10-01

    Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost's algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost.

  11. Whole-genome shotgun optical mapping of Rhodobacter sphaeroides strain 2.4.1 and its use for whole-genome shotgun sequence assembly

    SciTech Connect

    Shou, S.; Kvikstad, E.; Kile, A.; Severin, J.; Forrest, D.; Runnheim, R.; Churas, C.; Hickman, J. W.; Mackenzie, C.; Choudhary, M.; Donohue, T.; Kaplan, S.; Schwartz, D. C.

    2003-09-01

    Rhodobacter sphaeroides 2.4.1 is a facultative photoheterotrophic bacterium with tremendous metabolic diversity, which has significantly contributed to our understanding of the molecular genetics of photosynthesis, photoheterotrophy, nitrogen fixation, hydrogen metabolism, carbon dioxide fixation, taxis, and tetrapyrrole biosynthesis. To further understand this remarkable bacterium, and to accelerate an ongoing sequencing project, two whole-genome restriction maps (EcoRI and HindIII) of R. sphaeroides strain 2.4.1 were constructed using shotgun optical mapping. The approach directly mapped genomic DNA by the random mapping of single molecules. The two maps were used to facilitate sequence assembly by providing an optical scaffold for high-resolution alignment and verification of sequence contigs. Our results show that such maps facilitated the closure of sequence gaps by the early detection of nascent sequence contigs during the course of the whole-genome shotgun sequencing process.

  12. Genotyping by Sequencing Using Specific Allelic Capture to Build a High-Density Genetic Map of Durum Wheat

    PubMed Central

    Holtz, Yan; Ardisson, Morgane; Ranwez, Vincent; Besnard, Alban; Leroy, Philippe; Poux, Gérard; Roumet, Pierre; Viader, Véronique; Santoni, Sylvain; David, Jacques

    2016-01-01

    Targeted sequence capture is a promising technology which helps reduce costs for sequencing and genotyping numerous genomic regions in large sets of individuals. Bait sequences are designed to capture specific alleles previously discovered in parents or reference populations. We studied a set of 135 RILs originating from a cross between an emmer cultivar (Dic2) and a recent durum elite cultivar (Silur). Six thousand sequence baits were designed to target Dic2 vs. Silur polymorphisms discovered in a previous RNAseq study. These baits were exposed to genomic DNA of the RIL population. Eighty percent of the targeted SNPs were recovered, 65% of which were of high quality and coverage. The final high density genetic map consisted of more than 3,000 markers, whose genetic and physical mapping were consistent with those obtained with large arrays. PMID:27171472

  13. Using Growing Self-Organising Maps to Improve the Binning Process in Environmental Whole-Genome Shotgun Sequencing

    PubMed Central

    Chan, Chon-Kit Kenneth; Hsu, Arthur L.; Tang, Sen-Lin; Halgamuge, Saman K.

    2008-01-01

    Metagenomic projects using whole-genome shotgun (WGS) sequencing produce many unassembled DNA sequences and small contigs. The step of clustering these sequences, based on biological and molecular features, is called binning. A reported strategy for binning that combines oligonucleotide frequency and self-organising maps (SOM) shows high potential. We improve this strategy by identifying suitable training features, implementing a better clustering algorithm, and defining quantitative measures for assessing results. We investigated the suitability of each of di-, tri-, tetra-, and pentanucleotide frequencies. The results show that dinucleotide frequency is not a sufficiently strong signature for binning 10 kb long DNA sequences, compared to the other three. Furthermore, we observed that increased order of oligonucleotide frequency may deteriorate the assignment result in some cases, which indicates the possible existence of optimal species-specific oligonucleotide frequency. We replaced SOM with growing self-organising map (GSOM) where comparable results are obtained while gaining 7%–15% speed improvement. PMID:18288261
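
    The oligonucleotide frequency features mentioned above can be illustrated with a short sketch. It computes a normalised k-mer frequency vector for one sequence (k = 2 to 5 for di- to pentanucleotides). The function name is hypothetical, and details such as skipping windows with ambiguous bases and not merging reverse complements are assumptions rather than the authors' choices.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(seq, k):
    """Return the frequency of every A/C/G/T k-mer in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other symbols (e.g. N) are ignored in the total.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]
```

    A tetranucleotide vector has 4^4 = 256 entries; vectors like this could then serve as the training features for a SOM or GSOM.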

  14. Sequenced Alleles of the Caenorhabditis Elegans Sex-Determining Gene Her-1 Include a Novel Class of Conditional Promoter Mutations

    PubMed Central

    Perry, M. D.; Trent, C.; Robertson, B.; Chamblin, C.; Wood, W. B.

    1994-01-01

    In the control of Caenorhabditis elegans sex determination, the her-1 gene must normally be activated to allow male development of XO animals and deactivated to allow hermaphrodite development of XX animals. The gene is regulated at the transcriptional level and has two nested male-specific transcripts. The larger of these encodes a small, novel, cysteine-rich protein responsible for masculinizing activity. Of the 32 extant mutant alleles, 30 cause partial or complete loss of masculinizing function (lf), while 2 are gain-of-function (gf) alleles resulting in abnormal masculinization of XX animals. We have identified the DNA sequence changes in each of these 32 alleles. Most affect the protein coding functions of the gene, but six are in the promoter region, including the two gf mutations. These two mutations may define a binding site for negative regulators of her-1. Three of the four remaining promoter mutations are single base changes that cause, surprisingly, temperature-sensitive loss of her-1 function. Such conditional promoter mutations have previously not been found among either prokaryotic or eukaryotic mutants analyzed at the molecular level. PMID:7828816

  15. Application of a Bayesian non-linear model hybrid scheme to sequence data for genomic prediction and QTL mapping.

    PubMed

    Wang, Tingting; Chen, Yi-Ping Phoebe; MacLeod, Iona M; Pryce, Jennie E; Goddard, Michael E; Hayes, Ben J

    2017-08-15

    Using whole genome sequence data might improve genomic prediction accuracy, when compared with high-density SNP arrays, and could lead to identification of causal mutations affecting complex traits. For some traits, the most accurate genomic predictions are achieved with non-linear Bayesian methods. However, as the number of variants and the size of the reference population increase, the computational time required to implement these Bayesian methods (typically with Markov Chain Monte Carlo sampling) becomes unfeasibly long. Here, we applied a new method, HyB_BR (for Hybrid BayesR), which implements a mixture model of normal distributions and hybridizes an Expectation-Maximization (EM) algorithm followed by Markov Chain Monte Carlo (MCMC) sampling, to genomic prediction in a large dairy cattle population with imputed whole genome sequence data. The imputed whole genome sequence data included 994,019 variant genotypes of 16,214 Holstein and Jersey bulls and cows. Traits included fat yield, milk volume, protein kg, fat% and protein% in milk, as well as fertility and heat tolerance. HyB_BR achieved genomic prediction accuracies as high as the full MCMC implementation of BayesR, both for predicting a validation set of Holstein and Jersey bulls (multi-breed prediction) and a validation set of Australian Red bulls (across-breed prediction). HyB_BR had a tenfold reduction in compute time, compared with the MCMC implementation of BayesR (48 hours versus 594 hours). We also demonstrate that in many cases HyB_BR identified sequence variants with a high posterior probability of affecting the milk production or fertility traits that were similar to those identified in BayesR. For heat tolerance, both HyB_BR and BayesR found variants in or close to promising candidate genes associated with this trait and not detected by previous studies. The results demonstrate that HyB_BR is a feasible method for simultaneous genomic prediction and QTL mapping with whole genome sequence in

  16. Sequence characterization, in silico mapping and cytosine methylation analysis of markers linked to apospory in Paspalum notatum

    PubMed Central

    Podio, Maricel; Rodríguez, María P.; Felitti, Silvina; Stein, Juliana; Martínez, Eric J.; Siena, Lorena A.; Quarin, Camilo L.; Pessino, Silvina C.; Ortiz, Juan Pablo A.

    2012-01-01

    In previous studies we reported the identification of several AFLP, RAPD and RFLP molecular markers linked to apospory in Paspalum notatum. The objective of this work was to sequence these markers, obtain their flanking regions by chromosome walking and perform an in silico mapping analysis in rice and maize. The methylation status of two apospory-related sequences was also assessed using methylation-sensitive RFLP experiments. Fourteen molecular markers were analyzed and several protein-coding sequences were identified. Copy number estimates and RFLP linkage analysis showed that the sequence PnMAI3 displayed 2–4 copies per genome and linkage to apospory. Extension of this marker by chromosome walking revealed an additional protein-coding sequence mapping in silico in the apospory-syntenic regions of rice and maize. Approximately 5 kb corresponding to different markers were characterized through the global sequencing procedure. A more refined analysis based on sequence information indicated synteny with segments of chromosomes 2 and 12 of rice and chromosomes 3 and 5 of maize. Two loci associated with apomixis locus were tested in methylation-sensitive RFLP experiments using genomic DNA extracted from leaves. Although both target sequences were methylated no methylation polymorphisms associated with the mode of reproduction were detected. PMID:23271945

  17. Sequence characterization, in silico mapping and cytosine methylation analysis of markers linked to apospory in Paspalum notatum.

    PubMed

    Podio, Maricel; Rodríguez, María P; Felitti, Silvina; Stein, Juliana; Martínez, Eric J; Siena, Lorena A; Quarin, Camilo L; Pessino, Silvina C; Ortiz, Juan Pablo A

    2012-12-01

    In previous studies we reported the identification of several AFLP, RAPD and RFLP molecular markers linked to apospory in Paspalum notatum. The objective of this work was to sequence these markers, obtain their flanking regions by chromosome walking and perform an in silico mapping analysis in rice and maize. The methylation status of two apospory-related sequences was also assessed using methylation-sensitive RFLP experiments. Fourteen molecular markers were analyzed and several protein-coding sequences were identified. Copy number estimates and RFLP linkage analysis showed that the sequence PnMAI3 displayed 2-4 copies per genome and linkage to apospory. Extension of this marker by chromosome walking revealed an additional protein-coding sequence mapping in silico in the apospory-syntenic regions of rice and maize. Approximately 5 kb corresponding to different markers were characterized through the global sequencing procedure. A more refined analysis based on sequence information indicated synteny with segments of chromosomes 2 and 12 of rice and chromosomes 3 and 5 of maize. Two loci associated with apomixis locus were tested in methylation-sensitive RFLP experiments using genomic DNA extracted from leaves. Although both target sequences were methylated no methylation polymorphisms associated with the mode of reproduction were detected.

  18. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle.

    PubMed

    Daetwyler, Hans D; Capitan, Aurélien; Pausch, Hubert; Stothard, Paul; van Binsbergen, Rianne; Brøndum, Rasmus F; Liao, Xiaoping; Djari, Anis; Rodriguez, Sabrina C; Grohs, Cécile; Esquerré, Diane; Bouchez, Olivier; Rossignol, Marie-Noëlle; Klopp, Christophe; Rocha, Dominique; Fritz, Sébastien; Eggen, André; Bowman, Phil J; Coote, David; Chamberlain, Amanda J; Anderson, Charlotte; VanTassell, Curt P; Hulsegge, Ina; Goddard, Mike E; Guldbrandtsen, Bernt; Lund, Mogens S; Veerkamp, Roel F; Boichard, Didier A; Fries, Ruedi; Hayes, Ben J

    2014-08-01

    The 1000 bull genomes project supports the goal of accelerating the rates of genetic gain in domestic cattle while at the same time considering animal health and welfare by providing the annotated sequence variants and genotypes of key ancestor bulls. In the first phase of the 1000 bull genomes project, we sequenced the whole genomes of 234 cattle to an average of 8.3-fold coverage. This sequencing includes data for 129 individuals from the global Holstein-Friesian population, 43 individuals from the Fleckvieh breed and 15 individuals from the Jersey breed. We identified a total of 28.3 million variants, with an average of 1.44 heterozygous sites per kilobase for each individual. We demonstrate the use of this database in identifying a recessive mutation underlying embryonic death and a dominant mutation underlying lethal chondrodysplasia. We also performed genome-wide association studies for milk production and curly coat, using imputed sequence variants, and identified variants associated with these traits in cattle.

  19. Walking, cloning, and mapping with YACs in 3q27: Localization of five ESTs including three members of the cystatin gene family and identification of CpG islands

    SciTech Connect

    James, L.A.; Ogilvie, D.J.; Anand, R.

    1996-03-05

    Using yeast artificial chromosomes, we have generated a high-resolution physical map for 2.7 Mb of human chromosomal region 3q27. The YAC clones group into three contigs, one of which has also been linked to the CEPH YAC contig map of human chromosome 3. Fluorescence in situ hybridization has been used to order the contigs on the chromosome and to estimate the distance between them. Expressed sequence tags for five genes, including three members of the cystatin gene family and a gene thought to be involved in B-cell non-Hodgkin lymphoma, have been placed within the YAC contigs, and 12 putative CpG islands have been identified. These YACs provide a useful resource to complete the physical mapping of 3q27 and to begin identification and characterization of further genes that are located there. 27 refs., 1 fig., 1 tab.

  20. A nonhomogeneous hidden markov model for gene mapping based on next-generation sequencing data.

    PubMed

    Ghavidel, Fatemeh Zamanzad; Claesen, Jürgen; Burzykowski, Tomasz

    2015-02-01

    The analysis of polygenetic characteristics for mapping quantitative trait loci (QTL) remains an important challenge. QTL analysis requires two or more strains of organisms that differ substantially in the (poly-)genetic trait of interest, resulting in a heterozygous offspring. The offspring with the trait of interest is selected and subsequently screened for molecular markers such as single-nucleotide polymorphisms (SNPs) with next-generation sequencing. Gene mapping relies on the co-segregation between genes and/or markers. Genes and/or markers that are linked to a QTL influencing the trait will segregate more frequently with this locus. For each identified marker, observed mismatch frequencies between the reads of the offspring and the parental reference strains can be modeled by a multinomial distribution with the probabilities depending on the state of an underlying, unobserved Markov process. The states indicate whether the SNP is located in a (vicinity of a) QTL or not. Consequently, genomic loci associated with the QTL can be discovered by analyzing hidden states along the genome. The aforementioned hidden Markov model assumes that the identified SNPs are equally distributed along the chromosome and does not take the distance between neighboring SNPs into account. The distance between the neighboring SNPs could influence the chance of co-segregation between genes and markers. To address this issue, we propose a nonhomogeneous hidden Markov model with a transition matrix that depends on a set of distance-varying observed covariates. The application of the model is illustrated on the data from a study of ethanol tolerance in yeast.
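
    To make the idea of a distance-dependent transition matrix concrete, here is a minimal two-state sketch. The abstract does not give the functional form, so the Haldane-style switching probability, the `rate` parameter and the function name below are all assumptions chosen for illustration, not the authors' model.

```python
import math

def transition_matrix(distance_bp, rate=1e-5):
    """2x2 transition matrix between 'QTL' and 'non-QTL' states.

    The probability of switching state grows with the distance between
    neighbouring SNPs and saturates at 0.5 (assumed Haldane-like form).
    """
    p_switch = 0.5 * (1.0 - math.exp(-2.0 * rate * distance_bp))
    p_stay = 1.0 - p_switch
    return [[p_stay, p_switch],
            [p_switch, p_stay]]
```

    With this form, adjacent SNPs almost always share a hidden state, while SNPs that are far apart become nearly independent, which is the behaviour a homogeneous model cannot express.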

  1. Mapping the sex determination locus in the Atlantic halibut (Hippoglossus hippoglossus) using RAD sequencing

    PubMed Central

    2013-01-01

    Background: Atlantic halibut (Hippoglossus hippoglossus) is a high-value, niche market species for cold-water marine aquaculture. Production of monosex female stocks is desirable in commercial production since females grow faster and mature later than males. Understanding the sex determination mechanism and developing sex-associated markers will shorten the time for the development of monosex female production, thus decreasing the costs of farming. Results: Halibut juveniles were masculinised with 17 α-methyldihydrotestosterone (MDHT) and grown to maturity. Progeny groups from four treated males were reared and sexed. Two of these groups (n = 26 and 70) consisted of only females, while the other two (n = 30 and 71) contained balanced sex ratios (50% and 48% females respectively). DNA from parents and offspring from the two mixed-sex families were used as a template for Restriction-site Associated DNA (RAD) sequencing. The 648 million raw reads produced 90,105 unique RAD-tags. A linkage map was constructed based on 5703 Single Nucleotide Polymorphism (SNP) markers and 7 microsatellites consisting of 24 linkage groups, which corresponds to the number of chromosome pairs in this species. A major sex determining locus was mapped to linkage group 13 in both families. Assays for 10 SNPs with significant association with phenotypic sex were tested in both population data and in 3 additional families. Using a variety of machine-learning algorithms 97% correct classification could be obtained with the 3% of errors being phenotypic males predicted to be females. Conclusion: Altogether our findings support the hypothesis that the Atlantic halibut has an XX/XY sex determination system. Assays are described for sex-associated DNA markers developed from the RAD sequencing analysis to fast track progeny testing and implement monosex female halibut production for an immediate improvement in productivity. These should also help to speed up the inclusion of neomales derived

  2. MOLLI and AIR T1 mapping pulse sequences yield different myocardial T1 and ECV measurements.

    PubMed

    Hong, KyungPyo; Kim, Daniel

    2014-11-01

    Both post-contrast myocardial T1 and extracellular volume (ECV) have been reported to be associated with diffuse interstitial fibrosis. Recently, the cardiovascular magnetic resonance (CMR) field is recognizing that post-contrast myocardial T1 is sensitive to several confounders and migrating towards ECV as a measure of collagen volume fraction. Several recent studies using widely available Modified Look-Locker Inversion-recovery (MOLLI) have reported ECV cutoff values to distinguish between normal and diseased myocardium. It is unclear if these cutoff values are translatable to different T1 mapping pulse sequences such as arrhythmia-insensitive-rapid (AIR) cardiac T1 mapping, which was recently developed to rapidly image patients with cardiac rhythm disorders. We sought to evaluate, in well-controlled canine and pig experiments, the relative accuracy and precision, as well as intra- and inter-observer variability in data analysis, of ECV measured with AIR as compared with MOLLI. In 16 dogs, as expected, the mean T1 was significantly different (p < 0.001) between MOLLI (891 ± 373 ms) and AIR (1071 ± 503 ms), but, surprisingly, the mean ECV between MOLLI (21.8 ± 2.1%) and AIR (19.6 ± 2.4%) was also significantly different (p < 0.001). Both intra- and inter-observer agreements in T1 calculations were higher for MOLLI than AIR, but intra- and inter-observer agreements in ECV calculations were similar between MOLLI and AIR. In six pigs, the coefficient of repeatability (CR), as defined by the Bland-Altman analysis, in T1 calculation was considerably lower for MOLLI (32.5 ms) than AIR (82.3 ms), and the CR in ECV calculation was also lower for MOLLI (1.8%) than AIR (4.5%). In conclusion, this study shows that MOLLI and AIR yield significantly different T1 and ECV values in large animals and that MOLLI yields higher precision than AIR. Findings from this study suggest that CMR researchers must consider the specific pulse sequence when translating published ECV cutoff

  3. MOLLI and AIR T1 Mapping Pulse Sequences Yield Different Myocardial T1 and ECV Measurements

    PubMed Central

    Hong, KyungPyo; Kim, Daniel

    2014-01-01

    Both post-contrast myocardial T1 and extracellular volume (ECV) have been reported to be associated with diffuse interstitial fibrosis. Recently, the cardiovascular magnetic resonance (CMR) field is recognizing that post-contrast myocardial T1 is sensitive to several confounders and migrating towards ECV as a measure of collagen volume fraction. Several recent studies using widely available Modified Look-Locker Inversion-recovery (MOLLI) have reported ECV cutoff values to distinguish between normal and diseased myocardium. It is unclear if these cutoff values are translatable to different T1 mapping pulse sequences such as arrhythmia-insensitive-rapid (AIR) cardiac T1 mapping, which was recently developed to rapidly image patients with cardiac rhythm disorders. We sought to evaluate, in well-controlled canine and pig experiments, the relative accuracy and precision, as well as intra- and inter-observer variability in data analysis, of ECV measured with AIR as compared with MOLLI. In 16 dogs, as expected, mean T1 was significantly different (p < 0.001) between MOLLI (891±373 ms) and AIR (1071±503 ms), but, surprisingly, mean ECV between MOLLI (21.8±2.1%) and AIR (19.6±2.4%) was also significantly different (p < 0.001). Both intra- and inter-observer agreements in T1 calculations were higher for MOLLI than AIR, but intra- and inter-observer agreements in ECV calculations were similar between MOLLI and AIR. In 6 pigs, coefficient of repeatability (CR), as defined by Bland-Altman analysis, of T1 was considerably lower for MOLLI (32.5 ms) than AIR (82.3 ms), and CR of ECV was also lower for MOLLI (1.8%) than AIR (4.5%). In conclusion, this study shows that MOLLI and AIR yield significantly different T1 and ECV values in large animals and that MOLLI yields higher precision than AIR. Findings from this study suggest that CMR researchers must consider the specific pulse sequence when translating published ECV cutoff values into their own studies. PMID:25323070
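
    For readers unfamiliar with the quantities compared above, the sketch below shows the standard way ECV is derived from pre- and post-contrast T1 values and hematocrit, together with one common Bland-Altman definition of the coefficient of repeatability (1.96 times the standard deviation of paired differences). The abstract does not state the exact formulas used in the study, so both functions are assumptions, and the function names are illustrative.

```python
import statistics

def ecv(t1_myo_pre, t1_myo_post, t1_blood_pre, t1_blood_post, hematocrit):
    """Extracellular volume fraction from T1 values (same units) and hematocrit."""
    # Change in relaxation rate R1 = 1/T1 caused by the contrast agent.
    delta_r1_myo = 1.0 / t1_myo_post - 1.0 / t1_myo_pre
    delta_r1_blood = 1.0 / t1_blood_post - 1.0 / t1_blood_pre
    return (1.0 - hematocrit) * delta_r1_myo / delta_r1_blood

def coefficient_of_repeatability(first, second):
    """1.96 x SD of paired differences between two repeated measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    return 1.96 * statistics.stdev(diffs)
```

    Because ECV depends on ratios of T1-derived rates, a pulse sequence that biases T1 does not necessarily cancel out in ECV, which is consistent with the difference reported between MOLLI and AIR.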

  4. Measurement of Myocardial T1ρ with a Motion Corrected, Parametric Mapping Sequence in Humans

    PubMed Central

    Shahid, Mohammed; Han, Yuchi; Witschey, Walter R. T.

    2016-01-01

    Purpose: To develop a robust T1ρ magnetic resonance imaging (MRI) sequence for assessment of myocardial disease in humans. Materials and Methods: We developed a breath-held T1ρ mapping method using a single-shot, T1ρ-prepared balanced steady-state free-precession (bSSFP) sequence. The magnetization trajectory was simulated to identify sources of T1ρ error. To limit motion artifacts, an optical flow-based image registration method was used to align T1ρ images. The reproducibility and accuracy of these methods were assessed in phantoms and 10 healthy subjects. Results are shown in 1 patient with premature ventricular contractions (PVCs), 1 patient with chronic myocardial infarction (MI) and 2 patients with hypertrophic cardiomyopathy (HCM). Results: In phantoms, the mean bias was 1.0 ± 2.7 msec (100 msec phantom) and 0.9 ± 0.9 msec (60 msec phantom) at 60 bpm and 2.2 ± 3.2 msec (100 msec) and 1.4 ± 0.9 msec (60 msec) at 80 bpm. The coefficient of variation (COV) was 2.2 (100 msec) and 1.3 (60 msec) at 60 bpm and 2.6 (100 msec) and 1.4 (60 msec) at 80 bpm. Motion correction improved the alignment of T1ρ images in subjects, as determined by the increase in Dice Score Coefficient (DSC) from 0.76 to 0.88. T1ρ reproducibility was high (COV < 0.05, intra-class correlation coefficient (ICC) = 0.85–0.97). Mean myocardial T1ρ value in healthy subjects was 63.5 ± 4.6 msec. There was good correspondence between late-gadolinium enhanced (LGE) MRI and increased T1ρ relaxation times in patients. Conclusion: Single-shot, motion corrected, spin echo, spin lock MRI permits 2D T1ρ mapping in a breath-hold with good accuracy and precision. PMID:27003184
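
    The Dice score used above to quantify alignment can be sketched as follows. The function takes two binary masks flattened to equal-length sequences; the name and the convention of returning 1.0 for two empty masks are assumptions for illustration, and the study's own segmentation pipeline is not described in the abstract.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    a = {i for i, v in enumerate(mask_a) if v}
    b = {i for i, v in enumerate(mask_b) if v}
    if not a and not b:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))
```

    A value of 1.0 means the two myocardial masks coincide exactly, so the reported rise from 0.76 to 0.88 indicates closer overlap after motion correction.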

  5. Tracking the evolution of sex chromosome systems in Melanoplinae grasshoppers through chromosomal mapping of repetitive DNA sequences

    PubMed Central

    2013-01-01

    Background: The accumulation of repetitive DNA during sex chromosome differentiation is a common feature of many eukaryotes and becomes more evident after recombination has been restricted or abolished. The accumulated repetitive sequences include multigene families, microsatellites, satellite DNAs and mobile elements, all of which are important for the structural remodeling of heterochromatin. In grasshoppers, derived sex chromosome systems, such as neo-XY♂/XX♀ and neo-X1X2Y♂/X1X1X2X2♀, are frequently observed in the Melanoplinae subfamily. However, no studies concerning the evolution of sex chromosomes in Melanoplinae have addressed the role of the repetitive DNA sequences. To further investigate the evolution of sex chromosomes in grasshoppers, we used classical cytogenetic and FISH analyses to examine the repetitive DNA sequences in six phylogenetically related Melanoplinae species with X0♂/XX♀, neo-XY♂/XX♀ and neo-X1X2Y♂/X1X1X2X2♀ sex chromosome systems. Results: Our data indicate a non-spreading of heterochromatic blocks and pool of repetitive DNAs (C0t-1 DNA) in the sex chromosomes; however, the spreading of multigene families among the neo-sex chromosomes of Eurotettix and Dichromatos was remarkable, particularly for 5S rDNA. In autosomes, FISH mapping of multigene families revealed distinct patterns of chromosomal organization at the intra- and intergenomic levels. Conclusions: These results suggest a common origin and subsequent differential accumulation of repetitive DNAs in the sex chromosomes of Dichromatos and an independent origin of the sex chromosomes of the neo-XY and neo-X1X2Y systems. Our data indicate a possible role for repetitive DNAs in the diversification of sex chromosome systems in grasshoppers. PMID:23937327

  6. Tracking the evolution of sex chromosome systems in Melanoplinae grasshoppers through chromosomal mapping of repetitive DNA sequences.

    PubMed

    Palacios-Gimenez, Octavio M; Castillo, Elio R; Martí, Dardo A; Cabral-de-Mello, Diogo C

    2013-08-09

    The accumulation of repetitive DNA during sex chromosome differentiation is a common feature of many eukaryotes and becomes more evident after recombination has been restricted or abolished. The accumulated repetitive sequences include multigene families, microsatellites, satellite DNAs and mobile elements, all of which are important for the structural remodeling of heterochromatin. In grasshoppers, derived sex chromosome systems, such as neo-XY♂/XX♀ and neo-X1X2Y♂/X1X1X2X2♀, are frequently observed in the Melanoplinae subfamily. However, no studies concerning the evolution of sex chromosomes in Melanoplinae have addressed the role of the repetitive DNA sequences. To further investigate the evolution of sex chromosomes in grasshoppers, we used classical cytogenetic and FISH analyses to examine the repetitive DNA sequences in six phylogenetically related Melanoplinae species with X0♂/XX♀, neo-XY♂/XX♀ and neo-X1X2Y♂/X1X1X2X2♀ sex chromosome systems. Our data indicate a non-spreading of heterochromatic blocks and pool of repetitive DNAs (C0t-1 DNA) in the sex chromosomes; however, the spreading of multigene families among the neo-sex chromosomes of Eurotettix and Dichromatos was remarkable, particularly for 5S rDNA. In autosomes, FISH mapping of multigene families revealed distinct patterns of chromosomal organization at the intra- and intergenomic levels. These results suggest a common origin and subsequent differential accumulation of repetitive DNAs in the sex chromosomes of Dichromatos and an independent origin of the sex chromosomes of the neo-XY and neo-X1X2Y systems. Our data indicate a possible role for repetitive DNAs in the diversification of sex chromosome systems in grasshoppers.

  7. A Comprehensive Expressed Sequence Tag Linkage Map for Tiger Salamander and Mexican Axolotl: Enabling Gene Mapping and Comparative Genomics in Ambystoma

    PubMed Central

    Smith, J. J.; Kump, D. K.; Walker, J. A.; Parichy, D. M.; Voss, S. R.

    2005-01-01

    Expressed sequence tag (EST) markers were developed for Ambystoma tigrinum tigrinum (Eastern tiger salamander) and for A. mexicanum (Mexican axolotl) to generate the first comprehensive linkage map for these model amphibians. We identified 14 large linkage groups (125.5–836.7 cM) that presumably correspond to the 14 haploid chromosomes in the Ambystoma genome. The extent of genome coverage for these linkage groups is apparently high because the total map size (5251 cM) falls within the range of theoretical estimates and is consistent with independent empirical estimates. Unlike most vertebrate species, linkage map size in Ambystoma is not strongly correlated with chromosome arm number. Presumably, the large physical genome size (∼30 Gbp) is a major determinant of map size in Ambystoma. To demonstrate the utility of this resource, we mapped the position of two historically significant A. mexicanum mutants, white and melanoid, and also met, a quantitative trait locus (QTL) that contributes to variation in metamorphic timing. This new collection of EST-based PCR markers will better enable the Ambystoma system by facilitating development of new molecular probes, and the linkage map will allow comparative studies of this important vertebrate group. PMID:16079226

  8. A comprehensive expressed sequence tag linkage map for tiger salamander and Mexican axolotl: enabling gene mapping and comparative genomics in Ambystoma.

    PubMed

    Smith, J J; Kump, D K; Walker, J A; Parichy, D M; Voss, S R

    2005-11-01

    Expressed sequence tag (EST) markers were developed for Ambystoma tigrinum tigrinum (Eastern tiger salamander) and for A. mexicanum (Mexican axolotl) to generate the first comprehensive linkage map for these model amphibians. We identified 14 large linkage groups (125.5-836.7 cM) that presumably correspond to the 14 haploid chromosomes in the Ambystoma genome. The extent of genome coverage for these linkage groups is apparently high because the total map size (5251 cM) falls within the range of theoretical estimates and is consistent with independent empirical estimates. Unlike most vertebrate species, linkage map size in Ambystoma is not strongly correlated with chromosome arm number. Presumably, the large physical genome size (approximately 30 Gbp) is a major determinant of map size in Ambystoma. To demonstrate the utility of this resource, we mapped the position of two historically significant A. mexicanum mutants, white and melanoid, and also met, a quantitative trait locus (QTL) that contributes to variation in metamorphic timing. This new collection of EST-based PCR markers will better enable the Ambystoma system by facilitating development of new molecular probes, and the linkage map will allow comparative studies of this important vertebrate group.

  9. A high-density simple sequence repeat and single nucleotide polymorphism genetic map of the tetraploid cotton genome

    USDA-ARS's Scientific Manuscript database

    Cotton genome complexity was investigated with a saturated molecular genetic map that combined several sets of microsatellites or simple sequence repeats (SSR) and the first major public set of single nucleotide polymorphism (SNP) markers in cotton genomes (Gossypium spp.), and that was constructed ...

  10. An evaluation of genotyping by sequencing (GBS) to map the Breviaristatum-e (ari-e) locus in cultivated barley

    USDA-ARS's Scientific Manuscript database

    We explored the use of genotyping by sequencing (GBS) on a recombinant inbred line population (GPMx) derived from a cross between the two-rowed barley cultivar ‘Golden Promise’ (ari-e.GP/Vrs1) and the six-rowed cultivar ‘Morex’ (Ari-e/vrs1) to map plant height. We identified three Quantitative Trait...

  11. Construction of high resolution genetic linkage maps to improve the soybean genome sequence assembly Glyma1.01

    USDA-ARS's Scientific Manuscript database

    A landmark in soybean research, Glyma1.01, the first whole genome sequence of variety Williams 82 (Glycine max L. Merr.) was completed in 2010 and is widely used. However, because the assembly was primarily built based on the linkage maps constructed with a limited number of markers and recombinant...

  12. Novel repeated DNA sequences in safflower (Carthamus tinctorius L.) (Asteraceae): cloning, sequencing, and physical mapping by fluorescence in situ hybridization.

    PubMed

    Raina, S N; Sharma, S; Sasakuma, T; Kishii, M; Vaishnavi, S

    2005-01-01

    Two novel repetitive DNA sequences, pCtKpnI-1 and pCtKpnI-2, were isolated from Carthamus tinctorius (2n = 2x = 24) and cloned. Both represent tandemly repeated sequences. The pCtKpnI-1 and pCtKpnI-2 clones constitute repeat units of 343-345 bp and 367 bp, respectively, with 63% sequence heterogeneity between the two. Fluorescence in situ hybridization (FISH) was employed on metaphase chromosomes of C. tinctorius using, simultaneously, pCtKpnI-1 and pCtKpnI-2 repeated sequences. The pCtKpnI-1 sequence was found to be exclusively localized at subtelomeric regions on most of the chromosomes. On the other hand, sequence of the pCtKpnI-2 clone was distributed on two nucleolar and one nonnucleolar chromosome pairs. The satellite, and the intervening chromosome segment between the primary and secondary constrictions, in the two nucleolar chromosome pairs were wholly constituted by pCtKpnI-2 repeated sequence. The pCtKpnI-2 repeated sequence, showing partial homology to intergenic spacer (IGS) of 18S-25S ribosomal RNA genes of an Asteraceae taxon (Centaurea stoebe), and the 18S-25S rRNA gene clusters were located at independent, but juxtaposed sites in the nucleolar chromosomes. Variability in the number, size, and location of the two repeated sequences provided identification of most of the chromosomes in the otherwise not too distinctive homologues within the complement. This article reports the start of a molecular cytogenetics program targeting the genome of safflower, a major world oil crop about whose genetics very little is known.

  13. Using Genotyping by Sequencing to Map Two Novel Anthracnose Resistance Loci in Sorghum bicolor.

    PubMed

    Felderhoff, Terry J; McIntyre, Lauren M; Saballos, Ana; Vermerris, Wilfred

    2016-07-07

    Colletotrichum sublineola is an aggressive fungal pathogen that causes anthracnose in sorghum [Sorghum bicolor (L.) Moench]. The obvious symptoms of anthracnose are leaf blight and stem rot. Sorghum, the fifth most widely grown cereal crop in the world, can be highly susceptible to the disease, most notably in hot and humid environments. In the southeastern United States the acreage of sorghum has been increasing steadily in recent years, spurred by growing interest in producing biofuels, bio-based products, and animal feed. Resistance to anthracnose is, therefore, of paramount importance for successful sorghum production in this region. To identify anthracnose resistance loci present in the highly resistant cultivar 'Bk7', a biparental mapping population of F3:4 and F4:5 sorghum lines was generated by crossing 'Bk7' with the susceptible inbred 'Early Hegari-Sart'. Lines were phenotyped in three environments and in two different years following natural infection. The population was genotyped by sequencing. Following a stringent custom filtering protocol, totals of 5186 and 2759 informative SNP markers were identified in the two populations. Segregation data and association analysis identified resistance loci on chromosomes 7 and 9, with the resistance alleles derived from 'Bk7'. Both loci contain multiple classes of defense-related genes based on sequence similarity and gene ontologies. Genetic analysis following an independent selection experiment of lines derived from a cross between 'Bk7' and sweet sorghum 'Mer81-4' narrowed the resistance locus on chromosome 9 substantially, validating this QTL. As observed in other species, sorghum appears to have regions of clustered resistance genes. Further characterization of these regions will facilitate the development of novel germplasm with resistance to anthracnose and other diseases. Copyright © 2016 Felderhoff et al.

  14. Using Genotyping by Sequencing to Map Two Novel Anthracnose Resistance Loci in Sorghum bicolor

    PubMed Central

    Felderhoff, Terry J.; McIntyre, Lauren M.; Saballos, Ana; Vermerris, Wilfred

    2016-01-01

    Colletotrichum sublineola is an aggressive fungal pathogen that causes anthracnose in sorghum [Sorghum bicolor (L.) Moench]. The obvious symptoms of anthracnose are leaf blight and stem rot. Sorghum, the fifth most widely grown cereal crop in the world, can be highly susceptible to the disease, most notably in hot and humid environments. In the southeastern United States the acreage of sorghum has been increasing steadily in recent years, spurred by growing interest in producing biofuels, bio-based products, and animal feed. Resistance to anthracnose is, therefore, of paramount importance for successful sorghum production in this region. To identify anthracnose resistance loci present in the highly resistant cultivar ‘Bk7’, a biparental mapping population of F3:4 and F4:5 sorghum lines was generated by crossing ‘Bk7’ with the susceptible inbred ‘Early Hegari-Sart’. Lines were phenotyped in three environments and in two different years following natural infection. The population was genotyped by sequencing. Following a stringent custom filtering protocol, totals of 5186 and 2759 informative SNP markers were identified in the two populations. Segregation data and association analysis identified resistance loci on chromosomes 7 and 9, with the resistance alleles derived from ‘Bk7’. Both loci contain multiple classes of defense-related genes based on sequence similarity and gene ontologies. Genetic analysis following an independent selection experiment of lines derived from a cross between ‘Bk7’ and sweet sorghum ‘Mer81-4’ narrowed the resistance locus on chromosome 9 substantially, validating this QTL. As observed in other species, sorghum appears to have regions of clustered resistance genes. Further characterization of these regions will facilitate the development of novel germplasm with resistance to anthracnose and other diseases. PMID:27194807

  15. Using genotyping by sequencing to map two novel anthracnose resistance Loci in Sorghum bicolor

    DOE PAGES

    Felderhoff, Terry J.; McIntyre, Lauren M.; Saballos, Ana; ...

    2016-05-18

    Colletotrichum sublineola is an aggressive fungal pathogen that causes anthracnose in sorghum [Sorghum bicolor (L.) Moench]. The obvious symptoms of anthracnose are leaf blight and stem rot. Sorghum, the fifth most widely grown cereal crop in the world, can be highly susceptible to the disease, most notably in hot and humid environments. In the southeastern United States the acreage of sorghum has been increasing steadily in recent years, spurred by growing interest in producing biofuels, bio-based products, and animal feed. Resistance to anthracnose is, therefore, of paramount importance for successful sorghum production in this region. To identify anthracnose resistance loci present in the highly resistant cultivar ‘Bk7’, a biparental mapping population of F3:4 and F4:5 sorghum lines was generated by crossing ‘Bk7’ with the susceptible inbred ‘Early Hegari-Sart’. Lines were phenotyped in three environments and in two different years following natural infection. The population was genotyped by sequencing. Following a stringent custom filtering protocol, totals of 5186 and 2759 informative SNP markers were identified in the two populations. Segregation data and association analysis identified resistance loci on chromosomes 7 and 9, with the resistance alleles derived from ‘Bk7’. Both loci contain multiple classes of defense-related genes based on sequence similarity and gene ontologies. In addition, genetic analysis following an independent selection experiment of lines derived from a cross between ‘Bk7’ and sweet sorghum ‘Mer81-4’ narrowed the resistance locus on chromosome 9 substantially, validating this QTL. As observed in other species, sorghum appears to have regions of clustered resistance genes. Further characterization of these regions will facilitate the development of novel germplasm with resistance to anthracnose and other diseases.

  16. Construction of a linkage map based on retrotransposon insertion polymorphisms in sweetpotato via high-throughput sequencing.

    PubMed

    Monden, Yuki; Hara, Takuya; Okada, Yoshihiro; Jahana, Osamu; Kobayashi, Akira; Tabuchi, Hiroaki; Onaga, Shoko; Tahara, Makoto

    2015-03-01

    Sweetpotato (Ipomoea batatas L.) is an outcrossing hexaploid species with a large number of chromosomes (2n = 6x = 90). Although sweetpotato is one of the world's most important crops, genetic analysis of the species has been hindered by its genetic complexity combined with the lack of a whole genome sequence. In the present study, we constructed a genetic linkage map based on retrotransposon insertion polymorphisms using a mapping population derived from a cross between 'Purple Sweet Lord' (PSL) and '90IDN-47' cultivars. High-throughput sequencing and subsequent data analyses identified many Rtsp-1 retrotransposon insertion sites, and their allele dosages (simplex, duplex, triplex, or double-simplex) were determined based on segregation ratios in the mapping population. Using a pseudo-testcross strategy, 43 and 47 linkage groups were generated for PSL and 90IDN-47, respectively. Interestingly, most of these insertions (~90%) were present in a simplex manner, indicating their utility for linkage map construction in polyploid species. Additionally, our approach led to savings of time and labor for genotyping. Although the number of markers herein was insufficient for map-based cloning, our trial analysis exhibited the utility of retrotransposon-based markers for linkage map construction in sweetpotato.
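
    The allele dosages named in the abstract above (simplex, duplex, triplex, double-simplex) are inferred from segregation ratios. The sketch below shows how the expected fraction of progeny carrying an insertion could be derived for a hexaploid. It assumes hexasomic inheritance with random chromosome segregation and no double reduction, which the abstract does not state; the function names are hypothetical and this is not the authors' pipeline.

```python
from math import comb


def absence_fraction(dosage: int, ploidy: int = 6) -> float:
    """Fraction of gametes lacking the insertion for a parent with `dosage` copies.

    Assumption: each gamete receives ploidy/2 homologous chromosomes drawn
    at random without replacement (polysomic inheritance, no double reduction).
    """
    if not 0 <= dosage <= ploidy:
        raise ValueError("dosage must lie between 0 and ploidy")
    gamete_size = ploidy // 2
    # Ways to pick a gamete using only chromosomes without the insertion,
    # divided by all possible gametes. comb() returns 0 when this is impossible.
    return comb(ploidy - dosage, gamete_size) / comb(ploidy, gamete_size)


def presence_fraction(dosage_a: int, dosage_b: int = 0, ploidy: int = 6) -> float:
    """Expected fraction of progeny carrying the insertion in a cross of two parents."""
    absent = absence_fraction(dosage_a, ploidy) * absence_fraction(dosage_b, ploidy)
    return 1.0 - absent


if __name__ == "__main__":
    for label, a, b in [
        ("simplex", 1, 0),
        ("duplex", 2, 0),
        ("triplex", 3, 0),
        ("double-simplex", 1, 1),
    ]:
        print(f"{label}: {presence_fraction(a, b):.2f} of progeny carry the insertion")
```

    Under these assumptions a simplex insertion segregates 1:1 (presence:absence), which is the pattern a pseudo-testcross strategy relies on.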

  17. Construction of a linkage map based on retrotransposon insertion polymorphisms in sweetpotato via high-throughput sequencing

    PubMed Central

    Monden, Yuki; Hara, Takuya; Okada, Yoshihiro; Jahana, Osamu; Kobayashi, Akira; Tabuchi, Hiroaki; Onaga, Shoko; Tahara, Makoto

    2015-01-01

    Sweetpotato (Ipomoea batatas L.) is an outcrossing hexaploid species with a large number of chromosomes (2n = 6x = 90). Although sweetpotato is one of the world’s most important crops, genetic analysis of the species has been hindered by its genetic complexity combined with the lack of a whole genome sequence. In the present study, we constructed a genetic linkage map based on retrotransposon insertion polymorphisms using a mapping population derived from a cross between ‘Purple Sweet Lord’ (PSL) and ‘90IDN-47’ cultivars. High-throughput sequencing and subsequent data analyses identified many Rtsp-1 retrotransposon insertion sites, and their allele dosages (simplex, duplex, triplex, or double-simplex) were determined based on segregation ratios in the mapping population. Using a pseudo-testcross strategy, 43 and 47 linkage groups were generated for PSL and 90IDN-47, respectively. Interestingly, most of these insertions (~90%) were present in a simplex manner, indicating their utility for linkage map construction in polyploid species. Additionally, our approach led to savings of time and labor for genotyping. Although the number of markers herein was insufficient for map-based cloning, our trial analysis exhibited the utility of retrotransposon-based markers for linkage map construction in sweetpotato. PMID:26069444

  18. Wideband Arrhythmia-Insensitive-Rapid (AIR) Pulse Sequence for Cardiac T1 mapping without Image Artifacts induced by ICD

    PubMed Central

    Hong, KyungPyo; Jeong, Eun-Kee; Wall, T. Scott; Drakos, Stavros G.; Kim, Daniel

    2015-01-01

    Purpose: To develop and evaluate a wideband arrhythmia-insensitive-rapid (AIR) pulse sequence for cardiac T1 mapping without image artifacts induced by implantable-cardioverter-defibrillator (ICD). Methods: We developed a wideband AIR pulse sequence by incorporating a saturation pulse with wide frequency bandwidth (8.9 kHz), in order to achieve uniform T1 weighting in the heart with ICD. We tested the performance of original and “wideband” AIR cardiac T1 mapping pulse sequences in phantom and human experiments at 1.5T. Results: In 5 phantoms representing native myocardium and blood and post-contrast blood/tissue T1 values, compared with the control T1 values measured with an inversion-recovery pulse sequence without ICD, T1 values measured with original AIR with ICD were considerably lower (absolute percent error >29%), whereas T1 values measured with wideband AIR with ICD were similar (absolute percent error <5%). Similarly, in 11 human subjects, compared with the control T1 values measured with original AIR without ICD, T1 measured with original AIR with ICD was significantly lower (absolute percent error >10.1%), whereas T1 measured with wideband AIR with ICD was similar (absolute percent error <2.0%). Conclusion: This study demonstrates the feasibility of a wideband pulse sequence for cardiac T1 mapping without significant image artifacts induced by ICD. PMID:25975192
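
    The abstract above reports agreement as an absolute percent error against a control T1 value. A minimal sketch of that metric follows, assuming the usual definition (absolute difference divided by the control value). The function name and the T1 values in the usage lines are hypothetical illustrations, not data from the study.

```python
def absolute_percent_error(measured_ms: float, control_ms: float) -> float:
    """Absolute percent error of a measured T1 relative to a control T1 (both in ms)."""
    if control_ms <= 0:
        raise ValueError("control T1 must be positive")
    return 100.0 * abs(measured_ms - control_ms) / control_ms


if __name__ == "__main__":
    control = 1000.0  # hypothetical control T1 in ms
    for label, measured in [("underestimated", 700.0), ("close to control", 1040.0)]:
        error = absolute_percent_error(measured, control)
        print(f"{label}: {error:.1f}% absolute error")
```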

  19. Interactive segmentation of tongue contours in ultrasound video sequences using quality maps

    NASA Astrophysics Data System (ADS)

    Ghrenassia, Sarah; Ménard, Lucie; Laporte, Catherine

    2014-03-01

    Ultrasound (US) imaging is an effective and noninvasive way of studying the tongue motions involved in normal and pathological speech, and the results of US studies are of interest for the development of new strategies in speech therapy. State-of-the-art tongue shape analysis techniques based on US images depend on semi-automated tongue segmentation and tracking techniques. Recent work has mostly focused on improving the accuracy of the tracking techniques themselves. However, occasional errors remain inevitable, regardless of the technique used, and the tongue tracking process must thus be supervised by a speech scientist who will correct these errors manually or semi-automatically. This paper proposes an interactive framework to facilitate this process. In this framework, the user is guided towards potentially problematic portions of the US image sequence by a segmentation quality map that is based on the normalized energy of an active contour model and automatically produced during tracking. When a problematic segmentation is identified, corrections to the segmented contour can be made on one image and propagated both forward and backward in the problematic subsequence, thereby improving the user experience. The interactive tools were tested in combination with two different tracking algorithms. Preliminary results illustrate the potential of the proposed framework, suggesting that it generally improves user interaction time, with little change in segmentation repeatability.

  20. Time Sequence Diffeomorphic Metric Mapping and Parallel Transport Track Time-Dependent Shape Changes

    PubMed Central

    Qiu, Anqi; Albert, Marilyn; Younes, Laurent; Miller, Michael I.

    2009-01-01

    Serial MRI human brain scans have facilitated the detection of brain development and of the earliest signs of neuropsychiatric and neurodegenerative diseases, monitoring disease progression, and resolving drug effects in clinical trials for preventing or slowing the rate of brain degeneration. To track anatomical shape changes in serial images, we introduce new point-based time sequence large deformation diffeomorphic metric mapping (TS-LDDMM) to infer the time flow of within-subject geometric shape changes that carry known observations through a period. Its Euler-Lagrange equation is generalized for anatomies whose shapes are characterized by point sets, such as landmarks, curves, and surfaces. The time-dependent momentum obtained from the TS-LDDMM encodes within-subject shape changes. For the purpose of across-subject shape comparison, we then propose a diffeomorphic analysis framework to translate within-subject deformation in a global template without incorporating across-subject anatomical variations via parallel transport technique. The analysis involves the retraction of the within-subject time-dependent momentum along the TS-LDDMM trajectory from each time to the baseline, the translation of the momentum in a global template, and the reconstruction of the TS-LDDMM trajectory starting from the global template. PMID:19041947

  1. Haplotype mapping and sequence analysis of the mouse Nramp gene predict susceptibility to infection with intracellular parasites

    SciTech Connect

    Malo, D.; Hu, Jinxin; Schurr, E.

    1994-09-01

    The mouse chromosome 1 locus Bcg (Ity, Lsh) controls the capacity of the tissue macrophage to restrict the replication of antigenically unrelated intracellular parasites and therefore determines the natural resistance (BCG-R, dominant) or susceptibility (BCG-S, recessive) of inbred mouse strains to infection with diverse pathogens. We have used a positional cloning strategy based on genetic and physical mapping, YAC cloning, and exon trapping to isolate a candidate gene for Bcg (Nramp) that encodes a predicted macrophage-specific transport protein. We have analyzed a total of 27 inbred mouse strains of BCG-R and BCG-S phenotypes for the presence of nucleotide sequence variations within the coding portion of Nramp and have carried out haplotype typing of the corresponding chromosome 1 region in these mice, using 11 additional polymorphic markers mapping in the immediate vicinity of Nramp. cDNA cloning and nucleotide sequencing identified 5 nucleotide sequence variations within Nramp in the inbred strains.

  2. Drosophila melanogaster paramyosin: developmental pattern, mapping and properties deduced from its complete coding sequence.

    PubMed

    Vinós, J; Maroto, M; Garesse, R; Marco, R; Cervera, M

    1992-02-01

    Several cDNA clones encoding the complete Drosophila paramyosin sequence, including two potential polyadenylation sites, have been obtained. Southern analysis and in situ hybridization to polytene chromosomes indicate that in Drosophila the paramyosin gene is single copy, located on the left arm of the third chromosome at region 66D14. Northern analyses show predominantly two different RNAs which are the products of the choice between the two alternative polyadenylation sites. The two species begin to be synthesized around 10 h of development when embryonic muscles are formed, expression peaking at the end of embryogenesis. The protein is first expressed at germ band shortening in association with muscle precursor cells. A second maximum of paramyosin RNA expression occurs at late pupal stages when the higher molecular weight form becomes more abundant. In young adults this species becomes the main transcript detected. The 102 kDa polypeptide sequence is highly similar to that of Caenorhabditis elegans paramyosin. The protein has a central alpha-helical coiled-coil rod, organized in 29 groups of four typical seven-residue repeats and flanked by two short non-alpha-helical regions. Several leucine zippers are located on the hydrophobic face of the alpha-helix in paramyosin which, together with disulfide bonds between cysteines, are probably involved in the stabilization of the dimer. The structural and functional properties of Drosophila paramyosin deduced from the sequence are compared with those of known invertebrate myosins and paramyosins.

  3. A genetic map of melon highly enriched with fruit quality QTLs and EST markers, including sugar and carotenoid metabolism genes

    USDA-ARS's Scientific Manuscript database

    A genetic map of melon enriched for fruit traits was constructed, using a recombinant inbred (RI) population developed from a cross between representatives of the two subspecies of Cucumis melo L.: PI 414723 (subspecies agrestis) and 'Dulce' (subspecies melo). Phenotyping of 99 RI lines was conducte...

  4. Genotyping-by-sequencing facilitates a high resolution consensus linkage map for Aegilops umbellulata, a wild relative of cultivated wheat

    USDA-ARS's Scientific Manuscript database

    High-density genetic maps are useful to precisely localize QTL or genes that might be used to improve traits of nutritional and/or economical importance in crops. However, high-density genetic maps are lacking for most wild relatives of crop species including wheat. Aegilops umbellulata is a wild rel...

  5. Construction of High-Density Linkage Maps of Populus deltoides × P. simonii Using Restriction-Site Associated DNA Sequencing

    PubMed Central

    Tong, Chunfa; Li, Huogen; Wang, Ying; Li, Xuran; Ou, Jiajia; Wang, Deyuan; Xu, Houxi; Ma, Chao; Lang, Xianye; Liu, Guangxin; Zhang, Bo; Shi, Jisen

    2016-01-01

    Although numerous linkage maps have been constructed in the genus Populus, they are typically sparse and thus have limited applications due to low throughput of traditional molecular markers. Restriction-site associated DNA sequencing (RADSeq) technology allows us to identify a large number of single nucleotide polymorphisms (SNP) across genomes of many individuals in a fast and cost-effective way, and makes it possible to construct high-density genetic linkage maps. We performed RADSeq for 299 progeny and their two parents in an F1 hybrid population generated by crossing the female Populus deltoides ‘I-69’ and male Populus simonii ‘L3’. A total of 2,545 high quality SNP markers were obtained and two parent-specific linkage maps were constructed. The female genetic map contained 1601 SNPs and 20 linkage groups, spanning 4,249.12 cM of the genome with an average distance of 2.69 cM between adjacent markers, while the male map consisted of 940 SNPs and also 20 linkage groups with a total length of 3,816.24 cM and an average marker interval distance of 4.15 cM. Finally, our analysis revealed that synteny and collinearity are highly conserved between the parental linkage maps and the reference genome of P. trichocarpa. We demonstrated that RAD sequencing is a powerful technique capable of rapidly generating a large number of SNPs for constructing genetic maps in outbred forest trees. The high-quality linkage maps constructed here provided reliable genetic resources to facilitate locating quantitative trait loci (QTLs) that control growth and wood quality traits in the hybrid population. PMID:26964097
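
    The average marker spacing quoted in the abstract above can be checked with simple arithmetic. The abstract does not say how the average was computed; the sketch below assumes it is total map length divided by the number of intervals (markers minus linkage groups), an assumption that reproduces the reported 2.69 cM and 4.15 cM. The function name is hypothetical.

```python
def average_marker_interval(total_cm: float, n_markers: int, n_groups: int) -> float:
    """Mean distance between adjacent markers across all linkage groups.

    Assumption: a group with k markers contributes k - 1 intervals, so the
    whole map has n_markers - n_groups intervals.
    """
    intervals = n_markers - n_groups
    if intervals <= 0:
        raise ValueError("need more markers than linkage groups")
    return total_cm / intervals


if __name__ == "__main__":
    # Figures taken from the abstract: map length (cM), SNP count, linkage groups.
    maps = {
        "female (P. deltoides)": (4249.12, 1601, 20),
        "male (P. simonii)": (3816.24, 940, 20),
    }
    for name, (length, markers, groups) in maps.items():
        spacing = average_marker_interval(length, markers, groups)
        print(f"{name}: {spacing:.2f} cM between adjacent markers")
```

    Dividing by the raw marker count instead would give slightly smaller values, so the interval-based denominator appears to be the better match for the reported figures.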

  6. Construction of High-Density Linkage Maps of Populus deltoides × P. simonii Using Restriction-Site Associated DNA Sequencing.

    PubMed

    Tong, Chunfa; Li, Huogen; Wang, Ying; Li, Xuran; Ou, Jiajia; Wang, Deyuan; Xu, Houxi; Ma, Chao; Lang, Xianye; Liu, Guangxin; Zhang, Bo; Shi, Jisen

    2016-01-01

    Although numerous linkage maps have been constructed in the genus Populus, they are typically sparse and thus have limited applications due to low throughput of traditional molecular markers. Restriction-site associated DNA sequencing (RADSeq) technology allows us to identify a large number of single nucleotide polymorphisms (SNP) across genomes of many individuals in a fast and cost-effective way, and makes it possible to construct high-density genetic linkage maps. We performed RADSeq for 299 progeny and their two parents in an F1 hybrid population generated by crossing the female Populus deltoides 'I-69' and male Populus simonii 'L3'. A total of 2,545 high quality SNP markers were obtained and two parent-specific linkage maps were constructed. The female genetic map contained 1601 SNPs and 20 linkage groups, spanning 4,249.12 cM of the genome with an average distance of 2.69 cM between adjacent markers, while the male map consisted of 940 SNPs and also 20 linkage groups with a total length of 3,816.24 cM and an average marker interval distance of 4.15 cM. Finally, our analysis revealed that synteny and collinearity are highly conserved between the parental linkage maps and the reference genome of P. trichocarpa. We demonstrated that RAD sequencing is a powerful technique capable of rapidly generating a large number of SNPs for constructing genetic maps in outbred forest trees. The high-quality linkage maps constructed here provided reliable genetic resources to facilitate locating quantitative trait loci (QTLs) that control growth and wood quality traits in the hybrid population.

  7. Toward allotetraploid cotton genome assembly: integration of a high-density molecular genetic linkage map with DNA sequence information

    PubMed Central

    2012-01-01

    Background: Cotton is the world’s most important natural textile fiber and a significant oilseed crop. Decoding cotton genomes will provide the ultimate reference and resource for research and utilization of the species. Integration of high-density genetic maps with genomic sequence information will largely accelerate the process of whole-genome assembly in cotton. Results: In this paper, we update a high-density interspecific genetic linkage map of allotetraploid cultivated cotton. An additional 1,167 marker loci have been added to our previously published map of 2,247 loci. Three new marker types, InDel (insertion-deletion) and SNP (single nucleotide polymorphism) developed from gene information, and REMAP (retrotransposon-microsatellite amplified polymorphism), were used to increase map density. The updated map consists of 3,414 loci in 26 linkage groups covering 3,667.62 cM with an average inter-locus distance of 1.08 cM. Furthermore, genome-wide sequence analysis was finished using 3,324 informative sequence-based markers and publicly-available Gossypium DNA sequence information. A total of 413,113 EST and 195 BAC sequences were physically anchored and clustered by 3,324 sequence-based markers. Of these, 14,243 ESTs and 188 BACs from different species of Gossypium were clustered and specifically anchored to the high-density genetic map. A total of 2,748 candidate unigenes from 2,111 ESTs clusters and 63 BACs were mined for functional annotation and classification. The 337 ESTs/genes related to fiber quality traits were integrated with 132 previously reported cotton fiber quality quantitative trait loci, which demonstrated the important roles in fiber quality of these genes. Higher-level sequence conservation between different cotton species and between the A- and D-subgenomes in tetraploid cotton was found, indicating a common evolutionary origin for orthologous and paralogous loci in Gossypium. Conclusion: This study will serve as a valuable genomic resource

  8. OBLIMAP 2.0: a fast climate model-ice sheet model coupler including online embeddable mapping routines

    NASA Astrophysics Data System (ADS)

    Reerink, Thomas J.; van de Berg, Willem Jan; van de Wal, Roderik S. W.

    2016-11-01

    This paper accompanies the second OBLIMAP open-source release. The package is developed to map climate fields between a general circulation model (GCM) and an ice sheet model (ISM) in both directions by using optimal aligned oblique projections, which minimize distortions. The curvature of the surfaces of the GCM and ISM grid differ, both grids may be irregularly spaced and the ratio of the grids is allowed to differ largely. OBLIMAP's stand-alone version is able to map data sets that differ in various aspects on the same ISM grid. Each grid may either coincide with the surface of a sphere, an ellipsoid or a flat plane, while the grid types might differ. Re-projection of, for example, ISM data sets is also facilitated. This is demonstrated by relevant applications concerning the major ice caps. As the stand-alone version also applies to the reverse mapping direction, it can be used as an offline coupler. Furthermore, OBLIMAP 2.0 is an embeddable GCM-ISM coupler, suited for high-frequency online coupled experiments. A new fast scan method is presented for structured grids as an alternative for the former time-consuming grid search strategy, realising a performance gain of several orders of magnitude and enabling the mapping of high-resolution data sets with a much larger number of grid nodes. Further, a highly flexible masked mapping option is added. The limitation of the fast scan method with respect to unstructured and adaptive grids is discussed together with a possible future parallel Message Passing Interface (MPI) implementation.

  9. Hypothesis: Artifacts, Including Spurious Chimeric RNAs with a Short Homologous Sequence, Caused by Consecutive Reverse Transcriptions and Endogenous Random Primers

    PubMed Central

    Peng, Zhiyu; Yuan, Chengfu; Zellmer, Lucas; Liu, Siqi; Xu, Ningzhi; Liao, D. Joshua

    2015-01-01

    Recent RNA-sequencing technology and associated bioinformatics have led to identification of tens of thousands of putative human chimeric RNAs, i.e. RNAs containing sequences from two different genes, most of which are derived from neighboring genes on the same chromosome. In this essay, we redefine “two neighboring genes” as those producing individual transcripts, and point out two known mechanisms for chimeric RNA formation, i.e. transcription from a fusion gene or trans-splicing of two RNAs. By our definition, most putative RNA chimeras derived from canonically-defined neighboring genes may either be technical artifacts or be cis-splicing products of 5'- or 3'-extended RNA of either partner that is redefined herein as an unannotated gene, whereas trans-splicing events are rare in human cells. Therefore, most authentic chimeric RNAs result from fusion genes, about 1,000 of which have been identified hitherto. We propose a hypothesis of “consecutive reverse transcriptions (RTs)”, i.e. another RT reaction following the previous one, for how most spurious chimeric RNAs, especially those containing a short homologous sequence, may be generated during RT, especially in RNA-sequencing wherein RNAs are fragmented. We also point out that RNA samples contain numerous RNA and DNA shreds that can serve as endogenous random primers for RT and ensuing polymerase chain reactions (PCR), creating artifacts in RT-PCR. PMID:26000048

  10. Hypothesis: Artifacts, Including Spurious Chimeric RNAs with a Short Homologous Sequence, Caused by Consecutive Reverse Transcriptions and Endogenous Random Primers.

    PubMed

    Peng, Zhiyu; Yuan, Chengfu; Zellmer, Lucas; Liu, Siqi; Xu, Ningzhi; Liao, D Joshua

    2015-01-01

    Recent RNA-sequencing technology and associated bioinformatics have led to identification of tens of thousands of putative human chimeric RNAs, i.e. RNAs containing sequences from two different genes, most of which are derived from neighboring genes on the same chromosome. In this essay, we redefine "two neighboring genes" as those producing individual transcripts, and point out two known mechanisms for chimeric RNA formation, i.e. transcription from a fusion gene or trans-splicing of two RNAs. By our definition, most putative RNA chimeras derived from canonically-defined neighboring genes may either be technical artifacts or be cis-splicing products of 5'- or 3'-extended RNA of either partner that is redefined herein as an unannotated gene, whereas trans-splicing events are rare in human cells. Therefore, most authentic chimeric RNAs result from fusion genes, about 1,000 of which have been identified hitherto. We propose a hypothesis of "consecutive reverse transcriptions (RTs)", i.e. another RT reaction following the previous one, for how most spurious chimeric RNAs, especially those containing a short homologous sequence, may be generated during RT, especially in RNA-sequencing wherein RNAs are fragmented. We also point out that RNA samples contain numerous RNA and DNA shreds that can serve as endogenous random primers for RT and ensuing polymerase chain reactions (PCR), creating artifacts in RT-PCR.

  11. Nested association mapping of stem rust resistance in wheat using genotyping by sequencing

    USDA-ARS's Scientific Manuscript database

    Nested association mapping is an approach to map trait loci in which families within populations are interconnected by a common parent. By implementing joint-linkage association analysis, this approach is able to map causative loci with higher power and resolution compared to biparental linkage mapp...

  12. Evaluation of a novel food composition database that includes glutamine and other amino acids derived from gene sequencing data

    PubMed Central

    Lenders, CM; Liu, S; Wilmore, DW; Sampson, L; Dougherty, LW; Spiegelman, D; Willett, WC

    2011-01-01

    Objectives To determine the content of glutamine in major food proteins. Subjects/Methods We used a validated 131-food item food frequency questionnaire (FFQ) to identify the foods that contributed the most to protein intake among 70 356 women in the Nurses’ Health Study (NHS, 1984). The content of glutamine and other amino acids in foods was calculated based on protein fractions generated from gene sequencing methods (Swiss Institute of Bioinformatics) and compared with data from conventional (USDA) and modified biochemical (Kuhn) methods. Pearson correlation coefficients were used to compare the participants’ dietary intakes of amino acids by sequencing and USDA methods. Results The glutamine content varied from 0.01 to 9.49 g/100 g of food and contributed from 1 to 33% of total protein for all FFQ foods with protein. When comparing the sequencing and Kuhn’s methods, the proportion of glutamine in meat was 4.8 vs 4.4%. Among NHS participants, mean glutamine intake was 6.84 (s.d.=2.19) g/day and correlation coefficients between amino acid intakes assessed by sequencing and USDA methods ranged from 0.94 to 0.99 for absolute intake, −0.08 to 0.90 after adjusting for 100 g of protein, and 0.88 to 0.99 after adjusting for 1000 kcal. The between-person coefficient of variation of energy-adjusted intake of glutamine was 16%. Conclusions These data suggest that (1) glutamine content can be estimated from gene sequencing methods and (2) there is a reasonably wide variation in energy-adjusted glutamine intake, allowing for exploration of glutamine consumption and disease. PMID:19756030

  13. Pitfalls of mapping high throughput sequencing data to repetitive sequences: Piwi’s genomic targets still not identified

    PubMed Central

    Marinov, Georgi K.; Wang, Jie; Handler, Dominik; Wold, Barbara J.; Weng, Zhiping; Hannon, Gregory J.; Aravin, Alexei A.; Zamore, Phillip D.; Brennecke, Julius; Toth, Katalin Fejes

    2015-01-01

    Huang et al. (2013) recently reported that chromatin immuno-precipitation followed by sequencing (ChIP-seq) reveals the genome-wide sites of occupancy by Piwi - a piRNA-guided Argonaute protein central to transposon silencing in Drosophila. Their study also reported that loss of Piwi causes widespread rewiring of transcriptional patterns as evidenced by changes in RNA polymerase II occupancy across the genome. Here we reanalyze their underlying deep sequencing data and report that the data do not support the authors’ central conclusions. PMID:25805138

  14. Radiation hybrid maps of D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes

    USDA-ARS's Scientific Manuscript database

    The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high-resolution genome maps saturated with ordered markers to assist in anchoring and orienting BAC contigs/ sequence scaffolds for whole genome sequence assembly. Radiation hybrid (RH) mapping has proven to be an e...

  15. The Locus Lookup Tool at MaizeGDB: Identification of Genomic Regions in Maize by Integrating Sequence Information with Physical and Genetic Maps

    USDA-ARS's Scientific Manuscript database

    Methods to automatically integrate sequence information with physical and genetic maps are scarce. The Locus Lookup Tool enables researchers to define windows of genomic sequence likely to contain loci of interest where only genetic or physical mapping associations are reported. Using the Locus Look...

  16. Transcriptome sequencing of Hevea brasiliensis for development of microsatellite markers and construction of a genetic linkage map.

    PubMed

    Triwitayakorn, Kanokporn; Chatkulkawin, Pornsupa; Kanjanawattanawong, Supanath; Sraphet, Supajit; Yoocha, Thippawan; Sangsrakru, Duangjai; Chanprasert, Juntima; Ngamphiw, Chumpol; Jomchai, Nukoon; Therawattanasuk, Kanikar; Tangphatsornruang, Sithichoke

    2011-12-01

    To obtain more information on the Hevea brasiliensis genome, we sequenced the transcriptome from the vegetative shoot apex yielding 2 311 497 reads. Clustering and assembly of the reads produced a total of 113 313 unique sequences, comprising 28 387 isotigs and 84 926 singletons. Also, 17 819 expressed sequence tag (EST)-simple sequence repeats (SSRs) were identified from the data set. To demonstrate the use of this EST resource for marker development, primers were designed for 430 of the EST-SSRs. Three hundred and twenty-three primer pairs were amplifiable in H. brasiliensis clones. Polymorphic information content values of 47 selected SSRs among 20 H. brasiliensis clones ranged from 0.13 to 0.71, with an average of 0.51. A dendrogram of genetic similarities between the 20 H. brasiliensis clones using these 47 EST-SSRs suggested two distinct groups that correlated well with clone pedigree. These novel EST-SSRs together with the published SSRs were used for the construction of an integrated parental linkage map of H. brasiliensis based on 81 lines of an F1 mapping population. The map comprised 97 loci (37 novel EST-SSRs and 60 published SSRs) distributed on 23 linkage groups and covered 842.9 cM with a mean interval of 11.9 cM and ∼4 loci per linkage group. Although the number of linkage groups exceeds the haploid chromosome number (18), several markers shared with homologous linkage groups of the previous map indicate that the F1 map in this study is appropriate for further study in marker-assisted selection.
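
    The polymorphic information content (PIC) values quoted above are computed per marker from allele frequencies. A minimal sketch follows, assuming the commonly used Botstein-style formula PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the abstract does not state which variant the authors used, and the function name is chosen for this example.

```python
from itertools import combinations

def pic(allele_freqs):
    """Polymorphic information content of one marker.

    allele_freqs: frequencies of the marker's alleles, summing to 1.
    Assumes the Botstein-style definition; illustrative only.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that two random alleles are identical.
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction for matings in which offspring are uninformative.
    correction = sum(2.0 * (p * p) * (q * q)
                     for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - correction

# Two equally frequent alleles give the maximum PIC for a biallelic marker.
print(pic([0.5, 0.5]))
```

    A marker with a single allele carries no information (PIC of 0), and PIC rises as the number of alleles grows and their frequencies even out.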

  17. Genome Evolution and Meiotic Maps by Massively Parallel DNA Sequencing: Spotted Gar, an Outgroup for the Teleost Genome Duplication

    PubMed Central

    Amores, Angel; Catchen, Julian; Ferrara, Allyse; Fontenot, Quenton; Postlethwait, John H.

    2011-01-01

    Genomic resources for hundreds of species of evolutionary, agricultural, economic, and medical importance are unavailable due to the expense of well-assembled genome sequences and difficulties with multigenerational studies. Teleost fish provide many models for human disease but possess anciently duplicated genomes that sometimes obfuscate connectivity. Genomic information representing a fish lineage that diverged before the teleost genome duplication (TGD) would provide an outgroup for exploring the mechanisms of evolution after whole-genome duplication. We exploited massively parallel DNA sequencing to develop meiotic maps with thrift and speed by genotyping F1 offspring of a single female and a single male spotted gar (Lepisosteus oculatus) collected directly from nature utilizing only polymorphisms existing in these two wild individuals. Using Stacks, software that automates the calling of genotypes from polymorphisms assayed by Illumina sequencing, we constructed a map containing 8406 markers. RNA-seq on two map-cross larvae provided a reference transcriptome that identified nearly 1000 mapped protein-coding markers and allowed genome-wide analysis of conserved synteny. Results showed that the gar lineage diverged from teleosts before the TGD and its genome is organized more similarly to that of humans than teleosts. Thus, spotted gar provides a critical link between medical models in teleost fish, to which gar is biologically similar, and humans, to which gar is genomically similar. Application of our F1 dense mapping strategy to species with no prior genome information promises to facilitate comparative genomics and provide a scaffold for ordering the numerous contigs arising from next generation genome sequencing. PMID:21828280

  18. PRIMAL: Page Rank-Based Indoor Mapping and Localization Using Gene-Sequenced Unlabeled WLAN Received Signal Strength

    PubMed Central

    Zhou, Mu; Zhang, Qiao; Xu, Kunjie; Tian, Zengshan; Wang, Yanmeng; He, Wei

    2015-01-01

    Due to the wide deployment of wireless local area networks (WLAN), received signal strength (RSS)-based indoor WLAN localization has attracted considerable attention in both academia and industry. In this paper, we propose a novel page rank-based indoor mapping and localization (PRIMAL) approach that uses gene-sequenced unlabeled WLAN RSS for simultaneous localization and mapping (SLAM). First, based on the observed motion patterns of people in the target environment, we use the Allen logic to construct a mobility graph that characterizes the connectivity among different areas of interest. Second, the concept of gene sequencing is utilized to assemble the sporadically-collected RSS sequences into a signal graph based on the transition relations among different RSS sequences. Third, we apply the graph drawing approach to exhibit both the mobility graph and signal graph in a more readable manner. Finally, the page rank (PR) algorithm is used to construct the mapping from the signal graph into the mobility graph. The experimental results show that the proposed approach achieves satisfactory localization accuracy while avoiding the intensive time and labor cost involved in conventional location fingerprinting-based indoor WLAN localization. PMID:26404274
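
    For readers unfamiliar with the page rank step, the sketch below shows generic PageRank by power iteration on a small directed graph. It is not the PRIMAL mapping procedure itself; the graph representation, the damping value of 0.85, and the function name are assumptions made for this example.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """Generic PageRank by power iteration.

    adj maps each node to the list of nodes it links to; every linked
    node must also appear as a key. Illustrative sketch only.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for w in adj[v]:
                    new[w] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank

# Node "c" receives links from both "a" and "b", so it ranks highest.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))
```

    The ranks always sum to 1, so they can be read as the long-run share of time a random walker spends at each node.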

  19. Rapid and inexpensive whole-genome genotyping-by-sequencing for crossover localization and fine-scale genetic mapping.

    PubMed

    Rowan, Beth A; Patel, Vipul; Weigel, Detlef; Schneeberger, Korbinian

    2015-01-13

    The reshuffling of existing genetic variation during meiosis is important both during evolution and in breeding. The reassortment of genetic variants relies on the formation of crossovers (COs) between homologous chromosomes. The pattern of genome-wide CO distributions can be rapidly and precisely established by the short-read sequencing of individuals from F2 populations, which in turn are useful for quantitative trait locus (QTL) mapping. Although sequencing costs have decreased precipitously in recent years, the costs of library preparation for hundreds of individuals have remained high. To enable rapid and inexpensive CO detection and QTL mapping using low-coverage whole-genome sequencing of large mapping populations, we have developed a new method for library preparation along with Trained Individual GenomE Reconstruction, a probabilistic method for genotype and CO predictions for recombinant individuals. In an example case with hundreds of F2 individuals from two Arabidopsis thaliana accessions, we resolved most CO breakpoints to within 2 kb and reduced a major flowering time QTL to a 9-kb interval. In addition, an extended region of unusually low recombination revealed a 1.8-Mb inversion polymorphism on the long arm of chromosome 4. We observed no significant differences in the frequency and distribution of COs between F2 individuals with and without a functional copy of the DNA helicase gene RECQ4A. In summary, we present a new, cost-efficient method for large-scale, high-precision genotyping-by-sequencing. Copyright © 2015 Rowan et al.

  20. The European sea bass Dicentrarchus labrax genome puzzle: comparative BAC-mapping and low coverage shotgun sequencing

    PubMed Central

    2010-01-01

    Background Food supply from the ocean is constrained by the shortage of domesticated and selected fish. Development of genomic models of economically important fishes should assist with the removal of this bottleneck. European sea bass Dicentrarchus labrax L. (Moronidae, Perciformes, Teleostei) is one of the most important fishes in European marine aquaculture; growing genomic resources put it on its way to serve as an economic model. Results End sequencing of a sea bass genomic BAC-library enabled the comparative mapping of the sea bass genome using the three-spined stickleback Gasterosteus aculeatus genome as a reference. BAC-end sequences (102,690) were aligned to the stickleback genome. The number of mappable BACs was improved using a two-fold coverage WGS dataset of sea bass resulting in a comparative BAC-map covering 87% of stickleback chromosomes with 588 BAC-contigs. The minimum size of 83 contigs covering 50% of the reference was 1.2 Mbp; the largest BAC-contig comprised 8.86 Mbp. More than 22,000 BAC-clones aligned with both ends to the reference genome. Intra-chromosomal rearrangements between sea bass and stickleback were identified. Size distributions of mapped BACs were used to calculate that the genome of sea bass may be only 1.3 fold larger than the 460 Mbp stickleback genome. Conclusions The BAC map is used for sequencing single BACs or BAC-pools covering defined genomic entities by second generation sequencing technologies. Together with the WGS dataset it initiates a sea bass genome sequencing project. This will allow the quantification of polymorphisms through resequencing, which is important for selecting highly performing domesticated fish. PMID:20105308
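
    The statement that 83 contigs covering 50% of the reference had a minimum size of 1.2 Mbp is an NG50-style statistic. A minimal sketch of how such a figure can be computed is given below; the function name and the toy numbers are illustrative and are not taken from the study.

```python
def contigs_covering(contig_lengths, reference_length, fraction=0.5):
    """Return (count, smallest_length) for the largest contigs that
    together cover at least `fraction` of the reference, or None if
    the contigs cannot reach that coverage. Illustrative sketch.
    """
    target = fraction * reference_length
    covered = 0
    ordered = sorted(contig_lengths, reverse=True)
    for count, length in enumerate(ordered, start=1):
        covered += length
        if covered >= target:
            return count, length
    return None

# Toy example: three contigs (5 + 4 + 3) are needed to cover half of 20.
print(contigs_covering([5, 4, 3, 2, 1], 20))
```

    Sorting from largest to smallest means the returned length is the size of the smallest contig needed to reach the requested coverage.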

  1. UV cross-link mapping of the substrate-binding site of an RNase P ribozyme to a target mRNA sequence.

    PubMed Central

    Kilani, A F; Liu, F

    1999-01-01

    RNase P ribozyme cleaves an RNA helix that resembles the acceptor stem and T-stem structure of its natural ptRNA substrate. When covalently linked with a guide sequence, the ribozyme can function as a sequence-specific endonuclease and cleave any target RNA sequences that base pair with the guide sequence. Using a site-directed ultraviolet (UV) cross-linking approach, we have mapped the regions of the ribozyme that are in close proximity to a substrate that contains the mRNA sequence encoding thymidine kinase of human herpes simplex virus 1. Our data suggest that the cleavage site of the mRNA substrate is positioned at the same regions of the ribozyme that bind to the cleavage site of a ptRNA. The mRNA-binding domains include regions that interact with the acceptor stem and T-stem and in addition, regions that are unique and not in close contact with a ptRNA. Identification of the mRNA-binding site provides a foundation to study how RNase P ribozymes achieve their sequence specificity and facilitates the development of gene-targeting ribozymes. PMID:10496224

  2. easyPAC: A Tool for Fast Prediction, Testing and Reference Mapping of Degenerate PCR Primers from Alignments or Consensus Sequences

    PubMed Central

    Rosenkranz, David

    2012-01-01

    The PCR amplification of unknown homologous or paralogous genes generally relies on PCR primers predicted from multiple sequence alignments. However, increasing sequence divergence can make degenerate primers necessary, which raises the problem of testing their characteristics, unwanted interactions and potential mispriming. Here I introduce easyPAC, new software for the prediction of degenerate primers from multiple sequence alignments or single consensus sequences. As a major innovation, easyPAC allows all customary primer test procedures to be applied to degenerate primer sequences, including fast mapping to reference files. Thus, easyPAC considerably simplifies and speeds up the design of specific degenerate primers. Degenerate primers suggested by easyPAC were used in PCR amplification with subsequent de novo sequencing of TDRD1 exon 11 homologs from several representatives of the haplorrhine primate phylogeny. The results demonstrate the efficient performance of the suggested primers and therefore show that easyPAC can advance upcoming comparative genetic studies.
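
    A degenerate primer is shorthand for a set of concrete primers, written with IUPAC ambiguity codes. The sketch below expands such a primer and reports its degeneracy; it illustrates the concept only and is not easyPAC's code. The function names are chosen for this example.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences a degenerate primer stands for."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

print(degeneracy("GARYTN"))
print(expand("AR"))
```

    Because degeneracy multiplies across positions, each expanded sequence can be checked individually with ordinary primer tests, at the cost of a rapidly growing number of candidates.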

  3. Mapping.

    ERIC Educational Resources Information Center

    Kinney, Douglas M.; McIntosh, Willard L.

    1979-01-01

    The area of geological mapping in the United States in 1978 increased greatly over that reported in 1977; state geological maps were added for California, Idaho, Nevada, and Alaska last year. (Author/BB)

  4. High-Throughput Sequencing of Campylobacter jejuni Insertion Mutant Libraries Reveals mapA as a Fitness Factor for Chicken Colonization

    PubMed Central

    Johnson, Jeremiah G.; Livny, Jonathan

    2014-01-01

    Campylobacter jejuni is a leading cause of gastrointestinal infections worldwide, due primarily to its ability to asymptomatically colonize the gastrointestinal tracts of agriculturally relevant animals, including chickens. Infection often occurs following consumption of meat that was contaminated by C. jejuni during harvest. Because of this, much interest lies in understanding the mechanisms that allow C. jejuni to colonize the chicken gastrointestinal tract. To address this, we generated a C. jejuni transposon mutant library that is amenable to insertion sequencing and introduced this mutant pool into day-of-hatch chicks. Following deep sequencing of C. jejuni mutants in the cecal outputs, several novel factors required for efficient colonization of the chicken gastrointestinal tract were identified, including the predicted outer membrane protein MapA. A mutant strain lacking mapA was constructed and found to be significantly reduced for chicken colonization in both competitive infections and monoinfections. Further, we found that mapA is required for in vitro competition with wild-type C. jejuni but is dispensable for growth in monoculture. PMID:24633877

  5. High-throughput sequencing of Campylobacter jejuni insertion mutant libraries reveals mapA as a fitness factor for chicken colonization.

    PubMed

    Johnson, Jeremiah G; Livny, Jonathan; Dirita, Victor J

    2014-06-01

    Campylobacter jejuni is a leading cause of gastrointestinal infections worldwide, due primarily to its ability to asymptomatically colonize the gastrointestinal tracts of agriculturally relevant animals, including chickens. Infection often occurs following consumption of meat that was contaminated by C. jejuni during harvest. Because of this, much interest lies in understanding the mechanisms that allow C. jejuni to colonize the chicken gastrointestinal tract. To address this, we generated a C. jejuni transposon mutant library that is amenable to insertion sequencing and introduced this mutant pool into day-of-hatch chicks. Following deep sequencing of C. jejuni mutants in the cecal outputs, several novel factors required for efficient colonization of the chicken gastrointestinal tract were identified, including the predicted outer membrane protein MapA. A mutant strain lacking mapA was constructed and found to be significantly reduced for chicken colonization in both competitive infections and monoinfections. Further, we found that mapA is required for in vitro competition with wild-type C. jejuni but is dispensable for growth in monoculture.

  6. Whole genome sequence analysis of circulating Bluetongue virus serotype 11 strains from the United States including two domestic canine isolates.

    PubMed

    Gaudreault, Natasha N; Jasperson, Dane C; Dubovi, Edward J; Johnson, Donna J; Ostlund, Eileen N; Wilson, William C

    2015-07-01

    Bluetongue virus (BTV) is a vector-transmitted pathogen that typically infects and causes disease in domestic and wild ruminants. BTV is also known to infect domestic canines, as discovered when dogs were vaccinated with a BTV-contaminated vaccine. Canine BTV infections have been documented through serological surveys, and natural infection by the Culicoides vector has been suggested. The isolation of BTV serotype 11 (BTV-11) from 2 separate domestic canine abortion cases, in Texas in 2011 and Kansas in 2012, was apparently unrelated to BTV-contaminated vaccination or consumption of BTV-contaminated raw meat, as had been previously speculated. To elucidate the origin and relationship of these 2 domestic canine BTV-11 isolates, whole genome sequencing was performed. Six additional BTV-11 field isolates from Texas, Florida, and Washington, submitted for diagnostic investigation during 2011 and 2013, were also fully sequenced and analyzed. The phylogenetic analysis indicates that the BTV-11 domestic canine isolates are virtually identical, and both share high identity with 2 BTV-11 isolates identified from white-tailed deer in Texas in 2011. The results of the current study further support the hypothesis that a BTV-11 strain circulating in the Midwestern states could have been transmitted to the dogs by the infected Culicoides vector. Our study also expands the short list of available BTV-11 sequences, which may aid BTV surveillance and epidemiology.

  7. Use of sequence-bounding surfaces for correlation and mapping in nonmarine, Incised-Valley reservoirs

    SciTech Connect

    Leckie, D.A.; Vanbeselaere, N.

    1995-11-01

    One of the problems with the application of sequence stratigraphy to nonmarine sediments is the use of effective surfaces for correlations. This case study from the Mannville Group of southern Saskatchewan demonstrates how major, regional bounding surfaces can be identified and correlated to produce a suite of maps that can be used for exploration purposes. In southern Saskatchewan, Cretaceous Mannville sediments, termed the Pense, Cantuar, and Success (S2) formations, overlie Jurassic S1 and older deposits. The interval, which is up to 100 m thick, was deposited over 40 to 50 m.y. and is riddled with unconformities and weathered horizons. Detailed stratigraphic correlations using well logs are difficult, imprecise, and highly suspect unless corroborated by core control. Jurassic Success S1 sediment was deposited in a restricted shallow-marine environment. The S2 was deposited as a sheet of quartzose, braided fluvial sandstone that unconformably cuts into the S1. The overlying Cantuar Formation consists of dominantly lithic sandstone, siltstone, and shale overlying a basal quartzose unit. The base of the Cantuar Formation has a high local relief and in places has eroded long, wide valleys into the Success and older Jurassic strata. The valleys were hundreds of kilometers long and up to 74 m deep. Remnants of the Success sediment are preserved as isolated, buried cuestas on the margins of the valley walls. Cantuar sediments represent the infill of an extensive valley system that took millions of years to fill. The fill was from meandering streams with abundant paleosols, shallow lacustrine, and splay deposits. The top of the Cantuar Formation is represented by chert and quartzose sandstones deposited in a north-south-trending estuarine system with several tributaries. Several play types, which are dominantly stratigraphic, have been identified and are related to the valley incision, valley fill, and preserved erosional cuesta remnants.

  8. Mapping QTL for popping expansion volume in popcorn with simple sequence repeat markers.

    PubMed

    Lu, H-J; Bernardo, R; Ohm, H W

    2003-02-01

    Popping expansion volume is the most important quality trait in popcorn (Zea mays L.), but its genetics is not well understood. The objectives of this study were to map quantitative trait loci (QTLs) responsible for popping expansion volume in a popcorn × dent corn cross, and to compare the predicted efficiencies of phenotypic selection, marker-based selection, and marker-assisted selection for popping expansion volume. Of 259 simple sequence repeat (SSR) primer pairs screened, 83 pairs were polymorphic between the H123 (dent corn) and AG19 (popcorn) parental inbreds. Popping test data were obtained for 160 S1 families developed from the [AG19(H123 × AG19)] BC1 population. The heritability (h²) for popping expansion volume on an S1 family mean basis was 0.73. The presence of the gametophyte factor Ga1(s) in popcorn complicates the analysis of popcorn × dent corn crosses. But, from a practical perspective, the linkage between a favorable QTL allele and Ga1(s) in popcorn will lead to selection for the favorable QTL allele. Four QTLs, on chromosomes 1S, 3S, 5S and 5L, jointly explained 45% of the phenotypic variation. Marker-based selection for popping expansion volume would require less time and work than phenotypic selection. But due to the high h² of popping expansion volume, marker-based selection was predicted to be only 92% as efficient as phenotypic selection. Marker-assisted selection, which comprises index selection on phenotypic and marker scores, was predicted to be 106% as efficient as phenotypic selection. Overall, our results suggest that phenotypic selection will remain the preferred method for selection in popcorn × dent corn crosses.

  9. Toward a high-resolution Plasmodium falciparum linkage map: Polymorphic markers from hundreds of simple sequence repeats

    SciTech Connect

    Su, Xin-Zhuan; Wellems, T.E.

    1996-05-01

    A total of 5.7 simple sequence repeats (SSRs or "microsatellites") were identified from Plasmodium falciparum sequences in GenBank and from inserts in a genomic DNA library. Oligonucleotide primers from sequences that flank 224 of these SSRs were synthesized and used in PCR assays to test for simple sequence length polymorphisms (SSLPs). Of the 224 SSRs, 188 showed SSLPs were assigned to chromosome linkage groups by physical mapping and by comparing their inheritance patterns against those of restriction fragment length polymorphism markers in a genetic cross (HB3XDd2). The predominant SSLPs in P. falciparum were found to contain [TA]n and [TAA]n, a feature that is reminiscent of plant genomes and is consistent with the proposed algal-like origin of malaria parasites. Since such SSLPs are abundant and readily isolated, they are a powerful resource for genetic analysis of P. falciparum. 38 refs., 2 figs., 2 tabs.
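
    Identifying SSRs in sequence data amounts to scanning for tandem runs of a short motif such as TA or TAA. The sketch below does this with a regular expression; the minimum of five repeats and the function name are assumptions for illustration, not the criteria used in the study.

```python
import re

def find_ssrs(sequence, motif, min_repeats=5):
    """Return (start, repeat_count) for each tandem run of `motif`.

    start is the 0-based position of the run in `sequence`.
    Illustrative sketch; the threshold is an arbitrary choice.
    """
    motif = motif.upper()
    pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), min_repeats))
    return [(m.start(), len(m.group()) // len(motif))
            for m in pattern.finditer(sequence.upper())]

# A run of six TA repeats flanked by unrelated bases.
print(find_ssrs("GG" + "TA" * 6 + "CC", "TA"))
```

    Primers for length-polymorphism assays would then be designed in the unique sequence flanking each reported run.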

  10. New Mapping in the Sand Springs Range of Western Nevada Clarifies and Constrains Regional Deformation Sequences of the Luning-Fencemaker Thrust Belt

    NASA Astrophysics Data System (ADS)

    Czarnecki, S.; Jarvis, J.; Satterfield, J. I.

    2016-12-01

    The Sand Springs Range in western Nevada exposes Mesozoic through Cenozoic structures of the eastern Sierra Nevada, Luning-Fencemaker Thrust Belt (LFTB), Basin and Range province, and Walker Lane. A recent undergraduate geologic mapping project in the northern Sand Springs Range (nSSR) set out to map igneous intrusions in detail, specifically smaller intrusions which had not been a focus in previous work. This was accomplished using different techniques including mapping at a larger scale (1:8000 vs. 1:24000), locating contacts and faults using handheld GPS, and focusing on relationships between metamorphic tectonites and igneous units. This revealed key cross-cutting relations between structures and diverse Triassic through Tertiary igneous rocks as well as distinctions between the nSSR and the surrounding LFTB assemblages. During our mapping we identified four metamorphic tectonite map units, Cretaceous granitoid and diorite plutons and sills, Tertiary rhyolite sills and dikes, and interbedded Tertiary basalt and ash flow tuff. The cross-cutting relations of these units overturn previously published sequences of events and constrain the timing of a deformation sequence which differs from the surrounding LFTB assemblages. We found that the nSSR contains three phases of deformation: a pre-LFTB syn-metamorphic event which achieved amphibolite facies that is not described elsewhere in the LFTB (D1), followed by two non-metamorphic folding and thrusting phases characteristic of the LFTB (D2 and D3). Our mapping provided four key timing constraints. First, D1 axial-planar cleavage (S1) deformed Triassic intrusions. Second, Cretaceous granitoid and diorite units cross-cut S1 foliation, D1 folds, and low-angle faults. Third, Cretaceous and Tertiary sills that locally terminate at a low-angle fault actually post-dated faulting. Fourth, cross-cutting relations showed a basaltic lava previously mapped as Jurassic is actually Tertiary. The large Sand Springs Pluton was the

  11. Three-dimensional chemical mapping by EFTEM-TomoJ including improvement of SNR by PCA and ART reconstruction of volume by noise suppression.

    PubMed

    Messaoudi, Cédric; Aschman, Nicolas; Cunha, Marcel; Oikawa, Tetsuo; Sorzano, Carlos O Sanchez; Marco, Sergio

    2013-12-01

    Electron tomography is becoming one of the most used methods for structural analysis at nanometric scale in biological and materials sciences. Combined with chemical mapping, it provides qualitative and semiquantitative information on the distribution of chemical elements on a given sample. Due to the current difficulties in obtaining three-dimensional (3D) maps by energy-filtered transmission electron microscopy (EFTEM), the use of 3D chemical mapping has not been widely adopted by the electron microscopy community. The lack of specialized software further complicates the issue, especially in the case of data with a low signal-to-noise ratio (SNR). Moreover, data interpretation is rendered difficult by the absence of efficient segmentation tools. Thus, specialized software for the computation of 3D maps by EFTEM needs to include optimized methods for image series alignment, algorithms to improve SNR, different background subtraction models, and methods to facilitate map segmentation. Here we present a software package (EFTEM-TomoJ, which can be downloaded from http://u759.curie.fr/fr/download/softwares/EFTEM-TomoJ), specifically dedicated to computation of EFTEM 3D chemical maps including noise filtering by image reconstitution based on multivariate statistical analysis. We also present an algorithm named BgART (for background removing algebraic reconstruction technique) allowing the discrimination between background and signal and improving the reconstructed volume in an iterative way.
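
    The abstract above names BgART, a background-removing variant of the algebraic reconstruction technique (ART). As a rough illustration of the underlying idea only, the sketch below implements plain ART (Kaczmarz row projections) on a toy 2x2 volume. It is not the authors' BgART and does not model background subtraction; the function name `art_reconstruct`, the ray layout, and the `relaxation` parameter are assumptions made for this example.

```python
def art_reconstruct(rows, measurements, n_unknowns, sweeps=500, relaxation=1.0):
    """Plain ART (Kaczmarz): correct the estimate against one ray equation at a time.

    rows         -- list of weight vectors, one per ray (hypothetical layout)
    measurements -- measured ray sums, same order as rows
    """
    x = [0.0] * n_unknowns
    for _ in range(sweeps):
        for weights, measured in zip(rows, measurements):
            norm_sq = sum(w * w for w in weights)
            if norm_sq == 0.0:
                continue  # a ray that touches no voxel carries no information
            predicted = sum(w * v for w, v in zip(weights, x))
            step = relaxation * (measured - predicted) / norm_sq
            # Move the estimate along the ray so that it matches this measurement.
            x = [v + step * w for v, w in zip(x, weights)]
    return x


# Toy 2x2 "volume" flattened to [a, b, c, d]; each row below is one ray sum.
rays = [
    [1, 1, 0, 0],  # top row
    [0, 0, 1, 1],  # bottom row
    [1, 0, 1, 0],  # left column
    [0, 1, 0, 1],  # right column
    [1, 0, 0, 1],  # main diagonal
]
truth = [1.0, 2.0, 3.0, 4.0]
ray_sums = [sum(w * t for w, t in zip(ray, truth)) for ray in rays]
estimate = art_reconstruct(rays, ray_sums, n_unknowns=4)
print([round(v, 3) for v in estimate])
```

    The five rays were chosen so that the toy system has a unique solution; with fewer independent rays, ART started from zero converges to the minimum-norm solution instead.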

  12. A bioinformatics insight to rhizobial globins: gene identification and mapping, polypeptide sequence and phenetic analysis, and protein modeling.

    PubMed Central

    Gesto-Borroto, Reinier; Sánchez-Sánchez, Miriam; Arredondo-Peter, Raúl

    2015-01-01

    Globins (Glbs) are proteins widely distributed in organisms. Three evolutionary families have been identified in Glbs: the M, S and T Glb families. The M Glbs include flavohemoglobins (fHbs) and single-domain Glbs (SDgbs); the S Glbs include globin-coupled sensors (GCSs), protoglobins and sensor single domain globins, and the T Glbs include truncated Glbs (tHbs). Structurally, the M and S Glbs exhibit 3/3-folding whereas the T Glbs exhibit 2/2-folding. Glbs are widespread in bacteria, including several rhizobial genomes. However, only a few rhizobial Glbs have been characterized. Hence, we characterized Glbs from 62 rhizobial genomes using bioinformatics methods such as data mining in databases, sequence alignment, phenogram construction and protein modeling. Also, we analyzed soluble extracts from Bradyrhizobium japonicum USDA38 and USDA58 by (reduced + carbon monoxide (CO) minus reduced) differential spectroscopy. Database searching showed that only fhb, sdgb, gcs and thb genes exist in the rhizobia analyzed in this work. Promoter analysis revealed that several rhizobial glb genes are apparently not regulated by a -10 promoter but might be regulated by -35 and Fnr (fumarate-nitrate reduction regulator)-like promoters. Mapping analysis revealed that rhizobial fhbs and thbs are flanked by a variety of genes whereas several rhizobial sdgbs and gcss are flanked by genes coding for proteins involved in the metabolism of nitrates and nitrites and chemotaxis, respectively. Phenetic analysis showed that rhizobial Glbs segregate into the M, S and T Glb families, while structural analysis showed that the predicted rhizobial SDgbs, fHbs and GCS globin domains adopt the 3/3-fold, whereas the tHbs adopt the 2/2-fold. Spectra from B. japonicum USDA38 and USDA58 soluble extracts exhibited peaks and troughs characteristic of bacterial and vertebrate Glbs, indicating that putative Glbs are synthesized in B. japonicum USDA38 and USDA58. PMID:26594329

  13. Association mapping of disease resistance traits in rainbow trout using restriction site associated DNA sequencing.

    PubMed

    Campbell, Nathan R; LaPatra, Scott E; Overturf, Ken; Towner, Richard; Narum, Shawn R

    2014-10-28

    Recent advances in genotyping-by-sequencing have enabled genome-wide association studies in nonmodel species including those in aquaculture programs. As with other aquaculture species, rainbow trout and steelhead (Oncorhynchus mykiss) are susceptible to disease and outbreaks can lead to significant losses. Fish culturists have therefore been pursuing strategies to prevent losses to common pathogens such as Flavobacterium psychrophilum (the etiological agent for bacterial cold water disease [CWD]) and infectious hematopoietic necrosis virus (IHNV) by adjusting feed formulations, vaccine development, and selective breeding. However, discovery of genetic markers linked to disease resistance offers the potential to use marker-assisted selection to increase resistance and reduce outbreaks. For this study we sampled juvenile fish from 40 families from 2-yr classes that either survived or died after controlled exposure to either CWD or IHNV. Restriction site-associated DNA sequencing produced 4661 polymorphic single-nucleotide polymorphism loci after strict filtering. Genotypes from individual survivors and mortalities were then used to test for association between disease resistance and genotype at each locus using the program TASSEL. After we accounted for kinship and stratification of the samples, tests revealed 12 single-nucleotide polymorphism markers that were highly associated with resistance to CWD and 19 markers associated with resistance to IHNV. These markers are candidates for further investigation and are expected to be useful for marker assisted selection in future broodstock selection for various aquaculture programs.
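
    The per-locus association test described above can be illustrated with a deliberately simple stand-in. The study used the program TASSEL and accounted for kinship and stratification; the sketch below does neither and only computes a Pearson chi-square on allele counts for survivors versus mortalities. The function name, locus names, and all counts are hypothetical.

```python
def allele_chi_square(survivor_counts, mortality_counts):
    """Pearson chi-square for a 2x2 table of (reference, alternate) allele counts.

    Rows are outcome groups (survivors, mortalities); columns are alleles.
    Returns 0.0 when a row or column is empty, since no test is possible.
    """
    table = [list(survivor_counts), list(mortality_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        return 0.0
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical allele counts: (reference, alternate) in survivors, then in mortalities.
loci = {
    "locus_A": ((60, 40), (40, 60)),  # allele frequencies differ between groups
    "locus_B": ((50, 50), (50, 50)),  # no difference
}
ranked = sorted(loci, key=lambda name: allele_chi_square(*loci[name]), reverse=True)
print(ranked)
```

    A naive test like this is prone to false positives when families are unevenly represented among survivors, which is the reason for correcting for kinship and stratification before declaring markers associated.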

  14. Mapping of sex-linked genes onto the genome sequence using various aberrations of the Z chromosome in Bombyx mori.

    PubMed

    Fujii, Tsuguru; Abe, Hiroaki; Katsuma, Susumu; Mita, Kazuei; Shimada, Toru

    2008-12-01

    Many strains of Bombyx mori carry chromosomal aberrations, and they are useful resources for integration between phenotypes and genomic sequences. We compared the molecular structures of three kinds of Z chromosomes, i.e., two strains with chromosome deletions and one strain with translocation involving the Z chromosome. Using polymerase chain reaction markers, we showed that: (1) the Z(1) chromosome lacks more than 6 Mb, including the proximal end; (2) the Z(Vg) chromosome lacks 1.5 Mb in the interstitial portion; and (3) the +(od)p(Sa)+(p)W carries a 0.6-Mb Z-derived fragment surrounding the +(od) gene. The breakpoint junctions of these deletions and a translocation were precisely determined. Through deletion mapping, we narrowed down the regions where distinct oily (od), vestigial (Vg), and muscle dystrophy (Md) are located and identified a candidate gene for od. A retroposon-mediated deletion in BmBLOS2, the Bombyx gene homologous to human "biogenesis of lysosome-related organelles complex-1, subunit 2", was detected in the od mutant. Although the genes responsible for Vg and Md were not definitively identified, we propose the candidate genes on the basis of their locations and phenotypes.

  15. Homozygosity Mapping and Targeted Sanger Sequencing Identifies Three Novel CRB1 (Crumbs homologue 1) Mutations in Iranian Retinal Degeneration Families

    PubMed

    Ghofrani, Mohammad; Yahyaei, Mahin; Brunner, Han G.; Cremers, Frans P.M.; Movasat, Morteza; Imran Khan, Muhammad; Keramatipour, Mohammad

    2017-09-01

    Inherited retinal diseases (IRDs) are a group of genetic disorders with high degrees of clinical, genetic and allelic heterogeneity. IRDs generally show progressive retinal cell death resulting in gradual vision loss. IRDs constitute a broad spectrum of disorders including retinitis pigmentosa and Leber congenital amaurosis. In this study, we performed genotyping studies to identify the underlying mutations in three Iranian families. Having employed homozygosity mapping and Sanger sequencing, we identified the underlying mutations in the crumbs homologue 1 gene. The CRB1 protein is a part of a macromolecular complex with a vital role in retinal cell polarity, morphogenesis, and maintenance. We identified a novel homozygous variant (c.1053_1061del; p.Gly352_Cys354del) in one family, a combination of a novel (c.2086T>C; p.Cys696Arg) and a known variant (c.2234C>T, p.Thr745Met) in another family and a homozygous novel variant (c.3090T>A; p.Asn1030Lys) in a third family. This study shows that mutations in CRB1 are relatively common in Iranian non-syndromic IRD patients.
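
    Homozygosity mapping, as used in the study above, looks for chromosomal stretches where affected relatives are homozygous for the same alleles, and then sequences candidate genes inside those stretches. The sketch below shows that core search on toy data. It is a minimal illustration under stated assumptions: genotypes are two-letter strings ordered along one chromosome, and the function name, the `min_markers` threshold, and the sample genotypes are all hypothetical rather than taken from the paper.

```python
def shared_homozygous_runs(affected_genotypes, min_markers=3):
    """Return (start, end) marker-index ranges (inclusive) where every affected
    individual is homozygous for the same allele at each marker in the range."""
    n_markers = len(affected_genotypes[0])
    runs = []
    start = None
    for i in range(n_markers):
        call = affected_genotypes[0][i]
        shared = call[0] == call[1] and all(
            person[i] == call for person in affected_genotypes
        )
        if shared and start is None:
            start = i  # a new candidate run begins here
        elif not shared and start is not None:
            if i - start >= min_markers:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and n_markers - start >= min_markers:
        runs.append((start, n_markers - 1))
    return runs


# Two hypothetical affected siblings, seven markers in chromosomal order.
sib1 = ["AB", "AA", "BB", "AA", "AA", "AB", "BB"]
sib2 = ["AA", "AA", "BB", "AA", "AA", "BB", "AB"]
print(shared_homozygous_runs([sib1, sib2]))
```

    Real analyses work on physical or genetic distances rather than marker counts and tolerate occasional genotyping errors; both refinements are left out here.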

  16. Homozygosity Mapping and Targeted Sanger Sequencing Identifies Three Novel CRB1 (Crumbs homologue 1) Mutations in Iranian Retinal Degeneration Families

    PubMed Central

    Ghofrani, Mohammad; Yahyaei, Mahin; Brunner, Han G.; Cremers, Frans P.M.; Movasat, Morteza; Khan, Muhammad Imran; Keramatipour, Mohammad

    2017-01-01

    Background: Inherited retinal diseases (IRDs) are a group of genetic disorders with high degrees of clinical, genetic and allelic heterogeneity. IRDs generally show progressive retinal cell death resulting in gradual vision loss. IRDs constitute a broad spectrum of disorders including retinitis pigmentosa and Leber congenital amaurosis. In this study, we performed genotyping studies to identify the underlying mutations in three Iranian families. Methods: Having employed homozygosity mapping and Sanger sequencing, we identified the underlying mutations in the crumbs homologue 1 gene. The CRB1 protein is a part of a macromolecular complex with a vital role in retinal cell polarity, morphogenesis, and maintenance. Results: We identified a novel homozygous variant (c.1053_1061del; p.Gly352_Cys354del) in one family, a combination of a novel (c.2086T>C; p.Cys696Arg) and a known variant (c.2234C>T, p.Thr745Met) in another family and a homozygous novel variant (c.3090T>A; p.Asn1030Lys) in a third family. Conclusion: This study shows that mutations in CRB1 are relatively common in Iranian non-syndromic IRD patients.

  17. Resequencing the whole MYH7 gene (including the intronic, promoter, and 3' UTR sequences) in hypertrophic cardiomyopathy.

    PubMed

    Coto, Eliecer; Reguero, Julián R; Palacín, María; Gómez, Juan; Alonso, Belén; Iglesias, Sara; Martín, María; Tavira, Beatriz; Díaz-Molina, Beatriz; Morales, Carlos; Morís, César; Rodríguez-Lambert, José L; Corao, Ana I; Díaz, Marta; Alvarez, Victoria

    2012-09-01

    MYH7 mutations are found in ~20% of hypertrophic cardiomyopathy (HCM) patients. Currently, mutational analysis is based on the sequencing of the coding exons and a few exon-flanking intronic nucleotides, resulting in omission of single-exon deletions and mutations in internal intronic, promoter, and 3' UTR regions. We amplified and sequenced large MYH7 fragments in 60 HCM patients without previously identified sarcomere mutations. Lack of aberrant PCR fragments excluded single-exon deletions in the patients. Instead, we identified several new rare intronic variants. An intron 26 single nucleotide insertion (-5 insC) was predicted to affect pre-mRNA splicing, but allele frequencies did not differ between patients and controls (n = 150). We found several rare promoter variants in the patients compared to controls, some of which were in binding sites for transcription factors and could thus affect gene expression. Only one rare 3' UTR variant (c.*29T>C) found in the patients was absent among the controls. This nucleotide change would not affect the binding of known microRNAs. Therefore, MYH7 mutations outside the coding exon sequences would be rarely found among HCM patients. However, changes in the promoter region could be linked to the risk of developing HCM. Further research to define the functional effect of these variants on gene expression is necessary to confirm the role of the MYH7 promoter in cardiac hypertrophy. Copyright © 2012 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  18. SNP Assay Development for Linkage Map Construction, Anchoring Whole-Genome Sequence, and Other Genetic and Genomic Applications in Common Bean

    SciTech Connect

    Song, Qijian; Jia, Gaofeng; Hyten, David L.; Jenkins, Jerry; Hwang, Eun-Young; Schroeder, Steven G.; Osorno, Juan M.; Schmutz, Jeremy; Jackson, Scott A.; McClean, Phillip E.; Cregan, Perry B.

    2015-08-28

    A total of 992,682 single-nucleotide polymorphisms (SNPs) was identified as ideal for Illumina Infinium II BeadChip design after sequencing a diverse set of 17 common bean (Phaseolus vulgaris L) varieties with the aid of next-generation sequencing technology. From these, two BeadChips each with >5000 SNPs were designed. The BARCBean6K_1 BeadChip was selected for the purpose of optimizing polymorphism among market classes and, when possible, SNPs were targeted to sequence scaffolds in the Phaseolus vulgaris 14× genome assembly with sequence lengths >10 kb. The BARCBean6K_2 BeadChip was designed with the objective of anchoring additional scaffolds and to facilitate orientation of large scaffolds. Analysis of 267 F2 plants from a cross of varieties Stampede × Red Hawk with the two BeadChips resulted in linkage maps with a total of 7040 markers including 7015 SNPs. With the linkage map, a total of 432.3 Mb of sequence from 2766 scaffolds was anchored to create the Phaseolus vulgaris v1.0 assembly, which accounted for approximately 89% of the 487 Mb of available sequence scaffolds of the Phaseolus vulgaris v0.9 assembly. A core set of 6000 SNPs (BARCBean6K_3 BeadChip) with high genotyping quality and polymorphism was selected based on the genotyping of 365 dry bean and 134 snap bean accessions with the BARCBean6K_1 and BARCBean6K_2 BeadChips. The BARCBean6K_3 BeadChip is a useful tool for genetics and genomics research and it is widely used by breeders and geneticists in the United States and abroad.

  19. SNP Assay Development for Linkage Map Construction, Anchoring Whole-Genome Sequence, and Other Genetic and Genomic Applications in Common Bean.

    PubMed

    Song, Qijian; Jia, Gaofeng; Hyten, David L; Jenkins, Jerry; Hwang, Eun-Young; Schroeder, Steven G; Osorno, Juan M; Schmutz, Jeremy; Jackson, Scott A; McClean, Phillip E; Cregan, Perry B

    2015-08-28

    A total of 992,682 single-nucleotide polymorphisms (SNPs) was identified as ideal for Illumina Infinium II BeadChip design after sequencing a diverse set of 17 common bean (Phaseolus vulgaris L) varieties with the aid of next-generation sequencing technology. From these, two BeadChips each with >5000 SNPs were designed. The BARCBean6K_1 BeadChip was selected for the purpose of optimizing polymorphism among market classes and, when possible, SNPs were targeted to sequence scaffolds in the Phaseolus vulgaris 14× genome assembly with sequence lengths >10 kb. The BARCBean6K_2 BeadChip was designed with the objective of anchoring additional scaffolds and to facilitate orientation of large scaffolds. Analysis of 267 F2 plants from a cross of varieties Stampede × Red Hawk with the two BeadChips resulted in linkage maps with a total of 7040 markers including 7015 SNPs. With the linkage map, a total of 432.3 Mb of sequence from 2766 scaffolds was anchored to create the Phaseolus vulgaris v1.0 assembly, which accounted for approximately 89% of the 487 Mb of available sequence scaffolds of the Phaseolus vulgaris v0.9 assembly. A core set of 6000 SNPs (BARCBean6K_3 BeadChip) with high genotyping quality and polymorphism was selected based on the genotyping of 365 dry bean and 134 snap bean accessions with the BARCBean6K_1 and BARCBean6K_2 BeadChips. The BARCBean6K_3 BeadChip is a useful tool for genetics and genomics research and it is widely used by breeders and geneticists in the United States and abroad. Copyright © 2015 Song et al.

  20. SNP Assay Development for Linkage Map Construction, Anchoring Whole-Genome Sequence, and Other Genetic and Genomic Applications in Common Bean

    PubMed Central

    Song, Qijian; Jia, Gaofeng; Hyten, David L.; Jenkins, Jerry; Hwang, Eun-Young; Schroeder, Steven G.; Osorno, Juan M.; Schmutz, Jeremy; Jackson, Scott A.; McClean, Phillip E.; Cregan, Perry B.

    2015-01-01

    A total of 992,682 single-nucleotide polymorphisms (SNPs) was identified as ideal for Illumina Infinium II BeadChip design after sequencing a diverse set of 17 common bean (Phaseolus vulgaris L) varieties with the aid of next-generation sequencing technology. From these, two BeadChips each with >5000 SNPs were designed. The BARCBean6K_1 BeadChip was selected for the purpose of optimizing polymorphism among market classes and, when possible, SNPs were targeted to sequence scaffolds in the Phaseolus vulgaris 14× genome assembly with sequence lengths >10 kb. The BARCBean6K_2 BeadChip was designed with the objective of anchoring additional scaffolds and to facilitate orientation of large scaffolds. Analysis of 267 F2 plants from a cross of varieties Stampede × Red Hawk with the two BeadChips resulted in linkage maps with a total of 7040 markers including 7015 SNPs. With the linkage map, a total of 432.3 Mb of sequence from 2766 scaffolds was anchored to create the Phaseolus vulgaris v1.0 assembly, which accounted for approximately 89% of the 487 Mb of available sequence scaffolds of the Phaseolus vulgaris v0.9 assembly. A core set of 6000 SNPs (BARCBean6K_3 BeadChip) with high genotyping quality and polymorphism was selected based on the genotyping of 365 dry bean and 134 snap bean accessions with the BARCBean6K_1 and BARCBean6K_2 BeadChips. The BARCBean6K_3 BeadChip is a useful tool for genetics and genomics research and it is widely used by breeders and geneticists in the United States and abroad. PMID:26318155

  1. SNP Assay Development for Linkage Map Construction, Anchoring Whole-Genome Sequence, and Other Genetic and Genomic Applications in Common Bean

    DOE PAGES

    Song, Qijian; Jia, Gaofeng; Hyten, David L.; ...

    2015-08-28

    A total of 992,682 single-nucleotide polymorphisms (SNPs) was identified as ideal for Illumina Infinium II BeadChip design after sequencing a diverse set of 17 common bean (Phaseolus vulgaris L) varieties with the aid of next-generation sequencing technology. From these, two BeadChips each with >5000 SNPs were designed. The BARCBean6K_1 BeadChip was selected for the purpose of optimizing polymorphism among market classes and, when possible, SNPs were targeted to sequence scaffolds in the Phaseolus vulgaris 14× genome assembly with sequence lengths >10 kb. The BARCBean6K_2 BeadChip was designed with the objective of anchoring additional scaffolds and to facilitate orientation of large scaffolds. Analysis of 267 F2 plants from a cross of varieties Stampede × Red Hawk with the two BeadChips resulted in linkage maps with a total of 7040 markers including 7015 SNPs. With the linkage map, a total of 432.3 Mb of sequence from 2766 scaffolds was anchored to create the Phaseolus vulgaris v1.0 assembly, which accounted for approximately 89% of the 487 Mb of available sequence scaffolds of the Phaseolus vulgaris v0.9 assembly. A core set of 6000 SNPs (BARCBean6K_3 BeadChip) with high genotyping quality and polymorphism was selected based on the genotyping of 365 dry bean and 134 snap bean accessions with the BARCBean6K_1 and BARCBean6K_2 BeadChips. The BARCBean6K_3 BeadChip is a useful tool for genetics and genomics research and it is widely used by breeders and geneticists in the United States and abroad.

  2. Glacial and periglacial geomorphology and its paleoclimatological significance in three North Ethiopian Mountains, including a detailed geomorphological map

    NASA Astrophysics Data System (ADS)

    Hendrickx, Hanne; Jacob, Miro; Frankl, Amaury; Nyssen, Jan

    2015-10-01

    Geomorphological investigations and detailed mapping of past and present (peri)glacial landforms are required in order to understand the impact of climatic anomalies. The Ethiopian Highlands show a great variety in past and contemporary climate, and therefore, in the occurrence of glacial and periglacial landforms. However, only a few mountain areas have been studied, and detailed geomorphological understanding is lacking. In order to allow a fine reconstruction of the impact of the past glacial cycle on the geomorphology, vegetation complexes, and temperature anomalies, a detailed geomorphological map of three mountain areas (Mt. Ferrah Amba, 12°51′N 39°29′E; Mt. Lib Amba, 12°04′N 39°22′; and Mt. Abuna Yosef, 12°08′N 39°11′E) was produced. In all three study areas, inactive solifluction lobes, presumably from the Last Glacial Maximum (LGM), were found. In the highest study area of Abuna Yosef, three sites were discovered bearing morainic material from small late Pleistocene glaciers. These marginal glaciers occurred below the modeled snowline and existed because of local topo-climatic conditions. Evidence of such Pleistocene avalanche-fed glaciers in Ethiopia (and Africa) has not been produced earlier. Current frost action is limited to frost cracks and small-scale patterned ground phenomena. The depression of the altitudinal belts of periglacial and glacial processes during the last cold period was assessed through periglacial and glacial landform mapping and comparisons with data from other mountain areas taking latitude into account. The depression of glacial and periglacial belts of approximately 600 m implies a temperature drop around 6 °C in the last cold period. This cooling is in line with temperature depressions elsewhere in East Africa during the LGM. This study serves as a case study for all the intermediate mountains (3500-4200 m) of the North Ethiopian highlands.
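
    The step from a 600 m belt depression to a cooling of about 6 °C can be written as a worked relation. This is a sketch that assumes a simple linear change of temperature with altitude; the symbol Γ (an effective temperature gradient) is introduced here for illustration only, and the gradient the authors actually applied is not stated in the abstract.

```latex
\Delta T \approx \Gamma \,\Delta h
\quad\Longrightarrow\quad
\Gamma \approx \frac{\Delta T}{\Delta h}
= \frac{6\ ^{\circ}\mathrm{C}}{600\ \mathrm{m}}
= 1\ ^{\circ}\mathrm{C}\ \text{per}\ 100\ \mathrm{m}
```

    Read this way, the two quoted figures together correspond to roughly 1 °C of cooling per 100 m of belt depression.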

  3. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes

    USDA-ARS's Scientific Manuscript database

    Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these s...

  4. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    PubMed Central

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims at resolving these issues by allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583
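
    The information content that a sequence logo displays, and the effect of pseudo counts on it, can be sketched in a few lines. The example below is an illustration under simplifying assumptions, not Seq2Logo's method: it uses a flat pseudo count added equally to every amino acid, a uniform background, and no sequence weighting, and it does not produce the two-sided enrichment/depletion representation. The function names and peptides are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_information(column, pseudo_count=1.0):
    """Shannon information content (bits) of one alignment column.

    A flat pseudo count is added to each of the 20 amino acids (an assumption
    made for this sketch). Gap or unknown characters are ignored.
    """
    counts = Counter(column)
    observed = sum(counts[aa] for aa in AMINO_ACIDS)
    total = observed + pseudo_count * len(AMINO_ACIDS)
    if total == 0:
        return 0.0
    entropy = 0.0
    for aa in AMINO_ACIDS:
        p = (counts[aa] + pseudo_count) / total
        if p > 0:
            entropy -= p * math.log2(p)
    # Maximum entropy for 20 equiprobable residues minus the observed entropy.
    return math.log2(len(AMINO_ACIDS)) - entropy


def logo_heights(peptides, pseudo_count=1.0):
    """Total stack height per position for equal-length peptides."""
    return [column_information(col, pseudo_count) for col in zip(*peptides)]


peptides = ["ACD", "ACE"]  # hypothetical two-peptide alignment
print([round(h, 3) for h in logo_heights(peptides, pseudo_count=0.0)])
print([round(h, 3) for h in logo_heights(peptides, pseudo_count=1.0)])
```

    With only two observations, the pseudo count pulls every column strongly toward the background, which is the intended correction for a low number of observations.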

  5. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion.

    PubMed

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-07-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims at resolving these issues by allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed).

  6. Genome-wide mapping of ORC and Mcm2p binding sites on tiling arrays and identification of essential ARS consensus sequences in S. cerevisiae.

    PubMed

    Xu, Weihong; Aparicio, Jennifer G; Aparicio, Oscar M; Tavaré, Simon

    2006-10-26

    Eukaryotic replication origins exhibit different initiation efficiencies and activation times within S-phase. Although local chromatin structure and function influences origin activity, the exact mechanisms remain poorly understood. A key to understanding the exact features of chromatin that impinge on replication origin function is to define the precise locations of the DNA sequences that control origin function. In S. cerevisiae, Autonomously Replicating Sequences (ARSs) contain a consensus sequence (ACS) that binds the Origin Recognition Complex (ORC) and is essential for origin function. However, an ACS is not sufficient for origin function and the majority of ACS matches do not function as ORC binding sites, complicating the specific identification of these sites. To identify essential origin sequences genome-wide, we utilized a tiled oligonucleotide array (NimbleGen) to map the ORC and Mcm2p binding sites at high resolution. These binding sites define a set of potential Autonomously Replicating Sequences (ARSs), which we term nimARSs. The nimARS set comprises 529 ORC and/or Mcm2p binding sites, which includes 95% of known ARSs, and experimental verification demonstrates that 94% are functional. The resolution of the analysis facilitated identification of potential ACSs (nimACSs) within 370 nimARSs. Cross-validation shows that the nimACS predictions include 58% of known ACSs, and experimental verification indicates that 82% are essential for ARS activity. These findings provide the most comprehensive, accurate, and detailed mapping of ORC binding sites to date, adding to the emerging picture of the chromatin organization of the budding yeast genome.
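
    The abstract above notes that most ACS matches are not ORC binding sites, so a consensus match is a necessary but not sufficient signal. The sketch below shows what such a consensus scan looks like, using the commonly cited 11-bp ARS consensus written in IUPAC codes (WTTTAYRTTTW). Whether the authors scanned with exactly this string or with a richer motif model is not stated in the abstract, so treat the motif choice, the function names, and the test sequence as assumptions.

```python
import re

# IUPAC codes needed for the consensus below (W = A/T, Y = C/T, R = A/G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "Y": "[CT]", "R": "[AG]"}
ACS_CONSENSUS = "WTTTAYRTTTW"


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_consensus_matches(seq, consensus=ACS_CONSENSUS):
    """Return sorted (start, strand) pairs for every consensus match.

    start is a 0-based coordinate on the forward strand; a lookahead is used
    so that overlapping matches are all reported.
    """
    body = "".join(IUPAC[code] for code in consensus)
    pattern = re.compile("(?=(" + body + "))")
    seq = seq.upper()
    width = len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    # Convert reverse-strand offsets back to forward-strand coordinates.
    hits += [(len(seq) - m.start() - width, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


example = "GGG" + "ATTTATGTTTA" + "CCC"  # hypothetical sequence with one match
print(find_consensus_matches(example))
```

    A scan like this over an AT-rich genome returns many candidates, which is why binding data such as the ORC and Mcm2p array signal are needed to decide which matches matter.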

  7. Genomic organization and sequence of the human NRAMP gene: identification and mapping of a promoter region polymorphism.

    PubMed Central

    Blackwell, J. M.; Barton, C. H.; White, J. K.; Searle, S.; Baker, A. M.; Williams, H.; Shaw, M. A.

    1995-01-01

    BACKGROUND: Murine Nramp is a candidate for the macrophage resistance gene Ity/Lsh/Bcg. Sequence analysis of human NRAMP was undertaken to determine its role in man. MATERIALS AND METHODS: A yeast artificial chromosome carrying NRAMP was subcloned and positive clones sequenced. The transcriptional start site was mapped using 5' RACE PCR. Polymorphic variants were amplified by PCR. Linkage analysis was used to map NRAMP. RESULTS: NRAMP spans 12kb and has 15 exons encoding a 550 amino acid protein showing 85% identity (92% similarity) with Nramp. Two conserved PKC sites occur in exon 2 encoding the Pro/Ser rich SH3 binding domain, and in exon 3. Striking sequence similarities (57 and 53%) were observed with yeast mitochondrial proteins, SMF1 and SMF2, especially within putative functional domains: exon 6 encoding the second transmembrane spanning domain, site of the murine susceptibility mutation; and exon 11 encoding a conserved transport motif. No mutations comparable to the murine susceptibility mutation were found. The transcriptional initiation site mapped 148 bp 5' of the translational initiation codon. 440bp of 5' flanking sequence contained putative promoter region elements: 6 interferon-gamma response elements, 3 W-elements, 3 NF kappa B binding sites and 1 AP-1 site. Nine purine-rich GGAA core motifs for the myeloid-specific PU.1 transcription factor were identified, two combining with imperfect AP1-like sites to create PEA3 motifs. TATA, GC and CCAAT boxes were absent. A possible enhancer element containing the Z-DNA forming dinucleotide repeat t(gt),ac(gt),ac(gt),g was polymorphic (4 alleles; n = 4,9,10,11), and was used to map NRAMP to 2q35. CONCLUSIONS: This analysis provides important resources to study the role of NRAMP in human disease. PMID:8529098

  8. Partial DNA sequencing of Douglas-fir cDNAs used in RFLP mapping

    Treesearch

    K.D. Jermstad; D.L. Bassoni; C.S. Kinlaw; D.B. Neale

    1998-01-01

    DNA sequences from 87 Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco) cDNA RFLP probes were determined. Sequences were submitted to the GenBank dbEST database and searched for similarity against nucleotide and protein databases using the BLASTn and BLASTx programs. Twenty-one sequences (24%) were assigned putative functions; 18 of which...

  9. Construction of a High-Density American Cranberry (Vaccinium macrocarpon Ait.) Composite Map Using Genotyping-by-Sequencing for Multi-pedigree Linkage Mapping

    PubMed Central

    Schlautman, Brandon; Covarrubias-Pazaran, Giovanny; Diaz-Garcia, Luis; Iorizzo, Massimo; Polashock, James; Grygleski, Edward; Vorsa, Nicholi; Zalapa, Juan

    2017-01-01

    The American cranberry (Vaccinium macrocarpon Ait.) is a recently domesticated, economically important, fruit crop with limited molecular resources. New genetic resources could accelerate genetic gain in cranberry through characterization of its genomic structure and by enabling molecular-assisted breeding strategies. To increase the availability of cranberry genomic resources, genotyping-by-sequencing (GBS) was used to discover and genotype thousands of single nucleotide polymorphisms (SNPs) within three interrelated cranberry full-sib populations. Additional simple sequence repeat (SSR) loci were added to the SNP datasets and used to construct bin maps for the parents of the populations, which were then merged to create the first high-density cranberry composite map containing 6073 markers (5437 SNPs and 636 SSRs) on 12 linkage groups (LGs) spanning 1124 cM. Interestingly, higher rates of recombination were observed in maternal than paternal gametes. The large number of markers in common (mean of 57.3) and the high degree of observed collinearity (mean Pair-wise Spearman rank correlations >0.99) between the LGs of the parental maps demonstrates the utility of GBS in cranberry for identifying polymorphic SNP loci that are transferable between pedigrees and populations in future trait-association studies. Furthermore, the high-density of markers anchored within the component maps allowed identification of segregation distortion regions, placement of centromeres on each of the 12 LGs, and anchoring of genomic scaffolds. Collectively, the results represent an important contribution to the current understanding of cranberry genomic structure and to the availability of molecular tools for future genetic research and breeding efforts in cranberry. PMID:28250016
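
    The collinearity measure quoted above is a Spearman rank correlation between the positions of shared markers on two maps. The sketch below computes it from scratch so the calculation is visible. It is a minimal illustration: the function names and the marker positions are hypothetical, and the abstract does not describe how the authors handled ties or which software they used.

```python
def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical cM positions of four shared markers on two parental maps.
map_one = [0.0, 5.1, 12.3, 20.0]
map_two = [0.0, 4.0, 15.2, 18.9]  # same order, different distances
print(spearman(map_one, map_two))
print(spearman([1, 2, 3, 4], [1, 3, 2, 4]))  # one adjacent pair swapped
```

    Because only ranks enter the calculation, maps that agree on marker order score 1 even when their genetic distances differ, which makes the statistic a test of order rather than of map length.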

  10. Construction of a High-Density American Cranberry (Vaccinium macrocarpon Ait.) Composite Map Using Genotyping-by-Sequencing for Multi-pedigree Linkage Mapping.

    PubMed

    Schlautman, Brandon; Covarrubias-Pazaran, Giovanny; Diaz-Garcia, Luis; Iorizzo, Massimo; Polashock, James; Grygleski, Edward; Vorsa, Nicholi; Zalapa, Juan

    2017-04-03

    The American cranberry (Vaccinium macrocarpon Ait.) is a recently domesticated, economically important fruit crop with limited molecular resources. New genetic resources could accelerate genetic gain in cranberry through characterization of its genomic structure and by enabling molecular-assisted breeding strategies. To increase the availability of cranberry genomic resources, genotyping-by-sequencing (GBS) was used to discover and genotype thousands of single nucleotide polymorphisms (SNPs) within three interrelated cranberry full-sib populations. Additional simple sequence repeat (SSR) loci were added to the SNP datasets and used to construct bin maps for the parents of the populations, which were then merged to create the first high-density cranberry composite map containing 6073 markers (5437 SNPs and 636 SSRs) on 12 linkage groups (LGs) spanning 1124 cM. Interestingly, higher rates of recombination were observed in maternal than paternal gametes. The large number of markers in common (mean of 57.3) and the high degree of observed collinearity (mean pairwise Spearman rank correlations >0.99) between the LGs of the parental maps demonstrate the utility of GBS in cranberry for identifying polymorphic SNP loci that are transferable between pedigrees and populations in future trait-association studies. Furthermore, the high density of markers anchored within the component maps allowed identification of segregation distortion regions, placement of centromeres on each of the 12 LGs, and anchoring of genomic scaffolds. Collectively, the results represent an important contribution to the current understanding of cranberry genomic structure and to the availability of molecular tools for future genetic research and breeding efforts in cranberry. Copyright © 2017 Schlautman et al.

  11. Porcine PPARGC1A (peroxisome proliferative activated receptor gamma coactivator 1A): coding sequence, genomic organization, polymorphisms and mapping.

    PubMed

    Jacobs, K; Rohrer, G; Van Poucke, M; Piumi, F; Yerle, M; Barthenschlager, H; Mattheeuws, M; Van Zeveren, A; Peelman, L J

    2006-01-01

    We report here the characterisation of porcine PPARGC1A. Primers based on human PPARGC1A were used to isolate two porcine BAC clones. Porcine coding sequences of PPARGC1A were sequenced together with the splice site regions and the 5' and 3' regions. Using direct sequencing, nine SNPs were found. Allele frequencies were determined in unrelated animals of five different pig breeds. In the MARC Meishan-White Composite resource population, the polymorphism in exon 9 was significantly associated with leaf fat weight. PPARGC1A has been mapped by FISH to SSC8p21. A (CA)n microsatellite (SGU0001) has been localised near marker SWR1101 on chromosome 8 by RH mapping and at the same position as marker KS195 (32.5 cM) by linkage mapping. The AseI (nt857, Asn/Asn489) polymorphism in exon 8 was used to perform linkage analysis in the Hohenheim pedigrees, which located the gene in the same genomic region. Transcription of the gene was detected in adipose, muscle, kidney, liver, brain, heart and adrenal gland tissues, which is in agreement with the function of PPARGC1A in adaptive thermogenesis. Copyright 2006 S. Karger AG, Basel.

  12. Construction of the first high-density genetic linkage map of Salvia miltiorrhiza using specific length amplified fragment (SLAF) sequencing

    PubMed Central

    Liu, Tian; Guo, Linlin; Pan, Yuling; Zhao, Qi; Wang, Jianhua; Song, Zhenqiao

    2016-01-01

    Salvia miltiorrhiza is an important medicinal crop in traditional Chinese medicine (TCM). Knowledge of its genetic foundation is limited because sufficient molecular markers have not been developed, and therefore a high-density genetic linkage map is incomplete. Specific length amplified fragment sequencing (SLAF-seq) is a recently developed high-throughput strategy for large-scale single nucleotide polymorphism (SNP) discovery and genotyping based on next-generation sequencing (NGS). In this study, genomic DNA extracted from two parents and their 96 F1 individuals was subjected to high-throughput sequencing and SLAF library construction. A total of 155.96 Mb of data containing 155,958,181 paired-end reads were obtained after preprocessing. The average coverage of each SLAF marker was 83.43-fold for the parents compared with 10.36-fold for the F1 offspring. The final linkage map consists of 5,164 SLAFs in 8 linkage groups (LGs) and spans 1,516.43 cM, with an average distance of 0.29 cM between adjacent markers. The results will not only provide a platform for mapping quantitative trait loci but also offer a critical new tool for S. miltiorrhiza biotechnology and comparative genomics as well as a valu