Science.gov

Sample records for sequence including maps

  1. Physical mapping 220 kb centromeric of the human MHC and DNA sequence analysis of the 43-kb segment including the RING1, HKE6, and HKE4 genes.

    PubMed

    Kikuti, Y Y; Tamiya, G; Ando, A; Chen, L; Kimura, M; Ferreira, E; Tsuji, K; Trowsdale, J; Inoko, H

    1997-06-15

    A cosmid contig was constructed from a YAC clone with a 220-kb insert that spans the centromeric side of the human MHC class II region, corresponding to the mouse t complex. The gene order was identified as HSET-HKE1.5-HKE2-HKE3-RING1-HKE6-HKE4 (RING5). The genomic sequence of a 42,801-bp region carried by one cosmid clone in the RING1, HKE6, and HKE4 subregions was determined by the shotgun method. The exon-intron organization of these three genes, RING1 (Ring finger protein), HKE6 (steroid dehydrogenase-like protein), and HKE4 (transmembrane protein with histidine-rich charge clusters), was determined. The previously reported RING2 gene was revealed to be identical to HKE6. Transcripts from HKE4 were detected in the placenta, lung, kidney, and pancreas; those of HKE6 were found in the liver and pancreas. The 25-kb region proximal to the RING1 gene contains an extensive, dense cluster of Alu repeats (about 1.2 Alu per kb), and no gene has been identified in this region so far. The region is equivalent to part of the mouse t complex and could be relevant to human development. PMID:9205114

  2. Genetic mapping and DNA sequencing

    SciTech Connect

    Speed, T.; Waterman, M.S.

    1996-12-31

    The Human Genome Initiative has as its primary objective the characterization of the human genome. High-resolution linkage maps of genetic markers will play an important role in completing the human genome project. This is one of two volumes based on the proceedings of the 1994 IMA Summer Program on Molecular Biology and comprises Weeks 1 and 2 of the four-week program. This volume focuses on genetic mapping and DNA sequencing. Selected papers are indexed separately for inclusion in the Energy Science and Technology Database.

  3. Benchmarking short sequence mapping tools

    PubMed Central

    2013-01-01

    Background The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time consuming and demands the development of fast and accurate alignment tools. However, currently proposed tools make different compromises between mapping accuracy and speed. Moreover, many important aspects are overlooked when the performance of a newly developed tool is compared with the state of the art. Therefore, there is a need for an objective evaluation method that covers all of these aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison. Results We applied our benchmarking tests to nine well-known mapping tools, namely Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST), using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests, while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be applied to others. Conclusion The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, no tool outperforms all of the others in all the tests. Therefore, end users should clearly specify their needs in order to choose the tool that provides the best results. PMID:23758764
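
    To illustrate the reference-indexing strategy contrasted with read hashing above, here is a minimal Python sketch, not the method of any tool named in the record: every k-mer of a toy reference is stored in a hash index, a read's first k-mer is used as a seed, and each candidate position is then verified by exact comparison. The function names and the toy sequences are hypothetical, and real mappers additionally tolerate mismatches and gaps.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def map_read_exact(read: str, reference: str, index: dict, k: int) -> list:
    """Return every reference position where the whole read matches exactly.

    The read's first k-mer is used as a seed to look up candidate positions,
    and each candidate is then verified against the reference.
    """
    if len(read) < k:
        return []
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


reference = "ACGTACGTTTGACGTA"
index = build_kmer_index(reference, k=4)
print(map_read_exact("ACGTT", reference, index, k=4))  # [4]
print(map_read_exact("ACGTA", reference, index, k=4))  # [0, 11]
```

    The second read maps to two positions, which is exactly the kind of multi-location ("ambiguous") hit that mapping tools must report or resolve.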

  4. Sequence finishing and mapping of Drosophila melanogaster heterochromatin

    SciTech Connect

    Hoskins, Roger A.; Carlson, Joseph W.; Kennedy, Cameron; Acevedo, David; Evans-Holm, Martha; Frise, Erwin; Wan, Kenneth H.; Park, Soo; Mendez-Lago, Maria; Rossi, Fabrizio; Villasante, Alfredo; Dimitri, Patrizio; Karpen, Gary H.; Celniker, Susan E.

    2007-06-15

    Genome sequences for most metazoans are incomplete due to the presence of repeated DNA in the pericentromeric heterochromatin. The heterochromatic regions of D. melanogaster contain 20 Mb of sequence amenable to mapping, sequence assembly and finishing. Here we describe the generation of 15 Mb of finished or improved heterochromatic sequence using available clone resources and assembly and mapping methods. We also constructed a BAC-based physical map that spans approximately 13 Mb of the pericentromeric heterochromatin, and a cytogenetic map that positions approximately 11 Mb of BAC contigs and sequence scaffolds in specific chromosomal locations. The integrated sequence assembly and maps greatly improve our understanding of the structure and composition of this poorly understood fraction of a metazoan genome and provide a framework for functional analyses.

  5. A Statistical Approach for Ambiguous Sequence Mappings

    Technology Transfer Automated Retrieval System (TEKTRAN)

    When attempting to map RNA sequences to a reference genome, high percentages of short sequence reads are often assigned to multiple genomic locations. One approach to handling these “ambiguous mappings” has been to discard them. This results in a loss of data, which can sometimes be as much as 45% o...
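
    Because the abstract above is truncated, the record's own statistical method is not known here. As an illustration only, the sketch below shows one commonly used alternative to discarding ambiguous reads (an assumption, not necessarily this record's approach): each multi-mapped read is split fractionally across its candidate loci in proportion to the uniquely mapped read counts at those loci, with an even split when none of the loci has unique support. The function name and the toy gene labels are hypothetical.

```python
def allocate_multireads(unique_counts, multireads):
    """Fractionally assign ambiguous reads to loci.

    unique_counts: dict locus -> number of uniquely mapped reads.
    multireads: list of candidate-locus lists, one per ambiguous read.
    Each ambiguous read contributes a total weight of 1, split across its
    candidate loci in proportion to their unique counts (evenly if all are 0).
    """
    totals = {locus: float(c) for locus, c in unique_counts.items()}
    for loci in multireads:
        if not loci:
            continue  # a read with no candidate loci cannot be assigned
        weights = [unique_counts.get(locus, 0) for locus in loci]
        total_weight = sum(weights)
        for locus, weight in zip(loci, weights):
            share = weight / total_weight if total_weight > 0 else 1 / len(loci)
            totals[locus] = totals.get(locus, 0.0) + share
    return totals


counts = allocate_multireads({"geneA": 30, "geneB": 10},
                             [["geneA", "geneB"], ["geneC", "geneD"]])
print(counts)  # {'geneA': 30.75, 'geneB': 10.25, 'geneC': 0.5, 'geneD': 0.5}
```

    No read is discarded: the total count grows by exactly one per ambiguous read, so the data loss described above is avoided at the cost of assuming that unique-read coverage reflects the true abundance at each locus.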

  6. Quantitative texton sequences for legible bivariate maps.

    PubMed

    Ware, Colin

    2009-01-01

    Representing bivariate scalar maps is a common but difficult visualization problem. One solution has been to use two-dimensional color schemes, but the results are often hard to interpret and are read inaccurately. An alternative is to use a color sequence for one variable and a texture sequence for the other. This approach has been used, for example, in geology, but it has been much less studied than the two-dimensional color scheme, although theory suggests that it should lead to easier perceptual separation of the information relating to the two variables. To make a texture sequence more clearly readable, the concept of the quantitative texton sequence (QTonS) is introduced. A QTonS is defined as a sequence of small graphical elements, called textons, where each texton represents a different numerical value and sets of textons can be densely displayed to produce visually differentiable textures. An experiment was carried out to compare two bivariate color coding schemes with two schemes using a QTonS for one bivariate map component and a color sequence for the other. Two different key designs were investigated (a key being a sequence of colors or textures used in obtaining quantitative values from a map). The first design used two separate keys, one for each dimension, in order to measure how accurately subjects could independently estimate the underlying scalar variables. The second key design was two-dimensional and intended to measure the overall integral accuracy that could be obtained. The results show that accuracy is substantially higher for the QTonS/color sequence schemes. The hypothesis that texture/color sequence combinations are better for independent judgments of mapped quantities was supported. A second experiment probed the limits of spatial resolution for QTonSs. PMID:19834229

  7. Sequence analysis by iterated maps, a review.

    PubMed

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) sit at a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized to non-genomic alphabets, as well as an interest in its use for the graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there has been a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage now appears to be set for a recasting of IMs with a central role in processing nextgen sequencing results. PMID:24162172
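
    The iterated map behind the 'Chaos Game Representation' mentioned above can be sketched in a few lines of Python. Each nucleotide is assigned a corner of the unit square (the particular corner assignment below is an assumed convention), and for every base the current point moves halfway toward that base's corner, starting from the centre. The function name is hypothetical.

```python
# Assumed corner convention for the four nucleotides on the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_representation(seq: str) -> list:
    """Return the CGR point reached after each base of the sequence."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the corner for this base.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


print(chaos_game_representation("ACG"))
# [(0.25, 0.25), (0.125, 0.625), (0.5625, 0.8125)]
```

    Because each step halves the distance to a corner, the point reached after any prefix lies inside the sub-square of side 2^-k identified by the last k bases, for every k at once. This is the sense in which the map is order free: no Markov order has to be chosen in advance.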

  8. Interior view looking SW includes map hanging from ceiling and ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Interior view looking SW includes map hanging from ceiling and edge of fire finder stand on right. - Badger Mountain Lookout, .125 mile northwest of Badger Mountain summit, East Wenatchee, Douglas County, WA

  9. Strong nucleosomes of mouse genome including recovered centromeric sequences.

    PubMed

    Salih, Bilal F; Teif, Vladimir B; Tripathi, Vijay; Trifonov, Edward N

    2015-01-01

    Recently discovered strong nucleosomes (SNs), characterized by visibly periodic DNA sequences, have been found to concentrate in the centromeres of Arabidopsis thaliana and in the transient meiotic centromeres of Caenorhabditis elegans. To find out whether such an association of SNs with centromeres is a more general phenomenon, we studied the SNs of Mus musculus. The publicly available genome sequences of mouse, as of practically all other eukaryotes, do not include the centromeric regions, which are difficult to assemble because of the large amount of repeated sequence in centromeres and pericentromeric regions. We recovered these missing sequences using data from MNase-seq experiments in mouse embryonic stem cells, in which the sequence of DNA inside nucleosomes, including the missing regions, was determined by 100-bp paired-end sequencing. Nucleosome sequences that do not match the published genome sequence are expected to belong largely to the centromeres. By evaluating SN densities in centromeric and non-centromeric regions, we conclude that mouse SNs concentrate in the centromeres of the telocentric mouse chromosomes, with a ~3.9-fold excess compared with their density in the rest of the genome. The remaining non-centromeric SNs are harbored mainly by introns and intergenic regions, in particular by retrotransposons. The centromeric involvement of SNs opens new horizons for studies of chromosome and centromere structure. PMID:24998943

  10. Mapping and sequencing the human genome

    SciTech Connect

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response, a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  11. Mapping and Sequencing the Human Genome

    DOE R&D Accomplishments Database

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response, a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  12. Sequence analysis by iterated maps, a review

    PubMed Central

    2014-01-01

    Among alignment-free methods, Iterated Maps (IMs) sit at a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, ‘Chaos Game Representation’. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized to non-genomic alphabets, as well as an interest in its use for the graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there has been a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage now appears to be set for a recasting of IMs with a central role in processing nextgen sequencing results. PMID:24162172

  13. Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data.

    PubMed

    Pan, Yonglong; Wang, Xiaoming; Liu, Lin; Wang, Hao; Luo, Meizhong

    2016-01-01

    A good physical map is essential to guide sequence assembly in de novo whole-genome sequencing, especially when sequences are produced by high-throughput technologies such as next-generation sequencing (NGS). We here present a novel method, Feature sets-based Genome Mapping (FGM). With FGM, a physical map and draft whole-genome sequences can be generated, anchored and integrated using the same NGS data set, independent of restriction digestion. A model of the method was created and its parameters were examined by simulations using the Arabidopsis genome sequence. In the simulations, when a ~4.8X genome BAC library comprising 4,096 clones was used to sequence the whole genome, ~90% of clones were successfully connected into physical contigs, and 91.58% of genome sequences were mapped and connected to chromosomes. This method was experimentally verified using the existing physical map and genome sequence of rice. Of 4,064 clones covering 115 Mb of sequence, selected from ~3 tiles of 3 chromosomes of a rice draft physical map, 3,364 clones were reconstructed into physical contigs and 98 Mb of sequence was integrated into the 3 chromosomes. The physical map-integrated draft genome sequences can provide permanent frameworks for eventually obtaining high-quality reference sequences by targeted sequencing, gap filling and combining other sequences. PMID:27611682
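
    The abstract does not detail how FGM links clones, so the following is only a generic Python sketch of the underlying idea of building physical contigs from shared features, not the FGM algorithm itself. Each clone is represented as a set of hypothetical feature identifiers, any two clones sharing at least `min_shared` features are treated as overlapping, and overlapping clones are merged into contigs with a union-find structure. The threshold and all names are assumptions.

```python
def build_contigs(clones: dict, min_shared: int = 2) -> list:
    """Group clones into contigs: clones sharing >= min_shared features are joined."""
    names = list(clones)
    parent = {name: name for name in names}

    def find(a):
        # Follow parent links to the root, halving the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Pairwise overlap test (quadratic; adequate for a small illustration).
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if len(clones[a] & clones[b]) >= min_shared:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


clones = {
    "c1": {"f1", "f2", "f3"},
    "c2": {"f2", "f3", "f4"},
    "c3": {"f4", "f5", "f6"},
    "c4": {"f5", "f6", "f7"},
}
print(sorted(sorted(g) for g in build_contigs(clones)))
# [['c1', 'c2'], ['c3', 'c4']]
```

    Clones c2 and c3 share only one feature, so with `min_shared=2` they stay in separate contigs. Raising the threshold trades contig length for protection against false joins caused by repeated features.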

  14. Sequence mapping of the Californian MSW strain of Myxoma virus.

    PubMed

    Labudovic, A; Perkins, H; van Leeuwen, B; Kerr, P

    2004-03-01

    Partial sequence mapping of the MSW Californian strain of Myxoma virus was performed by cloning EcoRI and SalI restriction fragments of viral DNA and sequencing their ends. In this way, regions of 74 MSW open reading frames were sequenced and mapped onto the complete genome sequences of the related leporipoxviruses South American Myxoma virus and Rabbit fibroma virus to form a partial map of the MSW strain. In general, gene locations and sequences were conserved among the three viruses. However, the Californian Myxoma virus was more closely related to South American myxoma virus than to Rabbit fibroma virus, based on sequence comparisons and the presence of three genes that have been lost from the Rabbit fibroma virus genome. Compared with the other two viruses, the main difference found in the MSW genome was that the terminal inverted repeats were extended by the duplication of 5 complete open reading frames (M151R, M152R, M153R, M154L, M156R) and the partial duplication of one open reading frame (M150R). This rearrangement was associated with the loss of the majority of the M009L open reading frame. Three known virulence genes, the serine proteinase inhibitor (SERPIN) genes M151R and M152R and the leukemia associated protein (LAP) gene M153R, and the potential virulence gene M156R are now present in two copies. PMID:14991443

  15. Analecta of structures formed during the 28 June 1992 Landers-Big Bear, California earthquake sequence (including maps of shear zones, belts of shear zones, tectonic ridge, duplex en echelon fault, fault elements, and thrusts in restraining steps)

    SciTech Connect

    Johnson, A.M.; Johnson, N.A.; Johnson, K.M.; Wei, W.; Fleming, R.W.; Cruikshank, K.M.; Martosudarmo, S.Y.

    1997-12-31

    The June 28, 1992, Ms 7.5 earthquake at Landers, California, which occurred about 10 km north of the community of Yucca Valley, California, produced spectacular ground rupturing more than 80 km in length (Hough and others, 1993). The ground rupturing, which was dominated by right-lateral shearing, extended along at least four distinct faults arranged broadly en echelon. The faults were connected through wide transfer zones by stepovers, consisting of right-lateral fault zones and tension cracks. The Landers earthquakes occurred in the desert of southeastern California, where details of ruptures were well preserved and patterns of rupturing were generally unaffected by urbanization. The structures were varied and well displayed and, because the differential displacements were so large, spectacular. The scarcity of vegetation, the aridity of the area, the compactness of the alluvium and bedrock, and the relative isotropy and brittleness of surficial materials combined to provide a marvelous visual record of the character of the deformation zones. The authors present a series of analecta (that is, verbal clips or snippets) dealing with a variety of structures, including belts of shear zones, segmentation of ruptures, rotating fault blocks, en echelon fault zones, releasing duplex structures, spines, and ramps. All of these structures are documented with detailed maps in text figures or in plates (in pocket). The purpose is to describe the structures and to present an understanding of the mechanics of their formation. Hence, most descriptions focus on structures for which the authors have information on differential displacements as well as spatial data on the position and orientation of fractures.

  16. Target Enrichment Improves Mapping of Complex Traits by Deep Sequencing

    PubMed Central

    Guo, Jianjun; Fan, Jue; Hauser, Bernard A.; Rhee, Seung Y.

    2015-01-01

    Complex traits such as crop performance and human diseases are controlled by multiple genetic loci, many of which have small effects and often go undetected by traditional quantitative trait locus (QTL) mapping. Recently, bulked segregant analysis with large F2 pools and genome-level markers (named extreme-QTL or X-QTL mapping) has been used to identify many QTL. To estimate parameters impacting QTL detection for X-QTL mapping, we simulated the effects of population size, marker density, and sequencing depth of markers on QTL detectability for traits with differing heritabilities. These simulations indicate that a high (>90%) chance of detecting QTL with at least 5% effect requires 5000× sequencing depth for a trait with heritability of 0.4−0.7. For most eukaryotic organisms, whole-genome sequencing at this depth is not economically feasible. Therefore, we tested and confirmed the feasibility of applying deep sequencing of target-enriched markers for X-QTL mapping. We used two traits in Arabidopsis thaliana with different heritabilities: seed size (H2 = 0.61) and seedling greening in response to salt (H2 = 0.94). We used a modified G test to identify QTL regions and developed a model-based statistical framework to resolve individual peaks by incorporating recombination rates. Multiple QTL were identified for both traits, including previously undiscovered QTL. We call our method target-enriched X-QTL (TEX-QTL) mapping; this mapping approach is not limited by the genome size or the availability of recombinant inbred populations and should be applicable to many organisms and traits. PMID:26530422
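
    The record uses a modified G test, whose modifications are not described in the abstract. As background only, the sketch below computes the standard (unmodified) G statistic, G = 2 Σ O ln(O/E), for a contingency table of allele read counts, here an assumed 2 x 2 layout with one row per bulk and one column per parental allele. The function name and the example counts are hypothetical.

```python
import math


def g_statistic(table):
    """Standard G-test statistic: G = 2 * sum(O * ln(O / E)) over all cells.

    table: list of rows of observed counts (e.g. bulks x alleles).
    Expected counts E come from the row and column totals under independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # cells with O = 0 contribute 0 (limit of O ln O)
                expected = row_totals[i] * col_totals[j] / grand
                g += observed * math.log(observed / expected)
    return 2.0 * g


# Hypothetical counts: rows = high and low bulk, columns = allele 1 and allele 2 reads.
print(round(g_statistic([[60, 40], [40, 60]]), 3))  # 8.054
```

    Under independence, G is approximately chi-square distributed with (rows - 1)(columns - 1) degrees of freedom, so a marker whose allele ratio differs strongly between the two bulks yields a large G and flags a candidate QTL region.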

  17. User guide for mapping-by-sequencing in Arabidopsis

    PubMed Central

    2013-01-01

    Mapping-by-sequencing combines genetic mapping with whole-genome sequencing in order to accelerate mutant identification. However, applying mapping-by-sequencing requires decisions about various practical aspects of the experimental design that have no intuitive answers. Based on an experimentally determined recombination landscape of Arabidopsis and on biases specific to next-generation sequencing, we simulated more than 400,000 mapping-by-sequencing experiments. This allowed us to evaluate a broad range of experiment types and to develop general rules for mapping-by-sequencing in Arabidopsis. Most importantly, the simulations provide information on the properties of different crossing scenarios and on the number of recombinants and the sequencing depth needed for successful mapping experiments. PMID:23773572
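
    A minimal Python sketch of the core signal used in mapping-by-sequencing, under stated assumptions: a recessive mutation, a pool of homozygous mutant recombinants, and markers that distinguish the mutant parent from the mapping parent. Under those assumptions the frequency of the mutant-parent allele rises toward 1 near the causal locus, so averaging per-marker frequencies in fixed genomic windows and taking the highest window gives a candidate interval. The input layout, window size and function names are hypothetical and not taken from the record.

```python
from collections import defaultdict


def windowed_allele_frequency(snps, window_size):
    """Mean mutant-allele frequency per window.

    snps: iterable of (position, mutant_allele_reads, total_reads).
    Returns {window_start: mean_frequency}, skipping markers with zero depth.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, mutant_reads, depth in snps:
        if depth == 0:
            continue
        w = pos // window_size
        sums[w] += mutant_reads / depth
        counts[w] += 1
    return {w * window_size: sums[w] / counts[w] for w in sorted(sums)}


def candidate_window(snps, window_size):
    """Return (window_start, mean_frequency) for the window closest to fixation."""
    freqs = windowed_allele_frequency(snps, window_size)
    return max(freqs.items(), key=lambda item: item[1])


snps = [(100, 10, 20), (900, 12, 20), (1500, 19, 20), (1800, 20, 20), (2500, 11, 20)]
start, freq = candidate_window(snps, window_size=1000)
print(start, round(freq, 3))  # 1000 0.975
```

    Sequencing depth and the number of recombinants, the factors the simulations above address, determine how noisy these per-window means are and how narrow the peak becomes.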

  18. From synaptic plasticity to spatial maps and sequence learning.

    PubMed

    Mehta, Mayank R

    2015-06-01

    The entorhinal-hippocampal circuit is crucial for several forms of learning and memory, especially sequence learning, including spatial navigation. The challenge is to understand the underlying mechanisms. Pioneering discoveries of spatial selectivity in this circuit, i.e. place cells and grid cells, provided a major step forward in tackling this challenge. Considerable research has also shown that sequence learning relies on synaptic plasticity, especially the Hebbian or the NMDAR-dependent synaptic plasticity. This raises several questions: Are spatial maps plastic? If so, what is the contribution of Hebbian plasticity to spatial map plasticity? How does the spatial map plasticity contribute to sequence learning? A combination of computational and experimental studies has shown that NMDAR-mediated plasticity and theta rhythm can have specific effects on the formation and experiential modification of spatial maps to facilitate predictive coding. Advances in transgenic techniques have provided further support for these mechanisms. Although many exciting challenges remain, these findings have brought us closer to solving the puzzle of how the hippocampal system contributes to spatial memory, and point to a way forward. PMID:25929239

  19. Simulation of Accident Sequences Including Emergency Operating Procedures

    SciTech Connect

    Queral, Cesar; Exposito, Antonio; Hortal, Javier

    2004-07-01

    Operator actions play an important role in accident sequences. However, design analysis (Safety Analysis Report, SAR) seldom considers operator actions, although mandatory Emergency Operating Procedures (EOPs) require operators to perform some checks and actions from the very beginning of an accident. The basic aim of the project is to develop a procedure validation system that combines three elements: a plant transient simulation code, TRETA (a C-based modular program) developed by the CSN; a computerized procedure system, COPMA-III (a Java-based program) developed by the OECD Halden Reactor Project and adapted for simulation with the contribution of our group; and a software interface that provides communication between COPMA-III and TRETA. The new combined system will be applied in a pilot study to analyze sequences initiated by secondary-side breaks in a pressurized water reactor (PWR) plant. (authors)

  20. Mapping Challenging Mutations by Whole-Genome Sequencing

    PubMed Central

    Smith, Harold E.; Fabritius, Amy S.; Jaramillo-Lambert, Aimee; Golden, Andy

    2016-01-01

    Whole-genome sequencing provides a rapid and powerful method for identifying mutations on a global scale, and has spurred a renewed enthusiasm for classical genetic screens in model organisms. The most commonly characterized category of mutation consists of monogenic, recessive traits, due to their genetic tractability. Therefore, most of the mapping methods for mutation identification by whole-genome sequencing are directed toward alleles that fulfill those criteria (i.e., single-gene, homozygous variants). However, such approaches are not entirely suitable for the characterization of a variety of more challenging mutations, such as dominant and semidominant alleles or multigenic traits. Therefore, we have developed strategies for the identification of those classes of mutations, using polymorphism mapping in Caenorhabditis elegans as our model for validation. We also report an alternative approach for mutation identification from traditional recombinant crosses, and a solution to the technical challenge of sequencing sterile or terminally arrested strains where population size is limiting. The methods described herein extend the applicability of whole-genome sequencing to a broader spectrum of mutations, including classes that are difficult to map by traditional means. PMID:26945029

  1. Mapping Challenging Mutations by Whole-Genome Sequencing.

    PubMed

    Smith, Harold E; Fabritius, Amy S; Jaramillo-Lambert, Aimee; Golden, Andy

    2016-01-01

    Whole-genome sequencing provides a rapid and powerful method for identifying mutations on a global scale, and has spurred a renewed enthusiasm for classical genetic screens in model organisms. The most commonly characterized category of mutation consists of monogenic, recessive traits, due to their genetic tractability. Therefore, most of the mapping methods for mutation identification by whole-genome sequencing are directed toward alleles that fulfill those criteria (i.e., single-gene, homozygous variants). However, such approaches are not entirely suitable for the characterization of a variety of more challenging mutations, such as dominant and semidominant alleles or multigenic traits. Therefore, we have developed strategies for the identification of those classes of mutations, using polymorphism mapping in Caenorhabditis elegans as our model for validation. We also report an alternative approach for mutation identification from traditional recombinant crosses, and a solution to the technical challenge of sequencing sterile or terminally arrested strains where population size is limiting. The methods described herein extend the applicability of whole-genome sequencing to a broader spectrum of mutations, including classes that are difficult to map by traditional means. PMID:26945029

  2. Mapping copy number variation by population-scale genome sequencing.

    PubMed

    Mills, Ryan E; Walter, Klaudia; Stewart, Chip; Handsaker, Robert E; Chen, Ken; Alkan, Can; Abyzov, Alexej; Yoon, Seungtai Chris; Ye, Kai; Cheetham, R Keira; Chinwalla, Asif; Conrad, Donald F; Fu, Yutao; Grubert, Fabian; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Iakoucheva, Lilia M; Iqbal, Zamin; Kang, Shuli; Kidd, Jeffrey M; Konkel, Miriam K; Korn, Joshua; Khurana, Ekta; Kural, Deniz; Lam, Hugo Y K; Leng, Jing; Li, Ruiqiang; Li, Yingrui; Lin, Chang-Yun; Luo, Ruibang; Mu, Xinmeng Jasmine; Nemesh, James; Peckham, Heather E; Rausch, Tobias; Scally, Aylwyn; Shi, Xinghua; Stromberg, Michael P; Stütz, Adrian M; Urban, Alexander Eckehart; Walker, Jerilyn A; Wu, Jiantao; Zhang, Yujun; Zhang, Zhengdong D; Batzer, Mark A; Ding, Li; Marth, Gabor T; McVean, Gil; Sebat, Jonathan; Snyder, Michael; Wang, Jun; Ye, Kenny; Eichler, Evan E; Gerstein, Mark B; Hurles, Matthew E; Lee, Charles; McCarroll, Steven A; Korbel, Jan O

    2011-02-01

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide-resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole-genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validation. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysis of their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions among high-frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies. PMID:21293372

  3. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

    1993-02-16

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a pu...

    GOVERNMENT RIGHTS: This application was funded under Department of Energy Contract DE-AC02-76ER01338. The U.S. Government has certain rights under this application and any patent issuing thereon.

  4. Halvade: scalable sequence analysis with MapReduce

    PubMed Central

    Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan

    2015-01-01

    Motivation: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine. Results: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50× coverage) in <3 h with very high parallel efficiency. Even on a single, multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading. Availability and implementation: Halvade is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR. Its source is available at http://bioinformatics.intec.ugent.be/halvade under GPL license. Contact: jan.fostier@intec.ugent.be Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25819078
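
    Halvade itself is written in Java on the Hadoop MapReduce API; the following is only a single-process Python sketch of the map/shuffle/reduce pattern described above, not Halvade's code. The map phase keys each (already aligned, here hypothetical) read by its genomic region, the shuffle groups reads by key, and each reduce call handles one region independently. The read-count "reducer" is a stand-in for per-region variant calling, and the region size and names are assumptions.

```python
from collections import defaultdict

# A hypothetical aligned read: (read_id, chromosome, position).
AlignedRead = tuple[str, str, int]


def map_phase(chunk, region_size):
    """Emit (region_key, read) pairs so reads from the same region share a key."""
    for read_id, chrom, pos in chunk:
        yield (chrom, pos // region_size), (read_id, chrom, pos)


def shuffle(pairs):
    """Group mapped values by key, as the MapReduce framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, reads):
    """Stand-in for per-region variant calling: report the read count."""
    return key, len(reads)


def run_pipeline(chunks, region_size):
    pairs = (pair for chunk in chunks for pair in map_phase(chunk, region_size))
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


chunks = [
    [("r1", "chr1", 100), ("r2", "chr1", 15000)],
    [("r3", "chr1", 200), ("r4", "chr2", 50)],
]
print(run_pipeline(chunks, region_size=10_000))
# {('chr1', 0): 2, ('chr1', 1): 1, ('chr2', 0): 1}
```

    Because each chunk can be mapped and each region reduced without reference to the others, both phases parallelize across nodes and cores. A real pipeline would also need to handle reads and variants that span region boundaries.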

  5. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

    1995-03-21

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1,018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli. 11 figures.
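    The reading-frame arithmetic in this abstract is self-consistent: 17 signal residues plus the 187-residue polypeptide give the 204-codon open reading frame, and the 43 amino-terminal plus 144 carboxyl-terminal residues give 187. A 204-codon ORF plus its stop codon spans 615 of the 1,018 nucleotides. As a hedged illustration of how an ORF is located in a cDNA, the sketch below scans the three forward frames for the longest ATG-to-stop stretch. It is a generic approach, not the procedure used in the patent, and it ignores the reverse strand and non-ATG start codons; the test sequence is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG...stop open reading frame on the forward strand (stop included)."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        # Step codon by codon; the bound keeps seq[i:i + 3] a full codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


def orf_length_in_codons(orf: str) -> int:
    """Number of amino acids encoded, excluding the terminal stop codon."""
    return len(orf) // 3 - 1


orf = longest_orf("CCATGAAATTTGGGTAACC")
print(orf, orf_length_in_codons(orf))  # ATGAAATTTGGGTAA 4
```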

  6. cDNA encoding a polypeptide including a hevein sequence

    SciTech Connect

    Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

    2000-07-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.

  7. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

    1995-03-21

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.

  8. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

    1999-05-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.

  9. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

    1999-05-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli. 12 figs.

  10. A Probabilistic Approach for Improved Sequence Mapping in Metatranscriptomic Studies

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Mapping millions of short DNA sequences to a reference genome is a necessary step in many experiments designed to investigate the expression of genes involved in disease resistance. This is a difficult task in which several challenges often arise, resulting in suboptimal mapping. This mapping process ...

  11. Regulated expression of repetitive sequences including the identifier sequence during myotube formation in culture.

    PubMed Central

    Herget, T; Reich, M; Stüber, K; Starzinski-Powitz, A

    1986-01-01

    We have isolated and characterized a cDNA of 1183 bp, pL6-411, from rat L6 muscle cells. Sequence analysis shows that this cDNA contains repetitive sequences, including two inverted copies of the previously described identifier sequence. Repetitive sequences from pL6-411 characterize a family of RNAs that is specifically induced during L6 myotube formation. Another part of the pL6-411 sequence, present at low copy number per haploid rat genome, hybridized to two RNAs of 5 kb and 2 kb from both L6 myoblasts and L6 myotubes. A third pL6-411-related RNA of 150 bases was detected that hybridized with the repetitive sequence but not with the low-copy-number part of pL6-411. The 'identifier' sequence in this population of small RNAs appears to be complementary to one of the 'identifier' copies in the pL6-411-related RNA. Finally, we identified on cDNA pL6-411 the recognition site for the TGGCA-binding protein and a total of four putative promoters for RNA polymerase III, in both orientations. PMID:2423328
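    The "two inverted copies" and "complementary" relationships described here can be checked computationally with a reverse complement. The sketch below is a generic illustration, not the authors' analysis; the motif and test sequence are hypothetical, because the identifier sequence itself is not given in this record. A palindromic motif (one equal to its own reverse complement) would be reported on both strands.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, giving the opposite strand read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_copies(seq: str, motif: str):
    """Return sorted (0-based start, strand) hits: '+' for the motif, '-' for its reverse complement."""
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical motif with an inverted copy seven bases downstream.
print(find_copies("GGTCAAATGACC", "GGTCA"))  # [(0, '+'), (7, '-')]
```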

  12. Microbial genome sequencing using optical mapping and Illumina sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Introduction Optical mapping is a technique in which strands of genomic DNA are digested with one or more restriction enzymes, and a physical map of the genome constructed from the resulting image. In outline, genomic DNA is extracted from a pure culture, linearly arrayed on a specialized glass sli...
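    The restriction-digest step behind an optical map can be mimicked in silico to predict fragment sizes from a candidate sequence, which is a common way of comparing such maps with assemblies. A minimal sketch, assuming a linear sequence (microbial chromosomes are usually circular, which would merge the first and last fragments) and exact recognition-site matches; BamHI (G^GATCC) serves only as an example enzyme, and the toy genome is hypothetical.

```python
def digest(genome: str, site: str, cut_offset: int = 0):
    """Cut a linear sequence at every occurrence of `site`.

    `cut_offset` is the position of the cut within the site (1 for BamHI, G^GATCC).
    Returns fragment lengths in genome order, skipping zero-length fragments.
    """
    cuts = []
    pos = genome.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = genome.find(site, pos + 1)
    bounds = [0] + cuts + [len(genome)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


print(digest("AAGGATCCTTTTGGATCCAA", "GGATCC", cut_offset=1))  # [3, 10, 7]
```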

  13. JVM: Java Visual Mapping tool for next generation sequencing read.

    PubMed

    Yang, Ye; Liu, Juan

    2015-01-01

    We developed JVM (Java Visual Mapping), a program for mapping next-generation sequencing reads to a reference sequence. The program is implemented in Java and is designed to handle millions of short reads generated by Illumina sequencing. It employs a seed index strategy and octal encoding operations for sequence alignment. JVM is suited to single-end resequencing data in DNA-Seq and RNA-Seq experiments. It is a desktop application that supports read data sets from 1 MB to 10 GB in size. PMID:25387956
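    A seed index of the kind mentioned here can be sketched with a hash table of reference k-mers: each read is seeded with an exact k-mer hit, and each candidate placement is then verified by counting mismatches. This is a simplified illustration, not JVM's implementation; in particular it uses Python strings rather than JVM's octal encoding, takes only the read's first k-mer as the seed, and ignores the reverse strand and indels. The reference, read, `k`, and `max_mismatches` values are hypothetical.

```python
from collections import defaultdict


def build_index(reference: str, k: int):
    """Map every k-mer of the reference to the list of its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read: str, reference: str, index, k: int, max_mismatches: int = 2):
    """Seed with the read's first k-mer, then verify each candidate by Hamming distance."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # candidate runs off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


reference = "ACGTACGTTGCAACGTTT"
index = build_index(reference, 4)
print(map_read("ACGTTGCA", reference, index, 4))  # [(4, 0)]
```

    A single seed misses reads with a mismatch in their first k bases; practical mappers use several seeds per read for that reason.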

  14. Sequencing and mapping of the onion genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cost of DNA sequencing continues to decline and, in the near future, it will become reasonable to undertake sequencing of the enormous nuclear genome of onion. We undertook sequencing of expressed and genomic regions of the onion genome to learn about the structure of the onion genome, as well a...

  15. QTL mapping using high-throughput sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Quantitative trait locus (QTL) mapping in plants dates to the 1980s, but earlier studies were often hindered by the expense and time required to identify large numbers of polymorphic genetic markers that differentiated the parental genotypes and then to genotype them on large segregating mapping po...

  16. Restoration of distorted depth maps calculated from stereo sequences

    NASA Technical Reports Server (NTRS)

    Damour, Kevin; Kaufman, Howard

    1991-01-01

    A model-based Kalman estimator is developed for spatial-temporal filtering of noise and other degradations in velocity and depth maps derived from image sequences or cinema. As an illustration of the proposed procedures, edge information from image sequences of rigid objects is used in the processing of the velocity maps by selecting from a series of models for directional adaptive filtering. Adaptive filtering then allows for noise reduction while preserving sharpness in the velocity maps. Results from several synthetic and real image sequences are given.
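    As a much simpler stand-in for the model-based estimator described above, the sketch below applies a scalar Kalman filter to the depth value of one pixel across frames, assuming a random-walk state model and known noise variances (`process_var` and `measurement_var` are hypothetical tuning parameters). It shows only the temporal predict/update cycle; the spatial, edge-directed model selection of the paper is not reproduced.

```python
def kalman_smooth(measurements, process_var, measurement_var, x0=None, p0=1.0):
    """Scalar random-walk Kalman filter (assumed model): x_t = x_{t-1} + w, z_t = x_t + v."""
    x = measurements[0] if x0 is None else x0
    p = p0
    estimates = []
    for z in measurements:
        # Predict: state unchanged, uncertainty grows by the process variance.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + measurement_var)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates


noisy_depth = [4.0, 6.0] * 10  # hypothetical per-frame depth readings for one pixel
smoothed = kalman_smooth(noisy_depth, process_var=0.01, measurement_var=1.0)
```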

  17. Quality Assessment of Mapping Building Textures from Infrared Image Sequences

    NASA Astrophysics Data System (ADS)

    Hoegner, L.; Iwaszczuk, D.; Stilla, U.

    2012-07-01

    Generation and texturing of building models is a fast-developing field of research. Several techniques have been developed to extract building geometry and textures from multiple images and image sequences. In this paper, these techniques are discussed and extended to automatically add new textures from infrared (IR) image sequences to existing building models. In contrast to existing work, geometry and textures are not generated together from the same dataset; instead, the textures are extracted from the image sequence and matched to an existing geo-referenced 3D building model. The texture generation is divided into two main parts. The first part deals with the estimation and refinement of the exterior camera orientation. Feature points are extracted in the images and used as tie points in the sequence. A recorded exterior orientation of the camera is added to these homologous points, and a bundle adjustment is performed, starting with image pairs and combining the whole sequence. A given 3D model of the observed building is also added to introduce further constraints as ground control points in the bundle adjustment. The second part includes the extraction of textures from the images and the combination of textures from different images of the sequence. Using the reconstructed exterior camera orientation for every image of the sequence, the visible facades are projected into the image and texture is extracted. These textures normally contain only parts of the facade. The partial textures extracted from all images are combined into one facade texture. This texture is stored with a 3D reference to the corresponding facade, which allows features to be searched for in the textures and localized in 3D space. It will be shown that the proposed strategy allows texture extraction and mapping even for large building complexes with restricted viewing possibilities and for images with low optical resolution.
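    Projecting facades into each image with the reconstructed exterior orientation can be illustrated with a pinhole camera model. The sketch below is a simplification under stated assumptions, not the authors' pipeline: it takes a world-to-camera rotation matrix and a projection centre, ignores lens distortion and the principal-point offset, and treats points with positive camera-frame depth as visible (photogrammetric software often uses the opposite sign convention).

```python
def project(point, rotation, center, focal_length):
    """Project a 3D world point to image-plane coordinates with a pinhole camera.

    rotation: 3x3 world-to-camera matrix given as a list of rows (assumed convention).
    center: camera projection centre in world coordinates.
    Returns (x, y) on the image plane, or None if the point is behind the camera.
    """
    d = [point[i] - center[i] for i in range(3)]
    xc, yc, zc = (sum(rotation[r][i] * d[i] for i in range(3)) for r in range(3))
    if zc <= 0:
        return None  # behind the camera, so this facade point is not visible here
    return (focal_length * xc / zc, focal_length * yc / zc)


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
corner = project((1, 2, 10), identity, (0, 0, 0), 100)  # hypothetical facade corner
```

    Projecting all four corners of a facade this way gives the image region from which a partial texture would be cut.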

  18. Single Nucleotide Polymorphism Mapping Using Genome-Wide Unique Sequences

    PubMed Central

    Chen, Leslie Y.Y.; Lu, Szu-Hsien; Shih, Edward S.C.; Hwang, Ming-Jing

    2002-01-01

    As more and more genomic DNAs are sequenced to characterize human genetic variations, the demand for a very fast and accurate method to genomically position these DNA sequences is high. We have developed a new mapping method that does not require sequence alignment. In this method, we first identified DNA fragments of 15 bp in length that are unique in the human genome and then used them to position single nucleotide polymorphism (SNP) sequences. By use of four desktop personal computers with AMD K7 (1 GHz) processors, our new method mapped more than 1.6 million SNP sequences in 20 hr and achieved a very good agreement with mapping results from alignment-based methods. PMID:12097348
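    The core idea, positioning a query by the genome-unique 15-mers it contains rather than by alignment, can be sketched as follows. This is a hedged illustration, not the published implementation: it builds a Python dictionary of unique k-mers, scans only the forward strand, and reports every start position implied by the unique k-mers found in the query. A SNP inside the query only disrupts the k-mers that overlap it, so flanking unique k-mers can still place the sequence. The toy genome and `k=3` are hypothetical; the paper used k = 15 on the human genome.

```python
from collections import Counter


def unique_kmers(genome: str, k: int):
    """Map each k-mer that occurs exactly once in the genome to its 0-based position."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def position_query(query: str, anchors, k: int):
    """Return the set of genome start positions implied by unique k-mers in the query.

    A single value means all anchors agree; an empty set means the query cannot be placed.
    """
    implied = set()
    for j in range(len(query) - k + 1):
        pos = anchors.get(query[j:j + k])
        if pos is not None:
            implied.add(pos - j)
    return implied


anchors = unique_kmers("AAAAACGTAAAAA", 3)
# Query matches genome[3:9] = "AACGTA" except for a T->A difference at its fifth base.
print(position_query("AACGAA", anchors, 3))  # {3}
```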

  19. Application of RAD sequencing to linkage mapping in citrus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    High density linkage maps can be developed for modest cost using high-throughput DNA sequencing to genotype a defined fraction (representation) of the genome. We developed linkage maps in two citrus populations using the RAD (Restriction site Associated DNA) genotyping method which involves restrict...

  20. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, N.V.; Broekaert, W.F.; Namhai Chua; Kush, A.

    1993-02-16

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1,018 nucleotides long and includes an open reading frame of 204 amino acids.

  1. A physical map of the papaya genome with integrated genetic map and genome sequence

    PubMed Central

    2009-01-01

    Background: Papaya is a major fruit crop in tropical and subtropical regions worldwide and has primitive sex chromosomes controlling sex determination in this trioecious species. The papaya genome was recently sequenced because of its agricultural importance, unique biological features, and successful application of transgenic papaya for resistance to papaya ringspot virus. As a part of the genome sequencing project, we constructed a BAC-based physical map using a high information-content fingerprinting approach to assist whole genome shotgun sequence assembly. Results: The physical map consists of 963 contigs, representing 9.4× genome equivalents, and was integrated with the genetic map and genome sequence using BAC end sequences and a sequence-tagged high-density genetic map. The estimated genome coverage of the physical map is about 95.8%, while 72.4% of the genome was aligned to the genetic map. A total of 1,181 high quality overgo (overlapping oligonucleotide) probes representing conserved sequences in Arabidopsis and genetically mapped loci in Brassica were anchored on the physical map, which provides a foundation for comparative genomics in the Brassicales. The integrated genetic and physical map aligned with the genome sequence revealed recombination hotspots as well as regions suppressed for recombination across the genome, particularly on the recently evolved sex chromosomes. Suppression of recombination spread to the region adjacent to the male specific region of the Y chromosome (MSY), and recombination rates recovered gradually and then exceeded the genome average. Recombination hotspots were observed about 10 Mb away on both sides of the MSY, showing a 7-fold increase compared with the genome-wide average and demonstrating the dynamics of recombination on the sex chromosomes. Conclusion: A BAC-based physical map of papaya was constructed and integrated with the genetic map and genome sequence. The integrated map facilitated the draft genome assembly...
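    Hotspots and recombination-suppressed regions of the kind reported here are typically found by comparing the genetic and physical positions of anchored markers. A minimal sketch, assuming markers are given as (Mb, cM) pairs sorted by physical position and that the chromosome-wide average is the total genetic length divided by the physical span; the `fold` threshold and the marker values are hypothetical.

```python
def recombination_rates(markers):
    """markers: (physical_mb, genetic_cm) pairs sorted by physical position.

    Returns (start_mb, end_mb, cM_per_Mb) for each adjacent marker interval.
    """
    return [(a[0], b[0], (b[1] - a[1]) / (b[0] - a[0]))
            for a, b in zip(markers, markers[1:]) if b[0] > a[0]]


def hotspots(markers, fold=3.0):
    """Intervals whose rate is at least `fold` times the chromosome-wide average.

    Returns (start_mb, end_mb, rate / average) for each flagged interval.
    """
    span_mb = markers[-1][0] - markers[0][0]
    average = (markers[-1][1] - markers[0][1]) / span_mb
    return [(s, e, r / average) for s, e, r in recombination_rates(markers) if r >= fold * average]


markers = [(0, 0), (10, 5), (11, 12), (20, 15), (30, 20)]  # hypothetical anchored markers
print(hotspots(markers))  # one interval, 10-11 Mb, at roughly 10.5x the average
```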

  2. Time-dependent accident sequences including human actions

    SciTech Connect

    Apostolakis, G.; Chu, T.L.

    1984-02-01

    During an accident, transitions between plant states can occur due to operator intervention and the failure of systems while running. The latter cause of transition is much less likely than the former, which includes errors of commission and omission as well as recovery of lost functions. A methodology has been developed to model these transitions in the time domain. As an example, it is applied to the analysis of Three-Mile-Island-type accidents. Statistical evidence is collected and used in assessing the frequency of stuck-open power-operated relief valves at Babcock and Wilcox plants as well as the frequency of misdiagnosis. Statistical data are also used in modeling the timing of operator actions during the accident, i.e., turning off and on the high-pressure injection system and closing the block valves.

  3. Complete MHC haplotype sequencing for common disease gene mapping.

    PubMed

    Stewart, C Andrew; Horton, Roger; Allcock, Richard J N; Ashurst, Jennifer L; Atrazhev, Alexey M; Coggill, Penny; Dunham, Ian; Forbes, Simon; Halls, Karen; Howson, Joanna M M; Humphray, Sean J; Hunt, Sarah; Mungall, Andrew J; Osoegawa, Kazutoyo; Palmer, Sophie; Roberts, Anne N; Rogers, Jane; Sims, Sarah; Wang, Yu; Wilming, Laurens G; Elliott, John F; de Jong, Pieter J; Sawcer, Stephen; Todd, John A; Trowsdale, John; Beck, Stephan

    2004-06-01

    The future systematic mapping of variants that confer susceptibility to common diseases requires the construction of a fully informative polymorphism map. Ideally, every base pair of the genome would be sequenced in many individuals. Here, we report 4.75 Mb of contiguous sequence for each of two common haplotypes of the major histocompatibility complex (MHC), to which susceptibility to >100 diseases has been mapped. The autoimmune-disease-associated haplotypes HLA-A3-B7-Cw7-DR15 and HLA-A1-B8-Cw7-DR3 were sequenced in their entirety through a bacterial artificial chromosome (BAC) cloning strategy using the consanguineous cell lines PGF and COX, respectively. The two sequences were annotated to encompass all described splice variants of expressed genes. We defined the complete variation content of the two haplotypes, revealing >18,000 variations between them. Average SNP densities ranged from less than one SNP per kilobase to >60. Acquisition of complete and accurate sequence data over polymorphic regions such as the MHC from large-insert cloned DNA provides a definitive resource for the construction of informative genetic maps, and avoids the problem of chromosome regions that are refractory to PCR amplification. PMID:15140828
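    SNP density between two sequenced haplotypes can be computed by counting mismatches in fixed windows. The sketch below assumes the two haplotypes are already aligned into equal-length, gap-free strings (real MHC haplotypes contain indels, which would need an alignment step first) and reports SNPs per kilobase for each window; the window size and test sequences are hypothetical.

```python
def snp_density(hap_a: str, hap_b: str, window: int = 1000):
    """SNPs per kilobase in consecutive windows of two equal-length, gap-free aligned haplotypes."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must be aligned to the same length")
    densities = []
    for start in range(0, len(hap_a), window):
        a = hap_a[start:start + window]
        b = hap_b[start:start + window]
        snps = sum(x != y for x, y in zip(a, b))
        # Normalise by the actual window length so a short final window is not under-counted.
        densities.append(snps * 1000 / len(a))
    return densities


hap1 = "A" * 2000
hap2 = "A" * 500 + "C" + "A" * 1499  # one substitution in the first kilobase
print(snp_density(hap1, hap2))  # [1.0, 0.0]
```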

  4. Temperature sequences for categorizing all ternary distillation boundary maps

    SciTech Connect

    Peterson, E.J.; Partin, L.R.

    1997-05-01

    Temperature sequences are formulated as a complete method of categorizing the feasible distillation boundary maps (DBMs) for ternary systems which commonly have unique binary and ternary azeotropes. DBMs are simplified versions of residue curve maps. The method requires the boiling temperatures at system pressure of pure components and azeotropes, if they exist. Seven position numbers are assigned to the pure components (three) and azeotropes (three binary, one ternary). The boiling temperatures are sorted to rank the position numbers. The temperature sequence is defined as the ranking of position numbers. The position numbers of missing azeotropes are excluded from the sequence. An algorithm searches all possible temperature sequences for feasible DBMs. The result is a complete listing of 125 DBMs, 307 temperature sequences, and 382 [temperature sequence, DBM] pairs. Lookup tables simplify the procedure for finding the DBM(s) for a temperature sequence or finding the temperature sequences for a DBM. Example applications are presented for applying the technique in the initial screening for distillation system synthesis.
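    The temperature-sequence construction can be sketched in a few lines. Assumptions: position numbers 1-3 denote the pure components, 4-6 the binary azeotropes and 7 the ternary azeotrope (the paper's actual numbering convention is not given in this record), boiling points are ranked from lowest to highest, and an absent azeotrope is passed as `None`. Ties in boiling temperature are not handled, and the temperatures in the example are made up.

```python
from typing import Dict, Optional, Tuple


def temperature_sequence(boiling_points: Dict[int, Optional[float]]) -> Tuple[int, ...]:
    """Rank position numbers by boiling temperature (lowest first), skipping absent azeotropes."""
    present = {pos: t for pos, t in boiling_points.items() if t is not None}
    return tuple(sorted(present, key=present.__getitem__))


# Hypothetical system: three pure components and one binary azeotrope (position 4).
example = {1: 80.0, 2: 100.0, 3: 118.0, 4: 75.0, 5: None, 6: None, 7: None}
print(temperature_sequence(example))  # (4, 1, 2, 3)
```

    The resulting tuple is what a lookup table of the kind described would be keyed on to retrieve the feasible distillation boundary map(s).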

  5. A high density physical map of chromosome 1BL supports evolutionary studies, map-based cloning and sequencing in wheat

    PubMed Central

    2013-01-01

    Background: As for other major crops, achieving a complete wheat genome sequence is essential for the application of genomics to breeding new and improved varieties. To overcome the complexities of the large, highly repetitive and hexaploid wheat genome, the International Wheat Genome Sequencing Consortium established a chromosome-based strategy that was validated by the construction of the physical map of chromosome 3B. Here, we present improved strategies for the construction of highly integrated and ordered wheat physical maps, using chromosome 1BL as a template, and illustrate their potential for evolutionary studies and map-based cloning. Results: Using a combination of novel high throughput marker assays and an assembly program, we developed a high quality physical map representing 93% of wheat chromosome 1BL, anchored and ordered with 5,489 markers including 1,161 genes. Analysis of the gene space organization and evolution revealed that gene distribution and conservation along the chromosome results from the superimposition of the ancestral grass and recent wheat evolutionary patterns, leading to a peak of synteny in the central part of the chromosome arm and an increased density of non-collinear genes towards the telomere. With a density of about 11 markers per Mb, the 1BL physical map provides 916 markers, including 193 genes, for fine mapping the 40 QTLs mapped on this chromosome. Conclusions: Here, we demonstrate that high marker density physical maps can be developed in complex genomes such as wheat to accelerate map-based cloning, gain new insights into genome evolution, and provide a foundation for reference sequencing. PMID:23800011

  6. Mapping and Initial Analysis of Human Subtelomeric Sequence Assemblies

    PubMed Central

    Riethman, Harold; Ambrosini, Anthony; Castaneda, Carlos; Finklestein, Jeffrey; Hu, Xue-Lan; Mudunuri, Uma; Paul, Sheila; Wei, Jun

    2004-01-01

    Physical mapping data were combined with public draft and finished sequences to derive subtelomeric sequence assemblies for each of the 41 genetically distinct human telomere regions. Sequence gaps that remain on the reference telomeres are generally small, well-defined, and, for the most part, restricted to regions directly adjacent to the terminal (TTAGGG)n tract. Of the 20.66 Mb of subtelomeric DNA analyzed, 3.01 Mb are subtelomeric repeat sequences (Srpt), and an additional 2.11 Mb are segmental duplications. The subtelomeric sequence assemblies are enriched >25-fold in short, internal (TTAGGG)n-like sequences relative to the rest of the genome; a total of 114 (TTAGGG)n-like islands were found, 55 within Srpt regions, 35 within one-copy regions, 11 at one-copy/Srpt or Srpt/segmental duplication boundaries, and 13 at the telomeric ends of assemblies. Transcripts were annotated in each assembly, noting their mapping coordinates relative to their respective telomere and whether they originate in duplicated DNA or single-copy DNA. A total of 697 transcripts were found in 15.53 Mb of one-copy DNA, 76 transcripts in 2.11 Mb of segmentally duplicated DNA, and 168 transcripts in 3.01 Mb of Srpt sequence. This overall transcript density is similar (within ∼10%) to that found genome-wide. Zinc finger-containing genes and olfactory receptor genes are duplicated within and between multiple telomere regions. PMID:14707167
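    The density comparison can be reproduced from the counts in this abstract. The sketch divides each transcript count by the corresponding sequence length and confirms that the four categories of (TTAGGG)n-like islands sum to the reported 114. Note that the three sequence classes add up to 20.65 Mb, slightly below the 20.66 Mb stated, presumably because of rounding. The genome-wide reference density is not given in this record, so it is not computed.

```python
# Figures taken from the abstract: (transcripts, megabases of sequence).
classes = {
    "one-copy": (697, 15.53),
    "segmental duplication": (76, 2.11),
    "Srpt": (168, 3.01),
}

# (TTAGGG)n-like islands by location, also from the abstract.
islands = {"Srpt": 55, "one-copy": 35, "boundary": 11, "telomeric end": 13}


def density_per_mb(transcripts: int, megabases: float) -> float:
    return transcripts / megabases


for name, (n, mb) in classes.items():
    print(f"{name}: {density_per_mb(n, mb):.1f} transcripts/Mb")

total_n = sum(n for n, _ in classes.values())
total_mb = sum(mb for _, mb in classes.values())
print(f"overall: {density_per_mb(total_n, total_mb):.1f} transcripts/Mb")
print(f"islands: {sum(islands.values())}")
```

    Run as written, it gives roughly 44.9, 36.0 and 55.8 transcripts per Mb for one-copy, segmentally duplicated and Srpt DNA, and about 45.6 overall.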

  7. Evolutionary optimization of biopolymers and sequence structure maps

    SciTech Connect

    Reidys, C.M.; Kopp, S.; Schuster, P.

    1996-06-01

    Searching for biopolymers having a predefined function is a core problem of biotechnology, biochemistry and pharmacy. On the level of RNA sequences and their corresponding secondary structures, we show that this problem can be analyzed mathematically. The strategy is to study the properties of the mapping from RNA sequences to secondary structures, which is essential for understanding the search process. We show that for each secondary structure s there exists a neutral network consisting of all sequences folding into s. This network can be modeled as a random graph and has the following generic properties: it is dense and has a giant component within the graph of compatible sequences. The neutral network percolates sequence space, and any two neutral nets come close in terms of Hamming distance. We investigate the distribution of the orders of neutral nets and show that above a certain threshold the topology of neutral nets makes it possible to find practically all frequent secondary structures.
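    The "compatible sequences" referred to here are those that can, in principle, form every base pair of a given secondary structure. A minimal sketch of that test, assuming dot-bracket notation and the usual Watson-Crick plus G-U wobble pairs. Compatibility is necessary but not sufficient for membership in a neutral network, which also requires the sequence to actually fold into s (for example as its minimum-free-energy structure); this sketch does not compute folding. A Hamming-distance helper is included because proximity between neutral nets is measured that way.

```python
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure: str):
    """Parse dot-bracket notation into a list of (i, j) base-pair indices."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_compatible(sequence: str, structure: str) -> bool:
    """True if every paired position can form a Watson-Crick or G-U wobble pair."""
    return len(sequence) == len(structure) and all(
        (sequence[i], sequence[j]) in ALLOWED_PAIRS for i, j in base_pairs(structure))


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


print(is_compatible("GGGAAAUCC", "(((...)))"))  # True (includes one G-U wobble pair)
```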

  8. A High Resolution Genetic Map Anchoring Scaffolds of the Sequenced Watermelon Genome

    PubMed Central

    Kou, Qinghe; Jiang, Jiao; Guo, Shaogui; Zhang, Haiying; Hou, Wenju; Zou, Xiaohua; Sun, Honghe; Gong, Guoyi; Levi, Amnon; Xu, Yong

    2012-01-01

    As part of our ongoing efforts to sequence and map the watermelon (Citrullus spp.) genome, we have constructed a high density genetic linkage map. The map positioned 234 watermelon genome sequence scaffolds (an average size of 1.41 Mb) that cover about 330 Mb and account for 93.5% of the 353 Mb of the assembled genomic sequences of the elite Chinese watermelon line 97103 (Citrullus lanatus var. lanatus). The genetic map was constructed using an F8 population of 103 recombinant inbred lines (RILs). The RILs are derived from a cross between the line 97103 and the United States Plant Introduction (PI) 296341-FR (C. lanatus var. citroides) that contains resistance to fusarium wilt (races 0, 1, and 2). The genetic map consists of eleven linkage groups that include 698 simple sequence repeat (SSR), 219 insertion-deletion (InDel) and 36 structure variation (SV) markers and spans ∼800 cM with a mean marker interval of 0.8 cM. Using fluorescent in situ hybridization (FISH) with 11 BACs that produced chromosome-specific signals, we have depicted watermelon chromosomes that correspond to the eleven linkage groups constructed in this study. The high resolution genetic map developed here should be a useful platform for the assembly of the watermelon genome, for the development of sequence-based markers used in breeding programs, and for the identification of genes associated with important agricultural traits. PMID:22247776

  9. Fractal MapReduce decomposition of sequence alignment

    PubMed Central

    2012-01-01

    Background: The dramatic fall in the cost of genomic sequencing, and the increasing convenience of distributed cloud computing resources, position the MapReduce coding pattern as a cornerstone of scalable bioinformatics algorithm development. In some cases an algorithm will find a natural distribution via use of map functions to process vectorized components, followed by a reduce of aggregate intermediate results. However, for some data analysis procedures such as sequence analysis, a more fundamental reformulation may be required. Results: In this report we describe a solution to sequence comparison that can be thoroughly decomposed into multiple rounds of map and reduce operations. The route taken makes use of iterated maps, a fractal analysis technique that has been found to provide an "alignment-free" solution to sequence analysis and comparison, that is, a solution that does not require dynamic programming but relies on a numeric Chaos Game Representation (CGR) data structure. This claim is demonstrated in this report by calculating the length of the longest similar segment by inspecting only the Universal Sequence Map (USM) coordinates of two analogous units, with no resort to dynamic programming. Conclusions: The procedure described is an attempt at extreme decomposition and parallelization of sequence alignment in anticipation of a volume of genomic sequence data that cannot be met by current algorithmic frameworks. The solution found is delivered with a browser-based application (webApp), highlighting the browser's emergence as an environment for high performance distributed computing. Availability: Public distribution of accompanying software library with open source and version control at http://usm.github.com. Also available as a webApp through Google Chrome's WebStore http://chrome.google.com/webstore: search with "usm". PMID:22551205
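    A Chaos Game Representation can be computed with a short iterated map: each base moves the current point halfway toward that base's corner of the unit square. This is a plain CGR sketch, not the USM library (which extends the idea); the corner assignment A=(0,0), C=(0,1), G=(1,1), T=(1,0) and the start point (0.5, 0.5) are assumptions. The property that enables alignment-free comparison is that two positions whose preceding n bases agree lie within 2^-n of each other in each coordinate, because every shared step halves their separation.

```python
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr(sequence: str, start=(0.5, 0.5)):
    """Chaos Game Representation: each point is the midpoint between the previous
    point and the corner assigned to the current base."""
    x, y = start
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


# Two sequences sharing the 5-base suffix "TTACA": final points differ by at most 2**-5.
p = cgr("GATTACA")[-1]
q = cgr("CCTTACA")[-1]
print(max(abs(p[0] - q[0]), abs(p[1] - q[1])) <= 2 ** -5)  # True
```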

  10. A sequence-tagged site map of human chromosome 11.

    PubMed

    Smith, M W; Clark, S P; Hutchinson, J S; Wei, Y H; Churukian, A C; Daniels, L B; Diggle, K L; Gen, M W; Romo, A J; Lin, Y

    1993-09-01

    We report the construction of 370 sequence-tagged sites (STSs) that are detectable by PCR amplification under sets of standardized conditions and that have been regionally mapped to human chromosome 11. DNA sequences were determined by sequencing directly from cosmid templates using primers complementary to T3 and T7 promoters present in the cloning vector. Oligonucleotide PCR primers were predicted by computer and tested using a battery of genomic DNAs. Cosmids were regionally localized on chromosome 11 by using fluorescence in situ hybridization or by analyzing a somatic cell hybrid panel. Additional STSs corresponding to known genes and markers on chromosome 11 were also produced under the same series of standardized conditions. The resulting STSs provide uniform coverage of chromosome 11 with an average spacing of 340 kb. The DNA sequence determined for use in STS production corresponds to about 0.1% (116 kb) of chromosome 11 and has been analyzed for the presence of repetitive sequences, similarities to known genes and motifs, and possible exons. Computer analysis of this sequence has identified and therefore mapped at least eight new genes on chromosome 11. PMID:8244387
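    Each STS is defined by a primer pair that yields a PCR product of known size. An electronic (in silico) PCR check of a primer pair against a sequence can be sketched as below. It is a simplified assumption of how such a check might work, not the authors' procedure: primers must match exactly, only the given strand is searched, each forward-primer hit is paired with the nearest downstream reverse-primer site, and `max_product` is a hypothetical size limit. The primers and template are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str, max_product: int = 1000):
    """Return (start, length) of predicted products: `forward` matches the template and the
    reverse complement of `reverse` lies downstream within `max_product` bases."""
    products = []
    rc_reverse = reverse_complement(reverse)
    start = template.find(forward)
    while start != -1:
        end = template.find(rc_reverse, start + len(forward))
        if end != -1:
            length = end + len(rc_reverse) - start
            if length <= max_product:
                products.append((start, length))
        start = template.find(forward, start + 1)
    return products


template = "TTT" + "ACGTAC" + "GGGGG" + "TGATCC" + "AAA"
print(in_silico_pcr(template, "ACGTAC", "GGATCA"))  # [(3, 17)]
```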

  11. Fast and sensitive mapping of nanopore sequencing reads with GraphMap

    PubMed Central

    Sović, Ivan; Šikić, Mile; Wilm, Andreas; Fenlon, Shannon Nicole; Chen, Swaine; Nagarajan, Niranjan

    2016-01-01

    Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads; it progressively refines candidate alignments to robustly handle potentially high error rates and uses a fast graph traversal to align long reads with speed and high precision (>95%). Evaluation on MinION sequencing data sets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by 10–80% and maps >95% of bases. GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100 bp to 4 kbp, and species and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap. PMID:27079541

  13. Physical mapping of human chromosomes by repetitive sequence fingerprinting.

    PubMed Central

    Stallings, R L; Torney, D C; Hildebrand, C E; Longmire, J L; Deaven, L L; Jett, J H; Doggett, N A; Moyzis, R K

    1990-01-01

    We have developed an approach for identifying overlapping cosmid clones by exploiting the high density of repetitive sequences in complex genomes. Individual clones are fingerprinted, using a combination of restriction enzyme digestions followed by hybridization with selected classes of repetitive sequences. This "repeat fingerprinting" technique allows small regions of clone overlap (10-20%) to be unambiguously assigned. We demonstrate the utility of this approach, using the fingerprinting of 3145 cosmid clones (1.25 x coverage), containing one or more (GT)n repeats, from human chromosome 16. A statistical analysis was used to link these clones into 460 contiguous sequences (contigs), averaging 106 kilobases (kb) in length and representing approximately 54% (48.7 Mb) of the euchromatic arms of this chromosome. These values are consistent with theoretical calculations and indicate that 150- to 200-kb contigs can be generated with 1.5 x coverage. This strategy requires the fingerprinting of approximately one-fourth as many cosmids as random strategies requiring 50% minimum overlap for overlap detection. By "nucleating" at specific regions in the human genome, and exploiting the high density of interspersed sequences, this approach allows (i) the rapid generation of large (greater than 100-kb) contigs in the early stages of contig mapping and (ii) the production of a contig map with useful landmarks for rapid integration of the genetic and physical maps. PMID:2385591
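The overlap test behind repeat fingerprinting can be illustrated with a simple band-sharing count. In this hypothetical sketch, each clone's fingerprint maps a repeat-probe class to the sizes (kb) of restriction fragments that hybridized to that probe. Two clones are called overlapping when enough bands match within a relative size tolerance. The probe labels, the 2% tolerance, and the two-band threshold are assumptions. The authors' statistical linking procedure is not reproduced here.

```python
Fingerprint = dict[str, list[float]]  # repeat-probe class -> fragment sizes (kb)


def shared_bands(a: Fingerprint, b: Fingerprint, tol: float = 0.02) -> int:
    """Count bands of `a` that match a band of `b` for the same probe.

    Sizes match when they differ by at most `tol` (relative); each band in
    `b` can be used for at most one match (greedy, smallest sizes first).
    """
    total = 0
    for probe, sizes_a in a.items():
        remaining = sorted(b.get(probe, []))
        for size in sorted(sizes_a):
            for i, other in enumerate(remaining):
                if abs(size - other) <= tol * size:
                    total += 1
                    del remaining[i]  # consume the matched band
                    break
    return total


def likely_overlap(a: Fingerprint, b: Fingerprint,
                   tol: float = 0.02, min_shared: int = 2) -> bool:
    """Hypothetical decision rule: call an overlap at >= min_shared shared bands."""
    return shared_bands(a, b, tol) >= min_shared
```

Because only repeat-bearing fragments are scored, a small overlap that happens to contain a repeat-positive band can still be detected. This is the property the abstract exploits to detect 10-20% overlaps.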

  14. Computationally mapping sequence space to understand evolutionary protein engineering.

    PubMed

    Armstrong, Kathryn A; Tidor, Bruce

    2008-01-01

    Evolutionary protein engineering has been dramatically successful, producing a wide variety of new proteins with altered stability, binding affinity, and enzymatic activity. However, the success of such procedures is often unreliable, and the impact of the choice of protein, engineering goal, and evolutionary procedure is not well understood. We have created a framework for understanding aspects of the protein engineering process by computationally mapping regions of feasible sequence space for three small proteins using structure-based design protocols. We then tested the ability of different evolutionary search strategies to explore these sequence spaces. The results point to a non-intuitive relationship between the error-prone PCR mutation rate and the number of rounds of replication. The evolutionary relationships among feasible sequences reveal hub-like sequences that serve as particularly fruitful starting sequences for evolutionary search. Moreover, genetic recombination procedures were examined, and tradeoffs relating sequence diversity and search efficiency were identified. This framework allows us to consider the impact of protein structure on the allowed sequence space and therefore on the challenges that each protein presents to error-prone PCR and genetic recombination procedures. PMID:18020358

  15. MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping.

    PubMed

    Lee, Wan-Ping; Stromberg, Michael P; Ward, Alistair; Stewart, Chip; Garrison, Erik P; Marth, Gabor T

    2014-01-01

    MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific Biosciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK assigned and actual mapping qualities greater than 0.98. In order to ensure that studies of any genome are supported, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me). PMID:24599324
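MOSAIK is described as coupling hash-based clustering with the Smith-Waterman algorithm. The sketch below shows only the Smith-Waterman local-alignment score, using a linear gap penalty. The match, mismatch, and gap values are illustrative assumptions, not MOSAIK's parameters. Production aligners typically use affine gap costs and restrict the dynamic-programming matrix to the candidate region found by hashing.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Best local alignment score of `a` vs `b` (linear gap penalty).

    Keeps only the previous row of the DP matrix, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The zero floor lets an alignment start anywhere, so a read fragment embedded in a longer reference window scores as if aligned in isolation.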

  16. DOE project on genome mapping and sequencing. Progress report, 1992

    SciTech Connect

    Evans, G.A.

    1992-12-31

    These efforts on the human genome project were initiated in September, 1990, to contribute towards completion of the human genome project physical mapping effort. In the original application, the authors proposed a novel strategy for constructing a physical map of human chromosome 11, based upon techniques derived in this group and by others. The original goals were to (1) produce a set of cosmid reference clones mapped to specific sites by high resolution fluorescence in situ hybridization, (2) produce a set of associated STS sequences and PCR primers for each site, (3) isolate YAC clones corresponding to each STS and, (4) construct YAC contigs such that > 90% of the chromosome would be covered by contigs of 2 Mb or greater. Since that time, and with the advent of new technology and reagents, the strategy has been modified slightly but still retains the same goals as originally proposed. The authors have added a project to produce chromosome 11-specific cDNAs and determine the map location and DNA sequence of a selected portion of them.

  17. Mapping DNA polymerase errors by single-molecule sequencing.

    PubMed

    Lee, David F; Lu, Jenny; Chang, Seungwoo; Loparo, Joseph J; Xie, Xiaoliang S

    2016-07-27

    Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replication product is tagged with a unique nucleotide sequence before amplification. This allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases. PMID:27185891
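The barcoding idea can be sketched as follows: reads are grouped by barcode, each family is collapsed to a per-position majority consensus, and only disagreements that survive collapsing are treated as replication errors. Random sequencing errors are outvoted within a family. The minimum family size, the 60% majority threshold, and the function names are hypothetical, and reads are assumed to be pre-aligned and of equal length. This is not the authors' pipeline.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_family_size=3, min_fraction=0.6):
    """Collapse (barcode, sequence) reads into one consensus per barcode.

    Positions without a clear majority become 'N'. Families smaller than
    min_family_size are dropped because they cannot outvote errors.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)
    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue
        bases = []
        for column in zip(*seqs):  # one column per aligned position
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_fraction else "N")
        consensus[barcode] = "".join(bases)
    return consensus


def replication_errors(consensus, template):
    """Per barcode, list (position, template base, observed base) mismatches."""
    return {
        barcode: [(i, t, c) for i, (t, c) in enumerate(zip(template, seq))
                  if c != "N" and c != t]
        for barcode, seq in consensus.items()
    }
```

A mismatch seen in only one read of a family is treated as a sequencing error. A mismatch shared by the whole family was present before amplification and is reported as a candidate polymerase error.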

  18. Mapping of aldose reductase gene sequences to human chromosomes 1, 3, 7, 9, 11, and 13

    SciTech Connect

    Bateman, J.B.; Kojis, T. (UCLA School of Medicine, Los Angeles, CA); Heinzmann, C.; Sparkes, R.S.; Klisak, I.; Diep, A.; Carper, D.; Nishimura, Chihiro; Mohandas, T.

    1993-09-01

    Aldose reductase (alditol:NAD(P)+ 1-oxidoreductase; EC 1.1.1.21) (AR) catalyzes the reduction of several aldehydes, including that of glucose, to the corresponding sugar alcohol. Using a complementary DNA clone encoding human AR, the authors mapped the gene sequences to human chromosomes 1, 3, 7, 9, 11, 13, 14, and 18 by somatic cell hybridization. By in situ hybridization analysis, sequences were localized to human chromosomes 1q32-q43, 3p12, 7q31-q35, 9q22, 11p14-p15, and 13q14-q21. As a putative functional AR gene has been mapped to chromosome 7 and a putative pseudogene to chromosome 3, the sequences on the other seven chromosomes may represent other active genes, non-aldose reductase homologous sequences, or pseudogenes. 24 refs., 3 figs., 2 tabs.

  19. Sequence, molecular properties, and chromosomal mapping of mouse lumican

    NASA Technical Reports Server (NTRS)

    Funderburgh, J. L.; Funderburgh, M. L.; Hevelone, N. D.; Stech, M. E.; Justice, M. J.; Liu, C. Y.; Kao, W. W.; Conrad, G. W.; Spooner, B. S. (Principal Investigator)

    1995-01-01

    PURPOSE. Lumican is a major proteoglycan of vertebrate cornea. This study characterizes mouse lumican, its molecular form, cDNA sequence, and chromosomal localization. METHODS. Lumican sequence was determined from cDNA clones selected from a mouse corneal cDNA expression library using a bovine lumican cDNA probe. Tissue expression and size of lumican mRNA were determined using Northern hybridization. Glycosidase digestion followed by Western blot analysis provided characterization of molecular properties of purified mouse corneal lumican. Chromosomal mapping of the lumican gene (Lcn) used Southern hybridization of a panel of genomic DNAs from an interspecific murine backcross. RESULTS. Mouse lumican is a 338-amino acid protein with high-sequence identity to bovine and chicken lumican proteins. The N-terminus of the lumican protein contains consensus sequences for tyrosine sulfation. A 1.9-kb lumican mRNA is present in cornea and several other tissues. Antibody against bovine lumican reacted with recombinant mouse lumican expressed in Escherichia coli and also detected high molecular weight proteoglycans in extracts of mouse cornea. Keratanase digestion of corneal proteoglycans released lumican protein, demonstrating the presence of sulfated keratan sulfate chains on mouse corneal lumican in vivo. The lumican gene (Lcn) was mapped to the distal region of mouse chromosome 10. The Lcn map site is in the region of a previously identified developmental mutant, eye blebs, affecting corneal morphology. CONCLUSIONS. This study demonstrates sulfated keratan sulfate proteoglycan in mouse cornea and describes the tools (antibodies and cDNA) necessary to investigate the functional role of this important corneal molecule using naturally occurring and induced mutants of the murine lumican gene.

  20. Mapping by sequencing the Pneumocystis genome using the ordering DNA sequences V3 tool.

    PubMed

    Xu, Zheng; Lance, Britton; Vargas, Claudia; Arpinar, Budak; Bhandarkar, Suchendra; Kraemer, Eileen; Kochut, Krys J; Miller, John A; Wagner, Jeff R; Weise, Michael J; Wunderlich, John K; Stringer, James; Smulian, George; Cushion, Melanie T; Arnold, Jonathan

    2003-04-01

    A bioinformatics tool called ODS3 has been created for mapping by sequencing. The tool allows the creation of integrated genomic maps from genetic, physical mapping, and sequencing data and permits an integrated genome map to be stored, retrieved, viewed, and queried in a stand-alone capacity, in a client/server relationship with the Fungal Genome Database (FGDB), and as a web-browsing tool for the FGDB. In that ODS3 is programmed in Java, the tool promotes platform independence and supports export of integrated genome-mapping data in the extensible markup language (XML) for data interchange with other genome information systems. The tool ODS3 is used to create an initial integrated genome map of the AIDS-related fungal pathogen, Pneumocystis carinii. Contig dynamics would indicate that this physical map is approximately 50% complete with approximately 200 contigs. A total of 10 putative multigene families were found. Two of these putative families were previously characterized in P. carinii, namely the major surface glycoproteins (MSGs) and HSP70 proteins; three of these putative families (not previously characterized in P. carinii) were found to be similar to families encoding the HSP60 in Schizosaccharomyces pombe, the heat-shock psi protein in S. pombe, and the RNA synthetase family (i.e., MES1) in Saccharomyces cerevisiae. Physical mapping data are consistent with the 16S, 5.8S, and 26S rDNA genes being single copy in P. carinii. No other fungus outside this genus is known to have the rDNA genes in single copy. PMID:12702676

  1. Random-breakage mapping method applied to human DNA sequences

    NASA Technical Reports Server (NTRS)

    Lobrich, M.; Rydberg, B.; Cooper, P. K.; Chatterjee, A. (Principal Investigator)

    1996-01-01

    The random-breakage mapping method [Game et al. (1990) Nucleic Acids Res., 18, 4453-4461] was applied to DNA sequences in human fibroblasts. The methodology involves NotI restriction endonuclease digestion of DNA from irradiated cells, followed by pulsed-field gel electrophoresis, Southern blotting and hybridization with DNA probes recognizing the single copy sequences of interest. The Southern blots show a band for the unbroken restriction fragments and a smear below this band due to radiation induced random breaks. This smear pattern contains two discontinuities in intensity at positions that correspond to the distance of the hybridization site to each end of the restriction fragment. By analyzing the positions of those discontinuities we confirmed the previously mapped position of the probe DXS1327 within a NotI fragment on the X chromosome, thus demonstrating the validity of the technique. We were also able to position the probes D21S1 and D21S15 with respect to the ends of their corresponding NotI fragments on chromosome 21. A third chromosome 21 probe, D21S11, has previously been reported to be close to D21S1, although an uncertainty about a second possible location existed. Since both probes D21S1 and D21S11 hybridized to a single NotI fragment and yielded a similar smear pattern, this uncertainty is removed by the random-breakage mapping method.
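The origin of the two discontinuities can be checked with a simplified model in which each molecule receives exactly one break. This is an assumption: irradiated DNA can carry several breaks per fragment. Take a NotI fragment of length L with the probe at distance p from one end. A break between that end and the probe leaves a probe-bearing piece of length between L - p and L. A break on the other side leaves a piece of length between p and L. The smear should therefore change intensity at p and at L - p. Evenly spaced breaks stand in for uniformly random ones, and the function names are hypothetical.

```python
def probe_fragment_lengths(fragment_len, probe_pos, n_molecules):
    """Length of the probe-bearing piece, one break per molecule.

    Break positions are spaced evenly along the fragment as a deterministic
    stand-in for uniformly random breaks.
    """
    lengths = []
    for i in range(n_molecules):
        b = (i + 0.5) * fragment_len / n_molecules  # break coordinate
        if b < probe_pos:
            lengths.append(fragment_len - b)  # probe stays on the right-hand piece
        elif b > probe_pos:
            lengths.append(b)                 # probe stays on the left-hand piece
    return lengths


def expected_steps(fragment_len, probe_pos):
    """Sizes at which the smear intensity should jump under this model."""
    return sorted((probe_pos, fragment_len - probe_pos))
```

Below min(p, L - p) there is no signal. Between the two step positions only breaks on one side contribute, and above max(p, L - p) breaks on both sides do. In this model the two steps give the probe's distances to the fragment ends but not which end each distance refers to.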

  2. Functional mapping of sequence learning in normal humans.

    PubMed

    Grafton, S T; Hazeltine, E; Ivry, R

    1995-01-01

    The brain localization of motor sequence learning was studied in normal subjects with positron emission tomography. Subjects performed a serial reaction time (SRT) task by responding to a series of stimuli that occurred at four different spatial positions. The stimulus locations were either determined randomly or according to a 6-element sequence that cycled continuously. The SRT task was performed under two conditions. With attentional interference from a secondary counting task there was no development of awareness of the sequence. Learning-related increases of cerebral blood flow were located in contralateral motor effector areas including motor cortex, supplementary motor area, and putamen, consistent with the hypothesis that nondeclarative motor learning occurs in cerebral areas that control limb movements. Additional cortical sites included the rostral prefrontal cortex and parietal cortex. The SRT learning task was then repeated with a new sequence and no attentional interference. In this condition, 7 of 12 subjects developed awareness of the sequence. Learning-related blood flow increases were present in right dorsolateral prefrontal cortex, right premotor cortex, right ventral putamen, and biparieto-occipital cortex. The right dorsolateral prefrontal and parietal areas have been previously implicated in spatial working memory and right prefrontal cortex is also implicated in retrieval tasks of verbal episodic memory. Awareness of the sequence at the end of learning was associated with greater activity in bilateral parietal, superior temporal, and right premotor cortex. Motor learning can take place in different cerebral areas, contingent on the attentional demands of the task. PMID:23961907

  3. An autotetraploid linkage map of rose (Rosa hybrida) validated using the strawberry (Fragaria vesca) genome sequence.

    PubMed

    Gar, Oron; Sargent, Daniel J; Tsai, Ching-Jung; Pleban, Tzili; Shalev, Gil; Byrne, David H; Zamir, Dani

    2011-01-01

    Polyploidy is a pivotal process in plant evolution, as it increases gene redundancy and morphological intricacy, but due to the complexity of polysomic inheritance there are only a few genetic maps of autopolyploid organisms. A robust mapping framework is particularly important in polyploid crop species, rose included (2n = 4x = 28), where the objective is to study multiallelic interactions that control traits of value for plant breeding. From a cross between the peach-red, fragrant garden cultivar Fragrant Cloud (FC) and the yellow cut-rose cultivar Golden Gate (GG), we generated an autotetraploid GGFC mapping population consisting of 132 individuals. For the map we used 128 sequence-based markers, 141 AFLP, 86 SSR and three morphological markers. Seven linkage groups were resolved for FC (total 632 cM) and GG (616 cM), which were validated by markers that segregated in both parents as well as the diploid integrated consensus map. The release of the Fragaria vesca genome, which also belongs to the Rosoideae, allowed us to place 70 sequenced rose markers on the seven strawberry pseudo-chromosomes. Synteny between Rosa and Fragaria was high, with an estimated four major translocations and six inversions required to place the 17 non-collinear markers in the same order. Based on a verified linear order of the rose markers, we could further partition each of the parents into its four homologous groups, thus providing an essential framework to aid the sequencing of an autotetraploid genome. PMID:21647382
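One way to flag non-collinear markers is offered here as an illustration, not as the authors' method. List the markers in rose linkage-map order, take their positions on the matching strawberry pseudo-chromosome, and keep a longest increasing subsequence. Markers outside that subsequence are candidates for rearrangements. The sketch considers only one orientation (a reversed block would need the negated positions) and does not classify events as translocations or inversions. The function names are hypothetical.

```python
from bisect import bisect_left


def collinear_subset(positions: list[float]) -> list[int]:
    """Indices of a longest strictly increasing subsequence of `positions`.

    O(n log n) patience-sorting method with parent links for reconstruction.
    """
    tail_vals: list[float] = []  # smallest tail value of an increasing run of each length
    tail_idx: list[int] = []     # index into `positions` of that tail
    parent = [-1] * len(positions)
    for i, x in enumerate(positions):
        k = bisect_left(tail_vals, x)
        if k > 0:
            parent[i] = tail_idx[k - 1]
        if k == len(tail_vals):
            tail_vals.append(x)
            tail_idx.append(i)
        else:
            tail_vals[k] = x
            tail_idx[k] = i
    result = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:  # walk parent links back from the last element
        result.append(i)
        i = parent[i]
    return result[::-1]


def non_collinear(positions: list[float]) -> list[int]:
    """Marker indices that fall outside the collinear backbone."""
    keep = set(collinear_subset(positions))
    return [i for i in range(len(positions)) if i not in keep]
```

For example, positions [1, 5, 2, 3, 9, 4] keep markers 0, 2, 3 and 5 as collinear and flag markers 1 and 4.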

  4. BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes.

    PubMed

    Staňková, Helena; Hastie, Alex R; Chan, Saki; Vrána, Jan; Tulpová, Zuzana; Kubaláková, Marie; Visendi, Paul; Hayashi, Satomi; Luo, Mingcheng; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana

    2016-07-01

    The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. PMID:26801360
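N50, the statistic quoted for the 7DS genome map, is the length L such that contigs of length L or longer together contain at least half of the total length. A short sketch follows; the contig lengths in the usage line are made up for illustration.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that contigs >= L cover at least half the total length."""
    if not lengths:
        raise ValueError("n50 of an empty list is undefined")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    raise AssertionError("unreachable: the loop always reaches half the total")
```

For hypothetical contigs of 2, 3, 4, 5 and 6 Mb (total 20 Mb), the two longest contigs already cover 11 Mb, so the N50 is 5 Mb.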

  5. Single-Molecule Real-Time Sequencing Combined with Optical Mapping Yields Completely Finished Fungal Genome

    PubMed Central

    Faino, Luigi; Seidl, Michael F.; Datema, Erwin; van den Berg, Grardy C. M.; Janssen, Antoine; Wittenberg, Alexander H. J.

    2015-01-01

    Next-generation sequencing (NGS) technologies have increased the scalability, speed, and resolution of genomic sequencing and, thus, have revolutionized genomic studies. However, eukaryotic genome sequencing initiatives typically yield considerably fragmented genome assemblies. Here, we assessed various state-of-the-art sequencing and assembly strategies in order to produce a contiguous and complete eukaryotic genome assembly, focusing on the filamentous fungus Verticillium dahliae. Compared with Illumina-based assemblies of the V. dahliae genome, hybrid assemblies that also include PacBio-generated long reads establish superior contiguity. Intriguingly, provided that sufficient sequence depth is reached, assemblies solely based on PacBio reads outperform hybrid assemblies and even result in fully assembled chromosomes. Furthermore, the addition of optical map data allowed us to produce a gapless and complete V. dahliae genome assembly of the expected eight chromosomes from telomere to telomere. Consequently, we can now study genomic regions that were previously not assembled or poorly assembled, including regions that are populated by repetitive sequences, such as transposons, allowing us to fully appreciate an organism’s biological complexity. Our data show that a combination of PacBio-generated long reads and optical mapping can be used to generate complete and gapless assemblies of fungal genomes. PMID:26286689

  6. A physical map of the highly heterozygous Populus genome: integration with the genome sequence and genetic map

    SciTech Connect

    Kelleher, Colin; Chiu, R.; Shin, H.; Krywinski, Martin; Fjell, Chris; Wilkin, Jennifer; Yin, Tongming; Difazio, Stephen P.

    2007-01-01

    As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the few such maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2802 contigs, was constructed from fingerprinted bacterial artificial chromosome (BAC) clones. The map represents approximately 9.4-fold coverage of the Populus genome, which has been estimated from the genome sequence assembly to be 485 ± 10 Mb in size. BAC ends were sequenced to assist long-range assembly of whole-genome shotgun sequence scaffolds and to anchor the physical map to the genome sequence. Simple sequence repeat-based markers were derived from the end sequences and used to initiate integration of the BAC and genetic maps. A total of 2411 physical map contigs, representing 97% of all clones assigned to contigs, were aligned to the sequence assembly (JGI Populus trichocarpa, version 1.0). These alignments represent a total coverage of 384 Mb (79%) of the entire poplar sequence assembly and 295 Mb (96%) of linkage group sequence assemblies. A striking result of the physical map contig alignments to the sequence assembly was the co-localization of multiple contigs across numerous regions of the 19 linkage groups. Targeted sequencing of BAC clones and genetic analysis in a small number of representative regions showed that these co-aligning contigs represent distinct haplotypes in the heterozygous individual sequenced, and revealed the nature of these haplotype sequence differences.

  7. A high resolution genetic map anchoring scaffolds of the sequenced watermelon genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As part of our ongoing efforts to sequence and map the watermelon (Citrullus spp.) genome, we have constructed a high-density genetic linkage map. The map positioned 234 watermelon genome sequence scaffolds (an average size of 1.41 Mb) that cover about 330 Mb and account for 93.5% of the 353 Mb of ...

  8. Rapid multipoint linkage analysis of recessive traits in nuclear families, including homozygosity mapping

    SciTech Connect

    Kruglyak, L.; Daly, M.J.; Lander, E.S.

    1995-02-01

    Homozygosity mapping is a powerful strategy for mapping rare recessive traits in children of consanguineous marriages. Practical applications of this strategy are currently limited by the inability of conventional linkage analysis software to compute, in reasonable time, multipoint LOD scores for pedigrees with inbreeding loops. We have developed a new algorithm for rapid multipoint likelihood calculations in small pedigrees, including those with inbreeding loops. The running time of the algorithm grows, at most, linearly with the number of loci considered simultaneously. The running time is not sensitive to the presence of inbreeding loops, missing genotype information, and highly polymorphic loci. We have incorporated this algorithm into a software package, MAPMAKER/HOMOZ, that allows very rapid multipoint mapping of disease genes in nuclear families, including homozygosity mapping. Multipoint analysis with dozens of markers can be carried out in minutes on a personal workstation. 23 refs., 4 figs., 1 tab.
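The algorithm in the abstract handles general small pedigrees. The sketch below shows only the core idea of homozygosity mapping in its simplest form: a two-state hidden Markov model over ordered markers for one child of a consanguineous mating, with the states autozygous (IBD) and not autozygous. A forward recursion visits each marker once, so its cost grows linearly with the number of markers. Several simplifications are assumed: a uniform per-interval switch probability `t`, no genotyping error, Hardy-Weinberg genotype frequencies, and no numerical scaling (so very long marker maps would underflow). This is not the MAPMAKER/HOMOZ implementation, and all names are hypothetical.

```python
import math


def genotype_emission(a1, a2, freqs):
    """(P(genotype | autozygous), P(genotype | not autozygous))."""
    if a1 == a2:
        p = freqs[a1]
        return (p, p * p)
    return (0.0, 2 * freqs[a1] * freqs[a2])  # autozygous implies homozygous here


def forward_likelihood(emissions, f, t, forced_ibd_at=None):
    """P(all genotypes), or P(genotypes, autozygous at marker forced_ibd_at).

    emissions: per-marker (P(obs | IBD), P(obs | not IBD)) in map order.
    f: prior probability of autozygosity (inbreeding coefficient).
    t: per-interval probability that the state is redrawn from (f, 1 - f).
    """
    a_ibd = f * emissions[0][0]
    a_not = (1 - f) * emissions[0][1]
    if forced_ibd_at == 0:
        a_not = 0.0
    for i in range(1, len(emissions)):
        e_ibd, e_not = emissions[i]
        new_ibd = (a_ibd * (1 - t + t * f) + a_not * t * f) * e_ibd
        new_not = (a_ibd * t * (1 - f) + a_not * (1 - t * f)) * e_not
        if forced_ibd_at == i:
            new_not = 0.0  # condition on autozygosity at the candidate locus
        a_ibd, a_not = new_ibd, new_not
    return a_ibd + a_not


def homozygosity_lod(emissions, f, t, i):
    """log10 of P(data | autozygous at marker i) / P(data)."""
    joint = forward_likelihood(emissions, f, t, forced_ibd_at=i)
    if joint == 0.0:
        return -math.inf  # e.g. a heterozygous genotype at marker i
    return math.log10(joint / (f * forward_likelihood(emissions, f, t)))
```

A single homozygous marker with allele frequency 0.1 in a first-cousin offspring (f = 1/16) gives a LOD of log10(6.4), about 0.81. Adding a second, tightly linked homozygous marker raises the score, which is the multipoint effect the abstract is concerned with.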

  9. Linkage maps of the Atlantic salmon (Salmo salar) genome derived from RAD sequencing

    PubMed Central

    2014-01-01

    Background Genetic linkage maps are useful tools for mapping quantitative trait loci (QTL) influencing variation in traits of interest in a population. Genotyping-by-sequencing approaches such as Restriction-site Associated DNA sequencing (RAD-Seq) now enable the rapid discovery and genotyping of genome-wide SNP markers suitable for the development of dense SNP linkage maps, including in non-model organisms such as Atlantic salmon (Salmo salar). This paper describes the development and characterisation of a high density SNP linkage map based on SbfI RAD-Seq SNP markers from two Atlantic salmon reference families. Results Approximately 6,000 SNPs were assigned to 29 linkage groups, utilising markers from known genomic locations as anchors. Linkage maps were then constructed for the four mapping parents separately. Overall map lengths were comparable between male and female parents, but the distribution of the SNPs showed sex-specific patterns with a greater degree of clustering of sire-segregating SNPs to single chromosome regions. The maps were integrated with the Atlantic salmon draft reference genome contigs, allowing the unique assignment of ~4,000 contigs to a linkage group. 112 genome contigs mapped to two or more linkage groups, highlighting regions of putative homeology within the salmon genome. A comparative genomics analysis with the stickleback reference genome identified putative genes closely linked to approximately half of the ordered SNPs and demonstrated blocks of orthology between the Atlantic salmon and stickleback genomes. A subset of 47 RAD-Seq SNPs were successfully validated using a high-throughput genotyping assay, with a correspondence of 97% between the two assays. Conclusions This Atlantic salmon RAD-Seq linkage map is a resource for salmonid genomics research as genotyping-by-sequencing becomes increasingly common. This is aided by the integration of the SbfI RAD-Seq SNPs with existing reference maps and the draft reference genome, as well

  10. A high-resolution radiation hybrid map of the human genome draft sequence.

    PubMed

    Olivier, M; Aggarwal, A; Allen, J; Almendras, A A; Bajorek, E S; Beasley, E M; Brady, S D; Bushard, J M; Bustos, V I; Chu, A; Chung, T R; De Witte, A; Denys, M E; Dominguez, R; Fang, N Y; Foster, B D; Freudenberg, R W; Hadley, D; Hamilton, L R; Jeffrey, T J; Kelly, L; Lazzeroni, L; Levy, M R; Lewis, S C; Liu, X; Lopez, F J; Louie, B; Marquis, J P; Martinez, R A; Matsuura, M K; Misherghi, N S; Norton, J A; Olshen, A; Perkins, S M; Perou, A J; Piercy, C; Piercy, M; Qin, F; Reif, T; Sheppard, K; Shokoohi, V; Smick, G A; Sun, W L; Stewart, E A; Fernando, J; Tejeda; Tran, N M; Trejo, T; Vo, N T; Yan, S C; Zierten, D L; Zhao, S; Sachidanandam, R; Trask, B J; Myers, R M; Cox, D R

    2001-02-16

    We have constructed a physical map of the human genome by using a panel of 90 whole-genome radiation hybrids (the TNG panel) in conjunction with 40,322 sequence-tagged sites (STSs) derived from random genomic sequences as well as expressed sequences. Of 36,678 STSs on the TNG radiation hybrid map, only 3604 (9.8%) were absent from the unassembled draft sequence of the human genome. Of 20,030 STSs ordered on the TNG map as well as the assembled human genome draft sequence and the Celera assembled human genome sequence, 36% of the STSs had a discrepant order between the working draft sequence and the Celera sequence. The TNG map order was identical to one of the two sequence orders in 60% of these discrepant cases. PMID:11181994

  11. Mapping the mosaic sequence of primate visual cortical development

    PubMed Central

    Mundinano, Inaki-Carril; Kwan, William Chin; Bourne, James A.

    2015-01-01

    Traditional “textbook” theory suggests that the development and maturation of visual cortical areas occur as a wave from V1. However, more recent evidence suggests that this is not the case, and that extrastriate areas emerge in a non-hierarchical fashion. This proposition comes from both physiological and anatomical studies, but the actual developmental sequence of extrastriate areas remains unknown. In the current study, we examined the development and maturation of the visual cortex of the marmoset monkey, a New World simian, from embryonic day 130 (15 days prior to birth) through to adulthood. Utilizing the well-described expression characteristics of the calcium-binding proteins calbindin and parvalbumin, and nonphosphorylated neurofilament for the pyramidal neurons, we were able to accurately map the sequence of development and maturation of the visual cortex. To this end, we demonstrated that both V1 and middle temporal area (MT) emerge first and that MT likely supports dorsal stream development while V1 supports ventral stream development. Furthermore, the emergence of the dorsal stream-associated areas was significantly earlier than ventral stream areas. The difference in the temporal development of the visual streams is likely driven by a teleological requirement for specific visual behavior in early life. PMID:26539084

  12. Comparative mapping of expressed sequence tags containing microsatellites in rainbow trout (Oncorhynchus mykiss)

    PubMed Central

    Rexroad, Caird E; Rodriguez, Maria F; Coulibaly, Issa; Gharbi, Karim; Danzmann, Roy G; DeKoning, Jenefer; Phillips, Ruth; Palti, Yniv

    2005-01-01

    Background Comparative genomics, through the integration of genetic maps from species of interest with whole genome sequences of other species, will facilitate the identification of genes affecting phenotypes of interest. The development of microsatellite markers from expressed sequence tags will serve to increase marker densities on current salmonid genetic maps and initiate in silico comparative maps with species whose genomes have been fully sequenced. Results Eighty-nine polymorphic microsatellite markers were generated for rainbow trout, of which at least 74 amplify in other salmonids. Fifty-five have been associated with functional annotation and 30 were mapped on existing genetic maps. Homologous sequences were identified for 20 of the microsatellite-containing ESTs, providing comparative assignments within the tetraodon, mouse, and/or human genomes. Conclusion The addition of microsatellite markers constructed from expressed sequence tag data will facilitate the development of high-density genetic maps for rainbow trout and comparative maps with other salmonids and better studied species. PMID:15836796

  13. A sequence-based genetic map of Medicago truncatula and comparison of marker colinearity with M. sativa.

    PubMed Central

    Choi, Hong-Kyu; Kim, Dongjin; Uhm, Taesik; Limpens, Eric; Lim, Hyunju; Mun, Jeong-Hwan; Kalo, Peter; Penmetsa, R Varma; Seres, Andrea; Kulikova, Olga; Roe, Bruce A; Bisseling, Ton; Kiss, Gyorgy B; Cook, Douglas R

    2004-01-01

    A core genetic map of the legume Medicago truncatula has been established by analyzing the segregation of 288 sequence-characterized genetic markers in an F(2) population composed of 93 individuals. These molecular markers correspond to 141 ESTs, 80 BAC end sequence tags, and 67 resistance gene analogs, covering 513 cM. In the case of EST-based markers we used an intron-targeted marker strategy with primers designed to anneal in conserved exon regions and to amplify across intron regions. Polymorphisms were significantly more frequent in intron vs. exon regions, thus providing an efficient mechanism to map transcribed genes. Genetic and cytogenetic analysis produced eight well-resolved linkage groups, which have been previously correlated with eight chromosomes by means of FISH with mapped BAC clones. We anticipated that mapping of conserved coding regions would have utility for comparative mapping among legumes; thus 60 of the EST-based primer pairs were designed to amplify orthologous sequences across a range of legume species. As an initial test of this strategy, we used primers designed against M. truncatula exon sequences to rapidly map genes in M. sativa. The resulting comparative map, which includes 68 bridging markers, indicates that the two Medicago genomes are highly similar and establishes the basis for a Medicago composite map. PMID:15082563

  14. Investigating bisulfite short-read mapping failure with hairpin bisulfite sequencing data

    PubMed Central

    2015-01-01

    Background DNA methylation is an important epigenetic mark relevant to normal development and disease genesis. A common approach to characterizing genome-wide DNA methylation is to use Next Generation Sequencing technology to sequence bisulfite-treated DNA. The short sequence reads are mapped to the reference genome to determine the methylation status of each cytosine. However, despite intense effort, a much smaller proportion of reads derived from bisulfite-treated DNA can be mapped (usually about 40-80%) than in regular short-read mapping (> 90%), and it is unclear what factors lead to this low mapping efficiency. Results To address this issue, we used hairpin bisulfite sequencing technology to determine the sequences of both strands of the DNA double helix simultaneously. This enabled recovery of the original, non-bisulfite-converted sequences. We used Bismark for bisulfite read mapping and Bowtie2 for recovered read mapping. We found that recovering the reads improved unique mapping efficiency by 9-10% compared with the bisulfite reads. This improvement in mapping efficiency is related to sequence entropy. Conclusions The hairpin recovery technique improves mapping efficiency, and sequence entropy relates to mapping efficiency. PMID:26576456
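    The abstract links mapping efficiency to sequence entropy but does not state which entropy measure was used. As a minimal sketch, assuming per-read Shannon entropy over single-nucleotide composition (an illustrative choice, not necessarily the authors'), the Python below shows why bisulfite conversion tends to lower read entropy: turning every unmethylated C into T pushes a four-letter read toward a three-letter alphabet, leaving fewer distinguishing positions for an aligner. The helper names `shannon_entropy` and `bisulfite_convert` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per base, of the nucleotide composition of seq."""
    if not seq:
        return 0.0
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def bisulfite_convert(seq: str) -> str:
    """Idealized bisulfite conversion of a fully unmethylated strand: every C reads as T.
    Simplification: methylated Cs, which would remain C, are ignored."""
    return seq.replace("C", "T")


read = "ACGTACGTACGTACGT"
print(shannon_entropy(read))                     # 2.0 (uniform over A, C, G, T)
print(shannon_entropy(bisulfite_convert(read)))  # 1.5 (only A, G, T remain)
```

    Composition entropy is only one proxy; entropy over k-mers or a reference-wide measure of uniqueness may track mapping failure more closely.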

  15. Integrated genome sequence and linkage map of physic nut (Jatropha curcas L.), a biodiesel plant.

    PubMed

    Wu, Pingzhi; Zhou, Changpin; Cheng, Shifeng; Wu, Zhenying; Lu, Wenjia; Han, Jinli; Chen, Yanbo; Chen, Yan; Ni, Peixiang; Wang, Ying; Xu, Xun; Huang, Ying; Song, Chi; Wang, Zhiwen; Shi, Nan; Zhang, Xudong; Fang, Xiaohua; Yang, Qing; Jiang, Huawu; Chen, Yaping; Li, Meiru; Wang, Ying; Chen, Fan; Wang, Jun; Wu, Guojiang

    2015-03-01

    The family Euphorbiaceae includes some of the most efficient biomass accumulators. Whole genome sequencing and the development of genetic maps of these species are important components in molecular breeding and genetic improvement. Here we report the draft genome of physic nut (Jatropha curcas L.), a biodiesel plant. The assembled genome has a total length of 320.5 Mbp and contains 27,172 putative protein-coding genes. We established a linkage map containing 1208 markers and anchored 81.7% of the genome assembly to this map to produce 11 pseudochromosomes. After gene family clustering, 15,268 families were identified, of which 13,887 existed in the castor bean genome. Analysis of the genome highlighted specific expansion and contraction of a number of gene families during the evolution of this species, including the ribosome-inactivating proteins and oil biosynthesis pathway enzymes. The genomic sequence and linkage map provide a valuable resource not only for fundamental and applied research on physic nut but also for evolutionary and comparative genomics analysis, particularly in the Euphorbiaceae. PMID:25603894

  16. Chromosome mapping of repetitive sequences in four Serrasalmidae species (Characiformes)

    PubMed Central

    Ribeiro, Leila Braga; Matoso, Daniele Aparecida; Feldberg, Eliana

    2014-01-01

    The Serrasalmidae family is composed of a number of commercially interesting species, mainly in the Amazon region where most of these fishes occur. In the present study, we investigated the genomic organization of the 18S and 5S rDNA and telomeric sequences in mitotic chromosomes of four species from the basal clade of the Serrasalmidae family: Colossoma macropomum, Mylossoma aureum, M. duriventre, and Piaractus mesopotamicus, in order to understand the chromosomal evolution in the family. All the species studied had diploid numbers 2n = 54 and exclusively biarmed chromosomes, but variations of the karyotypic formulas were observed. C-banding resulted in similar patterns among the analyzed species, with heterochromatic blocks mainly present in centromeric regions. The 18S rDNA mapping of C. macropomum and P. mesopotamicus revealed multiple sites of this gene; 5S rDNA sites were detected in two chromosome pairs in all species, although not all of them were homeologs. Hybridization with a telomeric probe revealed signals in the terminal portions of chromosomes in all the species and an interstitial signal was observed in one pair of C. macropomum. PMID:24688290

  17. The Cancer Experience Map: An Approach to Including the Patient Voice in Supportive Care Solutions

    PubMed Central

    2015-01-01

    The perspective of the patient, also called the “patient voice”, is an essential element in materials created for cancer supportive care. Identifying that voice, however, can be a challenge for researchers and developers. A multidisciplinary team at a health information company tasked with addressing this issue created a representational model they call the “cancer experience map”. This map, designed as a tool for content developers, offers a window into the complex perspectives inside the cancer experience. Informed by actual patient quotes, the map shows common overall themes for cancer patients, concerns at key treatment points, strategies for patient engagement, and targeted behavioral goals. In this article, the team members share the process by which they created the map as well as its first use as a resource for cancer support videos. The article also addresses the broader policy implications of including the patient voice in supportive cancer content, particularly with regard to mHealth apps. PMID:26022846

  18. HetMappS: Heterozygous mapping strategy for high resolution Genotyping-by-Sequencing Markers

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Reduced representation genotyping approaches, such as genotyping-by-sequencing (GBS), provide opportunities to generate high-resolution genetic maps at a low per-sample cost. However, missing data and non-uniform sequence coverage can complicate map creation in highly heterozygous species. To facili...

  19. Sequencing the Pig Genome Using a Mapped BAC by BAC Approach

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We have generated a highly contiguous physical map covering >98% of the pig genome in just 176 contigs. The map is localised to the genome through integration with the UIUC RH map as well as BAC end sequence alignments to the human genome. Over 265k HindIII restriction digest fingerprints totalling 1...

  20. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing

    PubMed Central

    Imielinski, Marcin; Berger, Alice H.; Hammerman, Peter S.; Hernandez, Bryan; Pugh, Trevor J.; Hodis, Eran; Cho, Jeonghee; Suh, James; Capelletti, Marzia; Sivachenko, Andrey; Sougnez, Carrie; Auclair, Daniel; Lawrence, Michael; Stojanov, Petar; Cibulskis, Kristian; Choi, Kyusam; de Waal, Luc; Sharifnia, Tanaz; Brooks, Angela; Greulich, Heidi; Banerji, Shantanu; Zander, Thomas; Seidel, Danila; Leenders, Frauke; Ansén, Sascha; Ludwig, Corinna; Engel-Riedel, Walburga; Stoelben, Erich; Wolf, Jürgen; Goparju, Chandra; Thompson, Kristin; Winckler, Wendy; Kwiatkowski, David; Johnson, Bruce E.; Jänne, Pasi A.; Miller, Vincent A.; Pao, William; Travis, William D.; Pass, Harvey; Gabriel, Stacey; Lander, Eric; Thomas, Roman K.; Garraway, Levi A.; Getz, Gad; Meyerson, Matthew

    2012-01-01

    SUMMARY Lung adenocarcinoma, the most common subtype of non-small cell lung cancer, is responsible for over 500,000 deaths per year worldwide. Here, we report exome and genome sequences of 183 lung adenocarcinoma tumor/normal DNA pairs. These analyses revealed a mean exonic somatic mutation rate of 12.0 events/megabase and identified the majority of genes previously reported as significantly mutated in lung adenocarcinoma. In addition, we identified statistically recurrent somatic mutations in the splicing factor gene U2AF1 and truncating mutations affecting RBM10 and ARID1A. Analysis of nucleotide context-specific mutation signatures grouped the sample set into distinct clusters that correlated with smoking history and alterations of reported lung adenocarcinoma genes. Whole genome sequence analysis revealed frequent structural re-arrangements, including in-frame exonic alterations within EGFR and SIK2 kinases. The candidate genes identified in this study are attractive targets for biological characterization and therapeutic targeting of lung adenocarcinoma. PMID:22980975

  1. Mapping Nucleotide Sequences that Encode Complex Binary Disease Traits with HapMap

    PubMed Central

    Cui, Yuehua; Fu, Wenjiang; Sun, Kelian; Romero, Roberto; Wu, Rongling

    2007-01-01

    Detecting the patterns of DNA sequence variants across the human genome is a crucial step for unraveling the genetic basis of complex human diseases. The human HapMap, constructed from single nucleotide polymorphisms (SNPs), provides efficient sequence variation information that can speed up the discovery of genes related to common diseases. In this article, we present a generalized linear model for identifying specific nucleotide variants that encode complex human diseases. A novel approach is derived to group haplotypes into composite diplotypes, which substantially reduces the model's degrees of freedom for an association test and hence increases power when multiple SNP markers are involved. An efficient two-stage estimation procedure based on the expectation-maximization (EM) algorithm is derived to estimate parameters. Non-genetic environmental or clinical risk factors can also be fitted into the model. Computer simulations show that our model has reasonable power and type I error rate with an appropriate sample size. The simulations also suggest that a balanced design, with approximately equal numbers of cases and controls, should be preferred to maintain small estimation bias and reasonable testing power. To illustrate its utility, we apply the method to a genetic association study of large for gestational age (LGA) neonates. The model provides a powerful tool for elucidating the genetic basis of complex binary diseases. PMID:19384427

  2. Transcriptome sequencing to produce a SNP-based genetic map of onion

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sequencing the onion genome is challenging because of its enormous size (16 gigabase pairs of DNA per haploid genome). Pilot sequencing of onion transcripts showed sufficient numbers of single nucleotide polymorphisms (SNPs) to develop a detailed genetic map. We sequenced 2.5 Roche-454 plates of norma...

  3. A mapping of an ensemble of mitochondrial sequences for various organisms into 3D space based on the word composition.

    PubMed

    Aita, Takuyo; Nishigaki, Koichi

    2012-11-01

    To provide a bird's-eye view of an ensemble of mitochondrial genome sequences from various species, we recently developed a novel method of mapping a biological sequence ensemble into three-dimensional (3D) vector space. First, we represented the biological sequence of a species s by a word-composition vector x(s), whose length |x(s)| represents the sequence length and whose unit vector x(s)/|x(s)| represents the relative composition of K-tuple words across the sequence; the dimension N = 4^K is the number of all possible words of length K. Second, we mapped the vector x(s) to a 3D position vector y(s) based on the following two simple principles: (1) |y(s)| = |x(s)| and (2) the angle between y(s) and y(t) correlates maximally with the angle between x(s) and x(t). The mitochondrial genome sequences for 311 species, including 177 Animalia, 85 Fungi and 49 Green plants, were mapped into 3D space using K = 7. The mapping was successful in that the angles between vectors before and after the mapping were highly correlated (correlation coefficients 0.92-0.97). Interestingly, the Animalia kingdom is distributed along a single arc belt (like the Milky Way on a celestial globe), and the Fungi and Green plant kingdoms are distributed along a similar arc belt. These two arc belts intersect at their respective middle regions, forming a cross structure resembling a jet aircraft fuselage and its wings. This new mapping method will allow researchers to interpret the visual information presented in the maps intuitively and effectively. PMID:22776549
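    The word-composition vector x(s) and the angle between two such vectors can be sketched in Python as follows. This assumes overlapping K-mer counts over the alphabet ACGT and Euclidean normalization; the function names are hypothetical, and the 3D projection step itself (an optimization over the y(s)) is not shown. The vector is scaled so that |x(s)| equals the sequence length, matching principle (1), and `angle` computes the quantity that principle (2) tries to preserve.

```python
import math
from itertools import product


def word_composition_vector(seq: str, k: int) -> list[float]:
    """Overlapping k-mer counts over ACGT, rescaled so that |x(s)| = len(seq)."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]  # N = 4**k possible words
    index = {w: i for i, w in enumerate(words)}
    counts = [0.0] * len(words)
    for i in range(len(seq) - k + 1):
        w = seq[i:i + k]
        if w in index:  # skip words containing ambiguous bases such as N
            counts[index[w]] += 1
    norm = math.sqrt(sum(c * c for c in counts))
    if norm == 0:
        return counts
    # counts / norm is the unit vector (relative composition); scale it by sequence length
    return [len(seq) * c / norm for c in counts]


def angle(x: list[float], y: list[float]) -> float:
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    cos = max(-1.0, min(1.0, dot / (nx * ny)))  # clamp rounding error before acos
    return math.acos(cos)


x = word_composition_vector("ACGTACGTTGCA", 2)
print(len(x))  # 16 = 4**2 dimensions
```

    With K = 7, as in the study, each vector has 4^7 = 16,384 dimensions; a dense list is still manageable, though a sparse dictionary of observed words would be a natural alternative.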

  4. Genomics and introgression: discovery and mapping of thousands of species-diagnostic SNPs using RAD sequencing

    USGS Publications Warehouse

    Hand, Brian K; Hether, Tyler D; Kovach, Ryan P.; Muhlfeld, Clint C.; Amish, Stephen J.; Boyer, Matthew C.; O’Rourke, Sean M.; Miller, Michael R.; Lowe, Winsor H.; Hohenlohe, Paul A.; Luikart, Gordon

    2015-01-01

    Invasive hybridization and introgression pose a serious threat to the persistence of many native species. Understanding the effects of hybridization on native populations (e.g., fitness consequences) requires numerous species-diagnostic loci distributed genome-wide. Here we used RAD sequencing to discover thousands of single-nucleotide polymorphisms (SNPs) that are diagnostic between rainbow trout (RBT, Oncorhynchus mykiss), the world’s most widely introduced fish, and native westslope cutthroat trout (WCT, O. clarkii lewisi) in the northern Rocky Mountains, USA. We advanced previous work that identified 4,914 species-diagnostic loci by using longer sequence reads (100 bp vs. 60 bp) and a larger set of individuals (n = 84). We sequenced RAD libraries for individuals from diverse sampling sources, including native populations of WCT and hatchery broodstocks of WCT and RBT. We also took advantage of a newly released reference genome assembly for RBT to align our RAD loci. In total, we discovered 16,788 putatively diagnostic SNPs, 10,267 of which we mapped to anchored chromosome locations on the RBT genome. A small portion of previously discovered putative diagnostic loci (325 of 4,914) were no longer diagnostic (i.e., fixed between species) based on our wider survey of non-hybridized RBT and WCT individuals. Our study suggests that RAD loci mapped to a draft genome assembly could provide the marker density required to identify genes and chromosomal regions influencing selection in admixed populations of conservation concern and evolutionary interest.

  5. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats.

    PubMed

    Baud, Amelie; Hermsen, Roel; Guryev, Victor; Stridh, Pernilla; Graham, Delyth; McBride, Martin W; Foroud, Tatiana; Calderari, Sophie; Diez, Margarita; Ockinger, Johan; Beyeen, Amennai D; Gillett, Alan; Abdelmagid, Nada; Guerreiro-Cacais, Andre Ortlieb; Jagodic, Maja; Tuncel, Jonatan; Norin, Ulrika; Beattie, Elisabeth; Huynh, Ngan; Miller, William H; Koller, Daniel L; Alam, Imranul; Falak, Samreen; Osborne-Pellegrin, Mary; Martinez-Membrives, Esther; Canete, Toni; Blazquez, Gloria; Vicens-Costa, Elia; Mont-Cardona, Carme; Diaz-Moran, Sira; Tobena, Adolf; Hummel, Oliver; Zelenika, Diana; Saar, Kathrin; Patone, Giannino; Bauerfeind, Anja; Bihoreau, Marie-Therese; Heinig, Matthias; Lee, Young-Ae; Rintisch, Carola; Schulz, Herbert; Wheeler, David A; Worley, Kim C; Muzny, Donna M; Gibbs, Richard A; Lathrop, Mark; Lansu, Nico; Toonen, Pim; Ruzius, Frans Paul; de Bruijn, Ewart; Hauser, Heidi; Adams, David J; Keane, Thomas; Atanur, Santosh S; Aitman, Tim J; Flicek, Paul; Malinauskas, Tomas; Jones, E Yvonne; Ekman, Diana; Lopez-Aumatell, Regina; Dominiczak, Anna F; Johannesson, Martina; Holmdahl, Rikard; Olsson, Tomas; Gauguier, Dominique; Hubner, Norbert; Fernandez-Teruel, Alberto; Cuppen, Edwin; Mott, Richard; Flint, Jonathan

    2013-07-01

    Genetic mapping on fully sequenced individuals is transforming understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We identify 35 causal genes involved in 31 phenotypes, implicating new genes in models of anxiety, heart disease and multiple sclerosis. The relationship between sequence and genetic variation is unexpectedly complex: at approximately 40% of quantitative trait loci, a single sequence variant cannot account for the phenotypic effect. Using comparable sequence and mapping data from mice, we show that the extent and spatial pattern of variation in inbred rats differ substantially from those of inbred mice and that the genetic variants in orthologous genes rarely contribute to the same phenotype in both species. PMID:23708188

  6. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats

    PubMed Central

    Baud, Amelie; Hermsen, Roel; Guryev, Victor; Stridh, Pernilla; Graham, Delyth; McBride, Martin W.; Foroud, Tatiana; Calderari, Sophie; Diez, Margarita; Ockinger, Johan; Beyeen, Amennai D.; Gillett, Alan; Abdelmagid, Nada; Guerreiro-Cacais, Andre Ortlieb; Jagodic, Maja; Tuncel, Jonatan; Norin, Ulrika; Beattie, Elisabeth; Huynh, Ngan; Miller, William H.; Koller, Daniel L.; Alam, Imranul; Falak, Samreen; Osborne-Pellegrin, Mary; Martinez-Membrives, Esther; Canete, Toni; Blazquez, Gloria; Vicens-Costa, Elia; Mont-Cardona, Carme; Diaz-Moran, Sira; Tobena, Adolf; Hummel, Oliver; Zelenika, Diana; Saar, Kathrin; Patone, Giannino; Bauerfeind, Anja; Bihoreau, Marie-Therese; Heinig, Matthias; Lee, Young-Ae; Rintisch, Carola; Schulz, Herbert; Wheeler, David A.; Worley, Kim C.; Muzny, Donna M.; Gibbs, Richard A.; Lathrop, Mark; Lansu, Nico; Toonen, Pim; Ruzius, Frans Paul; de Bruijn, Ewart; Hauser, Heidi; Adams, David J.; Keane, Thomas; Atanur, Santosh S.; Aitman, Tim J.; Flicek, Paul; Malinauskas, Tomas; Jones, E. Yvonne; Ekman, Diana; Lopez-Aumatell, Regina; Dominiczak, Anna F; Johannesson, Martina; Holmdahl, Rikard; Olsson, Tomas; Gauguier, Dominique; Hubner, Norbert; Fernandez-Teruel, Alberto; Cuppen, Edwin; Mott, Richard; Flint, Jonathan

    2013-01-01

    Genetic mapping on fully sequenced individuals is transforming our understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We identify 35 causal genes involved in 31 phenotypes, implicating novel genes in models of anxiety, heart disease and multiple sclerosis. The relation between sequence and genetic variation is unexpectedly complex: at approximately 40% of quantitative trait loci a single sequence variant cannot account for the phenotypic effect. Using comparable sequence and mapping data from mice, we show the extent and spatial pattern of variation in inbred rats differ significantly from those of inbred mice, and that the genetic variants in orthologous genes rarely contribute to the same phenotype in both species. PMID:23708188

  7. The organization of biological sequences into constrained and unconstrained parts determines fundamental properties of genotype-phenotype maps.

    PubMed

    Greenbury, S F; Ahnert, S E

    2015-12-01

    Biological information is stored in DNA, RNA and protein sequences, which can be understood as genotypes that are translated into phenotypes. The properties of genotype-phenotype (GP) maps have been studied in great detail for RNA secondary structure. These include a highly biased distribution of genotypes per phenotype, negative correlation of genotypic robustness and evolvability, positive correlation of phenotypic robustness and evolvability, shape-space covering, and a roughly logarithmic scaling of phenotypic robustness with phenotypic frequency. More recently similar properties have been discovered in other GP maps, suggesting that they may be fundamental to biological GP maps, in general, rather than specific to the RNA secondary structure map. Here we propose that the above properties arise from the fundamental organization of biological information into 'constrained' and 'unconstrained' sequences, in the broadest possible sense. As 'constrained' we describe sequences that affect the phenotype more immediately, and are therefore more sensitive to mutations, such as protein-coding DNA or the stems in RNA secondary structure. 'Unconstrained' sequences, on the other hand, can mutate more freely without affecting the phenotype, such as intronic or intergenic DNA or the loops in RNA secondary structure. To test our hypothesis we consider a highly simplified GP map that has genotypes with 'coding' and 'non-coding' parts. We term this the Fibonacci GP map, as it is equivalent to the Fibonacci code in information theory. Despite its simplicity the Fibonacci GP map exhibits all the above properties of much more complex and biologically realistic GP maps. These properties are therefore likely to be fundamental to many biological GP maps. PMID:26609063

  8. The organization of biological sequences into constrained and unconstrained parts determines fundamental properties of genotype–phenotype maps

    PubMed Central

    Greenbury, S. F.; Ahnert, S. E.

    2015-01-01

    Biological information is stored in DNA, RNA and protein sequences, which can be understood as genotypes that are translated into phenotypes. The properties of genotype–phenotype (GP) maps have been studied in great detail for RNA secondary structure. These include a highly biased distribution of genotypes per phenotype, negative correlation of genotypic robustness and evolvability, positive correlation of phenotypic robustness and evolvability, shape-space covering, and a roughly logarithmic scaling of phenotypic robustness with phenotypic frequency. More recently similar properties have been discovered in other GP maps, suggesting that they may be fundamental to biological GP maps, in general, rather than specific to the RNA secondary structure map. Here we propose that the above properties arise from the fundamental organization of biological information into ‘constrained’ and ‘unconstrained’ sequences, in the broadest possible sense. As ‘constrained’ we describe sequences that affect the phenotype more immediately, and are therefore more sensitive to mutations, such as protein-coding DNA or the stems in RNA secondary structure. ‘Unconstrained’ sequences, on the other hand, can mutate more freely without affecting the phenotype, such as intronic or intergenic DNA or the loops in RNA secondary structure. To test our hypothesis we consider a highly simplified GP map that has genotypes with ‘coding’ and ‘non-coding’ parts. We term this the Fibonacci GP map, as it is equivalent to the Fibonacci code in information theory. Despite its simplicity the Fibonacci GP map exhibits all the above properties of much more complex and biologically realistic GP maps. These properties are therefore likely to be fundamental to many biological GP maps. PMID:26609063
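    A toy version of a Fibonacci-style GP map can make the constrained/unconstrained split concrete. This sketch assumes binary genotypes in which the coding part runs up to and including the first "11" (the terminator used by the Fibonacci code) and everything after it is non-coding; genotypes with no "11" are lumped into a hypothetical NO_STOP phenotype, a convention that may differ from the authors' model. The code counts genotypes per phenotype and measures genotypic robustness as the fraction of neutral single-site mutations.

```python
from collections import Counter
from itertools import product

STOP = "11"  # assumed terminator, as in the Fibonacci code


def phenotype(genotype: str) -> str:
    """Coding part = prefix up to and including the first '11'; the rest is non-coding."""
    i = genotype.find(STOP)
    if i == -1:
        return "NO_STOP"  # hypothetical convention for genotypes with no terminator
    return genotype[: i + len(STOP)]


def gp_map(length: int) -> Counter:
    """Number of binary genotypes of the given length mapping to each phenotype."""
    return Counter(phenotype("".join(g)) for g in product("01", repeat=length))


def robustness(genotype: str) -> float:
    """Fraction of single-site mutations that leave the phenotype unchanged."""
    p = phenotype(genotype)
    neutral = 0
    for i, bit in enumerate(genotype):
        mutant = genotype[:i] + ("1" if bit == "0" else "0") + genotype[i + 1:]
        neutral += phenotype(mutant) == p
    return neutral / len(genotype)


print(gp_map(4))           # 8 genotypes lack '11'; '11' has 4, '0011' only 1
print(robustness("1100"))  # 0.5: only the two non-coding sites are neutral
```

    Even at length 4 the genotype-per-phenotype distribution is biased (four genotypes for "11" versus one for "0011"), and mutations in the non-coding tail are neutral. Under these assumptions the stop-free genotypes of length L number F(L+2), a Fibonacci number (8 for L = 4, 21 for L = 6).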

  9. A sequencing-based linkage map of cucumber

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genetic maps are important tools for molecular breeding, gene cloning, and study of meiotic recombination. In cucumber (Cucumis sativus L.), the marker density, resolution and genome coverage of previously developed genetic maps using PCR-based molecular markers are relatively low. In this study we ...

  10. A 1,681-locus consensus genetic map of cultivated cucumber including 67 NB-LRR resistance gene homolog and ten gene loci

    PubMed Central

    2013-01-01

    Background Cucumber is an important vegetable crop that is susceptible to many pathogens, but no disease resistance (R) genes have been cloned. The availability of whole genome sequences provides an excellent opportunity for systematic identification and characterization of the nucleotide binding and leucine-rich repeat (NB-LRR) type R gene homolog (RGH) sequences in the genome. Cucumber has a very narrow genetic base, making it difficult to construct high-density genetic maps. Development of a consensus map by synthesizing information from multiple segregating populations is a method of choice to increase marker density. As such, the objectives of the present study were to identify and characterize NB-LRR type RGHs, and to develop a high-density, integrated cucumber genetic-physical map anchored with RGH loci. Results From the Gy14 draft genome, 70 NB-containing RGHs were identified and characterized. Most RGHs were in clusters with uneven distribution across seven chromosomes. In silico analysis indicated that all 70 RGHs had EST support for gene expression. Phylogenetic analysis classified 58 RGHs into two clades: CNL and TNL. Comparative analysis revealed high-degree sequence homology and synteny in chromosomal locations of these RGH members between the cucumber and melon genomes. Fifty-four molecular markers were developed to delimit 67 of the 70 RGHs, which were integrated into a genetic map through linkage analysis. A 1,681-locus cucumber consensus map including 10 gene loci and spanning 730.0 cM in seven linkage groups was developed by integrating three component maps with a bin-mapping strategy. Physically, 308 scaffolds with 193.2 Mbp total DNA sequences were anchored onto this consensus map that covered 52.6% of the 367 Mbp cucumber genome. Conclusions Cucumber contains relatively few NB-LRR RGHs that are clustered and unevenly distributed in the genome. All RGHs seem to be transcribed and share significant sequence homology and synteny with the melon

  11. DNA sequence analyses of blended herbal products including synthetic cannabinoids as designer drugs.

    PubMed

    Ogata, Jun; Uchiyama, Nahoko; Kikura-Hanajiri, Ruri; Goda, Yukihiro

    2013-04-10

    In recent years, various herbal products adulterated with synthetic cannabinoids have been distributed worldwide via the Internet. These herbal products are mostly sold as incense, and advertised as not for human consumption. Although their labels indicate that they contain mixtures of several potentially psychoactive plants, and numerous studies have reported that they contain a variety of synthetic cannabinoids, their exact botanical contents are not always clear. In this study, we investigated the origins of botanical materials in 62 Spice-like herbal products distributed on the illegal drug market in Japan, by DNA sequence analyses and BLAST searches. The nucleotide sequences of four regions were analyzed to identify the origins of each plant species in the herbal mixtures. The sequences of "Damiana" (Turnera diffusa) and Lamiaceae herbs (Melissa, Mentha and Thymus) were frequently detected in a number of products. However, the sequences of other plant species indicated on the packaging labels were not detected. In a few products, DNA fragments of potent psychotropic plants were found, including marijuana (Cannabis sativa), "Diviner's Sage" (Salvia divinorum) and "Kratom" (Mitragyna speciosa). Their active constituents were also confirmed using gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS), although these plant names were never indicated on the labels. Most plant species identified in the products were different from the plants indicated on the labels. The plant materials are probably used mainly as diluents for the psychoactive synthetic compounds, because no reliable psychoactive effects have been reported for most of the identified plants, with the exception of the psychotropic plants named above. PMID:23092848

  12. Cloning, mapping, and sequencing of plasmid R100 traM and finP genes.

    PubMed Central

    Fee, B E; Dempsey, W B

    1986-01-01

    The fertility control gene finP, the transfer gene traM, and the transfer origin, oriT, of plasmid R100 were isolated on a single 1.2-kilobase EcoRV fragment and were then subcloned as HaeIII fragments. The sequence of the 754-base-pair finP-containing fragment is reported here. In addition to the finP gene, the sequence includes all but two bases of the R100 traM open reading frame and apparently all of the leader mRNA sequence and amino end of the traJ gene of R100. The sequence contains two open reading frames which encode small proteins on the opposite strand from the traM and traJ genes. It also shows two sets of inverted repeats that have the characteristics of transcription terminators. One set is positioned as if it were the traM terminator, and the other set, which is downstream from the first, sits in the middle of the leader mRNA sequence for traJ. On the bottom strand, this inverted repeat has the structure of a rho-independent terminator. Other less-stable inverted repeats overlap this second terminator in the same way as is seen in attenuation sequences, and the two separate small open reading frames on the bottom strand also totally overlap the stem of the rho-independent terminator, suggesting that their translation would cause shifting of termination to the bottom strand homolog of the putative traM terminator. The finP gene product was not identified, but the gene was mapped to the sequence which contains the traJ gene. It either overlaps traJ or is antisense to it. PMID:3522549

  13. Enhancement of accuracy and efficiency for RNA secondary structure prediction by sequence segmentation and MapReduce

    PubMed Central

    2013-01-01

    Background Ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Their secondary structures are crucial for RNA functionality, and the prediction of secondary structures is widely studied. Our previous research showed that cutting long sequences into shorter chunks, predicting secondary structures of the chunks independently using thermodynamic methods, and reconstructing the entire secondary structure from the predicted chunk structures can yield better accuracy than predicting the secondary structure using the RNA sequence as a whole. The chunking, prediction, and reconstruction processes can use different methods and parameters, some of which produce more accurate predictions than others. In this paper, we study the prediction accuracy and efficiency of three different chunking methods using seven popular secondary structure prediction programs applied to two datasets of RNA with known secondary structures, which include both pseudoknotted and non-pseudoknotted sequences, as well as a family of viral genome RNAs whose structures have not been predicted before. Our modularized MapReduce framework based on Hadoop allows us to study the problem in a parallel and robust environment. Results On average, the maximum accuracy retention values are larger than one for our chunking methods and the seven prediction programs over 50 non-pseudoknotted sequences, meaning that the secondary structure predicted using chunking is more similar to the real structure than the secondary structure predicted by using the whole sequence. We observe similar results for the 23 pseudoknotted sequences, except for the NUPACK program using the centered chunking method. The performance analysis for 14 long RNA sequences from the Nodaviridae virus family outlines how the coarse-grained mapping of chunking and predictions in the MapReduce framework exhibits shorter turnaround times for short RNA sequences. However ...
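    The abstract does not spell out its three chunking methods or the exact formula for "accuracy retention". The sketch below assumes the simplest case, fixed-length windows with a fixed overlap, and treats retention as the ratio of chunked to whole-sequence accuracy, which matches the abstract's reading that values above one favour chunking. The function names, window size and overlap are hypothetical; the prediction step (external programs such as NUPACK) and the reconstruction step are not shown.

```python
# Fixed-length overlapping chunking plus an accuracy-retention ratio.
# Window size, overlap and the retention definition are assumptions.


def chunk_sequence(seq: str, size: int, overlap: int):
    """Split seq into windows of `size` nt overlapping by `overlap` nt.

    Returns (start, chunk) pairs so that structures predicted for each
    chunk can later be mapped back onto full-length coordinates.
    """
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start, seq[start:start + size]))
        if start + size >= len(seq):
            break  # the last window reached the 3' end
        start += step
    return chunks


def accuracy_retention(chunked_accuracy: float, whole_accuracy: float) -> float:
    """Chunked accuracy divided by whole-sequence accuracy (>1 favours chunking)."""
    if whole_accuracy <= 0:
        raise ValueError("whole-sequence accuracy must be positive")
    return chunked_accuracy / whole_accuracy


if __name__ == "__main__":
    print(chunk_sequence("ACGUACGUAC", size=4, overlap=1))
    # [(0, 'ACGU'), (3, 'UACG'), (6, 'GUAC')]
```

    Keeping each chunk's start offset is what makes the later reconstruction step possible, since base-pair indices predicted within a chunk must be shifted back into full-sequence coordinates.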

  14. Construction and Analysis of High-Density Linkage Map Using High-Throughput Sequencing Data

    PubMed Central

    Liu, Min; Liu, Hui; Zeng, Huaping; Deng, Dejing; Xin, Huaigen; Song, Jun; Xu, Chunhua; Sun, Xiaowen; Hou, Xilin; Wang, Xiaowu; Zheng, Hongkun

    2014-01-01

    Linkage maps enable the study of important biological questions. The construction of high-density linkage maps has become more feasible since the advent of next-generation sequencing (NGS), which eases SNP discovery and high-throughput genotyping of large populations. However, the marker number explosion and genotyping errors from NGS data challenge the computational efficiency and linkage map quality of linkage study methods. Here we report the HighMap method for constructing high-density linkage maps from NGS data. HighMap employs an iterative ordering and error correction strategy based on a k-nearest neighbor algorithm and a Monte Carlo multipoint maximum likelihood algorithm. Simulation study shows HighMap can create a linkage map with three times as many markers as ordering-only methods while offering more accurate marker orders and stable genetic distances. Using HighMap, we constructed a common carp linkage map with 10,004 markers. The singleton rate was less than one-ninth of that generated by JoinMap4.1. Its total map distance was 5,908 cM, consistent with reports on low-density maps. HighMap is an efficient method for constructing high-density, high-quality linkage maps from high-throughput population NGS data. It will facilitate genome assembly, comparative genomic analysis, and QTL studies. HighMap is available at http://highmap.biomarker.com.cn/. PMID:24905985
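    To make "marker order" and "map distance in cM" concrete, the sketch below chains markers greedily by nearest neighbour on a matrix of pairwise recombination fractions and sums adjacent distances with the Haldane map function. This is a deliberately simplified stand-in, not HighMap's iterative k-nearest-neighbour and Monte Carlo maximum-likelihood procedure, and the abstract does not say which map function HighMap uses. The function names and the 4-marker matrix are hypothetical.

```python
import math


def haldane_cm(r: float) -> float:
    """Haldane map function: recombination fraction r -> distance in cM."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def greedy_order(rf, start=0):
    """Chain markers by repeatedly appending the closest unplaced one.

    rf is a symmetric matrix of pairwise recombination fractions.
    """
    order = [start]
    remaining = set(range(len(rf))) - {start}
    while remaining:
        last = order[-1]
        nearest = min(remaining, key=lambda m: rf[last][m])
        order.append(nearest)
        remaining.remove(nearest)
    return order


def map_length_cm(rf, order):
    """Total map length: sum of Haldane distances between adjacent markers."""
    return sum(haldane_cm(rf[a][b]) for a, b in zip(order, order[1:]))


if __name__ == "__main__":
    # Made-up 4-marker matrix whose true order is 0-2-1-3.
    rf = [[0.00, 0.30, 0.10, 0.40],
          [0.30, 0.00, 0.20, 0.10],
          [0.10, 0.20, 0.00, 0.30],
          [0.40, 0.10, 0.30, 0.00]]
    order = greedy_order(rf)
    print(order, round(map_length_cm(rf, order), 1))  # [0, 2, 1, 3] 47.9
```

    A single greedy pass like this is exactly what genotyping errors break, which is why methods such as HighMap iterate ordering with error correction.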

  15. Physical mapping of complex genomes by sampled sequencing: A theoretical analysis

    SciTech Connect

    Kupfer, K.; Smith, M.; Quackenbush, J.

    1995-05-01

    A method for high-throughput, high-resolution physical mapping of complex genomes and human chromosomes called Genomic Sequence Sampling (GSS) has recently been proposed. This mapping strategy employs high-density cosmid contig assembly over 200-kb to 1-Mb regions of the target genome coupled with DNA sequencing of the cosmid ends. The relative order and spacing of the sequence fragments are determined from the template contig, resulting in a physical map of 1- to 5-kb resolution that contains a substantial portion of the entire sequence at one-pass accuracy. The purpose of this paper is to determine the theoretical parameters for GSS mapping, to evaluate the effectiveness of the contig-building strategy, and to calculate the expected fraction of the target genome that can be recovered as mapped sequence. A novel aspect of the cosmid fingerprinting and contig-building strategy involves determining the orientation of the genomic inserts relative to the cloning vectors, so that the sampled sequence fragments can be mapped with high resolution. The algorithm is based upon complete restriction enzyme digestion, contig assembly by matching fragments, and end-orientation of individual cosmids by determining the best consistent fit of the labeled cosmid end fragments in the consensus restriction map. 32 refs., 7 figs.
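    The abstract does not reproduce the paper's own expressions for the recoverable fraction. A common first approximation, swapped in here for illustration, is the Lander-Waterman Poisson estimate: if fragments land independently and uniformly, the expected fraction covered at least once is 1 - exp(-c), where c is the mean depth. The function name and the example numbers (a 1-Mb region, 500 cosmids, two 500-bp end reads each, roughly 40-kb inserts) are hypothetical, and the estimate ignores the contig and fingerprinting constraints the paper analyses.

```python
import math


def expected_fraction_covered(n_fragments: int, fragment_len: float,
                              genome_len: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of a target
    covered by at least one of n randomly placed fragments."""
    if genome_len <= 0:
        raise ValueError("genome_len must be positive")
    coverage = n_fragments * fragment_len / genome_len  # mean depth c
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    region = 1_000_000  # hypothetical 1-Mb target region
    # Sampled sequence: 500 cosmids x 2 end reads x 500 bp -> c = 0.5.
    print(round(expected_fraction_covered(1000, 500, region), 3))  # 0.393
    # Cosmid inserts: 500 x ~40 kb -> c = 20, essentially full coverage.
    print(round(expected_fraction_covered(500, 40_000, region), 6))  # 1.0
```

    The contrast between the two lines is the point of sampled sequencing: deep clone coverage supplies the ordering scaffold, while only a modest fraction of bases is actually sequenced.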

  16. A Sequence-Based Synteny Map Between Soybean and Arabidopsis thaliana.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In an effort to identify conserved sequences between soybean (Glycine max, L. Merr.) and the model organism Arabidopsis thaliana (thale cress), a series of Java-based programs were created that processed and compared 341,619 soybean DNA sequences against A. thaliana chromosomal DNA. A. thaliana DNA ...

  17. Integration of Two Diploid Potato Linkage Maps with the Potato Genome Sequence

    PubMed Central

    Felcher, Kimberly J.; Coombs, Joseph J.; Massa, Alicia N.; Hansey, Candice N.; Hamilton, John P.; Veilleux, Richard E.; Buell, C. Robin; Douches, David S.

    2012-01-01

    To facilitate genome-guided breeding in potato, we developed an 8303 Single Nucleotide Polymorphism (SNP) marker array using potato genome and transcriptome resources. To validate the Infinium 8303 Potato Array, we developed linkage maps from two diploid populations (DRH and D84) and compared these maps with the assembled potato genome sequence. Both populations used the doubled monoploid reference genotype DM1-3 516 R44 as the female parent but had different heterozygous diploid male parents (RH89-039-16 and 84SD22). Over 4,400 markers were mapped (1,960 in DRH and 2,454 in D84, 787 in common) resulting in map sizes of 965 (DRH) and 792 (D84) cM, covering 87% (DRH) and 88% (D84) of genome sequence length. Of the mapped markers, 33.5% were in candidate genes selected for the array, 4.5% were markers from existing genetic maps, and 61% were selected based on distribution across the genome. Markers with distorted segregation ratios occurred in blocks in both linkage maps, accounting for 4% (DRH) and 9% (D84) of mapped markers. Markers with distorted segregation ratios were unique to each population with blocks on chromosomes 9 and 12 in DRH and 3, 4, 6 and 8 in D84. Chromosome assignment of markers based on linkage mapping differed from sequence alignment with the Potato Genome Sequencing Consortium (PGSC) pseudomolecules for 1% of the mapped markers, with some discordant markers attributable to paralogs. In total, 126 (DRH) and 226 (D84) mapped markers were not anchored to the pseudomolecules and provide new scaffold anchoring data to improve the potato genome assembly. The high degree of concordance between the linkage maps and the pseudomolecules demonstrates both the quality of the potato genome sequence and the functionality of the Infinium 8303 Potato Array. The broad genome coverage of the Infinium 8303 Potato Array compared to other marker sets will enable numerous downstream applications. PMID:22558443

  19. Construction of an Integrated High Density Simple Sequence Repeat Linkage Map in Cultivated Strawberry (Fragaria × ananassa) and its Applicability

    PubMed Central

    Isobe, Sachiko N.; Hirakawa, Hideki; Sato, Shusei; Maeda, Fumi; Ishikawa, Masami; Mori, Toshiki; Yamamoto, Yuko; Shirasawa, Kenta; Kimura, Mitsuhiro; Fukami, Masanobu; Hashizume, Fujio; Tsuji, Tomoko; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Tsuruoka, Hisano; Minami, Chiharu; Takahashi, Chika; Wada, Tsuyuko; Ono, Akiko; Kawashima, Kumiko; Nakazaki, Naomi; Kishida, Yoshie; Kohara, Mitsuyo; Nakayama, Shinobu; Yamada, Manabu; Fujishiro, Tsunakazu; Watanabe, Akiko; Tabata, Satoshi

    2013-01-01

    The cultivated strawberry (Fragaria × ananassa) is an octoploid (2n = 8x = 56) of the Rosaceae family whose genomic architecture is still controversial. Several recent studies support the AAA′A′BBB′B′ model, but its complexity has hindered genetic and genomic analysis of this important crop. To overcome this difficulty and to assist genome-wide analysis of F. × ananassa, we constructed an integrated linkage map by organizing a total of 4474 simple sequence repeat (SSR) markers collected from published Fragaria sequences, including 3746 SSR markers [Fragaria vesca expressed sequence tag (EST)-derived SSR markers] derived from F. vesca ESTs, 603 markers (F. × ananassa EST-derived SSR markers) from F. × ananassa ESTs, and 125 markers (F. × ananassa transcriptome-derived SSR markers) from F. × ananassa transcripts. Along with the previously published SSR markers, these markers were mapped onto five parent-specific linkage maps derived from three mapping populations, which were then assembled into an integrated linkage map. The constructed map consists of 1856 loci in 28 linkage groups (LGs) that total 2364.1 cM in length. Macrosynteny at the chromosome level was observed between the LGs of F. × ananassa and the genome of F. vesca. Variety distinction among 129 F. × ananassa lines was demonstrated using 45 selected SSR markers. PMID:23248204

  20. CloudMap: a cloud-based pipeline for analysis of mutant genome sequences.

    PubMed

    Minevich, Gregory; Park, Danny S; Blankenberg, Daniel; Poole, Richard J; Hobert, Oliver

    2012-12-01

    Whole genome sequencing (WGS) allows researchers to pinpoint genetic differences between individuals and significantly shortcuts the costly and time-consuming part of forward genetic analysis in model organism systems. Currently, the most effort-intensive part of WGS is the bioinformatic analysis of the relatively short reads generated by second generation sequencing platforms. We describe here a novel, easily accessible and cloud-based pipeline, called CloudMap, which greatly simplifies the analysis of mutant genome sequences. Available on the Galaxy web platform, CloudMap requires no software installation when run on the cloud, but it can also be run locally or via Amazon's Elastic Compute Cloud (EC2) service. CloudMap uses a series of predefined workflows to pinpoint sequence variations in animal genomes, such as those of premutagenized and mutagenized Caenorhabditis elegans strains. In combination with a variant-based mapping procedure, CloudMap allows users to sharply define genetic map intervals graphically and to retrieve very short lists of candidate variants with a few simple clicks. Automated workflows and extensive video user guides are available to detail the individual analysis steps performed (http://usegalaxy.org/cloudmap). We demonstrate the utility of CloudMap for WGS analysis of C. elegans and Arabidopsis genomes and describe how other organisms (e.g., Zebrafish and Drosophila) can easily be accommodated by this software platform. To accommodate rapid analysis of many mutants from large-scale genetic screens, CloudMap contains an in silico complementation testing tool that allows users to rapidly identify instances where multiple alleles of the same gene are present in the mutant collection. Lastly, we describe the application of a novel mapping/WGS method ("Variant Discovery Mapping") that does not rely on a defined polymorphic mapping strain, and we integrate the application of this method into CloudMap. CloudMap tools and documentation are ...

  1. Infrared Mapping of the Dust Around Main Sequence Stars

    NASA Technical Reports Server (NTRS)

    Heinrichsen, I.; Walker, H.; Klaas, U.; Sylvester, R.

    1998-01-01

    The photopolarimeter on ISO (ISOPHOT) has been used to investigate the dust discs around the four prototype Vega-like stars and several main sequence stars with excess infrared emission from IRAS data.

  2. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

    2000-07-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187-amino-acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Using the cDNA as a probe showed that wounding, as well as application of the plant hormones abscisic acid and ethylene, caused hevein transcripts to accumulate in leaves, stems and latex, but not in roots. A fusion protein of the protein of the present invention and maltose-binding protein was produced in E. coli.

  3. REPEATED SEQUENCES INCLUDING RS1100 FROM PSEUDOMONAS CEPACIA AC1100 FUNCTION AS IS ELEMENTS

    EPA Science Inventory

    Several lines of evidence were obtained that the previously identified, repeated sequence RS1100 of Pseudomonas cepacia strain AC1100 undergoes transposition events. DNA sequences flanking the chlorohydroxy hydroquinone (CHQ) degradative genes of this organism were examined from s...

  4. A Time Sequence-Oriented Concept Map Approach to Developing Educational Computer Games for History Courses

    ERIC Educational Resources Information Center

    Chu, Hui-Chun; Yang, Kai-Hsiang; Chen, Jing-Hong

    2015-01-01

    Concept maps have been recognized as an effective tool for students to organize their knowledge; however, in history courses, it is important for students to learn and organize historical events according to the time of their occurrence. Therefore, in this study, a time sequence-oriented concept map approach is proposed for developing a game-based…

  5. A high-resolution cattle CNV map by population-scale genome sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Copy Number Variations (CNVs) are common genomic structural variations that have been linked to human diseases and phenotypic traits. Prior studies in cattle have produced low-resolution CNV maps. We constructed a draft, high-resolution map of cattle CNVs based on whole genome sequencing data from 7...

  6. Comparison and quantitative verification of mapping algorithms for whole genome bisulfite sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitat...

  7. Sex Differences in Infants' Mapping of Complex Occlusion Sequences: Further Evidence

    ERIC Educational Resources Information Center

    Wilcox, Teresa

    2007-01-01

    Recently, infant researchers have reported sex differences in infants' capacity to map their representation of an occlusion sequence onto a subsequent no-occlusion display. The research reported here sought to identify the extent to which these sex differences are observed in event-mapping tasks and to identify the underlying basis for these…

  8. Construction of a SNP and SSR linkage map in autotetraploid blueberry using genotyping by sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A mapping population developed from a cross between two key highbush blueberry cultivars, Draper × Jewel (Vaccinium corymbosum), segregating for a number of important phenotypic traits, has been utilized to produce a genetic linkage map. Data on 233 simple sequence repeat (SSR) markers and 1794 sing...

  9. Dynamical Maps of a Class of S_{l+1} = S_{l-1} S_l^n Fibonacci Sequences

    NASA Astrophysics Data System (ADS)

    Zhao, Baohua; Liu, Tianshi

    1991-12-01

    The dynamical properties of a class of S_{l+1} = S_{l-1} S_l^n Fibonacci sequences are discussed following the K. K. T. method. The recursion relations for the dynamical maps are derived. Linearisation about their fixed points yields the Jacobian matrix of the mapping and the eigenvalues for the fixed points.
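    The substitution rule S_{l+1} = S_{l-1} S_l^n builds each word by concatenating the word two steps back with n copies of the previous one; n = 1 gives the ordinary Fibonacci chain. The sketch below only generates the words. The initial words S_0 = "B" and S_1 = "A" and the function name are assumptions (the abstract does not state them), and the dynamical maps and their linearisation are not reproduced. Word lengths obey F_{l+1} = F_{l-1} + n F_l.

```python
def fibonacci_words(n: int, steps: int, s0: str = "B", s1: str = "A"):
    """Return [S_0, S_1, ..., S_{steps+1}] with S_{l+1} = S_{l-1} + n copies of S_l.

    s0 and s1 are assumed starting words, not taken from the paper.
    """
    if n < 1 or steps < 0:
        raise ValueError("need n >= 1 and steps >= 0")
    words = [s0, s1]
    for _ in range(steps):
        words.append(words[-2] + words[-1] * n)
    return words


if __name__ == "__main__":
    print(fibonacci_words(1, 3))                    # ['B', 'A', 'BA', 'ABA', 'BAABA']
    print([len(w) for w in fibonacci_words(2, 3)])  # [1, 1, 3, 7, 17]
```

    The n = 2 lengths 1, 1, 3, 7, 17 follow the recursion F_{l+1} = F_{l-1} + 2 F_l, which is the integer skeleton that the dynamical maps for such chains are built on.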

  10. A Physical Map, Including a BAC/PAC Clone Contig, of the Williams-Beuren Syndrome–Deletion Region at 7q11.23

    PubMed Central

    Peoples, Risa; Franke, Yvonne; Wang, Yu-Ker; Pérez-Jurado, Luis; Paperna, Tamar; Cisco, Michael; Francke, Uta

    2000-01-01

    Summary Williams-Beuren syndrome (WBS) is a developmental disorder caused by haploinsufficiency for genes in a 2-cM region of chromosome band 7q11.23. With the exception of vascular stenoses due to deletion of the elastin gene, the various features of WBS have not yet been attributed to specific genes. Although ⩾16 genes have been identified within the WBS deletion, completion of a physical map of the region has been difficult because of the large duplicated regions flanking the deletion. We present a physical map of the WBS deletion and flanking regions, based on assembly of a bacterial artificial chromosome/P1-derived artificial chromosome contig, analysis of high-throughput genome-sequence data, and long-range restriction mapping of genomic and cloned DNA by pulsed-field gel electrophoresis. Our map encompasses 3 Mb, including 1.6 Mb within the deletion. Two large duplicons, flanking the deletion, of ⩾320 kb contain unique sequence elements from the internal border regions of the deletion, such as sequences from GTF2I (telomeric) and FKBP6 (centromeric). A third copy of this duplicon exists in inverted orientation distal to the telomeric flanking one. These duplicons show stronger sequence conservation with regard to each other than to the presumptive ancestral loci within the common deletion region. Sequence elements originating from beyond 7q11.23 are also present in these duplicons. Although the duplicons are not present in mice, the order of the single-copy genes in the conserved syntenic region of mouse chromosome 5 is inverted relative to the human map. A model is presented for a mechanism of WBS-deletion formation, based on the orientation of duplicons' components relative to each other and to the ancestral elements within the deletion region. PMID:10631136

  11. Optimization of Brain T2 Mapping Using Standard CPMG Sequence In A Clinical Scanner

    NASA Astrophysics Data System (ADS)

    Hnilicová, P.; Bittšanský, M.; Dobrota, D.

    2014-04-01

    In magnetic resonance imaging, transverse relaxation time (T2) mapping is a useful quantitative tool enabling enhanced diagnostics of many brain pathologies. The aim of our study was to test the influence of different sequence parameters on calculated T2 values, including multi-slice measurements, slice position, interslice gap, echo spacing, and pulse duration. Measurements were performed using a standard multi-slice multi-echo CPMG imaging sequence on a 1.5 Tesla routine whole-body MR scanner. We used multiple phantoms with different agarose concentrations (0% to 4%) and verified the results on a healthy volunteer. Neither the pulse duration, nor the interslice gap size, nor the slice shift appeared to have any impact on T2. Measurement accuracy increased with shorter echo spacing. A standard multi-slice multi-echo CPMG protocol with the shortest echo spacing, the smallest available interslice gap (100% of slice thickness), and a shorter pulse duration was found to be optimal and reliable for calculating T2 maps in the human brain.
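    The abstract does not say how T2 was computed from the echo train. A common approach is to fit a mono-exponential decay S(TE) = S0 exp(-TE/T2) to each pixel's echo amplitudes; the sketch below does this with ordinary least squares on ln(S) versus TE. The function name and the synthetic 10-ms-spacing data are hypothetical, and noise and imperfect refocusing bias a simple log-linear fit in ways this sketch does not correct.

```python
import math


def fit_t2(echo_times_ms, signals):
    """Estimate T2 (ms) from S(TE) = S0*exp(-TE/T2) by least squares
    on ln(S) versus TE. All signals must be positive."""
    if len(echo_times_ms) != len(signals) or len(signals) < 2:
        raise ValueError("need matching TE and signal lists, length >= 2")
    ys = [math.log(s) for s in signals]
    n = len(ys)
    mean_x = sum(echo_times_ms) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in echo_times_ms)
    if sxx == 0:
        raise ValueError("echo times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(echo_times_ms, ys))
    slope = sxy / sxx  # equals -1/T2 for a clean mono-exponential decay
    if slope >= 0:
        raise ValueError("signal does not decay with TE")
    return -1.0 / slope


if __name__ == "__main__":
    # Synthetic, noise-free echoes: T2 = 80 ms, 8 echoes, 10 ms spacing.
    te = [10.0 * k for k in range(1, 9)]
    s = [1000.0 * math.exp(-t / 80.0) for t in te]
    print(round(fit_t2(te, s), 3))  # 80.0
```

    Shorter echo spacing puts more samples on the early, high-signal part of the decay, which is consistent with the abstract's finding that it improved accuracy.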

  12. Ioncopy: a novel method for calling copy number alterations in amplicon sequencing data including significance assessment

    PubMed Central

    Budczies, Jan; Pfarr, Nicole; Stenzinger, Albrecht; Treue, Denise; Endris, Volker; Ismaeel, Fakher; Bangemann, Nikola; Blohmer, Jens-Uwe; Dietel, Manfred; Loibl, Sibylle; Klauschen, Frederick; Weichert, Wilko; Denkert, Carsten

    2016-01-01

    Recently, it has been demonstrated that calling of copy number alterations (CNAs) from amplicon sequencing (AS) data is feasible. Most approaches, however, require non-tumor (germline) DNA for data normalization. Here, we present the method Ioncopy for CNA detection which requires no normal controls and includes a significance assessment for each detected alteration. Ioncopy was evaluated in a cohort of 184 clinically annotated breast carcinomas. A total number of 252 amplifications were detected, of which 183 (72.6%) could be validated by a call of an additional amplicon interrogating the same gene. Moreover, a total number of 33 deletions were found, of which 27 (81.8%) could be validated. Analyzing the 16 most frequently amplified genes, validation rates of over 89% could be achieved for 11 of these genes. Eleven of the top 16 genes showed significant overexpression in the amplified tumors. 89.5% of the HER2-amplified tumors were GRB7 and STARD3 co-amplified, whereas 68.4% of the HER2-amplified tumors had additional MED1 amplifications. Correlations between CNAs measured by amplicons in HER2 exons 19, 20 and 21 were strong (all R > 0.93). AS-based detection of HER2 amplifications had a sensitivity of 90.0% and a specificity of 98.8% compared to the gold standard of HER2 immunohistochemistry combined with in situ hybridization. In summary, we developed and validated a novel method for detection and significance assessment of CNAs in amplicon sequencing data. Using Ioncopy, AS offers a straightforward and efficient approach to simultaneously analyze gene amplifications and gene deletions together with simple somatic mutations in a single assay. PMID:26910888
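    The validation rates quoted above follow directly from the reported counts, and the sensitivity and specificity figures use the standard confusion-matrix definitions. The sketch below reproduces the two rates; the helper names are hypothetical, and since the abstract does not give the underlying true/false positive and negative counts for HER2, those figures are not recomputed here.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 1)


def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


if __name__ == "__main__":
    print(percent(183, 252))  # 72.6 (validated amplifications)
    print(percent(27, 33))    # 81.8 (validated deletions)
```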

  13. Sequence of mammalian fossils, including hominoid teeth, from the Bubing Basin caves, South China.

    PubMed

    Wang, Wei; Potts, Richard; Baoyin, Yuan; Huang, Weiwen; Cheng, Hai; Edwards, R Lawrence; Ditchfield, Peter

    2007-04-01

    A Plio-Pleistocene to Holocene faunal sequence has been recovered from four carefully excavated caves in the Bubing Basin, adjacent to the larger Bose Basin of South China. The caves vary in elevation; we suggest that the higher caves were formed and filled with sediments prior to the lower caves. The highest deposits, which are from Mohui Cave, contain hominoid teeth and other fossilized remains of mammalian taxa most similar to late Pliocene and early Pleistocene faunas. Wuyun Cave (approximately 50 m lower in elevation than Mohui) contains a late middle Pleistocene fauna, which is supported by U-series age constraints from 350 to 200 ka. Lower Pubu Cave (approximately 23 m below Wuyun) is assigned to the late Pleistocene, while Cunkong Cave (the lowest, approximately 2 m lower in elevation than Lower Pubu) preserves a Holocene fauna. The four faunal assemblages indicate species-level changes in Ailuropoda, Stegodon, and Sus, the appearance of Elephas, the local disappearance of Stegodon, and the migration of Equus hemionus to South China. These initial results of our work call into question the continued value of the Stegodon/Ailuropoda Fauna, a category long used to characterize the Pleistocene faunas of South China. Excavation of karstic caves of varying elevation within the basins of South China holds promise for defining local sequences of mammalian fossils that can be used to investigate faunal variations related to climate change, biogeographic events, and evolutionary change over the past two million years. Stable isotopic analysis of a small sample of mammalian teeth from Bubing Basin caves is consistent with 100% C3 vegetation in the Bubing/Bose region, with certain δ13C values consistent with a canopied woodland or forest. A preliminary assessment of the hominoid teeth indicates the presence of diverse molar and premolar morphologies including dental remains of Gigantopithecus blacki and a sample with similarities to the teeth reported from ...

  14. A quantitative trait locus for variation in dopamine metabolism mapped in a primate model using reference sequences from related species

    PubMed Central

    Freimer, Nelson B.; Service, Susan K.; Ophoff, Roel A.; Jasinska, Anna J.; McKee, Kevin; Villeneuve, Amelie; Belisle, Alexandre; Bailey, Julia N.; Breidenthal, Sherry E.; Jorgensen, Matthew J.; Mann, J. John; Cantor, Rita M.; Dewar, Ken; Fairbanks, Lynn A.

    2007-01-01

    Non-human primates (NHP) provide crucial research models. Their strong similarities to humans make them particularly valuable for understanding complex behavioral traits and brain structure and function. We report here the genetic mapping of an NHP nervous system biologic trait, the cerebrospinal fluid (CSF) concentration of the dopamine metabolite homovanillic acid (HVA), in an extended inbred vervet monkey (Chlorocebus aethiops sabaeus) pedigree. CSF HVA is an index of CNS dopamine activity, which is hypothesized to contribute substantially to behavioral variations in NHP and humans. For quantitative trait locus (QTL) mapping, we carried out a two-stage procedure. We first scanned the genome using a first-generation genetic map of short tandem repeat markers. Subsequently, using >100 SNPs within the most promising region identified by the genome scan, we mapped a QTL for CSF HVA at a genome-wide level of significance (peak logarithm of odds score >4) to a narrow well delineated interval (<10 Mb). The SNP discovery exploited conserved segments between human and rhesus macaque reference genome sequences. Our findings demonstrate the potential of using existing primate reference genome sequences for designing high-resolution genetic analyses applicable across a wide range of NHP species, including the many for which full genome sequences are not yet available. Leveraging genomic information from sequenced to nonsequenced species should enable the utilization of the full range of NHP diversity in behavior and disease susceptibility to determine the genetic basis of specific biological and behavioral traits. PMID:17884980

  15. Toward a physical map of Drosophila buzzatii. Use of randomly amplified polymorphic DNA polymorphisms and sequence-tagged site landmarks.

    PubMed Central

    Laayouni, H; Santos, M; Fontdevila, A

    2000-01-01

    We present a physical map based on RAPD polymorphic fragments and sequence-tagged sites (STSs) for the repleta group species Drosophila buzzatii. One hundred forty-four RAPD markers have been used as probes for in situ hybridization to the polytene chromosomes, and positive results allowing the precise localization of 108 RAPDs were obtained. Of these, 73 behave as effectively unique markers for physical map construction, and in 9 additional cases the probes gave two hybridization signals, each on a different chromosome. Most markers (68%) are located on chromosomes 2 and 4, a distribution that partially agrees with previous estimates of how genetic variation is distributed over chromosomes. One RAPD maps close to the proximal breakpoint of inversion 2z(3) but is not included within the inverted fragment. However, it was possible to conclude from this RAPD that the distal breakpoint of 2z(3) had previously been wrongly assigned. A total of 39 cytologically mapped RAPDs were converted to STSs and yielded an aggregate sequence of 28,431 bp. Thirty-six RAPDs (25%) did not produce any detectable hybridization signal, and we obtained the DNA sequence from three of them. Further prospects toward obtaining a more developed genetic map than the one currently available for D. buzzatii are discussed. PMID:11102375

  16. Human insulin genome sequence map, biochemical structure of insulin for recombinant DNA insulin.

    PubMed

    Chakraborty, Chiranjib; Mungantiwar, Ashish A

    2003-08-01

    Insulin is an essential molecule in the treatment of type I diabetes and is marketed by very few companies. It was the first molecule made by recombinant technology, but its commercialization is very difficult. Knowledge of the biochemical structure of insulin and of the human insulin genome sequence map is pivotal to large-scale manufacturing of recombinant DNA insulin. This paper reviews the human insulin genome sequence map, the amino acid sequence of porcine insulin, the crystal structure of porcine insulin, the insulin monomer, aggregation surfaces of insulin, conformational variation in the insulin monomer, and insulin X-ray structures, in the context of recombinant DNA synthesis of human insulin in Escherichia coli. PMID:12769691

  17. Towards a transcription map of human chromosome 21: Identification of expressed sequences by exon trapping

    SciTech Connect

    Chen, H.M.; Chrast, R.; Rossier, C.

    1994-09-01

    Chromosome 21q contains about 1% of the human genome and, when triplicated, is responsible for Down syndrome. The genetic and physical maps of this chromosome are among the most developed of all human chromosomes. A considerable international effort is now under way with the aims of cloning and mapping all chromosome 21 genes, assigning functions, and determining their involvement in disease phenotypes. We have used exon trapping/amplification methods to identify exons of genes that map on chromosome 21. EcoRI- or BamHI-digested DNA from pools of 96 cosmids from the chromosome 21 library LL21NC02 "Q" was cloned into vector pSLP3 (after elimination of cosmids positive for ribosomal RNR genes and mouse DNA); recombinant plasmids were transfected into COS-7 cells, and trapped sequences were subcloned. False-positive clones, i.e. those containing vector self-spliced sequences (which represented 8% to 30% of clones in different experiments), were eliminated by hybridization with oligonucleotides corresponding to the vector self-splicing products. More than 100 different trapped "exons" have been identified to date after single- or double-pass sequencing. Two sequences matched exons of known genes on chromosome 21 (COL6A1 and MX1). About 45% of the sequences were entirely new, i.e. they showed no homology with entries in the nucleotide or protein databases (blastn and blastx searches). An additional 48% of the sequences were homologous but not identical to sequences in the databases. Only 4% were repetitive elements. Specific homologies will be presented. All of the trapped sequences that have been mapped by filter hybridization, PCR, or FISH map back to cosmids or YACs of chromosome 21. This approach permits rapid identification of expressed sequences of this chromosome, the cloning of its genes, and the understanding of its disorders.

  18. High-density linkage map construction and mapping of seed trait QTLs in chickpea (Cicer arietinum L.) using Genotyping-by-Sequencing (GBS)

    PubMed Central

    Verma, Subodh; Gupta, Shefali; Bandhiwal, Nitesh; Kumar, Tapan; Bharadwaj, Chellapilla; Bhatia, Sabhyata

    2015-01-01

    This study reports the use of Genotyping-by-Sequencing (GBS) for large-scale SNP discovery and simultaneous genotyping of recombinant inbred lines (RILs) of an intra-specific mapping population of chickpea contrasting for seed traits. A total of 119,672 raw SNPs were discovered, which after stringent filtering revealed 3,977 high-quality SNPs, of which 39.5% were present in genic regions. Comparative analysis using physically mapped marker loci revealed a higher degree of synteny with Medicago than with soybean. The SNP genotyping data were used to construct one of the most saturated intra-specific genetic linkage maps of chickpea, having 3,363 mapped positions including 3,228 SNPs on 8 linkage groups spanning 1006.98 cM at an average intermarker distance of 0.33 cM. The map was used to identify 20 quantitative trait loci (QTLs) associated with seed traits, accounting for phenotypic variation ranging from 9.97% to 29.71%. Analysis of the genomic sequence corresponding to five robust QTLs led to the identification of 684 putative candidate genes, whose expression profiling revealed that 101 genes exhibited seed-specific expression. The integrated approach utilizing the identified QTLs along with the available genome and transcriptome could serve as a platform for candidate gene identification for molecular breeding of chickpea. PMID:26631981

  19. High-density linkage map construction and mapping of seed trait QTLs in chickpea (Cicer arietinum L.) using Genotyping-by-Sequencing (GBS).

    PubMed

    Verma, Subodh; Gupta, Shefali; Bandhiwal, Nitesh; Kumar, Tapan; Bharadwaj, Chellapilla; Bhatia, Sabhyata

    2015-01-01

    This study reports the use of Genotyping-by-Sequencing (GBS) for large-scale SNP discovery and simultaneous genotyping of recombinant inbred lines (RILs) of an intra-specific mapping population of chickpea contrasting for seed traits. A total of 119,672 raw SNPs were discovered, which after stringent filtering revealed 3,977 high-quality SNPs, of which 39.5% were present in genic regions. Comparative analysis using physically mapped marker loci revealed a higher degree of synteny with Medicago than with soybean. The SNP genotyping data were used to construct one of the most saturated intra-specific genetic linkage maps of chickpea, having 3,363 mapped positions including 3,228 SNPs on 8 linkage groups spanning 1006.98 cM at an average intermarker distance of 0.33 cM. The map was used to identify 20 quantitative trait loci (QTLs) associated with seed traits, accounting for phenotypic variation ranging from 9.97% to 29.71%. Analysis of the genomic sequence corresponding to five robust QTLs led to the identification of 684 putative candidate genes, whose expression profiling revealed that 101 genes exhibited seed-specific expression. The integrated approach utilizing the identified QTLs along with the available genome and transcriptome could serve as a platform for candidate gene identification for molecular breeding of chickpea. PMID:26631981

  20. Alignment of Escherichia coli K12 DNA sequences to a genomic restriction map.

    PubMed Central

    Rudd, K E; Miller, W; Ostell, J; Benson, D A

    1990-01-01

    We use the extensive published information describing the genome of Escherichia coli and new restriction map alignment software to align DNA sequence, genetic, and physical maps. Restriction map alignment software is used which considers restriction maps as strings analogous to DNA or protein sequences except that two values, enzyme name and DNA base address, are associated with each position on the string. The resulting alignments reveal a nearly linear relationship between the physical and genetic maps of the E. coli chromosome. Physical map comparisons with the 1976, 1980, and 1983 genetic maps demonstrate a better fit with the more recent maps. The results of these alignments are genomic kilobase coordinates, orientation and rank of the alignment that best fits the genetic data. A statistical measure based on extreme value distribution is applied to the alignments. Additional computer analyses allow us to estimate the accuracy of the published E. coli genomic restriction map, simulate rearrangements of the bacterial chromosome, and search for repetitive DNA. The procedures we used are general enough to be applicable to other genome mapping projects. PMID:2183179
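
    The abstract describes restriction maps as strings in which each position carries two values, an enzyme name and a DNA base address. The sketch below is not the authors' alignment software: it assumes that representation and uses a much simpler shift-and-count heuristic to superimpose two maps. The function name `best_offset`, the tolerance `tol`, and the example maps are all hypothetical.

```python
from typing import List, Tuple

# A restriction map modeled as an ordered list of (enzyme name, base address)
# pairs, i.e. a "string" with two values attached to each position.
RestrictionMap = List[Tuple[str, int]]


def best_offset(query: RestrictionMap, reference: RestrictionMap,
                tol: int = 500) -> Tuple[int, int]:
    """Return (shift, matches): the shift of `query` onto `reference` that
    superimposes the most same-enzyme sites within `tol` bases."""
    # Every pairing of two sites cut by the same enzyme proposes a shift.
    candidates = {r_pos - q_pos
                  for q_enz, q_pos in query
                  for r_enz, r_pos in reference
                  if q_enz == r_enz}
    best = (0, 0)
    for shift in sorted(candidates):
        # Count query sites that land near a reference site of the same enzyme.
        matches = sum(
            any(r_enz == q_enz and abs(r_pos - (q_pos + shift)) <= tol
                for r_enz, r_pos in reference)
            for q_enz, q_pos in query)
        if matches > best[1]:
            best = (shift, matches)
    return best


# Hypothetical maps (addresses in bases), not data from the paper.
reference = [("NotI", 1000), ("SfiI", 5000), ("NotI", 12000), ("SfiI", 20000)]
query = [("SfiI", 0), ("NotI", 7000), ("SfiI", 15000)]
print(best_offset(query, reference))  # (5000, 3)
```

    A real alignment, as in the paper, would also score gaps and mismatched sites along the whole map and attach a significance measure; this heuristic only finds the single best rigid shift.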

  1. ZOOM Lite: next-generation sequencing data mapping and visualization software

    PubMed Central

    Zhang, Zefeng; Lin, Hao; Ma, Bin

    2010-01-01

    High-throughput next-generation sequencing technologies place increasing demands on the efficiency, accuracy and usability of data analysis software. In this article, we present ZOOM Lite, a software tool for efficient read mapping and result visualization. With a kernel capable of mapping tens of millions of Illumina or AB SOLiD sequencing reads efficiently and accurately, and an intuitive graphical user interface, ZOOM Lite integrates read mapping and result visualization into an easy-to-use pipeline on a desktop PC. The software handles both single-end and paired-end reads, and can output either the unique mapping result or the top N mapping results for each read. Additionally, the software accepts a variety of input file formats and writes several commonly used result formats. The software is freely available at http://bioinfor.com/zoom/lite/. PMID:20530531
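
    The distinction between reporting a unique mapping result and the top N results per read can be illustrated with a toy mapper. This is a minimal sketch, not ZOOM Lite's kernel: it brute-forces every reference offset with Hamming distance, and the rule used in `is_unique` (best hit strictly better than the runner-up) is an assumption made for illustration.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, top_n: int = 2) -> List[Tuple[int, int]]:
    """Return up to `top_n` (position, mismatches) hits, best first.

    Brute force over every offset for clarity; production mappers use an index."""
    hits = [(pos, hamming(read, reference[pos:pos + len(read)]))
            for pos in range(len(reference) - len(read) + 1)]
    hits.sort(key=lambda hit: (hit[1], hit[0]))  # fewest mismatches, then leftmost
    return hits[:top_n]


def is_unique(hits: List[Tuple[int, int]]) -> bool:
    """Treat a read as uniquely mapped if its best hit strictly beats the runner-up."""
    return len(hits) == 1 or (len(hits) > 1 and hits[0][1] < hits[1][1])


reference = "ACGTACGTTTGCA"
print(map_read("CGTT", reference))             # [(5, 0), (1, 1)]
print(is_unique(map_read("ACGT", reference)))  # False: exact hits at 0 and 4
```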

  2. Tablet: Visualizing Next-Generation Sequence Assemblies and Mappings.

    PubMed

    Milne, Iain; Bayer, Micha; Stephen, Gordon; Cardle, Linda; Marshall, David

    2016-01-01

    This chapter is designed to be a practical guide to using Tablet for the visualization of next/second-generation (NGS) sequencing data. NGS data is being produced more frequently and in greater data volumes every year. As such, it is increasingly important to have tools which enable biologists and bioinformaticians to understand and gain key insights into their data. Visualization can play a key role in the exploration of such data as well as aid in the visual validation of sequence assemblies and features such as single nucleotide polymorphisms (SNPs). We aim to show several use cases which demonstrate Tablet's ability to visually highlight various situations of interest which can arise in NGS data. PMID:26519411

  3. Human phosphoribosylformylglycineamide amidotransferase (FGARAT): regional mapping, complete coding sequence, isolation of a functional genomic clone, and DNA sequence analysis.

    PubMed

    Patterson, D; Bleskan, J; Gardiner, K; Bowersox, J

    1999-11-01

    Purines play essential roles in many cellular functions, including DNA replication, transcription, intra- and extra-cellular signaling, energy metabolism, and as coenzymes for many biochemical reactions. The de novo synthesis of purines requires 10 enzymatic steps for the production of inosine monophosphate (IMP). Defects in purine metabolism are associated with human diseases. Further, many anticancer agents function as inhibitors of the de novo biosynthetic pathway. Genes or cDNAs for most of the enzymes comprising this pathway have been isolated from humans or other mammals. One notable exception is the phosphoribosylformylglycineamide amidotransferase (FGARAT) gene, which encodes the fourth step of this pathway. This gene has been cloned from numerous microorganisms and from Drosophila melanogaster and C. elegans. We report here the identification of a human cDNA containing the coding region of the FGARAT mRNA and the isolation of a P1 clone that contains an intact human FGARAT gene. The P1 clone corrects the purine auxotrophy and protein deficiency of Chinese hamster ovary (CHO) cell mutants (AdeB) deficient in both the activity and the protein for FGARAT. The P1 clone was used to regionally map the FGARAT gene to chromosome region 17p13, a location consistent with our prior assignment of this gene to chromosome 17. A comparison of the human FGARAT DNA sequence with FGARAT sequences from 17 other organisms is reported. The isolation of this gene means that DNA clones for all 10 steps of IMP synthesis have been isolated from humans or other mammals. PMID:10548741

  4. High-Throughput Mapping of Single-Neuron Projections by Sequencing of Barcoded RNA.

    PubMed

    Kebschull, Justus M; Garcia da Silva, Pedro; Reid, Ashlan P; Peikon, Ian D; Albeanu, Dinu F; Zador, Anthony M

    2016-09-01

    Neurons transmit information to distant brain regions via long-range axonal projections. In the mouse, area-to-area connections have only been systematically mapped using bulk labeling techniques, which obscure the diverse projections of intermingled single neurons. Here we describe MAPseq (Multiplexed Analysis of Projections by Sequencing), a technique that can map the projections of thousands or even millions of single neurons by labeling large sets of neurons with random RNA sequences ("barcodes"). Axons are filled with barcode mRNA, each putative projection area is dissected, and the barcode mRNA is extracted and sequenced. Applying MAPseq to the locus coeruleus (LC), we find that individual LC neurons have preferred cortical targets. By recasting neuroanatomy, which is traditionally viewed as a problem of microscopy, as a problem of sequencing, MAPseq harnesses advances in sequencing technology to permit high-throughput interrogation of brain circuits. PMID:27545715
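
    The core bookkeeping in MAPseq, as described above, is turning sequenced barcodes from each dissected area into a per-neuron projection profile. The sketch below assumes reads have already been reduced to (area, barcode) pairs and ignores barcode sequencing errors, which a real pipeline would have to collapse; the area names and barcodes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def projection_matrix(reads: Iterable[Tuple[str, str]],
                      areas: List[str]) -> Dict[str, List[int]]:
    """Tally (area, barcode) reads into one row of per-area counts per barcode."""
    column = {area: i for i, area in enumerate(areas)}
    matrix: Dict[str, List[int]] = defaultdict(lambda: [0] * len(areas))
    for area, barcode in reads:
        matrix[barcode][column[area]] += 1
    return dict(matrix)


def normalize(row: List[int]) -> List[float]:
    """Fraction of one neuron's barcode counts found in each area."""
    total = sum(row)
    return [count / total for count in row] if total else [0.0] * len(row)


# Hypothetical reads from two dissected areas.
reads = [("V1", "AAC"), ("V1", "AAC"), ("V1", "AAC"), ("M1", "AAC"), ("M1", "GTT")]
matrix = projection_matrix(reads, ["V1", "M1"])
print(matrix)                    # {'AAC': [3, 1], 'GTT': [0, 1]}
print(normalize(matrix["AAC"]))  # [0.75, 0.25]
```

    Because each barcode labels one neuron, each row of the matrix approximates that neuron's projection pattern across the dissected areas.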

  5. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    PubMed Central

    2013-01-01

    Background Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density, but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results Genotyping by Sequencing (GBS) was used to produce highly saturated maps for an R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values, and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM, respectively, over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation distortion in R. idaeus, which

  6. PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds

    PubMed Central

    Chen, Yangho; Souaiaia, Tade; Chen, Ting

    2009-01-01

    Motivation: The explosion of next-generation sequencing data has spawned the design of new algorithms and software tools to provide efficient mapping for different read lengths and sequencing technologies. In particular, ABI's sequencer (SOLiD system) poses a big computational challenge with its capacity to produce very large amounts of data, and its unique strategy of encoding sequence data into color signals. Results: We present the mapping software, named PerM (Periodic Seed Mapping), that uses periodic spaced seeds to significantly improve mapping efficiency for large reference genomes when compared with state-of-the-art programs. The data structure in PerM requires only 4.5 bytes per base to index the human genome, allowing entire genomes to be loaded into memory while multiple processors simultaneously map reads to the reference. Weight-maximized periodic seeds offer full sensitivity for up to three mismatches and high sensitivity for four and five mismatches while minimizing the number of random hits per query, significantly speeding up the running time. Such sensitivity makes PerM a valuable mapping tool for SOLiD and Solexa reads. Availability: http://code.google.com/p/perm/ Contact: tingchen@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19675096
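
    A spaced seed compares only the positions marked '1' in a pattern, so mismatches at '0' positions do not change the index key; a periodic seed repeats one short unit along the read. The sketch below shows that mechanism with a single hypothetical unit, `"110"`. It is not one of PerM's weight-maximized seeds, and a single pattern like this does not by itself guarantee the full sensitivity to three mismatches claimed for PerM.

```python
from typing import Dict, List


def periodic_pattern(unit: str, length: int) -> str:
    """Repeat a seed unit ('1' = compared, '0' = ignored) to cover `length` bases."""
    return (unit * (length // len(unit) + 1))[:length]


def seed_key(window: str, pattern: str) -> str:
    """Keep only the bases at the '1' positions of the pattern."""
    return "".join(base for base, care in zip(window, pattern) if care == "1")


def build_index(reference: str, pattern: str) -> Dict[str, List[int]]:
    """Index every reference offset by its spaced-seed key."""
    index: Dict[str, List[int]] = {}
    for pos in range(len(reference) - len(pattern) + 1):
        key = seed_key(reference[pos:pos + len(pattern)], pattern)
        index.setdefault(key, []).append(pos)
    return index


pattern = periodic_pattern("110", 9)    # '110110110'
reference = "ACGTTGCAAGCT"
index = build_index(reference, pattern)
read = "GTAGCAAGC"                      # reference[2:11] with a mismatch at a '0' position
print(index[seed_key(read, pattern)])   # [2]
```

    Candidate positions found this way would still need a full comparison of the read against the reference to count mismatches.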

  7. Ion Torrent sequencing for conducting genome-wide scans for mutation mapping analysis.

    PubMed

    Damerla, Rama Rao; Chatterjee, Bishwanath; Li, You; Francis, Richard J B; Fatakia, Sarosh N; Lo, Cecilia W

    2014-04-01

    Mutation mapping in mice can be readily accomplished by genome-wide segregation analysis of polymorphic DNA markers. In this study, we showed the efficacy of Ion Torrent next-generation sequencing for conducting genome-wide scans to map and identify a mutation causing congenital heart disease in a mouse mutant, Bishu, recovered from a mouse mutagenesis screen. The Bishu mutant line, generated in a C57BL/6J (B6) background, was intercrossed with another inbred strain, C57BL/10J (B10), and the resulting B6/B10 hybrid offspring were intercrossed to generate mutants used for the mapping analysis. For each mutant sample, a panel of 123 B6/B10 polymorphic SNPs distributed throughout the mouse genome was PCR amplified, bar coded, and then pooled to generate a single library used for Ion Torrent sequencing. Sequencing carried out using the 314 chip yielded >600,000 usable reads. These were aligned and mapped using a custom bioinformatics pipeline. Each SNP was sequenced to a depth >500×, allowing accurate automated calling of the B6/B10 genotypes. This analysis mapped the mutation in Bishu to an interval on the proximal region of mouse chromosome 4. This was confirmed by parallel capillary sequencing of the 123 polymorphic SNPs. Further analysis of genes in the map interval identified a splicing mutation in Dnaic1 (c.204+1G>A), an intermediate chain dynein, as the disease-causing mutation in Bishu. Overall, our experience shows that Ion Torrent amplicon sequencing is high-throughput and cost-effective for conducting genome-wide mapping analysis and is easily scalable for other high-volume genotyping analyses. PMID:24306492
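
    The mapping logic described above can be sketched in two steps: call each SNP's genotype from B6 and B10 allele read counts, then keep SNPs at which every mutant is homozygous B6. This is an illustrative sketch, not the study's custom pipeline. It assumes a recessive mutation induced on the B6 background, and the allele-fraction thresholds, `min_depth`, and SNP names are hypothetical.

```python
from typing import Dict, List


def call_genotype(b6_reads: int, b10_reads: int, min_depth: int = 100) -> str:
    """Call B6/B6, B6/B10 or B10/B10 from allele read counts at one SNP.

    Thresholds are illustrative, not taken from the study."""
    depth = b6_reads + b10_reads
    if depth < min_depth:
        return "nocall"
    frac_b6 = b6_reads / depth
    if frac_b6 >= 0.9:
        return "B6/B6"
    if frac_b6 <= 0.1:
        return "B10/B10"
    return "B6/B10"


def candidate_snps(calls: Dict[str, List[str]]) -> List[str]:
    """SNPs at which every mutant is homozygous for the mutagenized B6 allele."""
    return [snp for snp, per_mutant in calls.items()
            if all(call == "B6/B6" for call in per_mutant)]


# Hypothetical read counts for two mutants at two SNPs.
calls = {
    "snp_chr4_a": [call_genotype(600, 5), call_genotype(710, 3)],
    "snp_chr9_b": [call_genotype(300, 310), call_genotype(650, 0)],
}
print(candidate_snps(calls))  # ['snp_chr4_a']
```

    Adjacent candidate SNPs then delimit the map interval in which candidate genes are examined.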

  8. A Topographic Image Map of the Sabrina Valles Region Including Information on Large Martian Impact Craters

    NASA Astrophysics Data System (ADS)

    Gehrke, S.; Köhring, R.; Barlow, N. G.; Gwinner, K.; Scholten, F.; Lehmann, H.; Albertz, J.

    2007-03-01

    The Catalog of Large Martian Impact Craters provides detailed information on 42,283 craters >5 km; it is planned to be integrated in the Topographic Image Map Mars 1:200,000 series. Such an update is shown in a special target map, based on HRSC data.

  9. Molecular phylogeny of horsetails (Equisetum) including chloroplast atpB sequences.

    PubMed

    Guillon, Jean-Michel

    2007-07-01

    Equisetum is a genus of 15 extant species that are the sole surviving representatives of the class Sphenopsida. The generally accepted taxonomy of Equisetum recognizes two subgenera: Equisetum and Hippochaete. Two recent phylogenetic studies have independently questioned the monophyly of subgenus Equisetum. Here, I use original (atpB) and published (rbcL, trnL-trnF, rps4) sequence data to investigate the phylogeny of the genus. Analyses of atpB sequences give an unusual topology, with E. bogotense branching within Hippochaete. A Bayesian analysis based on all available sequences yields a tree with increased resolution, favoring a sister relationship between E. bogotense and subgenus Hippochaete. PMID:17476459

  10. M2SG: mapping human disease-related genetic variants to protein sequences and genomic loci

    PubMed Central

    Ji, Renkai; Cong, Qian; Li, Wenlin; Grishin, Nick V.

    2013-01-01

    Summary: Online Mendelian Inheritance in Man (OMIM) is a manually curated compendium of human genetic variants and the corresponding phenotypes, mostly human diseases. Instead of directly documenting the native sequences for gene entries, OMIM links its entries to protein and DNA sequences in other databases. However, because of the existence of gene isoforms and errors in OMIM records, mapping a specific OMIM mutation to its corresponding protein sequence is not trivial. Combining computer programs and extensive manual curation of OMIM full-text descriptions and original literature, we mapped 98% of OMIM amino acid substitutions (AASs) and all SwissProt Variant (SwissVar) disease-related AASs to reference sequences and confidently mapped 99.96% of all AASs to the genomic loci. Based on the results, we developed an online database and interactive web server (M2SG) to (i) retrieve the mapped OMIM and SwissVar variants for a given protein sequence; and (ii) obtain related proteins and mutations for an input disease phenotype. This database will be useful for analyzing sequences, understanding the effect of mutations, identifying important genetic variations and designing experiments on a protein of interest. Availability and implementation: The database and web server are freely available at http://prodata.swmed.edu/M2S/mut2seq.cgi. Contact: grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24002112
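
    Mapping an amino acid substitution to genomic loci ultimately requires converting a protein position into the genomic coordinates of its codon, which may span a splice junction. The sketch below is illustrative, not M2SG's procedure. It assumes the exons are given as coding-only segments (1-based, inclusive) that begin at the start codon, and the coordinates in the example are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 1-based inclusive genomic coordinates of a coding segment


def codon_genomic_positions(aa_pos: int, exons: List[Exon], strand: str = "+") -> List[int]:
    """Genomic coordinates of the three bases of codon `aa_pos` (1-based)."""
    # List coding bases in transcript (5' to 3') order.
    if strand == "+":
        coding = [p for start, end in sorted(exons) for p in range(start, end + 1)]
    else:
        coding = [p for start, end in sorted(exons, reverse=True)
                  for p in range(end, start - 1, -1)]
    first = 3 * (aa_pos - 1)  # 0-based index of the codon's first base in the CDS
    if aa_pos < 1 or first + 3 > len(coding):
        raise ValueError("amino acid position lies outside the coding sequence")
    return coding[first:first + 3]


exons = [(100, 104), (200, 210)]               # hypothetical coding segments
print(codon_genomic_positions(2, exons))       # [103, 104, 200]: spans the junction
print(codon_genomic_positions(4, exons, "-"))  # [201, 200, 104]
```

    As the abstract notes, the hard part in practice is choosing the right isoform and correcting record errors; the coordinate arithmetic itself is simple once the correct coding segments are known.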

  11. Synchronous imitation of continuous action sequences: The role of spatial and topological mapping.

    PubMed

    Ramenzoni, Verónica C; Sebanz, Natalie; Knoblich, Günther

    2015-10-01

    What are the mapping mechanisms that enable people to synchronously imitate continuous action sequences observed in others? We investigated this question in 4 experiments that used a tapping task where participants synchronously performed alternating bimanual hand movements with a model presented in an egocentric or allocentric orientation. Their task was to tap in synchrony, with each hand matching the movements of the ipsilateral model hand as closely as possible. The results show that automatic establishment of topological mappings, where the performer's hand is mapped onto the model's anatomically matching hand even if the 2 are spatially misaligned, can interfere with maintaining spatial mappings (Experiments 1 and 2). The interference was particularly strong in musicians who have expertise in establishing topological mappings in continuous performance (Experiment 4). Adopting an unusual body posture greatly interfered with establishing spatial as well as topological mappings (Experiment 3). Together, the results suggest that synchronous imitation of continuous action sequences depends on flexible predictive models that simultaneously apply spatial and topological mapping constraints to enable an actor to act in synchrony with observed action sequences. PMID:26052697

  12. Comparative chromosome mapping of repetitive sequences. Implications for genomic evolution in the fish, Hoplias malabaricus

    PubMed Central

    Cioffi, Marcelo B; Martins, Cesar; Bertollo, Luiz AC

    2009-01-01

    Background Seven karyomorphs of the fish, Hoplias malabaricus (A-G) were previously included in two major groups, Group I (A, B, C, D) and Group II (E, F, G), based on their similar karyotype structure. In this paper, karyomorphs from Group I were analyzed by means of distinct chromosomal markers, including silver-stained nucleolar organizer regions (Ag-NORs) and chromosomal location of repetitive sequences (18S and 5S rDNA, and satellite 5SHindIII-DNA), through fluorescence in situ hybridization (FISH), in order to evaluate the evolutionary relationships among them. Results The results showed that several chromosomal markers had conserved location in the four karyomorphs. In addition, some other markers were only conserved in corresponding chromosomes of karyomorphs A-B and C-D. These data therefore reinforced and confirmed the proposed grouping of karyomorphs A-D in Group I and highlight a closer relationship between karyomorphs A-B and C-D. Moreover, the mapping pattern of some markers on some autosomes and on the chromosomes of the XY and X1X2Y systems provided new evidence concerning the possible origin of the sex chromosomes. Conclusion The in situ investigation of repetitive DNA sequences adds new informative characters useful in comparative genomics at chromosomal level and provides insights into the evolutionary relationships among Hoplias malabaricus karyomorphs. PMID:19583858

  13. Integrating mapping and sequencing around the human iduronate-2-sulfate sulfatase locus

    SciTech Connect

    Timms, K.; Lu, F.; Shen, Y.

    1994-09-01

    The logical progression of the human genome project is from mapping to sequencing. However, the criteria for accurate sequencing and mapping are different; consequently, sequencing can reveal unexpected or erroneous relationships between cosmid clones that appear to overlap by hybridization. We are sequencing a 1 Mb region of human Xq28 spanning the genes for fragile X (fraxA) and iduronate-2-sulfate sulfatase (IDS). To date, seven cosmids from this region have been completed and another five are currently being sequenced. One of the completed cosmids contains the complete IDS gene, while another cosmid contains 4 of the 9 IDS exons. The exon sequences in both cosmids are identical, but the corresponding introns have proved to be highly variant. This raises the possibility of either a second IDS gene or an unusual pseudogene. In addition, one of the cosmids contains a microsatellite marker that has been mapped 150 kb distant from the IDS gene. This indicates either that the two cosmids containing IDS exons are separated by at least 100 kb, or that a rearrangement occurred in one of the cosmids prior to library construction. To simplify the development of sequence-ready cosmids, we have developed a rapid method of cosmid walking to select additional clones that are minimally overlapping.

  14. A High-Density Genetic Linkage Map for Cucumber (Cucumis sativus L.): Based on Specific Length Amplified Fragment (SLAF) Sequencing and QTL Analysis of Fruit Traits in Cucumber

    PubMed Central

    Zhu, Wen-Ying; Huang, Long; Chen, Long; Yang, Jian-Tao; Wu, Jia-Ni; Qu, Mei-Ling; Yao, Dan-Qing; Guo, Chun-Li; Lian, Hong-Li; He, Huan-Le; Pan, Jun-Song; Cai, Run

    2016-01-01

    High-density genetic linkage maps play an important role in genome assembly and in quantitative trait locus (QTL) fine mapping. The advent of next-generation sequencing has made the construction of high-density linkage maps much more convenient and practical by simplifying SNP discovery and high-throughput genotyping. In this research, a high-density linkage map of cucumber was constructed by specific length amplified fragment (SLAF) sequencing of an F2 population of 153 individuals derived from the cross S1000 × S1002. The high-density genetic map comprised 3,057 SLAFs, including 4,475 SNP markers, on seven chromosomes and spanned 1061.19 cM, with an average genetic distance of 0.35 cM between markers. Based on this map, QTL analysis was performed for two cucumber fruit traits, fruit length and fruit diameter, and 15 QTLs were detected. PMID:27148281
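
    The reported average marker spacing can be checked from the map length and marker count given above. Whether the authors divided the total length by the number of markers or by the number of adjacent-marker gaps is not stated; the short check below computes both and shows that they round to the same 0.35 cM. The helper name `mean_spacing` is hypothetical.

```python
def mean_spacing(total_cm: float, n_markers: int, n_groups: int = 1) -> float:
    """Average distance between adjacent markers: map length / number of gaps.

    Each linkage group holding k markers contributes k - 1 gaps."""
    return total_cm / (n_markers - n_groups)


# Values from the abstract: 1061.19 cM, 3,057 SLAFs, seven chromosomes.
print(round(1061.19 / 3057, 2))                  # 0.35 (length per marker)
print(round(mean_spacing(1061.19, 3057, 7), 2))  # 0.35 (length per gap)
```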

  15. A High-Density Genetic Linkage Map for Cucumber (Cucumis sativus L.): Based on Specific Length Amplified Fragment (SLAF) Sequencing and QTL Analysis of Fruit Traits in Cucumber.

    PubMed

    Zhu, Wen-Ying; Huang, Long; Chen, Long; Yang, Jian-Tao; Wu, Jia-Ni; Qu, Mei-Ling; Yao, Dan-Qing; Guo, Chun-Li; Lian, Hong-Li; He, Huan-Le; Pan, Jun-Song; Cai, Run

    2016-01-01

    High-density genetic linkage maps play an important role in genome assembly and in quantitative trait locus (QTL) fine mapping. The advent of next-generation sequencing has made the construction of high-density linkage maps much more convenient and practical by simplifying SNP discovery and high-throughput genotyping. In this research, a high-density linkage map of cucumber was constructed by specific length amplified fragment (SLAF) sequencing of an F2 population of 153 individuals derived from the cross S1000 × S1002. The high-density genetic map comprised 3,057 SLAFs, including 4,475 SNP markers, on seven chromosomes and spanned 1061.19 cM, with an average genetic distance of 0.35 cM between markers. Based on this map, QTL analysis was performed for two cucumber fruit traits, fruit length and fruit diameter, and 15 QTLs were detected. PMID:27148281

  16. Mapping Ds insertions in barley using a sequence-based approach.

    PubMed

    Cooper, L D; Marquez-Cedillo, L; Singh, J; Sturbaum, A K; Zhang, S; Edwards, V; Johnson, K; Kleinhofs, A; Rangel, S; Carollo, V; Bregitzer, P; Lemaux, P G; Hayes, P M

    2004-09-01

    A transposon tagging system, based upon maize Ac/Ds elements, was developed in barley (Hordeum vulgare subsp. vulgare). The long-term objective of this project is to identify a set of lines with Ds insertions dispersed throughout the genome as a comprehensive tool for gene discovery and reverse genetics. AcTPase and Ds-bar elements were introduced into immature embryos of Golden Promise by biolistic transformation. Subsequent transposition and segregation of Ds away from AcTPase and the original site of integration resulted in new lines, each containing a stabilized Ds element in a new location. The sequence of the genomic DNA flanking the Ds elements was obtained by inverse PCR and TAIL-PCR. Using a sequence-based mapping strategy, we determined the genome locations of the Ds insertions in 19 independent lines using primarily restriction digest-based assays of PCR-amplified single nucleotide polymorphisms and PCR-based assays of insertions or deletions. The principal strategy was to identify and map sequence polymorphisms in the regions corresponding to the flanking DNA using the Oregon Wolfe Barley mapping population. The mapping results obtained by the sequence-based approach were confirmed by RFLP analyses in four of the lines. In addition, cloned DNA sequences corresponding to the flanking DNA were used to assign map locations to Morex-derived genomic BAC library inserts, thus integrating genetic and physical maps of barley. BLAST search results indicate that the majority of the transposed Ds elements are found within predicted or known coding sequences. Transposon tagging in barley using Ac/Ds thus promises to provide a useful tool for studies on the functional genomics of the Triticeae. PMID:15449176
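
    A restriction digest-based assay of a PCR-amplified SNP works when one allele creates or destroys an enzyme's recognition site, so the two alleles give different fragment patterns. The sketch below predicts those patterns in silico. It uses EcoRI (recognition site GAATTC, cut after the G) as an example; the amplicon sequences and helper names are hypothetical and not taken from the study.

```python
from typing import List


def site_positions(seq: str, site: str) -> List[int]:
    """0-based start positions of every occurrence of a recognition site."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def fragment_lengths(seq: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths after cutting `cut_offset` bases into each site."""
    cuts = [pos + cut_offset for pos in site_positions(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


allele_a = "TTTTGAATTCAAAA"   # hypothetical amplicon carrying an EcoRI site
allele_b = "TTTTGAGTTCAAAA"   # same amplicon with an A>G SNP that destroys it
print(fragment_lengths(allele_a, "GAATTC", 1))  # [5, 9]
print(fragment_lengths(allele_b, "GAATTC", 1))  # [14]
```

    Only the forward strand is scanned here, which is sufficient for palindromic sites such as GAATTC.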

  17. Physical map of the chromosome of Neisseria gonorrhoeae FA1090 with locations of genetic markers, including opa and pil genes.

    PubMed Central

    Dempsey, J A; Litaker, W; Madhure, A; Snodgrass, T L; Cannon, J G

    1991-01-01

    A physical map of the chromosome of Neisseria gonorrhoeae FA1090 has been constructed. Digestion of strain FA1090 DNA with NheI, SpeI, BglII, or PacI resulted in a limited number of fragments that were resolved by contour-clamped homogeneous electric field electrophoresis. The estimated genome size was 2,219 kb. To construct the map, probes corresponding to single-copy chromosomal sequences were used in Southern blots of digested DNA separated on pulsed-field gels, to determine how the fragments from different digests overlapped. Some of the probes represented identified gonococcal genes, whereas others were anonymous cloned fragments of strain FA1090 DNA. By using this approach, a macrorestriction map of the strain FA1090 chromosome was assembled, and the locations of various genetic markers on the map were determined. Once the map was completed, the repeated gene families encoding Opa and pilin proteins were mapped. The 11 opa loci of strain FA1090 were distributed over approximately 60% of the chromosome. The pil loci were more clustered and were located in two regions separated by approximately one-fourth of the chromosome. PMID:1679431
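
    A statement such as "distributed over approximately 60% of the chromosome" refers to a circular chromosome, so the span of a set of loci is the genome length minus the largest gap between neighboring loci, including the gap that wraps around the origin. The sketch below uses the 2,219-kb genome size from the abstract, but the locus positions are hypothetical and are not the FA1090 opa coordinates.

```python
from typing import List


def circular_span(positions_kb: List[float], genome_kb: float) -> float:
    """Length of the smallest arc of a circular chromosome containing every locus."""
    pts = sorted(p % genome_kb for p in positions_kb)
    # Gaps between consecutive loci, plus the gap that wraps past the origin.
    gaps = [b - a for a, b in zip(pts, pts[1:])] + [genome_kb - pts[-1] + pts[0]]
    return genome_kb - max(gaps)


genome_kb = 2219                                   # estimated size from the abstract
print(circular_span([100, 500, 1200], genome_kb))  # 1100 (about 50% of the chromosome)
print(circular_span([2100, 50, 300], genome_kb))   # 419: the arc crosses the origin
```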

  18. Comparative mapping of human alphoid centromeric sequences in great apes

    SciTech Connect

    Archidiacono, N.; Antonacci, R.; Marzella, R.

    1994-09-01

    Metaphase spreads from chimpanzees (Pan troglodytes and Pan paniscus) and gorilla (Gorilla gorilla) have been hybridized in situ with 27 alphoid DNA probes specific for the centromeres of human chromosomes, to investigate the evolutionary relationship between the centromeric regions of humans and great apes. The results showed that most human probes do not recognize their corresponding homologs in great apes. Chromosome X is the only chromosome showing consistent localization in all four species. Each suprachromosomal family (SCF) exhibits a distinct and peculiar evolutionary history. SCF1 (chromosomes 1, 3, 6, 7, 19, 12, 16) is very heterogeneous: some probes gave intense signals, but always on non-homologous chromosomes; others did not produce any hybridization signal. All probes localized on SCF2 (chromosomes 2, 4, 8, 9, 13, 14, 15, 18, 20, 21, and 22) recognize a single chromosome: chromosome 11 (phylogenetic IX) in PTR and PPA, and chromosome 4 (phylogenetic V) in GGO. SCF3 subsets (chromosomes 1, 11, 17, X) are substantially conserved in PTR and PPA, but not in GGO, except for chromosome X. No signals have been detected on PPA chromosomes I, III, IV, V, and VI or on PTR chromosome V, suggesting that the centromeric regions of some chromosomes have probably lost homology with human alphoid sequences.

  19. Mapping and analysis of Caenorhabditis elegans transcription factor sequence specificities

    PubMed Central

    Narasimhan, Kamesh; Lambert, Samuel A; Yang, Ally WH; Riddell, Jeremy; Mnaimneh, Sanie; Zheng, Hong; Albu, Mihai; Najafabadi, Hamed S; Reece-Hoyes, John S; Fuxman Bass, Juan I; Walhout, Albertha JM; Weirauch, Matthew T; Hughes, Timothy R

    2015-01-01

    Caenorhabditis elegans is a powerful model for studying gene regulation, as it has a compact genome and a wealth of genomic tools. However, identification of regulatory elements has been limited, as DNA-binding motifs are known for only 71 of the estimated 763 sequence-specific transcription factors (TFs). To address this problem, we performed protein binding microarray experiments on representatives of canonical TF families in C. elegans, obtaining motifs for 129 TFs. Additionally, we predict motifs for many TFs that have DNA-binding domains similar to those already characterized, increasing coverage of binding specificities to 292 C. elegans TFs (∼40%). These data highlight the diversification of binding motifs for the nuclear hormone receptor and C2H2 zinc finger families and reveal unexpected diversity of motifs for T-box and DM families. Motif enrichment in promoters of functionally related genes is consistent with known biology and also identifies putative regulatory roles for unstudied TFs. DOI: http://dx.doi.org/10.7554/eLife.06967.001 PMID:25905672
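
    Motifs obtained from protein binding microarray experiments are commonly summarized as position weight matrices (PWMs) and used to scan promoters. The sketch below is a minimal illustration, assuming a uniform background and a hypothetical 4-bp count matrix (not one of the published C. elegans motifs); it converts counts to log2-odds weights and scores every window of a sequence on one strand only.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into a log2-odds weight matrix,
    assuming a uniform background base composition."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(sequence, pwm):
    """Score every full-length window on the given (+) strand only;
    expects an uppercase sequence containing only A, C, G, T."""
    width = len(pwm)
    return [
        (start, sum(pwm[i][base] for i, base in enumerate(sequence[start:start + width])))
        for start in range(len(sequence) - width + 1)
    ]


# Hypothetical 4-bp motif counts (illustration only) favoring "GATA".
toy_counts = [
    {"A": 1, "C": 1, "G": 16, "T": 2},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 16, "C": 2, "G": 1, "T": 1},
]
pwm = log_odds_pwm(toy_counts)
best_start, best_score = max(scan("CCTGATACC", pwm), key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

    A real scan would also score the reverse complement and compare window scores against a threshold or background distribution.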

  20. Use of the Caulobacter crescentus Genome Sequence To Develop a Method for Systematic Genetic Mapping

    PubMed Central

    West, Lisandra; Yang, Desiree; Stephens, Craig

    2002-01-01

    The functional analysis of sequenced genomes will be facilitated by the development of tools for the rapid mapping of mutations. We have developed a systematic approach to genetic mapping in Caulobacter crescentus that is based on bacteriophage-mediated transduction of strategically placed antibiotic resistance markers. The genomic DNA sequence was used to identify sites distributed evenly around the chromosome at which plasmids could be nondisruptively integrated. DNA fragments from these sites were amplified by PCR and cloned into a kanamycin-resistant (Kanr) suicide vector. Delivery of these plasmids into C. crescentus resulted in integration via homologous recombination. A set of 41 strains containing Kanr markers at 100-kb intervals was thereby generated. These strains serve as donors for generalized transduction using bacteriophage φCr30, which can transduce at least 120 kb of DNA. Transductants are selected with kanamycin and screened for loss of the mutant phenotype to assess linkage between the marker and the site of the mutation. The dependence of cotransduction frequency on sequence distance was evaluated using several markers and mutant strains. With these data as a standard, previously unmapped mutations were readily localized to DNA sequence intervals equivalent to less than 1% of the genome. Candidate genes within the interval were then examined further by subcloning and complementation analysis. Mutations resulting in sensitivity to ampicillin, in nutritional auxotrophies, or in temperature-sensitive growth were mapped. This approach to genetic mapping should be applicable to other bacteria with sequenced genomes for which generalized transducing phage are available. PMID:11914347
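
    The abstract calibrates cotransduction frequency against sequence distance empirically. As an assumed stand-in for that calibration, Wu's (1966) model for generalized transduction, C = (1 - d/L)^3 with L the length of the transduced fragment, can be sketched and inverted to estimate marker-to-mutation distance. L = 120 kb is taken from the abstract's lower bound for φCr30; whether this model fits φCr30 data is an assumption here, not a finding of the paper.

```python
def cotransduction_frequency(distance_kb, fragment_kb=120.0):
    """Wu (1966) model: C = (1 - d/L)**3 for 0 <= d < L, and 0 beyond L."""
    if distance_kb < 0:
        raise ValueError("distance must be non-negative")
    if distance_kb >= fragment_kb:
        return 0.0
    return (1.0 - distance_kb / fragment_kb) ** 3


def distance_from_frequency(freq, fragment_kb=120.0):
    """Invert the model: d = L * (1 - C**(1/3)) for 0 < C <= 1."""
    if not 0.0 < freq <= 1.0:
        raise ValueError("frequency must be in (0, 1]")
    return fragment_kb * (1.0 - freq ** (1.0 / 3.0))


# With markers every 100 kb, a mutation lies at most ~50 kb from the nearest marker.
worst_case = cotransduction_frequency(50.0)
print(f"{worst_case:.3f}")
print(f"{distance_from_frequency(worst_case):.1f} kb")
```

    Under this model, even the worst-placed mutation would still cotransduce with its nearest marker at roughly 20%, which is consistent with the choice of a 100-kb marker spacing.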

  1. OmniMapFree: A unified tool to visualise and explore sequenced genomes

    PubMed Central

    2011-01-01

    Background: Acquiring and exploring whole-genome sequence information for a species under investigation is now a routine experimental approach. Most genome browsers typically display only the DNA sequence, EST support, motif search results, and GO annotations. However, for many species a growing volume of additional experimental information is available, but it is rarely searchable within the landscape of the entire genome. Results: We have developed generic software that permits users to view a single genome in its entirety, in either its chromosome or its supercontig context, within a single window. The genome can be displayed at any scale and with any features. Different data types and data sets acquired from other types of studies, including classical genetics, forward and reverse genetics, transcriptomics, proteomics, and improved annotation from alternative sources, are displayed on the genome. In each display, different types of information can be overlaid, then retrieved in the desired combinations and scales and used in follow-up analyses. The displays generated are of publication quality. Conclusions: OmniMapFree provides a unified, versatile, and easy-to-use software tool for studying a single genome in association with all the other datasets and data types available for the organism. PMID:22085540

  2. High Resolution Genetic Mapping by Genome Sequencing Reveals Genome Duplication and Tetraploid Genetic Structure of the Diploid Miscanthus sinensis

    PubMed Central

    Ma, Xue-Feng; Jensen, Elaine; Alexandrov, Nickolai; Troukhan, Maxim; Zhang, Liping; Thomas-Jones, Sian; Farrar, Kerrie; Clifton-Brown, John; Donnison, Iain; Swaller, Timothy; Flavell, Richard

    2012-01-01

    We have created a high-resolution linkage map of Miscanthus sinensis using genotyping-by-sequencing (GBS), identifying all 19 linkage groups for the first time. The result is technically significant because Miscanthus has a very large and highly heterozygous genome but has had little or no genomic information to date. The composite linkage map, containing markers from both parental linkage maps, is composed of 3,745 SNP markers spanning 2,396 cM on 19 linkage groups, with a 0.64 cM average resolution. Comparative genomic analyses of the M. sinensis composite linkage map against the genomes of sorghum, maize, rice, and Brachypodium distachyon indicate that sorghum has the closest syntenic relationship to Miscanthus among these species. The comparative results revealed that each pair of the 19 M. sinensis linkage groups aligned to one sorghum chromosome, except for LG8, which mapped to two sorghum chromosomes (4 and 7), presumably due to a chromosome fusion event after genome duplication. The data also revealed several other chromosome rearrangements relative to sorghum, including two telomere-centromere inversions of the sorghum syntenic chromosome 7 in LG8 of M. sinensis and two paracentric inversions of sorghum syntenic chromosome 4 in LG7 and LG8 of M. sinensis. The results clearly demonstrate, for the first time, that the diploid M. sinensis is of tetraploid origin, consisting of two sub-genomes. This complete, high-resolution composite linkage map will not only serve as a useful resource for novel QTL discoveries but also enable informed deployment of the wealth of existing genomic resources of other species for the improvement of Miscanthus as a high-biomass energy crop. In addition, it can serve as a reference for genome sequence assembly in the forthcoming whole-genome sequencing of the Miscanthus genus. PMID:22439001
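
    The quoted average resolution is consistent with dividing the total map length by the number of SNP markers (our reading of the figure; the abstract does not define it). Dividing by the 3,744 gaps between markers instead gives the same value to two decimals.

```latex
\bar{r} \approx \frac{2396\ \text{cM}}{3745\ \text{markers}} \approx 0.640\ \text{cM per marker}
```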

  3. High resolution genetic mapping by genome sequencing reveals genome duplication and tetraploid genetic structure of the diploid Miscanthus sinensis.

    PubMed

    Ma, Xue-Feng; Jensen, Elaine; Alexandrov, Nickolai; Troukhan, Maxim; Zhang, Liping; Thomas-Jones, Sian; Farrar, Kerrie; Clifton-Brown, John; Donnison, Iain; Swaller, Timothy; Flavell, Richard

    2012-01-01

    We have created a high-resolution linkage map of Miscanthus sinensis using genotyping-by-sequencing (GBS), identifying all 19 linkage groups for the first time. The result is technically significant because Miscanthus has a very large and highly heterozygous genome but has had little or no genomic information to date. The composite linkage map, containing markers from both parental linkage maps, is composed of 3,745 SNP markers spanning 2,396 cM on 19 linkage groups, with a 0.64 cM average resolution. Comparative genomic analyses of the M. sinensis composite linkage map against the genomes of sorghum, maize, rice, and Brachypodium distachyon indicate that sorghum has the closest syntenic relationship to Miscanthus among these species. The comparative results revealed that each pair of the 19 M. sinensis linkage groups aligned to one sorghum chromosome, except for LG8, which mapped to two sorghum chromosomes (4 and 7), presumably due to a chromosome fusion event after genome duplication. The data also revealed several other chromosome rearrangements relative to sorghum, including two telomere-centromere inversions of the sorghum syntenic chromosome 7 in LG8 of M. sinensis and two paracentric inversions of sorghum syntenic chromosome 4 in LG7 and LG8 of M. sinensis. The results clearly demonstrate, for the first time, that the diploid M. sinensis is of tetraploid origin, consisting of two sub-genomes. This complete, high-resolution composite linkage map will not only serve as a useful resource for novel QTL discoveries but also enable informed deployment of the wealth of existing genomic resources of other species for the improvement of Miscanthus as a high-biomass energy crop. In addition, it can serve as a reference for genome sequence assembly in the forthcoming whole-genome sequencing of the Miscanthus genus. PMID:22439001

  4. Mapping Sensorimotor Sequences to Word Sequences: A Connectionist Model of Language Acquisition and Sentence Generation

    ERIC Educational Resources Information Center

    Takac, Martin; Benuskova, Lubica; Knott, Alistair

    2012-01-01

    In this article we present a neural network model of sentence generation. The network has both technical and conceptual innovations. Its main technical novelty is in its semantic representations: the messages which form the input to the network are structured as sequences, so that message elements are delivered to the network one at a time. Rather…

  5. 1989 Walker Branch Watershed Surveying and Mapping Including a Guide to Coordinate Transformation Procedures

    SciTech Connect

    Timmins, S.

    1991-01-01

    Walker Branch Watershed is a forested research watershed marked throughout by a 264-ft grid that was surveyed in 1967 using the Oak Ridge National Laboratory (X-10) coordinate system. The Tennessee Valley Authority (TVA) prepared a contour map of the watershed in 1987, and an ARC/INFO™ version of the TVA topographic map with the X-10 grid superimposed has since been used as the primary geographic information system (GIS) database for the watershed. However, because of inaccuracies observed in the mapped locations of some grid markers and permanent research plots, portions of the watershed were resurveyed in 1989, and an extensive investigation was conducted of the coordinates used in creating both the TVA map and the ARC/INFO database, and of the coordinate transformation procedures currently in use on the Oak Ridge Reservation. The investigation determined that the positional errors resulted from the field orientation of the blazed grid rather than from problems in mapmaking. In resurveying the watershed, previously surveyed control points were located or noted as missing, and 25 new control points along the perimeter roads were surveyed. In addition, 67 of 156 grid line intersections (pegs) were physically located, and their positions relative to mapped landmarks were recorded. As a result, coordinates for the Walker Branch Watershed grid lines and permanent research plots were revised, and a revised map of the watershed was produced. In conjunction with this work, existing procedures for converting between the local grid systems, Tennessee state plane, and the 1927 and 1983 North American Datums were updated and compiled along with illustrative examples and relevant historical information. Alternative algorithms were developed for several coordinate conversions commonly used on the Oak Ridge Reservation.
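
    The report's own conversion algorithms are not given in the abstract. As an assumed illustration of the general kind of local-grid conversion it describes, a four-parameter similarity (Helmert) transform can be fitted by least squares to control points surveyed in both systems. All coordinates here are hypothetical, and a datum change such as NAD27 to NAD83 generally requires more than this planar fit.

```python
def fit_similarity(local_pts, target_pts):
    """Least-squares four-parameter (Helmert) fit of target = m * local + t,
    treating each (x, y) point as the complex number x + iy, so that
    m = scale * exp(i * rotation) and t is the translation."""
    if len(local_pts) < 2 or len(local_pts) != len(target_pts):
        raise ValueError("need at least two matched control points")
    z = [complex(x, y) for x, y in local_pts]
    w = [complex(x, y) for x, y in target_pts]
    z_mean, w_mean = sum(z) / len(z), sum(w) / len(w)
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    if den == 0:
        raise ValueError("control points must not all coincide")
    m = sum((zi - z_mean).conjugate() * (wi - w_mean) for zi, wi in zip(z, w)) / den
    return m, w_mean - m * z_mean


def to_target(m, t, point):
    """Apply the fitted transform to one local-grid (x, y) point."""
    p = m * complex(*point) + t
    return p.real, p.imag


# Hypothetical control points: the local grid is rotated 90 degrees and offset.
m, t = fit_similarity([(0, 0), (264, 0)], [(1000, 2000), (1000, 2264)])
print(abs(m), to_target(m, t, (0, 264)))
```

    With more than two control points the same closed form averages out survey error, and the residuals at each control point give a direct check on positional accuracy.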

  6. A physical map of the highly heterozygous Populus genome: integration with the genome sequence and genetic map and analysis of haplotype variation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the first maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2,...

  7. Substrate-Driven Mapping of the Degradome by Comparison of Sequence Logos

    PubMed Central

    Fuchs, Julian E.; von Grafenstein, Susanne; Huber, Roland G.; Kramer, Christian; Liedl, Klaus R.

    2013-01-01

    Sequence logos are frequently used to illustrate the substrate preferences and specificity of proteases. Here, we employed the compiled substrates of the MEROPS database to introduce a novel metric for comparing protease substrate preferences. The resulting similarity matrix of 62 proteases can be used to visualize similarities in protease substrate readout intuitively, via principal component analysis and the construction of protease specificity trees. Because our new metric is based solely on substrate data, the protease tree can include proteolytic enzymes of different evolutionary origin. Our analyses thereby confirm pronounced overlaps in substrate recognition not only between proteases closely related at the sequence level but also between proteolytic enzymes of different evolutionary origin and catalytic type. To illustrate the applicability of our approach, we analyze the distribution of targets of small molecules from the ChEMBL database in our substrate-based protease specificity trees. We observe a striking clustering of annotated targets in tree branches, even though these grouped targets do not necessarily share similarity at the protein sequence level. This highlights the value of knowledge acquired from peptide substrates for the design of small-molecule drugs, e.g., for the prediction of off-target effects or drug repurposing. Consequently, our similarity metric allows the degradome and its associated drug target network to be mapped via comparison of known substrate peptides. This substrate-driven view of protein-protein interfaces is not limited to proteases but can be applied to any target class for which a sufficient amount of known substrate data is available. PMID:24244149
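
    The abstract does not define its metric. As a hypothetical stand-in only, two protease logos can be compared through their per-position amino acid frequencies, scored here as one minus the mean total-variation distance across positions. The peptide sets are invented, and 1 minus the score could serve as a distance for PCA or tree building.

```python
from collections import Counter


def position_frequencies(substrates):
    """Per-position amino acid frequencies for aligned, equal-length peptides."""
    length = len(substrates[0])
    if any(len(s) != length for s in substrates):
        raise ValueError("substrates must be aligned to equal length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in substrates)
        freqs.append({aa: c / len(substrates) for aa, c in counts.items()})
    return freqs


def logo_similarity(subs_a, subs_b):
    """1 - mean total-variation distance between per-position distributions
    (hypothetical metric; 1.0 means identical logos, 0.0 fully disjoint)."""
    fa, fb = position_frequencies(subs_a), position_frequencies(subs_b)
    if len(fa) != len(fb):
        raise ValueError("logos must have the same width")
    distances = []
    for pa, pb in zip(fa, fb):
        residues = set(pa) | set(pb)
        distances.append(0.5 * sum(abs(pa.get(r, 0.0) - pb.get(r, 0.0)) for r in residues))
    return 1.0 - sum(distances) / len(distances)


# Hypothetical three-residue substrate windows for two proteases.
protease_a = ["GRS", "ARS"]
protease_b = ["GRS", "GKA"]
print(logo_similarity(protease_a, protease_b))
```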

  8. A physical map of the X chromosome of Drosophila melanogaster: Cosmid contigs and sequence tagged sites

    SciTech Connect

    Madueno, E.; Modolell, J.; Papagiannakis, G.

    1995-04-01

    A physical map of the euchromatic X chromosome of Drosophila melanogaster has been constructed by assembling contiguous arrays of cosmids that were selected by screening a library with DNA isolated from microamplified chromosomal divisions. This map, consisting of 893 cosmids, covers approximately 64% of the euchromatic part of the chromosome. In addition, 568 sequence tagged sites (STS), in aggregate representing 120 kb of sequenced DNA, were derived from selected cosmids. Most of these STSs, spaced at an average distance of approximately 35 kb along the euchromatic region of the chromosome, represent DNA tags that can be used as entry points to the fruitfly genome. Furthermore, 42 genes have been placed on the physical map, either through the hybridization of specific probes to the cosmids or through the fact that they were represented among the STSs. These provide a link between the physical and the genetic maps of D. melanogaster. Nine novel genes have been tentatively identified in Drosophila on the basis of matches between STS sequences and sequences from other species. 32 refs., 3 figs., 4 tabs.

  9. Mapping sequenced E.coli genes by computer: software, strategies and examples.

    PubMed Central

    Rudd, K E; Miller, W; Werner, C; Ostell, J; Tolstoshev, C; Satterfield, S G

    1991-01-01

    Methods are presented for organizing and integrating DNA sequence data, restriction maps, and genetic maps for the same organism but from a variety of sources (databases, publications, personal communications). Proper software tools are essential for successful organization of such diverse data into an ordered, cohesive body of information, and a suite of novel software to support this endeavor is described. Though these tools automate much of the task, a variety of strategies is needed to cope with recalcitrant cases. We describe such strategies and illustrate their application with numerous examples. These strategies have allowed us to order, analyze, and display over one megabase of E. coli DNA sequence information. The integration task often exposes inconsistencies in the available data, perhaps caused by strain polymorphisms or human oversight, necessitating the application of sound biological judgment. The examples illustrate both the level of expertise required of the database curator and the knowledge gained as apparent inconsistencies are resolved. The software and mapping methods are applicable to the study of any genome for which a high resolution restriction map is available. They were developed to support a weakly coordinated sequencing effort involving many laboratories, but would also be useful for highly orchestrated sequencing projects. PMID:2011534
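
    The abstract does not describe the software itself. One step it implies, predicting where a restriction enzyme cuts within a sequence so that the result can be compared with a restriction map, can be sketched as follows, assuming a linear sequence and a palindromic recognition site. The toy sequence is hypothetical.

```python
def find_sites(sequence, site):
    """0-based start of every occurrence of a recognition site (overlaps allowed).
    Searching one strand suffices only for palindromic sites such as EcoRI's."""
    sequence, site = sequence.upper(), site.upper()
    positions, start = [], sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(seq_len, cuts):
    """Lengths of the fragments produced by cutting a linear molecule at `cuts`."""
    edges = [0] + sorted(set(cuts)) + [seq_len]
    return [b - a for a, b in zip(edges, edges[1:]) if b > a]


# EcoRI recognizes GAATTC and cuts after the G (G^AATTC), i.e. at start + 1.
seq = "AAGAATTCTTTTGAATTCAA"  # hypothetical toy sequence
cuts = [s + 1 for s in find_sites(seq, "GAATTC")]
print(cuts, fragment_lengths(len(seq), cuts))
```

    Predicted site coordinates, rather than fragment sizes alone, are what make it possible to slide a sequence along a restriction map and flag discrepancies of the kind the abstract describes.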

  10. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker d...

  11. Raman-based system for DNA sequencing-mapping and other separations

    DOEpatents

    Vo-Dinh, Tuan

    1994-01-01

    DNA sequencing and mapping are performed by using a Raman spectrometer with a surface enhanced Raman scattering (SERS) substrate to enhance the Raman signal. A SERS label is attached to a DNA fragment and then analyzed with the Raman spectrometer to identify the DNA fragment according to characteristics of the Raman spectrum generated.

  12. Raman-based system for DNA sequencing-mapping and other separations

    DOEpatents

    Vo-Dinh, T.

    1994-04-26

    DNA sequencing and mapping are performed by using a Raman spectrometer with a surface enhanced Raman scattering (SERS) substrate to enhance the Raman signal. A SERS label is attached to a DNA fragment and then analyzed with the Raman spectrometer to identify the DNA fragment according to characteristics of the Raman spectrum generated. 11 figures.

  13. Rapid restriction mapping of cosmids by sequence-specific triple-helix-mediated affinity capture

    SciTech Connect

    Ji, Huamin; Francisco, T.; Smith, L.M.; Guilfoyle, R.A.

    1996-01-15

    A simple and rapid strategy for restriction mapping based on sequence-specific triple-helix affinity capture (TAC) was developed. The strategy was applied to the analysis of cosmid clones by the construction of a new cosmid vector, ScosTriplex-II, containing two different triple-helix-forming sequences flanking the cloning site of the original SuperCos-1 cosmid vector. For restriction mapping, the recombinant cosmid DNA is digested with NotI restriction enzyme or with one of four intron-encoded endonucleases for excision of intact inserts followed by controlled partial digestion with a mapping enzyme used in conjunction with the corresponding methyltransferase. The partial digestion products are combined with biotinylated triple-helix-forming oligonucleotides to form a triple-helical complex. The triple-helix complexes are immobilized on streptavidin-coated magnetic beads, washed, and eluted with pH 9 buffer solution. The fragments are separated and directly sized by agarose gel electrophoresis. Bidirectional maps are obtained simultaneously by binding to the two different triple-helix-forming oligonucleotides. No probe labeling, gel drying, blotting to membranes, hybridization, or autoradiography is necessary. Also, TAC conditions that permit gel-free isolation of the terminal restriction fragments from cosmid inserts were found. These advantages afforded by ScosTriplex-II should facilitate the automation of cosmid restriction site fingerprinting needed for large-scale mapping and sequencing projects. 24 refs., 5 figs.
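
    Because each captured partial-digest product is anchored at one end of the insert, its size equals the distance from that end to a cleaved site, which is the logic of classic end-labeled partial-digest mapping. A minimal sketch under that assumption converts fragment sizes from each triple-helix end into site coordinates and merges the two directions. The insert length, fragment sizes, and tolerances are hypothetical.

```python
def sites_from_end(fragment_sizes_kb, insert_kb, from_left=True, uncut_tol_kb=0.05):
    """Convert sizes of fragments captured by one insert end into site positions
    measured from the left end; a full-length product means no cut occurred."""
    sites = []
    for size in fragment_sizes_kb:
        if abs(size - insert_kb) <= uncut_tol_kb:
            continue  # uncut insert carries no site information
        sites.append(size if from_left else insert_kb - size)
    return sorted(sites)


def merge_bidirectional(left_sites, right_sites, tol_kb=0.5):
    """Average site positions seen from both ends when they agree within tol_kb;
    keep sites seen from only one end unchanged."""
    merged, unmatched = [], list(right_sites)
    for pos in left_sites:
        nearest = min(unmatched, key=lambda r: abs(r - pos), default=None)
        if nearest is not None and abs(nearest - pos) <= tol_kb:
            unmatched.remove(nearest)
            merged.append((pos + nearest) / 2)
        else:
            merged.append(pos)
    return sorted(merged + unmatched)


insert = 40.0  # hypothetical insert length in kb
left = sites_from_end([5.1, 18.0, 31.2, 40.0], insert, from_left=True)
right = sites_from_end([9.0, 22.1, 35.0, 40.0], insert, from_left=False)
print(merge_bidirectional(left, right))
```

    Reading the map from both ends is useful because sizing accuracy on agarose gels drops for large fragments; each site is measured as a short fragment from at least one end.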

  14. Computational approach towards promoter sequence comparison via TF mapping using a new distance measure.

    PubMed

    Meera, A; Rangarajan, Lalitha; Bhat, Savithri

    2011-03-01

    We propose a method for identifying transcription factor binding sites (TFBS) in a given promoter sequence and mapping the transcription factors (TFs). The proposed algorithm searches for the +1 transcription start site (TSS) in eukaryotic and prokaryotic sequences separately. The algorithm was tested with sequences from both eukaryotes and prokaryotes for at least 9 experimentally verified and validated functional TFs in promoter sequences. The order and type of TFs binding to the promoters of genes encoding central metabolic pathway (CMP) enzymes were tabulated. A new similarity measure was devised for scoring the similarity between a pair of promoter sequences based on the number and order of motifs. The promoters were then grouped into clusters on the basis of these scores. The distance between the clusters in each pathway was calculated, and a phylogenetic tree was constructed. The method was further applied to other pathways, such as lipid and amino acid biosynthesis, to retrieve and compare experimentally verified and conserved TFBS. PMID:21369887
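
    The abstract does not give the formula for its similarity measure. A hypothetical measure that likewise depends on the number and order of motifs is the longest common subsequence (LCS) of the ordered TF lists, normalized by the longer list. The TF labels below are placeholders, and 1 minus the score could serve as the clustering distance.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two lists (rolling-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def promoter_similarity(tfs_a, tfs_b):
    """Hypothetical score in [0, 1]: shared TFs in the same order, normalized
    by the number of TFs in the longer promoter."""
    if not tfs_a and not tfs_b:
        return 1.0
    return lcs_length(tfs_a, tfs_b) / max(len(tfs_a), len(tfs_b))


# Placeholder TF labels, listed in the order their sites occur upstream of the TSS.
promoter_1 = ["TF_A", "TF_B", "TF_C"]
promoter_2 = ["TF_A", "TF_C", "TF_B"]
print(promoter_similarity(promoter_1, promoter_2))
```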

  15. Utilizing mapping targets of sequences underrepresented in the reference assembly to reduce false positive alignments

    PubMed Central

    Miga, Karen H.; Eisenhart, Christopher; Kent, W. James

    2015-01-01

    The human reference assembly remains incomplete due to the underrepresentation of repeat-rich sequences that are found within centromeric regions and acrocentric short arms. Although these sequences are marginally represented in the assembly, they are often fully represented in whole-genome short-read datasets and contribute to inappropriate alignments and high read-depth signals that localize to a small number of assembled homologous regions. As a consequence, these regions often provide artifactual peak calls that confound hypothesis testing and large-scale genomic studies. To address this problem, we have constructed mapping targets that represent roughly 8% of the human genome generally omitted from the human reference assembly. By integrating these data into standard mapping and peak-calling pipelines we demonstrate a 10-fold reduction in signals in regions common to the blacklisted region and identify a comprehensive set of regions that exhibit mapping sensitivity with the presence of the repeat-rich targets. PMID:26163063

  16. Rapid fine conformational epitope mapping using comprehensive mutagenesis and deep sequencing.

    PubMed

    Kowalsky, Caitlin A; Faber, Matthew S; Nath, Aritro; Dann, Hailey E; Kelly, Vince W; Liu, Li; Shanker, Purva; Wagner, Ellen K; Maynard, Jennifer A; Chan, Christina; Whitehead, Timothy A

    2015-10-30

    Knowledge of the fine location of neutralizing and non-neutralizing epitopes on human pathogens affords a better understanding of the structural basis of antibody efficacy, which will expedite rational design of vaccines, prophylactics, and therapeutics. However, full utilization of the wealth of information from single cell techniques and antibody repertoire sequencing awaits the development of a high throughput, inexpensive method to map the conformational epitopes for antibody-antigen interactions. Here we show such an approach that combines comprehensive mutagenesis, cell surface display, and DNA deep sequencing. We develop analytical equations to identify epitope positions and show the method effectiveness by mapping the fine epitope for different antibodies targeting TNF, pertussis toxin, and the cancer target TROP2. In all three cases, the experimentally determined conformational epitope was consistent with previous experimental datasets, confirming the reliability of the experimental pipeline. Once the comprehensive library is generated, fine conformational epitope maps can be prepared at a rate of four per day. PMID:26296891
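
    The paper's analytical equations are not reproduced in the abstract. A common way to summarize such sorting experiments, assumed here for illustration, is a log2 enrichment of each variant's frequency in the antibody-binding (sorted) population relative to the unsorted library; strongly depleted substitutions point to candidate epitope positions. Variant names and counts are hypothetical.

```python
import math


def log2_enrichment(sorted_counts, reference_counts, pseudocount=0.5):
    """log2(frequency in sorted population / frequency in reference library)
    for every variant present in the reference; a pseudocount avoids log(0)."""
    total_sorted = sum(sorted_counts.values())
    total_ref = sum(reference_counts.values())
    scores = {}
    for variant, ref in reference_counts.items():
        f_sel = (sorted_counts.get(variant, 0) + pseudocount) / total_sorted
        f_ref = (ref + pseudocount) / total_ref
        scores[variant] = math.log2(f_sel / f_ref)
    return scores


# Hypothetical deep-sequencing read counts per point mutant.
reference = {"A10G": 100, "K45A": 100}
sorted_binders = {"A10G": 100, "K45A": 5}
print(log2_enrichment(sorted_binders, reference))
```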

  17. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs.

    PubMed

    Huang, Liang-Tsung; Wu, Chao-Chin; Lai, Lien-Fu; Li, Yun-Ju

    2015-01-01

    Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve the mapping, especially for short query sequences, through better use of shared memory. We implemented and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. We also analyzed performance for different numbers of threads and blocks. The results showed that, with a proper allocation of block and thread numbers, the proposed method significantly improves the performance of the Smith-Waterman algorithm on CUDA-enabled GPUs. PMID:26339591
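
    For reference, the recurrence that GPU implementations parallelize can be sketched on the CPU. This is a minimal score-only Smith-Waterman with a linear gap penalty and hypothetical match/mismatch/gap values; it is not the authors' shared-memory kernel or CUDASW++, and protein database searches typically use substitution matrices and affine gaps instead.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty.
    Keeps only two DP rows, so memory is O(len(b))."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: scores never drop below zero.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


print(smith_waterman_score("ACGT", "TACGTA"))
```

    Each cell depends only on its left, upper, and upper-left neighbors, which is why GPU versions process cells along anti-diagonals in parallel or assign whole database sequences to separate threads.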

  18. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs

    PubMed Central

    Huang, Liang-Tsung; Wu, Chao-Chin; Lai, Lien-Fu; Li, Yun-Ju

    2015-01-01

    Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve the mapping, especially for short query sequences, through better use of shared memory. We implemented and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. We also analyzed performance for different numbers of threads and blocks. The results showed that, with a proper allocation of block and thread numbers, the proposed method significantly improves the performance of the Smith-Waterman algorithm on CUDA-enabled GPUs. PMID:26339591

  19. Identification and mapping of paralogous genes on a known genomic DNA sequence.

    PubMed

    Bina, Minou

    2006-01-01

    The completion of whole-genome sequencing projects offers the opportunity to examine the organization of genes and to discover evolutionarily related genes in a given species. For beginners in the field, this chapter uses a specific example to provide a step-by-step procedure for identifying paralogous genes with the genome browser at UCSC (http://genome.ucsc.edu/). The example describes the identification and mapping of the paralogs of TCF12/HTF4 in the human genome, identifying TCF3 and TCF4 as paralogs of the TCF12/HTF4 gene. The example also identifies a related sequence, corresponding to a pseudogene, in one of the introns of the JAK2 gene. The procedure described should be applicable to the discovery and mapping of paralogous genes in the genomic DNA sequences available at the UCSC genome browser. PMID:16888348

  20. SNP identification from RNA sequencing and linkage map construction of rubber tree for anchoring the draft genome.

    PubMed

    Shearman, Jeremy R; Sangsrakru, Duangjai; Jomchai, Nukoon; Ruang-Areerate, Panthita; Sonthirod, Chutima; Naktang, Chaiwat; Theerawattanasuk, Kanikar; Tragoonrung, Somvong; Tangphatsornruang, Sithichoke

    2015-01-01

    Hevea brasiliensis, or rubber tree, is an important crop species that accounts for the majority of natural latex production. The rubber tree nuclear genome consists of 18 chromosomes and is roughly 2.15 Gb. The current rubber tree reference genome assembly consists of 1,150,326 scaffolds ranging from 200 to 531,465 bp and totalling 1.1 Gb. Only 143 scaffolds, totalling 7.6 Mb, have been placed into linkage groups. We have performed RNA-seq on 6 varieties of rubber tree to identify SNPs and InDels and used this information to perform target sequence enrichment and high throughput sequencing to genotype a set of SNPs in 149 rubber tree offspring from a cross between RRIM 600 and RRII 105 rubber tree varieties. We used this information to generate a linkage map allowing for the anchoring of 24,424 contigs from 3,009 scaffolds, totalling 115 Mb or 10.4% of the published sequence, into 18 linkage groups. Each linkage group contains between 319 and 1367 SNPs, or 60 to 194 non-redundant marker positions, and ranges from 156 to 336 cM in length. This linkage map includes 20,143 of the 69,300 predicted genes from rubber tree and will be useful for mapping studies and improving the reference genome assembly. PMID:25831195

  1. Fine-mapping diabetes-related traits, including insulin resistance, in heterogeneous stock rats.

    PubMed

    Solberg Woods, Leah C; Holl, Katie L; Oreper, Daniel; Xie, Yuying; Tsaih, Shirng-Wern; Valdar, William

    2012-11-01

    Type 2 diabetes (T2D) is a disease of relative insulin deficiency resulting from both insulin resistance and beta cell failure. We have previously used heterogeneous stock (HS) rats to fine-map a locus for glucose tolerance. We show here that glucose intolerance in the founder strains of the HS colony is mediated by different mechanisms: insulin resistance in WKY and an insulin secretion defect in ACI, and we demonstrate a high degree of variability for measures of insulin resistance and insulin secretion in HS rats. Our goal was therefore to use HS rats to fine-map several diabetes-related traits within a region on rat chromosome 1. We measured blood glucose and plasma insulin levels after a glucose tolerance test in 782 male HS rats. Using 97 SSLP markers, we genotyped a 68 Mb region on rat chromosome 1 previously implicated in glucose and insulin regulation. We used linkage disequilibrium mapping by mixed model regression with inferred descent to identify a region from 198.85 to 205.9 Mb that contains one or more quantitative trait loci (QTL) for fasting insulin and for a measure of insulin resistance, the quantitative insulin sensitivity check index. This region also encompasses loci identified for fasting glucose and Insulin_AUC (area under the curve). A separate <3 Mb QTL was identified for body weight. Using a novel penalized regression method, we then estimated the effects of alternative haplotype pairings under each locus. These studies highlight the utility of HS rats for fine-mapping genetic loci involved in the underlying causes of T2D. PMID:22947656
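
    QUICKI is conventionally defined as 1 / (log10 I0 + log10 G0), with fasting insulin I0 in µU/mL and fasting glucose G0 in mg/dL. The abstract does not state the units used in the rats, so the sketch below assumes these conventional ones; the input values are hypothetical.

```python
import math


def quicki(fasting_insulin_uU_per_ml, fasting_glucose_mg_per_dl):
    """QUICKI = 1 / (log10(I0) + log10(G0)); higher values mean greater
    insulin sensitivity. Assumes insulin in uU/mL and glucose in mg/dL."""
    if fasting_insulin_uU_per_ml <= 0 or fasting_glucose_mg_per_dl <= 0:
        raise ValueError("concentrations must be positive")
    denominator = math.log10(fasting_insulin_uU_per_ml) + math.log10(fasting_glucose_mg_per_dl)
    if denominator <= 0:
        raise ValueError("log-sum must be positive for the index to be meaningful")
    return 1.0 / denominator


# Hypothetical fasting values for one animal.
print(round(quicki(10.0, 100.0), 3))
```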

  2. Dual-pathway multi-echo sequence for simultaneous frequency and T2 mapping

    NASA Astrophysics Data System (ADS)

    Cheng, Cheng-Chieh; Mei, Chang-Sheng; Duryea, Jeffrey; Chung, Hsiao-Wen; Chao, Tzu-Cheng; Panych, Lawrence P.; Madore, Bruno

    2016-04-01

    Purpose: To present a dual-pathway multi-echo steady-state sequence and reconstruction algorithm to capture T2, T2∗ and field map information. Methods: Typically, pulse sequences based on spin echoes are needed for T2 mapping, while gradient echoes are needed for field mapping, making it difficult to acquire both types of information jointly. A dual-pathway multi-echo pulse sequence is employed here to generate T2 and field maps from the same acquired data. The approach might be used, for example, to obtain both thermometry and tissue damage information during thermal therapies, to obtain susceptibility and T2 information from the same head scan, or to generate bonus T2 maps during a knee scan. Results: Quantitative T2, T2∗ and field maps were generated in gel phantoms, ex vivo bovine muscle, and twelve volunteers. T2 results were validated against a spin-echo reference standard: a linear regression based on ROI analysis in phantoms showed close agreement (slope/R2 = 0.99/0.998). A pixel-wise in vivo Bland-Altman analysis of R2 = 1/T2 showed a bias of 0.034 Hz (about 0.3%), averaged over four volunteers. Ex vivo results, with and without motion, suggested that tissue damage detection based on T2 rather than on temperature-dose measurements might prove more robust to motion. Conclusion: T2, T2∗ and field maps were obtained simultaneously, from the same datasets, in thermometry, susceptibility-weighted imaging, and knee-imaging contexts.
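
    Bland-Altman agreement is summarized by the bias (mean paired difference) and the 95% limits of agreement, bias ± 1.96 SD of the differences. A minimal sketch, assuming paired T2 values in milliseconds converted to R2 = 1/T2 in Hz (s⁻¹), follows; the T2 values are hypothetical, not the study's data.

```python
import statistics


def r2_hz_from_t2_ms(t2_ms):
    """Relaxation rate R2 = 1/T2, converting T2 from milliseconds to seconds."""
    return 1000.0 / t2_ms


def bland_altman(method_a, method_b):
    """Return (bias, lower, upper): the mean difference and the 95% limits of
    agreement, bias +/- 1.96 * sample SD of the pairwise differences."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    half_width = 1.96 * statistics.stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical per-pixel T2 values (ms) from a test method and a spin-echo reference.
test_r2 = [r2_hz_from_t2_ms(t) for t in (80.0, 95.0, 110.0, 70.0)]
ref_r2 = [r2_hz_from_t2_ms(t) for t in (81.0, 94.0, 112.0, 71.0)]
print(bland_altman(test_r2, ref_r2))
```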

  3. Identification and Mapping of the Edwards Stratigraphic Sequence in the State of Chihuahua Assisted by ten ArcMap Based Layers

    NASA Astrophysics Data System (ADS)

    Martinez-Pina, C.; Granados, A.; Goodell, P.

    2007-05-01

    The Edwards Formation is a reef limestone that hosts one of the largest aquifers in the State of Texas. In 2004 the United States and Mexico signed an agreement intended to characterize and identify shared binational groundwater resources. Texas Water Development Board Report 360 established for the Edwards Aquifer an area of more than 31,000 km², half of which is in the State of Coahuila, Mexico (the agreement did not include the State of Chihuahua). This led to the idea that Chihuahua may also have hydrologic potential in the Edwards equivalent, where numerous large cavern systems are already recognized (Naica's Sword Cavern, and the Coyame, Nombre de Dios and Bocagrande Caverns). The objective of this study is to establish whether the stratigraphic sequence exists in the State of Chihuahua, together with geohydrologic features such as faults, sinkholes, and springs within the Edwards equivalent. The Consejo de Recursos Minerales geologic map, INEGI's hydrologic study, petroleum, mining and hydrogeology studies of Chihuahua, and many others, constitute the database used. ArcMap is used to define the geologic framework and construct different thematic layers (structural, lithological, hydrological) that would aid in the identification of the stratigraphic sequence. The results show that the entire Edwards Stratigraphic Sequence (ESS) is present in Chihuahua and that there are isolated areas of groundwater production in eastern Chihuahua, possibly from the ESS, although this is not well established. Overall, the ESS presents an unusual opportunity as a potentially productive aquifer in the State of Chihuahua.

  4. Phase-corrected Bipolar Gradients in Multiecho Gradient-echo Sequences for Quantitative Susceptibility Mapping

    PubMed Central

    Li, Jianqi; Chang, Shixin; Liu, Tian; Jiang, Hongwei; Dong, Fang; Pei, Mengchao; Wang, Qianfeng; Wang, Yi

    2016-01-01

    Object: The large echo spacing of unipolar readout gradients in current multiecho gradient-echo sequences for mapping fields in quantitative susceptibility mapping (QSM) can be reduced using bipolar readout gradients to improve acquisition efficiency. Materials and Methods: Phase discrepancies between odd and even echoes in the bipolar readout gradients caused by non-ideal gradient behaviors were measured, modeled as polynomials in space and corrected for accordingly in field mapping. The bipolar approach for multiecho gradient-echo field mapping was compared with the unipolar approach for QSM. Results: The odd-even-echo phase discrepancies were approximately constant along the phase encoding direction and linear along the readout and slice-selection directions. A simple linear phase correction in all three spatial directions was shown to enable accurate QSM in the human brain using a bipolar multiecho GRE sequence. Bipolar multiecho acquisition provides QSM in good quantitative agreement with unipolar acquisition while also reducing noise. Conclusion: With a linear phase correction between odd-even echoes, bipolar readout gradients can be used in multiecho gradient-echo sequences for QSM. PMID:25408108
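
    The correction described above fits a linear phase model to the odd-even echo discrepancy and removes it. A one-dimensional sketch along the readout direction only, assuming the discrepancy has already been isolated (for example from a calibration in which odd and even echoes should otherwise share the same phase) and is small enough not to wrap; the function names are hypothetical and the paper's three-direction fit is not reproduced.

```python
import cmath


def phase_discrepancy(odd_echo, even_echo):
    """Wrapped phase (radians) of even * conj(odd) at each readout position."""
    return [cmath.phase(e * o.conjugate()) for o, e in zip(odd_echo, even_echo)]


def fit_linear_phase(positions, phases):
    """Ordinary least-squares fit phases ~ c0 + c1 * position; returns (c0, c1)."""
    n = len(positions)
    mean_x = sum(positions) / n
    mean_y = sum(phases) / n
    sxx = sum((x - mean_x) ** 2 for x in positions)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(positions, phases))
    c1 = sxy / sxx
    return mean_y - c1 * mean_x, c1


def correct_even_echo(even_echo, positions, c0, c1):
    """Remove the fitted linear phase c0 + c1 * x from an even-echo signal."""
    return [s * cmath.exp(-1j * (c0 + c1 * x)) for s, x in zip(even_echo, positions)]


# Simulated readout line: the even echo carries a linear phase error of 0.2 + 0.01 * x.
pos = list(range(-32, 32))
odd = [1 + 0j for _ in pos]
even = [cmath.exp(1j * (0.2 + 0.01 * x)) for x in pos]
c0, c1 = fit_linear_phase(pos, phase_discrepancy(odd, even))
corrected = correct_even_echo(even, pos, c0, c1)
```

    Extending this to the abstract's setting would mean fitting a plane in the readout and slice-selection coordinates (with a constant term along phase encoding) instead of a line.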

  5. Mapping wide row crops with video sequences acquired from a tractor moving at treatment speed.

    PubMed

    Sainz-Costa, Nadir; Ribeiro, Angela; Burgos-Artizzu, Xavier P; Guijarro, María; Pajares, Gonzalo

    2011-01-01

    This paper presents a mapping method for wide row crop fields. The resulting map shows the crop rows and weeds present in the inter-row spacing. Because field videos are acquired with a camera mounted on top of an agricultural vehicle, a method for image sequence stabilization was needed and consequently designed and developed. The proposed stabilization method uses the centers of some crop rows in the image sequence as features to be tracked, which compensates for the lateral movement (sway) of the camera and leaves the pitch unchanged. A region of interest is selected using the tracked features, and an inverse perspective technique transforms the selected region into a bird's-eye view that is centered on the image and that enables map generation. The algorithm developed has been tested on several video sequences of different fields recorded at different times and under different lighting conditions, with good initial results. Indeed, lateral displacements of up to 66% of the inter-row spacing were suppressed through the stabilization process, and crop rows in the resulting maps appear straight. PMID:22164003

  6. Mapping Wide Row Crops with Video Sequences Acquired from a Tractor Moving at Treatment Speed

    PubMed Central

    Sainz-Costa, Nadir; Ribeiro, Angela; Burgos-Artizzu, Xavier P.; Guijarro, María; Pajares, Gonzalo

    2011-01-01

    This paper presents a mapping method for wide row crop fields. The resulting map shows the crop rows and weeds present in the inter-row spacing. Because field videos are acquired with a camera mounted on top of an agricultural vehicle, a method for image sequence stabilization was needed and consequently designed and developed. The proposed stabilization method uses the centers of some crop rows in the image sequence as features to be tracked, which compensates for the lateral movement (sway) of the camera and leaves the pitch unchanged. A region of interest is selected using the tracked features, and an inverse perspective technique transforms the selected region into a bird’s-eye view that is centered on the image and that enables map generation. The algorithm developed has been tested on several video sequences of different fields recorded at different times and under different lighting conditions, with good initial results. Indeed, lateral displacements of up to 66% of the inter-row spacing were suppressed through the stabilization process, and crop rows in the resulting maps appear straight. PMID:22164003
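
    The stabilization step above tracks crop-row centers and compensates only the lateral sway of the camera. A minimal sketch, assuming row centers are already tracked in pixel coordinates and that sway is a pure horizontal translation estimated as the median displacement of those centers; the estimator and function names are illustrative assumptions, and the subsequent inverse perspective transform is not shown.

```python
import statistics


def lateral_offset(reference_centers, current_centers):
    """Median horizontal displacement (pixels) of tracked crop-row centers."""
    return statistics.median([c - r for r, c in zip(reference_centers, current_centers)])


def shift_row(row, offset, fill=0):
    """Shift one image row's content left by `offset` pixels (right if negative)."""
    width = len(row)
    return [row[x + offset] if 0 <= x + offset < width else fill for x in range(width)]


def stabilize_frame(frame, reference_centers, current_centers, fill=0):
    """Undo lateral sway in a frame given as a list of pixel rows; pitch is left unchanged."""
    offset = round(lateral_offset(reference_centers, current_centers))
    return [shift_row(row, offset, fill) for row in frame]


# Rows drifted about 5 px to the right relative to the reference frame.
print(lateral_offset([100, 200, 300], [105, 205, 303]))  # 5
```

    The median is used here so that a single mistracked row center does not dominate the shift estimate.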

  7. Nested Association Mapping of Stem Rust Resistance in Wheat Using Genotyping by Sequencing

    PubMed Central

    Rouse, Matthew N.; Tsilo, Toi J.; Macharia, Godwin K.; Bhavani, Sridhar; Jin, Yue; Anderson, James A.

    2016-01-01

    We combined the recently developed genotyping by sequencing (GBS) method with joint mapping (also known as nested association mapping) to dissect and understand the genetic architecture controlling stem rust resistance in wheat (Triticum aestivum). Ten stem rust resistant wheat varieties were crossed to the susceptible line LMPG-6 to generate F6 recombinant inbred lines. The recombinant inbred line populations were phenotyped in Kenya, South Africa, and St. Paul, Minnesota, USA. By joint mapping of the 10 populations, we identified 59 minor and medium-effect QTL (explained phenotypic variance range of 1% – 20%) on 20 chromosomes that contributed towards adult plant resistance to North American Pgt races as well as the highly virulent Ug99 race group. Fifteen of the 59 QTL were detected in multiple environments. No epistatic relationship was detected among the QTL. While these numerous small- to medium-effect QTL are shared among the families, the founder parents were found to have different allelic effects for the QTL. Fourteen QTL identified by joint mapping were also detected in single-population mapping. As these QTL were mapped using SNP markers with known locations on the physical chromosomes, the genomic regions identified with QTL could be explored more in depth to discover candidate genes for stem rust resistance. The use of GBS-derived de novo SNPs in mapping resistance to stem rust shown in this study could be used as a model to conduct similar marker-trait association studies in other plant species. PMID:27186883

  8. Genome Assembly Improvement and Mapping Convergently Evolved Skeletal Traits in Sticklebacks with Genotyping-by-Sequencing

    PubMed Central

    Glazer, Andrew M.; Killingbeck, Emily E.; Mitros, Therese; Rokhsar, Daniel S.; Miller, Craig T.

    2015-01-01

    Marine populations of the threespine stickleback (Gasterosteus aculeatus) have repeatedly colonized and rapidly adapted to freshwater habitats, providing a powerful system to map the genetic architecture of evolved traits. Here, we developed and applied a binned genotyping-by-sequencing (GBS) method to build dense genome-wide linkage maps of sticklebacks using two large marine by freshwater F2 crosses of more than 350 fish each. The resulting linkage maps significantly improve the genome assembly by anchoring 78 new scaffolds to chromosomes, reorienting 40 scaffolds, and rearranging scaffolds in 4 locations. In the revised genome assembly, 94.6% of the assembly was anchored to a chromosome. To assess linkage map quality, we mapped quantitative trait loci (QTL) controlling lateral plate number, which mapped as expected to a 200-kb genomic region containing Ectodysplasin, as well as a chromosome 7 QTL overlapping a previously identified modifier QTL. Finally, we mapped eight QTL controlling convergently evolved reductions in gill raker length in the two crosses, which revealed that this classic adaptive trait has a surprisingly modular and nonparallel genetic basis. PMID:26044731

  9. High-resolution genetic mapping of maize pan-genome sequence anchors.

    PubMed

    Lu, Fei; Romay, Maria C; Glaubitz, Jeffrey C; Bradbury, Peter J; Elshire, Robert J; Wang, Tianyu; Li, Yu; Li, Yongxiang; Semagn, Kassa; Zhang, Xuecai; Hernandez, Alvaro G; Mikel, Mark A; Soifer, Ilya; Barad, Omer; Buckler, Edward S

    2015-01-01

    In addition to single-nucleotide polymorphisms, structural variation is abundant in many plant genomes. The structural variation across a species can be represented by a 'pan-genome', which is essential to fully understand the genetic control of phenotypes. However, the pan-genome's complexity hinders its accurate assembly via sequence alignment. Here we demonstrate an approach to facilitate pan-genome construction in maize. By performing 18 trillion association tests we map 26 million tags generated by reduced representation sequencing of 14,129 maize inbred lines. Using machine-learning models we select 4.4 million accurately mapped tags as sequence anchors, 1.1 million of which are presence/absence variations. Structural variations exhibit enriched association with phenotypic traits, indicating that structural variation is a significant source of adaptive variation in maize. The ability to efficiently map ultrahigh-density pan-genome sequence anchors enables fine characterization of structural variation and will advance both genetic research and breeding in many crops. PMID:25881062

  10. Recombination mapping using Boolean logic and high-density SNP genotyping for exome sequence filtering

    PubMed Central

    Markello, Thomas C.; Han, Ted; Carlson-Donohoe, Hannah; Ahaghotu, Chidi; Harper, Ursula; Jones, MaryPat; Chandrasekharappa, Settara; Anikster, Yair; Adams, David R.; Gahl, William A.; Boerkoel, Cornelius F.

    2012-01-01

    Whole-genome sequence data have been shown to provide sufficient information to resolve detailed haplotypes in small pedigrees. Using such information, recombinations can be mapped onto chromosomes, compared with the segregation of a disease of interest, and used to filter genome sequence variants. We now show that relatively inexpensive SNP array data from small pedigrees can be used in a similar manner to identify regions of interest in exome sequencing projects. We demonstrate that in those situations where one can assume complete penetrance and parental DNA is available, SNP recombination mapping using Boolean logic identifies chromosomal regions identical to those detected by multipoint linkage using microsatellites but with much less computation. We further show that this approach is successful because the probability of a double crossover between informative SNP loci is negligible. Our observations provide a rationale for using SNP arrays and recombination mapping as a rapid and cost-effective means of incorporating chromosome segregation information into exome sequencing projects intended for disease-gene identification. PMID:22264778
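
    The abstract does not spell out its Boolean rules. A minimal sketch of the idea under an assumed fully penetrant recessive model: each child's inheritance at every informative SNP is encoded as which paternal and which maternal haplotype (0 or 1) it received, a SNP passes only if all affected children share the same pair AND no unaffected child carries that pair, and passing runs are collapsed into candidate intervals. The encoding and names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: for each child, one (paternal_hap, maternal_hap) pair per
# informative SNP, where 0/1 says which parental haplotype the child inherited.
Inheritance = List[Tuple[int, int]]


def candidate_mask(inheritance: Dict[str, Inheritance],
                   affected: Sequence[str],
                   unaffected: Sequence[str]) -> List[bool]:
    """Boolean filter for a fully penetrant recessive locus (an assumed model)."""
    n_snps = len(inheritance[affected[0]])
    mask = []
    for i in range(n_snps):
        # One distinct pair means same paternal AND same maternal haplotype.
        pairs = {inheritance[child][i] for child in affected}
        passes = len(pairs) == 1
        if passes:
            shared_pair = next(iter(pairs))
            passes = all(inheritance[child][i] != shared_pair for child in unaffected)
        mask.append(passes)
    return mask


def mask_to_intervals(positions: Sequence[int], mask: Sequence[bool]) -> List[Tuple[int, int]]:
    """Collapse runs of passing SNPs into (first_position, last_position) intervals."""
    intervals = []
    start = end = None
    for pos, keep in zip(positions, mask):
        if keep:
            if start is None:
                start = pos
            end = pos
        elif start is not None:
            intervals.append((start, end))
            start = end = None
    if start is not None:
        intervals.append((start, end))
    return intervals


positions = [100, 200, 300, 400, 500]
inheritance = {
    "affected_1": [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1)],
    "affected_2": [(0, 0), (0, 0), (0, 1), (1, 1), (0, 1)],
    "unaffected": [(1, 0), (0, 0), (1, 1), (0, 0), (0, 0)],
}
mask = candidate_mask(inheritance, ["affected_1", "affected_2"], ["unaffected"])
print(mask_to_intervals(positions, mask))  # [(100, 100), (300, 400)]
```

    The reported intervals end at the last passing SNP; the true recombination breakpoints lie somewhere between that SNP and the next informative one.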

  11. TCGA's Pan-Cancer Efforts and Expansion to Include Whole Genome Sequence - TCGA

    Cancer.gov

    Carolyn Hutter, Ph.D., Program Director of NHGRI's Division of Genomic Medicine, discusses the expansion of TCGA's Pan-Cancer efforts to include the Pan-Cancer Analysis of Whole Genomes (PAWG) project.

  12. A framework radiation hybrid map of buffalo chromosome 1 ordering scaffolds from buffalo genome sequence assembly.

    PubMed

    Stafuzza, N B; Naressi, B C M; Yang, E; Cai, J J; Amaral-Trusty, M E J

    2015-01-01

    River buffalo chromosome 1 (BBU1) is a sub-metacentric chromosome homologous to bovine chromosomes 1 and 27. In this study, we constructed a new framework radiation hybrid (RH) map of BBU1 using the BBURH5000 panel, adding nine new genes (ADRB3, ATP2C1, COPB2, CRYGS, P2RY1, SLC5A3, SLC20A2, SST, and ZDHHC2) and one microsatellite (CSSM043) to the set of markers previously mapped on BBU1. The new framework RH map of BBU1 contained 141 markers (55 genes, 2 ESTs, 10 microsatellites, and 74 SNPs) distributed within one linkage group spanning 2832.62 centirays. Comparison of the RH map to sequences from bovine chromosomes 1 and 27 revealed an inversion close to the telomeric region. In addition, we ordered a set of 34 scaffolds from the buffalo genome assembly UMD_CASPUR_WB_2.0. The RH map could provide a valuable tool to order scaffolds from the buffalo genome sequence, contributing to its annotation. PMID:26535622

  13. Comparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing.

    PubMed

    Kunde-Ramamoorthy, Govindarajan; Coarfa, Cristian; Laritsky, Eleonora; Kessler, Noah J; Harris, R Alan; Xu, Mingchu; Chen, Rui; Shen, Lanlan; Milosavljevic, Aleksandar; Waterland, Robert A

    2014-04-01

    Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitative accuracy has been reported. We sequenced bisulfite-converted DNA from two tissues from each of two healthy human adults and systematically compared five widely used Bisulfite-seq mapping algorithms: Bismark, BSMAP, Pash, BatMeth and BS Seeker. We evaluated their computational speed and genomic coverage and verified their percentage methylation estimates. With the exception of BatMeth, all mappers covered >70% of CpG sites genome-wide and yielded highly concordant estimates of percentage methylation (r² ≥ 0.95). Fourfold variation in mapping time was found between BSMAP (fastest) and Pash (slowest). In each library, 8-12% of genomic regions covered by Bismark and Pash were not covered by BSMAP. An experiment using simulated reads confirmed that Pash has an exceptional ability to uniquely map reads in genomic regions of structural variation. Independent verification by bisulfite pyrosequencing generally confirmed the percentage methylation estimates by the mappers. Of these algorithms, Bismark provides an attractive combination of processing speed, genomic coverage and quantitative accuracy, whereas Pash offers considerably higher genomic coverage. PMID:24391148

  14. Comparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing

    PubMed Central

    Kunde-Ramamoorthy, Govindarajan; Coarfa, Cristian; Laritsky, Eleonora; Kessler, Noah J.; Harris, R. Alan; Xu, Mingchu; Chen, Rui; Shen, Lanlan; Milosavljevic, Aleksandar; Waterland, Robert A.

    2014-01-01

    Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitative accuracy has been reported. We sequenced bisulfite-converted DNA from two tissues from each of two healthy human adults and systematically compared five widely used Bisulfite-seq mapping algorithms: Bismark, BSMAP, Pash, BatMeth and BS Seeker. We evaluated their computational speed and genomic coverage and verified their percentage methylation estimates. With the exception of BatMeth, all mappers covered >70% of CpG sites genome-wide and yielded highly concordant estimates of percentage methylation (r² ≥ 0.95). Fourfold variation in mapping time was found between BSMAP (fastest) and Pash (slowest). In each library, 8–12% of genomic regions covered by Bismark and Pash were not covered by BSMAP. An experiment using simulated reads confirmed that Pash has an exceptional ability to uniquely map reads in genomic regions of structural variation. Independent verification by bisulfite pyrosequencing generally confirmed the percentage methylation estimates by the mappers. Of these algorithms, Bismark provides an attractive combination of processing speed, genomic coverage and quantitative accuracy, whereas Pash offers considerably higher genomic coverage. PMID:24391148
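
    The comparison above rests on per-CpG percentage methylation estimates and their concordance (r²) across mappers. A simplified sketch of both quantities, assuming forward-strand reads only (after bisulfite conversion an unmethylated C reads as T while a methylated C stays C), no base-quality filtering, and r² taken as the squared Pearson correlation; this is not how Bismark or the other tools are implemented internally.

```python
def percent_methylation(bases_at_cpg: str, min_depth: int = 1):
    """Percent methylation at one reference CpG cytosine from forward-strand read bases.

    Counts C (methylated, protected from conversion) versus T (unmethylated, converted);
    other bases are ignored. Returns None when C+T coverage is below min_depth.
    """
    c = bases_at_cpg.count("C")
    t = bases_at_cpg.count("T")
    depth = c + t
    if depth == 0 or depth < min_depth:
        return None
    return 100.0 * c / depth


def r_squared(x, y):
    """Squared Pearson correlation between two mappers' methylation estimates."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


print(percent_methylation("CCCT"))  # 75.0
```

    Reverse-strand reads would be handled analogously by counting G versus A at the complementary position.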

  15. A high-density linkage map for Astyanax mexicanus using genotyping-by-sequencing technology.

    PubMed

    Carlson, Brian M; Onusko, Samuel W; Gross, Joshua B

    2015-02-01

    The Mexican tetra, Astyanax mexicanus, is a unique model system consisting of cave-adapted and surface-dwelling morphotypes that diverged >1 million years (My) ago. This remarkable natural experiment has enabled powerful genetic analyses of cave adaptation. Here, we describe the application of next-generation sequencing technology to the creation of a high-density linkage map. Our map comprises more than 2200 markers populating 25 linkage groups constructed from genotypic data generated from a single genotyping-by-sequencing project. We leveraged emergent genomic and transcriptomic resources to anchor hundreds of anonymous Astyanax markers to the genome of the zebrafish (Danio rerio), the most closely related model organism to our study species. This facilitated the identification of 784 distinct connections between our linkage map and the Danio rerio genome, highlighting several regions of conserved genomic architecture between the two species despite ~150 My of divergence. Using a Mendelian cave-associated trait as a proof-of-principle, we successfully recovered the genomic position of the albinism locus near the gene Oca2. Further, our map successfully informed the positions of unplaced Astyanax genomic scaffolds within particular linkage groups. This ability to identify the relative location, orientation, and linear order of unaligned genomic scaffolds will facilitate ongoing efforts to improve on the current early draft and assemble future versions of the Astyanax physical genome. Moreover, this improved linkage map will enable higher-resolution genetic analyses and catalyze the discovery of the genetic basis for cave-associated phenotypes. PMID:25520037

  16. Geologic map of the southern Funeral Mountains including nearby groundwater discharge sites in Death Valley National Park, California and Nevada

    USGS Publications Warehouse

    Fridrich, C.J.; Thompson, R.A.; Slate, J.L.; Berry, M.E.; Machette, M.N.

    2012-01-01

    This 1:50,000-scale geologic map covers the southern part of the Funeral Mountains, and adjoining parts of four structural basins—Furnace Creek, Amargosa Valley, Opera House, and central Death Valley—in California and Nevada. It extends over three full 7.5-minute quadrangles, and parts of eleven others—an area of about 1,000 square kilometers (km2). The boundaries of this map were drawn to include all of the known proximal hydrogeologic features that may affect the flow of groundwater that discharges from springs of the Furnace Creek basin, in the west-central part of the map. These springs provide the main potable water supply for Death Valley National Park. Major hydrogeologic features shown on this map include: (1) springs of the Furnace Creek basin, (2) a large Pleistocene groundwater discharge mound in the northeastern part of the map, (3) the exposed extent of limestones and dolomites that constitute the Paleozoic carbonate aquifer, and (4) the exposed extent of the alluvial conglomerates that constitute the Funeral Formation aquifer.

  17. The reduced mycorrhizal colonisation (rmc) mutation of tomato disrupts five gene sequences including the CYCLOPS/IPD3 homologue.

    PubMed

    Larkan, Nicholas J; Ruzicka, Dan R; Edmonds-Tibbett, Tamara; Durkin, Jonathan M H; Jackson, Louise E; Smith, F Andrew; Schachtman, Daniel P; Smith, Sally E; Barker, Susan J

    2013-10-01

    Arbuscular mycorrhizal (AM) symbiosis in vascular plant roots is an ancient mutualistic interaction that evolved with land plants. More recently evolved root mutualisms have recruited components of the AM signalling pathway as identified with molecular approaches in model legume research. Earlier we reported that the reduced mycorrhizal colonisation (rmc) mutation of tomato mapped to chromosome 8. Here we report additional functional characterisation of the rmc mutation using genotype grafts and proteomic and transcriptomic analyses. Our results led to identification of the precise genome location of the Rmc locus from which we identified the mutation by sequencing. The rmc phenotype results from a deletion that disrupts five predicted gene sequences, one of which has close sequence match to the CYCLOPS/IPD3 gene identified in legumes as an essential intracellular regulator of both AM and rhizobial symbioses. Identification of two other genes not located at the rmc locus but with altered expression in the rmc genotype is also described. Possible roles of the other four disrupted genes in the deleted region are discussed. Our results support the identification of CYCLOPS/IPD3 in legumes and rice as a key gene required for AM symbiosis. The extensive characterisation of rmc in comparison with its 'parent' 76R, which has a normal mycorrhizal phenotype, has validated these lines as an important comparative model for glasshouse and field studies of AM and non-mycorrhizal plants with respect to plant competition and microbial interactions with vascular plant roots. PMID:23572326

  18. A Guide to Films, Filmstrips, Maps and Globes, Records on Asia. [and] Supplement, Including a New Section on Slides.

    ERIC Educational Resources Information Center

    Bell, Violet M., Comp.; And Others

    This third edition bibliography identifies and annotates selected films, filmstrips, maps and globes, and records which will contribute to increased knowledge and understanding of Asian peoples and cultures. (Asia is defined as including all countries from Afghanistan to Japan). A separate supplement, designed to be used with the third edition,…

  19. A new technique for selective identification and mapping of enhancers within long genomic sequences.

    PubMed

    Chernov, Igor; Stukacheva, Elena; Akopov, Sergey; Didych, Dmitry; Nikolaev, Lev; Sverdlov, Eugene

    2008-05-01

    We report a new experimental method of direct selection, identification, and mapping of potential enhancer sequences within extended stretches of genomic DNA. The method allows many sequences to be cloned simultaneously instead of tediously screening them one by one, thus providing a robust and high-throughput approach to the mapping of enhancers. The selection procedure is based on the ability of such sequences to activate a minimal promoter that drives expression of a selectable gene. To this end, a mixture of short DNA fragments derived from the segment of interest was cloned in a retroviral vector containing the neomycin phosphotransferase II gene under control of a cytomegalovirus (CMV) minimal promoter. The pool of retroviruses obtained was used to infect HeLa cells and then to select neomycin-resistant colonies containing constructs with enhancer-like sequences. The pool of the genomic fragments was rescued by PCR and cloned, forming a library of the potential enhancers. Fifteen enhancer-like fragments were selected from a 1-Mb human genomic locus, and the enhancer activity of 13 of them was verified in a transient transfection reporter gene assay. The selected sequences were found to be predominantly located near the 5' regions of genes or within introns. PMID:18476831

  20. Genome Sequencing of Four Strains of Rickettsia prowazekii, the Causative Agent of Epidemic Typhus, Including One Flying Squirrel Isolate.

    PubMed

    Bishop-Lilly, Kimberly A; Ge, Hong; Butani, Amy; Osborne, Brian; Verratti, Kathleen; Mokashi, Vishwesh; Nagarajan, Niranjan; Pop, Mihai; Read, Timothy D; Richards, Allen L

    2013-01-01

    Rickettsia prowazekii is a notable intracellular pathogen, the agent of epidemic typhus, and a potential biothreat agent. We present here whole-genome sequence data for four strains of R. prowazekii, including one from a flying squirrel. PMID:23814035

  1. A genetic map of melon highly enriched with fruit quality QTLs and EST markers, including sugar and carotenoid metabolism genes.

    PubMed

    Harel-Beja, R; Tzuri, G; Portnoy, V; Lotan-Pompan, M; Lev, S; Cohen, S; Dai, N; Yeselson, L; Meir, A; Libhaber, S E; Avisar, E; Melame, T; van Koert, P; Verbakel, H; Hofstede, R; Volpin, H; Oliver, M; Fougedoire, A; Stalh, C; Fauve, J; Copes, B; Fei, Z; Giovannoni, J; Ori, N; Lewinsohn, E; Sherman, A; Burger, J; Tadmor, Y; Schaffer, A A; Katzir, N

    2010-08-01

    A genetic map of melon enriched for fruit traits was constructed, using a recombinant inbred (RI) population developed from a cross between representatives of the two subspecies of Cucumis melo L.: PI 414723 (subspecies agrestis) and 'Dulce' (subspecies melo). Phenotyping of 99 RI lines was conducted over three seasons in two locations in Israel and the US. The map includes 668 DNA markers (386 SSRs, 76 SNPs, six INDELs and 200 AFLPs), of which 160 were newly developed from fruit ESTs. These ESTs include candidate genes encoding for enzymes of sugar and carotenoid metabolic pathways that were cloned from melon cDNA or identified through mining of the International Cucurbit Genomics Initiative database (http://www.icugi.org/). The map covers 1,222 cM with an average of 2.672 cM between markers. In addition, a skeleton physical map was initiated and 29 melon BACs harboring fruit ESTs were localized to the 12 linkage groups of the map. Altogether, 44 fruit QTLs were identified: 25 confirming QTLs described using other populations and 19 newly described QTLs. The map includes QTLs for fruit sugar content, particularly sucrose, the major sugar affecting sweetness in melon fruit. Six QTLs interacting in an additive manner account for nearly all the difference in sugar content between the two genotypes. Three QTLs for fruit flesh color and carotenoid content were identified. Interestingly, no clear colocalization of QTLs for either sugar or carotenoid content was observed with over 40 genes encoding for enzymes involved in their metabolism. The RI population described here provides a useful resource for further genomics and metabolomics studies in melon, as well as useful markers for breeding for fruit quality. PMID:20401460

  2. Heterozygous Mapping Strategy (HetMappS) for High Resolution Genotyping-By-Sequencing Markers: A Case Study in Grapevine

    PubMed Central

    Wang, Minghui; Londo, Jason P.; Acharya, Charlotte B.; Mitchell, Sharon E.; Sun, Qi; Reisch, Bruce; Cadle-Davidson, Lance

    2015-01-01

    Genotyping by sequencing (GBS) provides opportunities to generate high-resolution genetic maps at a low genotyping cost, but for highly heterozygous species, missing data and heterozygote undercalling complicate the creation of GBS genetic maps. To overcome these issues, we developed a publicly available, modular approach called HetMappS, which functions independently of parental genotypes and corrects for genotyping errors associated with heterozygosity. For linkage group formation, HetMappS includes both a reference-guided synteny pipeline and a reference-independent de novo pipeline. The de novo pipeline can be utilized for under-characterized or high diversity families that lack an appropriate reference. We applied both HetMappS pipelines in five half-sib F1 families involving genetically diverse Vitis spp. Starting with at least 116,466 putative SNPs per family, the HetMappS pipelines identified 10,440 to 17,267 phased pseudo-testcross (Pt) markers and generated high-confidence maps. Pt marker density exceeded crossover resolution in all cases; up to 5,560 non-redundant markers were used to generate parental maps ranging from 1,047 cM to 1,696 cM. The number of markers used was strongly correlated with family size in both de novo and synteny maps (r = 0.92 and 0.91, respectively). Comparisons between allele and tag frequencies suggested that many markers were in tandem repeats and mapped as single loci, while markers in regions of more than two repeats were removed during map curation. Both pipelines generated similar genetic maps, and genetic order was strongly correlated with the reference genome physical order in all cases. Independently created genetic maps from shared parents exhibited nearly identical results. Flower sex was mapped in three families and correctly localized to the known sex locus in all cases. The HetMappS pipeline could have wide application for genetic mapping in highly heterozygous species, and its modularity provides opportunities to…

  3. Using genic sequence capture in combination with a syntenic pseudo genome to map a deletion mutant in a wheat species.

    PubMed

    Gardiner, Laura-Jayne; Gawroński, Piotr; Olohan, Lisa; Schnurbusch, Thorsten; Hall, Neil; Hall, Anthony

    2014-12-01

    Mapping-by-sequencing analyses have largely required a complete reference sequence and employed whole genome re-sequencing. In species such as wheat, no finished genome reference sequence is available. Additionally, because of its large genome size (17 Gb), re-sequencing at sufficient depth of coverage is not practical. Here, we extend the utility of mapping by sequencing, developing a bespoke pipeline and algorithm to map an early-flowering locus in einkorn wheat (Triticum monococcum L.), which is closely related to the A-genome progenitor of bread wheat. We have developed a genomic enrichment approach using the gene-rich regions of hexaploid bread wheat to design a 110-Mbp NimbleGen SeqCap EZ in solution capture probe set, representing the majority of genes in wheat. Here, we use the capture probe set to enrich and sequence an F2 mapping population of the mutant. The mutant locus was identified in T. monococcum, which lacks a complete genome reference sequence, by mapping the enriched data set onto pseudo-chromosomes derived from the capture probe target sequence, with a long-range order of genes based on synteny of wheat with Brachypodium distachyon. Using this approach we are able to map the region and identify a set of deleted genes within the interval. PMID:25205592

  4. High-throughput sequencing for 1-methyladenosine (m¹A) mapping in RNA.

    PubMed

    Tserovski, Lyudmil; Marchand, Virginie; Hauenschild, Ralf; Blanloeil-Oillo, Florence; Helm, Mark; Motorin, Yuri

    2016-09-01

    Detection and mapping of modified nucleotides in RNAs is a difficult and laborious task. Several physico-chemical approaches based on differential properties of modified nucleotides can be used; however, most of these methods do not allow high-throughput analysis. Here we describe in detail a method for mapping the relatively common 1-methyladenosine (m¹A) residues using high-throughput next-generation sequencing (NGS). Since m¹A residues block primer extension during reverse transcription (RT), the accumulation of abortive products as well as nucleotide misincorporation can be detected in the sequencing data. The described library preparation protocol captures both types of cDNA products, which are essential for further bioinformatic analysis. We demonstrate that m¹A residues produce characteristic arrest and mismatch rates, and that the combination of both can be used for their detection as well as for discrimination of m¹A from other modified A residues present in RNAs. PMID:26922842
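
    The detection principle above combines an RT arrest rate with a misincorporation (mismatch) rate at each site. A minimal sketch under assumed definitions: arrest rate = stops / (stops + read-through), mismatch rate = mismatching read-through reads / read-through, and a site is flagged only when both exceed thresholds. The data structure and the threshold values are hypothetical, not the published protocol's parameters.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    stops: int         # reads whose cDNA terminates at this site (RT arrest)
    read_through: int  # reads whose cDNA extends across this site
    mismatches: int    # read-through reads carrying a non-reference base at this site


def arrest_rate(c: SiteCounts) -> float:
    total = c.stops + c.read_through
    return c.stops / total if total else 0.0


def mismatch_rate(c: SiteCounts) -> float:
    return c.mismatches / c.read_through if c.read_through else 0.0


def is_candidate_m1a(c: SiteCounts, min_arrest: float = 0.3, min_mismatch: float = 0.1) -> bool:
    """Flag a site only when BOTH signatures are present (hypothetical thresholds)."""
    return arrest_rate(c) >= min_arrest and mismatch_rate(c) >= min_mismatch


site = SiteCounts(stops=40, read_through=60, mismatches=12)
print(arrest_rate(site), is_candidate_m1a(site))  # 0.4 True
```

    Requiring both signals reflects the abstract's point that their combination, rather than either rate alone, is what separates m¹A from other modified A residues.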

  5. [Identification and mapping of cis-regulatory elements within long genomic sequences].

    PubMed

    Akopov, S B; Chernov, I P; Vetchinova, A S; Bulanenkova, S S; Nikolaev, L G

    2007-01-01

    The publication of the human and other metazoan genome sequences opened up the possibility of mapping and analyzing genomic regulatory elements. Unfortunately, experimental data on the genomic positions of elements such as enhancers, silencers, insulators, transcription terminators, and replication origins are very limited, especially at the whole-genome level. Because most genomic regulatory elements (e.g., enhancers) are gene-, tissue-, or cell-specific, in silico prediction of these elements is often ambiguous. The development of high-throughput experimental approaches for identifying and mapping genomic functional elements is therefore highly desirable. In this review we discuss novel approaches to the high-throughput experimental identification of cis-regulatory elements in mammalian genomes, a necessary step toward complete genome annotation. PMID:18240562

  6. How children aged seven to twelve organize the opening sequence in a map task.

    PubMed

    Filipi, Anna

    2016-07-01

    Using the methods of conversation analysis, the opening sequences of a map task in the interactions of sixteen children aged seven to twelve were analyzed. The analytical concerns driving the study were who started, how they started, and how children dealt with differential access to information and the identification of phases within the opening. It was found that all participants oriented to the instruction-giver as the one to start, even when the information-follower commenced the task. With respect to how to start, the older children produced a question and answer sequence or a try-mark to establish a common starting point. Five of the eight younger children inferred a common starting point on the map. Three recurring phases were identified: readiness to begin established through a discourse marker, location of the starting point, and actual instruction. The findings are discussed with reference to the importance of interaction in referential spatial tasks. PMID:26144557

  7. Bacterial interspersed mosaic elements (BIMEs) are a major source of sequence polymorphism in Escherichia coli intergenic regions including specific associations with a new insertion sequence.

    PubMed

    Bachellier, S; Clément, J M; Hofnung, M; Gilson, E

    1997-03-01

    A significant fraction of Escherichia coli intergenic DNA sequences is composed of two families of repeated bacterial interspersed mosaic elements (BIME-1 and BIME-2). In this study, we determined the sequence organization of six intergenic regions in 51 E. coli and Shigella natural isolates. Each region contains a BIME in E. coli K-12. We found that multiple sequence variations are located within or near these BIMEs in the different bacteria. Events included excisions of a whole BIME-1, expansion/deletion within a BIME-2 and insertions of non-BIME sequences like the boxC repeat or a new IS element, named IS 1397. Remarkably, 14 out of IS 1397 integration sites correspond to a BIME sequence, strongly suggesting that this IS element is specifically associated with BIMEs, and thus inserts only in extragenic regions. Unlike BIMEs, IS 1397 is not detected in all E. coli isolates. Possible relationships between the presence of this IS element and the evolution of BIMEs are discussed. PMID:9055066

  8. Quantitative Trait Locus Mapping and Candidate Gene Analysis for Plant Architecture Traits Using Whole Genome Re-Sequencing in Rice

    PubMed Central

    Lim, Jung-Hyun; Yang, Hyun-Jung; Jung, Ki-Hong; Yoo, Soo-Cheul; Paek, Nam-Chon

    2014-01-01

    Plant breeders have focused on improving plant architecture as an effective means to increase crop yield. Here, we identify the main-effect quantitative trait loci (QTLs) for plant shape-related traits in rice (Oryza sativa) and find candidate genes by applying whole-genome re-sequencing of two parental cultivars using next-generation sequencing. To identify QTLs influencing plant shape, we analyzed six traits: plant height, tiller number, panicle diameter, panicle length, flag leaf length, and flag leaf width. We performed QTL analysis with 178 F7 recombinant inbred lines (RILs) from a cross of japonica rice line ‘SNUSG1’ and indica rice line ‘Milyang23’. Using 131 molecular markers, including 28 insertion/deletion markers, we identified 11 main- and 16 minor-effect QTLs for the six traits with a threshold LOD value > 2.8. Our sequence analysis identified 54 candidate genes for the main-effect QTLs. By further comparison of coding sequences and meta-expression profiles between japonica and indica rice varieties, we finally chose 15 strong candidate genes for the 11 main-effect QTLs. Our study shows that the whole-genome sequence data substantially enhanced the efficiency of polymorphic marker development for QTL fine-mapping and the identification of possible candidate genes. This yields useful genetic resources for breeding high-yielding rice cultivars with improved plant architecture. PMID:24599000
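
A LOD threshold such as the 2.8 used here is compared against a likelihood-ratio statistic computed for each marker or position. The abstract does not say which QTL method or software was used. The sketch below is therefore only the textbook single-marker case: two genotype classes, as in RILs, with normal residuals, where LOD = (n/2) log10(RSS0/RSS1). It is not presented as the authors' analysis, and the function name is hypothetical.

```python
import math
from statistics import mean


def single_marker_lod(genotypes, phenotypes):
    """Single-marker LOD score: (n / 2) * log10(RSS0 / RSS1).

    genotypes:  class label per line (e.g. 'A' or 'B' for the two parental alleles in RILs)
    phenotypes: trait value per line, in the same order
    """
    if len(genotypes) != len(phenotypes) or not phenotypes:
        raise ValueError("genotypes and phenotypes must be non-empty and of equal length")
    n = len(phenotypes)
    grand = mean(phenotypes)
    rss0 = sum((y - grand) ** 2 for y in phenotypes)  # null model: one overall mean, no QTL
    if rss0 == 0:
        return 0.0  # no phenotypic variance, so nothing to explain
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    # QTL model: a separate mean for each marker genotype class
    rss1 = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    if rss1 == 0:
        return math.inf
    return (n / 2) * math.log10(rss0 / rss1)
```

For six lines with trait values 10, 11, 12 (allele A) and 20, 21, 22 (allele B), RSS0 = 154 and RSS1 = 4, giving a LOD of 3 log10(38.5), about 4.76, above a 2.8 threshold.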

  9. Genetic Mapping and Exome Sequencing Identify Variants Associated with Five Novel Diseases

    PubMed Central

    Puffenberger, Erik G.; Jinks, Robert N.; Sougnez, Carrie; Cibulskis, Kristian; Willert, Rebecca A.; Achilly, Nathan P.; Cassidy, Ryan P.; Fiorentini, Christopher J.; Heiken, Kory F.; Lawrence, Johnny J.; Mahoney, Molly H.; Miller, Christopher J.; Nair, Devika T.; Politi, Kristin A.; Worcester, Kimberly N.; Setton, Roni A.; DiPiazza, Rosa; Sherman, Eric A.; Eastman, James T.; Francklyn, Christopher; Robey-Bond, Susan; Rider, Nicholas L.; Gabriel, Stacey; Morton, D. Holmes; Strauss, Kevin A.

    2012-01-01

    The Clinic for Special Children (CSC) has integrated biochemical and molecular methods into a rural pediatric practice serving Old Order Amish and Mennonite (Plain) children. Among the Plain people, we have used single nucleotide polymorphism (SNP) microarrays to genetically map recessive disorders to large autozygous haplotype blocks (mean = 4.4 Mb) that contain many genes (mean = 79). For some, uninformative mapping or large gene lists preclude disease-gene identification by Sanger sequencing. Seven such conditions were selected for exome sequencing at the Broad Institute; all had been previously mapped at the CSC using low density SNP microarrays coupled with autozygosity and linkage analyses. Using between 1 and 5 patient samples per disorder, we identified sequence variants in the known disease-causing genes SLC6A3 and FLVCR1, and present evidence to strongly support the pathogenicity of variants identified in TUBGCP6, BRAT1, SNIP1, CRADD, and HARS. Our results reveal the power of coupling new genotyping technologies to population-specific genetic knowledge and robust clinical data. PMID:22279524

  10. Constructing a Genome-Wide LD Map of Wild A. gambiae Using Next-Generation Sequencing

    PubMed Central

    Wang, Xiaohong; Afrane, Yaw A.; Yan, Guiyun; Li, Jun

    2015-01-01

    Anopheles gambiae is the major malaria vector in Africa. Examining the molecular basis of A. gambiae traits requires knowledge of both genetic variation and a genome-wide linkage disequilibrium (LD) map of wild A. gambiae populations from malaria-endemic areas. We sequenced the genomes of nine wild A. gambiae mosquitoes individually using next-generation sequencing technologies and detected 2,219,815 common single nucleotide polymorphisms (SNPs), 88% of which are novel. SNPs are not evenly distributed across A. gambiae chromosomes. The low SNP-frequency regions overlay heterochromatin and chromosome inversion domains, consistent with the lower recombination rates in these regions. Nearly one million SNPs that were genotyped correctly in all individual mosquitoes with 99.6% confidence were extracted from these high-throughput sequencing data. Based on these SNP genotypes, we constructed a genome-wide LD map for wild A. gambiae from malaria-endemic areas in Kenya and made it available through a public website. The average size of LD blocks is less than 40 bp, and several large LD blocks were also discovered clustered around the para gene, which is consistent with the effect of insecticide selective sweeps. The SNPs and the LD map will be valuable resources for scientific communities to dissect the A. gambiae genome. PMID:26421280
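
An LD map is assembled from pairwise LD statistics between SNPs, and r² is the most common one. The minimal sketch below computes r² for two biallelic SNPs. It assumes phased haplotypes coded as 0/1 pairs. The abstract does not state which LD statistic was used, how unphased genotypes were handled, or how LD blocks were defined, so treat this as a generic illustration with a hypothetical function name.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs, each allele coded 0 or 1,
    one pair per phased chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n                         # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n                         # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n   # joint haplotype frequency
    d = p_ab - p_a * p_b                                            # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0                      # monomorphic SNPs carry no LD information
```

Perfectly co-inherited alleles give r² = 1, and alleles in all four combinations at equal frequency give r² = 0. Short LD blocks, as reported here, correspond to r² decaying quickly with distance.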

  11. HomozygosityMapper2012--bridging the gap between homozygosity mapping and deep sequencing.

    PubMed

    Seelow, Dominik; Schuelke, Markus

    2012-07-01

    Homozygosity mapping is a common method to map recessive traits in consanguineous families. To facilitate these analyses, we have developed HomozygosityMapper, a web-based approach to homozygosity mapping. HomozygosityMapper allows researchers to directly upload the genotype files produced by the major genotyping platforms as well as deep sequencing data. It detects stretches of homozygosity shared by the affected individuals and displays them graphically. Users can interactively inspect the underlying genotypes, manually refine these regions and eventually submit them to our candidate gene search engine GeneDistiller to identify the most promising candidate genes. Here, we present the new version of HomozygosityMapper. The most striking new feature is the support of Next Generation Sequencing *.vcf files as input. Upon users' requests, we have implemented the analysis of common experimental rodents as well as of important farm animals. Furthermore, we have extended the options for single families and loss of heterozygosity studies. Another new feature is the export of *.bed files for targeted enrichment of the potential disease regions for deep sequencing strategies. HomozygosityMapper also generates files for conventional linkage analyses which are already restricted to the possible disease regions, hence superseding CPU-intensive genome-wide analyses. HomozygosityMapper is freely available at http://www.homozygositymapper.org/. PMID:22669902
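
The central step described here, detecting stretches of homozygosity shared by affected individuals, can be illustrated with a deliberately simplified sketch. It assumes ordered marker genotypes written as two-letter strings ('AA', 'AB', 'BB') and reports runs where every affected individual is homozygous for the same allele. The function name and input format are hypothetical. This is not HomozygosityMapper's scoring algorithm, which, among other things, has to cope with genotyping errors.

```python
def shared_homozygous_runs(genotypes_by_individual, min_markers):
    """Return (first_index, last_index) for each run of at least min_markers consecutive markers
    at which all affected individuals carry the same homozygous genotype."""
    if not genotypes_by_individual:
        return []
    n_markers = len(genotypes_by_individual[0])

    def shared_homozygous(i):
        calls = {g[i] for g in genotypes_by_individual}
        if len(calls) != 1:
            return False
        call = next(iter(calls))
        return call[0] == call[1]  # e.g. 'AA' or 'BB', not 'AB'

    runs, start = [], None
    for i in range(n_markers):
        if shared_homozygous(i):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_markers:
                runs.append((start, i - 1))
            start = None
    if start is not None and n_markers - start >= min_markers:
        runs.append((start, n_markers - 1))
    return runs
```

Regions returned this way would then be the candidate intervals handed to a gene-prioritization step, analogous to the GeneDistiller search mentioned above.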

  12. Detection and mapping of amplified DNA sequences in breast cancer by comparative genomic hybridization

    SciTech Connect

    Kallioniemi, A.; Tanner, M.; Kallioniemi, O.P.; Piper, J.; Stokke, T.; Pinkel, D.; Gray, J.W.; Waldman, F.M.; Chen, L.; Smith, H.S.

    1994-03-15

    Comparative genomic hybridization was applied to 5 breast cancer cell lines and 33 primary tumors to discover and map regions of the genome with increased DNA-sequence copy-number. Two-thirds of primary tumors and almost all cell lines showed increased DNA-sequence copy-number affecting a total of 26 chromosomal subregions. Most of these loci were distinct from those of currently known amplified genes in breast cancer, with sequences originating from 17q22-q24 and 20q13 showing the highest frequency of amplification. The results indicate that these chromosomal regions may contain previously unknown genes whose increased expression contributes to breast cancer progression. Chromosomal regions with increased copy-number often spanned tens of Mb, suggesting involvement of more than one gene in each region.

  13. Amplification, Next-generation Sequencing, and Genomic DNA Mapping of Retroviral Integration Sites.

    PubMed

    Serrao, Erik; Cherepanov, Peter; Engelman, Alan N

    2016-01-01

    Retroviruses exhibit signature integration preferences on both the local and global scales. Here, we present a detailed protocol for (1) generation of diverse libraries of retroviral integration sites using ligation-mediated PCR (LM-PCR) amplification and next-generation sequencing (NGS), (2) mapping the genomic location of each virus-host junction using BEDTools, and (3) analyzing the data for statistical relevance. Genomic DNA extracted from infected cells is fragmented by digestion with restriction enzymes or by sonication. After suitable DNA end-repair, double-stranded linkers are ligated onto the DNA ends, and semi-nested PCR is conducted using primers complementary to both the long terminal repeat (LTR) end of the virus and the ligated linker DNA. The PCR primers carry sequences required for DNA clustering during NGS, negating the requirement for separate adapter ligation. Quality control (QC) is conducted to assess DNA fragment size distribution and adapter DNA incorporation prior to NGS. Sequence output files are filtered for LTR-containing reads, and the sequences defining the LTR and the linker are cropped away. Trimmed host cell sequences are mapped to a reference genome using BLAT and are filtered for at least 97% identity to a unique point in the reference genome. Unique integration sites are scrutinized for adjacent nucleotide (nt) sequence and distribution relative to various genomic features. Using this protocol, integration site libraries of high complexity can be constructed from genomic DNA in three days. The entire protocol, from exogenous viral infection of susceptible tissue culture cells to integration site analysis, can therefore be conducted in approximately one to two weeks. Recent applications of this technology pertain to longitudinal analysis of integration sites from HIV-infected patients. PMID:27023428
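
The post-alignment filter, at least 97% identity to a unique point in the reference genome, can be sketched as follows. The sketch assumes BLAT results have already been parsed into simplified `Hit` tuples. Those fields are hypothetical and are not BLAT's PSL column layout. The uniqueness rule used here, a single best-scoring hit per read, is also an assumption, since the protocol's exact tie-handling is not given in the abstract.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    read_id: str
    chrom: str
    position: int    # genomic coordinate of the virus-host junction
    strand: str
    identity: float  # percent identity of the alignment


def unique_integration_sites(hits: Iterable[Hit], min_identity: float = 97.0) -> List[Tuple[str, int, str]]:
    """Keep reads whose single best hit reaches min_identity, then collapse reads to unique sites."""
    by_read = defaultdict(list)
    for h in hits:
        by_read[h.read_id].append(h)
    sites = set()
    for read_hits in by_read.values():
        best = max(h.identity for h in read_hits)
        top = [h for h in read_hits if h.identity == best]
        if best >= min_identity and len(top) == 1:  # reads with tied best hits (multi-mappers) are discarded
            h = top[0]
            sites.add((h.chrom, h.position, h.strand))
    return sorted(sites)
```

Collapsing to a set means several reads from the same junction count as one integration site. That matches the abstract's use of "unique integration sites" for downstream analysis.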

  14. L-asparaginase II of Escherichia coli K-12: cloning, mapping and sequencing of the ansB gene.

    PubMed

    Bonthron, D T

    1990-07-01

    The Escherichia coli gene ansB, encoding the chemotherapeutic enzyme L-asparaginase II, has been cloned, using a strategy based on the polymerase chain reaction, and sequenced. The amino acid (aa) sequence differs in eleven positions from the data previously derived by direct aa sequencing. A cleavable secretory signal peptide precedes the N terminus of the mature protein. The ansB gene maps to position 3114 kb on the physical map of E. coli [Kohara et al., Cell 50 (1987) 495-508], corresponding to approx. 63.8 min on the genetic map. PMID:2144836

  15. Misassembly detection using paired-end sequence reads and optical mapping data

    PubMed Central

    Muggli, Martin D.; Puglisi, Simon J.; Ronen, Roy; Boucher, Christina

    2015-01-01

    Motivation: A crucial problem in genome assembly is the discovery and correction of misassembly errors in draft genomes. We develop a method called misSEQuel that enhances the quality of draft genomes by identifying misassembly errors and their breakpoints using paired-end sequence reads and optical mapping data. Our method also fulfills the critical need for open source computational methods for analyzing optical mapping data. We apply our method to various assemblies of the loblolly pine, Francisella tularensis, rice and budgerigar genomes. We generated and used simulated optical mapping data for loblolly pine and F. tularensis and used real optical mapping data for rice and budgerigar. Results: Our results demonstrate that we detect more than 54% of extensively misassembled contigs and more than 60% of locally misassembled contigs in assemblies of F. tularensis and between 31% and 100% of extensively misassembled contigs and between 57% and 73% of locally misassembled contigs in assemblies of loblolly pine. Using the real optical mapping data, we correctly identified 75% of extensively misassembled contigs and 100% of locally misassembled contigs in rice, and 77% of extensively misassembled contigs and 80% of locally misassembled contigs in budgerigar. Availability and implementation: misSEQuel can be used as a post-processing step in combination with any genome assembler and is freely available at http://www.cs.colostate.edu/seq/. Contact: muggli@cs.colostate.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26072512

  16. Computational Prediction and Experimental Verification of New MAP Kinase Docking Sites and Substrates Including Gli Transcription Factors

    PubMed Central

    Whisenant, Thomas C.; Ho, David T.; Benz, Ryan W.; Rogers, Jeffrey S.; Kaake, Robyn M.; Gordon, Elizabeth A.; Huang, Lan; Baldi, Pierre; Bardwell, Lee

    2010-01-01

    In order to fully understand protein kinase networks, new methods are needed to identify regulators and substrates of kinases, especially for weakly expressed proteins. Here we have developed a hybrid computational search algorithm that combines machine learning and expert knowledge to identify kinase docking sites, and used this algorithm to search the human genome for novel MAP kinase substrates and regulators focused on the JNK family of MAP kinases. Predictions were tested by peptide array followed by rigorous biochemical verification with in vitro binding and kinase assays on wild-type and mutant proteins. Using this procedure, we found new ‘D-site’ class docking sites in previously known JNK substrates (hnRNP-K, PPM1J/PP2Czeta), as well as new JNK-interacting proteins (MLL4, NEIL1). Finally, we identified new D-site-dependent MAPK substrates, including the hedgehog-regulated transcription factors Gli1 and Gli3, suggesting that a direct connection between MAP kinase and hedgehog signaling may occur at the level of these key regulators. These results demonstrate that a genome-wide search for MAP kinase docking sites can be used to find new docking sites and substrates. PMID:20865152
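
For contrast with the hybrid machine-learning search described above, the crudest form of docking-site discovery is a regular-expression scan for a consensus. The pattern below is an assumption for illustration: an approximate D-site consensus of two or three basic residues, a 1 to 6 residue spacer, then hydrophobic-X-hydrophobic, with "hydrophobic" simplified to L, I or V. Matches are only candidates, and the authors' algorithm is far more discriminating. The function and constant names are hypothetical.

```python
import re

# Approximate D-site consensus (an assumption, not the paper's model):
# (K/R){2,3} - any 1-6 residues - (L/I/V) - any residue - (L/I/V).
# The lookahead with a capturing group lets overlapping candidates be reported.
D_SITE = re.compile(r"(?=([KR]{2,3}.{1,6}[LIV].[LIV]))")


def scan_d_sites(protein: str):
    """Return (0-based start, matched peptide) for every candidate D-site in a protein sequence."""
    return [(m.start(), m.group(1)) for m in D_SITE.finditer(protein.upper())]
```

A scan like this over-predicts heavily. That is one reason the abstract pairs computational prediction with peptide arrays and biochemical verification.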

  17. A third-generation microsatellite-based linkage map of the honey bee, Apis mellifera, and its comparison with the sequence-based physical map

    PubMed Central

    Solignac, Michel; Mougel, Florence; Vautrin, Dominique; Monnerot, Monique; Cornuet, Jean-Marie

    2007-01-01

    Background: The honey bee is a key model for social behavior and this feature led to the selection of the species for genome sequencing. A genetic map is a necessary companion to the sequence. In addition, because there was originally no physical map for the honey bee genome project, a meiotic map was the only resource for organizing the sequence assembly on the chromosomes. Results: We present the genetic (meiotic) map here and describe the main features that emerged from comparison with the sequence-based physical map. The genetic map of the honey bee is saturated and the chromosomes are oriented from the centromeric to the telomeric regions. The map is based on 2,008 markers and is about 40 Morgans (M) long, resulting in a marker density of one every 2.05 centiMorgans (cM). For the 186 megabases (Mb) of the genome mapped and assembled, this corresponds to a very high average recombination rate of 22.04 cM/Mb. Honey bee meiosis shows a relatively homogeneous recombination rate along and across chromosomes, as well as within and between individuals. Interference is higher than inferred from the Kosambi function of distance. In addition, numerous recombination hotspots are dispersed over the genome. Conclusion: The very large genetic length of the honey bee genome, its small physical size and an almost complete genome sequence with a relatively low number of genes suggest a very promising future for association mapping in the honey bee, particularly as the existence of haploid males allows easy bulk segregant analysis. PMID:17459148
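
The summary figures are linked by two ratios: recombination rate = genetic length / physical length, and marker spacing = genetic length / number of markers. Back-calculating the genetic length L from the reported rate over the 186 Mb gives a value consistent with "about 40 M" and with the stated spacing. The small differences from 2.05 cM reflect rounding in the abstract.

```latex
\begin{aligned}
L &\approx 22.04\ \tfrac{\text{cM}}{\text{Mb}} \times 186\ \text{Mb} \approx 4{,}099\ \text{cM} \approx 41\ \text{M},\\[4pt]
\frac{L}{N} &\approx \frac{4{,}099\ \text{cM}}{2{,}008\ \text{markers}} \approx 2.04\ \text{cM per marker}.
\end{aligned}
```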

  18. Sequencing and mapping hemoglobin gene clusters in the australian model dasyurid marsupial sminthopsis macroura

    SciTech Connect

    De Leo, A.A.; Wheeler, D.; Lefevre, C.; Cheng, Jan-Fang; Hope, R.; Kuliwaba, J.; Nicholas, K.R.; Westermanc, M.; Graves, J.A.M.

    2004-07-26

    Comparing globin genes and their flanking sequences across many species has allowed globin gene evolution to be reconstructed in great detail. Marsupial globin sequences have proved to be of exceptional significance. A previous finding of a beta-like omega gene in the alpha cluster in the tammar wallaby suggested that the alpha and beta clusters evolved via genome duplication and loss rather than tandem duplication. To confirm and extend this important finding, we isolated and sequenced BACs containing the alpha and beta loci from the distantly related Australian marsupial Sminthopsis macroura. We report that the alpha gene lies in the same BAC as the beta-like omega gene, implying that the alpha-omega juxtaposition is likely to be conserved in all marsupials. The LUC7L gene was found 3' of the S. macroura alpha locus, a gene order shared with humans but not with mouse, chicken or fugu. Sequencing a BAC contig that contained the S. macroura beta globin and epsilon globin loci showed that the globin cluster is flanked by olfactory genes, demonstrating a gene arrangement conserved for over 180 million years. Analysis of the region 5' to the S. macroura epsilon globin gene revealed a region similar to the eutherian LCR, containing sequences and potential transcription factor binding sites with homology to eutherian hypersensitive sites 1 to 5. FISH mapping of BACs containing S. macroura alpha and beta globin genes located the beta globin cluster on chromosome 3q and the alpha locus close to the centromere on 1q, resolving contradictory map locations obtained by previous radioactive in situ hybridization.

  19. Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads—a baiting and iterative mapping approach

    PubMed Central

    Hahn, Christoph; Bachmann, Lutz; Chevreux, Bastien

    2013-01-01

    We present an in silico approach for the reconstruction of complete mitochondrial genomes of non-model organisms directly from next-generation sequencing (NGS) data: mitochondrial baiting and iterative mapping (MITObim). The method is straightforward even if only (i) distantly related mitochondrial genomes or (ii) mitochondrial barcode sequences are available as starting-reference sequences or seeds, respectively. We demonstrate the efficiency of the approach in case studies using real NGS data sets of two monogenean ectoparasite species, Gyrodactylus thymalli and Gyrodactylus derjavinoides, including their respective teleost hosts, European grayling (Thymallus thymallus) and rainbow trout (Oncorhynchus mykiss). MITObim appeared superior to existing tools in terms of accuracy, runtime and memory requirements and fully automatically recovered mitochondrial genomes exceeding 99.5% accuracy from total genomic DNA-derived NGS data sets in <24 h using a standard desktop computer. The approach overcomes the limitations of traditional strategies for obtaining mitochondrial genomes for species with little or no mitochondrial sequence information at hand and represents a fast and highly efficient in silico alternative to laborious conventional strategies relying on initial long-range PCR. We furthermore demonstrate the applicability of MITObim for metagenomic/pooled data sets using simulated data. MITObim is an easy-to-use tool even for biologists with modest bioinformatics experience. The software is made available as an open source pipeline under the MIT license at https://github.com/chrishah/MITObim. PMID:23661685
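
The baiting-and-iterative-mapping idea can be illustrated with a toy loop. Reads sharing a k-mer with the current reference are "baited", the reference is extended with any baited read that overlaps its end, and the process repeats until nothing new is added. This sketch is hypothetical and heavily simplified: exact matches only, extension to the right only, error-free reads. The real tool relies on a full read mapper/assembler and handles sequencing errors, which this does not.

```python
def kmers(seq, k):
    """Set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bait(reads, reference, k):
    """Baiting step: keep reads that share at least one k-mer with the current reference."""
    ref_kmers = kmers(reference, k)
    return [r for r in reads if kmers(r, k) & ref_kmers]


def extend_right(reference, read, min_overlap):
    """Extend the reference if the read's prefix exactly matches its suffix (longest overlap first)."""
    for olap in range(min(len(reference), len(read)) - 1, min_overlap - 1, -1):
        if reference.endswith(read[:olap]):
            return reference + read[olap:]
    return None


def iterative_baiting(seed, reads, k=5, min_overlap=5, max_iterations=100):
    """Grow the seed rightwards, one best extension per iteration, until no baited read extends it."""
    reference = seed
    for _ in range(max_iterations):
        best = None
        for read in bait(reads, reference, k):
            extended = extend_right(reference, read, min_overlap)
            if extended and (best is None or len(extended) > len(best)):
                best = extended
        if best is None:
            break
        reference = best
    return reference
```

Starting from the first 10 bases of a 23-base toy "genome" and three overlapping reads, the loop recovers the full sequence in three extensions. This mirrors, in miniature, how a barcode-sized seed can be grown into a longer mitochondrial sequence.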

  20. The Amino Acid Alphabet and the Architecture of the Protein Sequence-Structure Map. I. Binary Alphabets

    PubMed Central

    Ferrada, Evandro

    2014-01-01

    The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials, that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated to specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet. PMID:25473967
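
The graph view described here can be sketched directly. Sequences are nodes, edges join sequences one point mutation apart, and each structure's set of sequences forms a "neutral network" whose connectivity can be measured. In this minimal sketch the sequence-to-structure assignment is supplied as a plain dictionary with opaque structure labels. The study obtains it from exact lattice models and sampled binary potentials, which are not reproduced here. The H/P alphabet labels and function names are assumptions.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, List


def point_mutants(seq: str, alphabet: str = "HP") -> List[str]:
    """All sequences that differ from seq at exactly one position."""
    return [seq[:i] + a + seq[i + 1:] for i, c in enumerate(seq) for a in alphabet if a != c]


def neutral_network_components(seq_to_structure: Dict[str, Hashable], alphabet: str = "HP") -> Dict[Hashable, int]:
    """For each structure, count connected components among the sequences that fold into it,
    using breadth-first search over single point mutations. Non-viable sequences are simply absent."""
    by_structure = defaultdict(set)
    for s, struct in seq_to_structure.items():
        by_structure[struct].add(s)
    components = {}
    for struct, members in by_structure.items():
        unseen, count = set(members), 0
        while unseen:
            count += 1
            queue = deque([unseen.pop()])
            while queue:
                for neighbour in point_mutants(queue.popleft(), alphabet):
                    if neighbour in unseen:
                        unseen.remove(neighbour)
                        queue.append(neighbour)
        components[struct] = count
    return components
```

A structure whose sequences split into many components is hard to reach by neutral point mutations. This is one of the architectural properties that differs between types of potentials.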

  1. The amino acid alphabet and the architecture of the protein sequence-structure map. I. Binary alphabets.

    PubMed

    Ferrada, Evandro

    2014-12-01

    The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials, that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated to specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet. PMID:25473967

  2. YAC contig and cell hybrid mapping of six expressed sequences encoded by human chromosome 21

    SciTech Connect

    Yu, J.; Cox, M.; Patterson, D.

    1994-09-01

    The candidate gene approach for positional cloning requires a sufficient number of expressed gene sequences from the chromosomal region of interest. Trisomy for human chromosome 21 results in Down syndrome (DS). However, only a limited number of genes on chromosome 21 have been identified and cloned. We used 1,000 single-copy microclones from a microdissection library of chromosome 21 to screen various cDNA libraries and isolated 9 cDNA clones, of which 6 contain unique sequences: 21E-C1, C3, C4, C5, C7, C10. Using a refined regional mapping panel of chromosome 21, which comprised 24 cell hybrids and divided the chromosome into 33 subregions, we assigned 21E-C1 and C7 to subregion No. 22 (distal q22.1), 21E-C3 to No. 25 (proximal q22.2), 21E-C4 to No. 23 (very distal q22.1), 21E-C5 to No. 31 (proximal q22.3), and 21E-C10 to No. 28 (middle q22.2). In addition, we identified YAC clones corresponding to these cDNA clones using the complete YAC contig spanning the entire chromosome 21q. On average, 10 positive YAC clones were identified for each cDNA. The mapping positions for the 6 cDNAs determined by the STSs in the YAC contig agree well with the cytogenetic map constructed by the hybrid panel. These cDNA clones with refined mapping positions on chromosome 21 should be useful as candidate genes for the specific component phenotypes of DS assigned to the region.

  3. Mapping and sequencing the human genome: Science, ethics, and public policy. Final report

    SciTech Connect

    McInerney, J.D.

    1993-03-31

    Development of Mapping and Sequencing the Human Genome: Science, Ethics, and Public Policy followed the standard curriculum-development process of the Biological Sciences Curriculum Study (BSCS), and that process is described here. The production of this module was a collaborative effort between BSCS and the American Medical Association (AMA). Appendix A contains a copy of the module. Copies of reports sent to the Department of Energy (DOE) during the development process are contained in Appendix B; all reports should be on file at DOE. Appendix B also contains copies of status reports submitted to the BSCS Board of Directors.

  4. Small genomes: New initiatives in mapping and sequencing. Workshop summary report

    SciTech Connect

    McKenney, K.; Robb, F.

    1993-12-31

    The workshop was held 5–7 July 1993 at the Center for Advanced Research in Biotechnology (CARB) and hosted by the University of Maryland Biotechnology Institute (UMBI) and the National Institute of Standards and Technology (NIST). The objective of this workshop was to bring together individuals interested in DNA technologies and to determine how current and potential improvements in the speed and cost-effectiveness of mapping and sequencing affect the planning of future small-genome projects. A major goal of the workshop was to spur collaboration among more diverse groups of scientists working on this topic and to minimize competitiveness as an obstacle to progress.

  5. Integrable maps from Galois differential algebras, Borel transforms and number sequences

    NASA Astrophysics Data System (ADS)

    Tempesta, Piergiulio

    A new class of integrable maps, obtained as lattice versions of polynomial dynamical systems, is introduced. These systems are obtained by means of a discretization procedure that preserves several analytic and algebraic properties of a given differential equation, in particular symmetries and integrability (see Tempesta, 2010 [40]). Our approach is based on the properties of a suitable Galois differential algebra, which we call a Rota algebra. A formulation of the procedure in terms of category theory is proposed. To keep the lattice dynamics confined, a Borel regularization is also adopted. As a byproduct of the theory, a connection between number sequences and integrability is discussed.

  6. Heterozygous mapping strategy (HetMapps) for high resolution genotyping-by-sequencing markers: a case study in grapevine

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotyping by sequencing (GBS) provides opportunities to generate high-resolution genetic maps at a low per-sample genotyping cost, but missing data and under-calling of heterozygotes complicate the creation of GBS linkage maps for highly heterozygous species. To overcome these issues, we developed ...

  7. Genetic linkage map of Chinese native variety faba bean (Vicia faba L.) based on simple sequence repeat markers

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Simple sequence repeat (SSR) markers are a powerful tool for constructing genetic linkage maps, which can be applied to locating quantitative trait loci (QTL) and to marker-assisted selection (MAS). In this study, a genetic map of faba bean was constructed with SSR markers using a population of 129 F2 ...
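
When a linkage map is built from markers such as SSRs, estimated recombination fractions r between markers are converted into map distances with a map function. Haldane's assumes no crossover interference, and Kosambi's allows for positive interference. The truncated abstract does not say which function was used, so the two standard formulas below are shown only for illustration. The function names are hypothetical.

```python
import math


def haldane_cm(r: float) -> float:
    """Haldane map distance in centiMorgans: d = -0.5 * ln(1 - 2r) Morgans (no interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in centiMorgans: d = 0.25 * ln((1 + 2r) / (1 - 2r)) Morgans."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))
```

For r = 0.1, Haldane gives about 11.16 cM and Kosambi about 10.14 cM. For small r, both approach 100 × r cM.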

  8. Large-scale metagenomic sequence clustering on map-reduce clusters.

    PubMed

    Yang, Xiao; Zola, Jaroslaw; Aluru, Srinivas

    2013-02-01

    Taxonomic clustering of species from millions of DNA fragments sequenced from their genomes is an important and frequently arising problem in metagenomics. In this paper, we present a parallel algorithm for taxonomic clustering of large metagenomic samples with support for overlapping clusters. We develop sketching techniques, akin to those created for web document clustering, to deduce significant similarities between pairs of sequences without resorting to expensive all vs. all comparison. We formulate the metagenomic classification problem as that of maximal quasi-clique enumeration in the resulting similarity graph, at multiple levels of the hierarchy as prescribed by different similarity thresholds. We cast execution of the underlying algorithmic steps as applications of the map-reduce framework to achieve a cloud ready implementation. We show that the resulting framework can produce high quality clustering of metagenomic samples consisting of millions of reads, in reasonable time limits, when executed on a modest size cluster. PMID:23427983
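    The abstract says the similarity sketches are "akin to those created for web document clustering". The classic technique from that setting is min-wise hashing (MinHash), so the sketch below assumes MinHash over k-mer sets as a representative example; it is not necessarily the authors' exact scheme, and the k-mer size, number of hash functions and seeded-hash construction are illustrative assumptions. It shows only the pairwise similarity estimate, not the quasi-clique enumeration or the map-reduce execution.

    ```python
    import hashlib
    import random


    def kmer_set(seq, k):
        """All distinct k-mers ("shingles") of a read."""
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}


    def _hash(kmer, seed):
        # Seeded 64-bit hash; each seed stands in for one random permutation.
        digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little")


    def minhash_sketch(seq, k=12, num_hashes=128):
        """Keep only the minimum hash value per seed: a fixed-size summary of the read."""
        shingles = kmer_set(seq.upper(), k)
        if not shingles:
            raise ValueError("sequence shorter than k")
        return [min(_hash(s, seed) for s in shingles) for seed in range(num_hashes)]


    def estimated_jaccard(sketch_a, sketch_b):
        """The fraction of agreeing minima estimates |A & B| / |A | B| of the k-mer sets."""
        agree = sum(a == b for a, b in zip(sketch_a, sketch_b))
        return agree / len(sketch_a)


    def exact_jaccard(seq_a, seq_b, k=12):
        a, b = kmer_set(seq_a.upper(), k), kmer_set(seq_b.upper(), k)
        return len(a & b) / len(a | b)


    # Usage: two synthetic reads sharing their first half.
    rng = random.Random(0)
    read_a = "".join(rng.choice("ACGT") for _ in range(1000))
    read_b = read_a[:500] + "".join(rng.choice("ACGT") for _ in range(500))
    print(exact_jaccard(read_a, read_b),
          estimated_jaccard(minhash_sketch(read_a), minhash_sketch(read_b)))
    ```

    The point of the design is that sketches have a fixed size, so comparing two reads costs `num_hashes` equality checks regardless of read length, which is what lets such methods avoid full all-vs-all alignment.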

  9. SBH and the integration of complementary approaches in the mapping, sequencing, and understanding of complex genomes

    SciTech Connect

    Drmanac, R.; Drmanac, S.; Labat, I.; Vicentic, A.; Gemmell, A.; Stavropoulos, N.; Jarvis, J.

    1992-12-01

    A variant of sequencing by hybridization (SBH) is being developed with the potential to inexpensively determine up to 100 million base pairs per year. The method comprises (1) arraying short clones in 864-well plates; (2) growth of the M13 clones or PCR of the inserts; (3) automated spotting of DNAs by corresponding pin-arrays; (4) hybridization of dotted samples with 200–3000 ³²P- or ³³P-labeled 6- to 8-mer probes; and (5) scoring hybridization signals using storage phosphor plates. Some 200 7- to 8-mers can provide an inventory of the genes if cDNA clones are hybridized, or can define the order of 2-kb genomic clones, creating physical and structural maps with 100-bp resolution; the distribution of G+C, LINEs, SINEs, and gene families would be revealed. cDNAs that represent new genes and genomic clones in regions of interest selected by SBH can be sequenced by a gel method. Uniformly distributed clones from the previous step will be hybridized with 2000–3000 6- to 8-mers. As a result, approximately 50–60% of the genomic regions containing members of large repetitive and gene families and those families represented in GenBank would be completely sequenced. In the less redundant regions, every base pair is expected to be read with 3–4 probes, but the complete sequence cannot be reconstructed. Such partial sequences allow the inference of similarity and the recognition of coding, regulatory, and repetitive sequences, as well as the study of evolutionary processes all the way up to species delineation.
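    Why hybridization data alone may not yield a complete sequence can be illustrated with an idealized model: a full probe array reports only which k-mers are present (the "spectrum"), and the sequence is recovered by chaining k-mers that overlap by k-1 bases. The sketch below assumes a single strand, error-free signals and no repeated k-mers, all simplifications of the real protocol; it returns `None` whenever a repeated (k-1)-mer makes the chain branch.

    ```python
    from collections import Counter


    def spectrum(seq, k):
        """Set of k-mers a complete probe array would score as hybridized (presence only)."""
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}


    def reconstruct(spec, k):
        """Rebuild the unique sequence consistent with a k-mer spectrum, or None if ambiguous."""
        successor = {}
        indegree = Counter()
        for kmer in spec:
            prefix, suffix = kmer[:-1], kmer[1:]
            if prefix in successor:  # repeated (k-1)-mer: the path branches
                return None
            successor[prefix] = suffix
            indegree[suffix] += 1
        starts = [node for node in successor if indegree[node] == 0]
        if len(starts) != 1:
            return None
        node = starts[0]
        seq = node
        for _ in range(len(spec)):  # bounded walk, so a cycle cannot loop forever
            if node not in successor:
                break
            node = successor[node]
            seq += node[-1]
        # Accept only a single path that uses every k-mer exactly once.
        if node in successor or spectrum(seq, k) != spec or len(seq) != len(spec) + k - 1:
            return None
        return seq


    print(reconstruct(spectrum("ATGGCGTGCA", 3), 3))  # None: 'TG' occurs twice
    print(reconstruct(spectrum("ATGGCGTGCA", 4), 4))  # ATGGCGTGCA
    ```

    The two calls show the trade-off behind the choice of probe length: the same target is ambiguous with 3-mers but uniquely recoverable with 4-mers, because longer probes span the repeat.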

  10. SBH and the integration of complementary approaches in the mapping, sequencing, and understanding of complex genomes

    SciTech Connect

    Drmanac, R.; Drmanac, S.; Labat, I.; Vicentic, A.; Gemmell, A.; Stavropoulos, N.; Jarvis, J.

    1992-01-01

    A variant of sequencing by hybridization (SBH) is being developed with the potential to inexpensively determine up to 100 million base pairs per year. The method comprises (1) arraying short clones in 864-well plates; (2) growth of the M13 clones or PCR of the inserts; (3) automated spotting of DNAs by corresponding pin-arrays; (4) hybridization of dotted samples with 200–3000 ³²P- or ³³P-labeled 6- to 8-mer probes; and (5) scoring hybridization signals using storage phosphor plates. Some 200 7- to 8-mers can provide an inventory of the genes if cDNA clones are hybridized, or can define the order of 2-kb genomic clones, creating physical and structural maps with 100-bp resolution; the distribution of G+C, LINEs, SINEs, and gene families would be revealed. cDNAs that represent new genes and genomic clones in regions of interest selected by SBH can be sequenced by a gel method. Uniformly distributed clones from the previous step will be hybridized with 2000–3000 6- to 8-mers. As a result, approximately 50–60% of the genomic regions containing members of large repetitive and gene families and those families represented in GenBank would be completely sequenced. In the less redundant regions, every base pair is expected to be read with 3–4 probes, but the complete sequence cannot be reconstructed. Such partial sequences allow the inference of similarity and the recognition of coding, regulatory, and repetitive sequences, as well as the study of evolutionary processes all the way up to species delineation.

  11. dcp gene of Escherichia coli: cloning, sequencing, transcript mapping, and characterization of the gene product.

    PubMed Central

    Henrich, B; Becker, S; Schroeder, U; Plapp, R

    1993-01-01

    Dipeptidyl carboxypeptidase is a C-terminal exopeptidase of Escherichia coli. We have isolated the respective gene, dcp, from a low-copy-number plasmid library by its ability to complement a dcp mutation preventing the utilization of the unique substrate N-benzoyl-L-glycyl-L-histidyl-L-leucine. Sequence analysis of a 2.9-kb DNA fragment revealed an open reading frame of 2,043 nucleotides which was assigned to the dcp gene by N-terminal amino acid sequencing and electrophoretic molecular mass determination of the purified dcp product. Transcript mapping by primer extension and S1 protection experiments verified the physiological significance of potential initiation and termination signals for dcp transcription and allowed the identification of a single species of monocistronic dcp mRNA. The codon usage pattern and the effects of elevated gene copy number indicated a relatively low level of dcp expression. The predicted amino acid sequence of dipeptidyl carboxypeptidase, containing a potential zinc-binding site, is highly homologous (78.8%) to the corresponding enzyme from Salmonella typhimurium. It also displays significant homology to the products of the S. typhimurium opdA and the E. coli prlC genes and to some metalloproteases from rats and Saccharomyces cerevisiae. No potential export signals could be inferred from the amino acid sequence. Dipeptidyl carboxypeptidase was enriched 80-fold from crude extracts of E. coli and used to investigate some of its biochemical and biophysical properties. PMID:8226676

  12. Linking the human cytogenetic map with nucleotide sequence: the CCAP clone set.

    PubMed

    Jang, Wonhee; Yonescu, Raluca; Knutsen, Turid; Brown, Theresa; Reppert, Tricia; Sirotkin, Karl; Schuler, Gregory D; Ried, Thomas; Kirsch, Ilan R

    2006-07-15

    We present the completed dataset and clone repository of the Cancer Chromosome Aberration Project (CCAP), an initiative developed and funded through the intramural program of the U.S. National Cancer Institute, to provide seamless linkage of human cytogenetic markers with the primary nucleotide sequence of the human genome. Spaced at 1-2 Mb intervals across the human genome, 1,339 bacterial artificial chromosome (BAC) clones have been localized to chromosomal bands through high-resolution fluorescence in situ hybridization (FISH) mapping. Of these clones, 99.8% can be positioned on the primary human genome sequence and 95% are placed at or close to their precise nucleotide starts and stops. This dataset can be studied and manipulated within generally available public Web sites. The clones are available from a commercial repository. The CCAP BAC clone set provides anchors for the interrogation of gene and sequence involvement in oncogenic and developmental disorders when the starting point is the recognition of a structural, numerical, or interstitial chromosomal aberration. This dataset also provides a current view of the quality and coherence of the available genome sequence and insight into the nucleotide and three-dimensional structures that manifest as Giemsa light and dark chromosomal banding patterns. PMID:16843097

  13. Construction of a high-density genetic map for grape using next generation restriction-site associated DNA sequencing

    PubMed Central

    2012-01-01

    Background Genetic mapping and QTL detection are powerful methodologies in plant improvement and breeding. Construction of a high-density and high-quality genetic map would be of great benefit in the production of superior grapes to meet human demand. High throughput and low cost of the recently developed next generation sequencing (NGS) technology have resulted in its wide application in genome research. Sequencing restriction-site associated DNA (RAD) might be an efficient strategy to simplify genotyping. Combining NGS with RAD has proven to be powerful for single nucleotide polymorphism (SNP) marker development. Results An F1 population of 100 individual plants was developed. In-silico digestion-site prediction was used to select an appropriate restriction enzyme for construction of a RAD sequencing library. Next generation RAD sequencing was applied to genotype the F1 population and its parents. Applying a cluster strategy for SNP modulation, a total of 1,814 high-quality SNP markers were developed: 1,121 of these were mapped to the female genetic map, 759 to the male map, and 1,646 to the integrated map. A comparison of the genetic maps to the published Vitis vinifera genome revealed both conservation and variations. Conclusions The applicability of next generation RAD sequencing for genotyping a grape F1 population was demonstrated, leading to the successful development of a genetic map with high density and quality using our designed SNP markers. Detailed analysis revealed that this newly developed genetic map can be used for a variety of genome investigations, such as QTL detection, sequence assembly and genome comparison. PMID:22908993
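    The abstract mentions in-silico digestion-site prediction for choosing a restriction enzyme for the RAD library. A minimal sketch of that idea, under stated assumptions: scan a genome string for each candidate enzyme's recognition site, record the cut positions, and estimate RAD loci as two tags per site (one on each side of the cut). The three enzymes listed are hypothetical candidates for illustration; the abstract does not name the enzyme that was used.

    ```python
    # Recognition sequence and cut offset (top-strand cut position counted from the
    # first base of the site). All three sites are palindromic, so scanning one
    # strand finds every site. Candidate list is illustrative only.
    ENZYMES = {
        "EcoRI": ("GAATTC", 1),   # G^AATTC
        "PstI": ("CTGCAG", 5),    # CTGCA^G
        "SbfI": ("CCTGCAGG", 6),  # CCTGCA^GG
    }


    def cut_positions(genome, site, offset):
        genome = genome.upper()
        cuts = []
        hit = genome.find(site)
        while hit != -1:
            cuts.append(hit + offset)
            hit = genome.find(site, hit + 1)  # hit + 1 also catches overlapping sites
        return cuts


    def fragment_lengths(genome, cuts):
        bounds = [0] + cuts + [len(genome)]
        return [end - start for start, end in zip(bounds, bounds[1:])]


    def rad_summary(genome):
        """Sites and approximate RAD tags (one tag flanking each side of a cut) per enzyme."""
        summary = {}
        for name, (site, offset) in ENZYMES.items():
            n = len(cut_positions(genome, site, offset))
            summary[name] = {"sites": n, "rad_tags": 2 * n}
        return summary


    toy = "AAGAATTCTTTTGAATTCAA"
    print(fragment_lengths(toy, cut_positions(toy, "GAATTC", 1)))  # [3, 10, 7]
    print(rad_summary(toy)["EcoRI"])  # {'sites': 2, 'rad_tags': 4}
    ```

    Run against a reference genome, comparing `rad_summary` across enzymes gives a rough idea of how many loci each library would sample, which is the quantity one would weigh against sequencing depth.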

  14. Progress towards the construction of a sequence-ready physical map of the 3AS chromosome arm of hexaploid wheat

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The large genome size (~17 Gb), polyploid nature, and repetitive sequence content (>90%) of hexaploid wheat (Triticum aestivum) present a challenge for constructing physical maps, which are fundamental resources to aid genomic sequencing and annotation, and gene cloning. One approach to reduce the c...

  15. Geologic map of southwestern Sequoia National Park and vicinity, Tulare County, California, including the Mineral King metamorphic pendant

    NASA Astrophysics Data System (ADS)

    Sisson, T. W.; Moore, J. G.

    2012-12-01

    From the late 1940s to the early 1990s, scientists of the U.S. Geological Survey (USGS) mapped the geology of most of Sequoia and Kings Canyon National Parks, California, and published the results as a series of 15-minute (1:62,500 scale) Geologic Quadrangles. The southwest corner of Sequoia National Park, encompassing the Mineral King and eastern edge of the Kaweah 15-minute topographic quadrangles, however, remained unfinished. At the request of the National Park Service's Geologic Resources Division (NPS-GRD), the USGS has mapped the geology of that area using 7.5-minute (1:24,000 scale) topographic bases and high-resolution ortho-imagery. With partial support from NPS-GRD, the major plutons in the map area were dated by the U-Pb zircon method with the Stanford-USGS SHRIMP-RG ion microprobe. Highlights include: (1) Identification of the Early Cretaceous volcano-plutonic suite of Mineral King (informally named), consisting of three deformed granodiorite plutons and the major metarhyolite tuffs of the Mineral King metamorphic pendant. Members of the suite erupted or intruded at 130-140 Ma (pluton ages: this study; rhyolite ages: lower-intercept concordia from zircon results of Busby-Spera, 1983, Princeton Ph.D. thesis, and from Klemetti et al., 2011, AGU abstract) during the pause of igneous activity between emplacement of the Jurassic and Cretaceous Sierran batholiths. (2) Some of the deformation of the Mineral King metamorphic pendant is demonstrably Cretaceous, with evidence including map-scale folding of Early Cretaceous metarhyolite tuff, and an isoclinally folded aplite dike dated at 98 Ma, concurrent with the large 98-Ma granodiorite of Castle Creek that intruded the Mineral King pendant on the west. (3) A 21-km-long magmatic synform within the 99-100 Ma granite of Coyote Pass that is defined both by inward-dipping mafic inclusions (enclaves) and by sporadic, cm-thick, sharply defined mineral layering. The west margin of the granite of Coyote Pass overlies ...

  16. WebGMAP: a web service for mapping and aligning cDNA sequences to genomes

    PubMed Central

    Liang, Chun; Liu, Lin; Ji, Guoli

    2009-01-01

    The genomes of thousands of organisms are being sequenced, often with accompanying sequences of cDNAs or ESTs. One of the great challenges in bioinformatics is to make these genomic sequences and genome annotations accessible in a user-friendly manner to general biologists to address interesting biological questions. We have created an open-access web service called WebGMAP (http://www.bioinfolab.org/software/webgmap) that seamlessly integrates cDNA-genome alignment tools, such as GMAP, with easy-to-use data visualization and mining tools. This web service is intended to facilitate community efforts in improving genome annotation, determining accurate gene structures and their variations, and exploring important biological processes such as alternative splicing and alternative polyadenylation. For routine sequence analysis, WebGMAP provides a web-based sequence viewer with many useful functions, including nucleotide positioning, six-frame translations, sequence reverse complementation, and imperfect motif detection and alignment. WebGMAP also provides users with the ability to sort, filter and search for individual cDNA sequences and cDNA-genome alignments. Our EST-Genome-Browser can display annotated gene structures and cDNA-genome alignments at scales from 100 to 50 000 nt. With its ability to highlight base differences between query cDNAs and the genome, our EST-Genome-Browser allows biologists to discover potential point or insertion-deletion variations from cDNA-genome alignments. PMID:19465381
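    Two of the viewer functions listed above, reverse complementation and six-frame translation, are simple enough to sketch. The code below is a standalone illustration using the standard genetic code (NCBI translation table 1), not WebGMAP's implementation; the frame labels (+1 to +3 on the input strand, -1 to -3 read from the 5' end of the reverse complement) and the use of `X` for codons containing ambiguous bases are assumptions.

    ```python
    BASES = "TCAG"
    # Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
    AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
    CODON_TABLE = {
        a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)
    }
    COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


    def reverse_complement(seq):
        return seq.translate(COMPLEMENT)[::-1]


    def translate(seq):
        """Translate codon by codon; '*' marks a stop, 'X' an ambiguous codon."""
        seq = seq.upper()
        return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


    def six_frame_translations(seq):
        rc = reverse_complement(seq)
        frames = {}
        for offset in range(3):
            frames[f"+{offset + 1}"] = translate(seq[offset:])
            frames[f"-{offset + 1}"] = translate(rc[offset:])
        return frames


    print(six_frame_translations("ATGGCCTAA"))
    # {'+1': 'MA*', '-1': 'LGH', '+2': 'WP', '-2': '*A', '+3': 'GL', '-3': 'RP'}
    ```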

  17. Mapping the Pheno-Structure of Didactic Sequences. Didakometry No. 1.

    ERIC Educational Resources Information Center

    Bjerstedt, Ake

    A framework is provided for the evaluation of self instructional materials before the materials are ready for field testing. Several aids are offered to assist the development of an evaluation model, including: checklists, maps of relations between terminal objectives and single didactic units, and unit charting protocols. Checklist questions…

  18. High resolution radiation hybrid maps of bovine chromosomes 19 and 29: comparison with the bovine genome sequence assembly

    PubMed Central

    Prasad, Aparna; Schiex, Thomas; McKay, Stephanie; Murdoch, Brenda; Wang, Zhiquan; Womack, James E; Stothard, Paul; Moore, Stephen S

    2007-01-01

    Background High resolution radiation hybrid (RH) maps can facilitate genome sequence assembly by correctly ordering genes and genetic markers along chromosomes. The objective of the present study was to generate high resolution RH maps of bovine chromosomes 19 (BTA19) and 29 (BTA29), and compare them with the current 7.1X bovine genome sequence assembly (bovine build 3.1). We have chosen BTA19 and 29 as candidate chromosomes for mapping, since many Quantitative Trait Loci (QTL) for the traits of carcass merit and residual feed intake have been identified on these chromosomes. Results We have constructed high resolution maps of BTA19 and BTA29 consisting of 555 and 253 Single Nucleotide Polymorphism (SNP) markers, respectively, using a 12,000 rad whole genome RH panel. With these markers, the RH maps of BTA19 and BTA29 extended to 4591.4 cR and 2884.1 cR in length, respectively. When aligned with the current bovine build 3.1, the order of markers on the RH maps for BTA19 and 29 showed inconsistencies with respect to the genome assembly. Maps of both chromosomes show a significant internal rearrangement of the markers involving displacement, inversion and flips within the scaffolds, with some scaffolds being misplaced in the genome assembly. We also constructed cattle-human comparative maps of these chromosomes, which showed an overall agreement with the comparative maps published previously. However, minor discrepancies in the orientation of a few homologous synteny blocks were observed. Conclusion The high resolution maps of BTA19 (average 1 locus/139 kb) and BTA29 (average 1 locus/208 kb) presented in this study suggest that by the incorporation of RH mapping information, the current bovine genome sequence assembly can be significantly improved. Furthermore, these maps can serve as a potential resource for fine mapping QTL and identification of causative mutations underlying QTL for economically important traits. PMID:17784962

  19. Ordered shotgun sequencing of a 135 kb Xq25 YAC containing ANT2 and four possible genes, including three confirmed by EST matches.

    PubMed Central

    Chen, C N; Su, Y; Baybayan, P; Siruno, A; Nagaraja, R; Mazzarella, R; Schlessinger, D; Chen, E

    1996-01-01

    Ordered shotgun sequencing (OSS) has been successfully carried out with an Xq25 YAC substrate. yWXD703 DNA was subcloned into lambda phage, and sequences of insert ends of the lambda subclones were used to generate a map to select a minimum tiling path of clones to be completely sequenced. The sequence of 135 038 nt contains the entire ANT2 cDNA as well as four other candidates suggested by computer-assisted analyses. One of the putative genes is homologous to a gene implicated in Graves' disease; it, ANT2 and two others are confirmed by EST matches. The results suggest that OSS can be applied to YACs in accord with earlier simulations and further indicate that the sequence of the YAC accurately reflects the sequence of uncloned human DNA. PMID:8918809
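    Selecting a minimum tiling path from mapped subclones is an interval-cover problem, which a greedy scan solves exactly: among clones starting within the already covered prefix, always take the one that reaches furthest. The sketch below assumes each clone's map interval is already known (for example from its insert-end sequences) and uses half-open coordinates; clone names and coordinates in the usage line are hypothetical, and a real pipeline would also require a minimum overlap between consecutive clones.

    ```python
    def minimum_tiling_path(clones, region_start, region_end):
        """Greedy minimum set of mapped clones covering [region_start, region_end).

        clones: iterable of (name, start, end) with half-open map coordinates.
        Adjacent clones only need to touch here; no minimum overlap is enforced.
        """
        ordered = sorted(clones, key=lambda c: c[1])
        path, covered, i = [], region_start, 0
        while covered < region_end:
            best = None
            # Among clones starting inside the covered prefix, take the one reaching furthest.
            while i < len(ordered) and ordered[i][1] <= covered:
                if best is None or ordered[i][2] > best[2]:
                    best = ordered[i]
                i += 1
            if best is None or best[2] <= covered:
                raise ValueError(f"coverage gap at position {covered}")
            path.append(best[0])
            covered = best[2]
        return path


    # Hypothetical lambda subclones with map positions in kb.
    clones = [("a", 0, 40), ("b", 10, 60), ("c", 30, 90), ("d", 55, 100), ("e", 85, 135), ("f", 95, 120)]
    print(minimum_tiling_path(clones, 0, 135))  # ['a', 'c', 'e']
    ```

    Clones skipped during a scan can be discarded for good: each ends no further than the chosen clone, so it can never extend coverage later.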

  20. Whole Genome Profiling provides a robust framework for physical mapping and sequencing in the highly complex and repetitive wheat genome

    PubMed Central

    2012-01-01

    Background Sequencing projects using a clone-by-clone approach require the availability of a robust physical map. The SNaPshot technology, based on pair-wise comparisons of restriction fragment sizes, has been used recently to build the first physical map of a wheat chromosome and to complete the maize physical map. However, restriction fragment sizes shared randomly between two non-overlapping BACs often lead to chimeric contigs and mis-assembled BACs in such large and repetitive genomes. Whole Genome Profiling (WGP™) was developed recently as a new sequence-based physical mapping technology and has the potential to limit this problem. Results A subset of the wheat 3B chromosome BAC library covering 230 Mb was used to establish a WGP physical map and to compare it to a map obtained with the SNaPshot technology. We first adapted the WGP-based assembly methodology to cope with the complexity of the wheat genome. Then, the results showed that the WGP map covers the same length as the SNaPshot map but with 30% fewer contigs and, more importantly, with 3.5 times fewer mis-assembled BACs. Finally, we evaluated the benefit of integrating WGP tags in different sequence assemblies obtained after Roche/454 sequencing of BAC pools. We showed that while WGP tag integration improves assemblies performed with unpaired reads and with paired-end reads at low coverage, it does not significantly improve sequence assemblies performed at high coverage (25x) with paired-end reads. Conclusions Our results demonstrate that, with a suitable assembly methodology, WGP builds more robust physical maps than the SNaPshot technology in wheat and that WGP can be adapted to any genome. Moreover, WGP tag integration in sequence assemblies improves low quality assemblies. However, to achieve a high quality draft sequence assembly, a sequencing depth of 25x paired-end reads is required, at which point WGP tag integration does not provide additional scaffolding value. Finally, we suggest that WGP ...

  1. MapMyFlu: visualizing spatio-temporal relationships between related influenza sequences.

    PubMed

    Nolte, Nicholas; Kurzawa, Nils; Eils, Roland; Herrmann, Carl

    2015-07-01

    Understanding the molecular dynamics of viral spreading is crucial for anticipating the epidemiological implications of disease outbreaks. In the case of influenza, reassortments or point mutations affect adaptation to new hosts or resistance to anti-viral drugs and can determine whether a new strain will result in a pandemic infection or a less severe progression. To this end, tools integrating molecular information with epidemiological parameters are important for understanding how molecular characteristics are reflected in the infection dynamics. We present a new web tool, MapMyFlu, which spatially and temporally displays influenza viruses related to a query sequence on a Google Map, based on BLAST results against the NCBI Influenza Database. Temporal and geographical trends appear clearly and may help in reconstructing the evolutionary history of a particular sequence. The tool is accessible through a web server, so no local installation is needed. The website has an intuitive design, provides an easy-to-use service, and is available at http://mapmyflu.ipmb.uni-heidelberg.de. PMID:25940623

  2. Information on a Major New Initiative: Mapping and Sequencing the Human Genome (1986 DOE Memorandum)

    DOE R&D Accomplishments Database

    DeLisi, Charles (Associate Director, Health and Environmental Research, DOE Office of Energy Research)

    1986-05-06

    In the history of the Human Genome Program, Dr. Charles DeLisi and Dr. Alvin Trivelpiece of the Department of Energy (DOE) were instrumental in moving the seeds of the program forward. This May 1986 memo from DeLisi to Trivelpiece, Director of DOE's Office of Energy Research, documents this fact. Following the March 1986 Santa Fe workshop on the subject of mapping and sequencing the human genome, DeLisi's memo outlines workshop conclusions, explains the relevance of this project to DOE and the importance of the Department's laboratories and capabilities, notes the critical experience of DOE in managing projects of this scale and potential magnitude, and recognizes the fact that the project will impact biomedical science in ways which could not be fully anticipated at the time. Subsequently, program guidance was further sought from the DOE Health Effects Research Advisory Committee (HERAC) and the April 1987 HERAC report recommended that DOE and the nation commit to a large, multidisciplinary, scientific and technological undertaking to map and sequence the human genome.

  3. MapMyFlu: visualizing spatio-temporal relationships between related influenza sequences

    PubMed Central

    Nolte, Nicholas; Kurzawa, Nils; Eils, Roland; Herrmann, Carl

    2015-01-01

    Understanding the molecular dynamics of viral spreading is crucial for anticipating the epidemiological implications of disease outbreaks. In the case of influenza, reassortments or point mutations affect adaptation to new hosts or resistance to anti-viral drugs and can determine whether a new strain will result in a pandemic infection or a less severe progression. To this end, tools integrating molecular information with epidemiological parameters are important for understanding how molecular characteristics are reflected in the infection dynamics. We present a new web tool, MapMyFlu, which spatially and temporally displays influenza viruses related to a query sequence on a Google Map, based on BLAST results against the NCBI Influenza Database. Temporal and geographical trends appear clearly and may help in reconstructing the evolutionary history of a particular sequence. The tool is accessible through a web server, so no local installation is needed. The website has an intuitive design, provides an easy-to-use service, and is available at http://mapmyflu.ipmb.uni-heidelberg.de. PMID:25940623

  4. High density linkage mapping of genomic and transcriptomic SNPs for synteny analysis and anchoring the genome sequence of chickpea

    PubMed Central

    Gaur, Rashmi; Jeena, Ganga; Shah, Niraj; Gupta, Shefali; Pradhan, Seema; Tyagi, Akhilesh K; Jain, Mukesh; Chattopadhyay, Debasis; Bhatia, Sabhyata

    2015-01-01

    This study presents genome-wide discovery of SNPs through next generation sequencing of the genome of Cicer reticulatum. Mapping of the C. reticulatum sequenced reads onto the draft genome assembly of C. arietinum (desi chickpea) resulted in the identification of 842,104 genomic SNPs, which were used along with an additional 36,446 genic SNPs identified from transcriptome sequences of the aforementioned varieties. Two new chickpea Oligo Pool All (OPAs), each having 3,072 SNPs, were designed and used for SNP genotyping of 129 Recombinant Inbred Lines (RILs). Using Illumina GoldenGate Technology, genotyping data for 5,041 SNPs were generated and combined with data for 1,673 markers from previously published studies to generate a high resolution linkage map. The map comprised 6,698 markers distributed on eight linkage groups spanning 1083.93 cM, with an average inter-marker distance of 0.16 cM. The utility of the present map was demonstrated by improving the anchoring of the earlier reported draft genome sequence of desi chickpea by ~30% and that of kabuli chickpea by 18%. The genetic map reported in this study is the densest linkage map of chickpea, with the potential to facilitate efficient anchoring of the draft genome sequences of both desi and kabuli chickpea varieties. PMID:26303721
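    As a quick consistency check on the reported figures: with n markers spread over g linkage groups there are n - g gaps between adjacent markers, so the average spacing is the total map length divided by that count (dividing by the marker count instead gives the same value to two decimals):

    ```latex
    \bar{d} = \frac{L}{n - g} = \frac{1083.93\ \text{cM}}{6698 - 8} = \frac{1083.93\ \text{cM}}{6690} \approx 0.162\ \text{cM},
    \qquad
    \frac{1083.93\ \text{cM}}{6698} \approx 0.162\ \text{cM}
    ```

    Both agree with the stated average inter-marker distance of 0.16 cM.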

  5. A High-Density SNP Map of Sunflower Derived from RAD-Sequencing Facilitating Fine-Mapping of the Rust Resistance Gene R12

    PubMed Central

    Talukder, Zahirul I.; Gong, Li; Hulke, Brent S.; Pegadaraju, Venkatramana; Song, Qijian; Schultz, Quentin; Qi, Lili

    2014-01-01

    A high-resolution genetic map of sunflower was constructed by integrating SNP data from three F2 mapping populations (HA 89/RHA 464, B-line/RHA 464, and CR 29/RHA 468). The consensus map spanned a total length of 1443.84 cM and consisted of 5,019 SNP markers derived from RAD tag sequencing and 118 publicly available SSR markers distributed in 17 linkage groups, corresponding to the haploid chromosome number of sunflower. The maximum interval between markers in the consensus map is 12.37 cM, and the average distance between adjacent markers is 0.28 cM. Despite a few short-distance inversions in marker order, the consensus map showed high collinearity with the individual maps, with an average Spearman's rank correlation coefficient of 0.972 across the genome. The order of the SSR markers on the consensus map was also in agreement with their order on the individual maps and on previously published sunflower maps. The three individual maps and the consensus map all revealed an uneven distribution of markers across the genome. Additionally, we performed fine mapping and marker validation of the rust resistance gene R12, providing closely linked SNP markers for marker-assisted selection of this gene in sunflower breeding programs. This high resolution consensus map will serve as a valuable tool for the sunflower community for studying marker-trait associations of important agronomic traits, marker-assisted breeding, map-based gene cloning, and comparative mapping. PMID:25014030
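    Collinearity between an individual map and the consensus map can be scored with Spearman's rank correlation over the markers the two maps share. The sketch below computes it as the Pearson correlation of average ranks, so markers tied at the same cM position are handled; the `{marker: position_cM}` input format and the per-linkage-group usage are assumptions, and a genome-wide figure like the one reported would be an average of such per-group values.

    ```python
    import math


    def average_ranks(values):
        """1-based ranks; tied values (e.g. co-segregating markers) share the mean rank."""
        order = sorted(range(len(values)), key=lambda i: values[i])
        ranks = [0.0] * len(values)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
                j += 1
            for t in range(i, j + 1):
                ranks[order[t]] = (i + j) / 2 + 1
            i = j + 1
        return ranks


    def spearman_rho(x, y):
        """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
        rx, ry = average_ranks(x), average_ranks(y)
        n = len(rx)
        mx, my = sum(rx) / n, sum(ry) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
        var_x = sum((a - mx) ** 2 for a in rx)
        var_y = sum((b - my) ** 2 for b in ry)
        return cov / math.sqrt(var_x * var_y)


    def map_collinearity(individual_map, consensus_map):
        """Rho over markers shared by two maps of one linkage group, {marker: position_cM}."""
        shared = sorted(set(individual_map) & set(consensus_map))
        return spearman_rho([individual_map[m] for m in shared],
                            [consensus_map[m] for m in shared])


    print(spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 5, 4]))  # one local inversion
    ```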

  6. Including Faults Detected By Near-Surface Seismic Methods in the USGS National Seismic Hazard Maps - Some Restrictions Apply

    NASA Astrophysics Data System (ADS)

    Williams, R. A.; Haller, K. M.

    2014-12-01

    Every 6 years, the USGS updates the National Seismic Hazard Maps (new version released July 2014) that are intended to help society reduce risk from earthquakes. These maps affect hundreds of billions of dollars in construction costs each year as they are used to develop seismic-design criteria of buildings, bridges, highways, railroads, and provide data for risk assessment that help determine insurance rates. Seismic source characterization, an essential component of hazard model development, ranges from detailed trench excavations across faults at the ground surface to less detailed analysis of broad regions defined mainly on the basis of historical seismicity. Though it is a priority for the USGS to discover new Quaternary fault sources, the discovered faults only become a part of the hazard model if there are corresponding constraints on their geometry (length and depth extent) and slip-rate (or recurrence interval). When combined with fault geometry and slip-rate constraints, near-surface seismic studies that detect young (Quaternary) faults have become important parts of the hazard source model. Examples of seismic imaging studies with significant hazard impact include the Southern Whidbey Island fault, Washington; Santa Monica fault, San Andreas fault, and Palos Verdes fault zone, California; and Commerce fault, Missouri. There are many more faults in the hazard model in the western U.S. than in the expansive region east of the Rocky Mountains due to the higher rate of tectonic deformation, frequent surface-rupturing earthquakes and, in some cases, lower erosion rates. However, the recent increase in earthquakes in the central U.S. has revealed previously unknown faults for which we need additional constraints before we can include them in the seismic hazard maps. Some of these new faults may be opportunities for seismic imaging studies to provide basic data on location, dip, style of faulting, and recurrence.

  7. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes.

    PubMed

    Kalbfleisch, Ted; Heaton, Michael P

    2013-01-01

    Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these species have provided unique insights into mammalian gene function. However, the number of species with reference genomes is small compared to those needed for studying molecular evolutionary relationships in the tree of life. For example, among the even-toed ungulates there are approximately 300 species whose phylogenetic relationships have been calculated in the 10k trees project. Only six of these have reference genomes: cattle, swine, sheep, goat, water buffalo, and bison. Although reference sequences will eventually be developed for additional hoof stock, the resources in terms of time, money, infrastructure and expertise required to develop a quality reference genome may be unattainable for most species for at least another decade. In this work we mapped 35 Gb of next generation sequence data of a Katahdin sheep to its own species' reference genome (Ovis aries Oar3.1) and to that of a species that diverged 15 to 30 million years ago (Bos taurus UMD3.1). In total, 56% of reads covered 76% of UMD3.1 to an average depth of 6.8 reads per site; 83 million variants were identified, of which 78 million were homozygous and likely represent interspecies nucleotide differences. Excluding repeat regions and sex chromosomes, nearly 3.7 million heterozygous sites were identified in this animal vs. bovine UMD3.1, representing polymorphisms occurring in sheep. Of these, 41% could be readily mapped to orthologous positions in ovine Oar3.1, with 80% corroborated as heterozygous. These variant sites, identified via interspecies mapping, could be used for comparative genomics, disease association studies, and ultimately to understand mammalian gene ...
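    The split between homozygous sites (likely fixed sheep-versus-cattle differences) and heterozygous sites (candidate within-sheep polymorphisms) rests on allele fractions at each aligned position. The sketch below labels a biallelic site from reference and alternate read counts; the depth and allele-fraction thresholds are hypothetical and are not the study's calling criteria, which real variant callers replace with a likelihood model.

    ```python
    from collections import Counter


    def classify_site(ref_reads, alt_reads, min_depth=4, hom_alt_min=0.9, het_range=(0.2, 0.8)):
        """Label one biallelic site from read counts against a cross-species reference.

        All thresholds are illustrative assumptions.
        """
        depth = ref_reads + alt_reads
        if depth < min_depth:
            return "low_depth"
        alt_fraction = alt_reads / depth
        if alt_fraction >= hom_alt_min:
            return "hom_alt"  # likely an interspecies (sheep vs. cattle) difference
        if het_range[0] <= alt_fraction <= het_range[1]:
            return "het"      # candidate polymorphism within the sheep itself
        if alt_reads == 0:
            return "ref"
        return "ambiguous"


    sites = [(0, 7), (4, 3), (6, 0), (1, 1), (9, 1)]
    print(Counter(classify_site(ref, alt) for ref, alt in sites))
    ```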

  8. Crop Type Mapping from a Sequence of Terrasar-X Images with Dynamic Conditional Random Fields

    NASA Astrophysics Data System (ADS)

    Kenduiywo, B. K.; Bargiel, D.; Soergel, U.

    2016-06-01

    Crop phenology is dynamic: it changes over the course of the year. These biophysical changes also look spectrally different to remote sensing satellites. Some crops may exhibit similar spectral properties while their phenology coincides, but differ later when their phenology diverges. Thus, conventional approaches that select for classification only the images from phenological stages in which crops are distinguishable have low discrimination. In contrast, stacking all images within a cropping season confines discrimination to a single feature space that can suffer from overlapping classes. Since crop backscatter varies with time, this variation can itself aid discrimination. Therefore, our main objective is to develop a crop sequence classification method using multitemporal TerraSAR-X images. We adopt a first-order Markov assumption in an undirected temporal graph sequence. This property is exploited to implement Dynamic Conditional Random Fields (DCRFs). Our DCRF model has a repeated structure of temporally connected Conditional Random Fields (CRFs). Each node in the sequence is connected to its predecessor via a conditional probability matrix. The matrix is computed using posterior class probabilities from the association potential. In this way, phenological information observed in the TerraSAR-X images is exchanged mutually across time. When compared with independent epoch classification, the designed DCRF model improved crop discrimination at each epoch in the sequence. However, governments, insurers, agricultural market traders, and other stakeholders are interested in the quantity of a given crop in a season. Therefore, we further develop a DCRF ensemble classifier. The ensemble produces an optimal crop map by maximizing over posterior class probabilities selected from the sequence based on maximum F1-score and weighted by correctness. Our ensemble technique is compared with the standard approach of stacking all images as bands for classification using a Maximum Likelihood Classifier (MLC) and with standard CRFs. It
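    The temporal coupling described above, where each epoch is linked to its predecessor through a conditional probability matrix, can be illustrated with a generic first-order Markov forward pass over per-pixel class posteriors. This is a sketch under assumptions, not the authors' DCRF: the transition matrix is passed in directly (the abstract derives it from association-potential posteriors, which is not reproduced here), and the function name `forward_filter` is hypothetical.

```python
from typing import List


def normalize(v: List[float]) -> List[float]:
    """Rescale a non-negative vector so it sums to 1."""
    total = sum(v)
    return [x / total for x in v]


def forward_filter(local_posteriors: List[List[float]],
                   transition: List[List[float]]) -> List[List[float]]:
    """Propagate class beliefs through a temporal chain of image epochs.

    local_posteriors[t][c]: class probability for one pixel from epoch t alone.
    transition[i][j]: assumed P(class j at epoch t | class i at epoch t-1).
    Returns the filtered class distribution at every epoch.
    """
    n = len(transition)
    beliefs = [normalize(local_posteriors[0])]
    for t in range(1, len(local_posteriors)):
        prev = beliefs[-1]
        # Predict: push the previous belief through the transition matrix.
        predicted = [sum(prev[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
        # Update: combine the prediction with the evidence seen at epoch t.
        beliefs.append(normalize([p * e for p, e in
                                  zip(predicted, local_posteriors[t])]))
    return beliefs
```

    For example, if epoch 2 alone cannot separate two crops (0.5/0.5) but epoch 1 favored the first class, the filtered belief at epoch 2 still leans toward the first class, which is the kind of temporal information exchange the abstract describes.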

  9. B1 Mapping of Short T2* Spins Using a 3D Radial Gradient Echo Sequence

    PubMed Central

    Kobayashi, Naoharu; Garwood, Michael

    2014-01-01

    Purpose To develop a method to acquire a radiofrequency (B1) field map when the signal has a short T2*. Theory and Methods The method is based on the actual flip angle imaging (AFI) technique and a radial 3D gradient-echo sequence known as COncurrent Dephasing and Excitation (CODE), which preserves short T2* signals. CODE was implemented with Gradient-modulated Offset-Independent Adiabaticity (GOIA) pulses to obtain high estimation sensitivity with AFI. The correlation method, which removes the quadratic phase from the frequency-modulated pulse excitation, was modified to handle gradient-modulated pulses. Validity of the modified correlation procedure was tested by Bloch simulations. CODE experiments with sinc, hyperbolic secant, and GOIA pulses were performed to assess the effects of frequency and gradient modulation. Finally, GOIA-CODE AFI was conducted and compared with conventional AFI with 3D GRE. Results The modified correlation method developed to accommodate the frequency and gradient modulations of GOIA performed well, as judged by the minimal impact on reconstructed image quality. GOIA-CODE AFI provided flip angle maps consistent with those measured by GRE AFI when the T2* was long (> 2 ms) and continued to perform well for short T2* signals. Conclusion The proposed technique provides a means to obtain a 3D B1 field map when imaging spins with short T2*. PMID:23754634
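    AFI estimates the flip angle from the ratio of signals acquired after two interleaved repetition times. In the commonly cited idealized form (perfect spoiling, TR1 and TR2 much shorter than T1), the ratio is r = S2/S1 = (1 + n cos α)/(n + cos α) with n = TR2/TR1, which inverts to α = arccos((r n − 1)/(n − r)). The sketch below shows only that textbook relation; it does not model the GOIA-CODE acquisition or the modified correlation step, and the function names are hypothetical.

```python
import math


def afi_ratio(alpha_rad: float, n: float) -> float:
    """Idealized AFI signal ratio r = S2/S1 for flip angle alpha and n = TR2/TR1."""
    c = math.cos(alpha_rad)
    return (1 + n * c) / (n + c)


def afi_flip_angle(r: float, n: float) -> float:
    """Invert the idealized AFI ratio: alpha = arccos((r*n - 1) / (n - r))."""
    arg = (r * n - 1) / (n - r)
    # Clamp so noise pushing the argument slightly outside [-1, 1] does not raise.
    return math.acos(max(-1.0, min(1.0, arg)))
```

    A B1 map then follows as the ratio of the measured flip angle to the nominal one at each voxel; with n = 5 and a true 60° flip, the ratio is 3.5/5.5 and the inversion returns 60°.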

  10. A Comprehensive Genome-Wide Map of Autonomously Replicating Sequences in a Naive Genome

    PubMed Central

    Liachko, Ivan; Bhaskar, Anand; Lee, Chanmi; Chung, Shau Chee Claire

    2010-01-01

    Eukaryotic chromosomes initiate DNA synthesis from multiple replication origins. The machinery that initiates DNA synthesis is highly conserved, but the sites where the replication initiation proteins bind have diverged significantly. Functional comparative genomics is an obvious approach to studying the evolution of replication origins. However, to date, the Saccharomyces cerevisiae replication origin map is the only such genome-wide map available. Using an iterative approach that combines computational prediction and functional validation, we have generated a high-resolution genome-wide map of DNA replication origins in Kluyveromyces lactis. Unlike those of other yeasts or metazoans, K. lactis autonomously replicating sequences (KlARSs) contain a 50 bp consensus motif suggestive of a dimeric structure. This motif is necessary and largely sufficient for initiation and was used to reliably identify 145 of the up to 156 non-repetitive intergenic ARSs projected for the K. lactis genome. Although the two genomes are similar in size, K. lactis has half as many ARSs as its distant relative S. cerevisiae. Comparative genomic analysis shows that ARSs in K. lactis and S. cerevisiae preferentially localize to non-syntenic intergenic regions, linking ARSs with loci of accelerated evolutionary change. PMID:20485513

  11. Isolation and refined regional mapping of expressed sequences from human chromosome 21

    SciTech Connect

    Kao, F.T.; Yu, J.; Patterson, D.

    1994-10-01

    To increase the number of candidate genes from human chromosome 21 for the analysis of Down syndrome and other genetic diseases localized on this chromosome, we have isolated and studied 9 cDNA clones encoded by chromosome 21. To isolate cDNAs, single-copy microclones from a chromosome 21 microdissection library were used to directly screen various cDNA libraries. Seven of the cDNA clones have been regionally mapped on chromosome 21 using a comprehensive hybrid mapping panel comprising 24 cell hybrids that divide the chromosome into 33 subregions. These cDNA clones with refined mapping positions should be useful for the identification and cloning of genes responsible for the specific component phenotypes of Down syndrome and other diseases on chromosome 21, including progressive myoclonus epilepsy in 21q22.3. 12 refs., 2 figs., 1 tab.

  12. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

    PubMed

    McKenna, Aaron; Hanna, Matthew; Banks, Eric; Sivachenko, Andrey; Cibulskis, Kristian; Kernytsky, Andrew; Garimella, Kiran; Altshuler, David; Gabriel, Stacey; Daly, Mark; DePristo, Mark A

    2010-09-01

    Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult even for computationally sophisticated individuals. Indeed, many professionals are limited in the scope of the scientific questions they can answer, and in the ease with which they can answer them, by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency, and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools such as coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects such as the 1000 Genomes Project and The Cancer Genome Atlas. PMID:20644199
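    The MapReduce pattern behind the GATK can be illustrated with a toy coverage calculator. This is a sketch under assumptions, not the GATK's own (Java) walker API: each read is reduced to a hypothetical `(contig, start, end)` tuple with half-open coordinates, the map step turns one read into per-locus counts, and the reduce step merges those counts.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, Tuple

# Hypothetical minimal read record: (contig, start, end), half-open interval.
Read = Tuple[str, int, int]


def map_read(read: Read) -> Counter:
    """Map step: one read contributes a count of 1 at every locus it covers."""
    contig, start, end = read
    return Counter((contig, pos) for pos in range(start, end))


def reduce_counts(acc: Counter, part: Counter) -> Counter:
    """Reduce step: merge per-locus counts (addition, so order does not matter)."""
    acc.update(part)
    return acc


def coverage(reads: Iterable[Read]) -> Counter:
    """Per-locus depth computed as reduce(merge, map(map_read, reads))."""
    return reduce(reduce_counts, map(map_read, reads), Counter())
```

    Because merging counts is associative and commutative, partial results computed on separate shards could be combined in any grouping, which is the property that lets a framework of this shape parallelize the work without changing the analysis code.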

  13. Genetic linkage maps for Asian and American lotus constructed using novel SSR markers derived from the genome of sequenced cultivar

    PubMed Central

    2012-01-01

    Background The genus Nelumbo Adans. comprises two living species, N. nucifera Gaertn. (Asian lotus) and N. lutea Pers. (American lotus). A genetic linkage map is an essential resource for plant genetic studies and crop improvement but has not yet been generated for Nelumbo. We aimed to develop genomic simple sequence repeat (SSR) markers from the genome sequence and construct two genetic maps for Nelumbo to assist genome assembly and the integration of a genetic map with the genome sequence. Results A total of 86,089 SSR motifs were identified from the genome sequences. Di- and tri-nucleotide repeat motifs were the most abundant, accounting for 60.73% and 31.66% of all SSRs, respectively. AG/GA repeats constituted 51.17% of dinucleotide repeat motifs, followed by AT/TA (44.29%). Of 500 SSR primers tested, 386 (77.20%) produced scorable alleles, with an average of 2.59 per primer, and 185 (37.00%) showed polymorphism between the two parental genotypes, N. nucifera ‘Chinese Antique’ and N. lutea ‘AL1’, and six progeny of their F1 population. The normally segregating markers, which comprised 268 newly developed SSRs, 37 previously published SSRs, and 53 sequence-related amplified polymorphism markers, were used for genetic map construction. The map for Asian lotus was 365.67 cM, with 47 markers distributed in seven linkage groups. The map for American lotus was 524.51 cM and contained 177 markers distributed in 11 genetic linkage groups. The number of markers per linkage group ranged from three to 34, with an average genetic distance of 3.97 cM between adjacent markers. Moreover, 171 SSR markers contained in linkage groups were anchored to 97 genomic DNA sequence contigs of ‘Chinese Antique’. The 97 contigs were merged into 60 scaffolds. Conclusion Genetic mapping of SSR markers derived from sequenced contigs in Nelumbo enabled the associated contigs to be anchored in the linkage map and facilitated assembly of the genome sequences of ‘Chinese Antique’. The
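    Mining di- and tri-nucleotide SSRs from a genome sequence amounts to scanning for short motifs repeated in tandem. The sketch below does this with a back-referencing regular expression. The minimum copy numbers in `MIN_REPEATS` are hypothetical placeholders (the abstract does not state the thresholds used), and mononucleotide runs are skipped so that, for example, a run of A's is not reported as an "AA" dinucleotide repeat.

```python
import re
from typing import List, Tuple

# Hypothetical minimum copy numbers per motif length; not taken from the study.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for di- and tri-nucleotide tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        # Group 2 captures one k-mer motif; \2{m,} requires at least m more copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if len(set(motif)) == 1:
                continue  # skip mononucleotide runs such as AAAA...
            hits.append((m.start(), motif, len(m.group(1)) // k))
    return sorted(hits)
```

    For instance, `find_ssrs("CC" + "AG" * 7 + "TT" + "ATC" * 5 + "G")` reports an AG repeat with 7 copies at position 2 and an ATC repeat with 5 copies at position 18. Collapsing a motif with its rotation or reverse complement (AG with GA, as in the AG/GA class above) would be a further post-processing step not shown here.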

  14. A high-density genetic map of cucumber derived from Specific Length Amplified Fragment sequencing (SLAF-seq)

    PubMed Central

    Xu, Xuewen; Xu, Ruixue; Zhu, Biyun; Yu, Ting; Qu, Wenqin; Lu, Lu; Xu, Qiang; Qi, Xiaohua; Chen, Xuehao

    2015-01-01

    A high-density genetic map provides an essential framework for accurate and efficient genome assembly and QTL fine mapping. Construction of high-density genetic maps has become more feasible since the advent of next-generation sequencing (NGS), which eases SNP discovery and high-throughput genotyping of large populations. In this research, a high-density genetic map of cucumber (Cucumis sativus L.) was successfully constructed across an F2 population by a recently developed Specific Length Amplified Fragment sequencing (SLAF-seq) method. In total, 18.69 GB of data containing 93,460,000 paired-end reads were obtained after preprocessing. The average sequencing depth was 44.92-fold in D8 (female parent), 42.16-fold in Jin5-508 (male parent), and 5.01-fold in each progeny. A total of 79,092 high-quality SLAFs were detected, of which 6784 were polymorphic, and 1892 of the polymorphic markers met the requirements for constructing a genetic map. The genetic map spanned 845.87 cM with an average genetic distance of 0.45 cM between markers. Owing to its high marker density and well-ordered markers, it is a reliable linkage map for fine mapping and molecular breeding of cucumber. PMID:25610449
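    The reported average marker spacing can be checked with simple arithmetic. The abstract does not say exactly how the figure was computed; the sketch assumes the common approximation of total map length divided by the number of mapped markers.

```python
def mean_marker_interval(map_length_cm: float, n_markers: int) -> float:
    """Approximate average marker spacing: total map length / number of markers."""
    return map_length_cm / n_markers


# Cucumber SLAF-seq map: 845.87 cM carried by 1892 markers.
print(round(mean_marker_interval(845.87, 1892), 2))  # 0.45
```

    A stricter definition divides by the number of adjacent-marker gaps (markers minus linkage groups); with cucumber's seven chromosomes that gives about 0.449 cM, which also rounds to 0.45.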

  15. A consensus linkage map for sugi (Cryptomeria japonica) from two pedigrees, based on microsatellites and expressed sequence tags.

    PubMed Central

    Tani, Naoki; Takahashi, Tomokazu; Iwata, Hiroyoshi; Mukai, Yuzuru; Ujino-Ihara, Tokuko; Matsumoto, Asako; Yoshimura, Kensuke; Yoshimaru, Hiroshi; Murai, Masafumi; Nagasaka, Kazutoshi; Tsumura, Yoshihiko

    2003-01-01

    A consensus map for sugi (Cryptomeria japonica) was constructed by integrating linkage data from two unrelated third-generation pedigrees, one derived from a full-sib cross and the other from self-pollination of F1 individuals. The progeny segregation data of the first pedigree were derived from cleaved amplified polymorphic sequences, microsatellites, restriction fragment length polymorphisms, and single nucleotide polymorphisms. The data of the second pedigree were derived from cleaved amplified polymorphic sequences, isozyme markers, morphological traits, random amplified polymorphic DNA markers, and restriction fragment length polymorphisms. Linkage analyses were done for the first pedigree with JoinMap 3.0, using its parameter set for progeny derived by cross-pollination, and for the second pedigree with the parameter set for progeny derived from selfing of F1 individuals. The 11 chromosomes of C. japonica are represented in the consensus map. A total of 438 markers were assigned to 11 large linkage groups, 1 small linkage group, and 1 nonintegrated linkage group from the second pedigree; their total length was 1372.2 cM. On average, the consensus map showed 1 marker every 3.0 cM. PCR-based codominant DNA markers such as cleaved amplified polymorphic sequences and microsatellite markers were distributed in all linkage groups and accounted for about half of the mapped loci. These markers are very useful for integrating different linkage maps, QTL mapping, and comparative mapping for evolutionary studies, especially in species with a large genome size, such as conifers. PMID:14668402

  16. A high-density gene map of loblolly pine (Pinus taeda L.) based on exome sequence capture genotyping.

    PubMed

    Neves, Leandro Gomide; Davis, John M; Barbazuk, William B; Kirst, Matias

    2014-01-01

    Loblolly pine (Pinus taeda L.) is an economically and ecologically important conifer for which a suite of genomic resources is being generated. Despite recent attempts to sequence the large genomes of conifers, their assembly and the positioning of genes remain largely incomplete. The interspecific synteny in pines suggests that a gene-based map would be useful to support genome assemblies and analysis of conifers. To establish a reference gene-based genetic map, we performed exome sequencing of 14729 genes on a mapping population of 72 haploid samples, generating a resource of 7434 sequence variants segregating for 3787 genes. Most markers are single-nucleotide polymorphisms, although short insertions/deletions and multiple nucleotide polymorphisms were also used. Marker segregation in the population was used to generate a high-density, gene-based genetic map. A total of 2841 genes were mapped to pine's 12 linkage groups, with an average of one marker every 0.58 cM. Capture data were used to detect gene presence/absence variations and to position 65 genes on the map. We compared the marker order of genes previously mapped in loblolly pine and found high agreement. We estimated that 4123 genes had enough sequencing depth for reliable detection of markers, suggesting a high marker conversion rate of 92% (3787/4123). This is possible because a significant portion of each gene is captured and sequenced, increasing the chances of identifying a polymorphic site for characterization and mapping. This sub-centiMorgan genetic map provides a valuable resource for gene positioning on chromosomes and a guide for the assembly of a reference pine genome. PMID:24192835

  17. Genome-Wide Single-Nucleotide Polymorphisms Discovery and High-Density Genetic Map Construction in Cauliflower Using Specific-Locus Amplified Fragment Sequencing

    PubMed Central

    Zhao, Zhenqing; Gu, Honghui; Sheng, Xiaoguang; Yu, Huifang; Wang, Jiansheng; Huang, Long; Wang, Dan

    2016-01-01

    Molecular markers and genetic maps play an important role in plant genomics and breeding studies. Cauliflower is an important and distinctive vegetable; however, very few molecular resources have been reported for this species. In this study, a novel specific-locus amplified fragment (SLAF) sequencing strategy was employed for large-scale single nucleotide polymorphism (SNP) discovery and high-density genetic map construction in a double-haploid, segregating population of cauliflower. A total of 12.47 Gb of raw data containing 77.92 M paired-end reads were obtained after processing, and 6815 polymorphic SLAFs between the two parents were detected. The average sequencing depths reached 52.66-fold for the female parent and 49.35-fold for the male parent. Subsequently, these polymorphic SLAFs were used to genotype the population and further filtered based on several criteria to construct a genetic linkage map of cauliflower. Finally, 1776 high-quality SLAF markers, including 2741 SNPs, constituted the linkage map, with an average data integrity of 95.68%. The final map spanned a total genetic length of 890.01 cM with an average marker interval of 0.50 cM, and covered 364.9 Mb of the reference genome. The markers and genetic map developed in this study could provide an important foundation not only for comparative genomics studies within Brassica oleracea but also for quantitative trait loci identification and molecular breeding of cauliflower. PMID:27047515

  18. elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling

    PubMed Central

    Decap, Dries; Fostier, Jan; Reumers, Joke

    2015-01-01

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, and reordering contigs, while producing identical results. What sets elPrep apart is its software architecture, which allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up execution. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1 h 40 min, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23 GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can save several hundred hours of computing time, and thus substantially reduce analysis time and cost. PMID:26182406
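    The single-pass idea can be sketched by composing per-record steps so that every record goes through all of them during one traversal, instead of writing and re-reading a file after each step. This is an illustration under assumptions, not elPrep's implementation: records are hypothetical dicts standing in for parsed SAM lines, only streaming filters are shown (sorting and duplicate marking need a view of many records at once), and the MAPQ cutoff of 20 is an arbitrary example.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, object]                   # hypothetical stand-in for a parsed SAM record
Step = Callable[[Record], Optional[Record]]  # a step returns None to drop the record


def run_single_pass(records: Iterable[Record], steps: List[Step]) -> Iterator[Record]:
    """Apply every per-record step to each record during a single traversal."""
    for rec in records:
        for step in steps:
            rec = step(rec)
            if rec is None:
                break  # dropped by this step; later steps are skipped
        else:
            yield rec  # survived every step


def drop_unmapped(rec: Record) -> Optional[Record]:
    """Remove reads with SAM flag 0x4 (segment unmapped) set."""
    return None if rec["flag"] & 0x4 else rec


def drop_low_mapq(rec: Record) -> Optional[Record]:
    """Remove reads below an example mapping-quality cutoff (hypothetical value)."""
    return None if rec["mapq"] < 20 else rec
```

    Usage: `list(run_single_pass(records, [drop_unmapped, drop_low_mapq]))` reads the input once no matter how many filters are chained, which is the property the abstract credits for the runtime reduction.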

  19. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    PubMed

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has made sequence alignment, a common task, one of the major bottlenecks in computational biology. It is typical for these large datasets and complex computations to require cost-prohibitive High Performance Computing (HPC). As such, parallelised solutions have been proposed, but many exhibit scalability limitations and are incapable of effectively processing "Big Data", the name attributed to datasets that are extremely large, complex, and require rapid processing. The Hadoop framework, composed of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets, but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The "divide and conquer" parallelisation strategy for alignment algorithms can be applied to both datasets and input query sequences. However, scalability is still an issue due to memory constraints and large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and a well-balanced computational workload while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. PMID:25625550
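    Partitioning both the database and the query set yields a grid of independent search tasks, one per (database chunk, query chunk) pair, which is the basic decomposition the abstract alludes to. The sketch shows only that generic cross-product split; how HBlast's "virtual partitioning" avoids physically re-segmenting the database is not reproduced, and `partition` and `make_tasks` are hypothetical names.

```python
from itertools import product
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Q = TypeVar("Q")


def partition(items: Sequence[T], n_parts: int) -> List[List[T]]:
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(items[start:end]))
        start = end
    return chunks


def make_tasks(db: Sequence[T], queries: Sequence[Q],
               db_parts: int, query_parts: int) -> List[Tuple[List[T], List[Q]]]:
    """One independent search task per (database chunk, query chunk) pair."""
    return list(product(partition(db, db_parts), partition(queries, query_parts)))
```

    Splitting 10 database sequences three ways and 3 queries two ways gives 3 × 2 = 6 tasks with near-equal chunk sizes; tuning the two partition counts separately is what lets such a scheme trade per-task memory against the number of tasks.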

  20. Connectivity mapping using a combined gene signature from multiple colorectal cancer datasets identified candidate drugs including existing chemotherapies

    PubMed Central

    2015-01-01

    Background While the discovery of new drugs is a complex, lengthy, and costly process, identifying new uses for existing drugs is a cost-effective approach to therapeutic discovery. Connectivity mapping integrates gene expression profiling with advanced algorithms to connect genes, diseases, and small-molecule compounds, and has been applied in a large number of studies to identify potential drugs, particularly to facilitate drug repurposing. Colorectal cancer (CRC) is a commonly diagnosed cancer with high mortality rates, presenting a worldwide health problem. With the advancement of high-throughput omics technologies, a number of large-scale gene expression profiling studies have been conducted on CRCs, providing multiple datasets in gene expression data repositories. In this work, we systematically apply gene expression connectivity mapping to multiple CRC datasets to identify candidate therapeutics for this disease. Results We developed a robust method to compile a combined gene signature for colorectal cancer across multiple datasets. Connectivity mapping analysis with this signature of 148 genes identified 10 candidate compounds, including irinotecan and etoposide, which are chemotherapy drugs currently used to treat CRCs. These results indicate that we have discovered high-quality connections between the CRC disease state and the candidate compounds, and that the gene signature we created may be used as a potential therapeutic target in treating the disease. The method we proposed is highly effective in generating a high-quality gene signature from multiple datasets; the publication of the combined CRC gene signature and the list of candidate compounds from this work will benefit both the cancer and systems biology research communities for further development and investigation. PMID:26356760
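    The core intuition of connectivity mapping is that a compound whose expression changes oppose a disease signature is a candidate treatment. The sketch below uses a deliberately simplified signed-agreement score to show that intuition; established connectivity map implementations use rank-based (Kolmogorov-Smirnov-style) statistics instead, and the gene names and fold changes in the usage example are hypothetical.

```python
from typing import Dict


def signed_connectivity(signature: Dict[str, int], drug_profile: Dict[str, float]) -> float:
    """Mean agreement between disease-signature directions and drug-induced changes.

    signature: gene -> +1 (up in disease) or -1 (down in disease).
    drug_profile: gene -> log fold change after treatment.
    Returns a value in [-1, 1]; negative means the drug tends to reverse the signature.
    """
    shared = [g for g in signature if g in drug_profile and drug_profile[g] != 0]
    if not shared:
        return 0.0
    agree = sum(signature[g] * (1 if drug_profile[g] > 0 else -1) for g in shared)
    return agree / len(shared)
```

    For a toy signature `{"A": 1, "B": 1, "C": -1}`, a drug profile `{"A": -2.0, "B": -0.5, "C": 1.0}` scores −1.0 (full reversal), while a profile that reverses only two of the three genes scores −1/3.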

  1. Bloch Equations-Based Reconstruction of Myocardium T1 Maps from Modified Look-Locker Inversion Recovery Sequence

    PubMed Central

    Marty, Benjamin; Vignaud, Alexandre; Greiser, Andreas; Robert, Benjamin; de Sousa, Paulo Loureiro; Carlier, Pierre G.

    2015-01-01

    The Modified Look-Locker Inversion recovery (MOLLI) sequence is increasingly used for myocardial T1 mapping but is known to underestimate T1 values. The aim of the study was to quantitatively analyze several sources of error when T1 maps are derived using standard post-processing of the sequence, and to propose a reconstruction approach that takes into account inversion efficacy (η), T2 relaxation during balanced steady-state free-precession readouts, and B1+ inhomogeneities. Contributions of the different sources of error were analyzed using Bloch equation simulations of the MOLLI sequence. Bloch simulations were then combined with the acquisition of fast B1+ and T2 maps to derive more accurate T1 maps. This novel approach was evaluated on phantoms and on five healthy volunteers. Simulations show that T2 variations, B1+ heterogeneities, and inversion efficiency are major confounders for T1 mapping when MOLLI is processed with standard three-parameter fitting. In vitro data indicate that T1 values are accurately derived with the simulation approach, and in vivo data suggest that myocardial T1 values are underestimated by 15% when processed with the standard three-parameter fitting. At the cost of additional acquisitions, this method might be suitable in clinical research protocols for precise tissue characterization, as it decorrelates T1 and T2 effects on the parametric maps provided by the MOLLI sequence and avoids inaccuracies when B1+ is not homogeneous throughout the myocardium. PMID:25962182
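    For context, the "standard three-parameter fitting" that the abstract compares against models the inversion-recovery samples as S(t) = A − B·exp(−t/T1*) and then applies the Look-Locker correction T1 = T1*(B/A − 1). The sketch implements that baseline, not the proposed Bloch-simulation method: for each candidate T1* on a grid the model is linear in A and B, which are solved by ordinary least squares, and the candidate with the smallest residual is kept. The grid and inversion times in the usage example are illustrative assumptions.

```python
import math
from typing import Sequence


def fit_molli_t1(tis: Sequence[float], signals: Sequence[float],
                 t1star_grid: Sequence[float]) -> float:
    """Three-parameter fit S(t) = A - B*exp(-t/T1*), then Look-Locker correction."""
    n = len(tis)
    best = None
    for t1s in t1star_grid:
        x = [-math.exp(-t / t1s) for t in tis]  # rewrite the model as S = A + B*x
        sx, sy = sum(x), sum(signals)
        sxx = sum(v * v for v in x)
        sxy = sum(v * s for v, s in zip(x, signals))
        det = n * sxx - sx * sx
        if abs(det) < 1e-12:
            continue  # degenerate design; skip this candidate
        a = (sxx * sy - sx * sxy) / det  # least-squares intercept (A)
        b = (n * sxy - sx * sy) / det    # least-squares slope (B)
        resid = sum((s - a - b * v) ** 2 for v, s in zip(x, signals))
        if best is None or resid < best[0]:
            best = (resid, a, b, t1s)
    _, a, b, t1s = best
    return t1s * (b / a - 1.0)  # Look-Locker correction: T1 = T1* (B/A - 1)
```

    With noiseless synthetic data generated from A = 1, B = 1.9 and T1 = 1000 ms (so T1* = 1000/0.9 ms), a 0.1 ms grid from 500 to 2000 ms recovers T1 to within 1 ms. The abstract's point is that real MOLLI data violate this simple model (imperfect inversion, T2 and B1+ effects), which is why the fitted T1 is biased in vivo.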

  2. Microbial genome program report: Optical approaches for physical mapping and sequence assembly of the Deinococcus radiodurans chromosome

    SciTech Connect

    Schwartz, David C.

    1999-11-23

    Maps of genomic or cloned DNA are frequently constructed by analyzing the cleavage patterns produced by restriction enzymes. Restriction enzymes are remarkable reagents that faithfully cleave only at specific sequences of between 4 and 8 nucleotides, which vary according to the specific enzyme. Restriction enzymes are reliable, numerous, and easily obtainable; at present, approximately 250 different recognition sequences are represented among thousands of enzymes. Restriction maps characterize gene structure and even entire genomes. Furthermore, such maps provide a useful scaffold for the alignment and verification of sequence data: restriction maps predicted by computer from the sequence are aligned with the actual restriction map. Restriction enzyme action has traditionally been assayed by gel electrophoresis. This technique separates cleaved molecules on the basis of their mobilities under the influence of an applied electric field within a gel separation matrix (small fragments have a greater mobility than large ones). Although gel electrophoresis distinguishes DNA fragments of different sizes (known as a fingerprint), the original order of these fragments remains unknown. Determining the order of such fragments is labor-intensive, especially when making restriction maps of whole genomes; therefore, despite its obvious utility to genome analysis, restriction mapping of whole genomes is not widely used.
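    Predicting a restriction map from sequence, and seeing why a gel loses fragment order, can both be shown in a few lines. The sketch assumes a linear molecule and a palindromic recognition site, so scanning only the top strand finds every site; `cut_offset` is the cut position within the site on the top strand (1 for EcoRI, G^AATTC).

```python
from typing import List


def digest_linear(seq: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths, in sequence order, after cutting at every site occurrence.

    Assumes a linear molecule and a palindromic site (top-strand scan only).
    """
    seq, site = seq.upper(), site.upper()
    cuts, pos = [], seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

    For `"AAAGAATTCTTTTGAATTCAA"` cut with EcoRI, the predicted map is `[4, 10, 7]` in sequence order. A gel reports only the sizes, effectively `sorted(..., reverse=True)` giving `[10, 7, 4]`, so the order along the molecule must be recovered by other means, which is the labor-intensive step the abstract describes.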

  3. A High-Density SNP-Based Linkage Map of the Chicken Genome Reveals Sequence Features Correlated With Recombination Rate

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The resolution of the widely used chicken consensus linkage map was greatly increased by genotyping a total of 12,945 SNPs on the three existing mapping populations in chicken: the Wageningen (WU), East Lansing (EL), and Uppsala (UPP) mapping populations. A total of 8608 SNPs could be included on the m...

  4. MapNext: a software tool for spliced and unspliced alignments and SNP detection of short sequence reads

    PubMed Central

    2009-01-01

    Background Next-generation sequencing technologies provide exciting avenues for studies of transcriptomics and population genomics. There is an increasing need to perform spliced and unspliced alignments of short transcript reads onto a reference genome and to estimate minor allele frequency from sequences of population samples. Results We have designed and implemented MapNext, a software tool for both spliced and unspliced alignments of short sequence reads onto reference sequences, and for automated SNP detection using neighbourhood quality standards. MapNext provides four main analyses: (i) unspliced alignment and clustering of reads, (ii) spliced alignment of transcript reads over intron boundaries, (iii) SNP detection and estimation of minor allele frequency from population sequences, and (iv) storage of result data in a database to make it available for more flexible queries and further analyses. The software tool has been tested using both simulated and real data. Conclusion MapNext is a comprehensive and powerful tool for both spliced and unspliced alignments of short reads and automated SNP detection from population sequences. The simplicity, flexibility, and efficiency of MapNext make it a valuable tool for transcriptomic and population genomic research. PMID:19958476
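    Estimating minor allele frequency from pooled population reads reduces, at a single site, to counting quality-filtered alleles. The sketch is a simplification under stated assumptions: it applies only a per-base Phred threshold (the `min_qual=20` default is a hypothetical choice), whereas neighbourhood quality standards also require adequate quality in the flanking bases of each read, which is not modeled here.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def minor_allele_frequency(observations: Iterable[Tuple[str, int]],
                           min_qual: int = 20) -> Optional[float]:
    """Estimate MAF at one site from pooled (base, Phred quality) observations.

    Bases below min_qual are discarded; returns None if nothing passes.
    """
    counts = Counter(base for base, q in observations if q >= min_qual)
    total = sum(counts.values())
    if total == 0:
        return None
    ranked = counts.most_common()
    minor = ranked[1][1] if len(ranked) > 1 else 0  # second most common allele
    return minor / total
```

    With seven high-quality A calls, three high-quality G calls, and one low-quality T call, the T is discarded and the estimated MAF is 3/10 = 0.3.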

  5. Efficient high-resolution genetic mapping of mouse interspersed repetitive sequence PCR products, toward integrated genetic and physical mapping of the mouse genome.

    PubMed Central

    McCarthy, L; Hunter, K; Schalkwyk, L; Riba, L; Anson, S; Mott, R; Newell, W; Bruley, C; Bar, I; Ramu, E

    1995-01-01

    The ability to carry out high-resolution genetic mapping at high throughput in the mouse is a critical rate-limiting step in the generation of genetically anchored contigs in physical mapping projects and in the mapping of genetic loci for complex traits. To address this need, we have developed an efficient, high-resolution, large-scale genome mapping system. This system is based on the identification of polymorphic DNA sites between mouse strains by using interspersed repetitive sequence (IRS) PCR. Individual cloned IRS PCR products are hybridized to a DNA array of IRS PCR products derived from the DNA of individual mice segregating DNA sequences from the two parental strains. Since gel electrophoresis is not required, large numbers of samples can be genotyped in parallel. By using this approach, we have mapped >450 polymorphic probes with filters containing the DNA of up to 517 backcross mice, potentially allowing a resolution of 0.14 centimorgan. This approach also carries the potential for a high degree of efficiency in the integration of physical and genetic maps, since pooled DNAs representing libraries of yeast artificial chromosomes or other physical representations of the mouse genome can be addressed by hybridization of filter representations of the IRS PCR products of such libraries. PMID:7777502

  6. Mapping Reads on a Genomic Sequence: An Algorithmic Overview and a Practical Comparative Analysis

    PubMed Central

    Martin, Véronique; Zytnicki, Matthias; Fayolle, Julien; Loux, Valentin; Gibrat, Jean-François

    2012-01-01

    Mapping short reads against a reference genome is classically the first step of many next-generation sequencing data analyses, and it should be as accurate as possible. Because of the large number of reads to handle, numerous sophisticated algorithms have been developed in the last 3 years to tackle this problem. In this article, we first review the underlying algorithms used in most of the existing mapping tools, and then we compare the performance of nine of these tools on a well-controlled benchmark built for this purpose. We built a set of reads that exist in single or multiple copies in a reference genome and for which there is no mismatch, and a set of reads with three mismatches. We considered as reference genomes both the human genome and a concatenation of all complete bacterial genomes. On each dataset, we quantified the capacity of the different tools to retrieve all the occurrences of the reads in the reference genome. Special attention was paid to reads uniquely reported and to reads with multiple hits. PMID:22506536
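    Many of the reviewed mappers share a seed-and-verify core: index the reference by k-mers, look up a seed from the read, then check the full read at each candidate position. The sketch handles only exact matches (the mismatch-free read set described above) and reports every occurrence, so multi-copy reads return several positions. Tolerating mismatches, as needed for the three-mismatch set, would require extra seeding or verification logic not shown here; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(ref: str, k: int) -> Dict[str, List[int]]:
    """Hash every k-mer of the reference to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def map_exact(read: str, ref: str, index: Dict[str, List[int]], k: int) -> List[int]:
    """Return every position where the read occurs exactly (seed, then verify)."""
    if len(read) < k:
        raise ValueError("read shorter than seed length")
    # The seed is the read's first k-mer; each candidate is verified against the full read.
    return [p for p in index.get(read[:k], []) if ref.startswith(read, p)]
```

    On the reference `"ACGTACGTTTACGTACGA"` with k = 4, the read `"ACGTACG"` is a multi-hit read found at positions 0 and 10 (the seed also hits position 4 but fails verification), while `"GTTTA"` maps uniquely to position 6.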

  7. A Simple Sequence Repeat- and Single-Nucleotide Polymorphism-Based Genetic Linkage Map of the Brown Planthopper, Nilaparvata lugens

    PubMed Central

    Jairin, Jirapong; Kobayashi, Tetsuya; Yamagata, Yoshiyuki; Sanada-Morimura, Sachiyo; Mori, Kazuki; Tashiro, Kosuke; Kuhara, Satoru; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Yamamoto, Kimiko; Matsumura, Masaya; Yasui, Hideshi

    2013-01-01

    In this study, we developed the first genetic linkage map for the major rice insect pest, the brown planthopper (BPH, Nilaparvata lugens). The linkage map was constructed by integrating linkage data from two backcross populations derived from three inbred BPH strains. The consensus map consists of 474 simple sequence repeats, 43 single-nucleotide polymorphisms, and 1 sequence-tagged site, for a total of 518 markers at 472 unique positions in 17 linkage groups. The linkage groups cover 1093.9 cM, with an average distance of 2.3 cM between loci. The average number of marker loci per linkage group was 27.8. The sex-linkage group was identified by exploiting X-linked and Y-specific markers. Our linkage map and the newly developed markers used to create it constitute an essential resource and a useful framework for future genetic analyses in BPH. PMID:23204257

  8. Genomic mapping of phosphorothioates reveals partial modification of short consensus sequences

    PubMed Central

    Cao, Bo; Chen, Chao; DeMott, Michael S.; Cheng, Qiuxiang; Clark, Tyson A.; Xiong, Xiaolin; Zheng, Xiaoqing; Butty, Vincent; Levine, Stuart S.; Yuan, George; Boitano, Matthew; Luong, Khai; Song, Yi; Zhou, Xiufen; Deng, Zixin; Turner, Stephen W.; Korlach, Jonas; You, Delin; Wang, Lianrong; Chen, Shi; Dedon, Peter C.

    2015-01-01

    Bacterial phosphorothioate (PT) DNA modifications are incorporated by the DndA-E proteins and often function with DndF-H as a restriction-modification (R-M) system, as in Escherichia coli B7A. However, bacteria such as Vibrio cyclitrophicus FF75 lack dndF-H, which points to other PT functions. To better understand PT biology, we report two novel, orthogonal technologies that map PTs across the genomes of B7A and FF75 with >90% agreement: single-molecule real-time (SMRT) sequencing and deep sequencing of iodine-induced cleavage at PT (ICDS). In B7A, we detect PT on both strands of GpsAAC/GpsTTC motifs, but only 18% of the 40,701 possible sites are modified. In contrast, PT in FF75 occurs as a single-strand modification at CpsCA, again with only 14% of 160,541 sites modified. Single-molecule analysis indicates that modification could be partial at any particular genomic site, even with active restriction by DndF-H; direct interaction of the modification proteins with GAAC/GTTC sites was demonstrated with oligonucleotides. These results point to highly unusual target selection by PT modification proteins and rule out known R-M mechanisms. PMID:24899568
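    Enumerating candidate sites is what turns a motif such as GAAC/GTTC into a denominator like the 40,701 sites quoted above. The minimal sketch below scans one strand and reports hits on both strands, using the fact that GTTC is the reverse complement of GAAC, so each double-stranded GAAC/GTTC site appears exactly once. It is an illustrative assumption about how sites could be enumerated, not the authors' pipeline, and it does not by itself reproduce their counts. The final line only applies the rounded percentages from the abstract.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(genome: str, motif: str) -> list:
    """Return (start, strand) for every occurrence of motif on either strand.

    Coordinates are 0-based on the given (top) strand. A '-' hit means the
    reverse complement of motif occurs there, i.e. motif reads 5'->3' on the
    bottom strand. Overlapping occurrences are all reported.
    """
    k = len(motif)
    rc = revcomp(motif)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        if window == rc and rc != motif:  # palindromes are counted once
            hits.append((i, "-"))
    return hits


print(find_sites("GAACTTGTTCAGAAC", "GAAC"))
# [(0, '+'), (6, '-'), (11, '+')]

# Approximate modified-site counts implied by the rounded percentages
print(round(0.18 * 40_701), round(0.14 * 160_541))  # 7326 22476
```

    For a single-strand modification such as CpsCA in FF75, each '+' or '-' hit of CCA would be a separate strand-specific candidate site.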

  9. Mapping DNA methylation by transverse current sequencing: Reduction of noise from neighboring nucleotides

    NASA Astrophysics Data System (ADS)

    Alvarez, Jose; Massey, Steven; Kalitsov, Alan; Velev, Julian

    Nanopore sequencing via transverse current has emerged as a competitive candidate for mapping DNA methylation without the need for bisulfite treatment, fluorescent tags, or PCR amplification. Eliminating the error-producing amplification step makes long read lengths feasible, which greatly simplifies the assembly process and reduces the time and cost inherent in current technologies. However, because of the high error rates of nanopore sequencing, single-base resolution has not yet been reached. An important source of noise is the intrinsic structural noise in the electrical signature of a nucleotide that arises from the influence of neighboring nucleotides. In this work, we calculate the tunneling current through DNA molecules in nanopores using the non-equilibrium electron transport method within an effective multi-orbital tight-binding model derived from first-principles calculations. We develop a base-calling algorithm that accounts for the correlations of the current through neighboring bases and that, in principle, can reduce the error rate to any desired level. Using this method, we show that DNA methylation and other base modifications can be clearly distinguished from the tunneling-current readout.
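    The abstract does not describe the base-calling algorithm in detail. As a hedged sketch of the general idea, decoding a current trace while accounting for a neighboring base, the example below swaps in a standard Viterbi decoder for a first-order model: the expected current at each position depends on that base and the one before it, with Gaussian noise and uniform transitions. The current levels, the neighbor shifts, and the use of "M" for a methylated cytosine are hypothetical values chosen for illustration, not results of the tight-binding calculations.

```python
def viterbi_basecall(currents, mean, alphabet, sigma=1.0):
    """Most likely base string given a current trace.

    mean[(prev, cur)] is the expected current when base `cur` sits in the pore
    after base `prev` ("" for the first position). Noise is assumed Gaussian
    with a shared sigma, so the log-likelihood reduces to a squared error.
    """
    def log_emit(x, prev, cur):
        return -((x - mean[(prev, cur)]) ** 2) / (2 * sigma ** 2)

    # score[b]: best log-likelihood of any path ending in base b so far
    score = {b: log_emit(currents[0], "", b) for b in alphabet}
    back = []  # back[i][b]: best predecessor of base b at position i + 1
    for x in currents[1:]:
        new_score, pointers = {}, {}
        for cur in alphabet:
            prev = max(alphabet, key=lambda p: score[p] + log_emit(x, p, cur))
            new_score[cur] = score[prev] + log_emit(x, prev, cur)
            pointers[cur] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final base
    seq = [max(alphabet, key=score.get)]
    for pointers in reversed(back):
        seq.append(pointers[seq[-1]])
    return "".join(reversed(seq))


# Hypothetical levels; "M" stands for a methylated C. The preceding base shifts
# the current upward, which is the neighbor effect being corrected for.
LEVEL = {"A": 1.0, "C": 2.0, "M": 3.0}
SHIFT = {"": 0.0, "A": 0.0, "C": 0.5, "M": 1.0}
MEAN = {(p, c): LEVEL[c] + SHIFT[p] for p in SHIFT for c in LEVEL}


def naive_call(currents):
    """Ignore neighbors: pick the base whose isolated level is closest."""
    return "".join(min(LEVEL, key=lambda b: abs(x - LEVEL[b])) for x in currents)


true_seq = "ACMMCA"
trace = [MEAN[(p, c)] for p, c in zip([""] + list(true_seq[:-1]), true_seq)]
print(viterbi_basecall(trace, MEAN, "ACM"))  # ACMMCA

mc = [MEAN[("", "M")], MEAN[("M", "C")]]  # a C read right after an M
print(naive_call(mc), viterbi_basecall(mc, MEAN, "ACM"))  # MM MC
```

    In the last example the C that follows an M produces the same current as an isolated M, so a neighbor-blind caller mislabels it, while the decoder that models the preceding base recovers it.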

  10. Whole-genome shotgun optical mapping of Rhodobacter sphaeroides strain 2.4.1 and its use for whole-genome shotgun sequence assembly

    SciTech Connect

    Shou, S.; Kvikstad, E.; Kile, A.; Severin, J.; Forrest, D.; Runnheim, R.; Churas, C.; Hickman, J. W.; Mackenzie, C.; Choudhary, M.; Donohue, T.; Kaplan, S.; Schwartz, D. C.

    2003-09-01

    Rhodobacter sphaeroides 2.4.1 is a facultative photoheterotrophic bacterium with tremendous metabolic diversity, which has significantly contributed to our understanding of the molecular genetics of photosynthesis, photoheterotrophy, nitrogen fixation, hydrogen metabolism, carbon dioxide fixation, taxis, and tetrapyrrole biosynthesis. To further understand this remarkable bacterium, and to accelerate an ongoing sequencing project, two whole-genome restriction maps (EcoRI and HindIII) of R. sphaeroides strain 2.4.1 were constructed using shotgun optical mapping. The approach directly mapped genomic DNA by the random mapping of single molecules. The two maps were used to facilitate sequence assembly by providing an optical scaffold for high-resolution alignment and verification of sequence contigs. Our results show that such maps facilitated the closure of sequence gaps by the early detection of nascent sequence contigs during the course of the whole-genome shotgun sequencing process.
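    Aligning sequence contigs to an optical map requires predicting, for each contig, the restriction fragments it should produce. The sketch below performs that in silico digest for the two enzymes named above, using their standard recognition sites (EcoRI cuts G^AATTC, HindIII cuts A^AGCTT). The function names are illustrative and this is not the software used in the study. Both sites are palindromic, so scanning one strand finds every site.

```python
# Recognition site and top-strand cut offset for each enzyme
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def cut_positions(seq: str, enzyme: str) -> list:
    """0-based top-strand cut coordinates for every recognition site in seq."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    return cuts


def fragment_sizes(seq: str, enzyme: str, circular: bool = False) -> list:
    """Predicted fragment lengths (bp) for a complete digest of seq.

    With circular=True the fragment spanning the origin is joined, but a site
    that itself spans the origin is not detected.
    """
    cuts = cut_positions(seq, enzyme)
    if not cuts:
        return [len(seq)]
    if circular:
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(len(seq) - cuts[-1] + cuts[0])  # wrap-around fragment
        return sizes
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


contig = "AAAGAATTCAAAAAGAATTCTT"
print(fragment_sizes(contig, "EcoRI"))                 # [4, 11, 7]
print(fragment_sizes(contig, "EcoRI", circular=True))  # [11, 11]
print(fragment_sizes(contig, "HindIII"))               # [22]
```

    The ordered list of predicted sizes is what would be compared with the fragment sizes measured on single molecules when placing and verifying a contig.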