physical map contigs: Topics by Science.gov

Sample records for physical map contigs

A hybrid BAC physical map of potato: a framework for sequencing a heterozygous genome

PubMed Central

2011-01-01

Background Potato is the world's third most important food crop, yet cultivar improvement and genomic research in general remain difficult because of the heterozygous and tetraploid nature of its genome. The development of physical map resources that can facilitate genomic analyses in potato has so far been very limited. Here we present the methods of construction and the general statistics of the first two genome-wide BAC physical maps of potato, which were made from the heterozygous diploid clone RH89-039-16 (RH). Results First, a gel electrophoresis-based physical map was made by AFLP fingerprinting of 64478 BAC clones, which were aligned into 4150 contigs with an estimated total length of 1361 Mb. Screening of BAC pools, followed by the KeyMaps in silico anchoring procedure, identified 1725 AFLP markers in the physical map, and 1252 BAC contigs were anchored to the ultradense potato genetic map. A second, sequence-tag-based physical map was constructed from 65919 whole genome profiling (WGP) BAC fingerprints and these were aligned into 3601 BAC contigs spanning 1396 Mb. The 39733 BAC clones that overlap between both physical maps provided anchors to 1127 contigs in the WGP physical map, and reduced the number of contigs to around 2800 in each map separately. Both physical maps were 1.64 times longer than the 850 Mb potato genome. Genome heterozygosity and incomplete merging of BAC contigs are two factors that can explain this map inflation. The contig information of both physical maps was united in a single table that describes the hybrid potato physical map. Conclusions The AFLP physical map has already been used by the Potato Genome Sequencing Consortium for sequencing 10% of the heterozygous genome of clone RH on a BAC-by-BAC basis. By layering a new WGP physical map on top of the AFLP physical map, a genetically anchored genome-wide framework of 322434 sequence tags has been created. 
This reference framework can be used for anchoring and ordering of genomic sequences of clone RH (and other potato genotypes), and opens the possibility to finish sequencing of the RH genome in a more efficient way via high throughput next generation approaches. PMID:22142254
A first generation BAC-based physical map of the rainbow trout genome

PubMed Central

Palti, Yniv; Luo, Ming-Cheng; Hu, Yuqin; Genet, Carine; You, Frank M; Vallejo, Roger L; Thorgaard, Gary H; Wheeler, Paul A; Rexroad, Caird E

2009-01-01

Background Rainbow trout (Oncorhynchus mykiss) are the most-widely cultivated cold freshwater fish in the world and an important model species for many research areas. Coupling great interest in this species as a research model with the need for genetic improvement of aquaculture production efficiency traits justifies the continued development of genomics research resources. Many quantitative trait loci (QTL) have been identified for production and life-history traits in rainbow trout. A bacterial artificial chromosome (BAC) physical map is needed to facilitate fine mapping of QTL and the selection of positional candidate genes for incorporation in marker-assisted selection (MAS) for improving rainbow trout aquaculture production. This resource will also facilitate efforts to obtain and assemble a whole-genome reference sequence for this species. Results The physical map was constructed from DNA fingerprinting of 192,096 BAC clones using the 4-color high-information content fingerprinting (HICF) method. The clones were assembled into physical map contigs using the finger-printing contig (FPC) program. The map is composed of 4,173 contigs and 9,379 singletons. The total number of unique fingerprinting fragments (consensus bands) in contigs is 1,185,157, which corresponds to an estimated physical length of 2.0 Gb. The map assembly was validated by 1) comparison with probe hybridization results and agarose gel fingerprinting contigs; and 2) anchoring large contigs to the microsatellite-based genetic linkage map. Conclusion The production and validation of the first BAC physical map of the rainbow trout genome is described in this paper. We are currently integrating this map with the NCCCWA genetic map using more than 200 microsatellites isolated from BAC end sequences and by identifying BACs that harbor more than 300 previously mapped markers. 
The availability of an integrated physical and genetic map will enable detailed comparative genome analyses, fine mapping of QTL, positional cloning, selection of positional candidate genes for economically important traits and the incorporation of MAS into rainbow trout breeding programs. PMID:19814815
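The relationship between consensus bands and physical length quoted in the rainbow trout abstract can be checked with a short calculation. The sketch below back-calculates the average length represented by one consensus band from the two figures given (1,185,157 bands and 2.0 Gb). The abstract does not state the conversion factor that was actually used, so the function names and the resulting per-band value are illustrative assumptions only.

```python
# Figures quoted in the abstract above.
CONSENSUS_BANDS = 1_185_157   # unique fingerprinting fragments in contigs
PHYSICAL_LENGTH_BP = 2.0e9    # estimated physical length (2.0 Gb)


def bp_per_band(length_bp: float, bands: int) -> float:
    """Average number of base pairs represented by one consensus band."""
    if bands <= 0:
        raise ValueError("bands must be positive")
    return length_bp / bands


def estimate_length_bp(bands: int, per_band_bp: float) -> float:
    """Estimate physical length from a band count and a per-band size."""
    return bands * per_band_bp


# Back-calculated factor (an inference, not a value stated in the abstract).
implied_bp_per_band = bp_per_band(PHYSICAL_LENGTH_BP, CONSENSUS_BANDS)
print(f"Implied size per consensus band: {implied_bp_per_band:.0f} bp")
```

The same helper can be applied to a single contig: multiplying its consensus band count by the per-band size gives a rough length estimate for that contig.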
Integrated physical map of bread wheat chromosome arm 7DS to facilitate gene cloning and comparative studies.

PubMed

Tulpová, Zuzana; Luo, Ming-Cheng; Toegelová, Helena; Visendi, Paul; Hayashi, Satomi; Vojta, Petr; Paux, Etienne; Kilian, Andrzej; Abrouk, Michaël; Bartoš, Jan; Hajdúch, Marián; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana

2018-03-08

Bread wheat (Triticum aestivum L.) is a staple food for a significant part of the world's population. The growing demand on its production can be satisfied by improving yield and resistance to biotic and abiotic stress. Knowledge of the genome sequence would aid in discovering genes and QTLs underlying these traits and provide a basis for genomics-assisted breeding. Physical maps and BAC clones associated with them have been valuable resources from which to generate a reference genome of bread wheat and to assist map-based gene cloning. As a part of a joint effort coordinated by the International Wheat Genome Sequencing Consortium, we have constructed a BAC-based physical map of bread wheat chromosome arm 7DS consisting of 895 contigs and covering 94% of its estimated length. By anchoring BAC contigs to one radiation hybrid map and three high resolution genetic maps, we assigned 73% of the assembly to a distinct genomic position. This map integration, interconnecting a total of 1713 markers with ordered and sequenced BAC clones from a minimal tiling path, provides a tool to speed up gene cloning in wheat. The process of physical map assembly included the integration of the 7DS physical map with a whole-genome physical map of Aegilops tauschii and a 7DS Bionano genome map, which together enabled efficient scaffolding of physical-map contigs, even in the non-recombining region of the genetic centromere. Moreover, this approach facilitated a comparison of bread wheat and its ancestor at BAC-contig level and revealed a reconstructed region in the 7DS pericentromere. Copyright © 2018. Published by Elsevier B.V.
A High-throughput AFLP-based Method for Constructing Integrated Genetic and Physical Maps: Progress Toward a Sorghum Genome Map

PubMed Central

Klein, Patricia E.; Klein, Robert R.; Cartinhour, Samuel W.; Ulanch, Paul E.; Dong, Jianmin; Obert, Jacque A.; Morishige, Daryl T.; Schlueter, Shannon D.; Childs, Kevin L.; Ale, Melissa; Mullet, John E.

2000-01-01

Sorghum is an important target for plant genomic mapping because of its adaptation to harsh environments, diverse germplasm collection, and value for comparing the genomes of grass species such as corn and rice. The construction of an integrated genetic and physical map of the sorghum genome (750 Mbp) is a primary goal of our sorghum genome project. To help accomplish this task, we have developed a new high-throughput PCR-based method for building BAC contigs and locating BAC clones on the sorghum genetic map. This task involved pooling 24,576 sorghum BAC clones (∼4× genome equivalents) in six different matrices to create 184 pools of BAC DNA. DNA fragments from each pool were amplified using amplified fragment length polymorphism (AFLP) technology, resolved on a LI-COR dual-dye DNA sequencing system, and analyzed using Bionumerics software. On average, each set of AFLP primers amplified 28 single-copy DNA markers that were useful for identifying overlapping BAC clones. Data from 32 different AFLP primer combinations identified ∼2400 BACs and ordered ∼700 BAC contigs. Analysis of a sorghum RIL mapping population using the same primer pairs located ∼200 of the BAC contigs on the sorghum genetic map. Restriction endonuclease fingerprinting of the entire collection of sorghum BAC clones was applied to test and extend the contigs constructed using this PCR-based methodology. Analysis of the fingerprint data allowed for the identification of 3366 contigs each containing an average of 5 BACs. BACs in ∼65% of the contigs aligned by AFLP analysis had sufficient overlap to be confirmed by DNA fingerprint analysis. In addition, 30% of the overlapping BACs aligned by AFLP analysis provided information for merging contigs and singletons that could not be joined using fingerprint data alone. 
Thus, the combination of fingerprinting and AFLP-based contig assembly and mapping provides a reliable, high-throughput method for building an integrated genetic and physical map of the sorghum genome. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF218263.] PMID:10854411
A comparative physical map reveals the pattern of chromosomal evolution between the turkey (Meleagris gallopavo) and chicken (Gallus gallus) genomes

PubMed Central

2011-01-01

Background A robust bacterial artificial chromosome (BAC)-based physical map is essential for many aspects of genomics research, including an understanding of chromosome evolution, high-resolution genome mapping, marker-assisted breeding, positional cloning of genes, and quantitative trait analysis. To facilitate turkey genetics research and better understand avian genome evolution, a BAC-based integrated physical, genetic, and comparative map was developed for this important agricultural species. Results The turkey genome physical map was constructed based on 74,013 BAC fingerprints (11.9 × coverage) from two independent libraries, and it was integrated with the turkey genetic map and chicken genome sequence using over 41,400 BAC assignments identified by 3,499 overgo hybridization probes along with > 43,000 BAC end sequences. The physical-comparative map consists of 74 BAC contigs, with an average contig size of 13.6 Mb. All but four of the turkey chromosomes were spanned on this map by three or fewer contigs, with 14 chromosomes spanned by a single contig and nine chromosomes spanned by two contigs. This map predicts 20 to 27 major rearrangements distinguishing turkey and chicken chromosomes, despite up to 40 million years of separate evolution between the two species. These data elucidate the chromosomal evolutionary pattern within the Phasianidae that led to the modern turkey and chicken karyotypes. The predominant rearrangement mode involves intra-chromosomal inversions, and there is a clear bias for these to result in centromere locations at or near telomeres in turkey chromosomes, in comparison to interstitial centromeres in the orthologous chicken chromosomes. Conclusion The BAC-based turkey-chicken comparative map provides novel insights into the evolution of avian genomes, a framework for assembly of turkey whole genome shotgun sequencing data, and tools for enhanced genetic improvement of these important agricultural and model species. PMID:21906286
Features of the organization of bread wheat chromosome 5BS based on physical mapping.

PubMed

Salina, Elena A; Nesterov, Mikhail A; Frenkel, Zeev; Kiseleva, Antonina A; Timonova, Ekaterina M; Magni, Federica; Vrána, Jan; Šafář, Jan; Šimková, Hana; Doležel, Jaroslav; Korol, Abraham; Sergeeva, Ekaterina M

2018-02-09

The IWGSC strategy for construction of the reference sequence of the bread wheat genome is based on first obtaining physical maps of the individual chromosomes. Our aim is to develop and use the physical map for analysis of the organization of the short arm of wheat chromosome 5B (5BS) which bears a number of agronomically important genes, including genes conferring resistance to fungal diseases. A physical map of the 5BS arm (290 Mbp) was constructed using restriction fingerprinting and LTC software for contig assembly of 43,776 BAC clones. The resulting physical map covered ~ 99% of the 5BS chromosome arm (111 scaffolds, N50 = 3.078 Mb). SSR, ISBP and zipper markers were employed for anchoring the BAC clones, and from these 722 novel markers were developed based on previously obtained data from partial sequencing of 5BS. The markers were mapped using a set of Chinese Spring (CS) deletion lines, and F2 and RICL populations from a cross of CS and CS-5B dicoccoides. Three approaches have been used for anchoring BAC contigs on the 5BS chromosome, including clone-by-clone screening of BACs, GenomeZipper analysis, and comparison of BAC-fingerprints with in silico fingerprinting of 5B pseudomolecules of T. dicoccoides. These approaches allowed us to reach a high level of BAC contig anchoring: 96% of 5BS BAC contigs were located on 5BS. An interesting pattern was revealed in the distribution of contigs along the chromosome. Short contigs (200-999 kb) containing markers for the regions interrupted by tandem repeats, were mainly localized to the 5BS subtelomeric block; whereas the distribution of larger 1000-3500 kb contigs along the chromosome better correlated with the distribution of the regions syntenic to rice, Brachypodium, and sorghum, as detected by the Zipper approach. 
The high fingerprinting quality, LTC software and large number of BAC clones selected by the informative markers in screening of the 43,776 clones allowed us to significantly increase the BAC scaffold length when compared with the published physical maps for other wheat chromosomes. The genetic and bioinformatics resources developed in this study provide new possibilities for exploring chromosome organization and for breeding applications.
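The 5BS abstract summarizes scaffold contiguity with an N50 value. As a reminder of how that statistic is defined, here is a minimal sketch that computes N50 from a list of lengths. The input lengths are hypothetical toy values, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that pieces of length >= L together
    contain at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in kb (illustration only).
toy_scaffolds_kb = [3500, 3000, 2000, 1200, 800, 400, 200]
print(n50(toy_scaffolds_kb))
```

Because the statistic depends only on the sorted lengths, a few very long scaffolds can raise N50 considerably even when many short ones remain.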
Feasibility of physical map construction from fingerprinted bacterial artificial chromosome libraries of polyploid plant species

PubMed Central

2010-01-01

Background The presence of closely related genomes in polyploid species makes the assembly of total genomic sequence from shotgun sequence reads produced by the current sequencing platforms exceedingly difficult, if not impossible. Genomes of polyploid species could be sequenced following the ordered-clone sequencing approach employing contigs of bacterial artificial chromosome (BAC) clones and BAC-based physical maps. Although BAC contigs can currently be constructed for virtually any diploid organism with the SNaPshot high-information-content-fingerprinting (HICF) technology, it is currently unknown if this is also true for polyploid species. It is possible that BAC clones from orthologous regions of homoeologous chromosomes would share numerous restriction fragments and be therefore included into common contigs. Because of this and other concerns, physical mapping utilizing the SNaPshot HICF of BAC libraries of polyploid species has not been pursued and the possibility of doing so has not been assessed. The sole exception has been in common wheat, an allohexaploid in which it is possible to construct single-chromosome or single-chromosome-arm BAC libraries from DNA of flow-sorted chromosomes and bypass the obstacles created by polyploidy. Results The potential of the SNaPshot HICF technology for physical mapping of polyploid plants utilizing global BAC libraries was evaluated by assembling contigs of fingerprinted clones in an in silico merged BAC library composed of single-chromosome libraries of two wheat homoeologous chromosome arms, 3AS and 3DS, and complete chromosome 3B. Because the chromosome arm origin of each clone was known, it was possible to estimate the fidelity of contig assembly. On average 97.78% or more clones, depending on the library, were from a single chromosome arm. 
A large portion of the remaining clones was shown to be library contamination from other chromosomes, a feature that is unavoidable during the construction of single-chromosome BAC libraries. Conclusions The negligibly low level of incorporation of clones from homoeologous chromosome arms into a contig during contig assembly suggested that it is feasible to construct contigs and physical maps using global BAC libraries of wheat and almost certainly also of other plant polyploid species with genome sizes comparable to that of wheat. Because of the high purity of the resulting assembled contigs, they can be directly used for genome sequencing. It is currently unknown but possible that equally good BAC contigs can be also constructed for polyploid species containing smaller, more gene-rich genomes. PMID:20170511
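The fidelity estimate described above relies on knowing the chromosome arm of origin of every clone in a contig. One way to express such a measure is the fraction of clones in each contig that come from the most common arm. The sketch below assumes that definition; the abstract does not spell out the exact formula, and the clone labels are hypothetical.

```python
from collections import Counter


def contig_purity(clone_origins):
    """Return (majority_arm, fraction of clones from that arm) for one contig."""
    if not clone_origins:
        raise ValueError("contig has no clones")
    counts = Counter(clone_origins)
    arm, n_major = counts.most_common(1)[0]
    return arm, n_major / len(clone_origins)


def mean_purity(contigs):
    """Average the per-contig purity over a list of contigs."""
    return sum(contig_purity(c)[1] for c in contigs) / len(contigs)


# Hypothetical contig: 48 clones from 3B plus one each from 3AS and 3DS.
toy_contig = ["3B"] * 48 + ["3AS", "3DS"]
print(contig_purity(toy_contig))
```

A purity close to 1.0 for most contigs would correspond to the low level of cross-homoeologue incorporation that the abstract reports.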
BAC-end sequence-based SNPs and Bin mapping for rapid integration of physical and genetic maps in apple.

PubMed

Han, Yuepeng; Chagné, David; Gasic, Ksenija; Rikkerink, Erik H A; Beever, Jonathan E; Gardiner, Susan E; Korban, Schuyler S

2009-03-01

A genome-wide BAC physical map of the apple, Malus x domestica Borkh., has been recently developed. Here, we report on integrating the physical and genetic maps of the apple using a SNP-based approach in conjunction with bin mapping. Briefly, BAC clones located at ends of BAC contigs were selected, and sequenced at both ends. The BAC end sequences (BESs) were used to identify candidate SNPs. Subsequently, these candidate SNPs were genetically mapped using a bin mapping strategy for the purpose of mapping the physical onto the genetic map. Using this approach, 52 (23%) out of 228 BESs tested were successfully exploited to develop SNPs. These SNPs anchored 51 contigs, spanning approximately 37 Mb in cumulative physical length, onto 14 linkage groups. The reliability of the integration of the physical and genetic maps using this SNP-based strategy is described, and the results confirm the feasibility of this approach for constructing integrated physical and genetic maps for apple.
A YAC contig spanning the dominant retinitis pigmentosa locus (RP9) on chromosome 7p

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keen, T.J.; Inglehearn, C.F.; Patel, R.J.

1995-08-10

The dominant retinitis pigmentosa locus RP9 has previously been localized to 7p13-p15, in the interval D7S526-D7S484. We now report refinement of the locus to the interval D7S795-D7S484 and a YAC contig of approximately 4.8 Mb spanning this region and extending both distally and proximally from it. The contig was constructed by STS content mapping and physically orders 29 STSs in 28 YAC clones. The order of polymorphic markers in the contig is consistent with a genetic map that has been assembled using haplotype data from the CEPH pedigrees. This contig will provide a primary resource for the construction of a transcriptional map of this region and for the identification of the defective gene causing this form of adRP. 27 refs., 3 figs., 1 tab.
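STS content mapping, as used for the RP9 contig, orders markers so that the STSs found in each clone occupy consecutive positions. The following minimal sketch illustrates that idea by brute force on a toy data set. The clone and marker names are hypothetical, and exhaustive search over permutations is only practical for a handful of markers; it is not the procedure the authors describe.

```python
from itertools import permutations


def is_consistent(order, clones):
    """True if every clone's STSs sit at consecutive positions in `order`.

    Each clone is assumed to contain at least one STS.
    """
    position = {sts: i for i, sts in enumerate(order)}
    for content in clones.values():
        indices = sorted(position[sts] for sts in content)
        if indices[-1] - indices[0] + 1 != len(indices):
            return False
    return True


def consistent_orders(markers, clones):
    """Return every marker order compatible with the clone content."""
    return [order for order in permutations(markers)
            if is_consistent(order, clones)]


# Hypothetical STS content of three overlapping YACs.
toy_clones = {
    "yacA": {"sts1", "sts2"},
    "yacB": {"sts2", "sts3"},
    "yacC": {"sts3", "sts4"},
}
toy_markers = ["sts1", "sts2", "sts3", "sts4"]
orders = consistent_orders(toy_markers, toy_clones)
print(orders)
```

For this toy input only one order and its mirror image survive, which reflects the fact that clone content alone cannot determine orientation along the chromosome.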
A physical map of Brassica oleracea shows complexity of chromosomal changes following recursive paleopolyploidizations

PubMed Central

2011-01-01

Background Evolution of the Brassica species has been recursively affected by polyploidy events, and comparison to their relative, Arabidopsis thaliana, provides means to explore their genomic complexity. Results A genome-wide physical map of a rapid-cycling strain of B. oleracea was constructed by integrating high-information-content fingerprinting (HICF) of Bacterial Artificial Chromosome (BAC) clones with hybridization to sequence-tagged probes. Using 2907 contigs of two or more BACs, we performed several lines of comparative genomic analysis. Interspecific DNA synteny is much better preserved in euchromatin than heterochromatin, showing the qualitative difference in evolution of these respective genomic domains. About 67% of contigs can be aligned to the Arabidopsis genome, with 96.5% corresponding to euchromatic regions, and 3.5% (shown to contain repetitive sequences) to pericentromeric regions. Overgo probe hybridization data showed that contigs aligned to Arabidopsis euchromatin contain ~80% of low-copy-number genes, while genes with high copy number are much more frequently associated with pericentromeric regions. We identified 39 interchromosomal breakpoints during the diversification of B. oleracea and Arabidopsis thaliana, a relatively high level of genomic change since their divergence. Comparison of the B. oleracea physical map with Arabidopsis and other available eudicot genomes showed appreciable 'shadowing' produced by more ancient polyploidies, resulting in a web of relatedness among contigs which increased genomic complexity. Conclusions A high-resolution genetically-anchored physical map sheds light on Brassica genome organization and advances positional cloning of specific genes, and may help to validate genome sequence assembly and alignment to chromosomes. All the physical mapping data is freely shared at a WebFPC site (http://lulu.pgml.uga.edu/fpc/WebAGCoL/brassica/WebFPC/; temporarily password-protected: account: pgml; password: 123qwe123). 
PMID:21955929
Construction of a yeast artificial chromosome contig encompassing the chromosome 14 Alzheimer's disease locus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sharma, V.; Bonnycastle, L.; Poorkai, P.

1994-09-01

We have constructed a yeast artificial chromosome (YAC) contig of chromosome 14q24.3 which encompasses the chromosome 14 Alzheimer's disease locus (AD3). Determined by linkage analysis of early-onset Alzheimer's disease kindreds, this interval is bounded by the genetic markers D14S61-D14S63 and spans approximately 15 centimorgans. The contig consists of 29 markers and 74 YACs of which 57 are defined by one or more sequence tagged sites (STSs). The STS markers comprise 5 genes, 16 short tandem repeat polymorphisms and 8 cDNA clones. An additional number of genes, expressed sequence tags and cDNA fragments have been identified and localized to the contig by hybridization and sequence analysis of anonymous clones isolated by cDNA direct selection techniques. A minimal contig of about 15 YACs averaging 0.5-1.5 megabases in length will span this interval and is, at first approximation, in rough agreement with the genetic map. For two regions of the contig, our coverage has relied on L1/THE fingerprint and Alu-PCR hybridization data of YACs provided by CEPH/Genethon. We are currently developing sequence tagged sites from these to confirm the overlaps revealed by the fingerprint data. Among the genes which map to the contig are transforming growth factor beta 3, c-fos, and heat shock protein 2A (HSPA2). C-fos is not a candidate gene for AD3 based on the sequence analysis of affected and unaffected individuals. HSPA2 maps to the proximal edge of the contig and Calmodulin 1, a candidate gene from 4q24.3, maps outside of the region. The YAC contig is a framework physical map from which cosmid or P1 clone contigs can be constructed. As more genes and cDNAs are mapped, a highly resolved transcription map will emerge, a necessary step towards positionally cloning the AD3 gene.
A complete YAC contig of the Prader-Willi/Angelman chromosome region (15q11-q13) and refined localization of the SNRPN gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mutirangura, A.; Jayakumar, A.; Sutcliffe, J.S.

1993-12-01

Since a previous report of a partial YAC contig of the Prader-Willi/Angelman chromosome region (15q11-q13), a complete contig spanning approximately 3.5 Mb has been developed. YACs were isolated from two human genomic libraries by PCR and hybridization screening methods. Twenty-three sequence-tagged sites (STSs) were mapped within the contig, a density of approximately 1 per 200 kb. Overlaps between YAC clones were identified by Alu-PCR dot-blot analysis and confirmed by STS mapping or hybridization with ends of YAC inserts. The gene encoding small nuclear ribonucleoprotein-associated peptide N (SNRPN), recently identified as a candidate gene for Prader-Willi syndrome, was localized within this contig between markers PW71 and TD3-21. Loci mapped within and immediately flanking the Prader-Willi/Angelman chromosome region contig are ordered as follows: cen-IR39-ML34-IR4-3R-TD189-1-PW71-SNRPN-TD3-21-LS6-1-GABRB3,D15S97-GABRA5-IR10-1-CMW1-tel. This YAC contig will be a useful resource for more detailed physical mapping of the region, for generation of new DNA markers, and for mapping or cloning candidate genes for the Prader-Willi and Angelman syndromes. 36 refs., 2 figs., 2 tabs.
Anchoring 9,371 Maize Expressed Sequence Tagged Unigenes to the Bacterial Artificial Chromosome Contig Map by Two-Dimensional Overgo Hybridization

PubMed Central

Gardiner, Jack; Schroeder, Steven; Polacco, Mary L.; Sanchez-Villeda, Hector; Fang, Zhiwei; Morgante, Michele; Landewe, Tim; Fengler, Kevin; Useche, Francisco; Hanafey, Michael; Tingey, Scott; Chou, Hugh; Wing, Rod; Soderlund, Carol; Coe, Edward H.

2004-01-01

Our goal is to construct a robust physical map for maize (Zea mays) comprehensively integrated with the genetic map. We have used a two-dimensional 24 × 24 overgo pooling strategy to anchor maize expressed sequence tagged (EST) unigenes to 165,888 bacterial artificial chromosomes (BACs) on high-density filters. A set of 70,716 public maize ESTs seeded derivation of 10,723 EST unigene assemblies. From these assemblies, 10,642 overgo sequences of 40 bp were applied as hybridization probes. BAC addresses were obtained for 9,371 overgo probes, representing an 88% success rate. More than 96% of the successful overgo probes identified two or more BACs, while 5% identified more than 50 BACs. The majority of BACs identified (79%) were hybridized with one or two overgos. A small number of BACs hybridized with eight or more overgos, suggesting that these BACs must be gene rich. Approximately 5,670 overgos identified BACs assembled within one contig, indicating that these probes are highly locus specific. A total of 1,795 megabases (Mb; 87%) of the total 2,050 Mb in BAC contigs were associated with one or more overgos, which are serving as sequence-tagged sites for single nucleotide polymorphism development. Overgo density ranged from less than one overgo per megabase to greater than 20 overgos per megabase. The majority of contigs (52%) hit by overgos contained three to nine overgos per megabase. Analysis of approximately 1,022 Mb of genetically anchored BAC contigs indicates that 9,003 of the total 13,900 overgo-contig sites are genetically anchored. Our results indicate overgos are a powerful approach for generating gene-specific hybridization probes that are facilitating the assembly of an integrated genetic and physical map for maize. PMID:15020742
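The two-dimensional pooling strategy mentioned in the maize abstract can be illustrated with a small deconvolution sketch. It assumes the common layout in which probes are arranged in a grid, each row and each column is hybridized as one pool, and a BAC that is positive for a row pool and a column pool is assigned to the probe at their intersection. The abstract does not describe the layout in this detail, so the grid, names, and data here are hypothetical, and a 3 × 3 grid stands in for the 24 × 24 design.

```python
def deconvolve(row_hits, col_hits, grid):
    """Assign BACs to probes from row-pool and column-pool hybridization hits.

    row_hits / col_hits map a pool index to the set of positive BACs.
    grid[r][c] is the probe placed at row r, column c.
    Returns {bac: sorted list of candidate probes}. A BAC with more than
    one candidate is ambiguous and would need follow-up confirmation.
    """
    candidates = {}
    for r, row_bacs in row_hits.items():
        for c, col_bacs in col_hits.items():
            for bac in row_bacs & col_bacs:
                candidates.setdefault(bac, []).append(grid[r][c])
    return {bac: sorted(probes) for bac, probes in candidates.items()}


# Hypothetical 3 x 3 probe grid and pool results.
toy_grid = [[f"overgo_r{r}c{c}" for c in range(3)] for r in range(3)]
toy_row_hits = {0: {"bac1"}, 2: {"bac2"}}
toy_col_hits = {0: {"bac2"}, 1: {"bac1"}}
assignments = deconvolve(toy_row_hits, toy_col_hits, toy_grid)
print(assignments)

# Success rate quoted in the abstract: 9,371 of 10,642 overgo probes.
success_rate = 9371 / 10642
print(f"{success_rate:.0%}")
```

With an n × n grid, 2n pool hybridizations stand in for n² single-probe hybridizations, which is the main attraction of the design; the cost is ambiguity whenever one BAC is positive in several rows and several columns.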
Physical Maps for Genome Analysis of Serotype A and D Strains of the Fungal Pathogen Cryptococcus neoformans

PubMed Central

Schein, Jacqueline E.; Tangen, Kristin L.; Chiu, Readman; Shin, Heesun; Lengeler, Klaus B.; MacDonald, William Kim; Bosdet, Ian; Heitman, Joseph; Jones, Steven J.M.; Marra, Marco A.; Kronstad, James W.

2002-01-01

The basidiomycete fungus Cryptococcus neoformans is an important opportunistic pathogen of humans that poses a significant threat to immunocompromised individuals. Isolates of C. neoformans are classified into serotypes (A, B, C, D, and AD) based on antigenic differences in the polysaccharide capsule that surrounds the fungal cells. Genomic and EST sequencing projects are underway for the serotype D strain JEC21 and the serotype A strain H99. As part of a genomics program for C. neoformans, we have constructed fingerprinted bacterial artificial chromosome (BAC) clone physical maps for strains H99 and JEC21 to support the genomic sequencing efforts and to provide an initial comparison of the two genomes. The BAC clones represented an estimated 10-fold redundant coverage of the genomes of each serotype and allowed the assembly of 20 contigs each for H99 and JEC21. We found that the genomes of the two strains are sufficiently distinct to prevent coassembly of the two maps when combined fingerprint data are used to construct contigs. Hybridization experiments placed 82 markers on the JEC21 map and 102 markers on the H99 map, enabling contigs to be linked with specific chromosomes identified by electrophoretic karyotyping. These markers revealed both extensive similarity in gene order (conservation of synteny) between JEC21 and H99 as well as examples of chromosomal rearrangements including inversions and translocations. Sequencing reads were generated from the ends of the BAC clones to allow correlation of genomic shotgun sequence data with physical map contigs. The BAC maps therefore represent a valuable resource for the generation, assembly, and finishing of the genomic sequence of both JEC21 and H99. The physical maps also serve as a link between map-based and sequence-based data, providing a powerful resource for continued genomic studies. 
[This paper is dedicated to the memory of Michael Smith, Founding Director of the Biotechnology Laboratory and the BC Cancer Agency Genome Sciences Centre. Supplemental material is available online at http://www.genome.org.] PMID:12213782
Physical and genetic mapping of the CMT4A locus and exclusion of PMP-2 as the defect in CMT4A

DOE Office of Scientific and Technical Information (OSTI.GOV)

Othmane, K.B.; Loeb, D.; Roses, A.D.

1995-07-20

We have previously localized one form of the autosomal recessive Charcot-Marie-Tooth disease type 4 (CMT4A) to a 5-cM region of chromosome 8q13-q21. We now report the formation of a 7-Mb YAC contig spanning the region. This contig was used to map nine additional microsatellites and six STSs to this region, and subsequent haplotype analysis has narrowed the CMT4A flanking interval to less than 1 cM. In addition, using SSCP and our physical map, we have demonstrated that the myelin protein PMP-2, mapped by FISH to this region, is not the defect in CMT4A. 27 refs., 3 figs., 1 tab.
Physical mapping of a large plant genome using global high-information-content-fingerprinting: the distal region of the wheat ancestor Aegilops tauschii chromosome 3DS

PubMed Central

2010-01-01

Background Physical maps employing libraries of bacterial artificial chromosome (BAC) clones are essential for comparative genomics and sequencing of large and repetitive genomes such as those of the hexaploid bread wheat. The diploid ancestor of the D-genome of hexaploid wheat (Triticum aestivum), Aegilops tauschii, is used as a resource for wheat genomics. The barley diploid genome also provides a good model for the Triticeae and T. aestivum since it is only slightly larger than the ancestor wheat D genome. Gene co-linearity between the grasses can be exploited by extrapolating from rice and Brachypodium distachyon to Ae. tauschii or barley, and then to wheat. Results We report the use of Ae. tauschii for the construction of the physical map of a large distal region of chromosome arm 3DS. A physical map of 25.4 Mb was constructed by anchoring BAC clones of Ae. tauschii with 85 EST on the Ae. tauschii and barley genetic maps. The 24 contigs were aligned to the rice and B. distachyon genomic sequences and a high density SNP genetic map of barley. As expected, the mapped region is highly collinear to the orthologous chromosome 1 in rice, chromosome 2 in B. distachyon and chromosome 3H in barley. However, the chromosome scale of the comparative maps presented provides new insights into grass genome organization. The disruptions of the Ae. tauschii-rice and Ae. tauschii-Brachypodium syntenies were identical. We observed chromosomal rearrangements between Ae. tauschii and barley. The comparison of Ae. tauschii physical and genetic maps showed that the recombination rate across the region dropped from 2.19 cM/Mb in the distal region to 0.09 cM/Mb in the proximal region. The size of the gaps between contigs was evaluated by comparing the recombination rate along the map with the local recombination rates calculated on single contigs. Conclusions The physical map reported here is the first physical map using fingerprinting of a complete Triticeae genome. 
This study demonstrates that global fingerprinting of the large plant genomes is a viable strategy for generating physical maps. Physical maps allow the description of the co-linearity between wheat and grass genomes and provide a powerful tool for positional cloning of new genes. PMID:20553621
Physical mapping and BAC-end sequence analysis provide initial insights into the flax (Linum usitatissimum L.) genome

PubMed Central

2011-01-01

Background Flax (Linum usitatissimum L.) is an important source of oil rich in omega-3 fatty acids, which have proven health benefits and utility as an industrial raw material. Flax seeds also contain lignans which are associated with reducing the risk of certain types of cancer. Its bast fibres have broad industrial applications. However, genomic tools needed for molecular breeding were nonexistent. Hence a project, Total Utilization Flax GENomics (TUFGEN), was initiated. We report here the first genome-wide physical map of flax and the generation and analysis of BAC-end sequences (BES) from 43,776 clones, providing initial insights into the genome. Results The physical map consists of 416 contigs spanning ~368 Mb, assembled from 32,025 fingerprints, representing roughly 54.5% to 99.4% of the estimated haploid genome (370-675 Mb). The N50 size of the contigs was estimated to be ~1,494 kb. The longest contig was ~5,562 kb comprising 437 clones. There were 96 contigs containing more than 100 clones. Approximately 54.6 Mb representing 8-14.8% of the genome was obtained from 80,337 BES. Annotation revealed that a large part of the genome consists of ribosomal DNA (~13.8%), followed by known transposable elements at 6.1%. Furthermore, ~7.4% of sequence was identified to harbour novel repeat elements. Homology searches against flax-ESTs and NCBI-ESTs suggested that ~5.6% of the transcriptome is unique to flax. A total of 4064 putative genomic SSRs were identified and are being developed as novel markers for their use in molecular breeding. Conclusion The first genome-wide physical map of flax constructed with BAC clones provides a framework for accessing target loci with economic importance for marker development and positional cloning. Analysis of the BES has provided insights into the uniqueness of the flax genome. Compared to other plant genomes, the proportion of rDNA was found to be very high whereas the proportion of known transposable elements was low. 
The SSRs identified from BES will be valuable in saturating existing linkage maps and for anchoring physical and genetic maps. The physical map and paired-end reads from BAC clones will also serve as scaffolds to build and validate the whole genome shotgun assembly. PMID:21554714
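The contig statistics quoted in this abstract (N50 and the genome-coverage range) can be illustrated with a minimal Python sketch. The function names and the toy contig lengths are hypothetical and are not taken from the paper; only the 368 Mb map span and the 370-675 Mb genome estimates come from the abstract, and because the span is itself approximate, the computed upper bound may differ from the published 99.4% in the last digit.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together account for at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def coverage_range(map_mb, genome_min_mb, genome_max_mb):
    """Return (lowest, highest) fraction of the genome covered by the map,
    given a lower and an upper estimate of genome size."""
    return map_mb / genome_max_mb, map_mb / genome_min_mb


# Toy contig lengths in kb (illustrative only, not the flax contigs).
toy_contigs_kb = [80, 70, 50, 40, 30, 20]
print("toy N50 (kb):", n50(toy_contigs_kb))

# Values from the abstract: ~368 Mb map, 370-675 Mb genome estimates.
low, high = coverage_range(368, 370, 675)
print(f"coverage: {100 * low:.1f}% to {100 * high:.1f}%")
```

The same division applied to the 54.6 Mb of BAC-end sequence gives roughly 8% to 14.8%, in line with the range reported above.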
Physical mapping and BAC-end sequence analysis provide initial insights into the flax (Linum usitatissimum L.) genome.

PubMed

Ragupathy, Raja; Rathinavelu, Rajkumar; Cloutier, Sylvie

2011-05-09

Flax (Linum usitatissimum L.) is an important source of oil rich in omega-3 fatty acids, which have proven health benefits and utility as an industrial raw material. Flax seeds also contain lignans which are associated with reducing the risk of certain types of cancer. Its bast fibres have broad industrial applications. However, genomic tools needed for molecular breeding were nonexistent. Hence a project, Total Utilization Flax GENomics (TUFGEN), was initiated. We report here the first genome-wide physical map of flax and the generation and analysis of BAC-end sequences (BES) from 43,776 clones, providing initial insights into the genome. The physical map consists of 416 contigs spanning ~368 Mb, assembled from 32,025 fingerprints, representing roughly 54.5% to 99.4% of the estimated haploid genome (370-675 Mb). The N50 size of the contigs was estimated to be ~1,494 kb. The longest contig was ~5,562 kb comprising 437 clones. There were 96 contigs containing more than 100 clones. Approximately 54.6 Mb representing 8-14.8% of the genome was obtained from 80,337 BES. Annotation revealed that a large part of the genome consists of ribosomal DNA (~13.8%), followed by known transposable elements at 6.1%. Furthermore, ~7.4% of sequence was identified to harbour novel repeat elements. Homology searches against flax-ESTs and NCBI-ESTs suggested that ~5.6% of the transcriptome is unique to flax. A total of 4064 putative genomic SSRs were identified and are being developed as novel markers for their use in molecular breeding. The first genome-wide physical map of flax constructed with BAC clones provides a framework for accessing target loci with economic importance for marker development and positional cloning. Analysis of the BES has provided insights into the uniqueness of the flax genome. Compared to other plant genomes, the proportion of rDNA was found to be very high whereas the proportion of known transposable elements was low. 
The SSRs identified from BES will be valuable in saturating existing linkage maps and for anchoring physical and genetic maps. The physical map and paired-end reads from BAC clones will also serve as scaffolds to build and validate the whole genome shotgun assembly.
A Single Molecule Scaffold for the Maize Genome

PubMed Central

Zhou, Shiguo; Wei, Fusheng; Nguyen, John; Bechner, Mike; Potamousis, Konstantinos; Goldstein, Steve; Pape, Louise; Mehan, Michael R.; Churas, Chris; Pasternak, Shiran; Forrest, Dan K.; Wise, Roger; Ware, Doreen; Wing, Rod A.; Waterman, Michael S.; Livny, Miron; Schwartz, David C.

2009-01-01

About 85% of the maize genome consists of highly repetitive sequences that are interspersed by low-copy, gene-coding sequences. The maize community has dealt with this genomic complexity by the construction of an integrated genetic and physical map (iMap), but this resource alone was not sufficient for ensuring the quality of the current sequence build. For this purpose, we constructed a genome-wide, high-resolution optical map of the maize inbred line B73 genome containing >91,000 restriction sites (averaging 1 site/∼23 kb) accrued from mapping genomic DNA molecules. Our optical map comprises 66 contigs, averaging 31.88 Mb in size and spanning 91.5% (2,103.93 Mb/∼2,300 Mb) of the maize genome. A new algorithm was created that considered both optical map and unfinished BAC sequence data for placing 60/66 (2,032.42 Mb) optical map contigs onto the maize iMap. The alignment of optical maps against numerous data sources yielded comprehensive results that proved revealing and productive. For example, gaps were uncovered and characterized within the iMap, the FPC (fingerprinted contigs) map, and the chromosome-wide pseudomolecules. Such alignments also suggested amended placements of FPC contigs on the maize genetic map and proactively guided the assembly of chromosome-wide pseudomolecules, especially within complex genomic regions. Lastly, we think that the full integration of B73 optical maps with the maize iMap would greatly facilitate maize sequence finishing efforts that would make it a valuable reference for comparative studies among cereals, or other maize inbred lines and cultivars. PMID:19936062
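The summary figures in this abstract are internally consistent, which a short Python sketch can confirm. The function name is hypothetical; the inputs are the numbers quoted above, and because the abstract reports ">91,000" restriction sites, the computed spacing should be read as an upper bound on the true average.

```python
def optical_map_summary(span_mb, genome_mb, n_contigs, n_sites):
    """Derive coverage, mean contig size, and mean site spacing
    from the totals reported for an optical map."""
    return {
        "coverage_pct": 100 * span_mb / genome_mb,
        "mean_contig_mb": span_mb / n_contigs,
        "kb_per_site": 1000 * span_mb / n_sites,
    }


# Values quoted in the abstract: 2,103.93 Mb spanned, ~2,300 Mb genome,
# 66 contigs, and more than 91,000 restriction sites.
summary = optical_map_summary(2103.93, 2300.0, 66, 91_000)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The results land close to the published 91.5% coverage, 31.88 Mb mean contig size, and one site per ~23 kb.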
Enhancing genome assemblies by integrating non-sequence based data

PubMed Central

2011-01-01

Introduction Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies. Methods The tammar wallaby (Macropus eugenii) is a model marsupial mammal with a low coverage genome. However, we have access to extensive comparative maps containing over 14,000 markers constructed through the physical mapping of conserved loci, chromosome painting and comprehensive linkage maps. Using a custom Bioperl pipeline, information from the maps was aligned to assembled tammar wallaby contigs using BLAT. This data was used to construct pseudo paired-end libraries with intervals ranging from 5-10 MB. We then used Bambus (a program designed to scaffold eukaryotic genomes by ordering and orienting contigs through the use of paired-end data) to scaffold our libraries. To determine how map data compares to sequence based approaches to enhance assemblies, we repeated the experiment using a 0.5× coverage of unique reads from 4 KB and 8 KB Illumina paired-end libraries. Finally, we combined both the sequence and non-sequence-based data to determine how a combined approach could further enhance the quality of the low coverage de novo reconstruction of the tammar wallaby genome. Results Using the map data alone, we were able to order 2.2% of the initial contigs into scaffolds, and increase the N50 scaffold size to 39 KB (36 KB in the original assembly). Using only the 0.5× paired-end sequence based data, 53% of the initial contigs were assigned to scaffolds. 
Combining both data sets resulted in a further 2% increase in the number of initial contigs integrated into a scaffold (55% total) but a 35% increase in N50 scaffold size over the use of sequence-based data alone. Conclusions We provide a relatively simple pipeline utilizing existing bioinformatics tools to integrate map data into a genome assembly which is available at http://www.mcb.uconn.edu/fac.php?name=paska. While the map data only contributed minimally to assigning the initial contigs to scaffolds in the new assembly, it greatly increased the N50 size. This process added structure to our low coverage assembly, greatly increasing its utility in further analyses. PMID:21554765

Enhancing genome assemblies by integrating non-sequence based data.

PubMed

Heider, Thomas N; Lindsay, James; Wang, Chenwei; O'Neill, Rachel J; Pask, Andrew J

2011-05-28

Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies. The tammar wallaby (Macropus eugenii) is a model marsupial mammal with a low coverage genome. However, we have access to extensive comparative maps containing over 14,000 markers constructed through the physical mapping of conserved loci, chromosome painting and comprehensive linkage maps. Using a custom Bioperl pipeline, information from the maps was aligned to assembled tammar wallaby contigs using BLAT. This data was used to construct pseudo paired-end libraries with intervals ranging from 5-10 MB. We then used Bambus (a program designed to scaffold eukaryotic genomes by ordering and orienting contigs through the use of paired-end data) to scaffold our libraries. To determine how map data compares to sequence based approaches to enhance assemblies, we repeated the experiment using a 0.5× coverage of unique reads from 4 KB and 8 KB Illumina paired-end libraries. Finally, we combined both the sequence and non-sequence-based data to determine how a combined approach could further enhance the quality of the low coverage de novo reconstruction of the tammar wallaby genome. Using the map data alone, we were able to order 2.2% of the initial contigs into scaffolds, and increase the N50 scaffold size to 39 KB (36 KB in the original assembly). Using only the 0.5× paired-end sequence based data, 53% of the initial contigs were assigned to scaffolds. 
Combining both data sets resulted in a further 2% increase in the number of initial contigs integrated into a scaffold (55% total) but a 35% increase in N50 scaffold size over the use of sequence-based data alone. We provide a relatively simple pipeline utilizing existing bioinformatics tools to integrate map data into a genome assembly which is available at http://www.mcb.uconn.edu/fac.php?name=paska. While the map data only contributed minimally to assigning the initial contigs to scaffolds in the new assembly, it greatly increased the N50 size. This process added structure to our low coverage assembly, greatly increasing its utility in further analyses.
A physical map of a BAC clone contig covering the entire autosome insertion between ovine MHC Class IIa and IIb

PubMed Central

2012-01-01

Background The ovine Major Histocompatibility Complex (MHC) harbors genes involved in overall resistance/susceptibility of the host to infectious diseases. Compared to human and mouse, the ovine MHC is interrupted by a large piece of autosome insertion via a hypothetical chromosome inversion that constitutes ~25% of ovine chromosome 20. The evolutionary consequence of such an inversion and an insertion (inversion/insertion) in relation to MHC function remains unknown. We previously constructed a BAC clone physical map for the ovine MHC exclusive of the insertion region. Here we report the construction of a high-density physical map covering the autosome insertion in order to address the question of what the inversion/insertion had to do with ruminants during the MHC evolution. Results A total of 119 pairs of comparative bovine oligo primers were utilized to screen an ovine BAC library for positive clones and the orders and overlapping relationships of the identified clones were determined by DNA fingerprinting, BAC-end sequencing, and sequence-specific PCR. A total of 368 positive BAC clones were identified and 108 of the effective clones were ordered into an overlapping BAC contig to cover the consensus region between ovine MHC class IIa and IIb. Therefore, a continuous physical map covering the entire ovine autosome inversion/insertion region was successfully constructed. The map confirmed the bovine sequence assembly for the same homologous region. The DNA sequences of 185 BAC-ends have been deposited into the NCBI database with the accession numbers HR309252 through HR309068, corresponding to dbGSS ID 30164010 through 30163826. Conclusions We have constructed a high-density BAC clone physical map for the ovine autosome inversion/insertion between the MHC class IIa and IIb. The entire ovine MHC region is now fully covered by a continuous BAC clone contig. 
The physical map we generated will facilitate MHC functional studies in the ovine, as well as the comparative MHC evolution in ruminants. PMID:22897909
Characterization of Three Maize Bacterial Artificial Chromosome Libraries toward Anchoring of the Physical Map to the Genetic Map Using High-Density Bacterial Artificial Chromosome Filter Hybridization

PubMed Central

Yim, Young-Sun; Davis, Georgia L.; Duru, Ngozi A.; Musket, Theresa A.; Linton, Eric W.; Messing, Joachim W.; McMullen, Michael D.; Soderlund, Carol A.; Polacco, Mary L.; Gardiner, Jack M.; Coe, Edward H.

2002-01-01

Three maize (Zea mays) bacterial artificial chromosome (BAC) libraries were constructed from inbred line B73. High-density filter sets from all three libraries, made using different restriction enzymes (HindIII, EcoRI, and MboI, respectively), were evaluated with a set of complex probes including the 185-bp knob repeat, ribosomal DNA, two telomere-associated repeat sequences, four centromere repeats, the mitochondrial genome, a multifragment chloroplast DNA probe, and bacteriophage λ. The results indicate that the libraries are of high quality with low contamination by organellar and λ-sequences. The use of libraries from multiple enzymes increased the chance of recovering each region of the genome. Ninety maize restriction fragment-length polymorphism core markers were hybridized to filters of the HindIII library, representing 6× coverage of the genome, to initiate development of a framework for anchoring BAC contigs to the intermated B73 × Mo17 genetic map and to mark the bin boundaries on the physical map. All of the clones used as hybridization probes detected at least three BACs. Twenty-two single-copy number core markers identified an average of 7.4 ± 3.3 positive clones, consistent with the expectation of six clones. This information is integrated into fingerprinting data generated by the Arizona Genomics Institute to assemble the BAC contigs using fingerprint contig and contributed to the process of physical map construction. PMID:12481051
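The "expectation of six clones" follows directly from the 6× library coverage: a single-copy locus is expected to appear in as many clones as the library has genome equivalents. A minimal Python sketch of that reasoning is given below. The Poisson model of clone sampling is an assumption introduced here for illustration (the abstract does not state which model the authors used), and the function names are hypothetical.

```python
import math


def expected_hits(coverage):
    """Expected number of clones containing a single-copy locus:
    equal to the genome coverage of the library."""
    return coverage


def prob_at_least_one(coverage):
    """Probability that a locus is present in at least one clone,
    assuming clone counts per locus are Poisson-distributed (assumption)."""
    return 1 - math.exp(-coverage)


coverage = 6  # HindIII library filters, per the abstract
observed_mean, observed_sd = 7.4, 3.3  # single-copy core markers, per the abstract

print("expected clones per locus:", expected_hits(coverage))
print(f"P(locus represented at least once): {prob_at_least_one(coverage):.4f}")
print("expectation within one SD of observed mean:",
      abs(observed_mean - expected_hits(coverage)) <= observed_sd)
```

Under this model a 6× library would miss a given locus only about 0.25% of the time, and the expected value of six lies within one standard deviation of the observed 7.4 ± 3.3.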
A 1.8-Mb YAC contig in Xp11.23: identification of CpG islands and physical mapping of CA repeats in a region of high gene density.

PubMed

Coleman, M P; Németh, A H; Campbell, L; Raut, C P; Weissenbach, J; Davies, K E

1994-05-15

The genes ARAF1, SYN1, TIMP, and PFC are clustered within 70 kb of one another, and, as reported in the accompanying paper (J. Knight et al., 1994, Genomics 21: 180-187), at least four more genes map within 400 kb: a cluster of Krüppel-type zinc finger genes (including ZNF21, ZNF41, and ZNF81) and ELK-1, a member of the ets oncogene superfamily. This gene-rich region is of particular interest because of the large number of disease genes mapping to Xp11.23: at least three eye diseases (retinitis pigmentosa type 2, congenital stationary night blindness CSNB1, and Aland Island eye disease), Wiskott-Aldrich syndrome, X-linked nephrolithiasis, and a translocation breakpoint associated with synovial sarcoma. We have constructed a 1.8-Mb YAC contig in this region, confirming the link between TIMP and OATL1 reported by Knight et al. (1994) and extending the map in the distal direction. To investigate the likelihood that more genes are located within this region, we have carried out detailed mapping of rare-cutter restriction sites in these YACs and identified seven CpG islands. At least six of these islands are located over 50 kb from any known gene locations, suggesting that the region contains at least this many as yet unidentified genes. We have also mapped the physical locations of six highly polymorphic CA repeats within the contig, thus integrating the physical, genetic, and transcriptional maps of the region and facilitating the mapping and identification of disease genes.(ABSTRACT TRUNCATED AT 250 WORDS)
Radiation hybrid map of barley chromosome 3H

USDA-ARS's Scientific Manuscript database

Assembly of the barley genome is complicated by its large size (5.1 Gb) and proportion of repetitive elements (84%). This process is facilitated by high resolution maps for aligning BAC contigs along chromosomes. Available genetic maps, however, do not provide accurate information on the physical po...
A 2.5-Mb contig constructed from Angus, Longhorn and horned Hereford DNA spanning the polled interval on bovine chromosome 1.

PubMed

Wunderlich, K R; Abbey, C A; Clayton, D R; Song, Y; Schein, J E; Georges, M; Coppieters, W; Adelson, D L; Taylor, J F; Davis, S L; Gill, C A

2006-12-01

The polled locus has been mapped by genetic linkage analysis to the proximal region of bovine chromosome 1. As an intermediate step in our efforts to identify the polled locus and the underlying causative mutation for the polled phenotype, we have constructed a BAC-based physical map of the interval containing the polled locus. Clones containing genes and markers in the critical interval were isolated from the TAMBT (constructed from Angus and Longhorn genomic DNA) and CHORI-240 (constructed from horned Hereford genomic DNA) BAC libraries and ordered based on fingerprinting and the presence or absence of 80 STS markers. A single contig spanning 2.5 Mb was assembled. Comparison of the physical order of STSs to the corresponding region of human chromosome 21 revealed the same order of genes within the polled critical interval. This contig of overlapping BAC clones from horned and polled breeds is a useful resource for SNP discovery and characterization of positional candidate genes.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Hellsten, E.; Vesa, J.; Peltonen, L.

Infantile neuronal ceroid lipofuscinosis (INCL, CLN1) is a neurodegenerative disorder in which the biochemical defect is unknown. We earlier assigned the disease locus to chromosome 1p32 in the immediate vicinity of the highly informative HY-TM1 marker by linkage and linkage disequilibrium analysis. Here we report the construction of PFGE maps on the CLN1 region covering a total of 4 Mb of this relatively poorly mapped chromosomal region. We established the order of loci at 1p32 as tel-D1S57-L-myc-HY-TM1-rlf-COL9A2-D1S193-D1S62-D1S211-cen by combining data obtained from analysis of a chromosome 1 somatic cell hybrid panel, PFGE, and interphase FISH. We isolated YACs and constructed two separate YAC contigs, the loci L-myc, HY-TM1, rlf, and COL9A2 being present on a 1000-kb contig and the markers D1S193, D1S62, and D1S211 on a YAC contig spanning a maximum of 860 kb. Within the 1000-kb contig we were able to identify five CpG islands in addition to those associated with the earlier cloned genes. The YAC contigs as well as the physical map provide us with tools for the identification of the INCL gene. 36 refs., 4 figs., 3 tabs.
A high resolution radiation hybrid map of wheat chromosome 4A

USDA-ARS's Scientific Manuscript database

Bread wheat has a large and complex allohexaploid genome with low recombination level at chromosome centromeric and peri-centromeric regions. This significantly hampers ordering of markers, contigs of physical maps and sequence scaffolds and impedes obtaining of high-quality reference genome sequenc...
Optical mapping and its potential for large-scale sequencing projects.

PubMed

Aston, C; Mishra, B; Schwartz, D C

1999-07-01

Physical mapping has been rediscovered as an important component of large-scale sequencing projects. Restriction maps provide landmark sequences at defined intervals, and high-resolution restriction maps can be assembled from ensembles of single molecules by optical means. Such optical maps can be constructed from both large-insert clones and genomic DNA, and are used as a scaffold for accurately aligning sequence contigs generated by shotgun sequencing.
A draft physical map of a D-genome cotton species (Gossypium raimondii)

PubMed Central

2010-01-01

Background Genetically anchored physical maps of large eukaryotic genomes have proven useful both for their intrinsic merit and as an adjunct to genome sequencing. Cultivated tetraploid cottons, Gossypium hirsutum and G. barbadense, share a common ancestor formed by a merger of the A and D genomes about 1-2 million years ago. Toward the long-term goal of characterizing the spectrum of diversity among cotton genomes, the worldwide cotton community has prioritized the D genome progenitor Gossypium raimondii for complete sequencing. Results A whole genome physical map of G. raimondii, the putative D genome ancestral species of tetraploid cottons was assembled, integrating genetically-anchored overgo hybridization probes, agarose based fingerprints and 'high information content fingerprinting' (HICF). A total of 13,662 BAC-end sequences and 2,828 DNA probes were used in genetically anchoring 1585 contigs to a cotton consensus genetic map, and 370 and 438 contigs, respectively to Arabidopsis thaliana (AT) and Vitis vinifera (VV) whole genome sequences. Conclusion Several lines of evidence suggest that the G. raimondii genome is comprised of two qualitatively different components. Much of the gene rich component is aligned to the Arabidopsis and Vitis vinifera genomes and shows promise for utilizing translational genomic approaches in understanding this important genome and its resident genes. The integrated genetic-physical map is of value both in assembling and validating a planned reference sequence. PMID:20569427
BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes.

PubMed

Staňková, Helena; Hastie, Alex R; Chan, Saki; Vrána, Jan; Tulpová, Zuzana; Kubaláková, Marie; Visendi, Paul; Hayashi, Satomi; Luo, Mingcheng; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana

2016-07-01

The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Mapping of Micro-Tom BAC-End Sequences to the Reference Tomato Genome Reveals Possible Genome Rearrangements and Polymorphisms

PubMed Central

Asamizu, Erika; Shirasawa, Kenta; Hirakawa, Hideki; Sato, Shusei; Tabata, Satoshi; Yano, Kentaro; Ariizumi, Tohru; Shibata, Daisuke; Ezura, Hiroshi

2012-01-01

A total of 93,682 BAC-end sequences (BESs) were generated from a dwarf model tomato, cv. Micro-Tom. After removing repetitive sequences, the BESs were similarity searched against the reference tomato genome of a standard cultivar, “Heinz 1706.” By referring to the “Heinz 1706” physical map and by eliminating redundant or nonsignificant hits, 28,804 “unique pair ends” and 8,263 “unique ends” were selected to construct hypothetical BAC contigs. The total physical length of the BAC contigs was 495,833,423 bp, covering 65.3% of the entire genome. The average coverage of euchromatin and heterochromatin was 58.9% and 67.3%, respectively. From this analysis, two possible genome rearrangements were identified: one in chromosome 2 (inversion) and the other in chromosome 3 (inversion and translocation). Polymorphisms (SNPs and Indels) between the two cultivars were identified from the BLAST alignments. As a result, 171,792 polymorphisms were mapped on 12 chromosomes. Among these, 30,930 polymorphisms were found in euchromatin (1 per 3,565 bp) and 140,862 were found in heterochromatin (1 per 2,737 bp). The average polymorphism density in the genome was 1 polymorphism per 2,886 bp. To facilitate the use of these data in Micro-Tom research, the BAC contig and polymorphism information are available in the TOMATOMICS database. PMID:23227037
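The genome-wide polymorphism density quoted in this abstract can be reproduced from the two totals it reports. The Python sketch below is only an arithmetic check; the function name is hypothetical, and it assumes the density was computed over the total BAC contig length, which the abstract implies but does not state outright.

```python
def bp_per_polymorphism(total_bp, n_polymorphisms):
    """Average spacing between polymorphisms, in base pairs."""
    return total_bp / n_polymorphisms


contig_length_bp = 495_833_423   # total BAC contig length, per the abstract
euchromatin_count = 30_930       # polymorphisms in euchromatin
heterochromatin_count = 140_862  # polymorphisms in heterochromatin
total_count = euchromatin_count + heterochromatin_count

print("total polymorphisms:", total_count)
print("bp per polymorphism:",
      round(bp_per_polymorphism(contig_length_bp, total_count)))
```

The two regional counts sum to the reported 171,792 polymorphisms, and dividing the contig length by that total gives one polymorphism per roughly 2,886 bp.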
An integrated map of the genome of the tubercle bacillus, Mycobacterium tuberculosis H37Rv, and comparison with Mycobacterium leprae.

PubMed Central

Philipp, W J; Poulet, S; Eiglmeier, K; Pascopella, L; Balasubramanian, V; Heym, B; Bergh, S; Bloom, B R; Jacobs, W R; Cole, S T

1996-01-01

An integrated map of the genome of the tubercle bacillus, Mycobacterium tuberculosis, was constructed by using a twin-pronged approach. Pulsed-field gel electrophoretic analysis enabled cleavage sites for Asn I and Dra I to be positioned on the 4.4-Mb circular chromosome, while, in parallel, clones from two cosmid libraries were ordered into contigs by means of fingerprinting and hybridization mapping. The resultant contig map was readily correlated with the physical map of the genome via the landmarked restriction sites. Over 165 genes and markers were localized on the integrated map, thus enabling comparisons with the leprosy bacillus, Mycobacterium leprae, to be undertaken. Mycobacterial genomes appear to have evolved as mosaic structures since extended segments with conserved gene order and organization are interspersed with different flanking regions. Repetitive sequences and insertion elements are highly abundant in M. tuberculosis, but the distribution of IS6110 is apparently nonrandom. PMID:8610181
Construction of a 780-kb PAC, BAC, and cosmid contig encompassing the minimal critical deletion involved in B cell lymphocytic leukemia at 13q14.3

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bouyge-Moreau, I.; Rondeau, G.; Andre, M.T.

A putative tumor suppressor gene involved in B cell chronic lymphocytic leukemia (B-CLL) was mapped to human chromosome 13q14.3 close to the genetic markers D13S25 and D13S319. We constructed a 780-kb-long contig composed of cosmids, bacterial artificial chromosomes, and bacteriophage P1-derived artificial chromosomes that provides essential information and tools for the positional cloning of this gene. The contig contains both flanking markers as well as several additional genetic markers, three ESTs, and one potential CpG island. In addition, using one B-CLL patient, we characterized a small internal deleted region of 550 kb. Comparing this deletion with other recently published deletions narrows the minimally deleted area to less than 100 kb in our physical map. This deletion core region should contain all or part of the disrupted in B cell malignancies tumor suppressor gene. 27 refs., 3 figs.
A Segment of the Apospory-Specific Genomic Region Is Highly Microsyntenic Not Only between the Apomicts Pennisetum squamulatum and Buffelgrass, But Also with a Rice Chromosome 11 Centromeric-Proximal Genomic Region

PubMed Central

Gualtieri, Gustavo; Conner, Joann A.; Morishige, Daryl T.; Moore, L. David; Mullet, John E.; Ozias-Akins, Peggy

2006-01-01

Bacterial artificial chromosome (BAC) clones from apomicts Pennisetum squamulatum and buffelgrass (Cenchrus ciliaris), isolated with the apospory-specific genomic region (ASGR) marker ugt197, were assembled into contigs that were extended by chromosome walking. Gene-like sequences from contigs were identified by shotgun sequencing and BLAST searches, and used to isolate orthologous rice contigs. Additional gene-like sequences in the apomicts' contigs were identified by bioinformatics using fully sequenced BACs from orthologous rice contigs as templates, as well as by interspecies, whole-contig cross-hybridizations. Hierarchical contig orthology was rapidly assessed by constructing detailed long-range contig molecular maps showing the distribution of gene-like sequences and markers, and searching for microsyntenic patterns of sequence identity and spatial distribution within and across species contigs. We found microsynteny between P. squamulatum and buffelgrass contigs. Importantly, this approach also enabled us to isolate from within the rice (Oryza sativa) genome contig Rice A, which shows the highest microsynteny and is most orthologous to the ugt197-containing C1C buffelgrass contig. Contig Rice A belongs to the rice genome database contig 77 (according to the current September 12, 2003, rice fingerprint contig build) that maps proximal to the chromosome 11 centromere, a feature that interestingly correlates with the mapping of ASGR-linked BACs proximal to the centromere or centromere-like sequences. Thus, relatedness between these two orthologous contigs is supported both by their molecular microstructure and by their centromeric-proximal location. Our discoveries promote the use of a microsynteny-based positional-cloning approach using the rice genome as a template to aid in constructing the ASGR toward the isolation of genes underlying apospory. PMID:16415213
A segment of the apospory-specific genomic region is highly microsyntenic not only between the apomicts Pennisetum squamulatum and buffelgrass, but also with a rice chromosome 11 centromeric-proximal genomic region.

PubMed

Gualtieri, Gustavo; Conner, Joann A; Morishige, Daryl T; Moore, L David; Mullet, John E; Ozias-Akins, Peggy

2006-03-01

Bacterial artificial chromosome (BAC) clones from apomicts Pennisetum squamulatum and buffelgrass (Cenchrus ciliaris), isolated with the apospory-specific genomic region (ASGR) marker ugt197, were assembled into contigs that were extended by chromosome walking. Gene-like sequences from contigs were identified by shotgun sequencing and BLAST searches, and used to isolate orthologous rice contigs. Additional gene-like sequences in the apomicts' contigs were identified by bioinformatics using fully sequenced BACs from orthologous rice contigs as templates, as well as by interspecies, whole-contig cross-hybridizations. Hierarchical contig orthology was rapidly assessed by constructing detailed long-range contig molecular maps showing the distribution of gene-like sequences and markers, and searching for microsyntenic patterns of sequence identity and spatial distribution within and across species contigs. We found microsynteny between P. squamulatum and buffelgrass contigs. Importantly, this approach also enabled us to isolate from within the rice (Oryza sativa) genome contig Rice A, which shows the highest microsynteny and is most orthologous to the ugt197-containing C1C buffelgrass contig. Contig Rice A belongs to the rice genome database contig 77 (according to the current September 12, 2003, rice fingerprint contig build) that maps proximal to the chromosome 11 centromere, a feature that interestingly correlates with the mapping of ASGR-linked BACs proximal to the centromere or centromere-like sequences. Thus, relatedness between these two orthologous contigs is supported both by their molecular microstructure and by their centromeric-proximal location. Our discoveries promote the use of a microsynteny-based positional-cloning approach using the rice genome as a template to aid in constructing the ASGR toward the isolation of genes underlying apospory.
YAC and cosmid contigs encompassing the Fukuyama-type congenital muscular dystrophy (FCMD) candidate region on 9q31

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miyake, Masashi; Nakahori, Yutaka; Matsushita, Ikumi

1997-03-01

Fukuyama-type congenital muscular dystrophy (FCMD), the second most common form of childhood muscular dystrophy in Japan, is an autosomal recessive severe muscular dystrophy associated with an anomaly of the brain. We had mapped the FCMD gene to an approximately 5-cM interval between D9S127 and D9S2111 on 9q31-q33 and had also found evidence for linkage disequilibrium between FCMD and D9S306 in this candidate region. Through further analysis, we have defined another marker, D9S172, which showed stronger linkage disequilibrium than D9S306. A yeast artificial chromosome (YAC) contig spanning 3.5 Mb, which includes this D9S306-D9S172 interval on 9q31, has been constructed by a combination of sequence-tagged site, Alu-PCR, and restriction mapping. Also, cosmid clones subcloned from the YAC were assembled into three contigs, one of which contains D9S2107, which showed the strongest linkage disequilibrium with FCMD. These contigs also allowed us to order the markers as follows: cen-D9S127-(~800 kb)-D9S306 (identical to D9S53)-(~700 kb)-A107XF9-(~500 kb)-D9S172-(~30 kb)-D9S299 (identical to D9S774)-(~120 kb)-WI2269-tel. Thus, we have constructed the first high-resolution physical map of the FCMD candidate region. The YAC and cosmid contigs established here will be a crucial resource for identification of the FCMD gene and other genes in this region. 37 refs., 7 figs., 2 tabs.
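The marker order reported in this abstract can be turned into approximate map positions by summing the stated inter-marker distances. The following is a minimal sketch, assuming the approximate gaps quoted in the abstract are taken at face value; the function name `cumulative_positions` and the choice of D9S127 as the zero point are illustrative assumptions, not part of the original study.

```python
# Marker order and approximate gaps (kb), centromere to telomere,
# as quoted in the abstract. Values are approximate by the authors' own wording.
MARKERS = ["D9S127", "D9S306", "A107XF9", "D9S172", "D9S299", "WI2269"]
GAPS_KB = [800, 700, 500, 30, 120]


def cumulative_positions(markers, gaps_kb):
    """Return {marker: offset in kb from the first marker} (hypothetical helper)."""
    if len(gaps_kb) != len(markers) - 1:
        raise ValueError("need exactly one gap between each pair of adjacent markers")
    positions = {markers[0]: 0}
    offset = 0
    for marker, gap in zip(markers[1:], gaps_kb):
        offset += gap
        positions[marker] = offset
    return positions


positions = cumulative_positions(MARKERS, GAPS_KB)
for name, kb in positions.items():
    print(f"{name}\t~{kb} kb from D9S127")
```

Under these assumptions the ordered markers span about 2.15 Mb and the D9S306-D9S172 interval is about 1.2 Mb, which is consistent with a 3.5-Mb YAC contig containing that interval.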
Toward Integration of Comparative Genetic, Physical, Diversity, and Cytomolecular Maps for Grasses and Grains, Using the Sorghum Genome as a Foundation

PubMed Central

Draye, Xavier; Lin, Yann-Rong; Qian, Xiao-yin; Bowers, John E.; Burow, Gloria B.; Morrell, Peter L.; Peterson, Daniel G.; Presting, Gernot G.; Ren, Shu-xin; Wing, Rod A.; Paterson, Andrew H.

2001-01-01

The small genome of sorghum (Sorghum bicolor L. Moench.) provides an important template for study of closely related large-genome crops such as maize (Zea mays) and sugarcane (Saccharum spp.), and is a logical complement to distantly related rice (Oryza sativa) as a “grass genome model.” Using a high-density RFLP map as a framework, a robust physical map of sorghum is being assembled by integrating hybridization and fingerprint data with comparative data from related taxa such as rice and using new methods to resolve genomic duplications into locus-specific groups. By taking advantage of allelic variation revealed by heterologous probes, the positions of corresponding loci on the wheat (Triticum aestivum), rice, maize, sugarcane, and Arabidopsis genomes are being interpolated on the sorghum physical map. Bacterial artificial chromosomes for the small genome of rice are shown to close several gaps in the sorghum contigs; the emerging rice physical map and assembled sequence will further accelerate progress. An important motivation for developing genomic tools is to relate molecular level variation to phenotypic diversity. “Diversity maps,” which depict the levels and patterns of variation in different gene pools, shed light on relationships of allelic diversity with chromosome organization, and suggest possible locations of genomic regions that are under selection due to major gene effects (some of which may be revealed by quantitative trait locus mapping). Both physical maps and diversity maps suggest interesting features that may be integrally related to the chromosomal context of DNA—progress in cytology promises to provide a means to elucidate such relationships. 
We seek to provide a detailed picture of the structure, function, and evolution of the genome of sorghum and its relatives, together with molecular tools such as locus-specific sequence-tagged site DNA markers and bacterial artificial chromosome contigs that will have enduring value for many aspects of genome analysis. PMID:11244113
An Integrated Physical, Genetic and Cytogenetic Map of Brachypodium distachyon, a Model System for Grass Research

PubMed Central

Febrer, Melanie; Goicoechea, Jose Luis; Wright, Jonathan; McKenzie, Neil; Song, Xiang; Lin, Jinke; Collura, Kristi; Wissotski, Marina; Yu, Yeisoo; Ammiraju, Jetty S. S.; Wolny, Elzbieta; Idziak, Dominika; Betekhtin, Alexander; Kudrna, Dave; Hasterok, Robert; Wing, Rod A.; Bevan, Michael W.

2010-01-01

The pooid subfamily of grasses includes some of the most important crop, forage and turf species, such as wheat, barley and Lolium. Developing genomic resources, such as whole-genome physical maps, for analysing the large and complex genomes of these crops and for facilitating biological research in grasses is an important goal in plant biology. We describe a bacterial artificial chromosome (BAC)-based physical map of the wild pooid grass Brachypodium distachyon and integrate this with whole genome shotgun sequence (WGS) assemblies using BAC end sequences (BES). The resulting physical map contains 26 contigs spanning the 272 Mb genome. BES from the physical map were also used to integrate a genetic map. This provides an independent validation and confirmation of the published WGS assembly. Mapped BACs were used in Fluorescence In Situ Hybridisation (FISH) experiments to align the integrated physical map and sequence assemblies to chromosomes with high resolution. The physical, genetic and cytogenetic maps, integrated with whole genome shotgun sequence assemblies, enhance the accuracy and durability of this important genome sequence and will directly facilitate gene isolation. PMID:20976139
A 405-kb cosmid contig and HindIII restriction map of the progressive myoclonus epilepsy type 1 (EPM1) candidate region in 21q22.3

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lafreniere, R.G.; Rouleau, G.A.; De Jong, P.J.

1995-09-01

As a step toward identifying the molecular defect in patients afflicted with progressive myoclonus epilepsy type 1 (EPM1), we have assembled a cosmid contig of the candidate EPM1 region in 21q22.3. The contig constitutes a collection of 87 different cosmids spanning 405 kb based on a derived HindIII restriction map. Potential CpG-rich islands have been identified based on the restriction map generated from eight different rare-cutting enzymes. This contig contains the genetic material required for the isolation of expressed sequences and the identification of the gene defective in EPM1 and possibly other disorders mapping to this region. 15 refs., 1 fig.

Cross-species bacterial artificial chromosome (BAC) library screening via overgo-based hybridization and BAC-contig mapping of a yield enhancement quantitative trait locus (QTL) yld1.1 in the Malaysian wild rice Oryza rufipogon.

PubMed

Song, Beng-Kah; Nadarajah, Kalaivani; Romanov, Michael N; Ratnam, Wickneswari

2005-01-01

The construction of BAC-contig physical maps is an important step towards a partial or ultimate genome sequence analysis. Here, we describe our initial efforts to apply an overgo approach to screen a BAC library of the Malaysian wild rice species, Oryza rufipogon. Overgo design is based on repetitive element masking and sequence uniqueness, and uses short probes (approximately 40 bp), making this method highly efficient and specific. Pairs of 24-bp oligos that contain an 8-bp overlap were developed from the publicly available genomic sequences of the cultivated rice, O. sativa, to generate 20 overgo probes for a 1-Mb region that encompasses a yield enhancement QTL yld1.1 in O. rufipogon. The advantages of a high similarity in melting temperature, hybridization kinetics and specific activities of overgos further enabled a pooling strategy for library screening by filter hybridization. Two pools of ten overgos each were hybridized to high-density filters representing the O. rufipogon genomic BAC library. These screening tests succeeded in providing 69 PCR-verified positive hits from a total of 23,040 BAC clones of the entire O. rufipogon library. A minimal tiling path of clones was generated to contribute to a fully covered BAC-contig map of the targeted 1-Mb region. The developed protocol for overgo design based on O. sativa sequences as a comparative genomic framework, and the pooled overgo hybridization screening technique are suitable means for high-resolution physical mapping and the identification of BAC candidates for sequencing.
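The oligo geometry described in this abstract (two 24-bp oligos sharing an 8-bp overlap, giving a probe of about 40 bp) can be illustrated with a short sketch. It assumes the usual overgo layout in which the second oligo is the reverse complement of the last 24 bases of the target, so that the two 3' ends anneal over 8 bases. The function names and the 40-base example sequence are made up for illustration and do not come from the study; repeat masking and uniqueness checks are not shown.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def overgo_pair(target, oligo_len=24, overlap=8):
    """Split a target into two overlapping oligos (hypothetical helper).

    The forward oligo is the first `oligo_len` bases of the target; the
    reverse oligo is the reverse complement of the last `oligo_len` bases.
    Their 3' ends are complementary over `overlap` bases.
    """
    target = target.upper()
    expected_len = 2 * oligo_len - overlap  # 24 + 24 - 8 = 40
    if len(target) != expected_len:
        raise ValueError(f"target must be {expected_len} bases long")
    if set(target) - set("ACGT"):
        raise ValueError("target may contain only A, C, G and T")
    forward = target[:oligo_len]
    reverse = reverse_complement(target[oligo_len - overlap:])
    return forward, reverse


# Made-up 40-base target, for illustration only.
example_target = "ATGCGTACCTGAAGCTTGGCACTGACCTAGGTCAATCGGA"
forward_oligo, reverse_oligo = overgo_pair(example_target)
print(forward_oligo)
print(reverse_oligo)
```

The last 8 bases of the forward oligo are the reverse complement of the last 8 bases of the reverse oligo, which is what lets the pair anneal and be filled in to a double-stranded 40-bp probe.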
Leaf morphology in Cowpea [Vigna unguiculata (L.) Walp]: QTL analysis, physical mapping and identifying a candidate gene using synteny with model legume species

PubMed Central

2012-01-01

Background Cowpea [Vigna unguiculata (L.) Walp] exhibits a considerable variation in leaf shape. Although cowpea is mostly utilized as a dry grain and animal fodder crop, cowpea leaves are also used as a high-protein pot herb in many countries of Africa. Results Leaf morphology was studied in the cowpea RIL population, Sanzi (sub-globose leaf shape) x Vita 7 (hastate leaf shape). A QTL for leaf shape, Hls (hastate leaf shape), was identified on the Sanzi x Vita 7 genetic map spanning from 56.54 cM to 67.54 cM distance on linkage group 15. SNP marker 1_0910 was the most significant over the two experiments, accounting for 74.7% phenotypic variance (LOD 33.82) in a greenhouse experiment and 71.5% phenotypic variance (LOD 30.89) in a field experiment. The corresponding Hls locus was positioned on the cowpea consensus genetic map on linkage group 4, spanning from 25.57 to 35.96 cM. A marker-trait association of the Hls region identified SNP marker 1_0349 alleles co-segregating with either the hastate or sub-globose leaf phenotype. High co-linearity was observed for the syntenic Hls region in Medicago truncatula and Glycine max. One syntenic locus for Hls was identified on Medicago chromosome 7 while syntenic regions for Hls were identified on two soybean chromosomes, 3 and 19. In all three syntenic loci, an ortholog for the EZA1/SWINGER (AT4G02020.1) gene was observed and is the candidate gene for the Hls locus. The Hls locus was identified on the cowpea physical map via SNP markers 1_0910, 1_1013 and 1_0992 which were identified in three BAC contigs; contig926, contig821 and contig25. Conclusions This study has demonstrated how integrated genomic resources can be utilized for a candidate gene approach. Identification of genes which control leaf morphology may be utilized to improve the quality of cowpea leaves for vegetable and or forage markets as well as contribute to more fundamental research understanding the control of leaf shape in legumes. PMID:22691139
Leaf morphology in Cowpea [Vigna unguiculata (L.) Walp]: QTL analysis, physical mapping and identifying a candidate gene using synteny with model legume species.

PubMed

Pottorff, Marti; Ehlers, Jeffrey D; Fatokun, Christian; Roberts, Philip A; Close, Timothy J

2012-06-12

Cowpea [Vigna unguiculata (L.) Walp] exhibits a considerable variation in leaf shape. Although cowpea is mostly utilized as a dry grain and animal fodder crop, cowpea leaves are also used as a high-protein pot herb in many countries of Africa. Leaf morphology was studied in the cowpea RIL population, Sanzi (sub-globose leaf shape) x Vita 7 (hastate leaf shape). A QTL for leaf shape, Hls (hastate leaf shape), was identified on the Sanzi x Vita 7 genetic map spanning from 56.54 cM to 67.54 cM distance on linkage group 15. SNP marker 1_0910 was the most significant over the two experiments, accounting for 74.7% phenotypic variance (LOD 33.82) in a greenhouse experiment and 71.5% phenotypic variance (LOD 30.89) in a field experiment. The corresponding Hls locus was positioned on the cowpea consensus genetic map on linkage group 4, spanning from 25.57 to 35.96 cM. A marker-trait association of the Hls region identified SNP marker 1_0349 alleles co-segregating with either the hastate or sub-globose leaf phenotype. High co-linearity was observed for the syntenic Hls region in Medicago truncatula and Glycine max. One syntenic locus for Hls was identified on Medicago chromosome 7 while syntenic regions for Hls were identified on two soybean chromosomes, 3 and 19. In all three syntenic loci, an ortholog for the EZA1/SWINGER (AT4G02020.1) gene was observed and is the candidate gene for the Hls locus. The Hls locus was identified on the cowpea physical map via SNP markers 1_0910, 1_1013 and 1_0992 which were identified in three BAC contigs; contig926, contig821 and contig25. This study has demonstrated how integrated genomic resources can be utilized for a candidate gene approach. Identification of genes which control leaf morphology may be utilized to improve the quality of cowpea leaves for vegetable and or forage markets as well as contribute to more fundamental research understanding the control of leaf shape in legumes.
A YAC contig encompassing the chromosome 7p locus for autosomal dominant retinitis pigmentosa

DOE Office of Scientific and Technical Information (OSTI.GOV)

Inglehearn, C.F.; Keen, T.J.; Ratel, R.

1994-09-01

Retinitis pigmentosa is an inherited retinal degeneration characterized by night blindness and loss of peripheral vision, often leading to complete blindness. The autosomal dominant form (adRP) maps to at least six different loci, including the rhodopsin and peripherin/Rds genes and four loci identified only by linkage analysis on chromosomes 7p, 7q, 8cen and 19q. The 7p locus was reported by this laboratory in a large English family, with a lod score of 16.5. Several new genetic markers have been tested in the family and this locus has now been refined to an interval of approximately 1 cM between markers D7S795 and D7S484 in the 7p13-15 region. In order to clone the gene for adRP, we have used microsatellites and STSs from the region to identify over 80 YACs, from four different libraries, which map to this interval. End clones from key YACs were isolated for the generation of additional STSs. Eleven microsatellite markers between D7S435 (distal) and D7S484 (proximal) have been ordered by a combination of both physical and genetic mapping. In this way we have now obtained a YAC contig spanning approximately 3 megabases of chromosome 7p within which the adRP gene must lie. One gene (aquaporin) and one chromosome 7 brain EST have been placed on the contig but both map distal to the region of interest. Sixteen other ESTs and three further known 7p genes mapping in the region have been excluded. We are now attempting to build a cosmid contig in the defined interval and identify further expressed sequences from both YACs and cosmids to test as candidates for the adRP gene.
A physical map, including a BAC/PAC clone contig, of the Williams-Beuren syndrome--deletion region at 7q11.23.

PubMed

Peoples, R; Franke, Y; Wang, Y K; Pérez-Jurado, L; Paperna, T; Cisco, M; Francke, U

2000-01-01

Williams-Beuren syndrome (WBS) is a developmental disorder caused by haploinsufficiency for genes in a 2-cM region of chromosome band 7q11.23. With the exception of vascular stenoses due to deletion of the elastin gene, the various features of WBS have not yet been attributed to specific genes. Although ≥16 genes have been identified within the WBS deletion, completion of a physical map of the region has been difficult because of the large duplicated regions flanking the deletion. We present a physical map of the WBS deletion and flanking regions, based on assembly of a bacterial artificial chromosome/P1-derived artificial chromosome contig, analysis of high-throughput genome-sequence data, and long-range restriction mapping of genomic and cloned DNA by pulsed-field gel electrophoresis. Our map encompasses 3 Mb, including 1.6 Mb within the deletion. Two large duplicons, flanking the deletion, of ≥320 kb contain unique sequence elements from the internal border regions of the deletion, such as sequences from GTF2I (telomeric) and FKBP6 (centromeric). A third copy of this duplicon exists in inverted orientation distal to the telomeric flanking one. These duplicons show stronger sequence conservation with regard to each other than to the presumptive ancestral loci within the common deletion region. Sequence elements originating from beyond 7q11.23 are also present in these duplicons. Although the duplicons are not present in mice, the order of the single-copy genes in the conserved syntenic region of mouse chromosome 5 is inverted relative to the human map. A model is presented for a mechanism of WBS-deletion formation, based on the orientation of duplicons' components relative to each other and to the ancestral elements within the deletion region.
MagnaportheDB: a federated solution for integrating physical and genetic map data with BAC end derived sequences for the rice blast fungus Magnaporthe grisea.

PubMed

Martin, Stanton L; Blackmon, Barbara P; Rajagopalan, Ravi; Houfek, Thomas D; Sceeles, Robert G; Denn, Sheila O; Mitchell, Thomas K; Brown, Douglas E; Wing, Rod A; Dean, Ralph A

2002-01-01

We have created a federated database for genome studies of Magnaporthe grisea, the causal agent of rice blast disease, by integrating end sequence data from BAC clones, genetic marker data and BAC contig assembly data. A library of 9216 BAC clones providing >25-fold coverage of the entire genome was end sequenced and fingerprinted by HindIII digestion. The Image/FPC software package was then used to generate an assembly of 188 contigs covering >95% of the genome. The database contains the results of this assembly integrated with hybridization data of genetic markers to the BAC library. AceDB was used for the core database engine and a MySQL relational database, populated with numerical representations of BAC clones within FPC contigs, was used to create appropriately scaled images. The database is being used to facilitate sequencing efforts. The database also gives researchers mapping known genes or other sequences of interest rapid and easy access to the fundamental organization of the M. grisea genome. This database, MagnaportheDB, can be accessed on the web at http://www.cals.ncsu.edu/fungal_genomics/mgdatabase/int.htm.
Physical and transcriptional map in the CMT 1A region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chevillard, C.; Passage, E.; Cudrey, C.

1994-09-01

The Charcot-Marie-Tooth disease type 1A (CMT1A) has been mapped to the proximal short arm of chromosome 17. CMT1A is the most frequent of the motor and sensory peripheral neuropathies and is associated with a duplication of a 1.5 Mb fragment in proximal 17p12. Several groups have proposed the gene coding for peripheral myelin protein-22 (PMP-22) as the candidate gene for CMT1A. We have recently published a 'MegaYAC' contig of 6 Mb which covers the CMT1A critical region. In order to isolate new genes localized in this region, we used a 'physical trapping' strategy derived from the direct cDNA selection technique developed by Parimoo et al. This approach has allowed us to construct cDNA "minilibraries" using YAC DNA from the CMT1A region. One of the clones in these minilibraries has been mapped back to the CMT1A duplication. Other potentially interesting clones are in the process of further characterization. Furthermore, we have mapped several Genethon microsatellites in the 6 Mb YAC contig and some are located in the CMT1A duplicated region. These highly polymorphic markers should prove useful for diagnostic testing in CMT1A.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Willard, H.F.; Cremers, F.; Mandel, J.L.

A high-quality integrated genetic and physical map of the X chromosome from telomere to telomere, based primarily on YACs formatted with probes and STSs, is increasingly close to reality. At the Fifth International X Chromosome Workshop, organized by A.M. Poustka and D. Schlessinger in Heidelberg, Germany, April 24-27, 1994, substantial progress was recorded on extension and refinement of the physical map, on the integration of genetic and cytogenetic data, on attempts to use the map to direct gene searches, and on nascent large-scale sequencing efforts. This report summarizes physical and genetic mapping information presented at the workshop and/or published since the reports of the Fourth International X Chromosome Workshop. The principal aim of the workshop was to derive a consensus map of the chromosome, in terms of physical contigs emphasizing the location of genes and microsatellite markers. The resulting map is presented and updates previous versions. This report also updates the list of highly informative microsatellites. The text highlights the working state of the map, the genes known to reside on the X, and the progress toward integration of various types of data.
A genetically anchored physical framework for Theobroma cacao cv. Matina 1-6

PubMed Central

2011-01-01

Background The fermented dried seeds of Theobroma cacao (cacao tree) are the main ingredient in chocolate. World cocoa production was estimated to be 3 million tons in 2010 with an annual estimated average growth rate of 2.2%. The cacao bean production industry is currently under threat from a rise in fungal diseases including black pod, frosty pod, and witches' broom. In order to address these issues, genome-sequencing efforts have been initiated recently to facilitate identification of genetic markers and genes that could be utilized to accelerate the release of robust T. cacao cultivars. However, problems inherent with assembly and resolution of distal regions of complex eukaryotic genomes, such as gaps, chimeric joins, and unresolvable repeat-induced compressions, have been unavoidably encountered with the sequencing strategies selected. Results Here, we describe the construction of a BAC-based integrated genetic-physical map of the T. cacao cultivar Matina 1-6 which is designed to augment and enhance these sequencing efforts. Three BAC libraries, each comprised of 10× coverage, were constructed and fingerprinted. 230 genetic markers from a high-resolution genetic recombination map and 96 Arabidopsis-derived conserved ortholog set (COS) II markers were anchored using pooled overgo hybridization. A dense tile path consisting of 29,383 BACs was selected and end-sequenced. The physical map consists of 154 contigs and 4,268 singletons. Forty-nine contigs are genetically anchored and ordered to chromosomes for a total span of 307.2 Mbp. The unanchored contigs (105) span 67.4 Mbp and therefore the estimated genome size of T. cacao is 374.6 Mbp. A comparative analysis with A. thaliana, V. vinifera, and P. trichocarpa suggests that comparisons of the genome assemblies of these distantly related species could provide insights into genome structure, evolutionary history, conservation of functional sites, and improvements in physical map assembly. 
A comparison between the two T. cacao cultivars Matina 1-6 and Criollo indicates a high degree of collinearity in their genomes, yet rearrangements were also observed. Conclusions The results presented in this study are a stand-alone resource for functional exploitation and enhancement of Theobroma cacao but are also expected to complement and augment ongoing genome-sequencing efforts. This resource will serve as a template for refinement of the T. cacao genome through gap-filling, targeted re-sequencing, and resolution of repetitive DNA arrays. PMID:21846342
A genetically anchored physical framework for Theobroma cacao cv. Matina 1-6.

PubMed

Saski, Christopher A; Feltus, Frank A; Staton, Margaret E; Blackmon, Barbara P; Ficklin, Stephen P; Kuhn, David N; Schnell, Raymond J; Shapiro, Howard; Motamayor, Juan Carlos

2011-08-16

The fermented dried seeds of Theobroma cacao (cacao tree) are the main ingredient in chocolate. World cocoa production was estimated to be 3 million tons in 2010 with an annual estimated average growth rate of 2.2%. The cacao bean production industry is currently under threat from a rise in fungal diseases including black pod, frosty pod, and witches' broom. In order to address these issues, genome-sequencing efforts have been initiated recently to facilitate identification of genetic markers and genes that could be utilized to accelerate the release of robust T. cacao cultivars. However, problems inherent with assembly and resolution of distal regions of complex eukaryotic genomes, such as gaps, chimeric joins, and unresolvable repeat-induced compressions, have been unavoidably encountered with the sequencing strategies selected. Here, we describe the construction of a BAC-based integrated genetic-physical map of the T. cacao cultivar Matina 1-6 which is designed to augment and enhance these sequencing efforts. Three BAC libraries, each comprised of 10× coverage, were constructed and fingerprinted. 230 genetic markers from a high-resolution genetic recombination map and 96 Arabidopsis-derived conserved ortholog set (COS) II markers were anchored using pooled overgo hybridization. A dense tile path consisting of 29,383 BACs was selected and end-sequenced. The physical map consists of 154 contigs and 4,268 singletons. Forty-nine contigs are genetically anchored and ordered to chromosomes for a total span of 307.2 Mbp. The unanchored contigs (105) span 67.4 Mbp and therefore the estimated genome size of T. cacao is 374.6 Mbp. A comparative analysis with A. thaliana, V. vinifera, and P. trichocarpa suggests that comparisons of the genome assemblies of these distantly related species could provide insights into genome structure, evolutionary history, conservation of functional sites, and improvements in physical map assembly. A comparison between the two T. cacao cultivars Matina 1-6 and Criollo indicates a high degree of collinearity in their genomes, yet rearrangements were also observed. The results presented in this study are a stand-alone resource for functional exploitation and enhancement of Theobroma cacao but are also expected to complement and augment ongoing genome-sequencing efforts. This resource will serve as a template for refinement of the T. cacao genome through gap-filling, targeted re-sequencing, and resolution of repetitive DNA arrays.
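The genome-size estimate in this abstract is the sum of the anchored and unanchored contig spans, and the contig counts add up in the same way. The following is a minimal sketch of that arithmetic using only the figures quoted in the abstract; the variable names are illustrative, and the anchored fraction is a derived quantity that the abstract itself does not state.

```python
# Figures quoted in the abstract.
anchored_contigs, anchored_span_mbp = 49, 307.2
unanchored_contigs, unanchored_span_mbp = 105, 67.4

# Contig counts and spans simply add.
total_contigs = anchored_contigs + unanchored_contigs
estimated_genome_mbp = round(anchored_span_mbp + unanchored_span_mbp, 1)

# Derived here for illustration; not reported in the abstract.
anchored_fraction = anchored_span_mbp / estimated_genome_mbp

print(f"contigs: {total_contigs}")
print(f"estimated genome size: {estimated_genome_mbp} Mbp")
print(f"anchored fraction of span: {anchored_fraction:.1%}")
```

The totals reproduce the 154 contigs and 374.6 Mbp given in the abstract, and under these figures roughly 82% of the mapped span is genetically anchored.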
Construction of the physical map of the gpa7 locus reveals that a large segment was deleted during rice domestication.

PubMed

Li, Xianran; Tian, Feng; Huang, Haiyan; Tan, Lubin; Zhu, Zuofeng; Hu, Songnian; Sun, Chuanqing

2008-06-01

To facilitate cloning gene(s) underlying gpa7, a deep-coverage BAC library was constructed for an isolate of common wild rice (Oryza rufipogon Griff.) collected from Dongxiang, Jiangxi Province, China (DXCWR). gpa7, a quantitative trait locus corresponding to grain number per panicle, is positioned in the short arm of chromosome 7. The BAC library containing 96,768 clones represents approximately 18 haploid genome equivalents. The contig spanning DXCWR gpa7 was constructed with a series of ordered markers. The putative physical map near the gpa7 locus of another accession of O. rufipogon (Accession: IRGC 105491) was also isolated in silico. Analysis of the physical maps of gpa7 indicated that a segment of about 150 kb was deleted during domestication of common wild rice.
Synteny conservation between the Prunus genome and both the present and ancestral Arabidopsis genomes

PubMed Central

Jung, Sook; Main, Dorrie; Staton, Margaret; Cho, Ilhyung; Zhebentyayeva, Tatyana; Arús, Pere; Abbott, Albert

2006-01-01

Background Due to the lack of availability of large genomic sequences for peach or other Prunus species, the degree of synteny conservation between the Prunus species and Arabidopsis has not been systematically assessed. Using the recently available peach EST sequences that are anchored to Prunus genetic maps and to the peach physical map, we analyzed the extent of conserved synteny between the Prunus and the Arabidopsis genomes. The reconstructed pseudo-ancestral Arabidopsis genome, which existed prior to the proposed recent polyploidy event, was also utilized in our analysis to further elucidate the evolutionary relationship. Results We analyzed the synteny conservation between the Prunus and the Arabidopsis genomes by comparing 475 peach ESTs that are anchored to Prunus genetic maps and their Arabidopsis homologs detected by sequence similarity. Microsyntenic regions were detected between all five Arabidopsis chromosomes and seven of the eight linkage groups of the Prunus reference map. An additional 1097 peach ESTs that are anchored to 431 BAC contigs of the peach physical map and their Arabidopsis homologs were also analyzed. Microsyntenic regions were detected in 77 BAC contigs. The syntenic regions from both data sets were short and contained only a couple of conserved gene pairs. The synteny between peach and Arabidopsis was fragmentary; all the Prunus linkage groups containing syntenic regions matched to more than two different Arabidopsis chromosomes, and most BAC contigs with multiple conserved syntenic regions corresponded to multiple Arabidopsis chromosomes. Using the same peach EST datasets and their Arabidopsis homologs, we also detected conserved syntenic regions in the pseudo-ancestral Arabidopsis genome. In many cases, the gene order and content of peach regions was more conserved in the ancestral genome than in the present Arabidopsis region. Statistical significance of each syntenic group was calculated using a simulated Arabidopsis genome. 
Conclusion We report here the result of the first extensive analysis of the conserved microsynteny using DNA sequences across the Prunus genome and their Arabidopsis homologs. Our study also illustrates that both the ancestral and present Arabidopsis genomes can provide a useful resource for marker saturation and candidate gene search, as well as elucidating evolutionary relationships between species. PMID:16615871
Assembly of ordered contigs of cosmids selected with YACs of human chromosome 13

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fischer, S.G.; Cayanis, E.; Boukhgalter, B.

1994-06-01

The authors have developed an efficient method for assembling ordered cosmid contigs aligned to mega-YACs and midi-YACs (average insert sizes of 1.0 and 0.35 Mb, respectively) and used this general method to initiate high-resolution physical mapping of human chromosome 13 (Chr 13). Chr 13-enriched midi-YAC (mYAC) and mega-YAC (MYAC) sublibraries were obtained from corresponding CEPH total human YAC libraries by selecting colonies with inter-Alu PCR probes derived from Chr 13 monochromosomal cell hybrid DNA. These sublibraries were arrayed on filters at high density. In this approach, the MYAC 13 sublibrary is screened by hybridization with cytogenetically assigned Chr 13 DNA probes to select one or a small subset of MYACs. Inter-Alu PCR products from each mYAC are then hybridized to the MYAC and mYAC sublibraries to identify overlapping YACs and to an arrayed Chr 13-specific cosmid library to select corresponding cosmids. The set of selected cosmids, gridded on filters at high density, is hybridized with inter-Alu PCR products from each of the overlapping YACs to identify subsets of cosmids and also with riboprobes from each cosmid of the arrayed set ("cosmid matrix cross-hybridization"). From these data, cosmid contigs are assembled by a specifically designed computer program. Application of this method generates cosmid contigs spanning the length of a MYAC with few gaps. To provide a high-resolution map, ends of cosmids are sequenced at preselected sites to position densely spaced sequence-tagged sites. 33 refs., 7 figs., 1 tab.
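The abstract above does not describe the contig-assembly program itself. As a minimal sketch of just the grouping step, assuming each positive cross-hybridization is recorded as a pair of cosmid names (the names and the function `assemble_contigs` are hypothetical), cosmids can be grouped into candidate contigs as connected components of the overlap graph. Ordering the cosmids within a contig is a separate problem that this sketch does not attempt.

```python
from collections import defaultdict

def assemble_contigs(cosmids, hybridizations):
    """Group cosmids into candidate contigs (connected components).

    cosmids: iterable of cosmid names.
    hybridizations: iterable of (a, b) pairs that cross-hybridized,
    taken here as evidence that a and b overlap (an assumption).
    """
    parent = {c: c for c in cosmids}

    def find(c):
        # Follow parent links to the representative, halving the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in hybridizations:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for c in cosmids:
        groups[find(c)].append(c)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical data: five cosmids, three positive hybridizations.
cosmids = ["c1", "c2", "c3", "c4", "c5"]
hybridizations = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]
contigs = assemble_contigs(cosmids, hybridizations)
```

A real pipeline would also have to handle false-positive signals from repeats, which would wrongly merge unrelated contigs in this simple scheme.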
A Physical Map, Including a BAC/PAC Clone Contig, of the Williams-Beuren Syndrome–Deletion Region at 7q11.23

PubMed Central

Peoples, Risa; Franke, Yvonne; Wang, Yu-Ker; Pérez-Jurado, Luis; Paperna, Tamar; Cisco, Michael; Francke, Uta

2000-01-01

Summary Williams-Beuren syndrome (WBS) is a developmental disorder caused by haploinsufficiency for genes in a 2-cM region of chromosome band 7q11.23. With the exception of vascular stenoses due to deletion of the elastin gene, the various features of WBS have not yet been attributed to specific genes. Although ⩾16 genes have been identified within the WBS deletion, completion of a physical map of the region has been difficult because of the large duplicated regions flanking the deletion. We present a physical map of the WBS deletion and flanking regions, based on assembly of a bacterial artificial chromosome/P1-derived artificial chromosome contig, analysis of high-throughput genome-sequence data, and long-range restriction mapping of genomic and cloned DNA by pulsed-field gel electrophoresis. Our map encompasses 3 Mb, including 1.6 Mb within the deletion. Two large duplicons, flanking the deletion, of ⩾320 kb contain unique sequence elements from the internal border regions of the deletion, such as sequences from GTF2I (telomeric) and FKBP6 (centromeric). A third copy of this duplicon exists in inverted orientation distal to the telomeric flanking one. These duplicons show stronger sequence conservation with regard to each other than to the presumptive ancestral loci within the common deletion region. Sequence elements originating from beyond 7q11.23 are also present in these duplicons. Although the duplicons are not present in mice, the order of the single-copy genes in the conserved syntenic region of mouse chromosome 5 is inverted relative to the human map. A model is presented for a mechanism of WBS-deletion formation, based on the orientation of duplicons' components relative to each other and to the ancestral elements within the deletion region. PMID:10631136
Towards a transcription map spanning a 250 kb area within the DiGeorge syndrome chromosome region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wong, W.; Emanuel, B.S.; Siegert, J.

1994-09-01

DiGeorge syndrome (DGS) and velocardiofacial syndrome (VCFS) are congenital anomalies affecting predominantly the thymus, parathyroid glands, heart and craniofacial development. Detection of 22q11.2 deletions in the majority of DGS and VCFS patients implicates 22q11 haploinsufficiency in the etiology of these disorders. The VCFS/DGS critical region lies within the proximal portion of a commonly deleted 1.2 Mb region in 22q11. A 250 kb cosmid contig covering this critical region and containing D22S74 (N25) has been established. From this contig, eleven cosmids with minimal overlap were biotinylated by nick translation, and hybridized to PCR-amplified cDNAs prepared from different tissues. The use of cDNAs from a variety of tissues increases the likelihood of identifying low abundance transcripts and tissue-specific expressed sequences. A DGCR-specific cDNA sublibrary consisting of 670 cDNA clones has been constructed. To date, 49 cDNA clones from this sub-library have been identified with single copy probes and cosmids containing putative CpG islands. Based on sequence analysis, 25 of the clones contain regions of homology to several cDNAs which map within the proximal contig. LAN is a novel partial cDNA isolated from a fetal brain library probed with one of the cosmids in the proximal contig. Using LAN as a probe, we have found 19 positive clones in the DGCR-specific cDNA sub-library (4 clones from fetal brain, 14 from adult skeletal muscle and one from fetal liver). Some of the LAN-positive clones extend the partial cDNA in the 5′ direction and will be useful in assembling a full length transcript. This resource will be used to develop a complete transcriptional map of the critical region in order to identify candidate gene(s) involved in the etiology of DGS/VCFS and to determine the relationship between the transcriptional and physical maps of 22q11.
Towards cloning the WAS-gene locus: YAC-contigs and PFGE analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Meindl, A.; Schindelhauer, D.; Hellebrand, H.

1994-09-01

Patients with X-linked recessive Wiskott-Aldrich syndrome (WAS) manifest eczema, thrombocytopenia and severe immunodeficiency. Mapping studies place the WAS gene locus between the markers TIMP and DXS255 which both have been shown to be recombinant with the disease locus. Linkage analysis in eight families including a large Swiss family showed tight linkage of the disease to the loci DXS255 and DXS1126 and exclusion of TIMP as well as polymorphic loci adjacent to the OATL1 pseudogene cluster (e.g., DXS6616). Physical mapping with established YAC contigs and a radiation hybrid encompassing the Xp11.22-11.3 region revealed the loci order TIMP-PFC-elk1-DXS1367-DXS6616-OATL1-(DXS11260DXS226)-C5-3-TGE-3, SYP and (DXS255-DXS146). The markers TIMP and C5-3 are contained on the same 1.6 Mb MluI-fragment. A novel expressed sequence (R1) could be placed between elk-1 and the PFC gene while the STS C5-3 could be localized adjacent to DXS1126. The gene cluster around DXS1126 could be connected with the TFE-3 and synaptophysin genes which map on the same 400 kb MluI fragment and two overlapping YACs. The minimum distance between SYP and DXS255 is 1.2 Mb; the maximum distance is 2.2 Mb. Expressed sequences which are obtained from a cosmid contig around DXS1126 and C5-3 are being used for mutation screening in WAS patients.
BACCardI--a tool for the validation of genomic assemblies, assisting genome finishing and intergenome comparison.

PubMed

Bartels, Daniela; Kespohl, Sebastian; Albaum, Stefan; Drüke, Tanja; Goesmann, Alexander; Herold, Julia; Kaiser, Olaf; Pühler, Alfred; Pfeiffer, Friedhelm; Raddatz, Günter; Stoye, Jens; Meyer, Folker; Schuster, Stephan C

2005-04-01

We provide the graphical tool BACCardI for the construction of virtual clone maps from standard assembler output files or BLAST based sequence comparisons. This new tool has been applied to numerous genome projects to solve various problems including (a) validation of whole genome shotgun assemblies, (b) support for contig ordering in the finishing phase of a genome project, and (c) intergenome comparison between related strains when only one of the strains has been sequenced and a large insert library is available for the other. The BACCardI software can seamlessly interact with various sequence assembly packages. Genomic assemblies generated from sequence information need to be validated by independent methods such as physical maps. The time-consuming task of building physical maps can be circumvented by virtual clone maps derived from read pair information of large insert libraries.
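To illustrate the idea of validating an assembly with read-pair information from a large-insert library, the following minimal sketch classifies one clone from the placement of its two end reads. It does not use BACCardI's interface or file formats; the `EndRead` type, the category names, and the insert-size bounds are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class EndRead:
    contig: str   # contig the end read was placed on
    pos: int      # coordinate of the read on that contig
    strand: str   # "+" or "-"

def classify_clone(end1, end2, min_insert, max_insert):
    """Classify a clone by how its two end reads sit in the assembly."""
    if end1.contig != end2.contig:
        # Ends on different contigs: a candidate link for contig ordering.
        return "links_contigs"
    left, right = sorted((end1, end2), key=lambda r: r.pos)
    # A correctly assembled clone has its ends facing each other.
    if not (left.strand == "+" and right.strand == "-"):
        return "bad_orientation"
    span = right.pos - left.pos
    if min_insert <= span <= max_insert:
        return "consistent"
    return "bad_distance"

# Hypothetical library with inserts expected between 30 kb and 50 kb.
ok = classify_clone(EndRead("c1", 1000, "+"), EndRead("c1", 41000, "-"), 30000, 50000)
link = classify_clone(EndRead("c1", 1000, "+"), EndRead("c2", 500, "-"), 30000, 50000)
```

Clones classified as inconsistent point at possible misassemblies, while clones whose ends fall on different contigs suggest which contigs are neighbours.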
A cosmid and cDNA fine physical map of a human chromosome 13q14 region frequently lost in B-cell chronic lymphocytic leukemia and identification of a new putative tumor suppressor gene, Leu5.

PubMed

Kapanadze, B; Kashuba, V; Baranova, A; Rasool, O; van Everdink, W; Liu, Y; Syomov, A; Corcoran, M; Poltaraus, A; Brodyansky, V; Syomova, N; Kazakov, A; Ibbotson, R; van den Berg, A; Gizatullin, R; Fedorova, L; Sulimova, G; Zelenin, A; Deaven, L; Lehrach, H; Grander, D; Buys, C; Oscier, D; Zabarovsky, E R; Einhorn, S; Yankovsky, N

1998-04-17

B-cell chronic lymphocytic leukemia (B-CLL) is a human hematological neoplastic disease often associated with the loss of a chromosome 13 region between RB1 gene and locus D13S25. A new tumor suppressor gene (TSG) may be located in the region. A cosmid contig has been constructed between the loci D13S1168 (WI9598) and D13S25 (H2-42), which corresponds to the minimal region shared by B-CLL associated deletions. The contig includes more than 200 LANL and ICRF cosmid clones covering 620 kb. Three cDNAs likely corresponding to three different genes have been found in the minimally deleted region, sequenced and mapped against the contigged cosmids. cDNA clone 10k4 as well as a chimeric clone 13g3, codes for a zinc-finger domain of the RING type and shares homology to some known genes involved in tumorigenesis (RET finger protein, BRCA1) and embryogenesis (MID1). We have termed the gene corresponding to 10k4/13g3 clones LEU5. This is the first gene with homology to known TSGs which has been found in the region of B-CLL rearrangements.
Mapping of the locus for autosomal dominant amelogenesis imperfecta (AIH2) to a 4-Mb YAC contig on chromosome 4q11-q21

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kaerrman, C.; Holmgren, G.; Forsman, K.

1997-01-15

Amelogenesis imperfecta (AI) is a clinically and genetically heterogeneous group of inherited enamel defects. We recently mapped a locus for autosomal dominant local hypoplastic amelogenesis imperfecta (AIH2) to the long arm of chromosome 4. The disease gene was localized to a 17.6-cM region between the markers D4S392 and D4S395. The albumin gene (ALB), located in the same interval, was a candidate gene for autosomal dominant AI (ADAI) since albumin has a potential role in enamel maturation. Here we describe refined mapping of the AIH2 locus and the construction of marker maps by radiation hybrid mapping and yeast artificial chromosome (YAC)-based sequence tagged site-content mapping. A radiation hybrid map consisting of 11 microsatellite markers in the 5-cM interval between D4S409 and D4S1558 was constructed. Recombinant haplotypes in six Swedish ADAI families suggest that the disease gene is located in the interval between D4S2421 and ALB. ALB is therefore not likely to be the disease-causing gene. Affected members in all six families share the same allele haplotypes, indicating a common ancestral mutation in all families. The AIH2 critical region is less than 4 cM and spans a physical distance of approximately 4 Mb as judged from radiation hybrid maps. A YAC contig over the AIH2 critical region including several potential candidate genes was constructed. 35 refs., 4 figs., 1 tab.
Integration of hybridization-based markers (overgos) into physical maps for comparative and evolutionary explorations in the genus Oryza and in Sorghum

PubMed Central

Hass-Jacobus, Barbara L; Futrell-Griggs, Montona; Abernathy, Brian; Westerman, Rick; Goicoechea, Jose-Luis; Stein, Joshua; Klein, Patricia; Hurwitz, Bonnie; Zhou, Bin; Rakhshan, Fariborz; Sanyal, Abhijit; Gill, Navdeep; Lin, Jer-Young; Walling, Jason G; Luo, Mei Zhong; Ammiraju, Jetty Siva S; Kudrna, Dave; Kim, Hye Ran; Ware, Doreen; Wing, Rod A; Miguel, Phillip San; Jackson, Scott A

2006-01-01

Background With the completion of the genome sequence for rice (Oryza sativa L.), the focus of rice genomics research has shifted to the comparison of the rice genome with genomes of other species for gene cloning, breeding, and evolutionary studies. The genus Oryza includes 23 species that shared a common ancestor 8–10 million years ago making this an ideal model for investigations into the processes underlying domestication, as many of the Oryza species are still undergoing domestication. This study integrates high-throughput, hybridization-based markers with BAC end sequence and fingerprint data to construct physical maps of rice chromosome 1 orthologues in two wild Oryza species. Similar studies were undertaken in Sorghum bicolor, a species which diverged from cultivated rice 40–50 million years ago. Results Overgo markers, in conjunction with fingerprint and BAC end sequence data, were used to build sequence-ready BAC contigs for two wild Oryza species. The markers drove contig merges to construct physical maps syntenic to rice chromosome 1 in the wild species and provided evidence for at least one rearrangement on chromosome 1 of the O. sativa versus Oryza officinalis comparative map. When rice overgos were aligned to available S. bicolor sequence, 29% of the overgos aligned with three or fewer mismatches; of these, 41% gave positive hybridization signals. Overgo hybridization patterns supported colinearity of loci in regions of sorghum chromosome 3 and rice chromosome 1 and suggested that a possible genomic inversion occurred in this syntenic region in one of the two genomes after the divergence of S. bicolor and O. sativa. Conclusion The results of this study emphasize the importance of identifying conserved sequences in the reference sequence when designing overgo probes in order for those probes to hybridize successfully in distantly related species. 
As interspecific markers, overgos can be used successfully to construct physical maps in species which diverged less than 8 million years ago, and can be used in a more limited fashion to examine colinearity among species which diverged as much as 40 million years ago. Additionally, overgos are able to provide evidence of genomic rearrangements in comparative physical mapping studies. PMID:16895597
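The "three or fewer mismatches" criterion above can be illustrated with a minimal sketch that slides a probe along a target sequence and reports the best ungapped match. The sequences and the function `best_ungapped_match` are hypothetical, the toy probe is far shorter than a real overgo, and the study's actual alignment software is not stated in the abstract.

```python
def best_ungapped_match(probe, target):
    """Return (mismatches, offset) of the best ungapped placement of
    probe on target, or None if target is shorter than probe.

    Only the given strand is scanned; a fuller version would also
    scan the reverse complement.
    """
    best = None
    for offset in range(len(target) - len(probe) + 1):
        window = target[offset:offset + len(probe)]
        mismatches = sum(1 for a, b in zip(probe, window) if a != b)
        if best is None or mismatches < best[0]:
            best = (mismatches, offset)
    return best

# Hypothetical toy sequences.
probe = "ACGTACGT"
target = "TTACGAACGTGG"
match = best_ungapped_match(probe, target)
passes_threshold = match is not None and match[0] <= 3
```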

Mapping the yeast genome by melting in nanofluidic devices

NASA Astrophysics Data System (ADS)

Welch, Robert L.; Czolkos, Ilja; Sladek, Rob; Reisner, Walter

2012-02-01

Optical mapping of DNA provides large-scale genomic information that can be used to assemble contigs from next-generation sequencing, and to detect re-arrangements between single cells. A recent optical mapping technique called denaturation mapping has the unique advantage of using physical principles rather than the action of enzymes to probe genomic structure. The absence of reagents or reaction steps makes denaturation mapping simpler than other protocols. Denaturation mapping uses fluorescence microscopy to image the pattern of partial melting along a DNA molecule extended in a channel of cross-section ~100 nm at the heart of a nanofluidic device. We successfully aligned melting maps from single DNA molecules to a theoretical map of the yeast genome (11.6 Mbp) to identify their location. By aligning hundreds of molecules we assembled a consensus melting map of the yeast genome with 95% coverage.
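The abstract does not say how single-molecule melting maps were aligned to the theoretical map. One generic approach, sketched here under that caveat, is to slide the measured profile along the reference and keep the offset with the highest Pearson correlation. The profiles below are made-up numbers, and a real analysis would also need to correct for variable stretching of the molecule and for its unknown orientation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def best_alignment(reference, fragment):
    """Return (offset, score) of the best-correlated window of reference."""
    best_offset, best_score = None, float("-inf")
    for offset in range(len(reference) - len(fragment) + 1):
        window = reference[offset:offset + len(fragment)]
        score = pearson(window, fragment)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

# Hypothetical theoretical melting profile and a noisy measured fragment
# that was generated from reference[3:7].
reference = [0.1, 0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.9, 0.1, 0.2]
fragment = [0.25, 0.15, 0.65, 0.35]
offset, score = best_alignment(reference, fragment)
```

Averaging many fragments placed this way at their best offsets is one way a consensus map could be built up.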
Construction of a high-density, high-resolution genetic map and its integration with BAC-based physical map in channel catfish

PubMed Central

Li, Yun; Liu, Shikai; Qin, Zhenkui; Waldbieser, Geoff; Wang, Ruijia; Sun, Luyang; Bao, Lisui; Danzmann, Roy G.; Dunham, Rex; Liu, Zhanjiang

2015-01-01

Construction of a genetic linkage map is essential for genetic and genomic studies. Recent advances in sequencing and genotyping technologies made it possible to generate high-density and high-resolution genetic linkage maps, especially for the organisms lacking extensive genomic resources. In the present work, we constructed a high-density and high-resolution genetic map for channel catfish with three large resource families genotyped using the catfish 250K single-nucleotide polymorphism (SNP) array. A total of 54,342 SNPs were placed on the linkage map, which to our knowledge had the highest marker density among aquaculture species. The estimated genetic size was 3,505.4 cM with a resolution of 0.22 cM for the sex-averaged genetic map. The sex-specific linkage maps spanned a total of 4,495.1 cM in females and 2,593.7 cM in males, presenting a ratio of 1.7 : 1 between female and male in recombination fraction. After integration with the previously established physical map, over 87% of physical map contigs were anchored to the linkage groups that covered a physical length of 867 Mb, accounting for ∼90% of the catfish genome. The integrated map provides a valuable tool for validating and improving the catfish whole-genome assembly and facilitates fine-scale QTL mapping and positional cloning of genes responsible for economically important traits. PMID:25428894
Integrated consensus genetic and physical maps of flax (Linum usitatissimum L.).

PubMed

Cloutier, Sylvie; Ragupathy, Raja; Miranda, Evelyn; Radovanovic, Natasa; Reimer, Elsa; Walichnowski, Andrzej; Ward, Kerry; Rowland, Gordon; Duguid, Scott; Banik, Mitali

2012-12-01

Three linkage maps of flax (Linum usitatissimum L.) were constructed from populations CDC Bethune/Macbeth, E1747/Viking and SP2047/UGG5-5 containing between 385 and 469 mapped markers each. The first consensus map of flax was constructed incorporating 770 markers based on 371 shared markers including 114 that were shared by all three populations and 257 shared between any two populations. The 15 linkage group map corresponds to the haploid number of chromosomes of this species. The marker order of the consensus map was largely collinear in all three individual maps but a few local inversions and marker rearrangements spanning short intervals were observed. Segregation distortion was present in all linkage groups which contained 1-52 markers displaying non-Mendelian segregation. The total length of the consensus genetic map is 1,551 cM with a mean marker density of 2.0 cM. A total of 670 markers were anchored to 204 of the 416 fingerprinted contigs of the physical map corresponding to ~274 Mb or 74 % of the estimated flax genome size of 370 Mb. This high resolution consensus map will be a resource for comparative genomics, genome organization, evolution studies and anchoring of the whole genome shotgun sequence.
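The summary figures in the flax abstract above can be cross-checked with a few lines of arithmetic. The sketch uses only numbers quoted in the abstract; the variable names are illustrative.

```python
# Figures quoted in the abstract.
shared_by_all_three = 114
shared_by_two = 257
consensus_length_cm = 1551
consensus_markers = 770
anchored_mb = 274
genome_mb = 370

# Shared markers used to build the consensus map.
shared_markers = shared_by_all_three + shared_by_two

# Mean spacing between markers on the consensus map, in cM per marker.
mean_spacing_cm = consensus_length_cm / consensus_markers

# Fraction of the estimated genome covered by anchored contigs.
anchored_fraction = anchored_mb / genome_mb
```

The results agree with the reported 371 shared markers, 2.0 cM mean spacing and 74% of the genome.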
Refined mapping and YAC contig construction of the X-linked cleft palate and ankyloglossia locus (CPX) including the proximal X-Y homology breakpoint within Xq21.3

DOE Office of Scientific and Technical Information (OSTI.GOV)

Forbes, S.A.; Brennan, L.; Richardson, M.

1996-01-01

The gene for X-linked cleft palate (CPX) has previously been mapped in an Icelandic kindred between the unordered proximal markers DXS1002/DXS349/DXS95 and the distal marker DXYS1X, which maps to the proximal end of the X-Y homology region in Xq21.3. Using six sequence-tagged sites (STSs) within the region, a total of 91 yeast artificial chromosome (YAC) clones were isolated and overlapped in a single contig that spans approximately 3.1 Mb between DXS1002 and DXYS1X. The order of microsatellite and STS markers in this was established as DXS1002-DXS1168-DXS349-DXS95-DXS364-DXS1196-DXS472-DXS1217-DXYS1X. A long-range restriction map of this region was created using eight nonchimeric, overlapping YAC clones. Analysis of newly positioned polymorphic markers in recombinant individuals from the Icelandic family has enabled us to identify DXS1196 and DXS1217 as the flanking markers for CPX. The maximum physical distance containing the CPX gene has been estimated to be 2.0 Mb, which is spanned by a minimum set of five nonchimeric YAC clones. In addition, YAC end clone and STS analyses have pinpointed the location of the proximal boundary of the X-Y homology region within the map. 40 refs., 2 figs., 2 tabs.
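The abstract mentions a minimum set of YAC clones spanning the critical interval but does not say how that set was chosen. Once clone coordinates are known, a minimal tiling path can be found with a standard greedy interval-cover algorithm, sketched below. The clone names and coordinates (in kb) are hypothetical.

```python
def minimal_tiling_path(clones, start, end):
    """Return names of a smallest set of clones covering [start, end].

    clones: list of (name, left, right) coordinates.
    Returns None if the clones leave a gap in the interval.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered_to = start
    i = 0
    while covered_to < end:
        best = None
        # Among clones starting at or before the covered point,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            return None  # no clone extends the coverage: a gap
        path.append(best[0])
        covered_to = best[2]
    return path

# Hypothetical clone coordinates in kb.
clones = [("Y1", 0, 900), ("Y2", 300, 1200), ("Y3", 800, 1700),
          ("Y4", 1100, 1500), ("Y5", 1600, 2000)]
path = minimal_tiling_path(clones, 0, 2000)
```

In practice chimeric clones would be excluded before this step, as the abstract's emphasis on nonchimeric YACs suggests.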
Long-range restriction map of human chromosome 22q11-22q12 between the lambda immunoglobulin locus and the Ewing sarcoma breakpoint

DOE Office of Scientific and Technical Information (OSTI.GOV)

McDermid, H.E.; Budarf, M.L.; Emanuel, B.S.

1993-11-01

A long-range restriction map of the region between the immunoglobulin lambda locus and the Ewing sarcoma breakpoint has been constructed using the rare-cutting enzymes NotI, NruI, AscI, and BsiWI. The map spans approximately 11,000 kb and represents about one-fifth of the long arm of chromosome 22. Thirty-nine markers, including seven NotI junction clones as well as numerous genes and anonymous sequences, were mapped to the region with a somatic cell hybrid panel. These probes were then used to produce the map. The seven NotI junction clones each identified a possible CpG island. The breakpoints of the RAJ5 hybrid and the Ewing sarcoma t(11;22) were also localized in the resulting map. This physical map will be useful in studying chromosomal rearrangements in the region, as well as providing the details to examine the fidelity of the YAC and cosmid contigs currently under construction. Comparisons of this physical map to genetic and radiation hybrid maps are discussed. 52 refs., 7 figs., 3 tabs.
High-throughput physical mapping of chromosomes using automated in situ hybridization.

PubMed

George, Phillip; Sharakhova, Maria V; Sharakhov, Igor V

2012-06-28

Projects to obtain whole-genome sequences for 10,000 vertebrate species and for 5,000 insect and related arthropod species are expected to take place over the next 5 years. For example, the sequencing of the genomes for 15 malaria mosquito species is currently being done using an Illumina platform. This Anopheles species cluster includes both vectors and non-vectors of malaria. When the genome assemblies become available, researchers will have the unique opportunity to perform comparative analysis for inferring evolutionary changes relevant to vector ability. However, it has proven difficult to use next-generation sequencing reads to generate high-quality de novo genome assemblies. Moreover, the existing genome assemblies for Anopheles gambiae, although obtained using the Sanger method, are gapped or fragmented. Success of comparative genomic analyses will be limited if researchers deal with numerous sequencing contigs, rather than with chromosome-based genome assemblies. Fragmented, unmapped sequences create problems for genomic analyses because: (i) unidentified gaps cause incorrect or incomplete annotation of genomic sequences; (ii) unmapped sequences lead to confusion between paralogous genes and genes from different haplotypes; and (iii) the lack of chromosome assignment and orientation of the sequencing contigs does not allow for reconstructing rearrangement phylogeny and studying chromosome evolution. Developing high-resolution physical maps for species with newly sequenced genomes is a timely and cost-effective investment that will facilitate genome annotation, evolutionary analysis, and re-sequencing of individual genomes from natural populations. Here, we present innovative approaches to chromosome preparation, fluorescent in situ hybridization (FISH), and imaging that facilitate rapid development of physical maps. Using An. gambiae as an example, we demonstrate that the development of physical chromosome maps can potentially improve genome assemblies and, thus, the quality of genomic analyses. First, we use a high-pressure method to prepare polytene chromosome spreads. This method, originally developed for Drosophila, allows the user to visualize more details on chromosomes than the regular squashing technique. Second, a fully automated, front-end system for FISH is used for high-throughput physical genome mapping. The automated slide staining system runs multiple assays simultaneously and dramatically reduces hands-on time. Third, an automatic fluorescent imaging system, which includes a motorized slide stage, automatically scans and photographs labeled chromosomes after FISH. This system is especially useful for identifying and visualizing multiple chromosomal plates on the same slide. In addition, the scanning process captures a more uniform FISH result. Overall, the automated high-throughput physical mapping protocol is more efficient than a standard manual protocol.
Report of the Fourth International Workshop on human X chromosome mapping 1993

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schlessinger, D.; Mandel, J.L.; Monaco, A.P.

1993-12-31

Vigorous interactive efforts by the X chromosome community have led to accelerated mapping in the last six months. Seventy-five participants from 12 countries around the globe contributed progress reports to the Fourth International X Chromosome Workshop, at St. Louis, MO, May 9-12, 1993. It became clear that well over half the chromosome is now covered by YAC contigs that are being extended, verified, and aligned by their content of STSs and other markers placed by cytogenetic or linkage mapping techniques. The major aim of the workshop was to assemble the consensus map that appears in this report, summarizing both consensus order and YAC contig information.
A Fine Physical Map of the Rice Chromosome 4

PubMed Central

Zhao, Qiang; Zhang, Yu; Cheng, Zhukuan; Chen, Mingsheng; Wang, Shengyue; Feng, Qi; Huang, Yucheng; Li, Ying; Tang, Yesheng; Zhou, Bo; Chen, Zhehua; Yu, Shuliang; Zhu, Jingjie; Hu, Xin; Mu, Jie; Ying, Kai; Hao, Pei; Zhang, Lei; Lu, Yiqi; Zhang, Lei S.; Liu, Yilei; Yu, Zhen; Fan, Danlin; Weng, Qijun; Chen, Ling; Lu, Tingting; Liu, Xiaohui; Jia, Peixin; Sun, Tongguo; Wu, Yongrui; Zhang, Yujun; Lu, Ying; Li, Can; Wang, Rong; Lei, Haiyan; Li, Tao; Hu, Hao; Wu, Mei; Zhang, Runquan; Guan, Jianping; Zhu, Jia; Fu, Gang; Gu, Minghong; Hong, Guofan; Xue, Yongbiao; Wing, Rod; Jiang, Jiming; Han, Bin

2002-01-01

As part of an international effort to completely sequence the rice genome, we have produced a fine bacterial artificial chromosome (BAC)-based physical map of the Oryza sativa japonica Nipponbare chromosome 4 through an integration of 114 sequenced BAC clones from a taxonomically related subspecies O. sativa indica Guangluai 4 and 182 RFLP and 407 expressed sequence tag (EST) markers with the fingerprinted data of the Nipponbare genome. The map consists of 11 contigs with a total length of 34.5 Mb covering 94% of the estimated chromosome size (36.8 Mb). BAC clones corresponding to telomeres, as well as to the centromere position, were determined by BAC-pachytene chromosome fluorescence in situ hybridization (FISH). This gave rise to an estimated length ratio of 5.13 for the long arm and 2.9 for the short arm (on the basis of the physical map), which indicates that the short arm is a highly condensed one. The FISH analysis and physical mapping also showed that the short arm and the pericentromeric region of the long arm are rich in heterochromatin, which occupied 45% of the chromosome, indicating that this chromosome is likely very difficult to sequence. To our knowledge, this map provides the first example of a rapid and reliable physical mapping on the basis of the integration of the data from two taxonomically related subspecies. [The following individuals and institutions kindly provided reagents, samples, or unpublished information as indicated in the paper: S. McCouch, T. Sasaki, and Monsanto.] PMID:11997348
Genetic and physical mapping at the limb-girdle muscular dystrophy locus (LGMD2B) on chromosome 2p

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bashir, R.; Keers, S.; Strachan, T.

1996-04-01

The limb-girdle muscular dystrophies (LGMD) are a genetically heterogeneous group of disorders, different forms of which have been mapped to at least six distinct genetic loci. We have mapped an autosomal recessive form of LGMD (LGMD2B) to chromosome 2p13. Two other conditions have been shown to map to this region or to the homologous region in mouse: a gene for a form of autosomal recessive distal muscular dystrophy, Miyoshi myopathy, shows linkage to the same markers on chromosome 2p as LGMD2B, and an autosomal recessive mouse mutation mnd2, in which there is rapidly progressive paralysis and muscle atrophy, has been mapped to mouse chromosome 6 to a region showing conserved synteny with human chromosome 2p12-p13. We have assembled a 6-cM YAC contig spanning the LGMD2B locus and have mapped seven genes and 13 anonymous polymorphic microsatellites to it. Using haplotype analysis in the linked families, we have narrowed our region of interest to a 0-cM interval between D2S2113 and D2S145, which does not overlap with the critical region for mnd2 in mouse. Use of these most closely linked markers will help to determine the relationship between LGMD2B and Miyoshi myopathy. YACs selected from our contig will be the starting point for the cloning of the LGMD2B gene and thereby establish the biological basis for this form of muscular dystrophy and its relationship with the other limb-girdle muscular dystrophies. 26 refs., 6 figs.
Radiation hybrid maps of the D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes.

PubMed

Kumar, Ajay; Seetan, Raed; Mergoum, Mohamed; Tiwari, Vijay K; Iqbal, Muhammad J; Wang, Yi; Al-Azzam, Omar; Šimková, Hana; Luo, Ming-Cheng; Dvorak, Jan; Gu, Yong Q; Denton, Anne; Kilian, Andrzej; Lazo, Gerard R; Kianian, Shahryar F

2015-10-16

The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high-resolution genome maps with saturated marker scaffolds to anchor and orient BAC contigs/sequence scaffolds for whole genome assembly. Radiation hybrid (RH) mapping has proven to be an excellent tool for the development of such maps, for it offers much higher and more uniform marker resolution across the length of the chromosome compared to genetic mapping and does not require marker polymorphism per se, as it is based on a presence (retention) vs. absence (deletion) marker assay. In this study, a 178-line RH panel was genotyped with SSR and DArT markers to develop the first high-resolution RH maps of the entire D-genome of Ae. tauschii accession AL8/78. To confirm map order accuracy, the AL8/78-RH maps were compared with: 1) a DArT consensus genetic map constructed using more than 100 bi-parental populations, 2) an RH map of the D-genome of reference hexaploid wheat 'Chinese Spring', and 3) two SNP-based genetic maps, one with anchored D-genome BAC contigs and another with anchored D-genome sequence scaffolds. Using marker sequences, the RH maps were also anchored with a BAC contig based physical map and draft sequence of the D-genome of Ae. tauschii. A total of 609 markers were mapped to 503 unique positions on the seven D-genome chromosomes, with a total map length of 14,706.7 cR. The average distance between any two marker loci was 29.2 cR, which corresponds to 2.1 cM or 9.8 Mb. The average mapping resolution across the D-genome was estimated to be 0.34 Mb/cR or 0.07 cM/cR. The RH maps showed almost perfect agreement with several published maps with regard to chromosome assignments of markers. The mean rank correlations between the position of markers on AL8/78 maps and the four published maps ranged from 0.75 to 0.92, suggesting a good agreement in marker order. With 609 mapped markers, a total of 2481 deletions for the whole D-genome were detected with an average deletion size of 42.0 Mb. A total of 520 markers were anchored to 216 Ae. tauschii sequence scaffolds, 116 of which were not anchored earlier to the D-genome. This study reports the development of the first high-resolution RH maps for the D-genome of Ae. tauschii accession AL8/78, which were then used for the anchoring of unassigned sequence scaffolds. This study demonstrates how RH mapping, which offered high and uniform resolution across the length of the chromosome, can facilitate the complete sequence assembly of large and complex plant genomes.
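The spacing figures in this abstract can be checked with simple arithmetic. The sketch below assumes that the 29.2 cR average was obtained by dividing the total map length by the number of unique positions; the abstract does not state the method, so this is an inference. The variable names are illustrative only. Because the conversion factors are the rounded values quoted above, the Mb and cM results match the reported 9.8 Mb and 2.1 cM only approximately.

```python
# Figures quoted in the abstract
total_length_cr = 14706.7   # total RH map length in centiRays
unique_positions = 503      # unique marker positions
mb_per_cr = 0.34            # reported average resolution (rounded)
cm_per_cr = 0.07            # reported average resolution (rounded)

# Assumption: average spacing = total length / number of unique positions
avg_spacing_cr = total_length_cr / unique_positions

# Convert the average spacing to physical and genetic units
avg_spacing_mb = avg_spacing_cr * mb_per_cr
avg_spacing_cm = avg_spacing_cr * cm_per_cr

print(f"average spacing: {avg_spacing_cr:.1f} cR")
print(f"approx. physical distance: {avg_spacing_mb:.1f} Mb")
print(f"approx. genetic distance: {avg_spacing_cm:.1f} cM")
```

The small differences from the published values are expected when rounded factors are multiplied back.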
Comparative Maps of Human 19p13.3 and Mouse Chromosome 10 Allow Identification of Sequences at Evolutionary Breakpoints

PubMed Central

Puttagunta, Radhika; Gordon, Laurie A.; Meyer, Gary E.; Kapfhamer, David; Lamerdin, Jane E.; Kantheti, Prameela; Portman, Kathleen M.; Chung, Wendy K.; Jenne, Dieter E.; Olsen, Anne S.; Burmeister, Margit

2000-01-01

A cosmid/bacterial artificial chromosome (BAC) contiguous (contig) map of human chromosome (HSA) 19p13.3 has been constructed, and over 50 genes have been localized to the contig. Genes and anonymous ESTs from ≈4000 kb of human 19p13.3 were placed on the central mouse chromosome 10 map by genetic mapping and pulsed-field gel electrophoresis (PFGE) analysis. A region of ∼2500 kb of HSA 19p13.3 is collinear to mouse chromosome (MMU) 10. In contrast, the adjacent ≈1200 kb are inverted. Two genes are located in a 50-kb region after the inversion on MMU 10, followed by a region of homology to mouse chromosome 17. The synteny breakpoint and one of the inversion breakpoints have been localized to sequenced regions in human <5 kb in size. Both breakpoints are rich in simple tandem repeats, including (TCTG)n, (CT)n, and (GTCTCT)n, suggesting that simple repeat sequences may be involved in chromosome breaks during evolution. The overall size of the region in mouse is smaller, although no large regions are missing. Comparing the physical maps to the genetic maps showed that in contrast to the higher-than-average rate of genetic recombination in the gene-rich telomeric region on HSA 19p13.3, the average rate of recombination is lower than expected in the homologous mouse region. This might indicate that a hot spot of recombination may have been lost in mouse or gained in human during evolution, or that the position of sequences along the chromosome (telomeric compared to the middle of a chromosome) is important for recombination rates. PMID:10984455
Global Genomic Diversity of Oryza sativa Varieties Revealed by Comparative Physical Mapping

PubMed Central

Wang, Xiaoming; Kudrna, David A.; Pan, Yonglong; Wang, Hao; Liu, Lin; Lin, Haiyan; Zhang, Jianwei; Song, Xiang; Goicoechea, Jose Luis; Wing, Rod A.; Zhang, Qifa; Luo, Meizhong

2014-01-01

Bacterial artificial chromosome (BAC) physical maps embedding a large number of BAC end sequences (BESs) were generated for Oryza sativa ssp. indica varieties Minghui 63 (MH63) and Zhenshan 97 (ZS97) and were compared with the genome sequences of O. sativa ssp. japonica cv. Nipponbare and O. sativa ssp. indica cv. 93-11. The comparisons exhibited substantial diversities in terms of large structural variations and small substitutions and indels. Genome-wide BAC-sized and contig-sized structural variations were detected, and the shared variations were analyzed. In the expansion regions of the Nipponbare reference sequence, in comparison to the MH63 and ZS97 physical maps, as well as to the previously constructed 93-11 physical map, the amounts and types of the repeat contents, and the outputs of gene ontology analysis, were significantly different from those of the whole genome. Using the physical maps of four wild Oryza species from OMAP (http://www.omap.org) as a control, we detected many conserved and divergent regions related to the evolution process of O. sativa. Between the BESs of MH63 and ZS97 and the two reference sequences, a total of 1532 polymorphic simple sequence repeats (SSRs), 71,383 SNPs, 1767 multiple nucleotide polymorphisms, 6340 insertions, and 9137 deletions were identified. This study provides independent whole-genome resources for intra- and intersubspecies comparisons and functional genomics studies in O. sativa. Both the comparative physical maps and the GBrowse, which integrated the QTL and molecular markers from GRAMENE (http://www.gramene.org) with our physical maps and analysis results, are open to the public through our Web site (http://gresource.hzau.edu.cn/resource/resource.html). PMID:24424778
Fine genetic mapping of spot blotch resistance gene Sb3 in wheat (Triticum aestivum).

PubMed

Lu, Ping; Liang, Yong; Li, Delin; Wang, Zhengzhong; Li, Wenbin; Wang, Guoxin; Wang, Yong; Zhou, Shenghui; Wu, Qiuhong; Xie, Jingzhong; Zhang, Deyun; Chen, Yongxing; Li, Miaomiao; Zhang, Yan; Sun, Qixin; Han, Chenggui; Liu, Zhiyong

2016-03-01

Spot blotch disease resistance gene Sb3 was mapped to a 0.15 centimorgan (cM) genetic interval spanning a 602 kb physical genomic region on chromosome 3BS. Wheat spot blotch disease, caused by B. sorokiniana, is a devastating disease that can cause severe yield losses. Although inoculum levels can be reduced by planting disease-free seed, treatment of plants with fungicides and crop rotation, genetic resistance is likely to be a robust, economical and environmentally friendly tool in the control of spot blotch. The winter wheat line 621-7-1 confers immune resistance against B. sorokiniana. Genetic analysis indicates that the spot blotch resistance of 621-7-1 is controlled by a single dominant gene, provisionally designated Sb3. Bulked segregant analysis (BSA) and simple sequence repeat (SSR) mapping showed that Sb3 is located on chromosome arm 3BS linked with markers Xbarc133 and Xbarc147. Seven and twelve new polymorphic markers were developed from the Chinese Spring 3BS shotgun survey sequence contigs and 3BS reference sequences, respectively. Finally, Sb3 was mapped in a 0.15 cM genetic interval spanning a 602 kb physical genomic region of Chinese Spring chromosome 3BS. The genetic and physical maps of Sb3 provide a framework for map-based cloning and marker-assisted selection (MAS) of the spot blotch resistance.
A contiguous clone map over 3 Mb on the long arm of chromosome 11 across a balanced translocation associated with schizophrenia

DOE Office of Scientific and Technical Information (OSTI.GOV)

Evans, K.L.; Shibasaki, Yoshiro; Devon, R.S.

1995-08-10

Forty-nine clones derived by microdissection of a schizophrenia-associated t(1;11)(q42.1;q14.3) breakpoint region have been assigned by somatic cell hybrid mapping to seven discrete intervals on the long arm of human chromosome 11. Eleven of the clones were shown to map to a small region immediately distal to the translocation breakpoint on 11q. A 3-Mb contiguous clone map of this region was established by isolation of corresponding YAC recombinants. The contig was oriented and shown to traverse the translocation breakpoint by FISH and microsatellite marker analysis. This contig will facilitate the isolation of candidate sequences whose expression may be affected by the translocation. 28 refs., 4 figs., 3 tabs.
Progressive myoclonus epilepsy EPM1 locus maps to a 175-kb interval in distal 21q

DOE Office of Scientific and Technical Information (OSTI.GOV)

Virtaneva, K.; Miao, J.; Traeskelin, A.L.

1996-06-01

The EPM1 locus responsible for progressive myoclonus epilepsy of Unverricht-Lundborg type (MIM 254800) maps to a region in distal chromosome 21q where positional cloning has been hampered by the lack of physical and genetic mapping resolution. We here report the use of a recently constituted contig of cosmid, BAC, and P1 clones that allowed new polymorphic markers to be positioned. These were typed in 53 unrelated disease families from an isolated Finnish population in which a putative single ancestral EPM1 mutation has segregated for an estimated 100 generations. By thus exploiting historical recombinations in haplotype analysis, EPM1 could be assigned to the ~175-kb interval between the markers D21S2040 and D21S1259. 26 refs., 2 figs., 4 tabs.
A Blumeria graminis f.sp. hordei BAC library--contig building and microsynteny studies.

PubMed

Pedersen, Carsten; Wu, Boqian; Giese, Henriette

2002-11-01

A bacterial artificial chromosome (BAC) library of Blumeria graminis f.sp. hordei, containing 12,000 clones with an average insert size of 41 kb, was constructed. The library represents about three genome equivalents and BAC-end sequencing showed a high content of repetitive sequences, making contig-building difficult. To identify overlapping clones, several strategies were used: colony hybridisation, PCR screening, fingerprinting techniques and the use of single-copy expressed sequence tags. The latter proved to be the most efficient method for identification of overlapping clones. Two contigs, at or close to avirulence loci, were constructed. Single nucleotide polymorphism (SNP) markers were developed from BAC-end sequences to link the contigs to the genetic maps. Two other BAC contigs were used to study microsynteny between B. graminis and two other ascomycetes, Neurospora crassa and Aspergillus fumigatus. The library provides an invaluable tool for the isolation of avirulence genes from B. graminis and for the study of gene synteny between this fungus and other fungi.
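The library figures above are linked by the usual coverage relation: genome equivalents = (number of clones × average insert size) / genome size. The sketch below is illustrative arithmetic only. The genome size it prints is merely what "about three genome equivalents" would imply if taken as exactly three; it is not a value reported in the abstract.

```python
# Figures quoted in the abstract
n_clones = 12_000           # clones in the BAC library
avg_insert_kb = 41          # average insert size in kb
genome_equivalents = 3      # "about three" in the abstract; assumed exact here

# Total cloned DNA in the library, in Mb
total_insert_mb = n_clones * avg_insert_kb / 1000

# Genome size implied by the stated coverage (an inference, not a reported value)
implied_genome_mb = total_insert_mb / genome_equivalents

print(f"total cloned DNA: {total_insert_mb:.0f} Mb")
print(f"implied genome size at 3x coverage: {implied_genome_mb:.0f} Mb")
```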
Physical Mapping in a Triplicated Genome: Mapping the Downy Mildew Resistance Locus Pp523 in Brassica oleracea L.

PubMed Central

Carlier, Jorge D.; Alabaça, Claudia S.; Sousa, Nelson H.; Coelho, Paula S.; Monteiro, António A.; Paterson, Andrew H.; Leitão, José M.

2011-01-01

We describe the construction of a BAC contig and identification of a minimal tiling path that encompass the dominant and monogenically inherited downy mildew resistance locus Pp523 of Brassica oleracea L. The selection of BAC clones for construction of the physical map was carried out by screening gridded BAC libraries with DNA overgo probes derived from both genetically mapped DNA markers flanking the locus of interest and BAC-end sequences that align to Arabidopsis thaliana sequences within the previously identified syntenic region. The selected BAC clones consistently mapped to three different genomic regions of B. oleracea. Although 83 BAC clones were accurately mapped within a ∼4.6 cM region surrounding the downy mildew resistance locus Pp523, a subset of 33 BAC clones mapped to another region on chromosome C8 that was ∼60 cM away from the resistance gene, and a subset of 63 BAC clones mapped to chromosome C5. These results reflect the triplication of the Brassica genomes since their divergence from a common ancestor shared with A. thaliana, and they are consonant with recent analyses of the C genome of Brassica napus. The assembly of a minimal tiling path constituted by 13 (BoT01) BAC clones that span the Pp523 locus sets the stage for map-based cloning of this resistance gene. PMID:22384370
A comprehensive map of the porcine genome.

PubMed

Rohrer, G A; Alexander, L J; Hu, Z; Smith, T P; Keele, J W; Beattie, C W

1996-05-01

We report the highest density genetic linkage map for a livestock species produced to date. Three published maps for Sus scrofa were merged by genotyping virtually every publicly available microsatellite across a single reference population to yield 1042 linked loci, 536 of which are novel assignments, spanning 2286.2 cM (average interval 2.23 cM) in 19 linkage groups (18 autosomal and X chromosomes, n = 19). Linkage groups were constructed de novo and mapped by locus content to avoid propagation of errors in older genotypes. The physical and genetic maps were integrated with 123 informative loci assigned previously by fluorescence in situ hybridization (FISH). Fourteen linkage groups span the entire length of each chromosome. Coverage of chromosomes 11, 12, 15, and 18 will be evaluated as more markers are physically assigned. Marker-deficient regions were identified only on 11q1.7-qter and 14 cen-q1.2. Recombination rates (cM/Mbp) varied between and within chromosomes. Short chromosomal arms recombined at higher rates than long arms, and recombination was more frequent in telomeric regions than in pericentric regions. The high-resolution comprehensive map has the marker density needed to identify quantitative trait loci (QTL), implement marker-assisted selection or introgression and YAC contig construction or chromosomal microdissection.
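The reported average interval of 2.23 cM can be reproduced from the other figures in the abstract under one assumption that the abstract does not state: intervals are counted only between adjacent loci within a linkage group, so the number of intervals is the number of loci minus the number of linkage groups. The sketch below shows that reading; dividing by the number of loci instead gives a slightly smaller value.

```python
# Figures quoted in the abstract
n_loci = 1042             # linked loci
n_linkage_groups = 19     # linkage groups
map_length_cm = 2286.2    # total map length in cM

# Assumption: each linkage group with k loci contributes k - 1 intervals
n_intervals = n_loci - n_linkage_groups
avg_interval_cm = map_length_cm / n_intervals

# Alternative for comparison: divide by the number of loci
per_locus_cm = map_length_cm / n_loci

print(f"intervals: {n_intervals}")
print(f"average interval: {avg_interval_cm:.2f} cM")
print(f"map length per locus: {per_locus_cm:.2f} cM")
```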
Comparative genomics identifies candidate genes for infectious salmon anemia (ISA) resistance in Atlantic salmon (Salmo salar).

PubMed

Li, Jieying; Boroevich, Keith A; Koop, Ben F; Davidson, William S

2011-04-01

Infectious salmon anemia (ISA) has been described as the hoof and mouth disease of salmon farming. ISA is caused by a lethal and highly communicable virus, which can have a major impact on salmon aquaculture, as demonstrated by an outbreak in Chile in 2007. A quantitative trait locus (QTL) for ISA resistance has been mapped to three microsatellite markers on linkage group (LG) 8 (Chr 15) on the Atlantic salmon genetic map. We identified bacterial artificial chromosome (BAC) clones and three fingerprint contigs from the Atlantic salmon physical map that contain these markers. We made use of the extensive BAC end sequence database to extend these contigs by chromosome walking and identified two additional markers in this region. The BAC end sequences were used to search for conserved synteny between this segment of LG8 and the fish genomes that have been sequenced. An examination of the genes in the syntenic segments of the tetraodon and medaka genomes identified candidates for association with ISA resistance in Atlantic salmon based on differential expression profiles from ISA challenges or on the putative biological functions of the proteins they encode. One gene in particular, HIV-EP2/MBP-2, caught our attention as it may influence the expression of several genes that have been implicated in the response to infection by infectious salmon anemia virus (ISAV). Therefore, we suggest that HIV-EP2/MBP-2 is a very strong candidate for the gene associated with the ISAV resistance QTL in Atlantic salmon and is worthy of further study.
The Peach v2.0 release: high-resolution linkage mapping and deep resequencing improve chromosome-scale assembly and contiguity.

PubMed

Verde, Ignazio; Jenkins, Jerry; Dondini, Luca; Micali, Sabrina; Pagliarani, Giulia; Vendramin, Elisa; Paris, Roberta; Aramini, Valeria; Gazza, Laura; Rossini, Laura; Bassi, Daniele; Troggio, Michela; Shu, Shengqiang; Grimwood, Jane; Tartarini, Stefano; Dettori, Maria Teresa; Schmutz, Jeremy

2017-03-11

The availability of the peach genome sequence has fostered relevant research in peach and related Prunus species enabling the identification of genes underlying important horticultural traits as well as the development of advanced tools for genetic and genomic analyses. The first release of the peach genome (Peach v1.0) represented a high-quality WGS (Whole Genome Shotgun) chromosome-scale assembly with high contiguity (contig L50 214.2 kb), large portions of mapped sequences (96%) and high base accuracy (99.96%). The aim of this work was to improve the quality of the first assembly by increasing the portion of mapped and oriented sequences, correcting misassemblies and improving the contiguity and base accuracy using high-throughput linkage mapping and deep resequencing approaches. Four linkage maps with 3,576 molecular markers were used to improve the portion of mapped and oriented sequences (from 96.0% and 85.6% of Peach v1.0 to 99.2% and 98.2% of v2.0, respectively) and enabled a more detailed identification of discernible misassemblies (10.4 Mb in total). The deep resequencing approach fixed 859 homozygous SNPs (Single Nucleotide Polymorphisms) and 1347 homozygous indels. Moreover, the assembled NGS contigs enabled the closing of 212 gaps with an improvement in the contig L50 of 19.2%. The improved high quality peach genome assembly (Peach v2.0) represents a valuable tool for the analysis of the genetic diversity, domestication, and as a vehicle for genetic improvement of peach and related Prunus species. Moreover, the important phylogenetic position of peach and the absence of recent whole genome duplication (WGD) events make peach a pivotal species for comparative genomics studies aiming at elucidating plant speciation and diversification processes.

Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver.

PubMed

Wymant, Chris; Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ong, Swee Hoe; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Berkhout, Ben; Cornelissen, Marion; Kellam, Paul; Reiss, Peter; Fraser, Christophe

2018-01-01

Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user's choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver's constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver.
Identification of candidate genes and molecular markers for heat-induced brown discoloration of seed coats in cowpea [Vigna unguiculata (L.) Walp].

PubMed

Pottorff, Marti; Roberts, Philip A; Close, Timothy J; Lonardi, Stefano; Wanamaker, Steve; Ehlers, Jeffrey D

2014-05-01

Heat-induced browning (Hbs) of seed coats is caused by high temperatures, which discolor the seed coats of many legumes, affecting the visual appearance and quality of seeds. The genetic determinants underlying Hbs in cowpea are unknown. We identified three QTL associated with the heat-induced browning of seed coats trait, Hbs-1, Hbs-2 and Hbs-3, using cowpea RIL populations IT93K-503-1 (Hbs positive) x CB46 (hbs negative) and IT84S-2246 (Hbs positive) x TVu14676 (hbs negative). Hbs-1 was identified in both populations, accounting for 28.3% to 77.3% of the phenotypic variation. SNP markers 1_0032 and 1_1128 co-segregated with the trait. Within the syntenic regions of Hbs-1 in soybean, Medicago and common bean, several ethylene forming enzymes, ethylene responsive element binding factors and an ACC oxidase 2 were observed. Hbs-1 was identified in a BAC clone in contig 217 of the cowpea physical map, where ethylene forming enzymes were present. Hbs-2 was identified in the IT93K-503-1 x CB46 population and accounted for 9.5 to 12.3% of the phenotypic variance. Hbs-3 was identified in the IT84S-2246 x TVu14676 population and accounted for 6.2 to 6.8% of the phenotypic variance. SNP marker 1_0640 co-segregated with the heat-induced browning phenotype. Hbs-3 was positioned on BAC clones in contig 512 of the cowpea physical map, where several ACC synthase 1 genes were present. The identification of loci determining heat-induced browning of seed coats and co-segregating molecular markers will enable transfer of hbs alleles into cowpea varieties, contributing to higher quality seeds.
Linkage maps of the Atlantic salmon (Salmo salar) genome derived from RAD sequencing

PubMed Central

2014-01-01

Background Genetic linkage maps are useful tools for mapping quantitative trait loci (QTL) influencing variation in traits of interest in a population. Genotyping-by-sequencing approaches such as Restriction-site Associated DNA sequencing (RAD-Seq) now enable the rapid discovery and genotyping of genome-wide SNP markers suitable for the development of dense SNP linkage maps, including in non-model organisms such as Atlantic salmon (Salmo salar). This paper describes the development and characterisation of a high density SNP linkage map based on SbfI RAD-Seq SNP markers from two Atlantic salmon reference families. Results Approximately 6,000 SNPs were assigned to 29 linkage groups, utilising markers from known genomic locations as anchors. Linkage maps were then constructed for the four mapping parents separately. Overall map lengths were comparable between male and female parents, but the distribution of the SNPs showed sex-specific patterns with a greater degree of clustering of sire-segregating SNPs to single chromosome regions. The maps were integrated with the Atlantic salmon draft reference genome contigs, allowing the unique assignment of ~4,000 contigs to a linkage group. 112 genome contigs mapped to two or more linkage groups, highlighting regions of putative homeology within the salmon genome. A comparative genomics analysis with the stickleback reference genome identified putative genes closely linked to approximately half of the ordered SNPs and demonstrated blocks of orthology between the Atlantic salmon and stickleback genomes. A subset of 47 RAD-Seq SNPs were successfully validated using a high-throughput genotyping assay, with a correspondence of 97% between the two assays. Conclusions This Atlantic salmon RAD-Seq linkage map is a resource for salmonid genomics research as genotyping-by-sequencing becomes increasingly common. This is aided by the integration of the SbfI RAD-Seq SNPs with existing reference maps and the draft reference genome, as well as the identification of putative genes proximal to the SNPs. Differences in the distribution of recombination events between the sexes are evident, and regions of homeology have been identified which are reflective of the recent salmonid whole genome duplication. PMID:24571138
Mapping of the 3q27 region involved in Dup(3q) syndrome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rizzu, P.; Baldini, A.; Overhauser, J.

1994-09-01

The duplication 3q syndrome is characterized by partial trisomy of a segment of the long arm of chromosome 3. We have previously found that 3q26.3-3q27 is the minimal region of trisomy overlap. This critical region (CR) is delimited by two patient chromosome breakpoints, approximately 10 cM apart. In order to identify the gene(s) responsible for the Dup(3q) phenotype, we are generating a physical map of the region and identifying expressed sequences. First, we have generated a cytological map using two- and three-color fluorescence in situ hybridization on metaphase and interphase chromosomes. Results allowed us to determine the centromere-telomere orientation, order and relative distances of six cosmid clones mapped to the CR. Because some of the markers used are part of the consensus chromosome 3 map, our data were easily integrated with existing mapping information. Subsequently, we have included in the map YAC clones positive for polymorphic PCR markers identified by CEPH-Genethon, as well as newly isolated YACs. We have assigned 7 of the Genethon polymorphic markers to the critical region and linked them to three YAC contigs. Currently our map includes two of the five genes known to map in this region. Interestingly, we found that these two functionally related genes (kininogen and histidine-rich glycoprotein) map to the same 1 Mb genomic fragment. As the physical map is being constructed we are searching for expressed sequences. Positive cDNAs have been found and their characterization is in progress. In conclusion, we will present an integrated map of 3q27 that includes genetic, physical and cytological information as well as gene annotation. As Dup(3q) syndrome is likely to be a contiguous gene syndrome, such a map will be necessary for our understanding of this multiple congenital anomaly.
Physical mapping of chromosome 17p13.3 in the region of a putative tumor suppressor gene important in medulloblastoma

DOE Office of Scientific and Technical Information (OSTI.GOV)

McDonald, J.D.; Daneshvar, L.; Willert, J.R.

1994-09-01

Deletion mapping of a medulloblastoma tumor panel revealed loss of distal chromosome 17p13.3 sequences in tumors from 14 of 32 patients (44%). Of the 14 tumors showing loss of heterozygosity by restriction fragment length polymorphism analysis, 14 of 14 (100%) displayed loss of the telomeric marker p144-D6 (D17S34), while a probe for the ABR gene on 17p13.3 was lost in 7 of 8 (88%) informative cases. Using pulsed-field gel electrophoresis, we localized the polymorphic marker (VNTR-A) of the ABR gene locus to within 220 kb of the p144-D6 locus. A cosmid contig constructed in this region was used to demonstrate by fluorescence in situ hybridization that the ABR gene is oriented transcriptionally 5′ to 3′ toward the telomere. This report provides new physical mapping data for the ABR gene, which has not been previously shown to be deleted in medulloblastoma. These results provide further evidence for the existence of a second tumor suppressor gene distinct from p53 on distal chromosome 17p. 12 refs., 3 figs.
Radiation hybrid maps of D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes

USDA-ARS's Scientific Manuscript database

The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high-resolution genome maps saturated with ordered markers to assist in anchoring and orienting BAC contigs/ sequence scaffolds for whole genome sequence assembly. Radiation hybrid (RH) mapping has proven to be an e...
Toward an Integrated BAC Library Resource for Genome Sequencing and Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Simon, M. I.; Kim, U.-J.

We developed a great deal of expertise in building large BAC libraries from a variety of DNA sources including humans, mice, corn, microorganisms, worms, and Arabidopsis. We greatly improved the technology for screening these libraries rapidly and for selecting appropriate BACs and mapping BACs to develop large overlapping contigs. We became involved in supplying BACs and BAC contigs to a variety of sequencing and mapping projects and we began to collaborate with Drs. Adams and Venter at TIGR and with Dr. Leroy Hood and his group at University of Washington to provide BACs for end sequencing and for mapping and sequencing of large fragments of chromosome 16. Together with Dr. Ian Dunham and his co-workers at the Sanger Center we completed the mapping and they completed the sequencing of the first human chromosome, chromosome 22. This was published in Nature in 1999 and our BAC contigs made a major contribution to this sequencing effort. Drs. Shizuya and Ding invented an automated highly accurate BAC mapping technique. We also developed long-term collaborations with Dr. Uli Weier at UCSF in the design of BAC probes for characterization of human tumors and specific chromosome deletions and breakpoints. Finally the contribution of our work to the human genome project has been recognized in the publication both by the international consortium and the NIH of a draft sequence of the human genome in Nature last year. Dr. Shizuya was acknowledged in the authorship of that landmark paper. Dr. Simon was also an author on the Venter/Adams Celera project sequencing the human genome that was published in Science last year.
The European sea bass Dicentrarchus labrax genome puzzle: comparative BAC-mapping and low coverage shotgun sequencing

PubMed Central

2010-01-01

Background Food supply from the ocean is constrained by the shortage of domesticated and selected fish. Development of genomic models of economically important fishes should assist with the removal of this bottleneck. European sea bass Dicentrarchus labrax L. (Moronidae, Perciformes, Teleostei) is one of the most important fishes in European marine aquaculture; growing genomic resources put it on its way to serve as an economic model. Results End sequencing of a sea bass genomic BAC-library enabled the comparative mapping of the sea bass genome using the three-spined stickleback Gasterosteus aculeatus genome as a reference. BAC-end sequences (102,690) were aligned to the stickleback genome. The number of mappable BACs was improved using a two-fold coverage WGS dataset of sea bass resulting in a comparative BAC-map covering 87% of stickleback chromosomes with 588 BAC-contigs. The minimum size of 83 contigs covering 50% of the reference was 1.2 Mbp; the largest BAC-contig comprised 8.86 Mbp. More than 22,000 BAC-clones aligned with both ends to the reference genome. Intra-chromosomal rearrangements between sea bass and stickleback were identified. Size distributions of mapped BACs were used to calculate that the genome of sea bass may be only 1.3 fold larger than the 460 Mbp stickleback genome. Conclusions The BAC map is used for sequencing single BACs or BAC-pools covering defined genomic entities by second generation sequencing technologies. Together with the WGS dataset it initiates a sea bass genome sequencing project. This will allow the quantification of polymorphisms through resequencing, which is important for selecting highly performing domesticated fish. PMID:20105308
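The statistic quoted above ("the minimum size of 83 contigs covering 50% of the reference") is an N50-style measure. The sketch below shows how such a value is commonly computed. The function name `n50_stats` and the contig lengths are hypothetical and are not data from the study; only the 8.86 Mbp largest contig echoes the abstract. Passing `target_total` computes the statistic against a reference size rather than the assembly size, which appears to be closer to what the abstract describes.

```python
def n50_stats(lengths, target_total=None):
    """Return (length, count) at which the largest contigs reach half the target.

    lengths: contig lengths in any consistent unit.
    target_total: size to cover half of; defaults to the sum of all contigs.
    Returns None if the contigs never reach half of the target.
    """
    total = sum(lengths) if target_total is None else target_total
    half = total / 2
    running = 0
    # Walk contigs from largest to smallest, accumulating covered length
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= half:
            return length, count
    return None


# Hypothetical contig lengths in kb (illustration only)
example_contigs_kb = [8860, 5000, 3000, 2000, 1200, 1000, 500]

n50_length, n50_count = n50_stats(example_contigs_kb)
print(f"N50 length: {n50_length} kb, reached with {n50_count} contigs")
```

With a reference larger than the contigs can cover, for example `n50_stats(example_contigs_kb, target_total=50000)`, the function returns `None` rather than a misleading value.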
Contig Maps and Genomic Sequencing Identify Candidate Genes in the Usher 1C Locus

PubMed Central

Higgins, Michael J.; Day, Colleen D.; Smilinich, Nancy J.; Ni, L.; Cooper, Paul R.; Nowak, Norma J.; Davies, Chris; de Jong, Pieter J.; Hejtmancik, Fielding; Evans, Glen A.; Smith, Richard J.H.; Shows, Thomas B.

1998-01-01

Usher syndrome 1C (USH1C) is a congenital condition manifesting profound hearing loss, the absence of vestibular function, and eventual retinal degeneration. The USH1C locus has been mapped genetically to a 2- to 3-cM interval in 11p14–15.1 between D11S899 and D11S861. In an effort to identify the USH1C disease gene we have isolated the region between these markers in yeast artificial chromosomes (YACs) using a combination of STS content mapping and Alu–PCR hybridization. The YAC contig is ∼3.5 Mb and has located several other loci within this interval, resulting in the order CEN-LDHA-SAA1-TPH-D11S1310-(D11S1888/KCNC1)-MYOD1-D11S902-D11S921-D11S1890-TEL. Subsequent haplotyping and homozygosity analysis refined the location of the disease gene to a 400-kb interval between D11S902 and D11S1890 with all affected individuals being homozygous for the internal marker D11S921. To facilitate gene identification, the critical region has been converted into P1 artificial chromosome (PAC) clones using sequence-tagged sites (STSs) mapped to the YAC contig, Alu–PCR products generated from the YACs, and PAC end probes. A contig of >50 PAC clones has been assembled between D11S1310 and D11S1890, confirming the order of markers used in haplotyping. Three PAC clones representing nearly two-thirds of the USH1C critical region have been sequenced. PowerBLAST analysis identified six clusters of expressed sequence tags (ESTs), two known genes (BIR, SUR1) mapped previously to this region, and a previously characterized but unmapped gene NEFA (DNA binding/EF hand/acidic amino-acid-rich). GRAIL analysis identified 11 CpG islands and 73 exons of excellent quality. These data allowed the construction of a transcription map for the USH1C critical region, consisting of three known genes and six or more novel transcripts. Based on their map location, these loci represent candidate disease loci for USH1C. The NEFA gene was assessed as the USH1C locus by the sequencing of an amplified NEFA cDNA from an USH1C patient; however, no mutations were detected. [The sequence data described in this paper have been submitted to GenBank under accession numbers AC000406–AC000407.] PMID:9445488
An integrated genetic and physical map of the autosomal recessive polycystic kidney disease region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lens, X.M.; Onuchic, L.F.; Daoust, M.

1997-05-01

Autosomal recessive polycystic kidney disease is one of the most common hereditary renal cystic diseases in children. Genetic studies have recently assigned the only known locus for this disorder, PKHD1, to chromosome 6p21-p12. We have generated a YAC contig that spans approximately 5 cM of this region, defined by the markers D6S1253-D6S295, and have mapped 43 sequence-tagged sites (STS) within this interval. This set includes 20 novel STSs, which define 12 unique positions in the region, and three ESTs. A minimal set of two YACs spans the segment D6S465-D6S466, which contains PKHD1, and estimates of their sizes based on information in public databases suggest that the size of the critical region is <3.1 Mb. Twenty-eight STSs map to this interval, giving an average STS density of <1/150 kb. These resources will be useful for establishing a complete transcription map of the PKHD1 region. 10 refs., 1 fig., 1 tab.
Development of an Expressed Sequence Tag (EST) Resource for Wheat (Triticum aestivum L.)

PubMed Central

Lazo, G. R.; Chao, S.; Hummel, D. D.; Edwards, H.; Crossman, C. C.; Lui, N.; Matthews, D. E.; Carollo, V. L.; Hane, D. L.; You, F. M.; Butler, G. E.; Miller, R. E.; Close, T. J.; Peng, J. H.; Lapitan, N. L. V.; Gustafson, J. P.; Qi, L. L.; Echalier, B.; Gill, B. S.; Dilbirligi, M.; Randhawa, H. S.; Gill, K. S.; Greene, R. A.; Sorrells, M. E.; Akhunov, E. D.; Dvořák, J.; Linkiewicz, A. M.; Dubcovsky, J.; Hossain, K. G.; Kalavacharla, V.; Kianian, S. F.; Mahmoud, A. A.; Miftahudin; Ma, X.-F.; Conley, E. J.; Anderson, J. A.; Pathan, M. S.; Nguyen, H. T.; McGuire, P. E.; Qualset, C. O.; Anderson, O. D.

2004-01-01

This report describes the rationale, approaches, organization, and resource development leading to a large-scale deletion bin map of the hexaploid (2n = 6x = 42) wheat genome (Triticum aestivum L.). Accompanying reports in this issue detail results from chromosome bin-mapping of expressed sequence tags (ESTs) representing genes onto the seven homoeologous chromosome groups and a global analysis of the entire mapped wheat EST data set. Among the resources developed were the first extensive public wheat EST collection (113,220 ESTs). Described are protocols for sequencing, sequence processing, EST nomenclature, and the assembly of ESTs into contigs. These contigs plus singletons (unassembled ESTs) were used for selection of distinct sequence motif unigenes. Selected ESTs were rearrayed, validated by 5′ and 3′ sequencing, and amplified for probing a series of wheat aneuploid and deletion stocks. Images and data for all Southern hybridizations were deposited in databases and were used by the coordinators for each of the seven homoeologous chromosome groups to validate the mapping results. Results from this project have established the foundation for future developments in wheat genomics. PMID:15514037
Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver

PubMed Central

Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Cornelissen, Marion; Kellam, Paul; Reiss, Peter

2018-01-01

Abstract Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user’s choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver’s constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. 
We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver. PMID:29876136
Characterisation of the transcriptome of a wild great tit Parus major population by next generation sequencing

PubMed Central

2011-01-01

Background The recent development of next generation sequencing technologies has made it possible to generate very large amounts of sequence data in species with little or no genome information. Combined with the large phenotypic databases available for wild and non-model species, these data will provide an unprecedented opportunity to "genomicise" ecological model organisms and establish the genetic basis of quantitative traits in natural populations. Results This paper describes the sequencing, de novo assembly and analysis from the transcriptome of eight tissues of ten wild great tits. Approximately 4.6 million sequences and 1.4 billion bases of DNA were generated and assembled into 95,979 contigs, one third of which aligned with known Taeniopygia guttata (zebra finch) and Gallus gallus (chicken) transcripts. The majority (78%) of the remaining contigs aligned within or very close to regions of the zebra finch genome containing known genes, suggesting that they represented precursor mRNA rather than untranscribed genomic DNA. More than 35,000 single nucleotide polymorphisms and 10,000 microsatellite repeats were identified. Eleven percent of contigs were expressed in every tissue, while twenty one percent of contigs were expressed in only one tissue. The function of those contigs with strong evidence for tissue specific expression and contigs expressed in every tissue was inferred from the gene ontology (GO) terms associated with these contigs; heart and pancreas had the highest number of highly tissue specific GO terms (21.4% and 28.5% respectively). Conclusions In summary, the transcriptomic data generated in this study will contribute towards efforts to assemble and annotate the great tit genome, as well as providing the markers required to perform gene mapping studies in wild populations. PMID:21635727
Improved assemblies using a source-agnostic pipeline for MetaGenomic Assembly by Merging (MeGAMerge) of contigs

DOE PAGES

Scholz, Matthew; Lo, Chien -Chi; Chain, Patrick S. G.

2014-10-01

Assembly of metagenomic samples is a very complex process, with algorithms designed to address sequencing platform-specific issues (read length, data volume, and/or community complexity), while also faced with genomes that differ greatly in nucleotide compositional biases and in abundance. To address these issues, we have developed a post-assembly process: MetaGenomic Assembly by Merging (MeGAMerge). We compare this process to the performance of several assemblers, using both real and in-silico generated samples of different community composition and complexity. MeGAMerge consistently outperforms individual assembly methods, producing larger contigs with an increased number of predicted genes, without replication of data. MeGAMerge contigs are supported by read mapping and contig alignment data, when using synthetically-derived and real metagenomic data, as well as by gene prediction analyses and similarity searches. Ultimately, MeGAMerge is a flexible method that generates improved metagenome assemblies, with the ability to accommodate upcoming sequencing platforms, as well as present and future assembly algorithms.
YAC contigs covering an 8-megabase region of 3p deleted in the small-cell lung cancer cell line U2020

DOE Office of Scientific and Technical Information (OSTI.GOV)

Todd, S.; Bolin, R.; Drabkin, H.A.

1995-01-01

Somatic deletions of chromosome 3p occur at high frequencies in cancers of kidney, breast, cervix, head and neck, nasopharynx, and lung. The frequency of 3p deletion in lung cancer approaches 100% among small cell lesions and 70 to 80% in non-small cell lesions. This evidence strongly implies that one or more tumor suppressor genes of potentially widespread significance reside within the deleted region(s). Precise definition of the deleted target region(s) has been difficult due to the extensive area(s) lost and use of markers with low informativeness. However, improved definition remains essential to permit isolation of putative tumor suppressor genes from 3p. The identification of several small, homozygous 3p deletions in lung cancer cell lines has provided a critical resource that will assist this search. The U2020 cell line contains a small homozygous deletion that maps to a very proximal region of 3p and includes the marker D3S3. We previously identified a subset of DNA markers located within the deleted region and determined their relative order by pulsed-field gel mapping studies. In the present report, we describe the development of YAC contigs that span the majority of the deleted region and link up to flanking markers on both sides. The centromere proximal portion of the contig crosses the breakpoint from an X;3 translocation located within 3p12 providing both location and orientation to the map. PCR-based (CA)n microsatellite polymorphisms have been localized within and flanking the deletion region. These markers should greatly facilitate loss-of-heterozygosity studies of this region in human cancer. The contig provides a direct means for isolation of putative tumor suppressor genes from this segment of 3p. 51 refs., 3 figs., 3 tabs.
Transcriptome Analysis of Nine Tissues to Discover Genes Involved in the Biosynthesis of Active Ingredients in Sophora flavescens.

PubMed

Han, Rongchun; Takahashi, Hiroki; Nakamura, Michimi; Bunsupa, Somnuk; Yoshimoto, Naoko; Yamamoto, Hirobumi; Suzuki, Hideyuki; Shibata, Daisuke; Yamazaki, Mami; Saito, Kazuki

2015-01-01

Sophora flavescens AITON (kurara) has long been used to treat various diseases. Although several research findings revealed the biosynthetic pathways of its characteristic chemical components as represented by matrine, insufficient analysis of transcriptome data hampered in-depth analysis of the underlying putative genes responsible for the biosynthesis of pharmaceutical chemical components. In this study, more than 200 million fastq format reads were generated by Illumina's next-generation sequencing approach using nine types of tissue from S. flavescens, followed by CLC de novo assembly, ultimately yielding 83,325 contigs in total. By mapping the reads back to the contigs, reads per kilobase of the transcript per million mapped reads values were calculated to demonstrate gene expression levels, and overrepresented gene ontology terms were evaluated using Fisher's exact test. In search of the putative genes relevant to essential metabolic pathways, all 1350 unique enzyme commission numbers were used to map pathways against the Kyoto Encyclopedia of Genes and Genomes. By analyzing expression patterns, we proposed some candidate genes involved in the biosynthesis of isoflavonoids and quinolizidine alkaloids. Adopting RNA-Seq analysis, we obtained substantially credible contigs for downstream work. The preferential expression of the gene for putative lysine/ornithine decarboxylase committed in the initial step of matrine biosynthesis in leaves and stems was confirmed in semi-quantitative polymerase chain reaction (PCR) analysis. The findings in this report may serve as a stepping-stone for further research into this promising medicinal plant.
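The expression measure named in this abstract, reads per kilobase of transcript per million mapped reads (RPKM), can be illustrated with a short sketch. This is the standard textbook formula rather than the authors' actual pipeline, and the function name and example numbers below are hypothetical.

```python
def rpkm(read_count: int, contig_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads / (length in kb * total mapped reads in millions)
         = reads * 1e9 / (length in bp * total mapped reads)
    """
    if contig_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("contig length and total mapped reads must be positive")
    return read_count * 1e9 / (contig_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp contig in a library
# with 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
```

Normalising by both contig length and library size is what makes values comparable between long and short contigs and between tissues sequenced to different depths.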
Physical mapping of a pollen modifier locus controlling self-incompatibility in apricot and synteny analysis within the Rosaceae.

PubMed

Zuriaga, Elena; Molina, Laura; Badenes, María Luisa; Romero, Carlos

2012-06-01

S-locus products (S-RNase and F-box proteins) are essential for the gametophytic self-incompatibility (GSI) specific recognition in Prunus. However, accumulated genetic evidence suggests that other S-locus unlinked factors are also required for GSI. For instance, GSI breakdown was associated with a pollen-part mutation unlinked to the S-locus in the apricot (Prunus armeniaca L.) cv. 'Canino'. Fine-mapping of this mutated modifier gene (M-locus) and the synteny analysis of the M-locus within the Rosaceae are here reported. A segregation distortion loci mapping strategy, based on a selectively genotyped population, was used to map the M-locus. In addition, a bacterial artificial chromosome (BAC) contig was constructed for this region using overlapping oligonucleotides probes, and BAC-end sequences (BES) were blasted against Rosaceae genomes to perform micro-synteny analysis. The M-locus was mapped to the distal part of chr.3 flanked by two SSR markers within an interval of 1.8 cM corresponding to ~364 Kb in the peach (Prunus persica L. Batsch) genome. In the integrated genetic-physical map of this region, BES were mapped against the peach scaffold_3 and BACs were anchored to the apricot map. Micro-syntenic blocks were detected in apple (Malus × domestica Borkh.) LG17/9 and strawberry (Fragaria vesca L.) FG6 chromosomes. The M-locus fine-scale mapping provides a solid basis for self-compatibility marker-assisted selection and for positional cloning of the underlying gene, a necessary goal to elucidate the pollen rejection mechanism in Prunus. In a wider context, the syntenic regions identified in peach, apple and strawberry might be useful to interpret GSI evolution in Rosaceae.
High-resolution human/goat comparative map of the goat polled/intersex syndrome (PIS): the human homologue is contained in a human YAC from HSA3q23.

PubMed

Vaiman, D; Schibler, L; Oustry-Vaiman, A; Pailhoux, E; Goldammer, T; Stevanovic, M; Furet, J P; Schwerin, M; Cotinot, C; Fellous, M; Cribiu, E P

1999-02-15

The genetic and cytogenetic map around the chromosome 1 region shown to be linked with polledness and intersexuality (PIS) in the domestic goat (Capra hircus) was refined. For this purpose, a goat BAC library was systematically screened with primers from human coding sequences, scraped chromosome 1 DNA, bovine microsatellites from the region, and BAC ends. All the BACs (n = 30) were mapped by fluorescence in situ hybridization (FISH) on goat chromosome 1q41-q45. The genetic mapping of 30 new goat polymorphic markers, isolated from these BACs, made it possible to reduce the PIS interval to a region of less than 1 cM on goat chromosome 1q43. The PIS locus is now located between the two genes ATP1B and COP, which both map to 3q23 in humans. Genetic, cytogenetic, and comparative data suggest that the PIS region is now probably circumscribed to an approximately 1-Mb DNA segment for which construction of a BAC contig is in progress. In addition, a human YAC contig encompassing the blepharophimosis-ptosis-epicanthus-inversus region was mapped by FISH to goat chromosome 1q43. This human disease, mapped to HSA 3q23 and affecting the development and maintenance of ovarian function, could be a potential candidate for goat PIS. Copyright 1999 Academic Press.
Comparison of the chromosome maps around a resistance hot spot on chromosome 5 of potato and tomato using BAC-FISH painting.

PubMed

Achenbach, Ute C; Tang, Xiaomin; Ballvora, Agim; de Jong, Hans; Gebhardt, Christiane

2010-02-01

Potato chromosome 5 harbours numerous genes for important qualitative and quantitative traits, such as resistance to the root cyst nematode Globodera pallida and the late blight fungus, Phytophthora infestans. The genes make up part of a "hot spot" for resistances to various pathogens covering a genetic map length of 3 cM between markers GP21 and GP179. We established the physical size and position of this region on chromosome 5 in potato and tomato using fluorescence in situ hybridization (FISH) on pachytene chromosomes. Five potato bacterial artificial chromosome (BAC) clones with the genetically anchored markers GP21, R1-contig (proximal end), CosA, GP179, and StPto were selected, labeled with different fluorophores, and hybridized in a five-colour FISH experiment. Our results showed the location of the BAC clones in the middle of the long arm of chromosome 5 in both potato and tomato. Based on chromosome measurements, we estimate the physical size of the GP21-GP179 interval at 0.85 Mb and 1.2 Mb in potato and tomato, respectively. The GP21-GP179 interval is part of a genome segment known to have inverted map positions between potato and tomato.
A YAC contig of the human CC chemokine genes clustered on chromosome 17q11.2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Naruse, Kuniko; Nomiyama, Hisayuki; Miura, Retsu

1996-06-01

CC chemokines are cytokines that attract and activate leukocytes. The human genes for the CC chemokines are clustered on chromosome 17. To elucidate the genomic organization of the CC chemokine genes, we constructed a YAC contig comprising 34 clones. The contig was shown to contain all 10 CC chemokine genes reported so far, except for one gene whose nucleotide sequence is not available. The contig also contains 4 CC chemokine-like genes, which were deposited in GenBank as ESTs and are here referred to as NCC-1, NCC-2, NCC-3, and NCC-4. Within the contig, the CC chemokine genes were localized in two regions. In addition, the CC chemokine genes were more precisely mapped on chromosome 17q11.2 using a somatic cell hybrid cell DNA panel containing various portions of human chromosome 17. Interestingly, a reciprocal translocation t(Y;17) breakpoint, contained in the hybrid cell line Y1741, lay between the two chromosome 17 chemokine gene regions covered by our YAC contig. From these results, the order and the orientation of CC chemokine genes on chromosome 17 were determined as follows: centromere-neurofibromatosis 1-(MCP-3, MCP-1, NCC-1, I-309)-Y1741 breakpoint-RANTES-(LD78γ, AT744.2, LD78β)-(NCC-3, NCC-2, AT744.1, LD78α)-NCC-4-retinoic acid receptor α-telomere. 22 refs., 1 fig., 2 tabs.

ABACAS: algorithm-based automatic contiguation of assembled sequences

PubMed Central

Assefa, Samuel; Keane, Thomas M.; Otto, Thomas D.; Newbold, Chris; Berriman, Matthew

2009-01-01

Summary: Due to the availability of new sequencing technologies, we are now increasingly interested in sequencing closely related strains of existing finished genomes. Recently a number of de novo and mapping-based assemblers have been developed to produce high quality draft genomes from new sequencing technology reads. New tools are necessary to take contigs from a draft assembly through to a fully contiguated genome sequence. ABACAS is intended as a tool to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. The input to ABACAS is a set of contigs which will be aligned to the reference genome, ordered and orientated, visualized in the ACT comparative browser, and optimal primer sequences are automatically generated. Availability and Implementation: ABACAS is implemented in Perl and is freely available for download from http://abacas.sourceforge.net Contact: sa4@sanger.ac.uk PMID:19497936
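The "order and orientate" step described in this abstract can be sketched in a few lines. ABACAS itself is implemented in Perl; the Python below is only an illustration of the idea, and the input format (a list of hypothetical `(contig_name, reference_start, strand)` alignment hits) is an assumption, not the tool's real interface.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def order_and_orient(contigs: dict, hits: list) -> list:
    """Order contigs by reference position and flip minus-strand contigs.

    contigs: mapping of contig name -> sequence
    hits:    (contig name, reference start, strand) tuples, one per contig;
             this tuple layout is hypothetical.
    """
    placed = []
    for name, _ref_start, strand in sorted(hits, key=lambda hit: hit[1]):
        seq = contigs[name]
        placed.append((name, reverse_complement(seq) if strand == "-" else seq))
    return placed


# Hypothetical toy input: c2 aligns first on the reference, on the minus strand.
toy_contigs = {"c1": "AAAC", "c2": "GGTT"}
toy_hits = [("c1", 500, "+"), ("c2", 10, "-")]
layout = order_and_orient(toy_contigs, toy_hits)
```

Gaps between consecutive placed contigs are then the natural targets for the primer design the abstract mentions.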
Development of a YAC contig covering the minimal region of a CSNB1 locus in Xp11

DOE Office of Scientific and Technical Information (OSTI.GOV)

Boycott, K.M.; Gratton, K.J.; Moore, B.J.

1994-09-01

X-linked congenital stationary night blindness (CSNB1) is an eye disorder that includes impairment of night vision, reduced visual acuity and, in some cases, myopia and congenital nystagmus. Electroretinography reveals a marked reduction of the b-wave in affected individuals suggesting that X-linked CSNB is due to a molecular defect in the bipolar layer of the retina. Based on our studies of a large four generation family with X-linked CSNB, a CSNB1 locus was mapped to a 4-5 cM region at Xp11.23-Xp11.22 bounded telomerically by DXS426 and centromerically by DXS988. Using a panel of radiation and conventional somatic cell hybrids, a detailed map of new and published STSs has been generated for the minimal region of CSNB1. PCR primer pairs for twenty-five STSs, including eleven end-clones, were used to isolate YAC clones from CEPH, mega-CEPH, and X chromosome-specific YAC libraries. In total, fifty-two YACs were characterized for STS overlaps and assembled to provide a minimum of 3 Mb of physical coverage in the region between DXS426 and DXS988. Five gaps proximal to SYP are still to be closed. Our physical map suggests the following gene order: Xpter-OTAL1-GF1-DXS1011E-MG81-HUMCRAS2P-SYP-Xcen. STS analysis of the YACs revealed three subregions of the physical map which appear to be particularly susceptible to internal deletions and end-clone analysis demonstrated chimerism in six of seventeen YACs. A physical map of Xp11.23-Xp11.22 will provide a resource for the isolation of candidate genes for the X-linked CSNB gene which maps to this region.
Fine Mapping of the Barley Chromosome 6H Net Form Net Blotch Susceptibility Locus

PubMed Central

Richards, Jonathan; Chao, Shiaoman; Friesen, Timothy; Brueggeman, Robert

2016-01-01

Net form net blotch, caused by the necrotrophic fungal pathogen Pyrenophora teres f. teres, is a destructive foliar disease of barley with the potential to cause significant yield loss in major production regions throughout the world. The complexity of the host–parasite genetic interactions in this pathosystem hinders the deployment of effective resistance in barley cultivars, warranting a deeper understanding of the interactions. Here, we report on the high-resolution mapping of the dominant susceptibility locus near the centromere of chromosome 6H in the barley cultivars Rika and Kombar, which are putatively targeted by necrotrophic effectors from P. teres f. teres isolates 6A and 15A, respectively. Utilization of progeny isolates derived from a cross of P. teres f. teres isolates 6A × 15A harboring single major virulence loci (VK1, VK2, and VR2) allowed for the Mendelization of single inverse gene-for-gene interactions in a high-resolution population consisting of 2976 Rika × Kombar recombinant gametes. Brachypodium distachyon synteny was exploited to develop and saturate the susceptibility region with markers, delimiting it to ∼0.24 cM and a partial physical map was constructed. This genetic and physical characterization further resolved the dominant susceptibility locus, designated Spt1 (susceptibility to P. teres f. teres). The high-resolution mapping and cosegregation of the Spt1.R and Spt1.K gene/s indicates tightly linked genes in repulsion or alleles possibly targeted by different necrotrophic effectors. Newly developed barley genomic resources greatly enhance the efficiency of positional cloning efforts in barley, as demonstrated by the Spt1 fine mapping and physical contig identification reported here. PMID:27172206
A high-resolution whole genome radiation hybrid map of human chromosome 17q22-q25.3 across the genes for GH and TK

DOE Office of Scientific and Technical Information (OSTI.GOV)

Foster, J.W.; Schafer, A.J.; Critcher, R.

1996-04-15

We have constructed a whole genome radiation hybrid (WG-RH) map across a region of human chromosome 17q, from growth hormone (GH) to thymidine kinase (TK). A panel of 128 WG-RH hybrid cell lines generated by X-irradiation and fusion has been tested for the retention of 39 sequence-tagged site (STS) markers by the polymerase chain reaction. This genome mapping technique has allowed the integration of existing VNTR and microsatellite markers with additional new markers and existing STS markers previously mapped to this region by other means. The WG-RH map includes eight expressed sequence tag (EST) and three anonymous markers developed for this study, together with 23 anonymous microsatellites and five existing ESTs. Analysis of these data resulted in a high-density comprehensive map across this region of the genome. A subset of these markers has been used to produce a framework map consisting of 20 loci ordered with odds greater than 1000:1. The markers are of sufficient density to build a YAC contig across this region based on marker content. We have developed sequence tags for both ends of a 2.1-Mb YAC and mapped these using the WG-RH panel, allowing a direct comparison of cRay6000 to physical distance. 31 refs., 3 figs., 2 tabs.
Improvement of the Threespine Stickleback Genome Using a Hi-C-Based Proximity-Guided Assembly.

PubMed

Peichel, Catherine L; Sullivan, Shawn T; Liachko, Ivan; White, Michael A

2017-09-01

Scaffolding genomes into complete chromosome assemblies remains challenging even with the rapidly increasing sequence coverage generated by current next-generation sequence technologies. Even with scaffolding information, many genome assemblies remain incomplete. The genome of the threespine stickleback (Gasterosteus aculeatus), a fish model system in evolutionary genetics and genomics, is not completely assembled despite scaffolding with high-density linkage maps. Here, we first test the ability of a Hi-C based proximity-guided assembly (PGA) to perform a de novo genome assembly from relatively short contigs. Using Hi-C based PGA, we generated complete chromosome assemblies from a distribution of short contigs (20-100 kb). We found that 96.40% of contigs were correctly assigned to linkage groups (LGs), with ordering nearly identical to the previous genome assembly. Using available bacterial artificial chromosome (BAC) end sequences, we provide evidence that some of the few discrepancies between the Hi-C assembly and the existing assembly are due to structural variation between the populations used for the 2 assemblies or errors in the existing assembly. This Hi-C assembly also allowed us to improve the existing assembly, assigning over 60% (13.35 Mb) of the previously unassigned (~21.7 Mb) contigs to LGs. Together, our results highlight the potential of the Hi-C based PGA method to be used in combination with short read data to perform relatively inexpensive de novo genome assemblies. This approach will be particularly useful in organisms in which it is difficult to perform linkage mapping or to obtain high molecular weight DNA required for other scaffolding methods. © The American Genetic Association 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Mapping analysis of scaffold/matrix attachment regions (s/MARs) from two different mammalian cell lines

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pilus, Nur Shazwani Mohd; Ahmad, Azrin; Yusof, Nurul Yuziana Mohd

Scaffold/matrix attachment regions (S/MARs) are potential elements that can be integrated into expression vectors to increase expression of recombinant proteins. Many studies on S/MARs have been done, but none has revealed the distribution of S/MARs in a genome. In this study, we isolated S/MAR sequences from HEK293 and Chinese hamster ovary (CHO DG44) cell lines using two different methods, utilizing 2 M NaCl and lithium-3,5-diiodosalicylate (LIS). The isolated S/MARs were sequenced using a Next Generation Sequencing (NGS) platform. Based on reference mapping analysis against a human genome database, a total of 8,994,856 and 8,412,672 contigs of S/MAR sequences were retrieved from 2 M NaCl and LIS extraction of HEK293, respectively. On the other hand, reference mapping analysis of S/MARs derived from CHO DG44 against our own CHO DG44 database generated a total of 7,204,348 and 4,672,913 contigs from the 2 M NaCl and LIS extraction methods, respectively.
A Fast and Scalable Radiation Hybrid Map Construction and Integration Strategy

PubMed Central

Agarwala, Richa; Applegate, David L.; Maglott, Donna; Schuler, Gregory D.; Schäffer, Alejandro A.

2000-01-01

This paper describes a fast and scalable strategy for constructing a radiation hybrid (RH) map from data on different RH panels. The maps on each panel are then integrated to produce a single RH map for the genome. Recurring problems in using maps from several sources are that the maps use different markers, the maps do not place the overlapping markers in same order, and the objective functions for map quality are incomparable. We use methods from combinatorial optimization to develop a strategy that addresses these issues. We show that by the standard objective functions of obligate chromosome breaks and maximum likelihood, software for the traveling salesman problem produces RH maps with better quality much more quickly than using software specifically tailored for RH mapping. We use known algorithms for the longest common subsequence problem as part of our map integration strategy. We demonstrate our methods by reconstructing and integrating maps for markers typed on the Genebridge 4 (GB4) and the Stanford G3 panels publicly available from the RH database. We compare map quality of our integrated map with published maps for GB4 panel and G3 panel by considering whether markers occur in the same order on a map and in DNA sequence contigs submitted to GenBank. We find that all of the maps are inconsistent with the sequence data for at least 50% of the contigs, but our integrated maps are more consistent. The map integration strategy not only scales to multiple RH maps but also to any maps that have comparable criteria for measuring map quality. Our software improves on current technology for doing RH mapping in areas of computation time and algorithms for considering a large number of markers for mapping. The essential impediments to producing dense high-quality RH maps are data quality and panel size, not computation. PMID:10720576
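The abstract says that known longest-common-subsequence (LCS) algorithms are used as part of the map integration strategy, without giving details here. As a hedged illustration only, the sketch below applies the textbook dynamic-programming LCS to two marker orders to find the largest set of shared markers placed in the same relative order; the function name and marker labels are hypothetical, and this is not the authors' software.

```python
def longest_common_subsequence(order_a: list, order_b: list) -> list:
    """Return one longest common subsequence of two marker orders."""
    n, m = len(order_a), len(order_b)
    # dp[i][j] = LCS length of order_a[i:] and order_b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if order_a[i] == order_b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover the markers themselves.
    i = j = 0
    shared = []
    while i < n and j < m:
        if order_a[i] == order_b[j]:
            shared.append(order_a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical marker orders from two panels; m2 and m3 are swapped.
panel_1 = ["m1", "m2", "m3", "m4", "m5"]
panel_2 = ["m1", "m3", "m2", "m4", "m5"]
consistent = longest_common_subsequence(panel_1, panel_2)
```

Markers outside the returned subsequence are the ones whose placement disagrees between the two maps and would need to be resolved during integration.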
Regional gene mapping using mixed radiation hybrids and reverse chromosome painting.

PubMed

Lin, J Y; Bedford, J S

1997-11-01

We describe a new approach for low-resolution physical mapping using pooled DNA probe from mixed (non-clonal) populations of human-CHO cell hybrids and reverse chromosome painting. This mapping method is based on a process in which the human chromosome fragments bearing a complementing gene were selectively retained in a large non-clonal population of CHO-human hybrid cells during a series of 12- to 15-Gy gamma irradiations each followed by continuous growth selection. The location of the gene could then be identified by reverse chromosome painting on normal human metaphase spreads using biotinylated DNA from this population of "enriched" hybrid cells. We tested the validity of this method by correctly mapping the complementing human HPRT gene, whose location is well established. We then demonstrated the method's usefulness by mapping the chromosome location of a human gene which complemented the defect responsible for the hypersensitivity to ionizing radiation in CHO irs-20 cells. This method represents an efficient alternative to conventional concordance analysis in somatic cell hybrids where detailed chromosome analysis of numerous hybrid clones is necessary. Using this approach, it is possible to localize a gene for which there is no prior sequence or linkage information to a subchromosomal region, thus facilitating association with known mapping landmarks (e.g. RFLP, YAC or STS contigs) for higher-resolution mapping.
Physical mapping of the Gorlin syndrome region on 9q22 by pulsed field gel electrophoresis (PFGE) and FISH

DOE Office of Scientific and Technical Information (OSTI.GOV)

Levanat, S.; Gailani, M.; Dean, M.

1994-09-01

Gorlin syndrome is an autosomal dominant disorder characterized by basal cell carcinomas, medulloblastomas, and ovarian fibromas, as well as widespread developmental defects. Linkage and tumor deletion studies localized the gene for this syndrome to the 3 cM region on chromosome 9q22 between D9S196 and D9S180. Several groups have constructed YAC contigs of this region, but many of the YACs are known to contain rearrangements. Mapping by PFGE and FISH is useful in further characterization of the relationship between physical distance and genetic distance. We isolated seven cosmids mapping to this region (D9S180, D9S196, D9S287, Col15A1, XPA and two new anonymous cosmids). FISH gave a distance between D9S196 and D9S180 of at least 2 Mb and showed that Col15A1, previously considered as a candidate gene, mapped a few hundred kb distal to S180. For PFGE, DNA blocks from normal individuals and 20 Gorlin syndrome patients were digested with 5 restriction enzymes and probed with single copy fragments of the seven cosmids. No aberrant bands have been identified in patients. Non-overlapping Not I fragments from these seven markers totalled 2.3 Mb. Given an average gene density, a region of this size would contain 50-100 genes.
Assembly of YAC contigs on the long arm of human chromosome 2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, J.; Fujiwara, T.M.; Wang, J.X.

1994-09-01

We have previously identified approximately 2,000 chromosome 2-specific YACs by screening the CEPH Mark I YAC library ('Midi-YACs'). Using STS content mapping, we have been able to order groups of these YACs along chromosome 2q. The four biggest YAC groups were associated with VIL (2q35), FN (2q34), PAX3 (2q36), ALPI (2q37) and contained 113, 107, 79, and 63 YACs, respectively. We have identified the minimal tiling paths for most YAC groups and determined the insert sizes of over 300 YACs. Furthermore, on human chromosome 2q31-q37, 15 microsatellite markers were linked to various expressed genes through overlapping YACs and the physical distance of microsatellites to expressed genes was determined. The precise mapping of a set of highly informative microsatellite markers with respect to known genes provides a useful tool for linkage studies and the identification of disease genes from the long arm of human chromosome 2.
A whole-genome, radiation hybrid map of wheat

USDA-ARS's Scientific Manuscript database

Generating a reference sequence of bread wheat (Triticum aestivum L.) is a challenging task because of its large, highly repetitive and allopolyploid genome. Ordering of BAC- and NGS-based contigs in ongoing wheat genome-sequencing projects primarily uses recombination and comparative genomics-base...
Deep, Staged Transcriptomic Resources for the Novel Coleopteran Models Atrachya menetriesi and Callosobruchus maculatus.

PubMed

Benton, Matthew A; Kenny, Nathan J; Conrads, Kai H; Roth, Siegfried; Lynch, Jeremy A

2016-01-01

Despite recent efforts to sample broadly across metazoan and insect diversity, current sequence resources in the Coleoptera do not adequately describe the diversity of the clade. Here we present deep, staged transcriptomic data for two coleopteran species, Atrachya menetriesi (Faldermann 1835) and Callosobruchus maculatus (Fabricius 1775). Our sampling covered key stages in ovary and early embryonic development in each species. We utilized this data to build combined assemblies for each species which were then analysed in detail. The combined A. menetriesi assembly consists of 228,096 contigs with an N50 of 1,598 bp, while the combined C. maculatus assembly consists of 128,837 contigs with an N50 of 2,263 bp. For these assemblies, 34.6% and 32.4% of contigs were identified using Blast2GO, and 97% and 98.3% of the BUSCO set of metazoan orthologs were present, respectively. We also carried out manual annotation of developmental signalling pathways and found that nearly all expected genes were present in each transcriptome. Our analyses show that both transcriptomes are of high quality. Lastly, we performed read mapping utilising our timed, stage specific RNA samples to identify differentially expressed contigs. The resources presented here will provide a firm basis for a variety of experimentation, both in developmental biology and in comparative genomic studies.
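The assembly statistics quoted above rely on N50, the contig length at which contigs of that length or longer contain at least half of all assembled bases. A minimal sketch of that calculation follows, assuming contig lengths are available as a plain list of integers; the function name and the toy lengths are illustrative and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    total = sum(lengths)
    running = 0
    # Accumulate from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 requires at least one contig")


# Toy contig lengths, purely for illustration.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

With these toy values the total is 1,500 bp, and the two longest contigs (500 + 400 = 900 bp) are the first to reach half of it, so the N50 is 400 bp.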
Positional cloning of the chromosome 14 Alzheimer's disease locus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clark, R.F.; Korenblat, K.M.; Goate, A.M.

1994-09-01

Genetic linkage analysis had indicated a locus for familial early-onset Alzheimer's disease (FAD) on chromosome 14 at q24.3. The FAD locus has been shown previously to lie between the dinucleotide markers D14S61 and D14S63, a genetic distance of approximately 13 cM. We are currently attempting to identify the gene using a positional cloning strategy. The first step towards the isolation and characterization of this locus was the construction of an overlapping YAC contig covering the entire region. Over forty YACs which map to this region have been isolated from the St. Louis and CEPH libraries by a combination of YAC end sequence walking and sequence tagged site mapping. Our contig fully spans the complete domain, encompassing all genetic markers non-recombinant with FAD (i.e. D14S76, D14S43, D14S71, D14S77) and the two nearest flanking FAD-recombinant markers. With restriction mapping of the domain, we can determine the exact size of the region. As a second step, the YACs in this contig are currently being inspected for expressed sequences by exon trapping, initially on those YACs known to be nonchimeric. We have currently made exon-trapped libraries from YACs that have the markers D14S76 and D14S43. Sequence analysis of these libraries indicates that a trapped exon is identified on average for each 30 kb of YAC DNA. The trapped exons are being screened to identify likely candidate genes, which will be examined for mutations in FAD families.
Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools.

PubMed

Kisand, Veljo; Lettieri, Teresa

2013-04-01

De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (<450 bps), which are presumed to aid in the analysis of uncharacterized genomes. The array of tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize unknown bacteria with modest effort.
A transcription map of the regions surrounding the CSF1R locus on human chromosome 5q31: Candidate genes for diastrophic dysplasia

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clines, G.; Lovett, M.

1994-09-01

Diastrophic dysplasia (DTD) is an autosomal recessive disorder of unknown pathogenesis that is characterized by abnormal skeletal and cartilage growth. Phenotypic characteristics of the disorder include short stature, scoliosis, and deformation of the first metacarpal. The diastrophic dysplasia gene has been localized to chromosome 5q31-33, within approximately 60 kb of the colony stimulating factor 1 receptor gene (CSF1R). We have used direct cDNA selection to build a transcription map across approximately 250 kb surrounding and including the CSF1R locus. cDNA pools from human placenta, activated T cells, cerebellum, HeLa cells, fetal brain, chondrocytes, chondrosarcomas and osteosarcomas were multiplexed in these selections. After two rounds of selection, an analysis revealed that approximately 70% of the selected cDNAs were contained within the contig. DNA sequencing and cosmid mapping data from a collection of 310 clones revealed the presence of three new genes in this region that show no appreciable homologies on sequence database searches, as well as cDNA clones from the CSF1R and the PDGFRB loci (another of the known genes in the region). An additional cDNA was found with 100% homology to the gene encoding human ribosomal protein L7 (RPL7). This cDNA comprised approximately 25% of all selected clones. However, further analysis of the genomic contig revealed the presence of an RPL7 processed pseudogene in very close proximity to the CSF1R and PDGFRB genes. The selection of processed pseudogenes is one previously anticipated artifact of selection methodologies, but has not been previously observed. Mutational analysis of the three new genes is underway in diastrophic dysplasia families, as is derivation of full length cDNA clones and the expansion of this detailed transcription map into a larger genomic contig.
Two sequence-ready contigs spanning the two copies of a 200-kb duplication on human 21q: partial sequence and polymorphisms.

PubMed

Potier, M; Dutriaux, A; Orti, R; Groet, J; Gibelin, N; Karadima, G; Lutfalla, G; Lynn, A; Van Broeckhoven, C; Chakravarti, A; Petersen, M; Nizetic, D; Delabar, J; Rossier, J

1998-08-01

Physical mapping across a duplication can be a tour de force if the region is larger than the size of a bacterial clone. This was the case of the 170- to 275-kb duplication present on the long arm of chromosome 21 in normal humans at 21q11.1 (proximal region) and at 21q22.1 (distal region), which we described previously. We have constructed sequence-ready contigs of the two copies of the duplication of which all the clones are genuine representatives of one copy or the other. This required the identification of four duplicon polymorphisms that are copy-specific and nonallelic variations in the sequence of the STSs. Thirteen STSs were mapped inside the duplicated region and 5 outside but close to the boundaries. Among these STSs 10 were end clones from YACs, PACs, or cosmids, and the average interval between two markers in the duplicated region was 16 kb. Eight PACs and cosmids showing minimal overlaps were selected in both copies of the duplication. Comparative sequence analysis along the duplication showed three single-basepair changes between the two copies over 659 bp sequenced (4 STSs), suggesting that the duplication is recent (less than 4 mya). Two CpG islands were located in the duplication, but no genes were identified after a 36-kb cosmid from the proximal copy of the duplication was sequenced. The homology of this chromosome 21 duplicated region with the pericentromeric regions of chromosomes 13, 2, and 18 suggests that the mechanism involved is probably similar to pericentromeric-directed mechanisms described in interchromosomal duplications. Copyright 1998 Academic Press.
Comparative fine mapping of the Wax 1 (W1) locus in hexaploid wheat.

PubMed

Lu, Ping; Qin, Jinxia; Wang, Guoxin; Wang, Lili; Wang, Zhenzhong; Wu, Qiuhong; Xie, Jingzhong; Liang, Yong; Wang, Yong; Zhang, Deyun; Sun, Qixin; Liu, Zhiyong

2015-08-01

By applying comparative genomics analyses, a high-density genetic linkage map of the Wax 1 (W1) locus was constructed as a framework for map-based cloning. Glaucousness is described as the scattering effect of visible light from wax deposited on the cuticle of plant aerial organs. In wheat, the wax on leaves and stems is mainly controlled by two sets of genes: glaucousness loci (W1 and W2) and non-glaucousness loci (Iw1 and Iw2). Bulked segregant analysis (BSA) and simple sequence repeat (SSR) mapping showed that Wax 1 (W1) is located on chromosome arm 2BS between markers Xgwm210 and Xbarc35. By applying comparative genomics analyses, collinear genomic regions of the W1 locus on wheat 2BS were identified in Brachypodium distachyon chromosome 5, rice chromosome 4 and sorghum chromosome 6, respectively. Four STS markers were developed using the Triticum aestivum cv. Chinese Spring 454 contig sequences and the International Wheat Genome Sequencing Consortium (IWGSC) survey sequences. W1 was mapped into a 0.93 cM genetic interval flanked by markers XWGGC3197 and XWGGC2484, which has synteny with genomic regions of 56.5 kb in Brachypodium, 390 kb in rice and 31.8 kb in sorghum. The fine genetic map can serve as a framework for chromosome landing, physical mapping and map-based cloning of W1 in wheat.
Use of Genome Sequence Information for Meat Quality Trait QTL Mining for Causal Genes and Mutations on Pig Chromosome 17

PubMed Central

Hu, Zhi-Liang; Ramos, Antonio M.; Humphray, Sean J.; Rogers, Jane; Reecy, James M.; Rothschild, Max F.

2011-01-01

The newly available pig genome sequence has provided new information to fine map quantitative trait loci (QTL) in order to eventually identify causal variants. With targeted genomic sequencing efforts, we were able to obtain high quality BAC sequences that cover a region on pig chromosome 17 where a number of meat quality QTL have been previously discovered. Sequences from 70 BAC clones were assembled to form an 8-Mbp contig. Subsequently, we successfully mapped five previously identified QTL, three for meat color and two for lactate related traits, to the contig. With an additional 25 genetic markers that were identified by sequence comparison, we were able to carry out further linkage disequilibrium analysis to narrow down the genomic locations of these QTL, which allowed identification of the chromosomal regions that likely contain the causative variants. This research has provided one practical approach to combine genetic and molecular information for QTL mining. PMID:22303339
Genomic stability in the archaeae Haloferax volcanii and Haloferax mediterranei.

PubMed Central

López-García, P; St Jean, A; Amils, R; Charlebois, R L

1995-01-01

Through hybridization of available probes, we have added nine genes to the macrorestriction map of the Haloferax mediterranei chromosome and five genes to the contig map of Haloferax volcanii. Additionally, we hybridized 17 of the mapped cosmid clones from H. volcanii to the H. mediterranei genome. The resulting 35-point chromosomal comparison revealed only two inversions and a few translocations. Forces known to promote rearrangement, common in the haloarchaea, have been ineffective in changing global gene order throughout the nearly 10^7 years of these species' divergent evolution. PMID:7868620
Anhidrotic ectodermal dysplasia gene region cloned in yeast artificial chromosomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kere, J.; Grzeschik, K.H.; Limon, J.

1993-05-01

Anhidrotic ectodermal dysplasia (EDA), an X-chromosomal recessive disorder, is expressed in a few females with chromosomal translocations involving bands Xq12-q13. Using available DNA markers from the region and somatic cell hybrids the authors mapped the X-chromosomal breakpoints in two such translocations. The breakpoints were further mapped within a yeast artificial chromosome contig constructed by chromosome walking techniques. Genomic DNA markers that map between the two translocation breakpoints were recovered representing putative portions of the EDA gene. 32 refs., 3 figs., 1 tab.

Fine Physical and Genetic Mapping of Powdery Mildew Resistance Gene MlIW172 Originating from Wild Emmer (Triticum dicoccoides)

PubMed Central

Han, Jun; Zhao, Xiaojie; Cui, Yu; Song, Wei; Huo, Naxin; Liang, Yong; Xie, Jingzhong; Wang, Zhenzhong; Wu, Qiuhong; Chen, Yong-Xing; Lu, Ping; Zhang, De-Yun; Wang, Lili; Sun, Hua; Yang, Tsomin; Keeble-Gagnere, Gabriel; Appels, Rudi; Doležel, Jaroslav; Ling, Hong-Qing; Luo, Mingcheng; Gu, Yongqiang; Sun, Qixin; Liu, Zhiyong

2014-01-01

Powdery mildew, caused by Blumeria graminis f. sp. tritici, is one of the most important wheat diseases in the world. In this study, a single dominant powdery mildew resistance gene MlIW172 was identified in the IW172 wild emmer accession and mapped to the distal region of chromosome arm 7AL (bin7AL-16-0.86-0.90) via molecular marker analysis. MlIW172 was closely linked with the RFLP probe Xpsr680-derived STS marker Xmag2185 and the EST markers BE405531 and BE637476. This suggested that MlIW172 might be allelic to the Pm1 locus or a new locus closely linked to Pm1. By screening genomic BAC library of durum wheat cv. Langdon and 7AL-specific BAC library of hexaploid wheat cv. Chinese Spring, and after analyzing genome scaffolds of Triticum urartu containing the marker sequences, additional markers were developed to construct a fine genetic linkage map on the MlIW172 locus region and to delineate the resistance gene within a 0.48 cM interval. Comparative genetics analyses using ESTs and RFLP probe sequences flanking the MlIW172 region against other grass species revealed a general co-linearity in this region with the orthologous genomic regions of rice chromosome 6, Brachypodium chromosome 1, and sorghum chromosome 10. However, orthologous resistance gene-like RGA sequences were only present in wheat and Brachypodium. The BAC contigs and sequence scaffolds that we have developed provide a framework for the physical mapping and map-based cloning of MlIW172. PMID:24955773
Cloning of the anhidrotic ectodermal dysplasia gene: Identification of cDNAs associated with CpG islands mapped near translocation breakpoint in two female patients

DOE Office of Scientific and Technical Information (OSTI.GOV)

Srivastava, A.K.; Schlessinger, D.; Kere, J.

1994-09-01

The gene for the X chromosomal developmental disorder anhidrotic ectodermal dysplasia (EDA) has been mapped to Xq12-q13 by linkage analysis and is expressed in a few females with chromosomal translocations involving band Xq12-q13. A yeast artificial chromosome (YAC) contig (2.0 Mb) spanning two translocation breakpoints has been assembled by sequence-tagged site (STS)-based chromosomal walking. The two translocation breakpoints (X:autosome translocations from the affected female patients) have been mapped less than 60 kb apart within a YAC contig. Unique probes and intragenic STSs (mapped between the two translocations) have been developed and a somatic cell hybrid carrying the translocated X chromosome from the AK patient has been analyzed by isolating unique probes that span the breakpoint. Several STSs made from intragenic sequences have been found to be conserved in mouse, hamster and monkey, but we have detected no mRNAs in a number of tissues tested. However, a probe and STS developed from the DNA spanning the AK breakpoint is conserved in mouse, hamster and monkey, and we have detected expressed sequences in skin cells and cDNA libraries. In addition, unique sequences have been obtained from two CpG islands in the region that maps proximal to the breakpoints. cDNAs containing these sequences are being studied as candidates for the gene affected in the etiology of EDA.
A candidate region for Nevoid Basal Cell Carcinoma Syndrome defined by genetic and physical mapping

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wainwright, B.; Negus, K.; Berkman, J.

1994-09-01

Nevoid Basal Cell Carcinoma Syndrome (NBCCS, or Gorlin's syndrome) is a cancer predisposition syndrome characterized by multiple basal cell carcinomas (BCCs) and diverse developmental defects. The gene responsible for NBCCS, which is most likely to be a tumor suppressor gene, has previously been mapped to 9q22.3-q31 in a 12 cM interval between the microsatellite marker loci D9S12 and D9S109. Combined multipoint and haplotype analyses of Australian pedigrees have further refined the localization to a 2 cM interval between markers D9S196 and D9S180. Our loss of heterozygosity (LOH) studies from sporadic (n=58) and familial (n=41) BCCs indicate that 50% have deletions within the NBCCS candidate region. All LOH is consistent with the genetic mapping of the NBCCS locus. Additionally, one sporadic tumor indicates that the smallest region of overlap in the deletions is within the interval D9S287 (proximal) and D9S180 (distal). A series of YAC clones from within this region has been mapped by FISH to examine chimerism. These clones, which have been mapped with respect to one another, form a contig which encompasses the candidate region from D9S196 to D9S180.
Deep, Staged Transcriptomic Resources for the Novel Coleopteran Models Atrachya menetriesi and Callosobruchus maculatus

PubMed Central

Conrads, Kai H.; Roth, Siegfried; Lynch, Jeremy A.

2016-01-01

Despite recent efforts to sample broadly across metazoan and insect diversity, current sequence resources in the Coleoptera do not adequately describe the diversity of the clade. Here we present deep, staged transcriptomic data for two coleopteran species, Atrachya menetriesi (Faldermann 1835) and Callosobruchus maculatus (Fabricius 1775). Our sampling covered key stages in ovary and early embryonic development in each species. We utilized this data to build combined assemblies for each species which were then analysed in detail. The combined A. menetriesi assembly consists of 228,096 contigs with an N50 of 1,598 bp, while the combined C. maculatus assembly consists of 128,837 contigs with an N50 of 2,263 bp. For these assemblies, 34.6% and 32.4% of contigs were identified using Blast2GO, and 97% and 98.3% of the BUSCO set of metazoan orthologs were present, respectively. We also carried out manual annotation of developmental signalling pathways and found that nearly all expected genes were present in each transcriptome. Our analyses show that both transcriptomes are of high quality. Lastly, we performed read mapping utilising our timed, stage specific RNA samples to identify differentially expressed contigs. The resources presented here will provide a firm basis for a variety of experimentation, both in developmental biology and in comparative genomic studies. PMID:27907180
A Genome-Wide Association Study Identifies Genomic Regions for Virulence in the Non-Model Organism Heterobasidion annosum s.s

PubMed Central

Dalman, Kerstin; Himmelstrand, Kajsa; Olson, Åke; Lind, Mårten; Brandström-Durling, Mikael; Stenlid, Jan

2013-01-01

The dense single nucleotide polymorphisms (SNP) panels needed for genome wide association (GWA) studies have hitherto been expensive to establish and use on non-model organisms. To overcome this, we used a next generation sequencing approach to both establish SNPs and to determine genotypes. We conducted a GWA study on a fungal species, analysing the virulence of Heterobasidion annosum s.s., a necrotrophic pathogen, on its hosts Picea abies and Pinus sylvestris. From a set of 33,018 single nucleotide polymorphisms (SNP) in 23 haploid isolates, twelve SNP markers distributed on seven contigs were associated with virulence (P<0.0001). Four of the contigs harbour known virulence genes from other fungal pathogens and the remaining three harbour novel candidate genes. Two contigs link closely to virulence regions recognized previously by QTL mapping in the congeneric hybrid H. irregulare × H. occidentale. Our study demonstrates the efficiency of GWA studies for dissecting important complex traits of small populations of non-model haploid organisms with small genomes. PMID:23341945
Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ)

PubMed Central

Mascher, Martin; Muehlbauer, Gary J; Rokhsar, Daniel S; Chapman, Jarrod; Schmutz, Jeremy; Barry, Kerrie; Muñoz-Amatriaín, María; Close, Timothy J; Wise, Roger P; Schulman, Alan H; Himmelbach, Axel; Mayer, Klaus FX; Scholz, Uwe; Poland, Jesse A; Stein, Nils; Waugh, Robbie

2013-01-01

Next-generation whole-genome shotgun assemblies of complex genomes are highly useful, but fail to link nearby sequence contigs with each other or provide a linear order of contigs along individual chromosomes. Here, we introduce a strategy based on sequencing progeny of a segregating population that allows de novo production of a genetically anchored linear assembly of the gene space of an organism. We demonstrate the power of the approach by reconstructing the chromosomal organization of the gene space of barley, a large, complex and highly repetitive 5.1 Gb genome. We evaluate the robustness of the new assembly by comparison to a recently released physical and genetic framework of the barley genome, and to various genetically ordered sequence-based genotypic datasets. The method is independent of the need for any prior sequence resources, and will enable rapid and cost-efficient establishment of powerful genomic information for many species. PMID:23998490
ILP-based maximum likelihood genome scaffolding

PubMed Central

2014-01-01

Background Interest in de novo genome assembly has been renewed in the past decade due to rapid advances in high-throughput sequencing (HTS) technologies which generate relatively short reads resulting in highly fragmented assemblies consisting of contigs. Additional long-range linkage information is typically used to orient, order, and link contigs into larger structures referred to as scaffolds. Due to library preparation artifacts and erroneous mapping of reads originating from repeats, scaffolding remains a challenging problem. In this paper, we provide a scalable scaffolding algorithm (SILP2) employing a maximum likelihood model capturing read mapping uncertainty and/or non-uniformity of contig coverage which is solved using integer linear programming. A Non-Serial Dynamic Programming (NSDP) paradigm is applied to render our algorithm useful in the processing of larger mammalian genomes. To compare scaffolding tools, we employ novel quantitative metrics in addition to the extant metrics in the field. We have also expanded the set of experiments to include scaffolding of low-complexity metagenomic samples. Results SILP2 achieves better scalability through a more efficient NSDP algorithm than the previous release of SILP. The results show that SILP2 compares favorably to previous methods OPERA and MIP in both scalability and accuracy for scaffolding single genomes of up to human size, and significantly outperforms them on scaffolding low-complexity metagenomic samples. Conclusions Equipped with NSDP, SILP2 is able to scaffold large mammalian genomes, resulting in the longest and most accurate scaffolds. The ILP formulation for the maximum likelihood model is shown to be flexible enough to handle metagenomic samples. PMID:25253180
Metavir 2: new tools for viral metagenome comparison and assembled virome analysis

PubMed Central

2014-01-01

Background Metagenomics, based on culture-independent sequencing, is a well-fitted approach to provide insights into the composition, structure and dynamics of environmental viral communities. Following recent advances in sequencing technologies, new challenges arise for existing bioinformatic tools dedicated to viral metagenome (i.e. virome) analysis as (i) the number of viromes is rapidly growing and (ii) large genomic fragments can now be obtained by assembling the huge amount of sequence data generated for each metagenome. Results To face these challenges, a new version of Metavir was developed. First, all Metavir tools have been adapted to support comparative analysis of viromes in order to improve the analysis of multiple datasets. In addition to the sequence comparison previously provided, viromes can now be compared through their k-mer frequencies, their taxonomic compositions, recruitment plots and phylogenetic trees containing sequences from different datasets. Second, a new section has been specifically designed to handle assembled viromes made of thousands of large genomic fragments (i.e. contigs). This section includes an annotation pipeline for uploaded viral contigs (gene prediction, similarity search against reference viral genomes and protein domains) and an extensive comparison between contigs and reference genomes. Contigs and their annotations can be explored on the website through specifically developed dynamic genomic maps and interactive networks. Conclusions The new features of Metavir 2 allow users to explore and analyze viromes composed of raw reads or assembled fragments through a set of adapted tools and a user-friendly interface. PMID:24646187
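The abstract above notes that viromes can be compared through their k-mer frequencies. The sketch below illustrates the general idea only and is not Metavir's implementation: it assumes each virome is a list of DNA strings, builds a normalized k-mer frequency profile, and compares two profiles by Euclidean distance. The function names, the choice of k = 2, the distance measure, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequences, k=2):
    """Return normalized frequencies of every A/C/G/T k-mer in the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


def euclidean_distance(freq_a, freq_b):
    """Euclidean distance between two profiles built over the same k-mer set."""
    return sum((freq_a[key] - freq_b[key]) ** 2 for key in freq_a) ** 0.5


# Toy "viromes", purely hypothetical sequences.
virome_a = ["ACGTACGTAC", "GGGTTTAAAC"]
virome_b = ["ATATATATAT", "TTTTAAAATA"]
profile_a = kmer_frequencies(virome_a)
profile_b = kmer_frequencies(virome_b)
print(euclidean_distance(profile_a, profile_b))
```

A profile compared with itself gives a distance of zero, and compositionally different datasets give larger values; a real analysis would likely use longer k-mers and a distance chosen to suit the clustering method.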
Transcriptional dynamics of the developing sweet cherry (Prunus avium L.) fruit: sequencing, annotation and expression profiling of exocarp-associated genes

PubMed Central

Alkio, Merianne; Jonas, Uwe; Declercq, Myriam; Van Nocker, Steven; Knoche, Moritz

2014-01-01

The exocarp, or skin, of fleshy fruit is a specialized tissue that protects the fruit, attracts seed dispersing fruit eaters, and has large economical relevance for fruit quality. Development of the exocarp involves regulated activities of many genes. This research analyzed global gene expression in the exocarp of developing sweet cherry (Prunus avium L., ‘Regina’), a fruit crop species with little public genomic resources. A catalog of transcript models (contigs) representing expressed genes was constructed from de novo assembled short complementary DNA (cDNA) sequences generated from developing fruit between flowering and maturity at 14 time points. Expression levels in each sample were estimated for 34 695 contigs from numbers of reads mapping to each contig. Contigs were annotated functionally based on BLAST, gene ontology and InterProScan analyses. Coregulated genes were detected using partitional clustering of expression patterns. The results are discussed with emphasis on genes putatively involved in cuticle deposition, cell wall metabolism and sugar transport. The high temporal resolution of the expression patterns presented here reveals finely tuned developmental specialization of individual members of gene families. Moreover, the de novo assembled sweet cherry fruit transcriptome with 7760 full-length protein coding sequences and over 20 000 other, annotated cDNA sequences together with their developmental expression patterns is expected to accelerate molecular research on this important tree fruit crop. PMID:26504533
A clone-free, single molecule map of the domestic cow (Bos taurus) genome.

PubMed

Zhou, Shiguo; Goldstein, Steve; Place, Michael; Bechner, Michael; Patino, Diego; Potamousis, Konstantinos; Ravindran, Prabu; Pape, Louise; Rincon, Gonzalo; Hernandez-Ortiz, Juan; Medrano, Juan F; Schwartz, David C

2015-08-28

The cattle (Bos taurus) genome was originally selected for sequencing due to its economic importance and unique biology as a model organism for understanding other ruminants, or mammals. Currently, there are two cattle genome sequence assemblies (UMD3.1 and Btau4.6) from groups using dissimilar assembly algorithms, which were complemented by genetic and physical map resources. However, past comparisons between these assemblies revealed substantial differences. Consequently, such discordances have engendered ambiguities when using reference sequence data, impacting genomic studies in cattle and motivating construction of a new optical map resource--BtOM1.0--to guide comparisons and improvements to the current sequence builds. Accordingly, our comprehensive comparisons of BtOM1.0 against the UMD3.1 and Btau4.6 sequence builds tabulate large-to-immediate scale discordances requiring mediation. The optical map, BtOM1.0, spanning the B. taurus genome (Hereford breed, L1 Dominette 01449) was assembled from an optical map dataset consisting of 2,973,315 (439 X; raw dataset size before assembly) single molecule optical maps (Rmaps; 1 Rmap = 1 restriction mapped DNA molecule) generated by the Optical Mapping System. The BamHI map spans 2,575.30 Mb and comprises 78 optical contigs assembled by a combination of iterative (using the reference sequence: UMD3.1) and de novo assembly techniques. BtOM1.0 is a high-resolution physical map featuring an average restriction fragment size of 8.91 Kb. Comparisons of BtOM1.0 vs. UMD3.1, or Btau4.6, revealed that Btau4.6 presented far more discordances (7,463) vs. UMD3.1 (4,754). Overall, we found that Btau4.6 presented almost double the number of discordances than UMD3.1 across most of the 6 categories of sequence vs. map discrepancies, which are: COMPLEX (misassembly), DELs (extraneous sequences), INSs (missing sequences), ITs (Inverted/Translocated sequences), ECs (extra restriction cuts) and MCs (missing restriction cuts). 
Alignments of UMD3.1 and Btau4.6 to BtOM1.0 reveal discordances commensurate with previous reports, and affirm the NCBI's current designation of UMD3.1 sequence assembly as the "reference assembly" and the Btau4.6 as the "alternate assembly." The cattle genome optical map, BtOM1.0, when used as a comprehensive and largely independent guide, will greatly assist improvements to existing sequence builds, and later serve as an accurate physical scaffold for studies concerning the comparative genomics of cattle breeds.
Transcription map of Xq27: candidates for several X-linked diseases.

PubMed

Zucchi, I; Jones, J; Affer, M; Montagna, C; Redolfi, E; Susani, L; Vezzoni, P; Parvari, R; Schlessinger, D; Whyte, M P; Mumm, S

1999-04-15

Human Xq27 contains candidate regions for several disorders, yet is predicted to be a gene-poor cytogenetic band. We have developed a transcription map for the entire cytogenetic band to facilitate the identification of the relatively small number of expected candidate genes. Two approaches were taken to identify genes: (1) a group of 64 unique STSs that were generated during the physical mapping of the region were used in RT-PCR with RNA from human adult and fetal brain and (2) ESTs that have been broadly mapped to this region of the chromosome were finely mapped using a high-resolution yeast artificial chromosome contig. This combined approach identified four distinct regions of transcriptional activity within the Xq27 band. Among them is a region at the centromeric boundary that contains candidate regions for several rare developmental disorders (X-linked recessive hypoparathyroidism, thoracoabdominal syndrome, albinism-deafness syndrome, and Borjeson-Forssman-Lehman syndrome). Two transcriptionally active regions were identified in the center of Xq27 and include candidate regions for X-linked mental retardation syndrome 6, X-linked progressive cone dystrophy, X-linked retinitis pigmentosa 24, and a prostate cancer susceptibility locus. The fourth region of transcriptional activity encompasses the FMR1 (FRAXA) and FMR2 (FRAXE) genes. The analysis thus suggests clustered transcription in Xq27 and provides candidates for several heritable disorders for which the causative genes have not yet been found. Copyright 1999 Academic Press.
Physical mapping of chromosome 12q breakpoints in lipoma, pleomorphic salivary gland adenoma, uterine leiomyoma, and myxoid liposarcoma

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schoenmakers, E.F.P.M.; Kools, P.F.J.; Mols, R.

1994-03-15

The authors report here the physical mapping of recurrent chromosome 12q13-q15 breakpoints in cell lines derived from primary myxoid liposarcoma, lipoma, uterine leiomyoma, and pleomorphic adenoma of the salivary glands. In fluorescence in situ hybridization (FISH) experiments, they first mapped the position of the chromosome 12 translocation breakpoint in uterine leiomyoma cell line LM-30.1/SV40 relative to loci COL2A1, D12S4, D12S17, D12S6, D12S19, D12S8, and D12S7. It mapped between linkage probes CRI-C86 (D12S19) and p7G11 (D12S8). They then isolated YAC clones using CRI-C86- and p7G11-derived sequence-tagged sites, constructed corresponding YAC contigs of 310 and 800 kb, respectively, and a mixture of them was used to routinely study the various tumor cell lines by FISH analysis. The chromosome 12 breakpoints of all tumor cell lines tested mapped between cosmids LLNL12NCO1-98C10 and LLNL12NCO1-113D12. None of the breakpoints appeared to map within any of the isolated YAC clones. Furthermore, FISH analysis using cosmid LLNL12-NCO1-144G3, which maps at the CHOP locus, revealed that the chromosome 12 breakpoints in all cell lines of the three benign solid tumors that were tested were located distal to the chromosome 12 translocation breakpoint with the CHOP gene in myxoid liposarcoma cells with t(12;16). In conclusion, the studies seem to indicate that the chromosome 12 breakpoints of myxoid liposarcoma, lipoma, uterine leiomyoma, and pleomorphic adenoma of the salivary glands are all clustered within the 7-cM interval between D12S19 and D12S8, with those of the benign solid tumors distal to CHOP. Finally, the MYF5 gene mapped telomeric to LLNL12NCO1-113D12, and the MIP gene mapped centromeric to the chromosome 12 translocation breakpoint in myxoid liposarcoma cells. 56 refs., 5 figs., 3 tabs.
A 1.5-Mb cosmid contig of the CMT1A duplication/HNPP deletion critical region in 17p11.2-p12

DOE Office of Scientific and Technical Information (OSTI.GOV)

Murakami, Tatsufumi; Lupski, J.R.

1996-05-15

Charcot-Marie-Tooth disease type 1A (CMT1A) is associated with a 1.5-Mb tandem duplication in chromosome 17p11.2-p12, and hereditary neuropathy with liability to pressure palsies (HNPP) is associated with a 1.5-Mb deletion at this locus. Both diseases appear to result from an altered copy number of the peripheral myelin protein-22 gene, PMP22, which maps within the critical region. To identify additional genes and characterize chromosomal elements, a 1.5-Mb cosmid contig of the CMT1A duplication/HNPP deletion critical region was assembled using a yeast artificial chromosome (YAC)-based isolation and binning strategy. Whole YAC probes were used for screening a high-density arrayed chromosome 17-specific cosmid library. Selected cosmids were spotted on dot blots and assigned to bins defined by YACs. This binning of cosmids facilitated the subsequent fingerprint analysis. The 1.5-Mb region was covered by 137 cosmids with a minimum overlap set of 52 cosmids assigned to 17 bins and 9 contigs. 20 refs., 2 figs.
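The idea of a "minimum overlap set" (a minimal tiling path through overlapping clones) can be illustrated with a small sketch. It assumes each clone already has known start and end coordinates on the region, which is a simplification: the study ordered cosmids by binning and fingerprinting, not by coordinates. The function name `minimal_tiling_path` and all clone names and positions are hypothetical.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy interval cover: pick the fewest clones spanning the region.

    clones is a list of (name, start, end) tuples in arbitrary units.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones that start inside the covered part,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


# Hypothetical cosmid coordinates in kb (not data from the study).
cosmids = [
    ("c1", 0, 40), ("c2", 10, 45), ("c3", 30, 80), ("c4", 42, 85),
    ("c5", 70, 120), ("c6", 84, 125), ("c7", 110, 150),
]
print(minimal_tiling_path(cosmids, 0, 150))  # ['c1', 'c3', 'c5', 'c7']
```

A real tiling path would usually also require a minimum overlap between neighbouring clones so that each join can be verified; this sketch ignores that constraint.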
Transcriptome sequencing for high throughput SNP development and genetic mapping in Pea

PubMed Central

2014-01-01

Background Pea has a complex genome of 4.3 Gb for which only limited genomic resources are available to date. Although SNP markers are now highly valuable for research and modern breeding, only a few are described and used in pea for genetic diversity and linkage analysis. Results We developed a large resource by cDNA sequencing of 8 genotypes representative of modern breeding material using the Roche 454 technology, combining both long reads (400 bp) and high coverage (3.8 million reads, reaching a total of 1,369 megabases). Sequencing data were assembled and generated a 68 K unigene set, from which 41 K were annotated from their best blast hit against the model species Medicago truncatula. Annotated contigs showed an even distribution along M. truncatula pseudochromosomes, suggesting a good representation of the pea genome. 10 K pea contigs were found to be polymorphic among the genetic material surveyed, corresponding to 35 K SNPs. We validated a subset of 1538 SNPs through the GoldenGate assay, proving their ability to structure a diversity panel of breeding germplasm. Among them, 1340 were genetically mapped and used to build a new consensus map comprising a total of 2070 markers. Based on blast analysis, we could establish 1252 bridges between our pea consensus map and the pseudochromosomes of M. truncatula, which provides new insight on synteny between the two species. Conclusions Our approach created significant new resources in pea, i.e. the most comprehensive genetic map to date tightly linked to the model species M. truncatula and a large SNP resource for both academic research and breeding. PMID:24521263
Studying long 16S rDNA sequences with ultrafast-metagenomic sequence classification using exact alignments (Kraken).

PubMed

Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco

2016-03-01

Ultrafast-metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classify 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning and then Sanger sequenced. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Deeper classification performance was achieved by Kraken than by the RDP: 73% of the contigs were classified up to the species or variety levels, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or inclusively deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken cumulates the information of both sequence senses, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for use in the taxonomic assignment of sequences obtained by Sanger sequencing or based on third generation sequencing, of which the main goal is to generate larger sequences. Copyright © 2016 Elsevier B.V. All rights reserved.
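The classification principle summarized above (k-mers mapped to a lowest common ancestor, then weighted paths through the taxonomy) can be sketched as a toy example. This is not Kraken's code: the taxonomy, the k-mer table, the k-mer length of 3 and the function names are all hypothetical, and ties between equally weighted paths are simply broken arbitrarily here.

```python
from collections import Counter

K = 3  # toy k-mer length; real classifiers use much longer k-mers

# Hypothetical taxonomy as child -> parent (the root has no parent).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Vibrio": "Bacteria",
    "V. cholerae": "Vibrio",
    "Escherichia": "Bacteria",
    "E. coli": "Escherichia",
}

# Hypothetical database: k-mer -> lowest common ancestor of the
# reference genomes that contain it.
KMER_TO_LCA = {
    "ACG": "Bacteria",
    "CGT": "Vibrio",
    "GTA": "V. cholerae",
    "TAC": "V. cholerae",
    "CGA": "E. coli",
}


def lineage(taxon):
    """Return the path from a taxon up to the root."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def classify(sequence, k=K):
    """Assign a sequence to the taxon with the heaviest root-to-node path."""
    hits = Counter()
    for i in range(len(sequence) - k + 1):
        taxon = KMER_TO_LCA.get(sequence[i:i + k])
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None  # no k-mer matched: unclassified

    def path_weight(taxon):
        # Sum the k-mer hits of the taxon and of all its ancestors.
        return sum(hits[t] for t in lineage(taxon))

    return max(hits, key=path_weight)


print(classify("ACGTAC"))  # V. cholerae
print(classify("TTTT"))    # None
```

In the first call the four k-mers hit Bacteria once, Vibrio once and V. cholerae twice, so the path ending at V. cholerae carries the largest weight.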
Shotgun Optical Maps of the Whole Escherichia coli O157:H7 Genome

PubMed Central

Lim, Alex; Dimalanta, Eileen T.; Potamousis, Konstantinos D.; Yen, Galex; Apodoca, Jennifer; Tao, Chunhong; Lin, Jieyi; Qi, Rong; Skiadas, John; Ramanathan, Arvind; Perna, Nicole T.; Plunkett, Guy; Burland, Valerie; Mau, Bob; Hackett, Jeremiah; Blattner, Frederick R.; Anantharaman, Thomas S.; Mishra, Bhubaneswar; Schwartz, David C.

2001-01-01

We have constructed NheI and XhoI optical maps of Escherichia coli O157:H7 solely from genomic DNA molecules to provide a uniquely valuable scaffold for contig closure and sequence validation. E. coli O157:H7 is a common pathogen found in contaminated food and water. Our approach obviated the need for the analysis of clones, PCR products, and hybridizations, because maps were constructed from ensembles of single DNA molecules. Shotgun sequencing of bacterial genomes remains labor-intensive, despite advances in sequencing technology. This is partly due to manual intervention required during the last stages of finishing. The applicability of optical mapping to this problem was enhanced by advances in machine vision techniques that improved mapping throughput and created a path to full automation of mapping. Comparisons were made between maps and sequence data that characterized sequence gaps and guided nascent assemblies. PMID:11544203
Gonadal Transcriptome Analysis of Male and Female Olive Flounder (Paralichthys olivaceus)

PubMed Central

Fan, Zhaofei; You, Feng; Wang, Lijuan; Weng, Shenda; Wu, Zhihao; Hu, Jinwei; Zou, Yuxia; Tan, Xungang; Zhang, Peijun

2014-01-01

Olive flounder (Paralichthys olivaceus) is an important commercially cultured marine flatfish in China, Korea, and Japan, in which females grow faster than males. In order to explore the molecular mechanism of flounder sex determination and development, we used RNA-seq technology to investigate transcriptomes of flounder gonads. This produced 22,253,217 and 19,777,841 qualified reads from ovary and testes, which were jointly assembled into 97,233 contigs. Among them, 23,223 contigs were mapped to known genes, of which 2,193 were predicted to be differentially expressed in ovary and 887 in testes. According to annotation information, several sex-related biological pathways including ovarian steroidogenesis and estrogen signaling pathways were found in flounder for the first time. The dimorphic expression of overall sex-related genes provides further insights into sex determination and gonadal development. Our study also provides an archive for further studies of the molecular mechanism of fish sex determination. PMID:25121093
Optical Whole-Genome Restriction Mapping as a Tool for Rapidly Distinguishing and Identifying Bacterial Contaminants in Clinical Samples

DTIC Science & Technology

2015-08-01

using a single 1200 kilobase contig database

Map name; dc; cc; Bc Factor
V. cholerae, MJ-1236 chromosome 2; 88%; 24%; 2112
V. cholerae, M66-2 chromosome 2; 84...22%; 1848
V. cholerae, O1 biovar El Tor str. N16961 chromosome 2; 81%; 22%; 1782
V. cholerae, M66-2 chromosome 1; 77%; 60%; 4620
V. cholerae, MJ-1236...chromosome 1; 76%; 59%; 4484
V. cholerae, O1 biovar El Tor str. N16961 chromosome I; 74%; 57%; 4218
V. cholerae, O395 chromosome 2; 58%; 44%; 2552
V. cholerae
The Development of a High Density Linkage Map for Black Tiger Shrimp (Penaeus monodon) Based on cSNPs

PubMed Central

Baranski, Matthew; Gopikrishna, Gopalapillay; Robinson, Nicholas A.; Katneni, Vinaya Kumar; Shekhar, Mudagandur S.; Shanmugakarthik, Jayakani; Jothivel, Sarangapani; Gopal, Chavali; Ravichandran, Pitchaiyappan; Kent, Matthew; Arnyasi, Mariann; Ponniah, Alphis G.

2014-01-01

Transcriptome sequencing using Illumina RNA-seq was performed on populations of black tiger shrimp from India. Samples were collected from (i) four landing centres around the east coastline (EC) of India, (ii) survivors of a severe WSSV infection during pond culture (SUR) and (iii) the Andaman Islands (AI) in the Bay of Bengal. Equal quantities of purified total RNA from homogenates of hepatopancreas, muscle, nervous tissue, intestinal tract, heart, gonad, gills, pleopod and lymphoid organs were combined to create AI, EC and SUR pools for RNA sequencing. De novo transcriptome assembly resulted in 136,223 contigs (minimum size 100 base pairs, bp) with a total length 61 Mb, an average length of 446 bp and an average coverage of 163× across all pools. Approximately 16% of contigs were annotated with BLAST hit information and gene ontology annotations. A total of 473,620 putative SNPs/indels were identified. An Illumina iSelect genotyping array containing 6,000 SNPs was developed and used to genotype 1024 offspring belonging to seven full-sibling families. A total of 3959 SNPs were mapped to 44 linkage groups. The linkage groups consisted of between 16–129 and 13–130 markers, of length between 139–10.8 and 109.1–10.5 cM and with intervals averaging between 1.2 and 0.9 cM for the female and male maps respectively. The female map was 28% longer than the male map (4060 and 2917 cM respectively) with a 1.6 higher recombination rate observed for female compared to male meioses. This approach has substantially increased expressed sequence and DNA marker resources for tiger shrimp and is a useful resource for QTL mapping and association studies for evolutionarily and commercially important traits. PMID:24465553
Construction of a YAC contig and STS map spanning 2.5 Mbp in Xq25, the critical region for the X-linked lymphoproliferative (XLP) gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lanyi, A.; Li, B.F.; Li, S.

1994-09-01

X-linked lymphoproliferative disease (XLP) is characterized by a marked vulnerability to Epstein-Barr virus (EBV) infection. Infection of XLP patients with EBV invariably results in fatal mononucleosis, agammaglobulinemia or B-cell lymphoma. The XLP gene lies within a 10 cM region in Xq25 between DXS42 and DXS10. Initial chromosome studies revealed an interstitial, cytogenetically visible deletion in Xq25 in one XLP family (43-004). We estimated the size of the Xq25 deletion by dual laser flow karyotyping to involve 2% of the X chromosome, or approximately 3 Mbp of DNA sequences. To further delineate the deletion we performed a series of pulsed field gel electrophoresis (PFGE) analyses which showed that DXS6 and DXS100, two Xq25-specific markers, are missing from 45-004 DNA. Five yeast artificial chromosomes (YACs) from a chromosome X specific YAC library containing sequences deleted in patient 43-004's DNA were isolated. These five YACs did not overlap, and their end fragments were used to screen the CEPH MegaYAC library. Seven YACs were isolated from the CEPH MegaYAC library. They could be arranged into a contig which spans between DXS6 and DXS100. The contig contains a minimum of 2.5 Mbp of human DNA. A total of 12 YAC end clones, lambda subclones and STS probes have been used to order clones within the contig. These reagents were also used in Southern blot and patients showed interstitial deletions in Xq25. The sizes of these deletions range between 0.5 and 2.5 Mbp. The shortest deletion probably represents the critical region for the XLP gene.

Comparative Genome Sequence Analysis of the Bpa/Str Region in Mouse and Man

PubMed Central

Mallon, A.-M.; Platzer, M.; Bate, R.; Gloeckner, G.; Botcherby, M.R.M.; Nordsiek, G.; Strivens, M.A.; Kioschis, P.; Dangel, A.; Cunningham, D.; Straw, R.N.A.; Weston, P.; Gilbert, M.; Fernando, S.; Goodall, K.; Hunter, G.; Greystrong, J.S.; Clarke, D.; Kimberley, C.; Goerdes, M.; Blechschmidt, K.; Rump, A.; Hinzmann, B.; Mundy, C.R.; Miller, W.; Poustka, A.; Herman, G.E.; Rhodes, M.; Denny, P.; Rosenthal, A.; Brown, S.D.M.

2000-01-01

The progress of human and mouse genome sequencing programs presages the possibility of systematic cross-species comparison of the two genomes as a powerful tool for gene and regulatory element identification. As the opportunities to perform comparative sequence analysis emerge, it is important to develop parameters for such analyses and to examine the outcomes of cross-species comparison. Our analysis used gene prediction and a database search of 430 kb of genomic sequence covering the Bpa/Str region of the mouse X chromosome, and 745 kb of genomic sequence from the homologous human X chromosome region. We identified 11 genes in mouse and 13 genes and two pseudogenes in human. In addition, we compared the mouse and human sequences using pairwise alignment and searches for evolutionary conserved regions (ECRs) exceeding a defined threshold of sequence identity. This approach aided the identification of at least four further putative conserved genes in the region. Comparative sequencing revealed that this region is a mosaic in evolutionary terms, with considerably more rearrangement between the two species than realized previously from comparative mapping studies. Surprisingly, this region showed an extremely high LINE and low SINE content, low G+C content, and yet a relatively high gene density, in contrast to the low gene density usually associated with such regions. [The sequence data described in this paper have been submitted to EMBL under the following accession nos.: Mouse Genomic Sequence: Mouse contig A (AL021127), Mouse contig B (AL049866), BAC41M10 (AL136328), PAC303O11(AL136329). Human Genomic Sequence: Human contig 1 (U82671, U82670), Human contig 2 (U82695).] PMID:10854409
A high-density genetic map reveals variation in recombination rate across the genome of Daphnia magna.

PubMed

Dukić, Marinela; Berner, Daniel; Roesti, Marius; Haag, Christoph R; Ebert, Dieter

2016-10-13

Recombination rate is an essential parameter for many genetic analyses. Recombination rates are highly variable across species, populations, individuals and different genomic regions. Due to the profound influence that recombination can have on intraspecific diversity and interspecific divergence, characterization of recombination rate variation emerges as a key resource for population genomic studies and emphasises the importance of high-density genetic maps as tools for studying genome biology. Here we present such a high-density genetic map for Daphnia magna, and analyse patterns of recombination rate across the genome. An F2 intercross panel was genotyped by Restriction-site Associated DNA sequencing to construct the third-generation linkage map of D. magna. The resulting high-density map included 4037 markers covering 813 scaffolds and contigs that sum up to 77 % of the currently available genome draft sequence (v2.4) and 55 % of the estimated genome size (238 Mb). Total genetic length of the map presented here is 1614.5 cM and the genome-wide recombination rate is estimated at 6.78 cM/Mb. Merging genetic and physical information we consistently found that recombination rate estimates are high towards the peripheral parts of the chromosomes, while chromosome centres, harbouring centromeres in D. magna, show very low recombination rate estimates. Due to its high density, the third-generation linkage map for D. magna can be coupled with the draft genome assembly, providing an essential tool for genome investigation in this model organism. Thus, our linkage map can be used for the on-going improvements of the genome assembly, but more importantly, it has enabled us to characterize variation in recombination rate across the genome of D. magna for the first time. These new insights can provide valuable assistance in future studies of the genome evolution, mapping of quantitative traits and population genetic studies.
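A genome-wide recombination rate in cM/Mb is the total genetic map length divided by a physical length. The short sketch below shows the arithmetic with the two figures quoted in the abstract. That the authors used the estimated genome size (238 Mb) as the denominator is an assumption, although it is consistent with the reported value; the function name is hypothetical.

```python
def recombination_rate(genetic_length_cm, physical_length_mb):
    """Return the average recombination rate in cM per Mb."""
    if physical_length_mb <= 0:
        raise ValueError("physical length must be positive")
    return genetic_length_cm / physical_length_mb


# Figures from the abstract: 1614.5 cM map length, 238 Mb genome size.
rate = recombination_rate(1614.5, 238.0)
print(f"{rate:.2f} cM/Mb")  # 6.78 cM/Mb
```

The same function applied to a single chromosome interval, with the cM and Mb spanned by that interval, would give a local rate of the kind compared between chromosome centres and peripheries.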
A YAC contig encompassing the Treacher Collins syndrome critical region at 5q31.3-32

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dixon, J.; Gladwin, A.J.; Perveen, R.

Treacher Collins syndrome (TCOF1) is an autosomal dominant disorder of craniofacial development the features of which include conductive hearing loss and cleft palate. Previous studies have localized the TCOF1 locus between D5S519 (proximal) and SPARC (distal), a region of 22 centirays as estimated by radiation hybrid mapping. In the current investigation the authors have created a contig across the TCOF1 critical region, using YAC clones. Isolation of a novel short tandem repeat polymorphism corresponding to the end of one of the YACs has allowed reduction of the size of the critical region to approximately 840 kb, which has been covered with three nonchimeric YACs. Restriction mapping has revealed that the region contains a high density of clustered rare-cutter restriction sites, suggesting that it may contain a number of different genes. The results of the present investigation have further allowed confirmation that the RPS14 locus lies proximal to the critical region and can thereby be excluded from a role in the pathogenesis of TCOF1, while ANX6 lies within the TCOF1 critical region and remains a potential candidate for the mutated gene. 26 refs., 4 figs., 1 tab.
Candidate genes that have facilitated freshwater adaptation by palaemonid prawns in the genus Macrobrachium: identification and expression validation in a model species (M. koombooloomba).

PubMed

Rahi, Md Lifat; Amin, Shorash; Mather, Peter B; Hurwood, David A

2017-01-01

The endemic Australian freshwater prawn, Macrobrachium koombooloomba, provides a model for exploring genes involved with freshwater adaptation because it is one of the relatively few Macrobrachium species that can complete its entire life cycle in freshwater. The present study was conducted to identify potential candidate genes that are likely to contribute to effective freshwater adaptation by M. koombooloomba using a transcriptomics approach. De novo assembly of 75 bp paired end 227,564,643 high quality Illumina raw reads from 6 different cDNA libraries revealed 125,917 contigs of variable lengths (200-18,050 bp) with an N50 value of 1597. In total, 31,272 (24.83%) of the assembled contigs received significant blast hits, of which 27,686 and 22,560 contigs were mapped and functionally annotated, respectively. CEGMA (Core Eukaryotic Genes Mapping Approach) based transcriptome quality assessment revealed 96.37% completeness. We identified 43 different potential genes that are likely to be involved with freshwater adaptation in M. koombooloomba. Identified candidate genes included: 25 genes for osmoregulation, five for cell volume regulation, seven for stress tolerance, three for body fluid (haemolymph) maintenance, eight for epithelial permeability and water channel regulation, nine for egg size control and three for larval development. RSEM (RNA-Seq Expectation Maximization) based abundance estimation revealed that 6,253, 5,753 and 3,795 transcripts were expressed (at TPM value ≥10) in post larvae, juveniles and adults, respectively. Differential gene expression (DGE) analysis showed that 15 genes were expressed differentially in different individuals but these genes apparently were not involved with freshwater adaptation but rather were involved in growth, development and reproductive maturation.
The genomic resources developed here will be useful for better understanding the molecular basis of freshwater adaptation in Macrobrachium prawns and other crustaceans more broadly.
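The N50 value quoted for the assembly is a standard summary statistic: the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the total assembly length. A minimal sketch follows; the function name and the example contig lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to at least
        # half of the assembly defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp, 8000 bp in total.
example = [200, 300, 500, 1000, 1500, 2000, 2500]
print(n50(example))  # 2000
```

Here the two longest contigs (2500 + 2000 bp) already account for more than half of the 8000 bp total, so the N50 is 2000 bp.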
Candidate genes that have facilitated freshwater adaptation by palaemonid prawns in the genus Macrobrachium: identification and expression validation in a model species (M. koombooloomba)

PubMed Central

Amin, Shorash; Mather, Peter B.; Hurwood, David A.

2017-01-01

Background The endemic Australian freshwater prawn, Macrobrachium koombooloomba, provides a model for exploring genes involved with freshwater adaptation because it is one of the relatively few Macrobrachium species that can complete its entire life cycle in freshwater. Methods The present study was conducted to identify potential candidate genes that are likely to contribute to effective freshwater adaptation by M. koombooloomba using a transcriptomics approach. De novo assembly of 75 bp paired end 227,564,643 high quality Illumina raw reads from 6 different cDNA libraries revealed 125,917 contigs of variable lengths (200–18,050 bp) with an N50 value of 1597. Results In total, 31,272 (24.83%) of the assembled contigs received significant blast hits, of which 27,686 and 22,560 contigs were mapped and functionally annotated, respectively. CEGMA (Core Eukaryotic Genes Mapping Approach) based transcriptome quality assessment revealed 96.37% completeness. We identified 43 different potential genes that are likely to be involved with freshwater adaptation in M. koombooloomba. Identified candidate genes included: 25 genes for osmoregulation, five for cell volume regulation, seven for stress tolerance, three for body fluid (haemolymph) maintenance, eight for epithelial permeability and water channel regulation, nine for egg size control and three for larval development. RSEM (RNA-Seq Expectation Maximization) based abundance estimation revealed that 6,253, 5,753 and 3,795 transcripts were expressed (at TPM value ≥10) in post larvae, juveniles and adults, respectively. Differential gene expression (DGE) analysis showed that 15 genes were expressed differentially in different individuals but these genes apparently were not involved with freshwater adaptation but rather were involved in growth, development and reproductive maturation. 
Discussion The genomic resources developed here will be useful for better understanding the molecular basis of freshwater adaptation in Macrobrachium prawns and other crustaceans more broadly. PMID:28194319
Bacterial Artificial Chromosome Libraries for Mouse Sequencing and Functional Analysis

PubMed Central

Osoegawa, Kazutoyo; Tateno, Minako; Woon, Peng Yeong; Frengen, Eirik; Mammoser, Aaron G.; Catanese, Joseph J.; Hayashizaki, Yoshihide; de Jong, Pieter J.

2000-01-01

Bacterial artificial chromosome (BAC) and P1-derived artificial chromosome (PAC) libraries providing a combined 33-fold representation of the murine genome have been constructed using two different restriction enzymes for genomic digestion. A large-insert PAC library was prepared from the 129S6/SvEvTac strain in a bacterial/mammalian shuttle vector to facilitate functional gene studies. For genome mapping and sequencing, we prepared BAC libraries from the 129S6/SvEvTac and the C57BL/6J strains. The average insert sizes for the three libraries range between 130 kb and 200 kb. Based on the numbers of clones and the observed average insert sizes, we estimate each library to have slightly in excess of 10-fold genome representation. The average number of clones found after hybridization screening with 28 probes was in the range of 9–14 clones per marker. To explore the fidelity of the genomic representation in the three libraries, we analyzed three contigs, each established after screening with a single unique marker. New markers were established from the end sequences and screened against all the contig members to determine if any of the BACs and PACs are chimeric or rearranged. Only one chimeric clone and six potential deletions have been observed after extensive analysis of 113 PAC and BAC clones. Seventy-one of the 113 clones were conclusively nonchimeric because both end markers or sequences were mapped to the other confirmed contig members. We could not exclude chimerism for the remaining 41 clones because one or both of the insert termini did not contain unique sequence to design markers. The low rate of chimerism, ∼1%, and the low level of detected rearrangements support the anticipated usefulness of the BAC libraries for genome research. [The sequence data described in this paper have been submitted to the GenBank data library under accession numbers AQ797173–AQ797398.] PMID:10645956
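The fold representation of a clone library follows from the number of clones, the average insert size and the genome size, as the abstract indicates. The sketch below shows that calculation together with the classical Clarke-Carbon estimate of the chance that a given locus is present in at least one clone. The clone count, insert size and genome size used in the example are hypothetical placeholders, not the figures of the libraries described, and the function names are invented for illustration.

```python
def fold_coverage(n_clones, mean_insert_kb, genome_size_mb):
    """Return library redundancy as genome equivalents."""
    return n_clones * mean_insert_kb / (genome_size_mb * 1000.0)


def prob_locus_represented(n_clones, mean_insert_kb, genome_size_mb):
    """Clarke-Carbon estimate: P = 1 - (1 - insert/genome) ** N.

    Assumes clones are drawn uniformly at random from the genome.
    """
    fraction = mean_insert_kb / (genome_size_mb * 1000.0)
    return 1.0 - (1.0 - fraction) ** n_clones


# Hypothetical library: 200,000 clones, 150 kb inserts, 2,700 Mb genome.
coverage = fold_coverage(200_000, 150, 2_700)
p_hit = prob_locus_represented(200_000, 150, 2_700)
print(f"{coverage:.1f}-fold coverage")  # 11.1-fold coverage
print(p_hit > 0.9999)                   # True
```

Under the random-sampling assumption, the expected number of clones found per single-copy probe is roughly equal to the fold coverage, which is why hybridization screening serves as an empirical check on the estimate.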
High-resolution chromosome mapping of BACs using multi-colour FISH and pooled-BAC FISH as a backbone for sequencing tomato chromosome 6.

PubMed

Szinay, Dóra; Chang, Song-Bin; Khrustaleva, Ludmila; Peters, Sander; Schijlen, Elio; Bai, Yuling; Stiekema, Willem J; van Ham, Roeland C H J; de Jong, Hans; Klein Lankhorst, René M

2008-11-01

Within the framework of the International Solanaceae Genome Project, the genome of tomato (Solanum lycopersicum) is currently being sequenced. We follow a 'BAC-by-BAC' approach that aims to deliver high-quality sequences of the euchromatin part of the tomato genome. BACs are selected from various libraries of the tomato genome on the basis of markers from the F2.2000 linkage map. Prior to sequencing, we validated the precise physical location of the selected BACs on the chromosomes by five-colour high-resolution fluorescent in situ hybridization (FISH) mapping. This paper describes the strategies and results of cytogenetic mapping for chromosome 6 using 75 seed BACs for FISH on pachytene complements. The cytogenetic map obtained showed discrepancies between the actual chromosomal positions of these BACs and their markers on the linkage group. These discrepancies were most notable in the pericentromere heterochromatin, thus confirming previously described suppression of cross-over recombination in that region. In a so called pooled-BAC FISH, we hybridized all seed BACs simultaneously and found a few large gaps in the euchromatin parts of the long arm that are still devoid of seed BACs and are too large for coverage by expanding BAC contigs. Combining FISH with pooled BACs and newly recruited seed BACs will thus aid in efficient targeting of novel seed BACs into these areas. Finally, we established the occurrence of repetitive DNA in heterochromatin/euchromatin borders by combining BAC FISH with hybridization of a labelled repetitive DNA fraction (Cot-100). This strategy provides an excellent means to establish the borders between euchromatin and heterochromatin in this chromosome.
Physical mapping of the torsion dystonia region of human chromosome 9q34

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ozelius, L.J.; Hewett, J.; Shalish, C.

1994-09-01

Torsion dystonia is a syndrome characterized by loss of voluntary movements appearing as sustained muscle contractions and/or abnormal postures. The DYT1 gene is responsible for a subtype of torsion dystonia in which onset of symptoms tends to occur in a limb at an early age (mean 13 years) and to progress to a generalized state. Expression of the disease gene follows an autosomal dominant mode of inheritance with reduced penetrance. We initially mapped this gene to human chromosome 9q34 and have now defined its location to a < 1 cM region near the ASS locus based on historic recombination events around a founder mutation in the Ashkenazic Jewish population. Using the CEPH YAC library and a chromosome 9 flow-sorted YAC library, we have generated a YAC contig spanning about 500 kb of this region. These YACs are being used to identify cosmids by direct hybridization to chromosome 9-specific cosmid libraries. Cosmids are being aligned by restriction digest patterns and by hybridization with oligonucleotide repeat probes. In addition, the cosmids are being "trapped" by exon amplification and these exons used to screen cDNA libraries. Thus far we have identified several candidate transcripts in this region.
The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4)

DOE PAGES

Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...

2016-02-24

The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provided via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation is followed by functional annotation including assignment of protein product names and connection to various protein family databases.
[Complete genome sequencing and sequence analysis of BCG Tice].

PubMed

Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

2012-10-04

The objective of this study is to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and design more reasonable vaccines to prevent tuberculosis. We assembled the data from high-throughput sequencing with SOAPdenovo software, obtaining many contigs and scaffolds. Many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the ends of contigs and performed PCR amplification in order to link these contigs and scaffolds. By using various enzymes for PCR amplification, adjusting PCR reaction conditions, and combining these with clone construction for sequencing, all the gaps were closed. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank of National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4334064 base pairs in length, with GC content 65.65%. The problems and strategies during the finishing step of BCG Tice sequencing are illuminated here, with the hope of affording some experience to those who are involved in the finishing step of genome sequencing. The microarray data were verified by our results.
CAR: contig assembly of prokaryotic draft genomes using rearrangements.

PubMed

Lu, Chin Lung; Chen, Kun-Tze; Huang, Shih-Yuan; Chiu, Hsien-Tai

2014-11-28

Next generation sequencing technology has allowed efficient production of draft genomes for many organisms of interest. However, most draft genomes are just collections of independent contigs, whose relative positions and orientations along the genome being sequenced are unknown. Although several tools have been developed to order and orient the contigs of draft genomes, more accurate tools are still needed. In this study, we present a novel reference-based contig assembly (or scaffolding) tool, named as CAR, that can efficiently and more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome of a related organism. Given a set of contigs in multi-FASTA format and a reference genome in FASTA format, CAR can output a list of scaffolds, each of which is a set of ordered and oriented contigs. For validation, we have tested CAR on a real dataset composed of several prokaryotic genomes and also compared its performance with several other reference-based contig assembly tools. Consequently, our experimental results have shown that CAR indeed performs better than all these other reference-based contig assembly tools in terms of sensitivity, precision and genome coverage. CAR serves as an efficient tool that can more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome. The web server of CAR is freely available at http://genome.cs.nthu.edu.tw/CAR/ and its stand-alone program can also be downloaded from the same website.
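To make the idea of reference-based ordering and orienting concrete, here is a minimal sketch. It is not CAR's rearrangement-based method, which the abstract does not detail; it only illustrates the simpler notion of placing contigs by where they align on a reference. The `Hit` record, the contig names and the coordinates are all hypothetical, and the alignments are assumed to have been computed beforehand.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    """Best alignment of one contig to the reference (hypothetical record)."""
    contig: str      # contig identifier
    ref_start: int   # leftmost reference coordinate of the alignment
    strand: str      # '+' forward, '-' reverse-complemented

def order_and_orient(hits):
    """Sort contigs by reference position and keep each contig's strand."""
    ordered = sorted(hits, key=lambda h: h.ref_start)
    return [(h.contig, h.strand) for h in ordered]

# Illustrative input only: three contigs with made-up coordinates.
hits = [Hit("c3", 9000, "+"), Hit("c1", 100, "-"), Hit("c2", 4200, "+")]
scaffold = order_and_orient(hits)
print(scaffold)
```

A scaffold in this sense is simply the ordered list of (contig, orientation) pairs; a real tool also has to handle contigs with conflicting or missing alignments, which this sketch ignores.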
The gene for autosomal dominant cerebellar ataxia type II is located in a 5-cM region in 3p12-p13: genetic and physical mapping of the SCA7 locus.

PubMed

David, G; Giunti, P; Abbas, N; Coullin, P; Stevanin, G; Horta, W; Gemmill, R; Weissenbach, J; Wood, N; Cunha, S; Drabkin, H; Harding, A E; Agid, Y; Brice, A

1996-12-01

Two families with autosomal dominant cerebellar ataxia with pigmentary macular dystrophy (ADCA type II) were investigated. Analysis of 23 parent-child couples demonstrated the existence of marked anticipation, greater in paternal than in maternal transmissions, with earlier age at onset and a more rapid clinical course in successive generations. Clinical analysis revealed the presence of a great variability in age at onset, initial symptom, and associated signs, confirming the characteristic clinical heterogeneity of ADCA type II. The gene for ADCA type II previously was mapped to the spinocerebellar ataxia 7 (SCA7) locus on chromosome 3p12-p21.1. Linkage analysis of the two new families of different geographic origin confirmed the characteristic genetic homogeneity of ADCA type II, distinguishing it from ADCA type I. Haplotype analysis permitted refinement of the SCA7 region to the 5-cM interval between markers D3S1312 and D3S1600 on chromosome 3p12-p13. Eighteen sequence-tagged sites were used for the construction of an integrated map of the candidate region, based on a YACs contig. The entire candidate region is contained in a single nonchimeric YAC of 660 kb. The probable involvement of a CAG trinucleotide expansion, suggested by previous studies, should greatly facilitate the identification of the gene for ADCA type II.
The gene for autosomal dominant cerebellar ataxia type II is located in a 5-cM region in 3p12-p13: genetic and physical mapping of the SCA7 locus.

PubMed Central

David, G.; Giunti, P.; Abbas, N.; Coullin, P.; Stevanin, G.; Horta, W.; Gemmill, R.; Weissenbach, J.; Wood, N.; Cunha, S.; Drabkin, H.; Harding, A. E.; Agid, Y.; Brice, A.

1996-01-01

Two families with autosomal dominant cerebellar ataxia with pigmentary macular dystrophy (ADCA type II) were investigated. Analysis of 23 parent-child couples demonstrated the existence of marked anticipation, greater in paternal than in maternal transmissions, with earlier age at onset and a more rapid clinical course in successive generations. Clinical analysis revealed the presence of a great variability in age at onset, initial symptom, and associated signs, confirming the characteristic clinical heterogeneity of ADCA type II. The gene for ADCA type II previously was mapped to the spinocerebellar ataxia 7 (SCA7) locus on chromosome 3p12-p21.1. Linkage analysis of the two new families of different geographic origin confirmed the characteristic genetic homogeneity of ADCA type II, distinguishing it from ADCA type I. Haplotype analysis permitted refinement of the SCA7 region to the 5-cM interval between markers D3S1312 and D3S1600 on chromosome 3p12-p13. Eighteen sequence-tagged sites were used for the construction of an integrated map of the candidate region, based on a YACs contig. The entire candidate region is contained in a single nonchimeric YAC of 660 kb. The probable involvement of a CAG trinucleotide expansion, suggested by previous studies, should greatly facilitate the identification of the gene for ADCA type II. PMID:8940279
Genetic and physical mapping of candidate genes for resistance to Fusarium oxysporum f.sp. tracheiphilum race 3 in cowpea [Vigna unguiculata (L.) Walp].

PubMed

Pottorff, Marti; Wanamaker, Steve; Ma, Yaqin Q; Ehlers, Jeffrey D; Roberts, Philip A; Close, Timothy J

2012-01-01

Fusarium oxysporum f.sp. tracheiphilum (Fot) is a soil-borne fungal pathogen that causes vascular wilt disease in cowpea. Fot race 3 is one of the major pathogens affecting cowpea production in California. Identification of Fot race 3 resistance determinants will expedite delivery of improved cultivars by replacing time-consuming phenotypic screening with selection based on perfect markers, thereby generating successful cultivars in a shorter time period. Resistance to Fot race 3 was studied in the RIL population California Blackeye 27 (resistant) x 24-125B-1 (susceptible). Biparental mapping identified a Fot race 3 resistance locus, Fot3-1, which spanned 3.56 cM on linkage group one of the CB27 x 24-125B-1 genetic map. A marker-trait association narrowed the resistance locus to a 1.2 cM region and identified SNP marker 1_1107 as co-segregating with Fot3-1 resistance. Macro and microsynteny was observed for the Fot3-1 locus region in Glycine max where six disease resistance genes were observed in the two syntenic regions of soybean chromosomes 9 and 15. Fot3-1 was identified on the cowpea physical map on BAC clone CH093L18, spanning approximately 208,868 bp on BAC contig250. The Fot3-1 locus was narrowed to 0.5 cM distance on the cowpea genetic map linkage group 6, flanked by SNP markers 1_0860 and 1_1107. BAC clone CH093L18 was sequenced and four cowpea sequences with similarity to leucine-rich repeat serine/threonine protein kinases were identified and are cowpea candidate genes for the Fot3-1 locus. This study has shown how readily candidate genes can be identified for simply inherited agronomic traits when appropriate genetic stocks and integrated genomic resources are available. High co-linearity between cowpea and soybean genomes illustrated that utilizing synteny can transfer knowledge from a reference legume to legumes with less complete genomic resources. 
Identification of Fot race 3 resistance genes will enable transfer into high yielding cowpea varieties using marker-assisted selection (MAS).
Genetic and Physical Mapping of Candidate Genes for Resistance to Fusarium oxysporum f.sp. tracheiphilum Race 3 in Cowpea [Vigna unguiculata (L.) Walp]

PubMed Central

Pottorff, Marti; Wanamaker, Steve; Ma, Yaqin Q.; Ehlers, Jeffrey D.; Roberts, Philip A.; Close, Timothy J.

2012-01-01

Fusarium oxysporum f.sp. tracheiphilum (Fot) is a soil-borne fungal pathogen that causes vascular wilt disease in cowpea. Fot race 3 is one of the major pathogens affecting cowpea production in California. Identification of Fot race 3 resistance determinants will expedite delivery of improved cultivars by replacing time-consuming phenotypic screening with selection based on perfect markers, thereby generating successful cultivars in a shorter time period. Resistance to Fot race 3 was studied in the RIL population California Blackeye 27 (resistant) x 24-125B-1 (susceptible). Biparental mapping identified a Fot race 3 resistance locus, Fot3-1, which spanned 3.56 cM on linkage group one of the CB27 x 24-125B-1 genetic map. A marker-trait association narrowed the resistance locus to a 1.2 cM region and identified SNP marker 1_1107 as co-segregating with Fot3-1 resistance. Macro and microsynteny was observed for the Fot3-1 locus region in Glycine max where six disease resistance genes were observed in the two syntenic regions of soybean chromosomes 9 and 15. Fot3-1 was identified on the cowpea physical map on BAC clone CH093L18, spanning approximately 208,868 bp on BAC contig250. The Fot3-1 locus was narrowed to 0.5 cM distance on the cowpea genetic map linkage group 6, flanked by SNP markers 1_0860 and 1_1107. BAC clone CH093L18 was sequenced and four cowpea sequences with similarity to leucine-rich repeat serine/threonine protein kinases were identified and are cowpea candidate genes for the Fot3-1 locus. This study has shown how readily candidate genes can be identified for simply inherited agronomic traits when appropriate genetic stocks and integrated genomic resources are available. High co-linearity between cowpea and soybean genomes illustrated that utilizing synteny can transfer knowledge from a reference legume to legumes with less complete genomic resources. 
Identification of Fot race 3 resistance genes will enable transfer into high yielding cowpea varieties using marker-assisted selection (MAS). PMID:22860000
Gene expression atlas of fruit ripening and transcriptome assembly from RNA-seq data in octoploid strawberry (Fragaria × ananassa).

PubMed

Sánchez-Sevilla, José F; Vallarino, José G; Osorio, Sonia; Bombarely, Aureliano; Posé, David; Merchante, Catharina; Botella, Miguel A; Amaya, Iraida; Valpuesta, Victoriano

2017-10-23

RNA-seq has been used to perform global expression analysis of the achene and the receptacle at four stages of fruit ripening, and of the roots and leaves of strawberry (Fragaria × ananassa). About 967 million reads and 191 Gb of sequence were produced, using Illumina sequencing. Mapping the reads in the related genome of the wild diploid Fragaria vesca revealed differences between the achene and receptacle development program, and reinforced the role played by ethylene in the ripening receptacle. For the strawberry transcriptome assembly, a de novo strategy was followed, generating separate assemblies for each of the ten tissues and stages sampled. The Trinity program was used for these assemblies, resulting in over 1.4 M isoforms. Filtering by a threshold of 0.3 FPKM, and doing Blastx (E-value < 1 e-30) against the UniProt database of plants reduced the number to 472,476 isoforms. Their assembly with the MIRA program (90% homology) resulted in 26,087 contigs. From these, 91.34 percent showed high homology to Fragaria vesca genes and 87.30 percent to Fragaria iinumae (BlastN E-value < 1 e-100). Mapping the reads back onto the MIRA contigs with FREEBAYES identified polymorphisms at the nucleotide level and allowed estimation of their relative abundance in each sample.
High Potential Source for Biomass Degradation Enzyme Discovery and Environmental Aspects Revealed through Metagenomics of Indian Buffalo Rumen

PubMed Central

Singh, K. M.; Reddy, Bhaskar; Patel, Dishita; Patel, A. K.; Patel, J. B.; Joshi, C. G.

2014-01-01

The complex microbiome of the rumen functions as an effective system for plant cell wall degradation, and biomass utilization provides a genetic resource for degrading microbial enzymes that could be used in the production of biofuel. Therefore the buffalo rumen microbiota was surveyed using shotgun sequencing. This metagenomic sequencing generated 3.9 GB of sequences and data were assembled into 137270 contiguous sequences (contigs). We identified 2614 potential contigs encoding biomass degrading enzymes including glycoside hydrolases (GH: 1943 contigs), carbohydrate binding module (CBM: 23 contigs), glycosyl transferase (GT: 373 contigs), carbohydrate esterases (CE: 259 contigs), and polysaccharide lyases (PE: 16 contigs). The hierarchical clustering of buffalo metagenomes demonstrated the similarities and dissimilarity in microbial community structures and functional capacity. This demonstrates that buffalo rumen microbiome was considerably enriched in functional genes involved in polysaccharide degradation with great prospects to obtain new molecules that may be applied in the biofuel industry. PMID:25136572
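As a quick consistency check on the counts quoted in this abstract, the per-class contig numbers can be tallied against the stated total. This is a small sketch using only the figures given above; the variable names are illustrative, and the class labels are copied as printed.

```python
# Per-class contig counts as quoted in the abstract.
cazyme_contigs = {"GH": 1943, "CBM": 23, "GT": 373, "CE": 259, "PE": 16}
total_contigs = 137270  # all assembled contigs reported

# Sum of the five enzyme classes; the abstract states 2614.
enzyme_total = sum(cazyme_contigs.values())

# Share of each class among enzyme-encoding contigs, in percent.
shares = {k: 100 * v / enzyme_total for k, v in cazyme_contigs.items()}

# Fraction of all assembled contigs that encode these enzymes, in percent.
enzyme_fraction = 100 * enzyme_total / total_contigs

print(enzyme_total, round(shares["GH"], 1), round(enzyme_fraction, 1))
```

The five classes add up to the reported 2614 contigs, so each contig appears to have been counted in exactly one class; glycoside hydrolases make up roughly three quarters of them.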
Transcriptome Assembly, Gene Annotation and Tissue Gene Expression Atlas of the Rainbow Trout

PubMed Central

Salem, Mohamed; Paneru, Bam; Al-Tobasei, Rafet; Abdouni, Fatima; Thorgaard, Gary H.; Rexroad, Caird E.; Yao, Jianbo

2015-01-01

Efforts to obtain a comprehensive genome sequence for rainbow trout are ongoing and will be complemented by transcriptome information that will enhance genome assembly and annotation. Previously, transcriptome reference sequences were reported using data from different sources. Although the previous work added a great wealth of sequences, a complete and well-annotated transcriptome is still needed. In addition, gene expression in different tissues was not completely addressed in the previous studies. In this study, non-normalized cDNA libraries were sequenced from 13 different tissues of a single doubled haploid rainbow trout from the same source used for the rainbow trout genome sequence. A total of ~1.167 billion paired-end reads were de novo assembled using the Trinity RNA-Seq assembler yielding 474,524 contigs > 500 base-pairs. Of them, 287,593 had homologies to the NCBI non-redundant protein database. The longest contig of each cluster was selected as a reference, yielding 44,990 representative contigs. A total of 4,146 contigs (9.2%), including 710 full-length sequences, did not match any mRNA sequences in the current rainbow trout genome reference. Mapping reads to the reference genome identified an additional 11,843 transcripts not annotated in the genome. A digital gene expression atlas revealed 7,678 housekeeping and 4,021 tissue-specific genes. Expression of about 16,000–32,000 genes (35–71% of the identified genes) accounted for basic and specialized functions of each tissue. White muscle and stomach had the least complex transcriptomes, with high percentages of their total mRNA contributed by a small number of genes. Brain, testis and intestine, in contrast, had complex transcriptomes, with large numbers of genes involved in their expression patterns. This study provides comprehensive de novo transcriptome information that is suitable for functional and comparative genomics studies in rainbow trout, including annotation of the genome. PMID:25793877
Short read Illumina data for the de novo assembly of a non-model snail species transcriptome (Radix balthica, Basommatophora, Pulmonata), and a comparison of assembler performance

PubMed Central

2011-01-01

Background Until recently, read lengths on the Solexa/Illumina system were too short to reliably assemble transcriptomes without a reference sequence, especially for non-model organisms. However, with read lengths up to 100 nucleotides available in the current version, an assembly without reference genome should be possible. For this study we created an EST data set for the common pond snail Radix balthica by Illumina sequencing of a normalized transcriptome. Performance of three different short read assemblers was compared with respect to: the number of contigs, their length, depth of coverage, their quality in various BLAST searches and the alignment to mitochondrial genes. Results A single sequencing run of a normalized RNA pool resulted in 16,923,850 paired end reads with median read length of 61 bases. The assemblies generated by VELVET, OASES, and SeqMan NGEN differed in the total number of contigs, contig length, the number and quality of gene hits obtained by BLAST searches against various databases, and contig performance in the mt genome comparison. While VELVET produced the highest overall number of contigs, a large fraction of these were of small size (< 200bp), and gave redundant hits in BLAST searches and the mt genome alignment. The best overall contig performance resulted from the NGEN assembly. It produced the second largest number of contigs, which on average were comparable to the OASES contigs but gave the highest number of gene hits in two out of four BLAST searches against different reference databases. A subsequent meta-assembly of the four contig sets resulted in larger contigs, less redundancy and a higher number of BLAST hits. Conclusion Our results document the first de novo transcriptome assembly of a non-model species using Illumina sequencing data. 
We show that de novo transcriptome assembly using this approach yields results useful for downstream applications, in particular if a meta-assembly of contig sets is used to increase contig quality. These results highlight the ongoing need for improvements in assembly methodology. PMID:21679424

Viral Predation and Host Immunity Structure Microbial Communities in a Terrestrial Deep Subsurface, Hydraulically Fractured Shale System

NASA Astrophysics Data System (ADS)

Daly, R. A.; Mouser, P. J.; Trexler, R.; Wrighton, K. C.

2014-12-01

Despite a growing appreciation for the ecological role of viruses in marine and gut systems, little is known about their role in the terrestrial deep (> 2000 m) subsurface. We used assembly-based metagenomics to examine the viral component in fluids from hydraulically fractured Marcellus shale gas wells. Here we reconstructed microbial and viral genomes from samples collected 7, 82, and 328 days post fracturing. Viruses accounted for 4.14%, 0.92% and 0.59% of the sample reads that mapped to the assembly. We identified 6 complete, circularized viral genomes and an additional 92 viral contigs > 5 kb with a maximum contig size of 73.6 kb. A BLAST comparison to NCBI viral genomes revealed that 85% of viral contigs had significant hits to the viral order Caudovirales, with 43% of sequences belonging to the family Siphoviridae, 38% to Myoviridae, and 12% to Podoviridae. Enrichment of Caudovirales viruses was supported by a large number of predicted proteins characteristic of tailed viruses including terminases (TerL), tape measure, tail formation, and baseplate related proteins. The viral contigs included evidence of lytic and temperate lifestyles, with the 7 day sample having the greatest number of detected lytic viruses. Notably in this sample, the most abundant virus was lytic and its inferred host, a member of the Vibrionaceae, was not detected at later time points. Analyses of CRISPR sequences (a viral and foreign DNA immune system in bacteria and archaea), linked 18 viral contigs to hosts. CRISPR linkages increased through time and all bacterial and archaeal genomes recovered in the final time point had genes for CRISPR-mediated viral defense. The majority of CRISPR sequences linked phage genomes to several Halanaerobium strains, which are the dominant and persisting members of the community inferred to be responsible for carbon and sulfur cycling in these shales. 
Network analysis revealed that several viruses were present in the 82 and 328 day samples; this viral persistence is consistent with concomitant temporal stability in geochemistry and microbial community composition. Our findings suggest that after a disturbance (hydraulic fracturing) viral predation and host immunity is an important controller of microbial community structure, metabolism, and thus biogeochemical cycling in the deep subsurface.
Screening white spot syndrome virus (WSSV)-resistant molecular markers from Fenneropenaeus chinensis

NASA Astrophysics Data System (ADS)

Wu, Yingying; Meng, Xianhong; Kong, Jie; Luan, Sheng; Luo, Kun; Wang, Qingyin; Zheng, Yongyun

2017-02-01

White spot syndrome virus (WSSV)-resistant molecular markers were screened from the selectively bred new variety 'Huanghai No. 2' of Fenneropenaeus chinensis using the unlabeled-probe high-resolution melting (HRM) technique. After artificial infection with WSSV, the first 96 dead shrimps and the last 96 surviving shrimps were collected, representing WSSV-susceptible and -resistant populations, respectively. The genotypes at 39 well-developed single nucleotide polymorphism (SNP) loci were obtained. As revealed by the Chi-square test, 3 SNPs, genotype A/A of contig C364-89AT, genotype A/A of C2635-527CA and genotype C/T of contig C12355-592CT, were positively correlated with disease-resistance traits. Two other SNPs, genotype G/G of contig C283-145AG and genotype C/C of contig C12355-592CT, were negatively correlated. Moreover, analysis with the BlastX program for disease-resistant SNPs indicated that 3 contigs, Contig283, Contig364 and Contig12355, matched the functional genes of effector caspase of Penaeus monodon, peptide transporter family 1-like protein, and 40S ribosomal protein S2 of Perca flavescens with high sequence similarity. The results will help provide theoretical and technical support for molecular marker-assisted selective breeding of F. chinensis.
Using in situ hybridization and PFGE Southern hybridization to detect translocation breakpoints in a BOR/TRPS patient cell line

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gu, J.Z.; Sapru, M.; Smith, D.

Branchio-oto-renal syndrome (BOR) is an autosomal dominant disorder characterized by ear malformations, cervical fistulae, hearing loss and renal abnormalities. We have integrated the Genethon YAC contig maps with additional markers in the chromosome 8q region genetically linked by a unique patient cell line. This cell line is from a patient who has both the branchio-oto-renal syndrome and tricho-rhino-phalangeal syndrome (TRPS). High resolution cytogenetics demonstrated a direct insertion of materials from 8q13.3q21.13 to 8q24.11. TRPS has been previously linked to deletions involving 8q24.11-q24.13. The rearrangement in this patient suggests that TRPS results from loss of gene function due to insertion at the 8q24.11 breakpoint and the possible location for the BOR gene is at either of the two breakpoints of 8q13.3 and 8q21.13. We have constructed cosmid contigs in 8q24.11. In situ hybridization with cosmids mapped to these locations as probes has helped to narrow down the breakpoints. Combinations of cosmids on either side or overlapping the 8q24.11 breakpoint show split signals on one chromosome 8q arm due to insertion of the materials from the proximal region. Cosmids mapped to the TRPS deletion region have been used to hybridize to pulsed field gel genomic blots of DNA from the patient cell line and detected rearranged genomic fragments. Both in situ hybridization and genomic PFGE Southern blot will be used to precisely locate the breakpoints.
Comparative Transcriptomic Approaches Exploring Contamination Stress Tolerance in Salix sp. Reveal the Importance for a Metaorganismal de Novo Assembly Approach for Nonmodel Plants

PubMed Central

Brereton, Nicholas J. B.; Marleau, Julie; Nissim, Werther Guidi; Labrecque, Michel; Joly, Simon; Pitre, Frederic E.

2016-01-01

Metatranscriptomic study of nonmodel organisms requires strategies that retain the highly resolved genetic information generated from model organisms while allowing for identification of the unexpected. A real-world biological application of phytoremediation, the field growth of 10 Salix cultivars on polluted soils, was used as an exemplar nonmodel and multifaceted crop response well-disposed to the study of gene expression. Sequence reads were assembled de novo to create 10 independent transcriptomes, a global transcriptome, and were mapped against the Salix purpurea 94006 reference genome. Annotation of assembled contigs was performed without a priori assumption of the originating organism. Global transcriptome construction from 3.03 billion paired-end reads revealed 606,880 unique contigs annotated from 1588 species, often common in all 10 cultivars. Comparisons between transcriptomic and metatranscriptomic methodologies provide clear evidence that nonnative RNA can mistakenly map to reference genomes, especially to conserved regions of common housekeeping genes, such as actin, α/β-tubulin, and elongation factor 1-α. In Salix, Rubisco activase transcripts were down-regulated in contaminated trees across all 10 cultivars, whereas thiamine thiazole synthase and CP12, a Calvin Cycle master regulator, were uniformly up-regulated. De novo assembly approaches, with unconstrained annotation, can improve data quality; care should be taken when exploring such plant genetics to reduce de facto data exclusion by mapping to a single reference genome alone. Salix gene expression patterns strongly suggest cultivar-wide alteration of specific photosynthetic apparatus and protection of the antenna complexes from oxidation damage in contaminated trees, providing an insight into common stress tolerance strategies in a real-world phytoremediation system. PMID:27002060
Refined physical map of the human PAX2/HOX11/NFKB2 cancer gene region at 10q24 and relocalization of the HPV6AI1 viral integration site to 14q13.3-q21.1

PubMed Central

Gough, Sheryl M; McDonald, Margaret; Chen, Xiao-Ning; Korenberg, Julie R; Neri, Antonino; Kahn, Tomas; Eccles, Michael R; Morris, Christine M

2003-01-01

Background Chromosome band 10q24 is a gene-rich domain and host to a number of cancer, developmental, and neurological genes. Recurring translocations, deletions and mutations involving this chromosome band have been observed in different human cancers and other disease conditions, but the precise identification of breakpoint sites, and detailed characterization of the genetic basis and mechanisms which underlie many of these rearrangements has yet to be resolved. Towards this end it is vital to establish a definitive genetic map of this region, which to date has shown considerable volatility through time in published works of scientific journals, within different builds of the same international genomic database, and across the differently constructed databases. Results Using a combination of chromosome and interphase fluorescent in situ hybridization (FISH), BAC end-sequencing and genomic database analysis we present a physical map showing that the order and chromosomal orientation of selected genes within 10q24 is CEN-CYP2C9-PAX2-HOX11-NFKB2-TEL. Our analysis has resolved the orientation of an otherwise dynamically evolving assembly of larger contigs upstream of this region, and in so doing verifies the order and orientation of a further 9 cancer-related genes and GOT1. This study further shows that the previously reported human papillomavirus type 6a DNA integration site HPV6AI1 does not map to 10q24, but that it maps at the interface of chromosome bands 14q13.3-q21.1. Conclusions This revised map will allow more precise localization of chromosome rearrangements involving chromosome band 10q24, and will serve as a useful baseline to better understand the molecular aetiology of chromosomal instability in this region. 
In particular, the relocation of HPV6AI1 is important to report because this HPV6a integration site, originally isolated from a tonsillar carcinoma, was shown to be rearranged in other HPV6a-related malignancies, including 2 of 25 genital condylomas, and 2 of 7 head and neck tumors tested. Our finding shifts the focus of this genomic interest from 10q24 to the chromosome 14 site. PMID:12697057
New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits

PubMed Central

2011-01-01

Background Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and to isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). Like in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. 
Conclusions The construction of the first switchgrass BAC library and comparative analysis of homoeologous regions harboring OsBRI1 orthologs present a glimpse into the switchgrass genome structure and complexity. Data obtained demonstrate the feasibility of using HICF fingerprinting to resolve the homoeologous chromosomes of the two distinct genomes in switchgrass, providing a robust and accurate BAC-based physical platform for this species. The genomic resources and sequence data generated will lay the foundation for deciphering the switchgrass genome and lead the way for an accurate genome sequencing strategy. PMID:21767393
New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits.

PubMed

Saski, Christopher A; Li, Zhigang; Feltus, Frank A; Luo, Hong

2011-07-18

Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and to isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). Like in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. 
The construction of the first switchgrass BAC library and comparative analysis of homoeologous regions harboring OsBRI1 orthologs present a glimpse into the switchgrass genome structure and complexity. Data obtained demonstrate the feasibility of using HICF fingerprinting to resolve the homoeologous chromosomes of the two distinct genomes in switchgrass, providing a robust and accurate BAC-based physical platform for this species. The genomic resources and sequence data generated will lay the foundation for deciphering the switchgrass genome and lead the way for an accurate genome sequencing strategy.
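The library coverage quoted in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: the function name and the choice to discount clones without inserts are assumptions, not the authors' stated method. It uses the figures given above (147,456 clones, 120 kb average insert, 95% of clones with inserts, ~1.6 Gb effective genome).

```python
def library_coverage(n_clones, mean_insert_kb, genome_mb, insert_fraction=1.0):
    """Return the number of genome equivalents represented in a clone library.

    Hypothetical helper: total cloned DNA (in Mb) divided by genome size (in Mb).
    """
    total_insert_mb = n_clones * insert_fraction * mean_insert_kb / 1000.0
    return total_insert_mb / genome_mb


# Figures taken from the abstract; 1600 Mb is the "effective" genome size.
coverage = library_coverage(147_456, 120, 1600, insert_fraction=0.95)
print(f"{coverage:.1f}x genome equivalents")
```

Under these assumptions the result is consistent with the "approximately 10 times" coverage reported in the abstract.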
P-Element Insertion Alleles of Essential Genes on the Third Chromosome of Drosophila Melanogaster: Correlation of Physical and Cytogenetic Maps in Chromosomal Region 86e-87f

PubMed Central

Deak, P.; Omar, M. M.; Saunders, RDC.; Pal, M.; Komonyi, O.; Szidonya, J.; Maroy, P.; Zhang, Y.; Ashburner, M.; Benos, P.; Savakis, C.; Siden-Kiamos, I.; Louis, C.; Bolshakov, V. N.; Kafatos, F. C.; Madueno, E.; Modolell, J.; Glover, D. M.

1997-01-01

We have established a collection of 2460 lethal or semi-lethal mutant lines using a procedure thought to insert single P elements into vital genes on the third chromosome of Drosophila melanogaster. More than 1200 randomly selected lines were examined by in situ hybridization and 90% found to contain single insertions at sites that mark 89% of all lettered subdivisions of the Bridges' map. A set of chromosomal deficiencies that collectively uncover ~25% of the euchromatin of chromosome 3 reveal lethal mutations in 468 lines corresponding to 145 complementation groups. We undertook a detailed analysis of the cytogenetic interval 86E-87F and identified 87 P-element-induced mutations falling into 38 complementation groups, 16 of which correspond to previously known genes. Twenty-one of these 38 complementation groups have at least one allele that has a P-element insertion at a position consistent with the cytogenetics of the locus. We have rescued P elements and flanking chromosomal sequences from the 86E-87F region in 35 lines with either lethal or genetically silent P insertions, and used these as probes to identify cosmids and P1 clones from the Drosophila genome projects. This has tied together the physical and genetic maps and has linked 44 previously identified cosmid contigs into seven "supercontigs" that span the interval. STS data for sequences flanking one side of the P-element insertions in 49 lines has identified insertions in the αγ element at 87C, two known transposable elements, and the open reading frames of seven putative single copy genes. These correspond to five known genes in this interval, and two genes identified by the homology of their predicted products to known proteins from other organisms. PMID:9409831
A 1463 Gene Cattle–Human Comparative Map With Anchor Points Defined by Human Genome Sequence Coordinates

PubMed Central

Everts-van der Wind, Annelie; Kata, Srinivas R.; Band, Mark R.; Rebeiz, Mark; Larkin, Denis M.; Everts, Robin E.; Green, Cheryl A.; Liu, Lei; Natarajan, Shreedhar; Goldammer, Tom; Lee, Jun Heon; McKay, Stephanie; Womack, James E.; Lewin, Harris A.

2004-01-01

A second-generation 5000 rad radiation hybrid (RH) map of the cattle genome was constructed primarily using cattle ESTs that were targeted to gaps in the existing cattle–human comparative map, as well as to sparsely populated map intervals. A total of 870 targeted markers were added, bringing the number of markers mapped on the RH5000 panel to 1913. Of these, 1463 have significant BLASTN hits (E < e–5) against the human genome sequence. A cattle–human comparative map was created using human genome sequence coordinates of the paired orthologs. One-hundred and ninety-five conserved segments (defined by two or more genes) were identified between the cattle and human genomes, of which 31 are newly discovered and 34 were extended singletons on the first-generation map. The new map represents an improvement of 20% genome-wide comparative coverage compared with the first-generation map. Analysis of gene content within human genome regions where there are gaps in the comparative map revealed gaps with both significantly greater and significantly lower gene content. The new, more detailed cattle–human comparative map provides an improved resource for the analysis of mammalian chromosome evolution, the identification of candidate genes for economically important traits, and for proper alignment of sequence contigs on cattle chromosomes. PMID:15231756
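The abstract defines a conserved segment as two or more genes. A minimal sketch of that counting rule follows, assuming markers are already ordered along a cattle chromosome and each carries the human chromosome of its ortholog. The function name, the data layout, and the gene labels are hypothetical, and real comparative mapping would also check human sequence coordinates, which this simplification ignores.

```python
from itertools import groupby


def conserved_segments(markers, min_genes=2):
    """Group consecutive markers that share a human chromosome.

    markers: list of (gene, human_chromosome) tuples in cattle map order.
    Runs shorter than min_genes are treated as singletons and dropped.
    """
    segments = []
    for human_chr, run in groupby(markers, key=lambda m: m[1]):
        genes = [gene for gene, _ in run]
        if len(genes) >= min_genes:
            segments.append((human_chr, genes))
    return segments


# Hypothetical marker order on one cattle chromosome.
example = [("g1", "HSA1"), ("g2", "HSA1"), ("g3", "HSA5"),
           ("g4", "HSA1"), ("g5", "HSA1"), ("g6", "HSA1")]
segments = conserved_segments(example)
print(segments)
```

In this toy input the single HSA5 marker is a singleton, so it splits the HSA1 markers into two separate segments rather than forming a segment of its own.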
Characterisation of the Nevoid basal cell carcinoma (Gorlin's) syndrome (NBCCS) gene region on chromosome 9q22-q31

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morris, D.J.; Digweed, M.; Sperling, K.

1994-09-01

Nevoid basal cell carcinoma syndrome (NBCCS) is an autosomal dominantly inherited malignancy-associated disease of unknown etiology. The gene has been mapped to chromosome 9q22-q31 by us and other groups, using linkage analysis and loss of heterozygosity studies. Subsequent linkage and haplotype analyses from 133 meioses in NBCCS families have refined the position of the gene to between D9S12 and D9S287. Since the gene for Fanconi's Anaemia type C (FACC) has been assigned to the same 9q region, we have performed linkage analysis between FACC and NBCCS in NBCCS families. No recombination has been observed between NBCCS and FACC and maximum lod scores of 34.98 and 11.94 occur for both diseases at the markers D9S196/D9S197. Southern blot analysis using an FACC cDNA probe has revealed no detectable rearrangements in our NBCCS patients. We have established a YAC contig spanning the region from D9S12 to D9S176 and STS content mapping in 22 YACs has allowed the ordering of 12 loci in the region, including the xeroderma pigmentosum type A (XPAC) gene, as follows: D9S151/D9S12P1 - D9S12P2 - D9S197 - D9S196 - D9S280 - FACC - D9S287/XPAC - D9S180 - D9S6 - D9S176. Using the contig we have been able to eliminate the α1 type XV collagen gene and the markers D9S119 and D9S297 from the NBCCS candidate region. Twelve YACs have been used to screen a chromosome 9 cosmid library and more than 1000 cosmids from the region have been identified to be used for the construction of a cosmid contig. A selection of these cosmids will be used for the isolation of coding sequences from the region.
Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roux, Simon; Emerson, Joanne B.; Eloe-Fadrosh, Emiley A.

Background Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we used in silico mock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates. Results Tools designed specifically for metagenomes, namely metaSPAdes, MEGAHIT, and IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2–5 × coverage typically assembled well, whereas lesser coverage led to fragmented assembly. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following thresholds for detection were applied: (i) ≥10 kb contig lengths to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of contig length with ≥1 × coverage. Finally, although data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but more divergent (up to 80%) when sequencing depth was uneven across the dataset. 
In the latter cases, the use of normalization methods specifically developed for metagenomes provided the best estimates. Conclusions These simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development to better access RNA, rare, and/or diverse viral populations and improved reference viral genome availability will alleviate many of viromics' remaining limitations.
Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity

DOE PAGES

Roux, Simon; Emerson, Joanne B.; Eloe-Fadrosh, Emiley A.; ...

2017-09-21

Background Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we used in silico mock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates. Results Tools designed specifically for metagenomes, namely metaSPAdes, MEGAHIT, and IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2–5 × coverage typically assembled well, whereas lesser coverage led to fragmented assembly. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following thresholds for detection were applied: (i) ≥10 kb contig lengths to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of contig length with ≥1 × coverage. Finally, although data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but more divergent (up to 80%) when sequencing depth was uneven across the dataset. 
In the latter cases, the use of normalization methods specifically developed for metagenomes provided the best estimates. Conclusions These simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development to better access RNA, rare, and/or diverse viral populations and improved reference viral genome availability will alleviate many of viromics' remaining limitations.
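The three detection thresholds listed in the abstract (contig length ≥10 kb, reads mapping at ≥90% identity, ≥75% of the contig covered at ≥1×) can be combined into a single check. The following is a minimal sketch under stated assumptions: the function names and the alignment tuple layout are hypothetical, and it is not the pipeline used in the study.

```python
def covered_bases(alignments, min_identity=0.90):
    """Count contig positions covered by at least one qualifying read.

    alignments: iterable of (start, end, identity), 0-based half-open intervals.
    Reads below min_identity are ignored; overlapping intervals are merged.
    """
    intervals = sorted((s, e) for s, e, ident in alignments if ident >= min_identity)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


def is_detected(contig_length, alignments,
                min_length=10_000, min_identity=0.90, min_covered_fraction=0.75):
    """Apply the three thresholds quoted in the abstract to one contig."""
    if contig_length < min_length:
        return False
    fraction = covered_bases(alignments, min_identity) / contig_length
    return fraction >= min_covered_fraction


# Hypothetical read alignments on a 12 kb contig; the last read fails the identity cut-off.
reads = [(0, 5000, 0.95), (4000, 9500, 0.92), (9500, 12000, 0.85)]
print(is_detected(12_000, reads))
```

Merging intervals before counting avoids double-counting positions covered by overlapping reads, which matters because the threshold is on breadth of coverage rather than depth.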
Automated SNP detection from a large collection of white spruce expressed sequences: contributing factors and approaches for the categorization of SNPs

PubMed Central

Pavy, Nathalie; Parsons, Lee S; Paule, Charles; MacKay, John; Bousquet, Jean

2006-01-01

Background High-throughput genotyping technologies represent a highly efficient way to accelerate genetic mapping and enable association studies. As a first step toward this goal, we aimed to develop a resource of candidate Single Nucleotide Polymorphisms (SNP) in white spruce (Picea glauca [Moench] Voss), a softwood tree of major economic importance. Results A white spruce SNP resource encompassing 12,264 SNPs was constructed from a set of 6,459 contigs derived from Expressed Sequence Tags (EST) and by using the bayesian-based statistical software PolyBayes. Several parameters influencing the SNP prediction were analysed including the a priori expected polymorphism, the probability score (PSNP), and the contig depth and length. SNP detection in 3' and 5' reads from the same clones revealed a level of inconsistency between overlapping sequences as low as 1%. A subset of 245 predicted SNPs were verified through the independent resequencing of genomic DNA of a genotype also used to prepare cDNA libraries. The validation rate reached a maximum of 85% for SNPs predicted with either PSNP ≥ 0.95 or ≥ 0.99. A total of 9,310 SNPs were detected by using PSNP ≥ 0.95 as a criterion. The SNPs were distributed among 3,590 contigs encompassing an array of broad functional categories, with an overall frequency of 1 SNP per 700 nucleotide sites. Experimental and statistical approaches were used to evaluate the proportion of paralogous SNPs, with estimates in the range of 8 to 12%. The 3,789 coding SNPs identified through coding region annotation and ORF prediction, were distributed into 39% nonsynonymous and 61% synonymous substitutions. Overall, there were 0.9 SNP per 1,000 nonsynonymous sites and 5.2 SNPs per 1,000 synonymous sites, for a genome-wide nonsynonymous to synonymous substitution rate ratio (Ka/Ks) of 0.17. 
Conclusion We integrated the SNP data in the ForestTreeDB database along with functional annotations to provide a tool facilitating the choice of candidate genes for mapping purposes or association studies. PMID:16824208
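The Ka/Ks value reported in the abstract follows directly from the two per-site SNP densities. A short worked check is given below; the function name is a hypothetical label for illustration, and only the figures quoted above are used.

```python
def rate_ratio(nonsyn_per_1000_sites, syn_per_1000_sites):
    """Ratio of nonsynonymous to synonymous SNP densities (Ka/Ks)."""
    return nonsyn_per_1000_sites / syn_per_1000_sites


# 0.9 SNP per 1,000 nonsynonymous sites and 5.2 SNPs per 1,000 synonymous sites.
ka_ks = rate_ratio(0.9, 5.2)
print(round(ka_ks, 2))
```

A ratio well below 1 indicates that nonsynonymous changes are observed less often per site than synonymous ones.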
Barley whole exome capture: a tool for genomic research in the genus Hordeum and beyond

PubMed Central

Mascher, Martin; Richmond, Todd A; Gerhardt, Daniel J; Himmelbach, Axel; Clissold, Leah; Sampath, Dharanya; Ayling, Sarah; Steuernagel, Burkhard; Pfeifer, Matthias; D'Ascenzo, Mark; Akhunov, Eduard D; Hedley, Pete E; Gonzales, Ana M; Morrell, Peter L; Kilian, Benjamin; Blattner, Frank R; Scholz, Uwe; Mayer, Klaus FX; Flavell, Andrew J; Muehlbauer, Gary J; Waugh, Robbie; Jeddeloh, Jeffrey A; Stein, Nils

2013-01-01

Advanced resources for genome-assisted research in barley (Hordeum vulgare) including a whole-genome shotgun assembly and an integrated physical map have recently become available. These have made possible studies that aim to assess genetic diversity or to isolate single genes by whole-genome resequencing and in silico variant detection. However, such an approach remains expensive given the 5 Gb size of the barley genome. Targeted sequencing of the mRNA-coding exome reduces barley genomic complexity more than 50-fold, thus dramatically reducing this heavy sequencing and analysis load. We have developed and employed an in-solution hybridization-based sequence capture platform to selectively enrich for a 61.6 megabase coding sequence target that includes predicted genes from the genome assembly of the cultivar Morex as well as publicly available full-length cDNAs and de novo assembled RNA-Seq consensus sequence contigs. The platform provides a highly specific capture with substantial and reproducible enrichment of targeted exons, both for cultivated barley and related species. We show that this exome capture platform provides a clear path towards a broader and deeper understanding of the natural variation residing in the mRNA-coding part of the barley genome and will thus constitute a valuable resource for applications such as mapping-by-sequencing and genetic diversity analyses. PMID:23889683
Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana.

PubMed

Lin, X; Kaul, S; Rounsley, S; Shea, T P; Benito, M I; Town, C D; Fujii, C Y; Mason, T; Bowman, C L; Barnstead, M; Feldblyum, T V; Buell, C R; Ketchum, K A; Lee, J; Ronning, C M; Koo, H L; Moffat, K S; Cronin, L A; Shen, M; Pai, G; Van Aken, S; Umayam, L; Tallon, L J; Gill, J E; Adams, M D; Carrera, A J; Creasy, T H; Goodman, H M; Somerville, C R; Copenhaver, G P; Preuss, D; Nierman, W C; White, O; Eisen, J A; Salzberg, S L; Fraser, C M; Venter, J C

1999-12-16

Arabidopsis thaliana (Arabidopsis) is unique among plant model organisms in having a small genome (130-140 Mb), excellent physical and genetic maps, and little repetitive DNA. Here we report the sequence of chromosome 2 from the Columbia ecotype in two gap-free assemblies (contigs) of 3.6 and 16 megabases (Mb). The latter represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date. Chromosome 2 represents 15% of the genome and encodes 4,037 genes, 49% of which have no predicted function. Roughly 250 tandem gene duplications were found in addition to large-scale duplications of about 0.5 and 4.5 Mb between chromosomes 2 and 1 and between chromosomes 2 and 4, respectively. Sequencing of nearly 2 Mb within the genetically defined centromere revealed a low density of recognizable genes, and a high density and diverse range of vestigial and presumably inactive mobile elements. More unexpected is what appears to be a recent insertion of a continuous stretch of 75% of the mitochondrial genome into chromosome 2.
Semi-Automatic In Silico Gap Closure Enabled De Novo Assembly of Two Dehalobacter Genomes from Metagenomic Data

PubMed Central

Tang, Shuiquan; Gong, Yunchen; Edwards, Elizabeth A.

2012-01-01

Typically, the assembly and closure of a complete bacterial genome requires substantial additional effort spent in a wet lab for gap resolution and genome polishing. Assembly is further confounded by subspecies polymorphism when starting from metagenome sequence data. In this paper, we describe an in silico gap-resolution strategy that can substantially improve assembly. This strategy resolves assembly gaps in scaffolds using pre-assembled contigs, followed by verification with read mapping. It is capable of resolving assembly gaps caused by repetitive elements and subspecies polymorphisms. Using this strategy, we realized the de novo assembly of the first two Dehalobacter genomes from the metagenomes of two anaerobic mixed microbial cultures capable of reductive dechlorination of chlorinated ethanes and chloroform. Only four additional PCR reactions were required even though the initial assembly with Newbler v. 2.5 produced 101 contigs within 9 scaffolds belonging to two Dehalobacter strains. By applying this strategy to the re-assembly of a recently published genome of Bacteroides, we demonstrate its potential utility for other sequencing projects, both metagenomic and genomic. PMID:23284863
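The core idea of resolving a scaffold gap with a pre-assembled contig can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes exact string matches of the gap-flanking sequences, ignores reverse complements, and omits the read-mapping verification step described in the abstract. The function name and sequences are hypothetical.

```python
def fill_gap(left_flank, right_flank, contigs):
    """Return the sequence bridging two gap flanks, or None if no contig spans them.

    A contig spans the gap if it contains left_flank followed later by right_flank.
    """
    for contig in contigs:
        i = contig.find(left_flank)
        if i == -1:
            continue
        gap_start = i + len(left_flank)
        j = contig.find(right_flank, gap_start)
        if j == -1:
            continue
        return contig[gap_start:j]
    return None


# Hypothetical toy sequences: the first contig contains both flanks.
contigs = ["TTACGTACGGATTACAGGCCTTAA", "GGGGCCCC"]
filler = fill_gap("ACGTACG", "GGCCTT", contigs)
print(filler)
```

In practice a candidate filler would still need to be confirmed, for example by checking that reads map across both junctions, as the abstract notes.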
Fine mapping and identification of a candidate gene for the barley Un8 true loose smut resistance gene.

PubMed

Zang, Wen; Eckstein, Peter E; Colin, Mark; Voth, Doug; Himmelbach, Axel; Beier, Sebastian; Stein, Nils; Scoles, Graham J; Beattie, Aaron D

2015-07-01

The candidate gene for the barley Un8 true loose smut resistance gene encodes a deduced protein containing two tandem protein kinase domains. In North America, durable resistance against all known isolates of barley true loose smut, caused by the basidiomycete pathogen Ustilago nuda (Jens.) Rostr. (U. nuda), is under the control of the Un8 resistance gene. Previous genetic studies mapped Un8 to the long arm of chromosome 5 (1HL). Here, a population of 4625 lines segregating for Un8 was used to delimit the Un8 gene to a 0.108 cM interval on chromosome arm 1HL, and assign it to fingerprinted contig 546 of the barley physical map. The minimal tiling path was identified for the Un8 locus using two flanking markers and consisted of two overlapping bacterial artificial chromosomes. One gene located close to a marker co-segregating with Un8 showed high sequence identity to a disease resistance gene containing two kinase domains. Sequence of the candidate gene from the parents of the segregating population, and in an additional 19 barley lines representing a broader spectrum of diversity, showed there was no intron in alleles present in either resistant or susceptible lines, and fifteen amino acid variations unique to the deduced protein sequence in resistant lines differentiated it from the deduced protein sequences in susceptible lines. Some of these variations were present within putative functional domains which may cause a loss of function in the deduced protein sequences within susceptible lines.
YAC cloning Mus musculus telomeric DNA: physical, genetic, in situ and STS markers for the distal telomere of chromosome 10.

PubMed

Kipling, D; Wilson, H E; Thomson, E J; Cooke, H J

1995-06-01

Three Mus musculus DBA/2 YAC libraries were constructed using a half-YAC telomere cloning vector. This functional complementation approach yields libraries which include terminal restriction fragments of the mouse genome. Screening all three libraries led to the isolation of 32 independent clones which carry linear YACs containing the mouse terminal repeat sequence, (TTAGGG)n. These YACs provide a resource to isolate regions of the mouse genome close to chromosome termini and excluded from existing conventional YAC libraries. To demonstrate their utility, a hybridization probe was isolated from Mtel-1, the first (TTAGGG)n-containing YAC isolated. This probe detects an approximately 70 kb KpnI fragment in the mouse genome which is sensitive to pretreatment with BAL31 exonuclease. A PCR-based genetic marker generated from the sequence of this probe maps 4.4 cM from the most distal anchor locus on chromosome 10 in the EUCIB interspecific backcross. STS primers for this locus, D10Hgu1, were used to isolate YAC 110F4 from a commercially available mouse YAC library. Fluorescence in situ hybridization demonstrates that YAC 110F4 hybridizes to the distal telomere of chromosome 10. Clones in this collection of telomere YACs therefore partially overlap clones in conventional YAC libraries, and thus the previously unavailable terminal regions of the mouse genome can now be linked with the developing mouse STS YAC contig. Genetic markers such as D10Hgu1 allow the ends of the mouse genetic map to be defined, thus closing the map.
The Peculiar Landscape of Repetitive Sequences in the Olive (Olea europaea L.) Genome

PubMed Central

Barghini, Elena; Natali, Lucia; Cossu, Rosa Maria; Giordani, Tommaso; Pindo, Massimo; Cattonaro, Federica; Scalabrin, Simone; Velasco, Riccardo; Morgante, Michele; Cavallini, Andrea

2014-01-01

Analyzing genome structure in different species provides insight into the evolution of plant genome size. Olive (Olea europaea L.) has a medium-sized haploid genome of 1.4 Gb, whose structure is largely uncharacterized, despite the growing importance of this tree as an oil crop. Next-generation sequencing technologies and different computational procedures have been used to study the composition of the olive genome and its repetitive fraction. A total of 2.03 and 2.3 genome equivalents of Illumina and 454 reads from genomic DNA, respectively, were assembled following different procedures, which produced more than 200,000 differently redundant contigs, with a mean length greater than 1,000 nt. Mapping Illumina reads onto the assembled sequences was used to estimate their redundancy. The genome data set was subdivided into highly and medium redundant and nonredundant contigs. By combining identification and mapping of repeated sequences, it was established that tandem repeats represent a very large portion of the olive genome (∼31% of the whole genome), consisting of six main families of different length, two of which were first discovered in these experiments. The other large redundant class in the olive genome is represented by transposable elements (especially long terminal repeat-retrotransposons). On the whole, the results of our analyses show the peculiar landscape of the olive genome, related to the massive amplification of tandem repeats, more than that reported for any other sequenced plant genome. PMID:24671744
The peculiar landscape of repetitive sequences in the olive (Olea europaea L.) genome.

PubMed

Barghini, Elena; Natali, Lucia; Cossu, Rosa Maria; Giordani, Tommaso; Pindo, Massimo; Cattonaro, Federica; Scalabrin, Simone; Velasco, Riccardo; Morgante, Michele; Cavallini, Andrea

2014-04-01

Analyzing genome structure in different species provides insight into the evolution of plant genome size. Olive (Olea europaea L.) has a medium-sized haploid genome of 1.4 Gb, whose structure is largely uncharacterized, despite the growing importance of this tree as an oil crop. Next-generation sequencing technologies and different computational procedures have been used to study the composition of the olive genome and its repetitive fraction. A total of 2.03 and 2.3 genome equivalents of Illumina and 454 reads from genomic DNA, respectively, were assembled following different procedures, which produced more than 200,000 differently redundant contigs, with a mean length greater than 1,000 nt. Mapping Illumina reads onto the assembled sequences was used to estimate their redundancy. The genome data set was subdivided into highly and medium redundant and nonredundant contigs. By combining identification and mapping of repeated sequences, it was established that tandem repeats represent a very large portion of the olive genome (∼31% of the whole genome), consisting of six main families of different length, two of which were first discovered in these experiments. The other large redundant class in the olive genome is represented by transposable elements (especially long terminal repeat-retrotransposons). On the whole, the results of our analyses show the peculiar landscape of the olive genome, related to the massive amplification of tandem repeats, more than that reported for any other sequenced plant genome.

Assembled sequence contigs by SOAPdenovo and Velvet algorithms from metagenomic short reads of a new bacterial isolate of gut origin

USDA-ARS's Scientific Manuscript database

Assembled sequence contigs by SOAPdenovo and Velvet algorithms from metagenomic short reads of a new bacterial isolate of gut origin. This study included 2 submissions with a total of 9.8 million bp of assembled contigs....
Metagenomic insights into the rumen microbial fibrolytic enzymes in Indian crossbred cattle fed finger millet straw.

PubMed

Jose, V Lyju; Appoothy, Thulasi; More, Ravi P; Arun, A Sha

2017-12-01

The rumen is a unique natural habitat, exhibiting an unparalleled genetic resource of fibrolytic enzymes of microbial origin that degrade plant polysaccharides. The objectives of this study were to identify the principal plant cell wall-degrading enzymes and the taxonomic profile of rumen microbial communities that are associated with them. The cattle rumen microflora and the carbohydrate-active enzymes were functionally classified through a whole metagenomic sequencing approach. Analysis of the assembled sequences by the Carbohydrate-active enzyme analysis Toolkit identified the candidate genes encoding fibrolytic enzymes belonging to different classes of glycoside hydrolases (11,010 contigs), glycosyltransferases (6366 contigs), carbohydrate esterases (4945 contigs), carbohydrate-binding modules (1975 contigs), polysaccharide lyases (480 contigs), and auxiliary activities (115 contigs). Phylogenetic analysis of CAZyme encoding contigs revealed that a significant proportion of CAZymes were contributed by bacteria belonging to genera Prevotella, Bacteroides, Fibrobacter, Clostridium, and Ruminococcus. The results indicated that the cattle rumen microbiome and the CAZymes are highly complex, structurally similar but compositionally distinct from other ruminants. The unique characteristics of rumen microbiota and the enzymes produced by resident microbes provide opportunities to improve the feed conversion efficiency in ruminants and serve as a reservoir of industrially important enzymes for cellulosic biofuel production.
EULER-PCR: finishing experiments for repeat resolution.

PubMed

Mulyukov, Zufar; Pevzner, Pavel A

2002-01-01

Genomic sequencing typically generates a large collection of unordered contigs or scaffolds. Contig ordering (also known as gap closure) is a non-trivial algorithmic and experimental problem since even relatively simple-to-assemble bacterial genomes typically result in a large set of contigs. Neighboring contigs may be separated either by gaps in read coverage or by repeats. In the latter case we say that the contigs are separated by pseudogaps, and we emphasize the important difference between gap closure and pseudogap closure. The existing gap closure approaches do not distinguish between gaps and pseudogaps and treat them in the same way. We describe a new fast strategy for closing pseudogaps (repeat resolution). Since in highly repetitive genomes the number of pseudogaps may exceed the number of gaps by an order of magnitude, this approach provides a significant advantage over the existing gap closure methods.
Transcriptome changes associated with Tomato spotted wilt virus infection in various life stages of its thrips vector, Frankliniella fusca (Hinds).

PubMed

Shrestha, Anita; Champagne, Donald E; Culbreath, Albert K; Rotenberg, Dorith; Whitfield, Anna E; Srinivasan, Rajagopalbabu

2017-08-01

Persistent propagative viruses maintain intricate interactions with their arthropod vectors. In this study, we investigated the transcriptome-level responses associated with a persistent propagative phytovirus infection in various life stages of its vector using an Illumina HiSeq sequencing platform. The pathosystem components included a Tospovirus, Tomato spotted wilt virus (TSWV), its insect vector, Frankliniella fusca (Hinds), and a plant host, Arachis hypogaea (L.). We assembled (de novo) reads from three developmental stage groups of virus-exposed and non-virus-exposed F. fusca into one transcriptome consisting of 72 366 contigs and identified 1161 differentially expressed (DE) contigs. The number of DE contigs was greatest in adults (female) (562) when compared with larvae (first and second instars) (395) and pupae (pre- and pupae) (204). Upregulated contigs in virus-exposed thrips had blastx annotations associated with intracellular transport and virus replication. Upregulated contigs were also assigned blastx annotations associated with immune responses, including apoptosis and phagocytosis. In virus-exposed larvae, Blast2GO analysis identified functional groups, such as multicellular development with downregulated contigs, while reproduction, embryo development and growth were identified with upregulated contigs in virus-exposed adults. This study provides insights into differences in transcriptome-level responses modulated by TSWV in various life stages of an important vector, F. fusca.
Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction.

PubMed

Palmer, Lance E; Dejori, Mathaeus; Bolanos, Randall; Fasulo, Daniel

2010-01-15

With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that due to repeated sequences, many aligned pairs of reads nevertheless do not overlap. But no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps. We present an approach that extends the Minimus assembler by a data driven step to classify overlaps as true or false prior to contig construction. We trained several different classification models within the Weka framework using various statistics derived from overlaps of reads available from prior sequencing projects. These statistics included percent mismatch and k-mer frequencies within the overlaps as well as a comparative genomics score derived from mapping reads to multiple reference genomes. We show that in real whole-genome sequencing data from the E. coli and S. aureus genomes, by providing a curated set of overlaps to the contigging phase of the assembler, we nearly doubled the median contig length (N50) without sacrificing coverage of the genome or increasing the number of mis-assemblies. Machine learning methods that use comparative and non-comparative features to classify overlaps as true or false can be used to improve the quality of a sequence assembly.
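The abstract above glosses N50 as a median contig length. As a hedged illustration, N50 is commonly defined as the length L such that contigs of length L or longer contain at least half of all assembled bases. The sketch below follows that common definition. It is not taken from the paper, and the contig lengths are made-up values.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Uses the common definition: the length L such that contigs of
    length >= L together hold at least half of the assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 80, 50, 40, 30]
print(n50(example_lengths))
```

Because it is weighted by bases, N50 generally differs from the plain median of the length list, so longer contigs influence it more.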
Characterization of Withania somnifera Leaf Transcriptome and Expression Analysis of Pathogenesis – Related Genes during Salicylic Acid Signaling

PubMed Central

Ghosh Dasgupta, Modhumita; George, Blessan Santhosh; Bhatia, Anil; Sidhu, Om Prakash

2014-01-01

Withania somnifera (L.) Dunal is a valued medicinal plant with pharmaceutical applications. The present study was undertaken to analyze the salicylic acid induced leaf transcriptome of W. somnifera. A total of 45.6 million reads were generated and the de novo assembly yielded 73,523 transcript contigs with an average transcript contig length of 1620 bp. A total of 71,062 transcripts were annotated and 53,424 of them were assigned GO terms. Mapping of transcript contigs to biological pathways revealed the presence of 182 pathways. Seventeen genes representing 12 pathogenesis-related (PR) families were mined from the transcriptome data and their pattern of expression 17 and 36 hours after salicylic acid treatment was documented. The analysis revealed significant up-regulation of all families of PR genes by 36 hours post treatment except WsPR10. The relative fold expression of transcripts ranged from 1 fold to 6,532 fold. The two families of peroxidases including the lignin-forming anionic peroxidase (WsL-PRX) and suberization-associated anionic peroxidase (WsS-PRX) recorded maximum expression of 377 fold and 6532 fold respectively, while the expression of WsPR10 was down-regulated by 14 fold. Additionally, the most stable reference gene for normalization of qRT-PCR data was also identified. The effect of SA on the accumulation of major secondary metabolites of W. somnifera including withanoside V, withaferin A and withanolide A was also analyzed and an increase in the content of all three metabolites was detected. This is the first report on expression patterns of PR genes during salicylic acid signaling in W. somnifera. PMID:24739900
Shotgun mitogenomics across body size classes in a local assemblage of tropical Diptera: Phylogeny, species diversity and mitochondrial abundance spectrum.

PubMed

Choo, Le Qin; Crampton-Platt, Alex; Vogler, Alfried P

2017-10-01

Mitochondrial genomes can be assembled readily from shotgun-sequenced DNA mixtures of mass-trapped arthropods ("mitochondrial metagenomics"), speeding up the taxonomic characterization. Bulk sequencing was conducted on some 800 individuals of Diptera obtained by canopy fogging of a single tree in Borneo dominated by small (<1.5 mm) individuals. Specimens were split into five body size classes for DNA extraction, to equalize read numbers across specimens and to study how body size, a key ecological trait, interacts with species and phylogenetic diversity. Genome assembly produced 304 orthologous mitochondrial contigs presumed to each represent a different species. The small-bodied fraction was by far the most species-rich (187 contigs). Identification of contigs was through phylogenetic analysis together with 56 reference mitogenomes, which placed most of the Bornean community into seven clades of small-bodied species, indicating phylogenetic conservation of body size. Mapping of shotgun reads against the mitogenomes showed wide ranges of read abundances within each size class. Ranked read abundance plots were largely log-linear, indicating a uniformly filled abundance spectrum, especially for small-bodied species. Small-bodied species differed greatly from other size classes in neutral metacommunity parameters, exhibiting greater levels of immigration, besides greater total community size. We suggest that the established uses of mitochondrial metagenomics for analysis of species and phylogenetic diversity can be extended to parameterize recent theories of community ecology and biodiversity, and by focusing on the number of mitochondria, rather than individuals, a new theoretical framework for analysis of mitochondrial abundance spectra can be developed that incorporates metabolic activity approximated by the count of mitochondria. © 2017 John Wiley & Sons Ltd.
Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial)

DOE PAGES

Howe, Adina; Chain, Patrick S. G.

2015-07-09

Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and their encoding functions. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are again challenged by sequencing errors as well as by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To help with accessibility to assembly tools, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Cloud Compute and guides users through downloading, assembly, and mapping reads to contigs of a mock microbiome metagenome. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow.
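The idea of extending overlapping reads into a longer contiguous sequence can be sketched with a toy example. The code below is a minimal illustration that assumes error-free reads and exact suffix-prefix matches. The function names and the min_overlap parameter are hypothetical and are not drawn from the review or its tutorial. Real assemblers tolerate sequencing errors and typically use graph-based methods.

```python
def longest_overlap(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Merge two reads into one contig if they overlap, else return None."""
    size = longest_overlap(left, right, min_overlap)
    if size == 0:
        return None
    # Keep `left` whole and append only the non-overlapping tail of `right`.
    return left + right[size:]


# Hypothetical reads sharing the 4-base overlap "TGCA".
print(merge_reads("ACGTTGCA", "TGCAGGTC"))
```

Repeats are the main difficulty with this approach: two reads can share an exact overlap without being adjacent in the genome, which is one reason assembly is challenged by genomic repeats.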
Minimap2: pairwise alignment for nucleotide sequences.

PubMed

Li, Heng

2018-05-10

Recent advances in sequencing technologies promise ultra-long reads of ∼100 kilobases (kb) on average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs are unable to process such data at scale or do so inefficiently, which calls for the development of new alignment algorithms. Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions (INDELs) and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. https://github.com/lh3/minimap2. hengli@broadinstitute.org.
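One common way to obtain a concave gap cost is a two-piece affine function that takes the cheaper of two open/extend penalty pairs. The sketch below illustrates that general idea only. The function name is hypothetical and the penalty values are illustrative assumptions, not parameters stated in the abstract.

```python
def two_piece_gap_cost(length, open1=4, extend1=2, open2=24, extend2=1):
    """Cost of a gap of `length` bases under a two-piece affine model.

    The first pair (low open, high extend) is cheaper for short gaps;
    the second pair (high open, low extend) is cheaper for long gaps.
    All penalty values here are illustrative assumptions.
    """
    if length <= 0:
        return 0
    return min(open1 + length * extend1, open2 + length * extend2)


for gap_length in (1, 20, 100):
    print(gap_length, two_piece_gap_cost(gap_length))
```

With these assumed values the two pieces cross at a gap length of 20, so longer gaps are charged at the lower per-base rate. This makes a single long INDEL cheaper than it would be under a one-piece affine cost.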
Characterization of gonadal transcriptomes from the turbot (Scophthalmus maximus).

PubMed

Hu, Yulong; Huang, Meng; Wang, Weiji; Guan, Jiantao; Kong, Jie

2016-01-01

The mechanisms underlying sexual reproduction and sex ratio determination remain unclear in turbot, a flatfish of great commercial value. In addition, there is limited information in the turbot database regarding genes related to the reproductive system. Here, we conducted high-throughput transcriptome profiling of turbot gonad tissues to better understand their reproductive functions and to supply essential gene sequence information for marker-assisted selection programs in the turbot industry. In this study, two gonad libraries representing sex differences in Scophthalmus maximus were sequenced by 454 pyrosequencing, yielding 453 818 high-quality reads that were assembled into 24 611 contigs and 33 713 singletons, 13 936 contigs and singletons (CS) of which were annotated using BLASTx. GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analyses revealed that various biological functions and processes were associated with many of the annotated CS. Expression analyses showed that 510 genes were differentially expressed in males versus females; 80% of these genes were annotated. In addition, 6484 and 6036 single nucleotide polymorphisms (SNPs) were identified in male and female libraries, respectively. This transcriptome resource will serve as the foundation for cDNA or SNP microarray construction, gene expression characterization, and sex-specific linkage mapping in turbot.
Construction of random sheared fosmid library from Chinese cabbage and its use for Brassica rapa genome sequencing project.

PubMed

Park, Tae-Ho; Park, Beom-Seok; Kim, Jin-A; Hong, Joon Ki; Jin, Mina; Seol, Young-Joo; Mun, Jeong-Hwan

2011-01-01

As a part of the Multinational Genome Sequencing Project of Brassica rapa, linkage groups R9 and R3 were sequenced using a bacterial artificial chromosome (BAC) by BAC strategy. The current physical contigs are expected to cover approximately 90% of the euchromatin of both chromosomes. As the project progresses, BAC selection for sequence extension becomes more limited because BAC libraries are restriction enzyme-specific. To support the project, a random sheared fosmid library was constructed. The library consists of 97536 clones with an average insert size of approximately 40 kb, corresponding to seven genome equivalents, assuming a Chinese cabbage genome size of 550 Mb. The library was screened with primers designed at the sequence ends of nine scaffold gaps where BAC clones cannot be selected to extend the physical contigs. The selected positive clones were end-sequenced to check the overlap between the fosmid clones and the adjacent BAC clones. Nine fosmid clones were selected and fully sequenced. The sequences revealed two completed gap fillings and seven sequence extensions, which can be used for further selection of BAC clones, confirming that the fosmid library will facilitate the sequence completion of B. rapa. Copyright © 2011. Published by Elsevier Ltd.
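The "seven genome equivalents" figure follows from the numbers given in the abstract: clones times average insert size, divided by genome size. A minimal sketch of that arithmetic follows. The function name is hypothetical, and the insert size is the approximate value quoted above.

```python
def genome_equivalents(num_clones, mean_insert_bp, genome_size_bp):
    """Library coverage expressed as multiples of the genome size."""
    return num_clones * mean_insert_bp / genome_size_bp


# Values taken from the abstract: 97536 clones, ~40 kb inserts, 550 Mb genome.
coverage = genome_equivalents(97536, 40_000, 550_000_000)
print(round(coverage, 2))
```

The result is slightly above 7, consistent with the stated seven genome equivalents.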
Recombinational and physical mapping of the locus for primary open-angle glaucoma (GLC1A) on chromosome 1q23-q25

DOE Office of Scientific and Technical Information (OSTI.GOV)

Belmouden, A.; Adam, M.F.; De Dinechin, S.D.

1997-02-01

Primary open-angle glaucoma (POAG) is a leading cause of irreversible blindness in industrialized countries. A locus for juvenile-onset POAG, GLC1A, has been mapped to 1q21-q31 in a 9-cM interval. With recombinant haplotypes, we have now reduced the GLC1A interval to a maximum of 3 cM, between the D1S452/NGA1/D1S210 and NGA5 loci. These loci are 2.8 Mb apart on a 4.7-Mb contig that we have completed between the D1S2851 and D1S218 loci and that includes 96 YAC clones and 48 STSs. The new GLC1A interval itself is now covered by 25 YACs, 30 STSs, and 16 restriction enzyme site landmarks. The lack of a NotI site suggests that the region has few CpG islands and a low gene content. This is compatible with its predominant cytogenetic location on the 1q24 G-band. Finally, we have excluded important candidate genes, including genes coding for three ATPases (AMB1, ATP2B4, ATP1A2), an ion channel (VDAC4), antithrombin III (AT3), and prostaglandin synthase (PTGS2). Our results provide a basis to identify the GLC1A gene. 59 refs., 3 figs., 3 tabs.
An evaluation of genotyping by sequencing (GBS) to map the Breviaristatum-e (ari-e) locus in cultivated barley.

PubMed

Liu, Hui; Bayer, Micha; Druka, Arnis; Russell, Joanne R; Hackett, Christine A; Poland, Jesse; Ramsay, Luke; Hedley, Pete E; Waugh, Robbie

2014-02-06

We explored the use of genotyping by sequencing (GBS) on a recombinant inbred line population (GPMx) derived from a cross between the two-rowed barley cultivar 'Golden Promise' (ari-e.GP/Vrs1) and the six-rowed cultivar 'Morex' (Ari-e/vrs1) to map plant height. We identified three Quantitative Trait Loci (QTL), the first in a region encompassing the spike architecture gene Vrs1 on chromosome 2H, the second in an uncharacterised centromeric region on chromosome 3H, and the third in a region of chromosome 5H coinciding with the previously described dwarfing gene Breviaristatum-e (Ari-e). Barley cultivars in North-western Europe largely contain either of two dwarfing genes; Denso on chromosome 3H, a presumed ortholog of the rice green revolution gene OsSd1, or Breviaristatum-e (ari-e) on chromosome 5H. A recessive mutant allele of the latter gene, ari-e.GP, was introduced into cultivation via the cv. 'Golden Promise' that was a favourite of the Scottish malt whisky industry for many years and is still used in agriculture today. Using GBS mapping data and phenotypic measurements we show that ari-e.GP maps to a small genetic interval on chromosome 5H and that alternative alleles at a region encompassing Vrs1 on 2H along with a region on chromosome 3H also influence plant height. The location of Ari-e is supported by analysis of near-isogenic lines containing different ari-e alleles. We explored use of the GBS to populate the region with sequence contigs from the recently released physically and genetically integrated barley genome sequence assembly as a step towards Ari-e gene identification. GBS was an effective and relatively low-cost approach to rapidly construct a genetic map of the GPMx population that was suitable for genetic analysis of row type and height traits, allowing us to precisely position ari-e.GP on chromosome 5H. Mapping resolution was lower than we anticipated. 
We found the GBS data more complex to analyse than other data types but it did directly provide linked SNP markers for subsequent higher resolution genetic analysis.
SNP ID-info: SNP ID searching and visualization platform.

PubMed

Yang, Cheng-Hong; Chuang, Li-Yeh; Cheng, Yu-Huei; Wen, Cheng-Hao; Chang, Phei-Lang; Chang, Hsueh-Wei

2008-09-01

Many association studies report relationships between single nucleotide polymorphisms (SNPs), diseases and cancers without giving an SNP ID. Here, we developed the SNP ID-info freeware to provide SNP IDs from input genetic and physical information of genomes. The program provides an "SNP-ePCR" function to generate the full sequence from primer and template inputs. In "SNPosition," sequence from SNP-ePCR or direct input is matched against SNP fasta sequences to retrieve the SNP IDs. In the "SNP search" and "SNP fasta" functions, SNP information can be queried by cytogenetic band, contig position, or keyword. Finally, the SNP ID neighboring environment for inputs is completely visualized in the order of contig position and marked with SNP and flanking hits. The SNP identification problems inherent in NCBI SNP BLAST are also avoided. In conclusion, the SNP ID-info provides a visualized SNP ID environment for multiple inputs and assists systematic SNP association studies. The server and user manual are available at http://bio.kuas.edu.tw/snpid-info.
Moleculo Long-Read Sequencing Facilitates Assembly and Genomic Binning from Complex Soil Metagenomes

PubMed Central

White, Richard Allen; Bottos, Eric M.; Roy Chowdhury, Taniya; Zucker, Jeremy D.; Brislawn, Colin J.; Nicora, Carrie D.; Fansler, Sarah J.; Glaesemann, Kurt R.; Glass, Kevin

2016-01-01

ABSTRACT Soil metagenomics has been touted as the “grand challenge” for metagenomics, as the high microbial diversity and spatial heterogeneity of soils make them unamenable to current assembly platforms. Here, we aimed to improve soil metagenomic sequence assembly by applying the Moleculo synthetic long-read sequencing technology. In total, we obtained 267 Gbp of raw sequence data from a native prairie soil; these data included 109.7 Gbp of short-read data (~100 bp) from the Joint Genome Institute (JGI), an additional 87.7 Gbp of rapid-mode read data (~250 bp), plus 69.6 Gbp (>1.5 kbp) from Moleculo sequencing. The Moleculo data alone yielded over 5,600 reads of >10 kbp in length, and over 95% of the unassembled reads mapped to contigs of >1.5 kbp. Hybrid assembly of all data resulted in more than 10,000 contigs over 10 kbp in length. We mapped three replicate metatranscriptomes derived from the same parent soil to the Moleculo subassembly and found that 95% of the predicted genes, based on their assignments to Enzyme Commission (EC) numbers, were expressed. The Moleculo subassembly also enabled binning of >100 microbial genome bins. We obtained via direct binning the first complete genome, that of “Candidatus Pseudomonas sp. strain JKJ-1” from a native soil metagenome. By mapping metatranscriptome sequence reads back to the bins, we found that several bins corresponding to low-relative-abundance Acidobacteria were highly transcriptionally active, whereas bins corresponding to high-relative-abundance Verrucomicrobia were not. These results demonstrate that Moleculo sequencing provides a significant advance for resolving complex soil microbial communities. IMPORTANCE Soil microorganisms carry out key processes for life on our planet, including cycling of carbon and other nutrients and supporting growth of plants. However, there is poor molecular-level understanding of their functional roles in ecosystem stability and responses to environmental perturbations. 
This knowledge gap is largely due to the difficulty in culturing the majority of soil microbes. Thus, use of culture-independent approaches, such as metagenomics, promises the direct assessment of the functional potential of soil microbiomes. Soil is, however, a challenge for metagenomic assembly due to its high microbial diversity and variable evenness, resulting in low coverage and uneven sampling of microbial genomes. Despite increasingly large soil metagenome data volumes (>200 Gbp), the majority of the data do not assemble. Here, we used the cutting-edge approach of synthetic long-read sequencing technology (Moleculo) to assemble soil metagenome sequence data into long contigs and used the assemblies for binning of genomes. Author Video: An author video summary of this article is available. PMID:27822530
Multi-Platform Next-Generation Sequencing of the Domestic Turkey (Meleagris gallopavo): Genome Assembly and Analysis

PubMed Central

Aslam, Luqman; Beal, Kathryn; Ann Blomberg, Le; Bouffard, Pascal; Burt, David W.; Crasta, Oswald; Crooijmans, Richard P. M. A.; Cooper, Kristal; Coulombe, Roger A.; De, Supriyo; Delany, Mary E.; Dodgson, Jerry B.; Dong, Jennifer J.; Evans, Clive; Frederickson, Karin M.; Flicek, Paul; Florea, Liliana; Folkerts, Otto; Groenen, Martien A. M.; Harkins, Tim T.; Herrero, Javier; Hoffmann, Steve; Megens, Hendrik-Jan; Jiang, Andrew; de Jong, Pieter; Kaiser, Pete; Kim, Heebal; Kim, Kyu-Won; Kim, Sungwon; Langenberger, David; Lee, Mi-Kyung; Lee, Taeheon; Mane, Shrinivasrao; Marcais, Guillaume; Marz, Manja; McElroy, Audrey P.; Modise, Thero; Nefedov, Mikhail; Notredame, Cédric; Paton, Ian R.; Payne, William S.; Pertea, Geo; Prickett, Dennis; Puiu, Daniela; Qioa, Dan; Raineri, Emanuele; Ruffier, Magali; Salzberg, Steven L.; Schatz, Michael C.; Scheuring, Chantel; Schmidt, Carl J.; Schroeder, Steven; Searle, Stephen M. J.; Smith, Edward J.; Smith, Jacqueline; Sonstegard, Tad S.; Stadler, Peter F.; Tafer, Hakim; Tu, Zhijian (Jake); Van Tassell, Curtis P.; Vilella, Albert J.; Williams, Kelly P.; Yorke, James A.; Zhang, Liqing; Zhang, Hong-Bin; Zhang, Xiaojun; Zhang, Yang; Reed, Kent M.

2010-01-01

A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest. PMID:20838655
Characterization of the reniform nematode genome by shotgun sequencing.

PubMed

Nyaku, Seloame T; Sripathi, Venkateswara R; Kantety, Ramesh V; Cseke, Sarah B; Buyyarapu, Ramesh; Mc Ewan, Robert; Gu, Yong Q; Lawrence, Kathy; Senwo, Zachary; Sripathi, Padmini; George, Pheba; Sharma, Govind C

2014-04-01

The reniform nematode (RN), a major agricultural pest particularly on cotton in the United States, is among the major plant-parasitic nematodes for which limited genomic information exists. In this study, over 380 Mb of sequence data were generated from pooled DNA of four adult female RNs and assembled into 67,317 contigs, including 25,904 (38.5%) predicted coding contigs and 41,413 (61.5%) noncoding contigs. Most of the characterized repeats were of low complexity (88.9%), and 0.9% of the contigs matched with 53.2% of GenBank ESTs. The most frequent Gene Ontology (GO) terms for molecular function and biological process were protein binding (32%) and embryonic development (20%). Further analysis showed that 741 (1.1%), 94 (0.1%), and 169 (0.25%) RN genomic contigs matched with 1328 (13.9%), 1480 (5.4%), and 1330 (7.4%) supercontigs of Meloidogyne incognita, Brugia malayi, and Pristionchus pacificus, respectively. Chromosome 5 of Caenorhabditis elegans had the highest number of hits to the RN contigs. Seven putative detoxification genes and three carbohydrate-active enzymes (CAZymes) involved in cell wall degradation were studied in more detail. Additionally, kinases, G protein-coupled receptors, and neuropeptides functioning in physiological, developmental, and regulatory processes were identified in the RN genome.
CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision.

PubMed

Herath, Damayanthi; Tang, Sen-Lin; Tandon, Kshitij; Ackland, David; Halgamuge, Saman Kumara

2017-12-28

In metagenomics, the separation of nucleotide sequences belonging to an individual or closely matched populations is termed binning. Binning helps the evaluation of underlying microbial population structure as well as the recovery of individual genomes from a sample of uncultivable microbial organisms. Both supervised and unsupervised learning methods have been employed in binning; however, characterizing a metagenomic sample containing multiple strains remains a significant challenge. In this study, we designed and implemented a new workflow, Coverage and composition based binning of Metagenomes (CoMet), for binning contigs in a single metagenomic sample. CoMet utilizes coverage values and the compositional features of metagenomic contigs. The binning strategy in CoMet includes the initial grouping of contigs in guanine-cytosine (GC) content-coverage space and refinement of bins in tetranucleotide frequencies space in a purely unsupervised manner. With CoMet, the clustering algorithm DBSCAN is employed for binning contigs. The performances of CoMet were compared against four existing approaches for binning a single metagenomic sample, including MaxBin, Metawatt, MyCC (default) and MyCC (coverage) using multiple datasets including a sample comprised of multiple strains. Binning methods based on both compositional features and coverages of contigs had higher performances than the method which is based only on compositional features of contigs. CoMet yielded higher or comparable precision in comparison to the existing binning methods on benchmark datasets of varying complexities. MyCC (coverage) had the highest ranking score in F1-score. However, the performances of CoMet were higher than MyCC (coverage) on the dataset containing multiple strains. Furthermore, CoMet recovered contigs of more species and was 18 - 39% higher in precision than the compared existing methods in discriminating species from the sample of multiple strains. 
CoMet resulted in higher precision than MyCC (default) and MyCC (coverage) on a real metagenome. The approach proposed with CoMet for binning contigs, improves the precision of binning while characterizing more species in a single metagenomic sample and in a sample containing multiple strains. The F1-scores obtained from different binning strategies vary with different datasets; however, CoMet yields the highest F1-score with a sample comprised of multiple strains.
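The two feature spaces named in the abstract above, GC content and tetranucleotide frequencies, can be computed per contig with a few lines of Python. This is a minimal sketch and not CoMet's own code: the function names are illustrative, coverage values are assumed to come from a separate read-mapping step, and the DBSCAN clustering itself is not shown.

```python
from collections import Counter
from itertools import product

# All 256 tetranucleotides in a fixed order, so every contig yields a
# vector with the same layout.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a contig."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_frequencies(seq):
    """Relative frequency of each tetramer, counted over overlapping windows.

    Windows containing ambiguous bases (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[k] / total for k in TETRAMERS]


if __name__ == "__main__":
    contig = "ACGTACGTGGCC"
    print(round(gc_content(contig), 3))
    print(len(tetranucleotide_frequencies(contig)))
```

A binning workflow of this kind would then cluster contigs on (GC content, coverage) pairs and refine the clusters using the 256-dimensional frequency vectors.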

PAVE: program for assembling and viewing ESTs.

PubMed

Soderlund, Carol; Johnson, Eric; Bomhoff, Matthew; Descour, Anne

2009-08-26

New sequencing technologies are rapidly emerging. Many laboratories are simultaneously working with the traditional Sanger ESTs and experimenting with ESTs generated by the 454 Life Science sequencers. Though Sanger ESTs have been used to generate contigs for many years, no program takes full advantage of the 5' and 3' mate-pair information, hence, many tentative transcripts are assembled into two separate contigs. The new 454 technology has the benefit of high-throughput expression profiling, but introduces time and space problems for assembling large contigs. The PAVE (Program for Assembling and Viewing ESTs) assembler takes advantage of the 5' and 3' mate-pair information by requiring that the mate-pairs be assembled into the same contig and joined by n's if the two sub-contigs do not overlap. It handles the depth of 454 data sets by "burying" similar ESTs during assembly, which retains the expression level information while circumventing time and space problems. PAVE uses MegaBLAST for the clustering step and CAP3 for assembly, however it assembles incrementally to enforce the mate-pair constraint, bury ESTs, and reduce incorrect joins and splits. The PAVE data management system uses a MySQL database to store multiple libraries of ESTs along with their metadata; the management system allows multiple assemblies with variations on libraries and parameters. Analysis routines provide standard annotation for the contigs including a measure of differentially expressed genes across the libraries. A Java viewer program is provided for display and analysis of the results. Our results clearly show the benefit of using the PAVE assembler to explicitly use mate-pair information and bury ESTs for large contigs. The PAVE assembler provides a software package for assembling Sanger and/or 454 ESTs. The assembly software, data management software, Java viewer and user's guide are freely available.
Identification of genes from the Treacher Collins candidate region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dixon, M.; Dixon, J.; Edwards, S.

Treacher Collins syndrome (TCOF1) is an autosomal dominant disorder of craniofacial development. The TCOF1 locus has previously been mapped to chromosome 5q32-33. The candidate gene region has been defined as being between two flanking markers, ribosomal protein S14 (RPS14) and Annexin 6 (ANX6), by analyzing recombination events in affected individuals. It is estimated that the distance between these flanking markers is 500 kb by three separate analysis methods: (1) radiation hybrid mapping; (2) genetic linkage; and (3) YAC contig analysis. A cosmid contig which spans the candidate gene region for TCOF1 has been constructed by screening the Los Alamos National Laboratory flow-sorted chromosome 5 cosmid library. Cosmids were obtained by using a combination of probes generated from YAC end clones, Alu-PCR fragments from YACs, and asymmetric PCR fragments from both T7 and T3 cosmid ends. Exon amplification, the selection of genomic coding sequences based upon the presence of functional splice acceptor and donor sites, was used to identify potential exon sequences. Sequences found to be conserved between species were then used to screen cDNA libraries in order to identify candidate genes. To date, four different cDNAs have been isolated from this region and are being analyzed as potential candidate genes for TCOF1. These include the genes encoding plasma glutathione peroxidase (GPX3), heparin sulfate sulfotransferase (HSST), a gene with homology to the ETS family of proteins and one which shows no homology to any known genes. Work is also in progress to identify and characterize additional cDNAs from the candidate gene region.
A YAC contig spanning the nevoid basal cell carcinoma syndrome, Fanconi anaemia group C, and xeroderma pigmentosum group A loci on chromosome 9q

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morris, D.J.; Reis, A.

1994-09-01

Nevoid basal cell carcinoma syndrome (NBCCS, Gorlin syndrome) is an autosomal dominant disorder, characterized primarily by multiple basal cell carcinomas, epithelium-lined jaw cysts, and palmar and plantar pits, as well as various other features. Loss of heterozygosity studies and linkage analysis have mapped the NBCCS gene to chromosome 9q and suggested that it is a tumor suppressor. The apparent sensitivity of NBCCS patients to UV and X-irradiation raises the possibility of hypersensitivity to DNA-damaging reagents or defective DNA repair being etiological in the disorder. The recent mapping of the Fanconi anaemia group C (FACC) and xeroderma pigmentosum complementing group A (XPAC) genes to the same region on 9q has led us to begin the molecular dissection of the 9q22-q31 region. PCR analysis of the presence or absence of 10 microsatellite markers and exons 3 and 4 of the XPAC and FACC genes, respectively, allowed us to order 12 YACs into an overlapping contig and to order the markers as follows: D9S151/D9S12P1-D9S12P2-D9S197-D9S196-D9S280-FACC-D9S287/XPAC-D9S180-D9S6-D9S176. Sizing of the YACs has provided an initial estimate of the size of the NBCCS candidate region between D9S12 and D9S180 to be less than 1.65 Mb. 45 refs., 1 fig., 1 tab.
Sequencing, Annotation and Analysis of the Syrian Hamster (Mesocricetus auratus) Transcriptome

PubMed Central

Tchitchek, Nicolas; Safronetz, David; Rasmussen, Angela L.; Martens, Craig; Virtaneva, Kimmo; Porcella, Stephen F.; Feldmann, Heinz

2014-01-01

Background The Syrian hamster (golden hamster, Mesocricetus auratus) is gaining importance as a new experimental animal model for multiple pathogens, including emerging zoonotic diseases such as Ebola. Nevertheless there are currently no publicly available transcriptome reference sequences or genome for this species. Results A cDNA library derived from mRNA and snRNA isolated and pooled from the brains, lungs, spleens, kidneys, livers, and hearts of three adult female Syrian hamsters was sequenced. Sequence reads were assembled into 62,482 contigs and 111,796 reads remained unassembled (singletons). This combined contig/singleton dataset, designated as the Syrian hamster transcriptome, represents a total of 60,117,204 nucleotides. Our Mesocricetus auratus Syrian hamster transcriptome mapped to 11,648 mouse transcripts representing 9,562 distinct genes, and mapped to a similar number of transcripts and genes in the rat. We identified 214 quasi-complete transcripts based on mouse annotations. Canonical pathways involved in a broad spectrum of fundamental biological processes were significantly represented in the library. The Syrian hamster transcriptome was aligned to the current release of the Chinese hamster ovary (CHO) cell transcriptome and genome to improve the genomic annotation of this species. Finally, our Syrian hamster transcriptome was aligned against 14 other rodents, primate and laurasiatheria species to gain insights about the genetic relatedness and placement of this species. Conclusions This Syrian hamster transcriptome dataset significantly improves our knowledge of the Syrian hamster's transcriptome, especially towards its future use in infectious disease research. Moreover, this library is an important resource for the wider scientific community to help improve genome annotation of the Syrian hamster and other closely related species. 
Furthermore, these data provide the basis for development of expression microarrays that can be used in functional genomics studies. PMID:25398096
A comprehensive resource of drought- and salinity- responsive ESTs for gene discovery and marker development in chickpea (Cicer arietinum L.)

PubMed Central

2009-01-01

Background Chickpea (Cicer arietinum L.), an important grain legume crop of the world is seriously challenged by terminal drought and salinity stresses. However, very limited number of molecular markers and candidate genes are available for undertaking molecular breeding in chickpea to tackle these stresses. This study reports generation and analysis of comprehensive resource of drought- and salinity-responsive expressed sequence tags (ESTs) and gene-based markers. Results A total of 20,162 (18,435 high quality) drought- and salinity- responsive ESTs were generated from ten different root tissue cDNA libraries of chickpea. Sequence editing, clustering and assembly analysis resulted in 6,404 unigenes (1,590 contigs and 4,814 singletons). Functional annotation of unigenes based on BLASTX analysis showed that 46.3% (2,965) had significant similarity (≤1E-05) to sequences in the non-redundant UniProt database. BLASTN analysis of unique sequences with ESTs of four legume species (Medicago, Lotus, soybean and groundnut) and three model plant species (rice, Arabidopsis and poplar) provided insights on conserved genes across legumes as well as novel transcripts for chickpea. Of 2,965 (46.3%) significant unigenes, only 2,071 (32.3%) unigenes could be functionally categorised according to Gene Ontology (GO) descriptions. A total of 2,029 sequences containing 3,728 simple sequence repeats (SSRs) were identified and 177 new EST-SSR markers were developed. Experimental validation of a set of 77 SSR markers on 24 genotypes revealed 230 alleles with an average of 4.6 alleles per marker and average polymorphism information content (PIC) value of 0.43. Besides SSR markers, 21,405 high confidence single nucleotide polymorphisms (SNPs) in 742 contigs (with ≥ 5 ESTs) were also identified. Recognition sites for restriction enzymes were identified for 7,884 SNPs in 240 contigs. 
Hierarchical clustering of 105 selected contigs provided clues about stress- responsive candidate genes and their expression profile showed predominance in specific stress-challenged libraries. Conclusion Generated set of chickpea ESTs serves as a resource of high quality transcripts for gene discovery and development of functional markers associated with abiotic stress tolerance that will be helpful to facilitate chickpea breeding. Mapping of gene-based markers in chickpea will also add more anchoring points to align genomes of chickpea and other legume species. PMID:19912666
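The polymorphism information content (PIC) reported above is conventionally computed from allele frequencies with the Botstein et al. formula, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2. The sketch below illustrates that formula only; the function name and the example frequencies are made up and are not taken from the chickpea data.

```python
def pic(allele_freqs):
    """Polymorphism information content for one marker.

    allele_freqs: allele frequencies for the marker, expected to sum to 1.
    """
    n = len(allele_freqs)
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction term for matings that are heterozygous but uninformative.
    cross = sum(
        2 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1 - homozygosity - cross


if __name__ == "__main__":
    # Hypothetical marker with four alleles.
    print(round(pic([0.4, 0.3, 0.2, 0.1]), 3))
```

PIC approaches 0 for a nearly monomorphic marker and rises as the number of alleles grows and their frequencies become more even.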
Transcriptome Analysis of Sarracenia, an Insectivorous Plant

PubMed Central

Srivastava, Anuj; Rogers, Willie L.; Breton, Catherine M.; Cai, Liming; Malmberg, Russell L.

2011-01-01

Sarracenia species (pitcher plants) are carnivorous plants which obtain a portion of their nutrients from insects captured in the pitchers. To investigate these plants, we sequenced the transcriptome of two species, Sarracenia psittacina and Sarracenia purpurea, using Roche 454 pyrosequencing technology. We obtained 46 275 and 36 681 contigs by de novo assembly methods for S. psittacina and S. purpurea, respectively, and further identified 16 163 orthologous contigs between them. Estimation of synonymous substitution rates between orthologous and paralogous contigs indicates the events of genome duplication and speciation within the Sarracenia genus both occurred ∼2 million years ago. The ratios of synonymous and non-synonymous substitution rates indicated that 491 contigs have been under positive selection (Ka/Ks > 1). Significant proportions of these contigs were involved in functions related to binding activity. We also found that the greatest sequence similarity for both of these species was to Vitis vinifera, which is most consistent with a non-current classification of the order Ericales as an asterid. This study has provided new insights into pitcher plants and will contribute greatly to future research on this genus and its distinctive ecological adaptations. PMID:21676972
Transcriptome analysis of sarracenia, an insectivorous plant.

PubMed

Srivastava, Anuj; Rogers, Willie L; Breton, Catherine M; Cai, Liming; Malmberg, Russell L

2011-08-01

Sarracenia species (pitcher plants) are carnivorous plants which obtain a portion of their nutrients from insects captured in the pitchers. To investigate these plants, we sequenced the transcriptome of two species, Sarracenia psittacina and Sarracenia purpurea, using Roche 454 pyrosequencing technology. We obtained 46 275 and 36 681 contigs by de novo assembly methods for S. psittacina and S. purpurea, respectively, and further identified 16 163 orthologous contigs between them. Estimation of synonymous substitution rates between orthologous and paralogous contigs indicates the events of genome duplication and speciation within the Sarracenia genus both occurred ∼2 million years ago. The ratios of synonymous and non-synonymous substitution rates indicated that 491 contigs have been under positive selection (Ka/Ks > 1). Significant proportions of these contigs were involved in functions related to binding activity. We also found that the greatest sequence similarity for both of these species was to Vitis vinifera, which is most consistent with a non-current classification of the order Ericales as an asterid. This study has provided new insights into pitcher plants and will contribute greatly to future research on this genus and its distinctive ecological adaptations.
Comprehensive Transcriptome Study to Develop Molecular Resources of the Copepod Calanus sinicus for Their Potential Ecological Applications

PubMed Central

Yang, Qing; Sun, Fanyue; Yang, Zhi; Li, Hongjun

2014-01-01

Calanus sinicus Brodsky (Copepoda, Crustacea) is a dominant zooplanktonic species widely distributed in the margin seas of the Northwest Pacific Ocean. In this study, we utilized an RNA-Seq-based approach to develop molecular resources for C. sinicus. Adult samples were sequenced using the Illumina HiSeq 2000 platform. The sequencing data generated 69,751 contigs from 58.9 million filtered reads. The assembled contigs had an average length of 928.8 bp. Gene annotation allowed the identification of 43,417 unigene hits against the NCBI database. Gene ontology (GO) and KEGG pathway mapping analysis revealed various functional genes related to diverse biological functions and processes. Transcripts potentially involved in stress response and lipid metabolism were identified among these genes. Furthermore, 4,871 microsatellites and 110,137 single nucleotide polymorphisms (SNPs) were identified in the C. sinicus transcriptome sequences. SNP validation by the melting temperature (Tm)-shift method suggested that 16 primer pairs amplified target products and showed biallelic polymorphism among 30 individuals. The present work demonstrates the power of Illumina-based RNA-Seq for the rapid development of molecular resources in nonmodel species. The validated SNP set from our study is currently being utilized in an ongoing ecological analysis to support a future study of C. sinicus population genetics. PMID:24982883
A Transcriptome Derived Female-Specific Marker from the Invasive Western Mosquitofish (Gambusia affinis)

PubMed Central

Lamatsch, Dunja K.; Adolfsson, Sofia; Senior, Alistair M.; Christiansen, Guntram; Pichler, Maria; Ozaki, Yuichi; Smeds, Linnea; Schartl, Manfred; Nakagawa, Shinichi

2015-01-01

Sex-specific markers are a prerequisite for understanding reproductive biology, genetic factors involved in sex differences, mechanisms of sex determination, and ultimately the evolution of sex chromosomes. The Western mosquitofish, Gambusia affinis, may be considered a model species for sex-chromosome evolution, as it displays female heterogamety (ZW/ZZ), and is also ecologically interesting as a worldwide invasive species. Here, de novo RNA-sequencing on the gonads of sexually mature G. affinis was used to identify contigs that were highly transcribed in females but not in males (i.e., transcripts with ovary-specific expression). Subsequently, 129 primer pairs spanning 79 contigs were tested by PCR to identify sex-specific transcripts. Of those primer pairs, one female-specific DNA marker was identified, Sanger sequenced and subsequently validated in 115 fish. Sequence analyses revealed a high similarity between the identified sex-specific marker and the 3´ UTR of the aminomethyl transferase (amt) gene of the closely related platyfish (Xiphophorus maculatus). This is the first time that RNA-seq has been used to successfully characterize a sex-specific marker in a fish species in the absence of a genome map. Additionally, the identified sex-specific marker represents one of only a handful of such markers in fishes. PMID:25707007
Fine mapping suggests that the goat Polled Intersex Syndrome and the human Blepharophimosis Ptosis Epicanthus Syndrome map to a 100-kb homologous region.

PubMed

Schibler, L; Cribiu, E P; Oustry-Vaiman, A; Furet, J P; Vaiman, D

2000-03-01

To clone the goat Polled Intersex Syndrome (PIS) gene(s), a chromosome walk was performed from six entry points at 1q43. This enabled 91 BACs to be recovered from a recently constructed goat BAC library. Six BAC contigs of goat chromosome 1q43 (ICC1-ICC6) were thus constructed covering altogether 4.5 Mb. A total of 37 microsatellite sequences were isolated from this 4.5-Mb region (16 in this study), of which 33 were genotyped and mapped. ICC3 (1500 kb) was shown by genetic analysis to encompass the PIS locus in an approximately 400-kb interval without recombinants detected in the resource families (293 informative meioses). A strong linkage disequilibrium was detected among unrelated animals with the two central markers of the region, suggesting a probable location for PIS in approximately 100 kb. High-resolution comparative mapping with human data shows that this DNA segment is the homolog of the human region associated with Blepharophimosis Ptosis Epicanthus inversus Syndrome (BPES) gene located in 3q23. This finding suggests that homologous gene(s) could be responsible for the pathologies observed in humans and goats.
Shotgun Protein Sequencing with Meta-contig Assembly*

PubMed Central

Guthals, Adrian; Clauser, Karl R.; Bandeira, Nuno

2012-01-01

Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings. PMID:22798278
Shotgun protein sequencing with meta-contig assembly.

PubMed

Guthals, Adrian; Clauser, Karl R; Bandeira, Nuno

2012-10-01

Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
Genetic mapping and legume synteny of aphid resistance in African cowpea (Vigna unguiculata L. Walp.) grown in California.

PubMed

Huynh, Bao-Lam; Ehlers, Jeffrey D; Ndeve, Arsenio; Wanamaker, Steve; Lucas, Mitchell R; Close, Timothy J; Roberts, Philip A

The cowpea aphid Aphis craccivora Koch (CPA) is a destructive insect pest of cowpea, a staple legume crop in Sub-Saharan Africa and other semiarid warm tropics and subtropics. In California, CPA causes damage on all local cultivars from early vegetative to pod development growth stages. Sources of CPA resistance are available in African cowpea germplasm. However, their utilization in breeding is limited by the lack of information on inheritance, genomic location and marker linkage associations of the resistance determinants. In the research reported here, a recombinant inbred line (RIL) population derived from a cross between a susceptible California blackeye cultivar (CB27) and a resistant African breeding line (IT97K-556-6) was genotyped with 1,536 SNP markers. The RILs and parents were phenotyped for CPA resistance using field-based screenings during two main crop seasons in a 'hotspot' location for this pest within the primary growing region of the Central Valley of California. One minor and one major quantitative trait locus (QTL) were consistently mapped on linkage groups 1 and 7, respectively, both with favorable alleles contributed from IT97K-556-6. The major QTL appeared dominant based on a validation test in a related F2 population. SNP markers flanking each QTL were positioned in physical contigs carrying genes involved in plant defense based on synteny with related legumes. These markers could be used to introgress resistance alleles from IT97K-556-6 into susceptible local blackeye varieties by backcrossing.
A New Chicken Genome Assembly Provides Insight into Avian Genome Structure.

PubMed

Warren, Wesley C; Hillier, LaDeana W; Tomlinson, Chad; Minx, Patrick; Kremitzki, Milinn; Graves, Tina; Markovic, Chris; Bouk, Nathan; Pruitt, Kim D; Thibaud-Nissen, Francoise; Schneider, Valerie; Mansour, Tamer A; Brown, C Titus; Zimin, Aleksey; Hawken, Rachel; Abrahamsen, Mitch; Pyrkosz, Alexis B; Morisson, Mireille; Fillon, Valerie; Vignal, Alain; Chow, William; Howe, Kerstin; Fulton, Janet E; Miller, Marcia M; Lovell, Peter; Mello, Claudio V; Wirthlin, Morgan; Mason, Andrew S; Kuo, Richard; Burt, David W; Dodgson, Jerry B; Cheng, Hans H

2017-01-05

The importance of the Gallus gallus (chicken) as a model organism and agricultural animal merits a continuation of sequence assembly improvement efforts. We present a new version of the chicken genome assembly (Gallus_gallus-5.0; GCA_000002315.3), built from combined long single molecule sequencing technology, finished BACs, and improved physical maps. In overall assembled bases, we see a gain of 183 Mb, including 16.4 Mb in placed chromosomes with a corresponding gain in the percentage of intact repeat elements characterized. Of the 1.21 Gb genome, we include three previously missing autosomes, GGA30, 31, and 33, and improve sequence contig length 10-fold over the previous Gallus_gallus-4.0. Despite the significant base representation improvements made, 138 Mb of sequence is not yet located to chromosomes. When annotated for gene content, Gallus_gallus-5.0 shows an increase of 4679 annotated genes (2768 noncoding and 1911 protein-coding) over those in Gallus_gallus-4.0. We also revisited the question of what genes are missing in the avian lineage, as assessed by the highest quality avian genome assembly to date, and found that a large fraction of the original set of missing genes are still absent in sequenced bird species. Finally, our new data support a detailed map of MHC-B, encompassing two segments: one with a highly stable gene copy number and another in which the gene copy number is highly variable. The chicken model has been a critical resource for many other fields of study, and this new reference assembly will substantially further these efforts. Copyright © 2017 Warren et al.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Aplin, H.M.; Hirst, K.L.; Crosby, A.H.

Dentinogenesis imperfecta type II (DGI1) is an autosomal dominant disorder of dentin formation, which has been mapped to human chromosome 4q12-q21. The region most likely to contain the DGI1 locus is a 3.2-cM region surrounding the osteopontin (SPP1) locus. Recently, a novel dentin-specific acidic phosphoprotein (dmp1) has been cloned in the rat and mapped to mouse chromosome 5q21. In the current investigation, we have isolated a cosmid containing the human DMP1 gene. The isolation of a short tandem repeat polymorphism at this locus has allowed us to map the DMP1 locus to human chromosome 4q21 and demonstrate that it is tightly linked to DGI1 in two families (Zmax = 11.01, θ = 0.001). The creation of a yeast artificial chromosome contig around SPP1 has further allowed us to demonstrate that DMP1 is located within 150 kb of the bone sialoprotein and 490 kb of the SPP1 loci, respectively. DMP1 is therefore a strong candidate for the DGI1 locus. 12 refs., 2 figs., 1 tab.
U50: A New Metric for Measuring Assembly Output Based on Non-Overlapping, Target-Specific Contigs.

PubMed

Castro, Christina J; Ng, Terry Fei Fan

2017-11-01

Advances in next-generation sequencing technologies enable routine genome sequencing, generating millions of short reads. A crucial step for full genome analysis is the de novo assembly, and currently, performance of different assembly methods is measured by a metric called N50. However, the N50 value can produce skewed, inaccurate results when complex data are analyzed, especially for viral and microbial datasets. To provide a better assessment of assembly output, we developed a new metric called U50. The U50 identifies unique, target-specific contigs by using a reference genome as baseline, aiming at circumventing some limitations that are inherent to the N50 metric. Specifically, the U50 program removes overlapping sequence of multiple contigs by utilizing a mask array, so the performance of the assembly is only measured by unique contigs. We compared simulated and real datasets by using U50 and N50, and our results demonstrated that U50 has the following advantages over N50: (1) reducing erroneously large N50 values due to a poor assembly, (2) eliminating overinflated N50 values caused by large measurements from overlapping contigs, (3) eliminating diminished N50 values caused by an abundance of small contigs, and (4) allowing comparisons across different platforms or samples based on the new percentage-based metric UG50%. The use of the U50 metric allows for a more accurate measure of assembly performance by analyzing only the unique, non-overlapping contigs. In addition, most viral and microbial sequencing have high background noise (i.e., host and other non-targets), which contributes to having a skewed, misrepresented N50 value; this is corrected by U50. Also, the UG50% can be used to compare assembly results from different samples or studies, the cross-comparisons of which cannot be performed with N50.
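The contrast between the two metrics can be illustrated with a short Python sketch. It is a simplified reading of the abstract, not the published U50 program: contigs are assumed to be already aligned to the reference and given as hypothetical half-open (start, end) intervals, the function names are illustrative, and the real tool may order or filter contigs differently.

```python
def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least
    half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def unique_lengths(intervals, ref_len):
    """Length each contig contributes after masking reference positions
    that a longer (or earlier) contig has already covered."""
    mask = [False] * ref_len  # one flag per reference position
    result = []
    by_length = sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True)
    for start, end in by_length:
        fresh = 0
        for pos in range(max(start, 0), min(end, ref_len)):
            if not mask[pos]:
                mask[pos] = True
                fresh += 1
        if fresh:
            result.append(fresh)
    return result


def u50(intervals, ref_len):
    """N50-style statistic computed on unique, non-overlapping lengths only."""
    return n50(unique_lengths(intervals, ref_len))


if __name__ == "__main__":
    # Two contigs duplicate the same 40-bp region of a 100-bp reference.
    contigs = [(0, 40), (0, 40), (40, 70), (70, 100)]
    print(n50([end - start for start, end in contigs]))  # 40
    print(u50(contigs, 100))                             # 30
```

In this toy case the duplicated contig inflates N50 to 40, whereas masking the overlap yields 30, which matches the second advantage listed in the abstract.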
OSLay: optimal syntenic layout of unfinished assemblies.

PubMed

Richter, Daniel C; Schuster, Stephan C; Huson, Daniel H

2007-07-01

The whole genome shotgun approach to genome sequencing results in a collection of contigs that must be ordered and oriented to facilitate efficient gap closure. We present a new tool OSLay that uses synteny between matching sequences in a target assembly and a reference assembly to lay out the contigs (or scaffolds) in the target assembly. The underlying algorithm is based on maximum weight matching. The tool provides an interactive visualization of the computed layout and the result can be imported into the assembly editing tool Consed to support the design of primer pairs for gap closure. To enhance efficiency in the gap closure phase of a genome project it is crucial to know which contigs are adjacent in the target genome. Related genome sequences can be used to lay out contigs in an assembly. OSLay is freely available from: http://www-ab.informatik.unituebingen.de/software/oslay.
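The idea of ordering and orienting contigs from reference matches can be illustrated with a deliberately simpler stand-in. The sketch below does not implement OSLay's maximum weight matching; it assumes a hypothetical input of `(contig_id, ref_start, ref_end, strand)` tuples, places each contig at the length-weighted mean midpoint of its matches, and picks the strand supported by the most matched bases.

```python
from collections import defaultdict


def layout_contigs(matches):
    """Order and orient contigs along a reference (simplified heuristic).

    matches: iterable of (contig_id, ref_start, ref_end, strand),
             with strand '+' or '-' and ref_start < ref_end.
    Returns a list of (contig_id, orientation) in reference order.
    """
    weight = defaultdict(int)      # matched bases per contig
    position = defaultdict(float)  # length-weighted sum of match midpoints
    forward = defaultdict(int)     # forward minus reverse matched bases
    for contig_id, ref_start, ref_end, strand in matches:
        length = ref_end - ref_start
        if length <= 0:
            continue  # ignore empty or malformed matches
        weight[contig_id] += length
        position[contig_id] += length * (ref_start + ref_end) / 2
        forward[contig_id] += length if strand == "+" else -length
    ordered = sorted(weight, key=lambda c: position[c] / weight[c])
    return [(c, "+" if forward[c] >= 0 else "-") for c in ordered]


matches = [
    ("c2", 5000, 6000, "-"),
    ("c1", 100, 900, "+"),
    ("c2", 6100, 6300, "+"),
    ("c3", 3000, 3500, "+"),
]
layout = layout_contigs(matches)
```

A matching-based formulation, as described in the abstract, would instead score candidate adjacencies between contig ends and select a consistent set of them; the heuristic here only shows what the input and output of a layout step look like.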
ESTminer: a Web interface for mining EST contig and cluster databases.

PubMed

Huang, Yecheng; Pumphrey, Janie; Gingle, Alan R

2005-03-01

ESTminer is a Web application and database schema for interactive mining of expressed sequence tag (EST) contig and cluster datasets. The Web interface contains a query frame that allows the selection of contigs/clusters with specific cDNA library makeup or a threshold number of members. The results are displayed as color-coded tree nodes, where the color indicates the fractional size of each cDNA library component. The nodes are expandable, revealing library statistics as well as EST or contig members, with links to sequence data, GenBank records or user configurable links. Also, the interface allows 'queries within queries' where the result set of a query is further filtered by the subsequent query. ESTminer is implemented in Java/JSP and the package, including MySQL and Oracle schema creation scripts, is available from http://cggc.agtec.uga.edu/Data/download.asp. Contact: agingle@uga.edu.
Identification of expressed sequences in the coffee genome potentially associated with somatic embryogenesis.

PubMed

Silva, A T; Paiva, L V; Andrade, A C; Barduche, D

2013-05-21

Brazil possesses the most modern and productive coffee growing farms in the world, but technological development is desired to cope with the increasing world demand. One way to increase Brazilian coffee growing productivity is wide scale production of clones with superior genotypes, which can be obtained with in vitro propagation technique, or from tissue culture. These procedures can generate thousands of clones. However, the methodologies for in vitro cultivation are genotype-dependent, which leads to an almost empirical development of specific protocols for each species. Therefore, molecular markers linked to the biochemical events of somatic embryogenesis would greatly facilitate the development of such protocols. In this context, sequences potentially involved in embryogenesis processes in the coffee plant were identified in silico from libraries generated by the Brazilian Coffee Genome Project. Through these in silico analyses, we identified 15 EST-contigs related to the embryogenesis process. Among these, 5 EST-contigs (3605, 9850, 13686, 17240, and 17265) could readily be associated with plant embryogenesis. Sequence analysis of EST-contig 3605, 9850, and 17265 revealed similarity to a polygalacturonase, to a cysteine-proteinase, and to an allergenine, respectively. Results also show that EST-contig 17265 sequences presented similarity to an expansin. Finally, analysis of EST-contig 17240 revealed similarity to a protein of unknown function, but it grouped in the similarity dendrogram with the WUSCHEL transcription factor. The data suggest that these EST-contigs are related to the embryogenic process and have potential as molecular markers to increase methodological efficiency in obtaining coffee plant embryogenic materials.
Deep Sequencing Analysis of Apple Infecting Viruses in Korea

PubMed Central

Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun

2016-01-01

Deep sequencing generated 52 contigs derived from five viruses: Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV), which were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs was from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented by the 52 contigs assembled. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, whereas all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV, all of which were found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. Deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of known and unknown virome in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694

An ovary transcriptome for all maturational stages of the striped bass (Morone saxatilis), a highly advanced perciform fish.

PubMed

Reading, Benjamin J; Chapman, Robert W; Schaff, Jennifer E; Scholl, Elizabeth H; Opperman, Charles H; Sullivan, Craig V

2012-02-21

The striped bass and its relatives (genus Morone) are important fisheries and aquaculture species native to estuaries and rivers of the Atlantic coast and Gulf of Mexico in North America. To open avenues of gene expression research on reproduction and breeding of striped bass, we generated a collection of expressed sequence tags (ESTs) from a complementary DNA (cDNA) library representative of their ovarian transcriptome. Sequences of a total of 230,151 ESTs (51,259,448 bp) were acquired by Roche 454 pyrosequencing of cDNA pooled from ovarian tissues obtained at all stages of oocyte growth, at ovulation (eggs), and during preovulatory atresia. Quality filtering of ESTs allowed assembly of 11,208 high-quality contigs ≥ 100 bp, including 2,984 contigs 500 bp or longer (average length 895 bp). Blastx comparisons revealed 5,482 gene orthologues (E-value < 10^-3), of which 4,120 (36.7% of total contigs) were annotated with Gene Ontology terms (E-value < 10^-6). There were 5,726 remaining unknown unique sequences (51.1% of total contigs). All of the high-quality EST sequences are available in the National Center for Biotechnology Information (NCBI) Short Read Archive (GenBank: SRX007394). Informative contigs were considered to be abundant if they were assembled from groups of ESTs comprising ≥ 0.15% of the total short read sequences (≥ 345 reads/contig). Approximately 52.5% of these abundant contigs were predicted to have predominant ovary expression through digital differential display in silico comparisons to zebrafish (Danio rerio) UniGene orthologues. Over 1,300 Gene Ontology terms from Biological Process classes of Reproduction, Reproductive process, and Developmental process were assigned to this collection of annotated contigs. This first large reference sequence database available for the ecologically and economically important temperate basses (genus Morone) provides a foundation for gene expression studies in these species.
The predicted predominance of ovary gene expression and assignment of directly relevant Gene Ontology classes suggests a powerful utility of this dataset for analysis of ovarian gene expression related to fundamental questions of oogenesis. Additionally, a high definition Agilent 60-mer oligo ovary 'UniClone' microarray with 8 × 15,000 probe format has been designed based on this striped bass transcriptome (eArray Group: Striper Group, Design ID: 029004).
Simplifier: a web tool to eliminate redundant NGS contigs.

PubMed

Ramos, Rommel Thiago Jucá; Carneiro, Adriana Ribeiro; Azevedo, Vasco; Schneider, Maria Paula; Barh, Debmalya; Silva, Artur

2012-01-01

Modern genomic sequencing technologies produce a large amount of data with reduced cost per base; however, this data consists of short reads. This reduction in the size of the reads, compared to those obtained with previous methodologies, presents new challenges, including a need for efficient algorithms for the assembly of genomes from short reads and for resolving repetitions. Additionally, after ab initio assembly, curation of the hundreds or thousands of contigs generated by assemblers demands considerable time and computational resources. We developed Simplifier, a stand-alone software that selectively eliminates redundant sequences from the collection of contigs generated by ab initio assembly of genomes. Application of Simplifier to data generated by assembly of the genome of Corynebacterium pseudotuberculosis strain 258 reduced the number of contigs generated by ab initio methods from 8,004 to 5,272, a reduction of 34.14%; in addition, N50 increased from 1 kb to 1.5 kb. Processing the contigs of Escherichia coli DH10B with Simplifier reduced the mate-paired library 17.47% and the fragment library 23.91%. Simplifier removed redundant sequences from datasets produced by assemblers, thereby reducing the effort required for finalization of genome assembly in tests with data from prokaryotic organisms. Simplifier is available at http://www.genoma.ufpa.br/rramos/softwares/simplifier.xhtml. It requires Sun JDK 6 or higher.
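The abstract does not state which redundancy criteria Simplifier applies, so the following is only an illustrative sketch of one common definition: a contig is treated as redundant if it occurs exactly, on either strand, inside a longer contig that has already been kept. The function names and the input format (a dict of name to sequence) are assumptions made for this example.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def drop_contained(contigs):
    """Remove contigs that are exact substrings of a longer kept contig.

    contigs: dict mapping contig name -> sequence.
    Returns a dict of the kept contigs (sequences upper-cased).
    Containment is checked on both strands; exact duplicates are
    reduced to a single copy.
    """
    kept = {}
    by_length = sorted(contigs.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, seq in by_length:
        seq = seq.upper()
        rc = reverse_complement(seq)
        if any(seq in other or rc in other for other in kept.values()):
            continue  # redundant with something already kept
        kept[name] = seq
    return kept


contigs = {
    "a": "ACGTACGGTTCA",
    "b": "ACGG",   # contained in "a" on the forward strand
    "c": "CCGT",   # reverse complement (ACGG) is contained in "a"
    "d": "TTTTT",  # not contained anywhere
}
kept = drop_contained(contigs)
```

Exact substring matching is quadratic in the number of contigs and ignores near-identical sequences, so a production tool would need indexing and some tolerance for mismatches.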
Transcriptome Analysis of Thapsia laciniata Rouy Provides Insights into Terpenoid Biosynthesis and Diversity in Apiaceae

PubMed Central

Drew, Damian Paul; Dueholm, Bjørn; Weitzel, Corinna; Zhang, Ye; Sensen, Christoph W.; Simonsen, Henrik Toft

2013-01-01

Thapsia laciniata Rouy (Apiaceae) produces irregular and regular sesquiterpenoids with thapsane and guaiene carbon skeletons, as found in other Apiaceae species. A transcriptomic analysis utilizing Illumina next-generation sequencing enabled the identification of novel genes involved in the biosynthesis of terpenoids in Thapsia. From 66.78 million HQ paired-end reads obtained from T. laciniata roots, 64.58 million were assembled into 76,565 contigs (N50: 1261 bp). Seventeen contigs were annotated as terpene synthases and five of these were predicted to be sesquiterpene synthases. Of the 67 contigs annotated as cytochromes P450, 18 of these are part of the CYP71 clade that primarily performs hydroxylations of specialized metabolites. Three contigs annotated as aldehyde dehydrogenases grouped phylogenetically with the characterized ALDH1 from Artemisia annua and three contigs annotated as alcohol dehydrogenases grouped with the recently described ADH1 from A. annua. ALDH1 and ADH1 were characterized as part of the artemisinin biosynthesis. We have produced a comprehensive EST dataset for T. laciniata roots, which contains a large sample of the T. laciniata transcriptome. These transcriptome data provide the foundation for future research into the molecular basis for terpenoid biosynthesis in Thapsia and on the evolution of terpenoids in Apiaceae. PMID:23698765
De novo characterisation of the greenlip abalone transcriptome (Haliotis laevigata) with a focus on the heat shock protein 70 (HSP70) family.

PubMed

Shiel, Brett P; Hall, Nathan E; Cooke, Ira R; Robinson, Nicholas A; Strugnell, Jan M

2015-02-01

Abalone (Haliotis) are economically important molluscs for fisheries and aquaculture industries worldwide. Despite this, genomic resources for abalone and molluscs are still limited. Here we present a description and functional annotation of the greenlip abalone (Haliotis laevigata) transcriptome. We present a focused analysis on the heat shock protein 70 (HSP70) family of genes with putative functions affecting temperature stress and immunity. A total of ~38 million paired end Illumina reads were obtained, resulting in a Trinity assembly of 222,172 contigs with minimum length of 200 base pairs and maximum length of 33 kilobases. The 20,702 contigs were annotated with gene descriptions by BLAST. We created a program to maximise the number of functionally annotated genes, and over 10,000 contigs were assigned Gene ontologies (GO terms). By using CateGOrizer, immunity related GO terms for stressors such as heat, hypoxia, oxidative stress and wounding received the highest counts. Twenty-six contigs with homology to the HSP70 family of genes were identified. Ninety-one putative single-nucleotide polymorphisms were observed in the abalone HSP70 contigs. Eleven of these were considered non-synonymous. The annotated transcriptome described in this study will be a useful basis for future work investigating the genetic response of abalone to stress.
Comparative Transcriptome Analysis Identifies Candidate Genes Related to Skin Color Differentiation in Red Tilapia.

PubMed

Zhu, Wenbin; Wang, Lanmei; Dong, Zaijie; Chen, Xingting; Song, Feibiao; Liu, Nian; Yang, Hui; Fu, Jianjun

2016-08-11

Red tilapia has become more popular for aquaculture production in China in recent years. However, pigmentation differentiation during genetic breeding is the main problem limiting the development of commercial red tilapia culture, and the genetic basis of skin color variation is still unknown. In this study, we conducted Illumina sequencing of the transcriptome of three color varieties of red tilapia. A total of 224,895,758 reads were generated, resulting in 160,762 assembled contigs that were used as reference contigs. The contigs of the red tilapia transcriptome had hits in the range of 53.4% to 86.7% of the unique proteins of zebrafish, fugu, medaka, three-spined stickleback and tilapia. In addition, 44,723 contigs containing 77,423 simple sequence repeats (SSRs) were identified, with 16,646 contigs containing more than one SSR. Three skin transcriptomes were compared pairwise and the results revealed that there were 148 common significantly differentially expressed unigenes, including several key genes related to pigment synthesis, i.e. tyr, tyrp1, silv, sox10, slc24a5, cbs and slc7a11. The results will facilitate understanding the molecular mechanisms of skin pigmentation differentiation in red tilapia and accelerate the molecular selection of the specific strain with consistent skin colors.
Analysis of complex repeat sequences within the spinal muscular atrophy (SMA) candidate region in 5q13

DOE Office of Scientific and Technical Information (OSTI.GOV)

Davies, K.E.; Morrison, K.E.; Daniels, R.I.

1994-09-01

We previously reported that the 400 kb interval flanked by the polymorphic loci D5S435 and D5S557 contains blocks of a chromosome 5 specific repeat. This interval also defines the SMA candidate region by genetic analysis of recombinant families. A YAC contig of 2-3 Mb encompassing this area has been constructed and a 5.5 kb conserved fragment, isolated from a YAC end clone within the above interval, was used to obtain cDNAs from both fetal and adult brain libraries. We describe the identification of cDNAs with stretches of high DNA sequence homology to exons of β-glucuronidase on human chromosome 7. The cDNAs map both to the candidate region and to an area of 5p using FISH and deletion hybrid analysis. Hybridization to bacteriophage and cosmid clones from the YACs localizes the β-glucuronidase related sequences within the 400 kb region of the YAC contig. The cDNAs show a polymorphic pattern on hybridization to genomic BamH1 fragments in the size range of 10-250 kb. Further analysis using YAC fragmentation vectors is being used to determine how these β-glucuronidase related cDNAs are distributed within 5q13. Dinucleotide repeats within the region are being investigated to determine linkage disequilibrium with the disease locus.
RNA-seq de novo Assembly Reveals Differential Gene Expression in Glossina palpalis gambiensis Infected with Trypanosoma brucei gambiense vs. Non-Infected and Self-Cured Flies

PubMed Central

Hamidou Soumana, Illiassou; Klopp, Christophe; Ravel, Sophie; Nabihoudine, Ibouniyamine; Tchicaya, Bernadette; Parrinello, Hugues; Abate, Luc; Rialle, Stéphanie; Geiger, Anne

2015-01-01

Trypanosoma brucei gambiense (Tbg), causing the sleeping sickness chronic form, completes its developmental cycle within the tsetse fly vector Glossina palpalis gambiensis (Gpg) before its transmission to humans. Within the framework of an anti-vector disease control strategy, a global gene expression profiling of trypanosome infected (susceptible), non-infected, and self-cured (refractory) tsetse flies was performed, on their midguts, to determine differential genes expression resulting from in vivo trypanosomes, tsetse flies (and their microbiome) interactions. An RNAseq de novo assembly was achieved. The assembled transcripts were mapped to reference sequences for functional annotation. Twenty-four percent of the 16,936 contigs could not be annotated, possibly representing untranslated mRNA regions, or Gpg- or Tbg-specific ORFs. The remaining contigs were classified into 65 functional groups. Only a few transposable elements were present in the Gpg midgut transcriptome, which may represent active transpositions and play regulatory roles. One thousand three hundred and seventy three genes differentially expressed (DEGs) between stimulated and non-stimulated flies were identified at day-3 post-feeding; 52 and 1025 between infected and self-cured flies at 10 and 20 days post-feeding, respectively. The possible roles of several DEGs regarding fly susceptibility and refractoriness are discussed. The results provide new means to decipher fly infection mechanisms, crucial to develop anti-vector control strategies. PMID:26617594
Isolation of a yeast artificial chromosome contig spanning the Greig cephalopolysyndactyly syndrome (GCPS) gene region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vortkamp, A.; Gessler, M.; Le Paslier, D.

1994-08-01

Disruption of the zinc finger gene GLI3 has been shown to be the cause of Greig cephalopolysyndactyly syndrome (GCPS), at least in some GCPS translocation patients. To characterize this genomic region on human chromosome 7p13, we have isolated a YAC contig of more than 1000 kb including the GLI3 gene. In this contig the gene itself spans at least 200-250 kb. A CpG island is located in the vicinity of the 5′ region of the known GLI3 cDNA, implying a potential promoter region. 28 refs., 3 figs., 1 tab.
Identification and characterization of a new multigene family in the human MHC: A candidate autoimmune disease susceptibility element (3.8-1)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Harris, J.M.; Venditti, C.P.; Chorney, M.J.

1994-09-01

An association between idiopathic hemochromatosis (HFE) and the HLA-A3 locus has been previously well-established. In an attempt to identify potential HFE candidate genes, a genomic DNA fragment distal to the HLA-A9 breakpoint was used to screen a B cell cDNA library; a member (3.8-1) of a new multigene family, composed of five distinct genomic cross-reactive fragments, was identified. Clone 3.8-1 represents the 3′ end of a 9.6 kb transcript which is expressed in multiple tissues including the spleen, thymus, lung and kidney. Sequencing and genome database analysis indicate that 3.8-1 is unique, with no homology to any known entries. The genomic residence of 3.8-1, defined by polymorphism analysis and physical mapping using YAC clones, appears to be absent from the genomes of higher primates, although four other cross-reactivities are maintained. The absence of this gene, as well as other probes which map in the TNF to HLA-B interval, suggests that this portion of the human MHC, located between the Class I and Class III regions, arose in humans as the result of a post-speciation insertional event. The large size of the 3.8-1 gene and the possible categorization of 3.8-1 as a human-specific gene are significant given the genetic data that place an autoimmune susceptibility element for IDDM and myasthenia gravis in the precise region where this gene resides. In an attempt to isolate the 5′ end of this large transcript, we have constructed a cosmid contig which encompasses the genomic locus of this gene and are progressively isolating coding sequences by exon trapping.
Sequence and gene content of a large fragment of a lizard sex chromosome and evaluation of candidate sex differentiating gene R-spondin 1

PubMed Central

2013-01-01

Background Scant genomic information from non-avian reptile sex chromosomes is available, and for only a few lizards, several snakes and one turtle species, and it represents only a small fraction of the total sex chromosome sequences in these species. Results We report 352 kb of contiguous sequence from the sex chromosome of a squamate reptile, Pogona vitticeps, with a ZZ/ZW sex microchromosome system. This contig contains five protein coding genes (oprd1, rcc1, znf91, znf131, znf180), and major families of repetitive sequences with a high number of copies of LTR and non-LTR retrotransposons, including the CR1 and Bov-B LINEs. The two genes, oprd1 and rcc1, are part of a homologous syntenic block, which is conserved among amniotes. While oprd1 and rcc1 have no known function in sex determination or differentiation in amniotes, this homologous syntenic block in mammals and chicken also contains R-spondin 1 (rspo1), the ovarian differentiating gene in mammals. In order to explore the probability that rspo1 is sex determining in dragon lizards, genomic BAC and cDNA clones were mapped using fluorescence in situ hybridisation. Their location on an autosomal microchromosome pair, not on the ZW sex microchromosomes, eliminates rspo1 as a candidate sex determining gene in P. vitticeps. Conclusion Our study has characterized the largest contiguous stretch of physically mapped sex chromosome sequence (352 kb) from a ZZ/ZW lizard species. Although this region represents only a small fraction of the sex chromosomes of P. vitticeps, it has revealed several features typically associated with sex chromosomes including the accumulation of large blocks of repetitive sequences. PMID:24344927
Single-molecule sequencing and Hi-C-based proximity-guided assembly of amaranth (Amaranthus hypochondriacus) chromosomes provide insights into genome evolution.

PubMed

Lightfoot, D J; Jarvis, D E; Ramaraj, T; Lee, R; Jellen, E N; Maughan, P J

2017-08-31

Amaranth (Amaranthus hypochondriacus) was a food staple among the ancient civilizations of Central and South America that has recently received increased attention due to the high nutritional value of the seeds, with the potential to help alleviate malnutrition and food security concerns, particularly in arid and semiarid regions of the developing world. Here, we present a reference-quality assembly of the amaranth genome which will assist the agronomic development of the species. Utilizing single-molecule, real-time sequencing (Pacific Biosciences) and chromatin interaction mapping (Hi-C) to close assembly gaps and scaffold contigs, respectively, we improved our previously reported Illumina-based assembly to produce a chromosome-scale assembly with a scaffold N50 of 24.4 Mb. The 16 largest scaffolds contain 98% of the assembly and likely represent the haploid chromosomes (n = 16). To demonstrate the accuracy and utility of this approach, we produced physical and genetic maps and identified candidate genes for the betalain pigmentation pathway. The chromosome-scale assembly facilitated a genome-wide syntenic comparison of amaranth with other Amaranthaceae species, revealing chromosome loss and fusion events in amaranth that explain the reduction from the ancestral haploid chromosome number (n = 18) for a tetraploid member of the Amaranthaceae. The assembly method reported here minimizes cost by relying primarily on short-read technology and is one of the first reported uses of in vivo Hi-C for assembly of a plant genome. Our analyses implicate chromosome loss and fusion as major evolutionary events in the 2n = 32 amaranths and clearly establish the homoeologous relationship among most of the subgenome chromosomes, which will facilitate future investigations of intragenomic changes that occurred post polyploidization.
Upgrading short-read animal genome assemblies to chromosome level using comparative genomics and a universal probe set.

PubMed

Damas, Joana; O'Connor, Rebecca; Farré, Marta; Lenis, Vasileios Panagiotis E; Martell, Henry J; Mandawala, Anjali; Fowler, Katie; Joseph, Sunitha; Swain, Martin T; Griffin, Darren K; Larkin, Denis M

2017-05-01

Most recent initiatives to sequence and assemble new species' genomes de novo fail to achieve the ultimate endpoint to produce contigs, each representing one whole chromosome. Even the best-assembled genomes (using contemporary technologies) consist of subchromosomal-sized scaffolds. To circumvent this problem, we developed a novel approach that combines computational algorithms to merge scaffolds into chromosomal fragments, PCR-based scaffold verification, and physical mapping to chromosomes. Multigenome-alignment-guided probe selection led to the development of a set of universal avian BAC clones that permit rapid anchoring of multiple scaffolds to chromosomes on all avian genomes. As proof of principle, we assembled genomes of the pigeon (Columba livia) and peregrine falcon (Falco peregrinus) to chromosome levels comparable, in continuity, to avian reference genomes. Both species are of interest for breeding, cultural, food, and/or environmental reasons. Pigeon has a typical avian karyotype (2n = 80), while falcon (2n = 50) is highly rearranged compared to the avian ancestor. By using chromosome breakpoint data, we established that avian interchromosomal breakpoints appear in the regions of low density of conserved noncoding elements (CNEs) and that the chromosomal fission sites are further limited to long CNE "deserts." This corresponds with fission being the rarest type of rearrangement in avian genome evolution. High-throughput multiple hybridization and rapid capture strategies using the current BAC set provide the basis for assembling numerous avian (and possibly other reptilian) species, while the overall strategy for scaffold assembly and mapping provides the basis for an approach that (provided metaphases can be generated) could be applied to any animal genome. © 2017 Damas et al.; Published by Cold Spring Harbor Laboratory Press.
Upgrading short-read animal genome assemblies to chromosome level using comparative genomics and a universal probe set

PubMed Central

O'Connor, Rebecca; Lenis, Vasileios Panagiotis E.; Martell, Henry J.; Mandawala, Anjali; Fowler, Katie; Joseph, Sunitha; Swain, Martin T.; Griffin, Darren K.; Larkin, Denis M.

2017-01-01

Most recent initiatives to sequence and assemble new species’ genomes de novo fail to achieve the ultimate endpoint to produce contigs, each representing one whole chromosome. Even the best-assembled genomes (using contemporary technologies) consist of subchromosomal-sized scaffolds. To circumvent this problem, we developed a novel approach that combines computational algorithms to merge scaffolds into chromosomal fragments, PCR-based scaffold verification, and physical mapping to chromosomes. Multigenome-alignment-guided probe selection led to the development of a set of universal avian BAC clones that permit rapid anchoring of multiple scaffolds to chromosomes on all avian genomes. As proof of principle, we assembled genomes of the pigeon (Columba livia) and peregrine falcon (Falco peregrinus) to chromosome levels comparable, in continuity, to avian reference genomes. Both species are of interest for breeding, cultural, food, and/or environmental reasons. Pigeon has a typical avian karyotype (2n = 80), while falcon (2n = 50) is highly rearranged compared to the avian ancestor. By using chromosome breakpoint data, we established that avian interchromosomal breakpoints appear in the regions of low density of conserved noncoding elements (CNEs) and that the chromosomal fission sites are further limited to long CNE “deserts.” This corresponds with fission being the rarest type of rearrangement in avian genome evolution. High-throughput multiple hybridization and rapid capture strategies using the current BAC set provide the basis for assembling numerous avian (and possibly other reptilian) species, while the overall strategy for scaffold assembly and mapping provides the basis for an approach that (provided metaphases can be generated) could be applied to any animal genome. PMID:27903645
Detection of a megabase deletion in a patient with branchio-oto-renal syndrome (BOR) and tricho-rhino-phalangeal syndrome (TRPS): Implications for mapping and cloning the BOR gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gu, J.Z.; Wells, D.E.; Wagner, M.J.

Genetic linkage analysis has previously mapped the locus for the autosomal dominant disorder branchio-oto-renal syndrome (BOR) to the pericentric region of chromosome 8q. A YAC contig spanning the putative BOR region, from D8S543 to D8S541, was constructed and confirmed by sequence-tagged site content mapping using microsatellite markers and by DNA hybridization analysis. YACs spanning the BOR interval were used as fluorescence in situ hybridization probes on a cell line from a patient with BO and tricho-rhino-phalangeal syndrome I that involves a chromosome 8q rearrangement. In addition to the cytogenetically defined direct insertion of material from 8q13.3-q21.13 into 8q24.11, a previously unidentified deletion of just under one megabase was found in 8q13.3. These data narrowed the most likely location of the BOR gene to a region corresponding to the proximal two-thirds of YAC 869E10 between D8S543 and D8S279. 23 refs., 3 figs.
CBrowse: a SAM/BAM-based contig browser for transcriptome assembly visualization and analysis.

PubMed

Li, Pei; Ji, Guoli; Dong, Min; Schmidt, Emily; Lenox, Douglas; Chen, Liangliang; Liu, Qi; Liu, Lin; Zhang, Jie; Liang, Chun

2012-09-15

To address the impending need for exploring rapidly increased transcriptomics data generated for non-model organisms, we developed CBrowse, an AJAX-based web browser for visualizing and analyzing transcriptome assemblies and contigs. Designed in a standard three-tier architecture with a data pre-processing pipeline, CBrowse is essentially a Rich Internet Application that offers many seamlessly integrated web interfaces and allows users to navigate, sort, filter, search and visualize data smoothly. The pre-processing pipeline takes the contig sequence file in FASTA format and its relevant SAM/BAM file as the input; detects putative polymorphisms, simple sequence repeats and sequencing errors in contigs; and generates image, JSON and database-compatible CSV text files that are directly utilized by different web interfaces. CBrowse is a generic visualization and analysis tool that facilitates close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors in transcriptome sequencing projects. CBrowse is distributed under the GNU General Public License, available at http://bioinfolab.muohio.edu/CBrowse/. Contact: liangc@muohio.edu or liangc.mu@gmail.com; glji@xmu.edu.cn. Supplementary data are available at Bioinformatics online.
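One of the pre-processing steps named above, detection of simple sequence repeats (SSRs) in contig sequences, can be sketched with a regular expression. This is a generic approach and not CBrowse's own code; the function name `find_ssrs`, its parameters and their default thresholds (motifs of 2 to 6 bases, at least four tandem copies) are assumptions chosen for the example.

```python
import re


def find_ssrs(sequence, min_unit=2, max_unit=6, min_repeats=4):
    """Find perfect tandem repeats in a DNA sequence.

    Returns a list of (start, motif, repeat_count) with 0-based starts.
    Runs of a single base (for example AAAAAAAA) are skipped, because
    the pattern would otherwise report them as 'AA' repeats.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # homopolymer run, not reported here
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


contig = "GGT" + "AC" * 5 + "TTT" + "CAG" * 4 + "A" * 10
ssrs = find_ssrs(contig)
```

The lazy quantifier makes the engine try the shortest motif first, so a run such as `ACACACAC` is reported with motif `AC` rather than `ACAC`. Imperfect repeats, which real SSR finders often tolerate, are not handled.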
Exploiting a wheat EST database to assess genetic diversity

PubMed Central

2010-01-01

Expressed sequence tag (EST) markers have been used to assess variety and genetic diversity in wheat (Triticum aestivum). In this study, 1549 ESTs from wheat infested with yellow rust were used to examine the genetic diversity of six susceptible and resistant wheat cultivars. The aim of using these cultivars was to improve the competitiveness of public wheat breeding programs through the intensive use of modern, particularly marker-assisted, selection technologies. The F2 individuals derived from cultivar crosses were screened for resistance to yellow rust at the seedling stage in greenhouses and adult stage in the field to identify DNA markers genetically linked to resistance. Five hundred and sixty ESTs were assembled into 136 contigs and 989 singletons. BlastX search results showed that 39 (29%) contigs and 96 (10%) singletons were homologous to wheat genes. The database-matched contigs and singletons were assigned to eight functional groups related to protein synthesis, photosynthesis, metabolism and energy, stress proteins, transporter proteins, protein breakdown and recycling, cell growth and division and reactive oxygen scavengers. PCR analyses with primers based on the contigs and singletons showed that the most polymorphic functional categories were photosynthesis (contigs) and metabolism and energy (singletons). EST analysis revealed considerable genetic variability among the Turkish wheat cultivars resistant and susceptible to yellow rust disease and allowed calculation of the mean genetic distance between cultivars, with the greatest similarity (0.725) being between Harmankaya99 and Sönmez2001, and the lowest (0.622) between Aytin98 and Izgi01. PMID:21637582
Exploiting a wheat EST database to assess genetic diversity.

PubMed

Karakas, Ozge; Gurel, Filiz; Uncuoglu, Ahu Altinkut

2010-10-01

Expressed sequence tag (EST) markers have been used to assess variety and genetic diversity in wheat (Triticum aestivum). In this study, 1549 ESTs from wheat infested with yellow rust were used to examine the genetic diversity of six susceptible and resistant wheat cultivars. The aim of using these cultivars was to improve the competitiveness of public wheat breeding programs through the intensive use of modern, particularly marker-assisted, selection technologies. The F(2) individuals derived from cultivar crosses were screened for resistance to yellow rust at the seedling stage in greenhouses and adult stage in the field to identify DNA markers genetically linked to resistance. Five hundred and sixty ESTs were assembled into 136 contigs and 989 singletons. BlastX search results showed that 39 (29%) contigs and 96 (10%) singletons were homologous to wheat genes. The database-matched contigs and singletons were assigned to eight functional groups related to protein synthesis, photosynthesis, metabolism and energy, stress proteins, transporter proteins, protein breakdown and recycling, cell growth and division and reactive oxygen scavengers. PCR analyses with primers based on the contigs and singletons showed that the most polymorphic functional categories were photosynthesis (contigs) and metabolism and energy (singletons). EST analysis revealed considerable genetic variability among the Turkish wheat cultivars resistant and susceptible to yellow rust disease and allowed calculation of the mean genetic distance between cultivars, with the greatest similarity (0.725) being between Harmankaya99 and Sönmez2001, and the lowest (0.622) between Aytin98 and Izgi01.
Transcriptomic Analysis and the Expression of Disease-Resistant Genes in Oryza meyeriana under Native Condition

PubMed Central

He, Bin; Tao, Xiang; Gu, Yinghong; Wei, Changhe; Cheng, Xiaojie; Xiao, Suqin; Cheng, Zaiquan; Zhang, Yizheng

2015-01-01

Oryza meyeriana (O. meyeriana), with a GG genome type (2n = 24), has accumulated many valuable traits, including resistance to diseases such as rice shade and blast and even immunity to bacterial blight. It is important to know whether disease-resistance genes exist and are expressed in this wild rice under native conditions. However, limited genomic or transcriptomic data for O. meyeriana are currently available. In this study, we present the first comprehensive characterization of the O. meyeriana transcriptome using RNA-seq and obtained 185,323 contigs with an average length of 1,692 bp and an N50 of 2,391 bp. Differential expression analysis showed that roots had the most tissue-specifically expressed genes, followed by stems and leaves. By similarity search against protein databases, 146,450 contigs had at least one significant alignment to existing gene models. Comparison with the Oryza sativa (japonica-type Nipponbare and indica-type 93–11) genomes revealed that 13% of the O. meyeriana contigs had not been detected in O. sativa. Many disease-resistance genes, such as those for resistance to bacterial blight, blast, rust, fusarium, cyst nematode and downy mildew, were mined from the transcriptomic database. Two kinds of rice bacterial blight-resistance genes (Xa1 and Xa26) are differentially or specifically expressed in O. meyeriana. The four Xa1 contigs were all expressed only in roots, while three of the Xa26 contigs had their highest expression level in leaves, two had their highest expression level in stems and one was expressed predominantly in roots. The transcriptomic database of O. meyeriana has been constructed and many disease-resistance genes were found to be expressed under native conditions, which provides a foundation for future discovery of novel genes and a basis for studying the molecular mechanisms associated with disease resistance in O. meyeriana. PMID:26640944
MEETING: Chlamydomonas Annotation Jamboree - October 2003

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grossman, Arthur R

2007-04-13

Shotgun sequencing of the nuclear genome of Chlamydomonas reinhardtii (Chlamydomonas throughout) was performed at an approximate 10X coverage by JGI. Roughly half of the genome is now contained on 26 scaffolds, all of which are at least 1.6 Mb, and the coverage of the genome is ~95%. There are now over 200,000 cDNA sequence reads that we have generated as part of the Chlamydomonas genome project (Grossman, 2003; Shrager et al., 2003; Grossman et al. 2007; Merchant et al., 2007); other sequences have also been generated by the Kazusa sequence group (Asamizu et al., 1999; Asamizu et al., 2000) or individual laboratories that have focused on specific genes. Shrager et al. (2003) placed the reads into distinct contigs (an assemblage of reads with overlapping nucleotide sequences), and contigs that group together as part of the same genes have been designated ACEs (assembly of contigs generated from EST information). All of the reads have also been mapped to the Chlamydomonas nuclear genome and the cDNAs and their corresponding genomic sequences have been reassembled, and the resulting assemblage is called an ACEG (an Assembly of contiguous EST sequences supported by genomic sequence) (Jain et al., 2007). Most of the unique genes or ACEGs are also represented by gene models that have been generated by the Joint Genome Institute (JGI, Walnut Creek, CA). These gene models have been placed onto the DNA scaffolds and are presented as a track on the Chlamydomonas genome browser associated with the genome portal (http://genome.jgi-psf.org/Chlre3/Chlre3.home.html). Ultimately, the meeting grant awarded by DOE has helped enormously in the development of an annotation pipeline (a set of guidelines used in the annotation of genes) and resulted in high quality annotation of over 4,000 genes; the annotators were from both Europe and the USA. Some of the people who led the annotation initiative were Arthur Grossman, Olivier Vallon, and Sabeeha Merchant (with many individual annotators from Europe and the USA). Olivier Vallon has been most active in continued input of annotation information.
Genome puzzle master (GPM): an integrated pipeline for building and editing pseudomolecules from fragmented sequences.

PubMed

Zhang, Jianwei; Kudrna, Dave; Mu, Ting; Li, Weiming; Copetti, Dario; Yu, Yeisoo; Goicoechea, Jose Luis; Lei, Yang; Wing, Rod A

2016-10-15

Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool, Genome Puzzle Master (GPM), that enables the integration of additional genomic signposts to edit and build 'new-gen-assemblies' that result in high-quality 'annotation-ready' pseudomolecules. With GPM, loaded datasets can be connected to each other via their logical relationships, which accomplishes tasks to 'group,' 'merge,' 'order and orient' sequences in a draft assembly. Manual editing can also be performed with a user-friendly graphical interface. Final pseudomolecules reflect a user's total data package and are available for long-term project management. GPM is a web-based pipeline and an important part of a Laboratory Information Management System (LIMS) which can be easily deployed on local servers for any genome research laboratory. The GPM (with LIMS) package is available at https://github.com/Jianwei-Zhang/LIMS. Contacts: jzhang@mail.hzau.edu.cn or rwing@mail.arizona.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

The Friedreich ataxia critical region spans a 150-kb interval on chromosome 9q13

DOE Office of Scientific and Technical Information (OSTI.GOV)

Montermini, L.; Zara, F.; Patel, P.I.

1995-11-01

By analysis of crossovers in key recombinant families and by homozygosity analysis of inbred families, the Friedreich ataxia (FRDA) locus was localized in a 300-kb interval between the X104 gene and the microsatellite marker FR8 (D9S888). By homology searches of the sequence databases, we identified X104 as the human tight junction protein ZO-2 gene. We generated a large-scale physical map of the FRDA region by pulsed-field gel electrophoresis analysis of genomic DNA and of three YAC clones derived from different libraries, and we constructed an uninterrupted cosmid contig spanning the FRDA locus. The cAMP-dependent protein kinase γ-catalytic subunit gene was identified within the critical FRDA interval, but it was excluded as a candidate because of its biological properties and because of lack of mutations in FRDA patients. Six new polymorphic markers were isolated between FR2 (D9S886) and FR8 (D9S888), which were used for homozygosity analysis in a family in which parents of an affected child are distantly related. An ancient recombination involving the centromeric FRDA flanking markers had been previously demonstrated in this family. Homozygosity analysis indicated that the FRDA gene is localized in the telomeric 150 kb of the FR2-FR8 interval. 17 refs., 3 figs., 1 tab.
[Cosmid libraries containing DNA from human chromosome 13].

PubMed

Kapanadze, B I; Brodianskiĭ, V M; Baranova, A V; Sevat'ianov, S Iu; Fedorova, N D; Kurskov, M M; Kostina, M A; Mironov, A A; Sineokiĭ, S P; Zakhar'ev, V M; Grafodatskiĭ, A S; Modianov, N N; Iankovskiĭ, N K

1996-03-01

We characterized two cosmid libraries constructed from flow-sorted chromosome 13 at the Imperial Cancer Research Fund (ICRF), UK (13,000 clones) and Los Alamos National Laboratory (LANL), USA (17,000 clones). After storage for two years, clones showed high viability (95%) and structural stability. EcoR I and Hind III restriction patterns were studied in more than 500 ICRF and 200 LANL cosmids. The average size of inserts was shown to be 35-37 kb in both the libraries. Most cosmids (83% and 93% of ICRF and LANL libraries, respectively) exceed the lower size limit of DNA fragments that can be packaged and represent a good source for physical mapping of chromosome 13. Total length of inserts is four and five genome equivalents in the ICRF and LANL libraries, respectively. ICRF cosmids showed hybridization to 22 of 24 unique probes tested, which corresponds to a 90% probability of having any DNA fragment represented in the library. More than 1 Mb of chromosome 13 is overlapped by 90 cosmids of 22 groups revealed. A chromosomal region of more than 150 kb, containing the ATP1AL1 gene for alpha-1 peptide of Na+, K(+)-ATPase, is covered by 12 cosmids forming a contig. The results of restriction and hybridization analyses are stored in a CLONE database. These data and all the cosmids described are publicly available.
Structure, tissue distribution, and chromosomal localization of the prepronociceptin gene.

PubMed

Mollereau, C; Simons, M J; Soularue, P; Liners, F; Vassart, G; Meunier, J C; Parmentier, M

1996-08-06

Nociceptin (orphanin FQ), the newly discovered natural agonist of opioid receptor-like (ORL1) receptor, is a neuropeptide that is endowed with pronociceptive activity in vivo. Nociceptin is derived from a larger precursor, prepronociceptin (PPNOC), whose human, mouse, and rat genes we have now isolated. The PPNOC gene is highly conserved in the three species and displays organizational features that are strikingly similar to those of the genes of preproenkephalin, preprodynorphin, and preproopiomelanocortin, the precursors to endogenous opioid peptides, suggesting the four genes belong to the same family, i.e., have a common evolutionary origin. The PPNOC gene encodes a single copy of nociceptin as well as of other peptides whose sequence is strictly conserved across murine and human species; hence it is likely to be neurophysiologically significant. Northern blot analysis shows that the PPNOC gene is predominantly transcribed in the central nervous system (brain and spinal cord) and, albeit weakly, in the ovary, the sole peripheral organ expressing the gene. By using a radiation hybrid cell line panel, the PPNOC gene was mapped to the short arm of human chromosome 8 (8p21), between sequence-tagged site markers WI-5833 and WI-1172, in close proximity to the locus encoding the neurofilament light chain NEFL. Analysis of yeast artificial chromosome clones belonging to the WC8.4 contig covering the 8p21 region did not detect the gene on these yeast artificial chromosomes, suggesting a gap in the coverage within this contig.
Development of a Genetic Map for Onion (Allium cepa L.) Using Reference-Free Genotyping-by-Sequencing and SNP Assays

PubMed Central

Jo, Jinkwan; Purushotham, Preethi M.; Han, Koeun; Lee, Heung-Ryul; Nah, Gyoungju; Kang, Byoung-Cheorl

2017-01-01

Single nucleotide polymorphisms (SNPs) play important roles as molecular markers in plant genomics and breeding studies. Although onion (Allium cepa L.) is an important crop globally, relatively few molecular marker resources have been reported due to its large genome and high heterozygosity. Genotyping-by-sequencing (GBS) offers a greater degree of complexity reduction followed by concurrent SNP discovery and genotyping for species with complex genomes. In this study, GBS was employed for SNP mining in onion, which currently lacks a reference genome. A segregating F2 population, derived from a cross between ‘NW-001’ and ‘NW-002,’ as well as multiple parental lines were used for GBS analysis. A total of 56.15 Gbp of raw sequence data were generated and 1,851,428 SNPs were identified from the de novo assembled contigs. Stringent filtering resulted in 10,091 high-fidelity SNP markers. Robust SNPs that satisfied the segregation ratio criteria and with even distribution in the mapping population were used to construct an onion genetic map. The final map contained eight linkage groups and spanned a genetic length of 1,383 centiMorgans (cM), with an average marker interval of 8.08 cM. These robust SNPs were further analyzed using the high-throughput Fluidigm platform for marker validation. This is the first study in onion to develop genome-wide SNPs using GBS. The resulting SNP markers and developed linkage map will be valuable tools for genetic mapping of important agronomic traits and marker-assisted selection in onion breeding programs. PMID:28959273
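The abstract above filters SNPs by "segregation ratio criteria" without stating them. As a minimal sketch only, assuming codominant markers in an F2 population with an expected 1:2:1 genotype ratio and a chi-square goodness-of-fit test at the 5% level (both assumptions, not taken from the study), such a filter could look like this. The function names and genotype counts are hypothetical.

```python
# Illustrative chi-square test of a 1:2:1 F2 segregation ratio.
# The test and threshold are assumptions; the study's actual criteria are not given.

CRITICAL_DF2_P05 = 5.991  # chi-square critical value for 2 degrees of freedom, alpha = 0.05


def chi_square_f2(n_aa, n_ab, n_bb):
    """Chi-square statistic of observed genotype counts against a 1:2:1 ratio."""
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("no genotyped individuals")
    expected = (total * 0.25, total * 0.5, total * 0.25)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_1_2_1(n_aa, n_ab, n_bb):
    """True if the counts do not deviate significantly from 1:2:1."""
    return chi_square_f2(n_aa, n_ab, n_bb) <= CRITICAL_DF2_P05


# Hypothetical counts for two markers in 100 F2 plants.
print(fits_1_2_1(25, 50, 25))  # matches expectation exactly
print(fits_1_2_1(50, 40, 10))  # strongly distorted
```

Markers that fail such a test are typically set aside before map construction, because distorted segregation can bias recombination estimates.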
Genome Evolution and Meiotic Maps by Massively Parallel DNA Sequencing: Spotted Gar, an Outgroup for the Teleost Genome Duplication

PubMed Central

Amores, Angel; Catchen, Julian; Ferrara, Allyse; Fontenot, Quenton; Postlethwait, John H.

2011-01-01

Genomic resources for hundreds of species of evolutionary, agricultural, economic, and medical importance are unavailable due to the expense of well-assembled genome sequences and difficulties with multigenerational studies. Teleost fish provide many models for human disease but possess anciently duplicated genomes that sometimes obfuscate connectivity. Genomic information representing a fish lineage that diverged before the teleost genome duplication (TGD) would provide an outgroup for exploring the mechanisms of evolution after whole-genome duplication. We exploited massively parallel DNA sequencing to develop meiotic maps with thrift and speed by genotyping F1 offspring of a single female and a single male spotted gar (Lepisosteus oculatus) collected directly from nature utilizing only polymorphisms existing in these two wild individuals. Using Stacks, software that automates the calling of genotypes from polymorphisms assayed by Illumina sequencing, we constructed a map containing 8406 markers. RNA-seq on two map-cross larvae provided a reference transcriptome that identified nearly 1000 mapped protein-coding markers and allowed genome-wide analysis of conserved synteny. Results showed that the gar lineage diverged from teleosts before the TGD and its genome is organized more similarly to that of humans than teleosts. Thus, spotted gar provides a critical link between medical models in teleost fish, to which gar is biologically similar, and humans, to which gar is genomically similar. Application of our F1 dense mapping strategy to species with no prior genome information promises to facilitate comparative genomics and provide a scaffold for ordering the numerous contigs arising from next generation genome sequencing. PMID:21828280
SNP Identification from RNA Sequencing and Linkage Map Construction of Rubber Tree for Anchoring the Draft Genome

PubMed Central

Shearman, Jeremy R.; Sangsrakru, Duangjai; Jomchai, Nukoon; Ruang-areerate, Panthita; Sonthirod, Chutima; Naktang, Chaiwat; Theerawattanasuk, Kanikar; Tragoonrung, Somvong; Tangphatsornruang, Sithichoke

2015-01-01

Hevea brasiliensis, or rubber tree, is an important crop species that accounts for the majority of natural latex production. The rubber tree nuclear genome consists of 18 chromosomes and is roughly 2.15 Gb. The current rubber tree reference genome assembly consists of 1,150,326 scaffolds ranging from 200 to 531,465 bp and totalling 1.1 Gb. Only 143 scaffolds, totalling 7.6 Mb, have been placed into linkage groups. We have performed RNA-seq on 6 varieties of rubber tree to identify SNPs and InDels and used this information to perform target sequence enrichment and high throughput sequencing to genotype a set of SNPs in 149 rubber tree offspring from a cross between RRIM 600 and RRII 105 rubber tree varieties. We used this information to generate a linkage map allowing for the anchoring of 24,424 contigs from 3,009 scaffolds, totalling 115 Mb or 10.4% of the published sequence, into 18 linkage groups. Each linkage group contains between 319 and 1367 SNPs, or 60 to 194 non-redundant marker positions, and ranges from 156 to 336 cM in length. This linkage map includes 20,143 of the 69,300 predicted genes from rubber tree and will be useful for mapping studies and improving the reference genome assembly. PMID:25831195
SNP identification from RNA sequencing and linkage map construction of rubber tree for anchoring the draft genome.

PubMed

Shearman, Jeremy R; Sangsrakru, Duangjai; Jomchai, Nukoon; Ruang-Areerate, Panthita; Sonthirod, Chutima; Naktang, Chaiwat; Theerawattanasuk, Kanikar; Tragoonrung, Somvong; Tangphatsornruang, Sithichoke

2015-01-01

Hevea brasiliensis, or rubber tree, is an important crop species that accounts for the majority of natural latex production. The rubber tree nuclear genome consists of 18 chromosomes and is roughly 2.15 Gb. The current rubber tree reference genome assembly consists of 1,150,326 scaffolds ranging from 200 to 531,465 bp and totalling 1.1 Gb. Only 143 scaffolds, totalling 7.6 Mb, have been placed into linkage groups. We have performed RNA-seq on 6 varieties of rubber tree to identify SNPs and InDels and used this information to perform target sequence enrichment and high throughput sequencing to genotype a set of SNPs in 149 rubber tree offspring from a cross between RRIM 600 and RRII 105 rubber tree varieties. We used this information to generate a linkage map allowing for the anchoring of 24,424 contigs from 3,009 scaffolds, totalling 115 Mb or 10.4% of the published sequence, into 18 linkage groups. Each linkage group contains between 319 and 1367 SNPs, or 60 to 194 non-redundant marker positions, and ranges from 156 to 336 cM in length. This linkage map includes 20,143 of the 69,300 predicted genes from rubber tree and will be useful for mapping studies and improving the reference genome assembly.
Wheat beta-expansin (EXPB11) genes: Identification of the expressed gene on chromosome 3BS carrying a pollen allergen domain

PubMed Central

2010-01-01

Background Expansins form a large multi-gene family found in wheat and other cereal genomes that are involved in the expansion of cell walls as a tissue grows. The expansin family can be divided up into two main groups, namely, alpha-expansin (EXPA) and beta-expansin proteins (EXPB), with the EXPB group being of particular interest as group 1-pollen allergens. Results In this study, three beta-expansin genes were identified and characterized from a newly sequenced region of the Triticum aestivum cv. Chinese Spring chromosome 3B physical map at the Sr2 locus (FPC contig ctg11). The analysis of a 357 kb sub-sequence of FPC contig ctg11 identified one of the beta-expansin genes as TaEXPB11, originally identified as a cDNA from the wheat cv. Wyuna. Through the analysis of intron sequences of the three wheat cv. Chinese Spring genes, we propose that two of these beta-expansin genes are duplications of the TaEXPB11 gene. Comparative sequence analysis with two other wheat cultivars (cv. Westonia and cv. Hope) and a Triticum aestivum var. spelta line validated the identification of the Chinese Spring variant of TaEXPB11. The expression in maternal and grain tissues was confirmed by examining EST databases and carrying out RT-PCR experiments. Detailed examination of the position of TaEXPB11 relative to the locus encoding Sr2 disease resistance ruled out the possibility of this gene directly contributing to the resistance phenotype. Conclusions Through 3-D structural protein comparisons with Zea mays EXPB1, we proposed that variations within the coding sequence of TaEXPB11 in wheats may produce a functional change within features such as domain 1 related to possible involvement in cell wall structure and domain 2 defining the pollen allergen domain and binding to IgE protein. The variation established in this gene suggests it is a clearly identifiable member of a gene family and reflects the dynamic features of the wheat genome as it adapted to a range of different environments and uses. Accession Numbers: ctg11 = FN564426. Survey sequences of TaEXPB11ws and TsEXPB11 are provided on request. PMID:20507562
De novo transcriptome assembly for a non-model species, the blood-sucking bug Triatoma brasiliensis, a vector of Chagas disease.

PubMed

Marchant, A; Mougel, F; Almeida, C; Jacquin-Joly, E; Costa, J; Harry, M

2015-04-01

High throughput sequencing (HTS) provides new research opportunities for work on non-model organisms, such as differential expression studies between populations exposed to different environmental conditions. However, such transcriptomic studies first require the production of a reference assembly. The choice of sampling procedure, sequencing strategy and assembly workflow is crucial. To develop a reliable reference transcriptome for Triatoma brasiliensis, the major Chagas disease vector in Northeastern Brazil, different de novo assembly protocols were generated using various datasets and software. Both 454 and Illumina sequencing technologies were applied on RNA extracted from antennae and mouthparts from single or pooled individuals. The 454 library yielded 278 Mb. Fifteen Illumina libraries were constructed and yielded nearly 360 million RNA-seq single reads and 46 million RNA-seq paired-end reads for nearly 45 Gb. For the 454 reads, we used three assemblers, Newbler, CAP3 and/or MIRA, and for the Illumina reads, the Trinity assembler. Ten assembly workflows were compared using these programs separately or in combination. To compare the assemblies obtained, quantitative and qualitative criteria were used, including contig length, N50, contig number and the percentage of chimeric contigs. Completeness of the assemblies was estimated using the CEGMA pipeline. The best assembly (57,657 contigs, completeness of 80%, <1% chimeric contigs) was a hybrid assembly, leading us to recommend (1) using a single individual with a large representation of biological tissues, (2) merging both long reads and short paired-end Illumina reads, and (3) using several assemblers in order to combine the specific advantages of each.
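Several records in this listing compare assemblies by N50. As a minimal sketch, N50 is the contig length at which the contigs of that length or longer together hold at least half of the total assembled bases. The function name and the example lengths below are illustrative and do not come from any of the studies.

```python
# Illustrative N50 calculation from a list of contig lengths.

def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up example: total is 1500 bp, and 500 + 400 = 900 bp covers half of it.
contig_lengths = [100, 200, 300, 400, 500]
print(n50(contig_lengths))  # prints 400
```

N50 rewards contiguity only, which is why the abstract above pairs it with completeness and the percentage of chimeric contigs.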
Transcriptomic analysis of the mussel Elliptio complanata identifies candidate stress-response genes and an abundance of novel or noncoding transcripts

USGS Publications Warehouse

Cornman, Robert S.; Robertson, Laura S.; Galbraith, Heather S.; Blakeslee, Carrie J.

2014-01-01

Mussels are useful indicator species of environmental stress and degradation, and the global decline in freshwater mussel diversity and abundance is of conservation concern. Elliptio complanata is a common freshwater mussel of eastern North America that can serve both as an indicator and as an experimental model for understanding mussel physiology and genetics. To support genetic components of these research goals, we assembled transcriptome contigs from Illumina paired-end reads. Despite efforts to collapse similar contigs, the final assembly was in excess of 136,000 contigs with an N50 of 982 bp. Even so, comparisons to the CEGMA database of conserved eukaryotic genes indicated that ∼20% of genes remain unrepresented. However, numerous candidate stress-response genes were present, and we identified lineage-specific patterns of diversification among molluscs for cytochrome P450 detoxification genes and two saccharide-modifying enzymes: 1,3 beta-galactosyltransferase and fucosyltransferase. Less than a quarter of contigs had protein-level similarity based on modest BLAST and Hmmer3 statistical thresholds. These results add comparative genomic resources for molluscs and suggest a wealth of novel proteins and noncoding transcripts.
GFinisher: a new strategy to refine and finish bacterial genome assemblies

NASA Astrophysics Data System (ADS)

Guizelini, Dieval; Raittz, Roberto T.; Cruz, Leonardo M.; Souza, Emanuel M.; Steffens, Maria B. R.; Pedrosa, Fabio O.

2016-10-01

Despite developments in DNA sequencing technology that have improved the number and the length of reads, the process of reconstructing complete genome sequences, the so-called genome assembly, is still complex. Only 13% of the prokaryotic genome sequencing projects have been completed. Draft genome sequences deposited in public databases are fragmented in contigs and may lack the full gene complement. The aim of the present work is to identify assembly errors and improve the assembly process of bacterial genomes. The biological patterns observed in genomic sequences and the application of a priori information can allow the identification of misassembled regions, and the reorganization and improvement of the overall de novo genome assembly. GFinisher starts by generating fuzzy GC skew graphs for each contig in an assembly and then breaks the contigs at critical points in order to reassemble and close them using jFGap. This has been successfully applied to datasets from 96 genome assemblies, decreasing the number of contigs by up to 86%. GFinisher can easily optimize assemblies of prokaryotic draft genomes and can be used to improve the assembly programs based on nucleotide sequence patterns in the genome. The software and source code are available at http://gfinisher.sourceforge.net/.
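The GC skew signal that GFinisher builds on can be illustrated with the plain windowed statistic (G - C) / (G + C). This is only a sketch of the ordinary GC skew under assumed window settings. It does not reproduce GFinisher's fuzzy GC skew graphs or the jFGap step, and the function name and defaults are hypothetical.

```python
# Illustrative windowed GC skew; not GFinisher's fuzzy GC skew implementation.

def gc_skew(seq, window=1000, step=None):
    """Return (window_start, skew) pairs, where skew = (G - C) / (G + C)."""
    step = step or window
    seq = seq.upper()
    results = []
    # Sequences shorter than one window are treated as a single window.
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, skew))
    return results


# Toy sequence: a G-rich half followed by a C-rich half flips the sign of the skew.
print(gc_skew("GGGGCCCC", window=4))  # prints [(0, 1.0), (4, -1.0)]
```

An abrupt sign change inside a contig is the kind of pattern that could flag a candidate breakpoint, although how GFinisher defines its critical points is not described in the abstract.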
CSAR-web: a web server of contig scaffolding using algebraic rearrangements.

PubMed

Chen, Kun-Tze; Lu, Chin Lung

2018-05-04

CSAR-web is a web-based tool that allows the users to efficiently and accurately scaffold (i.e. order and orient) the contigs of a target draft genome based on a complete or incomplete reference genome from a related organism. It takes as input a target genome in multi-FASTA format and a reference genome in FASTA or multi-FASTA format, depending on whether the reference genome is complete or incomplete, respectively. In addition, it requires the users to choose either 'NUCmer on nucleotides' or 'PROmer on translated amino acids' for CSAR-web to identify conserved genomic markers (i.e. matched sequence regions) between the target and reference genomes, which are used by the rearrangement-based scaffolding algorithm in CSAR-web to order and orient the contigs of the target genome based on the reference genome. In the output page, CSAR-web displays its scaffolding result in a graphical mode (i.e. scalable dotplot) allowing the users to visually validate the correctness of scaffolded contigs and in a tabular mode allowing the users to view the details of scaffolds. CSAR-web is available online at http://genome.cs.nthu.edu.tw/CSAR-web.
Improved gap size estimation for scaffolding algorithms.

PubMed

Sahlin, Kristoffer; Street, Nathaniel; Lundeberg, Joakim; Arvestad, Lars

2012-09-01

One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read-pairs. Scaffolding provides estimates about the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions. Since erroneous distance estimates can mislead in subsequent analysis, it is important to provide unbiased estimation of contig distance. In this article, we show that state-of-the-art programs for scaffolding are using an incorrect model of gap size estimation. We discuss why current maximum likelihood estimators are biased and describe what different cases of bias we are facing. Furthermore, we provide a model for the distribution of reads that span a gap and derive the maximum likelihood equation for the gap length. We motivate why this estimate is sound and show empirically that it outperforms gap estimators in popular scaffolding programs. Our results have consequences for scaffolding software, for structural variation detection and for library insert-size estimation as is commonly performed by read aligners. A reference implementation is provided at https://github.com/SciLifeLab/gapest. Supplementary data are available at Bioinformatics online.
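For orientation, a simple mean-based gap estimate subtracts the part of each spanning read-pair that lies inside the two contigs from the library's mean insert size. The sketch below shows only that naive calculation, as an assumed baseline. It is not the maximum likelihood model derived in the article, and the abstract's argument is that estimators in common use are biased. The function name and numbers are hypothetical.

```python
# Illustrative naive gap-size estimate from read pairs that span a gap.
# This is a baseline sketch, not the corrected estimator from the article.
from statistics import mean


def naive_gap_estimate(mean_insert, observed_spans):
    """Estimate the gap as mean insert size minus the mean within-contig span.

    observed_spans: for each spanning pair, the number of insert bases that
    fall inside the two flanking contigs (everything except the gap itself).
    """
    if not observed_spans:
        raise ValueError("no spanning read pairs")
    return mean_insert - mean(observed_spans)


# Hypothetical library with a 500 bp mean insert and three spanning pairs.
print(naive_gap_estimate(500, [300, 350, 250]))  # prints 200
```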
GFinisher: a new strategy to refine and finish bacterial genome assemblies.

PubMed

Guizelini, Dieval; Raittz, Roberto T; Cruz, Leonardo M; Souza, Emanuel M; Steffens, Maria B R; Pedrosa, Fabio O

2016-10-10

Despite developments in DNA sequencing technology that have increased the number and length of reads, the reconstruction of complete genome sequences, known as genome assembly, is still complex. Only 13% of the prokaryotic genome sequencing projects have been completed. Draft genome sequences deposited in public databases are fragmented into contigs and may lack the full gene complement. The aim of the present work is to identify assembly errors and improve the assembly process of bacterial genomes. The biological patterns observed in genomic sequences and the application of a priori information can allow the identification of misassembled regions, and the reorganization and improvement of the overall de novo genome assembly. GFinisher starts by generating Fuzzy GC skew graphs for each contig in an assembly, then breaks the contigs at critical points in order to reassemble and close them using jFGap. This has been successfully applied to datasets from 96 genome assemblies, decreasing the number of contigs by up to 86%. GFinisher can easily optimize assemblies of prokaryotic draft genomes and can be used to improve the assembly programs based on nucleotide sequence patterns in the genome. The software and source code are available at http://gfinisher.sourceforge.net/.
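As background for the GC skew graphs mentioned above, the conventional (non-fuzzy) GC skew is (G - C) / (G + C) computed over windows of a sequence. The sketch below computes it per window and reports where the sign flips between adjacent windows. It is a plain illustration with hypothetical function names, not GFinisher's fuzzy method, and treating sign flips as candidate break points is an assumption made for the example.

```python
def gc_skew(seq, window=1000):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    values = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows without G or C get a neutral skew of 0.0.
        values.append((g - c) / (g + c) if g + c else 0.0)
    return values


def cumulative_skew(values):
    """Running sum of window skews, the form usually plotted along a contig."""
    total = 0.0
    running = []
    for value in values:
        total += value
        running.append(total)
    return running


def sign_changes(values):
    """Indices of windows whose skew has the opposite sign to the previous one."""
    return [i for i in range(1, len(values)) if values[i - 1] * values[i] < 0]
```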
Construction of an 800-kb contig in the near-centromeric region of the rice blast resistance gene Pi-ta2 using a highly representative rice BAC library.

PubMed

Nakamura, S; Asakawa, S; Ohmido, N; Fukui, K; Shimizu, N; Kawasaki, S

1997-05-01

We constructed a rice Bacterial Artificial Chromosome (BAC) library from green leaf protoplasts of the cultivar Shimokita harboring the rice blast resistance gene Pi-ta. The average insert size of 155 kb and the library size of seven genome equivalents make it one of the most comprehensive BAC libraries available, and larger than many plant YAC libraries. The library clones were plated on seven high density membranes of microplate size, enabling efficient colony identification in colony hybridization experiments. Seven percent of clones carried chloroplast DNA. By probing with markers close to the blast resistance genes Pi-ta2 (closely linked to Pi-ta) and Pi-b, respectively located in the centromeric region of chromosome 12 and near the telomeric end of chromosome 2, on average 2.2 +/- 1.3 and 8.0 +/- 2.6 BAC clones/marker were isolated. Differences in chromosomal structures may contribute to this wide variation in yield. A contig of about 800 kb, consisting of 19 clones, was constructed in the Pi-ta2 region. This region had a high frequency of repetitive sequences. To circumvent this difficulty, we devised a "two-step walking" method. The contig spanned a 300 kb region between markers located at 0 cM and 0.3 cM from Pi-ta. The ratio of physical to genetic distances (> 1,000 kb/cM) was more than three times larger than the average of rice (300 kb/cM). The low recombination rate and high frequency of repetitive sequences may also be related to the near centromeric character of this region. Fluorescent in situ hybridization (FISH) with a BAC clone from the Pi-b region yielded very clear signals on the long arm of chromosome 2, while a clone from the Pi-ta2 region showed various cross-hybridizing signals near the centromeric regions of all chromosomes.
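The arithmetic behind the reported ratio can be checked directly: 300 kb over 0.3 cM gives about 1,000 kb/cM, a little over three times the rice average of 300 kb/cM. The sketch below also estimates the chance that a given locus is present in a library of seven genome equivalents. That part uses the Poisson approximation 1 - e^(-c), which is an assumption added here and not stated in the abstract; function names are hypothetical.

```python
import math


def kb_per_cm(physical_kb, genetic_cm):
    """Ratio of physical to genetic distance."""
    return physical_kb / genetic_cm


def locus_coverage_probability(genome_equivalents):
    """Poisson approximation (assumption): P(locus is in at least one clone)."""
    return 1.0 - math.exp(-genome_equivalents)


# Values quoted in the abstract.
region_ratio = kb_per_cm(300, 0.3)       # Pi-ta2 region, kb/cM
fold_over_average = region_ratio / 300   # relative to the rice average
p_represented = locus_coverage_probability(7)
```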
Integration of transcriptomic and proteomic data from a single wheat cultivar provides new tools for understanding the roles of individual alpha gliadin proteins in flour quality and celiac disease

USDA-ARS's Scientific Manuscript database

One-hundred-thirty-six expressed sequence tags (ESTs) encoding alpha gliadins from Triticum aestivum cv Butte 86 were identified in public databases and assembled into 19 contigs. Consensus sequences for 12 of the contigs encoded complete alpha gliadin proteins, but only two were identical to protei...
Metagenomics workflow analysis of endophytic bacteria from oil palm fruits

NASA Astrophysics Data System (ADS)

Tanjung, Z. A.; Aditama, R.; Sudania, W. M.; Utomo, C.; Liwang, T.

2017-05-01

Next-Generation Sequencing (NGS) has become a powerful sequencing tool for microbial studies and has driven the establishment of the field of metagenomics. This study describes a workflow to analyze metagenomics data from a Sequence Read Archive (SRA) file under accession ERP004286 deposited by the University of Sao Paulo. The data were generated by direct sequencing on the 454 pyrosequencing platform and originated from endophytic bacteria of oil palm fruits, which were cultured using oil-palm enriched medium. The workflow used SortMeRNA to separate ribosomal reads, Newbler (GS Assembler and GS Mapper) to assemble reads and map them to a genome reference, the BLAST package to identify and annotate contig sequences, and QualiMap for statistical analysis. Eight bacterial species were identified in this study. Enterobacter cloacae was the most abundant species, followed by Citrobacter koseri, Serratia marcescens, Lactococcus lactis subsp. lactis, Klebsiella pneumoniae, Citrobacter amalonaticus, Achromobacter xylosoxidans, and Pseudomonas sp., in that order. All of these species have been reported as endophytic bacteria in various plant species, and each has potential as a plant growth-promoting bacterium or for other applications in agricultural industries.
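The contig identification step can be sketched as a post-processing of BLAST tabular output. The example assumes the standard 12-column tabular format (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score), keeps the highest-scoring subject per contig, and tallies species. The abstract does not say how hits were summarized, so the best-hit rule, the function names and the subject-to-species mapping are all hypothetical.

```python
from collections import Counter


def best_hits(tabular_lines):
    """Map each query contig to the subject with the highest bit score."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def tally_species(hits, subject_to_species):
    """Count contigs per species; unmapped subjects are labelled 'unknown'."""
    return Counter(subject_to_species.get(s, "unknown") for s in hits.values())
```

Counting contigs is only a rough proxy for abundance; a read-level count would need the mapping results as well.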
Improved maize reference genome with single-molecule technologies.

PubMed

Jiao, Yinping; Peluso, Paul; Shi, Jinghua; Liang, Tiffany; Stitzer, Michelle C; Wang, Bo; Campbell, Michael S; Stein, Joshua C; Wei, Xuehong; Chin, Chen-Shan; Guill, Katherine; Regulski, Michael; Kumari, Sunita; Olson, Andrew; Gent, Jonathan; Schneider, Kevin L; Wolfgruber, Thomas K; May, Michael R; Springer, Nathan M; Antoniou, Eric; McCombie, W Richard; Presting, Gernot G; McMullen, Michael; Ross-Ibarra, Jeffrey; Dawe, R Kelly; Hastie, Alex; Rank, David R; Ware, Doreen

2017-06-22

Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.
Hd6, a rice quantitative trait locus involved in photoperiod sensitivity, encodes the α subunit of protein kinase CK2

PubMed Central

Takahashi, Yuji; Shomura, Ayahiko; Sasaki, Takuji; Yano, Masahiro

2001-01-01

Hd6 is a quantitative trait locus involved in rice photoperiod sensitivity. It was detected in backcross progeny derived from a cross between the japonica variety Nipponbare and the indica variety Kasalath. To isolate a gene at Hd6, we used a large segregating population for the high-resolution and fine-scale mapping of Hd6 and constructed genomic clone contigs around the Hd6 region. Linkage analysis with P1-derived artificial chromosome clone-derived DNA markers delimited Hd6 to a 26.4-kb genomic region. We identified a gene encoding the α subunit of protein kinase CK2 (CK2α) in this region. The Nipponbare allele of CK2α contains a premature stop codon, and the resulting truncated product is undoubtedly nonfunctional. Genetic complementation analysis revealed that the Kasalath allele of CK2α increases days-to-heading. Map-based cloning with advanced backcross progeny enabled us to identify a gene underlying a quantitative trait locus even though it exhibited a relatively small effect on the phenotype. PMID:11416158
Cytogenetic mapping of a novel locus for type II Waardenburg syndrome.

PubMed

Selicorni, Angelo; Guerneri, Silvana; Ratti, Antonia; Pizzuti, Antonio

2002-01-01

An Italian family in which Waardenburg syndrome type II (WS2) segregates together with a der(8) chromosome from a (4p;8p) balanced translocation was studied. Cytogenetic analysis by painting and subtelomeric probe hybridization positioned the chromosome 8 breakpoint at p22-pter. Fluorescence in situ hybridization analysis with yeast artificial chromosomes from a contig spanning the 8p21-pter region refined the breakpoint to an interval of less than 170 kb between markers WI-3823 and D8S1819. The only cloned gene for WS2 is that for microphthalmia (MITF) on chromosome 3p. In this family, MITF mutations were excluded by sequencing the whole coding region. The 8p23 region may represent a third locus for WS2 (WS2C).

Insight into Dominant Cellulolytic Bacteria from Two Biogas Digesters and Their Glycoside Hydrolase Genes

PubMed Central

Zhang, Jun; Zhang, Lei; Geng, Alei; Liu, Fanghua; Zhao, Guoping; Wang, Shengyue; Zhou, Zhihua; Yan, Xing

2015-01-01

Diverse cellulolytic bacteria are essential for maintaining high lignocellulose degradation ability in biogas digesters. However, little is known about the functional genes and gene clusters of dominant cellulolytic bacteria in biogas digesters. Such knowledge is the foundation for understanding the lignocellulose degradation mechanisms of biogas digesters and for applying these gene resources to optimize biofuel production. A combination of metagenomic and 16S rRNA gene clone library methods was used to investigate the dominant cellulolytic bacteria and their glycoside hydrolase (GH) genes in two biogas digesters. The 16S rRNA gene analysis revealed that the dominant cellulolytic bacteria were strains closely related to Clostridium straminisolvens and an uncultured cellulolytic bacterium designated BG-1. To recover GH genes from cellulolytic bacteria in general, and BG-1 in particular, a refined assembly approach developed in this study was used to assemble GH genes from metagenomic reads; 163 GH-containing contigs ≥ 1 kb in length were obtained. Six recovered GH5 genes that were expressed in E. coli demonstrated multiple lignocellulase activities and one had high mannanase activity (1255 U/mg). Eleven fosmid clones harboring the recovered GH-containing contigs were sequenced and assembled into 10 fosmid contigs. The composition of GH genes in the 163 assembled metagenomic contigs and 10 fosmid contigs indicated that diverse GHs and lignocellulose degradation mechanisms were present in the biogas digesters. In particular, a small portion of BG-1 genome information was recovered by PhyloPythiaS analysis. The lignocellulase gene clusters in BG-1 suggested that it might use a novel lignocellulose degradation mechanism to degrade lignocellulose efficiently. The dominant cellulolytic bacteria of biogas digesters possess GH genes that are diverse in both sequence and function, which may be applied to biofuel production in the future. PMID:26070087
Analysis of the Salivary Gland Transcriptome of Frankliniella occidentalis

PubMed Central

Stafford-Banks, Candice A.; Rotenberg, Dorith; Johnson, Brian R.; Whitfield, Anna E.; Ullman, Diane E.

2014-01-01

Saliva is known to play a crucial role in insect feeding behavior and virus transmission. Currently, little is known about the salivary glands and saliva of thrips, despite the fact that Frankliniella occidentalis (Pergande) (the western flower thrips) is a serious pest due to its destructive feeding, wide host range, and transmission of tospoviruses. As a first step towards characterizing thrips salivary gland functions, we sequenced the transcriptome of the primary salivary glands of F. occidentalis using short read sequencing (Illumina) technology. A de novo-assembled transcriptome revealed 31,392 high quality contigs with an average size of 605 bp. A total of 12,166 contigs had significant BLASTx or tBLASTx hits (E≤1.0E−6) to known proteins, whereas a high percentage (61.24%) of contigs had no apparent protein or nucleotide hits. Comparison of the F. occidentalis salivary gland transcriptome (sialotranscriptome) against a published F. occidentalis full body transcriptome assembled from Roche-454 reads revealed several contigs with putative annotations associated with salivary gland functions. KEGG pathway analysis of the sialotranscriptome revealed that the majority (18 out of the top 20 predicted KEGG pathways) of the salivary gland contig sequences match proteins involved in metabolism. We identified several genes likely to be involved in detoxification and inhibition of plant defense responses including aldehyde dehydrogenase, metalloprotease, glucose oxidase, glucose dehydrogenase, and regucalcin. We also identified several genes that may play a role in the extra-oral digestion of plant structural tissues including β-glucosidase and pectin lyase; and the extra-oral digestion of sugars, including α-amylase, maltase, sucrase, and α-glucosidase. 
This is the first analysis of a sialotranscriptome for any Thysanopteran species and it provides a foundational tool to further our understanding of how thrips interact with their plant hosts and the viruses they transmit. PMID:24736614
Analysis of the salivary gland transcriptome of Frankliniella occidentalis.

PubMed

Stafford-Banks, Candice A; Rotenberg, Dorith; Johnson, Brian R; Whitfield, Anna E; Ullman, Diane E

2014-01-01

Saliva is known to play a crucial role in insect feeding behavior and virus transmission. Currently, little is known about the salivary glands and saliva of thrips, despite the fact that Frankliniella occidentalis (Pergande) (the western flower thrips) is a serious pest due to its destructive feeding, wide host range, and transmission of tospoviruses. As a first step towards characterizing thrips salivary gland functions, we sequenced the transcriptome of the primary salivary glands of F. occidentalis using short read sequencing (Illumina) technology. A de novo-assembled transcriptome revealed 31,392 high quality contigs with an average size of 605 bp. A total of 12,166 contigs had significant BLASTx or tBLASTx hits (E≤1.0E-6) to known proteins, whereas a high percentage (61.24%) of contigs had no apparent protein or nucleotide hits. Comparison of the F. occidentalis salivary gland transcriptome (sialotranscriptome) against a published F. occidentalis full body transcriptome assembled from Roche-454 reads revealed several contigs with putative annotations associated with salivary gland functions. KEGG pathway analysis of the sialotranscriptome revealed that the majority (18 out of the top 20 predicted KEGG pathways) of the salivary gland contig sequences match proteins involved in metabolism. We identified several genes likely to be involved in detoxification and inhibition of plant defense responses including aldehyde dehydrogenase, metalloprotease, glucose oxidase, glucose dehydrogenase, and regucalcin. We also identified several genes that may play a role in the extra-oral digestion of plant structural tissues including β-glucosidase and pectin lyase; and the extra-oral digestion of sugars, including α-amylase, maltase, sucrase, and α-glucosidase. 
This is the first analysis of a sialotranscriptome for any Thysanopteran species and it provides a foundational tool to further our understanding of how thrips interact with their plant hosts and the viruses they transmit.
Comparative genomics of Lupinus angustifolius gene-rich regions: BAC library exploration, genetic mapping and cytogenetics

PubMed Central

2013-01-01

Background The narrow-leafed lupin, Lupinus angustifolius L., is a grain legume species with a relatively compact genome. The species has 2n = 40 chromosomes and its genome size is 960 Mbp/1C. During the last decade, L. angustifolius genomic studies have achieved several milestones, such as molecular-marker development, linkage maps, and bacterial artificial chromosome (BAC) libraries. Here, these resources were integratively used to identify and sequence two gene-rich regions (GRRs) of the genome. Results The genome was screened with a probe representing the sequence of a microsatellite fragment length polymorphism (MFLP) marker linked to Phomopsis stem blight resistance. BAC clones selected by hybridization were subjected to restriction fingerprinting and contig assembly, and 232 BAC-ends were sequenced and annotated. BAC fluorescence in situ hybridization (BAC-FISH) identified eight single-locus clones. Based on physical mapping, cytogenetic localization, and BAC-end annotation, five clones were chosen for sequencing. Within the sequences of clones that hybridized in FISH to a single-locus, two large GRRs were identified. The GRRs showed strong and conserved synteny to Glycine max duplicated genome regions, illustrated by both identical gene order and parallel orientation. In contrast, in the clones with dispersed FISH signals, more than one-third of sequences were transposable elements. Sequenced, single-locus clones were used to develop 12 genetic markers, increasing the number of L. angustifolius chromosomes linked to appropriate linkage groups by five pairs. Conclusions In general, probes originating from MFLP sequences can assist genome screening and gene discovery. However, such probes are not useful for positional cloning, because they tend to hybridize to numerous loci. GRRs identified in L. angustifolius contained a low number of interspersed repeats and had a high level of synteny to the genome of the model legume G. max. 
Our results showed that not only was the gene nucleotide sequence conserved between soybean and lupin GRRs, but the order and orientation of particular genes in syntenic blocks was homologous, as well. These findings will be valuable to the forthcoming sequencing of the lupin genome. PMID:23379841
Analysis of expressed sequence tags from a single wheat cultivar facilitates interpretation of tandem mass spectrometry data and discrimination of gamma gliadin proteins that may play different functional roles in flour

USDA-ARS's Scientific Manuscript database

The complement of gamma gliadin genes expressed in the wheat cultivar Butte 86 was evaluated by analyzing publicly available expressed sequence tag (EST) data. Eleven contigs were assembled from 153 Butte 86 ESTs. Nine of the contigs encoded full-length proteins and four of the proteins contained an...
Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

PubMed

Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

2015-11-20

The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium (MGSAC). The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs, and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
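For readers unfamiliar with the N50 statistic mentioned above: it is the length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal sketch follows; the function name is hypothetical and the contig lengths in any call are the caller's own data, not values from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    assembly size.
    """
    if not lengths:
        raise ValueError("n50() requires at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for "at least half"
            return length
```

Because N50 is weighted by length, merging short unmapped contigs into longer ones raises it even when the total assembly size is unchanged.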
De novo sequencing and comparative analysis of leaf transcriptomes of diverse condensed tannin-containing lines of underutilized Psophocarpus tetragonolobus (L.) DC

PubMed Central

Singh, Vinayak; Goel, Ridhi; Pande, Veena; Asif, Mehar Hasan; Mohanty, Chandra Sekhar

2017-01-01

Condensed tannin (CT), or proanthocyanidin (PA), is a unique group of high-molecular-weight phenolic metabolites with a specific structure. It is reported that a high CT content in legumes adversely affects the nutrients in the plant and impairs digestibility upon consumption by animals. Winged bean (Psophocarpus tetragonolobus (L.) DC.) is a promising underutilized legume with high protein and oil content. One of the reasons for its underutilization is the presence of CT. Transcriptome sequencing of leaves of two diverse CT-containing lines of P. tetragonolobus was carried out on an Illumina NextSeq 500 sequencer to identify the underlying genes and contigs responsible for CT biosynthesis. RNA-Seq data generated 102586 and 88433 contigs for the high-CT (HCTW) and low-CT (LCTW) lines of P. tetragonolobus, respectively. Similarity searches against the gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) databases revealed 5210 contigs involved in 229 different pathways. A total of 1235 contigs were found to be differentially expressed between the HCTW and LCTW lines. This study and its findings will provide information for functional and comparative genomic analysis of condensed tannin biosynthesis in this plant specifically and in legumes in general. PMID:28322296
Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics.

PubMed

Timmermans, M J T N; Dodsworth, S; Culverwell, C L; Bocak, L; Ahrens, D; Littlewood, D T J; Pons, J; Vogler, A P

2010-11-01

Mitochondrial genome sequences are important markers for phylogenetics but taxon sampling remains sporadic because of the great effort and cost required to acquire full-length sequences. Here, we demonstrate a simple, cost-effective way to sequence the full complement of protein coding mitochondrial genes from pooled samples using the 454/Roche platform. Multiplexing was achieved without the need for expensive indexing tags ('barcodes'). The method was trialled with a set of long-range polymerase chain reaction (PCR) fragments from 30 species of Coleoptera (beetles) sequenced in a 1/16th sector of a sequencing plate. Long contigs were produced from the pooled sequences with sequencing depths ranging from ∼10 to 100× per contig. Species identity of individual contigs was established via three 'bait' sequences matching disparate parts of the mitochondrial genome obtained by conventional PCR and Sanger sequencing. This proved that assembly of contigs from the sequencing pool was correct. Our study produced sequences for 21 nearly complete and seven partial sets of protein coding mitochondrial genes. Combined with existing sequences for 25 taxa, an improved estimate of basal relationships in Coleoptera was obtained. The procedure could be employed routinely for mitochondrial genome sequencing at the species level, to provide improved species 'barcodes' that currently use the cox1 gene only.
Development of genetic markers in abalone through construction of a SNP database.

PubMed

Kang, J-H; Appleyard, S A; Elliott, N G; Jee, Y-J; Lee, J B; Kang, S W; Baek, M K; Han, Y S; Choi, T-J; Lee, Y S

2011-06-01

In the absence of a reference genome, single-nucleotide polymorphism (SNP) discovery in a group of abalone species was undertaken by random sequence assembly. A web-based interface was constructed, and 11 932 DNA sequences from the genus Haliotis were assembled, with 1321 contigs built. Of these, 118 contigs that consisted of at least ten annotation groups were selected. A total of 1577 putative SNPs were identified from the 118 contigs, with SNPs in several HSP70 gene contigs confirmed by PCR amplification of an 809-bp DNA fragment. SNPs in the HSP70 gene were compared across eight abalone species. A total of 129 polymorphic sites, including heterozygote sites within and among species, were observed. Phylogenetic analysis of the partial HSP70 gene region showed separation of the tested abalone into two groups, one reflecting the southern hemisphere species and the other the northern hemisphere species. Interestingly, Haliotis iris from New Zealand showed a closer relationship to species distributed in the northern Pacific region. Although HSP genes are known to be highly conserved among taxa, the validation of polymorphic SNPs from HSP70 in this mollusc demonstrates the applicability of cross-species SNP markers in abalone and the first step towards universal nuclear markers in Haliotis. © 2010 NFRDI, Animal Genetics © 2010 Stichting International Foundation for Animal Genetics.
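Putative SNP discovery within an assembled contig amounts to finding alignment columns where the member sequences disagree. The sketch below assumes the sequences of one contig are already aligned to equal length, with `-` marking gaps, and reports every column carrying more than one nucleotide. It is only an illustration with a hypothetical function name; the abstract does not describe the calling criteria actually used, and real pipelines usually add depth and quality filters.

```python
def polymorphic_sites(aligned_sequences):
    """Return (column_index, sorted_alleles) for columns with more than one base.

    aligned_sequences: equal-length strings; characters other than A, C, G
    and T (gaps, ambiguity codes) are ignored when collecting alleles.
    """
    sites = []
    for index, column in enumerate(zip(*aligned_sequences)):
        alleles = {base for base in column if base in "ACGT"}
        if len(alleles) > 1:
            sites.append((index, sorted(alleles)))
    return sites
```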
Comparison of carotenoid accumulation and biosynthetic gene expression between Valencia and Rohde Red Valencia sweet oranges.

PubMed

Wei, Xu; Chen, Chunxian; Yu, Qibin; Gady, Antoine; Yu, Yuan; Liang, Guolu; Gmitter, Frederick G

2014-10-01

Carotenoid accumulation and biosynthetic gene expression levels during fruit maturation were compared between ordinary Valencia (VAL) and its more deeply colored mutant Rohde Red Valencia orange (RRV). The two cultivars exhibited different carotenoid profiles and regulatory mechanisms in flavedo and juice sacs, respectively. In flavedo, there was uncoordinated carotenoid accumulation and gene expression in RRV during green stages, which might be related to the expression of certain gene(s) in the MEP (methylerythritol phosphate) pathway. The carotenoid biosynthesis pathway shifting from α,β-xanthophylls to β,β-xanthophylls synthesis occurred in RRV earlier than VAL during orange stages. In juice sacs, the low carotenoid content in both cultivars coincided with low expression of LCYE-Contig03 and LCYE-Contig24 during green stages, suggesting LCYE might be a limiting step for carotenoid accumulation. VAL mainly accumulated violaxanthin, but RRV accumulated β-cryptoxanthin and violaxanthin during orange stages, which corresponded to differences in juice color. Several upstream genes (PDS-Contig17, LCYB-Contig19, and ZDS members) and a downstream gene (ZEP) were expressed at higher levels in RRV than VAL, which might be responsible for greater accumulation of β-cryptoxanthin and violaxanthin in RRV, respectively. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Error Correcting Optical Mapping Data.

PubMed

Mukherjee, Kingshuk; Washimkar, Darshan; Muggli, Martin D; Salmela, Leena; Boucher, Christina

2018-05-26

Optical mapping is a unique system that is capable of producing high-resolution, high-throughput genomic map data that gives information about the structure of a genome [21]. Recently it has been used for scaffolding contigs and assembly validation for large-scale sequencing projects, including the maize [32], goat [6], and amborella [4] genomes. However, a major impediment in the use of this data is the variety and quantity of errors in the raw optical mapping data, which are called Rmaps. The challenges associated with using Rmap data are analogous to dealing with insertions and deletions in the alignment of long reads. Moreover, they are arguably harder to tackle since the data is numerical and susceptible to inaccuracy. We develop cOMET to error correct Rmap data, which to the best of our knowledge is the only optical mapping error correction method. Our experimental results demonstrate that cOMET has high precision and corrects 82.49% of insertion errors and 77.38% of deletion errors in Rmap data generated from the E. coli K-12 reference genome. Out of the deletion errors corrected, 98.26% are true errors. Similarly, out of the insertion errors corrected, 82.19% are true errors. It also successfully scales to large genomes, improving the quality of 78% and 99% of the Rmaps in the plum and goat genomes, respectively. Lastly, we show the utility of error correction by demonstrating how it improves the assembly of Rmap data. Error corrected Rmap data results in an assembly that is more contiguous, and covers a larger fraction of the genome.
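To make the error types concrete, an Rmap can be treated as an ordered list of fragment sizes between consecutive cut sites. The sketch below assumes that a "deletion" is a missed cut site, which fuses two neighbouring fragments, and an "insertion" is a spurious cut site, which splits one fragment in two; this reading of the terms is an assumption. The sketch simulates the errors and is not the cOMET correction algorithm; all names are hypothetical.

```python
def miss_cut_site(rmap, i):
    """Deletion error: the cut between fragments i and i + 1 is not detected."""
    return rmap[:i] + [rmap[i] + rmap[i + 1]] + rmap[i + 2:]


def add_cut_site(rmap, i, offset):
    """Insertion error: a false cut appears `offset` into fragment i."""
    if not 0 < offset < rmap[i]:
        raise ValueError("offset must fall strictly inside the fragment")
    return rmap[:i] + [offset, rmap[i] - offset] + rmap[i + 1:]


def precision(true_corrections, all_corrections):
    """Fraction of corrections that fixed a real error."""
    return true_corrections / all_corrections
```

Both error types leave the total length of the Rmap unchanged, which is one reason they resemble indels in sequence alignment while remaining a numerical rather than symbolic problem.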
Genome Structural Diversity among 31 Bordetella pertussis Isolates from Two Recent U.S. Whooping Cough Statewide Epidemics.

PubMed

Bowden, Katherine E; Weigand, Michael R; Peng, Yanhui; Cassiday, Pamela K; Sammons, Scott; Knipe, Kristen; Rowe, Lori A; Loparev, Vladimir; Sheth, Mili; Weening, Keeley; Tondella, M Lucia; Williams, Margaret M

2016-01-01

During 2010 and 2012, California and Vermont, respectively, experienced statewide epidemics of pertussis with differences seen in the demographic affected, case clinical presentation, and molecular epidemiology of the circulating strains. To overcome limitations of the current molecular typing methods for pertussis, we utilized whole-genome sequencing to gain a broader understanding of how current circulating strains are causing large epidemics. Through the use of combined next-generation sequencing technologies, this study compared de novo, single-contig genome assemblies from 31 out of 33 Bordetella pertussis isolates collected during two separate pertussis statewide epidemics and 2 resequenced vaccine strains. Final genome architecture assemblies were verified with whole-genome optical mapping. Sixteen distinct genome rearrangement profiles were observed in epidemic isolate genomes, all of which were distinct from the genome structures of the two resequenced vaccine strains. These rearrangements appear to be mediated by repetitive sequence elements, such as high-copy-number mobile genetic elements and rRNA operons. Additionally, novel and previously identified single nucleotide polymorphisms were detected in 10 virulence-related genes in the epidemic isolates. Whole-genome variation analysis identified state-specific variants, and coding regions bearing nonsynonymous mutations were classified into functional annotated orthologous groups. Comprehensive studies on whole genomes are needed to understand the resurgence of pertussis and develop novel tools to better characterize the molecular epidemiology of evolving B. pertussis populations. IMPORTANCE Pertussis, or whooping cough, is the most poorly controlled vaccine-preventable bacterial disease in the United States, which has experienced a resurgence for more than a decade. Once viewed as a monomorphic pathogen, B. pertussis strains circulating during epidemics exhibit diversity visible on a genome structural level, previously undetectable by traditional sequence analysis using short-read technologies. For the first time, we combine short- and long-read sequencing platforms with restriction optical mapping for single-contig, de novo assembly of 31 isolates to investigate two geographically and temporally independent U.S. pertussis epidemics. These complete genomes reshape our understanding of B. pertussis evolution and strengthen molecular epidemiology toward one day understanding the resurgence of pertussis.
Identification of ovule transcripts from the Apospory-Specific Genomic Region (ASGR)-carrier chromosome

PubMed Central

2011-01-01

Background Apomixis, asexual seed production in plants, holds great potential for agriculture as a means to fix hybrid vigor. Apospory is a form of apomixis where the embryo develops from an unreduced egg that is derived from a somatic nucellar cell, the aposporous initial, via mitosis. Understanding the molecular mechanism regulating aposporous initial specification will be a critical step toward elucidation of apomixis and also provide insight into developmental regulation and downstream signaling that results in apomixis. To discover candidate transcripts for regulating aposporous initial specification in P. squamulatum, we compared two transcriptomes derived from microdissected ovules at the stage of aposporous initial formation between the apomictic donor parent, P. squamulatum (accession PS26), and an apomictic derived backcross 8 (BC8) line containing only the Apospory-Specific Genomic Region (ASGR)-carrier chromosome from P. squamulatum. Toward this end, two transcriptomes derived from ovules of an apomictic donor parent and its apomictic backcross derivative at the stage of apospory initiation, were sequenced using 454-FLX technology. Results Using 454-FLX technology, we generated 332,567 reads with an average read length of 147 base pairs (bp) for the PS26 ovule transcriptome library and 363,637 reads with an average read length of 142 bp for the BC8 ovule transcriptome library. A total of 33,977 contigs from the PS26 ovule transcriptome library and 26,576 contigs from the BC8 ovule transcriptome library were assembled using the Multifunctional Inertial Reference Assembly program. Using stringent in silico parameters, 61 transcripts were predicted to map to the ASGR-carrier chromosome, of which 49 transcripts were verified as ASGR-carrier chromosome specific. One of the alien expressed genes could be assigned as tightly linked to the ASGR by screening of apomictic and sexual F1s. 
Only one transcript, which did not map to the ASGR, showed expression primarily in reproductive tissue. Conclusions Our results suggest that a strategy of comparative sequencing of transcriptomes between donor parent and backcross lines containing an alien chromosome of interest can be an efficient method of identifying transcripts derived from an alien chromosome in a chromosome addition line. PMID:21521529
Transcriptome characterization and polymorphism detection between subspecies of big sagebrush (Artemisia tridentata)

PubMed Central

2011-01-01

Background Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by establishment of invasive plant species is a serious threat to big sagebrush ecosystem sustainability. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation in this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush. Results cDNA of A. tridentata sspp. tridentata and vaseyana were normalized and sequenced using the 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant blast alignments (≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples. Conclusion We have produced a large EST dataset for Artemisia tridentata, which contains a large sample of the big sagebrush leaf transcriptome. 
SNP mapping among the three subspecies suggests the origin of ssp. wyomingensis via mixed ancestry. A large number of SNP and SSR markers provide the foundation for future research to address questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches. PMID:21767398
Designing of plant artificial chromosome (PAC) by using the Chlorella smallest chromosome as a model system.

PubMed

Noutoshi, Y; Arai, R; Fujie, M; Yamada, T

1997-01-01

As a model for plant-type chromosomes, we have been characterizing the molecular organization of Chlorella vulgaris C-169 chromosome I. To identify chromosome structural elements including the centromeric region and replication origins, we constructed a chromosome I-specific cosmid library and aligned the cosmid clones to generate contigs. So far, more than 80% of the entire chromosome I has been covered. A complete clonal physical reconstitution of chromosome I provides information on the structure and genomic organization of a plant genome. We propose our strategy to construct an artificial chromosome by assembling the functional chromosome structural elements identified on Chlorella chromosome I.
Whole transcriptome analysis of the poultry red mite Dermanyssus gallinae (De Geer, 1778).

PubMed

Schicht, Sabine; Qi, Weihong; Poveda, Lucy; Strube, Christina

2014-03-01

SUMMARY Although the poultry red mite Dermanyssus gallinae (De Geer, 1778) is the major parasitic pest in poultry farming causing substantial economic losses every year, nucleotide data are rare in the public databases. Therefore, de novo sequencing covering the transcriptome of D. gallinae was carried out resulting in a dataset of 232 097 singletons and 42 130 contiguous sequences (contigs) which were subsequently clustered into 24 140 isogroups consisting of 35 788 isotigs. After removal of sequences possibly originating from bacteria or the chicken host, 267 464 sequences (231 657 singletons, 56 contigs and 35 751 isotigs) remained, of which 10·3% showed homology to proteins derived from other organisms. The most significant Blast top-hit species was the mite Metaseiulus occidentalis followed by the tick Ixodes scapularis. To gain functional knowledge of D. gallinae transcripts, sequences were mapped to Gene Ontology terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and parsed to InterProScan. The transcriptome dataset provides new insights into general mite genetics and lays a foundation for future studies on stage-specific transcriptomics as well as genomic, proteomic, and metabolomic explorations and might provide new perspectives to control this parasitic mite by identifying possible drug targets or vaccine candidates. It is also worth noting that in different tested species of the class Arachnida no 28S rRNA was detectable in the rRNA profile, indicating that 28S rRNA might consist of two separate, hydrogen-bonded fragments, whose (heat-induced) disruption may lead to co-migration with 18S rRNA.
Using the Model Perennial Grass Brachypodium sylvaticum to Engineer Resistance to Multiple Abiotic Stresses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gordon, Sean; Reguera, Maria; Sade, Nir

2015-03-20

We are using the perennial model grass Brachypodium sylvaticum to identify combinations of transgenes that enhance tolerance to multiple, simultaneous abiotic stresses. The most successful transgene combinations will ultimately be used to create improved switchgrass (Panicum virgatum L.) cultivars. To further develop B. sylvaticum as a perennial model grass, and facilitate our planned transcriptional profiling, we are sequencing and annotating the genome. We have generated ~40x genome coverage using PacBio sequencing of the largest possible size-selected libraries (18, 22, 25 kb). Our initial assembly using only long-read sequence contained 320 Mb of sequence with an N50 contig length of 315 kb and an N95 contig length of 40 kb. This assembly consists of 2,430 contigs, the largest of which was 1.6 Mb. The estimated genome size based on c-values is 340 Mb, indicating that about 20 Mb of presumably repetitive DNA remains unassembled. Significantly, this assembly is far superior to an assembly created from paired-end short-read sequence at ~100x genome coverage. The short-read-only assembly contained only 226 Mb of sequence in 19k contigs. To aid the assembly of the scaffolds into chromosome-scale assemblies we produced an F2 mapping population and have genotyped 480 individuals using a genotype-by-sequence approach. One of the reasons for using B. sylvaticum as a model system is to determine if the transgenes adversely affect perenniality and winter hardiness. Toward this goal, we examined the freezing tolerance of wild type B. sylvaticum lines to determine the optimal conditions for testing the freezing tolerance of the transgenics. A survey of seven accessions noted significant natural variation in freezing tolerance. Seedling or adult Ain-1 plants, the line used for transformation, survived an 8 hour challenge down to -6 °C and 50% survived a challenge down to -9 °C. Thus, we will be able to easily determine if the transgenes compromise freezing tolerance.
In the effort to develop biotechnological tools for perennial grass improvement, we have completed the transformation of B. sylvaticum with constructs containing 20 genes shown to be associated with enhanced abiotic stress tolerance in monocots. In addition, we have transformed plants with constructs containing a combination of genes (i.e. SARK::IPT- Ubi::HSR1::Ubi::NHX1) in order to simultaneously overexpress genes associated with drought + heat tolerance + salt tolerance. We generated single copy insert T1 lines for all constructs and the generation and bulking of homozygous T2 lines is well underway. In addition to our B. sylvaticum transgenics, we transformed B. distachyon with many of the same genes. Some of the transgenic B. distachyon plants subjected to a combined stress of both drought and salinity were able to produce higher yields than wild type plants. Our results indicate a great potential for the development of grasses with improved performance and yield in water-limited areas.
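The N50 and N95 contig lengths quoted in the abstract above are summary statistics of an assembly's contig length distribution. As a minimal sketch, assuming the common definition (the largest length L such that contigs of length L or more hold at least x% of the assembled bases), they could be computed as follows. The function name `nx` and the toy contig lengths are illustrative assumptions and are not taken from the B. sylvaticum assembly.

```python
def nx(lengths, x):
    """Return the Nx contig length for a list of contig lengths.

    Nx is taken here as the largest length L such that contigs of
    length >= L together contain at least x percent of all bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    return ordered[-1]


# Hypothetical contig lengths in kb (total 300 kb), for illustration only.
toy_contigs_kb = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(toy_contigs_kb, 50)
n95 = nx(toy_contigs_kb, 95)
print(n50, n95)
```

With these toy values, half of the bases are reached after the two longest contigs, while 95% requires nearly all of them, which is why N95 is always less than or equal to N50.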
What Do Pre-Service Physics Teachers Know and Think about Concept Mapping?

ERIC Educational Resources Information Center

Didis, Nilüfer; Özcan, Özgür; Azar, Ali

2014-01-01

In order to use concept maps in physics classes effectively, teachers' knowledge and ideas about concept mapping are as important as the physics knowledge used in mapping. For this reason, we aimed to examine pre-service physics teachers' knowledge on concept mapping, their ideas about the implementation of concept mapping in physics…
Genome puzzle master (GPM): an integrated pipeline for building and editing pseudomolecules from fragmented sequences

PubMed Central

Zhang, Jianwei; Kudrna, Dave; Mu, Ting; Li, Weiming; Copetti, Dario; Yu, Yeisoo; Goicoechea, Jose Luis; Lei, Yang; Wing, Rod A.

2016-01-01

Abstract Motivation: Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool—Genome Puzzle Master (GPM)—that enables the integration of additional genomic signposts to edit and build ‘new-gen-assemblies’ that result in high-quality ‘annotation-ready’ pseudomolecules. Results: With GPM, loaded datasets can be connected to each other via their logical relationships which accomplishes tasks to ‘group,’ ‘merge,’ ‘order and orient’ sequences in a draft assembly. Manual editing can also be performed with a user-friendly graphical interface. Final pseudomolecules reflect a user’s total data package and are available for long-term project management. GPM is a web-based pipeline and an important part of a Laboratory Information Management System (LIMS) which can be easily deployed on local servers for any genome research laboratory. Availability and Implementation: The GPM (with LIMS) package is available at https://github.com/Jianwei-Zhang/LIMS Contacts: jzhang@mail.hzau.edu.cn or rwing@mail.arizona.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27318200
A knockout mutation in the lignin biosynthesis gene CCR1 explains a major QTL for acid detergent lignin content in Brassica napus seeds.

PubMed

Liu, Liezhao; Stein, Anna; Wittkop, Benjamin; Sarvari, Pouya; Li, Jiana; Yan, Xingying; Dreyer, Felix; Frauen, Martin; Friedt, Wolfgang; Snowdon, Rod J

2012-05-01

Seed coat phenolic compounds represent important antinutritive fibre components that cause a considerable reduction in value of seed meals from oilseed rape (Brassica napus). The nutritionally most important fibre compound is acid detergent lignin (ADL), to which a significant contribution is made by phenylpropanoid-derived lignin precursors. In this study, we used bulked-segregant analysis in a population of recombinant inbred lines (RILs) from a cross of the Chinese oilseed rape lines GH06 (yellow seed, low ADL) and P174 (black seed, high ADL) to identify markers with tight linkage to a major quantitative trait locus (QTL) for seed ADL content. Fine mapping of the QTL was performed in a backcross population comprising 872 BC(1)F(2) plants from a cross of an F(7) RIL from the above-mentioned population, which was heterozygous for this major QTL, and P174. A 3:1 phenotypic segregation for seed ADL content indicated that a single, dominant, major locus causes a substantial reduction in ADL. This locus was successively narrowed to 0.75 cM using in silico markers derived from a homologous Brassica rapa sequence contig spanning the QTL. Subsequently, we located a B. rapa orthologue of the key lignin biosynthesis gene CINNAMOYL CO-A REDUCTASE 1 (CCR1) only 600 kbp (0.75 cM) upstream of the nearest linked marker. Sequencing of PCR amplicons, covering the full-length coding sequences of Bna.CCR1 homologues, revealed a locus in P174 whose sequence corresponds to the Brassica oleracea wild-type allele from chromosome C8. In GH06, however, this allele is replaced by a homologue derived from chromosome A9 that contains a loss-of-function frameshift mutation in exon 1. Genetic and physical map data imply that this loss-of-function allele has replaced a functional Bna.CCR1 locus on chromosome C8 in GH06 by homoeologous non-reciprocal translocation.
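A 3:1 phenotypic segregation such as the one reported above is usually checked with a chi-square goodness-of-fit test. The sketch below shows one way such a check could be done. The abstract gives only the population size (872 plants), so the class counts used here are hypothetical, and the function name and the 5% critical value for one degree of freedom are assumptions of this illustration rather than details from the study.

```python
def chi_square_3_to_1(n_dominant, n_recessive):
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    total = n_dominant + n_recessive
    expected = (0.75 * total, 0.25 * total)
    observed = (n_dominant, n_recessive)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical value of the chi-square distribution, df = 1, alpha = 0.05.
CRITICAL_0_05_DF1 = 3.841

# Hypothetical split of 872 plants; the real class counts are not in the abstract.
stat = chi_square_3_to_1(660, 212)
fits_3_to_1 = stat < CRITICAL_0_05_DF1
print(round(stat, 4), fits_3_to_1)
```

A statistic below the critical value means the observed counts do not deviate significantly from 3:1, which is consistent with a single dominant locus.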

De novo genome assembly of the red silk cotton tree (Bombax ceiba).

PubMed

Gao, Yong; Wang, Haibo; Liu, Chao; Chu, Honglong; Dai, Dongqin; Song, Shengnan; Yu, Long; Han, Lihong; Fu, Yi; Tian, Bin; Tang, Lizhou

2018-05-01

Bombax ceiba L. (the red silk cotton tree) is a large deciduous tree that is distributed in tropical and sub-tropical Asia as well as northern Australia. It has great economic and ecological importance, with several applications in industry and traditional medicine in many Asian countries. To facilitate further utilization of this plant resource, we present here the draft genome sequence for B. ceiba. We assembled a relatively intact genome of B. ceiba by using PacBio single-molecule sequencing and BioNano optical mapping technologies. The final draft genome is approximately 895 Mb long, with contig and scaffold N50 sizes of 1.0 Mb and 2.06 Mb, respectively. The high-quality draft genome assembly of B. ceiba will be a valuable resource enabling further genetic improvement and more effective use of this tree species.
HOWDY: an integrated database system for human genome research

PubMed Central

Hirakawa, Mika

2002-01-01

HOWDY is an integrated database system for accessing and analyzing human genomic information (http://www-alis.tokyo.jst.go.jp/HOWDY/). HOWDY stores information about relationships between genetic objects and the data extracted from a number of databases. HOWDY consists of an Internet accessible user interface that allows thorough searching of the human genomic databases using the gene symbols and their aliases. It also permits flexible editing of the sequence data. The database can be searched using simple words and the search can be restricted to a specific cytogenetic location. Linear maps displaying markers and genes on contig sequences are available, from which an object can be chosen. Any search starting point identifies all the information matching the query. HOWDY provides a convenient search environment of human genomic data for scientists unsure which database is most appropriate for their search. PMID:11752279
Draft genome sequence of Lactobacillus mali KCTC 3596.

PubMed

Kim, Dong-Wook; Choi, Sang-Haeng; Kang, Aram; Nam, Seong-Hyeuk; Kim, Dae-Soo; Kim, Ryong Nam; Kim, Aeri; Park, Hong-Seog

2011-09-01

We announce the draft genome sequence of the type strain Lactobacillus mali KCTC 3596 (2,652,969 bp, with a G+C content of 36.0%), which is one of the most prevalent lactic acid bacteria present during the manufacturing process of apple juice. The genome consists of 122 large contigs (>100 bp). All of the contigs were assembled by Newbler Assembler 2.3 (454 Life Science). Copyright © 2011, American Society for Microbiology. All Rights Reserved.
Role of Modular Polyketide Synthases in the Production of Polyether Ladder Compounds in Ciguatoxin-Producing Gambierdiscus polynesiensis and G. excentricus (Dinophyceae).

PubMed

Kohli, Gurjeet S; Campbell, Katrina; John, Uwe; Smith, Kirsty F; Fraga, Santiago; Rhodes, Lesley L; Murray, Shauna A

2017-09-01

Gambierdiscus, a benthic dinoflagellate, produces ciguatoxins that cause the human illness Ciguatera. Ciguatoxins are polyether ladder compounds that have a polyketide origin, indicating that polyketide synthases (PKS) are involved in their production. We sequenced transcriptomes of Gambierdiscus excentricus and Gambierdiscus polynesiensis and found 264 contigs encoding single domain ketoacyl synthases (KS; G. excentricus: 106, G. polynesiensis: 143) and ketoreductases (KR; G. excentricus: 7, G. polynesiensis: 8) with sequence similarity to type I PKSs, as reported in other dinoflagellates. In addition, 24 contigs (G. excentricus: 3, G. polynesiensis: 21) encoding multiple PKS domains (forming typical type I PKS modules) were found. The proposed structure produced by one of these megasynthases resembles a partial carbon backbone of a polyether ladder compound. Seventeen contigs encoding single domain KS, KR, s-malonyltransacylase, dehydratase and enoyl reductase with sequence similarity to type II fatty acid synthases (FAS) in plants were found. Type I PKS and type II FAS genes were distinguished based on the arrangement of domains on the contigs and their sequence similarity and phylogenetic clustering with known PKS/FAS genes in other organisms. This differentiation of PKS and FAS pathways in Gambierdiscus is important, as it will facilitate approaches to investigating toxin biosynthesis pathways in dinoflagellates. © 2017 The Author(s) Journal of Eukaryotic Microbiology © 2017 International Society of Protistologists.
Transcriptome analysis of Nautilus and pygmy squid developing eye provides insights in lens and eye evolution.

PubMed

Sousounis, Konstantinos; Ogura, Atsushi; Tsonis, Panagiotis A

2013-01-01

Coleoid cephalopods like squids have a camera-type eye similar to vertebrates. On the other hand, Nautilus (Nautiloids) has a pinhole eye that lacks lens and cornea. Since pygmy squid and Nautilus are closely related species they are excellent model organisms to study eye evolution. Having been able to collect Nautilus embryos, we employed next-generation RNA sequencing using Nautilus and pygmy squid developing eyes. Their transcriptomes were compared and analyzed. Enrichment analysis of Gene Ontology revealed that contigs related to nucleic acid binding were largely up-regulated in squid, while the ones related to metabolic processes and extracellular matrix-related genes were up-regulated in Nautilus. These differences are most likely correlated with the complexity of tissue organization in these species. Moreover, when the analysis focused on the eye-related contigs several interesting patterns emerged. First, contigs from both species related to eye tissue differentiation and morphogenesis as well as to cilia showed best hits with their human counterparts, while contigs related to rhabdomeric photoreceptors showed the best hit with their Drosophila counterparts. This bolsters the idea that eye morphogenesis genes have been generally conserved in evolution, and complements other studies showing that genes involved in photoreceptor differentiation clearly follow the diversification of invertebrate (rhabdomeric) and vertebrate (ciliated) photoreceptors. Interestingly, some contigs showed as good a hit with Drosophila and human homologues in Nautilus and squid samples. One of them, capt/CAP1, is known to be preferentially expressed in Drosophila developing eye and in vertebrate lens. Importantly, our analysis also provided evidence of gene duplication and diversification of their function in both species.
One of these genes is the Neurofibromatosis 1 (NF1/Nf1), which in mice has been implicated in lens formation, suggesting a hitherto unsuspected role in the evolution of the lens in molluscs.
Comparing de novo assemblers for 454 transcriptome data

PubMed Central

2010-01-01

Background Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis. Results Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs. Conclusions Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. 
Combining differently optimal assemblies from different programs, however, gave a more credible final product, and this strategy is recommended. PMID:20950480
Transcriptome analysis in cotton boll weevil (Anthonomus grandis) and RNA interference in insect pests.

PubMed

Firmino, Alexandre Augusto Pereira; Fonseca, Fernando Campos de Assis; de Macedo, Leonardo Lima Pepino; Coelho, Roberta Ramos; Antonino de Souza, José Dijair; Togawa, Roberto Coiti; Silva-Junior, Orzenil Bonfim; Pappas, Georgios Joannis; da Silva, Maria Cristina Mattar; Engler, Gilbert; Grossi-de-Sa, Maria Fatima

2013-01-01

Cotton plants are subjected to the attack of several insect pests. In Brazil, the cotton boll weevil, Anthonomus grandis, is the most important cotton pest. The use of insecticidal proteins and gene silencing by interference RNA (RNAi) as techniques for insect control are promising strategies, which have been applied in the last few years. For this insect, there is not much molecular information available in databases. Using 454-pyrosequencing methodology, the transcriptome of all developmental stages of the insect pest, A. grandis, was analyzed. The A. grandis transcriptome analysis resulted in more than 500,000 reads and a data set of 20,841 high-quality contigs. After sequence assembly and annotation, around 10,600 contigs had at least one BLAST hit against the NCBI non-redundant protein database and 65.7% were similar to Tribolium castaneum sequences. A comparison of A. grandis, Drosophila melanogaster and Bombyx mori protein families' data showed higher similarity to dipteran than to lepidopteran sequences. Several contigs of genes encoding proteins involved in the RNAi mechanism were found. PAZ domain sequences extracted from the transcriptome showed high similarity and conservation for the most important functional and structural motifs when compared to PAZ domains from 5 species. Two SID-like contigs were phylogenetically analyzed and grouped with T. castaneum SID-like proteins. No RdRP gene was found. A contig matching chitin synthase 1 was mined from the transcriptome. dsRNA microinjection of a chitin synthase gene to A. grandis female adults resulted in normal oviposition of unviable eggs and malformed live larvae that were unable to develop in artificial diet. This is the first study that characterizes the transcriptome of the coleopteran, A. grandis. A new and representative transcriptome database for this insect pest is now available. All data support the state of the art of RNAi mechanism in insects.
Transcriptome Analysis in Cotton Boll Weevil (Anthonomus grandis) and RNA Interference in Insect Pests

PubMed Central

Coelho, Roberta Ramos; Antonino de Souza Jr, José Dijair; Togawa, Roberto Coiti; Silva-Junior, Orzenil Bonfim; Pappas-Jr, Georgios Joannis; da Silva, Maria Cristina Mattar; Engler, Gilbert; Grossi-de-Sa, Maria Fatima

2013-01-01

Cotton plants are subjected to the attack of several insect pests. In Brazil, the cotton boll weevil, Anthonomus grandis, is the most important cotton pest. The use of insecticidal proteins and gene silencing by interference RNA (RNAi) as techniques for insect control are promising strategies, which have been applied in the last few years. For this insect, there is not much molecular information available in databases. Using 454-pyrosequencing methodology, the transcriptome of all developmental stages of the insect pest, A. grandis, was analyzed. The A. grandis transcriptome analysis resulted in more than 500,000 reads and a data set of 20,841 high-quality contigs. After sequence assembly and annotation, around 10,600 contigs had at least one BLAST hit against the NCBI non-redundant protein database and 65.7% were similar to Tribolium castaneum sequences. A comparison of A. grandis, Drosophila melanogaster and Bombyx mori protein families’ data showed higher similarity to dipteran than to lepidopteran sequences. Several contigs of genes encoding proteins involved in the RNAi mechanism were found. PAZ domain sequences extracted from the transcriptome showed high similarity and conservation for the most important functional and structural motifs when compared to PAZ domains from 5 species. Two SID-like contigs were phylogenetically analyzed and grouped with T. castaneum SID-like proteins. No RdRP gene was found. A contig matching chitin synthase 1 was mined from the transcriptome. dsRNA microinjection of a chitin synthase gene to A. grandis female adults resulted in normal oviposition of unviable eggs and malformed live larvae that were unable to develop in artificial diet. This is the first study that characterizes the transcriptome of the coleopteran, A. grandis. A new and representative transcriptome database for this insect pest is now available. All data support the state of the art of RNAi mechanism in insects. PMID:24386449
Transcriptome Analysis of Nautilus and Pygmy Squid Developing Eye Provides Insights in Lens and Eye Evolution

PubMed Central

Sousounis, Konstantinos; Ogura, Atsushi; Tsonis, Panagiotis A.

2013-01-01

Coleoid cephalopods like squids have a camera-type eye similar to vertebrates. On the other hand, Nautilus (Nautiloids) has a pinhole eye that lacks lens and cornea. Since pygmy squid and Nautilus are closely related species they are excellent model organisms to study eye evolution. Having been able to collect Nautilus embryos, we employed next-generation RNA sequencing using Nautilus and pygmy squid developing eyes. Their transcriptomes were compared and analyzed. Enrichment analysis of Gene Ontology revealed that contigs related to nucleic acid binding were largely up-regulated in squid, while the ones related to metabolic processes and extracellular matrix-related genes were up-regulated in Nautilus. These differences are most likely correlated with the complexity of tissue organization in these species. Moreover, when the analysis focused on the eye-related contigs several interesting patterns emerged. First, contigs from both species related to eye tissue differentiation and morphogenesis as well as to cilia showed best hits with their human counterparts, while contigs related to rhabdomeric photoreceptors showed the best hit with their Drosophila counterparts. This bolsters the idea that eye morphogenesis genes have been generally conserved in evolution, and complements other studies showing that genes involved in photoreceptor differentiation clearly follow the diversification of invertebrate (rhabdomeric) and vertebrate (ciliated) photoreceptors. Interestingly, some contigs showed as good a hit with Drosophila and human homologues in Nautilus and squid samples. One of them, capt/CAP1, is known to be preferentially expressed in Drosophila developing eye and in vertebrate lens. Importantly, our analysis also provided evidence of gene duplication and diversification of their function in both species.
One of these genes is the Neurofibromatosis 1 (NF1/Nf1), which in mice has been implicated in lens formation, suggesting a hitherto unsuspected role in the evolution of the lens in molluscs. PMID:24205087
The gene for human U2 snRNP auxiliary factor small 35-kDa subunit (U2AF1) maps to the progressive myoclonus epilepsy (EPM1) critical region on chromosome 21q22.3

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lalioti, M.D.; Rossier, C.; Antonarakis, S.E.

1996-04-15

We used targeted exon trapping to clone portions of genes from human chromosome 21q22.3. One trapped sequence showed complete homology with the cDNA of human U2AF35 (M96982; HGM-approved nomenclature U2AF1), which encodes the small 35-kDa subunit of the U2 snRNP auxiliary factor. Using the U2AF1 cDNA as a probe, we mapped this gene to cosmid Q15D2, a P1, and YAC 350F7 of the Chumakov et al. contig, close to the cystathionine-β-synthase gene (CBS) on 21q22.3. This localization was confirmed by PCR using oligonucleotides from the 3′ UTR and by FISH. As U2AF1 associates with a number of different factors during mRNA splicing, overexpression in trisomy 21 individuals could contribute to some Down syndrome phenotypes by interfering with the splicing process. Furthermore, because this gene maps in the critical region for the progressive myoclonus epilepsy I locus (EPM1), mutation analysis will be carried out in patients to evaluate the potential role of U2AF1 as a candidate for EPM1. 24 refs., 1 fig.
A comprehensive transcriptome assembly of Pigeonpea (Cajanus cajan L.) using sanger and second-generation sequencing platforms.

PubMed

Kudapa, Himabindu; Bharti, Arvind K; Cannon, Steven B; Farmer, Andrew D; Mulaosmanovic, Benjamin; Kramer, Robin; Bohra, Abhishek; Weeks, Nathan T; Crow, John A; Tuteja, Reetu; Shah, Trushar; Dutta, Sutapa; Gupta, Deepak K; Singh, Archana; Gaikwad, Kishor; Sharma, Tilak R; May, Gregory D; Singh, Nagendra K; Varshney, Rajeev K

2012-09-01

A comprehensive transcriptome assembly for pigeonpea has been developed by analyzing 128.9 million short Illumina GA IIx single end reads, 2.19 million single end FLX/454 reads, and 18 353 Sanger expressed sequence tags from more than 16 genotypes. The resultant transcriptome assembly, referred to as CcTA v2, comprised 21 434 transcript assembly contigs (TACs) with an N50 of 1510 bp, the largest one being ~8 kb. Of the 21 434 TACs, 16 622 (77.5%) could be mapped on to the soybean genome build 1.0.9 under fairly stringent alignment parameters. Based on knowledge of intron junctions, 10 009 primer pairs were designed from 5033 TACs for amplifying intron spanning regions (ISRs). By using in silico mapping of BAC-end-derived SSR loci of pigeonpea on the soybean genome as a reference, putative mapping positions at the chromosome level were predicted for 6284 ISR markers, covering all 11 pigeonpea chromosomes. A subset of 128 ISR markers were analyzed on a set of eight genotypes. While 116 markers were validated, 70 markers showed one to three alleles, with an average of 0.16 polymorphism information content (PIC) value. In summary, the CcTA v2 transcript assembly and ISR markers will serve as a useful resource to accelerate genetic research and breeding applications in pigeonpea.
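The polymorphism information content (PIC) reported above is a function of the allele frequencies at a marker. The abstract does not say which formula was used; the sketch below assumes the widely used Botstein et al. definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2. The function name and the allele frequencies are hypothetical and serve only to illustrate the calculation.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content for one marker.

    Assumes the Botstein et al. form:
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    correction = sum(2 * (p * p) * (q * q)
                     for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - correction


# Hypothetical allele frequencies, not taken from the pigeonpea data.
balanced = pic([0.5, 0.5])   # two equally frequent alleles
skewed = pic([0.9, 0.1])     # one rare allele
print(round(balanced, 4), round(skewed, 4))
```

Under this definition a biallelic marker cannot exceed a PIC of 0.375, and a marker with one rare allele scores much lower, so a low average PIC is expected when most markers show few alleles at uneven frequencies.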
Development of a dense SNP-based linkage map of an apple rootstock progeny using the Malus Infinium whole genome genotyping array.

PubMed

Antanaviciute, Laima; Fernández-Fernández, Felicidad; Jansen, Johannes; Banchi, Elisa; Evans, Katherine M; Viola, Roberto; Velasco, Riccardo; Dunwell, Jim M; Troggio, Michela; Sargent, Daniel J

2012-05-25

A whole-genome genotyping array has previously been developed for Malus using SNP data from 28 Malus genotypes. This array offers the prospect of high throughput genotyping and linkage map development for any given Malus progeny. To test the applicability of the array for mapping in diverse Malus genotypes, we applied the array to the construction of a SNP-based linkage map of an apple rootstock progeny. Of the 7,867 Malus SNP markers on the array, 1,823 (23.2%) were heterozygous in one of the two parents of the progeny, 1,007 (12.8%) were heterozygous in both parental genotypes, whilst just 2.8% of the 921 Pyrus SNPs were heterozygous. A linkage map spanning 1,282.2 cM was produced comprising 2,272 SNP markers, 306 SSR markers and the S-locus. The length of the M432 linkage map was increased by 52.7 cM with the addition of the SNP markers, whilst marker density increased from 3.8 cM/marker to 0.5 cM/marker. Just three regions in excess of 10 cM remain where no markers were mapped. We compared the positions of the mapped SNP markers on the M432 map with their predicted positions on the 'Golden Delicious' genome sequence. A total of 311 markers (13.7% of all mapped markers) mapped to positions that conflicted with their predicted positions on the 'Golden Delicious' pseudo-chromosomes, indicating the presence of paralogous genomic regions or mis-assignments of genome sequence contigs during the assembly and anchoring of the genome sequence. We incorporated data for the 2,272 SNP markers onto the map of the M432 progeny and have presented the most complete and saturated map of the full 17 linkage groups of M. pumila to date. The data were generated rapidly in a high-throughput semi-automated pipeline, permitting significant savings in time and cost over linkage map construction using microsatellites. 
The application of the array will permit linkage maps to be developed for QTL analyses in a cost-effective manner, and the identification of SNPs that have been assigned erroneous positions on the 'Golden Delicious' reference sequence will assist in the continued improvement of the genome sequence assembly for that variety.
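The percentages and the marker density quoted in this abstract can be recomputed from the counts it gives. The short check below is only a sketch: it assumes that the density is the map length divided by all mapped loci (SNPs, SSRs and the S-locus), which the abstract does not state explicitly, and the variable names are illustrative.

```python
# Counts taken from the abstract above.
map_length_cm = 1282.2
snp_markers = 2272
ssr_markers = 306
s_locus = 1
array_snps = 7867

# Assumption: density = map length / all mapped loci.
total_loci = snp_markers + ssr_markers + s_locus
density_cm_per_marker = map_length_cm / total_loci

het_one_parent_pct = 100 * 1823 / array_snps   # heterozygous in one parent
het_both_parents_pct = 100 * 1007 / array_snps  # heterozygous in both parents
conflicting_pct = 100 * 311 / snp_markers       # conflicts with 'Golden Delicious'

print(round(density_cm_per_marker, 1))
print(round(het_one_parent_pct, 1), round(het_both_parents_pct, 1))
print(round(conflicting_pct, 1))
```

Under that assumption the recomputed values round to the figures reported in the abstract (0.5 cM/marker, 23.2%, 12.8% and 13.7%).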
Towards isolation of the gene for X-linked retinitis pigmentosa (RP3)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dry, K.L.; Aldred, M.A.; Hardwick, L.J.

1994-09-01

Until recently the region of interest containing the gene for X-linked retinitis pigmentosa (RP3) was thought to lie between CYBB (Xp21.1) and the proximal end of the deletion in patient BB (JBBprox). This region was thought to span 100-150 kb. Here we present new mapping data to show that the distance between the 5′ (most proximal) end of CYBB and JBBprox is only 50 kb. Recently Roux et al. (1994) have described the isolation of a gene within this region but this showed no disease-associated changes. Further evidence from mapping the deletion in patient NF (who suffered from McLeod's syndrome and CGD but not RP) and from linkage analysis of our RP3 families with a new dinucleotide repeat suggests that the gene must extend proximally from JBBprox. In order to extend the region of search we have constructed a YAC contig spanning 800 kb to OTC. We are continuing our search for the RP3 gene using a variety of strategies including exon trapping and cDNA enrichment as well as direct screening of cDNA libraries with subclones from this region.
Development and characterization of BAC-end sequence derived SSRs, and their incorporation into a new higher density genetic map for cultivated peanut (Arachis hypogaea L.)

PubMed Central

2012-01-01

Background Cultivated peanut (Arachis hypogaea L.) is an important crop worldwide, valued for its edible oil and digestible protein. It has a very narrow genetic base that may well derive from a relatively recent single polyploidization event. Accordingly molecular markers have low levels of polymorphism and the number of polymorphic molecular markers available for cultivated peanut is still limiting. Results Here, we report a large set of BAC-end sequences (BES), use them for developing SSR (BES-SSR) markers, and apply them in genetic linkage mapping. The majority of BESs had no detectable homology to known genes (49.5%) followed by sequences with similarity to known genes (44.3%), and miscellaneous sequences (6.2%) such as transposable element, retroelement, and organelle sequences. A total of 1,424 SSRs were identified from 36,435 BESs. Among these identified SSRs, dinucleotide (47.4%) and trinucleotide (37.1%) SSRs were predominant. The new set of 1,152 SSRs as well as about 4,000 published or unpublished SSRs were screened against two parents of a mapping population, generating 385 polymorphic loci. A genetic linkage map was constructed, consisting of 318 loci onto 21 linkage groups and covering a total of 1,674.4 cM, with an average distance of 5.3 cM between adjacent loci. Two markers related to resistance gene homologs (RGH) were mapped to two different groups, thus anchoring 1 RGH-BAC contig and 1 singleton. Conclusions The SSRs mined from BESs will be of use in further molecular analysis of the peanut genome, providing a novel set of markers, genetically anchoring BAC clones, and incorporating gene sequences into a linkage map. This will aid in the identification of markers linked to genes of interest and map-based cloning. PMID:22260238
[Psychopharmacology in aviation and astronautics].

PubMed

Vasil'ev, P V; Glod, G D

1977-01-01

Flights aboard modern vehicles are associated with high nervous-emotional and physical stresses. This may induce depletion of reserve capabilities, development of fatigue and, consequently, reduction of work capacity of crewmembers. The paper discusses approaches and results of the use of drugs by pilots and cosmonauts in order to alleviate their fatigue and emotional stress. It gives indications and contraindications for the administration of stimulants and tranquilizers. On the basis of a comprehensive analysis of the literature data and their own findings, the authors draw the conclusion that the use of stimulants and anxiolytics may increase the level of reliability and performance of air- and spacecraft pilots during programmed and, particularly, contingent situations of the flight.
The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans.

PubMed

Tully, Benjamin J; Graham, Elaina D; Heidelberg, John F

2018-01-16

Microorganisms play a crucial role in mediating global biogeochemical cycles in the marine environment. By reconstructing the genomes of environmental organisms through metagenomics, researchers are able to study the metabolic potential of Bacteria and Archaea that are resistant to isolation in the laboratory. Utilizing the large metagenomic dataset generated from 234 samples collected during the Tara Oceans circumnavigation expedition, we were able to assemble 102 billion paired-end reads into 562 million contigs, which in turn were co-assembled and consolidated into 7.2 million contigs ≥2 kb in length. Approximately 1 million of these contigs were binned to reconstruct draft genomes. In total, 2,631 draft genomes with an estimated completion of ≥50% were generated (1,491 draft genomes >70% complete; 603 genomes >90% complete). A majority of the draft genomes were manually assigned phylogeny based on sets of concatenated phylogenetic marker genes and/or 16S rRNA gene sequences. The draft genomes are now publicly available for the research community at large.
The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color.

PubMed

Motamayor, Juan C; Mockaitis, Keithanne; Schmutz, Jeremy; Haiminen, Niina; Livingstone, Donald; Cornejo, Omar; Findley, Seth D; Zheng, Ping; Utro, Filippo; Royaert, Stefan; Saski, Christopher; Jenkins, Jerry; Podicheti, Ram; Zhao, Meixia; Scheffler, Brian E; Stack, Joseph C; Feltus, Frank A; Mustiga, Guiliana M; Amores, Freddy; Phillips, Wilbert; Marelli, Jean Philippe; May, Gregory D; Shapiro, Howard; Ma, Jianxin; Bustamante, Carlos D; Schnell, Raymond J; Main, Dorrie; Gilbert, Don; Parida, Laxmi; Kuhn, David N

2013-06-03

Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.
High-throughput sequencing of three Lemnoideae (duckweeds) chloroplast genomes from total DNA.

PubMed

Wang, Wenqin; Messing, Joachim

2011-01-01

Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaptation of aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. More than 1,000-fold coverage of the chloroplast genome was reached from total DNA by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.
High-Throughput Sequencing of Three Lemnoideae (Duckweeds) Chloroplast Genomes from Total DNA

PubMed Central

Wang, Wenqin; Messing, Joachim

2011-01-01

Background Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaptation of aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. Methods We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. Conclusions This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. More than 1,000-fold coverage of the chloroplast genome was reached from total DNA by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power. PMID:21931804
Analyses of Hypomethylated Oil Palm Gene Space

PubMed Central

Jayanthi, Nagappan; Mohd-Amin, Ab Halim; Azizi, Norazah; Chan, Kuang-Lim; Maqbool, Nauman J.; Maclean, Paul; Brauning, Rudi; McCulloch, Alan; Moraga, Roger; Ong-Abdullah, Meilina; Singh, Rajinder

2014-01-01

Demand for palm oil has been increasing by an average of ∼8% over the past decade and currently accounts for about 59% of the world's vegetable oil market. This drives the need to increase palm oil production. Nevertheless, due to the increasing need for sustainable production, it is imperative to increase productivity rather than the area cultivated. Studies on the oil palm genome are essential to help identify genes or markers that are associated with important processes or traits, such as flowering, yield and disease resistance. To achieve this, 294,115 and 150,744 sequences from the hypomethylated or gene-rich regions of the Elaeis guineensis and E. oleifera genomes were sequenced and assembled into contigs. An additional 16,427 shotgun sequences and 176 bacterial artificial chromosomes (BAC) were also generated to check the quality of libraries constructed. Comparison of these sequences revealed that although the methylation-filtered libraries were sequenced at low coverage, they still tagged at least 66% of the RefSeq supported genes in the BAC and had a filtration power of at least 2.0. A total of 33,752 microsatellites and 40,820 high-quality single nucleotide polymorphism (SNP) markers were identified. These represent the most comprehensive collection of microsatellites and SNPs to date and would be an important resource for genetic mapping and association studies. The gene models predicted from the assembled contigs were mined for genes of interest, and 242, 65 and 14 oil palm transcription factors, resistance genes and miRNAs were identified respectively. Examples of the transcription factors tagged include those associated with floral development and tissue culture, such as homeodomain proteins, MADS, Squamosa and Apetala2. The E. guineensis and E. oleifera hypomethylated sequences provide an important resource to understand the molecular mechanisms associated with important agronomic traits in oil palm. PMID:24497974

The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

PubMed Central

2013-01-01

Background Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. Conclusions We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits. PMID:23731509
Finishing bacterial genome assemblies with Mix.

PubMed

Soueidan, Hayssam; Maurier, Florence; Groppi, Alexis; Sirand-Pugnet, Pascal; Tardy, Florence; Citti, Christine; Dupuy, Virginie; Nikolski, Macha

2013-01-01

Among challenges that hamper reaping the benefits of genome assembly are both unfinished assemblies and the ensuing experimental costs. First, numerous software solutions for genome de novo assembly are available, each having its advantages and drawbacks, without clear guidelines as to how to choose among them. Second, these solutions produce draft assemblies that often require a resource-intensive finishing phase. In this paper we address these two aspects by developing Mix, a tool that mixes two or more draft assemblies, without relying on a reference genome, with the goal of reducing contig fragmentation and thus speeding up genome finishing. The proposed algorithm builds an extension graph where vertices represent extremities of contigs and edges represent existing alignments between these extremities. These alignment edges are used for contig extension. The resulting output assembly corresponds to a set of paths in the extension graph that maximizes the cumulative contig length. We evaluate the performance of Mix on bacterial NGS data from the GAGE-B study and apply it to newly sequenced Mycoplasma genomes. Resulting final assemblies demonstrate a significant improvement in the overall assembly quality. In particular, Mix is consistent by providing better overall quality results even when the choice is guided solely by standard assembly statistics, as is the case for de novo projects. Mix is implemented in Python and is available at https://github.com/cbib/MIX; novel data for our Mycoplasma study are available at http://services.cbib.u-bordeaux2.fr/mix/.
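The extension-graph idea described in this abstract can be illustrated with a toy example. The sketch below is not the Mix implementation and does not use its code or file formats. It assumes a hypothetical input in which each alignment links one extremity ("L" or "R") of a contig to an extremity of another contig, it ignores overlap lengths, and it searches exhaustively for a single best path rather than a set of paths, which is only feasible for a handful of contigs. All names (`build_extension_graph`, `best_path`, the toy contigs) are illustrative.

```python
from collections import defaultdict


def build_extension_graph(alignments):
    """Map each contig extremity to the extremities it aligns with."""
    graph = defaultdict(set)
    for a, b in alignments:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def other_end(extremity):
    """Return the opposite extremity of the same contig."""
    contig, end = extremity
    return (contig, "R" if end == "L" else "L")


def best_path(contigs, alignments):
    """Exhaustively find the contig chain with the greatest total length."""
    graph = build_extension_graph(alignments)
    best = (0, [])

    def extend(entry, used, total, order):
        nonlocal best
        contig = entry[0]
        used = used | {contig}
        total += contigs[contig]
        order = order + [contig]
        if total > best[0]:
            best = (total, order)
        # Leave the contig through its other extremity and follow
        # alignment edges into contigs that are not yet on the path.
        for nxt in graph.get(other_end(entry), ()):
            if nxt[0] not in used:
                extend(nxt, used, total, order)

    for name in contigs:
        for end in ("L", "R"):
            extend((name, end), frozenset(), 0, [])
    return best


# Hypothetical toy input: contig lengths in bp and extremity alignments.
contigs = {"a1": 500, "a2": 300, "b1": 400}
alignments = [(("a1", "R"), ("b1", "L")), (("b1", "R"), ("a2", "L"))]
print(best_path(contigs, alignments))
```

In this toy case contig b1 from a second assembly bridges a1 and a2, so the best chain is a1, b1, a2 with a cumulative length of 1200 bp. A real tool would also have to account for overlaps, orientation and conflicting alignments.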
Fine Mapping Suggests that the Goat Polled Intersex Syndrome and the Human Blepharophimosis Ptosis Epicanthus Syndrome Map to a 100-kb Homologous Region

PubMed Central

Schibler, Laurent; Cribiu, Edmond P.; Oustry-Vaiman, Anne; Furet, Jean-Pierre; Vaiman, Daniel

2000-01-01

To clone the goat Polled Intersex Syndrome (PIS) gene(s), a chromosome walk was performed from six entry points at 1q43. This enabled 91 BACs to be recovered from a recently constructed goat BAC library. Six BAC contigs of goat chromosome 1q43 (ICC1–ICC6) were thus constructed covering altogether 4.5 Mb. A total of 37 microsatellite sequences were isolated from this 4.5-Mb region (16 in this study), of which 33 were genotyped and mapped. ICC3 (1500 kb) was shown by genetic analysis to encompass the PIS locus in a ∼400-kb interval without recombinants detected in the resource families (293 informative meioses). A strong linkage disequilibrium was detected among unrelated animals with the two central markers of the region, suggesting a probable location for PIS in ∼100 kb. High-resolution comparative mapping with human data shows that this DNA segment is the homolog of the human region associated with Blepharophimosis Ptosis Epicanthus inversus Syndrome (BPES) gene located in 3q23. This finding suggests that homologous gene(s) could be responsible for the pathologies observed in humans and goats. [The sequence data, PCR primers and PCR conditions for STS and microsatellites described in this paper have been submitted to the GenBank data library under accession nos. AQ666547–AQ666579, AQ686084–AQ686129, AQ793920–793931, AQ810429–AQ810527, G41201–G41228, and G54270–G54286.] PMID:10720572
Mapping of the chromosome 1p36 region surrounding the Charcot-Marie-Tooth disease type 2A locus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Denton, P.; Gere, S.; Wolpert, C.

1994-09-01

Charcot-Marie-Tooth (CMT) disease is the most common inherited peripheral neuropathy. Although CMT2 is clinically indistinguishable from CMT1, the two forms can be differentiated by pathological and neurophysiological methods. We have established one locus, CMT2A on chromosome 1p36, and have established genetic heterogeneity. This locus maps to the region of the deletions associated with neuroblastoma. We have now identified an additional 11 CMT2 families. Three families are linked to chromosome 1p36 while six families are excluded from this region. Another six families are currently under analysis and collection. To date the CMT2A families represent one third of those CMT2 families examined. We have established a microdissection library of the 1p36 region which is currently being characterized for microsatellite repeats and STSs using standard hybridization techniques and a modified degenerate primer method. In addition, new markers (D1S253, D1S450, D1S489, D1S503, GATA27E04, and GATA4H04) placed in this region are being mapped using critical recombinants in the CEPH reference pedigrees. Fluorescent in situ hybridization (FISH) has been used to confirm mapping. A YAC contig is being assembled from the CEPH megabase library using STSs to isolate key YACs which are extended by vectorette end clone and Alu-PCR. These findings suggest that the CMT2 phenotype is secondary to at least two different genes and demonstrate further heterogeneity in the CMT phenotype.
Elucidation of the mechanism of homozygous deletion of 3p12-13 in the U2020 cell line reveals the unexpected involvement of other chromosomes.

PubMed

Heppell-Parton, A C; Nacheva, E; Carter, N P; Bergh, J; Ogilvie, D; Rabbitts, P H

1999-06-01

Homozygous deletions in tumor cells have been useful in the localization and validation of tumor suppressor genes. We have described a homozygous deletion in a lung cancer cell line (U2020) which is located within the most proximal of the three regions on the short arm of chromosome 3 believed to be lost in lung cancer development. Construction of a YAC contig map indicates that the deletion spans around 8 Mb, but no large deletion was apparent on conventional cytogenetic analysis of the cell line. To investigate this paradox, whole chromosome, arm-specific, and regional paints have been used. This analysis has revealed that genetic loss has occurred by complex rearrangements of chromosomes 3, rather than simple interstitial deletion. These studies emphasize the power of molecular cytogenetics to disclose unsuspected tumor-specific translocations within the extremely complex karyotypes characteristic of solid tumors.
Chromosomal-Level Assembly of the Asian Seabass Genome Using Long Sequence Reads and Multi-layered Scaffolding

PubMed Central

Vij, Shubha; Kuhl, Heiner; Kuznetsova, Inna S.; Komissarov, Aleksey; Yurchenko, Andrey A.; Van Heusden, Peter; Singh, Siddharth; Thevasagayam, Natascha M.; Prakki, Sai Rama Sridatta; Purushothaman, Kathiresan; Saju, Jolly M.; Jiang, Junhui; Mbandi, Stanley Kimbung; Jonas, Mario; Hin Yan Tong, Amy; Mwangi, Sarah; Lau, Doreen; Ngoh, Si Yan; Liew, Woei Chang; Shen, Xueyan; Hon, Lawrence S.; Drake, James P.; Boitano, Matthew; Hall, Richard; Chin, Chen-Shan; Lachumanan, Ramkumar; Korlach, Jonas; Trifonov, Vladimir; Kabilov, Marsel; Tupikin, Alexey; Green, Darrell; Moxon, Simon; Garvin, Tyler; Sedlazeck, Fritz J.; Vurture, Gregory W.; Gopalapillai, Gopikrishna; Kumar Katneni, Vinaya; Noble, Tansyn H.; Scaria, Vinod; Sivasubbu, Sridhar; Jerry, Dean R.; O'Brien, Stephen J.; Schatz, Michael C.; Dalmay, Tamás; Turner, Stephen W.; Lok, Si; Christoffels, Alan; Orbán, László

2016-01-01

We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species’ native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics. PMID:27082250
Dramatic improvement in genome assembly achieved using doubled-haploid genomes.

PubMed

Zhang, Hong; Tan, Engkong; Suzuki, Yutaka; Hirose, Yusuke; Kinoshita, Shigeharu; Okano, Hideyuki; Kudoh, Jun; Shimizu, Atsushi; Saito, Kazuyoshi; Watabe, Shugo; Asakawa, Shuichi

2014-10-27

Improvement in de novo assembly of large genomes is still to be desired. Here, we improved draft genome sequence quality by employing doubled-haploid individuals. We sequenced wildtype and doubled-haploid Takifugu rubripes genomes, under the same conditions, using the Illumina platform and assembled contigs with SOAPdenovo2. We observed 5.4-fold and 2.6-fold improvement in the sizes of the N50 contig and scaffold of doubled-haploid individuals, respectively, compared to the wildtype, indicating that the use of a doubled-haploid genome aids in accurate genome analysis.
Conserved microstructure of the Brassica B Genome of Brassica nigra in relation to homologous regions of Arabidopsis thaliana, B. rapa and B. oleracea

PubMed Central

2013-01-01

Background The Brassica B genome is known to carry several important traits, yet there have been limited analyses of its underlying genome structure, especially in comparison to the closely related A and C genomes. A bacterial artificial chromosome (BAC) library of Brassica nigra was developed and screened with 17 genes from a 222 kb region of A. thaliana that had been well characterised in both the Brassica A and C genomes. Results Fingerprinting of 483 apparently non-redundant clones defined physical contigs for the corresponding regions in B. nigra. The target region is duplicated in A. thaliana and six homologous contigs were found in B. nigra resulting from the whole genome triplication event shared by the Brassiceae tribe. BACs representative of each region were sequenced to elucidate the level of microscale rearrangements across the Brassica species divide. Conclusions Although the B genome species separated from the A/C lineage some 6 Mya, comparisons between the three paleopolyploid Brassica genomes revealed extensive conservation of gene content and sequence identity. The level of fractionation or gene loss varied across genomes and genomic regions; however, the greatest loss of genes was observed to be common to all three genomes. One large-scale chromosomal rearrangement differentiated the B genome, suggesting such events could contribute to the lack of recombination observed between B genome species and those of the closely related A/C lineage. PMID:23586706
High resolution physical mapping of single gene fragments on pachytene chromosome 4 and 7 of Rosa.

PubMed

Kirov, Ilya V; Van Laere, Katrijn; Khrustaleva, Ludmila I

2015-07-02

Rosaceae is a family containing many economically important fruit and ornamental species. Although fluorescence in situ hybridization (FISH)-based physical mapping of plant genomes is a valuable tool for map-based cloning, comparative genomics and evolutionary studies, no studies using high resolution physical mapping have been performed in this family. Previously we proved that physical mapping of single-copy genes as small as 1.1 kb is possible on mitotic metaphase chromosomes of Rosa wichurana using Tyramide-FISH. In this study we aimed to further improve the physical map of Rosa wichurana by applying high resolution FISH to pachytene chromosomes. Using high resolution Tyramide-FISH and multicolor Tyramide-FISH, 7 genes (1.7-3 kb) were successfully mapped on pachytene chromosomes 4 and 7 of Rosa wichurana. Additionally, by using multicolor Tyramide-FISH three closely located genes were simultaneously visualized on chromosome 7. A detailed map of heterochromatin/euchromatin patterns of chromosomes 4 and 7 was developed with indication of the physical position of these 7 genes. Comparison of the gene order between Rosa wichurana and Fragaria vesca revealed a poor collinearity for chromosome 7, but a perfect collinearity for chromosome 4. High resolution physical mapping of short probes on pachytene chromosomes of Rosa wichurana was successfully performed for the first time. Application of Tyramide-FISH on pachytene chromosomes allowed the mapping resolution to be increased up to 20 times compared to mitotic metaphase chromosomes. High resolution Tyramide-FISH and multicolor Tyramide-FISH might become useful tools for further physical mapping of single-copy genes and for the integration of physical and genetic maps of Rosa wichurana and other members of the Rosaceae.
SOPRA: Scaffolding algorithm for paired reads via statistical optimization.

PubMed

Dayarian, Adel; Michael, Todd P; Sengupta, Anirvan M

2010-06-24

High throughput sequencing (HTS) platforms produce gigabases of short read (<100 bp) data per run. While these short reads are adequate for resequencing applications, de novo assembly of moderate size genomes from such reads remains a significant challenge. These limitations could be partially overcome by utilizing mate pair technology, which provides pairs of short reads separated by a known distance along the genome. We have developed SOPRA, a tool designed to exploit the mate pair/paired-end information for assembly of short reads. The main focus of the algorithm is selecting a sufficiently large subset of simultaneously satisfiable mate pair constraints to achieve a balance between the size and the quality of the output scaffolds. Scaffold assembly is presented as an optimization problem for variables associated with vertices and with edges of the contig connectivity graph. Vertices of this graph are individual contigs with edges drawn between contigs connected by mate pairs. Similar graph problems have been invoked in the context of shotgun sequencing and scaffold building for previous generation of sequencing projects. However, given the error-prone nature of HTS data and the fundamental limitations from the shortness of the reads, the ad hoc greedy algorithms used in the earlier studies are likely to lead to poor quality results in the current context. SOPRA circumvents this problem by treating all the constraints on equal footing for solving the optimization problem, the solution itself indicating the problematic constraints (chimeric/repetitive contigs, etc.) to be removed. The process of solving and removing of constraints is iterated till one reaches a core set of consistent constraints. For SOLiD sequencer data, SOPRA uses a dynamic programming approach to robustly translate the color-space assembly to base-space. For assessing the quality of an assembly, we report the no-match/mismatch error rate as well as the rates of various rearrangement errors. 
Applying SOPRA to real data from bacterial genomes, we were able to assemble contigs into scaffolds of significant length (N50 up to 200 Kb) with very few errors introduced in the process. In general, the methodology presented here will allow better scaffold assemblies of any type of mate pair sequencing data.
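The N50 statistic quoted above is a standard summary of assembly contiguity. The following is a minimal sketch of how it is commonly computed from a list of scaffold lengths; it is not taken from SOPRA, and the function name `n50` and the example lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one length")


if __name__ == "__main__":
    # Hypothetical scaffold lengths in bp, for illustration only.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because the statistic depends only on the length distribution, it says nothing about correctness, which is why the abstract reports error rates alongside it.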
Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures

PubMed Central

Pride, David T; Schoenfeld, Thomas

2008-01-01

Background Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. Results From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. 
As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs is predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes. Conclusion That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates that GSPC provides a complementary approach in viral metagenomic analysis. PMID:18798991
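The abstract does not specify how GSPC scores oligonucleotide signatures, so the following is only a generic sketch of signature-based classification: each sequence is reduced to a vector of k-mer frequencies, and a fragment is assigned to the reference with the closest vector. The function names, the choice of Manhattan distance, and the toy reference sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def signature(seq, k=4):
    """Return the normalized k-mer frequency vector of a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)  # ignores k-mers containing N etc.
    if total == 0:
        raise ValueError("sequence too short or contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def distance(a, b):
    """Manhattan distance between two signatures (an assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(fragment, references, k=4):
    """Return the name of the reference whose signature is closest."""
    sig = signature(fragment, k)
    return min(references,
               key=lambda name: distance(sig, signature(references[name], k)))


if __name__ == "__main__":
    # Toy references, not real genomes.
    refs = {"at_rich": "ATATATATATATTTAAATAT", "gc_rich": "GCGCGGCCGCGCGGGCCC"}
    print(classify("ATATTTATATAA", refs, k=2))
```

A composition-based method of this kind needs no sequence homology, which is the property that lets it complement BLAST on fragments with no database hits.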
Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures.

PubMed

Pride, David T; Schoenfeld, Thomas

2008-09-17

Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. 
As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs is predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes. That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates that GSPC provides a complementary approach in viral metagenomic analysis.
ESTree db: a Tool for Peach Functional Genomics

PubMed Central

Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo

2005-01-01

Background The ESTree db represents a collection of Prunus persica expressed sequence tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A PHP-based web interface was developed to query the database. Results The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. Conclusion The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig. PMID:16351742
Generation of expressed sequence tags for discovery of genes responsible for floral traits of Chrysanthemum morifolium by next-generation sequencing technology.

PubMed

Sasaki, Katsutomo; Mitsuda, Nobutaka; Nashima, Kenji; Kishimoto, Kyutaro; Katayose, Yuichi; Kanamori, Hiroyuki; Ohmiya, Akemi

2017-09-04

Chrysanthemum morifolium is one of the most economically valuable ornamental plants worldwide. Chrysanthemum is an allohexaploid plant with a large genome that is commercially propagated by vegetative reproduction. New cultivars with different floral traits, such as color, morphology, and scent, have been generated mainly by classical cross-breeding and mutation breeding. However, only limited genetic resources and their genome information are available for the generation of new floral traits. To obtain useful information about molecular bases for floral traits of chrysanthemums, we read expressed sequence tags (ESTs) of chrysanthemums by high-throughput sequencing using the 454 pyrosequencing technology. We constructed normalized cDNA libraries, consisting of full-length, 3'-UTR, and 5'-UTR cDNAs derived from various tissues of chrysanthemums. These libraries produced a total number of 3,772,677 high-quality reads, which were assembled into 213,204 contigs. By comparing the data obtained with those of full genome-sequenced species, we confirmed that our chrysanthemum contig set contained the majority of all expressed genes, which was sufficient for further molecular analysis in chrysanthemums. We confirmed that our chrysanthemum EST set (contigs) contained a number of contigs that encoded transcription factors and enzymes involved in pigment and aroma compound metabolism that was comparable to that of other species. This information can serve as an informative resource for identifying genes involved in various biological processes in chrysanthemums. Moreover, the findings of our study will contribute to a better understanding of the floral characteristics of chrysanthemums including the myriad cultivars at the molecular level.
ESTree db: a tool for peach functional genomics.

PubMed

Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo

2005-12-01

The ESTree db (http://www.itb.cnr.it/estree/) represents a collection of Prunus persica expressed sequence tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A PHP-based web interface was developed to query the database. The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig.
Comparison of peanut genetics and physical maps provided insights on collinearity, reversions and translocations

USDA-ARS's Scientific Manuscript database

Genetic and physical maps are valuable resources for the peanut research community in understanding genome organization, and they serve as the basis for map-based cloning and marker-assisted selection. Physical maps of two diploid wild peanut progenitor species, Arachis duranensis (A genome) and A. ipae...
Hierarchical Scaffolding With Bambus

PubMed Central

Pop, Mihai; Kosack, Daniel S.; Salzberg, Steven L.

2004-01-01

The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is commonly used to order and orient these contigs using paired read information. This ordering of contigs is an essential step when finishing and analyzing the data from a whole-genome shotgun project. Most recent assemblers include a scaffolding module; however, users have little control over the scaffolding algorithm or the information produced. We thus developed a general-purpose scaffolder, called Bambus, which affords users significant flexibility in controlling the scaffolding parameters. Bambus was used recently to scaffold the low-coverage draft dog genome data. Most significantly, Bambus enables the use of linking data other than that inferred from mate-pair information. For example, the sequence of a completed genome can be used to guide the scaffolding of a related organism. We present several applications of Bambus: support for finishing, comparative genomics, analysis of the haplotype structure of genomes, and scaffolding of a mammalian genome at low coverage. Bambus is available as an open-source package from our Web site. PMID:14707177
Hierarchical scaffolding with Bambus.

PubMed

Pop, Mihai; Kosack, Daniel S; Salzberg, Steven L

2004-01-01

The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is commonly used to order and orient these contigs using paired read information. This ordering of contigs is an essential step when finishing and analyzing the data from a whole-genome shotgun project. Most recent assemblers include a scaffolding module; however, users have little control over the scaffolding algorithm or the information produced. We thus developed a general-purpose scaffolder, called Bambus, which affords users significant flexibility in controlling the scaffolding parameters. Bambus was used recently to scaffold the low-coverage draft dog genome data. Most significantly, Bambus enables the use of linking data other than that inferred from mate-pair information. For example, the sequence of a completed genome can be used to guide the scaffolding of a related organism. We present several applications of Bambus: support for finishing, comparative genomics, analysis of the haplotype structure of genomes, and scaffolding of a mammalian genome at low coverage. Bambus is available as an open-source package from our Web site.
Meraculous: De Novo Genome Assembly with Short Paired-End Reads

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chapman, Jarrod A.; Ho, Isaac; Sunkara, Sirisha

2011-08-18

We describe a new algorithm, meraculous, for whole genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast Pichia stipitis. More than 95% of the genome is recovered, with no errors; half the assembled sequence is in contigs longer than 101 kilobases and in scaffolds longer than 269 kilobases. Incorporating fosmid ends recovers entire chromosomes. Meraculous relies on an efficient and conservative traversal of the subgraph of the k-mer (de Bruijn) graph of oligonucleotides with unique high quality extensions in the dataset, avoiding an explicit error correction step as used in other short-read assemblers. A novel memory-efficient hashing scheme is introduced. The resulting contigs are ordered and oriented using paired reads separated by ~280 bp or ~3.2 kbp, and many gaps between contigs can be closed using paired-end placements. Practical issues with the dataset are described, and prospects for assembling larger genomes are discussed.
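The idea of traversing only k-mers with a unique, well-supported extension can be illustrated with a small sketch. This is a simplified toy and not the meraculous implementation: it considers forward extensions only, ignores base qualities and reverse complements, and uses a plain dictionary rather than the memory-efficient hashing scheme mentioned above. The names `unique_extensions`, `extend`, and `min_count` are hypothetical.

```python
from collections import defaultdict


def unique_extensions(reads, k, min_count=2):
    """Map each k-mer to its next base when exactly one base is well supported."""
    ext = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k):
            ext[read[i:i + k]][read[i + k]] += 1
    unique = {}
    for kmer, bases in ext.items():
        supported = [b for b, n in bases.items() if n >= min_count]
        if len(supported) == 1:  # branching or unsupported k-mers are dropped
            unique[kmer] = supported[0]
    return unique


def extend(seed, unique, k, max_steps=1000):
    """Walk forward from a seed while the last k-mer has a unique extension."""
    contig = seed
    seen = {seed[-k:]}
    for _ in range(max_steps):
        base = unique.get(contig[-k:])
        if base is None:
            break
        contig += base
        if contig[-k:] in seen:  # stop on a cycle
            break
        seen.add(contig[-k:])
    return contig


if __name__ == "__main__":
    reads = ["ACGTTGCA", "ACGTTGCA"]
    print(extend("ACG", unique_extensions(reads, k=3), k=3))
```

In this toy version, requiring a minimum count filters out most sequencing errors implicitly, since an erroneous extension is rarely seen more than once; this mirrors the conservative strategy described in the abstract without claiming to reproduce it.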
The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans

PubMed Central

Tully, Benjamin J.; Graham, Elaina D.; Heidelberg, John F.

2018-01-01

Microorganisms play a crucial role in mediating global biogeochemical cycles in the marine environment. By reconstructing the genomes of environmental organisms through metagenomics, researchers are able to study the metabolic potential of Bacteria and Archaea that are resistant to isolation in the laboratory. Utilizing the large metagenomic dataset generated from 234 samples collected during the Tara Oceans circumnavigation expedition, we were able to assemble 102 billion paired-end reads into 562 million contigs, which in turn were co-assembled and consolidated into 7.2 million contigs ≥2 kb in length. Approximately 1 million of these contigs were binned to reconstruct draft genomes. In total, 2,631 draft genomes with an estimated completion of ≥50% were generated (1,491 draft genomes >70% complete; 603 genomes >90% complete). A majority of the draft genomes were manually assigned phylogeny based on sets of concatenated phylogenetic marker genes and/or 16S rRNA gene sequences. The draft genomes are now publicly available for the research community at large. PMID:29337314

Pyrosequencing the Midgut Transcriptome of the Banana Weevil Cosmopolites sordidus (Germar) (Coleoptera: Curculionidae) Reveals Multiple Protease-Like Transcripts.

PubMed

Valencia, Arnubio; Wang, Haichuan; Soto, Alberto; Aristizabal, Manuel; Arboleda, Jorge W; Eyun, Seong-Il; Noriega, Daniel D; Siegfried, Blair

2016-01-01

The banana weevil Cosmopolites sordidus is an important and serious insect pest in most banana and plantain-growing areas of the world. In spite of the economic importance of this insect pest very little genomic and transcriptomic information exists for this species. In the present study, we characterized the midgut transcriptome of C. sordidus using massive 454-pyrosequencing. We generated over 590,000 sequencing reads that assembled into 30,840 contigs with more than 400 bp, representing a significant expansion of existing sequences available for this insect pest. Among them, 16,427 contigs contained one or more GO terms. In addition, 15,263 contigs were assigned an EC number. In-depth transcriptome analysis identified genes potentially involved in insecticide resistance, peritrophic membrane biosynthesis, immunity-related function and defense against pathogens, and Bacillus thuringiensis toxins binding proteins as well as multiple enzymes involved with protein digestion. This transcriptome will provide a valuable resource for understanding larval physiology and for identifying novel target sites and management approaches for this important insect pest.
Pyrosequencing the Midgut Transcriptome of the Banana Weevil Cosmopolites sordidus (Germar) (Coleoptera: Curculionidae) Reveals Multiple Protease-Like Transcripts

PubMed Central

Valencia, Arnubio; Wang, Haichuan; Soto, Alberto; Aristizabal, Manuel; Arboleda, Jorge W.; Eyun, Seong-il; Noriega, Daniel D.; Siegfried, Blair

2016-01-01

The banana weevil Cosmopolites sordidus is an important and serious insect pest in most banana and plantain-growing areas of the world. In spite of the economic importance of this insect pest very little genomic and transcriptomic information exists for this species. In the present study, we characterized the midgut transcriptome of C. sordidus using massive 454-pyrosequencing. We generated over 590,000 sequencing reads that assembled into 30,840 contigs with more than 400 bp, representing a significant expansion of existing sequences available for this insect pest. Among them, 16,427 contigs contained one or more GO terms. In addition, 15,263 contigs were assigned an EC number. In-depth transcriptome analysis identified genes potentially involved in insecticide resistance, peritrophic membrane biosynthesis, immunity-related function and defense against pathogens, and Bacillus thuringiensis toxins binding proteins as well as multiple enzymes involved with protein digestion. This transcriptome will provide a valuable resource for understanding larval physiology and for identifying novel target sites and management approaches for this important insect pest. PMID:26949943
extendFromReads

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, Kelly P.

2013-10-03

This package assists in genome assembly. extendFromReads takes as input a set of Illumina (e.g., MiSeq) DNA sequencing reads, a query seed sequence and a direction to extend the seed. The algorithm collects all seed-matching reads (flipping reverse-orientation hits), trims off the seed and additional sequence in the other direction, sorts the remaining sequences alphabetically, and prints them aligned without gaps from the point of seed trimming. This produces a visual display distinguishing the flanks of multi-copy seeds. A companion script hitMates.pl collects the mates of seed-hitting reads, whose alignment reveals longer extensions from the seed. The collect/trim/sort strategy was made iterative and scaled up in the script denovo.pl, for de novo contig assembly. An index is pre-built using indexReads.pl that, for each unique 21-mer found in all the reads, records its "fate" of extension (whether extendable, blocked by low coverage, or blocked by branching after a duplicated sequence) and other characteristics. Importantly, denovo.pl records all branchings that follow a branching contig endpoint, providing contig-extension information.
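The collect/trim/sort strategy described above can be sketched in a few lines. The package itself consists of Perl scripts; the Python below is only an illustrative reimplementation of the idea for rightward extension with exact seed matching, and the function names `revcomp` and `right_flanks` are assumptions rather than part of the package.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def right_flanks(reads, seed):
    """Collect reads containing the seed, trim through the seed, and sort.

    Reads that match only in reverse orientation are flipped first.
    The returned strings are the sequences to the right of the seed,
    sorted alphabetically so that alternative flanks group together.
    """
    flanks = []
    for read in reads:
        for oriented in (read, revcomp(read)):
            pos = oriented.find(seed)
            if pos != -1:
                flanks.append(oriented[pos + len(seed):])
                break
    return sorted(flanks)


if __name__ == "__main__":
    # Hypothetical reads for illustration.
    reads = ["TTAAGCGGA", "AGGGCTT", "GGGG"]
    for flank in right_flanks(reads, "AAGC"):
        print(flank)  # left-aligned at the point of seed trimming
```

Printing the sorted flanks left-aligned gives the visual display the abstract refers to: reads from different copies of a multi-copy seed diverge at a visible column.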
Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut

PubMed Central

2012-01-01

Background Cultivated peanut or groundnut (Arachis hypogaea L.) is an important oilseed crop with an allotetraploid genome (AABB, 2n = 4x = 40). Both the low level of genetic variation within the cultivated gene pool and its polyploid nature limit the utilization of molecular markers to explore genome structure and facilitate genetic improvement. Nevertheless, a wealth of genetic diversity exists in diploid Arachis species (2n = 2x = 20), which represent a valuable gene pool for cultivated peanut improvement. Interspecific populations have been used widely for genetic mapping in diploid species of Arachis. However, an intraspecific mapping strategy was essential to detect chromosomal rearrangements among species that could be obscured by mapping in interspecific populations. To develop intraspecific reference linkage maps and gain insights into karyotypic evolution within the genus, we comparatively mapped the A- and B-genome diploid species using intraspecific F2 populations. Exploring genome organization among diploid peanut species by comparative mapping will enhance our understanding of the cultivated tetraploid peanut genome. Moreover, new sources of molecular markers that are highly transferable between species and developed from expressed genes will be required to construct saturated genetic maps for peanut. Results A total of 2,138 EST-SSR (expressed sequence tag-simple sequence repeat) markers were developed by mining a tetraploid peanut EST assembly including 101,132 unigenes (37,916 contigs and 63,216 singletons) derived from 70,771 long-read (Sanger) and 270,957 short-read (454) sequences. A set of 97 SSR markers were also developed by mining 9,517 genomic survey sequences of Arachis. An SSR-based intraspecific linkage map was constructed using an F2 population derived from a cross between K 9484 (PI 298639) and GKBSPSc 30081 (PI 468327) in the B-genome species A. batizocoi. 
A high degree of macrosynteny was observed when comparing the homoeologous linkage groups between A (A. duranensis) and B (A. batizocoi) genomes. Comparison of the A- and B-genome genetic linkage maps also showed a total of five inversions and one major reciprocal translocation between two pairs of chromosomes under our current mapping resolution. Conclusions Our findings will contribute to understanding tetraploid peanut genome origin and evolution and eventually promote its genetic improvement. The newly developed EST-SSR markers will enrich current molecular marker resources in peanut. PMID:23140574
High-resolution mapping and sequence analysis of 597 cDNA clones transcribed from the 1 Mb region in human chromosome 4p16.3 containing Huntington disease gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hadano, S.; Ishida, Y.; Tomiyasu, H.

1994-09-01

To complete a transcription map of the 1 Mb region in human chromosome 4p16.3 containing the Huntington disease (HD) gene, the isolation of cDNA clones is being performed throughout the region. Our method relies on direct screening of cDNA libraries probed with single-copy microclones from 3 YAC clones spanning 1 Mbp of the HD gene region. YAC-DNAs were isolated by preparative pulsed-field gel electrophoresis, amplified by both single unique primer (SUP)-PCR and linker ligation PCR, and 6 microclone-DNA libraries were generated. Then, 8,640 microclones from these libraries were independently amplified by PCR and arrayed onto membranes. The 800-900 microclones that did not cross-hybridize with total human and yeast genomic DNA, YAC vector DNA, and ribosomal cDNA in a dot hybridization (putatively carrying single-copy sequences) were pooled to make 9 probe pools. A total of approximately 1.8 × 10^7 plaques from the human brain cDNA libraries was screened with the 9 probe pools, and 672 positive cDNA clones were obtained. So far, 597 cDNA clones have been defined and arrayed onto a map of the 1 Mbp HD gene region by hybridization with HD region-specific cosmid contigs and YAC clones. Further characterization, including DNA sequencing and Northern blot analysis, is currently underway.
Development of a dense SNP-based linkage map of an apple rootstock progeny using the Malus Infinium whole genome genotyping array

PubMed Central

2012-01-01

Background A whole-genome genotyping array has previously been developed for Malus using SNP data from 28 Malus genotypes. This array offers the prospect of high throughput genotyping and linkage map development for any given Malus progeny. To test the applicability of the array for mapping in diverse Malus genotypes, we applied the array to the construction of a SNP-based linkage map of an apple rootstock progeny. Results Of the 7,867 Malus SNP markers on the array, 1,823 (23.2%) were heterozygous in one of the two parents of the progeny, 1,007 (12.8%) were heterozygous in both parental genotypes, whilst just 2.8% of the 921 Pyrus SNPs were heterozygous. A linkage map spanning 1,282.2 cM was produced comprising 2,272 SNP markers, 306 SSR markers and the S-locus. The length of the M432 linkage map was increased by 52.7 cM with the addition of the SNP markers, whilst marker density increased from 3.8 cM/marker to 0.5 cM/marker. Just three regions in excess of 10 cM remain where no markers were mapped. We compared the positions of the mapped SNP markers on the M432 map with their predicted positions on the ‘Golden Delicious’ genome sequence. A total of 311 markers (13.7% of all mapped markers) mapped to positions that conflicted with their predicted positions on the ‘Golden Delicious’ pseudo-chromosomes, indicating the presence of paralogous genomic regions or mis-assignments of genome sequence contigs during the assembly and anchoring of the genome sequence. Conclusions We incorporated data for the 2,272 SNP markers onto the map of the M432 progeny and have presented the most complete and saturated map of the full 17 linkage groups of M. pumila to date. The data were generated rapidly in a high-throughput semi-automated pipeline, permitting significant savings in time and cost over linkage map construction using microsatellites. 
The application of the array will permit linkage maps to be developed for QTL analyses in a cost-effective manner, and the identification of SNPs that have been assigned erroneous positions on the ‘Golden Delicious’ reference sequence will assist in the continued improvement of the genome sequence assembly for that variety. PMID:22631220
The Implementation of Physics Problem Solving Strategy Combined with Concept Map in General Physics Course

NASA Astrophysics Data System (ADS)

Hidayati, H.; Ramli, R.

2018-04-01

This paper describes the implementation of a Physics Problem Solving strategy combined with concept maps in General Physics learning at the Department of Physics, Universitas Negeri Padang. Action research was conducted in two cycles, with reflection at the end of each cycle used to improve the next. Implementing the Physics Problem Solving strategy combined with concept maps increased student activity in solving general physics problems by an average of 15% and improved student learning outcomes in general physics at the Universitas Negeri Padang from 42.7 in cycle I to 62.7 in cycle II. In the future, the implementation of the Physics Problem Solving strategy combined with concept maps will need to be considered in Physics courses.
Genomic anatomy of the Tyrp1 (brown) deletion complex

PubMed Central

Smyth, Ian M.; Wilming, Laurens; Lee, Angela W.; Taylor, Martin S.; Gautier, Phillipe; Barlow, Karen; Wallis, Justine; Martin, Sancha; Glithero, Rebecca; Phillimore, Ben; Pelan, Sarah; Andrew, Rob; Holt, Karen; Taylor, Ruth; McLaren, Stuart; Burton, John; Bailey, Jonathon; Sims, Sarah; Squares, Jan; Plumb, Bob; Joy, Ann; Gibson, Richard; Gilbert, James; Hart, Elizabeth; Laird, Gavin; Loveland, Jane; Mudge, Jonathan; Steward, Charlie; Swarbreck, David; Harrow, Jennifer; North, Philip; Leaves, Nicholas; Greystrong, John; Coppola, Maria; Manjunath, Shilpa; Campbell, Mark; Smith, Mark; Strachan, Gregory; Tofts, Calli; Boal, Esther; Cobley, Victoria; Hunter, Giselle; Kimberley, Christopher; Thomas, Daniel; Cave-Berry, Lee; Weston, Paul; Botcherby, Marc R. M.; White, Sharon; Edgar, Ruth; Cross, Sally H.; Irvani, Marjan; Hummerich, Holger; Simpson, Eleanor H.; Johnson, Dabney; Hunsicker, Patricia R.; Little, Peter F. R.; Hubbard, Tim; Campbell, R. Duncan; Rogers, Jane; Jackson, Ian J.

2006-01-01

Chromosome deletions in the mouse have proven invaluable in the dissection of gene function. The brown deletion complex comprises >28 independent genome rearrangements, which have been used to identify several functional loci on chromosome 4 required for normal embryonic and postnatal development. We have constructed a 172-bacterial artificial chromosome contig that spans this 22-megabase (Mb) interval and have produced a contiguous, finished, and manually annotated sequence from these clones. The deletion complex is strikingly gene-poor, containing only 52 protein-coding genes (of which only 39 are supported by human homologues) and has several further notable genomic features, including several segments of >1 Mb, apparently devoid of a coding sequence. We have used sequence polymorphisms to finely map the deletion breakpoints and identify strong candidate genes for the known phenotypes that map to this region, including three lethal loci (l4Rn1, l4Rn2, and l4Rn3) and the fitness mutant brown-associated fitness (baf). We have also characterized misexpression of the basonuclin homologue, Bnc2, associated with the inversion-mediated coat color mutant white-based brown (Bw). This study provides a molecular insight into the basis of several characterized mouse mutants, which will allow further dissection of this region by targeted or chemical mutagenesis. PMID:16505357
Characterization of the heart transcriptome of the white shark (Carcharodon carcharias)

PubMed Central

2013-01-01

Background The white shark (Carcharodon carcharias) is a globally distributed, apex predator possessing physical, physiological, and behavioral traits that have garnered it significant public attention. In addition to interest in the genetic basis of its form and function, as a representative of the oldest extant jawed vertebrate lineage, white sharks are also of conservation concern due to their small population size and threat from overfishing. Despite this, surprisingly little is known about the biology of white sharks, and genomic resources are unavailable. To address this deficit, we combined Roche-454 and Illumina sequencing technologies to characterize the first transcriptome of any tissue for this species. Results From white shark heart cDNA we generated 665,399 Roche 454 reads (median length 387-bp) that were assembled into 141,626 contigs (mean length 503-bp). We also generated 78,566,588 Illumina reads, which we aligned to the 454 contigs producing 105,014 454/Illumina consensus sequences. To these, we added 3,432 non-singleton 454 contigs. By comparing these sequences to the UniProtKB/Swiss-Prot database we were able to annotate 21,019 translated open reading frames (ORFs) of ≥ 20 amino acids. Of these, 19,277 were additionally assigned Gene Ontology (GO) functional annotations. While acknowledging the limitations of our single tissue transcriptome, Fisher tests showed the white shark transcriptome to be significantly enriched for numerous metabolic GO terms compared to the zebra fish and human transcriptomes, with white shark showing more similarity to human than to zebra fish (i.e. fewer terms were significantly different). We also compared the transcriptome to other available elasmobranch sequences, for signatures of positive selection and identified several genes of putative adaptive significance on the white shark lineage. 
The white shark transcriptome also contained 8,404 microsatellites (dinucleotide, trinucleotide, or tetranucleotide motifs with ≥ five perfect repeats). Detailed characterization of these microsatellites showed that ORFs with trinucleotide repeats were significantly enriched for transcription regulatory roles and that trinucleotide frequency within ORFs was lower than for a wide range of taxonomic groups including other vertebrates. Conclusion The white shark heart transcriptome represents a valuable resource for future elasmobranch functional and comparative genomic studies, as well as for population and other biological studies vital for effective conservation of this globally vulnerable species. PMID:24112713
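The microsatellite criterion quoted in this record (di-, tri-, or tetranucleotide motifs with at least five perfect repeats) can be illustrated with a short scan. This is only a minimal sketch under stated assumptions: the function name `find_microsatellites` and the regular expression are hypothetical illustrations, not the tool or parameters used by the authors, and homopolymer runs are skipped here by assumption.

```python
import re

# A motif of 2-4 bases followed by at least four more copies of itself,
# i.e. five or more perfect tandem repeats. The lazy quantifier makes the
# scan prefer the shortest motif (AC rather than ACAC).
SSR_PATTERN = re.compile(r"([ACGT]{2,4}?)\1{4,}")

def find_microsatellites(seq):
    """Return (start, motif, repeat_count) for each perfect repeat run.

    Hypothetical helper for illustration only; not the authors' pipeline.
    """
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Assumption: runs of a single base (e.g. AAAA...) are not counted.
            continue
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits
```

For example, `find_microsatellites("TTG" + "AC" * 5 + "GG")` reports one dinucleotide run starting at position 3, while a run of only four `AC` copies is not reported.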
Characterization of the heart transcriptome of the white shark (Carcharodon carcharias).

PubMed

Richards, Vincent P; Suzuki, Haruo; Stanhope, Michael J; Shivji, Mahmood S

2013-10-11

The white shark (Carcharodon carcharias) is a globally distributed, apex predator possessing physical, physiological, and behavioral traits that have garnered it significant public attention. In addition to interest in the genetic basis of its form and function, as a representative of the oldest extant jawed vertebrate lineage, white sharks are also of conservation concern due to their small population size and threat from overfishing. Despite this, surprisingly little is known about the biology of white sharks, and genomic resources are unavailable. To address this deficit, we combined Roche-454 and Illumina sequencing technologies to characterize the first transcriptome of any tissue for this species. From white shark heart cDNA we generated 665,399 Roche 454 reads (median length 387 bp) that were assembled into 141,626 contigs (mean length 503 bp). We also generated 78,566,588 Illumina reads, which we aligned to the 454 contigs, producing 105,014 454/Illumina consensus sequences. To these, we added 3,432 non-singleton 454 contigs. By comparing these sequences to the UniProtKB/Swiss-Prot database we were able to annotate 21,019 translated open reading frames (ORFs) of ≥ 20 amino acids. Of these, 19,277 were additionally assigned Gene Ontology (GO) functional annotations. While acknowledging the limitations of our single tissue transcriptome, Fisher tests showed the white shark transcriptome to be significantly enriched for numerous metabolic GO terms compared to the zebrafish and human transcriptomes, with white shark showing more similarity to human than to zebrafish (i.e. fewer terms were significantly different). We also compared the transcriptome to other available elasmobranch sequences for signatures of positive selection and identified several genes of putative adaptive significance on the white shark lineage. The white shark transcriptome also contained 8,404 microsatellites (dinucleotide, trinucleotide, or tetranucleotide motifs with ≥ five perfect repeats).
Detailed characterization of these microsatellites showed that ORFs with trinucleotide repeats were significantly enriched for transcription regulatory roles and that trinucleotide frequency within ORFs was lower than for a wide range of taxonomic groups including other vertebrates. The white shark heart transcriptome represents a valuable resource for future elasmobranch functional and comparative genomic studies, as well as for population and other biological studies vital for effective conservation of this globally vulnerable species.
COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment and paired-end read LinkAge.

PubMed

Lu, Yang Young; Chen, Ting; Fuhrman, Jed A; Sun, Fengzhu

2017-03-15

The advent of next-generation sequencing technologies enables researchers to sequence complex microbial communities directly from the environment. Because assembly typically produces only genome fragments, also known as contigs, instead of an entire genome, it is crucial to group them into operational taxonomic units (OTUs) for further taxonomic profiling and downstream functional analysis. OTU clustering is also referred to as binning. We present COCACOLA, a general framework to automatically bin contigs into OTUs based on sequence composition and coverage across multiple samples. The effectiveness of COCACOLA is demonstrated in both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, GroopM, MaxBin and MetaBAT. The superior performance of COCACOLA relies on two aspects. One is using L1 distance instead of Euclidean distance for better taxonomic identification during initialization. More importantly, COCACOLA takes advantage of both hard clustering and soft clustering by sparsity regularization. In addition, the COCACOLA framework seamlessly embraces customized knowledge to improve binning accuracy. In our study, we have investigated two types of additional knowledge, the co-alignment to reference genomes and linkage of contigs provided by paired-end reads, as well as the ensemble of both. We find that both co-alignment and linkage information further improve binning in the majority of cases. COCACOLA is scalable and faster than CONCOCT, GroopM, MaxBin and MetaBAT. The software is available at https://github.com/younglululu/COCACOLA. Contact: fsun@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
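The abstract above contrasts L1 distance with Euclidean distance on sequence composition. The following is a minimal sketch of what that comparison looks like on k-mer frequency profiles. All names here (`kmer_profile`, `l1_distance`, `euclidean_distance`) and the choice of k are assumptions made for illustration; this is not COCACOLA's code, and the real feature construction may differ.

```python
import math
from collections import Counter
from itertools import product

def kmer_profile(seq, k=2):
    """Normalized k-mer frequency vector over the ACGT alphabet.

    Hypothetical helper: k-mers containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]

def l1_distance(u, v):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean_distance(u, v):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

On the profiles `[0.5, 0.5]` and `[1.0, 0.0]`, the L1 distance is 1.0 while the Euclidean distance is about 0.707. L1 adds up every coordinate difference linearly rather than squaring them first, which is the property the two measures differ on.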
Development and validation of an rDNA operon based primer walking strategy applicable to de novo bacterial genome finishing

PubMed Central

Eastman, Alexander W.; Yuan, Ze-Chun

2015-01-01

Advances in sequencing technology have drastically increased the depth and feasibility of bacterial genome sequencing. However, little information is available that details the specific techniques and procedures employed during genome sequencing despite the large numbers of published genomes. Shotgun approaches employed by second-generation sequencing platforms have necessitated the development of robust bioinformatics tools for in silico assembly, and complete assembly is limited by the presence of repetitive DNA sequences and multi-copy operons. Typically, re-sequencing with multiple platforms and laborious, targeted Sanger sequencing are employed to finish a draft bacterial genome. Here we describe a novel strategy based on the identification and targeted sequencing of repetitive rDNA operons to expedite bacterial genome assembly and finishing. Our strategy was validated by finishing the genome of Paenibacillus polymyxa strain CR1, a bacterium with potential in sustainable agriculture and bio-based processes. An analysis of the 38 contigs contained in the P. polymyxa strain CR1 draft genome revealed 12 repetitive rDNA operons with varied intragenic and flanking regions of variable length, all located at contig boundaries and within contig gaps. These highly similar but not identical rDNA operons were experimentally verified and sequenced simultaneously with multiple, specially designed primer sets. This approach also identified and corrected significant sequence rearrangement generated during the initial in silico assembly of sequencing reads. Our approach reduces the required effort associated with blind primer walking for contig assembly, increasing both the speed and feasibility of genome finishing. Our study further reinforces the notion that repetitive DNA elements are major limiting factors for genome finishing. Moreover, we provide a step-by-step workflow for genome finishing, which may guide future bacterial genome finishing projects. PMID:25653642
Assignment of the dystonia-parkinsonism syndrome locus, DYT3, to a small region within a 1.8-Mb YAC contig of Xq13.1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Haberhausen, G.; Schmitt, I.; Koehler, A.

1995-09-01

A YAC contig was constructed of Xq13.1 in order to sublocalize the X-linked dystonia-parkinsonism (XDP) syndrome locus, DYT3. The contig spans a region of approximately 1.8 Mb and includes loci DXS453/DXS348/IL2Rγ/GJB1/CCG1/DXS559. For the construction of the contig, nine sequence-tagged sites and four short tandem repeat polymorphisms (STRPs) were isolated. The STRPs, designated as 4704 No. 6 (DXS7113), 4704 No. 7 (DXS7114), 67601 (DXS7117), and B4Pst (DXS7119) were assigned to a region flanked by DXS348 proximally and by DXS559 distally. Their order was DXS348/4704 No. 6/4704 No. 7/67601/B4Pst/DXS559. They were applied to the analysis of allelic association and of haplotypes in 47 not-obviously-related XDP patients and in 105 Filipino male controls. The same haplotype was found at loci 67601 (DXS7117) and B4Pst (DXS7119) in 42 of 47 patients. This percentage of common haplotypes decreased at the adjacent loci. The findings, together with the previous demonstration of DXS559 being the distal flanking marker of DYT3, assign the disease locus to a small region in Xq13.1 defined by loci 67601 (DXS7117) and B4Pst (DXS7119). The location of DYT3 was borne out by the application of a newly developed likelihood method for the analysis of linkage disequilibrium. 28 refs., 1 fig., 6 tabs.
The human MCP-2 gene (SCYA8): Cloning, sequence analysis, tissue expression, and assignment to the CC chemokine gene contig on chromosome 17q11.2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Van Coillie, E.; Fiten, P.; Van Damme, J.

1997-03-01

Monocyte chemotactic proteins (MCPs) form a subfamily of chemokines that recruit leukocytes to sites of inflammation and that may contribute to tumor-associated leukocyte infiltration and to the antiviral state against HIV infection. With the use of degenerate primers that were based on CC chemokine consensus sequences, the known MIP-1α/LD78α, MCP-1, and MCP-3 genes and the previously unidentified eotaxin and MCP-2 genes were isolated from a YAC contig from human chromosome 17q11.2. The amplified genomic MCP-2 fragment was used to isolate an MCP-2 cosmid from which the gene sequence was determined. The MCP-2 gene shares with the MCP-1 and MCP-3 genes a conserved intron-exon structure and a coding nucleotide sequence homology of 77%. By Northern blot analysis the 1.0-kb MCP-2 mRNA was predominantly detectable in the small intestine, peripheral blood, heart, placenta, lung, skeletal muscle, ovary, colon, spinal cord, pancreas, and thymus. Transcripts of 1.5 and 2.4 kb were found in the testis, the small intestine, and the colon. The isolation of the MCP-2 gene from the chemokine contig localized it on YAC clones of chromosome 17q11.2, which also contain the eotaxin, MCP-1, MCP-3, and NCC-1/MCP-4 genes. The combination of using degenerate primer PCR and YACs illustrates that novel genes can efficiently be isolated from gene cluster contigs with less redundancy and effort than the isolation of novel ESTs. 42 refs., 5 figs., 2 tabs.
Developing the anemone Aiptasia as a tractable model for cnidarian-dinoflagellate symbiosis: the transcriptome of aposymbiotic A. pallida.

PubMed

Lehnert, Erik M; Burriesci, Matthew S; Pringle, John R

2012-06-22

Coral reefs are hotspots of oceanic biodiversity, forming the foundation of ecosystems that are important both ecologically and for their direct practical impacts on humans. Corals are declining globally due to a number of stressors, including rising sea-surface temperatures and pollution; such stresses can lead to a breakdown of the essential symbiotic relationship between the coral host and its endosymbiotic dinoflagellates, a process known as coral bleaching. Although the environmental stresses causing this breakdown are largely known, the cellular mechanisms of symbiosis establishment, maintenance, and breakdown are still largely obscure. Investigating the symbiosis using an experimentally tractable model organism, such as the small sea anemone Aiptasia, should improve our understanding of exactly how the environmental stressors affect coral survival and growth. We assembled the transcriptome of a clonal population of adult, aposymbiotic (dinoflagellate-free) Aiptasia pallida from ~208 million reads, yielding 58,018 contigs. We demonstrated that many of these contigs represent full-length or near-full-length transcripts that encode proteins similar to those from a diverse array of pathways in other organisms, including various metabolic enzymes, cytoskeletal proteins, and neuropeptide precursors. The contigs were annotated by sequence similarity, assigned GO terms, and scanned for conserved protein domains. We analyzed the frequency and types of single-nucleotide variants and estimated the size of the Aiptasia genome to be ~421 Mb. The contigs and annotations are available through NCBI (Transcriptome Shotgun Assembly database, accession numbers JV077153-JV134524) and at http://pringlelab.stanford.edu/projects.html. The availability of an extensive transcriptome assembly for A. pallida will facilitate analyses of gene-expression changes, identification of proteins of interest, and other studies in this important emerging model system.
Transcriptome analysis reveals the time of the fourth round of genome duplication in common carp (Cyprinus carpio)

PubMed Central

2012-01-01

Background Common carp (Cyprinus carpio) is thought to have undergone one extra round of genome duplication compared to zebrafish. Transcriptome analysis has been used to study the existence and timing of genome duplication in species for which genome sequences are incomplete. Large-scale transcriptome data for the common carp genome should help reveal the timing of the additional duplication event. Results We have sequenced the transcriptome of common carp using 454 pyrosequencing. After assembling the 454 contigs and the published common carp sequences together, we obtained 49,669 contigs and identified genes using homology searches and an ab initio method. We identified 4,651 orthologous pairs between common carp and zebrafish and found 129,984 paralogous pairs within the common carp. An estimation of the synonymous substitution rate in the orthologous pairs indicated that common carp and zebrafish diverged 120 million years ago (MYA). We identified one round of genome duplication in common carp and estimated that it had occurred 5.6 to 11.3 MYA. In zebrafish, no genome duplication event after speciation was observed, suggesting that, compared to zebrafish, common carp had undergone an additional genome duplication event. We annotated the common carp contigs with Gene Ontology terms and KEGG pathways. Compared with zebrafish gene annotations, we found that a set of biological processes and pathways were enriched in common carp. Conclusions The assembled contigs helped us to estimate the time of the fourth-round of genome duplication in common carp. The resource that we have built as part of this study will help advance functional genomics and genome annotation studies in the future. PMID:22424280
Optimizing de novo transcriptome assembly and extending genomic resources for striped catfish (Pangasianodon hypophthalmus).

PubMed

Thanh, Nguyen Minh; Jung, Hyungtaek; Lyons, Russell E; Njaci, Isaac; Yoon, Byoung-Ha; Chand, Vincent; Tuan, Nguyen Viet; Thu, Vo Thi Minh; Mather, Peter

2015-10-01

Striped catfish (Pangasianodon hypophthalmus) is a commercially important freshwater fish used in inland aquaculture in the Mekong Delta, Vietnam. The culture industry is, however, facing a significant challenge from saltwater intrusion into many low-lying coastal provinces across the Mekong Delta as a result of predicted climate change impacts. Developing genomic resources for this species can facilitate the production of improved culture lines that can withstand raised salinity conditions, and so we have applied high-throughput Ion Torrent sequencing of transcriptome libraries from six target osmoregulatory organs from striped catfish as a genomic resource for use in future selection strategies. We obtained 12,177,770 reads after trimming and processing, with an average length of 97 bp. De novo assemblies were generated using CLC Genomics Workbench, Trinity and Velvet/Oases, with the best overall contig performance resulting from the CLC assembly. De novo assembly using CLC yielded 66,451 contigs with an average length of 478 bp and N50 length of 506 bp. A total of 37,969 contigs (57%) possessed significant similarity with proteins in the non-redundant database. Comparative analyses revealed that a significant number of contigs matched sequences reported in other teleost fishes, ranging in similarity from 45.2% with Atlantic cod to 52% with zebrafish. In addition, 28,879 simple sequence repeats (SSRs) and 55,721 single nucleotide polymorphisms (SNPs) were detected in the striped catfish transcriptome. The sequence collection generated in the current study represents the most comprehensive genomic resource for P. hypophthalmus available to date. Our results illustrate the utility of next-generation sequencing as an efficient tool for constructing a large genomic database for marker development in non-model species. Copyright © 2015 Elsevier B.V. All rights reserved.
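Several records in this listing report an N50 length for contigs or scaffolds. As a reminder of what that statistic measures, here is a minimal sketch using the common definition: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembly size. The function name `n50` is an illustrative assumption, and individual assemblers may apply slightly different conventions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the cumulative length reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")
```

With lengths 2 through 10 (total 54), the three longest contigs (10, 9, 8) already sum to 27, so the N50 is 8. A single long contig can therefore dominate the statistic even when most contigs are short.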
Exploring physics concepts among novice teachers through CMAP tools

NASA Astrophysics Data System (ADS)

Suprapto, N.; Suliyanah; Prahani, B. K.; Jauhariyah, M. N. R.; Admoko, S.

2018-03-01

Concept maps are graphical tools for organising, elaborating and representing knowledge. Cmap tools software makes it possible to explore novice teachers' understanding and hierarchical structuring of physics concepts. The software helps physics teachers indicate a physics context, focus questions, parking lots, cross-links, branching, hierarchy, and propositions. In an exploratory quantitative study, a total of 13 concept maps on different physics topics created by novice physics teachers were analysed. The main differences between the lecturer's and peer teachers' scoring were also illustrated. The study offered some implications, especially for physics educators to determine the hierarchical structure of the physics concepts, to construct a physics focus question, and to see how a concept in one domain of knowledge represented on the map is related to a concept in another domain shown on the map.
Localization of Allotetraploid Gossypium SNPs Using Physical Mapping Resources

USDA-ARS's Scientific Manuscript database

Recent efforts in Gossypium SNP development have produced thousands of putative SNPs for G. barbadense, G. mustelinum, and G. tomentosum relative to G. hirsutum. Here we report on current efforts to localize putative SNPs using physical mapping resources. Recent advances in physical mapping resour...
The genome of the Antarctic-endemic copepod, Tigriopus kingsejongensis.

PubMed

Kang, Seunghyun; Ahn, Do-Hwan; Lee, Jun Hyuck; Lee, Sung Gu; Shin, Seung Chul; Lee, Jungeun; Min, Gi-Sik; Lee, Hyoungseok; Kim, Hyun-Woo; Kim, Sanghee; Park, Hyun

2017-01-01

The Antarctic intertidal zone is continuously subjected to extremely fluctuating biotic and abiotic stressors. The West Antarctic Peninsula is the most rapidly warming region on Earth. Organisms living in Antarctic intertidal pools are therefore interesting for research into evolutionary adaptation to extreme environments and the effects of climate change. We report the whole genome sequence of the Antarctic-endemic harpacticoid copepod Tigriopus kingsejongensis. The 37 Gb raw DNA sequence was generated using the Illumina MiSeq platform. Libraries were prepared with 65-fold coverage and a total length of 295 Mb. The final assembly consists of 48 368 contigs with an N50 contig length of 17.5 kb, and 27 823 scaffolds with an N50 contig length of 159.2 kb. A total of 12 772 coding genes were inferred using the MAKER annotation pipeline. Comparative genome analysis revealed that T. kingsejongensis-specific genes are enriched in transport and metabolism processes. Furthermore, rapidly evolving genes related to energy metabolism showed positive selection signatures. The T. kingsejongensis genome provides an interesting example of an evolutionary strategy for Antarctic cold adaptation, and offers new genetic insights into Antarctic intertidal biota. © The Author 2017. Published by Oxford University Press.

The genome of the Antarctic-endemic copepod, Tigriopus kingsejongensis

PubMed Central

Kang, Seunghyun; Ahn, Do-Hwan; Lee, Jun Hyuck; Lee, Sung Gu; Shin, Seung Chul; Lee, Jungeun; Min, Gi-Sik; Lee, Hyoungseok

2017-01-01

Abstract Background: The Antarctic intertidal zone is continuously subjected to extremely fluctuating biotic and abiotic stressors. The West Antarctic Peninsula is the most rapidly warming region on Earth. Organisms living in Antarctic intertidal pools are therefore interesting for research into evolutionary adaptation to extreme environments and the effects of climate change. Findings: We report the whole genome sequence of the Antarctic-endemic harpacticoid copepod Tigriopus kingsejongensis. The 37 Gb raw DNA sequence was generated using the Illumina MiSeq platform. Libraries were prepared with 65-fold coverage and a total length of 295 Mb. The final assembly consists of 48 368 contigs with an N50 contig length of 17.5 kb, and 27 823 scaffolds with an N50 contig length of 159.2 kb. A total of 12 772 coding genes were inferred using the MAKER annotation pipeline. Comparative genome analysis revealed that T. kingsejongensis-specific genes are enriched in transport and metabolism processes. Furthermore, rapidly evolving genes related to energy metabolism showed positive selection signatures. Conclusions: The T. kingsejongensis genome provides an interesting example of an evolutionary strategy for Antarctic cold adaptation, and offers new genetic insights into Antarctic intertidal biota. PMID:28369352
Novel genomic resources for a climate change sensitive mammal: characterization of the American pika transcriptome.

PubMed

Lemay, Matthew A; Henry, Philippe; Lamb, Clayton T; Robson, Kelsey M; Russello, Michael A

2013-05-10

When faced with climate change, species must either shift their home range or adapt in situ in order to maintain optimal physiological balance with their environment. The American pika (Ochotona princeps) is a small alpine mammal with limited dispersal capacity and low tolerance for thermal stress. As a result, pikas have become an important system for examining biotic responses to changing climatic conditions. Previous research using amplified fragment length polymorphisms (AFLPs) has revealed evidence for environmental-mediated selection in O. princeps populations distributed along elevation gradients, yet the anonymity of AFLP loci and lack of available genomic resources precluded the identification of associated gene regions. Here, we harnessed next-generation sequencing technology in order to characterize the American pika transcriptome and identify a large suite of single nucleotide polymorphisms (SNPs), which can be used to elucidate elevation- and site-specific patterns of sequence variation. We constructed pooled cDNA libraries of O. princeps from high (1400 m) and low (300 m) elevation sites along a previously established transect in British Columbia. Transcriptome sequencing using the Roche 454 GS FLX Titanium platform generated 780 million base pairs of data, which were assembled into 7,325 high coverage contigs. These contigs were used to identify 24,261 novel SNP loci. Using high resolution melt analysis, we developed 17 of these SNPs into genotyping assays, which were validated with independent DNA samples from British Columbia, Canada, and Oregon State, USA. In addition, we detected haplotypes in the NADH dehydrogenase subunit 5 of the mitochondrial genome that were fixed and different among elevations, suggesting that this may be an informative target gene for studying the role of cellular respiration in local adaptation.
We also identified contigs that were unique to each elevation, including a high elevation-specific contig that was a positive match with the hemoglobin alpha chain from the plateau pika, a species restricted to high elevation steppes in Asia. Elevation-specific contigs may represent candidate regions subject to differential levels of gene expression along this elevation gradient. To our knowledge, this is the first broad-scale, transcriptome-level study conducted within the Ochotonidae, providing novel genomic resources for studying pika ecology, behaviour and population history.
Learning about A Level Physics Students' Understandings of Particle Physics Using Concept Mapping

ERIC Educational Resources Information Center

Gourlay, H.

2017-01-01

This paper describes a small-scale piece of research using concept mapping to elicit A level students' understandings of particle physics. Fifty-nine year 12 (16- and 17 year-old) students from two London schools participated. The exercise took place during school physics lessons. Students were instructed how to make a concept map and were…
Developing and Validating an Abbreviated Version of the Microscale Audit for Pedestrian Streetscapes (MAPS-Abbreviated).

PubMed

Cain, Kelli L; Gavand, Kavita A; Conway, Terry L; Geremia, Carrie M; Millstein, Rachel A; Frank, Lawrence D; Saelens, Brian E; Adams, Marc A; Glanz, Karen; King, Abby C; Sallis, James F

2017-06-01

Macroscale built environment factors (e.g., street connectivity) are correlated with physical activity. Less-studied but more modifiable microscale elements (e.g., sidewalks) may also influence physical activity, but shorter audit measures of microscale elements are needed to promote wider use. This study evaluated the relation of an abbreviated 54-item streetscape audit tool with multiple measures of physical activity in four age groups. We developed a 54-item version from the original 120-item Microscale Audit of Pedestrian Streetscapes (MAPS). Audits were conducted on 0.25-0.45 mile routes from participant residences toward the nearest nonresidential destination for children (N=758), adolescents (N=897), younger adults (N=1,655), and older adults (N=367). Active transport and leisure physical activity were measured with surveys, and objective physical activity was measured with accelerometers. Items to retain from original MAPS were selected primarily by correlations with physical activity. Mixed linear regression analyses were conducted for MAPS-Abbreviated summary scores, adjusting for demographics, participant clustering, and macroscale walkability. MAPS-Abbreviated and original MAPS total scores correlated r=.94. The MAPS-Abbreviated tool was related to physical activity outcomes similarly to the original MAPS. Destinations and land use, streetscape and walking path characteristics, and overall total scores were significantly related to active transport in all age groups. Street crossing characteristics were related to active transport in children and older adults. Aesthetics and social characteristics were related to leisure physical activity in children and younger adults, and cul-de-sacs were related to physical activity in youth. Total scores were related to accelerometer-measured physical activity in children and older adults. MAPS-Abbreviated is a validated observational measure for use in research. 
The length and related cost of implementation have been cited as a barrier to use of microscale instruments, so availability of this shorter validated measure could lead to more widespread use of streetscape audits in health research.
Developing and Validating an Abbreviated Version of the Microscale Audit for Pedestrian Streetscapes (MAPS-Abbreviated)

PubMed Central

Cain, Kelli L.; Gavand, Kavita A.; Conway, Terry L.; Geremia, Carrie M.; Millstein, Rachel A.; Frank, Lawrence D.; Saelens, Brian E.; Adams, Marc A.; Glanz, Karen; King, Abby C.; Sallis, James F.

2017-01-01

Purpose Macroscale built environment factors (e.g., street connectivity) are correlated with physical activity. Less-studied but more modifiable microscale elements (e.g., sidewalks) may also influence physical activity, but shorter audit measures of microscale elements are needed to promote wider use. This study evaluated the relation of an abbreviated 54-item streetscape audit tool with multiple measures of physical activity in four age groups. Methods We developed a 54-item version from the original 120-item Microscale Audit of Pedestrian Streetscapes (MAPS). Audits were conducted on 0.25-0.45 mile routes from participant residences toward the nearest nonresidential destination for children (N=758), adolescents (N=897), younger adults (N=1,655), and older adults (N=367). Active transport and leisure physical activity were measured with surveys, and objective physical activity was measured with accelerometers. Items to retain from original MAPS were selected primarily by correlations with physical activity. Mixed linear regression analyses were conducted for MAPS-Abbreviated summary scores, adjusting for demographics, participant clustering, and macroscale walkability. Results MAPS-Abbreviated and original MAPS total scores correlated r=.94. The MAPS-Abbreviated tool was related to physical activity outcomes similarly to the original MAPS. Destinations and land use, streetscape and walking path characteristics, and overall total scores were significantly related to active transport in all age groups. Street crossing characteristics were related to active transport in children and older adults. Aesthetics and social characteristics were related to leisure physical activity in children and younger adults, and cul-de-sacs were related to physical activity in youth. Total scores were related to accelerometer-measured physical activity in children and older adults. Conclusion MAPS-Abbreviated is a validated observational measure for use in research. 
The length and related cost of implementation have been cited as a barrier to use of microscale instruments, so availability of this shorter validated measure could lead to more widespread use of streetscape audits in health research. PMID:29270361
Physical Webbing: Collaborative Kinesthetic Three-Dimensional Mind Maps[R]

ERIC Educational Resources Information Center

Williams, Marian H.

2012-01-01

Mind Mapping has predominantly been used by individuals or collaboratively in groups as a paper-based or computer-generated learning strategy. In an effort to make Mind Mapping kinesthetic, collaborative, and three-dimensional, an innovative pedagogical strategy, termed Physical Webbing, was devised. In the Physical Web activity, groups…
Comparative physical mapping between wheat chromosome arm 2BL and rice chromosome 4.

PubMed

Lee, Tong Geon; Lee, Yong Jin; Kim, Dae Yeon; Seo, Yong Weon

2010-12-01

Physical maps of chromosomes provide a framework for organizing and integrating diverse genetic information. DNA microarrays are a valuable technique for physical mapping and can also be used to facilitate the discovery of single feature polymorphisms (SFPs). Wheat chromosome arm 2BL was physically mapped using a Wheat Genome Array onto near-isogenic lines (NILs) with the aid of wheat-rice synteny and mapped wheat EST information. Using high variance probe set (HVP) analysis, 314 HVPs constituting genes present on 2BL were identified. The 314 HVPs were grouped into 3 categories: HVPs that match only rice chromosome 4 (298 HVPs), those that match only wheat ESTs mapped on 2BL (1), and those that match both rice chromosome 4 and wheat ESTs mapped on 2BL (15). All HVPs were converted into gene sets, which represented either unique rice gene models or mapped wheat ESTs that matched identified HVPs. Comparative physical maps were constructed for 16 wheat gene sets and 271 rice gene sets. Of the 271 rice gene sets, 257 were mapped to the 18-35 Mb regions on rice chromosome 4. Based on HVP analysis and sequence similarity between the gene models in the rice chromosomes and mapped wheat ESTs, the outermost rice gene model that limits the translocation breakpoint to orthologous regions was identified.
A high density physical map of chromosome 1BL supports evolutionary studies, map-based cloning and sequencing in wheat

PubMed Central

2013-01-01

Background As for other major crops, achieving a complete wheat genome sequence is essential for the application of genomics to breeding new and improved varieties. To overcome the complexities of the large, highly repetitive and hexaploid wheat genome, the International Wheat Genome Sequencing Consortium established a chromosome-based strategy that was validated by the construction of the physical map of chromosome 3B. Here, we present improved strategies for the construction of highly integrated and ordered wheat physical maps, using chromosome 1BL as a template, and illustrate their potential for evolutionary studies and map-based cloning. Results Using a combination of novel high throughput marker assays and an assembly program, we developed a high quality physical map representing 93% of wheat chromosome 1BL, anchored and ordered with 5,489 markers including 1,161 genes. Analysis of the gene space organization and evolution revealed that gene distribution and conservation along the chromosome results from the superimposition of the ancestral grass and recent wheat evolutionary patterns, leading to a peak of synteny in the central part of the chromosome arm and an increased density of non-collinear genes towards the telomere. With a density of about 11 markers per Mb, the 1BL physical map provides 916 markers, including 193 genes, for fine mapping the 40 QTLs mapped on this chromosome. Conclusions Here, we demonstrate that high marker density physical maps can be developed in complex genomes such as wheat to accelerate map-based cloning, gain new insights into genome evolution, and provide a foundation for reference sequencing. PMID:23800011
Comparative genomic analysis of the Haloferax volcanii DS2 and Halobacterium salinarium GRB contig maps reveals extensive rearrangement.

PubMed Central

St Jean, A; Charlebois, R L

1996-01-01

Anonymous probes from the genome of Halobacterium salinarium GRB and 12 gene probes were hybridized to the cosmid clones representing the chromosome and plasmids of Halobacterium salinarium GRB and Haloferax volcanii DS2. The order of and pairwise distances between 35 loci uniquely cross-hybridizing to both chromosomes were analyzed in a search for conservation. No conservation between the genomes could be detected at the 15-kbp resolution used in this study. We found distinct sets of low-copy-number repeated sequences in the chromosome and plasmids of Halobacterium salinarium GRB, indicating some degree of partitioning between these replicons. We propose alternative courses for the evolution of the haloarchaeal genome: (i) that the majority of genomic differences that exist between genera came about at the inception of this group or (ii) that the differences have accumulated over the lifetime of the lineage. The strengths and limitations of investigating these models through comparative genomic studies are discussed. PMID:8682791
Integrating genome assemblies with MAIA

PubMed Central

Nijkamp, Jurgen; Winterbach, Wynand; van den Broek, Marcel; Daran, Jean-Marc; Reinders, Marcel; de Ridder, Dick

2010-01-01

Motivation: De novo assembly of a eukaryotic genome with next-generation sequencing data is still a challenging task. Over the past few years several assemblers have been developed, often suitable for one specific type of sequencing data. The number of known genomes is expanding rapidly, so it is becoming possible to use multiple reference genomes for assembly projects. We introduce an assembly integrator that makes use of all available data, i.e. multiple de novo assemblies and mappings against multiple related genomes, by optimizing a weighted combination of criteria. Results: The developed algorithm was applied to the de novo sequencing of the Saccharomyces cerevisiae CEN.PK 113-7D strain. Using Solexa and 454 read data, two de novo and three comparative assemblies were constructed and subsequently integrated, yielding 29 contigs, covering more than 12 Mbp; a drastic improvement compared with the single assemblies. Availability: MAIA is available as a Matlab package and can be downloaded from http://bioinformatics.tudelft.nl Contact: j.f.nijkamp@tudelft.nl PMID:20823304
Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome.

PubMed

Bickhart, Derek M; Rosen, Benjamin D; Koren, Sergey; Sayre, Brian L; Hastie, Alex R; Chan, Saki; Lee, Joyce; Lam, Ernest T; Liachko, Ivan; Sullivan, Shawn T; Burton, Joshua N; Huson, Heather J; Nystrom, John C; Kelley, Christy M; Hutchison, Jana L; Zhou, Yang; Sun, Jiajie; Crisà, Alessandra; Ponce de León, F Abel; Schwartz, John C; Hammond, John A; Waldbieser, Geoffrey C; Schroeder, Steven G; Liu, George E; Dunham, Maitreya J; Shendure, Jay; Sonstegard, Tad S; Phillippy, Adam M; Van Tassell, Curtis P; Smith, Timothy P L

2017-04-01

The decrease in sequencing cost and increased sophistication of assembly algorithms for short-read platforms has resulted in a sharp increase in the number of species with genome assemblies. However, these assemblies are highly fragmented, with many gaps, ambiguities, and errors, impeding downstream applications. We demonstrate current state of the art for de novo assembly using the domestic goat (Capra hircus) based on long reads for contig formation, short reads for consensus validation, and scaffolding by optical and chromatin interaction mapping. These combined technologies produced what is, to our knowledge, the most continuous de novo mammalian assembly to date, with chromosome-length scaffolds and only 649 gaps. Our assembly represents a ∼400-fold improvement in continuity due to properly assembled gaps, compared to the previously published C. hircus assembly, and better resolves repetitive structures longer than 1 kb, representing the largest repeat family and immune gene complex yet produced for an individual of a ruminant species.
Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome

PubMed Central

Bickhart, Derek M.; Rosen, Benjamin D.; Koren, Sergey; Sayre, Brian L.; Hastie, Alex R.; Chan, Saki; Lee, Joyce; Lam, Ernest T.; Liachko, Ivan; Sullivan, Shawn T.; Burton, Joshua N.; Huson, Heather J.; Nystrom, John C.; Kelley, Christy M.; Hutchison, Jana L.; Zhou, Yang; Sun, Jiajie; Crisà, Alessandra; de León, F. Abel Ponce; Schwartz, John C.; Hammond, John A.; Waldbieser, Geoffrey C.; Schroeder, Steven G.; Liu, George E.; Dunham, Maitreya J.; Shendure, Jay; Sonstegard, Tad S.; Phillippy, Adam M.; Van Tassell, Curtis P.; Smith, Timothy P.L.

2018-01-01

The decrease in sequencing cost and increased sophistication of assembly algorithms for short-read platforms has resulted in a sharp increase in the number of species with genome assemblies. However, these assemblies are highly fragmented, with many gaps, ambiguities, and errors, impeding downstream applications. We demonstrate current state of the art for de novo assembly using the domestic goat (Capra hircus), based on long reads for contig formation, short reads for consensus validation, and scaffolding by optical and chromatin interaction mapping. These combined technologies produced the most continuous de novo mammalian assembly to date, with chromosome-length scaffolds and only 649 gaps. Our assembly represents a ~400-fold improvement in continuity due to properly assembled gaps compared to the previously published C. hircus assembly, and better resolves repetitive structures longer than 1 kb, representing the largest repeat family and immune gene complex ever produced for an individual of a ruminant species. PMID:28263316
Estrogen alters the profile of the transcriptome in river snail Bellamya aeruginosa.

PubMed

Lei, Kun; Liu, Ruizhi; An, Li-Hui; Luo, Ying-Feng; LeBlanc, Gerald A

2015-03-01

We evaluated the transcriptome dynamics of the freshwater river snail Bellamya aeruginosa exposed to 17β-estradiol (E2) using the Roche/454 GS-FLX platform. In total, 41,869 unigenes, with an average length of 586 bp, representing 36,181 contigs and 5,688 singlets were obtained. Among them, 18.08, 36.85, and 25.47 % matched sequences in the GenBank non-redundant nucleic acid database, non-redundant protein database, and Swiss protein database, respectively. Annotation of the unigenes with gene ontology, and then mapping them to biological pathways, revealed large groups of genes related to growth, development, reproduction, signal transduction, and defense mechanisms. Significant differences were found in gene expression in both liver and testicular tissues between control and E2-exposed organisms. These changes in gene expression will help in understanding the molecular mechanisms of the response to physiological stress in the river snail exposed to estrogen, and will facilitate research into biological processes and underlying physiological adaptations to xenoestrogen exposure in gastropods.
HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment

PubMed Central

Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.

2016-01-01

Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175
Safety analysis of a Russian phage cocktail: from metagenomic analysis to oral application in healthy human subjects.

PubMed

McCallin, Shawna; Alam Sarker, Shafiqul; Barretto, Caroline; Sultana, Shamima; Berger, Bernard; Huq, Sayeda; Krause, Lutz; Bibiloni, Rodrigo; Schmitt, Bertrand; Reuteler, Gloria; Brüssow, Harald

2013-09-01

Phage therapy has a long tradition in Eastern Europe, where preparations are comprised of complex phage cocktails whose compositions have not been described. We investigated the composition of a phage cocktail from the Russian pharmaceutical company Microgen targeting Escherichia coli/Proteus infections. Electron microscopy identified six phage types, with numerically T7-like phages dominating over T4-like phages. A metagenomic approach using taxonomical classification, reference mapping and de novo assembly identified 18 distinct phage types, including 7 genera of Podoviridae, 2 established and 2 proposed genera of Myoviridae, and 2 genera of Siphoviridae. De novo assembly yielded 7 contigs greater than 30 kb, including a 147-kb Myovirus genome and a 42-kb genome of a potentially new phage. Bioinformatic analysis did not reveal undesired genes and a small human volunteer trial did not associate adverse effects with oral phage exposure. Copyright © 2013 Elsevier Inc. All rights reserved.
Predicting protein contact map using evolutionary and physical constraints by integer programming.

PubMed

Wang, Zhiyong; Xu, Jinbo

2013-07-01

A protein contact map describes the pairwise spatial and functional relationship of residues in a protein and contains key information for protein 3D structure prediction. Although studied extensively, it remains challenging to predict a contact map using only sequence information. Most existing methods predict the contact map matrix element-by-element, ignoring correlation among contacts and physical feasibility of the whole contact map. A couple of recent methods predict the contact map by using mutual information, taking into consideration contact correlation and enforcing a sparsity restraint, but these methods demand a very large number of sequence homologs for the protein under consideration and the resultant contact map may still be physically infeasible. This article presents a novel method, PhyCMAP, for contact map prediction, integrating both evolutionary and physical restraints by machine learning and integer linear programming. The evolutionary restraints are much more informative than mutual information, and the physical restraints specify more concrete relationships among contacts than the sparsity restraint. As such, our method greatly reduces the solution space of the contact map matrix and, thus, significantly improves prediction accuracy. Experimental results confirm that PhyCMAP outperforms currently popular methods no matter how many sequence homologs are available for the protein under consideration. http://raptorx.uchicago.edu.
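To make the object being predicted concrete, here is a minimal sketch of how a contact map can be derived from known 3D coordinates. It is not PhyCMAP and does no prediction. The function name `contact_map`, the 8 Å distance cutoff, and the toy coordinates are assumptions chosen for illustration only.

```python
import math

def contact_map(coords, cutoff=8.0, min_separation=1):
    """Return a symmetric 0/1 matrix marking residue pairs within `cutoff`.

    coords: one (x, y, z) tuple per residue (e.g. a representative atom).
    cutoff: distance threshold in angstroms; 8.0 is an assumed convention.
    min_separation: smallest sequence separation |i - j| that is considered.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                # A contact is symmetric, so fill both halves of the matrix.
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Toy example: four residues spaced 3.8 angstroms apart along a straight line.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
```

In this toy chain, residues up to two positions apart fall inside the cutoff, while the first and last residues (11.4 Å apart) do not.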
De novo assembly and transcriptome analysis of five major tissues of Jatropha curcas L. using GS FLX titanium platform of 454 pyrosequencing

PubMed Central

2011-01-01

Background Jatropha curcas L. is an important non-edible oilseed crop with promising future in biodiesel production. However, factors like oil yield, oil composition, toxic compounds in oil cake, pests and diseases limit its commercial potential. Well established genetic engineering methods using cloned genes could be used to address these limitations. Earlier, 10,983 unigenes from Sanger sequencing of ESTs, and 3,484 unique assembled transcripts from 454 pyrosequencing of uncloned cDNAs were reported. In order to expedite the process of gene discovery, we have undertaken 454 pyrosequencing of normalized cDNAs prepared from roots, mature leaves, flowers, developing seeds, and embryos of J. curcas. Results From 383,918 raw reads, we obtained 381,957 quality-filtered and trimmed reads that are suitable for the assembly of transcript sequences. De novo contig assembly of these reads generated 17,457 assembled transcripts (contigs) and 54,002 singletons. Average length of the assembled transcripts was 916 bp. About 30% of the transcripts were longer than 1000 bases, and the size of the longest transcript was 7,173 bases. BLASTX analysis revealed that 2,589 of these transcripts are full-length. The assembled transcripts were validated by RT-PCR analysis of 28 transcripts. The results showed that the transcripts were correctly assembled and represent actively expressed genes. KEGG pathway mapping showed that 2,320 transcripts are related to major biochemical pathways including the oil biosynthesis pathway. Overall, the current study reports 14,327 new assembled transcripts which included 2589 full-length transcripts and 27 transcripts that are directly involved in oil biosynthesis. Conclusion The large number of transcripts reported in the current study together with existing ESTs and transcript sequences will serve as an invaluable genetic resource for crop improvement in jatropha. 
Sequence information of those genes that are involved in oil biosynthesis could be used for metabolic engineering of jatropha to increase oil content, and to modify oil composition. PMID:21492485
De novo assembly and transcriptome analysis of five major tissues of Jatropha curcas L. using GS FLX titanium platform of 454 pyrosequencing.

PubMed

Natarajan, Purushothaman; Parani, Madasamy

2011-04-15

Jatropha curcas L. is an important non-edible oilseed crop with promising future in biodiesel production. However, factors like oil yield, oil composition, toxic compounds in oil cake, pests and diseases limit its commercial potential. Well established genetic engineering methods using cloned genes could be used to address these limitations. Earlier, 10,983 unigenes from Sanger sequencing of ESTs, and 3,484 unique assembled transcripts from 454 pyrosequencing of uncloned cDNAs were reported. In order to expedite the process of gene discovery, we have undertaken 454 pyrosequencing of normalized cDNAs prepared from roots, mature leaves, flowers, developing seeds, and embryos of J. curcas. From 383,918 raw reads, we obtained 381,957 quality-filtered and trimmed reads that are suitable for the assembly of transcript sequences. De novo contig assembly of these reads generated 17,457 assembled transcripts (contigs) and 54,002 singletons. Average length of the assembled transcripts was 916 bp. About 30% of the transcripts were longer than 1000 bases, and the size of the longest transcript was 7,173 bases. BLASTX analysis revealed that 2,589 of these transcripts are full-length. The assembled transcripts were validated by RT-PCR analysis of 28 transcripts. The results showed that the transcripts were correctly assembled and represent actively expressed genes. KEGG pathway mapping showed that 2,320 transcripts are related to major biochemical pathways including the oil biosynthesis pathway. Overall, the current study reports 14,327 new assembled transcripts which included 2589 full-length transcripts and 27 transcripts that are directly involved in oil biosynthesis. The large number of transcripts reported in the current study together with existing ESTs and transcript sequences will serve as an invaluable genetic resource for crop improvement in jatropha. 
Sequence information of those genes that are involved in oil biosynthesis could be used for metabolic engineering of jatropha to increase oil content, and to modify oil composition.
Transcriptome Analysis of an Insecticide Resistant Housefly Strain: Insights about SNPs and Regulatory Elements in Cytochrome P450 Genes.

PubMed

Mahmood, Khalid; Højland, Dorte H; Asp, Torben; Kristensen, Michael

2016-01-01

Insecticide resistance in the housefly, Musca domestica, has been investigated for more than 60 years. It will enter a new era after the recent publication of the housefly genome and the development of multiple next generation sequencing technologies. The genetic background of the xenobiotic response can now be investigated in greater detail. Here, we investigate the 454-pyrosequencing transcriptome of the spinosad-resistant 791spin strain in relation to the housefly genome with focus on P450 genes. The de novo assembly of clean reads gave 35,834 contigs consisting of 21,780 sequences of the spinosad resistant strain. A total of 3,648 sequences were annotated with an enzyme code (EC) number and were mapped to 124 KEGG pathways, with metabolic processes as the most highly represented pathway. One hundred and twenty contigs were annotated as P450s covering 44 different P450 genes of housefly. Eight differentially expressed P450 genes were identified and investigated for SNPs, CpG islands and common regulatory motifs in promoter and coding regions. Functional annotation clustering of metabolic related genes and motif analysis of P450s revealed their association with epigenetic, transcription and gene expression related functions. The sequence variation analysis resulted in 12 SNPs, eight of which were found in cyp6d1. There is variation in location, size and frequency of CpG islands, and specific motifs were also identified in these P450s. Moreover, identified motifs were associated with GO terms and transcription factors using bioinformatic tools. Transcriptome data of a spinosad resistant strain, together with genome data, provide fundamental support for future research to understand evolution of resistance in houseflies. Here, we report for the first time the SNPs, CpG islands and common regulatory motifs in differentially expressed P450s. Taken together our findings will serve as a stepping stone to advance understanding of the mechanism and role of P450s in xenobiotic detoxification.
MELOGEN: an EST database for melon functional genomics

PubMed Central

Gonzalez-Ibeas, Daniel; Blanca, José; Roig, Cristina; González-To, Mireia; Picó, Belén; Truniger, Verónica; Gómez, Pedro; Deleu, Wim; Caño-Delgado, Ana; Arús, Pere; Nuez, Fernando; Garcia-Mas, Jordi; Puigdomènech, Pere; Aranda, Miguel A

2007-01-01

Background Melon (Cucumis melo L.) is one of the most important fleshy fruits for fresh consumption. Despite this, few genomic resources exist for this species. To facilitate the discovery of genes involved in essential traits, such as fruit development, fruit maturation and disease resistance, and to speed up the process of breeding new and better adapted melon varieties, we have produced a large collection of expressed sequence tags (ESTs) from eight normalized cDNA libraries from different tissues in different physiological conditions. Results We determined over 30,000 ESTs that were clustered into 16,637 non-redundant sequences or unigenes, comprising 6,023 tentative consensus sequences (contigs) and 10,614 unclustered sequences (singletons). Many potential molecular markers were identified in the melon dataset: 1,052 potential simple sequence repeats (SSRs) and 356 single nucleotide polymorphisms (SNPs) were found. Sixty-nine percent of the melon unigenes showed a significant similarity with proteins in databases. Functional classification of the unigenes was carried out following the Gene Ontology scheme. In total, 9,402 unigenes were mapped to one or more ontology. Remarkably, the distributions of melon and Arabidopsis unigenes followed similar tendencies, suggesting that the melon dataset is representative of the whole melon transcriptome. Bioinformatic analyses primarily focused on potential precursors of melon micro RNAs (miRNAs) in the melon dataset, but many other genes potentially controlling disease resistance and fruit quality traits were also identified. Patterns of transcript accumulation were characterised by Real-Time-qPCR for 20 of these genes. Conclusion The collection of ESTs characterised here represents a substantial increase on the genetic information available for melon. A database (MELOGEN) which contains all EST sequences, contig images and several tools for analysis and data mining has been created. 
This set of sequences constitutes also the basis for an oligo-based microarray for melon that is being used in experiments to further analyse the melon transcriptome. PMID:17767721

An insight into the sialome of the horse fly, Tabanus bromius

PubMed Central

Ribeiro, José M.C.; Kazimirova, Maria; Takac, Peter; Andersen, John F.; Francischetti, Ivo M.B.

2015-01-01

Blood feeding animals face their host's defenses against tissue injury and blood loss while attempting to feed. One adaptation to surmount these barriers involves the evolution of a salivary potion that disarms their host's inflammatory and anti-hemostatic processes. The composition of the peptide moiety of this potion, or sialome (from the Greek sialo=saliva), can be deduced in part by proper interpretation of the blood feeder's sialotranscriptome. In this work we disclose the sialome of the blood feeding adult female Tabanus bromius. Following assembly of over 75 million Illumina reads (101 nt long) 16,683 contigs were obtained from which 4,078 coding sequences were extracted. From these, 320 were assigned as coding for putative secreted proteins. These 320 contigs mapped 85% of the reads. The antigen-5 proteins family was studied in detail, indicating three Tabanus specific clades with and without disintegrin domains, as well as with and without leukotriene binding domains. Defensins were also detailed; a clade of salivary tabanid peptides was found lacking the propeptide domain ending in the KR dipeptide signaling furin cleavage. Novel protein families were also disclosed. Viral transcripts were identified closely matching the Kotonkan virus capsid proteins. Full length Mariner transposases were also identified. A total of 3,043 coding sequences and their protein products were deposited in GenBank. Hyperlinked excel spreadsheets containing the coding sequences and their annotation are available at http://exon.niaid.nih.gov/transcriptome/T_bromius/Tbromius-web.xlsx (hyperlinked excel spreadsheet, 11 MB) and http://exon.niaid.nih.gov/transcriptome/T_bromius/Tbromius-SA.zip (Standalone excel with all local links, 360 MB). These sequences provide a platform from which further proteomic studies may be designed to identify salivary proteins from T. bromius that are of pharmacological interest or used as immunological markers of host exposure. PMID:26369729
De novo transcriptome assembly analysis of weed Apera spica-venti from seven tissues and growth stages.

PubMed

Babineau, Marielle; Mahmood, Khalid; Mathiassen, Solvejg K; Kudsk, Per; Kristensen, Michael

2017-02-06

Loose silky bentgrass (Apera spica-venti) is an important weed in Europe with a recent increase in herbicide resistance cases. The lack of genetic information about this noxious weed limits its biological understanding such as growth, reproduction, genetic variation, molecular ecology and metabolic herbicide resistance. This study produced a reference transcriptome for A. spica-venti from different tissues (leaf, root, stem) and various growth stages (seed at phenological stages 05, 07, 08, 09). The de novo assembly was performed on individual and combined datasets followed by functional annotations. Individual transcripts and gene families involved in metabolic based herbicide resistance were identified. Eight separate transcriptome assemblies were performed and compared. The combined transcriptome assembly consists of 83,349 contigs with an N50 and average contig length of 762 and 658 bp, respectively. This dataset contains 74,724 transcripts consisting of a total of 54,846,111 bp. Among them 94% had a homologue in UniProtKB, 73% retrieved a GO mapping, and 50% were functionally annotated. Compared with other grass species, A. spica-venti has 26% of proteins in common with Brachypodium distachyon, and 41% with Lolium spp. Glycosyltransferases had the highest number of transcripts in each tissue followed by the cytochrome P450s. The GSTF1 and CYP89A2 transcripts were recovered from the majority of tissues and aligned at a maximum of 66 and 30% to proven herbicide resistant alleles from Alopecurus myosuroides and Lolium rigidum, respectively. De novo transcriptome assembly enabled the generation of the first reference transcriptome of A. spica-venti. This can serve as a stepping stone for understanding the metabolic herbicide resistance as well as the general biology of this problematic weed. Furthermore, this large-scale sequence data is a valuable scientific resource for comparative transcriptome analysis for Poaceae grasses.
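The N50 and average contig length reported in this record can be illustrated with a minimal sketch. It is not the authors' pipeline. The helper names `n50` and `mean_length` and the toy lengths are hypothetical, and N50 is taken here as the contig length at which contigs of that length or longer cover at least half of the total assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (hypothetical helper)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

def mean_length(lengths):
    """Return the arithmetic mean contig length."""
    return sum(lengths) / len(lengths)

# Toy assembly of five contigs (made-up lengths, in bp).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)           # 500 + 400 = 900 covers half of 1500
toy_mean = mean_length(toy_lengths)
```

Because N50 is weighted toward long contigs, it commonly exceeds the mean length, as it does in the toy data and in the figures quoted above (762 bp versus 658 bp).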
Culture-Independent Identification of Manganese-Oxidizing Genes from Deep-Sea Hydrothermal Vent Chemoautotrophic Ferromanganese Microbial Communities Using a Metagenomic Approach

NASA Astrophysics Data System (ADS)

Davis, R.; Tebo, B. M.

2013-12-01

Microbial activity has long been recognized as being important to the fate of manganese (Mn) in hydrothermal systems, yet we know very little about the organisms that catalyze Mn oxidation, the mechanisms by which Mn is oxidized or the physiological function that Mn oxidation serves in these hydrothermal systems. Hydrothermal vents with thick ferromanganese microbial mats and Mn oxide-coated rocks observed throughout the Pacific Ring of Fire are ideal models to study the mechanisms of microbial Mn oxidation, as well as primary productivity in these metal-cycling ecosystems. We sampled ferromanganese microbial mats from Vai Lili Vent Field (Tmax=43°C) located on the Eastern Lau Spreading Center and Mn oxide-encrusted rhyolytic pumice (4°C) from Niua South Seamount on the Tonga Volcanic Arc. Metagenomic libraries were constructed and assembled from these samples and key genes known to be involved in Mn oxidation and carbon fixation pathways were identified in the reconstructed genomes. The Vai Lili metagenome assembled to form 121,157 contiguous sequences (contigs) greater than 1,000 bp in length, with an N50 of 8,261 bp and a total metagenome size of 593 Mbp. Contigs were binned using an emergent self-organizing map of tetranucleotide frequencies. Putative homologs of the multicopper Mn-oxidase MnxG were found in the metagenome that were related to both the Pseudomonas-like and Bacillus-like forms of the enzyme. The bins containing the Pseudomonas-like mnxG genes are most closely related to uncultured Deltaproteobacteria and Chloroflexi. The Deltaproteobacteria bin appears to be an obligate anaerobe with possible chemoautotrophic metabolisms, while the Chloroflexi appears to be a heterotrophic organism. The metagenome from the Mn-stained pumice was assembled into 122,092 contigs greater than 1,000 bp in length with an N50 of 7,635 bp and a metagenome size of 385 Mbp. 
Both forms of mnxG genes are present in this metagenome as well as the genes encoding the putative Mn oxidases McoA and MopA. The greater diversity of Mn oxidase pathways in this metagenome suggests a more diverse Mn oxidizing microbial community in the cold pumice sample. Key enzymes for four of the six known carbon fixation pathways (the Calvin Cycle, the reductive TCA cycle, the Wood-Ljungdahl pathway, and the 3-hydroxypropionate/4-hydroxybutyrate Cycle) were also identified in both samples indicating primary production occurs via a diverse community of carbon fixing organisms. Together, these samples contain active, diverse populations of Mn oxidizing bacteria living in association with microbial communities supported by chemoautotrophic carbon fixation.
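The binning step in this record operates on tetranucleotide frequency vectors. The sketch below shows only how such a vector might be computed for one contig. It does not implement the emergent self-organizing map, and it is not the authors' code. The function name and the choice to skip windows containing ambiguous bases are assumptions, and unlike some binning tools this sketch does not merge reverse-complement k-mers.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides in a fixed order, so vectors are comparable.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetranucleotide_frequencies(seq):
    """Return a 256-element list of relative 4-mer frequencies for one contig."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        # Assumption: windows containing N or other ambiguity codes are skipped.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]

# Toy contig: the windows are ACGT, CGTA, GTAC, TACG, ACGT.
toy_vector = tetranucleotide_frequencies("ACGTACGT")
```

Vectors like `toy_vector`, one per contig, would then be passed to whatever clustering method is used for binning.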
A new leaf rust resistance gene Lr79 mapped in chromosome 3BL from the durum wheat landrace Aus26582.

PubMed

Qureshi, Naeela; Bariana, Harbans; Kumran, Vikas Venu; Muruga, Sivasamy; Forrest, Kerrie L; Hayden, Mathew J; Bansal, Urmil

2018-05-01

A new leaf rust resistance gene Lr79 has been mapped in the long arm of chromosome 3B and a linked marker was identified for marker-assisted selection. Aus26582, a durum wheat landrace from the A. E. Watkins Collection, showed seedling resistance against durum-specific and common wheat-specific Puccinia triticina (Pt) pathotypes. Genetic analysis using a recombinant inbred line (RIL) population developed from a cross between Aus26582 and the susceptible parent Bansi with Australian Pt pathotype showed digenic inheritance and the underlying loci were temporarily named LrAW2 and LrAW3. LrAW2 was located in chromosome 6BS and this study focused on characterisation of LrAW3 using RILs lacking LrAW2. LrAW3 was incorporated into the DArTseq map of Aus26582/Bansi and was located in chromosome 3BL. Markers linked with LrAW3 were developed from the chromosome survey sequence contig 3B_10474240 in which closely-linked DArTseq markers 1128708 and 3948563 were located. Although bulk segregant analysis (BSA) with the 90 K Infinium array identified 51 SNPs associated with LrAW3, only one SNP-derived KASP marker mapped close to the locus. Deletion bin mapping of LrAW3-linked markers located LrAW3 between bins 3BL11-0.85-0.90 and 3BL7-0.63. Since no other all stage leaf rust resistance gene is located in chromosome 3BL, LrAW3 represented a new locus and was designated Lr79. Marker sun786 mapped 1.8 cM distal to Lr79 and Aus26582 was null for this locus. However, the marker can be reliably scored as it also amplifies a monomorphic fragment that serves as an internal control to differentiate the null status of Aus26582 from reaction failure. This marker was validated among a set of durum and common wheat cultivars and was shown to be useful for marker-assisted selection of Lr79 at both ploidy levels.
Identification, characterization, and utilization of genome-wide simple sequence repeats to identify a QTL for acidity in apple

PubMed Central

2012-01-01

Background Apple is an economically important fruit crop worldwide. Developing a genetic linkage map is a critical step towards mapping and cloning of genes responsible for important horticultural traits in apple. To facilitate linkage map construction, we surveyed and characterized the distribution and frequency of perfect microsatellites in assembled contig sequences of the apple genome. Results A total of 28,538 SSRs have been identified in the apple genome, with an overall density of 40.8 SSRs per Mb. Di-nucleotide repeats are the most frequent microsatellites in the apple genome, accounting for 71.9% of all microsatellites. AT/TA repeats are the most frequent in genomic regions, accounting for 38.3% of all the G-SSRs, while AG/GA dimers prevail in transcribed sequences, and account for 59.4% of all EST-SSRs. A total set of 310 SSRs is selected to amplify eight apple genotypes. Of these, 245 (79.0%) are found to be polymorphic among cultivars and wild species tested. AG/GA motifs in genomic regions have detected more alleles and higher PIC values than AT/TA or AC/CA motifs. Moreover, AG/GA repeats are more variable than any other dimers in apple, and should be preferentially selected for studies, such as genetic diversity and linkage map construction. A total of 54 newly developed apple SSRs have been genetically mapped. Interestingly, clustering of markers with distorted segregation is observed on linkage groups 1, 2, 10, 15, and 16. A QTL responsible for malic acid content of apple fruits is detected on linkage group 8, and accounts for ~13.5% of the observed phenotypic variation. Conclusions This study demonstrates that di-nucleotide repeats are prevalent in the apple genome and that AT/TA and AG/GA repeats are the most frequent in genomic and transcribed sequences of apple, respectively. 
All SSR motifs identified in this study as well as those newly mapped SSRs will serve as valuable resources for pursuing apple genetic studies, aiding the apple breeding community in marker-assisted breeding, and for performing comparative genomic studies in Rosaceae. PMID:23039990
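The survey of perfect microsatellites described above can be illustrated with a minimal sketch. The regular expression, the function names, and the minimum repeat count of 6 are assumptions chosen for illustration; the abstract does not state which tool or thresholds the authors used.

```python
import re

def find_perfect_ssrs(seq, motif_len=2, min_repeats=6):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    min_repeats=6 is an illustrative threshold, not the study's setting.
    """
    # A motif of motif_len bases followed by at least (min_repeats - 1) exact copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        if len(set(motif)) == 1:
            # Skip homopolymer runs such as AAAAAA, which are not di-nucleotide SSRs.
            continue
        hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits

def ssr_density_per_mb(n_ssrs, total_bp):
    """Number of SSRs per megabase of surveyed sequence."""
    return n_ssrs / (total_bp / 1_000_000)

# Toy sequence: an (AG)8 repeat is reported, an (AT)5 repeat is below the threshold.
toy = "GGC" + "AG" * 8 + "TTC" + "AT" * 5 + "C"
print(find_perfect_ssrs(toy))
```

Taken together, the reported 28,538 SSRs and 40.8 SSRs per Mb imply roughly 700 Mb of surveyed contig sequence (28,538 / 40.8).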
An integrated metagenome and -proteome analysis of the microbial community residing in a biogas production plant.

PubMed

Ortseifen, Vera; Stolze, Yvonne; Maus, Irena; Sczyrba, Alexander; Bremges, Andreas; Albaum, Stefan P; Jaenicke, Sebastian; Fracowiak, Jochen; Pühler, Alfred; Schlüter, Andreas

2016-08-10

To study the metaproteome of a biogas-producing microbial community, fermentation samples were taken from an agricultural biogas plant for microbial cell and protein extraction and corresponding metagenome analyses. Based on metagenome sequence data, taxonomic community profiling was performed to elucidate the composition of bacterial and archaeal sub-communities. The community's cytosolic metaproteome was represented in a 2D-PAGE approach. Metaproteome databases for protein identification were compiled based on the assembled metagenome sequence dataset for the biogas plant analyzed and non-corresponding biogas metagenomes. Protein identification results revealed that the corresponding biogas protein database facilitated the highest identification rate followed by other biogas-specific databases, whereas common public databases yielded insufficient identification rates. Proteins of the biogas microbiome identified as highly abundant were assigned to the pathways involved in methanogenesis, transport and carbon metabolism. Moreover, the integrated metagenome/-proteome approach enabled the examination of genetic-context information for genes encoding identified proteins by studying neighboring genes on the corresponding contig. Exemplarily, this approach led to the identification of a Methanoculleus sp. contig encoding 16 methanogenesis-related gene products, three of which were also detected as abundant proteins within the community's metaproteome. Thus, metagenome contigs provide additional information on the genetic environment of identified abundant proteins. Copyright © 2016 Elsevier B.V. All rights reserved.
Intestinal virome changes precede autoimmunity in type I diabetes-susceptible children

PubMed Central

Vatanen, Tommi; Droit, Lindsay; Kostic, Aleksandar D.; Poon, Tiffany W.; Vlamakis, Hera; Siljander, Heli; Härkönen, Taina; Hämäläinen, Anu-Maaria; Peet, Aleksandr; Tillmann, Vallo; Ilonen, Jorma; Wang, David; Knip, Mikael; Xavier, Ramnik J.

2017-01-01

Viruses have long been considered potential triggers of autoimmune diseases. Here we defined the intestinal virome from birth to the development of autoimmunity in children at risk for type 1 diabetes (T1D). A total of 220 virus-enriched preparations from serially collected fecal samples from 11 children (cases) who developed serum autoantibodies associated with T1D (of whom five developed clinical T1D) were compared with samples from controls. Intestinal viromes of case subjects were less diverse than those of controls. Among eukaryotic viruses, we identified significant enrichment of Circoviridae-related sequences in samples from controls in comparison with cases. Enterovirus, kobuvirus, parechovirus, parvovirus, and rotavirus sequences were frequently detected but were not associated with autoimmunity. For bacteriophages, we found higher Shannon diversity and richness in controls compared with cases and observed that changes in the intestinal virome over time differed between cases and controls. Using Random Forests analysis, we identified disease-associated viral bacteriophage contigs after subtraction of age-associated contigs. These disease-associated contigs were statistically linked to specific components of the bacterial microbiome. Thus, changes in the intestinal virome preceded autoimmunity in this cohort. Specific components of the virome were both directly and inversely associated with the development of human autoimmune disease. PMID:28696303
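The Shannon diversity and richness measures mentioned above can be sketched as follows. This is a generic illustration using the natural logarithm; the function names and the count vectors are hypothetical, and the abstract does not describe how the authors computed these values.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)

def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)

# Hypothetical read counts per bacteriophage contig in two samples.
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]
print(shannon_diversity(even_sample), richness(even_sample))
print(shannon_diversity(skewed_sample), richness(skewed_sample))
```

Both toy samples have the same richness, but the evenly distributed one has the higher Shannon index, which is why the two measures are usually reported together.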
Investigating the viral ecology of global bee communities with high-throughput metagenomics.

PubMed

Galbraith, David A; Fuller, Zachary L; Ray, Allyson M; Brockmann, Axel; Frazier, Maryann; Gikungu, Mary W; Martinez, J Francisco Iturralde; Kapheim, Karen M; Kerby, Jeffrey T; Kocher, Sarah D; Losyev, Oleksiy; Muli, Elliud; Patch, Harland M; Rosa, Cristina; Sakamoto, Joyce M; Stanley, Scott; Vaudo, Anthony D; Grozinger, Christina M

2018-06-11

Bee viral ecology is a fascinating emerging area of research: viruses exert a range of effects on their hosts, exacerbate impacts of other environmental stressors, and, importantly, are readily shared across multiple bee species in a community. However, our understanding of bee viral communities is limited, as it is primarily derived from studies of North American and European Apis mellifera populations. Here, we examined viruses in populations of A. mellifera and 11 other bee species from 9 countries, across 4 continents and Oceania. We developed a novel pipeline to rapidly and inexpensively screen for bee viruses. This pipeline includes purification of encapsulated RNA/DNA viruses, sequence-independent amplification, high throughput sequencing, integrated assembly of contigs, and filtering to identify contigs specifically corresponding to viral sequences. We identified sequences for (+)ssRNA, (-)ssRNA, dsRNA, and ssDNA viruses. Overall, we found 127 contigs corresponding to novel viruses (i.e. previously not observed in bees), with 27 represented by >0.1% of the reads in a given sample, and 7 contained an RdRp or replicase sequence which could be used for robust phylogenetic analysis. This study provides a sequence-independent pipeline for viral metagenomics analysis, and greatly expands our understanding of the diversity of viruses found in bee communities.
Assembly: a resource for assembled genomes at NCBI

PubMed Central

Kitts, Paul A.; Church, Deanna M.; Thibaud-Nissen, Françoise; Choi, Jinna; Hem, Vichet; Sapojnikov, Victor; Smith, Robert G.; Tatusova, Tatiana; Xiang, Charlie; Zherikov, Andrey; DiCuccio, Michael; Murphy, Terence D.; Pruitt, Kim D.; Kimchi, Avi

2016-01-01

The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly Resource directly or by browsing available assemblies for a particular organism. Links in the Assembly Resource allow users to easily download sequence and annotations for current versions of genome assemblies from the NCBI genomes FTP site. PMID:26578580
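Contig N50, one of the contiguity metrics listed above, can be computed from a list of contig lengths as in the sketch below. It uses the common definition (the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly); the function name is illustrative, and this is not NCBI's code.

```python
def n50(lengths):
    """Return the contig N50 for a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            # Contigs of this length or longer cover at least half the assembly.
            return length
    raise ValueError("no contig lengths given")

# Toy assembly of nine contigs with a total length of 54.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```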
Next Generation Sequencing Identifies Five Major Classes of Potentially Therapeutic Enzymes Secreted by Lucilia sericata Medical Maggots.

PubMed

Franta, Zdeněk; Vogel, Heiko; Lehmann, Rüdiger; Rupp, Oliver; Goesmann, Alexander; Vilcinskas, Andreas

2016-01-01

Lucilia sericata larvae are used as an alternative treatment for recalcitrant and chronic wounds. Their excretions/secretions contain molecules that facilitate tissue debridement, disinfect, or accelerate wound healing and have therefore been recognized as a potential source of novel therapeutic compounds. Among the substances present in excretions/secretions various peptidase activities promoting the wound healing processes have been detected but the peptidases responsible for these activities remain mostly unidentified. To explore these enzymes we applied next generation sequencing to analyze the transcriptomes of different maggot tissues (salivary glands, gut, and crop) associated with the production of excretions/secretions and/or with digestion as well as the rest of the larval body. As a result we obtained more than 123.8 million paired-end reads, which were assembled de novo using Trinity and Oases assemblers, yielding 41,421 contigs with an N50 contig length of 2.22 kb and a total length of 67.79 Mb. BLASTp analysis against the MEROPS database identified 1729 contigs in 577 clusters encoding five peptidase classes (serine, cysteine, aspartic, threonine, and metallopeptidases), which were assigned to 26 clans, 48 families, and 185 peptidase species. The individual enzymes were differentially expressed among maggot tissues and included peptidase activities related to the therapeutic effects of maggot excretions/secretions.
De novo assembly of the transcriptome of Aegiceras corniculatum, a mangrove species in the Indo-West Pacific region.

PubMed

Fang, Lu; Yang, Yuchen; Guo, Wuxia; Li, Jianfang; Zhong, Cairong; Huang, Yelin; Zhou, Renchao; Shi, Suhua

2016-08-01

Aegiceras corniculatum (L.) Blanco is one of the most salt-tolerant mangrove species and can thrive in 3% salinity at the seaward edge of mangrove forests. Here we sequenced the transcriptome of A. corniculatum using the Illumina GA platform to develop genomic resources for ecological and evolutionary studies. We obtained about 50 million high-quality paired-end reads of 75 bp in length. Using the short-read assembler Velvet, we obtained 49,437 contigs with an average length of 625 bp. A total of 32,744 (66.23%) contigs showed significant similarity to the GenBank non-redundant (NR) protein database. Of these sequences, 30,911 and 18,004 were assigned to Gene Ontology and eukaryotic orthologous groups of proteins (KOG), respectively. A total of 4942 transcripts from our assemblies had significant similarity with KEGG Orthologs and were involved in 144 KEGG pathways, while 9899 unigenes had enzyme commission (EC) numbers. In addition, 9792 transcriptome-derived SSRs were identified from 7342 sequences. With our strict criteria, 4165 candidate SNPs were also identified from 2058 contigs. Some of these SNPs were further validated by Sanger sequencing. Genomic resources generated in this study should be valuable in ecological, evolutionary, and functional genomics studies for this mangrove species. Copyright © 2016 Elsevier B.V. All rights reserved.
Preliminary surficial geologic map database of the Amboy 30 x 60 minute quadrangle, California

USGS Publications Warehouse

Bedford, David R.; Miller, David M.; Phelps, Geoffrey A.

2006-01-01

The surficial geologic map database of the Amboy 30x60 minute quadrangle presents characteristics of surficial materials for an area approximately 5,000 km² in the eastern Mojave Desert of California. This map consists of new surficial mapping conducted between 2000 and 2005, as well as compilations of previous surficial mapping. Surficial geology units are mapped and described based on depositional process and age categories that reflect the mode of deposition, pedogenic effects occurring post-deposition, and, where appropriate, the lithologic nature of the material. The physical properties recorded in the database focus on those that drive hydrologic, biologic, and physical processes such as particle size distribution (PSD) and bulk density. This version of the database is distributed with point data representing locations of samples for both laboratory determined physical properties and semi-quantitative field-based information. Future publications will include the field and laboratory data as well as maps of distributed physical properties across the landscape tied to physical process models where appropriate. The database is distributed in three parts: documentation, spatial map-based data, and printable map graphics of the database. Documentation includes this file, which provides a discussion of the surficial geology and describes the format and content of the map data, a database 'readme' file, which describes the database contents, and FGDC metadata for the spatial map information. Spatial data are distributed as Arc/Info coverage in ESRI interchange (e00) format, or as tabular data in the form of DBF3-file (.DBF) file formats. Map graphics files are distributed as Postscript and Adobe Portable Document Format (PDF) files, and are appropriate for representing a view of the spatial database at the mapped scale.
elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling

PubMed Central

Decap, Dries; Fostier, Jan; Reumers, Joke

2015-01-01

elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost. PMID:26182406
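The single-pass idea described above, in which several preparation steps are merged and applied during one traversal of the data, can be sketched generically. This is not elPrep's implementation or interface: the record layout (plain dictionaries), the step functions, and the read-group value are all hypothetical and serve only to show the pattern.

```python
def single_pass(records, steps):
    """Apply every step to each record during one traversal of the input."""
    for rec in records:
        for step in steps:
            rec = step(rec)
            if rec is None:
                # A filtering step dropped the record; skip the remaining steps.
                break
        else:
            yield rec

def drop_unmapped(rec):
    """Hypothetical filter step: discard reads flagged as unmapped."""
    return None if rec["unmapped"] else rec

def add_read_group(rec):
    """Hypothetical annotation step: tag each read with a read group."""
    return {**rec, "rg": "sample1"}

reads = [
    {"name": "r1", "unmapped": False},
    {"name": "r2", "unmapped": True},
    {"name": "r3", "unmapped": False},
]
for rec in single_pass(reads, [drop_unmapped, add_read_group]):
    print(rec)
```

Per-record steps such as filtering compose naturally in this way. Steps that need to see all records, such as sorting, cannot be expressed as a per-record function, which is consistent with the abstract's note that elPrep runs entirely in memory.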
Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity.

PubMed

Edger, Patrick P; VanBuren, Robert; Colle, Marivi; Poorten, Thomas J; Wai, Ching Man; Niederhuth, Chad E; Alger, Elizabeth I; Ou, Shujun; Acharya, Charlotte B; Wang, Jie; Callow, Pete; McKain, Michael R; Shi, Jinghua; Collier, Chad; Xiong, Zhiyong; Mower, Jeffrey P; Slovin, Janet P; Hytönen, Timo; Jiang, Ning; Childs, Kevin L; Knapp, Steven J

2018-02-01

Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology. Here we utilized a robust, cost-effective approach to produce high-quality reference genomes. We report a near-complete genome of diploid woodland strawberry (Fragaria vesca) using single-molecule real-time sequencing from Pacific Biosciences (PacBio). This assembly has a contig N50 length of ∼7.9 million base pairs (Mb), representing a ∼300-fold improvement of the previous version. The vast majority (>99.8%) of the assembly was anchored to 7 pseudomolecules using 2 sets of optical maps from Bionano Genomics. We obtained ∼24.96 Mb of sequence not present in the previous version of the F. vesca genome and produced an improved annotation that includes 1496 new genes. Comparative syntenic analyses uncovered numerous, large-scale scaffolding errors present in each chromosome in the previously published version of the F. vesca genome. Our results highlight the need to improve existing short-read based reference genomes. Furthermore, we demonstrate how genome quality impacts commonly used analyses for addressing both fundamental and applied biological questions. © The Authors 2017. Published by Oxford University Press.
Using Playground Maps for Movement

ERIC Educational Resources Information Center

Colvin, A. Vonnie

2016-01-01

Many schools now decorate their outside hard surface areas with maps. These maps provide color and excitement to a playground and are a terrific teaching tool for geography. But these maps can easily be integrated into physical education as well to promote both physical activity as well as knowledge of geography. The purpose of this article is to…
Construction of an ultra-high density consensus genetic map, and enhancement of the physical map from genome sequencing in Lupinus angustifolius.

PubMed

Zhou, Gaofeng; Jian, Jianbo; Wang, Penghao; Li, Chengdao; Tao, Ye; Li, Xuan; Renshaw, Daniel; Clements, Jonathan; Sweetingham, Mark; Yang, Huaan

2018-01-01

An ultra-high density genetic map containing 34,574 sequence-defined markers was developed in Lupinus angustifolius. Markers closely linked to nine genes of agronomic traits were identified. A physical map was improved to cover 560.5 Mb genome sequence. Lupin (Lupinus angustifolius L.) is a recently domesticated legume grain crop. In this study, we applied the restriction-site associated DNA sequencing (RADseq) method to genotype an F9 recombinant inbred line population derived from a wild type × domesticated cultivar (W × D) cross. A high density linkage map was developed based on the W × D population. By integrating sequence-defined DNA markers reported in previous mapping studies, we established an ultra-high density consensus genetic map, which contains 34,574 markers consisting of 3508 loci covering 2399 cM on 20 linkage groups. The largest gap in the entire consensus map was 4.73 cM. The high density W × D map and the consensus map were used to develop an improved physical map, which covered 560.5 Mb of genome sequence data. The ultra-high density consensus linkage map, the improved physical map and the markers linked to genes of breeding interest reported in this study provide a common tool for genome sequence assembly, structural genomics, comparative genomics, functional genomics, QTL mapping, and molecular plant breeding in lupin.
An expressed sequence tag (EST) library for Drosophila serrata, a model system for sexual selection and climatic adaptation studies.

PubMed

Frentiu, Francesca D; Adamski, Marcin; McGraw, Elizabeth A; Blows, Mark W; Chenoweth, Stephen F

2009-01-21

The native Australian fly Drosophila serrata belongs to the highly speciose montium subgroup of the melanogaster species group. It has recently emerged as an excellent model system with which to address a number of important questions, including the evolution of traits under sexual selection and traits involved in climatic adaptation along latitudinal gradients. Understanding the molecular genetic basis of such traits has been limited by a lack of genomic resources for this species. Here, we present the first expressed sequence tag (EST) collection for D. serrata that will enable the identification of genes underlying sexually-selected phenotypes and physiological responses to environmental change and may help resolve controversial phylogenetic relationships within the montium subgroup. A normalized cDNA library was constructed from whole fly bodies at several developmental stages, including larvae and adults. Assembly of 11,616 clones sequenced from the 3' end allowed us to identify 6,607 unique contigs, of which at least 90% encoded peptides. Partial transcripts were discovered from a variety of genes of evolutionary interest by BLASTing contigs against the 12 Drosophila genomes currently sequenced. By incorporating into the cDNA library multiple individuals from populations spanning a large portion of the geographical range of D. serrata, we were able to identify 11,057 putative single nucleotide polymorphisms (SNPs), with 278 different contigs having at least one "double hit" SNP that is highly likely to be a real polymorphism. At least 394 EST-associated microsatellite markers, representing 355 different contigs, were also found, providing an additional set of genetic markers. The assembled EST library is available online at http://www.chenowethlab.org/serrata/index.cgi. We have provided the first gene collection and largest set of polymorphic genetic markers, to date, for the fly D. serrata. 
The EST collection will provide much needed genomic resources for this model species and facilitate comparative evolutionary studies within the montium subgroup of the D. melanogaster lineage.
Graph mining for next generation sequencing: leveraging the assembly graph for biological insights.

PubMed

Warnke-Sommer, Julia; Ali, Hesham

2016-05-06

The assembly of Next Generation Sequencing (NGS) reads remains a challenging task. This is especially true for the assembly of metagenomics data that originate from environmental samples potentially containing hundreds to thousands of unique species. The principal objective of current assembly tools is to assemble NGS reads into contiguous stretches of sequence called contigs while maximizing both accuracy and contig length. The end goal of this process is to produce longer contigs, with the major focus being on assembly only. Sequence read assembly is an aggregative process, during which read overlap relationship information is lost as reads are merged into longer sequences or contigs. The assembly graph is information rich and capable of capturing the genomic architecture of an input read data set. We have developed a novel hybrid graph in which nodes represent sequence regions at different levels of granularity. This model, utilized in the assembly and analysis pipeline Focus, presents a concise yet feature rich view of a given input data set, allowing for the extraction of biologically relevant graph structures for graph mining purposes. Focus was used to create hybrid graphs to model metagenomics data sets obtained from the gut microbiomes of five individuals with Crohn's disease and eight healthy individuals. Repetitive and mobile genetic elements are found to be associated with hybrid graph structure. Using graph mining techniques, a comparative study of the Crohn's disease and healthy data sets was conducted with a focus on antibiotic resistance genes associated with transposase genes. Results demonstrated significant differences in the phylogenetic distribution of categories of antibiotic resistance genes in the healthy and diseased patients. Focus was also evaluated as a pure assembly tool and produced excellent results when compared against the Meta-velvet, Omega, and UD-IDBA assemblers. 
Mining the hybrid graph can reveal biological phenomena captured by its structure. We demonstrate the advantages of considering assembly graphs as data-mining support in addition to their role as frameworks for assembly.
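The read overlap relationships that the abstract says are lost once reads are merged into contigs can be pictured with a minimal overlap graph. The sketch below is a plain suffix-prefix overlap graph over toy reads; it is not the hybrid graph model used by Focus, and the function names, the reads, and the minimum overlap of 3 are assumptions made for illustration.

```python
def overlap_len(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def overlap_graph(reads, min_len=3):
    """Map each read to the reads it overlaps, with the overlap length as edge weight."""
    graph = {r: {} for r in reads}
    for a in reads:
        for b in reads:
            if a != b:
                k = overlap_len(a, b, min_len)
                if k:
                    graph[a][b] = k
    return graph

# Three toy reads that chain into the sequence ATGCGTACC.
for read, edges in overlap_graph(["ATGCG", "GCGTA", "GTACC"]).items():
    print(read, edges)
```

Merging the three reads into one contig would discard the two edges and their weights, which is the kind of structural information a graph-mining approach keeps available.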
Transcriptome Wide Identification and Validation of Calcium Sensor Gene Family in the Developing Spikes of Finger Millet Genotypes for Elucidating Its Role in Grain Calcium Accumulation

PubMed Central

Singh, Uma M.; Chandra, Muktesh; Shankhdhar, Shailesh C.; Kumar, Anil

2014-01-01

Background In finger millet, calcium is one of the important and abundant mineral elements. The molecular mechanisms involved in calcium accumulation in plants remain poorly understood. Transcriptome sequencing of genetically diverse genotypes of finger millet differing in grain calcium content will help in understanding the trait. Principal Finding In this study, transcriptome sequencing of spike tissues of two genotypes of finger millet differing in their grain calcium content was performed for the first time. Calcium sensor genes were identified in 78 of 109,218 contigs in GP-1 (low-Ca genotype) and in 76 of 120,130 contigs in GP-45 (high-Ca genotype). Through in silico analysis, all 82 unique calcium sensor genes were classified into eight calcium sensor gene families, viz., CaM & CaMLs, CBLs, CIPKs, CRKs, PEPRKs, CDPKs, CaMKs and CCaMK. Out of 82 genes, 12 were found to be diverse from the rice orthologs. The differential expression analysis on the basis of FPKM values resulted in 24 genes highly expressed in GP-45 and 11 genes highly expressed in GP-1. Ten of the 35 differentially expressed genes could be assigned to three documented pathways involved mainly in stress responses. Furthermore, validation of selected calcium sensor responder genes was also performed by qPCR in developing spikes of both genotypes grown on different concentrations of exogenous calcium. Conclusion Through de novo transcriptome data assembly and analysis, we report the comprehensive identification and functional characterization of the calcium sensor gene family. The calcium sensor gene family identified and characterized in this study will facilitate understanding of the molecular basis of calcium accumulation and the development of calcium-biofortified crops. 
Moreover, this study also supports the use of Illumina paired-end sequencing as a tool for identifying and characterizing gene families and generating genomic information in non-model species. PMID:25157851
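The FPKM values used for the differential expression analysis above follow a standard normalization: fragments mapped to a transcript, scaled per kilobase of transcript length and per million mapped fragments. The sketch below uses that general definition; the function name and the example numbers are hypothetical and are not taken from the study.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    # Equivalent to fragments / (length_bp / 1e3) / (total_mapped_fragments / 1e6).
    return fragments * 1e9 / (length_bp * total_mapped_fragments)

# Hypothetical contig: 500 fragments on a 2,000 bp contig in a library of
# 10 million mapped fragments.
print(fpkm(500, 2000, 10_000_000))
```

Because the value is scaled by both contig length and library size, it allows expression of the same gene to be compared between the two genotypes even when their libraries differ in sequencing depth.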
Transcriptome Analysis in Venom Gland of the Predatory Giant Ant Dinoponera quadriceps: Insights into the Polypeptide Toxin Arsenal of Hymenopterans

PubMed Central

Chong, Cheong-Meng; Leung, Siu Wai; Prieto-da-Silva, Álvaro R. B.; Havt, Alexandre; Quinet, Yves P.; Martins, Alice M. C.; Lee, Simon M. Y.; Rádis-Baptista, Gandhi

2014-01-01

Background Dinoponera quadriceps is a predatory giant ant that inhabits the Neotropical region and subdues its prey (insects) with stings that deliver a toxic cocktail of molecules. Human accidents occasionally occur and cause local pain and systemic symptoms. A comprehensive study of the D. quadriceps venom gland transcriptome is required to advance our knowledge about the toxin repertoire of the giant ant venom and to understand the physiopathological basis of Hymenoptera envenomation. Results We conducted a transcriptome analysis of a cDNA library from the D. quadriceps venom gland with Sanger sequencing in combination with whole-transcriptome shotgun deep sequencing. From the cDNA library, a total of 420 independent clones were analyzed. Although the proportion of dinoponeratoxin isoform precursors was high, the first giant ant venom inhibitor cysteine-knot (ICK) toxin was found. The deep next generation sequencing yielded a total of 2,514,767 raw reads that were assembled into 18,546 contigs. A BLAST search of the assembled contigs against non-redundant and Swiss-Prot databases showed that 6,463 contigs corresponded to BLASTx hits and indicated an interesting diversity of transcripts related to venom gene expression. The majority of these venom-related sequences code for a major polypeptide core, which comprises venom allergens, lethal-like proteins and esterases, and a minor peptide framework composed of inter-specific structurally conserved cysteine-rich toxins. Both the cDNA library and deep sequencing yielded large proportions of contigs that showed no similarities with known sequences. Conclusions To our knowledge, this is the first report of the venom gland transcriptome of the New World giant ant D. quadriceps. 
The glandular venom system was dissected, and the toxin arsenal was revealed; this process brought to light novel sequences that included ICK-folded toxins, allergen proteins, esterases (phospholipases and carboxylesterases), and lethal-like toxins. These findings contribute to the understanding of the ecology, behavior and venomics of hymenopterans. PMID:24498135

Revealing the missing expressed genes beyond the human reference genome by RNA-Seq.

PubMed

Chen, Geng; Li, Ruiyuan; Shi, Leming; Qi, Junyi; Hu, Pengzhan; Luo, Jian; Liu, Mingyao; Shi, Tieliu

2011-12-02

A complete and accurate human reference genome is important for functional genomics research. Therefore, the incomplete reference genome and individual-specific sequences have significant effects on various studies. We used two RNA-Seq datasets from human brain tissues and 10 mixed cell lines to investigate the completeness of the human reference genome. First, we demonstrated that in previously identified ~5 Mb Asian and ~5 Mb African novel sequences that are absent from the human reference genome of NCBI build 36, ~211 kb and ~201 kb of them could be transcribed, respectively. Our results suggest that many of those transcribed regions are not specific to Asians and Africans but are also present in Caucasians. Then, we found that the expression levels of 104 RefSeq genes that are unalignable to NCBI build 37 are higher than 0.1 RPKM in brain and cell lines. Of these, 55 are conserved across human, chimpanzee and macaque, suggesting that there are still a significant number of functional human genes absent from the human reference genome. Moreover, we identified hundreds of novel transcript contigs that cannot be aligned to NCBI build 37, RefSeq genes and EST sequences. Some of those novel transcript contigs are also conserved among human, chimpanzee and macaque. By positioning those contigs onto the human genome, we identified several large deletions in the reference genome. Several conserved novel transcript contigs were further validated by RT-PCR. Our findings demonstrate that a significant number of genes are still absent from the incomplete human reference genome, highlighting the importance of further refining the human reference genome and curating those missing genes. Our study also shows the importance of de novo transcriptome assembly. The comparative approach between reference genome and other related human genomes based on the transcriptome provides an alternative way to refine the human reference genome.
Transcriptome wide identification and validation of calcium sensor gene family in the developing spikes of finger millet genotypes for elucidating its role in grain calcium accumulation.

PubMed

Singh, Uma M; Chandra, Muktesh; Shankhdhar, Shailesh C; Kumar, Anil

2014-01-01

In finger millet, calcium is one of the important and abundant mineral elements. The molecular mechanisms involved in calcium accumulation in plants remain poorly understood. Transcriptome sequencing of genetically diverse genotypes of finger millet differing in grain calcium content will help in understanding the trait. In this study, the transcriptome sequencing of spike tissues of two genotypes of finger millet differing in their grain calcium content was performed for the first time. Out of 109,218 contigs, 78 contigs in the case of GP-1 (low Ca genotype), and out of 120,130 contigs, 76 contigs in the case of GP-45 (high Ca genotype), were identified as calcium sensor genes. Through in silico analysis, all 82 unique calcium sensor genes were classified into eight calcium sensor gene families, viz., CaM & CaMLs, CBLs, CIPKs, CRKs, PEPRKs, CDPKs, CaMKs and CCaMK. Out of 82 genes, 12 were found to be diverse from the rice orthologs. The differential expression analysis on the basis of FPKM value resulted in 24 genes highly expressed in GP-45 and 11 genes highly expressed in GP-1. Ten of the 35 differentially expressed genes could be assigned to three documented pathways involved mainly in stress responses. Furthermore, validation of selected calcium sensor responder genes was also performed by qPCR in developing spikes of both genotypes grown on different concentrations of exogenous calcium. Through de novo transcriptome data assembly and analysis, we reported the comprehensive identification and functional characterization of the calcium sensor gene family. The calcium sensor gene family identified and characterized in this study will facilitate understanding of the molecular basis of calcium accumulation and the development of calcium biofortified crops. Moreover, this study also supported that identification and characterization of gene families through Illumina paired-end sequencing is a potential tool for generating the genomic information of gene families in non-model species.
Sequencing of transcriptomes from two Miscanthus species reveals functional specificity in rhizomes, and clarifies evolutionary relationships

PubMed Central

2014-01-01

Background Miscanthus is a promising biomass crop for temperate regions. Despite the increasing interest in this plant, limited sequence information has constrained research into its biology, physiology, and breeding. The whole genome transcriptomes of M. sinensis and M. sacchariflorus presented in this study may provide good resources to understand functional compositions of two important Miscanthus genomes and their evolutionary relationships. Results For M. sinensis, a total of 457,891 and 512,950 expressed sequence tags (ESTs) were produced from leaf and rhizome tissues, respectively, which were assembled into 12,166 contigs and 89,648 singletons for leaf, and 13,170 contigs and 112,138 singletons for rhizome. For M. sacchariflorus, a total of 288,806 and 267,952 ESTs from leaf and rhizome tissues, respectively, were assembled into 8,732 contigs and 66,881 singletons for leaf, and 8,104 contigs and 63,212 singletons for rhizome. Based on the distributions of synonymous nucleotide substitution (Ks), sorghum and Miscanthus diverged about 6.2 million years ago (MYA), Saccharum and Miscanthus diverged 4.6 MYA, and M. sinensis and M. sacchariflorus diverged 1.5 MYA. The pairwise alignment of predicted protein sequences from sorghum-Miscanthus and two Miscanthus species found a total of 43,770 and 35,818 nsSNPs, respectively. The impacts of striking mutations found by nsSNPs were much lower between sorghum and Miscanthus than those between the two Miscanthus species, perhaps as a consequence of the much higher level of gene duplication in Miscanthus and resulting ability to buffer essential functions against disturbance. Conclusions The ESTs generated in the present study represent a significant addition to Miscanthus functional genomics resources, permitting us to discover some candidate genes associated with enhanced biomass production. 
Ks distributions based on orthologous ESTs may serve as a guideline for future research into the evolution of Miscanthus species as well as its close relatives sorghum and Saccharum. PMID:24884969
Sequencing of transcriptomes from two Miscanthus species reveals functional specificity in rhizomes, and clarifies evolutionary relationships.

PubMed

Kim, Changsoo; Lee, Tae-Ho; Guo, Hui; Chung, Sung Jin; Paterson, Andrew H; Kim, Do-Soon; Lee, Geung-Joo

2014-05-18

Miscanthus is a promising biomass crop for temperate regions. Despite the increasing interest in this plant, limited sequence information has constrained research into its biology, physiology, and breeding. The whole genome transcriptomes of M. sinensis and M. sacchariflorus presented in this study may provide good resources to understand functional compositions of two important Miscanthus genomes and their evolutionary relationships. For M. sinensis, a total of 457,891 and 512,950 expressed sequence tags (ESTs) were produced from leaf and rhizome tissues, respectively, which were assembled into 12,166 contigs and 89,648 singletons for leaf, and 13,170 contigs and 112,138 singletons for rhizome. For M. sacchariflorus, a total of 288,806 and 267,952 ESTs from leaf and rhizome tissues, respectively, were assembled into 8,732 contigs and 66,881 singletons for leaf, and 8,104 contigs and 63,212 singletons for rhizome. Based on the distributions of synonymous nucleotide substitution (Ks), sorghum and Miscanthus diverged about 6.2 million years ago (MYA), Saccharum and Miscanthus diverged 4.6 MYA, and M. sinensis and M. sacchariflorus diverged 1.5 MYA. The pairwise alignment of predicted protein sequences from sorghum-Miscanthus and two Miscanthus species found a total of 43,770 and 35,818 nsSNPs, respectively. The impacts of striking mutations found by nsSNPs were much lower between sorghum and Miscanthus than those between the two Miscanthus species, perhaps as a consequence of the much higher level of gene duplication in Miscanthus and resulting ability to buffer essential functions against disturbance. The ESTs generated in the present study represent a significant addition to Miscanthus functional genomics resources, permitting us to discover some candidate genes associated with enhanced biomass production. 
Ks distributions based on orthologous ESTs may serve as a guideline for future research into the evolution of Miscanthus species as well as its close relatives sorghum and Saccharum.
Generation and characterization of the sea bass Dicentrarchus labrax brain and liver transcriptomes.

PubMed

Magnanou, Elodie; Klopp, Christophe; Noirot, Celine; Besseau, Laurence; Falcón, Jack

2014-07-01

The sea bass Dicentrarchus labrax is the center of interest of an increasing number of basic or applied research investigations, even though few genomic or transcriptomic data are available. Current public data only represent a very partial view of its transcriptome. To fill this need, we characterized brain and liver transcriptomes in a generalist manner that would benefit the entire scientific community. We also tackled some bioinformatics questions related to the effect of RNA fragment size on the assembly quality. Using Illumina RNA-seq, we sequenced organ pools from both wild and farmed Atlantic and Mediterranean fishes. We built two distinct cDNA libraries per organ that only differed by the length of the selected mRNA fragments. Efficiency of assemblies performed on either or both fragment sizes differed depending on the organ, but remained very close, reflecting the quality of the technical replication. We generated more than 19,538 Mbp of data. Over 193 million reads were assembled into 35,073 contigs (average length = 2374 bp; N50 = 3257). 59% of contigs were annotated with SwissProt, which corresponded to 12,517 unique genes. We compared the Gene Ontology (GO) contig distribution between the sea bass and the tilapia. We also looked for brain and liver GO specific signatures as well as KEGG pathway coverage. 23,050 putative micro-satellites and 134,890 putative SNPs were identified. Our sampling strategy and assembly pipeline provided a reliable and broad reference transcriptome for the sea bass. It constitutes an indisputable quantitative and qualitative improvement of the public data, as it provides 5 times more base pairs with fewer and longer contigs. Both organs present unique signatures consistent with their specific physiological functions. The discrepancy in fragment size effect on assembly quality between organs lies in their difference in complexity and thus does not allow prescribing any general strategy. 
This information on two key organs will facilitate further functional approaches. Copyright © 2014 Elsevier B.V. All rights reserved.
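Several records in this listing report an N50 value for their assemblies. As a minimal sketch of how that statistic is conventionally computed (the length of the contig at which the cumulative length of contigs, sorted longest first, reaches half of the total assembly length), the function below uses hypothetical contig lengths and is not taken from any of the studies listed.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of a set of contig lengths (all in the same unit, e.g. bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches half the total")


# Hypothetical assembly of nine contigs (total 54, half 27): 10 + 9 + 8 reaches 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 is weighted by length, it can differ substantially from the average contig length, which is why abstracts often report both.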
De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes.

PubMed

Ashrafi, Hamid; Hill, Theresa; Stoffel, Kevin; Kozik, Alexander; Yao, Jiqiang; Chin-Wo, Sebastian Reyes; Van Deynze, Allen

2012-10-30

Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80-120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified. 
Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were designed for the identified markers. The assembly was annotated by Blast2GO and 14,740 (12%) of annotated contigs were associated with functional proteins. Before availability of pepper genome sequence, assembling transcriptomes of this economically important crop was required to generate thousands of high-quality molecular markers that could be used in breeding programs. In order to have a better understanding of the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of Sanger-EST and Illumina transcriptome assemblies. These and other information have been curated in a database that we have dedicated for pepper project.
Optimizing and evaluating the reconstruction of Metagenome-assembled microbial genomes.

PubMed

Papudeshi, Bhavya; Haggerty, J Matthew; Doane, Michael; Morris, Megan M; Walsh, Kevin; Beattie, Douglas T; Pande, Dnyanada; Zaeri, Parisa; Silva, Genivaldo G Z; Thompson, Fabiano; Edwards, Robert A; Dinsdale, Elizabeth A

2017-11-28

Microbiome/host interactions describe characteristics that affect the host's health. Shotgun metagenomics includes sequencing a random subset of the microbiome to analyze its taxonomic and metabolic potential. Reconstruction of DNA fragments into genomes from metagenomes (called metagenome-assembled genomes) assigns unknown fragments to taxa/function and facilitates discovery of novel organisms. Genome reconstruction incorporates sequence assembly and sorting of assembled sequences into bins, characteristic of a genome. However, the microbial community composition, including taxonomic and phylogenetic diversity, may influence genome reconstruction. We determine the optimal reconstruction method for four microbiome projects that had variable sequencing platforms (IonTorrent and Illumina), diversity (high or low), and environment (coral reefs and kelp forests), using a set of parameters to select for optimal assembly and binning tools. We tested the effects of the assembly and binning processes on population genome reconstruction using 105 marine metagenomes from 4 projects. Reconstructed genomes were obtained from each project using 3 assemblers (IDBA, MetaVelvet, and SPAdes) and 2 binning tools (GroopM and MetaBat). We assessed the efficiency of assemblers using statistics including contig continuity and contig chimerism, and the effectiveness of binning tools using genome completeness and taxonomic identification. We concluded that SPAdes assembled more contigs (143,718 ± 124 contigs) of longer length (N50 = 1632 ± 108 bp), and incorporated the most sequences (sequences-assembled = 19.65%). The microbial richness and evenness were maintained across the assembly, suggesting low contig chimeras. SPAdes assembly was responsive to the biological and technological variations within the project, compared with other assemblers. 
Among binning tools, we conclude that MetaBat produced bins with less variation in GC content (average standard deviation: 1.49), low species richness (4.91 ± 0.66), and higher genome completeness (40.92 ± 1.75) across all projects. MetaBat extracted 115 bins from the 4 projects of which 66 bins were identified as reconstructed metagenome-assembled genomes with sequences belonging to a specific genus. We identified 13 novel genomes, some of which were 100% complete, but show low similarity to genomes within databases. In conclusion, we present a set of biologically relevant parameters for evaluation to select for optimal assembly and binning tools. For the tools we tested, SPAdes assembler and MetaBat binning tools reconstructed quality metagenome-assembled genomes for the four projects. We also conclude that metagenomes from microbial communities that have high coverage of phylogenetically distinct, and low taxonomic diversity results in highest quality metagenome-assembled genomes.
The first set of EST resource for gene discovery and marker development in pigeonpea (Cajanus cajan L.).

PubMed

Raju, Nikku L; Gnanesh, Belaghihalli N; Lekha, Pazhamala; Jayashree, Balaji; Pande, Suresh; Hiremath, Pavana J; Byregowda, Munishamappa; Singh, Nagendra K; Varshney, Rajeev K

2010-03-11

Pigeonpea (Cajanus cajan (L.) Millsp) is one of the major grain legume crops of the tropics and subtropics, but biotic stresses [Fusarium wilt (FW), sterility mosaic disease (SMD), etc.] are serious challenges for sustainable crop production. Modern genomic tools such as molecular markers and candidate genes associated with resistance to these stresses offer the possibility of facilitating pigeonpea breeding for improving biotic stress resistance. Availability of limited genomic resources, however, is a serious bottleneck to undertake molecular breeding in pigeonpea to develop superior genotypes with enhanced resistance to above mentioned biotic stresses. With an objective of enhancing genomic resources in pigeonpea, this study reports generation and analysis of comprehensive resource of FW- and SMD- responsive expressed sequence tags (ESTs). A total of 16 cDNA libraries were constructed from four pigeonpea genotypes that are resistant and susceptible to FW ('ICPL 20102' and 'ICP 2376') and SMD ('ICP 7035' and 'TTB 7') and a total of 9,888 (9,468 high quality) ESTs were generated and deposited in dbEST of GenBank under accession numbers GR463974 to GR473857 and GR958228 to GR958231. Clustering and assembly analyses of these ESTs resulted into 4,557 unique sequences (unigenes) including 697 contigs and 3,860 singletons. BLASTN analysis of 4,557 unigenes showed a significant identity with ESTs of different legumes (23.2-60.3%), rice (28.3%), Arabidopsis (33.7%) and poplar (35.4%). As expected, pigeonpea ESTs are more closely related to soybean (60.3%) and cowpea ESTs (43.6%) than other plant ESTs. Similarly, BLASTX similarity results showed that only 1,603 (35.1%) out of 4,557 total unigenes correspond to known proteins in the UniProt database (≤ 1E-08). Functional categorization of the annotated unigenes sequences showed that 153 (3.3%) genes were assigned to cellular component category, 132 (2.8%) to biological process, and 132 (2.8%) in molecular function. Further, 19 genes were identified differentially expressed between FW- responsive genotypes and 20 between SMD- responsive genotypes. Generated ESTs were compiled together with 908 ESTs available in public domain, at the time of analysis, and a set of 5,085 unigenes were defined that were used for identification of molecular markers in pigeonpea. For instance, 3,583 simple sequence repeat (SSR) motifs were identified in 1,365 unigenes and 383 primer pairs were designed. Assessment of a set of 84 primer pairs on 40 elite pigeonpea lines showed polymorphism with 15 (28.8%) markers with an average of four alleles per marker and an average polymorphic information content (PIC) value of 0.40. Similarly, in silico mining of 133 contigs with ≥ 5 sequences detected 102 single nucleotide polymorphisms (SNPs) in 37 contigs. As an example, a set of 10 contigs were used for confirming in silico predicted SNPs in a set of four genotypes using wet lab experiments. 
Occurrence of SNPs were confirmed for all the 6 contigs for which scorable and sequenceable amplicons were generated. PCR amplicons were not obtained in case of 4 contigs. Recognition sites for restriction enzymes were identified for 102 SNPs in 37 contigs that indicates possibility of assaying SNPs in 37 genes using cleaved amplified polymorphic sequences (CAPS) assay. The pigeonpea EST dataset generated here provides a transcriptomic resource for gene discovery and development of functional markers associated with biotic stress resistance. Sequence analyses of this dataset have showed conservation of a considerable number of pigeonpea transcripts across legume and model plant species analysed as well as some putative pigeonpea specific genes. Validation of identified biotic stress responsive genes should provide candidate genes for allele mining as well as candidate markers for molecular breeding.
The first set of EST resource for gene discovery and marker development in pigeonpea (Cajanus cajan L.)

PubMed Central

2010-01-01

Background Pigeonpea (Cajanus cajan (L.) Millsp) is one of the major grain legume crops of the tropics and subtropics, but biotic stresses [Fusarium wilt (FW), sterility mosaic disease (SMD), etc.] are serious challenges for sustainable crop production. Modern genomic tools such as molecular markers and candidate genes associated with resistance to these stresses offer the possibility of facilitating pigeonpea breeding for improving biotic stress resistance. Availability of limited genomic resources, however, is a serious bottleneck to undertake molecular breeding in pigeonpea to develop superior genotypes with enhanced resistance to above mentioned biotic stresses. With an objective of enhancing genomic resources in pigeonpea, this study reports generation and analysis of comprehensive resource of FW- and SMD- responsive expressed sequence tags (ESTs). Results A total of 16 cDNA libraries were constructed from four pigeonpea genotypes that are resistant and susceptible to FW ('ICPL 20102' and 'ICP 2376') and SMD ('ICP 7035' and 'TTB 7') and a total of 9,888 (9,468 high quality) ESTs were generated and deposited in dbEST of GenBank under accession numbers GR463974 to GR473857 and GR958228 to GR958231. Clustering and assembly analyses of these ESTs resulted into 4,557 unique sequences (unigenes) including 697 contigs and 3,860 singletons. BLASTN analysis of 4,557 unigenes showed a significant identity with ESTs of different legumes (23.2-60.3%), rice (28.3%), Arabidopsis (33.7%) and poplar (35.4%). As expected, pigeonpea ESTs are more closely related to soybean (60.3%) and cowpea ESTs (43.6%) than other plant ESTs. Similarly, BLASTX similarity results showed that only 1,603 (35.1%) out of 4,557 total unigenes correspond to known proteins in the UniProt database (≤ 1E-08). 
Functional categorization of the annotated unigenes sequences showed that 153 (3.3%) genes were assigned to cellular component category, 132 (2.8%) to biological process, and 132 (2.8%) in molecular function. Further, 19 genes were identified differentially expressed between FW- responsive genotypes and 20 between SMD- responsive genotypes. Generated ESTs were compiled together with 908 ESTs available in public domain, at the time of analysis, and a set of 5,085 unigenes were defined that were used for identification of molecular markers in pigeonpea. For instance, 3,583 simple sequence repeat (SSR) motifs were identified in 1,365 unigenes and 383 primer pairs were designed. Assessment of a set of 84 primer pairs on 40 elite pigeonpea lines showed polymorphism with 15 (28.8%) markers with an average of four alleles per marker and an average polymorphic information content (PIC) value of 0.40. Similarly, in silico mining of 133 contigs with ≥ 5 sequences detected 102 single nucleotide polymorphisms (SNPs) in 37 contigs. As an example, a set of 10 contigs were used for confirming in silico predicted SNPs in a set of four genotypes using wet lab experiments. Occurrence of SNPs were confirmed for all the 6 contigs for which scorable and sequenceable amplicons were generated. PCR amplicons were not obtained in case of 4 contigs. Recognition sites for restriction enzymes were identified for 102 SNPs in 37 contigs that indicates possibility of assaying SNPs in 37 genes using cleaved amplified polymorphic sequences (CAPS) assay. Conclusion The pigeonpea EST dataset generated here provides a transcriptomic resource for gene discovery and development of functional markers associated with biotic stress resistance. Sequence analyses of this dataset have showed conservation of a considerable number of pigeonpea transcripts across legume and model plant species analysed as well as some putative pigeonpea specific genes. 
Validation of identified biotic stress responsive genes should provide candidate genes for allele mining as well as candidate markers for molecular breeding. PMID:20222972
Comparative transcriptome analysis of Gossypium hirsutum L. in response to sap sucking insects: aphid and whitefly

PubMed Central

2013-01-01

Background Cotton (Gossypium hirsutum L.) is a major fiber crop that is grown worldwide; it faces extensive damage from sap-sucking insects, including aphids and whiteflies. Genome-wide transcriptome analysis was performed to understand the molecular details of interaction between Gossypium hirsutum L. and sap-sucking pests, namely Aphis gossypii (Aphid) and Bemisia tabaci (Whiteflies). Roche’s GS-Titanium was used to sequence transcriptomes of cotton infested with aphids and whiteflies for 2 h and 24 h. Results A total of 100935 contigs were produced with an average length of 529 bp after an assembly in all five selected conditions. The Blastn of the non-redundant (nr) cotton EST database resulted in the identification of 580 novel contigs in the cotton plant. It should be noted that in spite of minimal physical damage caused by the sap-sucking insects, they can change the gene expression of plants in 2 h of infestation; further, change in gene expression due to whiteflies is quicker than due to aphids. The impact of the whitefly 24 h after infestation was more or less similar to that of the aphid 2 h after infestation. Aphids and whiteflies affect many genes that are regulated by various phytohormones and in response to microbial infection, indicating the involvement of complex crosstalk between these pathways. The KOBAS analysis of differentially regulated transcripts in response to aphids and whiteflies indicated that both the insects induce the metabolism of amino acids biosynthesis, especially in the case of whitefly infestation at a later phase. Further, we also observed that expression of transcripts related to photosynthesis, especially carbon fixation, was significantly influenced by infestation of aphids and whiteflies. Conclusions A comparison of different transcriptomes leads to the identification of differentially and temporally regulated transcripts in response to infestation by aphids and whiteflies. 
Most of these differentially expressed contigs were related to genes involved in biotic, abiotic stresses and enzymatic activities related to hydrolases, transferases, and kinases. The expression of some marker genes such as the overexpressors of cationic peroxidase 3, lipoxygenase I, TGA2, and non-specific lipase, which are involved in phytohormonal-mediated plant resistance development, was suppressed after infestation by aphids and whiteflies, indicating that insects suppressed plant resistance in order to facilitate their infestation. We also concluded that cotton shares several pathways such as phagosomes, RNA transport, and amino acid metabolism with Arabidopsis in response to the infestation by aphids and whiteflies. PMID:23577705
Katome: de novo DNA assembler implemented in rust

NASA Astrophysics Data System (ADS)

Neumann, Łukasz; Nowak, Robert M.; Kuśmirek, Wiktor

2017-08-01

Katome is a new de novo sequence assembler written in the Rust programming language, designed with respect to future parallelization of the algorithms, run time and memory usage optimization. The application uses new algorithms for the correct assembly of repetitive sequences. Performance and quality tests were performed on various data, comparing the new application to 'dnaasm', 'ABySS' and 'Velvet' genome assemblers. Quality tests indicate that the new assembler creates more contigs than well-established solutions, but the contigs have better quality with regard to mismatches per 100 kbp and indels per 100 kbp. Additionally, benchmarks indicate that the Rust-based implementation outperforms 'dnaasm', 'ABySS' and 'Velvet' assemblers, written in C++, in terms of assembly time. Lower memory usage in comparison to 'dnaasm' is observed.
Is Your Neighborhood Designed to Support Physical Activity? A Brief Streetscape Audit Tool.

PubMed

Sallis, James F; Cain, Kelli L; Conway, Terry L; Gavand, Kavita A; Millstein, Rachel A; Geremia, Carrie M; Frank, Lawrence D; Saelens, Brian E; Glanz, Karen; King, Abby C

2015-09-03

Macro level built environment factors (eg, street connectivity, walkability) are correlated with physical activity. Less studied but more modifiable microscale elements of the environment (eg, crosswalks) may also affect physical activity, but short audit measures of microscale elements are needed to promote wider use. This study evaluated the relation of a 15-item neighborhood environment audit tool with a full version of the tool to assess neighborhood design on physical activity in 4 age groups. From the 120-item Microscale Audit of Pedestrian Streetscapes (MAPS) measure of street design, sidewalks, and street crossings, we developed the 15-item version (MAPS-Mini) on the basis of associations with physical activity and attribute modifiability. As a sample of a likely walking route, MAPS-Mini was conducted on a 0.25-mile route from participant residences toward the nearest nonresidential destination for children (n = 758), adolescents (n = 897), younger adults (n = 1,655), and older adults (n = 367). Active transportation and leisure physical activity were measured with age-appropriate surveys, and accelerometers provided objective physical activity measures. Mixed-model regressions were conducted for each MAPS item and a total environment score, adjusted for demographics, participant clustering, and macrolevel walkability. Total scores of MAPS-Mini and the 120-item MAPS correlated at r = .85. Total microscale environment scores were significantly related to active transportation in all age groups. Items related to active transport in 3 age groups were presence of sidewalks, curb cuts, street lights, benches, and buffer between street and sidewalk. The total score was related to leisure physical activity and accelerometer measures only in children. The MAPS-Mini environment measure is short enough to be practical for use by community groups and planning agencies and is a valid substitute for the full version that is 8 times longer.
The Effects of Integrating Computer-Based Concept Mapping for Physics Learning in Junior High School

ERIC Educational Resources Information Center

Chang, Cheng-Chieh; Yeh, Ting-Kuang; Shih, Chang-Ming

2016-01-01

It is generally accepted that concept mapping has a noticeable impact on learning. However, the literature shows that the use of concept mapping does not benefit all learners. The present study explored the effects of incorporating computer-based concept mapping in physics instruction. A total of 61 9th-grade students participated in this study. By using a…
A draft fur seal genome provides insights into factors affecting SNP validation and how to mitigate them.

PubMed

Humble, E; Martinez-Barrio, A; Forcada, J; Trathan, P N; Thorne, M A S; Hoffmann, M; Wolf, J B W; Hoffman, J I

2016-07-01

Custom genotyping arrays provide a flexible and accurate means of genotyping single nucleotide polymorphisms (SNPs) in a large number of individuals of essentially any organism. However, validation rates, defined as the proportion of putative SNPs that are verified to be polymorphic in a population, are often very low. A number of potential causes of assay failure have been identified, but none have been explored systematically. In particular, as SNPs are often developed from transcriptomes, parameters relating to the genomic context are rarely taken into account. Here, we assembled a draft Antarctic fur seal (Arctocephalus gazella) genome (assembly size: 2.41 Gb; scaffold/contig N50 : 3.1 Mb/27.5 kb). We then used this resource to map the probe sequences of 144 putative SNPs genotyped in 480 individuals. The number of probe-to-genome mappings and alignment length together explained almost a third of the variation in validation success, indicating that sequence uniqueness and proximity to intron-exon boundaries play an important role. The same pattern was found after mapping the probe sequences to the Walrus and Weddell seal genomes, suggesting that the genomes of species divergent by as much as 23 million years can hold information relevant to SNP validation outcomes. Additionally, reanalysis of genotyping data from seven previous studies found the same two variables to be significantly associated with SNP validation success across a variety of taxa. Finally, our study reveals considerable scope for validation rates to be improved, either by simply filtering for SNPs whose flanking sequences align uniquely and completely to a reference genome, or through predictive modelling. © 2015 John Wiley & Sons Ltd.
Ontology and diversity of transcript-associated microsatellites mined from a globe artichoke EST database

PubMed Central

Scaglione, Davide; Acquadro, Alberto; Portis, Ezio; Taylor, Christopher A; Lanteri, Sergio; Knapp, Steven J

2009-01-01

Background The globe artichoke (Cynara cardunculus var. scolymus L.) is a significant crop in the Mediterranean basin. Despite its commercial importance and its both dietary and pharmaceutical value, knowledge of its genetics and genomics remains scant. Microsatellite markers have become a key tool in genetic and genomic analysis, and we have exploited recently acquired EST (expressed sequence tag) sequence data (Composite Genome Project - CGP) to develop an extensive set of microsatellite markers. Results A unigene assembly was created from over 36,000 globe artichoke EST sequences, containing 6,621 contigs and 12,434 singletons. Over 12,000 of these unigenes were functionally assigned on the basis of homology with Arabidopsis thaliana reference proteins. A total of 4,219 perfect repeats, located within 3,308 unigenes was identified and the gene ontology (GO) analysis highlighted some GO term's enrichments among different classes of microsatellites with respect to their position. Sufficient flanking sequence was available to enable the design of primers to amplify 2,311 of these microsatellites, and a set of 300 was tested against a DNA panel derived from 28 C. cardunculus genotypes. Consistent amplification and polymorphism was obtained from 236 of these assays. Their polymorphic information content (PIC) ranged from 0.04 to 0.90 (mean 0.66). Between 176 and 198 of the assays were informative in at least one of the three available mapping populations. Conclusion EST-based microsatellites have provided a large set of de novo genetic markers, which show significant amounts of polymorphism both between and within the three taxa of C. cardunculus. They are thus well suited as assays for phylogenetic analysis, the construction of genetic maps, marker-assisted breeding, transcript mapping and other genomic applications in the species. PMID:19785740
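The polymorphic information content (PIC) values quoted in the marker studies above are conventionally derived from allele frequencies. The sketch below assumes the commonly used Botstein-style formula, PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2. The abstracts do not state which formula the authors applied, and the allele frequencies in the example are hypothetical.

```python
from itertools import combinations


def pic(allele_frequencies: list[float]) -> float:
    """Polymorphic information content of one marker from its allele frequencies."""
    if abs(sum(allele_frequencies) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that two randomly drawn alleles are identical.
    homozygosity = sum(p * p for p in allele_frequencies)
    # Correction term over every unordered pair of distinct alleles.
    pair_term = sum(2 * (p * q) ** 2 for p, q in combinations(allele_frequencies, 2))
    return 1.0 - homozygosity - pair_term


# Hypothetical markers: two equally frequent alleles, and four equally frequent alleles.
print(pic([0.5, 0.5]))
print(pic([0.25, 0.25, 0.25, 0.25]))
```

Under this formula a monomorphic marker scores 0, and the value approaches 1 as the number of equally frequent alleles grows.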
Collinearity Analysis and High-Density Genetic Mapping of the Wheat Powdery Mildew Resistance Gene Pm40 in PI 672538

PubMed Central

Fatima, Syeda Akash; Yang, Jiezhi; Chen, Wanquan; Liu, Taiguo; Hu, Yuting; Li, Qing; Guo, Jingwei; Zhang, Min; Lei, Li; Li, Xin; Tang, Shengwen; Luo, Peigao

2016-01-01

The wheat powdery mildew resistance gene Pm40, which is located on chromosomal arm 7BS, is effective against nearly all prevalent races of Blumeria graminis f. sp. tritici (Bgt) in China and is carried by the common wheat germplasm PI 672538. A set of F1, F2 and F2:3 populations from the cross of the resistant PI 672538 with the susceptible line L1034 was used to conduct genetic analysis of powdery mildew resistance and to construct a high-density linkage map of the Pm40 gene. We constructed a high-density genetic linkage map with a total length of 6.18 cM and an average spacing between markers of 0.48 cM. Pm40 is flanked by Xwmc335 and BF291338 at genetic distances of 0.58 cM and 0.26 cM, respectively, in deletion bin C-7BS-1-0.27. Comparative genomic analysis based on EST-STS markers established a high level of collinearity of the Pm40 genomic region with a 1.09-Mbp genomic region on Brachypodium chromosome 3, a 1.16-Mbp genomic region on rice chromosome 8, and a 1.62-Mbp genomic region on sorghum chromosome 7. We further anchored the Pm40 target intervals to the wheat genome sequence. A putative linear index of 85 wheat contigs containing 97 genes on 7BS was constructed. In total, 9 genes that encode proteins involved in plant defense and the response to pathogen attack could be considered candidates for resistance to powdery mildew in the target genomic regions. These results will facilitate the development of new markers for map-based cloning and marker-assisted selection of Pm40 in wheat breeding programs. PMID:27755575
Discovery of Pod Shatter-Resistant Associated SNPs by Deep Sequencing of a Representative Library Followed by Bulk Segregant Analysis in Rapeseed

PubMed Central

Huang, Shunmou; Yang, Hongli; Zhan, Gaomiao; Wang, Xinfa; Liu, Guihua; Wang, Hanzhong

2012-01-01

Background Single nucleotide polymorphisms (SNPs) are an important class of genetic marker for target gene mapping. As yet, there is no rapid and effective method to identify SNPs linked with agronomic traits in rapeseed and other crop species. Methodology/Principal Findings We demonstrate a novel method for identifying SNP markers in rapeseed by deep sequencing a representative library and performing bulk segregant analysis. With this method, SNPs associated with rapeseed pod shatter-resistance were discovered. Firstly, a reduced representation of the rapeseed genome was used. Genomic fragments ranging from 450–550 bp were prepared from the susceptible bulk (ten F2 plants with the silique shattering resistance index, SSRI <0.10) and the resistance bulk (ten F2 plants with SSRI >0.90), and Solexa sequencing produced 90 bp reads. Approximately 50 million of these sequence reads were assembled into contigs to a depth of 20-fold coverage. Secondly, 60,396 ‘simple SNPs’ were identified, and the statistical significance was evaluated using Fisher's exact test. The 70 associated SNPs with –log10 p values over 16 were selected for further analysis. These SNPs included a tight cluster of 14 associated SNPs within a 396 kb region on chromosome A09. Our evidence indicates that this region contains a major quantitative trait locus (QTL). Finally, two associated SNPs from this region were mapped on a major QTL region. Conclusions/Significance 70 associated SNPs were discovered and a major QTL for rapeseed pod shatter-resistance was found on chromosome A09 using our novel method. The associated SNP markers were used for mapping of the QTL, and may be useful for improving pod shatter-resistance in rapeseed through marker-assisted selection and map-based cloning. This approach will accelerate the discovery of major QTLs and the cloning of functional genes for important agronomic traits in rapeseed and other crop species.
PMID:22529909
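The significance filter described in the abstract above (Fisher's exact test, then a –log10 p threshold of 16) can be sketched as follows. The abstract does not say how the contingency table was built; this sketch assumes a 2x2 table of allele read counts per bulk, and the function name and counts are hypothetical. It uses only the standard library, with the usual two-sided rule of summing all tables no more probable than the observed one.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme as the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(table_prob(x) for x in range(lowest, highest + 1)
                  if table_prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical read counts: rows are bulks, columns are the two alleles.
p = fisher_exact_two_sided(40, 0, 0, 40)
passes_threshold = -log10(p) > 16  # threshold taken from the abstract
```

With 20-fold coverage per bulk, a single SNP would rarely reach such a threshold on its own reads, so the real analysis presumably pooled more evidence than this toy table does.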
Correlation of physical and genetic maps of human chromosome 16

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sutherland, G.R.

1991-01-01

This project aimed to divide chromosome 16 into approximately 50 intervals of approximately 2 Mb in size by constructing a series of mouse/human somatic cell hybrids each containing a rearranged chromosome 16. Using these hybrids, DNA probes would be regionally mapped by Southern blot or PCR analysis. Preference would be given to mapping probes which demonstrated polymorphisms for which the CEPH panel of families had been typed. This would allow a correlation of the physical and linkage maps of this chromosome. The aims have been substantially achieved. 49 somatic cell hybrids have been constructed which have allowed definition of 46, and potentially 57, different physical intervals on the chromosome. 164 loci have been fully mapped into these intervals. A correlation of the physical and genetic maps of the chromosome is in an advanced stage of preparation. The somatic cell hybrids constructed have been widely distributed to groups working on chromosome 16 and other genome projects.
Correlation of physical and genetic maps of human chromosome 16. Annual progress report, October 1, 1990--July 31, 1991

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sutherland, G.R.

1991-12-31

This project aimed to divide chromosome 16 into approximately 50 intervals of approximately 2 Mb in size by constructing a series of mouse/human somatic cell hybrids each containing a rearranged chromosome 16. Using these hybrids, DNA probes would be regionally mapped by Southern blot or PCR analysis. Preference would be given to mapping probes which demonstrated polymorphisms for which the CEPH panel of families had been typed. This would allow a correlation of the physical and linkage maps of this chromosome. The aims have been substantially achieved. 49 somatic cell hybrids have been constructed which have allowed definition of 46, and potentially 57, different physical intervals on the chromosome. 164 loci have been fully mapped into these intervals. A correlation of the physical and genetic maps of the chromosome is in an advanced stage of preparation. The somatic cell hybrids constructed have been widely distributed to groups working on chromosome 16 and other genome projects.
Transcriptome resources for the frogs Lithobates clamitans and Pseudacris regilla, emphasizing antimicrobial peptides and conserved loci for phylogenetics

USGS Publications Warehouse

Robertson, Laura S.; Cornman, Robert S.

2014-01-01

We developed genetic resources for two North American frogs, Lithobates clamitans and Pseudacris regilla, widespread native amphibians that are potential indicator species of environmental health. For both species, mRNA from multiple tissues was sequenced using 454 technology. De novo assemblies with Mira3 resulted in 50 238 contigs (N50 = 687 bp) and 48 213 contigs (N50 = 686 bp) for L. clamitans and P. regilla, respectively, after clustering with CD-Hit-EST and purging contigs below 200 bp. We performed BLASTX similarity searches against the Xenopus tropicalis proteome and, for predicted ORFs, HMMER similarity searches against the Pfam-A database. Because there is broad interest in amphibian immune factors, we manually annotated putative antimicrobial peptides. To identify conserved regions suitable for amplicon resequencing across a broad taxonomic range, we performed an additional assembly of public short-read transcriptome data derived from two species of the genus Rana and identified reciprocal best TBLASTX matches among all assemblies. Although P. regilla, a hylid frog, is substantially more diverged from the ranid species, we identified 56 genes that were sufficiently conserved to allow nondegenerate primer design with Primer3. In addition to providing a foundation for comparative genomics and quantitative gene expression analysis, our results enable quick development of nuclear sequence-based markers for phylogenetics or population genetics.
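The N50 statistic reported for both assemblies above can be computed as in the following sketch. It assumes the common definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly length); the function name, the `min_length` parameter and the example lengths are hypothetical, with `min_length` mirroring the purge of contigs below 200 bp mentioned in the abstract.

```python
def n50(contig_lengths, min_length=0):
    """N50 of the contigs that are at least min_length long."""
    kept = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs left after length filtering")
    total = sum(kept)
    running = 0
    for length in kept:
        running += length
        # First contig at which the cumulative length covers half the assembly.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp; contigs shorter than 200 bp are purged first.
example_n50 = n50([100, 150, 300, 500], min_length=200)
```

Because short contigs are removed before the total is taken, the purge threshold changes the reported N50, which is worth remembering when comparing assemblies filtered differently.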

Next Generation Sequencing Identifies Five Major Classes of Potentially Therapeutic Enzymes Secreted by Lucilia sericata Medical Maggots

PubMed Central

Franta, Zdeněk; Vogel, Heiko; Lehmann, Rüdiger; Rupp, Oliver; Goesmann, Alexander; Vilcinskas, Andreas

2016-01-01

Lucilia sericata larvae are used as an alternative treatment for recalcitrant and chronic wounds. Their excretions/secretions contain molecules that facilitate tissue debridement, disinfect, or accelerate wound healing and have therefore been recognized as a potential source of novel therapeutic compounds. Among the substances present in excretions/secretions various peptidase activities promoting the wound healing processes have been detected but the peptidases responsible for these activities remain mostly unidentified. To explore these enzymes we applied next generation sequencing to analyze the transcriptomes of different maggot tissues (salivary glands, gut, and crop) associated with the production of excretions/secretions and/or with digestion as well as the rest of the larval body. As a result we obtained more than 123.8 million paired-end reads, which were assembled de novo using Trinity and Oases assemblers, yielding 41,421 contigs with an N50 contig length of 2.22 kb and a total length of 67.79 Mb. BLASTp analysis against the MEROPS database identified 1729 contigs in 577 clusters encoding five peptidase classes (serine, cysteine, aspartic, threonine, and metallopeptidases), which were assigned to 26 clans, 48 families, and 185 peptidase species. The individual enzymes were differentially expressed among maggot tissues and included peptidase activities related to the therapeutic effects of maggot excretions/secretions. PMID:27119084
Genome analysis and identification of gelatinase encoded gene in Enterobacter aerogenes

NASA Astrophysics Data System (ADS)

Shahimi, Safiyyah; Mutalib, Sahilah Abdul; Khalid, Rozida Abdul; Repin, Rul Aisyah Mat; Lamri, Mohd Fadly; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat

2016-11-01

In this study, bioinformatic analysis of the genome sequence of Enterobacter aerogenes was carried out to identify genes encoding gelatinase. E. aerogenes was isolated from hot spring water and produces gelatinase that is species-specific to porcine and fish gelatin. This bacterium therefore offers the possibility of producing enzymes specific to the gelatin of each species. The E. aerogenes genome was partially sequenced, giving a total sequence size of 5.0 mega base pairs (Mbp). The pre-processing pipeline yielded 87.6 Mbp of total reads and 68.8 Mbp of high-quality reads, a high-quality percentage of 78.58%. Genome assembly produced 120 contigs, 67.5% of which were longer than 1 kilo base pair (kbp), with an N50 contig length of 124856 bp and a GC content of 55.17%. About 4705 protein genes were identified by protein prediction analysis. The two candidate genes selected had the highest percentage identity to gelatinase enzymes available in the Swiss-Prot and NCBI online databases. They were NODE_9_length_26866_cov_148.013245_12, containing a 1029 base pair (bp) sequence encoding 342 amino acids, and NODE_24_length_155103_cov_177.082458_62, containing a 717 bp sequence encoding 238 amino acids. Two pairs of primers (forward and reverse) were therefore designed based on the open reading frames (ORFs) of the selected genes. Genome analysis of E. aerogenes thus identified genes encoding gelatinase.
BusyBee Web: metagenomic data analysis by bootstrapped supervised binning and annotation

PubMed Central

Kiefer, Christina; Fehlmann, Tobias; Backes, Christina

2017-01-01

Abstract Metagenomics-based studies of mixed microbial communities are impacting biotechnology, life sciences and medicine. Computational binning of metagenomic data is a powerful approach for the culture-independent recovery of population-resolved genomic sequences, i.e. from individual or closely related, constituent microorganisms. Existing binning solutions often require a priori characterized reference genomes and/or dedicated compute resources. Extending currently available reference-independent binning tools, we developed the BusyBee Web server for the automated deconvolution of metagenomic data into population-level genomic bins using assembled contigs (Illumina) or long reads (Pacific Biosciences, Oxford Nanopore Technologies). A reversible compression step as well as bootstrapped supervised binning enable quick turnaround times. The binning results are represented in interactive 2D scatterplots. Moreover, bin quality estimates, taxonomic annotations and annotations of antibiotic resistance genes are computed and visualized. Ground truth-based benchmarks of BusyBee Web demonstrate comparably high performance to state-of-the-art binning solutions for assembled contigs and markedly improved performance for long reads (median F1 scores: 70.02–95.21%). Furthermore, the applicability to real-world metagenomic datasets is shown. In conclusion, our reference-independent approach automatically bins assembled contigs or long reads, exhibits high sensitivity and precision, enables intuitive inspection of the results, and only requires FASTA-formatted input. The web-based application is freely accessible at: https://ccb-microbe.cs.uni-saarland.de/busybee. PMID:28472498
Identification of genes associated with low furanocoumarin content in grapefruit.

PubMed

Chen, Chunxian; Yu, Qibin; Wei, Xu; Cancalon, Paul F; Gmitter, Fred G

2014-10-01

Some furanocoumarins in grapefruit (Citrus paradisi) are associated with the so-called grapefruit juice effect. Previous phytochemical quantification and genetic analysis suggested that the synthesis of these furanocoumarins may be controlled by a single gene in the pathway. In this study, cDNA-amplified fragment length polymorphism (cDNA-AFLP) analysis of fruit tissues was performed to identify the candidate gene(s) likely associated with low furanocoumarin content in grapefruit. Fifteen tentative differentially expressed fragments were cloned through the cDNA-AFLP analysis of the grapefruit variety Foster and its spontaneous low-furanocoumarin mutant Low Acid Foster. Sequence analysis revealed a cDNA-AFLP fragment, Contig 6, was homologous to a substrate-proved psoralen synthase gene, CYP71A22, and was part of citrus unigenes Cit.3003 and Csi.1332, and predicted genes Ciclev10004717m in mandarin and orange1.1g041507m in sweet orange. The two predicted genes contained the highly conserved motifs at one of the substrate recognition sites of CYP71A22. Digital gene expression profile showed the unigenes were expressed only in fruit and seed. Quantitative real-time PCR also proved Contig 6 was down-regulated in Low Acid Foster. These results showed the differentially expressed Contig 6 was related to the reduced furanocoumarin levels in the mutant. The identified fragment, homologs, unigenes, and genes may facilitate further furanocoumarin genetic study and grapefruit variety improvement.
SNP-markers in Allium species to facilitate introgression breeding in onion.

PubMed

Scholten, Olga E; van Kaauwen, Martijn P W; Shahin, Arwa; Hendrickx, Patrick M; Keizer, L C Paul; Burger, Karin; van Heusden, Adriaan W; van der Linden, C Gerard; Vosman, Ben

2016-08-31

Within onion, Allium cepa L., the availability of disease resistance is limited. The identification of sources of resistance in related species, such as Allium roylei and Allium fistulosum, was a first step towards the improvement of onion cultivars by breeding. SNP markers linked to resistance and polymorphic between these related species and onion cultivars are a valuable tool to efficiently introgress disease resistance genes. In this paper we describe the identification and validation of SNP markers valuable for onion breeding. Transcriptome sequencing resulted in 192 million RNA seq reads from the interspecific F1 hybrid between A. roylei and A. fistulosum (RF) and nine onion cultivars. After assembly, reliable SNPs were discovered in about 36 % of the contigs. For genotyping of the interspecific three-way cross population, derived from a cross between an onion cultivar and the RF (CCxRF), 1100 SNPs that are polymorphic in RF and monomorphic in the onion cultivars (RF SNPs) were selected for the development of KASP assays. A molecular linkage map based on 667 RF-SNP markers was constructed for CCxRF. In addition, KASP assays were developed for 1600 onion-SNPs (SNPs polymorphic among onion cultivars). A second linkage map was constructed for an F2 of onion x A. roylei (F2(CxR)) that consisted of 182 onion-SNPs and 119 RF-SNPs, and 76 previously mapped markers. Markers co-segregating in both the F2(CxR) and the CCxRF population were used to assign the linkage groups of RF to onion chromosomes. To validate usefulness of these SNP markers, QTL mapping was applied in the CCxRF population that segregates for resistance to Botrytis squamosa and resulted in a QTL for resistance on chromosome 6 of A. roylei. Our research has more than doubled the publicly available marker sequences of expressed onion genes and two onion-related species. It resulted in a detailed genetic map for the interspecific CCxRF population. 
This is the first paper that reports the detection of a QTL for resistance to B. squamosa in A. roylei.
Map of physics

NASA Astrophysics Data System (ADS)

2008-10-01

Based on bibliometric data from information-services provider Thomson Reuters, this map reveals "core areas" of physics, shown as coloured circular nodes, and the relationship between these subdisciplines, shown as lines.
Meeting the Demands of Professional Education: A Study of Mind Mapping in a Professional Doctoral Physical Therapy Education Program

ERIC Educational Resources Information Center

Pollard, Elicia L.

2010-01-01

The purposes of this study are to investigate whether the quiz scores of physical therapy students who integrated mind mapping in their learning strategies are significantly different than the quiz scores of students who did not use mind mapping to learn in a lecture-based research course and examine the students' perceptions of mind mapping as a…
Constructing a 'Chromonome' of Yellowtail (Seriola quinqueradiata) for Comparative Analysis of Chromosomal Rearrangements

PubMed Central

Kawase, Junya; Aoki, Jun-ya; Araki, Kazuo

2018-01-01

To investigate chromosome evolution in fish species, we newly mapped 181 markers that allowed us to construct a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map with 1,713 DNA markers, which was far denser than a previous map, and we anchored the de novo assembled sequences onto the RH physical map. Finally, we mapped a total of 13,977 expressed sequence tags (ESTs) on a genome sequence assembly aligned with the physical map. Using the high-density physical map and anchored genome sequences, we accurately compared the yellowtail genome structure with the genome structures of five model fishes to identify characteristics of the yellowtail genome. Between yellowtail and Japanese medaka (Oryzias latipes), almost all regions of the chromosomes were conserved and some blocks comprising several markers were translocated. Using the genome information of the spotted gar (Lepisosteus oculatus) as a reference, we further documented syntenic relationships and chromosomal rearrangements that occurred during evolution in four other acanthopterygian species (Japanese medaka, zebrafish, spotted green pufferfish and three-spined stickleback). The evolutionary chromosome translocation frequency was 1.5-2-times higher in yellowtail than in medaka, pufferfish, and stickleback. PMID:29290830
Surficial geologic map of the Amboy 30' x 60' quadrangle, San Bernardino County, California

USGS Publications Warehouse

Bedford, David R.; Miller, David M.; Phelps, Geoffrey A.

2010-01-01

The surficial geologic map of the Amboy 30' x 60' quadrangle presents characteristics of surficial materials for an area of approximately 5,000 km2 in the eastern Mojave Desert of southern California. This map consists of new surficial mapping conducted between 2000 and 2007, as well as compilations from previous surficial mapping. Surficial geologic units are mapped and described based on depositional process and age categories that reflect the mode of deposition, pedogenic effects following deposition, and, where appropriate, the lithologic nature of the material. Many physical properties were noted and measured during the geologic mapping. This information was used to classify surficial deposits and to understand their ecological importance. We focus on physical properties that drive hydrologic, biologic, and physical processes such as particle-size distribution (PSD) and bulk density. The database contains point data representing locations of samples for both laboratory determined physical properties and semiquantitative field-based information in the database. We include the locations of all field observations and note the type of information collected in the field to help assist in assessing the quality of the mapping. The publication is separated into three parts: documentation, spatial data, and printable map graphics of the database. Documentation includes this pamphlet, which provides a discussion of the surficial geology and units and the map. Spatial data are distributed as ArcGIS Geodatabase in Microsoft Access format and are accompanied by a readme file, which describes the database contents, and FGDC metadata for the spatial map information. Map graphics files are distributed as Postscript and Adobe Portable Document Format (PDF) files that provide a view of the spatial database at the mapped scale.
De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes

PubMed Central

2012-01-01

Background Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Results Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80–120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified. 
Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were designed for the identified markers. The assembly was annotated by Blast2GO and 14,740 (12%) of annotated contigs were associated with functional proteins. Conclusions Before a pepper genome sequence was available, assembling transcriptomes of this economically important crop was required to generate thousands of high-quality molecular markers that could be used in breeding programs. In order to have a better understanding of the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of the Sanger-EST and Illumina transcriptome assemblies. This and other information has been curated in a database dedicated to the pepper project. PMID:23110314
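The distance filter mentioned in the abstract above (SNPs at least 50 bp from any intron-exon junction and from flanking SNPs) can be sketched as below. The authors used in-house Perl scripts that are not shown in the abstract, so this is only an illustration in Python; the function name and all coordinates are hypothetical, and positions are assumed to lie on a single contig.

```python
def filter_isolated_snps(snp_positions, junction_positions, min_distance=50):
    """Keep SNPs at least min_distance bp from junctions and neighbouring SNPs."""
    snps = sorted(snp_positions)
    kept = []
    for index, position in enumerate(snps):
        # Only the immediate neighbours can be the closest flanking SNPs.
        neighbours = []
        if index > 0:
            neighbours.append(snps[index - 1])
        if index + 1 < len(snps):
            neighbours.append(snps[index + 1])
        too_close = any(abs(position - other) < min_distance
                        for other in neighbours + list(junction_positions))
        if not too_close:
            kept.append(position)
    return kept


# Hypothetical coordinates: two clustered SNPs and one SNP beside a junction.
high_quality = filter_isolated_snps([100, 120, 300, 500], [480])
```

Neighbours are taken from the full SNP list rather than from the SNPs already kept, so two SNPs that are too close to each other are both discarded.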
Molecular characterization and combined genotype association study of bovine cluster of differentiation 14 gene with clinical mastitis in crossbred dairy cattle

PubMed Central

Selvan, A. Sakthivel; Gupta, I. D.; Verma, A.; Chaudhari, M. V.; Magotra, A.

2016-01-01

Aim: The present study was undertaken to characterize and analyze combined genotypes of the cluster of differentiation 14 (CD14) gene and to explore their association with clinical mastitis in Karan Fries (KF) cows maintained in the National Dairy Research Institute herd, Karnal. Materials and Methods: Genomic DNA was extracted from the blood of 94 randomly selected lactating KF cattle by the phenol-chloroform method. After checking its quality and quantity, polymerase chain reaction (PCR) was carried out using six sets of reported gene-specific primers to amplify the complete KF CD14 gene. The forward and reverse sequences for each PCR fragment were assembled to form the complete sequence for the respective region of the KF CD14 gene. Multiple sequence alignments of the edited sequence with the corresponding reported Bos taurus reference sequence (EU148610.1) were performed with ClustalW software to identify single nucleotide polymorphisms (SNPs). Basic Local Alignment Search Tool analysis was performed to compare the sequence identity of the KF CD14 gene with other species. Restriction fragment length polymorphism (RFLP) analysis was carried out in all KF cows using the restriction enzymes (REs) Helicobacter pylori 188I (Hpy188I) (contig 2) and Haemophilus influenzae I (HinfI) (contig 4). Cows were assigned genotypes obtained by PCR-RFLP analysis, and the association study was done using the Chi-square (χ2) test. The genotypes of contigs (loci) 2 and 4 were combined for each animal to construct combined genotype patterns. Results: Two types of KF sequences were obtained: one of 2630 bp with one insertion at nucleotide (nt) position 616 and one deletion at nt position 1117, and another of 2629 bp with only one deletion at nt position 615. ClustalW multiple alignment of the KF CD14 gene sequence with the B. taurus cattle sequence (EU148610.1) revealed 24 nt changes (SNPs).
Cows were also screened using PCR-RFLP with the Hpy188I (contig 2) and HinfI (contig 4) REs, each of which revealed three genotypes that differed significantly in mastitis incidence. These two loci can give at most nine combined genotype patterns, of which only eight were observed: AACC, AACD, AADD, ABCD, ABDD, BBCC, BBCD, and BBDD. The combined genotype ABCC was not observed in the studied population of KF cows. Out of 94 animals, those with the AACD combined genotype (10.63%) were found not to be affected by mastitis, and those with the ABDD combined genotype had the highest mastitis incidence, 15.96%. Conclusion: AACD-typed cows were found to be the least susceptible to mastitis compared with the other combined genotypes. PMID:27536026
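Two steps in the abstract above lend themselves to a short illustration: enumerating the nine possible combined genotypes of the two loci, and computing a Pearson chi-square statistic for a genotype-by-mastitis table. The genotype labels come from the abstract, but the counts in the example table are hypothetical, since per-genotype counts are not given, and the function names are invented for this sketch.

```python
from itertools import product


def combined_genotypes(locus1, locus2):
    """All combined genotype labels for two loci, e.g. 'AA' + 'CD' -> 'AACD'."""
    return {first + second for first, second in product(locus1, locus2)}


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a non-zero total")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, (len(row_totals) - 1) * (len(col_totals) - 1)


observed = {"AACC", "AACD", "AADD", "ABCD", "ABDD", "BBCC", "BBCD", "BBDD"}
missing = combined_genotypes(["AA", "AB", "BB"], ["CC", "CD", "DD"]) - observed

# Hypothetical counts: rows are genotypes, columns are (affected, unaffected).
statistic, dof = chi_square_statistic([[12, 8], [5, 15], [9, 11]])
```

The statistic would then be compared with a chi-square distribution on `dof` degrees of freedom; with cell counts as small as some combined genotypes are likely to have in a 94-animal sample, an exact test may be the safer choice.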
Molecular characterization and combined genotype association study of bovine cluster of differentiation 14 gene with clinical mastitis in crossbred dairy cattle.

PubMed

Selvan, A Sakthivel; Gupta, I D; Verma, A; Chaudhari, M V; Magotra, A

2016-07-01

The present study was undertaken to characterize and analyze combined genotypes of the cluster of differentiation 14 (CD14) gene and to explore their association with clinical mastitis in Karan Fries (KF) cows maintained in the National Dairy Research Institute herd, Karnal. Genomic DNA was extracted from the blood of 94 randomly selected lactating KF cattle by the phenol-chloroform method. After checking its quality and quantity, polymerase chain reaction (PCR) was carried out using six sets of reported gene-specific primers to amplify the complete KF CD14 gene. The forward and reverse sequences for each PCR fragment were assembled to form the complete sequence for the respective region of the KF CD14 gene. Multiple sequence alignments of the edited sequence with the corresponding reported Bos taurus reference sequence (EU148610.1) were performed with ClustalW software to identify single nucleotide polymorphisms (SNPs). Basic Local Alignment Search Tool analysis was performed to compare the sequence identity of the KF CD14 gene with other species. Restriction fragment length polymorphism (RFLP) analysis was carried out in all KF cows using the restriction enzymes (REs) Helicobacter pylori 188I (Hpy188I) (contig 2) and Haemophilus influenzae I (HinfI) (contig 4). Cows were assigned genotypes obtained by PCR-RFLP analysis, and the association study was done using the Chi-square (χ2) test. The genotypes of contigs (loci) 2 and 4 were combined for each animal to construct combined genotype patterns. Two types of KF sequences were obtained: one of 2630 bp with one insertion at nucleotide (nt) position 616 and one deletion at nt position 1117, and another of 2629 bp with only one deletion at nt position 615. ClustalW multiple alignment of the KF CD14 gene sequence with the B. taurus cattle sequence (EU148610.1) revealed 24 nt changes (SNPs).
Cows were also screened using PCR-RFLP with the Hpy188I (contig 2) and HinfI (contig 4) REs, each of which revealed three genotypes that differed significantly in mastitis incidence. These two loci can give at most nine combined genotype patterns, of which only eight were observed: AACC, AACD, AADD, ABCD, ABDD, BBCC, BBCD, and BBDD. The combined genotype ABCC was not observed in the studied population of KF cows. Out of 94 animals, those with the AACD combined genotype (10.63%) were found not to be affected by mastitis, and those with the ABDD combined genotype had the highest mastitis incidence, 15.96%. AACD-typed cows were found to be the least susceptible to mastitis compared with the other combined genotypes.
Final Report for LDRD Project 02-ERD-069: Discovering the Unknown Mechanism(s) of Virulence in a BW, Class A Select Agent

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chain, P; Garcia, E

2003-02-06

The goal of this proposed effort was to assess the difficulty in identifying and characterizing virulence candidate genes in an organism for which very limited data exists. This was accomplished by first addressing the finishing phase of draft-sequenced F. tularensis genomes and conducting comparative analyses to determine the coding potential of each genome; to discover the differences in genome structure and content, and to identify potential genes whose products may be involved in the F. tularensis virulence process. The project was divided into three parts: (1) Genome finishing: This part involves determining the order and orientation of the consensus sequences of contigs obtained from Phrap assemblies of random draft genomic sequences. This tedious process consists of linking contig ends using information embedded in each sequence file that relates the sequence to the original cloned insert. Since inserts are sequenced from both ends, we can establish a link between these paired-ends in different contigs and thus order and orient contigs. Since these genomes carry numerous copies of insertion sequences, these repeated elements "confuse" the Phrap assembly program. It is thus necessary to break these contigs apart at the repeated sequences and individually join the proper flanking regions using paired-end information, or using results of comparisons against a similar genome. Larger repeated elements such as the small subunit ribosomal RNA operon require verification with PCR. Tandem repeats require manual intervention and typically rely on single nucleotide polymorphisms to be resolved. Remaining gaps require PCR reactions and sequencing. Once the genomes have been "closed", low quality regions are addressed by resequencing reactions. (2) Genome analysis: The final consensus sequences are processed by combining the results of three gene modelers: Glimmer, Critica and Generation.
The final gene models are submitted to a battery of homology searches and domain prediction programs in order to annotate them (e.g. BLAST, Pfam, TIGRfam, COG, KEGG, InterPro, TMhmm, SignalP). The genome structure is also assessed in terms of G+C content, GC bias (GC skew), and locations of repeated regions (e.g. IS elements) and phage-like genes. (3) Comparative genomics: The results of the various genome analyses are compared between the finished (or almost finished) genomes. Here, we have compared the F. tularensis genomes from the extremely lethal strain Schu4 (subsp. tularensis), the vaccine strain LVS (subsp. holarctica), and strain UT01-4992 of the less virulent, opportunistic subsp. novicida. Regions present in the highly virulent strain that are absent from the other less virulent strains may provide insight into what factors are required for the high level of virulence.
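The G+C content and GC skew assessment mentioned in the genome analysis step can be illustrated with a minimal sketch. The function name `gc_stats`, the window size, and the toy sequence are illustrative assumptions, not part of the pipeline described in the record; GC skew is taken here as the commonly used ratio (G - C)/(G + C) per window.

```python
def gc_stats(seq, window=1000):
    """Return (start, gc_content, gc_skew) for each non-overlapping window.

    gc_content = (G + C) / window length
    gc_skew    = (G - C) / (G + C), or 0.0 if the window has no G or C
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / len(chunk)
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_content, gc_skew))
    return results


# Toy sequence (hypothetical): first window is G-rich, second is C-rich.
toy = "GGGCATAT" + "CCCGATAT"
for start, content, skew in gc_stats(toy, window=8):
    print(start, content, skew)
```

On a real bacterial chromosome, a change in the sign of the windowed skew is often used as a rough indicator of the replication origin or terminus, which is one reason the statistic is reported alongside G+C content.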
Metagenome assembly through clustering of next-generation sequencing data using protein sequences.

PubMed

Sim, Mikang; Kim, Jaebum

2015-02-01

The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and how they do so can be understood by investigating metagenomes. However, the complexity of metagenomes, such as the mixture of multiple microbes with differing species abundances, makes metagenome assembly challenging. In this paper, we developed a new metagenome assembly method by utilizing protein sequences, in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences, as a guide for the assembly, is promising. Copyright © 2015 Elsevier B.V. All rights reserved.
Asset Mapping: A Tool to Enhance Your CSPAP Efforts

ERIC Educational Resources Information Center

Allar, Ishonté; Bulger, Sean

2018-01-01

Comprehensive school physical activity programs (CSPAPs) are one way to help students achieve most, if not all, of the recommended 60 minutes of daily moderate-to-vigorous physical activity (MVPA). Early in the process, one can use asset mapping to help enhance CSPAP efforts. Asset maps provide a valuable opportunity to identify potential partners…
A third-generation microsatellite-based linkage map of the honey bee, Apis mellifera, and its comparison with the sequence-based physical map.

PubMed

Solignac, Michel; Mougel, Florence; Vautrin, Dominique; Monnerot, Monique; Cornuet, Jean-Marie

2007-01-01

The honey bee is a key model for social behavior and this feature led to the selection of the species for genome sequencing. A genetic map is a necessary companion to the sequence. In addition, because there was originally no physical map for the honey bee genome project, a meiotic map was the only resource for organizing the sequence assembly on the chromosomes. We present the genetic (meiotic) map here and describe the main features that emerged from comparison with the sequence-based physical map. The genetic map of the honey bee is saturated and the chromosomes are oriented from the centromeric to the telomeric regions. The map is based on 2,008 markers and is about 40 Morgans (M) long, resulting in a marker density of one every 2.05 centiMorgans (cM). For the 186 megabases (Mb) of the genome mapped and assembled, this corresponds to a very high average recombination rate of 22.04 cM/Mb. Honey bee meiosis shows a relatively homogeneous recombination rate along and across chromosomes, as well as within and between individuals. Interference is higher than inferred from the Kosambi function of distance. In addition, numerous recombination hotspots are dispersed over the genome. The very large genetic length of the honey bee genome, its small physical size and an almost complete genome sequence with a relatively low number of genes suggest a very promising future for association mapping in the honey bee, particularly as the existence of haploid males allows easy bulk segregant analysis.
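The relationship between genetic length, physical length, and average recombination rate quoted above can be checked with a short sketch. The function names are illustrative, and the 4,100 cM figure is an assumption back-calculated from the reported rate (the record only says "about 40 Morgans"). The Kosambi function shown is the standard textbook form, d = 25 ln((1 + 2r)/(1 - 2r)) in cM, included only to show what distance it predicts for a given recombination fraction.

```python
import math


def recombination_rate(genetic_length_cm, physical_length_mb):
    """Average recombination rate in cM per Mb."""
    return genetic_length_cm / physical_length_mb


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


# Genetic length implied by the reported 22.04 cM/Mb over 186 Mb.
implied_cm = 22.04 * 186
print(round(implied_cm))                         # close to 4,100 cM, i.e. about 41 Morgans

# Assumed map length of 4,100 cM reproduces the reported rate.
print(round(recombination_rate(4100, 186), 2))   # 22.04

# Kosambi distance for a 10% recombination fraction.
print(round(kosambi_cm(0.10), 2))                # 10.14
```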
PeanutDB: an integrated bioinformatics web portal for Arachis hypogaea transcriptomics

PubMed Central

2012-01-01

Background The peanut (Arachis hypogaea) is an important crop cultivated worldwide for oil production and food sources. Its complex genetic architecture (e.g., the large and tetraploid genome possibly due to a unique cross of wild diploid relatives and subsequent chromosome duplication: 2n = 4x = 40, AABB, 2800 Mb) presents a major challenge for its genome sequencing and makes it a less-studied crop. Without a doubt, transcriptome sequencing is the most effective way to harness the genome structure and gene expression dynamics of this non-model species that has limited genomic resources. Description With the development of next generation sequencing technologies such as 454 pyro-sequencing and Illumina sequencing by synthesis, peanut transcriptomics data are rapidly accumulating in both public databases and the private sector. Integrating 187,636 Sanger reads (103,685,419 bases), 1,165,168 Roche 454 reads (333,862,593 bases) and 57,135,995 Illumina reads (4,073,740,115 bases), we generated the first release of our peanut transcriptome assembly that contains 32,619 contigs. We provided EC, KEGG and GO functional annotations to these contigs and detected SSRs, SNPs and other genetic polymorphisms for each contig. Based on both open-source and our in-house tools, PeanutDB presents many seamlessly integrated web interfaces that allow users to search, filter, navigate and visualize easily the whole transcript assembly, its annotations and detected polymorphisms and simple sequence repeats. For each contig, sequence alignment is presented in both bird’s-eye view and nucleotide level resolution, with colorfully highlighted regions of mismatches, indels and repeats that facilitate close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors. 
Conclusion As a public genomic database that integrates peanut transcriptome data from different sources, PeanutDB (http://bioinfolab.muohio.edu/txid3818v1) provides the Peanut research community with an easy-to-use web portal that will definitely facilitate genomics research and molecular breeding in this less-studied crop. PMID:22712730
Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa.

PubMed

Shahin, Arwa; van Kaauwen, Martijn; Esselink, Danny; Bargsten, Joachim W; van Tuyl, Jaap M; Visser, Richard G F; Arens, Paul

2012-11-20

Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, hardly any genetic studies have been performed and genomic resources are lacking. To build genomic resources and develop tools to speed up the breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. We developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of lily and 39% of tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice as abundant in UTRs (UnTranslated Regions) as in coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2- to 3-fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs: lily and tulip (12,017 groups) and among the three monocot species: lily, tulip, and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSRs. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. 
Two transcriptome sets were built that are valuable resources for marker development, comparative genomic studies and candidate gene approaches. Next generation sequencing of leaf transcriptome is very effective; however, deeper sequencing and using more tissues and stages is advisable for extended comparative studies.
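The two SNP marker sets described above differ only in how many other SNPs are tolerated within 50 bp of the target. A minimal sketch of such a filter follows; the function name `select_snps`, its parameters, and the toy positions are hypothetical, and the sketch ignores other criteria a real marker design would apply (for example, distance to contig ends or read depth). Positions are assumed to be unique coordinates on a single contig.

```python
from bisect import bisect_left, bisect_right


def select_snps(positions, flank=50, max_flanking=0):
    """Return SNP positions that have at most `max_flanking` other SNPs
    within `flank` bp on either side (inclusive)."""
    pos = sorted(positions)
    selected = []
    for p in pos:
        lo = bisect_left(pos, p - flank)
        hi = bisect_right(pos, p + flank)
        neighbours = hi - lo - 1  # SNPs in the window, excluding the target itself
        if neighbours <= max_flanking:
            selected.append(p)
    return selected


# Hypothetical SNP coordinates on one contig.
contig_snps = [100, 130, 400, 900, 940, 960]
print(select_snps(contig_snps, max_flanking=0))  # [400]
print(select_snps(contig_snps, max_flanking=1))  # [100, 130, 400, 900, 960]
```

Relaxing the threshold from zero to one flanking SNP enlarges the candidate set, which is consistent in direction with the increase in marker numbers reported for the second set.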
KONAGAbase: a genomic and transcriptomic database for the diamondback moth, Plutella xylostella.

PubMed

Jouraku, Akiya; Yamamoto, Kimiko; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Narukawa, Junko; Miyamoto, Kazuhisa; Kurita, Kanako; Kanamori, Hiroyuki; Katayose, Yuichi; Matsumoto, Takashi; Noda, Hiroaki

2013-07-09

The diamondback moth (DBM), Plutella xylostella, is one of the most harmful insect pests for crucifer crops worldwide. DBM has rapidly evolved high resistance to most conventional insecticides such as pyrethroids, organophosphates, fipronil, spinosad, Bacillus thuringiensis, and diamides. Therefore, it is important to develop genomic and transcriptomic DBM resources for analysis of genes related to insecticide resistance, both to clarify the mechanism of resistance of DBM and to facilitate the development of insecticides with a novel mode of action for more effective and environmentally less harmful insecticide rotation. To contribute to this goal, we developed KONAGAbase, a genomic and transcriptomic database for DBM (KONAGA is the Japanese word for DBM). KONAGAbase provides (1) transcriptomic sequences of 37,340 ESTs/mRNAs and 147,370 RNA-seq contigs which were clustered and assembled into 84,570 unigenes (30,695 contigs, 50,548 pseudo singletons, and 3,327 singletons); and (2) genomic sequences of 88,530 WGS contigs with 246,244 degenerate contigs and 106,455 singletons from which 6,310 de novo identified repeat sequences and 34,890 predicted gene-coding sequences were extracted. The unigenes and predicted gene-coding sequences were clustered and 32,800 representative sequences were extracted as a comprehensive putative gene set. These sequences were annotated with BLAST descriptions, Gene Ontology (GO) terms, and Pfam descriptions, respectively. KONAGAbase contains rich graphical user interface (GUI)-based web interfaces for easy and efficient searching, browsing, and downloading sequences and annotation data. Five useful search interfaces consisting of BLAST search, keyword search, BLAST result-based search, GO tree-based search, and genome browser are provided. KONAGAbase is publicly available from our website (http://dbm.dna.affrc.go.jp/px/) through standard web browsers. 
KONAGAbase provides DBM comprehensive transcriptomic and draft genomic sequences with useful annotation information with easy-to-use web interfaces, which helps researchers to efficiently search for target sequences such as insect resistance-related genes. KONAGAbase will be continuously updated and additional genomic/transcriptomic resources and analysis tools will be provided for further efficient analysis of the mechanism of insecticide resistance and the development of effective insecticides with a novel mode of action for DBM.
Characterization of the transcriptome of an ecologically important avian species, the Vinous-throated Parrotbill Paradoxornis webbianus bulomachus (Paradoxornithidae; Aves)

PubMed Central

2012-01-01

Background Adaptive divergence driven by environmental heterogeneity has long been a fascinating topic in ecology and evolutionary biology. The study of the genetic basis of adaptive divergence has, however, been greatly hampered by a lack of genomic information. The recent development of transcriptome sequencing provides an unprecedented opportunity to generate large amounts of genomic data for detailed investigations of the genetics of adaptive divergence in non-model organisms. Herein, we used the Illumina sequencing platform to sequence the transcriptome of brain and liver tissues from a single individual of the Vinous-throated Parrotbill, Paradoxornis webbianus bulomachus, an ecologically important avian species in Taiwan with a wide elevational range of sea level to 3100 m. Results Our 10.1 Gbp of sequences were first assembled based on Zebra Finch (Taeniopygia guttata) and chicken (Gallus gallus) RNA references. The remaining reads were then de novo assembled. After filtering out contigs with low coverage (<10X), we retained 67,791 of 487,336 contigs, which covered approximately 5.3% of the P. w. bulomachus genome. Of 7,779 contigs retained for a top-hit species distribution analysis, the majority (about 86%) were matched to known Zebra Finch and chicken transcripts. We also annotated 6,365 contigs to gene ontology (GO) terms: in total, 122 GO-slim terms were assigned, including biological process (41%), molecular function (32%), and cellular component (27%). Many potential genetic markers for future adaptive genomic studies were also identified: 8,589 single nucleotide polymorphisms, 1,344 simple sequence repeats and 109 candidate genes that might be involved in elevational or climate adaptation. Conclusions Our study shows that transcriptome data can serve as a rich genetic resource, even for a single run of short-read sequencing from a single individual of a non-model species. 
This is the first study providing transcriptomic information for species in the avian superfamily Sylvioidea, which comprises more than 1,000 species. Our data can be used to study adaptive divergence in heterogeneous environments and investigate other important ecological and evolutionary questions in parrotbills from different populations and even in other species in the Sylvioidea. PMID:22530590

RNA-Seq reveals genotype-specific molecular responses to water deficit in eucalyptus

PubMed Central

2011-01-01

Background In a context of climate change, phenotypic plasticity provides long-lived species, such as trees, with the means to adapt to environmental variations occurring within a single generation. In eucalyptus plantations, water availability is a key factor limiting productivity. However, the molecular mechanisms underlying the adaptation of eucalyptus to water shortage remain unclear. In this study, we compared the molecular responses of two commercial eucalyptus hybrids during the dry season. Both hybrids differ in productivity when grown under water deficit. Results Pyrosequencing of RNA extracted from shoot apices provided extensive transcriptome coverage - a catalog of 129,993 unigenes (49,748 contigs and 80,245 singletons) was generated from 398 million base pairs, or 1.14 million reads. The pyrosequencing data considerably enriched existing Eucalyptus EST collections, adding 36,985 unigenes not previously represented. Digital analysis of read abundance in 14,460 contigs identified 1,280 that were differentially expressed between the two genotypes, 155 contigs showing differential expression between treatments (irrigated vs. non irrigated conditions during the dry season), and 274 contigs with significant genotype-by-treatment interaction. The more productive genotype displayed a larger set of genes responding to water stress. Moreover, stress signal transduction seemed to involve different pathways in the two genotypes, suggesting that water shortage induces distinct cellular stress cascades. Similarly, the response of functional proteins also varied widely between genotypes: the most productive genotype decreased expression of genes related to photosystem, transport and secondary metabolism, whereas genes related to primary metabolism and cell organisation were over-expressed. 
Conclusions For the most productive genotype, the ability to express a broader set of genes in response to water availability appears to be a key characteristic in the maintenance of biomass growth during the dry season. Its strategy may involve a decrease of photosynthetic activity during the dry season associated with resources reallocation through major changes in the expression of primary metabolism associated genes. Further efforts will be needed to assess the adaptive nature of the genes highlighted in this study. PMID:22047139
Next-generation transcriptome sequencing, SNP discovery and validation in four market classes of peanut, Arachis hypogaea L.

PubMed

Chopra, Ratan; Burow, Gloria; Farmer, Andrew; Mudge, Joann; Simpson, Charles E; Wilkins, Thea A; Baring, Michael R; Puppala, Naveen; Chamberlin, Kelly D; Burow, Mark D

2015-06-01

Single-nucleotide polymorphisms, which can be identified in the thousands or millions from comparisons of transcriptome or genome sequences, are ideally suited for making high-resolution genetic maps, investigating population evolutionary history, and discovering marker-trait linkages. Despite significant results from their use in human genetics, progress in identification and use in plants, and particularly polyploid plants, has lagged. As part of a long-term project to identify and use SNPs suitable for these purposes in cultivated peanut, which is tetraploid, we generated transcriptome sequences of four peanut cultivars, namely OLin, New Mexico Valencia C, Tamrun OL07 and Jupiter, which represent the four major market classes of peanut grown in the world, and which are important economically to the US southwest peanut growing region. CopyDNA libraries of each genotype were used to generate 2 × 54 paired-end reads using an Illumina GAIIx sequencer. Raw reads were mapped to a custom reference consisting of Tifrunner 454 sequences plus peanut ESTs in GenBank, comprising 43,108 contigs; 263,840 SNP and indel variants were identified among four genotypes compared to the reference. A subset of 6 variants was assayed across 24 genotypes representing four market types using KASP chemistry to assess the criteria for SNP selection. Results demonstrated that transcriptome sequencing can identify SNPs usable as selectable DNA-based markers in complex polyploid species such as peanut. Criteria for effective use of SNPs as markers are discussed in this context.
Delimitation of duplicated segments and identification of their parental origin in two partial chromosome 3p duplications.

PubMed

Antonini, Sylvie; Kim, Chong A; Sugayama, Sofia M; Vianna-Morgante, Angela M

2002-11-22

Two chromosome 3 short arm duplications identified through G-banding were further investigated using fluorescence in situ hybridization (FISH) and polymerase chain reaction (PCR) of microsatellite markers, aiming at mapping breakpoints and disclosing mechanisms of origin of these chromosome aberrations. Patient 1 was found to be a mosaic: a 3p12 --> 3p21 duplication was observed in most of his cells, and a normal cell line occurred with a frequency of about 3% in blood. In situ hybridization of chromosome 3 short- and long-arm libraries confirmed the short-arm duplication. Using FISH of short-arm sequences, the YAC 961_h_3 was shown to contain the proximal breakpoint (3p12.1 or 3p12.2), and the distal breakpoint was located between the YACs 729_c_3 and 806_h_2, which are adjacent in the WC 3.10 contig (3p21.1). In Patient 2, G-banding indicated a 3p21 --> 3p24 duplication, without mosaicism. In situ hybridization of chromosome 3 short- and long-arm libraries confirmed the duplication of short-arm sequences. FISH of chromosome 3 sequences showed that the YAC 749_a_7 spanned the proximal breakpoint (3p21.33). The distal breakpoint mapped to the interval between YACs 932_b_6 (3p24.3) and 909_b_6 (3p25). In both cases, microsatellite genotyping pointed to a rearrangement between paternal sister chromatids. Copyright 2002 Wiley-Liss, Inc.
SfiI genomic cleavage map of Escherichia coli K-12 strain MG1655.

PubMed Central

Perkins, J D; Heath, J D; Sharma, B R; Weinstock, G M

1992-01-01

An SfiI restriction map of Escherichia coli K-12 strain MG1655 is presented. The map contains thirty-one cleavage sites separating fragments ranging in size from 407 kb to 3.7 kb. Several techniques were used in the construction of this map, including CHEF pulsed field gel electrophoresis; physical analysis of a set of twenty-six auxotrophic transposon insertions; correlation with the restriction map of Kohara and coworkers using the commercially available E. coli Gene Mapping Membranes; analysis of publicly available sequence information; and correlation of the above data with the combined genetic and physical map developed by Rudd, et al. The combination of these techniques has yielded a map in which all but one site can be localized within a range of +/- 2 kb, and over half the sites can be localized precisely by sequence data. Two sites present in the EcoSeq5 sequence database are not cleaved in MG1655 and four sites are noted to be sensitive to methylation by the dcm methylase. This map, combined with the NotI physical map of MG1655, can aid in the rapid, precise mapping of several different types of genetic alterations, including transposon mediated mutations and other insertions, inversions, deletions and duplications. PMID:1312707
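Where sequence data are available, candidate SfiI sites can be located in silico and turned into predicted fragment sizes, which is the kind of sequence-based localization the record alludes to. The sketch below is only an illustration under stated assumptions: it uses the SfiI recognition pattern GGCCNNNNNGGCC, treats the molecule as circular (so n sites give n fragments), reports site start coordinates rather than exact cut positions, does not search across the origin, and ignores methylation sensitivity. The function names and the toy sequence are hypothetical.

```python
import re

# SfiI recognition pattern GGCCNNNNNGGCC; the lookahead allows overlapping sites.
SFII = re.compile(r"(?=GGCC[ACGT]{5}GGCC)")


def sfii_sites(seq):
    """Return 0-based start coordinates of SfiI recognition sites."""
    return [m.start() for m in SFII.finditer(seq.upper())]


def circular_fragments(site_positions, genome_length):
    """Fragment lengths for a circular molecule cut at every listed site."""
    sites = sorted(site_positions)
    if not sites:
        return [genome_length]  # uncut circle
    frags = [b - a for a, b in zip(sites, sites[1:])]
    frags.append(genome_length - sites[-1] + sites[0])  # fragment spanning the origin
    return frags


# Toy circular sequence with two sites (hypothetical, 66 bp).
toy = "AT" * 5 + "GGCCATATAGGCC" + "AT" * 10 + "GGCCTTTTTGGCC" + "AT" * 5
sites = sfii_sites(toy)
print(sites)                                # [10, 43]
print(circular_fragments(sites, len(toy)))  # [33, 33]
```

Because fragment lengths are differences between consecutive sites, using the site start instead of the exact cut position does not change the predicted sizes.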
A second generation integrated map of the rainbow trout (Oncorhynchus mykiss) genome: analysis of synteny with model fish genomes

USDA-ARS's Scientific Manuscript database

In this paper we generated DNA fingerprints and end sequences from bacterial artificial chromosomes (BACs) from two new libraries to improve the first generation integrated physical and genetic map of the rainbow trout (Oncorhynchus mykiss) genome. The current version of the physical map is compose...
Application of Intervention Mapping to the Development of a Complex Physical Therapist Intervention.

PubMed

Jones, Taryn M; Dear, Blake F; Hush, Julia M; Titov, Nickolai; Dean, Catherine M

2016-12-01

Physical therapist interventions, such as those designed to change physical activity behavior, are often complex and multifaceted. In order to facilitate rigorous evaluation and implementation of these complex interventions into clinical practice, the development process must be comprehensive, systematic, and transparent, with a sound theoretical basis. Intervention Mapping is designed to guide an iterative and problem-focused approach to the development of complex interventions. The purpose of this case report is to demonstrate the application of an Intervention Mapping approach to the development of a complex physical therapist intervention, a remote self-management program aimed at increasing physical activity after acquired brain injury. Intervention Mapping consists of 6 steps to guide the development of complex interventions: (1) needs assessment; (2) identification of outcomes, performance objectives, and change objectives; (3) selection of theory-based intervention methods and practical applications; (4) organization of methods and applications into an intervention program; (5) creation of an implementation plan; and (6) generation of an evaluation plan. The rationale and detailed description of this process are presented using an example of the development of a novel and complex physical therapist intervention, myMoves-a program designed to help individuals with an acquired brain injury to change their physical activity behavior. The Intervention Mapping framework may be useful in the development of complex physical therapist interventions, ensuring the development is comprehensive, systematic, and thorough, with a sound theoretical basis. This process facilitates translation into clinical practice and allows for greater confidence and transparency when the program efficacy is investigated. © 2016 American Physical Therapy Association.
The Effectiveness of Concept Maps in Teaching Physics Concepts Applied to Engineering Education: Experimental Comparison of the Amount of Learning Achieved with and without Concept Maps

ERIC Educational Resources Information Center

Martinez, Guadalupe; Perez, Angel Luis; Suero, Maria Isabel; Pardo, Pedro J.

2013-01-01

A study was conducted to quantify the effectiveness of concept maps in learning physics in engineering degrees. The following research question was posed: What was the difference in learning results from the use of concept maps to study a particular topic in an engineering course? The study design was quasi-experimental and used a post-test as a…
De novo assembly and annotation of the Antarctic copepod (Tigriopus kingsejongensis) transcriptome.

PubMed

Kim, Hui-Su; Lee, Bo-Young; Han, Jeonghoon; Lee, Young Hwan; Min, Gi-Sik; Kim, Sanghee; Lee, Jae-Seong

2016-08-01

The whole transcriptome of the Antarctic copepod (Tigriopus kingsejongensis) was sequenced using Illumina RNA-seq. De novo assembly was performed with 64,785,098 raw reads using Trinity, which assembled into 81,653 contigs. TransDecoder found 38,250 candidate coding contigs which showed homology to other species by BLAST analysis. Functional gene annotation was performed by Gene Ontology (GO), InterProScan, and KEGG pathway analyses. Finally, we compiled a catalog of expressed genes for T. kingsejongensis, a useful model animal for gene information-based polar research into the molecular mechanisms of adaptation to harsh environments. In particular, direct comparison at the transcriptome level indicated more highly developed lipid metabolism in T. kingsejongensis than in the Far East Pacific coast copepod Tigriopus japonicus. Copyright © 2016 Elsevier B.V. All rights reserved.
[Phylogenetic analysis of genomes of Vibrio cholerae strains isolated on the territory of Rostov region].

PubMed

Kuleshov, K V; Markelov, M L; Dedkov, V G; Vodop'ianov, A S; Kermanov, A V; Pisanov, R V; Kruglikov, V D; Mazrukho, A B; Maleev, V V; Shipulin, G A

2013-01-01

Determination of the origin of 2 Vibrio cholerae strains isolated on the territory of Rostov region by using full genome sequencing data. Toxigenic strain 2011EL-301 V. cholerae O1 El Tor Inaba No. 301 (ctxAB+, tcpA+) and nontoxigenic strain V. cholerae O1 Ogawa P-18785 (ctxAB-, tcpA+) were studied. Sequencing was carried out on the MiSeq platform. Phylogenetic analysis of the genomes obtained was based on comparison of the conserved part of the studied genomes and 54 previously sequenced genomes. The genome of strain 2011EL-301 was represented by 164 contigs with an average coverage of 100 and an N50 of 132 kb; that of strain P-18785 by 159 contigs with a coverage of 69 and an N50 of 83 kb. The contigs obtained for strain 2011EL-301 were deposited in the DDBJ/EMBL/GenBank databases under accession number AJFN02000000, and those for strain P-18785 under ANHS00000000. 716 protein-coding orthologous genes were detected. Based on phylogenetic analysis, strain P-18785 belongs to the PG-1 subgroup (a group of predecessor strains of the 7th pandemic). Strain 2011EL-301 belongs to the group of 7th-pandemic strains and falls into a cluster with later isolates that are associated with cases of cholera in South Africa and cases of import of cholera to the USA from Pakistan. The data obtained allow phylogenetic connections to be established with V. cholerae strains isolated earlier.
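The N50 values quoted for the two assemblies follow the usual definition: the contig length at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch is given below; the function name `n50` and the toy contig lengths are illustrative and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # cumulative length has reached half the assembly
            return length
    return 0


# Toy assembly: total 300, half is 150; 100 + 80 = 180 reaches it at the 80 bp contig.
print(n50([100, 80, 60, 40, 20]))  # 80
```

Because N50 depends only on the length distribution, it says nothing about correctness; an assembly with misjoined contigs can report a higher N50 than a more accurate one.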
A comprehensive evaluation of assembly scaffolding tools

PubMed Central

2014-01-01

Background Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics. Results Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behaviour of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remain unidentified when using real data. Conclusions The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity. PMID:24581555
Evidence for retrovirus and paramyxovirus infection of multiple bat species in China.

PubMed

Yuan, Lihong; Li, Min; Li, Linmiao; Monagin, Corina; Chmura, Aleksei A; Schneider, Bradley S; Epstein, Jonathan H; Mei, Xiaolin; Shi, Zhengli; Daszak, Peter; Chen, Jinping

2014-05-16

Bats are recognized reservoirs for many emerging zoonotic viruses of public health importance. Identifying and cataloguing the viruses of bats is a logical approach to evaluate the range of potential zoonoses of bat origin. We characterized the fecal pathogen microbiome of both insectivorous and frugivorous bats, incorporating 281 individual bats comprising 20 common species, which were sampled in three locations of Yunnan province, by combining reverse transcription polymerase chain reaction (RT-PCR) assays and next-generation sequencing. Seven individual bats were paramyxovirus-positive by RT-PCR using degenerate primers, and these paramyxoviruses were mainly classified into three genera (Rubulavirus, Henipavirus and Jeilongvirus). Various additional novel pathogens were detected in the paramyxovirus-positive bats using Illumina sequencing. A total of 7066 assembled contigs (≥200 bp) were constructed, and 105 contigs matched eukaryotic viruses (of them 103 belong to 2 vertebrate virus families, 1 insect virus, and 1 mycovirus), 17 were parasites, and 4913 were homologous to prokaryotic microorganisms. Among the 103 vertebrate viral contigs, 79 displayed low identity (<70%) to known viruses including human viruses at the amino acid level, suggesting that these belong to novel and genetically divergent viruses. Overall, the most frequently identified viruses, particularly in bats from the family Hipposideridae, were retroviruses. The present study expands our understanding of the bat virome in species commonly found in Yunnan, China, and provides insight into the overall diversity of viruses that may be capable of directly or indirectly crossing over into humans.
Evidence for Retrovirus and Paramyxovirus Infection of Multiple Bat Species in China

PubMed Central

Yuan, Lihong; Li, Min; Li, Linmiao; Monagin, Corina; Chmura, Aleksei A.; Schneider, Bradley S.; Epstein, Jonathan H.; Mei, Xiaolin; Shi, Zhengli; Daszak, Peter; Chen, Jinping

2014-01-01

Bats are recognized reservoirs for many emerging zoonotic viruses of public health importance. Identifying and cataloguing the viruses of bats is a logical approach to evaluate the range of potential zoonoses of bat origin. We characterized the fecal pathogen microbiome of both insectivorous and frugivorous bats, incorporating 281 individual bats comprising 20 common species, which were sampled in three locations of Yunnan province, by combining reverse transcription polymerase chain reaction (RT-PCR) assays and next-generation sequencing. Seven individual bats were paramyxovirus-positive by RT-PCR using degenerate primers, and these paramyxoviruses were mainly classified into three genera (Rubulavirus, Henipavirus and Jeilongvirus). Various additional novel pathogens were detected in the paramyxovirus-positive bats using Illumina sequencing. A total of 7066 assembled contigs (≥200 bp) were constructed, and 105 contigs matched eukaryotic viruses (of them 103 belong to 2 vertebrate virus families, 1 insect virus, and 1 mycovirus), 17 were parasites, and 4913 were homologous to prokaryotic microorganisms. Among the 103 vertebrate viral contigs, 79 displayed low identity (<70%) to known viruses including human viruses at the amino acid level, suggesting that these belong to novel and genetically divergent viruses. Overall, the most frequently identified viruses, particularly in bats from the family Hipposideridae, were retroviruses. The present study expands our understanding of the bat virome in species commonly found in Yunnan, China, and provides insight into the overall diversity of viruses that may be capable of directly or indirectly crossing over into humans. PMID:24841387
Analysis of differentially expressed genes in two immunologically distinct strains of Eimeria maxima using suppression subtractive hybridization and dot-blot hybridization

PubMed Central

2014-01-01

Background It is well known that different Eimeria maxima strains exhibit significant antigenic variation. However, the genetic basis of these phenotypes remains unclear. Methods Total RNA and mRNA were isolated from unsporulated oocysts of E. maxima strains SH and NT, which were found to have significant differences in immunogenicity in our previous research. Two subtractive cDNA libraries were constructed using suppression subtractive hybridization (SSH) and specific genes were further analyzed by dot-blot hybridization and qRT-PCR analysis. Results A total of 561 clones were selected from both cDNA libraries and the length of the inserted fragments was 0.25–1.0 kb. Dot-blot hybridization revealed a total of 86 differentially expressed clones (63 from strain SH and 23 from strain NT). Nucleotide sequencing analysis of these clones revealed ten specific contigs (six from strain SH and four from strain NT). Further analysis found that six contigs from strain SH and three from strain NT shared significant identities with previously reported proteins, and one contig was presumed to be novel. The specific differentially expressed genes were finally verified by RT-PCR and qRT-PCR analyses. Conclusions The data presented here suggest that specific genes identified between the two strains may be important molecules in the immunogenicity of E. maxima that may present potential new drug targets or vaccine candidates for coccidiosis. PMID:24894832
Transcript expression plasticity as a response to alternative larval host plants in the speciation process of corn and rice strains of Spodoptera frugiperda.

PubMed

Silva-Brandão, Karina Lucas; Horikoshi, Renato Jun; Bernardi, Daniel; Omoto, Celso; Figueira, Antonio; Brandão, Marcelo Mendes

2017-10-16

Our main purpose was to evaluate the expression of plastic and evolved genes involved in ecological speciation in the noctuid moth Spodoptera frugiperda, the fall armyworm (FAW); and to demonstrate how host plants might influence lineage differentiation in this polyphagous insect. FAW is an important pest of several crops worldwide, and it is differentiated into host plant-related strains, corn (CS) and rice strains (RS). RNA-Seq and transcriptome characterization were applied to evaluate unbiased genetic expression differences in larvae from the two strains, fed on primary (corn) and alternative (rice) host plants. We consider that genes that are differently regulated by the same FAW strain, as a response to different hosts, are "plastic". Otherwise, differences in gene expression between the two strains fed on the same host are considered constitutive differences. Individual performance parameters (larval and pupal weight) varied among conditions (strains vs. hosts). A total of 3657 contigs was related to plastic response, and 2395 contigs were differentially regulated in the two strains feeding on preferential and alternative hosts (constitutive contigs). Three molecular functions were present in all comparisons, both down- and up-regulated: oxidoreductase activity, metal-ion binding, and hydrolase activity. Metabolization of foreign chemicals is among the key functions involved in the phenotypic variation of FAW strains. From an agricultural perspective, high plasticity in families of detoxifying genes indicates the capacity for a rapid response to control compounds such as insecticides.
Physical-enhanced secure strategy in an OFDM-PON.

PubMed

Zhang, Lijia; Xin, Xiangjun; Liu, Bo; Yu, Jianjun

2012-01-30

The physical layer of an optical access network is vulnerable to various attacks. With the dramatic increase in users and network capacity, physical-layer security is becoming more and more important. This paper proposes a physical-enhanced secure strategy for orthogonal frequency division multiplexing passive optical networks (OFDM-PON) that employs frequency-domain chaos scrambling. The Logistic map is adopted for the chaos mapping. The chaos scrambling strategy can dynamically allocate the scrambling matrices for different OFDM frames according to the initial condition, which enhances the confidentiality of the physical layer. A mathematical model of this secure system is first derived, which achieves secure transmission at the physical layer in OFDM-PON. Results from an experimental implementation using Logistic-mapped chaos scrambling are also given to further demonstrate the efficiency of this secure strategy. A 10.125 Gb/s 64QAM-OFDM signal with Logistic-mapped chaos scrambling is successfully transmitted over 25 km of single-mode fiber (SMF), and the experimental results show that the proposed security scheme can protect the system from eavesdroppers and attackers while maintaining good performance for the legal ONU.
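The idea of deriving a frame-specific scrambling from a Logistic-map initial condition can be illustrated with a minimal sketch. The code below is not the authors' scheme: it assumes the scrambling matrix is a simple permutation of subcarrier positions obtained by ranking a chaotic sequence, and the function names, the parameter r = 3.99, and the burn-in length are all hypothetical choices made for illustration.

```python
def logistic_sequence(x0, n, r=3.99, burn_in=100):
    """Iterate x_{k+1} = r * x_k * (1 - x_k); return n values after a burn-in."""
    x = x0
    for _ in range(burn_in):          # discard the transient (assumed length)
        x = r * x * (1.0 - x)
    values = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def scrambling_permutation(x0, n_subcarriers):
    """Rank the chaotic sequence to obtain a permutation of subcarrier indices."""
    seq = logistic_sequence(x0, n_subcarriers)
    return sorted(range(n_subcarriers), key=lambda i: seq[i])


def scramble(symbols, perm):
    """Place symbol perm[k] at position k (equivalent to a permutation matrix)."""
    return [symbols[i] for i in perm]


def descramble(scrambled, perm):
    """Invert the permutation; requires the same initial condition (the key)."""
    restored = [None] * len(perm)
    for position, source_index in enumerate(perm):
        restored[source_index] = scrambled[position]
    return restored


if __name__ == "__main__":
    frame = [complex(k, -k) for k in range(8)]   # stand-in for QAM symbols
    key = 0.3141                                  # hypothetical initial condition
    perm = scrambling_permutation(key, len(frame))
    assert descramble(scramble(frame, perm), perm) == frame
```

A receiver that knows the initial condition regenerates the same permutation and restores the frame; changing the initial condition per frame yields a different permutation for each frame, which is the property the abstract attributes to the scheme.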
Transcriptome Analysis of an Insecticide Resistant Housefly Strain: Insights about SNPs and Regulatory Elements in Cytochrome P450 Genes

PubMed Central

Asp, Torben; Kristensen, Michael

2016-01-01

Background Insecticide resistance in the housefly, Musca domestica, has been investigated for more than 60 years. It will enter a new era after the recent publication of the housefly genome and the development of multiple next generation sequencing technologies. The genetic background of the xenobiotic response can now be investigated in greater detail. Here, we investigate the 454-pyrosequencing transcriptome of the spinosad-resistant 791spin strain in relation to the housefly genome with focus on P450 genes. Results The de novo assembly of clean reads gave 35,834 contigs consisting of 21,780 sequences of the spinosad resistant strain. A total of 3,648 sequences were annotated with an enzyme code (EC) number and were mapped to 124 KEGG pathways, with metabolic processes as the most highly represented pathway. One hundred and twenty contigs were annotated as P450s covering 44 different P450 genes of housefly. Eight differentially expressed P450 genes were identified and investigated for SNPs, CpG islands and common regulatory motifs in promoter and coding regions. Functional annotation clustering of metabolic related genes and motif analysis of P450s revealed their association with epigenetic, transcription and gene expression related functions. The sequence variation analysis resulted in 12 SNPs, eight of which were found in cyp6d1. There is variation in location, size and frequency of CpG islands, and specific motifs were also identified in these P450s. Moreover, identified motifs were associated to GO terms and transcription factors using bioinformatic tools. Conclusion Transcriptome data of a spinosad resistant strain provide, together with genome data, fundamental support for future research to understand evolution of resistance in houseflies. Here, we report for the first time the SNPs, CpG islands and common regulatory motifs in differentially expressed P450s. Taken together, our findings will serve as a stepping stone to advance understanding of the mechanism and role of P450s in xenobiotic detoxification. PMID:27019205
De novo transcriptome sequencing of the Octopus vulgaris hemocytes using Illumina RNA-Seq technology: response to the infection by the gastrointestinal parasite Aggregata octopiana.

PubMed

Castellanos-Martínez, Sheila; Arteta, David; Catarino, Susana; Gestal, Camino

2014-01-01

Octopus vulgaris is a highly valuable species of great commercial interest and excellent candidate for aquaculture diversification; however, the octopus' well-being is impaired by pathogens, of which the gastrointestinal coccidian parasite Aggregata octopiana is one of the most important. The knowledge of the molecular mechanisms of the immune response in cephalopods, especially in octopus is scarce. The transcriptome of the hemocytes of O. vulgaris was de novo sequenced using the high-throughput paired-end Illumina technology to identify genes involved in immune defense and to understand the molecular basis of octopus tolerance/resistance to coccidiosis. A bi-directional mRNA library was constructed from hemocytes of two groups of octopus according to the infection by A. octopiana, sick octopus, suffering coccidiosis, and healthy octopus, and reads were de novo assembled together. The differential expression of transcripts was analysed using the general assembly as a reference for mapping the reads from each condition. After sequencing, a total of 75,571,280 high quality reads were obtained from the sick octopus group and 74,731,646 from the healthy group. The general transcriptome of the O. vulgaris hemocytes was assembled in 254,506 contigs. A total of 48,225 contigs were successfully identified, and 538 transcripts exhibited differential expression between groups of infection. The general transcriptome revealed genes involved in pathways like NF-kB, TLR and Complement. Differential expression of TLR-2, PGRP, C1q and PRDX genes due to infection was validated using RT-qPCR. In sick octopuses, only TLR-2 was up-regulated in hemocytes, but all of them were up-regulated in caecum and gills. The transcriptome reported here de novo establishes the first molecular clues to understand how the octopus immune system works and interacts with a highly pathogenic coccidian. 
The data provided here will contribute to identification of biomarkers for octopus resistance against pathogens, which could improve octopus farming in the near future.
De Novo Transcriptome Sequencing of the Octopus vulgaris Hemocytes Using Illumina RNA-Seq Technology: Response to the Infection by the Gastrointestinal Parasite Aggregata octopiana

PubMed Central

Castellanos-Martínez, Sheila; Arteta, David; Catarino, Susana; Gestal, Camino

2014-01-01

Background Octopus vulgaris is a highly valuable species of great commercial interest and excellent candidate for aquaculture diversification; however, the octopus’ well-being is impaired by pathogens, of which the gastrointestinal coccidian parasite Aggregata octopiana is one of the most important. The knowledge of the molecular mechanisms of the immune response in cephalopods, especially in octopus is scarce. The transcriptome of the hemocytes of O. vulgaris was de novo sequenced using the high-throughput paired-end Illumina technology to identify genes involved in immune defense and to understand the molecular basis of octopus tolerance/resistance to coccidiosis. Results A bi-directional mRNA library was constructed from hemocytes of two groups of octopus according to the infection by A. octopiana, sick octopus, suffering coccidiosis, and healthy octopus, and reads were de novo assembled together. The differential expression of transcripts was analysed using the general assembly as a reference for mapping the reads from each condition. After sequencing, a total of 75,571,280 high quality reads were obtained from the sick octopus group and 74,731,646 from the healthy group. The general transcriptome of the O. vulgaris hemocytes was assembled in 254,506 contigs. A total of 48,225 contigs were successfully identified, and 538 transcripts exhibited differential expression between groups of infection. The general transcriptome revealed genes involved in pathways like NF-kB, TLR and Complement. Differential expression of TLR-2, PGRP, C1q and PRDX genes due to infection was validated using RT-qPCR. In sick octopuses, only TLR-2 was up-regulated in hemocytes, but all of them were up-regulated in caecum and gills. Conclusion The transcriptome reported here de novo establishes the first molecular clues to understand how the octopus immune system works and interacts with a highly pathogenic coccidian. 
The data provided here will contribute to identification of biomarkers for octopus resistance against pathogens, which could improve octopus farming in the near future. PMID:25329466
The Effect of Using Concept Mapping on Student's Attitude and Achievement When Learning the Physics Topic of Circular and Rotational Motion

ERIC Educational Resources Information Center

Luchembe, Dennis; Chinyama, Kaumba; Jumbe, Jack

2014-01-01

The study was conducted to show the effectiveness of concept mapping as a teaching strategy for undergraduate students taking an introductory physics course. A number of researchers have investigated the effectiveness of concept mapping on student academic achievement. The main focus of these studies has been on comparing the effectiveness of concept…
Crowdsourcing Physical Network Topology Mapping With Net.Tagger

DTIC Science & Technology

2016-03-01

backend server infrastructure. This includes a full security audit, better web services handling, and integration with the OSM stack and dataset to...a novel approach to network infrastructure mapping that combines smartphone apps with crowdsourced collection to gather data for offline aggregation...and analysis. The project aims to build a map of physical network infrastructure such as fiber-optic cables, facilities, and access points. The

Algebra and topology for applications to physics

NASA Technical Reports Server (NTRS)

Rozhkov, S. S.

1987-01-01

The principal concepts of algebra and topology are examined with emphasis on applications to physics. In particular, attention is given to sets and mapping; topological spaces and continuous mapping; manifolds; and topological groups and Lie groups. The discussion also covers the tangential spaces of the differential manifolds, including Lie algebras, vector fields, and differential forms, properties of differential forms, mapping of tangential spaces, and integration of differential forms.
GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data

PubMed Central

Jung, Sook; Staton, Margaret; Lee, Taein; Blenda, Anna; Svancara, Randall; Abbott, Albert; Main, Dorrie

2008-01-01

The Genome Database for Rosaceae (GDR) is a central repository of curated and integrated genetics and genomics data of Rosaceae, an economically important family which includes apple, cherry, peach, pear, raspberry, rose and strawberry. GDR contains annotated databases of all publicly available Rosaceae ESTs, the genetically anchored peach physical map, Rosaceae genetic maps and comprehensively annotated markers and traits. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, gene ontology terms and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap, the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom, and also through our integrated GDR map viewer, which serves as a portal to the combined genetic, transcriptome and physical mapping information. ESTs, BACs, markers and traits can be queried by various categories and the search result sites are linked to the mapping visualization tools. GDR also provides online analysis tools such as a batch BLAST/FASTA server for the GDR datasets, a sequence assembly server and microsatellite and primer detection tools. GDR is available at http://www.rosaceae.org. PMID:17932055
Physics faculty beliefs and values about the teaching and learning of problem solving. II. Procedures for measurement and analysis

NASA Astrophysics Data System (ADS)

Henderson, Charles; Yerushalmi, Edit; Kuo, Vince H.; Heller, Kenneth; Heller, Patricia

2007-12-01

To identify and describe the basis upon which instructors make curricular and pedagogical decisions, we have developed an artifact-based interview and an analysis technique based on multilayered concept maps. The policy capturing technique used in the interview asks instructors to make judgments about concrete instructional artifacts similar to those they likely encounter in their teaching environment. The analysis procedure alternatively employs both an a priori systems view analysis and an emergent categorization to construct a multilayered concept map, which is a hierarchically arranged set of concept maps where child maps include more details than parent maps. Although our goal was to develop a model of physics faculty beliefs about the teaching and learning of problem solving in the context of an introductory calculus-based physics course, the techniques described here are applicable to a variety of situations in which instructors make decisions that influence teaching and learning.
Genetic profiling of Trypanosoma cruzi directly in infected tissues using nested PCR of polymorphic microsatellites.

PubMed

Valadares, Helder Magno Silva; Pimenta, Juliana Ramos; de Freitas, Jorge Marcelo; Duffy, Tomás; Bartholomeu, Daniella C; Oliveira, Riva de Paula; Chiari, Egler; Moreira, Maria da Consolação Vieira; Filho, Geraldo Brasileiro; Schijman, Alejandro Gabriel; Franco, Glória Regina; Machado, Carlos Renato; Pena, Sérgio Danilo Junho; Macedo, Andréa Mara

2008-06-01

The investigation of the importance of the genetics of Trypanosoma cruzi in determining the clinical course of Chagas disease will depend on precise characterisation of the parasites present in the tissue lesions. This can be adequately accomplished by the use of hypervariable nuclear markers such as microsatellites. However, the unilocal nature of these loci and the scarcity of parasites in chronic lesions make it necessary to use high sensitivity PCR with nested primers, whose design depends on the availability of long flanking regions, a feature not hitherto available for any known T. cruzi microsatellites. Herein, making use of the extensive T. cruzi genome sequence now available and using the Tandem Repeats Finder software, it was possible to identify and characterise seven new microsatellite loci: six composed of trinucleotide (TcTAC15, TcTAT20, TcAAT8, TcATT14, TcGAG10 and TcCAA10) and one composed of tetranucleotide (TcAAAT6) motifs. All except the TcCAA10 locus were physically mapped onto distinct intergenic regions of chromosome III of the CL Brener clone contigs. The TcCAA10 locus was localised within a hypothetical protein gene in the T. cruzi genome. All microsatellites were polymorphic and useful for T. cruzi genetic variability studies. Using the TcTAC15 locus it was possible to separate the strains belonging to the T. cruzi I lineage (DTU I) from those belonging to T. cruzi II (DTU IIb), T. cruzi III (DTU IIc) and a hybrid group (DTU IId, IIe). The long flanking regions of these novel microsatellites allowed construction of nested primers and the use of full nested PCR protocols. This strategy enabled us to detect and differentiate T. cruzi strains directly in clinical specimens including heart, blood, CSF and skin tissues from patients in the acute and chronic phases of Chagas disease.
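As a rough illustration of what locating perfect tri- and tetranucleotide repeats in a sequence involves, the following sketch scans a DNA string with a regular expression. It is not the Tandem Repeats Finder algorithm used in the study (which also handles imperfect repeats); the function name `find_microsatellites` and the default threshold of six repeats are assumptions made for this example.

```python
import re


def find_microsatellites(sequence, motif_len=3, min_repeats=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Only exact repeats of a motif of length motif_len are reported, and
    homopolymer runs (e.g. AAA repeated) are skipped. The reported motif
    depends on where the match starts, so rotations of a motif (TAC vs ACT)
    are not normalised here.
    """
    pattern = re.compile(
        r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:      # ignore single-base runs
            continue
        repeats = len(match.group(0)) // motif_len
        hits.append((match.start(), motif, repeats))
    return hits


if __name__ == "__main__":
    # Synthetic sequence: 15 copies of TAC between short flanks.
    demo = "GGAA" + "TAC" * 15 + "GGTT"
    print(find_microsatellites(demo))
```

In practice the flanking sequence on either side of each hit would then be extracted for primer design; the abstract stresses that long flanking regions are what make nested primers possible.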
A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.

PubMed

Swain, Martin T; Tsai, Isheng J; Assefa, Samual A; Newbold, Chris; Berriman, Matthew; Otto, Thomas D

2012-06-07

Genome projects now produce draft assemblies within weeks owing to advanced high-throughput sequencing technologies. For milestone projects such as Escherichia coli or Homo sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects, and the quality of genomes is generally of a much lower standard. This protocol describes software (PAGIT) that is used to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence and exploit reference genomes (if available) in order to improve scaffolding and generate annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as pathogenic bacteria, malaria and parasitic worms. Applying PAGIT to an E. coli assembly takes ∼24 h: it doubles the average contig size and annotates over 4,300 gene models.
De-Novo Assembly and Analysis of the Heterozygous Triploid Genome of the Wine Spoilage Yeast Dekkera bruxellensis AWRI1499

PubMed Central

Chambers, Paul J.; Pretorius, Isak S.

2012-01-01

Despite its industrial importance, the yeast species Dekkera (Brettanomyces) bruxellensis has remained poorly understood at the genetic level. In this study we describe whole genome sequencing and analysis for a prevalent wine spoilage strain, AWRI1499. The 12.7 Mb assembly, consisting of 324 contigs in 99 scaffolds (super-contigs) at 26-fold coverage, exhibits a relatively high density of single nucleotide polymorphisms (SNPs). Haplotype sampling for 1.2% of open reading frames suggested that the D. bruxellensis AWRI1499 genome is comprised of a moderately heterozygous diploid genome, in combination with a divergent haploid genome. Gene content analysis revealed enrichment in membrane proteins, particularly transporters, along with oxidoreductase enzymes. Availability of this assembly and annotation provides a resource for further investigation of genomic organization in this species, and functional characterization of genes that may confer important phenotypic traits. PMID:22470482
Identification and validation of single nucleotide polymorphisms as tools to detect hybridization and population structure in freshwater stingrays.

PubMed

Cruz, Vanessa P; Vera, Manuel; Pardo, Belén G; Taggart, John; Martinez, Paulino; Oliveira, Claudio; Foresti, Fausto

2017-05-01

Single nucleotide polymorphism (SNP) markers were identified and validated for two stingray species, Potamotrygon motoro and Potamotrygon falkneri, using double digest restriction-site associated DNA (ddRAD) reads generated with 454-Roche technology. A total of 226 774 reads (65.5 Mb) were obtained (mean read length 289 ± 183 bp), yielding a total of 5399 contigs (mean contig length: 396 ± 91 bp). Mining this data set, a panel of 143 in silico SNPs was selected. Eighty-two of these SNPs were successfully validated and 61 were polymorphic: 14 in P. falkneri, 21 in P. motoro, 3 in both species and 26 fixed for alternative variants in both species, thus being useful for population analyses and hybrid detection.
Prokaryotic Contig Annotation Pipeline Server: Web Application for a Prokaryotic Genome Annotation Pipeline Based on the Shiny App Package.

PubMed

Park, Byeonghyeok; Baek, Min-Jeong; Min, Byoungnam; Choi, In-Geol

2017-09-01

Genome annotation is a primary step in genomic research. To establish a light and portable prokaryotic genome annotation pipeline for use in individual laboratories, we developed a Shiny app package designated as "P-CAPS" (Prokaryotic Contig Annotation Pipeline Server). The package is composed of R and Python scripts that integrate publicly available annotation programs into a server application. P-CAPS is not only a browser-based interactive application but also a distributable Shiny app package that can be installed on any personal computer. The final annotation is provided in various standard formats and is summarized in an R markdown document. Annotation can be visualized and examined with a public genome browser. A benchmark test showed that the annotation quality and completeness of P-CAPS were reliable and compatible with those of currently available public pipelines.
Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux

PubMed Central

Lynch, Erin A.; Langille, Morgan G. I.; Darling, Aaron; Wilbanks, Elizabeth G.; Haltiner, Caitlin; Shao, Katie S. Y.; Starr, Michael O.; Teiling, Clotilde; Harkins, Timothy T.; Edwards, Robert A.; Eisen, Jonathan A.; Facciotti, Marc T.

2012-01-01

We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model group for the study of archaeal biology. The unique physiological properties of these organisms also make them good candidates for novel enzyme discovery for biotechnological applications. Seven genomes were sequenced to ∼20× coverage and assembled to an average of 50 contigs (range 5 scaffolds - 168 contigs). Comparisons of protein-coding gene complements revealed large-scale differences in COG functional group enrichment between these genera. Analysis of genes encoding machinery for DNA metabolism reveals genera-specific expansions of the general transcription factor TATA binding protein as well as a history of extensive duplication and horizontal transfer of the proliferating cell nuclear antigen. Insights gained from this study emphasize the importance of haloarchaea for investigation of archaeal biology. PMID:22848480
Recovering complete and draft population genomes from metagenome datasets

DOE PAGES

Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

2016-03-08

Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.
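The notion of binning contigs by covarying coverage across multiple datasets can be sketched as follows. This is a deliberately simplified illustration and not any published binner: it assumes each contig has one mean-coverage value per sample, groups contigs greedily by Pearson correlation with the first member of each bin, and uses a hypothetical threshold of 0.95. Real tools typically combine coverage with sequence composition and use more robust clustering.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length coverage profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:      # flat profile: correlation undefined
        return 0.0
    return cov / sqrt(var_a * var_b)


def bin_by_coverage(coverage, threshold=0.95):
    """Greedily group contigs whose coverage profiles covary across samples.

    coverage maps contig name -> list of per-sample mean coverage values.
    The first contig placed in a bin acts as that bin's representative.
    """
    bins = []
    for name, profile in coverage.items():
        for members in bins:
            if pearson(profile, coverage[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            bins.append([name])       # no similar bin found: start a new one
    return bins


if __name__ == "__main__":
    # Made-up coverage values for four contigs across four samples.
    toy = {
        "c1": [10, 40, 5, 20],
        "c2": [11, 42, 6, 19],
        "c3": [30, 5, 25, 8],
        "c4": [29, 6, 27, 9],
    }
    print(bin_by_coverage(toy))
```

Contigs from the same genome are expected to rise and fall in abundance together from sample to sample, which is why adding more datasets sharpens the separation between bins.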
MetaQUAST: evaluation of metagenome assemblies.

PubMed

Mikheenko, Alla; Saveliev, Vladislav; Gurevich, Alexey

2016-04-01

During the past years we have witnessed the rapid development of new metagenome assembly methods. Although there are many benchmark utilities designed for single-genome assemblies, there is no well-recognized evaluation and comparison tool for metagenomic-specific analogues. In this article, we present MetaQUAST, a modification of QUAST, the state-of-the-art tool for genome assembly evaluation based on alignment of contigs to a reference. MetaQUAST addresses such features of metagenome datasets as (i) unknown species content, by detecting and downloading reference sequences, (ii) huge diversity, by giving comprehensive reports for multiple genomes and (iii) presence of closely related species, by detecting chimeric contigs. We demonstrate MetaQUAST performance by comparing several leading assemblers on one simulated and two real datasets. MetaQUAST is available at http://bioinf.spbau.ru/metaquast.
Draft genome sequence of a Kluyvera intermedia isolate from a patient with a pancreatic abscess.

PubMed

Thele, Roland; Gumpert, Heidi; Christensen, Louise B; Worning, Peder; Schønning, Kristian; Westh, Henrik; Hansen, Thomas A

2017-09-01

The genus Kluyvera comprises potential pathogens that can cause many infections. This study reports a Kluyvera intermedia strain (FOSA7093) from a pancreatic cyst specimen from a long-term hospitalised patient. Whole-genome sequencing (WGS) of the K. intermedia isolate was performed and the strain was reported as sensitive to Danish-registered antibiotics although it had a fosA-like gene in the genome. There were nine contigs that aligned to a plasmid, and these contigs contained several heavy metal resistance gene homologues. Furthermore, a prophage was discovered in the genome. WGS represents an efficient tool for monitoring Kluyvera spp. and its role as a reservoir of multidrug resistance. Therefore, this susceptible K. intermedia genome has many characteristics that allow comparison with resistant K. intermedia isolates that might be discovered in the future.
Recovering complete and draft population genomes from metagenome datasets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.
The Combination of Sonography and Physical Examination Improves the Patency and Suitability of Hemodialysis Arteriovenous Fistula in Vascular Access.

PubMed

Mat Said, Normawati; Musa, Kamarul Imran; Mohamed Daud, Mohamed Ashraf; Haron, Juhara

2016-07-01

We compared the patency and the suitability of arteriovenous fistulas (AVF) created for hemodialysis vascular access by two approaches: (a) physical examination with preoperative vascular mapping and (b) physical examination alone. There were two cohorts of 79 patients each: (a) patients with AVF created based on the combination of physical examination and preoperative vascular mapping (PE+VM) and (b) patients with AVF created based on physical examination (PE) alone. Fistula patency is defined as clinical detection of a thrill (or auscultation of a murmur) over the fistula and coded as having thrills (patent) versus not having thrills (not patent). Suitability of a fistula is defined as a functioning AVF (one that can be adequately used via 2-needle cannulation for dialysis) and coded as suitable versus not suitable. AVF created after preoperative vascular mapping (PE+VM) had a 5.70 (at six weeks) and 3.76 (at three months) times higher chance of patency, and a 3.08 times higher chance of being suitable for dialysis, than AVF created after physical examination (PE) alone. Physical examination with preoperative ultrasound mapping (PE+VM) significantly improves the short-term patency and the suitability of AVF for dialysis.
Randomly picked cosmid clones overlap the pyrB and oriC gap in the physical map of the E. coli chromosome.

PubMed Central

Knott, V; Rees, D J; Cheng, Z; Brownlee, G G

1988-01-01

Sets of overlapping cosmid clones generated by random sampling and fingerprinting methods complement data at pyrB (96.5') and oriC (84') in the published physical map of E. coli. A new cloning strategy using sheared DNA, and a low copy, inducible cosmid vector were used in order to reduce bias in libraries, in conjunction with micro-methods for preparing cosmid DNA from a large number of clones. Our results are relevant to the design of the best approach to the physical mapping of large genomes. PMID:2834694
Mapping alpha-Particle X-Ray Fluorescence Spectrometer (Map-X)

NASA Technical Reports Server (NTRS)

Blake, D. F.; Sarrazin, P.; Bristow, T.

2014-01-01

Many planetary surface processes (like physical and chemical weathering, water activity, diagenesis, low-temperature or impact metamorphism, and biogenic activity) leave traces of their actions as features in the size range 10s to 100s of microns. The Mapping alpha-particle X-ray Spectrometer ("Map-X") is intended to provide chemical imaging at 2 orders of magnitude higher spatial resolution than previously flown instruments, yielding elemental chemistry at or below the scale length where many relict physical, chemical, and biological features can be imaged and interpreted in ancient rocks.
Construction of Reference Chromosome-Scale Pseudomolecules for Potato: Integrating the Potato Genome with Genetic and Physical Maps

PubMed Central

Sharma, Sanjeev Kumar; Bolser, Daniel; de Boer, Jan; Sønderkær, Mads; Amoros, Walter; Carboni, Martin Federico; D’Ambrosio, Juan Martín; de la Cruz, German; Di Genova, Alex; Douches, David S.; Eguiluz, Maria; Guo, Xiao; Guzman, Frank; Hackett, Christine A.; Hamilton, John P.; Li, Guangcun; Li, Ying; Lozano, Roberto; Maass, Alejandro; Marshall, David; Martinez, Diana; McLean, Karen; Mejía, Nilo; Milne, Linda; Munive, Susan; Nagy, Istvan; Ponce, Olga; Ramirez, Manuel; Simon, Reinhard; Thomson, Susan J.; Torres, Yerisf; Waugh, Robbie; Zhang, Zhonghua; Huang, Sanwen; Visser, Richard G. F.; Bachem, Christian W. B.; Sagredo, Boris; Feingold, Sergio E.; Orjeda, Gisella; Veilleux, Richard E.; Bonierbale, Merideth; Jacobs, Jeanne M. E.; Milbourne, Dan; Martin, David Michael Alan; Bryan, Glenn J.

2013-01-01

The genome of potato, a major global food crop, was recently sequenced. The work presented here details the integration of the potato reference genome (DM) with a new sequence-tagged site marker−based linkage map and other physical and genetic maps of potato and the closely related species tomato. Primary anchoring of the DM genome assembly was accomplished by the use of a diploid segregating population, which was genotyped with several types of molecular genetic markers to construct a new ~936 cM linkage map comprising 2469 marker loci. In silico anchoring approaches used genetic and physical maps from the diploid potato genotype RH89-039-16 (RH) and tomato. This combined approach has allowed 951 superscaffolds to be ordered into pseudomolecules corresponding to the 12 potato chromosomes. These pseudomolecules represent 674 Mb (~93%) of the 723 Mb genome assembly and 37,482 (~96%) of the 39,031 predicted genes. The superscaffold order and orientation within the pseudomolecules are closely collinear with independently constructed high density linkage maps. Comparisons between marker distribution and physical location reveal regions of greater and lesser recombination, as well as regions exhibiting significant segregation distortion. The work presented here has led to a greatly improved ordering of the potato reference genome superscaffolds into chromosomal “pseudomolecules”. PMID:24062527
GRAD-MAP: A Joint Physics and Astronomy Diversity Initiative at the University of Maryland

NASA Astrophysics Data System (ADS)

Steele, Amy; Smith, Robyn; Wilkins, Ashlee; Jameson, Katie

2018-01-01

Graduate Resources for Advancing Diversity with Maryland’s Astronomy and Physics (GRAD-MAP), builds connections between UMD and mid-Atlantic HBCUs, Minority-Serving Institutions (MSIs), and community colleges. We use seminars, forums, and workshops to foster a diverse community of undergraduates prepared to succeed in graduate school, inclusion-minded graduate student mentors, and faculty versed in the experiences of students at MSIs and the need for changes in the fields of physics and astronomy. Now in its fifth year, GRAD-MAP remains a graduate-student-powered initiative with a three-pronged approach: 1) Fall Collaborative Seminars, 2) A Winter Workshop, and 3) A Summer Scholars Program. This coherent set of programming allows GRAD-MAP to do more than just increase the numbers of minority students participating in astronomy and physics research (or worse, simply shuffle around students who already are or would be active in research). GRAD-MAP is committed to identifying students who are otherwise underserved or overlooked by the traditional academic pipeline, not only to get them on the path to be successful undergraduate researchers and eventual graduate applicants, but also to make substantial, sustainable efforts toward making the climate of academic physics and astronomy more inclusive to them and all other underrepresented minority students. We will describe the key elements of our program, highlight successes and lessons learned, and describe future directions for program elements. GRAD-MAP can serve as a model for other universities committed to diversity and inclusion.
Comparing de novo genome assembly: the long and short of it.

PubMed

Narzisi, Giuseppe; Mishra, Bud

2011-04-29

Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled a growing interest in the whole-genome sequence assembly (WGSA) problem, thereby, inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing the contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) as well as a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-offs between contigs' quality against their sizes. For this purpose, most of the publicly available major sequence assemblers--both for low-coverage long (Sanger) and high-coverage short (Illumina) reads technologies--are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing "next-generation" assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium.
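For readers unfamiliar with the N50 statistic named in this abstract, the Python sketch below computes it using one common convention: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name and the example lengths are illustrative assumptions; the Feature-Response Curve metric introduced by the paper is not implemented here.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig where the cumulative length
        # covers at least half of the assembly.
        if running * 2 >= total:
            return length
    return 0


# Two made-up assemblies with the same total length (290) and the same N50.
assembly_a = [80, 70, 50, 40, 30, 20]
assembly_b = [75, 70, 70, 45, 20, 10]
n50_a = n50(assembly_a)
n50_b = n50(assembly_b)
```

Both made-up assemblies report the same N50 even though their contigs differ, and nothing in the calculation checks whether any contig is correctly assembled. That is consistent with the abstract's point that size-based metrics capture contig quality and accuracy poorly.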
From biomedicine to natural history research: EST resources for ambystomatid salamanders

PubMed Central

Putta, Srikrishna; Smith, Jeramiah J; Walker, John A; Rondet, Mathieu; Weisrock, David W; Monaghan, James; Samuels, Amy K; Kump, Kevin; King, David C; Maness, Nicholas J; Habermann, Bianca; Tanaka, Elly; Bryant, Susan V; Gardiner, David M; Parichy, David M; Voss, S Randal

2004-01-01

Background Establishing genomic resources for closely related species will provide comparative insights that are crucial for understanding diversity and variability at multiple levels of biological organization. We developed ESTs for Mexican axolotl (Ambystoma mexicanum) and Eastern tiger salamander (A. tigrinum tigrinum), species with deep and diverse research histories. Results Approximately 40,000 quality cDNA sequences were isolated for these species from various tissues, including regenerating limb and tail. These sequences and an existing set of 16,030 cDNA sequences for A. mexicanum were processed to yield 35,413 and 20,599 high quality ESTs for A. mexicanum and A. t. tigrinum, respectively. Because the A. t. tigrinum ESTs were obtained primarily from a normalized library, an approximately equal number of contigs were obtained for each species, with 21,091 unique contigs identified overall. The 10,592 contigs that showed significant similarity to sequences from the human RefSeq database reflected a diverse array of molecular functions and biological processes, with many corresponding to genes expressed during spinal cord injury in rat and fin regeneration in zebrafish. To demonstrate the utility of these EST resources, we searched databases to identify probes for regeneration research, characterized intra- and interspecific nucleotide polymorphism, saturated a human – Ambystoma synteny group with marker loci, and extended PCR primer sets designed for A. mexicanum / A. t. tigrinum orthologues to a related tiger salamander species. Conclusions Our study highlights the value of developing resources in traditional model systems where the likelihood of information transfer to multiple, closely related taxa is high, thus simultaneously enabling both laboratory and natural history research. PMID:15310388

Metagenomics of rumen bacteriophage from thirteen lactating dairy cattle

PubMed Central

2013-01-01

Background The bovine rumen hosts a diverse and complex community of Eukarya, Bacteria, Archaea and viruses (including bacteriophage). The rumen viral population (the rumen virome) has received little attention compared to the rumen microbial population (the rumen microbiome). We used massively parallel sequencing of virus-like particles to investigate the diversity of the rumen virome in thirteen lactating Australian Holstein dairy cattle all housed in the same location, 12 of which were sampled on the same day. Results Fourteen putative viral sequence fragments over 30 Kbp in length were assembled and annotated. Many of the putative genes in the assembled contigs showed no homology to previously annotated genes, highlighting the large amount of work still required to fully annotate the functions encoded in viral genomes. The abundance of the contig sequences varied widely between animals, even though the cattle were of the same age, stage of lactation and fed the same diets. Additionally, the twelve animals which were co-habited shared a number of their dominant viral contigs. We compared the functional characteristics of our bovine viromes with those of other viromes, as well as rumen microbiomes. At the functional level, we found strong similarities between all of the viral samples, which were highly distinct from the rumen microbiome samples. Conclusions Our findings suggest a large amount of between-animal variation in the bovine rumen virome and that co-habiting animals may have more similar viromes than non-co-habited animals. We report the deepest sequencing to date of the rumen virome. This work highlights the enormous amount of novelty and variation present in the rumen virome. PMID:24180266
Expressed sequence tag (EST) analysis of the pine wood nematode Bursaphelenchus xylophilus and B. mucronatus.

PubMed

Kikuchi, Taisei; Aikawa, Takuya; Kosaka, Hajime; Pritchard, Leighton; Ogura, Nobuo; Jones, John T

2007-09-01

Most Bursaphelenchus species feed on fungi that colonise dead or dying trees. However, Bursaphelenchus xylophilus is unique in that in addition to feeding on fungi it has the capacity to be a parasite of live pine trees. We present an analysis of over 13,000 expressed sequence tags (ESTs) from B. xylophilus and, by way of contrast, over 3000 ESTs from a closely related species that does not parasitise plants as readily, B. mucronatus. Four libraries from B. xylophilus, from a variety of life stages including fungal feeding nematodes, nematodes extracted from plants and dauer-like stage nematodes, and one library from B. mucronatus were constructed and used to generate ESTs. Contig analysis showed that the 13,327 B. xylophilus ESTs could be grouped into 2110 contigs and 4377 singletons giving a total of 6487 identified genes. Similarly the 3193 B. mucronatus ESTs yielded a total of 2219 identified genes from 425 contigs and 1794 singletons. A variety of proteins potentially important in the parasitic process of B. xylophilus and B. mucronatus, including plant and fungal cell wall degrading enzymes and a novel gene potentially encoding an expansin-like protein that may disrupt non-covalent bonds in the plant cell wall, were identified in the libraries. Additionally, several gene candidates potentially involved in dauer entry or maintenance were identified in the EST dataset. The EST sequences from this study will provide a solid base for future research on the biology, pathogenicity and evolutionary history of this nematode group.
Comparison of bacterial genome assembly software for MinION data and their applicability to medical microbiology.

PubMed

Judge, Kim; Hunt, Martin; Reuter, Sandra; Tracey, Alan; Quail, Michael A; Parkhill, Julian; Peacock, Sharon J

2016-09-01

Translating the Oxford Nanopore MinION sequencing technology into medical microbiology requires on-going analysis that keeps pace with technological improvements to the instrument and release of associated analysis software. Here, we use a multidrug-resistant Enterobacter kobei isolate as a model organism to compare open source software for the assembly of genome data, and relate this to the time taken to generate actionable information. Three software tools (PBcR, Canu and miniasm) were used to assemble MinION data and a fourth (SPAdes) was used to combine MinION and Illumina data to produce a hybrid assembly. All four had a similar number of contigs and were more contiguous than the assembly using Illumina data alone, with SPAdes producing a single chromosomal contig. Evaluation of the four assemblies to represent the genome structure revealed a single large inversion in the SPAdes assembly, which also incorrectly integrated a plasmid into the chromosomal contig. Almost 50 %, 80 % and 90 % of MinION pass reads were generated in the first 6, 9 and 12 h, respectively. Using data from the first 6 h alone led to a less accurate, fragmented assembly, but data from the first 9 or 12 h generated similar assemblies to that from 48 h sequencing. Assemblies were generated in 2 h using Canu, indicating that going from isolate to assembled data is possible in less than 48 h. MinION data identified that genes responsible for resistance were carried by two plasmids encoding resistance to carbapenem and to sulphonamides, rifampicin and aminoglycosides, respectively.
Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

PubMed

Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos; Read, Timothy; Konstantinidis, Konstantinos T

2012-01-01

Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities, but whether different NGS platforms recover the same diversity from a sample and whether their assembled sequences are of comparable quality remain unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R² > 0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.
DNA nanomapping using CRISPR-Cas9 as a programmable nanoparticle.

PubMed

Mikheikin, Andrey; Olsen, Anita; Leslie, Kevin; Russell-Pavier, Freddie; Yacoot, Andrew; Picco, Loren; Payton, Oliver; Toor, Amir; Chesney, Alden; Gimzewski, James K; Mishra, Bud; Reed, Jason

2017-11-21

Progress in whole-genome sequencing using short-read (e.g., <150 bp), next-generation sequencing technologies has reinvigorated interest in high-resolution physical mapping to fill technical gaps that are not well addressed by sequencing. Here, we report two technical advances in DNA nanotechnology and single-molecule genomics: (1) we describe a labeling technique (CRISPR-Cas9 nanoparticles) for high-speed AFM-based physical mapping of DNA and (2) the first successful demonstration of using DVD optics to image DNA molecules with high-speed AFM. As a proof of principle, we used this new "nanomapping" method to detect and map precisely BCL2-IGH translocations present in lymph node biopsies of follicular lymphoma patients. This HS-AFM "nanomapping" technique can be complementary to both sequencing and other physical mapping approaches.
Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh].

PubMed

Dutta, Sutapa; Kumawat, Giriraj; Singh, Bikram P; Gupta, Deepak K; Singh, Sangeeta; Dogra, Vivek; Gaikwad, Kishor; Sharma, Tilak R; Raje, Ranjeet S; Bandhopadhya, Tapas K; Datta, Subhojit; Singh, Mahendra N; Bashasab, Fakrudin; Kulwal, Pawan; Wanjari, K B; K Varshney, Rajeev; Cook, Douglas R; Singh, Nagendra K

2011-01-20

Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥ 18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. 
We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea.
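The kind of SSR scan described in this record can be sketched in a few lines of Python. This is an illustration under assumptions rather than the authors' pipeline, whose software and thresholds are not given in the abstract: the function names are hypothetical, the 18 bp minimum is borrowed from the length criterion the abstract applies to the loci chosen for primer testing, and compound repeats (which the authors excluded) are not handled.

```python
import re


def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit ("ATAT" and "AA" are not)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_ssrs(seq, min_len=18, max_unit=6):
    """Find perfect di- to hexanucleotide repeats of at least min_len bp.

    Returns sorted (start, motif, repeat_count) tuples. Homopolymers are
    dropped because their motifs are never primitive.
    """
    seq = seq.upper()
    hits = []
    for unit in range(2, max_unit + 1):
        min_repeats = -(-min_len // unit)  # ceiling division
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


# Made-up sequence: an (AG)9 repeat, a (CAG)6 repeat, and a 20 bp poly-A run.
example_seq = "GGC" + "AG" * 9 + "TTC" + "CAG" * 6 + "T" + "A" * 20 + "T"
example_hits = find_ssrs(example_seq)
```

On this made-up sequence the scan reports the dinucleotide and trinucleotide repeats and skips the poly-A run. Counting the reported motifs by unit length would give the kind of di-, tri-, tetra-, penta- and hexanucleotide breakdown quoted in the abstract.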
Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh]

PubMed Central

2011-01-01

Background Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. Results In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. 
Conclusion We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea. PMID:21251263
Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa

PubMed Central

2012-01-01

Background Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, hardly any genetic studies have been performed and genomic resources are lacking. To build genomic resources and develop tools to speed up the breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. Results We successfully developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of lily and 39% of tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice as abundant in UTRs (UnTranslated Regions) as in coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2 to 3 fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs: lily and tulip (12,017 groups) and among the three monocot species: lily, tulip, and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSRs. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. 
Conclusions Two transcriptome sets were built that are valuable resources for marker development, comparative genomic studies and candidate gene approaches. Next generation sequencing of leaf transcriptome is very effective; however, deeper sequencing and using more tissues and stages is advisable for extended comparative studies. PMID:23167289
Transcriptomic analysis of grain amaranth (Amaranthus hypochondriacus) using 454 pyrosequencing: comparison with A. tuberculatus, expression profiling in stems and in response to biotic and abiotic stress

PubMed Central

2011-01-01

Background Amaranthus hypochondriacus, a grain amaranth, is a C4 plant noted for its ability to tolerate stressful conditions and produce highly nutritious seeds. These possess an optimal amino acid balance and constitute a rich source of health-promoting peptides. Although several recent studies, mostly involving subtractive hybridization strategies, have helped to increase the relatively low number of grain amaranth expressed sequence tags (ESTs), transcriptomic information of this species remains limited, particularly regarding tissue-specific and biotic stress-related genes. Thus, a large scale transcriptome analysis was performed to generate stem- and (a)biotic stress-responsive gene expression profiles in grain amaranth. Results A total of 2,700,168 raw reads were obtained from six 454 pyrosequencing runs, which were assembled into 21,207 high quality sequences (20,408 isotigs + 799 contigs). The average sequence length was 1,064 bp and 930 bp for isotigs and contigs, respectively. Only 5,113 singletons were recovered after quality control. Contigs/isotigs were further incorporated into 15,667 isogroups. All unique sequences were queried against the nr, TAIR, UniRef100, UniRef50 and Amaranthaceae EST databases for annotation. Functional GO annotation was performed with all contigs/isotigs that produced significant hits with the TAIR database. Only 8,260 sequences were found to be homologous when the transcriptomes of A. tuberculatus and A. hypochondriacus were compared, most of which were associated with basic house-keeping processes. Digital expression analysis identified 1,971 differentially expressed genes in response to at least one of four stress treatments tested. These included several multiple-stress-inducible genes that could represent potential candidates for use in the engineering of stress-resistant plants. 
The transcriptomic data generated from pigmented stems shared similarity with findings reported in developing stems of Arabidopsis and black cottonwood (Populus trichocarpa). Conclusions This study represents the first large-scale transcriptomic analysis of A. hypochondriacus, considered to be a highly nutritious and stress-tolerant crop. Numerous genes were found to be induced in response to (a)biotic stress, many of which could further the understanding of the mechanisms that contribute to multiple stress-resistance in plants, a trait that has potential biotechnological applications in agriculture. PMID:21752295
De novo transcriptomic analysis of hydrogen production in the green alga Chlamydomonas moewusii through RNA-Seq

PubMed Central

2013-01-01

Background Microalgae can make a significant contribution towards meeting global renewable energy needs in both carbon-based and hydrogen (H2) biofuel. The development of energy-related products from algae could be accelerated with improvements in systems biology tools, and recent advances in sequencing technology provide a platform for enhanced transcriptomic analyses. However, these techniques are still heavily reliant upon available genomic sequence data. Chlamydomonas moewusii is a unicellular green alga capable of evolving molecular H2 under both dark and light anaerobic conditions, and has high hydrogenase activity that can be rapidly induced. However, to date, there is no systematic investigation of transcriptomic profiling during induction of H2 photoproduction in this organism. Results In this work, RNA-Seq was applied to investigate transcriptomic profiles during the dark anaerobic induction of H2 photoproduction. 156 million reads generated from 7 samples were then used for de novo assembly after data trimming. BlastX results against NCBI database and Blast2GO results were used to interpret the functions of the assembled 34,136 contigs, which were then used as the reference contigs for RNA-Seq analysis. Our results indicated that more contigs were differentially expressed during the period of early and higher H2 photoproduction, and fewer contigs were differentially expressed when H2-photoproduction rates decreased. In addition, C. moewusii and C. reinhardtii share core functional pathways, and transcripts for H2 photoproduction and anaerobic metabolite production were identified in both organisms. C. moewusii also possesses similar metabolic flexibility as C. reinhardtii, and the difference between C. moewusii and C. reinhardtii on hydrogenase expression and anaerobic fermentative pathways involved in redox balancing may explain their different profiles of hydrogenase activity and secreted anaerobic metabolites. 
Conclusions Herein, we have described a workflow using commercial software to analyze RNA-Seq data without reference genome sequence information, which can be applied to other unsequenced microorganisms. This study provided biological insights into the anaerobic fermentation and H2 photoproduction of C. moewusii, and the first transcriptomic RNA-Seq dataset of C. moewusii generated in this study also offers baseline data for further investigation (e.g. regulatory proteins related to fermentative pathway discussed in this study) of this organism as an H2-photoproduction strain. PMID:23971877
Production of a reference transcriptome and transcriptomic database (EdwardsiellaBase) for the lined sea anemone, Edwardsiella lineata, a parasitic cnidarian

PubMed Central

2014-01-01

Background The lined sea anemone Edwardsiella lineata is an informative model system for evolutionary-developmental studies of parasitism. In this species, it is possible to compare alternate developmental pathways leading from a larva to either a free-living polyp or a vermiform parasite that inhabits the mesoglea of a ctenophore host. Additionally, E. lineata is confamilial with the model cnidarian Nematostella vectensis, providing an opportunity for comparative genomic, molecular and organismal studies. Description We generated a reference transcriptome for E. lineata via high-throughput sequencing of RNA isolated from five developmental stages (parasite; parasite-to-larva transition; larva; larva-to-adult transition; adult). The transcriptome comprises 90,440 contigs assembled from >15 billion nucleotides of DNA sequence. Using a molecular clock approach, we estimated the divergence between E. lineata and N. vectensis at 215–364 million years ago. Based on gene ontology and metabolic pathway analyses and gene family surveys (bHLH-PAS, deiodinases, Fox genes, LIM homeodomains, minicollagens, nuclear receptors, Sox genes, and Wnts), the transcriptome of E. lineata is comparable in depth and completeness to N. vectensis. Analyses of protein motifs revealed extensive conservation between the proteins of these two edwardsiid anemones, although we show the NF-κB protein of E. lineata reflects the ancestral structure, while the NF-κB protein of N. vectensis has undergone a split that separates the DNA-binding domain from the inhibitory domain. All contigs have been deposited in a public database (EdwardsiellaBase), where they may be searched according to contig ID, gene ontology, protein family motif (Pfam), enzyme commission number, and BLAST. The alignment of the raw reads to the contigs can also be visualized via JBrowse. 
Conclusions The transcriptomic data and database described here provide a platform for studying the evolutionary developmental genomics of a derived parasitic life cycle. In addition, these data from E. lineata will aid in the interpretation of evolutionary novelties in gene sequence or structure that have been reported for the model cnidarian N. vectensis (e.g., the split NF-κB locus). Finally, we include custom computational tools to facilitate the annotation of a transcriptome based on high-throughput sequencing data obtained from a “non-model system.” PMID:24467778
Production of a reference transcriptome and transcriptomic database (EdwardsiellaBase) for the lined sea anemone, Edwardsiella lineata, a parasitic cnidarian.

PubMed

Stefanik, Derek J; Lubinski, Tristan J; Granger, Brian R; Byrd, Allyson L; Reitzel, Adam M; DeFilippo, Lukas; Lorenc, Allison; Finnerty, John R

2014-01-28

The lined sea anemone Edwardsiella lineata is an informative model system for evolutionary-developmental studies of parasitism. In this species, it is possible to compare alternate developmental pathways leading from a larva to either a free-living polyp or a vermiform parasite that inhabits the mesoglea of a ctenophore host. Additionally, E. lineata is confamilial with the model cnidarian Nematostella vectensis, providing an opportunity for comparative genomic, molecular and organismal studies. We generated a reference transcriptome for E. lineata via high-throughput sequencing of RNA isolated from five developmental stages (parasite; parasite-to-larva transition; larva; larva-to-adult transition; adult). The transcriptome comprises 90,440 contigs assembled from >15 billion nucleotides of DNA sequence. Using a molecular clock approach, we estimated the divergence between E. lineata and N. vectensis at 215-364 million years ago. Based on gene ontology and metabolic pathway analyses and gene family surveys (bHLH-PAS, deiodinases, Fox genes, LIM homeodomains, minicollagens, nuclear receptors, Sox genes, and Wnts), the transcriptome of E. lineata is comparable in depth and completeness to N. vectensis. Analyses of protein motifs revealed extensive conservation between the proteins of these two edwardsiid anemones, although we show the NF-κB protein of E. lineata reflects the ancestral structure, while the NF-κB protein of N. vectensis has undergone a split that separates the DNA-binding domain from the inhibitory domain. All contigs have been deposited in a public database (EdwardsiellaBase), where they may be searched according to contig ID, gene ontology, protein family motif (Pfam), enzyme commission number, and BLAST. The alignment of the raw reads to the contigs can also be visualized via JBrowse. The transcriptomic data and database described here provide a platform for studying the evolutionary developmental genomics of a derived parasitic life cycle. 
In addition, these data from E. lineata will aid in the interpretation of evolutionary novelties in gene sequence or structure that have been reported for the model cnidarian N. vectensis (e.g., the split NF-κB locus). Finally, we include custom computational tools to facilitate the annotation of a transcriptome based on high-throughput sequencing data obtained from a "non-model system."
Mental and Physical (MAP) Training: a neurogenesis-inspired intervention that enhances health in humans.

PubMed

Shors, Tracey J; Olson, Ryan L; Bates, Marsha E; Selby, Edward A; Alderman, Brandon L

2014-11-01

New neurons are generated in the hippocampus each day and their survival is greatly enhanced through effortful learning (Shors, 2014). The numbers of cells produced can be increased by physical exercise (van Praag, Kempermann, & Gage, 1999). These findings inspired us to develop a clinical intervention for humans known as Mental and Physical Training, or MAP Training. Each session consists of 30 min of mental training with focused attention meditation (20 min sitting and 10 min walking). Meditation is an effortful training practice that involves learning about the transient nature of thoughts and thought patterns, and acquiring skills to recognize them without necessarily attaching meaning and/or emotions to them. The mental training component is followed by physical training with 30 min of aerobic exercise performed at moderate intensity. During this component, participants learn choreographed dance routines while engaging in aerobic exercise. In a pilot "proof-of-concept" study, we provided supervised MAP Training (2 sessions per week for 8 weeks) to a group of young mothers in the local community who were recently homeless, most of them having previously suffered from physical and sexual abuse, addiction, and depression. Preliminary data suggest that MAP Training improves dependent measures of aerobic fitness (as assessed by maximal rate of oxygen consumed) while decreasing symptoms of depression and anxiety. Similar changes were not observed in a group of recently homeless women who did not participate in MAP Training. It is not currently possible to determine whether new neurons in the human brain increase in number as a result of MAP Training. Rather, these preliminary results of MAP Training illustrate how neuroscientific research can be translated into novel clinical interventions that benefit human health and wellness. Copyright © 2014 Elsevier Inc. All rights reserved.
Using microsatellites to understand the physical distribution of recombination on soybean chromosomes.

PubMed

Ott, Alina; Trautschold, Brian; Sandhu, Devinder

2011-01-01

Soybean is a major crop that is an important source of oil and proteins. A number of genetic linkage maps have been developed in soybean. Specifically, hundreds of simple sequence repeat (SSR) markers have been developed and mapped. Recent sequencing of the soybean genome resulted in the generation of vast amounts of genetic information. The objectives of this investigation were to use SSR markers in developing a connection between genetic and physical maps and to determine the physical distribution of recombination on soybean chromosomes. A total of 2,188 SSRs were used for sequence-based physical localization on soybean chromosomes. Linkage information was used from different maps to create an integrated genetic map. Comparison of the integrated genetic linkage maps and sequence-based physical maps revealed that the distal 25% of each chromosome was the most marker-dense, containing an average of 47.4% of the SSR markers and 50.2% of the genes. The proximal 25% of each chromosome contained only 7.4% of the markers and 6.7% of the genes. At the whole genome level, the marker density and gene density showed a high correlation (R²) of 0.64 and 0.83, respectively, with the physical distance from the centromere. Recombination followed a similar pattern with comparisons indicating that recombination is high in telomeric regions, though the correlation between crossover frequency and distance from the centromeres is low (R² = 0.21). Most of the centromeric regions were low in recombination. The crossover frequency for the entire soybean genome was 7.2%, with extremes much higher and lower than average. The number of recombination hotspots varied from 1 to 12 per chromosome. A high correlation of 0.83 between the distribution of SSR markers and genes suggested close association of SSRs with genes. The knowledge of distribution of recombination on chromosomes may be applied in characterizing and targeting genes.
Fine-mapping of a marbling trait to a 2.9-cM region on bovine chromosome 7 in Japanese Black cattle.

PubMed

Hirano, T; Watanabe, T; Inoue, K; Sugimoto, Y

2008-02-01

To locate quantitative trait loci (QTL) for intramuscular fat deposition (marbling) in a local population of Japanese Black cattle, we performed a genome scan using a paternal half-sib family of Bull A. A marbling QTL was mapped in the region flanked by DIK0079 (20.7 cM) and TGLA303 (39.3 cM) on bovine chromosome (BTA) 7, affecting 5.0% of the total family variance. Haplotype analysis of the QTL region revealed that the marbling-increasing Q allele was transmitted from the dam. On the other hand, Bull B, a maternal half-sib of Bull A, did not receive the Q allele from its dam, based on the following findings: (i) a marbling QTL on BTA7 was not detected in the Bull B paternal half-sib family; (ii) recombination between DIK0079 (20.7 cM) and RM006 (25.4 cM) in the QTL region was observed in the maternal chromosome of Bull B; and (iii) the Q-harbouring steers from Bull A exhibited significantly higher marbling than the steers from Bull B and the remaining steers from Bull A. To precisely compare the maternal chromosomes of both bulls, we constructed a bacterial artificial chromosome contig covering the region between DIK0079 and RM006 and developed DNA markers. The recombination occurred between DIK8042 and DIK8044, indicating that the marbling QTL was in a 2.9-cM region flanked by DIK0079 and DIK8044.
The multiple sex chromosomes of platypus and echidna are not completely identical and several share homology with the avian Z.

PubMed

Rens, Willem; O'Brien, Patricia C M; Grützner, Frank; Clarke, Oliver; Graphodatskaya, Daria; Tsend-Ayush, Enkhjargal; Trifonov, Vladimir A; Skelton, Helen; Wallis, Mary C; Johnston, Steve; Veyrunes, Frederic; Graves, Jennifer A M; Ferguson-Smith, Malcolm A

2007-01-01

Sex-determining systems have evolved independently in vertebrates. Placental mammals and marsupials have an XY system, birds have a ZW system. Reptiles and amphibians have different systems, including temperature-dependent sex determination, and XY and ZW systems that differ in origin from birds and placental mammals. Monotremes diverged early in mammalian evolution, just after the mammalian clade diverged from the sauropsid clade. Our previous studies showed that male platypus has five X and five Y chromosomes, no SRY, and DMRT1 on an X chromosome. In order to investigate monotreme sex chromosome evolution, we performed a comparative study of platypus and echidna by chromosome painting and comparative gene mapping. Chromosome painting reveals a meiotic chain of nine sex chromosomes in the male echidna and establishes their order in the chain. Two of those differ from those in the platypus, three of the platypus sex chromosomes differ from those of the echidna and the order of several chromosomes is rearranged. Comparative gene mapping shows that, in addition to bird autosome regions, regions of bird Z chromosomes are homologous to regions in four platypus X chromosomes, that is, X1, X2, X3, X5, and in chromosome Y1. Monotreme sex chromosomes are easiest to explain on the hypothesis that autosomes were added sequentially to the translocation chain, with the final additions after platypus and echidna divergence. Genome sequencing and contig anchoring show no homology yet between platypus and therian Xs; thus, monotremes have a unique XY sex chromosome system that shares some homology with the avian Z.
De novo transcriptome assembly and identification of genes associated with feed conversion ratio and breast muscle yield in domestic ducks.

PubMed

Zhu, Feng; Yuan, Jian-Ming; Zhang, Zhen-He; Hao, Jin-Ping; Yang, Yu-Ze; Hu, Shen-Qiang; Yang, Fang-Xi; Qu, Lu-Jiang; Hou, Zhuo-Cheng

2015-12-01

Breast muscle yield and feed conversion efficiency are the major breeding aims in duck breeding. Understanding the role of specific transcripts in the muscle and small intestine might lead to the elucidation of interrelated biological processes. In this study, we obtained jejunum and breast muscle samples from two strains of Peking ducks that were sorted by feed conversion ratio (FCR) and breast muscle percentage into two-tailed populations. Ten RNA-Seq libraries were developed from the pooled samples and sequenced using the HiSeq 2000 platform. We created a reference duck transcript database using de novo assembly methods, which included 16,663 non-redundant contigs with an N50 length of 1530 bp. This new duck reference cDNA dataset significantly improved the mapping rate for RNA-Seq data, from 50% to 70%. Mapping and annotation were followed by Gene Ontology analysis, which showed that numerous genes were differentially expressed between the low and high FCR groups. The differentially expressed genes in the jejunum were enriched in biological processes related to immune response and immune response activation, whereas those in the breast muscle were significantly enriched in biological processes related to muscle cell differentiation and organ development. We identified new candidate genes, that is, PCK1, for improving the FCR and breast muscle yield of ducks and obtained much better reference duck transcripts. This study suggested that de novo assembly is essential when applying transcriptome analysis to a species with an incomplete genome. © 2015 Stichting International Foundation for Animal Genetics.
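The N50 length quoted in this record is a standard assembly summary statistic: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembled bases. A minimal sketch of that calculation follows; the function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the duck assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths, chosen only to illustrate the arithmetic.
toy_lengths = [2000, 1530, 1200, 800, 500, 300]
print(n50(toy_lengths))  # 1530
```

Because N50 depends only on the length distribution, it says nothing about whether the contigs are correct, which is why abstracts such as this one usually report it alongside mapping rates.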
The need for sustained and integrated high-resolution mapping of dynamic coastal environments

USGS Publications Warehouse

Stockdon, Hilary F.; Lillycrop, Jeff W.; Howd, Peter A.; Wozencraft, Jennifer M.

2007-01-01

The evolution of the United States' coastal zone response to both human activities and natural processes is dynamic. Coastal resource and population protection requires understanding, in detail, the processes needed for change as well as the physical setting. Sustained coastal area mapping allows change to be documented and baseline conditions to be established, as well as future behavior to be predicted in conjunction with physical process models. Hyperspectral imagers and airborne lidars, as well as other recent mapping technology advances, allow rapid national scale land use information and high-resolution elevation data collection. Coastal hazard risk evaluation has critical dependence on these rich data sets. A fundamental storm surge model parameter in predicting flooding location, for example, is coastal elevation data, and a foundation in identifying the most vulnerable populations and resources is land use maps. A wealth of information for physical change process study, coastal resource and community management and protection, and coastal area hazard vulnerability determination, is available in a comprehensive national coastal mapping plan designed to take advantage of recent mapping technology progress and data distribution, management, and collection.
Genome sequencing of methanogenic Archaea Methanosarcina mazei TUC01 strain isolated from an Amazonian Flooded Area

NASA Astrophysics Data System (ADS)

Baraúna, R. A.; Graças, D. A.; Ramos, R. T.; Carneiro, A. R.; Lopes, T. S.; Lima, A. R.; Zahlouth, R. L.; Pellizari, V. H.; Silva, A.

2013-05-01

Methanosarcina mazei is a strictly anaerobic methanogen from the Methanosarcinales order. This species is known for its broad catabolic range among methanogens and is widespread throughout diverse environments. The draft genome of a strain cultivated from the sediment of the Tucuruí hydroelectric power station, the fourth largest hydroelectric dam in the world, is described here. Approximately 80% of methane is produced by biogenic sources, such as methanogenic archaea from M. mazei species. Although the methanogenesis pathway is well known, some aspects of the core genome, genome evolution and shared genes are still unclear. A sediment sample from the Tucuruí hydropower station reservoir was inoculated in mineral media supplemented with acetate and methanol. This media was maintained in an H2:CO2 (80:20) atmosphere to enrich and cultivate M. mazei. The enrichment was conducted at 30°C under standard anaerobic conditions. After several molecular and cellular analyses, total DNA was extracted from a non-pure culture of M. mazei, amplified using phi29 DNA polymerase (BioLabs) and finally used as a source template for genome sequencing. The draft genome was obtained after two rounds of sequencing. First, the genome was sequenced using a SOLiD System V3 with a mate-paired library, which yielded 24,405,103 and 24,399,268 reads (50 bp) for the R3 and F3 tags, respectively. The second round of sequencing was performed using the SOLiD 5500 XL platform with a mate-paired library, resulting in a total of 113,588,848 reads (60 bp) for each tag (F3 and R3). All reads obtained by this procedure were filtered using Quality Assessment software, whereby reads with an average quality score below Phred 20 were removed. Velvet and Edena were used to assemble the reads, and Simplifier was used to remove the redundant sequences. After this, a total of 16,811 contigs were obtained. M. mazei GO1 (AE008384) genome was used to map the contigs and generate the scaffolds. 
We used the Graphical Contig Analyzer for All Sequencing Platforms software (G4ALL; http://g4all.sourceforge.net/) to manually curate and generate the genome scaffold with gaps. The resultant gaps were manually closed using CLC Genomics Workbench software. M. mazei TUC01 genome contained 3,420,400 bp with a GC content of 42.47% distributed over 3 scaffolds that were annotated by RAST. A total of 2,959 coding DNA sequences (CDS) were predicted. The genome of M. mazei TUC01 (accession number: CP003077) will provide valuable information about the ecology of Methanosarcinales order and more accurate information about the methanogenesis pathway observed in the Neotropics. SPONSOR: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Agência Nacional de Energia Elétrica (ANEEL); Centrais Elétricas do Norte do Brasil (Eletronorte).
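The read-filtering step in this record (removing reads whose average quality falls below Phred 20) can be sketched as below. This is not the "Quality Assessment software" named in the abstract, whose behaviour is not described here. It assumes each read is already available as an identifier plus a list of integer quality scores, and the names `mean_quality` and `keep_high_quality` are hypothetical.

```python
def mean_quality(scores):
    """Arithmetic mean of per-base Phred quality scores for one read."""
    return sum(scores) / len(scores)


def keep_high_quality(reads, threshold=20):
    """Keep reads whose mean quality is at least `threshold`.

    `reads` is an iterable of (read_id, scores) pairs, where `scores`
    is a list of integer Phred values. Reads with no scores are dropped.
    """
    return [
        (read_id, scores)
        for read_id, scores in reads
        if scores and mean_quality(scores) >= threshold
    ]


# Hypothetical reads, for illustration only.
toy_reads = [
    ("r1", [30, 30, 10]),  # mean about 23.3: kept
    ("r2", [10, 20, 25]),  # mean about 18.3: removed
    ("r3", [20, 20, 20]),  # mean exactly 20: kept
]
print([read_id for read_id, _ in keep_high_quality(toy_reads)])
```

Treating a mean of exactly 20 as passing is a choice made here; the abstract only states that reads below Phred 20 were removed.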
Construction of physical maps for the sex-specific regions of papaya sex chromosomes.

PubMed

Na, Jong-Kuk; Wang, Jianping; Murray, Jan E; Gschwend, Andrea R; Zhang, Wenli; Yu, Qingyi; Navajas-Pérez, Rafael; Feltus, F Alex; Chen, Cuixia; Kubat, Zdenek; Moore, Paul H; Jiang, Jiming; Paterson, Andrew H; Ming, Ray

2012-05-08

Papaya is a major fruit crop in tropical and subtropical regions worldwide. It is trioecious with three sex forms: male, female, and hermaphrodite. Sex determination is controlled by a pair of nascent sex chromosomes with two slightly different Y chromosomes, Y for male and Yh for hermaphrodite. The sex chromosome genotypes are XY (male), XYh (hermaphrodite), and XX (female). The papaya hermaphrodite-specific Yh chromosome region (HSY) is pericentromeric and heterochromatic. Physical mapping of HSY and its X counterpart is essential for sequencing these regions and uncovering the early events of sex chromosome evolution and to identify the sex determination genes for crop improvement. A reiterate chromosome walking strategy was applied to construct the two physical maps with three bacterial artificial chromosome (BAC) libraries. The HSY physical map consists of 68 overlapped BACs on the minimum tiling path, and covers all four HSY-specific Knobs. One gap remained in the region of Knob 1, the only knob structure shared between HSY and X, due to the lack of HSY-specific sequences. This gap was filled on the physical map of the HSY corresponding region in the X chromosome. The X physical map consists of 44 BACs on the minimum tiling path with one gap remaining in the middle, due to the nature of highly repetitive sequences. This gap was filled on the HSY physical map. The borders of the non-recombining HSY were defined genetically by fine mapping using 1460 F2 individuals. The genetically defined HSY spanned approximately 8.5 Mb, whereas its X counterpart extended about 5.4 Mb including a 900 Kb region containing the Knob 1 shared by the HSY and X. The 8.5 Mb HSY corresponds to 4.5 Mb of its X counterpart, showing 4 Mb (89%) DNA sequence expansion. The 89% increase of DNA sequence in HSY indicates rapid expansion of the Yh chromosome after genetic recombination was suppressed 2-3 million years ago. 
The genetically defined borders coincide with the common BACs on the minimum tiling paths of HSY and X. The minimum tiling paths of HSY and its X counterpart are being used for sequencing these X and Yh-specific regions.
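The expansion figures in this record follow from simple arithmetic on the two region sizes: the 8.5 Mb HSY exceeds its 4.5 Mb X counterpart by 4 Mb, which is about 89% of the X length. A short check, using only the numbers stated in the abstract (the function name `relative_expansion` is an illustrative assumption):

```python
def relative_expansion(expanded_mb, reference_mb):
    """Return (absolute difference in Mb, difference as % of reference)."""
    difference = expanded_mb - reference_mb
    return difference, 100 * difference / reference_mb


# Sizes as given in the abstract: HSY 8.5 Mb, corresponding X region 4.5 Mb.
extra_mb, extra_pct = relative_expansion(8.5, 4.5)
print(f"{extra_mb:.1f} Mb, {extra_pct:.0f}%")  # 4.0 Mb, 89%
```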

GRAD-MAP: A Joint Physics and Astronomy Diversity Initiative at the University of Maryland

NASA Astrophysics Data System (ADS)

Wilkins, Ashlee N.; Jameson, Katherine; Taylor, Corbin James; Anderson, Neil; Megson, Peter; Roberg-Clark, Gareth; Sheppard, Kyle; Uher, Tim; Hammer, Donna; Vogel, Stuart N.

2016-01-01

Graduate Resources for Advancing Diversity with Maryland's Astronomy and Physics (GRAD-MAP), now in its third year, builds connections between UMD and mid-Atlantic HBCUs, Minority-Serving Institutions, and community colleges, and uses seminars, forums, and workshops to foster a diverse community of undergraduates prepared to succeed in graduate school. GRAD-MAP launched with a three-pronged approach: 1) Collaborative Seminars, 2) A Winter Workshop, and 3) A Spring Symposium. This program allows GRAD-MAP to do more than just increase the numbers of minority students participating in astronomy and physics research (or, worse, simply shuffle around students who already are or would be); it is committed to identifying students who are otherwise underserved or overlooked by the traditional academic pipeline, not only to get them on the path to be successful undergraduate researchers and eventual graduate applicants, but also to make the climate of academic physics and astronomy more inclusive to them and all other underrepresented minority students. Our poster describes the key elements of our program, and highlights successes and lessons learned; GRAD-MAP can serve as a model for other universities committed to diversity and inclusion.
Optical mapping and sequencing of the Escherichia coli KO11 genome reveal extensive chromosomal rearrangements, and multiple tandem copies of the Zymomonas mobilis pdc and adhB genes.

PubMed

Turner, Peter C; Yomano, Lorraine P; Jarboe, Laura R; York, Sean W; Baggett, Christy L; Moritz, Brélan E; Zentz, Emily B; Shanmugam, K T; Ingram, Lonnie O

2012-04-01

Escherichia coli KO11 (ATCC 55124) was engineered in 1990 to produce ethanol by chromosomal insertion of the Zymomonas mobilis pdc and adhB genes into E. coli W (ATCC 9637). KO11FL, our current laboratory version of KO11, and its parent E. coli W were sequenced, and contigs assembled into genomic sequences using optical NcoI restriction maps as templates. E. coli W contained plasmids pRK1 (102.5 kb) and pRK2 (5.4 kb), but KO11FL only contained pRK2. KO11FL optical maps made with AflII and with BamHI showed a tandem repeat region, consisting of at least 20 copies of a 10-kb unit. The repeat region was located at the insertion site for the pdc, adhB, and chloramphenicol-resistance genes. Sequence coverage of these genes was about 25-fold higher than average, consistent with amplification of the foreign genes that were inserted as circularized DNA. Selection for higher levels of chloramphenicol resistance originally produced strains with higher pdc and adhB expression, and hence improved fermentation performance, by increasing the gene copy number. Sequence data for an earlier version of KO11, ATCC 55124, indicated that multiple copies of pdc adhB were present. Comparison of the W and KO11FL genomes showed large inversions and deletions in KO11FL, mostly enabled by IS10, which is absent from W but present at 30 sites in KO11FL. The early KO11 strain ATCC 55124 had no rearrangements, contained only one IS10, and lacked most accumulated single nucleotide polymorphisms (SNPs) present in KO11FL. Despite rearrangements and SNPs in KO11FL, fermentation performance was equal to that of ATCC 55124.
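The link this record draws between elevated sequence coverage and gene amplification rests on a common rule of thumb: the copy number of a region is roughly its read depth divided by the genome-wide average depth. The sketch below illustrates that ratio only. The depth values are hypothetical, the function name is an assumption, and real estimates would also need to account for coverage biases.

```python
from statistics import mean


def estimate_copy_number(region_depths, genome_mean_depth):
    """Approximate copy number as mean region depth / genome-wide mean depth."""
    if genome_mean_depth <= 0:
        raise ValueError("genome-wide mean depth must be positive")
    return mean(region_depths) / genome_mean_depth


# Hypothetical per-base depths over an amplified region (mean 1000)
# against a hypothetical genome-wide mean depth of 40.
toy_region = [980, 1010, 1005, 1005]
print(estimate_copy_number(toy_region, 40))  # 25.0
```

A ratio near 25 would be in line with the roughly 25-fold excess coverage reported for the inserted genes, and with the optical-map estimate of at least 20 tandem copies.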
NemaPath: online exploration of KEGG-based metabolic pathways for nematodes

PubMed Central

Wylie, Todd; Martin, John; Abubucker, Sahar; Yin, Yong; Messina, David; Wang, Zhengyuan; McCarter, James P; Mitreva, Makedonka

2008-01-01

Background Nematode.net is a web-accessible resource for investigating gene sequences from parasitic and free-living nematode genomes. Beyond the well-characterized model nematode C. elegans, over 500,000 expressed sequence tags (ESTs) and nearly 600,000 genome survey sequences (GSSs) have been generated from 36 nematode species as part of the Parasitic Nematode Genomics Program undertaken by the Genome Center at Washington University School of Medicine. However, these sequencing data are not present in most publicly available protein databases, which only include sequences in Swiss-Prot. Swiss-Prot, in turn, relies on GenBank/EMBL/DDBJ for predicted proteins from complete genomes or full-length proteins. Description Here we present the NemaPath pathway server, a web-based pathway-level visualization tool for navigating putative metabolic pathways for over 30 nematode species, including 27 parasites. The NemaPath approach consists of two parts: 1) a backend tool to align and evaluate nematode genomic sequences (curated EST contigs) against the annotated Kyoto Encyclopedia of Genes and Genomes (KEGG) protein database; 2) a web viewing application that displays annotated KEGG pathway maps based on desired confidence levels of primary sequence similarity as defined by a user. NemaPath also provides cross-referenced access to nematode genome information provided by other tools available on Nematode.net, including: detailed NemaGene EST cluster information; putative translations; GBrowse EST cluster views; links from nematode data to external databases for corresponding synonymous C. elegans counterparts, subject matches in KEGG's gene database, and also KEGG Ontology (KO) identification. Conclusion The NemaPath server hosts metabolic pathway mappings for 30 nematode species and is available on the World Wide Web at . 
The nematode source sequences used for the metabolic pathway mappings are available via FTP , as provided by the Genome Center at Washington University School of Medicine. PMID:18983679
A novel allele of TaGW2-A1 is located in a finely mapped QTL that increases grain weight but decreases grain number in wheat (Triticum aestivum L.).

PubMed

Zhai, Huijie; Feng, Zhiyu; Du, Xiaofen; Song, Yane; Liu, Xinye; Qi, Zhongqi; Song, Long; Li, Jiang; Li, Linghong; Peng, Huiru; Hu, Zhaorong; Yao, Yingyin; Xin, Mingming; Xiao, Shihe; Sun, Qixin; Ni, Zhongfu

2018-03-01

A novel TaGW2-A1 allele was identified from a stable, robust QTL region, which is pleiotropic for thousand grain weight, grain number per spike, and grain morphometric parameters in wheat. Thousand grain weight (TGW) and grain number per spike (GNS) are two crucial determinants of wheat spike yield, and genetic dissection of their relationships can help to fine-tune these two components and maximize grain yield. By evaluating 191 recombinant inbred lines in 11 field trials, we identified five genomic regions on chromosomes 1B, 3A, 3B, 5B, or 7A that solely influenced either TGW or GNS, and a further region on chromosome 6A that concurrently affected TGW and GNS. The QTL of interest on chromosome 6A, which was flanked by wsnp_BE490604A_Ta_2_1 and wsnp_RFL_Contig1340_448996 and designated as QTgw/Gns.cau-6A, was finely mapped to a genetic interval shorter than 0.538 cM using near isogenic lines (NILs). The elite NILs of QTgw/Gns.cau-6A increased TGW by 8.33%, but decreased GNS by 3.05% in six field trials. Grain Weight 2 (TaGW2-A1), a well-characterized gene that negatively regulates TGW and grain width in wheat, was located within the finely mapped interval of QTgw/Gns.cau-6A. A novel and rare TaGW2-A1 allele with a 114-bp deletion in the 5' flanking region was identified in the parent with higher TGW, and it reduced TaGW2-A1 promoter activity and expression. In conclusion, these results expand our knowledge of the genetic and molecular basis of TGW-GNS trade-offs in wheat. The QTLs and the novel TaGW2-A1 allele are likely useful for the development of cultivars with higher TGW and/or higher GNS.
Safety analysis of a Russian phage cocktail: From MetaGenomic analysis to oral application in healthy human subjects

DOE Office of Scientific and Technical Information (OSTI.GOV)

McCallin, Shawna, E-mail: semccallin@yahoo.com; Alam Sarker, Shafiqul, E-mail: sasarker@icddrb.org; Barretto, Caroline, E-mail: Caroline.Barretto@rdls.nestle.com

Phage therapy has a long tradition in Eastern Europe, where preparations are comprised of complex phage cocktails whose compositions have not been described. We investigated the composition of a phage cocktail from the Russian pharmaceutical company Microgen targeting Escherichia coli/Proteus infections. Electron microscopy identified six phage types, with numerically T7-like phages dominating over T4-like phages. A metagenomic approach using taxonomical classification, reference mapping and de novo assembly identified 18 distinct phage types, including 7 genera of Podoviridae, 2 established and 2 proposed genera of Myoviridae, and 2 genera of Siphoviridae. De novo assembly yielded 7 contigs greater than 30 kb, including a 147-kb Myovirus genome and a 42-kb genome of a potentially new phage. Bioinformatic analysis did not reveal undesired genes and a small human volunteer trial did not associate adverse effects with oral phage exposure. Highlights: • We analyzed the composition of a commercial Russian phage cocktail. • The cocktail consists of at least 10 different phage genera. • No undesired genes were detected. • No adverse effects were seen upon oral application in a small human clinical trial.
Rapid Y degeneration and dosage compensation in plant sex chromosomes

PubMed Central

Papadopulos, Alexander S. T.; Chester, Michael; Ridout, Kate; Filatov, Dmitry A.

2015-01-01

The nonrecombining regions of animal Y chromosomes are known to undergo genetic degeneration, but previous work has failed to reveal large-scale gene degeneration on plant Y chromosomes. Here, we uncover rapid and extensive degeneration of Y-linked genes in a plant species, Silene latifolia, that evolved sex chromosomes de novo in the last 10 million years. Previous transcriptome-based studies of this species missed unexpressed, degenerate Y-linked genes. To identify sex-linked genes, regardless of their expression, we sequenced male and female genomes of S. latifolia and integrated the genomic contigs with a high-density genetic map. This revealed that 45% of Y-linked genes are not expressed, and 23% are interrupted by premature stop codons. This contrasts with X-linked genes, in which only 1.3% of genes contained stop codons and 4.3% of genes were not expressed in males. Loss of functional Y-linked genes is partly compensated for by gene-specific up-regulation of X-linked genes. Our results demonstrate that the rate of genetic degeneration of Y-linked genes in S. latifolia is as fast as in animals, and that the evolutionary trajectories of sex chromosomes are similar in the two kingdoms. PMID:26438872
Megabase sequencing of human genome by ordered-shotgun-sequencing (OSS) strategy

NASA Astrophysics Data System (ADS)

Chen, Ellson Y.

1997-05-01

So far we have used the OSS strategy to sequence over 2 megabases of DNA in large-insert clones from regions of human X chromosomes with different characteristic levels of GC content. The method starts by randomly fragmenting a BAC, YAC or PAC to 8-12 kb pieces and subcloning those into lambda phage. Insert-ends of these clones are sequenced and overlapped to create a partial map. Complete sequencing is then done on a minimal tiling path of selected subclones, recursively focusing on those at the edges of contigs to facilitate mergers of clones across the entire target. To reduce manual labor, PCR processes have been adapted to prepare sequencing templates throughout the entire operation. The streamlined process can thus lend itself to further automation. The OSS approach is suitable for large-scale genomic sequencing, providing considerable flexibility in the choice of subclones or regions for more or less intensive sequencing. For example, subclones containing contaminating host cell DNA or cloning vector can be recognized and ignored with minimal sequencing effort; regions overlapping a neighboring clone already sequenced need not be redone; and segments containing tandem repeats or long repetitive sequences can be spotted early on and targeted for additional attention.
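Choosing a minimal tiling path, as described in this record, can be illustrated with the textbook greedy interval-covering algorithm: among the subclones that start at or before the position covered so far, pick the one that extends farthest, and repeat. This is a swapped-in teaching sketch, not necessarily the procedure the author used. It assumes subclone start and end coordinates are already known, whereas the abstract derives overlaps from insert-end sequences. The function name and the toy coordinates are hypothetical.

```python
def minimal_tiling_path(clones, target_start, target_end):
    """Greedy minimum set of clones covering [target_start, target_end].

    `clones` is a list of (name, start, end) tuples on one coordinate axis.
    Raises ValueError if the clones leave a gap in the target.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])  # by start position
    path = []
    covered = target_start
    i = 0
    while covered < target_end:
        best = None
        # Examine every clone that starts within the region covered so far
        # and remember the one reaching farthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in coverage at position {covered}")
        path.append(best)
        covered = best[2]
    return path


# Hypothetical subclones (coordinates in kb) across a 30-kb target.
toy_clones = [
    ("A", 0, 10), ("B", 2, 9), ("C", 8, 20),
    ("D", 9, 18), ("E", 19, 30), ("F", 15, 28),
]
print([name for name, _, _ in minimal_tiling_path(toy_clones, 0, 30)])
```

In this toy case three of the six subclones (A, C and E) suffice to span the target, so the other three would not need complete sequencing.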
The photoreceptor cell-specific nuclear receptor gene (PNR) accounts for retinitis pigmentosa in the Crypto-Jews from Portugal (Marranos), survivors from the Spanish Inquisition.

PubMed

Gerber, S; Rozet, J M; Takezawa, S I; dos Santos, L C; Lopes, L; Gribouval, O; Penet, C; Perrault, I; Ducroq, D; Souied, E; Jeanpierre, M; Romana, S; Frézal, J; Ferraz, F; Yu-Umesono, R; Munnich, A; Kaplan, J

2000-09-01

The last Crypto-Jews (Marranos) are the survivors of Spanish Jews who were persecuted in the late fifteenth century, escaped to Portugal and were forced to convert to save their lives. Isolated groups still exist in mountainous areas such as Belmonte in the Beira-Baixa province of Portugal. We report here the genetic study of a highly consanguineous endogamic population of Crypto-Jews of Belmonte affected with autosomal recessive retinitis pigmentosa (RP). A genome-wide search for homozygosity allowed us to localize the disease gene to chromosome 15q22-q24 (Zmax=2.95 at theta=0 at the D15S131 locus). Interestingly, the photoreceptor cell-specific nuclear receptor (PNR) gene, the expression of which is restricted to the outer nuclear layer of retinal photoreceptor cells, was found to map to the YAC contig encompassing the disease locus. A search for mutations allowed us to ascribe the RP of Crypto-Jews of Belmonte to a homozygous missense mutation in the PNR gene. Preliminary haplotype studies support the view that this mutation is relatively ancient but probably occurred after the population settled in Belmonte.
Initial sequence and comparative analysis of the cat genome

PubMed Central

Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.

2007-01-01

The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172
Cloning, genomic organization, and chromosomal localization of human citrate transport protein to the DiGeorge/velocardiofacial syndrome minimal critical region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Goldmuntz, E.; Budarf, M.L.; Wang, Zhili

1996-04-15

DiGeorge syndrome (DGS) and velocardiofacial syndrome have been shown to be associated with microdeletions of chromosomal region 22q11. More recently, patients with conotruncal anomaly face syndrome and some nonsyndromic patients with isolated forms of conotruncal cardiac defects have been found to have 22q11 microdeletions as well. The commonly deleted region, called the DiGeorge chromosomal region (DGCR), spans approximately 1.2 Mb and is estimated to contain at least 30 genes. We report a computational approach for gene identification that makes use of large-scale sequencing of cosmids from a contig spanning the DGCR. Using this methodology, we have mapped the human homolog of a rodent citrate transport protein to the DGCR. We have isolated a partial cDNA containing the complete open reading frame and have determined the genomic structure by comparing the genomic sequence from the cosmid to the sequence of the cDNA clone. Whether the citrate transport protein can be implicated in the biological etiology of DGS or other 22q11 microdeletion syndromes remains to be defined. 36 refs., 3 figs., 1 tab.
Genetic and molecular characterization of the maize rp3 rust resistance locus.

PubMed Central

Webb, Craig A; Richter, Todd E; Collins, Nicholas C; Nicolas, Marie; Trick, Harold N; Pryor, Tony; Hulbert, Scot H

2002-01-01

In maize, the Rp3 gene confers resistance to common rust caused by Puccinia sorghi. Flanking marker analysis of rust-susceptible rp3 variants suggested that most of them arose via unequal crossing over, indicating that rp3 is a complex locus like rp1. The PIC13 probe identifies a nucleotide binding site-leucine-rich repeat (NBS-LRR) gene family that maps to the complex. Rp3 variants show losses of PIC13 family members relative to the resistant parents when probed with PIC13, indicating that the Rp3 gene is a member of this family. Gel blots and sequence analysis suggest that at least 9 family members are at the locus in most Rp3-carrying lines and that at least 5 of these are transcribed in the Rp3-A haplotype. The coding regions of 14 family members, isolated from three different Rp3-carrying haplotypes, had DNA sequence identities from 93 to 99%. Partial sequencing of clones of a BAC contig spanning the rp3 locus in the maize inbred line B73 identified five different PIC13 paralogues in a region of approximately 140 kb. PMID:12242248
Draft Genome Sequence of Gordonia sp. Strain UCD-TK1 (Phylum Actinobacteria)

PubMed Central

Koenigsaecker, Tynisha M.; Coil, David A.

2016-01-01

Here, we present the draft genome of Gordonia sp. strain UCD-TK1. The assembly contains 5,470,576 bp in 98 contigs. This strain was isolated from a disinfected ambulatory surgery center. PMID:27738036
Mapping quantum-classical Liouville equation: projectors and trajectories.

PubMed

Kelly, Aaron; van Zon, Ramses; Schofield, Jeremy; Kapral, Raymond

2012-02-28

The evolution of a mixed quantum-classical system is expressed in the mapping formalism where discrete quantum states are mapped onto oscillator states, resulting in a phase space description of the quantum degrees of freedom. By defining projection operators onto the mapping states corresponding to the physical quantum states, it is shown that the mapping quantum-classical Liouville operator commutes with the projection operator so that the dynamics is confined to the physical space. It is also shown that a trajectory-based solution of this equation can be constructed that requires the simulation of an ensemble of entangled trajectories. An approximation to this evolution equation which retains only the Poisson bracket contribution to the evolution operator does admit a solution in an ensemble of independent trajectories but it is shown that this operator does not commute with the projection operators and the dynamics may take the system outside the physical space. The dynamical instabilities, utility, and domain of validity of this approximate dynamics are discussed. The effects are illustrated by simulations on several quantum systems.
Designing quantum information processing via structural physical approximation.

PubMed

Bae, Joonwoo

2017-10-01

In quantum information processing it may be possible to have efficient computation and secure communication beyond the limitations of classical systems. From a fundamental point of view, however, evolution of quantum systems by the laws of quantum mechanics is more restrictive than classical systems, identified to a specific form of dynamics, that is, unitary transformations and, consequently, positive and completely positive maps to subsystems. This also characterizes classes of disallowed transformations on quantum systems, among which positive but not completely positive maps are of particular interest as they characterize entangled states, a general resource in quantum information processing. Structural physical approximation offers a systematic way of approximating those non-physical maps, positive but not completely positive maps, with quantum channels. Since it has been proposed as a method of detecting entangled states, it has stimulated fundamental problems on classifications of positive maps and the structure of Hermitian operators and quantum states, as well as on quantum measurement such as quantum design in quantum information theory. It has developed efficient and feasible methods of directly detecting entangled states in practice, for which proof-of-principle experimental demonstrations have also been performed with photonic qubit states. Here, we present a comprehensive review on quantum information processing with structural physical approximations and the related progress. The review mainly focuses on properties of structural physical approximations and their applications toward practical information applications.
Designing quantum information processing via structural physical approximation

NASA Astrophysics Data System (ADS)

Bae, Joonwoo

2017-10-01

In quantum information processing it may be possible to have efficient computation and secure communication beyond the limitations of classical systems. From a fundamental point of view, however, evolution of quantum systems by the laws of quantum mechanics is more restrictive than classical systems, identified to a specific form of dynamics, that is, unitary transformations and, consequently, positive and completely positive maps to subsystems. This also characterizes classes of disallowed transformations on quantum systems, among which positive but not completely positive maps are of particular interest as they characterize entangled states, a general resource in quantum information processing. Structural physical approximation offers a systematic way of approximating those non-physical maps, positive but not completely positive maps, with quantum channels. Since it has been proposed as a method of detecting entangled states, it has stimulated fundamental problems on classifications of positive maps and the structure of Hermitian operators and quantum states, as well as on quantum measurement such as quantum design in quantum information theory. It has developed efficient and feasible methods of directly detecting entangled states in practice, for which proof-of-principle experimental demonstrations have also been performed with photonic qubit states. Here, we present a comprehensive review on quantum information processing with structural physical approximations and the related progress. The review mainly focuses on properties of structural physical approximations and their applications toward practical information applications.
Live cell refractometry using Hilbert phase microscopy and confocal reflectance microscopy.

PubMed

Lue, Niyom; Choi, Wonshik; Popescu, Gabriel; Yaqoob, Zahid; Badizadegan, Kamran; Dasari, Ramachandra R; Feld, Michael S

2009-11-26

Quantitative chemical analysis has served as a useful tool for understanding cellular metabolisms in biology. Among many physical properties used in chemical analysis, refractive index in particular has provided molecular concentration that is an important indicator for biological activities. In this report, we present a method of extracting full-field refractive index maps of live cells in their native states. We first record full-field optical thickness maps of living cells by Hilbert phase microscopy and then acquire physical thickness maps of the same cells using a custom-built confocal reflectance microscope. Full-field and axially averaged refractive index maps are acquired from the ratio of optical thickness to physical thickness. The accuracy of the axially averaged index measurement is 0.002. This approach can provide novel biological assays of label-free living cells in situ.
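The ratio described in the abstract above can be written out per pixel. The sketch below is only an illustration under stated assumptions, not the authors' code: the two maps are taken to be co-registered 2D grids in the same length unit, the function name `refractive_index_map` is hypothetical, and the optional `n_medium` offset is an assumption for the case where the optical thickness was recorded relative to the surrounding medium (the abstract itself mentions only the plain ratio).

```python
def refractive_index_map(optical_thickness, physical_thickness, n_medium=0.0):
    """Axially averaged refractive index per pixel.

    optical_thickness, physical_thickness: 2D lists (rows of floats) of equal
    shape and in the same length unit. With n_medium=0.0 the result is the
    plain ratio named in the abstract; a non-zero n_medium is an assumed
    offset for optical thickness measured relative to the medium.
    Pixels with zero physical thickness (no cell) are returned as None.
    """
    index_map = []
    for ot_row, h_row in zip(optical_thickness, physical_thickness):
        row = []
        for ot, h in zip(ot_row, h_row):
            row.append(None if h == 0 else n_medium + ot / h)
        index_map.append(row)
    return index_map


# Illustrative values in micrometres, not taken from the record.
optical = [[1.36, 2.72], [0.0, 0.0]]
physical = [[1.0, 2.0], [0.0, 0.0]]
print(refractive_index_map(optical, physical))
```

Because the division collapses the axial dimension, each pixel holds an average over the cell's thickness at that point, which matches the "axially averaged" wording in the abstract.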
Live Cell Refractometry Using Hilbert Phase Microscopy and Confocal Reflectance Microscopy

PubMed Central

Lue, Niyom; Choi, Wonshik; Popescu, Gabriel; Yaqoob, Zahid; Badizadegan, Kamran; Dasari, Ramachandra R.; Feld, Michael S.

2010-01-01

Quantitative chemical analysis has served as a useful tool for understanding cellular metabolisms in biology. Among many physical properties used in chemical analysis, refractive index in particular has provided molecular concentration that is an important indicator for biological activities. In this report, we present a method of extracting full-field refractive index maps of live cells in their native states. We first record full-field optical thickness maps of living cells by Hilbert phase microscopy and then acquire physical thickness maps of the same cells using a custom-built confocal reflectance microscope. Full-field and axially averaged refractive index maps are acquired from the ratio of optical thickness to physical thickness. The accuracy of the axially averaged index measurement is 0.002. This approach can provide novel biological assays of label-free living cells in situ. PMID:19803506
Theobroma cacao: A genetically integrated physical map and genome-scale comparative synteny analysis

USDA-ARS's Scientific Manuscript database

A comprehensive integrated genomic framework is considered a centerpiece of genomic research. In collaboration with the USDA-ARS (SHRS) and Mars Inc., the Clemson University Genomics Institute (CUGI) has developed a genetically anchored physical map of the T. cacao genome. Three BAC libraries contai...
Link Maps and Map Meetings: Scaffolding Student Learning

ERIC Educational Resources Information Center

Lindstrom, Christine; Sharma, Manjula D.

2009-01-01

With student numbers decreasing and traditional teaching methods having been found inefficient, it is widely accepted that alternative teaching methods need to be explored in tertiary physics education. In 2006 a different teaching environment was offered to 244 first year students with little or no prior formal instruction in physics. Students…
Systematic mapping review of the factors influencing physical activity and sedentary behaviour in ethnic minority groups in Europe: a DEDIPAC study.

PubMed

Langøien, Lars Jørun; Terragni, Laura; Rugseth, Gro; Nicolaou, Mary; Holdsworth, Michelle; Stronks, Karien; Lien, Nanna; Roos, Gun

2017-07-24

Physical activity and sedentary behaviour are associated with health and wellbeing. Studies indicate that ethnic minority groups are both less active and more sedentary than the majority population and that factors influencing these behaviours may differ. Mapping the factors influencing physical activity and sedentary behaviour among ethnic minority groups living in Europe can help to identify determinants of physical activity and sedentary behaviour, research gaps and guide future research. A systematic mapping review was conducted to map the factors associated with physical activity and sedentary behaviour among ethnic minority groups living in Europe (protocol PROSPERO ID = CRD42014014575). Six databases were searched for quantitative and qualitative research published between 1999 and 2014. In synthesizing the findings, all factors were sorted and structured into clusters following a data driven approach and concept mapping. Sixty-three articles were identified out of 7794 returned by the systematic search. These included 41 quantitative and 22 qualitative studies. Of these 58 focused on physical activity, 5 on both physical activity and sedentary behaviour and none focused on sedentary behaviour. The factors associated with physical activity and sedentary behaviour were grouped into eight clusters. Social & cultural environment (n = 55) and Psychosocial (39) were the clusters containing most factors, followed by Physical environment & accessibility (33), Migration context (15), Institutional environment (14), Social & material resources (12), Health and health communication (12), Political environment (3). An important finding was that cultural and religious issues, in particular those related to gender issues, were recurring factors across the clusters. Physical activity and sedentary behaviour among ethnic minority groups living in Europe are influenced by a wide variety of factors, especially informed by qualitative studies. 
More comparative studies are needed as well as inclusion of a wider spectrum of the diverse ethnic minority groups resettled in different European countries. Few studies have investigated factors influencing sedentary behaviour. It is important in the future to address specific factors influencing physical activity and sedentary behaviour among different ethnic minority groups in order to plan and implement effective interventions.
