Sample records for analyzing deep sequencing

  1. Low-Latency Telerobotic Sample Return and Biomolecular Sequencing for Deep Space Gateway

    NASA Astrophysics Data System (ADS)

    Lupisella, M.; Bleacher, J.; Lewis, R.; Dworkin, J.; Wright, M.; Burton, A.; Rubins, K.; Wallace, S.; Stahl, S.; John, K.; Archer, D.; Niles, P.; Regberg, A.; Smith, D.; Race, M.; Chiu, C.; Russell, J.; Rampe, E.; Bywaters, K.

    2018-02-01

    Low-latency telerobotics, crew-assisted sample return, and biomolecular sequencing can be used to acquire and analyze lunar farside and/or Apollo landing site samples. Sequencing can also be used to monitor and study the Deep Space Gateway environment and crew health.

  2. DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.

    PubMed

    Yang, Jian-Hua; Qu, Liang-Hu

    2012-01-01

    Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explore the expression patterns of miRNAs and other ncRNAs, and discover novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.

  3. Geoseq: a tool for dissecting deep-sequencing datasets.

    PubMed

    Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi

    2010-10-12

    Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries, along with the mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
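
    The tiling idea in the Geoseq abstract can be illustrated with a small sketch. It is not Geoseq's implementation: the function names, the tile length, the step size, and the naive suffix-array construction are all assumptions chosen for readability. The sketch cuts a reference sequence into tiles and counts how often each tile occurs in a read library by binary search over a suffix array of the concatenated reads.

```python
def build_suffix_array(text):
    """Naive suffix array: start positions sorted by suffix (fine for toy inputs)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, pattern):
    """Count suffixes that start with `pattern` using two binary searches."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


def tiles(seq, k, step=1):
    """Cut `seq` into windows of length k, moving `step` bases at a time."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def tile_coverage(reference, reads, k=4, step=1):
    """Return (tile, count) pairs; '$' separates reads so no tile spans two reads."""
    text = "$".join(reads)
    sa = build_suffix_array(text)
    return [(t, count_occurrences(text, sa, t)) for t in tiles(reference, k, step)]


if __name__ == "__main__":
    library = ["ACGTAC", "GTACGG", "TTTT"]
    print(tile_coverage("ACGTACGG", library, k=4, step=2))
    # prints [('ACGT', 1), ('GTAC', 2), ('ACGG', 1)]
```

    A production tool would build the suffix array once per library with a linear-time or disk-backed construction; the sorted-slices version here is only meant to make the lookup logic visible.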

  4. A robust and cost-effective approach to sequence and analyze complete genomes of small RNA viruses

    USDA-ARS's Scientific Manuscript database

    Background: Next-generation sequencing (NGS) allows ultra-deep sequencing of nucleic acids. The use of sequence-independent amplification of viral nucleic acids without utilization of target-specific primers provides advantages over traditional sequencing methods and allows detection of unsuspected ...

  5. Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes.

    PubMed

    Burkholder, William F; Newell, Evan W; Poidinger, Michael; Chen, Swaine; Fink, Katja

    2017-01-01

    The inaugural workshop "Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes" was held in Singapore on 13-14 October 2016. The aim of the workshop was to discuss the latest trends in using high-throughput sequencing, bioinformatics, and allied technologies to analyze immune and pathogen repertoires and their interplay within the host, bringing together key international players in the field and Singapore-based researchers and clinician-scientists. The focus was in particular on the application of these technologies for the improvement of patient diagnosis, prognosis and treatment, and for other broad public health outcomes. The presentations by scientists and clinicians showed the potential of deep sequencing technology to capture the coevolution of adaptive immunity and pathogens. For clinical applications, some key challenges remain, such as the long turnaround time and relatively high cost of deep sequencing for pathogen identification and characterization and the lack of international standardization in immune repertoire analysis.

  6. Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes

    PubMed Central

    Burkholder, William F.; Newell, Evan W.; Poidinger, Michael; Chen, Swaine; Fink, Katja

    2017-01-01

    The inaugural workshop “Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes” was held in Singapore on 13–14 October 2016. The aim of the workshop was to discuss the latest trends in using high-throughput sequencing, bioinformatics, and allied technologies to analyze immune and pathogen repertoires and their interplay within the host, bringing together key international players in the field and Singapore-based researchers and clinician-scientists. The focus was in particular on the application of these technologies for the improvement of patient diagnosis, prognosis and treatment, and for other broad public health outcomes. The presentations by scientists and clinicians showed the potential of deep sequencing technology to capture the coevolution of adaptive immunity and pathogens. For clinical applications, some key challenges remain, such as the long turnaround time and relatively high cost of deep sequencing for pathogen identification and characterization and the lack of international standardization in immune repertoire analysis. PMID:28620372

  7. Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq

    PubMed Central

    Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

    2015-01-01

    Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
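
    To make the ">1%-frequency" criterion concrete, the sketch below reports non-majority bases that reach a frequency threshold at each position of a set of already aligned, equal-length reads. It is a simplified illustration, not the authors' pipeline: the function name, the gap symbol "-", and the default threshold are assumptions, and no error correction is performed.

```python
from collections import Counter


def minority_variants(aligned_reads, min_freq=0.01):
    """Return (position, base, frequency) for non-majority bases at or above min_freq.

    Assumes all reads are aligned to the same coordinates and have equal length;
    '-' marks a gap and is excluded from the depth at that position.
    """
    results = []
    for pos, column in enumerate(zip(*aligned_reads)):
        counts = Counter(base for base in column if base != "-")
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage at this position
        major_base, _ = counts.most_common(1)[0]
        for base, n in sorted(counts.items()):
            freq = n / depth
            if base != major_base and freq >= min_freq:
                results.append((pos, base, freq))
    return results


if __name__ == "__main__":
    reads = ["ACGT"] * 97 + ["ACAT"] * 3
    print(minority_variants(reads, min_freq=0.01))
    # prints [(2, 'A', 0.03)]
```

    In practice the threshold has to sit above the residual per-base error rate of the platform, which is why the abstract stresses reducing erroneous base prevalence below 1% before calling minority variants.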

  8. An Efficient Strategy of Screening for Pathogens in Wild-Caught Ticks and Mosquitoes by Reusing Small RNA Deep Sequencing Data

    PubMed Central

    An, Xiaoping; Fan, Hang; Ma, Maijuan; Anderson, Benjamin D.; Jiang, Jiafu; Liu, Wei; Cao, Wuchun; Tong, Yigang

    2014-01-01

    This paper explored our hypothesis that the sRNA (18∼30 bp) deep sequencing technique can be used as an efficient strategy to identify microorganisms other than viruses, such as prokaryotic and eukaryotic pathogens. In the study, the clean reads derived from the sRNA deep sequencing data of wild-caught ticks and mosquitoes were compared against the NCBI nucleotide collection (non-redundant nt database) using Blastn. The blast results were then analyzed with in-house Python scripts. An empirical formula was proposed to identify the putative pathogens. Results showed that not only viruses but also prokaryotic and eukaryotic species of interest can be screened out and were subsequently confirmed with experiments. Specifically, a novel Rickettsia spp. was indicated to exist in Haemaphysalis longicornis ticks collected in Beijing. Our study demonstrated that the reuse of sRNA deep sequencing data would have the potential to trace the origin of pathogens or discover novel agents of emerging/re-emerging infectious diseases. PMID:24618575
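
    The abstract does not give the in-house scripts or the empirical formula, so the following is only a hypothetical sketch of the kind of post-processing such scripts might do. It assumes BLAST's 12-column tabular output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and tallies, per subject sequence, how many reads have it as their best-scoring hit. The identity and length thresholds are illustrative values, not the paper's.

```python
import csv
import io
from collections import Counter


def best_hit_counts(tabular_text, min_identity=95.0, min_length=18):
    """Count reads per subject, using each read's highest-bitscore passing hit."""
    best = {}  # read id -> (subject id, bitscore)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        identity, length, bitscore = float(row[2]), int(row[3]), float(row[11])
        if identity < min_identity or length < min_length:
            continue  # hypothetical quality filter
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return Counter(subject for subject, _ in best.values())
```

    Subjects with many supporting reads would then be candidates for follow-up; mapping subject identifiers to taxa and applying any scoring formula are left out because the source does not specify them.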

  9. Deep sequencing of hepatitis C virus hypervariable region 1 reveals no correlation between genetic heterogeneity and antiviral treatment outcome

    PubMed Central

    2014-01-01

    Background: Hypervariable region 1 (HVR1) contained within the envelope protein 2 (E2) gene is the most variable part of the HCV genome and its translation product is a major target for the host immune response. Variability within HVR1 may facilitate evasion of the immune response and could affect treatment outcome. The aim of the study was to analyze the impact of HVR1 heterogeneity, employing sensitive ultra-deep sequencing, on the outcome of PEG-IFN-α (pegylated interferon α) and ribavirin treatment. Methods: HVR1 sequences were amplified from pretreatment serum samples of 25 patients infected with genotype 1b HCV (12 responders and 13 non-responders) and were subjected to pyrosequencing (GS Junior, 454/Roche). Reads were corrected for sequencing error using ShoRAH software, while population reconstruction was done using three different minimal variant frequency cut-offs of 1%, 2% and 5%. Statistical analysis was done using Mann–Whitney and Fisher’s exact tests. Results: Complexity, Shannon entropy, nucleotide diversity per site, genetic distance and the number of genetic substitutions were not significantly different between responders and non-responders, when analyzing viral populations at any of the three frequencies (≥1%, ≥2% and ≥5%). When a clonal sample was used to determine pyrosequencing error, 4% of reads were found to be incorrect and the most abundant variant was present at a frequency of 1.48%. Use of ShoRAH reduced the sequencing error to 1%, with the most abundant erroneous variant present at a frequency of 0.5%. Conclusions: While deep sequencing revealed complex genetic heterogeneity of HVR1 in chronic hepatitis C patients, there was no correlation between treatment outcome and any of the analyzed quasispecies parameters. PMID:25016390
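
    As a worked illustration of one of the quasispecies parameters named above, the sketch below computes Shannon entropy (natural logarithm) over variant frequencies after applying a minimal frequency cut-off. The function name is hypothetical, and renormalizing the retained variants so their frequencies sum to 1 is an assumption of this sketch; the abstract does not say how the authors handled discarded variants.

```python
import math


def shannon_entropy(variant_counts, min_freq=0.0):
    """Shannon entropy (in nats) of variants whose frequency is >= min_freq.

    variant_counts maps a variant label to its read count. Variants below the
    cut-off are dropped and the remainder renormalized (an assumption).
    """
    total = sum(variant_counts.values())
    if total == 0:
        return 0.0
    kept = [n for n in variant_counts.values() if n > 0 and n / total >= min_freq]
    kept_total = sum(kept)
    entropy = 0.0
    for n in kept:
        p = n / kept_total
        entropy -= p * math.log(p)
    return entropy


if __name__ == "__main__":
    population = {"v1": 900, "v2": 80, "v3": 15, "v4": 5}
    for cutoff in (0.01, 0.02, 0.05):  # the three cut-offs used in the study
        print(cutoff, round(shannon_entropy(population, cutoff), 4))
```

    Raising the cut-off removes rare variants and therefore tends to lower the measured entropy, which is why the study reports results at each of the three thresholds.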

  10. Microbial Diversity in Deep-sea Methane Seep Sediments Presented by SSU rRNA Gene Tag Sequencing

    PubMed Central

    Nunoura, Takuro; Takaki, Yoshihiro; Kazama, Hiromi; Hirai, Miho; Ashi, Juichiro; Imachi, Hiroyuki; Takai, Ken

    2012-01-01

    Microbial community structures in methane seep sediments in the Nankai Trough were analyzed by tag-sequencing analysis for the small subunit (SSU) rRNA gene using a newly developed primer set. The dominant members of Archaea were Deep-sea Hydrothermal Vent Euryarchaeotic Group 6 (DHVEG 6), Marine Group I (MGI) and Deep Sea Archaeal Group (DSAG), and those in Bacteria were Alpha-, Gamma-, Delta- and Epsilonproteobacteria, Chloroflexi, Bacteroidetes, Planctomycetes and Acidobacteria. Diversity and richness were examined by 8,709 and 7,690 tag-sequences from sediments at 5 and 25 cm below the seafloor (cmbsf), respectively. The estimated diversity and richness in the methane seep sediment are as high as those in soil and deep-sea hydrothermal environments, although the tag-sequences obtained in this study were not sufficient to show whole microbial diversity in this analysis. We also compared the diversity and richness of each taxon/division between the sediments from the two depths, and found that the diversity and richness of some taxa/divisions varied significantly along with the depth. PMID:22510646

  11. Phylogenetic and Genome-Wide Deep-Sequencing Analyses of Canine Parvovirus Reveal Co-Infection with Field Variants and Emergence of a Recent Recombinant Strain

    PubMed Central

    Pérez, Ruben; Calleros, Lucía; Marandino, Ana; Sarute, Nicolás; Iraola, Gregorio; Grecco, Sofia; Blanc, Hervé; Vignuzzi, Marco; Isakov, Ofer; Shomron, Noam; Carrau, Lucía; Hernández, Martín; Francia, Lourdes; Sosa, Katia; Tomás, Gonzalo; Panzera, Yanina

    2014-01-01

    Canine parvovirus (CPV), a fast-evolving single-stranded DNA virus, comprises three antigenic variants (2a, 2b, and 2c) with different frequencies and genetic variability among countries. The contribution of co-infection and recombination to the genetic variability of CPV is far from being fully elucidated. Here we took advantage of a natural CPV population, recently formed by the convergence of divergent CPV-2c and CPV-2a strains, to study co-infection and recombination. Complete sequences of the viral coding region of CPV-2a and CPV-2c strains from 40 samples were generated and analyzed using phylogenetic tools. Two samples showed co-infection and were further analyzed by deep sequencing. The sequence profile of one of the samples revealed the presence of CPV-2c and CPV-2a strains that differed at 29 nucleotides. The other sample included a minor CPV-2a strain (13.3% of the viral population) and a major recombinant strain (86.7%). The recombinant strain arose from inter-genotypic recombination between CPV-2c and CPV-2a strains within the VP1/VP2 gene boundary. Our findings highlight the importance of deep-sequencing analysis to provide a better understanding of CPV molecular diversity. PMID:25365348

  12. Emergent HIV-1 Drug Resistance Mutations Were Not Present at Low-Frequency at Baseline in Non-Nucleoside Reverse Transcriptase Inhibitor-Treated Subjects in the STaR Study

    PubMed Central

    Porter, Danielle P.; Daeumer, Martin; Thielen, Alexander; Chang, Silvia; Martin, Ross; Cohen, Cal; Miller, Michael D.; White, Kirsten L.

    2015-01-01

    At Week 96 of the Single-Tablet Regimen (STaR) study, more treatment-naïve subjects that received rilpivirine/emtricitabine/tenofovir DF (RPV/FTC/TDF) developed resistance mutations compared to those treated with efavirenz (EFV)/FTC/TDF by population sequencing. Furthermore, more RPV/FTC/TDF-treated subjects with baseline HIV-1 RNA >100,000 copies/mL developed resistance compared to subjects with baseline HIV-1 RNA ≤100,000 copies/mL. Here, deep sequencing was utilized to assess the presence of pre-existing low-frequency variants in subjects with and without resistance development in the STaR study. Deep sequencing (Illumina MiSeq) was performed on baseline and virologic failure samples for all subjects analyzed for resistance by population sequencing during the clinical study (n = 33), as well as baseline samples from control subjects with virologic response (n = 118). Primary NRTI or NNRTI drug resistance mutations present at low frequency (≥2% to 20%) were detected in 6.6% of baseline samples by deep sequencing, all of which occurred in control subjects. Deep sequencing results were generally consistent with population sequencing but detected additional primary NNRTI and NRTI resistance mutations at virologic failure in seven samples. HIV-1 drug resistance mutations emerging while on RPV/FTC/TDF or EFV/FTC/TDF treatment were not present at low frequency at baseline in the STaR study. PMID:26690199

  13. Emergent HIV-1 Drug Resistance Mutations Were Not Present at Low-Frequency at Baseline in Non-Nucleoside Reverse Transcriptase Inhibitor-Treated Subjects in the STaR Study.

    PubMed

    Porter, Danielle P; Daeumer, Martin; Thielen, Alexander; Chang, Silvia; Martin, Ross; Cohen, Cal; Miller, Michael D; White, Kirsten L

    2015-12-07

    At Week 96 of the Single-Tablet Regimen (STaR) study, more treatment-naïve subjects that received rilpivirine/emtricitabine/tenofovir DF (RPV/FTC/TDF) developed resistance mutations compared to those treated with efavirenz (EFV)/FTC/TDF by population sequencing. Furthermore, more RPV/FTC/TDF-treated subjects with baseline HIV-1 RNA >100,000 copies/mL developed resistance compared to subjects with baseline HIV-1 RNA ≤100,000 copies/mL. Here, deep sequencing was utilized to assess the presence of pre-existing low-frequency variants in subjects with and without resistance development in the STaR study. Deep sequencing (Illumina MiSeq) was performed on baseline and virologic failure samples for all subjects analyzed for resistance by population sequencing during the clinical study (n = 33), as well as baseline samples from control subjects with virologic response (n = 118). Primary NRTI or NNRTI drug resistance mutations present at low frequency (≥2% to 20%) were detected in 6.6% of baseline samples by deep sequencing, all of which occurred in control subjects. Deep sequencing results were generally consistent with population sequencing but detected additional primary NNRTI and NRTI resistance mutations at virologic failure in seven samples. HIV-1 drug resistance mutations emerging while on RPV/FTC/TDF or EFV/FTC/TDF treatment were not present at low frequency at baseline in the STaR study.

  14. Comprehensive discovery of noncoding RNAs in acute myeloid leukemia cell transcriptomes.

    PubMed

    Zhang, Jin; Griffith, Malachi; Miller, Christopher A; Griffith, Obi L; Spencer, David H; Walker, Jason R; Magrini, Vincent; McGrath, Sean D; Ly, Amy; Helton, Nichole M; Trissal, Maria; Link, Daniel C; Dang, Ha X; Larson, David E; Kulkarni, Shashikant; Cordes, Matthew G; Fronick, Catrina C; Fulton, Robert S; Klco, Jeffery M; Mardis, Elaine R; Ley, Timothy J; Wilson, Richard K; Maher, Christopher A

    2017-11-01

    To detect diverse and novel RNA species comprehensively, we compared deep small RNA and RNA sequencing (RNA-seq) methods applied to a primary acute myeloid leukemia (AML) sample. We were able to discover previously unannotated small RNAs using deep sequencing of a library method using broader insert size selection. We analyzed the long noncoding RNA (lncRNA) landscape in AML by comparing deep sequencing from multiple RNA-seq library construction methods for the sample that we studied and then integrating RNA-seq data from 179 AML cases. This identified lncRNAs that are completely novel, differentially expressed, and associated with specific AML subtypes. Our study revealed the complexity of the noncoding RNA transcriptome through a combined strategy of strand-specific small RNA and total RNA-seq. This dataset will serve as an invaluable resource for future RNA-based analyses. Copyright © 2017 ISEH – Society for Hematology and Stem Cells. Published by Elsevier Inc. All rights reserved.

  15. A first insight into the occurrence and expression of functional amoA and accA genes of autotrophic and ammonia-oxidizing bathypelagic Crenarchaeota of Tyrrhenian Sea

    NASA Astrophysics Data System (ADS)

    Yakimov, Michail M.; Cono, Violetta La; Denaro, Renata

    2009-05-01

    The autotrophic and ammonia-oxidizing crenarchaeal assemblage at an offshore site located in the deep Mediterranean (Tyrrhenian Sea, depth 3000 m) water was studied by PCR amplification of the key functional genes involved in energy (ammonia mono-oxygenase alpha subunit, amoA) and central metabolism (acetyl-CoA carboxylase alpha subunit, accA). Using two recently annotated genomes of marine crenarchaeons, an initial set of primers targeting archaeal accA-like genes was designed. Approximately 300 clones were analyzed, of which 100% of the amoA library and almost 70% of the accA library were unambiguously related to the corresponding genes from marine Crenarchaeota. Even though the acetyl-CoA carboxylase is phylogenetically not well conserved and the remaining clones were affiliated to various bacterial acetyl-CoA/propionyl-CoA carboxylase genes, the pool of archaeal sequences was applied for development of quantitative PCR analysis of accA-like distribution using TaqMan® methodology. The archaeal accA gene fragments, together with alignable gene fragments from the Sargasso Sea and North Pacific Subtropical Gyre (ALOHA Station) metagenome databases, were analyzed by multiple sequence alignment. Two accA-like sequences, found in ALOHA Station at the depth of 4000 m, formed a deeply branched clade with 64% of all archaeal Tyrrhenian clones. No close relatives for the residual 36% of clones, except for those recovered from the Eastern Mediterranean, were found, suggesting the existence of a specific lineage of the crenarchaeal accA genes in deep Mediterranean water. Alignment of Mediterranean amoA sequences defined four cosmopolitan phylotypes of the Crenarchaeota putative ammonia mono-oxygenase subunit A gene occurring in the water sample from the 3000 m depth. Without exception, all phylotypes fell into the Deep Marine Group I cluster that contains the vast majority of known sequences recovered from the global deep-sea environment. Remarkably, three phylotypes accounted for 91% of all Mediterranean amoA clones and corresponded to the sequences retrieved from the less deep compartments of the world's ocean, most likely reflecting the higher temperature at the depth of the Mediterranean Sea. In order to verify whether these phylotypes might represent important Crenarchaeota in the functioning of the Mediterranean bathypelagic ecosystem, expression of the crenarchaeal amoA gene was monitored by direct RNA retrieval and subsequent analysis of amoA-related mRNA transcripts. Surprisingly, all mRNA-derived sequences formed a tight monophyletic group, which fell into the large Shallow Marine Group I cluster with sequences retrieved from shallow (up to 200 m) waters, sediments and corals. This group was not detected in the DNA-based clone library, evidently due to an overwhelming dominance of the Deep Marine Group I. The failure to recover the amoA transcripts related to Deep Marine Group I of Crenarchaeota was unanticipated and likely resulted from the physiology of these strongly adapted deep-sea organisms. Because all seawater samples were processed on board under atmospheric pressure and sunlight, decompression and/or photoinhibition likely affected their metabolic activity, followed by a strong decay of gene expression.

  16. Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma.

    PubMed

    Wrzeszczynski, Kazimierz O; Frank, Mayu O; Koyama, Takahiko; Rhrissorrakrai, Kahn; Robine, Nicolas; Utro, Filippo; Emde, Anne-Katrin; Chen, Bo-Juen; Arora, Kanika; Shah, Minita; Vacic, Vladimir; Norel, Raquel; Bilal, Erhan; Bergmann, Ewa A; Moore Vogel, Julia L; Bruce, Jeffrey N; Lassman, Andrew B; Canoll, Peter; Grommes, Christian; Harvey, Steve; Parida, Laxmi; Michelini, Vanessa V; Zody, Michael C; Jobanputra, Vaidehi; Royyuru, Ajay K; Darnell, Robert B

    2017-08-01

    To analyze a glioblastoma tumor specimen with 3 different platforms and compare potentially actionable calls from each. Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs. More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed a comparable analysis in a fraction of the time required by the human analysts. The development of an effective human-machine interface in the analysis of deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than currently possible. NCT02725684.

  17. Dissecting genetic and environmental mutation signatures with model organisms.

    PubMed

    Segovia, Romulo; Tam, Annie S; Stirling, Peter C

    2015-08-01

    Deep sequencing has impacted on cancer research by enabling routine sequencing of genomes and exomes to identify genetic changes associated with carcinogenesis. Researchers can now use the frequency, type, and context of all mutations in tumor genomes to extract mutation signatures that reflect the driving mutational processes. Identifying mutation signatures, however, may not immediately suggest a mechanism. Consequently, several recent studies have employed deep sequencing of model organisms exposed to discrete genetic or environmental perturbations. These studies exploit the simpler genomes and availability of powerful genetic tools in model organisms to analyze mutation signatures under controlled conditions, forging mechanistic links between mutational processes and signatures. We discuss the power of this approach and suggest that many such studies may be on the horizon. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. De novo transcriptome assembly and positive selection analysis of an individual deep-sea fish.

    PubMed

    Lan, Yi; Sun, Jin; Xu, Ting; Chen, Chong; Tian, Renmao; Qiu, Jian-Wen; Qian, Pei-Yuan

    2018-05-24

    High hydrostatic pressure and low temperatures make the deep sea a harsh environment for life forms. Actin organization and microtubules assembly, which are essential for intracellular transport and cell motility, can be disrupted by high hydrostatic pressure. High hydrostatic pressure can also damage DNA. Nucleic acids exposed to low temperatures can form secondary structures that hinder genetic information processing. To study how deep-sea creatures adapt to such a hostile environment, one of the most straightforward ways is to sequence and compare their genes with those of their shallow-water relatives. We captured an individual of the fish species Aldrovandia affinis, which is a typical deep-sea inhabitant, from the Okinawa Trough at a depth of 1550 m using a remotely operated vehicle (ROV). We sequenced its transcriptome and analyzed its molecular adaptation. We obtained 27,633 protein coding sequences using an Illumina platform and compared them with those of several shallow-water fish species. Analysis of 4918 single-copy orthologs identified 138 positively selected genes in A. affinis, including genes involved in microtubule regulation. Particularly, functional domains related to cold shock as well as DNA repair are exposed to positive selection pressure in both deep-sea fish and hadal amphipod. Overall, we have identified a set of positively selected genes related to cytoskeleton structures, DNA repair and genetic information processing, which shed light on molecular adaptation to the deep sea. These results suggest that amino acid substitutions of these positively selected genes may contribute crucially to the adaptation of deep-sea animals. Additionally, we provide a high-quality transcriptome of a deep-sea fish for future deep-sea studies.

  19. MRI markers of small vessel disease in lobar and deep hemispheric intracerebral hemorrhage.

    PubMed

    Smith, Eric E; Nandigam, Kaveer R N; Chen, Yu-Wei; Jeng, Jed; Salat, David; Halpin, Amy; Frosch, Matthew; Wendell, Lauren; Fazen, Louis; Rosand, Jonathan; Viswanathan, Anand; Greenberg, Steven M

    2010-09-01

    MRI evidence of small vessel disease is common in intracerebral hemorrhage (ICH). We hypothesized that ICH caused by cerebral amyloid angiopathy (CAA) or hypertensive vasculopathy would have different distributions of MRI T2 white matter hyperintensity (WMH) and microbleeds. Data were analyzed from 133 consecutive patients with primary supratentorial ICH and adequate MRI sequences. CAA was diagnosed using the Boston criteria. WMH segmentation was performed using a validated semiautomated method. WMH and microbleeds were compared according to site of symptomatic hematoma origin (lobar versus deep) or by pattern of hemorrhages, including both hematomas and microbleeds, on MRI gradient recalled echo sequence (grouped as lobar only-probable CAA, lobar only-possible CAA, deep hemispheric only, or mixed lobar and deep hemorrhages). Patients with lobar and deep hemispheric hematoma had similar median normalized WMH volumes (19.5 cm³ versus 19.9 cm³, P=0.74) and prevalence of ≥1 microbleed (54% versus 52%, P=0.99). The supratentorial WMH distribution was similar according to hemorrhage location category; however, the prevalence of brain stem T2 hyperintensity was lower in lobar hematoma versus deep hematoma (54% versus 70%, P=0.004). Mixed ICH was common (23%). Patients with mixed ICH had large normalized WMH volumes and a posterior distribution of cortical hemorrhages similar to that seen in CAA. WMH distribution is largely similar between CAA-related and non-CAA-related ICH. Mixed lobar and deep hemorrhages are seen on MRI gradient recalled echo sequence in up to one fourth of patients; in these patients, both hypertension and CAA may be contributing to the burden of WMH.

  20. Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions

    PubMed Central

    2014-01-01

    Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920
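
    The review does not name a specific method here, so the following is a generic illustration rather than any approach it describes. One common way to frame "low frequency variant versus sequencing error" is a binomial model: if each read shows an erroneous base at a site with an assumed probability, how likely is it to observe at least the seen number of variant reads by error alone? The function name and the error rate are assumptions of this sketch.

```python
import math


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow.

    k: variant-supporting reads, n: read depth, p: assumed per-read error rate
    (must satisfy 0 < p < 1).
    """
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log(1.0 - p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


if __name__ == "__main__":
    depth, error_rate = 1000, 0.01  # illustrative values only
    for variant_reads in (10, 20, 30):
        print(variant_reads, binomial_tail(variant_reads, depth, error_rate))
```

    A small tail probability suggests the variant is unlikely to be explained by error alone under this model. Real callers also account for strand bias, base quality, and multiple testing across sites, none of which this sketch attempts.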

  1. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log2-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
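
    Steps (i) and (ii) of the pipeline described above can be sketched as follows. This is not DSAP's code: the adaptor sequence, the minimum tag length, and the decision to discard tags that are pure homopolymer runs are all hypothetical choices made for illustration. The input is the tab-delimited "tag, copy number" format mentioned in the abstract.

```python
import re
from collections import Counter

ADAPTOR = "TCGTATGCCG"  # hypothetical 3' adaptor, for illustration only
HOMOPOLYMER = re.compile(r"^(A+|C+|G+|T+|N+)$")


def clean_tag(seq, adaptor=ADAPTOR, min_len=15):
    """Step (i): trim from the adaptor onward; drop short or homopolymer tags."""
    idx = seq.find(adaptor)
    if idx != -1:
        seq = seq[:idx]
    if len(seq) < min_len or HOMOPOLYMER.match(seq):
        return None
    return seq


def cluster_tags(tab_text):
    """Step (ii): group identical cleaned tags and sum their copy numbers."""
    clusters = Counter()
    for line in tab_text.splitlines():
        if not line.strip():
            continue
        seq, count = line.split("\t")
        cleaned = clean_tag(seq.strip().upper())
        if cleaned is not None:
            clusters[cleaned] += int(count)
    return clusters


if __name__ == "__main__":
    example = "\n".join([
        "TGAGGTAGTAGGTTGTATAGTT" + ADAPTOR + "\t10",
        "TGAGGTAGTAGGTTGTATAGTT\t5",
        "AAAAAAAAAAAAAAAAAAAA\t7",
    ])
    print(cluster_tags(example))
```

    Steps (iii) and (iv) would then match each cluster against Rfam and miRBase sequences by homology search, which requires those databases and is outside the scope of this sketch.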

  2. Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma

    PubMed Central

    Wrzeszczynski, Kazimierz O.; Frank, Mayu O.; Koyama, Takahiko; Rhrissorrakrai, Kahn; Robine, Nicolas; Utro, Filippo; Emde, Anne-Katrin; Chen, Bo-Juen; Arora, Kanika; Shah, Minita; Vacic, Vladimir; Norel, Raquel; Bilal, Erhan; Bergmann, Ewa A.; Moore Vogel, Julia L.; Bruce, Jeffrey N.; Lassman, Andrew B.; Canoll, Peter; Grommes, Christian; Harvey, Steve; Parida, Laxmi; Michelini, Vanessa V.; Zody, Michael C.; Jobanputra, Vaidehi; Royyuru, Ajay K.

    2017-01-01

    Objective: To analyze a glioblastoma tumor specimen with 3 different platforms and compare potentially actionable calls from each. Methods: Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs. Results: More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed a comparable analysis in a fraction of the time required by the human analysts. Conclusions: The development of an effective human-machine interface in the analysis of deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than currently possible. ClinicalTrials.gov identifier: NCT02725684. PMID:28740869

  3. Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm.

    PubMed

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.

  4. Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm

    PubMed Central

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106

  5. RNA deep sequencing as a tool for selection of cell lines for systematic subcellular localization of all human proteins.

    PubMed

    Danielsson, Frida; Wiking, Mikaela; Mahdessian, Diana; Skogs, Marie; Ait Blal, Hammou; Hjelmare, Martin; Stadler, Charlotte; Uhlén, Mathias; Lundberg, Emma

    2013-01-04

    One of the major challenges of a chromosome-centric proteome project is to explore in a systematic manner the potential proteins identified from the chromosomal genome sequence, but not yet characterized on a protein level. Here, we describe the use of RNA deep sequencing to screen human cell lines for RNA profiles and to use this information to select cell lines suitable for characterization of the corresponding gene product. In this manner, the subcellular localization of proteins can be analyzed systematically using antibody-based confocal microscopy. We demonstrate the usefulness of selecting cell lines with high expression levels of RNA transcripts to increase the likelihood of high quality immunofluorescence staining and subsequent successful subcellular localization of the corresponding protein. The results show a path to combine transcriptomics with affinity proteomics to characterize the proteins in a gene- or chromosome-centric manner.

  6. Chemoresistance Evolution in Triple-Negative Breast Cancer Delineated by Single-Cell Sequencing.

    PubMed

    Kim, Charissa; Gao, Ruli; Sei, Emi; Brandt, Rachel; Hartman, Johan; Hatschek, Thomas; Crosetto, Nicola; Foukakis, Theodoros; Navin, Nicholas E

    2018-05-03

    Triple-negative breast cancer (TNBC) is an aggressive subtype that frequently develops resistance to chemotherapy. An unresolved question is whether resistance is caused by the selection of rare pre-existing clones or alternatively through the acquisition of new genomic aberrations. To investigate this question, we applied single-cell DNA and RNA sequencing in addition to bulk exome sequencing to profile longitudinal samples from 20 TNBC patients during neoadjuvant chemotherapy (NAC). Deep-exome sequencing identified 10 patients in which NAC led to clonal extinction and 10 patients in which clones persisted after treatment. In 8 patients, we performed a more detailed study using single-cell DNA sequencing to analyze 900 cells and single-cell RNA sequencing to analyze 6,862 cells. Our data showed that resistant genotypes were pre-existing and adaptively selected by NAC, while transcriptional profiles were acquired by reprogramming in response to chemotherapy in TNBC patients. Copyright © 2018 Elsevier Inc. All rights reserved.

  7. Evidence for thermal convection in the deep carbonate aquifer of the eastern sector of the Po Plain, Italy

    NASA Astrophysics Data System (ADS)

    Pasquale, V.; Chiozzi, P.; Verdoya, M.

    2013-05-01

    Temperatures recorded in wells as deep as 6 km drilled for hydrocarbon prospecting were used together with geological information to depict the thermal regime of the sedimentary sequence of the eastern sector of the Po Plain. After correction for drilling disturbance, temperature data were analyzed through an inversion technique based on a laterally constant thermal gradient model. The obtained thermal gradient is quite low within the deep carbonate unit (14 mK m⁻¹), while it is larger (53 mK m⁻¹) in the overlying impermeable formations. In the uppermost sedimentary layers, the thermal gradient is close to the regional average (21 mK m⁻¹). We argue that such a vertical change cannot be ascribed to thermal conductivity variation within the sedimentary sequence, but to deep groundwater flow. Since the hydrogeological characteristics (including litho-stratigraphic sequence and structural setting) hardly permit forced convection, we suggest that thermal convection might occur within the deep carbonate aquifer. The potential of this mechanism was evaluated by means of the Rayleigh number analysis. It turned out that permeability required for convection to occur must be larger than 3 × 10⁻¹⁵ m². The average over-heat ratio is 0.45. The lateral variation of hydrothermal regime was tested by using temperature data representing the aquifer thermal conditions. We found that thermal convection might be more developed and variable at the Ferrara High and its surroundings, where widespread fracturing may have increased permeability.

  8. Making sense of deep sequencing

    PubMed Central

    Goldman, D.; Domschke, K.

    2016-01-01

    This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of ‘big data’, to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306

  9. De novo peptide sequencing by deep learning

    PubMed Central

    Tran, Ngoc Hieu; Zhang, Xianglilan; Xin, Lei; Shan, Baozhen; Li, Ming

    2017-01-01

    De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides. The networks are further integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. We evaluated the method on a wide variety of species and found that DeepNovo considerably outperformed state of the art methods, achieving 7.7–22.9% higher accuracy at the amino acid level and 38.1–64.0% higher accuracy at the peptide level. We further used DeepNovo to automatically reconstruct the complete sequences of antibody light and heavy chains of mouse, achieving 97.5–100% coverage and 97.2–99.5% accuracy, without assisting databases. Moreover, DeepNovo is retrainable to adapt to any sources of data and provides a complete end-to-end training and prediction solution to the de novo sequencing problem. Not only does our study extend the deep learning revolution to a new field, but it also shows an innovative approach in solving optimization problems by using deep learning and dynamic programming. PMID:28720701

  10. Deep sequencing of small RNA repertoires in mice reveals metabolic disorders-associated hepatic miRNAs.

    PubMed

    Liang, Tingming; Liu, Chang; Ye, Zhenchao

    2013-01-01

    Obesity and associated metabolic disorders contribute importantly to the metabolic syndrome. On the other hand, microRNAs (miRNAs) are a class of small non-coding RNAs that repress target gene expression by inducing mRNA degradation and/or translation repression. Dysregulation of specific miRNAs in obesity may influence energy metabolism and cause insulin resistance, which leads to dyslipidemia, steatosis hepatis and type 2 diabetes. In the present study, we comprehensively analyzed and validated dysregulated miRNAs in ob/ob mouse liver, as well as miRNA groups based on miRNA gene cluster and gene family by using deep sequencing miRNA datasets. We found that over 13.8% of the total analyzed miRNAs were dysregulated, of which 37 miRNA species showed significantly differential expression. Further RT-qPCR analysis of some selected miRNAs validated the expression patterns observed in deep sequencing. Interestingly, we found that miRNA gene clusters and families always showed consistent dysregulation patterns in ob/ob mouse liver, although they had various enrichment levels. Functional enrichment analysis revealed the versatile physiological roles (over six signal pathways and five human diseases) of these miRNAs. Biological studies indicated that overexpression of miR-126 or inhibition of miR-24 in AML-12 cells attenuated free fatty acids-induced fat accumulation. Taken together, our data strongly suggest that obesity and metabolic disturbance are tightly associated with functional miRNAs. We also identified hepatic miRNA candidates serving as potential biomarkers for the diagnosis of the metabolic syndrome.

  11. Deep sequencing and in silico analysis of small RNA library reveals novel miRNA from leaf Persicaria minor transcriptome.

    PubMed

    Samad, Abdul Fatah A; Nazaruddin, Nazaruddin; Murad, Abdul Munir Abdul; Jani, Jaeyres; Zainal, Zamri; Ismail, Ismanizan

    2018-03-01

    In the current era, the majority of microRNAs (miRNAs) are discovered through computational approaches, which are largely confined to model plants. Here, for the first time, we describe the identification and characterization of novel miRNAs in a non-model plant, Persicaria minor (P. minor), using a computational approach. Unannotated sequences from deep sequencing were analyzed based on previous well-established parameters. Around 24 putative novel miRNAs were identified from 6,417,780 reads of the unannotated sequence, which represented 11 unique putative miRNA sequences. The PsRobot target prediction tool was deployed to identify the target transcripts of putative novel miRNAs. Most of the predicted target transcripts (mRNAs) were known to be involved in plant development and stress responses. Gene ontology analysis showed that the majority of the putative novel miRNA targets were involved in cellular component (69.07%), followed by molecular function (30.08%) and biological process (0.85%). Out of 11 unique putative miRNAs, 7 miRNAs were validated through semi-quantitative PCR. These novel miRNA discoveries in P. minor may help extend and update the current public miRNA database.

  12. Exploring fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua

    2016-10-01

    The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) with 97% sequence similarity and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies on fungal diversity of sediments from deep-sea environments by culture-dependent approach and clone library analysis, the present result suggested that Illumina sequencing had been dramatically accelerating the discovery of fungal communities in deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes was low in deep-sea environments. In addition, more than 12 taxa, accounting for 21.5% of sequences, were found to be rarely reported as deep-sea fungi, suggesting the deep-sea sediments from Okinawa Trough harbored a plethora of different fungal communities compared with other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.

  13. Bacterial community diversity of the deep-sea octocoral Paramuricea placomus.

    PubMed

    Kellogg, Christina A; Ross, Steve W; Brooke, Sandra D

    2016-01-01

    Compared to tropical corals, much less is known about deep-sea coral biology and ecology. Although the microbial communities of some deep-sea corals have been described, this is the first study to characterize the bacterial community associated with the deep-sea octocoral, Paramuricea placomus. Samples from five colonies of P. placomus were collected from Baltimore Canyon (379-382 m depth) in the Atlantic Ocean off the east coast of the United States of America. DNA was extracted from the coral samples and 16S rRNA gene amplicons were pyrosequenced using V4-V5 primers. Three samples sequenced deeply (>4,000 sequences each) and were further analyzed. The dominant microbial phylum was Proteobacteria, but other major phyla included Firmicutes and Planctomycetes. A conserved community of bacterial taxa held in common across the three P. placomus colonies was identified, comprising 68-90% of the total bacterial community depending on the coral individual. The bacterial community of P. placomus does not appear to include the genus Endozoicomonas, which has been found previously to be the dominant bacterial associate in several temperate and tropical gorgonians. Inferred functionality suggests the possibility of nitrogen cycling by the core bacterial community.

  14. Bacterial community diversity of the deep-sea octocoral Paramuricea placomus

    USGS Publications Warehouse

    Kellogg, Christina A.; Ross, Steve W.; Brooke, Sandra D.

    2016-01-01

    Compared to tropical corals, much less is known about deep-sea coral biology and ecology. Although the microbial communities of some deep-sea corals have been described, this is the first study to characterize the bacterial community associated with the deep-sea octocoral, Paramuricea placomus. Samples from five colonies of P. placomus were collected from Baltimore Canyon (379–382 m depth) in the Atlantic Ocean off the east coast of the United States of America. DNA was extracted from the coral samples and 16S rRNA gene amplicons were pyrosequenced using V4-V5 primers. Three samples sequenced deeply (>4,000 sequences each) and were further analyzed. The dominant microbial phylum was Proteobacteria, but other major phyla included Firmicutes and Planctomycetes. A conserved community of bacterial taxa held in common across the three P. placomus colonies was identified, comprising 68–90% of the total bacterial community depending on the coral individual. The bacterial community of P. placomus does not appear to include the genus Endozoicomonas, which has been found previously to be the dominant bacterial associate in several temperate and tropical gorgonians. Inferred functionality suggests the possibility of nitrogen cycling by the core bacterial community.

  15. Diversity of Bacteria at Healthy Human Conjunctiva

    PubMed Central

    Dong, Qunfeng; Brulc, Jennifer M.; Iovieno, Alfonso; Bates, Brandon; Garoutte, Aaron; Miller, Darlene; Revanna, Kashi V.; Gao, Xiang; Antonopoulos, Dionysios A.; Slepak, Vladlen Z.

    2011-01-01

    Purpose. Ocular surface (OS) microbiota contributes to infectious and autoimmune diseases of the eye. Comprehensive analysis of microbial diversity at the OS has been impossible because of the limitations of conventional cultivation techniques. This pilot study aimed to explore the true diversity of human OS microbiota using DNA sequencing-based detection and identification of bacteria. Methods. Composition of the bacterial community was characterized using deep sequencing of the 16S rRNA gene amplicon libraries generated from total conjunctival swab DNA. The DNA sequences were classified and the diversity parameters measured using bioinformatics software ESPRIT and MOTHUR and tools available through the Ribosomal Database Project-II (RDP-II). Results. Deep sequencing of conjunctival rDNA from four subjects yielded a total of 115,003 quality DNA reads, corresponding to 221 species-level phylotypes per subject. The combined bacterial community classified into 5 phyla and 59 distinct genera. However, 31% of all DNA reads belonged to unclassified or novel bacteria. The intersubject variability of individual OS microbiomes was very significant. Regardless, 12 genera (Pseudomonas, Propionibacterium, Bradyrhizobium, Corynebacterium, Acinetobacter, Brevundimonas, Staphylococci, Aquabacterium, Sphingomonas, Streptococcus, Streptophyta, and Methylobacterium) were ubiquitous among the analyzed cohort and represented the putative “core” of conjunctival microbiota. The other 47 genera accounted for <4% of the classified portion of this microbiome. Unexpectedly, healthy conjunctiva contained many genera that are commonly identified as ocular surface pathogens. Conclusions. The first DNA sequencing-based survey of the bacterial population at the conjunctiva has revealed an unexpectedly diverse microbial community. All analyzed samples contained ubiquitous (core) genera that included commensal, environmental, and opportunistic pathogenic bacteria. PMID:21571682

  16. Bi-PROF

    PubMed Central

    Gries, Jasmin; Schumacher, Dirk; Arand, Julia; Lutsik, Pavlo; Markelova, Maria Rivera; Fichtner, Iduna; Walter, Jörn; Sers, Christine; Tierling, Sascha

    2013-01-01

    The use of next generation sequencing has expanded our view on whole mammalian methylome patterns. In particular, it provides a genome-wide insight into local DNA methylation diversity at the single nucleotide level and enables the examination of single chromosome sequence sections with sufficient statistical power. We describe a bisulfite-based sequence profiling pipeline, Bi-PROF, which is based on the 454 GS-FLX Titanium technology and makes it possible to obtain up to one million sequence stretches at single base pair resolution without laborious subcloning. To illustrate the performance of the experimental workflow connected to a bioinformatics program pipeline (BiQ Analyzer HT) we present a test analysis set of 68 different epigenetic marker regions (amplicons) in five individual patient-derived xenograft tissue samples of colorectal cancer and one healthy colon epithelium sample as a control. After the 454 GS-FLX Titanium run, sequence read processing and sample decoding, the obtained alignments are quality controlled and statistically evaluated. Comprehensive methylation pattern interpretation (profiling) assessed by analyzing 10^2-10^4 sequence reads per amplicon allows an unprecedented deep view on pattern formation and methylation marker heterogeneity in tissues affected by complex diseases like cancer. PMID:23803588

  17. Quantitative phenotyping via deep barcode sequencing.

    PubMed

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
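    The core of a barcode-sequencing fitness readout is counting reads per strain barcode and comparing normalized abundance between conditions. The following is a minimal sketch of that idea rather than the published Bar-seq pipeline: it assumes exact barcode matches at a fixed read position, and the names `barcode_to_strain`, `start`, `length`, and `pseudocount` are hypothetical.

```python
from collections import Counter
from math import log2


def count_barcodes(reads, barcode_to_strain, start=0, length=20):
    """Count reads per strain using an exact barcode match at a fixed offset."""
    counts = Counter()
    for read in reads:
        strain = barcode_to_strain.get(read[start:start + length])
        if strain is not None:
            counts[strain] += 1
    return counts


def log2_abundance_ratio(control, treatment, pseudocount=1.0):
    """Return log2(treatment / control) per strain after depth normalization.

    The pseudocount (an assumption) avoids division by zero for strains
    absent from one sample.
    """
    total_control = sum(control.values()) or 1
    total_treatment = sum(treatment.values()) or 1
    ratios = {}
    for strain in set(control) | set(treatment):
        t = (treatment.get(strain, 0) + pseudocount) / total_treatment
        c = (control.get(strain, 0) + pseudocount) / total_control
        ratios[strain] = log2(t / c)
    return ratios
```

    A strongly negative ratio marks a strain depleted under treatment, which is the signal used to nominate candidate drug targets. Because the abstract reports that roughly 20% of barcodes deviated from expectation, a practical version would tolerate mismatches or use a re-sequenced barcode list rather than exact lookup.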

  18. Deep Sequencing to Identify the Causes of Viral Encephalitis

    PubMed Central

    Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.

    2014-01-01

    Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691

  19. Oasis 2: improved online analysis of small RNA-seq data.

    PubMed

    Rahman, Raza-Ur; Gautam, Abhivyakti; Bethune, Jörn; Sattar, Abdul; Fiosins, Maksims; Magruder, Daniel Sumner; Capece, Vincenzo; Shomroni, Orr; Bonn, Stefan

    2018-02-14

    Small RNA molecules play important roles in many biological processes and their dysregulation or dysfunction can cause disease. The current method of choice for genome-wide sRNA expression profiling is deep sequencing. Here we present Oasis 2, which is a new main release of the Oasis web application for the detection, differential expression, and classification of small RNAs in deep sequencing data. Compared to its predecessor Oasis, Oasis 2 features a novel and speed-optimized sRNA detection module that supports the identification of small RNAs in any organism with higher accuracy. Next to the improved detection of small RNAs in a target organism, the software now also recognizes potential cross-species miRNAs and viral and bacterial sRNAs in infected samples. In addition, novel miRNAs can now be queried and visualized interactively, providing essential information for over 700 high-quality miRNA predictions across 14 organisms. Robust biomarker signatures can now be obtained using the novel enhanced classification module. Oasis 2 enables biologists and medical researchers to rapidly analyze and query small RNA deep sequencing data with improved precision, recall, and speed, in an interactive and user-friendly environment. Oasis 2 is implemented in Java, J2EE, mysql, Python, R, PHP and JavaScript. It is freely available at https://oasis.dzne.de.

  20. Identification of microRNA-like RNAs from Curvularia lunata associated with maize leaf spot by bioinformation analysis and deep sequencing.

    PubMed

    Liu, Tong; Hu, John; Zuo, Yuhu; Jin, Yazhong; Hou, Jumei

    2016-04-01

    Deep sequencing of small RNAs is a useful tool to identify novel small RNAs that may be involved in fungal growth and pathogenesis. In this study, we used HiSeq deep sequencing to identify 747,487 unique small RNAs from Curvularia lunata. Among these small RNAs, 1012 microRNA-like RNAs (milRNAs) similar to known microRNAs and 48 potential novel milRNAs without homologs in other organisms were identified using the miRBase database. We used quantitative PCR to analyze the expression of four of these milRNAs from C. lunata at different developmental stages. The analysis revealed several changes associated with germinating conidia and mycelial growth, suggesting that these milRNAs may play a role in pathogen infection and mycelial growth. A total of 8334 target mRNAs for the 1012 milRNAs and 256 target mRNAs for the 48 novel milRNAs were predicted by computational analysis. These target mRNAs were also subjected to gene ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis. To our knowledge, this study is the first report of C. lunata's milRNA profiles. This information will provide a better understanding of pathogen development and infection mechanisms.

  1. Identifying active foraminifera in the Sea of Japan using metatranscriptomic approach

    NASA Astrophysics Data System (ADS)

    Lejzerowicz, Franck; Voltsky, Ivan; Pawlowski, Jan

    2013-02-01

    Metagenetics represents an efficient and rapid tool to describe environmental diversity patterns of microbial eukaryotes based on ribosomal DNA sequences. However, the results of metagenetic studies are often biased by the presence of extracellular DNA molecules that are persistent in the environment, especially in deep-sea sediment. As an alternative, short-lived RNA molecules constitute a good proxy for the detection of active species. Here, we used a metatranscriptomic approach based on RNA-derived (cDNA) sequences to study the diversity of the deep-sea benthic foraminifera and compared it to the metagenetic approach. We analyzed 257 ribosomal DNA and cDNA sequences obtained from seven sediment samples collected in the Sea of Japan at depths ranging from 486 to 3665 m. The DNA and RNA-based approaches gave a similar view of the taxonomic composition of the foraminiferal assemblage, but differed in some important points. First, the cDNA dataset was dominated by sequences of rotaliids and robertiniids, suggesting that these calcareous species, some of which have been observed in Rose Bengal stained samples, are the most active component of the foraminiferal community. Second, the richness of monothalamous (single-chambered) foraminifera was particularly high in DNA extracts from the deepest samples, confirming that this group of foraminifera is abundant but not necessarily very active in the deep-sea sediments. Finally, the high divergence of undetermined sequences in the cDNA dataset indicates the limits of our database and the lack of knowledge about some active but possibly rare species. Our study demonstrates the capability of the metatranscriptomic approach to detect active foraminiferal species and prompts its use in future high-throughput sequencing-based environmental surveys.

  2. Subsurface microbial diversity in deep-granitic-fracture water in Colorado

    USGS Publications Warehouse

    Sahl, J.W.; Schmidt, R.; Swanner, E.D.; Mandernack, K.W.; Templeton, A.S.; Kieft, Thomas L.; Smith, R.L.; Sanford, W.E.; Callaghan, R.L.; Mitton, J.B.; Spear, J.R.

    2008-01-01

    A microbial community analysis using 16S rRNA gene sequencing was performed on borehole water and a granite rock core from Henderson Mine, a >1,000-meter-deep molybdenum mine near Empire, CO. Chemical analysis of borehole water at two separate depths (1,044 m and 1,004 m below the mine entrance) suggests that a sharp chemical gradient exists, likely from the mixing of two distinct subsurface fluids, one metal rich and one relatively dilute; this has created unique niches for microorganisms. The microbial community analyzed from filtered, oxic borehole water indicated an abundance of sequences from iron-oxidizing bacteria (Gallionella spp.) and was compared to the community from the same borehole after 2 weeks of being plugged with an expandable packer. Statistical analyses with UniFrac revealed a significant shift in community structure following the addition of the packer. Phospholipid fatty acid (PLFA) analysis suggested that Nitrosomonadales dominated the oxic borehole, while PLFAs indicative of anaerobic bacteria were most abundant in the samples from the plugged borehole. Microbial sequences were represented primarily by Firmicutes, Proteobacteria, and a lineage of sequences which did not group with any identified bacterial division; phylogenetic analyses confirmed the presence of a novel candidate division. This "Henderson candidate division" dominated the clone libraries from the dilute anoxic fluids. Sequences obtained from the granitic rock core (1,740 m below the surface) were represented by the divisions Proteobacteria (primarily the family Ralstoniaceae) and Firmicutes. Sequences grouping within Ralstoniaceae were also found in the clone libraries from metal-rich fluids yet were absent in more dilute fluids. Lineage-specific comparisons, combined with phylogenetic statistical analyses, show that geochemical variance has an important effect on microbial community structure in deep, subsurface systems. Copyright © 2008, American Society for Microbiology. All Rights Reserved.

  3. Subsurface Microbial Diversity in Deep-Granitic-Fracture Water in Colorado

    PubMed Central

    Sahl, Jason W.; Schmidt, Raleigh; Swanner, Elizabeth D.; Mandernack, Kevin W.; Templeton, Alexis S.; Kieft, Thomas L.; Smith, Richard L.; Sanford, William E.; Callaghan, Robert L.; Mitton, Jeffry B.; Spear, John R.

    2008-01-01

    A microbial community analysis using 16S rRNA gene sequencing was performed on borehole water and a granite rock core from Henderson Mine, a >1,000-meter-deep molybdenum mine near Empire, CO. Chemical analysis of borehole water at two separate depths (1,044 m and 1,004 m below the mine entrance) suggests that a sharp chemical gradient exists, likely from the mixing of two distinct subsurface fluids, one metal rich and one relatively dilute; this has created unique niches for microorganisms. The microbial community analyzed from filtered, oxic borehole water indicated an abundance of sequences from iron-oxidizing bacteria (Gallionella spp.) and was compared to the community from the same borehole after 2 weeks of being plugged with an expandable packer. Statistical analyses with UniFrac revealed a significant shift in community structure following the addition of the packer. Phospholipid fatty acid (PLFA) analysis suggested that Nitrosomonadales dominated the oxic borehole, while PLFAs indicative of anaerobic bacteria were most abundant in the samples from the plugged borehole. Microbial sequences were represented primarily by Firmicutes, Proteobacteria, and a lineage of sequences which did not group with any identified bacterial division; phylogenetic analyses confirmed the presence of a novel candidate division. This “Henderson candidate division” dominated the clone libraries from the dilute anoxic fluids. Sequences obtained from the granitic rock core (1,740 m below the surface) were represented by the divisions Proteobacteria (primarily the family Ralstoniaceae) and Firmicutes. Sequences grouping within Ralstoniaceae were also found in the clone libraries from metal-rich fluids yet were absent in more dilute fluids. Lineage-specific comparisons, combined with phylogenetic statistical analyses, show that geochemical variance has an important effect on microbial community structure in deep, subsurface systems. PMID:17981950

  4. MRI markers of small vessel disease in lobar and deep hemispheric intracerebral hemorrhage

    PubMed Central

    Smith, Eric E.; Nandigam, Kaveer R.N.; Chen, Yu-Wei; Jeng, Jed; Salat, David; Halpin, Amy; Frosch, Matthew; Wendell, Lauren; Fazen, Louis; Rosand, Jonathan; Viswanathan, Anand; Greenberg, Steven M.

    2014-01-01

    Background MRI evidence of small vessel disease is common in intracerebral hemorrhage (ICH). We hypothesized that ICH caused by cerebral amyloid angiopathy (CAA) or hypertensive vasculopathy would have different distributions of MRI T2 white matter hyperintensity (WMH) and microbleeds (MB). Methods Data were analyzed from 133 consecutive patients with primary supratentorial ICH and adequate MRI sequences. CAA was diagnosed using the Boston criteria. WMH segmentation was performed using a validated semi-automated method. WMH and MB were compared according to site of symptomatic hematoma origin (lobar vs. deep) or by pattern of hemorrhages, including both hematomas and MB, on MRI GRE sequence (grouped as lobar only--probable CAA, lobar only--possible CAA, deep hemispheric only, or mixed lobar and deep hemorrhages). Results Lobar and deep hemispheric hematoma patients had similar median nWMH volumes (19.5 cm3 vs. 19.9 cm3, p=0.74) and prevalence of ≥1 MB (54% vs. 52%, p=0.99). The supratentorial WMH distribution was similar according to hemorrhage location category; however, the prevalence of brainstem T2 hyperintensity was lower in lobar hematoma vs. deep hematoma (54% vs. 70%, p=0.004). Mixed ICH was common (23%). Mixed ICH patients had large nWMH volumes and a posterior distribution of cortical hemorrhages similar to that seen in CAA. Conclusions WMH distribution is largely similar between CAA-related and non-CAA-related ICH. Mixed lobar and deep hemorrhages are seen on MRI GRE in up to one quarter of patients; in these patients both hypertension and CAA may be contributing to the burden of WMH. PMID:20689084

  5. High Class-Imbalance in pre-miRNA Prediction: A Novel Approach Based on deepSOM.

    PubMed

    Stegmayer, Georgina; Yones, Cristian; Kamenetzky, Laura; Milone, Diego H

    2017-01-01

    The computational prediction of novel microRNA within a full genome involves identifying sequences having the highest chance of being a miRNA precursor (pre-miRNA). These sequences are usually named candidates to miRNA. The well-known pre-miRNAs are usually only a few in comparison to the hundreds of thousands of potential candidates to miRNA that have to be analyzed, which makes this task a high class-imbalance classification problem. The classical way of approaching it has been training a binary classifier in a supervised manner, using well-known pre-miRNAs as positive class and artificially defining the negative class. However, although the selection of positive labeled examples is straightforward, it is very difficult to build a set of negative examples in order to obtain a good set of training samples for a supervised method. In this work, we propose a novel and effective way of approaching this problem using machine learning, without the definition of negative examples. The proposal is based on clustering unlabeled sequences of a genome together with well-known miRNA precursors for the organism under study, which allows for the quick identification of the best candidates to miRNA as those sequences clustered with known precursors. Furthermore, we propose a deep model to overcome the problem of having very few positive class labels. They are always maintained in the deep levels as positive class while less likely pre-miRNA sequences are filtered level after level. Our approach has been compared with other methods for pre-miRNA prediction in several species, showing effective predictivity of novel miRNAs. Additionally, we show that our approach has a lower training time and allows for better graphical navigability and interpretation of the results. A web-demo interface to try deepSOM is available at http://fich.unl.edu.ar/sinc/web-demo/deepsom/.

  6. Quantitative phenotyping via deep barcode sequencing

    PubMed Central

    Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey

    2009-01-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793
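
    The barcode-counting idea behind Bar-seq can be illustrated with a minimal sketch. It assumes exact matching of reads to barcodes and a simple pseudocount-normalized log2 ratio; the barcode sequences, strain names, and function names are hypothetical and are not taken from the published analysis routine.

```python
from collections import Counter
from math import log2


def count_barcodes(reads, barcode_to_strain):
    """Count reads that exactly match a known strain barcode."""
    counts = Counter()
    for read in reads:
        strain = barcode_to_strain.get(read)
        if strain is not None:  # reads matching no barcode are ignored
            counts[strain] += 1
    return counts


def log2_depletion(control, treated, strains, pseudocount=1):
    """Per-strain log2(control fraction / treated fraction).

    Positive values mean the strain is depleted under treatment.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    c_total = sum(control[s] + pseudocount for s in strains)
    t_total = sum(treated[s] + pseudocount for s in strains)
    return {
        s: log2(((control[s] + pseudocount) / c_total)
                / ((treated[s] + pseudocount) / t_total))
        for s in strains
    }


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "strain1", "TTGA": "strain2"}
control = count_barcodes(["ACGT", "ACGT", "TTGA", "TTGA", "NNNN"], barcodes)
treated = count_barcodes(["ACGT", "TTGA", "TTGA", "TTGA"], barcodes)
scores = log2_depletion(control, treated, list(barcodes.values()))
```

    In this toy input, strain1 receives a positive score (fewer reads under treatment) and strain2 a negative one; a real analysis would also need to tolerate sequencing errors in the barcodes.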

  7. Cancer-Associated Mutations in Endometriosis without Cancer

    PubMed Central

    Anglesio, M.S.; Papadopoulos, N.; Ayhan, A.; Nazeran, T.M.; Noë, M.; Horlings, H.M.; Lum, A.; Jones, S.; Senz, J.; Seckin, T.; Ho, J.; Wu, R.-C.; Lac, V.; Ogawa, H.; Tessier-Cloutier, B.; Alhassan, R.; Wang, A.; Wang, Y.; Cohen, J.D.; Wong, F.; Hasanovic, A.; Orr, N.; Zhang, M.; Popoli, M.; McMahon, W.; Wood, L.D.; Mattox, A.; Allaire, C.; Segars, J.; Williams, C.; Tomasetti, C.; Boyd, N.; Kinzler, K.W.; Gilks, C.B.; Diaz, L.; Wang, T.-L.; Vogelstein, B.; Yong, P.J.; Huntsman, D.G.; Shih, I.-M.

    2017-01-01

    BACKGROUND Endometriosis, defined as the presence of ectopic endometrial stroma and epithelium, affects approximately 10% of reproductive-age women and can cause pelvic pain and infertility. Endometriotic lesions are considered to be benign inflammatory lesions but have cancerlike features such as local invasion and resistance to apoptosis. METHODS We analyzed deeply infiltrating endometriotic lesions from 27 patients by means of exomewide sequencing (24 patients) or cancer-driver targeted sequencing (3 patients). Mutations were validated with the use of digital genomic methods in micro-dissected epithelium and stroma. Epithelial and stromal components of lesions from an additional 12 patients were analyzed by means of a droplet digital polymerase-chain-reaction (PCR) assay for recurrent activating KRAS mutations. RESULTS Exome sequencing revealed somatic mutations in 19 of 24 patients (79%). Five patients harbored known cancer driver mutations in ARID1A, PIK3CA, KRAS, or PPP2R1A, which were validated by Safe-Sequencing System or immunohistochemical analysis. The likelihood of driver genes being affected at this rate in the absence of selection was estimated at P = 0.001 (binomial test). Targeted sequencing and a droplet digital PCR assay identified KRAS mutations in 2 of 3 patients and 3 of 12 patients, respectively, with mutations in the epithelium but not the stroma. One patient harbored two different KRAS mutations, c.35G→T and c.35G→C, and another carried identical KRAS c.35G→A mutations in three distinct lesions. CONCLUSIONS We found that lesions in deep infiltrating endometriosis, which are associated with virtually no risk of malignant transformation, harbor somatic cancer driver mutations. Ten of 39 deep infiltrating lesions (26%) carried driver mutations; all the tested somatic mutations appeared to be confined to the epithelial compartment of endometriotic lesions. PMID:28489996

  8. Water mass dynamics shape Ross Sea protist communities in mesopelagic and bathypelagic layers

    NASA Astrophysics Data System (ADS)

    Zoccarato, Luca; Pallavicini, Alberto; Cerino, Federica; Fonda Umani, Serena; Celussi, Mauro

    2016-12-01

    Deep-sea environments host the largest pool of microbes and represent the last largely unexplored and poorly known ecosystems on Earth. The Ross Sea is characterized by unique oceanographic dynamics and harbors several water masses deeply involved in cooling and ventilation of deep oceans. In this study the V9 region of the 18S rDNA was targeted and sequenced with the Ion Torrent high-throughput sequencing technology to unveil differences in protist communities (>2 μm) correlated with biogeochemical properties of the water masses. The analyzed samples were significantly different in terms of environmental parameters and community composition outlining significant structuring effects of temperature and salinity. Overall, Alveolata (especially Dinophyta), Stramenopiles and Excavata groups dominated mesopelagic and bathypelagic layers, and protist communities were shaped according to the biogeochemistry of the water masses (advection effect and mixing events). Newly-formed High Salinity Shelf Water (HSSW) was characterized by high relative abundance of phototrophic organisms that bloom at the surface during the austral summer. Oxygen-depleted Circumpolar Deep Water (CDW) showed higher abundance of Excavata, common bacterivores in deep water masses. At the shelf-break, Antarctic Bottom Water (AABW), formed by the entrainment of shelf waters in CDW, maintained the eukaryotic genetic signature typical of both parental water masses.

  9. Discovery radiomics via evolutionary deep radiomic sequencer discovery for pathologically proven lung cancer detection.

    PubMed

    Shafiee, Mohammad Javad; Chung, Audrey G; Khalvati, Farzad; Haider, Masoom A; Wong, Alexander

    2017-10-01

    While lung cancer is the second most diagnosed form of cancer in men and women, a sufficiently early diagnosis can be pivotal in patient survival rates. Imaging-based, or radiomics-driven, detection methods have been developed to aid diagnosticians, but largely rely on hand-crafted features that may not fully encapsulate the differences between cancerous and healthy tissue. Recently, the concept of discovery radiomics was introduced, where custom abstract features are discovered from readily available imaging data. We propose an evolutionary deep radiomic sequencer discovery approach based on evolutionary deep intelligence. Motivated by patient privacy concerns and the idea of operational artificial intelligence, the evolutionary deep radiomic sequencer discovery approach organically evolves increasingly more efficient deep radiomic sequencers that produce significantly more compact yet similarly descriptive radiomic sequences over multiple generations. As a result, this framework improves operational efficiency and enables diagnosis to be run locally at the radiologist's computer while maintaining detection accuracy. We evaluated the evolved deep radiomic sequencer (EDRS) discovered via the proposed evolutionary deep radiomic sequencer discovery framework against state-of-the-art radiomics-driven and discovery radiomics methods using clinical lung CT data with pathologically proven diagnostic data from the LIDC-IDRI dataset. The EDRS shows improved sensitivity (93.42%), specificity (82.39%), and diagnostic accuracy (88.78%) relative to previous radiomics approaches.
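
    For readers unfamiliar with the three reported figures, the sketch below shows how sensitivity, specificity, and accuracy are conventionally derived from a confusion matrix. The counts used are hypothetical; they are not the LIDC-IDRI results and do not reproduce the percentages quoted above.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # fraction of true cancers detected
    specificity = tn / (tn + fp)            # fraction of healthy cases cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sens, spec, acc = diagnostic_metrics(tp=90, fn=10, tn=80, fp=20)
```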

  10. Predicting RNA-protein binding sites and motifs through combining local and global deep convolutional neural networks.

    PubMed

    Pan, Xiaoyong; Shen, Hong-Bin

    2018-05-02

    RNA-binding proteins (RBPs) account for 5∼10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-intensive and costly. Instead, computational prediction of the RBP binding sites using patterns learned from existing annotation knowledge is a fast approach. From the biological point of view, the local structure context derived from local sequences will be recognized by specific RBPs. However, in computational modeling using deep learning, to the best of our knowledge, only global representations of entire RNA sequences are employed. So far, the local sequence information is ignored in the deep model construction process. In this study, we present a computational method iDeepE to predict RNA-protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences into the same length. For the local CNN, we split an RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs for multiple subsequences and the padded sequences to learn high-level features, respectively. Finally, the outputs from local and global CNNs are combined to improve the prediction. iDeepE demonstrates a better performance over state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that the local CNN runs 1.8 times faster than the global CNN with comparable performance when using GPUs. Our results show that iDeepE has captured experimentally verified binding motifs. Availability: https://github.com/xypan1232/iDeepE. Contact: xypan172436@gmail.com or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
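
    The two input preparations described above (padding for the global CNN, overlapping fixed-length windows for the local CNN) can be sketched as follows. This is an illustration only: the function names, the pad character "N", and the window and overlap sizes are assumptions, not values or code taken from the iDeepE repository.

```python
def pad_sequence(seq, max_len, pad_char="N"):
    """Truncate or right-pad a sequence to exactly max_len characters."""
    return seq[:max_len].ljust(max_len, pad_char)


def split_overlapping(seq, window, overlap, pad_char="N"):
    """Split a sequence into fixed-length windows sharing `overlap` characters.

    The final window is right-padded so every window has the same length.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    subseqs = []
    # Stop once the remaining tail is already covered by the previous window.
    for start in range(0, max(len(seq) - overlap, 1), step):
        subseqs.append(seq[start:start + window].ljust(window, pad_char))
    return subseqs


# Hypothetical toy sequence; real inputs would be much longer.
global_input = pad_sequence("ACGUACG", 10)
local_inputs = split_overlapping("ACGUACGUAC", window=4, overlap=2)
```

    Each window would then be one-hot encoded and treated as one channel of the local CNN input; that encoding step is omitted here.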

  11. A Follow-Up of the Multicenter Collaborative Study on HIV-1 Drug Resistance and Tropism Testing Using 454 Ultra Deep Pyrosequencing

    PubMed Central

    St. John, Elizabeth P.; Simen, Birgitte B.; Turenchalk, Gregory S.; Braverman, Michael S.; Abbate, Isabella; Aerssens, Jeroen; Bouchez, Olivier; Gabriel, Christian; Izopet, Jacques; Meixenberger, Karolin; Di Giallonardo, Francesca; Schlapbach, Ralph; Paredes, Roger; Sakwa, James; Schmitz-Agheguian, Gudrun G.; Thielen, Alexander; Victor, Martin

    2016-01-01

    Background Ultra deep sequencing is of increasing use not only in research but also in diagnostics. For implementation of ultra deep sequencing assays in clinical laboratories for routine diagnostics, intra- and inter-laboratory testing are of the utmost importance. Methods A multicenter study was conducted to validate an updated assay design for 454 Life Sciences’ GS FLX Titanium system targeting protease/reverse transcriptase (RTP) and env (V3) regions to identify HIV-1 drug-resistance mutations and determine co-receptor use with high sensitivity. The study included 30 HIV-1 subtype B and 6 subtype non-B samples with viral titers (VT) of 3,940–447,400 copies/mL, two dilution series (52,129–1,340 and 25,130–734 copies/mL), and triplicate samples. Amplicons spanning PR codons 10–99, RT codons 1–251 and the entire V3 region were generated using barcoded primers. Analysis was performed using the GS Amplicon Variant Analyzer and geno2pheno for tropism. For comparison, population sequencing was performed using the ViroSeq HIV-1 genotyping system. Results The median sequencing depth across the 11 sites was 1,829 reads per position for RTP (IQR 592–3,488) and 2,410 for V3 (IQR 786–3,695). 10 preselected drug resistant variants were measured across sites and showed high inter-laboratory correlation across all sites with data (P<0.001). The triplicate samples of a plasmid mixture confirmed the high inter-laboratory consistency (mean% ± stdev: 4.6 ± 0.5, 4.8 ± 0.4, 4.9 ± 0.3) and revealed good intra-laboratory consistency (mean% range ± stdev range: 4.2–5.2 ± 0.04–0.65). In the two dilution series, no variants >20% were missed, variants 2–10% were detected at most sites (even at low VT), and variants 1–2% were detected by some sites. All mutations detected by population sequencing were also detected by UDS. Conclusions This assay design results in an accurate and reproducible approach to analyze HIV-1 mutant spectra, even at variant frequencies well below those routinely detectable by population sequencing. PMID:26756901

  12. Transcriptome Analysis in Venom Gland of the Predatory Giant Ant Dinoponera quadriceps: Insights into the Polypeptide Toxin Arsenal of Hymenopterans

    PubMed Central

    Chong, Cheong-Meng; Leung, Siu Wai; Prieto-da-Silva, Álvaro R. B.; Havt, Alexandre; Quinet, Yves P.; Martins, Alice M. C.; Lee, Simon M. Y.; Rádis-Baptista, Gandhi

    2014-01-01

    Background Dinoponera quadriceps is a predatory giant ant that inhabits the Neotropical region and subdues its prey (insects) with stings that deliver a toxic cocktail of molecules. Human accidents occasionally occur and cause local pain and systemic symptoms. A comprehensive study of the D. quadriceps venom gland transcriptome is required to advance our knowledge about the toxin repertoire of the giant ant venom and to understand the physiopathological basis of Hymenoptera envenomation. Results We conducted a transcriptome analysis of a cDNA library from the D. quadriceps venom gland with Sanger sequencing in combination with whole-transcriptome shotgun deep sequencing. From the cDNA library, a total of 420 independent clones were analyzed. Although the proportion of dinoponeratoxin isoform precursors was high, the first giant ant venom inhibitor cysteine-knot (ICK) toxin was found. The deep next generation sequencing yielded a total of 2,514,767 raw reads that were assembled into 18,546 contigs. A BLAST search of the assembled contigs against non-redundant and Swiss-Prot databases showed that 6,463 contigs corresponded to BLASTx hits and indicated an interesting diversity of transcripts related to venom gene expression. The majority of these venom-related sequences code for a major polypeptide core, which comprises venom allergens, lethal-like proteins and esterases, and a minor peptide framework composed of inter-specific structurally conserved cysteine-rich toxins. Both the cDNA library and deep sequencing yielded large proportions of contigs that showed no similarities with known sequences. Conclusions To our knowledge, this is the first report of the venom gland transcriptome of the New World giant ant D. quadriceps. The glandular venom system was dissected, and the toxin arsenal was revealed; this process brought to light novel sequences that included ICK-folded toxins, allergen proteins, esterases (phospholipases and carboxylesterases), and lethal-like toxins. These findings contribute to the understanding of the ecology, behavior and venomics of hymenopterans. PMID:24498135

  13. DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation.

    PubMed

    You, Ronghui; Huang, Xiaodi; Zhu, Shanfeng

    2018-06-06

    As of April 2018, UniProtKB has collected more than 115 million protein sequences. Less than 0.15% of these proteins, however, have been associated with experimental GO annotations. As such, the use of automatic protein function prediction (AFP) to reduce this huge gap becomes increasingly important. The previous studies conclude that sequence homology based methods are highly effective in AFP. In addition, mining motif, domain, and functional information from protein sequences has been found very helpful for AFP. Other than sequences, alternative information sources such as text, however, may be useful for AFP as well. Instead of using BOW (bag of words) representation in traditional text-based AFP, we propose a new method called DeepText2GO that relies on deep semantic text representation, together with different kinds of available protein information such as sequence homology, families, domains, and motifs, to improve large-scale AFP. Furthermore, DeepText2GO integrates text-based methods with sequence-based ones by means of a consensus approach. Extensive experiments on the benchmark dataset extracted from UniProt/SwissProt have demonstrated that DeepText2GO significantly outperformed both text-based and sequence-based methods, validating its superiority. Copyright © 2018 Elsevier Inc. All rights reserved.

  14. Insights into Deep-Sea Sediment Fungal Communities from the East Indian Ocean Using Targeted Environmental Sequencing Combined with Traditional Cultivation

    PubMed Central

    Zhang, Xiao-yong; Tang, Gui-ling; Xu, Xin-ya; Nong, Xu-hua; Qi, Shu-Hua

    2014-01-01

    The fungal diversity in deep-sea environments has recently gained increasing attention. Our knowledge and understanding of the true fungal diversity and the role it plays in deep-sea environments, however, is still limited. We investigated the fungal community structure in five sediments from a depth of ∼4000 m in the East Indian Ocean using a combination of targeted environmental sequencing and traditional cultivation. This approach resulted in the recovery of a total of 45 fungal operational taxonomic units (OTUs) and 20 culturable fungal phylotypes. This finding indicates that there is a great amount of fungal diversity in the deep-sea sediments collected in the East Indian Ocean. Three fungal OTUs and one culturable phylotype demonstrated high divergence (89%–97%) from the existing sequences in GenBank. Moreover, 44.4% of fungal OTUs and 30% of culturable fungal phylotypes are new reports for deep-sea sediments. These results suggest that the deep-sea sediments from the East Indian Ocean can serve as habitats for new fungal communities compared with other deep-sea environments. In addition, different fungal communities could be detected when using targeted environmental sequencing compared with traditional cultivation in this study, which suggests that a combination of targeted environmental sequencing and traditional cultivation will generate a more diverse fungal community in deep-sea environments than using either targeted environmental sequencing or traditional cultivation alone. This study is the first to report new insights into the fungal communities in deep-sea sediments from the East Indian Ocean, which increases our knowledge and understanding of the fungal diversity in deep-sea environments. PMID:25272044

  15. Integrated sequence stratigraphy of the postimpact sediments from the Eyreville core holes, Chesapeake Bay impact structure inner basin

    USGS Publications Warehouse

    Browning, J.V.; Miller, K.G.; McLaughlin, P.P.; Edwards, L.E.; Kulpecz, A.A.; Powars, D.S.; Wade, B.S.; Feigenson, M.D.; Wright, J.D.

    2009-01-01

    The Eyreville core holes provide the first continuously cored record of postimpact sequences from within the deepest part of the central Chesapeake Bay impact crater. We analyzed the upper Eocene to Pliocene postimpact sediments from the Eyreville A and C core holes for lithology (semiquantitative measurements of grain size and composition), sequence stratigraphy, and chronostratigraphy. Age is based primarily on Sr isotope stratigraphy supplemented by biostratigraphy (dinocysts, nannofossils, and planktonic foraminifers); age resolution is approximately ±0.5 Ma for early Miocene sequences and approximately ±1.0 Ma for younger and older sequences. Eocene-lower Miocene sequences are subtle, upper middle to lower upper Miocene sequences are more clearly distinguished, and upper Miocene-Pliocene sequences display a distinct facies pattern within sequences. We recognize two upper Eocene, two Oligocene, nine Miocene, three Pliocene, and one Pleistocene sequence and correlate them with those in New Jersey and Delaware. The upper Eocene through Pleistocene strata at Eyreville record changes from: (1) rapidly deposited, extremely fine-grained Eocene strata that probably represent two sequences deposited in a deep (>200 m) basin; to (2) highly dissected Oligocene (two very thin sequences) to lower Miocene (three thin sequences) with a long hiatus; to (3) a thick, rapidly deposited (43-73 m/Ma), very fine-grained, biosiliceous middle Miocene (16.5-14 Ma) section divided into three sequences (V5-V3) deposited in middle neritic paleoenvironments; to (4) a 4.5-Ma-long hiatus (12.8-8.3 Ma); to (5) sandy, shelly upper Miocene to Pliocene strata (8.3-2.0 Ma) divided into six sequences deposited in shelf and shoreface environments; and, last, to (6) a sandy middle Pleistocene paralic sequence (~400 ka). The Eyreville cores thus record the filling of a deep impact-generated basin where the timing of sequence boundaries is heavily influenced by eustasy. © 2009 The Geological Society of America.

  16. Rapid gene identification in sugar beet using deep sequencing of DNA from phenotypic pools selected from breeding panels.

    PubMed

    Ries, David; Holtgräwe, Daniela; Viehöver, Prisca; Weisshaar, Bernd

    2016-03-15

    The combination of bulk segregant analysis (BSA) and next generation sequencing (NGS), also known as mapping by sequencing (MBS), has been shown to significantly accelerate the identification of causal mutations for species with a reference genome sequence. The usual approach is to cross homozygous parents that differ for the monogenic trait to address, to perform deep sequencing of DNA from F2 plants pooled according to their phenotype, and subsequently to analyze the allele frequency distribution based on a marker table for the parents studied. The method has been successfully applied for EMS induced mutations as well as natural variation. Here, we show that pooling genetically diverse breeding lines according to a contrasting phenotype also allows high resolution mapping of the causal gene in a crop species. The test case was the monogenic locus causing red vs. green hypocotyl color in Beta vulgaris (R locus). We determined the allele frequencies of polymorphic sequences using sequence data from two diverging phenotypic pools of 180 B. vulgaris accessions each. A single interval of about 31 kbp among the nine chromosomes was identified which indeed contained the causative mutation. By applying a variation of the mapping by sequencing approach, we demonstrated that phenotype-based pooling of diverse accessions from breeding panels and subsequent direct determination of the allele frequency distribution can be successfully applied for gene identification in a crop species. Our approach made it possible to identify a small interval around the causative gene. Sequencing of parents or individual lines was not necessary. Whenever the appropriate plant material is available, the approach described saves time compared to the generation of an F2 population. In addition, we provide clues for planning similar experiments with regard to pool size and the sequencing depth required.
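
    The core computation of mapping by sequencing, comparing allele frequencies between two phenotypic pools along the genome, can be sketched as below. The data layout (position mapped to reference and alternate read counts), the function names, and the toy numbers are assumptions for illustration; the study's own pipeline and thresholds are not described in the abstract.

```python
def allele_frequency(ref_reads, alt_reads):
    """Alternate-allele frequency at one site, or None if the site has no coverage."""
    depth = ref_reads + alt_reads
    return alt_reads / depth if depth else None


def delta_frequencies(pool_a, pool_b):
    """Absolute allele-frequency difference per position shared by both pools.

    Each pool maps position -> (ref_reads, alt_reads).
    Returns a list of (position, delta) sorted by position.
    """
    deltas = []
    for pos in sorted(set(pool_a) & set(pool_b)):
        fa = allele_frequency(*pool_a[pos])
        fb = allele_frequency(*pool_b[pos])
        if fa is None or fb is None:  # skip uncovered sites
            continue
        deltas.append((pos, abs(fa - fb)))
    return deltas


def peak_position(deltas):
    """Position with the largest frequency difference between the pools."""
    return max(deltas, key=lambda item: item[1])[0]


# Hypothetical read counts for a red-hypocotyl pool and a green-hypocotyl pool.
red_pool = {100: (10, 10), 200: (1, 19), 300: (12, 8)}
green_pool = {100: (11, 9), 200: (18, 2), 300: (10, 10)}
deltas = delta_frequencies(red_pool, green_pool)
candidate = peak_position(deltas)
```

    A realistic analysis would smooth the differences over sliding windows and account for sequencing depth before calling an interval; this sketch only shows the per-marker comparison.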

  17. Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing

    PubMed Central

    Balmaseda, Angel; Harris, Eva; DeRisi, Joseph L.

    2012-01-01

    Dengue virus is an emerging infectious agent that infects an estimated 50–100 million people annually worldwide, yet current diagnostic practices cannot detect an etiologic pathogen in ∼40% of dengue-like illnesses. Metagenomic approaches to pathogen detection, such as viral microarrays and deep sequencing, are promising tools to address emerging and non-diagnosable disease challenges. In this study, we used the Virochip microarray and deep sequencing to characterize the spectrum of viruses present in human sera from 123 Nicaraguan patients presenting with dengue-like symptoms but testing negative for dengue virus. We utilized a barcoding strategy to simultaneously deep sequence multiple serum specimens, generating on average over 1 million reads per sample. We then implemented a stepwise bioinformatic filtering pipeline to remove the majority of human and low-quality sequences to improve the speed and accuracy of subsequent unbiased database searches. By deep sequencing, we were able to detect virus sequence in 37% (45/123) of previously negative cases. These included 13 cases with Human Herpesvirus 6 sequences. Other samples contained sequences with similarity to sequences from viruses in the Herpesviridae, Flaviviridae, Circoviridae, Anelloviridae, Asfarviridae, and Parvoviridae families. In some cases, the putative viral sequences were virtually identical to known viruses, and in others they diverged, suggesting that they may derive from novel viruses. These results demonstrate the utility of unbiased metagenomic approaches in the detection of known and divergent viruses in the study of tropical febrile illness. PMID:22347512
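
    The barcoding and stepwise filtering described above can be pictured with a minimal sketch. It assumes equal-length 5' barcodes, a per-read mean quality score, and a caller-supplied host-detection predicate; all names, thresholds, and sequences are hypothetical, and real host removal would rely on alignment against a human reference rather than a lookup.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign reads to samples by their leading barcode and trim the barcode.

    reads: iterable of (sequence, mean_quality) pairs.
    barcodes: dict mapping barcode -> sample name (all barcodes the same length).
    """
    barcode_len = len(next(iter(barcodes)))
    by_sample = defaultdict(list)
    for seq, quality in reads:
        sample = barcodes.get(seq[:barcode_len])
        if sample is not None:  # reads with unknown barcodes are dropped
            by_sample[sample].append((seq[barcode_len:], quality))
    return dict(by_sample)


def filter_reads(reads, min_quality, is_host):
    """Keep sequences that pass the quality cutoff and are not flagged as host."""
    return [seq for seq, quality in reads
            if quality >= min_quality and not is_host(seq)]


# Hypothetical barcodes, reads, and host sequences, for illustration only.
barcodes = {"AAAA": "serum1", "CCCC": "serum2"}
reads = [("AAAAGATTACA", 35.0), ("AAAATTTTTTT", 10.0),
         ("CCCCGGGGGGG", 30.0), ("GGGGACGTACG", 30.0)]
host_sequences = {"GGGGGGG"}
samples = demultiplex(reads, barcodes)
kept = {name: filter_reads(rs, 20.0, lambda s: s in host_sequences)
        for name, rs in samples.items()}
```

    The reads that survive both filters are the ones that would be passed on to the database searches mentioned in the abstract.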

  18. Characterization of microRNAs from goat (Capra hircus) by Solexa deep-sequencing technology.

    PubMed

    Ling, Y H; Ding, J P; Zhang, X D; Wang, L J; Zhang, Y H; Li, Y S; Zhang, Z J; Zhang, X R

    2013-06-13

    MicroRNAs (miRNAs) are an important class of small noncoding RNAs that are highly conserved in plants and animals. Many miRNAs are known to mediate a myriad of cell processes, including proliferation and differentiation, via the regulation of some transcription and signaling factors, which are closely related to muscle development and disease. In this study, small RNA cDNA libraries of Boer goats were constructed. In addition, we obtained the goat muscle miRNAs by using Solexa deep-sequencing technology and characterized them by bioinformatics analysis. Based on Solexa sequencing and bioinformatics analysis, 562 species-conserved and 5 goat genome-specific miRNAs were identified, 322 of which had expression levels above 100. The results of real-time quantitative polymerase chain reaction from 8 randomly selected miRNAs showed that the 8 miRNAs were expressed in goat muscle, and the expression patterns were consistent with the Solexa sequencing results. The identification and characterization of miRNAs in goat muscle provide important information on the role of miRNA regulation in muscle growth and development. These data will help to facilitate studies on the regulatory roles played by miRNAs during goat growth and development.

  19. Integrated design, execution, and analysis of arrayed and pooled CRISPR genome-editing experiments.

    PubMed

    Canver, Matthew C; Haeussler, Maximilian; Bauer, Daniel E; Orkin, Stuart H; Sanjana, Neville E; Shalem, Ophir; Yuan, Guo-Cheng; Zhang, Feng; Concordet, Jean-Paul; Pinello, Luca

    2018-05-01

    CRISPR (clustered regularly interspaced short palindromic repeats) genome-editing experiments offer enormous potential for the evaluation of genomic loci using arrayed single guide RNAs (sgRNAs) or pooled sgRNA libraries. Numerous computational tools are available to help design sgRNAs with optimal on-target efficiency and minimal off-target potential. In addition, computational tools have been developed to analyze deep-sequencing data resulting from genome-editing experiments. However, these tools are typically developed in isolation and oftentimes are not readily translatable into laboratory-based experiments. Here, we present a protocol that describes in detail both the computational and benchtop implementation of an arrayed and/or pooled CRISPR genome-editing experiment. This protocol provides instructions for sgRNA design with CRISPOR (computational tool for the design, evaluation, and cloning of sgRNA sequences), experimental implementation, and analysis of the resulting high-throughput sequencing data with CRISPResso (computational tool for analysis of genome-editing outcomes from deep-sequencing data). This protocol allows for design and execution of arrayed and pooled CRISPR experiments in 4-5 weeks by non-experts, as well as computational data analysis that can be performed in 1-2 d by both computational and noncomputational biologists alike using web-based and/or command-line versions.

  20. deepTools: a flexible platform for exploring deep-sequencing data.

    PubMed

    Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A; Manke, Thomas

    2014-07-01

    We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
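
    The idea of a normalized coverage file can be sketched in a few lines. This is not the deepTools implementation or its API; it is a hypothetical, minimal example that assumes reads are given as half-open (start, end) intervals on one region and that normalization is a simple counts-per-million scaling.

```python
from typing import List, Tuple


def binned_coverage(reads: List[Tuple[int, int]], region_length: int, bin_size: int) -> List[int]:
    """Count reads per bin; each read is a half-open (start, end) interval
    and is counted once in every bin it overlaps."""
    n_bins = (region_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for start, end in reads:
        if end <= start:
            continue  # skip empty or inverted intervals
        first = max(start, 0) // bin_size
        last = min(end - 1, region_length - 1) // bin_size
        for b in range(first, last + 1):
            counts[b] += 1
    return counts


def counts_per_million(counts: List[int], total_reads: int) -> List[float]:
    """Scale bin counts by library size so samples of different depth are comparable."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return [c * 1_000_000 / total_reads for c in counts]
```

    Scaling by library size is only one of several possible normalizations; it is used here because it needs no information beyond the total read count.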

  1. Metagenomic Analysis of Viral Communities in (Hado)Pelagic Sediments

    PubMed Central

    Yoshida, Mitsuhiro; Takaki, Yoshihiro; Eitoku, Masamitsu; Nunoura, Takuro; Takai, Ken

    2013-01-01

    In this study, we analyzed viral metagenomes (viromes) in the sedimentary habitats of three geographically and geologically distinct (hado)pelagic environments in the northwest Pacific; the Izu-Ogasawara Trench (water depth = 9,760 m) (OG), the Challenger Deep in the Mariana Trench (10,325 m) (MA), and the forearc basin off the Shimokita Peninsula (1,181 m) (SH). Virus abundance ranged from 10⁶ to 10¹¹ viruses/cm³ of sediments (down to 30 cm below the seafloor [cmbsf]). We recovered viral DNA assemblages (viromes) from the (hado)pelagic sediment samples and obtained a total of 37,458, 39,882, and 70,882 sequence reads by 454 GS FLX Titanium pyrosequencing from the virome libraries of the OG, MA, and SH (hado)pelagic sediments, respectively. Only 24–30% of the sequence reads from each virome library exhibited significant similarities to the sequences deposited in the public nr protein database (E-value <10⁻³ in BLAST). Among the sequences identified as potential viral genes based on the BLAST search, 95–99% of the sequence reads in each library were related to genes from single-stranded DNA (ssDNA) viral families, including Microviridae, Circoviridae, and Geminiviridae. A relatively high abundance of sequences related to the genetic markers (major capsid protein [VP1] and replication protein [Rep]) of two ssDNA viral groups were also detected in these libraries, thereby revealing a high genotypic diversity of their viruses (833 genotypes for VP1 and 2,551 genotypes for Rep). A majority of the viral genes predicted from each library were classified into three ssDNA viral protein categories: Rep, VP1, and minor capsid protein. The deep-sea sedimentary viromes were distinct from the viromes obtained from the oceanic and fresh waters and marine eukaryotes, and thus, deep-sea sediments harbor novel viromes, including previously unidentified ssDNA viruses. PMID:23468952

  2. Metagenomic analysis of viral communities in (hado)pelagic sediments.

    PubMed

    Yoshida, Mitsuhiro; Takaki, Yoshihiro; Eitoku, Masamitsu; Nunoura, Takuro; Takai, Ken

    2013-01-01

    In this study, we analyzed viral metagenomes (viromes) in the sedimentary habitats of three geographically and geologically distinct (hado)pelagic environments in the northwest Pacific; the Izu-Ogasawara Trench (water depth = 9,760 m) (OG), the Challenger Deep in the Mariana Trench (10,325 m) (MA), and the forearc basin off the Shimokita Peninsula (1,181 m) (SH). Virus abundance ranged from 10⁶ to 10¹¹ viruses/cm³ of sediments (down to 30 cm below the seafloor [cmbsf]). We recovered viral DNA assemblages (viromes) from the (hado)pelagic sediment samples and obtained a total of 37,458, 39,882, and 70,882 sequence reads by 454 GS FLX Titanium pyrosequencing from the virome libraries of the OG, MA, and SH (hado)pelagic sediments, respectively. Only 24-30% of the sequence reads from each virome library exhibited significant similarities to the sequences deposited in the public nr protein database (E-value <10⁻³ in BLAST). Among the sequences identified as potential viral genes based on the BLAST search, 95-99% of the sequence reads in each library were related to genes from single-stranded DNA (ssDNA) viral families, including Microviridae, Circoviridae, and Geminiviridae. A relatively high abundance of sequences related to the genetic markers (major capsid protein [VP1] and replication protein [Rep]) of two ssDNA viral groups were also detected in these libraries, thereby revealing a high genotypic diversity of their viruses (833 genotypes for VP1 and 2,551 genotypes for Rep). A majority of the viral genes predicted from each library were classified into three ssDNA viral protein categories: Rep, VP1, and minor capsid protein. The deep-sea sedimentary viromes were distinct from the viromes obtained from the oceanic and fresh waters and marine eukaryotes, and thus, deep-sea sediments harbor novel viromes, including previously unidentified ssDNA viruses.

  3. Genome-wide analyses of long noncoding RNA expression profiles correlated with radioresistance in nasopharyngeal carcinoma via next-generation deep sequencing.

    PubMed

    Li, Guo; Liu, Yong; Liu, Chao; Su, Zhongwu; Ren, Shuling; Wang, Yunyun; Deng, Tengbo; Huang, Donghai; Tian, Yongquan; Qiu, Yuanzheng

    2016-09-06

    Radioresistance is one of the major factors limiting the therapeutic efficacy and prognosis of patients with nasopharyngeal carcinoma (NPC). Accumulating evidence has suggested that aberrant expression of long noncoding RNAs (lncRNAs) contributes to cancer progression. Therefore, here we identified lncRNAs associated with radioresistance in NPC. The differential expression profiles of lncRNAs associated with NPC radioresistance were constructed by next-generation deep sequencing by comparing radioresistant NPC cells with their parental cells. LncRNA-related mRNAs were predicted and analyzed using bioinformatics algorithms compared with the mRNA profiles related to radioresistance obtained in our previous study. Several lncRNAs and associated mRNAs were validated in established NPC radioresistant cell models and NPC tissues. By comparison between radioresistant CNE-2-Rs and parental CNE-2 cells by next-generation deep sequencing, a total of 781 known lncRNAs and 2054 novel lncRNAs were annotated. The top five upregulated and downregulated known/novel lncRNAs were detected using quantitative real-time reverse transcription-polymerase chain reaction, and 7/10 known lncRNAs and 3/10 novel lncRNAs were demonstrated to have significant differential expression trends that were the same as those predicted by deep sequencing. From the prediction process, 13 pairs of lncRNAs and their associated genes were acquired, and the prediction trends of three pairs were validated in both radioresistant CNE-2-Rs and 6-10B-Rs cell lines, including lncRNA n373932 and SLITRK5, n409627 and PRSS12, and n386034 and RIMKLB. LncRNA n373932 and its related SLITRK5 showed dramatic expression changes in post-irradiation radioresistant cells and a negative expression correlation in NPC tissues (R = -0.595, p < 0.05). 
    Our study provides an overview of the expression profiles of radioresistant lncRNAs and potentially related mRNAs, which will facilitate future investigations into the function of lncRNAs in NPC radioresistance.

  4. Genomic architecture and evolution of clear cell renal cell carcinomas defined by multiregion sequencing

    PubMed Central

    Varela, Ignacio; Fisher, Rosalie; McGranahan, Nicholas; Matthews, Nicholas; Santos, Claudio R; Martinez, Pierre; Phillimore, Benjamin; Begum, Sharmin; Rabinowitz, Adam; Spencer-Dene, Bradley; Gulati, Sakshi; Bates, Paul A; Stamp, Gordon; Pickering, Lisa; Gore, Martin; Nicol, David L; Hazell, Steven; Futreal, P Andrew; Stewart, Aengus; Swanton, Charles

    2015-01-01

    Clear cell renal carcinomas (ccRCCs) can display intratumor heterogeneity (ITH). We applied multiregion exome sequencing (M-seq) to resolve the genetic architecture and evolutionary histories of ten ccRCCs. Ultra-deep sequencing identified ITH in all cases. We found that 73–75% of identified ccRCC driver aberrations were subclonal, confounding estimates of driver mutation prevalence. ITH increased with the number of biopsies analyzed, without evidence of saturation in most tumors. Chromosome 3p loss and VHL aberrations were the only ubiquitous events. The proportion of C>T transitions at CpG sites increased during tumor progression. M-seq permits the temporal resolution of ccRCC evolution and refines mutational signatures occurring during tumor development. PMID:24487277

  5. Evolutionary process of deep-sea Bathymodiolus mussels.

    PubMed

    Miyazaki, Jun-Ichi; de Oliveira Martins, Leonardo; Fujita, Yuko; Matsumoto, Hiroto; Fujiwara, Yoshihiro

    2010-04-27

    Since the discovery of deep-sea chemosynthesis-based communities, much work has been done to clarify their organismal and environmental aspects. However, major topics remain to be resolved, including when and how organisms invade and adapt to deep-sea environments; whether strategies for invasion and adaptation are shared by different taxa or unique to each taxon; how organisms extend their distribution and diversity; and how they become isolated to speciate in continuous waters. Deep-sea mussels are one of the dominant organisms in chemosynthesis-based communities, thus investigations of their origin and evolution contribute to resolving questions about life in those communities. We investigated worldwide phylogenetic relationships of deep-sea Bathymodiolus mussels and their mytilid relatives by analyzing nucleotide sequences of the mitochondrial cytochrome c oxidase subunit I (COI) and NADH dehydrogenase subunit 4 (ND4) genes. Phylogenetic analysis of the concatenated sequence data showed that mussels of the subfamily Bathymodiolinae from vents and seeps were divided into four groups, and that mussels of the subfamily Modiolinae from sunken wood and whale carcasses assumed the outgroup position and shallow-water modioline mussels were positioned more distantly to the bathymodioline mussels. We provisionally hypothesized the evolutionary history of Bathymodiolus mussels by estimating evolutionary time under a relaxed molecular clock model. Diversification of bathymodioline mussels was initiated in the early Miocene, and subsequently diversification of the groups occurred in the early to middle Miocene. The phylogenetic relationships support the "Evolutionary stepping stone hypothesis," in which mytilid ancestors exploited sunken wood and whale carcasses in their progressive adaptation to deep-sea environments. This hypothesis is also supported by the evolutionary transition of symbiosis in that nutritional adaptation to the deep sea proceeded from extracellular to intracellular symbiotic states in whale carcasses. The estimated evolutionary time suggests that the mytilid ancestors were able to exploit whales during adaptation to the deep sea.

  6. Agonal sequences in 14 filmed hangings with comments on the role of the type of suspension, ischemic habituation, and ethanol intoxication on the timing of agonal responses.

    PubMed

    Sauvageau, Anny; Laharpe, Romano; King, David; Dowling, Graeme; Andrews, Sam; Kelly, Sean; Ambrosi, Corinne; Guay, Jean-Pierre; Geberth, Vernon J

    2011-06-01

    The Working Group on Human Asphyxia has analyzed 14 filmed hangings: 9 autoerotic accidents, 4 suicides, and 1 homicide. The following sequence of agonal responses was observed: rapid loss of consciousness in 10 ± 3 seconds, mild generalized convulsions in 14 ± 3 seconds, decerebrate rigidity in 19 ± 5 seconds, beginning of deep rhythmic abdominal respiratory movements in 19 ± 5 seconds, decorticate rigidity in 38 ± 15 seconds, loss of muscle tone in 1 minute 17 seconds ± 25 seconds, end of deep abdominal respiratory movements in 1 minute 51 seconds ± 30 seconds, and last muscle movement in 4 minutes 12 seconds ± 2 minutes 29 seconds. The type of suspension and ethanol intoxication does not seem to influence the timing of the agonal responses, whereas ischemic habituation in autoerotic practitioner might decelerate the late responses to hanging.

  7. Deep sequencing of Salmonella RNA associated with heterologous Hfq proteins in vivo reveals small RNAs as a major target class and identifies RNA processing phenotypes.

    PubMed

    Sittka, Alexandra; Sharma, Cynthia M; Rolle, Katarzyna; Vogel, Jörg

    2009-01-01

    The bacterial Sm-like protein, Hfq, is a key factor for the stability and function of small non-coding RNAs (sRNAs) in Escherichia coli. Homologues of this protein have been predicted in many distantly related organisms yet their functional conservation as sRNA-binding proteins has not entirely been clear. To address this, we expressed in Salmonella the Hfq proteins of two eubacteria (Neisseria meningitidis, Aquifex aeolicus) and an archaeon (Methanocaldococcus jannaschii), and analyzed the associated RNA by deep sequencing. This in vivo approach identified endogenous Salmonella sRNAs as a major target of the foreign Hfq proteins. New Salmonella sRNA species were also identified, and some of these accumulated specifically in the presence of a foreign Hfq protein. In addition, we observed specific RNA processing defects, e.g., suppression of precursor processing of SraH sRNA by Methanocaldococcus Hfq, or aberrant accumulation of extracytoplasmic target mRNAs of the Salmonella GcvB, MicA or RybB sRNAs. Taken together, our study provides evidence of a conserved inherent sRNA-binding property of Hfq, which may facilitate the lateral transmission of regulatory sRNAs among distantly related species. It also suggests that the expression of heterologous RNA-binding proteins combined with deep sequencing analysis of RNA ligands can be used as a molecular tool to dissect individual steps of RNA metabolism in vivo.

  8. Fungal communities from the calcareous deep-sea sediments in the Southwest India Ridge revealed by Illumina sequencing technology.

    PubMed

    Zhang, Likui; Kang, Manyu; Huang, Yangchao; Yang, Lixiang

    2016-05-01

    The diversity and ecological significance of bacteria and archaea in deep-sea environments have been thoroughly investigated, but eukaryotic microorganisms in these areas, such as fungi, are poorly understood. To elucidate fungal diversity in calcareous deep-sea sediments in the Southwest India Ridge (SWIR), the internal transcribed spacer (ITS) regions of rRNA genes from two sediment metagenomic DNA samples were amplified and sequenced using the Illumina sequencing platform. The results revealed that 58-63 % and 36-42 % of the ITS sequences (97 % similarity) belonged to Basidiomycota and Ascomycota, respectively. These findings suggest that Basidiomycota and Ascomycota are the predominant fungal phyla in the two samples. We also found that Agaricomycetes, Leotiomycetes, and Pezizomycetes were the major fungal classes in the two samples. At the species level, Thelephoraceae sp. and Phialocephala fortinii were major fungal species in the two samples. Despite the low relative abundance, unidentified fungal sequences were also observed in the two samples. Furthermore, we found that there were slight differences in fungal diversity between the two sediment samples, although both were collected from the SWIR. Thus, our results demonstrate that calcareous deep-sea sediments in the SWIR harbor diverse fungi, which augment the fungal groups in deep-sea sediments. This is the first report of fungal communities in calcareous deep-sea sediments in the SWIR revealed by Illumina sequencing.

  9. Position-specific automated processing of V3 env ultra-deep pyrosequencing data for predicting HIV-1 tropism

    PubMed Central

    Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre

    2015-01-01

    HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows the detection of low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing are rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used, but the errors generated during ultra-deep pyrosequencing are sequence-dependent rather than random. We have developed an automated processing method for HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It retains authentic sequences with point mutations at V3 positions of interest and discards artifactual ones using accurate sensitivity thresholds. PMID:26585833
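
    The contrast between a fixed cut-off and position-specific sensitivity thresholds can be made concrete with a small sketch. The abstract does not state which statistical filter the authors use, so the following is an assumption for illustration only: sequencing errors at each position are modeled as binomial with a position-specific error rate (for example, estimated from a clonal control), and a variant is accepted only if its read count would be unlikely under that error model. Function names and the significance level are hypothetical.

```python
from typing import Dict


def min_count_threshold(depth: int, error_rate: float, alpha: float = 0.001) -> int:
    """Smallest variant read count k such that P(X >= k) < alpha when
    X ~ Binomial(depth, error_rate); counts below k are treated as noise."""
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    pmf = (1.0 - error_rate) ** depth   # P(X = 0)
    tail = 1.0                          # P(X >= 0)
    k = 0
    while tail >= alpha and k <= depth:
        tail -= pmf                     # tail is now P(X >= k + 1)
        # Recurrence for the binomial probability mass: P(X = k + 1) from P(X = k).
        pmf *= (depth - k) / (k + 1) * error_rate / (1.0 - error_rate)
        k += 1
    return k


def position_thresholds(depth: int, error_rates: Dict[int, float],
                        alpha: float = 0.001) -> Dict[int, float]:
    """Per-position minimum detectable variant frequency."""
    return {pos: min_count_threshold(depth, e, alpha) / depth
            for pos, e in error_rates.items()}
```

    With this model, an error-prone position gets a higher frequency threshold than a clean one at the same depth, whereas a single fixed cut-off would be either too permissive for the first or too strict for the second.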

  10. Position-specific automated processing of V3 env ultra-deep pyrosequencing data for predicting HIV-1 tropism.

    PubMed

    Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre

    2015-11-20

    HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows the detection of low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing are rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used, but the errors generated during ultra-deep pyrosequencing are sequence-dependent rather than random. We have developed an automated processing method for HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It retains authentic sequences with point mutations at V3 positions of interest and discards artifactual ones using accurate sensitivity thresholds.

  11. Accurate identification of RNA editing sites from primitive sequence with deep neural networks.

    PubMed

    Ouyang, Zhangyi; Liu, Feng; Zhao, Chenghui; Ren, Chao; An, Gaole; Mei, Chuan; Bo, Xiaochen; Shu, Wenjie

    2018-04-16

    RNA editing is a post-transcriptional RNA sequence alteration. Current methods have identified editing sites and facilitated research but require sufficient genomic annotations and prior-knowledge-based filtering steps, resulting in a cumbersome, time-consuming identification process. Moreover, these methods have limited generalizability and applicability in species with insufficient genomic annotations or in conditions of limited prior knowledge. We developed DeepRed, a deep learning-based method that identifies RNA editing from primitive RNA sequences without prior-knowledge-based filtering steps or genomic annotations. DeepRed achieved 98.1% and 97.9% area under the curve (AUC) in training and test sets, respectively. We further validated DeepRed using experimentally verified U87 cell RNA-seq data, achieving 97.9% positive predictive value (PPV). We demonstrated that DeepRed offers better prediction accuracy and computational efficiency than current methods with large-scale, mass RNA-seq data. We used DeepRed to assess the impact of multiple factors on editing identification with RNA-seq data from the Association of Biomolecular Resource Facilities and Sequencing Quality Control projects. We explored developmental RNA editing pattern changes during human early embryogenesis and evolutionary patterns in Drosophila species and the primate lineage using DeepRed. Our work illustrates DeepRed's state-of-the-art performance; it may decipher the hidden principles behind RNA editing, making editing detection convenient and effective.

  12. Deep learning of orthographic representations in baboons.

    PubMed

    Hannagan, Thomas; Ziegler, Johannes C; Dufau, Stéphane; Fagot, Joël; Grainger, Jonathan

    2014-01-01

    What is the origin of our ability to learn orthographic knowledge? We use deep convolutional networks to emulate the primate's ventral visual stream and explore the recent finding that baboons can be trained to discriminate English words from nonwords. The networks were exposed to the exact same sequence of stimuli and reinforcement signals as the baboons in the experiment, and learned to map real visual inputs (pixels) of letter strings onto binary word/nonword responses. We show that the networks' highest levels of representations were indeed sensitive to letter combinations as postulated in our previous research. The model also captured the key empirical findings, such as generalization to novel words, along with some intriguing inter-individual differences. The present work shows the merits of deep learning networks that can simulate the whole processing chain all the way from the visual input to the response while allowing researchers to analyze the complex representations that emerge during the learning process.

  13. A deep learning method for lincRNA detection using auto-encoder algorithm.

    PubMed

    Yu, Ning; Yu, Zeng; Pan, Yi

    2017-12-06

    The RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by the new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, the relevance exists along the DNA sequences; (2) lincRNAs contain some conserved patterns/motifs tethered together by non-conserved regions. These two lines of evidence provide the rationale for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in the biological process as regulatory units instead of generating proteins. Identifying these transcriptional regions from non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of the predictive deep neural network on the lincRNA data sets compared with a support vector machine and a traditional neural network. In addition, it is validated through the newly discovered lincRNA data set, and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that the deep learning method has extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data. Driven by those newly annotated lincRNA data, deep learning methods based on the auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. To our knowledge, this is the first application to adopt deep learning techniques for identifying lincRNA transcription sequences.

  14. CPTAC Releases Largest-Ever Ovarian Cancer Proteome Dataset from Previously Genome Characterized Tumors | Office of Cancer Clinical Proteomics Research

    Cancer.gov

    National Cancer Institute (NCI) Clinical Proteomic Tumor Analysis Consortium (CPTAC) scientists have just released a comprehensive dataset of the proteomic analysis of high grade serous ovarian tumor samples, previously genomically analyzed by The Cancer Genome Atlas (TCGA).  This is one of the largest public datasets covering the proteome, phosphoproteome and glycoproteome with complementary deep genomic sequencing data on the same tumor.

  15. Monitoring therapy responses at the leukemic subclone level by ultra-deep amplicon resequencing in acute myeloid leukemia.

    PubMed

    Ojamies, P N; Kontro, M; Edgren, H; Ellonen, P; Lagström, S; Almusa, H; Miettinen, T; Eldfors, S; Tamborero, D; Wennerberg, K; Heckman, C; Porkka, K; Wolf, M; Kallioniemi, O

    2017-05-01

    In our individualized systems medicine program, personalized treatment options are identified and administered to chemorefractory acute myeloid leukemia (AML) patients based on exome sequencing and ex vivo drug sensitivity and resistance testing data. Here, we analyzed how clonal heterogeneity affects the responses of 13 AML patients to chemotherapy or targeted treatments using ultra-deep (average 68 000 × coverage) amplicon resequencing. Using amplicon resequencing, we identified 16 variants from 4 patients (frequency 0.54-2%) that were not detected previously by exome sequencing. A correlation-based method was developed to detect mutation-specific responses in serial samples across multiple time points. Significant subclone-specific responses were observed for both chemotherapy and targeted therapy. We detected subclonal responses in patients where clinical European LeukemiaNet (ELN) criteria showed no response. Subclonal responses also helped to identify putative mechanisms underlying drug sensitivities, such as sensitivity to azacitidine in DNMT3A mutated cell clones and resistance to cytarabine in a subclone with loss of NF1 gene. In summary, ultra-deep amplicon resequencing method enables sensitive quantification of subclonal variants and their responses to therapies. This approach provides new opportunities for designing combinatorial therapies blocking multiple subclones as well as for real-time assessment of such treatments.

  16. ampliMethProfiler: a pipeline for the analysis of CpG methylation profiles of targeted deep bisulfite sequenced amplicons.

    PubMed

    Scala, Giovanni; Affinito, Ornella; Palumbo, Domenico; Florio, Ermanno; Monticelli, Antonella; Miele, Gennaro; Chiariotti, Lorenzo; Cocozza, Sergio

    2016-11-25

    CpG sites in an individual molecule may exist in a binary state (methylated or unmethylated) and each individual DNA molecule, containing a certain number of CpGs, is a combination of these states defining an epihaplotype. Classic quantification-based approaches to study DNA methylation are intrinsically unable to fully represent the complexity of the underlying methylation substrate. Epihaplotype-based approaches, on the other hand, allow methylation profiles of cell populations to be studied at the single-molecule level. For such investigations, next-generation sequencing techniques can be used, both for quantitative and for epihaplotype analysis. Currently available tools for methylation analysis lack output formats that explicitly report CpG methylation profiles at the single-molecule level and that come with suitable statistical tools for their interpretation. Here we present ampliMethProfiler, a Python-based pipeline for the extraction and statistical epihaplotype analysis of amplicons from targeted deep bisulfite sequencing of multiple DNA regions. The ampliMethProfiler tool provides an easy and user-friendly way to extract and analyze the epihaplotype composition of reads from targeted bisulfite sequencing experiments. ampliMethProfiler is written in Python and requires a local installation of BLAST and (optionally) QIIME tools. It can be run on Linux and OS X platforms. The software is open source and freely available at http://amplimethprofiler.sourceforge.net.
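
    The notion of an epihaplotype as a combination of binary CpG states can be sketched as follows. This is not ampliMethProfiler's code or interface; it is a hypothetical, minimal example that assumes reads are already aligned to the amplicon without gaps, are reported on the reference strand, and follow standard bisulfite logic (a retained C at a CpG means methylated, a C converted to T means unmethylated).

```python
from collections import Counter
from typing import Iterable, List


def cpg_positions(reference: str) -> List[int]:
    """0-based positions of the C in every CG dinucleotide of the reference."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def epihaplotype(read: str, positions: List[int]) -> str:
    """Encode one bisulfite read as a string of 1 (methylated, C retained)
    and 0 (unmethylated, C converted to T); '?' marks any other base."""
    code = {"C": "1", "T": "0"}
    return "".join(code.get(read[p], "?") for p in positions)


def epihaplotype_counts(reads: Iterable[str], reference: str) -> Counter:
    """Count epihaplotypes over reads that are gap-free and span the amplicon."""
    positions = cpg_positions(reference)
    counts: Counter = Counter()
    for read in reads:
        if len(read) != len(reference):
            continue  # assumption: only full-length, ungapped reads are used
        profile = epihaplotype(read, positions)
        if "?" not in profile:
            counts[profile] += 1
    return counts
```

    An amplicon with n CpG sites has up to 2^n possible epihaplotypes, which is why counting whole-read profiles carries more information than the average methylation at each site.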

  17. Selective 2'-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis.

    PubMed

    Smola, Matthew J; Rice, Greggory M; Busan, Steven; Siegfried, Nathan A; Weeks, Kevin M

    2015-11-01

    Selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistries exploit small electrophilic reagents that react with 2'-hydroxyl groups to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues by using reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as can be done for simple model RNAs. This protocol describes the experimental steps, implemented over 3 d, that are required to perform SHAPE probing and to construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots and provides useful troubleshooting information. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures and visualize probable and alternative helices, often in under 1 d. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles and entire transcriptomes.

  18. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    USDA-ARS's Scientific Manuscript database

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...

  19. Detection of Emerging Vaccine-Related Polioviruses by Deep Sequencing.

    PubMed

    Sahoo, Malaya K; Holubar, Marisa; Huang, ChunHong; Mohamed-Hadley, Alisha; Liu, Yuanyuan; Waggoner, Jesse J; Troy, Stephanie B; Garcia-Garcia, Lourdes; Ferreyra-Reyes, Leticia; Maldonado, Yvonne; Pinsky, Benjamin A

    2017-07-01

    Oral poliovirus vaccine can mutate to regain neurovirulence. To date, evaluation of these mutations has been performed primarily on culture-enriched isolates by using conventional Sanger sequencing. We therefore developed a culture-independent, deep-sequencing method targeting the 5' untranslated region (UTR) and P1 genomic region to characterize vaccine-related poliovirus variants. Error analysis of the deep-sequencing method demonstrated reliable detection of poliovirus mutations at levels of <1%, depending on read depth. Sequencing of viral nucleic acids from the stool of vaccinated, asymptomatic children and their close contacts collected during a prospective cohort study in Veracruz, Mexico, revealed no vaccine-derived polioviruses. This was expected given that the longest duration between sequenced sample collection and the end of the most recent national immunization week was 66 days. However, we identified many low-level variants (<5%) distributed across the 5' UTR and P1 genomic region in all three Sabin serotypes, as well as vaccine-related viruses with multiple canonical mutations associated with phenotypic reversion present at high levels (>90%). These results suggest that monitoring emerging vaccine-related poliovirus variants by deep sequencing may aid in the poliovirus endgame and efforts to ensure global polio eradication. Copyright © 2017 Sahoo et al.

  20. Rational Protein Engineering Guided by Deep Mutational Scanning

    PubMed Central

    Shin, HyeonSeok; Cho, Byung-Kwan

    2015-01-01

    Sequence–function relationship in a protein is commonly determined by the three-dimensional protein structure followed by various biochemical experiments. However, with the explosive increase in the number of genome sequences, facilitated by recent advances in sequencing technology, the gap between protein sequences available and three-dimensional structures is rapidly widening. A recently developed method termed deep mutational scanning explores the functional phenotype of thousands of mutants via massive sequencing. Coupled with a highly efficient screening system, this approach assesses the phenotypic changes made by the substitution of each amino acid sequence that constitutes a protein. Such an informational resource provides the functional role of each amino acid sequence, thereby providing sufficient rationale for selecting target residues for protein engineering. Here, we discuss the current applications of deep mutational scanning and consider experimental design. PMID:26404267

  1. Mitochondrial sequences of Seriatopora corals show little agreement with morphology and reveal the duplication of a tRNA gene near the control region

    NASA Astrophysics Data System (ADS)

    Flot, J.-F.; Licuanan, W. Y.; Nakano, Y.; Payri, C.; Cruaud, C.; Tillier, S.

    2008-12-01

    The taxonomy of corals of the genus Seriatopora has not previously been studied using molecular sequence markers. As a first step toward a re-evaluation of species boundaries in this genus, mitochondrial sequence variability was analyzed in 51 samples collected from Okinawa, New Caledonia, and the Philippines. Four clusters of sequences were detected that showed little concordance with species currently recognized on a morphological basis. The most likely explanation is that the skeletal characters used for species identification are highly variable (polymorphic or phenotypically plastic); alternative explanations include introgression/hybridization, or deep coalescence and the retention of ancestral mitochondrial polymorphisms. In all individuals sequenced, two copies of trnW were found on either side of the atp8 gene near the putative D-loop, a novel mitochondrial gene arrangement that may have arisen from a duplication of the trnW-atp8 region followed by a deletion of one atp8.

  2. DNA Replication Profiling Using Deep Sequencing.

    PubMed

    Saayman, Xanita; Ramos-Pérez, Cristina; Brown, Grant W

    2018-01-01

    Profiling of DNA replication during progression through S phase allows a quantitative snap-shot of replication origin usage and DNA replication fork progression. We present a method for using deep sequencing data to profile DNA replication in S. cerevisiae.

  3. DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.

    PubMed

    Arango-Argoty, Gustavo; Garner, Emily; Pruden, Amy; Heath, Lenwood S; Vikesland, Peter; Zhang, Liqing

    2018-02-01

    Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Advancing methods for monitoring of environmental media (e.g., wastewater, agricultural waste, food, and water) is especially needed for identifying potential resources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and as pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access and profiling of the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the "best hits" of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach, taking into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively. Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models' performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories. The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. The DeepARG models and database are available as a command line version and as a Web service at http://bench.cs.vt.edu/deeparg.

  4. Understanding the complex evolution of rapidly mutating viruses with deep sequencing: Beyond the analysis of viral diversity.

    PubMed

    Leung, Preston; Eltahla, Auda A; Lloyd, Andrew R; Bull, Rowena A; Luciani, Fabio

    2017-07-15

    With the advent of affordable deep sequencing technologies, detection of low frequency variants within genetically diverse viral populations can now be achieved with unprecedented depth and efficiency. The high-resolution data provided by next generation sequencing technologies is currently recognised as the gold standard in estimation of viral diversity. In the analysis of rapidly mutating viruses, longitudinal deep sequencing datasets from viral genomes during individual infection episodes, as well as at the epidemiological level during outbreaks, now allow for more sophisticated analyses such as statistical estimates of the impact of complex mutation patterns on the evolution of the viral populations both within and between hosts. These analyses are revealing more accurate descriptions of the evolutionary dynamics that underpin the rapid adaptation of these viruses to the host response, and to drug therapies. This review assesses recent developments in methods and provides informative research examples using deep sequencing data generated from rapidly mutating viruses infecting humans, particularly hepatitis C virus (HCV), human immunodeficiency virus (HIV), Ebola virus and influenza virus, to understand the evolution of viral genomes and to explore the relationship between viral mutations and the host adaptive immune response. Finally, we discuss limitations in current technologies, and future directions that take advantage of publicly available large deep sequencing datasets. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.

    PubMed

    Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

    2017-01-01

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering that recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.
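
    The saliency-map idea (first-order derivatives of the prediction with respect to each input nucleotide) can be illustrated with a toy model. The sketch below is not the DeMo Dashboard code: the one-layer logistic scorer, its weights, and the function names are hypothetical, and the derivative is taken by central finite differences, whereas a real DNN would use automatic differentiation.

```python
import math

NUCS = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if n == c else 0.0 for c in NUCS] for n in seq]


def toy_score(x, weights):
    """Hypothetical stand-in for a trained TFBS classifier:
    a sigmoid over a position-specific weighted sum."""
    z = sum(w * v for wrow, xrow in zip(weights, x) for w, v in zip(wrow, xrow))
    return 1.0 / (1.0 + math.exp(-z))


def saliency(seq, weights, eps=1e-5):
    """Absolute first-order derivative of the score with respect to the
    input entry of the observed nucleotide at each position."""
    x = one_hot(seq)
    result = []
    for i, n in enumerate(seq):
        j = NUCS.index(n)
        original = x[i][j]
        x[i][j] = original + eps
        up = toy_score(x, weights)
        x[i][j] = original - eps
        down = toy_score(x, weights)
        x[i][j] = original  # restore the input before the next position
        result.append(abs((up - down) / (2.0 * eps)))
    return result


# Illustrative weights: position 1 strongly favours 'A'.
weights = [[0.1] * 4, [2.0, 0.0, 0.0, 0.0], [0.1] * 4]
sal = saliency("CAG", weights)
```

    In this toy case the middle position receives the largest saliency because the score changes fastest when its input entry is perturbed.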

  6. DNA barcoding reveals seasonal shifts in diet and consumption of deep-sea fishes in wedge-tailed shearwaters

    PubMed Central

    Ando, Haruko; Horikoshi, Kazuo; Suzuki, Hajime; Isagi, Yuji

    2018-01-01

    The foraging ecology of pelagic seabirds is difficult to characterize because of their large foraging areas. In the face of this difficulty, DNA metabarcoding may be a useful approach to analyze diet compositions and foraging behaviors. Using this approach, we investigated the diet composition and its seasonal variation of a common seabird species on the Ogasawara Islands, Japan: the wedge-tailed shearwater Ardenna pacifica. We collected fecal samples during the prebreeding (N = 73) and rearing (N = 96) periods. The diet composition of wedge-tailed shearwater was analyzed by Ion Torrent sequencing using two universal polymerase chain reaction primers for the 12S and 16S mitochondrial DNA regions that targeted vertebrates and mollusks, respectively. The results of a BLAST search of obtained sequences detected 31 and 1 vertebrate and mollusk taxa, respectively. The results of the diet composition analysis showed that wedge-tailed shearwaters frequently consumed deep-sea fishes throughout the sampling season, indicating the importance of these fishes as a stable food resource. However, there was a marked seasonal shift in diet, which may reflect seasonal changes in food resource availability and wedge-tailed shearwater foraging behavior. The collected data regarding the shearwater diet may be useful for in situ conservation efforts. Future research that combines DNA metabarcoding with other tools, such as data logging, may provide further insight into the foraging ecology of pelagic seabirds. PMID:29630670

  7. Improved detection of CXCR4-using HIV by V3 genotyping: application of population-based and "deep" sequencing to plasma RNA and proviral DNA.

    PubMed

    Swenson, Luke C; Moores, Andrew; Low, Andrew J; Thielen, Alexander; Dong, Winnie; Woods, Conan; Jensen, Mark A; Wynhoven, Brian; Chan, Dennison; Glascock, Christopher; Harrigan, P Richard

    2010-08-01

    Tropism testing should rule out CXCR4-using HIV before treatment with CCR5 antagonists. Currently, the recombinant phenotypic Trofile assay (Monogram) is most widely utilized; however, genotypic tests may represent alternative methods. Independent triplicate amplifications of the HIV gp120 V3 region were made from either plasma HIV RNA or proviral DNA. These underwent standard, population-based sequencing with an ABI3730 (RNA n = 63; DNA n = 40), or "deep" sequencing with a Roche/454 Genome Sequencer-FLX (RNA n = 12; DNA n = 12). Position-specific scoring matrices (PSSMX4/R5) (-6.96 cutoff) and geno2pheno[coreceptor] (5% false-positive rate) inferred tropism from V3 sequence. These methods were then independently validated with a separate, blinded dataset (n = 278) of screening samples from the maraviroc MOTIVATE trials. Standard sequencing of HIV RNA with PSSM yielded 69% sensitivity and 91% specificity, relative to Trofile. The validation dataset gave 75% sensitivity and 83% specificity. Proviral DNA plus PSSM gave 77% sensitivity and 71% specificity. "Deep" sequencing of HIV RNA detected >2% inferred-CXCR4-using virus in 8/8 samples called non-R5 by Trofile, and <2% in 4/4 samples called R5. Triplicate analyses of V3 standard sequence data detect greater proportions of CXCR4-using samples than previously achieved. Sequencing proviral DNA and "deep" V3 sequencing may also be useful tools for assessing tropism.
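
    A small Python sketch of the two computations described above: calling a sample from deep-sequencing reads, and scoring calls against a reference assay. It assumes that a PSSM score above the cutoff marks a read as CXCR4-using, and it reuses the -6.96 cutoff and the 2% threshold quoted in the abstract; the function names and toy data are hypothetical.

```python
PSSM_CUTOFF = -6.96      # cutoff quoted in the abstract
MIN_X4_FRACTION = 0.02   # ">2% inferred CXCR4-using virus"


def call_tropism(read_scores):
    """Call a sample from per-read PSSM scores.

    Assumption: a score above the cutoff means the read is inferred
    to be CXCR4-using.
    """
    x4_reads = sum(score > PSSM_CUTOFF for score in read_scores)
    fraction = x4_reads / len(read_scores)
    call = "non-R5" if fraction > MIN_X4_FRACTION else "R5"
    return call, fraction


def sensitivity_specificity(calls, reference, positive="non-R5"):
    """Sensitivity and specificity of calls relative to a reference assay."""
    tp = sum(c == positive and r == positive for c, r in zip(calls, reference))
    fn = sum(c != positive and r == positive for c, r in zip(calls, reference))
    tn = sum(c != positive and r != positive for c, r in zip(calls, reference))
    fp = sum(c == positive and r != positive for c, r in zip(calls, reference))
    return tp / (tp + fn), tn / (tn + fp)


# Toy samples: 3% and 1% of reads score above the cutoff.
call_a, frac_a = call_tropism([-10.0] * 97 + [-3.0] * 3)
call_b, frac_b = call_tropism([-10.0] * 99 + [-3.0] * 1)

# Toy comparison of eight genotypic calls against a reference assay.
reference = ["non-R5"] * 4 + ["R5"] * 4
calls = ["non-R5", "non-R5", "non-R5", "R5", "R5", "R5", "R5", "non-R5"]
sens, spec = sensitivity_specificity(calls, reference)
```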

  8. Characterization by Deep Sequencing of Prunus virus T, a Novel Tepovirus Infecting Prunus Species.

    PubMed

    Marais, Armelle; Faure, Chantal; Mustafayev, Eldar; Barone, Maria; Alioto, Daniela; Candresse, Thierry

    2015-01-01

    Double-stranded RNAs purified from a cherry tree collected in Italy and a plum tree collected in Azerbaijan were submitted to deep sequencing. Contigs showing weak but significant identity with various members of the family Betaflexiviridae were reconstructed. Sequence comparisons led to the conclusion that the viral isolates identified in the analyzed Prunus plants belong to the same viral species. Their genome organization is similar to that of some members of the family Betaflexiviridae, with three overlapping open reading frames (RNA polymerase, movement protein, and capsid protein). Phylogenetic analyses of the deduced encoded proteins showed a clustering with the sole member of the genus Tepovirus, Potato virus T (PVT). Given these results, the name Prunus virus T (PrVT) is proposed for the new virus. It should be considered as a new member of the genus Tepovirus, even if the level of nucleotide identity with PVT is borderline with the genus demarcation criteria for the family Betaflexiviridae. A reverse-transcription polymerase chain reaction detection assay was developed and allowed the identification of two other PrVT isolates and an estimate of 1% prevalence in the large Prunus collection screened. Due to the mixed infection status of all hosts identified to date, it was not possible to correlate the presence of PrVT with specific symptoms.

  9. Clinical utility of circulating tumor DNA for molecular assessment in pancreatic cancer.

    PubMed

    Takai, Erina; Totoki, Yasushi; Nakamura, Hiromi; Morizane, Chigusa; Nara, Satoshi; Hama, Natsuko; Suzuki, Masami; Furukawa, Eisaku; Kato, Mamoru; Hayashi, Hideyuki; Kohno, Takashi; Ueno, Hideki; Shimada, Kazuaki; Okusaka, Takuji; Nakagama, Hitoshi; Shibata, Tatsuhiro; Yachida, Shinichi

    2015-12-16

    Pancreatic ductal adenocarcinoma (PDAC) remains one of the most lethal malignancies. The genomic landscape of the PDAC genome features four frequently mutated genes (KRAS, CDKN2A, TP53, and SMAD4) and dozens of candidate driver genes altered at low frequency, including potential clinical targets. Circulating cell-free DNA (cfDNA) is a promising resource to detect and monitor molecular characteristics of tumors. In the present study, we determined the mutational status of KRAS in plasma cfDNA using multiplex picoliter-droplet digital PCR in 259 patients with PDAC. We constructed a novel modified SureSelect-KAPA-Illumina platform and an original panel of 60 genes. We then performed targeted deep sequencing of cfDNA and matched germline DNA samples in 48 patients who had ≥1% mutant allele frequencies of KRAS in plasma cfDNA. Importantly, potentially targetable somatic mutations were identified in 14 of 48 patients (29.2%) examined by targeted deep sequencing of cfDNA. We also analyzed somatic copy number alterations based on the targeted sequencing data using our in-house algorithm, and potentially targetable amplifications were detected. Assessment of mutations and copy number alterations in plasma cfDNA may provide a prognostic and diagnostic tool to assist decisions regarding optimal therapeutic strategies for PDAC patients.

  10. Deep neural models for ICD-10 coding of death certificates and autopsy reports in free-text.

    PubMed

    Duarte, Francisco; Martins, Bruno; Pinto, Cátia Sousa; Silva, Mário J

    2018-04-01

    We address the assignment of ICD-10 codes for causes of death by analyzing free-text descriptions in death certificates, together with the associated autopsy reports and clinical bulletins, from the Portuguese Ministry of Health. We leverage a deep neural network that combines word embeddings, recurrent units, and neural attention, for the generation of intermediate representations of the textual contents. The neural network also explores the hierarchical nature of the input data, by building representations from the sequences of words within individual fields, which are then combined according to the sequences of fields that compose the inputs. Moreover, we explore innovative mechanisms for initializing the weights of the final nodes of the network, leveraging co-occurrences between classes together with the hierarchical structure of ICD-10. Experimental results attest to the contribution of the different neural network components. Our best model achieves accuracy scores over 89%, 81%, and 76%, respectively for ICD-10 chapters, blocks, and full-codes. Through examples, we also show that our method can produce interpretable results, useful for public health surveillance. Copyright © 2018 Elsevier Inc. All rights reserved.

  11. Bacteria Community in the Terrestrial Deep Subsurface Microbiology Research of the Chinese Continent Scientific Drilling

    NASA Astrophysics Data System (ADS)

    Wang, Y.; Xia, Y.; Dong, H.; Dong, X.; Yang, K.; Dong, Z.; Huang, L.

    2005-12-01

    Microbial communities in the deep drill cores from the Chinese Continent Scientific Drilling were analyzed with culture-independent and dependent techniques. Genomic DNA was extracted from two metamorphic rocks: S1 from 430 and S13 from 1033 meters below the ground surface. The 16S rRNA gene was amplified by polymerase chain reaction (PCR) followed by cloning and sequencing. The total cell number was counted using 4',6-diamidino-2-phenylindole (DAPI) staining, and the biomass of two specific bacteria was quantified using real-time PCR. Enrichment was set up for a rock from 3911 meters below the surface in medium for autotrophic methanogens (i.e., CO2 + H2). The total cell number in S13 was 1.0 × 10^4 cells per gram of rock. 16S rRNA gene analysis indicated that low G + C Gram-positive sequences were dominant (50 percent of all 54 clones sequenced), followed by the alpha-, beta-, and gamma-Proteobacteria. Within the low G + C Gram-positive bacteria, most clone sequences were similar to species of Bacillus from various natural environments (deserts, rivers, etc.). Within the Proteobacteria, our clone sequences were similar to species of Acinetobacter, Acidovorax, and Aeromonas. The real-time PCR results showed that the biomass of two particular clone sequences (CCSD1305, similar to Aeromonas caviae, and CCSD1307, similar to Acidovorax facilis) was 95 and 1258 cells/g, respectively. A bacterial isolate was obtained from the 3911-m rock in methanogenic medium. It was Gram-negative with no flagella, immobile, and facultatively anaerobic, and grew optimally at 65 °C. Phylogenetic analysis indicated that it was closely related to the genus Bacillus. Physiological tests further revealed that it was a strain of Bacillus caldotenax.

  12. Enhanced sensitivity for detection of low-level germline mosaic RB1 mutations in sporadic retinoblastoma cases using deep semiconductor sequencing.

    PubMed

    Chen, Zhao; Moran, Kimberly; Richards-Yutz, Jennifer; Toorens, Erik; Gerhart, Daniel; Ganguly, Tapan; Shields, Carol L; Ganguly, Arupa

    2014-03-01

    Sporadic retinoblastoma (RB) is caused by de novo mutations in the RB1 gene. Often, these mutations are present as mosaic mutations that cannot be detected by Sanger sequencing. Next-generation deep sequencing allows unambiguous detection of the mosaic mutations in lymphocyte DNA. Deep sequencing of the RB1 gene on lymphocyte DNA from 20 bilateral and 70 unilateral RB cases was performed, where Sanger sequencing excluded the presence of mutations. The individual exons of the RB1 gene from each sample were amplified, pooled, ligated to barcoded adapters, and sequenced using semiconductor sequencing on an Ion Torrent Personal Genome Machine. Six low-level mosaic mutations were identified in bilateral RB and four in unilateral RB cases. The incidence of low-level mosaic mutation was estimated to be 30% and 6%, respectively, in sporadic bilateral and unilateral RB cases, previously classified as mutation negative. The frequency of point mutations detectable in lymphocyte DNA increased from 96% to 97% for bilateral RB and from 13% to 18% for unilateral RB. The use of deep sequencing technology increased the sensitivity of the detection of low-level germline mosaic mutations in the RB1 gene. This finding has significant implications for improved clinical diagnosis, genetic counseling, surveillance, and management of RB. © 2013 WILEY PERIODICALS, INC.
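
    The reported incidences follow directly from the counts given in the abstract (six mosaic mutations among 20 bilateral cases, four among 70 unilateral cases). A short Python check, with a hypothetical helper name:

```python
def incidence_percent(mosaic_cases, screened_cases):
    """Percentage of screened cases carrying a low-level mosaic mutation."""
    return 100.0 * mosaic_cases / screened_cases


bilateral = incidence_percent(6, 20)    # cases negative by Sanger sequencing
unilateral = incidence_percent(4, 70)

summary = f"bilateral {bilateral:.0f}%, unilateral {unilateral:.0f}%"
print(summary)
```

    The unilateral figure is 5.7% before rounding, which matches the 6% quoted.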

  13. Draft Genome Sequence of Deep-Sea Alteromonas sp. Strain V450 Isolated from the Marine Sponge Leiodermatium sp.

    PubMed Central

    Barrett, Nolan H.; McCarthy, Peter J.

    2017-01-01

    ABSTRACT The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. PMID:28153886

  14. The thickness of cover sequences in the Western Desert of Iraq from a power spectrum analysis of gravity and magnetic data

    NASA Astrophysics Data System (ADS)

    Mousa, Ahmed; Mickus, Kevin; Al-Rahim, Ali

    2017-05-01

    The Western Desert of Iraq is part of the stable shelf region on the Arabian Plate where the subsurface structural makeup is relatively unknown due to the lack of cropping out rocks, deep drill holes and deep seismic refraction and reflection profiles. To remedy this situation, magnetic and gravity data were analyzed to determine the thickness of the Phanerozoic cover sequences. The 2-D power spectrum method was used to estimate the depth to density and magnetic susceptibility interfaces by using 0.5° square windows. Additionally, the gravity data were analyzed using isostatic residual and decompensative methods to isolate gravity anomalies due to upper crustal density sources. The decompensative gravity anomaly and the differentially reduced to the pole magnetic map indicate a series of mainly north-south and northwest-southeast trending maxima and minima anomalies related to Proterozoic basement lithologies and the varying thickness of cover sequences. The magnetic and gravity derived thickness of cover sequences maps indicate that these thicknesses range from 4.5 to 11.5 km. Both maps in general are in agreement but more detail in the cover thicknesses was determined by the gravity analysis. The gravity-based cover thickness maps indicate regions with shallower depths than the magnetic-based cover thickness map, which may be due to density differences between limestone and shale units within the Paleozoic sediments. The final thickness maps indicate that the Western Desert is a complicated region of basins and uplifts that are more complex than have been shown on previous structural maps of the Western Desert. These basins and uplifts may be related to Paleozoic compressional tectonic events and possibly to the opening of the Tethys Ocean. In addition, petroleum exploration could be extended to three basins outlined by our analysis within the relatively unexplored western portions of the Western Desert.
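
    Power-spectrum depth estimation is commonly based on the relation ln P(k) ≈ c - 2hk, where P is the radially averaged power, k the wavenumber in radians per unit length, and h the depth to the source interface, so that h is minus half the slope of a straight-line fit. The Python sketch below assumes that relation and uses synthetic data; it is not the authors' processing code, and the function name is hypothetical.

```python
import math


def depth_from_spectrum(wavenumbers, power):
    """Estimate source depth from a radially averaged power spectrum.

    Fits ln(power) = c - 2*h*k by ordinary least squares and returns h.
    Wavenumbers are assumed to be in radians per km, so h is in km
    (for cycles per km, divide the slope by 4*pi instead of 2).
    """
    log_power = [math.log(p) for p in power]
    n = len(wavenumbers)
    k_mean = sum(wavenumbers) / n
    y_mean = sum(log_power) / n
    sxy = sum((k - k_mean) * (y - y_mean) for k, y in zip(wavenumbers, log_power))
    sxx = sum((k - k_mean) ** 2 for k in wavenumbers)
    slope = sxy / sxx
    return -slope / 2.0


# Synthetic spectrum for an interface at 5 km depth.
ks = [0.05 * i for i in range(1, 11)]
spectrum = [math.exp(3.0 - 2.0 * 5.0 * k) for k in ks]
depth_km = depth_from_spectrum(ks, spectrum)
```

    In practice the fit is applied to a chosen linear segment of the spectrum within each analysis window, and the choice of segment affects the estimate.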

  15. Identification of Prostate Cancer-Specific microDNAs

    DTIC Science & Technology

    2016-02-01

    circular DNA by rolling circle amplification (RCA) and then amplified DNA fragments were subject to deep sequencing. Deep sequencing of the...demonstrate the existence of microDNAs in prostate cancer. We adopted multiple displacement amplification (MDA) with random 2 primers for enriched...prostate cancer cells through multiple displacement amplification and next generation sequencing.

  16. Sequence-specific bias correction for RNA-seq data using recurrent neural networks.

    PubMed

    Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru

    2017-01-25

    The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bear on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data using Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training an RNN-based nucleotide sequence model is efficient and RNN-based bias correction methods compare well with the state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provide an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.

  17. A comparison of microbial communities in deep-sea polymetallic nodules and the surrounding sediments in the Pacific Ocean

    NASA Astrophysics Data System (ADS)

    Wu, Yue-Hong; Liao, Li; Wang, Chun-Sheng; Ma, Wei-Lin; Meng, Fan-Xu; Wu, Min; Xu, Xue-Wei

    2013-09-01

    Deep-sea polymetallic nodules, rich in metals such as Fe, Mn, and Ni, are potential resources for future exploitation. Early culturing and microscopy studies suggest that polymetallic nodules are at least partially biogenic. To understand the microbial communities in this environment, we compared microbial community composition and diversity inside nodules and in the surrounding sediments. Three sampling sites in the Pacific Ocean containing polymetallic nodules were used for culture-independent investigations of microbial diversity. A total of 1013 near full-length bacterial 16S rRNA gene sequences and 640 archaeal 16S rRNA gene sequences with ~650 bp from nodules and the surrounding sediments were analyzed. Bacteria showed higher diversity than archaea. Interestingly, sediments contained more diverse bacterial communities than nodules, while the opposite was detected for archaea. Bacterial communities tend to be mostly unique to sediments or nodules, with only 13.3% of sequences shared. The most abundant bacterial groups detected only in nodules were Pseudoalteromonas and Alteromonas, which were predicted to play a role in building matrix outside cells to induce or control mineralization. However, archaeal communities were mostly shared between sediments and nodules, including the most abundant OTU containing 290 sequences from marine group I Thaumarchaeota. PCoA analysis indicated that microhabitat (i.e., nodule or sediment) seemed to be a major factor influencing microbial community composition, rather than sampling locations or distances between locations.

  18. A deep intronic mutation in the SLC12A3 gene leads to Gitelman syndrome.

    PubMed

    Nozu, Kandai; Iijima, Kazumoto; Nozu, Yoshimi; Ikegami, Ei; Imai, Takehide; Fu, Xue Jun; Kaito, Hiroshi; Nakanishi, Koichi; Yoshikawa, Norishige; Matsuo, Masafumi

    2009-11-01

    Many mutations have been detected in the SLC12A3 gene of Gitelman syndrome (GS, OMIM 263800) patients. In previous studies, only one mutant allele was detected in approximately 20 to 41% of patients with GS; however, the exact reason for the nonidentification has not been established. In this study, we used RT-PCR using mRNA to investigate for the first time transcript abnormalities caused by deep intronic mutation. Direct sequencing analysis of leukocyte DNA identified one base insertion in exon 6 (c.818_819insG), but no mutation was detected in the other allele. We analyzed RNA extracted from leukocytes and urine sediments and detected an unknown 238-bp sequence between exons 13 and 14. The genomic DNA analysis of intron 13 revealed a single-base substitution (c.1670-191C>T) that creates a new donor splice site within the intron, resulting in the inclusion of a novel cryptic exon in mRNA. This is the first report of creation of a splice site by a deep intronic single-nucleotide change in GS and the first report to detect the onset mechanism in a patient with GS and a missing mutation in one allele. This molecular onset mechanism may partly explain the poor success rate of mutation detection in both alleles of patients with GS.

  19. Rapid Creation and Quantitative Monitoring of High Coverage shRNA Libraries

    PubMed Central

    Bassik, Michael C.; Lebbink, Robert Jan; Churchman, L. Stirling; Ingolia, Nicholas T.; Patena, Weronika; LeProust, Emily M.; Schuldiner, Maya; Weissman, Jonathan S.; McManus, Michael T.

    2009-01-01

    Short hairpin RNA (shRNA) libraries are limited by the low efficacy of many shRNAs, giving false negatives, and off-target effects, giving false positives. Here we present a strategy for rapidly creating expanded shRNA pools (∼30 shRNAs/gene) that are analyzed by deep-sequencing (EXPAND). This approach enables identification of multiple effective target-specific shRNAs from a complex pool, allowing a rigorous statistical evaluation of whether a gene is a true hit. PMID:19448642

  20. Deep Learning of Orthographic Representations in Baboons

    PubMed Central

    Hannagan, Thomas; Ziegler, Johannes C.; Dufau, Stéphane; Fagot, Joël; Grainger, Jonathan

    2014-01-01

    What is the origin of our ability to learn orthographic knowledge? We use deep convolutional networks to emulate the primate's ventral visual stream and explore the recent finding that baboons can be trained to discriminate English words from nonwords [1]. The networks were exposed to the exact same sequence of stimuli and reinforcement signals as the baboons in the experiment, and learned to map real visual inputs (pixels) of letter strings onto binary word/nonword responses. We show that the networks' highest levels of representations were indeed sensitive to letter combinations as postulated in our previous research. The model also captured the key empirical findings, such as generalization to novel words, along with some intriguing inter-individual differences. The present work shows the merits of deep learning networks that can simulate the whole processing chain all the way from the visual input to the response while allowing researchers to analyze the complex representations that emerge during the learning process. PMID:24416300

  1. Single-Cell Sequencing for Drug Discovery and Drug Development.

    PubMed

    Wu, Hongjin; Wang, Charles; Wu, Shixiu

    2017-01-01

    Next-generation sequencing (NGS), particularly single-cell sequencing, has revolutionized the scale and scope of genomic and biomedical research. Recent technological advances in NGS and single-cell studies have made deep whole-genome (DNA-seq), whole-epigenome and whole-transcriptome (RNA-seq) sequencing at the single-cell level feasible. NGS at the single-cell level expands our view of genome, epigenome and transcriptome and allows the genome, epigenome and transcriptome of any organism to be explored without a priori assumptions, with unprecedented throughput and with single-nucleotide resolution. NGS is also a very powerful tool for drug discovery and drug development. In this review, we describe the current state of single-cell sequencing techniques, which can provide a new, more powerful and precise approach for analyzing effects of drugs on treated cells and tissues. Our review discusses single-cell whole genome/exome sequencing (scWGS/scWES), single-cell transcriptome sequencing (scRNA-seq), single-cell bisulfite sequencing (scBS), and multiple omics of single-cell sequencing. We also highlight the advantages and challenges of each of these approaches. Finally, we describe, elaborate and speculate on the potential applications of single-cell sequencing for drug discovery and drug development. Copyright © Bentham Science Publishers.

  2. Ultra-deep sequencing enables high-fidelity recovery of biodiversity for bulk arthropod samples without PCR amplification

    PubMed Central

    2013-01-01

    Background: Next-generation-sequencing (NGS) technologies combined with a classic DNA barcoding approach have enabled fast and credible measurement for biodiversity of mixed environmental samples. However, the PCR amplification involved in nearly all existing NGS protocols inevitably introduces taxonomic biases. In the present study, we developed new Illumina pipelines without PCR amplifications to analyze terrestrial arthropod communities. Results: Mitochondrial enrichment directly followed by Illumina shotgun sequencing, at an ultra-high sequence volume, enabled the recovery of Cytochrome c Oxidase subunit 1 (COI) barcode sequences, which allowed for the estimation of species composition at high fidelity for a terrestrial insect community. With 15.5 Gbp Illumina data, approximately 97% and 92% of the 37 input Operational Taxonomic Units (OTUs) were detected with and without the reference barcode library, respectively, while only 1 novel OTU was found for the latter. Additionally, relatively strong correlation between the sequencing volume and the total biomass was observed for species from the bulk sample, suggesting a potential solution to reveal relative abundance. Conclusions: The ability of the new Illumina PCR-free pipeline for DNA metabarcoding to detect small arthropod specimens and its tendency to avoid most, if not all, false positives suggests its great potential in biodiversity-related surveillance, such as in biomonitoring programs. However, further improvement for mitochondrial enrichment is likely needed for the application of the new pipeline in analyzing arthropod communities at higher diversity. PMID:23587339

  3. Draft Genome Sequence of Deep-Sea Alteromonas sp. Strain V450 Isolated from the Marine Sponge Leiodermatium sp.

    PubMed

    Wang, Guojun; Barrett, Nolan H; McCarthy, Peter J

    2017-02-02

    The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. Copyright © 2017 Wang et al.

  4. Piezophilic Bacteria Isolated from Sediment of the Shimokita Coalbed, Japan

    NASA Astrophysics Data System (ADS)

    Fang, J.; Kato, C.; Hori, T.; Morono, Y.; Inagaki, F.

    2013-12-01

    The Earth is a cold planet as well as a pressured planet, hosting both the surface biosphere and the deep biosphere. Pressure ranges over four orders of magnitude in the surface biosphere and probably more in the deep biosphere. Pressure is an important thermodynamic property of the deep biosphere that affects microbial physiology and biochemistry. Bacteria that require high-pressure conditions for optimal growth are called piezophilic bacteria. Subseafloor marine sediments are one of the most extensive microbial habitats on Earth. Marine sediments cover more than two-thirds of the Earth's surface, and represent a major part of the deep biosphere. Owing to its vast size and intimate connection with the surface biosphere, particularly the oceans, the deep biosphere has enormous potential for influencing global-scale biogeochemical processes, including energy, climate, carbon and nutrient cycles. Therefore, studying piezophilic bacteria of the deep biosphere has important implications in increasing our understanding of global biogeochemical cycles, the interactions between the biosphere and the geosphere, and the evolution of life. Sediment samples were obtained during IODP Expedition 337, from 1498 meters below sea floor (mbsf) (Sample 6R-3), 1951~1999 mbsf (19R-1~25R-3; coalbed mix), and 2406 mbsf (29R-7). The samples were mixed with MB2216 growth medium and cultivated under anaerobic conditions at 35 MPa (megapascal) pressure. Growth temperatures were adjusted to in situ environmental conditions, 35°C for 6R-3, 45°C for 19R-1~25R-3, and 55°C for 29R-7. The cultivation was performed three times, for 30 days each time. Microbial cells were obtained and the total DNA was extracted. At the same time, isolation of microbes was also performed under anaerobic conditions. Microbial communities in the coalbed sediment were analyzed by cloning, sequencing, and terminal restriction fragment length polymorphism (t-RFLP) of 16S ribosomal RNA genes.
    From the partial 16S rRNA gene sequences, we have identified abundant Alkalibacterium sp. in 6R-3 and 29R-7 at the first HP cultivation. We also identified Haloactibacillus sp. in 6R-3 and Anoxybacillus related sp. in 19R-1~25R-3 at the third HP cultivation. These microorganisms are likely piezophiles and play an important role in degradation of sedimentary organic matter and production of microbial metabolites sustaining the deep microbial ecosystem in the Shimokita Coalbed. The complete 16S sequencing and isolation of piezophiles are now ongoing.

  5. miRBase: integrating microRNA annotation and deep-sequencing data.

    PubMed

    Kozomara, Ana; Griffiths-Jones, Sam

    2011-01-01

    miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.

  6. Transcriptome sequences resolve deep relationships of the grape family.

    PubMed

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M; Gerrath, Jean; Zimmer, Elizabeth A; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated.

  7. Interpretable dimensionality reduction of single cell transcriptome data with deep generative models.

    PubMed

    Ding, Jiarui; Condon, Anne; Shah, Sohrab P

    2018-05-21

    Single-cell RNA-sequencing has great potential to discover cell types, identify cell states, trace development lineages, and reconstruct the spatial organization of cells. However, dimension reduction to interpret structure in single-cell sequencing data remains a challenge. Existing algorithms are either not able to uncover the clustering structures in the data or lose global information such as groups of clusters that are close to each other. We present a robust statistical model, scvis, to capture and visualize the low-dimensional structures in single-cell gene expression data. Simulation results demonstrate that low-dimensional representations learned by scvis preserve both the local and global neighbor structures in the data. In addition, scvis is robust to the number of data points and learns a probabilistic parametric mapping function to add new data points to an existing embedding. We then use scvis to analyze four single-cell RNA-sequencing datasets, exemplifying interpretable two-dimensional representations of the high-dimensional single-cell RNA-sequencing data.

  8. Uniform, optimal signal processing of mapped deep-sequencing data.

    PubMed

    Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam

    2013-07-01

    Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.

  9. Exploring single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes in the jellyfish (Rhopilema esculentum) by transcriptome sequencing.

    PubMed

    Li, Yunfeng; Zhou, Zunchun; Tian, Meilin; Tian, Yi; Dong, Ying; Li, Shilei; Liu, Weidong; He, Chongbo

    2017-08-01

    In this study, single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes (DEGs) in the oral parts, gonads, and umbrella parts of the jellyfish Rhopilema esculentum were analyzed by RNA-Seq technology. A total of 76.4 million raw reads and 72.1 million clean reads were generated from deep sequencing. Approximately 119,874 tentative unigenes and 149,239 transcripts were obtained. A total of 1,034,708 SNP markers were detected in the three tissues. For microsatellite mining, 5088 SSRs were identified from the unigene sequences. The most frequent repeat motifs were mononucleotide repeats, which accounted for 61.93%. Transcriptome comparison of the three tissues yielded a total of 8841 DEGs, of which 3560 were up-regulated and 5281 were down-regulated. This study represents the greatest sequencing effort carried out for a jellyfish and provides the first high-throughput transcriptomic resource for jellyfish. Copyright © 2017 Elsevier B.V. All rights reserved.
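
    The microsatellite mining step described above can be illustrated with a minimal sketch. It scans a sequence for perfect tandem repeats with motifs of 1 to 6 nt and reports the share of mononucleotide repeats. The minimum repeat counts in `MIN_REPEATS` and the function names are assumptions chosen for illustration; the abstract does not state which tool or thresholds the authors used.

```python
import re

# Minimum repeat counts per motif length (1-6 nt). Illustrative values only,
# not the thresholds used in the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """Return True if `unit` is not itself a repetition of a shorter motif."""
    n = len(unit)
    return not any(
        n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect repeats."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # One captured motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip motifs such as "AA" that are really a shorter repeat.
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)


def mononucleotide_share(hits):
    """Percentage of detected SSRs whose motif is a single nucleotide."""
    if not hits:
        return 0.0
    return 100.0 * sum(len(unit) == 1 for _, unit, _ in hits) / len(hits)
```

    This sketch only finds perfect repeats and ignores compound or interrupted microsatellites, which dedicated SSR tools usually handle.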

  10. Draft Genome Sequence of Pseudomonas oceani DSM 100277T, a Deep-Sea Bacterium

    PubMed Central

    2018-01-01

    Pseudomonas oceani DSM 100277T was isolated from deep seawater in the Okinawa Trough at 1390 m. P. oceani belongs to the Pseudomonas pertucinogena group. Here, we report the draft genome sequence of P. oceani, which has an estimated size of 4.1 Mb and exhibits 3,790 coding sequences, with a G+C content of 59.94 mol%. PMID:29650573

  11. A Bioinformatic Pipeline for Monitoring of the Mutational Stability of Viral Drug Targets with Deep-Sequencing Technology.

    PubMed

    Kravatsky, Yuri; Chechetkin, Vladimir; Fedoseeva, Daria; Gorbacheva, Maria; Kravatskaya, Galina; Kretova, Olga; Tchurikov, Nickolai

    2017-11-23

    The efficient development of antiviral drugs, including efficient antiviral small interfering RNAs (siRNAs), requires continuous monitoring of the strict correspondence between a drug and the related highly variable viral DNA/RNA target(s). Deep sequencing is able to provide an assessment of both the general target conservation and the frequency of particular mutations in the different target sites. The aim of this study was to develop a reliable bioinformatic pipeline for the analysis of millions of short, deep sequencing reads corresponding to selected highly variable viral sequences that are drug target(s). The suggested bioinformatic pipeline combines the available programs and the ad hoc scripts based on an original algorithm of the search for the conserved targets in the deep sequencing data. We also present the statistical criteria for the threshold of reliable mutation detection and for the assessment of variations between corresponding data sets. These criteria are robust against the possible sequencing errors in the reads. As an example, the bioinformatic pipeline is applied to the study of the conservation of RNA interference (RNAi) targets in human immunodeficiency virus 1 (HIV-1) subtype A. The developed pipeline is freely available to download at the website http://virmut.eimb.ru/. Brief comments and comparisons between VirMut and other pipelines are also presented.

  12. High-Throughput Sequencing Reveals Differential Expression of miRNAs in Intestine from Sea Cucumber during Aestivation

    PubMed Central

    Chen, Muyan; Zhang, Xiumei; Liu, Jianning; Storey, Kenneth B.

    2013-01-01

    The regulatory role of miRNA in gene expression is an emerging hot new topic in the control of hypometabolism. Sea cucumber aestivation is a complicated physiological process that includes obvious hypometabolism as evidenced by a decrease in the rates of oxygen consumption and ammonia nitrogen excretion, as well as a serious degeneration of the intestine into a very tiny filament. To determine whether miRNAs play regulatory roles in this process, the present study analyzed profiles of miRNA expression in the intestine of the sea cucumber (Apostichopus japonicus), using Solexa deep sequencing technology. We identified 308 sea cucumber miRNAs, including 18 novel miRNAs specific to sea cucumber. Animals sampled during deep aestivation (DA) after at least 15 days of continuous torpor, were compared with animals from a non-aestivation (NA) state (animals that had passed through aestivation and returned to the active state). We identified 42 differentially expressed miRNAs [RPM (reads per million) >10, |FC| (|fold change|) ≥1, FDR (false discovery rate) <0.01] during aestivation, which were validated by two other miRNA profiling methods: miRNA microarray and real-time PCR. Among the most prominent miRNA species, miR-200-3p, miR-2004, miR-2010, miR-22, miR-252a, miR-252a-3p and miR-92 were significantly over-expressed during deep aestivation compared with non-aestivation animals. Preliminary analyses of their putative target genes and GO analysis suggest that these miRNAs could play important roles in global transcriptional depression and cell differentiation during aestivation. High-throughput sequencing data and microarray data have been submitted to GEO database. PMID:24143179
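
    The selection criteria quoted above (RPM >10, |FC| ≥1, FDR <0.01) can be sketched as a simple filter. This is a hedged illustration, not the authors' pipeline: it assumes that FC is a log2 fold change, that the RPM threshold applies to the higher of the two conditions, and that the FDR value has already been computed elsewhere. The pseudocount and all function names are hypothetical.

```python
import math


def rpm(count, total_reads):
    """Reads per million: raw count scaled to a library of one million reads."""
    return count * 1_000_000 / total_reads


def is_differential(rpm_da, rpm_na, fdr,
                    min_rpm=10.0, min_abs_log2fc=1.0, max_fdr=0.01):
    """Apply the three thresholds to one miRNA.

    Assumptions (not stated in the abstract): FC is treated as a log2 ratio,
    a pseudocount of 1 avoids division by zero, and the RPM cutoff is applied
    to the more highly expressed condition.
    """
    if max(rpm_da, rpm_na) <= min_rpm:
        return False
    log2fc = math.log2((rpm_da + 1.0) / (rpm_na + 1.0))
    return abs(log2fc) >= min_abs_log2fc and fdr < max_fdr
```

    For example, a miRNA at 200 RPM in deep aestivation and 40 RPM in the non-aestivation state, with an FDR of 0.001, would pass all three thresholds under these assumptions.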

  13. Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

    PubMed Central

    Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

    2018-01-01

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence’s saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them. PMID:27896980

  14. Aftershock occurrence rate decay for individual sequences and catalogs

    NASA Astrophysics Data System (ADS)

    Nyffenegger, Paul A.

    One of the earliest observations of the Earth's seismicity is that the rate of aftershock occurrence decays with time according to a power law commonly known as modified Omori-law (MOL) decay. However, the physical reasons for aftershock occurrence and the empirical decay in rate remain unclear despite numerous models that yield similar rate decay behavior. Key problems in relating the observed empirical relationship to the physical conditions of the mainshock and fault are the lack of studies including small magnitude mainshocks and the lack of uniformity between studies. We use simulated aftershock sequences to investigate the factors which influence the maximum likelihood (ML) estimate of the Omori-law p value, the parameter describing aftershock occurrence rate decay, for both individual aftershock sequences and "stacked" or superposed sequences. Generally the ML estimate of p is accurate, but since the ML estimated uncertainty is unaffected by whether the sequence resembles an MOL model, a goodness-of-fit test such as the Anderson-Darling statistic is necessary. While stacking aftershock sequences permits the study of entire catalogs and sequences with small aftershock populations, stacking introduces artifacts. The p value for stacked sequences is approximately equal to the mean of the individual sequence p values. We apply single-link cluster analysis to identify all aftershock sequences from eleven regional seismicity catalogs. We observe two new mathematically predictable empirical relationships for the distribution of aftershock sequence populations. The average properties of aftershock sequences are not correlated with tectonic environment, but aftershock populations and p values do show a depth dependence. The p values show great variability with time, and large values or changes in p sometimes precede major earthquakes.
    Studies of teleseismic earthquake catalogs over the last twenty years have led seismologists to question seismicity models and aftershock sequence decay for deep sequences. For seven exceptional deep sequences, we conclude that MOL decay adequately describes these sequences, and little difference exists compared to shallow sequences. However, they do include larger aftershock populations compared to most deep sequences. These results imply that p values for deep sequences are larger than those for intermediate depth sequences.
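
    For readers unfamiliar with the modified Omori law, the sketch below writes out its usual form, n(t) = K / (t + c)^p, together with the expected number of aftershocks in a time window and the standard point-process log-likelihood that a maximum likelihood estimate of p would maximize. The parameter names K, c and p follow common usage; the function names are hypothetical and this is not the author's code.

```python
import math


def mol_rate(t, K, c, p):
    """Aftershock rate at time t after the mainshock: K / (t + c)**p."""
    return K / (t + c) ** p


def mol_expected_count(t_start, t_end, K, c, p):
    """Integral of the rate over [t_start, t_end]."""
    if abs(p - 1.0) < 1e-12:
        # For p = 1 the integral is logarithmic.
        return K * (math.log(t_end + c) - math.log(t_start + c))
    return K * ((t_end + c) ** (1.0 - p) - (t_start + c) ** (1.0 - p)) / (1.0 - p)


def mol_log_likelihood(times, t_start, t_end, K, c, p):
    """Log-likelihood of event times under an inhomogeneous Poisson process:
    sum of log rates at the events minus the expected count in the window."""
    log_rates = sum(math.log(mol_rate(t, K, c, p)) for t in times)
    return log_rates - mol_expected_count(t_start, t_end, K, c, p)
```

    An estimate of p could then be obtained by maximizing `mol_log_likelihood` over (K, c, p) with any numerical optimizer; the choice of optimizer is left open here.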

  15. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    PubMed

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-11

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
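
    The Q3 accuracy quoted above is the percentage of residues whose three-state label (helix, strand, coil) is predicted correctly. A minimal sketch follows; the eight-to-three state reduction shown is one common convention and is an assumption here, since the abstract does not say which mapping was used.

```python
# One common reduction of DSSP 8-state labels to 3 states (assumed here):
# H, G, I -> H (helix); E, B -> E (strand); everything else -> C (coil).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_3(states):
    """Map a string of 8-state labels to 3-state labels."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in states)


def q3_accuracy(predicted, observed):
    """Percentage of positions where the 3-state labels agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

    Q8 accuracy is computed the same way on the unreduced eight-state labels.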

  16. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    NASA Astrophysics Data System (ADS)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  17. Deep learning methods for protein torsion angle prediction.

    PubMed

    Li, Haiou; Hou, Jie; Adhikari, Badri; Lyu, Qiang; Cheng, Jianlin

    2017-09-18

    Deep learning is one of the most powerful machine learning methods and has achieved state-of-the-art performance in many domains. Since deep learning was introduced to the field of bioinformatics in 2012, it has achieved success in a number of areas such as protein residue-residue contact prediction, secondary structure prediction, and fold recognition. In this work, we developed deep learning methods to improve the prediction of torsion (dihedral) angles of proteins. We design four different deep learning architectures to predict protein torsion angles. The architectures include a deep neural network (DNN), a deep restricted Boltzmann machine (DRBM), a deep recurrent neural network (DRNN) and a deep recurrent restricted Boltzmann machine (DReRBM), since protein torsion angle prediction is a sequence-related problem. In addition to existing protein features, two new features (predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments) are used as input to each of the four deep learning architectures to predict phi and psi angles of protein backbone. The mean absolute error (MAE) of phi and psi angles predicted by DRNN, DReRBM, DRBM and DNN is about 20-21° and 29-30° on an independent dataset. The MAE of phi angle is comparable to the existing methods, but the MAE of psi angle is 29°, 2° lower than the existing methods. On the latest CASP12 targets, our methods also achieved performance better than or comparable to a state-of-the-art method. Our experiment demonstrates that deep learning is a valuable method for predicting protein torsion angles. The deep recurrent network architecture performs slightly better than deep feed-forward architecture, and the predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments are useful features for improving prediction accuracy.
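
    Because torsion angles are periodic, a mean absolute error over phi or psi is usually computed on the shorter arc between predicted and true angles, so that 179° and -179° differ by 2° rather than 358°. The sketch below shows this convention; that the authors computed their MAE this way is an assumption, and the function names are hypothetical.

```python
def angular_error(pred_deg, true_deg):
    """Absolute difference between two angles in degrees, wrapped to [0, 180]."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def angular_mae(pred, true):
    """Mean absolute angular error over paired lists of angles in degrees."""
    if len(pred) != len(true) or not pred:
        raise ValueError("angle lists must be non-empty and of equal length")
    return sum(angular_error(p, t) for p, t in zip(pred, true)) / len(pred)
```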

  18. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features.

    PubMed

    Jones, David T; Kandathil, Shaun M

    2018-04-26

    In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov.

  19. A combination of LongSAGE with Solexa sequencing is well suited to explore the depth and the complexity of transcriptome

    PubMed Central

    Hanriot, Lucie; Keime, Céline; Gay, Nadine; Faure, Claudine; Dossat, Carole; Wincker, Patrick; Scoté-Blachon, Céline; Peyron, Christelle; Gandrillon, Olivier

    2008-01-01

    Background: "Open" transcriptome analysis methods allow the study of gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the most widely used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 bp tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several million tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results: In order to make a deep analysis of the mouse hypothalamus transcriptome while avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 million tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion: We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of the transcriptome as reliable as a LongSAGE library sequenced by the Sanger method. PMID:18796152

  20. Phylogeny of sipunculan worms: A combined analysis of four gene regions and morphology.

    PubMed

    Schulze, Anja; Cutler, Edward B; Giribet, Gonzalo

    2007-01-01

    The intra-phyletic relationships of sipunculan worms were analyzed based on DNA sequence data from four gene regions and 58 morphological characters. Initially we analyzed the data under direct optimization using parsimony as optimality criterion. An implied alignment resulting from the direct optimization analysis was subsequently utilized to perform a Bayesian analysis with mixed models for the different data partitions. For this we applied a doublet model for the stem regions of the 18S rRNA. Both analyses support monophyly of Sipuncula and most of the same clades within the phylum. The analyses differ with respect to the relationships among the major groups but whereas the deep nodes in the direct optimization analysis generally show low jackknife support, they are supported by 100% posterior probability in the Bayesian analysis. Direct optimization has been useful for handling sequences of unequal length and generating conservative phylogenetic hypotheses whereas the Bayesian analysis under mixed models provided high resolution in the basal nodes of the tree.

  1. A multiple-alignment based primer design algorithm for genetically highly variable DNA targets

    PubMed Central

    2013-01-01

    Background Primer design for highly variable DNA sequences is difficult, and experimental success requires attention to many interacting constraints. The advent of next-generation sequencing methods allows the investigation of rare variants otherwise hidden deep in large populations, but requires attention to population diversity and primer localization in relatively conserved regions, in addition to recognized constraints typically considered in primer design. Results Design constraints include degenerate sites to maximize population coverage, matching of melting temperatures, optimizing de novo sequence length, finding optimal bio-barcodes to allow efficient downstream analyses, and minimizing risk of dimerization. To facilitate primer design addressing these and other constraints, we created a novel computer program (PrimerDesign) that automates this complex procedure. We show its powers and limitations and give examples of successful designs for the analysis of HIV-1 populations. Conclusions PrimerDesign is useful for researchers who want to design DNA primers and probes for analyzing highly variable DNA populations. It can be used to design primers for PCR, RT-PCR, Sanger sequencing, next-generation sequencing, and other experimental protocols targeting highly variable DNA samples. PMID:23965160
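
    Two of the constraints named in this abstract, degenerate sites and matching of melting temperatures, can be illustrated with a minimal sketch. It is not the PrimerDesign program: the function names are hypothetical, and the Wallace rule (2 °C per A/T, 4 °C per G/C) is assumed here only as a rough estimate for short oligonucleotides.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule (assumed model)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_range(primer):
    """Lowest and highest Wallace Tm over all variants of a degenerate primer."""
    tms = [wallace_tm(variant) for variant in expand_degenerate(primer)]
    return min(tms), max(tms)


if __name__ == "__main__":
    # Hypothetical 18-mer with two degenerate positions (R and Y).
    print(tm_range("ACGTRACGTACGYACGTA"))
```

    A primer pair could then be screened by checking that the two Tm ranges overlap; a production tool would normally use a nearest-neighbor thermodynamic model instead of this rule of thumb.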

  2. TEA: the epigenome platform for Arabidopsis methylome study.

    PubMed

    Su, Sheng-Yao; Chen, Shu-Hwa; Lu, I-Hsuan; Chiang, Yih-Shien; Wang, Yu-Bin; Chen, Pao-Yang; Lin, Chung-Yen

    2016-12-22

    Bisulfite sequencing (BS-seq) has become a standard technology to profile genome-wide DNA methylation at single-base resolution. It allows researchers to conduct genome-wide cytosine methylation analyses of genomic imprinting, transcriptional regulation, cellular development and differentiation. A single dataset from a BS-Seq experiment is resolved into many features according to the sequence contexts, making methylome data analysis and data visualization a complex task. We developed a streamlined platform, TEA, for analyzing and visualizing data from whole-genome BS-Seq (WGBS) experiments conducted in the model plant Arabidopsis thaliana. To capture the essence of the genome methylation level and to run efficiently online, we introduce a straightforward method for measuring genome methylation in each sequence context by gene. The method is scripted in Java to process BS-Seq mapping results. Through a simple data uploading process, the TEA server deploys a web-based platform for deep analysis by linking data to an updated Arabidopsis annotation database and toolkits. TEA is an intuitive and efficient online platform for analyzing the Arabidopsis genomic DNA methylation landscape. It provides several ways to help users exploit WGBS data. TEA is freely accessible for academic users at: http://tea.iis.sinica.edu.tw .
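
    The idea of measuring methylation "in each sequence context by gene" can be sketched as follows. This is an illustrative assumption, not the TEA implementation (which the abstract says is written in Java): the function names and the input tuple layout are hypothetical, and the level is taken to be methylated reads divided by total reads, pooled per gene and context.

```python
from collections import defaultdict


def cytosine_context(seq, pos):
    """Classify the cytosine at seq[pos] as 'CG', 'CHG' or 'CHH'.

    H stands for A, C or T. Returns None if the base is not a C or the
    context cannot be determined (sequence end or ambiguous base).
    """
    seq = seq.upper()
    if seq[pos] != "C":
        return None
    downstream = seq[pos + 1:pos + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2 or "N" in downstream:
        return None
    return "CHG" if downstream[1] == "G" else "CHH"


def methylation_by_gene(calls):
    """Pool read counts per (gene, context) and return methylation levels.

    `calls` is an iterable of (gene, context, methylated_reads, total_reads),
    a hypothetical layout chosen for this sketch.
    """
    sums = defaultdict(lambda: [0, 0])
    for gene, context, methylated, total in calls:
        sums[(gene, context)][0] += methylated
        sums[(gene, context)][1] += total
    return {key: m / t for key, (m, t) in sums.items() if t > 0}


if __name__ == "__main__":
    toy_calls = [("geneA", "CG", 8, 10), ("geneA", "CG", 2, 10),
                 ("geneA", "CHH", 0, 5)]
    print(methylation_by_gene(toy_calls))
```

    Pooling reads before dividing weights each cytosine by its coverage; averaging per-site ratios instead is an equally plausible design and would give different numbers at low depth.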

  3. Light in the darkness: New perspective on lanternfish relationships and classification using genomic and morphological data.

    PubMed

    Martin, Rene P; Olson, Emily E; Girard, Matthew G; Smith, Wm Leo; Davis, Matthew P

    2018-04-01

    Massive parallel sequencing allows scientists to gather DNA sequences composed of millions of base pairs that can be combined into large datasets and analyzed to infer organismal relationships at a genome-wide scale in non-model organisms. Although the use of these large datasets is becoming more widespread, little to no work has been done in estimating phylogenetic relationships using UCEs in deep-sea fishes. Among deep-sea animals, the 257 species of lanternfishes (Myctophiformes) are among the most important open-ocean lineages, representing half of all mesopelagic vertebrate biomass. With this relative abundance, they are key members of the midwater food web where they feed on smaller invertebrates and fishes in addition to being a primary prey item for other open-ocean animals. Understanding the evolution and relationships of midwater organisms generally, and this dominant group of fishes in particular, is necessary for understanding and preserving the underexplored deep-sea ecosystem. Despite substantial congruence in the evolutionary relationships among deep-sea lanternfishes at higher classification levels in previous studies, the relationships among tribes, genera, and species within Myctophidae often conflict across phylogenetic studies or lack resolution and support. Herein we provide the first genome-scale phylogenetic analysis of lanternfishes, and we integrate these data from across the nuclear genome with additional protein-coding gene sequences and morphological data to further test evolutionary relationships among lanternfishes. Our phylogenetic hypotheses of relationships among lanternfishes are entirely congruent across a diversity of analyses that vary in methods, taxonomic sampling, and data analyzed. Within the Myctophiformes, the Neoscopelidae is inferred to be monophyletic and sister to a monophyletic Myctophidae. 
The current classification of lanternfishes is incongruent with our phylogenetic tree, so we recommend revisions that retain much of the traditional tribal structure and recognize five subfamilies instead of the traditional two subfamilies. The revised monophyletic taxonomy of myctophids includes the elevation of three former lampanyctine tribes to subfamilies. A restricted Lampanyctinae was recovered sister to Notolychninae. These two clades together were recovered as the sister group to the Gymnoscopelinae. Combined, these three subfamilies were recovered as the sister group to a clade composed of a monophyletic Diaphinae sister to the traditional Myctophinae. Our results corroborate recent multilocus molecular studies that infer a polyphyletic Myctophum in Myctophinae, and a para- or polyphyletic Lampanyctus and Nannobrachium within Lampanyctinae. We resurrect Dasyscopelus and Ctenoscopelus for the independent clades traditionally classified as species of Myctophum, and we place Nannobrachium into the synonymy of Lampanyctus. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. Deep sequencing-based transcriptome analysis of Plutella xylostella larvae parasitized by Diadegma semiclausum

    PubMed Central

    2011-01-01

    Background Parasitoid insects manipulate their hosts' physiology by injecting various factors into their host upon parasitization. Transcriptomic approaches provide a powerful means to study insect host-parasitoid interactions at the molecular level. In order to investigate the effects of parasitization by an ichneumonid wasp (Diadegma semiclausum) on the host (Plutella xylostella), the larval transcriptome profile was analyzed using a short-read deep sequencing method (Illumina). Symbiotic polydnaviruses (PDVs) associated with ichneumonid parasitoids, known as ichnoviruses, play significant roles in host immune suppression and developmental regulation. In the current study, D. semiclausum ichnovirus (DsIV) genes expressed in P. xylostella were identified and their sequences compared with other reported PDVs. Five of these genes encode proteins of unknown identity that have not previously been reported. Results De novo assembly of cDNA sequence data generated 172,660 contigs between 100 and 10000 bp in length, with 35% > 200 bp in length. Parasitization had significant impacts on expression levels of 928 identified insect host transcripts. Gene ontology data illustrated that the majority of the differentially expressed genes are involved in binding, catalytic activity, and metabolic and cellular processes. In addition, the results show that transcription levels of antimicrobial peptides, such as gloverin, cecropin E and lysozyme, were up-regulated after parasitism. Expression of ichnovirus genes was detected in parasitized larvae with 19 unique sequences identified from five PDV gene families including vankyrin, viral innexin, repeat elements, a cysteine-rich motif, and polar residue rich protein. Vankyrin 1 and repeat element 1 genes showed the highest transcription levels among the DsIV genes. Conclusion This study provides detailed information on differential expression of P. 
xylostella larval genes following parasitization, DsIV genes expressed in the host and also improves our current understanding of this host-parasitoid interaction. PMID:21906285

  5. Deep Impact Sequence Planning Using Multi-Mission Adaptable Planning Tools With Integrated Spacecraft Models

    NASA Technical Reports Server (NTRS)

    Wissler, Steven S.; Maldague, Pierre; Rocca, Jennifer; Seybold, Calina

    2006-01-01

    The Deep Impact mission was ambitious and challenging. JPL's well proven, easily adaptable multi-mission sequence planning tools combined with integrated spacecraft subsystem models enabled a small operations team to develop, validate, and execute extremely complex sequence-based activities within very short development times. This paper focuses on the core planning tool used in the mission, APGEN. It shows how the multi-mission design and adaptability of APGEN made it possible to model spacecraft subsystems as well as ground assets throughout the lifecycle of the Deep Impact project, starting with models of initial, high-level mission objectives, and culminating in detailed predictions of spacecraft behavior during mission-critical activities.

  6. Transcriptome Sequences Resolve Deep Relationships of the Grape Family

    PubMed Central

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M.; Gerrath, Jean; Zimmer, Elizabeth A.; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated. PMID:24069307

  7. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, CY; Yang, H; Wei, CL

    Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Using high-throughput Illumina RNA-seq, the transcriptome from poly (A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. 
Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR). An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis.

  8. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    PubMed Central

    2011-01-01

    Background Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Results Using high-throughput Illumina RNA-seq, the transcriptome from poly (A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. 
Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR). Conclusions An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis. PMID:21356090
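
    The assembly statistics quoted in this abstract (average length and N50) follow from the list of unigene lengths alone. The sketch below uses the common definition of N50, the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases; the function names and the toy lengths are illustrative and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    toy_lengths = [100, 200, 300, 400, 500]  # made-up values for illustration
    print("mean:", mean_length(toy_lengths), "N50:", n50(toy_lengths))
```

    Because N50 is weighted toward long sequences, it is normally larger than the mean when many short singletons are present, which is consistent with the pattern reported here (mean 355 bp, N50 506 bp).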

  9. Deep Learning and Its Applications in Biomedicine.

    PubMed

    Cao, Chensi; Liu, Feng; Tan, Hai; Song, Deshou; Shu, Wenjie; Li, Weizhong; Zhou, Yiming; Bo, Xiaochen; Xie, Zhi

    2018-02-01

    Advances in biological and medical technologies have been providing us explosive volumes of biological and physiological data, such as medical images, electroencephalography, genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some of the state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural network and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, some examples are demonstrated for deep learning applications, including medical image classification, genomic sequence analysis, as well as protein structure classification and prediction. Finally, we offer our perspectives for the future directions in the field of deep learning. Copyright © 2018. Production and hosting by Elsevier B.V.

  10. VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs

    USDA-ARS's Scientific Manuscript database

    Accurate detection of viruses in plants and animals is critical for agriculture production and human health. Deep sequencing and assembly of virus-derived siRNAs has proven to be a highly efficient approach for virus discovery. However, to date no computational tools specifically designed for both k...

  11. Natural Variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gordon, Sean

    2013-03-01

    Sean Gordon of the USDA on Natural Variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions at the 8th Annual Genomics of Energy and Environment Meeting on March 27, 2013 in Walnut Creek, CA.

  12. Deep Recurrent Neural Networks for Human Activity Recognition

    PubMed Central

    Murad, Abdulmajid

    2017-01-01

    Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep belief networks (DBNs) and CNNs. PMID:29113103

  13. Deep Recurrent Neural Networks for Human Activity Recognition.

    PubMed

    Murad, Abdulmajid; Pyun, Jae-Young

    2017-11-06

    Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep belief networks (DBNs) and CNNs.

  14. [Predominant strains of polycyclic aromatic hydrocarbon-degrading consortia from deep sea of the Middle Atlantic Ridge].

    PubMed

    Cui, Zhisong; Shao, Zongze

    2009-07-01

    To identify the predominant strains of polycyclic aromatic hydrocarbon (PAH)-degrading consortia in sea water and surface sediment collected from the deep sea of the Middle Atlantic Ridge, we employed the enrichment method and the spread-plate method to isolate cultivable bacteria and PAH degraders from deep-sea samples. Phylogenetic analysis was conducted by 16S rRNA gene sequencing of the bacteria. We then analyzed the dominant bacteria in the PAH-degrading consortia by denaturing gradient gel electrophoresis (DGGE) combined with DNA sequencing. Altogether 16 cultivable bacteria were obtained, including one PAH degrader, Novosphingobium sp. 4D. Phylogenetic analysis showed that strains closely related to Alcanivorax dieselolei NO1A (5/16) and Tistrella mobilis TISTR 1108T (5/16) constituted the two biggest groups among the cultivable bacteria. DGGE analysis showed that strain 4L (also 4M and 4N, Alcanivorax dieselolei NO1A, 99.21%), 4D (Novosphingobium pentaromativorans US6-1(T), 97.07%) and 4B (also 4E, 4H and 4K, Tistrella mobilis TISTR 1108T, > 99%) dominated the consortium MC2D. In consortium MC3CO, the predominant strains were strain 5C (also 5H, Alcanivorax dieselolei NO1A, > 99%), an uncultivable strain represented by band 5-8 (Novosphingobium aromaticivorans DSM 12444T, 99.41%), 5J (Tistrella mobilis TISTR 1108T, 99.52%) and 5F (also 5G, Thalassospira lucentensis DSM 14000T, < 97%). We found that strains of the genera Alcanivorax, Novosphingobium, Tistrella and Thalassospira were the predominant bacteria of PAH-degrading consortia in sea water and surface sediment of the Middle Atlantic Ridge deep sea, with Novosphingobium spp. as their main PAH degraders.

  15. Draft Genome Sequence of Pseudomonas oceani DSM 100277T, a Deep-Sea Bacterium.

    PubMed

    García-Valdés, Elena; Gomila, Margarita; Mulet, Magdalena; Lalucat, Jorge

    2018-04-12

    Pseudomonas oceani DSM 100277T was isolated from deep seawater in the Okinawa Trough at 1390 m. P. oceani belongs to the Pseudomonas pertucinogena group. Here, we report the draft genome sequence of P. oceani, which has an estimated size of 4.1 Mb and exhibits 3,790 coding sequences, with a G+C content of 59.94 mol%. Copyright © 2018 García-Valdés et al.

  16. Deep Ion Torrent sequencing identifies soil fungal community shifts after frequent prescribed fires in a southeastern US forest ecosystem.

    PubMed

    Brown, Shawn P; Callaham, Mac A; Oliver, Alena K; Jumpponen, Ari

    2013-12-01

    Prescribed burning is a common management tool to control fuel loads, ground vegetation, and facilitate desirable game species. We evaluated soil fungal community responses to long-term prescribed fire treatments in a loblolly pine forest on the Piedmont of Georgia and utilized deep Internal Transcribed Spacer Region 1 (ITS1) amplicon sequencing afforded by the recent Ion Torrent Personal Genome Machine (PGM). These deep sequence data (19,000 + reads per sample after subsampling) indicate that frequent fires (3-year fire interval) shift soil fungus communities, whereas infrequent fires (6-year fire interval) permit system resetting to a state similar to that without prescribed fire. Furthermore, in nonmetric multidimensional scaling analyses, primarily ectomycorrhizal taxa were correlated with axes associated with long fire intervals, whereas soil saprobes tended to be correlated with the frequent fire recurrence. We conclude that (1) multiplexed Ion Torrent PGM analyses allow deep cost effective sequencing of fungal communities but may suffer from short read lengths and inconsistent sequence quality adjacent to the sequencing adaptor; (2) frequent prescribed fires elicit a shift in soil fungal communities; and (3) such shifts do not occur when fire intervals are longer. Our results emphasize the general responsiveness of these forests to management, and the importance of fire return intervals in meeting management objectives. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  17. RNA-Seq analysis to capture the transcriptome landscape of a single cell

    PubMed Central

    Tang, Fuchou; Barbacioru, Catalin; Nordman, Ellen; Xu, Nanlan; Bashkirov, Vladimir I; Lao, Kaiqin; Surani, M. Azim

    2013-01-01

    We describe here a protocol for digital transcriptome analysis in a single mouse blastomere using a deep sequencing approach. An individual blastomere was first isolated and put into lysate buffer by mouth pipette. Reverse transcription was then performed directly on the whole cell lysate. After this, the free primers were removed by Exonuclease I and a poly(A) tail was added to the 3′ end of the first-strand cDNA by Terminal Deoxynucleotidyl Transferase. Then the single cell cDNAs were amplified by 20 plus 9 cycles of PCR. Then 100-200 ng of these amplified cDNAs were used to construct a sequencing library. The sequencing library can be used for deep sequencing using the SOLiD system. Compared with the cDNA microarray technique, our assay can capture up to 75% more genes expressed in early embryos. The protocol can generate deep sequencing libraries within 6 days for 16 single cell samples. PMID:20203668

  18. Deep sequencing reveals double mutations in cis of MPL exon 10 in myeloproliferative neoplasms.

    PubMed

    Pietra, Daniela; Brisci, Angela; Rumi, Elisa; Boggi, Sabrina; Elena, Chiara; Pietrelli, Alessandro; Bordoni, Roberta; Ferrari, Maurizio; Passamonti, Francesco; De Bellis, Gianluca; Cremonesi, Laura; Cazzola, Mario

    2011-04-01

    Somatic mutations of MPL exon 10, mainly involving a W515 substitution, have been described in JAK2 (V617F)-negative patients with essential thrombocythemia and primary myelofibrosis. We used direct sequencing and high-resolution melt analysis to identify mutations of MPL exon 10 in 570 patients with myeloproliferative neoplasms, and allele specific PCR and deep sequencing to further characterize a subset of mutated patients. Somatic mutations were detected in 33 of 221 patients (15%) with JAK2 (V617F)-negative essential thrombocythemia or primary myelofibrosis. Only one patient with essential thrombocythemia carried both JAK2 (V617F) and MPL (W515L). High-resolution melt analysis identified abnormal patterns in all the MPL mutated cases, while direct sequencing did not detect the mutant MPL in one fifth of them. In 3 cases carrying double MPL mutations, deep sequencing analysis showed identical load and location in cis of the paired lesions, indicating their simultaneous occurrence on the same chromosome.

  19. RaptorX-Property: a web server for protein structure property prediction.

    PubMed

    Wang, Sheng; Li, Wei; Liu, Shiwang; Xu, Jinbo

    2016-07-08

    RaptorX Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server predicting structure property of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in PDB or with very sparse sequence profile (i.e. carries little evolutionary information). This server employs a powerful in-house deep learning model DeepCNF (Deep Convolutional Neural Fields) to predict secondary structure (SS), solvent accessibility (ACC) and disorder regions (DISO). DeepCNF not only models complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent property labels. Our experimental results show that, tested on CASP10, CASP11 and the other benchmarks, this server can obtain ∼84% Q3 accuracy for 3-state SS, ∼72% Q8 accuracy for 8-state SS, ∼66% Q3 accuracy for 3-state solvent accessibility, and ∼0.89 area under the ROC curve (AUC) for disorder prediction. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
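
    The Q3 and Q8 figures quoted in this abstract are per-residue accuracies: the percentage of residues whose predicted state equals the observed state. A minimal sketch follows; the function name and the toy strings are hypothetical, and the one-letter codes H, E and C are assumed for the three secondary-structure states.

```python
def q_accuracy(predicted, observed):
    """Percentage of positions where the predicted label equals the observed one.

    Works for any label alphabet, so the same function covers Q3 (3-state)
    and Q8 (8-state) accuracy.
    """
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Toy 3-state strings (H = helix, E = strand, C = coil), not real data.
    print(q_accuracy("HHHHEEEECCCC", "HHHCEEEECCCH"))
```

    Benchmark values such as those reported for CASP10 and CASP11 are typically computed over all residues of all test proteins, so long proteins contribute more than short ones.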

  20. Deep sequencing analysis of viral infection and evolution allows rapid and detailed characterization of viral mutant spectrum.

    PubMed

    Isakov, Ofer; Bordería, Antonio V; Golan, David; Hamenahem, Amir; Celniker, Gershon; Yoffe, Liron; Blanc, Hervé; Vignuzzi, Marco; Shomron, Noam

    2015-07-01

    The study of RNA virus populations is a challenging task. Each population of RNA virus is composed of a collection of different, yet related genomes often referred to as mutant spectra or quasispecies. Virologists using deep sequencing technologies face major obstacles when studying virus population dynamics, both experimentally and in natural settings due to the relatively high error rates of these technologies and the lack of high performance pipelines. In order to overcome these hurdles we developed a computational pipeline, termed ViVan (Viral Variance Analysis). ViVan is a complete pipeline facilitating the identification, characterization and comparison of sequence variance in deep sequenced virus populations. Applying ViVan on deep sequenced data obtained from samples that were previously characterized by more classical approaches, we uncovered novel and potentially crucial aspects of virus populations. With our experimental work, we illustrate how ViVan can be used for studies ranging from the more practical, detection of resistant mutations and effects of antiviral treatments, to the more theoretical temporal characterization of the population in evolutionary studies. Freely available on the web at http://www.vivanbioinfo.org. Contact: nshomron@post.tau.ac.il. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
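
    The core quantity behind characterizing a mutant spectrum is the frequency of each non-reference base at each genome position. The sketch below is not ViVan's algorithm: the function name, the input layout, and the fixed frequency and depth cut-offs are assumptions chosen for illustration, standing in for the statistical error modelling that the abstract says is needed because of sequencing error rates.

```python
def minor_variants(base_counts, reference_base, min_freq=0.01, min_depth=100):
    """Return {base: frequency} for non-reference bases above a frequency cut-off.

    `base_counts` maps each observed base to its read count at one position.
    Positions with fewer than `min_depth` reads are skipped (empty result),
    since frequencies estimated from few reads are unreliable.
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return {}
    return {
        base: count / depth
        for base, count in base_counts.items()
        if base != reference_base and count / depth >= min_freq
    }


if __name__ == "__main__":
    # Toy pileup at one position: 1000 reads, reference base A.
    print(minor_variants({"A": 980, "G": 15, "T": 5}, "A"))
```

    A fixed 1% threshold is a crude proxy for the platform error rate; comparing observed counts against a per-base error model would be the more rigorous choice.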

  1. First Investigation of the Microbiology of the Deepest Layer of Ocean Crust

    PubMed Central

    Mason, Olivia U.; Nakagawa, Tatsunori; Rosner, Martin; Van Nostrand, Joy D.; Zhou, Jizhong; Maruyama, Akihiko; Fisk, Martin R.; Giovannoni, Stephen J.

    2010-01-01

    The gabbroic layer comprises the majority of ocean crust. Opportunities to sample this expansive crustal environment are rare because of the technological demands of deep ocean drilling; thus, gabbroic microbial communities have not yet been studied. During the Integrated Ocean Drilling Program Expeditions 304 and 305, igneous rock samples were collected from 0.45-1391.01 meters below seafloor at Hole 1309D, located on the Atlantis Massif (30 °N, 42 °W). Microbial diversity in the rocks was analyzed by denaturing gradient gel electrophoresis and sequencing (Expedition 304), and terminal restriction fragment length polymorphism, cloning and sequencing, and functional gene microarray analysis (Expedition 305). The gabbroic microbial community was relatively depauperate, consisting of a low diversity of proteobacterial lineages closely related to Bacteria from hydrocarbon-dominated environments and to known hydrocarbon degraders, and there was little evidence of Archaea. Functional gene diversity in the gabbroic samples was analyzed with a microarray for metabolic genes (“GeoChip”), producing further evidence of genomic potential for hydrocarbon degradation - genes for aerobic methane and toluene oxidation. Genes coding for anaerobic respirations, such as nitrate reduction, sulfate reduction, and metal reduction, as well as genes for carbon fixation, nitrogen fixation, and ammonium-oxidation, were also present. Our results suggest that the gabbroic layer hosts a microbial community that can degrade hydrocarbons and fix carbon and nitrogen, and has the potential to employ a diversity of non-oxygen electron acceptors. This rare glimpse of the gabbroic ecosystem provides further support for the recent finding of hydrocarbons in deep ocean gabbro from Hole 1309D. 
    It has been hypothesized that these hydrocarbons might originate abiotically from serpentinization reactions that are occurring deep in the Earth's crust, raising the possibility that the lithic microbial community reported here might utilize carbon sources produced independently of the surface biosphere. PMID:21079766

  2. First investigation of the microbiology of the deepest layer of ocean crust.

    PubMed

    Mason, Olivia U; Nakagawa, Tatsunori; Rosner, Martin; Van Nostrand, Joy D; Zhou, Jizhong; Maruyama, Akihiko; Fisk, Martin R; Giovannoni, Stephen J

    2010-11-05

    The gabbroic layer comprises the majority of ocean crust. Opportunities to sample this expansive crustal environment are rare because of the technological demands of deep ocean drilling; thus, gabbroic microbial communities have not yet been studied. During the Integrated Ocean Drilling Program Expeditions 304 and 305, igneous rock samples were collected from 0.45-1391.01 meters below seafloor at Hole 1309D, located on the Atlantis Massif (30 °N, 42 °W). Microbial diversity in the rocks was analyzed by denaturing gradient gel electrophoresis and sequencing (Expedition 304), and terminal restriction fragment length polymorphism, cloning and sequencing, and functional gene microarray analysis (Expedition 305). The gabbroic microbial community was relatively depauperate, consisting of a low diversity of proteobacterial lineages closely related to Bacteria from hydrocarbon-dominated environments and to known hydrocarbon degraders, and there was little evidence of Archaea. Functional gene diversity in the gabbroic samples was analyzed with a microarray for metabolic genes ("GeoChip"), producing further evidence of genomic potential for hydrocarbon degradation--genes for aerobic methane and toluene oxidation. Genes coding for anaerobic respirations, such as nitrate reduction, sulfate reduction, and metal reduction, as well as genes for carbon fixation, nitrogen fixation, and ammonium-oxidation, were also present. Our results suggest that the gabbroic layer hosts a microbial community that can degrade hydrocarbons and fix carbon and nitrogen, and has the potential to employ a diversity of non-oxygen electron acceptors. This rare glimpse of the gabbroic ecosystem provides further support for the recent finding of hydrocarbons in deep ocean gabbro from Hole 1309D. 
    It has been hypothesized that these hydrocarbons might originate abiotically from serpentinization reactions that are occurring deep in the Earth's crust, raising the possibility that the lithic microbial community reported here might utilize carbon sources produced independently of the surface biosphere.

  3. The Multiple Stellar Populations in the Ancient LMC Globular Clusters Hodge 11 and NGC 2210

    NASA Astrophysics Data System (ADS)

    Chaboyer, Brian; Gilligan, Christina; Wagner-Kaiser, Rachel; Mackey, Dougal; Sarajedini, Ata; Cummings, Jeffrey; Grocholski, Aaron; Geisler, Doug; Cohen, Roger; Villanova, Sandro; Yang, Soung-Chul; Parisi, Celeste

    2018-01-01

    Hubble Space Telescope images of the ancient LMC globular clusters Hodge 11 and NGC 2210 in the F336W, F606W and F814W filters were obtained between June 2016 and April 2017. These deep images have been analyzed with the Dolphot software package. High-quality photometry has been obtained from three magnitudes brighter than the horizontal branch to about four magnitudes fainter than the main sequence turn-off. Both clusters show an excess of red main sequence stars in the F336W filter, indicating that multiple stellar populations exist in both clusters. Hodge 11 shows irregularities in its horizontal branch morphology, which are indicative of the presence of an approximately 0.1 dex internal helium abundance spread.

  4. Exon 11 skipping of SCN10A coding for voltage-gated sodium channels in dorsal root ganglia

    PubMed Central

    Schirmeyer, Jana; Szafranski, Karol; Leipold, Enrico; Mawrin, Christian; Platzer, Matthias; Heinemann, Stefan H

    2014-01-01

    The voltage-gated sodium channel NaV1.8 (encoded by SCN10A) is predominantly expressed in dorsal root ganglia (DRG) and plays a critical role in pain perception. We analyzed SCN10A transcripts isolated from human DRGs using deep sequencing and found a novel splice variant lacking exon 11, which codes for 98 amino acids of the domain I/II linker. Quantitative PCR analysis revealed an abundance of this variant of up to 5–10% in humans, while no such variants were detected in mouse or rat. Since no obvious functional differences between channels with and without the exon-11 sequence were detected, it is suggested that SCN10A exon 11 skipping in humans is a tolerated event. PMID:24763188

  5. Identification of ribonucleotide reductase mutation causing temperature-sensitivity of herpes simplex virus isolates from whitlow by deep sequencing.

    PubMed

    Daikoku, Tohru; Oyama, Yukari; Yajima, Misako; Sekizuka, Tsuyoshi; Kuroda, Makoto; Shimada, Yuka; Takehara, Kazuhiko; Miwa, Naoko; Okuda, Tomoko; Sata, Tetsutaro; Shiraki, Kimiyasu

    2015-06-01

    Herpes simplex virus 2 caused a genital ulcer, and a secondary herpetic whitlow appeared during acyclovir therapy. The secondary and recurrent whitlow isolates were acyclovir-resistant and temperature-sensitive in contrast to a genital isolate. We identified the ribonucleotide reductase mutation responsible for temperature-sensitivity by deep-sequencing analysis.

  6. Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning.

    PubMed

    Teng, Haotian; Cao, Minh Duc; Hall, Michael B; Duarte, Tania; Wang, Sheng; Coin, Lachlan J M

    2018-05-01

    Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology that offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling and directly translate the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4,000 reads, we show that our model provides state-of-the-art basecalling accuracy, even on previously unseen species. Chiron achieves basecalling speeds of more than 2,000 bases per second using desktop computer graphics processing units.

  7. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data.

    PubMed

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.

  8. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

    PubMed Central

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637
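
    The "highly variable agreement" between callers can be quantified in several ways, and the abstract does not say which measure the authors used. As a minimal sketch, assuming each caller's output has already been reduced to a set of (chromosome, position, ref, alt) tuples, the Jaccard index gives a simple pairwise overlap score. The function names, caller labels, and toy call sets below are hypothetical and are not data from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two call sets: |A & B| / |A | B| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_agreement(calls):
    """Map each unordered pair of caller names to the Jaccard index of their calls."""
    return {
        (x, y): jaccard(calls[x], calls[y])
        for x, y in combinations(sorted(calls), 2)
    }


# Toy call sets, made up for illustration only.
calls = {
    "callerA": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "callerB": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")},
    "callerC": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
}
agreement = pairwise_agreement(calls)
```

    A value of 1.0 means two callers reported exactly the same variants; values near 0 mean almost no overlap. Real comparisons would also need to normalize indel representation before building the sets, which this sketch does not attempt.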

  9. Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks.

    PubMed

    Umarov, Ramzan Kh; Solovyev, Victor V

    2017-01-01

    Accurate computational identification of promoters remains a challenge as these key DNA regulatory regions have variable structures composed of functional motifs that provide gene-specific initiation of transcription. In this paper we utilize Convolutional Neural Networks (CNN) to analyze sequence characteristics of prokaryotic and eukaryotic promoters and build their predictive models. We trained a similar CNN architecture on promoters of five distant organisms: human, mouse, plant (Arabidopsis), and two bacteria (Escherichia coli and Bacillus subtilis). We found that a CNN trained on the sigma70 subclass of Escherichia coli promoters gives an excellent classification of promoter and non-promoter sequences (Sn = 0.90, Sp = 0.96, CC = 0.84). The CNN model for Bacillus subtilis promoter identification achieves Sn = 0.91, Sp = 0.95, and CC = 0.86. For human, mouse and Arabidopsis promoters we employed CNNs for identification of two well-known promoter classes (TATA and non-TATA promoters). CNN models nicely recognize these complex functional regions. For human promoters, the Sn/Sp/CC accuracy of prediction reached 0.95/0.98/0.90 for TATA and 0.90/0.98/0.89 for non-TATA promoter sequences, respectively. For Arabidopsis we observed Sn/Sp/CC of 0.95/0.97/0.91 (TATA) and 0.94/0.94/0.86 (non-TATA). Thus, the developed CNN models, implemented in the CNNProm program, demonstrated the ability of the deep learning approach to grasp complex promoter sequence characteristics and achieve significantly higher accuracy compared to previously developed promoter prediction programs. We also propose a random substitution procedure to discover positionally conserved promoter functional elements. As the suggested approach does not require knowledge of any specific promoter features, it can be easily extended to identify promoters and other complex functional regions in sequences of many other and especially newly sequenced genomes.
    The CNNProm program is available to run at web server http://www.softberry.com.
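
    The abstract reports Sn, Sp and CC without defining them. A minimal sketch of how such values are usually computed from a confusion matrix follows, assuming Sn is sensitivity, Sp is specificity, and CC is the Matthews correlation coefficient (an assumption; the abstract does not state the formula). The function name and the counts are hypothetical and do not reproduce the study's data.

```python
import math


def sn_sp_cc(tp, tn, fp, fn):
    """Return (sensitivity, specificity, Matthews correlation) from confusion counts.

    Assumes at least one positive and one negative example, so the
    sensitivity and specificity denominators are non-zero.
    """
    sn = tp / (tp + fn)  # fraction of true promoters recovered
    sp = tn / (tn + fp)  # fraction of non-promoters correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, cc


# Hypothetical counts for a balanced test set of 100 promoters and 100 non-promoters.
sn, sp, cc = sn_sp_cc(tp=90, tn=96, fp=4, fn=10)
```

    Unlike Sn and Sp, the correlation coefficient depends on the ratio of positive to negative examples, so the same Sn and Sp can correspond to different CC values on differently balanced test sets.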

  10. Deep Illumina-Based Shotgun Sequencing Reveals Dietary Effects on the Structure and Function of the Fecal Microbiome of Growing Kittens

    PubMed Central

    Deusch, Oliver; O’Flynn, Ciaran; Colyer, Alison; Morris, Penelope; Allaway, David; Jones, Paul G.; Swanson, Kelly S.

    2014-01-01

    Background: Previously, we demonstrated that dietary protein:carbohydrate ratio dramatically affects the fecal microbial taxonomic structure of kittens using targeted 16S gene sequencing. The present study, using the same fecal samples, applied deep Illumina shotgun sequencing to identify the diet-associated functional potential and analyze taxonomic changes of the feline fecal microbiome. Methodology & Principal Findings: Fecal samples from kittens fed one of two diets differing in protein and carbohydrate content (high-protein, low-carbohydrate, HPLC; and moderate-protein, moderate-carbohydrate, MPMC) were collected at 8, 12 and 16 weeks of age (n = 6 per group). A total of 345.3 gigabases of sequence were generated from 36 samples, with 99.75% of annotated sequences identified as bacterial. At the genus level, 26% and 39% of reads were annotated for HPLC- and MPMC-fed kittens, with HPLC-fed cats showing greater species richness and microbial diversity. Two phyla, ten families and fifteen genera were responsible for more than 80% of the sequences at each taxonomic level for both diet groups, consistent with the previous taxonomic study. Significantly different abundances between diet groups were observed for 324 genera (56% of all genera identified), demonstrating widespread diet-induced changes in microbial taxonomic structure. Diversity was not affected over time. Functional analysis identified 2,013 putative enzyme function groups that differed (p<0.000007) between the two dietary groups and were associated with 194 pathways, which formed five discrete clusters based on average relative abundance. Of those, ten contained more (p<0.022) enzyme functions with significant diet effects than expected by chance. Six pathways were related to amino acid biosynthesis and metabolism, linking changes in dietary protein with functional differences of the gut microbiome.
    Conclusions: These data indicate that feline feces-derived microbiomes have large structural and functional differences relating to the dietary protein:carbohydrate ratio and highlight the impact of diet early in life. PMID:25010839

  11. Modeling positional effects of regulatory sequences with spline transformations increases prediction accuracy of deep neural networks

    PubMed Central

    Avsec, Žiga; Cheng, Jun; Gagneur, Julien

    2018-01-01

    Motivation: Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Results: Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Availability and implementation: Spline transformation is implemented as a Keras layer in the CONCISE python package: https://github.com/gagneurlab/concise. Analysis code is available at https://github.com/gagneurlab/Manuscript_Avsec_Bioinformatics_2017. Contact: avsec@in.tum.de or gagneur@in.tum.de. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:29155928

  12. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks

    NASA Astrophysics Data System (ADS)

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-01

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named “DeepMethyl” to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  13. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks.

    PubMed

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-22

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  14. MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.

    PubMed

    Fang, Chao; Shang, Yi; Xu, Dong

    2018-05-01

    Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acid, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physio-chemical properties of amino acids, PSI-BLAST profile, and HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structures. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate prediction. In extensive experiments on multiple datasets, MUFOLD-SS outperformed the best existing methods and other deep neural networks significantly. MUFold-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.

  15. Fungal diversity in deep-sea sediments of a hydrothermal vent system in the Southwest Indian Ridge

    NASA Astrophysics Data System (ADS)

    Xu, Wei; Gong, Lin-feng; Pang, Ka-Lai; Luo, Zhu-Hua

    2018-01-01

    Deep-sea hydrothermal sediment is known to support remarkably diverse microbial consortia. In deep-sea environments, fungal communities remain less studied despite their known taxonomic and functional diversity. High-throughput sequencing methods have augmented our capacity to assess eukaryotic diversity and their functions in microbial ecology. Here we provide the first description of the fungal community diversity found in deep-sea sediments collected at the Southwest Indian Ridge (SWIR) using culture-dependent and high-throughput sequencing approaches. A total of 138 fungal isolates were cultured from seven different sediment samples using various nutrient media, and these isolates were identified to 14 fungal taxa, including 11 Ascomycota taxa (7 genera) and 3 Basidiomycota taxa (2 genera) based on internal transcribed spacers (ITS1, ITS2 and 5.8S) of rDNA. Using Illumina HiSeq sequencing, a total of 757,467 fungal ITS2 tags were recovered from the samples and clustered into 723 operational taxonomic units (OTUs) belonging to 79 taxa (Ascomycota and Basidiomycota contributed to 99% of all samples) based on 97% sequence similarity. Results from both approaches suggest that there is a high fungal diversity in the deep-sea sediments collected in the SWIR, and fungal communities were shown to differ slightly by location, although all were collected from adjacent sites at the SWIR. This study provides baseline data on the fungal diversity and biogeography, and a glimpse into the microbial ecology associated with the deep-sea sediments of the hydrothermal vent system of the Southwest Indian Ridge.
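
    The abstract states that ITS2 tags were clustered into OTUs at 97% sequence similarity but does not name the clustering method. The sketch below illustrates the general idea with a greedy, centroid-based scheme. It assumes equal-length, pre-aligned sequences and a plain per-position identity, whereas production tools compute alignments. All names and toy sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length, non-empty sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("this sketch compares non-empty, equal-length sequences only")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedy clustering: each sequence joins the first centroid it matches
    at >= threshold identity; otherwise it founds a new OTU."""
    centroids = []  # first member of each OTU
    otus = []       # list of member lists, parallel to centroids
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                otus[i].append(s)
                break
        else:  # no centroid was close enough
            centroids.append(s)
            otus.append([s])
    return otus


# Toy 100-nt sequences, made up for illustration.
ref = "A" * 100
near = "C" + "A" * 99      # 99% identical to ref
far = "C" * 5 + "A" * 95   # 95% identical to ref
otus = greedy_otus([ref, near, far])
```

    Greedy clustering is order-dependent, so tools that use this strategy typically sort the input (for example by abundance) before clustering.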

  16. Recovery of soil unicellular eukaryotes: an efficiency and activity analysis on the single cell level.

    PubMed

    Lentendu, Guillaume; Hübschmann, Thomas; Müller, Susann; Dunker, Susanne; Buscot, François; Wilhelm, Christian

    2013-12-01

    Eukaryotic unicellular organisms are an important part of the soil microbial community, but they are often neglected in soil functional microbial diversity analysis, principally due to the absence of specific investigation methods in the special soil environment. In this study we used a method based on high-density centrifugation to specifically isolate intact algal and yeast cells, with the aim of analyzing them with flow cytometry and sorting them for further molecular analysis such as deep sequencing. Recovery efficiency was tested at low abundance levels that fit those in natural environments (10^4 to 10^6 cells per g soil). Five algae and five yeast morphospecies isolated from soil were used for the testing. Recovery efficiency ranged from 1.5 to 43.16% and from 2 to 30.2%, respectively, and was dependent on soil type for three of the algae. Control treatments without soil showed that the majority of cells were lost due to the method itself (58% and 55.8%, respectively). However, the cell extraction technique did not greatly compromise cell vitality because a fluorescein di-acetate assay indicated high viability percentages (73.3% and 97.2% of cells, respectively). The low-abundance algae and yeast morphospecies recovered from soil were cytometrically analyzed and sorted. Subsequently, their DNA was isolated and amplified using specific primers. The developed workflow enables isolation and enrichment of intact autotrophic and heterotrophic soil unicellular eukaryotes from natural environments for subsequent application of deep sequencing technologies. Copyright © 2013 Elsevier B.V. All rights reserved.

  17. DNA Barcode Analysis of Thrips (Thysanoptera) Diversity in Pakistan Reveals Cryptic Species Complexes.

    PubMed

    Iftikhar, Romana; Ashfaq, Muhammad; Rasool, Akhtar; Hebert, Paul D N

    2016-01-01

    Although thrips are globally important crop pests and vectors of viral disease, species identifications are difficult because of their small size and inconspicuous morphological differences. Sequence variation in the mitochondrial COI-5' (DNA barcode) region has proven effective for the identification of species in many groups of insect pests. We analyzed barcode sequence variation among 471 thrips from various plant hosts in north-central Pakistan. The Barcode Index Number (BIN) system assigned these sequences to 55 BINs, while the Automatic Barcode Gap Discovery detected 56 partitions, a count that coincided with the number of monophyletic lineages recognized by Neighbor-Joining analysis and Bayesian inference. Congeneric species showed an average of 19% sequence divergence (range = 5.6% - 27%) at COI, while intraspecific distances averaged 0.6% (range = 0.0% - 7.6%). BIN analysis suggested that all intraspecific divergence >3.0% actually involved a species complex. In fact, sequences for three major pest species (Haplothrips reuteri, Thrips palmi, Thrips tabaci), and one predatory thrips (Aeolothrips intermedius) showed deep intraspecific divergences, providing evidence that each is a cryptic species complex. The study compiles the first barcode reference library for the thrips of Pakistan, and examines global haplotype diversity in four important pest thrips.
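
    The divergence figures above are pairwise distances between aligned barcode sequences. As a minimal sketch, the uncorrected p-distance (proportion of differing sites) can be computed as below and compared with the 3.0% threshold mentioned in the abstract. The abstract does not say which distance model the authors used, so p-distance is an assumption here, and the function name and toy sequences are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') or ambiguous base ('N')
    are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


# Toy aligned sequences (100 sites, differing at every 10th site).
s1 = "ACGTACGTAC" * 10
s2 = "ACGTACGTAT" * 10
d = p_distance(s1, s2)
exceeds_threshold = d > 0.03  # 3.0% threshold from the abstract
```

    In a real analysis the same function would be applied to every pair of sequences carrying the same species name, and pairs above the threshold would be examined as candidate members of a species complex.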

  18. Microbial Characterization of Qatari Barchan Sand Dunes

    PubMed Central

    Chatziefthimiou, Aspassia D.; Nguyen, Hanh; Richer, Renee; Louge, Michel; Sultan, Ali A.; Schloss, Patrick; Hay, Anthony G.

    2016-01-01

    This study represents the first characterization of sand microbiota in migrating barchan sand dunes. Bacterial communities were studied through direct counts and cultivation, as well as 16S rRNA gene and metagenomic sequence analysis to gain an understanding of microbial abundance, diversity, and potential metabolic capabilities. Direct on-grain cell counts gave an average of 5.3 ± 0.4 × 10^5 cells g^-1 of sand. Cultured isolates (N = 64) selected for 16S rRNA gene sequencing belonged to the phyla Actinobacteria (58%), Firmicutes (27%) and Proteobacteria (15%). Deep sequencing of 16S rRNA gene amplicons from 18 dunes demonstrated a high relative abundance of Proteobacteria, particularly enteric bacteria, and a dune-specific pattern of bacterial community composition that correlated with dune size. Shotgun metagenome sequences of two representative dunes were analyzed and found to have similar relative bacterial abundance, though the relative abundances of eukaryotic, viral and enterobacterial sequences were greater in sand from the dune closer to a camel pen. Functional analysis revealed patterns similar to those observed in desert soils; however, the increased relative abundance of genes encoding sporulation and dormancy is consistent with the dune microbiome being well-adapted to the exceptionally hyper-arid Qatari desert. PMID:27655399

  19. microRNA expression profiling in fetal single ventricle malformation identified by deep sequencing.

    PubMed

    Yu, Zhang-Bin; Han, Shu-Ping; Bai, Yun-Fei; Zhu, Chun; Pan, Ya; Guo, Xi-Rong

    2012-01-01

    microRNAs (miRNAs) have emerged as key regulators in many biological processes, particularly cardiac growth and development, although the specific miRNA expression profile associated with this process remains to be elucidated. This study aimed to characterize the cellular microRNA profile involved in the development of congenital heart malformation, through the investigation of single ventricle (SV) defects. Comprehensive miRNA profiling in human fetal SV cardiac tissue was performed by deep sequencing. Differential expression of 48 miRNAs was revealed by sequencing by oligonucleotide ligation and detection (SOLiD) analysis. Of these, 38 were down-regulated and 10 were up-regulated in differentiated SV cardiac tissue, compared to control cardiac tissue. This was confirmed by real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR) analysis. Predicted target genes of the 48 differentially expressed miRNAs were analyzed by gene ontology and categorized according to cellular process, regulation of biological process and metabolic process. Pathway-Express analysis identified the WNT and mTOR signaling pathways as the most significant processes putatively affected by the differential expression of these miRNAs. The candidate genes involved in cardiac development were identified as potential targets for these differentially expressed microRNAs and the collaborative network of microRNAs and cardiac development related-mRNAs was constructed. These data provide the basis for future investigation of the mechanism of the occurrence and development of fetal SV malformations.

  20. High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel Bathymodiolus azoricus

    PubMed Central

    2010-01-01

    Background Bathymodiolus azoricus is a deep-sea hydrothermal vent mussel found in association with large faunal communities living in chemosynthetic environments at the bottom of the sea floor near the Azores Islands. Investigation of the exceptional physiological reactions that vent mussels have adopted in their habitat, including responses to environmental microbes, remains a difficult challenge for deep-sea biologists. In an attempt to reveal genes potentially involved in the deep-sea mussel innate immunity we carried out a high-throughput sequence analysis of freshly collected B. azoricus transcriptome using gills tissues as the primary source of immune transcripts given its strategic role in filtering the surrounding waterborne potentially infectious microorganisms. Additionally, a substantial EST data set was produced and from which a comprehensive collection of genes coding for putative proteins was organized in a dedicated database, "DeepSeaVent" the first deep-sea vent animal transcriptome database based on the 454 pyrosequencing technology. Results A normalized cDNA library from gills tissue was sequenced in a full 454 GS-FLX run, producing 778,996 sequencing reads. Assembly of the high quality reads resulted in 75,407 contigs of which 3,071 were singletons. A total of 39,425 transcripts were conceptually translated into amino-sequences of which 22,023 matched known proteins in the NCBI non-redundant protein database, 15,839 revealed conserved protein domains through InterPro functional classification and 9,584 were assigned with Gene Ontology terms. Queries conducted within the database enabled the identification of genes putatively involved in immune and inflammatory reactions which had not been previously evidenced in the vent mussel. Their physical counterpart was confirmed by semi-quantitative quantitative Reverse-Transcription-Polymerase Chain Reactions (RT-PCR) and their RNA transcription level by quantitative PCR (qPCR) experiments. 
Conclusions We have established the first tissue transcriptional analysis of a deep-sea hydrothermal vent animal and generated a searchable catalog of genes that provides a direct method of identifying and retrieving vast numbers of novel coding sequences which can be applied in gene expression profiling experiments from a non-conventional model organism. This provides the most comprehensive sequence resource for identifying novel genes currently available for a deep-sea vent organism, in particular, genes putatively involved in immune and inflammatory reactions in vent mussels. The characterization of the B. azoricus transcriptome will facilitate research into biological processes underlying physiological adaptations to hydrothermal vent environments and will provide a basis for expanding our understanding of genes putatively involved in adaptation processes during post-capture long-term acclimatization experiments, at "sea-level" conditions, using B. azoricus as a model organism. PMID:20937131

  1. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

    PubMed

    Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan

    2018-02-15

    A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net. Source code: https://github.com/bio-ontology-research-group/deepgo. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  2. Molecular phylogeny and comparative morphology indicate that odontostomatids (Alveolata, Ciliophora) form a distinct class-level taxon related to Armophorea.

    PubMed

    Fernandes, Noemi M; Vizzoni, Vinicius F; Borges, Bárbara do N; A G Soares, Carlos; Silva-Neto, Inácio D da; S Paiva, Thiago da

    2018-04-18

    The odontostomatids are among the least studied ciliates, possibly due to their small sizes, restriction to anaerobic environments and difficulty in culturing. Consequently, their phylogenetic affinities to other ciliate taxa are still poorly understood. In the present study, we analyzed newly obtained ribosomal gene sequences of the odontostomatids Discomorphella pedroeneasi and Saprodinium dentatum, together with sequences from the literature, including Epalxella antiquorum and a large assemblage of ciliate sequences representing the major recognized classes. The results show that D. pedroeneasi and S. dentatum form a deep-diverging branch related to metopid and clevelandellid armophoreans, corroborating the old literature. However, E. antiquorum clustered with the morphologically discrepant plagiopylids, indicating that either the complex odontostomatid body architecture evolved convergently, or the positioning of E. antiquorum as a plagiopylid is artifactual. A new ciliate class, Odontostomatea n. cl., is proposed based on molecular analyses and comparative morphology of odontostomatids with related taxa. Copyright © 2018. Published by Elsevier Inc.

  3. Sensitive Deep-Sequencing-Based HIV-1 Genotyping Assay To Simultaneously Determine Susceptibility to Protease, Reverse Transcriptase, Integrase, and Maturation Inhibitors, as Well as HIV-1 Coreceptor Tropism

    PubMed Central

    Gibson, Richard M.; Meyer, Ashley M.; Winner, Dane; Archer, John; Feyertag, Felix; Ruiz-Mateos, Ezequiel; Leal, Manuel; Robertson, David L.; Schmotzer, Christine L.

    2014-01-01

    With 29 individual antiretroviral drugs available from six classes that are approved for the treatment of HIV-1 infection, a combination of different phenotypic and genotypic tests is currently needed to monitor HIV-infected individuals. In this study, we developed a novel HIV-1 genotypic assay based on deep sequencing (DeepGen HIV) to simultaneously assess HIV-1 susceptibilities to all drugs targeting the three viral enzymes and to predict HIV-1 coreceptor tropism. Patient-derived gag-p2/NCp7/p1/p6/pol-PR/RT/IN- and env-C2V3 PCR products were sequenced using the Ion Torrent Personal Genome Machine. Reads spanning the 3′ end of the Gag, protease (PR), reverse transcriptase (RT), integrase (IN), and V3 regions were extracted, truncated, translated, and assembled for genotype and HIV-1 coreceptor tropism determination. DeepGen HIV consistently detected both minority drug-resistant viruses and non-R5 HIV-1 variants from clinical specimens with viral loads of ≥1,000 copies/ml and from B and non-B subtypes. Additional mutations associated with resistance to PR, RT, and IN inhibitors, previously undetected by standard (Sanger) population sequencing, were reliably identified at frequencies as low as 1%. DeepGen HIV results correlated with phenotypic (original Trofile, 92%; enhanced-sensitivity Trofile assay [ESTA], 80%; TROCAI, 81%; and VeriTrop, 80%) and genotypic (population sequencing/Geno2Pheno with a 10% false-positive rate [FPR], 84%) HIV-1 tropism test results. DeepGen HIV (83%) and Trofile (85%) showed similar concordances with the clinical response following an 8-day course of maraviroc monotherapy (MCT). In summary, this novel all-inclusive HIV-1 genotypic and coreceptor tropism assay, based on deep sequencing of the PR, RT, IN, and V3 regions, permits simultaneous multiplex detection of low-level drug-resistant and/or non-R5 viruses in up to 96 clinical samples. 
This comprehensive test, the first of its class, will be instrumental in the development of new antiretroviral drugs and, more importantly, will aid in the treatment and management of HIV-infected individuals. PMID:24468782

  4. Dendrites, deep learning, and sequences in the hippocampus.

    PubMed

    Bhalla, Upinder S

    2017-10-12

    The hippocampus places us both in time and space. It does so over remarkably large spans: milliseconds to years, and centimeters to kilometers. This works for sensory representations, for memory, and for behavioral context. How does it fit in such wide ranges of time and space scales, and keep order among the many dimensions of stimulus context? A key organizing principle for a wide sweep of scales and stimulus dimensions is that of order in time, or sequences. Sequences of neuronal activity are ubiquitous in sensory processing, in motor control, in planning actions, and in memory. Against this strong evidence for the phenomenon, there are currently more models than definite experiments about how the brain generates ordered activity. The flip side of sequence generation is discrimination. Discrimination of sequences has been extensively studied at the behavioral, systems, and modeling level, but again physiological mechanisms are fewer. It is against this backdrop that I discuss two recent developments in neural sequence computation, that at face value share little beyond the label "neural." These are dendritic sequence discrimination, and deep learning. One derives from channel physiology and molecular signaling, the other from applied neural network theory - apparently extreme ends of the spectrum of neural circuit detail. I suggest that each of these topics has deep lessons about the possible mechanisms, scales, and capabilities of hippocampal sequence computation. © 2017 Wiley Periodicals, Inc.

  5. Revealing the unexplored fungal communities in deep groundwater of crystalline bedrock fracture zones in Olkiluoto, Finland.

    PubMed

    Sohlberg, Elina; Bomberg, Malin; Miettinen, Hanna; Nyyssönen, Mari; Salavirta, Heikki; Vikman, Minna; Itävaara, Merja

    2015-01-01

    The diversity and functional role of fungi, one of the ecologically most important groups of eukaryotic microorganisms, remain largely unknown in deep biosphere environments. In this study we investigated fungal communities in packer-isolated bedrock fractures in Olkiluoto, Finland at depths ranging from 296 to 798 m below surface level. DNA- and cDNA-based high-throughput amplicon sequencing analysis of the fungal internal transcribed spacer (ITS) gene markers was used to examine the total fungal diversity and to identify the active members in deep fracture zones at different depths. Results showed that fungi were present in fracture zones at all depths and fungal diversity was higher than expected. Most of the observed fungal sequences belonged to the phylum Ascomycota. Phyla Basidiomycota and Chytridiomycota were only represented as a minor part of the fungal community. Dominating fungal classes in the deep bedrock aquifers were Sordariomycetes, Eurotiomycetes, and Dothideomycetes from the Ascomycota phylum and classes Microbotryomycetes and Tremellomycetes from the Basidiomycota phylum, which are the most frequently detected fungal taxa reported also from deep sea environments. In addition some fungal sequences represented potentially novel fungal species. Active fungi were detected in most of the fracture zones, which proves that fungi are able to maintain cellular activity in these oligotrophic conditions. Possible roles of fungi and their origin in deep bedrock groundwater can only be speculated on in the light of current knowledge, but some species may be specifically adapted to the deep subsurface environment and may play important roles in the utilization and recycling of nutrients, thus sustaining the deep subsurface microbial community.

  6. Identification and functional analysis of flowering related microRNAs in common wild rice (Oryza rufipogon Griff.).

    PubMed

    Chen, Zongxiang; Li, Fuli; Yang, Songnan; Dong, Yibo; Yuan, Qianhua; Wang, Feng; Li, Weimin; Jiang, Ying; Jia, Shirong; Pei, Xinwu

    2013-01-01

    MicroRNAs (miRNAs) are a class of non-coding RNAs involved in post-transcriptional control of gene expression, via degradation and/or translational inhibition. Six hundred sixty-one rice miRNAs are known that are important in plant development. However, flowering-related miRNAs have not been characterized in Oryza rufipogon Griff. The study was approved by the supervision department of Guangdong wild rice protection. We analyzed flowering-related miRNAs in O. rufipogon using high-throughput sequencing (deep sequencing) to understand the changes that occurred during rice domestication, and to elucidate their functions in flowering. Three O. rufipogon sRNA libraries, two vegetative stage (CWR-V1 and CWR-V2) and one flowering stage (CWR-F2), were sequenced using Illumina deep sequencing. A total of 20,156,098, 21,531,511 and 20,995,942 high quality sRNA reads were obtained from CWR-V1, CWR-V2 and CWR-F2, respectively, of which 3,448,185, 4,265,048 and 2,833,527 reads matched known miRNAs. We identified 512 known rice miRNAs in 214 miRNA families and predicted 290 new miRNAs. Targeted functional annotation, GO and KEGG pathway analyses predicted that 187 miRNAs regulate expression of flowering-related genes. Differential expression analysis of flowering-related miRNAs showed that expression of 95 miRNAs varied significantly between the libraries; 66 are flowering-related miRNAs, such as oru-miR97, oru-miR117, oru-miR135, and oru-miR137, and 17 are early-flowering-related miRNAs, including osa-miR160f, osa-miR164d, osa-miR167d, osa-miR169a, osa-miR172b, and oru-miR4, induced during the floral transition. Real-time PCR revealed the same expression patterns as deep sequencing. miRNA targets were confirmed for cleavage by 5'-RACE in vivo, and were negatively regulated by miRNAs. This is the first investigation of flowering miRNAs in wild rice.
The result indicates that variation in miRNAs occurred during rice domestication and lays a foundation for further study of phase change and flowering in O. rufipogon. Complicated regulatory networks mediated by multiple miRNAs regulate the expression of flowering genes that control the induction of flowering.

  7. Ultra-deep mutant spectrum profiling: improving sequencing accuracy using overlapping read pairs.

    PubMed

    Chen-Harris, Haiyin; Borucki, Monica K; Torres, Clinton; Slezak, Tom R; Allen, Jonathan E

    2013-02-12

    High throughput sequencing is beginning to make a transformative impact in the area of viral evolution. Deep sequencing has the potential to reveal the mutant spectrum within a viral sample at high resolution, thus enabling the close examination of viral mutational dynamics both within and between hosts. The challenge, however, is to accurately model the errors in the sequencing data and differentiate real viral mutations, particularly those that exist at low frequencies, from sequencing errors. We demonstrate that overlapping read pairs (ORP), generated by combining short fragment sequencing libraries and longer sequencing reads, significantly reduce sequencing error rates and improve rare variant detection accuracy. Using this sequencing protocol and an error model optimized for variant detection, we are able to capture a large number of genetic mutations present within a viral population at ultra-low frequency levels (<0.05%). Our rare variant detection strategies have important implications beyond viral evolution and can be applied to any basic and clinical research area that requires the identification of rare mutations.
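
    As a rough illustration of why overlapping read pairs can lower the effective error rate, the sketch below assumes that errors on the two reads are independent, occur at a per-base rate e, and are spread evenly over the three wrong bases. Under those assumptions both reads show the same wrong base with probability e²/3. The function names and the simple agree-or-mask consensus rule are illustrative assumptions and are not the error model used by the authors.

```python
def agreeing_error_rate(e: float) -> float:
    """Probability that both reads report the same wrong base.

    Assumes independent errors at rate `e` on each read, spread evenly
    over the three possible wrong bases (a simplification).
    """
    return e * e / 3.0


def overlap_consensus(fwd: str, rev_rc: str, mask: str = "N") -> str:
    """Build a consensus over the overlapping region of a read pair.

    `fwd` and `rev_rc` are the overlapping segments, already in the same
    orientation and aligned position by position. Positions where the
    two reads disagree are replaced by `mask`.
    """
    if len(fwd) != len(rev_rc):
        raise ValueError("overlap segments must have equal length")
    return "".join(a if a == b else mask for a, b in zip(fwd, rev_rc))


if __name__ == "__main__":
    # Hypothetical per-base error rate of 1%.
    print(agreeing_error_rate(0.01))
    # One disagreement at the fourth position gets masked.
    print(overlap_consensus("ACGTAC", "ACGAAC"))
```

    Masking disagreements trades some coverage for accuracy; a real pipeline would more likely weigh base qualities than discard the position outright.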

  8. Deep RNNs for video denoising

    NASA Astrophysics Data System (ADS)

    Chen, Xinyuan; Song, Li; Yang, Xiaokang

    2016-09-01

    Video denoising can be described as the problem of mapping a sequence of noisy frames of a specific length to a clean one. We propose a deep architecture based on Recurrent Neural Networks (RNNs) for video denoising. The model learns a patch-based end-to-end mapping between the clean and noisy video sequences. It takes the corrupted video sequences as input and outputs the clean ones. Our deep network, which we refer to as deep Recurrent Neural Networks (deep RNNs or DRNNs), stacks RNN layers where each layer receives the hidden state of the previous layer as input. Experiments show that (i) the recurrent architecture extracts motion information through the temporal domain and benefits video denoising, (ii) the deep architecture has enough capacity to express the mapping between corrupted input videos and clean output videos, and (iii) the model is general enough to learn different mappings from videos corrupted by different types of noise (e.g., Poisson-Gaussian noise). By training on large video databases, we are able to compete with some existing video denoising methods.
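
    The stacking described above, in which each layer receives the hidden state of the previous layer as input, can be sketched in a few lines. This is a minimal forward pass using plain tanh (Elman-style) layers with random weights; the layer sizes, initialisation, and function names are assumptions for illustration, and the sketch does not reproduce the authors' patch-based model or training procedure.

```python
import math
import random


def make_layer(n_in, n_hidden, rng):
    """Randomly initialised weights for one vanilla RNN layer."""
    w_in = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
            for _ in range(n_hidden)]
    w_rec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
             for _ in range(n_hidden)]
    return {"w_in": w_in, "w_rec": w_rec, "bias": [0.0] * n_hidden}


def matvec(matrix, vector):
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(layer, x, h_prev):
    """One time step: h_t = tanh(W_in x_t + W_rec h_{t-1} + b)."""
    a = matvec(layer["w_in"], x)
    r = matvec(layer["w_rec"], h_prev)
    return [math.tanh(ai + ri + bi)
            for ai, ri, bi in zip(a, r, layer["bias"])]


def deep_rnn_forward(layers, inputs):
    """Run a stack of RNN layers over a sequence of input vectors.

    At each time step, layer k takes the current hidden state of layer
    k-1 as its input (layer 0 takes the raw input vector). Returns the
    top layer's hidden state at every time step.
    """
    states = [[0.0] * len(layer["bias"]) for layer in layers]
    outputs = []
    for x in inputs:
        layer_input = x
        for k, layer in enumerate(layers):
            states[k] = rnn_step(layer, layer_input, states[k])
            layer_input = states[k]
        outputs.append(states[-1])
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = [4, 8, 8, 3]  # hypothetical: 4 inputs, two hidden layers, 3 outputs
    layers = [make_layer(sizes[i], sizes[i + 1], rng)
              for i in range(len(sizes) - 1)]
    frames = [[rng.random() for _ in range(4)] for _ in range(5)]
    out = deep_rnn_forward(layers, frames)
    print(len(out), len(out[0]))
```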

  9. Selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile, and accurate RNA structure analysis

    PubMed Central

    Smola, Matthew J.; Rice, Greggory M.; Busan, Steven; Siegfried, Nathan A.; Weeks, Kevin M.

    2016-01-01

    SHAPE chemistries exploit small electrophilic reagents that react with the 2′-hydroxyl group to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues based on the ability of reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as for simple model RNAs. This protocol describes the experimental steps, implemented over three days, required to perform SHAPE probing and construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. These steps include RNA folding and SHAPE structure probing, mutational profiling by reverse transcription, library construction, and sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots, and provides useful troubleshooting information, often within an hour. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures, and visualize probable and alternative helices, often in under a day. We illustrate these algorithms with the E. coli thiamine pyrophosphate riboswitch, E. coli 16S rRNA, and HIV-1 genomic RNAs. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles, and entire transcriptomes. The straightforward MaP strategy greatly expands the number, length, and complexity of analyzable RNA structures. PMID:26426499

  10. A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes.

    PubMed

    Bazak, Lily; Haviv, Ami; Barak, Michal; Jacob-Hirsch, Jasmine; Deng, Patricia; Zhang, Rui; Isaacs, Farren J; Rechavi, Gideon; Li, Jin Billy; Eisenberg, Eli; Levanon, Erez Y

    2014-03-01

    RNA molecules transmit the information encoded in the genome and generally reflect its content. Adenosine-to-inosine (A-to-I) RNA editing by ADAR proteins converts a genomically encoded adenosine into inosine. It is known that most RNA editing in human takes place in the primate-specific Alu sequences, but the extent of this phenomenon and its effect on transcriptome diversity are not yet clear. Here, we analyzed large-scale RNA-seq data and detected ∼1.6 million editing sites. As detection sensitivity increases with sequencing coverage, we performed ultradeep sequencing of selected Alu sequences and showed that the scope of editing is much larger than anticipated. We found that virtually all adenosines within Alu repeats that form double-stranded RNA undergo A-to-I editing, although most sites exhibit editing at only low levels (<1%). Moreover, using high coverage sequencing, we observed editing of transcripts resulting from residual antisense expression, doubling the number of edited sites in the human genome. Based on bioinformatic analyses and deep targeted sequencing, we estimate that there are over 100 million human Alu RNA editing sites, located in the majority of human genes. These findings set the stage for exploring how this primate-specific massive diversification of the transcriptome is utilized.

  11. An Outbreak of Acute Hepatitis Caused by Genotype IB Hepatitis A Viruses Contaminating the Water Supply in Thailand.

    PubMed

    Ruchusatsawat, Kriangsak; Wongpiyabovorn, Jongkonnee; Kawidam, Chonthicha; Thiemsing, Laddawan; Sangkitporn, Somchai; Yoshizaki, Sayaka; Tatsumi, Masashi; Takeda, Naokazu; Ishii, Koji

    2016-01-01

    In 2000, an outbreak of acute hepatitis A was reported in a province adjacent to Bangkok, Thailand. This study aimed to investigate the cause of the 2000 hepatitis A outbreaks in Thailand using molecular epidemiological analysis. Serum and stool specimens were collected from patients who were clinically diagnosed with acute viral hepatitis. Water samples from drinking water and deep-drilled wells were also collected. These specimens were subjected to polymerase chain reaction (PCR) amplification and sequencing of the VP1/2A region of the hepatitis A virus (HAV) genome. The entire genome sequence of one of the fecal specimens was determined and phylogenetically analyzed together with known HAV sequences. Eleven of 24 fecal specimens collected from acute viral hepatitis patients were positive as determined by semi-nested reverse transcription PCR targeting the VP1/2A region of HAV. These samples shared an identical genotype IB nucleotide sequence, suggesting that the same causative agent was present. The complete nucleotide sequence derived from one of the samples indicated that the Thai genotype IB strain should be classified in a unique phylogenetic cluster. The analysis using an adjusted odds ratio showed that the consumption of groundwater was the most likely risk factor associated with the disease. © 2017 S. Karger AG, Basel.

  12. deepTools2: a next generation web server for deep-sequencing data analysis.

    PubMed

    Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas

    2016-07-08

    We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continues to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Deep ART Neural Model for Biologically Inspired Episodic Memory and Its Application to Task Performance of Robots.

    PubMed

    Park, Gyeong-Moon; Yoo, Yong-Ho; Kim, Deok-Hwa; Kim, Jong-Hwan

    2018-06-01

    Robots are expected to perform smart services and to undertake various troublesome or difficult tasks in the place of humans. Since these human-scale tasks consist of a temporal sequence of events, robots need episodic memory to store and retrieve the sequences to perform the tasks autonomously in similar situations. As episodic memory, in this paper we propose a novel Deep adaptive resonance theory (ART) neural model and apply it to the task performance of the humanoid robot, Mybot, developed in the Robot Intelligence Technology Laboratory at KAIST. Deep ART has a deep structure to learn events, episodes, and even more like daily episodes. Moreover, it can retrieve the correct episode from partial input cues robustly. To demonstrate the effectiveness and applicability of the proposed Deep ART, experiments are conducted with the humanoid robot, Mybot, for performing the three tasks of arranging toys, making cereal, and disposing of garbage.

  14. Site-directed mutagenesis in Petunia × hybrida protoplast system using direct delivery of purified recombinant Cas9 ribonucleoproteins.

    PubMed

    Subburaj, Saminathan; Chung, Sung Jin; Lee, Choongil; Ryu, Seuk-Min; Kim, Duk Hyoung; Kim, Jin-Soo; Bae, Sangsu; Lee, Geung-Joo

    2016-07-01

    Site-directed mutagenesis of nitrate reductase genes using direct delivery of purified Cas9 protein preassembled with guide RNA produces mutations efficiently in Petunia × hybrida protoplast system. The clustered, regularly interspaced, short palindromic repeat (CRISPR)-CRISPR associated endonuclease 9 (CRISPR/Cas9) system has been recently announced as a powerful molecular breeding tool for site-directed mutagenesis in higher plants. Here, we report a site-directed mutagenesis method targeting Petunia nitrate reductase (NR) gene locus. This method could create mutations efficiently using direct delivery of purified Cas9 protein and single guide RNA (sgRNA) into protoplast cells. After transient introduction of RNA-guided endonuclease (RGEN) ribonucleoproteins (RNPs) with different sgRNAs targeting NR genes, mutagenesis at the targeted loci was detected by T7E1 assay and confirmed by targeted deep sequencing. T7E1 assay showed that RGEN RNPs induced site-specific mutations at frequencies ranging from 2.4 to 21 % at four different sites (NR1, 2, 4 and 6) in the PhNR gene locus with average mutation efficiency of 14.9 ± 2.2 %. Targeted deep DNA sequencing revealed mutation rates of 5.3-17.8 % with average mutation rate of 11.5 ± 2 % at the same NR gene target sites in DNA fragments of analyzed protoplast transfectants. Further analysis from targeted deep sequencing showed that the average ratio of deletion to insertion produced collectively by the four NR-RGEN target sites (NR1, 2, 4, and 6) was about 63:37. Our results demonstrated that direct delivery of RGEN RNPs into protoplast cells of Petunia can be exploited as an efficient tool for site-directed mutagenesis of genes or genome editing in plant systems.

  15. Optimization of whole-transcriptome amplification from low cell density deep-sea microbial samples for metatranscriptomic analysis.

    PubMed

    Wu, Jieying; Gao, Weimin; Zhang, Weiwen; Meldrum, Deirdre R

    2011-01-01

    Limitation in sample quality and quantity is one of the big obstacles for applying metatranscriptomic technologies to explore gene expression and functionality of microbial communities in natural environments. In this study, several amplification methods were evaluated for whole-transcriptome amplification of deep-sea microbial samples, which are of low cell density and high impurity. The best amplification method was identified and incorporated into a complete protocol to isolate and amplify deep-sea microbial samples. In the protocol, total RNA was first isolated by a modified method combining the Trizol (Invitrogen, CA) and RNeasy (QIAGEN, CA) methods, amplified with a WT-Ovation™ Pico RNA Amplification System (NuGEN, CA), and then converted to double-strand DNA from single-strand cDNA with a WT-Ovation™ Exon Module (NuGEN, CA). The products from the whole-transcriptome amplification of deep-sea microbial samples were assessed first through random clone library sequencing. The BLAST search results showed that marine-based sequences are dominant in the libraries, consistent with the ecological source of the samples. The products were then used for next-generation Roche GS FLX Titanium sequencing to obtain metatranscriptome data. Preliminary analysis of the metatranscriptomic data showed good sequencing quality. Although the protocol was designed and demonstrated to be effective for deep-sea microbial samples, it should be applicable to similar samples from other extreme environments in exploring community structure and functionality of microbial communities. Copyright © 2010 Elsevier B.V. All rights reserved.

  16. Large-scale identification and comparative analysis of miRNA expression profile in the respiratory tree of the sea cucumber Apostichopus japonicus during aestivation.

    PubMed

    Chen, Muyan; Storey, Kenneth B

    2014-02-01

    The sea cucumber Apostichopus japonicus withstands high water temperatures in the summer by suppressing its metabolic rate and entering a state of aestivation. We hypothesized that changes in the expression of miRNAs could provide important post-transcriptional regulation of gene expression during hypometabolism via control over mRNA translation. The present study analyzed profiles of miRNA expression in the sea cucumber respiratory tree using Solexa deep sequencing technology. We identified 279 sea cucumber miRNAs, including 15 novel miRNAs specific to sea cucumber. Animals sampled during deep aestivation (DA; after at least 15 days of continuous torpor) were compared with animals from a non-aestivation (NA) state (animals that had passed through aestivation and returned to an active state). We identified 30 differentially expressed miRNAs ([RPM (reads per million) >10, |FC| (|fold change|)≥1, FDR (false discovery rate)<0.01]) during aestivation, which were validated by two other miRNA profiling methods: miRNA microarray and real-time PCR. Among the most prominent miRNA species, miR-124, miR-124-3p, miR-79, miR-9 and miR-2010 were significantly over-expressed during deep aestivation compared with non-aestivation animals, suggesting that these miRNAs may play important roles in metabolic rate suppression during aestivation. High-throughput sequencing data and microarray data have been submitted to the GEO database with accession number: 16902695. Copyright © 2014 Elsevier B.V. All rights reserved.
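
    The three-part filter quoted above (RPM > 10, |FC| ≥ 1, FDR < 0.01) can be sketched as follows. Several details are assumptions, because the abstract does not spell them out: FC is treated here as a log2 fold change of RPM values with a pseudocount of 1, the RPM threshold is applied to the larger of the two conditions, and FDR values are taken as already computed by some upstream test. All function and parameter names are hypothetical.

```python
import math


def rpm(count, total_reads):
    """Reads per million: raw count scaled by library size."""
    return count * 1_000_000 / total_reads


def log2_fold_change(rpm_a, rpm_b, pseudocount=1.0):
    """log2 ratio of two RPM values; the pseudocount avoids division by zero."""
    return math.log2((rpm_a + pseudocount) / (rpm_b + pseudocount))


def passes_filter(rpm_da, rpm_na, fdr,
                  min_rpm=10.0, min_abs_fc=1.0, max_fdr=0.01):
    """Apply the RPM, |FC| and FDR thresholds to one miRNA.

    `rpm_da` and `rpm_na` are RPM values in the deep-aestivation and
    non-aestivation samples; `fdr` is a precomputed false discovery rate.
    """
    fc = log2_fold_change(rpm_da, rpm_na)
    return (max(rpm_da, rpm_na) > min_rpm
            and abs(fc) >= min_abs_fc
            and fdr < max_fdr)


if __name__ == "__main__":
    # Hypothetical values for a single miRNA.
    print(passes_filter(rpm_da=100.0, rpm_na=20.0, fdr=0.001))
```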

  17. The complete mitochondrial genome of the deep-sea sponge Poecillastra laminaris (Astrophorida, Vulcanellidae).

    PubMed

    Zeng, Cong; Thomas, Leighton J; Kelly, Michelle; Gardner, Jonathan P A

    2016-05-01

    The complete mitochondrial genome of a New Zealand specimen of the deep-sea sponge Poecillastra laminaris (Sollas, 1886) (Astrophorida, Vulcanellidae), from the Colville Ridge, New Zealand, was sequenced using the 454 Life Sciences pyrosequencing system. To identify homologous mitochondrial sequences, the 454 reads were mapped to the complete mitochondrial genome sequence of Geodia neptuni (GenBank No. NC_006990). The P. laminaris genome is 18,413 bp in length and includes 14 protein-coding genes, 24 transfer RNA genes and 2 ribosomal RNA genes. Gene order resembled that of other demosponges. The base composition of the genome is A (29.1%), T (35.2%), C (14.0%) and G (21.7%). This is the second published mitogenome for a sponge of the order Astrophorida and will be useful in future phylogenetic analysis of deep-sea sponges.
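
    A base composition of the kind reported above can be computed from a sequence string as in the sketch below. The function name is hypothetical, and the choice to ignore ambiguous symbols such as N is an assumption; the toy sequence in the example is not taken from the P. laminaris genome.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, T, C and G in a nucleotide sequence.

    Characters other than A, C, G and T (for example N) are ignored,
    so the four percentages always sum to 100.
    """
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ATCG"}


if __name__ == "__main__":
    comp = base_composition("AATTCGNNaatt")  # toy sequence
    for base, pct in comp.items():
        print(f"{base}: {pct:.1f}%")
```

    For the percentages reported in the abstract, A + T comes to 64.3% and C + G to 35.7%.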

  18. MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction.

    PubMed

    Wang, Duolin; Zeng, Shuai; Xu, Chunhui; Qiu, Wangren; Liang, Yanchun; Joshi, Trupti; Xu, Dong

    2017-12-15

    Computational methods for phosphorylation site prediction play important roles in protein function studies and experimental design. Most existing methods are based on feature extraction, which may result in incomplete or biased features. Deep learning as the cutting-edge machine learning method has the ability to automatically discover complex representations of phosphorylation patterns from the raw sequences, and hence it provides a powerful tool for improvement of phosphorylation site prediction. We present MusiteDeep, the first deep-learning framework for predicting general and kinase-specific phosphorylation sites. MusiteDeep takes raw sequence data as input and uses convolutional neural networks with a novel two-dimensional attention mechanism. It achieves over a 50% relative improvement in the area under the precision-recall curve in general phosphorylation site prediction and obtains competitive results in kinase-specific prediction compared to other well-known tools on the benchmark data. MusiteDeep is provided as an open-source tool available at https://github.com/duolinwang/MusiteDeep. Contact: xudong@missouri.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.

  19. Fungal diversity in deep-sea sediments associated with asphalt seeps at the Sao Paulo Plateau

    NASA Astrophysics Data System (ADS)

    Nagano, Yuriko; Miura, Toshiko; Nishi, Shinro; Lima, Andre O.; Nakayama, Cristina; Pellizari, Vivian H.; Fujikura, Katsunori

    2017-12-01

    We investigated the fungal diversity in a total of 20 deep-sea sediment samples (of which 14 samples were associated with natural asphalt seeps and 6 samples were not) collected from two different sites at the Sao Paulo Plateau off Brazil, using Ion Torrent PGM sequencing targeting the ITS region of ribosomal RNA. Our results suggest that diverse fungi (113 operational taxonomic units (OTUs) based on clustering at 97% sequence similarity, assigned to 9 classes and 31 genera) are present in deep-sea sediment samples collected at the Sao Paulo Plateau, dominated by Ascomycota (74.3%), followed by Basidiomycota (11.5%), unidentified fungi (7.1%), and sequences with no affiliation to any organisms in the public database (7.1%). However, only three species, namely Penicillium sp., Cadophora malorum and Rhodosporidium diobovatum, were dominant, with the majority of OTUs remaining a minor community. Unexpectedly, there was no significant difference in major fungal community structure between the asphalt seep and non-asphalt seep sites, despite the presence of mass hydrocarbon deposits and the high abundance of macro organisms surrounding the asphalt seeps. However, there were some differences in the minor fungal communities, with possible asphalt-degrading fungi present specifically in the asphalt seep sites. In contrast, some differences were found between the two sampling sites. Classification of OTUs revealed that only 47 (41.6%) fungal OTUs exhibited >97% sequence similarity to pre-existing ITS sequences in public databases, indicating that a majority of deep-sea inhabiting fungal taxa remain undescribed. Although our knowledge of fungi and their role in deep-sea environments is still limited, this study increases our understanding of fungal diversity and community structure in deep-sea environments.

  20. ChimerDB 3.0: an enhanced database for fusion genes from cancer transcriptome and literature data mining.

    PubMed

    Lee, Myunggyo; Lee, Kyubum; Yu, Namhee; Jang, Insu; Choi, Ikjung; Kim, Pora; Jang, Ye Eun; Kim, Byounggun; Kim, Sunkyu; Lee, Byungwook; Kang, Jaewoo; Lee, Sanghyuk

    2017-01-04

    Fusion genes are an important class of therapeutic targets and prognostic markers in cancer. ChimerDB is a comprehensive database of fusion genes encompassing analysis of deep sequencing data and manual curation. In this update, the database coverage was enhanced considerably by adding two new modules: The Cancer Genome Atlas (TCGA) RNA-Seq analysis and PubMed abstract mining. ChimerDB 3.0 is composed of three modules: ChimerKB, ChimerPub and ChimerSeq. ChimerKB is a knowledgebase of 1066 manually curated fusion genes compiled from public resources of fusion genes with experimental evidence. ChimerPub includes 2767 fusion genes obtained from text mining of PubMed abstracts. The ChimerSeq module is designed to archive the fusion candidates from deep sequencing data. Importantly, we have analyzed RNA-Seq data of the TCGA project covering 4569 patients in 23 cancer types using two reliable programs, FusionScan and TopHat-Fusion. The new user interface supports diverse search options and graphic representation of fusion gene structure. ChimerDB 3.0 is available at http://ercsb.ewha.ac.kr/fusiongene/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. LookSeq: a browser-based viewer for deep sequencing data.

    PubMed

    Manske, Heinrich Magnus; Kwiatkowski, Dominic P

    2009-11-01

    Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.

  2. Identification and profiling of novel microRNAs in the Brassica rapa genome based on small RNA deep sequencing

    PubMed Central

    2012-01-01

    Background: MicroRNAs (miRNAs) are one of the functional non-coding small RNAs involved in the epigenetic control of the plant genome. Although plants contain both evolutionary conserved miRNAs and species-specific miRNAs within their genomes, computational methods often only identify evolutionary conserved miRNAs. The recent sequencing of the Brassica rapa genome enables us to identify miRNAs and their putative target genes. In this study, we sought to provide a more comprehensive prediction of B. rapa miRNAs based on high throughput small RNA deep sequencing. Results: We sequenced small RNAs from five types of tissue: seedlings, roots, petioles, leaves, and flowers. By analyzing 2.75 million unique reads that mapped to the B. rapa genome, we identified 216 novel and 196 conserved miRNAs that were predicted to target approximately 20% of the genome’s protein coding genes. Quantitative analysis of miRNAs from the five types of tissue revealed that novel miRNAs were expressed in diverse tissues but their expression levels were lower than those of the conserved miRNAs. Comparative analysis of the miRNAs between the B. rapa and Arabidopsis thaliana genomes demonstrated that redundant copies of conserved miRNAs in the B. rapa genome may have been deleted after whole genome triplication. Novel miRNA members seemed to have spontaneously arisen from the B. rapa and A. thaliana genomes, suggesting the species-specific expansion of miRNAs. We have made this data publicly available in a miRNA database of B. rapa called BraMRs. The database allows the user to retrieve miRNA sequences, their expression profiles, and a description of their target genes from the five tissue types investigated here. Conclusions: This is the first report to identify novel miRNAs from Brassica crops using genome-wide high throughput techniques. The combination of computational methods and small RNA deep sequencing provides robust predictions of miRNAs in the genome. The finding of numerous novel miRNAs, many with few target genes and low expression levels, suggests the rapid evolution of miRNA genes. The development of a miRNA database, BraMRs, enables us to integrate miRNA identification, target prediction, and functional annotation of target genes. BraMRs will represent a valuable public resource with which to study the epigenetic control of B. rapa and other closely related Brassica species. The database is available at the following link: http://bramrs.rna.kr. PMID:23163954

  3. Unveiling the Biodiversity of Deep-Sea Nematodes through Metabarcoding: Are We Ready to Bypass the Classical Taxonomy?

    PubMed Central

    2015-01-01

    Nematodes inhabiting benthic deep-sea ecosystems account for >90% of the total metazoan abundances and they have been hypothesised to be hyper-diverse, but their biodiversity is still largely unknown. Metabarcoding could facilitate the census of biodiversity, especially for those tiny metazoans for which morphological identification is difficult. We compared, for the first time, different DNA extraction procedures based on the use of two commercial kits and a previously published laboratory protocol and tested their suitability for sequencing analyses of 18S rDNA of marine nematodes. We also investigated the reliability of Roche 454 sequencing analyses for assessing the biodiversity of deep-sea nematode assemblages previously morphologically identified. Finally, intra-genomic variation in 18S rRNA gene repeats was investigated by Illumina MiSeq in different deep-sea nematode morphospecies to assess the influence of polymorphisms on nematode biodiversity estimates. Our results indicate that the two commercial kits should be preferred for the molecular analysis of biodiversity of deep-sea nematodes since they consistently provide amplifiable DNA suitable for sequencing. We report that the morphological identification of deep-sea nematodes matches the results obtained by metabarcoding analysis only at the order-family level and that a large portion of Operational Clustered Taxonomic Units (OCTUs) was not assigned. We also show that independently from the cut-off criteria and bioinformatic pipelines used, the number of OCTUs largely exceeds the number of individuals and that 18S rRNA gene of different morpho-species of nematodes displayed intra-genomic polymorphisms. Our results indicate that metabarcoding is an important tool to explore the diversity of deep-sea nematodes, but still fails in identifying most of the species due to limited number of sequences deposited in the public databases, and in providing quantitative data on the species encountered. These aspects should be carefully taken into account before using metabarcoding in quantitative ecological research and monitoring programmes of marine biodiversity. PMID:26701112

  4. Unveiling the Biodiversity of Deep-Sea Nematodes through Metabarcoding: Are We Ready to Bypass the Classical Taxonomy?

    PubMed

    Dell'Anno, Antonio; Carugati, Laura; Corinaldesi, Cinzia; Riccioni, Giulia; Danovaro, Roberto

    2015-01-01

    Nematodes inhabiting benthic deep-sea ecosystems account for >90% of the total metazoan abundances and they have been hypothesised to be hyper-diverse, but their biodiversity is still largely unknown. Metabarcoding could facilitate the census of biodiversity, especially for those tiny metazoans for which morphological identification is difficult. We compared, for the first time, different DNA extraction procedures based on the use of two commercial kits and a previously published laboratory protocol and tested their suitability for sequencing analyses of 18S rDNA of marine nematodes. We also investigated the reliability of Roche 454 sequencing analyses for assessing the biodiversity of deep-sea nematode assemblages previously morphologically identified. Finally, intra-genomic variation in 18S rRNA gene repeats was investigated by Illumina MiSeq in different deep-sea nematode morphospecies to assess the influence of polymorphisms on nematode biodiversity estimates. Our results indicate that the two commercial kits should be preferred for the molecular analysis of biodiversity of deep-sea nematodes since they consistently provide amplifiable DNA suitable for sequencing. We report that the morphological identification of deep-sea nematodes matches the results obtained by metabarcoding analysis only at the order-family level and that a large portion of Operational Clustered Taxonomic Units (OCTUs) was not assigned. We also show that independently from the cut-off criteria and bioinformatic pipelines used, the number of OCTUs largely exceeds the number of individuals and that 18S rRNA gene of different morpho-species of nematodes displayed intra-genomic polymorphisms. Our results indicate that metabarcoding is an important tool to explore the diversity of deep-sea nematodes, but still fails in identifying most of the species due to limited number of sequences deposited in the public databases, and in providing quantitative data on the species encountered. These aspects should be carefully taken into account before using metabarcoding in quantitative ecological research and monitoring programmes of marine biodiversity.

  5. Molecular Phylogenetic Analysis of Archaeal Intron-Containing Genes Coding for rRNA Obtained from a Deep-Subsurface Geothermal Water Pool

    PubMed Central

    Takai, Ken; Horikoshi, Koki

    1999-01-01

    Molecular phylogenetic analysis of a naturally occurring microbial community in a deep-subsurface geothermal environment indicated that the phylogenetic diversity of the microbial population in the environment was extremely limited and that only hyperthermophilic archaeal members closely related to Pyrobaculum were present. All archaeal ribosomal DNA sequences contained intron-like sequences, some of which had open reading frames with repeated homing-endonuclease motifs. The sequence similarity analysis and the phylogenetic analysis of these homing endonucleases suggested the possible phylogenetic relationship among archaeal rRNA-encoded homing endonucleases. PMID:10584021

  6. PhyloPythiaS+: a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes.

    PubMed

    Gregor, Ivan; Dröge, Johannes; Schirmer, Melanie; Quince, Christopher; McHardy, Alice C

    2016-01-01

    Background. Metagenomics is an approach for characterizing environmental microbial communities in situ; it allows their functional and taxonomic characterization and the recovery of sequences from uncultured taxa. This is often achieved by a combination of sequence assembly and binning, where sequences are grouped into 'bins' representing taxa of the underlying microbial community. Assignment to low-ranking taxonomic bins is an important challenge for binning methods, as is scalability to Gb-sized datasets generated with deep sequencing techniques. One of the best available methods for species bins recovery from deep-branching phyla is the expert-trained PhyloPythiaS package, where a human expert decides on the taxa to incorporate in the model and identifies 'training' sequences based on marker genes directly from the sample. Due to the manual effort involved, this approach does not scale to multiple metagenome samples and requires substantial expertise, which researchers who are new to the area do not have. Results. We have developed PhyloPythiaS+, a successor to our PhyloPythia(S) software. The new (+) component performs the work previously done by the human expert. PhyloPythiaS+ also includes a new k-mer counting algorithm, which accelerated the simultaneous counting of 4-6-mers used for taxonomic binning 100-fold and reduced the overall execution time of the software by a factor of three. Our software allows Gb-sized metagenomes to be analyzed with inexpensive hardware, and species- or genus-level bins to be recovered with low error rates in a fully automated fashion. PhyloPythiaS+ was compared to MEGAN, taxator-tk, Kraken and the generic PhyloPythiaS model. The results showed that PhyloPythiaS+ performs especially well for samples originating from novel environments in comparison to the other methods. Availability. PhyloPythiaS+ in a virtual machine is available for installation under Windows, Unix systems or OS X on: https://github.com/algbioi/ppsp/wiki.
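
    To make "simultaneous counting of 4-6-mers" concrete, the sketch below counts all k-mers of length 4, 5 and 6 in one pass over a sequence. It is a naive illustration only and is not the optimized algorithm shipped with PhyloPythiaS+; the function name `count_kmers`, its parameters, and the choice to skip k-mers containing ambiguous bases are assumptions made for this example.

```python
from collections import Counter


def count_kmers(seq, k_min=4, k_max=6):
    """Count every k-mer with k_min <= k <= k_max in one left-to-right pass.

    Returns a dict mapping each k to a Counter of k-mer occurrences.
    k-mers containing characters other than A/C/G/T are skipped.
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = {k: Counter() for k in range(k_min, k_max + 1)}
    for start in range(len(seq) - k_min + 1):
        # Take the longest window once, then reuse its prefixes for shorter k.
        window = seq[start:start + k_max]
        for k in range(k_min, min(k_max, len(window)) + 1):
            kmer = window[:k]
            if set(kmer) <= valid:
                counts[k][kmer] += 1
    return counts


kmer_counts = count_kmers("ACGTACGT")
```

    In this toy sequence of 8 bases there are five 4-mers, four 5-mers and three 6-mers, and the 4-mer ACGT occurs twice. The resulting count vectors are the kind of compositional feature that binning methods operate on.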

  7. Deep sequencing and genome-wide analysis reveals the expansion of MicroRNA genes in the gall midge Mayetiola destructor

    PubMed Central

    2013-01-01

    Background: MicroRNAs (miRNAs) are small non-coding RNAs that play critical roles in regulating post-transcriptional gene expression. Gall midges encompass a large group of insects that are of economic importance and also possess fascinating biological traits. The gall midge Mayetiola destructor, commonly known as the Hessian fly, is a destructive pest of wheat and a model organism for studying gall midge biology and insect – host plant interactions. Results: In this study, we systematically analyzed miRNAs from the Hessian fly. Deep-sequencing a Hessian fly larval transcriptome led to the identification of 89 miRNA species that are either identical or very similar to known miRNAs from other insects, and 184 novel miRNAs that have not been reported from other species. A genome-wide search through a draft Hessian fly genome sequence identified a total of 611 putative miRNA-encoding genes based on sequence similarity and the existence of a stem-loop structure for miRNA precursors. Analysis of the 611 putative genes revealed a striking feature: the dramatic expansion of several miRNA gene families. The largest family contained 91 genes that encoded 20 different miRNAs. Microarray analyses revealed that the expression of miRNA genes was strictly regulated during Hessian fly larval development and that the abundance of many miRNAs was affected by host genotypes. Conclusion: The identification of a large number of miRNAs for the first time from a gall midge provides a foundation for further studies of miRNA functions in gall midge biology and behavior. The dramatic expansion of identical or similar miRNAs provides a unique system to study functional relations among miRNA iso-genes as well as changes in sequence specificity due to small changes in miRNAs and in their mRNA targets. These results may also facilitate the identification of miRNA genes for potential pest control through transgenic approaches. PMID:23496979

  8. A matter of phylogenetic scale: Distinguishing incomplete lineage sorting from lateral gene transfer as the cause of gene tree discord in recent versus deep diversification histories.

    PubMed

    Knowles, L Lacey; Huang, Huateng; Sukumaran, Jeet; Smith, Stephen A

    2018-03-01

    Discordant gene trees are commonly encountered when sequences from thousands of loci are applied to estimate phylogenetic relationships. Several processes contribute to this discord. Yet, we have no methods that jointly model different sources of conflict when estimating phylogenies. An alternative to analyzing entire genomes or all the sequenced loci is to identify a subset of loci for phylogenetic analysis. If we can identify data partitions that are most likely to reflect descent from a common ancestor (i.e., discordant loci that indeed reflect incomplete lineage sorting [ILS], as opposed to some other process, such as lateral gene transfer [LGT]), we can analyze this subset using powerful coalescent-based species-tree approaches. Test data sets were simulated where discord among loci could arise from ILS and LGT. Data sets were analyzed using the newly developed program CLASSIPHY (Huang et al., ) to assess whether our ability to distinguish the cause of discord among loci varied when ILS and LGT occurred in the recent versus deep past and whether the accuracy of these inferences was affected by the mutational process. We show that the accuracy of probabilistic classification of individual loci by the cause of discord differed when ILS and LGT events occurred more recently compared with the distant past and that the signal-to-noise ratio arising from the mutational process contributes to difficulties in inferring LGT data partitions. We discuss our findings in terms of the promise and limitations of identifying subsets of loci for species-tree inference that will not violate the underlying coalescent model (i.e., data partitions in which ILS, and not LGT, contributes to discord). We also discuss the empirical implications of our work given the many recalcitrant nodes in the tree of life (e.g., origins of angiosperms, amniotes, or Neoaves), and recent arguments for concatenating loci. © 2018 Botanical Society of America.

  9. Genomic variation in macrophage-cultured European porcine reproductive and respiratory syndrome virus Olot/91 revealed using ultra-deep next generation sequencing.

    PubMed

    Lu, Zen H; Brown, Alexander; Wilson, Alison D; Calvert, Jay G; Balasch, Monica; Fuentes-Utrilla, Pablo; Loecherbach, Julia; Turner, Frances; Talbot, Richard; Archibald, Alan L; Ait-Ali, Tahar

    2014-03-04

    Porcine Reproductive and Respiratory Syndrome (PRRS) is a disease of major economic impact worldwide. The etiologic agent of this disease is the PRRS virus (PRRSV). Increasing evidence suggests that microevolution within a coexisting quasispecies population can give rise to high sequence heterogeneity in PRRSV. We developed a pipeline based on the ultra-deep next generation sequencing approach to first construct the complete genome of a European PRRSV, strain Olot/91, cultured on macrophages and then capture the rare variants representative of the mixed quasispecies population. Olot/91 differs from the reference Lelystad strain by about 5%, and a total of 88 variants, with frequencies as low as 1%, were detected in the mixed population. These variants included 16 non-synonymous variants concentrated in the genes encoding structural and nonstructural proteins, including Glycoprotein 2a and 5. Using an ultra-deep sequencing methodology, the complete genome of Olot/91 was constructed without any prior knowledge of the sequence. Rare variants that constitute minor fractions of the heterogeneous PRRSV population could successfully be detected to allow further exploration of microevolutionary events.
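
    The idea of detecting variants "with frequencies as low as 1%" can be sketched as a per-position frequency filter over read counts. This is a simplified illustration, not the authors' pipeline: the function name `minor_variants`, the `min_depth` cut-off and the example counts are hypothetical, and real callers also model sequencing error, strand bias and base quality.

```python
def minor_variants(base_counts, reference_base, min_freq=0.01, min_depth=1000):
    """Return non-reference bases at one position whose frequency >= min_freq.

    base_counts: dict mapping base -> number of reads supporting it.
    min_depth is an assumed guard: at low depth a 1% variant cannot be
    distinguished from noise, so such positions are skipped.
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return {}
    return {
        base: count / depth
        for base, count in base_counts.items()
        if base != reference_base and count / depth >= min_freq
    }


# Hypothetical pileup at one genome position (10,000 reads in total).
site = {"A": 9850, "G": 120, "T": 30}
variants = minor_variants(site, reference_base="A")
```

    Here G is supported by 120 of 10,000 reads (1.2%) and passes the threshold, while T (0.3%) does not. The depth requirement shows why ultra-deep coverage is needed before frequencies near 1% become measurable at all.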

  10. PolyA_DB 3 catalogs cleavage and polyadenylation sites identified by deep sequencing in multiple genomes

    PubMed Central

    Wang, Ruijia; Nambiar, Ram; Zheng, Dinghai

    2018-01-01

    PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which were limited in number and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. First, PASs are mapped by the 3′ region extraction and deep sequencing (3′READS) method, ensuring unequivocal PAS identification. Second, a large volume of data based on diverse biological samples increases PAS coverage by 3.5-fold over the EST-based version and provides PAS usage information. Third, strand-specific RNA-seq data are used to extend annotated 3′ ends of genes to obtain more thorough annotations of alternative polyadenylation (APA) sites. Fourth, conservation information of PASs across mammals sheds light on the significance of APA sites. The database (URL: http://www.polya-db.org/v3) currently holds PASs in human, mouse, rat and chicken, and has links to the UCSC genome browser for further visualization and for integration with other genomic data. PMID:29069441

  11. Deep whole-genome sequencing of 90 Han Chinese genomes.

    PubMed

    Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen

    2017-09-01

    Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (~80×) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects. © The Authors 2017. Published by Oxford University Press.
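
    The "low frequency" definition used above (minor allele frequency < 5%) can be written out as a short calculation. This is a minimal sketch assuming diploid genotypes at a biallelic site, so 90 individuals contribute 180 alleles; the function names and the example allele counts are hypothetical. The last lines only re-add the three novel-variant counts quoted in the abstract.

```python
def minor_allele_frequency(alt_count, total_alleles):
    """Frequency of the less common allele at a biallelic site."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    freq = alt_count / total_alleles
    return min(freq, 1.0 - freq)


def is_low_frequency(alt_count, total_alleles, threshold=0.05):
    """True if the site is polymorphic and its MAF is below the threshold."""
    maf = minor_allele_frequency(alt_count, total_alleles)
    return 0.0 < maf < threshold


# 90 diploid individuals -> 180 alleles (assumes no missing genotypes).
TOTAL_ALLELES = 2 * 90
low = is_low_frequency(8, TOTAL_ALLELES)       # 8/180 is about 4.4%
not_low = is_low_frequency(9, TOTAL_ALLELES)   # 9/180 is exactly 5%

# Consistency check of the novel-variant counts quoted in the abstract.
novel_total = 5_813_503 + 1_169_199 + 17_927
```

    With 180 alleles, an alternate allele seen 8 times falls below the 5% threshold while one seen 9 times does not, and the three novel-variant categories add up to the reported total of 7 000 629.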

  12. Optimization of conditions to sequence long cDNAs from viruses

    USDA-ARS's Scientific Manuscript database

    Fourth generation sequencing with the MinION nanopore sequencer provides an opportunity to obtain deep coverage and long reads for single molecules. This will benefit studies on RNA viruses. In the past, Sanger, Illumina, and Ion Torrent sequencing have been utilized to study RNA viruses. Both technique...

  13. SNP discovery through de novo deep sequencing using the next generation of DNA sequencers

    USDA-ARS's Scientific Manuscript database

    The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....

  14. Deep Sequencing Analysis of Apple Infecting Viruses in Korea

    PubMed Central

    Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun

    2016-01-01

    Deep sequencing generated 52 contigs derived from five viruses, Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV), which were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs ranged from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented among the 52 assembled contigs. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, and all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV, all of which were found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. Deep sequencing was shown to be a valuable and powerful tool for detection and identification of known and unknown viruses in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694

  15. Deep sequencing approaches for the analysis of prokaryotic transcriptional boundaries and dynamics.

    PubMed

    James, Katherine; Cockell, Simon J; Zenkin, Nikolay

    2017-05-01

    The identification of the protein-coding regions of a genome is straightforward due to the universality of start and stop codons. However, the boundaries of the transcribed regions, conditional operon structures, non-coding RNAs and the dynamics of transcription, such as pausing of elongation, are non-trivial to identify, even in the comparatively simple genomes of prokaryotes. Traditional methods for the study of these areas, such as tiling arrays, are noisy, labour-intensive and lack the resolution required for densely-packed bacterial genomes. Recently, deep sequencing has become increasingly popular for the study of the transcriptome due to its lower costs, higher accuracy and single nucleotide resolution. These methods have revolutionised our understanding of prokaryotic transcriptional dynamics. Here, we review the deep sequencing and data analysis techniques that are available for the study of transcription in prokaryotes, and discuss the bioinformatic considerations of these analyses. Copyright © 2017 Elsevier Inc. All rights reserved.

  16. Insertion sequences enrichment in extreme Red sea brine pool vent.

    PubMed

    Elbehery, Ali H A; Aziz, Ramy K; Siam, Rania

    2017-03-01

    Mobile genetic elements are major agents of genome diversification and evolution. Few studies have addressed their characteristics, including abundance, and their role in extreme habitats. The Red Sea brine pools are among the rare natural habitats exposed to multiple extreme conditions, including high temperature, salinity and concentration of heavy metals. We assessed the abundance and distribution of different mobile genetic elements in four Red Sea brine pools including the world's largest known multiple-extreme deep-sea environment, the Red Sea Atlantis II Deep. We report a gradient in the abundance of mobile genetic elements, dramatically increasing in the harshest environment of the pool. Additionally, we identified a strong association between the abundance of insertion sequences and extreme conditions, being highest in the harshest and deepest layer of the Red Sea Atlantis II Deep. Our comparative analyses of mobile genetic elements in secluded, extreme and relatively non-extreme environments suggest that insertion sequences predominantly contribute to polyextremophile genome plasticity.

  17. Vertical and horizontal genetic connectivity in Chromis verater, an endemic damselfish found on shallow and mesophotic reefs in the Hawaiian Archipelago and adjacent Johnston Atoll.

    PubMed

    Tenggardjaja, Kimberly A; Bowen, Brian W; Bernardi, Giacomo

    2014-01-01

    Understanding vertical and horizontal connectivity is a major priority in research on mesophotic coral ecosystems (30-150 m). However, horizontal connectivity has been the focus of few studies, and data on vertical connectivity are limited to sessile benthic mesophotic organisms. Here we present patterns of vertical and horizontal connectivity in the Hawaiian Islands-Johnston Atoll endemic threespot damselfish, Chromis verater, based on 319 shallow specimens and 153 deep specimens. The mtDNA markers cytochrome b and control region were sequenced to analyze genetic structure: 1) between shallow (< 30 m) and mesophotic (30-150 m) populations and 2) across the species' geographic range. Additionally, the nuclear markers rhodopsin and internal transcribed spacer 2 of ribosomal DNA were sequenced to assess connectivity between shallow and mesophotic populations. There was no significant genetic differentiation by depth, indicating high levels of vertical connectivity between shallow and deep aggregates of C. verater. Consequently, shallow and deep samples were combined by location for analyses of horizontal connectivity. We detected low but significant population structure across the Hawaiian Archipelago (overall cytochrome b: ΦST = 0.009, P = 0.020; control region: ΦST = 0.012, P = 0.009) and a larger break between the archipelago and Johnston Atoll (cytochrome b: ΦST = 0.068, P < 0.001; control region: ΦST = 0.116, P < 0.001). The population structure within the archipelago was driven by samples from the island of Hawaii at the southeast end of the chain and Lisianski in the middle of the archipelago. The lack of vertical genetic structure supports the refugia hypothesis that deep reefs may constitute a population reservoir for species depleted in shallow reef habitats. 
    These findings represent the first connectivity study on a mobile organism that spans shallow and mesophotic depths and provide a reference point for future connectivity studies on mesophotic fishes.

  18. Vertical and Horizontal Genetic Connectivity in Chromis verater, an Endemic Damselfish Found on Shallow and Mesophotic Reefs in the Hawaiian Archipelago and Adjacent Johnston Atoll

    PubMed Central

    Tenggardjaja, Kimberly A.; Bowen, Brian W.; Bernardi, Giacomo

    2014-01-01

    Understanding vertical and horizontal connectivity is a major priority in research on mesophotic coral ecosystems (30–150 m). However, horizontal connectivity has been the focus of few studies, and data on vertical connectivity are limited to sessile benthic mesophotic organisms. Here we present patterns of vertical and horizontal connectivity in the Hawaiian Islands-Johnston Atoll endemic threespot damselfish, Chromis verater, based on 319 shallow specimens and 153 deep specimens. The mtDNA markers cytochrome b and control region were sequenced to analyze genetic structure: 1) between shallow (<30 m) and mesophotic (30–150 m) populations and 2) across the species' geographic range. Additionally, the nuclear markers rhodopsin and internal transcribed spacer 2 of ribosomal DNA were sequenced to assess connectivity between shallow and mesophotic populations. There was no significant genetic differentiation by depth, indicating high levels of vertical connectivity between shallow and deep aggregates of C. verater. Consequently, shallow and deep samples were combined by location for analyses of horizontal connectivity. We detected low but significant population structure across the Hawaiian Archipelago (overall cytochrome b: ΦST = 0.009, P = 0.020; control region: ΦST = 0.012, P = 0.009) and a larger break between the archipelago and Johnston Atoll (cytochrome b: ΦST = 0.068, P<0.001; control region: ΦST = 0.116, P<0.001). The population structure within the archipelago was driven by samples from the island of Hawaii at the southeast end of the chain and Lisianski in the middle of the archipelago. The lack of vertical genetic structure supports the refugia hypothesis that deep reefs may constitute a population reservoir for species depleted in shallow reef habitats. These findings represent the first connectivity study on a mobile organism that spans shallow and mesophotic depths and provide a reference point for future connectivity studies on mesophotic fishes. 
    PMID:25517964

  19. Analysis of Variability in HIV-1 Subtype A Strains in Russia Suggests a Combination of Deep Sequencing and Multitarget RNA Interference for Silencing of the Virus.

    PubMed

    Kretova, Olga V; Chechetkin, Vladimir R; Fedoseeva, Daria M; Kravatsky, Yuri V; Sosin, Dmitri V; Alembekov, Ildar R; Gorbacheva, Maria A; Gashnikova, Natalya M; Tchurikov, Nickolai A

    2017-02-01

    Any method for silencing the activity of the HIV-1 retrovirus should tackle the extremely high variability of HIV-1 sequences and mutational escape. We studied sequence variability in the vicinity of selected RNA interference (RNAi) targets from isolates of HIV-1 subtype A in Russia, and we propose that using artificial RNAi is a potential alternative to traditional antiretroviral therapy. We prove that using multiple RNAi targets overcomes the variability in HIV-1 isolates. The optimal number of targets critically depends on the conservation of the target sequences. The total number of targets that are conserved with a probability of 0.7-0.8 should exceed at least 2. Combining deep sequencing and multitarget RNAi may provide an efficient approach to cure HIV/AIDS.
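
    The dependence of the number of targets on target conservation can be illustrated with a simple binomial calculation. This is only a sketch under a strong assumption that is not stated in the abstract: each target is treated as conserved independently with the same probability. The function name and parameters are hypothetical.

```python
from math import comb


def prob_at_least_k_conserved(n_targets: int, p_conserved: float, k: int = 1) -> float:
    """Probability that at least k of n_targets remain conserved in an isolate.

    Assumes (for illustration only) that targets are conserved independently,
    each with probability p_conserved.
    """
    return sum(
        comb(n_targets, i) * p_conserved**i * (1.0 - p_conserved) ** (n_targets - i)
        for i in range(k, n_targets + 1)
    )


if __name__ == "__main__":
    # How quickly does adding targets raise the chance that one still matches?
    for n in (1, 2, 3, 4):
        print(n, round(prob_at_least_k_conserved(n, 0.7), 4))
```

    Under this independence assumption, one target conserved with probability 0.7 leaves a 30% chance of escape, while three such targets leave under 3%.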

  20. The transcriptome of Bathymodiolus azoricus gill reveals expression of genes from endosymbionts and free-living deep-sea bacteria.

    PubMed

    Egas, Conceição; Pinheiro, Miguel; Gomes, Paula; Barroso, Cristina; Bettencourt, Raul

    2012-08-01

    Deep-sea environments are largely unexplored habitats where a surprising number of species may be found in large communities, thriving regardless of the darkness, extreme cold, and high pressure. Their unique geochemical features result in reducing environments rich in methane and sulfides, sustaining complex chemosynthetic ecosystems that represent one of the most surprising findings in oceans in the last 40 years. The deep-sea Lucky Strike hydrothermal vent field, located in the Mid Atlantic Ridge, is home to large vent mussel communities where Bathymodiolus azoricus represents the dominant faunal biomass, owing its survival to symbiotic associations with methylotrophic or methanotrophic and thiotrophic bacteria. The recent transcriptome sequencing and analysis of gill tissues from B. azoricus revealed a number of genes of bacterial origin, hereby analyzed to provide a functional insight into the gill microbial community. The transcripts supported a metabolically active microbiome and a variety of mechanisms and pathways, evidencing also the sulfur and methane metabolisms. Taxonomic affiliation of transcripts and 16S rRNA community profiling revealed a microbial community dominated by thiotrophic and methanotrophic endosymbionts of B. azoricus and the presence of a Sulfurovum-like epsilonbacterium.

  1. DeepLoc: prediction of protein subcellular localization using deep learning.

    PubMed

    Almagro Armenteros, José Juan; Sønderby, Casper Kaae; Sønderby, Søren Kaae; Nielsen, Henrik; Winther, Ole

    2017-11-01

    The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only. Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information. The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php. Contact: jjalma@dtu.dk. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  2. A deep learning framework for improving long-range residue-residue contact prediction using a hierarchical strategy.

    PubMed

    Xiong, Dapeng; Zeng, Jianyang; Gong, Haipeng

    2017-09-01

    Residue-residue contacts are of great value for protein structure prediction, since contact information, especially from those long-range residue pairs, can significantly reduce the complexity of conformational sampling for protein structure prediction in practice. Despite progress in the past decade on protein targets with abundant homologous sequences, accurate contact prediction for proteins with limited sequence information is still far from satisfactory. Methodologies for these hard targets still need further improvement. We presented a computational program DeepConPred, which includes a pipeline of two novel deep-learning-based methods (DeepCCon and DeepRCon) as well as a contact refinement step, to improve the prediction of long-range residue contacts from primary sequences. When compared with previous prediction approaches, our framework employed an effective scheme to identify optimal and important features for contact prediction, and was only trained with coevolutionary information derived from a limited number of homologous sequences to ensure robustness and usefulness for hard targets. Independent tests showed that 59.33%/49.97%, 64.39%/54.01% and 70.00%/59.81% of the top L/5, top L/10 and top 5 predictions were correct for CASP10/CASP11 proteins, respectively. In general, our algorithm ranked as one of the best methods for CASP targets. All source data and codes are available at http://166.111.152.91/Downloads.html. Contact: hgong@tsinghua.edu.cn or zengjy321@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  3. High-throughput sequencing of the entire genomic regions of CCM1/KRIT1, CCM2 and CCM3/PDCD10 to search for pathogenic deep-intronic splice mutations in cerebral cavernous malformations.

    PubMed

    Rath, Matthias; Jenssen, Sönke E; Schwefel, Konrad; Spiegler, Stefanie; Kleimeier, Dana; Sperling, Christian; Kaderali, Lars; Felbor, Ute

    2017-09-01

    Cerebral cavernous malformations (CCM) are vascular lesions of the central nervous system that can cause headaches, seizures and hemorrhagic stroke. Disease-associated mutations have been identified in three genes: CCM1/KRIT1, CCM2 and CCM3/PDCD10. The precise proportion of deep-intronic variants in these genes and their clinical relevance is yet unknown. Here, a long-range PCR (LR-PCR) approach for target enrichment of the entire genomic regions of the three genes was combined with next generation sequencing (NGS) to screen for coding and non-coding variants. NGS detected all six CCM1/KRIT1, two CCM2 and four CCM3/PDCD10 mutations that had previously been identified by Sanger sequencing. Two of the pathogenic variants presented here are novel. Additionally, 20 stringently selected CCM index cases that had remained mutation-negative after conventional sequencing and exclusion of copy number variations were screened for deep-intronic mutations. The combination of bioinformatics filtering and transcript analyses did not reveal any deep-intronic splice mutations in these cases. Our results demonstrate that target enrichment by LR-PCR combined with NGS can be used for a comprehensive analysis of the entire genomic regions of the CCM genes in a research context. However, its clinical utility is limited as deep-intronic splice mutations in CCM1/KRIT1, CCM2 and CCM3/PDCD10 seem to be rather rare. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  4. Musculoskeletal MRI findings of juvenile localized scleroderma.

    PubMed

    Eutsler, Eric P; Horton, Daniel B; Epelman, Monica; Finkel, Terri; Averill, Lauren W

    2017-04-01

    Juvenile localized scleroderma comprises a group of autoimmune conditions often characterized clinically by an area of skin hardening. In addition to superficial changes in the skin and subcutaneous tissues, juvenile localized scleroderma may involve the deep soft tissues, bones and joints, possibly resulting in functional impairment and pain in addition to cosmetic changes. There is literature documenting the spectrum of findings for deep involvement of localized scleroderma (fascia, muscles, tendons, bones and joints) in adults, but there is limited literature for the condition in children. We aimed to document the spectrum of musculoskeletal magnetic resonance imaging (MRI) findings of both superficial and deep juvenile localized scleroderma involvement in children and to evaluate the utility of various MRI sequences for detecting those findings. Two radiologists retrospectively evaluated 20 MRI studies of the extremities in 14 children with juvenile localized scleroderma. Each imaging sequence was also given a subjective score of 0 (not useful), 1 (somewhat useful) or 2 (most useful for detecting the findings). Deep tissue involvement was detected in 65% of the imaged extremities. Fascial thickening and enhancement were seen in 50% of imaged extremities. Axial T1, axial T1 fat-suppressed (FS) contrast-enhanced and axial fluid-sensitive sequences were rated most useful. Fascial thickening and enhancement were the most commonly encountered deep tissue findings in extremity MRIs of children with juvenile localized scleroderma. Because abnormalities of the skin, subcutaneous tissues and fascia tend to run longitudinally in an affected limb, axial T1, axial fluid-sensitive and axial T1-FS contrast-enhanced sequences should be included in the imaging protocol.

  5. Dissecting enzyme function with microfluidic-based deep mutational scanning.

    PubMed

    Romero, Philip A; Tran, Tuan M; Abate, Adam R

    2015-06-09

    Natural enzymes are incredibly proficient catalysts, but engineering them to have new or improved functions is challenging due to the complexity of how an enzyme's sequence relates to its biochemical properties. Here, we present an ultrahigh-throughput method for mapping enzyme sequence-function relationships that combines droplet microfluidic screening with next-generation DNA sequencing. We apply our method to map the activity of millions of glycosidase sequence variants. Microfluidic-based deep mutational scanning provides a comprehensive and unbiased view of the enzyme function landscape. The mapping displays expected patterns of mutational tolerance and a strong correspondence to sequence variation within the enzyme family, but also reveals previously unreported sites that are crucial for glycosidase function. We modified the screening protocol to include a high-temperature incubation step, and the resulting thermotolerance landscape allowed the discovery of mutations that enhance enzyme thermostability. Droplet microfluidics provides a general platform for enzyme screening that, when combined with DNA-sequencing technologies, enables high-throughput mapping of enzyme sequence space.

  6. Complete genome sequence of Southern tomato virus naturally infecting tomatoes in Bangladesh using small RNA deep sequencing

    USDA-ARS's Scientific Manuscript database

    The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares a high degree of sequence identity (99%) with several known STV isol...

  7. Complete genome sequence of southern tomato virus identified from China using next generation sequencing

    USDA-ARS's Scientific Manuscript database

    The complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China was elucidated using small RNA deep sequencing. The identified STV_CN12 shares 99% sequence identity with other isolates from Mexico, France, Spain, and the U.S. This is the first report ...

  8. Deep whole-genome sequencing of 100 southeast Asian Malays.

    PubMed

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-10

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  9. Deep Whole-Genome Sequencing of 100 Southeast Asian Malays

    PubMed Central

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-01

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. PMID:23290073

  10. Archaeal β diversity patterns under the seafloor along geochemical gradients

    NASA Astrophysics Data System (ADS)

    Koyano, Hitoshi; Tsubouchi, Taishi; Kishino, Hirohisa; Akutsu, Tatsuya

    2014-09-01

    Recently, deep drilling into the seafloor has revealed that there are vast sedimentary ecosystems of diverse microorganisms, particularly archaea, in subsurface areas. We investigated the β diversity patterns of archaeal communities in sediment layers under the seafloor and their determinants. This study was accomplished by analyzing large environmental samples of 16S ribosomal RNA gene sequences and various geochemical data collected from a sediment core of 365.3 m, obtained by drilling into the seafloor off the east coast of the Shimokita Peninsula. To extract the maximum amount of information from these environmental samples, we first developed a method for measuring β diversity using sequence data by applying probability theory on a set of strings developed by two of the authors in a previous publication. We introduced an index of β diversity between sequence populations from which the sequence data were sampled. We then constructed an estimator of the β diversity index based on the sequence data and demonstrated that it converges to the β diversity index between sequence populations with probability of 1 as the number of sampled sequences increases. Next, we applied this new method to quantify β diversities between archaeal sequence populations under the seafloor and constructed a quantitative model of the estimated β diversity patterns. Nearly 90% of the variation in the archaeal β diversity was explained by a model that included as variables the differences in the abundances of chlorine, iodine, and carbon between the sediment layers.

  11. High diversity of picornaviruses in rats from different continents revealed by deep sequencing.

    PubMed

    Hansen, Thomas Arn; Mollerup, Sarah; Nguyen, Nam-Phuong; White, Nicole E; Coghlan, Megan; Alquezar-Planas, David E; Joshi, Tejal; Jensen, Randi Holm; Fridholm, Helena; Kjartansdóttir, Kristín Rós; Mourier, Tobias; Warnow, Tandy; Belsham, Graham J; Bunce, Michael; Willerslev, Eske; Nielsen, Lars Peter; Vinner, Lasse; Hansen, Anders Johannes

    2016-08-17

    Outbreaks of zoonotic diseases in humans and livestock are not uncommon, and an important component in containment of such emerging viral diseases is rapid and reliable diagnostics. Such methods are often PCR-based and hence require the availability of sequence data from the pathogen. Rattus norvegicus (R. norvegicus) is a known reservoir for important zoonotic pathogens. Transmission may be direct via contact with the animal, for example, through exposure to its faecal matter, or indirectly mediated by arthropod vectors. Here we investigated the viral content in rat faecal matter (n=29) collected from two continents by analyzing 2.2 billion next-generation sequencing reads derived from both DNA and RNA. Among other virus families, we found sequences from members of the Picornaviridae to be abundant in the microbiome of all the samples. Here we describe the diversity of the picornavirus-like contigs including near-full-length genomes closely related to the Boone cardiovirus and Theiler's encephalomyelitis virus. From this study, we conclude that picornaviruses within R. norvegicus are more diverse than previously recognized. The virome of R. norvegicus should be investigated further to assess the full potential for zoonotic virus transmission.

  12. High-Throughput Sequencing of Arabidopsis microRNAs: Evidence for Frequent Birth and Death of MIRNA Genes

    PubMed Central

    Fahlgren, Noah; Howell, Miya D.; Kasschau, Kristin D.; Chapman, Elisabeth J.; Sullivan, Christopher M.; Cumbie, Jason S.; Givan, Scott A.; Law, Theresa F.; Grant, Sarah R.; Dangl, Jeffery L.; Carrington, James C.

    2007-01-01

    In plants, microRNAs (miRNAs) comprise one of two classes of small RNAs that function primarily as negative regulators at the posttranscriptional level. Several MIRNA genes in the plant kingdom are ancient, with conservation extending between angiosperms and the mosses, whereas many others are more recently evolved. Here, we use deep sequencing and computational methods to identify, profile and analyze non-conserved MIRNA genes in Arabidopsis thaliana. 48 non-conserved MIRNA families, nearly all of which were represented by single genes, were identified. Sequence similarity analyses of miRNA precursor foldback arms revealed evidence for recent evolutionary origin of 16 MIRNA loci through inverted duplication events from protein-coding gene sequences. Interestingly, these recently evolved MIRNA genes have taken distinct paths. Whereas some non-conserved miRNAs interact with and regulate target transcripts from gene families that donated parental sequences, others have drifted to the point of non-interaction with parental gene family transcripts. Some young MIRNA loci clearly originated from one gene family but form miRNAs that target transcripts in another family. We suggest that MIRNA genes are undergoing relatively frequent birth and death, with only a subset being stabilized by integration into regulatory networks. PMID:17299599

  13. AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling.

    PubMed

    Wang, Sheng; Sun, Siqi; Xu, Jinbo

    2016-09-01

    Deep Convolutional Neural Networks (DCNN) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction since their label distributions are highly imbalanced and also has similar performance as the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC.
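
    The pairwise ranking view of AUC mentioned above can be made concrete with a small sketch. It computes the exact empirical AUC as the fraction of (positive, negative) pairs that are ranked correctly, counting ties as one half. It does not reproduce the paper's training procedure: DeepCNF replaces the non-differentiable pair indicator with a polynomial approximation so that gradients can be taken, which is not shown here. The function name is hypothetical.

```python
def pairwise_auc(scores, labels):
    """Empirical AUC as the fraction of correctly ordered (positive, negative) pairs.

    scores: predicted scores, higher means more likely positive.
    labels: 1 for positive examples, 0 for negative examples.
    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    print(pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

    Because the value depends only on how positives rank against negatives, it is unaffected by the ratio of positive to negative labels, which is why it suits imbalanced labeling tasks.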

  14. AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling

    PubMed Central

    Wang, Sheng; Sun, Siqi

    2017-01-01

    Deep Convolutional Neural Networks (DCNN) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction since their label distributions are highly imbalanced and also has similar performance as the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC. PMID:28884168

  15. Modeling genome coverage in single-cell sequencing

    PubMed Central

    Daley, Timothy; Smith, Andrew D.

    2014-01-01

    Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. But because they begin with such small amounts of starting material, the amount of information that is obtained from single-cell sequencing experiment is highly sensitive to the choice of protocol employed and variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873

  16. Comparison of Illumina and 454 deep sequencing in participants failing raltegravir-based antiretroviral therapy.

    PubMed

    Li, Jonathan Z; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J; Samuel, Reshmi; Vardhanabhuti, Saran; Zheng, Lu; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C; Henn, Matthew R; Kuritzkes, Daniel R; Hide, Winston; Wilson, Cara C; Berzins, Baiba I; Acosta, Edward P; Bastow, Barbara; Kim, Peter S; Read, Sarah W; Janik, Jennifer; Meres, Debra S; Lederman, Michael M; Mong-Kryspin, Lori; Shaw, Karl E; Zimmerman, Louis G; Leavitt, Randi; De La Rosa, Guy; Jennings, Amy

    2014-01-01

    The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R2 = 0.92, P<0.001). Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.
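
    The link between depth of coverage and sensitivity for minority variants can be sketched with a binomial model. This is a simplified illustration, not the snp-assess or V-Phaser algorithm: it ignores sequencing error and assumes a variant is called once a hypothetical minimum number of supporting reads is seen.

```python
from math import comb


def prob_detect(depth: int, freq: float, min_reads: int) -> float:
    """Probability of seeing at least min_reads variant reads at a site.

    depth: number of reads covering the site.
    freq: true frequency of the minority variant in the sample.
    min_reads: illustrative calling threshold (an assumption, not from the study).
    """
    p_below = sum(
        comb(depth, i) * freq**i * (1.0 - freq) ** (depth - i)
        for i in range(min_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # A variant at 0.1% frequency, requiring 5 supporting reads.
    for depth in (1_000, 10_000, 50_000):
        print(depth, round(prob_detect(depth, 0.001, 5), 4))
```

    In this model, raising depth at a fixed threshold sharply increases the chance of catching a 0.1% variant, which is consistent in direction with the higher sensitivity reported for the deeper platform.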

  17. Use of sequence-independent-single-primer-amplification (SISPA) for whole genome sequencing using illumina MiSeq platform for avian influenza virus, Newcastle disease virus, and infectious bronchitis virus

    USDA-ARS's Scientific Manuscript database

    Over the past decade, Next Generation Sequencing (NGS) technologies, also called deep sequencing, have continued to evolve, increasing capacity and lowering the cost necessary for large genome sequencing projects. One of the advantages of NGS platforms is the possibility to sequence the samples with...

  18. A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification.

    PubMed

    Yildirim, Özal

    2018-05-01

    Long-short term memory networks (LSTMs), which have recently emerged in sequential data analysis, are the most widely used type of recurrent neural networks (RNNs) architecture. Progress on the topic of deep learning includes successful adaptations of deep versions of these architectures. In this study, a new model for deep bidirectional LSTM network-based wavelet sequences called DBLSTM-WS was proposed for classifying electrocardiogram (ECG) signals. For this purpose, a new wavelet-based layer is implemented to generate ECG signal sequences. The ECG signals were decomposed into frequency sub-bands at different scales in this layer. These sub-bands are used as sequences for the input of LSTM networks. New network models that include unidirectional (ULSTM) and bidirectional (BLSTM) structures are designed for performance comparisons. Experimental studies have been performed for five different types of heartbeats obtained from the MIT-BIH arrhythmia database. These five types are Normal Sinus Rhythm (NSR), Ventricular Premature Contraction (VPC), Paced Beat (PB), Left Bundle Branch Block (LBBB), and Right Bundle Branch Block (RBBB). The results show that the DBLSTM-WS model gives a high recognition performance of 99.39%. It has been observed that the wavelet-based layer proposed in the study significantly improves the recognition performance of conventional networks. This proposed network structure is an important approach that can be applied to similar signal processing problems. Copyright © 2018 Elsevier Ltd. All rights reserved.
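
    The decomposition of a signal into frequency sub-bands at different scales can be sketched with a plain Haar wavelet transform. The abstract does not name the wavelet family or the number of levels used in DBLSTM-WS, so Haar, the two levels, and the sample values below are assumptions chosen only to keep the example short.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_subbands(signal, levels):
    """Return [detail_1, ..., detail_levels, approx_levels]."""
    bands = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        bands.append(detail)
    bands.append(current)
    return bands


# Made-up samples standing in for one heartbeat segment
beat = [0.0, 0.1, 0.9, 1.0, 0.2, -0.1, 0.0, 0.0]
bands = haar_subbands(beat, levels=2)
print([len(b) for b in bands])  # [4, 2, 2]
```

    Each sub-band is a shorter sequence than the original signal, and sequences of this kind are what the abstract describes feeding to the LSTM layers.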

  19. Protein Solvent-Accessibility Prediction by a Stacked Deep Bidirectional Recurrent Neural Network.

    PubMed

    Zhang, Buzhong; Li, Linqing; Lü, Qiang

    2018-05-25

    Residue solvent accessibility is closely related to the spatial arrangement and packing of residues. Predicting the solvent accessibility of a protein is an important step to understand its structure and function. In this work, we present a deep learning method to predict residue solvent accessibility, which is based on a stacked deep bidirectional recurrent neural network applied to sequence profiles. To capture more long-range sequence information, a merging operator was proposed when bidirectional information from hidden nodes was merged for outputs. Three types of merging operators were used in our improved model, with a long short-term memory network performing as a hidden computing node. The trained database was constructed from 7361 proteins extracted from the PISCES server using a cut-off of 25% sequence identity. Sequence-derived features including position-specific scoring matrix, physical properties, physicochemical characteristics, conservation score and protein coding were used to represent a residue. Using this method, predictive values of continuous relative solvent-accessible area were obtained, and then, these values were transformed into binary states with predefined thresholds. Our experimental results showed that our deep learning method improved prediction quality relative to current methods, with mean absolute error and Pearson's correlation coefficient values of 8.8% and 74.8%, respectively, on the CB502 dataset and 8.2% and 78%, respectively, on the Manesh215 dataset.
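
    The post-processing and evaluation steps mentioned above (thresholding continuous values into binary states, then scoring with mean absolute error and Pearson's correlation coefficient) can be sketched as follows. The 0.25 threshold and the sample values are assumptions; the abstract says only that the thresholds were predefined.

```python
import math


def binarize(rsa_values, threshold=0.25):
    """Map relative solvent accessibility to buried (0) or exposed (1).

    The 0.25 default is an illustrative assumption, not a value from the paper.
    """
    return [1 if v >= threshold else 0 for v in rsa_values]


def mean_absolute_error(pred, true):
    """Average absolute difference between predicted and observed values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def pearson(pred, true):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(true)
    mean_p, mean_t = sum(pred) / n, sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in true))
    return cov / (sd_p * sd_t)


# Made-up per-residue values
predicted = [0.10, 0.30, 0.55, 0.05]
observed = [0.15, 0.25, 0.60, 0.10]
print(binarize(predicted))  # [0, 1, 1, 0]
print(round(mean_absolute_error(predicted, observed), 3))
print(round(pearson(predicted, observed), 3))
```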

  20. Prognostic value of deep sequencing method for minimal residual disease detection in multiple myeloma

    PubMed Central

    Lahuerta, Juan J.; Pepin, François; González, Marcos; Barrio, Santiago; Ayala, Rosa; Puig, Noemí; Montalban, María A.; Paiva, Bruno; Weng, Li; Jiménez, Cristina; Sopena, María; Moorhead, Martin; Cedena, Teresa; Rapado, Immaculada; Mateos, María Victoria; Rosiñol, Laura; Oriol, Albert; Blanchard, María J.; Martínez, Rafael; Bladé, Joan; San Miguel, Jesús; Faham, Malek; García-Sanz, Ramón

    2014-01-01

    We assessed the prognostic value of minimal residual disease (MRD) detection in multiple myeloma (MM) patients using a sequencing-based platform in bone marrow samples from 133 MM patients in at least very good partial response (VGPR) after front-line therapy. Deep sequencing was carried out in patients in whom a high-frequency myeloma clone was identified and MRD was assessed using the IGH-VDJH, IGH-DJH, and IGK assays. The results were contrasted with those of multiparametric flow cytometry (MFC) and allele-specific oligonucleotide polymerase chain reaction (ASO-PCR). The applicability of deep sequencing was 91%. Concordance between sequencing and MFC and ASO-PCR was 83% and 85%, respectively. Patients who were MRD– by sequencing had a significantly longer time to tumor progression (TTP) (median 80 vs 31 months; P < .0001) and overall survival (median not reached vs 81 months; P = .02), compared with patients who were MRD+. When stratifying patients by different levels of MRD, the respective TTP medians were: MRD ≥10⁻³ 27 months, MRD 10⁻³ to 10⁻⁵ 48 months, and MRD <10⁻⁵ 80 months (P = .003 to .0001). Ninety-two percent of VGPR patients were MRD+. In complete response patients, the TTP remained significantly longer for MRD– compared with MRD+ patients (131 vs 35 months; P = .0009). PMID:24646471
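
    The three MRD strata reported above can be written as a small lookup. This is only a sketch: the function name is hypothetical, and how values falling exactly on a boundary are assigned is an assumption, since the abstract does not specify it.

```python
# Median time to tumor progression (months) per stratum, as reported in the abstract
MEDIAN_TTP_MONTHS = {
    "MRD >= 10^-3": 27,
    "MRD 10^-3 to 10^-5": 48,
    "MRD < 10^-5": 80,
}


def mrd_stratum(mrd_level: float) -> str:
    """Assign an MRD level (fraction of clonal cells) to a reported stratum.

    Boundary handling (a value of exactly 10^-5 goes to the middle stratum)
    is an assumption made for this illustration.
    """
    if mrd_level < 0:
        raise ValueError("mrd_level must be non-negative")
    if mrd_level >= 1e-3:
        return "MRD >= 10^-3"
    if mrd_level >= 1e-5:
        return "MRD 10^-3 to 10^-5"
    return "MRD < 10^-5"


for level in (2e-3, 4e-5, 1e-6):  # made-up MRD levels
    stratum = mrd_stratum(level)
    print(level, stratum, MEDIAN_TTP_MONTHS[stratum])
```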

  1. Deep sequencing and flow cytometric characterization of expanded effector memory CD8+CD57+ T cells frequently reveals T-cell receptor Vβ oligoclonality and CDR3 homology in acquired aplastic anemia.

    PubMed

    Giudice, Valentina; Feng, Xingmin; Lin, Zenghua; Hu, Wei; Zhang, Fanmao; Qiao, Wangmin; Ibanez, Maria Del Pilar Fernandez; Rios, Olga; Young, Neal S

    2018-05-01

    Oligoclonal expansion of CD8+CD28− lymphocytes has been considered indirect evidence for a pathogenic immune response in acquired aplastic anemia. A subset of CD8+CD28− cells with CD57 expression, termed effector memory cells, is expanded in several immune-mediated diseases and may have a role in immune surveillance. We hypothesized that effector memory CD8+CD28−CD57+ cells may drive aberrant oligoclonal expansion in aplastic anemia. We found CD8+CD57+ cells frequently expanded in the blood of aplastic anemia patients, with oligoclonal characteristics by flow cytometric Vβ usage analysis: skewing in 1-5 Vβ families and frequencies of immunodominant clones ranging from 1.98% to 66.5%. Oligoclonal characteristics were also observed in total CD8+ cells from aplastic anemia patients with CD8+CD57+ cell expansion by T-cell receptor deep sequencing, as well as the presence of 1-3 immunodominant clones. Oligoclonality was confirmed by T-cell receptor repertoire deep sequencing of enriched CD8+CD57+ cells, which also showed decreased diversity compared to total CD4+ and CD8+ cell pools. From analysis of complementarity-determining region 3 sequences in the CD8+ cell pool, a total of 29 sequences were shared between patients and controls, but these sequences were highly expressed in aplastic anemia subjects and also present in their immunodominant clones. In summary, expansion of effector memory CD8+ T cells is frequent in aplastic anemia and mirrors Vβ oligoclonal expansion. Flow cytometric Vβ usage analysis combined with deep sequencing technologies allows high resolution characterization of the T-cell receptor repertoire, and might represent a useful tool in the diagnosis and periodic evaluation of aplastic anemia patients. (Registered at clinicaltrials.gov identifiers: 00001620, 01623167, 00001397, 00071045, 00081523, 00961064). Copyright © 2018 Ferrata Storti Foundation.

  2. Complete genome sequence of a tomato infecting tomato mottle mosaic virus in New York

    USDA-ARS's Scientific Manuscript database

    Complete genome sequence of an emerging isolate of tomato mottle mosaic virus (ToMMV) infecting experimental Nicotiana benthamiana plants in upstate New York was obtained using small RNA deep sequencing. ToMMV_NY-13 shared 99% sequence identity to ToMMV isolates from Mexico and Florida. Broader d...

  3. Exploring Genomic Diversity Using Metagenomics of Deep-Sea Subsurface Microbes from the Louisville Seamount and the South Pacific Gyre

    NASA Astrophysics Data System (ADS)

    Tully, B. J.; Sylvan, J. B.; Heidelberg, J. F.; Huber, J. A.

    2014-12-01

    There are many limitations involved with sampling microbial diversity from deep-sea subsurface environments, ranging from physical sample collection, low microbial biomass, culturing at in situ conditions, and inefficient nucleic acid extractions. As such, we are continually modifying our methods to obtain better results and expanding what we know about microbes in these environments. Here we present analysis of metagenome sequences from samples collected from 120 m within the Louisville Seamount and from the top 5-10 cm of the sediment in the center of the South Pacific Gyre (SPG). Both systems are low biomass with ~10² and ~10⁴ cells per cm³ for Louisville Seamount samples analyzed and the SPG sediment, respectively. The Louisville Seamount represents the first in situ subseafloor basalt and the SPG sediments represent the first in situ low biomass sediment microbial metagenomes. Both of these environments, subseafloor basalt and sediments underlying oligotrophic ocean gyres, represent large provinces of the seafloor environment that remain understudied. Despite the low biomass and DNA generated from these samples, we have generated 16 near complete genomes (5 from Louisville and 11 from the SPG) from the two metagenomic datasets. These genomes are estimated to be between 51-100% complete and span a range of phylogenetic groups, including the Proteobacteria, Actinobacteria, Firmicutes, Chloroflexi, and unclassified bacterial groups. With these genomes, we have assessed potential functional capabilities of these organisms and performed a comparative analysis between the environmental genomes and previously sequenced relatives to determine possible adaptations that may elucidate survival mechanisms for these low energy environments. These methods illustrate a baseline analysis that can be applied to future metagenomic deep-sea subsurface datasets and will help to further our understanding of microbiology within these environments.

  4. Proteome-wide Identification of Novel Ceramide-binding Proteins by Yeast Surface cDNA Display and Deep Sequencing.

    PubMed

    Bidlingmaier, Scott; Ha, Kevin; Lee, Nam-Kyung; Su, Yang; Liu, Bin

    2016-04-01

    Although the bioactive sphingolipid ceramide is an important cell signaling molecule, relatively few direct ceramide-interacting proteins are known. We used an approach combining yeast surface cDNA display and deep sequencing technology to identify novel proteins binding directly to ceramide. We identified 234 candidate ceramide-binding protein fragments and validated binding for 20. Most (17) bound selectively to ceramide, although a few (3) bound to other lipids as well. Several novel ceramide-binding domains were discovered, including the EF-hand calcium-binding motif, the heat shock chaperonin-binding motif STI1, the SCP2 sterol-binding domain, and the tetratricopeptide repeat region motif. Interestingly, four of the verified ceramide-binding proteins (HPCA, HPCAL1, NCS1, and VSNL1) and an additional three candidate ceramide-binding proteins (NCALD, HPCAL4, and KCNIP3) belong to the neuronal calcium sensor family of EF hand-containing proteins. We used mutagenesis to map the ceramide-binding site in HPCA and to create a mutant HPCA that does not bind to ceramide. We demonstrated selective binding to ceramide by mammalian cell-produced wild type but not mutant HPCA. Intriguingly, we also identified a fragment from prostaglandin D2 synthase that binds preferentially to ceramide 1-phosphate. The wide variety of proteins and domains capable of binding to ceramide suggests that many of the signaling functions of ceramide may be regulated by direct binding to these proteins. Based on the deep sequencing data, we estimate that our yeast surface cDNA display library covers ∼60% of the human proteome and our selection/deep sequencing protocol can identify target-interacting protein fragments that are present at extremely low frequency in the starting library. Thus, the yeast surface cDNA display/deep sequencing approach is a rapid, comprehensive, and flexible method for the analysis of protein-ligand interactions, particularly for the study of non-protein ligands. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  5. Small RNA Deep Sequencing and the Effects of microRNA408 on Root Gravitropic Bending in Arabidopsis

    NASA Astrophysics Data System (ADS)

    Li, Huasheng; Lu, Jinying; Sun, Qiao; Chen, Yu; He, Dacheng; Liu, Min

    2015-11-01

    MicroRNA (miRNA) is a non-coding small RNA composed of 20 to 24 nucleotides that influences plant root development. This study analyzed the miRNA expression in Arabidopsis root tip cells using Illumina sequencing and real-time PCR before (sample 0) and 15 min after (sample 15) a 3-D clinostat rotational treatment was administered. After stimulation was performed, the expression levels of seven miRNA genes, including Arabidopsis miR160, miR161, miR394, miR402, miR403, miR408, and miR823, were significantly upregulated. Illumina sequencing results also revealed two novel miRNAs that have not been previously reported. The target genes of these miRNAs included pentatricopeptide repeat-containing protein and diadenosine tetraphosphate hydrolase. An overexpression vector of Arabidopsis miR408 was constructed and transferred to Arabidopsis plants. The roots of plants overexpressing miR408 exhibited a slower reorientation upon gravistimulation in comparison with those of wild-type. This result indicates that miR408 could play a role in root gravitropic response.

  6. Clinical genomics information management software linking cancer genome sequence and clinical decisions.

    PubMed

    Watt, Stuart; Jiao, Wei; Brown, Andrew M K; Petrocelli, Teresa; Tran, Ben; Zhang, Tong; McPherson, John D; Kamel-Reid, Suzanne; Bedard, Philippe L; Onetto, Nicole; Hudson, Thomas J; Dancey, Janet; Siu, Lillian L; Stein, Lincoln; Ferretti, Vincent

    2013-09-01

    Using sequencing information to guide clinical decision-making requires coordination of a diverse set of people and activities. In clinical genomics, the process typically includes sample acquisition, template preparation, genome data generation, analysis to identify and confirm variant alleles, interpretation of clinical significance, and reporting to clinicians. We describe a software application developed within a clinical genomics study, to support this entire process. The software application tracks patients, samples, genomic results, decisions and reports across the cohort, monitors progress and sends reminders, and works alongside an electronic data capture system for the trial's clinical and genomic data. It incorporates systems to read, store, analyze and consolidate sequencing results from multiple technologies, and provides a curated knowledge base of tumor mutation frequency (from the COSMIC database) annotated with clinical significance and drug sensitivity to generate reports for clinicians. By supporting the entire process, the application provides deep support for clinical decision making, enabling the generation of relevant guidance in reports for verification by an expert panel prior to forwarding to the treating physician. Copyright © 2013 Elsevier Inc. All rights reserved.

  7. Identification and Removal of Contaminant Sequences From Ribosomal Gene Databases: Lessons From the Census of Deep Life

    PubMed Central

    Sheik, Cody S.; Reese, Brandi Kiel; Twing, Katrina I.; Sylvan, Jason B.; Grim, Sharon L.; Schrenk, Matthew O.; Sogin, Mitchell L.; Colwell, Frederick S.

    2018-01-01

    Earth’s subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium, Aquabacterium, Ralstonia, and Acinetobacter. While the top five most frequently observed genera were Pseudomonas, Propionibacterium, Acinetobacter, Ralstonia, and Sphingomonas. The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process. Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA extraction and DNA cleanup methods. Thus, controls must be taken at every step of the collection and processing procedure when working with low biomass environments such as, but not limited to, portions of Earth’s deep subsurface. Taken together, we stress that the CoDL dataset is an incredible resource for the broader research community interested in subsurface life, and steps to remove contamination derived sequences must be taken prior to using this dataset. PMID:29780369
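
    Removing sequences assigned to known contaminant genera and recomputing relative abundances can be sketched as below. The genus names come from the abstract; the function names, the sample counts, and the choice to filter purely by genus name are assumptions for illustration and do not reproduce the authors' recommended workflow.

```python
# Putative contaminant genera named in the abstract
PUTATIVE_CONTAMINANTS = {
    "Propionibacterium", "Aquabacterium", "Ralstonia",
    "Acinetobacter", "Pseudomonas", "Sphingomonas",
}


def remove_contaminants(genus_counts, contaminants=PUTATIVE_CONTAMINANTS):
    """Drop contaminant genera; return (kept counts, fraction of reads removed)."""
    kept = {g: n for g, n in genus_counts.items() if g not in contaminants}
    total = sum(genus_counts.values())
    removed = total - sum(kept.values())
    return kept, (removed / total if total else 0.0)


def relative_abundance(genus_counts):
    """Convert read counts to fractions of the remaining total."""
    total = sum(genus_counts.values())
    return {g: n / total for g, n in genus_counts.items()} if total else {}


# Made-up read counts for one sample
sample = {"Ralstonia": 15, "Acinetobacter": 10, "Desulfovibrio": 50, "Thermococcus": 25}
kept, removed_fraction = remove_contaminants(sample)
print(removed_fraction)          # 0.25
print(relative_abundance(kept))  # Desulfovibrio 2/3, Thermococcus 1/3
```

    Name-based filtering is blunt, because a genus on the list can also be a genuine subsurface inhabitant; that is one reason the abstract calls for controls at every processing step.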

  8. Low-abundance HIV drug-resistant viral variants in treatment-experienced persons correlate with historical antiretroviral use.

    PubMed

    Le, Thuy; Chiarella, Jennifer; Simen, Birgitte B; Hanczaruk, Bozena; Egholm, Michael; Landry, Marie L; Dieckhaus, Kevin; Rosen, Marc I; Kozal, Michael J

    2009-06-29

    It is largely unknown how frequently low-abundance HIV drug-resistant variants at levels under limit of detection of conventional genotyping (<20% of quasi-species) are present in antiretroviral-experienced persons experiencing virologic failure. Further, the clinical implications of low-abundance drug-resistant variants at time of virologic failure are unknown. Plasma samples from 22 antiretroviral-experienced subjects collected at time of virologic failure (viral load 1380 to 304,000 copies/mL) were obtained from a specimen bank (from 2004-2007). The prevalence and profile of drug-resistant mutations were determined using Sanger sequencing and ultra-deep pyrosequencing. Genotypes were interpreted using Stanford HIV database algorithm. Antiretroviral treatment histories were obtained by chart review and correlated with drug-resistant mutations. Low-abundance drug-resistant mutations were detected in all 22 subjects by deep sequencing and only in 3 subjects by Sanger sequencing. In total they accounted for 90 of 247 mutations (36%) detected by deep sequencing; the majority of these (95%) were not detected by standard genotyping. A mean of 4 additional mutations per subject were detected by deep sequencing (p<0.0001, 95%CI: 2.85-5.53). The additional low-abundance drug-resistant mutations increased a subject's genotypic resistance to one or more antiretrovirals in 17 of 22 subjects (77%). When correlated with subjects' antiretroviral treatment histories, the additional low-abundance drug-resistant mutations correlated with the failing antiretroviral drugs in 21% subjects and correlated with historical antiretroviral use in 79% subjects (OR, 13.73; 95% CI, 2.5-74.3, p = 0.0016). Low-abundance HIV drug-resistant mutations in antiretroviral-experienced subjects at time of virologic failure can increase a subject's overall burden of resistance, yet commonly go unrecognized by conventional genotyping. The majority of unrecognized resistant mutations correlate with historical antiretroviral use. Ultra-deep sequencing can provide important historical resistance information for clinicians when planning subsequent antiretroviral regimens for highly treatment-experienced patients, particularly when their prior treatment histories and longitudinal genotypes are not available.

  9. Low-Abundance HIV Drug-Resistant Viral Variants in Treatment-Experienced Persons Correlate with Historical Antiretroviral Use

    PubMed Central

    Le, Thuy; Chiarella, Jennifer; Simen, Birgitte B.; Hanczaruk, Bozena; Egholm, Michael; Landry, Marie L.; Dieckhaus, Kevin; Rosen, Marc I.; Kozal, Michael J.

    2009-01-01

    Background: It is largely unknown how frequently low-abundance HIV drug-resistant variants at levels under limit of detection of conventional genotyping (<20% of quasi-species) are present in antiretroviral-experienced persons experiencing virologic failure. Further, the clinical implications of low-abundance drug-resistant variants at time of virologic failure are unknown. Methodology/Principal Findings: Plasma samples from 22 antiretroviral-experienced subjects collected at time of virologic failure (viral load 1380 to 304,000 copies/mL) were obtained from a specimen bank (from 2004–2007). The prevalence and profile of drug-resistant mutations were determined using Sanger sequencing and ultra-deep pyrosequencing. Genotypes were interpreted using Stanford HIV database algorithm. Antiretroviral treatment histories were obtained by chart review and correlated with drug-resistant mutations. Low-abundance drug-resistant mutations were detected in all 22 subjects by deep sequencing and only in 3 subjects by Sanger sequencing. In total they accounted for 90 of 247 mutations (36%) detected by deep sequencing; the majority of these (95%) were not detected by standard genotyping. A mean of 4 additional mutations per subject were detected by deep sequencing (p<0.0001, 95%CI: 2.85–5.53). The additional low-abundance drug-resistant mutations increased a subject's genotypic resistance to one or more antiretrovirals in 17 of 22 subjects (77%). When correlated with subjects' antiretroviral treatment histories, the additional low-abundance drug-resistant mutations correlated with the failing antiretroviral drugs in 21% subjects and correlated with historical antiretroviral use in 79% subjects (OR, 13.73; 95% CI, 2.5–74.3, p = 0.0016). Conclusions/Significance: Low-abundance HIV drug-resistant mutations in antiretroviral-experienced subjects at time of virologic failure can increase a subject's overall burden of resistance, yet commonly go unrecognized by conventional genotyping. The majority of unrecognized resistant mutations correlate with historical antiretroviral use. Ultra-deep sequencing can provide important historical resistance information for clinicians when planning subsequent antiretroviral regimens for highly treatment-experienced patients, particularly when their prior treatment histories and longitudinal genotypes are not available. PMID:19562031

  10. Identification and Removal of Contaminant Sequences From Ribosomal Gene Databases: Lessons From the Census of Deep Life.

    PubMed

    Sheik, Cody S; Reese, Brandi Kiel; Twing, Katrina I; Sylvan, Jason B; Grim, Sharon L; Schrenk, Matthew O; Sogin, Mitchell L; Colwell, Frederick S

    2018-01-01

    Earth's subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium, Aquabacterium, Ralstonia, and Acinetobacter. While the top five most frequently observed genera were Pseudomonas, Propionibacterium, Acinetobacter, Ralstonia, and Sphingomonas. The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process. Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA extraction and DNA cleanup methods. Thus, controls must be taken at every step of the collection and processing procedure when working with low biomass environments such as, but not limited to, portions of Earth's deep subsurface. Taken together, we stress that the CoDL dataset is an incredible resource for the broader research community interested in subsurface life, and steps to remove contamination derived sequences must be taken prior to using this dataset.

  11. Inferring Phylogenetic Networks Using PhyloNet.

    PubMed

    Wen, Dingqiao; Yu, Yun; Zhu, Jiafan; Nakhleh, Luay

    2018-07-01

    PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyses and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.

  12. Indels, structural variation, and recombination drive genomic diversity in Plasmodium falciparum

    PubMed Central

    Miles, Alistair; Iqbal, Zamin; Vauterin, Paul; Pearson, Richard; Campino, Susana; Theron, Michel; Gould, Kelda; Mead, Daniel; Drury, Eleanor; O'Brien, John; Ruano Rubio, Valentin; MacInnis, Bronwyn; Mwangi, Jonathan; Samarakoon, Upeka; Ranford-Cartwright, Lisa; Ferdig, Michael; Hayton, Karen; Su, Xin-zhuan; Wellems, Thomas; Rayner, Julian; McVean, Gil; Kwiatkowski, Dominic

    2016-01-01

    The malaria parasite Plasmodium falciparum has a great capacity for evolutionary adaptation to evade host immunity and develop drug resistance. Current understanding of parasite evolution is impeded by the fact that a large fraction of the genome is either highly repetitive or highly variable and thus difficult to analyze using short-read sequencing technologies. Here, we describe a resource of deep sequencing data on parents and progeny from genetic crosses, which has enabled us to perform the first genome-wide, integrated analysis of SNP, indel and complex polymorphisms, using Mendelian error rates as an indicator of genotypic accuracy. These data reveal that indels are exceptionally abundant, being more common than SNPs and thus the dominant mode of polymorphism within the core genome. We use the high density of SNP and indel markers to analyze patterns of meiotic recombination, confirming a high rate of crossover events and providing the first estimates for the rate of non-crossover events and the length of conversion tracts. We observe several instances of meiotic recombination within copy number variants associated with drug resistance, demonstrating a mechanism whereby fitness costs associated with resistance mutations could be compensated and greater phenotypic plasticity could be acquired. PMID:27531718

  13. Microbes of deep marine sediments as viewed by metagenomics

    NASA Astrophysics Data System (ADS)

    Biddle, J.

    2015-12-01

    Ten years after the first deep marine sediment metagenome was produced, questions still exist about the nucleic acid sequences we have retrieved. Current data sets, including the Peru Margin, Costa Rica Margin and Iberian Margin show that consistently, data forms larger assemblies at depth due to the reduced complexity of the microbial community. But are these organisms active or preserved? At SMTZs, a change in the assembly statistics is noted, as well as an increase in cell counts, suggesting that cells are truly active. As depth increases, genome sizes are consistently large, suggesting that much like soil microbes, sedimentary microbes may maintain a larger repertoire of genomic potential. Functional changes are seen with depth, but at many sites are not correlated to specific geochemistries. Individual genomes show changes with depth, which raises interesting questions on how the subsurface is settled and maintained. The subsurface does have a distinct genomic signature, including unusual microbial groups, which we are now able to analyze for total genomic content.

  14. Deep infiltrating endometriosis: Should rectal and vaginal opacification be systematically used in MR imaging?

    PubMed

    Uyttenhove, F; Langlois, C; Collinet, P; Rubod, C; Verpillat, P; Bigot, J; Kerdraon, O; Faye, N

    2016-06-01

    To evaluate the value of rectal and vaginal filling in the MR imaging assessment of vaginal and recto-sigmoid endometriosis, and to compare the results between senior and junior radiologist reviews. Sixty-seven patients with clinically suspected deep infiltrating endometriosis were included in our MRI protocol consisting of repeated T2-weighted sequences (axial and sagittal) before and after rectal and vaginal marking with ultrasonography gel. Vaginal and recto-sigmoid endometriosis lesions were analyzed before and after opacification. The inter-reader agreement between senior and junior scores was studied. Concerning vaginal involvement and colonic involvement of the muscularis and beyond, no significant difference (P=0.32) was observed and the inter-reader agreement was excellent (K=0.96 and 0.97 respectively). Concerning serosal colonic lesions, a significant difference was observed (P=0.01) and the inter-reader agreement was poor (K=0). Rectal and vaginal filling is not necessary for endometriosis staging with MRI, regardless of reader experience. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
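
    The inter-reader agreement values (K) quoted above can be illustrated with a minimal sketch, assuming K denotes Cohen's kappa (the abstract does not name the statistic). The reader scores below are hypothetical and serve only to show the calculation.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(reader_1: Sequence[Hashable], reader_2: Sequence[Hashable]) -> float:
    """Cohen's kappa for two readers scoring the same cases."""
    if len(reader_1) != len(reader_2) or len(reader_1) == 0:
        raise ValueError("both readers must score the same non-empty set of cases")
    n = len(reader_1)
    # Observed agreement: share of cases with identical scores.
    observed = sum(a == b for a, b in zip(reader_1, reader_2)) / n
    # Chance agreement: product of the readers' marginal label frequencies.
    counts_1 = Counter(reader_1)
    counts_2 = Counter(reader_2)
    expected = sum(counts_1[label] * counts_2[label] for label in counts_1) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical lesion scores (1 = lesion present, 0 = absent) for ten patients.
senior = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
junior = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
kappa = cohen_kappa(senior, junior)
print(f"kappa = {kappa:.2f}")
```

    Kappa equals 0 when the observed agreement is no better than chance, which is how a value of K=0 can coexist with readers who agree on some cases.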

  15. Nonlinear analysis and synthesis of video images using deep dynamic bottleneck neural networks for face recognition.

    PubMed

    Moghadam, Saeed Montazeri; Seyyedsalehi, Seyyed Ali

    2018-05-31

    Nonlinear components extracted from deep structures of bottleneck neural networks exhibit a great ability to express input space in a low-dimensional manifold. Sharing and combining the components boost the capability of the neural networks to synthesize and interpolate new and imaginary data. This synthesis is possibly a simple model of imagination in the human brain, where the components are expressed in a nonlinear low-dimensional manifold. The current paper introduces a novel Dynamic Deep Bottleneck Neural Network to analyze and extract three main features of videos regarding the expression of emotions on the face. These main features are identity, emotion and expression intensity, which lie in three different sub-manifolds of one nonlinear general manifold. The proposed model, which has the advantages of recurrent networks, was used to analyze the sequence and dynamics of information in videos. Notably, this model also has the potential to synthesize new videos showing variations of one specific emotion on the face of unknown subjects. Experiments on discrimination and recognition ability of extracted components showed that the proposed model has an average of 97.77% accuracy in recognition of six prominent emotions (Fear, Surprise, Sadness, Anger, Disgust, and Happiness), and 78.17% accuracy in the recognition of intensity. The produced videos revealed variations from neutral to the apex of an emotion on the face of the unfamiliar test subject, with an average similarity of 0.8 to the reference videos on the SSIM scale. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. Graphical classification of DNA sequences of HLA alleles by deep learning.

    PubMed

    Miyake, Jun; Kaneshita, Yuhei; Asatani, Satoshi; Tagawa, Seiichi; Niioka, Hirohiko; Hirano, Takashi

    2018-04-01

    Alleles of human leukocyte antigen (HLA)-A DNA are classified and displayed graphically using a deep learning method (a stacked autoencoder). Nucleotide sequence data 822 bp in length, collected from the Immuno Polymorphism Database, were compressed to a two-dimensional representation and plotted. The profiles of the two-dimensional plots indicate that the alleles form clusters and can therefore be classified. The two-dimensional plot of HLA-A DNA gives a clear overview for characterizing the various alleles.

  17. Enhanced arbovirus surveillance with deep sequencing: Identification of novel rhabdoviruses and bunyaviruses in Australian mosquitoes.

    PubMed

    Coffey, Lark L; Page, Brady L; Greninger, Alexander L; Herring, Belinda L; Russell, Richard C; Doggett, Stephen L; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L

    2014-01-05

    Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥ 50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. © 2013 Elsevier Inc. All rights reserved.

  18. 3' terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing.

    PubMed

    Goldfarb, Katherine C; Cech, Thomas R

    2013-09-21

    Post-transcriptional 3' end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3' RACE coupled with high-throughput sequencing to characterize the 3' terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. The 3' terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3' terminus of an in vitro transcribed MRP RNA control and the differing 3' terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). 3' RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3' terminal sequences of noncoding RNAs.

  19. Multiplicity and molecular epidemiology of Plasmodium vivax and Plasmodium falciparum infections in East Africa.

    PubMed

    Zhong, Daibin; Lo, Eugenia; Wang, Xiaoming; Yewhalaw, Delenasaw; Zhou, Guofa; Atieli, Harrysone E; Githeko, Andrew; Hemming-Schroeder, Elizabeth; Lee, Ming-Chieh; Afrane, Yaw; Yan, Guiyun

    2018-05-02

    Parasite genetic diversity and multiplicity of infection (MOI) affect clinical outcomes, response to drug treatment and naturally-acquired or vaccine-induced immunity. Traditional methods often underestimate the frequency and diversity of multiclonal infections due to technical sensitivity and specificity. Next-generation sequencing techniques provide a novel opportunity to study complexity of parasite populations and molecular epidemiology. Symptomatic and asymptomatic Plasmodium vivax samples were collected from health centres/hospitals and schools, respectively, from 2011 to 2015 in Ethiopia. Similarly, both symptomatic and asymptomatic Plasmodium falciparum samples were collected, respectively, from hospitals and schools in 2005 and 2015 in Kenya. Finger-pricked blood samples were collected and dried on filter paper. Long amplicon (> 400 bp) deep sequencing of merozoite surface protein 1 (msp1) gene was conducted to determine multiplicity and molecular epidemiology of P. vivax and P. falciparum infections. The results were compared with those based on short amplicon (117 bp) deep sequencing. A total of 139 P. vivax and 222 P. falciparum samples were pyro-sequenced for pvmsp1 and pfmsp1, yielding a total of 21 P. vivax and 99 P. falciparum predominant haplotypes. The average MOI for P. vivax and P. falciparum were 2.16 and 2.68, respectively, which were significantly higher than that of microsatellite markers and short amplicon (117 bp) deep sequencing. Multiclonal infections were detected in 62.2% of the samples for P. vivax and 74.8% of the samples for P. falciparum. Four out of the five subjects with recurrent P. vivax malaria were found to be a relapse 44-65 days after clearance of parasites. No difference was observed in MOI among P. vivax patients of different symptoms, ages and genders. Similar patterns were also observed in P. falciparum except for one study site in Kenyan lowland areas with significantly higher MOI. 
    The study used a novel method to evaluate Plasmodium MOI and molecular epidemiological patterns by long amplicon ultra-deep sequencing. The complexity of infections was similar among age groups, symptoms, genders, transmission settings (spatial heterogeneity), as well as over years (pre- vs. post-scale-up interventions). This study demonstrated that long amplicon deep sequencing is a useful tool to investigate multiplicity and molecular epidemiology of Plasmodium parasite infections.
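
    The summary statistics reported above (average MOI and the share of multiclonal infections) can be sketched as follows, assuming MOI is counted as the number of distinct haplotypes detected in a sample. The haplotype tables, the 1% minimum-frequency cutoff and the function names are hypothetical and do not reproduce the study's pipeline.

```python
from typing import Dict, Tuple


def sample_moi(haplotype_freqs: Dict[str, float], min_freq: float = 0.01) -> int:
    """Number of haplotypes at or above the (assumed) minimum within-sample frequency."""
    return sum(1 for freq in haplotype_freqs.values() if freq >= min_freq)


def summarize_moi(samples: Dict[str, Dict[str, float]],
                  min_freq: float = 0.01) -> Tuple[float, float]:
    """Return (mean MOI, fraction of samples with more than one haplotype)."""
    if not samples:
        raise ValueError("at least one sample is required")
    mois = [sample_moi(freqs, min_freq) for freqs in samples.values()]
    mean = sum(mois) / len(mois)
    multiclonal = sum(1 for m in mois if m > 1) / len(mois)
    return mean, multiclonal


# Hypothetical per-sample haplotype frequencies from amplicon reads.
samples = {
    "S1": {"hapA": 1.0},
    "S2": {"hapA": 0.6, "hapB": 0.395, "hapC": 0.005},  # hapC falls below the cutoff
    "S3": {"hapB": 0.5, "hapD": 0.3, "hapE": 0.2},
}
mean_moi, multiclonal_fraction = summarize_moi(samples)
print(f"mean MOI = {mean_moi:.2f}, multiclonal = {multiclonal_fraction:.1%}")
```

    The cutoff matters: lowering it counts more rare haplotypes and raises the MOI, which is one reason estimates from different methods are not directly comparable.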

  20. Identification and Functional Analysis of Flowering Related microRNAs in Common Wild Rice (Oryza rufipogon Griff.)

    PubMed Central

    Dong, Yibo; Yuan, Qianhua; Wang, Feng; Li, Weimin; Jiang, Ying; Jia, Shirong; Pei, XinWu

    2013-01-01

    Background MicroRNAs (miRNAs) are a class of non-coding RNAs involved in post-transcriptional control of gene expression, via degradation and/or translational inhibition. Six-hundred sixty-one rice miRNAs are known that are important in plant development. However, flowering-related miRNAs have not been characterized in Oryza rufipogon Griff. This work was approved by the supervision department of Guangdong wild rice protection. We analyzed flowering-related miRNAs in O. rufipogon using high-throughput sequencing (deep sequencing) to understand the changes that occurred during rice domestication, and to elucidate their functions in flowering. Results Three O. rufipogon sRNA libraries, two vegetative stage (CWR-V1 and CWR-V2) and one flowering stage (CWR-F2), were sequenced using Illumina deep sequencing. A total of 20,156,098, 21,531,511 and 20,995,942 high quality sRNA reads were obtained from CWR-V1, CWR-V2 and CWR-F2, respectively, of which 3,448,185, 4,265,048 and 2,833,527 reads matched known miRNAs. We identified 512 known rice miRNAs in 214 miRNA families and predicted 290 new miRNAs. Targeted functional annotation, GO and KEGG pathway analyses predicted that 187 miRNAs regulate expression of flowering-related genes. Differential expression analysis of flowering-related miRNAs showed that expression of 95 miRNAs varied significantly between the libraries; 66 are flowering-related miRNAs, such as oru-miR97, oru-miR117, oru-miR135 and oru-miR137, among others, and 17 are early-flowering-related miRNAs, including osa-miR160f, osa-miR164d, osa-miR167d, osa-miR169a, osa-miR172b and oru-miR4, among others, induced during the floral transition. Real-time PCR revealed the same expression patterns as deep sequencing. miRNA targets were confirmed for cleavage by 5′-RACE in vivo, and were negatively regulated by miRNAs. Conclusions This is the first investigation of flowering miRNAs in wild rice.
    The results indicate that variation in miRNAs occurred during rice domestication and lay a foundation for further study of phase change and flowering in O. rufipogon. Complicated regulatory networks mediated by multiple miRNAs regulate the expression of flowering genes that control the induction of flowering. PMID:24386120

  1. A Deep-Coverage Tomato BAC Library and Prospects Toward Development of an STC Framework for Genome Sequencing

    PubMed Central

    Budiman, Muhammad A.; Mao, Long; Wood, Todd C.; Wing, Rod A.

    2000-01-01

    Recently a new strategy using BAC end sequences as sequence-tagged connectors (STCs) was proposed for whole-genome sequencing projects. In this study, we present the construction and detailed characterization of a 15.0 haploid genome equivalent BAC library for the cultivated tomato, Lycopersicon esculentum cv. Heinz 1706. The library contains 129,024 clones with an average insert size of 117.5 kb and a chloroplast content of 1.11%. BAC end sequences from 1490 ends were generated and analyzed as a preliminary evaluation for using this library to develop an STC framework to sequence the tomato genome. A total of 1205 BAC end sequences (80.9%) were obtained, with an average length of 360 high-quality bases, and were searched against the GenBank database. Using a cutoff expectation value of <10^-6, and combining the results from BLASTN, BLASTX, and TBLASTX searches, 24.3% of the BAC end sequences were similar to known sequences, of which almost half (48.7%) share sequence similarities to retrotransposons and 7% to known genes. Some of the transposable element sequences were the first reported in tomato, such as sequences similar to maize transposon Activator (Ac) ORF and tobacco pararetrovirus-like sequences. Interestingly, there were no BAC end sequences similar to the highly repeated TGRI and TGRII elements. However, the majority (70.3%) of STCs did not share significant sequence similarities to any sequences in GenBank at either the DNA or predicted protein levels, indicating that a large portion of the tomato genome is still unknown. Our data demonstrate that this BAC library is suitable for developing an STC database to sequence the tomato genome. The advantages of developing an STC framework for whole-genome sequencing of tomato are discussed. [The BAC end sequences described in this paper have been deposited in the GenBank data library under accession nos. AQ367111–AQ368361.] PMID:10645957
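
    The library figures quoted above are linked by simple arithmetic, sketched here. The abstract does not state the haploid genome size, so the value computed below is only what the reported clone count, insert size and 15.0-fold coverage jointly imply; it ignores the chloroplast fraction and is an assumption of this sketch, not a figure from the paper.

```python
# Figures taken from the abstract.
CLONES = 129_024
MEAN_INSERT_KB = 117.5
GENOME_EQUIVALENTS = 15.0
ENDS_ATTEMPTED = 1490
ENDS_OBTAINED = 1205

# Total cloned DNA in the library (kb).
total_insert_kb = CLONES * MEAN_INSERT_KB

# Haploid genome size implied by coverage = total insert / genome size.
# Back-calculated for illustration only; not reported in the abstract.
implied_genome_kb = total_insert_kb / GENOME_EQUIVALENTS

# Success rate of BAC end sequencing.
success_rate = ENDS_OBTAINED / ENDS_ATTEMPTED

print(f"total insert DNA: {total_insert_kb:,.0f} kb")
print(f"implied haploid genome: {implied_genome_kb / 1000:,.1f} Mb")
print(f"end-sequencing success: {success_rate:.1%}")
```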

  2. Complete genome sequence of a novel genotype of squash mosaic virus

    USDA-ARS's Scientific Manuscript database

    Complete genome sequence of a novel genotype of Squash mosaic virus (SqMV) infecting squash plants in Spain was obtained using deep sequencing of small ribonucleic acids and assembly. The low nucleotide sequence identities, with 87-88% on RNA1 and 84-86% on RNA2 to known SqMV isolates, suggest a new...

  3. First complete genome sequence of an emerging cucumber green mottle mosaic virus isolate in North America

    USDA-ARS's Scientific Manuscript database

    The complete genome sequence (6,423 nt) of an emerging Cucumber green mottle mosaic virus (CGMMV) isolate on cucumber in North America was determined through deep sequencing of sRNA and rapid amplification of cDNA ends. It shares 99% nucleotide sequence identity to the Asian genotype, but only 90% t...

  4. A Comprehensive Phylogenetic Analysis of the Scleractinia (Cnidaria, Anthozoa) Based on Mitochondrial CO1 Sequence Data

    PubMed Central

    Kitahara, Marcelo V.; Cairns, Stephen D.; Stolarski, Jarosław; Blair, David; Miller, David J.

    2010-01-01

    Background Classical morphological taxonomy places the approximately 1400 recognized species of Scleractinia (hard corals) into 27 families, but many aspects of coral evolution remain unclear despite the application of molecular phylogenetic methods. In part, this may be a consequence of such studies focusing on the reef-building (shallow water and zooxanthellate) Scleractinia, and largely ignoring the large number of deep-sea species. To better understand broad patterns of coral evolution, we generated molecular data for a broad and representative range of deep sea scleractinians collected off New Caledonia and Australia during the last decade, and conducted the most comprehensive molecular phylogenetic analysis to date of the order Scleractinia. Methodology Partial (595 bp) sequences of the mitochondrial cytochrome oxidase subunit 1 (CO1) gene were determined for 65 deep-sea (azooxanthellate) scleractinians and 11 shallow-water species. These new data were aligned with 158 published sequences, generating a 234 taxon dataset representing 25 of the 27 currently recognized scleractinian families. Principal Findings/Conclusions There was a striking discrepancy between the taxonomic validity of coral families consisting predominantly of deep-sea or shallow-water species. Most families composed predominantly of deep-sea azooxanthellate species were monophyletic in both maximum likelihood and Bayesian analyses but, by contrast (and consistent with previous studies), most families composed predominantly of shallow-water zooxanthellate taxa were polyphyletic, although Acroporidae, Poritidae, Pocilloporidae, and Fungiidae were exceptions to this general pattern. One factor contributing to this inconsistency may be the greater environmental stability of deep-sea environments, effectively removing taxonomic “noise” contributed by phenotypic plasticity. 
    Our phylogenetic analyses imply that the most basal extant scleractinians are azooxanthellate solitary corals from deep-water, their divergence predating that of the robust and complex corals. Deep-sea corals are likely to be critical to understanding anthozoan evolution and the origins of the Scleractinia. PMID:20628613

  5. Development of a candidate reference material for adventitious virus detection in vaccine and biologicals manufacturing by deep sequencing

    PubMed Central

    Mee, Edward T.; Preston, Mark D.; Minor, Philip D.; Schepelmann, Silke; Huang, Xuening; Nguyen, Jenny; Wall, David; Hargrove, Stacey; Fu, Thomas; Xu, George; Li, Li; Cote, Colette; Delwart, Eric; Li, Linlin; Hewlett, Indira; Simonyan, Vahan; Ragupathy, Viswanath; Alin, Voskanian-Kordi; Mermod, Nicolas; Hill, Christiane; Ottenwälder, Birgit; Richter, Daniel C.; Tehrani, Arman; Jacqueline, Weber-Lehmann; Cassart, Jean-Pol; Letellier, Carine; Vandeputte, Olivier; Ruelle, Jean-Louis; Deyati, Avisek; La Neve, Fabio; Modena, Chiara; Mee, Edward; Schepelmann, Silke; Preston, Mark; Minor, Philip; Eloit, Marc; Muth, Erika; Lamamy, Arnaud; Jagorel, Florence; Cheval, Justine; Anscombe, Catherine; Misra, Raju; Wooldridge, David; Gharbia, Saheer; Rose, Graham; Ng, Siemon H.S.; Charlebois, Robert L.; Gisonni-Lex, Lucy; Mallet, Laurent; Dorange, Fabien; Chiu, Charles; Naccache, Samia; Kellam, Paul; van der Hoek, Lia; Cotten, Matt; Mitchell, Christine; Baier, Brian S.; Sun, Wenping; Malicki, Heather D.

    2016-01-01

    Background Unbiased deep sequencing offers the potential for improved adventitious virus screening in vaccines and biotherapeutics. Successful implementation of such assays will require appropriate control materials to confirm assay performance and sensitivity. Methods A common reference material containing 25 target viruses was produced and 16 laboratories were invited to process it using their preferred adventitious virus detection assay. Results Fifteen laboratories returned results, obtained using a wide range of wet-lab and informatics methods. Six of 25 target viruses were detected by all laboratories, with the remaining viruses detected by 4–14 laboratories. Six non-target viruses were detected by three or more laboratories. Conclusion The study demonstrated that a wide range of methods are currently used for adventitious virus detection screening in biological products by deep sequencing and that they can yield significantly different results. This underscores the need for common reference materials to ensure satisfactory assay performance and enable comparisons between laboratories. PMID:26709640

  6. Intelligent fault diagnosis of rolling bearings using an improved deep recurrent neural network

    NASA Astrophysics Data System (ADS)

    Jiang, Hongkai; Li, Xingqiu; Shao, Haidong; Zhao, Ke

    2018-06-01

    Traditional intelligent fault diagnosis methods for rolling bearings heavily depend on manual feature extraction and feature selection. For this purpose, an intelligent deep learning method, named the improved deep recurrent neural network (DRNN), is proposed in this paper. Firstly, frequency spectrum sequences are used as inputs to reduce the input size and ensure good robustness. Secondly, DRNN is constructed by the stacks of the recurrent hidden layer to automatically extract the features from the input spectrum sequences. Thirdly, an adaptive learning rate is adopted to improve the training performance of the constructed DRNN. The proposed method is verified with experimental rolling bearing data, and the results confirm that the proposed method is more effective than traditional intelligent fault diagnosis methods.

  7. Asymmetric purine-pyrimidine distribution in cellular small RNA population of papaya

    PubMed Central

    2012-01-01

    Background The small RNAs (sRNA) are a regulatory class of RNA mainly represented by the 21 and 24-nucleotide size classes. The cellular sRNAs are processed by RNase III family enzyme dicer (Dicer like in plant) from a self-complementary hairpin loop or other type of RNA duplexes. The papaya genome has been sequenced, but its microRNAs and other regulatory RNAs are yet to be analyzed. Results We analyzed the genomic features of the papaya sRNA population from three sRNA deep sequencing libraries made from leaves, flowers, and leaves infected with Papaya Ringspot Virus (PRSV). We also used the deep sequencing data to annotate the micro RNA (miRNA) in papaya. We identified 60 miRNAs, 24 of which were conserved in other species, and 36 of which were novel miRNAs specific to papaya. In contrast to the Chargaff’s purine-pyrimidine equilibrium, cellular sRNA was significantly biased towards a purine rich population. Of the two purine bases, higher frequency of adenine was present in 23nt or longer sRNAs, while 22nt or shorter sRNAs were over represented by guanine bases. However, this bias was not observed in the annotated miRNAs in plants. The 21nt species were expressed from fewer loci but expressed at higher levels relative to the 24nt species. The highly expressed 21nt species were clustered in a few isolated locations of the genome. The PRSV infected leaves showed higher accumulation of 21 and 22nt sRNA compared to uninfected leaves. We observed higher accumulation of miRNA* of seven annotated miRNAs in virus-infected tissue, indicating the potential function of miRNA* under stressed conditions. Conclusions We have identified 60 miRNAs in papaya. Our study revealed the asymmetric purine-pyrimidine distribution in cellular sRNA population. The 21nt species of sRNAs have higher expression levels than 24nt sRNA. The miRNA* of some miRNAs shows higher accumulation in PRSV infected tissues, suggesting that these strands are not totally functionally redundant. 
    The findings open a new avenue for further investigation of the sRNA silencing pathway in plants. PMID:23216749

  8. Asymmetric purine-pyrimidine distribution in cellular small RNA population of papaya.

    PubMed

    Aryal, Rishi; Yang, Xiaozeng; Yu, Qingyi; Sunkar, Ramanjulu; Li, Lei; Ming, Ray

    2012-12-05

    The small RNAs (sRNA) are a regulatory class of RNA mainly represented by the 21 and 24-nucleotide size classes. The cellular sRNAs are processed by RNase III family enzyme dicer (Dicer like in plant) from a self-complementary hairpin loop or other type of RNA duplexes. The papaya genome has been sequenced, but its microRNAs and other regulatory RNAs are yet to be analyzed. We analyzed the genomic features of the papaya sRNA population from three sRNA deep sequencing libraries made from leaves, flowers, and leaves infected with Papaya Ringspot Virus (PRSV). We also used the deep sequencing data to annotate the micro RNA (miRNA) in papaya. We identified 60 miRNAs, 24 of which were conserved in other species, and 36 of which were novel miRNAs specific to papaya. In contrast to the Chargaff's purine-pyrimidine equilibrium, cellular sRNA was significantly biased towards a purine rich population. Of the two purine bases, higher frequency of adenine was present in 23nt or longer sRNAs, while 22nt or shorter sRNAs were over represented by guanine bases. However, this bias was not observed in the annotated miRNAs in plants. The 21nt species were expressed from fewer loci but expressed at higher levels relative to the 24nt species. The highly expressed 21nt species were clustered in a few isolated locations of the genome. The PRSV infected leaves showed higher accumulation of 21 and 22nt sRNA compared to uninfected leaves. We observed higher accumulation of miRNA* of seven annotated miRNAs in virus-infected tissue, indicating the potential function of miRNA* under stressed conditions. We have identified 60 miRNAs in papaya. Our study revealed the asymmetric purine-pyrimidine distribution in cellular sRNA population. The 21nt species of sRNAs have higher expression levels than 24nt sRNA. The miRNA* of some miRNAs shows higher accumulation in PRSV infected tissues, suggesting that these strands are not totally functionally redundant. 
    The findings open a new avenue for further investigation of the sRNA silencing pathway in plants.
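
    The purine-pyrimidine comparison by size class described above can be sketched as a base count grouped by read length. The reads below are synthetic and the function names are hypothetical; a real analysis would read millions of sRNA sequences from a sequencing file and weight them by abundance.

```python
from collections import defaultdict
from typing import Dict, Iterable


def base_composition_by_length(reads: Iterable[str]) -> Dict[int, Dict[str, int]]:
    """Count A, C, G and U per read length (T is treated as U)."""
    counts: Dict[int, Dict[str, int]] = defaultdict(
        lambda: {"A": 0, "C": 0, "G": 0, "U": 0}
    )
    for read in reads:
        seq = read.upper().replace("T", "U")
        bucket = counts[len(seq)]
        for base in seq:
            if base in bucket:  # ignore ambiguous bases such as N
                bucket[base] += 1
    return dict(counts)


def purine_fraction(composition: Dict[str, int]) -> float:
    """Share of purines (A + G) among all counted bases."""
    total = sum(composition.values())
    if total == 0:
        raise ValueError("no bases counted")
    return (composition["A"] + composition["G"]) / total


# Synthetic reads built so the counts are easy to verify by hand.
reads = [
    "G" * 12 + "C" * 9,   # 21 nt, guanine-rich
    "A" * 18 + "U" * 6,   # 24 nt, adenine-rich
]
composition = base_composition_by_length(reads)
for length in sorted(composition):
    print(length, f"{purine_fraction(composition[length]):.2f}")
```

    A purine fraction near 0.5 in every size class would be the unbiased expectation; values consistently above it correspond to the purine-rich bias the abstract reports.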

  9. Position-specific binding of FUS to nascent RNA regulates mRNA length

    PubMed Central

    Masuda, Akio; Takeda, Jun-ichi; Okuno, Tatsuya; Okamoto, Takaaki; Ohkawara, Bisei; Ito, Mikako; Ishigaki, Shinsuke; Sobue, Gen

    2015-01-01

    More than half of all human genes produce prematurely terminated polyadenylated short mRNAs. However, the underlying mechanisms remain largely elusive. CLIP-seq (cross-linking immunoprecipitation [CLIP] combined with deep sequencing) of FUS (fused in sarcoma) in neuronal cells showed that FUS is frequently clustered around an alternative polyadenylation (APA) site of nascent RNA. ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing) of RNA polymerase II (RNAP II) demonstrated that FUS stalls RNAP II and prematurely terminates transcription. When an APA site is located upstream of an FUS cluster, FUS enhances polyadenylation by recruiting CPSF160 and up-regulates the alternative short transcript. In contrast, when an APA site is located downstream from an FUS cluster, polyadenylation is not activated, and the RNAP II-suppressing effect of FUS leads to down-regulation of the alternative short transcript. CAGE-seq (cap analysis of gene expression [CAGE] combined with deep sequencing) and PolyA-seq (a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts) revealed that position-specific regulation of mRNA lengths by FUS is operational in two-thirds of transcripts in neuronal cells, with enrichment in genes involved in synaptic activities. PMID:25995189

  10. Deep sequencing detects very-low-grade somatic mosaicism in the unaffected mother of siblings with nemaline myopathy.

    PubMed

    Miyatake, Satoko; Koshimizu, Eriko; Hayashi, Yukiko K; Miya, Kazushi; Shiina, Masaaki; Nakashima, Mitsuko; Tsurusaki, Yoshinori; Miyake, Noriko; Saitsu, Hirotomo; Ogata, Kazuhiro; Nishino, Ichizo; Matsumoto, Naomichi

    2014-07-01

    When an expected mutation in a particular disease-causing gene is not identified in a suspected carrier, it is usually assumed to be due to germline mosaicism. We report here very-low-grade somatic mosaicism in ACTA1 in an unaffected mother of two siblings affected with a neonatal form of nemaline myopathy. The mosaicism was detected by deep resequencing using a next-generation sequencer. We identified a novel heterozygous mutation in ACTA1, c.448A>G (p.Thr150Ala), in the affected siblings. Three-dimensional structural modeling suggested that this mutation may affect polymerization and/or actin's interactions with other proteins. In this family, we expected autosomal dominant inheritance with either parent demonstrating germline or somatic mosaicism. Sanger sequencing identified no mutation. However, further deep resequencing of this mutation on a next-generation sequencer identified very-low-grade somatic mosaicism in the mother: 0.4%, 1.1%, and 8.3% in the saliva, blood leukocytes, and nails, respectively. Our study demonstrates the possibility of very-low-grade somatic mosaicism in suspected carriers, rather than germline mosaicism. Copyright © 2014 Elsevier B.V. All rights reserved.

  11. Insights about minority HIV-1 strains in transmitted drug resistance mutation dynamics and disease progression.

    PubMed

    Leda, Ana Rachel; Hunter, James; Oliveira, Ursula Castro; Azevedo, Inacio Junqueira; Sucupira, Maria Cecilia Araripe; Diaz, Ricardo Sobhie

    2018-04-19

    The presence of minority transmitted drug resistance mutations was assessed using ultra-deep sequencing and correlated with disease progression among recently HIV-1-infected individuals from Brazil. Samples at baseline during recent infection and 1 year after the establishment of the infection were analysed. Viral RNA and proviral DNA from 25 individuals were subjected to ultra-deep sequencing of the reverse transcriptase and protease regions of HIV-1. Viral strains carrying transmitted drug resistance mutations were detected in 9 out of the 25 patients, for all major antiretroviral classes, ranging from one to five mutations per patient. Ultra-deep sequencing detected strains with frequencies as low as 1.6% and only strains with frequencies >20% were detected by population plasma sequencing (three patients). Transmitted drug resistance strains with frequencies <14.8% did not persist upon established infection. The presence of transmitted drug resistance mutations was negatively correlated with the viral load and with CD4+ T cell count decay. Transmitted drug resistance mutations representing small percentages of the viral population do not persist during infection because they are negatively selected in the first year after HIV-1 seroconversion.

  12. GenomeGems: evaluation of genetic variability from deep sequencing data

    PubMed Central

    2012-01-01

    Background Detection of disease-causing mutations using Deep Sequencing technologies poses great challenges. In particular, organizing the large number of sequences generated so that mutations that might be biologically relevant are easily identified is a difficult task. Yet only a limited number of accessible automatic tools exist for this task. Findings We developed GenomeGems to fill this need by enabling the user to view and compare Single Nucleotide Polymorphisms (SNPs) from multiple datasets and to load the data onto the UCSC Genome Browser for an expanded and familiar visualization. As such, via automatic, clear and accessible presentation of processed Deep Sequencing data, our tool aims to facilitate ranking of genomic SNP calls. GenomeGems runs on a local Personal Computer (PC) and is freely available at http://www.tau.ac.il/~nshomron/GenomeGems. Conclusions GenomeGems enables researchers to identify potential disease-causing SNPs in an efficient manner. This enables rapid turnover of information and leads to further experimental SNP validation. The tool allows the user to compare and visualize SNPs from multiple experiments and to easily load SNP data onto the UCSC Genome browser for further detailed information. PMID:22748151

  13. Feasibility of 3.0T pelvic MR imaging in the evaluation of endometriosis.

    PubMed

    Manganaro, L; Fierro, F; Tomei, A; Irimia, D; Lodise, P; Sergi, M E; Vinci, V; Sollazzo, P; Porpora, M G; Delfini, R; Vittori, G; Marini, M

    2012-06-01

    Endometriosis represents an important clinical problem in women of reproductive age with high impact on quality of life, work productivity and health care management. The aim of this study is to define the role of 3T magnetom system MRI in the evaluation of endometriosis. Forty-six women with a transvaginal (TV) ultrasound examination positive for endometriosis, with pelvic pain, or with infertility underwent a 3.0T MR examination with the following protocol: T2-weighted FRFSE HR sequences, T2-weighted FRFSE HR CUBE 3D sequences, T1-weighted FSE sequences, and LAVA-flex sequences. Pelvic anatomy, macroscopic endometriosis implants, deep endometriosis implants, fallopian tube involvement, presence of adhesions, fluid effusion in the pouch of Douglas, associated uterine and kidney pathologies or anomalies, and sacral nerve roots were assessed by two radiologists in consensus. Laparoscopy was considered the gold standard. MRI diagnosed deep endometriosis in 22/46 patients and endometriomas not associated with deep implants in 9/46 patients; 15/46 patients were negative for endometriosis. Eleven of the 22 patients with deep endometriosis also had ovarian endometriotic cysts. We obtained high sensitivity (96.97%), specificity (100.00%), positive predictive value (100.00%) and negative predictive value (92.86%). Pelvic MRI performed with a 3T system guarantees high spatial and contrast resolution, providing accurate information about endometriosis implants, with good pre-surgical mapping of lesions involving the bowel, the bladder surface and the recto-uterine ligaments. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
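
    The four accuracy measures quoted above follow from a 2x2 table of the imaging result against the laparoscopic reference. The abstract does not give that table, so the counts in this sketch are hypothetical: they were chosen only because they happen to reproduce the reported percentages, and they should not be read as the study's data.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity and predictive values from a 2x2 table."""
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("counts must be non-negative")
    if (tp + fn) == 0 or (tn + fp) == 0 or (tp + fp) == 0 or (tn + fn) == 0:
        raise ValueError("every margin of the table must be non-zero")
    return {
        "sensitivity": tp / (tp + fn),  # diseased cases correctly detected
        "specificity": tn / (tn + fp),  # disease-free cases correctly cleared
        "ppv": tp / (tp + fp),          # positive results that are true
        "npv": tn / (tn + fn),          # negative results that are true
    }


# Hypothetical counts (assumption, not from the abstract).
metrics = diagnostic_metrics(tp=32, fp=0, fn=1, tn=13)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

    Note that a specificity and positive predictive value of exactly 100% simply mean no false positives were recorded; with small samples such values carry wide uncertainty.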

  14. ComplexContact: a web server for inter-protein contact prediction using deep learning.

    PubMed

    Zeng, Hong; Wang, Sheng; Zhou, Tianming; Zhao, Feifeng; Li, Xiufeng; Wu, Qing; Xu, Jinbo

    2018-05-22

    ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based interfacial residue-residue contact prediction of a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complexes and interact at the residue level. When receiving a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSA), then it applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from paired MSAs and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.

  15. Mixed heterolobosean and novel gregarine lineage genes from culture ATCC 50646: Long-branch artefacts, not lateral gene transfer, distort α-tubulin phylogeny.

    PubMed

    Cavalier-Smith, Thomas

    2015-04-01

    Contradictory and confusing results can arise if sequenced 'monoprotist' samples really contain DNA of very different species. Eukaryote-wide phylogenetic analyses using five genes from the amoeboflagellate culture ATCC 50646 previously implied it was an undescribed percolozoan related to percolatean flagellates (Stephanopogon, Percolomonas). Contrastingly, three phylogenetic analyses of 18S rRNA alone, did not place it within Percolozoa, but as an isolated deep-branching excavate. I resolve that contradiction by sequence phylogenies for all five genes individually, using up to 652 taxa. Its 18S rRNA sequence (GQ377652) is near-identical to one from stained-glass windows, somewhat more distant from one from cooling-tower water, all three related to terrestrial actinocephalid gregarines Hoplorhynchus and Pyxinia. All four protein-gene sequences (Hsp90; α-tubulin; β-tubulin; actin) are from an amoeboflagellate heterolobosean percolozoan, not especially deeply branching. Contrary to previous conclusions from trees combining protein and rRNA sequences or rDNA trees including Eozoa only, this culture does not represent a major novel deep-branching eukaryote lineage distinct from Heterolobosea, and thus lacks special significance for deep eukaryote phylogeny, though the rDNA sequence is important for gregarine phylogeny. α-Tubulin trees for over 250 eukaryotes refute earlier suggestions of lateral gene transfer within eukaryotes, being largely congruent with morphology and other gene trees. Copyright © 2015. Published by Elsevier GmbH.

  16. De Novo Generation and Characterization of New Zika Virus Isolate Using Sequence Data from a Microcephaly Case

    PubMed Central

    Setoh, Yin Xiang; Prow, Natalie A.; Peng, Nias; Hugo, Leon E.; Devine, Gregor; Hazlewood, Jessamine E.

    2017-01-01

    ABSTRACT Zika virus (ZIKV) has recently emerged and is the etiological agent of congenital Zika syndrome (CZS), a spectrum of congenital abnormalities arising from neural tissue infections in utero. Herein, we describe the de novo generation of a new ZIKV isolate, ZIKVNatal, using a modified circular polymerase extension reaction protocol and sequence data obtained from a ZIKV-infected fetus with microcephaly. ZIKVNatal thus has no laboratory passage history and is unequivocally associated with CZS. ZIKVNatal could be used to establish a fetal brain infection model in IFNAR−/− mice (including intrauterine growth restriction) without causing symptomatic infections in dams. ZIKVNatal was also able to be transmitted by Aedes aegypti mosquitoes. ZIKVNatal thus retains key aspects of circulating pathogenic ZIKVs and illustrates a novel methodology for obtaining an authentic functional viral isolate by using data from deep sequencing of infected tissues. IMPORTANCE The major complications of an ongoing Zika virus outbreak in the Americas and Asia are congenital defects caused by the virus’s ability to cross the placenta and infect the fetal brain. The ability to generate molecular tools to analyze viral isolates from the current outbreak is essential for furthering our understanding of how these viruses cause congenital defects. The majority of existing viral isolates and infectious cDNA clones generated from them have undergone various numbers of passages in cell culture and/or suckling mice, which is likely to result in the accumulation of adaptive mutations that may affect viral properties. The approach described herein allows rapid generation of new, fully functional Zika virus isolates directly from deep sequencing data from virus-infected tissues without the need for prior virus passaging and for the generation and propagation of full-length cDNA clones. 
The approach should be applicable to other medically important flaviviruses and perhaps other positive-strand RNA viruses. PMID:28529976

  17. Discovery of Pod Shatter-Resistant Associated SNPs by Deep Sequencing of a Representative Library Followed by Bulk Segregant Analysis in Rapeseed

    PubMed Central

    Huang, Shunmou; Yang, Hongli; Zhan, Gaomiao; Wang, Xinfa; Liu, Guihua; Wang, Hanzhong

    2012-01-01

    Background Single nucleotide polymorphisms (SNPs) are an important class of genetic marker for target gene mapping. As yet, there is no rapid and effective method to identify SNPs linked with agronomic traits in rapeseed and other crop species. Methodology/Principal Findings We demonstrate a novel method for identifying SNP markers in rapeseed by deep sequencing a representative library and performing bulk segregant analysis. With this method, SNPs associated with rapeseed pod shatter-resistance were discovered. Firstly, a reduced representation of the rapeseed genome was used. Genomic fragments ranging from 450–550 bp were prepared from the susceptible bulk (ten F2 plants with the silique shattering resistance index, SSRI <0.10) and the resistance bulk (ten F2 plants with SSRI >0.90), and Solexa sequencing produced 90 bp reads. Approximately 50 million of these sequence reads were assembled into contigs to a depth of 20-fold coverage. Secondly, 60,396 ‘simple SNPs’ were identified, and the statistical significance was evaluated using Fisher's exact test. The 70 associated SNPs with a –log10 p value over 16 were selected for further analysis. The distribution of these SNPs showed a tight cluster of 14 associated SNPs within a 396 kb region on chromosome A09. Our evidence indicates that this region contains a major quantitative trait locus (QTL). Finally, two associated SNPs from this region were mapped on a major QTL region. Conclusions/Significance 70 associated SNPs were discovered and a major QTL for rapeseed pod shatter-resistance was found on chromosome A09 using our novel method. The associated SNP markers were used for mapping of the QTL, and may be useful for improving pod shatter-resistance in rapeseed through marker-assisted selection and map-based cloning. This approach will accelerate the discovery of major QTLs and the cloning of functional genes for important agronomic traits in rapeseed and other crop species. 
PMID:22529909
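
    As a rough illustration of the significance step in the record above, a SNP can be scored by applying Fisher's exact test to the allele read counts in the two bulks. This is a minimal standard-library sketch, not the authors' pipeline; the function name and the read counts are hypothetical.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical read counts at one SNP (rows: bulks, columns: alleles).
resistant_bulk = (38, 2)     # reads carrying allele 1, allele 2
susceptible_bulk = (3, 37)

p_value = fisher_exact_two_sided(*resistant_bulk, *susceptible_bulk)
print(f"p = {p_value:.3e}, -log10 p = {-log10(p_value):.1f}")
```

    A SNP would then be kept if its -log10 p exceeded the chosen cutoff (16 in the abstract).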

  18. De Novo Generation and Characterization of New Zika Virus Isolate Using Sequence Data from a Microcephaly Case.

    PubMed

    Setoh, Yin Xiang; Prow, Natalie A; Peng, Nias; Hugo, Leon E; Devine, Gregor; Hazlewood, Jessamine E; Suhrbier, Andreas; Khromykh, Alexander A

    2017-01-01

    Zika virus (ZIKV) has recently emerged and is the etiological agent of congenital Zika syndrome (CZS), a spectrum of congenital abnormalities arising from neural tissue infections in utero. Herein, we describe the de novo generation of a new ZIKV isolate, ZIKVNatal, using a modified circular polymerase extension reaction protocol and sequence data obtained from a ZIKV-infected fetus with microcephaly. ZIKVNatal thus has no laboratory passage history and is unequivocally associated with CZS. ZIKVNatal could be used to establish a fetal brain infection model in IFNAR-/- mice (including intrauterine growth restriction) without causing symptomatic infections in dams. ZIKVNatal was also able to be transmitted by Aedes aegypti mosquitoes. ZIKVNatal thus retains key aspects of circulating pathogenic ZIKVs and illustrates a novel methodology for obtaining an authentic functional viral isolate by using data from deep sequencing of infected tissues. IMPORTANCE The major complications of an ongoing Zika virus outbreak in the Americas and Asia are congenital defects caused by the virus's ability to cross the placenta and infect the fetal brain. The ability to generate molecular tools to analyze viral isolates from the current outbreak is essential for furthering our understanding of how these viruses cause congenital defects. The majority of existing viral isolates and infectious cDNA clones generated from them have undergone various numbers of passages in cell culture and/or suckling mice, which is likely to result in the accumulation of adaptive mutations that may affect viral properties. The approach described herein allows rapid generation of new, fully functional Zika virus isolates directly from deep sequencing data from virus-infected tissues without the need for prior virus passaging and for the generation and propagation of full-length cDNA clones. 
The approach should be applicable to other medically important flaviviruses and perhaps other positive-strand RNA viruses.

  19. High resolution chronology of late Cretaceous-early Tertiary events determined from 21,000 yr orbital-climatic cycles in marine sediments

    NASA Technical Reports Server (NTRS)

    Herbert, Timothy D.; Dhondt, Steven

    1988-01-01

    A number of South Atlantic sites cored by the Deep Sea Drilling Project (DSDP) recovered late Cretaceous and early Tertiary sediments with alternating light-dark, high-low carbonate content. The sedimentary oscillations were turned into time series by digitizing color photographs of core segments at a resolution of about 5 points/cm. Spectral analysis of these records indicates prominent periodicity at 25 to 35 cm in the Cretaceous intervals, and about 15 cm in the early Tertiary sediments. The absolute period of the cycles that is determined from paleomagnetic calibration at two sites is 20,000 to 25,000 yr, and almost certainly corresponds to the period of the earth's precessional cycle. These sequences therefore contain an internal chronometer to measure events across the K/T extinction boundary at this scale of resolution. The orbital metronome was used to address several related questions: the position of the K/T boundary within magnetic chron 29R, the fluxes of biogenic and detrital material to the deep sea immediately before and after the K/T event, the duration of the Sr anomaly, and the level of background climatic variability in the latest Cretaceous time. The carbonate/color cycles that were analyzed contain primary records of ocean carbonate productivity and chemistry, as evidenced by bioturbational mixing of adjacent beds and the weak lithification of the rhythmic sequences. It was concluded that sedimentary sequences that contain orbital cyclicity are capable of providing resolution of dramatic events in earth history with much greater precision than obtainable through radiometric methods. The data show no evidence for a gradual climatic deterioration prior to the K/T extinction event, and argue for a geologically rapid revolution at this horizon.
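
    The spectral step described above can be sketched on synthetic data: a brightness record sampled at 5 points/cm is transformed with a plain discrete Fourier transform and the strongest peak is read off as a cycle thickness. The core length, the 30 cm cycle and the 21 kyr conversion (taken from the record title) are assumptions for illustration, not values from the cores.

```python
import cmath
import math

POINTS_PER_CM = 5        # digitizing resolution quoted in the abstract
CORE_LENGTH_CM = 150     # hypothetical core segment length
TRUE_PERIOD_CM = 30      # hypothetical cycle thickness (within 25 to 35 cm)
CYCLE_DURATION_KYR = 21  # assumed precessional period, from the record title

n = POINTS_PER_CM * CORE_LENGTH_CM
depth_cm = [i / POINTS_PER_CM for i in range(n)]

# Synthetic light/dark record: a constant level plus one sinusoidal cycle.
brightness = [0.5 + 0.2 * math.sin(2 * math.pi * z / TRUE_PERIOD_CM) for z in depth_cm]
mean_level = sum(brightness) / n
centered = [b - mean_level for b in brightness]


def power_at(k):
    """Squared DFT magnitude at k cycles per core segment."""
    total = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(centered))
    return abs(total) ** 2


powers = {k: power_at(k) for k in range(1, 51)}
best_k = max(powers, key=powers.get)
dominant_period_cm = CORE_LENGTH_CM / best_k
sed_rate_cm_per_kyr = dominant_period_cm / CYCLE_DURATION_KYR

print(f"dominant period: {dominant_period_cm:.1f} cm")
print(f"implied sedimentation rate: {sed_rate_cm_per_kyr:.2f} cm/kyr")
```

    Real core photographs would also need detrending and noise handling before a peak could be trusted; the sketch omits both.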

  20. Next-generation sequencing identification of pathogenic bacterial genes and their relationship with fecal indicator bacteria in different water sources in the Kathmandu Valley, Nepal.

    PubMed

    Ghaju Shrestha, Rajani; Tanaka, Yasuhiro; Malla, Bikash; Bhandari, Dinesh; Tandukar, Sarmila; Inoue, Daisuke; Sei, Kazunari; Sherchand, Jeevan B; Haramoto, Eiji

    2017-12-01

    Bacteriological analysis of drinking water leads to detection of only conventional fecal indicator bacteria. This study aimed to explore and characterize bacterial diversity, to understand the extent of pathogenic bacterial contamination, and to examine the relationship between pathogenic bacteria and fecal indicator bacteria in different water sources in the Kathmandu Valley, Nepal. Sixteen water samples were collected from shallow dug wells (n=12), a deep tube well (n=1), a spring (n=1), and rivers (n=2) in September 2014 for 16S rRNA gene next-generation sequencing. A total of 525 genera were identified, of which 81 genera were classified as possible pathogenic bacteria. Acinetobacter, Arcobacter, and Clostridium were detected with a relatively higher abundance (>0.1% of total bacterial genes) in 16, 13, and 5 of the 16 samples, respectively, and the highest abundance ratio of Acinetobacter (85.14%) was obtained in the deep tube well sample. Furthermore, the bla OXA23-like genes of Acinetobacter were detected using SYBR Green-based quantitative PCR in 13 (35%) of 37 water samples, including the 16 samples that were analyzed by next-generation sequencing, with concentrations ranging from 5.3 to 7.5 log copies/100 mL. No sufficient correlation was found between fecal indicator bacteria, such as Escherichia coli and total coliforms, and potential pathogenic bacteria or the bla OXA23-like gene of Acinetobacter. These results suggest the limitation of using conventional fecal indicator bacteria in evaluating the pathogenic bacteria contamination of different water sources in the Kathmandu Valley. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Deep sequencing-based analysis of the anaerobic stimulon in Neisseria gonorrhoeae

    PubMed Central

    2011-01-01

    Background Maintenance of an anaerobic denitrification system in the obligate human pathogen, Neisseria gonorrhoeae, suggests that an anaerobic lifestyle may be important during the course of infection. Furthermore, mounting evidence suggests that reduction of host-produced nitric oxide has several immunomodulatory effects on the host. However, at this point there have been no studies analyzing the complete gonococcal transcriptome response to anaerobiosis. Here we performed deep sequencing to compare the gonococcal transcriptomes of aerobically and anaerobically grown cells. Using the information derived from this sequencing, we discuss the implications of the robust transcriptional response to anaerobic growth. Results We determined that 198 chromosomal genes were differentially expressed (~10% of the genome) in response to anaerobic conditions. We also observed a large induction of genes encoded within the cryptic plasmid, pJD1. Validation of RNA-seq data using translational-lacZ fusions or RT-PCR demonstrated the RNA-seq results to be very reproducible. Surprisingly, many genes of prophage origin were induced anaerobically, as well as several transcriptional regulators previously unknown to be involved in anaerobic growth. We also confirmed expression and regulation of a small RNA, likely a functional equivalent of fnrS in the Enterobacteriaceae family. We also determined that many genes found to be responsive to anaerobiosis have also been shown to be responsive to iron and/or oxidative stress. Conclusions Gonococci will be subject to many forms of environmental stress, including oxygen-limitation, during the course of infection. Here we determined that the anaerobic stimulon in gonococci was larger than previous studies would suggest. Many new targets for future research have been uncovered, and the results derived from this study may have helped to elucidate factors or mechanisms of virulence that may have otherwise been overlooked. PMID:21251255

  2. Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity.

    PubMed

    Kim, Hui Kwon; Min, Seonwoo; Song, Myungjae; Jung, Soobin; Choi, Jae Woo; Kim, Younggwang; Lee, Sangeun; Yoon, Sungroh; Kim, Hyongbum Henry

    2018-03-01

    We present two algorithms to predict the activity of AsCpf1 guide RNAs. Indel frequencies for 15,000 target sequences were used in a deep-learning framework based on a convolutional neural network to train Seq-deepCpf1. We then incorporated chromatin accessibility information to create the better-performing DeepCpf1 algorithm for cell lines for which such information is available and show that both algorithms outperform previous machine learning algorithms on our own and published data sets.

  3. Construction of Pseudomolecule Sequences of the aus Rice Cultivar Kasalath for Comparative Genomics of Asian Cultivated Rice

    PubMed Central

    Sakai, Hiroaki; Kanamori, Hiroyuki; Arai-Kichise, Yuko; Shibata-Hatta, Mari; Ebana, Kaworu; Oono, Youko; Kurita, Kanako; Fujisawa, Hiroko; Katagiri, Satoshi; Mukai, Yoshiyuki; Hamada, Masao; Itoh, Takeshi; Matsumoto, Takashi; Katayose, Yuichi; Wakasa, Kyo; Yano, Masahiro; Wu, Jianzhong

    2014-01-01

    Having a deep genetic structure evolved during its domestication and adaptation, the Asian cultivated rice (Oryza sativa) displays considerable physiological and morphological variations. Here, we describe deep whole-genome sequencing of the aus rice cultivar Kasalath by using the advanced next-generation sequencing (NGS) technologies to gain a better understanding of the sequence and structural changes among highly differentiated cultivars. The de novo assembled Kasalath sequences represented 91.1% (330.55 Mb) of the genome and contained 35 139 expressed loci annotated by RNA-Seq analysis. We detected 2 787 250 single-nucleotide polymorphisms (SNPs) and 7393 large insertion/deletion (indel) sites (>100 bp) between Kasalath and Nipponbare, and 2 216 251 SNPs and 3780 large indels between Kasalath and 93-11. Extensive comparison of the gene contents among these cultivars revealed similar rates of gene gain and loss. We detected at least 7.39 Mb of inserted sequences and 40.75 Mb of unmapped sequences in the Kasalath genome in comparison with the Nipponbare reference genome. Mapping of the publicly available NGS short reads from 50 rice accessions proved the necessity and the value of using the Kasalath whole-genome sequence as an additional reference to capture the sequence polymorphisms that cannot be discovered by using the Nipponbare sequence alone. PMID:24578372

  4. Unified Deep Learning Architecture for Modeling Biology Sequence.

    PubMed

    Wu, Hongjie; Cao, Chengyuan; Xia, Xiaoyan; Lu, Qiang

    2017-10-09

    Prediction of the spatial structure or function of biological macromolecules based on their sequence remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, characteristics, such as long-range interactions between basic units, the complicated and variable output of labeled structures, and the variable length of biological sequences, usually lead to different solutions on a case-by-case basis. This study proposed the use of bidirectional recurrent neural networks based on long short-term memory or a gated recurrent unit to capture long-range interactions by designing the optional reshape operator to adapt to the diversity of the output labels and implementing a training algorithm to support the training of sequence models capable of processing variable-length sequences. Additionally, the merge and pooling operators enhanced the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning model and its training algorithm might be capable of solving currently known biological sequence-modeling problems through the use of a unified framework. We validated our model on one of the most difficult biological sequence-modeling problems currently known, with our results indicating the ability of the model to obtain predictions of protein residue interactions that exceeded the accuracy of current popular approaches by 10% based on multiple benchmarks.

  5. DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations.

    PubMed

    Yuan, Yuchen; Shi, Yi; Li, Changyang; Kim, Jinman; Cai, Weidong; Han, Zeguang; Feng, David Dagan

    2016-12-23

    With the developments of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However in existing SMCC methods, issues like high data sparsity, small volume of sample size, and the application of simple linear classifiers, are major obstacles in improving the classification performance. To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier, that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute in improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. 
Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and cancer types.
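
    The indexed sparsity reduction (ISR) step is described above only as converting gene data into the indexes of its non-zero elements. A minimal sketch of that idea follows; the 1-based indexing, the zero padding to a fixed length and the function name are assumptions made here so the output could feed a fixed-size network input, and may differ from the published method.

```python
def indexed_sparsity_reduction(mutation_vector, max_len):
    """Replace a sparse vector by the 1-based indexes of its non-zero entries.

    The result is truncated or zero-padded to max_len (0 means "no entry").
    """
    indexes = [i + 1 for i, value in enumerate(mutation_vector) if value != 0]
    indexes = indexes[:max_len]
    return indexes + [0] * (max_len - len(indexes))


# Hypothetical sample: 10 genes, mutations present in genes 3, 7 and 9.
sample = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
reduced = indexed_sparsity_reduction(sample, max_len=5)
print(reduced)  # → [3, 7, 9, 0, 0]
```

    The point of the transformation is that a mostly-zero vector of thousands of genes shrinks to a short list that carries the same information.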

  6. Deep-sequencing to resolve complex diversity of apicomplexan parasites in platypuses and echidnas: Proof of principle for wildlife disease investigation.

    PubMed

    Šlapeta, Jan; Saverimuttu, Stefan; Vogelnest, Larry; Sangster, Cheryl; Hulst, Frances; Rose, Karrie; Thompson, Paul; Whittington, Richard

    2017-11-01

    The short-beaked echidna (Tachyglossus aculeatus) and the platypus (Ornithorhynchus anatinus) are iconic egg-laying monotremes (Mammalia: Monotremata) from Australasia. The aim of this study was to demonstrate the utility of diversity profiles in disease investigations of monotremes. Using small subunit (18S) rDNA amplicon deep-sequencing we demonstrated the presence of apicomplexan parasites and confirmed by direct and cloned amplicon gene sequencing Theileria ornithorhynchi, Theileria tachyglossi, Eimeria echidnae and Cryptosporidium fayeri. Using a combination of samples from healthy and diseased animals, we show a close evolutionary relationship between species of coccidia (Eimeria) and piroplasms (Theileria) from the echidna and platypus. The presence of E. echidnae was demonstrated in faeces and tissues affected by disseminated coccidiosis. Moreover, the presence of E. echidnae DNA in the blood of echidnas was associated with atoxoplasma-like stages in white blood cells, suggesting Hepatozoon tachyglossi blood stages are disseminated E. echidnae stages. These next-generation DNA sequencing technologies are suited to material and organisms that have not been previously characterised and for which the material is scarce. The deep sequencing approach supports traditional diagnostic methods, including microscopy, clinical pathology and histopathology, to better define the status quo. This approach is particularly suitable for wildlife disease investigation. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. De Novo Deep Transcriptome Analysis of Medicinal Plants for Gene Discovery in Biosynthesis of Plant Natural Products.

    PubMed

    Han, R; Rai, A; Nakamura, M; Suzuki, H; Takahashi, H; Yamazaki, M; Saito, K

    2016-01-01

    The study of the transcriptome, the entire pool of transcripts in an organism or single cell at a certain physiological or pathological stage, is indispensable in unraveling the connection and regulation between DNA and protein. Before the advent of deep sequencing, microarrays were the main approach to profiling transcripts. Despite obvious shortcomings, including limited dynamic range and difficulty in comparing results from distinct experiments, microarrays were widely applied. During the past decade, next-generation sequencing (NGS) has revolutionized our understanding of genomics in a fast, high-throughput, cost-effective, and tractable manner. Adopting NGS has profoundly improved the efficiency and outcomes of efforts to elucidate the genes responsible for producing active compounds in medicinal plants. The whole process runs from plant material sampling, to cDNA library preparation, to deep sequencing, after which bioinformatics takes over to assemble enormous yet fragmentary data from which to comb and extract information. The unprecedentedly rapid development of such technologies provides so many choices that selecting a suitable methodology for a specific purpose can be confusing. Here, we review the general approaches for deep transcriptome analysis and then focus on their application in discovering biosynthetic pathways of medicinal plants that produce important secondary metabolites. © 2016 Elsevier Inc. All rights reserved.

  8. Persistence and evolution of allergen-specific IgE repertoires during subcutaneous specific immunotherapy

    PubMed Central

    Levin, Mattias; King, Jasmine J.; Glanville, Jacob; Jackson, Katherine J. L.; Looney, Timothy J.; Hoh, Ramona A.; Mari, Adriano; Andersson, Morgan; Greiff, Lennart; Fire, Andrew Z.; Boyd, Scott D.; Ohlin, Mats

    2016-01-01

    Background Specific immunotherapy (SIT) is the only treatment with proven long-term curative potential in allergic disease. Allergen-specific IgE is the causative agent of allergic disease, and antibodies contribute to SIT, but the effects of SIT on aeroallergen-specific B cell repertoires are not well understood. Objective To characterize the IgE sequences expressed by allergen-specific B cells, and track the fate of these B cell clones during SIT. Methods We have used high-throughput antibody gene sequencing and identification of allergen-specific IgE using combinatorial antibody fragment library technology to analyze immunoglobulin repertoires of blood and nasal mucosa of aeroallergen-sensitized individuals before and during the first year of subcutaneous SIT. Results Of 52 distinct allergen-specific IgE heavy chains from eight allergic donors, 37 were also detected by high-throughput antibody gene sequencing of blood, nasal mucosa, or both sample types. The allergen-specific clones had increased persistence, higher likelihood of belonging to clones expressing other switched isotypes, and possibly larger clone size than the rest of the IgE repertoire. Clone members in nasal tissue showed close mutational relationships. Conclusion Combining functional binding studies, deep antibody repertoire sequencing, and information on clinical outcomes in larger studies may in the future aid assessment of SIT mechanisms and efficacy. PMID:26559321

  9. Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea

    PubMed Central

    Goldsmith, Dawn B.; Parsons, Rachel J.; Beyene, Damitu; Salamon, Peter

    2015-01-01

    Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years. PMID:26157645
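
    Grouping reads into OTUs at a 97% identity threshold can be illustrated with a greedy centroid scheme. The sketch below is a simplified stand-in, not the study's method: it uses position-by-position identity on equal-length toy sequences, whereas real pipelines align reads first. All sequences and function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(sequences, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold."""
    centroids = []  # representative sequence of each OTU
    members = []    # indexes of the input sequences in each OTU
    for idx, seq in enumerate(sequences):
        for otu, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                members[otu].append(idx)
                break
        else:
            # No centroid was close enough: this sequence founds a new OTU.
            centroids.append(seq)
            members.append([idx])
    return members


def mutate(seq, positions):
    """Swap the base at each given position for the next one in ACGT order."""
    nxt = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = nxt[chars[p]]
    return "".join(chars)


base = "ACGT" * 25                       # hypothetical 100-nt amplicon
close = mutate(base, [10, 50])           # 98% identical to base
far = mutate(base, [5, 15, 25, 35, 45])  # 95% identical to base
far_close = mutate(far, [60])            # 99% identical to far

otus = greedy_otus([base, close, far, far_close])
print(otus)  # → [[0, 1], [2, 3]]
```

    Counting the members of each OTU then gives the kind of abundance distribution the abstract describes, with a few large OTUs and many small ones.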

  10. Comparison of magnetic resonance imaging sequences for depicting the subthalamic nucleus for deep brain stimulation.

    PubMed

    Nagahama, Hiroshi; Suzuki, Kengo; Shonai, Takaharu; Aratani, Kazuki; Sakurai, Yuuki; Nakamura, Manami; Sakata, Motomichi

    2015-01-01

    Electrodes are surgically implanted into the subthalamic nucleus (STN) of Parkinson's disease patients to provide deep brain stimulation. To ensure correct positioning, the anatomic location of the STN must be determined preoperatively. Magnetic resonance imaging has been used for pinpointing the location of the STN. To identify the optimal imaging sequence for identifying the STN, we compared images produced with T2 star-weighted angiography (SWAN), gradient echo T2*-weighted imaging, and fast spin echo T2-weighted imaging in 6 healthy volunteers. Our comparison involved measurement of the contrast-to-noise ratio (CNR) for the STN and substantia nigra and a radiologist's interpretations of the images. Of the sequences examined, the CNR and qualitative scores for STN visualization were significantly higher on SWAN images than on the other images (p < 0.01). The kappa value for visualizing the STN on SWAN images (0.74) was the highest among the three sequences. SWAN is the sequence best suited for identifying the STN at the present time.
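
    The abstract does not state how the contrast-to-noise ratio was computed. One common definition, assumed here, divides the absolute difference between the mean signals of two regions of interest by the standard deviation of a noise region. The pixel values and the function name are hypothetical.

```python
from statistics import mean, stdev


def contrast_to_noise_ratio(structure_roi, reference_roi, noise_roi):
    """CNR = |mean(structure) - mean(reference)| / SD(noise)."""
    return abs(mean(structure_roi) - mean(reference_roi)) / stdev(noise_roi)


# Hypothetical pixel intensities, not measured data.
structure = [100, 102, 98, 100]   # e.g. pixels inside the target nucleus
reference = [130, 128, 132, 130]  # e.g. pixels in adjacent tissue
noise = [2, 4, 6]                 # e.g. pixels in a signal-free background area

cnr = contrast_to_noise_ratio(structure, reference, noise)
print(f"CNR = {cnr:.1f}")  # → CNR = 15.0
```

    A higher CNR means the structure stands out more clearly from its surroundings relative to image noise.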

  11. Deep sequencing methods for protein engineering and design.

    PubMed

    Wrenbeck, Emily E; Faber, Matthew S; Whitehead, Timothy A

    2017-08-01

    The advent of next-generation sequencing (NGS) has revolutionized protein science, and the development of complementary methods enabling NGS-driven protein engineering has followed. In general, these experiments address the functional consequences of thousands of protein variants in a massively parallel manner using genotype-phenotype linked high-throughput functional screens followed by DNA counting via deep sequencing. We highlight the use of information-rich datasets to engineer protein molecular recognition. Examples include the creation of multiple dual-affinity Fabs targeting structurally dissimilar epitopes and engineering of a broad germline-targeted anti-HIV-1 immunogen. Additionally, we highlight the generation of enzyme fitness landscapes for conducting fundamental studies of protein behavior and evolution. We conclude with discussion of technological advances. Copyright © 2016 Elsevier Ltd. All rights reserved.
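
    "DNA counting via deep sequencing" usually ends in a per-variant score that compares read frequencies before and after the functional screen. A minimal sketch of one such score, a log2 enrichment ratio, is shown below. The pseudocount, the variant names and the read counts are assumptions for illustration; published analyses often also normalize to wild type and model sampling error.

```python
from math import log2


def enrichment_ratios(pre_counts, post_counts, pseudocount=0.5):
    """log2 of (post-selection frequency / pre-selection frequency) per variant."""
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    ratios = {}
    for variant, pre in pre_counts.items():
        post = post_counts.get(variant, 0)
        # The pseudocount keeps variants with zero reads from producing log2(0).
        pre_freq = (pre + pseudocount) / pre_total
        post_freq = (post + pseudocount) / post_total
        ratios[variant] = log2(post_freq / pre_freq)
    return ratios


# Hypothetical read counts before and after a selection round.
pre = {"WT": 1000, "A23G": 950, "L45P": 1020}
post = {"WT": 1000, "A23G": 3900, "L45P": 12}

scores = enrichment_ratios(pre, post)
for variant, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{variant}: {score:+.2f}")
```

    Positive scores mark variants enriched by the screen and negative scores mark depleted ones; subtracting the wild-type score puts all variants on a wild-type-relative scale.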

  12. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh].

    PubMed

    Dutta, Sutapa; Kumawat, Giriraj; Singh, Bikram P; Gupta, Deepak K; Singh, Sangeeta; Dogra, Vivek; Gaikwad, Kishor; Sharma, Tilak R; Raje, Ranjeet S; Bandhopadhya, Tapas K; Datta, Subhojit; Singh, Mahendra N; Bashasab, Fakrudin; Kulwal, Pawan; Wanjari, K B; K Varshney, Rajeev; Cook, Douglas R; Singh, Nagendra K

    2011-01-20

    Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥ 18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. 
We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea.
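
    As a rough illustration of how perfect SSR loci like those counted above can be located in contig sequences, the Python sketch below scans for tandem repeats with 2-6 bp motifs and skips homopolymers. The function names, the regular-expression approach, and the 18 bp default minimum length (borrowed from the primer-testing threshold mentioned in the abstract) are assumptions made for illustration; the abstract does not say which SSR-mining tool or thresholds the authors used, and compound repeats are not handled here.

```python
import re
from collections import Counter

# Motif-length classes as named in the abstract (di- to hexanucleotide).
MOTIF_CLASS = {2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}

def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit (rejects AAA, ATAT, ...)."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))

def find_ssrs(seq, min_len=18):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    min_len is the minimum total repeat length in bp (an assumed threshold).
    """
    seq = seq.upper()
    hits = []
    for k in range(2, 7):
        # A k-mer followed by one or more exact copies of itself.
        for m in re.finditer(r"([ACGT]{%d})\1+" % k, seq):
            motif = m.group(1)
            length = m.end() - m.start()
            if length >= min_len and is_primitive(motif):
                hits.append((m.start(), motif, length // k))
    return sorted(hits)

def motif_class_counts(hits):
    """Tally hits by motif class, e.g. to compute di/tri/... frequencies."""
    return Counter(MOTIF_CLASS[len(motif)] for _, motif, _ in hits)
```

    Running `find_ssrs` over every unigene contig and passing the pooled hits to `motif_class_counts` would give a motif-class breakdown comparable in form to the percentages reported in the abstract, although the exact numbers depend on the thresholds chosen.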

  13. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh]

    PubMed Central

    2011-01-01

    Background Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. Results In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. 
Conclusion We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea. PMID:21251263

  14. Culturable prokaryotic diversity of deep, gas hydrate sediments: first use of a continuous high-pressure, anaerobic, enrichment and isolation system for subseafloor sediments (DeepIsoBUG)

    PubMed Central

    Parkes, R John; Sellek, Gerard; Webster, Gordon; Martin, Derek; Anders, Erik; Weightman, Andrew J; Sass, Henrik

    2009-01-01

    Deep subseafloor sediments may contain depressurization-sensitive, anaerobic, piezophilic prokaryotes. To test this we developed the DeepIsoBUG system, which when coupled with the HYACINTH pressure-retaining drilling and core storage system and the PRESS core cutting and processing system, enables deep sediments to be handled without depressurization (up to 25 MPa) and anaerobic prokaryotic enrichments and isolation to be conducted up to 100 MPa. Here, we describe the system and its first use with subsurface gas hydrate sediments from the Indian Continental Shelf, Cascadia Margin and Gulf of Mexico. Generally, highest cell concentrations in enrichments occurred close to in situ pressures (14 MPa) in a variety of media, although growth continued up to at least 80 MPa. Predominant sequences in enrichments were Carnobacterium, Clostridium, Marinilactibacillus and Pseudomonas, plus Acetobacterium and Bacteroidetes in Indian samples, largely independent of media and pressures. Related 16S rRNA gene sequences for all of these Bacteria have been detected in deep, subsurface environments, although isolated strains were piezotolerant, being able to grow at atmospheric pressure. Only the Clostridium and Acetobacterium were obligate anaerobes. No Archaea were enriched. It may be that these sediment samples were not deep enough (total depth 1126–1527 m) to obtain obligate piezophiles. PMID:19694787

  15. Targeted parallel sequencing of the Musa species: searching for an alternative model system for polyploidy studies

    USDA-ARS's Scientific Manuscript database

    Modern day genomics holds the promise of solving the complexities of basic plant sciences, and of catalyzing practical advances in plant breeding. While contiguous, "base perfect" deep sequencing is a key module of any genome project, recent advances in parallel next generation sequencing technologi...

  16. 3′ terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing

    PubMed Central

    2013-01-01

    Background Post-transcriptional 3′ end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3′ RACE coupled with high-throughput sequencing to characterize the 3′ terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. Results The 3′ terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3′ terminus of an in vitro transcribed MRP RNA control and the differing 3′ terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). Conclusions 3′ RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3′ terminal sequences of noncoding RNAs. PMID:24053768

  17. Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics.

    PubMed

    Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L

    2010-07-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.

  18. Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics

    PubMed Central

    Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

    2010-01-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087

  19. The salt-responsive transcriptome of chickpea roots and nodules via deepSuperSAGE

    PubMed Central

    2011-01-01

    Background The combination of high-throughput transcript profiling and next-generation sequencing technologies is a prerequisite for genome-wide comprehensive transcriptome analysis. Our recent innovation of deepSuperSAGE is based on an advanced SuperSAGE protocol and its combination with massively parallel pyrosequencing on Roche's 454 sequencing platform. As a demonstration of the power of this combination, we have chosen the salt stress transcriptomes of roots and nodules of the third most important legume crop chickpea (Cicer arietinum L.). While our report is more technology-oriented, it nevertheless addresses a major world-wide problem for crops generally: high salinity. Together with low temperatures and water stress, high salinity is responsible for crop losses of millions of tons of various legume (and other) crops. Continuously deteriorating environmental conditions will combine with salinity stress to further compromise crop yields. As a good example for such stress-exposed crop plants, we started to characterize salt stress responses of chickpeas on the transcriptome level. Results We used deepSuperSAGE to detect early global transcriptome changes in salt-stressed chickpea. The salt stress responses of 86,919 transcripts representing 17,918 unique 26 bp deepSuperSAGE tags (UniTags) from roots of the salt-tolerant variety INRAT-93 two hours after treatment with 25 mM NaCl were characterized. Additionally, the expression of 57,281 transcripts representing 13,115 UniTags was monitored in nodules of the same plants. From a total of 144,200 analyzed 26 bp tags in roots and nodules together, 21,401 unique transcripts were identified. Of these, only 363 and 106 specific transcripts, respectively, were commonly up- or down-regulated (>3.0-fold) under salt stress in both organs, witnessing a differential organ-specific response to stress. 
Building on recent pioneering work on massive cDNA sequencing in chickpea, more than 9,400 UniTags could be linked to UniProt entries. Additionally, gene ontology (GO) category over-representation analysis made it possible to filter out enriched biological processes among the differentially expressed UniTags. Subsequently, the gathered information was further cross-checked with stress-related pathways. From several filtered pathways, here we focus exemplarily on transcripts associated with the generation and scavenging of reactive oxygen species (ROS), as well as on transcripts involved in Na+ homeostasis. Although both processes are already very well characterized in other plants, the information generated in the present work is of high value. Information on expression profiles and sequence similarity for several hundreds of transcripts of potential interest is now available. Conclusions This report demonstrates that the combination of the high-throughput transcriptome profiling technology SuperSAGE with one of the next-generation sequencing platforms allows deep insights into the first molecular reactions of a plant exposed to salinity. Cross validation with recent reports enriched the information about the salt stress dynamics of more than 9,000 chickpea ESTs, and enlarged their pool of alternative transcript isoforms. As an example of the high resolution of the employed technology, which we term deepSuperSAGE, we demonstrate that ROS-scavenging and -generating pathways undergo strong global transcriptome changes in chickpea roots and nodules as early as 2 hours after onset of moderate salt stress (25 mM NaCl). Additionally, a set of more than 15 candidate transcripts is proposed as potential components of the salt overly sensitive (SOS) pathway in chickpea. Newly identified transcript isoforms are potential targets for breeding novel cultivars with high salinity tolerance. 
We demonstrate that these targets can be integrated into breeding schemes by micro-arrays and RT-PCR assays downstream of the generation of 26 bp tags by SuperSAGE. PMID:21320317

  20. The salt-responsive transcriptome of chickpea roots and nodules via deepSuperSAGE.

    PubMed

    Molina, Carlos; Zaman-Allah, Mainassara; Khan, Faheema; Fatnassi, Nadia; Horres, Ralf; Rotter, Björn; Steinhauer, Diana; Amenc, Laurie; Drevon, Jean-Jacques; Winter, Peter; Kahl, Günter

    2011-02-14

    The combination of high-throughput transcript profiling and next-generation sequencing technologies is a prerequisite for genome-wide comprehensive transcriptome analysis. Our recent innovation of deepSuperSAGE is based on an advanced SuperSAGE protocol and its combination with massively parallel pyrosequencing on Roche's 454 sequencing platform. As a demonstration of the power of this combination, we have chosen the salt stress transcriptomes of roots and nodules of the third most important legume crop chickpea (Cicer arietinum L.). While our report is more technology-oriented, it nevertheless addresses a major world-wide problem for crops generally: high salinity. Together with low temperatures and water stress, high salinity is responsible for crop losses of millions of tons of various legume (and other) crops. Continuously deteriorating environmental conditions will combine with salinity stress to further compromise crop yields. As a good example for such stress-exposed crop plants, we started to characterize salt stress responses of chickpeas on the transcriptome level. We used deepSuperSAGE to detect early global transcriptome changes in salt-stressed chickpea. The salt stress responses of 86,919 transcripts representing 17,918 unique 26 bp deepSuperSAGE tags (UniTags) from roots of the salt-tolerant variety INRAT-93 two hours after treatment with 25 mM NaCl were characterized. Additionally, the expression of 57,281 transcripts representing 13,115 UniTags was monitored in nodules of the same plants. From a total of 144,200 analyzed 26 bp tags in roots and nodules together, 21,401 unique transcripts were identified. 
Of these, only 363 and 106 specific transcripts, respectively, were commonly up- or down-regulated (>3.0-fold) under salt stress in both organs, witnessing a differential organ-specific response to stress. Building on recent pioneering work on massive cDNA sequencing in chickpea, more than 9,400 UniTags could be linked to UniProt entries. Additionally, gene ontology (GO) category over-representation analysis made it possible to filter out enriched biological processes among the differentially expressed UniTags. Subsequently, the gathered information was further cross-checked with stress-related pathways. From several filtered pathways, here we focus exemplarily on transcripts associated with the generation and scavenging of reactive oxygen species (ROS), as well as on transcripts involved in Na+ homeostasis. Although both processes are already very well characterized in other plants, the information generated in the present work is of high value. Information on expression profiles and sequence similarity for several hundreds of transcripts of potential interest is now available. This report demonstrates that the combination of the high-throughput transcriptome profiling technology SuperSAGE with one of the next-generation sequencing platforms allows deep insights into the first molecular reactions of a plant exposed to salinity. Cross validation with recent reports enriched the information about the salt stress dynamics of more than 9,000 chickpea ESTs, and enlarged their pool of alternative transcript isoforms. As an example of the high resolution of the employed technology, which we term deepSuperSAGE, we demonstrate that ROS-scavenging and -generating pathways undergo strong global transcriptome changes in chickpea roots and nodules as early as 2 hours after onset of moderate salt stress (25 mM NaCl). Additionally, a set of more than 15 candidate transcripts is proposed as potential components of the salt overly sensitive (SOS) pathway in chickpea. 
Newly identified transcript isoforms are potential targets for breeding novel cultivars with high salinity tolerance. We demonstrate that these targets can be integrated into breeding schemes by micro-arrays and RT-PCR assays downstream of the generation of 26 bp tags by SuperSAGE.

  1. Optical Communications Channel Combiner

    NASA Technical Reports Server (NTRS)

    Quirk, Kevin J.; Nguyen, Danh H.; Nguyen, Huy

    2012-01-01

    NASA has identified deep-space optical communications links as an integral part of a unified space communication network in order to provide data rates in excess of 100 Mb/s. The distances and limited power inherent in a deep-space optical downlink necessitate the use of photon-counting detectors and a power-efficient modulation such as pulse position modulation (PPM). For the output of each photodetector, whether from a separate telescope or a portion of the detection area, a communication receiver estimates a log-likelihood ratio (LLR) for each PPM slot. To realize the full effective aperture of these receivers, their outputs must be combined prior to information decoding. A channel combiner was developed to synchronize the LLR sequences of multiple receivers and then combine them into a single LLR sequence for information decoding. The channel combiner synchronizes the LLR sequences of up to three receivers and then combines these into a single LLR sequence for output. The channel combiner has three channel inputs, each of which takes as input a sequence of four-bit LLRs for each PPM slot in a codeword via a XAUI 10 Gb/s quad optical fiber interface. The cross-correlations between the channels' LLR time series are calculated and used to synchronize the sequences prior to combining. The output of the channel combiner is a sequence of four-bit LLRs for each PPM slot in a codeword via a XAUI 10 Gb/s quad optical fiber interface. The unit is controlled through a 1 Gb/s Ethernet UDP/IP interface. A deep-space optical communication link has not yet been demonstrated. This ground-station channel combiner was developed to demonstrate this capability and is unique in its ability to process such a signal.
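
    The synchronize-then-combine idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the brute-force lag search, the `max_lag` window, the choice of summing LLRs (appropriate when the receivers' observations are independent), and the saturation to a signed four-bit range of -8..7 are all hypothetical and are not taken from the NTRS record, which describes a hardware unit without giving these details.

```python
def best_lag(ref, other, max_lag):
    """Offset (in slots) into `other` that best matches `ref`.

    Picks the lag maximizing the cross-correlation sum(ref[i] * other[i + lag]).
    """
    def corr(lag):
        return sum(
            ref[i] * other[i + lag]
            for i in range(len(ref))
            if 0 <= i + lag < len(other)
        )
    return max(range(-max_lag, max_lag + 1), key=corr)

def combine_llrs(channels, max_lag=8, lo=-8, hi=7):
    """Align every channel to the first one, then add LLRs slot by slot.

    Returns (lags, combined). The sum is clipped to [lo, hi], an assumed
    signed four-bit output range.
    """
    ref = channels[0]
    lags = [0] + [best_lag(ref, ch, max_lag) for ch in channels[1:]]
    combined = []
    for i in range(len(ref)):
        total = 0
        for ch, lag in zip(channels, lags):
            j = i + lag
            if 0 <= j < len(ch):  # slots missing from a channel contribute nothing
                total += ch[j]
        combined.append(max(lo, min(hi, total)))
    return lags, combined
```

    A real-time implementation would compute the correlation incrementally over a sliding window rather than over whole sequences; the exhaustive search here is chosen only for readability.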

  2. Long Non-Coding RNA and Alternative Splicing Modulations in Parkinson's Leukocytes Identified by RNA Sequencing

    PubMed Central

    Soreq, Lilach; Guffanti, Alessandro; Salomonis, Nathan; Simchovitz, Alon; Israel, Zvi; Bergman, Hagai; Soreq, Hermona

    2014-01-01

    The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers and is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domains and microRNA binding annotations. We applied the novel workflow to RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post- Deep Brain Stimulation (DBS) treatment and compared them to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually-exclusive exons. The PD regulated FUS and HNRNP A/B included prion-like domains regulated regions. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq from PD and unaffected control brains revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes including U1, which showed both leukocyte and brain increases. 
Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia-nigra, compared to controls. This novel workflow allows deep multi-level inspection of RNA-Seq datasets and provides a comprehensive new resource for understanding disease transcriptome modifications in PD and other neurodegenerative diseases. PMID:24651478

  3. The 3-D aftershock distribution of three recent M5~5.5 earthquakes in the Anza region, California

    NASA Astrophysics Data System (ADS)

    Zhang, Q.; Wdowinski, S.; Lin, G.

    2011-12-01

    The San Jacinto fault zone (SJFZ) exhibits the highest level of seismicity compared to other regions in southern California. On average, it produces four earthquakes per day, most of them at depths of 10-17 km. Over the past decade, seismic activity increased in the Anza region, including three M5~5.5 events and their aftershock sequences. These events occurred in 2001, 2005, and 2010. In this research we map the 3-D distribution of these three events to evaluate their rupture geometry and better understand the unusual deep seismic pattern along the SJFZ, which was termed "deep creep" (Wdowinski, 2009). We relocated 97,562 events from 1981 to 2011 in the Anza region by applying the Source-Specific Station Term (SSST) method (Lin et al., 2006) with an accurate 1-D velocity model derived from the 3-D model of Lin et al. (2007). In order to separate the aftershock sequence from background seismicity, we characterized each of the three aftershock sequences using Omori's law. Preliminary results show that all three sequences had a similar geometry of deep elongated aftershock distribution. Most aftershocks occurred at depths of 10-17 km and extended over a 70 km long segment of the SJFZ, centered at the mainshock hypocenters. A comparative study of other M5~5.5 mainshocks and their aftershock sequences in southern California reveals a very different geometrical pattern, suggesting that the three Anza M5~5.5 events are unique and can be indicative of "deep creep" deformation processes. References: 1. Lin, G. and Shearer, P.M., 2006, The COMPLOC earthquake location package, Seism. Res. Lett. 77, pp. 440-444. 2. Lin, G., Shearer, P.M., Hauksson, E., and Thurber, C.H., 2007, A three-dimensional crustal seismic velocity model for southern California from a composite event method, J. Geophys. Res. 112, B12306, doi:10.1029/2007JB004977. 3. Wdowinski, S., 2009, Deep creep as a cause for the excess seismicity along the San Jacinto fault, Nat. Geosci., doi:10.1038/NGEO684.
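
    For readers unfamiliar with Omori's law, the Python sketch below shows the modified (Omori-Utsu) form, n(t) = K / (t + c)^p, together with one plausible way it could help separate aftershocks from background seismicity: estimating the time at which the modeled aftershock rate decays to the background rate. The function names and the parameter values K, c, and p in the usage note are hypothetical; the abstract does not report fitted parameters or describe the exact separation procedure. Only the background rate of about four events per day comes from the abstract.

```python
import math

def omori_rate(t, K, c, p):
    """Aftershock rate (events/day) at t days after the mainshock: K / (t + c)**p."""
    return K / (t + c) ** p

def omori_cumulative(t, K, c, p):
    """Expected number of aftershocks in [0, t], the integral of omori_rate."""
    if math.isclose(p, 1.0):
        return K * math.log(1.0 + t / c)
    return K * (c ** (1.0 - p) - (t + c) ** (1.0 - p)) / (p - 1.0)

def time_to_background(K, c, p, background_rate):
    """Days until the modeled aftershock rate falls to the background rate."""
    return max(0.0, (K / background_rate) ** (1.0 / p) - c)
```

    For example, with the assumed values K = 200, c = 0.1 days, and p = 1, `time_to_background(200, 0.1, 1.0, 4.0)` returns about 49.9 days; events after that time would be hard to distinguish from background on rate alone.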

  4. Effects of hydrostatic pressure on yeasts isolated from deep-sea hydrothermal vents.

    PubMed

    Burgaud, Gaëtan; Hué, Nguyen Thi Minh; Arzur, Danielle; Coton, Monika; Perrier-Cornet, Jean-Marie; Jebbar, Mohamed; Barbier, Georges

    2015-11-01

    Hydrostatic pressure plays a significant role in the distribution of life in the biosphere. Knowledge of deep-sea piezotolerant and (hyper)piezophilic bacteria and archaea diversity has been well documented, along with their specific adaptations to cope with high hydrostatic pressure (HHP). Recent investigations of deep-sea microbial community compositions have shown unexpected micro-eukaryotic communities, mainly dominated by fungi. Molecular methods such as next-generation sequencing have been used for SSU rRNA gene sequencing to reveal fungal taxa. Currently, a difficult but fascinating challenge for marine mycologists is to create deep-sea marine fungus culture collections and assess their ability to cope with pressure. Indeed, although there is no universal genetic marker for piezoresistance, physiological analyses provide concrete relevant data for estimating their adaptations and understanding the role of fungal communities in the abyss. The present study investigated morphological and physiological responses of fungi to HHP using a collection of deep-sea yeasts as a model. The aim was to determine whether deep-sea yeasts were able to tolerate different HHP and if they were metabolically active. Here we report an unexpected taxonomic-based dichotomic response to pressure with piezosensitive ascomycetes and piezotolerant basidiomycetes, and distinct morphological switches triggered by pressure for certain strains. Copyright © 2015 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  5. When less is more: 'slicing' sequencing data improves read decoding accuracy and de novo assembly quality.

    PubMed

    Lonardi, Stefano; Mirebrahim, Hamid; Wanamaker, Steve; Alpert, Matthew; Ciardo, Gianfranco; Duma, Denisa; Close, Timothy J

    2015-09-15

    Since the invention of DNA sequencing in the 70s, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on 'divide and conquer': we 'slice' a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data. Python scripts to process slices and resolve decoding conflicts are available from http://goo.gl/YXgdHT; software Hashfilter can be downloaded from http://goo.gl/MIyZHs. Contact: stelo@cs.ucr.edu or timothy.close@ucr.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
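
    The 'slice, decode independently, merge' pattern described above can be outlined as follows. This is a generic sketch, not the authors' scripts: the function names are hypothetical, `decode_slice` stands in for whatever read-to-BAC decoder is used (it is assumed to return a mapping from clone identifier to the reads assigned to that clone), and the merge step simply concatenates per-clone assignments without the conflict resolution that the published scripts are said to perform.

```python
from collections import defaultdict

def slice_reads(reads, slice_size):
    """Cut the read list into consecutive slices of at most slice_size reads."""
    return [reads[i:i + slice_size] for i in range(0, len(reads), slice_size)]

def decode_by_slices(reads, slice_size, decode_slice):
    """Decode each slice on its own, then merge the per-clone assignments.

    decode_slice: callable taking a list of reads and returning
    {clone_id: [reads assigned to that clone]} (a hypothetical interface).
    """
    merged = defaultdict(list)
    for chunk in slice_reads(reads, slice_size):
        for clone_id, assigned in decode_slice(chunk).items():
            merged[clone_id].extend(assigned)
    return dict(merged)
```

    Because each slice is decoded independently, the loop body could be distributed across worker processes; choosing `slice_size` is the key tuning decision, and the abstract indicates only that an optimal size exists, not what it is.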

  6. Unique microbial community in drilling fluids from Chinese continental scientific drilling

    USGS Publications Warehouse

    Zhang, Gengxin; Dong, Hailiang; Jiang, Hongchen; Xu, Zhiqin; Eberl, Dennis D.

    2006-01-01

    Circulating drilling fluid is often regarded as a contamination source in investigations of subsurface microbiology. However, it also provides an opportunity to sample geological fluids at depth and to study contained microbial communities. During our study of deep subsurface microbiology of the Chinese Continental Scientific Deep drilling project, we collected 6 drilling fluid samples from a borehole from 2290 to 3350 m below the land surface. Microbial communities in these samples were characterized with cultivation-dependent and -independent techniques. Characterization of 16S rRNA genes indicated that the bacterial clone sequences related to Firmicutes became progressively dominant with increasing depth. Most sequences were related to anaerobic, thermophilic, halophilic or alkaliphilic bacteria. These habitats were consistent with the measured geochemical characteristics of the drilling fluids that have incorporated geological fluids and partly reflected the in-situ conditions. Several clone types were closely related to Thermoanaerobacter ethanolicus, Caldicellulosiruptor lactoaceticus, and Anaerobranca gottschalkii, an anaerobic metal-reducer, an extreme thermophile, and an anaerobic chemoorganotroph, respectively, with an optimal growth temperature of 50–68°C. Seven anaerobic, thermophilic Fe(III)-reducing bacterial isolates were obtained and they were capable of reducing iron oxide and clay minerals to produce siderite, vivianite, and illite. The archaeal diversity was low. Most archaeal sequences were not related to any known cultivated species, but rather to environmental clone sequences recovered from subsurface environments. We infer that the detected microbes were derived from geological fluids at depth and their growth habitats reflected the deep subsurface conditions. These findings have important implications for microbial survival and their ecological functions in the deep subsurface.

  7. Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

    PubMed

    Adhikari, Badri; Hou, Jie; Cheng, Jianlin

    2018-03-01

    In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. And the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.
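
    The "top L/5 long-range" precision quoted above can be illustrated with a minimal sketch. It assumes the common convention that a long-range pair has a sequence separation of at least 24 residues; the abstract does not state the threshold, and the function name and input layout are hypothetical, not taken from the MULTICOM software.

```python
def top_l5_long_range_precision(pred, native, min_sep=24):
    """Precision of the L/5 highest-scoring long-range predictions.

    pred    : L x L nested list of predicted contact scores
    native  : L x L nested list of booleans (True = native contact)
    min_sep : assumed minimum |i - j| for a pair to count as long-range
    """
    L = len(pred)
    # Collect every residue pair whose sequence separation is >= min_sep.
    pairs = [(pred[i][j], i, j)
             for i in range(L) for j in range(i + min_sep, L)]
    if not pairs:
        raise ValueError("sequence too short for any long-range pair")
    pairs.sort(key=lambda t: -t[0])      # highest score first
    top = pairs[: max(1, L // 5)]        # keep the top L/5 predictions
    hits = sum(1 for _, i, j in top if native[i][j])
    return hits / len(top)
```

    Short-range pairs are excluded before ranking, so a confident prediction between sequence neighbours does not affect the score.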

  8. Paleocene Wilcox cross-shelf channel-belt history and shelf-margin growth: Key to Gulf of Mexico sediment delivery

    NASA Astrophysics Data System (ADS)

    Zhang, Jinyu; Steel, Ronald; Ambrose, William

    2017-12-01

    Shelf margins prograde and aggrade by the incremental addition of deltaic sediments supplied from river channel belts and by stored shoreline sediment. This paper documents the shelf-edge trajectory and coeval channel belts for a segment of the Paleocene Lower Wilcox Group in the northern Gulf of Mexico based on 400 wireline logs and 300 m of whole cores. By quantitatively analyzing these data and comparing them with global databases, we demonstrate how varying sediment supply impacted the Wilcox shelf-margin growth and deep-water sediment dispersal under greenhouse eustatic conditions. The coastal plain to marine topset and uppermost continental slope succession of the Lower Wilcox shelf-margin sediment prism is divided into eighteen high-frequency (~300 ky duration) stratigraphic sequences, and further grouped into 5 sequence sets (labeled A-E from bottom to top). Sequence Set A is dominantly muddy slope deposits. The shelf edge of Sequence Sets B and C prograded rapidly (> 10 km/Ma) and aggraded modestly (< 80 m/Ma). The coeval channel belts are relatively large (individually averaging 11-13 m thick) and amalgamated. The water discharge of the Sequence Sets B and C rivers, estimated from channel-belt thickness, bedform type, and grain size, is 7,000-29,000 m³/s, which places them among large rivers when compared with modern river databases. In contrast, slow progradation (< 10 km/Ma) and rapid aggradation (> 80 m/Ma) characterize Sequence Sets D and E, which are associated with smaller (9-10 m thick on average) and isolated channel belts. This stratigraphic trend is likely due to an upward decreasing sediment supply indicated by the shelf-edge progradation rate and channel size, as well as an upward increasing shelf accommodation indicated by the shelf-edge aggradation rate. The rapid shelf-edge progradation and large rivers in Sequence Sets B and C confirm earlier suggestions that it was the early phase of Lower Wilcox dispersal that brought the largest deep-water sediment volumes into the Gulf of Mexico. Key factors in this Lower Wilcox stratigraphic trend are likely to have been a very high initial sediment flux to the Gulf because of the high initial release of sediment from Laramide catchments to the north and northwest, possibly aided by modest eustatic sea-level fall on the Texas shelf, which is suggested by the early, flat shelf-edge trajectory, high amalgamation of channel belts, and the low overall aggradation rate of Sequence Sets B and C.

  9. A comprehensive survey of 3' animal miRNA modification events and a possible role for 3' adenylation in modulating miRNA targeting effectiveness.

    PubMed

    Burroughs, A Maxwell; Ando, Yoshinari; de Hoon, Michiel J L; Tomaru, Yasuhiro; Nishibu, Takahiro; Ukekawa, Ryo; Funakoshi, Taku; Kurokawa, Tsutomu; Suzuki, Harukazu; Hayashizaki, Yoshihide; Daub, Carsten O

    2010-10-01

    Animal microRNA sequences are subject to 3' nucleotide addition. Through detailed analysis of deep-sequenced short RNA data sets, we show adenylation and uridylation of miRNA is globally present and conserved across Drosophila and vertebrates. To better understand 3' adenylation function, we deep-sequenced RNA after knockdown of nucleotidyltransferase enzymes. The PAPD4 nucleotidyltransferase adenylates a wide range of miRNA loci, but adenylation does not appear to affect miRNA stability on a genome-wide scale. Adenine addition appears to reduce effectiveness of miRNA targeting of mRNA transcripts while deep-sequencing of RNA bound to immunoprecipitated Argonaute (AGO) subfamily proteins EIF2C1-EIF2C3 revealed substantial reduction of adenine addition in miRNA associated with EIF2C2 and EIF2C3. Our findings show 3' addition events are widespread and conserved across animals, PAPD4 is a primary miRNA adenylating enzyme, and suggest a role for 3' adenine addition in modulating miRNA effectiveness, possibly through interfering with incorporation into the RNA-induced silencing complex (RISC), a regulatory role that would complement the role of miRNA uridylation in blocking DICER1 uptake.

  10. HomozygosityMapper2012--bridging the gap between homozygosity mapping and deep sequencing.

    PubMed

    Seelow, Dominik; Schuelke, Markus

    2012-07-01

    Homozygosity mapping is a common method to map recessive traits in consanguineous families. To facilitate these analyses, we have developed HomozygosityMapper, a web-based approach to homozygosity mapping. HomozygosityMapper allows researchers to directly upload the genotype files produced by the major genotyping platforms as well as deep sequencing data. It detects stretches of homozygosity shared by the affected individuals and displays them graphically. Users can interactively inspect the underlying genotypes, manually refine these regions and eventually submit them to our candidate gene search engine GeneDistiller to identify the most promising candidate genes. Here, we present the new version of HomozygosityMapper. The most striking new feature is the support of Next Generation Sequencing *.vcf files as input. Upon users' requests, we have implemented the analysis of common experimental rodents as well as of important farm animals. Furthermore, we have extended the options for single families and loss of heterozygosity studies. Another new feature is the export of *.bed files for targeted enrichment of the potential disease regions for deep sequencing strategies. HomozygosityMapper also generates files for conventional linkage analyses which are already restricted to the possible disease regions, hence superseding CPU-intensive genome-wide analyses. HomozygosityMapper is freely available at http://www.homozygositymapper.org/.
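
    The idea of detecting stretches of homozygosity shared by affected individuals can be sketched as follows. This is a simplified illustration only: the abstract does not describe HomozygosityMapper's actual scoring, and the genotype encoding ("AA", "AB", "BB"), the function name, and the `min_markers` cutoff are assumptions made for the example.

```python
def shared_homozygous_runs(genotypes, min_markers=3):
    """Find runs of consecutive markers at which every affected individual
    is homozygous for the same allele.

    genotypes   : one list per individual of genotype calls ("AA", "AB",
                  "BB"), ordered along the chromosome
    min_markers : hypothetical minimum run length to report
    Returns a list of (start_index, end_index) pairs, end inclusive.
    """
    n_markers = len(genotypes[0])
    runs, start = [], None
    for m in range(n_markers):
        calls = {g[m] for g in genotypes}
        # Shared homozygosity: a single call across individuals, and that
        # call is one of the two homozygous genotypes.
        shared = len(calls) == 1 and next(iter(calls)) in ("AA", "BB")
        if shared and start is None:
            start = m                      # a new run begins here
        elif not shared and start is not None:
            if m - start >= min_markers:   # run ended at marker m - 1
                runs.append((start, m - 1))
            start = None
    if start is not None and n_markers - start >= min_markers:
        runs.append((start, n_markers - 1))
    return runs
```

    A real analysis would also need to handle missing calls, genotyping errors, and physical marker positions, none of which are modelled here.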

  11. Maximum entropy methods for extracting the learned features of deep neural networks.

    PubMed

    Finnegan, Alex; Song, Jun S

    2017-10-01

    New architectures of multilayer artificial neural networks and new methods for training them are rapidly revolutionizing the application of machine learning in diverse fields, including business, social science, physical sciences, and biology. Interpreting deep neural networks, however, currently remains elusive, and a critical challenge lies in understanding which meaningful features a network is actually learning. We present a general method for interpreting deep neural networks and extracting network-learned features from input data. We describe our algorithm in the context of biological sequence analysis. Our approach, based on ideas from statistical physics, samples from the maximum entropy distribution over possible sequences, anchored at an input sequence and subject to constraints implied by the empirical function learned by a network. Using our framework, we demonstrate that local transcription factor binding motifs can be identified from a network trained on ChIP-seq data and that nucleosome positioning signals are indeed learned by a network trained on chemical cleavage nucleosome maps. Imposing a further constraint on the maximum entropy distribution also allows us to probe whether a network is learning global sequence features, such as the high GC content in nucleosome-rich regions. This work thus provides valuable mathematical tools for interpreting and extracting learned features from feed-forward neural networks.

  12. Sedimentary modeling and analysis of petroleum system of the upper Tertiary sequences in southern Ulleung sedimentary Basin, East Sea (Sea of Japan)

    NASA Astrophysics Data System (ADS)

    Cheong, D.; Kim, D.; Kim, Y.

    2010-12-01

    Block 6-1, located on the southwestern margin of the Ulleung Basin, East Sea (Sea of Japan), is an area that has recently produced commercial natural gas and condensate. A total of 17 exploratory wells have been drilled, and many seismic surveys have been carried out since the early 1970s. Among the wells and seismic sections, the Gorae 1 well and a seismic section through the Gorae 1-2 well were chosen for this simulation work. A 2-D graphic simulation using SEDPAK elucidates the evolution, burial history and diagenesis of the sedimentary sequence. The study area is a suitable place for modeling a petroleum system and evaluating the hydrocarbon potential of the reservoir. Shale as a source rock lies about 3,500 m below the sea floor, and sandstones interbedded with thin mud layers are distributed as potential reservoir rocks from 3,500 m to 2,000 m deep. Above these, shales act as seal rocks and overburden rocks up to 900 m deep. Input data (sea level, sediment supply, subsidence rate, etc.) for the simulation were taken from previously published papers, including the well and seismic data, and the thermal maturity of the sediment was calculated from known thermal gradient data. In this study area, gas and condensate have been found and commercially produced, and the simulation result also shows a gas window between 4,000 m and 6,000 m deep, so three possible interpretations can be inferred from the simulation result. First, oil has already migrated to the southeastern area along uplifting zones. Second, oil was never generated because the organic matter is type 3 kerogen. Third, generated oil has been converted into gas by thermal overcooking. SEDPAK has the advantage of providing the timing and depth of oil and gas generation with TTI values, although it is limited in that it cannot itself perform geochemical modeling to analyze the thermal maturity level of source rocks. Based on the result of our simulation, additional exploratory wells are required to discover deeper gas in the study area.

  13. Toward a real-time system for temporal enhanced ultrasound-guided prostate biopsy.

    PubMed

    Azizi, Shekoofeh; Van Woudenberg, Nathan; Sojoudi, Samira; Li, Ming; Xu, Sheng; Abu Anas, Emran M; Yan, Pingkun; Tahmasebi, Amir; Kwak, Jin Tae; Turkbey, Baris; Choyke, Peter; Pinto, Peter; Wood, Bradford; Mousavi, Parvin; Abolmaesumi, Purang

    2018-03-27

    We have previously proposed temporal enhanced ultrasound (TeUS) as a new paradigm for tissue characterization. TeUS is based on analyzing a sequence of ultrasound data with deep learning and has been demonstrated to be successful for detection of cancer in ultrasound-guided prostate biopsy. Our aim is to enable the dissemination of this technology to the community for large-scale clinical validation. In this paper, we present a unified software framework demonstrating near-real-time analysis of ultrasound data stream using a deep learning solution. The system integrates ultrasound imaging hardware, visualization and a deep learning back-end to build an accessible, flexible and robust platform. A client-server approach is used in order to run computationally expensive algorithms in parallel. We demonstrate the efficacy of the framework using two applications as case studies. First, we show that prostate cancer detection using near-real-time analysis of RF and B-mode TeUS data and deep learning is feasible. Second, we present real-time segmentation of ultrasound prostate data using an integrated deep learning solution. The system is evaluated for cancer detection accuracy on ultrasound data obtained from a large clinical study with 255 biopsy cores from 157 subjects. It is further assessed with an independent dataset with 21 biopsy targets from six subjects. In the first study, we achieve area under the curve, sensitivity, specificity and accuracy of 0.94, 0.77, 0.94 and 0.92, respectively, for the detection of prostate cancer. In the second study, we achieve an AUC of 0.85. Our results suggest that TeUS-guided biopsy can be potentially effective for the detection of prostate cancer.

  14. The mitochondrial genome of Ifremeria nautilei and the phylogenetic position of the enigmatic deep-sea Abyssochrysoidea (Mollusca: Gastropoda).

    PubMed

    Osca, David; Templado, José; Zardoya, Rafael

    2014-09-01

    The complete nucleotide sequence of the mitochondrial (mt) genome of the deep-sea vent snail Ifremeria nautilei (Gastropoda: Abyssochrysoidea) was determined. The double-stranded circular molecule is 15,664 bp in length and encodes the typical 37 metazoan mitochondrial genes. The gene arrangement of the Ifremeria mt genome is most similar to the genome organization of caenogastropods and differs only in the relative position of the trnW gene. The deduced amino acid sequences of the mt protein-coding genes of the Ifremeria mt genome were aligned with orthologous sequences from representatives of the main lineages of gastropods and phylogenetic relationships were inferred. The reconstructed phylogeny supports that Ifremeria belongs to Caenogastropoda and that it is closely related to hypsogastropod superfamilies. Results were compared with a reconstructed nuclear-based phylogeny. Moreover, a relaxed molecular-clock timetree calibrated with fossils dated the divergence of Abyssochrysoidea in the Late Jurassic-Early Cretaceous, indicating a relatively modern colonization of deep-sea environments by these snails. Copyright © 2014 Elsevier B.V. All rights reserved.

  15. A comprehensive framework for functional diversity patterns of marine chromophytic phytoplankton using rbcL phylogeny

    PubMed Central

    Samanta, Brajogopal; Bhadury, Punyasloke

    2016-01-01

    Marine chromophytes are a taxonomically diverse group of algae and contribute approximately half of the total oceanic primary production. To understand the global patterns of functional diversity of chromophytic phytoplankton, robust bioinformatics and statistical analyses, including deep phylogeny based on 2476 form ID rbcL gene sequences representing seven ecologically significant oceanographic ecoregions, were undertaken. In addition, 12 form ID rbcL clone libraries were generated and analyzed (148 sequences) from the Sundarbans Biosphere Reserve, representing the world’s largest mangrove ecosystem, as part of this study. Global phylogenetic analyses recovered 11 major clades of chromophytic phytoplankton in varying proportions, with several novel rbcL sequences in each of the seven targeted ecoregions. The majority of OTUs were found to be exclusive to each ecoregion, whereas some were shared by two or more ecoregions based on beta-diversity analysis. The present phylogenetic and bioinformatics analyses provide strong statistical support for the hypothesis that different oceanographic regimes harbor distinct and coherent groups of chromophytic phytoplankton. This study also shows that varying natural selection pressure on the form ID rbcL gene under different environmental conditions could lead to functional differences and affect the overall fitness of chromophytic phytoplankton populations. PMID:26861415

  16. Seasonal and regional diversity of maple sap microbiota revealed using community PCR fingerprinting and 16S rRNA gene clone libraries.

    PubMed

    Filteau, Marie; Lagacé, Luc; LaPointe, Gisèle; Roy, Denis

    2010-04-01

    An arbitrary primed community PCR fingerprinting technique based on capillary electrophoresis was developed to study maple sap microbial community characteristics among 19 production sites in Québec over the tapping season. Presumptive fragment identification was made with corresponding fingerprint profiles of bacterial isolate cultures. Maple sap microbial communities were subsequently compared using a representative subset of 13 16S rRNA gene clone libraries followed by gene sequence analysis. Results from both methods indicated that all maple sap production sites and flow periods shared common microbiota members, but distinctive features also existed. Changes over the season in relative abundance of predominant populations showed evidence of a common pattern. Pseudomonas (64%) and Rahnella (8%) were the most abundantly and frequently represented genera of the 2239 sequences analyzed. Janthinobacterium, Leuconostoc, Lactococcus, Weissella, Epilithonimonas and Sphingomonas were revealed as occasional contaminants in maple sap. Maple sap microbiota showed a low level of deep diversity along with a high variation of similar 16S rRNA gene sequences within the Pseudomonas genus. Predominance of Pseudomonas is suggested as a typical feature of maple sap microbiota across geographical regions, production sites, and sap flow periods.

  17. w4CSeq: software and web application to analyze 4C-seq data.

    PubMed

    Cai, Mingyang; Gao, Fan; Lu, Wange; Wang, Kai

    2016-11-01

    Circularized Chromosome Conformation Capture followed by deep sequencing (4C-Seq) is a powerful technique to identify genome-wide partners interacting with a pre-specified genomic locus. Here, we present a computational and statistical approach to analyze 4C-Seq data generated from both enzyme digestion and sonication fragmentation-based methods. We implemented a command line software tool and a web interface called w4CSeq, which takes in the raw 4C sequencing data (FASTQ files) as input, performs automated statistical analysis and presents results in a user-friendly manner. Besides providing users with the list of candidate interacting sites/regions, w4CSeq generates figures showing genome-wide distribution of interacting regions, and sketches the enrichment of key features such as TSSs, TTSs, CpG sites and DNA replication timing around 4C sites. Users can establish their own web server by downloading source codes at https://github.com/WGLab/w4CSeq. Additionally, a demo web server is available at http://w4cseq.wglab.org. Contact: kaiwang@usc.edu or wangelu@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.

  18. Sensitive cell-based assay for determination of human immunodeficiency virus type 1 coreceptor tropism.

    PubMed

    Weber, Jan; Vazquez, Ana C; Winner, Dane; Gibson, Richard M; Rhea, Ariel M; Rose, Justine D; Wylie, Doug; Henry, Kenneth; Wright, Alison; King, Kevin; Archer, John; Poveda, Eva; Soriano, Vicente; Robertson, David L; Olivo, Paul D; Arts, Eric J; Quiñones-Mateu, Miguel E

    2013-05-01

    CCR5 antagonists are a powerful new class of antiretroviral drugs that require a companion assay to evaluate the presence of CXCR4-tropic (non-R5) viruses prior to use in human immunodeficiency virus (HIV)-infected individuals. In this study, we have developed, characterized, verified, and prevalidated a novel phenotypic test to determine HIV-1 coreceptor tropism (VERITROP) based on a sensitive cell-to-cell fusion assay. A proprietary vector was constructed containing a near-full-length HIV-1 genome with the yeast uracil biosynthesis (URA3) gene replacing the HIV-1 env coding sequence. Patient-derived HIV-1 PCR products were introduced by homologous recombination using an innovative yeast-based cloning strategy. The env-expressing vectors were then used in a cell-to-cell fusion assay to determine the presence of R5 and/or non-R5 HIV-1 variants within the viral population. Results were compared with (i) the original version of Trofile (Monogram Biosciences, San Francisco, CA), (ii) population sequencing, and (iii) 454 pyrosequencing, with the genotypic data analyzed using several bioinformatics tools, i.e., the 11/24/25 rule, Geno2Pheno (2% to 5.75%, 3.5%, or 10% false-positive rate [FPR]), and webPSSM. VERITROP consistently detected minority non-R5 variants from clinical specimens, with an analytical sensitivity of 0.3%, with viral loads of ≥1,000 copies/ml, and from B and non-B subtypes. In a pilot study, a 73.7% (56/76) concordance was observed with the original Trofile assay, with 19 of the 20 discordant results corresponding to non-R5 variants detected using VERITROP and not by the original Trofile assay. The degree of concordance of VERITROP and Trofile with population and deep sequencing results depended on the algorithm used to determine HIV-1 coreceptor tropism. Overall, VERITROP showed better concordance with deep sequencing/Geno2Pheno at a 0.3% detection threshold (67%), whereas Trofile matched better with population sequencing (79%). 
However, 454 sequencing using Geno2Pheno at a 10% FPR and 0.3% threshold and VERITROP more accurately predicted the success of a maraviroc-based regimen. In conclusion, VERITROP may promote the development of new HIV coreceptor antagonists and aid in the treatment and management of HIV-infected individuals prior to and/or during treatment with this class of drugs.

  19. Data file of a deep proteome analysis of the prefrontal cortex in aged mice with progranulin deficiency or neuronal overexpression of progranulin.

    PubMed

    Heidler, Juliana; Hardt, Stefanie; Wittig, Ilka; Tegeder, Irmgard

    2016-12-01

    Progranulin deficiency is associated with neurodegeneration in humans and in mice. The mechanisms likely involve progranulin-promoted removal of protein waste via autophagy. We performed a deep proteomic screen of the prefrontal cortex in aged (13-15 months) female progranulin-deficient mice (GRN-/-) and mice with inducible neuron-specific overexpression of progranulin (SLICK-GRN-OE) versus the respective control mice. Proteins were extracted and analyzed by liquid chromatography/mass spectrometry (LC/MS) on a Thermo Scientific™ Q Exactive Plus equipped with an ultra-high performance liquid chromatography unit and a Nanospray Flex Ion-Source. Full Scan MS-data were acquired using Xcalibur and raw files were analyzed using the proteomics software MaxQuant. The mouse reference proteome set from UniProt (June 2015) was used to identify peptides and proteins. The DiB data file is a reduced MaxQuant output and includes peptide and protein identification, accession numbers, protein and gene names, sequence coverage and label-free quantification (LFQ) values of each sample. Differences in protein expression in genotypes are presented in "Progranulin overexpression in sensory neurons attenuates neuropathic pain in mice: Role of autophagy" (C. Altmann, S. Hardt, C. Fischer, J. Heidler, H.Y. Lim, A. Haussler, B. Albuquerque, B. Zimmer, C. Moser, C. Behrends, F. Koentgen, I. Wittig, M.H. Schmidt, A.M. Clement, T. Deller, I. Tegeder, 2016) [1].

  20. A Statistical Guide to the Design of Deep Mutational Scanning Experiments

    PubMed Central

    Matuszewski, Sebastian; Hildebrandt, Marcel E.; Ghenu, Ana-Hermina; Jensen, Jeffrey D.; Bank, Claudia

    2016-01-01

    The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates. PMID:27412710
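
    Estimating a fitness effect from relative abundance across time points can be sketched as a log-ratio regression. This is a common textbook estimator used here as an assumption; it is not the formula derived in the paper, and the function name and inputs are hypothetical.

```python
import math

def selection_coefficient(times, mutant_counts, wildtype_counts):
    """Least-squares slope of ln(mutant / wild-type) against time.

    Under a simple exponential-growth assumption, the slope estimates the
    selection coefficient of the mutant relative to the wild type.
    """
    y = [math.log(m / w) for m, w in zip(mutant_counts, wildtype_counts)]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(y) / n
    # Ordinary least-squares slope: covariance(t, y) / variance(t).
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(times, y))
    return sxy / sxx
```

    Because the slope's variance falls as the sampled time points spread out, this sketch is consistent with the abstract's point that adding and spacing time points improves precision more than extra sequencing depth does.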

  1. Assessment of clinical analytical sensitivity and specificity of next-generation sequencing for detection of simple and complex mutations.

    PubMed

    Chin, Ephrem L H; da Silva, Cristina; Hegde, Madhuri

    2013-02-19

    Detecting mutations in disease genes by full gene sequence analysis is common in clinical diagnostic laboratories. Sanger dideoxy terminator sequencing allows for rapid development and implementation of sequencing assays in the clinical laboratory, but it has limited throughput, and due to cost constraints, only allows analysis of one or at most a few genes in a patient. Next-generation sequencing (NGS), on the other hand, has evolved rapidly, although to date it has mainly been used for large-scale genome sequencing projects and is beginning to be used in the clinical diagnostic testing. One advantage of NGS is that many genes can be analyzed easily at the same time, allowing for mutation detection when there are many possible causative genes for a specific phenotype. In addition, regions of a gene typically not tested for mutations, like deep intronic and promoter mutations, can also be detected. Here we use 20 previously characterized Sanger-sequenced positive controls in disease-causing genes to demonstrate the utility of NGS in a clinical setting using standard PCR based amplification to assess the analytical sensitivity and specificity of the technology for detecting all previously characterized changes (mutations and benign SNPs). The positive controls chosen for validation range from simple substitution mutations to complex deletion and insertion mutations occurring in autosomal dominant and recessive disorders. The NGS data was 100% concordant with the Sanger sequencing data identifying all 119 previously identified changes in the 20 samples. We have demonstrated that NGS technology is ready to be deployed in clinical laboratories. However, NGS and associated technologies are evolving, and clinical laboratories will need to invest significantly in staff and infrastructure to build the necessary foundation for success.

  2. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads.

    PubMed

    Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo; Zhu, Shilin; Shi, Daihu; McDill, Joshua; Yang, Linfeng; Hawkins, Simon; Neutelings, Godfrey; Datla, Raju; Lambert, Georgina; Galbraith, David W; Grassa, Christopher J; Geraldes, Armando; Cronk, Quentin C; Cullis, Christopher; Dash, Prasanta K; Kumar, Polumetla A; Cloutier, Sylvie; Sharpe, Andrew G; Wong, Gane K-S; Wang, Jun; Deyholos, Michael K

    2012-11-01

    Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, comprised exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N(50) = 694 kb, including contigs with N(50) = 20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs, and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (K(s)) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species. © 2012 The Authors. The Plant Journal © 2012 Blackwell Publishing Ltd.
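
    The N(50) statistic reported for the scaffolds and contigs can be computed as in the following minimal sketch. The function name is illustrative; the definition used is the standard one (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly).

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)   # longest first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:                   # half the assembly is covered
            return length
    raise ValueError("lengths must be non-empty")
```

    For example, lengths of 2 through 10 sum to 54, and the three longest sequences (10, 9, 8) already reach 27, so the N50 is 8.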

  3. Is Multitask Deep Learning Practical for Pharma?

    PubMed

    Ramsundar, Bharath; Liu, Bowen; Wu, Zhenqin; Verras, Andreas; Tudor, Matthew; Sheridan, Robert P; Pande, Vijay

    2017-08-28

    Multitask deep learning has emerged as a powerful tool for computational drug discovery. However, despite a number of preliminary studies, multitask deep networks have yet to be widely deployed in the pharmaceutical and biotech industries. This lack of acceptance stems from both software difficulties and lack of understanding of the robustness of multitask deep networks. Our work aims to resolve both of these barriers to adoption. We introduce a high-quality open-source implementation of multitask deep networks as part of the DeepChem open-source platform. Our implementation enables simple python scripts to construct, fit, and evaluate sophisticated deep models. We use our implementation to analyze the performance of multitask deep networks and related deep models on four collections of pharmaceutical data (three of which have not previously been analyzed in the literature). We split these data sets into train/valid/test using time and neighbor splits to test multitask deep learning performance under challenging conditions. Our results demonstrate that multitask deep networks are surprisingly robust and can offer strong improvement over random forests. Our analysis and open-source implementation in DeepChem provide an argument that multitask deep networks are ready for widespread use in commercial drug discovery.

  4. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whitehead, Timothy A.; Chevalier, Aaron; Song, Yifan

    2012-06-19

    We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.

  5. A ruby-colored Pseudobaeospora species is described as new from material collected on the island of Hawaii.

    PubMed

    Desjardin, Dennis E; Hemmes, Don E; Perry, Brian A

    2014-01-01

    Pseudobaeospora wipapatiae is described as new based on material collected in alien wet habitats on the island of Hawaii. Unique features of this beautiful species include deep ruby-colored basidiomes with two-spored basidia, amyloid cheilocystidia and a hymeniderm pileipellis with abundant pileocystidia that is initially deep ruby in KOH then changes to lilac gray. Phylogenetic analysis of nuclear large ribosomal subunit sequence data suggest a close relationship between Pseudobaeospora and Tricholoma. BLAST comparisons of internal transcribed spacer and 5.8S nuclear ribosomal subunit regions sequence data reveal greatest similarity with existing sequences of Pseudobaeospora species. A comprehensive description, color photograph, illustrations of salient micromorphological features and comparisons with phenetically similar taxa are provided. © 2014 by The Mycological Society of America.

  6. Analysis of fractures from borehole televiewer logs in a 500m deep hole at Xiaguan, Yunnan province, Southwest China

    USGS Publications Warehouse

    Zhai, Qingshan; Springer, J.E.; Zoback, M.D.

    1990-01-01

    Fractures from a 500 m deep hole in the Red River fault zone were analyzed using an ultrasonic borehole televiewer. Four hundred and eighty individual fractures were identified between 19 m and 465 m depth. Fracture frequency had no apparent relation to the major stratigraphic units and did not change systematically with depth. Fracture orientation, however, did change with stratigraphic position. The borehole intersected 14 m of Cenozoic deposits, 363 m of lower Ordovician clastic sediments, and 106 m of older ultramafic intrusions. The clastic sequence was encountered again at a depth of 484 m, suggesting a large fault displacement. Fractures in the top 162 m of the sedimentary section appear randomly distributed. Below that depth, they are steeply dipping with northerly and north-westerly strikes, parallel to the major active faults in the region. Fractures in the ultramafic section strike roughly east-west and are steeply dipping. These orientations are confined to the ultramafic section and are parallel to an older, inactive regional fault set. © 1990.

  7. Analysis of Ribosome Stalling and Translation Elongation Dynamics by Deep Learning.

    PubMed

    Zhang, Sai; Hu, Hailin; Zhou, Jingtian; He, Xuan; Jiang, Tao; Zeng, Jianyang

    2017-09-27

    Ribosome stalling is manifested by the local accumulation of ribosomes at specific codon positions of mRNAs. Here, we present ROSE, a deep learning framework to analyze high-throughput ribosome profiling data and estimate the probability of a ribosome stalling event occurring at each genomic location. Extensive validation tests on independent data demonstrated that ROSE possessed higher prediction accuracy than conventional prediction models, with an increase in the area under the receiver operating characteristic curve by up to 18.4%. In addition, genome-wide statistical analyses showed that ROSE predictions can be well correlated with diverse putative regulatory factors of ribosome stalling. Moreover, the genome-wide ribosome stalling landscapes of both human and yeast computed by ROSE recovered the functional interplays between ribosome stalling and cotranslational events in protein biogenesis, including protein targeting by the signal recognition particles and protein secondary structure formation. Overall, our study provides a novel method to complement the ribosome profiling techniques and further decipher the complex regulatory mechanisms underlying translation elongation dynamics encoded in the mRNA sequence. Copyright © 2017 Elsevier Inc. All rights reserved.
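
    The ROSE record reports its gain as an increase in the area under the receiver operating characteristic curve (AUC). AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below is a minimal Python illustration of that pairwise definition; it is not code from ROSE, and the function name, labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (rank) definition.

    labels: iterable of 0/1 class labels (1 = positive).
    scores: iterable of predicted scores, higher = more positive.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores, for illustration only.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

    The quadratic loop is kept for clarity; for large data sets a sort-based implementation would be the usual choice.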

  8. Deep Sequencing of Influenza A Virus from a Human Challenge Study Reveals a Selective Bottleneck and Only Limited Intrahost Genetic Diversification.

    PubMed

    Sobel Leonard, Ashley; McClain, Micah T; Smith, Gavin J D; Wentworth, David E; Halpin, Rebecca A; Lin, Xudong; Ransier, Amy; Stockwell, Timothy B; Das, Suman R; Gilbert, Anthony S; Lambkin-Williams, Robert; Ginsburg, Geoffrey S; Woods, Christopher W; Koelle, Katia

    2016-12-15

    Knowledge of influenza virus evolution at the point of transmission and at the intrahost level remains limited, particularly for human hosts. Here, we analyze a unique viral data set of next-generation sequencing (NGS) samples generated from a human influenza challenge study wherein 17 healthy subjects were inoculated with cell- and egg-passaged virus. Nasal wash samples collected from 7 of these subjects were successfully deep sequenced. From these, we characterized changes in the subjects' viral populations during infection and identified differences between the virus in these samples and the viral stock used to inoculate the subjects. We first calculated pairwise genetic distances between the subjects' nasal wash samples, the viral stock, and the influenza virus A/Wisconsin/67/2005 (H3N2) reference strain used to generate the stock virus. These distances revealed that considerable viral evolution occurred at various points in the human challenge study. Further quantitative analyses indicated that (i) the viral stock contained genetic variants that originated and likely were selected for during the passaging process, (ii) direct intranasal inoculation with the viral stock resulted in a selective bottleneck that reduced nonsynonymous genetic diversity in the viral hemagglutinin and nucleoprotein, and (iii) intrahost viral evolution continued over the course of infection. These intrahost evolutionary dynamics were dominated by purifying selection. Our findings indicate that rapid viral evolution can occur during acute influenza infection in otherwise healthy human hosts when the founding population size of the virus is large, as is the case with direct intranasal inoculation. Influenza viruses circulating among humans are known to rapidly evolve over time. However, little is known about how influenza virus evolves across single transmission events and over the course of a single infection. 
To address these issues, we analyze influenza virus sequences from a human challenge experiment that initiated infection with a cell- and egg-passaged viral stock, which appeared to have adapted during its preparation. We find that the subjects' viral populations differ genetically from the viral stock, with subjects' viral populations having lower representation of the amino-acid-changing variants that arose during viral preparation. We also find that most of the viral evolution occurring over single infections is characterized by further decreases in the frequencies of these amino-acid-changing variants and that only limited intrahost genetic diversification through new mutations is apparent. Our findings indicate that influenza virus populations can undergo rapid genetic changes during acute human infections. Copyright © 2016 Sobel Leonard et al.

  9. DeepSig: deep learning improves signal peptide detection in proteins.

    PubMed

    Savojardo, Castrense; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita

    2018-05-15

    The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization. Here, we present DeepSig, an improved approach for signal peptide detection and cleavage-site prediction based on deep learning methods. Comparative benchmarks performed on an updated independent dataset of proteins show that DeepSig is the current best performing method, scoring better than other available state-of-the-art approaches on both signal peptide detection and precise cleavage-site identification. DeepSig is available as both a standalone program and a web server at https://deepsig.biocomp.unibo.it. All datasets used in this study can be obtained from the same website. Contact: pierluigi.martelli@unibo.it. Supplementary data are available at Bioinformatics online.

  10. Deep Sequencing Reveals the Complete Genome and Evidence for Transcriptional Activity of the First Virus-Like Sequences Identified in Aristotelia chilensis (Maqui Berry)

    PubMed Central

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-01-01

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242

  11. High Diversity of Myocyanophage in Various Aquatic Environments Revealed by High-Throughput Sequencing of Major Capsid Protein Gene With a New Set of Primers.

    PubMed

    Hou, Weiguo; Wang, Shang; Briggs, Brandon R; Li, Gaoyuan; Xie, Wei; Dong, Hailiang

    2018-01-01

    Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.

  12. High Diversity of Myocyanophage in Various Aquatic Environments Revealed by High-Throughput Sequencing of Major Capsid Protein Gene With a New Set of Primers

    PubMed Central

    Hou, Weiguo; Wang, Shang; Briggs, Brandon R.; Li, Gaoyuan; Xie, Wei; Dong, Hailiang

    2018-01-01

    Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.

  13. Molecular diversity and distribution pattern of ciliates in sediments from deep-sea hydrothermal vents in the Okinawa Trough and adjacent sea areas

    NASA Astrophysics Data System (ADS)

    Zhao, Feng; Xu, Kuidong

    2016-10-01

    In comparison with the macrobenthos and prokaryotes, patterns of diversity and distribution of microbial eukaryotes in deep-sea hydrothermal vents are poorly known. The widely used high-throughput sequencing of 18S rDNA has revealed a high diversity of microeukaryotes yielded from both living organisms and buried DNA in marine sediments. More recently, cDNA surveys have been utilized to uncover the diversity of active organisms. However, both methods have never been used to evaluate the diversity of ciliates in hydrothermal vents. By using high-throughput DNA and cDNA sequencing of 18S rDNA, we evaluated the molecular diversity of ciliates, a representative group of microbial eukaryotes, from the sediments of deep-sea hydrothermal vents in the Okinawa Trough and compared it with that of an adjacent deep-sea area about 15 km away and that of an offshore area of the Yellow Sea about 500 km away. The results of DNA sequencing showed that Spirotrichea and Oligohymenophorea were the most diverse and abundant groups in all the three habitats. The proportion of sequences of Oligohymenophorea was the highest in the hydrothermal vents whereas Spirotrichea was the most diverse group at all three habitats. Plagiopyleans were found only in the hydrothermal vents but with low diversity and abundance. By contrast, the cDNA sequencing showed that Plagiopylea was the most diverse and most abundant group in the hydrothermal vents, followed by Spirotrichea in terms of diversity and Oligohymenophorea in terms of relative abundance. A novel group of ciliates, distinctly separate from the 12 known classes, was detected in the hydrothermal vents, indicating undescribed, possibly highly divergent ciliates may inhabit this environment. 
Statistical analyses showed that: (i) the three habitats differed significantly from one another in terms of diversity of both the rare and the total ciliate taxa, and; (ii) the adjacent deep sea was more similar to the offshore area than to the hydrothermal vents. In terms of the diversity of abundant taxa, however, there was no significant difference between the hydrothermal vents and the adjacent deep sea, both of which differed significantly from the offshore area. As abundant ciliate taxa can be found in several sampling sites, they are likely adapted to large environmental variations, while rare taxa are found in specific habitat and thus are potentially more sensitive to varying environmental conditions.

  14. Role of Mitochondrial Inheritance on Prostate Cancer Outcome in African American Men. Addendum

    DTIC Science & Technology

    2016-11-01

    DNA sequencing technique developed by our collaborator using single amplicon long-range PCR that permits deep coverage (10,000-20,000X on average) of...the mitochondrial genome. We have sequenced 652 samples derived from frozen fully using this technology. The additional DNA samples derived from...paraffin embedded (FFPE) tissue were more challenging, but have now been sequenced. Mapping of DNA variants in our sequenced genomes to mitochondrial

  15. Deep Sequencing Reveals a Divergent Ugandan cassava brown streak virus Isolate from Malawi

    PubMed Central

    Winter, Stephan; Mukasa, Settumba; Tairo, Fred; Sseruwagi, Peter; Ndunguru, Joseph; Duffy, Siobain

    2017-01-01

    Illumina sequencing of RNA from a cassava cutting from northern Malawi produced a genome of Ugandan cassava brown streak virus (UCBSV-MW-NB7_2013). Sequence comparisons revealed stronger similarity to an isolate from nearby Tanzania (93.4% pairwise nucleotide identity) than to those previously reported from Malawi (86.9 to 87.0%). PMID:28818908
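
    The pairwise nucleotide identity figures in this record are percentages of matching positions between two aligned sequences. The following minimal Python sketch shows one way such a percentage can be computed. It assumes the sequences are already aligned to equal length and that columns containing a gap are skipped, which is only one of several conventions; the record does not state which was used. The function name and example sequences are hypothetical.

```python
def pairwise_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two pre-aligned sequences.

    Columns where either sequence has a gap are ignored (an assumed
    convention; other tools count gaps as mismatches).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
print(pairwise_identity("ACGTACGTAC", "ACGTACGTAA"))
```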

  16. High-Throughput SNP Discovery through Deep Resequencing of a Reduced Representation Library to Anchor and Orient Scaffolds in the Soybean Whole Genome Sequence

    USDA-ARS's Scientific Manuscript database

    The soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy but only properly oriented 66% of the sequence scaffolds. To find additional single nucleotide polymorphism (SNP) markers for additiona...

  17. Genealogy-based methods for inference of historical recombination and gene flow and their application in Saccharomyces cerevisiae.

    PubMed

    Jenkins, Paul A; Song, Yun S; Brem, Rachel B

    2012-01-01

    Genetic exchange between isolated populations, or introgression between species, serves as a key source of novel genetic material on which natural selection can act. While detecting historical gene flow from DNA sequence data is of much interest, many existing methods can be limited by requirements for deep population genomic sampling. In this paper, we develop a scalable genealogy-based method to detect candidate signatures of gene flow into a given population when the source of the alleles is unknown. Our method does not require sequenced samples from the source population, provided that the alleles have not reached fixation in the sampled recipient population. The method utilizes recent advances in algorithms for the efficient reconstruction of ancestral recombination graphs, which encode genealogical histories of DNA sequence data at each site, and is capable of detecting the signatures of gene flow whose footprints are of length up to single genes. Further, we employ a theoretical framework based on coalescent theory to test for statistical significance of certain recombination patterns consistent with gene flow from divergent sources. Implementing these methods for application to whole-genome sequences of environmental yeast isolates, we illustrate the power of our approach to highlight loci with unusual recombination histories. By developing innovative theory and methods to analyze signatures of gene flow from population sequence data, our work establishes a foundation for the continued study of introgression and its evolutionary relevance.

  18. Genealogy-Based Methods for Inference of Historical Recombination and Gene Flow and Their Application in Saccharomyces cerevisiae

    PubMed Central

    Jenkins, Paul A.; Song, Yun S.; Brem, Rachel B.

    2012-01-01

    Genetic exchange between isolated populations, or introgression between species, serves as a key source of novel genetic material on which natural selection can act. While detecting historical gene flow from DNA sequence data is of much interest, many existing methods can be limited by requirements for deep population genomic sampling. In this paper, we develop a scalable genealogy-based method to detect candidate signatures of gene flow into a given population when the source of the alleles is unknown. Our method does not require sequenced samples from the source population, provided that the alleles have not reached fixation in the sampled recipient population. The method utilizes recent advances in algorithms for the efficient reconstruction of ancestral recombination graphs, which encode genealogical histories of DNA sequence data at each site, and is capable of detecting the signatures of gene flow whose footprints are of length up to single genes. Further, we employ a theoretical framework based on coalescent theory to test for statistical significance of certain recombination patterns consistent with gene flow from divergent sources. Implementing these methods for application to whole-genome sequences of environmental yeast isolates, we illustrate the power of our approach to highlight loci with unusual recombination histories. By developing innovative theory and methods to analyze signatures of gene flow from population sequence data, our work establishes a foundation for the continued study of introgression and its evolutionary relevance. PMID:23226196

  19. Insilico profiling of microRNAs in Korean ginseng (Panax ginseng Meyer)

    PubMed Central

    Mathiyalagan, Ramya; Subramaniyam, Sathiyamoorthy; Natarajan, Sathishkumar; Kim, Yeon Ju; Sun, Myung Suk; Kim, Se Young; Kim, Yu-Jin; Yang, Deok Chun

    2013-01-01

    MicroRNAs (miRNAs) are a class of recently discovered non-coding small RNA molecules, on average approximately 21 nucleotides in length, which underlie numerous important biological roles in gene regulation in various organisms. The miRNA database (release 18) has 18,226 miRNAs, which have been deposited from different species. Although miRNAs have been identified and validated in many plant species, no studies have been reported on discovering miRNAs in Panax ginseng Meyer, which is a traditionally known medicinal plant in oriental medicine, also known as Korean ginseng. It has triterpene ginseng saponins called ginsenosides, which are responsible for its various pharmacological activities. Predicting conserved miRNAs by homology-based analysis with available expressed sequence tag (EST) sequences can be powerful, if the species lacks whole genome sequence information. In this study by using the EST based computational approach, 69 conserved miRNAs belonging to 44 miRNA families were identified in Korean ginseng. The digital gene expression patterns of predicted conserved miRNAs were analyzed by deep sequencing using small RNA sequences of flower buds, leaves, and lateral roots. We have found that many of the identified miRNAs showed tissue specific expressions. Using the insilico method, 346 potential targets were identified for the predicted 69 conserved miRNAs by searching the ginseng EST database, and the predicted targets were mainly involved in secondary metabolic processes, responses to biotic and abiotic stress, and transcription regulator activities, as well as a variety of other metabolic processes. PMID:23717176

  20. Carbonate sedimentation in an extensional active margin: Cretaceous history of the Haymana region, Pontides

    NASA Astrophysics Data System (ADS)

    Okay, Aral I.; Altiner, Demir

    2016-10-01

    The Haymana region in Central Anatolia is located in the southern part of the Pontides close to the İzmir-Ankara suture. During the Cretaceous, the region formed part of the south-facing active margin of the Eurasia. The area preserves a nearly complete record of the Cretaceous system. Shallow marine carbonates of earliest Cretaceous age are overlain by a 700-m-thick Cretaceous sequence, dominated by deep marine limestones. Three unconformity-bounded pelagic carbonate sequences of Berriasian, Albian-Cenomanian and Turonian-Santonian ages are recognized: Each depositional sequence is preceded by a period of tilting and submarine erosion during the Berriasian, early Albian and late Cenomanian, which corresponds to phases of local extension in the active continental margin. Carbonate breccias mark the base of the sequences and each carbonate sequence steps down on older units. The deep marine carbonate deposition ended in the late Santonian followed by tilting, erosion and folding during the Campanian. Deposition of thick siliciclastic turbidites started in the late Campanian and continued into the Tertiary. Unlike most forearc basins, the Haymana region was a site of deep marine carbonate deposition until the Campanian. This was because the Pontide arc was extensional and the volcanic detritus was trapped in the intra-arc basins and did not reach the forearc or the trench. The extensional nature of the arc is also shown by the opening of the Black Sea as a backarc basin in the Turonian-Santonian. The carbonate sedimentation in an active margin is characterized by synsedimentary vertical displacements, which results in submarine erosion, carbonate breccias and in the lateral discontinuity of the sequences, and differs from blanket like carbonate deposition in the passive margins.

  1. Magnetostratigraphy of the impact breccias and post-impact carbonates from borehole Yaxcopoil-1, Chicxulub impact crater, Yucatán, Mexico

    NASA Astrophysics Data System (ADS)

    Rebolledo-Vieyra, Mario; Urrutia-Fucugauchi, Jaime

    2004-06-01

    We report the magnetostratigraphy of the sedimentary sequence between the impact breccias and the post-impact carbonate sequence conducted on samples recovered by Yaxcopoil-1 (Yax-1). Samples of impact breccias show reverse polarities that span up to ~56 cm into the postimpact carbonate lithologies. We correlate these breccias to those of PEMEX boreholes Yucatán-6 and Chicxulub-1, from which we tied our magnetostratigraphy to the radiometric age from a melt sample from the Yucatán-6 borehole. Thin section analyses of the carbonate samples showed a significant amount of dark minerals and glass shards that we identified as the magnetic carriers; therefore, we propose that the mechanism of magnetic acquisition within the carbonate rocks for the interval studied is detrital remanent magnetism (DRM). With these samples, we constructed the scale of geomagnetic polarities where we find two polarities within the sequence, a reverse polarity event within the impact breccias and the base of the post-impact carbonate sequence (up to 794.07 m), and a normal polarity event in the last ~20 cm of the interval studied. The polarities recorded in the sequence analyzed are interpreted to span from chron 29r to 29n, and we propose that the reverse polarity event lies within the 29r chron. The magnetostratigraphy of the sequence studied shows that the horizon at 794.11 m deep, interpreted as the K/T boundary, lies within the geomagnetic chron 29r, which contains the K/T boundary.

  2. Electro-thermo-mechanical coupling analysis of deep drawing with resistance heating for aluminum matrix composites sheet

    NASA Astrophysics Data System (ADS)

    Zhang, Kaifeng; Zhang, Tuoda; Wang, Bo

    2013-05-01

    Recently, electro-plastic forming has become a focus of attention in materials hot processing research, because it is an energy-saving, highly efficient and green manufacturing technology. An electro-thermo-mechanical model can be adopted to carry out the sequential simulation of aluminum matrix composite sheet deep drawing via electro-thermal coupling and thermal-mechanical coupling methods. The first step of the process is resistance heating of the sheet; the power is then turned off, and the second step is deep drawing. The temperature distribution of the SiCp/2024Al composite sheet under resistance heating and the sheet deep drawing deformation were analyzed. During the simulation, the effects of contact resistances and of the temperature coefficients of resistance of the electrode material and the SiCp/2024Al composite on the temperature distribution were considered together. The simulation results demonstrate that the SiCp/2024Al composite sheet can be rapidly heated to 400° in 30 s using resistance heating and that the sheet temperature can be controlled by adjusting the current density. Physical properties of the electrode materials can significantly affect the composite sheet temperature distribution. The temperature difference between the center and the side of the sheet is proportional to the thermal conductivity of the electrode, principally because heat transfers from the sheet to the electrode. A SiCp/2024Al thin-wall part can be manufactured intact at a strain rate of 0.08 s^-1 with the sheet thickness thinning rate limited to within 20%, which corresponds well to the experimental result.

  3. Mitochondrial DNA Analyses Indicate High Diversity, Expansive Population Growth and High Genetic Connectivity of Vent Copepods (Dirivultidae) across Different Oceans

    PubMed Central

    Kihara, Terue C.; Laurent, Stefan; Kodami, Sahar; Martinez Arbizu, Pedro

    2016-01-01

    Communities in spatially fragmented deep-sea hydrothermal vents rich in polymetallic sulfides could soon face major disturbance events due to deep-sea mineral mining, such that unraveling patterns of gene flow between hydrothermal vent populations will be an important step in the development of conservation policies. Indeed, the time required by deep-sea populations to recover following habitat perturbations depends both on the direction of gene flow and the number of migrants available for re-colonization after disturbance. In this study we compare nine dirivultid copepod species across various geological settings. We analyze partial nucleotide sequences of the mtCOI gene and use divergence estimates (FST) and haplotype networks to infer intraspecific population connectivity between vent sites. Furthermore, we evaluate contrasting scenarios of demographic population expansion/decline versus constant population size (using, for example, Tajima’s D). Our results indicate high diversity, population expansion and high connectivity of all copepod populations in all oceans. For example, haplotype diversity values range from 0.89 to 1 and FST values range from 0.001 to 0.11 for Stygiopontius species from the Central Indian Ridge, Mid Atlantic Ridge, East Pacific Rise, and Eastern Lau Spreading Center. We suggest that great abundance and high site occupancy by these species favor high genetic diversity. Two scenarios both showed similarly high connectivity: fast spreading centers with little distance between vent fields and slow spreading centers with greater distance between fields. This unexpected result may be due to some distinct frequency of natural disturbance events, or to aspects of individual life histories that affect realized rates of dispersal. However, our statistical performance analyses showed that at least 100 genomic regions should be sequenced to ensure accurate estimates of migration rate. Our demography parameters demonstrate that dirivultid populations are generally large and continuously undergoing population growth. Benthic and pelagic species abundance data support these findings. PMID:27732624

  4. Contrasting genomic properties of free-living and particle-attached microbial assemblages within a coastal ecosystem

    PubMed Central

    Smith, Maria W.; Zeigler Allen, Lisa; Allen, Andrew E.; Herfort, Lydie; Simon, Holly M.

    2013-01-01

    The Columbia River (CR) is a powerful economic and environmental driver in the US Pacific Northwest. Microbial communities in the water column were analyzed from four diverse habitats: (1) an estuarine turbidity maximum (ETM), (2) a chlorophyll maximum of the river plume, (3) an upwelling-associated hypoxic zone, and (4) the deep ocean bottom. Three size fractions, 0.1–0.8, 0.8–3, and 3–200 μm were collected for each habitat in August 2007, and used for DNA isolation and 454 sequencing, resulting in 12 metagenomes of >5 million reads (>1.6 Gbp). To characterize the dominant microorganisms and metabolisms contributing to coastal biogeochemistry, we used predicted peptide and rRNA data. The 3- and 0.8-μm metagenomes, representing particulate fractions, were taxonomically diverse across habitats. The 3-μm size fractions contained a high abundance of eukaryota with diatoms dominating the hypoxic water and plume, while cryptophytes were more abundant in the ETM. The 0.1-μm metagenomes represented mainly free-living bacteria and archaea. The most abundant archaeal hits were observed in the deep ocean and hypoxic water (19% of prokaryotic peptides in the 0.1-μm metagenomes), and were homologous to Nitrosopumilus maritimus (ammonia-oxidizing Thaumarchaeota). Bacteria dominated metagenomes of all samples. In the euphotic zone (estuary, plume and hypoxic ocean), the most abundant bacterial taxa (≥40% of prokaryotic peptides) represented aerobic photoheterotrophs. In contrast, the low-oxygen, deep water metagenome was enriched with sequences for strict and facultative anaerobes. Interestingly, many of the same anaerobic bacterial families were enriched in the 3-μm size fraction of the ETM (2–10X more abundant relative to the 0.1-μm metagenome), indicating possible formation of anoxic microniches within particles. Results from this study provide a metagenome perspective on ecosystem-scale metabolism in an upwelling-influenced river-dominated coastal margin. PMID:23750156

  5. Mitochondrial DNA Analyses Indicate High Diversity, Expansive Population Growth and High Genetic Connectivity of Vent Copepods (Dirivultidae) across Different Oceans.

    PubMed

    Gollner, Sabine; Stuckas, Heiko; Kihara, Terue C; Laurent, Stefan; Kodami, Sahar; Martinez Arbizu, Pedro

    2016-01-01

    Communities in spatially fragmented deep-sea hydrothermal vents rich in polymetallic sulfides could soon face major disturbance events due to deep-sea mineral mining, such that unraveling patterns of gene flow between hydrothermal vent populations will be an important step in the development of conservation policies. Indeed, the time required by deep-sea populations to recover following habitat perturbations depends both on the direction of gene flow and the number of migrants available for re-colonization after disturbance. In this study we compare nine dirivultid copepod species across various geological settings. We analyze partial nucleotide sequences of the mtCOI gene and use divergence estimates (FST) and haplotype networks to infer intraspecific population connectivity between vent sites. Furthermore, we evaluate contrasting scenarios of demographic population expansion/decline versus constant population size (using, for example, Tajima's D). Our results indicate high diversity, population expansion and high connectivity of all copepod populations in all oceans. For example, haplotype diversity values range from 0.89 to 1 and FST values range from 0.001 to 0.11 for Stygiopontius species from the Central Indian Ridge, Mid Atlantic Ridge, East Pacific Rise, and Eastern Lau Spreading Center. We suggest that great abundance and high site occupancy by these species favor high genetic diversity. Two scenarios both showed similarly high connectivity: fast spreading centers with little distance between vent fields and slow spreading centers with greater distance between fields. This unexpected result may be due to some distinct frequency of natural disturbance events, or to aspects of individual life histories that affect realized rates of dispersal. However, our statistical performance analyses showed that at least 100 genomic regions should be sequenced to ensure accurate estimates of migration rate. Our demography parameters demonstrate that dirivultid populations are generally large and continuously undergoing population growth. Benthic and pelagic species abundance data support these findings.
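
    The haplotype diversity values quoted in this abstract (0.89 to 1) can be illustrated with a minimal sketch. It assumes Nei's unbiased estimator, h = n/(n-1) * (1 - sum of squared haplotype frequencies); the abstract does not say which estimator the authors used, and the haplotype labels below are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2)).

    `haplotypes` is a list with one haplotype label (or sequence string)
    per sampled individual. This estimator is an assumption; the abstract
    does not name the one the authors used.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


# Hypothetical samples: every individual distinct vs. two shared haplotypes.
all_distinct = ["h1", "h2", "h3", "h4"]
two_shared = ["h1", "h1", "h2", "h2"]
print(haplotype_diversity(all_distinct))
print(haplotype_diversity(two_shared))
```

    A sample in which every individual carries a different haplotype gives the maximum value of 1, which is the upper end of the range reported above.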

  6. BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone.

    PubMed

    Yang, Bite; Liu, Feng; Ren, Chao; Ouyang, Zhangyi; Xie, Ziwei; Bo, Xiaochen; Shu, Wenjie

    2017-07-01

    Enhancer elements are noncoding stretches of DNA that play key roles in controlling gene expression programmes. Despite major efforts to develop accurate enhancer prediction methods, identifying enhancer sequences continues to be a challenge in the annotation of mammalian genomes. One of the major issues is the lack of large, sufficiently comprehensive and experimentally validated enhancers for humans or other species. Thus, the development of computational methods based on limited experimentally validated enhancers and deciphering the transcriptional regulatory code encoded in the enhancer sequences is urgent. We present a deep-learning-based hybrid architecture, BiRen, which predicts enhancers using the DNA sequence alone. Our results demonstrate that BiRen can learn common enhancer patterns directly from the DNA sequence and exhibits superior accuracy, robustness and generalizability in enhancer prediction relative to other state-of-the-art enhancer predictors based on sequence characteristics. Our BiRen will enable researchers to acquire a deeper understanding of the regulatory code of enhancer sequences. Our BiRen method can be freely accessed at https://github.com/wenjiegroup/BiRen. Supplementary data are available at Bioinformatics online.

  7. Brain tumor classification of microscopy images using deep residual learning

    NASA Astrophysics Data System (ADS)

    Ishikawa, Yota; Washiya, Kiyotada; Aoki, Kota; Nagahashi, Hiroshi

    2016-12-01

    The incidence of brain tumors is about 1.4 per 10,000. In general, cytotechnologists are responsible for cytologic diagnosis. However, because the task requires highly specialized skill, there are not enough cytotechnologists who can diagnose brain tumors. Computer-aided diagnosis by computational image analysis may relieve the shortage of experts and support objective pathological examinations. Our purpose is to support diagnosis from a microscopy image of the brain cortex and to identify brain tumors by medical image processing. In this study, we analyze astrocytes, a type of glial cell of the central nervous system. It is not easy even for an expert to discriminate brain tumors correctly, since the difference between astrocytes and low-grade astrocytoma (a tumor formed from astrocytes) is very slight. We present a novel method to segment cell regions robustly using BING objectness estimation and to classify brain tumors using deep convolutional neural networks (CNNs) constructed by deep residual learning. BING is a fast object detection method, and we use a pretrained BING model to detect brain cells. We then apply a sequence of post-processing steps, such as Voronoi diagrams, binarization, and the watershed transform, to obtain a fine segmentation. For classification using CNNs, standard data augmentation is applied to the brain cell database. Experimental results showed 98.5% classification accuracy and 98.2% segmentation accuracy.

  8. Distribution and Diversity of Microbial Eukaryotes in Bathypelagic Waters of the South China Sea.

    PubMed

    Xu, Dapeng; Jiao, Nianzhi; Ren, Rui; Warren, Alan

    2017-05-01

    Little is known about the biodiversity of microbial eukaryotes in the South China Sea, especially in waters at bathyal depths. Here, we employed SSU rDNA gene sequencing to reveal the diversity and community structure across depth and distance gradients in the South China Sea. Vertically, the highest alpha diversity was found at 75-m depth. The communities of microbial eukaryotes were clustered into shallow-, middle-, and deep-water groups according to the depth from which they were collected, indicating a depth-related diversity and distribution pattern. Rhizaria sequences dominated the microeukaryote community and occurred in all samples except those from depths of less than 50 m, being most abundant near the sea floor, where they contributed ca. 64-97% and 40-74% of the total sequences and OTUs recovered, respectively. A large portion of rhizarian OTUs has neither a nearest named neighbor nor a nearest neighbor in the GenBank database, which indicates the presence of new phylotypes in the South China Sea. Given their overwhelming abundance and richness, further phylogenetic analysis of rhizarians was performed, and three new genetic clusters were revealed containing sequences retrieved from the deep waters of the South China Sea. Our results shed light on the diversity and community structure of microbial eukaryotes in this not yet fully explored area.

  9. Metavisitor, a Suite of Galaxy Tools for Simple and Rapid Detection and Discovery of Viruses in Deep Sequence Data

    PubMed Central

    Vernick, Kenneth D.

    2017-01-01

    Metavisitor is a software package that allows biologists and clinicians without specialized bioinformatics expertise to detect and assemble viral genomes from deep sequence datasets. The package is composed of a set of modular bioinformatic tools and workflows that are implemented in the Galaxy framework. Using the graphical Galaxy workflow editor, users with minimal computational skills can use existing Metavisitor workflows or adapt them to suit specific needs by adding or modifying analysis modules. Metavisitor works with DNA, RNA or small RNA sequencing data over a range of read lengths and can use a combination of de novo and guided approaches to assemble genomes from sequencing reads. We show that the software has the potential for quick diagnosis as well as discovery of viruses from a vast array of organisms. Importantly, we provide here executable Metavisitor use cases, which increase the accessibility and transparency of the software, ultimately enabling biologists or clinicians to focus on biological or medical questions. PMID:28045932

  10. Complete genome sequence of the aerobic, heterotroph Marinithermus hydrothermalis type strain (T1T) from a deep-sea hydrothermal vent chimney

    PubMed Central

    Copeland, Alex; Gu, Wei; Yasawong, Montri; Lapidus, Alla; Lucas, Susan; Deshpande, Shweta; Pagani, Ioanna; Tapia, Roxanne; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Pan, Chongle; Brambilla, Evelyne-Marie; Rohde, Manfred; Tindall, Brian J.; Sikorski, Johannes; Göker, Markus; Detter, John C.; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Woyke, Tanja

    2012-01-01

    Marinithermus hydrothermalis Sako et al. 2003 is the type species of the monotypic genus Marinithermus. M. hydrothermalis T1T was the first isolate within the phylum “Thermus-Deinococcus” to exhibit optimal growth under a salinity equivalent to that of sea water and to have an absolute requirement for NaCl for growth. M. hydrothermalis T1T is of interest because it may provide a new insight into the ecological significance of the aerobic, thermophilic decomposers in the circulation of organic compounds in deep-sea hydrothermal vent ecosystems. This is the first completed genome sequence of a member of the genus Marinithermus and the seventh sequence from the family Thermaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,269,167 bp long genome with its 2,251 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:22675595

  11. Brain Tumor Segmentation Using Deep Belief Networks and Pathological Knowledge.

    PubMed

    Zhan, Tianming; Chen, Yi; Hong, Xunning; Lu, Zhenyu; Chen, Yunjie

    2017-01-01

    In this paper, we propose an automatic brain tumor segmentation method based on Deep Belief Networks (DBNs) and pathological knowledge. The proposed method targets gliomas (both low and high grade) imaged in multi-sequence magnetic resonance images (MRIs). First, a novel deep architecture is proposed that combines feature extraction from the multi-sequence intensities with classification to obtain the classification probabilities of each voxel. Then, graph-cut-based optimization is executed on the classification probabilities to strengthen the spatial relationships of voxels. Finally, pathological knowledge of gliomas is applied to remove some false positives. Our method was validated on the Brain Tumor Segmentation Challenge 2012 and 2013 databases (BRATS 2012, 2013). The segmentation results demonstrate that our proposal is competitive with state-of-the-art methods.

  12. The seismic stratigraphy of Okanagan Lake, British Columbia; a record of rapid deglaciation in a deep 'fiord-lake' basin

    NASA Astrophysics Data System (ADS)

    Eyles, Nicholas; Mullins, Henry T.; Hine, Albert C.

    1991-09-01

    This paper presents the first detailed data regarding the newly discovered deep infill of Okanagan Lake. Okanagan Lake (50°00'N, 119°30'W) is 120 km long, ~3-5 km wide and occupies a glacially overdeepened bedrock basin in the southern interior of British Columbia. This basin, and other elongate lakes of the region (e.g. Shuswap, Kootenay, Kalamalka, Canim and Mahood lakes), mark the site of westward flowing ice streams within successive Cordilleran ice sheets. An air gun seismic survey of Okanagan Lake shows that the bedrock floor is nearly 650 m below sea-level, more than 2000 m below the rim of the surrounding plateau. The maximum thickness of Pleistocene sediment in Okanagan Lake basin approaches 800 m. Forty-six seismic reflection traverses and an axial profile show a relatively simple stratigraphy composed of three seismic sequences argued to be no older than the last glacial cycle (< 30 ka). A discontinuous basal unit (sequence I) characterized by large-scale diffractions, and up to 460 m thick, infills the narrow, V-shaped bedrock floor of the basin and is interpreted as a boulder gravel deposited by subglacial meltwaters. Overlying seismic sequence II is composed of two sub-sequences. Sub-sequence IIa is a chaotic to massive facies up to 736 m thick. Lakeshore exposures close to where this unit reaches lake level show deformed and chaotically-bedded glaciolacustrine silts containing gravel lenses and large ice-rafted boulders. The surface topography of this sub-sequence is irregular and in general mimics the form of the underlying bedrock as a result of compaction. This sequence passes laterally into stratified facies (sub-sequence IIb) at the northern end of the basin. Seismic sequence II appears to record rapid ice-proximal dumping of glaciolacustrine silt as the Okanagan glacier backwasted upvalley in a deep lake. A thin (60 m max.) laminated seismic sequence (III) drapes the hummocky surface of sequence II and represents postglacial sedimentation from fan-deltas. The extreme thickness of sequences I and II in Okanagan Lake reflects the focussing of large volumes of meltwater and sediment into the basin during deglaciation; pre-existing sediments that pre-date the last glacial cycle appear to have been completely eroded. Glaciological conditions during sedimentation may have been similar to marine-based outlet glaciers calving in deep water in fiord basins. In contrast to marine settings where icebergs are free to disperse, large volumes of dead ice were trapped within the basin; structural evidence for sedimentation around dead ice blocks has been previously used to argue that the Cordilleran Ice Sheet downwasted in situ. We emphasize, in contrast, the trapping of dead ice left behind by rapidly calving lake-based outlet glaciers.

  13. An application of computer aided requirements analysis to a real time deep space system

    NASA Technical Reports Server (NTRS)

    Farny, A. M.; Morris, R. V.; Hartsough, C.; Callender, E. D.; Teichroew, D.; Chikofsky, E.

    1981-01-01

    The entire procedure of incorporating the requirements and goals of a space flight project into integrated, time ordered sequences of spacecraft commands, is called the uplink process. The Uplink Process Control Task (UPCT) was created to examine the uplink process and determine ways to improve it. The Problem Statement Language/Problem Statement Analyzer (PSL/PSA) designed to assist the designer/analyst/engineer in the preparation of specifications of an information system is used as a supporting tool to aid in the analysis. Attention is given to a definition of the uplink process, the definition of PSL/PSA, the construction of a PSA database, the value of analysis to the study of the uplink process, and the PSL/PSA lessons learned.

  14. dictyExpress: a web-based platform for sequence data management and analytics in Dictyostelium and beyond.

    PubMed

    Stajdohar, Miha; Rosengarten, Rafael D; Kokosar, Janez; Jeran, Luka; Blenkus, Domen; Shaulsky, Gad; Zupan, Blaz

    2017-06-02

    Dictyostelium discoideum, a soil-dwelling social amoeba, is a model for the study of numerous biological processes. Research in the field has benefited mightily from the adoption of next-generation sequencing for genomics and transcriptomics. Dictyostelium biologists now face the widespread challenges of analyzing and exploring high dimensional data sets to generate hypotheses and discover novel insights. We present dictyExpress (2.0), a web application designed for exploratory analysis of gene expression data, as well as data from related experiments such as Chromatin Immunoprecipitation sequencing (ChIP-Seq). The application features visualization modules that include time course expression profiles, clustering, gene ontology enrichment analysis, differential expression analysis and comparison of experiments. All visualizations are interactive and interconnected, such that the selection of genes in one module propagates instantly to visualizations in other modules. dictyExpress currently stores the data from over 800 Dictyostelium experiments and is embedded within a general-purpose software framework for management of next-generation sequencing data. dictyExpress allows users to explore their data in a broader context by reciprocal linking with dictyBase, a repository of Dictyostelium genomic data. In addition, we introduce a companion application called GenBoard, an intuitive graphic user interface for data management and bioinformatics analysis. dictyExpress and GenBoard enable broad adoption of next generation sequencing based inquiries by the Dictyostelium research community. Labs without the means to undertake deep sequencing projects can mine the data available to the public. The entire information flow, from raw sequence data to hypothesis testing, can be accomplished in an efficient workspace. The software framework is generalizable and represents a useful approach for any research community. To encourage wider use, the backend is open-source, available for extension and further development by bioinformaticians and data scientists.

  15. Characterization of the Prokaryotic Diversity in Cold Saline Perennial Springs of the Canadian High Arctic

    PubMed Central

    Perreault, Nancy N.; Andersen, Dale T.; Pollard, Wayne H.; Greer, Charles W.; Whyte, Lyle G.

    2007-01-01

    The springs at Gypsum Hill and Colour Peak on Axel Heiberg Island in the Canadian Arctic originate from deep salt aquifers and are among the few known examples of cold springs in thick permafrost on Earth. The springs discharge cold anoxic brines (7.5 to 15.8% salts), with a mean oxidoreduction potential of −325 mV, and contain high concentrations of sulfate and sulfide. We surveyed the microbial diversity in the sediments of seven springs by denaturing gradient gel electrophoresis (DGGE) and analyzing clone libraries of 16S rRNA genes amplified with Bacteria and Archaea-specific primers. Dendrogram analysis of the DGGE banding patterns divided the springs into two clusters based on their geographic origin. Bacterial 16S rRNA clone sequences from the Gypsum Hill library (spring GH-4) were classified into seven phyla (Actinobacteria, Bacteroidetes, Firmicutes, Gemmatimonadetes, Proteobacteria, Spirochaetes, and Verrucomicrobia); Deltaproteobacteria and Gammaproteobacteria sequences represented half of the clone library. Sequences related to Proteobacteria (82%), Firmicutes (9%), and Bacteroidetes (6%) constituted 97% of the bacterial clone library from Colour Peak (spring CP-1). Most GH-4 archaeal clone sequences (79%) were related to the Crenarchaeota while half of the CP-1 sequences were related to orders Halobacteriales and Methanosarcinales of the Euryarchaeota. Sequences related to the sulfur-oxidizing bacterium Thiomicrospira psychrophila dominated both the GH-4 (19%) and CP-1 (45%) bacterial libraries, and 56 to 76% of the bacterial sequences were from potential sulfur-metabolizing bacteria. These results suggest that the utilization and cycling of sulfur compounds may play a major role in the energy production and maintenance of microbial communities in these unique, cold environments. PMID:17220254

  16. Joint deep shape and appearance learning: application to optic pathway glioma segmentation

    NASA Astrophysics Data System (ADS)

    Mansoor, Awais; Li, Ien; Packer, Roger J.; Avery, Robert A.; Linguraru, Marius George

    2017-03-01

    Automated tissue characterization is one of the major applications of computer-aided diagnosis systems. Deep learning techniques have recently demonstrated impressive performance for the image patch-based tissue characterization. However, existing patch-based tissue classification techniques struggle to exploit the useful shape information. Local and global shape knowledge such as the regional boundary changes, diameter, and volumetrics can be useful in classifying the tissues especially in scenarios where the appearance signature does not provide significant classification information. In this work, we present a deep neural network-based method for the automated segmentation of the tumors referred to as optic pathway gliomas (OPG) located within the anterior visual pathway (AVP; optic nerve, chiasm or tracts) using joint shape and appearance learning. Voxel intensity values of commonly used MRI sequences are generally not indicative of OPG. To be considered an OPG, current clinical practice dictates that some portion of AVP must demonstrate shape enlargement. The method proposed in this work integrates multiple sequence magnetic resonance image (T1, T2, and FLAIR) along with local boundary changes to train a deep neural network. For training and evaluation purposes, we used a dataset of multiple sequence MRI obtained from 20 subjects (10 controls, 10 NF1+OPG). To our best knowledge, this is the first deep representation learning-based approach designed to merge shape and multi-channel appearance data for the glioma detection. In our experiments, mean misclassification errors of 2.39% and 0.48% were observed respectively for glioma and control patches extracted from the AVP. Moreover, an overall dice similarity coefficient of 0.87 ± 0.13 (0.93 ± 0.06 for healthy tissue, 0.78 ± 0.18 for glioma tissue) demonstrates the potential of the proposed method in the accurate localization and early detection of OPG.
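
    The Dice similarity coefficient reported in this abstract is a standard overlap measure, 2|A ∩ B| / (|A| + |B|), between a predicted and a reference segmentation. The following is a minimal sketch on flattened binary masks; the function name and the toy masks are illustrative and are not taken from the paper.

```python
def dice_coefficient(predicted, reference):
    """Dice similarity, 2|A ∩ B| / (|A| + |B|), for two binary masks.

    Both masks are flat sequences of 0/1 (or False/True) of equal length,
    for example the flattened voxels of a segmentation.
    """
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    if total == 0:
        # Convention chosen here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * overlap / total


# Toy example: one of the two predicted voxels is in the reference mask.
predicted_mask = [1, 1, 0, 0]
reference_mask = [1, 0, 0, 0]
print(dice_coefficient(predicted_mask, reference_mask))
```

    A value of 1 means the two masks coincide and 0 means they do not overlap at all, so the reported 0.78 for glioma tissue indicates substantial but imperfect overlap.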

  17. Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees.

    PubMed

    Williams, Philip H; Eyles, Rod; Weiller, Georg

    2012-01-01

    MicroRNAs (miRNAs) are nonprotein coding RNAs between 20 and 22 nucleotides long that attenuate protein production. Different types of sequence data are being investigated for novel miRNAs, including genomic and transcriptomic sequences. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other nonprotein coding sequences. MirTools, mirDeep2, and miRanalyzer require "read count" to be included with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor using a cross-section of different species to accurately predict miRNAs outside the training set. We wanted a system that did not require read-count for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. A miRNA-predictive decision-tree model has been developed by supervised machine learning. It only requires that the corresponding genome or transcriptome is available within a sequence window that includes the precursor candidate so that the required sequence features can be collected. Some of the most critical features for training the predictor are the miRNA:miRNA* duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity based on rigorous testing by leave-one-out validation.

  18. miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments.

    PubMed

    Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M

    2009-07-01

    Next-generation sequencing now allows the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be a high demand for bioinformatics tools to cope with the several gigabytes of sequence data generated in each single deep-sequencing experiment. Given this scene, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and their copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is an especially important point as there are many species with very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, adding links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.
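
    The input described in this abstract, a list of unique reads with their copy numbers, can be produced from raw reads by collapsing identical sequences. The sketch below shows one way to do that; the tab-separated layout and the function names are assumptions for illustration, since the abstract does not specify the exact file format that miRanalyzer expects.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse raw reads into (sequence, copy_number) pairs.

    Pairs are sorted by decreasing copy number, then alphabetically,
    so the output is deterministic.
    """
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_read_counts(read_counts):
    """Render the pairs as 'sequence<TAB>count' lines (assumed layout)."""
    return "\n".join(f"{seq}\t{count}" for seq, count in read_counts)


# Hypothetical small RNA reads, already adapter-trimmed.
raw_reads = ["TGAGGTAGTAGG", "ACGTACGT", "TGAGGTAGTAGG", "TGAGGTAGTAGG"]
print(format_read_counts(collapse_reads(raw_reads)))
```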

  19. Detection of microRNAs in color space.

    PubMed

    Marco, Antonio; Griffiths-Jones, Sam

    2012-02-01

    Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color-space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3′ end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/. Supplementary data are available at Bioinformatics online.
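
    The color-space representation described in this abstract can be illustrated with a short sketch of the commonly documented two-base encoding, in which A, C, G, T are mapped to 0 to 3 and each color is the XOR of two adjacent bases. The function names are illustrative, and this is not the SeqTrimMap procedure itself.

```python
# Two-bit codes for the bases; each color is the XOR of two adjacent codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_color_space(sequence):
    """Return the first base followed by one color (0-3) per adjacent pair."""
    colors = [
        str(BASE_TO_BITS[a] ^ BASE_TO_BITS[b])
        for a, b in zip(sequence, sequence[1:])
    ]
    return sequence[0] + "".join(colors)


def decode_color_space(read):
    """Rebuild the bases from a leading base and the colors that follow it."""
    bases = [read[0]]
    for color in read[1:]:
        previous = BASE_TO_BITS[bases[-1]]
        bases.append(BITS_TO_BASE[previous ^ int(color)])
    return "".join(bases)


encoded = encode_color_space("ATGC")
print(encoded)                       # first base, then three colors
print(decode_color_space(encoded))   # round-trips to the original bases
print(decode_color_space("A013"))    # one wrong color changes every later base
```

    Because each base is decoded from the previous one, a single miscalled color alters all downstream bases, which is one reason reads are usually mapped in color space rather than translated first.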

  20. A Statistical Guide to the Design of Deep Mutational Scanning Experiments.

    PubMed

    Matuszewski, Sebastian; Hildebrandt, Marcel E; Ghenu, Ana-Hermina; Jensen, Jeffrey D; Bank, Claudia

    2016-09-01

    The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates.
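
    As a rough illustration of how relative abundance across time points translates into a fitness estimate, the sketch below fits a least-squares line to the log ratio of mutant to reference read counts over time and takes the slope as the selection coefficient. This log-linear regression is a common simplification assumed here for illustration; it is not necessarily the estimator or the confidence-interval formula derived in the paper, and all names and counts are hypothetical.

```python
import math


def estimate_selection_coefficient(times, mutant_counts, reference_counts):
    """Slope of ln(mutant/reference) against time, by ordinary least squares.

    Assumes exponential growth, so that the log ratio of a mutant to the
    reference changes linearly in time with slope s.
    """
    if not (len(times) == len(mutant_counts) == len(reference_counts)):
        raise ValueError("all inputs must have the same length")
    if len(times) < 2:
        raise ValueError("need at least two time points")
    log_ratios = [math.log(m / r) for m, r in zip(mutant_counts, reference_counts)]
    mean_t = sum(times) / len(times)
    mean_y = sum(log_ratios) / len(log_ratios)
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, log_ratios))
    return sxy / sxx


# Hypothetical noise-free counts generated with s = 0.1 per unit time.
times = [0, 1, 2, 3]
reference = [1000.0] * 4
mutant = [100.0 * math.exp(0.1 * t) for t in times]
print(estimate_selection_coefficient(times, mutant, reference))
```

    With real data the counts carry sampling noise, which is why the abstract's recommendations on the number and placement of time points matter for the precision of the slope.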

  1. An introduction to deep learning on biological sequence data: examples and solutions.

    PubMed

    Jurtz, Vanessa Isabell; Johansen, Alexander Rosenberg; Nielsen, Morten; Almagro Armenteros, Jose Juan; Nielsen, Henrik; Sønderby, Casper Kaae; Winther, Ole; Sønderby, Søren Kaae

    2017-11-15

    Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools in recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy to use libraries for implementation and training of neural networks are the drivers of this development. The use of deep learning has been especially successful in image recognition, and the development of tools, applications and code examples is in most cases centered within this field rather than within biology. Here, we aim to further the development of deep learning methods within biology by providing application examples and ready to apply and adapt code templates. Given such examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively easily be designed and trained to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, protein secondary structure and the binding of peptides to MHC Class II molecules. All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio. Supplementary data are available at Bioinformatics online.

  2. Viral Linkage in HIV-1 Seroconverters and Their Partners in an HIV-1 Prevention Clinical Trial

    PubMed Central

    Campbell, Mary S.; Mullins, James I.; Hughes, James P.; Celum, Connie; Wong, Kim G.; Raugi, Dana N.; Sorensen, Stefanie; Stoddard, Julia N.; Zhao, Hong; Deng, Wenjie; Kahle, Erin; Panteleeff, Dana; Baeten, Jared M.; McCutchan, Francine E.; Albert, Jan; Leitner, Thomas; Wald, Anna; Corey, Lawrence; Lingappa, Jairam R.

    2011-01-01

    Background Characterization of viruses in HIV-1 transmission pairs will help identify biological determinants of infectiousness and evaluate candidate interventions to reduce transmission. Although HIV-1 sequencing is frequently used to substantiate linkage between newly HIV-1 infected individuals and their sexual partners in epidemiologic and forensic studies, viral sequencing is seldom applied in HIV-1 prevention trials. The Partners in Prevention HSV/HIV Transmission Study (ClinicalTrials.gov #NCT00194519) was a prospective randomized placebo-controlled trial that enrolled serodiscordant heterosexual couples to determine the efficacy of genital herpes suppression in reducing HIV-1 transmission; as part of the study analysis, HIV-1 sequences were examined for genetic linkage between seroconverters and their enrolled partners. Methodology/Principal Findings We obtained partial consensus HIV-1 env and gag sequences from blood plasma for 151 transmission pairs and performed deep sequencing of env in some cases. We analyzed sequences with phylogenetic techniques and developed a Bayesian algorithm to evaluate the probability of linkage. For linkage, we required monophyletic clustering between enrolled partners' sequences and a Bayesian posterior probability of ≥50%. Adjudicators classified each seroconversion, finding 108 (71.5%) linked, 40 (26.5%) unlinked, and 3 (2.0%) indeterminate transmissions, with linkage determined by consensus env sequencing in 91 (84%). Male seroconverters had a higher frequency of unlinked transmissions than female seroconverters. The likelihood of transmission from the enrolled partner was related to time on study, with increasing numbers of unlinked transmissions occurring after longer observation periods. Finally, baseline viral load was found to be significantly higher among linked transmitters. 
Conclusions/Significance In this first use of HIV-1 sequencing to establish endpoints in a large clinical trial, more than one-fourth of transmissions were unlinked to the enrolled partner, illustrating the relevance of these methods in the design of future HIV-1 prevention trials in serodiscordant couples. A hierarchy of sequencing techniques, analysis methods, and expert adjudication contributed to the linkage determination process. PMID:21399681
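    The stated linkage criterion and the reported percentages can be checked with a few lines of arithmetic. In the sketch below the function names are hypothetical, the 84% figure is assumed to be relative to the 108 linked pairs, and the rule covers only the two criteria named in the abstract, not the adjudicators' final classification.

```python
def meets_linkage_criteria(monophyletic, posterior_probability, threshold=0.5):
    """Both criteria from the abstract: monophyletic clustering of the
    partners' sequences and a Bayesian posterior probability >= 50%."""
    return bool(monophyletic) and posterior_probability >= threshold


def percent(part, whole, digits=1):
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100.0 * part / whole, digits)


# Counts reported in the abstract.
pairs = 151
outcomes = {"linked": 108, "unlinked": 40, "indeterminate": 3}

shares = {name: percent(count, pairs) for name, count in outcomes.items()}
env_share = percent(91, outcomes["linked"], digits=0)
print(shares, env_share)
```

    The three outcome counts sum to the 151 transmission pairs, and the rounded shares reproduce the 71.5%, 26.5%, and 2.0% given in the abstract.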

  3. Draft Genome Sequence of Thermus scotoductus Strain K1, Isolated from a Geothermal Spring in Karvachar, Nagorno Karabakh

    PubMed Central

    Saghatelyan, Ani; Poghosyan, Lianna

    2015-01-01

    The 2,379,636-bp draft genome sequence of Thermus scotoductus strain K1, isolated from a geothermal spring outlet in the Karvachar region of Nagorno Karabakh, is presented. Strain K1 shares about 80% genome sequence similarity with T. scotoductus strain SA-01, recovered from a deep gold mine in South Africa. PMID:26564055

  4. Crackle pitch and rate do not vary significantly during a single automated-auscultation session in patients with pneumonia, congestive heart failure, or interstitial pulmonary fibrosis.

    PubMed

    Vyshedskiy, Andrey; Ishikawa, Sadamu; Murphy, Raymond L H

    2011-06-01

    To determine the variability of crackle pitch and crackle rate during a single automated-auscultation session with a computerized 16-channel lung-sound analyzer. Forty-nine patients with pneumonia, 52 with congestive heart failure (CHF), and 18 with interstitial pulmonary fibrosis (IPF) performed breathing maneuvers in the following sequence: normal breathing, deep breathing, cough several times; deep breathing, vital-capacity maneuver, and deep breathing. From the auscultation recordings we measured the crackle pitch and crackle rate. Crackle pitch variability, expressed as a percentage of the average crackle pitch, was small in all patients and in all maneuvers: pneumonia 11%, CHF 11%, pulmonary fibrosis 7%. Crackle rate variability was also small: pneumonia 31%, CHF 32%, IPF 24%. Compared to the first deep-breathing maneuver (100%), the average crackle pitch did not significantly change following coughing (pneumonia 100%, CHF 103%, IPF 100%), the vital-capacity maneuver (pneumonia 100%, CHF 92%, IPF 104%), or during quiet breathing (pneumonia 97%, CHF 100%, IPF 104%). Similarly, the average crackle rate did not change significantly following coughing (pneumonia 105%, CHF 110%, IPF 90%) or the vital-capacity maneuver (pneumonia 102%, CHF 101%, IPF 99%). However, during normal breathing the crackle rate was significantly lower in the patients with pneumonia (74%, P < .001) and significantly higher in the patients with IPF (147%, P < .05) than it was during deep breathing. In patients with CHF the average crackle rate during normal breathing was not significantly different from that during the first deep-breathing maneuver (108%). Crackle pitch and rate were surprisingly stable in all 3 conditions. Neither crackle pitch nor crackle rate changed significantly from breath to breath or from one deep-breathing maneuver to another, even when the maneuvers were separated by cough or the vital-capacity maneuver. 
The observation that crackle rate is a reproducible measurement during one automated-auscultation session suggests that crackle rate can be used to follow the course of cardiopulmonary illnesses such as pneumonia, IPF, and CHF.
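    Variability "expressed as a percentage of the average" can be computed as shown below. The abstract does not say which spread measure was used, so the sample standard deviation is an assumption here, and the pitch values are hypothetical.

```python
import statistics


def percent_variability(values):
    """Sample standard deviation as a percentage of the mean
    (the coefficient of variation, in percent)."""
    mean = statistics.mean(values)
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical crackle pitches (Hz) measured over three breaths.
pitches = [90, 100, 110]
variability = percent_variability(pitches)
print(round(variability, 1))
```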

  5. Arthropod phylogenetics in light of three novel millipede (Myriapoda: Diplopoda) mitochondrial genomes with comments on the appropriateness of mitochondrial genome sequence data for inferring deep level relationships.

    PubMed

    Brewer, Michael S; Swafford, Lynn; Spruill, Chad L; Bond, Jason E

    2013-01-01

    Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. 
Lack of phylogenetic signal renders the resulting tree topologies as suspect. As such, these data are likely inappropriate for investigating such ancient relationships.

  6. Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34

    DOE PAGES

    Anderson, Iain J.; DasSarma, Priya; Lucas, Susan; ...

    2016-09-10

    Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  7. Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Iain J.; DasSarma, Priya; Lucas, Susan

    Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  8. Plastid Phylogenomics Resolve Deep Relationships among Eupolypod II Ferns with Rapid Radiation and Rate Heterogeneity

    PubMed Central

    Wei, Ran; Yan, Yue-Hong; Harris, AJ; Kang, Jong-Soo; Shen, Hui; Zhang, Xian-Chun

    2017-01-01

    Abstract The eupolypods II ferns represent a classic case of evolutionary radiation and, simultaneously, exhibit high substitution rate heterogeneity. These factors have been proposed to contribute to the contentious resolutions among clades within this fern group in multilocus phylogenetic studies. We investigated the deep phylogenetic relationships of eupolypod II ferns by sampling all major families and using 40 plastid genomes, or plastomes, of which 33 were newly sequenced with next-generation sequencing technology. We performed model-based analyses to evaluate the diversity of molecular evolutionary rates for these ferns. Our plastome data, with more than 26,000 informative characters, yielded good resolution for deep relationships within eupolypods II and unambiguously clarified the position of Rhachidosoraceae and the monophyly of Athyriaceae. Results of rate heterogeneity analysis revealed approximately 33 significant rate shifts in eupolypod II ferns, with the most heterogeneous rates (both accelerations and decelerations) occurring in two phylogenetically difficult lineages, that is, the Rhachidosoraceae–Aspleniaceae and Athyriaceae clades. These observations support the hypothesis that rate heterogeneity has previously constrained the deep phylogenetic resolution in eupolypods II. According to the plastome data, we propose that 14 chloroplast markers are particularly phylogenetically informative for eupolypods II both at the familial and generic levels. Our study demonstrates the power of a character-rich plastome data set and high-throughput sequencing for resolving the recalcitrant lineages, which have undergone rapid evolutionary radiation and dramatic changes in substitution rates. PMID:28854625

  9. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.

    PubMed

    Wang, Sheng; Sun, Siqi; Li, Zhen; Zhang, Renyu; Xu, Jinbo

    2017-01-01

    Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. 
Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. http://raptorx.uchicago.edu/ContactMap/.
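    The "top L" and "top L/10" long-range accuracies quoted above are precision-style metrics: the highest-scoring predicted long-range residue pairs are taken, and the fraction that are true contacts is reported. The sketch below assumes the usual convention that long-range means a sequence separation of at least 24 residues. The function name and the toy scores are hypothetical, and this is not the RaptorX evaluation code.

```python
def top_long_range_accuracy(scores, true_contacts, seq_len, k=1, min_sep=24):
    """Fraction of true contacts among the top L/k long-range predictions.

    scores: dict mapping residue pairs (i, j) to predicted contact scores.
    true_contacts: set of residue pairs (i, j) that are in contact.
    The denominator is the number of pairs actually selected, which can be
    smaller than L/k when few long-range pairs were scored.
    """
    candidates = [
        (score, pair)
        for pair, score in scores.items()
        if abs(pair[0] - pair[1]) >= min_sep
    ]
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[: max(1, seq_len // k)]
    if not top:
        return 0.0
    hits = sum(1 for _score, pair in top if pair in true_contacts)
    return hits / len(top)


# Toy example for a hypothetical 40-residue protein.
scores = {
    (0, 30): 0.90,
    (1, 35): 0.80,
    (2, 39): 0.70,
    (3, 28): 0.60,
    (4, 29): 0.10,
    (5, 10): 0.95,  # short-range pair, ignored by the long-range filter
}
true_contacts = {(0, 30), (2, 39), (4, 29), (5, 10)}

acc_l10 = top_long_range_accuracy(scores, true_contacts, seq_len=40, k=10)
acc_l = top_long_range_accuracy(scores, true_contacts, seq_len=40, k=1)
print(acc_l10, acc_l)
```

    In the toy example the top L/10 selection keeps four pairs, two of which are true contacts, while the top L selection keeps all five long-range pairs, three of which are true contacts.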

  10. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

    PubMed Central

    Li, Zhen; Zhang, Renyu

    2017-01-01

    Motivation Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. Method This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Results Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. 
Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability http://raptorx.uchicago.edu/ContactMap/ PMID:28056090

  11. Mitogenomics does not resolve deep molluscan relationships (yet?).

    PubMed

    Stöger, I; Schrödl, M

    2013-11-01

    The origin of molluscs among lophotrochozoan metazoans is unresolved and interclass relationships are contradictory between morphology-based, multi-locus, and recent phylogenomic analyses. Within the "Deep Metazoan Phylogeny" framework, all available molluscan mitochondrial genomes were compiled, covering 6 of 8 classes. Genomes were reannotated, and 13 protein coding genes (PCGs) were analyzed in various taxon settings, under multiple masking and coding regimes. Maximum Likelihood based methods were used for phylogenetic reconstructions. In all cases, molluscs result mixed up with lophotrochozoan outgroups, and most molluscan classes with more than single representatives available are non-monophyletic. We discuss systematic errors such as long branch attraction to cause aberrant, basal positions of fast evolving ingroups such as scaphopods, patellogastropods and, in particular, the gastropod subgroup Heterobranchia. Mitochondrial sequences analyzed either as amino acids or nucleotides may perform well in some (Cephalopoda) but not in other palaeozoic molluscan groups; they are not suitable to reconstruct deep (Cambrian) molluscan evolution. Supposedly "rare" mitochondrial genome level features have long been promoted as phylogenetically informative. In our newly annotated data set, features such as genome size, transcription on one or both strands, and certain coupled pairs of PCGs show a homoplastic, but obviously non-random distribution. Apparently congruent (but not unambiguous) signal for non-trivial subclades, e.g. for a clade composed of pteriomorph and heterodont bivalves, needs confirmation from a more comprehensive bivalve sampling. We found that larger clusters not only of PCGs but also of rRNAs and even tRNAs can bear local phylogenetic signal; adding trnG-trnE to the end of the ancestral cluster trnM-trnC-trnY-trnW-trnQ might be synapomorphic for Mollusca. 
Mitochondrial gene arrangement and other genome level features explored and reviewed herein thus failed as golden bullets, but are promising as additional characters or evidence supporting deep molluscan clades revealed by other data sets. A representative and dense sampling of molluscan subgroups may contribute to resolve contentious interclass relationships in the future, and is vital for exploring the evolution of especially diverse mitochondrial genomes in molluscs. Copyright © 2012 Elsevier Inc. All rights reserved.

  12. Deep-fried oil consumption in rats impairs glycerolipid metabolism, gut histology and microbiota structure.

    PubMed

    Zhou, Zhongkai; Wang, Yuyang; Jiang, Yumei; Diao, Yongjia; Strappe, Padraig; Prenzler, Paul; Ayton, Jamie; Blanchard, Chris

    2016-04-28

    Deep frying in oil is a popular cooking method around the world. However, the safety of deep-fried edible oil, which is ingested with fried food, is a concern because the oil is repeatedly re-used at high temperature, leading to a number of well-known chemical reactions. This study therefore investigates the changes in energy metabolism, colon histology and gut microbiota in rats following deep-fried oil consumption and explores the mechanisms involved in these alterations. Deep-fried oil was prepared following a published method. Adult male Wistar rats were randomly divided into three groups (n = 8/group). Group 1: basal diet without extra oil consumption (control group); Group 2: basal diet supplemented with non-heated canola oil (NEO group); Group 3: basal diet supplemented with deep-fried canola oil (DFEO group). Non-heated or heated oil (1.5 mL) was administered by oral gavage using a feeding needle once daily for 6 consecutive weeks. The effects of DFEO on rat body weight, KEGG pathways related to lipid metabolism, gut histology and gut microbiota were analyzed using techniques including RNA sequencing and the Illumina HiSeq sequencing platform. Among the three groups, the DFEO diet resulted in the lowest rat body weight. Metabolic pathway analysis showed 13 significantly enriched KEGG pathways in the control versus NEO comparison, the majority of which were linked to carbohydrate, lipid and amino acid metabolism. In the NEO versus DFEO comparison, the significantly enriched functional pathways were mainly associated with chronic diseases. Among them, only one metabolic pathway (glycerolipid metabolism) was significantly enriched, indicating that inhibition of this pathway may be a response to the reduction in energy metabolism in rats of the DFEO group. Related gene analysis indicated that down-regulation of Lpin1 appears to be strongly associated with inhibition of the glycerolipid metabolism pathway. Histological analysis of the gastrointestinal tract demonstrated several DFEO-induced changes in the intestinal mucosa, with associated destruction of endocrine tissue and evidence of inflammation. Microbiota data showed that rats in the DFEO group had the lowest proportion of Prevotella and the highest proportion of Bacteroides among the three groups. In particular, rats in the DFEO group, but not those in the control and NEO groups, were characterized by a higher presence of Allobaculum (Firmicutes). This study examined the negative effects of DFEO on health: DFEO can impair glycerolipid metabolism, damage gut histological structure and unbalance the microbiota profile. More importantly, this is the first attempt to reveal the mechanisms involved in these changes, which may provide guidance for designing healthy diets.

  13. Seismogenic faulting in the Meruoca granite, NE Brazil, consistent with a local weak fracture zone.

    PubMed

    Moura, Ana Catarina A; De Oliveira, Paulo H S; Ferreira, Joaquim M; Bezerra, Francisco H R; Fuck, Reinhardt A; Do Nascimento, Aderson F

    2014-12-01

    A sequence of earthquakes occurred in 2008 in the Meruoca granitic pluton, located in the northwestern part of the Borborema Province, NE Brazil. A seismological study defined the seismic activity occurring along the seismically-defined Riacho Fundo fault, a 081° striking, 8 km deep structure. The objective of this study was to analyze the correlation between this seismic activity and geological structures in the Meruoca granite. We carried out geological mapping in the epicentral area, analyzed the mineralogy of fault rocks, and compared the seismically-defined Riacho Fundo fault with geological data. We concluded that the seismically-defined fault coincides with ∼E-W-striking faults observed at outcrop scale and a swarm of Mesozoic basalt dikes. We propose that seismicity reactivated brittle structures in the Meruoca granite. Our study highlights the importance of geological mapping and mineralogical analysis in order to establish the relationships between geological structures and seismicity at a given area.

  14. Seismogenic faulting in the Meruoca granite, NE Brazil, consistent with a local weak fracture zone.

    PubMed

    Moura, Ana Catarina A; Oliveira, Paulo H S DE; Ferreira, Joaquim M; Bezerra, Francisco H R; Fuck, Reinhardt A; Nascimento, Aderson F DO

    2014-10-24

    A sequence of earthquakes occurred in 2008 in the Meruoca granitic pluton, located in the northwestern part of the Borborema Province, NE Brazil. A seismological study defined the seismic activity occurring along the seismically-defined Riacho Fundo fault, a 081° striking, 8 km deep structure. The objective of this study was to analyze the correlation between this seismic activity and geological structures in the Meruoca granite. We carried out geological mapping in the epicentral area, analyzed the mineralogy of fault rocks, and compared the seismically-defined Riacho Fundo fault with geological data. We concluded that the seismically-defined fault coincides with ∼E-W-striking faults observed at outcrop scale and a swarm of Mesozoic basalt dikes. We propose that seismicity reactivated brittle structures in the Meruoca granite. Our study highlights the importance of geological mapping and mineralogical analysis in order to establish the relationships between geological structures and seismicity at a given area.

  15. A multi-protease, multi-dissociation, bottom-up-to-top-down proteomic view of the Loxosceles intermedia venom

    PubMed Central

    Trevisan-Silva, Dilza; Bednaski, Aline V.; Fischer, Juliana S.G.; Veiga, Silvio S.; Bandeira, Nuno; Guthals, Adrian; Marchini, Fabricio K.; Leprevost, Felipe V.; Barbosa, Valmir C.; Senff-Ribeiro, Andrea; Carvalho, Paulo C.

    2017-01-01

    Venoms are a rich source for the discovery of molecules with biotechnological applications, but their analysis is challenging even for state-of-the-art proteomics. Here we report on a large-scale proteomic assessment of the venom of Loxosceles intermedia, the so-called brown spider. Venom was extracted from 200 spiders and fractioned into two aliquots relative to a 10 kDa cutoff mass. Each of these was further fractioned and digested with trypsin (4 h), trypsin (18 h), pepsin (18 h), and chymotrypsin (18 h), then analyzed by MudPIT on an LTQ-Orbitrap XL ETD mass spectrometer fragmenting precursors by CID, HCD, and ETD. Aliquots of undigested samples were also analyzed. Our experimental design allowed us to apply spectral networks, thus enabling us to obtain meta-contig assemblies, and consequently de novo sequencing of practically complete proteins, culminating in a deep proteome assessment of the venom. Data are available via ProteomeXchange, with identifier PXD005523. PMID:28696408

  16. Photometric Calibrations of Gemini Images of NGC 6253

    NASA Astrophysics Data System (ADS)

    Pearce, Sean; Jeffery, Elizabeth

    2017-01-01

    We present preliminary results of our analysis of the metal-rich open cluster NGC 6253 using imaging data from GMOS at the Gemini-South Observatory. These data are part of a larger project to observe the effects of high metallicity on white dwarf cooling processes, especially the white dwarf cooling age, which have important implications for the processes of stellar evolution. To standardize the Gemini photometry, we have also secured imaging data of both the cluster and standard star fields using the 0.6-m SARA Observatory at CTIO. By analyzing and comparing the standard star fields of both the SARA data and the published Gemini zero-points of the standard star fields, we will calibrate the data obtained for the cluster. These calibrations are an important part of the project to obtain a standardized deep color-magnitude diagram (CMD) to analyze the cluster. We present the process of verifying our standardization process. With a standardized CMD, we also present an analysis of the cluster's main-sequence turnoff age.

  17. Draft Genome Sequence of Aldehyde-Degrading Strain Halomonas axialensis ACH-L-8

    PubMed Central

    Ye, Jun; Ren, Chong; Shan, Xiexie

    2016-01-01

    Halomonas axialensis ACH-L-8, a deep-sea strain isolated from the South China Sea, has the ability to degrade aldehydes. Here, we present an annotated draft genome sequence of this species, which could provide fundamental molecular information on the aldehydes-degrading mechanism. PMID:27081145

  18. Transcriptomic sequencing reveals a set of unique genes activated by butyrate-induced histone modification

    USDA-ARS's Scientific Manuscript database

    Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...

  19. Impact of the HIV-1 genetic background and HIV-1 population size on the evolution of raltegravir resistance.

    PubMed

    Fun, Axel; Leitner, Thomas; Vandekerckhove, Linos; Däumer, Martin; Thielen, Alexander; Buchholz, Bernd; Hoepelman, Andy I M; Gisolf, Elizabeth H; Schipper, Pauline J; Wensing, Annemarie M J; Nijhuis, Monique

    2018-01-05

    Emergence of resistance against integrase inhibitor raltegravir in human immunodeficiency virus type 1 (HIV-1) patients is generally associated with selection of one of three signature mutations: Y143C/R, Q148K/H/R or N155H, representing three distinct resistance pathways. The mechanisms that drive selection of a specific pathway are still poorly understood. We investigated the impact of the HIV-1 genetic background and population dynamics on the emergence of raltegravir resistance. Using deep sequencing we analyzed the integrase coding sequence (CDS) in longitudinal samples from five patients who initiated raltegravir plus optimized background therapy at viral loads > 5000 copies/ml. To investigate the role of the HIV-1 genetic background we created recombinant viruses containing the viral integrase coding region from pre-raltegravir samples from two patients in whom raltegravir resistance developed through different pathways. The in vitro selections performed with these recombinant viruses were designed to mimic natural population bottlenecks. Deep sequencing analysis of the viral integrase CDS revealed that the virological response to raltegravir containing therapy inversely correlated with the relative amount of unique sequence variants that emerged suggesting diversifying selection during drug pressure. In 4/5 patients multiple signature mutations representing different resistance pathways were observed. Interestingly, the resistant population can consist of a single resistant variant that completely dominates the population but also of multiple variants from different resistance pathways that coexist in the viral population. We also found evidence for increased diversification after stronger bottlenecks. 
In vitro selections with low viral titers, mimicking population bottlenecks, revealed that both recombinant viruses and HXB2 reference virus were able to select mutations from different resistance pathways, although typically only one resistance pathway emerged in each individual culture. The generation of a specific raltegravir resistant variant is not predisposed in the genetic background of the viral integrase CDS. Typically, in the early phases of therapy failure the sequence space is explored and multiple resistance pathways emerge and then compete for dominance which frequently results in a switch of the dominant population over time towards the fittest variant or even multiple variants of similar fitness that can coexist in the viral population.

  20. Suboxic deep seawater in the late Paleoproterozoic: Evidence from hematitic chert and iron formation related to seafloor-hydrothermal sulfide deposits, central Arizona, USA

    USGS Publications Warehouse

    Slack, J.F.; Grenne, Tor; Bekker, A.; Rouxel, O.J.; Lindberg, P.A.

    2007-01-01

    A current model for the evolution of Proterozoic deep seawater composition involves a change from anoxic sulfide-free to sulfidic conditions 1.8 Ga. In an earlier model the deep ocean became oxic at that time. Both models are based on the secular distribution of banded iron formation (BIF) in shallow marine sequences. We here present a new model based on rare earth elements, especially redox-sensitive Ce, in hydrothermal silica-iron oxide sediments from deeper-water, open-marine settings related to volcanogenic massive sulfide (VMS) deposits. In contrast to Archean, Paleozoic, and modern hydrothermal iron oxide sediments, 1.74 to 1.71 Ga hematitic chert (jasper) and iron formation in central Arizona, USA, show moderate positive to small negative Ce anomalies, suggesting that the redox state of the deep ocean then was at a transitional, suboxic state with low concentrations of dissolved O2 but no H2S. The presence of jasper and/or iron formation related to VMS deposits in other volcanosedimentary sequences ca. 1.79-1.69 Ga, 1.40 Ga, and 1.24 Ga also reflects oxygenated and not sulfidic deep ocean waters during these time periods. Suboxic conditions in the deep ocean are consistent with the lack of shallow-marine BIF ~1.8 to 0.8 Ga, and likely limited nutrient concentrations in seawater and, consequently, may have constrained biological evolution. © 2006 Elsevier B.V. All rights reserved.

  1. Mutations Related to Antiretroviral Resistance Identified by Ultra-Deep Sequencing in HIV-1 Infected Children under Structured Interruptions of HAART

    PubMed Central

    Vazquez-Guillen, Jose Manuel; Palacios-Saucedo, Gerardo C.; Rivera-Morales, Lydia G.; Garcia-Campos, Jorge; Ortiz-Lopez, Rocio; Noguera-Julian, Marc; Paredes, Roger; Vielma-Ramirez, Herlinda J.; Ramirez, Teresa J.; Chavez-Garcia, Marcelino; Lopez-Guillen, Paulo; Briones-Lara, Evangelina; Sanchez-Sanchez, Luz M.; Vazquez-Martinez, Carlos A.; Rodriguez-Padilla, Cristina

    2016-01-01

    Although Structured Treatment Interruptions (STI) are currently not considered an alternative strategy for antiretroviral treatment, their true benefits and limitations have not been fully established. Some studies suggest the possibility of improving the quality of life of patients with this strategy; however, the information that has been obtained corresponds mostly to studies conducted in adults, with a lack of knowledge about its impact on children. Furthermore, mutations associated with antiretroviral resistance could be selected due to sub-therapeutic levels of HAART at each interruption period. Genotyping methods to determine the resistance profiles of the infecting viruses have become increasingly important for the management of patients under STI, thus low-abundance antiretroviral drug-resistant mutations (DRMs) at levels under the limit of detection of conventional genotyping (<20% of quasispecies) could increase the risk of virologic failure. In this work, we analyzed the protease and reverse transcriptase regions of the pol gene by ultra-deep sequencing in pediatric patients under STI with the aim of determining the presence of high- and low-abundance DRMs in the viral rebounds generated by the STI. High-abundance mutations in protease and high- and low-abundance mutations in reverse transcriptase were detected, but none of these is directly associated with resistance to antiretroviral drugs. The results could suggest that the evaluated STI program is virologically safe, but strict and carefully planned studies, with greater numbers of patients and interruption/restart cycles, are still needed to evaluate the selection of DRMs during STI. PMID:26807922
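
    The high- versus low-abundance distinction in the record above can be illustrated with a minimal sketch. The 20% cut-off is the conventional-genotyping limit quoted in the abstract; the 1% reporting floor, the function name, and the mutation labels are hypothetical and are not taken from the study.

```python
from typing import Dict, Tuple

CONVENTIONAL_LIMIT = 0.20  # detection limit of conventional genotyping, as quoted in the abstract
MIN_REPORTABLE = 0.01      # ASSUMPTION: hypothetical noise floor, not stated in the abstract


def classify_variants(
    read_counts: Dict[str, Tuple[int, int]],
    conventional_limit: float = CONVENTIONAL_LIMIT,
    min_reportable: float = MIN_REPORTABLE,
) -> Dict[str, str]:
    """Label each mutation by its frequency in the viral population.

    read_counts maps a mutation name to (variant_reads, total_reads) at that position.
    """
    labels: Dict[str, str] = {}
    for mutation, (variant_reads, total_reads) in read_counts.items():
        if total_reads <= 0:
            labels[mutation] = "no-coverage"
            continue
        freq = variant_reads / total_reads
        if freq >= conventional_limit:
            labels[mutation] = "high-abundance"
        elif freq >= min_reportable:
            labels[mutation] = "low-abundance"
        else:
            labels[mutation] = "below-floor"
    return labels


# Made-up counts for illustration only.
example = {"mut_A": (300, 1000), "mut_B": (50, 1000), "mut_C": (2, 1000)}
labels = classify_variants(example)
```

    In practice the reporting floor would be set from the measured error rate of the sequencing run rather than fixed in advance.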

  2. Transcriptomic investigation of meat tenderness in two Italian cattle breeds.

    PubMed

    Bongiorni, S; Gruber, C E M; Bueno, S; Chillemi, G; Ferrè, F; Failla, S; Moioli, B; Valentini, A

    2016-06-01

    Our objectives for this study were to understand the biological basis of meat tenderness and to provide an overview of the gene expression profiles related to meat quality as a tool for selection. Through deep mRNA sequencing, we analyzed gene expression in muscle tissues of two Italian cattle breeds: Maremmana and Chianina. We uncovered several differentially expressed genes that encode for proteins belonging to a family of tripartite motif proteins, which are involved in growth, cell differentiation and apoptosis, such as TRIM45, or play an essential role in regulating skeletal muscle differentiation and the regeneration of adult skeletal muscle, such as TRIM32. Other differentially expressed genes (SCN2B, SLC9A7 and KCNK3) emphasize the involvement of potassium-sodium pumps in tender meat. By mapping splice junctions in RNA-Seq reads, we found significant differences in gene isoform expression levels. The PRKAG3 gene, which is involved in the regulation of energy metabolism, showed four isoforms that were differentially expressed. This distinct pattern of PRKAG3 gene expression could indicate impaired glycogen storage in skeletal muscle, and consequently, this gene very likely has a role in the tenderization process. Furthermore, with this deep RNA-sequencing, we captured a high number of expressed SNPs, for example, we found 1462 homozygous SNPs showing the alternative allele with a 100% frequency when comparing tender and tough meat. SNPs were then classified into categories by their position and also by their effect on gene coding (174 non-synonymous polymorphisms) based on the available UMD_3.1 annotations. © 2016 Stichting International Foundation for Animal Genetics.

  3. Transcriptomic analysis and mutational status of IDH1 in paired primary-recurrent intrahepatic cholangiocarcinoma.

    PubMed

    Peraldo-Neia, C; Ostano, P; Cavalloni, G; Pignochino, Y; Sangiolo, D; De Cecco, L; Marchesi, E; Ribero, D; Scarpa, A; De Rose, A M; Giuliani, A; Calise, F; Raggi, C; Invernizzi, P; Aglietta, M; Chiorino, G; Leone, F

    2018-06-05

    Effective targeted therapies for intrahepatic cholangiocarcinoma (ICC) have not been identified so far. One of the reasons may be the genetic evolution from primary (PR) to recurrent (REC) tumors. We aim to identify distinctive characteristics and to select potential targets specific for recurrent tumors. Eighteen ICC paired PR and REC tumors were collected from 5 Italian Centers. Eleven pairs were analyzed for gene expression profiling and 16 for mutational status of IDH1. For one pair, deep mutational analysis by Next Generation Sequencing was also carried out. An independent cohort of patients was used for validation. A two-class paired comparison yielded 315 differentially expressed genes between REC and PR tumors. Up-regulated genes in RECs are involved in RNA/DNA processing, cell cycle, epithelial to mesenchymal transition (EMT), resistance to apoptosis, and cytoskeleton remodeling. Down-regulated genes participate in epithelial cell differentiation, proteolysis, apoptosis, immune response, and inflammatory processes. A 24-gene signature is able to discriminate RECs from PRs in an independent cohort; FANCG is statistically associated with survival in the chol-TCGA dataset. IDH1 was mutated in the RECs of five patients; 4 of them displayed the mutation only in RECs. Deep sequencing performed in one patient confirmed the IDH1 mutation in REC. RECs are enriched for genes involved in EMT, resistance to apoptosis, and cytoskeleton remodeling. Key players of these pathways might be considered druggable targets in RECs. IDH1 is mutated in 30% of RECs, becoming both a marker of progression and a target for therapy.

  4. Diversity and phylogenetic analyses of bacteria from a shallow-water hydrothermal vent in Milos island (Greece).

    PubMed

    Giovannelli, Donato; d'Errico, Giuseppe; Manini, Elena; Yakimov, Michail; Vetriani, Costantino

    2013-01-01

    Studies of shallow-water hydrothermal vents have been lagging behind their deep-sea counterparts. Hence, the importance of these systems and their contribution to the local and regional diversity and biogeochemistry is unclear. This study analyzes the bacterial community along a transect at the shallow-water hydrothermal vent system of Milos island, Greece. The abundance and biomass of the prokaryotic community is comparable to areas not affected by hydrothermal activity and was, on average, 1.34 × 10^8 cells g^-1. The abundance, biomass and diversity of the prokaryotic community increased with the distance from the center of the vent and appeared to be controlled by the temperature gradient rather than the trophic conditions. The retrieved 16S rRNA gene fragments matched sequences from a variety of geothermal environments, although the average similarity was low (94%), revealing previously undiscovered taxa. Epsilonproteobacteria constituted the majority of the population along the transect, with an average contribution to the total diversity of 60%. The larger cluster of 16S rRNA gene sequences was related to chemolithoautotrophic Sulfurovum spp., an Epsilonproteobacterium so far detected only at deep-sea hydrothermal vents. The presence of previously unknown lineages of Epsilonproteobacteria could be related to the abundance of organic matter in these systems, which may support alternative metabolic strategies to chemolithoautotrophy. The relative contribution of Gammaproteobacteria to the Milos microbial community increased along the transect as the distance from the center of the vent increased. Further attempts to isolate key species from these ecosystems will be critical to shed light on their evolution and ecology.

  5. Diversity and phylogenetic analyses of bacteria from a shallow-water hydrothermal vent in Milos island (Greece)

    PubMed Central

    Giovannelli, Donato; d'Errico, Giuseppe; Manini, Elena; Yakimov, Michail; Vetriani, Costantino

    2013-01-01

    Studies of shallow-water hydrothermal vents have been lagging behind their deep-sea counterparts. Hence, the importance of these systems and their contribution to the local and regional diversity and biogeochemistry is unclear. This study analyzes the bacterial community along a transect at the shallow-water hydrothermal vent system of Milos island, Greece. The abundance and biomass of the prokaryotic community is comparable to areas not affected by hydrothermal activity and was, on average, 1.34 × 10^8 cells g^-1. The abundance, biomass and diversity of the prokaryotic community increased with the distance from the center of the vent and appeared to be controlled by the temperature gradient rather than the trophic conditions. The retrieved 16S rRNA gene fragments matched sequences from a variety of geothermal environments, although the average similarity was low (94%), revealing previously undiscovered taxa. Epsilonproteobacteria constituted the majority of the population along the transect, with an average contribution to the total diversity of 60%. The larger cluster of 16S rRNA gene sequences was related to chemolithoautotrophic Sulfurovum spp., an Epsilonproteobacterium so far detected only at deep-sea hydrothermal vents. The presence of previously unknown lineages of Epsilonproteobacteria could be related to the abundance of organic matter in these systems, which may support alternative metabolic strategies to chemolithoautotrophy. The relative contribution of Gammaproteobacteria to the Milos microbial community increased along the transect as the distance from the center of the vent increased. Further attempts to isolate key species from these ecosystems will be critical to shed light on their evolution and ecology. PMID:23847607

  6. The dynamics of genome replication using deep sequencing

    PubMed Central

    Müller, Carolin A.; Hawkins, Michelle; Retkute, Renata; Malla, Sunir; Wilson, Ray; Blythe, Martin J.; Nakato, Ryuichiro; Komata, Makiko; Shirahige, Katsuhiko; de Moura, Alessandro P.S.; Nieduszynski, Conrad A.

    2014-01-01

    Eukaryotic genomes are replicated from multiple DNA replication origins. We present complementary deep sequencing approaches to measure origin location and activity in Saccharomyces cerevisiae. Measuring the increase in DNA copy number during a synchronous S-phase allowed the precise determination of genome replication. To map origin locations, replication forks were stalled close to their initiation sites; therefore, copy number enrichment was limited to origins. Replication timing profiles were generated from asynchronous cultures using fluorescence-activated cell sorting. Applying this technique we show that the replication profiles of haploid and diploid cells are indistinguishable, indicating that both cell types use the same cohort of origins with the same activities. Finally, increasing sequencing depth allowed the direct measure of replication dynamics from an exponentially growing culture. This is the first time this approach, called marker frequency analysis, has been successfully applied to a eukaryote. These data provide a high-resolution resource and methodological framework for studying genome biology. PMID:24089142
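
    The copy-number measurement described in the record above can be sketched as a per-bin ratio of read counts between a replicating and a non-replicating sample. The abstract does not state how the counts were normalized; scaling each sample by its total read count, and the function name, are assumptions made here for illustration.

```python
from typing import List, Optional


def relative_copy_number(
    replicating: List[int], non_replicating: List[int]
) -> List[Optional[float]]:
    """Per-bin ratio of depth-normalized read counts.

    Both lists hold read counts for the same genomic bins, in the same order.
    Bins with no reads in the non-replicating sample yield None.
    ASSUMPTION: each sample is normalized by its total read count.
    """
    if len(replicating) != len(non_replicating):
        raise ValueError("both samples must cover the same bins")
    total_rep = sum(replicating)
    total_non = sum(non_replicating)
    if total_rep == 0 or total_non == 0:
        raise ValueError("each sample needs at least one read")
    ratios: List[Optional[float]] = []
    for rep, non in zip(replicating, non_replicating):
        if non == 0:
            ratios.append(None)
        else:
            ratios.append((rep / total_rep) / (non / total_non))
    return ratios


# Made-up counts: the first bin is over-represented in the replicating sample,
# as would be expected for a bin near an early-firing origin.
ratios = relative_copy_number([200, 100, 100], [100, 100, 100])
```

    Peaks in such a ratio profile would mark regions that replicate early; smoothing and rescaling to the expected 1-to-2 copy range are left out of this sketch.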

  7. Deep sequencing reveals persistence of cell-associated mumps vaccine virus in chronic encephalitis.

    PubMed

    Morfopoulou, Sofia; Mee, Edward T; Connaughton, Sarah M; Brown, Julianne R; Gilmour, Kimberly; Chong, W K 'Kling'; Duprex, W Paul; Ferguson, Deborah; Hubank, Mike; Hutchinson, Ciaran; Kaliakatsos, Marios; McQuaid, Stephen; Paine, Simon; Plagnol, Vincent; Ruis, Christopher; Virasami, Alex; Zhan, Hong; Jacques, Thomas S; Schepelmann, Silke; Qasim, Waseem; Breuer, Judith

    2017-01-01

    Routine childhood vaccination against measles, mumps and rubella has virtually abolished virus-related morbidity and mortality. Notwithstanding this, we describe here devastating neurological complications associated with the detection of live-attenuated mumps virus Jeryl Lynn (MuV JL5) in the brain of a child who had undergone successful allogeneic transplantation for severe combined immunodeficiency (SCID). This is the first confirmed report of MuV JL5 associated with chronic encephalitis and highlights the need to exclude immunodeficient individuals from immunisation with live-attenuated vaccines. The diagnosis was only possible by deep sequencing of the brain biopsy. Sequence comparison of the vaccine batch to the MuV JL5 isolated from brain identified biased hypermutation, particularly in the matrix gene, similar to those found in measles from cases of SSPE. The findings provide unique insights into the pathogenesis of paramyxovirus brain infections.

  8. Deep sequencing of small RNA libraries from human prostate epithelial and stromal cells reveal distinct pattern of microRNAs primarily predicted to target growth factors.

    PubMed

    Singh, Savita; Zheng, Yun; Jagadeeswaran, Guru; Ebron, Jey Sabith; Sikand, Kavleen; Gupta, Sanjay; Sunker, Ramanjulu; Shukla, Girish C

    2016-02-28

    Complex epithelial and stromal cell interactions are required during the development and progression of prostate cancer. Regulatory small non-coding microRNAs (miRNAs) participate in the spatiotemporal regulation of messenger RNA (mRNA) and regulation of translation affecting a large number of genes involved in prostate carcinogenesis. In this study, through deep-sequencing of size fractionated small RNA libraries we profiled the miRNAs of prostate epithelial (PrEC) and stromal (PrSC) cells. Over 50 million reads were obtained for PrEC in which 860,468 were unique sequences. Similarly, nearly 76 million reads for PrSC were obtained in which over 1 million were unique reads. Expression of many miRNAs of broadly conserved and poorly conserved miRNA families were identified. Sixteen highly expressed miRNAs with significant change in expression in PrSC than PrEC were further analyzed in silico. ConsensusPathDB showed the target genes of these miRNAs were significantly involved in adherence junction, cell adhesion, EGRF, TGF-β and androgen signaling. Let-7 family of tumor-suppressor miRNAs expression was highly pervasive in both, PrEC and PrSC cells. In addition, we have also identified several miRNAs that are unique to PrEC or PrSC cells and their predicted putative targets are a group of transcription factors. This study provides perspective on the miRNA expression in PrEC and PrSC, and reveals a global trend in miRNA interactome. We conclude that the most abundant miRNAs are potential regulators of development and differentiation of the prostate gland by targeting a set of growth factors. Additionally, high level expression of the most members of let-7 family miRNAs suggests their role in the fine tuning of the growth and proliferation of prostate epithelial and stromal cells. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  9. Draft Genome Sequence of Thermus scotoductus Strain K1, Isolated from a Geothermal Spring in Karvachar, Nagorno Karabakh.

    PubMed

    Saghatelyan, Ani; Poghosyan, Lianna; Panosyan, Hovik; Birkeland, Nils-Kåre

    2015-11-12

    The 2,379,636-bp draft genome sequence of Thermus scotoductus strain K1, isolated from a geothermal spring outlet located in the Karvachar region in Nagorno Karabakh, is presented. Strain K1 shares about 80% genome sequence similarity with T. scotoductus strain SA-01, recovered from a deep gold mine in South Africa. Copyright © 2015 Saghatelyan et al.

  10. Clinical characteristics of patients with central nervous system relapse in BCR-ABL1-positive acute lymphoblastic leukemia: the importance of characterizing ABL1 mutations in cerebrospinal fluid.

    PubMed

    Sanchez, Ricardo; Ayala, Rosa; Alonso, Rafael Alberto; Martínez, María Pilar; Ribera, Jordi; García, Olga; Sanchez-Pina, José; Mercadal, Santiago; Montesinos, Pau; Martino, Rodrigo; Barba, Pere; González-Campos, José; Barrios, Manuel; Lavilla, Esperanza; Gil, Cristina; Bernal, Teresa; Escoda, Lourdes; Abella, Eugenia; Amigo, Ma Luz; Moreno, Ma José; Bravo, Pilar; Guàrdia, Ramón; Hernández-Rivas, Jesús-María; García-Guiñón, Antoni; Piernas, Sonia; Ribera, José-María; Martínez-López, Joaquín

    2017-07-01

    We investigated the frequency, predictors, and evolution of acute lymphoblastic leukemia (ALL) in patients with CNS relapse and introduced a novel method for studying BCR-ABL1 protein variants in cDNA from bone marrow (BM) and cerebrospinal fluid (CSF) blast cells. A total of 128 patients were analyzed in two PETHEMA clinical trials. All achieved complete remission after imatinib treatment. Of these, 30 (23%) experienced a relapse after achieving complete remission, and 13 (10%) had an isolated CNS relapse or combined CNS and BM relapses. We compared the characteristics of patients with and without CNS relapse and further analyzed CSF and BM samples from two of the 13 patients with CNS relapse. In both patients, classical sequencing analysis of the kinase domain of BCR-ABL1 from the cDNA of CSF blasts revealed the pathogenic variant p.L387M. We also performed ultra-deep next-generation sequencing (NGS) in three samples from one of the relapsed patients. We did not find the mutation in the BM sample, but we did find it in CSF blasts with 45% of reads at the time of relapse. These data demonstrate the feasibility of detecting BCR-ABL1 mutations in CSF blasts by NGS and highlight the importance of monitoring clonal evolution over time.

  11. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    PubMed Central

    Matochko, Wadim L.; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using convenient linear vector and operator framework. We describe a phage library as N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the i-th sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of a N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is a N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
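
    The vector-and-operator description in the record above can be sketched numerically. Diagonal matrices are stored as lists of their diagonal entries. The multinomial draw used for the sampling operator, the composition of the biased sequencing operator as Sa applied after CEN, and all function names are assumptions for illustration; the abstract does not specify them.

```python
import random
from typing import List


def apply_diag(diag: List[float], vec: List[float]) -> List[float]:
    """Multiply a diagonal matrix (given by its diagonal) with a vector."""
    return [d * v for d, v in zip(diag, vec)]


def sampling_operator(n: List[float], depth: int, rng: random.Random) -> List[float]:
    """Diagonal of a random matrix Sa such that Sa * n is a sample of `depth` reads.

    ASSUMPTION: reads are drawn with probability proportional to copy number.
    """
    picks = rng.choices(range(len(n)), weights=n, k=depth)
    drawn = [0] * len(n)
    for i in picks:
        drawn[i] += 1
    return [c / ni if ni > 0 else 0.0 for c, ni in zip(drawn, n)]


def sequence(n: List[float], cen: List[float], depth: int, rng: random.Random) -> List[float]:
    """Apply Seq = Sa * CEN to the library vector n (CEN of all ones is I_N)."""
    censored = apply_diag(cen, n)
    sa = sampling_operator(censored, depth, rng)
    return apply_diag(sa, censored)


rng = random.Random(0)
library = [500.0, 300.0, 200.0]                                 # made-up copy numbers, N = 3
unbiased = sequence(library, [1.0, 1.0, 1.0], 100, rng)         # Seq = Sa * I_N
censored_reads = sequence(library, [1.0, 0.0, 1.0], 100, rng)   # second sequence censored
```

    A censorship entry of 0 removes a sequence from the reads entirely, while an entry between 0 and 1 would model downsampling.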

  12. A DNA barcode library for ground beetles (Insecta, Coleoptera, Carabidae) of Germany: The genus Bembidion Latreille, 1802 and allied taxa

    PubMed Central

    Raupach, Michael J.; Hannig, Karsten; Morinière, Jérome; Hendrich, Lars

    2016-01-01

    As a molecular identification method, DNA barcoding based on partial cytochrome c oxidase subunit 1 (COI) sequences has been proven to be a useful tool for species determination in many insect taxa including ground beetles. In this study we tested the effectiveness of DNA barcodes to discriminate species of the ground beetle genus Bembidion and some closely related taxa of Germany. DNA barcodes were obtained from 819 individuals and 78 species, including sequences from previous studies as well as more than 300 newly generated DNA barcodes. We found a 1:1 correspondence between BIN and traditionally recognized species for 69 species (89%). Low interspecific distances with maximum pairwise K2P values below 2.2% were found for three species pairs, including two species pairs with haplotype sharing (Bembidion atrocaeruleum/Bembidion varicolor and Bembidion guttula/Bembidion mannerheimii). In contrast to this, deep intraspecific sequence divergences with distinct lineages were revealed for two species (Bembidion geniculatum/Ocys harpaloides). Our study emphasizes the use of DNA barcodes for the identification of the analyzed ground beetle species and represents an important step in building up a comprehensive barcode library for the Carabidae in Germany and Central Europe as well. PMID:27408547
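
    The pairwise K2P values mentioned in the record above refer to the Kimura two-parameter distance, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions between two aligned sequences. A minimal sketch follows; skipping gapped or ambiguous sites is an assumption made here, and the study's own software settings are not given in the abstract.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or ambiguity code are skipped
    (ASSUMPTION: pairwise deletion). math.log raises ValueError when the
    sequences are too divergent for the formula to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition in ten sites (made-up sequences).
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

    A value of 0.022 from this function corresponds to the 2.2% threshold quoted in the abstract.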

  13. Special astronomical configurations, solar activity and deep degassing as a trigger of natural hazards

    NASA Astrophysics Data System (ADS)

    Natyaganov, Vladimir; Syvorotkin, Vladimir; Fedorov, Valeriy; Shopin, Sergey

    2016-04-01

    Extraordinary cases of tectonic events (strong earthquakes, volcano eruptions), mine explosions, typhoons, hurricanes, tornado outbreak sequences, ball lightning, and transient luminous events are analyzed in relation to special astronomical configurations, which are specific relative positions of the Sun, Earth, Moon and the closest planets of the Solar System (Venus, Mars and Jupiter) [1]. The use of special astronomical coordinate systems gives evidence not only of correlations but also of hidden cause-and-effect relations between the analyzed phenomena. The geocentric ecliptic latitude system is an example of such astronomical coordinate systems. It gives clear evidence of coherence between strong earthquakes and the maximal Moon declination from the plane of the ecliptic. Extraordinary cases of planet activity from the beginning of the 20th century to the present are shown in the years of special astronomical configurations and abrupt increases in solar activity. According to the empirical scheme of short-term earthquake prediction [3], geomagnetic disturbances are the triggers of earthquakes. Geomagnetic disturbances perform electromagnetic pumping (electromagnetic excitation) of the Earth's interior in the regions of intersections of seismomagnetic meridians with the plate boundaries as a result of electrothermal breakdowns in the heterogeneous medium of tectonic faults. This results in local intensification of deep degassing [4] and a decrease in the shear strength of the medium, which triggers earthquakes, usually 2 or 3 weeks (±2 days) after the geomagnetic disturbance. Examples of officially registered predictions of Kamchatka earthquakes with M7+ without missing events, including deep-focus earthquakes in the Okhotsk Sea since 2002, are shown. Correlations and possible cause-and-effect relations are discussed between different phenomena: dangerous natural hazards such as the record tornado outbreak sequences in the USA (May 2003, 400 tornadoes in 20 states, and the Super Outbreak of April 2011, 580 tornadoes, which correspond to a third and about a half of the average annual number of tornadoes, respectively); natural-anthropogenic accidents with gas explosions in diggings and coal mines [4]; special Moon phases (new moons and full moons); local intensification of deep hydrogen-methane degassing; extensive spatial anomalies of total ozone content in the stratosphere; and strong geomagnetic disturbances. The work was financially supported by the Ministry of Education and Science of the Russian Federation (in accordance with the requirements of the contract No. 14.577.21.0109, project UID RFMEFI57714X0109). References: 1. V. M. Fedorov, Gravitational factors and astronomical chronology of geosphere processes [Gravitacionnye faktory i astronomicheskaja hronologija geosfernyh processov]. Moscow State University, Moscow, 2000. 368 p. (In Russian) 2. V. L. Natyaganov, A. M. Nechaev, I. V. Stepanov, "Spatio-temporal relations of planet tectonic activity [Prostranstvenno-vremennye zakonomernosti tektonicheskoj aktivnosti planety]", Eurasian Union of Scientists, 2015, No. 3(12), Vol. 8, pp. 120-123. (In Russian) 3. L. N. Doda, V. L. Natyaganov, I. V. Stepanov, "An empirical scheme of short-term earthquake prediction," Doklady Earth Sciences, vol. 453, no. 5, pp. 551-557, Dec. 2013. 4. V. L. Syvorotkin, Deep degassing and global catastrophes. Geoinformcentr, Moscow, 2002. 250 p. (In Russian)

  14. Poly(A)-tag deep sequencing data processing to extract poly(A) sites.

    PubMed

    Wu, Xiaohui; Ji, Guoli; Li, Qingshun Quinn

    2015-01-01

    Polyadenylation [poly(A)] is an essential posttranscriptional processing step in the maturation of eukaryotic mRNA. The advent of next-generation sequencing (NGS) technology has offered feasible means to generate large-scale data and new opportunities for intensive study of polyadenylation, particularly deep sequencing of the transcriptome targeting the junction of 3'-UTR and the poly(A) tail of the transcript. To take advantage of this unprecedented amount of data, we present an automated workflow to identify polyadenylation sites by integrating NGS data cleaning, processing, mapping, normalizing, and clustering. In this pipeline, a series of Perl scripts are seamlessly integrated to iteratively map the single- or paired-end sequences to the reference genome. After mapping, the poly(A) tags (PATs) at the same genome coordinate are grouped into one cleavage site, and the internal priming artifacts removed. Then the ambiguous region is introduced to parse the genome annotation for cleavage site clustering. Finally, cleavage sites within a close range of 24 nucleotides and from different samples can be clustered into poly(A) clusters. This procedure could be used to identify thousands of reliable poly(A) clusters from millions of NGS sequences in different tissues or treatments.
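
    The grouping and clustering steps in the record above can be sketched as follows. The original workflow is a set of Perl scripts that are not reproduced here; this sketch assumes that "within a close range of 24 nucleotides" means the gap to the previous site on the same chromosome and strand, and it omits read mapping, internal-priming removal, and annotation parsing. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Site = Tuple[str, str, int]  # (chromosome, strand, cleavage coordinate)


def group_pats(pats: List[Site]) -> Dict[Site, int]:
    """Collapse poly(A) tags (PATs) at the same coordinate into one cleavage site."""
    return Counter(pats)


def cluster_sites(sites: Dict[Site, int], max_gap: int = 24) -> List[dict]:
    """Merge neighbouring cleavage sites into poly(A) clusters.

    ASSUMPTION: a site joins the current cluster when it lies at most
    max_gap nucleotides after the cluster's last site on the same strand.
    """
    clusters: List[dict] = []
    for chrom, strand, coord in sorted(sites):
        count = sites[(chrom, strand, coord)]
        last = clusters[-1] if clusters else None
        if (
            last is not None
            and last["chrom"] == chrom
            and last["strand"] == strand
            and coord - last["end"] <= max_gap
        ):
            last["end"] = coord
            last["count"] += count
        else:
            clusters.append(
                {"chrom": chrom, "strand": strand, "start": coord, "end": coord, "count": count}
            )
    return clusters


# Made-up tags pooled from several samples.
pats = [("chr1", "+", 100)] * 3 + [("chr1", "+", 110), ("chr1", "+", 200), ("chr1", "-", 105)]
clusters = cluster_sites(group_pats(pats))
```

    Pooling the tags from different samples before clustering, as done here, gives clusters with shared coordinates across tissues or treatments.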

  15. Probing the Rare Biosphere of the North-West Mediterranean Sea: An Experiment with High Sequencing Effort.

    PubMed

    Crespo, Bibiana G; Wallhead, Philip J; Logares, Ramiro; Pedrós-Alió, Carlos

    2016-01-01

    High-throughput sequencing (HTS) techniques have suggested the existence of a wealth of species with very low relative abundance: the rare biosphere. We attempted to exhaustively map this rare biosphere in two water samples by performing an exceptionally deep pyrosequencing analysis (~500,000 final reads per sample). Species data were derived by a 97% identity criterion and various parametric distributions were fitted to the observed counts. Using the best-fitting Sichel distribution we estimate a total species richness of 1,568-1,669 (95% Credible Interval) and 5,027-5,196 for surface and deep water samples respectively, implying that 84-89% of the total richness in those two samples was sequenced, and we predict that a quadrupling of the present sequencing effort would suffice to observe 90% of the total richness in both samples. Comparing the HTS results with a culturing approach we found that most of the cultured taxa were not obtained by HTS, despite the high sequencing effort. Culturing therefore remains a useful tool for uncovering marine bacterial diversity, in addition to its other uses for studying the ecology of marine bacteria.

  16. A metagenomic survey of viral abundance and diversity in mosquitoes from Hubei province.

    PubMed

    Shi, Chenyan; Liu, Yi; Hu, Xiaomin; Xiong, Jinfeng; Zhang, Bo; Yuan, Zhiming

    2015-01-01

    Mosquitoes, as one of the most common and important vectors, have the potential to transmit or acquire many viruses through biting; however, the viral flora in mosquitoes and its impact on mosquito-borne disease transmission have not been well investigated and evaluated. In this study, the metagenomic technique has been successfully employed in analyzing the abundance and diversity of the viral community in three mosquito samples from Hubei, China. Among 92,304 reads produced through a run with the 454 GS FLX system, 39% have high similarities with viral sequences belonging to identified bacterial, fungal, animal, plant and insect viruses, and 0.02% were classed into unidentified viral sequences, demonstrating high abundance and diversity of viruses in mosquitoes. Furthermore, two novel viruses in subfamily Densovirinae and family Dicistroviridae were identified, and six torque teno sus virus 1 in family Anelloviridae, three porcine parvoviruses in subfamily Parvovirinae and a Culex tritaeniorhynchus rhabdovirus in family Rhabdoviridae were preliminarily characterized. The viral metagenomic analysis offered us a deep insight into the viral population of mosquitoes, which played an important role in viral initiative or passive transmission and evolution during the process.

  17. Deep sequencing analysis of viral short RNAs from an infected Pinot Noir grapevine.

    PubMed

    Pantaleo, Vitantonio; Saldarelli, Pasquale; Miozzi, Laura; Giampetruzzi, Annalisa; Gisel, Andreas; Moxon, Simon; Dalmay, Tamas; Bisztray, György; Burgyan, Jozsef

    2010-12-05

    Virus-derived short interfering RNAs (vsiRNAs) isolated from grapevine V. vinifera Pinot Noir clone ENTAV 115 were analyzed by high-throughput sequencing using the Illumina Solexa platform. We identified and characterized vsiRNAs derived from grapevine field plants naturally infected with different viruses belonging to the genera Foveavirus, Maculavirus, Marafivirus and Nepovirus. These vsiRNAs were mainly of 21 and 22 nucleotides (nt) in size and were discontinuously distributed throughout Grapevine rupestris stem-pitting associated virus (GRSPaV) and Grapevine fleck virus (GFkV) genomic RNAs. Among the studied viruses, GRSPaV and GFkV vsiRNAs had a 5' terminal nucleotide bias, which differed from that described for experimental viral infections in Arabidopsis thaliana. VsiRNAs were found to originate from both genomic and antigenomic GRSPaV RNA strands, whereas with the grapevine tymoviruses GFkV and Grapevine Red Globe associated virus (GRGV), the large majority derived from the antigenomic viral strand, a feature never observed in other plant-virus interactions.
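
    The two summary statistics reported above, the size distribution of the short RNAs and their 5' terminal nucleotide bias, amount to simple tallies over the reads. A minimal sketch follows; the function name and the synthetic reads are hypothetical, and no adapter trimming or mapping to viral genomes is attempted.

```python
from collections import Counter


def profile_short_rnas(reads):
    """Tally read lengths and 5'-terminal nucleotides for short RNA reads."""
    reads = [r.upper() for r in reads if r]        # drop empty records
    lengths = Counter(len(r) for r in reads)       # size distribution (nt)
    five_prime = Counter(r[0] for r in reads)      # first base of each read
    return lengths, five_prime


# Hypothetical synthetic reads, built so that their lengths are unambiguous.
toy_reads = [
    "U" + "ACGU" * 5,        # 21 nt, starts with U
    "U" + "GCAU" * 5,        # 21 nt, starts with U
    "A" + "ACGU" * 5 + "C",  # 22 nt, starts with A
    "C" + "ACGU" * 5,        # 21 nt, starts with C
]

lengths, five_prime = profile_short_rnas(toy_reads)
print(dict(lengths))
print(dict(five_prime))
```

    On real data the same tallies would be computed per virus and per strand after mapping, which is what allows the strand-of-origin comparison described in the abstract.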

  18. Molecular and morphological differentiation of Secret Toad-headed agama, Phrynocephalus mystaceus, with the description of a new subspecies from Iran (Reptilia, Agamidae).

    PubMed

    Solovyeva, Evgeniya N; Dunayev, Evgeniy N; Nazarov, Roman A; Mehdi Radjabizadeh; Poyarkov, Nikolay A

    2018-01-01

    The morphological and genetic variation of the wide-ranging Secret Toad-headed agama, Phrynocephalus mystaceus, which inhabits sand deserts of south-eastern Europe, the Middle East, Middle Asia, and western China, is reviewed. Based on morphological differences and high divergence in COI (mtDNA) gene sequences, a new subspecies of Ph. mystaceus is described from Khorasan Razavi Province in Iran. Partial sequences of the COI mtDNA gene of 31 specimens of Ph. mystaceus from 17 localities covering all major parts of the species range were analyzed. Genetic distances show a deep divergence between Ph. mystaceus khorasanus ssp. n. from Khorasan Razavi Province and all other populations of Ph. mystaceus. The new subspecies can be distinguished from other populations of Ph. mystaceus by a combination of several morphological features. Molecular and morphological analyses do not support the validity of other Ph. mystaceus subspecies described from Middle Asia and the Caspian basin. Geographic variation in the Ph. mystaceus species complex and the status of previously described subspecies are discussed.

  19. Identification and Validation of Expressed Sequence Tags from Pigeonpea (Cajanus cajan L.) Root

    PubMed Central

    Kumar, Ravi Ranjan; Yadav, Shailesh; Joshi, Shourabh; Bhandare, Prithviraj P.; Patil, Vinod Kumar; Kulkarni, Pramod B.; Sonkawade, Swati; Naik, G. R.

    2014-01-01

    Pigeonpea (Cajanus cajan (L) Millsp.) is an important food legume crop of rain fed agriculture in the arid and semiarid tropics of the world. It has a deep and extensive root system, which serves a number of important physiological and metabolic functions in plant development and growth. In order to identify genes associated with pigeonpea root, ESTs were generated from the root tissues of pigeonpea (GRG-295 genotype) using a normalized cDNA library. A total of 105 high quality ESTs were generated by sequencing 250 random clones, which resulted in 72 unigenes comprising 25 contigs and 47 singlets. The ESTs were assigned to 9 functional categories on the basis of their putative function. In order to validate the possible expression of transcripts, four genes, namely, S-adenosylmethionine synthetase, phosphoglycerate kinase, serine carboxypeptidase, and methionine aminopeptidase, were further analyzed by reverse transcriptase PCR. The identified transcripts and their possible root-associated functions will also be a valuable resource for functional genomics studies in legume crops. PMID:24895494

  20. Agonal sequences in four filmed hangings: analysis of respiratory and movement responses to asphyxia by hanging.

    PubMed

    Sauvageau, Anny

    2009-01-01

    The human pathophysiology of asphyxia by hanging is still poorly understood, despite great advances in forensic science. In that context, filmed hangings may hold the key to answer questions regarding the sequence of events leading to death in human asphyxia. Four filmed hangings were analyzed. Rapid loss of consciousness was observed between 13 sec and 18 sec after onset of hanging, closely followed by convulsions (at 14-19 sec). A complex pattern of decerebration rigidity (19-21 sec in most cases), followed by a quick phase of decortication rigidity (1 min 00 sec-1 min 08 sec in most cases), an extended phase of decortication rigidity (1 min 04 sec-1 min 32 sec) and loss of muscle tone (1 min 38 sec-2 min 47 sec) was revealed. Very deep respiratory attempts started between 20 and 22 sec, the last respiratory attempt being detected between 2 min 00 sec and 2 min 04 sec. Despite differences in the types of hanging, this unique study reveals similarities that are further discussed.

  1. Culture-Independent Identification of Periodontitis-Associated Porphyromonas and Tannerella Populations by Targeted Molecular Analysis

    PubMed Central

    de Lillo, A.; Booth, V.; Kyriacou, L.; Weightman, A. J.; Wade, W. G.

    2004-01-01

    Periodontitis is the commonest bacterial disease of humans and is the major cause of adult tooth loss. About half of the oral microflora is unculturable; and 16S rRNA PCR, cloning, and sequencing techniques have demonstrated the high level of species richness of the oral microflora. In the present study, a PCR primer set specific for the genera Porphyromonas and Tannerella was designed and used to analyze the bacterial populations in subgingival plaque samples from inflamed shallow and deep sites in subjects with periodontitis and shallow sites in age- and sex-matched controls. A total of 308 clones were sequenced and found to belong to one of six Porphyromonas or Tannerella species or phylotypes, one of which, Porphyromonas P3, was novel. Tannerella forsythensis was found in significantly higher proportions in patients than in controls. Porphyromonas catoniae and Tannerella phylotype BU063 appeared to be associated with shallow sites. Targeted culture-independent molecular ecology studies have a valuable role to play in the identification of bacterial targets for further investigations of the pathogenesis of bacterial infections. PMID:15583276

  2. Ultra-Deep Sequencing Analysis of the Hepatitis A Virus 5'-Untranslated Region among Cases of the Same Outbreak from a Single Source

    PubMed Central

    Wu, Shuang; Nakamoto, Shingo; Kanda, Tatsuo; Jiang, Xia; Nakamura, Masato; Miyamura, Tatsuo; Shirasawa, Hiroshi; Sugiura, Nobuyuki; Takahashi-Nakaguchi, Azusa; Gonoi, Tohru; Yokosuka, Osamu

    2014-01-01

    Hepatitis A virus (HAV) is a causative agent of acute viral hepatitis for which an effective vaccine has been developed. Here we describe ultra-deep pyrosequences (UDPSs) of HAV 5'-untranslated region (5'UTR) among cases of the same outbreak, which arose from a single source, associated with a revolving sushi bar. We determined the reference sequence from HAV-derived clone from an attendant by the Sanger method. Sixteen UDPSs from this outbreak and one from another sporadic case were compared with this reference. Nucleotide errors yielded a UDPS error rate of < 1%. This study confirmed that nucleotide substitutions of this region are transition mutations in outbreak cases, that insertion was observed only in non-severe cases, and that these nucleotide substitutions were different from those of the sporadic case. Analysis of UDPSs detected low-prevalence HAV variations in 5'UTR, but no specific mutations associated with severity in these outbreak cases. To our surprise, HAV strains in this outbreak conserved HAV IRES sequence even if we performed analysis of UDPSs. UDPS analysis of HAV 5'UTR gave us no association between the disease severity of hepatitis A and HAV 5'UTR substitutions. It might be more interesting to perform ultra-deep sequencing of full length HAV genome in order to reveal possible unknown genomic determinants associated with disease severity. Further studies will be needed. PMID:24396287

  3. Low endemism, continued deep-shallow interchanges, and evidence for cosmopolitan distributions in free-living marine nematodes (order Enoplida)

    PubMed Central

    2010-01-01

    Background: Nematodes represent the most abundant benthic metazoa in one of the largest habitats on earth, the deep sea. Characterizing major patterns of biodiversity within this dominant group is a critical step towards understanding evolutionary patterns across this vast ecosystem. The present study aimed to place deep-sea nematode species into a phylogenetic framework, investigate relationships between shallow water and deep-sea taxa, and elucidate phylogeographic patterns amongst the deep-sea fauna. Results: Molecular data (18S and 28S rRNA) confirm a high diversity amongst deep-sea Enoplids. There is no evidence for endemic deep-sea lineages in Maximum Likelihood or Bayesian phylogenies, and Enoplids do not cluster according to depth or geographic location. Tree topologies suggest frequent interchanges between deep-sea and shallow water habitats, as well as a mixture of early radiations and more recently derived lineages amongst deep-sea taxa. This study also provides convincing evidence of cosmopolitan marine species, recovering a subset of Oncholaimid nematodes with identical gene sequences (18S, 28S and cox1) at trans-Atlantic sample sites. Conclusions: The complex clade structures recovered within the Enoplida support a high global species richness for marine nematodes, with phylogeographic patterns suggesting the existence of closely related, globally distributed species complexes in the deep sea. True cosmopolitan species may additionally exist within this group, potentially driven by specific life history traits of Enoplids. Although this investigation aimed to intensively sample nematodes from the order Enoplida, specimens were only identified down to genus (at best) and our sampling regime focused on an infinitesimally small fraction of the deep-sea floor. Future nematode studies should incorporate an extended sample set covering a wide depth range (shelf, bathyal, and abyssal sites), utilize additional genetic loci (e.g. mtDNA) that are informative at the species level, and apply high-throughput sequencing methods to fully assay community diversity. Finally, further molecular studies are needed to determine whether phylogeographic patterns observed in Enoplids are common across other ubiquitous marine groups (e.g. Chromadorida, Monhysterida). PMID:21167065

  4. Genome Sequence of Aeribacillus pallidus Strain GS3372, an Endospore-Forming Bacterium Isolated in a Deep Geothermal Reservoir

    PubMed Central

    Filippidou, Sevasti; Jaussi, Marion; Junier, Thomas; Wunderlin, Tina; Jeanneret, Nicole; Regenspurg, Simona; Li, Po-E; Lo, Chien-Chi; McMurry, Kim; Gleasner, Cheryl D.; Vuyisich, Momchilo; Chain, Patrick S.

    2015-01-01

    The genome of strain GS3372 is the first publicly available genome of Aeribacillus pallidus. This endospore-forming thermophilic strain was isolated from a deep geothermal reservoir. The availability of this genome can contribute to the clarification of the taxonomy of the closely related Anoxybacillus, Geobacillus, and Aeribacillus genera. PMID:26316637

  5. Subglacial Lake Vostok (Antarctica) Accretion Ice Contains a Diverse Set of Sequences from Aquatic, Marine and Sediment-Inhabiting Bacteria and Eukarya

    PubMed Central

    Edgar, Robyn; Veerapaneni, Ram S.; D’Elia, Tom; Morris, Paul F.; Rogers, Scott O.

    2013-01-01

    Lake Vostok, the 7th largest (by volume) and 4th deepest lake on Earth, is covered by more than 3,700 m of ice, making it the largest subglacial lake known. The combination of cold, heat (from possible hydrothermal activity), pressure (from the overriding glacier), limited nutrients and complete darkness presents extreme challenges to life. Here, we report metagenomic/metatranscriptomic sequence analyses from four accretion ice sections from the Vostok 5G ice core. Two sections accreted in the vicinity of an embayment on the southwestern end of the lake, and the other two represented part of the southern main basin. We obtained 3,507 unique gene sequences from concentrates of 500 ml of 0.22 µm-filtered accretion ice meltwater. Taxonomic classifications (to genus and/or species) were possible for 1,623 of the sequences. Species determinations in combination with mRNA gene sequence results allowed deduction of the metabolic pathways represented in the accretion ice and, by extension, in the lake. Approximately 94% of the sequences were from Bacteria and 6% were from Eukarya. Only two sequences were from Archaea. In general, the taxa were similar to organisms previously described from lakes, brackish water, marine environments, soil, glaciers, ice, lake sediments, deep-sea sediments, deep-sea thermal vents, animals and plants. Sequences from aerobic, anaerobic, psychrophilic, thermophilic, halophilic, alkaliphilic, acidophilic, desiccation-resistant, autotrophic and heterotrophic organisms were present, including a number from multicellular eukaryotes. PMID:23843994

  6. Subglacial Lake Vostok (Antarctica) accretion ice contains a diverse set of sequences from aquatic, marine and sediment-inhabiting bacteria and eukarya.

    PubMed

    Shtarkman, Yury M; Koçer, Zeynep A; Edgar, Robyn; Veerapaneni, Ram S; D'Elia, Tom; Morris, Paul F; Rogers, Scott O

    2013-01-01

    Lake Vostok, the 7th largest (by volume) and 4th deepest lake on Earth, is covered by more than 3,700 m of ice, making it the largest subglacial lake known. The combination of cold, heat (from possible hydrothermal activity), pressure (from the overriding glacier), limited nutrients and complete darkness presents extreme challenges to life. Here, we report metagenomic/metatranscriptomic sequence analyses from four accretion ice sections from the Vostok 5G ice core. Two sections accreted in the vicinity of an embayment on the southwestern end of the lake, and the other two represented part of the southern main basin. We obtained 3,507 unique gene sequences from concentrates of 500 ml of 0.22 µm-filtered accretion ice meltwater. Taxonomic classifications (to genus and/or species) were possible for 1,623 of the sequences. Species determinations in combination with mRNA gene sequence results allowed deduction of the metabolic pathways represented in the accretion ice and, by extension, in the lake. Approximately 94% of the sequences were from Bacteria and 6% were from Eukarya. Only two sequences were from Archaea. In general, the taxa were similar to organisms previously described from lakes, brackish water, marine environments, soil, glaciers, ice, lake sediments, deep-sea sediments, deep-sea thermal vents, animals and plants. Sequences from aerobic, anaerobic, psychrophilic, thermophilic, halophilic, alkaliphilic, acidophilic, desiccation-resistant, autotrophic and heterotrophic organisms were present, including a number from multicellular eukaryotes.

  7. De novo characterization of the Chinese fir (Cunninghamia lanceolata) transcriptome and analysis of candidate genes involved in cellulose and lignin biosynthesis

    PubMed Central

    2012-01-01

    Background: Chinese fir (Cunninghamia lanceolata) is an important timber species that accounts for 20–30% of the total commercial timber production in China. However, the available genomic information for Chinese fir is limited, and this severely encumbers functional genomic analysis and molecular breeding in Chinese fir. Recently, major advances in transcriptome sequencing have provided fast and cost-effective approaches to generate large expression datasets that have proven to be powerful tools to profile the transcriptomes of non-model organisms with undetermined genomes. Results: In this study, the transcriptomes of nine tissues from Chinese fir were analyzed using the Illumina HiSeq™ 2000 sequencing platform. Approximately 40 million paired-end reads were obtained, generating 3.62 gigabase pairs of sequencing data. These reads were assembled into 83,248 unique sequences (i.e. Unigenes) with an average length of 449 bp, amounting to 37.40 Mb. A total of 73,779 Unigenes were supported by more than 5 reads; of these, 42,663 (57.83%) had homologs in the NCBI non-redundant and Swiss-Prot protein databases, corresponding to 27,224 unique protein entries. Of these Unigenes, 16,750 were assigned to Gene Ontology classes, and 14,877 were clustered into orthologous groups. A total of 21,689 (29.40%) were mapped to 119 pathways by BLAST comparison against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The majority of the genes encoding the enzymes in the biosynthetic pathways of cellulose and lignin were identified in the Unigene dataset by targeted searches of their annotations. A number of candidate Chinese fir genes in the two metabolic pathways were discovered for the first time. Eighteen genes related to cellulose and lignin biosynthesis were cloned for experimental validation of the transcriptome data. Overall, 49 Unigenes, covering different regions of these selected genes, were found by alignment. Their expression patterns in different tissues were analyzed by qRT-PCR to explore their putative functions. Conclusions: A substantial fraction of transcript sequences was obtained from the deep sequencing of Chinese fir. The assembled Unigene dataset was used to discover candidate genes of cellulose and lignin biosynthesis. This transcriptome dataset will provide a comprehensive sequence resource for molecular genetics research of C. lanceolata. PMID:23171398
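
    The percentages quoted in this abstract can be checked against the reported counts. The short sketch below does only that arithmetic; the variable names are illustrative, and the choice of the 73,779 read-supported Unigenes as the denominator is an inference from the numbers, not something the abstract states outright.

```python
# Counts as reported in the abstract.
total_unigenes = 83_248        # assembled Unigenes
mean_length_bp = 449           # reported average Unigene length
supported_unigenes = 73_779    # Unigenes supported by more than 5 reads
with_homologs = 42_663         # hits in NCBI nr / Swiss-Prot
kegg_mapped = 21_689           # mapped to KEGG pathways


def percent(part, whole):
    """Percentage of `whole` represented by `part`, to two decimals."""
    return round(100 * part / whole, 2)


pct_homologs = percent(with_homologs, supported_unigenes)
pct_kegg = percent(kegg_mapped, supported_unigenes)
assembled_mb = total_unigenes * mean_length_bp / 1e6

print(pct_homologs)            # matches the reported 57.83%
print(pct_kegg)                # matches the reported 29.40%
print(round(assembled_mb, 2))  # close to the reported 37.40 Mb
```

    The small gap between the computed assembly size and the reported 37.40 Mb is consistent with the average length having been rounded to a whole number of base pairs.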

  8. miRDis: a Web tool for endogenous and exogenous microRNA discovery based on deep-sequencing data analysis.

    PubMed

    Zhang, Hanyuan; Vieira Resende E Silva, Bruno; Cui, Juan

    2018-05-01

    Small RNA sequencing is the most widely used tool for microRNA (miRNA) discovery, and shows great potential for the efficient study of miRNA cross-species transport, i.e., by detecting the presence of exogenous miRNA sequences in the host species. Because of the increased appreciation of dietary miRNAs and their far-reaching implication in human health, research interests are currently growing with regard to exogenous miRNAs bioavailability, mechanisms of cross-species transport and miRNA function in cellular biological processes. In this article, we present microRNA Discovery (miRDis), a new small RNA sequencing data analysis pipeline for both endogenous and exogenous miRNA detection. Specifically, we developed and deployed a Web service that supports the annotation and expression profiling data of known host miRNAs and the detection of novel miRNAs, other noncoding RNAs, and the exogenous miRNAs from dietary species. As a proof-of-concept, we analyzed a set of human plasma sequencing data from a milk-feeding study where 225 human miRNAs were detected in the plasma samples and 44 show elevated expression after milk intake. By examining the bovine-specific sequences, data indicate that three bovine miRNAs (bta-miR-378, -181* and -150) are present in human plasma possibly because of the dietary uptake. Further evaluation based on different sets of public data demonstrates that miRDis outperforms other state-of-the-art tools in both detection and quantification of miRNA from either animal or plant sources. The miRDis Web server is available at: http://sbbi.unl.edu/miRDis/index.php.

  9. Whole genome sequencing of 35 individuals provides insights into the genetic architecture of Korean population.

    PubMed

    Zhang, Wenqian; Meehan, Joe; Su, Zhenqiang; Ng, Hui Wen; Shu, Mao; Luo, Heng; Ge, Weigong; Perkins, Roger; Tong, Weida; Hong, Huixiao

    2014-01-01

    Due to a significant decline in the costs associated with next-generation sequencing, it has become possible to decipher the genetic architecture of a population by sequencing a large number of individuals to a deep coverage. The Korean Personal Genomes Project (KPGP) recently sequenced 35 Korean genomes at high coverage using the Illumina Hiseq platform and made the deep sequencing data publicly available, providing the scientific community opportunities to decipher the genetic architecture of the Korean population. In this study, we used two single nucleotide variant (SNV) calling pipelines: mapping the raw reads obtained from whole genome sequencing of 35 Korean individuals in KPGP using BWA and SOAP2 followed by SNV calling using SAMtools and SOAPsnp, respectively. The consensus SNVs obtained from the two SNV pipelines were used to represent the SNVs of the Korean population. We compared these SNVs to those from 17 other populations provided by the HapMap consortium and the 1000 Genomes Project (1KGP) and identified SNVs that were only present in the Korean population. We studied the mutation spectrum and analyzed the genes of non-synonymous SNVs only detected in the Korean population. We detected a total of 8,555,726 SNVs in the 35 Korean individuals and identified 1,213,613 SNVs detected in at least one Korean individual (SNV-1) and 12,640 in all of 35 Korean individuals (SNV-35) but not in 17 other populations. In contrast with the SNVs common to other populations in HapMap and 1KGP, the Korean only SNVs had high percentages of non-silent variants, emphasizing the unique roles of these Korean only SNVs in the Korean population. Specifically, we identified 8,361 non-synonymous Korean only SNVs, of which 58 SNVs existed in all 35 Korean individuals. The 5,754 genes of non-synonymous Korean only SNVs were highly enriched in some metabolic pathways. 
We found adhesion is the top disease term associated with SNV-1 and Nelson syndrome is the only disease term associated with SNV-35. We found that a significant number of Korean only SNVs are in genes that are associated with the drug term of adenosine. We identified the SNVs that were found in the Korean population but not seen in other populations, and explored the corresponding genes and pathways as well as the associated disease terms and drug terms. The results expand our knowledge of the genetic architecture of the Korean population, which will benefit the implementation of personalized medicine for the Korean population.
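
    The filtering logic described in this abstract (consensus of two calling pipelines, removal of variants seen in other populations, then "at least one individual" versus "all individuals" subsets) maps naturally onto set operations. The following is a minimal sketch under assumptions: SNVs are represented as hypothetical (chromosome, position, ref, alt) tuples, all function names are invented for illustration, and the toy call sets are not real data.

```python
def consensus_snvs(calls_a, calls_b):
    """SNVs reported by both calling pipelines."""
    return calls_a & calls_b


def population_specific(snvs, reference_panels):
    """SNVs absent from every reference population panel."""
    known = set().union(*reference_panels)
    return snvs - known


def in_any_individual(per_individual):
    """SNVs present in at least one individual (the 'SNV-1' idea)."""
    return set().union(*per_individual)


def in_all_individuals(per_individual):
    """SNVs present in every individual (the 'SNV-35' idea)."""
    return set.intersection(*per_individual) if per_individual else set()


# Hypothetical toy data: (chromosome, position, ref, alt).
a, b, c, d = (("chr1", 101, "A", "G"), ("chr1", 250, "C", "T"),
              ("chr2", 77, "G", "A"), ("chr3", 9, "T", "C"))

pipeline_1 = {a, b, c}          # e.g. one mapper + caller combination
pipeline_2 = {a, c, d}          # e.g. the other combination
consensus = consensus_snvs(pipeline_1, pipeline_2)

other_populations = [{c}, set()]  # panels from other populations
specific = population_specific(consensus, other_populations)

individuals = [{a, b}, {a, c}, {a}]
print(consensus, specific)
print(in_any_individual(individuals), in_all_individuals(individuals))
```

    Taking the intersection of two pipelines trades sensitivity for fewer pipeline-specific false positives, which is a common reason for a consensus approach.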

  10. Identification of microRNAs from Amur grape (Vitis amurensis Rupr.) by deep sequencing and analysis of microRNA variations with bioinformatics.

    PubMed

    Wang, Chen; Han, Jian; Liu, Chonghuai; Kibet, Korir Nicholas; Kayesh, Emrul; Shangguan, Lingfei; Li, Xiaoying; Fang, Jinggui

    2012-03-29

    MicroRNA (miRNA) is a class of functional non-coding small RNA with 19-25 nucleotides in length while Amur grape (Vitis amurensis Rupr.) is an important wild fruit crop with the strongest cold resistance among the Vitis species, is used as an excellent breeding parent for grapevine, and has elicited growing interest in wine production. To date, there is a relatively large number of grapevine miRNAs (vv-miRNAs) from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr, a wild grapevine species. A small RNA library from Amur grape was constructed and Solexa technology used to perform deep sequencing of the library followed by subsequent bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and accumulation of 18 new va-miRNAs in seven tissues of grapevines confirmed by real time RT-PCR (qRT-PCR) analysis. The expression levels of va-miRNAs in flowers and berries were found to be basically consistent in identity to those from deep sequenced sRNAs libraries of combined corresponding tissues. We also describe the conservation and variation of va-miRNAs using miR-SNPs and miR-LDs during plant evolution based on comparison of orthologous sequences, and further reveal that the number and sites of miR-SNP in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted and they include a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism. 
Deep sequencing of short RNAs from Amur grape flowers and berries identified 72 new potential miRNAs and 34 known but non-conserved miRNAs, indicating that specific miRNAs exist in Amur grape. These results show that a number of regulatory miRNAs exist in Amur grape and play an important role in Amur grape growth, development, and response to abiotic or biotic stress.

  11. Transcriptome dynamics through alternative polyadenylation in developmental and environmental responses in plants revealed by deep sequencing

    PubMed Central

    Shen, Yingjia; Venu, R.C.; Nobuta, Kan; Wu, Xiaohui; Notibala, Varun; Demirci, Caghan; Meyers, Blake C.; Wang, Guo-Liang; Ji, Guoli; Li, Qingshun Q.

    2011-01-01

    Polyadenylation sites mark the ends of mRNA transcripts. Alternative polyadenylation (APA) may alter sequence elements and/or the coding capacity of transcripts, a mechanism that has been demonstrated to regulate gene expression and transcriptome diversity. To study the role of APA in transcriptome dynamics, we analyzed a large-scale data set of RNA “tags” that signify poly(A) sites and expression levels of mRNA. These tags were derived from a wide range of tissues and developmental stages that were mutated or exposed to environmental treatments, and generated using digital gene expression (DGE)–based protocols of the massively parallel signature sequencing (MPSS-DGE) and the Illumina sequencing-by-synthesis (SBS-DGE) sequencing platforms. The data offer a global view of APA and how it contributes to transcriptome dynamics. Upon analysis of these data, we found that ∼60% of Arabidopsis genes have multiple poly(A) sites. Likewise, ∼47% and 82% of rice genes use APA, supported by MPSS-DGE and SBS-DGE tags, respectively. In both species, ∼49%–66% of APA events were mapped upstream of annotated stop codons. Interestingly, 10% of the transcriptomes are made up of APA transcripts that are differentially distributed among developmental stages and in tissues responding to environmental stresses, providing an additional level of transcriptome dynamics. Examples of pollen-specific APA switching and salicylic acid treatment-specific APA clearly demonstrated such dynamics. The significance of these APAs is more evident in the 3034 genes that have conserved APA events between rice and Arabidopsis. PMID:21813626
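
    The headline figures above (for example, roughly 60% of Arabidopsis genes having multiple poly(A) sites) are proportions of genes with more than one distinct site. A minimal sketch of that calculation follows; the function name, gene identifiers and coordinates are hypothetical, and real analyses would first cluster nearby tag positions into sites.

```python
def apa_fraction(poly_a_sites_by_gene):
    """Fraction of genes with >1 distinct poly(A) site.

    Genes with no detected site are excluded from the denominator.
    """
    site_sets = [set(sites) for sites in poly_a_sites_by_gene.values() if sites]
    if not site_sets:
        return 0.0
    with_apa = sum(1 for sites in site_sets if len(sites) > 1)
    return with_apa / len(site_sets)


# Hypothetical toy input: gene -> genomic positions of detected poly(A) sites.
toy_sites = {
    "geneA": [1200, 1450],
    "geneB": [900],
    "geneC": [300, 520, 610],
    "geneD": [],              # no tags detected; ignored
    "geneE": [75, 75],        # duplicate tag positions count as one site
}

print(apa_fraction(toy_sites))
```

    Whether undetected genes belong in the denominator is a design choice; this sketch excludes them, which is an assumption rather than the convention used in the paper.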

  12. High level of intergenera gene exchange shapes the evolution of haloarchaea in an isolated Antarctic lake.

    PubMed

    DeMaere, Matthew Z; Williams, Timothy J; Allen, Michelle A; Brown, Mark V; Gibson, John A E; Rich, John; Lauro, Federico M; Dyall-Smith, Michael; Davenport, Karen W; Woyke, Tanja; Kyrpides, Nikos C; Tringe, Susannah G; Cavicchioli, Ricardo

    2013-10-15

    Deep Lake in Antarctica is a globally isolated, hypersaline system that remains liquid at temperatures down to -20 °C. By analyzing metagenome data and genomes of four isolates we assessed genome variation and patterns of gene exchange to learn how the lake community evolved. The lake is completely and uniformly dominated by haloarchaea, comprising a hierarchically structured, low-complexity community that differs greatly to temperate and tropical hypersaline environments. The four Deep Lake isolates represent distinct genera (∼85% 16S rRNA gene similarity and ∼73% genome average nucleotide identity) with genomic characteristics indicative of niche adaptation, and collectively account for ∼72% of the cellular community. Network analysis revealed a remarkable level of intergenera gene exchange, including the sharing of long contiguous regions (up to 35 kb) of high identity (∼100%). Although the genomes of closely related Halobacterium, Haloquadratum, and Haloarcula (>90% average nucleotide identity) shared regions of high identity between species or strains, the four Deep Lake isolates were the only distantly related haloarchaea to share long high-identity regions. Moreover, the Deep Lake high-identity regions did not match to any other hypersaline environment metagenome data. The most abundant species, tADL, appears to play a central role in the exchange of insertion sequences, but not the exchange of high-identity regions. The genomic characteristics of the four haloarchaea are consistent with a lake ecosystem that sustains a high level of intergenera gene exchange while selecting for ecotypes that maintain sympatric speciation. The peculiarities of this polar system restrict which species can grow and provide a tempo and mode for accentuating gene exchange.

  13. Medial tibial pain: a dynamic contrast-enhanced MRI study.

    PubMed

    Mattila, K T; Komu, M E; Dahlström, S; Koskinen, S K; Heikkilä, J

    1999-09-01

    The purpose of this study was to compare the sensitivity of different magnetic resonance imaging (MRI) sequences to depict periosteal edema in patients with medial tibial pain. Additionally, we evaluated the ability of dynamic contrast-enhanced imaging (DCES) to depict possible temporal alterations in muscular perfusion within compartments of the leg. Fifteen patients with medial tibial pain were examined with MRI. T1-, T2-weighted, proton density axial images and dynamic and static phase post-contrast images were compared in ability to depict periosteal edema. STIR was used in seven cases to depict bone marrow edema. Images were analyzed to detect signs of compartment edema. Region-of-interest measurements in compartments were performed during DCES and compared with controls. In detecting periosteal edema, post-contrast T1-weighted images were better than spin echo T2-weighted and proton density images or STIR images, but STIR depicted the bone marrow edema best. DCES best demonstrated the gradually enhancing periostitis. Four subjects with severe periosteal edema had visually detectable pathologic enhancement during DCES in the deep posterior compartment of the leg. Percentage enhancement in the deep posterior compartment of the leg was greater in patients than in controls. The fast enhancement phase in the deep posterior compartment began slightly slower in patients than in controls, but it continued longer. We believe that periosteal edema in bone stress reaction can cause impairment of venous flow in the deep posterior compartment. MRI can depict both these conditions. In patients with medial tibial pain, MR imaging protocol should include axial STIR images (to depict bone pathology) with T1-weighted axial pre and post-contrast images, and dynamic contrast enhanced imaging to show periosteal edema and abnormal contrast enhancement within a compartment.

  14. Using small RNA (sRNA) deep sequencing to understand global virus distribution in plants

    USDA-ARS's Scientific Manuscript database

    Small RNAs (sRNAs), a class of regulatory RNAs, have been used to serve as the specificity determinants of suppressing gene expression in plants and animals. Next generation sequencing (NGS) uncovered the sRNA landscape in most organisms including their associated microbes. In the current study, w...

  15. Fatal Metacestode Infection in Bornean Orangutan Caused by Unknown Versteria Species

    PubMed Central

    Gendron-Fitzpatrick, Annette; Deering, Kathleen M.; Wallace, Roberta S.; Clyde, Victoria L.; Lauck, Michael; Rosen, Gail E.; Bennett, Andrew J.; Greiner, Ellis C.; O’Connor, David H.

    2014-01-01

    A captive juvenile Bornean orangutan (Pongo pygmaeus) died from an unknown disseminated parasitic infection. Deep sequencing of DNA from infected tissues, followed by gene-specific PCR and sequencing, revealed a divergent species within the newly proposed genus Versteria (Cestoda: Taeniidae). Versteria may represent a previously unrecognized risk to primate health. PMID:24377497

  16. Testing deep reticulate evolution in Amaryllidaceae Tribe Hippeastreae (Asparagales) with ITS and chloroplast sequence data

    USDA-ARS's Scientific Manuscript database

    The phylogeny of Amaryllidaceae tribe Hippeastreae was inferred using chloroplast (3’ycf1, ndhF, trnL-F) and nuclear (ITS rDNA) sequence data under maximum parsimony and maximum likelihood frameworks. Network analyses were applied to resolve conflicting signals among data sets and putative scenarios...

  17. BIOCHEMICAL AND PHYLOGENETIC CHARACTERIZATION OF TWO NOVEL DEEP-SEA THERMOCOCCUS ISOLATES WITH POTENTIALLY BIOTECHNOLOGICAL APPLICATIONS

    EPA Science Inventory

    The partial 16S rDNA gene sequences of two thermophilic archaeal strains, TY and TYS, previously isolated from the Guaymas Basin hydrothermal vent site were determined. Lipid analyses and a comparative analysis performed with 16S rDNA sequences of similar thermophilic species sho...

  18. Analysis of alternative cleavage and polyadenylation by 3′ region extraction and deep sequencing

    PubMed Central

    Hoque, Mainul; Ji, Zhe; Zheng, Dinghai; Luo, Wenting; Li, Wencheng; You, Bei; Park, Ji Yeon; Yehia, Ghassan; Tian, Bin

    2012-01-01

    Alternative cleavage and polyadenylation (APA) leads to mRNA isoforms with different coding sequences (CDS) and/or 3′ untranslated regions (3′UTRs). Using 3′ Region Extraction And Deep Sequencing (3′READS), a method which addresses the internal priming and oligo(A) tail issues that commonly plague polyA site (pA) identification, we comprehensively mapped pAs in the mouse genome, thoroughly annotating 3′ ends of genes and revealing over five thousand pAs (~8% of total) flanked by A-rich sequences, which have hitherto been overlooked. About 79% of mRNA genes and 66% of long non-coding RNA (lncRNA) genes have APA; but these two gene types have distinct usage patterns for pAs in introns and upstream exons. Promoter-distal pAs become relatively more abundant during embryonic development and cell differentiation, a trend affecting pAs in both 3′-most exons and upstream regions. Upregulated isoforms generally have stronger pAs, suggesting global modulation of the 3′ end processing activity in development and differentiation. PMID:23241633

  19. Microbes in deep marine sediments viewed through amplicon sequencing and metagenomics

    NASA Astrophysics Data System (ADS)

    Biddle, J.; Leon, Z. R.; Russell, J. A., III; Martino, A. J.

    2016-12-01

    Nearly twenty percent of microbial biomass on Earth can be found in the marine subsurface. The majority of this is concentrated on continental margins, which have been investigated by scientific drilling. On the Costa Rica Margin, Iberian Margin and Peru Margins, sediment samples have been investigated through DNA extraction followed by amplicon and metagenomic sequencing. Overall samples show a high degree of microbial diversity, including many lineages of newly defined groups. In this talk, metagenome assembled genomes of unusual lineages will be presented, including their relationships to shallower relatives. From Costa Rica, in particular, we have retrieved deep relatives of Lokiarchaeota and Thorarchaeota, as well as other deeply branching archaeal relatives. We discuss their genome similarities to both other archaea and eukaryotes. From the Iberian Margin, relatives of Atribacteria and Aerophobetes will be discussed. Finally, we will detail the knowledge lost or gained depending on whether samples are studied via amplicon sequencing or total metagenomics, as studies in other environments have shown that up to 15% of microbial diversity is ignored when samples are studied via amplicon sequencing alone.

  20. Characterization of skin ulceration syndrome associated microRNAs in sea cucumber Apostichopus japonicus by deep sequencing.

    PubMed

    Li, Chenghua; Feng, Weida; Qiu, Lihua; Xia, Changge; Su, Xiurong; Jin, Chunhua; Zhou, Tingting; Zeng, Yuan; Li, Taiwu

    2012-08-01

    MicroRNAs (miRNAs) constitute a family of small RNA species which have been demonstrated to be one of the key effectors in mediating host-pathogen interaction. In this study, two haemocyte miRNA libraries, from healthy (L1) and skin ulceration syndrome-affected (L2) Apostichopus japonicus, were constructed and deep sequenced on the Illumina HiSeq 2000. The high-throughput Solexa sequencing resulted in 9,579,038 and 7,742,558 clean reads from L1 and L2, respectively. Sequence analysis revealed that 40 conserved miRNAs were found in both libraries, in which let-7 and mir-125 were speculated to be clustered together and expressed accordingly. Eighty-six miRNA candidates were also identified by reference genome search and stem-loop structure prediction. Importantly, mir-31 and mir-2008 displayed significant differential expression between the two libraries according to the FPKM model, and might be considered promising targets for elucidating the intrinsic mechanism of skin ulceration syndrome outbreak in the species. Copyright © 2012 Elsevier Ltd. All rights reserved.

  1. Analysis of deep learning methods for blind protein contact prediction in CASP12.

    PubMed

    Wang, Sheng; Sun, Siqi; Xu, Jinbo

    2018-03-01

    Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, which is an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L = length). A complete implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as a pixel-level image labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input the residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and coevolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both one-dimensional and two-dimensional deep convolutional neural networks to effectively learn complex sequence-structure relationship including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strength and weakness of our method. © 2017 Wiley Periodicals, Inc.
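
    The "top L/5" accuracy quoted above can be illustrated with a minimal Python sketch. It is not the RaptorX-Contact code: the function name, the dictionary input format, and the toy data are hypothetical, and the sequence-separation cut-offs (medium-range 12 to 23 residues, long-range 24 or more) are assumed from the usual CASP convention rather than stated in the abstract.

```python
def top_k_contact_accuracy(pred, true_contacts, length, min_sep, max_sep=None, divisor=5):
    """Fraction of the top L/divisor predicted pairs that are true contacts.

    pred          -- dict mapping (i, j) with i < j to a predicted contact probability
    true_contacts -- set of (i, j) pairs that are in contact in the native structure
    length        -- protein length L
    min_sep       -- minimum sequence separation j - i (assumed: 24 long, 12 medium)
    max_sep       -- optional maximum separation (assumed: 23 for medium range)
    """
    k = max(1, length // divisor)
    # Keep only pairs inside the requested sequence-separation range.
    candidates = [
        (prob, pair)
        for pair, prob in pred.items()
        if pair[1] - pair[0] >= min_sep
        and (max_sep is None or pair[1] - pair[0] <= max_sep)
    ]
    # Rank by predicted probability, highest first, and keep the top k.
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = [pair for _, pair in candidates[:k]]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in true_contacts)
    return hits / len(top)


# Toy example (hypothetical data): L = 30, so the top 30 // 5 = 6 pairs are scored.
pred = {
    (0, 24): 0.9, (1, 26): 0.8, (2, 27): 0.7, (3, 28): 0.6,
    (0, 29): 0.5, (4, 29): 0.4, (5, 29): 0.3,   # long-range pairs
    (0, 12): 0.95, (3, 20): 0.85,               # medium-range pairs
}
true_contacts = {(0, 24), (2, 27), (4, 29), (0, 12)}

long_acc = top_k_contact_accuracy(pred, true_contacts, length=30, min_sep=24)
medium_acc = top_k_contact_accuracy(pred, true_contacts, length=30, min_sep=12, max_sep=23)
print(long_acc, medium_acc)
```

    The sketch divides by the number of pairs actually scored; an evaluation that always divides by L/5, even when fewer candidate pairs exist, would give lower values for short proteins.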

  2. Diversity and Biogeography of Bathyal and Abyssal Seafloor Bacteria

    PubMed Central

    Bienhold, Christina; Zinger, Lucie; Boetius, Antje; Ramette, Alban

    2016-01-01

    The deep ocean floor covers more than 60% of the Earth’s surface, and hosts diverse bacterial communities with important functions in carbon and nutrient cycles. The identification of key bacterial members remains a challenge and their patterns of distribution in seafloor sediment yet remain poorly described. Previous studies were either regionally restricted or included few deep-sea sediments, and did not specifically test biogeographic patterns across the vast oligotrophic bathyal and abyssal seafloor. Here we define the composition of this deep seafloor microbiome by describing those bacterial operational taxonomic units (OTU) that are specifically associated with deep-sea surface sediments at water depths ranging from 1000–5300 m. We show that the microbiome of the surface seafloor is distinct from the subsurface seafloor. The cosmopolitan bacterial OTU were affiliated with the clades JTB255 (class Gammaproteobacteria, order Xanthomonadales) and OM1 (Actinobacteria, order Acidimicrobiales), comprising 21% and 7% of their respective clades, and about 1% of all sequences in the study. Overall, few sequence-abundant bacterial types were globally dispersed and displayed positive range-abundance relationships. Most bacterial populations were rare and exhibited a high degree of endemism, explaining the substantial differences in community composition observed over large spatial scales. Despite the relative physicochemical uniformity of deep-sea sediments, we identified indicators of productivity regimes, especially sediment organic matter content, as factors significantly associated with changes in bacterial community structure across the globe. PMID:26814838

  3. Diverse deep-sea fungi from the South China Sea and their antimicrobial activity.

    PubMed

    Zhang, Xiao-Yong; Zhang, Yun; Xu, Xin-Ya; Qi, Shu-Hua

    2013-11-01

    We investigated the diversity of fungal communities in nine different deep-sea sediment samples of the South China Sea by culture-dependent methods followed by analysis of fungal internal transcribed spacer (ITS) sequences. Although 14 out of 27 identified species were reported in a previous study, 13 species were isolated from sediments of deep-sea environments for the first time. Moreover, the ITS sequences of six isolates shared 84-92% similarity with their closest matches in GenBank, which suggested that they might be novel phylotypes of the genera Ajellomyces, Podosordaria, Torula, and Xylaria. The antimicrobial activities of these fungal isolates were explored using a double-layer technique. A relatively high proportion (56%) of fungal isolates exhibited antimicrobial activity against at least one pathogenic bacterium or fungus among four marine pathogenic microbes (Micrococcus luteus, Pseudoalteromonas piscicida, Aspergillus versicolor, and A. sydowii). Among these antimicrobial fungi, the genera Arthrinium, Aspergillus, and Penicillium exhibited antibacterial and antifungal activities, while the genus Aureobasidium displayed only antibacterial activity, and the genera Acremonium, Cladosporium, Geomyces, and Phaeosphaeriopsis displayed only antifungal activity. To our knowledge, this is the first report to investigate the diversity and antimicrobial activity of culturable deep-sea-derived fungi in the South China Sea. These results suggest that diverse deep-sea fungi from the South China Sea are a potential source for antibiotic discovery and further increase the pool of fungi available for natural bioactive product screening.

  4. High fungal diversity and abundance recovered in the deep-sea sediments of the Pacific Ocean.

    PubMed

    Xu, Wei; Pang, Ka-Lai; Luo, Zhu-Hua

    2014-11-01

    The presence and ecological significance of bacteria and archaea in deep-sea environments have been well recognized, but eukaryotic microorganisms, such as fungi, have rarely been reported. The present study investigated the composition and abundance of the fungal community in the deep-sea sediments of the Pacific Ocean. In this study, a total of 1,947 internal transcribed spacer (ITS) regions of fungal rRNA gene clones were recovered from five sediment samples from the Pacific Ocean (water depths ranging from 5,017 to 6,986 m) using three different PCR primer sets. There were 16, 17, and 15 different operational taxonomic units (OTUs) identified from fungal-universal, Ascomycota-, and Basidiomycota-specific clone libraries, respectively. The majority of the recovered sequences belonged to diverse phylotypes of Ascomycota (25 phylotypes) and Basidiomycota (18 phylotypes). The multiple primer approach recovered a total of 27 phylotypes which showed low similarities (≤97%) with available fungal sequences in GenBank, suggesting possible new fungal taxa occurring in the deep-sea environments or belonging to taxa not represented in GenBank. Our results also recovered high fungal LSU rRNA gene copy numbers (3.52 × 10^6 to 5.23 × 10^7 copies/g wet sediment) from the Pacific Ocean sediment samples, suggesting that the fungi might be involved in important ecological functions in the deep-sea environments.

  5. Analyses of RNA Polymerase II genes from free-living protists: phylogeny, long branch attraction, and the eukaryotic big bang.

    PubMed

    Dacks, Joel B; Marinets, Alexandra; Ford Doolittle, W; Cavalier-Smith, Thomas; Logsdon, John M

    2002-06-01

    The phylogenetic relationships among major eukaryotic protist lineages are largely uncertain. Two significant obstacles in reconstructing eukaryotic phylogeny are long-branch attraction (LBA) effects and poor taxon sampling of free-living protists. We have obtained and analyzed gene sequences encoding the largest subunit of RNA Polymerase II (RPB1) from Naegleria gruberi (a heterolobosean), Cercomonas ATCC 50319 (a cercozoan), and Ochromonas danica (a heterokont); we have also analyzed the RPB1 gene from the nucleomorph (nm) genome of Guillardia theta (a cryptomonad). Using a variety of phylogenetic methods our analysis shows that RPB1s from Giardia intestinalis and Trichomonas vaginalis are probably subject to intense LBA effects. Thus, the deep branching of these taxa on RPB1 trees is questionable and should not be interpreted as evidence favoring their early divergence. Similar effects are discernable, to a lesser extent, with the Mastigamoeba invertens RPB1 sequence. Upon removal of the outgroup and these problematic sequences, analyses of the remaining RPB1s indicate some resolution among major eukaryotic groups. The most robustly supported higher-level clades are the opisthokonts (animals plus fungi) and the red algae plus the cryptomonad nm-the latter result gives added support to the red algal origin of cryptomonad chloroplasts. Clades comprising Dictyostelium discoideum plus Acanthamoeba castellanii (Amoebozoa) and Ochromonas plus Plasmodium falciparum (chromalveolates) are consistently observed and moderately supported. The clades supported by our RPB1 analyses are congruent with other data, suggesting that bona fide phylogenetic relationships are being resolved. Thus, the RPB1 gene has apparently retained some phylogenetically meaningful signal, making it worthwhile to obtain sequences from more diverse protist taxa. 
    Additional RPB1 data, especially in combination with other genes, should provide further resolution of branching orders among protist groups within the apparently rapid early divergence of eukaryotes.

  6. Deep Sequencing-Identified Kanamycin-Resistant Paenibacillus sp. Strain KS1 Isolated from Epiphyte Tillandsia usneoides (Spanish Moss) in Central Florida, USA

    PubMed Central

    Govindarajan, Subramaniam S.; Qi, Feng; Li, Jian-Liang; Sahoo, Malaya K.

    2017-01-01

    Paenibacillus sp. strain KS1 was isolated from an epiphyte, Tillandsia usneoides (Spanish moss), in central Florida, USA. Here, we report a draft genome sequence of this strain, which consists of a total of 398 contigs spanning 6,508,195 bp, with a G+C content of 46.5% and comprising 5,401 predicted coding sequences. PMID:28153888

  7. Predictive value of the composition of the vaginal microbiota in bacterial vaginosis, a dynamic study to identify recurrence-related flora.

    PubMed

    Xiao, Bingbing; Niu, Xiaoxi; Han, Na; Wang, Ben; Du, Pengcheng; Na, Risu; Chen, Chen; Liao, Qinping

    2016-06-02

    Bacterial vaginosis (BV) is a highly prevalent disease in women, and increases the risk of pelvic inflammatory disease. It has received wide attention because of its high recurrence rate. Traditional microscopy-based diagnostic methods provide limited information on the vaginal microbiota, which makes it difficult to trace the development of the disease under conditions of bacterial resistance. In this study, we used deep-sequencing technology to observe dynamic variation of the vaginal microbiota at three major time points during treatment: D0 (before treatment), D7 (when antibiotics were stopped) and D30 (the 30-day follow-up visit). Sixty-five patients with BV were enrolled (48 were cured and 17 were not cured), and the bacterial composition of their vaginal microbiota was compared. Interestingly, we identified 9 patients who might experience recurrence. We also introduced a new measurement point at D7, although the microbiota at that point were significantly inhibited by antibiotics and hard to observe by traditional methods. The vaginal microbiota as viewed by deep sequencing showed a strong correlation with the final outcome. Thus, by coupling detailed individual bioinformatics analysis with deep-sequencing technology, we may draw a more accurate map of the vaginal microbiota of BV patients, which provides a new opportunity to reduce the rate of recurrence of BV.

  8. A simple and novel method for RNA-seq library preparation of single cell cDNA analysis by hyperactive Tn5 transposase.

    PubMed

    Brouilette, Scott; Kuersten, Scott; Mein, Charles; Bozek, Monika; Terry, Anna; Dias, Kerith-Rae; Bhaw-Rosun, Leena; Shintani, Yasunori; Coppen, Steven; Ikebe, Chiho; Sawhney, Vinit; Campbell, Niall; Kaneko, Masahiro; Tano, Nobuko; Ishida, Hidekazu; Suzuki, Ken; Yashiro, Kenta

    2012-10-01

    Deep sequencing of single cell-derived cDNAs offers novel insights into oncogenesis and embryogenesis. However, traditional library preparation for RNA-seq analysis requires multiple steps with consequent sample loss and stochastic variation at each step significantly affecting output. Thus, a simpler and better protocol is desirable. The recently developed hyperactive Tn5-mediated library preparation, which brings high quality libraries, is likely one of the solutions. Here, we tested the applicability of hyperactive Tn5-mediated library preparation to deep sequencing of single cell cDNA, optimized the protocol, and compared it with the conventional method based on sonication. This new technique does not require any expensive or special equipment, which secures wider availability. A library was constructed from only 100 ng of cDNA, which enables the saving of precious specimens. Only a few steps of robust enzymatic reaction resulted in saved time, enabling more specimens to be prepared at once, and with a more reproducible size distribution among the different specimens. The obtained RNA-seq results were comparable to the conventional method. Thus, this Tn5-mediated preparation is applicable for anyone who aims to carry out deep sequencing for single cell cDNAs. Copyright © 2012 Wiley Periodicals, Inc.

  9. Detailed investigation of the microbial community in foaming activated sludge reveals novel foam formers

    PubMed Central

    Guo, Feng; Wang, Zhi-Ping; Yu, Ke; Zhang, T.

    2015-01-01

    Foaming of activated sludge (AS) causes adverse impacts on wastewater treatment operation and hygiene. In this study, we investigated the microbial communities of foam, foaming AS and non-foaming AS in a sewage treatment plant via deep-sequencing of the taxonomic marker genes 16S rRNA and mycobacterial rpoB and a metagenomic approach. In addition to Actinobacteria, many genera (e.g., Clostridium XI, Arcobacter, Flavobacterium) were more abundant in the foam than in the AS. On the other hand, deep-sequencing of rpoB did not detect any obligate pathogenic mycobacteria in the foam. We found that unknown factors other than the abundance of Gordonia sp. could determine the foaming process, because abundance of the same species was stable before and after a foaming event over six months. More interestingly, although the dominant Gordonia foam former was the closest with G. amarae, it was identified as an undescribed Gordonia species by referring to the 16S rRNA gene, gyrB and, most convincingly, the reconstructed draft genome from metagenomic reads. Our results, based on metagenomics and deep sequencing, reveal that foams are derived from diverse taxa, which expands previous understanding and provides new insight into the underlying complications of the foaming phenomenon in AS. PMID:25560234

  10. Acute West Nile Virus Meningoencephalitis Diagnosed Via Metagenomic Deep Sequencing of Cerebrospinal Fluid in a Renal Transplant Patient.

    PubMed

    Wilson, M R; Zimmermann, L L; Crawford, E D; Sample, H A; Soni, P R; Baker, A N; Khan, L M; DeRisi, J L

    2017-03-01

    Solid organ transplant patients are vulnerable to suffering neurologic complications from a wide array of viral infections and can be sentinels in the population who are first to get serious complications from emerging infections like the recent waves of arboviruses, including West Nile virus, Chikungunya virus, Zika virus, and Dengue virus. The diverse and rapidly changing landscape of possible causes of viral encephalitis poses great challenges for traditional candidate-based infectious disease diagnostics that already fail to identify a causative pathogen in approximately 50% of encephalitis cases. We present the case of a 14-year-old girl on immunosuppression for a renal transplant who presented with acute meningoencephalitis. Traditional diagnostics failed to identify an etiology. RNA extracted from her cerebrospinal fluid was subjected to unbiased metagenomic deep sequencing, enhanced with the use of a Cas9-based technique for host depletion. This analysis identified West Nile virus (WNV). Convalescent serum serologies subsequently confirmed WNV seroconversion. These results support a clear clinical role for metagenomic deep sequencing in the setting of suspected viral encephalitis, especially in the context of the high-risk transplant patient population. © 2016 The Authors. American Journal of Transplantation published by Wiley Periodicals, Inc. on behalf of American Society of Transplant Surgeons.

  11. A deep learning framework for causal shape transformation.

    PubMed

    Lore, Kin Gwn; Stoecklein, Daniel; Davies, Michael; Ganapathysubramanian, Baskar; Sarkar, Soumik

    2018-02-01

    Recurrent neural network (RNN) and Long Short-term Memory (LSTM) networks are the common go-to architecture for exploiting sequential information where the output is dependent on a sequence of inputs. However, in most considered problems, the dependencies typically lie in the latent domain which may not be suitable for applications involving the prediction of a step-wise transformation sequence that is dependent on the previous states only in the visible domain with a known terminal state. We propose a hybrid architecture of convolution neural networks (CNN) and stacked autoencoders (SAE) to learn a sequence of causal actions that nonlinearly transform an input visual pattern or distribution into a target visual pattern or distribution with the same support and demonstrated its practicality in a real-world engineering problem involving the physics of fluids. We solved a high-dimensional one-to-many inverse mapping problem concerning microfluidic flow sculpting, where the use of deep learning methods as an inverse map is very seldom explored. This work serves as a fruitful use-case to applied scientists and engineers in how deep learning can be beneficial as a solution for high-dimensional physical problems, and potentially opening doors to impactful advance in fields such as material sciences and medical biology where multistep topological transformations is a key element. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Evidence for a persistent microbial seed bank throughout the global ocean

    PubMed Central

    Gibbons, Sean M.; Caporaso, J. Gregory; Pirrung, Meg; Field, Dawn; Knight, Rob; Gilbert, Jack A.

    2013-01-01

    Do bacterial taxa demonstrate clear endemism, like macroorganisms, or can one site’s bacterial community recapture the total phylogenetic diversity of the world’s oceans? Here we compare a deep bacterial community characterization from one site in the English Channel (L4-DeepSeq) with 356 datasets from the International Census of Marine Microbes (ICoMM) taken from around the globe (ranging from marine pelagic and sediment samples to sponge-associated environments). At the L4-DeepSeq site, increasing sequencing depth uncovers greater phylogenetic overlap with the global ICoMM data. This site contained 31.7–66.2% of operational taxonomic units identified in a given ICoMM biome. Extrapolation of this overlap suggests that 1.93 × 10^11 sequences from the L4 site would capture all ICoMM bacterial phylogenetic diversity. Current technology trends suggest this limit may be attainable within 3 y. These results strongly suggest the marine biosphere maintains a previously undetected, persistent microbial seed bank. PMID:23487761

  13. Characterization of bacterial diversity associated with deep sea ferromanganese nodules from the South China Sea.

    PubMed

    Zhang, De-Chao; Liu, Yan-Xia; Li, Xin-Zheng

    2015-09-01

    Deep sea ferromanganese (FeMn) nodules contain metallic mineral resources and have great economic potential. In this study, a combination of culture-dependent and culture-independent (16S rRNA genes clone library and pyrosequencing) methods was used to investigate the bacterial diversity in FeMn nodules from Jiaolong Seamount, the South China Sea. Eleven bacterial strains including some moderate thermophiles were isolated. The majority of strains belonged to the phylum Proteobacteria; one isolate belonged to the phylum Firmicutes. A total of 259 near full-length bacterial 16S rRNA gene sequences in a clone library and 67,079 valid reads obtained using pyrosequencing indicated that members of the Gammaproteobacteria dominated, with the most abundant bacterial genera being Pseudomonas and Alteromonas. Sequence analysis indicated the presence of many organisms whose closest relatives are known manganese oxidizers, iron reducers, hydrogen-oxidizing bacteria and methylotrophs. This is the first reported investigation of bacterial diversity associated with deep sea FeMn nodules from the South China Sea.

  14. Single-virion sequencing of lamivudine-treated HBV populations reveal population evolution dynamics and demographic history.

    PubMed

    Zhu, Yuan O; Aw, Pauline P K; de Sessions, Paola Florez; Hong, Shuzhen; See, Lee Xian; Hong, Lewis Z; Wilm, Andreas; Li, Chen Hao; Hue, Stephane; Lim, Seng Gee; Nagarajan, Niranjan; Burkholder, William F; Hibberd, Martin

    2017-10-27

    Viral populations are complex, dynamic, and fast evolving. The evolution of groups of closely related viruses in a competitive environment is termed quasispecies. To fully understand the role that quasispecies play in viral evolution, characterizing the trajectories of viral genotypes in an evolving population is the key. In particular, long-range haplotype information for thousands of individual viruses is critical; yet generating this information is non-trivial. Popular deep sequencing methods generate relatively short reads that do not preserve linkage information, while third generation sequencing methods have higher error rates that make detection of low frequency mutations a bioinformatics challenge. Here we applied BAsE-Seq, an Illumina-based single-virion sequencing technology, to eight samples from four chronic hepatitis B (CHB) patients - once before antiviral treatment and once after viral rebound due to resistance. With single-virion sequencing, we obtained 248-8796 single-virion sequences per sample, which allowed us to find evidence for both hard and soft selective sweeps. We were able to reconstruct population demographic history that was independently verified by clinically collected data. We further verified four of the samples independently through PacBio SMRT and Illumina Pooled deep sequencing. Overall, we showed that single-virion sequencing yields insight into viral evolution and population dynamics in an efficient and high throughput manner. We believe that single-virion sequencing is widely applicable to the study of viral evolution in the context of drug resistance and host adaptation, allows differentiation between soft or hard selective sweeps, and may be useful in the reconstruction of intra-host viral population demographic history.

  15. Deep Packet/Flow Analysis using GPUs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gong, Qian; Wu, Wenji; DeMar, Phil

    Deep packet inspection (DPI) faces severe performance challenges in high-speed networks (40/100 GE) as it requires a large amount of raw computing power and high I/O throughput. Recently, researchers have tentatively used GPUs to address the above issues and boost the performance of DPI. Typically, DPI applications involve highly complex operations at both the per-packet and per-flow data level, often in real time. The parallel architecture of GPUs fits exceptionally well for per-packet network traffic processing. However, for stateful network protocols such as TCP, the data streams need to be reconstructed at the per-flow level to deliver a consistent content analysis. Since the flow-centric operations are naturally antiparallel and often require large memory space for buffering out-of-sequence packets, they can be problematic for GPUs, whose memory is normally limited to several gigabytes. In this work, we present a highly efficient GPU-based deep packet/flow analysis framework. The proposed design includes a purely GPU-implemented flow tracking and TCP stream reassembly. Instead of buffering and waiting for TCP packets to become in sequence, our framework processes the packets in batches and uses a deterministic finite automaton (DFA) with a prefix-/suffix-tree method to detect patterns across out-of-sequence packets that happen to be located in different batches. In conclusion, evaluation shows that our code can reassemble and forward tens of millions of packets per second and conduct a stateful signature-based deep packet inspection at 55 Gbit/s using an NVIDIA K40 GPU.
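
    The idea of detecting a signature that straddles batch boundaries can be sketched in a few lines of Python. This is a deliberately simplified, CPU-only illustration and not the authors' framework: it assumes batches arrive in order, matches a single pattern with a KMP-style automaton, and omits the prefix-/suffix-tree handling of out-of-sequence packets. All function names are hypothetical.

```python
def build_failure(pattern: bytes) -> list:
    """KMP failure table: longest proper prefix of pattern[:i+1] that is also a suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def scan_batch(pattern: bytes, fail: list, state: int, batch: bytes, offset: int):
    """Advance the automaton over one batch, starting from a saved state.

    Returns the state to carry into the next batch and the absolute start
    offsets of every match that ends inside this batch.
    """
    matches = []
    for pos, byte in enumerate(batch):
        # Fall back after a full match or on a mismatch.
        while state and (state == len(pattern) or byte != pattern[state]):
            state = fail[state - 1]
        if byte == pattern[state]:
            state += 1
        if state == len(pattern):
            matches.append(offset + pos - len(pattern) + 1)
    return state, matches


def scan_stream(pattern: bytes, batches) -> list:
    """Scan in-order batches without buffering, carrying only the automaton state."""
    fail = build_failure(pattern)
    state, offset, found = 0, 0, []
    for batch in batches:
        state, matches = scan_batch(pattern, fail, state, batch, offset)
        found.extend(matches)
        offset += len(batch)
    return found


# The signature "attack" is split across the first three batches.
print(scan_stream(b"attack", [b"xxat", b"tac", b"kyyattack"]))
```

    Only one integer of state per flow crosses a batch boundary here, which is the property that makes batch-wise scanning possible without buffering; handling batches that arrive out of order needs the additional machinery the abstract describes.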

  16. Deep-water oilfield development cost analysis and forecasting —— Take gulf of mexico for example

    NASA Astrophysics Data System (ADS)

    Shi, Mingyu; Wang, Jianjun; Yi, Chenggao; Bai, Jianhui; Wang, Jing

    2017-11-01

    The Gulf of Mexico (GoM) is the earliest offshore oilfield region ever developed. It increasingly demonstrates the value of efficient, safe and low-cost key technologies for deep-water development. Thus, analysis of development expenditure in this area is significantly important to the evaluation of deep-water oilfield concepts all over the world. This article focuses on deep-water development concepts and EPC contract values in the GoM over the past 10 years, comparing and selecting them by economic efficiency. In addition, QUETOR, which draws on the largest upstream cost database, was used in this research to simulate and calculate the expenditure of the worked examples. By analyzing and forecasting deep-water oilfield development expenditure, this article explores the relationship between the expenditure index and oil price.

  17. Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform

    PubMed Central

    Van Nostrand, Joy D.; Ning, Daliang; Sun, Bo; Xue, Kai; Liu, Feifei; Deng, Ye; Liang, Yuting; Zhou, Jizhong

    2017-01-01

    Illumina’s MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1–3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined. 
These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility. PMID:28453559
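
The overlap and β-diversity figures quoted in this abstract can be illustrated with a small sketch. The definitions below are assumptions, since the abstract does not give formulas: unweighted overlap is taken as shared OTUs divided by all OTUs seen in either replicate, abundance-weighted overlap as the fraction of reads that fall in shared OTUs, and singletons as OTUs with a single read within one replicate. Bray-Curtis follows its standard definition. The function names and the toy counts are hypothetical.

```python
# Illustrative sketch only: the overlap definitions are assumptions,
# not formulas taken from the study.

def otu_overlap(a, b):
    """Unweighted overlap: shared OTUs / OTUs present in either replicate."""
    otus_a = {otu for otu, n in a.items() if n > 0}
    otus_b = {otu for otu, n in b.items() if n > 0}
    union = otus_a | otus_b
    return len(otus_a & otus_b) / len(union) if union else 0.0

def weighted_overlap(a, b):
    """Abundance-weighted overlap: share of all reads that sit in shared OTUs."""
    shared = {otu for otu, n in a.items() if n > 0 and b.get(otu, 0) > 0}
    total = sum(a.values()) + sum(b.values())
    shared_reads = sum(a[otu] + b[otu] for otu in shared)
    return shared_reads / total if total else 0.0

def bray_curtis(a, b):
    """Standard Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    otus = set(a) | set(b)
    num = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    den = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return num / den if den else 0.0

def drop_singletons(a):
    """Remove OTUs represented by a single read (assumed per-replicate rule)."""
    return {otu: n for otu, n in a.items() if n > 1}

# Hypothetical read counts for two technical replicates of one sample.
rep1 = {"OTU1": 50, "OTU2": 30, "OTU3": 1}
rep2 = {"OTU1": 45, "OTU2": 35, "OTU4": 1}

print(otu_overlap(rep1, rep2))        # 0.5
print(weighted_overlap(rep1, rep2))   # 160/162, about 0.988
print(bray_curtis(rep1, rep2))        # 12/162, about 0.074
print(otu_overlap(drop_singletons(rep1), drop_singletons(rep2)))  # 1.0
```

In this toy case the two replicates share only half of their OTUs but almost all of their reads. That is the same qualitative pattern the abstract reports: low unweighted overlap, high abundance-weighted overlap, and higher overlap once singletons are removed.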

  18. Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wen, Chongqing; Wu, Liyou; Qin, Yujia

    Illumina's MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1-3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined.
These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility.

  19. Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform

    DOE PAGES

    Wen, Chongqing; Wu, Liyou; Qin, Yujia; ...

    2017-04-28

    Illumina's MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1-3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined.
These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility.

  20. Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform.

    PubMed

    Wen, Chongqing; Wu, Liyou; Qin, Yujia; Van Nostrand, Joy D; Ning, Daliang; Sun, Bo; Xue, Kai; Liu, Feifei; Deng, Ye; Liang, Yuting; Zhou, Jizhong

    2017-01-01

    Illumina's MiSeq has become the dominant platform for gene amplicon sequencing in microbial ecology studies; however, various technical concerns, such as reproducibility, still exist. To assess reproducibility, 16S rRNA gene amplicons from 18 soil samples of a reciprocal transplantation experiment were sequenced on an Illumina MiSeq. The V4 region of 16S rRNA gene from each sample was sequenced in triplicate with each replicate having a unique barcode. The average OTU overlap, without considering sequence abundance, at a rarefaction level of 10,323 sequences was 33.4±2.1% and 20.2±1.7% between two and among three technical replicates, respectively. When OTU sequence abundance was considered, the average sequence abundance weighted OTU overlap was 85.6±1.6% and 81.2±2.1% for two and three replicates, respectively. Removing singletons significantly increased the overlap for both (~1-3%, p<0.001). Increasing the sequencing depth to 160,000 reads by deep sequencing increased OTU overlap both when sequence abundance was considered (95%) and when not (44%). However, if singletons were not removed the overlap between two technical replicates (not considering sequence abundance) plateaus at 39% with 30,000 sequences. Diversity measures were not affected by the low overlap as α-diversities were similar among technical replicates while β-diversities (Bray-Curtis) were much smaller among technical replicates than among treatment replicates (e.g., 0.269 vs. 0.374). Higher diversity coverage, but lower OTU overlap, was observed when replicates were sequenced in separate runs. Detrended correspondence analysis indicated that while there was considerable variation among technical replicates, the reproducibility was sufficient for detecting treatment effects for the samples examined. 
These results suggest that although there is variation among technical replicates, amplicon sequencing on MiSeq is useful for analyzing microbial community structure if used appropriately and with caution. For example, including technical replicates, removing spurious sequences and unrepresentative OTUs, using a clustering method with a high stringency for OTU generation, estimating treatment effects at higher taxonomic levels, and adapting the unique molecular identifier (UMI) and other newly developed methods to lower PCR and sequencing error and to identify true low abundance rare species all can increase reproducibility.

  1. Interpreting the strongest deep earthquake ever observed

    NASA Astrophysics Data System (ADS)

    Schultz, Colin

    2013-12-01

    Massive earthquakes that strike deep within the Earth may be more efficient at dissipating pent-up energy than similar quakes near the surface, according to new research by Wei et al. The authors analyzed the rupture of the most powerful deep earthquake ever recorded.

  2. De Novo Assembly of the Donkey White Blood Cell Transcriptome and a Comparative Analysis of Phenotype-Associated Genes between Donkeys and Horses

    PubMed Central

    Xie, Feng-Yun; Feng, Yu-Long; Wang, Hong-Hui; Ma, Yun-Feng; Yang, Yang; Wang, Yin-Chao; Shen, Wei; Pan, Qing-Jie; Yin, Shen; Sun, Yu-Jiang; Ma, Jun-Yu

    2015-01-01

    Prior to the mechanization of agriculture and labor-intensive tasks, humans used donkeys (Equus africanus asinus) for farm work and packing. However, as mechanization increased, donkeys have been increasingly raised for meat, milk, and fur in China. To maintain the development of the donkey industry, breeding programs should focus on traits related to these new uses. Compared to conventional marker-assisted breeding plans, genome- and transcriptome-based selection methods are more efficient and effective. To analyze the coding genes of the donkey genome, we assembled the transcriptome of donkey white blood cells de novo. Using transcriptomic deep-sequencing data, we identified 264,714 distinct donkey unigenes and predicted 38,949 protein fragments. We annotated the donkey unigenes by BLAST searches against the non-redundant (NR) protein database. We also compared the donkey protein sequences with those of the horse (E. caballus) and wild horse (E. przewalskii), and linked the donkey protein fragments with mammalian phenotypes. Because the outer ear sizes of donkeys and horses are obviously different, we compared the outer ear size-associated proteins in donkeys and horses. We identified three ear size-associated proteins, HIC1, PRKRA, and KMT2A, with sequence differences among the donkey, horse, and wild horse loci. Since the donkey genome sequence has not been released, the de novo assembled donkey transcriptome is helpful for preliminary investigations of donkey cultivars and for genetic improvement. PMID:26208029

  3. De Novo Assembly of the Donkey White Blood Cell Transcriptome and a Comparative Analysis of Phenotype-Associated Genes between Donkeys and Horses.

    PubMed

    Xie, Feng-Yun; Feng, Yu-Long; Wang, Hong-Hui; Ma, Yun-Feng; Yang, Yang; Wang, Yin-Chao; Shen, Wei; Pan, Qing-Jie; Yin, Shen; Sun, Yu-Jiang; Ma, Jun-Yu

    2015-01-01

    Prior to the mechanization of agriculture and labor-intensive tasks, humans used donkeys (Equus africanus asinus) for farm work and packing. However, as mechanization increased, donkeys have been increasingly raised for meat, milk, and fur in China. To maintain the development of the donkey industry, breeding programs should focus on traits related to these new uses. Compared to conventional marker-assisted breeding plans, genome- and transcriptome-based selection methods are more efficient and effective. To analyze the coding genes of the donkey genome, we assembled the transcriptome of donkey white blood cells de novo. Using transcriptomic deep-sequencing data, we identified 264,714 distinct donkey unigenes and predicted 38,949 protein fragments. We annotated the donkey unigenes by BLAST searches against the non-redundant (NR) protein database. We also compared the donkey protein sequences with those of the horse (E. caballus) and wild horse (E. przewalskii), and linked the donkey protein fragments with mammalian phenotypes. Because the outer ear sizes of donkeys and horses are obviously different, we compared the outer ear size-associated proteins in donkeys and horses. We identified three ear size-associated proteins, HIC1, PRKRA, and KMT2A, with sequence differences among the donkey, horse, and wild horse loci. Since the donkey genome sequence has not been released, the de novo assembled donkey transcriptome is helpful for preliminary investigations of donkey cultivars and for genetic improvement.

  4. Microbial diversity in methane hydrate-bearing deep marine sediments core preserved in the original pressure.

    NASA Astrophysics Data System (ADS)

    Takahashi, Y.; Hata, T.; Nishida, H.

    2017-12-01

    In conventional coring of deep marine sediments, the sampled cores are exposed to atmospheric pressure, which causes gas hydrates to dissociate and might change microbial diversity. In this study, we analyzed the microbial composition of a methane hydrate-bearing sediment core sampled and preserved with the Hybrid-PCS (Pressure Coring System). We sliced the core into three layers: (i) the outside layer, which was most affected by drilling fluids; (ii) the middle layer; and (iii) the inner layer, which was expected to be closest to the original state. From each layer, we directly extracted DNA and amplified the V3-V4 region of the 16S rRNA gene. We determined at least 5,000 partial 16S rDNA nucleotide sequences from each layer on a MiSeq (Illumina). In all layers, facultative anaerobes, which can grow with or without oxygen because they can obtain energy aerobically or anaerobically, were detected as the majority. However, genera that are often detected in anaerobic environments were more abundant in the inner layer than in the outside layer, indicating that drilling and preservation conditions affect the microbial composition of the deep marine sediment core. This study was conducted as part of the activity of the Research Consortium for Methane Hydrate Resources in Japan [MH21 consortium] and supported by JOGMEC (Japan Oil, Gas and Metals National Corporation). The sample was provided by AIST (National Institute of Advanced Industrial Science and Technology).

  5. Misconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants.

    PubMed

    Eaton, Deren A R; Spriggs, Elizabeth L; Park, Brian; Donoghue, Michael J

    2017-05-01

    Restriction-site associated DNA (RAD) sequencing and related methods rely on the conservation of enzyme recognition sites to isolate homologous DNA fragments for sequencing, with the consequence that mutations disrupting these sites lead to missing information. There is thus a clear expectation for how missing data should be distributed, with fewer loci recovered between more distantly related samples. This observation has led to a related expectation: that RAD-seq data are insufficiently informative for resolving deeper scale phylogenetic relationships. Here we investigate the relationship between missing information among samples at the tips of a tree and information at edges within it. We re-analyze and review the distribution of missing data across ten RAD-seq data sets and carry out simulations to determine expected patterns of missing information. We also present new empirical results for the angiosperm clade Viburnum (Adoxaceae, with a crown age >50 Ma) for which we examine phylogenetic information at different depths in the tree and with varied sequencing effort. The total number of loci, the proportion that are shared, and phylogenetic informativeness varied dramatically across the examined RAD-seq data sets. Insufficient or uneven sequencing coverage accounted for similar proportions of missing data as dropout from mutation-disruption. Simulations reveal that mutation-disruption, which results in phylogenetically distributed missing data, can be distinguished from the more stochastic patterns of missing data caused by low sequencing coverage. In Viburnum, doubling sequencing coverage nearly doubled the number of parsimony informative sites, and increased by >10X the number of loci with data shared across >40 taxa. Our analysis leads to a set of practical recommendations for maximizing phylogenetic information in RAD-seq studies. 
[hierarchical redundancy; phylogenetic informativeness; quartet informativeness; Restriction-site associated DNA (RAD) sequencing; sequencing coverage; Viburnum.] © The authors 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  6. Genomic and metagenomic technologies to explore the antibiotic resistance mobilome.

    PubMed

    Martínez, José L; Coque, Teresa M; Lanza, Val F; de la Cruz, Fernando; Baquero, Fernando

    2017-01-01

    Antibiotic resistance is a relevant problem for human health that requires global approaches to establish a deep understanding of the processes of acquisition, stabilization, and spread of resistance among human bacterial pathogens. Since natural (nonclinical) ecosystems are reservoirs of resistance genes, a health-integrated study of the epidemiology of antibiotic resistance requires the exploration of such ecosystems with the aim of determining the role they may play in the selection, evolution, and spread of antibiotic resistance genes, involving the so-called resistance mobilome. High-throughput sequencing techniques allow an unprecedented opportunity to describe the genetic composition of a given microbiome without the need to subculture the organisms present inside. However, bioinformatic methods for analyzing this bulk of data, mainly with respect to binning each resistance gene with the organism hosting it, are still in their infancy. Here, we discuss how current genomic methodologies can serve to analyze the resistance mobilome and its linkage with different bacterial genomes and metagenomes. In addition, we describe the drawbacks of current methodologies for analyzing the resistance mobilome, mainly in cases of complex microbiotas, and discuss the possibility of implementing novel tools to improve our current metagenomic toolbox. © 2016 New York Academy of Sciences.

  7. Oral Microbiome of Deep and Shallow Dental Pockets In Chronic Periodontitis

    PubMed Central

    Ge, Xiuchun; Rodriguez, Rafael; Trinh, My; Gunsolley, John; Xu, Ping

    2013-01-01

    We examined the subgingival bacterial biodiversity in untreated chronic periodontitis patients by sequencing 16S rRNA genes. The primary purpose of the study was to compare the oral microbiome in deep (diseased) and shallow (healthy) sites. A secondary purpose was to evaluate the influences of smoking, race and dental caries on this relationship. A total of 88 subjects from two clinics were recruited. Paired subgingival plaque samples were taken from each subject, one from a site with probing depth >5 mm (deep site) and the other from a site with probing depth ≤3 mm (shallow site). A universal primer set was designed to amplify the V4–V6 region for oral microbial 16S rRNA sequences. Differences in genera and species attributable to deep and shallow sites were determined by statistical analysis using a two-part model and false discovery rate. Fifty-one of 170 genera and 200 of 746 species differed significantly in abundance between shallow and deep sites. Besides previously identified periodontal disease-associated bacterial species, additional species were found markedly changed in diseased sites. Cluster analysis revealed that the microbiome difference between deep and shallow sites was influenced by patient-level effects such as clinic location, race and smoking. The differences between clinic locations may be influenced by racial distribution, in that all of the African American subjects were seen at the same clinic. Our results suggested that the microbiome influences both caries and periodontal disease and that these influences are independent. PMID:23762384
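
The false discovery rate control mentioned in this abstract can be sketched as follows. The abstract does not say which FDR procedure was used, so the Benjamini-Hochberg step-up adjustment here is an assumption, and the function name and p-values are hypothetical. The two-part model itself is not reproduced.

```python
# Illustrative sketch only: Benjamini-Hochberg is assumed, the abstract
# does not name the FDR procedure that was applied.

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-genus p-values from deep-vs-shallow site comparisons.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
qvals = benjamini_hochberg(pvals)
significant = [q < 0.05 for q in qvals]
print(qvals)
print(significant)
```

With these made-up inputs, only the two smallest p-values stay below 0.05 after adjustment, although four of the five raw p-values are below 0.05. Screening hundreds of genera and species at once calls for this kind of correction.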

  8. Theonellapeptolide IIIe, a new cyclic peptolide from the New Zealand deep water sponge, Lamellomorpha strongylata.

    PubMed

    Li, S; Dumdei, E J; Blunt, J W; Munro, M H; Robinson, W T; Pannell, L K

    1998-06-26

    The structure, stereochemistry, and conformation of theonellapeptolide IIIe (1), a new 36-membered ring cyclic peptolide from the New Zealand deep-water sponge Lamellomorpha strongylata, is described. The sequence of the cytotoxic peptolide was determined through a combination of NMR and MS-MS techniques and confirmed by X-ray crystal structure analysis, which, with chiral HPLC, established the absolute stereochemistry.

  9. Fusarium musae as cause of superficial and deep-seated human infections.

    PubMed

    Esposto, M C; Prigitano, A; Tortorano, A M

    2016-12-01

    BLAST analysis in GenBank of 60 Fusarium verticillioides clinical isolates using the sequence of translation elongation factor 1-alpha allowed the identification of four F. musae confirming that this species is not a rare etiology of superficial and deep infections and that its habitat is not restricted to banana fruits. Copyright © 2016 Elsevier Masson SAS. All rights reserved.

  10. Genome Sequence of Aeribacillus pallidus Strain GS3372, an Endospore-Forming Bacterium Isolated in a Deep Geothermal Reservoir.

    PubMed

    Filippidou, Sevasti; Jaussi, Marion; Junier, Thomas; Wunderlin, Tina; Jeanneret, Nicole; Regenspurg, Simona; Li, Po-E; Lo, Chien-Chi; Johnson, Shannon; McMurry, Kim; Gleasner, Cheryl D; Vuyisich, Momchilo; Chain, Patrick S; Junier, Pilar

    2015-08-27

    The genome of strain GS3372 is the first publicly available genome of Aeribacillus pallidus. This endospore-forming thermophilic strain was isolated from a deep geothermal reservoir. The availability of this genome can contribute to the clarification of the taxonomy of the closely related Anoxybacillus, Geobacillus, and Aeribacillus genera. Copyright © 2015 Filippidou et al.

  11. Sequence Capture versus Restriction Site Associated DNA Sequencing for Shallow Systematics.

    PubMed

    Harvey, Michael G; Smith, Brian Tilston; Glenn, Travis C; Faircloth, Brant C; Brumfield, Robb T

    2016-09-01

    Sequence capture and restriction site associated DNA sequencing (RAD-Seq) are two genomic enrichment strategies for applying next-generation sequencing technologies to systematics studies. At shallow timescales, such as within species, RAD-Seq has been widely adopted among researchers, although there has been little discussion of the potential limitations and benefits of RAD-Seq and sequence capture. We discuss a series of issues that may impact the utility of sequence capture and RAD-Seq data for shallow systematics in non-model species. We review prior studies that used both methods, and investigate differences between the methods by re-analyzing existing RAD-Seq and sequence capture data sets from a Neotropical bird (Xenops minutus). We suggest that the strengths of RAD-Seq data sets for shallow systematics are the wide dispersion of markers across the genome, the relative ease and cost of laboratory work, the deep coverage and read overlap at recovered loci, and the high overall information that results. Sequence capture's benefits include flexibility and repeatability in the genomic regions targeted, success using low-quality samples, more straightforward read orthology assessment, and higher per-locus information content. The utility of a method in systematics, however, rests not only on its performance within a study, but on the comparability of data sets and inferences with those of prior work. In RAD-Seq data sets, comparability is compromised by low overlap of orthologous markers across species and the sensitivity of genetic diversity in a data set to an interaction between the level of natural heterozygosity in the samples examined and the parameters used for orthology assessment. 
In contrast, sequence capture of conserved genomic regions permits interrogation of the same loci across divergent species, which is preferable for maintaining comparability among data sets and studies for the purpose of drawing general conclusions about the impact of historical processes across biotas. We argue that sequence capture should be given greater attention as a method of obtaining data for studies in shallow systematics and comparative phylogeography. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  12. The first complete mitogenome of the South China deep-sea giant isopod Bathynomus sp. (Crustacea: Isopoda: Cirolanidae) allows insights into the early mitogenomic evolution of isopods.

    PubMed

    Shen, Yanjun; Kou, Qi; Zhong, Zaixuan; Li, Xinzheng; He, Lisheng; He, Shunping; Gan, Xiaoni

    2017-03-01

    In this study, the complete mitochondrial (mt) genome sequence of the South China deep-sea giant isopod Bathynomus sp. was determined, and this study is the first to explore in detail the mt genome of a deep-sea member of the order Isopoda. This species belongs to the genus Bathynomus, the members of which are saprophagous residents of the deep-sea benthic environment; based on their large size, Bathynomus is included in the "supergiant group" of isopods. The mt genome of Bathynomus sp. is 14,965 bp in length and consists of 13 protein-coding genes, two ribosomal RNA genes, only 18 transfer RNA genes, and a noncoding control region 362 bp in length, which is the smallest control region discovered in Isopoda to date. Although the overall genome organization is typical for metazoans, the mt genome of Bathynomus sp. shows a number of derived characters, such as an inversion of 10 genes when compared to the pancrustacean ground pattern. Rearrangements in some genes (e.g., cob, trnT, nad5, and trnF) are shared by nearly all isopod mt genomes analyzed thus far, and when compared to the putative isopod ground pattern, five rearrangements were found in Bathynomus sp. Two tRNAs exhibit modified secondary structures: The TΨC arm is absent from trnQ, and trnC lacks the DHU. Within the class Malacostraca, trnC arm loss is only found in other isopods. Phylogenetic analysis revealed that Bathynomus sp. (Cymothoida) and Sphaeroma serratum (Sphaeromatidea) form a single clade, although it is unclear whether Cymothoida is monophyletic or paraphyletic. Moreover, the evolutionary rate of Bathynomus sp. (dN/dS [nonsynonymous mutational rate/synonymous mutational rate] = 0.0705) is the slowest measured to date among Cymothoida, which may be associated with its relatively constant deep-sea environment. Overall, our results may provide useful information for understanding the evolution of deep-sea Isopoda species.

  13. Deep Sequencing-Identified Kanamycin-Resistant Paenibacillus sp. Strain KS1 Isolated from Epiphyte Tillandsia usneoides (Spanish Moss) in Central Florida, USA.

    PubMed

    Lata, Pushpa; Govindarajan, Subramaniam S; Qi, Feng; Li, Jian-Liang; Sahoo, Malaya K

    2017-02-02

    Paenibacillus sp. strain KS1 was isolated from an epiphyte, Tillandsia usneoides (Spanish moss), in central Florida, USA. Here, we report a draft genome sequence of this strain, which consists of a total of 398 contigs spanning 6,508,195 bp, with a G+C content of 46.5% and comprising 5,401 predicted coding sequences. Copyright © 2017 Lata et al.

  14. The complete genome sequence of a new polerovirus in strawberry plants from eastern Canada showing strawberry decline symptoms.

    PubMed

    Xiang, Yu; Bernardy, Mike; Bhagwat, Basdeo; Wiersma, Paul A; DeYoung, Robyn; Bouthillier, Michel

    2015-02-01

    Strawberry decline disease, probably caused by synergistic reactions of mixed virus infections, threatens the North American strawberry industry. Deep sequencing of strawberry plant samples from eastern Canada resulted in the identification of a new virus genome resembling poleroviruses in sequence and genome structure. Phylogenetic analysis suggests that it is a new member of the genus Polerovirus, family Luteoviridae. The virus is tentatively named "strawberry polerovirus 1" (SPV1).

  15. Identification of Free-Living and Particle-Associated Microbial Communities Present in Hadal Regions of the Mariana Trench.

    PubMed

    Tarn, Jonathan; Peoples, Logan M; Hardy, Kevin; Cameron, James; Bartlett, Douglas H

    2016-01-01

    Relatively few studies have described the microbial populations present in ultra-deep hadal environments, largely as a result of difficulties associated with sampling. Here we report Illumina-tag V6 16S rRNA sequence-based analyses of the free-living and particle-associated microbial communities recovered from locations within two of the deepest hadal sites on Earth, the Challenger Deep (10,918 meters below surface-mbs) and the Sirena Deep (10,667 mbs) within the Mariana Trench, as well as one control site (Ulithi Atoll, 761 mbs). Seawater samples were collected using an autonomous lander positioned ~1 m above the seafloor. The bacterial populations within the Mariana Trench bottom water samples were dissimilar to other deep-sea microbial communities, though with overlap with those of diffuse flow hydrothermal vents and deep-subsurface locations. Distinct particle-associated and free-living bacterial communities were found to exist. The hadal bacterial populations were also markedly different from one another, indicating the likelihood of different chemical conditions at the two sites. In contrast to the bacteria, the hadal archaeal communities were more similar to other less deep datasets and to each other due to an abundance of cosmopolitan deep-sea taxa. The hadal communities were enriched in 34 bacterial and 4 archaeal operational taxonomic units (OTUs) including members of the Gammaproteobacteria, Epsilonproteobacteria, Marinimicrobia, Cyanobacteria, Deltaproteobacteria, Gemmatimonadetes, Atribacteria, Spirochaetes, and Euryarchaeota. Sequences matching cultivated piezophiles were notably enriched in the Challenger Deep, especially within the particle-associated fraction, and were found in higher abundances than in other hadal studies, where they were either far less prevalent or missing. 
Our results indicate the importance of heterotrophy, sulfur-cycling, and methane and hydrogen utilization within the bottom waters of the deeper regions of the Mariana Trench, and highlight novel community features of these extreme habitats.

  16. Identification of Free-Living and Particle-Associated Microbial Communities Present in Hadal Regions of the Mariana Trench

    PubMed Central

    Tarn, Jonathan; Peoples, Logan M.; Hardy, Kevin; Cameron, James; Bartlett, Douglas H.

    2016-01-01

    Relatively few studies have described the microbial populations present in ultra-deep hadal environments, largely as a result of difficulties associated with sampling. Here we report Illumina-tag V6 16S rRNA sequence-based analyses of the free-living and particle-associated microbial communities recovered from locations within two of the deepest hadal sites on Earth, the Challenger Deep (10,918 meters below surface-mbs) and the Sirena Deep (10,667 mbs) within the Mariana Trench, as well as one control site (Ulithi Atoll, 761 mbs). Seawater samples were collected using an autonomous lander positioned ~1 m above the seafloor. The bacterial populations within the Mariana Trench bottom water samples were dissimilar to other deep-sea microbial communities, though with overlap with those of diffuse flow hydrothermal vents and deep-subsurface locations. Distinct particle-associated and free-living bacterial communities were found to exist. The hadal bacterial populations were also markedly different from one another, indicating the likelihood of different chemical conditions at the two sites. In contrast to the bacteria, the hadal archaeal communities were more similar to other less deep datasets and to each other due to an abundance of cosmopolitan deep-sea taxa. The hadal communities were enriched in 34 bacterial and 4 archaeal operational taxonomic units (OTUs) including members of the Gammaproteobacteria, Epsilonproteobacteria, Marinimicrobia, Cyanobacteria, Deltaproteobacteria, Gemmatimonadetes, Atribacteria, Spirochaetes, and Euryarchaeota. Sequences matching cultivated piezophiles were notably enriched in the Challenger Deep, especially within the particle-associated fraction, and were found in higher abundances than in other hadal studies, where they were either far less prevalent or missing. 
    Our results indicate the importance of heterotrophy, sulfur-cycling, and methane and hydrogen utilization within the bottom waters of the deeper regions of the Mariana Trench, and highlight novel community features of these extreme habitats. PMID:27242695

  17. The green ash transcriptome and identification of genes responding to abiotic and biotic stresses

    Treesearch

    Thomas Lane; Teodora Best; Nicole Zembower; Jack Davitt; Nathan Henry; Yi Xu; Jennifer Koch; Haiying Liang; John McGraw; Stephan Schuster; Donghwan Shim; Mark V. Coggeshall; John E. Carlson; Margaret E. Staton

    2016-01-01

    Background: To develop a set of transcriptome sequences to support research on environmental stress responses in green ash (Fraxinus pennsylvanica), we undertook deep RNA sequencing of green ash tissues under various stress treatments. The treatments, including emerald ash borer (EAB) feeding, heat, drought, cold and ozone, were selected to mimic...

  18. Near-Complete Genome Sequence of Thalassospira sp. Strain KO164 Isolated from a Lignin-Enriched Marine Sediment Microcosm

    PubMed Central

    Woo, Hannah L.; O’Dell, Kaela B.; Utturkar, Sagar; McBride, Kathryn R.; Huntemann, Marcel; Clum, Alicia; Pillay, Manoj; Palaniappan, Krishnaveni; Varghese, Neha; Mikhailova, Natalia; Stamatis, Dimitrios; Reddy, T. B. K.; Ngan, Chew Yee; Daum, Chris; Shapiro, Nicole; Markowitz, Victor; Ivanova, Natalia; Kyrpides, Nikos; Woyke, Tanja; Brown, Steven D.

    2016-01-01

    Thalassospira sp. strain KO164 was isolated from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. The near-complete genome sequence presented here will facilitate analyses into this deep-ocean bacterium’s ability to degrade recalcitrant organics such as lignin. PMID:27881538

  19. Near-Complete Genome Sequence of Thalassospira sp. Strain KO164 Isolated from a Lignin-Enriched Marine Sediment Microcosm

    DOE PAGES

    Woo, Hannah L.; O’Dell, Kaela B.; Utturkar, Sagar; ...

    2016-11-23

    We isolated Thalassospira sp. strain KO164 from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. The near-complete genome sequence presented here will facilitate analysis of this deep-ocean bacterium’s ability to degrade recalcitrant organics such as lignin.

  20. Near-Complete Genome Sequence of Thalassospira sp. Strain KO164 Isolated from a Lignin-Enriched Marine Sediment Microcosm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Woo, Hannah L.; O’Dell, Kaela B.; Utturkar, Sagar

    We isolated Thalassospira sp. strain KO164 from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. The near-complete genome sequence presented here will facilitate analysis of this deep-ocean bacterium’s ability to degrade recalcitrant organics such as lignin.

  1. Draft Genome Sequence of the Spore-Forming Probiotic Strain Bacillus coagulans Unique IS-2

    PubMed Central

    Upadrasta, Aditya; Pitta, Swetha

    2016-01-01

    Bacillus coagulans Unique IS-2 is a potential spore-forming probiotic that is commercially available on the market. The draft genome sequence presented here provides deep insight into the beneficial features of this strain for its safe use as a probiotic for various human and animal health applications. PMID:27103709

  2. Deep Sequencing Reveals the Complete Genome Sequence of Sweet potato virus G from East Timor

    PubMed Central

    Maina, Solomon; Edwards, Owain R.; Barbetti, Martin J.; de Almeida, Luis; Ximenes, Abel

    2016-01-01

    We present the first complete Sweet potato virus G (SPVG) genome from sweet potato in East Timor and compare it with seven complete SPVG genomes from South Korea (three), Taiwan (two), Argentina (one), and the United States (one). It most resembles the genomes from the United States and South Korea. PMID:27609925

  3. Deep sequencing of cardiac microRNA-mRNA interactomes in clinical and experimental cardiomyopathy

    PubMed Central

    Matkovich, Scot J.; Dorn, Gerald W.

    2018-01-01

    MicroRNAs are a family of short (~21 nucleotide) noncoding RNAs that serve key roles in cellular growth and differentiation and the response of the heart to stress stimuli. As the sequence-specific recognition element of RNA-induced silencing complexes (RISCs), microRNAs bind mRNAs and prevent their translation via mechanisms that may include transcript degradation and/or prevention of ribosome binding. Short microRNA sequences and the ability of microRNAs to bind to mRNA sites having only partial/imperfect sequence complementarity complicate purely computational analyses of microRNA-mRNA interactomes. Furthermore, computational microRNA target prediction programs typically ignore biological context, and therefore the principal determinants of microRNA-mRNA binding: the presence and quantity of each. To address these deficiencies we describe an empirical method, developed via studies of stressed and failing hearts, to determine disease-induced changes in microRNAs, mRNAs, and the mRNAs targeted to the RISC, without cross-linking mRNAs to RISC proteins. Deep sequencing methods are used to determine RNA abundances, delivering unbiased, quantitative RNA data limited only by their annotation in the genome of interest. We describe the laboratory bench steps required to perform these experiments, experimental design strategies to achieve an appropriate number of sequencing reads per biological replicate, and computer-based processing tools and procedures to convert large raw sequencing data files into gene expression measures useful for differential expression analyses. PMID:25836573

  4. Deep sequencing of cardiac microRNA-mRNA interactomes in clinical and experimental cardiomyopathy.

    PubMed

    Matkovich, Scot J; Dorn, Gerald W

    2015-01-01

    MicroRNAs are a family of short (~21 nucleotide) noncoding RNAs that serve key roles in cellular growth and differentiation and the response of the heart to stress stimuli. As the sequence-specific recognition element of RNA-induced silencing complexes (RISCs), microRNAs bind mRNAs and prevent their translation via mechanisms that may include transcript degradation and/or prevention of ribosome binding. Short microRNA sequences and the ability of microRNAs to bind to mRNA sites having only partial/imperfect sequence complementarity complicate purely computational analyses of microRNA-mRNA interactomes. Furthermore, computational microRNA target prediction programs typically ignore biological context, and therefore the principal determinants of microRNA-mRNA binding: the presence and quantity of each. To address these deficiencies we describe an empirical method, developed via studies of stressed and failing hearts, to determine disease-induced changes in microRNAs, mRNAs, and the mRNAs targeted to the RISC, without cross-linking mRNAs to RISC proteins. Deep sequencing methods are used to determine RNA abundances, delivering unbiased, quantitative RNA data limited only by their annotation in the genome of interest. We describe the laboratory bench steps required to perform these experiments, experimental design strategies to achieve an appropriate number of sequencing reads per biological replicate, and computer-based processing tools and procedures to convert large raw sequencing data files into gene expression measures useful for differential expression analyses.

  5. Toward an Understanding of Changes in Diversity Associated with Fecal Microbiome Transplantation Based on 16S rRNA Gene Deep Sequencing

    PubMed Central

    Shahinas, Dea; Silverman, Michael; Sittler, Taylor; Chiu, Charles; Kim, Peter; Allen-Vercoe, Emma; Weese, Scott; Wong, Andrew; Low, Donald E.; Pillai, Dylan R.

    2012-01-01

    Fecal microbiome transplantation by low-volume enema is an effective, safe, and inexpensive alternative to antibiotic therapy for patients with chronic relapsing Clostridium difficile infection (CDI). We explored the microbial diversity of pre- and posttransplant stool specimens from CDI patients (n = 6) using deep sequencing of the 16S rRNA gene. While interindividual variability in microbiota change occurs with fecal transplantation and vancomycin exposure, in this pilot study we note that clinical cure of CDI is associated with an increase in diversity and richness. Genus- and species-level analysis may reveal a cocktail of microorganisms or products thereof that will ultimately be used as a probiotic to treat CDI. PMID:23093385

  6. Metagenome, metatranscriptome and single-cell sequencing reveal microbial response to Deepwater Horizon oil spill.

    PubMed

    Mason, Olivia U; Hazen, Terry C; Borglin, Sharon; Chain, Patrick S G; Dubinsky, Eric A; Fortney, Julian L; Han, James; Holman, Hoi-Ying N; Hultman, Jenni; Lamendella, Regina; Mackelprang, Rachel; Malfatti, Stephanie; Tom, Lauren M; Tringe, Susannah G; Woyke, Tanja; Zhou, Jizhong; Rubin, Edward M; Jansson, Janet K

    2012-09-01

    The Deepwater Horizon oil spill in the Gulf of Mexico resulted in a deep-sea hydrocarbon plume that caused a shift in the indigenous microbial community composition with unknown ecological consequences. Early in the spill history, a bloom of uncultured, thus uncharacterized, members of the Oceanospirillales was previously detected, but their role in oil disposition was unknown. Here our aim was to determine the functional role of the Oceanospirillales and other active members of the indigenous microbial community using deep sequencing of community DNA and RNA, as well as single-cell genomics. Shotgun metagenomic and metatranscriptomic sequencing revealed that genes for motility, chemotaxis and aliphatic hydrocarbon degradation were significantly enriched and expressed in the hydrocarbon plume samples compared with uncontaminated seawater collected from plume depth. In contrast, although genes coding for degradation of more recalcitrant compounds, such as benzene, toluene, ethylbenzene, total xylenes and polycyclic aromatic hydrocarbons, were identified in the metagenomes, they were expressed at low levels, or not at all based on analysis of the metatranscriptomes. Isolation and sequencing of two Oceanospirillales single cells revealed that both cells possessed genes coding for n-alkane and cycloalkane degradation. Specifically, the near-complete pathway for cyclohexane oxidation in the Oceanospirillales single cells was elucidated and supported by both metagenome and metatranscriptome data. The draft genome also included genes for chemotaxis, motility and nutrient acquisition strategies that were also identified in the metagenomes and metatranscriptomes. These data point towards a rapid response of members of the Oceanospirillales to aliphatic hydrocarbons in the deep sea.

  7. Combining Next Generation Sequencing with Bulked Segregant Analysis to Fine Map a Stem Moisture Locus in Sorghum (Sorghum bicolor L. Moench).

    PubMed

    Han, Yucui; Lv, Peng; Hou, Shenglin; Li, Suying; Ji, Guisu; Ma, Xue; Du, Ruiheng; Liu, Guoqing

    2015-01-01

    Sorghum is one of the most promising bioenergy crops. Stem juice yield, together with stem sugar concentration, determines sugar yield in sweet sorghum. Bulked segregant analysis (BSA) is a gene mapping technique for identifying genomic regions containing genetic loci affecting a trait of interest that when combined with deep sequencing could effectively accelerate the gene mapping process. In this study, a dry stem sorghum landrace was characterized and the stem water controlling locus, qSW6, was fine mapped using QTL analysis and the combined BSA and deep sequencing technologies. Results showed that: (i) In sorghum variety Jiliang 2, stem water content was around 80% before flowering stage. It dropped to 75% during grain filling with little difference between different internodes. In landrace G21, stem water content keeps dropping after the flag leaf stage. The drop from 71% at flowering time progressed to 60% at grain filling time. Large differences exist between different internodes with the lowest (51%) at the 7th and 8th internodes at dough stage. (ii) A quantitative trait locus (QTL) controlling stem water content mapped on chromosome 6 between SSR markers Ch6-2 and gpsb069 explained about 34.7-56.9% of the phenotypic variation for the 5th to 10th internodes, respectively. (iii) BSA and deep sequencing analysis narrowed the associated region to 339 kb containing 38 putative genes. The results could help reveal molecular mechanisms underlying juice yield of sorghum and thus to improve total sugar yield.

  8. Exome and deep sequencing of clinically aggressive neuroblastoma reveal somatic mutations that affect key pathways involved in cancer progression

    PubMed Central

    Lasorsa, Vito Alessandro; Formicola, Daniela; Pignataro, Piero; Cimmino, Flora; Calabrese, Francesco Maria; Mora, Jaume; Esposito, Maria Rosaria; Pantile, Marcella; Zanon, Carlo; De Mariano, Marilena; Longo, Luca; Hogarty, Michael D.; de Torres, Carmen; Tonini, Gian Paolo; Iolascon, Achille; Capasso, Mario

    2016-01-01

    The spectrum of somatic mutation of the most aggressive forms of neuroblastoma is not completely determined. We sought to identify potential cancer drivers in clinically aggressive neuroblastoma. Whole exome sequencing was conducted on 17 germline and tumor DNA samples from high-risk patients with adverse events within 36 months from diagnosis (HR-Event3) to identify somatic mutations, and deep targeted sequencing of 134 genes selected from the initial screening was performed in an additional 48 germline and tumor pairs (62.5% HR-Event3 and high-risk patients), 17 HR-Event3 tumors and 17 human-derived neuroblastoma cell lines. We revealed 22 significantly mutated genes, many of which are implicated in cancer progression. Fifteen genes (68.2%) were highly expressed in neuroblastoma, supporting their involvement in the disease. CHD9, a cancer driver gene, was the most significantly altered (4.0% of cases) after ALK. Other genes (PTK2, NAV3, NAV1, FZD1 and ATRX), expressed in neuroblastoma and involved in cell invasion and migration, were mutated at frequencies ranging from 2% to 4%. Focal adhesion and regulation of actin cytoskeleton pathways were frequently disrupted (14.1% of cases), suggesting potential novel therapeutic strategies to prevent disease progression. Notably, BARD1, CHEK2 and AXIN2 were enriched in rare, potentially pathogenic, germline variants. In summary, whole exome and deep targeted sequencing identified novel cancer genes of clinically aggressive neuroblastoma. Our analyses show pathway-level implications of infrequently mutated genes in leading neuroblastoma progression. PMID:27009842

  9. Exome and deep sequencing of clinically aggressive neuroblastoma reveal somatic mutations that affect key pathways involved in cancer progression.

    PubMed

    Lasorsa, Vito Alessandro; Formicola, Daniela; Pignataro, Piero; Cimmino, Flora; Calabrese, Francesco Maria; Mora, Jaume; Esposito, Maria Rosaria; Pantile, Marcella; Zanon, Carlo; De Mariano, Marilena; Longo, Luca; Hogarty, Michael D; de Torres, Carmen; Tonini, Gian Paolo; Iolascon, Achille; Capasso, Mario

    2016-04-19

    The spectrum of somatic mutation of the most aggressive forms of neuroblastoma is not completely determined. We sought to identify potential cancer drivers in clinically aggressive neuroblastoma. Whole exome sequencing was conducted on 17 germline and tumor DNA samples from high-risk patients with adverse events within 36 months from diagnosis (HR-Event3) to identify somatic mutations, and deep targeted sequencing of 134 genes selected from the initial screening was performed in an additional 48 germline and tumor pairs (62.5% HR-Event3 and high-risk patients), 17 HR-Event3 tumors and 17 human-derived neuroblastoma cell lines. We revealed 22 significantly mutated genes, many of which are implicated in cancer progression. Fifteen genes (68.2%) were highly expressed in neuroblastoma, supporting their involvement in the disease. CHD9, a cancer driver gene, was the most significantly altered (4.0% of cases) after ALK. Other genes (PTK2, NAV3, NAV1, FZD1 and ATRX), expressed in neuroblastoma and involved in cell invasion and migration, were mutated at frequencies ranging from 2% to 4%. Focal adhesion and regulation of actin cytoskeleton pathways were frequently disrupted (14.1% of cases), suggesting potential novel therapeutic strategies to prevent disease progression. Notably, BARD1, CHEK2 and AXIN2 were enriched in rare, potentially pathogenic, germline variants. In summary, whole exome and deep targeted sequencing identified novel cancer genes of clinically aggressive neuroblastoma. Our analyses show pathway-level implications of infrequently mutated genes in leading neuroblastoma progression.

  10. Human Splice-Site Prediction with Deep Neural Networks.

    PubMed

    Naito, Tatsuhiko

    2018-04-18

    Accurate splice-site prediction is essential to delineate gene structures from sequence data. Several computational techniques have been applied to create a system to predict canonical splice sites. For classification tasks, deep neural networks (DNNs) have achieved record-breaking results and often outperformed other supervised learning techniques. In this study, a new method of splice-site prediction using DNNs was proposed. The proposed system receives an input sequence and returns an answer as to whether it is a splice site. The input length is 140 nucleotides, with the consensus sequence (i.e., "GT" and "AG" for the donor and acceptor sites, respectively) in the middle. Each input sequence is passed to the pretrained DNN model, which determines the probability that the input is a splice site. The model consists of convolutional layers and bidirectional long short-term memory network layers. The pretraining and validation were conducted using the data set tested in previously reported methods. The performance evaluation results showed that the proposed method can outperform the previous methods. In addition, the pattern learned by the DNNs was visualized as position frequency matrices (PFMs). Some of the PFMs were very similar to the consensus sequence. The trained DNN model and the brief source code for the prediction system have been uploaded. Further improvement will be achieved following the further development of DNNs.
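
The input format described in this abstract can be illustrated with a minimal sketch. It assumes a one-hot encoding over A, C, G, T and a consensus dinucleotide occupying the two central positions of the 140-nucleotide window; neither detail is stated in the abstract, and the function names are hypothetical. The PFM here is the generic per-position base count over aligned sequences, not the authors' visualization procedure.

```python
BASES = "ACGT"
WINDOW = 140  # input length stated in the abstract


def one_hot(seq):
    """Encode a nucleotide string as rows of 0/1 in A, C, G, T order.

    Bases outside ACGT (for example N) become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def has_central_consensus(seq, consensus):
    """True if the two central bases equal `consensus` (assumed placement)."""
    mid = len(seq) // 2
    return seq[mid - 1:mid + 1].upper() == consensus.upper()


def position_frequency_matrix(seqs):
    """Count each base at each position over equal-length sequences."""
    pfm = [[0, 0, 0, 0] for _ in range(len(seqs[0]))]
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if base in BASES:
                pfm[i][BASES.index(base)] += 1
    return pfm


# Made-up donor-site candidate: 69 A's, the "GT" consensus, 69 C's.
candidate = "A" * 69 + "GT" + "C" * 69
encoded = one_hot(candidate)
```

A window built this way has `WINDOW` rows of four values each, which is the kind of matrix a convolutional layer would typically take as input.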

  11. Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.

    PubMed

    Otto, Thomas D; Sanders, Mandy; Berriman, Matthew; Newbold, Chris

    2010-07-15

    The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications. The software is available at http://icorn.sourceforge.net
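
The idea of iteratively aligning reads and correcting the reference can be sketched with a toy example. This is not the iCORN implementation: the ungapped Hamming-distance placement, the majority-vote rule, and the `max_mismatches` and `min_depth` parameters are all assumptions chosen to keep the sketch short, and only substitutions are handled.

```python
from collections import Counter


def best_offset(read, ref, max_mismatches):
    """Offset where `read` fits `ref` with the fewest mismatches, or None."""
    best, best_mm = None, max_mismatches + 1
    for off in range(len(ref) - len(read) + 1):
        window = ref[off:off + len(read)]
        mm = sum(1 for a, b in zip(read, window) if a != b)
        if mm < best_mm:
            best, best_mm = off, mm
    return best


def correct_once(ref, reads, max_mismatches=1, min_depth=3):
    """Place reads on `ref`, then replace bases outvoted by the pileup."""
    piles = [Counter() for _ in ref]
    for read in reads:
        off = best_offset(read, ref, max_mismatches)
        if off is None:
            continue  # read is too divergent to place in this round
        for i, base in enumerate(read):
            piles[off + i][base] += 1
    corrected, changes = list(ref), 0
    for i, pile in enumerate(piles):
        depth = sum(pile.values())
        if depth < min_depth:
            continue
        base, count = pile.most_common(1)[0]
        if base != ref[i] and 2 * count > depth:
            corrected[i] = base
            changes += 1
    return "".join(corrected), changes


def iterative_correct(ref, reads, max_rounds=10):
    """Repeat placement and correction until a round changes nothing."""
    total = 0
    for _ in range(max_rounds):
        ref, changes = correct_once(ref, reads)
        if changes == 0:
            break
        total += changes
    return ref, total


# Made-up data: the true sequence, a reference with one wrong base, 8-mer reads.
truth = "GATTACAGCCTG"
draft = "GATTATAGCCTG"
reads = [truth[i:i + 8] for i in range(5)]
fixed, n_changes = iterative_correct(draft, reads)
```

Reads are re-placed against the updated reference in every round, so a read rejected early for too many mismatches can be placed once neighbouring errors have been fixed.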

  12. Deep learning on temporal-spectral data for anomaly detection

    NASA Astrophysics Data System (ADS)

    Ma, King; Leung, Henry; Jalilian, Ehsan; Huang, Daniel

    2017-05-01

    Detecting anomalies is important for continuous monitoring of sensor systems. One significant challenge is to use sensor data and autonomously detect changes that cause different conditions to occur. Using deep learning methods, we are able to monitor and detect changes as a result of some disturbance in the system. We utilize deep neural networks for sequence analysis of time series. We use a multi-step method for anomaly detection. We train the network to learn spectral and temporal features from the acoustic time series. We test our method using fiber-optic acoustic data from a pipeline.

  13. Extended Star Formation or a Range of Stellar Rotation Velocities? The Nature of Extended Main Sequence Turnoffs in Intermediate-Age Star Clusters

    NASA Astrophysics Data System (ADS)

    Goudfrooij, Paul

    2016-10-01

    Recently, deep color-magnitude diagrams (CMDs) from HST data revealed that several massive intermediate-age star clusters in the Magellanic Clouds exhibit extended main-sequence turn-offs (eMSTOs), and in some cases also dual red clumps. This poses serious questions regarding the mechanisms responsible for the formation of massive star clusters and their well-known light-element abundance variations. The nature of eMSTOs is currently a hotly debated topic of study. Several recent studies indicate that the eMSTOs are caused by an age spread of about 100-500 Myr among cluster stars, while other studies indicate that eMSTOs can be caused by a coeval population in which the relevant stars span a range of rotation velocities. Formal evidence to (dis-)prove either scenario remains elusive, mainly because stellar tracks that incorporate the effects of rotation are available only for masses > 1.7 Msun, whereas the stars in the known eMSTOs of intermediate-age clusters are less massive. To circumvent this issue, we identified a massive star cluster in the Large Magellanic Cloud (LMC) that has the right dynamical properties to host an eMSTO along with an age at which the effects of age spreads on CMD morphology are substantially different from those of spreads of rotation rates: the 600 Myr old cluster NGC 1831. We propose to obtain deep WFC3/UVIS imaging with filters F336W and F814W to analyze the morphologies of the MSTO and upper MS regions of NGC 1831 at high precision and compare with model predictions. This will have a lasting impact on our understanding of the eMSTO phenomenon and of star cluster formation in general.

  14. Seeding and Establishment of Legionella pneumophila in Hospitals: Implications for Genomic Investigations of Nosocomial Legionnaires' Disease.

    PubMed

    David, Sophia; Afshar, Baharak; Mentasti, Massimo; Ginevra, Christophe; Podglajen, Isabelle; Harris, Simon R; Chalker, Victoria J; Jarraud, Sophie; Harrison, Timothy G; Parkhill, Julian

    2017-05-01

    Legionnaires' disease is an important cause of hospital-acquired pneumonia and is caused by infection with the bacterium Legionella. Because current typing methods often fail to resolve the infection source in possible nosocomial cases, we aimed to determine whether whole-genome sequencing (WGS) could be used to support or refute suspected links between cases and hospitals. We focused on cases involving a major nosocomial-associated strain, L. pneumophila sequence type (ST) 1. WGS data from 229 L. pneumophila ST1 isolates were analyzed, including 99 isolates from the water systems of 17 hospitals and 42 clinical isolates from patients with confirmed or suspected hospital-acquired infections, as well as isolates obtained from or associated with community-acquired sources of Legionnaires' disease. Phylogenetic analysis demonstrated that all hospitals from which multiple isolates were obtained have been colonized by 1 or more distinct ST1 populations. However, deep sampling of 1 hospital also revealed the existence of substantial diversity and ward-specific microevolution within the population. Across all hospitals, suspected links with cases were supported with WGS, although the degree of support was dependent on the depth of environmental sampling and available contextual information. Finally, phylogeographic analysis revealed that hospitals have been seeded with L. pneumophila via both local and international spread of ST1. WGS can be used to support or refute suspected links between hospitals and Legionnaires' disease cases. However, deep hospital sampling is frequently required due to the potential coexistence of multiple populations, existence of substantial diversity, and similarity of hospital isolates to local populations. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America.

  15. Genetic epidemiology of pharmacogenetic variants in South East Asian Malays using whole-genome sequences.

    PubMed

    Sivadas, A; Salleh, M Z; Teh, L K; Scaria, V

    2017-10-01

    Expanding the scope of pharmacogenomic research by including multiple global populations is integral to building robust evidence for its clinical translation. Deep whole-genome sequencing of diverse ethnic populations provides a unique opportunity to study rare and common pharmacogenomic markers that often vary in frequency across populations. In this study, we aim to build a diverse map of pharmacogenetic variants in the South East Asian (SEA) Malay population using deep whole-genome sequences of 100 healthy SEA Malay individuals. We investigated the allelic diversity of potentially deleterious pharmacogenomic variants in the SEA Malay population. Our analysis revealed 227 common and 466 rare potentially functional single nucleotide variants (SNVs) in 437 pharmacogenomic genes involved in drug metabolism, transport and target genes, including 74 novel variants. This study has created one of the most comprehensive maps of pharmacogenetic markers in any population from whole genomes and will hugely benefit pharmacogenomic investigations and drug dosage recommendations in SEA Malays.

  16. Rapid Fine Conformational Epitope Mapping Using Comprehensive Mutagenesis and Deep Sequencing

    PubMed Central

    Kowalsky, Caitlin A.; Faber, Matthew S.; Nath, Aritro; Dann, Hailey E.; Kelly, Vince W.; Liu, Li; Shanker, Purva; Wagner, Ellen K.; Maynard, Jennifer A.; Chan, Christina; Whitehead, Timothy A.

    2015-01-01

    Knowledge of the fine location of neutralizing and non-neutralizing epitopes on human pathogens affords a better understanding of the structural basis of antibody efficacy, which will expedite rational design of vaccines, prophylactics, and therapeutics. However, full utilization of the wealth of information from single cell techniques and antibody repertoire sequencing awaits the development of a high throughput, inexpensive method to map the conformational epitopes for antibody-antigen interactions. Here we show such an approach that combines comprehensive mutagenesis, cell surface display, and DNA deep sequencing. We develop analytical equations to identify epitope positions and demonstrate the method's effectiveness by mapping the fine epitope for different antibodies targeting TNF, pertussis toxin, and the cancer target TROP2. In all three cases, the experimentally determined conformational epitope was consistent with previous experimental datasets, confirming the reliability of the experimental pipeline. Once the comprehensive library is generated, fine conformational epitope maps can be prepared at a rate of four per day. PMID:26296891

  17. Tracking the origins and drivers of subclonal metastatic expansion in prostate cancer

    DOE PAGES

    Hong, Matthew K. H.; Macintyre, Geoff; Wedge, David C.; ...

    2015-04-01

    Tumour heterogeneity in primary prostate cancer is a well-established phenomenon. However, how the subclonal diversity of tumours changes during metastasis and progression to lethality is poorly understood. Here we reveal the precise direction of metastatic spread across four lethal prostate cancer patients using whole-genome and ultra-deep targeted sequencing of longitudinally collected primary and metastatic tumours. We find one case of metastatic spread to the surgical bed causing local recurrence, and another case of cross-metastatic site seeding combining with dynamic remoulding of subclonal mixtures in response to therapy. By ultra-deep sequencing end-stage blood, we detect both metastatic and primary tumour clones, even years after removal of the prostate. As a result, analysis of mutations associated with metastasis reveals an enrichment of TP53 mutations, and additional sequencing of metastases from 19 patients demonstrates that acquisition of TP53 mutations is linked with the expansion of subclones with metastatic potential which we can detect in the blood.

  18. Archaeal phylogeny: reexamination of the phylogenetic position of Archaeoglobus fulgidus in light of certain composition-induced artifacts

    NASA Technical Reports Server (NTRS)

    Woese, C. R.; Achenbach, L.; Rouviere, P.; Mandelco, L.

    1991-01-01

    A major and too little recognized source of artifact in phylogenetic analysis of molecular sequence data is compositional difference among sequences. The problem becomes particularly acute when alignments contain ribosomal RNAs from both mesophilic and thermophilic species. Among prokaryotes the latter are considerably higher in G + C content than the former, which often results in artificial clustering of thermophilic lineages and their being placed artificially deep in phylogenetic trees. In this communication we review archaeal phylogeny in the light of this consideration, focusing in particular on the phylogenetic position of the sulfate reducing species Archaeoglobus fulgidus, using both 16S rRNA and 23S rRNA sequences. The analysis shows clearly that the previously reported deep branching of the A. fulgidus lineage (very near the base of the euryarchaeal side of the archaeal tree) is incorrect, and that the lineage actually groups with a previously recognized unit that comprises the Methanomicrobiales and extreme halophiles.

  19. Tracking the origins and drivers of subclonal metastatic expansion in prostate cancer.

    PubMed

    Hong, Matthew K H; Macintyre, Geoff; Wedge, David C; Van Loo, Peter; Patel, Keval; Lunke, Sebastian; Alexandrov, Ludmil B; Sloggett, Clare; Cmero, Marek; Marass, Francesco; Tsui, Dana; Mangiola, Stefano; Lonie, Andrew; Naeem, Haroon; Sapre, Nikhil; Phal, Pramit M; Kurganovs, Natalie; Chin, Xiaowen; Kerger, Michael; Warren, Anne Y; Neal, David; Gnanapragasam, Vincent; Rosenfeld, Nitzan; Pedersen, John S; Ryan, Andrew; Haviv, Izhak; Costello, Anthony J; Corcoran, Niall M; Hovens, Christopher M

    2015-04-01

    Tumour heterogeneity in primary prostate cancer is a well-established phenomenon. However, how the subclonal diversity of tumours changes during metastasis and progression to lethality is poorly understood. Here we reveal the precise direction of metastatic spread across four lethal prostate cancer patients using whole-genome and ultra-deep targeted sequencing of longitudinally collected primary and metastatic tumours. We find one case of metastatic spread to the surgical bed causing local recurrence, and another case of cross-metastatic site seeding combining with dynamic remoulding of subclonal mixtures in response to therapy. By ultra-deep sequencing end-stage blood, we detect both metastatic and primary tumour clones, even years after removal of the prostate. Analysis of mutations associated with metastasis reveals an enrichment of TP53 mutations, and additional sequencing of metastases from 19 patients demonstrates that acquisition of TP53 mutations is linked with the expansion of subclones with metastatic potential which we can detect in the blood.

  20. Deep intronic GPR143 mutation in a Japanese family with ocular albinism

    PubMed Central

    Naruto, Takuya; Okamoto, Nobuhiko; Masuda, Kiyoshi; Endo, Takao; Hatsukawa, Yoshikazu; Kohmoto, Tomohiro; Imoto, Issei

    2015-01-01

    Deep intronic mutations are often ignored as possible causes of human disease. Using whole-exome sequencing, we analysed genomic DNAs of a Japanese family with two male siblings affected by ocular albinism and congenital nystagmus. Although mutations or copy number alterations of coding regions were not identified in candidate genes, the novel intronic mutation c.659-131T>G within GPR143 intron 5 was identified as hemizygous in affected siblings and as heterozygous in the unaffected mother. This mutation was predicted to create a cryptic splice donor site within intron 5 and activate a cryptic acceptor site 41 nt upstream, causing the insertion into the coding sequence of an out-of-frame 41-bp pseudoexon with a premature stop codon in the aberrant transcript, which was confirmed by minigene experiments. This result expands the mutational spectrum of GPR143 and suggests the utility of next-generation sequencing integrated with in silico and experimental analyses for improving the molecular diagnosis of this disease. PMID:26061757
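
The frame arithmetic behind "out-of-frame 41-bp pseudoexon" can be checked with a short sketch. An insertion preserves the reading frame only when its length is a multiple of three, and 41 leaves a remainder of 2. The sequences below are made up for illustration and are not GPR143 sequence; the two-base insert stands in for the pseudoexon only because it has the same remainder modulo three, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def shifts_frame(insert_length):
    """True if an insertion of this length disrupts the reading frame."""
    return insert_length % 3 != 0


def first_stop(cds):
    """0-based index of the first in-frame stop codon, or None if absent."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3].upper() in STOP_CODONS:
            return i
    return None


# Toy transcript (not real sequence): two exon fragments joined directly,
# then joined with a 2-nt insert between them.
exon_a, exon_b = "ATGGCT", "AAATGCTAA"
toy_insert = "TA"

normal = exon_a + exon_b
aberrant = exon_a + toy_insert + exon_b
```

In the toy transcript the normal stop codon sits at index 12, while the inserted bases shift the frame and expose a stop at index 6, mirroring the premature termination described for the aberrant transcript.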

  1. Deep intronic GPR143 mutation in a Japanese family with ocular albinism.

    PubMed

    Naruto, Takuya; Okamoto, Nobuhiko; Masuda, Kiyoshi; Endo, Takao; Hatsukawa, Yoshikazu; Kohmoto, Tomohiro; Imoto, Issei

    2015-06-10

    Deep intronic mutations are often ignored as possible causes of human disease. Using whole-exome sequencing, we analysed genomic DNAs of a Japanese family with two male siblings affected by ocular albinism and congenital nystagmus. Although mutations or copy number alterations of coding regions were not identified in candidate genes, the novel intronic mutation c.659-131 T > G within GPR143 intron 5 was identified as hemizygous in affected siblings and as heterozygous in the unaffected mother. This mutation was predicted to create a cryptic splice donor site within intron 5 and activate a cryptic acceptor site at 41nt upstream, causing the insertion into the coding sequence of an out-of-frame 41-bp pseudoexon with a premature stop codon in the aberrant transcript, which was confirmed by minigene experiments. This result expands the mutational spectrum of GPR143 and suggests the utility of next-generation sequencing integrated with in silico and experimental analyses for improving the molecular diagnosis of this disease.

  2. Protein model discrimination using mutational sensitivity derived from deep sequencing.

    PubMed

    Adkar, Bharat V; Tripathi, Arti; Sahoo, Anusmita; Bajaj, Kanika; Goswami, Devrishi; Chakrabarti, Purbani; Swarnkar, Mohit K; Gokhale, Rajesh S; Varadarajan, Raghavan

    2012-02-08

    A major bottleneck in protein structure prediction is the selection of correct models from a pool of decoys. Relative activities of ∼1,200 individual single-site mutants in a saturation library of the bacterial toxin CcdB were estimated by determining their relative populations using deep sequencing. This phenotypic information was used to define an empirical score for each residue (RankScore), which correlated with the residue depth, and identify active-site residues. Using these correlations, ∼98% of correct models of CcdB (RMSD ≤ 4Å) were identified from a large set of decoys. The model-discrimination methodology was further validated on eleven different monomeric proteins using simulated RankScore values. The methodology is also a rapid, accurate way to obtain relative activities of each mutant in a large pool and derive sequence-structure-function relationships without protein isolation or characterization. It can be applied to any system in which mutational effects can be monitored by a phenotypic readout. Copyright © 2012 Elsevier Ltd. All rights reserved.

  3. UCSC genome browser: deep support for molecular biomedical research.

    PubMed

    Mangan, Mary E; Williams, Jennifer M; Lathe, Scott M; Karolchik, Donna; Lathe, Warren C

    2008-01-01

    The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework to display many other types of data such as expression, variation, cross-species comparisons, and more. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources such as images of in situ localizations. Support for solving researcher's issues is provided with active discussion mailing lists and by providing updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).

  4. Detection of non-coding RNA in bacteria and archaea using the DETR'PROK Galaxy pipeline.

    PubMed

    Toffano-Nioche, Claire; Luo, Yufei; Kuchly, Claire; Wallon, Claire; Steinbach, Delphine; Zytnicki, Matthias; Jacq, Annick; Gautheret, Daniel

    2013-09-01

    RNA-seq experiments are now routinely used for the large scale sequencing of transcripts. In bacteria or archaea, such deep sequencing experiments typically produce 10-50 million fragments that cover most of the genome, including intergenic regions. In this context, the precise delineation of the non-coding elements is challenging. Non-coding elements include untranslated regions (UTRs) of mRNAs, independent small RNA genes (sRNAs) and transcripts produced from the antisense strand of genes (asRNA). Here we present a computational pipeline (DETR'PROK: detection of ncRNAs in prokaryotes) based on the Galaxy framework that takes as input a mapping of deep sequencing reads and performs successive steps of clustering, comparison with existing annotation and identification of transcribed non-coding fragments classified into putative 5' UTRs, sRNAs and asRNAs. We provide a step-by-step description of the protocol using real-life example data sets from Vibrio splendidus and Escherichia coli. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  5. A Constructivist View of Music Education: Perspectives for Deep Learning

    ERIC Educational Resources Information Center

    Scott, Sheila

    2006-01-01

    The article analyzes a constructivist view of music education. A constructivist music classroom exemplifies deep learning when students formulate questions, acquire new knowledge by developing and implementing plans for investigating these questions, and reflect on the results. A context for deep learning requires that teachers and students work…

  6. Deep-Earth reactor: nuclear fission, helium, and the geomagnetic field.

    PubMed

    Hollenbach, D F; Herndon, J M

    2001-09-25

    Geomagnetic field reversals and changes in intensity are understandable from an energy standpoint as natural consequences of intermittent and/or variable nuclear fission chain reactions deep within the Earth. Moreover, deep-Earth production of helium, having (3)He/(4)He ratios within the range observed from deep-mantle sources, is demonstrated to be a consequence of nuclear fission. Numerical simulations of a planetary-scale geo-reactor were made by using the SCALE sequence of codes. The results clearly demonstrate that such a geo-reactor (i) would function as a fast-neutron fuel breeder reactor; (ii) could, under appropriate conditions, operate over the entire period of geologic time; and (iii) would function in such a manner as to yield variable and/or intermittent output power.

  7. New in protein structure and function annotation: hotspots, single nucleotide polymorphisms and the 'Deep Web'.

    PubMed

    Bromberg, Yana; Yachdav, Guy; Ofran, Yanay; Schneider, Reinhard; Rost, Burkhard

    2009-05-01

    The rapidly increasing quantity of protein sequence data continues to widen the gap between available sequences and annotations. Comparative modeling suggests some aspects of the 3D structures of approximately half of all known proteins; homology- and network-based inferences annotate some aspect of function for a similar fraction of the proteome. For most known protein sequences, however, there is detailed knowledge about neither their function nor their structure. Comprehensive efforts towards the expert curation of sequence annotations have failed to meet the demand of the rapidly increasing number of available sequences. Only the automated prediction of protein function in the absence of homology can close the gap between available sequences and annotations in the foreseeable future. This review focuses on two novel methods for automated annotation, and briefly presents an outlook on how modern web software may revolutionize the field of protein sequence annotation. First, predictions of protein binding sites and functional hotspots, and the evolution of these into the most successful type of prediction of protein function from sequence will be discussed. Second, a new tool, comprehensive in silico mutagenesis, which contributes important novel predictions of function and at the same time prepares for the onset of the next sequencing revolution, will be described. While these two new sub-fields of protein prediction represent the breakthroughs that have been achieved methodologically, it will then be argued that a different development might further change the way biomedical researchers benefit from annotations: modern web software can connect the worldwide web in any browser with the 'Deep Web' (ie, proprietary data resources). The availability of this direct connection, and the resulting access to a wealth of data, may impact drug discovery and development more than any existing method that contributes to protein annotation.

  8. An Anatomy of a Seismic Sequence in a Deep Gold Mine

    NASA Astrophysics Data System (ADS)

    Gibowicz, S. J.

    1997-12-01

    An unusual swarm-like seismic sequence occurred in April 1993 at the Western Deep Levels gold mine, South Africa. Altogether 199 events with moment magnitude from -0.5 to 3.1 were recorded and located by the mine seismic network. The sequence lasted 12 days and was composed in fact of four main shock-aftershocks sequences, closely following each other in space and time. The events were confined to a volume of rock extending to 670 m in the N-S, 630 m in the E-W, and 390 m in the vertical directions. The first sequence lasted 179 hours and the second only 13 hours, being interrupted by the third sequence which lasted 31 hours, being in turn interrupted by the fourth sequence. The parameter p, describing the rate of occurrence of aftershocks, ranged from 0.7 to 1. The first sequence is characterized by the lowest value of the fractal correlation dimension D = 1.75 and the second by the highest value of D = 2.4, whereas the third and fourth sequences are characterized by the middle value of D = 1.9. The corner frequencies of P and S waves are in close proximity and range from 14 to 220 Hz. A display of source parameters as a function of time shows that the four main shocks are most distinctly marked by their source radius. For 46 events a moment tensor inversion was performed. In most cases the double-couple component is dominant, ranging from 60 to 90 percent of the solution. The double-couple solutions correspond to the same number of normal and reverse faults and oblique-slip focal mechanisms. An analysis of space distribution of P, T and B axes reveals that the distribution of B axes is the most regular.
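
    The parameter p in this abstract is presumably the exponent of the modified Omori law, n(t) = K / (t + c)^p, which describes how the aftershock rate decays with time t after a main shock. The sketch below compares the two ends of the reported range (p = 0.7 and p = 1); the productivity K and the time offset c are hypothetical values chosen only for illustration.

```python
def omori_rate(t: float, K: float, c: float, p: float) -> float:
    """Aftershock rate n(t) = K / (t + c)**p (modified Omori law)."""
    return K / (t + c) ** p


# K and c are assumed values; p spans the range quoted in the abstract.
K, c = 10.0, 1.0
for p in (0.7, 1.0):
    rates = [omori_rate(t, K, c, p) for t in (0.0, 1.0, 9.0)]
    print(f"p={p}: " + ", ".join(f"{r:.2f}" for r in rates))
```

    With the same K and c, the smaller exponent gives a slower decay: at t = 9 the rate for p = 0.7 is roughly twice the rate for p = 1.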

  9. Compilation of Reprints Number 63.

    DTIC Science & Technology

    1986-03-01

    Michel Bée, Stephen H. Johnson, and E.F. Chiburis PRELIMINARY SEISMIC REFRACTION RESULTS USING A BOREHOLE SEISMOMETER IN DEEP SEA DRILLING PROJECT HOLE...refraction data with wells drilled on land and offshore reflection profiles permits tentative identification of geologic sequences on the basis of...PRELIMINARY SEISMIC REFRACTION RESULTS USING A BOREHOLE SEISMOMETER IN DEEP SEA DRILLING PROJECT HOLE 395A

  10. Microbial community composition of deep-sea corals from the Red Sea provides insight into functional adaption to a unique environment

    PubMed Central

    Röthig, Till; Yum, Lauren K.; Kremb, Stephan G.; Roik, Anna; Voolstra, Christian R.

    2017-01-01

    Microbes associated with deep-sea corals remain poorly studied. The lack of symbiotic algae suggests that associated microbes may play a fundamental role in maintaining a viable coral host via acquisition and recycling of nutrients. Here we employed 16S rRNA gene sequencing to study bacterial communities of three deep-sea scleractinian corals from the Red Sea, Dendrophyllia sp., Eguchipsammia fistula, and Rhizotrochus typus. We found diverse, species-specific microbiomes, distinct from the surrounding seawater. Microbiomes were comprised of few abundant bacteria, which constituted the majority of sequences (up to 58% depending on the coral species). In addition, we found a high diversity of rare bacteria (taxa at <1% abundance comprised >90% of all bacteria). Interestingly, we identified anaerobic bacteria, potentially providing metabolic functions at low oxygen conditions, as well as bacteria harboring the potential to degrade crude oil components. Considering the presence of oil and gas fields in the Red Sea, these bacteria may unlock this carbon source for the coral host. In conclusion, the prevailing environmental conditions of the deep Red Sea (>20 °C, <2 mg oxygen L−1) may require distinct functional adaptations, and our data suggest that bacterial communities may contribute to coral functioning in this challenging environment. PMID:28303925

  11. Microbial community composition of deep-sea corals from the Red Sea provides insight into functional adaption to a unique environment.

    PubMed

    Röthig, Till; Yum, Lauren K; Kremb, Stephan G; Roik, Anna; Voolstra, Christian R

    2017-03-17

    Microbes associated with deep-sea corals remain poorly studied. The lack of symbiotic algae suggests that associated microbes may play a fundamental role in maintaining a viable coral host via acquisition and recycling of nutrients. Here we employed 16S rRNA gene sequencing to study bacterial communities of three deep-sea scleractinian corals from the Red Sea, Dendrophyllia sp., Eguchipsammia fistula, and Rhizotrochus typus. We found diverse, species-specific microbiomes, distinct from the surrounding seawater. Microbiomes were comprised of few abundant bacteria, which constituted the majority of sequences (up to 58% depending on the coral species). In addition, we found a high diversity of rare bacteria (taxa at <1% abundance comprised >90% of all bacteria). Interestingly, we identified anaerobic bacteria, potentially providing metabolic functions at low oxygen conditions, as well as bacteria harboring the potential to degrade crude oil components. Considering the presence of oil and gas fields in the Red Sea, these bacteria may unlock this carbon source for the coral host. In conclusion, the prevailing environmental conditions of the deep Red Sea (>20 °C, <2 mg oxygen L−1) may require distinct functional adaptations, and our data suggest that bacterial communities may contribute to coral functioning in this challenging environment.
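
    The abundant/rare split described above rests on relative abundance, i.e. each taxon's read count divided by the total. The following is a minimal sketch of that calculation using the 1% threshold quoted in the abstract; the taxon names and read counts are invented for illustration and do not come from the study.

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-taxon read counts into fractions of the total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def split_by_abundance(counts: dict[str, int], threshold: float = 0.01):
    """Return (abundant, rare) taxon lists; rare means below `threshold`."""
    rel = relative_abundance(counts)
    abundant = sorted(t for t, f in rel.items() if f >= threshold)
    rare = sorted(t for t, f in rel.items() if f < threshold)
    return abundant, rare


# Hypothetical read counts for one coral sample (total = 1000 reads).
counts = {"taxon_A": 580, "taxon_B": 300, "taxon_C": 105,
          "taxon_D": 9, "taxon_E": 6}

abundant, rare = split_by_abundance(counts)
print("abundant:", abundant)
print("rare:", rare)
print("fraction of taxa that are rare:", len(rare) / len(counts))
```

    A real analysis would compute these fractions per sample from an OTU or ASV table, but the threshold logic is the same.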

  12. Efficiency of the neighbor-joining method in reconstructing deep and shallow evolutionary relationships in large phylogenies.

    PubMed

    Kumar, S; Gadagkar, S R

    2000-12-01

    The neighbor-joining (NJ) method is widely used in reconstructing large phylogenies because of its computational speed and the high accuracy in phylogenetic inference as revealed in computer simulation studies. However, most computer simulation studies have quantified the overall performance of the NJ method in terms of the percentage of branches inferred correctly or the percentage of replications in which the correct tree is recovered. We have examined other aspects of its performance, such as the relative efficiency in correctly reconstructing shallow (close to the external branches of the tree) and deep branches in large phylogenies; the contribution of zero-length branches to topological errors in the inferred trees; and the influence of increasing the tree size (number of sequences), evolutionary rate, and sequence length on the efficiency of the NJ method. Results show that the correct reconstruction of deep branches is no more difficult than that of shallower branches. The presence of zero-length branches in realized trees contributes significantly to the overall error observed in the NJ tree, especially in large phylogenies or slowly evolving genes. Furthermore, the tree size does not influence the efficiency of NJ in reconstructing shallow and deep branches in our simulation study, in which the evolutionary process is assumed to be homogeneous in all lineages.
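
    For readers unfamiliar with the method, the core of one neighbor-joining iteration is the pair-selection step: from a distance matrix d over n taxa, compute Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k) and join the pair with the smallest Q. The sketch below implements only this step, on a small made-up distance matrix; it is not the simulation setup used in the study.

```python
from itertools import combinations


def nj_pick_pair(d: list[list[float]]) -> tuple[float, int, int]:
    """Return (Q, i, j) for the pair of taxa that neighbor joining would merge next."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    best = None
    for i, j in combinations(range(n), 2):
        # Q penalizes pairs that are close to each other but far from everything else.
        q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
        if best is None or q < best[0]:
            best = (q, i, j)
    return best


# Illustrative symmetric distance matrix for five taxa (a, b, c, d, e).
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

print(nj_pick_pair(dist))  # (-50, 0, 1): taxa a and b are joined first
```

    Note that the pair with the smallest raw distance (d and e, distance 3) is not the one selected; the row-sum correction is what distinguishes neighbor joining from simple nearest-pair clustering.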

  13. Bacterial and archaeal communities in the deep-sea sediments of inactive hydrothermal vents in the Southwest India Ridge

    NASA Astrophysics Data System (ADS)

    Zhang, Likui; Kang, Manyu; Xu, Jiajun; Xu, Jian; Shuai, Yinjie; Zhou, Xiaojian; Yang, Zhihui; Ma, Kesen

    2016-05-01

    Active deep-sea hydrothermal vents harbor abundant thermophilic and hyperthermophilic microorganisms. However, microbial communities in inactive hydrothermal vents have not been well documented. Here, we investigated bacterial and archaeal communities in the two deep-sea sediments (named as TVG4 and TVG11) collected from inactive hydrothermal vents in the Southwest India Ridge using the high-throughput sequencing technology of Illumina MiSeq2500 platform. Based on the V4 region of 16S rRNA gene, sequence analysis showed that bacterial communities in the two samples were dominated by Proteobacteria, followed by Bacteroidetes, Actinobacteria and Firmicutes. Furthermore, archaeal communities in the two samples were dominated by Thaumarchaeota and Euryarchaeota. Comparative analysis showed that (i) TVG4 displayed higher bacterial richness and lower archaeal richness than TVG11; (ii) the two samples had more divergence in archaeal communities than bacterial communities. Bacteria and archaea that are potentially associated with nitrogen, sulfur, metal and methane cycling were detected in the two samples. Overall, we first provided a comparative picture of bacterial and archaeal communities and revealed their potentially ecological roles in the deep-sea environments of inactive hydrothermal vents in the Southwest Indian Ridge, augmenting microbial communities in inactive hydrothermal vents.

  14. Archaeal Diversity in Waters from Deep South African Gold Mines

    PubMed Central

    Takai, Ken; Moser, Duane P.; DeFlaun, Mary; Onstott, Tullis C.; Fredrickson, James K.

    2001-01-01

    A culture-independent molecular analysis of archaeal communities in waters collected from deep South African gold mines was performed by performing a PCR-mediated terminal restriction fragment length polymorphism (T-RFLP) analysis of rRNA genes (rDNA) in conjunction with a sequencing analysis of archaeal rDNA clone libraries. The water samples used represented various environments, including deep fissure water, mine service water, and water from an overlying dolomite aquifer. T-RFLP analysis revealed that the ribotype distribution of archaea varied with the source of water. The archaeal communities in the deep gold mine environments exhibited great phylogenetic diversity; the majority of the members were most closely related to uncultivated species. Some archaeal rDNA clones obtained from mine service water and dolomite aquifer water samples were most closely related to environmental rDNA clones from surface soil (soil clones) and marine environments (marine group I [MGI]). Other clones exhibited intermediate phylogenetic affiliation between soil clones and MGI in the Crenarchaeota. Fissure water samples, derived from active or dormant geothermal environments, yielded archaeal sequences that exhibited novel phylogeny, including a novel lineage of Euryarchaeota. These results suggest that deep South African gold mines harbor novel archaeal communities distinct from those observed in other environments. Based on the phylogenetic analysis of archaeal strains and rDNA clones, including the newly discovered archaeal rDNA clones, the evolutionary relationship and the phylogenetic organization of the domain Archaea are reevaluated. PMID:11722932
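
    T-RFLP distinguishes ribotypes by the length of the labeled terminal fragment left after a restriction digest, which depends on where the first recognition site falls in each amplicon. The sketch below shows that idea in silico. The abstract does not name the enzymes used, so HaeIII (recognition site GGCC, cutting between G and C) is an assumption made purely for illustration, and the amplicon sequences are toy strings.

```python
def terminal_fragment_length(seq: str, site: str, cut_offset: int) -> int:
    """Length of the 5'-terminal fragment after cutting at the first `site`.

    `cut_offset` is the number of bases of the site that stay on the 5' side.
    If the site is absent, the amplicon is uncut and its full length is returned.
    """
    pos = seq.find(site)
    if pos == -1:
        return len(seq)
    return pos + cut_offset


# Assumed enzyme for illustration: HaeIII, GG^CC (blunt cut after the second G).
SITE, OFFSET = "GGCC", 2

# Toy amplicons standing in for 5'-labeled rDNA PCR products.
amplicons = {
    "ribotype_1": "ATATATGGCCTTTT",
    "ribotype_2": "ATGGCCAAAA",
    "ribotype_3": "ATATAT",
}

for name, seq in amplicons.items():
    print(name, terminal_fragment_length(seq, SITE, OFFSET))
```

    The three toy ribotypes give terminal fragments of 8, 4 and 6 bases respectively, so they would appear as distinct peaks in a fragment-length profile.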

  15. Genomic and Phylogenetic Characterization of Luminous Bacteria Symbiotic with the Deep-Sea Fish Chlorophthalmus albatrossis (Aulopiformes: Chlorophthalmidae)

    PubMed Central

    Dunlap, Paul V.; Ast, Jennifer C.

    2005-01-01

    Bacteria forming light-organ symbiosis with deep-sea chlorophthalmid fishes (Aulopiformes: Chlorophthalmidae) are considered to belong to the species Photobacterium phosphoreum. The identification of these bacteria as P. phosphoreum, however, was based exclusively on phenotypic traits, which may not discriminate between phenetically similar but evolutionarily distinct luminous bacteria. Therefore, to test the species identification of chlorophthalmid symbionts, we carried out a genomotypic (repetitive element palindromic PCR genomic profiling) and phylogenetic analysis on strains isolated from the perirectal light organ of Chlorophthalmus albatrossis. Sequence analysis of the 16S rRNA gene of 10 strains from 5 fish specimens placed these bacteria in a cluster related to but phylogenetically distinct from the type strain of P. phosphoreum, ATCC 11040T, and the type strain of Photobacterium iliopiscarium, ATCC 51760T. Analysis of gyrB resolved the C. albatrossis strains as a strongly supported clade distinct from P. phosphoreum and P. iliopiscarium. Genomic profiling of 109 strains from the 5 C. albatrossis specimens revealed a high level of similarity among strains but allowed identification of genomotypically different types from each fish. Representatives of each type were then analyzed phylogenetically, using sequence of the luxABFE genes. As with gyrB, analysis of luxABFE resolved the C. albatrossis strains as a robustly supported clade distinct from P. phosphoreum. Furthermore, other strains of luminous bacteria reported as P. phosphoreum, i.e., NCIMB 844, from the skin of Merluccius capensis (Merlucciidae), NZ-11D, from the light organ of Nezumia aequalis (Macrouridae), and pjapo.1.1, from the light organ of Physiculus japonicus (Moridae), grouped phylogenetically by gyrB and luxABFE with the C. albatrossis strains, not with ATCC 11040T. These results demonstrate that luminous bacteria symbiotic with C. albatrossis, together with certain other strains of luminous bacteria, form a clade, designated the kishitanii clade, that is related to but evolutionarily distinct from P. phosphoreum. Members of the kishitanii clade may constitute the major or sole bioluminescent symbiont of several families of deep-sea luminous fishes. PMID:15691950

  16. Genomic and phylogenetic characterization of luminous bacteria symbiotic with the deep-sea fish Chlorophthalmus albatrossis (Aulopiformes: Chlorophthalmidae).

    PubMed

    Dunlap, Paul V; Ast, Jennifer C

    2005-02-01

    Bacteria forming light-organ symbiosis with deep-sea chlorophthalmid fishes (Aulopiformes: Chlorophthalmidae) are considered to belong to the species Photobacterium phosphoreum. The identification of these bacteria as P. phosphoreum, however, was based exclusively on phenotypic traits, which may not discriminate between phenetically similar but evolutionarily distinct luminous bacteria. Therefore, to test the species identification of chlorophthalmid symbionts, we carried out a genomotypic (repetitive element palindromic PCR genomic profiling) and phylogenetic analysis on strains isolated from the perirectal light organ of Chlorophthalmus albatrossis. Sequence analysis of the 16S rRNA gene of 10 strains from 5 fish specimens placed these bacteria in a cluster related to but phylogenetically distinct from the type strain of P. phosphoreum, ATCC 11040(T), and the type strain of Photobacterium iliopiscarium, ATCC 51760(T). Analysis of gyrB resolved the C. albatrossis strains as a strongly supported clade distinct from P. phosphoreum and P. iliopiscarium. Genomic profiling of 109 strains from the 5 C. albatrossis specimens revealed a high level of similarity among strains but allowed identification of genomotypically different types from each fish. Representatives of each type were then analyzed phylogenetically, using sequence of the luxABFE genes. As with gyrB, analysis of luxABFE resolved the C. albatrossis strains as a robustly supported clade distinct from P. phosphoreum. Furthermore, other strains of luminous bacteria reported as P. phosphoreum, i.e., NCIMB 844, from the skin of Merluccius capensis (Merlucciidae), NZ-11D, from the light organ of Nezumia aequalis (Macrouridae), and pjapo.1.1, from the light organ of Physiculus japonicus (Moridae), grouped phylogenetically by gyrB and luxABFE with the C. albatrossis strains, not with ATCC 11040(T). These results demonstrate that luminous bacteria symbiotic with C. albatrossis, together with certain other strains of luminous bacteria, form a clade, designated the kishitanii clade, that is related to but evolutionarily distinct from P. phosphoreum. Members of the kishitanii clade may constitute the major or sole bioluminescent symbiont of several families of deep-sea luminous fishes.

  17. Stability of deep features across CT scanners and field of view using a physical phantom

    NASA Astrophysics Data System (ADS)

    Paul, Rahul; Shafiq-ul-Hassan, Muhammad; Moros, Eduardo G.; Gillies, Robert J.; Hall, Lawrence O.; Goldgof, Dmitry B.

    2018-02-01

    Radiomics is the process of analyzing radiological images by extracting quantitative features for monitoring and diagnosis of various cancers. Analyzing images acquired from different medical centers is confounded by many choices in acquisition, reconstruction parameters and differences among device manufacturers. Consequently, scanning the same patient or phantom using various acquisition/reconstruction parameters as well as different scanners may result in different feature values. To further evaluate this issue, in this study, CT images from a physical radiomic phantom were used. Recent studies showed that some quantitative features were dependent on voxel size and that this dependency could be reduced or removed by the appropriate normalization factor. Deep features extracted from a convolutional neural network may also provide additional features for image analysis. Using a transfer learning approach, we obtained deep features from three convolutional neural networks pre-trained on color camera images. We examined the dependency of deep features on image pixel size. We found that some deep features were pixel size dependent, and to remove this dependency we proposed two effective normalization approaches. For analyzing the effects of normalization, a threshold has been used based on the calculated standard deviation and average distance from a best fit horizontal line among the features' underlying pixel size before and after normalization. The inter and intra scanner dependency of deep features has also been evaluated.

  18. Identification and characterization of a chitin deacetylase from a metagenomic library of deep-sea sediments of the Arctic Ocean.

    PubMed

    Liu, Jinlin; Jia, Zhijuan; Li, Sha; Li, Yan; You, Qiang; Zhang, Chunyan; Zheng, Xiaotong; Xiong, Guomei; Zhao, Jin; Qi, Chao; Yang, Jihong

    2016-09-15

    The chemical and biological compositions of deep-sea sediments are interesting because of the underexplored diversity when it comes to bioprospecting. The special geographical location and climates make Arctic Ocean a unique ocean area containing an abundance of microbial resources. A metagenomic library was constructed based on the deep-sea sediments of Arctic Ocean. Part of insertion fragments of this library were sequenced. A chitin deacetylase gene, cdaYJ, was identified and characterized. A metagenomic library with 2750 clones was obtained and ten clones were sequenced. Results revealed several interesting genes, including a chitin deacetylase coding sequence, cdaYJ. The CdaYJ is homologous to some known chitin deacetylases and contains conserved chitin deacetylase active sites. CdaYJ protein exhibits a long N-terminal and a relative short C-terminal. Phylogenetic analysis revealed that CdaYJ showed highest homology to CDAs from Alphaproteobacteria. The cdaYJ gene was subcloned into the pET-28a vector and the recombinant CdaYJ (rCdaYJ) was expressed in Escherichia coli BL21 (DE3). rCdaYJ showed a molecular weight of 43kDa, and exhibited deacetylation activity by using p-nitroacetanilide as substrate. The optimal pH and temperature of rCdaYJ were tested as pH7.4 and 28°C, respectively. The construction of metagenomic library of the Arctic deep-sea sediments provides us an opportunity to look into the microbial communities and exploiting valuable gene resources. A chitin deacetylase CdaYJ was identified from the library. It showed highest deacetylation activity under slight alkaline and low temperature conditions. CdaYJ might be a candidate chitin deacetylase that possesses industrial and pharmaceutical potentials. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. A deep learning pipeline for Indian dance style classification

    NASA Astrophysics Data System (ADS)

    Dewan, Swati; Agarwal, Shubham; Singh, Navjyoti

    2018-04-01

    In this paper, we address the problem of dance style classification to classify Indian dance or any dance in general. We propose a 3-step deep learning pipeline. First, we extract 14 essential joint locations of the dancer from each video frame; this helps us to derive any body region location within the frame, which we use in the second step, the main part of our pipeline. Here, we divide the dancer into regions of important motion in each video frame. We then extract patches centered at these regions. Main discriminative motion is captured in these patches. We stack the features from all such patches of a frame into a single vector and form our hierarchical dance pose descriptor. Finally, in the third step, we build a high level representation of the dance video using the hierarchical descriptors and train it using a Recurrent Neural Network (RNN) for classification. Our novelty also lies in the way we use multiple representations for a single video. This helps us to: (1) Overcome the RNN limitation of learning small sequences over big sequences such as dance; (2) Extract more data from the available dataset for effective deep learning by training multiple representations. Our contributions in this paper are threefold: (1) We provide a deep learning pipeline for classification of any form of dance; (2) We prove that a segmented representation of a dance video works well with sequence learning techniques for recognition purposes; (3) We extend and refine the ICD dataset and provide a new dataset for evaluation of dance. Our model performs comparably to, or in some cases better than, the state of the art on action recognition benchmarks.

  20. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.

    PubMed

    Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W

    2018-05-31

    In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. The developments of high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates potentials of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.

  1. Methane related changes in prokaryotic activity along geochemical profiles in sediments of Lake Kinneret (Israel)

    NASA Astrophysics Data System (ADS)

    Bar Or, I.; Ben-Dov, E.; Kushmaro, A.; Eckert, W.; Sivan, O.

    2014-06-01

    The microbial methane oxidation process (methanotrophy) is the primary control on the emission of the greenhouse gas methane (CH4) to the atmosphere. In terrestrial environments, aerobic methanotrophic bacteria are mainly responsible for oxidizing the methane. In marine sediments, the coupling of the anaerobic oxidation of methane (AOM) with sulfate reduction, often by a consortium of anaerobic methanotrophic archaea (ANME) and sulfate-reducing bacteria, was found to consume almost all the upward-diffusing methane. Recently, we showed geochemical evidence for AOM driven by iron reduction in Lake Kinneret (LK) (Israel) deep sediments and suggested that this process can be an important global methane sink. The goal of the present study was to link the geochemical gradients found in the porewater (chemical and isotope profiles) with possible changes in microbial community structure. Specifically, we examined the possible shift in the microbial community in the deep iron-driven AOM zone and its similarity to known sulfate-driven AOM populations. Screening of archaeal 16S rRNA gene sequences revealed Thaumarchaeota and Euryarchaeota as the dominant phyla in the sediment. Thaumarchaeota, which belongs to the family of copper-containing membrane-bound monooxygenases, increased with depth while Euryarchaeota decreased. This may indicate the involvement of Thaumarchaeota, which were discovered to be ammonia oxidizers but whose activity could also be linked to methane, in AOM in the deep sediment. ANME sequences were not found in the clone libraries, suggesting that iron-driven AOM is not through sulfate. Bacterial 16S rRNA sequences displayed shifts in community diversity with depth. Proteobacteria and Chloroflexi increased with depth, which could be connected with their different dissimilatory anaerobic processes. The observed changes in microbial community structure suggest possible direct and indirect mechanisms for iron-driven AOM in deep sediments.

  2. smRNAome profiling to identify conserved and novel microRNAs in Stevia rebaudiana Bertoni

    PubMed Central

    2012-01-01

    Background: MicroRNAs (miRNAs) constitute a family of small RNAs (sRNAs) that regulate gene expression and play an important role in plant development, metabolism, signal transduction and stress response. Extensive studies on miRNAs have been performed in different plants such as Arabidopsis thaliana and Oryza sativa, and the volume of the miRNA database, miRBase, has been increasing day by day. Stevia rebaudiana Bertoni is an important perennial herb which accumulates high concentrations of diterpene steviol glycosides, which contribute to its high indexed sweetening property with no calorific value. Several studies have been carried out to understand the molecular mechanism involved in the biosynthesis of these glycosides; however, information about miRNAs has been lacking in S. rebaudiana. Deep sequencing of small RNAs combined with transcriptomic data is a powerful tool for identifying conserved and novel miRNAs irrespective of the availability of genome sequence data. Results: To identify miRNAs in S. rebaudiana, an sRNA library was constructed and sequenced using the Illumina Genome Analyzer II. A total of 30,472,534 reads representing 2,509,190 distinct sequences were obtained from the sRNA library. Based on sequence similarity, we identified 100 miRNAs belonging to 34 highly conserved families. We also identified 12 novel miRNAs whose precursors were potentially generated from stevia EST and nucleotide sequences. None of the novel sequences has been described previously in other plant species. Putative target genes were predicted for most conserved and novel miRNAs. The predicted targets are mainly mRNAs encoding enzymes regulating essential plant metabolic and signaling pathways. Conclusions: This study led to the identification of 34 highly conserved miRNA families and 12 novel potential miRNAs, indicating that specific miRNAs exist in stevia species. Our results provide information on stevia miRNAs and their targets, building a foundation for future studies to understand their roles in key stevia traits. PMID:23116282

  3. Next-generation sequencing sheds light on the natural history of hepatitis C infection in patients who fail treatment.

    PubMed

    Abdelrahman, Tamer; Hughes, Joseph; Main, Janice; McLauchlan, John; Thursz, Mark; Thomson, Emma

    2015-01-01

    High rates of sexually transmitted infection and reinfection with hepatitis C virus (HCV) have recently been reported in human immunodeficiency virus (HIV)-infected men who have sex with men and reinfection has also been described in monoinfected injecting drug users. The diagnosis of reinfection has traditionally been based on direct Sanger sequencing of samples pre- and posttreatment, but not on more sensitive deep sequencing techniques. We studied viral quasispecies dynamics in patients who failed standard of care therapy in a high-risk HIV-infected cohort of patients with early HCV infection to determine whether treatment failure was associated with reinfection or recrudescence of preexisting infection. Paired sequences (pre- and posttreatment) were analyzed. The HCV E2 hypervariable region-1 was amplified using nested reverse-transcription polymerase chain reaction (RT-PCR) with indexed genotype-specific primers and the same products were sequenced using both Sanger and 454 pyrosequencing approaches. Of 99 HIV-infected patients with acute HCV treated with 24-48 weeks of pegylated interferon alpha and ribavirin, 15 failed to achieve a sustained virological response (six relapsed, six had a null response, and three had a partial response). Using direct sequencing, 10/15 patients (66%) had evidence of a previously undetected strain posttreatment; in many studies, this is interpreted as reinfection. However, pyrosequencing revealed that 15/15 (100%) of patients had evidence of persisting infection; 6/15 (40%) patients had evidence of a previously undetected variant present in the posttreatment sample in addition to a variant that was detected at baseline. This could represent superinfection or a limitation of the sensitivity of pyrosequencing. In this high-risk group, the emergence of new viral strains following treatment failure is most commonly associated with emerging dominance of preexisting minority variants rather than reinfection. 
    Superinfection may occur in this cohort but reinfection is overestimated by Sanger sequencing. © 2014 The Authors. Hepatology published by Wiley on behalf of the American Association for the Study of Liver Diseases.
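
    The distinction this abstract draws between reinfection and persistence can be illustrated with a small sketch. It is not the authors' analysis pipeline: the function name, the category labels, and the variant identifiers "A" and "B" are hypothetical, and variants are treated simply as sets of labels detected before and after treatment.

```python
def classify_treatment_failure(baseline, post):
    """Compare variant labels detected before and after treatment.

    Hypothetical helper for illustration only; the category names are
    not taken from the paper.
    """
    baseline, post = set(baseline), set(post)
    shared = baseline & post   # variants present at both time points
    new = post - baseline      # variants seen only after treatment
    if shared and new:
        return "persistence plus previously undetected variant"
    if shared:
        return "persistence"
    if new:
        return "previously undetected strain only"
    return "no variant detected"


# A low-sensitivity method reports only the dominant variant at each time point,
# so a minority variant "B" that later becomes dominant looks like a new strain.
print(classify_treatment_failure({"A"}, {"B"}))
# A more sensitive method also detects "B" at baseline, revealing persistence.
print(classify_treatment_failure({"A", "B"}, {"B"}))
```

    Under these assumptions, the same patient moves from the "previously undetected strain only" category to "persistence" once the baseline minority variant is detected, which is the effect the abstract attributes to the higher sensitivity of pyrosequencing.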

  4. smRNAome profiling to identify conserved and novel microRNAs in Stevia rebaudiana Bertoni.

    PubMed

    Mandhan, Vibha; Kaur, Jagdeep; Singh, Kashmir

    2012-11-01

    MicroRNAs (miRNAs) constitute a family of small RNAs (sRNAs) that regulate gene expression and play an important role in plant development, metabolism, signal transduction and stress response. Extensive studies on miRNAs have been performed in different plants such as Arabidopsis thaliana and Oryza sativa, and the volume of the miRNA database, miRBase, has been increasing day by day. Stevia rebaudiana Bertoni is an important perennial herb which accumulates high concentrations of diterpene steviol glycosides, which contribute to its high indexed sweetening property with no calorific value. Several studies have been carried out to understand the molecular mechanism involved in the biosynthesis of these glycosides; however, information about miRNAs has been lacking in S. rebaudiana. Deep sequencing of small RNAs combined with transcriptomic data is a powerful tool for identifying conserved and novel miRNAs irrespective of the availability of genome sequence data. To identify miRNAs in S. rebaudiana, an sRNA library was constructed and sequenced using the Illumina Genome Analyzer II. A total of 30,472,534 reads representing 2,509,190 distinct sequences were obtained from the sRNA library. Based on sequence similarity, we identified 100 miRNAs belonging to 34 highly conserved families. We also identified 12 novel miRNAs whose precursors were potentially generated from stevia EST and nucleotide sequences. None of the novel sequences has been described previously in other plant species. Putative target genes were predicted for most conserved and novel miRNAs. The predicted targets are mainly mRNAs encoding enzymes regulating essential plant metabolic and signaling pathways. This study led to the identification of 34 highly conserved miRNA families and 12 novel potential miRNAs, indicating that specific miRNAs exist in stevia species. Our results provide information on stevia miRNAs and their targets, building a foundation for future studies to understand their roles in key stevia traits.

  5. Identification and profiling of growth-related microRNAs of the swimming crab Portunus trituberculatus by using Solexa deep sequencing.

    PubMed

    Ren, Xianyun; Cui, Yanting; Gao, Baoquan; Liu, Ping; Li, Jian

    2016-08-01

    MicroRNAs (miRNAs) are a class of endogenous small non-coding RNAs that regulate gene expression by post-transcriptional repression of mRNAs. The swimming crab Portunus trituberculatus is one of the most important crustacean species for aquaculture in China. However, to date no miRNAs have been reported to modulate growth in P. trituberculatus. To investigate miRNAs involved in the growth of this species, we constructed six small RNA libraries for big individuals (BIs) and small individuals (SIs) from a highly inbred family. Six mixed RNA pools of five tissues (eyestalk, gill, heart, hepatopancreas, and muscle) were obtained. By aligning sequencing data with those for known miRNAs, a total of 404 miRNAs, including 339 known and 65 novel miRNAs, were identified from the six libraries. MiR-100 and miR-276a-3p were among the most prominent miRNA species. We identified seven differentially expressed miRNAs between the BIs and SIs, which were validated using real-time PCR. Preliminary analyses of their putative target genes and GO and KEGG pathway analyses showed that these differentially expressed miRNAs could play important roles in global transcriptional depression and cell differentiation of P. trituberculatus. This study reveals the first miRNA profile related to the body growth of P. trituberculatus, which would be particularly useful for crab breeding programs. Copyright © 2016 Elsevier B.V. All rights reserved.

  6. Identification and Expression Analysis of microRNAs at the Grain Filling Stage in Rice (Oryza sativa L.) via Deep Sequencing

    PubMed Central

    Yi, Rong; Zhu, Zhixuan; Hu, Jihong; Qian, Qian; Dai, Jincheng; Ding, Yi

    2013-01-01

    MicroRNAs (miRNAs) have been shown to play crucial roles in the regulation of plant development. In this study, high-throughput RNA-sequencing technology was used to identify novel miRNAs and to reveal miRNA expression patterns at different developmental stages during rice (Oryza sativa L.) grain filling. A total of 434 known miRNAs (380, 402, 390 and 392 at 5, 7, 12 and 17 days after fertilization, respectively) were obtained from rice grain. The expression profiles of these identified miRNAs were analyzed, and the results showed that 161 known miRNAs were differentially expressed during grain development, a high proportion of which were up-regulated from 5 to 7 days after fertilization. In addition, sixty novel miRNAs were identified, and five of these were further validated experimentally. Additional analysis showed that the predicted targets of the differentially expressed miRNAs may participate in signal transduction, carbohydrate and nitrogen metabolism, the response to stimuli and epigenetic regulation. In this study, differences were revealed in the composition and expression profiles of miRNAs among individual developmental stages during the rice grain filling process, and miRNA editing events were also observed, analyzed and validated during this process. The results provide novel insight into the dynamic profiles of miRNAs in developing rice grain and contribute to the understanding of the regulatory roles of miRNAs in grain filling. PMID:23469249

  7. Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis

    PubMed Central

    Gan, Mingyu; Liu, Qingyun; Yang, Chongguang; Gao, Qian; Luo, Tao

    2016-01-01

    Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcome of tuberculosis (TB). Traditional genotyping methods have been used to detect mixed infections of MTB; however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has proved highly sensitive and discriminative for studying population heterogeneity of MTB. Here, we developed a phylogenetic-based method to detect MTB mixed infections using WGS data. We collected published WGS data of 782 global MTB strains from public databases. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. Mixed infections were determined if multiple evolutionary paths were identified by mapping the SNVs of individual samples to the phylogenomic database. By simulation, our method could specifically detect mixed infections when the sequencing depth of minor strains was as low as 1× coverage, and when the genomic distance of two mixed strains was as small as 16 SNVs. By applying our methods to all 782 samples, we detected 47 mixed infections, 45 of which were caused by locally endemic strains. The results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates. PMID:27391214
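
    The criterion described above, that a sample is mixed when its SNVs map to more than one evolutionary path, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy tree, the branch names, the SNV labels, and the helper functions are all hypothetical, and the sketch ignores read depth and the distinction between homogeneous and heterogeneous SNVs.

```python
def ancestors(node, parent):
    """Return the set holding `node` and every ancestor up to the root."""
    path = set()
    while node is not None:
        path.add(node)
        node = parent.get(node)
    return path


def is_mixed(sample_snvs, snv_to_branch, parent):
    """Flag a sample whose SNVs fall on branches that do not all lie on
    one root-to-tip path of the phylogeny (hypothetical criterion)."""
    branches = {snv_to_branch[s] for s in sample_snvs if s in snv_to_branch}
    if not branches:
        return False
    # On a single path, the deepest branch has all other branches as ancestors.
    deepest = max(branches, key=lambda b: len(ancestors(b, parent)))
    return not branches <= ancestors(deepest, parent)


# Toy phylogeny (child -> parent) and toy SNV-to-branch assignments.
parent = {"root": None, "L1": "root", "L2": "root", "L1.1": "L1", "L2.1": "L2"}
snv_to_branch = {"s1": "L1", "s2": "L1.1", "s3": "L2", "s4": "L2.1"}

print(is_mixed({"s1", "s2"}, snv_to_branch, parent))        # single lineage
print(is_mixed({"s1", "s2", "s3"}, snv_to_branch, parent))  # two lineages
```

    In this toy setting the first sample is consistent with one lineage (L1 then L1.1), while the second also carries an SNV that defines L2 and is therefore flagged.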

  8. Deep sequencing identifies circulating mouse miRNAs that are functionally implicated in manifestations of aging and responsive to calorie restriction.

    PubMed

    Dhahbi, Joseph M; Spindler, Stephen R; Atamna, Hani; Yamakawa, Amy; Guerrero, Noel; Boffelli, Dario; Mote, Patricia; Martin, David I K

    2013-02-01

    MicroRNAs (miRNAs) function to modulate gene expression, and through this property they regulate a broad spectrum of cellular processes. They can circulate in blood and thereby mediate cell-to-cell communication. Aging involves changes in many cellular processes that are potentially regulated by miRNAs, and some evidence has implicated circulating miRNAs in the aging process. In order to initiate a comprehensive assessment of the role of circulating miRNAs in aging, we have used deep sequencing to characterize circulating miRNAs in the serum of young mice, old mice, and old mice maintained on calorie restriction (CR). Deep sequencing identifies a set of novel miRNAs, and also accurately measures all known miRNAs present in serum. This analysis demonstrates that the levels of many miRNAs circulating in the mouse are increased with age, and that the increases can be antagonized by CR. The genes targeted by this set of age-modulated miRNAs are predicted to regulate biological processes directly relevant to the manifestations of aging including metabolic changes, and the miRNAs themselves have been linked to diseases associated with old age. This finding implicates circulating miRNAs in the aging process, raising questions about their tissues of origin, their cellular targets, and their functional role in metabolic changes that occur with aging.

  9. Predictive value of the composition of the vaginal microbiota in bacterial vaginosis, a dynamic study to identify recurrence-related flora

    PubMed Central

    Xiao, Bingbing; Niu, Xiaoxi; Han, Na; Wang, Ben; Du, Pengcheng; Na, Risu; Chen, Chen; Liao, Qinping

    2016-01-01

    Bacterial vaginosis (BV) is a highly prevalent disease in women and increases the risk of pelvic inflammatory disease. It has received wide attention because of its high recurrence rate. Traditional microscope-based diagnostic methods provide limited information on the vaginal microbiota, which makes it difficult to trace the development of the disease under conditions of bacterial resistance. In this study, we used deep-sequencing technology to observe dynamic variation of the vaginal microbiota at three major time points during treatment: D0 (before treatment), D7 (when antibiotics were stopped) and D30 (the 30-day follow-up visit). Sixty-five patients with BV were enrolled (48 were cured and 17 were not cured), and the bacterial composition of their vaginal microbiota was compared. Interestingly, we identified 9 patients who might experience recurrence. We also introduced a new measurement point at D7, although the microbiota at that point were significantly inhibited by antibiotics and hard to observe by traditional methods. The vaginal microbiota as seen by deep sequencing showed a strong correlation with the final outcome. Thus, by coupling detailed individual bioinformatics analysis with deep-sequencing technology, we may draw a more accurate map of the vaginal microbiota of BV patients, which provides a new opportunity to reduce the rate of recurrence of BV. PMID:27253522

  10. Taxonomy, distribution and ecology of the order Phyllodocida (Annelida, Polychaeta) in deep-sea habitats around the Iberian margin

    NASA Astrophysics Data System (ADS)

    Ravara, Ascensão; Ramos, Diana; Teixeira, Marcos A. L.; Costa, Filipe O.; Cunha, Marina R.

    2017-03-01

    The polychaetes of the order Phyllodocida (excluding Nereidiformia and Phyllodociformia incertae sedis) collected from deep-sea habitats of the Iberian margin (Bay of Biscay, Horseshoe continental rise, Gulf of Cadiz and Alboran Sea), and Atlantic seamounts (Gorringe Bank, Atlantis and Nameless) are reported herein. Thirty-six species belonging to seven families (Acoetidae, Pholoidae, Polynoidae, Sigalionidae, Glyceridae, Goniadidae and Phyllodocidae) were identified. Amended descriptions and/or new illustrations are given for the species Allmaniella setubalensis, Anotochaetonoe michelbhaudi, Lepidasthenia brunnea and Polynoe sp. Relevant taxonomical notes are provided for seventeen other species. Allmaniella setubalensis, Anotochaetonoe michelbhaudi, Harmothoe evei, Eumida longicirrata and Glycera noelae, previously known only from their type localities, were found in different deep-water places of the studied areas and constitute new records for the Iberian margin. The geographic distributions and the bathymetric range of thirteen and fifteen species, respectively, are extended. The morphology-based biodiversity inventory was complemented with DNA sequences of the mitochondrial barcode region (COI barcodes), providing a molecular tag for future reference. Twenty new sequences were obtained for nine species in the families Acoetidae, Glyceridae and Polynoidae and for three lineages within the Phyllodoce madeirensis complex (Phyllodocidae). A brief analysis of the newly obtained sequences and publicly available COI barcode data for the genera herein reported highlighted several cases of unclear taxonomic assignments, which need further study.

  11. SeqReporter: automating next-generation sequencing result interpretation and reporting workflow in a clinical laboratory.

    PubMed

    Roy, Somak; Durso, Mary Beth; Wald, Abigail; Nikiforov, Yuri E; Nikiforova, Marina N

    2014-01-01

    A wide repertoire of bioinformatics applications exist for next-generation sequencing data analysis; however, certain requirements of the clinical molecular laboratory limit their use: i) comprehensive report generation, ii) compatibility with existing laboratory information systems and computer operating system, iii) knowledgebase development, iv) quality management, and v) data security. SeqReporter is a web-based application developed using ASP.NET framework version 4.0. The client-side was designed using HTML5, CSS3, and Javascript. The server-side processing (VB.NET) relied on interaction with a customized SQL server 2008 R2 database. Overall, 104 cases (1062 variant calls) were analyzed by SeqReporter. Each variant call was classified into one of five report levels: i) known clinical significance, ii) uncertain clinical significance, iii) pending pathologists' review, iv) synonymous and deep intronic, and v) platform and panel-specific sequence errors. SeqReporter correctly annotated and classified 99.9% (859 of 860) of sequence variants, including 68.7% synonymous single-nucleotide variants, 28.3% nonsynonymous single-nucleotide variants, 1.7% insertions, and 1.3% deletions. One variant of potential clinical significance was re-classified after pathologist review. Laboratory information system-compatible clinical reports were generated automatically. SeqReporter also facilitated quality management activities. SeqReporter is an example of a customized and well-designed informatics solution to optimize and automate the downstream analysis of clinical next-generation sequencing data. We propose it as a model that may envisage the development of a comprehensive clinical informatics solution. Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  12. Near-Complete Genome Sequence of Thalassospira sp. Strain KO164 Isolated from a Lignin-Enriched Marine Sediment Microcosm.

    PubMed

    Woo, Hannah L; O'Dell, Kaela B; Utturkar, Sagar; McBride, Kathryn R; Huntemann, Marcel; Clum, Alicia; Pillay, Manoj; Palaniappan, Krishnaveni; Varghese, Neha; Mikhailova, Natalia; Stamatis, Dimitrios; Reddy, T B K; Ngan, Chew Yee; Daum, Chris; Shapiro, Nicole; Markowitz, Victor; Ivanova, Natalia; Kyrpides, Nikos; Woyke, Tanja; Brown, Steven D; Hazen, Terry C

    2016-11-23

    Thalassospira sp. strain KO164 was isolated from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. The near-complete genome sequence presented here will facilitate analyses into this deep-ocean bacterium's ability to degrade recalcitrant organics such as lignin. Copyright © 2016 Woo et al.

  13. Whole-Genome Characterization of Prunus necrotic ringspot virus Infecting Sweet Cherry in China

    PubMed Central

    2018-01-01

    ABSTRACT Prunus necrotic ringspot virus (PNRSV) causes yield loss in most cultivated stone fruits, including sweet cherry. Using a small RNA deep-sequencing approach combined with end-genome sequence cloning, we identified the complete genomes of all three PNRSV strands from PNRSV-infected sweet cherry trees and compared them with those of two previously reported isolates. PMID:29496825

  14. DNA-Free Genetically Edited Grapevine and Apple Protoplast Using CRISPR/Cas9 Ribonucleoproteins.

    PubMed

    Malnoy, Mickael; Viola, Roberto; Jung, Min-Hee; Koo, Ok-Jae; Kim, Seokjoong; Kim, Jin-Soo; Velasco, Riccardo; Nagamangala Kanchiswamy, Chidananda

    2016-01-01

    The combined availability of whole genome sequences and genome editing tools is set to revolutionize the field of fruit biotechnology by enabling the introduction of targeted genetic changes with unprecedented control and accuracy, both to explore emergent phenotypes and to introduce new functionalities. Although plasmid-mediated delivery of genome editing components to plant cells is very efficient, it also presents some drawbacks, such as possible random integration of plasmid sequences in the host genome. Additionally, it may well be intercepted by current process-based GMO regulations, complicating the path to commercialization of improved varieties. Here, we explore direct delivery of purified CRISPR/Cas9 ribonucleoproteins (RNPs) to the protoplasts of the grape cultivar Chardonnay and the apple cultivar Golden Delicious for efficient targeted mutagenesis in these fruit crop plants. We targeted MLO-7, a susceptibility gene, to increase resistance to powdery mildew in the grape cultivar, and DIPM-1, DIPM-2, and DIPM-4 in the apple to increase resistance to fire blight disease. Furthermore, for efficient protoplast transformation, the molar ratio of Cas9 to sgRNAs was optimized for each grape and apple cultivar. The rate of targeted insertion and deletion mutagenesis was analyzed using targeted deep sequencing. Our results demonstrate that direct delivery of CRISPR/Cas9 RNPs to the protoplast system enables targeted gene editing and paves the way to the generation of DNA-free genome-edited grapevine and apple plants.

  15. Discovery of Culex pipiens associated tunisia virus: a new ssRNA(+) virus representing a new insect associated virus family

    PubMed Central

    Bigot, Diane; Atyame, Célestine M; Weill, Mylène; Justy, Fabienne

    2018-01-01

    Abstract In the global context of arboviral emergence, deep sequencing unlocks the discovery of new mosquito-borne viruses. Mosquitoes of the species Culex pipiens, C. torrentium, and C. hortensis were sampled from 22 locations worldwide for transcriptomic analyses. A virus discovery pipeline was used to analyze the dataset of 0.7 billion reads comprising 22 individual transcriptomes. Two closely related 6.8 kb viral genomes were identified in C. pipiens and named Culex pipiens associated tunisia virus (CpATV) strains Ayed and Jedaida. The CpATV genome contained four ORFs. ORF1 possessed helicase and RNA-dependent RNA polymerase (RdRp) domains related to new viral sequences recently found mainly in dipterans. ORF2 and ORF4 contained a capsid protein domain showing strong homology with Virgaviridae plant viruses. ORF3 displayed similarities with a eukaryotic Rhoptry domain and a merozoite surface protein (MSP7) domain only found in mosquito-transmitted Plasmodium, suggesting possible interactions between CpATV and vertebrate cells. Estimation of strong purifying selection exerted on each ORF and the presence of a polymorphism maintained in the coding region of ORF3 suggested that both CpATV sequences are genuine functional viruses. CpATV is part of an entirely new and highly diversified group of viruses recently found in insects that bears the genomic hallmarks of a new viral family. PMID:29340209

  16. Mutation Spectrum of the ABCA4 Gene in a Greek Cohort with Stargardt Disease: Identification of Novel Mutations and Evidence of Three Prevalent Mutated Alleles

    PubMed Central

    Vassiliki, Kokkinou; George, Koutsodontis; Polixeni, Stamatiou; Christoforos, Giatzakis; Minas, Aslanides Ioannis; Stavrenia, Koukoula; Ioannis, Datseris

    2018-01-01

    Aim To evaluate the frequency and pattern of disease-associated mutations of ABCA4 gene among Greek patients with presumed Stargardt disease (STGD1). Materials and Methods A total of 59 patients were analyzed for ABCA4 mutations using the ABCR400 microarray and PCR-based sequencing of all coding exons and flanking intronic regions. MLPA analysis as well as sequencing of two regions in introns 30 and 36 reported earlier to harbor deep intronic disease-associated variants was used in 4 selected cases. Results An overall detection rate of at least one mutant allele was achieved in 52 of the 59 patients (88.1%). Direct sequencing improved significantly the complete characterization rate, that is, identification of two mutations compared to the microarray analysis (93.1% versus 50%). In total, 40 distinct potentially disease-causing variants of the ABCA4 gene were detected, including six previously unreported potentially pathogenic variants. Among the disease-causing variants, in this cohort, the most frequent was c.5714+5G>A representing 16.1%, while p.Gly1961Glu and p.Leu541Pro represented 15.2% and 8.5%, respectively. Conclusions By using a combination of methods, we completely molecularly diagnosed 48 of the 59 patients studied. In addition, we identified six previously unreported, potentially pathogenic ABCA4 mutations. PMID:29854428

  17. RTS,S/AS01 malaria vaccine mismatch observed among Plasmodium falciparum isolates from southern and central Africa and globally.

    PubMed

    Pringle, Julia C; Carpi, Giovanna; Almagro-Garcia, Jacob; Zhu, Sha Joe; Kobayashi, Tamaki; Mulenga, Modest; Bobanga, Thierry; Chaponda, Mike; Moss, William J; Norris, Douglas E

    2018-04-26

    The RTS,S/AS01 malaria vaccine encompasses the central repeats and C-terminal of Plasmodium falciparum circumsporozoite protein (PfCSP). Although no Phase II clinical trial studies observed evidence of strain-specific immunity, recent studies show a decrease in vaccine efficacy against non-vaccine strain parasites. In light of goals to reduce malaria morbidity, anticipating the effectiveness of RTS,S/AS01 is critical to planning widespread vaccine introduction. We deep sequenced C-terminal Pfcsp from 77 individuals living along the international border in Luapula Province, Zambia and Haut-Katanga Province, the Democratic Republic of the Congo (DRC) and compared translated amino acid haplotypes to the 3D7 vaccine strain. Only 5.2% of the 193 PfCSP sequences from the Zambia-DRC border region matched 3D7 at all 84 amino acids. To further contextualize the genetic diversity sampled in this study with global PfCSP diversity, we analyzed an additional 3,809 Pfcsp sequences from the Pf3k database and constructed a haplotype network representing 15 countries from Africa and Asia. The diversity observed in our samples was similar to the diversity observed in the global haplotype network. These observations underscore the need for additional research assessing genetic diversity in P. falciparum and the impact of PfCSP diversity on RTS,S/AS01 efficacy.
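
    The comparison of translated haplotypes against the vaccine strain can be illustrated with a short sketch. The helper names and the four-residue toy sequences below are hypothetical stand-ins chosen for readability (the study compared 84 amino acid positions); only the idea of requiring a match at every position comes from the abstract.

```python
def mismatch_positions(haplotype, reference):
    """Return 1-based positions where an amino acid haplotype differs
    from the reference. Sequences are assumed to be pre-aligned and of
    equal length."""
    if len(haplotype) != len(reference):
        raise ValueError("sequences must have equal length")
    return [i for i, (a, b) in enumerate(zip(haplotype, reference), start=1)
            if a != b]


def match_fraction(haplotypes, reference):
    """Fraction of haplotypes identical to the reference at all positions."""
    matches = sum(1 for h in haplotypes if not mismatch_positions(h, reference))
    return matches / len(haplotypes)


# Toy data only: real PfCSP C-terminal haplotypes are not reproduced here.
reference = "KLTN"
observed = ["KLTN", "KLTD", "KLTN", "QLTN"]

print(match_fraction(observed, reference))
print(mismatch_positions("KLTD", reference))
```

    As a consistency check on the reported figure, 10 exact matches out of 193 sequences would give 10/193 ≈ 5.2%, although the abstract itself states only the percentage.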

  18. Extracting features from protein sequences to improve deep extreme learning machine for protein fold recognition.

    PubMed

    Ibrahim, Wisam; Abadeh, Mohammad Saniee

    2017-05-21

    Protein fold recognition is an important problem in bioinformatics to predict three-dimensional structure of a protein. One of the most challenging tasks in protein fold recognition problem is the extraction of efficient features from the amino-acid sequences to obtain better classifiers. In this paper, we have proposed six descriptors to extract features from protein sequences. These descriptors are applied in the first stage of a three-stage framework PCA-DELM-LDA to extract feature vectors from the amino-acid sequences. Principal Component Analysis PCA has been implemented to reduce the number of extracted features. The extracted feature vectors have been used with original features to improve the performance of the Deep Extreme Learning Machine DELM in the second stage. Four new features have been extracted from the second stage and used in the third stage by Linear Discriminant Analysis LDA to classify the instances into 27 folds. The proposed framework is implemented on the independent and combined feature sets in SCOP datasets. The experimental results show that extracted feature vectors in the first stage could improve the performance of DELM in extracting new useful features in second stage. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Identification of microRNAs from Amur grape (Vitis amurensis Rupr.) by deep sequencing and analysis of microRNA variations with bioinformatics

    PubMed Central

    2012-01-01

    Background MicroRNAs (miRNAs) are a class of functional non-coding small RNAs 19-25 nucleotides in length. Amur grape (Vitis amurensis Rupr.) is an important wild fruit crop with the strongest cold resistance among the Vitis species; it is used as an excellent breeding parent for grapevine and has elicited growing interest in wine production. To date, a relatively large number of grapevine miRNAs (vv-miRNAs) have been identified from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr., a wild grapevine species. Results A small RNA library from Amur grape was constructed, and Solexa technology was used to perform deep sequencing of the library, followed by bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and accumulation of 18 new va-miRNAs in seven tissues of grapevines was confirmed by real time RT-PCR (qRT-PCR) analysis. The expression levels of va-miRNAs in flowers and berries were found to be largely consistent with those from the deep-sequenced sRNA libraries of the combined corresponding tissues. We also describe the conservation and variation of va-miRNAs using miR-SNPs and miR-LDs during plant evolution based on comparison of orthologous sequences, and further reveal that the number and sites of miR-SNPs in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted; they include a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism. Conclusions Deep sequencing of short RNAs from Amur grape flowers and berries identified 72 new potential miRNAs and 34 known but non-conserved miRNAs, indicating that specific miRNAs exist in Amur grape. These results show that a number of regulatory miRNAs exist in Amur grape and play an important role in Amur grape growth, development, and response to abiotic or biotic stress. PMID:22455456

  20. Ancient human genomes suggest three ancestral populations for present-day Europeans

    PubMed Central

    Lazaridis, Iosif; Patterson, Nick; Mittnik, Alissa; Renaud, Gabriel; Mallick, Swapan; Kirsanow, Karola; Sudmant, Peter H.; Schraiber, Joshua G.; Castellano, Sergi; Lipson, Mark; Berger, Bonnie; Economou, Christos; Bollongino, Ruth; Fu, Qiaomei; Bos, Kirsten I.; Nordenfelt, Susanne; Li, Heng; de Filippo, Cesare; Prüfer, Kay; Sawyer, Susanna; Posth, Cosimo; Haak, Wolfgang; Hallgren, Fredrik; Fornander, Elin; Rohland, Nadin; Delsate, Dominique; Francken, Michael; Guinet, Jean-Michel; Wahl, Joachim; Ayodo, George; Babiker, Hamza A.; Bailliet, Graciela; Balanovska, Elena; Balanovsky, Oleg; Barrantes, Ramiro; Bedoya, Gabriel; Ben-Ami, Haim; Bene, Judit; Berrada, Fouad; Bravi, Claudio M.; Brisighelli, Francesca; Busby, George B. J.; Cali, Francesco; Churnosov, Mikhail; Cole, David E. C.; Corach, Daniel; Damba, Larissa; van Driem, George; Dryomov, Stanislav; Dugoujon, Jean-Michel; Fedorova, Sardana A.; Romero, Irene Gallego; Gubina, Marina; Hammer, Michael; Henn, Brenna M.; Hervig, Tor; Hodoglugil, Ugur; Jha, Aashish R.; Karachanak-Yankova, Sena; Khusainova, Rita; Khusnutdinova, Elza; Kittles, Rick; Kivisild, Toomas; Klitz, William; Kučinskas, Vaidutis; Kushniarevich, Alena; Laredj, Leila; Litvinov, Sergey; Loukidis, Theologos; Mahley, Robert W.; Melegh, Béla; Metspalu, Ene; Molina, Julio; Mountain, Joanna; Näkkäläjärvi, Klemetti; Nesheva, Desislava; Nyambo, Thomas; Osipova, Ludmila; Parik, Jüri; Platonov, Fedor; Posukh, Olga; Romano, Valentino; Rothhammer, Francisco; Rudan, Igor; Ruizbakiev, Ruslan; Sahakyan, Hovhannes; Sajantila, Antti; Salas, Antonio; Starikovskaya, Elena B.; Tarekegn, Ayele; Toncheva, Draga; Turdikulova, Shahlo; Uktveryte, Ingrida; Utevska, Olga; Vasquez, René; Villena, Mercedes; Voevoda, Mikhail; Winkler, Cheryl; Yepiskoposyan, Levon; Zalloua, Pierre; Zemunik, Tatijana; Cooper, Alan; Capelli, Cristian; Thomas, Mark G.; Ruiz-Linares, Andres; Tishkoff, Sarah A.; Singh, Lalji; Thangaraj, Kumarasamy; Villems, Richard; Comas, David; Sukernik, Rem; Metspalu, Mait; Meyer, Matthias; Eichler, Evan E.; Burger, Joachim; Slatkin, Montgomery; Pääbo, Svante; Kelso, Janet; Reich, David; Krause, Johannes

    2014-01-01

    We sequenced the genomes of a ~7,000 year old farmer from Germany and eight ~8,000 year old hunter-gatherers from Luxembourg and Sweden. We analyzed these and other ancient genomes [1–4] with 2,345 contemporary humans to show that most present Europeans derive from at least three highly differentiated populations: West European Hunter-Gatherers (WHG), who contributed ancestry to all Europeans but not to Near Easterners; Ancient North Eurasians (ANE) related to Upper Paleolithic Siberians [3], who contributed to both Europeans and Near Easterners; and Early European Farmers (EEF), who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model these populations’ deep relationships and show that EEF had ~44% ancestry from a “Basal Eurasian” population that split prior to the diversification of other non-African lineages. PMID:25230663
