Sample records for discovery sample set

  1. Multiple reaction monitoring (MRM)-profiling for biomarker discovery applied to human polycystic ovarian syndrome.

    PubMed

    Cordeiro, Fernanda B; Ferreira, Christina R; Sobreira, Tiago Jose P; Yannell, Karen E; Jarmusch, Alan K; Cedenho, Agnaldo P; Lo Turco, Edson G; Cooks, R Graham

    2017-09-15

    We describe multiple reaction monitoring (MRM)-profiling, which provides accelerated discovery of discriminating molecular features, and its application to human polycystic ovary syndrome (PCOS) diagnosis. The discovery phase of the MRM-profiling seeks molecular features based on some prior knowledge of the chemical functional groups likely to be present in the sample. It does this through use of a limited number of pre-chosen and chemically specific neutral loss and/or precursor ion MS/MS scans. The output of the discovery phase is a set of precursor/product transitions. In the screening phase these MRM transitions are used to interrogate multiple samples (hence the name MRM-profiling). MRM-profiling was applied to follicular fluid samples of 22 controls and 29 clinically diagnosed PCOS patients. Representative samples were delivered by flow injection to a triple quadrupole mass spectrometer set to perform a number of pre-chosen and chemically specific neutral loss and/or precursor ion MS/MS scans. The output of this discovery phase was a set of 1012 precursor/product transitions. In the screening phase each individual sample was interrogated for these MRM transitions. Principal component analysis (PCA) and receiver operating characteristic (ROC) curves were used for statistical analysis. To evaluate the method's performance, half the samples were used to build a classification model (testing set) and half were blinded (validation set). Twenty transitions were used for the classification of the blind samples; most of them (N = 19) showed lower abundances in the PCOS group and corresponded to phosphatidylethanolamine (PE) and phosphatidylserine (PS) lipids. Agreement of 73% with clinical diagnosis was found when classifying the 26 blind samples. MRM-profiling is a supervised method characterized by its simplicity, speed and the absence of chromatographic separation. It can be used to rapidly isolate discriminating molecules in healthy/disease conditions by tailored screening of signals associated with hundreds of molecules in complex samples. Copyright © 2017 John Wiley & Sons, Ltd.
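
    The abstract above reports ROC analysis of individual MRM transitions but gives no computational detail. As a rough illustration only, the sketch below computes an empirical area under the ROC curve for one transition as the fraction of case/control pairs in which the case has the higher intensity (ties count half). The function name, the variable names, and the intensity values are hypothetical and are not taken from the study.

```python
def empirical_auc(case_values, control_values):
    """Empirical ROC AUC: probability that a randomly chosen case value
    exceeds a randomly chosen control value (ties count as one half)."""
    if not case_values or not control_values:
        raise ValueError("both groups need at least one value")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Toy intensities for one hypothetical transition (NOT data from the study).
controls = [5.1, 4.8, 5.6, 5.0]
pcos = [3.9, 4.2, 4.9, 3.5]

auc = empirical_auc(pcos, controls)
# An AUC well below 0.5 means the signal is lower in the case group;
# its discriminating strength is then max(auc, 1 - auc).
strength = max(auc, 1.0 - auc)
print(f"AUC = {auc:.4f}, discriminating strength = {strength:.4f}")
```

    In this toy example only one of the 16 case/control pairs has the case value higher, so the AUC is far below 0.5, which corresponds to a transition with lower abundance in the case group.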

  2. Development of a universal metabolome-standard method for long-term LC-MS metabolome profiling and its application for bladder cancer urine-metabolite-biomarker discovery.

    PubMed

    Peng, Jun; Chen, Yi-Ting; Chen, Chien-Lun; Li, Liang

    2014-07-01

    A large-scale metabolomics study requires a quantitative method to generate metabolome data over an extended period with high technical reproducibility. We report a universal metabolome-standard (UMS) method, in conjunction with chemical isotope labeling liquid chromatography-mass spectrometry (LC-MS), to provide long-term analytical reproducibility and facilitate metabolome comparison among different data sets. In this method, UMS of a specific type of sample labeled by an isotope reagent is prepared a priori. The UMS is spiked into any individual samples labeled by another form of the isotope reagent in a metabolomics study. The resultant mixture is analyzed by LC-MS to provide relative quantification of the individual sample metabolome to UMS. UMS is independent of a study undertaking as well as the time of analysis and is useful for profiling the same type of samples in multiple studies. In this work, the UMS method was developed and applied for a urine metabolomics study of bladder cancer. UMS of human urine was prepared by (13)C2-dansyl labeling of a pooled sample from 20 healthy individuals. This method was first used to profile the discovery samples to generate a list of putative biomarkers potentially useful for bladder cancer detection and then used to analyze the verification samples about one year later. Within the discovery sample set, three-month technical reproducibility was examined using a quality control sample, giving a mean CV of 13.9% and a median CV of 9.4% for all the quantified metabolites. Statistical analysis of the urine metabolome data showed a clear separation between the bladder cancer group and the control group from the discovery samples, which was confirmed by the verification samples. Receiver operating characteristic (ROC) test showed that the area under the curve (AUC) was 0.956 in the discovery data set and 0.935 in the verification data set. These results demonstrated the utility of the UMS method for long-term metabolomics and discovering potential metabolite biomarkers for diagnosis of bladder cancer.
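
    The reproducibility figures in the abstract above are a mean and a median coefficient of variation (CV) across quantified metabolites. A minimal sketch of that summary follows, assuming the CV is the sample standard deviation divided by the mean, expressed in percent; the abstract does not state which standard deviation was used. The metabolite names and values are invented for illustration.

```python
from statistics import mean, median, stdev


def cv_percent(replicates):
    """Coefficient of variation in percent (sample SD / mean * 100)."""
    centre = mean(replicates)
    if centre == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(replicates) / centre


# Hypothetical repeated quality-control measurements per metabolite
# (relative amounts; NOT data from the study).
qc_runs = {
    "metabolite_a": [100.0, 110.0, 90.0],
    "metabolite_b": [50.0, 50.0, 50.0],
    "metabolite_c": [10.0, 14.0, 12.0],
}

cvs = {name: cv_percent(values) for name, values in qc_runs.items()}
mean_cv = mean(cvs.values())
median_cv = median(cvs.values())
print(f"mean CV = {mean_cv:.1f}%, median CV = {median_cv:.1f}%")
```

    Reporting both statistics is informative because a few poorly reproduced metabolites can raise the mean CV while leaving the median largely unchanged.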

  3. PERSONAL AND CIRCUMSTANTIAL FACTORS INFLUENCING THE ACT OF DISCOVERY.

    ERIC Educational Resources Information Center

    OSTRANDER, EDWARD R.

    HOW STUDENTS SAY THEY LEARN WAS INVESTIGATED. INTERVIEWS WITH A RANDOM SAMPLE OF 74 WOMEN STUDENTS POSED QUESTIONS ABOUT THE NATURE, FREQUENCY, PATTERNS, AND CIRCUMSTANCES UNDER WHICH ACTS OF DISCOVERY TAKE PLACE IN THE ACADEMIC SETTING. STUDENTS WERE ASSIGNED DISCOVERY RATINGS BASED ON READINGS OF TYPESCRIPTS. EACH STUDENT WAS CLASSIFIED AND…

  4. Improving the quality of biomarker discovery research: the right samples and enough of them.

    PubMed

    Pepe, Margaret S; Li, Christopher I; Feng, Ziding

    2015-06-01

    Biomarker discovery research has yielded few biomarkers that validate for clinical use. A contributing factor may be poor study designs. The goal in discovery research is to identify a subset of potentially useful markers from a large set of candidates assayed on case and control samples. We recommend the PRoBE design for selecting samples. We propose sample size calculations that require specifying: (i) a definition for biomarker performance; (ii) the proportion of useful markers the study should identify (Discovery Power); and (iii) the tolerable number of useless markers amongst those identified (False Leads Expected, FLE). We apply the methodology to a study of 9,000 candidate biomarkers for risk of colon cancer recurrence where a useful biomarker has positive predictive value ≥ 30%. We find that 40 patients with recurrence and 160 without recurrence suffice to filter out 98% of useless markers (2% FLE) while identifying 95% of useful biomarkers (95% Discovery Power). Alternative methods for sample size calculation required more assumptions. Biomarker discovery research should utilize quality biospecimen repositories and include sample sizes that enable markers meeting prespecified performance characteristics for well-defined clinical applications to be identified. The scientific rigor of discovery research should be improved. ©2015 American Association for Cancer Research.

  5. Signature-Discovery Approach for Sample Matching of a Nerve-Agent Precursor using Liquid Chromatography–Mass Spectrometry, XCMS, and Chemometrics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fraga, Carlos G.; Clowers, Brian H.; Moore, Ronald J.

    2010-05-15

    This report demonstrates the use of bioinformatic and chemometric tools on liquid chromatography-mass spectrometry (LC-MS) data for the discovery of ultra-trace forensic signatures for sample matching of various stocks of the nerve-agent precursor known as methylphosphonic dichloride (dichlor). The bioinformatic tool known as XCMS was used to comprehensively search for candidate LC-MS peaks in a known set of dichlor samples. These candidate peaks were down-selected to a group of 34 impurity peaks. Hierarchical cluster analysis and factor analysis demonstrated the potential of these 34 impurity peaks for matching samples based on their stock source. Only one pair of dichlor stocks was not differentiated from one another. An acceptable chemometric approach for sample matching was determined to be variance scaling and signal averaging of normalized duplicate impurity profiles prior to classification by k-nearest neighbors. Using this approach, a test set of dichlor samples were all correctly matched to their source stock. The sample preparation and LC-MS method permitted the detection of dichlor impurities presumably in the parts-per-trillion (w/w). The detection of a common impurity in all dichlor stocks that were synthesized over a 14-year period and by different manufacturers was an unexpected discovery. Our described signature-discovery approach should be useful in the development of a forensic capability to help in criminal investigations following chemical attacks.
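
    The chemometric steps named in the abstract above (normalization, averaging of duplicate profiles, variance scaling, k-nearest-neighbor classification) can be sketched as follows. This is an illustration under stated assumptions, not the report's procedure: normalization to unit total signal, scaling by the training-set standard deviation, Euclidean distance, and k = 1 are all choices made here, and every name and number is hypothetical.

```python
from collections import Counter
from math import dist
from statistics import stdev


def normalize(profile):
    """Scale a peak-area profile to unit total signal (assumed normalization)."""
    total = sum(profile)
    return [value / total for value in profile]


def average_duplicates(first, second):
    """Average two normalized duplicate profiles of the same sample."""
    return [(a + b) / 2 for a, b in zip(normalize(first), normalize(second))]


def column_scales(profiles):
    """Per-peak standard deviation over the training profiles (1.0 if constant)."""
    scales = []
    for column in zip(*profiles):
        spread = stdev(column)
        scales.append(spread if spread > 0 else 1.0)
    return scales


def variance_scale(profile, scales):
    """Divide each peak by its training-set standard deviation."""
    return [value / scale for value, scale in zip(profile, scales)]


def knn_predict(train, labels, query, k=1):
    """Majority label among the k training profiles closest to the query."""
    ranked = sorted(zip(train, labels), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy duplicate injections with three impurity peaks (NOT data from the report).
training = [
    ("stock_A", [10, 1, 1], [12, 1, 1]),
    ("stock_A", [11, 1, 2], [9, 1, 1]),
    ("stock_B", [1, 10, 1], [1, 12, 1]),
    ("stock_B", [2, 9, 1], [1, 11, 1]),
]
labels = [label for label, _, _ in training]
averaged = [average_duplicates(a, b) for _, a, b in training]
scales = column_scales(averaged)
train_scaled = [variance_scale(profile, scales) for profile in averaged]

unknown = variance_scale(average_duplicates([10, 2, 1], [11, 1, 1]), scales)
predicted = knn_predict(train_scaled, labels, unknown, k=1)
print(predicted)
```

    Variance scaling gives every impurity peak equal weight regardless of its absolute size, which also amplifies noise in low-variance peaks; that is one reason the number of neighbors and the scaling reference set are worth validating on held-out samples.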

  6. Principal component analysis-based unsupervised feature extraction applied to in silico drug discovery for posttraumatic stress disorder-mediated heart disease.

    PubMed

    Taguchi, Y-h; Iwadate, Mitsuo; Umeyama, Hideaki

    2015-04-30

    Feature extraction (FE) is difficult, particularly if there are more features than samples, as small sample numbers often result in biased outcomes or overfitting. Furthermore, multiple sample classes often complicate FE because evaluating performance, which is usual in supervised FE, is generally harder than in the two-class problem. Developing sample-classification-independent unsupervised methods would solve many of these problems. Two principal component analysis (PCA)-based FE methods were considered: variational Bayes PCA (VBPCA) was extended to perform unsupervised FE and, together with conventional PCA (CPCA)-based unsupervised FE, was tested as a sample-classification-independent unsupervised FE method. VBPCA- and CPCA-based unsupervised FE both performed well when applied to simulated data and to a posttraumatic stress disorder (PTSD)-mediated heart disease data set that had multiple categorical class observations in mRNA/microRNA expression of stressed mouse heart. A critical set of PTSD miRNAs/mRNAs was identified that show aberrant expression between treatment and control samples, and significant, negative correlation with one another. Moreover, greater stability and biological feasibility than conventional supervised FE were also demonstrated. Based on the results obtained, in silico drug discovery was performed as translational validation of the methods. Our two proposed unsupervised FE methods (CPCA- and VBPCA-based) worked well on simulated data, and outperformed two conventional supervised FE methods on a real data set. Thus, these two methods have suggested equivalence for FE on categorical multiclass data sets, with potential translational utility for in silico drug discovery.

  7. Tackling the conformational sampling of larger flexible compounds and macrocycles in pharmacology and drug discovery.

    PubMed

    Chen, I-Jen; Foloppe, Nicolas

    2013-12-15

    Computational conformational sampling underpins much of molecular modeling and design in pharmaceutical work. The sampling of smaller drug-like compounds has been an active area of research. However, few studies have tested in detail the sampling of larger, more flexible compounds, which are also relevant to drug discovery, including therapeutic peptides, macrocycles, and inhibitors of protein-protein interactions. Here, we investigate extensively mainstream conformational sampling methods on three carefully curated compound sets, namely the 'Drug-like', larger 'Flexible', and 'Macrocycle' compounds. These test molecules are chemically diverse with reliable X-ray protein-bound bioactive structures. The compared sampling methods include Stochastic Search and the recent LowModeMD from MOE, all the low-mode based approaches from MacroModel, and MD/LLMOD recently developed for macrocycles. In addition to default settings, key parameters of the sampling protocols were explored. The performance of the computational protocols was assessed via (i) the reproduction of the X-ray bioactive structures, (ii) the size, coverage and diversity of the output conformational ensembles, (iii) the compactness/extendedness of the conformers, and (iv) the ability to locate the global energy minimum. The influence of the stochastic nature of the searches on the results was also examined. Much better results were obtained by adopting search parameters enhanced over the default settings, while maintaining computational tractability. In MOE, the recent LowModeMD emerged as the method of choice. Mixed torsional/low-mode from MacroModel performed as well as LowModeMD, and MD/LLMOD performed well for macrocycles. The low-mode based approaches yielded very encouraging results with the flexible and macrocycle sets. Thus, one can productively tackle the computational conformational search of larger flexible compounds for drug discovery, including macrocycles. Copyright © 2013 Elsevier Ltd. All rights reserved.

  8. Flow Cytometry: Impact on Early Drug Discovery.

    PubMed

    Edwards, Bruce S; Sklar, Larry A

    2015-07-01

    Modern flow cytometers can make optical measurements of 10 or more parameters per cell at tens of thousands of cells per second and more than five orders of magnitude dynamic range. Although flow cytometry is used in most drug discovery stages, "sip-and-spit" sampling technology has restricted it to low-sample-throughput applications. The advent of HyperCyt sampling technology has recently made possible primary screening applications in which tens of thousands of compounds are analyzed per day. Target-multiplexing methodologies in combination with extended multiparameter analyses enable profiling of lead candidates early in the discovery process, when the greatest numbers of candidates are available for evaluation. The ability to sample small volumes with negligible waste reduces reagent costs, compound usage, and consumption of cells. Improved compound library formatting strategies can further extend primary screening opportunities when samples are scarce. Dozens of targets have been screened in 384- and 1536-well assay formats, predominantly in academic screening lab settings. In concert with commercial platform evolution and trending drug discovery strategies, HyperCyt-based systems are now finding their way into mainstream screening labs. Recent advances in flow-based imaging, mass spectrometry, and parallel sample processing promise dramatically expanded single-cell profiling capabilities to bolster systems-level approaches to drug discovery. © 2015 Society for Laboratory Automation and Screening.

  9. Flow Cytometry: Impact On Early Drug Discovery

    PubMed Central

    Edwards, Bruce S.; Sklar, Larry A.

    2015-01-01

    Modern flow cytometers can make optical measurements of 10 or more parameters per cell at tens-of-thousands of cells per second and over five orders of magnitude dynamic range. Although flow cytometry is used in most drug discovery stages, “sip-and-spit” sampling technology has restricted it to low sample throughput applications. The advent of HyperCyt sampling technology has recently made possible primary screening applications in which tens-of-thousands of compounds are analyzed per day. Target-multiplexing methodologies in combination with extended multi-parameter analyses enable profiling of lead candidates early in the discovery process, when the greatest numbers of candidates are available for evaluation. The ability to sample small volumes with negligible waste reduces reagent costs, compound usage and consumption of cells. Improved compound library formatting strategies can further extend primary screening opportunities when samples are scarce. Dozens of targets have been screened in 384- and 1536-well assay formats, predominantly in academic screening lab settings. In concert with commercial platform evolution and trending drug discovery strategies, HyperCyt-based systems are now finding their way into mainstream screening labs. Recent advances in flow-based imaging, mass spectrometry and parallel sample processing promise dramatically expanded single cell profiling capabilities to bolster systems level approaches to drug discovery. PMID:25805180

  10. Open science resources for the discovery and analysis of Tara Oceans data

    PubMed Central

    Pesant, Stéphane; Not, Fabrice; Picheral, Marc; Kandels-Lewis, Stefanie; Le Bescot, Noan; Gorsky, Gabriel; Iudicone, Daniele; Karsenti, Eric; Speich, Sabrina; Troublé, Romain; Dimier, Céline; Searson, Sarah; Acinas, Silvia G.; Bork, Peer; Boss, Emmanuel; Bowler, Chris; Vargas, Colomban De; Follows, Michael; Gorsky, Gabriel; Grimsley, Nigel; Hingamp, Pascal; Iudicone, Daniele; Jaillon, Olivier; Kandels-Lewis, Stefanie; Karp-Boss, Lee; Karsenti, Eric; Krzic, Uros; Not, Fabrice; Ogata, Hiroyuki; Pesant, Stéphane; Raes, Jeroen; Reynaud, Emmanuel G.; Sardet, Christian; Sieracki, Mike; Speich, Sabrina; Stemmann, Lars; Sullivan, Matthew B.; Sunagawa, Shinichi; Velayoudon, Didier; Weissenbach, Jean; Wincker, Patrick

    2015-01-01

    The Tara Oceans expedition (2009–2013) sampled contrasting ecosystems of the world oceans, collecting environmental data and plankton, from viruses to metazoans, for later analysis using modern sequencing and state-of-the-art imaging technologies. It surveyed 210 ecosystems in 20 biogeographic provinces, collecting over 35,000 samples of seawater and plankton. The interpretation of such an extensive collection of samples in their ecological context requires means to explore, assess and access raw and validated data sets. To address this challenge, the Tara Oceans Consortium offers open science resources, including the use of open access archives for nucleotides (ENA) and for environmental, biogeochemical, taxonomic and morphological data (PANGAEA), and the development of on line discovery tools and collaborative annotation tools for sequences and images. Here, we present an overview of Tara Oceans Data, and we provide detailed registries (data sets) of all campaigns (from port-to-port), stations and sampling events. PMID:26029378

  11. Open science resources for the discovery and analysis of Tara Oceans data

    NASA Astrophysics Data System (ADS)

    2015-05-01

    The Tara Oceans expedition (2009-2013) sampled contrasting ecosystems of the world oceans, collecting environmental data and plankton, from viruses to metazoans, for later analysis using modern sequencing and state-of-the-art imaging technologies. It surveyed 210 ecosystems in 20 biogeographic provinces, collecting over 35,000 samples of seawater and plankton. The interpretation of such an extensive collection of samples in their ecological context requires means to explore, assess and access raw and validated data sets. To address this challenge, the Tara Oceans Consortium offers open science resources, including the use of open access archives for nucleotides (ENA) and for environmental, biogeochemical, taxonomic and morphological data (PANGAEA), and the development of on line discovery tools and collaborative annotation tools for sequences and images. Here, we present an overview of Tara Oceans Data, and we provide detailed registries (data sets) of all campaigns (from port-to-port), stations and sampling events.

  12. Open science resources for the discovery and analysis of Tara Oceans data.

    PubMed

    Pesant, Stéphane; Not, Fabrice; Picheral, Marc; Kandels-Lewis, Stefanie; Le Bescot, Noan; Gorsky, Gabriel; Iudicone, Daniele; Karsenti, Eric; Speich, Sabrina; Troublé, Romain; Dimier, Céline; Searson, Sarah

    2015-01-01

    The Tara Oceans expedition (2009-2013) sampled contrasting ecosystems of the world oceans, collecting environmental data and plankton, from viruses to metazoans, for later analysis using modern sequencing and state-of-the-art imaging technologies. It surveyed 210 ecosystems in 20 biogeographic provinces, collecting over 35,000 samples of seawater and plankton. The interpretation of such an extensive collection of samples in their ecological context requires means to explore, assess and access raw and validated data sets. To address this challenge, the Tara Oceans Consortium offers open science resources, including the use of open access archives for nucleotides (ENA) and for environmental, biogeochemical, taxonomic and morphological data (PANGAEA), and the development of on line discovery tools and collaborative annotation tools for sequences and images. Here, we present an overview of Tara Oceans Data, and we provide detailed registries (data sets) of all campaigns (from port-to-port), stations and sampling events.

  13. Study design and data analysis considerations for the discovery of prognostic molecular biomarkers: a case study of progression free survival in advanced serous ovarian cancer.

    PubMed

    Qin, Li-Xuan; Levine, Douglas A

    2016-06-10

    Accurate discovery of molecular biomarkers that are prognostic of a clinical outcome is an important yet challenging task, partly due to the combination of the typically weak genomic signal for a clinical outcome and the frequently strong noise due to microarray handling effects. Effective strategies to resolve this challenge are in dire need. We set out to assess the use of careful study design and data normalization for the discovery of prognostic molecular biomarkers. Taking progression free survival in advanced serous ovarian cancer as an example, we conducted empirical analysis on two sets of microRNA arrays for the same set of tumor samples: arrays in one set were collected using careful study design (that is, uniform handling and randomized array-to-sample assignment) and arrays in the other set were not. We found that (1) handling effects can confound the clinical outcome under study as a result of chance even with randomization, (2) the level of confounding handling effects can be reduced by data normalization, and (3) good study design cannot be replaced by post-hoc normalization. In addition, we provided a practical approach to define positive and negative control markers for detecting handling effects and assessing the performance of a normalization method. Our work showcased the difficulty of finding prognostic biomarkers for a clinical outcome of weak genomic signals, illustrated the benefits of careful study design and data normalization, and provided a practical approach to identify handling effects and select a beneficial normalization method. Our work calls for careful study design and data analysis for the discovery of robust and translatable molecular biomarkers.

  14. Simultaneous isoform discovery and quantification from RNA-seq.

    PubMed

    Hiller, David; Wong, Wing Hung

    2013-05-01

    RNA sequencing is a recent technology which has seen an explosion of methods addressing all levels of analysis, from read mapping to transcript assembly to differential expression modeling. In particular the discovery of isoforms at the transcript assembly stage is a complex problem and current approaches suffer from various limitations. For instance, many approaches use graphs to construct a minimal set of isoforms which covers the observed reads, then perform a separate algorithm to quantify the isoforms, which can result in a loss of power. Current methods also use ad-hoc solutions to deal with the vast number of possible isoforms which can be constructed from a given set of reads. Finally, while the need of taking into account features such as read pairing and sampling rate of reads has been acknowledged, most existing methods do not seamlessly integrate these features as part of the model. We present Montebello, an integrated statistical approach which performs simultaneous isoform discovery and quantification by using a Monte Carlo simulation to find the most likely isoform composition leading to a set of observed reads. We compare Montebello to Cufflinks, a popular isoform discovery approach, on a simulated data set and on 46.3 million brain reads from an Illumina tissue panel. On this data set Montebello appears to offer a modest improvement over Cufflinks when considering discovery and parsimony metrics. In addition Montebello mitigates specific difficulties inherent in the Cufflinks approach. Finally, Montebello can be fine-tuned depending on the type of solution desired.

  15. Use of eQTL Analysis for the Discovery of Target Genes Identified by GWAS

    DTIC Science & Technology

    2014-04-01

    technology. Cases having a RIN number of 7.0 or greater were considered good quality. Once completed, the optimum set of 500 samples were then selected for...AD_________________ Award Number: W81XWH-11-1-0261 TITLE: Use of eQTL Analysis for the Discovery...Distribution Unlimited The views, opinions and/or findings contained in this report are those of the author(s) and

  16. An analysis of gene expression in PTSD implicates genes involved in the glucocorticoid receptor pathway and neural responses to stress

    PubMed Central

    Logue, Mark W.; Smith, Alicia K.; Baldwin, Clinton; Wolf, Erika J.; Guffanti, Guia; Ratanatharathorn, Andrew; Stone, Annjanette; Schichman, Steven A.; Humphries, Donald; Binder, Elisabeth B.; Arloth, Janine; Menke, Andreas; Uddin, Monica; Wildman, Derek; Galea, Sandro; Aiello, Allison E.; Koenen, Karestan C.; Miller, Mark W.

    2015-01-01

    We examined the association between posttraumatic stress disorder (PTSD) and gene expression using whole blood samples from a cohort of trauma-exposed white non-Hispanic male veterans (115 cases and 28 controls). 10,264 probes of genes and gene transcripts were analyzed. We found 41 that were differentially expressed in PTSD cases versus controls (multiple-testing corrected p<0.05). The most significant was DSCAM, a neurological gene expressed widely in the developing brain and in the amygdala and hippocampus of the adult brain. We then examined the 41 differentially expressed genes in a meta-analysis using two replication cohorts and found significant associations with PTSD for 7 of the 41 (p<0.05), one of which (ATP6AP1L) survived multiple-testing correction. There was also broad evidence of overlap across the discovery and replication samples for the entire set of genes implicated in the discovery data based on the direction of effect and an enrichment of p<0.05 significant probes beyond what would be expected under the null. Finally, we found that the set of differentially expressed genes from the discovery sample was enriched for genes responsive to glucocorticoid signaling with most showing reduced expression in PTSD cases compared to controls. PMID:25867994

  17. Better cancer biomarker discovery through better study design.

    PubMed

    Rundle, Andrew; Ahsan, Habibul; Vineis, Paolo

    2012-12-01

    High-throughput laboratory technologies coupled with sophisticated bioinformatics algorithms have tremendous potential for discovering novel biomarkers, or profiles of biomarkers, that could serve as predictors of disease risk, response to treatment or prognosis. We discuss methodological issues in wedding high-throughput approaches for biomarker discovery with the case-control study designs typically used in biomarker discovery studies, especially focusing on nested case-control designs. We review principles for nested case-control study design in relation to biomarker discovery studies and describe how the efficiency of biomarker discovery can be affected by study design choices. We develop a simulated prostate cancer cohort data set and a series of biomarker discovery case-control studies nested within the cohort to illustrate how study design choices can influence the biomarker discovery process. Common elements of nested case-control design, incidence density sampling and matching of controls to cases, are not typically factored correctly into biomarker discovery analyses, inducing bias in the discovery process. We illustrate how incidence density sampling and matching of controls to cases reduce the apparent specificity of truly valid biomarkers 'discovered' in a nested case-control study. We also propose and demonstrate a new case-control matching protocol, which we call 'antimatching', that improves the efficiency of biomarker discovery studies. For valid but as yet undiscovered biomarker(s), disjunctions between correctly designed epidemiologic studies and the practice of biomarker discovery reduce the likelihood that true biomarker(s) will be discovered and increase the false-positive discovery rate. © 2012 The Authors. European Journal of Clinical Investigation © 2012 Stichting European Society for Clinical Investigation Journal Foundation.

  18. Oncology biomarkers: discovery, validation, and clinical use.

    PubMed

    Heckman-Stoddard, Brandy M

    2012-05-01

    To discuss the discovery, validation, and clinical use of multiple types of biomarkers. Medical literature and published guidelines. Formal validation of biomarkers should include both retrospective analyses of well-characterized samples as well as a prospective clinical trial in which the biomarker is tested for its ability to predict the presence of disease or the efficacy of a cancer therapy. Biomarker development is complicated, with very few biomarker discoveries leading to clinically useful tests. Nurses should understand how a biomarker was developed, including the sensitivity and specificity before applying new biomarkers in the clinical setting. Copyright © 2012. Published by Elsevier Inc.

  19. MicroRNAs for Detection of Pancreatic Neoplasia

    PubMed Central

    Vila-Navarro, Elena; Vila-Casadesús, Maria; Moreira, Leticia; Duran-Sanchon, Saray; Sinha, Rupal; Ginés, Àngels; Fernández-Esparrach, Glòria; Miquel, Rosa; Cuatrecasas, Miriam; Castells, Antoni; Lozano, Juan José; Gironella, Meritxell

    2017-01-01

    Objective: The aim of our study was to analyze the miRNome of pancreatic ductal adenocarcinoma (PDAC) and its preneoplastic lesion intraductal papillary mucinous neoplasm (IPMN), to find new microRNA (miRNA)-based biomarkers for early detection of pancreatic neoplasia. Objective: Effective early detection methods for PDAC are needed. miRNAs are good biomarker candidates. Methods: Pancreatic tissues (n = 165) were obtained from patients with PDAC, IPMN, or from control individuals (C), from Hospital Clínic of Barcelona. Biomarker discovery was done using next-generation sequencing in a discovery set of 18 surgical samples (11 PDAC, 4 IPMN, 3 C). MiRNA validation was carried out by quantitative reverse transcriptase PCR in 2 different sets of samples: set 1 comprised 52 surgical samples (24 PDAC, 7 IPMN, 6 chronic pancreatitis, 15 C), and set 2 comprised 95 endoscopic ultrasound-guided fine-needle aspirations (60 PDAC, 9 IPMN, 26 C). Results: In all, 607 and 396 miRNAs were significantly deregulated in PDAC and IPMN versus C. Of them, 40 miRNAs commonly overexpressed in both PDAC and IPMN were selected for further validation. Among them, significant up-regulation of 31 and 30 miRNAs was confirmed by quantitative reverse transcriptase PCR in samples from set 1 and set 2, respectively. Conclusions: miRNome analysis shows that PDAC and IPMN have differential miRNA profiles with respect to C, with a large number of deregulated miRNAs shared by both neoplastic lesions. Indeed, we have identified and validated 30 miRNAs whose expression is significantly increased in PDAC and IPMN lesions. The feasibility of detecting these miRNAs in endoscopic ultrasound-guided fine-needle aspiration samples makes them good biomarker candidates for early detection of pancreatic cancer. PMID:27232245

  20. A pleiotropy-informed Bayesian false discovery rate adapted to a shared control design finds new disease associations from GWAS summary statistics.

    PubMed

    Liley, James; Wallace, Chris

    2015-02-01

    Genome-wide association studies (GWAS) have been successful in identifying single nucleotide polymorphisms (SNPs) associated with many traits and diseases. However, at existing sample sizes, these variants explain only part of the estimated heritability. Leverage of GWAS results from related phenotypes may improve detection without the need for larger datasets. The Bayesian conditional false discovery rate (cFDR) constitutes an upper bound on the expected false discovery rate (FDR) across a set of SNPs whose p values for two diseases are both less than two disease-specific thresholds. Calculation of the cFDR requires only summary statistics and has several advantages over traditional GWAS analysis. However, existing methods require distinct control samples between studies. Here, we extend the technique to allow for some or all controls to be shared, increasing applicability. Several different SNP sets can be defined with the same cFDR value, and we show that the expected FDR across the union of these sets may exceed the expected FDR in any single set. We describe a procedure to establish an upper bound for the expected FDR among the union of such sets of SNPs. We apply our technique to pairwise analysis of p values from ten autoimmune diseases with variable sharing of controls, enabling discovery of 59 SNP-disease associations which do not reach GWAS significance after genomic control in individual datasets. Most of the SNPs we highlight have previously been confirmed using replication studies or larger GWAS, a useful validation of our technique; we report eight SNP-disease associations across five diseases not previously declared. Our technique extends and strengthens the previous algorithm, and establishes robust limits on the expected FDR. This approach can improve SNP detection in GWAS, and give insight into shared aetiology between phenotypically related conditions.

  1. Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets.

    PubMed

    Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A

    2014-01-01

    Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets. We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previously published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. A consistency between both results was also observed. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method. This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.

  2. SELDI-TOF MS of quadruplicate urine and serum samples to evaluate changes related to storage conditions.

    PubMed

    Traum, Avram Z; Wells, Meghan P; Aivado, Manuel; Libermann, Towia A; Ramoni, Marco F; Schachter, Asher D

    2006-03-01

    Proteomic profiling with SELDI-TOF MS has facilitated the discovery of disease-specific protein profiles. However, multicenter studies are often hindered by the logistics required for prompt deep-freezing of samples in liquid nitrogen or dry ice within the clinic setting prior to shipping. We report high concordance between MS profiles within sets of quadruplicate split urine and serum samples deep-frozen at 0, 2, 6, and 24 h after sample collection. Gage R&R results confirm that deep-freezing times are not a statistically significant source of SELDI-TOF MS variability for either blood or urine.

  3. Cloud-based solution to identify statistically significant MS peaks differentiating sample categories.

    PubMed

    Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B

    2013-03-23

    Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. Multiple challenges in protein MS data analysis remain: large-scale and complex data set management; MS peak identification and indexing; and high-dimensional peak differential analysis with concurrent false discovery rate (FDR) estimation based on statistical tests. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets to identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The web application supports online uploading and analysis of large-scale MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.

  4. Mining the human urine proteome for monitoring renal transplant injury

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sigdel, Tara K.; Gao, Yuqian; He, Jintang

    The human urinary proteome reflects systemic and inherent renal injury perturbations and can be analyzed to harness specific biomarkers for different kidney transplant injury states. 396 unique urine samples were collected contemporaneously with an allograft biopsy from 396 unique kidney transplant recipients. Centralized, blinded histology on the graft was used to classify matched urine samples into categories of acute rejection (AR), chronic allograft nephropathy (CAN), BK virus nephritis (BKVN), and stable graft (STA). Liquid chromatography–mass spectrometry (LC-MS) based proteomics using iTRAQ based discovery (n=108) and global label-free LC-MS analyses of individual samples (n=137) for quantitative proteome assessment were used in the discovery step. Selected reaction monitoring (SRM) was applied to identify and validate minimal urine protein/peptide biomarkers to accurately segregate organ injury causation and pathology on unique urine samples (n=151). A total of 958 proteins were initially quantified by iTRAQ, 87% of which were also identified among 1574 urine proteins detected in LC-MS validation. 103 urine proteins were significantly (p<0.05) perturbed in injury and enriched for humoral immunity, complement activation, and lymphocyte trafficking. A set of 131 peptides corresponding to 78 proteins were assessed by SRM for their significance in an independent sample cohort. A minimal set of 35 peptides mapping to 33 proteins were modeled to segregate different injury groups (AUC = 93% for AR, 99% for CAN, 83% for BKVN). Urinary proteome discovery and targeted validation identified urine protein fingerprints for non-invasive differentiation of kidney transplant injuries, thus opening the door for personalized immune risk assessment and therapy.

  5. Enhancing the chemical selectivity in discovery-based analysis with tandem ionization time-of-flight mass spectrometry detection for comprehensive two-dimensional gas chromatography.

    PubMed

    Freye, Chris E; Moore, Nicholas R; Synovec, Robert E

    2018-02-16

    The complementary information provided by tandem ionization time-of-flight mass spectrometry (TI-TOFMS) is investigated for comparative discovery-based analysis, when coupled with comprehensive two-dimensional gas chromatography (GC × GC). The TI conditions implemented were a hard ionization energy (70 eV) concurrently collected with a soft ionization energy (14 eV). Tile-based Fisher ratio (F-ratio) analysis is used to analyze diesel fuel spiked with twelve analytes at a nominal concentration of 50 ppm. F-ratio analysis is a supervised discovery-based technique that compares two different sample classes, in this case spiked and unspiked diesel, to reduce the complex GC × GC-TI-TOFMS data into a hit list of class-distinguishing analyte features. Hit lists of the 70 eV and 14 eV data sets, and the single hit list produced when the two data sets are fused together, are all investigated. For the 70 eV hit list, eleven of the twelve analytes were found in the top thirteen hits. For the 14 eV hit list, nine of the twelve analytes were found in the top nine hits, with the other three analytes either not found or well down the hit list. As expected, the F-ratios per m/z used to calculate each average F-ratio per hit were generally smaller fragment ions for the 70 eV data set, while the larger fragment ions were emphasized in the 14 eV data set, supporting the notion that complementary information was provided. The discovery rate was improved when F-ratio analysis was performed on the fused data sets, resulting in eleven of the twelve analytes being at the top of the single hit list. Using PARAFAC, analytes that were "discovered" were deconvoluted in order to obtain their identification via match values (MV). Location of the analytes and the "F-ratio spectra" obtained from F-ratio analysis were used to guide the deconvolution. Eight of the twelve analytes were successfully deconvoluted and identified using the in-house library for the 70 eV data set. PARAFAC deconvolution of the two separate data sets provided increased confidence in identification of "discovered" analytes. Herein, we explore the limit of analyte discovery and limit of analyte identification, and demonstrate a general workflow for the investigation of key chemical features in complex samples. Copyright © 2018 Elsevier B.V. All rights reserved.
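
    The Fisher ratio at the core of the analysis above can be sketched for a single feature as between-class variance divided by pooled within-class variance. This is only the basic statistic under that textbook definition; the tiling, per-m/z averaging, and hit-list ranking described in the abstract are not reproduced, and the function name and example values are hypothetical.

```python
from statistics import mean


def fisher_ratio(classes):
    """Between-class variance over pooled within-class variance.

    `classes` is a list of lists, one list of replicate signal values
    per sample class (e.g. spiked vs. unspiked).
    """
    k = len(classes)
    n = sum(len(c) for c in classes)
    grand = mean([x for c in classes for x in c])
    # Variance of the class means around the grand mean
    between = sum(len(c) * (mean(c) - grand) ** 2 for c in classes) / (k - 1)
    # Pooled variance of replicates around their own class mean
    within = sum((x - mean(c)) ** 2 for c in classes for x in c) / (n - k)
    return between / within


# Hypothetical replicate signals for one feature in two classes
spiked = [4.0, 5.0, 6.0]
unspiked = [1.0, 2.0, 3.0]
ratio = fisher_ratio([spiked, unspiked])
```

    A feature whose replicates are identical within every class gives a zero denominator, so a real implementation would need to guard against that case.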

  6. Blood Sampling and Preparation Procedures for Proteomic Biomarker Studies of Psychiatric Disorders.

    PubMed

    Guest, Paul C; Rahmoune, Hassan

    2017-01-01

    A major challenge in proteomic biomarker discovery and validation for psychiatric diseases is the inherent biological complexity underlying these conditions. There are also many technical issues which hinder this process such as the lack of standardization in sampling, processing and storage of bio-samples in preclinical and clinical settings. This chapter describes a reproducible procedure for sampling blood serum and plasma that is specifically designed for maximizing data quality output in two-dimensional gel electrophoresis, multiplex immunoassay and mass spectrometry profiling studies.

  7. The Influence of Metabolic Syndrome and Sex on the DNA Methylome in Schizophrenia

    PubMed Central

    Lines, Brittany N.

    2018-01-01

    Introduction: The mechanism by which metabolic syndrome occurs in schizophrenia is not completely known; however, previous work suggests that changes in DNA methylation may be involved, which is further influenced by sex. Within this study, the DNA methylome was profiled to identify altered methylation associated with metabolic syndrome in a schizophrenia population on atypical antipsychotics. Methods: Peripheral blood from schizophrenia subjects was utilized for DNA methylation analyses. Discovery analyses (n = 96) were performed using an epigenome-wide analysis on the Illumina HumanMethylation450K BeadChip based on metabolic syndrome diagnosis. A secondary discovery analysis was conducted based on sex. The top hits from the discovery analyses were assessed in an additional validation set (n = 166) using site-specific methylation pyrosequencing. Results: A significant increase in CDH22 gene methylation in subjects with metabolic syndrome was identified in the overall sample. Additionally, differential methylation was found within the MAP3K13 gene in females and the CCDC8 gene within males. Significant differences in methylation were again observed for the CDH22 and MAP3K13 genes, but not CCDC8, in the validation sample set. Conclusions: This study provides preliminary evidence that DNA methylation may be associated with metabolic syndrome and sex in schizophrenia. PMID:29850476

  8. RNA-Seq-based toxicogenomic assessment of fresh frozen and formalin-fixed tissues yields similar mechanistic insights.

    PubMed

    Auerbach, Scott S; Phadke, Dhiral P; Mav, Deepak; Holmgren, Stephanie; Gao, Yuan; Xie, Bin; Shin, Joo Heon; Shah, Ruchir R; Merrick, B Alex; Tice, Raymond R

    2015-07-01

    Formalin-fixed, paraffin-embedded (FFPE) pathology specimens represent a potentially vast resource for transcriptomic-based biomarker discovery. We present here a comparison of results from a whole transcriptome RNA-Seq analysis of RNA extracted from fresh frozen and FFPE livers. The samples were derived from rats exposed to aflatoxin B1 (AFB1) and a corresponding set of control animals. Principal components analysis indicated that samples were separated in the two groups representing presence or absence of chemical exposure, both in fresh frozen and FFPE sample types. Sixty-five percent of the differentially expressed transcripts (AFB1 vs. controls) in fresh frozen samples were also differentially expressed in FFPE samples (overlap significance: P < 0.0001). Genomic signature and gene set analysis of AFB1 differentially expressed transcript lists indicated highly similar results between fresh frozen and FFPE at the level of chemogenomic signatures (i.e., single chemical/dose/duration elicited transcriptomic signatures), mechanistic and pathology signatures, biological processes, canonical pathways and transcription factor networks. Overall, our results suggest that similar hypotheses about the biological mechanism of toxicity would be formulated from fresh frozen and FFPE samples. These results indicate that phenotypically anchored archival specimens represent a potentially informative resource for signature-based biomarker discovery and mechanistic characterization of toxicity. Copyright © 2014 John Wiley & Sons, Ltd.

  9. Micro-CAI in Education: Some Considerations.

    ERIC Educational Resources Information Center

    Majsterek, David

    This paper focuses on the applications which best suit the microcomputer in an educational setting with emphasis on adapting effective pedagogical practice to the computer's programability and delivery capabilities. Discovery learning and "being told" are identified as two types of computer assisted instruction (CAI) and sample uses of…

  10. A genome-wide association study of anorexia nervosa.

    PubMed

    Boraska, V; Franklin, C S; Floyd, J A B; Thornton, L M; Huckins, L M; Southam, L; Rayner, N W; Tachmazidou, I; Klump, K L; Treasure, J; Lewis, C M; Schmidt, U; Tozzi, F; Kiezebrink, K; Hebebrand, J; Gorwood, P; Adan, R A H; Kas, M J H; Favaro, A; Santonastaso, P; Fernández-Aranda, F; Gratacos, M; Rybakowski, F; Dmitrzak-Weglarz, M; Kaprio, J; Keski-Rahkonen, A; Raevuori, A; Van Furth, E F; Slof-Op 't Landt, M C T; Hudson, J I; Reichborn-Kjennerud, T; Knudsen, G P S; Monteleone, P; Kaplan, A S; Karwautz, A; Hakonarson, H; Berrettini, W H; Guo, Y; Li, D; Schork, N J; Komaki, G; Ando, T; Inoko, H; Esko, T; Fischer, K; Männik, K; Metspalu, A; Baker, J H; Cone, R D; Dackor, J; DeSocio, J E; Hilliard, C E; O'Toole, J K; Pantel, J; Szatkiewicz, J P; Taico, C; Zerwas, S; Trace, S E; Davis, O S P; Helder, S; Bühren, K; Burghardt, R; de Zwaan, M; Egberts, K; Ehrlich, S; Herpertz-Dahlmann, B; Herzog, W; Imgart, H; Scherag, A; Scherag, S; Zipfel, S; Boni, C; Ramoz, N; Versini, A; Brandys, M K; Danner, U N; de Kovel, C; Hendriks, J; Koeleman, B P C; Ophoff, R A; Strengman, E; van Elburg, A A; Bruson, A; Clementi, M; Degortes, D; Forzan, M; Tenconi, E; Docampo, E; Escaramís, G; Jiménez-Murcia, S; Lissowska, J; Rajewski, A; Szeszenia-Dabrowska, N; Slopien, A; Hauser, J; Karhunen, L; Meulenbelt, I; Slagboom, P E; Tortorella, A; Maj, M; Dedoussis, G; Dikeos, D; Gonidakis, F; Tziouvas, K; Tsitsika, A; Papezova, H; Slachtova, L; Martaskova, D; Kennedy, J L; Levitan, R D; Yilmaz, Z; Huemer, J; Koubek, D; Merl, E; Wagner, G; Lichtenstein, P; Breen, G; Cohen-Woods, S; Farmer, A; McGuffin, P; Cichon, S; Giegling, I; Herms, S; Rujescu, D; Schreiber, S; Wichmann, H-E; Dina, C; Sladek, R; Gambaro, G; Soranzo, N; Julia, A; Marsal, S; Rabionet, R; Gaborieau, V; Dick, D M; Palotie, A; Ripatti, S; Widén, E; Andreassen, O A; Espeseth, T; Lundervold, A; Reinvang, I; Steen, V M; Le Hellard, S; Mattingsdal, M; Ntalla, I; Bencko, V; Foretova, L; Janout, V; Navratilova, M; Gallinger, S; Pinto, D; 
Scherer, S W; Aschauer, H; Carlberg, L; Schosser, A; Alfredsson, L; Ding, B; Klareskog, L; Padyukov, L; Courtet, P; Guillaume, S; Jaussent, I; Finan, C; Kalsi, G; Roberts, M; Logan, D W; Peltonen, L; Ritchie, G R S; Barrett, J C; Estivill, X; Hinney, A; Sullivan, P F; Collier, D A; Zeggini, E; Bulik, C M

    2014-10-01

    Anorexia nervosa (AN) is a complex and heritable eating disorder characterized by dangerously low body weight. Neither candidate gene studies nor an initial genome-wide association study (GWAS) have yielded significant and replicated results. We performed a GWAS in 2907 cases with AN from 14 countries (15 sites) and 14 860 ancestrally matched controls as part of the Genetic Consortium for AN (GCAN) and the Wellcome Trust Case Control Consortium 3 (WTCCC3). Individual association analyses were conducted in each stratum and meta-analyzed across all 15 discovery data sets. Seventy-six (72 independent) single nucleotide polymorphisms were taken forward for in silico (two data sets) or de novo (13 data sets) replication genotyping in 2677 independent AN cases and 8629 European ancestry controls along with 458 AN cases and 421 controls from Japan. The final global meta-analysis across discovery and replication data sets comprised 5551 AN cases and 21 080 controls. AN subtype analyses (1606 AN restricting; 1445 AN binge-purge) were performed. No findings reached genome-wide significance. Two intronic variants were suggestively associated: rs9839776 (P = 3.01 × 10^-7) in SOX2OT and rs17030795 (P = 5.84 × 10^-6) in PPP3CA. Two additional signals were specific to Europeans: rs1523921 (P = 5.76 × 10^-6) between CUL3 and FAM124B and rs1886797 (P = 8.05 × 10^-6) near SPATA13. Comparing discovery with replication results, 76% of the effects were in the same direction, an observation highly unlikely to be due to chance (P = 4 × 10^-6), strongly suggesting that true findings exist but our sample, the largest yet reported, was underpowered for their detection. The accrual of large genotyped AN case-control samples should be an immediate priority for the field.
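
    The direction-of-effect comparison mentioned above is commonly assessed with a one-sided sign test: under the null hypothesis, each replicated effect matches the discovery direction with probability 0.5. The sketch below computes that exact binomial tail. It is an assumption that the authors used a test of this form, and the counts in the usage lines are hypothetical because the abstract reports only the percentage.

```python
from math import comb


def sign_test_upper_tail(n_concordant, n_total):
    """Exact P(X >= n_concordant) for X ~ Binomial(n_total, 0.5)."""
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    return tail / 2 ** n_total


# Hypothetical counts: 55 of 72 effects in the same direction
p_value = sign_test_upper_tail(55, 72)
```

    Using exact integer binomial coefficients avoids the normal approximation, which matters little at this sample size but keeps the sketch dependency-free.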

  11. Performance comparison of SNP detection tools with illumina exome sequencing data—an assessment using both family pedigree information and sample-matched SNP array data

    PubMed Central

    Yi, Ming; Zhao, Yongmei; Jia, Li; He, Mei; Kebebew, Electron; Stephens, Robert M.

    2014-01-01

    To apply exome-seq-derived variants in the clinical setting, there is an urgent need to identify the best variant caller(s) from a large collection of available options. We have used an Illumina exome-seq dataset as a benchmark, with two validation scenarios—family pedigree information and SNP array data for the same samples, permitting global high-throughput cross-validation, to evaluate the quality of SNP calls derived from several popular variant discovery tools from both the open-source and commercial communities using a set of designated quality metrics. To the best of our knowledge, this is the first large-scale performance comparison of exome-seq variant discovery tools using high-throughput validation with both Mendelian inheritance checking and SNP array data, which allows us to gain insights into the accuracy of SNP calling through such high-throughput validation in an unprecedented way, whereas the previously reported comparison studies have only assessed concordance of these tools without directly assessing the quality of the derived SNPs. More importantly, the main purpose of our study was to establish a reusable procedure that applies high-throughput validation to compare the quality of SNP discovery tools with a focus on exome-seq, which can be used to compare any forthcoming tool(s) of interest. PMID:24831545

  12. Global Characterization of Protein Altering Mutations in Prostate Cancer

    DTIC Science & Technology

    2011-08-01

    integrative analyses of somatic mutation with gene expression and copy number change data collected on the same samples. To date, we have performed...implications for resistance to cancer therapeutics. We have also identified a subset of genes that appear to be recurrently mutated in our discovery set, and...integrative analyses of somatic mutation with gene expression and copy number change data collected on the same samples. Body This is a “synergy” project

  13. An Integrated Clinico-transcriptomic Approach Identifies a Central Role of the Heme Degradation Pathway for Septic Complications after Trauma.

    PubMed

    Rittirsch, Daniel; Schoenborn, Veit; Lindig, Sandro; Wanner, Elisabeth; Sprengel, Kai; Günkel, Sebastian; Blaess, Markus; Schaarschmidt, Barbara; Sailer, Patricia; Märsmann, Sonja; Simmen, Hans-Peter; Cinelli, Paolo; Bauer, Michael; Claus, Ralf A; Wanner, Guido A

    2016-12-01

    The present study aimed to identify mechanisms linked to complicated courses and adverse events after severe trauma by a systems biology approach. In severe trauma, overwhelming systemic inflammation can result in additional damage and the development of complications, including sepsis. In a prospective, longitudinal single-center study, RNA samples from circulating leukocytes from patients with multiple injury (injury severity score ≥17 points; n = 81) were analyzed for dynamic changes in gene expression over a period of 21 days by whole-genome screening (discovery set; n = 10 patients; 90 samples) and quantitative RT-PCR (validation set; n = 71 patients, 517 samples). Multivariate correlational analysis of transcripts and clinical parameters was used to identify mechanisms related to sepsis. Transcriptome profiling of the discovery set revealed the strongest changes between patients with either systemic inflammation or sepsis in gene expression of the heme degradation pathway. Using quantitative RT-PCR analyses (validation set), the key components haptoglobin (HP), cluster of differentiation (CD) 163, heme oxygenase-1 (HMOX1), and biliverdin reductase A (BLVRA) showed robust changes following trauma. Upregulation of HP was associated with the severity of systemic inflammation and the development of sepsis. Patients who received allogeneic blood transfusions had a higher incidence of nosocomial infections and sepsis, and the amount of blood transfusion as source of free heme correlated with the expression pattern of HP. These findings indicate that the heme degradation pathway is associated with increased susceptibility to septic complications after trauma, which is indicated by HP expression in particular.

  14. Turning publicly available gene expression data into discoveries using gene set context analysis.

    PubMed

    Ji, Zhicheng; Vokes, Steven A; Dang, Chi V; Ji, Hongkai

    2016-01-08

    Gene Set Context Analysis (GSCA) is an open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g. different cells, tissues and disease types, etc.). By providing one or multiple genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with the specified gene set activity pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI). The GUI makes the analysis convenient and customizable. Analysis results can be conveniently exported as publication quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Association of High Myopia with Crystallin Beta A4 (CRYBA4) Gene Polymorphisms in the Linkage-Identified MYP6 Locus

    PubMed Central

    Ho, Daniel W. H.; Yap, Maurice K. H.; Ng, Po Wah; Fung, Wai Yan; Yip, Shea Ping

    2012-01-01

    Background: Myopia is the most common ocular disorder worldwide and imposes a tremendous burden on society. It is a complex disease. The MYP6 locus at 22q12 is of particular interest because many studies have detected linkage signals at this interval. The MYP6 locus is likely to contain susceptibility gene(s) for myopia, but none has yet been identified. Methodology/Principal Findings: Two independent subject groups of southern Chinese in Hong Kong participated in the study: an initial study using a discovery sample set of 342 cases and 342 controls, and a follow-up study using a replication sample set of 316 cases and 313 controls. Cases with high myopia were defined by spherical equivalent ≤ -8 dioptres and emmetropic controls by spherical equivalent within ±1.00 dioptre for both eyes. Manual candidate gene selection from the MYP6 locus was supported by objective in silico prioritization. DNA samples of the discovery sample set were genotyped for 178 tagging single nucleotide polymorphisms (SNPs) from 26 genes. For replication, 25 SNPs (tagging or located at predicted transcription factor or microRNA binding sites) from 4 genes were subsequently examined using the replication sample set. Fisher P value was calculated for all SNPs and overall association results were summarized by meta-analysis. Based on initial and replication studies, rs2009066 located in the crystallin beta A4 (CRYBA4) gene was identified to be the most significantly associated with high myopia (initial study: P = 0.02; replication study: P = 1.88e-4; meta-analysis: P = 1.54e-5) among all the SNPs tested. The association result survived correction for multiple comparisons. Under the allelic genetic model for the combined sample set, the odds ratio of the minor allele G was 1.41 (95% confidence intervals, 1.21-1.64). Conclusions/Significance: A novel susceptibility gene (CRYBA4) was discovered for high myopia. Our study also signified the potential importance of appropriate gene prioritization in candidate selection. PMID:22792142
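
    An allelic odds ratio with a 95% confidence interval, as reported above, is typically derived from a 2 × 2 table of allele counts. The sketch below uses the standard log odds ratio (Woolf) interval; it is an assumption that the authors used this interval, and the allele counts are hypothetical because the abstract does not report them.

```python
from math import exp, log, sqrt


def allelic_odds_ratio(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Odds ratio of the minor allele with a Woolf (log-scale) interval.

    Returns (odds_ratio, lower, upper); z=1.96 gives a 95% interval.
    All four counts must be positive.
    """
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    # Standard error of log(OR) from the four cell counts
    se = sqrt(1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical allele counts (not taken from the study)
or_, lo, hi = allelic_odds_ratio(20, 10, 10, 20)
```

    Because the interval is symmetric on the log scale, the product of its bounds equals the squared odds ratio, which is a quick sanity check on reported values.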

  16. New molecular settings to support in vivo anti-malarial assays.

    PubMed

    Bahamontes-Rosa, Noemí; Alejandre, Ane Rodriguez; Gomez, Vanesa; Viera, Sara; Gomez-Lorenzo, María G; Sanz-Alonso, Laura María; Mendoza-Losana, Alfonso

    2016-03-08

    Quantitative real-time PCR (qPCR) is now commonly used as a method to confirm diagnosis of malaria and to differentiate recrudescence from re-infection, especially in clinical trials and in reference laboratories where precise quantification is critical. Although anti-malarial drug discovery is based on in vivo murine efficacy models, use of molecular analysis has been limited. The aim of this study was to develop qPCR as a valid methodology to support pre-clinical anti-malarial models by using filter papers to maintain material for qPCR and to compare this with traditional methods. FTA technology (Whatman) is a rapid and safe method for extracting nucleic acids from blood. Peripheral blood samples from mice infected with Plasmodium berghei, P. yoelii, or P. falciparum were kept as frozen samples or as spots on FTA cards. The extracted genetic material from both types of samples was assessed for quantification by qPCR using sets of specific primers specifically designed for Plasmodium 18S rRNA, LDH, and CytB genes. The optimal conditions for nucleic acid extraction from FTA cards and qPCR amplification were set up, and were confirmed to be suitable for parasite quantification using DNA as template after storage at room temperature for as long as 26 months in the case of P. berghei samples and 52 months for P. falciparum and P. yoelii. The quality of DNA extracted from the FTA cards for gene sequencing and microsatellite amplification was also assessed. This is the first study to report the suitability of FTA cards and qPCR assay to quantify parasite load in samples from in vivo efficacy models to support the drug discovery process.

  17. Virtual Observatories, Data Mining, and Astroinformatics

    NASA Astrophysics Data System (ADS)

    Borne, Kirk

    The historical, current, and future trends in knowledge discovery from data in astronomy are presented here. The story begins with a brief history of data gathering and data organization. A description of the development of new information science technologies for astronomical discovery is then presented. Among these are e-Science and the virtual observatory, with its data discovery, access, display, and integration protocols; astroinformatics and data mining for exploratory data analysis, information extraction, and knowledge discovery from distributed data collections; new sky surveys' databases, including rich multivariate observational parameter sets for large numbers of objects; and the emerging discipline of data-oriented astronomical research, called astroinformatics. Astroinformatics is described as the fourth paradigm of astronomical research, following the three traditional research methodologies: observation, theory, and computation/modeling. Astroinformatics research areas include machine learning, data mining, visualization, statistics, semantic science, and scientific data management. Each of these areas is now an active research discipline, with significant science-enabling applications in astronomy. Research challenges and sample research scenarios are presented in these areas, in addition to sample algorithms for data-oriented research. These information science technologies enable scientific knowledge discovery from the increasingly large and complex data collections in astronomy. The education and training of the modern astronomy student must consequently include skill development in these areas, whose practitioners have traditionally been limited to applied mathematicians, computer scientists, and statisticians. Modern astronomical researchers must cross these traditional discipline boundaries, thereby borrowing the best of breed methodologies from multiple disciplines. In the era of large sky surveys and numerous large telescopes, the potential for astronomical discovery is equally large, and so the data-oriented research methods, algorithms, and techniques that are presented here will enable the greatest discovery potential from the ever-growing data and information resources in astronomy.

  18. Empirical Bayes method for reducing false discovery rates of correlation matrices with block diagonal structure.

    PubMed

    Pacini, Clare; Ajioka, James W; Micklem, Gos

    2017-04-12

    Correlation matrices are important in inferring relationships and networks between regulatory or signalling elements in biological systems. With currently available technology, sample sizes for experiments are typically small, meaning that these correlations can be difficult to estimate. At a genome-wide scale, estimation of correlation matrices can also be computationally demanding. We develop an empirical Bayes approach to improve covariance estimates for gene expression, where we assume the covariance matrix takes a block diagonal form. Our method shows lower false discovery rates than existing methods on simulated data. Applied to a real data set from Bacillus subtilis, we demonstrate its ability to detect known regulatory units and interactions between them. We demonstrate that, compared to existing methods, our method is able to find significant covariances and also to control false discovery rates, even when the sample size is small (n=10). The method can be used to find potential regulatory networks, and it may also be used as a pre-processing step for methods that calculate, for example, partial correlations, so enabling the inference of the causal and hierarchical structure of the networks.

  19. Material discovery by combining stochastic surface walking global optimization with a neural network.

    PubMed

    Huang, Si-Da; Shang, Cheng; Zhang, Xiao-Jie; Liu, Zhi-Pan

    2017-09-01

    While the underlying potential energy surface (PES) determines the structure and other properties of a material, it has been frustrating to predict new materials from theory even with the advent of supercomputing facilities. The accuracy of the PES and the efficiency of PES sampling are two major bottlenecks, not least because of the great complexity of the material PES. This work introduces a "Global-to-Global" approach for material discovery by combining for the first time a global optimization method with neural network (NN) techniques. The novel global optimization method, named the stochastic surface walking (SSW) method, is carried out massively in parallel for generating a global training data set, the fitting of which by the atom-centered NN produces a multi-dimensional global PES; the subsequent SSW exploration of large systems with the analytical NN PES can provide key information on the thermodynamics and kinetics stability of unknown phases identified from global PESs. We describe in detail the current implementation of the SSW-NN method with particular focuses on the size of the global data set and the simultaneous energy/force/stress NN training procedure. An important functional material, TiO2, is utilized as an example to demonstrate the automated global data set generation, the improved NN training procedure and the application in material discovery. Two new TiO2 porous crystal structures are identified, which have similar thermodynamics stability to the common TiO2 rutile phase and the kinetics stability for one of them is further proved from SSW pathway sampling. As a general tool for material simulation, the SSW-NN method provides an efficient and predictive platform for large-scale computational material screening.

  20. BRSCW Reference Set Application: Joe Buechler - Biosite Inc (2009) — EDRN Public Portal

    Cancer.gov

    Over 40 marker assays are available to run on the samples. These include markers such as Osteopontin, Mesothelin, Periostin, Endoglin, intestinal Fatty Acid Binding Protein, and FAS-Ligand, some of which have been previously described in the literature. Other proprietary markers are derived from internal discovery efforts and from collaborator programs.

  1. Association between copy number variation losses and alcohol dependence across African American and European American ethnic groups.

    PubMed

    Ulloa, Alvaro E; Chen, Jiayu; Vergara, Victor M; Calhoun, Vince; Liu, Jingyu

    2014-05-01

    Copy number variations (CNVs) are structural genetic mutations consisting of segmental gains or losses in DNA sequence. Although CNVs contribute substantially to genomic variation, few genetic and imaging studies report association of CNVs with alcohol dependence (AD). Our purpose is to find evidence of this association across ethnic populations and genders. This work is the first AD-CNV study across ethnic groups and the first to include the African American (AA) population. This study considers 2 CNV data sets, one for discovery (2,345 samples) and the other for validation (239 samples), both including subjects with AD and healthy controls of European and African ancestry. Our analysis assesses the association between AD and CNV losses across ethnic groups and gender by examining the effect of overall losses across the whole genome, collective losses within individual cytogenetic bands, and specific losses in CNV regions. Results from the discovery data set showed an association between CNV losses within 16q12.2 and AD diagnosis (p = 4.53 × 10^-3). An overlapping CNV region from the validation data set exhibited the same direction of effect with respect to AD (p = 0.051). This CNV region affects the genes CES1p1 and CES1, which are members of the carboxylesterase (CES) family. The enzyme encoded by CES1 is a major liver enzyme that typically catalyzes the decomposition of ester into alcohol and carboxylic acid and is involved in drug or xenobiotics, fatty acid, and cholesterol metabolisms. In addition, the most significantly associated CNV region was located at 9p21.2 (p = 1.9 × 10^-3) in our discovery data set. Although not observed in the validation data set, probably due to small sample size, this result might hold potential connection to AD given its connection with neuronal death. In contrast, we did not find any association between AD and the overall total losses or the collective losses within individual cytogenetic bands. Overall, our study provides evidence that the specific CNVs at 16q12.2 contribute to the development of alcoholism in AA and European American populations. Copyright © 2014 by the Research Society on Alcoholism.

  2. Poisson Statistics of Combinatorial Library Sampling Predict False Discovery Rates of Screening

    PubMed Central

    2017-01-01

    Microfluidic droplet-based screening of DNA-encoded one-bead-one-compound combinatorial libraries is a miniaturized, potentially widely distributable approach to small molecule discovery. In these screens, a microfluidic circuit distributes library beads into droplets of activity assay reagent, photochemically cleaves the compound from the bead, then incubates and sorts the droplets based on assay result for subsequent DNA sequencing-based hit compound structure elucidation. Pilot experimental studies revealed that Poisson statistics describe nearly all aspects of such screens, prompting the development of simulations to understand system behavior. Monte Carlo screening simulation data showed that increasing mean library sampling (ε), mean droplet occupancy, or library hit rate all increase the false discovery rate (FDR). Compounds identified as hits on k > 1 beads (the replicate k class) were much more likely to be authentic hits than singletons (k = 1), in agreement with previous findings. Here, we explain this observation by deriving an equation for authenticity, which reduces to the product of a library sampling bias term (exponential in k) and a sampling saturation term (exponential in ε) setting a threshold that the k-dependent bias must overcome. The equation thus quantitatively describes why each hit structure’s FDR is based on its k class, and further predicts the feasibility of intentionally populating droplets with multiple library beads, assaying the micromixtures for function, and identifying the active members by statistical deconvolution. PMID:28682059
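As a minimal sketch of the Poisson sampling picture described in this abstract, the snippet below computes the probability that a given library compound appears on exactly k beads when the mean library sampling is ε. It uses only the standard Poisson probability mass function. The function names and the example value of ε are illustrative assumptions, and the authors' authenticity equation is not reproduced here.

```python
import math


def poisson_pmf(k: int, eps: float) -> float:
    """Probability that a compound is observed on exactly k beads at mean sampling eps."""
    return math.exp(-eps) * eps**k / math.factorial(k)


def fraction_sampled_at_least(k_min: int, eps: float) -> float:
    """Probability that a compound is sampled on k_min or more beads."""
    return 1.0 - sum(poisson_pmf(k, eps) for k in range(k_min))


if __name__ == "__main__":
    eps = 2.0  # hypothetical mean library sampling, chosen only for illustration
    for k in range(4):
        print(f"P(k={k}) = {poisson_pmf(k, eps):.4f}")
    print(f"P(k>=2) = {fraction_sampled_at_least(2, eps):.4f}")
```

Under this assumption, raising ε shifts probability mass from singletons (k = 1) toward replicate classes (k > 1), which is the quantity the k-class argument in the abstract depends on.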

  3. NASA SMD E/PO Community Addresses the needs of the Higher Ed Community: Introducing Slide sets for the Introductory Earth and Space Science Instructor

    NASA Astrophysics Data System (ADS)

    Buxner, S.; Meinke, B. K.; Brain, D.; Schneider, N. M.; Schultz, G. R.; Smith, D. A.; Grier, J.; Shipp, S. S.

    2014-12-01

    The NASA Science Mission Directorate (SMD) Science Education and Public Outreach (E/PO) community and Forums work together to bring the cutting-edge discoveries of NASA Astrophysics and Planetary Science missions to the introductory astronomy college classroom. These mission- and grant-based E/PO programs are uniquely poised to foster collaboration between scientists with content expertise and educators with pedagogy expertise. We present two new opportunities for college instructors to bring the latest NASA discoveries in Space Science into their classrooms. The NASA Science Mission Directorate (SMD) Astrophysics Education and Public Outreach Forum is coordinating the development of a pilot series of slide sets to help Astronomy 101 instructors incorporate new discoveries in their classrooms. The "Astro 101 slide sets" are presentations 5-7 slides in length on a new development or discovery from a NASA Astrophysics mission relevant to topics in introductory astronomy courses. We intend for these slide sets to help Astronomy 101 instructors include new developments (discoveries not yet in their textbooks) into the broader context of the course. In a similar effort to keep the astronomy classroom apprised of the fast moving field of planetary science, the Division of Planetary Sciences (DPS) has developed the Discovery slide sets, which are 3-slide presentations that can be incorporated into college lectures. The slide sets are targeted at the Introductory Astronomy undergraduate level. Each slide set consists of three slides which cover a description of the discovery, a discussion of the underlying science, and a presentation of the big picture implications of the discovery, with a fourth slide that includes links to associated press releases, images, and primary sources. Topics span all subdisciplines of planetary science, and sets are available in Farsi and Spanish. The NASA SMD Planetary Science Forum has recently partnered with the DPS to continue producing the Discovery slides and connect them to NASA mission science.

  4. European genome-wide association study identifies SLC14A1 as a new urinary bladder cancer susceptibility gene

    PubMed Central

    Rafnar, Thorunn; Vermeulen, Sita H.; Sulem, Patrick; Thorleifsson, Gudmar; Aben, Katja K.; Witjes, J. Alfred; Grotenhuis, Anne J.; Verhaegh, Gerald W.; Hulsbergen-van de Kaa, Christina A.; Besenbacher, Soren; Gudbjartsson, Daniel; Stacey, Simon N.; Gudmundsson, Julius; Johannsdottir, Hrefna; Bjarnason, Hjordis; Zanon, Carlo; Helgadottir, Hafdis; Jonasson, Jon Gunnlaugur; Tryggvadottir, Laufey; Jonsson, Eirikur; Geirsson, Gudmundur; Nikulasson, Sigfus; Petursdottir, Vigdis; Bishop, D. Timothy; Chung-Sak, Sei; Choudhury, Ananya; Elliott, Faye; Barrett, Jennifer H.; Knowles, Margaret A.; de Verdier, Petra J.; Ryk, Charlotta; Lindblom, Annika; Rudnai, Peter; Gurzau, Eugene; Koppova, Kvetoslava; Vineis, Paolo; Polidoro, Silvia; Guarrera, Simonetta; Sacerdote, Carlotta; Panadero, Angeles; Sanz-Velez, José I.; Sanchez, Manuel; Valdivia, Gabriel; Garcia-Prats, Maria D.; Hengstler, Jan G.; Selinski, Silvia; Gerullis, Holger; Ovsiannikov, Daniel; Khezri, Abdolaziz; Aminsharifi, Alireza; Malekzadeh, Mahyar; van den Berg, Leonard H.; Ophoff, Roel A.; Veldink, Jan H.; Zeegers, Maurice P.; Kellen, Eliane; Fostinelli, Jacopo; Andreoli, Daniele; Arici, Cecilia; Porru, Stefano; Buntinx, Frank; Ghaderi, Abbas; Golka, Klaus; Mayordomo, José I.; Matullo, Giuseppe; Kumar, Rajiv; Steineck, Gunnar; Kiltie, Anne E.; Kong, Augustine; Thorsteinsdottir, Unnur; Stefansson, Kari; Kiemeney, Lambertus A.

    2011-01-01

    Three genome-wide association studies in Europe and the USA have reported eight urinary bladder cancer (UBC) susceptibility loci. Using extended case and control series and 1000 Genomes imputations of 5 340 737 single-nucleotide polymorphisms (SNPs), we searched for additional loci in the European GWAS. The discovery sample set consisted of 1631 cases and 3822 controls from the Netherlands and 603 cases and 37 781 controls from Iceland. For follow-up, we used 3790 cases and 7507 controls from 13 sample sets of European and Iranian ancestry. Based on the discovery analysis, we followed up signals in the urea transporter (UT) gene SLC14A1. The strongest signal at this locus was represented by a SNP in intron 3, rs17674580, that reached genome-wide significance in the overall analysis of the discovery and follow-up groups: odds ratio = 1.17, P = 7.6 × 10^-11. SLC14A1 codes for UTs that define the Kidd blood group and are crucial for the maintenance of a constant urea concentration gradient in the renal medulla and, through this, the kidney's ability to concentrate urine. It is speculated that rs17674580, or other sequence variants in LD with it, indirectly modifies UBC risk by affecting urine production. If confirmed, this would support the ‘urogenous contact hypothesis’ that urine production and voiding frequency modify the risk of UBC. PMID:21750109

  5. Potential metabolomic biomarkers for reliable diagnosis of Behcet's disease using gas chromatography/ time-of-flight-mass spectrometry.

    PubMed

    Ahn, Joong Kyong; Kim, Jungyeon; Hwang, Jiwon; Song, Juhwan; Kim, Kyoung Heon; Cha, Hoon-Suk

    2018-05-01

    Although many diagnostic criteria of Behcet's disease (BD) have been developed and revised by experts, diagnosing BD is still complicated and challenging. No metabolomic studies on serum have been attempted to improve the diagnosis and to identify potential biomarkers of BD. The purposes of this study were to investigate distinctive metabolic changes in serum samples of BD patients and to identify metabolic candidate biomarkers for reliable diagnosis of BD using the metabolomics platform. Metabolomic profiling of 90 serum samples from 45 BD patients and 45 healthy controls (HCs) was performed via gas chromatography with time-of-flight mass spectrometry (GC/TOF-MS) with multivariate statistical analyses. A total of 104 metabolites were identified from samples. The serum metabolite profiles obtained from GC/TOF-MS analysis can distinguish BD patients from the HC group in the discovery set. The variation values of the partial least squares-discriminant analysis (PLS-DA) model are R2X of 0.246, R2Y of 0.913 and Q2 of 0.852, respectively, indicating strong explanation and prediction capabilities of the model. A panel of five metabolic biomarkers, namely, decanoic acid, fructose, tagatose, linoleic acid and oleic acid, were selected and adequately validated as putative biomarkers of BD (sensitivity 100%, specificity 97.1%, area under the curve 0.998) in the discovery set and independent set. The PLS-DA model showed clear discrimination of BD and HC groups by the five metabolic biomarkers in the independent set. This is the first report on characteristic metabolic profiles and potential metabolite biomarkers in serum for reliable diagnosis of BD using GC/TOF-MS. Copyright © 2017. Published by Elsevier SAS.

  6. Concise Review: Progress and Challenges in Using Human Stem Cells for Biological and Therapeutics Discovery: Neuropsychiatric Disorders.

    PubMed

    Panchision, David M

    2016-03-01

    In facing the daunting challenge of using human embryonic and induced pluripotent stem cells to study complex neural circuit disorders such as schizophrenia, mood and anxiety disorders, and autism spectrum disorders, a 2012 National Institute of Mental Health workshop produced a set of recommendations to advance basic research and engage industry in cell-based studies of neuropsychiatric disorders. This review describes progress in meeting these recommendations, including the development of novel tools, strides in recapitulating relevant cell and tissue types, insights into the genetic basis of these disorders that permit integration of risk-associated gene regulatory networks with cell/circuit phenotypes, and promising findings of patient-control differences using cell-based assays. However, numerous challenges are still being addressed, requiring further technological development, approaches to resolve disease heterogeneity, and collaborative structures for investigators of different disciplines. Additionally, since data obtained so far is on small sample sizes, replication in larger sample sets is needed. A number of individual success stories point to a path forward in developing assays to translate discovery science to therapeutics development. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.

  7. A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs

    PubMed Central

    2012-01-01

    Background: Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, desirable and a non-trivial problem in motif discovery research. Results: We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm, each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to make additional insights. In all cases, we were able to support our findings with experimental work from the literature. Conclusions: Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesize 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets. We suggest that small differences in our discovered motif could confer specificity for one or more homologous GTF proteins. We offer a free implementation of the MotifCatcher software package at http://www.bme.ucdavis.edu/facciotti/resources_data/software/. PMID:23181585

  8. Three plasma metabolite signatures for diagnosing high altitude pulmonary edema

    NASA Astrophysics Data System (ADS)

    Guo, Li; Tan, Guangguo; Liu, Ping; Li, Huijie; Tang, Lulu; Huang, Lan; Ren, Qian

    2015-10-01

    High-altitude pulmonary edema (HAPE) is a potentially fatal condition, occurring at altitudes greater than 3,000 m and affecting rapidly ascending, non-acclimatized healthy individuals. However, the lack of biomarkers for this disease still constitutes a bottleneck in the clinical diagnosis. Here, ultra-high performance liquid chromatography coupled with Q-TOF mass spectrometry was applied to study plasma metabolite profiling from 57 HAPE and 57 control subjects. Fourteen differential plasma metabolites responsible for the discrimination between the two groups in the discovery set (35 HAPE subjects and 35 healthy controls) were identified. Furthermore, 3 of the 14 metabolites (C8-ceramide, sphingosine and glutamine) were selected as candidate diagnostic biomarkers for HAPE using metabolic pathway impact analysis. The feasibility of using the combination of these three biomarkers for HAPE was evaluated, where the area under the receiver operating characteristic curve (AUC) was 0.981 and 0.942 in the discovery set and the validation set (22 HAPE subjects and 22 healthy controls), respectively. Taken together, these results suggested that this composite plasma metabolite signature may be used in HAPE diagnosis, especially after further investigation and verification with larger samples.
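To make the AUC figures in this abstract concrete, the following sketch computes the area under the ROC curve from its rank interpretation: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name and the scores are hypothetical and are not data from the study; how the authors combined the three metabolites into one score is not stated in the abstract.

```python
from typing import Sequence


def roc_auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as P(case score > control score), counting ties as 0.5 (Mann-Whitney form)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical combined-biomarker scores, for illustration only.
    cases = [0.9, 0.8, 0.7, 0.4]
    controls = [0.6, 0.3, 0.2, 0.1]
    print(roc_auc(cases, controls))
```

Computing the same statistic separately on a discovery set and on a held-out validation set is what allows the two AUC values to be compared.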

  9. Statistical Design for Biospecimen Cohort Size in Proteomics-based Biomarker Discovery and Verification Studies

    PubMed Central

    Skates, Steven J.; Gillette, Michael A.; LaBaer, Joshua; Carr, Steven A.; Anderson, N. Leigh; Liebler, Daniel C.; Ransohoff, David; Rifai, Nader; Kondratovich, Marina; Težak, Živana; Mansfield, Elizabeth; Oberg, Ann L.; Wright, Ian; Barnes, Grady; Gail, Mitchell; Mesri, Mehdi; Kinsinger, Christopher R.; Rodriguez, Henry; Boja, Emily S.

    2014-01-01

    Protein biomarkers are needed to deepen our understanding of cancer biology and to improve our ability to diagnose, monitor and treat cancers. Important analytical and clinical hurdles must be overcome to allow the most promising protein biomarker candidates to advance into clinical validation studies. Although contemporary proteomics technologies support the measurement of large numbers of proteins in individual clinical specimens, sample throughput remains comparatively low. This problem is amplified in typical clinical proteomics research studies, which routinely suffer from a lack of proper experimental design, resulting in analysis of too few biospecimens to achieve adequate statistical power at each stage of a biomarker pipeline. To address this critical shortcoming, a joint workshop was held by the National Cancer Institute (NCI), National Heart, Lung and Blood Institute (NHLBI), and American Association for Clinical Chemistry (AACC), with participation from the U.S. Food and Drug Administration (FDA). An important output from the workshop was a statistical framework for the design of biomarker discovery and verification studies. Herein, we describe the use of quantitative clinical judgments to set statistical criteria for clinical relevance, and the development of an approach to calculate biospecimen sample size for proteomic studies in discovery and verification stages prior to clinical validation stage. This represents a first step towards building a consensus on quantitative criteria for statistical design of proteomics biomarker discovery and verification research. PMID:24063748

  10. Statistical design for biospecimen cohort size in proteomics-based biomarker discovery and verification studies.

    PubMed

    Skates, Steven J; Gillette, Michael A; LaBaer, Joshua; Carr, Steven A; Anderson, Leigh; Liebler, Daniel C; Ransohoff, David; Rifai, Nader; Kondratovich, Marina; Težak, Živana; Mansfield, Elizabeth; Oberg, Ann L; Wright, Ian; Barnes, Grady; Gail, Mitchell; Mesri, Mehdi; Kinsinger, Christopher R; Rodriguez, Henry; Boja, Emily S

    2013-12-06

    Protein biomarkers are needed to deepen our understanding of cancer biology and to improve our ability to diagnose, monitor, and treat cancers. Important analytical and clinical hurdles must be overcome to allow the most promising protein biomarker candidates to advance into clinical validation studies. Although contemporary proteomics technologies support the measurement of large numbers of proteins in individual clinical specimens, sample throughput remains comparatively low. This problem is amplified in typical clinical proteomics research studies, which routinely suffer from a lack of proper experimental design, resulting in analysis of too few biospecimens to achieve adequate statistical power at each stage of a biomarker pipeline. To address this critical shortcoming, a joint workshop was held by the National Cancer Institute (NCI), National Heart, Lung, and Blood Institute (NHLBI), and American Association for Clinical Chemistry (AACC) with participation from the U.S. Food and Drug Administration (FDA). An important output from the workshop was a statistical framework for the design of biomarker discovery and verification studies. Herein, we describe the use of quantitative clinical judgments to set statistical criteria for clinical relevance and the development of an approach to calculate biospecimen sample size for proteomic studies in discovery and verification stages prior to clinical validation stage. This represents a first step toward building a consensus on quantitative criteria for statistical design of proteomics biomarker discovery and verification research.

  11. Electrodynamics of the middle atmosphere: Superpressure balloon program

    NASA Technical Reports Server (NTRS)

    Holzworth, Robert H.

    1987-01-01

    In this experiment a comprehensive set of electrical parameters was measured during eight long duration flights in the southern hemisphere stratosphere. These flights resulted in the largest data set ever collected from the stratosphere. The stratosphere has never been electrodynamically sampled in such a systematic manner before. New discoveries include short term variability in the planetary scale electric current system, the unexpected observation of stratospheric conductivity variations over thunderstorms and the observation of direct stratospheric conductivity variations following a relatively small solar flare. Major statistical studies were conducted of the large scale current systems, the stratospheric conductivity and the neutral gravity waves (from pressure and temperature data) using the entire data set.

  12. Metabolomic patterns and alcohol consumption in African Americans in the Atherosclerosis Risk in Communities Study

    PubMed Central

    Zheng, Yan; Yu, Bing; Alexander, Danny; Steffen, Lyn M; Nettleton, Jennifer A

    2014-01-01

    Background: Effects of alcohol consumption on health and disease are complex and involve a number of cellular and metabolic processes. Objective: We examined the association between alcohol consumption habits and metabolomic profiles. Design: We conducted a cross-sectional study to explore the association of alcohol consumption habits measured by using a questionnaire with serum metabolites measured by using untargeted mass spectrometry in 1977 African Americans from the Jackson field center in the Atherosclerosis Risk in Communities Study. The whole sample was split into a discovery set (n = 1500) and a replication set (n = 477). Alcohol consumption habits were treated as an ordinal variable, with nondrinkers as the reference group and quartiles of current drinkers as ordinal groups with higher values. For each metabolite, a linear regression was conducted to estimate its relation with alcohol consumption habits separately in both sets. A modified Bonferroni procedure was used in the discovery set to adjust the significance threshold (P < 1.9 × 10^-4). Results: In 356 named metabolites, 39 metabolites were significantly associated with alcohol consumption habits in both discovery and replication sets. In general, alcohol consumption was associated with higher levels of most metabolites such as those in amino acid and lipid pathways and with lower levels of γ-glutamyl dipeptides. Three pathways, 2-hydroxybutyrate-related metabolites, γ-glutamyl dipeptides, and lysophosphatidylcholines, which are considered to be involved in inflammation and oxidation, were associated with incident cardiovascular diseases. Conclusions: To our knowledge, this is the largest metabolomic study thus far conducted in nonwhites. Metabolomic biomarkers of alcohol consumption were identified and replicated. The results lend new insight into potential mediating effects between alcohol consumption and future health and disease. PMID:24760976
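For orientation, a plain (unmodified) Bonferroni correction divides the family-wise significance level by the number of tests. The sketch below assumes a significance level of 0.05 and takes the 356 named metabolites as the number of tests; both choices are assumptions made for illustration. The study's modified procedure is not described in the abstract, so the threshold it reports is not reproduced by this calculation.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a plain Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_hits(p_values: Dict[str, float], threshold: float) -> List[str]:
    """Names of tests whose p-value falls below the threshold, in sorted order."""
    return sorted(name for name, p in p_values.items() if p < threshold)


if __name__ == "__main__":
    threshold = bonferroni_threshold(0.05, 356)  # assumed alpha and test count
    print(f"plain Bonferroni threshold: {threshold:.2e}")
    # Hypothetical p-values, not results from the study.
    print(significant_hits({"metabolite_a": 1e-5, "metabolite_b": 0.01}, threshold))
```

In a discovery/replication design such as the one described, only the metabolites that pass the discovery threshold would be carried forward for testing in the replication set.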

  13. MicroRNA array normalization: an evaluation using a randomized dataset as the benchmark.

    PubMed

    Qin, Li-Xuan; Zhou, Qin

    2014-01-01

    MicroRNA arrays possess a number of unique data features that challenge the assumption key to many normalization methods. We assessed the performance of existing normalization methods using two microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly in the non-randomized data but still possessed a false discovery rate as high as 50%. Adding a batch adjustment step before normalization further reduced the number of false positive markers while maintaining a similar number of true positive markers, which resulted in a false discovery rate of 32% to 48%, depending on the specific normalization method. We concluded the paper with some insights on possible causes of false discoveries to shed light on how to improve normalization for microRNA arrays.
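The benchmark comparison described in this abstract can be illustrated with a short sketch: markers called significant in the test data are compared against the markers called in the benchmark, giving a true positive rate and an observed false discovery proportion. The function names and marker labels are hypothetical, and treating the benchmark calls as ground truth is the simplifying assumption that the abstract itself describes.

```python
from typing import Set


def false_discovery_proportion(called: Set[str], benchmark: Set[str]) -> float:
    """Fraction of called markers that are absent from the benchmark set."""
    if not called:
        return 0.0  # no discoveries, so no false discoveries
    return len(called - benchmark) / len(called)


def true_positive_rate(called: Set[str], benchmark: Set[str]) -> float:
    """Fraction of benchmark markers that were recovered."""
    if not benchmark:
        raise ValueError("benchmark must contain at least one marker")
    return len(called & benchmark) / len(benchmark)


if __name__ == "__main__":
    # Hypothetical marker names, for illustration only.
    benchmark = {"miR-a", "miR-b", "miR-e"}
    called = {"miR-a", "miR-b", "miR-c", "miR-d"}
    print(false_discovery_proportion(called, benchmark))
    print(true_positive_rate(called, benchmark))
```

A normalization method can raise the true positive rate while leaving this proportion high, which is the pattern the abstract reports for the non-randomized data.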

  14. MicroRNA Array Normalization: An Evaluation Using a Randomized Dataset as the Benchmark

    PubMed Central

    Qin, Li-Xuan; Zhou, Qin

    2014-01-01

    MicroRNA arrays possess a number of unique data features that challenge the assumption key to many normalization methods. We assessed the performance of existing normalization methods using two microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly in the non-randomized data but still possessed a false discovery rate as high as 50%. Adding a batch adjustment step before normalization further reduced the number of false positive markers while maintaining a similar number of true positive markers, which resulted in a false discovery rate of 32% to 48%, depending on the specific normalization method. We concluded the paper with some insights on possible causes of false discoveries to shed light on how to improve normalization for microRNA arrays. PMID:24905456

  15. Selected reaction monitoring mass spectrometry: a methodology overview.

    PubMed

    Ebhardt, H Alexander

    2014-01-01

    Moving past the discovery phase of proteomics, the term targeted proteomics combines multiple approaches investigating a certain set of proteins in more detail. One such targeted proteomics approach is the combination of liquid chromatography and selected or multiple reaction monitoring mass spectrometry (SRM, MRM). SRM-MS requires prior knowledge of the fragmentation pattern of peptides, as the presence of the analyte in a sample is determined by measuring the m/z values of predefined precursor and fragment ions. Using scheduled SRM-MS, many analytes can robustly be monitored, allowing for high-throughput sample analysis of the same set of proteins over many conditions. In this chapter, fundamentals of SRM-MS are explained as well as an optimized SRM pipeline from assay generation to data analysis.

  16. Methods of Data Collection, Sample Processing, and Data Analysis for Edge-of-Field, Streamgaging, Subsurface-Tile, and Meteorological Stations at Discovery Farms and Pioneer Farm in Wisconsin, 2001-7

    USGS Publications Warehouse

    Stuntebeck, Todd D.; Komiskey, Matthew J.; Owens, David W.; Hall, David W.

    2008-01-01

    The University of Wisconsin (UW)-Madison Discovery Farms (Discovery Farms) and UW-Platteville Pioneer Farm (Pioneer Farm) programs were created in 2000 to help Wisconsin farmers meet environmental and economic challenges. As a partner with each program, and in cooperation with the Wisconsin Department of Natural Resources and the Sand County Foundation, the U.S. Geological Survey (USGS) Wisconsin Water Science Center (WWSC) installed, maintained, and operated equipment to collect water-quantity and water-quality data from 25 edge-of-field, 6 streamgaging, and 5 subsurface-tile stations at 7 Discovery Farms and Pioneer Farm. The farms are located in the southern half of Wisconsin and represent a variety of landscape settings and crop- and animal-production enterprises common to Wisconsin agriculture. Meteorological stations were established at most farms to measure precipitation, wind speed and direction, air and soil temperature (in profile), relative humidity, solar radiation, and soil moisture (in profile). Data collection began in September 2001 and is continuing through the present (2008). This report describes methods used by USGS WWSC personnel to collect, process, and analyze water-quantity, water-quality, and meteorological data for edge-of-field, streamgaging, subsurface-tile, and meteorological stations at Discovery Farms and Pioneer Farm from September 2001 through October 2007. Information presented includes equipment used; event-monitoring and sample-collection procedures; station maintenance; sample handling and processing procedures; water-quantity, water-quality, and precipitation data analyses; and procedures for determining estimated constituent concentrations for unsampled runoff events.

  17. 4 CFR 28.43 - Compelling discovery.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 4 Accounts 1 2011-01-01 2011-01-01 false Compelling discovery. 28.43 Section 28.43 Accounts... Procedures Discovery § 28.43 Compelling discovery. (a) Motion for an order compelling discovery. Motions for orders compelling discovery shall be submitted to the administrative judge as set forth at § 28.42(c)(2...

  18. Suggestive association between variants in IL1RAPL and asthma symptoms in Latin American children.

    PubMed

    Marques, Cintia Rodrigues; Costa, Gustavo No; da Silva, Thiago Magalhães; Oliveira, Pablo; Cruz, Alvaro A; Alcantara-Neves, Neuza Maria; Fiaccone, Rosemeire L; Horta, Bernardo L; Hartwig, Fernando Pires; Burchard, Esteban G; Pino-Yanes, Maria; Rodrigues, Laura C; Lima-Costa, Maria Fernanda; Pereira, Alexandre C; Gouveia, Mateus H; Sant Anna, Hanaisa P; Tarazona-Santos, Eduardo; Lima Barreto, Maurício; Figueiredo, Camila Alexandrina

    2017-04-01

    Several genome-wide association studies have been conducted to investigate the influence of genetic polymorphisms in the development of allergic diseases, but few of them have included the X chromosome. The aim of the present study was to perform an X chromosome-wide association study (X-WAS) for asthma symptoms. The study included 1307 children, of which 294 were asthma cases. DNA was genotyped using 2.5 HumanOmni Beadchip from Illumina. Statistical analyses were performed in PLINK 1.9, MACH 1.0 and Minimac2. The variant rs12007907 (g.29483892C>A) in the IL1RAPL gene was suggestively associated with asthma symptoms in the discovery set (odds ratio (OR)=0.49, 95% confidence interval (CI): 0.37-0.67; P=3.33 × 10⁻⁶). This result was replicated in the ProAr cohort in men only (OR=0.45, 95% CI: 0.21-0.95; P=0.038). Furthermore, investigating the functional role of rs12007907 in the production of a Th2-type cytokine, IL-13, we found a negative association between the minor allele A and IL-13 production in the discovery set (P=0.044). Gene-based analysis revealed that NUDT10 was the most consistently associated with asthma symptoms in the discovery sample. In conclusion, the rs12007907 variant in the IL1RAPL gene was negatively associated with asthma and IL-13 production in our study, and a sex-specific association was observed in one of the validation samples. This suggests an effect on asthma susceptibility and may explain differences in severe asthma frequency between women and men.

  19. Decoy receptor 1 (DCR1) promoter hypermethylation and response to irinotecan in metastatic colorectal cancer

    PubMed Central

    Bosch, Linda J.W.; Coupé, Veerle M.H.; Mongera, Sandra; Haan, Josien C.; Richman, Susan D.; Koopman, Miriam; Tol, Jolien; de Meyer, Tim; Louwagie, Joost; Dehaspe, Luc; van Grieken, Nicole C.T.; Ylstra, Bauke; Verheul, Henk M.W.; van Engeland, Manon; Nagtegaal, Iris D.; Herman, James G.; Quirke, Philip; Seymour, Matthew T.; Punt, Cornelis J.A.; van Criekinge, Wim; Carvalho, Beatriz; Meijer, Gerrit A.

    2017-01-01

    Diversity in colorectal cancer biology is associated with variable responses to standard chemotherapy. We aimed to identify and validate DNA hypermethylated genes as predictive biomarkers for irinotecan treatment of metastatic CRC patients. Candidate genes were selected from 389 genes involved in DNA Damage Repair by correlation analyses between gene methylation status and drug response in 32 cell lines. A large series of samples (n=818) from two phase III clinical trials was used to evaluate these candidate genes by correlating methylation status to progression-free survival after treatment with first-line single-agent fluorouracil (Capecitabine or 5-fluorouracil) or combination chemotherapy (Capecitabine or 5-fluorouracil plus irinotecan (CAPIRI/FOLFIRI)). In the discovery (n=185) and initial validation set (n=166), patients with methylated Decoy Receptor 1 (DCR1) did not benefit from CAPIRI over Capecitabine treatment (discovery set: HR=1.2 (95%CI 0.7-1.9, p=0.6), validation set: HR=0.9 (95%CI 0.6-1.4, p=0.5)), whereas patients with unmethylated DCR1 did (discovery set: HR=0.4 (95%CI 0.3-0.6, p=0.00001), validation set: HR=0.5 (95%CI 0.3-0.7, p=0.0008)). These results could not be replicated in the external data set (n=467), where a similar effect size was found in patients with methylated and unmethylated DCR1 for FOLFIRI over 5FU treatment (methylated DCR1: HR=0.7 (95%CI 0.5-0.9, p=0.01), unmethylated DCR1: HR=0.8 (95%CI 0.6-1.2, p=0.4)). In conclusion, DCR1 promoter hypermethylation status is a potential predictive biomarker for response to treatment with irinotecan, when combined with capecitabine. This finding could not be replicated in an external validation set, in which irinotecan was combined with 5FU. These results underline the challenge and importance of extensive clinical evaluation of candidate biomarkers in multiple trials. PMID:28968978

  20. Decoy receptor 1 (DCR1) promoter hypermethylation and response to irinotecan in metastatic colorectal cancer.

    PubMed

    Bosch, Linda J W; Trooskens, Geert; Snaebjornsson, Petur; Coupé, Veerle M H; Mongera, Sandra; Haan, Josien C; Richman, Susan D; Koopman, Miriam; Tol, Jolien; de Meyer, Tim; Louwagie, Joost; Dehaspe, Luc; van Grieken, Nicole C T; Ylstra, Bauke; Verheul, Henk M W; van Engeland, Manon; Nagtegaal, Iris D; Herman, James G; Quirke, Philip; Seymour, Matthew T; Punt, Cornelis J A; van Criekinge, Wim; Carvalho, Beatriz; Meijer, Gerrit A

    2017-09-08

    Diversity in colorectal cancer biology is associated with variable responses to standard chemotherapy. We aimed to identify and validate DNA hypermethylated genes as predictive biomarkers for irinotecan treatment of metastatic CRC patients. Candidate genes were selected from 389 genes involved in DNA Damage Repair by correlation analyses between gene methylation status and drug response in 32 cell lines. A large series of samples (n=818) from two phase III clinical trials was used to evaluate these candidate genes by correlating methylation status to progression-free survival after treatment with first-line single-agent fluorouracil (Capecitabine or 5-fluorouracil) or combination chemotherapy (Capecitabine or 5-fluorouracil plus irinotecan (CAPIRI/FOLFIRI)). In the discovery (n=185) and initial validation set (n=166), patients with methylated Decoy Receptor 1 (DCR1) did not benefit from CAPIRI over Capecitabine treatment (discovery set: HR=1.2 (95%CI 0.7-1.9, p=0.6), validation set: HR=0.9 (95%CI 0.6-1.4, p=0.5)), whereas patients with unmethylated DCR1 did (discovery set: HR=0.4 (95%CI 0.3-0.6, p=0.00001), validation set: HR=0.5 (95%CI 0.3-0.7, p=0.0008)). These results could not be replicated in the external data set (n=467), where a similar effect size was found in patients with methylated and unmethylated DCR1 for FOLFIRI over 5FU treatment (methylated DCR1: HR=0.7 (95%CI 0.5-0.9, p=0.01), unmethylated DCR1: HR=0.8 (95%CI 0.6-1.2, p=0.4)). In conclusion, DCR1 promoter hypermethylation status is a potential predictive biomarker for response to treatment with irinotecan, when combined with capecitabine. This finding could not be replicated in an external validation set, in which irinotecan was combined with 5FU. These results underline the challenge and importance of extensive clinical evaluation of candidate biomarkers in multiple trials.

  1. A hierarchical and modular approach to the discovery of robust associations in genome-wide association studies from pooled DNA samples

    PubMed Central

    Sebastiani, Paola; Zhao, Zhenming; Abad-Grau, Maria M; Riva, Alberto; Hartley, Stephen W; Sedgewick, Amanda E; Doria, Alessandro; Montano, Monty; Melista, Efthymia; Terry, Dellara; Perls, Thomas T; Steinberg, Martin H; Baldwin, Clinton T

    2008-01-01

    Background: One of the challenges of the analysis of pooling-based genome wide association studies is to identify authentic associations among potentially thousands of false positive associations. Results: We present a hierarchical and modular approach to the analysis of genome wide genotype data that incorporates quality control, linkage disequilibrium, physical distance and gene ontology to identify authentic associations among those found by statistical association tests. The method is developed for the allelic association analysis of pooled DNA samples, but it can be easily generalized to the analysis of individually genotyped samples. We evaluate the approach using data sets from diverse genome wide association studies including fetal hemoglobin levels in sickle cell anemia and a sample of centenarians and show that the approach is highly reproducible and allows for discovery at different levels of synthesis. Conclusion: Results from the integration of Bayesian tests and other machine learning techniques with linkage disequilibrium data suggest that we do not need to use too stringent thresholds to reduce the number of false positive associations. This method yields increased power even with relatively small samples. In fact, our evaluation shows that the method can reach almost 70% sensitivity with samples of only 100 subjects. PMID:18194558

  2. Candidate-based proteomics in the search for biomarkers of cardiovascular disease

    PubMed Central

    Anderson, Leigh

    2005-01-01

    The key concept of proteomics (looking at many proteins at once) opens new avenues in the search for clinically useful biomarkers of disease, treatment response and ageing. As the number of proteins that can be detected in plasma or serum (the primary clinical diagnostic samples) increases towards 1000, a paradoxical decline has occurred in the number of new protein markers approved for diagnostic use in clinical laboratories. This review explores the limitations of current proteomics protein discovery platforms, and proposes an alternative approach, applicable to a range of biological/physiological problems, in which quantitative mass spectrometric methods developed for analytical chemistry are employed to measure limited sets of candidate markers in large sets of clinical samples. A set of 177 candidate biomarker proteins with reported associations to cardiovascular disease and stroke are presented as a starting point for such a ‘directed proteomics’ approach. PMID:15611012

  3. An epigenome-wide study of body mass index and DNA methylation in blood using participants from the Sister Study cohort.

    PubMed

    Wilson, L E; Harlid, S; Xu, Z; Sandler, D P; Taylor, J A

    2017-01-01

    The relationship between obesity and chronic disease risk is well-established; the underlying biological mechanisms driving this risk increase may include obesity-related epigenetic modifications. To explore this hypothesis, we conducted a genome-wide analysis of DNA methylation and body mass index (BMI) using data from a subset of women in the Sister Study. The Sister Study is a cohort of 50 884 US women who had a sister with breast cancer but were free of breast cancer themselves at enrollment. Study participants completed examinations which included measurements of height and weight, and provided blood samples. Blood DNA methylation data generated with the Illumina Infinium HumanMethylation27 BeadChip array covering 27,589 CpG sites was available for 871 women from a prior study of breast cancer and DNA methylation. To identify differentially methylated CpG sites associated with BMI, we analyzed this methylation data using robust linear regression with adjustment for age and case status. For those CpGs passing the false discovery rate significance level, we examined the association in a replication set comprised of a non-overlapping group of 187 women from the Sister Study who had DNA methylation data generated using the Infinium HumanMethylation450 BeadChip array. Analysis of this expanded 450 K array identified additional BMI-associated sites which were investigated with targeted pyrosequencing. Four CpG sites reached genome-wide significance (false discovery rate (FDR) q<0.05) in the discovery set and associations for all four were significant at strict Bonferroni correction in the replication set. An additional 23 sites passed FDR in the replication set and five were replicated by pyrosequencing in the discovery set. Several of the genes identified including ANGPT4, RORC, SOCS3, FSD2, XYLT1, ABCG1, STK39, ASB2 and CRHR2 have been linked to obesity and obesity-related chronic diseases. 
    Our findings support the hypothesis that obesity-related epigenetic differences are detectable in blood and may be related to risk of chronic disease.
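
    The false discovery rate (FDR) threshold mentioned in the record above can be illustrated with a minimal sketch of the Benjamini-Hochberg adjustment. This is an assumption for illustration only: the record does not state which FDR procedure or software was used, and the function name and p-values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p-values, one per tested CpG site (not data from the study).
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)
# Sites declared significant at q < 0.05.
significant = [i for i, qi in enumerate(q) if qi < 0.05]
```

    Adjusting every p-value and then thresholding the q-values gives the same set of discoveries as the step-up rule, while also yielding a per-site q-value that can be reported.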

  4. Pseudotargeted MS Method for the Sensitive Analysis of Protein Phosphorylation in Protein Complexes.

    PubMed

    Lyu, Jiawen; Wang, Yan; Mao, Jiawei; Yao, Yating; Wang, Shujuan; Zheng, Yong; Ye, Mingliang

    2018-05-15

    In this study, we present an enrichment-free approach for the sensitive analysis of protein phosphorylation in minute amounts of samples, such as purified protein complexes. This method takes advantage of the high sensitivity of parallel reaction monitoring (PRM). Specifically, low-confidence phosphopeptides identified from the data-dependent acquisition (DDA) data set were used to build a pseudotargeted list for PRM analysis to allow the identification of additional phosphopeptides with high confidence. The development of this targeted approach is very easy, as the same sample and the same LC system were used for the discovery and the targeted analysis phases. No sample fractionation or enrichment was required for the discovery phase, which allowed this method to analyze minute amounts of sample. We applied this pseudotargeted MS method to quantitatively examine phosphopeptides in affinity-purified endogenous Shc1 protein complexes at four temporal stages of EGF signaling and identified 82 phospho-sites. To our knowledge, this is the highest number of phospho-sites identified from the protein complexes. This pseudotargeted MS method is highly sensitive in the identification of low-abundance phosphopeptides and could be a powerful tool to study phosphorylation-regulated assembly of protein complexes.

  5. Experimental Null Method to Guide the Development of Technical Procedures and to Control False-Positive Discovery in Quantitative Proteomics.

    PubMed

    Shen, Xiaomeng; Hu, Qiang; Li, Jun; Wang, Jianmin; Qu, Jun

    2015-10-02

    Comprehensive and accurate evaluation of data quality and false-positive biomarker discovery is critical to direct the method development/optimization for quantitative proteomics, which nonetheless remains challenging largely due to the high complexity and unique features of proteomic data. Here we describe an experimental null (EN) method to address this need. Because the method experimentally measures the null distribution (either technical or biological replicates) using the same proteomic samples, the same procedures and the same batch as the case-vs-control experiment, it correctly reflects the collective effects of technical variability (e.g., variation/bias in sample preparation, LC-MS analysis, and data processing) and project-specific features (e.g., characteristics of the proteome and biological variation) on the performances of quantitative analysis. To show a proof of concept, we employed the EN method to assess the quantitative accuracy and precision and the ability to quantify subtle ratio changes between groups using different experimental and data-processing approaches and in various cellular and tissue proteomes. It was found that choices of quantitative features, sample size, experimental design, data-processing strategies, and quality of chromatographic separation can profoundly affect quantitative precision and accuracy of label-free quantification. The EN method was also demonstrated as a practical tool to determine the optimal experimental parameters and rational ratio cutoff for reliable protein quantification in specific proteomic experiments, for example, to identify the necessary number of technical/biological replicates per group that affords sufficient power for discovery.
    Furthermore, we assessed the ability of the EN method to estimate levels of false-positives in the discovery of altered proteins, using two concocted sample sets mimicking proteomic profiling using technical and biological replicates, respectively, where the true-positives/negatives are known and span a wide concentration range. It was observed that the EN method correctly reflects the null distribution in a proteomic system and accurately measures false altered proteins discovery rate (FADR). In summary, the EN method provides a straightforward, practical, and accurate alternative to statistics-based approaches for the development and evaluation of proteomic experiments and can be universally adapted to various types of quantitative techniques.

  6. SABRE: a method for assessing the stability of gene modules in complex tissues and subject populations.

    PubMed

    Shannon, Casey P; Chen, Virginia; Takhar, Mandeep; Hollander, Zsuzsanna; Balshaw, Robert; McManus, Bruce M; Tebbutt, Scott J; Sin, Don D; Ng, Raymond T

    2016-11-14

    Gene network inference (GNI) algorithms can be used to identify sets of coordinately expressed genes, termed network modules, from whole transcriptome gene expression data. The identification of such modules has become a popular approach to systems biology, with important applications in translational research. Although diverse computational and statistical approaches have been devised to identify such modules, their performance behavior is still not fully understood, particularly in complex human tissues. Given human heterogeneity, one important question is how sensitive the outputs of these computational methods are to the input sample set, that is, their stability. A related question is how this sensitivity depends on the size of the sample set. We describe here the SABRE (Similarity Across Bootstrap RE-sampling) procedure for assessing the stability of gene network modules using a re-sampling strategy, introduce a novel criterion for identifying stable modules, and demonstrate the utility of this approach in a clinically-relevant cohort, using two different gene network module discovery algorithms. The stability of modules increased as sample size increased and stable modules were more likely to be replicated in larger sets of samples. Random modules derived from permuted gene expression data were consistently unstable, as assessed by SABRE, and provide a useful baseline value for our proposed stability criterion. Gene module sets identified by different algorithms varied with respect to their stability, as assessed by SABRE. Finally, stable modules were more readily annotated in various curated gene set databases. The SABRE procedure and proposed stability criterion may provide guidance when designing systems biology studies in complex human disease and tissues.

  7. SNP discovery in the bovine milk transcriptome using RNA-Seq technology.

    PubMed

    Cánovas, Angela; Rincon, Gonzalo; Islas-Trejo, Alma; Wickramasinghe, Saumya; Medrano, Juan F

    2010-12-01

    High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it also is an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and prevention of false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.

  8. Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.

    PubMed

    Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le

    2013-01-01

    Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most existing research adopts single clustering algorithms to perform tumor clustering from biomolecular data, and these algorithms lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce the fuzzy theory into the cluster ensemble framework for tumor clustering from biomolecular data, and propose four kinds of hybrid fuzzy cluster ensemble frameworks (HFCEF), named HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, respectively, to identify samples that belong to different types of cancers. The difference between HFCEF-I and HFCEF-II is that they adopt different ensemble generator approaches to generate a set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on subspaces. Compared with HFCEF-I and HFCEF-II, HFCEF-III and HFCEF-IV consider the characteristics of HFCEF-I and HFCEF-II. HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates HFCEF-I and HFCEF-II in a concurrent way. HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize generated fuzzy matrices, and obtain the final results.
    The experiments on real data sets from UCI machine learning repository and cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches are able to provide more robust, stable, and accurate results when compared with the state-of-the-art single clustering algorithms and traditional cluster ensemble approaches.

  9. SPACE WARPS - I. Crowdsourcing the discovery of gravitational lenses

    NASA Astrophysics Data System (ADS)

    Marshall, Philip J.; Verma, Aprajita; More, Anupreeta; Davis, Christopher P.; More, Surhud; Kapadia, Amit; Parrish, Michael; Snyder, Chris; Wilcox, Julianne; Baeten, Elisabeth; Macmillan, Christine; Cornen, Claude; Baumer, Michael; Simpson, Edwin; Lintott, Chris J.; Miller, David; Paget, Edward; Simpson, Robert; Smith, Arfon M.; Küng, Rafael; Saha, Prasenjit; Collett, Thomas E.

    2016-01-01

    We describe SPACE WARPS, a novel gravitational lens discovery service that yields samples of high purity and completeness through crowdsourced visual inspection. Carefully produced colour composite images are displayed to volunteers via a web-based classification interface, which records their estimates of the positions of candidate lensed features. Images of simulated lenses, as well as real images which lack lenses, are inserted into the image stream at random intervals; this training set is used to give the volunteers instantaneous feedback on their performance, as well as to calibrate a model of the system that provides dynamical updates to the probability that a classified image contains a lens. Low-probability systems are retired from the site periodically, concentrating the sample towards a set of lens candidates. Having divided 160 deg² of Canada-France-Hawaii Telescope Legacy Survey imaging into some 430 000 overlapping 82 by 82 arcsec tiles and displaying them on the site, we were joined by around 37 000 volunteers who contributed 11 million image classifications over the course of eight months. This stage 1 search reduced the sample to 3381 images containing candidates; these were then refined in stage 2 to yield a sample that we expect to be over 90 per cent complete and 30 per cent pure, based on our analysis of the volunteers' performance on training images. We comment on the scalability of the SPACE WARPS system to the wide field survey era, based on our projection that searches of 10⁵ images could be performed by a crowd of 10⁵ volunteers in 6 d.

  10. Meta-analysis of genome-wide association studies for personality

    PubMed Central

    de Moor, Marleen H.M.; Costa, Paul T.; Terracciano, Antonio; Krueger, Robert F.; de Geus, Eco J.C.; Toshiko, Tanaka; Penninx, Brenda W.J.H.; Esko, Tõnu; Madden, Pamela A F; Derringer, Jaime; Amin, Najaf; Willemsen, Gonneke; Hottenga, Jouke-Jan; Distel, Marijn A.; Uda, Manuela; Sanna, Serena; Spinhoven, Philip; Hartman, Catharina A.; Sullivan, Patrick; Realo, Anu; Allik, Jüri; Heath, Andrew C; Pergadia, Michele L; Agrawal, Arpana; Lin, Peng; Grucza, Richard; Nutile, Teresa; Ciullo, Marina; Rujescu, Dan; Giegling, Ina; Konte, Bettina; Widen, Elisabeth; Cousminer, Diana L; Eriksson, Johan G.; Palotie, Aarno; Luciano, Michelle; Tenesa, Albert; Davies, Gail; Lopez, Lorna M.; Hansell, Narelle K.; Medland, Sarah E.; Ferrucci, Luigi; Schlessinger, David; Montgomery, Grant W.; Wright, Margaret J.; Aulchenko, Yurii S.; Janssens, A.Cecile J.W.; Oostra, Ben A.; Metspalu, Andres; Abecasis, Gonçalo R.; Deary, Ian J.; Räikkönen, Katri; Bierut, Laura J.; Martin, Nicholas G.; van Duijn, Cornelia M.; Boomsma, Dorret I.

    2013-01-01

    Personality can be thought of as a set of characteristics that influence people’s thoughts, feelings, and behaviour across a variety of settings. Variation in personality is predictive of many outcomes in life, including mental health. Here we report on a meta-analysis of genome-wide association (GWA) data for personality in ten discovery samples (17 375 adults) and five in-silico replication samples (3 294 adults). All participants were of European ancestry. Personality scores for Neuroticism, Extraversion, Openness to Experience, Agreeableness, and Conscientiousness were based on the NEO Five-Factor Inventory. Genotype data were available for ~2.4M Single Nucleotide Polymorphisms (SNPs; directly typed and imputed using HAPMAP data). In the discovery samples, classical association analyses were performed under an additive model followed by meta-analysis using the weighted inverse variance method. Results showed genome-wide significance for Openness to Experience near the RASA1 gene on 5q14.3 (rs1477268 and rs2032794, P = 2.8 × 10⁻⁸ and 3.1 × 10⁻⁸) and for Conscientiousness in the brain-expressed KATNAL2 gene on 18q21.1 (rs2576037, P = 4.9 × 10⁻⁸). We further conducted a gene-based test that confirmed the association of KATNAL2 to Conscientiousness. In-silico replication did not, however, show significant associations of the top SNPs with Openness and Conscientiousness, although the direction of effect of the KATNAL2 SNP on Conscientiousness was consistent in all replication samples. Larger scale GWA studies and alternative approaches are required for confirmation of KATNAL2 as a novel gene affecting Conscientiousness. PMID:21173776
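
    The weighted inverse variance method named in the record above can be sketched, under a fixed-effect assumption, as follows. The function name and the per-study effect sizes and standard errors are hypothetical; the record does not give the software or settings actually used.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance weighted meta-analysis of one SNP.

    Returns the pooled effect, its standard error, and the z-score.
    """
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical per-cohort estimates for a single SNP (not data from the study).
beta, se, z = inverse_variance_meta([1.0, 3.0], [1.0, 2.0])
```

    Because the weights are inverse variances, more precise cohorts pull the pooled estimate toward their own effect; here the first cohort has four times the weight of the second.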

  11. 49 CFR 1121.2 - Discovery.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 49 Transportation 8 2014-10-01 2014-10-01 false Discovery. 1121.2 Section 1121.2 Transportation... TRANSPORTATION RULES OF PRACTICE RAIL EXEMPTION PROCEDURES § 1121.2 Discovery. Discovery shall follow the procedures set forth at 49 CFR part 1114, subpart B. Discovery may begin upon the filing of the petition for...

  12. 49 CFR 1121.2 - Discovery.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 49 Transportation 8 2013-10-01 2013-10-01 false Discovery. 1121.2 Section 1121.2 Transportation... TRANSPORTATION RULES OF PRACTICE RAIL EXEMPTION PROCEDURES § 1121.2 Discovery. Discovery shall follow the procedures set forth at 49 CFR part 1114, subpart B. Discovery may begin upon the filing of the petition for...

  13. 49 CFR 1121.2 - Discovery.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 49 Transportation 8 2012-10-01 2012-10-01 false Discovery. 1121.2 Section 1121.2 Transportation... TRANSPORTATION RULES OF PRACTICE RAIL EXEMPTION PROCEDURES § 1121.2 Discovery. Discovery shall follow the procedures set forth at 49 CFR part 1114, subpart B. Discovery may begin upon the filing of the petition for...

  14. 49 CFR 1121.2 - Discovery.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 49 Transportation 8 2011-10-01 2011-10-01 false Discovery. 1121.2 Section 1121.2 Transportation... TRANSPORTATION RULES OF PRACTICE RAIL EXEMPTION PROCEDURES § 1121.2 Discovery. Discovery shall follow the procedures set forth at 49 CFR part 1114, subpart B. Discovery may begin upon the filing of the petition for...

  15. 49 CFR 1121.2 - Discovery.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 49 Transportation 8 2010-10-01 2010-10-01 false Discovery. 1121.2 Section 1121.2 Transportation... TRANSPORTATION RULES OF PRACTICE RAIL EXEMPTION PROCEDURES § 1121.2 Discovery. Discovery shall follow the procedures set forth at 49 CFR part 1114, subpart B. Discovery may begin upon the filing of the petition for...

  16. "Discoveries in Planetary Sciences": Slide Sets Highlighting New Advances for Astronomy Educators

    NASA Astrophysics Data System (ADS)

    Brain, David; Schneider, N.; Molaverdikhani, K.; Afsharahmadi, F.

    2012-10-01

    We present two new features of an ongoing effort to bring recent newsworthy advances in planetary science to undergraduate lecture halls. The effort, called 'Discoveries in Planetary Sciences', summarizes selected recently announced discoveries that are 'too new for textbooks' in the form of 3-slide PowerPoint presentations. The first slide describes the discovery, the second slide discusses the underlying planetary science concepts at a level appropriate for students of 'Astronomy 101', and the third presents the big picture implications of the discovery. A fourth slide includes links to associated press releases, images, and primary sources. This effort is generously sponsored by the Division for Planetary Sciences of the American Astronomical Society, and the slide sets are available at http://dps.aas.org/education/dpsdisc/ for download by undergraduate instructors or any interested party. Several new slide sets have just been released, and we summarize the topics covered. The slide sets are also being translated into languages other than English (including Spanish and Farsi), and we will provide an overview of the translation strategy and process. Finally, we will present web statistics on how many people are using the slide sets, as well as individual feedback from educators.

  17. Co-fuse: a new class discovery analysis tool to identify and prioritize recurrent fusion genes from RNA-sequencing data.

    PubMed

    Paisitkriangkrai, Sakrapee; Quek, Kelly; Nievergall, Eva; Jabbour, Anissa; Zannettino, Andrew; Kok, Chung Hoow

    2018-06-07

    Recurrent oncogenic fusion genes play a critical role in the development of various cancers and diseases and provide, in some cases, excellent therapeutic targets. To date, analysis tools that can identify and compare recurrent fusion genes across multiple samples have not been available to researchers. To address this deficiency, we developed Co-occurrence Fusion (Co-fuse), a new and easy to use software tool that enables biologists to merge RNA-seq information, allowing them to identify recurrent fusion genes, without the need for exhaustive data processing. Notably, Co-fuse is based on pattern mining and statistical analysis which enables the identification of hidden patterns of recurrent fusion genes. In this report, we show that Co-fuse can be used to identify 2 distinct groups within a set of 49 leukemic cell lines based on their recurrent fusion genes: a multiple myeloma (MM) samples-enriched cluster and an acute myeloid leukemia (AML) samples-enriched cluster. Our experimental results further demonstrate that Co-fuse can identify known driver fusion genes (e.g., IGH-MYC, IGH-WHSC1) in MM, when compared to AML samples, indicating the potential of Co-fuse to aid the discovery of yet unknown driver fusion genes through cohort comparisons. Additionally, using a 272 primary glioma sample RNA-seq dataset, Co-fuse was able to validate recurrent fusion genes, further demonstrating the power of this analysis tool to identify recurrent fusion genes. Taken together, Co-fuse is a powerful new analysis tool that can be readily applied to large RNA-seq datasets, and may lead to the discovery of new disease subgroups and potentially new driver genes, for which, targeted therapies could be developed. The Co-fuse R source code is publicly available at https://github.com/sakrapee/co-fuse .

  18. Drug discovery in an academic setting: playing to the strengths.

    PubMed

    Huryn, Donna M

    2013-03-14

    Drug discovery and medicinal chemistry initiatives in academia provide an opportunity to create a unique environment that is distinct from the traditional industrial model. Two characteristics of a university setting that are not usually associated with pharma are the ability to pursue high-risk projects and a depth of expertise, infrastructure, and capabilities in focused areas. Encouraging, supporting, and fostering drug discovery efforts that take advantage of these and other distinguishing characteristics of an academic setting can lead to novel and innovative therapies that might not be discovered otherwise.

  19. Targeted next-generation sequencing helps to decipher the genetic and phenotypic heterogeneity of hypertrophic cardiomyopathy

    PubMed Central

    Cecconi, Massimiliano; Parodi, Maria I.; Formisano, Francesco; Spirito, Paolo; Autore, Camillo; Musumeci, Maria B.; Favale, Stefano; Forleo, Cinzia; Rapezzi, Claudio; Biagini, Elena; Davì, Sabrina; Canepa, Elisabetta; Pennese, Loredana; Castagnetta, Mauro; Degiorgio, Dario; Coviello, Domenico A.

    2016-01-01

    Hypertrophic cardiomyopathy (HCM) is mainly associated with myosin, heavy chain 7 (MYH7) and myosin binding protein C, cardiac (MYBPC3) mutations. In order to better explain the clinical and genetic heterogeneity in HCM patients, in this study, we implemented a targeted next-generation sequencing (NGS) assay. An Ion AmpliSeq™ Custom Panel was created for the enrichment of 19 genes, 9 of which did not encode thick/intermediate and thin myofilament (TTm) proteins, including 3 responsible for HCM phenocopy. Ninety-two DNA samples were analyzed by the Ion Personal Genome Machine: 73 DNA samples (training set), previously genotyped in some of the genes by Sanger sequencing, were used to optimize the NGS strategy, whereas 19 DNA samples (discovery set) allowed the evaluation of NGS performance. In the training set, we identified 72 out of 73 expected mutations and 15 additional mutations: the molecular diagnosis was achieved in one patient with a previously wild-type status and the pre-excitation syndrome was explained in another. In the discovery set, we identified 20 mutations, 5 of which were in genes encoding non-TTm proteins, increasing the diagnostic yield by approximately 20%: a single mutation in genes encoding non-TTm proteins was identified in 2 out of 3 borderline HCM patients, whereas co-occurring mutations in genes encoding TTm and galactosidase alpha (GLA) altered proteins were characterized in a male with HCM and multiorgan dysfunction. Our combined targeted NGS-Sanger sequencing-based strategy allowed the molecular diagnosis of HCM with greater efficiency than using the conventional (Sanger) sequencing alone. Mutant alleles encoding non-TTm proteins may aid in the complete understanding of the genetic and phenotypic heterogeneity of HCM: co-occurring mutations of genes encoding TTm and non-TTm proteins could explain the wide variability of the HCM phenotype, whereas mutations in genes encoding only the non-TTm proteins are identifiable in patients with a milder HCM status. PMID:27600940

  20. 12 CFR 1081.210 - Expert discovery.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 12 Banks and Banking 8 2012-01-01 2012-01-01 false Expert discovery. 1081.210 Section 1081.210... Initiation of Proceedings and Prehearing Rules § 1081.210 Expert discovery. (a) At a date set by the hearing... discovery in appropriate cases. ...

  1. Effect of Next-Generation Exome Sequencing Depth for Discovery of Diagnostic Variants.

    PubMed

    Kim, Kyung; Seong, Moon-Woo; Chung, Won-Hyong; Park, Sung Sup; Leem, Sangseob; Park, Won; Kim, Jihyun; Lee, KiYoung; Park, Rae Woong; Kim, Namshin

    2015-06-01

    Sequencing depth, which is directly related to the cost and time required for the generation, processing, and maintenance of next-generation sequencing data, is an important factor in the practical utilization of such data in clinical fields. Unfortunately, identifying an exome sequencing depth adequate for clinical use is a challenge that has not been addressed extensively. Here, we investigate the effect of exome sequencing depth on the discovery of sequence variants for clinical use. Toward this, we sequenced ten germ-line blood samples from breast cancer patients on the Illumina platform GAII(x) at a high depth of ~200×. We observed that most function-related diverse variants in the human exonic regions could be detected at a sequencing depth of 120×. Furthermore, investigation using a diagnostic gene set showed that the number of clinical variants identified using exome sequencing reached a plateau at an average sequencing depth of about 120×. Moreover, the phenomena were consistent across the breast cancer samples.

  2. Integration, Networking, and Global Biobanking in the Age of New Biology.

    PubMed

    Karimi-Busheri, Feridoun; Rasouli-Nia, Aghdass

    2015-01-01

    Scientific revolution is changing the world forever. Many new disciplines and fields have emerged with unlimited possibilities and opportunities. Biobanking is one of many that is benefiting from revolutionary milestones in human genome, post-genomic, and computer and bioinformatics discoveries. The storage, management, and analysis of massive clinical and biological data sets cannot be achieved without a global collaboration and networking. At the same time, biobanking is facing many significant challenges that need to be addressed and solved including dealing with an ever increasing complexity of sample storage and retrieval, data management and integration, and establishing common platforms in a global context. The overall picture of the biobanking of the future, however, is promising. Many population-based biobanks have been formed, and more are under development. It is certain that amazing discoveries will emerge from this large-scale method of preserving and accessing human samples. Signs of a healthy collaboration between industry, academy, and government are encouraging.

  3. 12 CFR 509.24 - Scope of document discovery.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 12 Banks and Banking 5 2010-01-01 2010-01-01 false Scope of document discovery. 509.24 Section 509... discovery. (a) Limits on discovery. (1) Subject to the limitations set out in paragraphs (b), (c), and (d) of this section, a party to a proceeding under this subpart may obtain document discovery by serving...

  4. Accuracy of microRNAs as markers for the detection of neck lymph node metastases in patients with head and neck squamous cell carcinoma.

    PubMed

    de Carvalho, Ana Carolina; Scapulatempo-Neto, Cristovam; Maia, Danielle Calheiros Campelo; Evangelista, Adriane Feijó; Morini, Mariana Andozia; Carvalho, André Lopes; Vettore, André Luiz

    2015-05-09

    The presence of metastatic disease in cervical lymph nodes of head and neck squamous cell carcinoma (HNSCC) patients is a very important determinant in therapy choice and prognosis, with great impact on overall survival. Frequently, routine lymph node staging cannot detect occult metastases and the post-surgical histologic evaluation of resected lymph nodes is not sensitive in detecting small metastatic deposits. Molecular markers based on tissue-specific microRNA expression are alternative accurate diagnostic markers. Herein, we evaluated the feasibility of using the expression of microRNAs to detect metastatic cells in formalin-fixed paraffin-embedded (FFPE) lymph nodes and in fine-needle aspiration (FNA) biopsies of HNSCC patients. An initial screening compared the expression of 667 microRNAs in a discovery set comprising metastatic and non-metastatic lymph nodes from HNSCC patients. The most differentially expressed microRNAs were validated by qRT-PCR in two independent cohorts: i) 48 FFPE lymph node samples, and ii) 113 FNA lymph node biopsies. The accuracy of the markers in identifying metastatic samples was assessed through the analysis of sensitivity, specificity, accuracy, negative predictive value, positive predictive value, and area under the curve values. Seven microRNAs highly expressed in metastatic lymph nodes from the discovery set were validated in FFPE lymph node samples. MiR-203 and miR-205 identified all metastatic samples, regardless of the size of the metastatic deposit. Additionally, these markers also showed high accuracy when FNA samples were examined. The high accuracy of miR-203 and miR-205 warrants these microRNAs as diagnostic markers of neck metastases in HNSCC. These can be evaluated in entire lymph nodes and in FNA biopsies collected at different time-points such as pre-treatment samples, intraoperative sentinel node biopsy, and during patient follow-up. These markers can be useful in a clinical setting in the management of HNSCC patients from initial disease staging and therapy planning to patient surveillance.
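The accuracy measures named in the abstract above (sensitivity, specificity, accuracy, positive and negative predictive value, area under the curve) can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function names and all counts and scores are hypothetical, and the sketch assumes every denominator is non-zero.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard accuracy measures for a binary marker, from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # metastatic samples correctly flagged
        "specificity": tn / (tn + fp),   # non-metastatic samples correctly cleared
        "accuracy": (tp + tn) / total,   # all correct calls
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


def auc(pos_scores, neg_scores):
    """Area under the ROC curve as the probability that a positive sample
    scores higher than a negative one (ties count as one half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and expression scores, NOT taken from the study.
metrics = diagnostic_metrics(tp=18, fp=2, tn=26, fn=2)
example_auc = auc(pos_scores=[3.0, 4.0], neg_scores=[1.0, 2.0])
```

The pairwise form of `auc` is quadratic in the number of samples, which is acceptable for cohorts of the size reported here; a rank-based computation would be preferable for much larger sets.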

  5. Machine Learning Based Classifier for Falsehood Detection

    NASA Astrophysics Data System (ADS)

    Mallikarjun, H. M.; Manimegalai, P., Dr.; Suresh, H. N., Dr.

    2017-08-01

    The investigation of physiological techniques for falsehood detection tests based on emotional disturbances started in the early 1900s. The need for falsehood detection has been a part of society for hundreds of years, and various societal requirements have raised the need for foolproof falsehood detection methodologies. The established comparative questioning tests have tended to give inconclusive results, so new robust strategies are being investigated to obtain more effective falsehood detection. Electroencephalography (EEG) is a non-invasive technique for measuring brain activity through electrodes attached to the scalp of a subject. An electroencephalogram is a record of the electrical signals produced by the synchronous activity of brain cells over a period of time. The main goal is to gather and identify the relevant information in this activity, which can be used for inference in future falsehood detection analyses. This work proposes a method for falsehood detection using an EEG database recorded from random people of various age groups and social backgrounds. The statistical analysis is conducted using MATLAB v-14, a high-level language for technical computing that saves considerable time through streamlined analysis routines. This work focuses on falsehood classification by support vector machine (SVM). 72 samples are prepared by asking questions from a standard questionnaire, with right and wrong replies given at different times by an individual wearing a head unit. 52 samples are used for training and 20 for testing. Brain waves are recorded using Neurosky’s Bluetooth-based Mindwave kit, and features are arranged accordingly. A confusion matrix is derived by MATLAB programs, and an accuracy of 56.25% is achieved.

  6. 12 CFR 1081.210 - Expert discovery.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 12 Banks and Banking 8 2013-01-01 2013-01-01 false Expert discovery. 1081.210 Section 1081.210... Initiation of Proceedings and Prehearing Rules § 1081.210 Expert discovery. (a) At a date set by the hearing... requirement of expert discovery in appropriate cases. ...

  7. Genomewide association study of cocaine dependence and related traits: FAM53B identified as a risk gene

    PubMed Central

    Gelernter, Joel; Sherva, Richard; Koesterer, Ryan; Almasy, Laura; Zhao, Hongyu; Kranzler, Henry R.; Farrer, Lindsay

    2013-01-01

    We report a GWAS for cocaine dependence (CD) in three sets of African- and European-American subjects (AAs and EAs, respectively), to identify pathways, genes, and alleles important in CD risk. The discovery GWAS dataset (n=5,697 subjects) was genotyped using the Illumina OmniQuad microarray (890,000 analyzed SNPs). Additional genotypes were imputed based on the 1000 Genomes reference panel. Top-ranked findings were evaluated by incorporating information from publicly available GWAS data from 4,063 subjects. Then, the most significant GWAS SNPs were genotyped in 2,549 independent subjects. We observed one genomewide-significant (GWS) result: rs7086629 at the FAM53B (“family with sequence similarity 53, member B”) locus. This was supported in both AAs and EAs; p-value (meta-analysis of all samples) = 4.28×10⁻⁸. The gene maps to the same chromosomal region as the maximum peak we observed in a previous linkage study. NCOR2 (“nuclear receptor corepressor 2”) SNP rs150954431 was associated with p=1.19×10⁻⁹ in the EA discovery sample. SNP rs2456778, which maps to CDK1 (“cyclin-dependent kinase 1”), was associated with cocaine-induced paranoia in AAs in the discovery sample only (p=4.68×10⁻⁸). This is the first study to identify risk variants for CD using GWAS. Our results implicate novel risk loci and provide insights into potential therapeutic and prevention strategies. PMID:23958962

  8. A Proteomic Analysis of Eccrine Sweat: Implications for the Discovery of Schizophrenia Biomarker Proteins

    PubMed Central

    Raiszadeh, Michelle M.; Ross, Mark M.; Russo, Paul S.; Schaepper, Mary Ann H.; Zhou, Weidong; Deng, Jianghong; Ng, Daniel; Dickson, April; Dickson, Cindy; Strom, Monica; Osorio, Carolina; Soeprono, Thomas; Wulfkuhle, Julia D.; Kabbani, Nadine; Petricoin, Emanuel F.; Liotta, Lance A.; Kirsch, Wolff M.

    2012-01-01

    Liquid chromatography tandem mass spectrometry (LC-MS/MS) and multiple reaction monitoring mass spectrometry (MRM-MS) proteomics analyses were performed on eccrine sweat of healthy controls, and the results were compared with those from individuals diagnosed with schizophrenia (SZ). This is the first large scale study of the sweat proteome. First, we performed LC-MS/MS on pooled SZ samples and pooled control samples for global proteomics analysis. Results revealed a high abundance of diverse proteins and peptides in eccrine sweat. Most of the proteins identified from sweat samples were found to be different than the most abundant proteins from serum, which indicates that eccrine sweat is not simply a plasma transudate, and may thereby be a source of unique disease-associated biomolecules. A second independent set of patient and control sweat samples were analyzed by LC-MS/MS and spectral counting to determine qualitative protein differential abundances between the control and disease groups. Differential abundances of selected proteins, initially determined by spectral counting, were verified by MRM-MS analyses. Seventeen proteins showed a differential abundance of approximately two-fold or greater between the SZ pooled sample and the control pooled sample. This study demonstrates the utility of LC-MS/MS and MRM-MS as a viable strategy for the discovery and verification of potential sweat protein disease biomarkers. PMID:22256890

  9. SPANISH PEAKS PRIMITIVE AREA, MONTANA.

    USGS Publications Warehouse

    Calkins, James A.; Pattee, Eldon C.

    1984-01-01

    A mineral survey of the Spanish Peaks Primitive Area, Montana, disclosed a small low-grade deposit of demonstrated chromite and asbestos resources. The chances for discovery of additional chrome resources are uncertain and the area has little promise for the occurrence of other mineral or energy resources. A reevaluation, sampling at depth, and testing for possible extensions of the Table Mountain asbestos and chromium deposit should be undertaken in the light of recent interpretations regarding its geologic setting.

  10. Separate class true discovery rate degree of association sets for biomarker identification.

    PubMed

    Crager, Michael R; Ahmed, Murat

    2014-01-01

    In 2008, Efron showed that biological features in a high-dimensional study can be divided into classes and a separate false discovery rate (FDR) analysis can be conducted in each class using information from the entire set of features to assess the FDR within each class. We apply this separate class approach to true discovery rate degree of association (TDRDA) set analysis, which is used in clinical-genomic studies to identify sets of biomarkers having strong association with clinical outcome or state while controlling the FDR. Careful choice of classes based on prior information can increase the identification power of the separate class analysis relative to the overall analysis.
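The separate-class idea in the abstract above can be illustrated with a small sketch. It is an assumption-laden stand-in, not the TDRDA method and not Efron's procedure (which borrows information from the entire feature set): it simply runs the plain Benjamini-Hochberg step-up procedure within each class. The function names, p-values, and class labels are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH threshold.
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


def separate_class_bh(pvalues, classes, alpha=0.05):
    """Run BH independently inside each class and pool the rejected indices."""
    by_class = {}
    for i, label in enumerate(classes):
        by_class.setdefault(label, []).append(i)
    rejected = []
    for members in by_class.values():
        local = benjamini_hochberg([pvalues[i] for i in members], alpha)
        rejected.extend(members[j] for j in local)
    return sorted(rejected)


# Hypothetical data: class "A" is chosen a priori as likely to hold true signals.
pvals = [0.001, 0.02, 0.03, 0.04, 0.5, 0.6]
labels = ["A", "A", "A", "B", "B", "B"]
pooled = benjamini_hochberg(pvals)            # overall analysis
split = separate_class_bh(pvals, labels)      # separate class analysis
```

With these made-up numbers the pooled analysis rejects one feature and the per-class analysis rejects three, which mirrors the abstract's point that a careful prior choice of classes can increase identification power.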

  11. Genome-wide association studies identify several new loci associated with pigmentation traits and skin cancer risk in European Americans

    PubMed Central

    Zhang, Mingfeng; Song, Fengju; Liang, Liming; Nan, Hongmei; Zhang, Jiangwen; Liu, Hongliang; Wang, Li-E.; Wei, Qingyi; Lee, Jeffrey E.; Amos, Christopher I.; Kraft, Peter; Qureshi, Abrar A.; Han, Jiali

    2013-01-01

    Aiming to identify novel genetic loci for pigmentation and skin cancer, we conducted a series of genome-wide association studies on hair color, eye color, number of sunburns, tanning ability and number of non-melanoma skin cancers (NMSCs) among 10 183 European Americans in the discovery stage and 4504 European Americans in the replication stage (for eye color, 3871 males in the discovery stage and 2496 males in the replication stage). We targeted novel chromosome regions besides the known ones for replication. As a result, we identified a new region downstream of the EDNRB gene on 13q22 associated with hair color and the strongest association was the single-nucleotide polymorphism (SNP) rs975739 (P = 2.4 × 10⁻¹⁴; P = 5.4 × 10⁻⁹ in the discovery set and P = 1.2 × 10⁻⁶ in the replication set). Using blue, intermediate (including green) and brown eye colors as co-dominant outcomes, we identified the SNP rs3002288 in VASH2 on 1q32.3 associated with brown eye (P = 7.0 × 10⁻⁸; P = 5.3 × 10⁻⁵ in the discovery set and P = 0.02 in the replication set). Additionally, we identified a significant interaction between the SNPs rs7173419 and rs12913832 in the OCA2 gene region on brown eye color (P-value for interaction = 3.8 × 10⁻³). As for the number of NMSCs, we identified two independent SNPs on chr6 and one SNP on chromosome 14: rs12203592 in IRF4 (P = 7.2 × 10⁻¹⁴; P = 1.8 × 10⁻⁸ in the discovery set and P = 6.7 × 10⁻⁷ in the replication set), rs12202284 between IRF4 and EXOC2 (P = 5.0 × 10⁻⁸; P = 6.6 × 10⁻⁷ in the discovery set and P = 3.0 × 10⁻³ in the replication set) and rs8015138 upstream of GNG2 (P = 6.6 × 10⁻⁸; P = 5.3 × 10⁻⁷ in the discovery set and P = 0.01 in the replication set). PMID:23548203

  12. 37 CFR 42.51 - Discovery.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Discovery. 42.51 Section 42... Production § 42.51 Discovery. (a) Mandatory initial disclosures. (1) With agreement. Parties may agree to mandatory discovery requiring the initial disclosures set forth in the Office Patent Trial Practice Guide...

  13. 37 CFR 42.51 - Discovery.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Discovery. 42.51 Section 42... Production § 42.51 Discovery. (a) Mandatory initial disclosures. (1) With agreement. Parties may agree to mandatory discovery requiring the initial disclosures set forth in the Office Patent Trial Practice Guide...

  14. DPS Discovery Slide Sets for the Introductory Astronomy Instructor

    NASA Astrophysics Data System (ADS)

    Meinke, Bonnie K.; Jackson, Brian; Buxner, Sanlyn; Horst, Sarah; Brain, David; Schneider, Nicholas M.

    2016-10-01

    The DPS actively supports the E/PO needs of the society's membership, including those at the front of the college classroom. The DPS Discovery Slide Sets are an opportunity for instructors to put the latest planetary science into their lectures and for scientists to get their exciting results to college students. In an effort to keep the astronomy classroom apprised of the fast moving field of planetary science, the Division for Planetary Sciences (DPS) has developed "DPS Discoveries", which are 3-slide presentations that can be incorporated into college lectures. The slide sets are targeted at the Introductory Astronomy undergraduate level. Each slide set consists of three slides which cover a description of the discovery, a discussion of the underlying science, and a presentation of the big picture implications of the discovery, with a fourth slide that includes links to associated press releases, images, and primary sources. Topics span all subdisciplines of planetary science, and 26 sets are available in Farsi and Spanish. We intend for these slide sets to help Astronomy 101 instructors include new developments (not yet in their textbooks) into the broader context of the course. If you need supplemental material for your classroom, please check out the archived collection: http://dps.aas.org/education/dpsdisc More slide sets are now in development and will be available soon! In the meantime, we seek input, feedback, and help from the DPS membership to add fresh slide sets to the series and to connect the college classroom to YOUR science. It's easy to get involved - we'll provide a content template, tips and tricks for a great slide set, and pedagogy reviews. Talk to a coauthor to find out how you can disseminate your science or get involved in E/PO with your contributions.

  15. Two-way learning with one-way supervision for gene expression data.

    PubMed

    Wong, Monica H T; Mutch, David M; McNicholas, Paul D

    2017-03-04

    A family of parsimonious Gaussian mixture models for the biclustering of gene expression data is introduced. Biclustering is accommodated by adopting a mixture of factor analyzers model with a binary, row-stochastic factor loadings matrix. This particular form of factor loadings matrix results in a block-diagonal covariance matrix, which is a useful property in gene expression analyses, specifically in biomarker discovery scenarios where blood can potentially act as a surrogate tissue for other less accessible tissues. Prior knowledge of the factor loadings matrix is useful in this application and is reflected in the one-way supervised nature of the algorithm. Additionally, the factor loadings matrix can be assumed to be constant across all components because of the relationship desired between the various types of tissue samples. Parameter estimates are obtained through a variant of the expectation-maximization algorithm and the best-fitting model is selected using the Bayesian information criterion. The family of models is demonstrated using simulated data and two real microarray data sets. The first real data set is from a rat study that investigated the influence of diabetes on gene expression in different tissues. The second real data set is from a human transcriptomics study that focused on blood and immune tissues. The microarray data sets illustrate the biclustering family's performance in biomarker discovery involving peripheral blood as surrogate biopsy material. The simulation studies indicate that the algorithm identifies the correct biclusters, performing best when the number of observation clusters is known. Moreover, the biclustering algorithm identified biclusters composed of biologically meaningful data related to insulin resistance and immune function in the rat and human real data sets, respectively. Initial results using real data show that this biclustering technique provides a novel approach for biomarker discovery by enabling blood to be used as a surrogate for hard-to-obtain tissues.
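The model-selection step mentioned in the abstract above (choosing the best-fitting model by the Bayesian information criterion) can be sketched as follows. This is only an illustration: the candidate names, log-likelihoods, and parameter counts are hypothetical, and the sketch assumes the common convention BIC = k ln(n) - 2 ln(L), where smaller is better. Some mixture-model software reports the negated quantity and maximizes it instead.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better trade-off
    between fit (log-likelihood) and complexity (number of free parameters)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical fitted models: name -> (maximized log-likelihood, free parameters).
candidates = {
    "G=1": (-520.0, 5),
    "G=2": (-480.0, 11),
    "G=3": (-476.0, 17),
}
n_obs = 100  # hypothetical number of observations

scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
```

In this made-up case the three-component model fits slightly better than the two-component one, but not by enough to offset its six extra parameters, so the two-component model is selected.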

  16. Quantitative Analysis of Tissue Samples by Combining iTRAQ Isobaric Labeling with Selected/Multiple Reaction Monitoring (SRM/MRM).

    PubMed

    Narumi, Ryohei; Tomonaga, Takeshi

    2016-01-01

    Mass spectrometry-based phosphoproteomics is an indispensable technique used in the discovery and quantification of phosphorylation events on proteins in biological samples. The application of this technique to tissue samples is especially useful for the discovery of biomarkers as well as biological studies. We herein describe the application of a large-scale phosphoproteome analysis and SRM/MRM-based quantitation to develop a strategy for the systematic discovery and validation of biomarkers using tissue samples.

  17. 12 CFR 1780.26 - Discovery.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 12 Banks and Banking 7 2010-01-01 2010-01-01 false Discovery. 1780.26 Section 1780.26 Banks and... OF PRACTICE AND PROCEDURE RULES OF PRACTICE AND PROCEDURE Prehearing Proceedings § 1780.26 Discovery. (a) Limits on discovery. Subject to the limitations set out in paragraphs (b), (d), and (e) of this...

  18. Motif-based analysis of large nucleotide data sets using MEME-ChIP

    PubMed Central

    Ma, Wenxiu; Noble, William S; Bailey, Timothy L

    2014-01-01

    MEME-ChIP is a web-based tool for analyzing motifs in large DNA or RNA data sets. It can analyze peak regions identified by ChIP-seq, cross-linking sites identified by CLIP-seq and related assays, as well as sets of genomic regions selected using other criteria. MEME-ChIP performs de novo motif discovery, motif enrichment analysis, motif location analysis and motif clustering, providing a comprehensive picture of the DNA or RNA motifs that are enriched in the input sequences. MEME-ChIP performs two complementary types of de novo motif discovery: weight matrix–based discovery for high accuracy; and word-based discovery for high sensitivity. Motif enrichment analysis using DNA or RNA motifs from human, mouse, worm, fly and other model organisms provides even greater sensitivity. MEME-ChIP’s interactive HTML output groups and aligns significant motifs to ease interpretation. This protocol takes less than 3 h, and it provides motif discovery approaches that are distinct and complementary to other online methods. PMID:24853928
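To give a rough sense of what "word-based" discovery means in the abstract above, the sketch below counts how many input sequences contain each fixed-length word (k-mer). It is a toy illustration under stated assumptions and not MEME-ChIP's algorithm, which additionally tests words for statistical enrichment against control sequences; the function name and the sequences are hypothetical.

```python
from collections import Counter


def count_words(sequences, k):
    """Count, for every k-mer, the number of sequences that contain it at least once."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each word is counted once per sequence.
        words = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(words)
    return counts


# Hypothetical peak sequences, for illustration only.
peaks = ["ACGTAC", "TTACGT"]
word_counts = count_words(peaks, k=3)
```

Counting presence per sequence rather than total occurrences keeps a single repetitive sequence from dominating the ranking, which is one reasonable design choice for peak-style input.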

  19. [Criteria for the classification as a "domestic-setting corpse"--a literature search and review to define the term].

    PubMed

    Merz, Marius; Birngruber, Christoph G; Heidorn, Frank; Ramsthaler, Frank; Risse, Manfred; Kreutz, Kerstin; Krähahn, Jonathan; Verhoff, Marcel A

    2011-01-01

    In German medical and media circles (daily routine, specialist literature, press, novels), the term "domestic-setting corpse" is frequently used, but the term is only vaguely defined. The authors thus decided to perform an in-depth study of the literature, including historic textbooks and all German- and English-language medicolegal journals, going as far back as their first issues, in an attempt to more clearly define the term. Inclusion criteria used in the search were a post-mortem interval of at least 24 hours prior to discovery and discovery of the corpse in a domestic setting. In the literature, 37 cases that complied with the above-mentioned inclusion criteria were found. These cases frequently described "advanced decomposition", often "unclear cause of death" and "problems in identification". These characteristics can thus be considered as being additional pointers in the definition. However, we suggest that the two general defining characteristics of a "domestic-setting corpse" are a post-mortem interval of more than 24 hours before discovery and the discovery of the corpse in a domestic setting.

  20. Discovery of coesite and shocked quartz associated with the upper Eocene cpx spherule layer

    NASA Technical Reports Server (NTRS)

    Liu, S.; Kyte, T.; Glass, B. P.

    2002-01-01

    At least two major impact ejecta layers have been discovered in upper Eocene strata. The upper layer is the North American microtektite layer. It consists of tektite fragments, microtektites, and shocked mineral grains (e.g., quartz and feldspar with multiple sets of PDFs, coesite and reidite (a high-pressure polymorph of zircon)). The slightly older layer contains clinopyroxene-bearing (cpx) spherules and microtektites associated with an Ir anomaly. The North American tektite layer may be derived from the Chesapeake Bay impact structure, and the cpx spherule layer may be from the Popigai impact crater. A cpx spherule layer associated with a positive Ir anomaly was recently found at ODP Site 709, western Indian Ocean. A large sample (Hole 709C, core 31, section 4, 145-150 cm), originally used for a study of interstitial water by shipboard scientists, was acquired for the purpose of recovering a large number of spherules for various petrographic and geochemical studies. A split of the sample (50.35 g) was disaggregated and wet-sieved. More than 17,000 cpx spherules and several hundred microtektites (larger than 125 microns) were recovered from the sample. Rare white opaque grains were observed in the 125-250 micron size fraction after removal of the carbonate component using dilute HCl. Seven of the white opaque grains were X-rayed using a Gandolfi camera and six were found to be coesite (probably mixed with lechatelierite). Eighty translucent colorless grains from the 63-125 micron size fraction were studied with a petrographic microscope. Four of the grains exhibit one to two sets of planar deformation features (PDFs). The only other possible known occurrence of shocked minerals associated with the cpx spherule layer is at Massignano, Italy, where pancake-shaped clay spherules (thought to be diagenetically altered cpx spherules) are associated with a positive Ir anomaly and Ni-rich spinel crystals. Shocked quartz grains with multiple sets of PDFs also occur at this site. Until now, unmelted impact ejecta have not been found associated with the cpx spherules at any of the other 20 sites around the world and this is the first time that coesite has been found associated with the cpx spherule layer. The discovery of coesite and shocked quartz associated with the cpx spherules at Site 709 in the Indian Ocean is further evidence for the impact origin of the cpx spherule layer. We hope that future discovery of other unmelted minerals from this sample may provide materials to establish constraints on the provenance of this late Eocene ejecta.

  1. 12 CFR 308.24 - Scope of document discovery.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 12 Banks and Banking 4 2010-01-01 2010-01-01 false Scope of document discovery. 308.24 Section 308... PRACTICE AND PROCEDURE Uniform Rules of Practice and Procedure § 308.24 Scope of document discovery. (a) Limits on discovery. (1) Subject to the limitations set out in paragraphs (b), (c), and (d) of this...

  2. 12 CFR 308.107 - Document discovery.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 12 Banks and Banking 5 2014-01-01 2014-01-01 false Document discovery. 308.107 Section 308.107... PRACTICE AND PROCEDURE General Rules of Procedure § 308.107 Document discovery. (a) Parties to proceedings set forth at § 308.01 of the Uniform Rules and as provided in the Local Rules may obtain discovery...

  3. 12 CFR 908.46 - Discovery.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 12 Banks and Banking 7 2011-01-01 2011-01-01 false Discovery. 908.46 Section 908.46 Banks and... PRACTICE AND PROCEDURE IN HEARINGS ON THE RECORD Pre-Hearing Proceedings § 908.46 Discovery. (a) Limits on discovery. Subject to the limitations set out in paragraphs (b), (d), and (e) of this section, any party to...

  4. 12 CFR 308.107 - Document discovery.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 12 Banks and Banking 4 2011-01-01 2011-01-01 false Document discovery. 308.107 Section 308.107... PRACTICE AND PROCEDURE General Rules of Procedure § 308.107 Document discovery. (a) Parties to proceedings set forth at § 308.01 of the Uniform Rules and as provided in the Local Rules may obtain discovery...

  5. 12 CFR 308.107 - Document discovery.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 12 Banks and Banking 5 2012-01-01 2012-01-01 false Document discovery. 308.107 Section 308.107... PRACTICE AND PROCEDURE General Rules of Procedure § 308.107 Document discovery. (a) Parties to proceedings set forth at § 308.01 of the Uniform Rules and as provided in the Local Rules may obtain discovery...

  6. 12 CFR 308.107 - Document discovery.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 12 Banks and Banking 5 2013-01-01 2013-01-01 false Document discovery. 308.107 Section 308.107... PRACTICE AND PROCEDURE General Rules of Procedure § 308.107 Document discovery. (a) Parties to proceedings set forth at § 308.01 of the Uniform Rules and as provided in the Local Rules may obtain discovery...

  7. 12 CFR 308.107 - Document discovery.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 12 Banks and Banking 4 2010-01-01 2010-01-01 false Document discovery. 308.107 Section 308.107... PRACTICE AND PROCEDURE General Rules of Procedure § 308.107 Document discovery. (a) Parties to proceedings set forth at § 308.01 of the Uniform Rules and as provided in the Local Rules may obtain discovery...

  8. 12 CFR 908.46 - Discovery.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 12 Banks and Banking 7 2010-01-01 2010-01-01 false Discovery. 908.46 Section 908.46 Banks and... PRACTICE AND PROCEDURE IN HEARINGS ON THE RECORD Pre-Hearing Proceedings § 908.46 Discovery. (a) Limits on discovery. Subject to the limitations set out in paragraphs (b), (d), and (e) of this section, any party to...

  9. Utility of metabolic profiling of serum in the diagnosis of pregnancy complications.

    PubMed

    Powell, Katie L; Carrozzi, Anthony; Stephens, Alexandre S; Tasevski, Vitomir; Morris, Jonathan M; Ashton, Anthony W; Dona, Anthony C

    2018-06-01

    Currently there are no clinical screening tests available to identify pregnancies at risk of developing preeclampsia (PET) and/or intrauterine growth restriction (IUGR), both of which are associated with abnormal placentation. Metabolic profiling is now a stable analytical platform used in many laboratories and has successfully been used to identify biomarkers associated with various pathological states. We used nuclear magnetic resonance spectroscopy (NMR) to metabolically profile serum samples collected from 143 pregnant women at 26-41 weeks gestation with pregnancy outcomes of PET, IUGR, PET IUGR or small for gestational age (SGA) that were age-matched to normal pre/term pregnancies. Spectral analysis found no difference in the measured metabolites from normal term, pre-term and SGA samples, and of 25 identified metabolites, only glutamate was marginally different between groups. Of the identified metabolites, 3-methylhistidine, creatinine, acetyl groups and acetate were determined to be independent predictors of PET and produced areas under the curve (AUC) of 0.938 and 0.936 for the discovery and validation sets, respectively. Only 3-hydroxybutyrate was determined to be an independent predictor of IUGR; however, the model had low predictive power (AUC = 0.623 and 0.581 for the discovery and validation sets, respectively). A sub-panel of metabolites had strong predictive power for identifying PET samples in a validation dataset; however, prediction of IUGR was more difficult using the identified metabolites. NMR-based metabolomics can identify metabolites strongly associated with disease and has the potential to be useful in developing early clinical screening tests for at-risk pregnancies. Copyright © 2018 Elsevier Ltd. All rights reserved.
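
    The AUC figures quoted in this abstract summarize how well a model score separates cases from controls. As a minimal sketch, assuming made-up scores rather than any data from the study, AUC can be computed as the fraction of (case, control) pairs in which the case receives the higher score (the Mann-Whitney formulation). The function name `auc` and the example values are hypothetical.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical model scores, for illustration only (not study data).
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]
print(auc(cases, controls))
```

    An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, which is one way to read the gap between the PET and IUGR models described above.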

  10. Analysis of latency performance of bluetooth low energy (BLE) networks.

    PubMed

    Cho, Keuchul; Park, Woojin; Hong, Moonki; Park, Gisu; Cho, Wooseong; Seo, Jihoon; Han, Kijun

    2014-12-23

    Bluetooth Low Energy (BLE) is a short-range wireless communication technology aiming at low-cost and low-power communication. The performance evaluation of classical Bluetooth device discovery has been intensively studied using analytical modeling and simulative methods, but these techniques are not applicable to BLE, since BLE has a fundamental change in the design of the discovery mechanism, including the usage of three advertising channels. Recently, several works have analyzed the topic of BLE device discovery, but these studies are still far from thorough. It is thus necessary to develop a new, accurate model for the BLE discovery process. In particular, the wide range of parameter settings gives BLE devices considerable potential to customize their discovery performance. This motivates our study of modeling the BLE discovery process and performing intensive simulation. This paper is focused on building an analytical model to investigate the discovery probability, as well as the expected discovery latency, which are then validated via extensive experiments. Our analysis considers both continuous and discontinuous scanning modes. We analyze the sensitivity of these performance metrics to parameter settings to quantitatively examine to what extent parameters influence the performance metric of the discovery processes.
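
    The discovery probability and expected latency discussed in this abstract can also be estimated by simulation. The sketch below is not the authors' analytical model; it is a simplified Monte Carlo illustration under stated assumptions: the advertiser sends one packet on each of the three advertising channels back to back in every advertising event, adds a random delay of 0 to 10 ms between events, and the scanner listens on one channel per scan interval for the length of the scan window, rotating through the channels. The packet duration `pdu_ms`, the function names, and all parameter values are hypothetical.

```python
import random

ADV_CHANNELS = (37, 38, 39)  # the three BLE advertising channels


def discovery_latency(adv_interval_ms, scan_interval_ms, scan_window_ms,
                      pdu_ms=0.4, horizon_ms=60_000.0, rng=random):
    """Time (ms) until the scanner first receives an advertisement, or None.

    The scanner starts at t = 0; the advertiser starts at a random phase.
    """
    if scan_window_ms > scan_interval_ms:
        raise ValueError("scan window cannot exceed scan interval")
    t = rng.uniform(0.0, adv_interval_ms)  # start of the first advertising event
    while t < horizon_ms:
        for k in range(len(ADV_CHANNELS)):
            start = t + k * pdu_ms          # packet on channel index k
            end = start + pdu_ms
            n = int(start // scan_interval_ms)      # index of current scan interval
            window_start = n * scan_interval_ms
            listening_on = n % len(ADV_CHANNELS)    # scanner rotates channels
            # Received only if the whole packet fits inside the scan window
            # on the channel the scanner is currently listening to.
            if listening_on == k and end <= window_start + scan_window_ms:
                return end
        t += adv_interval_ms + rng.uniform(0.0, 10.0)  # random advertising delay
    return None


def summarize(trials, deadline_ms, **params):
    """Estimate (discovery probability within deadline, mean latency)."""
    rng = random.Random(1)
    latencies = [discovery_latency(rng=rng, **params) for _ in range(trials)]
    found = [x for x in latencies if x is not None]
    prob = sum(1 for x in found if x <= deadline_ms) / trials
    mean = sum(found) / len(found) if found else float("nan")
    return prob, mean


# Continuous scanning: the scan window equals the scan interval.
print(summarize(200, 150.0, adv_interval_ms=100.0,
                scan_interval_ms=100.0, scan_window_ms=100.0))
# Discontinuous scanning: the scanner listens 30 ms out of every 100 ms.
print(summarize(200, 150.0, adv_interval_ms=100.0,
                scan_interval_ms=100.0, scan_window_ms=30.0))
```

    Varying the three interval parameters in `summarize` gives a rough feel for the kind of sensitivity analysis the abstract describes, although the numbers from this toy model should not be read as the paper's results.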

  11. Analysis of Latency Performance of Bluetooth Low Energy (BLE) Networks

    PubMed Central

    Cho, Keuchul; Park, Woojin; Hong, Moonki; Park, Gisu; Cho, Wooseong; Seo, Jihoon; Han, Kijun

    2015-01-01

    Bluetooth Low Energy (BLE) is a short-range wireless communication technology aiming at low-cost and low-power communication. The performance evaluation of classical Bluetooth device discovery has been intensively studied using analytical modeling and simulative methods, but these techniques are not applicable to BLE, since BLE has a fundamental change in the design of the discovery mechanism, including the usage of three advertising channels. Recently, several works have analyzed the topic of BLE device discovery, but these studies are still far from thorough. It is thus necessary to develop a new, accurate model for the BLE discovery process. In particular, the wide range of parameter settings gives BLE devices considerable potential to customize their discovery performance. This motivates our study of modeling the BLE discovery process and performing intensive simulation. This paper is focused on building an analytical model to investigate the discovery probability, as well as the expected discovery latency, which are then validated via extensive experiments. Our analysis considers both continuous and discontinuous scanning modes. We analyze the sensitivity of these performance metrics to parameter settings to quantitatively examine to what extent parameters influence the performance metric of the discovery processes. PMID:25545266

  12. 18 CFR 385.401 - Applicability (Rule 401).

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ..., DEPARTMENT OF ENERGY PROCEDURAL RULES RULES OF PRACTICE AND PROCEDURE Discovery Procedures for Matters Set... in paragraph (b) of this section, this subpart applies to discovery in proceedings set for hearing... under the Freedom of Information Act, 5 U.S.C. 552, governed by Part 388 of this chapter; or, (2...

  13. A Genome-Wide Association Study of Depressive Symptoms

    PubMed Central

    Cornelis, Marilyn C.; Amin, Najaf; Bakshis, Erin; Baumert, Jens; Ding, Jingzhong; Liu, Yongmei; Marciante, Kristin; Meirelles, Osorio; Nalls, Michael A.; Sun, Yan V.; Vogelzangs, Nicole; Yu, Lei; Bandinelli, Stefania; Benjamin, Emelia J.; Bennett, David A.; Boomsma, Dorret; Cannas, Alessandra; Coker, Laura H.; de Geus, Eco; De Jager, Philip L.; Diez-Roux, Ana V.; Purcell, Shaun; Hu, Frank B.; Rimma, Eric B.; Hunter, David J.; Jensen, Majken K.; Curhan, Gary; Rice, Kenneth; Penman, Alan D.; Rotter, Jerome I.; Sotoodehnia, Nona; Emeny, Rebecca; Eriksson, Johan G.; Evans, Denis A.; Ferrucci, Luigi; Fornage, Myriam; Gudnason, Vilmundur; Hofman, Albert; Illig, Thomas; Kardia, Sharon; Kelly-Hayes, Margaret; Koenen, Karestan; Kraft, Peter; Kuningas, Maris; Massaro, Joseph M.; Melzer, David; Mulas, Antonella; Mulder, Cornelis L.; Murray, Anna; Oostra, Ben A.; Palotie, Aarno; Penninx, Brenda; Petersmann, Astrid; Pilling, Luke C.; Psaty, Bruce; Rawal, Rajesh; Reiman, Eric M.; Schulz, Andrea; Shulman, Joshua M.; Singleton, Andrew B.; Smith, Albert V.; Sutin, Angelina R.; Uitterlinden, André G.; Völzke, Henry; Widen, Elisabeth; Yaffe, Kristine; Zonderman, Alan B.; Cucca, Francesco; Harris, Tamara; Ladwig, Karl-Heinz; Llewellyn, David J.; Räikkönen, Katri; Tanaka, Toshiko

    2013-01-01

    Background: Depression is a heritable trait that exists on a continuum of varying severity and duration. Yet, the search for genetic variants associated with depression has had few successes. We exploit the entire continuum of depression to find common variants for depressive symptoms. Methods: In this genome-wide association study, we combined the results of 17 population-based studies assessing depressive symptoms with the Center for Epidemiological Studies Depression Scale. Replication of the independent top hits (p < 1 × 10^−5) was performed in five studies assessing depressive symptoms with other instruments. In addition, we performed a combined meta-analysis of all 22 discovery and replication studies. Results: The discovery sample comprised 34,549 individuals (mean age of 66.5) and no loci reached genome-wide significance (lowest p = 1.05 × 10^−7). Seven independent single nucleotide polymorphisms were considered for replication. In the replication set (n = 16,709), we found suggestive association of one single nucleotide polymorphism with depressive symptoms (rs161645, 5q21, p = 9.19 × 10^−3). This 5q21 region reached genome-wide significance (p = 4.78 × 10^−8) in the overall meta-analysis combining discovery and replication studies (n = 51,258). Conclusions: The results suggest that only a large sample comprising more than 50,000 subjects may be sufficiently powered to detect genes for depressive symptoms. PMID:23290196

  14. A Fortran program for Monte Carlo simulation of oil-field discovery sequences

    USGS Publications Warehouse

    Bohling, Geoffrey C.; Davis, J.C.

    1993-01-01

    We have developed a program for performing Monte Carlo simulation of oil-field discovery histories. A synthetic parent population of fields is generated as a finite sample from a distribution of specified form. The discovery sequence then is simulated by sampling without replacement from this parent population in accordance with a probabilistic discovery process model. The program computes a chi-squared deviation between synthetic and actual discovery sequences as a function of the parameters of the discovery process model, the number of fields in the parent population, and the distributional parameters of the parent population. The program employs the three-parameter log gamma model for the distribution of field sizes and employs a two-parameter discovery process model, allowing the simulation of a wide range of scenarios. © 1993.
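
    The sampling scheme described in this abstract can be sketched in a few lines. This is an illustration in Python, not the authors' Fortran program, and it swaps in simpler assumptions: field sizes are drawn from a lognormal distribution (instead of the three-parameter log gamma model), and each remaining field is discovered with probability proportional to its size raised to a single exponent `beta` (a common size-biased model, standing in for the two-parameter discovery process model, whose form the abstract does not give). The Pearson-style deviation and all names and values are likewise hypothetical.

```python
import random


def simulate_discovery_sequence(n_fields, beta, mu=3.0, sigma=1.5, seed=0):
    """Draw a parent population, then sample it without replacement.

    Assumption: discovery probability is proportional to size ** beta.
    """
    rng = random.Random(seed)
    remaining = [rng.lognormvariate(mu, sigma) for _ in range(n_fields)]
    sequence = []
    while remaining:
        weights = [size ** beta for size in remaining]
        idx = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        sequence.append(remaining.pop(idx))  # the field is now discovered
    return sequence


def chi_squared_deviation(synthetic, actual):
    """Pearson-style deviation between two discovery sequences (assumed form)."""
    return sum((s - a) ** 2 / a for s, a in zip(synthetic, actual))


# Compare two synthetic histories generated with different exponents.
seq_biased = simulate_discovery_sequence(50, beta=1.0)
seq_random = simulate_discovery_sequence(50, beta=0.0)
print(chi_squared_deviation(seq_biased, seq_random))
```

    With `beta = 0` every remaining field is equally likely to be found next; larger values of `beta` push the large fields toward the start of the sequence.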

  15. Biomarker discovery for colon cancer using a 761 gene RT-PCR assay.

    PubMed

    Clark-Langone, Kim M; Wu, Jenny Y; Sangli, Chithra; Chen, Angela; Snable, James L; Nguyen, Anhthu; Hackett, James R; Baker, Joffre; Yothers, Greg; Kim, Chungyeul; Cronin, Maureen T

    2007-08-15

    Reverse transcription PCR (RT-PCR) is widely recognized to be the gold standard method for quantifying gene expression. Studies using RT-PCR technology as a discovery tool have historically been limited to relatively small gene sets compared to other gene expression platforms such as microarrays. We have recently shown that TaqMan RT-PCR can be scaled up to profile expression for 192 genes in fixed paraffin-embedded (FPE) clinical study tumor specimens. This technology has also been used to develop and commercialize a widely used clinical test for breast cancer prognosis and prediction, the Oncotype DX assay. A similar need exists in colon cancer for a test that provides information on the likelihood of disease recurrence in colon cancer (prognosis) and the likelihood of tumor response to standard chemotherapy regimens (prediction). We have now scaled our RT-PCR assay to efficiently screen 761 biomarkers across hundreds of patient samples and applied this process to biomarker discovery in colon cancer. This screening strategy remains attractive due to the inherent advantages of maintaining platform consistency from discovery through clinical application. RNA was extracted from formalin fixed paraffin embedded (FPE) tissue, as old as 28 years, from 354 patients enrolled in NSABP C-01 and C-02 colon cancer studies. Multiplexed reverse transcription reactions were performed using a gene specific primer pool containing 761 unique primers. PCR was performed as independent TaqMan reactions for each candidate gene. Hierarchical clustering demonstrates that genes expected to co-express form obvious, distinct and in certain cases very tightly correlated clusters, validating the reliability of this technical approach to biomarker discovery. We have developed a high throughput, quantitatively precise multi-analyte gene expression platform for biomarker discovery that approaches low density DNA arrays in numbers of genes analyzed while maintaining the high specificity, sensitivity and reproducibility that are characteristics of RT-PCR. Biomarkers discovered using this approach can be transferred to a clinical reference laboratory setting without having to re-validate the assay on a second technology platform.

  16. Comparative Evaluation of Preprocessing Freeware on Chromatography/Mass Spectrometry Data for Signature Discovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coble, Jamie B.; Fraga, Carlos G.

    2014-07-07

    Preprocessing software is crucial for the discovery of chemical signatures in metabolomics, chemical forensics, and other signature-focused disciplines that involve analyzing large data sets from chemical instruments. Here, four freely available and published preprocessing tools known as metAlign, MZmine, SpectConnect, and XCMS were evaluated for impurity profiling using nominal mass GC/MS data and accurate mass LC/MS data. Both data sets were previously collected from the analysis of replicate samples from multiple stocks of a nerve-agent precursor. Each of the four tools had its parameters set for the untargeted detection of chromatographic peaks from impurities present in the stocks. The peak table generated by each preprocessing tool was analyzed to determine the number of impurity components detected in all replicate samples per stock. A cumulative set of impurity components was then generated using all available peak tables and used as a reference to calculate the percent of component detections for each tool, in which 100% indicated the detection of every component. For the nominal mass GC/MS data, metAlign performed the best followed by MZmine, SpectConnect, and XCMS with detection percentages of 83, 60, 47, and 42%, respectively. For the accurate mass LC/MS data, the order was metAlign, XCMS, and MZmine with detection percentages of 80, 45, and 35%, respectively. SpectConnect did not function for the accurate mass LC/MS data. Larger detection percentages were obtained by combining the top performer with at least one of the other tools, such as 96% by combining metAlign with MZmine for the GC/MS data and 93% by combining metAlign with XCMS for the LC/MS data. In terms of quantitative performance, the reported peak intensities had average absolute biases of 41, 4.4, 1.3, and 1.3% for SpectConnect, metAlign, XCMS, and MZmine, respectively, for the GC/MS data. For the LC/MS data, the average absolute biases were 22, 4.5, and 3.1% for metAlign, MZmine, and XCMS, respectively. In summary, metAlign performed the best in terms of peak discovery; however, more than one preprocessing tool should be considered to avoid missing potential chemical signatures.
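
    The detection percentage described in this abstract is a set calculation: the reference is the union of the components found by all tools, and each tool is scored by the share of that union it detected. A minimal sketch follows, assuming hypothetical tool names and component labels that do not come from the study.

```python
def detection_percentages(peak_tables):
    """Percent of the cumulative component set detected by each tool.

    peak_tables maps a tool name to the set of components it detected.
    """
    reference = set().union(*peak_tables.values())  # cumulative reference set
    return {
        tool: 100.0 * len(found) / len(reference)
        for tool, found in peak_tables.items()
    }


# Hypothetical peak tables, for illustration only.
tables = {
    "tool_A": {"c1", "c2", "c3", "c4"},
    "tool_B": {"c1", "c2", "c5"},
    "tool_C": {"c2"},
}
pct = detection_percentages(tables)
print(pct)

# Combining two tools means scoring the union of their detections.
reference_size = len(set().union(*tables.values()))
combined = 100.0 * len(tables["tool_A"] | tables["tool_B"]) / reference_size
print(combined)
```

    The combined score can exceed either individual score whenever the two tools miss different components, which is the effect the abstract reports for metAlign paired with a second tool.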

  17. Alzheimer's Disease Sequencing Project discovery and replication criteria for cases and controls: Data from a community-based prospective cohort study with autopsy follow-up.

    PubMed

    Crane, Paul K; Foroud, Tatiana; Montine, Thomas J; Larson, Eric B

    2017-12-01

    The Alzheimer's Disease Sequencing Project (ADSP) used different criteria for assigning case and control status from the discovery and replication phases of the project. We considered data from a community-based prospective cohort study with autopsy follow-up where participants could be categorized as case, control, or neither by both definitions and compared the two sets of criteria. We used data from the Adult Changes in Thought (ACT) study including Diagnostic and Statistical Manual-IV criteria for dementia status, McKhann et al. criteria for clinical Alzheimer's disease, and Braak and Consortium to Establish a Registry for AD findings on neurofibrillary tangles and neuritic plaques to categorize the 621 ACT participants of European ancestry who died and came to autopsy. We applied ADSP discovery and replication definitions to identify controls, cases, and people who were neither controls nor cases. There was some agreement between the discovery and replication definitions. Major areas of discrepancy included the finding that only 40% of the discovery sample controls had sufficiently low levels of neurofibrillary tangles and neuritic plaques to be considered controls by the replication criteria and the finding that 16% of the replication phase cases were diagnosed with non-AD dementia during life and thus were excluded as cases for the discovery phase. These findings should inform interpretation of genetic association findings from the ADSP. Differences in genetic association findings between the two phases of the study may reflect these different phenotype definitions from the discovery and replication phase of the ADSP. Copyright © 2017 the Alzheimer's Association. Published by Elsevier Inc. All rights reserved.

  18. The Outer Solar System Origin Survey full data release orbit catalog and characterization.

    NASA Astrophysics Data System (ADS)

    Kavelaars, J. J.; Bannister, Michele T.; Gladman, Brett; Petit, Jean-Marc; Gwyn, Stephen; Alexandersen, Mike; Chen, Ying-Tung; Volk, Kathryn; OSSOS Collaboration.

    2017-10-01

    The Outer Solar System Origin Survey (OSSOS) completed main data acquisition in February 2017. Here we report the release of our full orbit sample, which includes 836 TNOs with high precision orbit determination and classification. We combine the OSSOS orbit sample with the previously released Canada-France Ecliptic Plane Survey (CFEPS) and a precursor survey to OSSOS by Alexandersen et al. to provide a sample of over 1100 TNO orbits with high precision classified orbits and precisely determined discovery and tracking circumstances (characterization). We are releasing the full sample and characterization to the world community, along with software for conducting ‘Survey Simulations’, so that this sample of orbits can be used to test models of the formation of our outer solar system against the observed sample. Here I will present the characteristics of the data set and present a parametric model for the structure of the classical Kuiper belt.

  19. 12 CFR 19.24 - Scope of document discovery.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 12 Banks and Banking 1 2010-01-01 2010-01-01 false Scope of document discovery. 19.24 Section 19... PROCEDURE Uniform Rules of Practice and Procedure § 19.24 Scope of document discovery. (a) Limits on discovery. (1) Subject to the limitations set out in paragraphs (b), (c), and (d) of this section, a party...

  20. 12 CFR 747.100 - Discovery limitations.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 12 Banks and Banking 7 2014-01-01 2014-01-01 false Discovery limitations. 747.100 Section 747.100... Practice and Procedure § 747.100 Discovery limitations. (a) Parties to a proceeding set forth either at § 747.1 of subpart A or in subpart C, E or G of this part may obtain discovery only through the...

  1. 12 CFR 747.100 - Discovery limitations.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 12 Banks and Banking 7 2012-01-01 2012-01-01 false Discovery limitations. 747.100 Section 747.100... Practice and Procedure § 747.100 Discovery limitations. (a) Parties to a proceeding set forth either at § 747.1 of subpart A or in subpart C, E or G of this part may obtain discovery only through the...

  2. 12 CFR 747.100 - Discovery limitations.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 12 Banks and Banking 7 2013-01-01 2013-01-01 false Discovery limitations. 747.100 Section 747.100... Practice and Procedure § 747.100 Discovery limitations. (a) Parties to a proceeding set forth either at § 747.1 of subpart A or in subpart C, E or G of this part may obtain discovery only through the...

  3. 12 CFR 747.100 - Discovery limitations.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 12 Banks and Banking 6 2010-01-01 2010-01-01 false Discovery limitations. 747.100 Section 747.100... Practice and Procedure § 747.100 Discovery limitations. (a) Parties to a proceeding set forth either at § 747.1 of subpart A or in subpart C, E or G of this part may obtain discovery only through the...

  4. Mining large heterogeneous data sets in drug discovery.

    PubMed

    Wild, David J

    2009-10-01

    Increasingly, effective drug discovery involves the searching and data mining of large volumes of information from many sources covering the domains of chemistry, biology and pharmacology amongst others. This has led to a proliferation of databases and data sources relevant to drug discovery. This paper provides a review of the publicly-available large-scale databases relevant to drug discovery, describes the kinds of data mining approaches that can be applied to them and discusses recent work in integrative data mining that looks for associations that span multiple sources, including the use of Semantic Web techniques. The future of mining large data sets for drug discovery requires intelligent, semantic aggregation of information from all of the data sources described in this review, along with the application of advanced methods such as intelligent agents and inference engines in client applications.

  5. Integrating sampling techniques and inverse virtual screening: toward the discovery of artificial peptide-based receptors for ligands.

    PubMed

    Pérez, Germán M; Salomón, Luis A; Montero-Cabrera, Luis A; de la Vega, José M García; Mascini, Marcello

    2016-05-01

    A novel heuristic using an iterative select-and-purge strategy is proposed. It combines statistical techniques for sampling and classification by rigid molecular docking through an inverse virtual screening scheme. This approach aims at the de novo discovery of short peptides that may act as docking receptors for small target molecules when there are no data available about known association complexes between them. The algorithm performs an unbiased stochastic exploration of the sample space, acting as a binary classifier when analyzing the entire peptide population. It uses a novel and effective criterion for weighting the likelihood of a given peptide to form an association complex with a particular ligand molecule based on amino acid sequences. The exploratory analysis relies on chemical information of peptide composition, sequence patterns, and association free energies (docking scores) in order to converge to those peptides forming the association complexes with higher affinities. Statistical estimations support these results, providing an association probability by improving prediction accuracy even in cases where only a fraction of all possible combinations are sampled. The false positives/false negatives ratio was also improved with this method. A simple rigid-body docking approach together with the proper information about amino acid sequences was used. The methodology was applied in a retrospective docking study to all 8000 possible tripeptide combinations using the 20 natural amino acids, screened against a training set of 77 different ligands with diverse functional groups. Afterward, all tripeptides were screened against a test set of 82 ligands, also containing different functional groups. Results show that our integrated methodology is capable of finding a representative group of the top-scoring tripeptides. The associated probability of identifying the best receptor or a group of the top-ranked receptors is more than double and about 10 times higher, respectively, when compared to classical random sampling methods.
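
    The size of the search space and the random-sampling baseline mentioned in this abstract are easy to reproduce. The sketch below enumerates the 20^3 = 8000 tripeptides from the one-letter codes of the 20 natural amino acids and computes the chance that a uniform random sample drawn without replacement contains at least one of the top-ranked peptides. It does not implement the authors' select-and-purge heuristic; the function name and the sample sizes are hypothetical.

```python
import itertools

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes, 20 natural amino acids

# Every ordered sequence of three residues: 20 ** 3 tripeptides.
tripeptides = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=3)]
print(len(tripeptides))


def prob_hit_by_random_sampling(sample_size, top_k, population=8000):
    """Chance that a uniform sample without replacement contains
    at least one of the top_k ranked items (hypergeometric baseline)."""
    prob_none = 1.0
    for i in range(sample_size):
        # Probability that the next draw also misses all top_k items.
        prob_none *= (population - top_k - i) / (population - i)
    return 1.0 - prob_none


# Baseline for sampling 10% of the space (illustrative sample size).
print(prob_hit_by_random_sampling(800, top_k=1))
print(prob_hit_by_random_sampling(800, top_k=10))
```

    For a single best receptor the baseline reduces to the sampled fraction of the population, which gives a reference point for the improvement factors quoted in the abstract.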

  6. 18 CFR 385.410 - Objections to discovery, motions to quash or to compel, and protective orders (Rule 410).

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 18 Conservation of Power and Water Resources 1 2010-04-01 2010-04-01 false Objections to discovery... RULES OF PRACTICE AND PROCEDURE Discovery Procedures for Matters Set for Hearing Under Subpart E § 385.410 Objections to discovery, motions to quash or to compel, and protective orders (Rule 410). (a...

  7. WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches

    PubMed Central

    Romer, Katherine A.; Kayombya, Guy-Richard; Fraenkel, Ernest

    2007-01-01

    WebMOTIFS provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Several studies have shown that the accuracy of motif discovery can be significantly improved by using multiple de novo motif discovery programs and using randomized control calculations to identify the most significant motifs or by using Bayesian approaches. WebMOTIFS makes it easy to apply these strategies. Using a single submission form, users can run several motif discovery programs and score, cluster and visualize the results. In addition, the Bayesian motif discovery program THEME can be used to determine the class of transcription factors that is most likely to regulate a set of sequences. Input can be provided as a list of gene or probe identifiers. Used with the default settings, WebMOTIFS accurately identifies biologically relevant motifs from diverse data in several species. WebMOTIFS is freely available at http://fraenkel.mit.edu/webmotifs. PMID:17584794

  8. "Discoveries in Planetary Sciences": Slide Sets Highlighting New Advances for Astronomy Educators

    NASA Astrophysics Data System (ADS)

    Brain, D. A.; Schneider, N. M.; Beyer, R. A.

    2010-12-01

    Planetary science is a field that evolves rapidly, motivated by spacecraft mission results. Exciting new mission results are generally communicated rather quickly to the public in the form of press releases and news stories, but it can take several years for new advances to work their way into college textbooks. Yet it is important for students to have exposure to these new advances for a number of reasons. In some cases, new work renders older textbook knowledge incorrect or incomplete. In some cases, new discoveries make it possible to emphasize older textbook knowledge in a new way. In all cases, new advances provide exciting and accessible examples of the scientific process in action. To bridge the gap between textbooks and new advances in planetary sciences we have developed content on new discoveries for use by undergraduate instructors. Called 'Discoveries in Planetary Sciences', each new discovery is summarized in a 3-slide PowerPoint presentation. The first slide describes the discovery, the second slide discusses the underlying planetary science concepts, and the third presents the big picture implications of the discovery. A fourth slide includes links to associated press releases, images, and primary sources. This effort is generously sponsored by the Division for Planetary Sciences of the American Astronomical Society, and the slide sets are available at http://dps.aas.org/education/dpsdisc/. Sixteen slide sets have been released so far covering topics spanning all sub-disciplines of planetary science. Results from the following spacecraft missions have been highlighted: MESSENGER, the Spirit and Opportunity rovers, Cassini, LCROSS, EPOXI, Chandrayaan, Mars Reconnaissance Orbiter, Mars Express, and Venus Express. Additionally, new results from Earth-orbiting and ground-based observing platforms and programs such as Hubble, Keck, IRTF, the Catalina Sky Survey, HARPS, MEarth, Spitzer, and amateur astronomers have been highlighted. Four to five new slide sets are scheduled for release before December 2010. In this presentation we will discuss our motivation for this project, our implementation approach (from choosing topics to creating the slide sets, to getting them reviewed and released), and give examples of slide sets. We will present information in the form of web statistics on how many educators are using the slide sets, and which topics are most popular. We will also present feedback from educators who have used them in the classroom, and possible new directions for our activity.

  9. Discovering Condition-Specific Gene Co-Expression Patterns Using Gaussian Mixture Models: A Cancer Case Study.

    PubMed

    Ficklin, Stephen P; Dunwoodie, Leland J; Poehlman, William L; Watson, Christopher; Roche, Kimberly E; Feltus, F Alex

    2017-08-17

    A gene co-expression network (GCN) describes associations between genes and points to genetic coordination of biochemical pathways. However, genetic correlations in a GCN are only detectable if they are present in the sampled conditions. With the increasing quantity of gene expression samples available in public repositories, there is greater potential for discovery of genetic correlations from a variety of biologically interesting conditions. However, even if gene correlations are present, their discovery can be masked by noise. Noise is introduced from natural variation (intrinsic and extrinsic), systematic variation (caused by sample measurement protocols and instruments), and algorithmic and statistical variation created by selection of data processing tools. A variety of published studies, approaches and methods attempt to address each of these contributions of variation to reduce noise. Here we describe an approach using Gaussian Mixture Models (GMMs) to address natural extrinsic (condition-specific) variation during network construction from mixed input conditions. To demonstrate utility, we build and analyze a condition-annotated GCN from a compendium of 2,016 mixed gene expression data sets from five tumor subtypes obtained from The Cancer Genome Atlas. Our results show that GMMs help discover tumor subtype specific gene co-expression patterns (modules) that are significantly enriched for clinical attributes.

  10. The EIPeptiDi tool: enhancing peptide discovery in ICAT-based LC MS/MS experiments.

    PubMed

    Cannataro, Mario; Cuda, Giovanni; Gaspari, Marco; Greco, Sergio; Tradigo, Giuseppe; Veltri, Pierangelo

    2007-07-15

    Isotope-coded affinity tags (ICAT) is a method for quantitative proteomics based on differential isotopic labeling, sample digestion and mass spectrometry (MS). The method allows the identification and relative quantification of proteins present in two samples and consists of the following phases. First, cysteine residues are either labeled using the ICAT Light or ICAT Heavy reagent (having identical chemical properties but different masses). Then, after whole sample digestion, the labeled peptides are captured selectively using the biotin tag contained in both ICAT reagents. Finally, the simplified peptide mixture is analyzed by nanoscale liquid chromatography-tandem mass spectrometry (LC-MS/MS). Nevertheless, the ICAT LC-MS/MS method still suffers from insufficient sample-to-sample reproducibility on peptide identification. In particular, the number and the type of peptides identified in different experiments can vary considerably and, thus, the statistical (comparative) analysis of sample sets is very challenging. Low information overlap at the peptide and, consequently, at the protein level, is very detrimental in situations where the number of samples to be analyzed is high. We designed a method for improving the data processing and peptide identification in sample sets subjected to ICAT labeling and LC-MS/MS analysis, based on cross validating MS/MS results. Such a method has been implemented in a tool, called EIPeptiDi, which boosts the ICAT data analysis software improving peptide identification throughout the input data set. Heavy/Light (H/L) pairs quantified but not identified by the MS/MS routine, are assigned to peptide sequences identified in other samples, by using similarity criteria based on chromatographic retention time and Heavy/Light mass attributes. 
EIPeptiDi significantly improves the number of identified peptides per sample, proving that the proposed method has a considerable impact on the protein identification process and, consequently, on the amount of potentially critical information in clinical studies. The EIPeptiDi tool is available at http://bioingegneria.unicz.it/~veltri/projects/eipeptidi/ with a demo data set. EIPeptiDi significantly increases the number of peptides identified and quantified in analyzed samples, thus reducing the number of unassigned H/L pairs and allowing a better comparative analysis of sample data sets.
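    The cross-sample assignment described above can be sketched as follows. This is a minimal illustration, not the EIPeptiDi implementation: the record types, the tolerance values, the scoring rule and the example masses are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class IdentifiedPeptide:
    """A peptide identified by MS/MS in some other sample (hypothetical record)."""
    sequence: str
    rt_min: float       # chromatographic retention time, minutes
    light_mass: float   # mass of the ICAT Light form
    heavy_mass: float   # mass of the ICAT Heavy form

@dataclass
class UnassignedPair:
    """A Heavy/Light pair that was quantified but not identified."""
    rt_min: float
    light_mass: float
    heavy_mass: float

def assign_pair(pair: UnassignedPair,
                library: List[IdentifiedPeptide],
                rt_tol: float = 1.0,      # assumed tolerance, minutes
                mass_tol: float = 0.02    # assumed tolerance, mass units
                ) -> Optional[str]:
    """Return the sequence of the closest library peptide within tolerance."""
    best_seq, best_score = None, None
    for ref in library:
        d_rt = abs(pair.rt_min - ref.rt_min)
        d_l = abs(pair.light_mass - ref.light_mass)
        d_h = abs(pair.heavy_mass - ref.heavy_mass)
        if d_rt <= rt_tol and d_l <= mass_tol and d_h <= mass_tol:
            # Normalised distance: smaller means more similar.
            score = d_rt / rt_tol + d_l / mass_tol + d_h / mass_tol
            if best_score is None or score < best_score:
                best_seq, best_score = ref.sequence, score
    return best_seq

# Illustrative values only.
library = [
    IdentifiedPeptide("ACDEFGHK", 25.1, 1500.70, 1509.73),
    IdentifiedPeptide("LMNCPQR", 40.2, 1320.55, 1329.58),
]
print(assign_pair(UnassignedPair(25.4, 1500.71, 1509.74), library))
print(assign_pair(UnassignedPair(60.0, 900.10, 909.13), library))
```

    The first pair falls within both tolerances of one library peptide and inherits its sequence; the second has no candidate and stays unassigned.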

  11. Clinical response to PD-1 blockade correlates with a sub-fraction of peripheral central memory CD4+ T cells in patients with malignant melanoma.

    PubMed

    Takeuchi, Yoshiko; Tanemura, Atsushi; Tada, Yasuko; Katayama, Ichiro; Kumanogoh, Atsushi; Nishikawa, Hiroyoshi

    2018-02-03

    Cancer immunotherapy that blocks immune checkpoint molecules, such as PD-1/PD-L1, unleashes dysfunctional antitumor T-cell responses and has durable clinical benefits in various types of cancers. Yet its clinical efficacy is limited to a small proportion of patients, highlighting the need for identifying biomarkers that can predict the clinical response by exploring antitumor responses crucial for tumor regression. Here, we explored comprehensive immune-cell responses associated with clinical benefits using PBMCs from patients with malignant melanoma treated with anti-PD-1 monoclonal antibody. Pre- and post-treatment samples were collected from two different cohorts (discovery set and validation set) and subjected to mass cytometry assays that measured the expression levels of 35 proteins. Screening by high dimensional clustering in the discovery set identified increases in three micro-clusters of CD4+ T cells, a subset of central memory CD4+ T cells harboring the CD27+FAS-CD45RA-CCR7+ phenotype, after treatment in long-term survivors, but not in non-responders. The same increase was also observed in clinical responders in the validation set. We propose that increases in this subset of central memory CD4+ T cells in peripheral blood can be potentially used as a predictor of clinical response to PD-1 blockade therapy in patients with malignant melanoma.

  12. Discovery and Targeted Proteomics on Cutaneous Biopsies Infected by Borrelia to Investigate Lyme Disease

    PubMed Central

    Schnell, Gilles; Boeuf, Amandine; Westermann, Benoît; Jaulhac, Benoît; Lipsker, Dan; Carapito, Christine; Boulanger, Nathalie; Ehret-Sabatier, Laurence

    2015-01-01

    Lyme disease is the most important vector-borne disease in the Northern hemisphere and represents a major public health challenge with insufficient means of reliable diagnosis. Skin is rarely investigated in proteomics but constitutes in the case of Lyme disease the key interface where the pathogens can enter, persist, and multiply. Therefore, we investigated proteomics on skin samples to detect Borrelia proteins directly in cutaneous biopsies in a robust and specific way. We first set up a discovery gel prefractionation-LC-MS/MS approach on a murine model infected by Borrelia burgdorferi sensu stricto that allowed the identification of 25 Borrelia proteins among more than 1300 mouse proteins. Then we developed a targeted gel prefractionation-LC-selected reaction monitoring (SRM) assay to detect 9/33 Borrelia proteins/peptides in mouse skin tissue samples using heavy labeled synthetic peptides. We successfully transferred this assay from the mouse model to human skin biopsies (naturally infected by Borrelia), and we were able to detect two Borrelia proteins: OspC and flagellin. Considering the extreme variability of OspC, we developed an extended SRM assay to target a large set of variants. This assay afforded the detection of nine peptides belonging to either OspC or flagellin in human skin biopsies. We further shortened the sample preparation and showed that Borrelia is detectable in mouse and human skin biopsies by directly using a liquid digestion followed by LC-SRM analysis without any prefractionation. This study thus shows that a targeted SRM approach is a promising tool for the early direct diagnosis of Lyme disease with high sensitivity (<10 fmol of OspC/mg of human skin biopsy). PMID:25713121

  13. Building Format-Agnostic Metadata Repositories

    NASA Astrophysics Data System (ADS)

    Cechini, M.; Pilone, D.

    2010-12-01

    This presentation will discuss the problems that surround persisting and discovering metadata in multiple formats; a set of tenets that must be addressed in a solution; and NASA’s Earth Observing System (EOS) ClearingHOuse’s (ECHO) proposed approach. In order to facilitate cross-discipline data analysis, Earth Scientists will potentially interact with more than one data source. The most common data discovery paradigm relies on services and/or applications facilitating the discovery and presentation of metadata. What may not be common are the formats in which the metadata are formatted. As the number of sources and datasets utilized for research increases, it becomes more likely that a researcher will encounter conflicting metadata formats. Metadata repositories, such as the EOS ClearingHOuse (ECHO), along with data centers, must identify ways to address this issue. In order to define the solution to this problem, the following tenets are identified:
    - There exists a set of ‘core’ metadata fields recommended for data discovery.
    - There exists a set of users who will require the entire metadata record for advanced analysis.
    - There exists a set of users who will require a ‘core’ set of metadata fields for discovery only.
    - There will never be a cessation of new formats or a total retirement of all old formats.
    - Users should be presented metadata in a consistent format.
    ECHO has undertaken an effort to transform its metadata ingest and discovery services in order to support the growing set of metadata formats. In order to address the previously listed items, ECHO’s new metadata processing paradigm utilizes the following approach:
    - Identify a cross-format set of ‘core’ metadata fields necessary for discovery.
    - Implement format-specific indexers to extract the ‘core’ metadata fields into an optimized query capability.
    - Archive the original metadata in its entirety for presentation to users requiring the full record.
    - Provide on-demand translation of ‘core’ metadata to any supported result format.
    With this identified approach, the Earth Scientist is provided with a consistent data representation as they interact with a variety of datasets that utilize multiple metadata formats. They are then able to focus their efforts on the more critical research activities which they are undertaking.
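    The four-step approach listed in this abstract can be sketched in a few lines. This is a toy illustration only: the two metadata formats, their field names, the choice of ‘core’ fields and the class name are hypothetical and are not taken from ECHO.

```python
from typing import Callable, Dict, List, Tuple

# Format-specific indexers: each maps a native record to the same
# hypothetical set of 'core' fields (title, start, end).
def index_format_a(record: dict) -> dict:
    return {"title": record["EntryTitle"],
            "start": record["Temporal"]["Begin"],
            "end": record["Temporal"]["End"]}

def index_format_b(record: dict) -> dict:
    return {"title": record["name"],
            "start": record["time_range"][0],
            "end": record["time_range"][1]}

# Renderers: translate 'core' fields into a requested result format.
def render_format_a(core: dict) -> dict:
    return {"EntryTitle": core["title"],
            "Temporal": {"Begin": core["start"], "End": core["end"]}}

def render_format_b(core: dict) -> dict:
    return {"name": core["title"], "time_range": [core["start"], core["end"]]}

INDEXERS: Dict[str, Callable[[dict], dict]] = {
    "format_a": index_format_a, "format_b": index_format_b}
RENDERERS: Dict[str, Callable[[dict], dict]] = {
    "format_a": render_format_a, "format_b": render_format_b}

class MetadataRepository:
    def __init__(self) -> None:
        self._core: Dict[str, dict] = {}                     # query index
        self._originals: Dict[str, Tuple[str, dict]] = {}    # full archive

    def ingest(self, record_id: str, fmt: str, record: dict) -> None:
        self._core[record_id] = INDEXERS[fmt](record)   # extract core fields
        self._originals[record_id] = (fmt, record)      # keep the original whole

    def search(self, text: str) -> List[str]:
        """Discovery runs against core fields only, whatever the source format."""
        text = text.lower()
        return sorted(rid for rid, core in self._core.items()
                      if text in core["title"].lower())

    def original(self, record_id: str) -> Tuple[str, dict]:
        """Full record, untouched, for users who need everything."""
        return self._originals[record_id]

    def render(self, record_id: str, fmt: str) -> dict:
        """On-demand translation of core fields to a supported format."""
        return RENDERERS[fmt](self._core[record_id])

repo = MetadataRepository()
repo.ingest("rec-1", "format_a", {
    "EntryTitle": "Ocean Colour Level 3",
    "Temporal": {"Begin": "2002-07-04", "End": "2010-12-01"},
    "Sensor": "example"})
repo.ingest("rec-2", "format_b", {
    "name": "Ocean Surface Winds", "time_range": ["1999-07-19", "2009-11-23"]})

print(repo.search("ocean"))
print(repo.render("rec-1", "format_b"))
```

    Fields outside the core set (such as `Sensor` here) are not indexed, but they survive in the archived original, which is what lets the repository serve both discovery-only users and users who need the full record.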

  14. 18 CFR 385.402 - Scope of discovery (Rule 402).

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... Matters Set for Hearing Under Subpart E § 385.402 Scope of discovery (Rule 402). (a) General. Unless... Rule 410(c), participants may obtain discovery of any matter, not privileged, that is relevant to the subject matter of the pending proceeding, including the existence, description, nature, custody, condition...

  15. 18 CFR 385.402 - Scope of discovery (Rule 402).

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... Matters Set for Hearing Under Subpart E § 385.402 Scope of discovery (Rule 402). (a) General. Unless... Rule 410(c), participants may obtain discovery of any matter, not privileged, that is relevant to the subject matter of the pending proceeding, including the existence, description, nature, custody, condition...

  16. 18 CFR 385.402 - Scope of discovery (Rule 402).

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... Matters Set for Hearing Under Subpart E § 385.402 Scope of discovery (Rule 402). (a) General. Unless... Rule 410(c), participants may obtain discovery of any matter, not privileged, that is relevant to the subject matter of the pending proceeding, including the existence, description, nature, custody, condition...

  17. 18 CFR 385.402 - Scope of discovery (Rule 402).

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... Matters Set for Hearing Under Subpart E § 385.402 Scope of discovery (Rule 402). (a) General. Unless... Rule 410(c), participants may obtain discovery of any matter, not privileged, that is relevant to the subject matter of the pending proceeding, including the existence, description, nature, custody, condition...

  18. MAVTgsa: An R Package for Gene Set (Enrichment) Analysis

    DOE PAGES

    Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; ...

    2014-01-01

    Gene set analysis methods aim to determine whether an a priori defined set of genes shows a statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both directions, up- and downregulation, when studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or are associated with continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-values for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.
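    For readers unfamiliar with FDR q-values, the following sketch shows how they can be derived from a list of per-gene-set P values. It uses the Benjamini-Hochberg step-up adjustment, which is an assumption made here for illustration; the abstract does not say which FDR procedure MAVTgsa applies, and the example P values are invented.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted P values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals

# Invented P values for four hypothetical gene sets.
gene_set_p = [0.01, 0.04, 0.03, 0.005]
print(bh_qvalues(gene_set_p))
```

    Each q-value is the smallest false discovery rate at which the corresponding gene set would still be called significant.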

  19. A software suite for the generation and comparison of peptide arrays from sets of data collected by liquid chromatography-mass spectrometry.

    PubMed

    Li, Xiao-jun; Yi, Eugene C; Kemp, Christopher J; Zhang, Hui; Aebersold, Ruedi

    2005-09-01

    There is an increasing interest in the quantitative proteomic measurement of the protein contents of substantially similar biological samples, e.g. for the analysis of cellular response to perturbations over time or for the discovery of protein biomarkers from clinical samples. Technical limitations of current proteomic platforms such as limited reproducibility and low throughput make this a challenging task. A new LC-MS-based platform is able to generate complex peptide patterns from the analysis of proteolyzed protein samples at high throughput and represents a promising approach for quantitative proteomics. A crucial component of the LC-MS approach is the accurate evaluation of the abundance of detected peptides over many samples and the identification of peptide features that can stratify samples with respect to their genetic, physiological, or environmental origins. We present here a new software suite, SpecArray, that generates a peptide versus sample array from a set of LC-MS data. A peptide array stores the relative abundance of thousands of peptide features in many samples and is in a format identical to that of a gene expression microarray. A peptide array can be subjected to an unsupervised clustering analysis to stratify samples or to a discriminant analysis to identify discriminatory peptide features. We applied the SpecArray to analyze two sets of LC-MS data: one was from four repeat LC-MS analyses of the same glycopeptide sample, and another was from LC-MS analysis of serum samples of five male and five female mice. We demonstrate through these two study cases that the SpecArray software suite can serve as an effective software platform in the LC-MS approach for quantitative proteomics.

  20. Does Discovery-Based Instruction Enhance Learning?

    ERIC Educational Resources Information Center

    Alfieri, Louis; Brooks, Patricia J.; Aldrich, Naomi J.; Tenenbaum, Harriet R.

    2011-01-01

    Discovery learning approaches to education have recently come under scrutiny (Tobias & Duffy, 2009), with many studies indicating limitations to discovery learning practices. Therefore, 2 meta-analyses were conducted using a sample of 164 studies: The 1st examined the effects of unassisted discovery learning versus explicit instruction, and the…

  1. Public-private relationships in biobanking: a still underestimated key component of open innovation.

    PubMed

    Hofman, Paul; Bréchot, Christian; Zatloukal, Kurt; Dagher, Georges; Clément, Bruno

    2014-01-01

    Access to human bioresources is essential to the understanding of human diseases and to the discovery of new biomarkers aimed at improving the diagnosis, prognosis, and the predictive response of patients to treatments. The use of biospecimens is strictly controlled by ethical assessment, which complies with the laws of the country. These laws regulate the partnerships between the biobanks and industrial actors. However, private-public partnerships (PPP) can be limiting for several reasons, which can hamper the discovery of new biological tests and new active molecules targeted to human diseases. The bottlenecks and roadblocks in establishing these partnerships include: poor organization of the biobank in setting up PPP, evaluation of the cost of human samples, the absence of experience on the public side in setting up contracts with industry, and the fact that public and private partners may not share the same objectives. However, it is critical, in particular for academic biobanks, to establish strong PPP to accelerate translational research for the benefits of patients, and to allow the sustainability of the biobank. The purpose of this review is to discuss the main bottlenecks and roadblocks that can hamper the establishment of PPP based on solid and trusting relationships.

  2. Genome-Wide Methylation Analyses in Glioblastoma Multiforme

    PubMed Central

    Lai, Rose K.; Chen, Yanwen; Guan, Xiaowei; Nousome, Darryl; Sharma, Charu; Canoll, Peter; Bruce, Jeffrey; Sloan, Andrew E.; Cortes, Etty; Vonsattel, Jean-Paul; Su, Tao; Delgado-Cruzata, Lissette; Gurvich, Irina; Santella, Regina M.; Ostrom, Quinn; Lee, Annette; Gregersen, Peter; Barnholtz-Sloan, Jill

    2014-01-01

    Few studies had investigated genome-wide methylation in glioblastoma multiforme (GBM). Our goals were to study differential methylation across the genome in gene promoters using an array-based method, as well as repetitive elements using surrogate global methylation markers. The discovery sample set for this study consisted of 54 GBM from Columbia University and Case Western Reserve University, and 24 brain controls from the New York Brain Bank. We assembled a validation dataset using methylation data of 162 TCGA GBM and 140 brain controls from dbGAP. HumanMethylation27 Analysis Bead-Chips (Illumina) were used to interrogate 26,486 informative CpG sites in both the discovery and validation datasets. Global methylation levels were assessed by analysis of L1 retrotransposon (LINE1), 5 methyl-deoxycytidine (5m-dC) and 5 hydroxylmethyl-deoxycytidine (5hm-dC) in the discovery dataset. We validated a total of 1548 CpG sites (1307 genes) that were differentially methylated in GBM compared to controls. There were more than twice as many hypomethylated genes as hypermethylated ones. Both the discovery and validation datasets found 5 tumor methylation classes. Pathway analyses showed that the top ten pathways in hypomethylated genes were all related to functions of innate and acquired immunities. Among hypermethylated pathways, transcriptional regulatory network in embryonic stem cells was the most significant. In the study of global methylation markers, 5m-dC level was the best discriminant among methylation classes, whereas in survival analyses, high level of LINE1 methylation was an independent, favorable prognostic factor in the discovery dataset. Based on a pathway approach, hypermethylation in genes that control stem cell differentiation were significant, poor prognostic factors of overall survival in both the discovery and validation datasets. Approaches that targeted these methylated genes may be a future therapeutic goal. PMID:24586730

  3. 120 YEARS SINCE THE DISCOVERY OF X-RAYS.

    PubMed

    Babic, Rade R; Stankovic Babic, Gordana; Babic, Strahinja R; Babic, Nevena R

    2016-09-01

    This paper is intended to celebrate the 120th anniversary of the discovery of X-rays. X-rays (Roentgen rays) were discovered on the 8th of November, 1895 by the German physicist Wilhelm Conrad Roentgen. Fifty days after the discovery, on December 28, 1895, Wilhelm Conrad Roentgen published a paper about the discovery of X-rays, "On a new kind of rays" (Wilhelm Conrad Roentgen: Über eine neue Art von Strahlen. In: Sitzungsberichte der Würzburger Physik.-Medic.-Gesellschaft. 1895.). Therefore, the 28th of December, 1895 was taken as the date of the discovery of X-rays. This paper describes the work of Wilhelm Conrad Roentgen, Nikola Tesla, Mihajlo Pupin and Maria Sklodowska-Curie on the nature of X-rays. The fantastic four, Wilhelm Conrad Roentgen, Nikola Tesla, Mihajlo Idvorski Pupin and Maria Sklodowska-Curie, set the foundation of radiology with their discovery and study of X-rays. Five years after the discovery of X-rays, in 1900, Dr Avram Vinaver had the first X-ray machine installed in Šabac, in Serbia, at a time when many developed countries did not have an X-ray machine, and thus set the foundation of radiology in Serbia.

  4. Recent development in software and automation tools for high-throughput discovery bioanalysis.

    PubMed

    Shou, Wilson Z; Zhang, Jun

    2012-05-01

    Bioanalysis with LC-MS/MS has been established as the method of choice for quantitative determination of drug candidates in biological matrices in drug discovery and development. The LC-MS/MS bioanalytical support for drug discovery, especially for early discovery, often requires high-throughput (HT) analysis of large numbers of samples (hundreds to thousands per day) generated from many structurally diverse compounds (tens to hundreds per day) with a very quick turnaround time, in order to provide important activity and liability data to move discovery projects forward. Another important consideration for discovery bioanalysis is its fit-for-purpose quality requirement depending on the particular experiments being conducted at this stage, and it is usually not as stringent as those required in bioanalysis supporting drug development. These aforementioned attributes of HT discovery bioanalysis made it an ideal candidate for using software and automation tools to eliminate manual steps, remove bottlenecks, improve efficiency and reduce turnaround time while maintaining adequate quality. In this article we will review various recent developments that facilitate automation of individual bioanalytical procedures, such as sample preparation, MS/MS method development, sample analysis and data review, as well as fully integrated software tools that manage the entire bioanalytical workflow in HT discovery bioanalysis. In addition, software tools supporting the emerging high-resolution accurate MS bioanalytical approach are also discussed.

  5. How iSamples (Internet of Samples in the Earth Sciences) Improves Sample and Data Stewardship in the Next Generation of Geoscientists

    NASA Astrophysics Data System (ADS)

    Hallett, B. W.; Dere, A. L. D.; Lehnert, K.; Carter, M.

    2016-12-01

    Vast numbers of physical samples are routinely collected by geoscientists to probe key scientific questions related to global climate change, biogeochemical cycles, magmatic processes, mantle dynamics, etc. Despite their value as irreplaceable records of nature the majority of these samples remain undiscoverable by the broader scientific community because they lack a digital presence or are not well-documented enough to facilitate their discovery and reuse for future scientific and educational use. The NSF EarthCube iSamples Research Coordination Network seeks to develop a unified approach across all Earth Science disciplines for the registration, description, identification, and citation of physical specimens in order to take advantage of the new opportunities that cyberinfrastructure offers. Even as consensus around best practices begins to emerge, such as the use of the International Geo Sample Number (IGSN), more work is needed to communicate these practices to investigators to encourage widespread adoption. Recognizing the importance of students and early career scientists in particular to transforming data and sample management practices, the iSamples Education and Training Working Group is developing training modules for sample collection, documentation, and management workflows. These training materials are made available to educators/research supervisors online at http://earthcube.org/group/isamples and can be modularized for supervisors to create a customized research workflow. This study details the design and development of several sample management tutorials, created by early career scientists and documented in collaboration with undergraduate research students in field and lab settings. Modules under development focus on rock outcrops, rock cores, soil cores, and coral samples, with an emphasis on sample management throughout the collection, analysis and archiving process. 
We invite others to share their sample management/registration workflows and to develop training modules. This educational approach, with evolving digital materials, can help prepare future scientists to perform research in a way that will contribute to EarthCube data integration and discovery.

  6. Differential effects of common variants in SCN2A on general cognitive ability, brain physiology, and messenger RNA expression in schizophrenia cases and control individuals.

    PubMed

    Dickinson, Dwight; Straub, Richard E; Trampush, Joey W; Gao, Yuan; Feng, Ningping; Xie, Bin; Shin, Joo Heon; Lim, Hun Ki; Ursini, Gianluca; Bigos, Kristin L; Kolachana, Bhaskar; Hashimoto, Ryota; Takeda, Masatoshi; Baum, Graham L; Rujescu, Dan; Callicott, Joseph H; Hyde, Thomas M; Berman, Karen F; Kleinman, Joel E; Weinberger, Daniel R

    2014-06-01

    One approach to understanding the genetic complexity of schizophrenia is to study associated behavioral and biological phenotypes that may be more directly linked to genetic variation. The objective was to identify single-nucleotide polymorphisms associated with general cognitive ability (g) in people with schizophrenia and control individuals. The design was a genomewide association study, followed by analyses in unaffected siblings and independent schizophrenia samples, functional magnetic resonance imaging studies of brain physiology in vivo, and RNA sequencing in postmortem brain samples. The discovery cohort and unaffected siblings were participants in the National Institute of Mental Health Clinical Brain Disorders Branch schizophrenia genetics studies. Additional schizophrenia cohorts were from psychiatric treatment settings in the United States, Japan, and Germany. The discovery cohort comprised 339 participants with schizophrenia and 363 community control participants. Follow-up analyses studied 147 unaffected siblings of the schizophrenia cases and independent schizophrenia samples including a total of an additional 668 participants. Imaging analyses included 87 schizophrenia cases and 397 control individuals. Brain tissue samples were available for 64 cases and 61 control individuals. We studied genomewide association with g, by group, in the discovery cohort. We used selected genotypes to test specific associations in unaffected siblings and independent schizophrenia samples. Imaging analyses focused on activation in the prefrontal cortex during working memory. Brain tissue studies yielded messenger RNA expression levels for RefSeq transcripts. The schizophrenia discovery cohort showed genomewide-significant association of g with polymorphisms in sodium channel gene SCN2A, accounting for 10.4% of g variance (rs10174400, P = 9.27 × 10^-10). Control individuals showed a trend for g/genotype association with reversed allelic directionality. The genotype-by-group interaction was also genomewide significant (P = 1.75 × 10^-9). Siblings showed a genotype association with g parallel to the schizophrenia group and the same interaction pattern. Parallel, but weaker, associations with cognition were found in independent schizophrenia samples. Imaging analyses showed a similar pattern of genotype associations by group and genotype-by-group interaction. Sequencing of RNA in brain revealed reduced expression in 2 of 3 SCN2A alternative transcripts in the patient group, with genotype-by-group interaction, that again paralleled the cognition effects. The findings implicate SCN2A and sodium channel biology in cognitive impairment in schizophrenia cases and unaffected relatives and may facilitate development of cognition-enhancing treatments.

  7. Discovering discovery patterns with Predication-based Semantic Indexing.

    PubMed

    Cohen, Trevor; Widdows, Dominic; Schvaneveldt, Roger W; Davies, Peter; Rindflesch, Thomas C

    2012-12-01

    In this paper we utilize methods of hyperdimensional computing to mediate the identification of therapeutically useful connections for the purpose of literature-based discovery. Our approach, named Predication-based Semantic Indexing, is utilized to identify empirically sequences of relationships known as "discovery patterns", such as "drug x INHIBITS substance y, substance y CAUSES disease z" that link pharmaceutical substances to diseases they are known to treat. These sequences are derived from semantic predications extracted from the biomedical literature by the SemRep system, and subsequently utilized to direct the search for known treatments for a held out set of diseases. Rapid and efficient inference is accomplished through the application of geometric operators in PSI space, allowing for both the derivation of discovery patterns from a large set of known TREATS relationships, and the application of these discovered patterns to constrain search for therapeutic relationships at scale. Our results include the rediscovery of discovery patterns that have been constructed manually by other authors in previous research, as well as the discovery of a set of previously unrecognized patterns. The application of these patterns to direct search through PSI space results in better recovery of therapeutic relationships than is accomplished with models based on distributional statistics alone. These results demonstrate the utility of efficient approximate inference in geometric space as a means to identify therapeutic relationships, suggesting a role of these methods in drug repurposing efforts. In addition, the results provide strong support for the utility of the discovery pattern approach pioneered by Hristovski and his colleagues.
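    To make the notion of following a discovery pattern with geometric operators concrete, here is a small sketch in the style of hyperdimensional computing. It assumes random bipolar vectors, elementwise multiplication as the binding operator and summation as bundling; these are common choices in the field, but they are assumptions made here and not necessarily the representation or operators used in the paper. All concept names and the second predication for "drug_x" are invented.

```python
import math
import random

DIM = 2000                      # assumed dimensionality
rng = random.Random(42)

def random_vector():
    """Random bipolar (+1/-1) elemental vector."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):
    """Elementwise product; with bipolar vectors it is its own inverse."""
    return [x * y for x, y in zip(a, b)]

def bundle(*vectors):
    """Superpose several vectors by elementwise addition."""
    return [sum(components) for components in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

concepts = ["drug_x", "substance_y", "substance_w", "disease_z", "disease_q"]
elemental = {name: random_vector() for name in concepts}
predicate = {name: random_vector() for name in ["INHIBITS", "CAUSES", "INTERACTS_WITH"]}

# Encode predications: each subject's semantic vector bundles bind(predicate, object).
semantic = {
    "drug_x": bundle(bind(predicate["INHIBITS"], elemental["substance_y"]),
                     bind(predicate["INTERACTS_WITH"], elemental["substance_w"])),
    "substance_y": bind(predicate["CAUSES"], elemental["disease_z"]),
}

def nearest(vector):
    """Concept whose elemental vector is most similar to the query vector."""
    return max(concepts, key=lambda name: cosine(vector, elemental[name]))

# Follow the pattern "drug x INHIBITS substance y, substance y CAUSES disease z".
step1 = nearest(bind(semantic["drug_x"], predicate["INHIBITS"]))
step2 = nearest(bind(semantic[step1], predicate["CAUSES"]))
print(step1, step2)
```

    Unbinding with the INHIBITS vector leaves a noisy copy of the object's elemental vector, so a nearest-neighbour search recovers "substance_y"; repeating the step with CAUSES reaches "disease_z" without any explicit graph traversal.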

  8. Discovering discovery patterns with predication-based Semantic Indexing

    PubMed Central

    Cohen, Trevor; Widdows, Dominic; Schvaneveldt, Roger W.; Davies, Peter; Rindflesch, Thomas C.

    2012-01-01

    In this paper we utilize methods of hyperdimensional computing to mediate the identification of therapeutically useful connections for the purpose of literature-based discovery. Our approach, named Predication-based Semantic Indexing, is utilized to identify empirically sequences of relationships known as “discovery patterns”, such as “drug x INHIBITS substance y, substance y CAUSES disease z” that link pharmaceutical substances to diseases they are known to treat. These sequences are derived from semantic predications extracted from the biomedical literature by the SemRep system, and subsequently utilized to direct the search for known treatments for a held out set of diseases. Rapid and efficient inference is accomplished through the application of geometric operators in PSI space, allowing for both the derivation of discovery patterns from a large set of known TREATS relationships, and the application of these discovered patterns to constrain search for therapeutic relationships at scale. Our results include the rediscovery of discovery patterns that have been constructed manually by other authors in previous research, as well as the discovery of a set of previously unrecognized patterns. The application of these patterns to direct search through PSI space results in better recovery of therapeutic relationships than is accomplished with models based on distributional statistics alone. These results demonstrate the utility of efficient approximate inference in geometric space as a means to identify therapeutic relationships, suggesting a role of these methods in drug repurposing efforts. In addition, the results provide strong support for the utility of the discovery pattern approach pioneered by Hristovski and his colleagues. PMID:22841748

  9. Immunophenotype Discovery, Hierarchical Organization, and Template-Based Classification of Flow Cytometry Samples

    DOE PAGES

    Azad, Ariful; Rajwa, Bartek; Pothen, Alex

    2016-08-31

    We describe algorithms for discovering immunophenotypes from large collections of flow cytometry samples and using them to organize the samples into a hierarchy based on phenotypic similarity. The hierarchical organization is helpful for effective and robust cytometry data mining, including the creation of collections of cell populations characteristic of different classes of samples, robust classification, and anomaly detection. We summarize a set of samples belonging to a biological class or category with a statistically derived template for the class. Whereas individual samples are represented in terms of their cell populations (clusters), a template consists of generic meta-populations (a group of homogeneous cell populations obtained from the samples in a class) that describe key phenotypes shared among all those samples. We organize an FC data collection in a hierarchical data structure that supports the identification of immunophenotypes relevant to clinical diagnosis. A robust template-based classification scheme is also developed, but our primary focus is on the discovery of phenotypic signatures and inter-sample relationships in an FC data collection. This collective analysis approach is more efficient and robust since templates describe phenotypic signatures common to cell populations in several samples while ignoring noise and small sample-specific variations. We have applied the template-based scheme to analyze several datasets, including one representing a healthy immune system and one of acute myeloid leukemia (AML) samples. The last task is challenging due to the phenotypic heterogeneity of the several subtypes of AML. However, we identified thirteen immunophenotypes corresponding to subtypes of AML and were able to distinguish acute promyelocytic leukemia (APL) samples with the markers provided. Clinically, this is helpful since APL has a different treatment regimen from other subtypes of AML. Core algorithms used in our data analysis are available in the flowMatch package at www.bioconductor.org. It has been downloaded nearly 6,000 times since 2014.

  10. Immunophenotype Discovery, Hierarchical Organization, and Template-Based Classification of Flow Cytometry Samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Azad, Ariful; Rajwa, Bartek; Pothen, Alex

    We describe algorithms for discovering immunophenotypes from large collections of flow cytometry samples and using them to organize the samples into a hierarchy based on phenotypic similarity. The hierarchical organization is helpful for effective and robust cytometry data mining, including the creation of collections of cell populations characteristic of different classes of samples, robust classification, and anomaly detection. We summarize a set of samples belonging to a biological class or category with a statistically derived template for the class. Whereas individual samples are represented in terms of their cell populations (clusters), a template consists of generic meta-populations (a group of homogeneous cell populations obtained from the samples in a class) that describe key phenotypes shared among all those samples. We organize an FC data collection in a hierarchical data structure that supports the identification of immunophenotypes relevant to clinical diagnosis. A robust template-based classification scheme is also developed, but our primary focus is on the discovery of phenotypic signatures and inter-sample relationships in an FC data collection. This collective analysis approach is more efficient and robust since templates describe phenotypic signatures common to cell populations in several samples while ignoring noise and small sample-specific variations. We have applied the template-based scheme to analyze several datasets, including one representing a healthy immune system and one of acute myeloid leukemia (AML) samples. The last task is challenging due to the phenotypic heterogeneity of the several subtypes of AML. However, we identified thirteen immunophenotypes corresponding to subtypes of AML and were able to distinguish acute promyelocytic leukemia (APL) samples with the markers provided. Clinically, this is helpful since APL has a different treatment regimen from other subtypes of AML. Core algorithms used in our data analysis are available in the flowMatch package at www.bioconductor.org. It has been downloaded nearly 6,000 times since 2014.

  11. Identification of potential serum peptide biomarkers of biliary tract cancer using MALDI MS profiling

    PubMed Central

    2014-01-01

    Background The aim of this discovery study was the identification of peptide serum biomarkers for detecting biliary tract cancer (BTC) using samples from healthy volunteers and benign cases of biliary disease as control groups. This work was based on the hypothesis that cancer-specific exopeptidases exist and that their activities in serum can generate cancer-predictive peptide fragments from circulating proteins during coagulation. Methods This case control study used a semi-automated platform incorporating polypeptide extraction linked to matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-TOF MS) to profile 92 patient serum samples. Predictive models were generated to test a validation serum set from BTC cases and healthy volunteers. Results Several peptide peaks were found that could significantly differentiate BTC patients from healthy controls and benign biliary disease. A predictive model resulted in a sensitivity of 100% and a specificity of 93.8% in detecting BTC in the validation set, whilst another model gave a sensitivity of 79.5% and a specificity of 83.9% in discriminating BTC from benign biliary disease samples in the training set. Discriminatory peaks were identified by tandem MS as fragments of abundant clotting proteins. Conclusions Serum MALDI MS peptide signatures can accurately discriminate patients with BTC from healthy volunteers. PMID:24495412
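The sensitivity and specificity quoted in the abstract above are ratios taken from a confusion matrix. A minimal sketch of the two definitions follows; the function names and the counts in the usage lines are illustrative assumptions and are not taken from the study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual cases that the model flags as cases."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of actual controls that the model clears as controls."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens = sensitivity(true_pos=45, false_neg=5)
spec = specificity(true_neg=40, false_pos=10)
print(f"sensitivity={sens:.1%} specificity={spec:.1%}")
```

Reporting both values matters because a model can reach high sensitivity simply by flagging every sample, at the cost of specificity.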

  12. Discrete False-Discovery Rate Improves Identification of Differentially Abundant Microbes.

    PubMed

    Jiang, Lingjing; Amir, Amnon; Morton, James T; Heller, Ruth; Arias-Castro, Ery; Knight, Rob

    2017-01-01

    Differential abundance testing is a critical task in microbiome studies that is complicated by the sparsity of data matrices. Here we adapt for microbiome studies a solution from the field of gene expression analysis to produce a new method, discrete false-discovery rate (DS-FDR), that greatly improves the power to detect differential taxa by exploiting the discreteness of the data. Additionally, DS-FDR is relatively robust to the number of noninformative features, and thus removes the problem of filtering taxonomy tables by an arbitrary abundance threshold. We show by using a combination of simulations and reanalysis of nine real-world microbiome data sets that this new method outperforms existing methods at the differential abundance testing task, producing a false-discovery rate that is up to threefold more accurate, and halves the number of samples required to find a given difference (thus increasing the efficiency of microbiome experiments considerably). We therefore expect DS-FDR to be widely applied in microbiome studies. IMPORTANCE DS-FDR can achieve higher statistical power to detect significant findings in sparse and noisy microbiome data compared to the commonly used Benjamini-Hochberg procedure and other FDR-controlling procedures.
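For orientation, the Benjamini-Hochberg procedure that the abstract above names as the common baseline can be sketched in a few lines. This is the standard step-up rule, not the DS-FDR method itself; the function name and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard step-up rule: sort the p-values, find the largest rank k
    with p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four taxa (not from the paper).
print(benjamini_hochberg([0.039, 0.001, 0.216, 0.008]))
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value passes at a higher rank.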

  13. A concept of a MIABIS based register of biosample collections at the Medical University of Innsbruck.

    PubMed

    Hofer, Philipp; Fiegl, Heidi; Angerer, Justina; Mueller-Holzner, Elisabeth; Chamson, Martina; Klocker, Helmut; Steiner, Eberhardt; Hauffe, Helga; Zschocke, Johannes; Goebel, Georg

    2014-01-01

    The knowledge about the quality of samples and associated clinical data in biospecimen collections is a premise of clinical research. An electronic biosample register aims to facilitate the discovery of information about biosample collections in a hospital. Moreover, it might improve scientific collaboration and research quality through a shared access to harmonized sample collection description data. The aim of this paper is to present a concept of a web-based biosample register of the existing biosample collections at the Medical University of Innsbruck. A uniform description model is built based on an analysis of the sample collection data of independent sample management systems from two departments within the hospital. An extended set of attributes of the minimum dataset used by the Swedish sample collection register (MIABIS) has been applied to all biosample collections as a common description model. The results of the analysis and the data model are presented together with a first concept of a sample collection search register.

  14. Effective use of metadata in the integration and analysis of multi-dimensional optical data

    NASA Astrophysics Data System (ADS)

    Pastorello, G. Z.; Gamon, J. A.

    2012-12-01

    Data discovery and integration relies on adequate metadata. However, creating and maintaining metadata is time consuming and often poorly addressed or avoided altogether, leading to problems in later data analysis and exchange. This is particularly true for research fields in which metadata standards do not yet exist or are under development, or within smaller research groups without enough resources. Vegetation monitoring using in-situ and remote optical sensing is an example of such a domain. In this area, data are inherently multi-dimensional, with spatial, temporal and spectral dimensions usually being well characterized. Other equally important aspects, however, might be inadequately translated into metadata. Examples include equipment specifications and calibrations, field/lab notes and field/lab protocols (e.g., sampling regimen, spectral calibration, atmospheric correction, sensor view angle, illumination angle), data processing choices (e.g., methods for gap filling, filtering and aggregation of data), quality assurance, and documentation of data sources, ownership and licensing. Each of these aspects can be important as metadata for search and discovery, but they can also be used as key data fields in their own right. If each of these aspects is also understood as an "extra dimension," it is possible to take advantage of them to simplify the data acquisition, integration, analysis, visualization and exchange cycle. Simple examples include selecting data sets of interest early in the integration process (e.g., only data collected according to a specific field sampling protocol) or applying appropriate data processing operations to different parts of a data set (e.g., adaptive processing for data collected under different sky conditions). More interesting scenarios involve guided navigation and visualization of data sets based on these extra dimensions, as well as partitioning data sets to highlight relevant subsets to be made available for exchange. 
The DAX (Data Acquisition to eXchange) Web-based tool uses a flexible metadata representation model and takes advantage of multi-dimensional data structures to translate metadata types into data dimensions, effectively reshaping data sets according to available metadata. With that, metadata is tightly integrated into the acquisition-to-exchange cycle, allowing for more focused exploration of data sets while also increasing the value of, and incentives for, keeping good metadata. The tool is being developed and tested with optical data collected in different settings, including laboratory, field, airborne, and satellite platforms.

  15. Multiplexing of miniaturized planar antibody arrays for serum protein profiling--a biomarker discovery in SLE nephritis.

    PubMed

    Petersson, Linn; Dexlin-Mellby, Linda; Bengtsson, Anders A; Sturfelt, Gunnar; Borrebaeck, Carl A K; Wingren, Christer

    2014-06-07

    In the quest to decipher disease-associated biomarkers, miniaturized and multiplexed antibody arrays may play a central role in generating protein expression profiles, or protein maps, of crude serum samples. In this conceptual study, we explored a novel, 4-times larger pen design, enabling us to, in a unique manner, simultaneously print 48 different reagents (antibodies) as individual 78.5 μm² (10 μm in diameter) sized spots at a density of 38,000 spots cm⁻² using dip-pen nanolithography technology. The antibody array set-up was interfaced with a high-resolution fluorescent-based scanner for sensitive sensing. The performance and applicability of this novel 48-plex recombinant antibody array platform design was demonstrated in a first clinical application targeting SLE nephritis, a severe chronic autoimmune connective tissue disorder, as the model disease. To this end, crude, directly biotinylated serum samples were targeted. The results showed that the miniaturized and multiplexed array platform displayed adequate performance, and that SLE-associated serum biomarker panels reflecting the disease process could be deciphered, outlining the use of miniaturized antibody arrays for disease proteomics and biomarker discovery.

  16. Biomarker Development for Intraductal Papillary Mucinous Neoplasms Using Multiple Reaction Monitoring Mass Spectrometry.

    PubMed

    Kim, Yikwon; Kang, MeeJoo; Han, Dohyun; Kim, Hyunsoo; Lee, KyoungBun; Kim, Sun-Whe; Kim, Yongkang; Park, Taesung; Jang, Jin-Young; Kim, Youngsoo

    2016-01-04

    Intraductal papillary mucinous neoplasm (IPMN) is a common precursor of pancreatic cancer (PC). Much clinical attention has been directed toward IPMNs due to the increase in the prevalence of PC. The diagnosis of IPMN depends primarily on a radiological examination, but the diagnostic accuracy of this tool is not satisfactory, necessitating the development of accurate diagnostic biomarkers for IPMN to prevent PC. Recently, high-throughput targeted proteomic quantification methods have accelerated the discovery of biomarkers, rendering them powerful platforms for the evolution of IPMN diagnostic biomarkers. In this study, a robust multiple reaction monitoring (MRM) pipeline was applied to discover and verify IPMN biomarker candidates in a large cohort of plasma samples. Through highly reproducible MRM assays and a stringent statistical analysis, 11 proteins were selected as IPMN marker candidates with high confidence in 184 plasma samples, comprising a training (n = 84) and test set (n = 100). To improve the discriminatory power, we constructed a six-protein panel by combining marker candidates. The multimarker panel had high discriminatory power in distinguishing between IPMN and controls, including other benign diseases. Consequently, the diagnostic accuracy of IPMN can be improved dramatically with this novel plasma-based panel in combination with a radiological examination.

  17. Promise Fulfilled? An EBSCO Discovery Service Usability Study

    ERIC Educational Resources Information Center

    Williams, Sarah C.; Foster, Anita K.

    2011-01-01

    Discovery tools are the next phase of library search systems. Illinois State University's Milner Library implemented EBSCO Discovery Service in August 2010. The authors conducted usability studies on the system in the fall of 2010. The aims of the study were twofold: first, to determine how Milner users set about using the system in order to…

  18. Rembrandt: Helping Personalized Medicine Become a Reality Through Integrative Translational Research

    PubMed Central

    Madhavan, Subha; Zenklusen, Jean-Claude; Kotliarov, Yuri; Sahni, Himanso; Fine, Howard A.; Buetow, Kenneth

    2009-01-01

    Finding better therapies for the treatment of brain tumors is hampered by the lack of consistently obtained molecular data in a large sample set, and ability to integrate biomedical data from disparate sources enabling translation of therapies from bench to bedside. Hence, a critical factor in the advancement of biomedical research and clinical translation is the ease with which data can be integrated, redistributed and analyzed both within and across functional domains. Novel biomedical informatics infrastructure and tools are essential for developing individualized patient treatment based on the specific genomic signatures in each patient’s tumor. Here we present Rembrandt, Repository of Molecular BRAin Neoplasia DaTa, a cancer clinical genomics database and a web-based data mining and analysis platform aimed at facilitating discovery by connecting the dots between clinical information and genomic characterization data. To date, Rembrandt contains data generated through the Glioma Molecular Diagnostic Initiative from 874 glioma specimens comprising nearly 566 gene expression arrays, 834 copy number arrays and 13,472 clinical phenotype data points. Data can be queried and visualized for a selected gene across all data platforms or for multiple genes in a selected platform. Additionally, gene sets can be limited to clinically important annotations including secreted, kinase, membrane, and known gene-anomaly pairs to facilitate the discovery of novel biomarkers and therapeutic targets. We believe that REMBRANDT represents a prototype of how high throughput genomic and clinical data can be integrated in a way that will allow expeditious and efficient translation of laboratory discoveries to the clinic. PMID:19208739

  19. To the Geoportal and Beyond! Preparing the Earth Observing Laboratory's Datasets for Inter-Repository Discovery

    NASA Astrophysics Data System (ADS)

    Gordon, S.; Dattore, E.; Williams, S.

    2014-12-01

    Even when a data center makes its datasets accessible, they can still be hard to discover if the user is unaware of the laboratory or organization the data center supports. NCAR's Earth Observing Laboratory (EOL) is no exception. In response to this problem and as an inquiry into the feasibility of inter-connecting all of NCAR's repositories at a discovery layer, ESRI's Geoportal was researched. It was determined that an implementation of Geoportal would be a good choice to build a proof of concept model of inter-repository discovery around. This collaborative project between the University of Illinois and NCAR is coordinated through the Data Curation Education in Research Centers program. This program is funded by the Institute of Museum and Library Services. Geoportal is open source software. It serves as an aggregation point for metadata catalogs of earth science datasets, with a focus on geospatial information. EOL's metadata is in static THREDDS catalogs. Geoportal can only create records from a THREDDS Data Server. The first step was to make EOL metadata more accessible by utilizing the ISO 19115-2 standard. It was also decided to create DIF records so EOL datasets could be ingested in NASA's Global Change Master Directory (GCMD). To offer records for harvest, it was decided to develop an OAI-PMH server. To make a compliant server, the OAI_DC standard was also implemented. A server was written in Perl to serve a set of static records. We created a sample set of records in ISO 19115-2, FGDC, DIF, and OAI_DC. We utilized GCMD shared vocabularies to enhance discoverability and precision. The proof of concept was tested and verified by having another NCAR laboratory's Geoportal harvest our sample set. To prepare for production, templates for each standard were developed and mapped to the database. These templates will help the automated creation of records. Once the OAI-PMH server is re-written in a Grails framework, a dynamic representation of EOL's metadata will be available for harvest. EOL will need to develop an implementation of a Geoportal and point GCMD to the OAI-PMH server. We will also seek out partnerships with other earth science and related discipline repositories that can communicate by OAI-PMH or Geoportal so that the scientific community will benefit from more discoverable data.
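To make the harvesting step in the abstract above concrete, the sketch below builds the kind of OAI-PMH `ListRecords` request a harvester would send for Dublin Core (`oai_dc`) records. The base URL is a placeholder and the helper name is an assumption; the abstract does not give the server address, and no request is actually sent here.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when a token
    is supplied to continue a partial list, metadataPrefix is omitted.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


# "https://example.org/oai" is a hypothetical endpoint, not EOL's server.
print(list_records_url("https://example.org/oai"))
```

A harvester would fetch this URL, parse the returned XML, and repeat the call with any `resumptionToken` it finds until the list is complete.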

  20. The Discovery of a Class of High-Temperature Superconductors.

    ERIC Educational Resources Information Center

    Muller, K. Alex; Bednorz, J. Georg

    1987-01-01

    Describes the new class of oxide superconductors, the importance of these materials, and the concepts that led to its discovery. Summarizes the discovery itself and its early confirmation. Discusses the observation of a superconductive glass state in percolative samples. (TW)

  1. discovery toolset for Emulytics v. 1.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fritz, David; Crussell, Jonathan

    The discovery toolset for Emulytics enables the construction of high-fidelity emulation models of systems. The toolset consists of a set of tools and techniques to automatically go from network discovery of operational systems to emulating those complex systems. Our toolset combines data from host discovery and network mapping tools into an intermediate representation that can then be further refined. Once the intermediate representation reaches the desired state, our toolset supports emitting the Emulytics models with varying levels of specificity based on experiment needs.

  2. Accurate clinical detection of exon copy number variants in a targeted NGS panel using DECoN.

    PubMed

    Fowler, Anna; Mahamdallie, Shazia; Ruark, Elise; Seal, Sheila; Ramsay, Emma; Clarke, Matthew; Uddin, Imran; Wylie, Harriet; Strydom, Ann; Lunter, Gerton; Rahman, Nazneen

    2016-11-25

    Background: Targeted next generation sequencing (NGS) panels are increasingly being used in clinical genomics to increase capacity, throughput and affordability of gene testing. Identifying whole exon deletions or duplications (termed exon copy number variants, 'exon CNVs') in exon-targeted NGS panels has proved challenging, particularly for single exon CNVs. Methods: We developed a tool for the Detection of Exon Copy Number variants (DECoN), which is optimised for analysis of exon-targeted NGS panels in the clinical setting. We evaluated DECoN performance using 96 samples with independently validated exon CNV data. We performed simulations to evaluate DECoN detection performance of single exon CNVs and to evaluate performance using different coverage levels and sample numbers. Finally, we implemented DECoN in a clinical laboratory that tests BRCA1 and BRCA2 with the TruSight Cancer Panel (TSCP). We used DECoN to analyse 1,919 samples, validating exon CNV detections by multiplex ligation-dependent probe amplification (MLPA). Results: In the evaluation set, DECoN achieved 100% sensitivity and 99% specificity for BRCA exon CNVs, including identification of 8 single exon CNVs. DECoN also identified 14/15 exon CNVs in 8 other genes. Simulations of all possible BRCA single exon CNVs gave a mean sensitivity of 98% for deletions and 95% for duplications. DECoN performance remained excellent with different levels of coverage and sample numbers; sensitivity and specificity were >98% with the typical NGS run parameters. In the clinical pipeline, DECoN automatically analyses pools of 48 samples at a time, taking 24 minutes per pool, on average. DECoN detected 24 BRCA exon CNVs, of which 23 were confirmed by MLPA, giving a false discovery rate of 4%. Specificity was 99.7%. Conclusions: DECoN is a fast, accurate, exon CNV detection tool readily implementable in research and clinical NGS pipelines. It has high sensitivity and specificity and an acceptable false discovery rate. DECoN is freely available at www.icr.ac.uk/decon.
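The 4% false discovery rate reported above follows directly from the two counts in the abstract: 24 detections, 23 of them confirmed. A short worked check, with a helper name chosen here for illustration:

```python
def false_discovery_rate(detected: int, confirmed: int) -> float:
    """Share of reported detections that were not confirmed."""
    return (detected - confirmed) / detected


# Counts from the abstract: 24 BRCA exon CNVs detected, 23 confirmed by MLPA.
fdr = false_discovery_rate(detected=24, confirmed=23)
print(f"{fdr:.1%}")  # one unconfirmed call out of 24
```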

  3. The influence of locus number and information content on species delimitation: an empirical test case in an endangered Mexican salamander.

    PubMed

    Hime, Paul M; Hotaling, Scott; Grewelle, Richard E; O'Neill, Eric M; Voss, S Randal; Shaffer, H Bradley; Weisrock, David W

    2016-12-01

    Perhaps the most important recent advance in species delimitation has been the development of model-based approaches to objectively diagnose species diversity from genetic data. Additionally, the growing accessibility of next-generation sequence data sets provides powerful insights into genome-wide patterns of divergence during speciation. However, applying complex models to large data sets is time-consuming and computationally costly, requiring careful consideration of the influence of both individual and population sampling, as well as the number and informativeness of loci on species delimitation conclusions. Here, we investigated how locus number and information content affect species delimitation results for an endangered Mexican salamander species, Ambystoma ordinarium. We compared results for an eight-locus, 137-individual data set and an 89-locus, seven-individual data set. For both data sets, we used species discovery methods to define delimitation models and species validation methods to rigorously test these hypotheses. We also used integrated demographic model selection tools to choose among delimitation models, while accounting for gene flow. Our results indicate that while cryptic lineages may be delimited with relatively few loci, sampling larger numbers of loci may be required to ensure that enough informative loci are available to accurately identify and validate shallow-scale divergences. These analyses highlight the importance of striking a balance between dense sampling of loci and individuals, particularly in shallowly diverged lineages. They also suggest the presence of a currently unrecognized, endangered species in the western part of A. ordinarium's range. © 2016 John Wiley & Sons Ltd.

  4. Genome-wide SNPs lead to strong signals of geographic structure and relatedness patterns in the major arbovirus vector, Aedes aegypti.

    PubMed

    Rašić, Gordana; Filipović, Igor; Weeks, Andrew R; Hoffmann, Ary A

    2014-04-11

    Genetic markers are widely used to understand the biology and population dynamics of disease vectors, but often markers are limited in the resolution they provide. In particular, the delineation of population structure, fine scale movement and patterns of relatedness are often obscured unless numerous markers are available. To address this issue in the major arbovirus vector, the yellow fever mosquito (Aedes aegypti), we used double digest Restriction-site Associated DNA (ddRAD) sequencing for the discovery of genome-wide single nucleotide polymorphisms (SNPs). We aimed to characterize the new SNP set and to test the resolution against previously described microsatellite markers in detecting broad and fine-scale genetic patterns in Ae. aegypti. We developed bioinformatics tools that support the customization of restriction enzyme-based protocols for SNP discovery. We showed that our approach for RAD library construction achieves unbiased genome representation that reflects true evolutionary processes. In Ae. aegypti samples from three continents we identified more than 18,000 putative SNPs. They were widely distributed across the three Ae. aegypti chromosomes, with 47.9% found in intergenic regions and 17.8% in exons of over 2,300 genes. Patterns of their imputed effects in ORFs and UTRs were consistent with those found in a recent transcriptome study. We demonstrated that individual mosquitoes from Indonesia, Australia, Vietnam and Brazil can be assigned with a very high degree of confidence to their region of origin using a large SNP panel. We also showed that familial relatedness of samples from a 0.4 km² area could be confidently established with a subset of SNPs. Using a cost-effective customized RAD sequencing approach supported by our bioinformatics tools, we characterized over 18,000 SNPs in field samples of the dengue fever mosquito Ae. aegypti. The variants were annotated and positioned onto the three Ae. aegypti chromosomes. 
The new SNP set provided much greater resolution in detecting population structure and estimating fine-scale relatedness than a set of polymorphic microsatellites. RAD-based markers demonstrate great potential to advance our understanding of mosquito population processes, critical for implementing new control measures against this major disease vector.

  5. Simultaneous Proteomic Discovery and Targeted Monitoring using Liquid Chromatography, Ion Mobility Spectrometry, and Mass Spectrometry*

    PubMed Central

    Burnum-Johnson, Kristin E.; Nie, Song; Casey, Cameron P.; Monroe, Matthew E.; Orton, Daniel J.; Ibrahim, Yehia M.; Gritsenko, Marina A.; Clauss, Therese R. W.; Shukla, Anil K.; Moore, Ronald J.; Purvine, Samuel O.; Shi, Tujin; Qian, Weijun; Liu, Tao; Baker, Erin S.; Smith, Richard D.

    2016-01-01

    Current proteomic approaches include both broad discovery measurements and quantitative targeted analyses. In many cases, discovery measurements are initially used to identify potentially important proteins (e.g. candidate biomarkers) and then targeted studies are employed to quantify a limited number of selected proteins. Both approaches, however, suffer from limitations. Discovery measurements aim to sample the whole proteome but have lower sensitivity, accuracy, and quantitation precision than targeted approaches, whereas targeted measurements are significantly more sensitive but only sample a limited portion of the proteome. Herein, we describe a new approach that performs both discovery and targeted monitoring (DTM) in a single analysis by combining liquid chromatography, ion mobility spectrometry and mass spectrometry (LC-IMS-MS). In DTM, heavy labeled target peptides are spiked into tryptic digests and both the labeled and unlabeled peptides are detected using LC-IMS-MS instrumentation. Compared with the broad LC-MS discovery measurements, DTM yields greater peptide/protein coverage and detects lower abundance species. DTM also achieved detection limits similar to selected reaction monitoring (SRM) indicating its potential for combined high quality discovery and targeted analyses, which is a significant step toward the convergence of discovery and targeted approaches. PMID:27670688

  6. Returning Samples from Enceladus

    NASA Astrophysics Data System (ADS)

    Tsou, P.; Kanik, I.; Brownlee, D.; McKay, C.; Anbar, A.; Glavin, D.; Yano, H.

    2012-12-01

    From the first half century of space exploration, we have obtained samples only from the Moon, comet Wild 2, the Solar Wind and the asteroid Itokawa. The in-depth analyses of these samples in terrestrial laboratories have yielded profound knowledge that could not have been obtained without the returned samples. While obtaining samples from Solar System bodies is crucial science, it is rarely done due to cost and complexity. Cassini's discovery of geysers and organic materials on Enceladus indicates that there is an exceptional opportunity and science rationale to do a low-cost flyby sample return mission, similar to what was done by Stardust. The earliest possible low-cost flight opportunity is the next Discovery Mission [Tsou et al 2012]. Enceladus Plume Discovery - While Voyager provided evidence for young surfaces on Enceladus, the existence of Enceladus plumes was discovered by Cassini. Enceladus and comets are the only known solar system bodies that have jets enabling sample collection without landing or surface contact. Cassini in situ Findings - Cassini made many discoveries at Saturn, including the break up of large organics in the plumes of Enceladus. Four prime criteria for habitability are liquid water, a heat source, organics and nitrogen [McKay et al. 2008, Waite et al. 2009, Postberg et al. 2011]. Out of all the NASA designated habitability targets, Enceladus is the single body that presents evidence for all four criteria. Significant advancement in the exploration of the biological potential of Enceladus can be made on returned samples in terrestrial laboratories where the full power of state-of-the-art laboratory instrumentation and procedures can be used. Without serious limits on power, mass or even cost, terrestrial laboratories provide the ultimate in analytical capability, adaptability, reproducibility and reliability. What Questions can Samples Address? - Samples collected from the Enceladus plume will enable a thorough and replicated search for chemical biosignatures to understand the habitability potential of the subsurface ocean of Enceladus [Glavin et al. 2011]. By assessing the chiral excess among different amino acids, identifying chains of amino acids, isolating distinct sequences of these chains, and doing the same for nucleic acids, we can formulate a new set of hypotheses to address some of the key science questions required for investigating the stage of extraterrestrial life at Enceladus beyond the four factors of habitability. Criticality of Analyses - For extraterrestrial organic matter analyses such as chirality and compound-specific isotopes, the repeatable robustness of laboratory measurements is a necessity. These analyses require a series of chemical extraction and derivatization steps prior to analysis, adapted to the sample, with results-driven procedures. The Stardust mission is an excellent example of the challenges in the analysis of organics. Confirmation of the cometary origin of the amino acid glycine from comet Wild 2 was obtained 3 years after the samples were returned to Earth. This long period of laboratory development allowed several modifications to the extraction protocol, multiple analytical techniques and instrumentations. Reference: Tsou et al., Astrobiology, in press 2012. McKay et al. Astrobiology 2008. Waite et al. Nature V 460 I 7254, 2009. Postberg et al. EPSC 642P 2011. Glavin et al., LPSC, #5002, 2011.

  7. Real-Time Discovery Services over Large, Heterogeneous and Complex Healthcare Datasets Using Schema-Less, Column-Oriented Methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Begoli, Edmon; Dunning, Ted; Charlie, Frasure

    We present a service platform for schema-less exploration of data and discovery of patient-related statistics from healthcare data sets. The architecture of this platform is motivated by the need for fast, schema-less, and flexible approaches to SQL-based exploration and discovery of information embedded in the common, heterogeneously structured healthcare data sets and supporting components (electronic health records, practice management systems, etc.). The motivating use cases described in the paper are clinical trials candidate discovery and a treatment effectiveness analysis. Following the use cases, we discuss the key features and software architecture of the platform, the underlying core components (Apache Parquet, Drill, the web services server), and the runtime profiles and performance characteristics of the platform. We conclude by showing dramatic speedup with some approaches, and the performance tradeoffs and limitations of others.

  8. Prognostic Effect of Tumor Lymphocytic Infiltration in Resectable Non–Small-Cell Lung Cancer

    PubMed Central

    Le Teuff, Gwénaël; Marguet, Sophie; Lantuejoul, Sylvie; Dunant, Ariane; Graziano, Stephen; Pirker, Robert; Douillard, Jean-Yves; Le Chevalier, Thierry; Filipits, Martin; Rosell, Rafael; Kratzke, Robert; Popper, Helmut; Soria, Jean-Charles; Shepherd, Frances A.; Seymour, Lesley; Tsao, Ming Sound

    2016-01-01

    Purpose Tumor lymphocytic infiltration (TLI) has differing prognostic value among various cancers. The objective of this study was to assess the effect of TLI in lung cancer. Patients and Methods A discovery set (one trial, n = 824) and a validation set (three trials, n = 984) that evaluated the benefit of platinum-based adjuvant chemotherapy in non–small-cell lung cancer were used as part of the LACE-Bio (Lung Adjuvant Cisplatin Evaluation Biomarker) study. TLI was defined as intense versus nonintense. The main end point was overall survival (OS); secondary end points were disease-free survival (DFS) and specific DFS (SDFS). Hazard ratios (HRs) and 95% CIs associated with TLI were estimated through a multivariable Cox model in both sets. TLI-histology and TLI-treatment interactions were explored in the combined set. Results Discovery and validation sets with complete data included 783 (409 deaths) and 763 (344 deaths) patients, respectively. Median follow-up was 4.8 and 6 years, respectively. TLI was intense in 11% of patients in the discovery set compared with 6% in the validation set (P < .001). The prognostic value of TLI in the discovery set (OS: HR, 0.56; 95% CI, 0.38 to 0.81; P = .002; DFS: HR, 0.59; 95% CI, 0.42 to 0.83; P = .002; SDFS: HR, 0.56; 95% CI, 0.38 to 0.82; P = .003) was confirmed in the validation set (OS: HR, 0.45; 95% CI, 0.23 to 0.85; P = .01; DFS: HR, 0.44; 95% CI, 0.24 to 0.78; P = .005; SDFS: HR, 0.42; 95% CI, 0.22 to 0.80; P = .008) with no heterogeneity across trials (P ≥ .38 for all end points). No significant predictive effect was observed for TLI (P ≥ .78 for all end points). Conclusion Intense lymphocytic infiltration, found in a minority of tumors, was validated as a favorable prognostic marker for survival in resected non–small-cell lung cancer. PMID:26834066

  9. Systematic assessment of survey scan and MS2-based abundance strategies for label-free quantitative proteomics using high-resolution MS data.

    PubMed

    Tu, Chengjian; Li, Jun; Sheng, Quanhu; Zhang, Ming; Qu, Jun

    2014-04-04

    Although survey-scan-based label-free methods have shown no compelling benefit over fragment ion (MS2)-based approaches when low-resolution mass spectrometry (MS) was used, the growing prevalence of high-resolution analyzers may have changed the game. This necessitates an updated, comparative investigation of these approaches for data acquired by high-resolution MS. Here, we compared survey scan-based (ion current, IC) and MS2-based abundance features including spectral-count (SpC) and MS2 total-ion-current (MS2-TIC), for quantitative analysis using various high-resolution LC/MS data sets. Key discoveries include: (i) study with seven different biological data sets revealed only IC achieved high reproducibility for lower-abundance proteins; (ii) evaluation with 5-replicate analyses of a yeast sample showed IC provided much higher quantitative precision and lower missing data; (iii) IC, SpC, and MS2-TIC all showed good quantitative linearity (R² > 0.99) over a >1000-fold concentration range; (iv) both MS2-TIC and IC showed good linear response to various protein loading amounts but not SpC; (v) quantification using a well-characterized CPTAC data set showed that IC exhibited markedly higher quantitative accuracy, higher sensitivity, and lower false-positives/false-negatives than both SpC and MS2-TIC. Therefore, IC achieved overall superior performance compared with the MS2-based strategies in terms of reproducibility, missing data, quantitative dynamic range, quantitative accuracy, and biomarker discovery.

  10. Systematic Assessment of Survey Scan and MS2-Based Abundance Strategies for Label-Free Quantitative Proteomics Using High-Resolution MS Data

    PubMed Central

    2015-01-01

    Although survey-scan-based label-free methods have shown no compelling benefit over fragment ion (MS2)-based approaches when low-resolution mass spectrometry (MS) was used, the growing prevalence of high-resolution analyzers may have changed the game. This necessitates an updated, comparative investigation of these approaches for data acquired by high-resolution MS. Here, we compared survey scan-based (ion current, IC) and MS2-based abundance features including spectral-count (SpC) and MS2 total-ion-current (MS2-TIC), for quantitative analysis using various high-resolution LC/MS data sets. Key discoveries include: (i) study with seven different biological data sets revealed only IC achieved high reproducibility for lower-abundance proteins; (ii) evaluation with 5-replicate analyses of a yeast sample showed IC provided much higher quantitative precision and lower missing data; (iii) IC, SpC, and MS2-TIC all showed good quantitative linearity (R² > 0.99) over a >1000-fold concentration range; (iv) both MS2-TIC and IC showed good linear response to various protein loading amounts but not SpC; (v) quantification using a well-characterized CPTAC data set showed that IC exhibited markedly higher quantitative accuracy, higher sensitivity, and lower false-positives/false-negatives than both SpC and MS2-TIC. Therefore, IC achieved overall superior performance compared with the MS2-based strategies in terms of reproducibility, missing data, quantitative dynamic range, quantitative accuracy, and biomarker discovery. PMID:24635752

  11. Autoantibodies to MUC1 glycopeptides cannot be used as a screening assay for early detection of breast, ovarian, lung or pancreatic cancer

    PubMed Central

    Burford, B; Gentry-Maharaj, A; Graham, R; Allen, D; Pedersen, J W; Nudelman, A S; Blixt, O; Fourkala, E O; Bueti, D; Dawnay, A; Ford, J; Desai, R; David, L; Trinder, P; Acres, B; Schwientek, T; Gammerman, A; Reis, C A; Silva, L; Osório, H; Hallett, R; Wandall, H H; Mandel, U; Hollingsworth, M A; Jacobs, I; Fentiman, I; Clausen, H; Taylor-Papadimitriou, J; Menon, U; Burchell, J M

    2013-01-01

    Background: Autoantibodies have been detected in sera before diagnosis of cancer, leading to interest in their potential as screening/early detection biomarkers. As we have found autoantibodies to MUC1 glycopeptides to be elevated in early-stage breast cancer patients, in this study we analysed these autoantibodies in large population cohorts of sera taken before cancer diagnosis. Methods: Serum samples from women who subsequently developed breast cancer, and age-matched controls, were identified from UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS) and Guernsey serum banks to form discovery and validation sets. These were screened on a microarray platform of 60mer MUC1 glycopeptides and recombinant MUC1 containing 16 tandem repeats. Additional case–control sets comprising women who subsequently developed ovarian, pancreatic and lung cancer were also screened on the arrays. Results: In the discovery (273 cases, 273 controls) and the two validation sets (UKCTOCS 426 cases, 426 controls; Guernsey 303 cases and 606 controls), no differences were found in autoantibody reactivity to MUC1 tandem repeat peptide or glycoforms between cases and controls. Furthermore, no differences were observed between ovarian, pancreatic and lung cancer cases and controls. Conclusion: This robust, validated study shows autoantibodies to MUC1 peptide or glycopeptides cannot be used for breast, ovarian, lung or pancreatic cancer screening. This has significant implications for research on the use of MUC1 in cancer detection. PMID:23652307

  12. Automatic Classification of Time-variable X-Ray Sources

    NASA Astrophysics Data System (ADS)

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara; Gaensler, B. M.

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10-fold cross-validation accuracy of the training data is ~97% on a 7-class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7-500250 appears to be the most unusual source in the sample. Its X-ray spectrum is suggestive of an ultraluminous X-ray source but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.

  13. Integrating Genomic Data Sets for Knowledge Discovery: An Informed Approach to Management of Captive Endangered Species.

    PubMed

    Irizarry, Kristopher J L; Bryant, Doug; Kalish, Jordan; Eng, Curtis; Schmidt, Peggy L; Barrett, Gini; Barr, Margaret C

    2016-01-01

    Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost-effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provide value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enables an informed approach to endangered species management.

  14. Integrating Genomic Data Sets for Knowledge Discovery: An Informed Approach to Management of Captive Endangered Species

    PubMed Central

    Irizarry, Kristopher J. L.; Bryant, Doug; Kalish, Jordan; Eng, Curtis; Schmidt, Peggy L.; Barrett, Gini; Barr, Margaret C.

    2016-01-01

    Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost-effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provide value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enables an informed approach to endangered species management. PMID:27376076

  15. Protective pathways against colitis mediated by appendicitis and appendectomy

    PubMed Central

    Cheluvappa, R; Luo, A S; Palmer, C; Grimm, M C

    2011-01-01

    Appendicitis followed by appendectomy (AA) at a young age protects against inflammatory bowel disease (IBD). Using a novel murine appendicitis model, we showed that AA protected against subsequent experimental colitis. To delineate genes/pathways involved in this protection, AA was performed and samples harvested from the most distal colon. RNA was extracted from four individual colonic samples per group (AA group and double-laparotomy control group) and each sample microarray analysed followed by gene-set enrichment analysis (GSEA). The gene-expression study was validated by quantitative reverse transcription–polymerase chain reaction (RT–PCR) of 14 selected genes across the immunological spectrum. Distal colonic expression of 266 gene-sets was up-regulated significantly in AA group samples (false discovery rates < 1%; P-value < 0·001). Time–course RT–PCR experiments involving the 14 genes displayed down-regulation over 28 days. The IBD-associated genes tnfsf10, SLC22A5, C3, ccr5, irgm, ptger4 and ccl20 were modulated in AA mice 3 days after surgery. Many key immunological and cellular function-associated gene-sets involved in the protective effect of AA in experimental colitis were identified. The down-regulation of 14 selected genes over 28 days after surgery indicates activation, repression or de-repression of these genes leading to downstream AA-conferred anti-colitis protection. Further analysis of these genes, profiles and biological pathways may assist in developing better therapeutic strategies in the management of intractable IBD. PMID:21707591

  16. Extraction of consensus protein patterns in regions containing non-proline cis peptide bonds and their functional assessment.

    PubMed

    Exarchos, Konstantinos P; Exarchos, Themis P; Rigas, Georgios; Papaloukas, Costas; Fotiadis, Dimitrios I

    2011-05-10

    In peptides and proteins, only a small percentage of peptide bonds adopt the cis configuration. Especially in the case of amide peptide bonds, the number of cis conformations is quite limited, which until recently hampered systematic studies. However, the growing number of databases with more 3D structures of proteins has lately produced a considerable number of sequences containing non-proline cis formations (cis-nonPro). In our work, we extract regular expression-type patterns that are descriptive of regions surrounding the cis-nonPro formations. For this purpose, three types of pattern discovery are performed: i) exact pattern discovery, ii) pattern discovery using a chemical equivalency set, and iii) pattern discovery using a structural equivalency set. Afterwards, using each pattern as predicate, we search the Eukaryotic Linear Motif (ELM) resource to identify potential functional implications of regions with cis-nonPro peptide bonds. The patterns extracted from each type of pattern discovery are further employed to formulate a pattern-based classifier, which is used to discriminate between cis-nonPro and trans-nonPro formations. In terms of functional implications, we observe a significant association of cis-nonPro peptide bonds with ligand/binding functionalities. As for the pattern-based classification scheme, the best results were obtained using the structural equivalency set, which yielded 70% accuracy, 77% sensitivity and 63% specificity.
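
    The idea of generalizing an exact sequence pattern through an equivalency set can be illustrated with a minimal Python sketch. The residue groupings in EQUIV_CLASSES and the helper to_regex are hypothetical and chosen only for illustration; they are not the chemical or structural equivalency sets used by the authors.

```python
import re

# Hypothetical equivalency classes of amino-acid one-letter codes
# (illustrative only; not the sets used in the study).
EQUIV_CLASSES = ["AVLIM", "FWY", "KRH", "DE", "STNQ"]


def to_regex(exact_pattern: str) -> str:
    """Turn an exact pattern into a regex in which every residue is
    replaced by the character class of its equivalent residues.
    A '.' in the input is kept as a single-residue wildcard."""
    parts = []
    for ch in exact_pattern:
        if ch == ".":
            parts.append(".")
            continue
        cls = next((c for c in EQUIV_CLASSES if ch in c), None)
        # Residues outside every class are matched literally.
        parts.append(f"[{cls}]" if cls else re.escape(ch))
    return "".join(parts)


if __name__ == "__main__":
    generalized = to_regex("K.GD")
    sequence = "MTRAGEL"
    print(generalized)
    print(bool(re.search("K.GD", sequence)))       # exact pattern
    print(bool(re.search(generalized, sequence)))  # generalized pattern
```

    Under these assumed classes, the generalized pattern matches sequence regions that the exact pattern misses, which is the trade-off between specificity and coverage that equivalency sets introduce.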

  17. SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets.

    PubMed

    Yu, Qiang; Wei, Dingbang; Huo, Hongwei

    2018-06-18

    Given a set of t n-length DNA sequences, q satisfying 0 < q ≤ 1, and l and d satisfying 0 ≤ d < l < n, the quorum planted motif search (qPMS) finds l-length strings that occur in at least qt input sequences with up to d mismatches and is mainly used to locate transcription factor binding sites in DNA sequences. Existing qPMS algorithms have been able to efficiently process small standard datasets (e.g., t = 20 and n = 600), but they are too time consuming to process large DNA datasets, such as ChIP-seq datasets that contain thousands of sequences or more. We analyze the effects of t and q on the time performance of qPMS algorithms and find that a large t or a small q causes a longer computation time. Based on this information, we improve the time performance of existing qPMS algorithms by selecting a sample sequence set D' with a small t and a large q from the large input dataset D and then executing qPMS algorithms on D'. A sample sequence selection algorithm named SamSelect is proposed. The experimental results on both simulated and real data show (1) that SamSelect can select D' efficiently and (2) that the qPMS algorithms executed on D' can find implanted or real motifs in a significantly shorter time than when executed on D. We improve the ability of existing qPMS algorithms to process large DNA datasets from the perspective of selecting high-quality sample sequence sets so that the qPMS algorithms can find motifs in a short time in the selected sample sequence set D', rather than take an unfeasibly long time to search the original sequence set D. Our motif discovery method is an approximate algorithm.
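
    The qPMS problem definition above can be made concrete with a brute-force Python sketch. This is not SamSelect or any of the qPMS algorithms mentioned in the abstract; it simply enumerates all 4^l candidate strings, so it is only practical for very small l. The function names are illustrative.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif: str, seq: str, d: int) -> bool:
    """True if some l-length window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def qpms_bruteforce(seqs, l, d, q):
    """Return all l-length DNA strings that occur, with up to d mismatches,
    in at least q * t of the t input sequences (0 < q <= 1)."""
    threshold = q * len(seqs) - 1e-9  # small tolerance for float rounding
    hits = []
    for cand in product("ACGT", repeat=l):
        motif = "".join(cand)
        count = sum(occurs(motif, s, d) for s in seqs)
        if count >= threshold:
            hits.append(motif)
    return hits


if __name__ == "__main__":
    seqs = ["ACGTACGT", "TTACGTTT", "GGGGGGGG"]
    print(qpms_bruteforce(seqs, l=4, d=0, q=0.6))
```

    Because every candidate is checked against every sequence, the running time grows with t, which is consistent with the abstract's motivation for running the search on a smaller selected sample set D' instead of the full input D.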

  18. Progress in Biomedical Knowledge Discovery: A 25-year Retrospective

    PubMed Central

    Sacchi, L.

    2016-01-01

    Objectives: We sought to explore, via a systematic review of the literature, the state of the art of knowledge discovery in biomedical databases as it existed in 1992, and then now, 25 years later, mainly focused on supervised learning. Methods: We performed a rigorous systematic search of PubMed and latent Dirichlet allocation to identify themes in the literature and trends in the science of knowledge discovery in and between time periods and compare these trends. We restricted the result set using a bracket of five years previous, such that the 1992 result set was restricted to articles published between 1987 and 1992, and the 2015 set between 2011 and 2015. This was to reflect the current literature available at the time to researchers and others at the target dates of 1992 and 2015. The search term was framed as: Knowledge Discovery OR Data Mining OR Pattern Discovery OR Pattern Recognition, Automated. Results: A total of 538 and 18,172 documents were retrieved for 1992 and 2015, respectively. The number and type of data sources increased dramatically over the observation period, primarily due to the advent of electronic clinical systems. The period 1992-2015 saw the emergence of new areas of research in knowledge discovery, and the refinement and application of machine learning approaches that were nascent or unknown in 1992. Conclusions: Over the 25 years of the observation period, we identified numerous developments that impacted the science of knowledge discovery, including the availability of new forms of data, new machine learning algorithms, and new application domains. Through a bibliometric analysis we examine the striking changes in the availability of highly heterogeneous data resources, the evolution of new algorithmic approaches to knowledge discovery, and we consider from legal, social, and political perspectives possible explanations of the growth of the field. Finally, we reflect on the achievements of the past 25 years to consider what the next 25 years will bring with regard to the availability of even more complex data and to the methods that could be, and are now being, developed for the discovery of new knowledge in biomedical data. PMID:27488403

  19. Progress in Biomedical Knowledge Discovery: A 25-year Retrospective.

    PubMed

    Sacchi, L; Holmes, J H

    2016-08-02

    We sought to explore, via a systematic review of the literature, the state of the art of knowledge discovery in biomedical databases as it existed in 1992, and then now, 25 years later, mainly focused on supervised learning. We performed a rigorous systematic search of PubMed and latent Dirichlet allocation to identify themes in the literature and trends in the science of knowledge discovery in and between time periods and compare these trends. We restricted the result set using a bracket of five years previous, such that the 1992 result set was restricted to articles published between 1987 and 1992, and the 2015 set between 2011 and 2015. This was to reflect the current literature available at the time to researchers and others at the target dates of 1992 and 2015. The search term was framed as: Knowledge Discovery OR Data Mining OR Pattern Discovery OR Pattern Recognition, Automated. A total of 538 and 18,172 documents were retrieved for 1992 and 2015, respectively. The number and type of data sources increased dramatically over the observation period, primarily due to the advent of electronic clinical systems. The period 1992-2015 saw the emergence of new areas of research in knowledge discovery, and the refinement and application of machine learning approaches that were nascent or unknown in 1992. Over the 25 years of the observation period, we identified numerous developments that impacted the science of knowledge discovery, including the availability of new forms of data, new machine learning algorithms, and new application domains. Through a bibliometric analysis we examine the striking changes in the availability of highly heterogeneous data resources, the evolution of new algorithmic approaches to knowledge discovery, and we consider from legal, social, and political perspectives possible explanations of the growth of the field. Finally, we reflect on the achievements of the past 25 years to consider what the next 25 years will bring with regard to the availability of even more complex data and to the methods that could be, and are now being, developed for the discovery of new knowledge in biomedical data.

  20. Systems-based biological concordance and predictive reproducibility of gene set discovery methods in cardiovascular disease.

    PubMed

    Azuaje, Francisco; Zheng, Huiru; Camargo, Anyela; Wang, Haiying

    2011-08-01

    The discovery of novel disease biomarkers is a crucial challenge for translational bioinformatics. Demonstration of both their classification power and reproducibility across independent datasets are essential requirements to assess their potential clinical relevance. Small datasets and multiplicity of putative biomarker sets may explain lack of predictive reproducibility. Studies based on pathway-driven discovery approaches have suggested that, despite such discrepancies, the resulting putative biomarkers tend to be implicated in common biological processes. Investigations of this problem have been mainly focused on datasets derived from cancer research. We investigated the predictive and functional concordance of five methods for discovering putative biomarkers in four independently-generated datasets from the cardiovascular disease domain. A diversity of biosignatures was identified by the different methods. However, we found strong biological process concordance between them, especially in the case of methods based on gene set analysis. With a few exceptions, we observed lack of classification reproducibility using independent datasets. Partial overlaps between our putative sets of biomarkers and the primary studies exist. Despite the observed limitations, pathway-driven or gene set analysis can predict potentially novel biomarkers and can jointly point to biomedically-relevant underlying molecular mechanisms. Copyright © 2011 Elsevier Inc. All rights reserved.

  1. Knowledge Discovery and Data Mining: An Overview

    NASA Technical Reports Server (NTRS)

    Fayyad, U.

    1995-01-01

    The process of knowledge discovery and data mining is the process of information extraction from very large databases. Its importance is described along with several techniques and considerations for selecting the most appropriate technique for extracting information from a particular data set.

  2. Compound Passport Service: supporting corporate collection owners in open innovation.

    PubMed

    Andrews, David M; Degorce, Sébastien L; Drake, David J; Gustafsson, Magnus; Higgins, Kevin M; Winter, Jon J

    2015-10-01

    A growing number of early discovery collaborative agreements are being put in place between large pharma companies and partners in which the rights for assets can reside with a partner, exclusively or jointly. Our corporate screening collection, like many others, was built on the premise that compounds generated in-house and not the subject of paper or patent disclosure were proprietary to the company. Collaborative screening arrangements and medicinal chemistry now make the origin, ownership rights and usage of compounds difficult to determine and manage. The Compound Passport Service is a dynamic database, managed and accessed through a set of reusable services that borrows from social media concepts to allow sample owners to take control of their samples in a much more active way. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Integrative analysis for the discovery of lung cancer serological markers and validation by MRM-MS

    PubMed Central

    An, Byung Chull; Choi, Yoo-Duk; Yang, Eun Gyeong; Na, Kook-Joo; Lee, Seung-Taek; Park, Jae-Il; Kim, Seon-Young; Lee, Cheolju

    2017-01-01

    Non-small-cell lung cancer (NSCLC) constitutes approximately 80% of all diagnosed lung cancers, and diagnostic markers detectable in the plasma/serum of NSCLC patients are greatly needed. In this study, we established a pipeline for the discovery of markers using 9 transcriptome datasets from publicly available databases and profiling of six lung cancer cell secretomes. Thirty-one out of 312 proteins that overlapped between two-fold differentially expressed genes and identified cell secretome proteins were detected in the pooled plasma of lung cancer patients. To quantify the candidates in the serum of NSCLC patients, multiple-reaction-monitoring mass spectrometry (MRM-MS) was performed for five candidate biomarkers. Finally, two potential biomarkers (BCHE and GPx3; AUC = 0.713 and 0.673, respectively) and one two-marker panel generated by logistic regression (BCHE/GPx3; AUC = 0.773) were identified. A validation test was performed by ELISA to evaluate the reproducibility of GPx3 and BCHE expression in an independent set of samples (BCHE and GPx3; AUC = 0.630 and 0.759, respectively, BCHE/GPx3 panel; AUC = 0.788). Collectively, these results demonstrate the feasibility of using our pipeline for marker discovery and our MRM-MS platform for verifying potential biomarkers of human diseases. PMID:28837649
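
    The AUC values quoted above summarize how well a marker separates cases from controls. As a minimal sketch of what that number means, the AUC can be computed as the fraction of case/control pairs in which the case has the higher score, with ties counted as one half. The scores below are made-up illustrative values, not data from the study, and this is not necessarily how the authors computed their AUCs.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve as the probability that a randomly chosen
    case scores higher than a randomly chosen control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical marker scores, for illustration only.
    cases = [0.9, 0.8, 0.4]
    controls = [0.7, 0.3, 0.4]
    print(round(auc(cases, controls), 3))
```

    An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which gives a scale for reading the single-marker and two-marker panel values reported in the abstract.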

  4. Integrative analysis for the discovery of lung cancer serological markers and validation by MRM-MS.

    PubMed

    Shin, Jihye; Song, Sang-Yun; Ahn, Hee-Sung; An, Byung Chull; Choi, Yoo-Duk; Yang, Eun Gyeong; Na, Kook-Joo; Lee, Seung-Taek; Park, Jae-Il; Kim, Seon-Young; Lee, Cheolju; Lee, Seung-Won

    2017-01-01

    Non-small-cell lung cancer (NSCLC) constitutes approximately 80% of all diagnosed lung cancers, and diagnostic markers detectable in the plasma/serum of NSCLC patients are greatly needed. In this study, we established a pipeline for the discovery of markers using 9 transcriptome datasets from publicly available databases and profiling of six lung cancer cell secretomes. Thirty-one out of 312 proteins that overlapped between two-fold differentially expressed genes and identified cell secretome proteins were detected in the pooled plasma of lung cancer patients. To quantify the candidates in the serum of NSCLC patients, multiple-reaction-monitoring mass spectrometry (MRM-MS) was performed for five candidate biomarkers. Finally, two potential biomarkers (BCHE and GPx3; AUC = 0.713 and 0.673, respectively) and one two-marker panel generated by logistic regression (BCHE/GPx3; AUC = 0.773) were identified. A validation test was performed by ELISA to evaluate the reproducibility of GPx3 and BCHE expression in an independent set of samples (BCHE and GPx3; AUC = 0.630 and 0.759, respectively, BCHE/GPx3 panel; AUC = 0.788). Collectively, these results demonstrate the feasibility of using our pipeline for marker discovery and our MRM-MS platform for verifying potential biomarkers of human diseases.

  5. miRDis: a Web tool for endogenous and exogenous microRNA discovery based on deep-sequencing data analysis.

    PubMed

    Zhang, Hanyuan; Vieira Resende E Silva, Bruno; Cui, Juan

    2018-05-01

    Small RNA sequencing is the most widely used tool for microRNA (miRNA) discovery, and shows great potential for the efficient study of miRNA cross-species transport, i.e., by detecting the presence of exogenous miRNA sequences in the host species. Because of the increased appreciation of dietary miRNAs and their far-reaching implication in human health, research interests are currently growing with regard to exogenous miRNAs bioavailability, mechanisms of cross-species transport and miRNA function in cellular biological processes. In this article, we present microRNA Discovery (miRDis), a new small RNA sequencing data analysis pipeline for both endogenous and exogenous miRNA detection. Specifically, we developed and deployed a Web service that supports the annotation and expression profiling data of known host miRNAs and the detection of novel miRNAs, other noncoding RNAs, and the exogenous miRNAs from dietary species. As a proof-of-concept, we analyzed a set of human plasma sequencing data from a milk-feeding study where 225 human miRNAs were detected in the plasma samples and 44 show elevated expression after milk intake. By examining the bovine-specific sequences, data indicate that three bovine miRNAs (bta-miR-378, -181* and -150) are present in human plasma possibly because of the dietary uptake. Further evaluation based on different sets of public data demonstrates that miRDis outperforms other state-of-the-art tools in both detection and quantification of miRNA from either animal or plant sources. The miRDis Web server is available at: http://sbbi.unl.edu/miRDis/index.php.

  6. ConGEMs: Condensed Gene Co-Expression Module Discovery Through Rule-Based Clustering and Its Application to Carcinogenesis.

    PubMed

    Mallik, Saurav; Zhao, Zhongming

    2017-12-28

    For transcriptomic analysis, there are numerous microarray-based genomic data, especially those generated for cancer research. The typical analysis measures the difference between a cancer sample group and a matched control group for each transcript or gene. Association rule mining is used to discover interesting item sets through rule-based methodology. Thus, it has advantages for finding cause-effect relationships between the transcripts. In this work, we introduce two new rule-based similarity measures (weighted rank-based Jaccard and Cosine measures) and then propose a novel computational framework to detect condensed gene co-expression modules (ConGEMs) through the association rule-based learning system and the weighted similarity scores. In practice, the list of evolved condensed markers, which consists of both singular and complex markers, depends on the corresponding condensed gene sets in either the antecedent or the consequent of the rules of the resultant modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway and Gene Ontology annotations. Specifically, we preliminarily identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm, RANWAR, was then utilized to determine the association rules from these genes. Based on that, we computed the integrated similarity scores of these rule-based similarity measures between each rule-pair, and the resultant scores were used for clustering to identify the co-expressed rule-modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than the traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.
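
    The abstract does not define its weighted rank-based variants, so the following is only a sketch of the plain (unweighted) Jaccard and cosine similarities between the gene sets of two rules, which those measures presumably build on. The function names and the gene labels are hypothetical.

```python
import math


def jaccard(a, b):
    """Plain Jaccard similarity: |A and B| / |A or B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    """Set-based cosine similarity: |A and B| / sqrt(|A| * |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical gene sets taken from two association rules.
rule_1 = {"G1", "G2", "G3"}
rule_2 = {"G2", "G3", "G4"}
print(jaccard(rule_1, rule_2), cosine(rule_1, rule_2))
```

    A pairwise matrix of such scores over all rule pairs is the kind of input a clustering step could consume; the weighting and rank terms of the published measures are not reproduced here.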

  7. Machine-assisted discovery of relationships in astronomy

    NASA Astrophysics Data System (ADS)

    Graham, Matthew J.; Djorgovski, S. G.; Mahabal, Ashish A.; Donalek, Ciro; Drake, Andrew J.

    2013-05-01

    High-volume feature-rich data sets are becoming the bread-and-butter of 21st century astronomy but present significant challenges to scientific discovery. In particular, identifying scientifically significant relationships between sets of parameters is non-trivial. Similar problems in biological and geosciences have led to the development of systems which can explore large parameter spaces and identify potentially interesting sets of associations. In this paper, we describe the application of automated discovery systems of relationships to astronomical data sets, focusing on an evolutionary programming technique and an information-theory technique. We demonstrate their use with classical astronomical relationships - the Hertzsprung-Russell diagram and the Fundamental Plane of elliptical galaxies. We also show how they work with the issue of binary classification which is relevant to the next generation of large synoptic sky surveys, such as the Large Synoptic Survey Telescope (LSST). We find that comparable results to more familiar techniques, such as decision trees, are achievable. Finally, we consider the reality of the relationships discovered and how this can be used for feature selection and extraction.

  8. Simultaneous Proteomic Discovery and Targeted Monitoring using Liquid Chromatography, Ion Mobility Spectrometry, and Mass Spectrometry.

    PubMed

    Burnum-Johnson, Kristin E; Nie, Song; Casey, Cameron P; Monroe, Matthew E; Orton, Daniel J; Ibrahim, Yehia M; Gritsenko, Marina A; Clauss, Therese R W; Shukla, Anil K; Moore, Ronald J; Purvine, Samuel O; Shi, Tujin; Qian, Weijun; Liu, Tao; Baker, Erin S; Smith, Richard D

    2016-12-01

    Current proteomic approaches include both broad discovery measurements and quantitative targeted analyses. In many cases, discovery measurements are initially used to identify potentially important proteins (e.g. candidate biomarkers) and then targeted studies are employed to quantify a limited number of selected proteins. Both approaches, however, suffer from limitations. Discovery measurements aim to sample the whole proteome but have lower sensitivity, accuracy, and quantitation precision than targeted approaches, whereas targeted measurements are significantly more sensitive but only sample a limited portion of the proteome. Herein, we describe a new approach that performs both discovery and targeted monitoring (DTM) in a single analysis by combining liquid chromatography, ion mobility spectrometry and mass spectrometry (LC-IMS-MS). In DTM, heavy labeled target peptides are spiked into tryptic digests and both the labeled and unlabeled peptides are detected using LC-IMS-MS instrumentation. Compared with the broad LC-MS discovery measurements, DTM yields greater peptide/protein coverage and detects lower abundance species. DTM also achieved detection limits similar to selected reaction monitoring (SRM) indicating its potential for combined high quality discovery and targeted analyses, which is a significant step toward the convergence of discovery and targeted approaches. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  9. Identification of susceptibility genes and genetic modifiers of human diseases

    NASA Astrophysics Data System (ADS)

    Abel, Kenneth; Kammerer, Stefan; Hoyal, Carolyn; Reneland, Rikard; Marnellos, George; Nelson, Matthew R.; Braun, Andreas

    2005-03-01

    The completion of the human genome sequence enables the discovery of genes involved in common human disorders. The successful identification of these genes is dependent on the availability of informative sample sets, validated marker panels, a high-throughput scoring technology, and a strategy for combining these resources. We have developed a universal platform technology based on mass spectrometry (MassARRAY) for analyzing nucleic acids with high precision and accuracy. To fuel this technology, we generated more than 100,000 validated assays for single nucleotide polymorphisms (SNPs) covering virtually all known and predicted human genes. We also established a large DNA sample bank comprising more than 50,000 consented healthy and diseased individuals. This combination of reagents and technology allows the execution of large-scale genome-wide association studies. Taking advantage of MassARRAY's capability for quantitative analysis of nucleic acids, allele frequencies are estimated in sample pools containing large numbers of individual DNAs. Comparing pools as a first-pass "filtering" step offers a tremendous advantage in throughput and cost over individual genotyping. We employed this approach in numerous genome-wide, hypothesis-free searches to identify genes associated with common complex diseases, such as breast cancer, osteoporosis, and osteoarthritis, and genes involved in quantitative traits like high-density lipoprotein cholesterol (HDL-c) levels and central fat. Access to additional well-characterized patient samples through collaborations allows us to conduct replication studies that validate true disease genes. These discoveries will expand our understanding of genetic disease predisposition, and our ability for early diagnosis and determination of specific disease subtype or progression stage.

  10. Genome-wide interaction study identifies RCBTB1 as a modifier for smoking effect on carotid intima-media thickness.

    PubMed

    Wang, Liyong; Rundek, Tatjana; Beecham, Ashley; Hudson, Barry; Blanton, Susan H; Zhao, Hongyu; Sacco, Ralph L; Dong, Chuanhui

    2014-01-01

    Carotid intima-media thickness (cIMT), a marker for atherosclerosis, is affected by smoking and has substantial interindividual variation. We sought to identify the genetic moderators influencing the effect of smoking on cIMT. With a multistage design using 722 379 single nucleotide polymorphisms (SNP), a genome-wide interaction study was performed in a discovery sample of 669 Hispanics, followed by replication in 589 subjects (264 Hispanics, 172 non-Hispanic blacks, 153 non-Hispanic whites). Assuming an additive genetic model, regression analysis was performed to test for smoking-SNP interaction on cIMT while controlling for age, sex, and the top 3 principal components of ancestry. The strongest interaction in Hispanics was found with a synonymous splicing SNP (rs3751383) in exon 9 of RCBTB1 (P=2.5e(-6) in discovery sample; P=0.01 in the Hispanic replication sample; P<8.8e(-9) in the combined Hispanic sample). Stratification analysis in the combined Hispanic sample showed that smoking had no effect on cIMT among rs3751383 G homozygote (P=0.15), a moderate effect among rs3751383 heterozygote (P=0.01), and a strong effect among rs3751383 A homozygote (P=2.1e(-7)). A consistent trend was observed in the non-Hispanic white and black data sets, leading to an interaction effect of P<2.9e(-9) in the meta-analysis of all 1258 subjects. Our study represents the first genome-wide smoking-SNP interaction study of cIMT and identifies RCBTB1 as a modifier of the smoking effect on cIMT. Testing for gene-environment interactions can help uncover genetic factors that contribute to the interindividual variation in response to the same environmental exposure.

  11. Sparse Substring Pattern Set Discovery Using Linear Programming Boosting

    NASA Astrophysics Data System (ADS)

    Kashihara, Kazuaki; Hatano, Kohei; Bannai, Hideo; Takeda, Masayuki

    In this paper, we consider finding a small set of substring patterns which classifies the given documents well. We formulate the problem as a 1-norm soft margin optimization problem where each dimension corresponds to a substring pattern. Then we solve this problem by using LPBoost and an optimal substring discovery algorithm. Since the problem is a linear program, the resulting solution is likely to be sparse, which is useful for feature selection. We evaluate the proposed method on real data such as movie reviews.

  12. Hypotheses generation as supervised link discovery with automated class labeling on large-scale biomedical concept networks

    PubMed Central

    2012-01-01

    Computational approaches to generate hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, it still remains a challenge to automatically discover novel, cross-silo biomedical hypotheses from large-scale literature repositories. In order to address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypotheses generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and concept-author map on a cluster using the MapReduce framework. We extract a set of heterogeneous features such as random walk based features, neighborhood features and common author features. The potential number of links to consider for the possibility of link discovery is large in our concept network and, to address the scalability problem, the features from a concept network are extracted using a cluster with the MapReduce framework. We further model link discovery as a classification problem carried out on a training data set automatically extracted from two network snapshots taken in two consecutive time periods. A set of heterogeneous features, which cover both topological and semantic features derived from the concept network, has been studied with respect to its impact on the accuracy of the proposed supervised link discovery process. A case study of hypotheses generation based on the proposed method has been presented in the paper. PMID:22759614
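
    The abstract names "neighborhood features" without listing them. As a hedged sketch, three commonly used neighborhood scores for a candidate link (common neighbors, Jaccard coefficient, preferential attachment) can be computed from an adjacency map as below. The function name, the feature choice and the toy graph are assumptions for illustration, not the paper's feature set.

```python
def neighborhood_features(graph, u, v):
    """Topological features for a candidate link (u, v).

    graph: dict mapping each node to the set of its neighbors.
    """
    nu, nv = graph.get(u, set()), graph.get(v, set())
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "preferential_attachment": len(nu) * len(nv),
    }


# Hypothetical concept network: A, B, C, D stand in for biomedical concepts.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}
print(neighborhood_features(toy_graph, "A", "D"))
```

    In a supervised setting, such a feature dictionary would be computed on an earlier network snapshot and labeled by whether the link appears in the later snapshot.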

  13. Museums, Adventures, Discovery Activities: Gifted Curriculum Intrinsically Differentiated.

    ERIC Educational Resources Information Center

    Haensly, Patricia A.

    This paper discusses how museums, adventure programs, and discovery activities can become an intrinsically differentiated gifted curriculum for gifted learners. Museums and adventure programs are a forum for meaningful learning activities. The contextual characteristics of effectively designed settings for learning activities can, if the…

  14. Planetary image conversion task

    NASA Technical Reports Server (NTRS)

    Martin, M. D.; Stanley, C. L.; Laughlin, G.

    1985-01-01

    The Planetary Image Conversion Task group processed 12,500 magnetic tapes containing raw imaging data from JPL planetary missions and produced an image data base in consistent format on 1200 fully packed 6250-bpi tapes. The output tapes will remain at JPL. A copy of the entire tape set was delivered to US Geological Survey, Flagstaff, Ariz. A secondary task converted computer datalogs, which had been stored in project specific MARK IV File Management System data types and structures, to flat-file, text format that is processable on any modern computer system. The conversion processing took place at JPL's Image Processing Laboratory on an IBM 370-158 with existing software modified slightly to meet the needs of the conversion task. More than 99% of the original digital image data was successfully recovered by the conversion task. However, processing data tapes recorded before 1975 was destructive. This discovery is of critical importance to facilities responsible for maintaining digital archives since normal periodic random sampling techniques would be unlikely to detect this phenomenon, and entire data sets could be wiped out in the act of generating seemingly positive sampling results. Recommended follow-on activities are also included.

  15. Feedback-Driven Dynamic Invariant Discovery

    NASA Technical Reports Server (NTRS)

    Zhang, Lingming; Yang, Guowei; Rungta, Neha S.; Person, Suzette; Khurshid, Sarfraz

    2014-01-01

    Program invariants can help software developers identify program properties that must be preserved as the software evolves; however, formulating correct invariants can be challenging. In this work, we introduce iDiscovery, a technique which leverages symbolic execution to improve the quality of dynamically discovered invariants computed by Daikon. Candidate invariants generated by Daikon are synthesized into assertions and instrumented onto the program. The instrumented code is executed symbolically to generate new test cases that are fed back to Daikon to help further refine the set of candidate invariants. This feedback loop is executed until a fixed point is reached. To mitigate the cost of symbolic execution, we present optimizations to prune the symbolic state space and to reduce the complexity of the generated path conditions. We also leverage recent advances in constraint solution reuse techniques to avoid computing results for the same constraints across iterations. Experimental results show that iDiscovery converges to a set of higher quality invariants compared to the initial set of candidate invariants in a small number of iterations.
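
    The feedback loop described above can be illustrated with a toy sketch. It uses neither Daikon nor a symbolic execution engine: `infer` stands in for the dynamic invariant detector, and `find_counterexamples` replaces symbolic execution with a brute-force search over a small input range. All names, the candidate invariants and the example program are hypothetical.

```python
def program(x):
    """Toy program under analysis."""
    return abs(x)


# Candidate invariants relating the input x to the output out.
CANDIDATES = {
    "out >= 0": lambda x, out: out >= 0,
    "out == x": lambda x, out: out == x,
    "out >= x": lambda x, out: out >= x,
}


def infer(tests):
    """Keep the candidates that hold on every observed run."""
    return {name for name, holds in CANDIDATES.items()
            if all(holds(x, program(x)) for x in tests)}


def find_counterexamples(invariants, search_space):
    """Brute-force stand-in for symbolic execution: one violating input per invariant."""
    found = []
    for name in sorted(invariants):
        for x in search_space:
            if not CANDIDATES[name](x, program(x)):
                found.append(x)
                break
    return found


def refine(tests, search_space):
    """Alternate inference and test generation until no new test is produced."""
    tests = list(tests)
    while True:
        invariants = infer(tests)
        new_tests = [x for x in find_counterexamples(invariants, search_space)
                     if x not in tests]
        if not new_tests:          # fixed point: nothing left to falsify
            return invariants, tests
        tests.extend(new_tests)


invariants, tests = refine([1, 2, 3], range(-5, 6))
print(sorted(invariants), tests)
```

    Starting from positive inputs only, the spurious candidate `out == x` survives the first inference pass and is dropped once a negative input is fed back.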

  16. Genomics, "Discovery Science," Systems Biology, and Causal Explanation: What Really Works?

    PubMed

    Davidson, Eric H

    2015-01-01

    Diverse and non-coherent sets of epistemological principles currently inform research in the general area of functional genomics. Here, from the personal point of view of a scientist with over half a century of immersion in hypothesis driven scientific discovery, I compare and deconstruct the ideological bases of prominent recent alternatives, such as "discovery science," some productions of the ENCODE project, and aspects of large data set systems biology. The outputs of these types of scientific enterprise qualitatively reflect their radical definitions of scientific knowledge, and of its logical requirements. Their properties emerge in high relief when contrasted (as an example) to a recent, system-wide, predictive analysis of a developmental regulatory apparatus that was instead based directly on hypothesis-driven experimental tests of mechanism.

  17. Microwave-Assisted Esterification: A Discovery-Based Microscale Laboratory Experiment

    ERIC Educational Resources Information Center

    Reilly, Maureen K.; King, Ryan P.; Wagner, Alexander J.; King, Susan M.

    2014-01-01

    An undergraduate organic chemistry laboratory experiment has been developed that features a discovery-based microscale Fischer esterification utilizing a microwave reactor. Students individually synthesize a unique ester from known sets of alcohols and carboxylic acids. Each student identifies the best reaction conditions given their particular…

  18. Ranking metrics in gene set enrichment analysis: do they matter?

    PubMed

    Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna

    2017-05-12

    There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using k-means clustering algorithm a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established i.e. absolute value of Moderated Welch Test statistic, Minimum Significant Difference, absolute value of Signal-To-Noise ratio and Baumgartner-Weiss-Schindler test statistic. In case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In case of sensitivity, the absolute value of Moderated Welch Test statistic and absolute value of Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample size. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA . Choosing a ranking metric in Gene Set Enrichment Analysis has critical impact on results of pathway enrichment analysis. The absolute value of Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. 
When the number of non-normally distributed genes is high, using Baumgartner-Weiss-Schindler test statistic gives better outcomes. Also, it finds more enriched pathways than other tested metrics, which may induce new biological discoveries.
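
    To make the notion of a ranking metric concrete, here is a minimal sketch of one metric named in the abstract, the absolute signal-to-noise ratio, in its usual form |mean_A - mean_B| / (sd_A + sd_B). This is not the authors' MATLAB implementation; the function names and the expression values are made up for illustration.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """Signal-to-noise ratio: difference of means over the sum of sample SDs."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


def rank_genes(expr_a, expr_b):
    """Order genes by decreasing absolute signal-to-noise ratio."""
    scores = {g: abs(signal_to_noise(expr_a[g], expr_b[g])) for g in expr_a}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical expression values for three genes under two conditions.
condition_a = {"g1": [5.0, 6.0, 7.0], "g2": [1.0, 2.0, 3.0], "g3": [2.0, 2.5, 3.0]}
condition_b = {"g1": [1.0, 2.0, 3.0], "g2": [1.5, 2.0, 2.5], "g3": [5.0, 5.5, 6.0]}
print(rank_genes(condition_a, condition_b))
```

    The resulting ordered gene list is what an enrichment procedure would walk through; swapping in another metric only changes `signal_to_noise`.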

  19. Protective pathways against colitis mediated by appendicitis and appendectomy.

    PubMed

    Cheluvappa, R; Luo, A S; Palmer, C; Grimm, M C

    2011-09-01

    Appendicitis followed by appendectomy (AA) at a young age protects against inflammatory bowel disease (IBD). Using a novel murine appendicitis model, we showed that AA protected against subsequent experimental colitis. To delineate genes/pathways involved in this protection, AA was performed and samples harvested from the most distal colon. RNA was extracted from four individual colonic samples per group (AA group and double-laparotomy control group) and each sample was analysed by microarray, followed by gene-set enrichment analysis (GSEA). The gene-expression study was validated by quantitative reverse transcription-polymerase chain reaction (RT-PCR) of 14 selected genes across the immunological spectrum. Distal colonic expression of 266 gene-sets was up-regulated significantly in AA group samples (false discovery rates < 1%; P-value < 0.001). Time-course RT-PCR experiments involving the 14 genes displayed down-regulation over 28 days. The IBD-associated genes tnfsf10, SLC22A5, C3, ccr5, irgm, ptger4 and ccl20 were modulated in AA mice 3 days after surgery. Many key immunological and cellular function-associated gene-sets involved in the protective effect of AA in experimental colitis were identified. The down-regulation of 14 selected genes over 28 days after surgery indicates activation, repression or de-repression of these genes leading to downstream AA-conferred anti-colitis protection. Further analysis of these genes, profiles and biological pathways may assist in developing better therapeutic strategies in the management of intractable IBD. © 2011 The Authors. Clinical and Experimental Immunology © 2011 British Society for Immunology.

  20. Bladder cancer biomarker discovery using global metabolomic profiling of urine.

    PubMed

    Wittmann, Bryan M; Stirdivant, Steven M; Mitchell, Matthew W; Wulff, Jacob E; McDunn, Jonathan E; Li, Zhen; Dennis-Barrie, Aphrihl; Neri, Bruce P; Milburn, Michael V; Lotan, Yair; Wolfert, Robert L

    2014-01-01

    Bladder cancer (BCa) is a common malignancy worldwide and has a high probability of recurrence after initial diagnosis and treatment. As a result, recurrent surveillance, primarily involving repeated cystoscopies, is a critical component of post diagnosis patient management. Since cystoscopy is invasive, expensive and a possible deterrent to patient compliance with regular follow-up screening, new non-invasive technologies to aid in the detection of recurrent and/or primary bladder cancer are strongly needed. In this study, mass spectrometry based metabolomics was employed to identify biochemical signatures in human urine that differentiate bladder cancer from non-cancer controls. Over 1000 distinct compounds were measured including 587 named compounds of known chemical identity. Initial biomarker identification was conducted using a 332 subject sample set of retrospective urine samples (cohort 1), which included 66 BCa positive samples. A set of 25 candidate biomarkers was selected based on statistical significance, fold difference and metabolic pathway coverage. The 25 candidate biomarkers were tested against an independent urine sample set (cohort 2) using random forest analysis, with palmitoyl sphingomyelin, lactate, adenosine and succinate providing the strongest predictive power for differentiating cohort 2 cancer from non-cancer urines. Cohort 2 metabolite profiling revealed additional metabolites, including arachidonate, that were higher in cohort 2 cancer vs. non-cancer controls, but were below quantitation limits in the cohort 1 profiling. Metabolites related to lipid metabolism may be especially interesting biomarkers. The results suggest that urine metabolites may provide a much needed non-invasive adjunct diagnostic to cystoscopy for detection of bladder cancer and recurrent disease management.

  1. MultiDK: A Multiple Descriptor Multiple Kernel Approach for Molecular Discovery and Its Application to Organic Flow Battery Electrolytes.

    PubMed

    Kim, Sungjin; Jinich, Adrián; Aspuru-Guzik, Alán

    2017-04-24

    We propose a multiple descriptor multiple kernel (MultiDK) method for efficient molecular discovery using machine learning. We show that the MultiDK method improves both the speed and accuracy of molecular property prediction. We apply the method to the discovery of electrolyte molecules for aqueous redox flow batteries. Using multiple-type (as opposed to single-type) descriptors, we obtain more relevant features for machine learning. Following the principle of "wisdom of the crowds", the combination of multiple-type descriptors significantly boosts prediction performance. Moreover, by employing multiple kernels (more than one kernel function for a set of the input descriptors), MultiDK exploits nonlinear relations between molecular structure and properties better than a linear regression approach. The multiple kernels consist of a Tanimoto similarity kernel and a linear kernel for a set of binary descriptors and a set of nonbinary descriptors, respectively. Using MultiDK, we achieve an average performance of r² = 0.92 with a test set of molecules for solubility prediction. We also extend MultiDK to predict pH-dependent solubility and apply it to a set of quinone molecules with different ionizable functional groups to assess their performance as flow battery electrolytes.
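
    The Tanimoto similarity kernel mentioned above has a simple closed form for binary descriptors: the number of bits set in both vectors divided by the number set in either. The sketch below shows that formula and the resulting kernel matrix. It is not the MultiDK code; the function names and the toy fingerprints are assumptions, and returning 1.0 for two all-zero vectors is a convention chosen here.

```python
def tanimoto(x, y):
    """Tanimoto similarity between two equal-length binary vectors."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 1.0  # convention for two empty vectors


def kernel_matrix(fingerprints):
    """Pairwise Tanimoto similarities, usable as a precomputed kernel."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]


# Hypothetical 4-bit fingerprints for three molecules.
toy_fingerprints = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
for row in kernel_matrix(toy_fingerprints):
    print(row)
```

    A linear kernel on the nonbinary descriptors would be the plain dot product; how the two kernels are combined in MultiDK is not specified in the abstract.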

  2. Biomarker discovery study design for type 1 diabetes in The Environmental Determinants of Diabetes in the Young (TEDDY) study.

    PubMed

    Lee, Hye-Seung; Burkhardt, Brant R; McLeod, Wendy; Smith, Susan; Eberhard, Chris; Lynch, Kristian; Hadley, David; Rewers, Marian; Simell, Olli; She, Jin-Xiong; Hagopian, Bill; Lernmark, Ake; Akolkar, Beena; Ziegler, Anette G; Krischer, Jeffrey P

    2014-07-01

    The Environmental Determinants of Diabetes in the Young planned biomarker discovery studies on longitudinal samples for persistent confirmed islet cell autoantibodies and type 1 diabetes using dietary biomarkers, metabolomics, microbiome/viral metagenomics and gene expression. This article describes the details of planning The Environmental Determinants of Diabetes in the Young biomarker discovery studies using a nested case-control design that was chosen as an alternative to the full cohort analysis. In the frame of a nested case-control design, it guides the choice of matching factors, selection of controls, preparation of external quality control samples and reduction of batch effects along with proper sample allocation. Our design is to reduce potential bias and retain study power while reducing the costs by limiting the numbers of samples requiring laboratory analyses. It also covers two primary end points (the occurrence of diabetes-related autoantibodies and the diagnosis of type 1 diabetes). The resulting list of case-control matched samples for each laboratory was augmented with external quality control samples. Copyright © 2013 John Wiley & Sons, Ltd.

  3. Discovery of error-tolerant biclusters from noisy gene expression data.

    PubMed

    Gupta, Rohit; Rao, Navneet; Kumar, Vipin

    2011-11-24

    An important analysis performed on microarray gene-expression data is to discover biclusters, which denote groups of genes that are coherently expressed for a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters from these real-valued gene-expression data sets. However, these algorithms suffer from several limitations such as inability to explicitly handle errors/noise in the data; difficulty in discovering small biclusters due to their top-down approach; and inability of some of the approaches to find overlapping biclusters, which is crucial as many genes participate in multiple biological processes. Association pattern mining also produces biclusters as its result and can naturally address some of these limitations. However, traditional association mining only finds exact biclusters, which limits its applicability in real-life data sets where the biclusters may be fragmented due to random noise/errors. Moreover, as it only works with binary or boolean attributes, its application to gene-expression data requires transforming real-valued attributes to binary attributes, which often results in loss of information. Many past approaches have tried to address the issue of noise and handling real-valued attributes independently but there is no systematic approach that addresses both of these issues together. In this paper, we first propose a novel error-tolerant biclustering model, 'ET-bicluster', and then propose a bottom-up heuristic-based mining algorithm to sequentially discover error-tolerant biclusters directly from real-valued gene-expression data. The efficacy of our proposed approach is illustrated by comparing it with a recent approach RAP in the context of two biological problems: discovery of functional modules and discovery of biomarkers. 
For the first problem, two real-valued S. cerevisiae microarray gene-expression data sets are used to demonstrate that the biclusters obtained from the ET-bicluster approach not only recover a larger set of genes compared to those obtained from the RAP approach but also have higher functional coherence as evaluated using the GO-based functional enrichment analysis. The statistical significance of the discovered error-tolerant biclusters, as estimated using two randomization tests, reveals that they are indeed biologically meaningful and statistically significant. For the second problem of biomarker discovery, we used four real-valued breast cancer microarray gene-expression data sets and evaluated the biomarkers obtained using MSigDB gene sets. The results obtained for both problems, functional module discovery and biomarker discovery, clearly signify the usefulness of the proposed ET-bicluster approach and illustrate the importance of explicitly incorporating noise/errors in discovering coherent groups of genes from gene-expression data.

  4. Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection

    NASA Astrophysics Data System (ADS)

    Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd

    2015-02-01

    A microarray involves placing an orderly arrangement of thousands of gene sequences in a grid on a suitable surface. The technology has enabled novel discoveries since its development and has attracted increasing attention among researchers. The widespread use of microarray technology is largely due to its ability to perform simultaneous analysis of thousands of genes in a massively parallel manner in one experiment. Hence, it provides valuable knowledge on gene interaction and function. The microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples due to various constraints. Therefore, the sample covariance matrix in Hotelling's T² statistic is not positive definite and becomes singular, thus it cannot be inverted. In this research, the Hotelling's T² statistic is combined with a shrinkage approach as an alternative estimator of the covariance matrix to detect significant gene sets. The use of a shrinkage covariance matrix overcomes the singularity problem by converting an unbiased estimator to an improved biased estimator of the covariance matrix. A robust trimmed mean is integrated into the shrinkage matrix to reduce the influence of outliers and consequently increase its efficiency. The performance of the proposed method is measured using several simulation designs. The proposed method is expected to outperform existing techniques in many tested conditions.
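
    The singularity problem and its shrinkage fix can be seen on a tiny example. The sketch below assumes a generic shrinkage form, lam * T + (1 - lam) * S with the diagonal of S as the target T, and uses a trimmed mean as the centering vector. Both choices are assumptions for illustration: the abstract does not state the target, the shrinkage intensity, or exactly how the trimmed mean enters the estimator.

```python
def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the given proportion of values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def covariance(samples, center):
    """Sample covariance of the rows in samples around the given center."""
    n, p = len(samples), len(center)
    return [[sum((row[i] - center[i]) * (row[j] - center[j]) for row in samples) / (n - 1)
             for j in range(p)] for i in range(p)]


def shrink(cov, lam):
    """lam * diag(S) + (1 - lam) * S: off-diagonal entries are pulled toward zero."""
    p = len(cov)
    return [[cov[i][j] if i == j else (1 - lam) * cov[i][j] for j in range(p)]
            for i in range(p)]


# Two samples, three genes: fewer samples than variables, so S is singular.
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
center = [trimmed_mean(col) for col in zip(*samples)]
S = covariance(samples, center)
S_shrunk = shrink(S, lam=0.5)

# Determinant of the leading 2x2 block before and after shrinkage.
print(S[0][0] * S[1][1] - S[0][1] * S[1][0])
print(S_shrunk[0][0] * S_shrunk[1][1] - S_shrunk[0][1] * S_shrunk[1][0])
```

    With only two samples the trimmed mean reduces to the ordinary mean; its robustness only shows with more samples and outlying values.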

  5. Urine cell-based DNA methylation classifier for monitoring bladder cancer.

    PubMed

    van der Heijden, Antoine G; Mengual, Lourdes; Ingelmo-Torres, Mercedes; Lozano, Juan J; van Rijt-van de Westerlo, Cindy C M; Baixauli, Montserrat; Geavlete, Bogdan; Moldoveanud, Cristian; Ene, Cosmin; Dinney, Colin P; Czerniak, Bogdan; Schalken, Jack A; Kiemeney, Lambertus A L M; Ribal, Maria J; Witjes, J Alfred; Alcaraz, Antonio

    2018-01-01

    Current standard methods used to detect and monitor bladder cancer (BC) are invasive or have low sensitivity. This study aimed to develop a urine methylation biomarker classifier for BC monitoring and validate this classifier in patients in follow-up for bladder cancer (PFBC). Voided urine samples (N = 725) from BC patients, controls, and PFBC were prospectively collected in four centers. Finally, 626 urine samples were available for analysis. DNA was extracted from the urinary cells and bisulfite modified, and methylation status was analyzed using pyrosequencing. Cytology was available from a subset of patients (N = 399). In the discovery phase, seven selected genes from the literature (CDH13, CFTR, NID2, SALL3, TMEFF2, TWIST1, and VIM2) were studied in 111 BC and 57 control samples. This training set was used to develop a gene classifier by logistic regression and was validated in 458 PFBC samples (173 with recurrence). A three-gene methylation classifier containing CFTR, SALL3, and TWIST1 was developed in the training set (AUC 0.874). The classifier achieved an AUC of 0.741 in the validation series. Cytology results were available for 308 samples from the validation set. Cytology achieved an AUC of 0.696 whereas the classifier in this subset of patients reached an AUC of 0.768. Combining the methylation classifier with cytology results achieved an AUC of 0.86 in the validation set, with a sensitivity of 96%, a specificity of 40%, and a positive and negative predictive value of 56 and 92%, respectively. The combination of the three-gene methylation classifier and cytology results has high sensitivity and high negative predictive value in a real clinical scenario (PFBC). The proposed classifier is a useful test for predicting BC recurrence and decreasing the number of cystoscopies in the follow-up of BC patients. If only patients with a positive combined classifier result were cystoscopied, 36% of all cystoscopies could be prevented.
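
    Sensitivity, specificity and the predictive values reported above all derive from the four cells of a confusion matrix. A minimal sketch of those definitions follows. The function name is hypothetical and the counts are toy numbers chosen for illustration, not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # recurrences correctly flagged
        "specificity": tn / (tn + fp),   # non-recurrences correctly cleared
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Toy counts (hypothetical): 50 patients with recurrence, 50 without.
metrics = diagnostic_metrics(tp=48, fp=30, tn=20, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    Unlike sensitivity and specificity, the predictive values depend on how common recurrence is in the tested population.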

  6. Evaluation of stream sediments in areas of known mineralization, San Jose and Talamanca Quadrangles, Costa Rica: An orientation survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arauz, A.J.

    1986-12-01

    Costa Rica's compressional island arc-type tectonic setting and considerable geologic diversity hold great promise for future discovery of economic metallic deposits. The study constitutes an orientation investigation of stream sediment sampling techniques to establish optimum survey specifications for the regional geochemical survey coverage of the country. The study was conducted in two separate areas of known mineralization which represent distinctive tropical environments and different metallogenic provinces within Costa Rica: (1) the Esparza Area, which contains the Santa Clara Gold Mine, the largest in the country, and (2) the San Isidro Area, which contains a major copper prospect.

  7. Automatic classification of time-variable X-ray sources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10-fold cross-validation accuracy of the training data is ∼97% on a 7-class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7–500250 appears to be the most unusual source in the sample. Its X-ray spectrum is suggestive of an ultraluminous X-ray source but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.
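The abstract mentions using the classification margin to find anomalous sources. A minimal sketch of one common definition follows: the margin of an unlabeled source is taken as the gap between its highest and second-highest class probability, so a small margin marks an ambiguous classification. The class names, probabilities, and threshold are hypothetical, and the authors' exact margin definition may differ.

```python
def classification_margin(class_probs):
    """Gap between the largest and second-largest class probability."""
    ranked = sorted(class_probs.values(), reverse=True)
    return ranked[0] - ranked[1]


def flag_low_margin(sources, threshold=0.2):
    """Return source names with margin below threshold, most ambiguous first."""
    margins = {name: classification_margin(p) for name, p in sources.items()}
    flagged = [name for name, m in margins.items() if m < threshold]
    return sorted(flagged, key=margins.get)


# Hypothetical classifier output (class names and values are made up).
sources = {
    "src_A": {"AGN": 0.90, "XRB": 0.06, "star": 0.04},
    "src_B": {"AGN": 0.40, "XRB": 0.35, "star": 0.25},
    "src_C": {"AGN": 0.34, "XRB": 0.33, "star": 0.33},
}
ambiguous = flag_low_margin(sources)
print(ambiguous)
```

Sources flagged this way are candidates for follow-up rather than confirmed anomalies; the abstract combines the margin with a separate Random Forest outlier measure, which is not reproduced here.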

  8. Analyzing Student Inquiry Data Using Process Discovery and Sequence Classification

    ERIC Educational Resources Information Center

    Emond, Bruno; Buffett, Scott

    2015-01-01

    This paper reports on results of applying process discovery mining and sequence classification mining techniques to a data set of semi-structured learning activities. The main research objective is to advance educational data mining to model and support self-regulated learning in heterogeneous environments of learning content, activities, and…

  9. Why Quantify Uncertainty in Ecosystem Studies: Obligation versus Discovery Tool?

    NASA Astrophysics Data System (ADS)

    Harmon, M. E.

    2016-12-01

    There are multiple motivations for quantifying uncertainty in ecosystem studies. One is as an obligation; the other is as a tool useful in moving ecosystem science toward discovery. While reporting uncertainty should become a routine expectation, a more convincing motivation involves discovery. By clarifying what is known and to what degree it is known, uncertainty analyses can point the way toward improvements in measurements, sampling designs, and models. While some of these improvements (e.g., better sampling designs) may lead to incremental gains, those involving models (particularly model selection) may require large gains in knowledge. To be fully harnessed as a discovery tool, attitudes toward uncertainty may have to change: rather than viewing uncertainty as a negative assessment of what was done, it should be viewed as a positive, helpful assessment of what remains to be done.

  10. A Scalable Approach for Protein False Discovery Rate Estimation in Large Proteomic Data Sets.

    PubMed

    Savitski, Mikhail M; Wilhelm, Mathias; Hahne, Hannes; Kuster, Bernhard; Bantscheff, Marcus

    2015-09-01

    Calculating the number of confidently identified proteins and estimating false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further add to the challenge and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target-decoy strategy that particularly show when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel target-decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprised of ∼19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The "picked" protein FDR approach treats target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence depending on which receives the highest score. We investigated the performance of this approach in combination with q-value based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The "picked" target-decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein yielding a stable number of true positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used "classic" protein FDR approach that causes overprediction of false-positive protein identification in large data sets. 
    The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications and is readily implemented in proteomics analysis software. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
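The "picked" strategy described in this abstract can be sketched in a few lines: each protein's target and decoy entries compete as a pair, only the higher-scoring member survives, and the FDR is then estimated from the surviving decoys. This is a simplified illustration, not the authors' implementation. It assumes one best score per protein where higher is better (for example a transformed peptide q-value), resolves ties in favor of the target, and uses the usual decoys/targets ratio as the FDR estimate.

```python
def picked_fdr(target_scores, decoy_scores):
    """Return (protein, score, is_decoy, fdr) rows, best score first.

    target_scores and decoy_scores map a protein id to its best score.
    A protein missing from one dict is treated as unscored on that side.
    """
    picked = []
    for protein in set(target_scores) | set(decoy_scores):
        t = target_scores.get(protein, float("-inf"))
        d = decoy_scores.get(protein, float("-inf"))
        # Keep only the winner of each target/decoy pair
        # (ties go to the target; that tie rule is an assumption).
        picked.append((protein, max(t, d), d > t))
    picked.sort(key=lambda row: (-row[1], row[0]))

    rows, targets, decoys = [], 0, 0
    for protein, score, is_decoy in picked:
        decoys += is_decoy
        targets += not is_decoy
        fdr = decoys / targets if targets else 1.0
        rows.append((protein, score, is_decoy, min(fdr, 1.0)))
    return rows


# Hypothetical scores for four proteins.
targets = {"P1": 9.0, "P2": 7.5, "P3": 2.0, "P4": 6.0}
decoys = {"P1": 1.0, "P2": 3.0, "P3": 4.0, "P4": 5.5}
rows = picked_fdr(targets, decoys)
for row in rows:
    print(row)
```

In this toy input the decoy of P4 scores 5.5 but loses to its own target, so it never enters the ranked list. A classic target-decoy count would include it, which illustrates the over-representation of decoys that the abstract attributes to the classic approach.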

  11. iProphet: Multi-level Integrative Analysis of Shotgun Proteomic Data Improves Peptide and Protein Identification Rates and Error Estimates*

    PubMed Central

    Shteynberg, David; Deutsch, Eric W.; Lam, Henry; Eng, Jimmy K.; Sun, Zhi; Tasman, Natalie; Mendoza, Luis; Moritz, Robert L.; Aebersold, Ruedi; Nesvizhskii, Alexey I.

    2011-01-01

    The combination of tandem mass spectrometry and sequence database searching is the method of choice for the identification of peptides and the mapping of proteomes. Over the last several years, the volume of data generated in proteomic studies has increased dramatically, which challenges the computational approaches previously developed for these data. Furthermore, a multitude of search engines have been developed that identify different, overlapping subsets of the sample peptides from a particular set of tandem mass spectrometry spectra. We present iProphet, the new addition to the widely used open-source suite of proteomic data analysis tools Trans-Proteomics Pipeline. Applied in tandem with PeptideProphet, it provides more accurate representation of the multilevel nature of shotgun proteomic data. iProphet combines the evidence from multiple identifications of the same peptide sequences across different spectra, experiments, precursor ion charge states, and modified states. It also allows accurate and effective integration of the results from multiple database search engines applied to the same data. The use of iProphet in the Trans-Proteomics Pipeline increases the number of correctly identified peptides at a constant false discovery rate as compared with both PeptideProphet and another state-of-the-art tool Percolator. As the main outcome, iProphet permits the calculation of accurate posterior probabilities and false discovery rate estimates at the level of sequence identical peptide identifications, which in turn leads to more accurate probability estimates at the protein level. Fully integrated with the Trans-Proteomics Pipeline, it supports all commonly used MS instruments, search engines, and computer platforms. 
    The performance of iProphet is demonstrated on two publicly available data sets: data from a human whole cell lysate proteome profiling experiment representative of typical proteomic data sets, and from a set of Streptococcus pyogenes experiments more representative of organism-specific composite data sets. PMID:21876204

  12. A Scalable Approach for Protein False Discovery Rate Estimation in Large Proteomic Data Sets

    PubMed Central

    Savitski, Mikhail M.; Wilhelm, Mathias; Hahne, Hannes; Kuster, Bernhard; Bantscheff, Marcus

    2015-01-01

    Calculating the number of confidently identified proteins and estimating false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further add to the challenge and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target–decoy strategy that particularly show when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel target–decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprised of ∼19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The “picked” protein FDR approach treats target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence depending on which receives the highest score. We investigated the performance of this approach in combination with q-value based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The “picked” target–decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein yielding a stable number of true positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used “classic” protein FDR approach that causes overprediction of false-positive protein identification in large data sets. 
    The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications and is readily implemented in proteomics analysis software. PMID:25987413

  13. A new approach to epigenome-wide discovery of non-invasive methylation biomarkers for colorectal cancer screening in circulating cell-free DNA using pooled samples.

    PubMed

    Gallardo-Gómez, María; Moran, Sebastian; Páez de la Cadena, María; Martínez-Zorzano, Vicenta Soledad; Rodríguez-Berrocal, Francisco Javier; Rodríguez-Girondo, Mar; Esteller, Manel; Cubiella, Joaquín; Bujanda, Luis; Castells, Antoni; Balaguer, Francesc; Jover, Rodrigo; De Chiara, Loretta

    2018-01-01

    Colorectal cancer is the fourth cause of cancer-related deaths worldwide, though detection at early stages associates with good prognosis. Thus, there is a clear demand for novel non-invasive tests for the early detection of colorectal cancer and premalignant advanced adenomas, to be used in population-wide screening programs. Aberrant DNA methylation detected in liquid biopsies, such as serum circulating cell-free DNA (cfDNA), is a promising source of non-invasive biomarkers. This study aimed to assess the feasibility of using cfDNA pooled samples to identify potential serum methylation biomarkers for the detection of advanced colorectal neoplasia (colorectal cancer or advanced adenomas) using microarray-based technology. cfDNA was extracted from serum samples from 20 individuals with no colorectal findings, 20 patients with advanced adenomas, and 20 patients with colorectal cancer (stages I and II). Two pooled samples were prepared for each pathological group using equal amounts of cfDNA from 10 individuals, sex-, age-, and recruitment hospital-matched. We measured the methylation levels of 866,836 CpG positions across the genome using the MethylationEPIC array. Pooled serum cfDNA methylation data meets the quality requirements. The proportion of detected CpG in all pools (> 99% with detection p value < 0.01) exceeded Illumina Infinium methylation data quality metrics of the number of sites detected. The differential methylation analysis revealed 1384 CpG sites (5% false discovery rate) with at least 10% difference in the methylation level between no colorectal findings controls and advanced neoplasia, the majority of which were hypomethylated. Unsupervised clustering showed that cfDNA methylation patterns can distinguish advanced neoplasia from healthy controls, as well as separate tumor tissue from healthy mucosa in an independent dataset. 
    We also observed that advanced adenomas and stage I/II colorectal cancer methylation profiles, grouped as advanced neoplasia, are largely homogenous and clustered close together. This preliminary study shows the viability of microarray-based methylation biomarker discovery using pooled serum cfDNA samples as an alternative approach to tissue specimens. Our strategy opens the door to deciphering new non-invasive biomarkers, not only for colorectal cancer detection but also for other types of cancer.

  14. Epigenome-Wide DNA Methylation in Hearing Ability: New Mechanisms for an Old Problem

    PubMed Central

    Wolber, Lisa E.; Steves, Claire J.; Tsai, Pei-Chien; Deloukas, Panos; Spector, Tim D.

    2014-01-01

    Epigenetic regulation of gene expression has been shown to change over time and may be associated with environmental exposures in common complex traits. Age-related hearing impairment is a complex disorder, known to be heritable, with heritability estimates of 57–70%. Epigenetic regulation might explain the observed difference in age of onset and magnitude of hearing impairment with age. Epigenetic epidemiology studies using unrelated samples can be limited in their ability to detect small effects, and recent epigenetic findings in twins underscore the power of this well matched study design. We investigated the association between venous blood DNA methylation epigenome-wide and hearing ability. Pure-tone audiometry (PTA) and Illumina HumanMethylation array data were obtained from female twin volunteers enrolled in the TwinsUK register. Two study groups were explored: first, an epigenome-wide association scan (EWAS) was performed in a discovery sample (n = 115 subjects, age range: 47–83 years, Illumina 27 k array), then replication of the top ten associated probes from the discovery EWAS was attempted in a second unrelated sample (n = 203, age range: 41–86 years, Illumina 450 k array). Finally, a set of monozygotic (MZ) twin pairs (n = 21 pairs) within the discovery sample (Illumina 27 k array) was investigated in more detail in an MZ discordance analysis. Hearing ability was strongly associated with DNA methylation levels in the promoter regions of several genes, including TCF25 (cg01161216, p = 6.6×10^−6), FGFR1 (cg15791248, p = 5.7×10^−5) and POLE (cg18877514, p = 6.3×10^−5). Replication of these results in a second sample confirmed the presence of differential methylation at TCF25 (p(replication) = 6×10^−5) and POLE (p(replication) = 0.016). In the MZ discordance analysis, twins' intrapair difference in hearing ability correlated with DNA methylation differences at ACP6 (cg01377755, r = −0.75, p = 1.2×10^−4) and MEF2D (cg08156349, r = −0.75, p = 1.4×10^−4). Examination of gene expression in skin suggests an influence of differential methylation on expression, which may account for the variation in hearing ability with age. PMID:25184702

  15. GeoGebra Assist Discovery Learning Model for Problem Solving Ability and Attitude toward Mathematics

    NASA Astrophysics Data System (ADS)

    Murni, V.; Sariyasa, S.; Ardana, I. M.

    2017-09-01

    This study aims to describe the effect of GeoGebra utilization in the discovery learning model on mathematical problem solving ability and students’ attitude toward mathematics. This research was quasi-experimental, and a post-test only control group design was used. The population in this study was 181 students. The sampling technique used was cluster random sampling, so the sample in this study was 120 students divided into 4 classes, 2 classes for the experimental class and 2 classes for the control class. Data were analyzed by using one-way MANOVA. The results of data analysis showed that the utilization of GeoGebra in discovery learning can lead to better problem solving and better attitudes toward mathematics. This is because the presentation of problems using GeoGebra can assist students in identifying and solving problems and attract students’ interest, because GeoGebra provides an immediate response to students. The results of the research suggest that the utilization of GeoGebra in discovery learning can be applied in learning and teaching a wider range of subject matter beyond that covered in this study.

  16. Optimal selection of epitopes for TXP-immunoaffinity mass spectrometry.

    PubMed

    Planatscher, Hannes; Supper, Jochen; Poetz, Oliver; Stoll, Dieter; Joos, Thomas; Templin, Markus F; Zell, Andreas

    2010-06-25

    Mass spectrometry (MS) based protein profiling has become one of the key technologies in biomedical research and biomarker discovery. One bottleneck in MS-based protein analysis is sample preparation and an efficient fractionation step to reduce the complexity of the biological samples, which are too complex to be analyzed directly with MS. Sample preparation strategies that reduce the complexity of tryptic digests by using immunoaffinity-based methods have been shown to lead to a substantial increase in throughput and sensitivity in the proteomic mass spectrometry approach. The limitation of using such immunoaffinity-based approaches is the availability of the appropriate peptide-specific capture antibodies. Recent developments in these approaches, where subsets of peptides with short identical terminal sequences can be enriched using antibodies directed against short terminal epitopes, promise a significant gain in efficiency. We show that the minimal set of terminal epitopes for the coverage of a target protein list can be found by formulating the task as a set cover problem, preceded by a filtering pipeline for the exclusion of peptides and target epitopes with undesirable properties. For small datasets (a few hundred proteins) it is possible to solve the problem to optimality with moderate computational effort using commercial or free solvers. Larger datasets, such as full proteomes, require the use of heuristics.
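The abstract notes that large instances of the epitope set cover problem call for heuristics. The sketch below shows the standard greedy set cover heuristic as one possible choice; the abstract does not say which heuristic the authors used. The epitope strings and protein ids are hypothetical, and the filtering pipeline is assumed to have run already.

```python
def greedy_epitope_cover(epitope_to_proteins, targets):
    """Greedily choose epitopes until every coverable target is covered.

    epitope_to_proteins maps an epitope to the set of proteins that
    contain a peptide ending in that epitope.
    Returns (chosen epitopes in pick order, proteins left uncovered).
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        # Pick the epitope covering the most still-uncovered proteins;
        # iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(epitope_to_proteins),
                   key=lambda e: len(epitope_to_proteins[e] & uncovered))
        gained = epitope_to_proteins[best] & uncovered
        if not gained:
            break  # the remaining proteins have no usable epitope
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Hypothetical terminal epitopes and the proteins each one would capture.
epitopes = {
    "AGHK": {"P1", "P2", "P3"},
    "LDSR": {"P3", "P4"},
    "VTEK": {"P4", "P5"},
    "QNLR": {"P5"},
}
chosen, uncovered = greedy_epitope_cover(epitopes, {"P1", "P2", "P3", "P4", "P5"})
print(chosen, uncovered)
```

The greedy choice is not guaranteed to find the smallest possible epitope set, which is the trade-off against the exact solvers the abstract recommends for small datasets.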

  17. NGSmethDB 2017: enhanced methylomes and differential methylation

    PubMed Central

    Lebrón, Ricardo; Gómez-Martín, Cristina; Carpena, Pedro; Bernaola-Galván, Pedro; Barturen, Guillermo; Hackenberg, Michael; Oliver, José L.

    2017-01-01

    The 2017 update of NGSmethDB stores whole genome methylomes generated from short-read data sets obtained by bisulfite sequencing (WGBS) technology. To generate high-quality methylomes, stringent quality controls were integrated with third-party software, adding also a two-step mapping process to exploit the advantages of the new genome assembly models. The samples were all profiled under constant parameter settings, thus enabling comparative downstream analyses. Besides a significant increase in the number of samples, NGSmethDB now includes two additional data-types, which are a valuable resource for the discovery of methylation epigenetic biomarkers: (i) differentially methylated single-cytosines; and (ii) methylation segments (i.e. genome regions of homogeneous methylation). The NGSmethDB back-end is now based on MongoDB, a NoSQL hierarchical database using JSON-formatted documents and dynamic schemas, thus accelerating sample comparative analyses. Besides conventional database dumps, track hubs were implemented, which improved database access, visualization in genome browsers and comparative analyses to third-party annotations. In addition, the database can also be accessed through a RESTful API. Lastly, a Python client and a multiplatform virtual machine allow for program-driven access from the user's desktop. This way, private methylation data can be compared to NGSmethDB without the need to upload them to public servers. Database website: http://bioinfo2.ugr.es/NGSmethDB. PMID:27794041

  18. Plasma proteomic analysis reveals altered protein abundances in cardiovascular disease.

    PubMed

    Lygirou, Vasiliki; Latosinska, Agnieszka; Makridakis, Manousos; Mullen, William; Delles, Christian; Schanstra, Joost P; Zoidakis, Jerome; Pieske, Burkert; Mischak, Harald; Vlahou, Antonia

    2018-04-17

    Cardiovascular disease (CVD) describes the pathological conditions of the heart and blood vessels. Despite the large number of studies on CVD and its etiology, its key modulators remain largely unknown. To this end, we performed a comprehensive proteomic analysis of blood plasma, with the scope to identify disease-associated changes after placing them in the context of existing knowledge, and generate a well characterized dataset for further use in CVD multi-omics integrative analysis. LC-MS/MS was employed to analyze plasma from 32 subjects (19 cases of various CVD phenotypes and 13 controls) in two steps: discovery (13 cases and 8 controls) and test (6 cases and 5 controls) set analysis. Following label-free quantification, the detected proteins were correlated to existing plasma proteomics datasets (plasma proteome database; PPD) and functionally annotated (Cytoscape, Ingenuity Pathway Analysis). Differential expression was defined based on identification confidence (≥ 2 peptides per protein), statistical significance (Mann-Whitney p value ≤ 0.05) and a minimum of twofold change. Peptides detected in at least 50% of samples per group were considered, resulting in a total of 3796 identified proteins (838 proteins based on ≥ 2 peptides). Pathway annotation confirmed the functional relevance of the findings (representation of complement cascade, fibrin clot formation, platelet degranulation, etc.). Correlation of the relative abundance of the proteins identified in the discovery set with their reported concentrations in the PPD was significant, confirming the validity of the quantification method. The discovery set analysis revealed 100 differentially expressed proteins between cases and controls, 39 of which were verified (≥ twofold change) in the test set. 
    These included proteins already studied in the context of CVD (such as apolipoprotein B, alpha-2-macroglobulin), as well as novel findings (such as low density lipoprotein receptor related protein 2 [LRP2], protein SZT2) for which a mechanism of action is suggested. This proteomic study provides a comprehensive dataset to be used for integrative and functional studies in the field. The observed protein changes reflect known CVD-related processes (e.g. lipid uptake, inflammation) but also suggest novel hypotheses for further investigation, including a potential pleiotropic role of LRP2 and links of SZT2 to CVD.

  19. How to succeed in science: a concise guide for young biomedical scientists. Part II: making discoveries

    PubMed Central

    Yewdell, Jonathan W.

    2009-01-01

    Making discoveries is the most important part of being a scientist, and also the most fun. Young scientists need to develop the experimental and mental skill sets that enable them to make discoveries, including how to recognize and exploit serendipity when it strikes. Here, I provide practical advice to young scientists on choosing a research topic, designing, performing and interpreting experiments and, last but not least, on maintaining your sanity in the process. PMID:18401347

  20. The Critical Role of Organic Chemistry in Drug Discovery.

    PubMed

    Rotella, David P

    2016-10-19

    Small molecules remain the backbone for modern drug discovery. They are conceived and synthesized by medicinal chemists, many of whom were originally trained as organic chemists. Support from government and industry to provide training and personnel for continued development of this critical skill set has been declining for many years. This Viewpoint highlights the value of organic chemistry and organic medicinal chemists in the complex journey of drug discovery as a reminder that basic science support must be restored.

  1. How to succeed in science: a concise guide for young biomedical scientists. Part II: making discoveries.

    PubMed

    Yewdell, Jonathan W

    2008-06-01

    Making discoveries is the most important part of being a scientist, and also the most fun. Young scientists need to develop the experimental and mental skill sets that enable them to make discoveries, including how to recognize and exploit serendipity when it strikes. Here, I provide practical advice to young scientists on choosing a research topic, designing, performing and interpreting experiments and, last but not least, on maintaining your sanity in the process.

  2. Woodpecker Preventative measures at Launch Pad 39B

    NASA Technical Reports Server (NTRS)

    1995-01-01

    Technicians at Launch Pad 39B take steps to prevent further damage from woodpeckers to the Space Shuttle Discovery, set to lift off July 13 on Mission STS-70. Installing balloons with scary eyes, such as these two near the external tank, is just one of the measures being taken to keep woodpeckers away since Discovery's second rollout to Pad B. Discovery had to be rolled back once to the Vehicle Assembly Building to repair woodpecker holes made in the insulation covering the external tank.

  3. [Recent advances in metabonomics].

    PubMed

    Xu, Guo-Wang; Lu, Xin; Yang, Sheng-Li

    2007-12-01

    Metabonomics (or metabolomics) aims at the comprehensive and quantitative analysis of the wide arrays of metabolites in biological samples. Metabonomics has been labeled as one of the new "-omics", joining genomics, transcriptomics, and proteomics as a science employed toward the understanding of global systems biology. It has been widely applied in many research areas, including drug toxicology, biomarker discovery, functional genomics, and molecular pathology. The comprehensive analysis of the metabonome is particularly challenging due to the diverse chemical natures of metabolites. Metabonomics investigations require special approaches for sample preparation, data-rich analytical chemical measurements, and information mining. The outputs from a metabonomics study allow sample classification, biomarker discovery, and interpretation of the reasons for classification information. This review focuses on recent advances in various technical platforms of metabonomics and its applications in drug discovery and development, disease biomarker identification, and plant- and microbe-related fields.

  4. KSC-05PD-1449

    NASA Technical Reports Server (NTRS)

    2005-01-01

    KENNEDY SPACE CENTER, FLA. At Launch Pad 39B, the Orbiter Boom Sensor System (OBSS) sensor package is viewed before the orbiter's payload bay doors are closed for launch. Payload bay door closure is a significant milestone in the preparations of Discovery for the first Return to Flight mission, STS-114. This sensor package will provide surface area and depth defect inspection for all the surfaces of the orbiter. It includes an intensified television camera (ITVC) and a laser dynamic range imager, which are mounted on a pan and tilt unit, and a laser camera system (LCS) mounted on a stationary bracket. The package is part of the new safety measures added for all future Space Shuttle missions. During its 12-day mission, Discovery's seven-person crew will test new hardware and techniques to improve Shuttle safety, as well as deliver supplies to the International Space Station. Discovery's payloads include the Multi-Purpose Logistics Module Raffaello, the Lightweight Multi-Purpose Experiment Support Structure Carrier (LMC), and the External Stowage Platform-2 (ESP-2). Raffaello will deliver supplies to the International Space Station including food, clothing and research equipment. The LMC supports a replacement Control Moment Gyroscope and a tile repair sample box. The ESP-2 is outfitted with replacement parts. Launch of mission STS-114 was set for July 13 at the conclusion of the Flight Readiness Review yesterday.

  5. ISO 19115 Experiences in NASA's Earth Observing System (EOS) ClearingHOuse (ECHO)

    NASA Astrophysics Data System (ADS)

    Cechini, M. F.; Mitchell, A.

    2011-12-01

    Metadata is an important entity in the process of cataloging, discovering, and describing earth science data. As science research and the gathered data increase in complexity, so does the complexity and importance of descriptive metadata. To meet these growing needs, the metadata models required utilize richer and more mature metadata attributes. Categorizing, standardizing, and promulgating these metadata models to a politically, geographically, and scientifically diverse community is a difficult process. An integral component of metadata management within NASA's Earth Observing System Data and Information System (EOSDIS) is the Earth Observing System (EOS) ClearingHOuse (ECHO). ECHO is the core metadata repository for the EOSDIS data centers providing a centralized mechanism for metadata and data discovery and retrieval. ECHO has undertaken an internal restructuring to meet the changing needs of scientists, the consistent advancement in technology, and the advent of new standards such as ISO 19115. These improvements were based on the following tenets for data discovery and retrieval:
    + There exists a set of 'core' metadata fields recommended for data discovery.
    + There exists a set of users who will require the entire metadata record for advanced analysis.
    + There exists a set of users who will require a 'core' set of metadata fields for discovery only.
    + There will never be a cessation of new formats or a total retirement of all old formats.
    + Users should be presented metadata in a consistent format of their choosing.
    In order to address the previously listed items, ECHO's new metadata processing paradigm utilizes the following approach:
    + Identify a cross-format set of 'core' metadata fields necessary for discovery.
    + Implement format-specific indexers to extract the 'core' metadata fields into an optimized query capability.
    + Archive the original metadata in its entirety for presentation to users requiring the full record.
    + Provide on-demand translation of 'core' metadata to any supported result format.
    Lessons learned by the ECHO team while implementing its new metadata approach to support usage of the ISO 19115 standard will be presented. These lessons learned highlight some discovered strengths and weaknesses in the ISO 19115 standard as it is introduced to an existing metadata processing system.
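The approach listed in this abstract (format-specific indexers that extract a shared 'core' field set, with the original record archived whole) can be sketched as follows. Everything here is a hypothetical illustration: the two record formats, the field names, and the functions `index_json`, `index_xml`, and `ingest` are invented for the example and are neither ECHO's actual schema nor ISO 19115.

```python
import json
import xml.etree.ElementTree as ET


def index_json(raw):
    """Extract core fields from a made-up JSON record layout."""
    doc = json.loads(raw)
    return {"title": doc["name"],
            "start_date": doc["temporal"]["start"],
            "end_date": doc["temporal"]["end"]}


def index_xml(raw):
    """Extract the same core fields from a made-up XML record layout."""
    root = ET.fromstring(raw)
    return {"title": root.findtext("Title"),
            "start_date": root.findtext("Range/Begin"),
            "end_date": root.findtext("Range/End")}


# One indexer per supported format; adding a format means adding an entry.
INDEXERS = {"json": index_json, "xml": index_xml}


def ingest(record_format, raw, archive, index):
    """Index the core fields and archive the untouched original record."""
    core = INDEXERS[record_format](raw)
    record_id = len(archive)
    archive.append((record_format, raw))  # full record kept for advanced users
    index[record_id] = core               # uniform core fields for discovery
    return record_id


archive, index = [], {}
ingest("json", '{"name": "Ocean color", "temporal": '
               '{"start": "2002-07-04", "end": "2002-12-31"}}', archive, index)
ingest("xml", "<Record><Title>Sea surface temperature</Title><Range>"
              "<Begin>2001-01-01</Begin><End>2001-12-31</End></Range></Record>",
       archive, index)
print(index)
```

Because discovery queries touch only the uniform `index`, supporting a new metadata format does not require changing the search side, which matches the tenet that new formats will keep arriving.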

  6. Integrated Genomic Biomarkers to Identify Aggressive Disease in African Americans with Prostate Cancer

    DTIC Science & Technology

    2016-09-01

    300 of these men; have completed pathology review of 70 of the discovery sample tumors; macrodissected and performed DNA extraction from 50 tumors...block, and sections cut and tumor areas marked by histopathologist. Target completion September 1st 2017; Discovery sample 35% completed Pathology ...African American population. Target completion March 2017; 50% completed. What was accomplished under these goals? In the current reporting

  7. The Biomedical Resource Ontology (BRO) to Enable Resource Discovery in Clinical and Translational Research

    PubMed Central

    Tenenbaum, Jessica D.; Whetzel, Patricia L.; Anderson, Kent; Borromeo, Charles D.; Dinov, Ivo D.; Gabriel, Davera; Kirschner, Beth; Mirel, Barbara; Morris, Tim; Noy, Natasha; Nyulas, Csongor; Rubenson, David; Saxman, Paul R.; Singh, Harpreet; Whelan, Nancy; Wright, Zach; Athey, Brian D.; Becich, Michael J.; Ginsburg, Geoffrey S.; Musen, Mark A.; Smith, Kevin A.; Tarantal, Alice F.; Rubin, Daniel L; Lyster, Peter

    2010-01-01

    The biomedical research community relies on a diverse set of resources, both within their own institutions and at other research centers. In addition, an increasing number of shared electronic resources have been developed. Without effective means to locate and query these resources, it is challenging, if not impossible, for investigators to be aware of the myriad resources available, or to effectively perform resource discovery when the need arises. In this paper, we describe the development and use of the Biomedical Resource Ontology (BRO) to enable semantic annotation and discovery of biomedical resources. We also describe the Resource Discovery System (RDS) which is a federated, inter-institutional pilot project that uses the BRO to facilitate resource discovery on the Internet. Through the RDS framework and its associated Biositemaps infrastructure, the BRO facilitates semantic search and discovery of biomedical resources, breaking down barriers and streamlining scientific research that will improve human health. PMID:20955817

  8. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

    PubMed

    Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A

    2018-02-01

    The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to accumulation of information about a huge amount of DNA sequences. There are a lot of web services which are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. An enormous motif diversity makes their finding challenging. In order to avoid the difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of the motif discovery programs dramatically declines as the query set size increases. As a result, only a fraction of top "peak" ChIP-Seq segments can be analyzed or the area of analysis should be narrowed. Thus, the motif discovery in massive datasets remains a challenging issue. The Argo_CUDA (Compute Unified Device Architecture) web service is designed to process the massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in 15-letter IUPAC code. Argo_CUDA is a full-exhaustive approach based on the high-performance GPU technologies. Compared with the existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed the motifs which correspond to known transcription factor binding sites.
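
    To make "degenerate motifs written in 15-letter IUPAC code" concrete, the following Python sketch matches a degenerate motif against DNA and scores every motif of a small fixed length. It is a CPU toy under stated assumptions, not the Argo_CUDA implementation: the function names and the scoring rule (number of sequences containing the motif) are chosen here for illustration only.

```python
from itertools import product

# Standard IUPAC nucleotide codes: 4 bases plus 11 ambiguity symbols.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(motif: str, site: str) -> bool:
    """True if every base of `site` is allowed by the motif symbol at that position."""
    return len(motif) == len(site) and all(
        base in IUPAC[symbol] for symbol, base in zip(motif, site)
    )


def sequences_with_motif(motif: str, sequences: list) -> int:
    """Count how many sequences contain at least one occurrence of the motif."""
    k = len(motif)
    return sum(
        any(matches(motif, seq[i:i + k]) for i in range(len(seq) - k + 1))
        for seq in sequences
    )


def exhaustive_scan(sequences: list, k: int) -> dict:
    """Score all 15**k degenerate motifs of length k (feasible only for small k)."""
    return {
        "".join(symbols): sequences_with_motif("".join(symbols), sequences)
        for symbols in product(IUPAC, repeat=k)
    }
```

    The search space grows as 15**k, which is why an exhaustive scan of realistic motif lengths calls for massively parallel hardware such as the GPUs named in the abstract.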

  9. Documenting the Conversation: A Systematic Review of Library Discovery Layers

    ERIC Educational Resources Information Center

    Bossaller, Jenny S.; Sandy, Heather Moulaison

    2017-01-01

    This article describes the results of a systematic review of peer-reviewed, published research articles about "discovery layers," user-friendly interfaces or systems that provide single-search box access to library content. Focusing on articles in LISTA published 2009-2013, a set of 80 articles was coded for community of users, journal…

  10. Designing for Discovery Learning of Complexity Principles of Congestion by Driving Together in the TrafficJams Simulation

    ERIC Educational Resources Information Center

    Levy, Sharona T.; Peleg, Ran; Ofeck, Eyal; Tabor, Naamit; Dubovi, Ilana; Bluestein, Shiri; Ben-Zur, Hadar

    2018-01-01

    We propose and evaluate a framework supporting collaborative discovery learning of complex systems. The framework blends five design principles: (1) individual action: amidst (2) social interactions; challenged with (3) multiple tasks; set in (4) a constrained interactive learning environment that draws attention to (5) highlighted target…

  11. Analysis of potential protein-modifying variants in 9000 endometriosis patients and 150000 controls of European ancestry.

    PubMed

    Sapkota, Yadav; Vivo, Immaculata De; Steinthorsdottir, Valgerdur; Fassbender, Amelie; Bowdler, Lisa; Buring, Julie E; Edwards, Todd L; Jones, Sarah; O, Dorien; Peterse, Daniëlle; Rexrode, Kathryn M; Ridker, Paul M; Schork, Andrew J; Thorleifsson, Gudmar; Wallace, Leanne M; Kraft, Peter; Morris, Andrew P; Nyholt, Dale R; Edwards, Digna R Velez; Nyegaard, Mette; D'Hooghe, Thomas; Chasman, Daniel I; Stefansson, Kari; Missmer, Stacey A; Montgomery, Grant W

    2017-09-12

    Genome-wide association (GWA) studies have identified 19 independent common risk loci for endometriosis. Most of the GWA variants are non-coding and the genes responsible for the association signals have not been identified. Herein, we aimed to assess the potential role of protein-modifying variants in endometriosis using exome-array genotyping in 7164 cases and 21005 controls, and a replication set of 1840 cases and 129016 controls of European ancestry. Results in the discovery sample identified significant evidence for association with coding variants in single-variant (rs1801232-CUBN) and gene-level (CIITA and PARP4) meta-analyses, but these did not survive replication. In the combined analysis, there was genome-wide significant evidence for rs13394619 (P = 2.3 × 10⁻⁹) in GREB1 at 2p25.1 - a locus previously identified in a GWA meta-analysis of European and Japanese samples. Despite sufficient power, our results did not identify any protein-modifying variants (MAF > 0.01) with moderate or large effect sizes in endometriosis, although these variants may exist in non-European populations or in high-risk families. The results suggest continued discovery efforts should focus on genotyping large numbers of surgically-confirmed endometriosis cases and controls, and/or sequencing high-risk families to identify novel rare variants to provide greater insights into the molecular pathogenesis of the disease.

  12. Effective knowledge management in translational medicine.

    PubMed

    Szalma, Sándor; Koka, Venkata; Khasanova, Tatiana; Perakslis, Eric D

    2010-07-19

    The growing consensus that the most valuable data source for biomedical discoveries is derived from human samples is clearly reflected in the growing number of translational medicine and translational sciences departments across pharma as well as academic and government supported initiatives such as Clinical and Translational Science Awards (CTSA) in the US and the Seventh Framework Programme (FP7) of the EU with emphasis on translating research for human health. The pharmaceutical companies of Johnson and Johnson have established translational and biomarker departments and implemented an effective knowledge management framework including building a data warehouse and the associated data mining applications. The implemented resource is built from open source systems such as i2b2 and GenePattern. The system has been deployed across multiple therapeutic areas within the pharmaceutical companies of Johnson and Johnson and is being used actively to integrate and mine internal and public data to support drug discovery and development decisions such as indication selection and trial design in a translational medicine setting. Our results show that the established system allows scientists to quickly re-validate hypotheses or generate new ones with the use of an intuitive graphical interface. The implemented resource can serve as the basis of precompetitive sharing and mining of studies involving samples from human subjects thus enhancing our understanding of human biology and pathophysiology and ultimately leading to more effective treatment of diseases which represent unmet medical needs.

  13. The Emory Chemical Biology Discovery Center: leveraging academic innovation to advance novel targets through HTS and beyond.

    PubMed

    Johns, Margaret A; Meyerkord-Belton, Cheryl L; Du, Yuhong; Fu, Haian

    2014-03-01

    The Emory Chemical Biology Discovery Center (ECBDC) aims to accelerate high throughput biology and translation of biomedical research discoveries into therapeutic targets and future medicines by providing high throughput research platforms to scientific collaborators worldwide. ECBDC research is focused at the interface of chemistry and biology, seeking to fundamentally advance understanding of disease-related biology with its HTS/HCS platforms and chemical tools, ultimately supporting drug discovery. Established HTS/HCS capabilities, university setting, and expertise in diverse assay formats, including protein-protein interaction interrogation, have enabled the ECBDC to contribute to national chemical biology efforts, empower translational research, and serve as a training ground for young scientists. With these resources, the ECBDC is poised to leverage academic innovation to advance biology and therapeutic discovery.

  14. Comparison of Collection Methods for Fecal Samples for Discovery Metabolomics in Epidemiologic Studies.

    PubMed

    Loftfield, Erikka; Vogtmann, Emily; Sampson, Joshua N; Moore, Steven C; Nelson, Heidi; Knight, Rob; Chia, Nicholas; Sinha, Rashmi

    2016-11-01

    The gut metabolome may be associated with the incidence and progression of numerous diseases. The composition of the gut metabolome can be captured by measuring metabolite levels in the feces. However, there are little data describing the effect of fecal sample collection methods on metabolomic measures. We collected fecal samples from 18 volunteers using four methods: no solution, 95% ethanol, fecal occult blood test (FOBT) cards, and fecal immunochemical test (FIT). One set of samples was frozen after collection (day 0), and for 95% ethanol, FOBT, and FIT, a second set was frozen after 96 hours at room temperature. We evaluated (i) technical reproducibility within sample replicates, (ii) stability after 96 hours at room temperature for 95% ethanol, FOBT, and FIT, and (iii) concordance of metabolite measures with the putative "gold standard," day 0 samples without solution. Intraclass correlation coefficients (ICC) estimating technical reproducibility were high for replicate samples for each collection method. ICCs estimating stability at room temperature were high for 95% ethanol and FOBT (median ICC > 0.87) but not FIT (median ICC = 0.52). Similarly, Spearman correlation coefficients (r_s) estimating metabolite concordance with the "gold standard" were higher for 95% ethanol (median r_s = 0.82) and FOBT (median r_s = 0.70) than for FIT (median r_s = 0.40). Metabolomic measurements appear reproducible and stable in fecal samples collected with 95% ethanol or FOBT. Concordance with the "gold standard" is highest with 95% ethanol and acceptable with FOBT. Future epidemiologic studies should collect feces using 95% ethanol or FOBT if interested in studying fecal metabolomics. Cancer Epidemiol Biomarkers Prev; 25(11); 1483-90. ©2016 AACR.
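
    The Spearman coefficient used in this abstract as a concordance measure is the Pearson correlation of the ranks of two measurement series. A minimal standard-library Python sketch follows; the function names are chosen here, ties receive average ranks, and constant inputs (zero variance) are not handled. It is not the authors' analysis code.

```python
import math


def average_ranks(values: list) -> list:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list, y: list) -> float:
    """Pearson correlation of two equal-length series with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x: list, y: list) -> float:
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

    Because only ranks enter the calculation, any strictly increasing relationship between the two series gives a coefficient of 1, whether or not it is linear.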

  15. Challenges of the information age: the impact of false discovery on pathway identification.

    PubMed

    Rog, Colin J; Chekuri, Srinivasa C; Edgerton, Mary E

    2012-11-21

    Pathways with members that have known relevance to a disease are used to support hypotheses generated from analyses of gene expression and proteomic studies. Using cancer as an example, the pitfalls of searching pathways databases as support for genes and proteins that could represent false discoveries are explored. The frequency with which networks could be generated from 100 instances each of randomly selected five- and ten-gene sets as input to MetaCore, a commercial pathways database, was measured. A PubMed search enumerated cancer-related literature published for any gene in the networks. Using three, two, and one maximum intervening step between input genes to populate the network, networks were generated with frequencies of 97%, 77%, and 7% using ten-gene sets and 73%, 27%, and 1% using five-gene sets. PubMed reported an average of 4225 cancer-related articles per network gene. This can be attributed to the richly populated pathways databases and the interest in the molecular basis of cancer. As information sources become enriched, they are more likely to generate plausible mechanisms for false discoveries.

  16. Systematic modelling and design evaluation of unperturbed tumour dynamics in xenografts.

    PubMed

    Parra Guillen, Zinnia P Patricia; Mangas Sanjuan, Victor; Garcia-Cremades, Maria; Troconiz, Inaki F; Mo, Gary; Pitou, Celine; Iversen, Philip W; Wallin, Johan E

    2018-04-24

    Xenograft mice are largely used to evaluate the efficacy of oncological drugs during preclinical phases of drug discovery and development. Mathematical models provide a useful tool to quantitatively characterise tumour growth dynamics and also optimise upcoming experiments. To the best of our knowledge, this is the first report where unperturbed growth of a large set of tumour cell lines (n=28) has been systematically analysed using the model proposed by Simeoni in the context of non-linear mixed effect (NLME) modelling. Exponential growth was identified as the governing mechanism in the majority of the cell lines, with constant rate values ranging from 0.0204 to 0.203 day⁻¹. No common patterns could be observed across tumour types, highlighting the importance of combining information from different cell lines when evaluating drug activity. Overall, typical model parameters were precisely estimated using designs where tumour size measurements were taken every two days. Moreover, reducing the number of measurements to twice per week, or even once per week for cell lines with low growth rates, showed little impact on parameter precision. However, in order to accurately characterise parameter variability (i.e. relative standard errors below 50%), a sample size of at least 50 mice is needed. This work illustrates the feasibility of systematically applying NLME models to characterise tumour growth in drug discovery and development, and constitutes a valuable source of data to optimise experimental designs by providing an a priori sampling window and minimising the number of samples required. The American Society for Pharmacology and Experimental Therapeutics.
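
    The reported rate constants are easier to interpret as doubling times. Assuming purely exponential growth, V(t) = V0 * exp(rate * t), the doubling time is ln(2) / rate. The Python sketch below applies that standard relation to the two ends of the reported range; it ignores the linear phase of the full Simeoni model and is an illustration, not part of the study.

```python
import math


def doubling_time(rate_per_day: float) -> float:
    """Doubling time in days for exponential growth with the given rate constant."""
    return math.log(2) / rate_per_day


def tumour_size(v0: float, rate_per_day: float, days: float) -> float:
    """Tumour size after `days` of unperturbed exponential growth from size v0."""
    return v0 * math.exp(rate_per_day * days)


# Slowest and fastest rate constants quoted in the abstract (day^-1).
slowest, fastest = 0.0204, 0.203
print(f"slowest line doubles in about {doubling_time(slowest):.1f} days")
print(f"fastest line doubles in about {doubling_time(fastest):.1f} days")
```

    Under this assumption the slowest line doubles roughly every 34 days and the fastest roughly every 3.4 days, a tenfold spread that is consistent with the abstract's remark that weekly measurements suffice only for slowly growing lines.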

  17. Enhancing Undergraduate Education with NASA Resources

    NASA Astrophysics Data System (ADS)

    Manning, James G.; Meinke, Bonnie; Schultz, Gregory; Smith, Denise Anne; Lawton, Brandon L.; Gurton, Suzanne; Astrophysics Community, NASA

    2015-08-01

    The NASA Astrophysics Science Education and Public Outreach Forum (SEPOF) coordinates the work of NASA Science Mission Directorate (SMD) Astrophysics EPO projects and their teams to bring cutting-edge discoveries of NASA missions to the introductory astronomy college classroom. Uniquely poised to foster collaboration between scientists with content expertise and educators with pedagogical expertise, the Forum has coordinated the development of several resources that provide new opportunities for college and university instructors to bring the latest NASA discoveries in astrophysics into their classrooms. To address the needs of the higher education community, the Astrophysics Forum collaborated with the astrophysics E/PO community, researchers, and introductory astronomy instructors to place individual science discoveries and learning resources into context for higher education audiences. The resulting products include two “Resource Guides” on cosmology and exoplanets, each including a variety of accessible resources. The Astrophysics Forum also coordinates the development of the “Astro 101” slide set series. The sets are five- to seven-slide presentations on new discoveries from NASA astrophysics missions relevant to topics in introductory astronomy courses. These sets enable Astronomy 101 instructors to include new discoveries not yet in their textbooks in their courses, and may be found at: https://www.astrosociety.org/education/resources-for-the-higher-education-audience/. The Astrophysics Forum also coordinated the development of 12 monthly “Universe Discovery Guides,” each featuring a theme and a representative object well-placed for viewing, with an accompanying interpretive story, strategies for conveying the topics, and supporting NASA-approved education activities and background information from a spectrum of NASA missions and programs. 
These resources are adaptable for use by instructors and may be found at: http://nightsky.jpl.nasa.gov/news-display.cfm?News_ID=611. These resources help enhance the Science, Technology, Engineering, and Mathematics (STEM) experiences of undergraduates, and will be described with access information provided.

  18. SAMPL4 & DOCK3.7: lessons for automated docking procedures

    NASA Astrophysics Data System (ADS)

    Coleman, Ryan G.; Sterling, Teague; Weiss, Dahlia R.

    2014-03-01

    The SAMPL4 challenges were used to test current automated methods for solvation energy, virtual screening, pose and affinity prediction of the molecular docking pipeline DOCK 3.7. Additionally, first-order models of binding affinity were proposed as milestones for any method predicting binding affinity. Several important discoveries about the molecular docking software were made during the challenge: (1) Solvation energies of ligands were five-fold worse than any other method used in SAMPL4, including methods that were similarly fast; (2) HIV Integrase is a challenging target, but automated docking on the correct allosteric site performed well in terms of virtual screening and pose prediction (compared to other methods), while affinity prediction, as expected, was very poor; (3) Molecular docking grid sizes can be very important: serious errors were discovered with default settings, which have been adjusted for all future work. Overall, lessons from SAMPL4 suggest many changes to molecular docking tools, not just DOCK 3.7, that could improve the state of the art. Future difficulties and projects will be discussed.

  19. A fully automated liquid–liquid extraction system utilizing interface detection

    PubMed Central

    Maslana, Eugene; Schmitt, Robert; Pan, Jeffrey

    2000-01-01

    The development of the Abbott Liquid-Liquid Extraction Station was a result of the need for an automated system to perform aqueous extraction on large sets of newly synthesized organic compounds used for drug discovery. The system utilizes a cylindrical laboratory robot to shuttle sample vials between two loading racks, two identical extraction stations, and a centrifuge. Extraction is performed by detecting the phase interface (by difference in refractive index) of the moving column of fluid drawn from the bottom of each vial containing a biphasic mixture. The integration of interface detection with fluid extraction maximizes sample throughput. Abbott-developed electronics process the detector signals. Sample mixing is performed by high-speed solvent injection. Centrifuging of the samples reduces interface emulsions. Operating software permits the user to program wash protocols with any one of six solvents per wash cycle with as many cycle repeats as necessary. Station capacity is eighty 15 ml vials. This system has proven successful with a broad spectrum of both ethyl acetate and methylene chloride based chemistries. The development and characterization of this automated extraction system will be presented. PMID:18924693

  20. Fine mapping on chromosome 13q32-34 and brain expression analysis implicates MYO16 in schizophrenia.

    PubMed

    Rodriguez-Murillo, Laura; Xu, Bin; Roos, J Louw; Abecasis, Gonçalo R; Gogos, Joseph A; Karayiorgou, Maria

    2014-03-01

    We previously reported linkage of schizophrenia and schizoaffective disorder to 13q32-34 in the European descent Afrikaner population from South Africa. The nature of genetic variation underlying linkage peaks in psychiatric disorders remains largely unknown and both rare and common variants may be contributing. Here, we examine the contribution of common variants located under the 13q32-34 linkage region. We used densely spaced SNPs to fine map the linkage peak region using both a discovery sample of 415 families and a meta-analysis incorporating two additional replication family samples. In a second phase of the study, we use one family-based data set with 237 families and independent case-control data sets for fine mapping of the common variant association signal using HapMap SNPs. We report a significant association with a genetic variant (rs9583277) within the gene encoding for the myosin heavy-chain Myr 8 (MYO16), which has been implicated in neuronal phosphoinositide 3-kinase signaling. Follow-up analysis of HapMap variation within MYO16 in a second set of Afrikaner families and additional case-control data sets of European descent highlighted a region across introns 2-6 as the most likely region to harbor common MYO16 risk variants. Expression analysis revealed a significant increase in the level of MYO16 expression in the brains of schizophrenia patients. Our results suggest that common variation within MYO16 may contribute to the genetic liability to schizophrenia.

  1. Guided discovery of the nine-point circle theorem and its proof

    NASA Astrophysics Data System (ADS)

    Buchbinder, Orly

    2018-01-01

    The nine-point circle theorem is one of the most beautiful and surprising theorems in Euclidean geometry. It establishes the existence of a circle passing through nine points, all of which are related to a single triangle. This paper describes a set of instructional activities that can help students discover the nine-point circle theorem through investigation in a dynamic geometry environment, and consequently prove it using a method of guided discovery. The paper concludes with a variety of suggestions for the ways in which the whole set of activities can be implemented in geometry classrooms.
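
    The theorem can be checked numerically for any particular triangle. The nine points are the three side midpoints, the three feet of the altitudes, and the three midpoints between each vertex and the orthocenter; the circle is centered midway between the circumcenter and the orthocenter, with half the circumradius. The Python sketch below is a numerical check with function names chosen here, not the paper's classroom activity and not a proof.

```python
import math


def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def circumcenter(a, b, c):
    """Center of the circle through the three vertices (standard closed form)."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    sa, sb, sc = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy)


def orthocenter(a, b, c):
    """Uses the vector identity H = A + B + C - 2*O, with O the circumcenter."""
    o = circumcenter(a, b, c)
    return (a[0] + b[0] + c[0] - 2 * o[0], a[1] + b[1] + c[1] - 2 * o[1])


def altitude_foot(p, q, r):
    """Foot of the perpendicular dropped from p onto the line through q and r."""
    dx, dy = r[0] - q[0], r[1] - q[1]
    t = ((p[0] - q[0]) * dx + (p[1] - q[1]) * dy) / (dx * dx + dy * dy)
    return (q[0] + t * dx, q[1] + t * dy)


def nine_points(a, b, c):
    h = orthocenter(a, b, c)
    return [
        midpoint(a, b), midpoint(b, c), midpoint(c, a),
        altitude_foot(a, b, c), altitude_foot(b, c, a), altitude_foot(c, a, b),
        midpoint(a, h), midpoint(b, h), midpoint(c, h),
    ]


def nine_point_circle(a, b, c):
    """Return (center, radius) of the nine-point circle."""
    o = circumcenter(a, b, c)
    return midpoint(o, orthocenter(a, b, c)), dist(o, a) / 2


# Example triangle: all nine distances should equal the radius.
A, B, C = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)
center, radius = nine_point_circle(A, B, C)
for point in nine_points(A, B, C):
    print(round(dist(center, point), 6), round(radius, 6))
```

    Changing the coordinates of A, B and C (keeping them non-collinear) plays the same role as dragging vertices in a dynamic geometry environment.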

  2. Applications of chemogenomic library screening in drug discovery.

    PubMed

    Jones, Lyn H; Bunnage, Mark E

    2017-04-01

    The allure of phenotypic screening, combined with the industry preference for target-based approaches, has prompted the development of innovative chemical biology technologies that facilitate the identification of new therapeutic targets for accelerated drug discovery. A chemogenomic library is a collection of selective small-molecule pharmacological agents, and a hit from such a set in a phenotypic screen suggests that the annotated target or targets of that pharmacological agent may be involved in perturbing the observable phenotype. In this Review, we describe opportunities for chemogenomic screening to considerably expedite the conversion of phenotypic screening projects into target-based drug discovery approaches. Other applications are explored, including drug repositioning, predictive toxicology and the discovery of novel pharmacological modalities.

  3. A New Era of Multidisciplinary Expeditions: Recent Opportunities and Progress to Advance the Telepresence Paradigm

    NASA Astrophysics Data System (ADS)

    Cantwell, K. L.; Kennedy, B. R.; Malik, M.; Gray, L. M.; Elliott, K.; Lobecker, E.; Drewniak, J.; Reser, B.; Crum, E.; Lovalvo, D.

    2016-02-01

    Since its commissioning in 2008, NOAA Ship Okeanos Explorer has used telepresence technology both as an outreach tool and as a new way to conduct interdisciplinary science expeditions. NOAA's Office of Ocean Exploration and Research (OER) has developed a set of collaboration tools and protocols to enable extensive shore-based participation. Telepresence offers unique advantages including access to a large pool of expertise on shore and flexibility to react to new discoveries as they occur. During early years, the telepresence experience was limited to Internet 2 enabled Exploration Command Centers, but with the advent of improved bandwidth and new video transcoders, scientists from anywhere with an internet connection can participate in a telepresence expedition. Scientists have also capitalized on social media (Twitter, Facebook, Reddit etc.) by sharing discoveries to leverage the intellectual capital of scientists worldwide and engaging the general public in real-time. Aside from using telepresence to stream video off the ship, the high-bandwidth satellite connection allows for the transfer of large quantities of data in near real-time. This enables not only ship - shore data transfers, but can also support ship - ship collaborations as demonstrated during the 2015 and 2014 seasons where Okeanos worked directly with science teams onboard other vessels to share data and immediately follow up on features of interest, leading to additional discoveries. OER continues to expand its use of telepresence by experimenting with procedures to offload roles previously tied to the ship, such as data acquisition watch standers; prototyping tools for distributed user data analysis and video annotation; and incorporating in-situ sampling devices. OER has also developed improved tools to provide access to archived data to increase data distribution and facilitate additional discoveries post-expedition.

  4. The sun sets on the Space Shuttle Discovery during post-flight processing in the Mate-Demate Device (MDD), following its landing at NASA DFRC in California

    NASA Image and Video Library

    2005-08-14

    The sun sets on the Space Shuttle Discovery during post-flight processing in the Mate-Demate Device (MDD), following its landing at NASA's Dryden Flight Research Center in California. The gantry-like MDD structure is used for servicing the shuttle orbiters in preparation for their ferry flight back to the Kennedy Space Center in Florida, including mounting the shuttle atop NASA's modified Boeing 747 Shuttle Carrier Aircraft. Space Shuttle Discovery landed safely at NASA's Dryden Flight Research Center at Edwards Air Force Base in California at 5:11:22 a.m. PDT, August 9, 2005, following the very successful 14-day STS-114 return to flight mission. During their two weeks in space, Commander Eileen Collins and her six crewmates tested out new safety procedures and delivered supplies and equipment to the International Space Station. Discovery spent two weeks in space, where the crew demonstrated new methods to inspect and repair the Shuttle in orbit. The crew also delivered supplies, outfitted and performed maintenance on the International Space Station. A number of these tasks were conducted during three spacewalks. In an unprecedented event, spacewalkers were called upon to remove protruding gap fillers from the heat shield on Discovery's underbelly. In other spacewalk activities, astronauts installed an external platform onto the Station's Quest Airlock and replaced one of the orbital outpost's Control Moment Gyroscopes. Inside the Station, the STS-114 crew conducted joint operations with the Expedition 11 crew. They unloaded fresh supplies from the Shuttle and the Raffaello Multi-Purpose Logistics Module. Before Discovery undocked, the crews filled Raffaello with unneeded items and returned it to the Shuttle payload bay. Discovery launched on July 26 and spent almost 14 days on orbit.

  5. The Discovery Dome: A Tool for Increasing Student Engagement

    NASA Astrophysics Data System (ADS)

    Brevik, Corinne

    2015-04-01

    The Discovery Dome is a portable full-dome theater that plays professionally-created science films. Developed by the Houston Museum of Natural Science and Rice University, this inflatable planetarium offers a state-of-the-art visual learning experience that can address many different fields of science for any grade level. It surrounds students with roaring dinosaurs, fascinating planets, and explosive storms - all immersive, engaging, and realistic. Dickinson State University has chosen to utilize its Discovery Dome to address Earth Science education at two levels. University courses across the science disciplines can use the Discovery Dome as part of their curriculum. The digital shows immerse the students in various topics ranging from astronomy to geology to weather and climate. The dome has proven to be a valuable tool for introducing new material to students as well as for reinforcing concepts previously covered in lectures or laboratory settings. The Discovery Dome also serves as an amazing science public-outreach tool. University students are trained to run the dome, and they travel with it to schools and libraries around the region. During the 2013-14 school year, our Discovery Dome visited over 30 locations. Many of the schools visited are in rural settings which offer students few opportunities to experience state-of-the-art science technology. The school kids are extremely excited when the Discovery Dome visits their community, and they will talk about the experience for many weeks. Traveling with the dome is also very valuable for the university students who get involved in the program. They become very familiar with the science content, and they gain experience working with teachers as well as the general public. They get to share their love of science, and they get to help inspire a new generation of scientists.

  6. A quantum causal discovery algorithm

    NASA Astrophysics Data System (ADS)

    Giarmatzi, Christina; Costa, Fabio

    2018-03-01

    Finding a causal model for a set of classical variables is now a well-established task—but what about the quantum equivalent? Even the notion of a quantum causal model is controversial. Here, we present a causal discovery algorithm for quantum systems. The input to the algorithm is a process matrix describing correlations between quantum events. Its output consists of different levels of information about the underlying causal model. Our algorithm determines whether the process is causally ordered by grouping the events into causally ordered non-signaling sets. It detects if all relevant common causes are included in the process, which we label Markovian, or alternatively if some causal relations are mediated through some external memory. For a Markovian process, it outputs a causal model, namely the causal relations and the corresponding mechanisms, represented as quantum states and channels. Our algorithm opens the route to more general quantum causal discovery methods.

  7. The Discovery of Novel Biomarkers Improves Breast Cancer Intrinsic Subtype Prediction and Reconciles the Labels in the METABRIC Data Set

    PubMed Central

    Milioli, Heloisa Helena; Vimieiro, Renato; Riveros, Carlos; Tishchenko, Inna; Berretta, Regina; Moscato, Pablo

    2015-01-01

    Background: The prediction of breast cancer intrinsic subtypes has been introduced as a valuable strategy to determine patient diagnosis and prognosis, and therapy response. The PAM50 method, based on the expression levels of 50 genes, uses a single sample predictor model to assign subtype labels to samples. Intrinsic errors reported within this assay demonstrate the challenge of identifying and understanding the breast cancer groups. In this study, we aim to: a) identify novel biomarkers for subtype individuation by exploring the competence of a newly proposed method named CM1 score, and b) apply ensemble learning, as opposed to the use of a single classifier, for sample subtype assignment. The overarching objective is to improve class prediction. Methods and Findings: The microarray transcriptome data sets used in this study are: the METABRIC breast cancer data recorded for over 2000 patients, and the public integrated source from ROCK database with 1570 samples. We first computed the CM1 score to identify the probes with highly discriminative patterns of expression across samples of each intrinsic subtype. We further assessed the ability of 42 selected probes to assign correct subtype labels using 24 different classifiers from the Weka software suite. For comparison, the same method was applied on the list of 50 genes from the PAM50 method. Conclusions: The CM1 score portrayed 30 novel biomarkers for predicting breast cancer subtypes, with the confirmation of the role of 12 well-established genes. Intrinsic subtypes assigned using the CM1 list and the ensemble of classifiers are more consistent and homogeneous than the original PAM50 labels. The new subtypes show accurate distributions of current clinical markers ER, PR and HER2, and survival curves in the METABRIC and ROCK data sets. 
Remarkably, the paradoxical attribution of the original labels reinforces the limitations of employing single sample classifiers to predict breast cancer intrinsic subtypes. PMID:26132585
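
    The contrast drawn in this abstract between a single classifier and an ensemble can be illustrated with a minimal majority-vote sketch. The three toy classifiers, their thresholds, and the expression values below are hypothetical and are not taken from the study, which reports using 24 classifiers from the Weka suite.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the most common predicted label and the fraction of classifiers that agree."""
    votes = [clf(sample) for clf in classifiers]
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


# Hypothetical rule-based classifiers; thresholds are invented for illustration only.
def clf_a(s):
    return "Luminal A" if s["ESR1"] > 0.5 else "Basal-like"


def clf_b(s):
    return "Luminal A" if s["ESR1"] > 0.3 and s["ERBB2"] < 0.7 else "HER2-enriched"


def clf_c(s):
    return "Basal-like" if s["ESR1"] < 0.6 else "Luminal A"


# Hypothetical normalized expression values for one sample.
sample = {"ESR1": 0.55, "ERBB2": 0.2}
label, agreement = majority_vote([clf_a, clf_b, clf_c], sample)
print(label, round(agreement, 2))
```

    Reporting the agreement fraction alongside the label is one simple way to flag samples on which the individual classifiers disagree.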

  8. The composition of muds from Columbus Marsh, Nevada

    USGS Publications Warehouse

    Hicks, W.B.

    1915-01-01

    The investigation of the dry lake of Columbus Marsh, in Nevada, which had for its economic motive the discovery of potash, was continued by the United States Geological Survey during the summer of 1913 under supervision of Hoyt S. Gale. The work done included the drilling of a shallow well near the old well 400 and the collection of a set of surface samples of muds from the marsh. This exploration, together with the chemical investigation of the samples thus collected, has furnished further data concerning the character of the mud flat and thrown additional light on the conditions there. The writer was associated with Mr. Gale during his study of this region and the field observations here recorded were made jointly and are results of mutual discussion. The accompanying map (fig. 1) is based on a plane-table survey made by Mr. Gale, and for this and other assistance the writer wishes to express due acknowledgment.

  9. Asymptotics of empirical eigenstructure for high dimensional spiked covariance.

    PubMed

    Wang, Weichen; Fan, Jianqing

    2017-06-01

    We derive the asymptotic distributions of the spiked eigenvalues and eigenvectors under a generalized and unified asymptotic regime, which takes into account the magnitude of spiked eigenvalues, sample size, and dimensionality. This regime allows high dimensionality and diverging eigenvalues and provides new insights into the roles that the leading eigenvalues, sample size, and dimensionality play in principal component analysis. Our results are a natural extension of those in Paul (2007) to a more general setting and solve the rates of convergence problems in Shen et al. (2013). They also reveal the biases of estimating leading eigenvalues and eigenvectors by using principal component analysis, and lead to a new covariance estimator for the approximate factor model, called shrinkage principal orthogonal complement thresholding (S-POET), that corrects the biases. Our results are successfully applied to outstanding problems in estimation of risks of large portfolios and false discovery proportions for dependent test statistics and are illustrated by simulation studies.
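
    The eigenvalue bias mentioned in this abstract can be made concrete with the classical fixed-ratio spiked-model limit, which is the simpler setting that the generalized regime extends. The sketch below assumes unit noise variance and a dimension-to-sample-size ratio p/n tending to a constant gamma; the function name and the example values are illustrative and do not come from the paper.

```python
import math


def limiting_sample_eigenvalue(spike, gamma):
    """Almost-sure limit of the top sample eigenvalue in the classical spiked model.

    spike: population eigenvalue of the spiked direction (noise eigenvalues equal 1).
    gamma: limiting ratio p/n of dimension to sample size.
    """
    if spike > 1 + math.sqrt(gamma):
        # Above the detection threshold the sample eigenvalue is biased upward.
        return spike * (1 + gamma / (spike - 1))
    # Below the threshold it sticks to the upper edge of the bulk spectrum.
    return (1 + math.sqrt(gamma)) ** 2


biased = limiting_sample_eigenvalue(5.0, 0.5)
print(biased)  # 5.625, compared with a population value of 5.0
```

    In this toy setting a population eigenvalue of 5 is overestimated by 12.5%, which is the kind of bias a shrinkage-type correction is meant to remove.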

  10. Asymptotics of empirical eigenstructure for high dimensional spiked covariance

    PubMed Central

    Wang, Weichen

    2017-01-01

    We derive the asymptotic distributions of the spiked eigenvalues and eigenvectors under a generalized and unified asymptotic regime, which takes into account the magnitude of spiked eigenvalues, sample size, and dimensionality. This regime allows high dimensionality and diverging eigenvalues and provides new insights into the roles that the leading eigenvalues, sample size, and dimensionality play in principal component analysis. Our results are a natural extension of those in Paul (2007) to a more general setting and solve the rates of convergence problems in Shen et al. (2013). They also reveal the biases of estimating leading eigenvalues and eigenvectors by using principal component analysis, and lead to a new covariance estimator for the approximate factor model, called shrinkage principal orthogonal complement thresholding (S-POET), that corrects the biases. Our results are successfully applied to outstanding problems in estimation of risks of large portfolios and false discovery proportions for dependent test statistics and are illustrated by simulation studies. PMID:28835726

  11. The status of measurement technologies concerning micrometer and submicrometer space particulate matter capture, recovery, velocity and trajectory

    NASA Technical Reports Server (NTRS)

    Alexander, W. M.; Tanner, William G.; Mcdonald, R. A.; Schaub, G. E.; Stephenson, Stepheni L.; Mcdonnell, J. A. M.; Maag, Carl R.

    1994-01-01

    The return of a pristine sample from a comet would lead to greater understanding of cometary structures, as well as offering insights into exobiology. The paper presented at the Discovery Program Workshop outlined a set of measurements for what was identified as a SOCCER-like interplanetary mission. Several experiments comprised the total instrumentation. This paper presents a summary of CCSR with an overview of three of the four major instruments. Details of the major dust dynamics experiment including trajectory are given in this paper. The instrument proposed here offers the opportunity for the return of cometary dust particles gathered in situ. The capture process has been employed aboard the space shuttle with successful results in returning samples to Earth for laboratory analysis. In addition, the sensors will measure the charge, mass, velocity, and size of cometary dust grains during the encounter. This data will help our understanding of dusty plasmas.

  12. Harry Stottlemeier's Discovery [Revised Edition].

    ERIC Educational Resources Information Center

    Lipman, Matthew

    "Harry Stottlemeier's Discovery" is the student book for the project in philosophical thinking described in SO 008 123-126. It offers a model of dialogue -- both of children with one another and of children with adults. The story is set among a classroom of children who begin to understand the basics of logical reasoning when Harry, who isn't…

  13. A Strange Fish Indeed: The "Discovery" of a Living Fossil

    ERIC Educational Resources Information Center

    Grant, Robert H.

    2005-01-01

    Through a series of fictionalized diary entries based on Marjorie Courtenay-Latimer's own writings, this case recounts the "discovery" in South Africa in 1938 of a fish believed to be extinct for over 70 million years. The case was developed for use in an introductory freshman biology course. In this setting, it could be used as a…

  14. Infrared and Raman Spectroscopy: A Discovery-Based Activity for the General Chemistry Curriculum

    ERIC Educational Resources Information Center

    Borgsmiller, Karen L.; O'Connell, Dylan J.; Klauenberg, Kathryn M.; Wilson, Peter M.; Stromberg, Christopher J.

    2012-01-01

    A discovery-based method is described for incorporating the concepts of IR and Raman spectroscopy into the general chemistry curriculum. Students use three sets of springs to model the properties of single, double, and triple covalent bonds. Then, Gaussian 03W molecular modeling software is used to illustrate the relationship between bond…

  15. Optimizing the discovery organization for innovation.

    PubMed

    Sams-Dodd, Frank

    2005-08-01

    Strategic management is the process of adapting organizational structure and management principles to fit the strategic goal of the business unit. The pharmaceutical industry has generally been expert at optimizing its organizations for drug development, but has rarely implemented different structures for the early discovery process, where the objective is innovation and the transformation of innovation into drug projects. Here, a set of strategic management methods is proposed, covering team composition, organizational structure, management principles and portfolio management, which are designed to increase the level of innovation in the early drug discovery process.

  16. AutoDrug: fully automated macromolecular crystallography workflows for fragment-based drug discovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tsai, Yingssu; Stanford University, 333 Campus Drive, Mudd Building, Stanford, CA 94305-5080; McPhillips, Scott E.

    New software has been developed for automating the experimental and data-processing stages of fragment-based drug discovery at a macromolecular crystallography beamline. A new workflow-automation framework orchestrates beamline-control and data-analysis software while organizing results from multiple samples. AutoDrug is software based upon the scientific workflow paradigm that integrates the Stanford Synchrotron Radiation Lightsource macromolecular crystallography beamlines and third-party processing software to automate the crystallography steps of the fragment-based drug-discovery process. AutoDrug screens a cassette of fragment-soaked crystals, selects crystals for data collection based on screening results and user-specified criteria and determines optimal data-collection strategies. It then collects and processes diffraction data, performs molecular replacement using provided models and detects electron density that is likely to arise from bound fragments. All processes are fully automated, i.e. are performed without user interaction or supervision. Samples can be screened in groups corresponding to particular proteins, crystal forms and/or soaking conditions. A single AutoDrug run is only limited by the capacity of the sample-storage dewar at the beamline: currently 288 samples. AutoDrug was developed in conjunction with RestFlow, a new scientific workflow-automation framework. RestFlow simplifies the design of AutoDrug by managing the flow of data and the organization of results and by orchestrating the execution of computational pipeline steps. It also simplifies the execution and interaction of third-party programs and the beamline-control system. Modeling AutoDrug as a scientific workflow enables multiple variants that meet the requirements of different user groups to be developed and supported. A workflow tailored to mimic the crystallography stages comprising the drug-discovery pipeline of CoCrystal Discovery Inc. has been deployed and successfully demonstrated. This workflow was run once on the same 96 samples that the group had examined manually and the workflow cycled successfully through all of the samples, collected data from the same samples that were selected manually and located the same peaks of unmodeled density in the resulting difference Fourier maps.

  17. Determining Definitions for Comparing Cardinalities

    ERIC Educational Resources Information Center

    Shipman, B. A.

    2012-01-01

    Through a series of six guided classroom discoveries, students create, via targeted questions, a definition for deciding when two sets have the same cardinality. The program begins by developing basic facts about cardinalities of finite sets. Extending two of these facts to infinite sets yields two statements on comparing infinite cardinalities…

  18. Let's get honest about sampling.

    PubMed

    Mobley, David L

    2012-01-01

    Molecular simulations see widespread and increasing use in computation and molecular design, especially within the area of molecular simulations applied to biomolecular binding and interactions, our focus here. However, force field accuracy remains a concern for many practitioners, and it is often not clear what level of accuracy is really needed for payoffs in a discovery setting. Here, I argue that despite limitations of today's force fields, current simulation tools and force fields now provide the potential for real benefits in a variety of applications. However, these same tools also provide irreproducible results which are often poorly interpreted. Continued progress in the field requires more honesty in assessment and care in evaluation of simulation results, especially with respect to convergence.

  19. DiffSplice: the genome-wide detection of differential splicing events with RNA-seq

    PubMed Central

    Hu, Yin; Huang, Yan; Du, Ying; Orellana, Christian F.; Singh, Darshan; Johnson, Amy R.; Monroy, Anaïs; Kuan, Pei-Fen; Hammond, Scott M.; Makowski, Liza; Randell, Scott H.; Chiang, Derek Y.; Hayes, D. Neil; Jones, Corbin; Liu, Yufeng; Prins, Jan F.; Liu, Jinze

    2013-01-01

    The RNA transcriptome varies in response to cellular differentiation as well as environmental factors, and can be characterized by the diversity and abundance of transcript isoforms. Differential transcription analysis, the detection of differences between the transcriptomes of different cells, may improve understanding of cell differentiation and development and enable the identification of biomarkers that classify disease types. The availability of high-throughput short-read RNA sequencing technologies provides in-depth sampling of the transcriptome, making it possible to accurately detect the differences between transcriptomes. In this article, we present a new method for the detection and visualization of differential transcription. Our approach does not depend on transcript or gene annotations. It also circumvents the need for full transcript inference and quantification, which is a challenging problem because of short read lengths, as well as various sampling biases. Instead, our method takes a divide-and-conquer approach to localize the difference between transcriptomes in the form of alternative splicing modules (ASMs), where transcript isoforms diverge. Our approach starts with the identification of ASMs from the splice graph, constructed directly from the exons and introns predicted from RNA-seq read alignments. The abundance of alternative splicing isoforms residing in each ASM is estimated for each sample and is compared across sample groups. A non-parametric statistical test is applied to each ASM to detect significant differential transcription with a controlled false discovery rate. The sensitivity and specificity of the method have been assessed using simulated data sets and compared with other state-of-the-art approaches. 
Experimental validation using qRT-PCR confirmed a selected set of genes that are differentially expressed in a lung differentiation study and a breast cancer data set, demonstrating the utility of the approach applied on experimental biological data sets. The software of DiffSplice is available at http://www.netlab.uky.edu/p/bioinfo/DiffSplice. PMID:23155066
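
    The abstract states that per-ASM test results are filtered at a controlled false discovery rate but does not name the procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up rule, the selection could look like the following; the function name and the p-values are illustrative and are not DiffSplice output.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at most k * alpha / m.
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k = rank
    # Step-up rule: reject every hypothesis ranked at or below k.
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Illustrative p-values, one per hypothetical ASM.
p_values = [0.028, 0.6, 0.005, 0.025, 0.5]
decisions = benjamini_hochberg(p_values, alpha=0.05)
print(decisions)  # [True, False, True, True, False]
```

    Note that 0.025 exceeds its own threshold (2 × 0.05 / 5 = 0.02) but is still rejected, because the third-ranked p-value 0.028 passes its threshold of 0.03; this is what distinguishes the step-up rule from testing each rank in isolation.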

  20. Act-Frequency Signatures of the Big Five.

    PubMed

    Chapman, Benjamin P; Goldberg, Lewis R

    2017-10-01

    The traditional focus of work on personality and behavior has tended toward "major outcomes" such as health or antisocial behavior, or small sets of behaviors observable over short periods in laboratories or in convenience samples. In a community sample, we examined a wide set (400) of mundane, incidental or "every day" behavioral acts, the frequencies of which were reported over the past year. Using an exploratory methodology similar to genomic approaches (relying on the False Discovery Rate) revealed 26 prototypical acts for Intellect, 24 acts for Extraversion, 13 for Emotional Stability, nine for Conscientiousness, and six for Agreeableness. Many links were consistent with general intuition; for instance, low Conscientiousness with work and procrastination. Some of the most robust associations, however, were for acts too specific for a priori hypothesis. For instance, Extraversion was strongly associated with telling dirty jokes, Intellect with "loung[ing] around [the] house without clothes on", and Agreeableness with singing in the shower. Frequency categories for these acts changed with marked non-linearity across Big Five Z-scores. Findings may help ground trait scores in emblematic acts, and enrich understanding of mundane or common behavioral signatures of the Big Five.

  1. Transfer Learning of Classification Rules for Biomarker Discovery and Verification from Molecular Profiling Studies

    PubMed Central

    Ganchev, Philip; Malehorn, David; Bigbee, William L.; Gopalakrishnan, Vanathi

    2013-01-01

    We present a novel framework for integrative biomarker discovery from related but separate data sets created in biomarker profiling studies. The framework takes prior knowledge in the form of interpretable, modular rules, and uses them during the learning of rules on a new data set. The framework consists of two methods of transfer of knowledge from source to target data: transfer of whole rules and transfer of rule structures. We evaluated the methods on three pairs of data sets: one genomic and two proteomic. We used standard measures of classification performance and three novel measures of amount of transfer. Preliminary evaluation shows that whole-rule transfer improves classification performance over using the target data alone, especially when there is more source data than target data. It also improves performance over using the union of the data sets. PMID:21571094

  2. Optimizing methods and dodging pitfalls in microbiome research.

    PubMed

    Kim, Dorothy; Hofstaedter, Casey E; Zhao, Chunyu; Mattei, Lisa; Tanes, Ceylan; Clarke, Erik; Lauder, Abigail; Sherrill-Mix, Scott; Chehoud, Christel; Kelsen, Judith; Conrad, Máire; Collman, Ronald G; Baldassano, Robert; Bushman, Frederic D; Bittinger, Kyle

    2017-05-05

    Research on the human microbiome has yielded numerous insights into health and disease, but also has resulted in a wealth of experimental artifacts. Here, we present suggestions for optimizing experimental design and avoiding known pitfalls, organized in the typical order in which studies are carried out. We first review best practices in experimental design and introduce common confounders such as age, diet, antibiotic use, pet ownership, longitudinal instability, and microbial sharing during cohousing in animal studies. Typically, samples will need to be stored, so we provide data on best practices for several sample types. We then discuss design and analysis of positive and negative controls, which should always be run with experimental samples. We introduce a convenient set of non-biological DNA sequences that can be useful as positive controls for high-volume analysis. Careful analysis of negative and positive controls is particularly important in studies of samples with low microbial biomass, where contamination can comprise most or all of a sample. Lastly, we summarize approaches to enhancing experimental robustness by careful control of multiple comparisons and to comparing discovery and validation cohorts. We hope the experimental tactics summarized here will help researchers in this exciting field advance their studies efficiently while avoiding errors.

  3. OPEN DATA FOR DISCOVERY SCIENCE.

    PubMed

    Payne, Philip R O; Huang, Kun; Shah, Nigam H; Tenenbaum, Jessica

    2017-01-01

    The modern healthcare and life sciences ecosystem is moving towards an increasingly open and data-centric approach to discovery science. This evolving paradigm is predicated on a complex set of information needs related to our collective ability to share, discover, reuse, integrate, and analyze open biological, clinical, and population level data resources of varying composition, granularity, and syntactic or semantic consistency. Such an evolution is further impacted by a concomitant growth in the size of data sets that can and should be employed for both hypothesis discovery and testing. When such open data can be accessed and employed for discovery purposes, a broad spectrum of high impact end-points is made possible. These span the spectrum from identification of de novo biomarker complexes that can inform precision medicine, to the repositioning or repurposing of extant agents for new and cost-effective therapies, to the assessment of population level influences on disease and wellness. Of note, these types of uses of open data can be either primary, wherein open data is the substantive basis for inquiry, or secondary, wherein open data is used to augment or enrich project-specific or proprietary data that is not open in and of itself. This workshop is concerned with the key challenges, opportunities, and methodological best practices whereby open data can be used to drive the advancement of discovery science in all of the aforementioned capacities.

  4. Machine learning of molecular properties: Locality and active learning

    NASA Astrophysics Data System (ADS)

    Gubaev, Konstantin; Podryabinkin, Evgeny V.; Shapeev, Alexander V.

    2018-06-01

    In recent years, machine learning techniques have shown great potential in various problems from a multitude of disciplines, including materials design and drug discovery. The high computational speed on the one hand and the accuracy comparable to that of density functional theory on the other hand make machine learning algorithms efficient for high-throughput screening through chemical and configurational space. However, the machine learning algorithms available in the literature require large training datasets to reach chemical accuracy and also show large errors for the so-called outliers, the out-of-sample molecules not well represented in the training set. In the present paper, we propose a new machine learning algorithm for predicting molecular properties that addresses these two issues: it is based on a local model of interatomic interactions providing high accuracy when trained on relatively small training sets and an active learning algorithm of optimally choosing the training set that significantly reduces the errors for the outliers. We compare our model to the other state-of-the-art algorithms from the literature on the widely used benchmark tests.

  5. MSA Bladder Reference Set Application: Charles Rosser-Hawaii (2014) — EDRN Public Portal

    Cancer.gov

    The goal of this proposal is straightforward. We wish to assay in a discovery set, reference set from EDRN, both PAI-1 and ANG promoters and genes for mutations. Then the results will be confirmed in a test cohort comprised of DNA extracted from fresh frozen tissue (n = 80 BCa patients). DNA from matching buffy coat from these 80 patients will serve as control. Extracted RNA can be assessed for difference in transcription. Furthermore, matched voided urine samples from these 80 patients are available to assess protein levels of PAI-1 and ANG by ELISA in addition to assessing activity of PAI-1 and ANG. At the end, we will link any genetic alteration with changes in RNA, protein and protein activity level as well as clinical features (e.g., age, race, tobacco history, grade, stage and outcomes). This comprehensive study will allow us to state with certainty whether there are mutations in the promoters and genes of PAI-1 and ANG that are functional and thus may lead to the growth advantage that we previously demonstrated in our experiments.

  6. Unraveling signatures of biogeochemical processes and the depositional setting in the molecular composition of pore water DOM across different marine environments

    NASA Astrophysics Data System (ADS)

    Schmidt, Frauke; Koch, Boris P.; Goldhammer, Tobias; Elvert, Marcus; Witt, Matthias; Lin, Yu-Shih; Wendt, Jenny; Zabel, Matthias; Heuer, Verena B.; Hinrichs, Kai-Uwe

    2017-06-01

    Dissolved organic matter (DOM) in marine sediment pore waters derives largely from decomposition of particulate organic matter and its composition is influenced by various biogeochemical and oceanographic processes in yet undetermined ways. Here, we determine the molecular inventory of pore water DOM in marine sediments of contrasting depositional regimes with ultrahigh-resolution mass spectrometry and complementary bulk chemical analyses in order to elucidate the factors that shape DOM composition. Our sample sets from the Mediterranean, Marmara and Black Seas covered different sediment depths, ages and a range of marine environments with different (i) organic matter sources, (ii) balances of organic matter production and preservation, and (iii) geochemical conditions in sediment and water column including anoxic, sulfidic and hypersaline conditions. Pore water DOM had a higher molecular formula richness than overlying water with up to 11,295 vs. 2114 different molecular formulas in the mass range of 299-600 Da and covered a broader range of element ratios (H/C = 0.35-2.19, O/C = 0.03-1.19 vs. H/C = 0.56-2.13, O/C = 0.15-1.14). Formula richness was independent of concentrations of DOC and TOC. Near-surface pore water DOM was more similar to water column DOM than to deep pore water DOM from the same core with respect to formula richness and the molecular composition, suggesting exchange at the sediment-water interface. The DOM composition in the deeper sediments was controlled by organic matter source, selective decomposition of specific DOM fractions and early diagenetic molecule transformations. Compounds in pelagic sediment pore waters were predominantly highly unsaturated and N-bearing formulas, whereas oxygen-rich CHO-formulas and aromatic compounds were more abundant in pore water DOM from terrigenous sediments. 
The increase of S-bearing molecular formulas in the water column and pore waters of the Black Sea and the Mediterranean Discovery Basin was consistent with elevated HS- concentrations reflecting the incorporation of sulfur into biomolecules during early diagenesis. Sulfurization resulted in an increased average molecular mass of DOM and higher formula richness (up to 5899 formulas per sample). In sediments from the methanogenic zone in the Black Sea, the DOM pool was distinctly more reduced than overlying sediments from the sulfate-reducing zone. Bottom and pore water DOM from the Discovery Basin contained the highest abundances of aliphatic compounds in the entire dataset; a large fraction of abundant N-bearing formulas possibly represented peptide and nucleotide formulas suggesting preservation of these molecules in the life inhibiting environment of the Discovery Basin. Our unique data set provides the basis for a comprehensive understanding of the molecular signatures in pore water DOM and the turnover of sedimentary organic matter in marine sediments.

  7. Constructing a Graph Database for Semantic Literature-Based Discovery.

    PubMed

    Hristovski, Dimitar; Kastrin, Andrej; Dinevski, Dejan; Rindflesch, Thomas C

    2015-01-01

    Literature-based discovery (LBD) generates discoveries, or hypotheses, by combining what is already known in the literature. Potential discoveries have the form of relations between biomedical concepts; for example, a drug may be determined to treat a disease other than the one for which it was intended. LBD views the knowledge in a domain as a network: a set of concepts along with the relations between them. As a starting point, we used SemMedDB, a database of semantic relations between biomedical concepts extracted with SemRep from Medline. SemMedDB is distributed as a MySQL relational database, which has some problems when dealing with network data. We transformed and uploaded SemMedDB into the Neo4j graph database, and implemented the basic LBD discovery algorithms with the Cypher query language. We conclude that storing the data needed for semantic LBD is more natural in a graph database. Also, implementing LBD discovery algorithms is conceptually simpler with a graph query language when compared with standard SQL.
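
    The network view of LBD described above can be sketched in a few lines. The example below assumes the common "open discovery" (A-B-C) pattern, in which a start concept A is linked to a candidate C only through intermediate concepts B. It is written in plain Python rather than Cypher, treats relations as directed edges, and uses a hypothetical toy relation set, not SemMedDB content.

```python
from collections import defaultdict


def open_discovery(relations, start):
    """Return {C: [B, ...]} for concepts C reachable from start only via an intermediate B."""
    neighbors = defaultdict(set)
    for subj, _pred, obj in relations:
        neighbors[subj].add(obj)

    direct = neighbors[start]  # concepts already linked to the start concept
    candidates = defaultdict(list)
    for b in sorted(direct):
        for c in sorted(neighbors[b]):
            # Keep C only if it is new: not the start concept and not already directly linked.
            if c != start and c not in direct:
                candidates[c].append(b)
    return dict(candidates)


# Hypothetical (subject, predicate, object) triples for illustration only.
relations = [
    ("fish oil", "REDUCES", "blood viscosity"),
    ("fish oil", "INHIBITS", "platelet aggregation"),
    ("fish oil", "TREATS", "hypertriglyceridemia"),
    ("blood viscosity", "ASSOCIATED_WITH", "Raynaud's disease"),
    ("platelet aggregation", "ASSOCIATED_WITH", "Raynaud's disease"),
]
hypotheses = open_discovery(relations, "fish oil")
print(hypotheses)
```

    Listing the intermediate concepts for each candidate keeps the supporting path visible, which is the part of the result a graph query language expresses most directly as a two-hop pattern.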

  8. ATAQS: A computational software tool for high throughput transition optimization and validation for selected reaction monitoring mass spectrometry

    PubMed Central

    2011-01-01

    Background Since its inception, proteomics has essentially operated in a discovery mode with the goal of identifying and quantifying the maximal number of proteins in a sample. Increasingly, proteomic measurements are also supporting hypothesis-driven studies, in which a predetermined set of proteins is consistently detected and quantified in multiple samples. Selected reaction monitoring (SRM) is a targeted mass spectrometric technique that supports the detection and quantification of specific proteins in complex samples at high sensitivity and reproducibility. Here, we describe ATAQS, an integrated software platform that supports all stages of targeted, SRM-based proteomics experiments including target selection, transition optimization and post acquisition data analysis. This software will significantly facilitate the use of targeted proteomic techniques and contribute to the generation of highly sensitive, reproducible and complete datasets that are particularly critical for the discovery and validation of targets in hypothesis-driven studies in systems biology. Result We introduce a new open source software pipeline, ATAQS (Automated and Targeted Analysis with Quantitative SRM), which consists of a number of modules that collectively support the SRM assay development workflow for targeted proteomic experiments (project management and generation of protein, peptide and transitions and the validation of peptide detection by SRM). ATAQS provides a flexible pipeline for end-users by allowing the workflow to start or end at any point of the pipeline, and for computational biologists, by enabling the easy extension of Java algorithm classes for their own algorithm plug-in or connection via an external web site. This integrated system supports all steps in an SRM-based experiment and provides a user-friendly GUI that can be run by any operating system that allows the installation of the Mozilla Firefox web browser. 
Conclusions Targeted proteomics via SRM is a powerful new technique that enables the reproducible and accurate identification and quantification of sets of proteins of interest. ATAQS is the first open-source software that supports all steps of the targeted proteomics workflow. ATAQS also provides software API (Application Program Interface) documentation that enables the addition of new algorithms to each of the workflow steps. The software, installation guide and sample dataset can be found at http://tools.proteomecenter.org/ATAQS/ATAQS.html PMID:21414234

  9. A two-step hierarchical hypothesis set testing framework, with applications to gene expression data on ordered categories

    PubMed Central

    2014-01-01

    Background In complex large-scale experiments, in addition to simultaneously considering a large number of features, multiple hypotheses are often being tested for each feature. This leads to a problem of multi-dimensional multiple testing. For example, in gene expression studies over ordered categories (such as time-course or dose-response experiments), interest is often in testing differential expression across several categories for each gene. In this paper, we consider a framework for testing multiple sets of hypotheses, which can be applied to a wide range of problems. Results We adopt the concept of the overall false discovery rate (OFDR) for controlling false discoveries on the hypothesis set level. Based on an existing procedure for identifying differentially expressed gene sets, we discuss a general two-step hierarchical hypothesis set testing procedure, which controls the overall false discovery rate under independence across hypothesis sets. In addition, we discuss the concept of the mixed-directional false discovery rate (mdFDR), and extend the general procedure to enable directional decisions for two-sided alternatives. We applied the framework to the case of microarray time-course/dose-response experiments, and proposed three procedures for testing differential expression and making multiple directional decisions for each gene. Simulation studies confirm the control of the OFDR and mdFDR by the proposed procedures under independence and positive correlations across genes. Simulation results also show that two of our new procedures achieve higher power than previous methods. Finally, the proposed methodology is applied to a microarray dose-response study, to identify 17 β-estradiol sensitive genes in breast cancer cells that are induced at low concentrations. Conclusions The framework we discuss provides a platform for multiple testing procedures covering situations involving two (or potentially more) sources of multiplicity. 
The framework is easy to use and adaptable to various practical settings that frequently occur in large-scale experiments. Procedures generated from the framework are shown to maintain control of the OFDR and mdFDR, quantities that are especially relevant in the case of multiple hypothesis set testing. The procedures work well in both simulations and real datasets, and are shown to have better power than existing methods. PMID:24731138

  10. Bioenergy Knowledge Discovery Framework Fact Sheet

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    None

    The Bioenergy Knowledge Discovery Framework (KDF) supports the development of a sustainable bioenergy industry by providing access to a variety of data sets, publications, and collaboration and mapping tools that support bioenergy research, analysis, and decision making. In the KDF, users can search for information, contribute data, and use the tools and map interface to synthesize, analyze, and visualize information in a spatially integrated manner.

  11. The discovery of 9/8-ribbons, β/γ-peptides with curved shapes governed by a combined configuration-conformation code.

    PubMed

    Grison, Claire M; Robin, Sylvie; Aitken, David J

    2015-11-21

    The de novo design of a β/γ-peptidic foldamer motif has led to the discovery of an unprecedented 9/8-ribbon featuring an uninterrupted alternating C9/C8 hydrogen-bonding network. The ribbons adopt partially curved topologies determined synchronistically by the β-residue configuration and the γ-residue conformation sets.

  12. Early Probe and Drug Discovery in Academia: A Minireview.

    PubMed

    Roy, Anuradha

    2018-02-09

    Drug discovery encompasses processes ranging from target selection and validation to the selection of a development candidate. While comprehensive drug discovery work flows are implemented predominantly in the big pharma domain, early discovery focus in academia serves to identify probe molecules that can serve as tools to study targets or pathways. Despite differences in the ultimate goals of the private and academic sectors, the same basic principles define the best practices in early discovery research. A successful early discovery program is built on strong target definition and validation using a diverse set of biochemical and cell-based assays with functional relevance to the biological system being studied. The chemicals identified as hits undergo extensive scaffold optimization and are characterized for their target specificity and off-target effects in in vitro and in animal models. While the active compounds from screening campaigns pass through highly stringent chemical and Absorption, Distribution, Metabolism, and Excretion (ADME) filters for lead identification, the probe discovery involves limited medicinal chemistry optimization. The goal of probe discovery is identification of a compound with sub-µM activity and reasonable selectivity in the context of the target being studied. The compounds identified from probe discovery can also serve as starting scaffolds for lead optimization studies.

  13. System-level multi-target drug discovery from natural products with applications to cardiovascular diseases.

    PubMed

    Zheng, Chunli; Wang, Jinan; Liu, Jianling; Pei, Mengjie; Huang, Chao; Wang, Yonghua

    2014-08-01

    The term systems pharmacology describes a field of study that uses computational and experimental approaches to broaden the view of drug actions rooted in molecular interactions and advance the process of drug discovery. The aim of this work is to highlight the role that systems pharmacology plays in multi-target drug discovery from natural products for cardiovascular diseases (CVDs). Firstly, based on network pharmacology methods, we reconstructed the drug-target and target-target networks to determine the putative protein target set of multi-target drugs for CVD treatment. Secondly, we reintegrated a compound dataset of natural products and then obtained a subset of multi-target compounds by a virtual-screening process. Thirdly, a drug-likeness evaluation was applied to find the ADME-favorable compounds in this subset. Finally, we conducted in vitro experiments to evaluate the reliability of the selected chemicals and targets. We found that four of the five randomly selected natural molecules can effectively act on the target set for CVDs, indicating that our systems-based method is reasonable. This strategy may serve as a new model for multi-target drug discovery of complex diseases.

  14. A Sensitive Assay for Virus Discovery in Respiratory Clinical Samples

    PubMed Central

    de Vries, Michel; Deijs, Martin; Canuti, Marta; van Schaik, Barbera D. C.; Faria, Nuno R.; van de Garde, Martijn D. B.; Jachimowski, Loes C. M.; Jebbink, Maarten F.; Jakobs, Marja; Luyf, Angela C. M.; Coenjaerts, Frank E. J.; Claas, Eric C. J.; Molenkamp, Richard; Koekkoek, Sylvie M.; Lammens, Christine; Leus, Frank; Goossens, Herman; Ieven, Margareta; Baas, Frank; van der Hoek, Lia

    2011-01-01

    In 5–40% of respiratory infections in children, the diagnostics remain negative, suggesting that the patients might be infected with a yet unknown pathogen. Virus discovery cDNA-AFLP (VIDISCA) is a virus discovery method based on recognition of restriction enzyme cleavage sites, ligation of adaptors and subsequent amplification by PCR. However, direct discovery of unknown pathogens in nasopharyngeal swabs is difficult due to the high concentration of ribosomal RNA (rRNA) that acts as a competitor. In the current study we optimized VIDISCA by adjusting the reverse transcription enzymes and decreasing rRNA amplification in the reverse transcription, using hexamer oligonucleotides that do not anneal to rRNA. Residual cDNA synthesis on rRNA templates was further reduced with oligonucleotides that anneal to rRNA but cannot be extended due to 3′-dideoxy-C6-modification. With these modifications >90% reduction of rRNA amplification was established. Further improvement of the VIDISCA sensitivity was obtained by high throughput sequencing (VIDISCA-454). Eighteen nasopharyngeal swabs were analysed, all containing known respiratory viruses. We could identify the proper virus in the majority of samples tested (11/18). The median load in the VIDISCA-454 positive samples was 7.2 E5 viral genome copies/ml (ranging from 1.4 E3–7.7 E6). Our results show that optimization of VIDISCA and subsequent high-throughput-sequencing enhances sensitivity drastically and provides the opportunity to perform virus discovery directly in patient material. PMID:21283679

  15. High-throughput discovery of rare human nucleotide polymorphisms by Ecotilling

    PubMed Central

    Till, Bradley J.; Zerr, Troy; Bowers, Elisabeth; Greene, Elizabeth A.; Comai, Luca; Henikoff, Steven

    2006-01-01

    Human individuals differ from one another at only ∼0.1% of nucleotide positions, but these single nucleotide differences account for most heritable phenotypic variation. Large-scale efforts to discover and genotype human variation have been limited to common polymorphisms. However, these efforts overlook rare nucleotide changes that may contribute to phenotypic diversity and genetic disorders, including cancer. Thus, there is an increasing need for high-throughput methods to robustly detect rare nucleotide differences. Toward this end, we have adapted the mismatch discovery method known as Ecotilling for the discovery of human single nucleotide polymorphisms. To increase throughput and reduce costs, we developed a universal primer strategy and implemented algorithms for automated band detection. Ecotilling was validated by screening 90 human DNA samples for nucleotide changes in 5 gene targets and by comparing results to public resequencing data. To increase throughput for discovery of rare alleles, we pooled samples 8-fold and found Ecotilling to be efficient relative to resequencing, with a false negative rate of 5% and a false discovery rate of 4%. We identified 28 new rare alleles, including some that are predicted to damage protein function. The detection of rare damaging mutations has implications for models of human disease. PMID:16893952

  16. Observed oil and gas field size distributions: A consequence of the discovery process and prices of oil and gas

    USGS Publications Warehouse

    Drew, L.J.; Attanasi, E.D.; Schuenemeyer, J.H.

    1988-01-01

    If observed oil and gas field size distributions are obtained by random sampling, the fitted distributions should approximate that of the parent population of oil and gas fields. However, empirical evidence strongly suggests that larger fields tend to be discovered earlier in the discovery process than they would be by random sampling. Economic factors also can limit the number of small fields that are developed and reported. This paper examines observed size distributions in state and federal waters of offshore Texas. Results of the analysis demonstrate how the shape of the observable size distributions changes with significant hydrocarbon price changes. Comparison of state and federal observed size distributions in the offshore area shows how production cost differences also affect the shape of the observed size distribution. Methods for modifying the discovery rate estimation procedures when economic factors significantly affect the discovery sequence are presented. A primary conclusion of the analysis is that, because hydrocarbon price changes can significantly affect the observed discovery size distribution, one should not be confident about inferring the form and specific parameters of the parent field size distribution from the observed distributions. © 1988 International Association for Mathematical Geology.

  17. Apparently low reproducibility of true differential expression discoveries in microarray studies.

    PubMed

    Zhang, Min; Yao, Chen; Guo, Zheng; Zou, Jinfeng; Zhang, Lin; Xiao, Hui; Wang, Dong; Yang, Da; Gong, Xue; Zhu, Jing; Li, Yanhui; Li, Xia

    2008-09-15

    Differentially expressed gene (DEG) lists detected from different microarray studies for the same disease are often highly inconsistent. Even in technical replicate tests using identical samples, DEG detection still shows very low reproducibility. It is often believed that current small microarray studies will largely introduce false discoveries. Based on a statistical model, we show that even in technical replicate tests using identical samples, it is highly likely that the selected DEG lists will be very inconsistent in the presence of small measurement variations. Therefore, the apparently low reproducibility of DEG detection from current technical replicate tests does not indicate low quality of microarray technology. We also demonstrate that heterogeneous biological variations existing in real cancer data will further reduce the overall reproducibility of DEG detection. Nevertheless, in small subsamples from both simulated and real data, the actual false discovery rate (FDR) for each DEG list tends to be low, suggesting that each separately determined list may comprise mostly true DEGs. Rather than simply counting the overlaps of the discovery lists from different studies for a complex disease, novel metrics are needed for evaluating the reproducibility of discoveries characterized with correlated molecular changes. Supplementary information: Supplementary data are available at Bioinformatics online.

  18. NGSmethDB 2017: enhanced methylomes and differential methylation.

    PubMed

    Lebrón, Ricardo; Gómez-Martín, Cristina; Carpena, Pedro; Bernaola-Galván, Pedro; Barturen, Guillermo; Hackenberg, Michael; Oliver, José L

    2017-01-04

    The 2017 update of NGSmethDB stores whole genome methylomes generated from short-read data sets obtained by whole genome bisulfite sequencing (WGBS) technology. To generate high-quality methylomes, stringent quality controls were integrated with third-party software, and a two-step mapping process was added to exploit the advantages of the new genome assembly models. The samples were all profiled under constant parameter settings, thus enabling comparative downstream analyses. Besides a significant increase in the number of samples, NGSmethDB now includes two additional data-types, which are a valuable resource for the discovery of methylation epigenetic biomarkers: (i) differentially methylated single-cytosines; and (ii) methylation segments (i.e. genome regions of homogeneous methylation). The NGSmethDB back-end is now based on MongoDB, a NoSQL hierarchical database using JSON-formatted documents and dynamic schemas, thus accelerating sample comparative analyses. Besides conventional database dumps, track hubs were implemented, which improved database access, visualization in genome browsers and comparative analyses to third-party annotations. In addition, the database can also be accessed through a RESTful API. Lastly, a Python client and a multiplatform virtual machine allow for program-driven access from the user's desktop. This way, private methylation data can be compared to NGSmethDB without the need to upload them to public servers. Database website: http://bioinfo2.ugr.es/NGSmethDB. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. A prediction model of drug-induced ototoxicity developed by an optimal support vector machine (SVM) method.

    PubMed

    Zhou, Shu; Li, Guo-Bo; Huang, Lu-Yi; Xie, Huan-Zhang; Zhao, Ying-Lan; Chen, Yu-Zong; Li, Lin-Li; Yang, Sheng-Yong

    2014-08-01

    Drug-induced ototoxicity, as a toxic side effect, is an important issue that needs to be considered in drug discovery. Nevertheless, current experimental methods used to evaluate drug-induced ototoxicity are often time-consuming and expensive, indicating that they are not suitable for a large-scale evaluation of drug-induced ototoxicity in the early stage of drug discovery. We thus, in this investigation, established an effective computational prediction model of drug-induced ototoxicity using an optimal support vector machine (SVM) method, GA-CG-SVM. Three GA-CG-SVM models were developed based on three training sets containing agents bearing different risk levels of drug-induced ototoxicity. For comparison, models based on naïve Bayesian (NB) and recursive partitioning (RP) methods were also built on the same training sets. Among all the prediction models, the GA-CG-SVM model II showed the best performance, with prediction accuracies of 85.33% and 83.05% for two independent test sets, respectively. Overall, the good performance of the GA-CG-SVM model II indicates that it could be used for the prediction of drug-induced ototoxicity in the early stage of drug discovery. Copyright © 2014 Elsevier Ltd. All rights reserved.

  20. Towards microfluidic technology-based MALDI-MS platforms for drug discovery: a review.

    PubMed

    Winkle, Richard F; Nagy, Judit M; Cass, Anthony Eg; Sharma, Sanjiv

    2008-11-01

    Microfluidic methods have found applications in various disciplines. It has been predicted that microfluidic technology would be useful in performing routine steps in drug discovery ranging from target identification to lead optimisation, in which the number of compounds evaluated determines the success of combinatorial screening. The sheer size of the parameter space that can be explored often poses an enormous challenge. We set out to find how close we are to the use of integrated matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-MS) microfluidic systems for drug discovery. In this article we review the latest applications of microfluidic technology in the area of MALDI-MS and drug discovery. Our literature survey revealed microfluidic technology-based approaches for various stages of drug discovery; however, they are still in developmental stages. Furthermore, we speculate on how these technologies could be used in the future.

  1. International Search for Life in Ocean Worlds

    NASA Astrophysics Data System (ADS)

    Sherwood, B.

    2015-12-01

    We now know that our solar system contains diverse "ocean worlds." One has abundant surface water and life; another had significant surface water in the distant past and has drawn significant exploration attention; several contain large amounts of water beneath ice shells; and several others evince unexpected, diverse transient or dynamic water-related processes. In this century, humanity will explore these worlds, searching for life beyond Earth and seeking thereby to understand the limits of habitability. Of our ocean worlds, Enceladus presents a unique combination of attributes: large reservoir of subsurface water already known to contain salts, organics, and silica nanoparticles originating from hydrothermal activity; and able to be sampled via a plume predictably expressed into space. These special circumstances immediately tag Enceladus as a key destination for potential missions to search for evidence of non-Earth life, and lead to a range of potential mission concepts: for orbital reconnaissance; in situ and returned-sample analysis of plume and surface-fallback material; and direct sulcus, vent, cavern, and ocean exploration. Each mission type can address a unique set of science questions, and would require a unique set of capabilities, most of which are not yet developed. Both the questions and the capability developments can be sequenced into a programmatic precedence network, the realization of which requires international cooperation. Three factors make this true: exploring remote oceans autonomously will cost a lot; the Outer Space Treaty governs planetary protection; and discovery of non-Earth life is an epochal human imperative. 
Results of current planning will be presented in AGU session 8599: how ocean-world science questions and capability requirements can be parsed into programmatically acceptable mission increments; how one mission proposed into the Discovery program in 2015 would take the next step on this path; the Decadal calendar of decision points and program options that will constrain ocean-world exploration through mid-century; and findings of the COSPAR Planetary Protection Panel colloquium for ocean-world exploration held in September 2015.

  2. Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas.

    PubMed

    Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N

    2013-03-15

    The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. 
Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.

  3. Comparison of Deep Learning With Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets.

    PubMed

    Korotcov, Alexandru; Tkachenko, Valery; Russo, Daniel P; Ekins, Sean

    2017-12-04

    Machine learning methods have been applied to many data sets in pharmaceutical research for several decades. The relative ease and availability of fingerprint type molecular descriptors paired with Bayesian methods resulted in the widespread use of this approach for a diverse array of end points relevant to drug discovery. Deep learning is the latest machine learning algorithm attracting attention for many pharmaceutical applications from docking to virtual screening. Deep learning is based on an artificial neural network with multiple hidden layers and has found considerable traction for many artificial intelligence applications. We have previously suggested the need for a comparison of different machine learning methods with deep learning across an array of varying data sets that is applicable to pharmaceutical research. End points relevant to pharmaceutical research include absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties, as well as activity against pathogens and drug discovery data sets. In this study, we have used data sets for solubility, probe-likeness, hERG, KCNQ1, bubonic plague, Chagas, tuberculosis, and malaria to compare different machine learning methods using FCFP6 fingerprints. These data sets represent whole cell screens, individual proteins, physicochemical properties as well as a data set with a complex end point. Our aim was to assess whether deep learning offered any improvement in testing when assessed using an array of metrics including AUC, F1 score, Cohen's kappa, Matthews correlation coefficient and others. Based on ranked normalized scores for the metrics or data sets, Deep Neural Networks (DNN) ranked higher than SVM, which in turn was ranked higher than all the other machine learning methods. Visualizing these properties for training and test sets using radar type plots indicates when models are inferior or perhaps over trained. 
These results also suggest the need for assessing deep learning further using multiple metrics with much larger scale comparisons, prospective testing as well as assessment of different fingerprints and DNN architectures beyond those used.
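
    The metrics named in this abstract (AUC, F1 score, Cohen's kappa, Matthews correlation coefficient) can be computed for a binary classifier as in the following sketch. It uses the standard textbook definitions in plain Python; the function names and the toy labels are hypothetical and unrelated to the study's data or software.

```python
import math


def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn, tn):
    """Harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(tp, fp, fn, tn):
    """Agreement beyond chance between predicted and true labels."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


def matthews_corrcoef(tp, fp, fn, tn):
    """Correlation between predicted and true binary labels."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Area under the ROC curve as the fraction of correctly ordered
    (positive, negative) score pairs; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with hypothetical labels, hard predictions and scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.1, 0.2, 0.4, 0.6]
cm = confusion(y_true, y_pred)
print(f1_score(*cm), cohen_kappa(*cm), matthews_corrcoef(*cm), auc(y_true, scores))
```

    AUC depends on the ranking of the scores, while the other three depend on a fixed decision threshold, which is one reason a comparison across several metrics can rank models differently.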

  4. cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate

    PubMed Central

    Klambauer, Günter; Schwarzbauer, Karin; Mayr, Andreas; Clevert, Djork-Arné; Mitterecker, Andreas; Bodenhofer, Ulrich; Hochreiter, Sepp

    2012-01-01

    Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose ‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1–FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor. PMID:22302147
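
    As a rough illustration of the mixture-of-Poissons idea in this abstract, the sketch below computes the posterior over integer copy numbers for the read count of one sample at one genomic position. It is a simplification, not the cn.MOPS implementation: the mixture weights and the rate `lam` for copy number 2 are assumed to be given rather than estimated across samples, the expected count for copy number i is taken to be (i/2)*lam, and `eps` is a hypothetical small factor used for copy number 0.

```python
import math


def poisson_logpmf(x, rate):
    """Log of the Poisson probability mass function."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def copy_number_posterior(count, lam, weights, eps=0.05):
    """Posterior probability of each copy number 0..len(weights)-1.

    count   -- observed read count in the segment for one sample
    lam     -- assumed expected read count for the normal copy number 2
    weights -- assumed prior mixture weights (all > 0), one per copy number
    eps     -- assumed factor giving copy number 0 a small non-zero rate
    """
    log_terms = []
    for cn, w in enumerate(weights):
        rate = lam * (cn / 2 if cn > 0 else eps)
        log_terms.append(math.log(w) + poisson_logpmf(count, rate))
    peak = max(log_terms)  # subtract the maximum for numerical stability
    unnorm = [math.exp(t - peak) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical prior favouring the normal copy number 2.
prior = [0.05, 0.10, 0.70, 0.10, 0.05]
for reads in (52, 100, 148):
    post = copy_number_posterior(reads, lam=100.0, weights=prior)
    best = max(range(len(post)), key=post.__getitem__)
    print(reads, best)
```

    In the method described above the weights and rate are fitted jointly across samples, which is what makes the model insensitive to coverage variation along the chromosome; the fixed parameters here only show how a count maps to a copy number once the model is known.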

  5. cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate.

    PubMed

    Klambauer, Günter; Schwarzbauer, Karin; Mayr, Andreas; Clevert, Djork-Arné; Mitterecker, Andreas; Bodenhofer, Ulrich; Hochreiter, Sepp

    2012-05-01

    Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose 'Copy Number estimation by a Mixture Of PoissonS' (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1-FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.

  6. Metabolic Signature of Electrosurgical Liver Dissection

    PubMed Central

    von Schönfels, Witigo; von Kampen, Oliver; Patsenker, Eleonora; Stickel, Felix; Schniewind, Bodo; Hinz, Sebastian; Ahrens, Markus; Balschun, Katharina; Egberts, Jan-Hendrik; Richter, Klaus; Landrock, Andreas; Sipos, Bence; Will, Olga; Huebbe, Patrizia; Schreiber, Stefan; Nothnagel, Michael; Röcken, Christoph; Rimbach, Gerald; Becker, Thomas

    2013-01-01

    Background and Aims: High frequency electrosurgery has a key role in the broadening application of liver surgery. Its molecular signature, i.e. the metabolites evolving from electrocauterization which may inhibit hepatic wound healing, has not been systematically studied. Methods: Human liver samples were thus obtained during surgery before and after electrosurgical dissection and subjected to a two-stage metabolomic screening experiment (discovery sample: N = 18, replication sample: N = 20) using gas chromatography/mass spectrometry. Results: In a set of 208 chemically defined metabolites, electrosurgical dissection led to a distinct metabolic signature resulting in a separation in the first two dimensions of a principal components analysis. Six metabolites including glycolic acid, azelaic acid, 2-n-pentylfuran, dihydroactinidiolide, 2-butenal and n-pentanal were consistently increased after electrosurgery, meeting the discovery (p < 2.0×10⁻⁴) and the replication thresholds (p < 3.5×10⁻³). Azelaic acid, a lipid peroxidation product from the fragmentation of abundant sn-2 linoleoyl residues, was most abundant and increased 8.1-fold after electrosurgical liver dissection (p_replication = 1.6×10⁻⁴). The corresponding phospholipid hexadecyl azelaoyl glycerophosphocholine inhibited wound healing and tissue remodelling in scratch and proliferation assays of hepatic stellate cells and cholangiocytes, and caused apoptosis dose-dependently in vitro, which may explain in part the tissue damage due to electrosurgery. Conclusion: Hepatic electrosurgery generates a metabolic signature with characteristic lipid peroxidation products. Among these, azelaic acid shows a dose-dependent toxicity in liver cells and inhibits wound healing. These observations potentially pave the way for pharmacological intervention prior to liver surgery to modify the metabolic response and prevent postoperative complications. PMID:24058442

  7. Phenolic and microbial-targeted metabolomics to discovering and evaluating wine intake biomarkers in human urine and plasma.

    PubMed

    Urpi-Sarda, Mireia; Boto-Ordóñez, María; Queipo-Ortuño, María Isabel; Tulipani, Sara; Corella, Dolores; Estruch, Ramon; Tinahones, Francisco J; Andres-Lacueva, Cristina

    2015-09-01

    The discovery of biomarkers of intake in nutritional epidemiological studies is essential in establishing an association between dietary intake (considering their bioavailability) and diet-related risk factors for diseases. The aim is to study urine and plasma phenolic and microbial profiles using a targeted metabolomics approach in a wine intervention clinical trial for discovering and evaluating food intake biomarkers. High-risk male volunteers (n = 36) were included in a randomized, crossover intervention clinical trial. After a washout period, subjects received red wine or gin, or dealcoholized red wine over four weeks. Fasting plasma and 24-h urine were collected at baseline and after each intervention period. A targeted metabolomic analysis of 70 host and microbial phenolic metabolites was performed using ultra performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS). Metabolites were subjected to stepwise logistic regression to establish prediction models, and receiver operating characteristic curves were generated to evaluate biomarkers. Prediction models based mainly on gallic acid metabolites obtained sensitivity, specificity and area under the curve (AUC) for the training and validation sets of between 91 and 98% for urine and between 74 and 91% for plasma. Resveratrol, ethylgallate and gallic acid metabolite groups in urine samples were also good predictors of wine intake (AUC>87%). However, lower values for metabolites were obtained in plasma samples. The highest correlations between fasting plasma and urine were obtained for the prediction model score (r = 0.6, P<0.001), followed by gallic acid metabolites (r = 0.5-0.6, P<0.001). This study provides new insights into the discovery of food biomarkers in different biological samples. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  8. Genome-wide Studies of Verbal Declarative Memory in Nondemented Older People: The Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium

    PubMed Central

    Debette, Stéphanie; Ibrahim Verbaas, Carla A.; Bressler, Jan; Schuur, Maaike; Smith, Albert; Bis, Joshua C.; Davies, Gail; Wolf, Christiane; Gudnason, Vilmundur; Chibnik, Lori B.; Yang, Qiong; deStefano, Anita L.; de Quervain, Dominique J.F.; Srikanth, Velandai; Lahti, Jari; Grabe, Hans J.; Smith, Jennifer A.; Priebe, Lutz; Yu, Lei; Karbalai, Nazanin; Hayward, Caroline; Wilson, James F.; Campbell, Harry; Petrovic, Katja; Fornage, Myriam; Chauhan, Ganesh; Yeo, Robin; Boxall, Ruth; Becker, James; Stegle, Oliver; Mather, Karen A.; Chouraki, Vincent; Sun, Qi; Rose, Lynda M.; Resnick, Susan; Oldmeadow, Christopher; Kirin, Mirna; Wright, Alan F.; Jonsdottir, Maria K.; Au, Rhoda; Becker, Albert; Amin, Najaf; Nalls, Mike A.; Turner, Stephen T.; Kardia, Sharon L.R.; Oostra, Ben; Windham, Gwen; Coker, Laura H.; Zhao, Wei; Knopman, David S.; Heiss, Gerardo; Griswold, Michael E.; Gottesman, Rebecca F.; Vitart, Veronique; Hastie, Nicholas D.; Zgaga, Lina; Rudan, Igor; Polasek, Ozren; Holliday, Elizabeth G.; Schofield, Peter; Choi, Seung Hoan; Tanaka, Toshiko; An, Yang; Perry, Rodney T.; Kennedy, Richard E.; Sale, Michèle M.; Wang, Jing; Wadley, Virginia G.; Liewald, David C.; Ridker, Paul M.; Gow, Alan J.; Pattie, Alison; Starr, John M.; Porteous, David; Liu, Xuan; Thomson, Russell; Armstrong, Nicola J.; Eiriksdottir, Gudny; Assareh, Arezoo A.; Kochan, Nicole A.; Widen, Elisabeth; Palotie, Aarno; Hsieh, Yi-Chen; Eriksson, Johan G.; Vogler, Christian; van Swieten, John C.; Shulman, Joshua M.; Beiser, Alexa; Rotter, Jerome; Schmidt, Carsten O.; Hoffmann, Wolfgang; Nöthen, Markus M.; Ferrucci, Luigi; Attia, John; Uitterlinden, Andre G.; Amouyel, Philippe; Dartigues, Jean-François; Amieva, Hélène; Räikkönen, Katri; Garcia, Melissa; Wolf, Philip A.; Hofman, Albert; Longstreth, W.T.; Psaty, Bruce M.; Boerwinkle, Eric; DeJager, Philip L.; Sachdev, Perminder S.; Schmidt, Reinhold; Breteler, Monique M.B.; Teumer, Alexander; Lopez, Oscar L.; Cichon, Sven; Chasman, Daniel I.; 
Grodstein, Francine; Müller-Myhsok, Bertram; Tzourio, Christophe; Papassotiropoulos, Andreas; Bennett, David A.; Ikram, Arfan M.; Deary, Ian J.; van Duijn, Cornelia M.; Launer, Lenore; Fitzpatrick, Annette L.; Seshadri, Sudha; Mosley, Thomas H.

    2015-01-01

    BACKGROUND: Memory performance in older persons can reflect genetic influences on cognitive function and dementing processes. We aimed to identify genetic contributions to verbal declarative memory in a community setting. METHODS: We conducted genome-wide association studies for paragraph or word list delayed recall in 19 cohorts from the Cohorts for Heart and Aging Research in Genomic Epidemiology consortium, comprising 29,076 dementia- and stroke-free individuals of European descent, aged ≥45 years. Replication of suggestive associations (p < 5 × 10⁻⁶) was sought in 10,617 participants of European descent, 3811 African-Americans, and 1561 young adults. RESULTS: rs4420638, near APOE, was associated with poorer delayed recall performance in discovery (p = 5.57 × 10⁻¹⁰) and replication cohorts (p = 5.65 × 10⁻⁸). This association was stronger for paragraph than word list delayed recall and in the oldest persons. Two associations with specific tests, in subsets of the total sample, reached genome-wide significance in combined analyses of discovery and replication (rs11074779 [HS3ST4], p = 3.11 × 10⁻⁸, and rs6813517 [SPOCK3], p = 2.58 × 10⁻⁸) near genes involved in immune response. A genetic score combining 58 independent suggestive memory risk variants was associated with increasing Alzheimer disease pathology in 725 autopsy samples. Association of memory risk loci with gene expression in 138 human hippocampus samples showed cis-associations with WDR48 and CLDN5, both related to ubiquitin metabolism. CONCLUSIONS: This largest study to date exploring the genetics of memory function in ~40,000 older individuals revealed genome-wide associations and suggested an involvement of immune and ubiquitin pathways. PMID:25648963

  9. Future Mission Proposal Opportunities: Discovery, New Frontiers, and Project Prometheus

    NASA Technical Reports Server (NTRS)

    Niebur, S. M.; Morgan, T. H.; Niebur, C. S.

    2003-01-01

    The NASA Office of Space Science is expanding opportunities to propose missions to comets, asteroids, and other solar system targets. The Discovery Program continues to be popular, with two sample return missions, Stardust and Genesis, currently in operation. The New Frontiers Program, a new proposal opportunity modeled on the successful Discovery Program, begins this year with the release of its first Announcement of Opportunity. Project Prometheus, a program to develop nuclear electric power and propulsion technology intended to enable a new class of high-power, high-capability investigations, is a third opportunity to propose solar system exploration. All three classes of mission include a commitment to provide data to the Planetary Data System, any samples to the NASA Curatorial Facility at Johnson Space Center, and programs for education and public outreach.

  10. Experimental Design in Clinical 'Omics Biomarker Discovery.

    PubMed

    Forshed, Jenny

    2017-11-03

    This tutorial highlights some issues in the experimental design of clinical 'omics biomarker discovery, how to avoid bias and get as true quantities as possible from biochemical analyses, and how to select samples to improve the chance of answering the clinical question at issue. This includes the importance of defining clinical aim and end point, knowing the variability in the results, randomization of samples, sample size, statistical power, and how to avoid confounding factors by including clinical data in the sample selection, that is, how to avoid unpleasant surprises at the point of statistical analysis. The aim of this Tutorial is to help translational clinical and preclinical biomarker candidate research and to improve the validity and potential of future biomarker candidate findings.

  11. Metabolomic Profile of ARDS by Nuclear Magnetic Resonance Spectroscopy in Patients with H1N1 Influenza Virus Pneumonia.

    PubMed

    Izquierdo-Garcia, Jose L; Nin, Nicolas; Jimenez-Clemente, Jorge; Horcajada, Juan P; Arenas-Miras, Maria Del Mar; Gea, Joaquim; Esteban, Andres; Ruiz-Cabello, Jesus; Lorente, Jose A

    2017-12-29

    The integrated analysis of changes in the metabolic profile could be critical for the discovery of biomarkers of lung injury, and also for generating new pathophysiological hypotheses and designing novel therapeutic targets for the acute respiratory distress syndrome (ARDS). This study aimed at developing a Nuclear Magnetic Resonance (NMR)-based approach for the identification of the metabolomic profile of ARDS in patients with H1N1 influenza virus pneumonia. Serum samples from 30 patients (derivation set) diagnosed with H1N1 influenza virus pneumonia were analysed by unsupervised Principal Component Analysis (PCA) to identify metabolic differences between patients with and without ARDS by NMR-spectroscopy. A predictive model of partial least squares discriminant analysis (PLS-DA) was developed for the identification of ARDS. PLS-DA was trained with the derivation set and tested in another set of samples from 26 patients also diagnosed with H1N1 influenza virus pneumonia (validation set). Decreased serum glucose, alanine, glutamine, methylhistidine and fatty acids concentrations, and elevated serum phenylalanine and methylguanidine concentrations, discriminated patients with ARDS versus patients without ARDS. The PLS-DA model successfully identified the presence of ARDS in the validation set with a success rate of 92% (sensitivity 100% and specificity 91%). The classification functions showed a good correlation with the Sequential Organ Failure Assessment (SOFA) score (R = 0.74, p < 0.0001) and the PaO2/FiO2 ratio (R = 0.41, p = 0.03). The serum metabolomic profile is sensitive and specific to identify ARDS in patients with H1N1 influenza A pneumonia. Future studies are needed to determine the role of NMR-spectroscopy as a biomarker of ARDS.
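
    As a reading aid for the success rate, sensitivity and specificity quoted in this abstract, the following minimal Python sketch shows how the three figures follow from a confusion matrix. The abstract does not report the underlying counts, so the counts in the usage note are purely hypothetical and the function name `classification_metrics` is an assumption made for illustration.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: patients with the condition classified correctly/incorrectly.
    tn/fp: patients without the condition classified correctly/incorrectly.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / positives          # true positive rate
    specificity = tn / negatives          # true negative rate
    accuracy = (tp + tn) / (positives + negatives)
    return sensitivity, specificity, accuracy


# Hypothetical counts for a 26-sample validation set (not taken from the study).
sens, spec, acc = classification_metrics(tp=4, fn=0, tn=20, fp=2)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

    With these hypothetical counts the three values round to 100%, 91% and 92%. More than one split of 26 samples gives the same rounded percentages, so the counts should not be read as the study's data.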

  12. Synthesis of (±)-amathaspiramide F and discovery of an unusual stereocontrolling element for the [2,3]-Stevens rearrangement.

    PubMed

    Soheili, Arash; Tambar, Uttam K

    2013-10-04

    A formal total synthesis of (±)-amathaspiramide F through a tandem palladium-catalyzed allylic amination/[2,3]-Stevens rearrangement is reported. The unexpected diastereoselectivity of the [2,3]-Stevens rearrangement was controlled by the substitution patterns of an aromatic ring. This discovery represents a new stereocontrolling element for [2,3]-sigmatropic rearrangements in complex molecular settings.

  13. Multiplexing of ChIP-Seq Samples in an Optimized Experimental Condition Has Minimal Impact on Peak Detection.

    PubMed

    Kacmarczyk, Thadeous J; Bourque, Caitlin; Zhang, Xihui; Jiang, Yanwen; Houvras, Yariv; Alonso, Alicia; Betel, Doron

    2015-01-01

    Multiplexing samples in sequencing experiments is a common approach to maximize information yield while minimizing cost. In most cases the number of samples that are multiplexed is determined by financial consideration or experimental convenience, with limited understanding of the effects on the experimental results. Here we set out to examine the impact of multiplexing ChIP-seq experiments on the ability to identify a specific epigenetic modification. We performed peak detection analyses to determine the effects of multiplexing. These include false discovery rates, size, position and statistical significance of peak detection, and changes in gene annotation. We found that, for histone marker H3K4me3, one can multiplex up to 8 samples (7 IP + 1 input) at ~21 million single-end reads each and still detect over 90% of all peaks found when using a full lane per sample (~181 million reads). Furthermore, there are no variations introduced by indexing or lane batch effects and importantly there is no significant reduction in the number of genes with neighboring H3K4me3 peaks. We conclude that, for a well characterized antibody and, therefore, model IP condition, multiplexing 8 samples per lane is sufficient to capture most of the biological signal.
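
    The read-depth figures in this abstract can be checked with simple arithmetic. The sketch below assumes an even split of one lane across all multiplexed samples, which the abstract does not state, and the helper name `reads_per_sample` is hypothetical.

```python
def reads_per_sample(lane_reads, n_samples):
    """Expected reads per sample if a lane is shared evenly (an assumption)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return lane_reads / n_samples


# ~181 million reads per full lane, 8 multiplexed samples (7 IP + 1 input).
depth = reads_per_sample(181_000_000, 8)
print(f"{depth / 1e6:.1f} million reads per sample")
```

    An even split gives about 22.6 million reads per sample, close to the ~21 million reported. The abstract does not explain the difference.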

  14. Multiplexing of ChIP-Seq Samples in an Optimized Experimental Condition Has Minimal Impact on Peak Detection

    PubMed Central

    Kacmarczyk, Thadeous J.; Bourque, Caitlin; Zhang, Xihui; Jiang, Yanwen; Houvras, Yariv; Alonso, Alicia; Betel, Doron

    2015-01-01

    Multiplexing samples in sequencing experiments is a common approach to maximize information yield while minimizing cost. In most cases the number of samples that are multiplexed is determined by financial consideration or experimental convenience, with limited understanding of the effects on the experimental results. Here we set out to examine the impact of multiplexing ChIP-seq experiments on the ability to identify a specific epigenetic modification. We performed peak detection analyses to determine the effects of multiplexing. These include false discovery rates, size, position and statistical significance of peak detection, and changes in gene annotation. We found that, for histone marker H3K4me3, one can multiplex up to 8 samples (7 IP + 1 input) at ~21 million single-end reads each and still detect over 90% of all peaks found when using a full lane per sample (~181 million reads). Furthermore, there are no variations introduced by indexing or lane batch effects and importantly there is no significant reduction in the number of genes with neighboring H3K4me3 peaks. We conclude that, for a well characterized antibody and, therefore, model IP condition, multiplexing 8 samples per lane is sufficient to capture most of the biological signal. PMID:26066343

  15. How Will We React to the Discovery of Extraterrestrial Life?

    PubMed

    Kwon, Jung Yul; Bercovici, Hannah L; Cunningham, Katja; Varnum, Michael E W

    2017-01-01

    How will humanity react to the discovery of extraterrestrial life? Speculation on this topic abounds, but empirical research is practically non-existent. We report the results of three empirical studies assessing psychological reactions to the discovery of extraterrestrial life using the Linguistic Inquiry and Word Count (LIWC) text analysis software. We examined language use in media coverage of past discovery announcements of this nature, with a focus on extraterrestrial microbial life (Pilot Study). A large online sample (N = 501) was asked to write about their own and humanity's reaction to a hypothetical announcement of such a discovery (Study 1), and an independent, large online sample (N = 256) was asked to read and respond to a newspaper story about the claim that fossilized extraterrestrial microbial life had been found in a meteorite of Martian origin (Study 2). Across these studies, we found that reactions were significantly more positive than negative, and more reward vs. risk oriented. A mini-meta-analysis revealed large overall effect sizes (positive vs. negative affect language: g = 0.98; reward vs. risk language: g = 0.81). We also found that people's forecasts of their own reactions showed a greater positivity bias than their forecasts of humanity's reactions (Study 1), and that responses to reading an actual announcement of the discovery of extraterrestrial microbial life showed a greater positivity bias than responses to reading an actual announcement of the creation of man-made synthetic life (Study 2). Taken together, this work suggests that our reactions to a future confirmed discovery of microbial extraterrestrial life are likely to be fairly positive.

  16. How Will We React to the Discovery of Extraterrestrial Life?

    PubMed Central

    Kwon, Jung Yul; Bercovici, Hannah L.; Cunningham, Katja; Varnum, Michael E. W.

    2018-01-01

    How will humanity react to the discovery of extraterrestrial life? Speculation on this topic abounds, but empirical research is practically non-existent. We report the results of three empirical studies assessing psychological reactions to the discovery of extraterrestrial life using the Linguistic Inquiry and Word Count (LIWC) text analysis software. We examined language use in media coverage of past discovery announcements of this nature, with a focus on extraterrestrial microbial life (Pilot Study). A large online sample (N = 501) was asked to write about their own and humanity’s reaction to a hypothetical announcement of such a discovery (Study 1), and an independent, large online sample (N = 256) was asked to read and respond to a newspaper story about the claim that fossilized extraterrestrial microbial life had been found in a meteorite of Martian origin (Study 2). Across these studies, we found that reactions were significantly more positive than negative, and more reward vs. risk oriented. A mini-meta-analysis revealed large overall effect sizes (positive vs. negative affect language: g = 0.98; reward vs. risk language: g = 0.81). We also found that people’s forecasts of their own reactions showed a greater positivity bias than their forecasts of humanity’s reactions (Study 1), and that responses to reading an actual announcement of the discovery of extraterrestrial microbial life showed a greater positivity bias than responses to reading an actual announcement of the creation of man-made synthetic life (Study 2). Taken together, this work suggests that our reactions to a future confirmed discovery of microbial extraterrestrial life are likely to be fairly positive. PMID:29367849

  17. IndeCut evaluates performance of network motif discovery algorithms.

    PubMed

    Ansariola, Mitra; Megraw, Molly; Koslicki, David

    2018-05-01

    Genomic networks represent a complex map of molecular interactions which are descriptive of the biological processes occurring in living cells. Identifying the small over-represented circuitry patterns in these networks helps generate hypotheses about the functional basis of such complex processes. Network motif discovery is a systematic way of achieving this goal. However, a reliable network motif discovery outcome requires generating random background networks which are the result of a uniform and independent graph sampling method. To date, there has been no method to numerically evaluate whether any network motif discovery algorithm performs as intended on realistically sized datasets; thus it was not possible to assess the validity of resulting network motifs. In this work, we present IndeCut, the first method to date that characterizes network motif finding algorithm performance in terms of uniform sampling on realistically sized networks. We demonstrate that it is critical to use IndeCut prior to running any network motif finder for two reasons. First, IndeCut indicates the number of samples needed for a tool to produce an outcome that is both reproducible and accurate. Second, IndeCut allows users to choose the tool that generates samples in the most independent fashion for their network of interest among many available options. The open source software package is available at https://github.com/megrawlab/IndeCut. Contact: megrawm@science.oregonstate.edu or david.koslicki@math.oregonstate.edu. Supplementary data are available at Bioinformatics online.

  18. The metagenomic approach and causality in virology

    PubMed Central

    Castrignano, Silvana Beres; Nagasse-Sugahara, Teresa Keico

    2015-01-01

    Nowadays, the metagenomic approach has been a very important tool in the discovery of new viruses in environmental and biological samples. Here we discuss how these discoveries may help to elucidate the etiology of diseases and the criteria necessary to establish a causal association between a virus and a disease. PMID:25902566

  19. Using dried blood spot sampling to improve data quality and reduce animal use in mouse pharmacokinetic studies.

    PubMed

    Wickremsinhe, Enaksha R; Perkins, Everett J

    2015-03-01

    Traditional pharmacokinetic analysis in nonclinical studies is based on the concentration of a test compound in plasma and requires approximately 100 to 200 μL blood collected per time point. However, the total blood volume of mice limits the number of samples that can be collected from an individual animal, often to a single collection per mouse, thus necessitating dosing multiple mice to generate a pharmacokinetic profile in a sparse-sampling design. Compared with traditional methods, dried blood spot (DBS) analysis requires smaller volumes of blood (15 to 20 μL), thus supporting serial blood sampling and the generation of a complete pharmacokinetic profile from a single mouse. Here we compare plasma-derived data with DBS-derived data, explain how to adopt DBS sampling to support discovery mouse studies, and describe how to generate pharmacokinetic and pharmacodynamic data from a single mouse. Executing novel study designs that use DBS enhances the ability to identify and streamline better drug candidates during drug discovery. Implementing DBS sampling can reduce the number of mice needed in a drug discovery program. In addition, the simplicity of DBS sampling and the smaller numbers of mice needed translate to decreased study costs. Overall, DBS sampling is consistent with 3Rs principles by achieving reductions in the number of animals used, decreased restraint-associated stress, improved data quality, direct comparison of interanimal variability, and the generation of multiple endpoints from a single study.

  20. Using Dried Blood Spot Sampling to Improve Data Quality and Reduce Animal Use in Mouse Pharmacokinetic Studies

    PubMed Central

    Wickremsinhe, Enaksha R; Perkins, Everett J

    2015-01-01

    Traditional pharmacokinetic analysis in nonclinical studies is based on the concentration of a test compound in plasma and requires approximately 100 to 200 µL blood collected per time point. However, the total blood volume of mice limits the number of samples that can be collected from an individual animal—often to a single collection per mouse—thus necessitating dosing multiple mice to generate a pharmacokinetic profile in a sparse-sampling design. Compared with traditional methods, dried blood spot (DBS) analysis requires smaller volumes of blood (15 to 20 µL), thus supporting serial blood sampling and the generation of a complete pharmacokinetic profile from a single mouse. Here we compare plasma-derived data with DBS-derived data, explain how to adopt DBS sampling to support discovery mouse studies, and describe how to generate pharmacokinetic and pharmacodynamic data from a single mouse. Executing novel study designs that use DBS enhances the ability to identify and streamline better drug candidates during drug discovery. Implementing DBS sampling can reduce the number of mice needed in a drug discovery program. In addition, the simplicity of DBS sampling and the smaller numbers of mice needed translate to decreased study costs. Overall, DBS sampling is consistent with 3Rs principles by achieving reductions in the number of animals used, decreased restraint-associated stress, improved data quality, direct comparison of interanimal variability, and the generation of multiple endpoints from a single study. PMID:25836959

  1. Fine Mapping on Chromosome 13q32–34 and Brain Expression Analysis Implicates MYO16 in Schizophrenia

    PubMed Central

    Rodriguez-Murillo, Laura; Xu, Bin; Roos, J Louw; Abecasis, Gonçalo R; Gogos, Joseph A; Karayiorgou, Maria

    2014-01-01

    We previously reported linkage of schizophrenia and schizoaffective disorder to 13q32–34 in the European descent Afrikaner population from South Africa. The nature of genetic variation underlying linkage peaks in psychiatric disorders remains largely unknown and both rare and common variants may be contributing. Here, we examine the contribution of common variants located under the 13q32–34 linkage region. We used densely spaced SNPs to fine map the linkage peak region using both a discovery sample of 415 families and a meta-analysis incorporating two additional replication family samples. In a second phase of the study, we use one family-based data set with 237 families and independent case–control data sets for fine mapping of the common variant association signal using HapMap SNPs. We report a significant association with a genetic variant (rs9583277) within the gene encoding for the myosin heavy-chain Myr 8 (MYO16), which has been implicated in neuronal phosphoinositide 3-kinase signaling. Follow-up analysis of HapMap variation within MYO16 in a second set of Afrikaner families and additional case–control data sets of European descent highlighted a region across introns 2–6 as the most likely region to harbor common MYO16 risk variants. Expression analysis revealed a significant increase in the level of MYO16 expression in the brains of schizophrenia patients. Our results suggest that common variation within MYO16 may contribute to the genetic liability to schizophrenia. PMID:24141571

  2. BCL-2: Long and winding path from discovery to therapeutic target

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schenk, Robyn L.; Strasser, Andreas; Department of Medical Biology, University of Melbourne, Parkville, Melbourne, Victoria 3010

    In 1988, the BCL-2 protein was found to promote cancer by limiting cell death rather than enhancing proliferation. This discovery set the wheels in motion for an almost 30-year journey involving many international research teams that has recently culminated in the approval of a drug, ABT-199/venetoclax/Venclexta, that targets this protein in the treatment of cancer. This review will describe the long and winding path from the discovery of this protein and understanding the fundamental process of apoptosis that BCL-2 and its numerous homologues control, through to its exploitation as a drug target that is set to have significant benefit for cancer patients. Highlights: • BCL-2 proteins control the intrinsic or mitochondrial pathway of apoptosis. • Defective apoptosis is a hallmark of cancer. • BH3-mimetics inhibit pro-survival BCL-2 proteins to induce cancer cell death. • ABT-199/venetoclax is approved for treatment of chronic lymphocytic leukaemia.

  3. CRIMALDDI: a co-ordinated, rational, and integrated effort to set logical priorities in anti-malarial drug discovery initiatives

    PubMed Central

    2010-01-01

    Despite increasing efforts and support for anti-malarial drug R&D, globally anti-malarial drug discovery and development remains largely uncoordinated and fragmented. The current window of opportunity for large scale funding of R&D into malaria is likely to narrow in the coming decade due to a contraction in available resources caused by the current economic difficulties and new priorities (e.g. climate change). It is, therefore, essential that stakeholders are given well-articulated action plans and priorities to guide judgments on where resources can be best targeted. The CRIMALDDI Consortium (a European Union funded initiative) has been set up to develop, through a process of stakeholder and expert consultations, such priorities and recommendations to address them. It is hoped that the recommendations will help to guide the priorities of the European anti-malarial research as well as the wider global discovery agenda in the coming decade. PMID:20626844

  4. CUAHSI Data Services: Tools and Cyberinfrastructure for Water Data Discovery, Research and Collaboration

    NASA Astrophysics Data System (ADS)

    Seul, M.; Brazil, L.; Castronova, A. M.

    2017-12-01

    Enabling research surrounding interdisciplinary topics often requires a combination of finding, managing, and analyzing large data sets and models from multiple sources. This challenge has led the National Science Foundation to make strategic investments in developing community data tools and cyberinfrastructure that focus on water data, as it is a central need for many of these research topics. CUAHSI (The Consortium of Universities for the Advancement of Hydrologic Science, Inc.) is a non-profit organization funded by the National Science Foundation to aid students, researchers, and educators in using and managing data and models to support research and education in the water sciences. This presentation will focus on open-source CUAHSI-supported tools that enable enhanced data discovery online using advanced searching capabilities and computational analysis run in virtual environments pre-designed for educators and scientists so they can focus their efforts on data analysis rather than IT set-up.

  5. Systems analysis of multiple regulator perturbations allows discovery of virulence factors in Salmonella

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoon, Hyunjin; Ansong, Charles; McDermott, Jason E.

    Background: Systemic bacterial infections are highly regulated and complex processes that are orchestrated by numerous virulence factors. Genes that are coordinately controlled by the set of regulators required for systemic infection are potentially required for pathogenicity. Results: In this study we present a systems biology approach in which sample-matched multi-omic measurements of fourteen virulence-essential regulator mutants were coupled with computational network analysis to efficiently identify Salmonella virulence factors. Immunoblot experiments verified network-predicted virulence factors and a subset was determined to be secreted into the host cytoplasm, suggesting that they are virulence factors directly interacting with host cellular components. Two of these, SrfN and PagK2, were required for full mouse virulence and were shown to be translocated independent of either of the type III secretion systems in Salmonella or the type III injectisome-related flagellar mechanism. Conclusions: Integrating multi-omic datasets from Salmonella mutants lacking virulence regulators not only identified novel virulence factors but also defined a new class of translocated effectors involved in pathogenesis. The success of this strategy at discovery of known and novel virulence factors suggests that the approach may have applicability for other bacterial pathogens.

  6. Rethinking 'academic' drug discovery: the Manchester Institute perspective.

    PubMed

    Jordan, Allan M; Waddell, Ian D; Ogilvie, Donald J

    2015-05-01

    The contraction in research within pharma has seen a renaissance in drug discovery within the academic setting. Often, groups grow organically from academic research laboratories, exploiting a particular area of novel biology or new technology. However, increasingly, new groups driven by industrial staff are emerging with demonstrable expertise in the delivery of medicines. As part of a strategic review by Cancer Research UK (CR-UK), the drug discovery team at the Manchester Institute was established to translate novel research from the Manchester cancer research community into drug discovery programmes. From a standing start, we have taken innovative approaches to solve key issues faced by similar groups, such as hit finding and target identification. Herein, we share our lessons learnt and successful strategies. Copyright © 2014 Elsevier Ltd. All rights reserved.

  7. False Discovery Control in Large-Scale Spatial Multiple Testing

    PubMed Central

    Sun, Wenguang; Reich, Brian J.; Cai, T. Tony; Guindani, Michele; Schwartzman, Armin

    2014-01-01

    Summary: This article develops a unified theoretical and computational framework for false discovery control in multiple testing of spatial signals. We consider both point-wise and cluster-wise spatial analyses, and derive oracle procedures which optimally control the false discovery rate, false discovery exceedance and false cluster rate, respectively. A data-driven finite approximation strategy is developed to mimic the oracle procedures on a continuous spatial domain. Our multiple testing procedures are asymptotically valid and can be effectively implemented using Bayesian computational algorithms for analysis of large spatial data sets. Numerical results show that the proposed procedures lead to more accurate error control and better power performance than conventional methods. We demonstrate our methods for analyzing the time trends in tropospheric ozone in the eastern US. PMID:25642138
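
    For readers unfamiliar with false discovery rate control, the sketch below implements the Benjamini-Hochberg step-up procedure, a widely used non-spatial baseline. It is not the oracle or data-driven procedure developed in this article, and the abstract does not say which conventional methods were used for comparison. The function name and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, one per input p-value, marking which
    hypotheses are rejected at FDR level alpha.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Illustrative p-values only; not data from the article.
print(benjamini_hochberg([0.039, 0.001, 0.205, 0.008], alpha=0.05))
```

    The step-up rule rejects all hypotheses up to the largest qualifying rank, even if some smaller-ranked p-values miss their own threshold. This is why the loop records the last rank that passes instead of stopping at the first failure.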

  8. A digital atlas of hydrocarbon accumulations within and adjacent to the National Petroleum Reserve - Alaska (NPRA)

    USGS Publications Warehouse

    Kumar, Naresh; Bird, Kenneth J.; Nelson, Philip H.; Grow, John A.; Evans, Kevin R.

    2002-01-01

    The United States Geological Survey (USGS) has initiated a project to reassess the hydrocarbon potential of the NPRA. Although exploration for hydrocarbons in the NPRA was initiated in 1944, it has taken fifty years for the first commercial discovery to be made. That discovery, the Alpine field (projected recoverable reserves of 430 million barrels), was made in 1994 along the eastern boundary of the NPRA. This field produces from a formation heretofore considered to be mostly a source rock. The Alpine discovery made such a reassessment necessary. As part of this assessment, we have compiled stratigraphic, structural, petrophysical, and seismic data related to nineteen accumulations within and near the NPRA. The goal is to provide basic documentation and a set of analog accumulations for the new assessment. The first two displays of this atlas consist of a location map and a stratigraphic column showing the stratigraphic settings for the primary reservoir and source rocks for these accumulations. The third display is a table listing each accumulation and providing the hydrocarbon fluid type, reservoir, operator, status, and discovery well and date for each. Compilation of basic information for each individual accumulation follows these displays. A typical compilation includes a structure-contour map on or near the reservoir horizon, a log display of the discovery well with reservoir characteristics along with figures for recoverable volumes, and one or two seismic lines across or near the accumulation.

  9. Emerging Concepts and Methodologies in Cancer Biomarker Discovery.

    PubMed

    Lu, Meixia; Zhang, Jinxiang; Zhang, Lanjing

    2017-01-01

    Cancer biomarker discovery is a critical part of cancer prevention and treatment. Despite decades of effort, only a small number of cancer biomarkers have been identified for and validated in clinical settings. Conceptual and methodological breakthroughs may help accelerate the discovery of additional cancer biomarkers, particularly their use for diagnostics. In this review, we have attempted to review the emerging concepts in cancer biomarker discovery, including real-world evidence, open access data, and data paucity in rare or uncommon cancers. We have also summarized the recent methodological progress in cancer biomarker discovery, such as high-throughput sequencing, liquid biopsy, big data, artificial intelligence (AI), and deep learning and neural networks. Much attention has been given to the methodological details and comparison of the methodologies. Notably, these concepts and methodologies interact with each other and will likely lead to synergistic effects when carefully combined. Newer, more innovative concepts and methodologies are emerging as the current emerging ones become mainstream and widely applied in the field. Some future challenges are also discussed. This review contributes to the development of future theoretical frameworks and technologies in cancer biomarker discovery and will contribute to the discovery of more useful cancer biomarkers.

  10. An integrative model for in-silico clinical-genomics discovery science.

    PubMed

    Lussier, Yves A; Sarkar, Indra Nell; Cantor, Michael

    2002-01-01

    Human Genome discovery research has set the pace for Post-Genomic Discovery Research. While post-genomic fields focused at the molecular level are intensively pursued, little effort is being deployed in the later stages of molecular medicine discovery research, such as clinical-genomics. The objective of this study is to demonstrate the relevance and significance of integrating mainstream clinical informatics decision support systems with current bioinformatics genomic discovery science. This paper is a feasibility study of an original model that enables novel "in-silico" clinical-genomic discovery science. This model is designed to mediate queries among clinical and genomic knowledge bases with relevant bioinformatic analytic tools (e.g. gene clustering). Briefly, trait-disease-gene relationships were successfully illustrated using QMR, OMIM, SNOMED-RT, GeneCluster and TreeView. The analyses were visualized as two-dimensional dendrograms of clinical observations clustered around genes. To our knowledge, this is the first study using knowledge bases of clinical decision support systems for genomic discovery. Although this study is a proof of principle, it provides a framework for the development of clinical decision-support-system driven, high-throughput clinical-genomic technologies which could potentially unveil significant high-level functions of genes.

  11. A curated compendium of monocyte transcriptome datasets of relevance to human monocyte immunobiology research

    PubMed Central

    Rinchai, Darawan; Boughorbel, Sabri; Presnell, Scott; Quinn, Charlie; Chaussabel, Damien

    2016-01-01

    Systems-scale profiling approaches have become widely used in translational research settings. The resulting accumulation of large-scale datasets in public repositories represents a critical opportunity to promote insight and foster knowledge discovery. However, resources that can serve as an interface between biomedical researchers and such vast and heterogeneous dataset collections are needed in order to fulfill this potential. Recently, we have developed an interactive data browsing and visualization web application, the Gene Expression Browser (GXB). This tool can be used to overlay deep molecular phenotyping data with rich contextual information about analytes, samples and studies along with ancillary clinical or immunological profiling data. In this note, we describe a curated compendium of 93 public datasets generated in the context of human monocyte immunological studies, representing a total of 4,516 transcriptome profiles. Datasets were uploaded to an instance of GXB along with study description and sample annotations. Study samples were arranged in different groups. Ranked gene lists were generated based on relevant group comparisons. This resource is publicly available online at http://monocyte.gxbsidra.org/dm3/landing.gsp. PMID:27158452

  12. Predicting Presynaptic and Postsynaptic Neurotoxins by Developing Feature Selection Technique

    PubMed Central

    Yang, Yunchun; Zhang, Chunmei; Chen, Rong; Huang, Po

    2017-01-01

    Presynaptic and postsynaptic neurotoxins are proteins which act at the presynaptic and postsynaptic membrane. Correctly predicting presynaptic and postsynaptic neurotoxins will provide important clues for drug-target discovery and drug design. In this study, we developed a theoretical method to discriminate presynaptic neurotoxins from postsynaptic neurotoxins. A strict and objective benchmark dataset was constructed to train and test our proposed model. The dipeptide composition was used to formulate neurotoxin samples. Analysis of variance (ANOVA) was used to find the optimal feature set, that is, the one producing the maximum accuracy. In the jackknife cross-validation test, an overall accuracy of 94.9% was achieved. We believe that the proposed model will provide important information for the study of neurotoxins. PMID:28303250
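
    The two ingredients named in this abstract, dipeptide composition and ANOVA-based feature ranking, can be sketched as follows. This is a minimal illustration only: the function names, the 20-letter amino-acid alphabet ordering, and the handling of non-standard residues are assumptions, not details taken from the paper.

```python
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400 dipeptide frequencies of a protein sequence.

    Overlapping residue pairs are counted; pairs containing a
    non-standard residue are skipped (an illustrative choice).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def anova_f(groups):
    """One-way ANOVA F statistic for one feature measured in several groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

    Ranking the 400 features by `anova_f` and keeping the top-scoring ones is one common way to use such a score; how many features the authors retained is not stated in the abstract.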

  13. Crystal Structure Predictions Using Adaptive Genetic Algorithm and Motif Search methods

    NASA Astrophysics Data System (ADS)

    Ho, K. M.; Wang, C. Z.; Zhao, X.; Wu, S.; Lyu, X.; Zhu, Z.; Nguyen, M. C.; Umemoto, K.; Wentzcovitch, R. M. M.

    2017-12-01

    Material informatics is a new initiative which has attracted a lot of attention in recent scientific research. The basic strategy is to construct comprehensive data sets and use machine learning to solve a wide variety of problems in material design and discovery. In pursuit of this goal, a key element is the quality and completeness of the databases used. Recent advances in crystal structure prediction algorithms have made computational prediction a complementary and more efficient approach to exploring the structure/phase space of materials. In this talk, we discuss the importance of the structural motifs and motif-networks in crystal structure predictions. Correspondingly, powerful methods are developed to improve the sampling of the low-energy structure landscape.

  14. Development and confirmation of potential gene classifiers of human clear cell renal cell carcinoma using next-generation RNA sequencing.

    PubMed

    Eikrem, Oystein S; Strauss, Philipp; Beisland, Christian; Scherer, Andreas; Landolt, Lea; Flatberg, Arnar; Leh, Sabine; Beisvag, Vidar; Skogstrand, Trude; Hjelle, Karin; Shresta, Anjana; Marti, Hans-Peter

    2016-12-01

    A previous study by this group demonstrated the feasibility of RNA sequencing (RNAseq) technology for capturing disease biology of clear cell renal cell carcinoma (ccRCC), and presented initial results for carbonic anhydrase-9 (CA9) and tumor necrosis factor-α-induced protein-6 (TNFAIP6) as possible biomarkers of ccRCC (discovery set) [Eikrem et al. PLoS One 2016;11:e0149743]. To confirm these results, the previous study is expanded, and RNAseq data from additional matched ccRCC and normal renal biopsies are analyzed (confirmation set). Two core biopsies from patients (n = 12) undergoing partial or full nephrectomy were obtained with a 16 g needle. RNA sequencing libraries were generated with the Illumina TruSeq® Access library preparation protocol. Comparative analysis was done using linear modeling (voom/Limma; R Bioconductor). The formalin-fixed and paraffin-embedded discovery and confirmation data yielded 8957 and 11,047 detected transcripts, respectively. The two data sets shared 1193 differentially expressed genes. The average expression and the log2-fold changes of differentially expressed transcripts in both data sets correlated, with R² = .95 and R² = .94, respectively. Among transcripts with the highest fold changes were CA9, neuronal pentraxin-2 and uromodulin. Epithelial-mesenchymal transition was highlighted by differential expression of, for example, transforming growth factor-β1 and delta-like ligand-4. The diagnostic accuracy of CA9 was 100% and 93.9% when using the discovery set as the training set and the confirmation data as the test set, and vice versa, respectively. These data further support TNFAIP6 as a novel biomarker of ccRCC. TNFAIP6 had combined accuracy of 98.5% in the two data sets. This study provides confirmatory data on the potential use of CA9 and TNFAIP6 as biomarkers of ccRCC. Thus, next-generation sequencing expands the clinical application of tissue analyses.

  15. Human metabolic profiles are stably controlled by genetic and environmental variation

    PubMed Central

    Nicholson, George; Rantalainen, Mattias; Maher, Anthony D; Li, Jia V; Malmodin, Daniel; Ahmadi, Kourosh R; Faber, Johan H; Hallgrímsdóttir, Ingileif B; Barrett, Amy; Toft, Henrik; Krestyaninova, Maria; Viksna, Juris; Neogi, Sudeshna Guha; Dumas, Marc-Emmanuel; Sarkans, Ugis; The MolPAGE Consortium; Silverman, Bernard W; Donnelly, Peter; Nicholson, Jeremy K; Allen, Maxine; Zondervan, Krina T; Lindon, John C; Spector, Tim D; McCarthy, Mark I; Holmes, Elaine; Baunsgaard, Dorrit; Holmes, Chris C

    2011-01-01

    1H Nuclear Magnetic Resonance spectroscopy (1H NMR) is increasingly used to measure metabolite concentrations in sets of biological samples for top-down systems biology and molecular epidemiology. For such purposes, knowledge of the sources of human variation in metabolite concentrations is valuable, but currently sparse. We conducted and analysed a study to create such a resource. In our unique design, identical and non-identical twin pairs donated plasma and urine samples longitudinally. We acquired 1H NMR spectra on the samples, and statistically decomposed variation in metabolite concentration into familial (genetic and common-environmental), individual-environmental, and longitudinally unstable components. We estimate that stable variation, comprising familial and individual-environmental factors, accounts on average for 60% (plasma) and 47% (urine) of biological variation in 1H NMR-detectable metabolite concentrations. Clinically predictive metabolic variation is likely nested within this stable component, so our results have implications for the effective design of biomarker-discovery studies. We provide a power-calculation method which reveals that sample sizes of a few thousand should offer sufficient statistical precision to detect 1H NMR-based biomarkers quantifying predisposition to disease. PMID:21878913

  16. Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate.

    PubMed

    Buschmann, Tilo; Zhang, Rong; Brash, Douglas E; Bystrykh, Leonid V

    2014-08-07

    DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives. For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads as well as suggest possible improvements. In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof of concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired end strategy. Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples. 
Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on the false discovery rate statistics that allows a proper trade-off between sensitivity and precision to be chosen.
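
    The idea of accepting a read as barcoded only when its minimal distance to a barcode stays below a threshold, and then scoring the calls by precision and sensitivity, can be sketched as follows. This is a simplified illustration, not the adapted FDR method of the paper: the use of Hamming distance over sliding windows, the function names, and the toy sequences are all assumptions, and the threshold is passed in rather than derived from an FDR estimate.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(read, barcode):
    """Smallest Hamming distance between the barcode and any window of the read.

    Scanning all windows reflects that the barcode position is not known
    in advance. Reads shorter than the barcode count as maximally distant.
    """
    k = len(barcode)
    if len(read) < k:
        return k
    return min(hamming(read[i:i + k], barcode) for i in range(len(read) - k + 1))


def assign_barcode(read, barcodes, max_distance):
    """Return the closest barcode, or None if it is farther than max_distance."""
    distance, barcode = min((min_distance(read, b), b) for b in barcodes)
    return barcode if distance <= max_distance else None


def precision_and_sensitivity(calls, truth):
    """Score calls against known labels; None means 'unbarcoded'."""
    pairs = list(zip(calls, truth))
    tp = sum(1 for c, t in pairs if c is not None and c == t)
    fp = sum(1 for c, t in pairs if c is not None and c != t)
    fn = sum(1 for c, t in pairs if t is not None and c != t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity
```

    Raising `max_distance` trades precision for sensitivity, which is the trade-off the abstract says the FDR statistics are used to set.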

  17. The influence of discovery learning model application to the higher order thinking skills student of Srijaya Negara Senior High School Palembang on the animal kingdom subject matter

    NASA Astrophysics Data System (ADS)

    Riandari, F.; Susanti, R.; Suratmi

    2018-05-01

    This study aimed to determine the influence of applying the discovery learning model on the higher order thinking skills of tenth grade students at Srijaya Negara Senior High School Palembang on the animal kingdom subject matter. The research method was pre-experimental with a one-group pretest-posttest design. The research was conducted at Srijaya Negara Senior High School Palembang in the 2016/2017 academic year. The sample was the tenth grade students of natural science class 2, selected by purposive sampling. Data were collected by (1) a written test, consisting of a pretest to determine initial ability and a posttest to determine the students' higher order thinking skills after learning with the discovery learning model, and (2) a questionnaire sheet, intended to investigate the students' responses during the learning process using the discovery learning model. The t-test result indicated a significant increase in the students' higher order thinking skills. Thus, it can be concluded that applying the discovery learning model had a significant effect on, and increased, the higher order thinking skills of students of Srijaya Negara Senior High School Palembang on the animal kingdom subject matter.

  18. Differential Plasma Glycoproteome of p19 Skin Cancer Mouse Model Using the Corra Label-Free LC-MS Proteomics Platform.

    PubMed

    Letarte, Simon; Brusniak, Mi-Youn; Campbell, David; Eddes, James; Kemp, Christopher J; Lau, Hollis; Mueller, Lukas; Schmidt, Alexander; Shannon, Paul; Kelly-Spratt, Karen S; Vitek, Olga; Zhang, Hui; Aebersold, Ruedi; Watts, Julian D

    2008-12-01

    A proof-of-concept demonstration of the use of label-free quantitative glycoproteomics for biomarker discovery workflow is presented here, using a mouse model for skin cancer as an example. Blood plasma was collected from 10 control mice, and 10 mice having a mutation in the p19(ARF) gene, conferring on them a high propensity to develop skin cancer after carcinogen exposure. We enriched for N-glycosylated plasma proteins, ultimately generating deglycosylated forms of the modified tryptic peptides for liquid chromatography mass spectrometry (LC-MS) analyses. LC-MS runs for each sample were then performed with a view to identifying proteins that were differentially abundant between the two mouse populations. We then used a recently developed computational framework, Corra, to perform peak picking and alignment, and to compute the statistical significance of any observed changes in individual peptide abundances. Once determined, the most discriminating peptide features were then fragmented and identified by tandem mass spectrometry with the use of inclusion lists. We next assessed the identified proteins to see if there were sets of proteins indicative of specific biological processes that correlate with the presence of disease, and specifically cancer, according to their functional annotations. As expected for such sick animals, many of the proteins identified were related to host immune response. However, a significant number of proteins also directly associated with processes linked to cancer development, including proteins related to the cell cycle, localisation, transport, and cell death. Additional analysis of the same samples in profiling mode, and in triplicate, confirmed that replicate MS analysis of the same plasma sample generated less variation than that observed between plasma samples from different individuals, demonstrating that the reproducibility of the LC-MS platform was sufficient for this application. 
These results thus show that an LC-MS-based workflow can be a useful tool for the generation of candidate proteins of interest as part of a disease biomarker discovery effort.

  19. Differential Plasma Glycoproteome of p19ARF Skin Cancer Mouse Model Using the Corra Label-Free LC-MS Proteomics Platform

    PubMed Central

    Letarte, Simon; Brusniak, Mi-Youn; Campbell, David; Eddes, James; Kemp, Christopher J.; Lau, Hollis; Mueller, Lukas; Schmidt, Alexander; Shannon, Paul; Kelly-Spratt, Karen S.; Vitek, Olga; Zhang, Hui; Aebersold, Ruedi; Watts, Julian D.

    2010-01-01

    A proof-of-concept demonstration of the use of label-free quantitative glycoproteomics for biomarker discovery workflow is presented here, using a mouse model for skin cancer as an example. Blood plasma was collected from 10 control mice, and 10 mice having a mutation in the p19ARF gene, conferring on them a high propensity to develop skin cancer after carcinogen exposure. We enriched for N-glycosylated plasma proteins, ultimately generating deglycosylated forms of the modified tryptic peptides for liquid chromatography mass spectrometry (LC-MS) analyses. LC-MS runs for each sample were then performed with a view to identifying proteins that were differentially abundant between the two mouse populations. We then used a recently developed computational framework, Corra, to perform peak picking and alignment, and to compute the statistical significance of any observed changes in individual peptide abundances. Once determined, the most discriminating peptide features were then fragmented and identified by tandem mass spectrometry with the use of inclusion lists. We next assessed the identified proteins to see if there were sets of proteins indicative of specific biological processes that correlate with the presence of disease, and specifically cancer, according to their functional annotations. As expected for such sick animals, many of the proteins identified were related to host immune response. However, a significant number of proteins also directly associated with processes linked to cancer development, including proteins related to the cell cycle, localisation, transport, and cell death. Additional analysis of the same samples in profiling mode, and in triplicate, confirmed that replicate MS analysis of the same plasma sample generated less variation than that observed between plasma samples from different individuals, demonstrating that the reproducibility of the LC-MS platform was sufficient for this application. 
These results thus show that an LC-MS-based workflow can be a useful tool for the generation of candidate proteins of interest as part of a disease biomarker discovery effort. PMID:20157627

  20. Use of eQTL Analysis for the Discovery of Target Genes Identified by GWAS

    DTIC Science & Technology

    2013-04-01

    the biologic pathways affected by these inherited factors, and ultimately to identify targets for disease prediction, risk stratification and...quality using an Agilent chip technology. Cases having a RIN number of 7.0 or greater were considered good quality. Once completed, the optimum set of...AD_________________ Award Number: W81XWH-11-1-0261 TITLE: Use of eQTL Analysis for the Discovery of

  1. Avioserpens in the Western Grebe (Aechmophorus occidentalis): A new Host and Geographic Record for a Dracunculoid Nematode and Implications of Migration and Climate Change.

    PubMed

    Latas, Patricia J; Stockdale Walden, Heather D; Bates, Lisa; Marshall, Summer; Rohr, Tammy; Whitehead, Lou Rae

    2016-01-01

    We report a new host and geographic range for the dracunculoid nematode (Avioserpens sp.) in a Western Grebe (Aechmophorus occidentalis) from southern Arizona, US. This discovery underscores the importance of parasite discovery and identification in the wildlife rehabilitation setting. Climate change and weather events affect the migratory spread of unusual parasites.

  2. Cancer and genetics: what we need to know now.

    PubMed

    Ruccione, K

    1999-07-01

    Profound changes brought about by discoveries in molecular biology may enable us in the future to treat cancer without causing late effects or to prevent cancer altogether. Even before that happens, the age of molecular medicine has arrived. Molecular biology is the study of biological processes at the level of the molecule. A major aspect of molecular biology is molecular genetics--the science that deals with DNA and RNA. Most of the progress in molecular biology has been made in the second half of the 20th century. Each discovery or technological innovation has built on previous discoveries and paved the way for the next, culminating in the current effort to map, sequence, and understand the functions of the entire human genome. In the past 20 years, many pieces of the cancer puzzle have been found, showing us how the normal cellular control mechanisms go awry to cause cancer and setting the stage for genetic testing and disease treatment. These new discoveries bring both promise and peril. To provide comprehensive care for survivors of childhood cancer and care in other settings as well, health care providers must now be familiar with the concepts and language of molecular biology, understand its applications to cancer care, and be fully informed about its implications for clinical practice, research, and education.

  3. Culture-independent discovery of natural products from soil metagenomes.

    PubMed

    Katz, Micah; Hover, Bradley M; Brady, Sean F

    2016-03-01

    Bacterial natural products have proven to be invaluable starting points in the development of many currently used therapeutic agents. Unfortunately, traditional culture-based methods for natural product discovery have been deemphasized by pharmaceutical companies due in large part to high rediscovery rates. Culture-independent, or "metagenomic," methods, which rely on the heterologous expression of DNA extracted directly from environmental samples (eDNA), have the potential to provide access to metabolites encoded by a large fraction of the earth's microbial biosynthetic diversity. As soil is both ubiquitous and rich in bacterial diversity, it is an appealing starting point for culture-independent natural product discovery efforts. This review provides an overview of the history of soil metagenome-driven natural product discovery studies and elaborates on the recent development of new tools for sequence-based, high-throughput profiling of environmental samples used in discovering novel natural product biosynthetic gene clusters. We conclude with several examples of these new tools being employed to facilitate the recovery of novel secondary metabolite encoding gene clusters from soil metagenomes and the subsequent heterologous expression of these clusters to produce bioactive small molecules.

  4. Net present value approaches for drug discovery.

    PubMed

    Svennebring, Andreas M; Wikberg, Jarl Es

    2013-12-01

    Three dedicated approaches to the calculation of the risk-adjusted net present value (rNPV) in drug discovery projects under different assumptions are suggested. The probability of finding a candidate drug suitable for clinical development and the time to the initiation of clinical development are assumed to be flexible, in contrast to previously used models. The rNPV of the post-discovery cash flows is calculated as the probability-weighted average of the rNPV at each potential time of initiation of clinical development. Practical considerations on how to set probability rates, in particular during the initiation and termination of a project, are discussed.
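
    The probability-weighted average described in this abstract can be sketched as follows. This is an illustrative reading, not the authors' model: the function name, the annual discounting convention, and the assumption that the post-discovery rNPV has the same value at whichever year clinical development starts are all hypothetical simplifications.

```python
def rnpv_post_discovery(start_probabilities, value_at_start, discount_rate):
    """Probability-weighted present value of the post-discovery cash flows.

    start_probabilities: maps a possible start year t of clinical
        development to the probability that a candidate is found then.
        The probabilities may sum to less than 1; the remainder is the
        chance that the project never yields a candidate.
    value_at_start: rNPV of the post-discovery cash flows, valued at the
        moment clinical development starts (assumed equal for every t).
    discount_rate: annual discount rate, e.g. 0.1 for 10%.
    """
    return sum(
        probability * value_at_start / (1 + discount_rate) ** year
        for year, probability in start_probabilities.items()
    )


# Hypothetical numbers: 20% chance of starting in year 1, 30% in year 2.
example = rnpv_post_discovery({1: 0.2, 2: 0.3}, value_at_start=100.0, discount_rate=0.1)
```

    Discovery-phase costs would be subtracted separately; they are left out here because the abstract does not describe how they are treated.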

  5. Development of an objective gene expression panel as an alternative to self-reported symptom scores in human influenza challenge trials.

    PubMed

    Muller, Julius; Parizotto, Eneida; Antrobus, Richard; Francis, James; Bunce, Campbell; Stranks, Amanda; Nichols, Marshall; McClain, Micah; Hill, Adrian V S; Ramasamy, Adaikalavan; Gilbert, Sarah C

    2017-06-08

    Influenza challenge trials are important for vaccine efficacy testing. Currently, disease severity is determined by self-reported scores to a list of symptoms which can be highly subjective. A more objective measure would allow for improved data analysis. Twenty-one volunteers participated in an influenza challenge trial. We calculated the daily sum of scores (DSS) for a list of 16 influenza symptoms. Whole blood collected at baseline and 24, 48, 72 and 96 h post challenge was profiled on Illumina HT12v4 microarrays. Changes in gene expression most strongly correlated with DSS were selected to train a Random Forest model and tested on two independent test sets consisting of 41 individuals profiled on a different microarray platform and 33 volunteers assayed by qRT-PCR. 1456 probes are significantly associated with DSS at 1% false discovery rate. We selected 19 genes with the largest fold change to train a random forest model. We observed good concordance between predicted and actual scores in the first test set (r = 0.57; RMSE = -16.1%) with the greatest agreement achieved on samples collected approximately 72 h post challenge. Therefore, we assayed samples collected at baseline and 72 h post challenge in the second test set by qRT-PCR and observed good concordance (r = 0.81; RMSE = -36.1%). We developed a 19-gene qRT-PCR panel to predict DSS, validated on two independent datasets. A transcriptomics based panel could provide a more objective measure of symptom scoring in future influenza challenge studies. Trial registration Samples were obtained from a clinical trial with the ClinicalTrials.gov Identifier: NCT02014870, first registered on December 5, 2013.
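
    Selecting probes "at 1% false discovery rate" is commonly done with the Benjamini-Hochberg step-up procedure. The abstract does not say which FDR procedure was used, so the sketch below is an assumption chosen for illustration, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k (1-based, p-values sorted
    ascending) with p_(k) <= k / m * fdr, then reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected
```

    Applied to one p-value per probe, the `True` entries would correspond to the probes reported as significantly associated with DSS.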

  6. Microarray-based gene expression profiling in patients with cryopyrin-associated periodic syndromes defines a disease-related signature and IL-1-responsive transcripts.

    PubMed

    Balow, James E; Ryan, John G; Chae, Jae Jin; Booty, Matthew G; Bulua, Ariel; Stone, Deborah; Sun, Hong-Wei; Greene, James; Barham, Beverly; Goldbach-Mansky, Raphaela; Kastner, Daniel L; Aksentijevich, Ivona

    2013-06-01

    To analyse gene expression patterns and to define a specific gene expression signature in patients with the severe end of the spectrum of cryopyrin-associated periodic syndromes (CAPS). The molecular consequences of interleukin 1 inhibition were examined by comparing gene expression patterns in 16 CAPS patients before and after treatment with anakinra. We collected peripheral blood mononuclear cells from 22 CAPS patients with active disease and from 14 healthy children. Transcripts that passed stringent filtering criteria (p values≤false discovery rate 1%) were considered as differentially expressed genes (DEG). A set of DEG was validated by quantitative reverse transcription PCR and functional studies with primary cells from CAPS patients and healthy controls. We used 17 CAPS and 66 non-CAPS patient samples to create a set of gene expression models that differentiates CAPS patients from controls and from patients with other autoinflammatory conditions. Many DEG include transcripts related to the regulation of innate and adaptive immune responses, oxidative stress, cell death, cell adhesion and motility. A set of gene expression-based models comprising the CAPS-specific gene expression signature correctly classified all 17 samples from an independent dataset. This classifier also correctly identified 15 of 16 post-anakinra CAPS samples despite the fact that these CAPS patients were in clinical remission. We identified a gene expression signature that clearly distinguished CAPS patients from controls. A number of DEG were in common with other systemic inflammatory diseases such as systemic onset juvenile idiopathic arthritis. The CAPS-specific gene expression classifiers also suggest incomplete suppression of inflammation at low doses of anakinra.

  7. Microarray-based gene expression profiling in patients with cryopyrin-associated periodic syndromes defines a disease-related signature and IL-1-responsive transcripts

    PubMed Central

    Balow, James E; Ryan, John G; Chae, Jae Jin; Booty, Matthew G; Bulua, Ariel; Stone, Deborah; Sun, Hong-Wei; Greene, James; Barham, Beverly; Goldbach-Mansky, Raphaela; Kastner, Daniel L; Aksentijevich, Ivona

    2014-01-01

    Objective To analyse gene expression patterns and to define a specific gene expression signature in patients with the severe end of the spectrum of cryopyrin-associated periodic syndromes (CAPS). The molecular consequences of interleukin 1 inhibition were examined by comparing gene expression patterns in 16 CAPS patients before and after treatment with anakinra. Methods We collected peripheral blood mononuclear cells from 22 CAPS patients with active disease and from 14 healthy children. Transcripts that passed stringent filtering criteria (p values ≤ false discovery rate 1%) were considered as differentially expressed genes (DEG). A set of DEG was validated by quantitative reverse transcription PCR and functional studies with primary cells from CAPS patients and healthy controls. We used 17 CAPS and 66 non-CAPS patient samples to create a set of gene expression models that differentiates CAPS patients from controls and from patients with other autoinflammatory conditions. Results Many DEG include transcripts related to the regulation of innate and adaptive immune responses, oxidative stress, cell death, cell adhesion and motility. A set of gene expression-based models comprising the CAPS-specific gene expression signature correctly classified all 17 samples from an independent dataset. This classifier also correctly identified 15 of 16 post-anakinra CAPS samples despite the fact that these CAPS patients were in clinical remission. Conclusions We identified a gene expression signature that clearly distinguished CAPS patients from controls. A number of DEG were in common with other systemic inflammatory diseases such as systemic onset juvenile idiopathic arthritis. The CAPS-specific gene expression classifiers also suggest incomplete suppression of inflammation at low doses of anakinra. PMID:23223423

  8. A Description of the Clinical Proteomic Tumor Analysis Consortium (CPTAC) Common Data Analysis Pipeline.

    PubMed

    Rudnick, Paul A; Markey, Sanford P; Roth, Jeri; Mirokhin, Yuri; Yan, Xinjian; Tchekhovskoi, Dmitrii V; Edwards, Nathan J; Thangudu, Ratna R; Ketchum, Karen A; Kinsinger, Christopher R; Mesri, Mehdi; Rodriguez, Henry; Stein, Stephen E

    2016-03-04

    The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has produced large proteomics data sets from the mass spectrometric interrogation of tumor samples previously analyzed by The Cancer Genome Atlas (TCGA) program. The availability of the genomic and proteomic data is enabling proteogenomic study for both reference (i.e., contained in major sequence databases) and nonreference markers of cancer. The CPTAC laboratories have focused on colon, breast, and ovarian tissues in the first round of analyses; spectra from these data sets were produced from 2D liquid chromatography-tandem mass spectrometry analyses and represent deep coverage. To reduce the variability introduced by disparate data analysis platforms (e.g., software packages, versions, parameters, sequence databases, etc.), the CPTAC Common Data Analysis Platform (CDAP) was created. The CDAP produces both peptide-spectrum-match (PSM) reports and gene-level reports. The pipeline processes raw mass spectrometry data according to the following: (1) peak-picking and quantitative data extraction, (2) database searching, (3) gene-based protein parsimony, and (4) false-discovery rate-based filtering. The pipeline also produces localization scores for the phosphopeptide enrichment studies using the PhosphoRS program. Quantitative information for each of the data sets is specific to the sample processing, with PSM and protein reports containing the spectrum-level or gene-level ("rolled-up") precursor peak areas and spectral counts for label-free or reporter ion log-ratios for 4plex iTRAQ. The reports are available in simple tab-delimited formats and, for the PSM-reports, in mzIdentML. The goal of the CDAP is to provide standard, uniform reports for all of the CPTAC data to enable comparisons between different samples and cancer types as well as across the major omics fields.

  9. Examining the Missing Completely at Random Mechanism in Incomplete Data Sets: A Multiple Testing Approach

    ERIC Educational Resources Information Center

    Raykov, Tenko; Lichtenberg, Peter A.; Paulson, Daniel

    2012-01-01

    A multiple testing procedure for examining implications of the missing completely at random (MCAR) mechanism in incomplete data sets is discussed. The approach uses the false discovery rate concept and is concerned with testing group differences on a set of variables. The method can be used for ascertaining violations of MCAR and disproving this…
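    The record above says only that the approach "uses the false discovery rate concept"; it does not name a specific procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up rule (an assumption, not necessarily the authors' method), a set of per-variable p-values could be screened as follows. The function name and the default level `q` are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    while controlling the false discovery rate at level q.

    Assumption: the Benjamini-Hochberg step-up rule stands in for the
    unspecified FDR procedure mentioned in the abstract.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from group-difference tests on ten variables.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(p, q=0.05)
```

    Under this rule, any rejected variable would count as evidence against MCAR, since MCAR implies no group differences on the tested variables.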

  10. Integrating genome-wide association study and expression quantitative trait loci data identifies multiple genes and gene set associated with neuroticism.

    PubMed

    Fan, Qianrui; Wang, Wenyu; Hao, Jingcan; He, Awen; Wen, Yan; Guo, Xiong; Wu, Cuiyan; Ning, Yujie; Wang, Xi; Wang, Sen; Zhang, Feng

    2017-08-01

    Neuroticism is a fundamental personality trait with a significant genetic determinant. To identify novel susceptibility genes for neuroticism, we conducted an integrative analysis of genomic and transcriptomic data from a genome-wide association study (GWAS) and an expression quantitative trait locus (eQTL) study. GWAS summary data were derived from published studies of neuroticism, involving a total of 170,906 subjects. An eQTL dataset containing 927,753 eQTLs was obtained from an eQTL meta-analysis of 5311 samples. Integrative analysis of GWAS and eQTL data was conducted with summary data-based Mendelian randomization (SMR) analysis software. To identify neuroticism-associated gene sets, the SMR analysis results were further subjected to gene set enrichment analysis (GSEA). The gene set annotation dataset (containing 13,311 annotated gene sets) of the GSEA Molecular Signatures Database was used. SMR single-gene analysis identified 6 significant genes for neuroticism: MSRA (p value = 2.27×10⁻¹⁰), MGC57346 (p value = 6.92×10⁻⁷), BLK (p value = 1.01×10⁻⁶), XKR6 (p value = 1.11×10⁻⁶), C17ORF69 (p value = 1.12×10⁻⁶) and KIAA1267 (p value = 4.00×10⁻⁶). Gene set enrichment analysis observed a significant association for the Chr8p23 gene set (false discovery rate = 0.033). Our results provide novel clues for studies of the genetic mechanism of neuroticism. Copyright © 2017. Published by Elsevier Inc.

  11. International Study to Evaluate PCR Methods for Detection of Trypanosoma cruzi DNA in Blood Samples from Chagas Disease Patients

    PubMed Central

    Schijman, Alejandro G.; Bisio, Margarita; Orellana, Liliana; Sued, Mariela; Duffy, Tomás; Mejia Jaramillo, Ana M.; Cura, Carolina; Auter, Frederic; Veron, Vincent; Qvarnstrom, Yvonne; Deborggraeve, Stijn; Hijar, Gisely; Zulantay, Inés; Lucero, Raúl Horacio; Velazquez, Elsa; Tellez, Tatiana; Sanchez Leon, Zunilda; Galvão, Lucia; Nolder, Debbie; Monje Rumi, María; Levi, José E.; Ramirez, Juan D.; Zorrilla, Pilar; Flores, María; Jercic, Maria I.; Crisante, Gladys; Añez, Néstor; De Castro, Ana M.; Gonzalez, Clara I.; Acosta Viana, Karla; Yachelini, Pedro; Torrico, Faustino; Robello, Carlos; Diosque, Patricio; Triana Chavez, Omar; Aznar, Christine; Russomando, Graciela; Büscher, Philippe; Assal, Azzedine; Guhl, Felipe; Sosa Estani, Sergio; DaSilva, Alexandre; Britto, Constança; Luquetti, Alejandro; Ladzins, Janis

    2011-01-01

    Background A century after its discovery, Chagas disease still represents a major neglected tropical threat. Accurate diagnostic tools as well as surrogate markers of parasitological response to treatment are research priorities in the field. The purpose of this study was to evaluate the performance of PCR methods in detection of Trypanosoma cruzi DNA by an external quality evaluation. Methodology/Findings An international collaborative study was launched by expert PCR laboratories from 16 countries. Currently used strategies were challenged against serial dilutions of purified DNA from stocks representing T. cruzi discrete typing units (DTU) I, IV and VI (set A), human blood spiked with parasite cells (set B) and Guanidine Hydrochloride-EDTA blood samples from 32 seropositive and 10 seronegative patients from Southern Cone countries (set C). Forty-eight PCR tests were reported for set A and 44 for sets B and C; 28 targeted minicircle DNA (kDNA), 13 satellite DNA (Sat-DNA) and the remainder low copy number sequences. In set A, commercial master mixes and Sat-DNA Real Time PCR showed better specificity, but kDNA-PCR was more sensitive to detect DTU I DNA. In set B, commercial DNA extraction kits presented better specificity than solvent extraction protocols. Sat-DNA PCR tests had higher specificity, with sensitivities of 0.05–0.5 parasites/mL whereas specific kDNA tests detected 5×10⁻³ par/mL. Sixteen specific and coherent methods had a Good Performance in both sets A and B (10 fg/µl of DNA from all stocks, 5 par/mL spiked blood). The median values of sensitivities, specificities and accuracies obtained in testing the Set C samples with the 16 tests determined to be good performing by analyzing Sets A and B samples varied considerably.
Of these, four methods showed the best performance in all three sets of samples, detecting at least 10 fg/µl for each DNA stock and 0.5 par/mL, with a sensitivity of 83.3–94.4%, specificity of 85–95%, accuracy of 86.8–89.5% and kappa index of 0.7–0.8 compared to consensus PCR reports of the 16 good performing tests, and 63–69%, 100%, 71.4–76.2% and 0.4–0.5, respectively, compared to serodiagnosis. Method LbD2 used solvent extraction followed by Sybr-Green based Real time PCR targeted to Sat-DNA; method LbD3 used solvent DNA extraction followed by conventional PCR targeted to Sat-DNA. The third method (LbF1) used glass fiber column based DNA extraction followed by TaqMan Real Time PCR targeted to Sat-DNA (cruzi 1/cruzi 2 and cruzi 3 TaqMan probe) and the fourth method (LbQ) used solvent DNA extraction followed by conventional hot-start PCR targeted to kDNA (primer pairs 121/122). These four methods were further evaluated at the coordinating laboratory in a subset of human blood samples, confirming the performance obtained by the participating laboratories. Conclusion/Significance This study represents a first crucial step towards international validation of PCR procedures for detection of T. cruzi in human blood samples. PMID:21264349
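    The sensitivity, specificity, accuracy and kappa index quoted in this record are all functions of a 2×2 table of test result against reference result. The sketch below shows the standard definitions, with kappa taken to be Cohen's kappa (an assumption; the abstract says only "kappa index"). The counts are invented for illustration and merely sized like a panel of 32 reference-positive and 10 reference-negative samples; they are not data from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard 2x2 diagnostic metrics.

    tp/fn: reference-positive samples called positive/negative by the test.
    tn/fp: reference-negative samples called negative/positive by the test.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # fraction of true positives detected
    specificity = tn / (tn + fp)   # fraction of true negatives detected
    accuracy = (tp + tn) / n       # overall observed agreement
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts: 32 reference-positive and 10 reference-negative samples.
metrics = diagnostic_metrics(tp=22, fp=0, fn=10, tn=10)
```

    A test can reach perfect specificity and still have only moderate kappa when many reference-positive samples are missed, which is the pattern the record reports for the comparison against serodiagnosis.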

  12. Antibiotic resistance versus small molecules, the chemical evolution.

    PubMed

    Lee, V J; Hecker, S J

    1999-11-01

    Two discovery approaches directed to addressing the problem of increasing bacterial resistance are described. The first is a program to build activity against methicillin-resistant Staphylococcus aureus (MRSA) into the cephalosporin class of antibacterials, by enhancing affinity for PBP2a, the penicillin-binding protein responsible for this resistance. Through stepwise improvement in potency, human serum binding, solubility, and beta-lactamase stability, a stable of new compounds with excellent potential as anti-MRSA agents was realized. From this set was chosen MC-02,479 (RWJ-54428), which is now undergoing extensive preclinical evaluation. The second approach explores the uridyl peptide family of antibiotics, inhibitors of bacterial translocase (mraY), whose members include the pacidamycins, mureidomycins, and napsamycins. Access to a diverse set of analogs by total synthesis was catalyzed by the discovery that hydrogenation of the 4'-exoenamidofuranosyl moiety causes no loss in biological activity. In-depth exploration of SAR required (1) establishment of the absolute stereochemistry of the central diaminobutyric acid (DABA) moiety and (2) determination of the stereochemistry of the 4'-substituent on the deoxyfuranose unit. The former was accomplished by comparison of DABA derived from degradation of a natural product pacidamycin with a sample synthesized from L-threonine. The biological activity of one member of a synthesized library of possible stereoisomers of the natural product established the absolute stereochemistry of the remaining centers. A variety of analogs of the natural product were prepared utilizing the synthetic methods developed, and their biological activities provide important insights into the specificity and spectrum of the antibiotic class. Copyright 1999 John Wiley & Sons, Inc. Med Res Rev, 19, No. 6, 521-542, 1999

  13. Searching for missing heritability: Designing rare variant association studies

    PubMed Central

    Zuk, Or; Schaffner, Stephen F.; Samocha, Kaitlin; Do, Ron; Hechter, Eliana; Kathiresan, Sekar; Daly, Mark J.; Neale, Benjamin M.; Sunyaev, Shamil R.; Lander, Eric S.

    2014-01-01

    Genetic studies have revealed thousands of loci predisposing to hundreds of human diseases and traits, revealing important biological pathways and defining novel therapeutic hypotheses. However, the genes discovered to date typically explain less than half of the apparent heritability. Because efforts have largely focused on common genetic variants, one hypothesis is that much of the missing heritability is due to rare genetic variants. Studies of common variants are typically referred to as genomewide association studies, whereas studies of rare variants are often simply called sequencing studies. Because they are actually closely related, we use the terms common variant association study (CVAS) and rare variant association study (RVAS). In this paper, we outline the similarities and differences between RVAS and CVAS and describe a conceptual framework for the design of RVAS. We apply the framework to address key questions about the sample sizes needed to detect association, the relative merits of testing disruptive alleles vs. missense alleles, frequency thresholds for filtering alleles, the value of predictors of the functional impact of missense alleles, the potential utility of isolated populations, the value of gene-set analysis, and the utility of de novo mutations. The optimal design depends critically on the selection coefficient against deleterious alleles and thus varies across genes. The analysis shows that common variant and rare variant studies require similarly large sample collections. In particular, a well-powered RVAS should involve discovery sets with at least 25,000 cases, together with a substantial replication set. PMID:24443550

  14. DNA Methylation Biomarkers: Cancer and Beyond

    PubMed Central

    Mikeska, Thomas; Craig, Jeffrey M.

    2014-01-01

    Biomarkers are naturally-occurring characteristics by which a particular pathological process or disease can be identified or monitored. They can reflect past environmental exposures, predict disease onset or course, or determine a patient’s response to therapy. Epigenetic changes are such characteristics, with most epigenetic biomarkers discovered to date based on the epigenetic mark of DNA methylation. Many tissue types are suitable for the discovery of DNA methylation biomarkers including cell-based samples such as blood and tumor material and cell-free DNA samples such as plasma. DNA methylation biomarkers with diagnostic, prognostic and predictive power are already in clinical trials or in a clinical setting for cancer. Outside cancer, strong evidence that complex disease originates in early life is opening up exciting new avenues for the detection of DNA methylation biomarkers for adverse early life environment and for estimation of future disease risk. However, there are a number of limitations to overcome before such biomarkers reach the clinic. Nevertheless, DNA methylation biomarkers have great potential to contribute to personalized medicine throughout life. We review the current state of play for DNA methylation biomarkers, discuss the barriers that must be crossed on the way to implementation in a clinical setting, and predict their future use for human disease. PMID:25229548

  15. Bayesian Orbit Computation Tools for Objects on Geocentric Orbits

    NASA Astrophysics Data System (ADS)

    Virtanen, J.; Granvik, M.; Muinonen, K.; Oszkiewicz, D.

    2013-08-01

    We consider the space-debris orbital inversion problem via the concept of Bayesian inference. The methodology was put forward for the orbital analysis of solar system small bodies in the early 1990s [7] and results in a full solution of the statistical inverse problem given in terms of an a posteriori probability density function (PDF) for the orbital parameters. We demonstrate the applicability of our statistical orbital analysis software to Earth-orbiting objects, using both well-established Monte Carlo (MC) techniques (for a review, see e.g. [13]) and recently developed Markov-chain MC (MCMC) techniques (e.g., [9]). In particular, we exploit the novel virtual observation MCMC method [8], which is based on the characterization of the phase-space volume of orbital solutions before the actual MCMC sampling. Our statistical methods and the resulting PDFs immediately enable probabilistic impact predictions to be carried out. Furthermore, this can be readily done also for very sparse data sets and data sets of poor quality, provided that some a priori information on the observational uncertainty is available. For asteroids, impact probabilities with the Earth from the discovery night onwards have been provided, e.g., by [11] and [10]; the latter study includes the sampling of the observational-error standard deviation as a random variable.

  16. Thermodynamic equilibrium solubility measurements in simulated fluids by 96-well plate method in early drug discovery.

    PubMed

    Bharate, Sonali S; Vishwakarma, Ram A

    2015-04-01

    An early prediction of solubility in physiological media (PBS, SGF and SIF) is useful to predict qualitatively the bioavailability and absorption of lead candidates. Despite the availability of multiple solubility estimation methods, none of the reported methods involves a simplified, fixed protocol for a diverse set of compounds. Therefore, a simple and medium-throughput solubility estimation protocol is highly desirable during the lead optimization stage. The present work introduces a rapid method for assessment of the thermodynamic equilibrium solubility of compounds in aqueous media using a 96-well microplate. The developed protocol is straightforward to set up and takes advantage of the sensitivity of UV spectroscopy. The compound, in stock solution in methanol, is introduced in microgram quantities into microplate wells followed by drying at ambient temperature. Microplates were shaken upon addition of test media and the supernatant was analyzed by the UV method. A plot of absorbance versus concentration of a sample provides the saturation point, which is the thermodynamic equilibrium solubility of the sample. The established protocol was validated using a large panel of commercially available drugs and against the conventional miniaturized shake flask method (r² > 0.84). Additionally, statistically significant QSPR models were established using experimental solubility values of 52 compounds. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. MSClique: Multiple Structure Discovery through the Maximum Weighted Clique Problem.

    PubMed

    Sanroma, Gerard; Penate-Sanchez, Adrian; Alquézar, René; Serratosa, Francesc; Moreno-Noguer, Francesc; Andrade-Cetto, Juan; González Ballester, Miguel Ángel

    2016-01-01

    We present a novel approach for feature correspondence and multiple structure discovery in computer vision. In contrast to existing methods, we exploit the fact that point-sets on the same structure usually lie close to each other, thus forming clusters in the image. Given a pair of input images, we initially extract points of interest and extract hierarchical representations by agglomerative clustering. We use the maximum weighted clique problem to find the set of corresponding clusters with maximum number of inliers representing the multiple structures at the correct scales. Our method is parameter-free and only needs two sets of points along with their tentative correspondences, thus being extremely easy to use. We demonstrate the effectiveness of our method in multiple-structure fitting experiments in both publicly available and in-house datasets. As shown in the experiments, our approach finds a higher number of structures containing fewer outliers compared to state-of-the-art methods.

  18. The discovery reach of CP violation in neutrino oscillation with non-standard interaction effects

    NASA Astrophysics Data System (ADS)

    Rahman, Zini; Dasgupta, Arnab; Adhikari, Rathin

    2015-06-01

    We have studied the CP violation discovery reach in a neutrino oscillation experiment with a superbeam, a neutrino factory and a monoenergetic neutrino beam from the electron capture process. For non-standard interactions (NSI) satisfying model-dependent bounds, for shorter baselines (like the CERN-Fréjus set-up) there is an insignificant effect of NSI on the discovery reach of CP violation due to δ. Particularly, for the superbeam and neutrino factory we have also considered relatively longer baselines for which there could be significant NSI effects on CP violation discovery reach for higher allowed values of NSI. For the monoenergetic beam only shorter baselines are considered to study CP violation with different nuclei as neutrino sources. Interestingly, for non-standard interactions $\varepsilon_{e\mu}$ and $\varepsilon_{e\tau}$ of neutrinos with matter during propagation in longer baselines in the superbeam, there is the possibility of better discovery reach of CP violation than that with only Standard Model interactions of neutrinos with matter. For complex NSI we have shown the CP violation discovery reach in the plane of Dirac phase δ and NSI phase $\phi_{ij}$. The CP violation due to some values of δ remains unobservable with present and near future experimental facilities in the superbeam and neutrino factory. However, in the presence of some ranges of off-diagonal NSI phase values there are some possibilities of discovering total CP violation for any $\delta_{CP}$ value even at 5σ confidence level for neutrino factory. Our analysis indicates that for some values of NSI phases total CP violation may not be at all observable for any values of δ. Combination of shorter and longer baselines could indicate in some cases the presence of NSI. However, in general for NSIs ≲ 1 the CP violation discovery reach is better in neutrino factory set-ups.
Using a neutrino beam from the electron capture process for nuclei $^{110}_{50}$Sn and $^{152}$Yb, we have shown the discovery reach of CP violation in a neutrino oscillation experiment. Particularly for $^{110}_{50}$Sn nuclei CP violation could be found for about 51% of the possible δ values for a baseline of 130 km with boost factor γ = 500. Although the nucleus $^{152}$Yb is technically more feasible for the production of a mono-energetic beam, it is found to be unsuitable for obtaining good discovery reach of CP violation.

  19. Discovery and Validation of Novel Expression Signature for Postcystectomy Recurrence in High-Risk Bladder Cancer

    PubMed Central

    Lam, Lucia L.; Ghadessi, Mercedeh; Erho, Nicholas; Vergara, Ismael A.; Alshalalfa, Mohammed; Buerki, Christine; Haddad, Zaid; Sierocinski, Thomas; Triche, Timothy J.; Skinner, Eila C.; Davicioni, Elai; Daneshmand, Siamak; Black, Peter C.

    2014-01-01

    Background Nearly half of muscle-invasive bladder cancer patients succumb to their disease following cystectomy. Selecting candidates for adjuvant therapy is currently based on clinical parameters with limited predictive power. This study aimed to develop and validate genomic-based signatures that can better identify patients at risk for recurrence than clinical models alone. Methods Transcriptome-wide expression profiles were generated using 1.4 million feature-arrays on archival tumors from 225 patients who underwent radical cystectomy and had muscle-invasive and/or node-positive bladder cancer. Genomic (GC) and clinical (CC) classifiers for predicting recurrence were developed on a discovery set (n = 133). Performances of GC, CC, an independent clinical nomogram (IBCNC), and genomic-clinicopathologic classifiers (G-CC, G-IBCNC) were assessed in the discovery and independent validation (n = 66) sets. GC was further validated on four external datasets (n = 341). Discrimination and prognostic abilities of classifiers were compared using area under receiver-operating characteristic curves (AUCs). All statistical tests were two-sided. Results A 15-feature GC was developed on the discovery set with area under curve (AUC) of 0.77 in the validation set. This was higher than individual clinical variables, IBCNC (AUC = 0.73), and comparable to CC (AUC = 0.78). Performance was improved upon combining GC with clinical nomograms (G-IBCNC, AUC = 0.82; G-CC, AUC = 0.86). G-CC high-risk patients had elevated recurrence probabilities (P < .001), with GC being the best predictor by multivariable analysis (P = .005). Genomic-clinicopathologic classifiers outperformed clinical nomograms by decision curve and reclassification analyses. GC performed the best in validation compared with seven prior signatures. GC markers remained prognostic across four independent datasets. 
Conclusions The validated genomic-based classifiers outperform clinical models for predicting postcystectomy bladder cancer recurrence. This may be used to better identify patients who need more aggressive management. PMID:25344601
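    The classifiers in this record are compared by area under the receiver-operating characteristic curve (AUC). As a minimal sketch of what that number measures, the AUC equals the probability that a randomly chosen patient with recurrence receives a higher score than a randomly chosen patient without, with ties counted as one half. The function name, scores and labels below are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positives against negatives.

    scores: classifier outputs, higher meaning more likely positive.
    labels: 1 for the positive class (e.g. recurrence), 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for six patients and their recurrence status.
auc = roc_auc([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0])
```

    The quadratic pairwise loop is chosen for clarity; for cohorts of the size reported here it is still fast, though a rank-based formula would scale better.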

  20. Discovery of the first maize-infecting mastrevirus in the Americas using a vector-enabled metagenomics approach.

    PubMed

    Fontenele, Rafaela S; Alves-Freitas, Dione M T; Silva, Pedro I T; Foresti, Josemar; Silva, Paulo R; Godinho, Márcio T; Varsani, Arvind; Ribeiro, Simone G

    2018-01-01

    The genus Mastrevirus (family Geminiviridae) is composed of single-stranded DNA viruses that infect mono- and dicotyledonous plants and are transmitted by leafhoppers. In South America, there have been only two previous reports of mastreviruses, both identified in sweet potatoes (from Peru and Uruguay). As part of a general viral surveillance program, we used a vector-enabled metagenomics (VEM) approach and sampled leafhoppers (Dalbulus maidis) in Itumbiara (State of Goiás), Brazil. High-throughput sequencing of viral DNA purified from the leafhopper sample revealed mastrevirus-like contigs. Using a set of abutting primers, a 2746-nt circular genome was recovered. The circular genome has a typical mastrevirus genome organization and shares <63% pairwise identity with other mastrevirus isolates from around the world. Therefore, the new mastrevirus was tentatively named "maize striate mosaic virus". Seventeen maize leaf samples were collected in the same field as the leafhoppers, and ten samples were found to be positive for this mastrevirus. Furthermore, the ten genomes recovered from the maize samples share >99% pairwise identity with the one from the leafhopper. This is the first report of a maize-infecting mastrevirus in the Americas, the first identified in a non-vegetatively propagated mastrevirus host in South America, and the first mastrevirus to be identified in Brazil.

  1. Scalable multi-sample single-cell data analysis by Partition-Assisted Clustering and Multiple Alignments of Networks

    PubMed Central

    Samusik, Nikolay; Wang, Xiaowei; Guan, Leying; Nolan, Garry P.

    2017-01-01

    Mass cytometry (CyTOF) has greatly expanded the capability of cytometry. It is now easy to generate multiple CyTOF samples in a single study, with each sample containing single-cell measurement on 50 markers for more than hundreds of thousands of cells. Current methods do not adequately address the issues concerning combining multiple samples for subpopulation discovery, and these issues can be quickly and dramatically amplified with increasing number of samples. To overcome this limitation, we developed Partition-Assisted Clustering and Multiple Alignments of Networks (PAC-MAN) for the fast automatic identification of cell populations in CyTOF data closely matching that of expert manual-discovery, and for alignments between subpopulations across samples to define dataset-level cellular states. PAC-MAN is computationally efficient, allowing the management of very large CyTOF datasets, which are increasingly common in clinical studies and cancer studies that monitor various tissue samples for each subject. PMID:29281633

  2. Pharmacokinetic de-risking tools for selection of monoclonal antibody lead candidates

    PubMed Central

    Dostalek, Miroslav; Prueksaritanont, Thomayant; Kelley, Robert F.

    2017-01-01

    Pharmacokinetic studies play an important role in all stages of drug discovery and development. Recent advancements in the tools for discovery and optimization of therapeutic proteins have created an abundance of candidates that may fulfill target product profile criteria. Implementing a set of in silico, small scale in vitro and in vivo tools can help to identify a clinical lead molecule with promising properties at the early stages of drug discovery, thus reducing the labor and cost in advancing multiple candidates toward clinical development. In this review, we describe tools that should be considered during drug discovery, and discuss approaches that could be included in the pharmacokinetic screening part of the lead candidate generation process to de-risk unexpected pharmacokinetic behaviors of Fc-based therapeutic proteins, with an emphasis on monoclonal antibodies. PMID:28463063

  3. Knowledge discovery with classification rules in a cardiovascular dataset.

    PubMed

    Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan

    2005-12-01

    In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rule induction called AREX, using evolutionary induction of decision trees and automatic programming, is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of possible new medical knowledge in the field of pediatric cardiology.

  4. [Definition of a "domestic-setting corpse"--a retrospective study of 211 discoveries].

    PubMed

    Merz, Marius; Heidorn, Frank; Birngruber, Christoph G; Ramsthaler, Frank; Risse, Manfred; Kreutz, Kerstin; Krähahn, Jonathan; Verhoff, Marcel A

    2012-01-01

    In Germany, the term "domestic-setting corpse" is regularly used both in the medicolegal field (daily work, specialist literature) and by the general public (press, novels). The only formal definition of the term is in the German-language textbook "Basiswissen Rechtsmedizin" (Madea and Dettmeyer 2007). In this retrospective study, we compared the criteria for this definition with our findings. Autopsy reports from the Institute of Forensic Medicine at the Justus Liebig University in Giessen, Germany, for the period between 2005 and 2011 (including February), were reviewed retrospectively to see if the criteria for this formal definition could be found. We chose a postmortem interval of more than 24 hours and discovery of the corpse in a private home as inclusion criteria for our study (n = 211). We could verify four of the criteria for the definition ("advanced signs of decomposition", "reclusiveness", "unclear cause of death", "difficult to identify") in our study. One criterion ("frequently a long postmortem interval") was too vague to be of use, and two further criteria ("discovery circumstances" and "high frequency of active alcohol dependence") could only be partially confirmed. In almost half of our cases there were, however, signs of general substance abuse. The proportion of male "domestic-setting corpses" was distinctly higher than that of females (approx. 3:1). The average age-at-death was 50.1 years for men, and 57.8 years for women, and thus clearly below the average life expectancies. In over half of the cases, even those with explicitly mentioned advanced facial decay, the identification method had not been noted. In the formal definition, the criteria "discovery circumstances" and "alcoholism" thus need to be more precisely defined. Also, due to the inexplicit time range, the criterion "frequently a long postmortem interval" was too vague to be applied to, or compared with, our cases as a classic criterion.
We suggest specifying a minimum postmortem interval of 24 hours for "domestic-setting corpses". In addition, more attention should be paid to the identification of "domestic-setting corpses". To date, investigation authorities frequently seem to assume that a corpse discovered in a private residence is that of the home owner or occupant.

  5. A machine learning approach to computer-aided molecular design

    NASA Astrophysics Data System (ADS)

    Bolis, Giorgio; Di Pace, Luigi; Fabrocini, Filippo

    1991-12-01

    Preliminary results of a machine learning application concerning computer-aided molecular design applied to drug discovery are presented. The artificial intelligence techniques of machine learning use a sample of active and inactive compounds, which is viewed as a set of positive and negative examples, to allow the induction of a molecular model characterizing the interaction between the compounds and a target molecule. The algorithm is based on a twofold phase. In the first one — the specialization step — the program identifies a number of active/inactive pairs of compounds which appear to be the most useful in order to make the learning process as effective as possible and generates a dictionary of molecular fragments, deemed to be responsible for the activity of the compounds. In the second phase — the generalization step — the fragments thus generated are combined and generalized in order to select the most plausible hypothesis with respect to the sample of compounds. A knowledge base concerning physical and chemical properties is utilized during the inductive process.

  6. Comparative Analysis of Mass Spectral Similarity Measures on Peak Alignment for Comprehensive Two-Dimensional Gas Chromatography Mass Spectrometry

    PubMed Central

    2013-01-01

    Peak alignment is a critical procedure in mass spectrometry-based biomarker discovery in metabolomics. One of the peak alignment approaches for comprehensive two-dimensional gas chromatography mass spectrometry (GC×GC-MS) data is peak matching-based alignment. A key to peak matching-based alignment is the calculation of mass spectral similarity scores. Various mass spectral similarity measures have been developed, mainly for compound identification, but the effect of these spectral similarity measures on the performance of peak matching-based alignment remains unknown. Therefore, we selected five mass spectral similarity measures (cosine correlation, Pearson's correlation, Spearman's correlation, partial correlation, and part correlation) and examined their effects on peak alignment using two sets of experimental GC×GC-MS data. The results show that the spectral similarity measure does not affect the alignment accuracy significantly in the analysis of data from less complex samples, while the partial correlation performs much better than the other spectral similarity measures when analyzing experimental data acquired from complex biological samples. PMID:24151524
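
    Three of the five similarity measures named above can be sketched in a few lines. This is only an illustration, assuming each spectrum is already binned into an equal-length list of intensities that is neither all zero nor constant; the function names are hypothetical, and partial and part correlation are left out because the abstract does not say which spectra are conditioned on.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two intensity vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson_correlation(x, y):
    """Pearson's r: the cosine similarity of the mean-centred vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_correlation(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_correlation(average_ranks(x), average_ranks(y))
```

    In a peak matching step, a score like these would be computed for each candidate pair of peaks and the best-scoring pair kept.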

  7. S-Band POSIX Device Drivers for RTEMS

    NASA Technical Reports Server (NTRS)

    Lux, James P.; Lang, Minh; Peters, Kenneth J.; Taylor, Gregory H.

    2011-01-01

    This is a set of POSIX device-driver-level abstractions in the RTEMS RTOS (Real-Time Executive for Multiprocessor Systems real-time operating system) for S-Band radio hardware devices that have been instantiated in an FPGA (field-programmable gate array). These include A/D (analog-to-digital) sample capture, D/A (digital-to-analog) sample playback, PLL (phase-locked-loop) tuning, and PWM (pulse-width-modulation)-controlled gain. This software interfaces to S-Band radio hardware in an attached Xilinx Virtex-2 FPGA. It uses plug-and-play device discovery to map memory to device IDs. Instead of having applications interact with hardware devices directly through memory-mapped access, this driver provides an application programming interface (API) built on standard POSIX function calls. This simplifies application programming, enables portability, and offers an additional level of protection to the hardware. There are three separate device drivers included in this package: sband_device (ADC capture and DAC playback), pll_device (RF front end PLL tuning), and pwm_device (RF front end AGC control).

  8. KSC-2009-4560

    NASA Image and Video Library

    2009-08-09

    CAPE CANAVERAL, Fla. – On Launch Pad 39A, the payload ground-handling mechanism moves back after placing the multi-purpose logistics module Leonardo in space shuttle Discovery's payload bay. Leonardo is the primary payload on Discovery's STS-128 mission to the International Space Station. Beneath the module is the Lightweight Multi-Purpose Experiment Support Structure Carrier. Discovery will deliver 33,000 pounds of equipment to the station, including science and storage racks, a freezer to store research samples, a new sleeping compartment and the COLBERT treadmill. Launch is targeted for late August. Photo credit: NASA/Jack Pfaller

  9. Sequence-Based Discovery Demonstrates That Fixed Light Chain Human Transgenic Rats Produce a Diverse Repertoire of Antigen-Specific Antibodies.

    PubMed

    Harris, Katherine E; Aldred, Shelley Force; Davison, Laura M; Ogana, Heather Anne N; Boudreau, Andrew; Brüggemann, Marianne; Osborn, Michael; Ma, Biao; Buelow, Benjamin; Clarke, Starlynn C; Dang, Kevin H; Iyer, Suhasini; Jorgensen, Brett; Pham, Duy T; Pratap, Payal P; Rangaswamy, Udaya S; Schellenberger, Ute; van Schooten, Wim C; Ugamraj, Harshad S; Vafa, Omid; Buelow, Roland; Trinklein, Nathan D

    2018-01-01

    We created a novel transgenic rat that expresses human antibodies comprising a diverse repertoire of heavy chains with a single common rearranged kappa light chain (IgKV3-15-JK1). This fixed light chain animal, called OmniFlic, presents a unique system for human therapeutic antibody discovery and a model to study heavy chain repertoire diversity in the context of a constant light chain. The purpose of this study was to analyze heavy chain variable gene usage, clonotype diversity, and to describe the sequence characteristics of antigen-specific monoclonal antibodies (mAbs) isolated from immunized OmniFlic animals. Using next-generation sequencing antibody repertoire analysis, we measured heavy chain variable gene usage and the diversity of clonotypes present in the lymph node germinal centers of 75 OmniFlic rats immunized with 9 different protein antigens. Furthermore, we expressed 2,560 unique heavy chain sequences sampled from a diverse set of clonotypes as fixed light chain antibody proteins and measured their binding to antigen by ELISA. Finally, we measured patterns and overall levels of somatic hypermutation in the full B-cell repertoire and in the 2,560 mAbs tested for binding. The results demonstrate that OmniFlic animals produce an abundance of antigen-specific antibodies with heavy chain clonotype diversity that is similar to what has been described with unrestricted light chain use in mammals. In addition, we show that sequence-based discovery is a highly effective and efficient way to identify a large number of diverse monoclonal antibodies to a protein target of interest.

  10. Development of a high-throughput brain slice method for studying drug distribution in the central nervous system.

    PubMed

    Fridén, Markus; Ducrozet, Frederic; Middleton, Brian; Antonsson, Madeleine; Bredberg, Ulf; Hammarlund-Udenaes, Margareta

    2009-06-01

    New, more efficient methods of estimating unbound drug concentrations in the central nervous system (CNS) combine the amount of drug in whole brain tissue samples measured by conventional methods with in vitro estimates of the unbound brain volume of distribution (V(u,brain)). Although the brain slice method is the most reliable in vitro method for measuring V(u,brain), it has not previously been adapted for the needs of drug discovery research. The aim of this study was to increase the throughput and optimize the experimental conditions of this method. Equilibrium of drug between the buffer and the brain slice within the 4 to 5 h of incubation is a fundamental requirement. However, it is difficult to meet this requirement for many of the extensively binding, lipophilic compounds in drug discovery programs. In this study, the dimensions of the incubation vessel and mode of stirring influenced the equilibration time, as did the amount of brain tissue per unit of buffer volume. The use of cassette experiments for investigating V(u,brain) in a linear drug concentration range increased the throughput of the method. The V(u,brain) for the model compounds ranged from 4 to 3000 ml . g brain(-1), and the sources of variability are discussed. The optimized setup of the brain slice method allows precise, robust estimation of V(u,brain) for drugs with diverse properties, including highly lipophilic compounds. This is a critical step forward for the implementation of relevant measurements of CNS exposure in the drug discovery setting.

  11. DocCube: Multi-Dimensional Visualization and Exploration of Large Document Sets.

    ERIC Educational Resources Information Center

    Mothe, Josiane; Chrisment, Claude; Dousset, Bernard; Alaux, Joel

    2003-01-01

    Describes a user interface that provides global visualizations of large document sets to help users formulate the query that corresponds to their information needs. Highlights include concept hierarchies that users can browse to specify and refine information needs; knowledge discovery in databases and texts; and multidimensional modeling.…

  12. Chromatogram-Bioactivity Correlation-Based Discovery and Identification of Three Bioactive Compounds Affecting Endothelial Function in Ginkgo Biloba Extract.

    PubMed

    Liu, Hong; Tan, Li-Ping; Huang, Xin; Liao, Yi-Qiu; Zhang, Wei-Jian; Li, Pei-Bo; Wang, Yong-Gang; Peng, Wei; Wu, Zhong; Su, Wei-Wei; Yao, Hong-Liang

    2018-05-03

    This work aimed to discover and identify three bioactive compounds affecting endothelial function in Ginkgo biloba extract (GBE) based on chromatogram-bioactivity correlation analysis. Three portions were separated from GBE via D101 macroporous resin and then re-combined to prepare nine GBE samples. Twenty-one compounds in the GBE samples were identified through UFLC-DAD-Q-TOF-MS/MS. Correlation analysis between compound differences and endothelin-1 (ET-1) in vivo across the nine GBE samples was conducted. The analysis results indicated that three bioactive compounds had close relevance to ET-1: Kaempferol-3- O -α-l-glucoside, 3- O -{2- O -{6- O -[P-OH-trans-cinnamoyl]-β-d-glucosyl}-α-rhamnosyl} Quercetin isomers, and 3- O -{2- O -{6- O -[P-OH-trans-cinnamoyl]-β-d-glucosyl}-α-rhamnosyl} Kaempferide. The discovery of bioactive compounds could provide references for the quality control and novel pharmaceutical development of GBE. The present work proposes a feasible chromatogram-bioactivity correlation-based approach to discover compounds and define their bioactivities in complex multi-component systems.

  13. Applying Knowledge Discovery in Databases in Public Health Data Set: Challenges and Concerns

    PubMed Central

    Volrathongchia, Kanittha

    2003-01-01

    In attempting to apply Knowledge Discovery in Databases (KDD) to generate a predictive model from a health care dataset that is currently available to the public, the first step is to pre-process the data to overcome the challenges of missing data, redundant observations, and records containing inaccurate data. This study will demonstrate how to use simple pre-processing methods to improve the quality of input data. PMID:14728545

  14. A Discovery Process for Initializing Ad Hoc Underwater Acoustic Networks

    DTIC Science & Technology

    2008-12-01

    the ping utility packet is set to global address 0, its function becomes a broadcast ping and it elicits echoes from all neighboring nodes within...destination. At the Seaweb server, a global neighbor table and a global routing table are maintained to support network configurability. 2. Cellular...aggregates the received peer discovery data in a global neighbor table and ultimately decides how routing to each branch node should be configured

  15. Barratt on middeck

    NASA Image and Video Library

    2011-02-25

    S133-E-006036 (25 Feb. 2011) --- Astronaut Michael Barratt, STS-133 mission specialist, works with the Microbe Group Activation Pack containing eight Fluid Processing Apparatuses on the middeck of space shuttle Discovery while en route to a rendezvous with the International Space Station. A previous set of similar tests made a key discovery about the mechanism that makes salmonella more infectious, aiding the fight against food poisoning on Earth. Photo credit: NASA or National Aeronautics and Space Administration

  16. Gene selection for tumor classification using neighborhood rough sets and entropy measures.

    PubMed

    Chen, Yumin; Zhang, Zunjun; Zheng, Jianzhong; Ma, Ying; Xue, Yu

    2017-03-01

    With the development of bioinformatics, tumor classification from gene expression data has become an important and useful technology for cancer diagnosis. Since a gene expression data set often contains thousands of genes and a small number of samples, gene selection from gene expression data becomes a key step for tumor classification. Attribute reduction of rough sets has been successfully applied to the gene selection field, as it is data-driven and requires no additional information. However, the traditional rough set method deals with discrete data only. Gene expression data containing real-valued or noisy data are therefore usually discretized in a preprocessing step, which may result in poor classification accuracy. In this paper, we propose a novel gene selection method based on the neighborhood rough set model, which can deal with real-valued data whilst maintaining the original gene classification information. Moreover, this paper introduces an entropy measure within the framework of neighborhood rough sets for tackling the uncertainty and noise of gene expression data. The use of this measure can lead to the discovery of compact gene subsets. Finally, a gene selection algorithm is designed based on neighborhood granules and the entropy measure. Experiments on two gene expression data sets show that the proposed gene selection is an effective method for improving the accuracy of tumor classification. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Negligible impact of rare autoimmune-locus coding-region variants on missing heritability.

    PubMed

    Hunt, Karen A; Mistry, Vanisha; Bockett, Nicholas A; Ahmad, Tariq; Ban, Maria; Barker, Jonathan N; Barrett, Jeffrey C; Blackburn, Hannah; Brand, Oliver; Burren, Oliver; Capon, Francesca; Compston, Alastair; Gough, Stephen C L; Jostins, Luke; Kong, Yong; Lee, James C; Lek, Monkol; MacArthur, Daniel G; Mansfield, John C; Mathew, Christopher G; Mein, Charles A; Mirza, Muddassar; Nutland, Sarah; Onengut-Gumuscu, Suna; Papouli, Efterpi; Parkes, Miles; Rich, Stephen S; Sawcer, Steven; Satsangi, Jack; Simmonds, Matthew J; Trembath, Richard C; Walker, Neil M; Wozniak, Eva; Todd, John A; Simpson, Michael A; Plagnol, Vincent; van Heel, David A

    2013-06-13

    Genome-wide association studies (GWAS) have identified common variants of modest-effect size at hundreds of loci for common autoimmune diseases; however, a substantial fraction of heritability remains unexplained, to which rare variants may contribute. To discover rare variants and test them for association with a phenotype, most studies re-sequence a small initial sample size and then genotype the discovered variants in a larger sample set. This approach fails to analyse a large fraction of the rare variants present in the entire sample set. Here we perform simultaneous amplicon-sequencing-based variant discovery and genotyping for coding exons of 25 GWAS risk genes in 41,911 UK residents of white European origin, comprising 24,892 subjects with six autoimmune disease phenotypes and 17,019 controls, and show that rare coding-region variants at known loci have a negligible role in common autoimmune disease susceptibility. These results do not support the rare-variant synthetic genome-wide-association hypothesis (in which unobserved rare causal variants lead to association detected at common tag variants). Many known autoimmune disease risk loci contain multiple, independently associated, common and low-frequency variants, and so genes at these loci are a priori stronger candidates for harbouring rare coding-region variants than other genes. Our data indicate that the missing heritability for common autoimmune diseases may not be attributable to the rare coding-region variant portion of the allelic spectrum, but perhaps, as others have proposed, may be a result of many common-variant loci of weak effect.

  18. A Thousand Fly Genomes: An Expanded Drosophila Genome Nexus.

    PubMed

    Lack, Justin B; Lange, Jeremy D; Tang, Alison D; Corbett-Detig, Russell B; Pool, John E

    2016-12-01

    The Drosophila Genome Nexus is a population genomic resource that provides D. melanogaster genomes from multiple sources. To facilitate comparisons across data sets, genomes are aligned using a common reference alignment pipeline which involves two rounds of mapping. Regions of residual heterozygosity, identity-by-descent, and recent population admixture are annotated to enable data filtering based on the user's needs. Here, we present a significant expansion of the Drosophila Genome Nexus, which brings the current data object to a total of 1,121 wild-derived genomes. New additions include 305 previously unpublished genomes from inbred lines representing six population samples in Egypt, Ethiopia, France, and South Africa, along with another 193 genomes added from recently published data sets. We also provide an aligned D. simulans genome to facilitate divergence comparisons. This improved resource will broaden the range of population genomic questions that can be addressed from multi-population allele frequencies and haplotypes in this model species. The larger set of genomes will also enhance the discovery of functionally relevant natural variation that exists within and between populations. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  19. Telling plant species apart with DNA: from barcodes to genomes

    PubMed Central

    Li, De-Zhu; van der Bank, Michelle

    2016-01-01

    Land plants underpin a multitude of ecosystem functions, support human livelihoods and represent a critically important component of terrestrial biodiversity—yet many tens of thousands of species await discovery, and plant identification remains a substantial challenge, especially where material is juvenile, fragmented or processed. In this opinion article, we tackle two main topics. Firstly, we provide a short summary of the strengths and limitations of plant DNA barcoding for addressing these issues. Secondly, we discuss options for enhancing current plant barcodes, focusing on increasing discriminatory power via either gene capture of nuclear markers or genome skimming. The former has the advantage of establishing a defined set of target loci maximizing efficiency of sequencing effort, data storage and analysis. The challenge is developing a probe set for large numbers of nuclear markers that works over sufficient phylogenetic breadth. Genome skimming has the advantage of using existing protocols and being backward compatible with existing barcodes; and the depth of sequence coverage can be increased as sequencing costs fall. Its non-targeted nature does, however, present a major informatics challenge for upscaling to large sample sets. This article is part of the themed issue ‘From DNA barcodes to biomes’. PMID:27481790

  20. Expression and Genomic Profiling of Minute Breast Cancer Samples. Addendum

    DTIC Science & Technology

    2007-07-01

    2094. 10. El Gedaily, A., Bubendorf, L., Willi, N., Fu, W., Richter, J., Moch , H., Mihatsch, M.J., Sauter, G. and Gasser, T.C. (2001) Discovery of new...10. El Gedaily,A., Bubendorf,L., Willi,N., Fu,W., Richter,J., Moch ,H., Mihatsch,M.J., Sauter,G. and Gasser,T.C. (2001) Discovery of new DNA ampli

  1. Expression and Genomic Profiling of Minute Breast Cancer Samples

    DTIC Science & Technology

    2006-07-01

    Gedaily, A., Bubendorf, L., Willi, N., Fu, W., Richter, J., Moch , H., Mihatsch, M.J., Sauter, G. and Gasser, T.C. (2001) Discovery of new DNA amplification...10. El Gedaily,A., Bubendorf,L., Willi,N., Fu,W., Richter,J., Moch ,H., Mihatsch,M.J., Sauter,G. and Gasser,T.C. (2001) Discovery of new DNA ampli

  2. Assessment of composite motif discovery methods.

    PubMed

    Klepper, Kjetil; Sandve, Geir K; Abul, Osman; Johansen, Jostein; Drablos, Finn

    2008-02-26

    Computational discovery of regulatory elements is an important area of bioinformatics research and more than a hundred motif discovery methods have been published. Traditionally, most of these methods have addressed the problem of single motif discovery - discovering binding motifs for individual transcription factors. In higher organisms, however, transcription factors usually act in combination with nearby bound factors to induce specific regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets of motifs bound by multiple cooperating transcription factors, so called composite motifs or cis-regulatory modules. Given the large number and diversity of methods available, independent assessment of methods becomes important. Although there have been several benchmark studies of single motif discovery, no similar studies have previously been conducted concerning composite motif discovery. We have developed a benchmarking framework for composite motif discovery and used it to evaluate the performance of eight published module discovery tools. Benchmark datasets were constructed based on real genomic sequences containing experimentally verified regulatory modules, and the module discovery programs were asked to predict both the locations of these modules and to specify the single motifs involved. To aid the programs in their search, we provided position weight matrices corresponding to the binding motifs of the transcription factors involved. In addition, selections of decoy matrices were mixed with the genuine matrices on one dataset to test the response of programs to varying levels of noise. Although some of the methods tested tended to score somewhat better than others overall, there were still large variations between individual datasets and no single method performed consistently better than the rest in all situations. 
The variation in performance on individual datasets also shows that the new benchmark datasets represent a suitable variety of challenges to most methods for module discovery.

  3. Earth observations taken from orbiter Discovery during STS-91 mission

    NASA Image and Video Library

    2016-08-24

    STS091-713-061 (2-12 June 1998) --- The vertical stabilizer of the Space Shuttle Discovery runs through this Atlantic Ocean image made from its crew cabin. Many sets of internal waves are seen in the 70mm frame traveling through an area off the Atlantic coast of Nova Scotia, Canada. There are seven sets that run perpendicular to each other. Internal waves are tidally induced and travel below the surface of the ocean along a density change which occurs often around 150 feet deep. According to NASA scientists studying the STS-91 collection, the waves are visible because, as the wave action smoothes out the smaller waves on the surface, the manner in which the sun is reflected is changed.

  4. Counting of oligomers in sequences generated by markov chains for DNA motif discovery.

    PubMed

    Shan, Gao; Zheng, Wei-Mou

    2009-02-01

    By means of the technique of the imbedded Markov chain, an efficient algorithm is proposed to exactly calculate the first and second moments of word counts and the probability for a word to occur at least once in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations for the problem. Two Z-scores, one based on the number of sequences with hits and the other on the total number of word hits in a set of sequences, are examined for discovery of motifs on a set of promoter sequences extracted from the A. thaliana genome. Source code is available at http://www.itp.ac.cn/zheng/oligo.c.
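
    The simplest quantity mentioned above, the first moment of a word count, can be computed directly by linearity of expectation. The sketch below is not the authors' imbedded-Markov-chain algorithm (which also yields second moments and occurrence probabilities); it assumes a first-order chain given as a nested dictionary of transition probabilities plus a start distribution, and all names are illustrative.

```python
def expected_word_count(word, length, start, trans):
    """Expected number of (possibly overlapping) occurrences of `word`
    in a text of `length` letters generated by a first-order Markov chain.

    start[a]    -- probability that the first letter is a
    trans[a][b] -- probability that letter a is followed by letter b
    """
    if not word or len(word) > length:
        return 0.0
    # Probability of spelling out the rest of the word once its first
    # letter has been emitted.
    chain_prob = 1.0
    for a, b in zip(word, word[1:]):
        chain_prob *= trans[a][b]
    marginal = dict(start)  # distribution of the letter at the current position
    total = 0.0
    for _ in range(length - len(word) + 1):
        total += marginal[word[0]] * chain_prob
        # Advance the marginal letter distribution by one position.
        marginal = {
            b: sum(marginal[a] * trans[a][b] for a in marginal)
            for b in marginal
        }
    return total

# Uniform, independent letters as a sanity check.
letters = "ACGT"
uniform_start = {a: 0.25 for a in letters}
uniform_trans = {a: {b: 0.25 for b in letters} for a in letters}
# 8 start positions * (1/4)**3 = 0.125
print(expected_word_count("ACG", 10, uniform_start, uniform_trans))
```

    A Z-score of the kind described would additionally need the variance of the count, which depends on how the word overlaps with itself and is not covered here.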

  5. Optimal selection of markers for validation or replication from genome-wide association studies.

    PubMed

    Greenwood, Celia M T; Rangrej, Jagadish; Sun, Lei

    2007-07-01

    With reductions in genotyping costs and the fast pace of improvements in genotyping technology, it is not uncommon for the individuals in a single study to undergo genotyping using several different platforms, where each platform may contain different numbers of markers selected via different criteria. For example, a set of cases and controls may be genotyped at markers in a small set of carefully selected candidate genes, and shortly thereafter, the same cases and controls may be used for a genome-wide single nucleotide polymorphism (SNP) association study. After such initial investigations, often, a subset of "interesting" markers is selected for validation or replication. Specifically, by validation, we refer to the investigation of associations between the selected subset of markers and the disease in independent data. However, it is not obvious how to choose the best set of markers for this validation. There may be a prior expectation that some sets of genotyping data are more likely to contain real associations. For example, it may be more likely for markers in plausible candidate genes to show disease associations than markers in a genome-wide scan. Hence, it would be desirable to select proportionally more markers from the candidate gene set. When a fixed number of markers are selected for validation, we propose an approach for identifying an optimal marker-selection configuration by basing the approach on minimizing the stratified false discovery rate. We illustrate this approach using a case-control study of colorectal cancer from Ontario, Canada, and we show that this approach leads to substantial reductions in the estimated false discovery rates in the Ontario dataset for the selected markers, as well as reductions in the expected false discovery rates for the proposed validation dataset. Copyright 2007 Wiley-Liss, Inc.
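
    The stratified false discovery rate idea can be illustrated by controlling the FDR separately within each marker stratum. The sketch below is not the authors' optimization of the marker-selection configuration; it swaps in the standard Benjamini-Hochberg step-up procedure applied per stratum, and the stratum names and p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues, q):
    """Indices of hypotheses rejected by the Benjamini-Hochberg
    step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its threshold.
        if pvalues[idx] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

def stratified_bh(strata, q):
    """Apply Benjamini-Hochberg independently inside each stratum."""
    return {name: benjamini_hochberg(p, q) for name, p in strata.items()}

# Hypothetical p-values: a small candidate-gene set and a genome-wide scan.
strata = {
    "candidate_genes": [0.02, 0.04],
    "genome_wide": [0.001, 0.5],
}
print(stratified_bh(strata, q=0.05))
# {'candidate_genes': [0, 1], 'genome_wide': [0]}
```

    With these toy numbers, pooling all four p-values into one Benjamini-Hochberg run would reject only 0.001 and 0.02, so the marker with p = 0.04 is selected only when the candidate-gene stratum is treated separately.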

  6. Comparative mRNA analysis of behavioral and genetic mouse models of aggression.

    PubMed

    Malki, Karim; Tosto, Maria G; Pain, Oliver; Sluyter, Frans; Mineur, Yann S; Crusio, Wim E; de Boer, Sietse; Sandnabba, Kenneth N; Kesserwani, Jad; Robinson, Edward; Schalkwyk, Leonard C; Asherson, Philip

    2016-04-01

    Mouse models of aggression have traditionally compared strains, most notably BALB/cJ and C57BL/6. However, these strains were not designed to study aggression despite differences in aggression-related traits and distinct reactivity to stress. This study compared the expression of genes differentially regulated in a stress (behavioral) mouse model of aggression with those from a recent genetic mouse model of aggression. The study used a discovery-replication design based on two independent mRNA studies from mouse brain tissue. The discovery study identified strain (BALB/cJ and C57BL/6J) × stress (chronic mild stress or control) interactions. Probe sets differentially regulated in the discovery set were intersected with those uncovered in the replication study, which evaluated differences between high and low aggressive animals from three strains specifically bred to study aggression. Network analysis was conducted on overlapping genes uncovered across both studies. A significant overlap was found, with the genetic mouse study sharing 1,916 probe sets with the stress model. Fifty-one probe sets were found to be strongly dysregulated across both studies, mapping to 50 known genes. Network analysis revealed two plausible pathways, including one centered on the UBC gene hub, which encodes ubiquitin, a protein well known for its role in protein degradation, and another on P38 MAPK. Findings from this study support the stress model of aggression, which showed remarkable molecular overlap with a genetic model. The study uncovered a set of candidate genes including the Erg2 gene, which has previously been implicated in different psychopathologies. The gene networks uncovered point to a redox pathway as potentially being implicated in aggression-related behaviors. © 2016 Wiley Periodicals, Inc.
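
    One common way to judge whether two probe-set lists overlap more than chance would predict is a one-sided hypergeometric test. The abstract does not state which test the authors used, so the sketch below is an assumption, and the probe identifiers and counts are toy values rather than figures from the study.

```python
from math import comb

def overlap_p_value(total, size_a, size_b, overlap):
    """P(X >= overlap) when size_b probes are drawn without replacement
    from `total` probes, of which size_a belong to the first list."""
    denom = comb(total, size_b)
    upper = min(size_a, size_b)
    hits = sum(
        comb(size_a, k) * comb(total - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return hits / denom

# Toy example: 10 probes on the array, two lists of 5 probes each.
discovery = {"p1", "p2", "p3", "p4", "p5"}
replication = {"p1", "p2", "p3", "p4", "p5"}
shared = discovery & replication
print(len(shared), overlap_p_value(10, len(discovery), len(replication), len(shared)))
```

    Here all five probes are shared, which has probability 1/252 under random selection.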

  7. Global Landscape of a Co-Expressed Gene Network in Barley and its Application to Gene Discovery in Triticeae Crops

    PubMed Central

    Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo

    2011-01-01

    Accumulated transcriptome data can be used to investigate regulatory networks of genes involved in various biological systems. Co-expression analysis data sets generated from comprehensively collected transcriptome data sets now represent efficient resources that are capable of facilitating the discovery of genes with closely correlated expression patterns. In order to construct a co-expression network for barley, we analyzed 45 publicly available experimental series, which are composed of 1,347 sets of GeneChip data for barley. On the basis of a gene-to-gene weighted correlation coefficient, we constructed a global barley co-expression network and classified it into clusters of subnetwork modules. The resulting clusters are candidates for functional regulatory modules in the barley transcriptome. To annotate each of the modules, we performed comparative annotation using genes in Arabidopsis and Brachypodium distachyon. On the basis of a comparative analysis between barley and two model species, we investigated functional properties from the representative distributions of the gene ontology (GO) terms. Modules putatively involved in drought stress response and cellulose biogenesis have been identified. These modules are discussed to demonstrate the effectiveness of the co-expression analysis. Furthermore, we applied the data set of co-expressed genes coupled with comparative analysis in attempts to discover potentially Triticeae-specific network modules. These results demonstrate that analysis of the co-expression network of the barley transcriptome together with comparative analysis should promote the process of gene discovery in barley. Furthermore, the insights obtained should be transferable to investigations of Triticeae plants. The associated data set generated in this analysis is publicly accessible at http://coexpression.psc.riken.jp/barley/. PMID:21441235

  8. Interpreting linear support vector machine models with heat map molecule coloring

    PubMed Central

    2011-01-01

    Background: Model-based virtual screening plays an important role in the early drug discovery stage. The outcomes of high-throughput screenings are a valuable source for machine learning algorithms to infer such models. Besides strong performance, the interpretability of a machine learning model is a desired property to guide the optimization of a compound in later drug discovery stages. Linear support vector machines have been shown to perform convincingly on large-scale data sets. The goal of this study is to present a heat map molecule coloring technique to interpret linear support vector machine models. Based on the weights of a linear model, the visualization approach colors each atom and bond of a compound according to its importance for activity. Results: We evaluated our approach on a toxicity data set, a chromosome aberration data set, and the maximum unbiased validation data sets. The experiments show that our method sensibly visualizes structure-property and structure-activity relationships of a linear support vector machine model. The coloring of ligands in the binding pocket of several crystal structures of a maximum unbiased validation data set target indicates that our approach helps to determine the correct ligand orientation in the binding pocket. Additionally, the heat map coloring enables the identification of substructures important for the binding of an inhibitor. Conclusions: In combination with heat map coloring, linear support vector machine models can help to guide the modification of a compound in later stages of drug discovery. In particular, substructures identified as important by our method might be a starting point for optimization of a lead compound. The heat map coloring should be considered complementary to structure-based modeling approaches. As such, it helps to get a better understanding of the binding mode of an inhibitor. PMID:21439031
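
    The idea of turning linear model weights into per-atom importance can be sketched as follows. This is an assumed simplification, not the published method: it supposes each fingerprint feature is known to cover a set of atom indices, and it adds each feature's weight to every atom it covers. The feature names, weights, and atom indices are hypothetical.

```python
def atom_scores(n_atoms, feature_atoms, weights):
    """Sum, for every atom, the linear-model weights of all features
    (substructures) that contain that atom."""
    scores = [0.0] * n_atoms
    for feature, atoms in feature_atoms.items():
        weight = weights.get(feature, 0.0)  # unseen features contribute nothing
        for atom in atoms:
            scores[atom] += weight
    return scores

def normalize(scores):
    """Scale scores into [-1, 1] so they can be mapped onto a color ramp."""
    max_abs = max((abs(s) for s in scores), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in scores]
    return [s / max_abs for s in scores]

# Hypothetical three-atom molecule with two overlapping substructure features.
feature_atoms = {"f1": [0, 1], "f2": [1, 2]}
weights = {"f1": 0.5, "f2": -0.25}
raw = atom_scores(3, feature_atoms, weights)
print(raw)             # [0.5, 0.25, -0.25]
print(normalize(raw))  # [1.0, 0.5, -0.5]
```

    Positive values would then be drawn in one color and negative values in another, with intensity following the magnitude.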

  9. Metagenomic Analysis of Upwelling-Affected Brazilian Coastal Seawater Reveals Sequence Domains of Type I PKS and Modular NRPS

    PubMed Central

    Cuadrat, Rafael R. C.; Cury, Juliano C.; Dávila, Alberto M. R.

    2015-01-01

    Marine environments harbor a wide range of microorganisms from the three domains of life. These microorganisms have great potential to enable discovery of new enzymes and bioactive compounds for industrial use. However, only ~1% of microorganisms from the environment can currently be identified through cultured isolates, limiting the discovery of new compounds. To overcome this limitation, a metagenomics approach has been widely adopted for biodiversity studies on samples from marine environments. In this study, we screened metagenomes in order to estimate the potential for new natural compound synthesis mediated by diversity in the Polyketide Synthase (PKS) and Nonribosomal Peptide Synthetase (NRPS) genes. The samples were collected from the Praia dos Anjos (Angel’s Beach) surface water, Arraial do Cabo (Rio de Janeiro state, Brazil), an environment affected by upwelling. In order to evaluate the potential for screening natural products in Arraial do Cabo samples, we used KS (keto-synthase) and C (condensation) domains (from PKS and NRPS, respectively) to build Hidden Markov Models (HMMs). From both samples, a total of 84 KS and 46 C novel domain sequences were obtained, showing the potential of this environment for the discovery of new genes of biotechnological interest. These domains were classified by phylogenetic analysis, and this was the first study conducted to screen PKS and NRPS genes in an upwelling-affected sample. PMID:26633360

  10. Sampling Operations on Big Data

    DTIC Science & Technology

    2015-11-29

    gories. These include edge sampling methods where edges are selected by a predetermined criteria; snowball sampling methods where algorithms start... Sampling Operations on Big Data Vijay Gadepally, Taylor Herr, Luke Johnson, Lauren Milechin, Maja Milosavljevic, Benjamin A. Miller Lincoln...process and disseminate information for discovery and exploration under real-time constraints. Common signal processing operations such as sampling and

  11. Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery.

    PubMed

    Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I

    2017-01-01

    A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffers when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
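
    The stability criterion mentioned in this abstract can be sketched as follows. The Tanimoto (Jaccard) distance between two sets is standard; averaging it over all pairs of runs is an assumption about how stability might be summarized, and the gene names are hypothetical.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """Return 1 - |A and B| / |A or B| for two feature sets (0.0 = identical)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_distance(feature_sets):
    """Average Tanimoto distance over all pairs of selected-feature sets."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature sets selected in three repeated runs.
runs = [{"g1", "g2", "g3"}, {"g2", "g3", "g4"}, {"g1", "g2", "g3"}]
stability = mean_pairwise_distance(runs)
```

    A lower mean distance indicates that repeated runs select more similar feature sets.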

  12. The association of genome-wide significant spirometric loci with chronic obstructive pulmonary disease susceptibility.

    PubMed

    Castaldi, Peter J; Cho, Michael H; Litonjua, Augusto A; Bakke, Per; Gulsvik, Amund; Lomas, David A; Anderson, Wayne; Beaty, Terri H; Hokanson, John E; Crapo, James D; Laird, Nan; Silverman, Edwin K

    2011-12-01

    Two recent metaanalyses of genome-wide association studies conducted by the CHARGE and SpiroMeta consortia identified novel loci yielding evidence of association at or near genome-wide significance (GWS) with FEV(1) and FEV(1)/FVC. We hypothesized that a subset of these markers would also be associated with chronic obstructive pulmonary disease (COPD) susceptibility. Thirty-two single-nucleotide polymorphisms (SNPs) in or near 17 genes in 11 previously identified GWS spirometric genomic regions were tested for association with COPD status in four COPD case-control study samples (NETT/NAS, the Norway case-control study, ECLIPSE, and the first 1,000 subjects in COPDGene; total sample size, 3,456 cases and 1,906 controls). In addition to testing the 32 spirometric GWS SNPs, we tested a dense panel of imputed HapMap2 SNP markers from the 17 genes located near the 32 GWS SNPs and in a set of 21 well studied COPD candidate genes. Of the previously identified GWS spirometric genomic regions, three loci harbored SNPs associated with COPD susceptibility at a 5% false discovery rate: the 4q24 locus including FLJ20184/INTS12/GSTCD/NPNT, the 6p21 locus including AGER and PPT2, and the 5q33 locus including ADAM19. In conclusion, markers previously associated at or near GWS with spirometric measures were tested for association with COPD status in data from four COPD case-control studies, and three loci showed evidence of association with COPD susceptibility at a 5% false discovery rate.
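
    As an illustration of what testing "at a 5% false discovery rate" involves, the sketch below applies the Benjamini-Hochberg step-up procedure. The abstract does not state which FDR procedure was used, so the choice of Benjamini-Hochberg is an assumption, and the p-values are invented for the example.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR q.

    Step-up rule: find the largest rank k with p_(k) <= k * q / m and
    reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Invented p-values for ten hypothetical SNP association tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(pvals, q=0.05)
```

    With these invented values only the two smallest p-values pass, even though five are below the unadjusted 0.05 threshold.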

  13. Story Telling With Storyboards: Enhancements and Experiences

    NASA Astrophysics Data System (ADS)

    King, T. A.; Grayzeck, E. J.; Galica, C.; Erickson, K. J.

    2016-12-01

    A year ago a tool to help tell stories, called the Planetary Data Storyboard, was introduced. This tool is designed to use today's technologies to tell stories that are rich multi-media experiences, blending text, animations, movies and infographics. The Storyboard tool presents a set of panels that contain representative images of an event with associated notes or instructions. The panels are arranged in a timeline that allows a user to experience a discovery or event in the same way it occurred. Each panel can link to a more detailed source such as a publication, the data that was collected or items derived from the research (like movies or animations). A storyboard can be used to make science discovery more accessible to people by presenting events in an easy to follow layout. A storyboard can also help to teach the scientific method, by following the experiences of a researcher as they investigate a phenomenon or try to understand a new set of observations. We present the new features of the Storyboard tool and show example stories for scientific discoveries.

  14. Shell appraising deepwater discovery off Philippines

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Scherer, M.; Lambers, E.J.T.; Steffens, G.S.

    1993-05-10

    Shell International Petroleum Co. Ltd. negotiated a farmout in 1990 from Occidental International Exploration and Production Co. for Block SC-38 in the South China Sea off Palawan, Philippines, following Oxy's discovery of gas in 1989 in a Miocene Nido limestone buildup. Under the terms of the farmout agreement, Shell became operator with a 50% share. Following the disappointing well North Iloc 1, Shell was successful in finding oil and gas in Malampaya 1. Water 700-1,000 m deep, remoteness, and adverse weather conditions have imposed major challenges for offshore operations. The paper describes the tectonic setting; the Nido limestone play; the Malampaya discovery; and Shell's appraisal studies.

  15. Effectiveness of discovery learning model on mathematical problem solving

    NASA Astrophysics Data System (ADS)

    Herdiana, Yunita; Wahyudin; Sispiyati, Ririn

    2017-08-01

    This research aims to describe the effectiveness of the discovery learning model on mathematical problem solving. It investigates students' problem-solving competency before and after being taught with the discovery learning model. The population was grade VII students at a junior high school in West Bandung Regency. From nine classes, class VII B was randomly selected as the experimental class and class VII C as the control class, each consisting of 35 students. The method used was a quasi-experiment. The instruments were a pre-test, a worksheet, and a post-test on mathematical problem solving. Based on the research, it can be concluded that the problem-solving competency of students taught with the discovery learning model reached the 80% level, which falls in the medium category, and that the discovery learning model is effective in improving mathematical problem solving.

  16. Biomarker Discovery Using New Metabolomics Software for Automated Processing of High Resolution LC-MS Data

    PubMed Central

    Hnatyshyn, S.; Reily, M.; Shipkova, P.; McClure, T.; Sanders, M.; Peake, D.

    2011-01-01

    Robust biomarkers of target engagement and efficacy are required in different stages of drug discovery. Liquid chromatography coupled to high resolution mass spectrometry provides the sensitivity, accuracy and wide dynamic range required for identification of endogenous metabolites in biological matrices. LCMS is a widely used tool for biomarker identification and validation. Typical high resolution LCMS profiles from biological samples may contain more than a million mass spectral peaks corresponding to several thousand endogenous metabolites. Reduction of the total number of peaks, component identification and statistical comparison across sample groups remain difficult and time-consuming challenges. Blood samples from four groups of rats (male vs. female, fully satiated and food deprived) were analyzed using high resolution accurate mass (HRAM) LCMS. All samples were separated using a 15 minute reversed-phase C18 LC gradient and analyzed in both positive and negative ion modes. Data were acquired using 15K resolution and 5 ppm mass measurement accuracy. The entire data set was analyzed using software developed in collaboration between Bristol-Myers Squibb and Thermo Fisher Scientific to determine the metabolic effects of food deprivation on rats. Metabolomic LC-MS data files are extraordinarily complex and appropriate reduction of the number of spectral peaks via identification of related peaks and background removal is essential. A single component such as hippuric acid generates more than 20 related peaks including isotopic clusters, adducts and dimers. Plasma and urine may contain 500-1500 unique quantifiable metabolites. Noise filtering approaches including blank subtraction were used to reduce the number of irrelevant peaks. By grouping related signals such as isotopic peaks and alkali adducts, data processing was greatly simplified by reducing the total number of components by 10-fold. The software processes 48 samples in under 60 minutes. 
Principal Component Analysis showed substantial differences in endogenous metabolite levels between the animal groups. Annotation of components was accomplished via searching the ChemSpider database. Tentative assignments made using accurate mass need further verification by comparison with the retention time of authentic standards.
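
    The 5 ppm mass accuracy quoted in this abstract can be made concrete with a short sketch of a ppm-based match between an observed and a theoretical m/z. This is not the software described in the abstract; the function names and the m/z values are hypothetical and chosen only for illustration.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=5.0):
    """True if the observed m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical theoretical m/z and two hypothetical observed peaks.
theoretical = 180.0655
close_peak = 180.0660   # about 2.8 ppm away, accepted at 5 ppm
far_peak = 180.0675     # about 11.1 ppm away, rejected at 5 ppm

matches = [mz for mz in (close_peak, far_peak)
           if within_tolerance(mz, theoretical)]
```

    Because the tolerance is relative, the absolute window widens with m/z: 5 ppm corresponds to 0.0005 at m/z 100 and 0.005 at m/z 1000.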

  17. Association of Genetic Variants in the Neurotrophic Receptor–Encoding Gene NTRK2 and a Lifetime History of Suicide Attempts in Depressed Patients

    PubMed Central

    Kohli, Martin A.; Salyakina, Daria; Pfennig, Andrea; Lucae, Susanne; Horstmann, Sonja; Menke, Andreas; Kloiber, Stefan; Hennings, Johannes; Bradley, Bekh B.; Ressler, Kerry J.; Uhr, Manfred; Müller-Myhsok, Bertram; Holsboer, Florian; Binder, Elisabeth B.

    2013-01-01

    Context: A consistent body of evidence supports a role of reduced neurotrophic signaling in the pathophysiology of major depressive disorder (MDD) and suicidal behavior. Especially in suicide victims, lower postmortem brain messenger RNA and protein levels of neurotrophins and their receptors have been reported. Objective: To determine whether the brain-derived neurotrophic factor (BDNF) gene or its high-affinity receptor gene, receptor tyrosine kinase 2 (NTRK2), confer risk for suicide attempt (SA) and MDD by investigating common genetic variants in these loci. Design: Eighty-three tagging single-nucleotide polymorphisms (SNPs) covering the genetic variability of these loci in European populations were assessed in a case-control association design. Setting: Inpatients and screened control subjects. Participants: The discovery sample consisted of 394 depressed patients, of whom 113 had SA, and 366 matched healthy control subjects. The replication studies comprised 744 German patients with MDD and 921 African American nonpsychiatric clinic patients, of whom 152 and 119 were positive for SA, respectively. Interventions: Blood or saliva samples were collected from each participant for DNA extraction and genotyping. Main Outcome Measures: Associations of SNPs in BDNF and NTRK2 with SA and MDD. Results: Independent SNPs within NTRK2 were associated with SA among depressed patients of the discovery sample that could be confirmed in both the German and African American replication samples. Multilocus interaction analysis revealed that single SNP associations within this locus contribute to the risk of SA in a multiplicative and interactive fashion (P = 4.7 × 10^-7 for a 3-SNP model in the combined German sample). The effect size was 4.5 (95% confidence interval, 2.1–9.8) when patients carrying risk genotypes in all 3 markers were compared with those without any of the 3 risk genotypes. 
Conclusions: Our results suggest that a combination of several independent risk alleles within the NTRK2 locus is associated with SA in depressed patients, further supporting a role of neurotrophins in the pathophysiology of suicide. PMID:20124106

  18. A renaissance of neural networks in drug discovery.

    PubMed

    Baskin, Igor I; Winkler, David; Tetko, Igor V

    2016-08-01

    Neural networks are becoming a very popular method for solving machine learning and artificial intelligence problems. The variety of neural network types and their application to drug discovery requires expert knowledge to choose the most appropriate approach. In this review, the authors discuss traditional and newly emerging neural network approaches to drug discovery. Their focus is on backpropagation neural networks and their variants, self-organizing maps and associated methods, and a relatively new technique, deep learning. The most important technical issues are discussed including overfitting and its prevention through regularization, ensemble and multitask modeling, model interpretation, and estimation of applicability domain. Different aspects of using neural networks in drug discovery are considered: building structure-activity models with respect to various targets; predicting drug selectivity, toxicity profiles, ADMET and physicochemical properties; characteristics of drug-delivery systems and virtual screening. Neural networks continue to grow in importance for drug discovery. Recent developments in deep learning suggest further improvements may be gained in the analysis of large chemical data sets. It's anticipated that neural networks will be more widely used in drug discovery in the future, and applied in non-traditional areas such as drug delivery systems, biologically compatible materials, and regenerative medicine.

  19. At the cross-roads of participatory research and biomarker discovery in autism: the need for empirical data.

    PubMed

    Yusuf, Afiqah; Elsabbagh, Mayada

    2015-12-15

    Identifying biomarkers for autism can improve outcomes for those affected by autism. Engaging the diverse stakeholders in the research process using community-based participatory research (CBPR) can accelerate biomarker discovery into clinical applications. However, there are limited examples of stakeholder involvement in autism research, possibly due to conceptual and practical concerns. We evaluate the applicability of CBPR principles to biomarker discovery in autism and critically review empirical studies adopting these principles. Using a scoping review methodology, we identified and evaluated seven studies using CBPR principles in biomarker discovery. The limited number of studies in biomarker discovery adopting CBPR principles coupled with their methodological limitations suggests that such applications are feasible but challenging. These studies illustrate three CBPR themes: community assessment, setting global priorities, and collaboration in research design. We propose that further research using participatory principles would be useful in accelerating the pace of discovery and the development of clinically meaningful biomarkers. For this goal to be successful we advocate for increased attention to previously identified conceptual and methodological challenges to participatory approaches in health research, including improving scientific rigor and developing long-term partnerships among stakeholders.

  20. Where Will All Your Samples Go?

    NASA Astrophysics Data System (ADS)

    Lehnert, K.

    2017-12-01

    Even in the digital age, physical samples remain an essential component of Earth and space science research. Geoscientists collect samples, sometimes locally, often in remote locations during expensive field expeditions, or at sample repositories and museums. They take these samples to their labs to describe and analyze them. When the analyses are completed and the results are published, the samples get stored away in sheds, basements, or desk drawers, where they remain unknown and inaccessible to the broad science community. In some cases, they will get re-analyzed or shared with other researchers, who know of their existence through personal connections. The sad end comes when the researcher retires: There are many stories of samples and entire collections being discarded to free up space for new samples or other purposes, even though these samples may be unique and irreplaceable. Institutions do not feel obligated and do not have the resources to store samples in perpetuity. Only samples collected in large sampling campaigns such as the Ocean Discovery Program or cores taken on ships find a home in repositories that curate and preserve them for reuse in future science endeavors. In the era of open, transparent, and reproducible science, preservation and persistent access to samples must be considered a mandate. Policies need to be developed that guide investigators, institutions, and funding agencies to plan and implement solutions for reliably and persistently curating and providing access to samples. Registration of samples in online catalogs and use of persistent identifiers such as the International Geo Sample Number are first steps to ensure discovery and access of samples. But digital discovery and access loses its value if the physical objects are not preserved and accessible. It is unreasonable to expect that every sample ever collected can be archived. Selection of those samples that are worth preserving requires guidelines and policies. 
We also need to define standards that institutions must comply with to function as a trustworthy sample repository similar to trustworthy digital repositories. The iSamples Research Coordination Network of the EarthCube program aims to address some of these questions in workshops planned for 2018. This panel session offers an opportunity to ignite the discussion.

  1. A review of blood sample handling and pre-processing for metabolomics studies.

    PubMed

    Hernandes, Vinicius Veri; Barbas, Coral; Dudzik, Danuta

    2017-09-01

    Metabolomics has been found to be applicable to a wide range of clinical studies, bringing a new era for improving clinical diagnostics, early disease detection, therapy prediction and treatment efficiency monitoring. A major challenge in metabolomics, particularly untargeted studies, is the extremely diverse and complex nature of biological specimens. Despite great advances in the field, there remains a fundamental need to consider pre-analytical variability, which can introduce bias into the subsequent analytical process, decrease the reliability of the results and confound final research outcomes. Many researchers are mainly focused on the instrumental aspects of the biomarker discovery process, and sample-related variables sometimes seem to be overlooked. To bridge the gap, critical information and standardized protocols regarding experimental design and sample handling and pre-processing are highly desired. Characterization of the range of variation among sample collection methods is necessary to prevent misinterpretation of results and to ensure that observed differences are not due to an experimental bias caused by inconsistencies in sample processing. Herein, a systematic discussion of pre-analytical variables affecting metabolomics studies based on blood-derived samples is performed. Furthermore, we provide a set of recommendations concerning experimental design, collection, pre-processing procedures and storage conditions as a practical review that can guide and serve for the standardization of protocols and reduction of undesirable variation. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. Data management integration for biomedical core facilities

    NASA Astrophysics Data System (ADS)

    Zhang, Guo-Qiang; Szymanski, Jacek; Wilson, David

    2007-03-01

    We present the design, development, and pilot-deployment experiences of MIMI, a web-based, Multi-modality Multi-Resource Information Integration environment for biomedical core facilities. This is an easily customizable, web-based software tool that integrates scientific and administrative support for a biomedical core facility involving a common set of entities: researchers; projects; equipment and devices; support staff; services; samples and materials; experimental workflow; large and complex data. With this software, one can: register users; manage projects; schedule resources; bill services; perform site-wide search; archive, back-up, and share data. With its customizable, expandable, and scalable characteristics, MIMI not only provides a cost-effective solution to the overarching data management problem of biomedical core facilities unavailable in the market place, but also lays a foundation for data federation to facilitate and support discovery-driven research.

  3. The challenges of educating the public about astrobiology via the mass media

    NASA Astrophysics Data System (ADS)

    Race, Margaret

    Scientific information in astrobiology is being generated at a pace that traditional textbooks cannot easily match. For the most part, students, teachers and the general public will continue to learn piecemeal about the latest advances in the field through headlines and mass media coverage centered around discoveries and new interpretations as they occur. Yet journalists and reporters are themselves unschooled in this emerging interdisciplinary field. While it is important to continue developing astrobiological curricular materials for future use by students in formal settings, it is equally important to find novel ways for educating the mass media in the interim. Current planning in anticipation of a Mars sample return mission has focused on a variety of ways to enlist the mass media in an educational as well as informational role.

  4. The center for causal discovery of biomedical knowledge from big data

    PubMed Central

    Bahar, Ivet; Becich, Michael J; Benos, Panayiotis V; Berg, Jeremy; Espino, Jeremy U; Glymour, Clark; Jacobson, Rebecca Crowley; Kienholz, Michelle; Lee, Adrian V; Lu, Xinghua; Scheines, Richard

    2015-01-01

    The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools. Collaborative activities with the BD2K Consortium further advance causal discovery tools and integrate tools and resources developed by other centers. PMID:26138794

  5. PCSK9: From Basic Science Discoveries to Clinical Trials.

    PubMed

    Shapiro, Michael D; Tavori, Hagai; Fazio, Sergio

    2018-05-11

    Unknown 15 years ago, PCSK9 (proprotein convertase subtilisin/kexin type 9) is now common parlance among scientists and clinicians interested in prevention and treatment of atherosclerotic cardiovascular disease. What makes this story so special is not its recent discovery nor the fact that it uncovered previously unknown biology but rather that these important scientific insights have been translated into an effective medical therapy in record time. Indeed, the translation of this discovery to a novel therapeutic serves as one of the best examples of how genetic insights can be leveraged into intelligent target drug discovery. The PCSK9 saga is unfolding quickly but is far from complete. Here, we review major scientific understandings as they relate to the role of PCSK9 in lipoprotein metabolism and atherosclerotic cardiovascular disease and the impact that therapies designed to inhibit its action are having in the clinical setting. © 2018 American Heart Association, Inc.

  6. Web-scale discovery in an academic health sciences library: development and implementation of the EBSCO Discovery Service.

    PubMed

    Thompson, Jolinda L; Obrig, Kathe S; Abate, Laura E

    2013-01-01

    Funds made available at the close of the 2010-11 fiscal year allowed purchase of the EBSCO Discovery Service (EDS) for a year-long trial. The appeal of this web-scale discovery product that offers a Google-like interface to library resources was counter-balanced by concerns about quality of search results in an academic health science setting and the challenge of configuring an interface that serves the needs of a diverse group of library users. After initial configuration, usability testing with library users revealed the need for further work before general release. Of greatest concern were continuing issues with the relevance of items retrieved, appropriateness of system-supplied facet terms, and user difficulties with navigating the interface. EBSCO has worked with the library to better understand and identify problems and solutions. External roll-out to users occurred in June 2012.

  7. Key aspects of the Novartis compound collection enhancement project for the compilation of a comprehensive chemogenomics drug discovery screening collection.

    PubMed

    Jacoby, Edgar; Schuffenhauer, Ansgar; Popov, Maxim; Azzaoui, Kamal; Havill, Benjamin; Schopfer, Ulrich; Engeloch, Caroline; Stanek, Jaroslav; Acklin, Pierre; Rigollier, Pascal; Stoll, Friederike; Koch, Guido; Meier, Peter; Orain, David; Giger, Rudolph; Hinrichs, Jürgen; Malagu, Karine; Zimmermann, Jürg; Roth, Hans-Joerg

    2005-01-01

    The NIBR (Novartis Institutes for BioMedical Research) compound collection enrichment and enhancement project integrates corporate internal combinatorial compound synthesis and external compound acquisition activities in order to build up a comprehensive screening collection for a modern drug discovery organization. The main purpose of the screening collection is to supply the Novartis drug discovery pipeline with hit-to-lead compounds for today's and the future's portfolio of drug discovery programs, and to provide tool compounds for the chemogenomics investigation of novel biological pathways and circuits. As such, it integrates designed focused and diversity-based compound sets from the synthetic and natural paradigms able to cope with druggable and currently deemed undruggable targets and molecular interaction modes. Herein, we will summarize together with new trends published in the literature, scientific challenges faced and key approaches taken at NIBR to match the chemical and biological spaces.

  8. KSC-2010-5488

    NASA Image and Video Library

    2010-11-03

    CAPE CANAVERAL, Fla. -- At NASA's Kennedy Space Center in Florida, xenon lights illuminate space shuttle Discovery on Launch Pad 39A following the retraction of the rotating service structure. The structure provides weather protection and access to the shuttle while it awaits lift off on the pad. Launch of Discovery on the STS-133 mission to the International Space Station is set for 3:29 p.m. on Nov. 4. During the 11-day mission, Discovery and its six crew members will deliver the Permanent Multipurpose Module, packed with supplies and critical spare parts, as well as Robonaut 2, to the orbiting laboratory. Discovery, which will fly its 39th mission, is scheduled to be retired following STS-133. This will be the 133rd Space Shuttle Program mission and the 35th shuttle voyage to the space station. For more information on STS-133, visit www.nasa.gov/mission_pages/shuttle/shuttlemissions/sts133/. Photo credit: NASA/Troy Cryder

  9. Evaluation of rapid and simple techniques for the enrichment of viruses prior to metagenomic virus discovery.

    PubMed

    Hall, Richard J; Wang, Jing; Todd, Angela K; Bissielo, Ange B; Yen, Seiha; Strydom, Hugo; Moore, Nicole E; Ren, Xiaoyun; Huang, Q Sue; Carter, Philip E; Peacey, Matthew

    2014-01-01

    The discovery of new or divergent viruses using metagenomics and high-throughput sequencing has become more commonplace. The preparation of a sample is known to have an effect on the representation of virus sequences within the metagenomic dataset yet comparatively little attention has been given to this. Physical enrichment techniques are often applied to samples to increase the number of viral sequences and therefore enhance the probability of detection. With the exception of virus ecology studies, there is a paucity of information available to researchers on the type of sample preparation required for a viral metagenomic study that seeks to identify an aetiological virus in an animal or human diagnostic sample. A review of published virus discovery studies revealed the most commonly used enrichment methods, that were usually quick and simple to implement, namely low-speed centrifugation, filtration, nuclease-treatment (or combinations of these) which have been routinely used but often without justification. These were applied to a simple and well-characterised artificial sample composed of bacterial and human cells, as well as DNA (adenovirus) and RNA viruses (influenza A and human enterovirus), being either non-enveloped capsid or enveloped viruses. The effect of the enrichment method was assessed by both quantitative real-time PCR and metagenomic analysis that incorporated an amplification step. Reductions in the absolute quantities of bacteria and human cells were observed for each method as determined by qPCR, but the relative abundance of viral sequences in the metagenomic dataset remained largely unchanged. A 3-step method of centrifugation, filtration and nuclease-treatment showed the greatest increase in the proportion of viral sequences. 
This study provides a starting point for the selection of a purification method in future virus discovery studies, and highlights the need for more data to validate the effect of enrichment methods on different sample types, amplification, bioinformatics approaches and sequencing platforms. This study also highlights the potential risks that may attend selection of a virus enrichment method without any consideration for the sample type being investigated. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.

  10. An investigation of the effects of relevant samples and a comparison of verification versus discovery based lab design

    NASA Astrophysics Data System (ADS)

    Rieben, James C., Jr.

    This study focuses on the effects of relevance and lab design on student learning within the chemistry laboratory environment. A general chemistry conductivity of solutions experiment and an upper level organic chemistry cellulose regeneration experiment were employed. In the conductivity experiment, the two main variables studied were the effect of relevant (or "real world") samples on student learning and a verification-based lab design versus a discovery-based lab design. With the cellulose regeneration experiment, the effect of a discovery-based lab design vs. a verification-based lab design was the sole focus. Evaluation surveys consisting of six questions were used at three different times to assess student knowledge of experimental concepts. In the general chemistry laboratory portion of this study, four experimental variants were employed to investigate the effect of relevance and lab design on student learning. These variants consisted of a traditional (or verification) lab design, a traditional lab design using "real world" samples, a new lab design employing real world samples/situations using unknown samples, and the new lab design using real world samples/situations that were known to the student. Data used in this analysis were collected during the Fall 08, Winter 09, and Fall 09 terms. For the second part of this study a cellulose regeneration experiment was employed to investigate the effects of lab design. A demonstration creating regenerated cellulose "rayon" was modified and converted to an efficient and low-waste experiment. In the first variant students tested their products and verified a list of physical properties. In the second variant, students filled in a blank physical property chart with their own experimental results for the physical properties. 
Results from the conductivity experiment show significant student learning of the effects of concentration on conductivity and how to use conductivity to differentiate solution types with the use of real world samples. In the organic chemistry experiment, results suggest that the discovery-based design improved student retention of the chain length differentiation by physical properties relative to the verification-based design.

  11. The forest ecosystem of southeast Alaska: 1. The setting.

    Treesearch

    Arland S. Harris; O. Keith Hutchison; William R. Meehan; Douglas N. Swanston; Austin E. Helmers; John C. Hendee; Thomas M. Collins

    1974-01-01

    A description of the discovery and exploration of southeast Alaska sets the scene for a discussion of the physical and biological features of this region. Subjects discussed include geography, climate, vegetation types, geology, minerals, forest products, soils, fish, wildlife, water, recreation, and aesthetic values. This is the first of a series of publications...

  12. Is Particle Physics Ready for the LHC

    ScienceCinema

    Lykken, Joseph

    2017-12-09

    The advent of the Large Hadron Collider in 2007 entails daunting challenges to particle physicists. The first set of challenges will arise from trying to separate new physics from old. The second set of challenges will come in trying to interpret the new discoveries. I will describe a few of the scariest examples.

  13. Algorithms on Flag Manifolds for Knowledge Discovery in N-way Arrays

    DTIC Science & Technology

    2015-11-20

    that three of 18 subjects will become symptomatic after only 8 hours. Host pathway analysis of a human endotoxin gene expression data set revealed a 14 pathway signature that identified symptomatic subjects within 2-3 hours post

  14. The Pleasure of Discovery: Medieval Literature in Adolescent Novels Set in the Middle Ages.

    ERIC Educational Resources Information Center

    Barnhouse, Rebecca

    1999-01-01

    Discusses three recent novels for young adults set in medieval times, illustrating several ways that modern writers incorporate medieval material into fiction. Argues that pairing such novels with medieval texts such as "Beowulf" and "The Canterbury Tales" offers opportunities to explore traditional literary topics while providing a gateway into…

  15. Straightforward hit identification approach in fragment-based discovery of bromodomain-containing protein 4 (BRD4) inhibitors.

    PubMed

    Borysko, Petro; Moroz, Yurii S; Vasylchenko, Oleksandr V; Hurmach, Vasyl V; Starodubtseva, Anastasia; Stefanishena, Natalia; Nesteruk, Kateryna; Zozulya, Sergey; Kondratov, Ivan S; Grygorenko, Oleksandr O

    2018-05-09

    A combination approach of fragment screening and "SAR by catalog" was used for the discovery of bromodomain-containing protein 4 (BRD4) inhibitors. Initial screening of a 3695-fragment library against bromodomain 1 of BRD4 using a thermal shift assay (TSA), followed by initial hit validation, resulted in 73 fragment hits, which were used to construct a follow-up library selected from the available screening collection. Additionally, analogs of inactive fragments, as well as a set of randomly selected compounds, were also prepared (3 × 3200 compounds in total). Screening of the resulting sets using TSA, followed by re-testing at several concentrations, a counter-screen, and a TR-FRET assay resulted in 18 confirmed hits. Compounds derived from the initial fragment set showed a better hit rate than the other two sets. Finally, building dose-response curves revealed three compounds with IC50 = 1.9-7.4 μM. For these compounds, binding sites and conformations in BRD4 (4UYD) have been determined by docking. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. Placental Proteomics: A Shortcut to Biological Insight

    PubMed Central

    Robinson, John M.; Vandré, Dale D.; Ackerman, William E.

    2012-01-01

    Proteomics analysis of biological samples has the potential to identify novel protein expression patterns and/or changes in protein expression patterns in different developmental or disease states. An important component of successful proteomics research, at least in its present form, is to reduce the complexity of the sample if it is derived from cells or tissues. One method to simplify complex tissues is to focus on a specific, highly purified sub-proteome. Using this approach we have developed methods to prepare highly enriched fractions of the apical plasma membrane of the syncytiotrophoblast. Through proteomics analysis of this fraction we have identified over five hundred proteins, several of which were previously not known to reside in the syncytiotrophoblast. Herein, we focus on two of these, dysferlin and myoferlin. These proteins, largely known from studies of skeletal muscle, may not have been found in the human placenta were it not for discovery-based proteomics analysis. This new knowledge, acquired through a discovery-driven approach, can now be applied for the generation of hypothesis-based experimentation. Thus discovery-based and hypothesis-based research are complementary approaches that when coupled together can hasten scientific discoveries. PMID:19070895

  17. Discovery of cancer common and specific driver gene sets

    PubMed Central

    2017-01-01

    Cancer is known as a disease mainly caused by gene alterations. Discovery of mutated driver pathways or gene sets is becoming an important step to understand molecular mechanisms of carcinogenesis. However, systematically investigating commonalities and specificities of driver gene sets among multiple cancer types is still a great challenge, but this investigation will undoubtedly benefit deciphering cancers and will be helpful for personalized therapy and precision medicine in cancer treatment. In this study, we propose two optimization models to de novo discover common driver gene sets among multiple cancer types (ComMDP) and specific driver gene sets of one certain or multiple cancer types to other cancers (SpeMDP), respectively. We first apply ComMDP and SpeMDP to simulated data to validate their efficiency. Then, we further apply these methods to 12 cancer types from The Cancer Genome Atlas (TCGA) and obtain several biologically meaningful driver pathways. As examples, we construct a common cancer pathway model for BRCA and OV, infer a complex driver pathway model for BRCA carcinogenesis based on common driver gene sets of BRCA with eight cancer types, and investigate specific driver pathways of the liquid cancer lymphoblastic acute myeloid leukemia (LAML) versus other solid cancer types. In these processes more candidate cancer genes are also found. PMID:28168295

  18. Genes@Work: an efficient algorithm for pattern discovery and multivariate feature selection in gene expression data.

    PubMed

    Lepre, Jorge; Rice, J Jeremy; Tu, Yuhai; Stolovitzky, Gustavo

    2004-05-01

    Despite the growing literature devoted to finding differentially expressed genes in assays probing different tissue types, little attention has been paid to the combinatorial nature of feature selection inherent to large, high-dimensional gene expression datasets. New flexible data analysis approaches capable of searching relevant subgroups of genes and experiments are needed to understand multivariate associations of gene expression patterns with observed phenotypes. We present in detail a deterministic algorithm to discover patterns of multivariate gene associations in gene expression data. The patterns discovered are differential with respect to a control dataset. The algorithm is exhaustive and efficient, reporting all existent patterns that fit a given input parameter set while avoiding enumeration of the entire pattern space. The value of the pattern discovery approach is demonstrated by finding a set of genes that differentiate between two types of lymphoma. Moreover, these genes are found to behave consistently in an independent dataset produced in a different laboratory using different arrays, thus validating the genes selected using our algorithm. We show that the genes deemed significant in terms of their multivariate statistics will be missed using other methods. Our set of pattern discovery algorithms including a user interface is distributed as a package called Genes@Work. This package is freely available to non-commercial users and can be downloaded from our website (http://www.research.ibm.com/FunGen).

  19. KSC-2009-4294

    NASA Image and Video Library

    2009-07-30

    CAPE CANAVERAL, Fla. – The payload canister rolls onto Launch Pad 39A at NASA's Kennedy Space Center in Florida. Inside is the payload for space shuttle Discovery and the STS-128 mission, the Multi-Purpose Logistics Module Leonardo and the Lightweight Multi-Purpose Experiment Support Structure Carrier. Discovery's 13-day flight will deliver a new crew member and 33,000 pounds of equipment to the International Space Station. The equipment includes science and storage racks, a freezer to store research samples, a new sleeping compartment and the COLBERT treadmill. Launch of Discovery on its STS-128 mission is targeted for August 25. Photo credit: NASA/Jack Pfaller.

  20. KSC-2009-4292

    NASA Image and Video Library

    2009-07-30

    CAPE CANAVERAL, Fla. – The payload canister rolls to Launch Pad 39A at NASA's Kennedy Space Center in Florida. Inside is the payload for space shuttle Discovery and the STS-128 mission, the Multi-Purpose Logistics Module Leonardo and the Lightweight Multi-Purpose Experiment Support Structure Carrier. Discovery's 13-day flight will deliver a new crew member and 33,000 pounds of equipment to the International Space Station. The equipment includes science and storage racks, a freezer to store research samples, a new sleeping compartment and the COLBERT treadmill. Launch of Discovery on its STS-128 mission is targeted for August 25. Photo credit: NASA/Jack Pfaller.

  1. KSC-2009-4293

    NASA Image and Video Library

    2009-07-30

    CAPE CANAVERAL, Fla. – The payload canister rolls toward Launch Pad 39A at NASA's Kennedy Space Center in Florida. Inside is the payload for space shuttle Discovery and the STS-128 mission, the Multi-Purpose Logistics Module Leonardo and the Lightweight Multi-Purpose Experiment Support Structure Carrier. Discovery's 13-day flight will deliver a new crew member and 33,000 pounds of equipment to the International Space Station. The equipment includes science and storage racks, a freezer to store research samples, a new sleeping compartment and the COLBERT treadmill. Launch of Discovery on its STS-128 mission is targeted for August 25. Photo credit: NASA/Jack Pfaller.

  2. Engaging Scientists in Meaningful E/PO: How the NASA SMD E/PO Community Addresses the Needs of the Higher Ed Community

    NASA Astrophysics Data System (ADS)

    Manning, James; Meinke, Bonnie K.; Schultz, Gregory R.; Smith, Denise A.; Lawton, Brandon L.; Gurton, Suzanne; NASA Astrophysics E/PO Community

    2015-01-01

    The NASA Astrophysics Science Education and Public Outreach Forum (SEPOF) coordinates the work of NASA Science Mission Directorate (SMD) Astrophysics EPO projects and their teams to bring cutting-edge discoveries of NASA missions to the introductory astronomy college classroom. The Astrophysics Forum assists scientist and educator involvement in SMD E/PO (uniquely poised to foster collaboration between scientists with content expertise and educators with pedagogy expertise) and makes SMD E/PO resources and expertise accessible to the science and education communities. We present three new opportunities for college instructors to bring the latest NASA discoveries in Astrophysics into their classrooms. To address the expressed needs of the higher education community, the Astrophysics Forum collaborated with the Astrophysics E/PO community, researchers, and Astronomy 101 instructors to place individual science discoveries and learning resources into context for higher education audiences. Among these resources are two Resource Guides on the topics of cosmology and exoplanets, each including a variety of accessible sources. The Astrophysics Forum also coordinates the development of the Astro 101 slide set series--5 to 7-slide presentations on new discoveries from NASA Astrophysics missions relevant to topics in introductory astronomy courses. 
These sets enable Astronomy 101 instructors to include new discoveries not yet in their textbooks into the broader context of the course: http://www.astrosociety.org/education/astronomy-resource-guides/. The Astrophysics Forum also coordinated the development of 12 monthly Universe Discovery Guides, each featuring a theme and a representative object well-placed for viewing, with an accompanying interpretive story, strategies for conveying the topics, and supporting NASA-approved education activities and background information from a spectrum of NASA missions and programs: http://nightsky.jpl.nasa.gov/news-display.cfm?News_ID=611. These resources help enhance the Science, Technology, Engineering, and Mathematics (STEM) experiences of undergraduates.

  3. Accounting for control mislabeling in case-control biomarker studies.

    PubMed

    Rantalainen, Mattias; Holmes, Chris C

    2011-12-02

    In biomarker discovery studies, uncertainty associated with case and control labels is often overlooked. By omitting to take into account label uncertainty, model parameters and the predictive risk can become biased, sometimes severely. The most common situation is when the control set contains an unknown number of undiagnosed, or future, cases. This has a marked impact in situations where the model needs to be well-calibrated, e.g., when the prediction performance of a biomarker panel is evaluated. Failing to account for class label uncertainty may lead to underestimation of classification performance and bias in parameter estimates. This can further impact on meta-analysis for combining evidence from multiple studies. Using a simulation study, we outline how conventional statistical models can be modified to address class label uncertainty leading to well-calibrated prediction performance estimates and reduced bias in meta-analysis. We focus on the problem of mislabeled control subjects in case-control studies, i.e., when some of the control subjects are undiagnosed cases, although the procedures we report are generic. The uncertainty in control status is a particular situation common in biomarker discovery studies in the context of genomic and molecular epidemiology, where control subjects are commonly sampled from the general population with an established expected disease incidence rate.

  4. Longitudinal analyses of the DNA methylome in deployed military servicemen identify susceptibility loci for post-traumatic stress disorder.

    PubMed

    Rutten, B P F; Vermetten, E; Vinkers, C H; Ursini, G; Daskalakis, N P; Pishva, E; de Nijs, L; Houtepen, L C; Eijssen, L; Jaffe, A E; Kenis, G; Viechtbauer, W; van den Hove, D; Schraut, K G; Lesch, K-P; Kleinman, J E; Hyde, T M; Weinberger, D R; Schalkwyk, L; Lunnon, K; Mill, J; Cohen, H; Yehuda, R; Baker, D G; Maihofer, A X; Nievergelt, C M; Geuze, E; Boks, M P M

    2018-05-01

    In order to determine the impact of the epigenetic response to traumatic stress on post-traumatic stress disorder (PTSD), this study examined longitudinal changes of genome-wide blood DNA methylation profiles in relation to the development of PTSD symptoms in two prospective military cohorts (one discovery and one replication data set). In the first cohort consisting of male Dutch military servicemen (n=93), the emergence of PTSD symptoms over a deployment period to a combat zone was significantly associated with alterations in DNA methylation levels at 17 genomic positions and 12 genomic regions. Evidence for mediation of the relation between combat trauma and PTSD symptoms by longitudinal changes in DNA methylation was observed at several positions and regions. Bioinformatic analyses of the reported associations identified significant enrichment in several pathways relevant for symptoms of PTSD. Targeted analyses of the significant findings from the discovery sample in an independent prospective cohort of male US marines (n=98) replicated the observed relation between decreases in DNA methylation levels and PTSD symptoms at genomic regions in ZFP57, RNF39 and HIST1H2APS2. Together, our study pinpoints three novel genomic regions where longitudinal decreases in DNA methylation across the period of exposure to combat trauma mark susceptibility for PTSD.

  5. Assessing the Impact of Assemblers on Virus Detection in a De Novo Metagenomic Analysis Pipeline.

    PubMed

    White, Daniel J; Wang, Jing; Hall, Richard J

    2017-09-01

    Applying high-throughput sequencing to pathogen discovery is a relatively new field, the objective of which is to find disease-causing agents when little or no background information on disease is available. Key steps in the process are the generation of millions of sequence reads from an infected tissue sample, followed by assembly of these reads into longer, contiguous stretches of nucleotide sequences, and then identification of the contigs by matching them to known databases, such as those stored at GenBank or Ensembl. This technique, that is, de novo metagenomics, is particularly useful when the pathogen is viral and strong discriminatory power can be achieved. However, recently, we found that striking differences in results can be achieved when different assemblers were used. In this study, we test formally the impact of five popular assemblers (MIRA, VELVET, METAVELVET, SPADES, and OMEGA) on the detection of a novel virus and assembly of its whole genome in a data set for which we have confirmed the presence of the virus by empirical laboratory techniques, and compare the overall performance between assemblers. Our results show that if results from only one assembler are considered, biologically important reads can easily be overlooked. The impacts of these results on the field of pathogen discovery are considered.

  6. On the Discovery of Evolving Truth

    PubMed Central

    Li, Yaliang; Li, Qi; Gao, Jing; Su, Lu; Zhao, Bo; Fan, Wei; Han, Jiawei

    2015-01-01

    In the era of big data, information regarding the same objects can be collected from increasingly more sources. Unfortunately, there usually exist conflicts among the information coming from different sources. To tackle this challenge, truth discovery, i.e., to integrate multi-source noisy information by estimating the reliability of each source, has emerged as a hot topic. In many real world applications, however, the information may come sequentially, and as a consequence, the truth of objects as well as the reliability of sources may be dynamically evolving. Existing truth discovery methods, unfortunately, cannot handle such scenarios. To address this problem, we investigate the temporal relations among both object truths and source reliability, and propose an incremental truth discovery framework that can dynamically update object truths and source weights upon the arrival of new data. Theoretical analysis is provided to show that the proposed method is guaranteed to converge at a fast rate. The experiments on three real world applications and a set of synthetic data demonstrate the advantages of the proposed method over state-of-the-art truth discovery methods. PMID:26705502
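The idea summarized in this abstract (weight each source by its estimated reliability, and update both truths and weights as data arrive) can be illustrated with a minimal sketch. This is not the authors' framework: the inverse-error weighting, the `decay` factor, and all function and variable names below are assumptions chosen for illustration only.

```python
def weighted_truths(claims, weights):
    """Estimate each object's truth as the weight-averaged claim about it."""
    truths = {}
    for obj, by_source in claims.items():
        total = sum(weights[s] for s in by_source)
        truths[obj] = sum(weights[s] * v for s, v in by_source.items()) / total
    return truths


def incremental_truth_discovery(batches, sources, decay=0.5, eps=1e-6):
    """Process batches in arrival order, updating truths and source weights.

    Assumption: a source's weight is the inverse of its exponentially
    decayed squared error against the truths estimated so far.
    """
    errors = {s: 0.0 for s in sources}
    weights = {s: 1.0 for s in sources}   # start with equal trust
    history = []
    for claims in batches:
        truths = weighted_truths(claims, weights)
        history.append(truths)
        for s in sources:
            batch_error = sum(
                (by_source[s] - truths[obj]) ** 2
                for obj, by_source in claims.items()
                if s in by_source
            )
            # Older errors count less, so reliability can drift over time.
            errors[s] = decay * errors[s] + batch_error
        weights = {s: 1.0 / (errors[s] + eps) for s in sources}
    return history, weights


# Toy data: sources A and B roughly agree, source C is erratic.
batches = [
    {"x": {"A": 10.0, "B": 10.2, "C": 15.0}},
    {"x": {"A": 11.0, "B": 11.1, "C": 5.0}},
]
history, weights = incremental_truth_discovery(batches, ["A", "B", "C"])
```

In this toy run the erratic source ends up with the smallest weight, so the second estimate stays closer to the values reported by A and B than a plain average would.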

  7. Automated DBS microsampling, microscale automation and microflow LC-MS for therapeutic protein PK.

    PubMed

    Zhang, Qian; Tomazela, Daniela; Vasicek, Lisa A; Spellman, Daniel S; Beaumont, Maribel; Shyong, BaoJen; Kenny, Jacqueline; Fauty, Scott; Fillgrove, Kerry; Harrelson, Jane; Bateman, Kevin P

    2016-04-01

    Reduce animal usage for discovery-stage PK studies for biologics programs using microsampling-based approaches and microscale LC-MS. We report the development of an automated DBS-based serial microsampling approach for studying the PK of therapeutic proteins in mice. Automated sample preparation and microflow LC-MS were used to enable assay miniaturization and improve overall assay throughput. Serial sampling of mice was possible over the full 21-day study period with the first six time points over 24 h being collected using automated DBS sample collection. Overall, this approach demonstrated comparable data to a previous study using single mice per time point liquid samples while reducing animal and compound requirements by 14-fold. Reduction in animals and drug material is enabled by the use of automated serial DBS microsampling for mice studies in discovery-stage studies of protein therapeutics.

  8. DNA-encoded chemical libraries: advancing beyond conventional small-molecule libraries.

    PubMed

    Franzini, Raphael M; Neri, Dario; Scheuermann, Jörg

    2014-04-15

    DNA-encoded chemical libraries (DECLs) represent a promising tool in drug discovery. DECL technology allows the synthesis and screening of chemical libraries of unprecedented size at moderate costs. In analogy to phage-display technology, where large antibody libraries are displayed on the surface of filamentous phage and are genetically encoded in the phage genome, DECLs feature the display of individual small organic chemical moieties on DNA fragments serving as amplifiable identification barcodes. The DNA-tag facilitates the synthesis and allows the simultaneous screening of very large sets of compounds (up to billions of molecules), because the hit compounds can easily be identified and quantified by PCR-amplification of the DNA-barcode followed by high-throughput DNA sequencing. Several approaches have been used to generate DECLs, differing both in the methods used for library encoding and for the combinatorial assembly of chemical moieties. For example, DECLs can be used for fragment-based drug discovery, displaying a single molecule on DNA or two chemical moieties at the extremities of complementary DNA strands. DECLs can vary substantially in the chemical structures and the library size. While ultralarge libraries containing billions of compounds have been reported containing four or more sets of building blocks, also smaller libraries have been shown to be efficient for ligand discovery. In general, it has been found that the overall library size is a poor predictor for library performance and that the number and diversity of the building blocks are rather important indicators. Smaller libraries consisting of two to three sets of building blocks better fulfill the criteria of drug-likeness and often have higher quality. In this Account, we present advances in the DECL field from proof-of-principle studies to practical applications for drug discovery, both in industry and in academia. 
DECL technology can yield specific binders to a variety of target proteins and is likely to become a standard tool for pharmaceutical hit discovery, lead expansion, and Chemical Biology research. The introduction of new methodologies for library encoding and for compound synthesis in the presence of DNA is an exciting research field and will crucially contribute to the performance and the propagation of the technology.

  9. Identification of Susceptible Loci and Enriched Pathways for Bipolar II Disorder Using Genome-Wide Association Studies.

    PubMed

    Kao, Chung-Feng; Chen, Hui-Wen; Chen, Hsi-Chung; Yang, Jenn-Hwai; Huang, Ming-Chyi; Chiu, Yi-Hang; Lin, Shih-Ku; Lee, Ya-Chin; Liu, Chih-Min; Chuang, Li-Chung; Chen, Chien-Hsiun; Wu, Jer-Yuarn; Lu, Ru-Band; Kuo, Po-Hsiu

    2016-12-01

    This study aimed to identify susceptible loci and enriched pathways for bipolar disorder subtype II. We conducted a genome-wide association scan in discovery samples with 189 bipolar disorder subtype II patients and 1773 controls, and replication samples with 283 bipolar disorder subtype II patients and 500 controls in a Taiwanese Han population using Affymetrix Axiom Genome-Wide CHB1 Array. We performed single-marker and gene-based association analyses, as well as calculated polygenic risk scores for bipolar disorder subtype II. Pathway enrichment analyses were employed to reveal significant biological pathways. Seven markers were found to be associated with bipolar disorder subtype II in meta-analysis combining both discovery and replication samples (P<5.0×10⁻⁶), including markers in or close to MYO16, HSP90AB3P, noncoding gene LOC100507632, and markers in chromosomes 4 and 10. A novel locus, ETF1, was associated with bipolar disorder subtype II (P<6.0×10⁻³) in gene-based association tests. Results of risk evaluation demonstrated that higher genetic risk scores were able to distinguish bipolar disorder subtype II patients from healthy controls in both discovery (P=3.9×10⁻⁴~1.0×10⁻³) and replication samples (2.8×10⁻⁴~1.7×10⁻³). Genetic variance explained by chip markers for bipolar disorder subtype II was substantial in the discovery (55.1%) and replication (60.5%) samples. Moreover, pathways related to neurodevelopmental function, signal transduction, neuronal system, and cell adhesion molecules were significantly associated with bipolar disorder subtype II. We reported novel susceptible loci for pure bipolar subtype II disorder that is less addressed in the literature. Future studies are needed to confirm the roles of these loci for bipolar disorder subtype II. © The Author 2016. Published by Oxford University Press on behalf of CINP.

  10. Evaluating Quality of Aged Archival Formalin-Fixed Paraffin-Embedded Samples for RNA-Sequencing

    EPA Science Inventory

    Archival formalin-fixed paraffin-embedded (FFPE) samples offer a vast, untapped source of genomic data for biomarker discovery. However, the quality of FFPE samples is often highly variable, and conventional methods to assess RNA quality for RNA-sequencing (RNA-seq) are not infor...

  11. Computer-assisted initial diagnosis of rare diseases

    PubMed Central

    Piñol, Marc; Vilaplana, Jordi; Teixidó, Ivan; Cruz, Joaquim; Comas, Jorge; Vilaprinyo, Ester; Sorribas, Albert

    2016-01-01

    Introduction. Most documented rare diseases have genetic origin. Because of their low individual frequency, an initial diagnosis based on phenotypic symptoms is not always easy, as practitioners might never have been exposed to patients suffering from the relevant disease. It is thus important to develop tools that facilitate symptom-based initial diagnosis of rare diseases by clinicians. In this work we aimed at developing a computational approach to aid in that initial diagnosis. We also aimed at implementing this approach in a user friendly web prototype. We call this tool Rare Disease Discovery. Finally, we also aimed at testing the performance of the prototype. Methods. Rare Disease Discovery uses the publicly available ORPHANET data set of association between rare diseases and their symptoms to automatically predict the most likely rare diseases based on a patient’s symptoms. We apply the method to retrospectively diagnose a cohort of 187 rare disease patients with confirmed diagnosis. Subsequently we test the precision, sensitivity, and global performance of the system under different scenarios by running large scale Monte Carlo simulations. All settings account for situations where absent and/or unrelated symptoms are considered in the diagnosis. Results. We find that this expert system has high diagnostic precision (≥80%) and sensitivity (≥99%), and is robust to both absent and unrelated symptoms. Discussion. The Rare Disease Discovery prediction engine appears to provide a fast and robust method for initial assisted differential diagnosis of rare diseases. We coupled this engine with a user-friendly web interface and it can be freely accessed at http://disease-discovery.udl.cat/. The code and most current database for the whole project can be downloaded from https://github.com/Wrrzag/DiseaseDiscovery/tree/no_classifiers. PMID:27547534

  12. Hit and lead criteria in drug discovery for infectious diseases of the developing world.

    PubMed

    Katsuno, Kei; Burrows, Jeremy N; Duncan, Ken; Hooft van Huijsduijnen, Rob; Kaneko, Takushi; Kita, Kiyoshi; Mowbray, Charles E; Schmatz, Dennis; Warner, Peter; Slingsby, B T

    2015-11-01

    Reducing the burden of infectious diseases that affect people in the developing world requires sustained collaborative drug discovery efforts. The quality of the chemical starting points for such projects is a key factor in improving the likelihood of clinical success, and so it is important to set clear go/no-go criteria for the progression of hit and lead compounds. With this in mind, the Japanese Global Health Innovative Technology (GHIT) Fund convened with experts from the Medicines for Malaria Venture, the Drugs for Neglected Diseases initiative and the TB Alliance, together with representatives from the Bill & Melinda Gates Foundation, to set disease-specific criteria for hits and leads for malaria, tuberculosis, visceral leishmaniasis and Chagas disease. Here, we present the agreed criteria and discuss the underlying rationale.

  13. Design of a fragment library that maximally represents available chemical space.

    PubMed

    Schulz, M N; Landström, J; Bright, K; Hubbard, R E

    2011-07-01

    Cheminformatics protocols have been developed and assessed that identify a small set of fragments which can represent the compounds in a chemical library for use in fragment-based ligand discovery. Six different methods have been implemented and tested on Input Libraries of compounds from three suppliers. The resulting Fragment Sets have been characterised on the basis of computed physico-chemical properties and their similarity to the Input Libraries. A method that iteratively identifies fragments with the maximum number of similar compounds in the Input Library (Nearest Neighbours) produces the most diverse library. This approach could increase the success of experimental ligand discovery projects, by providing fragments that can be progressed rapidly to larger compounds through access to available similar compounds (known as SAR by Catalog).
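The "Nearest Neighbours" selection described in this abstract (repeatedly pick the fragment with the most similar compounds in the input library) can be sketched as a greedy loop. This is an illustrative sketch only, not the published protocol: the set-based Tanimoto similarity on toy feature sets, the 0.5 threshold, the choice to count only compounds not yet covered by earlier picks, and all names are assumptions.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_fragments(fragments, library, n_pick, threshold=0.5):
    """Greedily pick fragments that have the most similar library compounds.

    Assumption: a compound already covered by an earlier pick no longer
    counts towards later candidates, which pushes the picks apart.
    """
    neighbours = {
        name: {cid for cid, fp in library.items() if tanimoto(ffp, fp) >= threshold}
        for name, ffp in fragments.items()
    }
    picked, covered = [], set()
    while len(picked) < n_pick:
        candidates = [f for f in fragments if f not in picked]
        if not candidates:
            break
        best = max(candidates, key=lambda f: len(neighbours[f] - covered))
        if not neighbours[best] - covered:
            break  # no remaining fragment adds new neighbours
        picked.append(best)
        covered |= neighbours[best]
    return picked, covered


# Toy fingerprints: integers stand in for structural features.
fragments = {"f1": {1, 2}, "f2": {3, 4}, "f3": {1, 2, 3, 8}}
library = {"c1": {1, 2, 5}, "c2": {1, 2, 6}, "c3": {3, 4, 7}, "c4": {1, 2}}
picked, covered = select_fragments(fragments, library, n_pick=2)
```

With these toy sets, `f1` is picked first because it has three similar compounds, and `f2` second because it is the only fragment that covers a compound `f1` does not.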

  14. The clinical impact of recent advances in LC-MS for cancer biomarker discovery and verification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Hui; Shi, Tujin; Qian, Wei-Jun

    2015-12-04

    Mass spectrometry-based proteomics has become an indispensable tool in biomedical research with broad applications ranging from fundamental biology to systems biology and biomarker discovery. Recent advances in LC-MS have made it a major technology in clinical applications, especially in cancer biomarker discovery and verification. To overcome the challenges associated with the analysis of clinical samples, such as the extremely wide dynamic range of protein concentrations in biofluids and the need to perform high-throughput and accurate quantification, significant efforts have been devoted to improving the overall performance of LC-MS-based clinical proteomics. In this review, we summarize the recent advances in LC-MS for cancer biomarker discovery and quantification, and discuss its potential, limitations, and future perspectives.

  15. Modelling and enhanced molecular dynamics to steer structure-based drug discovery.

    PubMed

    Kalyaanamoorthy, Subha; Chen, Yi-Ping Phoebe

    2014-05-01

    The ever-increasing gap between the availabilities of the genome sequences and the crystal structures of proteins remains one of the significant challenges to the modern drug discovery efforts. The knowledge of structure-dynamics-functionalities of proteins is important in order to understand several key aspects of structure-based drug discovery, such as drug-protein interactions, drug binding and unbinding mechanisms and protein-protein interactions. This review presents a brief overview on the different state of the art computational approaches that are applied for protein structure modelling and molecular dynamics simulations of biological systems. We give an essence of how different enhanced sampling molecular dynamics approaches, together with regular molecular dynamics methods, assist in steering the structure based drug discovery processes. Copyright © 2013 Elsevier Ltd. All rights reserved.

  16. Fidelity and enhanced sensitivity of differential transcription profiles following linear amplification of nanogram amounts of endothelial mRNA

    NASA Technical Reports Server (NTRS)

    Polacek, Denise C.; Passerini, Anthony G.; Shi, Congzhu; Francesco, Nadeene M.; Manduchi, Elisabetta; Grant, Gregory R.; Powell, Steven; Bischof, Helen; Winkler, Hans; Stoeckert, Christian J Jr

    2003-01-01

    Although mRNA amplification is necessary for microarray analyses from limited amounts of cells and tissues, the accuracy of transcription profiles following amplification has not been well characterized. We tested the fidelity of differential gene expression following linear amplification by T7-mediated transcription in a well-established in vitro model of cytokine [tumor necrosis factor alpha (TNFalpha)]-stimulated human endothelial cells using filter arrays of 13,824 human cDNAs. Transcriptional profiles generated from amplified antisense RNA (aRNA) (from 100 ng total RNA, approximately 1 ng mRNA) were compared with profiles generated from unamplified RNA originating from the same homogeneous pool. Amplification accurately identified TNFalpha-induced differential expression in 94% of the genes detected using unamplified samples. Furthermore, an additional 1,150 genes were identified as putatively differentially expressed using amplified RNA which remained undetected using unamplified RNA. Of genes sampled from this set, 67% were validated by quantitative real-time PCR as truly differentially expressed. Thus, in addition to demonstrating fidelity in gene expression relative to unamplified samples, linear amplification results in improved sensitivity of detection and enhances the discovery potential of high-throughput screening by microarrays.

  17. High resolution time course analysis of gene expression from the liver and pituitary

    PubMed Central

    Hughes, Michael E.; DiTacchio, Luciano; Hayes, Kevin; Pullivarthy, Sandhya R.; Panda, Satchidananda; Hogenesch, John

    2009-01-01

    In both the suprachiasmatic nucleus and peripheral tissues, the circadian oscillator drives rhythmic transcription of downstream target genes. Recently, a number of studies have used DNA microarrays to systematically identify oscillating transcripts in plants, fruit flies, rats and mice. These studies have identified several dozen to many hundred rhythmically expressed genes by sampling tissues every four hours for one, two, or more days. To extend this work, we have performed DNA microarray analysis on RNA derived from the mouse pituitary sampled every hour for two days. COSOPT and Fisher's G-test were employed at a false-discovery rate less than 5% to identify more than 250 genes in the pituitary that oscillate with a 24-hour period length. We found that increasing the frequency of sampling across the circadian day dramatically increased the statistical power of both COSOPT and Fisher's G-test, resulting in considerably more high-confidence identifications of rhythmic transcripts than previously described. Finally, to extend the utility of these data sets, a web-based resource has been constructed at http://wasabi.itmat.upenn.edu/circa/mouse that is freely available to the research community. PMID:18419295

  18. EXPOSE-E: an ESA astrobiology mission 1.5 years in space.

    PubMed

    Rabbow, Elke; Rettberg, Petra; Barczyk, Simon; Bohmeier, Maria; Parpart, André; Panitz, Corinna; Horneck, Gerda; von Heise-Rotenburg, Ralf; Hoppenbrouwers, Tom; Willnecker, Rainer; Baglioni, Pietro; Demets, René; Dettmann, Jan; Reitz, Guenther

    2012-05-01

    The multi-user facility EXPOSE-E was designed by the European Space Agency to enable astrobiology research in space (low-Earth orbit). On 7 February 2008, EXPOSE-E was carried to the International Space Station (ISS) on the European Technology Exposure Facility (EuTEF) platform in the cargo bay of Space Shuttle STS-122 Atlantis. The facility was installed at the starboard cone of the Columbus module by extravehicular activity, where it remained in space for 1.5 years. EXPOSE-E was returned to Earth with STS-128 Discovery on 12 September 2009 for subsequent sample analysis. EXPOSE-E provided accommodation in three exposure trays for a variety of astrobiological test samples that were exposed to selected space conditions: either to space vacuum, solar electromagnetic radiation at >110 nm and cosmic radiation (trays 1 and 3) or to simulated martian surface conditions (tray 2). Data on UV radiation, cosmic radiation, and temperature were measured every 10 s and downlinked by telemetry. A parallel mission ground reference (MGR) experiment was performed on ground with a parallel set of hardware and samples under simulated space conditions. EXPOSE-E performed a successful 1.5-year mission in space.

  19. Rapid DNA extraction from dried blood spots on filter paper: potential applications in biobanking.

    PubMed

    Choi, Eun-Hye; Lee, Sang Kwang; Ihm, Chunhwa; Sohn, Young-Hak

    2014-12-01

    Dried blood spot (DBS) technology is a microsampling alternative to traditional plasma or serum sampling for pharmaco- or toxicokinetic evaluation. DBS technology has been applied to diagnostic screening in drug discovery, nonclinical, and clinical settings. We have developed an improved elution protocol involving boiling of blood spots dried on Whatman filter paper. The purpose of this study was to compare the quality, purity, and quantity of DNA isolated from frozen blood samples and DBSs. We optimized a method for extraction and estimation of DNA from blood spots dried on filter paper (3-mm FTA card). A single DBS containing 40 μL blood was used. DNA was efficiently extracted in phosphate-buffered saline (PBS) or Tris-EDTA (TE) buffer by incubation at 37°C overnight. DNA was stable in DBSs that were stored at room temperature or frozen. The housekeeping genes GAPDH and beta-actin were used as positive standards for polymerase chain reaction (PCR) validation of general diagnostic screening. Our simple and convenient DBS storage and extraction methods are suitable for diagnostic screening by using very small volumes of blood collected on filter paper, and can be used in biobanks for blood sample storage.

  20. First SN Discoveries from the Dark Energy Survey

    NASA Astrophysics Data System (ADS)

    Abbott, T.; Abdalla, F.; Achitouv, I.; Ahn, E.; Aldering, G.; Allam, S.; Alonso, D.; Amara, A.; Annis, J.; Antonik, M.; Aragon-Salamanca, A.; Armstrong, R.; Ashall, C.; Asorey, J.; Bacon, D.; Balbinot, E.; Banerji, M.; Barbary, K.; Barkhouse, W.; Baruah, L.; Bauer, A.; Bechtol, K.; Becker, M.; Bender, R.; Benoist, C.; Benoit-Levy, A.; Bernardi, M.; Bernstein, G.; Bernstein, J. P.; Bernstein, R.; Bertin, E.; Beynon, E.; Bhattacharya, S.; Biesiadzinski, T.; Biswas, R.; Blake, C.; Bloom, J. S.; Bocquet, S.; Brandt, C.; Bridle, S.; Brooks, D.; Brown, P. J.; Brunner, R.; Buckley-Geer, E.; Burke, D.; Burkert, A.; Busha, M.; Campa, J.; Campbell, H.; Cane, R.; Capozzi, D.; Carlstrom, J.; Carnero Rosell, A.; Carollo, M.; Carrasco-Kind, M.; Carretero, J.; Carter, M.; Casas, R.; Castander, F. J.; Chen, Y.; Chiu, I.; Chue, C.; Clampitt, J.; Clerkin, L.; Cohn, J.; Colless, M.; Copeland, E.; Covarrubias, R. A.; Crittenden, R.; Crocce, M.; Cunha, C.; da Costa, L.; d'Andrea, C.; Das, S.; Das, R.; Davis, T. M.; Deb, S.; DePoy, D.; Derylo, G.; Desai, S.; de Simoni, F.; Devlin, M.; Diehl, H. T.; Dietrich, J.; Dodelson, S.; Doel, P.; Dolag, K.; Efstathiou, G.; Eifler, T.; Erickson, B.; Eriksen, M.; Estrada, J.; Etherington, J.; Evrard, A.; Farrens, S.; Fausti Neto, A.; Fernandez, E.; Ferreira, P. C.; Finley, D.; Fischer, J. A.; Flaugher, B.; Fosalba, P.; Frieman, J.; Furlanetto, C.; Garcia-Bellido, J.; Gaztanaga, E.; Gelman, M.; Gerdes, D.; Giannantonio, T.; Gilhool, S.; Gill, M.; Gladders, M.; Gladney, L.; Glazebrook, K.; Gray, M.; Gruen, D.; Gruendl, R.; Gupta, R.; Gutierrez, G.; Habib, S.; Hall, E.; Hansen, S.; Hao, J.; Heitmann, K.; Helsby, J.; Henderson, R.; Hennig, C.; High, W.; Hirsch, M.; Hoffmann, K.; Holhjem, K.; Honscheid, K.; Host, O.; Hoyle, B.; Hu, W.; Huff, E.; Huterer, D.; Jain, B.; James, D.; Jarvis, M.; Jarvis, M. J.; Jeltema, T.; Johnson, M.; Jouvel, S.; Kacprzak, T.; Karliner, I.; Katsaros, J.; Kent, S.; Kessler, R.; Kim, A.; Kim-Vy, T.; King, L.; Kirk, D.; Kochanek, C.; Kopp, M.; Koppenhoefer, J.; Kovacs, E.; Krause, E.; Kravtsov, A.; Kron, R.; Kuehn, K.; Kuemmel, M.; Kuhlmann, S.; Kunder, A.; Kuropatkin, N.; Kwan, J.; Lahav, O.; Leistedt, B.; Levi, M.; Lewis, P.; Liddle, A.; Lidman, C.; Lilly, S.; Lin, H.; Liu, J.; Lopez-Arenillas, C.; Lorenzon, W.; LoVerde, M.; Ma, Z.; Maartens, R.; Maccrann, N.; Macri, L.; Maia, M.; Makler, M.; Manera, M.; Maraston, C.; March, M.; Markovic, K.; Marriner, J.; Marshall, J.; Marshall, S.; Martini, P.; Marti Sanahuja, P.; Mayers, J.; McKay, T.; McMahon, R.; Melchior, P.; Merritt, K. W.; Merson, A.; Miller, C.; Miquel, R.; Mohr, J.; Moore, T.; Mortonson, M.; Mosher, J.; Mould, J.; Mukherjee, P.; Neilsen, E.; Ngeow, C.; Nichol, R.; Nidever, D.; Nord, B.; Nugent, P.; Ogando, R.; Old, L.; Olsen, J.; Ostrovski, F.; Paech, K.; Papadopoulos, A.; Papovich, C.; Patton, K.; Peacock, J.; Pellegrini, P. S. S.; Peoples, J.; Percival, W.; Perlmutter, S.; Petravick, D.; Plazas, A.; Ponce, R.; Poole, G.; Pope, A.; Refregier, A.; Reyes, R.; Ricker, P.; Roe, N.; Romer, K.; Roodman, A.; Rooney, P.; Ross, A.; Rowe, B.; Rozo, E.; Rykoff, E.; Sabiu, C.; Saglia, R.; Sako, M.; Sanchez, A.; Sanchez, C.; Sanchez, E.; Sanchez, J.; Santiago, B.; Saro, A.; Scarpine, V.; Schindler, R.; Schmidt, B. P.; Schmitt, R. L.; Schubnell, M.; Seitz, S.; Senger, R.; Sevilla, I.; Sharp, R.; Sheldon, E.; Sheth, R.; Smith, R. C.; Smith, M.; Snigula, J.; Soares-Santos, M.; Sobreira, F.; Song, J.; Soumagnac, M.; Spinka, H.; Stebbins, A.; Stoughton, C.; Suchyta, E.; Suhada, R.; Sullivan, M.; Sun, F.; Suntzeff, N.; Sutherland, W.; Swanson, M. E. C.; Sypniewski, A. J.; Szepietowski, R.; Talaga, R.; Tarle, G.; Tarrant, E.; Balan, S. Thaithara; Thaler, J.; Thomas, D.; Thomas, R. C.; Tucker, D.; Uddin, S. A.; Ural, S.; Vikram, V.; Voigt, L.; Walker, A. R.; Walker, T.; Wechsler, R.; Weinberg, D.; Weller, J.; Wester, W.; Wetzstein, M.; White, M.; Wilcox, H.; Wilman, D.; Yanny, B.; Young, J.; Zablocki, A.; Zenteno, A.; Zhang, Y.; Zuntz, J.

    2012-12-01

    The Dark Energy Survey (DES) reports the discovery of the first set of supernovae (SN) from the project. Images were observed as part of the DES Science Verification phase using the newly installed 570-megapixel Dark Energy Camera on the CTIO Blanco 4-m telescope by observers J. Annis, E. Buckley-Geer, and H. Lin. SN observations are planned throughout the observing campaign on a regular cadence of 4-6 days in each of the ten 3-deg² fields in the DES griz filters.

  1. The Ocean World Enceladus

    NASA Astrophysics Data System (ADS)

    Spilker, Linda J.; Cable, Morgan

    2016-06-01

    Does life exist elsewhere in our solar system? This key question has been a major motivator for our exploration beyond Earth. Life as we know it requires liquid water, organic chemistry and energy. As Cassini discoveries have shown, all of these key ingredients appear to exist on Saturn’s tiny moon Enceladus, making it a possible habitat for life. NASA’s Cassini spacecraft arrived at Saturn in July 2004 and began making incredible findings in the Saturn system. Some of the most striking discoveries involved Enceladus. Although Enceladus is only 300 miles in diameter, a huge plume of water ice and water vapor is erupting from a liquid water reservoir under its south pole. Jets and curtains of icy material shoot skyward from a series of four linear fractures nicknamed “tiger stripes”. Over the course of the next decade, Cassini repeatedly flew close to Enceladus and directly sampled its icy plume seven times. Cassini’s sensitive instruments discovered complex organic molecules, salts and silicates in the plume, indicating that the water is in contact with a rocky core. We now know that the liquid reservoir underneath Enceladus’ icy crust is not a regional sea but a global, subsurface ocean. The ocean is salty, much like our own seas. Excess heat originates from the narrow tiger stripes, and tiny silica nanograins in the plume provide evidence for hydrothermal activity on Enceladus’ seafloor. Similar hydrothermal systems on Earth support rich communities of life that contain organisms as large as tubeworms and crabs. With each discovery, Enceladus becomes an increasingly enticing astrobiology target. Could life exist in Enceladus’ ocean? A future mission may answer this question. Cassini was never meant to be a sea-faring mission, and while its instruments have helped answer important questions about the habitability of Enceladus, the question of whether life exists will require a more specialized set of instruments and a targeted mission. Enceladus’ lofting of free samples into space makes it a compelling destination. This research was performed at the Jet Propulsion Laboratory, California Institute of Technology (Caltech), under contract with NASA. Copyright 2016 Caltech. Government sponsorship is acknowledged.

  2. Overview of the SAMPL5 host–guest challenge: Are we doing better?

    PubMed Central

    Yin, Jian; Henriksen, Niel M.; Slochower, David R.; Shirts, Michael R.; Chiu, Michael W.; Mobley, David L.; Gilson, Michael K.

    2016-01-01

    The ability to computationally predict protein-small molecule binding affinities with high accuracy would accelerate drug discovery and reduce its cost by eliminating rounds of trial-and-error synthesis and experimental evaluation of candidate ligands. As academic and industrial groups work toward this capability, there is an ongoing need for datasets that can be used to rigorously test new computational methods. Although protein–ligand data are clearly important for this purpose, their size and complexity make it difficult to obtain well-converged results and to troubleshoot computational methods. Host–guest systems offer a valuable alternative class of test cases, as they exemplify noncovalent molecular recognition but are far smaller and simpler. As a consequence, host–guest systems have been part of the prior two rounds of SAMPL prediction exercises, and they also figure in the present SAMPL5 round. In addition to being blinded, and thus avoiding biases that may arise in retrospective studies, the SAMPL challenges have the merit of focusing multiple researchers on a common set of molecular systems, so that methods may be compared and ideas exchanged. The present paper provides an overview of the host–guest component of SAMPL5, which centers on three different hosts, two octa-acids and a glycoluril-based molecular clip, and two different sets of guest molecules, in aqueous solution. A range of methods were applied, including electronic structure calculations with implicit solvent models; methods that combine empirical force fields with implicit solvent models; and explicit solvent free energy simulations. The most reliable methods tend to fall in the latter class, consistent with results in prior SAMPL rounds, but the level of accuracy is still below that sought for reliable computer-aided drug design. Advances in force field accuracy, modeling of protonation equilibria, electronic structure methods, and solvent models, hold promise for future improvements. PMID:27658802

  3. Overview of the SAMPL5 host-guest challenge: Are we doing better?

    PubMed

    Yin, Jian; Henriksen, Niel M; Slochower, David R; Shirts, Michael R; Chiu, Michael W; Mobley, David L; Gilson, Michael K

    2017-01-01

    The ability to computationally predict protein-small molecule binding affinities with high accuracy would accelerate drug discovery and reduce its cost by eliminating rounds of trial-and-error synthesis and experimental evaluation of candidate ligands. As academic and industrial groups work toward this capability, there is an ongoing need for datasets that can be used to rigorously test new computational methods. Although protein-ligand data are clearly important for this purpose, their size and complexity make it difficult to obtain well-converged results and to troubleshoot computational methods. Host-guest systems offer a valuable alternative class of test cases, as they exemplify noncovalent molecular recognition but are far smaller and simpler. As a consequence, host-guest systems have been part of the prior two rounds of SAMPL prediction exercises, and they also figure in the present SAMPL5 round. In addition to being blinded, and thus avoiding biases that may arise in retrospective studies, the SAMPL challenges have the merit of focusing multiple researchers on a common set of molecular systems, so that methods may be compared and ideas exchanged. The present paper provides an overview of the host-guest component of SAMPL5, which centers on three different hosts, two octa-acids and a glycoluril-based molecular clip, and two different sets of guest molecules, in aqueous solution. A range of methods were applied, including electronic structure calculations with implicit solvent models; methods that combine empirical force fields with implicit solvent models; and explicit solvent free energy simulations. The most reliable methods tend to fall in the latter class, consistent with results in prior SAMPL rounds, but the level of accuracy is still below that sought for reliable computer-aided drug design. Advances in force field accuracy, modeling of protonation equilibria, electronic structure methods, and solvent models, hold promise for future improvements.

  4. Learning predictive models that use pattern discovery--a bootstrap evaluative approach applied in organ functioning sequences.

    PubMed

    Toma, Tudor; Bosman, Robert-Jan; Siebes, Arno; Peek, Niels; Abu-Hanna, Ameen

    2010-08-01

    An important problem in the Intensive Care is how to predict on a given day of stay the eventual hospital mortality for a specific patient. A recent approach to solve this problem suggested the use of frequent temporal sequences (FTSs) as predictors. Methods following this approach were evaluated in the past by inducing a model from a training set and validating the prognostic performance on an independent test set. Although this evaluative approach addresses the validity of the specific models induced in an experiment, it falls short of evaluating the inductive method itself. To achieve this, one must account for the inherent sources of variation in the experimental design. The main aim of this work is to demonstrate a procedure based on bootstrapping, specifically the .632 bootstrap procedure, for evaluating inductive methods that discover patterns, such as FTSs. A second aim is to apply this approach to find out whether a recently suggested inductive method that discovers FTSs of organ functioning status is superior over a traditional method that does not use temporal sequences when compared on each successive day of stay at the Intensive Care Unit. The use of bootstrapping with logistic regression using pre-specified covariates is known in the statistical literature. Using inductive methods of prognostic models based on temporal sequence discovery within the bootstrap procedure is however novel at least in predictive models in the Intensive Care. Our results of applying the bootstrap-based evaluative procedure demonstrate the superiority of the FTS-based inductive method over the traditional method in terms of discrimination as well as accuracy. In addition we illustrate the insights gained by the analyst into the discovered FTSs from the bootstrap samples. Copyright 2010 Elsevier Inc. All rights reserved.
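
    The .632 bootstrap named in this abstract can be sketched in a few lines. The sketch below is not the authors' procedure: it assumes the common form of the estimator, 0.368 × apparent error + 0.632 × out-of-bag error, and it averages the out-of-bag error per bootstrap replicate rather than per observation, which is a simplification. The function names and the toy threshold classifier are hypothetical and stand in for any inductive method, such as one that discovers FTSs.

```python
import random


def bootstrap_632(xs, ys, fit, predict, n_boot=200, seed=0):
    """Estimate misclassification error of an inductive method (fit/predict).

    Simplified .632 bootstrap: 0.368 * apparent error + 0.632 * mean
    out-of-bag error, with the out-of-bag error averaged over replicates.
    """
    rng = random.Random(seed)
    n = len(xs)

    def error(model, idx):
        idx = list(idx)
        return sum(predict(model, xs[i]) != ys[i] for i in idx) / len(idx)

    # Apparent error: train and evaluate on the full data set (optimistic).
    apparent = error(fit(xs, ys), range(n))

    oob_errors = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]   # draw with replacement
        held_out = sorted(set(range(n)) - set(sample))  # cases not drawn
        if not held_out:
            continue
        # The whole inductive method is re-run inside every replicate.
        model = fit([xs[i] for i in sample], [ys[i] for i in sample])
        oob_errors.append(error(model, held_out))

    oob = sum(oob_errors) / len(oob_errors)
    return 0.368 * apparent + 0.632 * oob


# Hypothetical toy learner: threshold halfway between the two class means.
def fit_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    if not pos:
        return float("inf")    # resample had no positives: always predict 0
    if not neg:
        return float("-inf")   # resample had no negatives: always predict 1
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


xs = [0, 1, 2, 3, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
estimate = bootstrap_632(xs, ys, fit_threshold, predict_threshold)
print(f"estimated error: {estimate:.3f}")
```

    Re-running the model induction inside each replicate is what lets the procedure assess the inductive method itself rather than one particular induced model.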

  5. A data-driven, knowledge-based approach to biomarker discovery: application to circulating microRNA markers of colorectal cancer prognosis.

    PubMed

    Vafaee, Fatemeh; Diakos, Connie; Kirschner, Michaela B; Reid, Glen; Michael, Michael Z; Horvath, Lisa G; Alinejad-Rokny, Hamid; Cheng, Zhangkai Jason; Kuncic, Zdenka; Clarke, Stephen

    2018-01-01

    Recent advances in high-throughput technologies have provided an unprecedented opportunity to identify molecular markers of disease processes. This plethora of complex-omics data has simultaneously complicated the problem of extracting meaningful molecular signatures and opened up new opportunities for more sophisticated integrative and holistic approaches. In this era, effective integration of data-driven and knowledge-based approaches for biomarker identification has been recognised as key to improving the identification of high-performance biomarkers, and necessary for translational applications. Here, we have evaluated the role of circulating microRNA as a means of predicting the prognosis of patients with colorectal cancer, which is the second leading cause of cancer-related death worldwide. We have developed a multi-objective optimisation method that effectively integrates a data-driven approach with the knowledge obtained from the microRNA-mediated regulatory network to identify robust plasma microRNA signatures which are reliable in terms of predictive power as well as functional relevance. The proposed multi-objective framework has the capacity to adjust for conflicting biomarker objectives and to incorporate heterogeneous information facilitating systems approaches to biomarker discovery. We have found a prognostic signature of colorectal cancer comprising 11 circulating microRNAs. The identified signature predicts the patients' survival outcome and targets pathways underlying colorectal cancer progression. The altered expression of the identified microRNAs was confirmed in an independent public data set of plasma samples of patients in early stage vs advanced colorectal cancer. Furthermore, the generality of the proposed method was demonstrated across three publicly available miRNA data sets associated with biomarker studies in other diseases.

  6. Identification of candidate cerebrospinal fluid biomarkers in parkinsonism using quantitative proteomics.

    PubMed

    Magdalinou, N K; Noyce, A J; Pinto, R; Lindstrom, E; Holmén-Larsson, J; Holtta, M; Blennow, K; Morris, H R; Skillbäck, T; Warner, T T; Lees, A J; Pike, I; Ward, M; Zetterberg, H; Gobom, J

    2017-04-01

    Neurodegenerative parkinsonian syndromes have significant clinical and pathological overlap, making early diagnosis difficult. Cerebrospinal fluid (CSF) biomarkers may aid the differentiation of these disorders, but other than α-synuclein and neurofilament light chain protein, which have limited diagnostic power, specific protein biomarkers remain elusive. The aim was to study disease mechanisms and to identify, through discovery proteomics, possible CSF diagnostic biomarkers that discriminate parkinsonian syndromes from healthy controls. CSF was collected consecutively from 134 participants: Parkinson's disease (n = 26), atypical parkinsonian syndromes (n = 78, including progressive supranuclear palsy (n = 36), multiple system atrophy (n = 28), corticobasal syndrome (n = 14)), and elderly healthy controls (n = 30). Participants were divided into a discovery and a validation set for analysis. The samples were subjected to tryptic digestion, followed by liquid chromatography-mass spectrometry analysis for identification and relative quantification by isobaric labelling. Candidate protein biomarkers were identified based on the relative abundances of the identified tryptic peptides. Their predictive performance was evaluated by analysis of the validation set. 79 tryptic peptides, derived from 26 proteins, were found to differ significantly between atypical parkinsonism patients and controls. They included acute phase/inflammatory markers and neuronal/synaptic markers, which were respectively increased or decreased in atypical parkinsonism, while their levels in PD subjects were intermediate between controls and atypical parkinsonism. Using an unbiased proteomic approach, proteins were identified that were able to differentiate atypical parkinsonian syndrome patients from healthy controls. Our study indicates that markers that may reflect neuronal function and/or plasticity, such as the amyloid precursor protein, and inflammatory markers may hold future promise as candidate biomarkers in parkinsonism. Copyright © 2017. Published by Elsevier Ltd.

  7. Use of a Machine Learning-Based High Content Analysis Approach to Identify Photoreceptor Neurite Promoting Molecules.

    PubMed

    Fuller, John A; Berlinicke, Cynthia A; Inglese, James; Zack, Donald J

    2016-01-01

    High content analysis (HCA) has become a leading methodology in phenotypic drug discovery efforts. Typical HCA workflows include imaging cells using an automated microscope and analyzing the data using algorithms designed to quantify one or more specific phenotypes of interest. Due to the richness of high content data, unappreciated phenotypic changes may be discovered in existing image sets using interactive machine-learning based software systems. Primary postnatal day four retinal cells from the photoreceptor (PR) labeled QRX-EGFP reporter mice were isolated, seeded, treated with a set of 234 profiled kinase inhibitors and then cultured for 1 week. The cells were imaged with an Acumen plate-based laser cytometer to determine the number and intensity of GFP-expressing, i.e. PR, cells. Wells displaying intensities and counts above threshold values of interest were re-imaged at a higher resolution with an INCell2000 automated microscope. The images were analyzed with an open source HCA analysis tool, PhenoRipper (Rajaram et al., Nat Methods 9:635-637, 2012), to identify the high GFP-inducing treatments that additionally resulted in diverse phenotypes compared to the vehicle control samples. The pyrimidinopyrimidone kinase inhibitor CHEMBL-1766490, a pan kinase inhibitor whose major known targets are p38α and the Src family member lck, was identified as an inducer of photoreceptor neuritogenesis by using the open-source HCA program PhenoRipper. This finding was corroborated using a cell-based method of image analysis that measures quantitative differences in the mean neurite length in GFP expressing cells. Interacting with data using machine learning algorithms may complement traditional HCA approaches by leading to the discovery of small molecule-induced cellular phenotypes in addition to those upon which the investigator is initially focusing.

  8. Human Biomarker Discovery and Predictive Models for Disease Progression for Idiopathic Pneumonia Syndrome Following Allogeneic Stem Cell Transplantation

    PubMed Central

    Schlatzer, Daniela M.; Dazard, Jean-Eudes; Ewing, Rob M.; Ilchenko, Serguei; Tomcheko, Sara E.; Eid, Saada; Ho, Vincent; Yanik, Greg; Chance, Mark R.; Cooke, Kenneth R.

    2012-01-01

    Allogeneic hematopoietic stem cell transplantation (SCT) is the only curative therapy for many malignant and nonmalignant conditions. Idiopathic pneumonia syndrome (IPS) is a frequently fatal complication that limits successful outcomes. Preclinical models suggest that IPS represents an immune-mediated attack on the lung involving elements of both the adaptive and the innate immune system. However, the etiology of IPS in humans is less well understood. To explore the disease pathway and uncover potential biomarkers of disease, we performed two separate label-free proteomics experiments defining the plasma protein profiles of allogeneic SCT patients with IPS. Samples obtained from SCT recipients without complications served as controls. The initial discovery study, intended to explore the disease pathway in humans, identified a set of 81 IPS-associated proteins. These data revealed similarities between the known IPS pathways in mice and the condition in humans, in particular in the acute phase response. In addition, pattern recognition pathways were judged to be significant as a function of development of IPS, and from this pathway we chose the lipopolysaccharide-binding protein (LBP) as a candidate molecular diagnostic for IPS, and verified its increase as a function of disease using an ELISA assay. In a separately designed study, we identified protein-based classifiers that could predict, at day 0 of SCT, patients who: 1) progress to IPS and 2) respond to cytokine neutralization therapy. Using cross-validation strategies, we built highly predictive classifier models of both disease progression and therapeutic response. In sum, data generated in this report confirm previous clinical and experimental findings, provide new insights into the pathophysiology of IPS, identify potential molecular classifiers of the condition, and uncover a set of markers potentially of interest for patient stratification as a basis for individualized therapy. PMID:22337588

  9. Binary Interval Search: a scalable algorithm for counting interval intersections.

    PubMed

    Layer, Ryan M; Skadron, Kevin; Robins, Gabriel; Hall, Ira M; Quinlan, Aaron R

    2013-01-01

    The comparison of diverse genomic datasets is fundamental to understanding genome biology. Researchers must explore many large datasets of genome intervals (e.g. genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect, that is, they overlap and thus share a common genome interval. Given the continued advances in DNA sequencing technologies, efficient methods for measuring statistically significant relationships between many sets of genomic features are crucial for future discovery. We introduce the Binary Interval Search (BITS) algorithm, a novel and scalable approach to interval set intersection. We demonstrate that BITS outperforms existing methods at counting interval intersections. Moreover, we show that BITS is intrinsically suited to parallel computing architectures, such as graphics processing units, by illustrating its utility for efficient Monte Carlo simulations measuring the significance of relationships between sets of genomic intervals. https://github.com/arq5x/bits.
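
    The counting idea behind a binary-search approach to interval intersection can be sketched as follows. This is an illustration under stated assumptions, not the code from the BITS repository: it assumes closed integer intervals on a single chromosome, and the function names are hypothetical. An interval fails to overlap a query only if it ends before the query starts or starts after the query ends, so two binary searches over independently sorted start and end coordinates give the count without enumerating the overlapping intervals.

```python
from bisect import bisect_left, bisect_right


def build_index(intervals):
    """Sort start and end coordinates independently (one-time O(n log n) cost)."""
    starts = sorted(s for s, _ in intervals)
    ends = sorted(e for _, e in intervals)
    return starts, ends


def count_intersections(query, starts, ends):
    """Count indexed closed intervals that overlap the closed interval `query`."""
    q_start, q_end = query
    n = len(starts)
    # Intervals that end strictly before the query starts cannot overlap.
    ended_before = bisect_left(ends, q_start)
    # Intervals that start strictly after the query ends cannot overlap.
    started_after = n - bisect_right(starts, q_end)
    # The two excluded groups are disjoint, so the rest must overlap.
    return n - ended_before - started_after


intervals = [(1, 5), (3, 8), (10, 12), (15, 20)]
starts, ends = build_index(intervals)
print(count_intersections((4, 10), starts, ends))   # 3
print(count_intersections((13, 14), starts, ends))  # 0
```

    Each query costs two O(log n) searches and is independent of every other query, which is consistent with the abstract's remark that the approach suits parallel architectures.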

  10. Contributions of Academic Labs to the Discovery and Development of Chemical Biology Tools

    PubMed Central

    Huryn, Donna M.; Resnick, Lynn O.; Wipf, Peter

    2013-01-01

    The academic setting provides an environment that may foster success in the discovery of certain types of small molecule tools, while proving less suitable in others. For example, small molecule probes for poorly understood systems, those that exploit a specific resident expertise, and those whose commercial return is not apparent are ideally suited to be pursued in a university setting. In this perspective, we highlight five projects that emanated from academic research groups and generated valuable tool compounds that have been used to interrogate biological phenomena: Reactive oxygen species (ROS) sensors, GPR30 agonists and antagonists, selective CB2 agonists, Hsp70 modulators and beta-amyloid PET imaging agents. By continuing to take advantage of the unique expertise resident in university settings, and the ability to pursue novel projects that may have great scientific value, but limited or no immediate commercial value, probes from academic research groups continue to provide useful tools and generate a long-term resource for biomedical researchers. PMID:23672690

  11. Contributions of academic laboratories to the discovery and development of chemical biology tools.

    PubMed

    Huryn, Donna M; Resnick, Lynn O; Wipf, Peter

    2013-09-26

    The academic setting provides an environment that may foster success in the discovery of certain types of small molecule tools while proving less suitable in others. For example, small molecule probes for poorly understood systems, those that exploit a specific resident expertise, and those whose commercial return is not apparent are ideally suited to be pursued in a university setting. In this review, we highlight five projects that emanated from academic research groups and generated valuable tool compounds that have been used to interrogate biological phenomena: reactive oxygen species (ROS) sensors, GPR30 agonists and antagonists, selective CB2 agonists, Hsp70 modulators, and β-amyloid PET imaging agents. By taking advantage of the unique expertise resident in university settings and the ability to pursue novel projects that may have great scientific value but with limited or no immediate commercial value, probes from academic research groups continue to provide useful tools and generate a long-term resource for biomedical researchers.

  12. Detection of Elevated Plasma Levels of EGF Receptor Prior to Breast Cancer Diagnosis among Hormone Therapy Users

    PubMed Central

    Pitteri, Sharon J.; Amon, Lynn M.; Buson, Tina Busald; Zhang, Yuzheng; Johnson, Melissa M.; Chin, Alice; Kennedy, Jacob; Wong, Chee-Hong; Zhang, Qing; Wang, Hong; Lampe, Paul D.; Prentice, Ross L.; McIntosh, Martin W.; Hanash, Samir M.; Li, Christopher I.

    2010-01-01

    Applying advanced proteomic technologies to prospectively collected specimens from large studies is one means of identifying preclinical changes in plasma proteins that are potentially relevant to the early detection of diseases like breast cancer. We conducted fourteen independent quantitative proteomics experiments comparing pooled plasma samples collected from 420 estrogen receptor positive (ER+) breast cancer patients ≤17 months prior to their diagnosis and matched controls. Based on the over 3.4 million tandem mass spectra collected in the discovery set, 503 proteins were quantified of which 57 differentiated cases from controls with a p-value<0.1. Seven of these proteins, for which quantitative ELISA assays were available, were assessed in an independent validation set. Of these candidates, epidermal growth factor receptor (EGFR) was validated as a predictor of breast cancer risk in an independent set of preclinical plasma samples for women overall [odds ratio (OR)=1.44, p-value=0.0008], and particularly for current users of estrogen plus progestin (E+P) menopausal hormone therapy (OR=2.49, p-value=0.0001). Among current E+P users EGFR's sensitivity for breast cancer risk was 31% with 90% specificity. While EGFR's sensitivity and specificity are insufficient for a clinically useful early detection biomarker, this study suggests that proteins that are elevated preclinically in women who go on to develop breast cancer can be discovered and validated using current proteomic technologies. Further studies are warranted to both examine the role of EGFR and to discover and validate other proteins that could potentially be used for breast cancer early detection. PMID:20959476
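
    The sensitivity, specificity and odds ratio quoted in this abstract are standard summaries of a two-by-two table of marker status against case status. As a minimal sketch, assuming a dichotomized marker and made-up counts (the function name and all numbers are illustrative, and the odds ratio computed here is not the one reported in the study, which was presumably model-based), they can be computed as follows:

```python
def diagnostic_summary(tp, fn, fp, tn):
    """Summarize a 2x2 table of marker status (rows) against case status.

    tp: cases with an elevated marker      fn: cases without it
    fp: controls with an elevated marker   tn: controls without it
    """
    sensitivity = tp / (tp + fn)        # fraction of cases flagged
    specificity = tn / (tn + fp)        # fraction of controls not flagged
    odds_ratio = (tp * tn) / (fp * fn)  # cross-product odds ratio
    return sensitivity, specificity, odds_ratio


# Hypothetical counts chosen only to give 31% sensitivity at 90% specificity.
sens, spec, oratio = diagnostic_summary(tp=31, fn=69, fp=10, tn=90)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} OR={oratio:.2f}")
```

    Fixing specificity at 90% corresponds to choosing the marker cut-off so that 10% of controls fall above it; sensitivity is then read off among the cases.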

  13. A genome-wide methylation study on obesity: differential variability and differential methylation.

    PubMed

    Xu, Xiaojing; Su, Shaoyong; Barnes, Vernon A; De Miguel, Carmen; Pollock, Jennifer; Ownby, Dennis; Shi, Hidong; Zhu, Haidong; Snieder, Harold; Wang, Xiaoling

    2013-05-01

    Besides differential methylation, DNA methylation variation has recently been proposed and demonstrated to be a potential contributing factor to cancer risk. Here we aim to examine whether differential variability in methylation is also an important feature of obesity, a typical non-malignant common complex disease. We analyzed genome-wide methylation profiles of over 470,000 CpGs in peripheral blood samples from 48 obese and 48 lean African-American youth aged 14-20 y old. A substantial number of differentially variable CpG sites (DVCs), using statistics based on variances, as well as a substantial number of differentially methylated CpG sites (DMCs), using statistics based on means, were identified. Similar to the findings in cancers, DVCs generally exhibited an outlier structure and were more variable in cases than in controls. By randomly splitting the current sample into a discovery and validation set, we observed that both the DVCs and DMCs identified from the first set could independently predict obesity status in the second set. Furthermore, both the genes harboring DMCs and the genes harboring DVCs showed significant enrichment of genes identified by genome-wide association studies on obesity and related diseases, such as hypertension, dyslipidemia, type 2 diabetes and certain types of cancers, supporting their roles in the etiology and pathogenesis of obesity. We generalized the recent finding on methylation variability in cancer research to obesity and demonstrated that differential variability is also an important feature of obesity-related methylation changes. Future studies on the epigenetics of obesity will benefit from both statistics based on means and statistics based on variances.
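
    The distinction between statistics based on means (DMCs) and statistics based on variances (DVCs) can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the authors' pipeline: it uses a Welch t statistic for the mean comparison and a plain variance ratio for the variability comparison, and the methylation values are invented.

```python
import statistics


def mean_and_variance_stats(cases, controls):
    """Return (Welch t statistic, variance ratio) for one CpG site."""
    m1, m0 = statistics.mean(cases), statistics.mean(controls)
    v1, v0 = statistics.variance(cases), statistics.variance(controls)
    # Mean-based statistic: difference in means scaled by its standard error.
    t = (m1 - m0) / ((v1 / len(cases) + v0 / len(controls)) ** 0.5)
    # Variance-based statistic: how much more variable cases are than controls.
    return t, v1 / v0


# Invented methylation values: equal means, but two outliers among the cases.
cases = [0.2, 0.5, 0.8, 0.5, 0.5, 0.5]
controls = [0.48, 0.5, 0.52, 0.5, 0.49, 0.51]
t_stat, var_ratio = mean_and_variance_stats(cases, controls)
print(f"t={t_stat:.3f} variance ratio={var_ratio:.1f}")
```

    In this toy site the mean-based statistic is essentially zero while the variance ratio is large, which is the outlier-driven pattern the abstract describes for DVCs.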

  14. The Effect of Concept Mapping-Guided Discovery Integrated Teaching Approach on Chemistry Students' Achievement and Retention

    ERIC Educational Resources Information Center

    Fatokun, K. V. F.; Eniayeju, P. A.

    2014-01-01

    This study investigates the effects of Concept Mapping-Guided Discovery Integrated Teaching Approach on the achievement and retention of chemistry students. The sample comprised 162 Senior Secondary two (SS 2) students drawn from two Science Schools in Nasarawa State, Central Nigeria with equivalent mean scores of 9.68 and 9.49 in their pre-test.…

  15. Ensuring Sample Quality for Biomarker Discovery Studies - Use of ICT Tools to Trace Biosample Life-cycle.

    PubMed

    Riondino, Silvia; Ferroni, Patrizia; Spila, Antonella; Alessandroni, Jhessica; D'Alessandro, Roberta; Formica, Vincenzo; Della-Morte, David; Palmirotta, Raffaele; Nanni, Umberto; Roselli, Mario; Guadagni, Fiorella

    2015-01-01

    The growing demand for personalized medicine marked the transition from an empirical medicine to a molecular one, aimed at predicting safer and more effective medical treatment for every patient, while minimizing adverse effects. This transition has emphasized the importance of biomarker discovery studies, and has led sample availability to assume a crucial role in biomedical research. Accordingly, a great interest in Biological Bank science has grown concomitantly. In biobanks, biological material and its accompanying data are collected, handled and stored in accordance with standard operating procedures (SOPs) and existing legislation. Sample quality is ensured by adherence to SOPs, and the whole sample life-cycle can be recorded by innovative tracking systems employing information technology (IT) tools for monitoring storage conditions and characterizing vast amounts of data. All the above will ensure proper sample exchangeability among research facilities and will represent the starting point of all future personalized medicine-based clinical trials. Copyright© 2015, International Institute of Anticancer Research (Dr. John G. Delinasios), All rights reserved.

  16. A generic template for automated bioanalytical ligand-binding assays using modular robotic scripts in support of discovery biotherapeutic programs.

    PubMed

    Duo, Jia; Dong, Huijin; DeSilva, Binodh; Zhang, Yan J

    2013-07-01

    Sample dilution and reagent pipetting are time-consuming steps in ligand-binding assays (LBAs). Traditional automation-assisted LBAs use assay-specific scripts that require labor-intensive script writing and user training. Five major script modules were developed on Tecan Freedom EVO liquid handling software to facilitate the automated sample preparation and LBA procedure: sample dilution, sample minimum required dilution, standard/QC minimum required dilution, standard/QC/sample addition, and reagent addition. The modular design of automation scripts allowed the users to assemble an automated assay with minimal script modification. The application of the template was demonstrated in three LBAs to support discovery biotherapeutic programs. The results demonstrated that the modular scripts provided the flexibility in adapting to various LBA formats and the significant time saving in script writing and scientist training. Data generated by the automated process were comparable to those by manual process while the bioanalytical productivity was significantly improved using the modular robotic scripts.

  17. Discovery and Genomic Characterization of a Novel Ovine Partetravirus and a New Genotype of Bovine Partetravirus

    PubMed Central

    Tse, Herman; Tsoi, Hoi-Wah; Teng, Jade L. L.; Chen, Xin-Chun; Liu, Haiying; Zhou, Boping; Zheng, Bo-Jian; Woo, Patrick C. Y.; Lau, Susanna K. P.; Yuen, Kwok-Yung

    2011-01-01

    Partetravirus is a recently described group of animal parvoviruses which include the human partetravirus, bovine partetravirus and porcine partetravirus (previously known as human parvovirus 4, bovine hokovirus and porcine hokovirus respectively). In this report, we describe the discovery and genomic characterization of partetraviruses in bovine and ovine samples from China. These partetraviruses were detected by PCR in 1.8% of bovine liver samples, 66.7% of ovine liver samples and 71.4% of ovine spleen samples. One of the bovine partetraviruses detected in the present samples is phylogenetically distinct from previously reported bovine partetraviruses and likely represents a novel genotype. The ovine partetravirus is a novel partetravirus and phylogenetically most related to the bovine partetraviruses. The genome organization is conserved amongst these viruses, including the presence of a putative transmembrane protein encoded by an overlapping reading frame in ORF2. Results from the present study provide further support to the classification of partetraviruses as a separate genus in Parvovirinae. PMID:21980506

  18. The future of discovery chemistry: quo vadis? Academic to industrial--the maturation of medicinal chemistry to chemical biology.

    PubMed

    Hoffmann, Torsten; Bishop, Cheryl

    2010-04-01

    At Roche, we set out to think about the future role of medicinal chemistry in drug discovery in a project involving both Roche internal stakeholders and external experts in drug discovery chemistry. To derive a coherent strategy, selected scientists were asked to take extreme positions and to derive two orthogonal strategic options: chemistry as the traditional mainstream science and chemistry as the central entrepreneurial science. We believe today's role of medicinal chemistry in industry has remained too narrow. To provide the innovation that industry requires, medicinal chemistry must play its part and diversify at pace with our increasing understanding of chemical biology and network pharmacology. 2010 Elsevier Ltd. All rights reserved.

  19. An extended sequential goodness-of-fit multiple testing method for discrete data.

    PubMed

    Castro-Conde, Irene; Döhler, Sebastian; de Uña-Álvarez, Jacobo

    2017-10-01

    The sequential goodness-of-fit (SGoF) multiple testing method has recently been proposed as an alternative to the familywise error rate- and the false discovery rate-controlling procedures in high-dimensional problems. For discrete data, the SGoF method may be very conservative. In this paper, we introduce an alternative SGoF-type procedure that takes into account the discreteness of the test statistics. Like the original SGoF, our new method provides weak control of the false discovery rate/familywise error rate but attains false discovery rate levels closer to the desired nominal level, and thus it is more powerful. We study the performance of this method in a simulation study and illustrate its application to a real pharmacovigilance data set.
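
    For orientation, the original (continuous-data) SGoF idea can be sketched in a few lines. This is a simplified reading of SGoF as it is commonly described, not the discrete extension introduced in this paper: count the p-values at or below a threshold gamma, compare that count with its Binomial(n, gamma) distribution under the complete null, and declare as effects only the excess over the binomial critical value. The function names and default thresholds are assumptions made for illustration.

```python
from math import comb


def binom_upper_tail(n, p, b):
    """P(X >= b) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(b, n + 1))


def sgof_rejections(pvalues, gamma=0.05, alpha=0.05):
    """Number of smallest p-values declared significant by a basic SGoF rule."""
    n = len(pvalues)
    observed = sum(p <= gamma for p in pvalues)
    # Smallest count b whose upper-tail probability under the null is <= alpha.
    critical = next(b for b in range(n + 2) if binom_upper_tail(n, gamma, b) <= alpha)
    return max(observed - critical + 1, 0)


# Invented p-values: 6 small ones among 20 tests.
pvals = [0.001] * 6 + [0.5] * 14
print(sgof_rejections(pvals))
```

    With discrete test statistics the p-values are not uniform under the null, so the binomial reference used here is too stringent; that conservativeness is the problem the abstract says the extended method addresses.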

  20. A Virtual Bioinformatics Knowledge Environment for Early Cancer Detection

    NASA Technical Reports Server (NTRS)

    Crichton, Daniel; Srivastava, Sudhir; Johnsey, Donald

    2003-01-01

    Discovery of disease biomarkers for cancer is a leading focus of early detection. The National Cancer Institute created a network of collaborating institutions focused on the discovery and validation of cancer biomarkers called the Early Detection Research Network (EDRN). Informatics plays a key role in enabling a virtual knowledge environment that provides scientists real time access to distributed data sets located at research institutions across the nation. The distributed and heterogeneous nature of the collaboration makes data sharing across institutions very difficult. EDRN has developed a comprehensive informatics effort focused on developing a national infrastructure enabling seamless access, sharing and discovery of science data resources across all EDRN sites. This paper will discuss the EDRN knowledge system architecture, its objectives and its accomplishments.

  1. KSC-2010-4716

    NASA Image and Video Library

    2010-09-20

    CAPE CANAVERAL, Fla. -- Bathed in bright xenon lights, space shuttle Discovery makes its nighttime trek, known as "rollout," from the Vehicle Assembly Building to Launch Pad 39A at NASA's Kennedy Space Center in Florida. It will take the shuttle, attached to its external fuel tank, twin solid rocket boosters and mobile launcher platform, about six hours to complete the move atop a crawler-transporter. Rollout sets the stage for Discovery's STS-133 crew to practice countdown and launch procedures during the Terminal Countdown Demonstration Test in mid-October. Targeted to liftoff Nov. 1, Discovery will take the Permanent Multipurpose Module (PMM) packed with supplies and critical spare parts, as well as Robonaut 2 (R2) to the International Space Station. Photo credit: NASA/Frankie Martin

  2. KSC-2010-4707

    NASA Image and Video Library

    2010-09-20

    CAPE CANAVERAL, Fla. -- Bathed in bright xenon lights, space shuttle Discovery makes its nighttime trek, known as "rollout," from the Vehicle Assembly Building to Launch Pad 39A at NASA's Kennedy Space Center in Florida. It will take the shuttle, attached to its external fuel tank, twin solid rocket boosters and mobile launcher platform, about six hours to complete the move atop a crawler-transporter. Rollout sets the stage for Discovery's STS-133 crew to practice countdown and launch procedures during the Terminal Countdown Demonstration Test in mid-October. Targeted to liftoff Nov. 1, Discovery will take the Permanent Multipurpose Module (PMM) packed with supplies and critical spare parts, as well as Robonaut 2 (R2) to the International Space Station. Photo credit: NASA/Jim Grossmann

  3. Hotspot volcanism in the southern South Atlantic: Geophysical constraints on the evolution of the southern Walvis Ridge and the Discovery Seamounts

    NASA Astrophysics Data System (ADS)

    Jokat, Wilfried; Reents, Stefanie

    2017-10-01

    The southern Atlantic hosts a variety of magmatic structures, namely the Walvis Ridge, the Discovery Seamounts and the Shona Ridge, which are believed to be related to the evolution/movement of hotspots. Although the basement of the Walvis Ridge has been sampled at different locations, geophysical data are too sparse to provide sufficient information about its deeper structure to compare it with other hotspot tracks. The Discovery Seamounts represent a completely different type of feature in that they cannot be connected to any onshore volcanic feature. However, geological sampling of the volcanic basement indicates that the petrology of the Discovery track is very similar to Gough Island and the southern branch of Walvis Ridge. Both structures erupted into already existing seafloor and so have been seismically investigated to document how/if an associated thermal anomaly might have modified the underlying and surrounding oceanic crust. Seismic lines for both structures indicate rather normal seismic velocity distributions for oceanic crust. Both the Walvis Ridge and the largest volcano of the Discovery Seamounts have a maximum thickness in our research area of 13 km. An interesting difference between these structures is a high velocity cone (> 6 km/s) at 2.4 km depth in the central part of Discovery Seamount. This might indicate a primarily intrusional type of seamount such as has been reported for several similar structures. In contrast, the Walvis Ridge velocity structure does not show evidence for a shallow intrusional cone, but seismic velocities typical for oceanic layer 3 at a more or less constant depth level along the entire profile. This might indicate that the ridge's present-day topography is built mainly by extrusive material.

  4. In the Context of Multiple Intelligences Theory, Intelligent Data Analysis of Learning Styles Was Based on Rough Set Theory

    ERIC Educational Resources Information Center

    Narli, Serkan; Ozgen, Kemal; Alkan, Huseyin

    2011-01-01

    The present study aims to identify the relationship between individuals' multiple intelligence areas and their learning styles with mathematical clarity using the concept of rough sets which is used in areas such as artificial intelligence, data reduction, discovery of dependencies, prediction of data significance, and generating decision…

  5. Applications for unique identifiers in the geological sciences

    NASA Astrophysics Data System (ADS)

    Klump, J.; Lehnert, K. A.

    2012-12-01

    Even though geology has always been a generalist discipline in many parts, approaches towards questions about Earth's past have become increasingly interdisciplinary. At the same time, a wealth of samples has been collected, the resulting data have been stored in disciplinary databases, the interpretations published in scientific literature. In the past these resources have existed alongside each other, semantically linked only by the knowledge of the researcher and his peers. One of the main drivers towards the inception of the world wide web was the ability to link scientific sources over the internet. The Uniform Resource Locator (URL) used to locate resources on the web soon turned out to be ephemeral in nature. A more reliable way of addressing objects was needed, a way of persistent identification to make digital objects, or digital representations of objects, part of the record of science. With their high degree of centralisation the scientific publishing houses were quick to implement and adopt a system for unique and persistent identification, the Digital Object Identifier (DOI) ®. At the same time other identifier systems exist alongside DOI, e.g. URN, ARK, handle ®, and others. There are many uses for persistent identification in science, other than the identification of journal articles. DOI are already used for the identification of data, thus making data citable. There are several initiatives to assign identifiers to authors and institutions to allow unique identification. A recent development is the application of persistent identifiers for geological samples. As most data in the geosciences are derived from samples, it is crucial to be able to uniquely identify the samples from which a set of data were derived. Incomplete documentation of samples in publications and use of ambiguous sample names are major obstacles for synthesis studies and re-use of data. Access to samples for re-analysis and re-appraisal is limited due to the lack of a central catalogue that allows finding a sample's archiving location. The International Geo Sample Number (IGSN) provides solutions to the questions of unique sample identification and discovery. Use of the IGSN in digital data systems allows building linkages between the digital representation of samples in sample registries, e.g. SESAR, and their related data in the literature and in web accessible digital data repositories. Persistent identifiers are now available for literature, data, samples, and authors. More applications, e.g. identification of methods or instruments, will follow. In conjunction with semantic web technology the application of unique and persistent identifiers in the geosciences will aid discovery both through systematic data mining, exploratory data analysis, and serendipity effects. This talk will discuss existing and emerging applications for persistent identifiers in the geological sciences.

  6. Post-column infusion study of the 'dosing vehicle effect' in the liquid chromatography/tandem mass spectrometric analysis of discovery pharmacokinetic samples.

    PubMed

    Shou, Wilson Z; Naidong, Weng

    2003-01-01

    It has become increasingly popular in drug development to conduct discovery pharmacokinetic (PK) studies in order to evaluate important PK parameters of new chemical entities (NCEs) early in the discovery process. In these studies, dosing vehicles are typically employed in high concentrations to dissolve the test compounds in dose formulations. This can pose significant problems for the liquid chromatography/tandem mass spectrometric (LC/MS/MS) analysis of incurred samples due to potential signal suppression of the analytes caused by the vehicles. In this paper, model test compounds in rat plasma were analyzed using a generic fast gradient LC/MS/MS method. Commonly used dosing vehicles, including poly(ethylene glycol) 400 (PEG 400), polysorbate 80 (Tween 80), hydroxypropyl beta-cyclodextrin, and N,N-dimethylacetamide, were fortified into rat plasma at 5 mg/mL before extraction. Their effects on the sample analysis results were evaluated by the method of post-column infusion. Results thus obtained indicated that polymeric vehicles such as PEG 400 and Tween 80 caused significant suppression (> 50%, compared with results obtained from plasma samples free from vehicles) to certain analytes, when minimum sample cleanup was used and the analytes happened to co-elute with the vehicles. Effective means to minimize this 'dosing vehicle effect' included better chromatographic separations, better sample cleanup, and alternative ionization methods. Finally, a real-world example is given to illustrate the suppression problem posed by high levels of PEG 400 in sample analysis, and to discuss steps taken in overcoming the problem. A simple but effective means of identifying a 'dosing vehicle effect' is also proposed. Copyright 2003 John Wiley & Sons, Ltd.

  7. Open Science Meets Stem Cells: A New Drug Discovery Approach for Neurodegenerative Disorders

    PubMed Central

    Han, Chanshuai; Chaineau, Mathilde; Chen, Carol X.-Q.; Beitel, Lenore K.; Durcan, Thomas M.

    2018-01-01

    Neurodegenerative diseases are a challenge for drug discovery, as the biological mechanisms are complex and poorly understood, with a paucity of models that faithfully recapitulate these disorders. Recent advances in stem cell technology have provided a paradigm shift, providing researchers with tools to generate human induced pluripotent stem cells (iPSCs) from patient cells. With the potential to generate any human cell type, we can now generate human neurons and develop “first-of-their-kind” disease-relevant assays for small molecule screening. Now that the tools are in place, it is imperative that we accelerate discoveries from the bench to the clinic. Using traditional closed-door research systems raises barriers to discovery, by restricting access to cells, data and other research findings. Thus, a new strategy is required, and the Montreal Neurological Institute (MNI) and its partners are piloting an “Open Science” model. One signature initiative will be that the MNI biorepository will curate and disseminate patient samples in a more accessible manner through open transfer agreements. This feeds into the MNI open drug discovery platform, focused on developing industry-standard assays with iPSC-derived neurons. All cell lines, reagents and assay findings developed in this open fashion will be made available to academia and industry. By removing the obstacles many universities and companies face in distributing patient samples and assay results, our goal is to accelerate translational medical research and the development of new therapies for devastating neurodegenerative disorders. PMID:29467610

  8. Open Science Meets Stem Cells: A New Drug Discovery Approach for Neurodegenerative Disorders.

    PubMed

    Han, Chanshuai; Chaineau, Mathilde; Chen, Carol X-Q; Beitel, Lenore K; Durcan, Thomas M

    2018-01-01

    Neurodegenerative diseases are a challenge for drug discovery, as the biological mechanisms are complex and poorly understood, with a paucity of models that faithfully recapitulate these disorders. Recent advances in stem cell technology have provided a paradigm shift, providing researchers with tools to generate human induced pluripotent stem cells (iPSCs) from patient cells. With the potential to generate any human cell type, we can now generate human neurons and develop "first-of-their-kind" disease-relevant assays for small molecule screening. Now that the tools are in place, it is imperative that we accelerate discoveries from the bench to the clinic. Using traditional closed-door research systems raises barriers to discovery, by restricting access to cells, data and other research findings. Thus, a new strategy is required, and the Montreal Neurological Institute (MNI) and its partners are piloting an "Open Science" model. One signature initiative will be that the MNI biorepository will curate and disseminate patient samples in a more accessible manner through open transfer agreements. This feeds into the MNI open drug discovery platform, focused on developing industry-standard assays with iPSC-derived neurons. All cell lines, reagents and assay findings developed in this open fashion will be made available to academia and industry. By removing the obstacles many universities and companies face in distributing patient samples and assay results, our goal is to accelerate translational medical research and the development of new therapies for devastating neurodegenerative disorders.

  9. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium

    PubMed Central

    2014-01-01

    We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed, for these and qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings. PMID:25150838

  10. Sulfur "Concrete" for Lunar Applications - Sublimation Concerns

    NASA Technical Reports Server (NTRS)

    Grugel, Richard N.; Toutanji, Houssam

    2006-01-01

    Melting sulfur and mixing it with an aggregate to form "concrete" is commercially well established and constitutes a material that is particularly well-suited for use in corrosive environments. Discovery of the mineral troilite (FeS) on the moon poses the question of extracting the sulfur for use as a lunar construction material. This would be an attractive alternative to conventional concrete as it does not require water. However, the viability of sulfur concrete in a lunar environment, which is characterized by lack of an atmosphere and extreme temperatures, is not well understood. Here it is assumed that the lunar ore can be mined, refined, and the raw sulfur melded with appropriate lunar regolith to form, for example, bricks. This study evaluates pure sulfur and two sets of small sulfur concrete samples that have been prepared using JSC-1 lunar simulant and SiO2 powder as aggregate additions. Each set was subjected to extended periods in a vacuum environment to evaluate sublimation issues. Results from these experiments are presented and discussed within the context of the lunar environment.

  11. An Evolutionary Approach for Identifying Driver Mutations in Colorectal Cancer

    PubMed Central

    Leder, Kevin; Riester, Markus; Iwasa, Yoh; Lengauer, Christoph; Michor, Franziska

    2015-01-01

    The traditional view of cancer as a genetic disease that can successfully be treated with drugs targeting mutant onco-proteins has motivated whole-genome sequencing efforts in many human cancer types. However, only a subset of mutations found within the genomic landscape of cancer is likely to provide a fitness advantage to the cell. Distinguishing such “driver” mutations from innocuous “passenger” events is critical for prioritizing the validation of candidate mutations in disease-relevant models. We design a novel statistical index, called the Hitchhiking Index, which reflects the probability that any observed candidate gene is a passenger alteration, given the frequency of alterations in a cross-sectional cancer sample set, and apply it to a mutational data set in colorectal cancer. Our methodology is based upon a population dynamics model of mutation accumulation and selection in colorectal tissue prior to cancer initiation as well as during tumorigenesis. This methodology can be used to aid in the prioritization of candidate mutations for functional validation and contributes to the process of drug discovery. PMID:26379039

  12. Discovering novel pharmacogenomic biomarkers by imputing drug response in cancer patients from large genomics studies.

    PubMed

    Geeleher, Paul; Zhang, Zhenyu; Wang, Fan; Gruener, Robert F; Nath, Aritro; Morrison, Gladys; Bhutra, Steven; Grossman, Robert L; Huang, R Stephanie

    2017-10-01

    Obtaining accurate drug response data in large cohorts of cancer patients is very challenging; thus, most cancer pharmacogenomics discovery is conducted in preclinical studies, typically using cell lines and mouse models. However, these platforms suffer from serious limitations, including small sample sizes. Here, we have developed a novel computational method that allows us to impute drug response in very large clinical cancer genomics data sets, such as The Cancer Genome Atlas (TCGA). The approach works by creating statistical models relating gene expression to drug response in large panels of cancer cell lines and applying these models to tumor gene expression data in the clinical data sets (e.g., TCGA). This yields an imputed drug response for every drug in each patient. These imputed drug response data are then associated with somatic genetic variants measured in the clinical cohort, such as copy number changes or mutations in protein coding genes. These analyses recapitulated drug associations for known clinically actionable somatic genetic alterations and identified new predictive biomarkers for existing drugs. © 2017 Geeleher et al.; Published by Cold Spring Harbor Laboratory Press.
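
    The two-step logic described here (fit a model relating expression to drug response in cell lines, then apply it to tumor expression) can be shown with a deliberately tiny example. The sketch below assumes a single gene and ordinary least squares, which is far simpler than genome-wide models; every name and number in it is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


# Step 1: train on cell lines, where both expression and response are measured.
cell_line_expr = [1.0, 2.0, 3.0, 4.0]      # hypothetical expression of one gene
cell_line_response = [5.0, 4.0, 3.0, 2.0]  # hypothetical drug-response values
slope, intercept = fit_line(cell_line_expr, cell_line_response)

# Step 2: apply the fitted model to tumors, where only expression is measured.
tumor_expr = [1.5, 3.5]
imputed = [intercept + slope * e for e in tumor_expr]
print(imputed)
```

    The imputed values can then be tested for association with somatic variants in the same patients, which is the downstream analysis the abstract describes.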

  13. An Initiative to Facilitate Park Usage, Discovery, and Physical Activity Among Children and Adolescents in Greenville County, South Carolina, 2014

    PubMed Central

    Kaczynski, Andrew T.; Hughey, S. Morgan; Besenyi, Gina M.; Powers, Alicia R.

    2017-01-01

    Introduction: Parks are important settings for increasing population-level physical activity (PA). The objective of this study was to evaluate Park Hop, an incentivized scavenger-hunt–style intervention designed to influence park usage, discovery, park-based PA, and perceptions of parks among children and adolescents in Greenville County, South Carolina. Methods: We used 2 data collection methods: matched preintervention and postintervention parent-completed surveys and in-park observations during 4 days near the midpoint of the intervention. We used paired-samples t tests and logistic regression to analyze changes in park visitation, perceptions, and PA. Results: Children and adolescents visited an average of 12.1 (of 19) Park Hop parks, and discovered an average of 4.6 venues. In a subset of participants, from preintervention to postintervention, the mean number of park visits increased from 5.0 visits to 6.1 visits, the proportion of time engaged in PA during the most recent park visit increased from 77% to 87%, and parents reported more positive perceptions of the quality of park amenities. We observed more children and adolescents (n = 586) in the 2 intervention parks than in the 2 matched control parks (n = 305). However, the likelihood of children and adolescents engaging in moderate-to-vigorous PA was significantly greater in the control parks (74.3%) than in Park Hop parks (64.2%). Conclusion: Park Hop facilitated community collaboration between park agencies and positively influenced park usage, park discovery, time engaged in PA during park visits, and perceptions of parks. This low-cost, replicable, and scalable model can be implemented across communities to facilitate youth and family-focused PA through parks. PMID:28182864

  14. Towards High-throughput Immunomics for Infectious Diseases: Use of Next-generation Peptide Microarrays for Rapid Discovery and Mapping of Antigenic Determinants

    PubMed Central

    Carmona, Santiago J.; Nielsen, Morten; Schafer-Nielsen, Claus; Mucci, Juan; Altcheh, Jaime; Balouz, Virginia; Tekiel, Valeria; Frasch, Alberto C.; Campetella, Oscar; Buscaglia, Carlos A.; Agüero, Fernán

    2015-01-01

    Complete characterization of antibody specificities associated to natural infections is expected to provide a rich source of serologic biomarkers with potential applications in molecular diagnosis, follow-up of chemotherapeutic treatments, and prioritization of targets for vaccine development. Here, we developed a highly-multiplexed platform based on next-generation high-density peptide microarrays to map these specificities in Chagas Disease, an exemplar of a human infectious disease caused by the protozoan Trypanosoma cruzi. We designed a high-density peptide microarray containing more than 175,000 overlapping 15mer peptides derived from T. cruzi proteins. Peptides were synthesized in situ on microarray slides, spanning the complete length of 457 parasite proteins with fully overlapped 15mers (1 residue shift). Screening of these slides with antibodies purified from infected patients and healthy donors demonstrated both a high technical reproducibility as well as epitope mapping consistency when compared with earlier low-throughput technologies. Using a conservative signal threshold to classify positive (reactive) peptides we identified 2,031 disease-specific peptides and 97 novel parasite antigens, effectively doubling the number of known antigens and providing a 10-fold increase in the number of fine mapped antigenic determinants for this disease. Finally, further analysis of the chip data showed that optimizing the amount of sequence overlap of displayed peptides can increase the protein space covered in a single chip by at least ∼threefold without sacrificing sensitivity. In conclusion, we show the power of high-density peptide chips for the discovery of pathogen-specific linear B-cell epitopes from clinical samples, thus setting the stage for high-throughput biomarker discovery screenings and proteome-wide studies of immune responses against pathogens. PMID:25922409
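The tiling described above (15mers spanning each protein with a 1-residue shift) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `tile_peptides`, its parameters, and the example sequence are all hypothetical. It also shows why a larger shift frees chip space, since fewer peptides are needed per protein; the abstract does not say how the overlap was actually optimized.

```python
def tile_peptides(sequence, length=15, shift=1):
    """Split a protein sequence into overlapping peptides.

    `length` is the peptide size and `shift` is the number of residues
    between consecutive start positions (shift=1 means full overlap).
    """
    if length <= 0 or shift <= 0:
        raise ValueError("length and shift must be positive")
    last_start = len(sequence) - length
    # An empty list is returned when the sequence is shorter than `length`.
    return [sequence[i:i + length] for i in range(0, last_start + 1, shift)]


# Hypothetical 40-residue sequence, used only for illustration.
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRV"

full_overlap = tile_peptides(protein, shift=1)   # one peptide per start position
reduced_overlap = tile_peptides(protein, shift=3)  # every third start position

print(len(full_overlap), len(reduced_overlap))
```

With a shift of 3 residues the same protein needs roughly one third as many peptides, which is the kind of trade-off the abstract refers to when it reports covering more protein space per chip.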

  15. Towards High-throughput Immunomics for Infectious Diseases: Use of Next-generation Peptide Microarrays for Rapid Discovery and Mapping of Antigenic Determinants.

    PubMed

    Carmona, Santiago J; Nielsen, Morten; Schafer-Nielsen, Claus; Mucci, Juan; Altcheh, Jaime; Balouz, Virginia; Tekiel, Valeria; Frasch, Alberto C; Campetella, Oscar; Buscaglia, Carlos A; Agüero, Fernán

    2015-07-01

    Complete characterization of antibody specificities associated to natural infections is expected to provide a rich source of serologic biomarkers with potential applications in molecular diagnosis, follow-up of chemotherapeutic treatments, and prioritization of targets for vaccine development. Here, we developed a highly-multiplexed platform based on next-generation high-density peptide microarrays to map these specificities in Chagas Disease, an exemplar of a human infectious disease caused by the protozoan Trypanosoma cruzi. We designed a high-density peptide microarray containing more than 175,000 overlapping 15mer peptides derived from T. cruzi proteins. Peptides were synthesized in situ on microarray slides, spanning the complete length of 457 parasite proteins with fully overlapped 15mers (1 residue shift). Screening of these slides with antibodies purified from infected patients and healthy donors demonstrated both a high technical reproducibility as well as epitope mapping consistency when compared with earlier low-throughput technologies. Using a conservative signal threshold to classify positive (reactive) peptides we identified 2,031 disease-specific peptides and 97 novel parasite antigens, effectively doubling the number of known antigens and providing a 10-fold increase in the number of fine mapped antigenic determinants for this disease. Finally, further analysis of the chip data showed that optimizing the amount of sequence overlap of displayed peptides can increase the protein space covered in a single chip by at least ∼threefold without sacrificing sensitivity. In conclusion, we show the power of high-density peptide chips for the discovery of pathogen-specific linear B-cell epitopes from clinical samples, thus setting the stage for high-throughput biomarker discovery screenings and proteome-wide studies of immune responses against pathogens. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.

  16. Discovery of new candidate genes for rheumatoid arthritis through integration of genetic association data with expression pathway analysis.

    PubMed

    Shchetynsky, Klementy; Diaz-Gallo, Lina-Marcella; Folkersen, Lasse; Hensvold, Aase Haj; Catrina, Anca Irinel; Berg, Louise; Klareskog, Lars; Padyukov, Leonid

    2017-02-02

    Here we integrate verified signals from previous genetic association studies with gene expression and pathway analysis for discovery of new candidate genes and signaling networks, relevant for rheumatoid arthritis (RA). RNA-sequencing-(RNA-seq)-based expression analysis of 377 genes from previously verified RA-associated loci was performed in blood cells from 5 newly diagnosed, non-treated patients with RA, 7 patients with treated RA and 12 healthy controls. Differentially expressed genes sharing a similar expression pattern in treated and untreated RA sub-groups were selected for pathway analysis. A set of "connector" genes derived from pathway analysis was tested for differential expression in the initial discovery cohort and validated in blood cells from 73 patients with RA and in 35 healthy controls. There were 11 qualifying genes selected for pathway analysis and these were grouped into two evidence-based functional networks, containing 29 and 27 additional connector molecules. The expression of genes corresponding to connector molecules was then tested in the initial RNA-seq data. Differences in the expression of ERBB2, TP53 and THOP1 were similar in both treated and non-treated patients with RA and an additional nine genes were differentially expressed in at least one group of patients compared to healthy controls. The ERBB2, TP53 and THOP1 expression profile was successfully replicated in RNA-seq data from peripheral blood mononuclear cells from healthy controls and non-treated patients with RA, in an independent collection of samples. Integration of RNA-seq data with findings from association studies, and consequent pathway analysis implicate new candidate genes, ERBB2, TP53 and THOP1 in the pathogenesis of RA.

  17. Deep Learning Applications for Predicting Pharmacological Properties of Drugs and Drug Repurposing Using Transcriptomic Data.

    PubMed

    Aliper, Alexander; Plis, Sergey; Artemov, Artem; Ulloa, Alvaro; Mamoshina, Polina; Zhavoronkov, Alex

    2016-07-05

    Deep learning is rapidly advancing many areas of science and technology with multiple success stories in image, text, voice and video recognition, robotics, and autonomous driving. In this paper we demonstrate how deep neural networks (DNN) trained on large transcriptional response data sets can classify various drugs to therapeutic categories solely based on their transcriptional profiles. We used the perturbation samples of 678 drugs across A549, MCF-7, and PC-3 cell lines from the LINCS Project and linked those to 12 therapeutic use categories derived from MeSH. To train the DNN, we utilized both gene level transcriptomic data and transcriptomic data processed using a pathway activation scoring algorithm, for a pooled data set of samples perturbed with different concentrations of the drug for 6 and 24 hours. In both pathway and gene level classification, DNN achieved high classification accuracy and convincingly outperformed the support vector machine (SVM) model on every multiclass classification problem; however, models based on pathway level data performed significantly better. For the first time we demonstrate a deep learning neural net trained on transcriptomic data to recognize pharmacological properties of multiple drugs across different biological systems and conditions. We also propose using deep neural net confusion matrices for drug repositioning. This work is a proof of principle for applying deep learning to drug discovery and development.

  18. 2017 EULAR recommendations for a core data set to support observational research and clinical care in rheumatoid arthritis.

    PubMed

    Radner, Helga; Chatzidionysiou, Katerina; Nikiphorou, Elena; Gossec, Laure; Hyrich, Kimme L; Zabalan, Condruta; van Eijk-Hustings, Yvonne; Williamson, Paula R; Balanescu, Andra; Burmester, Gerd R; Carmona, Loreto; Dougados, Maxime; Finckh, Axel; Haugeberg, Glenn; Hetland, Merete Lund; Oliver, Susan; Porter, Duncan; Raza, Karim; Ryan, Patrick; Santos, Maria Jose; van der Helm-van Mil, Annette; van Riel, Piet; von Krause, Gabrielle; Zavada, Jakub; Dixon, William G; Askling, Johan

    2018-04-01

    Personalised medicine, new discoveries and studies on rare exposures or outcomes require large samples that are increasingly difficult for any single investigator to obtain. Collaborative work is limited by heterogeneities, both what is being collected and how it is defined. To develop a core set for data collection in rheumatoid arthritis (RA) research which (1) allows harmonisation of data collection in future observational studies, (2) acts as a common data model against which existing databases can be mapped and (3) serves as a template for standardised data collection in routine clinical practice to support generation of research-quality data. A multistep, international multistakeholder consensus process was carried out involving voting via online surveys and two face-to-face meetings. A core set of 21 items ('what to collect') and their instruments ('how to collect') was agreed: age, gender, disease duration, diagnosis of RA, body mass index, smoking, swollen/tender joints, patient/evaluator global, pain, quality of life, function, composite scores, acute phase reactants, serology, structural damage, treatment and comorbidities. The core set should facilitate collaborative research, allow for comparisons across studies and harmonise future data from clinical practice via electronic medical record systems. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  19. iVirus: facilitating new insights in viral ecology with software and community data sets imbedded in a cyberinfrastructure.

    PubMed

    Bolduc, Benjamin; Youens-Clark, Ken; Roux, Simon; Hurwitz, Bonnie L; Sullivan, Matthew B

    2017-01-01

    Microbes affect nutrient and energy transformations throughout the world's ecosystems, yet they do so under viral constraints. In complex communities, viral metagenome (virome) sequencing is transforming our ability to quantify viral diversity and impacts. Although some bottlenecks, for example, few reference genomes and nonquantitative viromics, have been overcome, the void of centralized data sets and specialized tools now prevents viromics from being broadly applied to answer fundamental ecological questions. Here we present iVirus, a community resource that leverages the CyVerse cyberinfrastructure to provide access to viromic tools and data sets. The iVirus Data Commons contains both raw and processed data from 1866 samples and 73 projects derived from global ocean expeditions, as well as existing and legacy public repositories. Through the CyVerse Discovery Environment, users can interrogate these data sets using existing analytical tools (software applications known as 'Apps') for assembly, open reading frame prediction and annotation, as well as several new Apps specifically developed for analyzing viromes. Because Apps are web based and powered by CyVerse supercomputing resources, they enable scalable analyses for a broad user base. Finally, a use-case scenario documents how to apply these advances toward new data. This growing iVirus resource should help researchers utilize viromics as yet another tool to elucidate viral roles in nature.

  20. iVirus: facilitating new insights in viral ecology with software and community data sets imbedded in a cyberinfrastructure

    PubMed Central

    Bolduc, Benjamin; Youens-Clark, Ken; Roux, Simon; Hurwitz, Bonnie L; Sullivan, Matthew B

    2017-01-01

    Microbes affect nutrient and energy transformations throughout the world's ecosystems, yet they do so under viral constraints. In complex communities, viral metagenome (virome) sequencing is transforming our ability to quantify viral diversity and impacts. Although some bottlenecks, for example, few reference genomes and nonquantitative viromics, have been overcome, the void of centralized data sets and specialized tools now prevents viromics from being broadly applied to answer fundamental ecological questions. Here we present iVirus, a community resource that leverages the CyVerse cyberinfrastructure to provide access to viromic tools and data sets. The iVirus Data Commons contains both raw and processed data from 1866 samples and 73 projects derived from global ocean expeditions, as well as existing and legacy public repositories. Through the CyVerse Discovery Environment, users can interrogate these data sets using existing analytical tools (software applications known as 'Apps') for assembly, open reading frame prediction and annotation, as well as several new Apps specifically developed for analyzing viromes. Because Apps are web based and powered by CyVerse supercomputing resources, they enable scalable analyses for a broad user base. Finally, a use-case scenario documents how to apply these advances toward new data. This growing iVirus resource should help researchers utilize viromics as yet another tool to elucidate viral roles in nature. PMID:27420028

  1. Integrating Multiple Data Sources for Combinatorial Marker Discovery: A Study in Tumorigenesis.

    PubMed

    Bandyopadhyay, Sanghamitra; Mallik, Saurav

    2018-01-01

    Identification of combinatorial markers from multiple data sources is a challenging task in bioinformatics. Here, we propose a novel computational framework for identifying significant combinatorial markers ( s) using both gene expression and methylation data. The gene expression and methylation data are integrated into a single continuous data as well as a (post-discretized) boolean data based on their intrinsic (i.e., inverse) relationship. A novel combined score of methylation and expression data (viz., ) is introduced which is computed on the integrated continuous data for identifying initial non-redundant set of genes. Thereafter, (maximal) frequent closed homogeneous genesets are identified using a well-known biclustering algorithm applied on the integrated boolean data of the determined non-redundant set of genes. A novel sample-based weighted support ( ) is then proposed that is consecutively calculated on the integrated boolean data of the determined non-redundant set of genes in order to identify the non-redundant significant genesets. The top few resulting genesets are identified as potential s. Since our proposed method generates a smaller number of significant non-redundant genesets than those by other popular methods, the method is much faster than the others. Application of the proposed technique on an expression and a methylation data for Uterine tumor or Prostate Carcinoma produces a set of significant combination of markers. We expect that such a combination of markers will produce lower false positives than individual markers.

  2. Repeatability and Reproducibility in Proteomic Identifications by Liquid Chromatography—Tandem Mass Spectrometry

    PubMed Central

    Tabb, David L.; Vega-Montoto, Lorenzo; Rudnick, Paul A.; Variyath, Asokan Mulayath; Ham, Amy-Joan L.; Bunk, David M.; Kilpatrick, Lisa E.; Billheimer, Dean D.; Blackman, Ronald K.; Cardasis, Helene L.; Carr, Steven A.; Clauser, Karl R.; Jaffe, Jacob D.; Kowalski, Kevin A.; Neubert, Thomas A.; Regnier, Fred E.; Schilling, Birgit; Tegeler, Tony J.; Wang, Mu; Wang, Pei; Whiteaker, Jeffrey R.; Zimmerman, Lisa J.; Fisher, Susan J.; Gibson, Bradford W.; Kinsinger, Christopher R.; Mesri, Mehdi; Rodriguez, Henry; Stein, Steven E.; Tempst, Paul; Paulovich, Amanda G.; Liebler, Daniel C.; Spiegelman, Cliff

    2009-01-01

    The complexity of proteomic instrumentation for LC-MS/MS introduces many possible sources of variability. Data-dependent sampling of peptides constitutes a stochastic element at the heart of discovery proteomics. Although this variation impacts the identification of peptides, proteomic identifications are far from completely random. In this study, we analyzed interlaboratory data sets from the NCI Clinical Proteomic Technology Assessment for Cancer to examine repeatability and reproducibility in peptide and protein identifications. Included data spanned 144 LC-MS/MS experiments on four Thermo LTQ and four Orbitrap instruments. Samples included yeast lysate, the NCI-20 defined dynamic range protein mix, and the Sigma UPS 1 defined equimolar protein mix. Some of our findings reinforced conventional wisdom, such as repeatability and reproducibility being higher for proteins than for peptides. Most lessons from the data, however, were more subtle. Orbitraps proved capable of higher repeatability and reproducibility, but aberrant performance occasionally erased these gains. Even the simplest protein digestions yielded more peptide ions than LC-MS/MS could identify during a single experiment. We observed that peptide lists from pairs of technical replicates overlapped by 35–60%, giving a range for peptide-level repeatability in these experiments. Sample complexity did not appear to affect peptide identification repeatability, even as numbers of identified spectra changed by an order of magnitude. Statistical analysis of protein spectral counts revealed greater stability across technical replicates for Orbitraps, making them superior to LTQ instruments for biomarker candidate discovery. The most repeatable peptides were those corresponding to conventional tryptic cleavage sites, those that produced intense MS signals, and those that resulted from proteins generating many distinct peptides. 
    Reproducibility among different instruments of the same type lagged behind repeatability of technical replicates on a single instrument by several percent. These findings reinforce the importance of evaluating repeatability as a fundamental characteristic of analytical technologies. PMID:19921851

  3. Automated Discovery of Functional Generality of Human Gene Expression Programs

    PubMed Central

    Gerber, Georg K; Dowell, Robin D; Jaakkola, Tommi S; Gifford, David K

    2007-01-01

    An important research problem in computational biology is the identification of expression programs, sets of co-expressed genes orchestrating normal or pathological processes, and the characterization of the functional breadth of these programs. The use of human expression data compendia for discovery of such programs presents several challenges including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. We developed GeneProgram, a new unsupervised computational framework based on Hierarchical Dirichlet Processes that addresses each of the above challenges. GeneProgram uses expression data to simultaneously organize tissues into groups and genes into overlapping programs with consistent temporal behavior, to produce maps of expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Using synthetic and real gene expression data, we showed that GeneProgram outperformed several popular expression analysis methods. We applied GeneProgram to a compendium of 62 short time-series gene expression datasets exploring the responses of human cells to infectious agents and immune-modulating molecules. GeneProgram produced a map of 104 expression programs, a substantial number of which were significantly enriched for genes involved in key signaling pathways and/or bound by NF-κB transcription factors in genome-wide experiments. Further, GeneProgram discovered expression programs that appear to implicate surprising signaling pathways or receptor types in the response to infection, including Wnt signaling and neurotransmitter receptors. 
    We believe the discovered map of expression programs involved in the response to infection will be useful for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal “cross-talk,” and genes from high generality programs may maintain common physiological responses that go awry in disease states. Further, our method is multipurpose, and can be applied readily to novel compendia of biological data. PMID:17696603

  4. Evaluation of Allele-Specific Somatic Changes of Genome-Wide Association Study Susceptibility Alleles in Human Colorectal Cancers

    PubMed Central

    Gerber, Madelyn M.; Hampel, Heather; Schulz, Nathan P.; Fernandez, Soledad; Wei, Lai; Zhou, Xiao-Ping; de la Chapelle, Albert; Toland, Amanda Ewart

    2012-01-01

    Background: Tumors frequently exhibit loss of tumor suppressor genes or allelic gains of activated oncogenes. A significant proportion of cancer susceptibility loci in the mouse show somatic losses or gains consistent with the presence of a tumor susceptibility or resistance allele. Thus, allele-specific somatic gains or losses at loci may demarcate the presence of resistance or susceptibility alleles. The goal of this study was to determine if previously mapped susceptibility loci for colorectal cancer show evidence of allele-specific somatic events in colon tumors. Methods: We performed quantitative genotyping of 16 single nucleotide polymorphisms (SNPs) showing statistically significant association with colorectal cancer in published genome-wide association studies (GWAS). We genotyped 194 paired normal and colorectal tumor DNA samples and 296 paired validation samples to investigate these SNPs for allele-specific somatic gains and losses. We combined analysis of our data with published data for seven of these SNPs. Results: No statistically significant evidence for allele-specific somatic selection was observed for the tested polymorphisms in the discovery set. The rs6983267 variant, which has shown preferential loss of the non-risk T allele and relative gain of the risk G allele in previous studies, favored relative gain of the G allele in the combined discovery and validation samples (corrected p-value = 0.03). When we combined our data with published allele-specific imbalance data for this SNP, the G allele of rs6983267 showed statistically significant evidence of relative retention (p-value = 2.06×10^−4). Conclusions: Our results suggest that the majority of variants identified as colon cancer susceptibility alleles through GWAS do not exhibit somatic allele-specific imbalance in colon tumors. Our data confirm previously published results showing allele-specific imbalance for rs6983267. 
    These results indicate that allele-specific imbalance of cancer susceptibility alleles may not be a common phenomenon in colon cancer. PMID:22629442

  5. Chemical characterization of the acid alteration of diesel fuel: Non-targeted analysis by two-dimensional gas chromatography coupled with time-of-flight mass spectrometry with tile-based Fisher ratio and combinatorial threshold determination

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parsons, Brendon A.; Pinkerton, David K.; Wright, Bob W.

    The illicit chemical alteration of petroleum fuels is of scientific interest, particularly to regulatory agencies which set fuel specifications, or excises based on those specifications. One type of alteration is the reaction of diesel fuel with concentrated sulfuric acid. Such reactions are known to subtly alter the chemical composition of the fuel, particularly the aromatic species native to the fuel. Comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC × GC–TOFMS) is ideally suited for the analysis of diesel fuel, but may provide the analyst with an overwhelming amount of data, particularly in sample-class comparison experiments comprised of many samples. The tile-based Fisher-ratio (F-ratio) method reduces the abundance of data in a GC × GC–TOFMS experiment to only the peaks which significantly distinguish the unaltered and acid altered sample classes. Three samples of diesel fuel from different filling stations were each altered to discover chemical features, i.e., analyte peaks, which were consistently changed by the acid reaction. Using different fuels prioritizes the discovery of features which are likely to be robust to the variation present between fuel samples and which will consequently be useful in determining whether an unknown sample has been acid altered. The subsequent analysis confirmed that aromatic species are removed by the acid alteration, with the degree of removal consistent with predicted reactivity toward electrophilic aromatic sulfonation. Additionally, we observed that alkenes and alkynes were also removed from the fuel, and that sulfur dioxide or compounds that degrade to sulfur dioxide are generated by the acid alteration. In addition to applying the previously reported tile-based F-ratio method, this report also expands null distribution analysis to algorithmically determine an F-ratio threshold to confidently select only the features which are sufficiently class-distinguishing. 
    When applied to the acid alteration of diesel fuel, the suggested per-hit F-ratio threshold was 12.4, which is predicted to maintain the false discovery rate (FDR) below 0.1%. Using this F-ratio threshold, 107 of the 3362 preliminary hits were deemed significantly changing due to the acid alteration, with the number of false positives estimated to be about 3.
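To make the F-ratio idea concrete, the sketch below computes a Fisher ratio for a single feature as the standard one-way ANOVA statistic (between-class variance over pooled within-class variance) and compares it with the 12.4 threshold quoted in the abstract. It is an assumption-laden illustration: the tile-based summation of signal and the null-distribution threshold search used in the report are not reproduced, and the function name and signal values are hypothetical.

```python
from statistics import mean


def fisher_ratio(class_a, class_b):
    """One-way ANOVA F statistic for one feature measured in two sample classes."""
    groups = [class_a, class_b]
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-class variance: spread of the class means around the grand mean.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (len(groups) - 1)
    # Within-class variance: pooled spread of replicates around their own class mean.
    within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n_total - len(groups))
    return between / within  # raises ZeroDivisionError if replicates are identical


# Hypothetical peak signals for one feature (arbitrary units).
unaltered = [10.0, 11.0, 9.0]
acid_altered = [4.0, 5.0, 3.0]

F_THRESHOLD = 12.4  # per-hit threshold reported in the abstract
f_value = fisher_ratio(unaltered, acid_altered)
print(f_value, f_value > F_THRESHOLD)
```

A feature whose signal drops consistently after acid treatment yields a large ratio and passes the threshold, whereas a feature that varies as much within a class as between classes does not.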

  6. Empirical performance of the calibrated self-controlled cohort analysis within temporal pattern discovery: lessons for developing a risk identification and analysis system.

    PubMed

    Norén, G Niklas; Bergvall, Tomas; Ryan, Patrick B; Juhlin, Kristina; Schuemie, Martijn J; Madigan, David

    2013-10-01

    Observational healthcare data offer the potential to identify adverse drug reactions that may be missed by spontaneous reporting. The self-controlled cohort analysis within the Temporal Pattern Discovery framework compares the observed-to-expected ratio of medical outcomes during post-exposure surveillance periods with those during a set of distinct pre-exposure control periods in the same patients. It utilizes an external control group to account for systematic differences between the different time periods, thus combining within- and between-patient confounder adjustment in a single measure. To evaluate the performance of the calibrated self-controlled cohort analysis within Temporal Pattern Discovery as a tool for risk identification in observational healthcare data. Different implementations of the calibrated self-controlled cohort analysis were applied to 399 drug-outcome pairs (165 positive and 234 negative test cases across 4 health outcomes of interest) in 5 real observational databases (four with administrative claims and one with electronic health records). Performance was evaluated on real data through sensitivity/specificity, the area under receiver operator characteristics curve (AUC), and bias. The calibrated self-controlled cohort analysis achieved good predictive accuracy across the outcomes and databases under study. The optimal design based on this reference set uses a 360 days surveillance period and a single control period 180 days prior to new prescriptions. It achieved an average AUC of 0.75 and AUC >0.70 in all but one scenario. A design with three separate control periods performed better for the electronic health records database and for acute renal failure across all data sets. The estimates for negative test cases were generally unbiased, but a minor negative bias of up to 0.2 on the RR-scale was observed with the configurations using multiple control periods, for acute liver injury and upper gastrointestinal bleeding. 
    The calibrated self-controlled cohort analysis within Temporal Pattern Discovery shows promise as a tool for risk identification; it performs well at discriminating positive from negative test cases. The optimal parameter configuration may vary with the data set and medical outcome of interest.
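As a reminder of what the reported AUC values measure, here is a minimal sketch of the area under the ROC curve computed as the probability that a randomly chosen positive test case scores higher than a randomly chosen negative one (ties count one half). The scores are hypothetical and are not taken from the study, and the function name is an assumption.

```python
def auc(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical risk estimates for drug-outcome pairs.
positive_controls = [0.9, 0.8, 0.4]
negative_controls = [0.7, 0.3, 0.2, 0.1]

print(auc(positive_controls, negative_controls))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so the average of 0.75 reported above sits midway between the two.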

  7. Estimating the rate of biological introductions: Lessepsian fishes in the Mediterranean.

    PubMed

    Belmaker, Jonathan; Brokovich, Eran; China, Victor; Golani, Daniel; Kiflawi, Moshe

    2009-04-01

    Sampling issues preclude the direct use of the discovery rate of exotic species as a robust estimate of their rate of introduction. Recently, a method was advanced that allows maximum-likelihood estimation of both the observational probability and the introduction rate from the discovery record. Here, we propose an alternative approach that utilizes the discovery record of native species to control for sampling effort. Implemented in a Bayesian framework using Markov chain Monte Carlo simulations, the approach provides estimates of the rate of introduction of the exotic species, and of additional parameters such as the size of the species pool from which they are drawn. We illustrate the approach using Red Sea fishes recorded in the eastern Mediterranean, after crossing the Suez Canal, and show that the two approaches may lead to different conclusions. The analytical framework is highly flexible and could provide a basis for easy modification to other systems for which first-sighting data on native and introduced species are available.

  8. System for Earth Sample Registration SESAR: Services for IGSN Registration and Sample Metadata Management

    NASA Astrophysics Data System (ADS)

    Chan, S.; Lehnert, K. A.; Coleman, R. J.

    2011-12-01

    SESAR, the System for Earth Sample Registration, is an online registry for physical samples collected for Earth and environmental studies. SESAR generates and administers the International Geo Sample Number IGSN, a unique identifier for samples that is dramatically advancing interoperability amongst information systems for sample-based data. SESAR was developed to provide the complete range of registry services, including definition of IGSN syntax and metadata profiles, registration and validation of name spaces requested by users, tools for users to submit and manage sample metadata, validation of submitted metadata, generation and validation of the unique identifiers, archiving of sample metadata, and public or private access to the sample metadata catalog. With the development of SESAR v3, we placed particular emphasis on creating enhanced tools that make metadata submission easier and more efficient for users, and that provide superior functionality for users to manage metadata of their samples in their private workspace MySESAR. For example, SESAR v3 includes a module where users can generate custom spreadsheet templates to enter metadata for their samples, then upload these templates online for sample registration. Once the content of the template is uploaded, it is displayed online in an editable grid format. Validation rules are executed in real-time on the grid data to ensure data integrity. Other new features of SESAR v3 include the capability to transfer ownership of samples to other SESAR users, the ability to upload and store images and other files in a sample metadata profile, and the tracking of changes to sample metadata profiles. In the next version of SESAR (v3.5), we will further improve the discovery, sharing, and registration of samples. For example, we are developing a more comprehensive suite of web services that will allow discovery and registration access to SESAR from external systems. 
Both batch and individual registrations will be possible through web services. Based on valuable feedback from the user community, we will introduce enhancements that add greater flexibility to the system to accommodate the vast diversity of metadata that users want to store. Users will be able to create custom metadata fields and use these for the samples they register. Users will also be able to group samples into 'collections' to make retrieval for research projects or publications easier. An improved interface design will allow for better workflow transition and navigation throughout the application. In keeping up with the demands of a growing community, SESAR has also made process changes to ensure efficiency in system development. For example, we have implemented a release cycle to better track enhancements and fixes to the system, and an API library that facilitates reusability of code. Usage tracking, metrics and surveys capture information to guide the direction of future developments. A new set of administrative tools allows greater control of system management.

  9. Early detection surveillance for an emerging plant pathogen: a rule of thumb to predict prevalence at first discovery.

    PubMed

    Parnell, S; Gottwald, T R; Cunniffe, N J; Alonso Chavez, V; van den Bosch, F

    2015-09-07

    Emerging plant pathogens are a significant problem for conservation and food security. Surveillance is often instigated in an attempt to detect an invading epidemic before it gets out of control. Yet in practice many epidemics are not discovered until already at a high prevalence, partly due to a lack of quantitative understanding of how surveillance effort and the dynamics of an invading epidemic relate. We test a simple rule of thumb to determine, for a surveillance programme taking a fixed number of samples at regular intervals, the distribution of the prevalence an epidemic will have reached on first discovery (discovery-prevalence) and its expectation E(q*). We show that E(q*) = r/(N/Δ), i.e. simply the rate of epidemic growth divided by the rate of sampling; where r is the epidemic growth rate, N is the sample size and Δ is the time between sampling rounds. We demonstrate the robustness of this rule of thumb using spatio-temporal epidemic models as well as data from real epidemics. Our work supports the view that, for the purposes of early detection surveillance, simple models can provide useful insights in apparently complex systems. The insight can inform decisions on surveillance resource allocation in plant health and has potential applicability to invasive species generally. © 2015 The Author(s).
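
    The rule of thumb stated in the abstract above, E(q*) = r/(N/Δ), can be worked through numerically. The following is a minimal sketch, not the authors' code: the function names and the example numbers are hypothetical, and it assumes that r and Δ are expressed in the same time unit.

```python
def expected_discovery_prevalence(growth_rate, sample_size, interval):
    """Rule of thumb E(q*) = r / (N / delta).

    growth_rate: epidemic growth rate r (per unit time)
    sample_size: number of samples N taken in each sampling round
    interval:    time delta between sampling rounds (same time unit as r)
    """
    if growth_rate < 0 or sample_size <= 0 or interval <= 0:
        raise ValueError("need r >= 0, N > 0 and delta > 0")
    sampling_rate = sample_size / interval  # samples per unit time
    return growth_rate / sampling_rate


def sample_size_for_target(growth_rate, interval, target_prevalence):
    """Invert the rule of thumb: N = r * delta / E(q*)."""
    if target_prevalence <= 0:
        raise ValueError("target prevalence must be positive")
    return growth_rate * interval / target_prevalence


# Hypothetical example: r = 0.05 per day, 100 samples every 30 days.
q_star = expected_discovery_prevalence(0.05, 100, 30)
print(f"expected prevalence at first discovery: {q_star:.4f}")
```

    Read this way, halving the interval between rounds or doubling the number of samples per round each halve the expected prevalence at first discovery, because only the sampling rate N/Δ enters the expression.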

  10. Early detection surveillance for an emerging plant pathogen: a rule of thumb to predict prevalence at first discovery

    PubMed Central

    Parnell, S.; Gottwald, T. R.; Cunniffe, N. J.; Alonso Chavez, V.; van den Bosch, F.

    2015-01-01

    Emerging plant pathogens are a significant problem for conservation and food security. Surveillance is often instigated in an attempt to detect an invading epidemic before it gets out of control. Yet in practice many epidemics are not discovered until already at a high prevalence, partly due to a lack of quantitative understanding of how surveillance effort and the dynamics of an invading epidemic relate. We test a simple rule of thumb to determine, for a surveillance programme taking a fixed number of samples at regular intervals, the distribution of the prevalence an epidemic will have reached on first discovery (discovery-prevalence) and its expectation E(q*). We show that E(q*) = r/(N/Δ), i.e. simply the rate of epidemic growth divided by the rate of sampling; where r is the epidemic growth rate, N is the sample size and Δ is the time between sampling rounds. We demonstrate the robustness of this rule of thumb using spatio-temporal epidemic models as well as data from real epidemics. Our work supports the view that, for the purposes of early detection surveillance, simple models can provide useful insights in apparently complex systems. The insight can inform decisions on surveillance resource allocation in plant health and has potential applicability to invasive species generally. PMID:26336177

  11. EarthChem and SESAR: Data Resources and Interoperability for EarthScope Cyberinfrastructure

    NASA Astrophysics Data System (ADS)

    Lehnert, K. A.; Walker, D.; Block, K.; Vinay, S.; Ash, J.

    2008-12-01

    Data management within the EarthScope Cyberinfrastructure needs to pursue two goals in order to advance and maximize the broad scientific application and impact of the large volumes of observational data acquired by EarthScope facilities: (a) to provide access to all data acquired by EarthScope facilities, and to promote their use by broad audiences, and (b) to facilitate discovery of, access to, and integration of multi-disciplinary data sets that complement EarthScope data in support of EarthScope science. EarthChem and SESAR, the System for Earth Sample Registration, are two projects within the Geoinformatics for Geochemistry program that offer resources for EarthScope CI. EarthChem operates a data portal that currently provides access to >13 million analytical values for >600,000 samples, more than half of which are from North America, including data from the USGS and all data from the NAVDAT database, a web-accessible repository for age, chemical and isotopic data from Mesozoic and younger igneous rocks in western North America. The new EarthChem GEOCHRON database will house data collected in association with GeoEarthScope, storing and serving geochronological data submitted by participating facilities. The EarthChem Deep Lithosphere Dataset is a compilation of petrological data for mantle xenoliths, initiated in collaboration with GeoFrame to complement geophysical endeavors within EarthScope science. The EarthChem Geochemical Resource Library provides a home for geochemical and petrological data products and data sets. Parts of the digital data in EarthScope CI refer to physical samples such as drill cores, igneous rocks, or water and gas samples, collected, for example, by SAFOD or by EarthScope science projects and acquired through lab-based analysis. 
Management of sample-based data requires the use of global unique identifiers for samples, so that distributed data for individual samples generated in different labs and published in different papers can be unambiguously linked and integrated. SESAR operates a registry for Earth samples that assigns and administers the International GeoSample Numbers (IGSN) as a global unique identifier for samples. Registration of EarthScope samples with SESAR and use of the IGSN will ensure their unique identification in publications and data systems, thus facilitating interoperability among sample-based data relevant to EarthScope CI and globally. It will also make these samples visible to global audiences via the SESAR Global Sample Catalog.

  12. Combinatorial materials approach to accelerate materials discovery for transportation (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Tong, Wei

    2017-04-01

    Combinatorial material research offers fast and efficient solutions to identify promising and advanced materials. It has revolutionized the pharmaceutical industry and is now being applied to accelerate the discovery of other new compounds, e.g. superconductors, luminescent materials, and catalysts. Differing from the traditional trial-and-error process, this approach allows for the synthesis of a large number of compositionally diverse compounds by varying the combinations of the components and adjusting the ratios. It largely reduces the cost of single-sample synthesis/characterization, along with the turnaround time in the material discovery process, and could therefore dramatically change the existing paradigm for discovering and commercializing new materials. This talk outlines the use of the combinatorial materials approach for material discovery in the transportation sector. It covers a general introduction to the combinatorial material concept and the state of the art for its application in energy-related research. At the end, LBNL capabilities in combinatorial materials synthesis and high-throughput characterization that are applicable to material discovery research will be highlighted.

  13. An Hα-selected sample of cataclysmic variables - I. Observations of newly discovered systems

    NASA Astrophysics Data System (ADS)

    Pretorius, Magaretha L.; Knigge, Christian

    2008-04-01

    Strong selection effects are present in observational samples of cataclysmic variables (CVs), complicating comparisons to theoretical predictions. The selection criteria used to define most CV samples discriminate heavily against the discovery of short-period, intrinsically faint systems. The situation can be improved by selecting CVs for the presence of emission lines. For this reason, we have constructed a homogeneous sample of CVs selected on the basis of Hα emission. We present discovery observations of the 14 CVs and two additional CV candidates found in this search. The orbital periods of 11 of the new CVs were measured; all are above 3 h. There are two eclipsing systems in the sample, and one in which we observed a quasi-periodic modulation on a ~1000 s time-scale. We also detect the secondary star in the spectrum of one system, and measure its spectral type. Several of the new CVs have the spectroscopic appearance of nova-like variables, and a few display what may be SW Sex star behaviour. In a companion paper, we discuss the implications of this new sample for CV evolution.

  14. Characterization of a Genomic Signature of Pregnancy in the Breast

    PubMed Central

    Belitskaya-Lévy, Ilana; Zeleniuch-Jacquotte, Anne; Russo, Jose; Russo, Irma H.; Bordás, Pal; Åhman, Janet; Afanasyeva, Yelena; Johansson, Robert; Lenner, Per; Li, Xiaochun; de Cicco, Ricardo López; Peri, Suraj; Ross, Eric; Russo, Patricia A.; Santucci-Pereira, Julia; Sheriff, Fathima S.; Slifker, Michael; Hallmans, Göran; Toniolo, Paolo; Arslan, Alan A.

    2012-01-01

    The objective of the current study was to comprehensively compare the genomic profiles in the breast of parous and nulliparous postmenopausal women to identify genes that permanently change their expression following pregnancy. The study was designed as a two-phase approach. In the discovery phase, we compared breast genomic profiles of 37 parous with 18 nulliparous postmenopausal women. In the validation phase, confirmation of the genomic patterns observed in the discovery phase was sought in an independent set of 30 parous and 22 nulliparous postmenopausal women. RNA was hybridized to Affymetrix HG_U133 Plus 2.0 oligonucleotide arrays containing probes to 54,675 transcripts; the arrays were scanned and the images analyzed using Affymetrix GCOS software. Surrogate variable analysis, logistic regression and significance analysis for microarrays were used to identify statistically significant differences in expression of genes. The False Discovery Rate (FDR) approach was used to control for multiple comparisons. We found that 208 genes (305 probe sets) were differentially expressed between parous and nulliparous women in both discovery and validation phases of the study at an FDR of 10% and with at least a 1.25-fold change. These genes are involved in regulation of transcription, centrosome organization, RNA splicing, cell cycle control, adhesion and differentiation. The results provide persuasive evidence that full-term pregnancy induces long-term genomic changes in the breast. The genomic signature of pregnancy could be used as an intermediate marker to assess potential chemopreventive interventions with hormones mimicking the effects of pregnancy for prevention of breast cancer. PMID:21622728
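
    The selection criteria quoted in the abstract above (an FDR of 10% combined with at least a 1.25-fold change) can be illustrated with a short sketch. The abstract does not say which FDR procedure was applied, so the Benjamini-Hochberg step-up procedure is assumed here purely for illustration; the function names and the example values are hypothetical and are not data from the study.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return one boolean per p-value: True if rejected at the given FDR.

    Benjamini-Hochberg step-up: find the largest rank k such that
    p_(k) <= (k / m) * fdr, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def select_features(p_values, fold_changes, fdr=0.10, min_fold=1.25):
    """Keep indices passing both the FDR cut and the fold-change cut.

    A fold change counts in either direction, so 0.8 (= 1/1.25) passes
    a 1.25-fold threshold just as 1.25 does.
    """
    significant = benjamini_hochberg(p_values, fdr)
    return [
        i for i, (sig, fc) in enumerate(zip(significant, fold_changes))
        if sig and max(fc, 1.0 / fc) >= min_fold
    ]


# Hypothetical p-values and fold changes for ten probe sets.
p = [0.205, 0.001, 0.074, 0.008, 0.212, 0.039, 0.065, 0.041, 0.042, 0.216]
fc = [1.10, 1.60, 1.30, 0.70, 1.05, 1.20, 1.40, 1.30, 0.50, 1.00]
print(select_features(p, fc))
```

    Applying the fold-change filter after the FDR step, as sketched here, is one possible ordering; the abstract does not state how the two criteria were combined across the discovery and validation phases.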

  15. A serpentinite-hosted ecosystem in the Southern Mariana Forearc

    NASA Astrophysics Data System (ADS)

    Ohara, Yasuhiko; Reagan, Mark K.; Fujikura, Katsunori; Watanabe, Hiromi; Michibayashi, Katsuyoshi; Ishii, Teruaki; Stern, Robert J.; Pujana, Ignacio; Martinez, Fernando; Girard, Guillaume; Ribeiro, Julia; Brounce, Maryjo; Komori, Naoaki; Kino, Masashi

    2012-02-01

    Several varieties of seafloor hydrothermal vents with widely varying fluid compositions and temperatures and vent communities occur in different tectonic settings. The discovery of the Lost City hydrothermal field in the Mid-Atlantic Ridge has stimulated interest in the role of serpentinization of peridotite in generating H2- and CH4-rich fluids and associated carbonate chimneys, as well as in the biological communities supported in highly reduced, alkaline environments. Abundant vesicomyid clam communities associated with a serpentinite-hosted hydrothermal vent system in the southern Mariana forearc were discovered during a DSV Shinkai 6500 dive in September 2010. We named this system the "Shinkai Seep Field (SSF)." The SSF appears to be a serpentinite-hosted ecosystem within a forearc (convergent margin) setting that is supported by fault-controlled fluid pathways connected to the decollement of the subducting slab. The discovery of the SSF supports the prediction that serpentinite-hosted vents may be widespread on the ocean floor. The discovery further indicates that these serpentinite-hosted low-temperature fluid vents can sustain high-biomass communities and has implications for the chemical budget of the oceans and the distribution of abyssal chemosynthetic life.

  16. Targeting legume loci: A comparison of three methods for target enrichment bait design in Leguminosae phylogenomics.

    PubMed

    Vatanparast, Mohammad; Powell, Adrian; Doyle, Jeff J; Egan, Ashley N

    2018-03-01

    The development of pipelines for locus discovery has spurred the use of target enrichment for plant phylogenomics. However, few studies have compared pipelines from locus discovery and bait design, through validation, to tree inference. We compared three methods within Leguminosae (Fabaceae) and present a workflow for future efforts. Using 30 transcriptomes, we compared Hyb-Seq, MarkerMiner, and the Yang and Smith (Y&S) pipelines for locus discovery, validated 7501 baits targeting 507 loci across 25 genera via Illumina sequencing, and inferred gene and species trees via concatenation- and coalescent-based methods. Hyb-Seq discovered loci with the longest mean length. MarkerMiner discovered the most conserved loci with the least flagged as paralogous. Y&S offered the most parsimony-informative sites and putative orthologs. Target recovery averaged 93% across taxa. We optimized our targeted locus set based on a workflow designed to minimize paralog/ortholog conflation and thus present 423 loci for legume phylogenomics. Methods differed across criteria important for phylogenetic marker development. We recommend Hyb-Seq as a method that may be useful for most phylogenomic projects. Our targeted locus set is a resource for future, community-driven efforts to reconstruct the legume tree of life.

  17. Easing the Discovery of NASA and International Near-Real-Time Data Using the Global Change Master Directory

    NASA Astrophysics Data System (ADS)

    Ritz, S.; Olsen, L. M.; Morahan, M.; Stevens, T.; Aleman, A.; Grebas, S. K.

    2011-12-01

    The Global Change Master Directory (GCMD) provides an extensive directory of descriptive and spatial information about data sets and data-related services, which are relevant to Earth science research. The directory's data discovery components include controlled keywords, free-text searches, and map/date searches. The GCMD portal for NASA's Land Atmosphere Near-real-time Capability for EOS (LANCE) data products leverages these discovery features by providing users a direct route to NASA's Near-Real-Time (NRT) collections. This portal offers direct access to collection entries by instrument name, informing users of the availability of data. After a relevant collection entry is found through the GCMD's search components, the "Get Data" URL within the entry directs the user to the desired data. Building on the importance of Near-Real-Time (NRT) data, the Committee on Earth Observation Satellites (CEOS) International Directory Network (IDN) is targeting an effort to identify NRT data set collections from the CEOS international members. The international collections will be advertised as the "CEOS IDN NRT" portal to assist users in rapidly discovering these products, which are potentially useful for their research or public response. [This portal is expected to be released in 2012.]

  18. The battle of Alzheimer's Disease - the beginning of the future Unleashing the potential of academic discoveries.

    PubMed

    Lundkvist, Johan; Halldin, Magnus M; Sandin, Johan; Nordvall, Gunnar; Forsell, Pontus; Svensson, Samuel; Jansson, Liselotte; Johansson, Gunilla; Winblad, Bengt; Ekstrand, Jonas

    2014-01-01

    Alzheimer's Disease (AD) is the most common form of dementia, affecting approximately 36 million people worldwide. To date there is no preventive or curative treatment available for AD, and in the absence of major progress in therapeutic development, AD poses a concrete socioeconomic threat. The awareness of the growing problem of AD is increasing, exemplified by the recent G8 Dementia Summit, a meeting held in order to set the stage and steer the compass for the future. Simultaneously, and paradoxically, we have seen key players in the pharmaceutical industry recently close or significantly decrease their R&D spending on AD and other CNS disorders. Given the pressing need for new treatments in this area, other actors need to step in and enter this drug discovery arena, complementing the industrial efforts, in order to turn biological and technological progress into novel therapeutics. In this article, we present an example of a novel drug discovery initiative that, in a non-profit setting, aims to integrate with both preclinical and clinical academic groups and the pharmaceutical industry to explore the therapeutic potential of new concepts in patients, using novel biology, state-of-the-art technologies and rapid concept testing.

  19. A serpentinite-hosted ecosystem in the Southern Mariana Forearc

    PubMed Central

    Ohara, Yasuhiko; Reagan, Mark K.; Fujikura, Katsunori; Watanabe, Hiromi; Michibayashi, Katsuyoshi; Ishii, Teruaki; Stern, Robert J.; Pujana, Ignacio; Martinez, Fernando; Girard, Guillaume; Ribeiro, Julia; Brounce, Maryjo; Komori, Naoaki; Kino, Masashi

    2012-01-01

    Several varieties of seafloor hydrothermal vents with widely varying fluid compositions and temperatures and vent communities occur in different tectonic settings. The discovery of the Lost City hydrothermal field in the Mid-Atlantic Ridge has stimulated interest in the role of serpentinization of peridotite in generating H2- and CH4-rich fluids and associated carbonate chimneys, as well as in the biological communities supported in highly reduced, alkaline environments. Abundant vesicomyid clam communities associated with a serpentinite-hosted hydrothermal vent system in the southern Mariana forearc were discovered during a DSV Shinkai 6500 dive in September 2010. We named this system the “Shinkai Seep Field (SSF).” The SSF appears to be a serpentinite-hosted ecosystem within a forearc (convergent margin) setting that is supported by fault-controlled fluid pathways connected to the decollement of the subducting slab. The discovery of the SSF supports the prediction that serpentinite-hosted vents may be widespread on the ocean floor. The discovery further indicates that these serpentinite-hosted low-temperature fluid vents can sustain high-biomass communities and has implications for the chemical budget of the oceans and the distribution of abyssal chemosynthetic life. PMID:22323611

  20. A serpentinite-hosted ecosystem in the Southern Mariana Forearc.

    PubMed

    Ohara, Yasuhiko; Reagan, Mark K; Fujikura, Katsunori; Watanabe, Hiromi; Michibayashi, Katsuyoshi; Ishii, Teruaki; Stern, Robert J; Pujana, Ignacio; Martinez, Fernando; Girard, Guillaume; Ribeiro, Julia; Brounce, Maryjo; Komori, Naoaki; Kino, Masashi

    2012-02-21

    Several varieties of seafloor hydrothermal vents with widely varying fluid compositions and temperatures and vent communities occur in different tectonic settings. The discovery of the Lost City hydrothermal field in the Mid-Atlantic Ridge has stimulated interest in the role of serpentinization of peridotite in generating H2- and CH4-rich fluids and associated carbonate chimneys, as well as in the biological communities supported in highly reduced, alkaline environments. Abundant vesicomyid clam communities associated with a serpentinite-hosted hydrothermal vent system in the southern Mariana forearc were discovered during a DSV Shinkai 6500 dive in September 2010. We named this system the "Shinkai Seep Field (SSF)." The SSF appears to be a serpentinite-hosted ecosystem within a forearc (convergent margin) setting that is supported by fault-controlled fluid pathways connected to the decollement of the subducting slab. The discovery of the SSF supports the prediction that serpentinite-hosted vents may be widespread on the ocean floor. The discovery further indicates that these serpentinite-hosted low-temperature fluid vents can sustain high-biomass communities and has implications for the chemical budget of the oceans and the distribution of abyssal chemosynthetic life.

  1. Discovery Orbiter Major Modifications

    NASA Image and Video Library

    2003-08-27

    In the Vehicle Assembly Building, Jim Landy, NDE specialist, sets up flight crew lockers for flash thermography. He is screening the lockers for hidden damage underneath dings and dents that might occur during handling.

  2. Spotlight on Fluorescent Biosensors—Tools for Diagnostics and Drug Discovery

    PubMed Central

    2013-01-01

    Fluorescent biosensors constitute potent tools for probing biomolecules in their natural environment and for visualizing dynamic processes in complex biological samples, living cells, and organisms. They are well suited for highlighting molecular alterations associated with pathological disorders, thereby offering means of implementing sensitive and alternative technologies for diagnostic purposes. They constitute attractive tools for drug discovery programs, from high throughput screening assays to preclinical studies. PMID:24900780

  3. A Discovery-Based Hydrochlorination of Carvone Utilizing a Guided-Inquiry Approach to Determine the Product Structure from [superscript 13]C NMR Spectra

    ERIC Educational Resources Information Center

    Pelter, Michael W.; Walker, Natalie M.

    2012-01-01

    This experiment describes a discovery-based method for the regio- and stereoselective hydrochlorination of carvone, appropriate for a 3-h second-semester organic chemistry laboratory. The product is identified through interpretation of the [superscript 13]C NMR and DEPT spectra, which are obtained on an Anasazi EFT-60 at 15 MHz as neat samples. A…

  4. Large-scale serum protein biomarker discovery in Duchenne muscular dystrophy.

    PubMed

    Hathout, Yetrib; Brody, Edward; Clemens, Paula R; Cripe, Linda; DeLisle, Robert Kirk; Furlong, Pat; Gordish-Dressman, Heather; Hache, Lauren; Henricson, Erik; Hoffman, Eric P; Kobayashi, Yvonne Monique; Lorts, Angela; Mah, Jean K; McDonald, Craig; Mehler, Bob; Nelson, Sally; Nikrad, Malti; Singer, Britta; Steele, Fintan; Sterling, David; Sweeney, H Lee; Williams, Steve; Gold, Larry

    2015-06-09

    Serum biomarkers in Duchenne muscular dystrophy (DMD) may provide deeper insights into disease pathogenesis, suggest new therapeutic approaches, serve as acute read-outs of drug effects, and be useful as surrogate outcome measures to predict later clinical benefit. In this study a large-scale biomarker discovery was performed on serum samples from patients with DMD and age-matched healthy volunteers using a modified aptamer-based proteomics technology. Levels of 1,125 proteins were quantified in serum samples from two independent DMD cohorts: cohort 1 (The Parent Project Muscular Dystrophy-Cincinnati Children's Hospital Medical Center), 42 patients with DMD and 28 age-matched normal volunteers; and cohort 2 (The Cooperative International Neuromuscular Research Group, Duchenne Natural History Study), 51 patients with DMD and 17 age-matched normal volunteers. Forty-four proteins showed significant differences that were consistent in both cohorts when comparing DMD patients and healthy volunteers at a 1% false-discovery rate, a large number of significant protein changes for such a small study. These biomarkers can be classified by known cellular processes and by age-dependent changes in protein concentration. Our findings both demonstrate the utility of this unbiased biomarker discovery approach and suggest potential new diagnostic and therapeutic avenues for ameliorating the burden of DMD and, we hope, other rare and devastating diseases.

  5. Enrichment of putative PAX8 target genes at serous epithelial ovarian cancer susceptibility loci.

    PubMed

    Kar, Siddhartha P; Adler, Emily; Tyrer, Jonathan; Hazelett, Dennis; Anton-Culver, Hoda; Bandera, Elisa V; Beckmann, Matthias W; Berchuck, Andrew; Bogdanova, Natalia; Brinton, Louise; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Cook, Linda S; Cramer, Daniel W; Cunningham, Julie M; Dansonka-Mieszkowska, Agnieszka; Doherty, Jennifer Anne; Dörk, Thilo; Dürst, Matthias; Eccles, Diana; Fasching, Peter A; Flanagan, James; Gentry-Maharaj, Aleksandra; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Gronwald, Jacek; Heitz, Florian; Hildebrandt, Michelle A T; Høgdall, Estrid; Høgdall, Claus K; Huntsman, David G; Jensen, Allan; Karlan, Beth Y; Kelemen, Linda E; Kiemeney, Lambertus A; Kjaer, Susanne K; Kupryjanczyk, Jolanta; Lambrechts, Diether; Levine, Douglas A; Li, Qiyuan; Lissowska, Jolanta; Lu, Karen H; Lubiński, Jan; Massuger, Leon F A G; McGuire, Valerie; McNeish, Iain; Menon, Usha; Modugno, Francesmary; Monteiro, Alvaro N; Moysich, Kirsten B; Ness, Roberta B; Nevanlinna, Heli; Paul, James; Pearce, Celeste L; Pejovic, Tanja; Permuth, Jennifer B; Phelan, Catherine; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rossing, Mary Anne; Salvesen, Helga B; Schildkraut, Joellen M; Sellers, Thomas A; Sherman, Mark; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa; Terry, Kathryn L; Tworoger, Shelley S; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wu, Anna H; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P; Lawrenson, Kate

    2017-02-14

    Genome-wide association studies (GWAS) have identified 18 loci associated with serous ovarian cancer (SOC) susceptibility but the biological mechanisms driving these findings remain poorly characterised. Germline cancer risk loci may be enriched for target genes of transcription factors (TFs) critical to somatic tumorigenesis. All 615 TF-target sets from the Molecular Signatures Database were evaluated using gene set enrichment analysis (GSEA) and three GWAS for SOC risk: discovery (2196 cases/4396 controls), replication (7035 cases/21 693 controls; independent from discovery), and combined (9627 cases/30 845 controls; including additional individuals). The PAX8-target gene set was ranked 1/615 in the discovery (P_GSEA<0.001; FDR=0.21), 7/615 in the replication (P_GSEA=0.004; FDR=0.37), and 1/615 in the combined (P_GSEA<0.001; FDR=0.21) studies. Adding other genes reported to interact with PAX8 in the literature to the PAX8-target set and applying an alternative to GSEA, interval enrichment, further confirmed this association (P=0.006). Fifteen of the 157 genes from this expanded PAX8 pathway were near eight loci associated with SOC risk at P<10⁻⁵ (including six with P<5 × 10⁻⁸). The pathway was also associated with differential gene expression after shRNA-mediated silencing of PAX8 in HeyA8 (P_GSEA=0.025) and IGROV1 (P_GSEA=0.004) SOC cells and several PAX8 targets near SOC risk loci demonstrated in vitro transcriptomic perturbation. Putative PAX8 target genes are enriched for common SOC risk variants. This finding from our agnostic evaluation is of particular interest given that PAX8 is well-established as a specific marker for the cell of origin of SOC.

  6. Enrichment of putative PAX8 target genes at serous epithelial ovarian cancer susceptibility loci

    PubMed Central

    Kar, Siddhartha P; Adler, Emily; Tyrer, Jonathan; Hazelett, Dennis; Anton-Culver, Hoda; Bandera, Elisa V; Beckmann, Matthias W; Berchuck, Andrew; Bogdanova, Natalia; Brinton, Louise; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Cook, Linda S; Cramer, Daniel W; Cunningham, Julie M; Dansonka-Mieszkowska, Agnieszka; Doherty, Jennifer Anne; Dörk, Thilo; Dürst, Matthias; Eccles, Diana; Fasching, Peter A; Flanagan, James; Gentry-Maharaj, Aleksandra; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Gronwald, Jacek; Heitz, Florian; Hildebrandt, Michelle A T; Høgdall, Estrid; Høgdall, Claus K; Huntsman, David G; Jensen, Allan; Karlan, Beth Y; Kelemen, Linda E; Kiemeney, Lambertus A; Kjaer, Susanne K; Kupryjanczyk, Jolanta; Lambrechts, Diether; Levine, Douglas A; Li, Qiyuan; Lissowska, Jolanta; Lu, Karen H; Lubiński, Jan; Massuger, Leon F A G; McGuire, Valerie; McNeish, Iain; Menon, Usha; Modugno, Francesmary; Monteiro, Alvaro N; Moysich, Kirsten B; Ness, Roberta B; Nevanlinna, Heli; Paul, James; Pearce, Celeste L; Pejovic, Tanja; Permuth, Jennifer B; Phelan, Catherine; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rossing, Mary Anne; Salvesen, Helga B; Schildkraut, Joellen M; Sellers, Thomas A; Sherman, Mark; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa; Terry, Kathryn L; Tworoger, Shelley S; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wu, Anna H; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P; Lawrenson, Kate

    2017-01-01

    Background: Genome-wide association studies (GWAS) have identified 18 loci associated with serous ovarian cancer (SOC) susceptibility but the biological mechanisms driving these findings remain poorly characterised. Germline cancer risk loci may be enriched for target genes of transcription factors (TFs) critical to somatic tumorigenesis. Methods: All 615 TF-target sets from the Molecular Signatures Database were evaluated using gene set enrichment analysis (GSEA) and three GWAS for SOC risk: discovery (2196 cases/4396 controls), replication (7035 cases/21 693 controls; independent from discovery), and combined (9627 cases/30 845 controls; including additional individuals). Results: The PAX8-target gene set was ranked 1/615 in the discovery (P_GSEA<0.001; FDR=0.21), 7/615 in the replication (P_GSEA=0.004; FDR=0.37), and 1/615 in the combined (P_GSEA<0.001; FDR=0.21) studies. Adding other genes reported to interact with PAX8 in the literature to the PAX8-target set and applying an alternative to GSEA, interval enrichment, further confirmed this association (P=0.006). Fifteen of the 157 genes from this expanded PAX8 pathway were near eight loci associated with SOC risk at P<10⁻⁵ (including six with P<5 × 10⁻⁸). The pathway was also associated with differential gene expression after shRNA-mediated silencing of PAX8 in HeyA8 (P_GSEA=0.025) and IGROV1 (P_GSEA=0.004) SOC cells and several PAX8 targets near SOC risk loci demonstrated in vitro transcriptomic perturbation. Conclusions: Putative PAX8 target genes are enriched for common SOC risk variants. This finding from our agnostic evaluation is of particular interest given that PAX8 is well-established as a specific marker for the cell of origin of SOC. PMID:28103614

  7. Causal discovery and inference: concepts and recent methodological advances.

    PubMed

    Spirtes, Peter; Zhang, Kun

    This paper aims to give a broad coverage of central concepts and principles involved in automated causal inference and emerging approaches to causal discovery from i.i.d. data and from time series. After reviewing concepts including manipulations, causal models, sample predictive modeling, causal predictive modeling, and structural equation models, we present the constraint-based approach to causal discovery, which relies on the conditional independence relationships in the data, and discuss the assumptions underlying its validity. We then focus on causal discovery based on structural equation models, in which a key issue is the identifiability of the causal structure implied by appropriately defined structural equation models: in the two-variable case, under what conditions (and why) is the causal direction between the two variables identifiable? We show that the independence between the error term and causes, together with appropriate structural constraints on the structural equation, makes it possible. Next, we report some recent advances in causal discovery from time series. Assuming that the causal relations are linear with non-Gaussian noise, we mention two problems which are traditionally difficult to solve, namely causal discovery from subsampled data and that in the presence of confounding time series. Finally, we list a number of open questions in the field of causal discovery and inference.

  8. Activation of the Endogenous Renin-Angiotensin-Aldosterone System or Aldosterone Administration Increases Urinary Exosomal Sodium Channel Excretion.

    PubMed

    Qi, Ying; Wang, Xiaojing; Rose, Kristie L; MacDonald, W Hayes; Zhang, Bing; Schey, Kevin L; Luther, James M

    2016-02-01

    Urinary exosomes secreted by multiple cell types in the kidney may participate in intercellular signaling and provide an enriched source of kidney-specific proteins for biomarker discovery. Factors that alter the exosomal protein content remain unknown. To determine whether endogenous and exogenous hormones modify urinary exosomal protein content, we analyzed samples from 14 mildly hypertensive patients in a crossover study during a high-sodium (HS, 160 mmol/d) diet and low-sodium (LS, 20 mmol/d) diet to activate the endogenous renin-angiotensin-aldosterone system. We further analyzed selected exosomal protein content in a separate cohort of healthy persons receiving intravenous aldosterone (0.7 μg/kg per hour for 10 hours) versus vehicle infusion. The LS diet increased plasma renin activity and aldosterone concentration, whereas aldosterone infusion increased only aldosterone concentration. Protein analysis of paired urine exosome samples by liquid chromatography-tandem mass spectrometry-based multidimensional protein identification technology detected 2775 unique proteins, of which 316 exhibited significantly altered abundance during LS diet. Sodium chloride cotransporter (NCC) and α- and γ-epithelial sodium channel (ENaC) subunits from the discovery set were verified using targeted multiple reaction monitoring mass spectrometry quantified with isotope-labeled peptide standards. Dietary sodium restriction or acute aldosterone infusion similarly increased urine exosomal γENaC[112-122] peptide concentrations nearly 20-fold, which correlated with plasma aldosterone concentration and urinary Na/K ratio. Urine exosomal NCC and αENaC concentrations were relatively unchanged during these interventions. We conclude that urinary exosome content is altered by renin-angiotensin-aldosterone system activation. Urinary measurement of exosomal γENaC[112-122] concentration may provide a useful biomarker of ENaC activation in future clinical studies. 
Copyright © 2016 by the American Society of Nephrology.

  9. Activation of the Endogenous Renin-Angiotensin-Aldosterone System or Aldosterone Administration Increases Urinary Exosomal Sodium Channel Excretion

    PubMed Central

    Qi, Ying; Wang, Xiaojing; Rose, Kristie L.; MacDonald, W. Hayes; Zhang, Bing; Schey, Kevin L.

    2016-01-01

    Urinary exosomes secreted by multiple cell types in the kidney may participate in intercellular signaling and provide an enriched source of kidney-specific proteins for biomarker discovery. Factors that alter the exosomal protein content remain unknown. To determine whether endogenous and exogenous hormones modify urinary exosomal protein content, we analyzed samples from 14 mildly hypertensive patients in a crossover study during a high-sodium (HS, 160 mmol/d) diet and low-sodium (LS, 20 mmol/d) diet to activate the endogenous renin-angiotensin-aldosterone system. We further analyzed selected exosomal protein content in a separate cohort of healthy persons receiving intravenous aldosterone (0.7 μg/kg per hour for 10 hours) versus vehicle infusion. The LS diet increased plasma renin activity and aldosterone concentration, whereas aldosterone infusion increased only aldosterone concentration. Protein analysis of paired urine exosome samples by liquid chromatography-tandem mass spectrometry–based multidimensional protein identification technology detected 2775 unique proteins, of which 316 exhibited significantly altered abundance during LS diet. Sodium chloride cotransporter (NCC) and α- and γ-epithelial sodium channel (ENaC) subunits from the discovery set were verified using targeted multiple reaction monitoring mass spectrometry quantified with isotope-labeled peptide standards. Dietary sodium restriction or acute aldosterone infusion similarly increased urine exosomal γENaC[112–122] peptide concentrations nearly 20-fold, which correlated with plasma aldosterone concentration and urinary Na/K ratio. Urine exosomal NCC and αENaC concentrations were relatively unchanged during these interventions. We conclude that urinary exosome content is altered by renin-angiotensin-aldosterone system activation. Urinary measurement of exosomal γENaC[112–122] concentration may provide a useful biomarker of ENaC activation in future clinical studies. PMID:26113616

  10. An interactive web application for the dissemination of human systems immunology data.

    PubMed

    Speake, Cate; Presnell, Scott; Domico, Kelly; Zeitner, Brad; Bjork, Anna; Anderson, David; Mason, Michael J; Whalen, Elizabeth; Vargas, Olivia; Popov, Dimitry; Rinchai, Darawan; Jourde-Chiche, Noemie; Chiche, Laurent; Quinn, Charlie; Chaussabel, Damien

    2015-06-19

    Systems immunology approaches have proven invaluable in translational research settings. The current rate at which large-scale datasets are generated presents unique challenges and opportunities. Mining aggregates of these datasets could accelerate the pace of discovery, but new solutions are needed to integrate the heterogeneous data types with the contextual information that is necessary for interpretation. In addition, enabling tools and technologies facilitating investigators' interaction with large-scale datasets must be developed in order to promote insight and foster knowledge discovery. State-of-the-art application programming was employed to develop an interactive web application for browsing and visualizing large and complex datasets. A collection of human immune transcriptome datasets was loaded alongside contextual information about the samples. We provide a resource enabling interactive query and navigation of transcriptome datasets relevant to human immunology research. Detailed information about studies and samples is displayed dynamically; if desired, the associated data can be downloaded. Custom interactive visualizations of the data can be shared via email or social media. This application can be used to browse context-rich systems-scale data within and across systems immunology studies. This resource is publicly available online at the Gene Expression Browser landing page (https://gxb.benaroyaresearch.org/dm3/landing.gsp). The source code is also openly available (https://github.com/BenaroyaResearch/gxbrowser). We have developed a data browsing and visualization application capable of navigating increasingly large and complex datasets generated in the context of immunological studies. This intuitive tool ensures that, whether taken individually or as a whole, such datasets generated at great effort and expense remain interpretable and a ready source of insight for years to come.

  11. Human cervicovaginal fluid biomarkers to predict term and preterm labor

    PubMed Central

    Heng, Yujing J.; Liong, Stella; Permezel, Michael; Rice, Gregory E.; Di Quinzio, Megan K. W.; Georgiou, Harry M.

    2015-01-01

    Preterm birth (PTB; birth before 37 completed weeks of gestation) remains the major cause of neonatal morbidity and mortality. The current generation of biomarkers predictive of PTB has limited utility. In pregnancy, the human cervicovaginal fluid (CVF) proteome is a reflection of the local biochemical milieu and is influenced by the physical changes occurring in the vagina, cervix and adjacent overlying fetal membranes. Term and preterm labor (PTL) share common pathways of cervical ripening, myometrial activation and fetal membrane rupture leading to birth. We therefore hypothesize that CVF biomarkers predictive of labor may be similar in both the term and preterm labor setting. In this review, we summarize some of the existing published literature as well as our team's breadth of work utilizing the CVF for the discovery and validation of putative CVF biomarkers predictive of human labor. Our team established an efficient method for collecting serial CVF samples for optimal 2-dimensional gel electrophoresis resolution and analysis. We first embarked on CVF biomarker discovery for the prediction of spontaneous onset of term labor using 2D-electrophoresis and solution array multiple analyte profiling. 2D-electrophoretic analyses were subsequently performed on CVF samples associated with PTB. Several proteins have been successfully validated, demonstrating that these biomarkers are associated with term labor and PTL and may be predictive of both. In addition, the measurement of these putative biomarkers was found to be robust to the influences of vaginal microflora and/or semen. The future development of a multiple biomarker bed-side test would help improve the prediction of PTB and the clinical management of patients. PMID:26029118

  12. Presence of Human Hepegivirus-1 in a Cohort of People Who Inject Drugs

    PubMed Central

    Kandathil, Abraham J.; Breitwieser, Florian P.; Sachithanandham, Jaiprasath; Robinson, Matthew; Mehta, Shruti H.; Timp, Winston; Salzberg, Steven L.; Thomas, David L.

    2017-01-01

    Background: Next-generation metagenomic sequencing (NGMS) has opened new frontiers in microbial discovery but has been clinically characterized in only a few settings. Objective: To explore the plasma virome of persons who inject drugs while characterizing the sensitivity and accuracy of NGMS compared to quantitative clinical standards. Design: Longitudinal and cross-sectional studies. Participants: Hepatitis C virus (HCV) and human immunodeficiency virus (HIV) co-infected persons enrolled in a clinical trial (NCT01285050) and/or a well-characterized cohort study of persons who have injected drugs. Measurements: Viral nucleic acid in plasma by NGMS and quantitative polymerase chain reaction (PCR). Results: NGMS generated a total of 600 million reads, which included the expected HIV and HCV RNA sequences. HIV and HCV reads were consistently identified only when samples contained >10,000 copies or IU/ml as determined by quantitative PCR. A novel RNA virus, human hepegivirus-1 (HHpgV-1), was also detected by NGMS in 4 samples from 2 persons in the clinical trial. Because of this unexpected finding, a quantitative PCR assay for HHpgV-1 was used, and infection was also detected in 17 (10.9%) of 156 members of a cohort of persons who injected drugs, in whom HHpgV-1 viremia persisted for a median of at least 4538 days and was associated with detection of other bloodborne viruses such as HCV RNA and SEN virus D. Limitations: The medical importance of HHpgV-1 infection remains unknown. Conclusions: Although NGMS is insensitive to detection of viruses with relatively low plasma nucleic acid concentrations, it may have a broad potential for discovery of new viral infections of potential medical importance like HHpgV-1. PMID:28586923

  13. Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gardner, Shea N.; Jaing, Crystal J.; Elsheikh, Maher M.

    Background: Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results: A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Each group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions: This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.

  14. Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes

    DOE PAGES

    Gardner, Shea N.; Jaing, Crystal J.; Elsheikh, Maher M.; ...

    2014-01-01

    Background: Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results: A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Each group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions: This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.

  15. User Driven Development of Software Tools for Open Data Discovery and Exploration

    NASA Astrophysics Data System (ADS)

    Schlobinski, Sascha; Keppel, Frank; Dihe, Pascal; Boot, Gerben; Falkenroth, Esa

    2016-04-01

    The use of open data in research faces challenges that are not restricted to inherent properties such as the quality and resolution of open data sets. Open data is often insufficiently catalogued or fragmented. Software tools that support effective discovery, including the assessment of the data's appropriateness for research, have shortcomings such as the lack of essential functionality like support for data provenance. We believe that one of the reasons is the neglect of real end users' requirements in the development process of these software tools. In the context of the FP7 Switch-On project we have pro-actively engaged the relevant user community to collaboratively develop a means to publish, find and bind open data relevant for hydrologic research. Implementing key concepts of data discovery and exploration, we have used state-of-the-art web technologies to provide an interactive software tool that is easy to use yet powerful enough to satisfy the data discovery and access requirements of the hydrological research community.

  16. Object-graphs for context-aware visual category discovery.

    PubMed

    Lee, Yong Jae; Grauman, Kristen

    2012-02-01

    How can knowing about some categories help us to discover new ones in unlabeled images? Unsupervised visual category discovery is useful to mine for recurring objects without human supervision, but existing methods assume no prior information and thus tend to perform poorly for cluttered scenes with multiple objects. We propose to leverage knowledge about previously learned categories to enable more accurate discovery, and address challenges in estimating their familiarity in unsegmented, unlabeled images. We introduce two variants of a novel object-graph descriptor to encode the 2D and 3D spatial layout of object-level co-occurrence patterns relative to an unfamiliar region and show that by using them to model the interaction between an image’s known and unknown objects, we can better detect new visual categories. Rather than mine for all categories from scratch, our method identifies new objects while drawing on useful cues from familiar ones. We evaluate our approach on several benchmark data sets and demonstrate clear improvements in discovery over conventional purely appearance-based baselines.

  17. AFLP fragment isolation technique as a method to produce random sequences for single nucleotide polymorphism discovery in the green turtle, Chelonia mydas.

    PubMed

    Roden, Suzanne E; Dutton, Peter H; Morin, Phillip A

    2009-01-01

    The green sea turtle, Chelonia mydas, was used as a case study for single nucleotide polymorphism (SNP) discovery in a species that has little genetic sequence information available. As green turtles have a complex population structure, additional nuclear markers other than microsatellites could add to our understanding of their complex life history. Amplified fragment length polymorphism technique was used to generate sets of random fragments of genomic DNA, which were then electrophoretically separated with precast gels, stained with SYBR green, excised, and directly sequenced. It was possible to perform this method without the use of polyacrylamide gels, radioactive or fluorescent labeled primers, or hybridization methods, reducing the time, expense, and safety hazards of SNP discovery. Within 13 loci, 2547 base pairs were screened, resulting in the discovery of 35 SNPs. Using this method, it was possible to yield a sufficient number of loci to screen for SNP markers without the availability of prior sequence information.

  18. The Circle of Apollonius: A Discovery Activity.

    ERIC Educational Resources Information Center

    Cain, Ralph W.

    1994-01-01

    Presents an activity using simple constructions and a knowledge of proportions to discover that the sets of points generated by the described procedures are circles. Presents a proof of the result. (Author/MKR)
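
    The record above does not spell out the construction it uses. As a rough illustration only, the sketch below assumes the classical definition of the circle of Apollonius: the set of points P whose distances to two fixed points A and B satisfy PA/PB = k, with k positive and not equal to 1. The function names, the sample points and the ratio are hypothetical and are not taken from the article.

```python
import math


def apollonius_circle(a, b, k):
    """Return (center, radius) of the locus of points P with PA / PB == k.

    Assumes the classical fixed-ratio definition; a and b are (x, y) tuples.
    For k == 1 the locus is a straight line (the perpendicular bisector),
    so that case is rejected.
    """
    if k <= 0 or math.isclose(k, 1.0):
        raise ValueError("k must be positive and different from 1")
    d = 1.0 - k * k
    # Expanding |P - A|^2 = k^2 |P - B|^2 and completing the square gives:
    center = ((a[0] - k * k * b[0]) / d, (a[1] - k * k * b[1]) / d)
    radius = k * math.dist(a, b) / abs(d)
    return center, radius


def distance_ratios(a, b, k, n=8):
    """Sample n points on the computed circle and return PA / PB for each."""
    center, radius = apollonius_circle(a, b, k)
    ratios = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        p = (center[0] + radius * math.cos(t), center[1] + radius * math.sin(t))
        ratios.append(math.dist(p, a) / math.dist(p, b))
    return ratios


if __name__ == "__main__":
    # Hypothetical example: A = (0, 0), B = (3, 0), ratio k = 2.
    print(apollonius_circle((0.0, 0.0), (3.0, 0.0), 2.0))
    print(distance_ratios((0.0, 0.0), (3.0, 0.0), 2.0))
```

    Every sampled ratio should agree with k up to floating-point error, which is a numerical counterpart to the result that the generated point sets are circles; it is a check, not a proof.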

  19. New MOST Photometry of the 55 Cancri System

    NASA Astrophysics Data System (ADS)

    Dragomir, Diana; Matthews, Jaymie M.; Winn, Joshua N.; Rowe, Jason F.

    2014-04-01

    Since the discovery of its transiting nature, the super-Earth 55 Cnc e has become one of the most enthusiastically studied exoplanets, having been observed spectroscopically and photometrically, in the ultraviolet, optical and infrared regimes. To this rapidly growing data set, we contribute 42 days of new, nearly continuous MOST photometry of the 55 Cnc system. Our analysis of these observations together with the discovery photometry obtained in 2011 allows us to determine the planetary radius (1.990 +0.084/−0.080) and orbital period (0.7365417 +0.0000025/−0.0000028) of 55 Cnc e with unprecedented precision. We also followed up on the out-of-transit phase variation first observed in the 2011 photometry, and set an upper limit on the depth of the planet's secondary eclipse, leading to an upper limit on its geometric albedo of 0.6.

  20. Benchmarking Data Sets for the Evaluation of Virtual Ligand Screening Methods: Review and Perspectives.

    PubMed

    Lagarde, Nathalie; Zagury, Jean-François; Montes, Matthieu

    2015-07-27

    Virtual screening methods are commonly used nowadays in drug discovery processes. However, to ensure their reliability, they have to be carefully evaluated. The evaluation of these methods is often realized in a retrospective way, notably by studying the enrichment of benchmarking data sets. To this purpose, numerous benchmarking data sets were developed over the years, and the resulting improvements led to the availability of high quality benchmarking data sets. However, some points still have to be considered in the selection of the active compounds, decoys, and protein structures to obtain optimal benchmarking data sets.
