A machine-learned computational functional genomics-based approach to drug classification.
Lötsch, Jörn; Ultsch, Alfred
2016-12-01
The public accessibility of "big data" about the molecular targets of drugs and the biological functions of genes allows novel data science-based approaches to pharmacology that link drugs directly with their effects on pathophysiologic processes. This provides a phenotypic path to drug discovery and repurposing. This paper compares the performance of a functional genomics-based criterion to the traditional drug target-based classification. Knowledge discovery in the DrugBank and Gene Ontology databases allowed the construction of a "drug target versus biological process" matrix as a combination of "drug versus genes" and "genes versus biological processes" matrices. As a canonical example, such matrices were constructed for classical analgesic drugs. These matrices were projected onto a toroid grid of 50 × 82 artificial neurons using a self-organizing map (SOM). The distance and cluster structure of the high-dimensional feature space of the matrices was visualized on top of this SOM using a U-matrix. The cluster structure emerging on the U-matrix provided a correct classification of the analgesics into two main classes of opioid and non-opioid analgesics. The classification was flawless with both the functional genomics-based and the traditional target-based criteria. The functional genomics approach inherently included the drugs' modulatory effects on biological processes. The main pharmacological actions known from pharmacological science were captured, e.g., actions on lipid signaling for non-opioid analgesics that comprised many NSAIDs and actions on neuronal signal transmission for opioid analgesics. Using machine-learning techniques for computational drug classification in a comparative assessment, a functional genomics-based criterion was found to be similarly suitable for drug classification as the traditional target-based criterion. This supports the utility of functional genomics-based approaches to computational systems pharmacology for drug discovery and repurposing.
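To illustrate the projection step described above, here is a minimal sketch using the MiniSom library: a drug-versus-biological-process matrix (file name hypothetical) is mapped onto a 50 × 82 SOM and the U-matrix is displayed. MiniSom uses a planar rather than toroidal grid, so this is only an approximation of the published setup.

```python
# Minimal sketch of the SOM/U-matrix step, assuming a drug-by-biological-process
# matrix saved as CSV (rows = drugs, columns = GO biological processes).
import numpy as np
import matplotlib.pyplot as plt
from minisom import MiniSom

X = np.loadtxt("drug_vs_process_matrix.csv", delimiter=",")  # hypothetical file

som = MiniSom(50, 82, X.shape[1], sigma=2.0, learning_rate=0.5, random_seed=42)
som.random_weights_init(X)
som.train_random(X, num_iteration=10_000)

# distance_map() returns the U-matrix: mean distance between each neuron's
# weight vector and its neighbours; high ridges separate clusters of drugs.
plt.pcolor(som.distance_map().T, cmap="bone_r")
plt.colorbar(label="U-matrix height")
plt.title("U-matrix of the drug vs. biological process SOM")
plt.show()

# Best-matching unit per drug gives the cluster membership read off the map.
bmus = [som.winner(x) for x in X]
```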
Content-Based Discovery for Web Map Service using Support Vector Machine and User Relevance Feedback
Hu, Kai; Gui, Zhipeng; Cheng, Xiaoqiang; Qi, Kunlun; Zheng, Jie; You, Lan; Wu, Huayi
2016-01-01
Many discovery methods for geographic information services have been proposed. There are approaches for finding and matching geographic information services, methods for constructing geographic information service classification schemes, and automatic geographic information discovery. Overall, the efficiency of geographic information discovery keeps improving. There are, however, still two problems in Web Map Service (WMS) discovery that must be solved. Mismatches between the graphic contents of a WMS and the semantic descriptions in the metadata make discovery difficult for human users. End-users and computers comprehend WMSs differently, creating semantic gaps in human-computer interactions. To address these problems, we propose an improved query process for WMSs based on the graphic contents of WMS layers, combining Support Vector Machine (SVM) and user relevance feedback. Our experiments demonstrate that the proposed method can improve the accuracy and efficiency of WMS discovery. PMID:27861505
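As a rough illustration of the query loop described above, the sketch below couples a scikit-learn SVM with simulated user relevance feedback; the feature matrix for WMS layers and the `get_user_feedback` callback are placeholders for whatever the real system provides.

```python
# Sketch of an SVM relevance-feedback loop for content-based WMS layer
# retrieval; feature extraction and the get_user_feedback callback are
# placeholders, not the published system.
import numpy as np
from sklearn.svm import SVC

def relevance_feedback_search(features, seed_pos, seed_neg, get_user_feedback,
                              rounds=3, top_k=20):
    """features: (n_layers, n_dims) array of visual descriptors per WMS layer."""
    pos, neg = set(seed_pos), set(seed_neg)
    for _ in range(rounds):
        idx = sorted(pos | neg)
        y = np.array([1 if i in pos else 0 for i in idx])
        clf = SVC(kernel="rbf", probability=True).fit(features[idx], y)
        # Rank all layers by predicted relevance and show the top hits.
        scores = clf.predict_proba(features)[:, 1]
        ranked = np.argsort(-scores)[:top_k]
        # The user marks which of the returned layers are actually relevant.
        liked, disliked = get_user_feedback(ranked)
        pos.update(liked)
        neg.update(disliked)
    return ranked
```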
Computational biology for cardiovascular biomarker discovery.
Azuaje, Francisco; Devaux, Yvan; Wagner, Daniel
2009-07-01
Computational biology is essential in the process of translating biological knowledge into clinical practice, as well as in the understanding of biological phenomena based on the resources and technologies originating from the clinical environment. One such key contribution of computational biology is the discovery of biomarkers for predicting clinical outcomes using 'omic' information. This process involves the predictive modelling and integration of different types of data and knowledge for screening, diagnostic or prognostic purposes. Moreover, this requires the design and combination of different methodologies based on statistical analysis and machine learning. This article introduces key computational approaches and applications to biomarker discovery based on different types of 'omic' data. Although we emphasize applications in cardiovascular research, the computational requirements and advances discussed here are also relevant to other domains. We will start by introducing some of the contributions of computational biology to translational research, followed by an overview of methods and technologies used for the identification of biomarkers with predictive or classification value. The main types of 'omic' approaches to biomarker discovery will be presented with specific examples from cardiovascular research. This will include a review of computational methodologies for single-source and integrative data applications. Major computational methods for model evaluation will be described together with recommendations for reporting models and results. We will present recent advances in cardiovascular biomarker discovery based on the combination of gene expression and functional network analyses. The review will conclude with a discussion of key challenges for computational biology, including perspectives from the biosciences and clinical areas.
Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I
2017-01-01
A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of the selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage, since the accuracy of most classification methods suffers when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
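The CES itself is not publicly reimplemented here, but the baseline comparison the abstract describes can be sketched with scikit-learn: each standard classifier is paired with a univariate feature-selection step and scored by cross-validation on an assumed expression matrix `X` and class vector `y`.

```python
# Baseline comparison sketch: accuracy of SVM, random forest, and k-NN with a
# simple feature-selection step on a multi-class expression matrix.
# X (samples x genes) and y (class labels) are assumed to be defined.
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

models = {
    "SVM": SVC(kernel="linear"),
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    "kNN": KNeighborsClassifier(n_neighbors=5),
}
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(),
                         SelectKBest(f_classif, k=50),  # keep 50 features
                         clf)
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```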
A General Framework for Discovery and Classification in Astronomy
NASA Astrophysics Data System (ADS)
Dick, Steven J.
2012-09-01
An analysis of the discovery of 82 classes of astronomical objects reveals an extended structure of discovery, consisting of detection, interpretation and understanding, each with its own nuances and a microstructure including conceptual, technological and social roles. This is true with a remarkable degree of consistency over the last 400 years of telescopic astronomy, ranging from Galileo's discovery of satellites, planetary rings and star clusters, to the discovery of quasars and pulsars. Telescopes have served as ``engines of discovery'' in several ways, ranging from telescope size and sensitivity (planetary nebulae and spiral nebulae), to specialized detectors (TNOs) and the opening of the electromagnetic spectrum for astronomy (pulsars, pulsar planets, and most active galaxies). A few classes (radiation belts, the solar wind and cosmic rays) were initially discovered without the telescope. Classification also plays an important role in discovery. While it might seem that classification marks the end of discovery, or a post-discovery phase, in fact it often marks the beginning, even a pre-discovery phase. Nowhere is this more clearly seen than in the classification of stellar spectra, long before dwarfs, giants and supergiants were known, or their evolutionary sequence recognized. Classification may also be part of a post-discovery phase, as in the MK system of stellar classification, constructed after the discovery of stellar luminosity classes. Some classes are declared rather than detected, as in the case of gas and ice giant planets, and, infamously, Pluto as a dwarf planet. Others are inferred rather than detected, including most classes of stars.
Discovery and Classification in Astronomy
NASA Astrophysics Data System (ADS)
Dick, Steven J.
2012-01-01
Three decades after Martin Harwit's pioneering Cosmic Discovery (1981), and following on the recent IAU Symposium "Accelerating the Rate of Astronomical Discovery," we have revisited the problem of discovery in astronomy, emphasizing new classes of objects. 82 such classes have been identified and analyzed, including 22 in the realm of the planets, 36 in the realm of the stars, and 24 in the realm of the galaxies. We find an extended structure of discovery, consisting of detection, interpretation and understanding, each with its own nuances and a microstructure including conceptual, technological and social roles. This is true with a remarkable degree of consistency over the last 400 years of telescopic astronomy, ranging from Galileo's discovery of satellites, planetary rings and star clusters, to the discovery of quasars and pulsars. Telescopes have served as "engines of discovery" in several ways, ranging from telescope size and sensitivity (planetary nebulae and spiral galaxies), to specialized detectors (TNOs) and the opening of the electromagnetic spectrum for astronomy (pulsars, pulsar planets, and most active galaxies). A few classes (radiation belts, the solar wind and cosmic rays) were initially discovered without the telescope. Classification also plays an important role in discovery. While it might seem that classification marks the end of discovery, or a post-discovery phase, in fact it often marks the beginning, even a pre-discovery phase. Nowhere is this more clearly seen than in the classification of stellar spectra, long before dwarfs, giants and supergiants were known, or their evolutionary sequence recognized. Classification may also be part of a post-discovery phase, as in the MK system of stellar classification, constructed after the discovery of stellar luminosity classes. Some classes are declared rather than discovered, as in the case of gas and ice giant planets, and, infamously, Pluto as a dwarf planet.
From Classification to Epilepsy Ontology and Informatics
Zhang, Guo-Qiang; Sahoo, Satya S; Lhatoo, Samden D
2012-01-01
The 2010 International League Against Epilepsy (ILAE) classification and terminology commission report proposed a much needed departure from previous classifications to incorporate advances in molecular biology, neuroimaging, and genetics. It proposed an interim classification and defined two key requirements that need to be satisfied. The first is the ability to classify epilepsy in dimensions according to a variety of purposes including clinical research, patient care, and drug discovery. The second is the ability of the classification system to evolve with new discoveries. Multi-dimensionality and flexibility are crucial to the success of any future classification. In addition, a successful classification system must play a central role in the rapidly growing field of epilepsy informatics. An epilepsy ontology, based on classification, will allow information systems to facilitate data-intensive studies and provide a proven route to meeting the two foregoing key requirements. Epilepsy ontology will be a structured terminology system that accommodates proposed and evolving ILAE classifications, the NIH/NINDS Common Data Elements, the ICD systems and explicitly specifies all known relationships between epilepsy concepts in a proper framework. This will aid evidence based epilepsy diagnosis, investigation, treatment and research for a diverse community of clinicians and researchers. Benefits range from systematization of electronic patient records to multi-modal data repositories for research and training manuals for those involved in epilepsy care. Given the complexity, heterogeneity and pace of research advances in the epilepsy domain, such an ontology must be collaboratively developed by key stakeholders in the epilepsy community and experts in knowledge engineering and computer science. PMID:22765502
Optimizing high performance computing workflow for protein functional annotation.
Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene
2014-09-10
Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST), the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
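A single-machine sketch of the PSI-BLAST classification step is given below; the real workflow runs massively parallel versions on XSEDE supercomputers, and the database and file names here are hypothetical.

```python
# Simplified, single-machine sketch of the PSI-BLAST classification step:
# each query protein is searched against a COG profile database and assigned
# to the best-scoring cluster. Database and file names are hypothetical.
import subprocess, csv

subprocess.run([
    "psiblast",
    "-query", "new_proteins.faa",
    "-db", "cog_profiles_db",
    "-num_iterations", "3",
    "-evalue", "1e-5",
    "-outfmt", "6 qseqid sseqid bitscore",
    "-out", "hits.tsv",
], check=True)

best = {}  # query id -> (cluster id, bitscore)
with open("hits.tsv") as fh:
    for qid, sid, score in csv.reader(fh, delimiter="\t"):
        if qid not in best or float(score) > best[qid][1]:
            best[qid] = (sid, float(score))
```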
ERIC Educational Resources Information Center
Yu, Pulan
2012-01-01
Classification, clustering and association mining are major tasks of data mining and have been widely used for knowledge discovery. Associative classification mining, the combination of both association rule mining and classification, has emerged as an indispensable way to support decision making and scientific research. In particular, it offers a…
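Although the method itself is not reproduced here, a minimal associative-classification sketch in the spirit of the abstract can be built with the mlxtend library: frequent itemsets are mined with Apriori and only rules whose consequent is the class label are retained. The one-hot transaction DataFrame `df` and the item name `class=positive` are assumptions.

```python
# Minimal class-association-rule sketch; `df` is an assumed one-hot pandas
# DataFrame of transactions whose columns include the item 'class=positive'.
from mlxtend.frequent_patterns import apriori, association_rules

itemsets = apriori(df, min_support=0.1, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)

# Keep only rules whose consequent is the class label (class association rules).
is_class_rule = rules["consequents"].apply(lambda c: c == frozenset({"class=positive"}))
car = rules[is_class_rule].sort_values("confidence", ascending=False)
print(car[["antecedents", "support", "confidence"]].head())
```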
Video mining using combinations of unsupervised and supervised learning techniques
NASA Astrophysics Data System (ADS)
Divakaran, Ajay; Miyahara, Koji; Peker, Kadir A.; Radhakrishnan, Regunathan; Xiong, Ziyou
2003-12-01
We discuss the meaning and significance of the video mining problem, and present our work on some aspects of video mining. A simple definition of video mining is unsupervised discovery of patterns in audio-visual content. Such purely unsupervised discovery is readily applicable to video surveillance as well as to consumer video browsing applications. We interpret video mining as content-adaptive or "blind" content processing, in which the first stage is content characterization and the second stage is event discovery based on the characterization obtained in stage 1. We discuss the target applications and find that purely unsupervised approaches are too computationally complex to be implemented on our product platform. We then describe various combinations of unsupervised and supervised learning techniques that help discover patterns that are useful to the end-user of the application. We target consumer video browsing applications such as commercial message detection, sports highlights extraction, etc. We employ both audio and video features. We find that supervised audio classification combined with unsupervised unusual event discovery enables accurate supervised detection of desired events. Our techniques are computationally simple and robust to common variations in production styles, etc.
The functional therapeutic chemical classification system.
Croset, Samuel; Overington, John P; Rebholz-Schuhmann, Dietrich
2014-03-15
Drug repositioning is the discovery of new indications for compounds that have already been approved and used in a clinical setting. Recently, some computational approaches have been suggested to unveil new opportunities in a systematic fashion, by taking into consideration gene expression signatures or chemical features, for instance. We present here a novel method based on knowledge integration using semantic technologies, to capture the functional role of approved chemical compounds. In order to computationally generate repositioning hypotheses, we used the Web Ontology Language to formally define the semantics of over 20 000 terms with axioms to correctly denote various modes of action (MoA). Based on an integration of public data, we have automatically assigned over a thousand approved drugs into these MoA categories. The resulting new resource is called the Functional Therapeutic Chemical Classification System and was further evaluated against the content of the traditional Anatomical Therapeutic Chemical Classification System. We illustrate how the new classification can be used to generate drug repurposing hypotheses, using Alzheimer's disease as a use-case. https://www.ebi.ac.uk/chembl/ftc; https://github.com/loopasam/ftc. croset@ebi.ac.uk Supplementary data are available at Bioinformatics online.
Ihmaid, Saleh K; Ahmed, Hany E A; Zayed, Mohamed F; Abadleh, Mohammed M
2016-01-30
The main step in a successful drug discovery pipeline is the identification of small potent compounds that selectively bind to the target of interest with high affinity. However, there is still a shortage of efficient and accurate computational methods with powerful capability to study and hence predict compound selectivity properties. In this work, we propose an affordable machine learning method to perform compound selectivity classification and prediction. For this purpose, we have collected compounds with reported activity and built a selectivity database formed of 153 cathepsin K and S inhibitors that are considered of medicinal interest. This database has three compound sets: two selective ones (K/S and S/K) and one non-selective KS one. We have subjected this database to the selectivity classification tool 'Emergent Self-Organizing Maps' for exploring its capability to differentiate selective cathepsin inhibitors for one target over the other. The method exhibited good clustering performance for selective ligands with high accuracy (up to 100%). Among the possibilities, BAPs and MACCS molecular structural fingerprints were used for such a classification. The results demonstrated the ability of the method for structure-selectivity relationship interpretation, and selectivity markers were identified for the design of further novel inhibitors with high activity and target selectivity.
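As a simplified stand-in for the emergent SOM workflow above, the sketch below computes MACCS fingerprints with RDKit and places the compounds on a small MiniSom map; the SMILES strings and selectivity labels are hypothetical placeholders.

```python
# Simplified stand-in: MACCS fingerprints clustered on a small SOM.
# SMILES strings and selectivity labels below are hypothetical placeholders.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys
from minisom import MiniSom

smiles = ["CCOC(=O)c1ccccc1", "CC(=O)Nc1ccc(O)cc1"]   # placeholder inhibitors
labels = ["K/S-selective", "S/K-selective"]           # placeholder classes

def maccs_array(smi):
    fp = MACCSkeys.GenMACCSKeys(Chem.MolFromSmiles(smi))
    arr = np.zeros((fp.GetNumBits(),), dtype=float)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

fps = np.array([maccs_array(s) for s in smiles])

som = MiniSom(10, 10, fps.shape[1], sigma=1.5, learning_rate=0.5, random_seed=1)
som.train_random(fps, 5_000)
for smi, lab, fp in zip(smiles, labels, fps):
    print(lab, smi, "-> map node", som.winner(fp))
```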
Mincholé, Ana; Martínez, Juan Pablo; Laguna, Pablo; Rodriguez, Blanca
2018-01-01
Widely developed for clinical screening, electrocardiogram (ECG) recordings capture the cardiac electrical activity from the body surface. ECG analysis can therefore be a crucial first step to help diagnose, understand and predict cardiovascular disorders responsible for 30% of deaths worldwide. Computational techniques, and more specifically machine learning techniques and computational modelling are powerful tools for classification, clustering and simulation, and they have recently been applied to address the analysis of medical data, especially ECG data. This review describes the computational methods in use for ECG analysis, with a focus on machine learning and 3D computer simulations, as well as their accuracy, clinical implications and contributions to medical advances. The first section focuses on heartbeat classification and the techniques developed to extract and classify abnormal from regular beats. The second section focuses on patient diagnosis from whole recordings, applied to different diseases. The third section presents real-time diagnosis and applications to wearable devices. The fourth section highlights the recent field of personalized ECG computer simulations and their interpretation. Finally, the discussion section outlines the challenges of ECG analysis and provides a critical assessment of the methods presented. The computational methods reported in this review are a strong asset for medical discoveries and their translation to the clinical world may lead to promising advances. PMID:29321268
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, P E
Tips and case histories on computer use for idea and outline processing: Productivity software to solve problems of idea hierarchy, transitions, and developments is matched to solutions for communicators. One case is text that ranges from methods and procedures to histories and legal definitions of classification for the US Department of Energy. Applications of value to writers, editors, and managers are for research; calendars; creativity; prioritization; idea discovery and manipulation; file and time management; and contents, indexes, and glossaries. 6 refs., 7 figs.
Artese, Anna; Alcaro, Stefano; Moraca, Federica; Reina, Rocco; Ventura, Marzia; Costantino, Gabriele; Beccari, Andrea R; Ortuso, Francesco
2013-05-01
During the first edition of the Computationally Driven Drug Discovery meeting, held in November 2011 at Dompé Pharma (L'Aquila, Italy), a questionnaire regarding the diffusion and the use of computational tools for drug-design purposes in both academia and industry was distributed among all participants. This is a follow-up of a previously reported investigation carried out among a few companies in 2007. The new questionnaire implemented five sections dedicated to: research group identification and classification; 18 different computational techniques; software information; hardware data; and economical business considerations. In this article, together with a detailed history of the different computational methods, a statistical analysis of the survey results that enabled the identification of the prevalent computational techniques adopted in drug-design projects is reported and a profile of the computational medicinal chemist currently working in academia and pharmaceutical companies in Italy is highlighted.
NASA Astrophysics Data System (ADS)
Buongiorno Nardelli, Marco
2015-03-01
High-Throughput Quantum-Mechanics computation of materials properties by ab initio methods has become the foundation of an effective approach to materials design, discovery and characterization. This data-driven approach to materials science currently presents the most promising path to the development of advanced technological materials that could solve or mitigate important social and economic challenges of the 21st century. In particular, the rapid proliferation of computational data on materials properties presents the possibility to complement and extend materials property databases where the experimental data is lacking and difficult to obtain. Enhanced repositories such as AFLOWLIB open novel opportunities for structure discovery and optimization, including uncovering of unsuspected compounds, metastable structures and correlations between various properties. The practical realization of these opportunities depends on the design of efficient algorithms for electronic structure simulations of realistic material systems, the systematic compilation and classification of the generated data, and its presentation in easily accessed form to the materials science community, the primary mission of the AFLOW consortium. This work was supported by ONR-MURI under Contract N00014-13-1-0635 and the Duke University Center for Materials Genomics.
Verdes, Aida; Anand, Prachi; Gorson, Juliette; Jannetti, Stephen; Kelly, Patrick; Leffler, Abba; Simpson, Danny; Ramrattan, Girish; Holford, Mandë
2016-04-19
Animal venoms comprise a diversity of peptide toxins that manipulate molecular targets such as ion channels and receptors, making venom peptides attractive candidates for the development of therapeutics to benefit human health. However, identifying bioactive venom peptides remains a significant challenge. In this review we describe our particular venomics strategy for the discovery, characterization, and optimization of Terebridae venom peptides, teretoxins. Our strategy reflects the scientific path from mollusks to medicine in an integrative sequential approach with the following steps: (1) delimitation of venomous Terebridae lineages through taxonomic and phylogenetic analyses; (2) identification and classification of putative teretoxins through omics methodologies, including genomics, transcriptomics, and proteomics; (3) chemical and recombinant synthesis of promising peptide toxins; (4) structural characterization through experimental and computational methods; (5) determination of teretoxin bioactivity and molecular function through biological assays and computational modeling; (6) optimization of peptide toxin affinity and selectivity to molecular target; and (7) development of strategies for effective delivery of venom peptide therapeutics. While our research focuses on terebrids, the venomics approach outlined here can be applied to the discovery and characterization of peptide toxins from any venomous taxa.
iPcc: a novel feature extraction method for accurate disease class discovery and prediction
Ren, Xianwen; Wang, Yong; Zhang, Xiang-Sun; Jin, Qi
2013-01-01
Gene expression profiling has gradually become a routine procedure for disease diagnosis and classification. In the past decade, many computational methods have been proposed, resulting in great improvements on various levels, including feature selection and algorithms for classification and clustering. In this study, we present iPcc, a novel method from the feature extraction perspective to further propel gene expression profiling technologies from bench to bedside. We define ‘correlation feature space’ for samples based on the gene expression profiles by iterative employment of Pearson’s correlation coefficient. Numerical experiments on both simulated and real gene expression data sets demonstrate that iPcc can greatly highlight the latent patterns underlying noisy gene expression data and thus greatly improve the robustness and accuracy of the algorithms currently available for disease diagnosis and classification based on gene expression profiles. PMID:23761440
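The correlation feature space idea can be sketched in a few lines of NumPy: each sample is re-represented by its Pearson correlations with all other samples, and the mapping is applied iteratively. The exact iteration and stopping rule of iPcc may differ from this reading of the abstract.

```python
# Sketch of the 'correlation feature space': represent each sample by its
# Pearson correlations with every other sample, and iterate the mapping.
import numpy as np

def correlation_feature_space(X, n_iter=3):
    """X: (n_samples, n_genes) expression matrix."""
    F = X.astype(float)
    for _ in range(n_iter):
        F = np.corrcoef(F)   # (n_samples, n_samples) Pearson correlations
    return F                 # rows are the new feature vectors per sample
```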
Analyzing Student Inquiry Data Using Process Discovery and Sequence Classification
ERIC Educational Resources Information Center
Emond, Bruno; Buffett, Scott
2015-01-01
This paper reports on results of applying process discovery mining and sequence classification mining techniques to a data set of semi-structured learning activities. The main research objective is to advance educational data mining to model and support self-regulated learning in heterogeneous environments of learning content, activities, and…
Knowledge discovery with classification rules in a cardiovascular dataset.
Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan
2005-12-01
In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rules induction called AREX using evolutionary induction of decision trees and automatic programming is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of a possible new medical knowledge in the field of pediatric cardiology.
Armutlu, Pelin; Ozdemir, Muhittin E; Uney-Yuksektepe, Fadime; Kavakli, I Halil; Turkay, Metin
2008-10-03
A priori analysis of the activity of drugs on the target protein by computational approaches can be useful in narrowing down drug candidates for further experimental tests. Currently, there are a large number of computational methods that predict the activity of drugs on proteins. In this study, we approach the activity prediction problem as a classification problem, and we aim to improve the classification accuracy by introducing an algorithm that combines partial least squares regression with a mixed-integer programming-based hyper-box classification method, where drug molecules are classified as low active or high active regarding their binding activity (IC50 values) on target proteins. We also aim to determine the most significant molecular descriptors for the drug molecules. We first apply our approach by analyzing the activities of widely known inhibitor datasets including Acetylcholinesterase (ACHE), Benzodiazepine Receptor (BZR), Dihydrofolate Reductase (DHFR) and Cyclooxygenase-2 (COX-2) with known IC50 values. The results at this stage proved that our approach consistently gives better classification accuracies compared to 63 other reported classification methods, such as SVM and Naïve Bayes; we were able to predict the experimentally determined IC50 values with a worst case accuracy of 96%. To further test the applicability of this approach, we first created a dataset of Cytochrome P450 C17 inhibitors and then predicted their activities with 100% accuracy. Our results indicate that this approach can be utilized to predict the inhibitory effects of inhibitors based on their molecular descriptors. This approach will not only enhance the drug discovery process, but also save time and resources.
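The mixed-integer hyper-box classifier is not reproduced here, but the two-stage idea can be sketched with scikit-learn: PLS projects the molecular descriptors, and a simple axis-aligned bounding box on the scores stands in for the hyper-box rule. Descriptor matrix `X` and activity values `pic50` are assumed inputs.

```python
# Two-stage sketch: PLS projection of descriptors, then an axis-aligned box on
# the scores as a simplified stand-in for the MIP hyper-box classifier.
# X (molecules x descriptors) and pic50 (activity values) are assumed inputs.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

y = (pic50 >= np.median(pic50)).astype(int)       # 1 = high active, 0 = low active
pls = PLSRegression(n_components=3).fit(X, y)
T = pls.transform(X)                               # latent-variable scores

# "Hyper-box" for the high-active class: per-dimension min/max of its scores.
box_lo, box_hi = T[y == 1].min(axis=0), T[y == 1].max(axis=0)
pred = np.all((T >= box_lo) & (T <= box_hi), axis=1).astype(int)
print("training accuracy of the box rule:", (pred == y).mean())
```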
Computational approaches to predict bacteriophage–host relationships
Edwards, Robert A.; McNair, Katelyn; Faust, Karoline; Raes, Jeroen; Dutilh, Bas E.
2015-01-01
Metagenomics has changed the face of virus discovery by enabling the accurate identification of viral genome sequences without requiring isolation of the viruses. As a result, metagenomic virus discovery leaves the first and most fundamental question about any novel virus unanswered: What host does the virus infect? The diversity of the global virosphere and the volumes of data obtained in metagenomic sequencing projects demand computational tools for virus–host prediction. We focus on bacteriophages (phages, viruses that infect bacteria), the most abundant and diverse group of viruses found in environmental metagenomes. By analyzing 820 phages with annotated hosts, we review and assess the predictive power of in silico phage–host signals. Sequence homology approaches are the most effective at identifying known phage–host pairs. Compositional and abundance-based methods contain significant signal for phage–host classification, providing opportunities for analyzing the unknowns in viral metagenomes. Together, these computational approaches further our knowledge of the interactions between phages and their hosts. Importantly, we find that all reviewed signals significantly link phages to their hosts, illustrating how current knowledge and insights about the interaction mechanisms and ecology of coevolving phages and bacteria can be exploited to predict phage–host relationships, with potential relevance for medical and industrial applications. PMID:26657537
Ahern, Thomas P.; Beck, Andrew H.; Rosner, Bernard A.; Glass, Ben; Frieling, Gretchen; Collins, Laura C.; Tamimi, Rulla M.
2017-01-01
Background: Computational pathology platforms incorporate digital microscopy with sophisticated image analysis to permit rapid, continuous measurement of protein expression. We compared two computational pathology platforms on their measurement of breast tumor estrogen receptor (ER) and progesterone receptor (PR) expression. Methods: Breast tumor microarrays from the Nurses' Health Study were stained for ER (n=592) and PR (n=187). One expert pathologist scored cases as positive if ≥1% of tumor nuclei exhibited stain. ER and PR were then measured with the Definiens Tissue Studio (automated) and Aperio Digital Pathology (user-supervised) platforms. Platform-specific measurements were compared using boxplots, scatter plots, and correlation statistics. Classification of ER and PR positivity by platform-specific measurements was evaluated with areas under receiver operating characteristic curves (AUC) from univariable logistic regression models, using expert pathologist classification as the standard. Results: Both platforms showed considerable overlap in continuous measurements of ER and PR between positive and negative groups classified by expert pathologist. Platform-specific measurements were strongly and positively correlated with one another (rho≥0.77). The user-supervised Aperio workflow performed slightly better than the automated Definiens workflow at classifying ER positivity (AUC(Aperio)=0.97; AUC(Definiens)=0.90; difference=0.07, 95% CI: 0.05, 0.09) and PR positivity (AUC(Aperio)=0.94; AUC(Definiens)=0.87; difference=0.07, 95% CI: 0.03, 0.12). Conclusion: Paired hormone receptor expression measurements from two different computational pathology platforms agreed well with one another. The user-supervised workflow yielded better classification accuracy than the automated workflow. Appropriately validated computational pathology algorithms enrich molecular epidemiology studies with continuous protein expression data and may accelerate tumor biomarker discovery. PMID:27729430
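The evaluation step described above amounts to a univariable logistic regression per platform scored by ROC AUC; a sketch follows, with `er_aperio`, `er_definiens`, and `pathologist_pos` as assumed arrays.

```python
# Sketch of the evaluation above: AUC of each platform's continuous ER
# measurement for predicting the pathologist's positive/negative call.
# er_aperio, er_definiens, and pathologist_pos are assumed arrays.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def univariable_auc(measurement, truth):
    x = np.asarray(measurement).reshape(-1, 1)
    prob = LogisticRegression().fit(x, truth).predict_proba(x)[:, 1]
    return roc_auc_score(truth, prob)

print("Aperio AUC:   ", univariable_auc(er_aperio, pathologist_pos))
print("Definiens AUC:", univariable_auc(er_definiens, pathologist_pos))
```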
Computational Analysis of Behavior.
Egnor, S E Roian; Branson, Kristin
2016-07-08
In this review, we discuss the emerging field of computational behavioral analysis-the use of modern methods from computer science and engineering to quantitatively measure animal behavior. We discuss aspects of experiment design important to both obtaining biologically relevant behavioral data and enabling the use of machine vision and learning techniques for automation. These two goals are often in conflict. Restraining or restricting the environment of the animal can simplify automatic behavior quantification, but it can also degrade the quality or alter important aspects of behavior. To enable biologists to design experiments to obtain better behavioral measurements, and computer scientists to pinpoint fruitful directions for algorithm improvement, we review known effects of artificial manipulation of the animal on behavior. We also review machine vision and learning techniques for tracking, feature extraction, automated behavior classification, and automated behavior discovery, the assumptions they make, and the types of data they work best with.
Integrated Low-Rank-Based Discriminative Feature Learning for Recognition.
Zhou, Pan; Lin, Zhouchen; Zhang, Chao
2016-05-01
Feature learning plays a central role in pattern recognition. In recent years, many representation-based feature learning methods have been proposed and have achieved great success in many applications. However, these methods perform feature learning and subsequent classification in two separate steps, which may not be optimal for recognition tasks. In this paper, we present a supervised low-rank-based approach for learning discriminative features. By integrating latent low-rank representation (LatLRR) with a ridge regression-based classifier, our approach combines feature learning with classification, so that the regulated classification error is minimized. In this way, the extracted features are more discriminative for the recognition tasks. Our approach benefits from a recent discovery on the closed-form solutions to noiseless LatLRR. When there is noise, a robust Principal Component Analysis (PCA)-based denoising step can be added as preprocessing. When the scale of a problem is large, we utilize a fast randomized algorithm to speed up the computation of robust PCA. Extensive experimental results demonstrate the effectiveness and robustness of our method.
Discovery and Spectroscopic Classification of DLT18q/AT2018aoz as a young type Ia Supernova
NASA Astrophysics Data System (ADS)
Sand, D.; Valenti, S.; Wyatt, S.; Bostroem, K. A.; Reichart, D. E.; Haislip, J. B.; Kouprianov, V.
2018-04-01
We report the discovery and classification of DLT18q/AT 2018aoz. The supernova was found on 2018 April 02.1 (UT) at r ~ 15.1 mag during the ongoing D < 40 Mpc (DLT40) supernova search, using data from the PROMPT5 0.41m telescope located at CTIO.
BayesMotif: de novo protein sorting motif discovery from impure datasets.
Hu, Jianjun; Zhang, Fan
2010-01-18
Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals consists of amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier-based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motifs in which a highly conserved anchor is present along with less conserved motif regions. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also show that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian classification-based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs, which may help to overcome the limitations of the PWM (position weight matrix) motif model.
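A much-simplified sketch of the Bayesian scoring idea is shown below: per-position residue frequencies are estimated from candidate windows and a log-odds score is computed against a background model. This omits the anchored-motif handling and iterative false-positive removal of the full BayesMotif method; `motif_windows` and `background_windows` are assumed inputs.

```python
# Simplified Bayesian motif scoring: per-position residue probabilities from
# candidate N-terminal windows vs. a background model; not the full BayesMotif.
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
IDX = {a: i for i, a in enumerate(AA)}

def position_probs(windows, pseudo=1.0):
    counts = np.full((len(windows[0]), 20), pseudo)
    for w in windows:
        for pos, aa in enumerate(w):
            counts[pos, IDX[aa]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def log_odds(window, motif_p, bg_p):
    return sum(np.log(motif_p[i, IDX[a]] / bg_p[i, IDX[a]])
               for i, a in enumerate(window))

# motif_windows / background_windows are assumed lists of equal-length strings.
motif_p, bg_p = position_probs(motif_windows), position_probs(background_windows)
score = log_odds("MKKLLFA", motif_p, bg_p)   # hypothetical 7-residue window
```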
VIP: an integrated pipeline for metagenomics of virus identification and discovery
Li, Yang; Wang, Hao; Nie, Kai; Zhang, Chen; Zhang, Yi; Wang, Ji; Niu, Peihua; Ma, Xuejun
2016-01-01
Identification and discovery of viruses using next-generation sequencing technology is a fast-developing area with potential wide application in clinical diagnostics, public health monitoring and novel virus discovery. However, the tremendous volume of sequence data from NGS studies poses great challenges in both accuracy and speed for their application. Here we describe VIP (“Virus Identification Pipeline”), a one-touch computational pipeline for virus identification and discovery from metagenomic NGS data. VIP performs the following steps to achieve its goal: (i) map and filter out background-related reads, (ii) extensive classification of reads on the basis of nucleotide and remote amino acid homology, (iii) multiple k-mer based de novo assembly and phylogenetic analysis to provide evolutionary insight. We validated the feasibility and veracity of this pipeline with sequencing results of various types of clinical samples and public datasets. VIP has also contributed to timely virus diagnosis (~10 min) in acutely ill patients, demonstrating its potential in the performance of unbiased NGS-based clinical studies with a demand for short turnaround times. VIP is released under GPLv3 and is available for free download at: https://github.com/keylabivdc/VIP. PMID:27026381
NASA Astrophysics Data System (ADS)
Nguyen, T.; Pankratius, V.; Eckman, L.; Seager, S.
2018-04-01
Debris disks around stars other than the Sun have received significant attention in studies of exoplanets, specifically exoplanetary system formation. Since debris disks are major sources of infrared emission, infrared survey data such as the Wide-field Infrared Survey Explorer (WISE) catalog potentially harbor numerous debris disk candidates. However, it is currently challenging to perform disk candidate searches for over 747 million sources in the WISE catalog due to the high probability of false positives caused by interstellar matter, galaxies, and other background artifacts. Crowdsourcing techniques have thus started to harness citizen scientists for debris disk identification, since humans can be easily trained to distinguish between desired artifacts and irrelevant noise. With a limited number of citizen scientists, however, increasing data volumes from large surveys will inevitably lead to analysis bottlenecks. To overcome this scalability problem and push the current limits of automated debris disk candidate identification, we present a novel approach that uses citizen science results as a seed to train machine-learning-based classification. In this paper, we detail a case study with a computer-aided discovery pipeline demonstrating such feasibility based on WISE catalog data and NASA's Disk Detective project. Our approach to debris disk candidate classification was shown to be robust over a wide range of image qualities and features. Our hybrid approach of citizen science with algorithmic scalability can facilitate big data processing for future detections as envisioned in future missions such as the Transiting Exoplanet Survey Satellite (TESS) and the Wide-Field Infrared Survey Telescope (WFIRST).
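The seeding idea can be sketched as follows: a classifier is trained on sources already vetted by citizen scientists and then applied to the rest of the catalog. The file and column names are hypothetical.

```python
# Sketch of seeding a classifier with citizen-science labels: train on
# volunteer-vetted sources, then score the remaining catalog.
# Feature columns and file names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

labeled = pd.read_csv("disk_detective_consensus.csv")   # volunteer-vetted sources
features = ["w1mag", "w2mag", "w3mag", "w4mag", "w1_w4_excess"]

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(labeled[features], labeled["is_disk_candidate"])

catalog = pd.read_csv("wise_unlabeled_sources.csv")
catalog["disk_score"] = clf.predict_proba(catalog[features])[:, 1]
candidates = catalog.sort_values("disk_score", ascending=False).head(1000)
```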
Data Mining and Machine Learning in Time-Domain Discovery and Classification
NASA Astrophysics Data System (ADS)
Bloom, Joshua S.; Richards, Joseph W.
2012-03-01
The changing heavens have played a central role in the scientific effort of astronomers for centuries. Galileo's synoptic observations of the moons of Jupiter and the phases of Venus starting in 1610, provided strong refutation of Ptolemaic cosmology. These observations came soon after the discovery of Kepler's supernova had challenged the notion of an unchanging firmament. In more modern times, the discovery of a relationship between period and luminosity in some pulsational variable stars [41] led to the inference of the size of the Milky way, the distance scale to the nearest galaxies, and the expansion of the Universe (see Ref. [30] for review). Distant explosions of supernovae were used to uncover the existence of dark energy and provide a precise numerical account of dark matter (e.g., [3]). Repeat observations of pulsars [71] and nearby main-sequence stars revealed the presence of the first extrasolar planets [17,35,44,45]. Indeed, time-domain observations of transient events and variable stars, as a technique, influences a broad diversity of pursuits in the entire astronomy endeavor [68]. While, at a fundamental level, the nature of the scientific pursuit remains unchanged, the advent of astronomy as a data-driven discipline presents fundamental challenges to the way in which the scientific process must now be conducted. Digital images (and data cubes) are not only getting larger, there are more of them. On logistical grounds, this taxes storage and transport systems. But it also implies that the intimate connection that astronomers have always enjoyed with their data - from collection to processing to analysis to inference - necessarily must evolve. Figure 6.1 highlights some of the ways that the pathway to scientific inference is now influenced (if not driven by) modern automation processes, computing, data-mining, and machine-learning (ML). The emerging reliance on computation and ML is a general one - a central theme of this book - but the time-domain aspect of the data and the objects of interest presents some unique challenges. First, any collection, storage, transport, and computational framework for processing the streaming data must be able to keep up with the dataflow. This is not necessarily true, for instance, with static sky science, where metrics of interest can be computed off-line and on a timescale much longer than the time required to obtain the data. Second, many types of transient (one-off) events evolve quickly in time and require more observations to fully understand the nature of the events. This demands that time-changing events are quickly discovered, classified, and broadcast to other follow-up facilities. All of this must happen robustly with, in some cases, very limited data. Last, the process of discovery and classification must be calibrated to the available resources for computation and follow-up. That is, the precision of classification must be weighed against the computational cost of producing that level of precision. Likewise, the cost of being wrong about the classification of some sorts of sources must be balanced against the scientific gains about being right about the classification of other types of sources. Quantifying these trade-offs, especially in the presence of a limited amount of follow-up resources (such as the availability of larger telescope observations) is not straightforward and inheres domain-specific imperatives that will, in general, differ from astronomer to astronomer. 
This chapter presents an overview of the current directions in ML and data-mining techniques in the context of time-domain astronomy. Ultimately the goal - if not just the necessity given the data rates and the diversity of questions to be answered - is to abstract the traditional role of astronomer in the entire scientific process. In some sense, this takes us full circle from the pre modern view of the scientific pursuit presented in Vermeer's "The Astronomer" (Figure 6.2): in broad daylight, he contemplates the nighttime heavens from depictions presented to him on globe, based on observations that others have made. He is an abstract thinker, far removed from data collection and processing; his most visceral connection to the skies is just the feel of the orb under his fingers. Substitute the globe for a plot on a screen generated from a structured query language (SQL) query to a massive public database in the cloud, and we have a picture of the modern astronomer benefitting from the ML and data-mining tools operating on an almost unfathomable amount of raw data.
New technologies accelerate the exploration of non-coding RNAs in horticultural plants
Liu, Degao; Mewalal, Ritesh; Hu, Rongbin; Tuskan, Gerald A; Yang, Xiaohan
2017-01-01
Non-coding RNAs (ncRNAs), that is, RNAs not translated into proteins, are crucial regulators of a variety of biological processes in plants. While protein-encoding genes have been relatively well-annotated in sequenced genomes, accounting for a small portion of the genome space in plants, the universe of plant ncRNAs is rapidly expanding. Recent advances in experimental and computational technologies have generated a great momentum for discovery and functional characterization of ncRNAs. Here we summarize the classification and known biological functions of plant ncRNAs, review the application of next-generation sequencing (NGS) technology and ribosome profiling technology to ncRNA discovery in horticultural plants and discuss the application of new technologies, especially the new genome-editing tool clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) systems, to functional characterization of plant ncRNAs. PMID:28698797
NASA Astrophysics Data System (ADS)
Welther, Barbara L.
2010-01-01
In 1915, the year in which Cannon (1863-1941) completed her work of classifying stars for The Henry Draper Catalogue, she published a popular article entitled, "Pioneering in the Classification of Stellar Spectra.” In it she gave a historical overview of the field in nineteenth-century Europe. She also detailed the context for the structured and routine work she and her colleagues had been engaged in for several years in America. The motivators that kept Cannon and the other women working diligently were the exciting prospect of making new discoveries, the reward of publicity, and their own personal pride. Usually, the discoveries consisted of finding a peculiar type of spectrum and identifying the star as a nova or variable. Such a discovery often resulted in a newspaper headline about the star and a story about the discoverer. This paper will outline the contributions each woman made to the classification system, her style of working, the papers she wrote and published, and the rewards she reaped for her dedication to the field.
A computational neural approach to support the discovery of gene function and classes of cancer.
Azuaje, F
2001-03-01
Advances in molecular classification of tumours may play a central role in cancer treatment. Here, a novel approach to genome expression pattern interpretation is described and applied to the recognition of B-cell malignancies as a test set. Using cDNA microarray data generated by a previous study, a neural network model known as simplified fuzzy ARTMAP is able to identify normal and diffuse large B-cell lymphoma (DLBCL) patients. Furthermore, it discovers the distinction between patients with molecularly distinct forms of DLBCL without previous knowledge of those subtypes.
Knowledge Discovery from Databases: An Introductory Review.
ERIC Educational Resources Information Center
Vickery, Brian
1997-01-01
Introduces new procedures being used to extract knowledge from databases and discusses rationales for developing knowledge discovery methods. Methods are described for such techniques as classification, clustering, and the detection of deviations from pre-established norms. Examines potential uses of knowledge discovery in the information field.…
Shahid, Mohammad; Shahzad Cheema, Muhammad; Klenner, Alexander; Younesi, Erfan; Hofmann-Apitius, Martin
2013-03-01
Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context, amongst which support vector machines (SVM) have proven successful in addressing the challenge of classifying drugs with similar features. We have applied one such SVM-based approach, namely SVM-based recursive feature elimination (SVM-RFE). We use the approach to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in-silico computational model for the binary classification of NDD drugs from other drugs. Application of an SVM-RFE model to a set of drugs successfully classified NDD drugs from non-NDD drugs and resulted in an overall accuracy of ~80% with 10-fold cross-validation, using the 40 top-ranked molecular descriptors selected out of 314 descriptors in total. Moreover, the SVM-RFE method outperformed linear discriminant analysis (LDA)-based feature selection and classification. The model reduced the multidimensional descriptor space of drugs dramatically and predicted NDD drugs with high accuracy, while avoiding overfitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed and existing NDD-specific drugs can be characterized by a well-characterized set of molecular descriptors.
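SVM-RFE as described above maps directly onto scikit-learn's RFE wrapper around a linear SVM; in the sketch below the 40-of-314 descriptor selection and 10-fold cross-validation mirror the numbers in the abstract, with the descriptor matrix `X` and labels `y` as assumed inputs.

```python
# Sketch of SVM-based recursive feature elimination: a linear SVM ranks
# molecular descriptors and RFE keeps the top 40 of 314.
# X (drugs x descriptors) and y (1 = NDD drug, 0 = other) are assumed inputs.
from sklearn.svm import SVC
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

selector = RFE(SVC(kernel="linear"), n_features_to_select=40, step=10)
model = make_pipeline(selector, SVC(kernel="linear"))
acc = cross_val_score(model, X, y, cv=10).mean()
print(f"10-fold CV accuracy with 40 descriptors: {acc:.2f}")
```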
Promoter Sequences Prediction Using Relational Association Rule Mining
Czibula, Gabriela; Bocicor, Maria-Iuliana; Czibula, Istvan Gergely
2012-01-01
In this paper we approach, from a computational perspective, the problem of promoter sequence prediction, an important problem within the field of bioinformatics. As the conditions for a DNA sequence to function as a promoter are not known, machine learning-based classification models are still developed to approach the problem of promoter identification in the DNA. We propose a classification model based on relational association rule mining. Relational association rules are a particular type of association rules and describe numerical orderings between attributes that commonly occur over a data set. Our classifier is based on the discovery of relational association rules for predicting whether or not a DNA sequence contains a promoter region. An experimental evaluation of the proposed model and a comparison with similar existing approaches are provided. The obtained results show that our classifier outperforms the existing techniques for identifying promoter sequences, confirming the potential of our proposal. PMID:22563233
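A toy version of a relational association rule of the form "position i < position j" can be mined by counting how often the ordering holds within the promoter class; the numeric sequence encoding and thresholds below are simplifications of the published method.

```python
# Toy sketch of relational association rules: "value at position i < value at
# position j", scored by support within the promoter class.
import numpy as np
from itertools import combinations

def ordering_rules(X, y, min_support=0.8):
    """X: (n_sequences, n_positions) numeric encoding; y: 1 = promoter."""
    rules = []
    promoters = X[y == 1]
    for i, j in combinations(range(X.shape[1]), 2):
        support = np.mean(promoters[:, i] < promoters[:, j])
        if support >= min_support:
            rules.append((i, j, support))   # rule: pos_i < pos_j in promoters
    return rules
```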
Collected Notes on the Workshop for Pattern Discovery in Large Databases
NASA Technical Reports Server (NTRS)
Buntine, Wray (Editor); Delalto, Martha (Editor)
1991-01-01
These collected notes are a record of material presented at the Workshop. They address core data analysis tasks that have traditionally required statistical or pattern recognition techniques. Some of the core tasks include classification, discrimination, clustering, supervised and unsupervised learning, discovery and diagnosis, i.e., general pattern discovery.
Ou-Yang, Si-sheng; Lu, Jun-yan; Kong, Xiang-qian; Liang, Zhong-jie; Luo, Cheng; Jiang, Hualiang
2012-01-01
Computational drug discovery is an effective strategy for accelerating and economizing the drug discovery and development process. Because of the dramatic increase in the availability of biological macromolecule and small molecule information, the applicability of computational drug discovery has been extended and broadly applied to nearly every stage in the drug discovery and development workflow, including target identification and validation, lead discovery and optimization and preclinical tests. Over the past decades, computational drug discovery methods such as molecular docking, pharmacophore modeling and mapping, de novo design, molecular similarity calculation and sequence-based virtual screening have been greatly improved. In this review, we present an overview of these important computational methods, platforms and successful applications in this field. PMID:22922346
Knowledge Discovery in Literature Data Bases
NASA Astrophysics Data System (ADS)
Albrecht, Rudolf; Merkl, Dieter
The concept of knowledge discovery as defined through ``establishing previously unknown and unsuspected relations of features in a data base'' is, cum grano salis, relatively easy to implement for data bases containing numerical data. Increasingly we find at our disposal data bases containing scientific literature. Computer assisted detection of unknown relations of features in such data bases would be extremely valuable and would lead to new scientific insights. However, the current representation of scientific knowledge in such data bases is not conducive to computer processing. Any correlation of features still has to be done by the human reader, a process which is plagued by ineffectiveness and incompleteness. On the other hand we note that considerable progress is being made in an area where reading all available material is totally prohibitive: the World Wide Web. Robots and Web crawlers mine the Web continuously and construct data bases which allow the identification of pages of interest in near real time. An obvious step is to categorize and classify the documents in the text data base. This can be used to identify papers worth reading, or which are of unexpected cross-relevance. We show the results of first experiments using unsupervised classification based on neural networks.
Ahern, Thomas P; Beck, Andrew H; Rosner, Bernard A; Glass, Ben; Frieling, Gretchen; Collins, Laura C; Tamimi, Rulla M
2017-05-01
Computational pathology platforms incorporate digital microscopy with sophisticated image analysis to permit rapid, continuous measurement of protein expression. We compared two computational pathology platforms on their measurement of breast tumour oestrogen receptor (ER) and progesterone receptor (PR) expression. Breast tumour microarrays from the Nurses' Health Study were stained for ER (n=592) and PR (n=187). One expert pathologist scored cases as positive if ≥1% of tumour nuclei exhibited stain. ER and PR were then measured with the Definiens Tissue Studio (automated) and Aperio Digital Pathology (user-supervised) platforms. Platform-specific measurements were compared using boxplots, scatter plots and correlation statistics. Classification of ER and PR positivity by platform-specific measurements was evaluated with areas under receiver operating characteristic curves (AUC) from univariable logistic regression models, using expert pathologist classification as the standard. Both platforms showed considerable overlap in continuous measurements of ER and PR between positive and negative groups classified by expert pathologist. Platform-specific measurements were strongly and positively correlated with one another (r≥0.77). The user-supervised Aperio workflow performed slightly better than the automated Definiens workflow at classifying ER positivity (AUC(Aperio)=0.97; AUC(Definiens)=0.90; difference=0.07, 95% CI 0.05 to 0.09) and PR positivity (AUC(Aperio)=0.94; AUC(Definiens)=0.87; difference=0.07, 95% CI 0.03 to 0.12). Paired hormone receptor expression measurements from two different computational pathology platforms agreed well with one another. The user-supervised workflow yielded better classification accuracy than the automated workflow. Appropriately validated computational pathology algorithms enrich molecular epidemiology studies with continuous protein expression data and may accelerate tumour biomarker discovery. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sreepathi, Sarat; Kumar, Jitendra; Mills, Richard T.
A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offers unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements have led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and, specifically, for large-scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.
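The clustering core of such an analysis is ordinary k-means; the sketch below shows the assignment and update steps on a small synthetic array. It is a single-node NumPy stand-in and does not reproduce the hybrid MPI/CUDA/OpenACC implementation described above.

```python
# Minimal, single-node k-means sketch (assumption: NumPy only); the study itself targets
# hybrid supercomputers, which is outside the scope of this illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 6))   # toy stand-in for a "grid cells x climate variables" matrix
k = 8

# Plain random seeding for brevity; k-means++-style seeding is a common refinement.
centers = X[rng.choice(len(X), size=k, replace=False)]
for _ in range(20):
    # Assignment step: nearest center for every observation.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: each center becomes the mean of its members (kept unchanged if empty).
    centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])

print(np.bincount(labels))
```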
Computer-aided drug discovery.
Bajorath, Jürgen
2015-01-01
Computational approaches are an integral part of interdisciplinary drug discovery research. Understanding the science behind computational tools, their opportunities, and limitations is essential to make a true impact on drug discovery at different levels. If applied in a scientifically meaningful way, computational methods improve the ability to identify and evaluate potential drug molecules, but there remain weaknesses in the methods that preclude naïve applications. Herein, current trends in computer-aided drug discovery are reviewed, and selected computational areas are discussed. Approaches are highlighted that aid in the identification and optimization of new drug candidates. Emphasis is put on the presentation and discussion of computational concepts and methods, rather than case studies or application examples. As such, this contribution aims to provide an overview of the current methodological spectrum of computational drug discovery for a broad audience.
Building Cognition: The Construction of Computational Representations for Scientific Discovery
ERIC Educational Resources Information Center
Chandrasekharan, Sanjay; Nersessian, Nancy J.
2015-01-01
Novel computational representations, such as simulation models of complex systems and video games for scientific discovery (Foldit, EteRNA etc.), are dramatically changing the way discoveries emerge in science and engineering. The cognitive roles played by such computational representations in discovery are not well understood. We present a…
James, Conrad D.; Aimone, James B.; Miner, Nadine E.; ...
2017-01-04
Biological neural networks continue to inspire new developments in algorithms and microelectronic hardware to solve challenging data processing and classification problems. Here we survey the history of neural-inspired and neuromorphic computing in order to examine the complex and intertwined trajectories of the mathematical theory and hardware developed in this field. Early research focused on adapting existing hardware to emulate the pattern recognition capabilities of living organisms. Contributions from psychologists, mathematicians, engineers, neuroscientists, and other professions were crucial to maturing the field from narrowly-tailored demonstrations to more generalizable systems capable of addressing difficult problem classes such as object detection and speech recognition. Algorithms that leverage fundamental principles found in neuroscience such as hierarchical structure, temporal integration, and robustness to error have been developed, and some of these approaches are achieving world-leading performance on particular data classification tasks. Additionally, novel microelectronic hardware is being developed to perform logic and to serve as memory in neuromorphic computing systems with optimized system integration and improved energy efficiency. Key to such advancements was the incorporation of new discoveries in neuroscience research, the transition away from strict structural replication and towards the functional replication of neural systems, and the use of mathematical theory frameworks to guide algorithm and hardware developments.
Novel Hypoxia-Directed Cancer Therapeutics
2017-07-01
Subject terms: hypoxia-inducible factors, mass spectrometry, drug discovery, kidney cancer. The recoverable abstract fragments refer to development "as anti-cancer therapies", to "programs required for driving solid tumor growth in cancers of kidney, pancreas, stomach, colon and skin", and to "the discovery of drug-like..." (text truncated).
Docking-based classification models for exploratory toxicology ...
Background: Exploratory toxicology is an emerging research area whose ultimate mission is to protect human health and the environment from risks posed by chemicals. In this regard, the ethical and practical limitations of animal testing have encouraged the promotion of computational methods for the fast screening of the huge collections of chemicals available on the market. Results: We derived 24 reliable docking-based classification models able to predict the estrogenic potential of a large collection of chemicals having high-quality experimental data, kindly provided by the U.S. Environmental Protection Agency (EPA). The predictive power of our docking-based models was supported by values of AUC, EF1% (EFmax = 7.1), -LR (at SE = 0.75) and +LR (at SE = 0.25) ranging from 0.63 to 0.72, from 2.5 to 6.2, from 0.35 to 0.67 and from 2.05 to 9.84, respectively. In addition, external predictions were successfully made on some representative known estrogenic chemicals. Conclusion: We show how structure-based methods, widely applied to drug discovery programs, can be adapted to meet the conditions of the regulatory context. Importantly, these methods enable one to employ the physicochemical information contained in the X-ray solved biological target and to screen structurally-unrelated chemicals.
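For readers unfamiliar with the likelihood-ratio figures quoted above, they follow from sensitivity (SE) and specificity (SP) at a chosen docking-score threshold: +LR = SE / (1 - SP) and -LR = (1 - SE) / SP. The values in this small sketch are made up for illustration and are not the study's numbers.

```python
# Hedged illustration of screening likelihood ratios from an assumed operating point.
def likelihood_ratios(sensitivity, specificity):
    positive_lr = sensitivity / (1.0 - specificity)   # +LR: how much a "positive" call raises the odds
    negative_lr = (1.0 - sensitivity) / specificity   # -LR: how much a "negative" call lowers the odds
    return positive_lr, negative_lr

plr, nlr = likelihood_ratios(sensitivity=0.75, specificity=0.85)
print(f"+LR = {plr:.2f}, -LR = {nlr:.2f}")
```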
2012-01-01
Computational approaches to generate hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, it still remains a challenge to automatically discover novel, cross-silo biomedical hypotheses from large-scale literature repositories. In order to address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypotheses generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and concept-author map on a cluster using Map-Reduce frame-work. We extract a set of heterogeneous features such as random walk based features, neighborhood features and common author features. The potential number of links to consider for the possibility of link discovery is large in our concept network and to address the scalability problem, the features from a concept network are extracted using a cluster with Map-Reduce framework. We further model link discovery as a classification problem carried out on a training data set automatically extracted from two network snapshots taken in two consecutive time duration. A set of heterogeneous features, which cover both topological and semantic features derived from the concept network, have been studied with respect to their impacts on the accuracy of the proposed supervised link discovery process. A case study of hypotheses generation based on the proposed method has been presented in the paper. PMID:22759614
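A toy version of casting hypothesis generation as link prediction on a concept network might look like the sketch below, using networkx in place of the paper's Map-Reduce pipeline. The miniature graph, the snapshot-derived positive and negative links, and the feature set (common neighbors, Jaccard coefficient) are illustrative assumptions, not the paper's data or full feature set.

```python
# Hedged link-discovery sketch on a tiny concept co-occurrence graph (assumptions: networkx, scikit-learn).
import networkx as nx
from sklearn.linear_model import LogisticRegression

# Toy "earlier snapshot" of a concept network.
g_old = nx.Graph([("geneA", "diseaseX"), ("geneA", "drugY"), ("drugY", "diseaseX"),
                  ("geneB", "diseaseX"), ("geneB", "pathwayP"), ("pathwayP", "drugY")])
# Links appearing only in a "later snapshot" become positive training examples.
new_links = [("geneA", "pathwayP")]
non_links = [("geneB", "drugY"), ("geneA", "geneB")]

def features(g, u, v):
    cn = len(list(nx.common_neighbors(g, u, v)))              # shared neighbors
    jc = next(nx.jaccard_coefficient(g, [(u, v)]))[2]         # neighborhood overlap
    return [cn, jc]

X = [features(g_old, u, v) for u, v in new_links + non_links]
y = [1] * len(new_links) + [0] * len(non_links)
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X)[:, 1])   # predicted probability that each candidate link will appear
```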
Multilevel image recognition using discriminative patches and kernel covariance descriptor
NASA Astrophysics Data System (ADS)
Lu, Le; Yao, Jianhua; Turkbey, Evrim; Summers, Ronald M.
2014-03-01
Computer-aided diagnosis of medical images has emerged as an important tool to objectively improve the performance, accuracy and consistency for clinical workflow. To computerize the medical image diagnostic recognition problem, there are three fundamental problems: where to look (i.e., where is the region of interest from the whole image/volume), image feature description/encoding, and similarity metrics for classification or matching. In this paper, we exploit the motivation, implementation and performance evaluation of task-driven iterative, discriminative image patch mining; covariance matrix based descriptor via intensity, gradient and spatial layout; and log-Euclidean distance kernel for support vector machine, to address these three aspects respectively. To cope with often visually ambiguous image patterns for the region of interest in medical diagnosis, discovery of multilabel selective discriminative patches is desired. Covariance of several image statistics summarizes their second order interactions within an image patch and is proved as an effective image descriptor, with low dimensionality compared with joint statistics and fast computation regardless of the patch size. We extensively evaluate two extended Gaussian kernels using affine-invariant Riemannian metric or log-Euclidean metric with support vector machines (SVM), on two medical image classification problems of degenerative disc disease (DDD) detection on cortical shell unwrapped CT maps and colitis detection on CT key images. The proposed approach is validated with promising quantitative results on these challenging tasks. Our experimental findings and discussion also unveil some interesting insights on the covariance feature composition with or without spatial layout for classification and retrieval, and different kernel constructions for SVM. This will also shed some light on future work using covariance feature and kernel classification for medical image analysis.
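The covariance-descriptor idea can be sketched compactly: compute the covariance of per-pixel statistics within a patch, move it to the log-Euclidean domain with a matrix logarithm, and classify the vectorized result with an SVM. The per-pixel features, regularization, and random patches below are simplifying assumptions, and a linear kernel on vectorized logarithms only approximates the paper's log-Euclidean kernel.

```python
# Hedged sketch of a covariance descriptor with a log-Euclidean mapping (assumptions: NumPy, SciPy, scikit-learn).
import numpy as np
from scipy.linalg import logm
from sklearn.svm import SVC

def covariance_descriptor(patch):
    """2-D intensity patch -> vectorized matrix logarithm of the per-pixel feature covariance."""
    gy, gx = np.gradient(patch.astype(float))
    yy, xx = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    feats = np.stack([patch.ravel(), gx.ravel(), gy.ravel(),
                      xx.ravel(), yy.ravel()], axis=0)
    cov = np.cov(feats) + 1e-6 * np.eye(feats.shape[0])   # regularize to keep it positive definite
    log_cov = logm(cov).real
    iu = np.triu_indices_from(log_cov)                    # upper triangle suffices (symmetric matrix)
    return log_cov[iu]

rng = np.random.default_rng(0)
patches = rng.random((40, 32, 32))          # synthetic patches standing in for CT image patches
labels = rng.integers(0, 2, size=40)
X = np.array([covariance_descriptor(p) for p in patches])

# Linear kernel on log-covariance vectors: an approximation to a log-Euclidean kernel SVM.
clf = SVC(kernel="linear").fit(X, labels)
print(clf.score(X, labels))
```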
Semi-automated surface mapping via unsupervised classification
NASA Astrophysics Data System (ADS)
D'Amore, M.; Le Scaon, R.; Helbert, J.; Maturilli, A.
2017-09-01
Due to the increasing volume of data returned from space missions, the human search for correlations and identification of interesting features becomes more and more unfeasible. Statistical extraction of features via machine learning methods will increase the scientific output of remote sensing missions and aid the discovery of yet unknown features hidden in the data. These methods exploit algorithms trained on features from multiple instruments, returning classification maps that explore intra-dataset correlations and allow for the discovery of unknown features. We present two applications, one for Mercury and one for Vesta.
Intelligent Interoperable Agent Toolkit (I2AT)
2005-02-01
Subject terms: agents, agent infrastructure, intelligent agents (report unclassified). The only recoverable abstract fragment reads: "...those that occur while the submarine is submerged. Using CoABS Grid/Jini service discovery events backed up with a small amount of internal bookkeeping..."
NASA Astrophysics Data System (ADS)
Liao, Chun-Chih; Xiao, Furen; Wong, Jau-Min; Chiang, I.-Jen
Computed tomography (CT) of the brain is the preferred study in neurological emergencies. Physicians use CT to diagnose various types of intracranial hematomas, including epidural, subdural and intracerebral hematomas, according to their locations and shapes. We propose a novel method that can automatically diagnose intracranial hematomas by combining machine vision and knowledge discovery techniques. The skull on the CT slice is located and the depth of each intracranial pixel is labeled. After normalization of the pixel intensities by their depth, the hyperdense area of intracranial hematoma is segmented with multi-resolution thresholding and region-growing. We then apply the C4.5 algorithm to construct a decision tree using the features of the segmented hematoma and the diagnoses made by physicians. The algorithm was evaluated on 48 pathological images from cases treated in a single institute. The two discovered rules closely resemble those used by human experts, and are able to make correct diagnoses in all cases.
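The final rule-induction step can be illustrated as below, with scikit-learn's CART decision tree standing in for C4.5 and hypothetical hematoma features (relative depth, eccentricity, maximum thickness) in place of the paper's feature set.

```python
# Hedged sketch of rule learning on segmented-hematoma features (assumption: scikit-learn's CART, not C4.5).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Toy features for 48 segmented hematomas and physician diagnoses
# (0 = epidural, 1 = subdural, 2 = intracerebral), purely illustrative.
X = rng.random((48, 3))
y = rng.integers(0, 3, size=48)
feature_names = ["relative_depth", "eccentricity", "max_thickness"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# The induced rules can be printed and compared with expert diagnostic criteria.
print(export_text(tree, feature_names=feature_names))
```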
Advances in computational metabolomics and databases deepen the understanding of metabolisms.
Tsugawa, Hiroshi
2018-01-29
Mass spectrometry (MS)-based metabolomics is the popular platform for metabolome analyses. Computational techniques for the processing of MS raw data, for example, feature detection, peak alignment, and the exclusion of false-positive peaks, have been established. The next stage of untargeted metabolomics would be to decipher the mass fragmentation of small molecules for the global identification of human-, animal-, plant-, and microbiota metabolomes, resulting in a deeper understanding of metabolisms. This review is an update on the latest computational metabolomics including known/expected structure databases, chemical ontology classifications, and mass spectrometry cheminformatics for the interpretation of mass fragmentations and for the elucidation of unknown metabolites. The importance of metabolome 'databases' and 'repositories' is also discussed because novel biological discoveries are often attributable to the accumulation of data, to relational databases, and to their statistics. Lastly, a practical guide for metabolite annotations is presented as the summary of this review. Copyright © 2018 Elsevier Ltd. All rights reserved.
Li, Haiquan; Dai, Xinbin; Zhao, Xuechun
2008-05-01
Membrane transport proteins play a crucial role in the import and export of ions, small molecules or macromolecules across biological membranes. Currently, there are a limited number of published computational tools which enable the systematic discovery and categorization of transporters prior to costly experimental validation. To approach this problem, we utilized a nearest neighbor method which seamlessly integrates homology search and topological analysis into a machine-learning framework. Our approach satisfactorily distinguished 484 transporter families in the Transporter Classification Database, a curated and representative database for transporters. A five-fold cross-validation on the database achieved a positive classification rate of 72.3% on average. Furthermore, this method successfully detected transporters in seven model and four non-model organisms, ranging from archaean to mammalian species. A preliminary literature-based validation has cross-validated 65.8% of our predictions on the 11 organisms, including 55.9% of our predictions overlapping with 83.6% of the predicted transporters in TransportDB.
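A minimal sketch of the nearest-neighbour classification step is given below. The random feature matrix stands in for homology-search and membrane-topology features, and ten families replace the 484 families of the Transporter Classification Database; only the five-fold cross-validation mirrors the protocol in the abstract.

```python
# Hedged nearest-neighbour sketch on synthetic protein features (assumption: scikit-learn).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((500, 20))             # e.g. homology-hit scores plus predicted TM-helix counts (hypothetical)
families = rng.integers(0, 10, 500)   # stand-in for transporter family labels

knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
# Five-fold cross-validation over the labelled proteins.
print(cross_val_score(knn, X, families, cv=5).mean())
```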
Höller, Yvonne; Bergmann, Jürgen; Thomschewski, Aljoscha; Kronbichler, Martin; Höller, Peter; Crone, Julia S.; Schmid, Elisabeth V.; Butz, Kevin; Nardone, Raffaele; Trinka, Eugen
2013-01-01
Current research aims at identifying voluntary brain activation in patients who are behaviorally diagnosed as being unconscious, but are able to perform commands by modulating their brain activity patterns. This involves machine learning techniques and feature extraction methods such as those applied in brain-computer interfaces. In this study, we try to answer the question of whether features and classification methods that show advantages in healthy participants are also accurate when applied to data of patients with disorders of consciousness. A sample of healthy participants (N = 22), patients in a minimally conscious state (MCS; N = 5), and patients with unresponsive wakefulness syndrome (UWS; N = 9) was examined with a motor imagery task which involved imagery of moving both hands and an instruction to hold both hands firm. We extracted a set of 20 features from the electroencephalogram and used linear discriminant analysis, k-nearest neighbor classification, and support vector machines (SVM) as classification methods. In healthy participants, the best classification accuracies were seen with coherences (mean = .79; range = .53-.94) and power spectra (mean = .69; range = .40-.85). The coherence patterns in healthy participants did not match the expectation of a centrally modulated µ-rhythm. Instead, coherence involved mainly frontal regions. In healthy participants, the best classification tool was SVM. Five patients had at least one feature-classifier outcome with p < 0.05 (none of which were coherence or power spectra), though none remained significant after false-discovery rate correction for multiple comparisons. The present work suggests the use of coherences in patients with disorders of consciousness because they show high reliability among healthy subjects and patient groups. However, feature extraction and classification is a challenging task in unresponsive patients because there is no ground truth to validate the results. PMID:24282545
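One of the feature types highlighted above, channel-pair coherence, can be computed per trial roughly as follows. The synthetic signals, sampling rate, and 8-13 Hz band are assumptions, and the resulting feature vector would feed a classifier such as an SVM.

```python
# Hedged sketch of coherence features for one EEG trial (assumptions: NumPy, SciPy).
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(0)
fs = 250                                   # assumed sampling rate in Hz
trial = rng.normal(size=(8, 4 * fs))       # one synthetic trial: 8 channels, 4 seconds

def coherence_features(trial, fs, band=(8.0, 13.0)):
    """Mean magnitude-squared coherence in a frequency band for every channel pair of one trial."""
    n_ch = trial.shape[0]
    feats = []
    for i in range(n_ch):
        for j in range(i + 1, n_ch):
            f, cxy = coherence(trial[i], trial[j], fs=fs, nperseg=fs)
            mask = (f >= band[0]) & (f <= band[1])
            feats.append(cxy[mask].mean())
    return np.array(feats)

print(coherence_features(trial, fs).shape)   # 8 * 7 / 2 = 28 channel-pair features
```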
A Cladist is a systematist who seeks a natural classification: some comments on Quinn (2017).
Williams, David M; Ebach, Malte C
2018-01-01
In response to Quinn (Biol Philos, 2017. 10.1007/s10539-017-9577-z) we identify cladistics to be about natural classifications and their discovery and thereby propose to add an eighth cladistic definition to Quinn's list, namely the systematist who seeks to discover natural classifications, regardless of their affiliation, theoretical or methodological justifications.
Combined use of computational chemistry and chemoinformatics methods for chemical discovery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sugimoto, Manabu, E-mail: sugimoto@kumamoto-u.ac.jp; Institute for Molecular Science, 38 Nishigo-Naka, Myodaiji, Okazaki 444-8585; CREST, Japan Science and Technology Agency, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012
2015-12-31
Data analysis of numerical data from computational chemistry calculations is carried out to obtain knowledge about molecules. A molecular database is developed to systematically store chemical, electronic-structure, and knowledge-based information. The database is used to find molecules related to the keyword “cancer”. Then electronic-structure calculations are performed to quantitatively evaluate the quantum chemical similarity of the molecules. Among the 377 compounds registered in the database, 24 molecules are found to be “cancer”-related. This set of molecules includes both carcinogens and anticancer drugs. The quantum chemical similarity analysis, which is carried out by using numerical results of the density-functional theory calculations, shows that, when some energy spectra are referred to, carcinogens are reasonably distinguished from the anticancer drugs. Therefore these spectral properties are considered as important measures for classification.
Computer-Aided Drug Discovery: Molecular Docking of Diminazene Ligands to DNA Minor Groove
ERIC Educational Resources Information Center
Kholod, Yana; Hoag, Erin; Muratore, Katlynn; Kosenkov, Dmytro
2018-01-01
The reported project-based laboratory unit introduces upper-division undergraduate students to the basics of computer-aided drug discovery as a part of a computational chemistry laboratory course. The students learn to perform model binding of organic molecules (ligands) to the DNA minor groove with computer-aided drug discovery (CADD) tools. The…
Impact of computational structure-based methods on drug discovery.
Reynolds, Charles H
2014-01-01
Structure-based drug design has become an indispensable tool in drug discovery. The emergence of structure-based design is due to gains in structural biology that have provided exponential growth in the number of protein crystal structures, new computational algorithms and approaches for modeling protein-ligand interactions, and the tremendous growth of raw computer power in the last 30 years. Computer modeling and simulation have made major contributions to the discovery of many groundbreaking drugs in recent years. Examples are presented that highlight the evolution of computational structure-based design methodology, and the impact of that methodology on drug discovery.
Supporting Solar Physics Research via Data Mining
NASA Astrophysics Data System (ADS)
Angryk, Rafal; Banda, J.; Schuh, M.; Ganesan Pillai, K.; Tosun, H.; Martens, P.
2012-05-01
In this talk we will briefly introduce three pillars of data mining (i.e. frequent patterns discovery, classification, and clustering), and discuss some possible applications of known data mining techniques which can directly benefit solar physics research. In particular, we plan to demonstrate applicability of frequent patterns discovery methods for the verification of hypotheses about co-occurrence (in space and time) of filaments and sigmoids. We will also show how classification/machine learning algorithms can be utilized to verify human-created software modules to discover individual types of solar phenomena. Finally, we will discuss applicability of clustering techniques to image data processing.
Computational chemistry at Janssen
NASA Astrophysics Data System (ADS)
van Vlijmen, Herman; Desjarlais, Renee L.; Mirzadegan, Tara
2017-03-01
Computer-aided drug discovery activities at Janssen are carried out by scientists in the Computational Chemistry group of the Discovery Sciences organization. This perspective gives an overview of the organizational and operational structure, the science, internal and external collaborations, and the impact of the group on Drug Discovery at Janssen.
Computational intelligence techniques in bioinformatics.
Hassanien, Aboul Ella; Al-Shammari, Eiman Tamah; Ghali, Neveen I
2013-12-01
Computational intelligence (CI) is a well-established paradigm with current systems having many of the characteristics of biological computers and capable of performing a variety of tasks that are difficult to do using conventional techniques. It is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. The objective of this article is to present to the CI and bioinformatics research communities some of the state-of-the-art in CI applications to bioinformatics and motivate research in new trend-setting directions. In this article, we present an overview of the CI techniques in bioinformatics. We will show how CI techniques including neural networks, restricted Boltzmann machine, deep belief network, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems and support vector machines, could be successfully employed to tackle various problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function and structure prediction. We discuss some representative methods to provide inspiring examples to illustrate how CI can be utilized to address these problems and how bioinformatics data can be characterized by CI. Challenges to be addressed and future directions of research are also presented and an extensive bibliography is included. Copyright © 2013 Elsevier Ltd. All rights reserved.
Classification of lymphoid neoplasms: the microscope as a tool for disease discovery
Harris, Nancy Lee; Stein, Harald; Isaacson, Peter G.
2008-01-01
In the past 50 years, we have witnessed explosive growth in the understanding of normal and neoplastic lymphoid cells. B-cell, T-cell, and natural killer (NK)–cell neoplasms in many respects recapitulate normal stages of lymphoid cell differentiation and function, so that they can be to some extent classified according to the corresponding normal stage. Likewise, the molecular mechanisms involved in the pathogenesis of lymphomas and lymphoid leukemias are often based on the physiology of the lymphoid cells, capitalizing on deregulated normal physiology by harnessing the promoters of genes essential for lymphocyte function. The clinical manifestations of lymphomas likewise reflect the normal function of lymphoid cells in vivo. The multiparameter approach to classification adopted by the World Health Organization (WHO) classification has been validated in international studies as being highly reproducible, and enhancing the interpretation of clinical and translational studies. In addition, accurate and precise classification of disease entities facilitates the discovery of the molecular basis of lymphoid neoplasms in the basic science laboratory. PMID:19029456
Discovery and Classification of ZTF Transients
NASA Astrophysics Data System (ADS)
Lunnan, Ragnhild; Taddia, Francesco; Sollerman, Jesper; Barbarino, Cristina; Goobar, Ariel; Fremling, Christoffer; Kasliwal, Mansi; Cannella, Chris; Blagorodnova, Nadejda; Neill, J. Don; Walters, Richard; Ho, Anna; Yan, Lin; Burdge, Kevin; Schulze, Steve; Yaron, Ofer; Gal-Yam, Avishay; Yang, Yi; Graham, Melissa; Golkhou, V. Zach; Bellm, Eric; Parley, Daniel; Zwicky Transient Facility Collaboration
2018-04-01
We report spectroscopic classifications of 17 transients discovered by the Zwicky Transient Facility (ZTF; ATel #11266) during science validation. Additional classifications of older transients have been reported to, and are publicly available on, the TNS. Spectra were obtained with DBSP at the Palomar 200-in Hale Telescope, with DIS at the ARC 3.5m telescope at Apache Point Observatory, and with SPRAT at Liverpool Telescope.
[Recent advances in metabonomics].
Xu, Guo-Wang; Lu, Xin; Yang, Sheng-Li
2007-12-01
Metabonomics (or metabolomics) aims at the comprehensive and quantitative analysis of the wide arrays of metabolites in biological samples. Metabonomics has been labeled as one of the new "-omics", joining genomics, transcriptomics, and proteomics as a science employed toward the understanding of global systems biology. It has been widely applied in many research areas including drug toxicology, biomarker discovery, functional genomics, and molecular pathology etc. The comprehensive analysis of the metabonome is particularly challenging due to the diverse chemical natures of metabolites. Metabonomics investigations require special approaches for sample preparation, data-rich analytical chemical measurements, and information mining. The outputs from a metabonomics study allow sample classification, biomarker discovery, and interpretation of the reasons for classification information. This review focuses on the currently new advances in various technical platforms of metabonomics and its applications in drug discovery and development, disease biomarker identification, plant and microbe related fields.
Computer-aided drug discovery research at a global contract research organization
NASA Astrophysics Data System (ADS)
Kitchen, Douglas B.
2017-03-01
Computer-aided drug discovery started at Albany Molecular Research, Inc in 1997. Over nearly 20 years the role of cheminformatics and computational chemistry has grown throughout the pharmaceutical industry and at AMRI. This paper will describe the infrastructure and roles of CADD throughout drug discovery and some of the lessons learned regarding the success of several methods. Various contributions provided by computational chemistry and cheminformatics in chemical library design, hit triage, hit-to-lead and lead optimization are discussed. Some frequently used computational chemistry techniques are described. The ways in which they may contribute to discovery projects are presented based on a few examples from recent publications.
W.W. Morgan and the Discovery of the Spiral Arm Structure of our Galaxy
NASA Astrophysics Data System (ADS)
Sheehan, William
2008-03-01
William Wilson Morgan was one of the great astronomers of the twentieth century. He considered himself a morphologist, and was preoccupied throughout his career with matters of classification. Though his early life was difficult and his pursuit of astronomy as a career was opposed by his father, he took a position at Yerkes Observatory in 1926 and remained there for the rest of his working life. Thematically, his work was also a unified whole. Beginning with spectroscopic studies under Otto Struve at Yerkes Observatory, by the late 1930s he concentrated particularly on the young O and B stars. His work on stellar classification led to the Morgan-Keenan-Kellman [MKK] system of classification of stars, and later - as he grappled with the question of the intrinsic color and brightness of stars at great distances - to the Johnson-Morgan UBV system for measuring stellar colors. Eventually these concerns with classification and method led to his greatest single achievement - the recognition of the nearby spiral arms of our Galaxy by tracing the OB associations and HII regions that outline them. After years of intensive work on the problem of galactic structure, the discovery came in a blinding flash of Archimedean insight as he walked under the night sky between his office and his house in the autumn of 1951. His optical discovery of the spiral arms preceded the radio-mapping of the spiral arms by more than a year. Morgan suffered a nervous breakdown soon after he announced his discovery, however, and so was prevented from publishing a complete account of his work. As a result of that, and the announcement soon afterward of the first radio maps of the spiral arms, the uniqueness of his achievement was not fully appreciated at the time.
Computational methods in drug discovery
Leelananda, Sumudu P; Lindert, Steffen
2016-01-01
The process for drug discovery and development is challenging, time consuming and expensive. Computer-aided drug discovery (CADD) tools can act as a virtual shortcut, assisting in the expedition of this long process and potentially reducing the cost of research and development. Today CADD has become an effective and indispensable tool in therapeutic development. The human genome project has made available a substantial amount of sequence data that can be used in various drug discovery projects. Additionally, increasing knowledge of biological structures, as well as increasing computer power have made it possible to use computational methods effectively in various phases of the drug discovery and development pipeline. The importance of in silico tools is greater than ever before and has advanced pharmaceutical research. Here we present an overview of computational methods used in different facets of drug discovery and highlight some of the recent successes. In this review, both structure-based and ligand-based drug discovery methods are discussed. Advances in virtual high-throughput screening, protein structure prediction methods, protein–ligand docking, pharmacophore modeling and QSAR techniques are reviewed. PMID:28144341
NASA Astrophysics Data System (ADS)
Richards, Joseph W.; Starr, Dan L.; Miller, Adam A.; Bloom, Joshua S.; Butler, Nathaniel R.; Brink, Henrik; Crellin-Quick, Arien
2012-12-01
With growing data volumes from synoptic surveys, astronomers necessarily must become more abstracted from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace these human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of classification purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All-Sky Automated Survey (ASAS), and release the Machine-learned ASAS Classification Catalog (MACC), a 28 class probabilistic classification catalog of 50,124 ASAS sources in the ASAS Catalog of Variable Stars. We estimate that MACC achieves a sub-20% classification error rate and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes.
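A compact sketch of producing calibrated class probabilities with a random forest is shown below, using scikit-learn's CalibratedClassifierCV on synthetic features; the actual MACC pipeline, feature set, and calibration procedure are not reproduced here.

```python
# Hedged sketch: random forest + probability calibration on synthetic data (assumption: scikit-learn).
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for time-series, color, and context features of variable sources.
X, y = make_classification(n_samples=3000, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=300, random_state=0)
# Isotonic calibration of the forest's raw probabilities on held-out folds.
calibrated = CalibratedClassifierCV(forest, method="isotonic", cv=5).fit(X_train, y_train)

proba = calibrated.predict_proba(X_test)   # calibrated per-class posterior probabilities
print(proba[:3].round(3))
```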
Lasko, Thomas A.; Denny, Joshua C.; Levy, Mia A.
2013-01-01
Inferring precise phenotypic patterns from population-scale clinical data is a core computational task in the development of precision, personalized medicine. The traditional approach uses supervised learning, in which an expert designates which patterns to look for (by specifying the learning task and the class labels), and where to look for them (by specifying the input variables). While appropriate for individual tasks, this approach scales poorly and misses the patterns that we don’t think to look for. Unsupervised feature learning overcomes these limitations by identifying patterns (or features) that collectively form a compact and expressive representation of the source data, with no need for expert input or labeled examples. Its rising popularity is driven by new deep learning methods, which have produced high-profile successes on difficult standardized problems of object recognition in images. Here we introduce its use for phenotype discovery in clinical data. This use is challenging because the largest source of clinical data – Electronic Medical Records – typically contains noisy, sparse, and irregularly timed observations, rendering them poor substrates for deep learning methods. Our approach couples dirty clinical data to deep learning architecture via longitudinal probability densities inferred using Gaussian process regression. From episodic, longitudinal sequences of serum uric acid measurements in 4368 individuals we produced continuous phenotypic features that suggest multiple population subtypes, and that accurately distinguished (0.97 AUC) the uric-acid signatures of gout vs. acute leukemia despite not being optimized for the task. The unsupervised features were as accurate as gold-standard features engineered by an expert with complete knowledge of the domain, the classification task, and the class labels. Our findings demonstrate the potential for achieving computational phenotype discovery at population scale. We expect such data-driven phenotypes to expose unknown disease variants and subtypes and to provide rich targets for genetic association studies. PMID:23826094
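The regularization step described above, turning an irregularly sampled lab series into a smooth curve that can be resampled on a fixed grid, can be sketched with scikit-learn's Gaussian process regressor. The kernel choice and the toy uric-acid-like series are assumptions for illustration, not the authors' configuration.

```python
# Hedged sketch: Gaussian process regression over an irregular longitudinal series (assumption: scikit-learn).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
t_obs = np.sort(rng.uniform(0, 365, size=12))                 # irregular measurement days
y_obs = 6.0 + np.sin(t_obs / 60.0) + rng.normal(0, 0.2, 12)   # noisy serum-value stand-in

kernel = 1.0 * RBF(length_scale=30.0) + WhiteKernel(noise_level=0.05)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(t_obs.reshape(-1, 1), y_obs)

t_grid = np.linspace(0, 365, 128).reshape(-1, 1)              # regular grid for downstream models
mean, std = gp.predict(t_grid, return_std=True)               # posterior mean and uncertainty
print(mean.shape, std.shape)
```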
Discovery and Development of ATP-Competitive mTOR Inhibitors Using Computational Approaches.
Luo, Yao; Wang, Ling
2017-11-16
The mammalian target of rapamycin (mTOR) is a central controller of cell growth, proliferation, metabolism, and angiogenesis. This protein is an attractive target for new anticancer drug development. Significant progress has been made in hit discovery, lead optimization, drug candidate development and determination of the three-dimensional (3D) structure of mTOR. Computational methods have been applied to accelerate the discovery and development of mTOR inhibitors helping to model the structure of mTOR, screen compound databases, uncover structure-activity relationship (SAR) and optimize the hits, mine the privileged fragments and design focused libraries. Besides, computational approaches were also applied to study protein-ligand interactions mechanisms and in natural product-driven drug discovery. Herein, we survey the most recent progress on the application of computational approaches to advance the discovery and development of compounds targeting mTOR. Future directions in the discovery of new mTOR inhibitors using computational methods are also discussed. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Automatic Classification of Time-variable X-Ray Sources
NASA Astrophysics Data System (ADS)
Lo, Kitty K.; Farrell, Sean; Murphy, Tara; Gaensler, B. M.
2014-05-01
To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10-fold cross-validation accuracy on the training data is ~97% on a 7-class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest-derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7-500250 appears to be the most unusual source in the sample. Its X-ray spectrum is suggestive of an ultraluminous X-ray source, but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.
Spectroscopic classification of supernova SN 2018Z by NUTS (NOT Un-biased Transient Survey)
NASA Astrophysics Data System (ADS)
Kuncarayakti, H.; Mattila, S.; Kotak, R.; Harmanen, J.; Reynolds, T.; Pastorello, A.; Benetti, S.; Stritzinger, M.; Onori, F.; Somero, A.; Kangas, T.; Lundqvist, P.; Taddia, F.; Ergon, M.
2018-01-01
The NOT Unbiased Transient Survey (NUTS; ATel #8992) collaboration reports the spectroscopic classification of supernova SN 2018Z in host galaxy SDSS J231809.76+212553.5. The observations were performed with the 2.56 m Nordic Optical Telescope equipped with ALFOSC (range 350-950 nm; resolution 1.6 nm) on 2018-01-09.9 UT.
Survey Name | IAU Name | Discovery (UT) | Discovery mag | Observation (UT) | Redshift | Type | Phase | Notes
PS18ao | SN 2018Z | 2018-01-01.2 | 19.96 | 2018-01-09.9 | 0.102 | Ia | post-maximum? | (1)
(1) Redshift was derived from the SN and host absorption features.
Spectroscopic classification of SN 2017hro with NOT
NASA Astrophysics Data System (ADS)
Babooram, C.; Jormanainen, J.; Wagner, S.; Wierda, F.; Kuncarayakti, H.; Fedorets, G.; Dyrbye, S.
2017-11-01
We report the spectroscopic classification of supernova SN 2017hro (ATLAS17mwv) in host galaxy 2MASX J22161573+4003267. The observations were performed with the 2.56 m Nordic Optical Telescope equipped with ALFOSC (range 350-950 nm; resolution 1.6 nm) on 2017-11-01.8 UT.
Survey Name | IAU Name | Discovery (UT) | Discovery mag | Observation (UT) | Redshift | Type | Phase | Notes
ATLAS17mwv | SN 2017hro | 2017-10-28.3 | 18.765 | 2017-11-01.8 | 0.015 | II | around maximum | (1)
(1) SN redshift is obtained from host emission lines and consistent with that derived from the SN spectrum.
Building Cognition: The Construction of Computational Representations for Scientific Discovery.
Chandrasekharan, Sanjay; Nersessian, Nancy J
2015-11-01
Novel computational representations, such as simulation models of complex systems and video games for scientific discovery (Foldit, EteRNA etc.), are dramatically changing the way discoveries emerge in science and engineering. The cognitive roles played by such computational representations in discovery are not well understood. We present a theoretical analysis of the cognitive roles such representations play, based on an ethnographic study of the building of computational models in a systems biology laboratory. Specifically, we focus on a case of model-building by an engineer that led to a remarkable discovery in basic bioscience. Accounting for such discoveries requires a distributed cognition (DC) analysis, as DC focuses on the roles played by external representations in cognitive processes. However, DC analyses by and large have not examined scientific discovery, and they mostly focus on memory offloading, particularly how the use of existing external representations changes the nature of cognitive tasks. In contrast, we study discovery processes and argue that discoveries emerge from the processes of building the computational representation. The building process integrates manipulations in imagination and in the representation, creating a coupled cognitive system of model and modeler, where the model is incorporated into the modeler's imagination. This account extends DC significantly, and we present some of the theoretical and application implications of this extended account. Copyright © 2014 Cognitive Science Society, Inc.
Research on hotspot discovery in internet public opinions based on improved K-means.
Wang, Gensheng
2013-01-01
How to effectively discover hotspots in Internet public opinion is an active research field, and it plays a key role in helping governments and corporations find useful information in the mass of data on the Internet. An improved K-means algorithm for hotspot discovery in Internet public opinion is presented, based on an analysis of the defects and the calculation principle of the original K-means algorithm. First, new methods are designed to preprocess website texts, to select and represent the characteristics of website texts, and to define the similarity between two website texts. Second, the clustering principle and the method of selecting initial classification centers are analyzed and improved in order to overcome the limitations of the original K-means algorithm. Finally, experimental results verify that the improved algorithm improves the clustering stability and classification accuracy of hotspot discovery in Internet public opinion when used in practice. PMID:24106496
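The overall pipeline, vectorizing website texts and clustering them with K-means, can be sketched as below; scikit-learn's k-means++ seeding stands in for the paper's improved initial-center selection, and the toy corpus is purely illustrative.

```python
# Hedged sketch of text clustering for hotspot discovery (assumption: scikit-learn; not the paper's algorithm).
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "new subway line opens downtown",
    "city opens new subway extension",
    "football team wins national championship",
    "championship parade planned for winning team",
    "heavy rain causes flooding in the old town",
    "flood warnings issued after heavy rain",
]

X = TfidfVectorizer(stop_words="english").fit_transform(documents)   # texts -> TF-IDF vectors
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)

for label, doc in zip(km.labels_, documents):
    print(label, doc)   # documents grouped into candidate "hotspot" clusters
```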
Railroad Classification Yard Technology Manual: Volume II : Yard Computer Systems
DOT National Transportation Integrated Search
1981-08-01
This volume (Volume II) of the Railroad Classification Yard Technology Manual documents the railroad classification yard computer systems methodology. The subjects covered are: functional description of process control and inventory computer systems,...
Selection-Fusion Approach for Classification of Datasets with Missing Values
Ghannad-Rezaie, Mostafa; Soltanian-Zadeh, Hamid; Ying, Hao; Dong, Ming
2010-01-01
This paper proposes a new approach based on missing value pattern discovery for classifying incomplete data. This approach is particularly designed for classification of datasets with a small number of samples and a high percentage of missing values where available missing value treatment approaches do not usually work well. Based on the pattern of the missing values, the proposed approach finds subsets of samples for which most of the features are available and trains a classifier for each subset. Then, it combines the outputs of the classifiers. Subset selection is translated into a clustering problem, allowing derivation of a mathematical framework for it. A trade-off is established between the computational complexity (number of subsets) and the accuracy of the overall classifier. To deal with this trade-off, a numerical criterion is proposed for the prediction of the overall performance. The proposed method is applied to seven datasets from the popular University of California, Irvine data mining archive and an epilepsy dataset from Henry Ford Hospital, Detroit, Michigan (total of eight datasets). Experimental results show that the classification accuracy of the proposed method is superior to that of the widely used multiple imputation method and four other methods. They also show that the level of superiority depends on the pattern and percentage of missing values. PMID:20212921
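A hedged sketch of the general idea only (one classifier per group of samples that are complete on a shared feature subset, with outputs fused by averaging); the clustering-based subset selection and the numerical performance criterion of the paper are omitted, and the synthetic data are an assumption.

    # One classifier per missing-value pattern, fused by averaging predicted probabilities.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5))
    y = (X[:, 0] + X[:, 2] > 0).astype(int)
    X[rng.random(X.shape) < 0.3] = np.nan                 # 30% of entries go missing

    models = []
    patterns = {tuple(row) for row in ~np.isnan(X)}       # distinct observed-feature masks
    for mask in patterns:
        cols = np.array(mask)
        if not cols.any():
            continue
        rows = ~np.isnan(X[:, cols]).any(axis=1)          # samples complete on these columns
        if rows.sum() >= 10 and len(set(y[rows])) == 2:
            models.append((cols, LogisticRegression().fit(X[np.ix_(rows, cols)], y[rows])))

    def predict(x):
        # fuse: average the probabilities of every classifier whose features are observed in x
        probs = [m.predict_proba(x[cols].reshape(1, -1))[0, 1]
                 for cols, m in models if not np.isnan(x[cols]).any()]
        return int(np.mean(probs) > 0.5) if probs else 0

    print(predict(X[0]), y[0])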
Functional odor classification through a medicinal chemistry approach.
Poivet, Erwan; Tahirova, Narmin; Peterlin, Zita; Xu, Lu; Zou, Dong-Jing; Acree, Terry; Firestein, Stuart
2018-02-01
Crucial for any hypothesis about odor coding is the classification and prediction of sensory qualities in chemical compounds. The relationship between perceptual quality and molecular structure has occupied olfactory scientists throughout the 20th century, but details of the mechanism remain elusive. Odor molecules are typically organic compounds of low molecular weight that may be aliphatic or aromatic, may be saturated or unsaturated, and may have diverse functional polar groups. However, many molecules conforming to these characteristics are odorless. One approach recently used to solve this problem was to apply machine learning strategies to a large set of odors and human classifiers in an attempt to find common and unique chemical features that would predict a chemical's odor. We use an alternative method that relies more on the biological responses of olfactory sensory neurons and then applies the principles of medicinal chemistry, a technique widely used in drug discovery. We demonstrate the effectiveness of this strategy through a classification for esters, an important odorant for the creation of flavor in wine. Our findings indicate that computational approaches that do not account for biological responses will be plagued by both false positives and false negatives and fail to provide meaningful mechanistic data. However, the two approaches used in tandem could resolve many of the paradoxes in odor perception.
Automated Discovery of Speech Act Categories in Educational Games
ERIC Educational Resources Information Center
Rus, Vasile; Moldovan, Cristian; Niraula, Nobal; Graesser, Arthur C.
2012-01-01
In this paper we address the important task of automated discovery of speech act categories in dialogue-based, multi-party educational games. Speech acts are important in dialogue-based educational systems because they help infer the student speaker's intentions (the task of speech act classification) which in turn is crucial to providing adequate…
Ganchev, Philip; Malehorn, David; Bigbee, William L.; Gopalakrishnan, Vanathi
2013-01-01
We present a novel framework for integrative biomarker discovery from related but separate data sets created in biomarker profiling studies. The framework takes prior knowledge in the form of interpretable, modular rules, and uses them during the learning of rules on a new data set. The framework consists of two methods of transfer of knowledge from source to target data: transfer of whole rules and transfer of rule structures. We evaluated the methods on three pairs of data sets: one genomic and two proteomic. We used standard measures of classification performance and three novel measures of amount of transfer. Preliminary evaluation shows that whole-rule transfer improves classification performance over using the target data alone, especially when there is more source data than target data. It also improves performance over using the union of the data sets. PMID:21571094
Human Expertise Helps Computer Classify Images
NASA Technical Reports Server (NTRS)
Rorvig, Mark E.
1991-01-01
Two-domain method of computational classification of images requires less computation than other methods for computational recognition, matching, or classification of images or patterns. Does not require explicit computational matching of features, and incorporates human expertise without requiring translation of mental processes of classification into language comprehensible to computer. Conceived to "train" computer to analyze photomicrographs of microscope-slide specimens of leucocytes from human peripheral blood to distinguish between specimens from healthy and specimens from traumatized patients.
Panacea, a semantic-enabled drug recommendations discovery framework.
Doulaverakis, Charalampos; Nikolaidis, George; Kleontas, Athanasios; Kompatsiaris, Ioannis
2014-03-06
Personalized drug prescription can benefit from intelligent information management and sharing. International standard classifications and terminologies have been developed in order to provide unique and unambiguous information representation. Such standards can be used as the basis of automated decision support systems for drug-drug and drug-disease interaction discovery. Additionally, Semantic Web technologies have been proposed in earlier works in order to support such systems. The paper presents Panacea, a semantic framework capable of offering drug-drug and drug-disease interaction discovery. For enabling this kind of service, medical information and terminology had to be translated to ontological terms and appropriately coupled with medical knowledge of the field. International standard classifications and terminologies provide the backbone of the common representation of medical data, while the medical knowledge of drug interactions is represented by a rule base which makes use of the aforementioned standards. Representation is based on a lightweight ontology. A layered reasoning approach is implemented: at the first layer, ontological inference is used in order to discover underlying knowledge, while at the second layer a two-step rule selection strategy is followed, resulting in a computationally efficient reasoning approach. Details of the system architecture are presented, along with an outline of the difficulties that had to be overcome. Panacea is evaluated both in terms of quality of recommendations against real clinical data and in terms of performance. The recommendation quality evaluation gave useful insights regarding requirements for real-world deployment and revealed several parameters that affected the recommendation results. Performance-wise, Panacea is compared to a previously published work by the authors, a service for drug recommendations named GalenOWL; the paper presents their differences in modeling and approach to the problem, while also pinpointing the advantages of Panacea. Overall, the paper presents a framework for providing an efficient drug recommendation service where Semantic Web technologies are coupled with traditional business rule engines.
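As a toy illustration of the layered idea described above (drug terms mapped to standard classes, then a small rule base applied over those classes), the sketch below uses invented class codes and two well-known interaction rules; it is not Panacea's ontology, rule base, or reasoning engine.

    # Layer 1: map drug names to standard classes; layer 2: apply rules over the classes.
    DRUG_CLASS = {"warfarin": "ANTICOAGULANT", "ibuprofen": "NSAID", "lisinopril": "ACE_INHIBITOR"}
    CONDITIONS = {"patient_1": {"renal impairment"}}

    PAIR_RULES = {frozenset({"ANTICOAGULANT", "NSAID"}): "increased bleeding risk"}
    DISEASE_RULES = {("NSAID", "renal impairment"): "may worsen renal function"}

    def check(patient, drugs):
        classes = [DRUG_CLASS[d] for d in drugs]
        alerts = [msg for pair, msg in PAIR_RULES.items() if pair <= set(classes)]
        alerts += [msg for (cls, cond), msg in DISEASE_RULES.items()
                   if cls in classes and cond in CONDITIONS[patient]]
        return alerts

    print(check("patient_1", ["warfarin", "ibuprofen"]))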
Diversity and evolution of class 2 CRISPR–Cas systems
Shmakov, Sergey; Smargon, Aaron; Scott, David; Cox, David; Pyzocha, Neena; Yan, Winston; Abudayyeh, Omar O.; Gootenberg, Jonathan S.; Makarova, Kira S.; Wolf, Yuri I.; Severinov, Konstantin; Zhang, Feng; Koonin, Eugene V.
2018-01-01
Class 2 CRISPR–Cas systems are characterized by effector modules that consist of a single multidomain protein, such as Cas9 or Cpf1. We designed a computational pipeline for the discovery of novel class 2 variants and used it to identify six new CRISPR–Cas subtypes. The diverse properties of these new systems provide potential for the development of versatile tools for genome editing and regulation. In this Analysis article, we present a comprehensive census of class 2 types and class 2 subtypes in complete and draft bacterial and archaeal genomes, outline evolutionary scenarios for the independent origin of different class 2 CRISPR–Cas systems from mobile genetic elements, and propose an amended classification and nomenclature of CRISPR–Cas. PMID:28111461
Computational modeling in melanoma for novel drug discovery.
Pennisi, Marzio; Russo, Giulia; Di Salvatore, Valentina; Candido, Saverio; Libra, Massimo; Pappalardo, Francesco
2016-06-01
There is a growing body of evidence highlighting the applications of computational modeling in the field of biomedicine. It has recently been applied to the in silico analysis of cancer dynamics. In the era of precision medicine, this analysis may allow the discovery of new molecular targets useful for the design of novel therapies and for overcoming resistance to anticancer drugs. According to its molecular behavior, melanoma represents an interesting tumor model in which computational modeling can be applied. Melanoma is an aggressive tumor of the skin with a poor prognosis for patients with advanced disease as it is resistant to current therapeutic approaches. This review discusses the basics of computational modeling in melanoma drug discovery and development. Discussion includes the in silico discovery of novel molecular drug targets, the optimization of immunotherapies and personalized medicine trials. Mathematical and computational models are gradually being used to help understand biomedical data produced by high-throughput analysis. The use of advanced computer models allowing the simulation of complex biological processes provides hypotheses and supports experimental design. The research in fighting aggressive cancers, such as melanoma, is making great strides. Computational models represent the key component to complement these efforts. Due to the combinatorial complexity of new drug discovery, a systematic approach based only on experimentation is not possible. Computational and mathematical models are necessary for bringing cancer drug discovery into the era of omics, big data and personalized medicine.
Discovery of User-Oriented Class Associations for Enriching Library Classification Schemes.
ERIC Educational Resources Information Center
Pu, Hsiao-Tieh
2002-01-01
Presents a user-based approach to exploring the possibility of adding user-oriented class associations to hierarchical library classification schemes. Classes not grouped in the same subject hierarchies yet relevant to users' knowledge are obtained by analyzing a log book of a university library's circulation records, using collaborative filtering…
Novel opportunities for computational biology and sociology in drug discovery
Yao, Lixia
2009-01-01
Drug discovery today is impossible without sophisticated modeling and computation. In this review we touch on previous advances in computational biology and by tracing the steps involved in pharmaceutical development, we explore a range of novel, high value opportunities for computational innovation in modeling the biological process of disease and the social process of drug discovery. These opportunities include text mining for new drug leads, modeling molecular pathways and predicting the efficacy of drug cocktails, analyzing genetic overlap between diseases and predicting alternative drug use. Computation can also be used to model research teams and innovative regions and to estimate the value of academy-industry ties for scientific and human benefit. Attention to these opportunities could promise punctuated advance, and will complement the well-established computational work on which drug discovery currently relies. PMID:19674801
Connecting the virtual world of computers to the real world of medicinal chemistry.
Glen, Robert C
2011-03-01
Drug discovery involves the simultaneous optimization of chemical and biological properties, usually in a single small molecule, which modulates one of nature's most complex systems: the balance between human health and disease. The increased use of computer-aided methods is having a significant impact on all aspects of the drug-discovery and development process and with improved methods and ever faster computers, computer-aided molecular design will be ever more central to the discovery process.
Automatic classification of time-variable X-ray sources
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo, Kitty K.; Farrell, Sean; Murphy, Tara
2014-05-01
To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10-fold cross validation accuracy of the training data is ∼97% on a 7-class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7–500250 appears to be the most unusual source in the sample. Its X-ray spectrum is suggestive of an ultraluminous X-ray source, but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.
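A minimal sketch of the kind of supervised pipeline described (Random Forest, 10-fold cross-validation, probabilistic labels, and a classification margin for flagging uncertain sources); the synthetic feature matrix is an assumption standing in for the time-series, spectral, and multi-wavelength features of the catalog.

    # Random Forest with 10-fold CV, class probabilities, and a top-two-class margin.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=873, n_features=10, n_classes=7,
                               n_informative=6, random_state=0)
    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    print("10-fold CV accuracy:", cross_val_score(clf, X, y, cv=10).mean())

    clf.fit(X, y)
    proba = clf.predict_proba(X[:5])                  # probabilistic classification
    top2 = np.sort(proba, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]                  # small margin -> uncertain / unusual source
    print(margin)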
Lung tumor diagnosis and subtype discovery by gene expression profiling.
Wang, Lu-yong; Tu, Zhuowen
2006-01-01
The optimal treatment of patients with complex diseases, such as cancers, depends on accurate diagnosis using a combination of clinical and histopathological data. In many scenarios, this becomes tremendously difficult because of limitations in clinical presentation and histopathology. To accurately diagnose complex diseases, molecular classification based on gene or protein expression profiles is indispensable for modern medicine. Moreover, many heterogeneous diseases consist of various potential subtypes at the molecular level and differ remarkably in their response to therapies. It is therefore critical to accurately predict subgroups from disease gene expression profiles. More fundamental knowledge of the molecular basis and classification of disease could aid in the prediction of patient outcome, the informed selection of therapies, and the identification of novel molecular targets for therapy. In this paper, we propose a new disease diagnostic method, the probabilistic boosting tree (PB tree) method, applied to gene expression profiles of lung tumors. It enables accurate disease classification and subtype discovery. It automatically constructs a tree in which each node combines a number of weak classifiers into a strong classifier. Also, subtype discovery is naturally embedded in the learning process. Our algorithm achieves excellent diagnostic performance, and at the same time it is capable of detecting the disease subtype based on the gene expression profile.
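The following is a compact, hedged sketch of a probabilistic boosting-tree style classifier: each node trains a boosted ensemble and routes samples to child nodes by its posterior, and prediction mixes the subtree estimates. The stopping rules, the boosted learner, and the synthetic data are simplifications and assumptions, not the published algorithm.

    # Probabilistic boosting-tree style sketch: boosted classifier per node, soft routing.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier

    def build(X, y, depth=0, max_depth=3, min_n=20):
        node = {"clf": AdaBoostClassifier(n_estimators=25, random_state=0).fit(X, y)}
        if depth < max_depth and len(y) >= 2 * min_n and len(set(y)) > 1:
            p = node["clf"].predict_proba(X)[:, 1]
            left, right = p < 0.5, p >= 0.5
            if (left.sum() >= min_n and right.sum() >= min_n
                    and len(set(y[left])) > 1 and len(set(y[right])) > 1):
                node["left"] = build(X[left], y[left], depth + 1, max_depth, min_n)
                node["right"] = build(X[right], y[right], depth + 1, max_depth, min_n)
        return node

    def predict_proba(node, x):
        p = node["clf"].predict_proba(x.reshape(1, -1))[0, 1]
        if "left" not in node:
            return p
        # mix the children's estimates by the node posterior (soft routing)
        return (1 - p) * predict_proba(node["left"], x) + p * predict_proba(node["right"], x)

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    tree = build(X, y)
    print(round(predict_proba(tree, X[0]), 3), y[0])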
Cloud computing approaches to accelerate drug discovery value chain.
Garg, Vibhav; Arora, Suchir; Gupta, Chitra
2011-12-01
Continued advancements in technology have helped high throughput screening (HTS) evolve from a linear to a parallel approach by performing system-level screening. Advanced experimental methods used for HTS at various steps of drug discovery (i.e. target identification, target validation, lead identification and lead validation) can generate data of the order of terabytes. As a consequence, there is a pressing need to store, manage, mine and analyze this data to identify informational tags. This need is in turn posing challenges to computer scientists to offer the matching hardware and software infrastructure, while managing the varying degree of desired computational power. Therefore, the potential of "On-Demand Hardware" and "Software as a Service (SAAS)" delivery mechanisms cannot be denied. This on-demand computing, largely referred to as Cloud Computing, is now transforming drug discovery research. Also, integration of Cloud computing with parallel computing is certainly expanding its footprint in the life sciences community. The speed, efficiency and cost effectiveness have made cloud computing a 'good to have tool' for researchers, providing them significant flexibility, allowing them to focus on the 'what' of science and not the 'how'. Once it reaches maturity, Discovery-Cloud would be best suited to manage drug discovery and clinical development data, generated using advanced HTS techniques, hence supporting the vision of personalized medicine.
Pérez-Castillo, Yunierkis; Lazar, Cosmin; Taminau, Jonatan; Froeyen, Mathy; Cabrera-Pérez, Miguel Ángel; Nowé, Ann
2012-09-24
Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is not a unique modeling approach that can be successfully applied to solve the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm that combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding classification results similar to or better than those reported for these data sets, with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More importantly, we compared our methodology with state-of-the-art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple since they consist of a weighted sum of the outputs of single-feature classifiers. Furthermore, the Adaboost scores can be used as a ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
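A hedged, simplified sketch of coupling a genetic search over feature subsets with an AdaBoost ensemble of single-feature (stump) classifiers; the population size, operators, evaluation and data are illustrative assumptions, not the GA(M)E-QSAR settings.

    # Genetic search over binary feature masks, fitness = CV accuracy of AdaBoost stumps.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)
    rng = np.random.default_rng(0)

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                 n_estimators=50, random_state=0)
        return cross_val_score(clf, X[:, mask], y, cv=3).mean()

    pop = rng.random((12, X.shape[1])) < 0.2                 # initial random feature masks
    for gen in range(10):
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[-6:]]               # keep the fittest half
        children = []
        for _ in range(6):
            a, b = parents[rng.integers(6, size=2)]
            cut = rng.integers(1, X.shape[1])
            child = np.concatenate([a[:cut], b[cut:]])       # one-point crossover
            child ^= rng.random(child.shape) < 0.02          # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])

    best = pop[np.argmax([fitness(m) for m in pop])]
    print("selected features:", np.flatnonzero(best), "CV accuracy:", fitness(best))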
Computational methods for a three-dimensional model of the petroleum-discovery process
Schuenemeyer, J.H.; Bawiec, W.J.; Drew, L.J.
1980-01-01
A discovery-process model devised by Drew, Schuenemeyer, and Root can be used to predict the amount of petroleum to be discovered in a basin from some future level of exploratory effort: the predictions are based on historical drilling and discovery data. Because marginal costs of discovery and production are a function of field size, the model can be used to make estimates of future discoveries within deposit size classes. The modeling approach is a geometric one in which the area searched is a function of the size and shape of the targets being sought. A high correlation is assumed between the surface-projection area of the fields and the volume of petroleum. To predict how much oil remains to be found, the area searched must be computed, and the basin size and discovery efficiency must be estimated. The basin is assumed to be explored randomly rather than by pattern drilling. The model may be used to compute independent estimates of future oil at different depth intervals for a play involving multiple producing horizons. We have written FORTRAN computer programs that are used with Drew, Schuenemeyer, and Root's model to merge the discovery and drilling information and perform the necessary computations to estimate undiscovered petroleum. These programs may be modified easily for the estimation of remaining quantities of commodities other than petroleum. © 1980.
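A toy simulation of the geometric idea described above, namely that the chance a new wildcat well finds a given undiscovered field is roughly proportional to the field's surface area divided by the effective basin area, scaled by a discovery efficiency; all numbers are illustrative assumptions, and the sketch is not the Drew-Schuenemeyer-Root model or its FORTRAN programs.

    # Toy size-biased discovery simulation: bigger fields tend to be found earlier.
    import numpy as np

    rng = np.random.default_rng(0)
    basin_area = 1.0e6                                       # effective basin area, km^2
    efficiency = 2.0                                         # >1 means exploration beats random search
    fields = rng.lognormal(mean=3.0, sigma=1.0, size=500)    # undiscovered field areas, km^2

    discovered = []
    for well in range(2000):                                 # successive exploratory wells
        p_hit = efficiency * fields / basin_area             # per-field chance this well hits it
        hits = np.flatnonzero(rng.random(fields.size) < p_hit)
        if hits.size:
            i = hits[0]
            discovered.append(fields[i])
            fields = np.delete(fields, i)                    # a found field leaves the target set

    print(f"{len(discovered)} fields found; mean area found {np.mean(discovered):.1f} km^2 "
          f"vs {np.mean(fields):.1f} km^2 still undiscovered")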
A new approach to the rationale discovery of polymeric biomaterials
Kohn, Joachim; Welsh, William J.; Knight, Doyle
2007-01-01
This paper attempts to illustrate both the need for new approaches to biomaterials discovery as well as the significant promise inherent in the use of combinatorial and computational design strategies. The key observation of this Leading Opinion Paper is that the biomaterials community has been slow to embrace advanced biomaterials discovery tools such as combinatorial methods, high throughput experimentation, and computational modeling in spite of the significant promise shown by these discovery tools in materials science, medicinal chemistry and the pharmaceutical industry. It seems that the complexity of living cells and their interactions with biomaterials has been a conceptual as well as a practical barrier to the use of advanced discovery tools in biomaterials science. However, with the continued increase in computer power, the goal of predicting the biological response of cells in contact with biomaterials surfaces is within reach. Once combinatorial synthesis, high throughput experimentation, and computational modeling are integrated into the biomaterials discovery process, a significant acceleration is possible in the pace of development of improved medical implants, tissue regeneration scaffolds, and gene/drug delivery systems. PMID:17644176
Automatic discovery of optimal classes
NASA Technical Reports Server (NTRS)
Cheeseman, Peter; Stutz, John; Freeman, Don; Self, Matthew
1986-01-01
A criterion, based on Bayes' theorem, is described that defines the optimal set of classes (a classification) for a given set of examples. This criterion is transformed into an equivalent minimum message length criterion with an intuitive information interpretation. This criterion does not require that the number of classes be specified in advance; this is determined by the data. The minimum message length criterion includes the message length required to describe the classes, so there is a built-in bias against adding new classes unless they lead to a reduction in the message length required to describe the data. Unfortunately, the search space of possible classifications is too large to search exhaustively, so heuristic search methods, such as simulated annealing, are applied. Tutored learning and probabilistic prediction in particular cases are an important indirect result of optimal class discovery. Extensions to the basic class induction program include the ability to combine category and real value data, hierarchical classes, independent classifications and deciding for each class which attributes are relevant.
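As a rough, hedged stand-in for the criterion described above (a penalty that discourages extra classes unless they shorten the description of the data), the sketch below scores Gaussian mixtures with different numbers of classes by BIC, which plays a role loosely analogous to, but not identical with, the minimum message length criterion; the data and model family are illustrative assumptions.

    # Let the data choose the number of classes via a penalized model-selection score.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

    scores = {k: GaussianMixture(n_components=k, random_state=0).fit(data).bic(data)
              for k in range(1, 6)}
    best_k = min(scores, key=scores.get)        # number of classes favored by the penalized score
    print(best_k, scores)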
Perspectives on current tumor-node-metastasis (TNM) staging of cancers of the colon and rectum.
Hu, Huankai; Krasinskas, Alyssa; Willis, Joseph
2011-08-01
Improvements in classifications of cancers based on the discovery and validation of important histopathological parameters and new molecular markers continue unabated. Though still not perfect, recent updates of classification schemes in gastrointestinal oncology by the American Joint Commission on Cancer (tumor-node-metastasis [TNM] staging) and the World Health Organization further stratify patients, guide optimization of treatment strategies, and better predict patient outcomes. These updates recognize the heterogeneity of patient populations, with significant subgrouping of each tumor stage and use of tumor deposits to significantly "up-stage" some cancers; they change staging parameters for subsets of IIIB and IIIC cancers and introduce several new subtypes of colon carcinoma. By the nature of the process, recent discoveries that are important to improving even routine standards of patient care, especially new advances in molecular medicine, are not incorporated into these systems. Nonetheless, these classifications significantly advance clinical standards and are welcome enhancements to our current methods of cancer reporting. Copyright © 2011 Elsevier Inc. All rights reserved.
Majorana Fermions in Particle Physics, Solid State and Quantum Information
NASA Astrophysics Data System (ADS)
Borsten, L.; Duff, M. J.
This review is based on lectures given by M. J. Duff summarising the far-reaching contributions of Ettore Majorana to fundamental physics, with special focus on Majorana fermions in all their guises. The theoretical discovery of the eponymous fermion in 1937 has since had profound implications for particle physics, solid state and quantum computation. The breadth of these disciplines is testimony to Majorana's genius, which continues to permeate physics today. These lectures offer a whistle-stop tour through some limited subset of the key ideas. In addition to touching on these various applications, we will draw out some fascinating relations connecting the normed division algebras R, ℂ, H, O to spinors, trialities, K-theory and the classification of stable topological states of symmetry-protected gapped free-fermion systems.
The SED Machine: A Robotic Spectrograph for Fast Transient Classification
NASA Astrophysics Data System (ADS)
Blagorodnova, Nadejda; Neill, James D.; Walters, Richard; Kulkarni, Shrinivas R.; Fremling, Christoffer; Ben-Ami, Sagi; Dekany, Richard G.; Fucik, Jason R.; Konidaris, Nick; Nash, Reston; Ngeow, Chow-Choong; Ofek, Eran O.; O’ Sullivan, Donal; Quimby, Robert; Ritter, Andreas; Vyhmeister, Karl E.
2018-03-01
Current time domain facilities are finding several hundreds of transient astronomical events a year. The discovery rate is expected to increase in the future as new surveys such as the Zwicky Transient Facility (ZTF) and the Large Synoptic Survey Telescope (LSST) come online. Presently, the rate at which transients are classified is approximately one order of magnitude lower than the discovery rate, leading to an increasing "follow-up drought". Existing telescopes with moderate aperture can help address this deficit when equipped with spectrographs optimized for spectral classification. Here, we provide an overview of the design, operations and first results of the Spectral Energy Distribution Machine (SEDM), operating on the Palomar 60-inch telescope (P60). The instrument is optimized for classification and high observing efficiency. It combines a low-resolution (R ∼ 100) integral field unit (IFU) spectrograph with the "Rainbow Camera" (RC), a multi-band field acquisition camera which also serves as a multi-band (ugri) photometer. The SEDM was commissioned during the operation of the intermediate Palomar Transient Factory (iPTF) and has already lived up to its promise. The success of the SEDM demonstrates the value of spectrographs optimized for spectral classification.
Larregieu, Caroline A; Benet, Leslie Z
2014-04-07
The biopharmaceutics classification system (BCS) and biopharmaceutics drug distribution classification system (BDDCS) are complementary classification systems that can improve, simplify, and accelerate drug discovery, development, and regulatory processes. Drug permeability has been widely accepted as a screening tool for determining intestinal absorption via the BCS during the drug development and regulatory approval processes. Currently, predicting clinically significant drug interactions during drug development is a known challenge for industry and regulatory agencies. The BDDCS, a modification of BCS that utilizes drug metabolism instead of intestinal permeability, predicts drug disposition and potential drug-drug interactions in the intestine, the liver, and most recently the brain. Although correlations between BCS and BDDCS have been observed with drug permeability rates, discrepancies have been noted in drug classifications between the two systems utilizing different permeability models, which are accepted as surrogate models for demonstrating human intestinal permeability by the FDA. Here, we recommend the most applicable permeability models for improving the prediction of BCS and BDDCS classifications. We demonstrate that the passive transcellular permeability rate, characterized by means of permeability models that are deficient in transporter expression and paracellular junctions (e.g., PAMPA and Caco-2), will most accurately predict BDDCS metabolism. These systems will inaccurately predict BCS classifications for drugs that particularly are substrates of highly expressed intestinal transporters. Moreover, in this latter case, a system more representative of complete human intestinal permeability is needed to accurately predict BCS absorption.
New Myositis Classification Criteria-What We Have Learned Since Bohan and Peter.
Leclair, Valérie; Lundberg, Ingrid E
2018-03-17
Idiopathic inflammatory myopathy (IIM) classification criteria have been a subject of debate for many decades. Despite several limitations, the Bohan and Peter criteria are still widely used. The aim of this review is to discuss the evolution of IIM classification criteria. New IIM classification criteria are periodically proposed. The discovery of myositis-specific and myositis-associated autoantibodies led to the development of clinico-serological criteria, while in-depth description of IIM morphological features improved histopathology-based criteria. The long-awaited European League Against Rheumatism and American College of Rheumatology (EULAR/ACR) IIM classification criteria were recently published. The Bohan and Peter criteria are outdated and validated classification criteria are necessary to improve research in IIM. The new EULAR/ACR IIM classification criteria are thus a definite improvement and an important step forward in the field.
Computational neuropharmacology: dynamical approaches in drug discovery.
Aradi, Ildiko; Erdi, Péter
2006-05-01
Computational approaches that adopt dynamical models are widely accepted in basic and clinical neuroscience research as indispensable tools with which to understand normal and pathological neuronal mechanisms. Although computer-aided techniques have been used in pharmaceutical research (e.g. in structure- and ligand-based drug design), the power of dynamical models has not yet been exploited in drug discovery. We suggest that dynamical system theory and computational neuroscience--integrated with well-established, conventional molecular and electrophysiological methods--offer a broad perspective in drug discovery and in the search for novel targets and strategies for the treatment of neurological and psychiatric diseases.
ERIC Educational Resources Information Center
Hulshof, Casper D.; de Jong, Ton
2006-01-01
Students encounter many obstacles during scientific discovery learning with computer-based simulations. It is hypothesized that an effective type of support, that does not interfere with the scientific discovery learning process, should be delivered on a "just-in-time" base. This study explores the effect of facilitating access to…
Zhang, Chi; Zhang, Ge; Chen, Ke-ji; Lu, Ai-ping
2016-04-01
The development of an effective classification method for human health conditions is essential for precise diagnosis and delivery of tailored therapy to individuals. The contemporary disease classification system has properties that limit its information content and usability. Chinese medicine pattern classification has been incorporated with disease classification, and this integrated classification method became more precise because of the increased understanding of the molecular mechanisms. However, we are still facing the complexity of diseases and patterns in the classification of health conditions. With continuing advances in omics methodologies and instrumentation, we propose a new classification approach, molecular module classification, which applies molecular modules to classifying human health status. This initiative would precisely define health status, provide accurate diagnoses, optimize therapeutics and improve new drug discovery strategies. It would therefore supersede current disease diagnosis and disease pattern classification, and in the future a new medicine based on this classification, molecular module medicine, could redefine health statuses and reshape clinical practice.
Hypergraph Based Feature Selection Technique for Medical Diagnosis.
Somu, Nivethitha; Raman, M R Gauthama; Kirthivasan, Kannan; Sriram, V S Shankar
2016-11-01
The impact of the internet and information systems across various domains has resulted in the substantial generation of multidimensional datasets. The use of data mining and knowledge discovery techniques to extract the original information contained in multidimensional datasets plays a significant role in exploiting the complete benefit they provide. The presence of a large number of features in high dimensional datasets incurs a high computational cost in terms of computing power and time. Hence, feature selection techniques have been commonly used to build robust machine learning models by selecting a subset of relevant features which projects the maximal information content of the original dataset. In this paper, a novel Rough Set based K-Helly feature selection technique (RSKHT), which hybridizes Rough Set Theory (RST) and the K-Helly property of hypergraph representation, has been designed to identify the optimal feature subset or reduct for medical diagnostic applications. Experiments carried out using medical datasets from the UCI repository prove the dominance of RSKHT over other feature selection techniques with respect to reduct size, classification accuracy and time complexity. The performance of RSKHT was validated using the WEKA tool, which shows that RSKHT is computationally attractive and flexible over massive datasets.
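The classical rough-set notion behind reduct search can be illustrated with a small, hedged sketch: greedily add features until they discern the decision classes as well as the full attribute set does. This is a standard QuickReduct-style baseline, not the K-Helly hypergraph construction of RSKHT, and the discretized data are invented.

    # QuickReduct-style rough-set feature selection (illustration, not RSKHT).
    import numpy as np

    def dependency(X, y, feats):
        if len(feats) == 0:
            return 0.0
        # rows that agree on all selected features fall into the same indiscernibility class
        groups = np.unique(X[:, feats], axis=0, return_inverse=True)[1].ravel()
        consistent = 0
        for g in np.unique(groups):
            members = y[groups == g]
            if len(set(members)) == 1:                   # class has a single decision value
                consistent += len(members)
        return consistent / len(y)

    def quick_reduct(X, y):
        all_feats = list(range(X.shape[1]))
        full = dependency(X, y, all_feats)
        reduct = []
        while dependency(X, y, reduct) < full:
            candidates = [f for f in all_feats if f not in reduct]
            gains = [dependency(X, y, reduct + [f]) for f in candidates]
            if max(gains) <= dependency(X, y, reduct):   # no single feature improves; stop
                break
            reduct.append(candidates[int(np.argmax(gains))])
        return sorted(reduct)

    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(80, 6))                 # discretized medical attributes
    y = ((X[:, 1] > 0) & (X[:, 4] > 0)).astype(int)      # decision depends on attributes 1 and 4
    print("selected reduct:", quick_reduct(X, y))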
Varma, Manthena V; Gardner, Iain; Steyn, Stefanus J; Nkansah, Paul; Rotter, Charles J; Whitney-Pickett, Carrie; Zhang, Hui; Di, Li; Cram, Michael; Fenner, Katherine S; El-Kattan, Ayman F
2012-05-07
The Biopharmaceutics Classification System (BCS) is a scientific framework that provides a basis for predicting the oral absorption of drugs. These concepts have been extended in the Biopharmaceutics Drug Disposition Classification System (BDDCS) to explain the potential mechanism of drug clearance and understand the effects of uptake and efflux transporters on absorption, distribution, metabolism, and elimination. The objective of present work is to establish criteria for provisional biopharmaceutics classification using pH-dependent passive permeability and aqueous solubility data generated from high throughput screening methodologies in drug discovery settings. The apparent permeability across monolayers of clonal cell line of Madin-Darby canine kidney cells, selected for low endogenous efflux transporter expression, was measured for a set of 105 drugs, with known BCS and BDDCS class. The permeability at apical pH 6.5 for acidic drugs and at pH 7.4 for nonacidic drugs showed a good correlation with the fraction absorbed in human (Fa). Receiver operating characteristic (ROC) curve analysis was utilized to define the permeability class boundary. At permeability ≥ 5 × 10(-6) cm/s, the accuracy of predicting Fa of ≥ 0.90 was 87%. Also, this cutoff showed more than 80% sensitivity and specificity in predicting the literature permeability classes (BCS), and the metabolism classes (BDDCS). The equilibrium solubility of a subset of 49 drugs was measured in pH 1.2 medium, pH 6.5 phosphate buffer, and in FaSSIF medium (pH 6.5). Although dose was not considered, good concordance of the measured solubility with BCS and BDDCS solubility class was achieved, when solubility at pH 1.2 was used for acidic compounds and FaSSIF solubility was used for basic, neutral, and zwitterionic compounds. Using a cutoff of 200 μg/mL, the data set suggested a 93% sensitivity and 86% specificity in predicting both the BCS and BDDCS solubility classes. In conclusion, this study identified pH-dependent permeability and solubility criteria that can be used to assign provisional biopharmaceutics class at early stage of the drug discovery process. Additionally, such a classification system will enable discovery scientists to assess the potential limiting factors to oral absorption, as well as help predict the drug disposition mechanisms and potential drug-drug interactions.
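The cutoffs reported above lend themselves to a very small decision rule; the sketch below is an illustrative reading of them (permeability ≥ 5 × 10^-6 cm/s as high permeability, solubility ≥ 200 μg/mL as high solubility, pH 1.2 solubility for acids and FaSSIF solubility otherwise) combined with the standard BCS class definitions. The function name and example values are assumptions, and dose, which the study did not consider, is ignored.

    # Provisional BCS class from high-throughput permeability and solubility data
    # (illustrative only; cutoffs from the study, class mapping from the standard BCS).
    def provisional_bcs_class(papp_cm_s, sol_ph1_2_ug_ml, sol_fassif_ug_ml, acidic):
        high_perm = papp_cm_s >= 5e-6                          # apparent permeability cutoff, cm/s
        solubility = sol_ph1_2_ug_ml if acidic else sol_fassif_ug_ml
        high_sol = solubility >= 200.0                         # equilibrium solubility cutoff, ug/mL
        return {(True, True): "Class 1 (high solubility, high permeability)",
                (True, False): "Class 3 (high solubility, low permeability)",
                (False, True): "Class 2 (low solubility, high permeability)",
                (False, False): "Class 4 (low solubility, low permeability)"}[(high_sol, high_perm)]

    # hypothetical neutral compound: permeable but poorly soluble in FaSSIF
    print(provisional_bcs_class(12e-6, 30.0, 90.0, acidic=False))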
A novel in silico approach to drug discovery via computational intelligence.
Hecht, David; Fogel, Gary B
2009-04-01
A computational intelligence drug discovery platform is introduced as an innovative technology designed to accelerate high-throughput drug screening for generalized protein-targeted drug discovery. This technology results in collections of novel small molecule compounds that bind to protein targets as well as details on predicted binding modes and molecular interactions. The approach was tested on dihydrofolate reductase (DHFR) for novel antimalarial drug discovery; however, the methods developed can be applied broadly in early stage drug discovery and development. For this purpose, an initial fragment library was defined, and an automated fragment assembly algorithm was generated. These were combined with a computational intelligence screening tool for prescreening of compounds relative to DHFR inhibition. The entire method was assayed relative to spaces of known DHFR inhibitors and with chemical feasibility in mind, leading to experimental validation in future studies.
SkyDiscovery: Humans and Machines Working Together
NASA Astrophysics Data System (ADS)
Donalek, Ciro; Fang, K.; Drake, A. J.; Djorgovski, S. G.; Graham, M. J.; Mahabal, A.; Williams, R.
2011-01-01
Synoptic sky surveys are now discovering tens to hundreds of transient events every clear night, and that data rate is expected to increase dramatically as we move towards the LSST. A key problem is the classification of transients, which determines their scientific interest and possible follow-up. Some of the relevant information is contextual, and easily recognizable by humans looking at images, but it is very hard to encode in the data pipelines. Crowdsourcing (aka Citizen Science) provides one possible way to gather such information. SkyDiscovery.org is a website that allows experts and citizen science enthusiasts to work together and share information in a collaborative scientific discovery environment. Currently there are two projects running on the website. In the Event Classification project users help find candidate transients through a series of questions related to the images shown. Event classification depends very much on contextual information, and humans are remarkably effective at recognizing noise in incomplete heterogeneous data and figuring out which contextual information is important. In the SNHunt project users are asked to look for new objects appearing on images of galaxies taken by the Catalina Real-time Transient Survey, in order to find all the supernovae occurring in nearby bright galaxies. Images are served alongside other tools that can help the discovery. A multi-level approach allows the complexity of the interface to be tailored to the expertise level of the user. An entry-level user can just review images and validate events as being real, while a more advanced user is able to interact with the data associated with an event. The data gathered will not only be analyzed and used directly for specific science projects, but will also be used to train well-defined algorithms to automate such data analysis in the future.
Multi-level computational methods for interdisciplinary research in the HathiTrust Digital Library.
Murdock, Jaimie; Allen, Colin; Börner, Katy; Light, Robert; McAlister, Simon; Ravenscroft, Andrew; Rose, Robert; Rose, Doori; Otsuka, Jun; Bourget, David; Lawrence, John; Reed, Chris
2017-01-01
We show how faceted search using a combination of traditional classification systems and mixed-membership topic models can go beyond keyword search to inform resource discovery, hypothesis formulation, and argument extraction for interdisciplinary research. Our test domain is the history and philosophy of scientific work on animal mind and cognition. The methods can be generalized to other research areas and ultimately support a system for semi-automatic identification of argument structures. We provide a case study for the application of the methods to the problem of identifying and extracting arguments about anthropomorphism during a critical period in the development of comparative psychology. We show how a combination of classification systems and mixed-membership models trained over large digital libraries can inform resource discovery in this domain. Through a novel approach of "drill-down" topic modeling-simultaneously reducing both the size of the corpus and the unit of analysis-we are able to reduce a large collection of fulltext volumes to a much smaller set of pages within six focal volumes containing arguments of interest to historians and philosophers of comparative psychology. The volumes identified in this way did not appear among the first ten results of the keyword search in the HathiTrust digital library and the pages bear the kind of "close reading" needed to generate original interpretations that is the heart of scholarly work in the humanities. Zooming back out, we provide a way to place the books onto a map of science originally constructed from very different data and for different purposes. The multilevel approach advances understanding of the intellectual and societal contexts in which writings are interpreted.
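A minimal, hedged sketch of the "drill-down" mechanics described above: fit a topic model over whole volumes, keep the volumes that load heavily on a topic of interest, and re-model at a finer unit. The four toy "volumes", their two-word "pages", and the topic counts are invented stand-ins for the HathiTrust corpus and workflow.

    # Drill-down topic modeling: corpus-level LDA, filter to focal volumes, page-level LDA.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    volumes = ["animal mind instinct behavior", "instinct and animal cognition",
               "steam engine boiler design", "boiler pressure engineering manual"]
    X = CountVectorizer().fit_transform(volumes)
    weights = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(X)

    # in practice the topic of interest is chosen by inspecting its top words;
    # here we simply take the dominant topic of the first "animal mind" volume
    topic = int(np.argmax(weights[0]))
    focal = [v for v, w in zip(volumes, weights) if w[topic] > 0.5]

    pages = [" ".join(words) for v in focal
             for words in (v.split()[:2], v.split()[2:])]     # pretend each half is a page
    X_pages = CountVectorizer().fit_transform(pages)
    page_topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(X_pages)
    print(len(focal), "focal volumes ->", len(pages), "pages")
    print(page_topics.round(2))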
Novel opportunities for computational biology and sociology in drug discovery☆
Yao, Lixia; Evans, James A.; Rzhetsky, Andrey
2013-01-01
Current drug discovery is impossible without sophisticated modeling and computation. In this review we outline previous advances in computational biology and, by tracing the steps involved in pharmaceutical development, explore a range of novel, high-value opportunities for computational innovation in modeling the biological process of disease and the social process of drug discovery. These opportunities include text mining for new drug leads, modeling molecular pathways and predicting the efficacy of drug cocktails, analyzing genetic overlap between diseases and predicting alternative drug use. Computation can also be used to model research teams and innovative regions and to estimate the value of academy–industry links for scientific and human benefit. Attention to these opportunities could promise punctuated advance and will complement the well-established computational work on which drug discovery currently relies. PMID:20349528
Variations of Human DNA Polymerase Genes as Biomarkers of Prostate Cancer Progression
2013-07-01
[Report documentation page; only fragments of the abstract are recoverable.] Subject terms (fragment): ...discovery, cancer genetics. Responsible organization: USAMRMC. Recoverable fragments: "...variations identified (including all single and double mutant combinations of the Triple mutant), and some POLK mutants"; "Discovery of a novel..."; presentations: Athens, Greece, 07/10; Makridakis N., "Error-prone polymerase mutations and prostate cancer progression," COBRE/Cancer Genetics group seminar, Tulane.
Session Initiation Protocol Network Encryption Device Plain Text Domain Discovery Service
2007-12-07
[Report documentation page; only fragments of the abstract are recoverable.] "...such as the TACLANE, have developed unique discovery methods to establish Plain Text Domain (PTD) Security Associations (SA). All of these techniques..."; "...can include network and host Internet Protocol (IP) addresses, Information System Security Office (ISSO) point of contact information and PTD status..."
Computer-Aided Drug Design in Epigenetics
NASA Astrophysics Data System (ADS)
Lu, Wenchao; Zhang, Rukang; Jiang, Hao; Zhang, Huimin; Luo, Cheng
2018-03-01
Epigenetic dysfunction has been widely implicated in several diseases, especially cancers, which highlights the therapeutic potential for chemical interventions in this field. With the rapid development of computational methodologies and high-performance computational resources, computer-aided drug design has emerged as a promising strategy to speed up epigenetic drug discovery. Herein, we give a brief overview of the major computational methods reported in the literature, including druggability prediction, virtual screening, homology modeling, scaffold hopping, pharmacophore modeling, molecular dynamics simulations, quantum chemistry calculations and 3D quantitative structure-activity relationships, that have been successfully applied in the design and discovery of epi-drugs and epi-probes. Finally, we discuss the major limitations of current virtual drug design strategies in epigenetics drug discovery and future directions in this field.
COMPUTER-AIDED DRUG DISCOVERY AND DEVELOPMENT (CADDD): in silico-chemico-biological approach
Kapetanovic, I.M.
2008-01-01
It is generally recognized that drug discovery and development are very time- and resource-consuming processes. There is an ever-growing effort to apply computational power to the combined chemical and biological space in order to streamline drug discovery, design, development and optimization. In the biomedical arena, computer-aided or in silico design is being utilized to expedite and facilitate hit identification and hit-to-lead selection, to optimize the absorption, distribution, metabolism, excretion and toxicity profile, and to avoid safety issues. Commonly used computational approaches include ligand-based drug design (pharmacophore, a 3-D spatial arrangement of chemical features essential for biological activity), structure-based drug design (drug-target docking), and quantitative structure-activity and quantitative structure-property relationships. Regulatory agencies as well as the pharmaceutical industry are actively involved in the development of computational tools that will improve the effectiveness and efficiency of the drug discovery and development process, decrease the use of animals, and increase predictability. It is expected that the power of CADDD will grow as the technology continues to evolve. PMID:17229415
Enabling drug discovery project decisions with integrated computational chemistry and informatics
NASA Astrophysics Data System (ADS)
Tsui, Vickie; Ortwine, Daniel F.; Blaney, Jeffrey M.
2017-03-01
Computational chemistry/informatics scientists and software engineers in Genentech Small Molecule Drug Discovery collaborate with experimental scientists in a therapeutic project-centric environment. Our mission is to enable and improve pre-clinical drug discovery design and decisions. Our goal is to deliver timely data, analysis, and modeling to our therapeutic project teams using best-in-class software tools. We describe our strategy, the organization of our group, and our approaches to reach this goal. We conclude with a summary of the interdisciplinary skills required for computational scientists and recommendations for their training.
ERIC Educational Resources Information Center
Birch, William D.
1997-01-01
Defines geodiversity, compares it to biodiversity, and discusses the mineral classification system. Charts the discovery of new minerals in Australia over time and focuses on uses of these minerals in technological advances. (DDR)
Shin, Younghak; Lee, Seungchan; Ahn, Minkyu; Cho, Hohyun; Jun, Sung Chan; Lee, Heung-No
2015-11-01
One of the main problems related to electroencephalogram (EEG) based brain-computer interface (BCI) systems is the non-stationarity of the underlying EEG signals. This results in the deterioration of the classification performance during experimental sessions. Therefore, adaptive classification techniques are required for EEG based BCI applications. In this paper, we propose simple adaptive sparse representation based classification (SRC) schemes. Supervised and unsupervised dictionary update techniques for new test data and a dictionary modification method by using the incoherence measure of the training data are investigated. The proposed methods are very simple and additional computation for the re-training of the classifier is not needed. The proposed adaptive SRC schemes are evaluated using two BCI experimental datasets. The proposed methods are assessed by comparing classification results with the conventional SRC and other adaptive classification methods. On the basis of the results, we find that the proposed adaptive schemes show relatively improved classification accuracy as compared to conventional methods without requiring additional computation. Copyright © 2015 Elsevier Ltd. All rights reserved.
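A hedged sketch of sparse representation classification (SRC) with the simplest form of unsupervised dictionary update, in which confidently classified test trials are appended to the dictionary; the synthetic eight-dimensional vectors stand in for EEG features, and the confidence rule and thresholds are assumptions rather than the schemes proposed in the paper.

    # SRC: sparse-code a test vector over the training dictionary, assign the class
    # whose atoms reconstruct it best, and optionally grow the dictionary adaptively.
    import numpy as np
    from sklearn.linear_model import orthogonal_mp

    rng = np.random.default_rng(0)

    def make_trials(center, n):                          # toy feature vectors for one class
        return center + 0.3 * rng.normal(size=(n, 8))

    D = np.vstack([make_trials(np.ones(8), 20), make_trials(-np.ones(8), 20)])   # dictionary rows
    labels = np.array([0] * 20 + [1] * 20)

    def src_classify(x, D, labels, n_nonzero=5):
        coef = orthogonal_mp(D.T, x, n_nonzero_coefs=n_nonzero)        # sparse code of x
        residuals = [np.linalg.norm(x - D[labels == c].T @ coef[labels == c]) for c in (0, 1)]
        cls = int(np.argmin(residuals))
        confidence = abs(residuals[0] - residuals[1]) / (residuals[0] + residuals[1] + 1e-12)
        return cls, confidence

    for x in make_trials(np.ones(8), 5):                 # a stream of new test trials (class 0)
        cls, conf = src_classify(x, D, labels)
        if conf > 0.2:                                   # unsupervised update with confident trials
            D = np.vstack([D, x])
            labels = np.append(labels, cls)
        print(cls, round(conf, 2))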
Korkmaz, Selcuk; Zararsiz, Gokmen; Goksuluk, Dincer
2015-01-01
Virtual screening is an important step in the early phase of the drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like and nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purposes. Here, we aim to develop a new tool, which can classify molecules as drug-like or nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, the performances of twenty-three different machine learning algorithms were first compared using ten different measures; then, the ten best-performing algorithms were selected based on principal component and hierarchical cluster analysis results. Besides classification, this application also has the ability to create heat maps and dendrograms for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/. PMID:25928885
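A small sketch of the comparison step such a tool automates: score several candidate classifiers on the same descriptor matrix with cross-validation and rank them. The synthetic data and the four-model list are illustrative assumptions, not the twenty-three algorithms or ten measures evaluated for MLViS.

    # Rank a few candidate classifiers on one (synthetic) drug-like vs nondrug-like dataset.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=400, n_features=25, random_state=0)   # molecular descriptors
    models = {"logistic": LogisticRegression(max_iter=1000),
              "svm": SVC(),
              "tree": DecisionTreeClassifier(random_state=0),
              "forest": RandomForestClassifier(random_state=0)}
    scores = {name: cross_val_score(m, X, y, cv=10).mean() for name, m in models.items()}
    for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name:10s} {s:.3f}")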
Bender, Andreas; Scheiber, Josef; Glick, Meir; Davies, John W; Azzaoui, Kamal; Hamon, Jacques; Urban, Laszlo; Whitebread, Steven; Jenkins, Jeremy L
2007-06-01
Preclinical Safety Pharmacology (PSP) attempts to anticipate adverse drug reactions (ADRs) during early phases of drug discovery by testing compounds in simple, in vitro binding assays (that is, preclinical profiling). The selection of PSP targets is based largely on circumstantial evidence of their contribution to known clinical ADRs, inferred from findings in clinical trials, animal experiments, and molecular studies going back more than forty years. In this work we explore PSP chemical space and its relevance for the prediction of adverse drug reactions. Firstly, in silico (computational) Bayesian models for 70 PSP-related targets were built, which are able to detect 93% of the ligands binding at IC(50) < or = 10 microM at an overall correct classification rate of about 94%. Secondly, employing the World Drug Index (WDI), a model for adverse drug reactions was built directly based on normalized side-effect annotations in the WDI, which does not require any underlying functional knowledge. This is, to our knowledge, the first attempt to predict adverse drug reactions across hundreds of categories from chemical structure alone. On average 90% of the adverse drug reactions observed with known, clinically used compounds were detected, an overall correct classification rate of 92%. Drugs withdrawn from the market (Rapacuronium, Suprofen) were tested in the model and their predicted ADRs align well with known ADRs. The analysis was repeated for acetylsalicylic acid and Benperidol which are still on the market. Importantly, features of the models are interpretable and back-projectable to chemical structure, raising the possibility of rationally engineering out adverse effects. By combining PSP and ADR models new hypotheses linking targets and adverse effects can be proposed and examples for the opioid mu and the muscarinic M2 receptors, as well as for cyclooxygenase-1 are presented. It is hoped that the generation of predictive models for adverse drug reactions is able to help support early SAR to accelerate drug discovery and decrease late stage attrition in drug discovery projects. In addition, models such as the ones presented here can be used for compound profiling in all development stages.
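A minimal, hedged stand-in for the kind of Bayesian activity model described: a Bernoulli naive Bayes classifier over binary structural fingerprints, whose per-bit log-odds keep the model interpretable and back-projectable to substructures. The random fingerprints below are placeholders for real substructure keys, and the model is not the implementation used in the study.

    # Bernoulli naive Bayes over binary fingerprints: actives carry a few enriched bits.
    import numpy as np
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_bits = 512
    enriched = np.arange(n_bits) < 20                                   # bits tied to activity
    actives = (rng.random((300, n_bits)) < 0.08) | ((rng.random((300, n_bits)) < 0.3) & enriched)
    inactives = rng.random((300, n_bits)) < 0.08
    X = np.vstack([actives, inactives]).astype(int)
    y = np.array([1] * 300 + [0] * 300)                                 # 1 = binds the target

    model = BernoulliNB()
    print("cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))

    model.fit(X, y)
    log_odds = model.feature_log_prob_[1] - model.feature_log_prob_[0]  # per-bit contribution
    print("most activity-associated bits:", np.argsort(log_odds)[-5:])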
A resource management architecture based on complex network theory in cloud computing federation
NASA Astrophysics Data System (ADS)
Zhang, Zehua; Zhang, Xuejie
2011-10-01
Cloud Computing Federation is a major trend in Cloud Computing. Resource management has a significant effect on the design, realization, and efficiency of a Cloud Computing Federation. A Cloud Computing Federation has the typical characteristics of a complex system; therefore, we propose a resource management architecture based on complex network theory for Cloud Computing Federation (abbreviated as RMABC) in this paper, with a detailed design of the resource discovery and resource announcement mechanisms. Compared with existing resource management mechanisms in distributed computing systems, a Task Manager in RMABC can use the historical information and current state data obtained from other Task Managers for the evolution of the complex network composed of Task Managers, and thus has advantages in resource discovery speed, fault tolerance, and adaptive ability. The results of the model experiment confirmed the advantage of RMABC in resource discovery performance.
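A toy sketch of the discovery idea, in which Task Managers forward queries to the neighbours with the best historical success and reinforce the links that answer, is given below; the network model and scoring are illustrative only and not the RMABC design.

```python
# Toy sketch of history-guided resource discovery among task managers; the
# network model and scoring are illustrative, not the RMABC design itself.
class TaskManager:
    def __init__(self, name, resources):
        self.name, self.resources = name, set(resources)
        self.neighbours = {}                      # neighbour -> historical success count

    def discover(self, resource, ttl=3):
        if resource in self.resources:
            return self.name
        if ttl == 0 or not self.neighbours:
            return None
        # Prefer neighbours that answered successfully in the past.
        order = sorted(self.neighbours, key=self.neighbours.get, reverse=True)
        for nb in order:
            hit = nb.discover(resource, ttl - 1)
            if hit is not None:
                self.neighbours[nb] += 1          # reinforce the successful link
                return hit
        return None

a, b, c = TaskManager("A", {"cpu"}), TaskManager("B", {"gpu"}), TaskManager("C", {"storage"})
a.neighbours = {b: 0, c: 0}
b.neighbours = {a: 0, c: 0}
print(a.discover("storage"))   # -> "C"; links along the answering path are reinforced
```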
New ligand-based approach for the discovery of antitrypanosomal compounds.
Vega, María Celeste; Montero-Torres, Alina; Marrero-Ponce, Yovani; Rolón, Miriam; Gómez-Barrio, Alicia; Escario, José Antonio; Arán, Vicente J; Nogal, Juan José; Meneses-Marcel, Alfredo; Torrens, Francisco
2006-04-01
The antitrypanosomal activity of 10 already synthesized compounds was predicted in silico as well as explored in vitro and in vivo against Trypanosoma cruzi. For the computational study, an approach based on non-stochastic linear fingerprints for the identification of potential antichagasic compounds is introduced. Molecular structures of 66 organic compounds, 28 with antitrypanosomal activity and 38 having other clinical uses, were parameterized by means of the TOMOCOMD-CARDD software. A linear classification function was derived allowing the discrimination between active and inactive compounds with a confidence of 95%. As predicted, seven compounds showed antitrypanosomal activity (%AE > 70) against epimastigote forms of T. cruzi at a concentration of 100 μg/mL. After an unspecific cytotoxicity assay, three compounds were evaluated against amastigote forms of the parasite. An in vivo test was carried out for one of the studied compounds.
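A minimal sketch of deriving a linear classification function that separates active from inactive compounds is shown below; the descriptor values are random placeholders rather than TOMOCOMD-CARDD fingerprints.

```python
# Minimal sketch of a linear classification function over molecular
# descriptors (placeholder values, not TOMOCOMD-CARDD fingerprints).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X_active = rng.normal(loc=1.0, size=(28, 5))     # 28 antitrypanosomal compounds
X_inactive = rng.normal(loc=-1.0, size=(38, 5))  # 38 compounds with other uses
X = np.vstack([X_active, X_inactive])
y = np.array([1] * 28 + [0] * 38)

lda = LinearDiscriminantAnalysis().fit(X, y)
print("training accuracy:", lda.score(X, y))
print("discriminant coefficients:", lda.coef_)
```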
AUTOCLASS III - AUTOMATIC CLASS DISCOVERY FROM DATA
NASA Technical Reports Server (NTRS)
Cheeseman, P. C.
1994-01-01
The program AUTOCLASS III, Automatic Class Discovery from Data, uses Bayesian probability theory to provide a simple and extensible approach to problems such as classification and general mixture separation. Its theoretical basis is free from ad hoc quantities, and in particular free of any measures which alter the data to suit the needs of the program. As a result, the elementary classification model used lends itself easily to extensions. The standard approach to classification in much of artificial intelligence and statistical pattern recognition research involves partitioning of the data into separate subsets, known as classes. AUTOCLASS III uses the Bayesian approach in which classes are described by probability distributions over the attributes of the objects, specified by a model function and its parameters. The calculation of the probability of each object's membership in each class provides a more intuitive classification than absolute partitioning techniques. AUTOCLASS III is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or omitted. The user specifies a class probability distribution function by associating attribute sets with supplied likelihood function terms. AUTOCLASS then searches in the space of class numbers and parameters for the maximally probable combination. It returns the set of class probability function parameters, and the class membership probabilities for each data instance. AUTOCLASS III is written in Common Lisp, and is designed to be platform independent. This program has been successfully run on Symbolics and Explorer Lisp machines. It has been successfully used with the following implementations of Common LISP on the Sun: Franz Allegro CL, Lucid Common Lisp, and Austin Kyoto Common Lisp and similar UNIX platforms; under the Lucid Common Lisp implementations on VAX/VMS v5.4, VAX/Ultrix v4.1, and MIPS/Ultrix v4, rev. 179; and on the Macintosh personal computer. The minimum Macintosh required is the IIci. This program will not run under CMU Common Lisp or VAX/VMS DEC Common Lisp. A minimum of 8Mb of RAM is required for Macintosh platforms and 16Mb for workstations. The standard distribution medium for this program is a .25 inch streaming magnetic tape cartridge in UNIX tar format. It is also available on a 3.5 inch diskette in UNIX tar format and a 3.5 inch diskette in Macintosh format. An electronic copy of the documentation is included on the distribution medium. AUTOCLASS was developed between March 1988 and March 1992. It was initially released in May 1991. Sun is a trademark of Sun Microsystems, Inc. UNIX is a registered trademark of AT&T Bell Laboratories. DEC, VAX, VMS, and ULTRIX are trademarks of Digital Equipment Corporation. Macintosh is a trademark of Apple Computer, Inc. Allegro CL is a registered trademark of Franz, Inc.
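The following sketch conveys the core idea of Bayesian class discovery with soft memberships, using scikit-learn's GaussianMixture and BIC as a stand-in for AUTOCLASS's own model search; it is not a port of the Lisp program.

```python
# Sketch of Bayesian-style class discovery with soft memberships; GaussianMixture
# and BIC stand in for AUTOCLASS's marginal-likelihood model search.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, size=(100, 3)),
               rng.normal(4, 1, size=(100, 3))])   # two latent classes

best = None
for k in range(1, 6):                               # search over the number of classes
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    if best is None or gm.bic(X) < best.bic(X):
        best = gm

print("classes found:", best.n_components)
print("membership probabilities of first object:", best.predict_proba(X[:1]))
```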
Classification of ASKAP Vast Radio Light Curves
NASA Technical Reports Server (NTRS)
Rebbapragada, Umaa; Lo, Kitty; Wagstaff, Kiri L.; Reed, Colorado; Murphy, Tara; Thompson, David R.
2012-01-01
The VAST survey is a wide-field survey that observes with unprecedented instrument sensitivity (0.5 mJy or lower) and repeat cadence (a goal of 5 seconds) that will enable novel scientific discoveries related to known and unknown classes of radio transients and variables. Given the unprecedented observing characteristics of VAST, it is important to estimate source classification performance, and determine best practices prior to the launch of ASKAP's BETA in 2012. The goal of this study is to identify light curve characterization and classification algorithms that are best suited for archival VAST light curve classification. We perform our experiments on light curve simulations of eight source types and achieve best case performance of approximately 90% accuracy. We note that classification performance is most influenced by light curve characterization rather than classifier algorithm.
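A minimal sketch of the light-curve characterization plus classification pipeline is given below; the summary features, simulated curves and classifier are placeholders, not the feature sets or algorithms benchmarked in the study.

```python
# Sketch of light-curve characterization followed by classification; the
# features and classifier here are placeholders, not those used in the study.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def characterize(flux):
    """A few simple summary features of one light curve."""
    flux = np.asarray(flux, dtype=float)
    return [flux.mean(), flux.std(), flux.max() - flux.min(),
            np.abs(np.diff(flux)).mean()]

rng = np.random.default_rng(3)
constant = [rng.normal(1.0, 0.05, 200) for _ in range(50)]
variable = [rng.normal(1.0, 0.05, 200) + np.sin(np.linspace(0, 20, 200))
            for _ in range(50)]
labels = [0] * 50 + [1] * 50                 # e.g. constant vs. variable source

X = np.array([characterize(c) for c in constant + variable])
clf = RandomForestClassifier(n_estimators=200)
print("cross-validated accuracy:", cross_val_score(clf, X, labels, cv=5).mean())
```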
iPTF Discoveries of Recent Type Ia Supernovae
NASA Astrophysics Data System (ADS)
Papadogiannakis, S.; Taddia, F.; Petrushevska, T.; Ferretti, R.; Fremling, C.; Karamehmetoglu, E.; Nyholm, A.; Roy, R.; Hangard, L.; Vreeswijk, P.; Horesh, A.; Manulis, I.; Rubin, A.; Yaron, O.; Leloudas, G.; Khazov, D.; Soumagnac, M.; Knezevic, S.; Johansson, J.; Nir, G.; Cao, Y.; Blagorodnova, N.; Kulkarni, S.
2016-05-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artefacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
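The telegram quotes a real/bogus score in [0, 1]; the sketch below shows, purely illustratively, how such a score can be produced by a classifier over candidate features. It is not the RB2/RB4/RB5 machinery.

```python
# Illustrative real/bogus scoring of transient candidates; the features are
# hypothetical and this is not the RB2/RB4/RB5 pipeline itself.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
X_train = rng.normal(size=(500, 6))          # candidate image features (FWHM, flux ratio, ...)
y_train = (X_train[:, 0] > 0).astype(int)    # 1 = real source, 0 = bogus artifact

rb = RandomForestClassifier(n_estimators=300).fit(X_train, y_train)
candidate = rng.normal(size=(1, 6))
print("real/bogus score:", rb.predict_proba(candidate)[0, 1])   # near 1.0 means real
```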
iPTF Discoveries of Recent Core-Collapse Supernovae
NASA Astrophysics Data System (ADS)
Taddia, F.; Ferretti, R.; Papadogiannakis, S.; Petrushevska, T.; Fremling, C.; Karamehmetoglu, E.; Nyholm, A.; Roy, R.; Hangard, L.; Horesh, A.; Khazov, D.; Knezevic, S.; Johansson, J.; Leloudas, G.; Manulis, I.; Rubin, A.; Soumagnac, M.; Vreeswijk, P.; Yaron, O.; Bar, I.; Cao, Y.; Kulkarni, S.; Blagorodnova, N.
2016-05-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following core-collapse SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discoveries of Recent Core-Collapse Supernovae
NASA Astrophysics Data System (ADS)
Taddia, F.; Ferretti, R.; Fremling, C.; Karamehmetoglu, E.; Nyholm, A.; Papadogiannakis, S.; Petrushevska, T.; Roy, R.; Hangard, L.; De Cia, A.; Vreeswijk, P.; Horesh, A.; Manulis, I.; Sagiv, I.; Rubin, A.; Yaron, O.; Leloudas, G.; Khazov, D.; Soumagnac, M.; Bilgi, P.
2015-04-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Core-Collapse SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discoveries of Recent Type Ia Supernova
NASA Astrophysics Data System (ADS)
Petrushevska, T.; Ferretti, R.; Fremling, C.; Hangard, L.; Karamehmetoglu, E.; Nyholm, A.; Papadogiannakis, S.; Roy, R.; Horesh, A.; Khazov, D.; Knezevic, S.; Johansson, J.; Leloudas, G.; Manulis, I.; Rubin, A.; Soumagnac, M.; Vreeswijk, P.; Yaron, O.; Bilgi, P.; Cao, Y.; Duggan, G.; Lunnan, R.; Andreoni, I.
2015-10-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discoveries of Recent SNe Ia
NASA Astrophysics Data System (ADS)
Ferretti, R.; Fremling, C.; Johansson, J.; Karamehmetoglu, E.; Migotto, K.; Nyholm, A.; Papadogiannakis, S.; Taddia, F.; Petrushevska, T.; Roy, R.; Ben-Ami, S.; De Cia, A.; Dzigan, Y.; Horesh, A.; Khazov, D.; Manulis, I.; Rubin, A.; Sagiv, I.; Vreeswijk, P.; Yaron, O.; Bilgi, P.; Cao, Y.; Duggan, G.
2015-02-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discoveries of Recent Type Ia Supernovae
NASA Astrophysics Data System (ADS)
Papadogiannakis, S.; Taddia, F.; Ferretti, R.; Fremling, C.; Karamehmetoglu, E.; Petrushevska, T.; Nyholm, A.; Roy, R.; Hangard, L.; Vreeswijk, P.; Horesh, A.; Manulis, I.; Rubin, A.; Yaron, O.; Leloudas, G.; Khazov, D.; Soumagnac, M.; Knezevic, S.; Johansson, J.; Lunnan, R.; Blagorodnova, N.; Cao, Y.; Cenk, S. B.
2016-01-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discoveries of Recent Type Ia Supernovae
NASA Astrophysics Data System (ADS)
Ferretti, R.; Fremling, C.; Hangard, L.; Karamehmetoglu, E.; Nyholm, A.; Papadogiannakis, S.; Petrushevska, T.; Roy, R.; Taddia, F.; Horesh, A.; Khazov, D.; Knezevic, S.; Leloudas, G.; Manulis, I.; Rubin, A.; Soumagnac, M.; Vreeswijk, P.; Yaron, O.; Cao, Y.; Duggan, G.; Lunnan, R.; Blagorodnova, N.
2015-11-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discoveries of Recent Core-Collapse Supernovae
NASA Astrophysics Data System (ADS)
Taddia, F.; Ferretti, R.; Fremling, C.; Karamehmetoglu, E.; Nyholm, A.; Papadogiannakis, S.; Petrushevska, T.; Roy, R.; Hangard, L.; Vreeswijk, P.; Horesh, A.; Manulis, I.; Rubin, A.; Yaron, O.; Leloudas, G.; Khazov, D.; Soumagnac, M.; Knezevic, S.; Johansson, J.; Duggan, G.; Lunnan, R.; Cao, Y.
2015-09-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Core-Collapse SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discovery of Recent Type Ia Supernovae
NASA Astrophysics Data System (ADS)
Hangard, L.; Ferretti, R.; Fremling, C.; Karamehmetoglu, E.; Nyholm, A.; Papadogiannakis, S.; Petrushevska, T.; Roy, R.; Bar, I.; Horesh, A.; Johansson, J.; Khazov, D.; Knezevic, S.; Leloudas, G.; Manulis, I.; Rubin, A.; Soumagnac, M.; Vreeswijk, P.; Yaron, O.; Cao, Y.; Kulkarni, S.; Lunnan, R.; Ravi, V.; Vedantham, H. K.; Yan, L.
2016-04-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discoveries of Recent Core-Collapse Supernovae
NASA Astrophysics Data System (ADS)
Taddia, F.; Ferretti, R.; Fremling, C.; Karamehmetoglu, E.; Nyholm, A.; Papadogiannakis, S.; Petrushevska, T.; Roy, R.; Hangard, L.; Vreeswijk, P.; Horesh, A.; Manulis, I.; Rubin, A.; Yaron, O.; Leloudas, G.; Khazov, D.; Soumagnac, M.; Knezevic, S.; Johansson, J.; Lunnan, R.; Cao, Y.; Miller, A.
2015-11-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Core-Collapse SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discoveries of Recent Type Ia Supernovae
NASA Astrophysics Data System (ADS)
Petrushevska, T.; Ferretti, R.; Fremling, C.; Hangard, L.; Karamehmetoglu, E.; Nyholm, A.; Papadogiannakis, S.; Roy, R.; Horesh, A.; Khazov, D.; Knezevic, S.; Johansson, J.; Leloudas, G.; Manulis, I.; Rubin, A.; Soumagnac, M.; Vreeswijk, P.; Yaron, O.; Bilgi, P.; Cao, Y.; Duggan, G.; Lunnan, R.
2016-02-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discovery of Recent Type Ia Supernovae
NASA Astrophysics Data System (ADS)
Hangard, L.; Taddia, F.; Ferretti, R.; Papadogiannakis, S.; Petrushevska, T.; Fremling, C.; Karamehmetoglu, E.; Nyholm, A.; Roy, R.; Horesh, A.; Khazov, D.; Knezevic, S.; Johansson, J.; Leloudas, G.; Manulis, I.; Rubin, A.; Soumagnac, M.; Vreeswijk, P.; Yaron, O.; Bar, I.; Lunnan, R.; Cenk, S. B.
2016-02-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discoveries of Recent Type Ia Supernovae
NASA Astrophysics Data System (ADS)
Papadogiannakis, S.; Fremling, C.; Hangard, L.; Karamehmetoglu, E.; Nyholm, A.; Ferretti, R.; Petrushevska, T.; Roy, R.; Taddia, F.; Bar, I.; Horesh, A.; Johansson, J.; Knezevic, S.; Leloudas, G.; Manulis, I.; Nir, G.; Rubin, A.; Soumagnac, M.; Vreeswijk, P.; Yaron, O.; Arcavi, I.; Howell, D. A.; McCully, C.; Hosseinzadeh, G.; Valenti, S.; Blagorodnova, N.; Cao, Y.; Duggan, G.; Ravi, V.; Lunnan, R.
2016-03-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF discoveries of recent type Ia supernovae
NASA Astrophysics Data System (ADS)
Papadogiannakis, S.; Ferretti, R.; Fremling, C.; Hangard, L.; Karamehmetoglu, E.; Nyholm, A.; Petrushevska, T.; Roy, R.; De Cia, A.; Vreeswijk, P.; Horesh, A.; Manulis, I.; Sagiv, I.; Rubin, A.; Yaron, O.; Leloudas, G.; Khazov, D.; Soumagnac, M.; Knezevic, S.; Cenko, S. B.; Capone, J.; Bartakk, M.
2015-09-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discovery of Recent Type Ia Supernova
NASA Astrophysics Data System (ADS)
Hangard, L.; Petrushevska, T.; Papadogiannakis, S.; Ferretti, R.; Fremling, C.; Karamehmetoglu, E.; Nyholm, A.; Roy, R.; Horesh, A.; Khazov, D.; Knezevic, S.; Johansson, J.; Leloudas, G.; Manulis, I.; Rubin, A.; Soumagnac, M.; Vreeswijk, P.; Yaron, O.; Kasliwal, M.
2015-10-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discoveries of Recent Type Ia Supernovae
NASA Astrophysics Data System (ADS)
Petrushevska, T.; Ferretti, R.; Fremling, C.; Hangard, L.; Karamehmetoglu, E.; Nyholm, A.; Papadogiannakis, S.; Roy, R.; Horesh, A.; Khazov, D.; Knezevic, S.; Johansson, J.; Leloudas, G.; Manulis, I.; Rubin, A.; Soumagnac, M.; Vreeswijk, P.; Yaron, O.; Bilgi, P.; Cao, Y.; Duggan, G.; Lunnan, R.; Neill, J. D.; Walters, R.
2016-04-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discoveries of Recent Type Ia Supernovae
NASA Astrophysics Data System (ADS)
Papadogiannakis, S.; Taddia, F.; Petrushevska, T.; Fremling, C.; Hangard, L.; Johansson, J.; Karamehmetoglu, E.; Migotto, K.; Nyholm, A.; Roy, R.; Ben-Ami, S.; De Cia, A.; Dzigan, Y.; Horesh, A.; Khazov, D.; Soumagnac, M.; Manulis, I.; Rubin, A.; Sagiv, I.; Vreeswijk, P.; Yaron, O.; Bond, H.; Bilgi, P.; Cao, Y.; Duggan, G.
2015-03-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discovery of Recent Type Ia Supernovae
NASA Astrophysics Data System (ADS)
Hangard, L.; Ferretti, R.; Papadogiannakis, S.; Petrushevska, T.; Fremling, C.; Karamehmetoglu, E.; Nyholm, A.; Roy, R.; Horesh, A.; Khazov, D.; Knezevic, S.; Johansson, J.; Leloudas, G.; Manulis, I.; Rubin, A.; Soumagnac, M.; Vreeswijk, P.; Yaron, O.; Cook, D.
2015-12-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
iPTF Discoveries of Recent Type Ia Supernova
NASA Astrophysics Data System (ADS)
Petrushevska, T.; Ferretti, R.; Fremling, C.; Hangard, L.; Karamehmetoglu, E.; Nyholm, A.; Papadogiannakis, S.; Roy, R.; Horesh, A.; Khazov, D.; Knezevic, S.; Johansson, J.; Leloudas, G.; Manulis, I.; Rubin, A.; Soumagnac, M.; Vreeswijk, P.; Yaron, O.; Bilgi, P.; Cao, Y.; Duggan, G.; Lunnan, R.; Jencson, J.
2015-11-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W).
2005-03-18
simulation. This model is a basis of what is called discovery learning. Discovery learning is a constructionist method of instruction, which is a concept in... The purpose of this study is to identify methods that could speed up the instructional system design... became obvious as the enemy attacked using asymmetric means and methods. For instance: during the war, a mine identification-training product was
NASA Astrophysics Data System (ADS)
Ginsburg, Léonard; Durrani, Khadim Hussain; Kassi, Akhtar Mohammad; Welcomme, Jean-Loup
1999-02-01
This paper deals with the discovery and description of a new anthracobunid, Nakusia shahrigensis gen. et sp. nov., from the Ghazij Formation, Early Eocene, of the Kach area, west of Quetta in Pakistan. Comparisons are made with the other anthracobunid species and genera, whose validity is discussed. The position of the Anthracobunidae in the classification is also discussed.
Nexus Between Protein–Ligand Affinity Rank-Ordering, Biophysical Approaches, and Drug Discovery
2013-01-01
The confluence of computational and biophysical methods to accurately rank-order the binding affinities of small molecules and determine structures of macromolecular complexes is a potentially transformative advance in the workflow of drug discovery. This viewpoint explores the impact that advanced computational methods may have on the efficacy of small-molecule drug discovery and optimization, particularly with respect to emerging fragment-based methods. PMID:24900579
ERIC Educational Resources Information Center
Pittman, Kim
1997-01-01
Presents an activity with the objective that students apply and integrate what they learn about classification, food webs, and paleontology to the creation of a scientifically sound ecosystem. Students read and discuss literature selections during the unit. (DDR)
Teach-Discover-Treat (TDT): Collaborative Computational Drug Discovery for Neglected Diseases
Jansen, Johanna M.; Cornell, Wendy; Tseng, Y. Jane; Amaro, Rommie E.
2012-01-01
Teach-Discover-Treat (TDT) is an initiative to promote the development and sharing of computational tools, solicited through a competition, with the aim of impacting education and collaborative drug discovery for neglected diseases. Collaboration, multidisciplinary integration, and innovation are essential for successful drug discovery. This requires a workforce that is trained in state-of-the-art workflows and equipped with the ability to collaborate on platforms that are accessible and free. The TDT competition solicits high-quality computational workflows for neglected disease targets, using freely available, open access tools. PMID:23085175
LSST Astroinformatics And Astrostatistics: Data-oriented Astronomical Research
NASA Astrophysics Data System (ADS)
Borne, Kirk D.; Stassun, K.; Brunner, R. J.; Djorgovski, S. G.; Graham, M.; Hakkila, J.; Mahabal, A.; Paegert, M.; Pesenson, M.; Ptak, A.; Scargle, J.; Informatics, LSST; Statistics Team
2011-01-01
The LSST Informatics and Statistics Science Collaboration (ISSC) focuses on research and scientific discovery challenges posed by the very large and complex data collection that LSST will generate. Application areas include astroinformatics, machine learning, data mining, astrostatistics, visualization, scientific data semantics, time series analysis, and advanced signal processing. Research problems to be addressed with these methodologies include transient event characterization and classification, rare class discovery, correlation mining, outlier/anomaly/surprise detection, improved estimators (e.g., for photometric redshift or early onset supernova classification), exploration of highly dimensional (multivariate) data catalogs, and more. We present sample science results from these data-oriented approaches to large-data astronomical research. We present results from LSST ISSC team members, including the EB (Eclipsing Binary) Factory, the environmental variations in the fundamental plane of elliptical galaxies, and outlier detection in multivariate catalogs.
Anguera, A; Barreiro, J M; Lara, J A; Lizcano, D
2016-01-01
One of the major challenges in the medical domain today is how to exploit the huge amount of data that this field generates. To do this, approaches are required that are capable of discovering knowledge that is useful for decision making in the medical field. Time series are data types that are common in the medical domain and require specialized analysis techniques and tools, especially if the information of interest to specialists is concentrated within particular time series regions, known as events. This research followed the steps specified by the so-called knowledge discovery in databases (KDD) process to discover knowledge from medical time series derived from stabilometric (396 series) and electroencephalographic (200) patient electronic health records (EHR). The view offered in the paper is based on the experience gathered as part of the VIIP project. Knowledge discovery in medical time series has a number of difficulties and implications that are highlighted by illustrating the application of several techniques that cover the entire KDD process through two case studies. This paper illustrates the application of different knowledge discovery techniques for the purposes of classification within the above domains. The accuracy of this application for the two classes considered in each case is 99.86% and 98.11% for epilepsy diagnosis in the electroencephalography (EEG) domain and 99.4% and 99.1% for early-age sports talent classification in the stabilometry domain. The KDD techniques achieve better results than other traditional neural network-based classification techniques.
Gradus, Jaimie L; King, Matthew W; Galatzer-Levy, Isaac; Street, Amy E
2017-08-01
Suicide rates among recent veterans have led to interest in risk identification. Evidence of gender- and trauma-specific predictors of suicidal ideation necessitates the use of advanced computational methods capable of elucidating these important and complex associations. In this study, we used machine learning to examine gender-specific associations between predeployment and military factors, traumatic deployment experiences, and psychopathology and suicidal ideation (SI) in a national sample of veterans deployed during the Iraq and Afghanistan conflicts (n = 2,244). Classification and regression tree analyses and random forests were used to identify associations with SI and determine their classification accuracy. Findings converged on several associations for men that included depression, posttraumatic stress disorder (PTSD), and somatic complaints. Sexual harassment during deployment emerged as a key factor that interacted with PTSD and depression and demonstrated a stronger association with SI among women. Classification accuracy for SI presence or absence was good based on the receiver operating characteristic area under the curve: men = .91, women = .92. The risk for SI was classifiable with good accuracy, with associations that varied by gender. The use of machine learning analyses allowed for the discovery of rich, nuanced results that should be replicated in other samples and may eventually be a basis for the development of gender-specific actuarial tools to assess SI risk among veterans. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
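A minimal sketch of gender-stratified random-forest classification of SI evaluated by ROC AUC is given below; the predictors and outcome are simulated placeholders, not the study's survey variables.

```python
# Sketch of gender-stratified random-forest classification of suicidal
# ideation with ROC AUC; the predictor set here is simulated and hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
n = 1000
X = rng.normal(size=(n, 5))       # e.g. PTSD, depression, somatic complaints, harassment, ...
gender = rng.integers(0, 2, n)    # 0 = men, 1 = women
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 1.5).astype(int)

for g, label in [(0, "men"), (1, "women")]:
    mask = gender == g
    scores = cross_val_predict(RandomForestClassifier(n_estimators=300),
                               X[mask], y[mask], cv=5, method="predict_proba")[:, 1]
    print(label, "AUC:", round(roc_auc_score(y[mask], scores), 2))
```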
Classification of large-scale fundus image data sets: a cloud-computing framework.
Roychowdhury, Sohini
2016-08-01
Large medical image data sets with high dimensionality require a substantial amount of computation time for data creation and data processing. This paper presents a novel generalized method that finds optimal image-based feature sets that reduce computational time complexity while maximizing overall classification accuracy for detection of diabetic retinopathy (DR). First, region-based and pixel-based features are extracted from fundus images for classification of DR lesions and vessel-like structures. Next, feature ranking strategies are used to distinguish the optimal classification feature sets. DR lesion and vessel classification accuracies are computed using boosted decision tree and decision forest classifiers in the Microsoft Azure Machine Learning Studio platform, respectively. For images from the DIARETDB1 data set, 40 of its highest-ranked features are used to classify four DR lesion types with an average classification accuracy of 90.1% in 792 seconds. Also, for classification of red lesion regions and hemorrhages from microaneurysms, accuracies of 85% and 72% are observed, respectively. For images from the STARE data set, 40 high-ranked features can classify minor blood vessels with an accuracy of 83.5% in 326 seconds. Such cloud-based fundus image analysis systems can significantly enhance borderline classification performance in automated screening systems.
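The sketch below illustrates feature ranking followed by a boosted decision-tree classifier, using scikit-learn locally as a stand-in for the Microsoft Azure Machine Learning Studio modules; the features and labels are placeholders.

```python
# Sketch of feature ranking + boosted-tree classification of DR lesions, using
# scikit-learn locally rather than the Azure Machine Learning Studio modules.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(6)
X = rng.normal(size=(400, 100))               # region/pixel features per candidate
y = (X[:, :5].sum(axis=1) > 0).astype(int)    # placeholder lesion labels

pipe = make_pipeline(SelectKBest(f_classif, k=40),   # keep the 40 highest-ranked features
                     GradientBoostingClassifier())
print("accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```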
A fuzzy hill-climbing algorithm for the development of a compact associative classifier
NASA Astrophysics Data System (ADS)
Mitra, Soumyaroop; Lam, Sarah S.
2012-02-01
Classification, a data mining technique, has widespread applications including medical diagnosis, targeted marketing, and others. Knowledge discovery from databases in the form of association rules is one of the important data mining tasks. An integrated approach, classification based on association rules, has drawn the attention of the data mining community over the last decade. While attention has mainly been focused on increasing classifier accuracy, little effort has been devoted to building interpretable and less complex models. This paper discusses the development of a compact associative classification model using a hill-climbing approach and fuzzy sets. The proposed methodology builds the rule base by selecting rules that contribute towards increasing training accuracy, thus balancing classification accuracy with the number of classification association rules. The results indicated that the proposed associative classification model can achieve competitive accuracies on benchmark datasets with continuous attributes and lends itself to better interpretability when compared with other rule-based systems.
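The hill-climbing rule selection described above can be sketched as a greedy loop that keeps a candidate association rule only if it raises training accuracy; the fuzzy-membership component of the published method is omitted, and the rules and data below are toy examples.

```python
# Greedy (hill-climbing) selection of classification association rules: a rule
# is kept only if it improves training accuracy. Rules are (antecedent, class),
# where the antecedent is a tuple of (feature, value) conditions.
def rule_matches(rule, x):
    antecedent, _ = rule
    return all(x[f] == v for f, v in antecedent)

def predict(rules, x, default):
    for rule in rules:                 # first matching rule wins
        if rule_matches(rule, x):
            return rule[1]
    return default

def accuracy(rules, X, y, default):
    return sum(predict(rules, x, default) == t for x, t in zip(X, y)) / len(y)

def hill_climb(candidate_rules, X, y, default):
    selected, best = [], accuracy([], X, y, default)
    for rule in candidate_rules:       # e.g. candidates ordered by confidence
        acc = accuracy(selected + [rule], X, y, default)
        if acc > best:
            selected.append(rule)
            best = acc
    return selected

# Tiny worked example with two binary features.
X = [{"a": 1, "b": 0}, {"a": 1, "b": 1}, {"a": 0, "b": 1}, {"a": 0, "b": 0}]
y = [1, 1, 0, 0]
candidates = [((("a", 1),), 1), ((("b", 1),), 0)]
print(hill_climb(candidates, X, y, default=0))   # keeps only the rule that helps
```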
A Classification of Recent Australasian Computing Education Publications
ERIC Educational Resources Information Center
Computer Science Education, 2007
2007-01-01
A new classification system for computing education papers is presented and applied to every computing education paper published between January 2004 and January 2007 at the two premier computing education conferences in Australia and New Zealand. We find that while simple reports outnumber other types of paper, a healthy proportion of papers…
Authorship Discovery in Blogs Using Bayesian Classification with Corrective Scaling
2008-06-01
...of the books which scholars believed he had. Wilhelm Fucks discriminated between authors using the average number of syllables per word and average... distance between equal-syllabled words [8]. Fucks, too, concluded that a study such as his reveals a "possibility of a quantitative classification"
ERIC Educational Resources Information Center
Njoo, Melanie; de Jong, Ton
This paper contains the results of a study on the importance of discovery learning using computer simulations. The purpose of the study was to identify what constitutes discovery learning and to assess the effects of instructional support measures. College students were observed working with an assignment and a computer simulation in the domain of…
Computationally driven drug discovery meeting-3 - Verona (Italy): 4 - 6th of March 2014.
Costantino, Gabriele
2014-12-01
The following article reports on the results and outcome of a meeting organised at the Aptuit Auditorium in Verona (Italy), which highlighted the current applications of state-of-the-art computational science to drug design in Italy. The meeting, which had > 100 people in attendance, consisted of over 40 presentations and included keynote lectures given by world-renowned speakers. The topics covered included ligand- and structure-based drug design, library design and screening, as well as discussions pertaining to chemometrics. The meeting also stressed the importance of public-private collaboration and reviewed the different approaches to computationally driven drug discovery taken within academia and industry. The meeting helped define the current position of state-of-the-art computational drug discovery in Italy, pointing out criticalities and assets. This kind of focused meeting is important in that it gives a restricted yet representative community of fellow professionals the opportunity to discuss current methodological approaches in depth and to outline future perspectives for computationally driven drug discovery.
NASA Technical Reports Server (NTRS)
Pell, Barney
2003-01-01
A viewgraph presentation on NASA's Discovery Systems Project is given. The topics of discussion include: 1) NASA's Computing Information and Communications Technology Program; 2) Discovery Systems Program; and 3) Ideas for Information Integration Using the Web.
1988-03-01
Mechanism; Computer Security. ...denial of service. This paper assumes that the reader is a computer science or engineering professional working in the area of formal specification and... recovery from such events as deadlocks and crashes can be accounted for in the computation of the waiting time for each service in the service hierarchy
Henry, Suzanne Bakken; Warren, Judith J.; Lange, Linda; Button, Patricia
1998-01-01
Building on the work of previous authors, the Computer-based Patient Record Institute (CPRI) Work Group on Codes and Structures has described features of a classification scheme for implementation within a computer-based patient record. The authors of the current study reviewed the evaluation literature related to six major nursing vocabularies (the North American Nursing Diagnosis Association Taxonomy 1, the Nursing Interventions Classification, the Nursing Outcomes Classification, the Home Health Care Classification, the Omaha System, and the International Classification for Nursing Practice) to determine the extent to which the vocabularies include the CPRI features. None of the vocabularies met all criteria. The Omaha System, Home Health Care Classification, and International Classification for Nursing Practice each included five features. Criteria not fully met by any systems were clear and non-redundant representation of concepts, administrative cross-references, syntax and grammar, synonyms, uncertainty, context-free identifiers, and language independence. PMID:9670127
Natural Products for Drug Discovery in the 21st Century: Innovations for Novel Drug Discovery.
Thomford, Nicholas Ekow; Senthebane, Dimakatso Alice; Rowe, Arielle; Munro, Daniella; Seele, Palesa; Maroyi, Alfred; Dzobo, Kevin
2018-05-25
The therapeutic properties of plants have been recognised since time immemorial. Many pathological conditions have been treated using plant-derived medicines. These medicines are used as concoctions or concentrated plant extracts without isolation of active compounds. Modern medicine, however, requires the isolation and purification of one or two active compounds. There are, however, many global health challenges with diseases such as cancer, degenerative diseases, HIV/AIDS and diabetes, for which modern medicine is struggling to provide cures. In many cases, the isolation of the "active compound" has made the compound ineffective. Drug discovery is a multidimensional problem requiring several parameters of both natural and synthetic compounds, such as safety, pharmacokinetics and efficacy, to be evaluated during drug candidate selection. The advent of the latest technologies that enhance drug design hypotheses, such as Artificial Intelligence, the use of 'organ-on-chip' and microfluidics technologies, means that automation has become part of drug discovery. This has resulted in increased speed in drug discovery and in the evaluation of the safety, pharmacokinetics and efficacy of candidate compounds, whilst allowing novel ways of drug design and synthesis based on natural compounds. Recent advances in analytical and computational techniques have opened new avenues to process complex natural products and to use their structures to derive new and innovative drugs. Indeed, we are in the era of computational molecular design as applied to natural products. Predictive computational software has contributed to the discovery of molecular targets of natural products and their derivatives. In the future, the use of quantum computing, computational software and databases in modelling molecular interactions and predicting features and parameters needed for drug development, such as pharmacokinetics and pharmacodynamics, will result in fewer false-positive leads in drug development. This review discusses plant-based natural product drug discovery and how innovative technologies play a role in next-generation drug discovery.
iPTF Discoveries of Recent Type Ia Supernovae
NASA Astrophysics Data System (ADS)
Petrushevska, T.; Ferretti, R.; Fremling, C.; Hangard, L.; Johansson, J.; Migotto, K.; Nyholm, A.; Papadogiannakis, S.; Ben-Ami, S.; De Cia, A.; Dzigan, Y.; Horesh, A.; Leloudas, G.; Manulis, I.; Rubin, A.; Sagiv, I.; Vreeswijk, P.; Yaron, O.; Cao, Y.; Perley, D.; Miller, A.; Waszczak, A.; Kasliwal, M. M.; Hosseinzadeh, G.; Cenko, S. B.; Quimby, R.
2015-05-01
The intermediate Palomar Transient Factory (ATel #4807) reports the discovery and classification of the following Type Ia SNe. Our automated candidate vetting to distinguish a real astrophysical source (1.0) from bogus artifacts (0.0) is powered by three generations of machine learning algorithms: RB2 (Brink et al. 2013MNRAS.435.1047B), RB4 (Rebbapragada et al. 2015AAS...22543402R) and RB5 (Wozniak et al. 2013AAS...22143105W). See ATel #7112 for additional details.
NASA Technical Reports Server (NTRS)
Joyce, A. T.
1974-01-01
Significant progress has been made in the classification of surface conditions (land uses) with computer-implemented techniques based on the use of ERTS digital data and pattern recognition software. The supervised technique presently used at the NASA Earth Resources Laboratory is based on maximum likelihood ratioing with a digital table look-up approach to classification. After classification, colors are assigned to the various surface conditions (land uses) classified, and the color-coded classification is film recorded on either positive or negative 9 1/2 in. film at the scale desired. Prints of the film strips are then mosaicked and photographed to produce a land use map in the format desired. Computer extraction of statistical information is performed to show the extent of each surface condition (land use) within any given land unit that can be identified in the image. Evaluations of the product indicate that classification accuracy is well within the limits for use by land resource managers and administrators. Classifications performed with digital data acquired during different seasons indicate that the combination of two or more classifications offer even better accuracy.
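The per-pixel decision rule behind such supervised classification can be sketched as Gaussian maximum-likelihood assignment from class training statistics; the band values below are simulated, and this is not the original table look-up implementation.

```python
# Per-pixel Gaussian maximum-likelihood classification from class training
# statistics; a sketch of the principle, not the original table look-up code.
import numpy as np
from scipy.stats import multivariate_normal

def train(stats_samples):
    """stats_samples: {class_name: array of shape (n_pixels, n_bands)}."""
    return {c: (s.mean(axis=0), np.cov(s, rowvar=False))
            for c, s in stats_samples.items()}

def classify(pixel, classes):
    """Assign the class with the highest Gaussian log-likelihood."""
    return max(classes,
               key=lambda c: multivariate_normal.logpdf(pixel, *classes[c]))

rng = np.random.default_rng(7)
training = {"water":  rng.normal([10, 5, 3, 2], 1.0, size=(50, 4)),
            "forest": rng.normal([20, 30, 15, 40], 2.0, size=(50, 4))}
classes = train(training)
print(classify(np.array([11, 6, 3, 2]), classes))   # -> "water"
```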
Abramoff, Michael D.; Fort, Patrice E.; Han, Ian C.; Jayasundera, K. Thiran; Sohn, Elliott H.; Gardner, Thomas W.
2018-01-01
The Early Treatment Diabetic Retinopathy Study (ETDRS) and other standardized classification schemes have laid a foundation for tremendous advances in the understanding and management of diabetic retinopathy (DR). However, technological advances in optics and image analysis, especially optical coherence tomography (OCT), OCT angiography (OCTa), and ultra-widefield imaging, as well as new discoveries in diabetic retinal neuropathy (DRN), are exposing the limitations of ETDRS and other classification systems to completely characterize retinal changes in diabetes, which we term diabetic retinal disease (DRD). While it may be most straightforward to add axes to existing classification schemes, as diabetic macular edema (DME) was added as an axis to earlier DR classifications, doing so may make these classifications increasingly complicated and thus clinically intractable. Therefore, we propose future research efforts to develop a new, comprehensive, and clinically useful classification system that will identify multimodal biomarkers to reflect the complex pathophysiology of DRD and accelerate the development of therapies to prevent vision-threatening DRD. PMID:29372250
Computational Methods in Drug Discovery
Sliwoski, Gregory; Kothiwale, Sandeepkumar; Meiler, Jens
2014-01-01
Computer-aided drug discovery/design methods have played a major role in the development of therapeutically important small molecules for over three decades. These methods are broadly classified as either structure-based or ligand-based methods. Structure-based methods are in principle analogous to high-throughput screening in that both target and ligand structure information is imperative. Structure-based approaches include ligand docking, pharmacophore, and ligand design methods. The article discusses the theory behind the most important methods and recent successful applications. Ligand-based methods use only ligand information for predicting activity depending on its similarity/dissimilarity to previously known active ligands. We review widely used ligand-based methods such as ligand-based pharmacophores, molecular descriptors, and quantitative structure-activity relationships. In addition, important tools such as target/ligand databases, homology modeling, ligand fingerprint methods, etc., necessary for successful implementation of various computer-aided drug discovery/design methods in a drug discovery campaign, are discussed. Finally, computational methods for toxicity prediction and optimization for favorable physiologic properties are discussed with successful examples from the literature. PMID:24381236
Gozalbes, Rafael; Carbajo, Rodrigo J; Pineda-Lucena, Antonio
2010-01-01
In the last decade, fragment-based drug discovery (FBDD) has evolved from a novel approach in the search of new hits to a valuable alternative to the high-throughput screening (HTS) campaigns of many pharmaceutical companies. The increasing relevance of FBDD in the drug discovery universe has been concomitant with an implementation of the biophysical techniques used for the detection of weak inhibitors, e.g. NMR, X-ray crystallography or surface plasmon resonance (SPR). At the same time, computational approaches have also been progressively incorporated into the FBDD process and nowadays several computational tools are available. These stretch from the filtering of huge chemical databases in order to build fragment-focused libraries comprising compounds with adequate physicochemical properties, to more evolved models based on different in silico methods such as docking, pharmacophore modelling, QSAR and virtual screening. In this paper we will review the parallel evolution and complementarities of biophysical techniques and computational methods, providing some representative examples of drug discovery success stories by using FBDD.
Pai, Priyadarshini P; Mondal, Sukanta
2017-01-01
Enzymes are biological catalysts that play an important role in determining the patterns of chemical transformations pertaining to life. Many milestones have been achieved in unraveling the mechanisms by which enzymes orchestrate various cellular processes using experimental and computational approaches. Experimental studies generating nearly all possible mutations of target enzymes have been aided by rapid computational approaches aimed at enzyme functional classification, understanding domain organization, and functional site identification. The functional architecture is essentially involved in binding or interaction with ligands, including substrates, products, cofactors and inhibitors, providing for their function, such as in catalysis, ligand-mediated cell signaling, allosteric regulation and post-translational modifications. With the increasing availability of enzyme information and advances in algorithm development, computational approaches have now become more capable of providing precise inputs for enzyme engineering, and in the process are also making it more efficient. This has led to interesting findings, especially in aberrant enzyme interactions, such as host-pathogen interactions in infection, neurodegenerative diseases, cancer and diabetes. This review aims to summarize, in retrospect, the mined knowledge, vivid perspectives and challenging strides in using available experimentally validated enzyme information for characterization. An analytical outlook is presented on the scope of exploring future directions. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
A full computation-relevant topological dynamics classification of elementary cellular automata.
Schüle, Martin; Stoop, Ruedi
2012-12-01
Cellular automata are both computational and dynamical systems. We give a complete classification of the dynamic behaviour of elementary cellular automata (ECA) in terms of fundamental dynamic system notions such as sensitivity and chaoticity. The "complex" ECA emerge to be sensitive, but not chaotic and not eventually weakly periodic. Based on this classification, we conjecture that elementary cellular automata capable of carrying out complex computations, such as needed for Turing-universality, are at the "edge of chaos."
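A small sketch of simulating an elementary cellular automaton and probing sensitivity by tracking how a single-cell perturbation spreads is given below; the divergence measure is a crude proxy for the paper's formal definitions.

```python
# Simulate an elementary cellular automaton and probe sensitivity by tracking
# how a single-cell perturbation spreads; a crude proxy for the formal notion.
import numpy as np

def step(state, rule):
    """One synchronous ECA update with periodic boundary conditions."""
    left, right = np.roll(state, 1), np.roll(state, -1)
    idx = 4 * left + 2 * state + right              # neighbourhood as a 3-bit number
    table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
    return table[idx]

def divergence(rule, n=201, steps=100, seed=0):
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 2, n, dtype=np.uint8)
    b = a.copy()
    b[n // 2] ^= 1                                  # flip one cell
    for _ in range(steps):
        a, b = step(a, rule), step(b, rule)
    return (a != b).mean()                          # fraction of differing cells

for rule in (110, 30, 90):                          # "complex", chaotic, additive
    print(rule, divergence(rule))
```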
MPD3: a useful medicinal plants database for drug designing.
Mumtaz, Arooj; Ashfaq, Usman Ali; Ul Qamar, Muhammad Tahir; Anwar, Farooq; Gulzar, Faisal; Ali, Muhammad Amjad; Saari, Nazamid; Pervez, Muhammad Tariq
2017-06-01
Medicinal plants are the main natural pools for the discovery and development of new drugs. In the modern era of computer-aided drug designing (CADD), there is need of prompt efforts to design and construct useful database management system that allows proper data storage, retrieval and management with user-friendly interface. An inclusive database having information about classification, activity and ready-to-dock library of medicinal plant's phytochemicals is therefore required to assist the researchers in the field of CADD. The present work was designed to merge activities of phytochemicals from medicinal plants, their targets and literature references into a single comprehensive database named as Medicinal Plants Database for Drug Designing (MPD3). The newly designed online and downloadable MPD3 contains information about more than 5000 phytochemicals from around 1000 medicinal plants with 80 different activities, more than 900 literature references and 200 plus targets. The designed database is deemed to be very useful for the researchers who are engaged in medicinal plants research, CADD and drug discovery/development with ease of operation and increased efficiency. The designed MPD3 is a comprehensive database which provides most of the information related to the medicinal plants at a single platform. MPD3 is freely available at: http://bioinform.info .
ERIC Educational Resources Information Center
Kim, Jiseon
2010-01-01
Classification testing has been widely used to make categorical decisions by determining whether an examinee has a certain degree of ability required by established standards. As computer technologies have developed, classification testing has become more computerized. Several approaches have been proposed and investigated in the context of…
A review of classification algorithms for EEG-based brain-computer interfaces.
Lotte, F; Congedo, M; Lécuyer, A; Lamarche, F; Arnaldi, B
2007-06-01
In this paper we review classification algorithms used to design brain-computer interface (BCI) systems based on electroencephalography (EEG). We briefly present the commonly employed algorithms and describe their critical properties. Based on the literature, we compare them in terms of performance and provide guidelines to choose the suitable classification algorithm(s) for a specific BCI.
Yang, Hua; Gao, Wen; Liu, Lei; Liu, Ke; Liu, E-Hu; Qi, Lian-Wen; Li, Ping
2015-11-10
Most Aconitum species, also known as aconite, are extremely poisonous, so they must be identified carefully. Differentiation of Aconitum species is challenging because of their similar appearance and chemical components. In this study, a universal strategy to discover chemical markers was developed for effective authentication of three commonly used aconite roots. The major procedures include: (1) chemical profiling and structural assignment of herbs by liquid chromatography with mass spectrometry (LC-MS), (2) quantification of major components by LC-MS, (3) a probabilistic neural network (PNN) model to calculate contributions of components toward species classification, and (4) discovery of a minimal number of chemical markers for quality control. The MS fragmentation pathways of diester-, monoester-, and alkylolamine-diterpenoid alkaloids were compared. Using these rules, 42 aconite alkaloids were identified in aconite roots. Subsequently, 11 characteristic compounds were quantified. A component-species model was then established by PNN, combining the 11 analytes and 26 batches of samples from three aconite species. The contribution of each analyte to species classification was calculated. Selection of fuziline, benzoylhypaconine, and talatizamine, or a combination of more compounds based on the contribution order, can be used for successful categorization of the three aconite species. Collectively, the proposed strategy is beneficial to the selection of rational chemical markers for species classification and quality control of herbal medicines. Copyright © 2015 Elsevier B.V. All rights reserved.
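The sketch below illustrates a probabilistic neural network (Gaussian kernels per class) and a leave-one-analyte-out estimate of each marker's contribution; the concentration data are placeholders, not the published measurements.

```python
# Sketch of a probabilistic neural network (Parzen/Gaussian kernels per class)
# and a leave-one-analyte-out estimate of each marker's contribution; the
# concentration data here are placeholders, not the published measurements.
import numpy as np

def pnn_predict(X_train, y_train, x, sigma=0.5):
    """Return the class whose kernel-density estimate at x is largest."""
    scores = {}
    for c in np.unique(y_train):
        d = X_train[y_train == c] - x
        scores[c] = np.mean(np.exp(-np.sum(d * d, axis=1) / (2 * sigma ** 2)))
    return max(scores, key=scores.get)

def loo_accuracy(X, y, features):
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        hits += pnn_predict(X[mask][:, features], y[mask], X[i, features]) == y[i]
    return hits / len(y)

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(m, 0.5, size=(9, 11)) for m in (0.0, 0.6, 1.2)])
y = np.repeat([0, 1, 2], 9)                       # three aconite species, 11 analytes
full = loo_accuracy(X, y, np.arange(11))
contrib = {j: full - loo_accuracy(X, y, np.delete(np.arange(11), j)) for j in range(11)}
print(sorted(contrib, key=contrib.get, reverse=True)[:3])   # top three candidate markers
```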
ERIC Educational Resources Information Center
Robinson, William R.
2000-01-01
Describes a review of research that addresses the effectiveness of simulations in promoting scientific discovery learning and the problems that learners may encounter when using discovery learning. (WRM)
Azuaje, Francisco; Zheng, Huiru; Camargo, Anyela; Wang, Haiying
2011-08-01
The discovery of novel disease biomarkers is a crucial challenge for translational bioinformatics. Demonstration of both their classification power and reproducibility across independent datasets is an essential requirement to assess their potential clinical relevance. Small datasets and the multiplicity of putative biomarker sets may explain the lack of predictive reproducibility. Studies based on pathway-driven discovery approaches have suggested that, despite such discrepancies, the resulting putative biomarkers tend to be implicated in common biological processes. Investigations of this problem have been mainly focused on datasets derived from cancer research. We investigated the predictive and functional concordance of five methods for discovering putative biomarkers in four independently generated datasets from the cardiovascular disease domain. A diversity of biosignatures was identified by the different methods. However, we found strong biological process concordance between them, especially in the case of methods based on gene set analysis. With a few exceptions, we observed a lack of classification reproducibility using independent datasets. Partial overlaps between our putative sets of biomarkers and the primary studies exist. Despite the observed limitations, pathway-driven or gene set analysis can predict potentially novel biomarkers and can jointly point to biomedically relevant underlying molecular mechanisms. Copyright © 2011 Elsevier Inc. All rights reserved.
Discovery of novel SERCA inhibitors by virtual screening of a large compound library.
Elam, Christopher; Lape, Michael; Deye, Joel; Zultowsky, Jodie; Stanton, David T; Paula, Stefan
2011-05-01
Two screening protocols based on recursive partitioning and computational ligand docking methodologies, respectively, were employed for virtual screens of a compound library with 345,000 entries for novel inhibitors of the enzyme sarco/endoplasmic reticulum calcium ATPase (SERCA), a potential target for cancer chemotherapy. A total of 72 compounds that were predicted to be potential inhibitors of SERCA were tested in bioassays and 17 displayed inhibitory potencies at concentrations below 100 μM. The majority of these inhibitors were composed of two phenyl rings tethered to each other by a short link of one to three atoms. Putative interactions between SERCA and the inhibitors were identified by inspection of docking-predicted poses and some of the structural features required for effective SERCA inhibition were determined by analysis of the classification pattern employed by the recursive partitioning models. Copyright © 2011 Elsevier Masson SAS. All rights reserved.
ERIC Educational Resources Information Center
Sargent, John
The Office of Technology Policy analyzed Bureau of Labor Statistics' growth projections for the core occupational classifications of IT (information technology) workers to assess future demand in the United States. Classifications studied were computer engineers, systems analysts, computer programmers, database administrators, computer support…
Computer Classification of Triangles and Quadrilaterals--A Challenging Application
ERIC Educational Resources Information Center
Dennis, J. Richard
1978-01-01
Two computer exercises involving the classification of geometric figures are given. The mathematics required is relatively simple but comes from several areas--synthetic geometry, analytic geometry, and linear algebra. (MN)
Theoretical Interpretation of the Fluorescence Spectra of Toluene and P- Cresol
1994-07-01
[Report documentation page and list-of-tables residue; the recoverable table titles refer to computed and experimental ground-state frequencies of toluene and p-cresol, correction factors for computed ground-state vibrational frequencies, and computed and corrected excited-state frequencies of toluene.]
Song, Yuhyun; Leman, Scotland; Monteil, Caroline L.; Heath, Lenwood S.; Vinatzer, Boris A.
2014-01-01
A broadly accepted and stable biological classification system is a prerequisite for biological sciences. It provides the means to describe and communicate about life without ambiguity. Current biological classification and nomenclature use the species as the basic unit and require lengthy and laborious species descriptions before newly discovered organisms can be assigned to a species and be named. The current system is thus inadequate to classify and name the immense genetic diversity within species that is now being revealed by genome sequencing on a daily basis. To address this lack of a general intra-species classification and naming system adequate for today’s speed of discovery of new diversity, we propose a classification and naming system that is exclusively based on genome similarity and that is suitable for automatic assignment of codes to any genome-sequenced organism without requiring any phenotypic or phylogenetic analysis. We provide examples demonstrating that genome similarity-based codes largely align with current taxonomic groups at many different levels in bacteria, animals, humans, plants, and viruses. Importantly, the proposed approach is only slightly affected by the order of code assignment and can thus provide codes that reflect similarity between organisms and that do not need to be revised upon discovery of new diversity. We envision genome similarity-based codes to complement current biological nomenclature and to provide a universal means to communicate unambiguously about any genome-sequenced organism in fields as diverse as biodiversity research, infectious disease control, human and microbial forensics, animal breed and plant cultivar certification, and human ancestry research. PMID:24586551
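One way to make the idea concrete is a similarity threshold over genome k-mer sets, with codes assigned greedily as genomes arrive. The k-mer length, the 0.9 threshold, the two-level group.member code format, and the toy sequence below are assumptions made for illustration only, not the authors' actual scheme.

    def kmers(seq, k=12):
        """Set of overlapping k-mers in a genome sequence."""
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    def assign_code(new_seq, coded, k=12, threshold=0.9):
        """Assign a similarity-based code: join the group of the most similar
        already-coded genome if similarity exceeds the threshold, otherwise
        open a new group. `coded` maps code -> k-mer set."""
        new_kmers = kmers(new_seq, k)
        best_code, best_sim = None, 0.0
        for code, km in coded.items():
            s = jaccard(new_kmers, km)
            if s > best_sim:
                best_code, best_sim = code, s
        if best_sim >= threshold:
            group = best_code.split(".")[0]
            member = sum(1 for c in coded if c.startswith(group + ".")) + 1
        else:
            group = str(len({c.split(".")[0] for c in coded}) + 1)
            member = 1
        code = f"{group}.{member}"
        coded[code] = new_kmers
        return code

    coded = {}
    print(assign_code("ACGTACGTACGTACGTACGTACGT", coded))  # first genome -> "1.1"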
A Cognitive Computing Approach for Classification of Complaints in the Insurance Industry
NASA Astrophysics Data System (ADS)
Forster, J.; Entrup, B.
2017-10-01
In this paper we present and evaluate a cognitive computing approach for the classification of dissatisfaction and four complaint-specific classes in correspondence documents between insurance clients and an insurance company. A cognitive computing approach combines classical natural language processing methods, machine learning algorithms, and the evaluation of hypotheses. The approach combines a MaxEnt machine learning algorithm with language modelling, tf-idf, and sentiment analytics to create a multi-label text classification model. The resulting model is trained and tested with a set of 2,500 original insurance communication documents written in German, which have been manually annotated by the partnering insurance company. With an F1-score of 0.9, a reliable text classification component has been implemented and evaluated. A final outlook towards a cognitive computing insurance assistant is given at the end.
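A minimal sketch of the tf-idf plus MaxEnt (logistic regression) core of such a pipeline is shown below, using scikit-learn. The documents, labels, and label names are invented placeholders, and the language-modelling and sentiment components described in the abstract are omitted.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MultiLabelBinarizer

    # Invented stand-ins for annotated complaint documents and their label sets.
    docs = ["delay in claim payment", "rude agent on the phone",
            "policy cancelled without notice", "no reply to my emails",
            "thank you for the quick help"]
    labels = [{"dissatisfaction", "processing"}, {"dissatisfaction", "service"},
              {"dissatisfaction", "contract"}, {"dissatisfaction", "communication"},
              set()]

    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(labels)

    # MaxEnt corresponds to logistic regression; one binary model per label.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          OneVsRestClassifier(LogisticRegression(max_iter=1000)))
    model.fit(docs, Y)
    print(mlb.inverse_transform(model.predict(["agent never answered my claim"])))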
Computer-aided Classification of Mammographic Masses Using Visually Sensitive Image Features
Wang, Yunzhi; Aghaei, Faranak; Zarafshani, Ali; Qiu, Yuchen; Qian, Wei; Zheng, Bin
2017-01-01
Purpose To develop a new computer-aided diagnosis (CAD) scheme that computes visually sensitive image features routinely used by radiologists to build a machine learning classifier and distinguish between malignant and benign breast masses detected from digital mammograms. Methods An image dataset including 301 breast masses was retrospectively selected. From each segmented mass region, we computed image features that mimic five categories of visually sensitive features routinely used by radiologists in reading mammograms. We then selected five optimal features, one in each feature category, and applied logistic regression models for classification. A new CAD interface was also designed to show the lesion segmentation, the computed feature values, and the classification score. Results Areas under the ROC curve (AUC) were 0.786±0.026 and 0.758±0.027 when classifying the mass regions depicted on the two view images, respectively. By fusing the classification scores computed from the two regions, the AUC increased to 0.806±0.025. Conclusion This study demonstrated a new approach to developing a CAD scheme based on five visually sensitive image features. Combined with a “visual aid” interface, CAD results may be much more easily explainable to observers and may increase their confidence in CAD-generated classification results, compared with conventional CAD approaches that involve many complicated and visually insensitive texture features. PMID:27911353
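To illustrate the score-fusion step, the sketch below trains one logistic-regression model per view on five features and averages the two probability scores. It assumes the two views are the CC and MLO projections; the feature values, case counts, and the resubstitution AUC are placeholders rather than data from the study.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    # Placeholder stand-ins for the five visually sensitive features computed
    # on the two view regions of the same mass.
    rng = np.random.default_rng(3)
    y = np.array([0] * 150 + [1] * 150)                 # 0 = benign, 1 = malignant
    def view_features():
        X = rng.normal(size=(300, 5))
        X[y == 1] += 0.6                                # malignant cases shifted
        return X
    X_cc, X_mlo = view_features(), view_features()

    score_cc = LogisticRegression().fit(X_cc, y).predict_proba(X_cc)[:, 1]
    score_mlo = LogisticRegression().fit(X_mlo, y).predict_proba(X_mlo)[:, 1]
    fused = (score_cc + score_mlo) / 2                  # fuse the two view scores
    for name, s in [("CC", score_cc), ("MLO", score_mlo), ("fused", fused)]:
        print(name, round(roc_auc_score(y, s), 3))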
The decision tree approach to classification
NASA Technical Reports Server (NTRS)
Wu, C.; Landgrebe, D. A.; Swain, P. H.
1975-01-01
A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.
Zhang, Jiang; Wang, James Z; Yuan, Zhen; Sobel, Eric S; Jiang, Huabei
2011-01-01
This study presents a computer-aided classification method to distinguish osteoarthritis finger joints from healthy ones based on the functional images captured by x-ray guided diffuse optical tomography. Three imaging features, joint space width, optical absorption, and scattering coefficients, are employed to train a Least Squares Support Vector Machine (LS-SVM) classifier for osteoarthritis classification. The 10-fold validation results show that all osteoarthritis joints are clearly identified and all healthy joints are ruled out by the LS-SVM classifier. The best sensitivity, specificity, and overall accuracy of the classification by experienced technicians based on manual calculation of optical properties and visual examination of optical images are only 85%, 93%, and 90%, respectively. Therefore, our LS-SVM based computer-aided classification is a considerably improved method for osteoarthritis diagnosis.
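A least-squares SVM reduces training to a single linear system. The sketch below uses the regression-on-±1-targets formulation with an RBF kernel; the three simulated joint features (joint space width, absorption, scattering) are invented numbers rather than tomography measurements, and the kernel and regularization settings are illustrative.

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d2)

    def lssvm_train(X, y, C=10.0, gamma=1.0):
        # LS-SVM: the optimality conditions collapse into one linear system.
        n = len(y)
        K = rbf_kernel(X, X, gamma)
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = K + np.eye(n) / C
        sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
        return sol[0], sol[1:]          # bias b, dual weights alpha

    def lssvm_predict(X_train, b, alpha, X_test, gamma=1.0):
        return np.sign(rbf_kernel(X_test, X_train, gamma) @ alpha + b)

    # Invented joint features: [joint space width, absorption, scattering].
    rng = np.random.default_rng(1)
    healthy = rng.normal([2.0, 0.01, 1.0], 0.1, size=(20, 3))
    oa = rng.normal([1.5, 0.03, 1.4], 0.1, size=(20, 3))
    X = np.vstack([healthy, oa])
    y = np.concatenate([np.full(20, -1.0), np.full(20, 1.0)])
    b, alpha = lssvm_train(X, y)
    print(np.mean(lssvm_predict(X, b, alpha, X) == y))  # resubstitution accuracy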
Nom et lumière: enlightenment through nomenclature (the 1996 Kenneth F. Russell Memorial Lecture).
Pearn, J
1997-08-01
The classification of living things is both an acknowledgement of biological relationships and an identification of their differences. When Linnaeus, in 1735, published Systema Naturae, he set in place a system of biological classification that saw its apogee in the invention of binomial nomenclature: the description of every living thing being embodied simply in two names (i.e. a genus and the species within it). Linnaeus built on the work of scientific forebears, of whom Nehemiah Grew (1641-1712) was one of the most influential. Grew was a surgeon-physician whose passionate interest was plant anatomy; his work led to the discovery and documentation of sexual dimorphism in plants. Grew's life and works are a witness to that philosophy which views nature as a continuum, a broad holistic entity in which discoveries in one biological field have ramifications in other areas. Grew allowed his scientific curiosity full rein, manifested the courage to publish his work and possessed the self-discipline to stand by the audit of his peers. Modern biological research and contemporary clinical practice owe much to the enlightenment engendered by the classification and nomenclature that developed from his work.
Rassinoux, Anne-Marie; Baud, Robert H; Rodrigues, Jean-Marie; Lovis, Christian; Geissbühler, Antoine
2007-01-01
The importance of clinical communication between providers, consumers and others, as well as the requisite for computer interoperability, strengthens the need for sharing common accepted terminologies. Under the directives of the World Health Organization (WHO), an approach is currently being conducted in Australia to adopt a standardized terminology for medical procedures that is intended to become an international reference. In order to achieve such a standard, a collaborative approach is adopted, in line with the successful experiment conducted for the development of the new French coding system CCAM. Different coding centres are involved in setting up a semantic representation of each term using a formal ontological structure expressed through a logic-based representation language. From this language-independent representation, multilingual natural language generation (NLG) is performed to produce noun phrases in various languages that are further compared for consistency with the original terms. Outcomes are presented for the assessment of the International Classification of Health Interventions (ICHI) and its translation into Portuguese. The initial results clearly emphasize the feasibility and cost-effectiveness of the proposed method for handling both a different classification and an additional language. NLG tools, based on ontology driven semantic representation, facilitate the discovery of ambiguous and inconsistent terms, and, as such, should be promoted for establishing coherent international terminologies.
A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets
NASA Astrophysics Data System (ADS)
Bikku, Thulasi; Sambasiva Rao, N., Dr; Rao, Akepogu Ananda, Dr
2017-08-01
This paper focuses on developing a Hadoop-based framework of feature selection and classification models to classify high-dimensionality data in heterogeneous biomedical databases. Extensive research has been performed in the fields of machine learning, big data, and data mining for identifying patterns. The main challenge is extracting useful features generated from diverse biological systems. The proposed model can be used for predicting diseases in various applications and identifying the features relevant to particular diseases. Given the exponential growth of biomedical repositories such as PubMed and Medline, an accurate predictive model is essential for knowledge discovery in a Hadoop environment. Extracting key features from unstructured documents often leads to uncertain results due to outliers and missing values. In this paper, we propose a two-phase map-reduce framework with a text preprocessor and a classification model. In the first phase, a mapper-based preprocessing method was designed to eliminate irrelevant features, missing values, and outliers from the biomedical data. In the second phase, a map-reduce based multi-class ensemble decision tree model was designed and applied to the preprocessed mapper data to improve the true positive rate and computational time. Experimental results on complex biomedical datasets show that our proposed Hadoop-based multi-class ensemble model significantly outperforms state-of-the-art baselines.
A decision support model for investment on P2P lending platform.
Zeng, Xiangxiang; Liu, Li; Leung, Stephen; Du, Jiangze; Wang, Xun; Li, Tao
2017-01-01
Peer-to-peer (P2P) lending, as a novel economic lending model, has triggered new challenges in making effective investment decisions. In a P2P lending platform, one lender can invest in N loans and a loan may be accepted by M investors, thus forming a bipartite graph. Based on the bipartite graph model, we built an iterative computation model to evaluate the unknown loans. To validate the proposed model, we performed extensive experiments on real-world data from the largest American P2P lending marketplace, Prosper. By comparing our experimental results with those obtained by Bayes and logistic regression classifiers, we show that our computation model can help borrowers select good loans and help lenders make good investment decisions. Experimental results also show that the logistic classification model is a good complement to our iterative computation model, which motivated us to integrate the two classification models. The experimental results of the hybrid classification model demonstrate that the logistic classification model and our iterative computation model are complementary to each other. We conclude that the hybrid model (i.e., the integration of the iterative computation model and the logistic classification model) is more efficient and stable than either individual model alone.
A decision support model for investment on P2P lending platform
Liu, Li; Leung, Stephen; Du, Jiangze; Wang, Xun; Li, Tao
2017-01-01
Peer-to-peer (P2P) lending, as a novel economic lending model, has triggered new challenges in making effective investment decisions. In a P2P lending platform, one lender can invest in N loans and a loan may be accepted by M investors, thus forming a bipartite graph. Based on the bipartite graph model, we built an iterative computation model to evaluate the unknown loans. To validate the proposed model, we performed extensive experiments on real-world data from the largest American P2P lending marketplace, Prosper. By comparing our experimental results with those obtained by Bayes and logistic regression classifiers, we show that our computation model can help borrowers select good loans and help lenders make good investment decisions. Experimental results also show that the logistic classification model is a good complement to our iterative computation model, which motivated us to integrate the two classification models. The experimental results of the hybrid classification model demonstrate that the logistic classification model and our iterative computation model are complementary to each other. We conclude that the hybrid model (i.e., the integration of the iterative computation model and the logistic classification model) is more efficient and stable than either individual model alone. PMID:28877234
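The abstract does not spell out the iteration rule, but a HITS-style mutual-reinforcement scheme over the lender-loan bipartite graph is one plausible reading; the adjacency matrix below is a toy example, not Prosper data.

    import numpy as np

    # Rows = lenders, columns = loans; A[i, j] = 1 if lender i invested in loan j.
    A = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 1, 1],
                  [0, 0, 0, 1]], dtype=float)

    lender_score = np.ones(A.shape[0])
    loan_score = np.ones(A.shape[1])
    for _ in range(50):                       # iterate until scores stabilize
        loan_score = A.T @ lender_score       # a loan is good if good lenders back it
        lender_score = A @ loan_score         # a lender is good if they back good loans
        loan_score /= np.linalg.norm(loan_score)
        lender_score /= np.linalg.norm(lender_score)

    print(np.round(loan_score, 3))            # relative quality of the four loans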
Study of USGS/NASA land use classification system. [computer analysis from LANDSAT data
NASA Technical Reports Server (NTRS)
Spann, G. W.
1975-01-01
The results of a computer mapping project using LANDSAT data and the USGS/NASA land use classification system are summarized. During the computer mapping portion of the project, accuracies of 67 percent to 79 percent were achieved using Level II of the classification system and a 4,000-acre test site centered on Douglasville, Georgia. Analysis of responses to a questionnaire circulated to actual and potential LANDSAT data users reveals several important findings: (1) there is a substantial desire for additional information related to LANDSAT capabilities; (2) a majority of the respondents feel computer mapping from LANDSAT data could aid present or future projects; and (3) the costs of computer mapping are substantially less than those of other methods.
Preprocessing and meta-classification for brain-computer interfaces.
Hammon, Paul S; de Sa, Virginia R
2007-03-01
A brain-computer interface (BCI) is a system which allows direct translation of brain states into actions, bypassing the usual muscular pathways. A BCI system works by extracting user brain signals, applying machine learning algorithms to classify the user's brain state, and performing a computer-controlled action. Our goal is to improve brain state classification. Perhaps the most obvious way to improve classification performance is the selection of an advanced learning algorithm. However, it is now well known in the BCI community that careful selection of preprocessing steps is crucial to the success of any classification scheme. Furthermore, recent work indicates that combining the output of multiple classifiers (meta-classification) leads to improved classification rates relative to single classifiers (Dornhege et al., 2004). In this paper, we develop an automated approach which systematically analyzes the relative contributions of different preprocessing and meta-classification approaches. We apply this procedure to three data sets drawn from BCI Competition 2003 (Blankertz et al., 2004) and BCI Competition III (Blankertz et al., 2006), each of which exhibit very different characteristics. Our final classification results compare favorably with those from past BCI competitions. Additionally, we analyze the relative contributions of individual preprocessing and meta-classification choices and discuss which types of BCI data benefit most from specific algorithms.
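As a sketch of the meta-classification idea only (the paper's specific preprocessing choices, such as spatial filtering, are omitted), the example below stacks an LDA and an SVM base classifier under a logistic-regression combiner on synthetic feature vectors.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.ensemble import StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    # Synthetic stand-in for preprocessed EEG feature vectors (e.g. band powers).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (60, 8)), rng.normal(0.8, 1.0, (60, 8))])
    y = np.array([0] * 60 + [1] * 60)

    # Meta-classification: a logistic-regression combiner over two base classifiers.
    meta = StackingClassifier(
        estimators=[("lda", LinearDiscriminantAnalysis()),
                    ("svm", SVC(kernel="rbf", probability=True))],
        final_estimator=LogisticRegression())
    print(cross_val_score(meta, X, y, cv=5).mean())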
Discovery and Classification of the z=1.86 SLSNe: DES15E2mlf
NASA Astrophysics Data System (ADS)
Pan, Y.-C.; Foley, R. J.; Galbany, L.; Gonzalez-Gaitan, S.; Forster, F.; Hamuy, M.; Prieto, J. L.; Yuan, F.; Tucker, B. E.; Lidman, C.; Martini, P.; Gshwend, Julia; Moller, A.; Zhang, B.; Desai, S.; Paech, K.; Smith, R. C.; Schubnell, M.; Kessler, R.; Lasker, J.; Scolnic, D.; Brout, D. J.; Gladney, L.; Sako, M.; Wolf, R. C.; Brown, P. J.; Krisciunas, K.; Suntzeff, N.; Nichol, R.; Papadopoulos, A.; Childress, M.; D'Andrea, C.; Prajs, S.; Smith, M.; Sullivan, M.; Maartens, R.; Gupta, R.; Kovacs, E.; Kuhlmann, S.; Spinka, H.; Ahn, E.; Finley, D. A.; Frieman, J.; Marriner, J.; Wester, W.; Aldering, G.; Kim, A. G.; Thomas, R. C.; Barbary, K.; Bloom, J. S.; Goldstein, D.; Nugent, P.; Perlmutter, S.; Casas, R.; Castander, F. J.
2015-12-01
We report the spectroscopic classification of DES15E2mlf as a superluminous supernova (SLSN) discovered by the Dark Energy Survey (ATEL #4668). DES15E2mlf was discovered on 7 November 2015 at R.A. = 00:41:33.40, Decl = -43:27:17.2 with r = 24.1 mag. We obtained spectra using GMOS on Gemini-South (520-990nm) on 06 December 2015 which indicated a redshift of z = 1.86 from Mg II 2800 absorption.
Component architecture in drug discovery informatics.
Smith, Peter M
2002-05-01
This paper reviews the characteristics of a new model of computing that has been spurred on by the Internet, known as Netcentric computing. Developments in this model led to distributed component architectures, which, although not new ideas, are now realizable with modern tools such as Enterprise Java. The application of this approach to scientific computing, particularly in pharmaceutical discovery research, is discussed and highlighted by a particular case involving the management of biological assay data.
Moretti, Loris; Sartori, Luca
2016-09-01
In the field of Computer-Aided Drug Discovery and Development (CADDD) the proper software infrastructure is essential for everyday investigations. The creation of such an environment should be carefully planned and implemented with certain features in order to be productive and efficient. Here we describe a solution to integrate standard computational services into a functional unit that empowers modelling applications for drug discovery. This system allows users with various levels of expertise to run in silico experiments automatically and without the burden of formatting files for different software, managing the actual computation, keeping track of activities, or rendering the structural outcomes graphically. To showcase the potential of this approach, the performance of five different docking programs on an HIV-1 protease test set is presented. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Non-linear molecular pattern classification using molecular beacons with multiple targets.
Lee, In-Hee; Lee, Seung Hwan; Park, Tai Hyun; Zhang, Byoung-Tak
2013-12-01
In vitro pattern classification has been highlighted as an important future application of DNA computing. Previous work has demonstrated the feasibility of linear classifiers using DNA-based molecular computing. However, complex tasks require non-linear classification capability. Here we design a molecular beacon that can interact with multiple targets and experimentally show that its fluorescent signals form a complex radial-basis function, enabling it to be used as a building block for non-linear molecular classification in vitro. The proposed method was successfully applied to solving artificial and real-world classification problems: XOR and microRNA expression patterns. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
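The role of the radial-basis response in making XOR separable can be illustrated in silico; the sketch below replaces beacon fluorescence with numeric Gaussian kernels centred on two input patterns, which is an analogy rather than a model of the chemistry, and the width and threshold are arbitrary choices.

    import numpy as np

    # XOR: not separable by any linear function of the raw inputs.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0])

    # Two radial-basis "beacon" responses centred on the two positive patterns.
    centers = np.array([[0, 1], [1, 0]], dtype=float)
    def rbf_features(X, centers, width=0.5):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * width ** 2))

    Phi = rbf_features(X, centers)
    # In RBF space a simple threshold on the summed response separates XOR.
    pred = (Phi.sum(axis=1) > 0.5).astype(int)
    print(pred, (pred == y).all())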
Tissue classification for laparoscopic image understanding based on multispectral texture analysis
NASA Astrophysics Data System (ADS)
Zhang, Yan; Wirkert, Sebastian J.; Iszatt, Justin; Kenngott, Hannes; Wagner, Martin; Mayer, Benjamin; Stock, Christian; Clancy, Neil T.; Elson, Daniel S.; Maier-Hein, Lena
2016-03-01
Intra-operative tissue classification is one of the prerequisites for providing context-aware visualization in computer-assisted minimally invasive surgeries. As many anatomical structures are difficult to differentiate in conventional RGB medical images, we propose a classification method based on multispectral image patches. In a comprehensive ex vivo study we show (1) that multispectral imaging data is superior to RGB data for organ tissue classification when used in conjunction with widely applied feature descriptors and (2) that combining the tissue texture with the reflectance spectrum improves the classification performance. Multispectral tissue analysis could thus evolve as a key enabling technique in computer-assisted laparoscopy.
NASA Technical Reports Server (NTRS)
Spruce, J. P.; Smoot, James; Ellis, Jean; Hilbert, Kent; Swann, Roberta
2012-01-01
This paper discusses the development and implementation of a geospatial data processing method and multi-decadal Landsat time series for computing general coastal U.S. land-use and land-cover (LULC) classifications and change products consisting of seven classes (water, barren, upland herbaceous, non-woody wetland, woody upland, woody wetland, and urban). Use of this approach extends the observational period of the NOAA-generated Coastal Change and Analysis Program (C-CAP) products by almost two decades, assuming the availability of one cloud free Landsat scene from any season for each targeted year. The Mobile Bay region in Alabama was used as a study area to develop, demonstrate, and validate the method that was applied to derive LULC products for nine dates at approximate five year intervals across a 34-year time span, using single dates of data for each classification in which forests were either leaf-on, leaf-off, or mixed senescent conditions. Classifications were computed and refined using decision rules in conjunction with unsupervised classification of Landsat data and C-CAP value-added products. Each classification's overall accuracy was assessed by comparing stratified random locations to available reference data, including higher spatial resolution satellite and aerial imagery, field survey data, and raw Landsat RGBs. Overall classification accuracies ranged from 83 to 91% with overall Kappa statistics ranging from 0.78 to 0.89. The accuracies are comparable to those from similar, generalized LULC products derived from C-CAP data. The Landsat MSS-based LULC product accuracies are similar to those from Landsat TM or ETM+ data. Accurate classifications were computed for all nine dates, yielding effective results regardless of season. This classification method yielded products that were used to compute LULC change products via additive GIS overlay techniques.
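The reported accuracy assessment rests on a confusion matrix of reference versus mapped classes; the helper below computes overall accuracy and the Kappa statistic from such a matrix, using an invented three-class example rather than the study's data.

    import numpy as np

    def overall_accuracy_and_kappa(cm):
        """Overall accuracy and Cohen's kappa from a confusion matrix whose
        rows are reference classes and columns are mapped classes."""
        cm = np.asarray(cm, dtype=float)
        n = cm.sum()
        po = np.trace(cm) / n                          # observed agreement
        pe = (cm.sum(0) * cm.sum(1)).sum() / n ** 2    # chance agreement
        return po, (po - pe) / (1 - pe)

    # Invented 3-class confusion matrix (e.g. water / upland / wetland).
    cm = [[50, 2, 3],
          [4, 45, 6],
          [1, 5, 40]]
    print(overall_accuracy_and_kappa(cm))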
Discriminative Hierarchical K-Means Tree for Large-Scale Image Classification.
Chen, Shizhi; Yang, Xiaodong; Tian, Yingli
2015-09-01
A key challenge in large-scale image classification is how to achieve efficiency in terms of both computation and memory without compromising classification accuracy. The learning-based classifiers achieve the state-of-the-art accuracies, but have been criticized for the computational complexity that grows linearly with the number of classes. The nonparametric nearest neighbor (NN)-based classifiers naturally handle large numbers of categories, but incur prohibitively expensive computation and memory costs. In this brief, we present a novel classification scheme, i.e., discriminative hierarchical K-means tree (D-HKTree), which combines the advantages of both learning-based and NN-based classifiers. The complexity of the D-HKTree only grows sublinearly with the number of categories, which is much better than the recent hierarchical support vector machines-based methods. The memory requirement is the order of magnitude less than the recent Naïve Bayesian NN-based approaches. The proposed D-HKTree classification scheme is evaluated on several challenging benchmark databases and achieves the state-of-the-art accuracies, while with significantly lower computation cost and memory requirement.
22 CFR 9.9 - Declassification and downgrading.
Code of Federal Regulations, 2010 CFR
2010-04-01
... occur: (1) After review of material in response to a Freedom of Information Act (FOIA) request, mandatory declassification review request, discovery request, subpoena, classification challenge, or other information access or declassification request; (2) After review as part of the Department's systematic...
NASA Technical Reports Server (NTRS)
Kettig, R. L.
1975-01-01
A method of classification of digitized multispectral images is developed and experimentally evaluated on actual earth resources data collected by aircraft and satellite. The method is designed to exploit the characteristic dependence between adjacent states of nature that is neglected by the more conventional simple-symmetric decision rule. Thus contextual information is incorporated into the classification scheme. The principal reason for doing this is to improve the accuracy of the classification. For general types of dependence this would generally require more computation per resolution element than the simple-symmetric classifier. But when the dependence occurs in the form of redundancy, the elements can be classified collectively, in groups, thereby reducing the number of classifications required.
Quality grading of Atlantic salmon (Salmo salar) by computer vision.
Misimi, E; Erikson, U; Skavhaug, A
2008-06-01
In this study, we present a promising method of computer vision-based quality grading of whole Atlantic salmon (Salmo salar). Using computer vision, it was possible to differentiate among different quality grades of Atlantic salmon based on the external geometrical information contained in the fish images. Initially, before the image acquisition, the fish were subjectively graded and labeled into grading classes by a qualified human inspector in the processing plant. Prior to classification, the salmon images were segmented into binary images, and then feature extraction was performed on the geometrical parameters of the fish from the grading classes. The classification algorithm was a threshold-based classifier, which was designed using linear discriminant analysis. The performance of the classifier was tested using the leave-one-out cross-validation method, and the classification results showed good agreement between the classification done by the human inspector and by the computer vision system. The computer vision-based method correctly classified 90% of the salmon in the data set, as compared with the classification by the human inspector. Overall, it was shown that computer vision can be used as a powerful tool to grade Atlantic salmon into quality grades in a fast and nondestructive manner with a relatively simple classifier algorithm. The low cost of implementation of today's advanced computer vision solutions makes this method feasible for industrial purposes in fish plants as it can replace manual labor, on which grading tasks still rely.
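A compact sketch of this design, an LDA-based threshold classifier evaluated with leave-one-out cross-validation, is given below; the four geometrical features and their values are invented stand-ins, not measurements from the study.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    # Invented stand-ins for geometrical features extracted from binary fish
    # images: [length/height ratio, normalized area, perimeter, bending measure].
    rng = np.random.default_rng(2)
    grade_a = rng.normal([4.8, 1.00, 1.0, 0.02], 0.05, size=(30, 4))
    grade_b = rng.normal([4.4, 0.92, 1.1, 0.08], 0.05, size=(30, 4))
    X = np.vstack([grade_a, grade_b])
    y = np.array([0] * 30 + [1] * 30)

    # Threshold-based classifier built by linear discriminant analysis,
    # evaluated with leave-one-out cross-validation as in the study design.
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
    print(acc.mean())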
ERIC Educational Resources Information Center
Hsu, Ting-Chia
2016-01-01
In this study, a peer assessment system using the grid-based knowledge classification approach was developed to improve students' performance during computer skills training. To evaluate the effectiveness of the proposed approach, an experiment was conducted in a computer skills certification course. The participants were divided into three…
Railroad classification yard technology : computer system methodology : case study : Potomac Yard
DOT National Transportation Integrated Search
1981-08-01
This report documents the application of the railroad classification yard computer system methodology to Potomac Yard of the Richmond, Fredericksburg, and Potomac Railroad Company (RF&P). This case study entailed evaluation of the yard traffic capaci...
Mathematical modeling for novel cancer drug discovery and development.
Zhang, Ping; Brusic, Vladimir
2014-10-01
Mathematical modeling enables the in silico classification of cancers, the prediction of disease outcomes, optimization of therapy, identification of promising drug targets and prediction of resistance to anticancer drugs. In silico pre-screened drug targets can be validated by a small number of carefully selected experiments. This review discusses the basics of mathematical modeling in cancer drug discovery and development. The topics include in silico discovery of novel molecular drug targets, optimization of immunotherapies, personalized medicine and guiding preclinical and clinical trials. Breast cancer has been used to demonstrate the applications of mathematical modeling in cancer diagnostics, the identification of high-risk populations, cancer screening strategies, prediction of tumor growth and guiding cancer treatment. Mathematical models are the key components of the toolkit used in the fight against cancer. The combinatorial complexity of new drug discovery is enormous, making systematic drug discovery by experimentation alone difficult, if not impossible. The biggest challenges include seamless integration of growing data, information and knowledge, and making them available for a multiplicity of analyses. Mathematical models are essential for bringing cancer drug discovery into the era of Omics, Big Data and personalized medicine.
Dinov, Ivo D
2016-01-01
Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniform handling and analyzing of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy and their hallmark will be 'team science'.
Macromolecular target prediction by self-organizing feature maps.
Schneider, Gisbert; Schneider, Petra
2017-03-01
Rational drug discovery would greatly benefit from a more nuanced appreciation of the activity of pharmacologically active compounds against a diverse panel of macromolecular targets. Already, computational target-prediction models assist medicinal chemists in library screening, de novo molecular design, optimization of active chemical agents, drug re-purposing, in the spotting of potential undesired off-target activities, and in the 'de-orphaning' of phenotypic screening hits. The self-organizing map (SOM) algorithm has been employed successfully for these and other purposes. Areas covered: The authors recapitulate contemporary artificial neural network methods for macromolecular target prediction, and present the basic SOM algorithm at a conceptual level. Specifically, they highlight consensus target-scoring by the employment of multiple SOMs, and discuss the opportunities and limitations of this technique. Expert opinion: Self-organizing feature maps represent a straightforward approach to ligand clustering and classification. Some of the appeal lies in their conceptual simplicity and broad applicability domain. Despite known algorithmic shortcomings, this computational target prediction concept has been proven to work in prospective settings with high success rates. It represents a prototypic technique for future advances in the in silico identification of the modes of action and macromolecular targets of bioactive molecules.
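For readers unfamiliar with the algorithm, a minimal self-organizing map training loop is sketched below; the grid size, learning-rate schedule, and the random descriptor vectors standing in for compound representations are illustrative choices, and the consensus scoring over multiple SOMs discussed in the review is not shown.

    import numpy as np

    def train_som(X, rows=10, cols=10, epochs=20, lr0=0.5, sigma0=3.0):
        # Minimal SOM: each unit has a weight vector pulled towards presented
        # samples, with neighbouring units on the grid pulled along with it.
        rng = np.random.default_rng(0)
        W = rng.normal(size=(rows, cols, X.shape[1]))
        grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                    indexing="ij"), axis=-1)
        for t in range(epochs):
            lr = lr0 * (1 - t / epochs)
            sigma = sigma0 * (1 - t / epochs) + 0.5
            for x in rng.permutation(X):
                d = np.linalg.norm(W - x, axis=2)
                bmu = np.unravel_index(np.argmin(d), d.shape)   # best-matching unit
                g = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=2)
                           / (2 * sigma ** 2))
                W += lr * g[..., None] * (x - W)
        return W

    # Illustrative stand-in for compound descriptor vectors (e.g. fingerprints).
    X = np.random.default_rng(1).normal(size=(100, 16))
    W = train_som(X)
    print(W.shape)  # trained map; similar compounds project onto nearby units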
Hidden relationships between metalloproteins unveiled by structural comparison of their metal sites
NASA Astrophysics Data System (ADS)
Valasatava, Yana; Andreini, Claudia; Rosato, Antonio
2015-03-01
Metalloproteins account for a substantial fraction of all proteins. They incorporate metal atoms, which are required for their structure and/or function. Here we describe a new computational protocol to systematically compare and classify metal-binding sites on the basis of their structural similarity. These sites are extracted from the MetalPDB database of minimal functional sites (MFSs) in metal-binding biological macromolecules. Structural similarity is measured by the scoring function of the available MetalS2 program. Hierarchical clustering was used to organize MFSs into clusters, for each of which a representative MFS was identified. The comparison of all representative MFSs provided a thorough structure-based classification of the sites analyzed. As examples, the application of the proposed computational protocol to all heme-binding proteins and zinc-binding proteins of known structure highlighted the existence of structural subtypes, validated known evolutionary links and shed new light on the occurrence of similar sites in systems at different evolutionary distances. The present approach thus makes available an innovative viewpoint on metalloproteins, where the functionally crucial metal sites effectively lead the discovery of structural and functional relationships in a largely protein-independent manner.
Paul, J T; Singh, A K; Dong, Z; Zhuang, H; Revard, B C; Rijal, B; Ashton, M; Linscheid, A; Blonsky, M; Gluhovic, D; Guo, J; Hennig, R G
2017-11-29
The discovery of two-dimensional (2D) materials comes at a time when computational methods are mature and can predict novel 2D materials, characterize their properties, and guide the design of 2D materials for applications. This article reviews the recent progress in computational approaches for 2D materials research. We discuss the computational techniques and provide an overview of the ongoing research in the field. We begin with an overview of known 2D materials, common computational methods, and available cyber infrastructures. We then move onto the discovery of novel 2D materials, discussing the stability criteria for 2D materials, computational methods for structure prediction, and interactions of monolayers with electrochemical and gaseous environments. Next, we describe the computational characterization of the 2D materials' electronic, optical, magnetic, and superconducting properties and the response of the properties under applied mechanical strain and electrical fields. From there, we move on to discuss the structure and properties of defects in 2D materials, and describe methods for 2D materials device simulations. We conclude by providing an outlook on the needs and challenges for future developments in the field of computational research for 2D materials.
Computational methods for 2D materials: discovery, property characterization, and application design
NASA Astrophysics Data System (ADS)
Paul, J. T.; Singh, A. K.; Dong, Z.; Zhuang, H.; Revard, B. C.; Rijal, B.; Ashton, M.; Linscheid, A.; Blonsky, M.; Gluhovic, D.; Guo, J.; Hennig, R. G.
2017-11-01
The discovery of two-dimensional (2D) materials comes at a time when computational methods are mature and can predict novel 2D materials, characterize their properties, and guide the design of 2D materials for applications. This article reviews the recent progress in computational approaches for 2D materials research. We discuss the computational techniques and provide an overview of the ongoing research in the field. We begin with an overview of known 2D materials, common computational methods, and available cyber infrastructures. We then move onto the discovery of novel 2D materials, discussing the stability criteria for 2D materials, computational methods for structure prediction, and interactions of monolayers with electrochemical and gaseous environments. Next, we describe the computational characterization of the 2D materials’ electronic, optical, magnetic, and superconducting properties and the response of the properties under applied mechanical strain and electrical fields. From there, we move on to discuss the structure and properties of defects in 2D materials, and describe methods for 2D materials device simulations. We conclude by providing an outlook on the needs and challenges for future developments in the field of computational research for 2D materials.
Applications of Support Vector Machine (SVM) Learning in Cancer Genomics
HUANG, SHUJUN; CAI, NIANGUANG; PACHECO, PEDRO PENZUTI; NARANDES, SHAVIRA; WANG, YANG; XU, WAYNE
2017-01-01
Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein we reviewed the recent progress of SVMs in cancer genomic studies. We intend to comprehend the strength of the SVM learning and its future perspective in cancer genomic applications. PMID:29275361
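A toy version of the workflow the review describes, margin-based classification of expression profiles with the weight vector reused as a crude biomarker ranking, might look as follows; the expression matrix and subtype labels are synthetic.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Synthetic gene-expression matrix: 80 tumours x 500 genes, two molecular
    # subtypes that differ in the first 20 genes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(80, 500))
    y = np.array([0] * 40 + [1] * 40)
    X[y == 1, :20] += 1.0

    # Margin-maximizing linear SVM; |coefficients| give a crude gene ranking.
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
    print(cross_val_score(clf, X, y, cv=5).mean())
    clf.fit(X, y)
    top_genes = np.argsort(np.abs(clf.named_steps["svc"].coef_[0]))[::-1][:10]
    print(top_genes)  # mostly indices below 20 in this synthetic example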
ERIC Educational Resources Information Center
Bohát, Róbert; Rödlingová, Beata; Horáková, Nina
2015-01-01
Corpus of High School Academic Texts (COHAT), currently of 150,000+ words, aims to make academic language instruction a more data-driven and student-centered discovery learning as a special type of Computer-Assisted Language Learning (CALL), emphasizing students' critical thinking and metacognition. Since 2013, high school English as an additional…
ERIC Educational Resources Information Center
Kunsting, Josef; Wirth, Joachim; Paas, Fred
2011-01-01
Using a computer-based scientific discovery learning environment on buoyancy in fluids we investigated the "effects of goal specificity" (nonspecific goals vs. specific goals) for two goal types (problem solving goals vs. learning goals) on "strategy use" and "instructional efficiency". Our empirical findings close an important research gap,…
A bioinformatics knowledge discovery in text application for grid computing
Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco
2009-01-01
Background A fundamental activity in biomedical research is Knowledge Discovery which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. Methods The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to: perform a user application model, process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available, their status and must be able to monitor the execution of parallel jobs. These operative requirements lead to design a middleware to be specialized using user application modules. It included a graphical user interface in order to access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs. Results A middleware solution prototype and the performance evaluation of it in terms of the speed-up factor is shown. It was written in JAVA on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. Conclusion In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example a computation of Knowledge Discovery in Database was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities. PMID:19534749
A bioinformatics knowledge discovery in text application for grid computing.
Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco
2009-06-16
A fundamental activity in biomedical research is Knowledge Discovery which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to: perform a user application model, process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available, their status and must be able to monitor the execution of parallel jobs. These operative requirements lead to design a middleware to be specialized using user application modules. It included a graphical user interface in order to access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs. A middleware solution prototype and the performance evaluation of it in terms of the speed-up factor is shown. It was written in JAVA on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example a computation of Knowledge Discovery in Database was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities.
NASA Astrophysics Data System (ADS)
Selwyn, Ebenezer Juliet; Florinabel, D. Jemi
2018-04-01
Compound image segmentation plays a vital role in the compression of computer screen images. Computer screen images are images that mix textual, graphical, and pictorial contents. In this paper, we present a comparison of two transform-based block classification approaches for compound images, using metrics such as classification speed, precision, and recall rate. Block-based classification approaches normally divide the compound image into fixed-size, non-overlapping blocks. A frequency transform such as the Discrete Cosine Transform (DCT) or the Discrete Wavelet Transform (DWT) is then applied to each block. The mean and standard deviation are computed for each 8 × 8 block and used as the feature set to classify blocks into text/graphics and picture/background. The classification accuracy of the block-classification-based segmentation techniques is measured by evaluation metrics such as precision and recall rate. Compound images with smooth backgrounds and complex-background images containing text of varying size, colour, and orientation are considered for testing. Experimental evidence shows that DWT-based segmentation improves recall and precision rates by approximately 2.3% over DCT-based segmentation, at the cost of an increase in block classification time, for both smooth and complex background images.
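A sketch of the DCT branch of this feature extraction is shown below. Using the spread of the AC coefficients as the discriminative statistic and thresholding it at 40 are choices made for this toy image, not parameters taken from the paper.

    import numpy as np
    from scipy.fftpack import dct

    def block_features(img, block=8):
        """Per-block features: mean of the 2-D DCT coefficients and the spread
        of the AC (non-DC) coefficients of each non-overlapping 8x8 block."""
        h, w = img.shape
        feats = []
        for r in range(0, h - h % block, block):
            for c in range(0, w - w % block, block):
                b = img[r:r + block, c:c + block].astype(float)
                coef = dct(dct(b, axis=0, norm="ortho"), axis=1, norm="ortho")
                ac = coef.ravel()[1:]          # drop the DC term
                feats.append((coef.mean(), ac.std()))
        return np.array(feats)

    # Toy compound image: a smooth gradient (picture-like) with a high-contrast
    # patch standing in for overlaid text.
    img = np.tile(np.linspace(0, 255, 64), (64, 1))
    img[8:24, 8:24] = np.random.default_rng(0).choice([0, 255], size=(16, 16))
    f = block_features(img)
    # A simple threshold on the coefficient spread separates text from picture blocks.
    print((f[:, 1] > 40).sum(), "blocks flagged as text/graphics")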
A taxonomy has been developed for outcomes in medical research to help improve knowledge discovery.
Dodd, Susanna; Clarke, Mike; Becker, Lorne; Mavergames, Chris; Fish, Rebecca; Williamson, Paula R
2018-04-01
There is increasing recognition that insufficient attention has been paid to the choice of outcomes measured in clinical trials. The lack of a standardized outcome classification system results in inconsistencies due to ambiguity and variation in how outcomes are described across different studies. Being able to classify by outcome would increase efficiency in searching sources such as clinical trial registries, patient registries, the Cochrane Database of Systematic Reviews, and the Core Outcome Measures in Effectiveness Trials (COMET) database of core outcome sets (COS), thus aiding knowledge discovery. A literature review was carried out to determine existing outcome classification systems, none of which were sufficiently comprehensive or granular for classification of all potential outcomes from clinical trials. A new taxonomy for outcome classification was developed, and as proof of principle, outcomes extracted from all published COS in the COMET database, selected Cochrane reviews, and clinical trial registry entries were classified using this new system. Application of this new taxonomy to COS in the COMET database revealed that 274/299 (92%) COS include at least one physiological outcome, whereas only 177 (59%) include at least one measure of impact (global quality of life or some measure of functioning) and only 105 (35%) made reference to adverse events. This outcome taxonomy will be used to annotate outcomes included in COS within the COMET database and is currently being piloted for use in Cochrane Reviews within the Cochrane Linked Data Project. Wider implementation of this standard taxonomy in trial and systematic review databases and registries will further promote efficient searching, reporting, and classification of trial outcomes. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
A new computer approach to mixed feature classification for forestry application
NASA Technical Reports Server (NTRS)
Kan, E. P.
1976-01-01
A computer approach for mapping mixed forest features (i.e., types, classes) from computer classification maps is discussed. Mixed features such as mixed softwood/hardwood stands are treated as admixtures of softwood and hardwood areas. Large-area mixed features are identified and small-area features neglected when the nominal size of a mixed feature can be specified. The computer program merges small isolated areas into surrounding areas by the iterative manipulation of the postprocessing algorithm that eliminates small connected sets. For a forestry application, computer-classified LANDSAT multispectral scanner data of the Sam Houston National Forest were used to demonstrate the proposed approach. The technique was successful in cleaning the salt-and-pepper appearance of multiclass classification maps and in mapping admixtures of softwood areas and hardwood areas. However, the computer-mapped mixed areas matched very poorly with the ground truth because of inadequate resolution and inappropriate definition of mixed features.
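The post-processing step described here, dissolving connected regions below a nominal size into their surroundings, can be sketched with scipy's connected-component labelling; the class map, the 5-pixel threshold, and the majority-neighbour merge rule are illustrative assumptions rather than the original program's exact logic.

    import numpy as np
    from scipy import ndimage

    def absorb_small_regions(class_map, min_size=5):
        """Merge connected regions smaller than `min_size` pixels into their
        most common neighbouring class, iterating until none remain."""
        out = class_map.copy()
        changed = True
        while changed:
            changed = False
            for cls in np.unique(out):
                labels, n = ndimage.label(out == cls)
                for region in range(1, n + 1):
                    mask = labels == region
                    if mask.sum() >= min_size:
                        continue
                    ring = ndimage.binary_dilation(mask) & ~mask
                    if ring.any():
                        out[mask] = np.bincount(out[ring]).argmax()
                        changed = True
        return out

    # Toy classification map: 0 = hardwood, 1 = softwood, with salt-and-pepper
    # noise to be absorbed into the surrounding class.
    rng = np.random.default_rng(0)
    cmap = np.zeros((20, 20), dtype=int)
    cmap[:, 10:] = 1
    noise = rng.random((20, 20)) < 0.05
    cmap[noise] = 1 - cmap[noise]
    print(np.abs(absorb_small_regions(cmap) - cmap).sum(), "pixels reassigned")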
Plurigon: three dimensional visualization and classification of high-dimensionality data
Martin, Bronwen; Chen, Hongyu; Daimon, Caitlin M.; Chadwick, Wayne; Siddiqui, Sana; Maudsley, Stuart
2013-01-01
High-dimensionality data is rapidly becoming the norm for biomedical sciences and many other analytical disciplines. Not only is the collection and processing time for such data becoming problematic, but it has become increasingly difficult to form a comprehensive appreciation of high-dimensionality data. Though data analysis methods for coping with multivariate data are well-documented in technical fields such as computer science, little effort is currently being expended to condense data vectors that exist beyond the realm of physical space into an easily interpretable and aesthetic form. To address this important need, we have developed Plurigon, a data visualization and classification tool for the integration of high-dimensionality visualization algorithms with a user-friendly, interactive graphical interface. Unlike existing data visualization methods, which are focused on an ensemble of data points, Plurigon places a strong emphasis upon the visualization of a single data point and its determining characteristics. Multivariate data vectors are represented in the form of a deformed sphere with a distinct topology of hills, valleys, plateaus, peaks, and crevices. The gestalt structure of the resultant Plurigon object generates an easily-appreciable model. User interaction with the Plurigon is extensive; zoom, rotation, axial and vector display, feature extraction, and anaglyph stereoscopy are currently supported. With Plurigon and its ability to analyze high-complexity data, we hope to see a unification of biomedical and computational sciences as well as practical applications in a wide array of scientific disciplines. Increased accessibility to the analysis of high-dimensionality data may increase the number of new discoveries and breakthroughs, ranging from drug screening to disease diagnosis to medical literature mining. PMID:23885241
Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field.
Wójcikowski, Maciej; Zielenkiewicz, Piotr; Siedlecki, Pawel
2015-01-01
There has been huge progress in the open cheminformatics field in both methods and software development. Unfortunately, there has been little effort to unite those methods and software into one package. We here describe the Open Drug Discovery Toolkit (ODDT), which aims to fulfill the need for comprehensive and open source drug discovery software. The Open Drug Discovery Toolkit was developed as a free and open source tool for both computer aided drug discovery (CADD) developers and researchers. ODDT reimplements many state-of-the-art methods, such as machine learning scoring functions (RF-Score and NNScore) and wraps other external software to ease the process of developing CADD pipelines. ODDT is an out-of-the-box solution designed to be easily customizable and extensible. Therefore, users are strongly encouraged to extend it and develop new methods. We here present three use cases for ODDT in common tasks in computer-aided drug discovery. Open Drug Discovery Toolkit is released on a permissive 3-clause BSD license for both academic and industrial use. ODDT's source code, additional examples and documentation are available on GitHub (https://github.com/oddt/oddt).
Discovering chemistry with an ab initio nanoreactor
Wang, Lee-Ping; Titov, Alexey; McGibbon, Robert; ...
2014-11-02
Chemical understanding is driven by the experimental discovery of new compounds and reactivity, and is supported by theory and computation that provides detailed physical insight. While theoretical and computational studies have generally focused on specific processes or mechanistic hypotheses, recent methodological and computational advances harken the advent of their principal role in discovery. Here we report the development and application of the ab initio nanoreactor – a highly accelerated, first-principles molecular dynamics simulation of chemical reactions that discovers new molecules and mechanisms without preordained reaction coordinates or elementary steps. Using the nanoreactor we show new pathways for glycine synthesis from primitive compounds proposed to exist on the early Earth, providing new insight into the classic Urey-Miller experiment. Ultimately, these results highlight the emergence of theoretical and computational chemistry as a tool for discovery in addition to its traditional role of interpreting experimental findings.
Discovering chemistry with an ab initio nanoreactor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Lee-Ping; Titov, Alexey; McGibbon, Robert
Chemical understanding is driven by the experimental discovery of new compounds and reactivity, and is supported by theory and computation that provides detailed physical insight. While theoretical and computational studies have generally focused on specific processes or mechanistic hypotheses, recent methodological and computational advances harken the advent of their principal role in discovery. Here we report the development and application of the ab initio nanoreactor – a highly accelerated, first-principles molecular dynamics simulation of chemical reactions that discovers new molecules and mechanisms without preordained reaction coordinates or elementary steps. Using the nanoreactor we show new pathways for glycine synthesis from primitive compounds proposed to exist on the early Earth, providing new insight into the classic Urey-Miller experiment. Ultimately, these results highlight the emergence of theoretical and computational chemistry as a tool for discovery in addition to its traditional role of interpreting experimental findings.
Scalable metagenomic taxonomy classification using a reference genome database
Ames, Sasha K.; Hysom, David A.; Gardner, Shea N.; Lloyd, G. Scott; Gokhale, Maya B.; Allen, Jonathan E.
2013-01-01
Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge. Results: A method is presented to shift computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data to show accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset was found to take <20 h on a single node 40 core large memory machine and provide new insights on the metagenomic contents of the sample. Availability: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat Contact: allen99@llnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23828782
NASA Astrophysics Data System (ADS)
Zhang, Zhoujian; Liu, Michael C.; Best, William M. J.; Magnier, Eugene A.; Aller, Kimberly M.; Chambers, K. C.; Draper, P. W.; Flewelling, H.; Hodapp, K. W.; Kaiser, N.; Kudritzki, R.-P.; Metcalfe, N.; Wainscoat, R. J.; Waters, C.
2018-05-01
We are conducting a proper-motion survey for young brown dwarfs in the Taurus-Auriga molecular cloud based on the Pan-STARRS1 3π Survey. Our search uses multi-band photometry and astrometry to select candidates, and is wider (370 deg2) and deeper (down to ≈3 M Jup) than previous searches. We present here our search methods and spectroscopic follow-up of our high-priority candidates. Since extinction complicates spectral classification, we have developed a new approach using low-resolution (R ≈ 100) near-infrared spectra to quantify reddening-free spectral types, extinctions, and gravity classifications for mid-M to late-L ultracool dwarfs (≲100–3 M Jup in Taurus). We have discovered 25 low-gravity (VL-G) and the first 11 intermediate-gravity (INT-G) substellar (M6–L1) members of Taurus, constituting the largest single increase of Taurus brown dwarfs to date. We have also discovered 1 new Pleiades member and 13 new members of the Perseus OB2 association, including a candidate very wide separation (58 kau) binary. We homogeneously reclassify the spectral types and extinctions of all previously known Taurus brown dwarfs. Altogether our discoveries have thus far increased the substellar census in Taurus by ≈40% and added three more L-type members (≲5–10 M Jup). Most notably, our discoveries reveal an older (>10 Myr) low-mass population in Taurus, in accord with recent studies of the higher-mass stellar members. The mass function appears to differ between the younger and older Taurus populations, possibly due to incompleteness of the older stellar members or different star formation processes.
Villanova, Federica; Di Meglio, Paola; Inokuma, Margaret; Aghaeepour, Nima; Perucha, Esperanza; Mollon, Jennifer; Nomura, Laurel; Hernandez-Fuentes, Maria; Cope, Andrew; Prevost, A Toby; Heck, Susanne; Maino, Vernon; Lord, Graham; Brinkman, Ryan R; Nestle, Frank O
2013-01-01
Discovery of novel immune biomarkers for monitoring of disease prognosis and response to therapy in immune-mediated inflammatory diseases is an important unmet clinical need. Here, we establish a novel framework for immunological biomarker discovery, comparing a conventional (liquid) flow cytometry platform (CFP) and a unique lyoplate-based flow cytometry platform (LFP) in combination with advanced computational data analysis. We demonstrate that LFP had higher sensitivity compared to CFP, with increased detection of cytokines (IFN-γ and IL-10) and activation markers (Foxp3 and CD25). Fluorescent intensity of cells stained with lyophilized antibodies was increased compared to cells stained with liquid antibodies. LFP, using a plate loader, allowed medium-throughput processing of samples with comparable intra- and inter-assay variability between platforms. Automated computational analysis identified novel immunophenotypes that were not detected with manual analysis. Our results establish a new flow cytometry platform for standardized and rapid immunological biomarker discovery with wide application to immune-mediated diseases.
Villanova, Federica; Di Meglio, Paola; Inokuma, Margaret; Aghaeepour, Nima; Perucha, Esperanza; Mollon, Jennifer; Nomura, Laurel; Hernandez-Fuentes, Maria; Cope, Andrew; Prevost, A. Toby; Heck, Susanne; Maino, Vernon; Lord, Graham; Brinkman, Ryan R.; Nestle, Frank O.
2013-01-01
Discovery of novel immune biomarkers for monitoring of disease prognosis and response to therapy in immune-mediated inflammatory diseases is an important unmet clinical need. Here, we establish a novel framework for immunological biomarker discovery, comparing a conventional (liquid) flow cytometry platform (CFP) and a unique lyoplate-based flow cytometry platform (LFP) in combination with advanced computational data analysis. We demonstrate that LFP had higher sensitivity compared to CFP, with increased detection of cytokines (IFN-γ and IL-10) and activation markers (Foxp3 and CD25). Fluorescent intensity of cells stained with lyophilized antibodies was increased compared to cells stained with liquid antibodies. LFP, using a plate loader, allowed medium-throughput processing of samples with comparable intra- and inter-assay variability between platforms. Automated computational analysis identified novel immunophenotypes that were not detected with manual analysis. Our results establish a new flow cytometry platform for standardized and rapid immunological biomarker discovery with wide application to immune-mediated diseases. PMID:23843942
Computational functional genomics-based approaches in analgesic drug discovery and repurposing.
Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred; Lötsch, Jörn
2018-06-01
Persistent pain is a major healthcare problem affecting a fifth of adults worldwide with still limited treatment options. The search for new analgesics increasingly includes the novel research area of functional genomics, which combines data derived from various processes related to DNA sequence, gene expression or protein function and uses advanced methods of data mining and knowledge discovery with the goal of understanding the relationship between the genome and the phenotype. Its use in drug discovery and repurposing for analgesic indications has so far been performed using knowledge discovery in gene function and drug target-related databases; next-generation sequencing; and functional proteomics-based approaches. Here, we discuss recent efforts in functional genomics-based approaches to analgesic drug discovery and repurposing and highlight the potential of computational functional genomics in this field including a demonstration of the workflow using a novel R library 'dbtORA'.
Wauters, Lauri D J; Miguel-Moragas, Joan San; Mommaerts, Maurice Y
2015-11-01
To gain insight into the methodology of different computer-aided design-computer-aided manufacturing (CAD-CAM) applications for the reconstruction of cranio-maxillo-facial (CMF) defects. We reviewed and analyzed the available literature pertaining to CAD-CAM for use in CMF reconstruction. We proposed a classification system of the techniques of implant and cutting, drilling, and/or guiding template design and manufacturing. The system consisted of 4 classes (I-IV). These classes combine techniques used for both the implant and template to most accurately describe the methodology used. Our classification system can be widely applied. It should facilitate communication and immediate understanding of the methodology of CAD-CAM applications for the reconstruction of CMF defects.
ERIC Educational Resources Information Center
Dalgarno, Barney; Kennedy, Gregor; Bennett, Sue
2014-01-01
Discovery-based learning designs incorporating active exploration are common within instructional software. However, researchers have highlighted empirical evidence showing that "pure" discovery learning is of limited value and strategies which reduce complexity and provide guidance to learners are important if potential learning…
A tutorial on the use of ROC analysis for computer-aided diagnostic systems.
Scheipers, Ulrich; Perrey, Christian; Siebers, Stefan; Hansen, Christian; Ermert, Helmut
2005-07-01
The application of the receiver operating characteristic (ROC) curve for computer-aided diagnostic systems is reviewed. A statistical framework is presented and different methods of evaluating the classification performance of computer-aided diagnostic systems, and, in particular, systems for ultrasonic tissue characterization, are derived. Most classifiers that are used today are dependent on a separation threshold, which can be chosen freely in many cases. The separation threshold separates the range of output values of the classification system into different target groups, thus conducting the actual classification process. In the first part of this paper, threshold specific performance measures, e.g., sensitivity and specificity, are presented. In the second part, a threshold-independent performance measure, the area under the ROC curve, is reviewed. Only the use of separation threshold-independent performance measures provides classification results that are overall representative for computer-aided diagnostic systems. The following text was motivated by the lack of a complete and definite discussion of the underlying subject in available textbooks, references and publications. Most manuscripts published so far address the theme of performance evaluation using ROC analysis in a manner too general to be practical for everyday use in the development of computer-aided diagnostic systems. Nowadays, the user of computer-aided diagnostic systems typically handles huge amounts of numerical data, not always distributed normally. Many assumptions made in more or less theoretical works on ROC analysis are no longer valid for real-life data. The paper aims at closing the gap between theoretical works and real-life data. The review provides the interested scientist with information needed to conduct ROC analysis and to integrate algorithms performing ROC analysis into classification systems while understanding the basic principles of classification.
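As an editorial illustration of the threshold-specific measures (sensitivity, specificity) and the threshold-independent area under the ROC curve discussed in this tutorial, the following Python sketch computes both from a small set of invented classifier scores and labels; the scores, labels and the chosen separation threshold are assumptions for demonstration only.

```python
import numpy as np

# Invented classifier outputs and ground truth (1 = diseased, 0 = healthy).
scores = np.array([0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10])
labels = np.array([1,    1,    0,    1,    1,    0,    0,    1,    0,    0])

def sens_spec(threshold):
    """Threshold-specific measures: sensitivity and specificity."""
    pred = scores >= threshold
    tp = np.sum(pred & (labels == 1)); fn = np.sum(~pred & (labels == 1))
    tn = np.sum(~pred & (labels == 0)); fp = np.sum(pred & (labels == 0))
    return tp / (tp + fn), tn / (tn + fp)

def auc():
    """Threshold-independent measure: area under the ROC curve (trapezoidal rule)."""
    thresholds = np.concatenate(([np.inf], np.sort(scores)[::-1], [-np.inf]))
    tpr, fpr = [], []
    for t in thresholds:
        se, sp = sens_spec(t)
        tpr.append(se); fpr.append(1 - sp)
    return np.trapz(tpr, fpr)

print(sens_spec(0.5))  # sensitivity and specificity at one separation threshold
print(auc())           # single summary figure over all thresholds
```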
Sandgren, Buster; Crafoord, Joakim; Garellick, Göran; Carlsson, Lars; Weidenhielm, Lars; Olivecrona, Henrik
2013-10-01
Digital radiographic images in the anterior-posterior and lateral views have been the gold standard for evaluation of peri-acetabular osteolysis in patients with an uncemented hip replacement. We compared digital radiographic images and computed tomography in the detection of peri-acetabular osteolysis and devised a classification system based on computed tomography. Digital radiographs were compared with computed tomography in 206 hips, with a mean follow-up of 10 years after surgery. The patients had no clinical signs of osteolysis and none were planned for revision surgery. On digital radiographs, 192 cases had no osteolysis and only 14 cases had osteolysis. When using computed tomography, 184 cases showed small or large osteolysis and only 22 patients had no osteolysis. A classification system for peri-acetabular osteolysis based on computed tomography is proposed that is easy to use in standard follow-up evaluation. Copyright © 2013 Elsevier Inc. All rights reserved.
Tartar, A; Akan, A; Kilic, N
2014-01-01
Computer-aided detection systems can help radiologists to detect pulmonary nodules at an early stage. In this paper, a novel Computer-Aided Diagnosis (CAD) system is proposed for the classification of pulmonary nodules as malignant or benign. The proposed CAD system, which uses ensemble learning classifiers, provides important support to radiologists during the diagnostic process and achieves high classification performance. The proposed approach with a bagging classifier results in classification sensitivities of 94.7 %, 90.0 % and 77.8 % for the benign, malignant and undetermined classes, respectively (89.5 % accuracy).
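A minimal sketch of the kind of bagging ensemble named above, using scikit-learn on synthetic three-class data standing in for benign/malignant/undetermined nodules; the feature set, sample sizes and parameters are invented and unrelated to the paper's actual CAD pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for nodule feature vectors; three classes mimic the
# benign / malignant / undetermined grouping (not the paper's data).
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

# Bagging ensemble (the default base estimator is a decision tree).
bagging = BaggingClassifier(n_estimators=50, random_state=0)
print("5-fold CV accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
```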
Classification of skin cancer images using local binary pattern and SVM classifier
NASA Astrophysics Data System (ADS)
Adjed, Faouzi; Faye, Ibrahima; Ababsa, Fakhreddine; Gardezi, Syed Jamal; Dass, Sarat Chandra
2016-11-01
In this paper, a classification method for melanoma and non-melanoma skin cancer images is presented using local binary patterns (LBP). The LBP operator computes local texture information from the skin cancer images, which is then used to compute statistical features capable of discriminating melanoma from non-melanoma skin tissues. A support vector machine (SVM) is applied to the feature matrix for classification into two skin image classes (malignant and benign). The method achieves a classification accuracy of 76.1% with a sensitivity of 75.6% and a specificity of 76.7%.
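The LBP-plus-SVM pipeline can be sketched as follows with scikit-image and scikit-learn; the random grey-level images, neighbourhood parameters and train/test split are placeholders, not the paper's lesion data or tuning.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
P, R = 8, 1  # LBP neighbourhood: 8 sampling points at radius 1

def lbp_histogram(image):
    """Summarise uniform LBP codes as a normalised histogram (texture feature)."""
    codes = local_binary_pattern(image, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

# Random grey-level images stand in for benign (0) and malignant (1) lesions.
images = rng.integers(0, 256, size=(60, 64, 64), dtype=np.uint8)
labels = rng.integers(0, 2, size=60)

X = np.array([lbp_histogram(img) for img in images])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```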
Choi, Hyungwon; Kim, Sinae; Fermin, Damian; Tsou, Chih-Chiang; Nesvizhskii, Alexey I
2015-11-03
We introduce QPROT, a statistical framework and computational tool for differential protein expression analysis using protein intensity data. QPROT is an extension of the QSPEC suite, originally developed for spectral count data, adapted for the analysis of continuously measured protein-level intensity data. QPROT offers a new intensity normalization procedure and model-based differential expression analysis, both of which account for missing data. Determination of differential expression for each protein is based on a standardized Z-statistic derived from the posterior distribution of the log fold change parameter, guided by the false discovery rate estimated by a well-known Empirical Bayes method. We evaluated the classification performance of QPROT using the quantification calibration data from the clinical proteomic technology assessment for cancer (CPTAC) study and a recently published Escherichia coli benchmark dataset, with evaluation of FDR accuracy in the latter. QPROT is a statistical framework with a computational software tool for comparative quantitative proteomics analysis. It features various extensions of the QSPEC method originally built for spectral count data analysis, including probabilistic treatment of missing values in protein intensity data. With the increasing popularity of label-free quantitative proteomics data, the proposed method and accompanying software suite will be immediately useful for many proteomics laboratories. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.
Magrabi, Farah; Ong, Mei-Sing; Runciman, William; Coiera, Enrico
2010-01-01
To analyze patient safety incidents associated with computer use to develop the basis for a classification of problems reported by health professionals. Incidents submitted to a voluntary incident reporting database across one Australian state were retrieved and a subset (25%) was analyzed to identify 'natural categories' for classification. Two coders independently classified the remaining incidents into one or more categories. Free text descriptions were analyzed to identify contributing factors. Where available medical specialty, time of day and consequences were examined. Descriptive statistics; inter-rater reliability. A search of 42,616 incidents from 2003 to 2005 yielded 123 computer related incidents. After removing duplicate and unrelated incidents, 99 incidents describing 117 problems remained. A classification with 32 types of computer use problems was developed. Problems were grouped into information input (31%), transfer (20%), output (20%) and general technical (24%). Overall, 55% of problems were machine related and 45% were attributed to human-computer interaction. Delays in initiating and completing clinical tasks were a major consequence of machine related problems (70%) whereas rework was a major consequence of human-computer interaction problems (78%). While 38% (n=26) of the incidents were reported to have a noticeable consequence but no harm, 34% (n=23) had no noticeable consequence. Only 0.2% of all incidents reported were computer related. Further work is required to expand our classification using incident reports and other sources of information about healthcare IT problems. Evidence based user interface design must focus on the safe entry and retrieval of clinical information and support users in detecting and correcting errors and malfunctions.
Contextual classification of multispectral image data: Approximate algorithm
NASA Technical Reports Server (NTRS)
Tilton, J. C. (Principal Investigator)
1980-01-01
An approximation to a classification algorithm incorporating spatial context information in a general, statistical manner is presented which is computationally less intensive. Classifications that are nearly as accurate are produced.
Ali, Salman; Qaisar, Saad Bin; Saeed, Husnain; Khan, Muhammad Farhan; Naeem, Muhammad; Anpalagan, Alagan
2015-03-25
The synergy of computational and physical network components leading to the Internet of Things, Data and Services has been made feasible by the use of Cyber Physical Systems (CPSs). CPS engineering promises to impact system condition monitoring for a diverse range of fields from healthcare, manufacturing, and transportation to aerospace and warfare. CPS for environment monitoring applications completely transforms human-to-human, human-to-machine and machine-to-machine interactions with the use of Internet Cloud. A recent trend is to gain assistance from mergers between virtual networking and physical actuation to reliably perform all conventional and complex sensing and communication tasks. Oil and gas pipeline monitoring provides a novel example of the benefits of CPS, providing a reliable remote monitoring platform to leverage environment, strategic and economic benefits. In this paper, we evaluate the applications and technical requirements for seamlessly integrating CPS with sensor network plane from a reliability perspective and review the strategies for communicating information between remote monitoring sites and the widely deployed sensor nodes. Related challenges and issues in network architecture design and relevant protocols are also provided with classification. This is supported by a case study on implementing reliable monitoring of oil and gas pipeline installations. Network parameters like node-discovery, node-mobility, data security, link connectivity, data aggregation, information knowledge discovery and quality of service provisioning have been reviewed.
Ali, Salman; Qaisar, Saad Bin; Saeed, Husnain; Farhan Khan, Muhammad; Naeem, Muhammad; Anpalagan, Alagan
2015-01-01
The synergy of computational and physical network components leading to the Internet of Things, Data and Services has been made feasible by the use of Cyber Physical Systems (CPSs). CPS engineering promises to impact system condition monitoring for a diverse range of fields from healthcare, manufacturing, and transportation to aerospace and warfare. CPS for environment monitoring applications completely transforms human-to-human, human-to-machine and machine-to-machine interactions with the use of Internet Cloud. A recent trend is to gain assistance from mergers between virtual networking and physical actuation to reliably perform all conventional and complex sensing and communication tasks. Oil and gas pipeline monitoring provides a novel example of the benefits of CPS, providing a reliable remote monitoring platform to leverage environment, strategic and economic benefits. In this paper, we evaluate the applications and technical requirements for seamlessly integrating CPS with sensor network plane from a reliability perspective and review the strategies for communicating information between remote monitoring sites and the widely deployed sensor nodes. Related challenges and issues in network architecture design and relevant protocols are also provided with classification. This is supported by a case study on implementing reliable monitoring of oil and gas pipeline installations. Network parameters like node-discovery, node-mobility, data security, link connectivity, data aggregation, information knowledge discovery and quality of service provisioning have been reviewed. PMID:25815444
NASA Astrophysics Data System (ADS)
Ciany, Charles M.; Zurawski, William; Kerfoot, Ian
2001-10-01
The performance of Computer Aided Detection/Computer Aided Classification (CAD/CAC) fusion algorithms on side-scan sonar images was evaluated using data taken at the Navy's Fleet Battle Exercise-Hotel held in Panama City, Florida, in August 2000. A 2-of-3 binary fusion algorithm is shown to provide robust performance. The algorithm accepts the classification decisions and associated contact locations from three different CAD/CAC algorithms, clusters the contacts based on Euclidean distance, and then declares a valid target when a clustered contact is declared by at least 2 of the 3 individual algorithms. This simple binary fusion provided a 96 percent probability of correct classification at a false alarm rate of 0.14 false alarms per image per side. The performance represented a 3.8:1 reduction in false alarms over the best performing single CAD/CAC algorithm, with no loss in probability of correct classification.
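The 2-of-3 fusion rule described above lends itself to a short sketch: cluster contact locations by Euclidean distance and declare a target when at least two of the three algorithms contribute to a cluster. The contact coordinates and the clustering radius below are invented; this is not the deployed CAD/CAC fusion code.

```python
import numpy as np

# Contact locations (x, y) reported by three hypothetical CAD/CAC algorithms.
contacts = {
    "alg1": np.array([[10.0, 12.0], [40.0, 41.0]]),
    "alg2": np.array([[10.5, 11.5], [80.0, 15.0]]),
    "alg3": np.array([[11.0, 12.5]]),
}
CLUSTER_RADIUS = 3.0  # invented clustering distance threshold

def fuse_2_of_3(contacts, radius=CLUSTER_RADIUS):
    """Greedy Euclidean clustering; declare a target when >= 2 algorithms agree."""
    points = [(alg, p) for alg, pts in contacts.items() for p in pts]
    used, targets = set(), []
    for i, (alg_i, p_i) in enumerate(points):
        if i in used:
            continue
        cluster = {i}
        for j, (alg_j, p_j) in enumerate(points):
            if j != i and j not in used and np.linalg.norm(p_i - p_j) <= radius:
                cluster.add(j)
        algs = {points[k][0] for k in cluster}
        if len(algs) >= 2:  # the 2-of-3 binary fusion rule
            targets.append(np.mean([points[k][1] for k in cluster], axis=0))
            used |= cluster
    return targets

print(fuse_2_of_3(contacts))  # one fused target near (10.5, 12.0)
```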
Estimating Classification Consistency and Accuracy for Cognitive Diagnostic Assessment
ERIC Educational Resources Information Center
Cui, Ying; Gierl, Mark J.; Chang, Hua-Hua
2012-01-01
This article introduces procedures for the computation and asymptotic statistical inference for classification consistency and accuracy indices specifically designed for cognitive diagnostic assessments. The new classification indices can be used as important indicators of the reliability and validity of classification results produced by…
Knowledge Discovery as an Aid to Organizational Creativity.
ERIC Educational Resources Information Center
Siau, Keng
2000-01-01
This article presents the concept of knowledge discovery, a process of searching for associations in large volumes of computer data, as an aid to creativity. It then discusses the various techniques in knowledge discovery. Mednick's associative theory of creative thought serves as the theoretical foundation for this research. (Contains…
ERIC Educational Resources Information Center
Scott, William L.; Denton, Ryan E.; Marrs, Kathleen A.; Durrant, Jacob D.; Samaritoni, J. Geno; Abraham, Milata M.; Brown, Stephen P.; Carnahan, Jon M.; Fischer, Lindsey G.; Glos, Courtney E.; Sempsrott, Peter J.; O'Donnell, Martin J.
2015-01-01
The Distributed Drug Discovery (D3) program trains students in three drug discovery disciplines (synthesis, computational analysis, and biological screening) while addressing the important challenge of discovering drug leads for neglected diseases. This article focuses on implementation of the synthesis component in the second-semester…
Scenario Educational Software: Design and Development of Discovery Learning.
ERIC Educational Resources Information Center
Keegan, Mark
This book shows how and why the computer is so well suited to producing discovery learning environments. An examination of the literature outlines four basic modes of instruction: didactic, Socratic, inquiry, and discovery. Research from the fields of education, psychology, and physiology is presented to demonstrate the many strengths of…
XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Mid-year report FY17 Q2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth D.; Pugmire, David; Rogers, David
The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.
XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Year-end report FY17.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth D.; Pugmire, David; Rogers, David
The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.
XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem. Mid-year report FY16 Q2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth D.; Sewell, Christopher; Childs, Hank
The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.
XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Year-end report FY15 Q4.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth D.; Sewell, Christopher; Childs, Hank
The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.
Kumar, Akhil; Tiwari, Ashish; Sharma, Ashok
2018-03-15
Alzheimer disease (AD) is now considered a multifactorial neurodegenerative disorder; its prevalence is increasing at an alarming rate and it is associated with a high death rate. Because of the multifactorial nature of the disease, the one-target-one-ligand hypothesis cannot provide a complete solution for AD, and one-target-one-drug strategies have so far failed to deliver better treatments. Moreover, currently available treatments are limited and most of the upcoming treatments under clinical trials are based on modulating a single target. Current AD drug discovery research is therefore shifting towards new approaches that simultaneously modulate more than one target in the neurodegenerative cascade. This can be achieved by network pharmacology, multi-modal therapies, multifaceted approaches, and/or the more recently proposed "multi-targeted designed drugs" (MTDLs). Drug discovery projects are tedious, costly and long term, and multi-target AD drug discovery adds extra challenges, such as achieving good binding affinity of ligands for multiple targets, optimal ADME/T properties, little or no off-target side effects, and crossing of the blood-brain barrier. These hurdles may be addressed by in silico methods, which offer efficient solutions in less time and at lower cost, as computational methods have been successfully applied to single-target drug discovery projects. Here we summarize some of the most prominent and computationally explored single targets against AD, and we discuss successful examples of dual or multiple inhibitors for the same targets. Moreover, we focus on ligand- and structure-based computational approaches to design MTDLs against AD. Balancing dual activity in a single molecule is not an easy task, but computational approaches such as virtual screening, docking, QSAR, simulation and free energy methods are useful for future MTDL drug discovery, alone or in combination with fragment-based methods. Rational and logical implementation of computational drug design methods is capable of assisting AD drug discovery and plays an important role in optimizing multi-target drug discovery. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
1984-12-01
Prepared for the Air Force Office of Scientific Research under Grant No. AFOSR 82-0322, December 1984. Unclassified; approved for public release, distribution unlimited. (Only the report documentation page was recovered; the abstract text is not available.)
An Automated Slide Classification System at Georgia Tech
ERIC Educational Resources Information Center
LoPresti, Maryellen
1973-01-01
The Georgia Tech Architecture Library slide collection is being revolutionized by adapting the Santa Cruz Slide Classification System. The slide catalog record is being transferred inexpensively to tapes and updated by the computer. Computer programs print out indexes in any of fifteen different sort fields. (Author)
Naïve and Robust: Class-Conditional Independence in Human Classification Learning
ERIC Educational Resources Information Center
Jarecki, Jana B.; Meder, Björn; Nelson, Jonathan D.
2018-01-01
Humans excel in categorization. Yet from a computational standpoint, learning a novel probabilistic classification task involves severe computational challenges. The present paper investigates one way to address these challenges: assuming class-conditional independence of features. This feature independence assumption simplifies the inference…
[The clinical and X-ray classification of osteonecrosis of the lower jaw].
Medvedev, Iu A; Basin, E M; Sokolina, I A
2013-01-01
To elaborate a clinical and X-ray classification of osteonecrosis of the lower jaw in people with desomorphine or pervitin addiction. Ninety-two patients with drug addiction who had undergone orthopantomography, direct frontal X-ray of the skull, and multislice computed tomography, followed by multiplanar and three-dimensional imaging reconstruction, were examined. One hundred thirty-four X-ray films and 74 computed tomographic images were analyzed. The authors proposed a clinical and X-ray classification of osteonecrosis of the lower jaw in people with desomorphine or pervitin addiction and elaborated recommendations for surgical interventions on the basis of the developed classification. The developed clinical and X-ray classification and recommendations for surgical interventions may be used to treat osteonecroses of various etiology.
Analysis of A Drug Target-based Classification System using Molecular Descriptors.
Lu, Jing; Zhang, Pin; Bi, Yi; Luo, Xiaomin
2016-01-01
Drug-target interaction is an important topic in drug discovery and drug repositioning. The KEGG database offers drug annotation and classification using a target-based classification system. In this study, we investigated five target-based classes: (I) G protein-coupled receptors; (II) Nuclear receptors; (III) Ion channels; (IV) Enzymes; (V) Pathogens, using molecular descriptors to represent each drug compound. Two popular feature selection methods, maximum relevance minimum redundancy and incremental feature selection, were adopted to extract the important descriptors. Meanwhile, an optimal prediction model based on the nearest neighbor algorithm was constructed, which achieved the best result in identifying drug target-based classes. Finally, some key descriptors were discussed to uncover their important roles in the identification of drug-target classes.
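A rough stand-in for the described pipeline, using scikit-learn: rank molecular descriptors with a mutual-information filter (substituting for mRMR, which scikit-learn does not provide), grow the descriptor set incrementally, and score a nearest-neighbour classifier by cross-validation; the descriptors and class labels are synthetic, not the KEGG data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic "molecular descriptors" and five drug target-based classes.
X, y = make_classification(n_samples=400, n_features=30, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

# Relevance ranking (a simple stand-in for an mRMR ranking).
ranking = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]

# Incremental feature selection: grow the descriptor set, keep the best subset.
best = (0.0, 0)
for n in range(1, len(ranking) + 1):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=1),
                            X[:, ranking[:n]], y, cv=5).mean()
    best = max(best, (score, n))
print("best accuracy %.3f with %d descriptors" % best)
```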
Applications of Support Vector Machine (SVM) Learning in Cancer Genomics.
Huang, Shujun; Cai, Nianguang; Pacheco, Pedro Penzuti; Narrandes, Shavira; Wang, Yang; Xu, Wayne
2018-01-01
Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein we reviewed the recent progress of SVMs in cancer genomic studies. We intend to comprehend the strength of the SVM learning and its future perspective in cancer genomic applications. Copyright© 2018, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
Wavelet-based multicomponent denoising on GPU to improve the classification of hyperspectral images
NASA Astrophysics Data System (ADS)
Quesada-Barriuso, Pablo; Heras, Dora B.; Argüello, Francisco; Mouriño, J. C.
2017-10-01
Supervised classification allows handling a wide range of remote sensing hyperspectral applications. Enhancing the spatial organization of the pixels over the image has proven to be beneficial for the interpretation of the image content, thus increasing the classification accuracy. Denoising in the spatial domain of the image has been shown to be a technique that enhances the structures in the image. This paper proposes a multi-component denoising approach in order to increase the classification accuracy when a classification method is applied. It is computed on multicore CPUs and NVIDIA GPUs. The method combines feature extraction based on a 1D discrete wavelet transform (DWT) applied in the spectral dimension followed by an Extended Morphological Profile (EMP) and a classifier (SVM or ELM). The multi-component noise reduction is applied to the EMP just before the classification. The denoising recursively applies a separable 2D DWT, after which the number of wavelet coefficients is reduced by using a threshold. Finally, inverse 2D DWT filters are applied to reconstruct the noise-free original component. The computational cost of the classifiers, as well as the cost of the whole classification chain, is high, but it is reduced to achieve real-time behavior for some applications through computation on NVIDIA multi-GPU platforms.
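The denoising step can be sketched with PyWavelets: a separable 2D DWT, soft-thresholding of the detail coefficients, and inverse reconstruction. The wavelet, decomposition level and threshold below are assumptions, and the random array merely stands in for one EMP component.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
component = rng.normal(size=(128, 128))  # stands in for one EMP component

def dwt_denoise(img, wavelet="db4", level=2, threshold=0.5):
    """Separable 2D DWT, soft-threshold the detail coefficients, reconstruct."""
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    denoised = [coeffs[0]] + [
        tuple(pywt.threshold(d, threshold, mode="soft") for d in details)
        for details in coeffs[1:]
    ]
    return pywt.waverec2(denoised, wavelet)

clean = dwt_denoise(component)
print(component.shape, clean.shape)
```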
False Discovery Control in Large-Scale Spatial Multiple Testing
Sun, Wenguang; Reich, Brian J.; Cai, T. Tony; Guindani, Michele; Schwartzman, Armin
2014-01-01
Summary This article develops a unified theoretical and computational framework for false discovery control in multiple testing of spatial signals. We consider both point-wise and cluster-wise spatial analyses, and derive oracle procedures which optimally control the false discovery rate, false discovery exceedance and false cluster rate, respectively. A data-driven finite approximation strategy is developed to mimic the oracle procedures on a continuous spatial domain. Our multiple testing procedures are asymptotically valid and can be effectively implemented using Bayesian computational algorithms for analysis of large spatial data sets. Numerical results show that the proposed procedures lead to more accurate error control and better power performance than conventional methods. We demonstrate our methods for analyzing the time trends in tropospheric ozone in eastern US. PMID:25642138
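As a point of reference for the conventional (non-spatial) procedures the article compares against, the following sketch implements the standard Benjamini-Hochberg step-up rule on simulated p-values; it is not the article's oracle or spatially adaptive procedure, and the simulated signal fraction is invented.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated p-values: 950 nulls (uniform) plus 50 signals (very small p-values).
pvals = np.concatenate([rng.uniform(size=950), rng.uniform(0, 1e-3, size=50)])

def benjamini_hochberg(p, q=0.05):
    """Step-up BH procedure controlling the false discovery rate at level q."""
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

print(benjamini_hochberg(pvals).sum(), "discoveries at FDR level 0.05")
```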
MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning
Yang, Jie; Huang, Yuan; Xu, Lixiong; Li, Siguang; Qi, Man
2015-01-01
Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation especially when the size of data is large. Nowadays, big data has received a momentum from both industry and academia. To fulfill the potentials of ANNs for big data applications, the computation process must be speeded up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model to facilitate data intensive applications. Three data intensive scenarios are considered in the parallelization process in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation. PMID:26681933
MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning.
Liu, Yang; Yang, Jie; Huang, Yuan; Xu, Lixiong; Li, Siguang; Qi, Man
2015-01-01
Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation especially when the size of data is large. Nowadays, big data has received a momentum from both industry and academia. To fulfill the potentials of ANNs for big data applications, the computation process must be speeded up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model to facilitate data intensive applications. Three data intensive scenarios are considered in the parallelization process in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation.
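The MapReduce pattern used here, a map step that computes per-shard gradients and a reduce step that aggregates them, can be sketched in plain Python as below. The single-layer logistic model, shard count, learning rate and synthetic data are toy assumptions; the paper parallelizes full neural networks on a MapReduce cluster rather than in a single process.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1200, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)  # synthetic binary labels

def map_step(shard_X, shard_y, w):
    """Mapper: gradient of the logistic loss on one data shard."""
    p = 1.0 / (1.0 + np.exp(-shard_X @ w))
    return shard_X.T @ (p - shard_y) / len(shard_y)

def reduce_step(gradients):
    """Reducer: average the per-shard gradients."""
    return np.mean(gradients, axis=0)

w = np.zeros(X.shape[1])
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))  # 4 "workers"
for _ in range(200):                                            # training iterations
    grads = [map_step(sx, sy, w) for sx, sy in shards]          # map (parallelisable)
    w -= 1.0 * reduce_step(grads)                               # reduce + update
print("training accuracy:", ((X @ w > 0) == y).mean())
```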
Telling Active Learning Pedagogies Apart: From Theory to Practice
ERIC Educational Resources Information Center
Cattaneo, Kelsey Hood
2017-01-01
Designing learning environments to incorporate active learning pedagogies is difficult as definitions are often contested and intertwined. This article seeks to determine whether classification of active learning pedagogies (i.e., project-based, problem-based, inquiry-based, case-based, and discovery-based), through theoretical and practical…
A roadmap for natural product discovery based on large-scale genomics and metabolomics
USDA-ARS?s Scientific Manuscript database
Actinobacteria encode a wealth of natural product biosynthetic gene clusters, whose systematic study is complicated by numerous repetitive motifs. By combining several metrics we developed a method for global classification of these gene clusters into families (GCFs) and analyzed the biosynthetic ca...
ERIC Educational Resources Information Center
Khan, Zeenath Reza
2014-01-01
A year after the primary study that tested the impact of introducing blended learning and guided discovery to help teach computer application to business students, this paper looks into the continued success of using guided discovery and blended learning with learning management system in and out of classrooms to enhance student learning.…
Applications of artificial neural network in AIDS research and therapy.
Sardari, S; Sardari, D
2002-01-01
In recent years considerable effort has been devoted to applying pattern recognition techniques to the complex task of data analysis in drug research. Artificial neural network (ANN) methodology is a modeling method with a great ability to adapt to a new situation, or to control an unknown system, using data acquired in previous experiments. In this paper, a brief history of ANNs, the basic concepts behind the computing, the mathematical and algorithmic formulation of each of the techniques, and their developmental background are presented. Based on the abilities of ANNs in pattern recognition and in estimating system outputs from known inputs, the neural network can be considered a tool for molecular data analysis and interpretation. Analysis by neural networks improves classification accuracy and data quantification and reduces the number of analogues necessary for correct classification of biologically active compounds. Conformational analysis, quantifying the components in mixtures using NMR spectra, aqueous solubility prediction and structure-activity correlation are among the reported applications of ANNs as a new modeling method. Ranging from drug design and discovery to structure and dosage form design, the potential pharmaceutical applications of the ANN methodology are significant. In the areas of clinical monitoring, utilization of molecular simulation and design of bioactive structures, ANNs make it possible to study health and disease status and bring the predicted chemotherapeutic response closer to reality.
Debugging Techniques Used by Experienced Programmers to Debug Their Own Code.
1990-09-01
Davis, and Schultz (1987) also compared experts and novices, but focused on the way a computer program is represented cognitively and how that... of theories in the emerging computer programming domain (Fisher, 1987). In protocol analysis, subjects are asked to talk/think aloud as they solve...
Computational Sensing and in vitro Classification of GMOs and Biomolecular Events
2008-12-01
Elebeoba May, Miler T. Lee, Patricia Dolan, Paul Crozier ...modified organisms (GMOs) in the presence of non-lethal agents. Using an information and coding-theoretic framework we develop a de novo method for... high throughput screening, distinguishing genetically modified organisms (GMOs), molecular computing, differentiating biological markers
Report on Information Retrieval and Library Automation Studies.
ERIC Educational Resources Information Center
Alberta Univ., Edmonton. Dept. of Computing Science.
Short abstracts of works in progress or completed in the Department of Computing Science at the University of Alberta are presented under five major headings. The five categories are: Storage and search techniques for document data bases, Automatic classification, Study of indexing and classification languages through computer manipulation of data…
AstroCV: Astronomy computer vision library
NASA Astrophysics Data System (ADS)
González, Roberto E.; Muñoz, Roberto P.; Hernández, Cristian A.
2018-04-01
AstroCV processes and analyzes big astronomical datasets, and is intended to provide a community repository of high performance Python and C++ algorithms used for image processing and computer vision. The library offers methods for object recognition, segmentation and classification, with emphasis in the automatic detection and classification of galaxies.
Noise tolerant dendritic lattice associative memories
NASA Astrophysics Data System (ADS)
Ritter, Gerhard X.; Schmalz, Mark S.; Hayden, Eric; Tucker, Marc
2011-09-01
Linear classifiers based on computation over the real numbers R (e.g., with operations of addition and multiplication), denoted by (R, +, ×), have been represented extensively in the literature of pattern recognition. However, a different approach to pattern classification involves the use of addition, maximum, and minimum operations over the reals in the algebra (R, +, maximum, minimum). These pattern classifiers, based on lattice algebra, have been shown to exhibit superior information storage capacity, fast training and short convergence times, high pattern classification accuracy, and low computational cost. Such attributes are not always found, for example, in classical neural nets based on the linear inner product. In a special type of lattice associative memory (LAM), called a dendritic LAM or DLAM, it is possible to achieve noise-tolerant pattern classification by varying the design of noise or error acceptance bounds. This paper presents theory and algorithmic approaches for the computation of noise-tolerant lattice associative memories (LAMs) under a variety of input constraints. Of particular interest is the classification of nonergodic data in noise regimes with time-varying statistics. DLAMs, which are a specialization of LAMs derived from concepts of biological neural networks, have successfully been applied to pattern classification from hyperspectral remote sensing data, as well as spatial object recognition from digital imagery. The authors' recent research in the development of DLAMs is overviewed, with experimental results that show utility for a wide variety of pattern classification applications. Performance results are presented in terms of measured computational cost, noise tolerance, classification accuracy, and throughput for a variety of input data and noise levels.
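A toy illustration of the lattice-algebra idea, sketching a plain (non-dendritic) lattice auto-associative memory: the memory stores pairwise min-differences of the patterns and recalls with a max-plus product, so stored patterns are fixed points. The patterns are invented, and the sketch omits the dendritic structure and noise-acceptance bounds developed in the paper.

```python
import numpy as np

# Stored patterns (rows): a toy lattice auto-associative memory over (R, +, max, min).
X = np.array([[0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0],
              [1.0, 2.0, 0.0]])

def build_W(patterns):
    """W[i, j] = min over stored patterns of (x_i - x_j)."""
    diffs = patterns[:, :, None] - patterns[:, None, :]  # shape (k, n, n)
    return diffs.min(axis=0)

def recall(W, x):
    """Max-plus recall: y_i = max_j (W[i, j] + x_j)."""
    return (W + x[None, :]).max(axis=1)

W = build_W(X)
for x in X:
    print(recall(W, x), "recalls", x)  # stored patterns are fixed points
```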
Computationally guided discovery of thermoelectric materials
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorai, Prashun; Stevanović, Vladan; Toberer, Eric S.
The potential for advances in thermoelectric materials, and thus solid-state refrigeration and power generation, is immense. Progress so far has been limited by both the breadth and diversity of the chemical space and the serial nature of experimental work. In this Review, we discuss how recent computational advances are revolutionizing our ability to predict electron and phonon transport and scattering, as well as materials dopability, and we examine efficient approaches to calculating critical transport properties across large chemical spaces. When coupled with experimental feedback, these high-throughput approaches can stimulate the discovery of new classes of thermoelectric materials. Within smaller materials subsets, computations can guide the optimal chemical and structural tailoring to enhance materials performance and provide insight into the underlying transport physics. Beyond perfect materials, computations can be used for the rational design of structural and chemical modifications (such as defects, interfaces, dopants and alloys) to provide additional control on transport properties to optimize performance. Through computational predictions for both materials searches and design, a new paradigm in thermoelectric materials discovery is emerging.
Computational Discovery of Materials Using the Firefly Algorithm
NASA Astrophysics Data System (ADS)
Avendaño-Franco, Guillermo; Romero, Aldo
Our current ability to model physical phenomena accurately, the increase in computational power and better algorithms are the driving forces behind the computational discovery and design of novel materials, allowing for virtual characterization before their realization in the laboratory. We present the implementation of a novel firefly algorithm, a population-based algorithm for global optimization, for searching the structure/composition space. This novel computation-intensive approach naturally takes advantage of concurrency and targeted exploration while still keeping enough diversity. We apply the new method to both periodic and non-periodic structures and we present the implementation challenges and solutions to improve efficiency. The implementation makes use of computational materials databases and network analysis to optimize the search and get insights about the geometric structure of local minima on the energy landscape. The method has been implemented in our software PyChemia, an open-source package for materials discovery. We acknowledge the support of DMREF-NSF 1434897 and the Donors of the American Chemical Society Petroleum Research Fund for partial support of this research under Contract 54075-ND10.
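A generic firefly optimizer is sketched below, minimizing a toy sphere function as a stand-in for an energy evaluation; the population size, attractiveness and randomization parameters are textbook defaults, and this is not the PyChemia implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    """Toy objective standing in for an energy evaluation."""
    return np.sum(x ** 2)

def firefly_minimize(f, dim=3, n_fireflies=15, iters=100,
                     alpha=0.2, beta0=1.0, gamma=1.0, bounds=(-5.0, 5.0)):
    """Basic firefly algorithm: dimmer fireflies move toward brighter (lower f) ones."""
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(n_fireflies, dim))
    fitness = np.array([f(x) for x in pop])
    for _ in range(iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if fitness[j] < fitness[i]:              # j is brighter
                    r2 = np.sum((pop[i] - pop[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)   # attractiveness decays with distance
                    pop[i] += beta * (pop[j] - pop[i]) + alpha * (rng.random(dim) - 0.5)
                    pop[i] = np.clip(pop[i], lo, hi)
                    fitness[i] = f(pop[i])
    best = np.argmin(fitness)
    return pop[best], fitness[best]

print(firefly_minimize(sphere))
```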
Computationally guided discovery of thermoelectric materials
Gorai, Prashun; Stevanović, Vladan; Toberer, Eric S.
2017-08-22
The potential for advances in thermoelectric materials, and thus solid-state refrigeration and power generation, is immense. Progress so far has been limited by both the breadth and diversity of the chemical space and the serial nature of experimental work. In this Review, we discuss how recent computational advances are revolutionizing our ability to predict electron and phonon transport and scattering, as well as materials dopability, and we examine efficient approaches to calculating critical transport properties across large chemical spaces. When coupled with experimental feedback, these high-throughput approaches can stimulate the discovery of new classes of thermoelectric materials. Within smaller materialsmore » subsets, computations can guide the optimal chemical and structural tailoring to enhance materials performance and provide insight into the underlying transport physics. Beyond perfect materials, computations can be used for the rational design of structural and chemical modifications (such as defects, interfaces, dopants and alloys) to provide additional control on transport properties to optimize performance. Through computational predictions for both materials searches and design, a new paradigm in thermoelectric materials discovery is emerging.« less
Ong, Mei-Sing; Runciman, William; Coiera, Enrico
2010-01-01
Objective To analyze patient safety incidents associated with computer use to develop the basis for a classification of problems reported by health professionals. Design Incidents submitted to a voluntary incident reporting database across one Australian state were retrieved and a subset (25%) was analyzed to identify ‘natural categories’ for classification. Two coders independently classified the remaining incidents into one or more categories. Free text descriptions were analyzed to identify contributing factors. Where available medical specialty, time of day and consequences were examined. Measurements Descriptive statistics; inter-rater reliability. Results A search of 42 616 incidents from 2003 to 2005 yielded 123 computer related incidents. After removing duplicate and unrelated incidents, 99 incidents describing 117 problems remained. A classification with 32 types of computer use problems was developed. Problems were grouped into information input (31%), transfer (20%), output (20%) and general technical (24%). Overall, 55% of problems were machine related and 45% were attributed to human–computer interaction. Delays in initiating and completing clinical tasks were a major consequence of machine related problems (70%) whereas rework was a major consequence of human–computer interaction problems (78%). While 38% (n=26) of the incidents were reported to have a noticeable consequence but no harm, 34% (n=23) had no noticeable consequence. Conclusion Only 0.2% of all incidents reported were computer related. Further work is required to expand our classification using incident reports and other sources of information about healthcare IT problems. Evidence based user interface design must focus on the safe entry and retrieval of clinical information and support users in detecting and correcting errors and malfunctions. PMID:20962128
Pulsar discovery by global volunteer computing.
Knispel, B; Allen, B; Cordes, J M; Deneva, J S; Anderson, D; Aulbert, C; Bhat, N D R; Bock, O; Bogdanov, S; Brazier, A; Camilo, F; Champion, D J; Chatterjee, S; Crawford, F; Demorest, P B; Fehrmann, H; Freire, P C C; Gonzalez, M E; Hammer, D; Hessels, J W T; Jenet, F A; Kasian, L; Kaspi, V M; Kramer, M; Lazarus, P; van Leeuwen, J; Lorimer, D R; Lyne, A G; Machenschalk, B; McLaughlin, M A; Messenger, C; Nice, D J; Papa, M A; Pletsch, H J; Prix, R; Ransom, S M; Siemens, X; Stairs, I H; Stappers, B W; Stovall, K; Venkataraman, A
2010-09-10
Einstein@Home aggregates the computer power of hundreds of thousands of volunteers from 192 countries to mine large data sets. It has now found a 40.8-hertz isolated pulsar in radio survey data from the Arecibo Observatory taken in February 2007. Additional timing observations indicate that this pulsar is likely a disrupted recycled pulsar. PSR J2007+2722's pulse profile is remarkably wide with emission over almost the entire spin period; the pulsar likely has closely aligned magnetic and spin axes. The massive computing power provided by volunteers should enable many more such discoveries.
Warth, A
2015-11-01
Tumor diagnostics are based on histomorphology, immunohistochemistry and molecular pathological analysis of mutations, translocations and amplifications which are of diagnostic, prognostic and/or predictive value. In recent decades only histomorphology was used to classify lung cancer as either small cell (SCLC) or non-small cell lung cancer (NSCLC), although NSCLC was further subdivided into different entities; however, as no specific therapy options were available, classification of specific subtypes was not clinically meaningful. This fundamentally changed with the discovery of specific molecular alterations in adenocarcinoma (ADC), e.g. mutations in KRAS, EGFR and BRAF or translocations of the ALK and ROS1 gene loci, which now form the basis of targeted therapies and have led to a significantly improved patient outcome. The diagnostic, prognostic and predictive value of imaging, morphological, immunohistochemical and molecular characteristics, as well as their interaction, was systematically assessed in a large cohort with available clinical data including patient survival. Specific and sensitive diagnostic markers and marker panels were defined and diagnostic test algorithms for predictive biomarker assessment were optimized. It was demonstrated that the semi-quantitative assessment of ADC growth patterns is a stage-independent predictor of survival and is reproducibly applicable in the routine setting. Specific histomorphological characteristics correlated with computed tomography (CT) imaging features and thus allowed an improved interdisciplinary classification, especially in the preoperative or palliative setting. Moreover, specific molecular characteristics, for example BRAF mutations and the proliferation index (Ki-67), were identified as clinically relevant prognosticators. Comprehensive clinical, morphological, immunohistochemical and molecular assessment of NSCLCs allows an optimized patient stratification. Respective algorithms now form the backbone of the 2015 lung cancer World Health Organization (WHO) classification.
Materials discovery guided by data-driven insights
NASA Astrophysics Data System (ADS)
Klintenberg, Mattias
As computational power continues to grow, systematic computational exploration has become an important tool for materials discovery. In this presentation the Electronic Structure Project (ESP/ELSA) will be discussed and a number of examples presented that show some of the capabilities of a data-driven methodology for guiding materials discovery. These examples include topological insulators, detector materials and 2D materials. ESP/ELSA is an initiative that dates back to 2001 and today contains many tens of thousands of materials that have been investigated using a robust and high-accuracy electronic structure method (all-electron FP-LMTO), thus providing basic first-principles materials data for most inorganic compounds that have been structurally characterized. The web-site containing the ESP/ELSA data has as of today been accessed from more than 4,000 unique computers from all around the world.
Successful applications of computer aided drug discovery: moving drugs from concept to the clinic.
Talele, Tanaji T; Khedkar, Santosh A; Rigby, Alan C
2010-01-01
Drug discovery and development is an interdisciplinary, expensive and time-consuming process. Scientific advancements during the past two decades have changed the way pharmaceutical research generates novel bioactive molecules. Advances in computational techniques and in parallel hardware support have enabled in silico methods, and in particular structure-based drug design methods, to speed up the drug discovery process from new target selection through the identification of hits to the optimization of lead compounds. This review is focused on the clinical status of experimental drugs that were discovered and/or optimized using computer-aided drug design. We have provided a historical account detailing the development of 12 small molecules (Captopril, Dorzolamide, Saquinavir, Zanamivir, Oseltamivir, Aliskiren, Boceprevir, Nolatrexed, TMI-005, LY-517717, Rupintrivir and NVP-AUY922) that are in clinical trials or have been approved for therapeutic use.
Discovery and Classification of Nova in M81 : P60-M81-090213
NASA Astrophysics Data System (ADS)
Kasliwal, M. M.; Cenko, S. B.; Ofek, E. O.; Quimby, R.; Rau, A.; Kulkarni, S. R.
2009-02-01
On UT 2009 Feb 13.404, P60-FasTING (Palomar 60-inch Fast Transients In Nearby Galaxies) discovered a nova in M81 at RA(J2000) = 09:55:35.96 and DEC(J2000) = +69:01:51.0, offset from the nucleus by 15"E, 124"S. P60-M81-090213 had a brightness of g=20.1 +/- 0.1 at discovery (photometric calibration wrt SDSS catalog) and faded to g=21.5 on Feb 20.237. It was not detected prior to discovery, with a 3-sigma upper limit of g > 21.9 on Feb 3.262.
Discovery and Classification of Nova in M81 : P60-M81-080925
NASA Astrophysics Data System (ADS)
Kasliwal, M. M.; Quimby, R.; Cenko, S. B.; Rau, A.; Ofek, E. O.; Kulkarni, S. R.
2008-10-01
On UT 2008 Sep 25.49, P60-FasTING (Palomar 60-inch Fast Transients In Nearby Galaxies) discovered a nova in M81 at RA(J2000) = 09:55:59.35, DEC(J2000) = +69:05:57.1, offset from the nucleus by 2.35'E, 2.03'N. P60-M81-080925 had a brightness of g=19.5 +/- 0.1 at discovery (photometric calibration wrt SDSS catalog) and faded to g=20.8 on Oct 4.49. It was not detected prior to discovery, with a 3-sigma upper limit of g > 21.5 on Jul 9.19.
NASA Technical Reports Server (NTRS)
Huck, F. O.; Davis, R. E.; Fales, C. L.; Aherron, R. M.
1982-01-01
A computational model of the deterministic and stochastic processes involved in remote sensing is used to study spectral feature identification techniques for real-time onboard processing of data acquired with advanced earth-resources sensors. Preliminary results indicate that narrow spectral responses are advantageous; that signal normalization improves mean-square distance (MSD) classification accuracy but tends to degrade maximum-likelihood (MLH) classification accuracy; and that MSD classification of normalized signals performs better than the computationally more complex MLH classification when imaging conditions change appreciably from those under which the reference data were acquired. The results also indicate that autonomous categorization of TM signals into vegetation, bare land, water, snow and clouds can be accomplished with adequate reliability for many applications over a reasonably wide range of imaging conditions. However, further analysis is required to develop computationally efficient boundary approximation algorithms for such categorization.
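To make the mean-square-distance idea concrete, the following minimal Python sketch (not from the paper; the reference spectra and band values are invented) classifies a pixel spectrum by choosing the class whose normalized reference spectrum has the smallest mean-square distance.

```python
import numpy as np

def normalize(signal):
    """Scale a spectral signal to unit energy, reducing sensitivity to overall brightness."""
    return signal / np.linalg.norm(signal)

def msd_classify(pixel, class_means, use_normalization=True):
    """Assign a pixel spectrum to the class whose reference spectrum
    has the smallest mean-square distance."""
    if use_normalization:
        pixel = normalize(pixel)
        class_means = {name: normalize(mu) for name, mu in class_means.items()}
    distances = {name: np.mean((pixel - mu) ** 2) for name, mu in class_means.items()}
    return min(distances, key=distances.get)

# Hypothetical 4-band reference spectra for three categories.
references = {
    "vegetation": np.array([0.05, 0.08, 0.06, 0.45]),
    "water":      np.array([0.06, 0.05, 0.03, 0.01]),
    "bare_land":  np.array([0.20, 0.25, 0.28, 0.32]),
}
print(msd_classify(np.array([0.04, 0.07, 0.05, 0.40]), references))  # -> "vegetation"
```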
Dewey: How to Make It Work for You
ERIC Educational Resources Information Center
Panzer, Michael
2013-01-01
As knowledge brokers, librarians are living in interesting times for themselves and libraries. It causes them to wonder sometimes if the traditional tools like the Dewey Decimal Classification (DDC) system can cope with the onslaught of information. The categories provided do not always seem adequate for the knowledge-discovery habits of…
Text Mining in Organizational Research
Kobayashi, Vladimer B.; Berkers, Hannah A.; Kismihók, Gábor; Den Hartog, Deanne N.
2017-01-01
Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to the analysis of (usually) voluminous textual data, helps accelerate knowledge discovery by radically increasing the amount of data that can be analyzed. This article aims to acquaint organizational researchers with the fundamental logic underpinning text mining, the analytical stages involved, and contemporary techniques that may be used to achieve different types of objectives. The specific analytical techniques reviewed are (a) dimensionality reduction, (b) distance and similarity computing, (c) clustering, (d) topic modeling, and (e) classification. We describe how text mining may extend contemporary organizational research by allowing the testing of existing or new research questions with data that are likely to be rich, contextualized, and ecologically valid. After an exploration of how evidence for the validity of text mining output may be generated, we conclude the article by illustrating the text mining process in a job analysis setting using a dataset composed of job vacancies. PMID:29881248
Text Mining in Organizational Research.
Kobayashi, Vladimer B; Mol, Stefan T; Berkers, Hannah A; Kismihók, Gábor; Den Hartog, Deanne N
2018-07-01
Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to the analysis of (usually) voluminous textual data, helps accelerate knowledge discovery by radically increasing the amount of data that can be analyzed. This article aims to acquaint organizational researchers with the fundamental logic underpinning text mining, the analytical stages involved, and contemporary techniques that may be used to achieve different types of objectives. The specific analytical techniques reviewed are (a) dimensionality reduction, (b) distance and similarity computing, (c) clustering, (d) topic modeling, and (e) classification. We describe how text mining may extend contemporary organizational research by allowing the testing of existing or new research questions with data that are likely to be rich, contextualized, and ecologically valid. After an exploration of how evidence for the validity of text mining output may be generated, we conclude the article by illustrating the text mining process in a job analysis setting using a dataset composed of job vacancies.
NASA Astrophysics Data System (ADS)
Robson, Barry; Li, Jin; Dettinger, Richard; Peters, Amanda; Boyer, Stephen K.
2011-05-01
A patent database of 6.7 million compounds generated by a very high performance computer (Blue Gene) requires new techniques for exploitation when extensive use of chemical similarity is involved. Such exploitation includes the taxonomic classification of chemical themes and data mining to assess mutual information between themes and companies. Importantly, we also launch candidates that evolve by "natural selection", judged by failure of a partial match against the patent database (i.e., novelty) and by their ability to bind to the protein target appropriately, as assessed by simulation on Blue Gene. An unusual feature of our method is that algorithms and workflows rely on dynamic interaction between match-and-edit instructions, which in practice are regular expressions. Similarity testing by these uses SMILES strings and, less frequently, graph or connectivity representations. Examining how this performs at high throughput, we note that chemical similarity and novelty are human concepts that largely derive their meaning from utility in specific contexts. For some purposes, mutual information involving chemical themes might be a better concept.
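As an illustration of "match-and-edit instructions as regular expressions over SMILES strings", here is a small hedged Python sketch; the rules, the toy patent entry and the novelty test are illustrative assumptions, not the authors' actual workflow.

```python
import re

# Illustrative "match-and-edit" instructions expressed as regular expressions over
# SMILES strings: each rule matches a substructure pattern and applies an edit.
match_and_edit_rules = [
    (re.compile(r"C\(=O\)O"), "C(=O)N"),     # e.g. turn a carboxylic-acid-like motif into an amide
    (re.compile(r"c1ccccc1"), "c1ccncc1"),   # e.g. swap a benzene ring for a pyridine
]

def apply_rules(smiles):
    """Generate candidate variants by applying each match-and-edit rule once."""
    candidates = []
    for pattern, replacement in match_and_edit_rules:
        if pattern.search(smiles):
            candidates.append(pattern.sub(replacement, smiles, count=1))
    return candidates

def is_novel(smiles, patent_smiles):
    """A crude novelty test: the candidate fails a full string match
    against every SMILES string in the patent set."""
    return smiles not in patent_smiles

patent_db = {"CC(=O)Oc1ccccc1C(=O)O"}          # aspirin, as a stand-in patent entry
for candidate in apply_rules("CC(=O)Oc1ccccc1C(=O)O"):
    print(candidate, "novel" if is_novel(candidate, patent_db) else "known")
```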
A note from history: Landmarks in history of cancer, Part 6.
Hajdu, Steven I; Vadmal, Manjunath
2013-12-01
In the 3 decades from 1940 to 1970, the United States became the nucleus for research, diagnosis, and treatment of cancer. The discovery of anticancer drugs, and the clinical demonstration that chemotherapy and radiation can cure cancer and prevent its recurrence, were incontrovertibly the most remarkable groundbreaking events. Consequently, the trend of less surgery and more multimodality therapy began. The introduction of radioautography, mammography, ultrasonography, computed tomography, the Papanicolaou smear, and other novel laboratory tests furthered early detection of cancer and refined accurate diagnosis. The unequivocal linking of lung cancer to cigarette smoking made medical history. The delineation of the potential role of oncogenes adduced new thoughts about oncogenesis and cancer prevention, and pathologists finalized the classification and nosology of tumors. Finally, it is worth noting that although more advances were made in the detection, diagnosis, and treatment of cancers than in any other period in history, the overall mortality rate of patients with cancer remained high and unchanged. © 2013 American Cancer Society.
Shaikh, Faiq; Franc, Benjamin; Allen, Erastus; Sala, Evis; Awan, Omer; Hendrata, Kenneth; Halabi, Safwan; Mohiuddin, Sohaib; Malik, Sana; Hadley, Dexter; Shrestha, Rasu
2018-03-01
Enterprise imaging has channeled various technological innovations to the field of clinical radiology, ranging from advanced imaging equipment and postacquisition iterative reconstruction tools to image analysis and computer-aided detection tools. More recently, the advancement in the field of quantitative image analysis coupled with machine learning-based data analytics, classification, and integration has ushered in the era of radiomics, a paradigm shift that holds tremendous potential in clinical decision support as well as drug discovery. However, there are important issues to consider to incorporate radiomics into a clinically applicable system and a commercially viable solution. In this two-part series, we offer insights into the development of the translational pipeline for radiomics from methodology to clinical implementation (Part 1) and from that point to enterprise development (Part 2). In Part 2 of this two-part series, we study the components of the strategy pipeline, from clinical implementation to building enterprise solutions. Copyright © 2017 American College of Radiology. Published by Elsevier Inc. All rights reserved.
Sorich, Michael J; McKinnon, Ross A; Miners, John O; Winkler, David A; Smith, Paul A
2004-10-07
This study aimed to evaluate in silico models based on quantum chemical (QC) descriptors derived using the electronegativity equalization method (EEM) and to assess the use of QC properties to predict chemical metabolism by human UDP-glucuronosyltransferase (UGT) isoforms. Various EEM-derived QC molecular descriptors were calculated for known UGT substrates and nonsubstrates. Classification models were developed using support vector machine and partial least squares discriminant analysis. In general, the most predictive models were generated with the support vector machine. Combining QC and 2D descriptors (from previous work) using a consensus approach resulted in a statistically significant improvement in predictivity (to 84%) over both the QC and 2D models and the other methods of combining the descriptors. EEM-derived QC descriptors were shown to be both highly predictive and computationally efficient. It is likely that EEM-derived QC properties will be generally useful for predicting ADMET and physicochemical properties during drug discovery.
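The following is a minimal Python sketch of the kind of consensus modeling described: one SVM trained on (synthetic) QC descriptors, one on (synthetic) 2D descriptors, with their class probabilities averaged. The data, dimensions and threshold are assumptions for illustration only, not the authors' models.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)                      # substrate (1) vs non-substrate (0)
X_qc = rng.normal(size=(n, 10)) + y[:, None]   # synthetic EEM-derived QC descriptors
X_2d = rng.normal(size=(n, 15)) + y[:, None]   # synthetic 2D descriptors

qc_model = SVC(kernel="rbf", probability=True).fit(X_qc[:150], y[:150])
d2_model = SVC(kernel="rbf", probability=True).fit(X_2d[:150], y[:150])

# Consensus: average the class probabilities of the two descriptor-specific models.
p = (qc_model.predict_proba(X_qc[150:])[:, 1] + d2_model.predict_proba(X_2d[150:])[:, 1]) / 2
consensus_pred = (p >= 0.5).astype(int)
print("consensus accuracy:", (consensus_pred == y[150:]).mean())
```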
The importance of employing computational resources for the automation of drug discovery.
Rosales-Hernández, Martha Cecilia; Correa-Basurto, José
2015-03-01
The application of computational tools to drug discovery helps researchers to design and evaluate new drugs swiftly and with reduced economic resources. To discover new potential drugs, computational chemistry incorporates automation for obtaining biological data such as absorption, distribution, metabolism, excretion and toxicity (ADMET), as well as drug mechanisms of action. This editorial looks at examples of these computational tools, including docking, molecular dynamics simulation, virtual screening, quantum chemistry, quantitative structure-activity relationships, principal component analysis and drug screening workflow systems. The authors then provide their perspectives on the importance of these techniques for drug discovery. Computational tools help researchers to design and discover new drugs for the treatment of several human diseases without side effects, thus allowing for the evaluation of millions of compounds with a reduced cost in both time and economic resources. The problem is that operating each program is difficult; one is required to use several programs and understand each of the properties being tested. In the future, it is possible that a single computer and software program will be capable of evaluating the complete properties (mechanisms of action and ADMET properties) of ligands. It is also possible that after submitting one target, this computer software will be capable of suggesting potential compounds along with ways to synthesize them, and presenting biological models for testing.
Computer-aided drug design at Boehringer Ingelheim
NASA Astrophysics Data System (ADS)
Muegge, Ingo; Bergner, Andreas; Kriegl, Jan M.
2017-03-01
Computer-Aided Drug Design (CADD) is an integral part of the drug discovery endeavor at Boehringer Ingelheim (BI). CADD contributes to the evaluation of new therapeutic concepts, identifies small molecule starting points for drug discovery, and develops strategies for optimizing hit and lead compounds. The CADD scientists at BI benefit from the global use and development of both software platforms and computational services. A number of computational techniques developed in-house have significantly changed the way early drug discovery is carried out at BI. In particular, virtual screening in vast chemical spaces, which can be accessed by combinatorial chemistry, has added a new option for the identification of hits in many projects. Recently, a new framework has been implemented allowing fast, interactive predictions of relevant on- and off-target endpoints and other optimization parameters. In addition to the introduction of this new framework at BI, CADD has been focusing on enabling medicinal chemists to independently perform an increasing amount of molecular modeling and design work. This is made possible through the deployment of MOE as a global modeling platform, allowing computational and medicinal chemists to freely share ideas and modeling results. Furthermore, a central communication layer called the computational chemistry framework provides broad access to predictive models and other computational services.
Using the iPlant collaborative discovery environment.
Oliver, Shannon L; Lenards, Andrew J; Barthelson, Roger A; Merchant, Nirav; McKay, Sheldon J
2013-06-01
The iPlant Collaborative is an academic consortium whose mission is to develop an informatics and social infrastructure to address the "grand challenges" in plant biology. Its cyberinfrastructure supports the computational needs of the research community and facilitates solving major challenges in plant science. The Discovery Environment provides a powerful and rich graphical interface to the iPlant Collaborative cyberinfrastructure by creating an accessible virtual workbench that enables all levels of expertise, ranging from students to traditional biology researchers and computational experts, to explore, analyze, and share their data. By providing access to iPlant's robust data-management system and high-performance computing resources, the Discovery Environment also creates a unified space in which researchers can access scalable tools. Researchers can use available Applications (Apps) to execute analyses on their data, as well as customize or integrate their own tools to better meet the specific needs of their research. These Apps can also be used in workflows that automate more complicated analyses. This module describes how to use the main features of the Discovery Environment, using bioinformatics workflows for high-throughput sequence data as examples. © 2013 by John Wiley & Sons, Inc.
Computational Tools for Allosteric Drug Discovery: Site Identification and Focus Library Design.
Huang, Wenkang; Nussinov, Ruth; Zhang, Jian
2017-01-01
Allostery is an intrinsic phenomenon of biological macromolecules involving regulation and/or signal transduction induced by a ligand binding to an allosteric site distinct from a molecule's active site. Allosteric drugs are currently receiving increased attention in drug discovery because drugs that target allosteric sites can provide important advantages over the corresponding orthosteric drugs, including specific subtype selectivity within receptor families. Consequently, targeting allosteric sites, instead of orthosteric sites, can reduce drug-related side effects and toxicity. On the downside, allosteric drug discovery can be more challenging than traditional orthosteric drug discovery due to difficulties in determining the locations of allosteric sites and in designing drugs based on these sites, and because the allosteric effects must propagate through the structure, reach the ligand-binding site and elicit a conformational change. In this study, we present computational tools ranging from the identification of potential allosteric sites to the design of "allosteric-like" modulator libraries. These tools may be particularly useful for allosteric drug discovery.
ERIC Educational Resources Information Center
Gnambs, Timo; Batinic, Bernad
2011-01-01
Computer-adaptive classification tests focus on classifying respondents in different proficiency groups (e.g., for pass/fail decisions). To date, adaptive classification testing has been dominated by research on dichotomous response formats and classifications in two groups. This article extends this line of research to polytomous classification…
Using Computational Text Classification for Qualitative Research and Evaluation in Extension
ERIC Educational Resources Information Center
Smith, Justin G.; Tissing, Reid
2018-01-01
This article introduces a process for computational text classification that can be used in a variety of qualitative research and evaluation settings. The process leverages supervised machine learning based on an implementation of a multinomial Bayesian classifier. Applied to a community of inquiry framework, the algorithm was used to identify…
A Survey on the Feasibility of Sound Classification on Wireless Sensor Nodes
Salomons, Etto L.; Havinga, Paul J. M.
2015-01-01
Wireless sensor networks are well suited to gaining context awareness for indoor environments. As sound waves form a rich source of context information, equipping the nodes with microphones can be of great benefit. The algorithms to extract features from sound waves are often highly computationally intensive. This can be problematic as wireless nodes are usually restricted in resources. In order to be able to make a proper decision about which features to use, we survey how sound is used in the literature for global sound classification, age and gender classification, emotion recognition, person verification and identification, and indoor and outdoor environmental sound classification. The results of the surveyed algorithms are compared with respect to accuracy and computational load. The accuracies are taken from the surveyed papers; the computational loads are determined by benchmarking the algorithms on an actual sensor node. We conclude that for indoor context awareness, the low-cost algorithms for feature extraction perform as well as the more computationally intensive variants. As the feature extraction still requires a large amount of processing time, we present four possible strategies to deal with this problem. PMID:25822142
CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules.
Cestarelli, Valerio; Fiscon, Giulia; Felici, Giovanni; Bertolazzi, Paola; Weitschek, Emanuel
2016-03-01
Nowadays, knowledge extraction methods for Next Generation Sequencing data are in high demand. In this work, we focus on RNA-seq gene expression analysis and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State-of-the-art algorithms compute a single classification model that contains few features (genes). On the contrary, our goal is to elicit a higher amount of knowledge by computing many classification models, and therefore to identify most of the genes related to the predicted class. We propose CAMUR, a new method that extracts multiple and equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set of the genes present in the rules, iteratively eliminates those combinations from the data set, and performs the classification procedure again until a stopping criterion is verified. CAMUR includes an ad-hoc knowledge repository (database) and a querying tool. We analyze three different types of RNA-seq data sets (Breast, Head and Neck, and Stomach Cancer) from The Cancer Genome Atlas (TCGA) and we validate CAMUR and its models also on non-TCGA data. Our experimental results show the efficacy of CAMUR: we obtain several reliable equivalent classification models, from which the most frequent genes, their relationships, and their relation with a particular cancer are deduced. Availability: dmb.iasi.cnr.it/camur.php. Contact: emanuel@iasi.cnr.it. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
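A minimal Python sketch of the iterative idea behind CAMUR follows, using a shallow decision tree as a stand-in for the rule-based classifier and simple feature elimination instead of the full power-set bookkeeping and knowledge database; all names, data and thresholds are illustrative, not the published implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def iterative_rule_models(X, y, feature_names, max_models=5, min_accuracy=0.8):
    """Repeatedly fit a rule-like model, record the genes (features) it uses,
    then remove those features and refit, until accuracy drops below a threshold."""
    remaining = list(range(X.shape[1]))
    models = []
    while len(models) < max_models and remaining:
        clf = DecisionTreeClassifier(max_depth=2, random_state=0)
        clf.fit(X[:, remaining], y)
        if clf.score(X[:, remaining], y) < min_accuracy:
            break                                               # stopping criterion reached
        used = [remaining[i] for i in set(np.nonzero(clf.feature_importances_)[0])]
        models.append([feature_names[i] for i in used])
        remaining = [i for i in remaining if i not in used]     # eliminate the used genes
    return models

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
print(iterative_rule_models(X, y, [f"gene{i}" for i in range(20)]))
```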
Deep learning for EEG-Based preference classification
NASA Astrophysics Data System (ADS)
Teo, Jason; Hou, Chew Lin; Mountstephens, James
2017-10-01
Electroencephalogram (EEG)-based emotion classification is rapidly becoming one of the most intensely studied areas of brain-computer interfacing (BCI). The ability to passively identify yet accurately correlate brainwaves with our immediate emotions opens up truly meaningful and previously unattainable human-computer interactions such as in forensic neuroscience, rehabilitative medicine, affective entertainment and neuro-marketing. One particularly useful yet rarely explored area of EEG-based emotion classification is preference recognition [1], which is simply the detection of like versus dislike. Within the limited investigations into preference classification, all reported studies were based on musically-induced stimuli except for a single study which used 2D images. The main objective of this study is to apply deep learning, which has been shown to produce state-of-the-art results in diverse hard problems such as computer vision, natural language processing and audio recognition, to 3D object preference classification over a larger group of test subjects. A cohort of 16 users was shown 60 bracelet-like objects as rotating visual stimuli on a computer display while their preferences and EEGs were recorded. After training a variety of machine learning approaches which included deep neural networks, we then attempted to classify the users' preferences for the 3D visual stimuli based on their EEGs. Here, we show that deep learning outperforms a variety of other machine learning classifiers for this EEG-based preference classification task, particularly on a highly challenging dataset with large inter- and intra-subject variability.
Cordeiro, Fernanda B; Ferreira, Christina R; Sobreira, Tiago Jose P; Yannell, Karen E; Jarmusch, Alan K; Cedenho, Agnaldo P; Lo Turco, Edson G; Cooks, R Graham
2017-09-15
We describe multiple reaction monitoring (MRM)-profiling, which provides accelerated discovery of discriminating molecular features, and its application to human polycystic ovary syndrome (PCOS) diagnosis. The discovery phase of MRM-profiling seeks molecular features based on some prior knowledge of the chemical functional groups likely to be present in the sample. It does this through use of a limited number of pre-chosen and chemically specific neutral loss and/or precursor ion MS/MS scans. The output of the discovery phase is a set of precursor/product transitions. In the screening phase these MRM transitions are used to interrogate multiple samples (hence the name MRM-profiling). MRM-profiling was applied to follicular fluid samples of 22 controls and 29 clinically diagnosed PCOS patients. Representative samples were delivered by flow injection to a triple quadrupole mass spectrometer set to perform a number of pre-chosen and chemically specific neutral loss and/or precursor ion MS/MS scans. The output of this discovery phase was a set of 1012 precursor/product transitions. In the screening phase each individual sample was interrogated for these MRM transitions. Principal component analysis (PCA) and receiver operating characteristic (ROC) curves were used for statistical analysis. To evaluate the method's performance, half the samples were used to build a classification model (training set) and half were blinded (validation set). Twenty transitions were used for the classification of the blind samples; most of them (N = 19) showed lower abundances in the PCOS group and corresponded to phosphatidylethanolamine (PE) and phosphatidylserine (PS) lipids. Agreement of 73% with clinical diagnosis was found when classifying the 26 blind samples. MRM-profiling is a supervised method characterized by its simplicity, speed and the absence of chromatographic separation. It can be used to rapidly isolate discriminating molecules in healthy/disease conditions by tailored screening of signals associated with hundreds of molecules in complex samples. Copyright © 2017 John Wiley & Sons, Ltd.
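The screening-phase statistics described above can be illustrated with a short Python sketch: PCA for an overview of sample clustering, per-transition ROC analysis to pick discriminating transitions, and a classifier built on half the samples and evaluated on the blinded half. The data are synthetic and the logistic-regression classifier is an assumption, not the authors' exact model.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_samples, n_transitions = 51, 1012
y = np.array([0] * 22 + [1] * 29)                    # control vs PCOS labels
X = rng.normal(size=(n_samples, n_transitions))
X[y == 1, :20] -= 1.0                                # 20 transitions lower in the PCOS group

scores = PCA(n_components=2).fit_transform(X)        # overview of sample clustering

# Rank transitions by single-feature ROC AUC and keep the 20 most discriminating.
aucs = np.array([roc_auc_score(y, X[:, j]) for j in range(n_transitions)])
top = np.argsort(np.abs(aucs - 0.5))[-20:]

train, blind = np.arange(0, 51, 2), np.arange(1, 51, 2)   # half to train, half blinded
model = LogisticRegression(max_iter=1000).fit(X[np.ix_(train, top)], y[train])
print("agreement with diagnosis:", model.score(X[np.ix_(blind, top)], y[blind]))
```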
Multi-Agent Information Classification Using Dynamic Acquaintance Lists.
ERIC Educational Resources Information Center
Mukhopadhyay, Snehasis; Peng, Shengquan; Raje, Rajeev; Palakal, Mathew; Mostafa, Javed
2003-01-01
Discussion of automated information services focuses on information classification and collaborative agents, i.e. intelligent computer programs. Highlights include multi-agent systems; distributed artificial intelligence; thesauri; document representation and classification; agent modeling; acquaintances, or remote agents discovered through…
ERIC Educational Resources Information Center
Liu, Chen-Chung; Don, Ping-Hsing; Chung, Chen-Wei; Lin, Shao-Jun; Chen, Gwo-Dong; Liu, Baw-Jhiune
2010-01-01
While Web discovery is usually undertaken as a solitary activity, Web co-discovery may transform Web learning activities from the isolated individual search process into interactive and collaborative knowledge exploration. Recent studies have proposed Web co-search environments on a single computer, supported by multiple one-to-one technologies.…
Carroll, Kristen L; Murray, Kathleen A; MacLeod, Lynne M; Hennessey, Theresa A; Woiczik, Marcella R; Roach, James W
2011-06-01
Numerous studies underscore the poor intraobserver and interobserver reliability of both the center edge angle (CEA) and the Severin classification using plain film measurements. In this study, experienced observers applied a computer-assisted measurement program to determine the CEA in digital pelvic radiographs of adults who had been previously treated for dysplasia of the hip (DDH). Using a teaching aid/algorithm of the Severin classification, the observers then assigned a Severin rating to these hips. Intraobserver and interobserver errors were then calculated on both the CEA measurements and the Severin classifications. Four pediatric orthopaedic surgeons and 1 pediatric radiologist calculated the CEAs using the OrthoView planning system and then determined the Severin classification on 41 blinded digital pelvic radiographs. The radiographs were evaluated by each examiner twice, with evaluations separated by 2 months. All examiners reviewed a Severin classification algorithm before making their Severin assignments. The intraobserver and interobserver reliability for the CEA and the Severin classification were calculated using intraclass correlation coefficients and Cohen and Fleiss κ scores, respectively. The intraobserver and interobserver reliability for CEA measurement was moderate to almost perfect. When we separated the Severin classification into 3 clinically relevant groups of good (Severin I and II), dysplastic (Severin III), and poor (Severin IV and above), our interobserver reliability approached almost perfect agreement. The Severin classification is an extremely useful and oft-used radiographic measure of the success of DDH treatment. Our research found that digital radiography, computer-aided measurement tools, the use of a Severin algorithm, and separating the Severin classification into 3 clinically relevant groups significantly increased the intraobserver and interobserver reliability of both the CEA and the Severin classification. This finding will assist future studies using the CEA and Severin classification in the radiographic assessment of DDH treatment outcomes.
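As a small illustration of the agreement statistics used here, the following Python sketch computes Cohen's kappa for two observers' Severin group assignments using the three clinically relevant groups (good / dysplastic / poor); the ratings are invented.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical Severin group assignments by two observers for the same set of radiographs.
observer_1 = ["good", "good", "dysplastic", "poor", "good", "dysplastic", "poor", "good"]
observer_2 = ["good", "good", "dysplastic", "poor", "dysplastic", "dysplastic", "poor", "good"]

kappa = cohen_kappa_score(observer_1, observer_2)
print(f"Cohen's kappa between observers: {kappa:.2f}")
```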
Bai, Ou; Lin, Peter; Vorbach, Sherry; Li, Jiang; Furlani, Steve; Hallett, Mark
2007-12-01
To explore effective combinations of computational methods for the prediction of movement intention preceding the production of self-paced right and left hand movements from single trial scalp electroencephalogram (EEG). Twelve naïve subjects performed self-paced movements consisting of three key strokes with either hand. EEG was recorded from 128 channels. The exploration was performed offline on single trial EEG data. We proposed that a successful computational procedure for classification would consist of spatial filtering, temporal filtering, feature selection, and pattern classification. A systematic investigation was performed with combinations of spatial filtering using principal component analysis (PCA), independent component analysis (ICA), common spatial patterns analysis (CSP), and surface Laplacian derivation (SLD); temporal filtering using power spectral density estimation (PSD) and discrete wavelet transform (DWT); pattern classification using linear Mahalanobis distance classifier (LMD), quadratic Mahalanobis distance classifier (QMD), Bayesian classifier (BSC), multi-layer perceptron neural network (MLP), probabilistic neural network (PNN), and support vector machine (SVM). A robust multivariate feature selection strategy using a genetic algorithm was employed. The combinations of spatial filtering using ICA and SLD, temporal filtering using PSD and DWT, and classification methods using LMD, QMD, BSC and SVM provided higher performance than those of other combinations. Utilizing one of the better combinations of ICA, PSD and SVM, the discrimination accuracy was as high as 75%. Further feature analysis showed that beta band EEG activity of the channels over right sensorimotor cortex was most appropriate for discrimination of right and left hand movement intention. Effective combinations of computational methods provide possible classification of human movement intention from single trial EEG. Such a method could be the basis for a potential brain-computer interface based on human natural movement, which might reduce the requirement of long-term training. Effective combinations of computational methods can classify human movement intention from single trial EEG with reasonable accuracy.
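One of the better-performing combinations reported (ICA spatial filtering, PSD temporal features, SVM classification) can be sketched as follows in Python on synthetic single-trial data; the genetic-algorithm feature selection is omitted, per-trial ICA is a simplification, and all parameters are assumptions rather than the authors' settings.

```python
import numpy as np
from scipy.signal import welch
from sklearn.decomposition import FastICA
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n_trials, n_channels, n_samples, fs = 60, 16, 512, 256
X = rng.normal(size=(n_trials, n_channels, n_samples))      # synthetic single-trial EEG
y = rng.integers(0, 2, n_trials)                             # left vs right hand intention
# Inject a fake 20 Hz (beta-band) rhythm into one channel for class-1 trials.
X[y == 1, 0] += np.sin(2 * np.pi * 20 * np.arange(n_samples) / fs)

def ica_psd_features(trial, n_components=8):
    """Spatial filtering with ICA, then beta-band (13-30 Hz) power of each component,
    sorted to remove the arbitrary ordering of ICA components."""
    sources = FastICA(n_components=n_components, random_state=0).fit_transform(trial.T).T
    freqs, psd = welch(sources, fs=fs, nperseg=128, axis=1)
    beta = (freqs >= 13) & (freqs <= 30)
    return np.sort(psd[:, beta].mean(axis=1))[::-1]

features = np.array([ica_psd_features(trial) for trial in X])
clf = SVC(kernel="rbf").fit(features[:40], y[:40])
print("held-out accuracy:", clf.score(features[40:], y[40:]))
```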
Zoonotic Hepatitis E Virus: Classification, Animal Reservoirs and Transmission Routes
Doceul, Virginie; Bagdassarian, Eugénie; Demange, Antonin; Pavio, Nicole
2016-01-01
During the past ten years, several new hepatitis E viruses (HEVs) have been identified in various animal species. In parallel, the number of reports of autochthonous hepatitis E in Western countries has increased as well, raising the question of what role these possible animal reservoirs play in human infections. The aim of this review is to present the recent discoveries of animal HEVs and their classification within the Hepeviridae family, their zoonotic and species barrier crossing potential, and possible use as models to study hepatitis E pathogenesis. Lastly, this review describes the transmission pathways identified from animal sources. PMID:27706110
The use of Landsat data to inventory cotton and soybean acreage in North Alabama
NASA Technical Reports Server (NTRS)
Downs, S. W., Jr.; Faust, N. L.
1980-01-01
This study was performed to determine if Landsat data could be used to improve the accuracy of the estimation of cotton acreage. A linear classification algorithm and a maximum likelihood algorithm were used for computer classification of the area, and the classification was compared with ground truth. The classification accuracy for some fields was greater than 90 percent; however, the overall accuracy was 71 percent for cotton and 56 percent for soybeans. The results of this research indicate that computer analysis of Landsat data has potential for improving upon the methods presently being used to determine cotton acreage; however, additional experiments and refinements are needed before the method can be used operationally.
Parker, Michael T.
2016-01-01
Recent advances in sequencing technologies have opened the door for the classification of the human virome. While taxonomic classification can be applied to the viruses identified in such studies, this gives no information as to the type of interaction the virus has with the host. As follow-up studies are performed to address these questions, the description of these virus-host interactions would be greatly enriched by applying a standard set of definitions that typify them. This paper describes a framework with which all members of the human virome can be classified based on principles of ecology. The scaffold not only enables categorization of the human virome, but can also inform research aimed at identifying novel virus-host interactions. PMID:27698618
A fuzzy neural network for intelligent data processing
NASA Astrophysics Data System (ADS)
Xie, Wei; Chu, Feng; Wang, Lipo; Lim, Eng Thiam
2005-03-01
In this paper, we describe an incrementally generated fuzzy neural network (FNN) for intelligent data processing. This FNN combines the features of initial fuzzy model self-generation, fast input selection, partition validation, parameter optimization and rule-base simplification. A small FNN is created from scratch -- there is no need to specify the initial network architecture, initial membership functions, or initial weights. Fuzzy IF-THEN rules are constantly combined and pruned to minimize the size of the network while maintaining accuracy; irrelevant inputs are detected and deleted, and membership functions and network weights are trained with a gradient descent algorithm, i.e., error backpropagation. Experimental studies on synthesized data sets demonstrate that the proposed fuzzy neural network is able to achieve accuracy comparable to or higher than both a feedforward crisp neural network, i.e., NeuroRule, and a decision tree, i.e., C4.5, with more compact rule bases for most of the data sets used in our experiments. The FNN has achieved outstanding results for cancer classification based on microarray data. The excellent classification result for the Small Round Blue Cell Tumor (SRBCT) data set is shown. Compared with other published methods, we have used far fewer genes for perfect classification, which will help researchers directly focus their attention on specific genes and may lead to a deeper understanding of cancer development and to the discovery of drugs.
Computational Identification of Novel Genes: Current and Future Perspectives.
Klasberg, Steffen; Bitard-Feildel, Tristan; Mallet, Ludovic
2016-01-01
While it has long been thought that all genomic novelties are derived from existing material, many genes lacking homology to known genes were found in recent genome projects. Some of these novel genes were proposed to have evolved de novo, i.e., out of noncoding sequences, whereas some have been shown to follow a duplication and divergence process. Their discovery called for an extension of the historical hypotheses about gene origination. Besides the theoretical breakthrough, increasing evidence has accumulated that novel genes play important roles in evolutionary processes, including adaptation and speciation events. Different techniques are available to identify genes and classify them as novel. Their classification as novel is usually based on their similarity to known genes, or lack thereof, detected by comparative genomics or by searches against databases. Computational approaches are further prime methods that can be based on existing models or leverage biological evidence from experiments. Identification of novel genes remains, however, a challenging task. With constantly evolving software and technologies, no gold standard, and no available benchmark, the evaluation and characterization of genomic novelty is a vibrant field. In this review, the classical and state-of-the-art tools for gene prediction are introduced. The current methods for novel gene detection are presented, and the methodological strategies and their limits are discussed along with perspective approaches for further studies.
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Paradella, W. R.; Vitorello, I.
1982-01-01
Several aspects of computer-assisted analysis techniques for image enhancement and thematic classification, by which LANDSAT MSS imagery may be treated quantitatively, are explained. For geological applications, computer processing of digital data possibly allows the fullest use of LANDSAT data, by displaying enhanced and corrected data for visual analysis and by evaluating each spectral pixel and assigning it to a given class.
An algorithm of discovering signatures from DNA databases on a computer cluster.
Lee, Hsiao Ping; Sheu, Tzu-Fang
2014-10-05
Signatures are short sequences that are unique and not similar to any other sequence in a database; they can be used as the basis for identifying different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entire database to be loaded into memory, restricting the amount of data that they can process and making them unable to handle databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that their efficiency can be improved. In this research, we introduce the use of a divide-and-conquer strategy in signature discovery and propose a parallel signature discovery algorithm for a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem, posed to the existing algorithms, of being unable to process large databases, and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases, such as the human whole-genome EST database, which the existing algorithms were previously unable to process. The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large-database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
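A minimal Python sketch of the divide-and-conquer idea follows: the database is split into chunks, candidate k-mers are counted per chunk in parallel worker processes, and the counts are merged to keep k-mers that occur exactly once. This treats a signature simply as a globally unique k-mer, which is a simplification of the dissimilarity-based definition used by the published algorithm.

```python
from collections import Counter
from multiprocessing import Pool

K = 8  # signature length (illustrative)

def kmer_counts(chunk):
    """Count k-mer occurrences within one chunk of sequences (the 'divide' step)."""
    counts = Counter()
    for seq in chunk:
        for i in range(len(seq) - K + 1):
            counts[seq[i:i + K]] += 1
    return counts

def find_signatures(sequences, n_workers=4):
    """Merge per-chunk counts (the 'conquer' step) and keep k-mers seen exactly once."""
    chunks = [sequences[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partial_counts = pool.map(kmer_counts, chunks)
    total = Counter()
    for c in partial_counts:
        total.update(c)
    return {kmer for kmer, n in total.items() if n == 1}

if __name__ == "__main__":
    db = ["ACGTACGTGGCA", "ACGTACGTTTAA", "GGGGCCCCAGTC"]
    print(sorted(find_signatures(db)))
```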
Tools, techniques, organisation and culture of the CADD group at Sygnature Discovery.
St-Gallay, Steve A; Sambrook-Smith, Colin P
2017-03-01
Computer-aided drug design encompasses a wide variety of tools and techniques, and can be implemented with a range of organisational structures and focus in different organisations. Here we outline the computational chemistry skills within Sygnature Discovery, along with the software and hardware at our disposal, and briefly discuss the methods that are not employed and why. The goal of the group is to provide support for design and analysis in order to improve the quality of compounds synthesised and reduce the timelines of drug discovery projects, and we reveal how this is achieved at Sygnature. Impact on medicinal chemistry is vital to demonstrating the value of computational chemistry, and we discuss the approaches taken to influence the list of compounds for synthesis, and how we recognise success. Finally we touch on some of the areas being developed within the team in order to provide further value to the projects and clients.
Tools, techniques, organisation and culture of the CADD group at Sygnature Discovery
NASA Astrophysics Data System (ADS)
St-Gallay, Steve A.; Sambrook-Smith, Colin P.
2017-03-01
Computer-aided drug design encompasses a wide variety of tools and techniques, and can be implemented with a range of organisational structures and focus in different organisations. Here we outline the computational chemistry skills within Sygnature Discovery, along with the software and hardware at our disposal, and briefly discuss the methods that are not employed and why. The goal of the group is to provide support for design and analysis in order to improve the quality of compounds synthesised and reduce the timelines of drug discovery projects, and we reveal how this is achieved at Sygnature. Impact on medicinal chemistry is vital to demonstrating the value of computational chemistry, and we discuss the approaches taken to influence the list of compounds for synthesis, and how we recognise success. Finally we touch on some of the areas being developed within the team in order to provide further value to the projects and clients.
The role of fragment-based and computational methods in polypharmacology.
Bottegoni, Giovanni; Favia, Angelo D; Recanatini, Maurizio; Cavalli, Andrea
2012-01-01
Polypharmacology-based strategies are gaining increased attention as a novel approach to obtaining potentially innovative medicines for multifactorial diseases. However, some within the pharmaceutical community have resisted these strategies because they can be resource-hungry in the early stages of the drug discovery process. Here, we report on fragment-based and computational methods that might accelerate and optimize the discovery of multitarget drugs. In particular, we illustrate that fragment-based approaches can be particularly suited for polypharmacology, owing to the inherent promiscuous nature of fragments. In parallel, we explain how computer-assisted protocols can provide invaluable insights into how to unveil compounds theoretically able to bind to more than one protein. Furthermore, several pragmatic aspects related to the use of these approaches are covered, thus offering the reader practical insights on multitarget-oriented drug discovery projects. Copyright © 2011 Elsevier Ltd. All rights reserved.
On the use of interaction error potentials for adaptive brain computer interfaces.
Llera, A; van Gerven, M A J; Gómez, V; Jensen, O; Kappen, H J
2011-12-01
We propose an adaptive classification method for Brain Computer Interfaces (BCIs) which uses Interaction Error Potentials (IErrPs) as a reinforcement signal and adapts the classifier parameters when an error is detected. We analyze the quality of the proposed approach in relation to the misclassification of the IErrPs. In addition, we compare static versus adaptive classification performance using artificial and MEG data. We show that the proposed adaptive framework significantly improves on the static classification methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
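The adaptation scheme can be sketched as follows: whenever a (here simulated) interaction error potential flags a misclassification, the classifier parameters are updated online with the corrected label. The use of scikit-learn's SGDClassifier and the drifting toy task are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(4)
clf = SGDClassifier(random_state=0)

# Warm-up on a small calibration set.
X0 = rng.normal(size=(50, 5))
y0 = (X0[:, 0] > 0).astype(int)
clf.partial_fit(X0, y0, classes=[0, 1])

errors = 0
for t in range(500):
    x = rng.normal(size=(1, 5))
    true_label = int(x[0, 0] + 0.3 * t / 500 > 0)     # slowly drifting task
    pred = clf.predict(x)[0]
    ierrp_detected = pred != true_label                # stand-in for IErrP detection from EEG/MEG
    if ierrp_detected:
        errors += 1
        clf.partial_fit(x, [1 - pred])                 # adapt using the corrected label
print("online errors:", errors, "of 500 trials")
```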
Bond-based linear indices in QSAR: computational discovery of novel anti-trichomonal compounds
NASA Astrophysics Data System (ADS)
Marrero-Ponce, Yovani; Meneses-Marcel, Alfredo; Rivera-Borroto, Oscar M.; García-Domenech, Ramón; De Julián-Ortiz, Jesus Vicente; Montero, Alina; Escario, José Antonio; Barrio, Alicia Gómez; Pereira, David Montero; Nogal, Juan José; Grau, Ricardo; Torrens, Francisco; Vogel, Christian; Arán, Vicente J.
2008-08-01
Trichomonas vaginalis (Tv) is the causative agent of the most common, non-viral, sexually transmitted disease in women and men worldwide. Since 1959, metronidazole (MTZ) has been the drug of choice in the systemic treatment of trichomoniasis. However, resistance to MTZ in some patients and the great cost associated with the development of new trichomonacidals make necessary the development of computational methods that shorten the drug discovery pipeline. Toward this end, bond-based linear indices, new TOMOCOMD-CARDD molecular descriptors, and linear discriminant analysis were used to discover novel trichomonacidal chemicals. The obtained models, using non-stochastic and stochastic indices, are able to classify correctly 89.01% (87.50%) and 82.42% (84.38%) of the chemicals in the training (test) sets, respectively. These results validate the models for use in ligand-based virtual screening. In addition, they show large Matthews' correlation coefficients (C) of 0.78 (0.71) and 0.65 (0.65) for the training (test) sets, correspondingly. The results of predictions on the 10% full-out cross-validation test also demonstrate the robustness of the obtained models. Later, both models are applied to the virtual screening of 12 compounds already assayed against Tv. As a result, they correctly classify 10 out of 12 (83.33%) and 9 out of 12 (75.00%) of the chemicals, respectively, which is the most important criterion for validating the models. Besides, these classification functions are applied to a library of seven chemicals in order to find novel antitrichomonal agents. These compounds are synthesized and tested for in vitro activity against Tv. As a result, experimental observations approached theoretical predictions, since a correct classification of 85.71% (6 out of 7) of the chemicals was obtained. Moreover, out of the seven compounds that are screened, synthesized and biologically assayed, six compounds (VA7-34, VA7-35, VA7-37, VA7-38, VA7-68, VA7-70) show pronounced cytocidal activity at a concentration of 100 μg/ml at 24 h (48 h), within the range of 98.66%-100% (99.40%-100%), while only two molecules (chemicals VA7-37 and VA7-38) show high cytocidal activity at a concentration of 10 μg/ml at 24 h (48 h): 98.38% (94.23%) and 97.59% (98.10%), correspondingly. The LDA-assisted QSAR models presented here could significantly reduce the number of synthesized and tested compounds and could increase the chance of finding new chemical entities with anti-trichomonal activity.
Bond-based linear indices in QSAR: computational discovery of novel anti-trichomonal compounds.
Marrero-Ponce, Yovani; Meneses-Marcel, Alfredo; Rivera-Borroto, Oscar M; García-Domenech, Ramón; De Julián-Ortiz, Jesus Vicente; Montero, Alina; Escario, José Antonio; Barrio, Alicia Gómez; Pereira, David Montero; Nogal, Juan José; Grau, Ricardo; Torrens, Francisco; Vogel, Christian; Arán, Vicente J
2008-08-01
Trichomonas vaginalis (Tv) is the causative agent of the most common, non-viral, sexually transmitted disease in women and men worldwide. Since 1959, metronidazole (MTZ) has been the drug of choice in the systemic treatment of trichomoniasis. However, resistance to MTZ in some patients and the great cost associated with the development of new trichomonacidals make necessary the development of computational methods that shorten the drug discovery pipeline. Toward this end, bond-based linear indices, new TOMOCOMD-CARDD molecular descriptors, and linear discriminant analysis were used to discover novel trichomonacidal chemicals. The obtained models, using non-stochastic and stochastic indices, are able to classify correctly 89.01% (87.50%) and 82.42% (84.38%) of the chemicals in the training (test) sets, respectively. These results validate the models for use in ligand-based virtual screening. In addition, they show large Matthews' correlation coefficients (C) of 0.78 (0.71) and 0.65 (0.65) for the training (test) sets, correspondingly. The results of predictions on the 10% full-out cross-validation test also demonstrate the robustness of the obtained models. Later, both models are applied to the virtual screening of 12 compounds already assayed against Tv. As a result, they correctly classify 10 out of 12 (83.33%) and 9 out of 12 (75.00%) of the chemicals, respectively, which is the most important criterion for validating the models. Besides, these classification functions are applied to a library of seven chemicals in order to find novel antitrichomonal agents. These compounds are synthesized and tested for in vitro activity against Tv. As a result, experimental observations approached theoretical predictions, since a correct classification of 85.71% (6 out of 7) of the chemicals was obtained. Moreover, out of the seven compounds that are screened, synthesized and biologically assayed, six compounds (VA7-34, VA7-35, VA7-37, VA7-38, VA7-68, VA7-70) show pronounced cytocidal activity at a concentration of 100 μg/ml at 24 h (48 h), within the range of 98.66%-100% (99.40%-100%), while only two molecules (chemicals VA7-37 and VA7-38) show high cytocidal activity at a concentration of 10 μg/ml at 24 h (48 h): 98.38% (94.23%) and 97.59% (98.10%), correspondingly. The LDA-assisted QSAR models presented here could significantly reduce the number of synthesized and tested compounds and could increase the chance of finding new chemical entities with anti-trichomonal activity.
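A minimal Python sketch pairing linear discriminant analysis with the Matthews correlation coefficient, the two statistics central to this work, is shown below on synthetic descriptor data; it is not the TOMOCOMD-CARDD workflow itself.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import matthews_corrcoef

rng = np.random.default_rng(5)
n = 120
y = rng.integers(0, 2, n)                            # active (1) vs inactive (0)
X = rng.normal(size=(n, 12)) + 0.9 * y[:, None]      # synthetic bond-based linear indices

train, test = slice(0, 90), slice(90, None)
lda = LinearDiscriminantAnalysis().fit(X[train], y[train])

for name, split in (("training", train), ("test", test)):
    pred = lda.predict(X[split])
    acc = (pred == y[split]).mean()
    mcc = matthews_corrcoef(y[split], pred)
    print(f"{name}: accuracy={acc:.2%}, Matthews C={mcc:.2f}")
```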
Computer implemented classification of vegetation using aircraft acquired multispectral scanner data
NASA Technical Reports Server (NTRS)
Cibula, W. G.
1975-01-01
The use of aircraft 24-channel multispectral scanner data in conjunction with computer processing techniques to obtain an automated classification of plant species associations was discussed. The classification of various plant species associations was related to information needed for specific applications. In addition, the necessity of selecting multiple training fields for a single class in situations where the study area consists of highly irregular terrain was detailed. A single class was illuminated differently in different areas, resulting in the existence of multiple spectral signatures for that class. These different signatures arise because different amounts of radiation upwell to the detector from portions of the scene that receive differing incident radiation. Techniques of training field selection were outlined, and a classification obtained for a natural area in Tishomingo State Park in northern Mississippi was presented.
77 FR 45345 - DOE/Advanced Scientific Computing Advisory Committee
Federal Register 2010, 2011, 2012, 2013, 2014
2012-07-31
... Recompetition results for Scientific Discovery through Advanced Computing (SciDAC) applications Co-design Public... DEPARTMENT OF ENERGY DOE/Advanced Scientific Computing Advisory Committee AGENCY: Office of... the Advanced Scientific Computing Advisory Committee (ASCAC). The Federal Advisory Committee Act (Pub...
Affective Computing and the Impact of Gender and Age
Rukavina, Stefanie; Gruss, Sascha; Hoffmann, Holger; Tan, Jun-Wen; Walter, Steffen; Traue, Harald C.
2016-01-01
Affective computing aims at the detection of users' mental states, in particular emotions and dispositions, during human-computer interactions. Detection can be achieved by measuring multimodal signals, namely speech, facial expressions and/or psychobiology. Over the past years, one major approach was to identify the best features for each signal using different classification methods. Although this is of high priority, other subject-specific variables should not be neglected. In our study, we analyzed the effect of gender, age, personality and gender roles on the extracted psychobiological features (derived from skin conductance level, facial electromyography and heart rate variability) as well as their influence on the classification results. In an experimental human-computer interaction, five different affective states were induced with picture material from the International Affective Picture System and Ulm pictures. A total of 127 subjects participated in the study. Among all potentially influencing variables (gender has been reported to be influential), age was the only variable that correlated significantly with psychobiological responses. In summary, the conducted classification processes resulted in differences of 20% in classification accuracy according to age and gender, especially when comparing the neutral condition with the four other affective states. We suggest taking age and gender specifically into account in future studies in affective computing, as this may lead to an improvement of emotion recognition accuracy. PMID:26939129
Cognitive context detection in UAS operators using eye-gaze patterns on computer screens
NASA Astrophysics Data System (ADS)
Mannaru, Pujitha; Balasingam, Balakumar; Pattipati, Krishna; Sibley, Ciara; Coyne, Joseph
2016-05-01
In this paper, we demonstrate the use of eye-gaze metrics of unmanned aerial systems (UAS) operators as effective indices of their cognitive workload. Our analyses are based on an experiment where twenty participants performed pre-scripted UAS missions of three different difficulty levels by interacting with two custom designed graphical user interfaces (GUIs) that are displayed side by side. First, we compute several eye-gaze metrics, traditional eye movement metrics as well as newly proposed ones, and analyze their effectiveness as cognitive classifiers. Most of the eye-gaze metrics are computed by dividing the computer screen into "cells". Then, we perform several analyses in order to select metrics for effective cognitive context classification related to our specific application; the objectives of these analyses are to (i) identify appropriate ways to divide the screen into cells; (ii) select appropriate metrics for training and classification of cognitive features; and (iii) identify a suitable classification method.
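A minimal Python sketch of one such cell-based metric follows: gaze samples are binned into a grid of screen cells and dwell time is accumulated per cell. Screen size, grid dimensions and sample rate are assumptions, not the values used in the experiment.

```python
import numpy as np

def cell_dwell_times(gaze_xy, screen=(1920, 1080), grid=(4, 3), sample_rate_hz=60):
    """Accumulate gaze dwell time (seconds) in each grid cell of the screen."""
    cols, rows = grid
    dwell = np.zeros((rows, cols))
    for x, y in gaze_xy:
        c = min(int(x / screen[0] * cols), cols - 1)
        r = min(int(y / screen[1] * rows), rows - 1)
        dwell[r, c] += 1.0 / sample_rate_hz
    return dwell

rng = np.random.default_rng(6)
gaze = np.column_stack([rng.uniform(0, 1920, 600), rng.uniform(0, 1080, 600)])
print(cell_dwell_times(gaze).round(2))   # 10 s of gaze distributed over a 4x3 grid
```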
ERIC Educational Resources Information Center
Wilkinson, John
2009-01-01
Since 2006, the details of bodies making up our solar system have been revised. This was largely as a result of new discoveries of a number of planet-like objects beyond the orbit of Pluto. The International Astronomical Union redefined what constituted a planet and established two new classifications--dwarf planets and plutoids. As a result, the…
The First Arab Bibliography: Fihrist al-'Ulum. Occasional Papers Number 175.
ERIC Educational Resources Information Center
Wellisch, Hans H.
The history of the "Fihrist al-'Ulum" ("The Index of the Sciences"), the first comprehensive and detailed Arabic bibliography, is traced from the work's completion in A.D. 987/988 to the discovery and translation of surviving manuscripts in modern times. The bibliography's content, classification scheme, and author are…
A Survey on Data Storage and Information Discovery in the WSANs-Based Edge Computing Systems
Liang, Junbin; Liu, Renping; Ni, Wei; Li, Yin; Li, Ran; Ma, Wenpeng; Qi, Chuanda
2018-01-01
In the post-Cloud era, the proliferation of Internet of Things (IoT) has pushed the horizon of Edge computing, which is a new computing paradigm with data processed at the edge of the network. As the important systems of Edge computing, wireless sensor and actuator networks (WSANs) play an important role in collecting and processing the sensing data from the surrounding environment as well as taking actions on the events happening in the environment. In WSANs, in-network data storage and information discovery schemes with high energy efficiency, high load balance and low latency are needed because of the limited resources of the sensor nodes and the real-time requirement of some specific applications, such as putting out a big fire in a forest. In this article, the existing schemes of WSANs on data storage and information discovery are surveyed with detailed analysis on their advancements and shortcomings, and possible solutions are proposed on how to achieve high efficiency, good load balance, and perfect real-time performances at the same time, hoping that it can provide a good reference for the future research of the WSANs-based Edge computing systems. PMID:29439442
A Survey on Data Storage and Information Discovery in the WSANs-Based Edge Computing Systems.
Ma, Xingpo; Liang, Junbin; Liu, Renping; Ni, Wei; Li, Yin; Li, Ran; Ma, Wenpeng; Qi, Chuanda
2018-02-10
In the post-Cloud era, the proliferation of Internet of Things (IoT) has pushed the horizon of Edge computing, which is a new computing paradigm with data processed at the edge of the network. As the important systems of Edge computing, wireless sensor and actuator networks (WSANs) play an important role in collecting and processing the sensing data from the surrounding environment as well as taking actions on the events happening in the environment. In WSANs, in-network data storage and information discovery schemes with high energy efficiency, high load balance and low latency are needed because of the limited resources of the sensor nodes and the real-time requirement of some specific applications, such as putting out a big fire in a forest. In this article, the existing schemes of WSANs on data storage and information discovery are surveyed with detailed analysis on their advancements and shortcomings, and possible solutions are proposed on how to achieve high efficiency, good load balance, and perfect real-time performances at the same time, hoping that it can provide a good reference for the future research of the WSANs-based Edge computing systems.
Heifetz, Alexander; Southey, Michelle; Morao, Inaki; Townsend-Nicholson, Andrea; Bodkin, Mike J
2018-01-01
GPCR modeling approaches are widely used in the hit-to-lead (H2L) and lead optimization (LO) stages of drug discovery. The aims of these modeling approaches are to predict the 3D structures of the receptor-ligand complexes, to explore the key interactions between the receptor and the ligand, and to utilize these insights in the design of new molecules with improved binding, selectivity or other pharmacological properties. In this book chapter, we present a brief survey of key computational approaches integrated with the hierarchical GPCR modeling protocol (HGMP) used in the H2L and LO stages of structure-based drug discovery (SBDD). We outline the differences in modeling strategies used in H2L and LO of SBDD and illustrate how these tools have been applied in three drug discovery projects.
Chung, Yongchul G.; Gómez-Gualdrón, Diego A.; Li, Peng; Leperi, Karson T.; Deria, Pravas; Zhang, Hongda; Vermeulen, Nicolaas A.; Stoddart, J. Fraser; You, Fengqi; Hupp, Joseph T.; Farha, Omar K.; Snurr, Randall Q.
2016-01-01
Discovery of new adsorbent materials with a high CO2 working capacity could help reduce CO2 emissions from newly commissioned power plants using precombustion carbon capture. High-throughput computational screening efforts can accelerate the discovery of new adsorbents but sometimes require significant computational resources to explore the large space of possible materials. We report the in silico discovery of high-performing adsorbents for precombustion CO2 capture by applying a genetic algorithm to efficiently search a large database of metal-organic frameworks (MOFs) for top candidates. High-performing MOFs identified from the in silico search were synthesized and activated and show a high CO2 working capacity and a high CO2/H2 selectivity. One of the synthesized MOFs shows a higher CO2 working capacity than any MOF reported in the literature under the operating conditions investigated here. PMID:27757420
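The abstract above describes a genetic algorithm searching a large materials database for top candidates. The following is a minimal, hedged sketch of that idea only: a candidate is encoded as a tuple of building-block indices and scored by a toy fitness function that merely stands in for the adsorption simulations the study actually used; all names, sizes, and the scoring rule are illustrative assumptions.

```python
import random

# A candidate "MOF" is sketched as a tuple of building-block indices
# (node, linker, functional_group); fitness() is a stand-in for the
# adsorption simulation that would score CO2 working capacity.
N_NODES, N_LINKERS, N_GROUPS = 10, 50, 20

def fitness(mof):
    node, linker, group = mof
    return -((node - 3) ** 2 + (linker - 25) ** 2 + (group - 7) ** 2)  # toy score

def crossover(a, b):
    return tuple(random.choice(pair) for pair in zip(a, b))

def mutate(mof, rate=0.2):
    node, linker, group = mof
    if random.random() < rate:
        node = random.randrange(N_NODES)
    if random.random() < rate:
        linker = random.randrange(N_LINKERS)
    if random.random() < rate:
        group = random.randrange(N_GROUPS)
    return (node, linker, group)

def genetic_search(pop_size=40, generations=50):
    pop = [(random.randrange(N_NODES), random.randrange(N_LINKERS),
            random.randrange(N_GROUPS)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                      # keep the fitter half
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)
```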
CADD medicine: design is the potion that can cure my disease
NASA Astrophysics Data System (ADS)
Manas, Eric S.; Green, Darren V. S.
2017-03-01
The acronym "CADD" is often used interchangeably to refer to "Computer Aided Drug Discovery" and "Computer Aided Drug Design". While the former definition implies the use of a computer to impact one or more aspects of discovering a drug, in this paper we contend that computational chemists are most effective when they enable teams to apply true design principles as they strive to create medicines to treat human disease. We argue that teams must bring to bear multiple sub-disciplines of computational chemistry in an integrated manner in order to utilize these principles to address the multi-objective nature of the drug discovery problem. Impact, resourcing principles, and future directions for the field are also discussed, including areas of future opportunity as well as a cautionary note about hype and hubris.
Dystonia: an update on phenomenology, classification, pathogenesis and treatment.
Balint, Bettina; Bhatia, Kailash P
2014-08-01
This article will highlight recent advances in dystonia with focus on clinical aspects such as the new classification, syndromic approach, new gene discoveries and genotype-phenotype correlations. Broadening of phenotype of some of the previously described hereditary dystonias and environmental risk factors and trends in treatment will be covered. Based on phenomenology, a new consensus update on the definition, phenomenology and classification of dystonia and a syndromic approach to guide diagnosis have been proposed. Terminology has changed and 'isolated dystonia' is used wherein dystonia is the only motor feature apart from tremor, and the previously called heredodegenerative dystonias and dystonia plus syndromes are now subsumed under 'combined dystonia'. The recently discovered genes ANO3, GNAL and CIZ1 appear not to be a common cause of adult-onset cervical dystonia. Clinical and genetic heterogeneity underlie myoclonus-dystonia, dopa-responsive dystonia and deafness-dystonia syndrome. ALS2 gene mutations are a newly recognized cause for combined dystonia. The phenotypic and genotypic spectra of ATP1A3 mutations have considerably broadened. Two new genome-wide association studies identified new candidate genes. A retrospective analysis suggested complicated vaginal delivery as a modifying risk factor in DYT1. Recent studies confirm lasting therapeutic effects of deep brain stimulation in isolated dystonia, good treatment response in myoclonus-dystonia, and suggest that early treatment correlates with a better outcome. Phenotypic classification continues to be important to recognize particular forms of dystonia and this includes syndromic associations. There are a number of genes underlying isolated or combined dystonia and there will be further new discoveries with the advances in genetic technologies such as exome and whole-genome sequencing. The identification of new genes will facilitate better elucidation of pathogenetic mechanisms and possible corrective therapies.
Sharma, Harshita; Zerbe, Norman; Klempert, Iris; Hellwich, Olaf; Hufnagl, Peter
2017-11-01
Deep learning using convolutional neural networks is an actively emerging field in histological image analysis. This study explores deep learning methods for computer-aided classification in H&E stained histopathological whole slide images of gastric carcinoma. An introductory convolutional neural network architecture is proposed for two computerized applications, namely, cancer classification based on immunohistochemical response and necrosis detection based on the existence of tumor necrosis in the tissue. Classification performance of the developed deep learning approach is quantitatively compared with traditional image analysis methods in digital histopathology requiring prior computation of handcrafted features, such as statistical measures using gray level co-occurrence matrix, Gabor filter-bank responses, LBP histograms, gray histograms, HSV histograms and RGB histograms, followed by random forest machine learning. Additionally, the widely known AlexNet deep convolutional framework is comparatively analyzed for the corresponding classification problems. The proposed convolutional neural network architecture reports favorable results, with an overall classification accuracy of 0.6990 for cancer classification and 0.8144 for necrosis detection. Copyright © 2017 Elsevier Ltd. All rights reserved.
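For the "handcrafted features plus random forest" baseline mentioned above, a compact sketch may help: gray-level co-occurrence (GLCM) statistics are computed per grayscale patch and fed to a random forest. Patch loading, labels, and the exact feature set are assumptions, not the study's configuration.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.ensemble import RandomForestClassifier

def glcm_features(gray_patch):
    """gray_patch: 2D uint8 array (one H&E patch converted to grayscale)."""
    glcm = graycomatrix(gray_patch, distances=[1, 2], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "dissimilarity", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

def train_baseline(train_patches, train_labels):
    X = np.array([glcm_features(p) for p in train_patches])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, train_labels)
    return clf
```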
Yarn-dyed fabric defect classification based on convolutional neural network
NASA Astrophysics Data System (ADS)
Jing, Junfeng; Dong, Amei; Li, Pengfei; Zhang, Kaibing
2017-09-01
Considering that manual inspection of the yarn-dyed fabric can be time consuming and inefficient, we propose a yarn-dyed fabric defect classification method by using a convolutional neural network (CNN) based on a modified AlexNet. CNN shows powerful ability in performing feature extraction and fusion by simulating the learning mechanism of human brain. The local response normalization layers in AlexNet are replaced by the batch normalization layers, which can enhance both the computational efficiency and classification accuracy. In the training process of the network, the characteristics of the defect are extracted step by step and the essential features of the image can be obtained from the fusion of the edge details with several convolution operations. Then the max-pooling layers, the dropout layers, and the fully connected layers are employed in the classification model to reduce the computation cost and extract more precise features of the defective fabric. Finally, the results of the defect classification are predicted by the softmax function. The experimental results show promising performance with an acceptable average classification rate and strong robustness on yarn-dyed fabric defect classification.
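A hedged PyTorch sketch of the core architectural change described above, an AlexNet-style network with the local response normalization layers replaced by batch normalization, is given below. Layer sizes, input resolution, and the number of defect classes are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class FabricDefectCNN(nn.Module):
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.BatchNorm2d(64),            # replaces nn.LocalResponseNorm
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.BatchNorm2d(192),           # replaces the second LRN layer
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes),  # softmax applied via the loss function
        )

    def forward(self, x):                  # x: (batch, 3, 227, 227)
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)
```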
Comparison of Classifier Architectures for Online Neural Spike Sorting.
Saeed, Maryam; Khan, Amir Ali; Kamboh, Awais Mehmood
2017-04-01
High-density, intracranial recordings from micro-electrode arrays need to undergo Spike Sorting in order to associate the recorded neuronal spikes to particular neurons. This involves spike detection, feature extraction, and classification. To reduce the data transmission and power requirements, on-chip real-time processing is becoming very popular. However, high computational resources are required for classifiers in on-chip spike-sorters, making scalability a great challenge. In this review paper, we analyze several popular classifiers to propose five new hardware architectures using the off-chip training with on-chip classification approach. These include support vector classification, fuzzy C-means classification, self-organizing maps classification, moving-centroid K-means classification, and Cosine distance classification. The performance of these architectures is analyzed in terms of accuracy and resource requirement. We establish that the neural networks based Self-Organizing Maps classifier offers the most viable solution. A spike sorter based on the Self-Organizing Maps classifier, requires only 7.83% of computational resources of the best-reported spike sorter, hierarchical adaptive means, while offering a 3% better accuracy at 7 dB SNR.
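To make the "off-chip training, on-chip classification" split concrete for one of the listed classifiers, here is a minimal numpy sketch of cosine-distance spike classification: class templates are computed offline, and each detected spike is assigned to the template with the highest cosine similarity. The synthetic waveforms are assumptions for illustration only.

```python
import numpy as np

def train_templates(spikes, labels):
    """Offline step: one mean waveform template per putative neuron."""
    classes = np.unique(labels)
    return classes, np.array([spikes[labels == c].mean(axis=0) for c in classes])

def classify_spike(spike, classes, templates):
    """On-chip step: cosine similarity needs only multiplies, adds and one divide."""
    sims = templates @ spike / (np.linalg.norm(templates, axis=1)
                                * np.linalg.norm(spike) + 1e-12)
    return classes[int(np.argmax(sims))]

# toy usage with synthetic 32-sample waveforms for 4 "neurons"
rng = np.random.default_rng(0)
spikes = rng.normal(size=(200, 32)) + np.repeat(np.eye(4, 32) * 5, 50, axis=0)
labels = np.repeat(np.arange(4), 50)
classes, templates = train_templates(spikes, labels)
print(classify_spike(spikes[0], classes, templates))
```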
Predicting Time-to-Relapse in Breast Cancer Using Neural Networks
1997-12-01
Automated analysis and classification of melanocytic tumor on skin whole slide images.
Xu, Hongming; Lu, Cheng; Berendt, Richard; Jha, Naresh; Mandal, Mrinal
2018-06-01
This paper presents a computer-aided technique for automated analysis and classification of melanocytic tumor on skin whole slide biopsy images. The proposed technique consists of four main modules. First, skin epidermis and dermis regions are segmented by a multi-resolution framework. Next, epidermis analysis is performed, where a set of epidermis features reflecting nuclear morphologies and spatial distributions is computed. In parallel with epidermis analysis, dermis analysis is also performed, where dermal cell nuclei are segmented and a set of textural and cytological features are computed. Finally, the skin melanocytic image is classified into different categories such as melanoma, nevus or normal tissue by using a multi-class support vector machine (mSVM) with extracted epidermis and dermis features. Experimental results on 66 skin whole slide images indicate that the proposed technique achieves more than 95% classification accuracy, which suggests that the technique has the potential to be used for assisting pathologists on skin biopsy image analysis and classification. Copyright © 2018 Elsevier Ltd. All rights reserved.
Accurate Arabic Script Language/Dialect Classification
2014-01-01
Army Research Laboratory technical report ARL-TR-6761, by Stephen C. Tratz, Computational and Information Sciences Directorate, January 2014; approved for public release.
Exploring Chemical Space for Drug Discovery Using the Chemical Universe Database
2012-01-01
Herein we review our recent efforts in searching for bioactive ligands by enumeration and virtual screening of the unknown chemical space of small molecules. Enumeration from first principles shows that almost all small molecules (>99.9%) have never been synthesized and are still available to be prepared and tested. We discuss open access sources of molecules, the classification and representation of chemical space using molecular quantum numbers (MQN), its exhaustive enumeration in form of the chemical universe generated databases (GDB), and examples of using these databases for prospective drug discovery. MQN-searchable GDB, PubChem, and DrugBank are freely accessible at www.gdb.unibe.ch. PMID:23019491
PRIM versus CART in subgroup discovery: when patience is harmful.
Abu-Hanna, Ameen; Nannings, Barry; Dongelmans, Dave; Hasman, Arie
2010-10-01
We systematically compare the established algorithms CART (Classification and Regression Trees) and PRIM (Patient Rule Induction Method) in a subgroup discovery task on a large real-world high-dimensional clinical database. Contrary to current conjectures, PRIM's performance was generally inferior to CART's. PRIM often considered "peeling off" a large chunk of data at a value of a relevant discrete ordinal variable unattractive, ultimately missing an important subgroup. This finding has considerable significance in clinical medicine where ordinal scores are ubiquitous. PRIM's utility in clinical databases would increase when global information about (ordinal) variables is better put to use and when the search algorithm keeps track of alternative solutions.
Glycoprotein Disease Markers and Single Protein-omics*
Chandler, Kevin; Goldman, Radoslav
2013-01-01
Glycoproteins are well represented among biomarkers for inflammatory and cancer diseases. Secreted and membrane-associated glycoproteins make excellent targets for noninvasive detection. In this review, we discuss clinically applicable markers of cancer diseases and methods for their analysis. High throughput discovery continues to supply marker candidates with unusual glycan structures, altered glycoprotein abundance, or distribution of site-specific glycoforms. Improved analytical methods are needed to unlock the potential of these discoveries in validated clinical assays. A new generation of targeted quantitative assays is expected to advance the use of glycoproteins in early detection of diseases, molecular disease classification, and monitoring of therapeutic interventions. PMID:23399550
[Diary of a hospital evacuation. Discovery of a 5 hundredweight bomb from World War II].
Katter, I; Kunitz, O; Deller, A
2008-07-01
The discovery of an aircraft bomb from World War II made the complete evacuation of a tertiary care hospital with 629 beds and 17 specialist departments, including a neonatal intensive care unit, necessary. Some months before, an alarm plan had been issued and a fire practice had been carried out, which made it obvious to all concerned how important such measures are. Nevertheless, lessons for improvement could still be drawn from the evacuation, in particular regarding the rapid classification of the patients into categories and the fact that 20-30% of the patients needed stretcher-based transport for evacuation.
NASA Astrophysics Data System (ADS)
Schmalz, M.; Ritter, G.
Accurate multispectral or hyperspectral signature classification is key to the nonimaging detection and recognition of space objects. Additionally, signature classification accuracy depends on accurate spectral endmember determination [1]. Previous approaches to endmember computation and signature classification were based on linear operators or neural networks (NNs) expressed in terms of the algebra (R, +, ×) [1,2]. Unfortunately, class separation in these methods tends to be suboptimal, and the number of signatures that can be accurately classified often depends linearly on the number of NN inputs. This can lead to poor endmember distinction, as well as potentially significant classification errors in the presence of noise or densely interleaved signatures. In contrast to traditional CNNs, autoassociative morphological memories (AMM) are a construct similar to Hopfield autoassociative memories defined on the (R, +, ∨, ∧) lattice algebra [3]. Unlimited storage and perfect recall of noiseless real-valued patterns have been proven for AMMs [4]. However, AMMs suffer from sensitivity to specific noise models that can be characterized as erosive and dilative noise. On the other hand, the prior definition of a set of endmembers corresponds to material spectra lying on vertices of the minimum convex region covering the image data. These vertices can be characterized as morphologically independent patterns. It has further been shown that AMMs can be based on dendritic computation [3,6]. These techniques yield improved accuracy and class segmentation/separation ability in the presence of highly interleaved signature data. In this paper, we present a procedure for endmember determination based on AMM noise sensitivity, which employs morphological dendritic computation. We show that detected endmembers can be exploited by AMM-based classification techniques to achieve accurate signature classification in the presence of noise, closely spaced or interleaved signatures, and simulated camera optical distortions. In particular, we examine two critical cases: (1) classification of multiple closely spaced signatures that are difficult to separate using distance measures, and (2) classification of materials in simulated hyperspectral images of spaceborne satellites. In each case, test data are derived from a NASA database of space material signatures. Additional analysis pertains to computational complexity and noise sensitivity, which are superior to classical NN-based techniques.
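A hedged numpy sketch of autoassociative morphological memory construction and recall in the lattice algebra described above may clarify the mechanics: weights are pointwise minima/maxima of pattern differences, and recall uses max-plus and min-plus products. The patterns below are toy data; this is an illustration of the AMM construct, not the paper's endmember procedure.

```python
import numpy as np

def build_amm(X):
    """X: (n_patterns, n_features). Returns the W (min) and M (max) memories."""
    diffs = X[:, :, None] - X[:, None, :]        # diffs[k, i, j] = x_i^k - x_j^k
    W = diffs.min(axis=0)                        # w_ij = min_k (x_i^k - x_j^k)
    M = diffs.max(axis=0)                        # m_ij = max_k (x_i^k - x_j^k)
    return W, M

def maxplus_recall(W, x):
    return (W + x[None, :]).max(axis=1)          # y_i = max_j (w_ij + x_j)

def minplus_recall(M, x):
    return (M + x[None, :]).min(axis=1)          # y_i = min_j (m_ij + x_j)

X = np.array([[1.0, 4.0, 2.0], [3.0, 0.0, 5.0]])
W, M = build_amm(X)
print(maxplus_recall(W, X[0]))                   # perfect recall of a stored pattern
# Each memory tolerates one of the two noise types (erosive vs. dilative),
# which is the sensitivity the abstract exploits for endmember detection.
```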
What To Do, Instead of Counterproductive, Standardized Curricula and Testing.
ERIC Educational Resources Information Center
Keegan, Mark
1993-01-01
Presents evidence against the use of standardized curricula and testing and in favor of discovery-based instruction. Describes a successful experience with a relatively new form of discovery-based instruction, the scenario educational computer software. (AEF)
Analytical Procedures for Testability.
1983-01-01
The surviving abstract fragments evaluate a sequential classification procedure in a coronary care ward and, in the toxicology field, cite "A System of Computer Aided Diagnosis with Blood Serum Chemistry Tests and Bayesian Statistics" (AD 786284) and "Beat Internal Classifications" (AD A018516).
Correlation-based pattern recognition for implantable defibrillators.
Wilkins, J.
1996-01-01
An estimated 300,000 Americans die each year from cardiac arrhythmias. Historically, drug therapy or surgery were the only treatment options available for patients suffering from arrhythmias. Recently, implantable arrhythmia management devices have been developed. These devices allow abnormal cardiac rhythms to be sensed and corrected in vivo. Proper arrhythmia classification is critical to selecting the appropriate therapeutic intervention. The classification problem is made more challenging by the power/computation constraints imposed by the short battery life of implantable devices. Current devices utilize heart rate-based classification algorithms. Although easy to implement, rate-based approaches have unacceptably high error rates in distinguishing supraventricular tachycardia (SVT) from ventricular tachycardia (VT). Conventional morphology assessment techniques used in ECG analysis often require too much computation to be practical for implantable devices. In this paper, a computationally efficient arrhythmia classification architecture using correlation-based morphology assessment is presented. The architecture classifies individual heart beats by assessing similarity between an incoming cardiac signal vector and a series of prestored class templates. A series of these beat classifications is used to make an overall rhythm assessment. The system makes use of several new results in the field of pattern recognition. The resulting system achieved excellent accuracy in discriminating SVT and VT. PMID:8947674
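The beat-level step described above reduces to template matching by normalized correlation followed by a rhythm call over a run of beats. The following is a minimal numpy sketch of that idea; the correlation threshold, templates, and the majority-vote rhythm rule are assumptions rather than the paper's exact design.

```python
import numpy as np

def normalized_correlation(beat, template):
    b = beat - beat.mean()
    t = template - template.mean()
    return float(b @ t / (np.linalg.norm(b) * np.linalg.norm(t) + 1e-12))

def classify_beat(beat, templates, labels, min_corr=0.8):
    scores = [normalized_correlation(beat, t) for t in templates]
    best = int(np.argmax(scores))
    return labels[best] if scores[best] >= min_corr else "unclassified"

def classify_rhythm(beats, templates, labels):
    calls = [classify_beat(b, templates, labels) for b in beats]
    return max(set(calls), key=calls.count)      # overall rhythm = majority label
```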
NASA Astrophysics Data System (ADS)
Niazmardi, S.; Safari, A.; Homayouni, S.
2017-09-01
Crop mapping through classification of Satellite Image Time-Series (SITS) data can provide very valuable information for several agricultural applications, such as crop monitoring, yield estimation, and crop inventory. However, SITS data classification is not straightforward, because different images in a SITS dataset carry different levels of information relevant to the classification problem. Moreover, SITS data are four-dimensional and cannot be classified using conventional classification algorithms. To address these issues, in this paper we present a classification strategy based on Multiple Kernel Learning (MKL) algorithms for SITS data classification. In this strategy, different kernels are first constructed from the different images of the SITS data and then combined into a composite kernel using MKL algorithms. The composite kernel, once constructed, can be used for classification of the data with kernel-based classification algorithms. We compared the computational time and classification performance of the proposed strategy with different MKL algorithms for the purpose of crop mapping. The considered MKL algorithms are MKL-Sum, SimpleMKL, LPMKL and Group-Lasso MKL. Experimental tests of the proposed strategy on two SITS data sets, acquired by SPOT satellite sensors, showed that the strategy provides better performance than the standard classification algorithm. The results also showed that the optimization method of the MKL algorithm used affects both the computational time and the classification accuracy of the strategy.
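A hedged sketch of the composite-kernel idea follows: one RBF kernel per image date in the time series, combined with weights, then used as a precomputed kernel for an SVM. Real MKL algorithms (SimpleMKL and the others named above) learn the weights; the uniform weighting, gamma value, and data layout below are stand-in assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def composite_kernel(sits, weights=None, gamma=0.5):
    """sits: (n_pixels, n_dates, n_bands) array of per-pixel time series."""
    n_dates = sits.shape[1]
    weights = np.full(n_dates, 1.0 / n_dates) if weights is None else weights
    # one kernel per acquisition date, combined into a single composite kernel
    return sum(w * rbf_kernel(sits[:, t, :], gamma=gamma)
               for t, w in zip(range(n_dates), weights))

def train(sits, labels):
    K = composite_kernel(sits)
    return SVC(kernel="precomputed").fit(K, labels)
```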
Peker, Musa; Şen, Baha; Gürüler, Hüseyin
2015-02-01
The effect of anesthesia on the patient is referred to as the depth of anesthesia. Rapid classification of the appropriate depth level of anesthesia is a matter of great importance in surgical operations. Similarly, accelerating classification algorithms is important for the rapid solution of problems in the field of biomedical signal processing. However, numerous time-consuming mathematical operations are required during the training and testing stages of classification algorithms, especially in neural networks. In this study, to accelerate the process, the parallel programming and computing platform Nvidia CUDA, which facilitates dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU), was utilized. The system was employed to detect the anesthetic depth level on a related electroencephalogram (EEG) data set. This dataset is rather complex and large. Moreover, detecting more anesthetic levels with a rapid response is critical in anesthesia. The proposed parallelization method yielded highly accurate classification results in a shorter time.
Studying Scientific Discovery by Computer Simulation.
1983-03-30
Examples treated include Mendel's laws of inheritance, the law of Gay-Lussac for gaseous reactions, the law of Dulong and Petit, and the derivation of atomic weights by Avogadro. Keywords: scientific discovery, intrinsic properties, physical laws, extensive terms, data-driven heuristics, intensive terms, theory-driven heuristics, conservation laws.
NASA Astrophysics Data System (ADS)
Ness, P. H.; Jacobson, H.
1984-10-01
The thrust of 'group technology' is toward the exploitation of similarities in component design and manufacturing process plans to achieve assembly line flow cost efficiencies for small batch production. The systematic method devised for the identification of similarities in component geometry and processing steps is a coding and classification scheme implemented by interactive CAD/CAM systems. This coding and classification scheme has led to significant increases in computer processing power, allowing rapid searches and retrievals on the basis of a 30-digit code together with user-friendly computer graphics.
Designing and Implementation of River Classification Assistant Management System
NASA Astrophysics Data System (ADS)
Zhao, Yinjun; Jiang, Wenyuan; Yang, Rujun; Yang, Nan; Liu, Haiyan
2018-03-01
In an earlier publication, we proposed a new Decision Classifier (DCF) for classifying Chinese rivers based on their structure. To expand, enhance and promote the application of the DCF, we built a computer system to support river classification, named the River Classification Assistant Management System. Based on the ArcEngine and ArcServer platforms, this system implements functions such as data management, river network extraction, river classification, and publication of results, under a combined Client/Server and Browser/Server framework.
Sign use and cognition in automated scientific discovery: are computers only special kinds of signs?
NASA Astrophysics Data System (ADS)
Giza, Piotr
2018-04-01
James Fetzer criticizes the computational paradigm, prevailing in cognitive science by questioning, what he takes to be, its most elementary ingredient: that cognition is computation across representations. He argues that if cognition is taken to be a purposive, meaningful, algorithmic problem solving activity, then computers are incapable of cognition. Instead, they appear to be signs of a special kind, that can facilitate computation. He proposes the conception of minds as semiotic systems as an alternative paradigm for understanding mental phenomena, one that seems to overcome the difficulties of computationalism. Now, I argue, that with computer systems dealing with scientific discovery, the matter is not so simple as that. The alleged superiority of humans using signs to stand for something other over computers being merely "physical symbol systems" or "automatic formal systems" is only easy to establish in everyday life, but becomes far from obvious when scientific discovery is at stake. In science, as opposed to everyday life, the meaning of symbols is, apart from very low-level experimental investigations, defined implicitly by the way the symbols are used in explanatory theories or experimental laws relevant to the field, and in consequence, human and machine discoverers are much more on a par. Moreover, the great practical success of the genetic programming method and recent attempts to apply it to automatic generation of cognitive theories seem to show, that computer systems are capable of very efficient problem solving activity in science, which is neither purposive nor meaningful, nor algorithmic. This, I think, undermines Fetzer's argument that computer systems are incapable of cognition because computation across representations is bound to be a purposive, meaningful, algorithmic problem solving activity.
2014 Annual Report - Argonne Leadership Computing Facility
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collins, James R.; Papka, Michael E.; Cerny, Beth A.
The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding in a broad range of disciplines.
2015 Annual Report - Argonne Leadership Computing Facility
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collins, James R.; Papka, Michael E.; Cerny, Beth A.
The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding in a broad range of disciplines.
Medicinal chemistry in drug discovery in big pharma: past, present and future.
Campbell, Ian B; Macdonald, Simon J F; Procopiou, Panayiotis A
2018-02-01
The changes in synthetic and medicinal chemistry and related drug discovery science as practiced in big pharma over the past few decades are described. These have been predominantly driven by wider changes in society, namely the computer, the internet and globalisation. Thoughts about the future of medicinal chemistry are also discussed, including sharing the risks and costs of drug discovery and the future of outsourcing. The continuing impact of access to substantial computing power and big data, and the use of algorithms in data analysis and drug design, are also presented. The next generation of medicinal chemists will communicate in ways that reflect social media and the results of constantly being connected to each other and to data. Copyright © 2017. Published by Elsevier Ltd.
The discovery of the causes of leprosy: A computational analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Corruble, V.; Ganascia, J.G.
1996-12-31
The role played by inductive inference has been studied extensively in the field of Scientific Discovery. The work presented here tackles the problem of induction in medical research. The discovery of the causes of leprosy is analyzed and simulated using computational means. An inductive algorithm is proposed, which is successful in simulating some essential steps in the progress of the understanding of the disease. It also allows us to simulate the false reasoning of previous centuries through the introduction of some medical a priori knowledge inherited from archaic medicine. Corroborating previous research, this problem illustrates the importance of the social and cultural environment on the way inductive inference is performed in medicine.
Center for Technology for Advanced Scientific Component Software (TASCS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kostadin, Damevski
A resounding success of the Scientific Discovery through Advanced Computing (SciDAC) program is that high-performance computational science is now universally recognized as a critical aspect of scientific discovery [71], complementing both theoretical and experimental research. As scientific communities prepare to exploit unprecedented computing capabilities of emerging leadership-class machines for multi-model simulations at the extreme scale [72], it is more important than ever to address the technical and social challenges of geographically distributed teams that combine expertise in domain science, applied mathematics, and computer science to build robust and flexible codes that can incorporate changes over time. The Center for Technology for Advanced Scientific Component Software (TASCS) tackles these issues by exploiting component-based software development to facilitate collaborative high-performance scientific computing.
Establishing a Framework for a Natural Area Taxonomy.
Ebach, Malte C; Michaux, Bernard
2017-09-01
The identification of areas of endemism is essential in building an area classification, but plays little role in how natural areas are discovered. Rather, area monophyly, derived from cladistics, is essential in the discovery of natural area classifications or area taxonomy. We propose Area Taxonomy to be a new sub-discipline of historical biogeography, one that can be revised and debated, and which has its own area nomenclature. Separately from area taxonomy, we outline how natural areas may be discovered by transcribing the concepts of homology and monophyly from biological systematics to historical biogeography, in the form of area homologues, area homologies and area monophyly.
Implementation of Multispectral Image Classification on a Remote Adaptive Computer
NASA Technical Reports Server (NTRS)
Figueiredo, Marco A.; Gloster, Clay S.; Stephens, Mark; Graves, Corey A.; Nakkar, Mouna
1999-01-01
As the demand for higher performance computers for the processing of remote sensing science algorithms increases, the need to investigate new computing paradigms is justified. Field Programmable Gate Arrays enable the implementation of algorithms at the hardware gate level, leading to orders of magnitude performance increase over microprocessor-based systems. The automatic classification of spaceborne multispectral images is an example of a computation-intensive application that can benefit from implementation on an FPGA-based custom computing machine (adaptive or reconfigurable computer). A probabilistic neural network is used here to classify pixels of a multispectral LANDSAT-2 image. The implementation described utilizes Java client/server application programs to access the adaptive computer from a remote site. Results verify that a remote hardware version of the algorithm (implemented on an adaptive computer) is significantly faster than a local software version of the same algorithm implemented on a typical general-purpose computer.
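For readers unfamiliar with probabilistic neural networks, the per-pixel classification step above can be sketched in a few lines of numpy: each class score is a Parzen-window sum of Gaussian kernels centred on that class's training pixels. The kernel width and the assumption of a flat numpy layout are illustrative, not details of the FPGA implementation.

```python
import numpy as np

def pnn_classify(pixel, train_pixels, train_labels, sigma=0.1):
    """pixel: (n_bands,); train_pixels: (n_train, n_bands); train_labels: (n_train,)."""
    d2 = np.sum((train_pixels - pixel) ** 2, axis=1)
    k = np.exp(-d2 / (2.0 * sigma ** 2))                        # pattern-layer activations
    classes = np.unique(train_labels)
    scores = [k[train_labels == c].mean() for c in classes]     # summation layer
    return classes[int(np.argmax(scores))]                      # decision layer
```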
Chiarelli, Antonio Maria; Croce, Pierpaolo; Merla, Arcangelo; Zappasodi, Filippo
2018-06-01
Brain-computer interface (BCI) refers to procedures that link the central nervous system to a device. BCI was historically performed using electroencephalography (EEG). In the last years, encouraging results were obtained by combining EEG with other neuroimaging technologies, such as functional near infrared spectroscopy (fNIRS). A crucial step of BCI is brain state classification from recorded signal features. Deep artificial neural networks (DNNs) recently reached unprecedented complex classification outcomes. These performances were achieved through increased computational power, efficient learning algorithms, valuable activation functions, and restricted or back-fed neurons connections. By expecting significant overall BCI performances, we investigated the capabilities of combining EEG and fNIRS recordings with state-of-the-art deep learning procedures. We performed a guided left and right hand motor imagery task on 15 subjects with a fixed classification response time of 1 s and overall experiment length of 10 min. Left versus right classification accuracy of a DNN in the multi-modal recording modality was estimated and it was compared to standalone EEG and fNIRS and other classifiers. At a group level we obtained significant increase in performance when considering multi-modal recordings and DNN classifier with synergistic effect. BCI performances can be significantly improved by employing multi-modal recordings that provide electrical and hemodynamic brain activity information, in combination with advanced non-linear deep learning classification procedures.
NASA Astrophysics Data System (ADS)
Chiarelli, Antonio Maria; Croce, Pierpaolo; Merla, Arcangelo; Zappasodi, Filippo
2018-06-01
Objective. Brain–computer interface (BCI) refers to procedures that link the central nervous system to a device. BCI was historically performed using electroencephalography (EEG). In the last years, encouraging results were obtained by combining EEG with other neuroimaging technologies, such as functional near infrared spectroscopy (fNIRS). A crucial step of BCI is brain state classification from recorded signal features. Deep artificial neural networks (DNNs) recently reached unprecedented complex classification outcomes. These performances were achieved through increased computational power, efficient learning algorithms, valuable activation functions, and restricted or back-fed neurons connections. By expecting significant overall BCI performances, we investigated the capabilities of combining EEG and fNIRS recordings with state-of-the-art deep learning procedures. Approach. We performed a guided left and right hand motor imagery task on 15 subjects with a fixed classification response time of 1 s and overall experiment length of 10 min. Left versus right classification accuracy of a DNN in the multi-modal recording modality was estimated and it was compared to standalone EEG and fNIRS and other classifiers. Main results. At a group level we obtained significant increase in performance when considering multi-modal recordings and DNN classifier with synergistic effect. Significance. BCI performances can be significantly improved by employing multi-modal recordings that provide electrical and hemodynamic brain activity information, in combination with advanced non-linear deep learning classification procedures.
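As a hedged illustration of the multi-modal classification idea in the two records above, the sketch below concatenates EEG and fNIRS feature vectors and feeds them to a small fully connected PyTorch network with two outputs (left vs. right motor imagery). Feature dimensions, layer sizes, and dropout are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MultiModalBCINet(nn.Module):
    def __init__(self, eeg_dim=64, fnirs_dim=16, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(eeg_dim + fnirs_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, hidden // 2),
            nn.ReLU(),
            nn.Linear(hidden // 2, 2),       # left vs right hand (logits)
        )

    def forward(self, eeg_feats, fnirs_feats):
        x = torch.cat([eeg_feats, fnirs_feats], dim=1)   # fuse the two modalities
        return self.net(x)

model = MultiModalBCINet()
logits = model(torch.randn(8, 64), torch.randn(8, 16))  # batch of 8 trials
```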
Haptic Classification of Common Objects: Knowledge-Driven Exploration.
ERIC Educational Resources Information Center
Lederman, Susan J.; Klatzky, Roberta L.
1990-01-01
Theoretical and empirical issues relating to haptic exploration and the representation of common objects during haptic classification were investigated in 3 experiments involving a total of 112 college students. Results are discussed in terms of a computational model of human haptic object classification with implications for dextrous robot…
A fortran program for Monte Carlo simulation of oil-field discovery sequences
Bohling, Geoffrey C.; Davis, J.C.
1993-01-01
We have developed a program for performing Monte Carlo simulation of oil-field discovery histories. A synthetic parent population of fields is generated as a finite sample from a distribution of specified form. The discovery sequence then is simulated by sampling without replacement from this parent population in accordance with a probabilistic discovery process model. The program computes a chi-squared deviation between synthetic and actual discovery sequences as a function of the parameters of the discovery process model, the number of fields in the parent population, and the distributional parameters of the parent population. The program employs the three-parameter log gamma model for the distribution of field sizes and employs a two-parameter discovery process model, allowing the simulation of a wide range of scenarios. © 1993.
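A hedged numpy sketch of the simulation loop described above: a synthetic parent population of field sizes is drawn, and a discovery sequence is sampled without replacement with probability proportional to size**beta, a common form of discovery-process model. The lognormal parent and the simple chi-squared comparison of sequences are stand-ins for the paper's three-parameter log-gamma distribution and its full parameter study.

```python
import numpy as np

def simulate_discovery(n_fields, beta, rng, mu=3.0, sigma=1.0):
    sizes = rng.lognormal(mean=mu, sigma=sigma, size=n_fields)   # parent population
    remaining = sizes.copy()
    sequence = []
    for _ in range(n_fields):
        p = remaining ** beta
        p /= p.sum()
        i = rng.choice(len(remaining), p=p)        # larger fields tend to be found earlier
        sequence.append(remaining[i])
        remaining = np.delete(remaining, i)
    return np.array(sequence)

def chi_squared(simulated, observed):
    return float(np.sum((simulated - observed) ** 2 / (observed + 1e-12)))

rng = np.random.default_rng(42)
observed = simulate_discovery(50, beta=1.0, rng=rng)             # stand-in "actual" history
print(chi_squared(simulate_discovery(50, beta=0.5, rng=rng), observed))
```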
Jiang, Guoqian; Wang, Chen; Zhu, Qian; Chute, Christopher G
2013-01-01
Knowledge-driven text mining is becoming an important research area for identifying pharmacogenomics target genes. However, few such studies have focused on the pharmacogenomics targets of adverse drug events (ADEs). The objective of the present study is to build a framework of knowledge integration and discovery that aims to support pharmacogenomics target prediction for ADEs. We integrate a semantically annotated literature corpus, Semantic MEDLINE, with a semantically coded ADE knowledgebase known as ADEpedia, using a semantic web based framework. We developed a knowledge discovery approach combining network analysis of a protein-protein interaction (PPI) network with gene functional classification. We performed a case study of drug-induced long QT syndrome to demonstrate the usefulness of the framework in predicting potential pharmacogenomics targets of ADEs.
Translating genomic information into clinical medicine: lung cancer as a paradigm.
Levy, Mia A; Lovly, Christine M; Pao, William
2012-11-01
We are currently in an era of rapidly expanding knowledge about the genetic landscape and architectural blueprints of various cancers. These discoveries have led to a new taxonomy of malignant diseases based upon clinically relevant molecular alterations in addition to histology or tissue of origin. The new molecularly based classification holds the promise of rational rather than empiric approaches for the treatment of cancer patients. However, the accelerated pace of discovery and the expanding number of targeted anti-cancer therapies present a significant challenge for healthcare practitioners to remain informed and up-to-date on how to apply cutting-edge discoveries into daily clinical practice. In this Perspective, we use lung cancer as a paradigm to discuss challenges related to translating genomic information into the clinic, and we present one approach we took at Vanderbilt-Ingram Cancer Center to address these challenges.
Duong, Luc; Cheriet, Farida; Labelle, Hubert; Cheung, Kenneth M C; Abel, Mark F; Newton, Peter O; McCall, Richard E; Lenke, Lawrence G; Stokes, Ian A F
2009-08-01
Interobserver and intraobserver reliability study for the identification of the Lenke classification lumbar modifier by a panel of experts compared with a computer algorithm. To measure the variability of the Lenke classification lumbar modifier and determine if computer assistance using 3-dimensional spine models can improve the reliability of classification. The lumbar modifier has been proposed to subclassify Lenke scoliotic curve types into A, B, and C on the basis of the relationship between the central sacral vertical line (CSVL) and the apical lumbar vertebra. Landmarks for identification of the CSVL have not been clearly defined, and the reliability of the actual CSVL position and lumbar modifier selection have never been tested independently. Therefore, the value of the lumbar modifier for curve classification remains unknown. The preoperative radiographs of 68 patients with adolescent idiopathic scoliosis presenting a Lenke type 1 curve were measured manually twice by 6 members of the Scoliosis Research Society 3-dimensional classification committee at a 6-month interval. Intraobserver and interobserver reliability was quantified using the percentage of agreement and kappa statistics. In addition, the lumbar curve of all subjects was reconstructed in 3 dimensions using a stereoradiographic technique and was submitted to a computer algorithm to infer the lumbar modifier according to measurements from the pedicles. Interobserver rates for the first trial showed a mean kappa value of 0.56. Second trial rates were higher with a mean kappa value of 0.64. Intraobserver rates were evaluated at a mean kappa value of 0.69. The computer algorithm was successful in identifying the lumbar curve type and was in agreement with the observers in up to 93% of cases. Agreement between and within observers for the Lenke lumbar modifier is only moderate to substantial with manual methods. Computer assistance with 3-dimensional models of the spine has the potential to decrease this variability.
2009-05-30
Keywords: liver, kidney, toxicity, gene expression, nephrotoxin, D-serine, hippuric acid, Puromycin, Amphotericin B. A surviving abstract fragment (from the section on hippuric acid) notes variable concentrations in urine and much lower concentrations elsewhere, in connection with the polyphenols chlorogenic acid, quinic acid and caffeic acid.
ERIC Educational Resources Information Center
Kraft, Donald H., Ed.
The 2000 ASIS (American Society for Information Science) conference explored knowledge innovation. The tracks in the conference program included knowledge discovery, capture, and creation; classification and representation; information retrieval; knowledge dissemination; and social, behavioral, ethical, and legal aspects. This proceedings is…
Inhibition of Retinoblastoma Protein Inactivation
2016-09-01
Keywords: retinoblastoma (Rb) protein and pathway, E2F transcription factor, high-throughput screening, drug discovery, fragment-based screening, X-ray crystallography, cancer, cell-cycle inhibition, activation, modulation. The report notes that a method was developed to perform fragment-based screening by X-ray crystallography.
ERIC Educational Resources Information Center
Atherton, Pauline; And Others
A single issue of Nuclear Science Abstracts, containing about 2,300 abstracts, was indexed by Universal Decimal Classification (UDC) using the Special Subject Edition of UDC for Nuclear Science and Technology. The descriptive cataloging and UDC-indexing records formed a computer-stored data base. A systematic random sample of 500 additional…
ERIC Educational Resources Information Center
Kitsantas, Anastasia; Kitsantas, Panagiota; Kitsantas, Thomas
2012-01-01
The purpose of this exploratory study was to assess the relative importance of a number of variables in predicting students' interest in math and/or computer science. Classification and regression trees (CART) were employed in the analysis of survey data collected from 276 college students enrolled in two U.S. and Greek universities. The results…
Land classification of south-central Iowa from computer enhanced images
NASA Technical Reports Server (NTRS)
Lucas, J. R. (Principal Investigator); Taranik, J. V.; Billingsley, F. C.
1976-01-01
The author has identified the following significant results. The Iowa Geological Survey developed its own capability for producing color products from digitally enhanced LANDSAT data. Research showed that efficient production of enhanced images required full utilization of both computer and photographic enhancement procedures. The 29 August 1972 photo-optically enhanced color composite was more easily interpreted for land classification purposes than standard color composites.
Neural attractor network for application in visual field data classification.
Fink, Wolfgang
2004-07-07
The purpose was to introduce a novel method for computer-based classification of visual field data derived from perimetric examination, that may act as a 'counsellor', providing an independent 'second opinion' to the diagnosing physician. The classification system consists of a Hopfield-type neural attractor network that obtains its input data from perimetric examination results. An iterative relaxation process determines the states of the neurons dynamically. Therefore, even 'noisy' perimetric output, e.g., early stages of a disease, may eventually be classified correctly according to the predefined idealized visual field defect (scotoma) patterns, stored as attractors of the network, that are found with diseases of the eye, optic nerve and the central nervous system. Preliminary tests of the classification system on real visual field data derived from perimetric examinations have shown a classification success of over 80%. Some of the main advantages of the Hopfield-attractor-network-based approach over feed-forward type neural networks are: (1) network architecture is defined by the classification problem; (2) no training is required to determine the neural coupling strengths; (3) assignment of an auto-diagnosis confidence level is possible by means of an overlap parameter and the Hamming distance. In conclusion, the novel method for computer-based classification of visual field data, presented here, furnishes a valuable first overview and an independent 'second opinion' in judging perimetric examination results, pointing towards a final diagnosis by a physician. It should not be considered a substitute for the diagnosing physician. Thanks to the worldwide accessibility of the Internet, the classification system offers a promising perspective towards modern computer-assisted diagnosis in both medicine and tele-medicine, for example and in particular, with respect to non-ophthalmic clinics or in communities where perimetric expertise is not readily available.
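The attractor-network mechanics described above reduce to a few numpy operations, sketched below under simplifying assumptions: idealized scotoma patterns are stored as +/-1 vectors with a Hebbian rule (no training loop, matching advantage (2) above), a noisy visual-field vector is relaxed iteratively, and the result is judged by the overlap and Hamming distance mentioned in advantage (3). The synchronous update and pattern encoding are illustrative choices.

```python
import numpy as np

def store_patterns(patterns):
    """patterns: (n_patterns, n_units) array of +/-1 idealized defect patterns."""
    P = np.asarray(patterns, dtype=float)
    W = P.T @ P / P.shape[1]                       # Hebbian coupling strengths
    np.fill_diagonal(W, 0.0)                       # no self-coupling
    return W

def relax(W, state, n_iter=20):
    s = state.copy()
    for _ in range(n_iter):                        # iterative relaxation to an attractor
        s = np.sign(W @ s)
        s[s == 0] = 1
    return s

def classify(state, patterns):
    overlaps = patterns @ state / state.size       # overlap = confidence measure
    hamming = np.sum(patterns != state, axis=1)    # distance to each stored attractor
    best = int(np.argmax(overlaps))
    return best, float(overlaps[best]), int(hamming[best])
```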
CP-CHARM: segmentation-free image classification made accessible.
Uhlmann, Virginie; Singh, Shantanu; Carpenter, Anne E
2016-01-27
Automated classification using machine learning often relies on features derived from segmenting individual objects, which can be difficult to automate. WND-CHARM is a previously developed classification algorithm in which features are computed on the whole image, thereby avoiding the need for segmentation. The algorithm obtained encouraging results but requires considerable computational expertise to execute. Furthermore, some benchmark sets have been shown to be subject to confounding artifacts that overestimate classification accuracy. We developed CP-CHARM, a user-friendly image-based classification algorithm inspired by WND-CHARM in (i) its ability to capture a wide variety of morphological aspects of the image, and (ii) the absence of requirement for segmentation. In order to make such an image-based classification method easily accessible to the biological research community, CP-CHARM relies on the widely-used open-source image analysis software CellProfiler for feature extraction. To validate our method, we reproduced WND-CHARM's results and ensured that CP-CHARM obtained comparable performance. We then successfully applied our approach on cell-based assay data and on tissue images. We designed these new training and test sets to reduce the effect of batch-related artifacts. The proposed method preserves the strengths of WND-CHARM - it extracts a wide variety of morphological features directly on whole images thereby avoiding the need for cell segmentation, but additionally, it makes the methods easily accessible for researchers without computational expertise by implementing them as a CellProfiler pipeline. It has been demonstrated to perform well on a wide range of bioimage classification problems, including on new datasets that have been carefully selected and annotated to minimize batch effects. This provides for the first time a realistic and reliable assessment of the whole image classification strategy.
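To make the segmentation-free idea concrete, the sketch below computes features on the whole image (here only an intensity histogram plus a few global statistics, far fewer than the CHARM feature set) and passes them to an off-the-shelf classifier. The feature set and the choice of linear discriminant analysis are stand-ins, not CP-CHARM's actual pipeline.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def whole_image_features(img, n_bins=32):
    """img: 2D grayscale array; note that no cell segmentation is performed."""
    hist, _ = np.histogram(img, bins=n_bins,
                           range=(img.min(), img.max() + 1e-9), density=True)
    stats = [img.mean(), img.std(), np.median(img),
             np.percentile(img, 10), np.percentile(img, 90)]
    return np.concatenate([hist, stats])

def train(images, labels):
    X = np.array([whole_image_features(im) for im in images])
    return LinearDiscriminantAnalysis().fit(X, labels)
```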
Hua, Kai-Lung; Hsu, Che-Hao; Hidayati, Shintami Chusnul; Cheng, Wen-Huang; Chen, Yu-Jen
2015-01-01
Lung cancer has a poor prognosis when not diagnosed early and unresectable lesions are present. The management of small lung nodules noted on computed tomography scan is controversial due to uncertain tumor characteristics. A conventional computer-aided diagnosis (CAD) scheme requires several image processing and pattern recognition steps to accomplish a quantitative tumor differentiation result. In such an ad hoc image analysis pipeline, every step depends heavily on the performance of the previous step. Accordingly, tuning of classification performance in a conventional CAD scheme is very complicated and arduous. Deep learning techniques, on the other hand, have the intrinsic advantage of an automatic exploitation feature and tuning of performance in a seamless fashion. In this study, we attempted to simplify the image analysis pipeline of conventional CAD with deep learning techniques. Specifically, we introduced models of a deep belief network and a convolutional neural network in the context of nodule classification in computed tomography images. Two baseline methods with feature computing steps were implemented for comparison. The experimental results suggest that deep learning methods could achieve better discriminative results and hold promise in the CAD application domain. PMID:26346558
Hua, Kai-Lung; Hsu, Che-Hao; Hidayati, Shintami Chusnul; Cheng, Wen-Huang; Chen, Yu-Jen
2015-01-01
Lung cancer has a poor prognosis when not diagnosed early and unresectable lesions are present. The management of small lung nodules noted on computed tomography scan is controversial due to uncertain tumor characteristics. A conventional computer-aided diagnosis (CAD) scheme requires several image processing and pattern recognition steps to accomplish a quantitative tumor differentiation result. In such an ad hoc image analysis pipeline, every step depends heavily on the performance of the previous step. Accordingly, tuning of classification performance in a conventional CAD scheme is very complicated and arduous. Deep learning techniques, on the other hand, have the intrinsic advantage of an automatic exploitation feature and tuning of performance in a seamless fashion. In this study, we attempted to simplify the image analysis pipeline of conventional CAD with deep learning techniques. Specifically, we introduced models of a deep belief network and a convolutional neural network in the context of nodule classification in computed tomography images. Two baseline methods with feature computing steps were implemented for comparison. The experimental results suggest that deep learning methods could achieve better discriminative results and hold promise in the CAD application domain.
Hit discovery and hit-to-lead approaches.
Keseru, György M; Makara, Gergely M
2006-08-01
Hit discovery technologies range from traditional high-throughput screening to affinity selection of large libraries, fragment-based techniques and computer-aided de novo design, many of which have been extensively reviewed. Development of quality leads using hit confirmation and hit-to-lead approaches present their own challenges, depending on the hit discovery method used to identify the initial hits. In this paper, we summarize common industry practices adopted to tackle hit-to-lead challenges and review how the advantages and drawbacks of different hit discovery techniques could affect the various issues hit-to-lead groups face.
Pacharawongsakda, Eakasit; Theeramunkong, Thanaruk
2013-12-01
Predicting protein subcellular location is one of the major challenges in Bioinformatics, since such knowledge helps us understand protein functions and enables us to select targeted proteins during the drug discovery process. While many computational techniques have been proposed to improve predictive performance for protein subcellular location, they have several shortcomings. In this work, we propose a method to solve three main issues in such techniques: (i) manipulation of multiplex proteins which may exist in or move between multiple cellular compartments, (ii) handling of high dimensionality in the input and output spaces, and (iii) the requirement of sufficient labeled data for model training. Towards these issues, this work presents a new computational method for predicting proteins which have either single or multiple locations. The proposed technique, namely iFLAST-CORE, incorporates dimensionality reduction in the feature and label spaces with a co-training paradigm for semi-supervised multi-label classification. For this purpose, the Singular Value Decomposition (SVD) is applied to transform the high-dimensional feature space and label space into lower-dimensional spaces. After that, due to the limitation of labeled data, co-training regression makes use of unlabeled data by predicting the target values of unlabeled data in the lower-dimensional spaces. In the last step, the components of the SVD are used to project labels in the lower-dimensional space back to those in the original space, and an adaptive threshold is used to map a numeric value to a binary value for label determination. A set of experiments on viral proteins and gram-negative bacterial proteins shows that our proposed method improves the classification performance in terms of various evaluation metrics, such as Aiming (or Precision), Coverage (or Recall) and macro F-measure, compared to the traditional method that uses only labeled data.
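Two of the steps named above, SVD-based reduction of the label space and the back-projection plus thresholding for label decisions, can be sketched compactly in numpy. The ordinary least-squares regression in the reduced space is only a stand-in for the co-training regression, which additionally exploits unlabeled data, and the fixed threshold stands in for the paper's adaptive one.

```python
import numpy as np

def fit(X, Y, k=10):
    """X: (n, d) feature matrix; Y: (n, L) binary label matrix."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    Vk = Vt[:k]                                    # top-k label-space components
    Z = Y @ Vk.T                                   # labels projected into reduced space
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)      # linear map: features -> reduced labels
    return W, Vk

def predict(X, W, Vk, threshold=0.5):
    scores = X @ W @ Vk                            # back-project to the original label space
    return (scores >= threshold).astype(int)       # threshold turns scores into label bits
```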
Computational Chemistry in the Pharmaceutical Industry: From Childhood to Adolescence.
Hillisch, Alexander; Heinrich, Nikolaus; Wild, Hanno
2015-12-01
Computational chemistry within the pharmaceutical industry has grown into a field that proactively contributes to many aspects of drug design, including target selection and lead identification and optimization. While methodological advancements have been key to this development, organizational developments have been crucial to our success as well. In particular, the interaction between computational and medicinal chemistry and the integration of computational chemistry into the entire drug discovery process have been invaluable. Over the past ten years we have shaped and developed a highly efficient computational chemistry group for small-molecule drug discovery at Bayer HealthCare that has significantly impacted the clinical development pipeline. In this article we describe the setup and tasks of the computational group and discuss external collaborations. We explain what we have found to be the most valuable and productive methods and discuss future directions for computational chemistry method development. We share this information with the hope of igniting interesting discussions around this topic. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Njogu, Peter M; Guantai, Eric M; Pavadai, Elumalai; Chibale, Kelly
2016-01-08
Despite the tremendous improvement in overall global health heralded by the adoption of the Millennium Declaration in the year 2000, tropical infections remain a major health problem in the developing world. Recent estimates indicate that the major tropical infectious diseases, namely, malaria, tuberculosis, trypanosomiasis, and leishmaniasis, account for more than 2.2 million deaths and a loss of approximately 85 million disability-adjusted life years annually. The crucial role of chemotherapy in curtailing the deleterious health and economic impacts of these infections has invigorated the search for new drugs against tropical infectious diseases. The research efforts have involved increased application of computational technologies in mainstream drug discovery programs at the hit identification, hit-to-lead, and lead optimization stages. This review highlights various computer-aided drug discovery approaches that have been utilized in efforts to identify novel antimalarial, antitubercular, antitrypanosomal, and antileishmanial agents. The focus is largely on developments over the past 5 years (2010-2014).
Computational medicinal chemistry in fragment-based drug discovery: what, how and when.
Rabal, Obdulia; Urbano-Cuadrado, Manuel; Oyarzabal, Julen
2011-01-01
The use of fragment-based drug discovery (FBDD) has increased in the last decade due to the encouraging results obtained to date. In this scenario, computational approaches, together with experimental information, play an important role in guiding and speeding up the process. By default, FBDD is generally considered a constructive approach. However, such additive behavior is not always present; therefore, simple fragment maturation will not always deliver the expected results. In this review, computational approaches utilized in FBDD are reported together with real case studies in which their applicability domains are exemplified, in order to analyze them and thus maximize their performance and reliability. Proper use of these computational tools can minimize misleading conclusions, preserving the credibility of the FBDD strategy, and achieve higher impact in the drug-discovery process. FBDD goes one step beyond a simple constructive approach. A broad set of computational tools - docking, R-group quantitative structure-activity relationship, fragmentation tools, fragment management tools, patent analysis and fragment hopping, for example - can be utilized in FBDD, providing a clear positive impact if they are applied in the proper scenario: what, how and when. An initial assessment of additive/non-additive behavior is a critical point in defining the most convenient approach for fragment elaboration.
A survey to identify the clinical coding and classification systems currently in use across Europe.
de Lusignan, S; Minmagh, C; Kennedy, J; Zeimet, M; Bommezijn, H; Bryant, J
2001-01-01
This is a survey to identify which clinical coding systems are currently in use across the European Union and the states seeking membership to it. We sought to identify which systems are currently used and to what extent they are subject to local adaptation. Clinical coding should facilitate identifying key medical events in a computerised medical record and aggregating information across groups of records. The emerging new driver is its role as the enabler of the life-long computerised medical record. A prerequisite for this level of functionality is the transfer of information between different computer systems. This transfer can be facilitated either by working on the interoperability problems between disparate systems or by harmonising the underlying data. This paper examines the extent to which the latter has occurred across Europe. The methods comprised a literature and Internet search and requests for information via electronic mail to pan-European mailing lists of health informatics professionals. Coding systems are now a de facto part of health information systems across Europe, and relatively few coding systems are in use; ICD-9 and ICD-10, ICPC and Read were the most established. However, the local adaptation of these classification systems, whether on a per-country or per-software-manufacturer basis, significantly reduces the ability of the meaning coded within patients' computer records to be transferred easily from one medical record system to another. There is no longer any debate as to whether a coding or classification system should be used. Convergence of different classification systems should be encouraged. Countries and computer manufacturers within the EU should be encouraged to stop making local modifications to coding and classification systems, as this practice risks significantly slowing progress towards easy transfer of records between computer systems.
A new taxonomy for distributed computer systems based upon operating system structure
NASA Technical Reports Server (NTRS)
Foudriat, E. C.
1985-01-01
Characteristics of the resource structure found in the operating system are considered as a mechanism for classifying distributed computer systems. Since the operating system resources themselves are too diversified to provide a consistent classification, the structure upon which resources are built and shared is examined. The location and control character of this indivisibility provides the taxonomy for separating uniprocessors, computer networks, network computers (fully distributed processing systems or decentralized computers) and algorithm and/or data control multiprocessors. The taxonomy is important because it divides machines into a classification that is relevant or important to the client and not the hardware architect. It also defines the character of the kernel O/S structure needed for future computer systems. What constitutes an operating system for a fully distributed processor is discussed in detail.
Merz, Marius; Birngruber, Christoph G; Heidorn, Frank; Ramsthaler, Frank; Risse, Manfred; Kreutz, Kerstin; Krähahn, Jonathan; Verhoff, Marcel A
2011-01-01
In German medical and media circles (daily routine, specialist literature, press, novels), the term "domestic-setting corpse" is frequently used, but the term is only vaguely defined. The authors thus decided to perform an in-depth study of the literature, including historic textbooks and all German- and English-language medicolegal journals, going as far back as their first issues, in an attempt to more clearly define the term. Inclusion criteria used in the search were a post-mortem interval of at least 24 hours prior to discovery and discovery of the corpse in a domestic setting. In the literature, 37 cases that complied with the above-mentioned inclusion criteria were found. These cases frequently described "advanced decomposition", often "unclear cause of death" and "problems in identification". These characteristics can thus be considered as being additional pointers in the definition. However, we suggest that the two general defining characteristics of a "domestic-setting corpse" are a post-mortem interval of more than 24 hours before discovery and the discovery of the corpse in a domestic setting.
Handwriting Classification in Forensic Science.
ERIC Educational Resources Information Center
Ansell, Michael
1979-01-01
Considers systems for the classification of handwriting features, discusses computer storage of information about handwriting features, and summarizes recent studies that give an idea of the range of forensic handwriting research. (GT)
Fast Semantic Segmentation of 3d Point Clouds with Strongly Varying Density
NASA Astrophysics Data System (ADS)
Hackel, Timo; Wegner, Jan D.; Schindler, Konrad
2016-06-01
We describe an effective and efficient method for point-wise semantic classification of 3D point clouds. The method can handle unstructured and inhomogeneous point clouds such as those derived from static terrestrial LiDAR or photogrammetric reconstruction, and it is computationally efficient, making it possible to process point clouds with many millions of points in a matter of minutes. The key issue, both to cope with strong variations in point density and to bring down computation time, turns out to be careful handling of neighborhood relations. By choosing appropriate definitions of a point's (multi-scale) neighborhood, we obtain a feature set that is both expressive and fast to compute. We evaluate our classification method both on benchmark data from a mobile mapping platform and on a variety of large, terrestrial laser scans with greatly varying point density. The proposed feature set outperforms the state of the art with respect to per-point classification accuracy, while at the same time being much faster to compute.
Gross, Douglas P; Zhang, Jing; Steenstra, Ivan; Barnsley, Susan; Haws, Calvin; Amell, Tyler; McIntosh, Greg; Cooper, Juliette; Zaiane, Osmar
2013-12-01
To develop a classification algorithm and accompanying computer-based clinical decision support tool to help categorize injured workers toward optimal rehabilitation interventions based on unique worker characteristics. Population-based historical cohort design. Data were extracted from a Canadian provincial workers' compensation database on all claimants undergoing work assessment between December 2009 and January 2011. Data were available on: (1) numerous personal, clinical, occupational, and social variables; (2) type of rehabilitation undertaken; and (3) outcomes following rehabilitation (receiving time loss benefits or undergoing repeat programs). Machine learning, concerned with the design of algorithms to discriminate between classes based on empirical data, was the foundation of our approach to build a classification system with multiple independent and dependent variables. The population included 8,611 unique claimants. Subjects were predominantly employed (85 %) males (64 %) with diagnoses of sprain/strain (44 %). Baseline clinician classification accuracy was high (ROC = 0.86) for selecting programs that lead to successful return-to-work. Classification performance for machine learning techniques outperformed the clinician baseline classification (ROC = 0.94). The final classifiers were multifactorial and included the variables: injury duration, occupation, job attachment status, work status, modified work availability, pain intensity rating, self-rated occupational disability, and 9 items from the SF-36 Health Survey. The use of machine learning classification techniques appears to have resulted in classification performance better than clinician decision-making. The final algorithm has been integrated into a computer-based clinical decision support tool that requires additional validation in a clinical sample.
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification
NASA Astrophysics Data System (ADS)
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-12-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.
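As a point of reference, the serial core of this approach - boosting several small backpropagation networks into one strong classifier - can be written compactly in Python. This is a hedged sketch, not the authors' system: the MapReduce parallelization, Hadoop cluster and image feature extraction are not shown, the weight-driven resampling used here is a common workaround because scikit-learn's MLPClassifier does not accept sample weights, and the tiny network size and synthetic data are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

def adaboost_bp(X, y, n_rounds=15, seed=0):
    """Train n_rounds small BP networks on weight-resampled data and
    combine them with SAMME-style AdaBoost voting weights."""
    rng = np.random.default_rng(seed)
    n, K = len(y), len(np.unique(y))
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n, replace=True, p=w)   # weight-driven resample
        clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=300,
                            random_state=0).fit(X[idx], y[idx])
        pred = clf.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err) + np.log(K - 1)  # SAMME learner weight
        w *= np.exp(alpha * (pred != y))
        w /= w.sum()
        learners.append(clf)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X, classes):
    votes = np.zeros((len(X), len(classes)))
    for clf, a in zip(learners, alphas):
        votes[np.arange(len(X)), np.searchsorted(classes, clf.predict(X))] += a
    return classes[votes.argmax(axis=1)]

X, y = make_classification(n_samples=400, n_features=20, n_classes=3,
                           n_informative=6, random_state=0)
learners, alphas = adaboost_bp(X, y)
classes = np.unique(y)
print("train accuracy:", np.mean(adaboost_predict(learners, alphas, X, classes) == y))
```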
Continuous robust sound event classification using time-frequency features and deep learning.
McLoughlin, Ian; Zhang, Haomin; Xie, Zhipeng; Song, Yan; Xiao, Wei; Phan, Huy
2017-01-01
The automatic detection and recognition of sound events by computers is a requirement for a number of emerging sensing and human-computer interaction technologies. Recent advances in this field have been achieved by machine learning classifiers working in conjunction with time-frequency feature representations. This combination has achieved excellent accuracy for classification of discrete sounds. The ability to recognise sounds under real-world noisy conditions, called robust sound event classification, is an especially challenging task that has attracted recent research attention. Another aspect of real-world conditions is the classification of continuous, occluded or overlapping sounds, rather than classification of short isolated sound recordings. This paper addresses the classification of noise-corrupted, occluded, overlapped, continuous sound recordings. It first proposes a standard evaluation task for such sounds based upon a common existing method for evaluating isolated sound classification. It then benchmarks several high-performing isolated sound classifiers to operate with continuous sound data by incorporating an energy-based event detection front end. Results are reported for each tested system using the new task, to provide the first analysis of their performance for continuous sound event detection. In addition it proposes and evaluates a novel Bayesian-inspired front end for the segmentation and detection of continuous sound recordings prior to classification.
Jane, Nancy Yesudhas; Nehemiah, Khanna Harichandran; Arputharaj, Kannan
2016-01-01
Clinical time-series data acquired from electronic health records (EHR) are liable to temporal complexities such as irregular observations, missing values and time-constrained attributes that make the knowledge discovery process challenging. This paper presents a temporal rough set induced neuro-fuzzy (TRiNF) mining framework that handles these complexities and builds an effective clinical decision-making system. TRiNF provides two functionalities, namely temporal data acquisition (TDA) and temporal classification. In TDA, a time-series forecasting model is constructed by adopting an improved double exponential smoothing method. The forecasting model is used in missing value imputation and temporal pattern extraction. The relevant attributes are selected using a temporal pattern-based rough set approach. In temporal classification, a classification model is built with the selected attributes using a temporal pattern-induced neuro-fuzzy classifier. For experimentation, this work uses two clinical time-series datasets of hepatitis and thrombosis patients. The experimental results show that with the proposed TRiNF framework there is a significant reduction in the error rate, yielding an average classification accuracy of 92.59% for the hepatitis dataset and 91.69% for the thrombosis dataset. The obtained classification results demonstrate the efficiency of the proposed framework in terms of its improved classification accuracy.
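A minimal sketch of the forecasting component: Holt's (double) exponential smoothing maintains a level and a trend that can be rolled forward to impute a missing observation. The smoothing constants and toy series below are illustrative assumptions; the paper's improvements to the method are not reproduced.

```python
def double_exponential_smoothing(series, alpha=0.5, beta=0.3):
    """Holt's linear (double exponential) smoothing.
    Returns the fitted level and trend after the last observation."""
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend

def forecast(level, trend, steps=1):
    return level + steps * trend

# impute the next (missing) value of a short clinical time series
observed = [4.1, 4.4, 4.9, 5.1, 5.6]
level, trend = double_exponential_smoothing(observed)
print("imputed next value:", round(forecast(level, trend), 2))
```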
Practical Issues in Estimating Classification Accuracy and Consistency with R Package cacIRT
ERIC Educational Resources Information Center
Lathrop, Quinn N.
2015-01-01
There are two main lines of research in estimating classification accuracy (CA) and classification consistency (CC) under Item Response Theory (IRT). The R package cacIRT provides computer implementations of both approaches in an accessible and unified framework. Even with available implementations, there remain decisions a researcher faces when…
Breaking the Cost Barrier in Automatic Classification.
ERIC Educational Resources Information Center
Doyle, L. B.
A low-cost automatic classification method is reported that uses computer time in proportion to N log N, where N is the number of information items and the base is a parameter. Some barriers besides cost are treated briefly in the opening section, including types of intellectual resistance to the idea of doing classification by content-word…
Wu, Shang-Lin; Liao, Lun-De; Lu, Shao-Wei; Jiang, Wei-Ling; Chen, Shi-An; Lin, Chin-Teng
2013-08-01
Electrooculography (EOG) signals can be used to control human-computer interface (HCI) systems, if properly classified. The ability to measure and process these signals may help HCI users to overcome many of the physical limitations and inconveniences in daily life. However, there are currently no effective multidirectional classification methods for monitoring eye movements. Here, we describe a classification method used in a wireless EOG-based HCI device for detecting eye movements in eight directions. This device includes wireless EOG signal acquisition components, wet electrodes and an EOG signal classification algorithm. The EOG classification algorithm is based on extracting features from the electrical signals corresponding to eight directions of eye movement (up, down, left, right, up-left, down-left, up-right, and down-right) and blinking. The recognition and processing of these eight different features were achieved in real-life conditions, demonstrating that this device can reliably measure the features of EOG signals. This system and its classification procedure provide an effective method for identifying eye movements. Additionally, it may be applied to study eye functions in real-life conditions in the near future.
Optimizing spectral CT parameters for material classification tasks
NASA Astrophysics Data System (ADS)
Rigie, D. S.; La Rivière, P. J.
2016-06-01
In this work, we propose a framework for optimizing spectral CT imaging parameters and hardware design with regard to material classification tasks. Compared with conventional CT, many more parameters must be considered when designing spectral CT systems and protocols. These choices will impact material classification performance in a non-obvious, task-dependent way with direct implications for radiation dose reduction. In light of this, we adapt Hotelling Observer formalisms typically applied to signal detection tasks to the spectral CT, material-classification problem. The result is a rapidly computable metric that makes it possible to sweep out many system configurations, generating parameter optimization curves (POCs) that can be used to select optimal settings. The proposed model avoids restrictive assumptions about the basis-material decomposition (e.g. linearity) and incorporates signal uncertainty with a stochastic object model. This technique is demonstrated on dual-kVp and photon-counting systems for two different, clinically motivated material classification tasks (kidney stone classification and plaque removal). We show that the POCs predicted with the proposed analytic model agree well with those derived from computationally intensive numerical simulation studies.
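The central quantity in such task-based optimization is the Hotelling observer's separability between the two material classes, often summarized as the Hotelling SNR squared, J = (mu1 - mu2)^T K^{-1} (mu1 - mu2), where K is the average class covariance. A minimal sketch of that computation is given below; the spectral CT forward models, dose constraints and the paper's full metric are not included, and the two-dimensional toy measurements are assumptions.

```python
import numpy as np

def hotelling_separability(samples_a, samples_b):
    """Hotelling observer class separability (SNR^2) between two sets of
    measurement vectors, one set per material class."""
    mu_a, mu_b = samples_a.mean(axis=0), samples_b.mean(axis=0)
    K = 0.5 * (np.cov(samples_a, rowvar=False) + np.cov(samples_b, rowvar=False))
    d = mu_a - mu_b
    return float(d @ np.linalg.solve(K, d))

rng = np.random.default_rng(1)
a = rng.normal(loc=[1.0, 0.5], scale=0.2, size=(200, 2))   # e.g. one stone type
b = rng.normal(loc=[0.8, 0.7], scale=0.2, size=(200, 2))   # the other type
print("Hotelling SNR^2:", round(hotelling_separability(a, b), 2))
```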
New KF-PP-SVM classification method for EEG in brain-computer interfaces.
Yang, Banghua; Han, Zhijun; Zan, Peng; Wang, Qian
2014-01-01
Classification methods are a crucial direction in the current study of brain-computer interfaces (BCIs). To improve the classification accuracy for electroencephalogram (EEG) signals, a novel KF-PP-SVM (kernel Fisher, posterior probability, and support vector machine) classification method is developed. Its detailed process entails the use of common spatial patterns to obtain features, based on which the within-class scatter is calculated. Then the scatter is added into the kernel function of a radial basis function to construct a new kernel function. This new kernel is integrated into the SVM to obtain a new classification model. Finally, the output of the SVM is calculated based on posterior probability and the final recognition result is obtained. To evaluate the effectiveness of the proposed KF-PP-SVM method, EEG data collected in the laboratory are processed with four different classification schemes (KF-PP-SVM, KF-SVM, PP-SVM, and SVM). The results showed that the overall average improvements arising from the use of the KF-PP-SVM scheme as opposed to the KF-SVM, PP-SVM and SVM schemes are 2.49%, 5.83% and 6.49%, respectively.
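The general pattern - passing a custom kernel whose width reflects within-class scatter to an SVM and reading out posterior probabilities - can be sketched with scikit-learn. This is not the paper's KF-PP-SVM construction: tying the RBF width to the within-class scatter is an illustrative assumption standing in for the kernel-Fisher term, the CSP feature extraction is omitted, and Platt scaling stands in for the posterior-probability stage.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Average within-class scatter (trace), used here to set the kernel width.
classes = np.unique(y)
scatter = sum(np.var(X[y == c], axis=0).sum() for c in classes) / len(classes)
gamma = 1.0 / scatter

def scatter_rbf(A, B):
    """RBF kernel whose width is derived from the within-class scatter."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

# SVM with the custom kernel; probability=True yields posterior probabilities.
clf = SVC(kernel=scatter_rbf, probability=True, random_state=0).fit(X, y)
print(clf.predict_proba(X[:3]).round(3))
```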
NASA Astrophysics Data System (ADS)
Zhang, Min; Zhou, Xiangrong; Goshima, Satoshi; Chen, Huayue; Muramatsu, Chisako; Hara, Takeshi; Yokoyama, Ryojiro; Kanematsu, Masayuki; Fujita, Hiroshi
2012-03-01
We aim to use a new texton-based texture classification method for the classification of pulmonary emphysema in computed tomography (CT) images of the lungs. Different from conventional computer-aided diagnosis (CAD) methods for pulmonary emphysema classification, in this paper the texton dictionary is first learned by applying sparse representation (SR) to image patches in the training dataset. Then the SR coefficients of the test images over the dictionary are used to construct histograms for texture representation. Finally, classification is performed using a nearest neighbor classifier with a histogram dissimilarity measure as the distance. The proposed approach is tested on 3840 annotated regions of interest consisting of normal tissue and mild, moderate and severe pulmonary emphysema of three subtypes. The performance of the proposed system, with an accuracy of about 88%, is higher than that of the state-of-the-art method based on basic rotation-invariant local binary pattern histograms and the texture classification method based on texton learning by k-means, which performs almost the best among other approaches in the literature.
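The pipeline above (learn a dictionary from patches, sparse-code new patches, histogram the atom usage, compare histograms) can be sketched with scikit-learn's dictionary-learning tools. This is a hedged sketch, not the authors' system: the CT data are replaced by random patches, and the patch size, dictionary size, OMP coding and chi-squared dissimilarity are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

def texton_histogram(patches, dictionary, n_nonzero=3):
    """Sparse-code patches over a learned dictionary and histogram the
    atom that receives the largest coefficient for each patch."""
    codes = sparse_encode(patches, dictionary, algorithm="omp",
                          n_nonzero_coefs=n_nonzero)
    dominant = np.abs(codes).argmax(axis=1)
    hist = np.bincount(dominant, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
train_patches = rng.normal(size=(500, 25))      # flattened 5x5 patches (toy data)
test_patches = rng.normal(size=(200, 25))

dico = DictionaryLearning(n_components=16, max_iter=50,
                          transform_algorithm="omp", random_state=0)
D = dico.fit(train_patches).components_

h_train = texton_histogram(train_patches, D)
h_test = texton_histogram(test_patches, D)
# histogram dissimilarity (chi-squared), as used for nearest-neighbour matching
chi2 = 0.5 * np.sum((h_train - h_test) ** 2 / (h_train + h_test + 1e-12))
print("chi-squared dissimilarity:", round(chi2, 4))
```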
NASA Astrophysics Data System (ADS)
Senovilla, José M. M.
2010-11-01
The algebraic classification of the Weyl tensor in the arbitrary dimension n is recovered by means of the principal directions of its 'superenergy' tensor. This point of view can be helpful in order to compute the Weyl aligned null directions explicitly, and permits one to obtain the algebraic type of the Weyl tensor by computing the principal eigenvalue of rank-2 symmetric future tensors. The algebraic types compatible with states of intrinsic gravitational radiation can then be explored. The underlying ideas are general, so that a classification of arbitrary tensors in the general dimension can be achieved.
Adetiba, Emmanuel; Olugbara, Oludayo O
2015-01-01
Lung cancer is one of the diseases responsible for a large number of cancer related death cases worldwide. The recommended standard for screening and early detection of lung cancer is the low dose computed tomography. However, many patients diagnosed die within one year, which makes it essential to find alternative approaches for screening and early detection of lung cancer. We present computational methods that can be implemented in a functional multi-genomic system for classification, screening and early detection of lung cancer victims. Samples of top ten biomarker genes previously reported to have the highest frequency of lung cancer mutations and sequences of normal biomarker genes were respectively collected from the COSMIC and NCBI databases to validate the computational methods. Experiments were performed based on the combinations of Z-curve and tetrahedron affine transforms, Histogram of Oriented Gradient (HOG), Multilayer perceptron and Gaussian Radial Basis Function (RBF) neural networks to obtain an appropriate combination of computational methods to achieve improved classification of lung cancer biomarker genes. Results show that a combination of affine transforms of Voss representation, HOG genomic features and Gaussian RBF neural network perceptibly improves classification accuracy, specificity and sensitivity of lung cancer biomarker genes as well as achieving low mean square error.
Pattern recognition for passive polarimetric data using nonparametric classifiers
NASA Astrophysics Data System (ADS)
Thilak, Vimal; Saini, Jatinder; Voelz, David G.; Creusere, Charles D.
2005-08-01
Passive polarization-based imaging is a useful tool in computer vision and pattern recognition. A passive polarization imaging system forms a polarimetric image from the reflection of ambient light that contains useful information for computer vision tasks such as object detection (classification) and recognition. Applications of polarization-based pattern recognition include material classification and automatic shape recognition. In this paper, we present two target detection algorithms for images captured by a passive polarimetric imaging system. The proposed detection algorithms are based on Bayesian decision theory. In these approaches, an object can belong to one of any given number of classes, and classification involves making decisions that minimize the average probability of making incorrect decisions. This minimum is achieved by assigning an object to the class that maximizes the a posteriori probability. Computing a posteriori probabilities requires estimates of class-conditional probability density functions (likelihoods) and prior probabilities. A probabilistic neural network (PNN), a nonparametric method that can compute Bayes-optimal boundaries, and a k-nearest neighbor (KNN) classifier are used for density estimation and classification. The proposed algorithms are applied to polarimetric image data gathered in the laboratory with a liquid crystal-based system. The experimental results validate the effectiveness of the above algorithms for target detection from polarimetric data.
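The Bayes-rule machinery described above can be sketched in a few lines: class-conditional densities estimated with Gaussian Parzen windows (the essence of a PNN) are combined with class priors to give posteriors, and a KNN classifier is shown alongside for comparison. The bandwidth, toy feature vectors and priors-from-class-frequencies rule are assumptions; the polarimetric features themselves are not modeled.

```python
import numpy as np
from sklearn.neighbors import KernelDensity, KNeighborsClassifier

def pnn_posteriors(X_train, y_train, X_test, bandwidth=0.5):
    """Parzen-window (PNN-style) estimate of class posteriors:
    p(c|x) proportional to p(x|c) p(c), with p(x|c) from a Gaussian KDE per class."""
    classes = np.unique(y_train)
    log_post = np.zeros((len(X_test), len(classes)))
    for j, c in enumerate(classes):
        Xc = X_train[y_train == c]
        kde = KernelDensity(bandwidth=bandwidth).fit(Xc)
        log_post[:, j] = kde.score_samples(X_test) + np.log(len(Xc) / len(X_train))
    post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
    return classes, post / post.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(2, 1, (100, 3))])
y_train = np.array([0] * 100 + [1] * 100)
X_test = rng.normal(1, 1, (5, 3))

classes, post = pnn_posteriors(X_train, y_train, X_test)
print("PNN decisions:", classes[post.argmax(axis=1)])
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("KNN decisions:", knn.predict(X_test))
```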
Texture classification of lung computed tomography images
NASA Astrophysics Data System (ADS)
Pheng, Hang See; Shamsuddin, Siti M.
2013-03-01
The development of algorithms in computer-aided diagnosis (CAD) schemes is growing rapidly to assist the radiologist in medical image interpretation. Texture analysis of computed tomography (CT) scans is one of the important preliminary stages in computerized detection and classification systems for lung cancer. Among the different types of image feature analysis, Haralick texture with a variety of statistical measures has been widely used for image texture description. The extraction of texture feature values is essential for a CAD system, especially in the classification of normal and abnormal tissue on cross-sectional CT images. This paper aims to compare experimental results using texture extraction and different machine learning methods for the classification of normal and abnormal tissues in lung CT images. The machine learning methods involved in this assessment are the Artificial Immune Recognition System (AIRS), Naive Bayes, Decision Tree (J48) and Backpropagation Neural Network. AIRS is found to provide high accuracy (99.2%) and sensitivity (98.0%) in the assessment. For experiments and testing purposes, publicly available datasets in the Reference Image Database to Evaluate Therapy Response (RIDER) are used as study cases.
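A minimal sketch of the Haralick-style texture extraction that would feed such classifiers, using scikit-image's grey-level co-occurrence matrix (function names graycomatrix/graycoprops as in recent scikit-image versions). The synthetic region of interest, quantization to 32 levels, and the chosen distances and angles are assumptions for illustration.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def haralick_features(region, levels=32):
    """Compute a small Haralick-style feature vector from a grey-level
    co-occurrence matrix of an image region."""
    q = np.digitize(region, np.linspace(region.min(), region.max(), levels)) - 1
    glcm = graycomatrix(q.astype(np.uint8), distances=[1],
                        angles=[0, np.pi / 2], levels=levels,
                        symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.array([graycoprops(glcm, p).mean() for p in props])

# toy stand-in for a lung CT region of interest
rng = np.random.default_rng(0)
roi = rng.normal(size=(64, 64))
print(dict(zip(["contrast", "homogeneity", "energy", "correlation"],
               haralick_features(roi).round(3))))
```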
EEG Responses to Auditory Stimuli for Automatic Affect Recognition
Hettich, Dirk T.; Bolinger, Elaina; Matuz, Tamara; Birbaumer, Niels; Rosenstiel, Wolfgang; Spüler, Martin
2016-01-01
Brain state classification for communication and control has been well established in the area of brain-computer interfaces over the last decades. Recently, the passive and automatic extraction of additional information regarding the psychological state of users from neurophysiological signals has gained increased attention in the interdisciplinary field of affective computing. We investigated how well specific emotional reactions, induced by auditory stimuli, can be detected in EEG recordings. We introduce an auditory emotion induction paradigm based on the International Affective Digitized Sounds 2nd Edition (IADS-2) database also suitable for disabled individuals. Stimuli are grouped in three valence categories: unpleasant, neutral, and pleasant. Significant differences in time-domain event-related potentials are found in the electroencephalogram (EEG) between unpleasant and neutral, as well as pleasant and neutral conditions over midline electrodes. Time-domain data were classified in three binary classification problems using a linear support vector machine (SVM) classifier. We discuss three classification performance measures in the context of affective computing and outline some strategies for conducting and reporting affect classification studies. PMID:27375410
NASA Technical Reports Server (NTRS)
Kumar, Uttam; Nemani, Ramakrishna R.; Ganguly, Sangram; Kalia, Subodh; Michaelis, Andrew
2017-01-01
In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes, namely forest, farmland, water and urban areas (with NPP-VIIRS-national polar orbiting partnership visible infrared imaging radiometer suite nighttime lights data) over California, USA using a Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91 percent was achieved, which is a 6 percent improvement in unmixing-based classification relative to per-pixel-based classification. As such, abundance maps continue to offer a useful alternative to high-spatial resolution data derived classification maps for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.
NASA Astrophysics Data System (ADS)
Ganguly, S.; Kumar, U.; Nemani, R. R.; Kalia, S.; Michaelis, A.
2017-12-01
In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes, namely forest, farmland, water and urban areas (with NPP-VIIRS - national polar orbiting partnership visible infrared imaging radiometer suite nighttime lights data) over California, USA using a Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91% was achieved, which is a 6% improvement in unmixing-based classification relative to per-pixel based classification. As such, abundance maps continue to offer a useful alternative to high-spatial resolution data derived classification maps for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.
NASA Astrophysics Data System (ADS)
Way, Michael J.
2014-01-01
Edwin Hubble is famous for a number of discoveries that are well known to amateur and professional astronomers, students and even the general public. The origins of three of the most well-known discoveries are examined: The distances to nearby spiral nebulae, the classification of extragalactic-nebulae and the Hubble constant. In the case of the first two a great deal of supporting evidence was already in place, but little credit was given. The Hubble Constant had already been estimated in 1927 by Georges Lemaitre with roughly the same value that Hubble obtained in 1929 using redshifts provided mostly by Vesto M. Slipher. These earlier estimates were not adopted or were forgotten by the astronomical community for complex scientific, sociological and psychological reasons.
NASA Astrophysics Data System (ADS)
Buongiorno Nardelli, Marco
High-Throughput Quantum-Mechanics computation of materials properties by ab initio methods has become the foundation of an effective approach to materials design, discovery and characterization. This data-driven approach to materials science currently presents the most promising path to the development of advanced technological materials that could solve or mitigate important social and economic challenges of the 21st century. In particular, the rapid proliferation of computational data on materials properties presents the possibility to complement and extend materials property databases where the experimental data is lacking and difficult to obtain. Enhanced repositories such as AFLOWLIB open novel opportunities for structure discovery and optimization, including uncovering of unsuspected compounds, metastable structures and correlations between various properties. The practical realization of these opportunities depends almost exclusively on the design of efficient algorithms for electronic structure simulations of realistic material systems beyond the limitations of the current standard theories. In this talk, I will review recent progress in theoretical and computational tools, and in particular, discuss the development and validation of novel functionals within Density Functional Theory and of local basis representations for effective ab-initio tight-binding schemes. Marco Buongiorno Nardelli is a pioneer in the development of computational platforms for theory/data/applications integration rooted in his profound and extensive expertise in the design of electronic structure codes and in his vision for sustainable and innovative software development for high-performance materials simulations. His research activities range from the design and discovery of novel materials for 21st century applications in renewable energy, environment, nano-electronics and devices, the development of advanced electronic structure theories and high-throughput techniques in materials genomics and computational materials design, to an active role as community scientific software developer (QUANTUM ESPRESSO, WanT, AFLOWpi).
Cloud field classification based on textural features
NASA Technical Reports Server (NTRS)
Sengupta, Sailes Kumar
1989-01-01
An essential component in global climate research is accurate cloud cover and type determination. Of the two approaches to texture-based classification (statistical and textural), only the former is effective in the classification of natural scenes such as land, ocean, and atmosphere. In the statistical approach that was adopted, parameters characterizing the stochastic properties of the spatial distribution of grey levels in an image are estimated and then used as features for cloud classification. Two types of textural measures were used. One is based on the distribution of the grey level difference vector (GLDV), and the other on a set of textural features derived from the MaxMin cooccurrence matrix (MMCM). The GLDV method looks at the difference D of grey levels at pixels separated by a horizontal distance d and computes several statistics based on this distribution. These are then used as features in subsequent classification. The MaxMin textural features on the other hand are based on the MMCM, a matrix whose (I,J)th entry gives the relative frequency of occurrences of the grey level pair (I,J) that are consecutive and thresholded local extremes separated by a given pixel distance d. Textural measures are then computed based on this matrix in much the same manner as is done in texture computation using the grey level cooccurrence matrix. The database consists of 37 cloud field scenes from LANDSAT imagery using a near IR visible channel. The classification algorithm used is the well known Stepwise Discriminant Analysis. The overall accuracy was estimated by the percentage of correct classifications in each case. It turns out that both types of classifiers, at their best combination of features, and at any given spatial resolution, give approximately the same classification accuracy. A neural network based classifier with a feed forward architecture and a back propagation training algorithm is used to increase the classification accuracy, using these two classes of features. Preliminary results based on the GLDV textural features alone look promising.
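A minimal sketch of the GLDV step: for a horizontal pixel distance d, form the distribution of absolute grey-level differences and summarize it with a few statistics. The particular statistics and the toy 8-bit patch below are assumptions; the study's exact feature set and the MMCM features are not reproduced.

```python
import numpy as np

def gldv_features(image, d=1):
    """Grey Level Difference Vector statistics for horizontal pixel distance d:
    the distribution of |I(r, c) - I(r, c+d)| summarized by a few measures."""
    diffs = np.abs(image[:, d:].astype(int) - image[:, :-d].astype(int)).ravel()
    p = np.bincount(diffs) / diffs.size          # P(D = k)
    k = np.arange(len(p))
    nz = p > 0
    return {
        "mean": float((k * p).sum()),
        "contrast": float((k**2 * p).sum()),
        "entropy": float(-(p[nz] * np.log2(p[nz])).sum()),
        "homogeneity": float((p / (1.0 + k)).sum()),
    }

# toy 8-bit "cloud field" patch
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(32, 32))
print(gldv_features(patch, d=1))
```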
A transient search using combined human and machine classifications
NASA Astrophysics Data System (ADS)
Wright, Darryl E.; Lintott, Chris J.; Smartt, Stephen J.; Smith, Ken W.; Fortson, Lucy; Trouille, Laura; Allen, Campbell R.; Beck, Melanie; Bouslog, Mark C.; Boyer, Amy; Chambers, K. C.; Flewelling, Heather; Granger, Will; Magnier, Eugene A.; McMaster, Adam; Miller, Grant R. M.; O'Donnell, James E.; Simmons, Brooke; Spiers, Helen; Tonry, John L.; Veldthuis, Marten; Wainscoat, Richard J.; Waters, Chris; Willman, Mark; Wolfenbarger, Zach; Young, Dave R.
2017-12-01
Large modern surveys require efficient review of data in order to find transient sources such as supernovae, and to distinguish such sources from artefacts and noise. Much effort has been put into the development of automatic algorithms, but surveys still rely on human review of targets. This paper presents an integrated system for the identification of supernovae in data from Pan-STARRS1, combining classifications from volunteers participating in a citizen science project with those from a convolutional neural network. The unique aspect of this work is the deployment, in combination, of both human and machine classifications for near real-time discovery in an astronomical project. We show that the combination of the two methods outperforms either one used individually. This result has important implications for the future development of transient searches, especially in the era of Large Synoptic Survey Telescope and other large-throughput surveys.
Mining disease fingerprints from within genetic pathways.
Nabhan, Ahmed Ragab; Sarkar, Indra Neil
2012-01-01
Mining biological networks can be an effective means to uncover system-level knowledge out of micro-level associations, such as those encapsulated in genetic pathways. Analysis of human disease genetic pathways can lead to the identification of major mechanisms that may underlie disorders at an abstract functional level. The focus of this study was to develop an approach for structural pattern analysis and classification of genetic pathways of diseases. A probabilistic model was developed to capture characteristic components ('fingerprints') of functionally annotated pathways. A probability estimation procedure of this model searched for fingerprints in each disease pathway while improving probability estimates of model parameters. The approach was evaluated on data from the Kyoto Encyclopedia of Genes and Genomes (consisting of 56 pathways across seven disease categories). Based on the achieved average classification accuracy of up to ~77%, the findings suggest that these fingerprints may be used for classification and discovery of genetic pathways.
Exploring the Role of Receptor Flexibility in Structure-Based Drug Discovery
Feixas, Ferran; Lindert, Steffen; Sinko, William; McCammon, J. Andrew
2015-01-01
The proper understanding of biomolecular recognition mechanisms that take place in a drug target is of paramount importance to improve the efficiency of drug discovery and development. The intrinsic dynamic character of proteins has a strong influence on biomolecular recognition mechanisms and models such as conformational selection have been widely used to account for this dynamic association process. However, conformational changes occurring in the receptor prior and upon association with other molecules are diverse and not obvious to predict when only a few structures of the receptor are available. In view of the prominent role of protein flexibility in ligand binding and its implications for drug discovery, it is of great interest to identify receptor conformations that play a major role in biomolecular recognition before starting rational drug design efforts. In this review, we discuss a number of recent advances in computer-aided drug discovery techniques that have been proposed to incorporate receptor flexibility into structure-based drug design. The allowance for receptor flexibility provided by computational techniques such as molecular dynamics simulations or enhanced sampling techniques helps to improve the accuracy of methods used to estimate binding affinities and, thus, such methods can contribute to the discovery of novel drug leads. PMID:24332165
Data Resources for the Computer-Guided Discovery of Bioactive Natural Products.
Chen, Ya; de Bruyn Kops, Christina; Kirchmair, Johannes
2017-09-25
Natural products from plants, animals, marine life, fungi, bacteria, and other organisms are an important resource for modern drug discovery. Their biological relevance and structural diversity make natural products good starting points for drug design. Natural product-based drug discovery can benefit greatly from computational approaches, which are a valuable precursor or supplementary method to in vitro testing. We present an overview of 25 virtual and 31 physical natural product libraries that are useful for applications in cheminformatics, in particular virtual screening. The overview includes detailed information about each library, the extent of its structural information, and the overlap between different sources of natural products. In terms of chemical structures, there is a large overlap between freely available and commercial virtual natural product libraries. Of particular interest for drug discovery is that at least ten percent of known natural products are readily purchasable and many more natural products and derivatives are available through on-demand sourcing, extraction and synthesis services. Many of the readily purchasable natural products are of small size and hence of relevance to fragment-based drug discovery. There are also an increasing number of macrocyclic natural products and derivatives becoming available for screening.
Applied metabolomics in drug discovery.
Cuperlovic-Culf, M; Culf, A S
2016-08-01
The metabolic profile is a direct signature of phenotype and biochemical activity following any perturbation. Metabolites are small molecules present in a biological system including natural products as well as drugs and their metabolism by-products depending on the biological system studied. Metabolomics can provide activity information about possible novel drugs and drug scaffolds, indicate interesting targets for drug development and suggest binding partners of compounds. Furthermore, metabolomics can be used for the discovery of novel natural products and in drug development. Metabolomics can enhance the discovery and testing of new drugs and provide insight into the on- and off-target effects of drugs. This review focuses primarily on the application of metabolomics in the discovery of active drugs from natural products and the analysis of chemical libraries and the computational analysis of metabolic networks. Metabolomics methodology, both experimental and analytical is fast developing. At the same time, databases of compounds are ever growing with the inclusion of more molecular and spectral information. An increasing number of systems are being represented by very detailed metabolic network models. Combining these experimental and computational tools with high throughput drug testing and drug discovery techniques can provide new promising compounds and leads.
An overview of aldehyde oxidase: an enzyme of emerging importance in novel drug discovery.
Rashidi, Mohammad-Reza; Soltani, Somaieh
2017-03-01
Given the rising trend in medicinal chemistry strategy to reduce cytochrome P450-dependent metabolism, aldehyde oxidase (AOX) has recently gained increased attention in drug discovery programs and the number of drug candidates that are metabolized by AOX is steadily growing. Areas covered: Despite the emerging importance of AOX in drug discovery, there are certain major recognized problems associated with AOX-mediated metabolism of drugs. Intra- and inter-species variations in AOX activity, the lack of reliable and predictive animal models using the common experimental animals, and failure in the predictions of in vivo metabolic activity of AOX using traditional in vitro methods are among these issues that are covered in this article. A comprehensive review of computational human AOX (hAOX) related studies are also provided. Expert opinion: Following the recent progress in the stem cell field, the authors recommend the application of organoids technology as an effective tool to solve the fundamental problems associated with the evaluation of AOX in drug discovery. The recent success in resolving the hAOX crystal structure can too be another valuable data source for the study of AOX-catalyzed metabolism of new drug candidates, using computer-aided drug discovery methods.
The Emergence of Organizing Structure in Conceptual Representation
ERIC Educational Resources Information Center
Lake, Brenden M.; Lawrence, Neil D.; Tenenbaum, Joshua B.
2018-01-01
Both scientists and children make important structural discoveries, yet their computational underpinnings are not well understood. Structure discovery has previously been formalized as probabilistic inference about the right structural form--where form could be a tree, ring, chain, grid, etc. (Kemp & Tenenbaum, 2008). Although this approach…
20 Discoveries that Shaped Our Lives: Century of the Sciences.
ERIC Educational Resources Information Center
Judson, Horace Freeland
1984-01-01
Describes (in separate articles) 20 developments in science, technology, and medicine that were made during the twentieth century and had significant impact on society. They include discoveries related to intelligence tests, plastics, aviation, antibiotics, genetics, evolution, birth control, computers, transistors, DNA, lasers, statistics,…
On evaluating clustering procedures for use in classification
NASA Technical Reports Server (NTRS)
Pore, M. D.; Moritz, T. E.; Register, D. T.; Yao, S. S.; Eppler, W. G. (Principal Investigator)
1979-01-01
The problem of evaluating clustering algorithms and their respective computer programs for use in a preprocessing step for classification is addressed. In clustering for classification the probability of correct classification is suggested as the ultimate measure of accuracy on training data. A means of implementing this criterion and a measure of cluster purity are discussed. Examples are given. A procedure for cluster labeling that is based on cluster purity and sample size is presented.
ERIC Educational Resources Information Center
Mandel, Carol A.
This paper presents a synthesis of the ideas and issues developed at a conference convened to review the results of the Dewey Decimal Classification Online Project and explore the potential for future use of the Dewey Decimal Classification (DDC) and Library of Congress Classification (LCC) schedules in online library catalogs. Conference…
Concept of Smart Cyberspace for Smart Grid Implementation
NASA Astrophysics Data System (ADS)
Zhukovskiy, Y.; Malov, D.
2018-05-01
The concept of Smart Cyberspace for Smart Grid (SG) implementation is presented in this paper. Three classifications are proposed: a classification of electromechanical units based on the amount of data to be analysed, a classification of electromechanical units based on data processing speed, and a classification of computational network organization based on the required resources. The combination of the considered classifications is formalized, which can further be used in the organization and planning of the SG.
Prasad, G V R
2009-11-01
This paper presents a brief review of recent advances in the classification of mammals at higher levels using fossils and molecular clocks. It also discusses latest fossil discoveries from the Cretaceous - Eocene (66-55 m.y.) rocks of India and their relevance to our current understanding of placental mammal origins and diversifications.
Applications of microarray technology in breast cancer research
Cooper, Colin S
2001-01-01
Microarrays provide a versatile platform for utilizing information from the Human Genome Project to benefit human health. This article reviews the ways in which microarray technology may be used in breast cancer research. Its diverse applications include monitoring chromosome gains and losses, tumour classification, drug discovery and development, DNA resequencing, mutation detection and investigating the mechanism of tumour development. PMID:11305951
Data Science and Optimal Learning for Material Discovery and Design
in computation and experimental techniques generating vast arrays of data, without a clear link to experimental and computational data, designing new materials and impacting computational models. This meeting computational and experimental data (c) Analysis of data from probes such as light sources, as well as other
Huang, Hung-Chung; Jupiter, Daniel; VanBuren, Vincent
2010-01-01
Background Identification of genes with switch-like properties will facilitate discovery of regulatory mechanisms that underlie these properties, and will provide knowledge for the appropriate application of Boolean networks in gene regulatory models. As switch-like behavior is likely associated with tissue-specific expression, these gene products are expected to be plausible candidates as tissue-specific biomarkers. Methodology/Principal Findings In a systematic classification of genes and search for biomarkers, gene expression profiles (GEPs) of more than 16,000 genes from 2,145 mouse array samples were analyzed. Four distribution metrics (mean, standard deviation, kurtosis and skewness) were used to classify GEPs into four categories: predominantly-off, predominantly-on, graded (rheostatic), and switch-like genes. The arrays under study were also grouped and examined by tissue type. For example, arrays were categorized as ‘brain group’ and ‘non-brain group’; the Kolmogorov-Smirnov distance and Pearson correlation coefficient were then used to compare GEPs between brain and non-brain for each gene. We were thus able to identify tissue-specific biomarker candidate genes. Conclusions/Significance The methodology employed here may be used to facilitate disease-specific biomarker discovery. PMID:20140228
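The distribution metrics and the brain versus non-brain comparison described above can be sketched directly with NumPy and SciPy. The categorization thresholds below are assumptions for illustration only, since the abstract does not give the paper's exact cutoffs, and the toy expression profiles stand in for real microarray samples.

```python
import numpy as np
from scipy.stats import kurtosis, skew, ks_2samp, pearsonr

def profile_metrics(expr):
    """Four distribution metrics of one gene's expression profile."""
    return {"mean": expr.mean(), "sd": expr.std(),
            "kurtosis": kurtosis(expr), "skewness": skew(expr)}

def categorize(expr, low=4.0, high=10.0, bimodal_kurtosis=-1.0):
    """Illustrative rules (thresholds are assumptions, not the paper's):
    strongly platykurtic/bimodal -> switch-like, flat-low -> predominantly-off,
    flat-high -> predominantly-on, otherwise graded."""
    m = profile_metrics(expr)
    if m["kurtosis"] < bimodal_kurtosis:
        return "switch-like"
    if m["mean"] < low:
        return "predominantly-off"
    if m["mean"] > high:
        return "predominantly-on"
    return "graded"

rng = np.random.default_rng(0)
gene_brain = np.concatenate([rng.normal(4, 0.5, 80), rng.normal(11, 0.5, 80)])
gene_other = rng.normal(4, 0.5, 160)

print(categorize(gene_brain))                       # expected: switch-like
# tissue-group comparison: KS distance and (crudely) Pearson correlation
print("KS distance:", round(float(ks_2samp(gene_brain, gene_other).statistic), 3))
print("Pearson r:", round(float(pearsonr(gene_brain, gene_other)[0]), 3))
```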
GPURFSCREEN: a GPU based virtual screening tool using random forest classifier.
Jayaraj, P B; Ajay, Mathias K; Nufail, M; Gopakumar, G; Jaleel, U C A
2016-01-01
In-silico methods are an integral part of the modern drug discovery paradigm. Virtual screening, an in-silico method, is used to refine data models and reduce the chemical space on which wet lab experiments need to be performed. Virtual screening of a ligand data model requires large-scale computation, making it a highly time-consuming task. This process can be sped up by implementing parallelized algorithms on a Graphical Processing Unit (GPU). Random Forest is a robust classification algorithm that can be employed in virtual screening. A ligand-based virtual screening tool (GPURFSCREEN) that uses random forests on GPU systems has been proposed and evaluated in this paper. This tool produces optimized results at a lower execution time for large bioassay data sets. The quality of results produced by our tool on a GPU is the same as that obtained in a regular serial environment. Considering the magnitude of data to be screened, the parallelized virtual screening has a significantly lower running time at high throughput. The proposed parallel tool outperforms its serial counterpart by successfully screening billions of molecules in the training and prediction phases.
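The GPU implementation itself is not described in the abstract; as a CPU-side illustration of the underlying idea, a ligand-based random forest screen over binary fingerprint vectors might look like the sketch below. The fingerprints, labels, and parameters are invented for illustration.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(1)
    # Hypothetical 512-bit ligand fingerprints with active/inactive labels from a bioassay.
    X_train = rng.integers(0, 2, size=(2000, 512))
    y_train = rng.integers(0, 2, size=2000)
    X_screen = rng.integers(0, 2, size=(5000, 512))   # compound library to be screened

    forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
    forest.fit(X_train, y_train)

    # Rank screening compounds by predicted probability of activity.
    scores = forest.predict_proba(X_screen)[:, 1]
    top_hits = np.argsort(scores)[::-1][:100]
    print("indices of top-ranked virtual hits:", top_hits[:10])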
A computer model of context-dependent perception in a very simple world
NASA Astrophysics Data System (ADS)
Lara-Dammer, Francisco; Hofstadter, Douglas R.; Goldstone, Robert L.
2017-11-01
We propose the foundations of a computer model of scientific discovery that takes into account certain psychological aspects of human observation of the world. To this end, we simulate two main components of such a system. The first is a dynamic microworld in which physical events take place, and the second is an observer that visually perceives entities and events in the microworld. For reasons of space, this paper focuses only on the starting phase of discovery, namely the relatively simple visual inputs of objects and collisions.
Automated analysis of biological oscillator models using mode decomposition.
Konopka, Tomasz
2011-04-01
Oscillating signals produced by biological systems have shapes, described by their Fourier spectra, that can potentially reveal the mechanisms that generate them. Extracting this information from measured signals is interesting for the validation of theoretical models, discovery and classification of interaction types, and for optimal experiment design. An automated workflow is described for the analysis of oscillating signals. A software package is developed to match signal shapes to hundreds of a priori viable model structures defined by a class of first-order differential equations. The package computes parameter values for each model by exploiting the mode decomposition of oscillating signals and formulating the matching problem in terms of systems of simultaneous polynomial equations. On the basis of the computed parameter values, the software returns a list of models consistent with the data. In validation tests with synthetic datasets, it not only shortlists those model structures used to generate the data but also shows that excellent fits can sometimes be achieved with alternative equations. The listing of all consistent equations is indicative of how further invalidation might be achieved with additional information. When applied to data from a microarray experiment on mice, the procedure finds several candidate model structures to describe interactions related to the circadian rhythm. This shows that experimental data on oscillators is indeed rich in information about gene regulation mechanisms. The software package is available at http://babylone.ulb.ac.be/autoosc/.
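As a rough illustration of what the mode decomposition of an oscillating signal can mean in practice, the sketch below extracts the leading Fourier modes of a periodic signal with NumPy; the signal and the number of retained modes are arbitrary, and the actual package matches these modes against differential-equation models, which is not reproduced here.

    import numpy as np

    # Hypothetical oscillating signal sampled over several periods.
    t = np.linspace(0.0, 10.0, 1000, endpoint=False)
    signal = 2.0 * np.sin(2 * np.pi * 0.5 * t) + 0.7 * np.sin(2 * np.pi * 1.5 * t + 0.3)

    # Fourier spectrum; each peak corresponds to one oscillation mode.
    spectrum = np.fft.rfft(signal) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=t[1] - t[0])

    # Keep the few dominant modes (amplitude and phase) that describe the signal shape.
    order = np.argsort(np.abs(spectrum))[::-1][:3]
    for k in order:
        print(f"mode at {freqs[k]:.2f} Hz: amplitude {2 * np.abs(spectrum[k]):.2f}, "
              f"phase {np.angle(spectrum[k]):.2f} rad")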
ERIC Educational Resources Information Center
Duncan, David R.; Litwiller, Bonnie H.
1986-01-01
Practice on computation is provided using new and novel situations, with discovery of number patterns a bonus of the computation. Activities with disguised addition and multiplication tables for whole numbers and integers are presented for varying levels. (MNS)
The Greatest Mathematical Discovery?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bailey, David H.; Borwein, Jonathan M.
2010-05-12
What mathematical discovery more than 1500 years ago: (1) Is one of the greatest, if not the greatest, single discovery in the field of mathematics? (2) Involved three subtle ideas that eluded the greatest minds of antiquity, even geniuses such as Archimedes? (3) Was fiercely resisted in Europe for hundreds of years after its discovery? (4) Even today, in historical treatments of mathematics, is often dismissed with scant mention, or else is ascribed to the wrong source? Answer: Our modern system of positional decimal notation with zero, together with the basic arithmetic computational schemes, which were discovered in India about 500 CE.
Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng
2013-01-01
In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed. Among them, the support vector machine (SVM) may be the best, but SVM is inherently a two-class classifier, and applying it to the many classes of OPCCR is computationally expensive. We therefore propose a neighbor-classes-based SVM (NC-SVM) to reduce the computational cost of SVM. Experiments on NC-SVM classification for OPCCR show that the proposed NC-SVM effectively reduces computation time in OPCCR.
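The abstract does not spell out how neighbor classes are chosen; one plausible reading, sketched below with scikit-learn, trains pairwise SVMs only between classes whose centroids are close, so that far fewer binary classifiers are trained and evaluated than in full one-vs-one multi-class SVM. The data and the neighborhood rule are assumptions for illustration, not the paper's exact construction.

    import numpy as np
    from itertools import combinations
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    # Hypothetical multi-class data standing in for printed-character features.
    X, y = make_classification(n_samples=2000, n_features=30, n_informative=20,
                               n_classes=10, n_clusters_per_class=1, random_state=0)
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])

    # Assumed neighborhood rule: each class is paired only with its 3 nearest class centroids.
    neighbour_pairs = set()
    for i in range(len(classes)):
        dists = np.linalg.norm(centroids - centroids[i], axis=1)
        for j in np.argsort(dists)[1:4]:
            neighbour_pairs.add(tuple(sorted((i, int(j)))))

    # Train one binary SVM per neighboring pair (far fewer than all one-vs-one pairs).
    models = {}
    for i, j in neighbour_pairs:
        mask = np.isin(y, [classes[i], classes[j]])
        models[(i, j)] = SVC(kernel="rbf").fit(X[mask], y[mask])

    print(len(models), "pairwise SVMs instead of",
          len(list(combinations(range(len(classes)), 2))), "in full one-vs-one")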
Parallel Implementation of the Wideband DOA Algorithm on the IBM Cell BE Processor
2010-05-01
The Multiple Signal Classification (MUSIC) algorithm is a powerful technique for determining the Direction of Arrival (DOA) of signals … on the IBM Cell Broadband Engine processor (Cell BE). The process of adapting the serial MUSIC algorithm to the Cell BE is analyzed in terms of parallelism and … using the MUSIC algorithm [4]: computation of the focus matrix, computation of the number of sources, and separation of signal …
NASA Astrophysics Data System (ADS)
Sheshkus, Alexander; Limonova, Elena; Nikolaev, Dmitry; Krivtsov, Valeriy
2017-03-01
In this paper, we propose an expansion of convolutional neural network (CNN) input features based on the Hough transform. We perform morphological contrasting of the source image followed by the Hough transform, and then use the result as input for some of the convolutional filters. Thus, the CNN's computational complexity and the number of units are not affected; morphological contrasting and the Hough transform are the only additional computational expense of the introduced input-feature expansion. The proposed approach is demonstrated on a CNN with a very simple structure. We considered two image recognition problems: object classification on CIFAR-10 and printed character recognition on a private dataset with symbols taken from Russian passports. Our approach achieved a noticeable accuracy improvement at little extra computational cost, which can be extremely important in industrial recognition systems or in difficult problems that utilise CNNs, such as pressure ridge analysis and classification.
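A possible reading of the feature-expansion step, assuming scikit-image and SciPy: a morphological gradient stands in for the "morphological contrasting", the straight-line Hough transform of the contrasted image is computed, and the resized accumulator is stacked onto the original image as an extra CNN input channel. The exact contrasting operator and network are not specified in the abstract, so treat this as a sketch only.

    import numpy as np
    from scipy.ndimage import morphological_gradient
    from skimage.data import camera
    from skimage.transform import hough_line, resize

    image = camera().astype(float) / 255.0            # stand-in for a document/character image

    # Morphological contrasting (assumed here to be a simple morphological gradient).
    contrasted = morphological_gradient(image, size=(3, 3))

    # Straight-line Hough transform of the (thresholded) contrasted image.
    accumulator, angles, dists = hough_line(contrasted > contrasted.mean())

    # Resize the accumulator to the image grid and stack it as an additional input channel.
    hough_channel = resize(accumulator.astype(float), image.shape)
    cnn_input = np.stack([image, hough_channel], axis=-1)
    print("CNN input tensor shape:", cnn_input.shape)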
NASA Technical Reports Server (NTRS)
Cibula, W. G.
1981-01-01
Four LANDSAT frames, each corresponding to one of the four seasons, were spectrally classified and processed using NASA-developed computer programs. Either a single data set was selected or two or more data sets were merged to improve surface cover classifications. Selected areas representing each spectral class were chosen and transferred to USGS 1:62,500 topographic maps for field use. Ground truth data were gathered to verify the accuracy of the classifications. Acreages were computed for each of the land cover types. The application of elevational data to seasonal LANDSAT frames resulted in the separation of high elevation meadows (both with and without recently emergent perennial vegetation) as well as areas in oak forests which have an evergreen understory, as opposed to other areas which do not.
Classification of cancerous cells based on the one-class problem approach
NASA Astrophysics Data System (ADS)
Murshed, Nabeel A.; Bortolozzi, Flavio; Sabourin, Robert
1996-03-01
One of the most important factors in reducing the effect of cancerous diseases is early diagnosis, which requires a good and robust method. With the advancement of computer technologies and digital image processing, the development of a computer-based system has become feasible. In this paper, we introduce a new approach for the detection of cancerous cells. This approach is based on the one-class problem approach, through which the classification system need only be trained with patterns of cancerous cells. This reduces the burden of the training task by about 50%. Based on this approach, a computer-based classification system is developed using Fuzzy ARTMAP neural networks. Experiments were performed using a set of 542 patterns taken from a breast cancer sample. The results show 98% correct identification of cancerous cells and 95% correct identification of non-cancerous cells.
Challenges in the automated classification of variable stars in large databases
NASA Astrophysics Data System (ADS)
Graham, Matthew; Drake, Andrew; Djorgovski, S. G.; Mahabal, Ashish; Donalek, Ciro
2017-09-01
With ever-increasing numbers of astrophysical transient surveys, new facilities and archives of astronomical time series, time domain astronomy is emerging as a mainstream discipline. However, the sheer volume of data alone - hundreds of observations for hundreds of millions of sources - necessitates advanced statistical and machine learning methodologies for scientific discovery: characterization, categorization, and classification. Whilst these techniques are slowly entering the astronomer's toolkit, their application to astronomical problems is not without its issues. In this paper, we will review some of the challenges posed by trying to identify variable stars in large data collections, including appropriate feature representations, dealing with uncertainties, establishing ground truths, and simple discrete classes.
Black smokers and the Tree of Life
NASA Astrophysics Data System (ADS)
Linich, Michael
The molecular biology revolution has turned the classification of life on its head. Is Whittaker's five-kingdom scheme for the classification of living things no longer relevant to life science education? Coupled with this is the discovery that most microscopic life cannot yet be brought into culture. One of the key organisms making this knowledge possible is Methanococcus jannaschii, a microorganism found in black smokers. This workshop presents the development of the Universal Tree of Life in a historical context and then links together major concepts in the New South Wales senior science programs of Earth and Environmental Science and Biology by examining the biological and geological aspects of changes to black smokers over geological time.
Heterogeneous data fusion for brain tumor classification.
Metsis, Vangelis; Huang, Heng; Andronesi, Ovidiu C; Makedon, Fillia; Tzika, Aria
2012-10-01
Current research in biomedical informatics involves analysis of multiple heterogeneous data sets. This includes patient demographics, clinical and pathology data, treatment history, patient outcomes as well as gene expression, DNA sequences and other information sources such as gene ontology. Analysis of these data sets could lead to better disease diagnosis, prognosis, treatment and drug discovery. In this report, we present a novel machine learning framework for brain tumor classification based on heterogeneous data fusion of metabolic and molecular datasets, including state-of-the-art high-resolution magic angle spinning (HRMAS) proton (1H) magnetic resonance spectroscopy and gene transcriptome profiling, obtained from intact brain tumor biopsies. Our experimental results show that our novel framework outperforms any analysis using individual dataset.
Inference and Discovery in an Exploratory Laboratory. Technical Report No. 10.
ERIC Educational Resources Information Center
Shute, Valerie; And Others
This paper describes the results of a study done as part of a research program investigating the use of computer-based laboratories to support self-paced discovery learning related to microeconomics, electricity, and light refraction. Program objectives include maximizing the laboratories' effectiveness in helping students learn content…
Astronaut Jan Davis monitors Commercial Protein Crystal Growth experiment
1994-02-03
STS060-21-031 (3-11 Feb 1994) --- Using a lap top computer, astronaut N. Jan Davis monitors systems for the Commercial Protein Crystal Growth (CPCG) experiment onboard the Space Shuttle Discovery. Davis joined four other NASA astronauts and a Russian cosmonaut for eight days in space aboard Discovery.
NASA Astrophysics Data System (ADS)
Verma, Surendra P.; Rivera-Gómez, M. Abdelaly; Díaz-González, Lorena; Quiroz-Ruiz, Alfredo
2016-12-01
A new multidimensional classification scheme consistent with the chemical classification of the International Union of Geological Sciences (IUGS) is proposed for the nomenclature of High-Mg altered rocks. Our procedure is based on an extensive database of major element (SiO2, TiO2, Al2O3, Fe2O3t, MnO, MgO, CaO, Na2O, K2O, and P2O5) compositions of a total of 33,868 (920 High-Mg and 32,948 "Common") relatively fresh igneous rock samples. The database consisting of these multinormally distributed samples in terms of their isometric log-ratios was used to propose a set of 11 discriminant functions and 6 diagrams to facilitate High-Mg rock classification. The multinormality required by linear discriminant and canonical analysis was ascertained by a new computer program DOMuDaF. One multidimensional function can distinguish the High-Mg and Common igneous rocks with high percent success values of about 86.4% and 98.9%, respectively. Similarly, from 10 discriminant functions the High-Mg rocks can also be classified as one of the four rock types (komatiite, meimechite, picrite, and boninite), with high success values of about 88%-100%. Satisfactory functioning of this new classification scheme was confirmed by seven independent tests. Five further case studies involving application to highly altered rocks illustrate the usefulness of our proposal. A computer program HMgClaMSys was written to efficiently apply the proposed classification scheme, which will be available for online processing of igneous rock compositional data. Monte Carlo simulation modeling and mass-balance computations confirmed the robustness of our classification with respect to analytical errors and postemplacement compositional changes.
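A compressed sketch of the core computational step, under the assumption that isometric log-ratio (ilr) coordinates of the ten major-element concentrations are fed into linear discriminant analysis; the oxide values and labels below are fabricated, and the published discriminant functions themselves are not reproduced.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def ilr(parts):
        """Isometric log-ratio transform of compositional data (rows are compositions)."""
        parts = np.asarray(parts, dtype=float)
        parts = parts / parts.sum(axis=1, keepdims=True)
        d = parts.shape[1]
        coords = []
        for i in range(1, d):
            gmean = np.exp(np.log(parts[:, :i]).mean(axis=1))
            coords.append(np.sqrt(i / (i + 1.0)) * np.log(gmean / parts[:, i]))
        return np.column_stack(coords)

    # Fabricated major-element compositions (SiO2 ... P2O5) with High-Mg / Common labels.
    rng = np.random.default_rng(2)
    X_oxides = rng.dirichlet(np.ones(10) * 5.0, size=300) * 100.0
    y = rng.integers(0, 2, size=300)

    # Linear discriminant function in ilr space separating High-Mg from Common rocks.
    lda = LinearDiscriminantAnalysis().fit(ilr(X_oxides), y)
    print("discriminant scores, first five samples:", lda.decision_function(ilr(X_oxides[:5])))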
Current status and future prospects for enabling chemistry technology in the drug discovery process.
Djuric, Stevan W; Hutchins, Charles W; Talaty, Nari N
2016-01-01
This review covers recent advances in the implementation of enabling chemistry technologies into the drug discovery process. Areas covered include parallel synthesis chemistry, high-throughput experimentation, automated synthesis and purification methods, flow chemistry methodology including photochemistry, electrochemistry, and the handling of "dangerous" reagents. Also featured are advances in the "computer-assisted drug design" area and the expanding application of novel mass spectrometry-based techniques to a wide range of drug discovery activities.
Exploring Genome-Wide Expression Profiles Using Machine Learning Techniques.
Kebschull, Moritz; Papapanou, Panos N
2017-01-01
Although contemporary high-throughput -omics methods produce high-dimensional data, the resulting wealth of information is difficult to assess using traditional statistical procedures. Machine learning methods facilitate the detection of additional patterns, beyond the mere identification of lists of features that differ between groups. Here, we demonstrate the utility of (1) supervised classification algorithms in class validation, and (2) unsupervised clustering in class discovery. We use data from our previous work that described the transcriptional profiles of gingival tissue samples obtained from subjects suffering from chronic or aggressive periodontitis (1) to test whether the two diagnostic entities were also characterized by differences at the molecular level, and (2) to search for a novel, alternative classification of periodontitis based on the tissue transcriptomes. Using machine learning technology, we provide evidence for diagnostic imprecision in the currently accepted classification of periodontitis, and demonstrate that a novel, alternative classification based on differences in gingival tissue transcriptomes is feasible. The outlined procedures allow for the unbiased interrogation of high-dimensional datasets for characteristic underlying classes, and are applicable to a broad range of -omics data.
Taguchi, Y-h; Iwadate, Mitsuo; Umeyama, Hideaki
2015-04-30
Feature extraction (FE) is difficult, particularly if there are more features than samples, as small sample numbers often result in biased outcomes or overfitting. Furthermore, multiple sample classes often complicate FE because evaluating performance, which is usual in supervised FE, is generally harder than in the two-class problem. Developing unsupervised methods that are independent of sample classification would solve many of these problems. Two principal component analysis (PCA)-based FE methods were considered: variational Bayes PCA (VBPCA) was extended to perform unsupervised FE and, together with conventional PCA (CPCA)-based unsupervised FE, was tested as a sample-classification-independent unsupervised FE method. VBPCA- and CPCA-based unsupervised FE both performed well when applied to simulated data and to a posttraumatic stress disorder (PTSD)-mediated heart disease data set that had multiple categorical class observations in mRNA/microRNA expression of stressed mouse heart. A critical set of PTSD miRNAs/mRNAs was identified that show aberrant expression between treatment and control samples and significant negative correlation with one another. Moreover, greater stability and biological feasibility than conventional supervised FE were also demonstrated. Based on the results obtained, in silico drug discovery was performed as translational validation of the methods. Our two proposed unsupervised FE methods (CPCA- and VBPCA-based) worked well on simulated data and outperformed two conventional supervised FE methods on a real data set. Thus, the two methods appear comparably suited to FE on categorical multiclass data sets, with potential translational utility for in silico drug discovery.
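As one illustrative (not the authors') realisation of PCA-based unsupervised feature extraction, the sketch below embeds the features (genes) rather than the samples with PCA and flags features whose scores on the leading components are outlying; the expression matrix and the outlier rule are assumptions made for this sketch.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(3)
    # Hypothetical expression matrix: rows = genes (features), columns = samples.
    expr = rng.normal(size=(2000, 40))
    expr[:30] += rng.normal(3.0, 1.0, size=(30, 40))    # plant a small set of aberrant genes

    # Unsupervised FE: embed the genes with PCA (no class labels used anywhere).
    scores = PCA(n_components=2).fit_transform(expr - expr.mean(axis=1, keepdims=True))

    # Assumed selection rule: genes that are outliers in the PC-score plane are "extracted".
    z2 = ((scores - scores.mean(axis=0)) / scores.std(axis=0)) ** 2
    selected = np.where(z2.sum(axis=1) > np.percentile(z2.sum(axis=1), 99))[0]
    print(f"{len(selected)} candidate genes selected without using sample labels")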
Zwaenepoel, Arthur; Diels, Tim; Amar, David; Van Parys, Thomas; Shamir, Ron; Van de Peer, Yves; Tzfadia, Oren
2018-01-01
Recent times have seen an enormous growth of "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene expression based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which was proven to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene centric query utility, we present a comparative network approach that enables researchers to efficiently browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest.
Zhan, Mei; Crane, Matthew M; Entchev, Eugeni V; Caballero, Antonio; Fernandes de Abreu, Diana Andrea; Ch'ng, QueeLim; Lu, Hang
2015-04-01
Quantitative imaging has become a vital technique in biological discovery and clinical diagnostics; a plethora of tools have recently been developed to enable new and accelerated forms of biological investigation. Increasingly, the capacity for high-throughput experimentation provided by new imaging modalities, contrast techniques, microscopy tools, microfluidics and computer controlled systems shifts the experimental bottleneck from the level of physical manipulation and raw data collection to automated recognition and data processing. Yet, despite their broad importance, image analysis solutions to address these needs have been narrowly tailored. Here, we present a generalizable formulation for autonomous identification of specific biological structures that is applicable for many problems. The process flow architecture we present here utilizes standard image processing techniques and the multi-tiered application of classification models such as support vector machines (SVM). These low-level functions are readily available in a large array of image processing software packages and programming languages. Our framework is thus both easy to implement at the modular level and provides specific high-level architecture to guide the solution of more complicated image-processing problems. We demonstrate the utility of the classification routine by developing two specific classifiers as a toolset for automation and cell identification in the model organism Caenorhabditis elegans. To serve a common need for automated high-resolution imaging and behavior applications in the C. elegans research community, we contribute a ready-to-use classifier for the identification of the head of the animal under bright field imaging. Furthermore, we extend our framework to address the pervasive problem of cell-specific identification under fluorescent imaging, which is critical for biological investigation in multicellular organisms or tissues. Using these examples as a guide, we envision the broad utility of the framework for diverse problems across different length scales and imaging methods.
How Effective Is Instructional Support for Learning with Computer Simulations?
ERIC Educational Resources Information Center
Eckhardt, Marc; Urhahne, Detlef; Conrad, Olaf; Harms, Ute
2013-01-01
The study examined the effects of two different instructional interventions as support for scientific discovery learning using computer simulations. In two well-known categories of difficulty, data interpretation and self-regulation, instructional interventions for learning with computer simulations on the topic "ecosystem water" were developed…
Wang, Jie; Feng, Zuren; Lu, Na; Luo, Jing
2018-06-01
Feature selection plays an important role in EEG-signal-based motor imagery pattern classification. It is a process that aims to select an optimal feature subset from the original set. Two significant advantages are involved: lowering the computational burden so as to speed up the learning procedure, and removing redundant and irrelevant features so as to improve classification performance. Therefore, feature selection is widely employed in the classification of EEG signals in practical brain-computer interface systems. In this paper, we present a novel statistical model to select the optimal feature subset based on the Kullback-Leibler divergence measure and to automatically select the optimal subject-specific time segment. The proposed method comprises four successive stages: broad frequency band filtering and common spatial pattern enhancement as preprocessing, feature extraction by an autoregressive model and log-variance, Kullback-Leibler-divergence-based optimal feature and time-segment selection, and linear discriminant analysis classification. More importantly, this paper provides a potential framework for combining other feature extraction models and classification algorithms with the proposed method for EEG signal classification. Experiments on single-trial EEG signals from two public competition datasets not only demonstrate that the proposed method is effective in selecting discriminative features and time segments, but also show that it yields relatively better classification results in comparison with other competitive methods. Copyright © 2018 Elsevier Ltd. All rights reserved.
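A highly simplified stand-in for the Kullback-Leibler step, assuming per-feature Gaussian class-conditional distributions: the symmetric KL divergence between the two motor-imagery classes is computed for every candidate feature and the most divergent features are kept for an LDA classifier. The data, feature counts, and the Gaussian assumption are all illustrative rather than taken from the paper.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def sym_kl_gauss(x0, x1):
        """Symmetric KL divergence between two 1-D Gaussians fitted to x0 and x1."""
        m0, v0 = x0.mean(), x0.var() + 1e-12
        m1, v1 = x1.mean(), x1.var() + 1e-12
        kl01 = 0.5 * (v0 / v1 + (m1 - m0) ** 2 / v1 - 1 + np.log(v1 / v0))
        kl10 = 0.5 * (v1 / v0 + (m0 - m1) ** 2 / v0 - 1 + np.log(v0 / v1))
        return kl01 + kl10

    rng = np.random.default_rng(4)
    X = rng.normal(size=(200, 64))        # hypothetical log-variance / AR features per trial
    y = rng.integers(0, 2, size=200)      # two motor-imagery classes
    X[y == 1, :5] += 1.0                  # make a few features genuinely discriminative

    divergence = np.array([sym_kl_gauss(X[y == 0, j], X[y == 1, j]) for j in range(X.shape[1])])
    best = np.argsort(divergence)[::-1][:8]          # keep the 8 most divergent features
    clf = LinearDiscriminantAnalysis().fit(X[:, best], y)
    print("selected feature indices:", best)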
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Zi-Kui; Gleeson, Brian; Shang, Shunli
This project developed computational tools that can complement and support experimental efforts in order to enable discovery and more efficient development of Ni-base structural materials and coatings. The project goal was reached through an integrated computation-predictive and experimental-validation approach, including first-principles calculations, thermodynamic CALPHAD (CALculation of PHAse Diagram), and experimental investigations on compositions relevant to Ni-base superalloys and coatings in terms of oxide layer growth and microstructure stabilities. The developed description included composition ranges typical for coating alloys and, hence, allows for prediction of thermodynamic properties for these material systems. The calculation of phase compositions, phase fractions, and phase stabilities, which are directly related to properties such as ductility and strength, was a valuable contribution, along with the collection of computational tools that are required to meet the increasing demands for strong, ductile and environmentally-protective coatings. Specifically, a suitable thermodynamic description for the Ni-Al-Cr-Co-Si-Hf-Y system was developed for bulk alloy and coating compositions. Experiments were performed to validate and refine the thermodynamics from the CALPHAD modeling approach. Additionally, alloys produced using predictions from the current computational models were studied in terms of their oxidation performance. Finally, results obtained from experiments aided in the development of a thermodynamic modeling automation tool called ESPEI/pycalphad for more rapid discovery and development of new materials.
Classifying EEG for Brain-Computer Interface: Learning Optimal Filters for Dynamical System Features
Song, Le; Epps, Julien
2007-01-01
Classification of multichannel EEG recordings during motor imagination has been exploited successfully for brain-computer interfaces (BCI). In this paper, we consider EEG signals as the outputs of a networked dynamical system (the cortex), and exploit synchronization features from the dynamical system for classification. Herein, we also propose a new framework for learning optimal filters automatically from the data, by employing a Fisher ratio criterion. Experimental evaluations comparing the proposed dynamical system features with the CSP and the AR features reveal their competitive performance during classification. Results also show the benefits of employing the spatial and the temporal filters optimized using the proposed learning approach. PMID:18364986
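The Fisher ratio criterion referred to above is, in essence, between-class scatter divided by within-class scatter; a minimal per-feature version (not the authors' filter-learning procedure, which optimises spatial and temporal filters) is sketched below with invented data.

    import numpy as np

    def fisher_ratio(feature, labels):
        """Between-class variance over within-class variance for one scalar feature."""
        classes = np.unique(labels)
        overall_mean = feature.mean()
        between = sum((feature[labels == c].mean() - overall_mean) ** 2 for c in classes)
        within = sum(feature[labels == c].var() for c in classes) + 1e-12
        return between / within

    rng = np.random.default_rng(5)
    X = rng.normal(size=(300, 10))            # hypothetical synchronization features per trial
    y = rng.integers(0, 2, size=300)
    X[y == 1, 0] += 2.0                        # one genuinely informative feature

    ratios = [fisher_ratio(X[:, j], y) for j in range(X.shape[1])]
    print("Fisher ratios per feature:", np.round(ratios, 2))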
The ERTS-1 investigation (ER-600). Volume 4: ERTS-1 range analysis
NASA Technical Reports Server (NTRS)
Erb, R. B.
1974-01-01
The Range Analysis Team conducted an investigation to determine the utility of using LANDSAT 1 data for mapping vegetation-type information on range and related grazing lands. Two study areas within the Houston Area Test Site (HATS) were mapped to the highest classification level possible using manual image interpretation and computer aided classification techniques. Rangeland was distinguished from nonrangeland (water, urban area, and cropland) and was further classified as woodland versus nonwoodland. Finer classification of coastal features was attempted with some success in differentiating the lowland zone from the drier upland zone. Computer aided temporal analysis techniques enhanced discrimination among nearly all the vegetation types found in this investigation.
Land classification of south-central Iowa from computer enhanced images
NASA Technical Reports Server (NTRS)
Lucas, J. R.; Taranik, J. V.; Billingsley, F. C. (Principal Investigator)
1977-01-01
The author has identified the following significant results. Enhanced LANDSAT imagery was most useful for land classification purposes, because these images could be photographically printed at large scales such as 1:63,360. The ability to see individual picture elements was no hindrance as long as general image patterns could be discerned. Low cost photographic processing systems for color printing have proved effective for using computer enhanced LANDSAT products for land classification purposes. The initial investment for this type of system was very low, ranging from $100 to $200 beyond a black and white photo lab. The technical expertise can be acquired from reading a color printing and processing manual.
NASA Astrophysics Data System (ADS)
Qu, Haicheng; Liang, Xuejian; Liang, Shichao; Liu, Wanjun
2018-01-01
Many methods of hyperspectral image classification have been proposed recently, and the convolutional neural network (CNN) achieves outstanding performance. However, spectral-spatial classification with a CNN requires an excessively large model, tremendous computation, and a complex network, and CNNs are generally unable to use the noisy bands caused by water-vapor absorption. A dimensionality-varied CNN (DV-CNN) is proposed to address these issues. There are four stages in DV-CNN, and the dimensionalities of the spectral-spatial feature maps vary with the stages. DV-CNN can reduce the computation and simplify the structure of the network. All feature maps are processed by more kernels in higher stages to extract more precise features. DV-CNN also improves the classification accuracy and enhances the robustness to water-vapor absorption bands. The experiments are performed on data sets of the Indian Pines and Pavia University scenes. The classification performance of DV-CNN is compared with state-of-the-art methods, including variations of CNN as well as traditional and other deep learning methods. A performance analysis of DV-CNN itself is also carried out. The experimental results demonstrate that DV-CNN outperforms state-of-the-art methods for spectral-spatial classification and that it is robust to water-vapor absorption bands. Moreover, reasonable parameter selection is effective in improving classification accuracy.
Janousova, Eva; Schwarz, Daniel; Kasparek, Tomas
2015-06-30
We investigated a combination of three classification algorithms, namely the modified maximum uncertainty linear discriminant analysis (mMLDA), the centroid method, and average linkage, with three types of features extracted from three-dimensional T1-weighted magnetic resonance (MR) brain images, specifically MR intensities, grey matter densities, and local deformations, for distinguishing 49 first-episode schizophrenia male patients from 49 healthy male subjects. The feature sets were reduced using intersubject principal component analysis before classification. By combining the classifiers, we were able to obtain slightly improved results when compared with single classifiers. The best classification performance (81.6% accuracy, 75.5% sensitivity, and 87.8% specificity) was significantly better than classification by chance. We also showed that classifiers based on features calculated using more computation-intensive image preprocessing perform better; that mMLDA with the classification boundary calculated as the weighted mean of the groups' discriminative scores had improved sensitivity but similar accuracy compared to the original MLDA; and that reducing the number of eigenvectors during data reduction did not always lead to higher classification accuracy, since noise as well as the signal important for classification were removed. Our findings provide important information for schizophrenia research and may improve the accuracy of computer-aided diagnostics of neuropsychiatric diseases. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Rational design of an enzyme mutant for anti-cocaine therapeutics
NASA Astrophysics Data System (ADS)
Zheng, Fang; Zhan, Chang-Guo
2008-09-01
(-)-Cocaine is a widely abused drug and there is no available anti-cocaine therapeutic. The disastrous medical and social consequences of cocaine addiction have made the development of an effective pharmacological treatment a high priority. An ideal anti-cocaine medication would accelerate (-)-cocaine metabolism, producing biologically inactive metabolites. The main metabolic pathway of cocaine in the body is hydrolysis at its benzoyl ester group. Reviewed in this article is the state-of-the-art computational design of high-activity mutants of human butyrylcholinesterase (BChE) against (-)-cocaine. The computational design of BChE mutants has been based not only on the structure of the enzyme, but also on the detailed catalytic mechanisms of BChE-catalyzed hydrolysis of (-)-cocaine and (+)-cocaine. Computational studies of the detailed catalytic mechanisms and the structure-and-mechanism-based computational design have been carried out through the combined use of a variety of state-of-the-art molecular modeling techniques. By using the computational insights into the catalytic mechanisms, a recently developed unique computational design strategy based on simulation of the rate-determining transition state has been employed to design high-activity mutants of human BChE for hydrolysis of (-)-cocaine, leading to the exciting discovery of BChE mutants with a considerably improved catalytic efficiency against (-)-cocaine. One of the discovered BChE mutants (i.e., A199S/S287G/A328W/Y332G) has a ~456-fold improved catalytic efficiency against (-)-cocaine. The encouraging outcome of the computational design and discovery effort demonstrates that the unique computational design approach based on transition-state simulation is promising for rational enzyme redesign and drug discovery.
NASA Astrophysics Data System (ADS)
Cialone, Claudia; Stock, Kristin
2010-05-01
EuroGEOSS is a European Commission-funded project. It aims at improving the scientific understanding of the complex mechanisms which drive changes affecting our planet, and at identifying and establishing interoperable arrangements between environmental information systems. These systems would be sustained and operated by organizations with a clear mandate and resources, and rendered available following the specifications of already existing frameworks such as GEOSS (the Global Earth Observation System of Systems) and INSPIRE (the Infrastructure for Spatial Information in the European Community). The EuroGEOSS project's infrastructure focuses on three thematic areas: forestry, drought and biodiversity. One of the important activities in the project is the retrieval, parsing and harmonization of the large amount of heterogeneous environmental data available at local, regional and global levels across these strategic areas. The challenge is to render these data semantically and technically interoperable in a simple way. An initial step in achieving this semantic and technical interoperability involves the selection of appropriate classification schemes (for example, thesauri, ontologies and controlled vocabularies) to describe the resources in the EuroGEOSS framework. These classifications become a crucial part of the interoperable framework scaffolding because they allow data providers to describe their resources and thus support resource discovery, execution and orchestration at varying levels of complexity. However, at present, given the diverse range of environmental thesauri, controlled vocabularies and ontologies and the large number of resources provided by project participants, the selection of appropriate classification schemes involves a number of considerations. First of all, there is the semantic difficulty of selecting classification schemes that contain concepts relevant to each thematic area. Secondly, EuroGEOSS is intended to accommodate a number of existing environmental projects (for example, GEOSS and INSPIRE), and this requirement imposes constraints on the selection. Thirdly, the selected classification scheme or group of schemes (if more than one) must be capable of alignment (establishing different kinds of mappings between concepts, hence preserving the original knowledge schemes intact) or merging (the creation of another unique ontology from the original ontological sources) (Pérez-Gómez et al., 2004). Last but not least, there is the issue of including multi-lingual schemes that are based on free, open (non-proprietary) standards. Using these selection criteria, we aim to support open and convenient data discovery and exchange for users who speak different languages (particularly the European ones, given the broad scope of EuroGEOSS). In order to support the project, we have developed a solution that employs two classification schemes: the Societal Benefit Areas (SBAs), the upper-level environmental categorization developed for the GEOSS project, and the GEneral Multilingual Environmental Thesaurus (GEMET), a general environmental thesaurus whose conceptual structure has already been integrated with the spatial data themes proposed by the INSPIRE project. The former seems to provide the spatial data keywords relevant to the INSPIRE Directive (JRC, 2008). In this way, we provide users with a basic set of concepts to support resource description and discovery in the thematic areas while supporting the requirements of INSPIRE and GEOSS.
Furthermore, the use of only two classification schemes, together with the fact that the SBAs are very general categories while GEMET includes much more detailed, yet still top-level, concepts, makes alignment an achievable task. Alignment was selected over merging because it leaves the existing classification schemes intact and requires only the simple activity of defining mappings from GEMET to the SBAs. In order to accomplish this task we are developing a simple, automated, open-source application to assist thematic experts in defining the mappings between concepts in the two classification schemes. The application will then generate SKOS mappings (exactMatch, closeMatch, broadMatch, narrowMatch, relatedMatch) based on the thematic experts' selections linking the concepts in GEMET with the SBAs (including both the general Societal Benefit Areas and their subcategories). Once these mappings are defined and the SKOS files generated, resource providers will be able to select concepts from either GEMET or the SBAs (or a mixture) to describe their resources, and discovery approaches will support selection of concepts from either classification scheme, also returning results classified using the other scheme. While the focus of our work has been on the SBAs and GEMET, we also plan to provide a method for resource providers to further extend the semantic infrastructure by defining alignments to new classification schemes if these are required to support particular specialized thematic areas that are not covered by GEMET. In this way, the approach is flexible and suited to the general scope of EuroGEOSS, allowing specialists to increase at will the level of semantic quality and specificity of data beyond the initial infrastructural skeleton of the project. References: Joint Research Centre (JRC), 2008. INSPIRE Metadata Editor User Guide. Pérez-Gómez, A., Fernandez-Lopez, M., Corcho, O., 2004. Ontological Engineering: With Examples from the Areas of Knowledge Management, e-Commerce and the Semantic Web. Springer: London.
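For concreteness, generating such SKOS mapping statements can be done in a few lines with rdflib; the concept URIs below are invented placeholders, not actual GEMET or SBA identifiers.

    from rdflib import Graph, URIRef
    from rdflib.namespace import SKOS

    # Hypothetical expert-selected pairs: (GEMET concept URI, SBA concept URI, mapping type).
    selections = [
        ("http://example.org/gemet/forest", "http://example.org/sba/ecosystems", SKOS.broadMatch),
        ("http://example.org/gemet/drought", "http://example.org/sba/water", SKOS.relatedMatch),
    ]

    g = Graph()
    g.bind("skos", SKOS)
    for gemet_uri, sba_uri, relation in selections:
        g.add((URIRef(gemet_uri), relation, URIRef(sba_uri)))

    # SKOS mapping file that discovery tools could consume alongside the two schemes.
    print(g.serialize(format="turtle"))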
OpenZika: An IBM World Community Grid Project to Accelerate Zika Virus Drug Discovery
Perryman, Alexander L.; Horta Andrade, Carolina
2016-01-01
The Zika virus outbreak in the Americas has caused global concern. To help accelerate this fight against Zika, we launched the OpenZika project. OpenZika is an IBM World Community Grid Project that uses distributed computing on millions of computers and Android devices to run docking experiments, in order to dock tens of millions of drug-like compounds against crystal structures and homology models of Zika proteins (and other related flavivirus targets). This will enable the identification of new candidates that can then be tested in vitro, to advance the discovery and development of new antiviral drugs against the Zika virus. The docking data is being made openly accessible so that all members of the global research community can use it to further advance drug discovery studies against Zika and other related flaviviruses. PMID:27764115
OpenZika: An IBM World Community Grid Project to Accelerate Zika Virus Drug Discovery.
Ekins, Sean; Perryman, Alexander L; Horta Andrade, Carolina
2016-10-01
The Zika virus outbreak in the Americas has caused global concern. To help accelerate this fight against Zika, we launched the OpenZika project. OpenZika is an IBM World Community Grid Project that uses distributed computing on millions of computers and Android devices to run docking experiments, in order to dock tens of millions of drug-like compounds against crystal structures and homology models of Zika proteins (and other related flavivirus targets). This will enable the identification of new candidates that can then be tested in vitro, to advance the discovery and development of new antiviral drugs against the Zika virus. The docking data is being made openly accessible so that all members of the global research community can use it to further advance drug discovery studies against Zika and other related flaviviruses.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klitsner, Tom
The recent Executive Order creating the National Strategic Computing Initiative (NSCI) recognizes the value of high performance computing for economic competitiveness and scientific discovery and commits to accelerate delivery of exascale computing. The HPC programs at Sandia (the NNSA ASC program and Sandia's Institutional HPC Program) are focused on ensuring that Sandia has the resources necessary to deliver computation in the national interest.
Concept Formation in Scientific Knowledge Discovery from a Constructivist View
NASA Astrophysics Data System (ADS)
Peng, Wei; Gero, John S.
The central goal of scientific knowledge discovery is to learn cause-effect relationships among natural phenomena presented as variables and the consequences of their interactions. Scientific knowledge is normally expressed as scientific taxonomies and qualitative and quantitative laws [1]. This type of knowledge represents intrinsic regularities of the observed phenomena that can be used to explain and predict behaviors of the phenomena. It is a generalization that is abstracted and externalized from a set of contexts and applicable to a broader scope. Scientific knowledge is a type of third-person knowledge, i.e., knowledge that is independent of a specific enquirer. Artificial intelligence approaches, particularly data mining algorithms that are used to identify meaningful patterns from large data sets, are approaches that aim to facilitate the knowledge discovery process [2]. A broad spectrum of algorithms has been developed for addressing classification, associative learning, and clustering problems. However, their linkages to the people who use them have not been adequately explored. Issues in relation to supporting the interpretation of the patterns, the application of prior knowledge to the data mining process and addressing user interactions remain challenges for building knowledge discovery tools [3]. As a consequence, scientists rely on their experience to formulate problems, evaluate hypotheses, reason about untraceable factors and derive new problems. This type of knowledge, which they have developed during their career, is called “first-person” knowledge. The formation of scientific knowledge (third-person knowledge) is highly influenced by the enquirer’s first-person knowledge construct, which is a result of his or her interactions with the environment. There have been attempts to craft automatic knowledge discovery tools, but these systems are limited in their capabilities to handle the dynamics of personal experience. There are now trends in developing approaches to assist scientists in applying their expertise to model formation, simulation, and prediction in various domains [4], [5]. On the other hand, first-person knowledge becomes third-person theory only if it proves general by evidence and is acknowledged by a scientific community. Researchers have started to focus on building interactive cooperation platforms [1] to accommodate different views into the knowledge discovery process. There are some fundamental questions in relation to scientific knowledge development. What are the major components of knowledge construction, and how do people construct their knowledge? How is this personal construct assimilated and accommodated into a scientific paradigm? How can one design a computational system to facilitate these processes? This chapter does not attempt to answer all these questions but serves as a basis to foster thinking along this line. A brief literature review of how people develop their knowledge is carried out through a constructivist view. A hydrological modeling scenario is presented to elucidate the approach.
Classification and assessment tools for structural motif discovery algorithms.
Badr, Ghada; Al-Turaiki, Isra; Mathkour, Hassan
2013-01-01
Motif discovery is the problem of finding recurring patterns in biological data. Patterns can be sequential, mainly when discovered in DNA sequences. They can also be structural (e.g. when discovering RNA motifs). Finding common structural patterns helps to gain a better understanding of the mechanism of action (e.g. post-transcriptional regulation). Unlike DNA motifs, which are sequentially conserved, RNA motifs exhibit conservation in structure, which may be common even if the sequences are different. Over the past few years, hundreds of algorithms have been developed to solve the sequential motif discovery problem, while less work has been done for the structural case. In this paper, we survey, classify, and compare different algorithms that solve the structural motif discovery problem, where the underlying sequences may be different. We highlight their strengths and weaknesses. We start by proposing a benchmark dataset and a measurement tool that can be used to evaluate different motif discovery approaches. Then, we proceed by proposing our experimental setup. Finally, results are obtained using the proposed benchmark to compare available tools. To the best of our knowledge, this is the first attempt to compare tools solely designed for structural motif discovery. Results show that the accuracy of discovered motifs is relatively low. The results also suggest a complementary behavior among tools where some tools perform well on simple structures, while other tools are better for complex structures. We have classified and evaluated the performance of available structural motif discovery tools. In addition, we have proposed a benchmark dataset with tools that can be used to evaluate newly developed tools.
Dictionary learning-based CT detection of pulmonary nodules
NASA Astrophysics Data System (ADS)
Wu, Panpan; Xia, Kewen; Zhang, Yanbo; Qian, Xiaohua; Wang, Ge; Yu, Hengyong
2016-10-01
Segmentation of lung features is one of the most important steps for computer-aided detection (CAD) of pulmonary nodules with computed tomography (CT). However, irregular shapes, complicated anatomical background and poor pulmonary nodule contrast make CAD a very challenging problem. Here, we propose a novel scheme for feature extraction and classification of pulmonary nodules through dictionary learning from training CT images, which does not require accurately segmented pulmonary nodules. Specifically, two classification-oriented dictionaries and one background dictionary are learnt to solve a two-category problem. In terms of the classification-oriented dictionaries, we calculate sparse coefficient matrices to extract intrinsic features for pulmonary nodule classification. The support vector machine (SVM) classifier is then designed to optimize the performance. Our proposed methodology is evaluated with the lung image database consortium and image database resource initiative (LIDC-IDRI) database, and the results demonstrate that the proposed strategy is promising.
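A compact sketch of the sparse-coding pipeline described above, assuming scikit-learn's dictionary learning and SVM; the image patches, dictionary sizes, and labels are synthetic stand-ins rather than LIDC-IDRI data.

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning
    from sklearn.svm import SVC

    rng = np.random.default_rng(6)
    # Hypothetical vectorised image patches (e.g. 8x8) with nodule / non-nodule labels.
    patches = rng.normal(size=(1000, 64))
    labels = rng.integers(0, 2, size=1000)

    # Learn one classification-oriented dictionary per class from that class's patches.
    dicts = [MiniBatchDictionaryLearning(n_components=32, transform_algorithm="omp",
                                         transform_n_nonzero_coefs=5, random_state=0)
                 .fit(patches[labels == c]) for c in (0, 1)]

    # Sparse coefficients w.r.t. both dictionaries form the feature vector for an SVM.
    features = np.hstack([d.transform(patches) for d in dicts])
    clf = SVC(kernel="rbf").fit(features, labels)
    print("training accuracy of the sketch:", clf.score(features, labels))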
Information Gain Based Dimensionality Selection for Classifying Text Documents
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dumidu Wijayasekara; Milos Manic; Miles McQueen
2013-06-01
Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel genetic-algorithm-based methodology for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses the information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods.
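One way to read the information-gain-weighted mutation idea, sketched with scikit-learn's mutual information estimator: a chromosome is a bit mask over dimensions, and the probability of flipping a bit depends on that dimension's information gain, so informative dimensions are unlikely to be switched off. The encoding, data, and probabilities are assumptions for illustration, not the paper's exact operator.

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    rng = np.random.default_rng(7)
    # Hypothetical document-term-like matrix and class labels.
    X = rng.poisson(0.3, size=(500, 200)).astype(float)
    y = rng.integers(0, 2, size=500)

    # Information gain per dimension (estimated mutual information), computed once a priori.
    gain = mutual_info_classif(X, y, random_state=0)
    gain = gain / (gain.max() + 1e-12)

    def mutate(chromosome, base_rate=0.05):
        """Flip bits with a gain-dependent probability: high-gain dimensions are rarely
        switched off, low-gain dimensions are rarely switched on."""
        p_flip = np.where(chromosome == 1, base_rate * (1.0 - gain), base_rate * gain)
        flips = rng.random(chromosome.shape) < p_flip
        return np.where(flips, 1 - chromosome, chromosome)

    chromosome = rng.integers(0, 2, size=X.shape[1])
    print("bits changed by one mutation pass:", int((mutate(chromosome) != chromosome).sum()))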
Martin, Bryan D.; Wolfson, Julian; Adomavicius, Gediminas; Fan, Yingling
2017-01-01
We propose and compare combinations of several methods for classifying transportation activity data from smartphone GPS and accelerometer sensors. We have two main objectives. First, we aim to classify our data as accurately as possible. Second, we aim to reduce the dimensionality of the data as much as possible in order to reduce the computational burden of the classification. We combine dimension reduction and classification algorithms and compare them with a metric that balances accuracy and dimensionality. In doing so, we develop a classification algorithm that accurately classifies five different modes of transportation (i.e., walking, biking, car, bus and rail) while being computationally simple enough to run on a typical smartphone. Further, we use data that required no behavioral changes from the smartphone users to collect. Our best classification model uses the random forest algorithm to achieve 96.8% accuracy. PMID:28885550
Martin, Bryan D; Addona, Vittorio; Wolfson, Julian; Adomavicius, Gediminas; Fan, Yingling
2017-09-08
We propose and compare combinations of several methods for classifying transportation activity data from smartphone GPS and accelerometer sensors. We have two main objectives. First, we aim to classify our data as accurately as possible. Second, we aim to reduce the dimensionality of the data as much as possible in order to reduce the computational burden of the classification. We combine dimension reduction and classification algorithms and compare them with a metric that balances accuracy and dimensionality. In doing so, we develop a classification algorithm that accurately classifies five different modes of transportation (i.e., walking, biking, car, bus and rail) while being computationally simple enough to run on a typical smartphone. Further, we use data that required no behavioral changes from the smartphone users to collect. Our best classification model uses the random forest algorithm to achieve 96.8% accuracy.
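A minimal version of the dimension-reduction-plus-classification pipeline described above, assuming PCA followed by a random forest; the sensor feature matrix and the five mode labels are simulated, and the authors compared several other method combinations not shown here.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(8)
    # Hypothetical features derived from smartphone GPS + accelerometer windows.
    X = rng.normal(size=(1500, 60))
    y = rng.integers(0, 5, size=1500)               # walking, biking, car, bus, rail
    X += np.eye(5)[y] @ rng.normal(size=(5, 60))    # give each mode its own signature

    # Reduce dimensionality first to keep the on-phone model cheap, then classify.
    model = make_pipeline(PCA(n_components=10),
                          RandomForestClassifier(n_estimators=100, random_state=0))
    print("cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))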
Rough set classification based on quantum logic
NASA Astrophysics Data System (ADS)
Hassan, Yasser F.
2017-11-01
By combining the advantages of quantum computing and soft computing, the paper shows that rough sets can be used with quantum logic for classification and recognition systems. We suggest a new definition of rough set theory as quantum logic theory. Rough approximations are essential elements in rough set theory; the quantum rough set model for set-valued data directly constructs set approximations based on a kind of quantum similarity relation, which is presented here. Theoretical analyses demonstrate that the new model for quantum rough sets yields a new type of decision rule with less redundancy, which can be used to give accurate classification using the principles of quantum superposition and non-linear quantum relations. To our knowledge, this is the first attempt to define rough sets in a quantum representation rather than in terms of logic or sets. Experiments on data sets have demonstrated that the proposed model is more accurate than traditional rough sets in terms of finding optimal classifications.
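For readers unfamiliar with rough sets, the classical (non-quantum) approximations that the proposed model generalises can be written in a few lines; the toy decision table below is invented, and the paper's quantum similarity relation is replaced here by ordinary indiscernibility.

    from collections import defaultdict

    # Toy decision table: objects described by condition attributes, and a target set X.
    objects = {
        "o1": ("red", "small"), "o2": ("red", "small"),
        "o3": ("blue", "large"), "o4": ("red", "large"), "o5": ("red", "large"),
    }
    X = {"o1", "o2", "o5"}              # concept (e.g. a decision class) to approximate

    # Indiscernibility classes: objects with identical condition-attribute values.
    blocks = defaultdict(set)
    for name, attrs in objects.items():
        blocks[attrs].add(name)

    lower, upper = set(), set()
    for block in blocks.values():
        if block <= X:                   # whole block inside X -> certainly in the concept
            lower |= block
        if block & X:                    # block overlaps X -> possibly in the concept
            upper |= block

    print("lower approximation:", lower)
    print("upper approximation:", upper)
    print("boundary region:", upper - lower)   # the source of classification uncertainty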
Optical tomographic detection of rheumatoid arthritis with computer-aided classification schemes
NASA Astrophysics Data System (ADS)
Klose, Christian D.; Klose, Alexander D.; Netz, Uwe; Beuthan, Jürgen; Hielscher, Andreas H.
2009-02-01
A recent research study has shown that combining multiple parameters drawn from optical tomographic images leads to better classification results in identifying human finger joints that are or are not affected by rheumatoid arthritis (RA). Building on the findings of that study, this article presents an advanced computer-aided classification approach for interpreting optical image data to detect RA in finger joints. Additional data are used, including, for example, maximum and minimum values of the absorption coefficient as well as their ratios and image variances. Classification performances obtained by the proposed method were evaluated in terms of sensitivity, specificity, Youden index, and area under the curve (AUC). Results were compared to different benchmarks ("gold standards"): magnetic resonance, ultrasound, and clinical evaluation. Maximum accuracies (AUC = 0.88) were reached when combining minimum/maximum ratios and image variances and using ultrasound as the gold standard.
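A small sketch of the kind of evaluation reported above: train a classifier on a few combined image-derived parameters and report AUC and the Youden index. The feature names, the logistic regression classifier, and the data are placeholders, not the study's actual variables or method.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, roc_curve
    from sklearn.model_selection import train_test_split

    X = np.random.rand(200, 4)          # e.g. min/max absorption, their ratio, image variance
    y = np.random.randint(0, 2, 200)    # 1 = affected joint per the chosen gold standard

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

    auc = roc_auc_score(y_te, scores)
    fpr, tpr, _ = roc_curve(y_te, scores)
    youden = (tpr - fpr).max()          # Youden index J = sensitivity + specificity - 1
    print(f"AUC={auc:.2f}, Youden J={youden:.2f}")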
CIMOSA process classification for business process mapping in non-manufacturing firms: A case study
NASA Astrophysics Data System (ADS)
Latiffianti, Effi; Siswanto, Nurhadi; Wiratno, Stefanus Eko; Saputra, Yudha Andrian
2017-11-01
A business process mapping is one important means to enable an enterprise to effectively manage the value chain. One of widely used approaches to classify business process for mapping purpose is Computer Integrated Manufacturing System Open Architecture (CIMOSA). CIMOSA was initially designed for Computer Integrated Manufacturing (CIM) system based enterprises. This paper aims to analyze the use of CIMOSA process classification for business process mapping in the firms that do not fall within the area of CIM. Three firms of different business area that have used CIMOSA process classification were observed: an airline firm, a marketing and trading firm for oil and gas products, and an industrial estate management firm. The result of the research has shown that CIMOSA can be used in non-manufacturing firms with some adjustment. The adjustment includes addition, reduction, or modification of some processes suggested by CIMOSA process classification as evidenced by the case studies.
Non-parametric analysis of LANDSAT maps using neural nets and parallel computers
NASA Technical Reports Server (NTRS)
Salu, Yehuda; Tilton, James
1991-01-01
Nearest neighbor approaches and a new neural network, the Binary Diamond, are used for the classification of images of ground pixels obtained by LANDSAT satellite. The performances are evaluated by comparing classifications of a scene in the vicinity of Washington DC. The problem of optimal selection of categories is addressed as a step in the classification process.
SRI PUFF 8 Computer Program for One-Dimensional Stress Wave Propagation
1980-03-01
Do pre-trained deep learning models improve computer-aided classification of digital mammograms?
NASA Astrophysics Data System (ADS)
Aboutalib, Sarah S.; Mohamed, Aly A.; Zuley, Margarita L.; Berg, Wendie A.; Luo, Yahong; Wu, Shandong
2018-02-01
Digital mammography screening is an important exam for the early detection of breast cancer and reduction in mortality. False positives leading to high recall rates, however, result in unnecessary negative consequences for patients and health care systems. In order to better aid radiologists, computer-aided tools can be utilized to improve the distinction between image classes and thus potentially reduce false recalls. The emergence of deep learning has shown promising results in the area of biomedical imaging data analysis. This study aimed to investigate deep learning and transfer learning methods that can improve digital mammography classification performance. In particular, we evaluated the effect of pre-training deep learning models with other imaging datasets in order to boost classification performance on a digital mammography dataset. Two types of datasets were used for pre-training: (1) a digitized film mammography dataset, and (2) a very large non-medical imaging dataset. By using either of these datasets to pre-train the network initially, and then fine-tuning with the digital mammography dataset, we found an increase in overall classification performance in comparison to a model without pre-training, with the very large non-medical dataset performing the best in improving the classification accuracy.
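A hedged sketch of the pre-train-then-fine-tune recipe described above, using a network pre-trained on a large non-medical dataset (ImageNet) with a new head fitted on mammography patches. The backbone, layer sizes, and input shape are illustrative assumptions, not the authors' architecture.

    import tensorflow as tf

    base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
    base.trainable = False  # freeze the layers pre-trained on the non-medical dataset

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation="sigmoid"),   # e.g. recall vs. negative
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC()])
    # model.fit(mammo_train, validation_data=mammo_val, epochs=...)  # data pipelines not shown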
Full-motion video analysis for improved gender classification
NASA Astrophysics Data System (ADS)
Flora, Jeffrey B.; Lochtefeld, Darrell F.; Iftekharuddin, Khan M.
2014-06-01
The ability of computer systems to perform gender classification using the dynamic motion of the human subject has important applications in medicine, human factors, and human-computer interface systems. Previous works in motion analysis have used data from sensors (including gyroscopes, accelerometers, and force plates), radar signatures, and video. However, full-motion video motion-capture range data provide higher temporal and spatial resolution for the analysis of dynamic motion. Works using motion capture data have been limited by small datasets collected in a controlled environment. In this paper, we apply machine learning techniques to a new dataset with a larger number of subjects. Additionally, these subjects move unrestricted through a capture volume, representing a more realistic, less controlled environment. We conclude that existing linear classification methods are insufficient for gender classification on this larger dataset captured in a relatively uncontrolled environment. A method based on a nonlinear support vector machine classifier is proposed to obtain gender classification for the larger dataset. In experimental testing with a dataset consisting of 98 trials (49 subjects, 2 trials per subject), classification rates using leave-one-out cross-validation improve from 73% using linear discriminant analysis to 88% using the nonlinear support vector machine classifier.
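A minimal sketch comparing the two classifiers mentioned above under leave-one-out cross-validation; the motion features, feature count, and SVM hyperparameters are synthetic stand-ins rather than the study's configuration.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score, LeaveOneOut

    X = np.random.rand(98, 30)        # 98 trials, 30 motion-derived features (placeholders)
    y = np.random.randint(0, 2, 98)   # placeholder gender labels

    for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                      ("RBF-SVM", SVC(kernel="rbf", C=1.0, gamma="scale"))]:
        acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
        print(f"{name}: leave-one-out accuracy = {acc:.2f}")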
Applying matching pursuit decomposition time-frequency processing to UGS footstep classification
NASA Astrophysics Data System (ADS)
Larsen, Brett W.; Chung, Hugh; Dominguez, Alfonso; Sciacca, Jacob; Kovvali, Narayan; Papandreou-Suppappola, Antonia; Allee, David R.
2013-06-01
The challenge of rapid footstep detection and classification in remote locations has long been an important area of study for defense technology and national security. Also, as the military seeks to create effective and disposable unattended ground sensors (UGS), computational complexity and power consumption have become essential considerations in the development of classification techniques. In response to these issues, a research project at the Flexible Display Center at Arizona State University (ASU) has experimented with footstep classification using the matching pursuit decomposition (MPD) time-frequency analysis method. The MPD provides a parsimonious signal representation by iteratively selecting matched signal components from a pre-determined dictionary. The resulting time-frequency representation of the decomposed signal provides distinctive features for different types of footsteps, including footsteps during walking or running activities. The MPD features were used in a Bayesian classification method to successfully distinguish between the different activities. The computational cost of the iterative MPD algorithm was reduced, without significant loss in performance, using a modified MPD with a dictionary consisting of signals matched to cadence temporal gait patterns obtained from real seismic measurements. The classification results were demonstrated with real data from footsteps under various conditions recorded using a low-cost seismic sensor.
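A small sketch of the core matching pursuit step: iteratively pick the dictionary atom most correlated with the residual and subtract its contribution. The dictionary here is generic; the modified MPD above uses atoms matched to cadence/gait patterns from seismic measurements, which this example does not reproduce.

    import numpy as np

    def matching_pursuit(signal, dictionary, n_atoms=10):
        """dictionary: (n_total_atoms, signal_len) array of unit-norm atoms."""
        residual = signal.astype(float).copy()
        atoms, coeffs = [], []
        for _ in range(n_atoms):
            correlations = dictionary @ residual           # inner products with the residual
            k = int(np.argmax(np.abs(correlations)))       # best-matching atom
            c = correlations[k]
            residual -= c * dictionary[k]                  # remove the selected component
            atoms.append(k)
            coeffs.append(c)
        return atoms, coeffs, residual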
Network-based approaches to climate knowledge discovery
NASA Astrophysics Data System (ADS)
Budich, Reinhard; Nyberg, Per; Weigel, Tobias
2011-11-01
Climate Knowledge Discovery Workshop; Hamburg, Germany, 30 March to 1 April 2011 Do complex networks combined with semantic Web technologies offer the next generation of solutions in climate science? To address this question, a first Climate Knowledge Discovery (CKD) Workshop, hosted by the German Climate Computing Center (Deutsches Klimarechenzentrum (DKRZ)), brought together climate and computer scientists from major American and European laboratories, data centers, and universities, as well as representatives from industry, the broader academic community, and the semantic Web communities. The participants, representing six countries, were concerned with large-scale Earth system modeling and computational data analysis. The motivation for the meeting was the growing problem that climate scientists generate data faster than it can be interpreted and the need to prepare for further exponential data increases. Current analysis approaches are focused primarily on traditional methods, which are best suited for large-scale phenomena and coarse-resolution data sets. The workshop focused on the open discussion of ideas and technologies to provide the next generation of solutions to cope with the increasing data volumes in climate science.
Computer-Aided Discovery Tools for Volcano Deformation Studies with InSAR and GPS
NASA Astrophysics Data System (ADS)
Pankratius, V.; Pilewskie, J.; Rude, C. M.; Li, J. D.; Gowanlock, M.; Bechor, N.; Herring, T.; Wauthier, C.
2016-12-01
We present a Computer-Aided Discovery approach that facilitates the cloud-scalable fusion of different data sources, such as GPS time series and Interferometric Synthetic Aperture Radar (InSAR), for the purpose of identifying the expansion centers and deformation styles of volcanoes. The tools currently developed at MIT allow the definition of alternatives for data processing pipelines that use various analysis algorithms. The Computer-Aided Discovery system automatically generates algorithmic and parameter variants to help researchers explore multidimensional data processing search spaces efficiently. We present first application examples of this technique using GPS data on volcanoes on the Aleutian Islands and work in progress on combined GPS and InSAR data in Hawaii. In the model search context, we also illustrate work in progress combining time series Principal Component Analysis with InSAR augmentation to constrain the space of possible model explanations on current empirical data sets and achieve a better identification of deformation patterns. This work is supported by NASA AIST-NNX15AG84G and NSF ACI-1442997 (PI: V. Pankratius).
Role of Open Source Tools and Resources in Virtual Screening for Drug Discovery.
Karthikeyan, Muthukumarasamy; Vyas, Renu
2015-01-01
Advances in chemoinformatics research, in parallel with the availability of high-performance computing platforms, have made it easier to handle large-scale, multi-dimensional scientific data for high-throughput drug discovery. In this study we have explored publicly available molecular databases with the help of open-source-based, integrated in-house molecular informatics tools for virtual screening. The virtual screening literature of the past decade has been extensively investigated and thoroughly analyzed to reveal interesting patterns with respect to the drug, target, scaffold, and disease space. The review also focuses on integrated chemoinformatics tools capable of harvesting chemical data from textual literature and transforming it into truly computable chemical structures, identifying unique fragments and scaffolds from a class of compounds, automatically generating focused virtual libraries, computing molecular descriptors for structure-activity relationship studies, applying conventional filters used in lead discovery along with in-house developed exhaustive PTC (Pharmacophore, Toxicophore, and Chemophore) filters, and applying machine learning tools for the design of potential disease-specific inhibitors. A case study on kinase inhibitors is provided as an example.
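A hedged sketch of one conventional lead-discovery filter of the kind mentioned above (Lipinski's rule of five) using RDKit. The in-house PTC filters are not public, so this only illustrates the filtering step, not the authors' exact rules.

    from rdkit import Chem
    from rdkit.Chem import Descriptors

    def passes_rule_of_five(smiles):
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            return False
        return (Descriptors.MolWt(mol) <= 500
                and Descriptors.MolLogP(mol) <= 5
                and Descriptors.NumHDonors(mol) <= 5
                and Descriptors.NumHAcceptors(mol) <= 10)

    print(passes_rule_of_five("CC(=O)Oc1ccccc1C(=O)O"))   # aspirin -> True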
Multi-discipline resource inventory of soils, vegetation and geology
NASA Technical Reports Server (NTRS)
Simonson, G. H. (Principal Investigator); Paine, D. P.; Lawrence, R. D.; Norgren, J. A.; Pyott, W. Y.; Herzog, J. H.; Murray, R. J.; Rogers, R.
1973-01-01
The author has identified the following significant results. Computer classification of natural vegetation, in the vicinity of Big Summit Prairie, Crook County, Oregon was carried out using MSS digital data. Impure training sets, representing eleven vegetation types plus water, were selected from within the area to be classified. Close correlations were visually observed between vegetation types mapped from the large scale photographs and the computer classification of the ERTS data (Frame 1021-18151, 13 August 1972).
Epistemic Gameplay and Discovery in Computational Model-Based Inquiry Activities
ERIC Educational Resources Information Center
Wilkerson, Michelle Hoda; Shareff, Rebecca; Laina, Vasiliki; Gravel, Brian
2018-01-01
In computational modeling activities, learners are expected to discover the inner workings of scientific and mathematical systems: First elaborating their understandings of a given system through constructing a computer model, then "debugging" that knowledge by testing and refining the model. While such activities have been shown to…
The Experimental Mathematician: The Pleasure of Discovery and the Role of Proof
ERIC Educational Resources Information Center
Borwein, Jonathan M.
2005-01-01
The emergence of powerful mathematical computing environments, the growing availability of correspondingly powerful (multi-processor) computers and the pervasive presence of the Internet allow for mathematicians, students and teachers, to proceed heuristically and "quasi-inductively." We may increasingly use symbolic and numeric computation,…
Tan, Maxine; Pu, Jiantao; Zheng, Bin
2014-01-01
Purpose: Improving radiologists’ performance in classification between malignant and benign breast lesions is important to increase cancer detection sensitivity and reduce false-positive recalls. For this purpose, developing computer-aided diagnosis (CAD) schemes has been attracting research interest in recent years. In this study, we investigated a new feature selection method for the task of breast mass classification. Methods: We initially computed 181 image features based on mass shape, spiculation, contrast, presence of fat or calcifications, texture, isodensity, and other morphological features. From this large image feature pool, we used a sequential forward floating selection (SFFS)-based feature selection method to select relevant features, and analyzed their performance using a support vector machine (SVM) model trained for the classification task. On a database of 600 benign and 600 malignant mass regions of interest (ROIs), we performed the study using a ten-fold cross-validation method. Feature selection and optimization of the SVM parameters were conducted on the training subsets only. Results: The area under the receiver operating characteristic curve (AUC) = 0.805±0.012 was obtained for the classification task. The results also showed that the most frequently-selected features by the SFFS-based algorithm in 10-fold iterations were those related to mass shape, isodensity and presence of fat, which are consistent with the image features frequently used by radiologists in the clinical environment for mass classification. The study also indicated that accurately computing mass spiculation features from the projection mammograms was difficult, and failed to perform well for the mass classification task due to tissue overlap within the benign mass regions. Conclusions: In conclusion, this comprehensive feature analysis study provided new and valuable information for optimizing computerized mass classification schemes that may have potential to be useful as a “second reader” in future clinical practice. PMID:24664267
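A hedged sketch of the selection-plus-SVM pipeline described above. scikit-learn's SequentialFeatureSelector performs plain forward selection without the floating step, so this approximates rather than reproduces the SFFS procedure; the data, feature count, and number of selected features are placeholders.

    import numpy as np
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    X = np.random.rand(300, 181)        # 300 mass ROIs, 181 computed image features
    y = np.random.randint(0, 2, 300)    # 0 = benign, 1 = malignant (placeholders)

    svm = SVC(kernel="rbf", probability=True)
    selector = SequentialFeatureSelector(svm, n_features_to_select=20,
                                         direction="forward", cv=5)
    pipe = make_pipeline(StandardScaler(), selector, svm)
    print(cross_val_score(pipe, X, y, cv=10, scoring="roc_auc").mean())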
Comparative Oncogenomics for Peripheral Nerve Sheath Cancer Gene Discovery
2015-06-01
neurofibromas and MPNSTs, establish gene signatures defining distinct tumor subtypes and functionally test the role of selected driver mutations ...allografted tumor cells, and a variety of in vitro functional assays. We will validate the relevance of these mutated mouse genes in human neurofibromas...and MPNSTs by determining whether these same genes are mutated in human tumors.
1964-06-01
of the Migeya meteorite, which contains volatile organic compounds (a feature which proves the absence of overheating during its life), is 4.3...pattern in their discovery of gallium and germanium in iron meteorites as small admixtures. Iron meteorites are divided into four groups by their content...as a basis for the classification of meteorites by their composition that we have suggested. By comparing the data they obtained on gallium and
Protein-protein interactions (PPIs) mediate the transmission and regulation of oncogenic signals that are essential to cellular proliferation and survival, and thus represent potential targets for anti-cancer therapeutic discovery. Despite their significance, there is no method to experimentally disrupt and interrogate the essentiality of individual endogenous PPIs. The ability to computationally predict or infer PPI essentiality would help prioritize PPIs for drug discovery and help advance understanding of cancer biology.
Current status and future prospects for enabling chemistry technology in the drug discovery process
Djuric, Stevan W.; Hutchins, Charles W.; Talaty, Nari N.
2016-01-01
This review covers recent advances in the implementation of enabling chemistry technologies into the drug discovery process. Areas covered include parallel synthesis chemistry, high-throughput experimentation, automated synthesis and purification methods, flow chemistry methodology including photochemistry, electrochemistry, and the handling of “dangerous” reagents. Also featured are advances in the “computer-assisted drug design” area and the expanding application of novel mass spectrometry-based techniques to a wide range of drug discovery activities. PMID:27781094
Gene selection for tumor classification using neighborhood rough sets and entropy measures.
Chen, Yumin; Zhang, Zunjun; Zheng, Jianzhong; Ma, Ying; Xue, Yu
2017-03-01
With the development of bioinformatics, tumor classification from gene expression data has become an important and useful technology for cancer diagnosis. Since gene expression data often contain thousands of genes and a small number of samples, gene selection from gene expression data becomes a key step for tumor classification. Attribute reduction based on rough sets has been successfully applied to gene selection, as it is data driven and requires no additional information. However, the traditional rough set method deals with discrete data only. Gene expression data containing real-valued or noisy measurements are usually discretized in preprocessing, which may result in poor classification accuracy. In this paper, we propose a novel gene selection method based on the neighborhood rough set model, which can deal with real-valued data while maintaining the original gene classification information. Moreover, this paper introduces an entropy measure within the framework of neighborhood rough sets for tackling the uncertainty and noise of gene expression data. The use of this measure leads to the discovery of compact gene subsets. Finally, a gene selection algorithm is designed based on neighborhood granules and the entropy measure. Experiments on two gene expression datasets show that the proposed gene selection method is effective for improving the accuracy of tumor classification. Copyright © 2017 Elsevier Inc. All rights reserved.
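A minimal numpy sketch of the neighborhood rough set machinery this kind of gene selection builds on: a sample lies in the positive region if every sample within its delta-neighborhood (on the candidate gene subset) shares its class, and the dependency degree is the fraction of such samples. The paper adds an entropy measure; this sketch shows only the dependency-based core, with greedy forward selection and an assumed delta.

    import numpy as np

    def neighborhood_dependency(X, y, genes, delta=0.2):
        Xs = X[:, genes]
        n = Xs.shape[0]
        pos = 0
        for i in range(n):
            dist = np.linalg.norm(Xs - Xs[i], axis=1)    # distances to sample i
            neighbors = dist <= delta                    # delta-neighborhood (includes i)
            if np.all(y[neighbors] == y[i]):             # class-consistent neighborhood
                pos += 1
        return pos / n

    def select_genes(X, y, max_genes=10, delta=0.2):
        """Greedy forward selection: add the gene that most increases the dependency."""
        selected = []
        for _ in range(max_genes):
            remaining = [g for g in range(X.shape[1]) if g not in selected]
            best = max(remaining,
                       key=lambda g: neighborhood_dependency(X, y, selected + [g], delta))
            selected.append(best)
        return selected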
Novel, improved grading system(s) for IDH-mutant astrocytic gliomas.
Shirahata, Mitsuaki; Ono, Takahiro; Stichel, Damian; Schrimpf, Daniel; Reuss, David E; Sahm, Felix; Koelsche, Christian; Wefers, Annika; Reinhardt, Annekathrin; Huang, Kristin; Sievers, Philipp; Shimizu, Hiroaki; Nanjo, Hiroshi; Kobayashi, Yusuke; Miyake, Yohei; Suzuki, Tomonari; Adachi, Jun-Ichi; Mishima, Kazuhiko; Sasaki, Atsushi; Nishikawa, Ryo; Bewerunge-Hudler, Melanie; Ryzhova, Marina; Absalyamova, Oksana; Golanov, Andrey; Sinn, Peter; Platten, Michael; Jungk, Christine; Winkler, Frank; Wick, Antje; Hänggi, Daniel; Unterberg, Andreas; Pfister, Stefan M; Jones, David T W; van den Bent, Martin; Hegi, Monika; French, Pim; Baumert, Brigitta G; Stupp, Roger; Gorlia, Thierry; Weller, Michael; Capper, David; Korshunov, Andrey; Herold-Mende, Christel; Wick, Wolfgang; Louis, David N; von Deimling, Andreas
2018-04-23
According to the 2016 World Health Organization Classification of Tumors of the Central Nervous System (2016 CNS WHO), IDH-mutant astrocytic gliomas comprise WHO grade II diffuse astrocytoma, IDH-mutant (AII-IDHmut), WHO grade III anaplastic astrocytoma, IDH-mutant (AAIII-IDHmut), and WHO grade IV glioblastoma, IDH-mutant (GBM-IDHmut). Notably, IDH gene status has been made the major criterion for classification while the manner of grading has remained unchanged: it is based on histological criteria that arose from studies which antedated knowledge of the importance of IDH status in diffuse astrocytic tumor prognostic assessment. Several studies have now demonstrated that the anticipated differences in survival between the newly defined AII-IDHmut and AAIII-IDHmut have lost their significance. In contrast, GBM-IDHmut still exhibits a significantly worse outcome than its lower-grade IDH-mutant counterparts. To address the problem of establishing prognostically significant grading for IDH-mutant astrocytic gliomas in the IDH era, we undertook a comprehensive study that included assessment of histological and genetic approaches to prognosis in these tumors. A discovery cohort of 211 IDH-mutant astrocytic gliomas with extended observation was subjected to histological review, image analysis, and DNA methylation studies. Tumor group-specific methylation profiles and copy number variation (CNV) profiles were established for all gliomas. Algorithms for automated CNV analysis were developed. All tumors exhibiting 1p/19q codeletion were excluded from the series. We developed algorithms for grading, based on molecular, morphological and clinical data. Performance of these algorithms was compared with that of WHO grading. Three independent cohorts of 108, 154 and 224 IDH-mutant astrocytic gliomas were used to validate this approach. In the discovery cohort, several molecular and clinical parameters were of prognostic relevance. Most relevant for overall survival (OS) was CDKN2A/B homozygous deletion. Other parameters with major influence were necrosis and the total number of CNVs. Proliferation as assessed by mitotic count, which is a key parameter in 2016 CNS WHO grading, was of only minor influence. Employing the parameters most relevant for OS in our discovery set, we developed two models for grading these tumors. These models performed significantly better than WHO grading in both the discovery and the validation sets. Our novel algorithms for grading IDH-mutant astrocytic gliomas overcome the challenges caused by introduction of IDH status into the WHO classification of diffuse astrocytic tumors. We propose that these revised approaches be used for grading of these tumors and incorporated into future WHO criteria.
Dai, Shengfa; Wei, Qingguo
2017-01-01
The common spatial pattern algorithm is widely used to estimate spatial filters in motor imagery based brain-computer interfaces. However, using a large number of channels makes common spatial patterns prone to over-fitting and makes the classification of electroencephalographic signals time-consuming. To overcome these problems, it is necessary to choose an optimal subset of all channels to save computational time and improve classification accuracy. In this paper, a novel method named the backtracking search optimization algorithm is proposed to automatically select the optimal channel set for common spatial patterns. Each individual in the population is an N-dimensional vector, with each component representing one channel. A population of binary codes is generated randomly at the beginning, and channels are then selected according to the evolution of these codes. The number and positions of 1's in a code denote the number and positions of the chosen channels. The objective function of the backtracking search optimization algorithm is defined as a combination of the classification error rate and the relative number of channels. Experimental results suggest that higher classification accuracy can be achieved with far fewer channels compared to standard common spatial patterns using all channels.
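A hedged sketch of the kind of objective described above: each binary code selects a channel subset, and its fitness combines the classification error rate with the relative number of channels. The weighting factor lam and the error-rate callback are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def channel_fitness(code, error_rate_fn, n_channels, lam=0.1):
        """code: binary vector, 1 = channel kept; error_rate_fn: evaluates a channel subset."""
        selected = np.flatnonzero(code)
        if selected.size == 0:
            return np.inf                          # an empty channel subset is invalid
        error = error_rate_fn(selected)            # e.g. CSP + classifier cross-validated error
        return error + lam * selected.size / n_channels   # lower is better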
Wang, Shijun; McKenna, Matthew T; Nguyen, Tan B; Burns, Joseph E; Petrick, Nicholas; Sahiner, Berkman; Summers, Ronald M
2012-05-01
In this paper, we present development and testing results for a novel colonic polyp classification method for use as part of a computed tomographic colonography (CTC) computer-aided detection (CAD) system. Inspired by the interpretative methodology of radiologists using 3-D fly-through mode in CTC reading, we have developed an algorithm which utilizes sequences of images (referred to here as videos) for classification of CAD marks. For each CAD mark, we created a video composed of a series of intraluminal, volume-rendered images visualizing the detection from multiple viewpoints. We then framed the video classification question as a multiple-instance learning (MIL) problem. Since a positive (negative) bag may contain negative (positive) instances, which in our case depends on the viewing angles and camera distance to the target, we developed a novel MIL paradigm to accommodate this class of problems. We solved the new MIL problem by maximizing a L2-norm soft margin using semidefinite programming, which can optimize relevant parameters automatically. We tested our method by analyzing a CTC data set obtained from 50 patients from three medical centers. Our proposed method showed significantly better performance compared with several traditional MIL methods.
2013-01-01
Background: Gene expression data can be of considerable help in developing efficient cancer diagnosis and classification platforms. Recently, many researchers have analyzed gene expression data using diverse computational intelligence methods to select a small subset of informative genes for cancer classification. Many computational methods face difficulties in selecting small subsets due to the small number of samples compared to the huge number of genes (high dimensionality), irrelevant genes, and noisy genes. Methods: We propose an enhanced binary particle swarm optimization to select small subsets of informative genes that are significant for cancer classification. A particle speed, a new rule, and a modified sigmoid function are introduced in the proposed method to increase the probability that bits in a particle’s position are zero. The method was empirically applied to a suite of ten well-known benchmark gene expression data sets. Results: The performance of the proposed method proved superior to previous related works, including the conventional version of binary particle swarm optimization (BPSO), in terms of classification accuracy and the number of selected genes. The proposed method also requires less computational time than BPSO. PMID:23617960
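One plausible reading of the modified-sigmoid idea, sketched below: shifting the velocity before the sigmoid lowers the probability that a bit becomes one, so fewer genes are selected. The shift value and update rule are assumptions for illustration and do not reproduce the enhanced method's exact speed and rule definitions.

    import numpy as np

    def modified_sigmoid(v, shift=1.0):
        # shifting the velocity before the sigmoid lowers P(bit = 1)
        return 1.0 / (1.0 + np.exp(-(v - shift)))

    def update_positions(positions, velocities, shift=1.0):
        prob_one = modified_sigmoid(velocities, shift)
        return (np.random.rand(*positions.shape) < prob_one).astype(int)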
Wang, Shijun; McKenna, Matthew T.; Nguyen, Tan B.; Burns, Joseph E.; Petrick, Nicholas; Sahiner, Berkman
2012-01-01
In this paper we present development and testing results for a novel colonic polyp classification method for use as part of a computed tomographic colonography (CTC) computer-aided detection (CAD) system. Inspired by the interpretative methodology of radiologists using 3D fly-through mode in CTC reading, we have developed an algorithm which utilizes sequences of images (referred to here as videos) for classification of CAD marks. For each CAD mark, we created a video composed of a series of intraluminal, volume-rendered images visualizing the detection from multiple viewpoints. We then framed the video classification question as a multiple-instance learning (MIL) problem. Since a positive (negative) bag may contain negative (positive) instances, which in our case depends on the viewing angles and camera distance to the target, we developed a novel MIL paradigm to accommodate this class of problems. We solved the new MIL problem by maximizing a L2-norm soft margin using semidefinite programming, which can optimize relevant parameters automatically. We tested our method by analyzing a CTC data set obtained from 50 patients from three medical centers. Our proposed method showed significantly better performance compared with several traditional MIL methods. PMID:22552333
Ma, Xiang; Schonfeld, Dan; Khokhar, Ashfaq A
2009-06-01
In this paper, we propose a novel solution to an arbitrary noncausal, multidimensional hidden Markov model (HMM) for image and video classification. First, we show that the noncausal model can be solved by splitting it into multiple causal HMMs and simultaneously solving each causal HMM using a fully synchronous distributed computing framework, therefore referred to as distributed HMMs. Next we present an approximate solution to the multiple causal HMMs that is based on an alternating updating scheme and assumes a realistic sequential computing framework. The parameters of the distributed causal HMMs are estimated by extending the classical 1-D training and classification algorithms to multiple dimensions. The proposed extension to arbitrary causal, multidimensional HMMs allows state transitions that are dependent on all causal neighbors. We, thus, extend three fundamental algorithms to multidimensional causal systems, i.e., 1) expectation-maximization (EM), 2) general forward-backward (GFB), and 3) Viterbi algorithms. In the simulations, we choose to limit ourselves to a noncausal 2-D model whose noncausality is along a single dimension, in order to significantly reduce the computational complexity. Simulation results demonstrate the superior performance, higher accuracy rate, and applicability of the proposed noncausal HMM framework to image and video classification.
Anatomical classification of breast sentinel lymph nodes using computed tomography-lymphography.
Fujita, Tamaki; Miura, Hiroyuki; Seino, Hiroko; Ono, Shuichi; Nishi, Takashi; Nishimura, Akimasa; Hakamada, Kenichi; Aoki, Masahiko
2018-05-03
To evaluate the anatomical classification and location of breast sentinel lymph nodes, preoperative computed tomography-lymphography examinations were retrospectively reviewed for sentinel lymph nodes in 464 cases clinically diagnosed with node-negative breast cancer between July 2007 and June 2016. Anatomical classification was performed based on the numbers of lymphatic routes and sentinel lymph nodes, the flow direction of lymphatic routes, and the location of sentinel lymph nodes. Of the 464 cases reviewed, anatomical classification could be performed in 434 (93.5 %). The largest number of cases showed single route/single sentinel lymph node (n = 296, 68.2 %), followed by multiple routes/multiple sentinel lymph nodes (n = 59, 13.6 %), single route/multiple sentinel lymph nodes (n = 53, 12.2 %), and multiple routes/single sentinel lymph node (n = 26, 6.0 %). Classification based on the flow direction of lymphatic routes showed that 429 cases (98.8 %) had outward flow on the superficial fascia toward axillary lymph nodes, whereas classification based on the height of sentinel lymph nodes showed that 323 cases (74.4 %) belonged to the upper pectoral group of axillary lymph nodes. There was wide variation in the number of lymphatic routes and their branching patterns and in the number, location, and direction of flow of sentinel lymph nodes. It is clinically very important to preoperatively understand the anatomical morphology of lymphatic routes and sentinel lymph nodes for optimal treatment of breast cancer, and computed tomography-lymphography is suitable for this purpose.
GMM-based speaker age and gender classification in Czech and Slovak
NASA Astrophysics Data System (ADS)
Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich
2017-01-01
The paper describes an experiment with using Gaussian mixture models (GMM) for automatic classification of speaker age and gender. It analyses and compares the influence of different numbers of mixtures and different types of speech features used for GMM gender/age classification. The dependence of the computational complexity on the number of mixtures used is also analysed. Finally, the GMM classification accuracy is compared with the output of conventional listening tests. The results of these objective and subjective evaluations are in correspondence.
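A small sketch of GMM-based speaker classification: one Gaussian mixture model is fit per age/gender class on per-frame spectral features, and an utterance is assigned to the class whose GMM gives the highest total log-likelihood. The feature representation, mixture count, and covariance type are assumptions, not the paper's settings.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_gmms(features_by_class, n_mixtures=16):
        """features_by_class: dict mapping class label -> (n_frames, n_features) array."""
        return {label: GaussianMixture(n_components=n_mixtures, covariance_type="diag",
                                       random_state=0).fit(frames)
                for label, frames in features_by_class.items()}

    def classify(utterance_frames, gmms):
        # score_samples returns per-frame log-likelihoods; sum them over the utterance
        scores = {label: gmm.score_samples(utterance_frames).sum()
                  for label, gmm in gmms.items()}
        return max(scores, key=scores.get)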
J-Plus: Morphological Classification Of Compact And Extended Sources By Pdf Analysis
NASA Astrophysics Data System (ADS)
López-Sanjuan, C.; Vázquez-Ramió, H.; Varela, J.; Spinoso, D.; Cristóbal-Hornillos, D.; Viironen, K.; Muniesa, D.; J-PLUS Collaboration
2017-10-01
We present a morphological classification of J-PLUS EDR sources into compact (i.e. stars) and extended (i.e. galaxies) sources. The classification is based on Bayesian modelling of the concentration distribution, including observational errors and magnitude + sky position priors. We provide the star/galaxy probability of each source computed from the gri images. The comparison with SDSS number counts supports our classification up to r ≈ 21. The 31.7 deg² analysed comprise 150k stars and 101k galaxies.
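A hedged sketch of the Bayesian step described above: combine class-conditional models of the concentration parameter with a magnitude-dependent prior to obtain a per-source star probability. The Gaussian model forms, parameter values, and prior relation are illustrative assumptions, not the J-PLUS pipeline values.

    import numpy as np
    from scipy.stats import norm

    def star_probability(concentration, magnitude,
                         star_mu=0.0, star_sigma=0.05,
                         gal_mu=0.5, gal_sigma=0.3):
        # likelihoods of the measured concentration under each class
        l_star = norm.pdf(concentration, star_mu, star_sigma)
        l_gal = norm.pdf(concentration, gal_mu, gal_sigma)
        # prior fraction of stars decreases toward faint magnitudes (toy relation)
        p_star_prior = np.clip(0.9 - 0.04 * (magnitude - 14.0), 0.05, 0.95)
        return l_star * p_star_prior / (l_star * p_star_prior + l_gal * (1 - p_star_prior))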
Discovery and Classification of Optical Transient in M81 : P60- M81-081203
NASA Astrophysics Data System (ADS)
Kasliwal, M. M.; Cenko, S. B.; Quimby, R.; Rau, A.; Ofek, E. O.; Kulkarni, S. R.
2008-12-01
On UT 2008 Dec 3.303, P60-FasTING (Palomar 60-inch Fast Transients In Nearby Galaxies) discovered an optical transient in M81 at RA(J2000) = 09:55:16.920, DEC(J2000)=+69:02:17.68, offset from the nucleus by 87.2"W, 97.4"S. P60-M81-081203 had a brightness of g = 20.5 +/- 0.1 at discovery. It was not detected by P60 to g > 22.3 on Nov 30.410. There is no counterpart in SDSS or SIMBAD. Follow-up spectroscopy with the Double Beam Spectrograph on the Palomar Hale telescope showed a featureless spectrum on Dec 4 UT and Dec 5 UT.
Discovery and Classification of Nova in M31 : P60-M31-081230
NASA Astrophysics Data System (ADS)
Kasliwal, M. M.; Rau, A.; Salvato, M.; Cenko, S. B.; Ofek, E. O.; Quimby, R.; Kulkarni, S. R.
2009-01-01
On UT 2008 Dec 30.207, P60-FasTING (Palomar 60-inch Fast Transients In Nearby Galaxies) discovered an optical transient in M31 at RA(J2000) = 00:43:05.027, DEC(J2000)=+41:17:52.25, offset from the nucleus by 233.4"E,103.8"N. P60-M31-081230 had a brightness of g = 20.5 +/- 0.2 at discovery. It was not detected by P60 to g > 22.0 on Dec 29.140. There is no counterpart in SIMBAD. Follow-up spectroscopy with the Double Beam Spectrograph on the Palomar Hale telescope on Dec 31.104 revealed prominent Balmer emission and strong P Cygni profiles of several Fe II lines.
Backyard Worlds: Finding Nearby Brown Dwarfs Through Citizen Science
NASA Astrophysics Data System (ADS)
Kuchner, Marc
Recent discoveries of cool brown dwarfs in the solar neighborhood and microlensing surveys both point to an undiscovered population of brown dwarfs and rogue planets in the solar neighborhood. We propose to develop and sustain a novel website that enables a unique and powerful citizen-science-based search for these and other high-proper-motion objects at 3.5 and 4.6 microns. Through this search, we have an opportunity to discover new ultracool Y dwarfs, crucial links between star formation and planet formation, and also the Sun's nearest neighbors, potentially a system closer than Proxima Centauri. NASA's Wide-field Infrared Survey Explorer mission (WISE) is nominally sensitive enough to detect a 250 K brown dwarf to > 6 pc and even a Jupiter analog to > 0.6 pc. However, high-proper-motion objects like these can easily be confused with variable stars, electronic noise, latent images, optical ghosts, cosmic ray hits, and so on in the WISE archive. Computer-based searches for high-proper-motion objects falter in dense star fields, necessitating visual inspection of all candidates. Our citizen science project, called "Backyard Worlds: Planet 9", remedies this problem by engaging volunteers to visually inspect WISE and NEOWISE images. Roughly 104,000 participants have already begun using a preliminary version of the site to examine time-resolved co-adds of unWISE-processed images, four epochs spanning 2010 to 2014. They have already performed more than 3.6 million classifications of these images since the site's launch on February 15, 2017. Besides seeking new brown dwarfs and nearby stars, this site is also the most sensitive all-sky WISE-based search for a planet orbiting the Sun beyond Pluto (sometimes called Planet Nine). Preliminary analysis of data from the site has resulted in the discovery of 13 brown dwarf candidates including 6 T dwarfs. We obtained a spectrum of one of these candidates and published it in Astrophysical Journal Letters, with four citizen scientists as co-authors. Backyard Worlds: Planet 9 was launched with seed funding from a NASA Science Innovation Fund grant, but is no longer funded. This proposed ADAP project will allow us to finish building the website, communicate with our large user community to improve their skills and foster participation, harvest, analyze, and publish the classification data output by the site, research the objects we discover, and, as <25% of the effort, seed a follow-up program to gather more data about the objects we discover. We will include citizen scientists as co-authors on all publications that result from this work, as appropriate, and aim to complete the project in time for JWST follow-up of the best discoveries.
VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs
USDA-ARS?s Scientific Manuscript database
Accurate detection of viruses in plants and animals is critical for agriculture production and human health. Deep sequencing and assembly of virus-derived siRNAs has proven to be a highly efficient approach for virus discovery. However, to date no computational tools specifically designed for both k...